Technology-driven modern biology explores the entire human genome and its expression for associations with disease risks, drug effects and toxicities, and modification of the effects of environmental exposures. The claimed benefits of this line of research are immeasurable and include new or tailored prevention, screening, and treatment of various diseases, as well as knowledge of disease biology and etiology. Because as many as 1 million pieces of information per subject are analyzed in such research, and the biological phenomena of interest are not deterministic, statistics plays a crucial role.
In this talk, I will discuss two standard frameworks of scientific investigation in this area. The focus will be on relevant statistical principles that are largely ignored, such as the importance of alternative hypotheses, the evidential interpretation of data, parameter estimation versus significance testing, and the collapsibility conditions of log-linear models in assessing marginal associations. I will explain why these principles are critical to making the statistical analysis of these massive genetic data biologically meaningful. Methods that account for these issues will be discussed, with illustrations of how the biological findings differ. Implications for statistical training and collaboration will also be explored.