Group for Research in Decision Analysis

Bayesian Imputation-based Association Mapping

Matthew Stephens

Ongoing large-scale genetic association studies, which aim to identify variants and genes affecting susceptibility to common diseases, are typing hundreds of thousands of SNPs in thousands of individuals and testing these SNPs for association with phenotypes. Although this is a large number of SNPs, an even larger number remain untyped. For example, the International HapMap Project contains genotype data on more than 3 million SNPs, many of which will not be typed in current studies. In this talk we describe an approach that allows these untyped SNPs to be tested for association with phenotype. The basic idea is to exploit the fact that untyped SNPs are often correlated with typed SNPs, so genotype data on typed SNPs can be used to test untyped SNPs indirectly for association with phenotypes. Specifically, our approach uses available information about patterns of correlation among typed and untyped SNPs in a panel of densely genotyped individuals (e.g. the HapMap samples) to explicitly predict, or "impute", the genotypes at untyped SNPs in a study sample, and then tests these imputed genotypes for association with a phenotype. By using Bayesian statistical methods we are able to take account of potential errors in these imputed genotypes. We illustrate the benefits of this approach in terms of both gain in power and improved interpretability of association signals, particularly when comparing results across studies that have typed different SNP markers.
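
The impute-then-test idea can be illustrated with a deliberately simplified sketch. It is not the method presented in the talk: it assumes a single typed SNP as the only predictor, estimates the conditional genotype distribution from raw panel counts with a pseudocount, and replaces the Bayesian association test with an ordinary least-squares slope on the expected genotype. All function names and the toy data are hypothetical.

```python
from collections import Counter

GENOTYPES = (0, 1, 2)  # copies of the minor allele


def conditional_table(panel_typed, panel_untyped, pseudocount=1.0):
    """Estimate P(untyped genotype | typed genotype) from a reference panel."""
    counts = Counter(zip(panel_typed, panel_untyped))
    table = {}
    for t in GENOTYPES:
        row = [counts[(t, u)] + pseudocount for u in GENOTYPES]
        total = sum(row)
        table[t] = [c / total for c in row]
    return table


def expected_dosage(table, study_typed):
    """Posterior mean genotype at the untyped SNP for each study individual."""
    return [sum(u * p for u, p in zip(GENOTYPES, table[t])) for t in study_typed]


def regression_slope(x, y):
    """Least-squares slope of y on x (a stand-in for the association test)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Toy reference panel: both SNPs genotyped (assumed data, for illustration only).
panel_typed = [0, 0, 1, 1, 2, 2, 0, 1, 2, 2]
panel_untyped = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]

# Toy study sample: only the typed SNP and the phenotype are observed.
study_typed = [0, 1, 2, 0, 2, 1]
phenotype = [1.0, 1.5, 2.1, 0.9, 2.0, 1.6]

table = conditional_table(panel_typed, panel_untyped)
dosage = expected_dosage(table, study_typed)
slope = regression_slope(dosage, phenotype)
```

Using the expected genotype rather than a single best-guess genotype is one simple way to carry imputation uncertainty into the test; the Bayesian treatment described above goes further by accounting for potential errors in the imputed genotypes explicitly.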