Groupe d’études et de recherche en analyse des décisions

Trend tests that accommodate genotyping errors

Thomas A. Louis

High-throughput SNP arrays provide estimates of genotypes for up to one million loci. These estimates are used, for example, in genome-wide association studies that relate genotype and phenotype (e.g., disease) for a sample of individuals. Common practice is to rank SNPs using test statistics, \(p\)-values or Bayesian structuring. While genotype calls are typically very accurate, genotyping errors do occur and these can greatly influence statistical analysis of genotype/phenotype associations. However, estimates of genotype uncertainty are available for some platforms. Currently, they are used to identify, for each individual, SNPs with a sufficiently uncertain call. These are set aside in evaluating associations. This approach unnecessarily reduces information and can introduce bias. As an improvement, we derive and study a trend test statistic for genotype/phenotype association that takes genotype uncertainty into account, thus avoiding the need to set aside uncertain SNPs and thereby making best use of available information.
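To make the idea of folding genotype uncertainty into a trend test concrete, here is a minimal Python sketch. It is not the statistic derived in this talk, which the abstract does not spell out. It shows one common way to use uncertainty: replace each hard call with the expected allele count (dosage) under the per-individual genotype probabilities, then compute a Cochran-Armitage-style statistic as \(N r^2\), where \(r\) is the correlation between dosage and case status. The function name `dosage_trend_test`, the three-column probability layout, and the additive scores 0, 1, 2 are all assumptions made for illustration.

```python
import math

import numpy as np


def dosage_trend_test(genotype_probs, phenotype):
    """Cochran-Armitage-style trend test on expected genotype dosage.

    genotype_probs: (n, 3) array; row i holds P(0, 1, 2 copies of the
        variant allele) for individual i (assumed layout, rows sum to 1).
    phenotype: length-n array of 0/1 case status.
    Returns (chi2, p_value), with chi2 referred to a 1-df chi-square.
    """
    probs = np.asarray(genotype_probs, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    if probs.ndim != 2 or probs.shape[1] != 3 or probs.shape[0] != y.shape[0]:
        raise ValueError("expected genotype_probs of shape (n, 3) and n phenotypes")

    # Expected allele count under each individual's genotype distribution.
    dosage = probs @ np.array([0.0, 1.0, 2.0])

    xc = dosage - dosage.mean()
    yc = y - y.mean()
    sxx = float(np.sum(xc ** 2))
    syy = float(np.sum(yc ** 2))
    if sxx == 0.0 or syy == 0.0:
        # No variation in dosage or phenotype: no evidence of trend.
        return 0.0, 1.0

    # N * r^2; with one-hot (certain) calls this equals the usual
    # Cochran-Armitage trend statistic with scores 0, 1, 2.
    chi2 = y.size * float(np.sum(xc * yc)) ** 2 / (sxx * syy)
    # Upper tail of a 1-df chi-square: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Usage: two controls and two cases; the last call is uncertain.
probs = [[1.0, 0.0, 0.0],
         [0.9, 0.1, 0.0],
         [0.0, 0.1, 0.9],
         [0.2, 0.5, 0.3]]
chi2, p = dosage_trend_test(probs, [0, 0, 1, 1])
print(chi2, p)
```

Under this sketch every individual contributes to the statistic, and an uncertain call simply pulls that individual's dosage toward an intermediate value instead of removing the observation. As the abstract notes, any such approach is only as good as the uncertainty estimates fed into it.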

Using simulations informed by the HapMap dataset, we show the effectiveness of this approach compared to setting aside uncertain genotype calls and to making deterministic calls. Effectiveness depends on an accurate assessment of uncertainty; with accurate assessment the approach can substantially improve identification of causal SNPs. In addition, we present a mathematical representation that reduces the need for simulation to assess performance in identifying a single, causal SNP in the context of a large number of comparator SNPs.