Original Article

The Pharmacogenomics Journal (2010) 10, 324–335; doi:10.1038/tpj.2010.46

Variability in GWAS analysis: the impact of genotype calling algorithm inconsistencies

K Miclaus (1), M Chierici (2), C Lambert (3), L Zhang (4), S Vega (5), H Hong (6), S Yin (4), C Furlanello (2), R Wolfinger (1) and F Goodsaid (4)

  1. SAS Institute, Cary, NC, USA
  2. Fondazione Bruno Kessler, Trento, Italy
  3. Golden Helix, Bozeman, MT, USA
  4. Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
  5. Health Solutions Group, Microsoft, Redmond, WA, USA
  6. National Center for Toxicological Research, FDA, Jefferson, AR, USA

Correspondence: Dr K Miclaus, JMP Genomics, SAS Institute, 100 SAS Campus Drive, Cary, NC 27513, USA. E-mail: Kelci.Miclaus@sas.com

Received 17 December 2009; Revised 3 May 2010; Accepted 4 May 2010.

Abstract

The Genome-Wide Association Working Group (GWAWG) is part of a large-scale effort by the MicroArray Quality Consortium (MAQC) to assess the quality of genomic experiments, technologies and analyses for genome-wide association studies (GWASs). One of the aims of the working group is to assess the variability of genotype calls within and between different genotype calling algorithms using data for coronary artery disease from the Wellcome Trust Case Control Consortium (WTCCC) and the University of Ottawa Heart Institute. Our results show that the choice of genotyping algorithm (for example, Bayesian robust linear model with Mahalanobis distance classifier (BRLMM), the corrected robust linear model with maximum-likelihood-based distances (CRLMM) and CHIAMO (developed and implemented by the WTCCC)) can introduce marked variability in the results of downstream case–control association analysis for the Affymetrix 500K array. The amount of discordance between results is influenced by how samples are combined and processed through the respective genotype calling algorithm, indicating that systematic genotype errors due to computational batch effects are propagated to the list of single-nucleotide polymorphisms found to be significantly associated with the trait of interest. Further work using HapMap samples shows that inconsistencies between Affymetrix arrays and calling algorithms can lead to genotyping errors that influence downstream analysis.

Keywords:

genotype calling algorithms; BRLMM; CRLMM; CHIAMO; batch effects; GWAS; association studies