Original Article

The Pharmacogenomics Journal (2010) 10, 336–346; doi:10.1038/tpj.2010.36

Batch effects in the BRLMM genotype calling algorithm influence GWAS results for the Affymetrix 500K array

K Miclaus¹, R Wolfinger¹, S Vega², M Chierici³, C Furlanello³, C Lambert⁴, H Hong⁵, Li Zhang⁶, S Yin⁶ and F Goodsaid⁶

  1. SAS Institute, Cary, NC, USA
  2. Health Solutions Group, Microsoft, Redmond, WA, USA
  3. Fondazione Bruno Kessler, Trento, Italy
  4. Golden Helix, Bozeman, MT, USA
  5. National Center for Toxicological Research, FDA, Jefferson, AR, USA
  6. Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA

Correspondence: Dr K Miclaus, JMP Genomics, SAS Institute, 100 SAS Campus Drive, Cary, NC 27513, USA. E-mail: Kelci.Miclaus@jmp.com

Received 14 December 2009; Revised 23 March 2010; Accepted 26 April 2010.

Abstract

The Affymetrix GeneChip Human Mapping 500K array is commonly used for genome-wide association studies (GWASs). Recent findings highlight the importance of accurate genotype calling algorithms in limiting inflation of Type I and Type II error rates. Differential results due to genotype calling errors can introduce severe bias in case–control association study results. The data, from the Wellcome Trust Case Control Consortium, comprised 1991 individuals with coronary artery disease (CAD) and 1500 controls from the UK Blood Services (NBS), all genotyped on the Affymetrix 500K array. Different batch sizes and compositions were used in the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) genotype calling algorithm to assess the batch effect on downstream association analysis. Results show that batch composition (cases and controls genotyped simultaneously or separately) and batch size (the number of individuals processed by BRLMM at a time) can create 2–3% discordance in the results for quality control and statistical analysis, and may contribute to the lack of reproducibility between GWASs. Changes in batch size are largely responsible for differential single-nucleotide polymorphism results, yet we observe evidence of an interaction between batch size and composition that contributes to discordant results in the list of significantly associated loci.
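
As a rough illustration of what discordance between two result lists could mean, the sketch below compares the SNPs flagged under two hypothetical batching schemes. The definition used here (symmetric difference divided by union) and the SNP identifiers are assumptions made for illustration only; the abstract does not specify how the reported percentages were computed.

```python
from typing import Iterable


def list_discordance(snps_a: Iterable[str], snps_b: Iterable[str]) -> float:
    """Fraction of SNPs, out of all SNPs flagged by either run,
    that were flagged by only one of the two runs.

    Assumed definition for illustration; not taken from the article.
    """
    a, b = set(snps_a), set(snps_b)
    union = a | b
    if not union:
        # Nothing was flagged in either run, so there is nothing to disagree on.
        return 0.0
    return len(a ^ b) / len(union)


# Hypothetical SNP IDs flagged under two batching schemes.
run_small_batches = ["rs1", "rs2", "rs3", "rs4"]
run_large_batches = ["rs2", "rs3", "rs4", "rs5"]

rate = list_discordance(run_small_batches, run_large_batches)
print(f"discordance: {rate:.1%}")
```

Treating the lists as sets ignores ordering and duplicates, so the measure reflects only which loci each run flagged.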

Keywords:

genotype calling error; BRLMM calling algorithm; WTCCC; GWAS; association studies