  • Original Article

An interactive effect of batch size and composition contributes to discordant results in GWAS with the CHIAMO genotyping algorithm

Abstract

Discordant results between independent genome-wide association studies (GWAS) indicate the potential for Type I and Type II errors. To identify the sources of variability underlying this lack of reproducibility, we present the results of a repeatability experiment on a GWAS cohort of 1991 individuals with coronary artery disease and 1500 controls (National Blood Service) provided by the Wellcome Trust Case Control Consortium. As part of the MicroArray Quality Control project, we identified the quality control (QC) and association analysis steps with a major impact on the identification of candidate markers for possible classifiers. The CHIAMO calling algorithm was run under different experimental conditions to assess the effects of batch size and case–control composition on downstream association analysis. Both composition and size produced discordant single-nucleotide polymorphism (SNP) results in QC and statistical analysis, and may contribute to the lack of reproducibility in GWAS. An interactive effect of batch size and composition contributes to discordant results in significantly associated loci: about 800 significant SNPs (Cochran–Armitage trend test, P < 5.0 × 10⁻⁷) were found for batches of 2000 samples with separated cases and controls, whereas only 14 significant markers were found with one batch of all samples.
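For readers unfamiliar with the test named in the abstract, the following is a minimal sketch of a Cochran–Armitage trend test for a single SNP. It assumes additive genotype weights (0, 1, 2) and a 1-degree-of-freedom chi-square reference distribution. The function names and the example counts are hypothetical and are not taken from the article. Only the significance threshold of 5.0 × 10⁻⁷ comes from the abstract. It is not the authors' analysis pipeline.

```python
import math

# Significance threshold quoted in the abstract.
GENOME_WIDE_ALPHA = 5.0e-7


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Return (chi-square statistic, p-value) for one SNP.

    cases / controls: genotype counts (AA, AB, BB) in each group.
    weights: per-genotype scores; (0, 1, 2) is an assumed additive coding.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases = sum(cases)
    n_controls = sum(controls)
    n_all = n_cases + n_controls

    sum_t_cases = sum(t * r for t, r in zip(weights, cases))
    sum_t_all = sum(t * n for t, n in zip(weights, totals))
    sum_t2_all = sum(t * t * n for t, n in zip(weights, totals))

    # Numerator: weighted difference between observed and expected case counts.
    trend = n_all * sum_t_cases - n_cases * sum_t_all
    denom = n_cases * n_controls * (n_all * sum_t2_all - sum_t_all ** 2)
    if denom == 0:
        # Monomorphic SNP or an empty group: the test is undefined, report no signal.
        return 0.0, 1.0

    chi2 = n_all * trend ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def is_genome_wide_significant(cases, controls, alpha=GENOME_WIDE_ALPHA):
    """True when the trend-test p-value falls below the chosen threshold."""
    _, p_value = cochran_armitage_trend(cases, controls)
    return p_value < alpha
```

Calling `cochran_armitage_trend((100, 300, 600), (600, 300, 100))` on these made-up counts gives a very large statistic, while identical genotype distributions in cases and controls give a statistic of zero. Because the test depends only on the called genotype counts, any batch-dependent shift in genotype calls between cases and controls feeds directly into the statistic.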

Acknowledgements

We acknowledge the contribution of S Paoli and R Flor for assistance in developing and administering the FBK high-performance computing facility, on which most of the analyses were run, and of M Showe for useful comments on an earlier version of the paper. We truly thank F Goodsaid for scientific motivation and coordination of the MAQC Genome Wide Association working group and Dr George Wells from the University of Ottawa Heart Institute for providing access to the data.

Author information

Corresponding author

Correspondence to M Chierici.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

About this article

Cite this article

Chierici, M., Miclaus, K., Vega, S. et al. An interactive effect of batch size and composition contributes to discordant results in GWAS with the CHIAMO genotyping algorithm. Pharmacogenomics J 10, 355–363 (2010). https://doi.org/10.1038/tpj.2010.47

  • DOI: https://doi.org/10.1038/tpj.2010.47
