Abstract
We have systematically compared copy number variant (CNV) detection on eleven microarrays to evaluate data quality and CNV calling, reproducibility, concordance across array platforms and laboratory sites, breakpoint accuracy and analysis tool variability. Different analytic tools applied to the same raw data typically yield CNV calls with <50% concordance. Moreover, reproducibility in replicate experiments is <70% for most platforms. Nevertheless, these findings should not preclude detection of large CNVs for clinical diagnostic purposes because large CNVs with poor reproducibility are found primarily in complex genomic regions and would typically be removed by standard clinical data curation. The striking differences between CNV calls from different platforms and analytic tools highlight the importance of careful assessment of experimental design in discovery and association studies and of strict data curation and filtering in diagnostics. The CNV resource presented here allows independent data evaluation and provides a means to benchmark new algorithms.
Change history
29 May 2011
In the version of this article initially published online, Bhooma Thiruvahindrapuram’s name was misspelled. The error has been corrected for the print, PDF and HTML versions of this article.
Acknowledgements
We thank J. Rickaby and M. Lee for excellent technical assistance. We thank colleagues at Affymetrix, Agilent, Illumina, NimbleGen and Biodiscovery for sharing data and software and for technical assistance. The Toronto Centre for Applied Genomics at the Hospital for Sick Children is acknowledged for database, technical and bioinformatics support. This work was supported by funding from the Genome Canada/Ontario Genomics Institute, the Canadian Institutes of Health Research (CIHR), the McLaughlin Centre, the Canadian Institute for Advanced Research, the Hospital for Sick Children (SickKids) Foundation, a Broad SPARC Project award to P.K.D. and C.L., US National Institutes of Health (NIH) grant HD055150 to P.K.D., and the Department of Pathology at Brigham and Women's Hospital in Boston and NIH grants HG005209, HG004221 and CA111560 to C.L. N.P.C., D. Rajan, D. Rigler, T.F., S.G. and E.P. are supported by the Wellcome Trust (grant no. WT077008). D.P. is supported by fellowships from the Canadian Institutes of Health Research (no. 213997) and the Netherlands Organization for Scientific Research (Rubicon 825.06.031). X.S. is supported by a T32 Harvard Medical School training grant, and K.N. is supported by a T32 institutional training grant (HD007396). S.W.S. holds the GlaxoSmithKline-CIHR Pathfinder Chair in Genetics and Genomics at the University of Toronto and the Hospital for Sick Children (Canada). L.F. is supported by the Göran Gustafsson Foundation and the Swedish Foundation for Strategic Research.
Author information
Contributions
D.P., C.L., N.P.C., M.E.H., S.W.S. and L.F. conceived and designed the study. D.P. and L.F. coordinated sample distribution, experiments and analysis. K.D. managed the experiments conceived at the Harvard Medical School and performed the Nexus analysis. R.S.S., D. Rajan, D. Rigler, T.F., J.H.P., K.N., S.G. and E.P. performed the experiments. Data analyses were performed by D.P., K.D., R.S.S., D. Rajan, T.F., A.C.L., B.T., J.R.M., R.M., A.P., K.N., X.S., P.K.D. and L.F. All authors participated in discussions of different parts of the study. D.P., C.L., S.W.S. and L.F. wrote the manuscript. All authors read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare competing interests. Affymetrix, Agilent, Illumina and NimbleGen provided arrays or reagents for use in this study at a substantial discount. The Centre for Applied Genomics (TCAG) routinely provides fee-for-service experimentation using products from Affymetrix, Agilent and Illumina, and is a Core Lab for Affymetrix and Illumina. S.W.S. belongs to the Scientific Advisory Board of Combimatrix Diagnostics.
Supplementary information
Supplementary Text and Figures
Supplementary Methods, Supplementary Tables 1, 2, 4–6, and Supplementary Figs. 1–15 (PDF 2872 kb)
Supplementary Table 3
List of all CNVs that passed QC. (XLS 15495 kb)
About this article
Cite this article
Pinto, D., Darvishi, K., Shi, X. et al. Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants. Nat Biotechnol 29, 512–520 (2011). https://doi.org/10.1038/nbt.1852
DOI: https://doi.org/10.1038/nbt.1852