Letter | Published:

Systematic assessment of copy number variant detection via genome-wide SNP genotyping

Nature Genetics volume 40, pages 1199–1203 (2008)

Abstract

SNP genotyping has emerged as a technology to incorporate copy number variants (CNVs) into genetic analyses of human traits. However, the extent to which SNP platforms accurately capture CNVs remains unclear. Using independent, sequence-based CNV maps, we find that commonly used SNP platforms have limited or no probe coverage for a large fraction of CNVs. Despite this, in 9 samples we inferred 368 CNVs using Illumina SNP genotyping data and experimentally validated over two-thirds of these. We also developed a method (SNP-Conditional Mixture Modeling, SCIMM) to robustly genotype deletions using as few as two SNP probes. We find that HapMap SNPs are strongly correlated with 82% of common deletions, but the newest SNP platforms effectively tag about 50%. We conclude that currently available genome-wide SNP assays can capture CNVs accurately, but improvements in array designs, particularly in duplicated sequences, are necessary to facilitate more comprehensive analyses of genomic variation.


Acknowledgements

We thank D. Peiffer and colleagues at Illumina for sharing Human 1M and HumanHap 550K genotyping data. We apologize to all colleagues whose work we could not cite because of space constraints. G.M.C. is supported by a Merck, Jane Coffin Childs Memorial Fund Postdoctoral Fellowship. T.Z. acknowledges support from the National Human Genome Research Institute (NHGRI) Interdisciplinary Training in Genomic Sciences grant T32 HG00035. J.M.K. is supported by a National Science Foundation graduate fellowship. This work was supported by the National Heart, Lung, and Blood Institute Programs for Genomic Applications grant HL066682 to D.A.N. and NHGRI grant HG004120 to E.E.E. E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author information

Author notes

    Gregory M Cooper & Troy Zerr

    These authors contributed equally to this work.

Affiliations

  1. Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.

    Gregory M Cooper, Troy Zerr, Jeffrey M Kidd, Evan E Eichler & Deborah A Nickerson

  2. Howard Hughes Medical Institute.

    Evan E Eichler

Authors

  1. Gregory M Cooper

  2. Troy Zerr

  3. Jeffrey M Kidd

  4. Evan E Eichler

  5. Deborah A Nickerson

Corresponding author

Correspondence to Gregory M Cooper.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Methods, Supplementary Tables 1, 3–6, 9, 10 and Supplementary Figures 1–4

Excel files

  1. Supplementary Table 2

  2. Supplementary Table 7

  3. Supplementary Table 8

About this article

DOI

https://doi.org/10.1038/ng.236
