  • Article

Integrated detection and population-genetic analysis of SNPs and copy number variation

Abstract

Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.
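The two quantities the abstract relies on, CNP allele frequency and linkage disequilibrium between a CNP and a SNP, can be illustrated with a minimal sketch. It assumes a diallelic deletion CNP with diploid integer copy numbers of 0, 1 or 2, and it uses the squared correlation of unphased genotype dosages as a stand-in for haplotype-based r². The function names and the toy data are hypothetical and are not taken from the paper or its software.

```python
from statistics import mean


def deletion_allele_frequency(copy_numbers):
    """Deletion-allele frequency for a diallelic deletion CNP.

    Assumption: diploid copy numbers in {0, 1, 2}; each missing copy
    counts as one deletion allele.
    """
    if any(cn not in (0, 1, 2) for cn in copy_numbers):
        raise ValueError("expected diploid copy numbers 0, 1 or 2")
    return sum(2 - cn for cn in copy_numbers) / (2 * len(copy_numbers))


def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors.

    This is a genotype-level approximation to r^2, not a phased
    haplotype estimate.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when either site is monomorphic
    return cov * cov / (vx * vy)


# Toy data (hypothetical): integer copy number per sample, and the
# minor-allele count at a nearby SNP that tags the deletion perfectly.
cnp = [2, 2, 1, 2, 0, 1, 2, 2, 1, 2]
snp = [0, 0, 1, 0, 2, 1, 0, 0, 1, 0]

freq = deletion_allele_frequency(cnp)
r2 = genotype_r2(cnp, snp)
print(f"deletion allele frequency = {freq:.2f}, r^2 = {r2:.2f}")
```

In this toy example the deletion allele would count as a common CNP under the abstract's >5% threshold, and the SNP tags it perfectly. Multiallelic CNPs would need a different allele model than the one assumed here.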

Figure 1: Design of new microarrays.
Figure 2: Discovery and sizes of CNV regions.
Figure 3: Classes of copy number variants and reproducibility and discrete quality of copy-number measurements.
Figure 4: Linkage disequilibrium properties of CNPs.
Figure 5: Dissection of complex CNVs.
Figure 6: Capture of CNPs in genome-wide association studies via direct interrogation and linkage disequilibrium.

Acknowledgements

We thank J. Kidd, G. Cooper and E. Eichler for sharing data on high-resolution breakpoints of select CNVs prior to its publication; E. Lander, J. Hirschhorn and S. Kathiresan for thoughtful readings of the manuscript; the Affymetrix team of the Broad Institute Genetic Analysis Platform: W. Brodeur, N. Chia, M. DaSilva, J. Gibbons, N. Houde, M. McConnell, R. Barry, K. Nguyen, J. Camarata, M. Fava and T. Nyinjee under the supervision of C. Gates, B. Blumenstiel, D. Gage and M. Parkin; members of the Affymetrix informatics team: X. Di, H. Gorrell, G. Liu, M. Mittmann, M. Shen, C. Sugnet, A. Willams and G. Yang; members of the Affymetrix arrays and assays team: T. Berntsen, M. Chadha, J. Law, H. Matsuzaki, B. Nguyen, K. Travers, N. Vissa and S. Walsh. S.A.M. was supported by a Lilly Life Sciences Research Fellowship.

Author information

Contributions

F.G.K. conceived a strategy for empirical probe reduction of SNP probe sets. S.A.M. conceived of hybrid arrays consisting of polymorphic (SNP) and nonpolymorphic (copy number) probes. F.G.K., S.A.M. and D.A. proposed to Affymetrix a specific redesign of the 500K SNP array based on these concepts. The idea was further developed with input from R.R., J.B., S.C., S.L., K.W.J., S.B.G. and M.J.D., and a pilot initiated. For the pilot (which became the SNP 5.0 array), F.G.K. and J.B.M. selected SNP probe sets, and S.A.M. and J.M.K. selected copy number probes. For the development of the SNP 6.0 array, R.M. directed laboratory SNP screening experiments which were analyzed by S.C., E.H. and T.W. P.I.W.d.B., J.B.M. and S.C. selected SNPs from those which passed the screening effort, using a linkage-disequilibrium tagging strategy. S.A.M. and M.H.S. designed and M.H.S. directed laboratory work for the titration experiment that guided empirical selection of copy number probes; on the basis of these results, together with informatic analyses which A.K. performed, S.A.M. and J.M.K. selected copy number probes. Laboratory experiments at Broad Institute were led by M.P. and S.B.G. A.W., J.N., R.H. and E.H. developed supporting software. S.A.M., J.M.K. and J.N. analyzed the data to identify CNVs. S.A.M., J.N., F.G.K. and J.M.K. developed CNP genotyping analysis. P.J.C. conducted and J.V. analyzed experiments to validate CNP genotypes experimentally. S.A.M. analyzed the population-genetic and linkage-disequilibrium properties of CNVs. J.M.K. analyzed the data for evidence of de novo CNVs. A.L.E. analyzed platforms' coverage of CNVs. S.A.M., F.G.K., J.M.K., M.J.D. and D.A. wrote the manuscript. Discussions among all authors informed the array design, the development of algorithms for analysis and the interpretation of results.

Corresponding authors

Correspondence to Steven A McCarroll or David Altshuler.

Ethics declarations

Competing interests

S.C., M.H.S., E.H., T.W., R.M., S.L., J.B., K.J. and R.R. are employees of Affymetrix. The remaining authors (S.A.M., F.G.K., J.M.K., J.N., A.W., P.I.W.d.B., J.M., A.K., A.L.E., M.P., R.H., M.J.B., S.B.G. and D.A.) neither personally nor institutionally receive financial support from Affymetrix, and neither the authors nor their employers receive compensation or royalties from the work described in this article.

Supplementary information

Supplementary Text and Figures

Supplementary Methods, Supplementary Tables 1 and 4, and Supplementary Figures 1 and 2 (PDF 1559 kb)

Supplementary Table 2

Genomic locations of copy-number polymorphisms (XLS 147 kb)

Supplementary Table 3

Sample-level determinations of integer copy number for each CNP (XLS 2419 kb)

About this article

Cite this article

McCarroll, S., Kuruvilla, F., Korn, J. et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 40, 1166–1174 (2008). https://doi.org/10.1038/ng.238

  • DOI: https://doi.org/10.1038/ng.238
