Personalized copy number and segmental duplication maps using next-generation sequencing

Abstract

Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable owing to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads, which allows for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy number differences. We estimate that, on average, 73–87 genes vary in copy number between any two individuals and find that these genic differences overwhelmingly correspond to segmental duplications (odds ratio = 135; P < 2.2 × 10⁻¹⁶). Our method can distinguish between different copies of highly identical genes, providing a more accurate assessment of gene content and insight into functional constraint without the limitations of array-based technology.
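The abstract does not spell out how copy number is derived from mapped reads. As a rough illustration only, the sketch below assumes a simple read-depth model in which a window's copy number scales linearly with its depth relative to control regions of known diploid copy number. The function name `estimate_copy_number`, the `ploidy` parameter and all depth values are hypothetical and are not taken from the paper or from mrFAST.

```python
from statistics import mean


def estimate_copy_number(window_depths, control_depths, ploidy=2):
    """Estimate absolute copy number per window from read depth.

    Assumption (not from the paper): depth is proportional to copy
    number, and control_depths come from regions fixed at `ploidy` copies.
    """
    baseline = mean(control_depths)  # average depth of a known diploid region
    if baseline <= 0:
        raise ValueError("control regions must have positive mean depth")
    # Scale each window's depth so that the baseline maps to `ploidy` copies.
    return [ploidy * depth / baseline for depth in window_depths]


# Hypothetical depths, chosen only to illustrate the arithmetic.
control = [30.0, 29.0, 31.0, 30.0]
windows = [30.0, 45.0, 60.0, 15.0]
estimates = estimate_copy_number(windows, control)
print(estimates)
```

With these made-up inputs, a window at the control depth is estimated at two copies, and windows at 1.5 times, twice and half that depth come out at three, four and one copies respectively. A real analysis would also need to handle reads that map to several locations and sequencing biases, which this sketch ignores.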

Figure 1: Correlation of predicted and known segmental duplications in NA18507.
Figure 2: Computational prediction and array CGH validation of segmental duplication copy number differences for three human genomes.
Figure 3: Validation of individual specific segmental duplications.
Figure 4: Correlation between computational and experimental copy number for NA18507 and JDW.
Figure 5: FISH validation.
Figure 6: Copy number differences between unique and duplicated regions.

Acknowledgements

We thank D. Bentley for early access to the Illumina WGS dataset for NA18507; J. Wang for the YH DNA and the cell line; M. Egholm and B. Simen for the JDW DNA; and J.D. Watson for permission to analyze his genome. We also thank M. Shumway, P. Flicek and R. Leinonen for technical assistance in transferring large sequence datasets; E. Tüzün for help in parallelizing mrFAST for computation clusters through the message passing interface; S. Girirajan for assistance with experiments; and T. Brown for her help in manuscript preparation. J.M.K. is supported by a US National Science Foundation Graduate Research Fellowship. T.M.-B. is supported by a Marie Curie fellowship (FP7). This work was supported, in part, by US National Institutes of Health grant HG004120 to E.E.E. E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author information

Contributions

C.A., J.M.K., T.M.-B. and E.E.E. designed the study, performed analytical work and wrote the manuscript. C.A., F.H. and O.M. designed and implemented the mrFAST algorithm. C.A., J.M.K., G.A. and J.O.K. performed computational analysis. T.M.-B., F.A., C.B. and M.M. performed validation experiments. R.A.G. advised on handling of JDW data analysis. S.C.S. and E.E.E. obtained funding for the study.

Corresponding author

Correspondence to Evan E Eichler.

Supplementary information

Supplementary Text and Figures

Supplementary Note, Supplementary Figures 1–7, and Supplementary Tables 1–3 and 6 (PDF 2725 kb)

Supplementary Table 4

Estimated diploid copy number for 17,601 autosomal coding genes (XLS 3961 kb)

Supplementary Table 5

Individual exons which are estimated to be copy-number variable among the three analyzed individuals (XLS 631 kb)

About this article

Cite this article

Alkan, C., Kidd, J., Marques-Bonet, T. et al. Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 41, 1061–1067 (2009). https://doi.org/10.1038/ng.437
