Abstract

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.

Acknowledgements

We would like to acknowledge C. Hardy, R. Smith, A. De Witte and S. Giles for their assistance with validation. M.A.B.’s group was supported by a grant from the National Institutes of Health (R01 GM59290) and G.T.M.’s group by grants R01 HG004719 and RC2 HG005552, also from the NIH. J.O.K.’s group was supported by an Emmy Noether Fellowship of the German Research Foundation (Deutsche Forschungsgemeinschaft). J.W.’s group was supported by the National Basic Research Program of China (973 program no. 2011CB809200), the National Natural Science Foundation of China (30725008; 30890032; 30811130531; 30221004), the Chinese 863 program (2006AA02Z177; 2006AA02Z334; 2006AA02A302; 2009AA022707), the Shenzhen Municipal Government of China (grants JC200903190767A; JC200903190772A; ZYC200903240076A; CXB200903110066A; ZYC200903240077A; ZYC200903240076A and ZYC200903240080A) and the Ole Rømer grant from the Danish Natural Science Research Council. E.E.E.’s group was supported by grants P01 HG004120 and U01 HG005209 from the National Institutes of Health. C.L.’s group was supported by grants from the National Institutes of Health: P41 HG004221, R01 GM081533 and U01 HG005209, and X.S. was supported by a T32 fellowship award from the NIH. We thank the Genome Structural Variation Consortium (http://www.sanger.ac.uk/humgen/cnv/42mio/) and the International HapMap Consortium for making available microarray data. The authors acknowledge the individuals participating in the 1000 Genomes Project by providing samples, including the Yoruba people of Ibadan, Nigeria, the community at Beijing Normal University, the people of Tokyo, Japan, and the people of the Utah CEPH community. Furthermore, we thank R. Durbin and L. Steinmetz for comments on the manuscript.

Author information

Author notes

    Ryan E. Mills, Klaudia Walter, Chip Stewart, Robert E. Handsaker, Ken Chen, Can Alkan, Alexej Abyzov, Seungtai Chris Yoon & Kai Ye

    These authors contributed equally to this work.

Affiliations

  1. Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA

    Ryan E. Mills, Xinghua Shi & Charles Lee

  2. The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK

    Klaudia Walter, Donald F. Conrad, Aylwyn Scally, Yujun Zhang & Matthew E. Hurles

  3. Department of Biology, Boston College, Boston, Massachusetts, USA

    Chip Stewart, Deniz Kural, Michael P. Stromberg, Jiantao Wu & Gabor T. Marth

  4. Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

    Robert E. Handsaker, Joshua Korn, James Nemesh & Steven A. McCarroll

  5. The Genome Center at Washington University, St. Louis, Missouri, USA

    Ken Chen, Asif Chinwalla & Li Ding

  6. Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA

    Can Alkan, Jeffrey M. Kidd & Evan E. Eichler

  7. Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA

    Can Alkan & Evan E. Eichler

  8. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA

    Alexej Abyzov, Ekta Khurana, Jing Leng, Xinmeng Jasmine Mu, Zhengdong D. Zhang & Mark B. Gerstein

  9. Seaver Autism Center and Department of Psychiatry, Mount Sinai School of Medicine, New York, New York, USA

    Seungtai Chris Yoon

  10. Departments of Molecular Epidemiology, Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands

    Kai Ye

  11. Illumina Cambridge Ltd, Chesterford Research Park, Little Chesterford, Saffron Walden CB10 1XL, UK

    R. Keira Cheetham

  12. Life Technologies, Beverly, Massachusetts, USA

    Yutao Fu & Heather E. Peckham

  13. Department of Genetics, Stanford University, Stanford, California, USA

    Fabian Grubert, Hugo Y. K. Lam, Alexander Eckehart Urban & Michael Snyder

  14. School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada

    Iman Hajirasouliha & Fereydoun Hormozdiari

  15. Department of Psychiatry, Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, University of California, San Diego, La Jolla, California, USA

    Lilia M. Iakoucheva, Shuli Kang & Jonathan Sebat

  16. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK

    Zamin Iqbal

  17. Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA

    Miriam K. Konkel, Jerilyn A. Walker & Mark A. Batzer

  18. Molecular Biophysics and Biochemistry Department, Yale University, New Haven, Connecticut, USA

    Ekta Khurana & Mark B. Gerstein

  19. BGI-Shenzhen, Shenzhen 518083, China

    Ruiqiang Li, Yingrui Li, Ruibang Luo & Jun Wang

  20. Albert Einstein College of Medicine, Bronx, New York, USA

    Chang-Yun Lin & Kenny Ye

  21. Genome Biology Research Unit, European Molecular Biology Laboratory, Heidelberg, Germany

    Tobias Rausch, Adrian M. Stütz & Jan O. Korbel

  22. Department of Genetics, Washington University, St Louis, Missouri, USA

    Li Ding

  23. Department of Statistics, University of Oxford, OX3 7BN, UK

    Gil McVean

  24. Department of Biology, University of Copenhagen, Copenhagen, Denmark

    Jun Wang

  25. Department of Computer Science, Yale University, New Haven, Connecticut, USA

    Mark B. Gerstein

  26. Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA

    Steven A. McCarroll

  27. Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, California, USA

    Alexander Eckehart Urban

Consortia

  1. 1000 Genomes Project

    Lists of participants and affiliations are shown in Supplementary Information.

Contributions

The authors contributed to this study at different levels, as described in the following. SV discovery: K.W., C.S., R.E.H., K.C., C.A., A.A., S.C.Y., R.K.C., A.C., Y.F., I.H., F.H., Z.I., D.K., R.Li, Y.L., C.L., R.Luo, X.J.M., H.E.P., L.D., G.T.M., J.S., Ju.W., Ka.Y., Ke.Y., E.E.E., M.B.G., M.E.H., S.A.M. and J.O.K. SV validation: R.E.M., K.W., K.C., A.A., S.C.Y., F.G., M.K.K., J.K., J.N., A.E.U., X.S., A.M.S., J.A.W., Y.Z., Z.D.Z., M.A.B., J.S., M.S., M.E.H., C.L. and J.O.K. SV genotyping: K.W., R.E.H., J.K., J.N., M.E.H. and S.A.M. Data analysis: R.E.M., C.S., C.A., A.A., R.E.H., K.C., S.C.Y., R.K.C., A.C., D.F.C., Y.F., F.H., L.M.I., Z.I., J.M.K., M.K.K., S.K., J.K., E.K., D.K., H.Y.K.L., J.L., R.Li, Y.L., C.L., R.Luo, X.J.M., J.N., H.E.P., T.R., A.S., X.S., M.P.S., J.A.W., Ji.W., Y.Z., Z.D.Z., M.A.B., L.D., G.T.M., G.M., J.S., M.S., Ju.W., Ka.Y., Ke.Y., E.E.E., M.B.G., M.E.H., C.L., S.A.M. and J.O.K. Preparation of manuscript display items: R.E.M., K.W., C.S., C.A., A.A., R.E.H., S.C.Y., L.M.I., S.K., E.K., M.K.K., X.J.M., X.S., J.A.W., M.B.G., S.A.M. and J.O.K. Co-chairs of the Structural Variation Analysis group: E.E.E., M.E.H. and C.L. The following equally contributed to directing the described analyses and participating in the design of the study and should be considered joint senior authors: E.E.E., M.B.G., M.E.H., C.L., S.A.M. and J.O.K. The manuscript was written by the following authors: R.E.M. and J.O.K.

Competing interests

H.E.P. and Y.F. are employees of Life Technologies, the manufacturers of the SOLiD sequencing platform. R.K.C. is an employee of Illumina Cambridge Ltd., the manufacturer of the Illumina sequencing platform.

Corresponding author

Correspondence to Jan O. Korbel.

Data sets described here can be obtained from the 1000 Genomes Project website at http://www.1000genomes.org (July 2010 Data Release). Individual SV discovery methods can be obtained from sources mentioned in Supplementary Table 2, or upon request from the authors.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Notes, Supplementary Figures 1-15 with legends, Supplementary Tables 2, 6-8, 12-17, 19 and legends for Supplementary Tables 1-20 (see separate files for Supplementary Tables 1, 3-5, 9-11, 18 and 20) and Supplementary References.

  2. Supplementary Methods

    This file contains Supplementary Methods and References.

Excel files

  1. Supplementary Table 1

    This file contains the sequencing statistics for SV discovery.

  2. Supplementary Table 5

    This file contains the gold standard SV sets for NA12878 and NA12156 from four external and orthogonal data sets.

  3. Supplementary Table 9

    This file contains the functional analysis of deletions that overlap transcripts.

  4. Supplementary Table 10

    This file contains the Gene Ontology (GO) enrichment analysis for deletions overlapping protein coding regions.

  5. Supplementary Table 11

    This file contains the formation mechanisms and ancestral states of SVs inferred with the BreakSeq pipeline.

  6. Supplementary Table 18

    This file contains a summary of assembled breakpoints for the deletion release set.

  7. Supplementary Table 20

    This file contains the overlap of partial or whole genotyped, coding region deletions with OMIM Morbid Map.

Zip files

  1. Supplementary Table 3

    This file contains a complete list of low coverage calls by institution and set.

  2. Supplementary Table 4

    This file contains a complete list of trio calls by institution and set.
