Abstract

Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale—particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects.



Acknowledgements

We thank the staff from the University of Washington Genome Center and the Washington University Genome Sequencing Center for technical assistance. J.M.K. is supported by a National Science Foundation Graduate Research Fellowship. G.M.C. is supported by a Merck, Jane Coffin Childs Memorial Fund Postdoctoral Fellowship. This work was supported by National Institutes of Health grants HG004120 to E.E.E., D.A.N. and M.V.O., and 3 U54 HG002043 to M.V.O. E.E.E. is an Investigator of the Howard Hughes Medical Institute.

Author Contributions J.M.K., G.M.C., M.V.O., D.A.N. and E.E.E. contributed to the writing of this paper. The study was coordinated by L.B., M.V.O., R.K., D.R.S., J.M.K. and E.E.E. A.B., D.R.S., D.Sa., E.G., H.M.E., K.M., N.T., R.D., W.F.D. and W.T. performed library construction and end sequencing. E.H., H.S.H., K.A.P., M.V.O., R.K., R.K.W., T.G. and W.G. performed clone insert validation and sequencing. C.A., D.A.N., E.T., J.D.S., J.S., L.C., M.D., M.M., M.W., T.L.N. and Z.C. provided technical and analytical support. D.A.P., D.A.A., J.M.Ko. and S.A.M. contributed variation data. G.M.C., J.M.K., L.B., N.A.Y., N.S. and P.T. designed and analysed array CGH experiments. G.M.C. and T.Z. performed the genotype analysis. F.A. performed FISH experiments. B.T. and D.S. performed optical mapping experiments. E.E.E., J.M.K. and L.C. analysed sequenced clones. J.C.M. and N.H. identified SNPs and indels.

Author information

Affiliations

  1. Department of Genome Sciences and Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA

    • Jeffrey M. Kidd
    • Gregory M. Cooper
    • Can Alkan
    • Francesca Antonacci
    • Troy Zerr
    • Tera L. Newman
    • Eray Tüzün
    • Ze Cheng
    • Molly Weaver
    • Lin Chen
    • Maika Malig
    • Joshua D. Smith
    • Michael Dorschner
    • John Stamatoyannopoulos
    • Deborah A. Nickerson
    • Evan E. Eichler
  2. Agencourt Bioscience Corporation, Beverly, Massachusetts 01915, USA

    • William F. Donahue
    • Heather M. Ebling
    • Nadeem Tusneem
    • Robert David
    • David Saranga
    • Adrianne Brand
    • Wei Tao
    • Erik Gustafson
    • Kevin McKernan
    • Douglas R. Smith
  3. Division of Medical Genetics, Department of Medicine, and University of Washington Genome Center, University of Washington, Seattle, Washington 98195, USA

    • Hillary S. Hayden
    • Eric Haugen
    • Will Gillett
    • Karen A. Phelps
    • Maynard V. Olson
    • Rajinder Kaul
  4. Agilent Technologies, Santa Clara, California 95051, USA

    • Nick Sampas
    • N. Alice Yamada
    • Peter Tsang
    • Laurakay Bruhn
  5. Washington University Genome Sequencing Center, School of Medicine, St Louis, Missouri 63108, USA

    • Tina Graves
    • Richard K. Wilson
  6. Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA

    • Nancy Hansen
    • James C. Mullikin
  7. Laboratory of Genetics, University of Wisconsin, Madison, Wisconsin 53706, USA

    • Brian Teague
    • David Schwartz
  8. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02114, USA

    • Joshua M. Korn
    • Steven A. McCarroll
    • David A. Altshuler
  9. Illumina, Inc., 9885 Towne Centre Drive, San Diego, California 92121, USA

    • Daniel A. Peiffer

Authors

  1. Jeffrey M. Kidd
  2. Gregory M. Cooper
  3. William F. Donahue
  4. Hillary S. Hayden
  5. Nick Sampas
  6. Tina Graves
  7. Nancy Hansen
  8. Brian Teague
  9. Can Alkan
  10. Francesca Antonacci
  11. Eric Haugen
  12. Troy Zerr
  13. N. Alice Yamada
  14. Peter Tsang
  15. Tera L. Newman
  16. Eray Tüzün
  17. Ze Cheng
  18. Heather M. Ebling
  19. Nadeem Tusneem
  20. Robert David
  21. Will Gillett
  22. Karen A. Phelps
  23. Molly Weaver
  24. David Saranga
  25. Adrianne Brand
  26. Wei Tao
  27. Erik Gustafson
  28. Kevin McKernan
  29. Lin Chen
  30. Maika Malig
  31. Joshua D. Smith
  32. Joshua M. Korn
  33. Steven A. McCarroll
  34. David A. Altshuler
  35. Daniel A. Peiffer
  36. Michael Dorschner
  37. John Stamatoyannopoulos
  38. David Schwartz
  39. Deborah A. Nickerson
  40. James C. Mullikin
  41. Richard K. Wilson
  42. Laurakay Bruhn
  43. Maynard V. Olson
  44. Rajinder Kaul
  45. Douglas R. Smith
  46. Evan E. Eichler

Competing interests

Daniel A. Peiffer is currently an employee of Illumina, Inc.; Kevin McKernan and Robert David are currently employed by Applied Biosystems, a manufacturer of DNA-sequencing reagents and instruments; and Laurakay Bruhn, Nick Sampas, Peter Tsang and N. Alice Yamada are employees of Agilent Technologies, Inc.

Corresponding author

Correspondence to Evan E. Eichler.

Supplementary information

PDF files

  1. Supplementary Information

    The file contains extensive Supplementary Information with Supplementary Figures S1-S2, S4-S9. Supplementary Figures S3 and S10 are included in separate files.

  2. Supplementary Figure S2

    The file contains Supplementary Figure S2 with end-sequence mapping of fosmids against the human genome. All discordant fosmids mapping to the human genome are displayed individually for each library using the following color scheme: ABC7=green, ABC8=forestgreen, ABC10=blue, ABC13=cyan, G248=black, ABC9=purple, ABC11=red, ABC12=orange, and ABC14=hotpink. The end-sequence placements are mapped in the context of gaps within the assembly (purple) and segmental duplications (grey bars).

  3. Supplementary Figure S3

    The file contains Supplementary Figure S3 with end-sequence mapping of fosmids against the human genome. All discordant fosmids mapping to the human genome are displayed individually for each library using the following color scheme: ABC7=green, ABC8=forestgreen, ABC10=blue, ABC13=cyan, G248=black, ABC9=purple, ABC11=red, ABC12=orange, and ABC14=hotpink. The end-sequence placements are mapped in the context of gaps within the assembly (purple) and segmental duplications (grey bars).

  4. Supplementary Figure S4

    The file contains Supplementary Figure S4 with end-sequence mapping of fosmids against the human genome. All discordant fosmids mapping to the human genome are displayed individually for each library using the following color scheme: ABC7=green, ABC8=forestgreen, ABC10=blue, ABC13=cyan, G248=black, ABC9=purple, ABC11=red, ABC12=orange, and ABC14=hotpink. The end-sequence placements are mapped in the context of gaps within the assembly (purple) and segmental duplications (grey bars).

  5. Supplementary Figure S10

    The file contains Supplementary Figure S10, showing sequenced structural variation and gene structure. A graphical representation (miropeats view) is provided for the sequenced sites of structural variation (n=266). Each alignment compares the human reference genome (top) with the sequenced structure of the fosmid clone.

Excel files

  1. Supplementary Table S1

    The file contains Supplementary Table S1 showing concordant vs. discordant clone placement summary statistics.

  2. Supplementary Table S2

    The file contains Supplementary Table S2 showing one-end anchored (OEA) clone statistics.

  3. Supplementary Table S3

    The file contains Supplementary Table S3 with all ESP-predicted sites of insertions and deletions with associated experimental validation (see Supplementary Material Section 12 for a description of column headers).

  4. Supplementary Table S4

    The file contains Supplementary Table S4 with ESP-predicted insertion and deletion loci (non-redundant) across the fosmid libraries (see Supplementary Material Section 12 for a description of column headers).

  5. Supplementary Table S5

    The file contains Supplementary Table S5 with genotyping results for a subset of ESP deletion variants based on analysis of genotypes from the Illumina Human1M BeadChip.

  6. Supplementary Table S6

    The file contains Supplementary Table S6 with ESP-predicted inversion breakpoints.

  7. Supplementary Table S7

    The file contains Supplementary Table S7 with merged inversion loci (non-redundant).

  8. Supplementary Table S8

    The file contains Supplementary Table S8 with large insertions of novel sequence confirmed by optical mapping.

  9. Supplementary Table S9

    The file contains Supplementary Table S9 with GenBank accession IDs of sequenced clones.

  10. Supplementary Table S10

    The file contains Supplementary Table S10 with sequenced structural variants that affect exons of genes.

  11. Supplementary Table S11

    The file contains Supplementary Table S11 with summary statistics of fosmid end sequences.

  12. Supplementary Table S12

    The file contains Supplementary Table S12 with genotypes based on custom GoldenGate Assay and qPCR.

About this article

DOI

https://doi.org/10.1038/nature06862
