Resource

Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library

Nature Biotechnology volume 28, pages 47–55 (2010)

Abstract

Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.
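The read-scanning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' BreakSeq implementation: the function names (`deletion_junction`, `spans_junction`, `count_support`), the toy sequences and the `min_overhang` parameter are all hypothetical, and exact substring matching stands in for a real short-read aligner. The sketch builds a junction sequence for a known deletion by joining its two flanks, then counts reads that match the junction and cover a minimum number of bases on both sides of the breakpoint.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical library entry: (junction sequence, length of the left flank).
Junction = Tuple[str, int]


def deletion_junction(reference: str, start: int, end: int, flank: int) -> Junction:
    """Join the flanks of a deletion of reference[start:end] (0-based, end-exclusive)."""
    left = reference[max(0, start - flank):start]
    right = reference[end:end + flank]
    return left + right, len(left)


def spans_junction(read: str, junction: str, left_len: int, min_overhang: int) -> bool:
    """True if the read matches the junction exactly and covers at least
    min_overhang bases on each side of the breakpoint."""
    pos = junction.find(read)
    while pos != -1:
        if pos + min_overhang <= left_len <= pos + len(read) - min_overhang:
            return True
        pos = junction.find(read, pos + 1)
    return False


def count_support(reads: Iterable[str], library: Dict[str, Junction],
                  min_overhang: int = 3) -> Dict[str, int]:
    """Count, per library entry, the reads that span its breakpoint junction."""
    counts = {sv_id: 0 for sv_id in library}
    for read in reads:
        for sv_id, (junction, left_len) in library.items():
            if spans_junction(read, junction, left_len, min_overhang):
                counts[sv_id] += 1
    return counts


# Toy example (made-up sequences): an 8-bp deletion flanked by 8 bp on each side.
reference = "GATTACAG" + "TTTTCCCC" + "CTGAGCAT"
library = {"del1": deletion_junction(reference, start=8, end=16, flank=8)}
reads = [
    "TACAGCTGAG",  # crosses the deletion junction
    "ACAGTTTTCC",  # matches the undeleted reference allele only
    "GATTACAGC",   # reaches the junction but with a 1-bp overhang
]
print(count_support(reads, library))
```

In this toy setup only the first read counts as support for the deletion. A practical pipeline would also need to tolerate sequencing errors, handle both strands and discard reads that align equally well elsewhere in the genome, none of which this sketch attempts.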

Acknowledgements

We acknowledge support from the National Institutes of Health, the A.L. Williams Professorship funds and the European Molecular Biology Laboratory. We thank R. Alexander and E. Khurana for proofreading the manuscript, and A. Abyzov, Z. Zhang, T. Rausch and J. Du for helpful discussions. Finally, we thank the 1000 Genomes Project for early data access.

Author information

Author notes

    • Michael Snyder

    Present address: Department of Genetics, Stanford University School of Medicine, Stanford, California, USA.

    • Hugo Y K Lam, Xinmeng Jasmine Mu & Jan O Korbel

    These authors contributed equally to this work.

Affiliations

  1. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA.

    • Hugo Y K Lam, Xinmeng Jasmine Mu & Mark B Gerstein
  2. Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, USA.

    • Xinmeng Jasmine Mu & Michael Snyder
  3. Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.

    • Adrian M Stütz & Jan O Korbel
  4. Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria.

    • Andrea Tanzer
  5. Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA.

    • Philip D Cayting & Mark B Gerstein
  6. Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada.

    • Philip M Kim
  7. Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada.

    • Philip M Kim
  8. Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.

    • Philip M Kim
  9. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.

    • Philip M Kim
  10. European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    • Jan O Korbel
  11. Department of Computer Science, Yale University, New Haven, Connecticut, USA.

    • Mark B Gerstein

Authors

  1. Hugo Y K Lam

  2. Xinmeng Jasmine Mu

  3. Adrian M Stütz

  4. Andrea Tanzer

  5. Philip D Cayting

  6. Michael Snyder

  7. Philip M Kim

  8. Jan O Korbel

  9. Mark B Gerstein

Contributions

H.Y.K.L., X.J.M. and J.O.K. contributed equally to this work; M.B.G. and J.O.K. co-directed this work; M.B.G., J.O.K., H.Y.K.L. and X.J.M. designed the research; H.Y.K.L., X.J.M., A.M.S., A.T., P.D.C., M.S., P.M.K. and J.O.K. performed or provided direction for the analyses and/or experiments; M.B.G., J.O.K., H.Y.K.L., X.J.M., P.M.K. and A.T. wrote the manuscript.

Corresponding authors

Correspondence to Jan O Korbel or Mark B Gerstein.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figs. 1–6

Excel files

  1. Supplementary Tables 1–6

About this article

DOI

https://doi.org/10.1038/nbt.1600
