Abstract
Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.
This is a preview of subscription content, access via your institution
Access options
Subscribe to this journal
Receive 12 print issues and online access
$209.00 per year
only $17.42 per issue
Buy this article
- Purchase on Springer Link
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
References
Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science 305, 525–528 (2004).
Iafrate, A.J. et al. Detection of large-scale variation in the human genome. Nat. Genet. 36, 949–951 (2004).
Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat. Genet. 37, 727–732 (2005).
Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).
Korbel, J.O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318, 420–426 (2007).
Kidd, J.M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008).
Turner, D.J. et al. Germline rates of de novo meiotic deletions and duplications causing several genomic disorders. Nat. Genet. 40, 90–95 (2008).
van Ommen, G.J. Frequency of new copy number variation in humans. Nat. Genet. 37, 333–334 (2005).
Korbel, J.O. et al. The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies. Proc. Natl. Acad. Sci. USA 106, 12031–12036 (2009).
Sharp, A.J. et al. Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat. Genet. 38, 1038–1042 (2006).
McCarroll, S.A. et al. Deletion polymorphism upstream of IRGM associated with altered IRGM expression and Crohn's disease. Nat. Genet. 40, 1107–1112 (2008).
de Cid, R. et al. Deletion of the late cornified envelope LCE3B and LCE3C genes as a susceptibility factor for psoriasis. Nat. Genet. 41, 211–215 (2009).
Gonzalez, E. et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science 307, 1434–1440 (2005).
Aitman, T.J. et al. Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature 439, 851–855 (2006).
Hastings, P.J., Lupski, J.R., Rosenberg, S.M. & Ira, G. Mechanisms of change in gene copy number. Nat. Rev. Genet. 10, 551–564 (2009).
Kim, P.M. et al. Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history. Genome Res. 18, 1865–1874 (2008).
Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Lee, J.A., Carvalho, C.M. & Lupski, J.R.A. DNA replication mechanism for generating nonrecurrent rearrangements associated with genomic disorders. Cell 131, 1235–1247 (2007).
Feuk, L., Carson, A.R. & Scherer, S.W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
Korbel, J.O. et al. PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol. 10, R23 (2009).
Wheeler, D.A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–876 (2008).
Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
Wang, J. et al. The diploid genome sequence of an Asian individual. Nature 456, 60–65 (2008).
Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).
Perry, G.H. et al. The fine-scale and complex architecture of human copy-number variation. Am. J. Hum. Genet. 82, 685–695 (2008).
Mills, R.E., Bennett, E.A., Iskow, R.C. & Devine, S.E. Which transposable elements are active in the human genome? Trends Genet. 23, 183–191 (2007).
Xing, J. et al. Mobile elements create structural variation: analysis of a complete human genome. Genome Res. 19, 1516–1526 (2009).
Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321–324 (2005).
Sharp, A.J. et al. Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet. 77, 78–88 (2005).
Meunier, J. & Duret, L. Recombination drives the evolution of GC-content in the human genome. Mol. Biol. Evol. 21, 984–990 (2004).
Breslauer, K.J., Frank, R., Blocker, H. & Marky, L.A. Predicting DNA duplex stability from the base sequence. Proc. Natl. Acad. Sci. USA 83, 3746–3750 (1986).
Sarai, A., Mazur, J., Nussinov, R. & Jernigan, R.L. Sequence dependence of DNA conformational flexibility. Biochemistry 28, 7842–7849 (1989).
Bailey, J.A. & Eichler, E.E. Primate segmental duplications: crucibles of evolution, diversity and disease. Nat. Rev. Genet. 7, 552–564 (2006).
Bailey, T.L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202 (2009).
Myers, S., Freeman, C., Auton, A., Donnelly, P. & McVean, G. A common sequence motif associated with recombination hot spots and genome instability in humans. Nat. Genet. 40, 1124–1129 (2008).
Linardopoulou, E.V. et al. Human subtelomeres are hot spots of interchromosomal recombination and segmental duplication. Nature 437, 94–100 (2005).
Lee, S., Cheran, E. & Brudno, M. A robust framework for detecting structural variations in a genome. Bioinformatics 24, i59–i67 (2008).
Campbell, P.J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008).
Chiang, D.Y. et al. High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat. Methods 6, 99–103 (2009).
Wang, L.Y., Abyzov, A., Korbel, J.O., Snyder, M. & Gerstein, M. MSB: a mean-shift-based approach for the analysis of structural variation in the genome. Genome Res. 19, 106–117 (2009).
Ye, K., Schulz, M.H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
Paten, B. et al. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res. 18, 1829–1843 (2008).
Spencer, C.C. et al. The influence of recombination on human genetic diversity. PLoS Genet. 2, e148 (2006).
Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).
Mills, R.E. et al. An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 16, 1182–1190 (2006).
Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W. & Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc. Natl. Acad. Sci. USA 100, 11484–11489 (2003).
Schwartz, S. et al. Human-mouse alignments with BLASTZ. Genome Res. 13, 103–107 (2003).
Zhang, Z. et al. PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics 22, 1437–1439 (2006).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Acknowledgements
We acknowledge support from the National Institutes of Health, the A.L. Williams Professorship funds and the European Molecular Biology Laboratory. We thank R. Alexander and E. Khurana for proofreading the manuscript, and A. Abyzov, Z. Zhang, T. Rausch and J. Du for helpful discussions. Finally, we thank the 1000 Genomes Project for early data access.
Author information
Authors and Affiliations
Contributions
H.Y.K.L., X.J.M. and J.O.K. contributed equally to this work; M.B.G. and J.O.K. co-directed this work; M.B.G., J.O.K., H.Y.K.L. and X.J.M. designed the research; H.Y.K.L., X.J.M., A.M.S., A.T., P.D.C., M.S., P.M.K. and J.O.K. performed or provided direction for the analyses and/or experiments; M.B.G., J.O.K., H.Y.K.L., X.J.M., P.M.K. and A.T. wrote the manuscript.
Corresponding authors
Supplementary information
Supplementary Text and Figures
Supplementary Figs. 1–6 (PDF 2936 kb)
Rights and permissions
About this article
Cite this article
Lam, H., Mu, X., Stütz, A. et al. Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nat Biotechnol 28, 47–55 (2010). https://doi.org/10.1038/nbt.1600
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/nbt.1600
This article is cited by
-
Prevalence and mechanisms of somatic deletions in single human neurons during normal aging and in DNA repair disorders
Nature Communications (2022)
-
No evidence of increased mutations in the germline of a group of British nuclear test veterans
Scientific Reports (2022)
-
Madagascar ground gecko genome analysis characterizes asymmetric fates of duplicated genes
BMC Biology (2018)
-
Copy number variation and disease resistance in plants
Theoretical and Applied Genetics (2017)
-
InDel marker detection by integration of multiple softwares using machine learning techniques
BMC Bioinformatics (2016)