Structural variation in the sequencing era

Abstract

Identifying structural variation (SV) is essential for genome interpretation but has historically been difficult because of limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with the unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and propose that, moving forward, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.

Fig. 1: Overview of ensemble algorithms.
Fig. 2: Structural variation signatures in single-molecule and connected-molecule strategies.
Fig. 3: Resolving the molecular context behind structural variants by integrating multimodal information.


    CAS  PubMed  PubMed Central  Google Scholar 

  150. 150.

    Sharim, H. et al. Long-read single-molecule maps of the functional methylome. Genome Res. 29, 646–656 (2019).

    CAS  PubMed  PubMed Central  Google Scholar 

  151. 151.

    Lee, I. et al. Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. bioRxiv https://doi.org/10.1101/504993 (2019).

  152. 152.

    Beck, C. R. et al. Megabase length hypermutation accompanies human structural variation at 17p11.2. Cell 176, 1310–1324.e10 (2019).

    CAS  PubMed  Google Scholar 

  153. 153.

    Viswanathan, S. R. et al. Structural alterations driving castration-resistant prostate cancer revealed by linked-read genome sequencing. Cell 174, 433–447.e19 (2018). This study leverages layered biological information to understand the role of SVs in oncogene amplification for a specific cancer type.

    CAS  PubMed  PubMed Central  Google Scholar 

  154. 154.

    Huynh, L. & Hormozdiari, F. TAD fusion score: discovery and ranking the contribution of deletions to genome structure. Genome Biol. 20, 60 (2019).

  155. 155.

    Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).

    CAS  PubMed  Google Scholar 

  156. 156.

    Sebat, J. Large-scale copy number polymorphism in the human genome. Science 305, 525–528 (2004).

    CAS  PubMed  Google Scholar 

  157. 157.

    Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).

    CAS  PubMed  PubMed Central  Google Scholar 

  158. 158.

    McCarroll, S. A. et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat. Genet. 40, 1166–1174 (2008).

    CAS  PubMed  Google Scholar 

  159. 159.

    Kidd, J. M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  160. 160.

    Zhou, B. et al. Whole-genome sequencing analysis of CNV using low-coverage and paired-end strategies is efficient and outperforms array-based CNV analysis. J. Med. Genet. 55, 735–743 (2018).

    CAS  PubMed  Google Scholar 

  161. 161.

    Speicher, M. R. & Carter, N. P. The new cytogenetics: blurring the boundaries with molecular biology. Nat. Rev. Genet. 6, 782–792 (2005).

    CAS  PubMed  Google Scholar 

  162. 162.

    Lee, C., Iafrate, A. J. & Brothman, A. R. Copy number variations and clinical cytogenetic diagnosis of constitutional disorders. Nat. Genet. 39, S48–S54 (2007).

    CAS  PubMed  Google Scholar 

  163. 163.

    Scherer, S. W. et al. Challenges and standards in integrating surveys of structural variation. Nat. Genet. 39, S7–S15 (2007).

    CAS  PubMed  PubMed Central  Google Scholar 

  164. 164.

    Tattini, L., D’Aurizio, R. & Magi, A. Detection of genomic structural variants from next-generation sequencing data. Front. Bioeng. Biotechnol. 3, 92 (2015).

    PubMed  PubMed Central  Google Scholar 

  165. 165.

    Guan, P. & Sung, W.-K. Structural variation detection using next-generation sequencing data. Methods 102, 36–49 (2016).

    CAS  PubMed  Google Scholar 

  166. 166.

    Quinlan, A. R. & Hall, I. M. Characterizing complex structural variation in germline and somatic genomes. Trends Genet. 28, 43–53 (2012).

    CAS  PubMed  Google Scholar 

  167. 167.

    Tan, R. et al. An evaluation of copy number variation detection tools from whole-exome sequencing data. Hum. Mutat. 35, 899–907 (2014).

    CAS  PubMed  Google Scholar 

  168. 168.

    Hehir-Kwa, J. Y., Tops, B. B. J. & Kemmeren, P. The clinical implementation of copy number detection in the age of next-generation sequencing. Expert. Rev. Mol. Diagn. 18, 907–915 (2018).

    CAS  PubMed  Google Scholar 

  169. 169.

    Hehir-Kwa, J. Y., Pfundt, R. & Veltman, J. A. Exome sequencing and whole genome sequencing for the detection of copy number variation. Expert. Rev. Mol. Diagn. 15, 1023–1032 (2015).

    CAS  PubMed  Google Scholar 

  170. 170.

    Pang, A. W. et al. Towards a comprehensive structural variation map of an individual human genome. Genome Biol. 11, R52 (2010).

    PubMed  PubMed Central  Google Scholar 

  171. 171.

    Park, H. et al. Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing. Nat. Genet. 42, 400–405 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  172. 172.

    Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  173. 173.

    Anderson-Trocmé, L. et al. Legacy data confounds genomics studies. Mol. Biol. Evol. https://doi.org/10.1093/molbev/msz201 (2019).

    Google Scholar 

  174. 174.

    Lappalainen, I. et al. dbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 41, D936–D941 (2012).

    PubMed  PubMed Central  Google Scholar 

  175. 175.

    Demaerel, W. et al. The 22q11 low copy repeats are characterized by unprecedented size and structure variability. Genome Res. 29, 1389–1401 (2019).

    CAS  PubMed  Google Scholar 

  176. 176.

    Carvalho, C. M. B. & Lupski, J. R. Mechanisms underlying structural variant formation in genomic disorders. Nat. Rev. Genet. 17, 224–238 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  177. 177.

    Vollger, M. R. et al. Long-read sequence and assembly of segmental duplications. Nat. Methods 16, 88–94 (2019).

    CAS  PubMed  Google Scholar 

  178. 178.

    Jiang, T., Liu, B., Li, J. & Wang, Y. rMETL: sensitive mobile element insertion detection with long read realignment. Bioinformatics https://doi.org/10.1093/bioinformatics/btz106 (2019).

    PubMed  Google Scholar 

  179. 179.

    Meng, G. et al. TSD: a computational tool to study the complex structural variants using PacBio targeted sequencing data. G3 9, 1371–1376 (2019).

    CAS  PubMed  Google Scholar 

  180. 180.

    Frith, M. C. & Khan, S. A survey of localized sequence rearrangements in human DNA. Nucleic Acids Res. 46, 1661–1673 (2018).

    CAS  PubMed  Google Scholar 

  181. 181.

    Greer, S. U. & Ji, H. P. Structural variant analysis for linked-read sequencing data with gemtools. Bioinformatics https://doi.org/10.1093/bioinformatics/btz239 (2019).

    Article  PubMed  Google Scholar 

  182. 182.

    Bakhtiari, M., Shleizer-Burko, S., Gymrek, M., Bansal, V. & Bafna, V. Targeted genotyping of variable number tandem repeats with adVNTR. Genome Res. 28, 1709–1719 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  183. 183.

    Ummat, A. & Bashir, A. Resolving complex tandem repeats with long reads. Bioinformatics 30, 3491–3498 (2014).

    CAS  PubMed  Google Scholar 

  184. 184.

    Liu, Q., Zhang, P., Wang, D., Gu, W. & Wang, K. Interrogating the “unsequenceable” genomic trinucleotide repeat disorders by long-read sequencing. Genome Med. 9, 65 (2017).

    PubMed  PubMed Central  Google Scholar 

  185. 185.

    Shao, H. et al. npInv: accurate detection and genotyping of inversions using long read sub-alignment. BMC Bioinform. 19, 261 (2018).

    Google Scholar 

  186. 186.

    Mitsuhashi, S. Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads. Genome Biol. 20, 58 (2019).

    PubMed  PubMed Central  Google Scholar 

  187. 187.

    Mitelman, F., Johansson, B. & Mertens, F. The impact of translocations and gene fusions on cancer causation. Nat. Rev. Cancer 7, 233–245 (2007).

    CAS  PubMed  Google Scholar 

  188. 188.

    Zhang, Q. et al. Clinical application of single-molecule optical mapping to a multigeneration FSHD1 pedigree. Mol. Genet. Genom. Med. 7, e565 (2019).

    Google Scholar 

  189. 189.

    Norris, A. L., Workman, R. E., Fan, Y., Eshleman, J. R. & Timp, W. Nanopore sequencing detects structural variants in cancer. Cancer Biol. Ther. 17, 246–253 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  190. 190.

    Euskirchen, P. et al. Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing. Acta Neuropathol. 134, 691–703 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  191. 191.

    Jacobson, E. C. et al. Hi-C detects novel structural variants in HL-60 and HL-60/S4 cell lines. Genomics https://doi.org/10.1016/j.ygeno.2019.05.009 (2019).

    Article  PubMed  Google Scholar 

  192. 192.

    Greer, S. U. et al. Linked read sequencing resolves complex genomic rearrangements in gastric cancer metastases. Genome Med. 9, 57 (2017).

    PubMed  PubMed Central  Google Scholar 

  193. 193.

    Sebat, J. et al. Strong association of de novo copy number mutations with autism. Science 316, 445–449 (2007).

    CAS  PubMed  PubMed Central  Google Scholar 

  194. 194.

    Marshall, C. R. et al. Structural variation of chromosomes in autism spectrum disorder. Am. J. Hum. Genet. 82, 477–488 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  195. 195.

    Sullivan, P. F. & Geschwind, D. H. Defining the genetic, genomic, cellular, and diagnostic architectures of psychiatric disorders. Cell 177, 162–183 (2019).

    CAS  PubMed  Google Scholar 

  196. 196.

    Yuen, R. K. et al. Genome-wide characteristics of de novo mutations in autism. Npj Genomic Med. 1, 160271–1602710 (2016).

    Google Scholar 

  197. 197.

    Brand, H. et al. Paired-duplication signatures mark cryptic inversions and other complex structural variation. Am. J. Hum. Genet. 97, 170–176 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  198. 198.

    Turner, T. N. et al. Genome sequencing of autism-affected families reveals disruption of putative noncoding regulatory DNA. Am. J. Hum. Genet. 98, 58–74 (2016).

    CAS  PubMed  Google Scholar 

  199. 199.

    Brandler, W. M. et al. Paternally inherited cis-regulatory structural variants are associated with autism. Science 360, 327–331 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  200. 200.

    Turner, T. N. et al. Genomic patterns of de novo mutation in simplex autism. Cell 171, 710–722.e12 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  201. 201.

    Mizuguchi, T. et al. Detecting a long insertion variant in SAMD12 by SMRT sequencing: implications of long-read whole-genome sequencing for repeat expansion diseases. J. Hum. Genet. 64, 191–197 (2019).

    CAS  PubMed  Google Scholar 

  202. 202.

    Mizuguchi, T. et al. A 12-kb structural variation in progressive myoclonic epilepsy was newly identified by long-read whole-genome sequencing. J. Hum. Genet. 64, 359–368 (2019).

    PubMed  Google Scholar 

  203. 203.

    Barseghyan, H. et al. Next-generation mapping: a novel approach for detection of pathogenic structural variants with a potential utility in clinical diagnosis. Genome Med. 9, 90 (2017).

  204. 204.

    Collins, R. L. et al. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome. Genome Biol. 18, 36 (2017).

  205. 205.

    Eisfeldt, J. et al. Comprehensive structural variation genome map of individuals carrying complex chromosomal rearrangements. PLOS Genet. 15, e1007858 (2019).

    CAS  PubMed  PubMed Central  Google Scholar 

  206. 206.

    Dutta, U. R. et al. Breakpoint mapping of a novel de novo translocation t(X;20)(q11.1;p13) by positional cloning and long read sequencing. Genomics 111, 1108–1114 (2019).

    CAS  PubMed  Google Scholar 

  207. 207.

    Sherman, R. M. et al. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat. Genet. 51, 30–35 (2019).

    CAS  PubMed  Google Scholar 

  208. 208.

    Zhou, B. et al. Extensive and deep sequencing of the Venter/HuRef genome for developing and benchmarking genome analysis tools. Sci. Data 5, 180261 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  209. 209.

    Levy, S. et al. The diploid genome sequence of an individual human. PLOS Biol. 5, e254 (2007).

    PubMed  PubMed Central  Google Scholar 

  210. 210.

    Miga, K. H. et al. Telomere-to-telomere assembly of a complete human X chromosome. bioRxiv https://doi.org/10.1101/735928 (2019).

  211. 211.

    Wang, Y.-C. et al. High-coverage, long-read sequencing of Han Chinese trio reference samples. Sci. Data 6, 91 (2019).

    PubMed  PubMed Central  Google Scholar 

  212. 212.

    Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

The authors thank Y. Wang, W. Zhou, A. Weber and B. Zhou for their valuable comments and help with proofreading the manuscript. S.S.H. was supported through the Michigan Predoctoral Training in Genetics grant (T32 GM007544). A.E.U. acknowledges funding by the National Institutes of Health (NIH) and the Simons Foundation, and is a Tashia and John Morgridge Faculty Scholar of the Stanford Child Health Research Institute.

Author information

Contributions

S.S.H. and R.E.M. researched the literature and wrote the article. All authors provided substantial contributions to discussions of the content, and reviewed and/or edited the manuscript before submission.

Corresponding author

Correspondence to Ryan E. Mills.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Genetics thanks C. Alkan, F. Sedlazeck and M. Talkowski for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Structural variations

(SVs). Operationally defined as sequence variants >50 bp in size. The most recognized forms of SV include deletions, duplications, inversions, insertions and translocations.

Complex rearrangements

Structural variants that consist of multiple structural variant types nested within or clustered near one another.

Read signatures

Specific marks that result from reads that map discordantly to the reference genome.

Short-read HTS

(Short-read high-throughput sequencing). Standard sequencing in which DNA is fragmented to ~600–800 bp during library preparation. Both ends of each fragment are sequenced for ~100–250 bp, leaving an unsequenced insert of ~100–600 bp.

Flow cells

Glass slides containing fluidic channels in which sequencing reactions occur.

Microfluidics

Devices that precisely manipulate and control small amounts of fluids.

SV callers

Algorithms designed to detect structural variations (SVs). Each putative SV detected by a caller is an individual ‘call’. The term ‘call’ derives from computer science, where it means to invoke a particular task; each detected SV is the result of one such task.

Sensitivity

The ability to detect known variants correctly. Low sensitivity implies low ability to detect bona fide variants.

Reference data sets

High-resolution structural variation data sets typically derived from de novo genome assemblies, population-scale sequencing or projects employing multiple orthogonal detection methods. Reference sets are used to benchmark detection algorithms and determine the novelty and rarity of structural variation calls.

Ensemble algorithm

A detection method that combines the resulting call sets from multiple independent algorithms.

False-discovery rate

The expected proportion of calls in the final call set that are marked as true but are actually false.

Coordinate overlap

The number of base pairs shared between the genomic coordinates of two different variant calls.

Purifying selection

A process of natural selection where strongly deleterious alleles are selectively removed from a population.

Phased SVs

(Phased structural variations). Variants that are assigned to a specific parental (maternal or paternal) haplotype, often computed using family trio or heterozygous single-nucleotide variant data.

Receiver operating characteristic curves

Plots of the true positive rate against the false positive rate showing the relationship between sensitivity and specificity.

Connected-molecule strategies

Genomic methods that connect shorter reads of a DNA molecule together to provide long-range information.

Sequence coverage

The average number of times a given locus is covered by a sequence read.

Physical coverage

The average number of times a given locus is covered by the cumulative length of the reads, including unsequenced inserts.

Single-molecule strategies

Genomic methods that read the entirety of long strands of DNA.

Specificity

The ability to detect the absence of variants correctly. Low specificity implies many false positives.

Base-calling error

Errors in determining the respective nucleotide from raw signals during sequencing.

Circular consensus sequencing

A single-molecule real-time (SMRT) sequencing method that improves accuracy through multiple passes of the template molecule.

Hybrid assembly

A genome assembly that leverages sequencing data from multiple platforms to reconstruct the original sequence, using the orthogonal data to extend the contig lengths or to branch contigs to one another.

N50

The contig length at which contigs of that length or longer together contain at least 50% of the assembled nucleotide sequence. A larger N50 implies a more contiguous assembly.

Topologically associating domain

A spatial partition of the genome within which segments interact with each other more frequently than they do with segments outside the domain.

Allelic bias

Gene expression that is biased towards one allele over the other.
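
Several of the quantitative glossary terms above (N50, coordinate overlap, sensitivity, specificity, false-discovery rate, and sequence versus physical coverage) can be illustrated with a few lines of Python. The following is a minimal sketch rather than code from any published caller: the function names, the half-open interval convention, the reciprocal-overlap helper (a commonly used normalization that the glossary does not define) and all example numbers are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# Assumed convention: half-open [start, end) coordinates on a single chromosome.
Interval = Tuple[int, int]


def n50(contig_lengths: Iterable[int]) -> int:
    """Contig length at which contigs this long or longer hold >= 50% of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached for positive lengths


def overlap_bp(a: Interval, b: Interval) -> int:
    """Coordinate overlap: base pairs shared by two calls (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Shared bases as a fraction of each call, taking the smaller fraction.

    Assumes both intervals have end > start.
    """
    shared = overlap_bp(a, b)
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def call_set_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Benchmark metrics from counts against a reference data set.

    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "false_discovery_rate": fp / (tp + fp),  # false calls among all calls
    }


def paired_end_coverage(
    n_pairs: int, read_len: int, unsequenced_insert: int, genome_len: int
) -> Tuple[float, float]:
    """Return (sequence coverage, physical coverage) for a paired-end library."""
    sequence = 2 * n_pairs * read_len / genome_len
    physical = n_pairs * (2 * read_len + unsequenced_insert) / genome_len
    return sequence, physical


if __name__ == "__main__":
    print("N50:", n50([2, 2, 2, 3, 3, 4, 8, 8]))
    print("overlap (bp):", overlap_bp((100, 200), (150, 300)))
    print("reciprocal overlap:", reciprocal_overlap((100, 200), (150, 300)))
    print(call_set_metrics(tp=80, fp=20, fn=20, tn=880))
    print(paired_end_coverage(1_000_000, 150, 300, 100_000_000))
```

Physical coverage exceeds sequence coverage whenever the unsequenced insert is non-empty, because the span between the two reads is counted as covered even though it is not read directly.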

About this article

Cite this article

Ho, S.S., Urban, A.E. & Mills, R.E. Structural variation in the sequencing era. Nat Rev Genet 21, 171–189 (2020). https://doi.org/10.1038/s41576-019-0180-9
