Structural variation was originally defined as insertions, deletions and inversions greater than 1 kb in size, but with the sequencing of human genomes now becoming routine, the operational spectrum of structural variants has widened to include events >50 bp in length.
The main focus of structural variant (SV) studies should be accurate characterization of the copy, content and structure of genomic variants.
Methods to discover and genotype structural variation can be divided into two main types: experimental and computational.
Experimental methods for discovering SVs include hybridization-based approaches (SNP microarrays and array comparative genomic hybridization) and single-molecule analysis (optical mapping). In addition, PCR-based techniques can be used to genotype SVs.
Computational methods use genome sequencing data to discover and genotype SVs. There are four main computational approaches: read-pair, read-depth, split-read and sequence-assembly methods.
All existing platforms and methods have different biases and limitations. Accurate characterization of the full spectrum of structural variation remains a challenge.
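As an illustration of the read-pair approach listed among the computational methods above, the following minimal Python sketch classifies a single mapped read pair by comparing its mapped span and orientation with what the sequencing library is expected to produce. It is a sketch under stated assumptions, not the method of any particular tool: the `ReadPair` type, the `classify_pair` function, the forward-reverse library orientation, the default insert-size mean and standard deviation, and the cutoff `k` are all hypothetical choices made for this example. The span is approximated by the distance between the two mapped positions, and other signatures (for example everted pairs that can indicate tandem duplications) are not handled.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """One mapped read pair (hypothetical, simplified representation)."""
    chrom1: str
    pos1: int      # leftmost mapped position of read 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # leftmost mapped position of read 2
    strand2: str


def classify_pair(pair, mean_insert=400.0, sd_insert=40.0, k=3.0):
    """Label a read pair by the structural-variant signature it suggests.

    Assumes a forward-reverse paired-end library whose insert sizes are
    roughly normal with the given mean and standard deviation. A pair is
    called discordant when its span deviates from the mean by more than
    k standard deviations, or when its chromosomes or strands are unexpected.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"
    if pair.strand1 == pair.strand2:
        # Both reads on the same strand: orientation is inconsistent
        # with the library, which is the classic inversion signature.
        return "inversion_candidate"
    span = abs(pair.pos2 - pair.pos1)
    if span > mean_insert + k * sd_insert:
        # Reads map farther apart than expected: sequence present in the
        # reference is missing from the sample.
        return "deletion_candidate"
    if span < mean_insert - k * sd_insert:
        # Reads map closer together than expected: the sample carries
        # extra sequence between them.
        return "insertion_candidate"
    return "concordant"


if __name__ == "__main__":
    example = ReadPair("chr1", 1000, "+", "chr1", 5000, "-")
    print(classify_pair(example))
```

In practice a single discordant pair is weak evidence; callers typically require several pairs supporting the same event before reporting it, and the size of insertions detectable this way is bounded by the library insert size.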
Comparisons of human genomes show that more base pairs are altered as a result of structural variation, including copy number variation, than as a result of point mutations. Here we review advances and challenges in the discovery and genotyping of structural variation. The recent application of massively parallel sequencing methods has complemented microarray-based methods and has led to an exponential increase in the discovery of smaller structural-variation events. Some global discovery biases remain, but the integration of experimental and computational approaches is proving fruitful for accurate characterization of the copy, content and structure of variable regions. We argue that the long-term goal should be routine, cost-effective and high-quality de novo assembly of human genomes to comprehensively assess all classes of structural variation.
We thank J. Kidd, G. Cooper and S. Girirajan for valuable comments in the preparation of this review; P. Sudmant, F. Antonacci and J. Kitzman for their help in creating the figures; and T. Brown for proofreading the text. We also thank the authors of the algorithms that were unpublished during the preparation of this manuscript for sharing pre-prints and extended descriptions (S. McCarroll, K. Chen, A. Abyzov, Z. Iqbal and C. Stewart). B.P.C. is supported by a fellowship from the Canadian Institutes of Health Research. E.E.E. is an investigator of the Howard Hughes Medical Institute.
Evan E. Eichler is a scientific advisory board member for Pacific Biosciences.
- Structural variant
(SV). Genomic rearrangements that affect >50 bp of sequence, including deletions, novel insertions, inversions, mobile-element transpositions, duplications and translocations.
- Copy number variant
(CNV). Also defined as unbalanced structural variants; variants that change the number of base pairs in the genome.
- Mobile elements
DNA sequences that move location within the genome. Active mobile elements (transposons) in the human genome include Alu, L1 and SVA sequences.
- Array comparative genomic hybridization
(Array CGH). A technique based on competitively hybridizing fluorescently labelled test and reference samples to a known target DNA sequence immobilized on a solid glass substrate and then interrogating the hybridization ratio.
- SNP microarrays
Hybridization-based assays in which the target DNA sequences are discriminated on the basis of a single base difference. Assays are processed with a single sample per array and perform both SNP genotyping and copy-number interrogation.
- Single-base extension
Single-base-extension reactions use a primer that binds to a region of interest and follow this with an extension reaction that allows the incorporation of a single base after the primer.
- Segmental uniparental disomy
Uniparental disomy (often abbreviated UPD) is a cryptic alteration in which two copies of a chromosome or segment (segmental UPD) are present, but derive from a single parent.
- Nano-channel flow cells
Specialized flow cells narrow enough for a single DNA molecule to pass through in linear form without having sufficient room to fold over on itself.
- Nanoslits
Narrow channels (~1 μm wide) on specialized silicon substrates. They are loaded with linear stretched DNA strands by applying a charge to microchannels on the substrates that contain electrodes.
- Emulsion picolitre droplet PCR
Emulsion PCR is based on the generation of independent PCR reactions by emulsifying the aqueous reagents in oil such that each droplet becomes a separate PCR reaction. Reagents are diluted such that each droplet contains a single target sequence.
- Paired-end reads
Two reads sequenced from the start and end of the same molecule (such as a fosmid, bacterial artificial chromosome or next-generation sequence fragment).
- Fosmid end sequence library
Paired-end sequences from a collection of bacterial cloning vectors that can carry an average of 40 kb of DNA.
- Tag SNP
A SNP in strong linkage disequilibrium with a set of SNPs or a copy number variant.
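To make the tag SNP entry concrete, the sketch below estimates how well a SNP "tags" a copy number variant by computing the squared Pearson correlation (r²) between per-sample SNP genotypes and per-sample integer copy numbers. This is a simplified, genotype-level proxy for linkage disequilibrium rather than a haplotype-based estimate, and the function name `r_squared` and the example data are hypothetical.

```python
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length numeric lists.

    Here xs are SNP genotypes coded as alternate-allele counts (0, 1, 2)
    and ys are integer copy numbers for the same samples. Returns 0.0 when
    either variable has no variation, since no correlation can be estimated.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 samples")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up data: each alternate allele travels with one deleted copy.
    snp_alt_alleles = [0, 1, 2, 1, 0, 2]
    copy_number = [2, 1, 0, 1, 2, 0]
    print(r_squared(snp_alt_alleles, copy_number))
```

An r² close to 1 means the SNP genotype predicts the copy number almost perfectly in the sampled individuals, so the SNP could stand in for the CNV on a genotyping array; a low r² means the CNV would need to be assayed directly.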
Cite this article
Alkan, C., Coe, B. & Eichler, E. Genome structural variation discovery and genotyping. Nat Rev Genet 12, 363–376 (2011). https://doi.org/10.1038/nrg2958