Genome structural variation discovery and genotyping

Alkan, Can; Coe, Bradley P.; Eichler, Evan E.

doi:10.1038/nrg2958

Review Article
Published: 01 March 2011

Nature Reviews Genetics volume 12, pages 363–376 (2011)

Key Points

Structural variation was originally defined as insertions, deletions and inversions greater than 1 kb in size, but with the sequencing of human genomes now becoming routine, the operational spectrum of structural variants has widened to include events >50 bp in length.
The main focus of structural variant (SV) studies should be accurate characterization of the copy, content and structure of genomic variants.
Methods to discover and genotype structural variation can be divided into two main types: experimental and computational.
Experimental methods for discovering SVs include hybridization-based approaches (SNP microarrays and array comparative genomic hybridization) and single-molecule analysis (optical mapping). In addition, PCR-based techniques can be used to genotype SVs.
Computational methods use genome sequencing data to discover and genotype SVs. There are four main computational approaches: read-pair, read-depth, split-read and sequence-assembly methods.
All existing platforms and methods have different biases and limitations. Accurate characterization of the full spectrum of structural variation remains a challenge.
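The read-pair approach named above infers a variant from pairs whose mapped distance or orientation departs from what the sequencing library predicts. The sketch below shows that classification for one pair. It assumes a forward-reverse (FR) paired-end library, hypothetical insert-size parameters (400 ± 40 bp), a cut-off of three standard deviations, and that each position is the outer end of a read, so the difference between positions approximates fragment length on the reference. The names `ReadPair` and `classify_pair` are illustrative only; read-pair callers typically cluster many discordant pairs supporting the same event before reporting a variant.

```python
from dataclasses import dataclass

# Hypothetical library parameters; in practice they are estimated from the
# concordant pairs of each sequencing library.
INSERT_MEAN = 400   # expected fragment length (bp)
INSERT_SD = 40      # standard deviation of fragment length (bp)
MIN_SV_SIZE = 50    # operational SV size threshold (bp)


@dataclass
class ReadPair:
    chrom1: str
    pos1: int      # outer mapping coordinate of read 1 (assumption)
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # outer mapping coordinate of read 2 (assumption)
    strand2: str


def classify_pair(p: ReadPair, n_sd: float = 3.0) -> str:
    """Return the SV signature suggested by one mapped pair (FR library assumed)."""
    if p.chrom1 != p.chrom2:
        return "translocation"       # ends map to different chromosomes
    # Order the two ends by reference coordinate.
    (l_pos, l_strand), (r_pos, r_strand) = sorted(
        [(p.pos1, p.strand1), (p.pos2, p.strand2)]
    )
    if l_strand == r_strand:
        return "inversion"           # both ends on the same strand
    if l_strand == "-" and r_strand == "+":
        return "tandem_duplication"  # everted (reverse-forward) orientation
    excess = (r_pos - l_pos) - INSERT_MEAN  # observed span minus expected span
    cutoff = max(n_sd * INSERT_SD, MIN_SV_SIZE)
    if excess > cutoff:
        return "deletion"            # ends map too far apart
    if -excess > cutoff:
        return "insertion"           # ends map too close together
    return "concordant"


print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 3400, "-")))
# prints deletion
```

For a deletion call, `excess` gives a rough size estimate, but the breakpoint is only localized to within the insert-size window, which is why split-read or assembly evidence is used to refine it.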

Abstract

Comparisons of human genomes show that more base pairs are altered as a result of structural variation — including copy number variation — than as a result of point mutations. Here we review advances and challenges in the discovery and genotyping of structural variation. The recent application of massively parallel sequencing methods has complemented microarray-based methods and has led to an exponential increase in the discovery of smaller structural-variation events. Some global discovery biases remain, but the integration of experimental and computational approaches is proving fruitful for accurate characterization of the copy, content and structure of variable regions. We argue that the long-term goal should be routine, cost-effective and high quality de novo assembly of human genomes to comprehensively assess all classes of structural variation.

**Figure 1: Classes of structural variation.**

**Figure 2: Structural variation sequence signatures.**

**Figure 3: Copy number variant discovery biases.**

**Figure 4: Genotyping duplicated paralogues using next-generation sequencing.**

**Figure 5: Improved copy number variant genotyping by the integration of computational and experimental approaches.**
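The figures listed above deal with copy-number discovery and genotyping, including genotyping of duplicated paralogues from sequence data. The read-depth idea behind such genotyping can be sketched as follows. The sketch assumes that reads have already been counted in equal-sized windows, that most windows are diploid so the median depth can serve as the two-copy baseline, and that GC-content and mappability corrections were applied upstream (they are omitted here). The function names and the `min_windows` parameter are hypothetical choices for illustration.

```python
from statistics import median


def estimate_copy_numbers(window_depths: list[float], ploidy: int = 2) -> list[int]:
    """Convert per-window read depth into integer copy-number estimates."""
    baseline = median(window_depths)  # assumed depth of a `ploidy`-copy window
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    # Scale depth to copies; round() gives the nearest integer (halves go to even).
    return [round(ploidy * d / baseline) for d in window_depths]


def call_cnv_segments(copy_numbers: list[int], ploidy: int = 2,
                      min_windows: int = 3) -> list[tuple[int, int, int]]:
    """Report runs of >= min_windows windows sharing a non-baseline copy number.

    Each call is (first_window, end_window_exclusive, copy_number).
    """
    segments = []
    i = 0
    while i < len(copy_numbers):
        cn = copy_numbers[i]
        j = i
        while j < len(copy_numbers) and copy_numbers[j] == cn:
            j += 1  # extend the run of identical copy-number estimates
        if cn != ploidy and j - i >= min_windows:
            segments.append((i, j, cn))
        i = j
    return segments


depths = [30] * 10 + [15] * 4 + [30] * 6 + [60] * 3 + [30] * 2
print(call_cnv_segments(estimate_copy_numbers(depths)))
# prints [(10, 14, 1), (20, 23, 4)]
```

In a duplicated region this estimate reflects the summed copy number of every paralogue whose reads map there; assigning copies to individual paralogues requires paralogue-specific sequence differences, which this sketch ignores.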

Acknowledgements

We thank J. Kidd, G. Cooper and S. Girirajan for valuable comments during the preparation of this review; P. Sudmant, F. Antonacci and J. Kitzman for their help in creating the figures; and T. Brown for proofreading the text. We also thank the authors of algorithms that were unpublished during the preparation of this manuscript (S. McCarroll, K. Chen, A. Abyzov, Z. Iqbal and C. Stewart) for sharing preprints and extended descriptions. B.P.C. is supported by a fellowship from the Canadian Institutes of Health Research. E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author information

Authors and Affiliations

Department of Genome Sciences, University of Washington School of Medicine, Foege S413C, 3720 15th Ave NE, Seattle, Washington, USA
Can Alkan, Bradley P. Coe & Evan E. Eichler
Howard Hughes Medical Institute, Foege S413C, 3720 15th Ave NE, Seattle, Washington, USA
Can Alkan & Evan E. Eichler

Corresponding author

Correspondence to Evan E. Eichler.

Ethics declarations

Competing interests

Evan E. Eichler is a scientific advisory board member for Pacific Biosciences.

Glossary

Structural variant (SV): Genomic rearrangements that affect >50 bp of sequence, including deletions, novel insertions, inversions, mobile-element transpositions, duplications and translocations.
Copy number variant (CNV): Also termed an unbalanced structural variant; a variant that changes the number of base pairs in the genome.
Mobile elements: DNA sequences that move location within the genome. Active mobile elements (transposons) in the human genome include Alu, L1 and SVA sequences.
Array comparative genomic hybridization (array CGH): A technique based on competitively hybridizing fluorescently labelled test and reference samples to known target DNA sequences immobilized on a solid glass substrate and then interrogating the hybridization ratio.
SNP microarrays: Hybridization-based assays in which the target DNA sequences are discriminated on the basis of a single base difference. Assays are processed with a single sample per array and perform both SNP genotyping and copy-number interrogation.
Single-base extension: Single-base-extension reactions use a primer that binds to a region of interest, followed by an extension reaction that incorporates a single base immediately after the primer.
Segmental uniparental disomy: Uniparental disomy (often abbreviated UPD) is a cryptic alteration in which two copies of a chromosome, or of a chromosomal segment (segmental UPD), are present but both derive from a single parent.
Nano-channel flow cells: Specialized flow cells narrow enough for a single DNA molecule to pass through in linear form without having sufficient room to fold over on itself.
Nanoslits: Narrow channels (~1 μm wide) on specialized silicon substrates. Linear, stretched DNA strands are loaded into them by applying a charge to electrode-containing microchannels on the substrate.
Emulsion picolitre droplet PCR: Emulsion PCR generates independent PCR reactions by emulsifying the aqueous reagents in oil so that each droplet becomes a separate reaction. Reagents are diluted such that each droplet contains a single target sequence.
Paired-end reads: Two reads sequenced from the start and end of the same molecule (such as a fosmid, bacterial artificial chromosome or next-generation sequence fragment).
Fosmid end sequence library: Paired-end sequences from a collection of bacterial cloning vectors that can carry an average of 40 kb of DNA.
Tag SNP: A SNP in strong linkage disequilibrium with a set of SNPs or a copy number variant.
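The hybridization ratio in the array CGH definition above is conventionally examined on a log2 scale: against a diploid reference, a single-copy loss is theoretically expected near -1 and a single-copy gain near log2(3/2) ≈ 0.58. The minimal probe-level sketch below assumes background-corrected, normalized intensities and hypothetical ±0.3 cut-offs. Observed ratios are often compressed relative to the theoretical values, and real pipelines typically segment neighbouring probes rather than calling probes one at a time, so this illustrates the ratio logic only.

```python
import math

# Hypothetical log2-ratio cut-offs; platforms tune these empirically.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Theoretical log2(test/reference); undefined (ValueError) for 0 copies."""
    return math.log2(test_copies / reference_copies)


def call_probes(test: list[float], reference: list[float]) -> list[str]:
    """Call each probe as gain, loss or neutral from paired intensities."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have one value per probe")
    calls = []
    for t, r in zip(test, reference):
        ratio = math.log2(t / r)  # competitive hybridization ratio, log2 scale
        if ratio >= GAIN_THRESHOLD:
            calls.append("gain")
        elif ratio <= LOSS_THRESHOLD:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


print(call_probes([1000, 520, 1450], [1000, 1000, 1000]))
# prints ['neutral', 'loss', 'gain']
```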

About this article

Cite this article

Alkan, C., Coe, B. & Eichler, E. Genome structural variation discovery and genotyping. Nat Rev Genet 12, 363–376 (2011). https://doi.org/10.1038/nrg2958

Published: 01 March 2011
Issue Date: May 2011
DOI: https://doi.org/10.1038/nrg2958
