Genome structural variation discovery and genotyping

Key Points

  • Structural variation was originally defined as insertions, deletions and inversions greater than 1 kb in size, but with the sequencing of human genomes now becoming routine, the operational spectrum of structural variants has widened to include events >50 bp in length.

  • The main focus of structural variant (SV) studies should be accurate characterization of the copy, content and structure of genomic variants.

  • Methods to discover and genotype structural variation can be divided into two main types: experimental and computational.

  • Experimental methods for discovering SVs include hybridization-based approaches (SNP microarrays and array comparative genomic hybridization) and single-molecule analysis (optical mapping). In addition, PCR-based techniques can be used to genotype SVs.

  • Computational methods use genome sequencing data to discover and genotype SVs. There are four main computational approaches: read-pair, read-depth, split-read and sequence-assembly methods.

  • All existing platforms and methods have different biases and limitations. Accurate characterization of the full spectrum of structural variation remains a challenge.
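
As a rough illustration of the read-depth approach named above, the sketch below turns per-window read depths into integer copy-number estimates. It is a minimal sketch under stated assumptions, not the method of any published tool: the function name, the fixed windows, the use of the median window as the diploid baseline and the example depths are all hypothetical.

```python
from statistics import median


def estimate_copy_number(window_depths, baseline_ploidy=2):
    """Estimate an integer copy number for each window from read depth.

    Assumption: the median window depth corresponds to the baseline
    (diploid) copy number, so depth scales linearly with copy number.
    """
    typical = median(window_depths)
    if typical <= 0:
        raise ValueError("median depth must be positive")
    return [round(baseline_ploidy * depth / typical) for depth in window_depths]


# Hypothetical mean depths for nine consecutive windows.
depths = [30, 31, 29, 15, 14, 30, 46, 44, 30]
copies = estimate_copy_number(depths)
print(copies)
```

Real read-depth callers also have to correct for biases such as GC content and mappability, which this sketch ignores.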

Abstract

Comparisons of human genomes show that more base pairs are altered as a result of structural variation, including copy number variation, than as a result of point mutations. Here we review advances and challenges in the discovery and genotyping of structural variation. The recent application of massively parallel sequencing methods has complemented microarray-based methods and has led to an exponential increase in the discovery of smaller structural-variation events. Some global discovery biases remain, but the integration of experimental and computational approaches is proving fruitful for accurate characterization of the copy, content and structure of variable regions. We argue that the long-term goal should be routine, cost-effective and high-quality de novo assembly of human genomes to comprehensively assess all classes of structural variation.

Figure 1: Classes of structural variation.
Figure 2: Structural variation sequence signatures.
Figure 3: Copy number variant discovery biases.
Figure 4: Genotyping duplicated paralogues using next-generation sequencing.
Figure 5: Improved copy number variant genotyping by the integration of computational and experimental approaches.

Acknowledgements

We thank J. Kidd, G. Cooper and S. Girirajan for valuable comments in the preparation of this review; P. Sudmant, F. Antonacci and J. Kitzman for their help in creating the figures; and T. Brown for proofreading the text. We also thank the authors of the algorithms that were unpublished during the preparation of this manuscript for sharing pre-prints and extended descriptions (S. McCarroll, K. Chen, A. Abyzov, Z. Iqbal and C. Stewart). B.P.C. is supported by a fellowship from the Canadian Institutes of Health Research. E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author information

Corresponding author

Correspondence to Evan E. Eichler.

Ethics declarations

Competing interests

Evan E. Eichler is a scientific advisory board member for Pacific Biosciences.

Related links

FURTHER INFORMATION

Authors' homepage

1000 Genomes Project

CNVnator

Cortex

dbVar

Nature Reviews Genetics series on Applications of Next-Generation Sequencing

Nature Reviews Genetics series on Study Designs

Repbase

TIGRA

Glossary

Structural variant

(SV). Genomic rearrangements that affect >50 bp of sequence, including deletions, novel insertions, inversions, mobile-element transpositions, duplications and translocations.

Copy number variant

(CNV). Also defined as unbalanced structural variants; variants that change the number of base pairs in the genome.

Mobile elements

DNA sequences that move location within the genome. Active mobile elements (transposons) in the human genome include Alu, L1 and SVA sequences.

Array comparative genomic hybridization

(Array CGH). A technique based on competitively hybridizing fluorescently labelled test and reference samples to a known target DNA sequence immobilized on a solid glass substrate and then interrogating the hybridization ratio.
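
The hybridization ratio described above is commonly expressed on a log2 scale. The following is a minimal sketch of that calculation for a single probe, assuming background-corrected, normalized intensities; the function names and the gain and loss thresholds are hypothetical choices for illustration.

```python
import math


def log2_ratio(test_intensity, reference_intensity):
    """Return the log2 ratio of test to reference signal for one probe."""
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / reference_intensity)


def call_probe(ratio, gain_threshold=0.3, loss_threshold=-0.3):
    """Label a probe as gain, loss or neutral (thresholds are assumptions)."""
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "loss"
    return "neutral"


# Hypothetical intensities: half the reference signal, as expected in
# theory for a heterozygous deletion against a diploid reference.
ratio = log2_ratio(500.0, 1000.0)
print(ratio, call_probe(ratio))
```

In theory a single-copy gain against a diploid reference gives log2(3/2), roughly 0.58, although measured ratios are typically compressed towards zero.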

SNP microarrays

Hybridization-based assays in which the target DNA sequences are discriminated on the basis of a single base difference. Assays are processed with a single sample per array and perform both SNP genotyping and copy-number interrogation.

Single-base extension

Single-base-extension reactions use a primer that binds to a region of interest and follow this with an extension reaction that allows the incorporation of a single base after the primer.

Segmental uniparental disomy

Uniparental disomy (often abbreviated UPD) is a cryptic alteration in which two copies of a chromosome or segment (segmental UPD) are present, but derive from a single parent.

Nano-channel flow cells

Specialized flow cells narrow enough for a single DNA molecule to pass through in linear form without having sufficient room to fold over on itself.

Nanoslits

Narrow channels (~1 μm wide) on specialized silicon substrates. They are loaded with linear stretched DNA strands by applying a charge to microchannels on the substrates that contain electrodes.

Emulsion picolitre droplet PCR

Emulsion PCR is based on the generation of independent PCR reactions by emulsifying the aqueous reagents in oil such that each droplet becomes a separate PCR reaction. Reagents are diluted such that each droplet contains a single target sequence.

Paired-end reads

Two reads sequenced from the start and end of the same molecule (such as a fosmid, bacterial artificial chromosome or next-generation sequence fragment).
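
Paired-end reads underlie the read-pair approach to SV discovery: a pair whose mapped span or orientation disagrees with what the library predicts hints at a rearrangement. The sketch below classifies one mapped pair under simplified assumptions. The function name, parameters and tolerance are hypothetical, and real callers cluster many supporting pairs before reporting a variant.

```python
def classify_pair(observed_span, expected_span, tolerance, same_orientation=False):
    """Classify one read pair by comparing its mapped span with the library.

    observed_span: distance between the two reads on the reference (bp)
    expected_span: typical fragment or clone size of the library (bp)
    tolerance: allowed deviation before a pair counts as discordant (bp)
    same_orientation: True if both reads map to the same strand
    """
    if same_orientation:
        return "inversion candidate"
    if observed_span > expected_span + tolerance:
        # Reads map too far apart: sequence is missing in the sample.
        return "deletion candidate"
    if observed_span < expected_span - tolerance:
        # Reads map too close together: the sample carries extra sequence.
        return "insertion candidate"
    return "concordant"


# Hypothetical library with 40 kb inserts and a 3 kb tolerance.
print(classify_pair(52000, 40000, 3000))
print(classify_pair(31000, 40000, 3000))
print(classify_pair(40500, 40000, 3000))
```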

Fosmid end sequence library

Paired-end sequences from a collection of bacterial cloning vectors that can carry an average of 40 kb of DNA.

Tag SNP

A SNP in strong linkage disequilibrium with a set of SNPs or a copy number variant.
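
Linkage disequilibrium between a SNP and a copy number variant is often summarized by the r² statistic. The sketch below computes r² from phased haplotypes coded as 0 or 1; the function name and the example haplotypes are hypothetical, and it assumes a biallelic SNP and a biallelic CNV such as a simple deletion.

```python
def r_squared(snp_haplotypes, cnv_haplotypes):
    """Compute r^2 between two biallelic loci from phased 0/1 haplotypes."""
    if len(snp_haplotypes) != len(cnv_haplotypes) or not snp_haplotypes:
        raise ValueError("haplotype lists must be non-empty and equal in length")
    n = len(snp_haplotypes)
    p_snp = sum(snp_haplotypes) / n
    p_cnv = sum(cnv_haplotypes) / n
    # Frequency of haplotypes carrying both the SNP allele and the CNV allele.
    p_both = sum(1 for s, c in zip(snp_haplotypes, cnv_haplotypes) if s == 1 and c == 1) / n
    d = p_both - p_snp * p_cnv
    denominator = p_snp * (1 - p_snp) * p_cnv * (1 - p_cnv)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator


# Hypothetical phased data: 1 marks the SNP minor allele or the deletion.
snp = [1, 1, 1, 0, 0, 0, 0, 0]
deletion = [1, 1, 1, 0, 0, 0, 0, 0]
print(r_squared(snp, deletion))
```

An r² close to 1 means the SNP could serve as a proxy for genotyping the variant; values near 0 mean it carries little information about it.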

About this article

Cite this article

Alkan, C., Coe, B. & Eichler, E. Genome structural variation discovery and genotyping. Nat Rev Genet 12, 363–376 (2011). https://doi.org/10.1038/nrg2958
