Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations

Key Points

  • The ability to identify low-frequency genetic variants among heterogeneous populations of cells or DNA molecules is important in many fields of basic science, clinical medicine and other applications, yet current high-throughput DNA sequencing technologies have an error rate between 1 per 100 and 1 per 1,000 base pairs sequenced, which obscures variants present below this frequency.

  • As next-generation sequencing technologies have evolved over the past decade, throughput has improved markedly, but raw accuracy has remained largely unchanged. Researchers who need high accuracy have developed data filtering methods and incremental biochemical improvements that modestly improve low-frequency variant detection, but background errors remain limiting in many fields.

  • The most effective means of reducing errors, first developed approximately 7 years ago, has been single-molecule consensus sequencing. This entails redundantly sequencing multiple copies of a given DNA molecule and discounting, as likely errors, variants that are not present in all or most of the copies.

  • Consensus sequencing can be achieved either by labelling each molecule with a unique molecular barcode before generating copies, which allows subsequent comparison of these copies, or by schemes whereby copies are physically joined and sequenced together. Because of trade-offs in cost, time and accuracy, no single method is optimal for every application, and each method should be considered on a case-by-case basis.

  • Major applications for high-accuracy DNA sequencing include non-invasive cancer diagnostics, cancer screening, early detection of cancer relapse or impending drug resistance, infectious disease applications, prenatal diagnostics, forensics and mutagenesis assessment.

  • Future advances in ultra-high-accuracy sequencing are likely to be driven by an emerging generation of single-molecule sequencers, particularly those that allow independent sequence comparison of both strands of native DNA duplexes.
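The barcode-based form of consensus sequencing described in the Key Points can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of any particular published protocol: the function name `consensus_by_umi`, the parameters `min_family_size` and `min_agreement`, and the toy reads are all hypothetical, and reads within a barcode family are assumed to be pre-aligned and of equal length.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads, min_family_size=3, min_agreement=0.6):
    """Collapse reads sharing a barcode (UMI) into one consensus per molecule.

    reads: iterable of (umi, sequence) pairs. Sequences in a family are
    assumed to be aligned and of equal length (a simplification).
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to tell errors from real variants
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Keep the majority base; mask positions where copies disagree too much.
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus


# Hypothetical toy data: (barcode, read sequence)
reads = [
    ("AACG", "ACGTAC"),
    ("AACG", "ACGTAC"),
    ("AACG", "ACGAAC"),  # one copy differs at position 3: treated as an error
    ("TTGA", "ACGGAC"),
    ("TTGA", "ACGGAC"),
    ("TTGA", "ACGGAC"),  # difference present in every copy: retained
    ("CCAT", "ACGTAC"),  # single-copy family: discarded
]
result = consensus_by_umi(reads)
print(result)
```

In this sketch a difference seen in only one of three copies is voted out, whereas a difference carried by every copy of a molecule survives into the consensus. Raising `min_agreement` or `min_family_size` trades yield for accuracy, which is one face of the cost and accuracy trade-off noted above.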

Abstract

Mutations, the fuel of evolution, are first manifested as rare DNA changes within a population of cells. Although next-generation sequencing (NGS) technologies have revolutionized the study of genomic variation between species and individual organisms, most have limited ability to accurately detect and quantify rare variants among the different genome copies in heterogeneous mixtures of cells or molecules. We describe the technical challenges in characterizing subclonal variants using conventional NGS protocols and the recent development of error correction strategies, both computational and experimental, including consensus sequencing of single DNA molecules. We also highlight major applications for low-frequency mutation detection in science and medicine, describe emerging methodologies and provide our vision for the future of DNA sequencing.
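The difficulty of detecting rare variants with conventional NGS can be made concrete with a small back-of-the-envelope calculation in Python. This is a rough sketch only: the function `expected_read_counts` is hypothetical, the error rate of 1 per 1,000 bases is taken from the range quoted in the Key Points, and errors are assumed to be independent and spread evenly across the three incorrect bases, which real sequencing errors are not.

```python
def expected_read_counts(depth, variant_fraction, error_rate):
    """Return (expected true variant reads, expected erroneous reads) at one site.

    Assumes independent errors split evenly over the three wrong bases.
    """
    true_reads = depth * variant_fraction
    # Reads from non-variant molecules that are miscalled as the variant base.
    error_reads = depth * (1 - variant_fraction) * error_rate / 3
    return true_reads, error_reads


for fraction in (1e-2, 1e-3, 1e-4):
    signal, noise = expected_read_counts(
        depth=100_000, variant_fraction=fraction, error_rate=1e-3
    )
    print(f"variant fraction {fraction:g}: signal {signal:.1f} reads, noise {noise:.1f} reads")
```

Under these assumptions, a variant present in 1 of 100 molecules stands well above the background, whereas one present in 1 of 10,000 molecules is expected to yield fewer reads than sequencing errors alone produce at the same position, regardless of how deeply the sample is sequenced.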


Figure 1: The signal-to-noise problem.
Figure 2: Methods of consensus-based error correction on short-read platforms.
Figure 3: Methods of single-molecule sequencing consensus-based error correction.
Figure 4: Impact of error correction technology on detection sensitivity.
Figure 5: Applications of rare variant detection.


  130. 130

    Misale, S. et al. Emergence of KRAS mutations and acquired resistance to anti-EGFR therapy in colorectal cancer. Nature 486, 532–536 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  131. 131

    Stroun, M., Anker, P., Lyautey, J., Lederrey, C. & Maurice, P. A. Isolation and characterization of DNA from the plasma of cancer patients. Eur. J. Cancer Clin. Oncol. 23, 707–712 (1987).

    Article  CAS  PubMed  Google Scholar 

  132. 132

    Bettegowda, C. et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl Med. 6, 224ra24 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. 133

    Wan, J. C. M. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer 17, 223–238 (2017).

    Article  CAS  PubMed  Google Scholar 

  134. 134

    Murtaza, M. et al. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature 497, 108–112 (2013).

    Article  CAS  PubMed  Google Scholar 

  135. 135

    Garcia-Murillas, I. et al. Mutation tracking in circulating tumor DNA predicts relapse in early breast cancer. Sci. Transl Med. 7, 302ra133 (2015).

    Article  PubMed  Google Scholar 

  136. 136

    Tie, J. et al. Circulating tumor DNA analysis detects minimal residual disease and predicts recurrence in patients with stage II colon cancer. Sci. Transl Med. 8, 346ra92 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  137. 137

    Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat. Med. 20, 548–554 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  138. 138

    Fujii, T. et al. Mutation-enrichment next-generation sequencing for quantitative detection of KRAS mutations in urine cell-free DNA from patients with advanced cancers. Clin. Cancer Res. 23, 3657–3666 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  139. 139

    Wang, Y. et al. Detection of tumor-derived DNA in cerebrospinal fluid of patients with primary tumors of the brain and spinal cord. Proc. Natl Acad. Sci. USA 112, 9704–9709 (2015).

    Article  CAS  PubMed  Google Scholar 

  140. 140

    Kinde, I. et al. Evaluation of DNA from the Papanicolaou test to detect ovarian and endometrial cancers. Sci. Transl Med. 5, 167ra4 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  141. 141

    Maritschnegg, E. et al. Lavage of the uterine cavity for molecular detection of Müllerian duct carcinomas: a proof-of-concept study. J. Clin. Oncol. 33, 4293–4300 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  142. 142

    Wang, Y. et al. Detection of somatic mutations and HPV in the saliva and plasma of patients with head and neck squamous cell carcinomas. Sci. Transl Med. 7, 293ra104 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  143. 143

    Sidransky, D. et al. Identification of ras oncogene mutations in the stool of patients with curable colorectal tumors. Science 256, 102–105 (1992).

    Article  CAS  PubMed  Google Scholar 

  144. 144

    Aravanis, A. M., Lee, M. & Klausner, R. D. Next-generation sequencing of circulating tumor DNA for early cancer detection. Cell 168, 571–574 (2017).

    Article  CAS  PubMed  Google Scholar 

  145. 145

    Armitage, P. & Doll, R. The age distribution of cancer and a multi-stage theory of carcinogenesis. Br. J. Cancer 8, 1–12 (1954).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  146. 146

    Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  147. 147

    Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  148. 148

    Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016). A description of the use of a single-strand tag-based error correction technique to identify preneoplastic clones in nearly all adults, which had only 2 years earlier been believed to occur in only a subset of very elderly individuals. It is an important example of how a fundamental biological understanding can change quickly with improved discovery technologies.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  149. 149

    Krimmel, J. D. et al. Ultra-deep sequencing detects ovarian cancer cells in peritoneal fluid and reveals somatic TP53 mutations in noncancerous tissues. Proc. Natl Acad. Sci. USA 113, 6005–6010 (2016).

    Article  CAS  PubMed  Google Scholar 

  150. 150

    Salk, J. J. et al. Duplex Sequencing detects cancer-associated mutations arising during normal aging: clonal evolution over a century of human lifetime [abstract]. Cancer Res. 77, 3041 (2017).

    Google Scholar 

  151. 151

    Jee, J. et al. Rates and mechanisms of bacterial mutagenesis from maximum-depth sequencing. Nature 534, 693–696 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  152. 152

    Maslov, A. Y., Quispe-Tintaya, W., Gorbacheva, T., White, R. R. & Vijg, J. High-throughput sequencing in mutation detection: a new generation of genotoxicity tests? Mutat. Res. 776, 136–143 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  153. 153

    Fielden, M. R. et al.Modernizing human cancer risk assessment of therapeutics. Trends Pharmacol. Sci. https://doi.org/10.1016/j.tips.2017.11.005 (2017).

    Article  CAS  PubMed  Google Scholar 

  154. 154

    Kim, D., Kim, S., Kim, S., Park, J. & Kim, J.-S. Genome-wide target specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome Res. 26, 406–415 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  155. 155

    Caperton, L. et al. Assisted reproductive technologies do not alter mutation frequency or spectrum. Proc. Natl Acad. Sci. USA 104, 5085–5090 (2007).

    Article  CAS  PubMed  Google Scholar 

  156. 156

    Nelson, J. L. The otherness of self: microchimerism in health and disease. Trends Immunol. 33, 421–427 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  157. 157

    Eun, J. K., Guthrie, K. A., Zirpoli, G. & Gadi, V. K. In situ breast cancer and microchimerism. Sci. Rep. 3, 2192 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  158. 158

    Fan, H. C., Blumenfeld, Y. J., Chitkara, U., Hudgins, L. & Quake, S. R. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc. Natl Acad. Sci. USA 105, 16266–16271 (2008).

    Article  Google Scholar 

  159. 159

    Chiu, R. W. K. et al. Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study. BMJ 342, c7401 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  160. 160

    Bianchi, D. W. et al. Noninvasive prenatal testing and incidental detection of occult maternal malignancies. JAMA 314, 162–169 (2015).

    Article  CAS  Google Scholar 

  161. 161

    Jamuar, S. S. & Walsh, C. A. Somatic mutations in cerebral cortical malformations. N. Engl. J. Med. 371, 2038–2038 (2014).

    Article  CAS  PubMed  Google Scholar 

  162. 162

    Poduri, A., Evrony, G. D., Cai, X. & Walsh, C. A. Somatic mutation, genomic variation, and neurological disease. Science 341, 1237758–1237758 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  163. 163

    De Vlaminck, I. et al. Circulating cell-free DNA enables noninvasive diagnosis of heart transplant rejection. Sci. Transl Med. 6, 241ra77 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  164. 164

    Shugay, M. et al. Towards error-free profiling of immune repertoires. Nat. Methods 11, 653–655 (2014).

    Article  CAS  PubMed  Google Scholar 

  165. 165

    DeWitt, W. S. et al. Dynamics of the cytotoxic T cell response to a model of acute viral infection. J. Virol. 89, 4517–4526 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  166. 166

    Hsu, M. S. et al. TCR sequencing can identify and track glioma-infiltrating T cells after DC vaccination. Cancer Immunol. Res. 4, 412–418 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  167. 167

    Tumeh, P. C. et al. PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature 515, 568–571 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  168. 168

    Goodnow, C. C. Multistep pathogenesis of autoimmune disease. Cell 130, 25–35 (2007).

    Article  CAS  PubMed  Google Scholar 

  169. 169

    Qian, J. et al. B cell super-enhancers and regulatory clusters recruit AID tumorigenic activity. Cell 159, 1524–1537 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  170. 170

    Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).

  171. 171

    Lynch, S. V. & Pedersen, O. The human intestinal microbiome in health and disease. N. Engl. J. Med. 375, 2369–2379 (2016).

    Article  CAS  PubMed  Google Scholar 

  172. 172

    Van de Wiele, T., Van Praet, J. T., Marzorati, M., Drennan, M. B. & Elewaut, D. How the microbiota shapes rheumatic diseases. Nat. Rev. Rheumatol. 12, 398–411 (2016).

    Article  CAS  PubMed  Google Scholar 

  173. 173

    Rosenbaum, M., Knight, R. & Leibel, R. L. The gut microbiota in human energy homeostasis and obesity. Trends Endocrinol. Metab. 26, 493–501 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  174. 174

    Alexander, J. L. et al. Gut microbiota modulation of chemotherapy efficacy and toxicity. Nat. Rev. Gastroenterol. Hepatol. 1805, 105 (2017).

    Google Scholar 

  175. 175

    Vindigni, S. M. & Surawicz, C. M. Fecal microbiota transplantation. Gastroenterol. Clin. North Am. 46, 171–185 (2017).

    Article  PubMed  Google Scholar 

  176. 176

    Dominguez-Bello, M. G. et al. Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer. Nat. Med. 22, 250–253 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  177. 177

    Roach, D. J. et al. A year of infection in the intensive care unit: prospective whole genome sequencing of bacterial clinical isolates reveals cryptic transmissions and novel microbiota. PLOS Genet. 11, e1005413 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  178. 178

    Cummings, L. A. et al. Clinical next generation sequencing outperforms standard microbiological culture for characterizing polymicrobial samples. Clin. Chem. 62, 1465–1473 (2016).

    Article  CAS  PubMed  Google Scholar 

  179. 179

    Grumaz, S. et al. Next-generation sequencing diagnostics of bacteremia in septic patients. Genome Med. 8, 73 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  180. 180

    Kim, S. et al. High-throughput automated microfluidic sample preparation for accurate microbial genomics. Nat. Commun. 8, 13919 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  181. 181

    Acevedo, A., Brodsky, L. & Andino, R. Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature 505, 686–690 (2014).

    Article  CAS  PubMed  Google Scholar 

  182. 182

    Eigen, M. The concept of the quasispecies will soon be 50 years old. Introduction. Curr. Top. Microbiol. Immunol. 392, vii (2016).

    PubMed  Google Scholar 

  183. 183

    Henn, M. R. et al. Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLOS Pathog. 8, e1002529 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  184. 184

    Solmone, M. et al. Use of massively parallel ultradeep pyrosequencing to characterize the genetic diversity of hepatitis B virus in drug-resistant and drug-naive patients and to detect minor variants in reverse transcriptase and hepatitis B S antigen. J. Virol. 83, 1718–1726 (2009).

    Article  CAS  PubMed  Google Scholar 

  185. 185

    Svarovskaia, E. S., Martin, R., McHutchison, J. G., Miller, M. D. & Mo, H. Abundant drug-resistant NS3 mutants detected by deep sequencing in hepatitis C virus-infected patients undergoing NS3 protease inhibitor monotherapy. J. Clin. Microbiol. 50, 3267–3274 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  186. 186

    Daum, L. T. et al. Next-generation ion torrent sequencing of drug resistance mutations in Mycobacterium tuberculosis strains. J. Clin. Microbiol. 50, 3831–3837 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  187. 187

    Katz, M., Hover, B. & Brady, S. Culture-independent discovery of natural products from soil metagenomes. J. Ind. Microbiol. Biotechnol. 43, 129–141 (2016).

    Article  CAS  PubMed  Google Scholar 

  188. 188

    Bassil, N. M., Bryan, N. & Lloyd, J. R. Microbial degradation of isosaccharinic acid at high pH. ISME J. 9, 310–320 (2015).

    Article  CAS  PubMed  Google Scholar 

  189. 189

    Yamamoto, S. et al. Environmental DNA metabarcoding reveals local fish communities in a species-rich coastal sea. Sci. Rep. 7, 40368 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  190. 190

    Mayo, B. et al. Impact of next generation sequencing techniques in food microbiology. Curr. Genom. 15, 293–309 (2014).

    Article  CAS  Google Scholar 

  191. 191

    Jäger, A. C. et al. Developmental validation of the MiSeq FGx Forensic Genomics System for targeted next generation sequencing in forensic DNA casework and database laboratories. Forensic Sci. Int. Genet. 28, 52–70 (2017).

    Article  CAS  PubMed  Google Scholar 

  192. 192

    Stiller, M. et al. Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc. Natl Acad. Sci. USA 103, 13578–13584 (2006).

    Article  CAS  Google Scholar 

  193. 193

    Avery, O. T., Macleod, C. M. & McCarty, M. Studies on the chemical nature of the substance inducing transformation of pneumococcal types: induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J. Exp. Med. 79, 137–158 (1944).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  194. 194

    Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

    Article  CAS  Google Scholar 

  195. 195

    Mostovoy, Y. et al. A hybrid approach for de novo human genome sequence assembly and phasing. Nat. Methods 13, 587–590 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  196. 196

    Bickhart, D. M. et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat. Genet. 49, 643–650 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  197. 197

    King, D. A. et al. Mosaic structural variation in children with developmental disorders. Hum. Mol. Genet. 24, 2733–2745 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  198. 198

    Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  199. 199

    Vitak, S. A. et al. Sequencing thousands of single-cell genomes with combinatorial indexing. Nat. Methods 14, 302–308 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  200. 200

    Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  201. 201

    Rosenberg, A. B. et al. Scaling single cell transcriptomics through split pool barcoding. Preprint at bioRxiv https://doi.org/10.1101/105163 (2017).

    Google Scholar 

  202. 202

    Ullal, A. V. et al. Cancer cell profiling by barcoding allows multiplexed protein analysis in fine-needle aspirates. Sci. Transl Med. 6, 219ra9 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  203. 203

    ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  204. 204

    Sun, W.-J. et al. RMBase: a resource for decoding the landscape of RNA modifications from high-throughput sequencing data. Nucleic Acids Res. 44, D259–265 (2016).

    Article  CAS  PubMed  Google Scholar 

  205. 205

    Wellcome Collection. Charles Robert Darwin. Photograph by L. Darwin. Wellcome Trust https://wellcomecollection.org/works/s6x9wbsj?page=1&query=darwin (2016).

Download references

Acknowledgements

The authors thank R. Risques, J. Hiatt and A. Boswell for critical review; E. Fox and E. H. Ahn for contributions to early drafts; K. Loubet-Senear, R. Risques and M. Emond for graphics ideas; N. Homer and C. Valentine for software information and members of the Loeb, Kennedy and Risques laboratories at the University of Washington for many lively discussions. This work was supported by National Institutes of Health grants T32CA009515 (J.J.S.) and R01CA193649, P01CA77852, and R33CA181771 (L.A.L.).

Author information

Contributions

J.J.S. and L.A.L. contributed to discussion of content and reviewing and editing the manuscript before submission. J.J.S. was primarily responsible for researching data and writing the manuscript. M.W.S. researched the literature and contributed to writing parts of the initial drafts of the article.

Corresponding authors

Correspondence to Jesse J. Salk or Lawrence A. Loeb.

Ethics declarations

Competing interests

J.J.S., M.W.S. and L.A.L. are equity holders in TwinStrand Biosciences, Inc.

Glossary

Clonal

When referring to a genetic variant or mutation, it is one that is present in all or most molecules in a population being sequenced. The term typically implies that it arose from a common ancestor, such as a fertilized egg in the case of germline variation, or the earliest founder cell of a tumour.

Subclonal

When referring to a genetic variant or mutation, it is one that is present in only a subset of molecules being sequenced. This may refer to either a variant carried by a subpopulation that arose and expanded within a larger population or through mixing of two or more distinct populations.

Sequencing accuracy

The number of errors made per base pair sequenced. It may be stratified by subtype of error, such as a specific type of base substitution.

Sequencing sensitivity

The ability to detect a variant at a particular variant allele frequency. This depends on both the sequencing accuracy and the number of independent DNA molecules successfully sequenced that include the genomic position (or positions) of interest.
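The dependence on the number of molecules sequenced can be made concrete with a short calculation. The Python sketch below assumes perfect sequencing accuracy and independent sampling of molecules, so it gives only an upper bound on sensitivity. The function name and the `min_supporting` threshold are illustrative and do not come from the article.

```python
from math import comb


def detection_probability(vaf, molecules, min_supporting=1):
    """Probability that at least `min_supporting` of `molecules`
    independently sampled DNA molecules carry a variant present at
    variant allele frequency `vaf` (binomial sampling, no errors)."""
    # Probability of seeing fewer than the required number of variant molecules.
    p_fewer = sum(
        comb(molecules, k) * vaf**k * (1.0 - vaf) ** (molecules - k)
        for k in range(min_supporting)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # A variant at 1 in 1,000, sampled with 1,000 independent molecules.
    print(round(detection_probability(0.001, 1000), 3))
```

Under these assumptions, 1,000 independent molecules give roughly a 63% chance of observing a variant present at a frequency of 1 in 1,000 at least once.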

Variant allele frequency (VAF)

The fraction of all molecules being sequenced that carry a specific genetic change or mutation at a particular genomic position.

Digital PCR

DNA amplification carried out in single-molecule reaction chambers. Recently, this has most often entailed microscopic aqueous droplets immersed in oil. When DNA input is sufficiently low, only one molecule will seed each reaction. When allele-specific amplification conditions are used, the number of droplets that successfully amplify can be digitally tabulated to determine the variant allele frequency.
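The requirement for low DNA input, and the tabulation step, can be illustrated with a minimal Python sketch. It assumes that molecules are distributed among droplets according to a Poisson distribution, which is a common modelling assumption rather than something stated in this article, and that every amplifying droplet was seeded by a single molecule. Both function names are hypothetical.

```python
import math


def multi_molecule_fraction(mean_molecules_per_droplet):
    """Fraction of occupied droplets expected to hold more than one
    molecule, assuming Poisson loading (an assumption of this sketch)."""
    lam = mean_molecules_per_droplet
    if lam <= 0:
        raise ValueError("mean loading must be positive")
    p_empty = math.exp(-lam)         # droplets with no molecule
    p_single = lam * p_empty         # droplets with exactly one molecule
    p_occupied = 1.0 - p_empty       # droplets with one or more molecules
    return (p_occupied - p_single) / p_occupied


def vaf_from_droplets(variant_positive, reference_positive):
    """Variant allele frequency from digital counts, assuming each
    positive droplet was seeded by exactly one molecule."""
    return variant_positive / (variant_positive + reference_positive)


if __name__ == "__main__":
    print(round(multi_molecule_fraction(0.1), 4))
    print(vaf_from_droplets(5, 995))
```

At a mean loading of 0.1 molecules per droplet, just under 5% of occupied droplets would be expected to contain more than one molecule, which is why sufficiently dilute input is needed for the digital count to be reliable.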

Polony

A population of identical amplification copies that originated from a single founder molecule and are spatially colocalized, such as on the surface of a microbead or as a spot on a surface. It is the biochemical analogue of a bacterial colony on a Petri dish.

Tag-based error correction

Also known as consensus sequencing, an approach for error correction whereby individual DNA molecules are uniquely labelled before amplification and sequencing, and the sequences of the related derivative copies are then compared with each other to exclude errors.
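The comparison of derivative copies can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the implementation of any published pipeline: it assumes reads have already been aligned and trimmed to equal length, that each read arrives as a (tag, sequence) pair, and that the thresholds `min_family_size` and `min_agreement` are free parameters chosen here purely for illustration.

```python
from collections import Counter, defaultdict


def consensus_reads(tagged_reads, min_family_size=3, min_agreement=0.9):
    """Group reads by molecular tag and call one consensus per family.

    `tagged_reads` is an iterable of (tag, sequence) pairs whose sequences
    within a family are assumed to be the same length and already aligned.
    """
    families = defaultdict(list)
    for tag, sequence in tagged_reads:
        families[tag].append(sequence)

    consensus = {}
    for tag, sequences in families.items():
        # Families with too few copies cannot support a reliable consensus.
        if len(sequences) < min_family_size:
            continue
        bases = []
        for column in zip(*sequences):
            base, count = Counter(column).most_common(1)[0]
            # Positions without sufficient agreement are masked as "N".
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[tag] = "".join(bases)
    return consensus


if __name__ == "__main__":
    reads = [("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"), ("GGT", "ACGT")]
    print(consensus_reads(reads))
```

In this toy input, the third copy in family `AAC` disagrees at one position, so that position is masked under the default 90% agreement threshold, and the single-read family `GGT` is discarded.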

Short-read platforms

Next-generation sequencing systems that generate reads that are dozens to several hundreds of nucleotides in length, for example, the current Illumina and Thermo Fisher Scientific Ion Torrent platforms and previously manufactured Roche 454 and ABI SOLiD platforms. Current versions sequence amplified polonies, not single molecules.

Long-read platforms

Next-generation sequencing systems that generate reads that are thousands to tens of thousands of nucleotides in length. These currently include Pacific Biosciences (PacBio) and Oxford Nanopore Technologies, which sequence single molecules, not polonies, and therefore have a higher error rate than short-read platforms.

Molecular barcode

Also known as a unique molecular identifier (UMI). A set of DNA nucleotide codes where each is affixed to only one or a subset of individual DNA molecules within a sample. The purpose is to uniquely label single molecules for consensus-based error correction or molecular counting. These may be informatically combined with molecule fragmentation points for greater label diversity.

Index sequence

A particular DNA nucleotide code affixed to all molecules within a given DNA sample that is used for multiplexing samples on a single sequencer run.

Sequencing depth

The number of sequencing reads that include a particular genomic position in their sequence. Some may be simply PCR copies of the same molecule.

Molecular depth

The number of collapsed consensus reads, each derived from an independent DNA molecule, that include a particular genomic position.
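The difference between sequencing depth and molecular depth can be shown with a small Python sketch. It assumes each read is described by a hypothetical (tag, start, end) tuple with a half-open coordinate interval, and that a tag identifies exactly one original molecule, which ignores tag clashes and false families.

```python
from collections import Counter


def depths_at(position, reads, min_family_size=1):
    """Return (sequencing_depth, molecular_depth) at `position`.

    `reads` is an iterable of (tag, start, end) tuples; a read covers
    `position` when start <= position < end.
    """
    covering_tags = [tag for tag, start, end in reads if start <= position < end]
    family_sizes = Counter(covering_tags)
    sequencing_depth = len(covering_tags)  # every read counts, PCR copies included
    # Only families large enough to yield a consensus read are counted.
    molecular_depth = sum(
        1 for size in family_sizes.values() if size >= min_family_size
    )
    return sequencing_depth, molecular_depth


if __name__ == "__main__":
    reads = [
        ("A", 0, 100), ("A", 0, 100), ("A", 0, 100),
        ("B", 50, 150), ("B", 50, 150),
        ("C", 120, 220),
    ]
    print(depths_at(60, reads))
    print(depths_at(60, reads, min_family_size=3))
```

Here five reads cover position 60, but they derive from only two tagged molecules, and from only one once a minimum family size of three is required.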

Tag clashes

The occurrence of two independent molecules being identically labelled by random chance. This may happen if the diversity of the applied molecular barcodes is too low for the number of DNA molecules sequenced. Reads from the two molecules are then grouped together as if they were copies of one, so a true mutation carried by only one of them may erroneously be excluded.
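How barcode diversity relates to the risk of a clash can be estimated with a birthday-problem calculation. The Python sketch below assumes fully random tags in which every sequence of the given length is equally likely, and considers only the molecules that are actually compared against one another (for example, those sharing the same fragmentation points). The function name is hypothetical.

```python
import math


def clash_probability(n_molecules, tag_length):
    """Probability that at least two of `n_molecules` receive the same
    random tag, assuming 4**tag_length equally likely tag sequences."""
    diversity = 4**tag_length
    if n_molecules > diversity:
        return 1.0  # more molecules than tags: a clash is certain
    # Sum logs of (1 - i/diversity) to avoid underflow for large inputs.
    log_no_clash = sum(
        math.log1p(-i / diversity) for i in range(n_molecules)
    )
    return 1.0 - math.exp(log_no_clash)


if __name__ == "__main__":
    print(round(clash_probability(1000, 12), 4))
```

With 12-nucleotide random tags (about 16.8 million possible sequences) and 1,000 molecules compared against one another, the chance of at least one clash is about 3% under these assumptions.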

False families

Sets of related molecules in which an error during amplification has mutated the shared tag sequence, making it erroneously appear that the copies arose from two independent molecules.

Consensus-making efficiency

The number of raw sequencing reads that are required to form a consensus read. This typically refers to an average: total raw reads divided by total consensus reads.

Molecular conversion efficiency

The fraction of inputted DNA molecules of interest that are recovered as consensus sequences. This is often described in terms of genome-equivalents.
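Both efficiency measures reduce to simple ratios, as the Python sketch below shows. The conversion from DNA mass to genome-equivalents assumes roughly 3.3 pg of DNA per haploid human genome, a commonly used approximation that is not taken from this article. All function names are illustrative.

```python
# Assumption: approximate mass of one haploid human genome, in picograms.
HAPLOID_GENOME_PG = 3.3


def consensus_making_efficiency(total_raw_reads, total_consensus_reads):
    """Average number of raw reads spent per consensus read."""
    return total_raw_reads / total_consensus_reads


def genome_equivalents(input_ng):
    """Approximate haploid genome copies in `input_ng` nanograms of DNA."""
    return input_ng * 1000.0 / HAPLOID_GENOME_PG


def molecular_conversion_efficiency(molecular_depth, input_ng):
    """Fraction of input genome-equivalents recovered as consensus reads
    at a locus, given the molecular depth observed there."""
    return molecular_depth / genome_equivalents(input_ng)


if __name__ == "__main__":
    print(consensus_making_efficiency(1_000_000, 50_000))
    print(round(genome_equivalents(10)))
    print(round(molecular_conversion_efficiency(1500, 10), 3))
```

For example, one million raw reads collapsing to 50,000 consensus reads corresponds to 20 raw reads per consensus read. Under the stated mass assumption, a molecular depth of 1,500 from 10 ng of input would correspond to recovering roughly half of the input genome-equivalents.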

Aneuploidies

Abnormal numbers of chromosomes in a cell. This may be inherited, such as trisomy 21, the basis of Down syndrome, or somatically acquired, such as in cancer.

Metagenomics

The study of complex microbial populations encompassing many co-mingling species that form an ecosystem, for example, an individual's gut microbiota.

Phasing

The proper assignment of two or more variants at spatially distant genomic locations to the derivative nucleic acid molecule, for example, the maternal or paternal allele.

Cite this article

Salk, J., Schmitt, M. & Loeb, L. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat Rev Genet 19, 269–285 (2018). https://doi.org/10.1038/nrg.2017.117
