The ability to identify low-frequency genetic variants among heterogeneous populations of cells or DNA molecules is important in many fields of basic science, clinical medicine and other applications, yet current high-throughput DNA sequencing technologies have an error rate of between 1 per 100 and 1 per 1,000 base pairs sequenced, which obscures variants present below this frequency.
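The arithmetic behind this limit can be sketched with a simple model. The model is an illustrative assumption, not taken from the source. It treats errors as independent and spread evenly over the three non-reference bases, and the function name `expected_reads` is hypothetical.

```python
def expected_reads(depth, variant_freq, error_rate):
    """Expected reads supporting one specific alternate base at one position.

    Illustrative assumptions only: errors are independent and distributed
    uniformly over the three non-reference bases, so one third of the errors
    on reference-bearing reads mimic the variant base.
    """
    true_reads = depth * variant_freq
    error_reads = depth * (1 - variant_freq) * error_rate / 3
    return true_reads, error_reads


# Compare true variant signal with error noise at 10,000x depth and a
# per-base error rate of 1 in 1,000.
for freq in (1e-2, 1e-3, 1e-4):
    signal, noise = expected_reads(10_000, freq, 1e-3)
    print(f"variant frequency {freq:g}: {signal:.1f} true vs {noise:.1f} error reads")
```

Because both terms scale with depth, sequencing more deeply does not by itself improve the ratio of signal to noise. In practice errors are not spread evenly across substitution types, so for the most common error types the background is higher than this model suggests.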
As next-generation sequencing technologies have evolved over the past decade, throughput has improved markedly, but raw accuracy has remained largely unchanged. Researchers who need high accuracy have developed data-filtering methods and incremental biochemical improvements that modestly improve low-frequency variant detection, but background errors remain limiting in many fields.
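Data filtering of this kind often starts from per-base quality scores. A minimal sketch follows, assuming Sanger-style FASTQ quality strings (Phred score plus an ASCII offset of 33) and an illustrative cutoff of Q30. The function names and the cutoff are hypothetical choices, not a method described in the source.

```python
PHRED_OFFSET = 33  # ASCII offset used by Sanger-style FASTQ quality strings


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score to an error probability: Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def mask_low_quality(seq: str, qual: str, min_q: int = 30) -> str:
    """Replace bases whose Phred score is below `min_q` with 'N'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return "".join(
        base if ord(q) - PHRED_OFFSET >= min_q else "N"
        for base, q in zip(seq, qual)
    )


# 'I' encodes Q40 and '#' encodes Q2, so the third base is masked.
print(mask_low_quality("ACGT", "II#I"))  # prints ACNT
```

A quality score estimates only the sequencer's base-calling confidence. A base called with high confidence can still be wrong if the change arose before sequencing, for example during PCR or from DNA damage, which is one reason such filters only modestly improve rare-variant detection.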
The most impactful means of reducing errors, first developed approximately 7 years ago, has been single-molecule consensus sequencing. This entails redundantly sequencing multiple copies of a specific DNA molecule and discounting variants that are not present in all or most of the copies as likely errors.
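The core consensus step can be sketched as a per-position vote across copies of one molecule. This is a simplified illustration, not any specific published pipeline: it assumes the copies are already aligned to equal length, and the thresholds `min_copies` and `min_agreement` are hypothetical parameters.

```python
from collections import Counter
from typing import List, Optional


def consensus(copies: List[str], min_copies: int = 3,
              min_agreement: float = 0.7) -> Optional[str]:
    """Per-position consensus of aligned reads derived from one molecule.

    Positions where the most common base is supported by fewer than
    `min_agreement` of the copies are masked as 'N' (likely errors).
    Returns None when there are too few copies to form a consensus.
    """
    if len(copies) < min_copies:
        return None
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be aligned to the same length")
    result = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(column) >= min_agreement else "N")
    return "".join(result)


# An error present in one of four copies is voted out.
print(consensus(["ACGT", "ACGT", "ACTT", "ACGT"]))  # prints ACGT
```

Raising `min_agreement` toward 1.0 corresponds to requiring a variant in all copies rather than most, trading sensitivity for accuracy.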
Consensus sequencing can be achieved either by labelling each molecule with a unique molecular barcode before generating copies, which allows the copies to be compared after sequencing, or by schemes whereby copies are physically joined and sequenced together. Because of trade-offs in cost, time and accuracy, no single method is optimal for every application, and each method should be considered on a case-by-case basis.
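The barcode-based route can be sketched as grouping reads into families by their molecular barcode and then calling a consensus within each family. This is a minimal illustration under stated assumptions: reads arrive as hypothetical `(umi, sequence)` pairs, barcodes must match exactly, reads within a family are already aligned to equal length, and `consensus_by_umi` and `min_family` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def consensus_by_umi(reads: Iterable[Tuple[str, str]],
                     min_family: int = 3) -> Dict[str, str]:
    """Group (umi, sequence) pairs into families and call a majority consensus.

    Assumes exact UMI matches and equal-length, aligned reads per family.
    Families smaller than `min_family` are discarded.
    """
    families: Dict[str, List[str]] = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    result = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family:
            continue  # too few copies to tell errors from true variants
        calls = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # require a strict majority; otherwise mask the position
            calls.append(base if count * 2 > len(column) else "N")
        result[umi] = "".join(calls)
    return result


reads = [
    ("AAAT", "ACGT"), ("AAAT", "ACGT"), ("AAAT", "ACTT"),  # one copy has an error
    ("GGCA", "ACGA"), ("GGCA", "ACGA"), ("GGCA", "ACGA"),
    ("TTTT", "ACGC"),                                      # singleton, discarded
]
print(consensus_by_umi(reads))  # prints {'AAAT': 'ACGT', 'GGCA': 'ACGA'}
```

Real pipelines typically also merge barcodes that differ because of a sequencing error in the barcode itself (UMI-tools, for example, models this), which this exact-match sketch does not attempt.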
Major applications for high-accuracy DNA sequencing include non-invasive cancer diagnostics, cancer screening, early detection of cancer relapse or impending drug resistance, infectious disease applications, prenatal diagnostics, forensics and mutagenesis assessment.
Future advances in ultra-high-accuracy sequencing are likely to be driven by an emerging generation of single-molecule sequencers, particularly those that allow independent sequence comparison of both strands of native DNA duplexes.
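Independent comparison of both strands, whether from tagged copies or native duplexes, reduces in its simplest form to an agreement check between two single-strand consensus sequences. This is a simplified illustration under stated assumptions: each strand's consensus has already been built and covers the same reference interval, the bottom-strand consensus is supplied in its own 5' to 3' orientation, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(top: str, bottom: str) -> str:
    """Keep a base only where both strand consensuses agree.

    `bottom` is the bottom-strand consensus read 5' to 3', so it is
    reverse-complemented into top-strand orientation before comparison.
    Disagreements, or positions already masked as 'N', become 'N'.
    """
    bottom_as_top = reverse_complement(bottom)
    if len(top) != len(bottom_as_top):
        raise ValueError("strand consensuses must cover the same interval")
    return "".join(
        t if t == b and t != "N" else "N"
        for t, b in zip(top, bottom_as_top)
    )


# A change seen on only one strand, such as one introduced by damage to that
# strand or an early copying error, is masked in the duplex call.
print(duplex_consensus("ACGTA", "TTCGT"))  # prints ACGNA
```

The design choice is deliberately strict: a true mutation is present on both complementary strands, whereas most artefacts affect only one, so requiring agreement filters them out at the cost of discarding positions where either strand is uncertain.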
Mutations, the fuel of evolution, are first manifested as rare DNA changes within a population of cells. Although next-generation sequencing (NGS) technologies have revolutionized the study of genomic variation between species and individual organisms, most have limited ability to accurately detect and quantify rare variants among the different genome copies in heterogeneous mixtures of cells or molecules. We describe the technical challenges in characterizing subclonal variants using conventional NGS protocols and the recent development of error correction strategies, both computational and experimental, including consensus sequencing of single DNA molecules. We also highlight major applications for low-frequency mutation detection in science and medicine, describe emerging methodologies and provide our vision for the future of DNA sequencing.
Darwin, C. On the Origin of Species (John Murray Press, 1859).
Luria, S. E. & Delbrück, M. Mutations of bacteria from virus sensitivity to virus resistance. Genetics 28, 491–511 (1943).
Cairns, J. Mutation selection and the natural history of cancer. Nature 255, 197–200 (1975).
Fisher, R. et al. Deep sequencing reveals minor protease resistance mutations in patients failing a protease inhibitor regimen. J. Virol. 86, 6231–6237 (2012).
Schmitt, M. W., Loeb, L. A. & Salk, J. J. The influence of subclonal resistance mutations on targeted cancer therapy. Nat. Rev. Clin. Oncol. 13, 335–347 (2016).
Maher, G. J. et al. Visualizing the origins of selfish de novo mutations in individual seminiferous tubules of human testes. Proc. Natl Acad. Sci. USA 113, 2454–2459 (2016).
Kennedy, S. R., Loeb, L. A. & Herr, A. J. Somatic mutations in aging, cancer and neurodegeneration. Mech. Ageing Dev. 133, 118–126 (2012).
Vijg, J. Somatic mutations, genome mosaicism, cancer and aging. Curr. Opin. Genet. Dev. 26, 141–149 (2014).
Shendure, J. et al. DNA sequencing at 40: past, present and future. Nature 550, 345–353 (2017).
Goodwin, S., Mcpherson, J. D. & Mccombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).
Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc. Natl Acad. Sci. USA 74, 5463–5467 (1977). One of two Nobel prize-winning DNA sequencing methodologies published in 1977 (the other being that of Maxam and Gilbert). The Sanger approach formed the basis of The Human Genome Project.
Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008).
Zagordi, O., Klein, R., Däumer, M. & Beerenwinkel, N. Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies. Nucleic Acids Res. 38, 7400–7409 (2010).
Parsons, B. L. & Heflich, R. H. Genotypic selection methods for the direct analysis of point mutations. Mutat. Res. 387, 97–121 (1997).
Bielas, J. H. & Loeb, L. A. Quantification of random genomic mutations. Nat. Methods 2, 285–290 (2005).
Li, J. et al. Replacing PCR with COLD-PCR enriches variant DNA sequences and redefines the sensitivity of genetic testing. Nat. Med. 14, 579–584 (2008).
Sykes, P. J. et al. Quantitation of targets for PCR by use of limiting dilution. Biotechniques 13, 444–449 (1992).
Vogelstein, B. & Kinzler, K. W. Digital, P. C. R. Proc. Natl Acad. Sci. USA 96, 9236–9241 (1999).
Hindson, B. J. et al. High-throughput droplet digital PCR system for absolute quantitation of DNA copy number. Anal. Chem. 83, 8604–8610 (2011).
Fox, E. J., Reid-Bayliss, K. S., Emond, M. J. & Loeb, L. A. Accuracy of next generation sequencing platforms. Next Gener. Seq. Appl. 1, 1000106 (2014).
Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998). Among the first and most important uses of rigorous statistical methods to assign degree of certainty to DNA sequencing data.
Cock, P. J. A., Fields, C. J., Goto, N., Heuer, M. L. & Rice, P. M. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 38, 1767–1771 (2010).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
Wang, Q. et al. Detecting somatic point mutations in cancer genome sequencing data: a comparison of mutation callers. Genome Med. 5, 91 (2013).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at ArXiV arXiv:1303.3997v2 [q-bio.GN] (2013).
Wei, Z., Wang, W., Hu, P., Lyon, G. J. & Hakonarson, H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res. 39, e132–e132 (2011).
Wilm, A. et al. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res. 40, 11189–11201 (2012).
Gerstung, M. et al. Reliable detection of subclonal single-nucleotide variants in tumour cell populations. Nat. Commun. 3, 811 (2012).
Costello, M. et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 41, e67–e67 (2013).
Chen, L., Liu, P., Evans, T. C. & Ettwiller, L. M. DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification. Science 355, 752–756 (2017).
Schirmer, M., D'Amore, R., Ijaz, U. Z., Hall, N. & Quince, C. Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data. BMC Bioinformatics 17, 125 (2016).
Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012).
Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
Kircher, M., Sawyer, S. & Meyer, M. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3 (2012). An important description of the commonness of PCR chimaeras, optical duplicates and index swapping that occurs during NGS library preparation and polony formation. This contributed to the now common practice of dual indexing for error-sensitive applications.
Potapov, V. & Ong, J. L. Examining sources of error in PCR by single-molecule sequencing. PLOS ONE 12, e0169774 (2017).
Brodin, J. et al. PCR-induced transitions are the major source of error in cleaned ultra-deep pyrosequencing data. PLOS ONE 8, e70388 (2013).
Star, B. et al. Palindromic sequence artifacts generated during next generation sequencing library preparation from historic and ancient DNA. PLOS ONE 9, e89676 (2014).
Van Allen, E. M. et al. Whole-exome sequencing and clinical interpretation of formalin-fixed, paraffin-embedded tumor samples to guide precision cancer medicine. Nat. Med. 20, 682–688 (2014).
Arbeithuber, B., Makova, K. D. & Tiemann-Boege, I. Artifactual mutations resulting from DNA lesions limit detection levels in ultrasensitive sequencing applications. DNA Res. 23, 547–559 (2016).
Lindahl, T. & Nyberg, B. Rate of depurination of native deoxyribonucleic acid. Biochemistry 11, 3610–3618 (1972).
Knierim, E., Lucke, B., Schwarz, J. M., Schuelke, M. & Seelow, D. Systematic comparison of three methods for fragmentation of long-range PCR products for next generation sequencing. PLOS ONE 6, e28240 (2011).
Do, H. & Dobrovic, A. Sequence artifacts in DNA from formalin-fixed tissues: causes and strategies for minimization. Clin. Chem. 61, 64–71 (2015).
Lou, D. I. et al. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing. Proc. Natl Acad. Sci. USA 110, 19872–19877 (2013). The first important description of consensus sequencing by tandem duplication of library molecules. Although challenging on short-read sequencers, this concept is likely to become very important as single-molecule sequencers improve in the coming years.
Chen, G., Mosier, S., Gocke, C. D., Lin, M.-T. & Eshleman, J. R. Cytosine deamination is a major cause of baseline noise in next-generation sequencing. Mol. Diagn. Ther. 18, 587–593 (2014).
Schaaper, R. M., Kunkel, T. A. & Loeb, L. A. Infidelity of DNA synthesis associated with bypass of apurinic sites. Proc. Natl Acad. Sci. USA 80, 487–491 (1983).
Sagher, D. & Strauss, B. Insertion of nucleotides opposite apurinic/apyrimidinic sites in deoxyribonucleic acid during in vitro synthesis: uniqueness of adenine nucleotides. Biochemistry 22, 4518–4526 (1983).
Nishimura, S. 8-Hydroxyguanine: a base for discovery. DNA Repair 10, 1078–1083 (2011).
Sinha, R. et al. Index switching causes 'spreading-of-signal' among multiplexed samples in Illumina HiSeq 4000 DNA sequencing. https://doi.org/10.1101/125724 (2017).
Hiatt, J. B., Turner, E. H., Patwardhan, R. P., Caperton, L. & Shendure, J. Next-generation DNA sequencing for de novo genome assembly. Western Student Medical Research Forum (2009).
Hiatt, J. B., Patwardhan, R. P., Turner, E. H., Lee, C. & Shendure, J. Parallel, tag-directed assembly of locally derived short sequence reads. Nat. Methods 7, 119–122 (2010). The first description of consensus sequencing PCR duplicates for error correction, both with UMIs and without.
Casbon, J. A., Osborne, R. J., Brenner, S. & Lichtenstein, C. P. A method for counting PCR template molecules with application to next-generation sequencing. Nucleic Acids Res. 39, e81 (2011).
Kinde, I., Wu, J., Papadopoulos, N., Kinzler, K. W. & Vogelstein, B. Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl Acad. Sci. USA 108, 9530–9535 (2011). A key early description of single-strand tag-based error correction for rare variant detection. This publication put the significance in clinical context and was probably the most important launch for the field.
Jabara, C. B., Jones, C. D., Roach, J., Anderson, J. A. & Swanstrom, R. Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proc. Natl Acad. Sci. USA 108, 20166–20171 (2011).
Fu, G. K., Hu, J., Wang, P.-H. & Fodor, S. P. A. Counting individual DNA molecules by the stochastic attachment of diverse labels. Proc. Natl Acad. Sci. USA 108, 9026–9031 (2011).
Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods 9, 72–74 (2011).
Shiroguchi, K., Jia, T. Z., Sims, P. A. & Xie, X. S. Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes. Proc. Natl Acad. Sci. USA 109, 1347–1352 (2012).
Schmitt, M. W. et al. Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl Acad. Sci. USA 109, 14508–14513 (2012). The initial description of DupSeq and the concept of labelling copies of both strands of individual double-stranded molecules to allow them to be sequenced and compared for even greater accuracy. This technique opened the door to investigations of ultra-rare variants, such as those that occur in ageing and with mutagenic chemical exposure.
Hoang, M. L. et al. Genome-wide quantification of rare somatic mutations in normal human tissues using massively parallel sequencing. Proc. Natl Acad. Sci. USA 113, 9846–9851 (2016). A duplex sequencing approach at very low depth and not requiring exogenous UMIs. An excellent example of genotoxicity and ageing applications.
Nachmanson, D. et al. CRISPR-DS: an efficient, low DNA input method for ultra-accurate sequencing. Preprint at bioRxivhttps://doi.org/10.1101/207027 (2017).
Liang, R. H. et al. Theoretical and experimental assessment of degenerate primer tagging in ultra-deep applications of next-generation sequencing. Nucleic Acids Res. 42, e98 (2014).
Zhang, T.-H., Wu, N. C. & Sun, R. A benchmark study on error-correction by read-pairing and tag-clustering in amplicon-based deep sequencing. BMC Genomics 17, 108 (2016).
Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).
Ståhlberg, A. et al. Simple, multiplexed, PCR-based barcoding of DNA enables sensitive mutation detection in liquid biopsies using sequencing. Nucleic Acids Res. 44, e105 (2016).
Ståhlberg, A. et al. Simple multiplexed PCR-based barcoding of DNA for ultrasensitive mutation detection by next-generation sequencing. Nat. Protoc. 12, 664–682 (2017).
Hiatt, J. B., Pritchard, C. C., Salipante, S. J., O'Roak, B. J. & Shendure, J. Single molecule molecular inversion probes for targeted, high accuracy detection of low frequency variation. Genome Res. https://doi.org/10.1101/gr.147686.112 (2013).
Carlson, K. D. et al. MIPSTR: a method for multiplex genotyping of germline and somatic STR variation across many individuals. Genome Res. 25, 750–761 (2015).
Boyle, E. A., O'Roak, B. J., Martin, B. K., Kumar, A. & Shendure, J. MIPgen: optimized modeling and design of molecular inversion probes for targeted resequencing. Bioinformatics 30, 2670–2672 (2014).
Wang, K. et al. Ultra-precise detection of mutations by droplet-based amplification of circularized DNA. BMC Genomics 17, 214 (2016). An important description of several biochemical techniques to improve consensus making efficiency and reduce cost.
Hong, L. Z. et al. BAsE-Seq: a method for obtaining long viral haplotypes from short sequence reads. Genome Biol. 15, 517 (2014).
Schmitt, M. W., Fox, E. J. & Salk, J. J. Risks of double-counting in deep sequencing. Proc. Natl Acad. Sci. USA 111, E1560 (2014).
Hong, J. & Gresham, D. Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing. Biotechniques 63, 221–226 (2017).
Narayan, A. et al. Ultrasensitive measurement of hotspot mutations in tumor DNA in blood using error-suppressed multiplexed deep sequencing. Cancer Res. 72, 3492–3498 (2012).
Gregory, M. T. et al. Targeted single molecule mutation detection with massively parallel sequencing. Nucleic Acids Res. 44, e22–e22 (2016).
Pel, J. et al. Duplex Proximity Sequencing (Pro-Seq): a method to improve DNA sequencing accuracy without the cost of molecular barcoding redundancy. Preprint at bioRxiv https://doi.org/10.1101/163444 (2017).
Kennedy, S. R. et al. Detecting ultralow-frequency mutations by duplex sequencing. Nat. Protoc. 9, 2586–2606 (2014).
Roach, J. C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).
Kennedy, S. R., Salk, J. J., Schmitt, M. W. & Loeb, L. A. Ultra-sensitive sequencing reveals an age-related increase in somatic mitochondrial mutations that are inconsistent with oxidative damage. PLOS Genet. 9, e1003794 (2013). The first description of high-accuracy consensus sequencing to measure the effect of human ageing on somatic mutation load.
Taylor, P. H., Cinquin, A. & Cinquin, O. Quantification of in vivo progenitor mutation accrual with ultra-low error rate and minimal input DNA using SIP-HAVA-seq. Genome Res. 26, 1600–1611 (2016).
Hoekstra, J. G., Hipp, M. J., Montine, T. J. & Kennedy, S. R. Mitochondrial DNA mutations increase in early stage Alzheimer disease and are inconsistent with oxidative damage. Ann. Neurol. 80, 301–306 (2016).
Pickrell, A. M. et al. Endogenous parkin preserves dopaminergic substantia nigral neurons following mitochondrial DNA mutagenic stress. Neuron 87, 371–381 (2015).
Reid-Bayliss, K. S., Arron, S. T., Loeb, L. A., Bezrookove, V. & Cleaver, J. E. Why Cockayne syndrome patients do not get cancer despite their DNA repair deficiency. Proc. Natl Acad. Sci. USA 113, 10151–10156 (2016).
Chawanthayatham, S. et al. Mutational spectra of aflatoxin B1 in vivo establish biomarkers of exposure for human hepatocellular carcinoma. Proc. Natl Acad. Sci. USA 114, E3101–E3109 (2017).
Mattox, A. K. et al. Bisulfite-converted duplexes for the strand-specific detection and quantification of rare mutations. Proc. Natl Acad. Sci. USA 114, 4733–4738 (2017).
Kumar, V. et al. Partial bisulfite conversion for unique template sequencing. Nucleic Acids Res. https://doi.org/10.1093/nar/gkx1054 (2017).
Deamer, D., Akeson, M. & Branton, D. Three decades of nanopore sequencing. Nat. Biotechnol. 34, 518–524 (2016).
Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. 323, 133–138 (2009).
Madoui, M.-A. et al. Genome assembly using nanopore-guided long and error-free DNA reads. BMC Genomics 16, 327 (2015).
Schüle, B. et al. Parkinson's disease associated with pure ATXN10 repeat expansion. NPJ Parkinsons Dis. 3, 27 (2017).
Li, C. et al. INC-Seq: accurate single molecule reads using nanopore sequencing. Gigascience 5, 34 (2016).
Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).
Travers, K. J., Chin, C.-S., Rank, D. R., Eid, J. S. & Turner, S. W. A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res. 38, e159 (2010). The first description of consensus sequencing based on iterative resequencing of both strands of individual molecules. This concept, although currently challenging, will probably become very important as single-molecule DNA sequencers improve.
Loomis, E. W. et al. Sequencing the unsequenceable: expanded CGG-repeat alleles of the fragile X gene. Genome Res. 23, 121–128 (2013).
Russo, G. et al. Highly sensitive, non-invasive detection of colorectal cancer mutations using single molecule, third generation sequencing. Appl. Transl Genom. 7, 32–39 (2015).
Frank, J. A. et al. Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data. Sci. Rep. 6, 25373 (2016).
Hestand, M. S., Van Houdt, J., Cristofoli, F. & Vermeesch, J. R. Polymerase specific error rates and profiles identified by single molecule sequencing. Mutat. Res. 784–785, 39–45 (2016).
Heerema, S. J. & Dekker, C. Graphene nanodevices for DNA sequencing. Nat. Nanotechnol. 11, 127–136 (2016).
Beechem, J. Library free targeted sequencing of native genomic DNA FFPE samples using Hyb & Seq technology-the hybridization based single molecule sequencing system. Advances in Genome Biology and Technology Annual Meeting https://www.nanostring.com/application/files/3815/0206/1895/AGBT2017_HybSeq_Chemistry_Final.pdf (2017).
Johnson, S. S., Zaikova, E., Goerlitz, D. S., Bai, Y. & Tighe, S. W. Real-time DNA sequencing in the Antarctic dry valleys using the Oxford Nanopore sequencer. J. Biomol. Tech. 28, 2–7 (2017).
Wang, K. et al. Using ultra-sensitive next generation sequencing to dissect DNA damage-induced mutagenesis. Sci. Rep. 6, 25310 (2016).
Stoler, N., Arbeithuber, B., Guiblet, W., Makova, K. D. & Nekrutenko, A. Streamlined analysis of duplex sequencing data with Du Novo. Genome Biol. 17, 180 (2016).
Newman, A. M. et al. Integrated digital error suppression for improved detection of circulating tumor DNA. Nat. Biotechnol. 34, 547–555 (2016). An important early comprehensive description of a cfDNA liquid biopsy approach using tag-based error correction techniques.
Zheng, Z. et al. Anchored multiplex PCR for targeted next-generation sequencing. Nat. Med. 20, 1479–1484 (2014).
Kennedy, S. & Hipp, M. J. Removing sequencer and PCR artifacts for forensic DNA analysis on massively parallel sequencing platforms: https://www.promega.com/-/media/files/products-and-services/genetic-identity/ishi-28-oral-abstracts/kennedy-ishipaper.pdf (2017).
Krimmel, J. D., Salk, J. J. & Risques, R.-A. Cancer-like mutations in non-cancer tissue: towards a better understanding of multistep carcinogenesis. Transl Cancer Res. https://doi.org/10.21037/tcr.2016.11.67 (2016).
Loeb, L. A., Springgate, C. F. & Battula, N. Errors in DNA replication as a basis of malignant changes. Cancer Res. 34, 2311–2321 (1974).
Merlo, L. M. F., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nat. Rev. Cancer 6, 924–935 (2006).
Gatenby, R. A. & Gillies, R. J. A microenvironmental model of carcinogenesis. Nat. Rev. Cancer 8, 56–61 (2008).
Salk, J. J., Fox, E. J. & Loeb, L. A. Mutational heterogeneity in human cancers: origin and consequences. Annu. Rev. Pathol. 5, 51–75 (2010).
Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).
Burrell, R. A., McGranahan, N., Bartek, J. & Swanton, C. The causes and consequences of genetic heterogeneity in cancer evolution. Nature 501, 338–345 (2013).
Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892 (2012).
Sottoriva, A. et al. Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc. Natl Acad. Sci. USA 110, 4009–4014 (2013).
Zhang, J. et al. Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing. Science 346, 256–259 (2014).
de Bruin, E. C. et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science 346, 251–256 (2014).
Naxerova, K. et al. Hypermutable DNA chronicles the evolution of human colon cancer. Proc. Natl Acad. Sci. USA 111, E1889–E1898 (2014).
Reiter, J. G. et al. Reconstructing metastatic seeding patterns of human cancers. Nat. Commun. 8, 14114 (2017).
Marusyk, A. et al. Non-cell-autonomous driving of tumour growth supports sub-clonal heterogeneity. Nature 514, 54–58 (2014).
Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21, 751–759 (2015).
Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012).
Sequist, L. V. et al. Genotypic and histological evolution of lung cancers acquiring resistance to EGFR inhibitors. Sci. Transl Med. 3, 75ra26 (2011).
Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
Andor, N. et al. Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nat. Med. 22, 105–113 (2016).
Mroz, E. A. et al. High intratumor genetic heterogeneity is related to worse outcome in patients with head and neck squamous cell carcinoma. Cancer 119, 3034–3042 (2013).
Parker, W. T., Ho, M., Scott, H. S., Hughes, T. P. & Branford, S. Poor response to second-line kinase inhibitors in chronic myeloid leukemia patients with multiple low-level mutations, irrespective of their resistance profile. Blood 119, 2234–2238 (2012).
Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).
Klco, J. M. et al. Association between mutation clearance after induction therapy and outcomes in acute myeloid leukemia. JAMA 314, 811–822 (2015).
Misale, S. et al. Emergence of KRAS mutations and acquired resistance to anti-EGFR therapy in colorectal cancer. Nature 486, 532–536 (2012).
Stroun, M., Anker, P., Lyautey, J., Lederrey, C. & Maurice, P. A. Isolation and characterization of DNA from the plasma of cancer patients. Eur. J. Cancer Clin. Oncol. 23, 707–712 (1987).
Bettegowda, C. et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl Med. 6, 224ra24 (2014).
Wan, J. C. M. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer 17, 223–238 (2017).
Murtaza, M. et al. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature 497, 108–112 (2013).
Garcia-Murillas, I. et al. Mutation tracking in circulating tumor DNA predicts relapse in early breast cancer. Sci. Transl Med. 7, 302ra133 (2015).
Tie, J. et al. Circulating tumor DNA analysis detects minimal residual disease and predicts recurrence in patients with stage II colon cancer. Sci. Transl Med. 8, 346ra92 (2016).
Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat. Med. 20, 548–554 (2014).
Fujii, T. et al. Mutation-enrichment next-generation sequencing for quantitative detection of KRAS mutations in urine cell-free DNA from patients with advanced cancers. Clin. Cancer Res. 23, 3657–3666 (2017).
Wang, Y. et al. Detection of tumor-derived DNA in cerebrospinal fluid of patients with primary tumors of the brain and spinal cord. Proc. Natl Acad. Sci. USA 112, 9704–9709 (2015).
Kinde, I. et al. Evaluation of DNA from the Papanicolaou test to detect ovarian and endometrial cancers. Sci. Transl Med. 5, 167ra4 (2013).
Maritschnegg, E. et al. Lavage of the uterine cavity for molecular detection of Müllerian duct carcinomas: a proof-of-concept study. J. Clin. Oncol. 33, 4293–4300 (2015).
Wang, Y. et al. Detection of somatic mutations and HPV in the saliva and plasma of patients with head and neck squamous cell carcinomas. Sci. Transl Med. 7, 293ra104 (2015).
Sidransky, D. et al. Identification of ras oncogene mutations in the stool of patients with curable colorectal tumors. Science 256, 102–105 (1992).
Aravanis, A. M., Lee, M. & Klausner, R. D. Next-generation sequencing of circulating tumor DNA for early cancer detection. Cell 168, 571–574 (2017).
Armitage, P. & Doll, R. The age distribution of cancer and a multi-stage theory of carcinogenesis. Br. J. Cancer 8, 1–12 (1954).
Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016). A description of the use of a single-strand tag-based error correction technique to identify preneoplastic clones in nearly all adults, which had only 2 years earlier been believed to occur in only a subset of very elderly individuals. It is an important example of how a fundamental biological understanding can change quickly with improved discovery technologies.
Krimmel, J. D. et al. Ultra-deep sequencing detects ovarian cancer cells in peritoneal fluid and reveals somatic TP53 mutations in noncancerous tissues. Proc. Natl Acad. Sci. USA 113, 6005–6010 (2016).
Salk, J. J. et al. Duplex Sequencing detects cancer-associated mutations arising during normal aging: clonal evolution over a century of human lifetime [abstract]. Cancer Res. 77, 3041 (2017).
Jee, J. et al. Rates and mechanisms of bacterial mutagenesis from maximum-depth sequencing. Nature 534, 693–696 (2016).
Maslov, A. Y., Quispe-Tintaya, W., Gorbacheva, T., White, R. R. & Vijg, J. High-throughput sequencing in mutation detection: a new generation of genotoxicity tests? Mutat. Res. 776, 136–143 (2015).
Fielden, M. R. et al.Modernizing human cancer risk assessment of therapeutics. Trends Pharmacol. Sci. https://doi.org/10.1016/j.tips.2017.11.005 (2017).
Kim, D., Kim, S., Kim, S., Park, J. & Kim, J.-S. Genome-wide target specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome Res. 26, 406–415 (2016).
Caperton, L. et al. Assisted reproductive technologies do not alter mutation frequency or spectrum. Proc. Natl Acad. Sci. USA 104, 5085–5090 (2007).
Nelson, J. L. The otherness of self: microchimerism in health and disease. Trends Immunol. 33, 421–427 (2012).
Eun, J. K., Guthrie, K. A., Zirpoli, G. & Gadi, V. K. In situ breast cancer and microchimerism. Sci. Rep. 3, 2192 (2013).
Fan, H. C., Blumenfeld, Y. J., Chitkara, U., Hudgins, L. & Quake, S. R. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc. Natl Acad. Sci. USA 105, 16266–16271 (2008).
Chiu, R. W. K. et al. Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study. BMJ 342, c7401 (2011).
Bianchi, D. W. et al. Noninvasive prenatal testing and incidental detection of occult maternal malignancies. JAMA 314, 162–169 (2015).
Jamuar, S. S. & Walsh, C. A. Somatic mutations in cerebral cortical malformations. N. Engl. J. Med. 371, 2038–2038 (2014).
Poduri, A., Evrony, G. D., Cai, X. & Walsh, C. A. Somatic mutation, genomic variation, and neurological disease. Science 341, 1237758–1237758 (2013).
De Vlaminck, I. et al. Circulating cell-free DNA enables noninvasive diagnosis of heart transplant rejection. Sci. Transl Med. 6, 241ra77 (2014).
Shugay, M. et al. Towards error-free profiling of immune repertoires. Nat. Methods 11, 653–655 (2014).
DeWitt, W. S. et al. Dynamics of the cytotoxic T cell response to a model of acute viral infection. J. Virol. 89, 4517–4526 (2015).
Hsu, M. S. et al. TCR sequencing can identify and track glioma-infiltrating T cells after DC vaccination. Cancer Immunol. Res. 4, 412–418 (2016).
Tumeh, P. C. et al. PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature 515, 568–571 (2014).
Goodnow, C. C. Multistep pathogenesis of autoimmune disease. Cell 130, 25–35 (2007).
Qian, J. et al. B cell super-enhancers and regulatory clusters recruit AID tumorigenic activity. Cell 159, 1524–1537 (2014).
Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
Lynch, S. V. & Pedersen, O. The human intestinal microbiome in health and disease. N. Engl. J. Med. 375, 2369–2379 (2016).
Van de Wiele, T., Van Praet, J. T., Marzorati, M., Drennan, M. B. & Elewaut, D. How the microbiota shapes rheumatic diseases. Nat. Rev. Rheumatol. 12, 398–411 (2016).
Rosenbaum, M., Knight, R. & Leibel, R. L. The gut microbiota in human energy homeostasis and obesity. Trends Endocrinol. Metab. 26, 493–501 (2015).
Alexander, J. L. et al. Gut microbiota modulation of chemotherapy efficacy and toxicity. Nat. Rev. Gastroenterol. Hepatol. 14, 356–365 (2017).
Vindigni, S. M. & Surawicz, C. M. Fecal microbiota transplantation. Gastroenterol. Clin. North Am. 46, 171–185 (2017).
Dominguez-Bello, M. G. et al. Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer. Nat. Med. 22, 250–253 (2016).
Roach, D. J. et al. A year of infection in the intensive care unit: prospective whole genome sequencing of bacterial clinical isolates reveals cryptic transmissions and novel microbiota. PLOS Genet. 11, e1005413 (2015).
Cummings, L. A. et al. Clinical next generation sequencing outperforms standard microbiological culture for characterizing polymicrobial samples. Clin. Chem. 62, 1465–1473 (2016).
Grumaz, S. et al. Next-generation sequencing diagnostics of bacteremia in septic patients. Genome Med. 8, 73 (2016).
Kim, S. et al. High-throughput automated microfluidic sample preparation for accurate microbial genomics. Nat. Commun. 8, 13919 (2017).
Acevedo, A., Brodsky, L. & Andino, R. Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature 505, 686–690 (2014).
Eigen, M. The concept of the quasispecies will soon be 50 years old. Introduction. Curr. Top. Microbiol. Immunol. 392, vii (2016).
Henn, M. R. et al. Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLOS Pathog. 8, e1002529 (2012).
Solmone, M. et al. Use of massively parallel ultradeep pyrosequencing to characterize the genetic diversity of hepatitis B virus in drug-resistant and drug-naive patients and to detect minor variants in reverse transcriptase and hepatitis B S antigen. J. Virol. 83, 1718–1726 (2009).
Svarovskaia, E. S., Martin, R., McHutchison, J. G., Miller, M. D. & Mo, H. Abundant drug-resistant NS3 mutants detected by deep sequencing in hepatitis C virus-infected patients undergoing NS3 protease inhibitor monotherapy. J. Clin. Microbiol. 50, 3267–3274 (2012).
Daum, L. T. et al. Next-generation ion torrent sequencing of drug resistance mutations in Mycobacterium tuberculosis strains. J. Clin. Microbiol. 50, 3831–3837 (2012).
Katz, M., Hover, B. & Brady, S. Culture-independent discovery of natural products from soil metagenomes. J. Ind. Microbiol. Biotechnol. 43, 129–141 (2016).
Bassil, N. M., Bryan, N. & Lloyd, J. R. Microbial degradation of isosaccharinic acid at high pH. ISME J. 9, 310–320 (2015).
Yamamoto, S. et al. Environmental DNA metabarcoding reveals local fish communities in a species-rich coastal sea. Sci. Rep. 7, 40368 (2017).
Mayo, B. et al. Impact of next generation sequencing techniques in food microbiology. Curr. Genom. 15, 293–309 (2014).
Jäger, A. C. et al. Developmental validation of the MiSeq FGx Forensic Genomics System for targeted next generation sequencing in forensic DNA casework and database laboratories. Forensic Sci. Int. Genet. 28, 52–70 (2017).
Stiller, M. et al. Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc. Natl Acad. Sci. USA 103, 13578–13584 (2006).
Avery, O. T., MacLeod, C. M. & McCarty, M. Studies on the chemical nature of the substance inducing transformation of pneumococcal types: induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J. Exp. Med. 79, 137–158 (1944).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Mostovoy, Y. et al. A hybrid approach for de novo human genome sequence assembly and phasing. Nat. Methods 13, 587–590 (2016).
Bickhart, D. M. et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat. Genet. 49, 643–650 (2017).
King, D. A. et al. Mosaic structural variation in children with developmental disorders. Hum. Mol. Genet. 24, 2733–2745 (2015).
Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).
Vitak, S. A. et al. Sequencing thousands of single-cell genomes with combinatorial indexing. Nat. Methods 14, 302–308 (2017).
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Rosenberg, A. B. et al. Scaling single cell transcriptomics through split pool barcoding. Preprint at bioRxiv https://doi.org/10.1101/105163 (2017).
Ullal, A. V. et al. Cancer cell profiling by barcoding allows multiplexed protein analysis in fine-needle aspirates. Sci. Transl Med. 6, 219ra9 (2014).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Sun, W.-J. et al. RMBase: a resource for decoding the landscape of RNA modifications from high-throughput sequencing data. Nucleic Acids Res. 44, D259–D265 (2016).
Wellcome Collection. Charles Robert Darwin. Photograph by L. Darwin. Wellcome Trust https://wellcomecollection.org/works/s6x9wbsj?page=1&query=darwin (2016).
The authors thank R. Risques, J. Hiatt and A. Boswell for critical review; E. Fox and E. H. Ahn for contributions to early drafts; K. Loubet-Senear, R. Risques and M. Emond for graphics ideas; N. Homer and C. Valentine for software information and members of the Loeb, Kennedy and Risques laboratories at the University of Washington for many lively discussions. This work was supported by National Institutes of Health grants T32CA009515 (J.J.S.) and R01CA193649, P01CA77852, and R33CA181771 (L.A.L.).
J.J.S., M.W.S. and L.A.L. are equity holders in TwinStrand Biosciences, Inc.
- Clonal
Describes a genetic variant or mutation that is present in all or most molecules in the population being sequenced. The term typically implies that the variant arose in a common ancestor, such as a fertilized egg in the case of germline variation or the earliest founder cell of a tumour.
- Subclonal
Describes a genetic variant or mutation that is present in only a subset of the molecules being sequenced. Such a variant may be carried by a subpopulation that arose and expanded within a larger population, or may reflect the mixing of two or more distinct populations.
- Sequencing accuracy
The number of errors made per base pair sequenced. It may be stratified by subtype of error, such as a specific type of base substitution.
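As a rough illustration of how sequencing accuracy can be stratified by substitution type, the Python sketch below assumes a hypothetical input of (reference base, observed base) pairs taken from positions whose true sequence is known, for example a well-characterized control sample. Each substitution rate uses the number of sequenced positions carrying that reference base as its denominator, which is one common convention rather than the only one.

```python
from collections import Counter


def error_rates_by_substitution(pairs):
    """Overall and per-substitution error rates from (reference, observed) base pairs.

    Input format is hypothetical: each pair is one sequenced base at a
    position whose true (reference) base is assumed to be known.
    """
    ref_counts = Counter()  # sequenced positions per reference base
    errors = Counter()      # mismatches keyed by (reference, observed)
    for ref, obs in pairs:
        ref_counts[ref] += 1
        if obs != ref:
            errors[(ref, obs)] += 1

    total = sum(ref_counts.values())
    overall = sum(errors.values()) / total
    # e.g. the C>T rate is C>T mismatches divided by all sequenced C positions
    by_type = {f"{r}>{o}": n / ref_counts[r] for (r, o), n in errors.items()}
    return overall, by_type


# Usage: 200 sequenced bases containing a single C>T error
pairs = [("C", "T")] + [("C", "C")] * 99 + [("A", "A")] * 100
overall, by_type = error_rates_by_substitution(pairs)
print(overall, by_type)  # 0.005 {'C>T': 0.01}
```

The same single error is 1 in 200 overall but 1 in 100 for C positions, which is why stratified rates can reveal error modes that an aggregate figure hides.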
- Sequencing sensitivity
The ability to detect a variant at a particular variant allele frequency. This depends on both the sequencing accuracy and the number of independent DNA molecules successfully sequenced that include the genomic position (or positions) of interest.
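Sensitivity can be reasoned about with a simple binomial model. The sketch below assumes that each independently sequenced molecule carries the variant with probability equal to the VAF, and that a call requires a hypothetical minimum number of supporting molecules (`min_supporting`). It also shows why accuracy matters: the expected number of error-derived observations at a position scales with molecular depth times the per-base error rate.

```python
from math import comb


def detection_probability(molecular_depth, vaf, min_supporting=1):
    """P(at least `min_supporting` of `molecular_depth` molecules carry the variant).

    Simple binomial model: each independently sequenced molecule carries the
    variant with probability `vaf`. Sequencing errors are ignored here.
    """
    p_fewer = sum(
        comb(molecular_depth, k) * vaf**k * (1 - vaf) ** (molecular_depth - k)
        for k in range(min_supporting)
    )
    return 1 - p_fewer


def expected_false_calls(molecular_depth, error_rate):
    """Expected number of error-derived variant observations at one position."""
    return molecular_depth * error_rate


print(round(detection_probability(1000, 0.001), 3))  # 0.632
print(expected_false_calls(1000, 1e-3))              # about 1 false observation
```

For example, at a VAF of 1 in 1,000, sampling 1,000 molecules gives roughly a 63% chance of seeing even one mutant molecule, while a raw error rate of 1 in 1,000 would be expected to produce about one false observation at the same depth, so the true variant cannot be distinguished from noise.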
- Variant allele frequency (VAF)
The fraction of all molecules being sequenced that carry a specific genetic change or mutation at a particular genomic position.
- Digital PCR
DNA amplification carried out in single-molecule reaction chambers. Recent implementations most often use microscopic aqueous droplets immersed in oil. When DNA input is sufficiently low, most chambers receive either no template or a single template molecule. When allele-specific amplification conditions are used, the number of droplets that successfully amplify can be digitally tabulated to determine the variant allele frequency.
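Because droplets are loaded at random, some receive more than one template even at low input. A standard way to account for this is a Poisson correction, lambda = -ln(1 - p), where p is the fraction of positive partitions. The sketch below applies it to hypothetical mutant-positive and wild-type-positive counts from a two-probe assay, under the simplifying assumption that the two alleles are counted independently across the same set of partitions.

```python
import math


def copies_per_partition(positive, total):
    """Poisson estimate of mean template copies per partition: lambda = -ln(1 - p)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; saturated assays cannot be quantified")
    return -math.log(1 - positive / total)


def ddpcr_vaf(mutant_positive, wildtype_positive, total_partitions):
    """Variant allele frequency from mutant- and wild-type-positive partition counts."""
    lam_mut = copies_per_partition(mutant_positive, total_partitions)
    lam_wt = copies_per_partition(wildtype_positive, total_partitions)
    lam_total = lam_mut + lam_wt
    if lam_total == 0:
        raise ValueError("no positive partitions for either allele")
    return lam_mut / lam_total


# Hypothetical run: 20,000 droplets, 20 mutant-positive, 8,000 wild-type-positive
print(f"{ddpcr_vaf(20, 8000, 20000):.5f}")  # roughly 0.002, i.e. about 0.2%
```

Without the correction, the raw ratio of positive droplets (20/8,020, about 0.25%) would overstate the VAF, because many wild-type-positive droplets contain more than one wild-type template.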
- Polony
A population of identical amplification copies that originated from a single founder molecule and are spatially colocalized, such as on the surface of a microbead or as a spot on a surface. It is the biochemical analogue of a bacterial colony on a Petri dish.
- Tag-based error correction
Also known as consensus sequencing, an approach for error correction whereby individual DNA molecules are uniquely labelled before amplification and sequencing, and the sequences of the related derivative copies are then compared with each other to exclude errors.
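The collapsing step can be sketched in a few lines of Python. This is a minimal illustration, not any specific published pipeline: it assumes reads are already aligned, of equal length and paired with their barcode, and the family-size and agreement thresholds (`min_family_size`, `min_agreement`) are hypothetical parameters. Positions without sufficient agreement are masked as N rather than guessed.

```python
from collections import Counter, defaultdict


def consensus_reads(reads, min_family_size=3, min_agreement=0.7):
    """Collapse (barcode, sequence) reads into one consensus sequence per barcode.

    Assumes reads within a family are aligned and of equal length. The
    thresholds are illustrative, not recommended values.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to tell copying errors from true variants
        bases = []
        for column in zip(*seqs):  # one column per sequence position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus


reads = [
    ("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACTT"), ("AAA", "ACGT"),
    ("CCC", "ACGA"), ("CCC", "ACGA"),
]
print(consensus_reads(reads))  # {'AAA': 'ACGT'}: the lone T is voted out; CCC is too small
```

Raising `min_family_size` discards more molecules but makes each surviving consensus more reliable.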
- Short-read platforms
Next-generation sequencing systems that generate reads that are dozens to several hundreds of nucleotides in length, for example, the current Illumina and Thermo Fisher Scientific Ion Torrent platforms and previously manufactured Roche 454 and ABI SOLiD platforms. Current versions sequence amplified polonies, not single molecules.
- Long-read platforms
Next-generation sequencing systems that generate reads that are thousands to tens of thousands of nucleotides in length. Current examples are the Pacific Biosciences (PacBio) and Oxford Nanopore Technologies platforms, which sequence single molecules rather than polonies and therefore have higher raw error rates than short-read platforms.
- Molecular barcode
Also known as a unique molecular identifier (UMI). A short DNA sequence, drawn from a diverse set, that is affixed to only one or a small subset of the individual DNA molecules within a sample. The purpose is to uniquely label single molecules for consensus-based error correction or molecular counting. Barcodes may be informatically combined with each molecule's fragmentation points for greater label diversity.
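To illustrate how fragmentation points add label diversity, the sketch below groups hypothetical aligned reads either by UMI alone or by UMI plus the mapped start and end of the original fragment. The `Read` record and its fields are assumptions for illustration; real pipelines typically also take strand and read orientation into account.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for an aligned read."""
    umi: str
    chrom: str
    start: int  # mapped start of the original fragment
    end: int    # mapped end of the original fragment


def umi_only_key(read):
    return read.umi


def umi_and_ends_key(read):
    # Random fragmentation points act as an additional label at no extra cost.
    return (read.umi, read.chrom, read.start, read.end)


def count_families(reads, key):
    """Number of distinct molecule families under a given keying scheme."""
    return len({key(r) for r in reads})


reads = [
    Read("ACGTTGCA", "chr1", 1000, 1180),
    Read("ACGTTGCA", "chr1", 1000, 1180),  # PCR duplicate of the read above
    Read("ACGTTGCA", "chr1", 1032, 1205),  # different molecule, same UMI by chance
]
print(count_families(reads, umi_only_key))      # 1
print(count_families(reads, umi_and_ends_key))  # 2
```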
- Index sequence
A particular DNA nucleotide code affixed to all molecules within a given DNA sample that is used for multiplexing samples on a single sequencer run.
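Demultiplexing by index sequence can be sketched as assigning each read to the sample whose index it matches. The version below is an illustrative assumption, not any vendor's algorithm: it tolerates up to `max_mismatches` differences and leaves a read unassigned when no index, or more than one index, lies within that distance. The sample names and index sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed_index, sample_indices, max_mismatches=1):
    """Return the single sample whose index is within `max_mismatches`, else None."""
    hits = [
        sample
        for sample, index in sample_indices.items()
        if len(index) == len(observed_index)
        and hamming(index, observed_index) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None  # None: unassigned or ambiguous


samples = {"sample_1": "ACGTAC", "sample_2": "TGCATG"}  # hypothetical indices
print(assign_sample("ACGTAA", samples))  # sample_1 (one mismatch tolerated)
print(assign_sample("GGGGGG", samples))  # None
```

Keeping indices far apart in sequence space is what allows a mismatch to be tolerated without ambiguity.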
- Sequencing depth
The number of sequencing reads that include a particular genomic position in their sequence. Some may be simply PCR copies of the same molecule.
- Molecular depth
The number of collapsed consensus reads, each derived from an independent DNA molecule, that include a particular genomic position.
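The difference between sequencing depth and molecular depth can be shown with a toy example. The sketch assumes each read has already been assigned a hypothetical family identifier (for example, from its molecular barcode) and covers a half-open interval [start, end) of reference coordinates. Molecular depth is approximated here as the number of distinct families, ignoring families that would fail consensus filtering.

```python
def depths_at(position, reads):
    """Return (sequencing_depth, molecular_depth) at `position`.

    `reads` holds hypothetical (family_id, start, end) tuples, where family_id
    identifies the source molecule and [start, end) is the covered interval.
    """
    covering = [family for family, start, end in reads if start <= position < end]
    return len(covering), len(set(covering))


reads = [
    ("mol1", 0, 100), ("mol1", 0, 100), ("mol1", 0, 100),  # three PCR copies
    ("mol2", 50, 150),
    ("mol3", 120, 200),
]
print(depths_at(60, reads))   # (4, 2): four reads, but only two molecules
print(depths_at(130, reads))  # (2, 2)
```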
- Tag clashes
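The occurrence of two independent molecules being identically labelled by random chance. This may happen if the diversity of the applied molecular barcodes is too low for the number of DNA molecules sequenced. Because the copies of both molecules are then collapsed into a single family, a true mutation carried by only one of them may appear as a minority variant and be erroneously excluded.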
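The risk of tag clashes follows birthday-problem arithmetic. Assuming barcodes are drawn uniformly at random from `barcode_diversity` possibilities and that the barcode alone defines a family, each of the n(n - 1)/2 pairs of molecules clashes with probability 1/D. Real barcode pools are rarely perfectly uniform, so these figures are best read as optimistic lower bounds.

```python
import math


def expected_clashing_pairs(n_molecules, barcode_diversity):
    """Expected number of molecule pairs sharing a barcode (uniform random labels)."""
    return math.comb(n_molecules, 2) / barcode_diversity


def prob_any_clash(n_molecules, barcode_diversity):
    """Probability that at least two molecules share a barcode."""
    if n_molecules > barcode_diversity:
        return 1.0  # pigeonhole: a clash is guaranteed
    # P(no clash) = prod_{i < n} (1 - i / D), summed in log space for stability
    log_p_none = sum(math.log1p(-i / barcode_diversity) for i in range(n_molecules))
    return 1 - math.exp(log_p_none)


print(round(expected_clashing_pairs(10_000, 4**8)))       # 763 with 8-nt barcodes
print(round(expected_clashing_pairs(10_000, 4**12), 1))   # 3.0 with 12-nt barcodes
```

For example, 10,000 molecules labelled with random 8-nucleotide barcodes (4^8 = 65,536 possibilities) would be expected to contain roughly 760 clashing pairs, whereas 12-nucleotide barcodes (about 16.8 million possibilities) would reduce this to about 3.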
- False families
Sets of related molecules in which an error during amplification has mutated the shared tag sequence, making it erroneously appear that two independent molecules, rather than one, gave rise to them.
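One common way to reduce false families is to fold a rare barcode into a much more abundant barcode that differs from it by a single base. The sketch below is a simplified version of the "directional" rule used by the UMI-tools package, in which a barcode seen n times is merged into a neighbour whose count is at least 2n - 1. It compares only against barcodes already accepted as independent families and does not follow chains of merges, so it is an illustration under those assumptions, not a reimplementation of that tool.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def merge_false_families(umi_counts, max_distance=1):
    """Map each UMI to a representative UMI, folding likely error-derived copies.

    Simplified "directional"-style rule (assumption): a UMI seen n times is
    merged into an already-accepted UMI within `max_distance` mismatches whose
    count is at least 2n - 1. Chains of merges are not followed.
    """
    representative = {}
    accepted = []  # UMIs kept as independent families, most abundant first
    for umi, count in sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parent = next(
            (
                p
                for p in accepted
                if len(p) == len(umi)
                and hamming(p, umi) <= max_distance
                and umi_counts[p] >= 2 * count - 1
            ),
            None,
        )
        if parent is None:
            accepted.append(umi)
            representative[umi] = umi
        else:
            representative[umi] = parent
    return representative


counts = {"AAAA": 50, "AAAT": 2, "CCCC": 40, "CCCG": 30}  # reads per UMI at one locus
print(merge_false_families(counts))
# AAAT folds into AAAA; CCCG is too abundant to be an error copy of CCCC
```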
- Consensus-making efficiency
The number of raw sequencing reads that are required to form a consensus read. This typically refers to an average: total raw reads divided by total consensus reads.
- Molecular conversion efficiency
The fraction of inputted DNA molecules of interest that are recovered as consensus sequences. This is often described in terms of genome-equivalents.
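Both efficiency metrics reduce to simple ratios. The sketch below assumes roughly 3.3 pg of DNA per haploid human genome, so that each haploid genome-equivalent contributes one copy of a single-copy locus; the input mass, read counts and molecular depth in the usage lines are hypothetical.

```python
HAPLOID_HUMAN_GENOME_PG = 3.3  # assumed mass of one haploid human genome (picograms)


def consensus_making_efficiency(raw_reads, consensus_reads):
    """Average number of raw reads consumed per consensus read."""
    return raw_reads / consensus_reads


def haploid_genome_equivalents(input_ng):
    """Approximate haploid genome copies in `input_ng` nanograms of human DNA."""
    return input_ng * 1000 / HAPLOID_HUMAN_GENOME_PG


def molecular_conversion_efficiency(molecular_depth, input_ng):
    """Fraction of input copies of a single-copy locus recovered as consensus reads."""
    return molecular_depth / haploid_genome_equivalents(input_ng)


print(consensus_making_efficiency(2_000_000, 100_000))     # 20.0 raw reads per consensus read
print(round(haploid_genome_equivalents(10)))                # 3030
print(round(molecular_conversion_efficiency(1500, 10), 3))  # 0.495
```

For example, 10 ng of human DNA corresponds to roughly 3,000 haploid genome-equivalents, so a molecular depth of 1,500 at a locus would imply a conversion efficiency of about 50%.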
- Aneuploidy
An abnormal number of chromosomes in a cell. This may be constitutional (present from conception), such as trisomy 21, the basis of Down syndrome, or somatically acquired, such as in cancer.
- Metagenomics
The study of complex microbial populations encompassing many co-mingling species that form an ecosystem, for example, an individual's gut microbiota.
- Phasing
The proper assignment of two or more variants at spatially distant genomic locations to the nucleic acid molecule from which they derive, for example, the maternal or paternal allele.
Cite this article
Salk, J., Schmitt, M. & Loeb, L. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat Rev Genet 19, 269–285 (2018). https://doi.org/10.1038/nrg.2017.117