Highly accurate fluorogenic DNA sequencing with information theory–based error correction

Abstract

Eliminating errors in next-generation DNA sequencing has proved challenging. Here we present error-correction code (ECC) sequencing, a method to greatly improve sequencing accuracy by combining fluorogenic sequencing-by-synthesis (SBS) with an information theory–based error-correction algorithm. ECC embeds redundancy in sequencing reads by creating three orthogonal degenerate sequences, generated by alternate dual-base reactions. This is similar to encoding and decoding strategies that have proved effective in detecting and correcting errors in information communication and storage. We show that, when combined with a fluorogenic SBS chemistry with raw accuracy of 98.1%, ECC sequencing provides single-end, error-free sequences up to 200 bp. ECC approaches should enable accurate identification of extremely rare genomic variations in various applications in biology and medicine.
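
The redundancy described in the abstract can be illustrated with a small sketch. It is not the authors' decoder. It assumes per-base degenerate calls rather than real fluorogenic signals, and it assumes the three dual-base groupings are the three possible ways of splitting {A, C, G, T} into two pairs, labeled here with IUPAC codes (M/K, R/Y, W/S). The names `PARTITIONS`, `encode`, and `decode` are hypothetical. Any two degenerate calls identify a base uniquely, so the third acts as a consistency check that flags an error.

```python
# Hypothetical per-base illustration of three orthogonal degenerate encodings.
# Assumption: the groupings are the three pairings of {A, C, G, T},
# written with IUPAC two-base codes.
PARTITIONS = [
    {"M": "AC", "K": "GT"},
    {"R": "AG", "Y": "CT"},
    {"W": "AT", "S": "CG"},
]


def encode(seq):
    """Return the three degenerate sequences for a DNA string."""
    encoded = []
    for partition in PARTITIONS:
        # Invert the partition: base -> degenerate code.
        lookup = {base: code for code, bases in partition.items() for base in bases}
        encoded.append("".join(lookup[base] for base in seq))
    return encoded


def decode(degenerate_seqs):
    """Decode position by position; emit '?' where the three calls disagree."""
    decoded = []
    for codes in zip(*degenerate_seqs):
        candidates = set("ACGT")
        for partition, code in zip(PARTITIONS, codes):
            candidates &= set(partition[code])
        # Consistent calls leave exactly one base; inconsistent calls leave none.
        decoded.append(candidates.pop() if len(candidates) == 1 else "?")
    return "".join(decoded)


if __name__ == "__main__":
    reads = encode("GATTACA")
    print(reads)
    print(decode(reads))
```

This sketch only detects an inconsistent position. The published method works on dual-base reaction signals and corrects errors with a dynamic programming-based decoding algorithm (Figure 4), which this example does not attempt to reproduce.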

Figure 1: The schematic of the ECC sequencing approach.
Figure 2: Sequencing signal in a typical fluorogenic degenerate sequencing run.
Figure 3: Information communication model for ECC sequencing.
Figure 4: Dynamic programming-based decoding algorithm.
Figure 5: High accuracy of ECC sequencing.

Acknowledgements

The authors thank Y. Men, Z. Yu, H. Qiu, C. Zheng, Y. Fu, X. Zhang, T. Chen, L. Wu, S. Zhang, X. Jiang, J. Bu, P.A. Sims, L.L. Tao, and H. Ge for experimental assistance and discussion. This work was supported by the Ministry of Science and Technology of China (863 Program 2012AA02A101), National Natural Science Foundation of China (21327808 and 21525521), Beijing Municipal Commission of Science and Technology (Z111100059111002), and Beijing Advanced Innovation Center for Genomics.

Author information

Contributions

Y.H. and X.S.X. conceived the project. Y.H. and Z.C. designed the experiment. Z.C., W.Z., S.Q., L.K., and H.D. performed the experiments. All authors analyzed the data and wrote the manuscript.

Corresponding author

Correspondence to Yanyi Huang.

Ethics declarations

Competing interests

Patent applications based on this work have been filed by Peking University and licensed to Cygnus, for which all the authors are consultants and of which Z.C. and H.D. are shareholders.

About this article

Cite this article

Chen, Z., Zhou, W., Qiao, S. et al. Highly accurate fluorogenic DNA sequencing with information theory–based error correction. Nat Biotechnol 35, 1170–1178 (2017). https://doi.org/10.1038/nbt.3982

  • DOI: https://doi.org/10.1038/nbt.3982
