Highly accurate fluorogenic DNA sequencing with information theory–based error correction

Abstract

Eliminating errors in next-generation DNA sequencing has proved challenging. Here we present error-correction code (ECC) sequencing, a method to greatly improve sequencing accuracy by combining fluorogenic sequencing-by-synthesis (SBS) with an information theory–based error-correction algorithm. ECC embeds redundancy in sequencing reads by creating three orthogonal degenerate sequences, generated by alternate dual-base reactions. This is similar to encoding and decoding strategies that have proved effective in detecting and correcting errors in information communication and storage. We show that, when combined with a fluorogenic SBS chemistry with raw accuracy of 98.1%, ECC sequencing provides single-end, error-free sequences up to 200 bp. ECC approaches should enable accurate identification of extremely rare genomic variations in various applications in biology and medicine.
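
The redundancy described above can be illustrated with a minimal Python sketch. It assumes the three degenerate sequences correspond to the three possible ways of splitting the four bases into two pairs (written here with the IUPAC codes M/K, R/Y, and W/S). The function names are hypothetical, and the position-by-position consistency check is a simplification for illustration only; it is not the dynamic programming-based decoding algorithm named in Figure 4.

```python
from itertools import groupby

# The only three ways to split {A, C, G, T} into two pairs (IUPAC codes).
# Assumption: these pairings stand in for the three dual-base reaction sets.
PAIRINGS = {
    "MK": {"A": "M", "C": "M", "G": "K", "T": "K"},
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},
    "WS": {"A": "W", "T": "W", "C": "S", "G": "S"},
}

# Bases covered by each two-base class.
CLASS_BASES = {
    "M": set("AC"), "K": set("GT"),
    "R": set("AG"), "Y": set("CT"),
    "W": set("AT"), "S": set("CG"),
}


def degenerate(seq, pairing):
    """Map every base of seq to its two-base class under one pairing."""
    table = PAIRINGS[pairing]
    return "".join(table[base] for base in seq)


def run_lengths(degenerate_seq):
    """Collapse a degenerate sequence into (class, run length) pairs,
    an idealized stand-in for the signal of alternating dual-base cycles."""
    return [(cls, len(list(group))) for cls, group in groupby(degenerate_seq)]


def decode_position(mk, ry, ws):
    """Return the single base consistent with all three classes,
    or None when the classes contradict each other (an error is detected)."""
    common = CLASS_BASES[mk] & CLASS_BASES[ry] & CLASS_BASES[ws]
    return common.pop() if len(common) == 1 else None


seq = "AACGT"
mk = degenerate(seq, "MK")   # "MMMKK"
ry = degenerate(seq, "RY")   # "RRYRY"
ws = degenerate(seq, "WS")   # "WWSSW"
decoded = "".join(decode_position(*classes) for classes in zip(mk, ry, ws))
```

Any two of the three classes already determine a base, so the third acts as a parity check: `decode_position("M", "R", "S")` returns `None` because no base belongs to all three classes. Real sequencing signals are run lengths per reaction cycle, where a miscounted run shifts every later position, which is why a position-wise check like this one is not sufficient in practice.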

Figure 1: The schematic of the ECC sequencing approach.
Figure 2: Sequencing signal in a typical fluorogenic degenerate sequencing run.
Figure 3: Information communication model for ECC sequencing.
Figure 4: Dynamic programming-based decoding algorithm.
Figure 5: High accuracy of ECC sequencing.

Acknowledgements

The authors thank Y. Men, Z. Yu, H. Qiu, C. Zheng, Y. Fu, X. Zhang, T. Chen, L. Wu, S. Zhang, X. Jiang, J. Bu, P.A. Sims, L.L. Tao, and H. Ge for experimental assistance and discussion. This work was supported by the Ministry of Science and Technology of China (863 Program 2012AA02A101), National Natural Science Foundation of China (21327808 and 21525521), Beijing Municipal Commission of Science and Technology (Z111100059111002), and Beijing Advanced Innovation Center for Genomics.

Author information

Contributions

Y.H. and X.S.X. conceived the project. Y.H. and Z.C. designed the experiment. Z.C., W.Z., S.Q., L.K., and H.D. performed the experiments. All authors analyzed the data and wrote the manuscript.

Corresponding author

Correspondence to Yanyi Huang.

Ethics declarations

Competing interests

Patent applications based on this work have been filed by Peking University and licensed to Cygnus. All the authors are consultants of Cygnus, and Z.C. and H.D. are shareholders.

Supplementary information

Supplementary Note

Supplementary Information (PDF 13087 kb)

Life Sciences Reporting Summary (PDF 176 kb)

About this article

Cite this article

Chen, Z., Zhou, W., Qiao, S. et al. Highly accurate fluorogenic DNA sequencing with information theory–based error correction. Nat Biotechnol 35, 1170–1178 (2017). https://doi.org/10.1038/nbt.3982
