  • Protocol

Reading and writing digital data in DNA

Abstract

Because of its longevity and enormous information density, DNA is considered a promising data storage medium. In this work, we provide instructions for archiving digital information in the form of DNA and for subsequently retrieving it from the DNA. In principle, information can be represented in DNA by simply mapping the digital information to DNA sequences and synthesizing them. However, imperfections in synthesis, sequencing, storage and handling of the DNA introduce errors into the molecules, making error-free information storage challenging. The procedure discussed here enables error-free storage by protecting the information with error-correcting codes. Specifically, in this protocol, we provide the technical details and precise instructions for translating digital information to DNA sequences, physically handling the biomolecules, storing them and subsequently recovering the information by sequencing the DNA. Along with the protocol, we provide computer code that automatically encodes digital information to DNA sequences and decodes the information from DNA back to a digital file. The required software is provided in a GitHub repository. The protocol relies on commercial DNA synthesis and DNA sequencing via Illumina dye sequencing, and requires 1–2 h of preparation time, half a day for sequencing preparation and 2–4 h for data analysis. This protocol focuses on storage scales of ~100 kB to 15 MB, offering an ideal starting point for small experiments. It can be extended to enable higher data volumes and random access to the data, and it can accommodate future sequencing and synthesis technologies by changing the parameters of the encoder/decoder to account for the corresponding error rates.
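The abstract notes that, in principle, digital information can be written to DNA by directly mapping bits to nucleotides. The C++ sketch below shows such a direct mapping, assuming 2 bits per nucleotide and the hypothetical assignment 00→A, 01→C, 10→G, 11→T. It is not the protocol's encoder: it applies no error-correcting code and no sequence constraints (such as limits on homopolymer runs or GC content), which is why a naive mapping alone does not yield error-free storage.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Naive direct mapping, 2 bits per nucleotide (4 nt per byte).
// HYPOTHETICAL assignment: 00->A, 01->C, 10->G, 11->T. No error correction and
// no sequence constraints (homopolymers, GC content) are applied.
std::string bytes_to_dna(const std::vector<std::uint8_t>& data) {
    static const char kBase[4] = {'A', 'C', 'G', 'T'};
    std::string dna;
    dna.reserve(data.size() * 4);
    for (std::uint8_t byte : data) {
        for (int shift = 6; shift >= 0; shift -= 2) {  // most significant bit pair first
            dna.push_back(kBase[(byte >> shift) & 0x3]);
        }
    }
    return dna;
}

// Inverse mapping; rejects strings that are not a whole number of bytes
// or that contain characters other than A, C, G, T.
std::vector<std::uint8_t> dna_to_bytes(const std::string& dna) {
    if (dna.size() % 4 != 0) throw std::invalid_argument("length must be a multiple of 4 nt");
    std::vector<std::uint8_t> data;
    data.reserve(dna.size() / 4);
    for (std::size_t i = 0; i < dna.size(); i += 4) {
        unsigned byte = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            unsigned v = 0;
            switch (dna[i + j]) {
                case 'A': v = 0; break;
                case 'C': v = 1; break;
                case 'G': v = 2; break;
                case 'T': v = 3; break;
                default: throw std::invalid_argument("unexpected character in DNA string");
            }
            byte = (byte << 2) | v;  // append the next 2-bit pair
        }
        data.push_back(static_cast<std::uint8_t>(byte));
    }
    return data;
}
```

For example, the two bytes of "Hi" (0x48, 0x69) map to CAGACGGC under this assignment, and a single substituted nucleotide silently changes the decoded byte, which is the failure mode the error-correcting codes address.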

Fig. 1: Process overview of DNA data storage.
Fig. 2: Reed–Solomon-based error-correcting coding scheme from file to DNA.
Fig. 3: Sequencing preparation scheme from synthesized DNA to DNA ready for Illumina sequencing with quality control.
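Fig. 2 refers to a Reed–Solomon-based error-correcting scheme. To illustrate that building block, the C++ sketch below implements systematic Reed–Solomon encoding over GF(2^8) and syndrome computation for error detection. The field (primitive polynomial 0x11d), the generator roots α^0 to α^(nsym−1) and all function names are assumptions chosen for illustration; the protocol's Supplementary Software uses its own fields and parameters, and a full decoder (error and erasure correction) is omitted here.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// GF(2^8) arithmetic via log/antilog tables.
// ASSUMPTION: primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d).
struct GF256 {
    std::array<std::uint8_t, 512> exp{};  // exp[i] = alpha^i, doubled to avoid mod 255
    std::array<std::uint8_t, 256> log{};  // log[alpha^i] = i (log[0] unused)
    GF256() {
        int x = 1;
        for (int i = 0; i < 255; ++i) {
            exp[i] = static_cast<std::uint8_t>(x);
            log[x] = static_cast<std::uint8_t>(i);
            x <<= 1;                    // multiply by alpha
            if (x & 0x100) x ^= 0x11d;  // reduce modulo the primitive polynomial
        }
        for (int i = 255; i < 512; ++i) exp[i] = exp[i - 255];
    }
    std::uint8_t mul(std::uint8_t a, std::uint8_t b) const {
        if (a == 0 || b == 0) return 0;
        return exp[log[a] + log[b]];
    }
};

// Generator g(x) = (x + alpha^0)(x + alpha^1)...(x + alpha^(nsym-1)), highest degree first.
// In GF(2^m), subtraction equals addition (XOR).
std::vector<std::uint8_t> rs_generator(const GF256& gf, int nsym) {
    std::vector<std::uint8_t> g{1};
    for (int i = 0; i < nsym; ++i) {
        std::vector<std::uint8_t> next(g.size() + 1, 0);
        for (std::size_t j = 0; j < g.size(); ++j) {
            next[j] ^= g[j];                         // term from g(x) * x
            next[j + 1] ^= gf.mul(g[j], gf.exp[i]);  // term from g(x) * alpha^i
        }
        g = next;
    }
    return g;
}

// Systematic encoding: codeword = message followed by the remainder of
// message(x) * x^nsym divided by g(x). Requires msg.size() + nsym <= 255.
std::vector<std::uint8_t> rs_encode(const GF256& gf,
                                    const std::vector<std::uint8_t>& msg, int nsym) {
    const std::vector<std::uint8_t> g = rs_generator(gf, nsym);
    std::vector<std::uint8_t> buf(msg.size() + static_cast<std::size_t>(nsym), 0);
    std::copy(msg.begin(), msg.end(), buf.begin());
    for (std::size_t i = 0; i < msg.size(); ++i) {  // long division by monic g(x)
        const std::uint8_t coef = buf[i];
        if (coef == 0) continue;
        for (std::size_t j = 1; j < g.size(); ++j) buf[i + j] ^= gf.mul(g[j], coef);
    }
    std::copy(msg.begin(), msg.end(), buf.begin());  // restore the systematic part
    return buf;
}

// Syndromes S_i = c(alpha^i), evaluated with Horner's rule; all zero for a valid codeword.
std::vector<std::uint8_t> rs_syndromes(const GF256& gf,
                                       const std::vector<std::uint8_t>& cw, int nsym) {
    std::vector<std::uint8_t> s(static_cast<std::size_t>(nsym), 0);
    for (int i = 0; i < nsym; ++i) {
        std::uint8_t acc = 0;
        for (std::uint8_t c : cw) acc = static_cast<std::uint8_t>(gf.mul(acc, gf.exp[i]) ^ c);
        s[static_cast<std::size_t>(i)] = acc;
    }
    return s;
}
```

A valid codeword has all syndromes equal to zero. Because a Reed–Solomon code with nsym parity symbols has minimum distance nsym + 1, any nonzero error pattern affecting at most nsym symbols produces a nonzero syndrome; correcting such errors additionally requires a decoder, which this sketch does not include.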

Data and code availability

Supplementary Software containing the code version used in the protocol, together with all test data and documentation, can be found in the following GitHub and Figshare repositories: https://github.com/reinhardh/dna_rs_coding and https://doi.org/10.6084/m9.figshare.c.4546937.


Acknowledgements

We thank ICB/ETH Zurich for funding and the Beat Christen Group at ETH for giving access to the iSeq 100 sequencer.

Author information

Contributions

R.N.G. initiated and supervised the project with input from W.J.S. R.H. designed and developed the code and coding scheme. P.L.A. and J.K. performed the experiments. A.X.K., W.D.C. and L.C.M. prepared illustrations. L.C.M., R.H. and R.N.G. wrote the manuscript with input and approval from all authors.

Corresponding authors

Correspondence to Reinhard Heckel or Robert N. Grass.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Grass, R. et al. Angew. Chem. Int. Ed. 54, 2552–2555 (2015): https://doi.org/10.1002/anie.201411378

Chen, W. et al. Adv. Funct. Mater. 29, 1–8 (2019): https://doi.org/10.1002/adfm.201901672

Heckel, R. et al. Sci. Rep. 9, 9663 (2019): https://doi.org/10.1038/s41598-019-45832-6

Supplementary information

Supplementary Manual

README file. Description of the coding scheme, with additional explanations of the coding parameters and examples of how to use the code. Code installation instructions are also given for Windows, Linux and macOS.

Reporting Summary

Supplementary Software

Error-correcting code (C++). Error-correcting scheme for storing information in DNA using Reed–Solomon codes.

Supplementary Data 1

Coding parameters. File to aid parameter selection by choosing the redundancy, file size, number of sequences to be synthesized and sequence length.
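The arithmetic behind choosing the number of sequences for a given file size and redundancy can be sketched as follows. This is a hypothetical helper, not the Supplementary Data 1 file: it assumes a fixed number of payload bytes per sequence and expresses redundancy as the percentage of synthesized sequences that carry parity. The function plan_sequences and its parameters are illustrative names, and the real scheme adds constraints (field sizes, index length, inner-code overhead) that are not modeled here.

```cpp
#include <cstdint>
#include <stdexcept>

// HYPOTHETICAL sizing helper (not the Supplementary Data 1 file).
struct SequenceBudget {
    std::uint64_t k;  // information sequences needed to hold the file
    std::uint64_t n;  // total sequences to synthesize (information + parity)
};

// redundancy_percent: target share of synthesized sequences that carry parity,
// i.e. (n - k) / n >= redundancy_percent / 100.
SequenceBudget plan_sequences(std::uint64_t file_bytes,
                              std::uint64_t payload_bytes_per_sequence,
                              unsigned redundancy_percent) {
    if (payload_bytes_per_sequence == 0 || redundancy_percent >= 100) {
        throw std::invalid_argument("payload must be > 0 and redundancy < 100%");
    }
    // k = ceil(file_bytes / payload); the last sequence may be partly padding.
    const std::uint64_t k =
        (file_bytes + payload_bytes_per_sequence - 1) / payload_bytes_per_sequence;
    // Smallest n with n * (100 - p) >= 100 * k, i.e. n = ceil(100 * k / (100 - p)),
    // computed in integer arithmetic to avoid floating-point rounding.
    const std::uint64_t denom = 100 - redundancy_percent;
    const std::uint64_t n = (100 * k + denom - 1) / denom;
    return {k, n};
}
```

For instance, a 1,000-byte file with 16 payload bytes per sequence and 20% redundancy needs k = 63 information sequences and n = 79 synthesized sequences.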

Supplementary Data 2

Files to be encoded. Sample files to be encoded as an illustrative example of the protocol's procedure; the first five protocols published in Nature Protocols were chosen.

Supplementary Data 3

Output of the decoder with Supplementary Data 2 as input, executed on macOS with the default parameters given in the Anticipated results (K = 32, N = 34, l = 4, nuss = 12, n = 12,472, k = 9,000), resulting in sequences of 102 nt each.
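The default parameters listed for Supplementary Data 3 can be sanity-checked with a few lines of arithmetic. The sketch below assumes, without confirmation from this page, that each inner-code symbol is 6 bits written as three nucleotides (consistent with N = 34 symbols giving 102 nt), and that the outer Reed–Solomon code spans the sequences so that one lost sequence costs one erasure per outer codeword. The struct and function names are hypothetical.

```cpp
// Default parameters quoted for Supplementary Data 3 (l and nuss not modeled).
struct RsParams {
    int K;  // inner code dimension (symbols)
    int N;  // inner code length (symbols per sequence)
    int k;  // outer code dimension (information sequences)
    int n;  // outer code length (sequences synthesized)
};

constexpr RsParams kDefaults{32, 34, 9000, 12472};

// ASSUMPTION: inner-code symbols are 6 bits, written as 3 nt at 2 bits/nt.
constexpr int kBitsPerSymbol = 6;
constexpr int kBitsPerNucleotide = 2;

constexpr int sequence_length_nt(const RsParams& p) {
    return p.N * kBitsPerSymbol / kBitsPerNucleotide;
}

// An MDS (Reed–Solomon) outer code recovers from any n - k erasures; under the
// stated assumption, each lost sequence is one erasure per outer codeword.
constexpr int max_lost_sequences(const RsParams& p) { return p.n - p.k; }

// The inner code has minimum distance N - K + 1, so it corrects floor((N - K) / 2) symbol errors.
constexpr int inner_correctable_errors(const RsParams& p) { return (p.N - p.K) / 2; }

// Combined code rate K/N * k/n, before subtracting the per-sequence index.
constexpr double overall_rate(const RsParams& p) {
    return (static_cast<double>(p.K) / p.N) * (static_cast<double>(p.k) / p.n);
}
```

Under these assumptions, the defaults tolerate the loss of up to n − k = 3,472 of the 12,472 sequences, each inner codeword can correct one symbol error, and the combined rate K/N × k/n is about 0.68 before the index is accounted for.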

About this article

Cite this article

Meiser, L.C., Antkowiak, P.L., Koch, J. et al. Reading and writing digital data in DNA. Nat Protoc 15, 86–101 (2020). https://doi.org/10.1038/s41596-019-0244-5

  • DOI: https://doi.org/10.1038/s41596-019-0244-5
