Molecular digital data storage using DNA

Abstract

Molecular data storage is an attractive alternative for dense and durable information storage, which is sorely needed to deal with the growing gap between information production and the ability to store data. DNA is a clear example of effective archival data storage in molecular form. In this Review, we provide an overview of the process, the state of the art in this area and challenges for mainstream adoption. We also survey the field of in vivo molecular memory systems that record and store information within the DNA of living cells, which, together with in vitro DNA data storage, lies at the growing intersection of computer systems and biotechnology.

Fig. 1: Timeline of major published works on digital data storage with DNA.
Fig. 2: Overview of the major steps of digital data storage in DNA.
Fig. 3: Overview of the encoding–decoding process.
Fig. 4: Overview of in vivo strategies for molecular recording and storage of data in DNA.

Acknowledgements

The authors thank S. Yekhanin for input on coding methods and R. Carlson, D. Carmean, G. Seelig, B. Nguyen, L. Organick, Y.-J. Chen, K. Stewart, S. D. Ang, M. Willsey, C. Takahashi and R. Lopez for helpful general discussions on DNA data storage. This work was supported, in part, by sponsored research agreements with Microsoft and Oxford Nanopore Technologies and gifts from Microsoft and DARPA under the Molecular Informatics Program.

Reviewer information

Nature Reviews Genetics thanks R. Heckel and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

All authors contributed to all aspects of the manuscript.

Corresponding author

Correspondence to Luis Ceze.

Ethics declarations

Competing interests

L.C. is a consultant to Microsoft and a Venture Partner at Madrona Venture Group. K.S. is employed by Microsoft. J.N. is a consultant to Oxford Nanopore Technologies.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Archival storage

A method of retaining information outside of the internal memory of a computer.

Random access

The ability to select a portion of the data stored and thus avoid the need to read all the data in storage.

Erasures

The removal of writing, recorded material or data.

Error correcting codes

Codes produced by mathematically transforming data so that errors introduced while bits are stored, transmitted and so on can be corrected. The process typically involves computing a summary of the data, storing and/or transmitting it with the data, and using this redundant information to correct errors. An inner code is applied within a single strand to correct local errors. An outer code adds whole new strands to deal with errors that inner codes do not cover, for example, erasures.
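
The inner and outer code roles can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the scheme used in any of the works discussed: the inner code here is a CRC32 checksum that only detects a damaged strand (turning it into an erasure) rather than correcting it, the outer code is a single XOR parity strand that can repair exactly one erasure, and strands are modelled as equal-length byte strings rather than nucleotide sequences. All function names are hypothetical.

```python
import zlib
from functools import reduce


def add_inner_code(payload: bytes) -> bytes:
    """Inner code (toy): append a 4-byte CRC32 summary to one strand's payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_inner_code(strand: bytes):
    """Return the payload if its CRC matches, otherwise None (an erasure)."""
    payload, crc = strand[:-4], strand[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-or of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(payloads):
    """Outer code (toy): add one parity strand, the XOR of all data payloads."""
    parity = reduce(xor_bytes, payloads)
    return [add_inner_code(p) for p in list(payloads) + [parity]]


def decode(strands):
    """Recover the data payloads, tolerating at most one lost or damaged strand."""
    checked = [check_inner_code(s) if s is not None else None for s in strands]
    missing = [i for i, p in enumerate(checked) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity strand can repair only one erasure")
    if missing:
        # XOR of every surviving strand (data and parity) rebuilds the lost one.
        survivors = [p for p in checked if p is not None]
        checked[missing[0]] = reduce(xor_bytes, survivors)
    return checked[:-1]  # drop the parity strand


data = [b"ACGT", b"TTGA", b"CCAT"]
strands = encode(data)
strands[1] = None  # simulate a strand that was never read back
print(decode(strands) == data)  # True
```

Practical systems generally replace both toy layers with stronger codes that tolerate several errors and erasures at once; the sketch only shows how the two layers divide the work.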

Physical redundancy

The number of copies of each DNA species stored. Physical redundancy is not always available in the referenced work in Table 1, so we used the sequencing coverage as an upper bound for this number.

Logical ‘exclusive-or’ operation

A logic operation that outputs true only when its inputs differ (that is, 0 xor 0 = 0; 0 xor 1 = 1; 1 xor 0 = 1; and 1 xor 1 = 0).

Logical density

The number of bits per nucleotide in the DNA sequences produced by the encoder.
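
As a worked illustration of this definition, the Python sketch below computes logical density for a naive encoder that maps every two bits to one nucleotide. The mapping (00 to A, 01 to C, 10 to G, 11 to T), the function names and the 4-nucleotide address prefix are all assumptions made for the example; they are not taken from any encoder described in the article.

```python
from typing import Sequence

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def encode_2bit(data: bytes) -> str:
    """Map each pair of bits, most significant first, to one nucleotide."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def logical_density(num_data_bits: int, strands: Sequence[str]) -> float:
    """Bits of user data per nucleotide across all strands the encoder emits."""
    total_nucleotides = sum(len(s) for s in strands)
    return num_data_bits / total_nucleotides


payload = b"DNA"
strand = encode_2bit(payload)
print(strand)                                            # CACACATGCAAC
print(logical_density(8 * len(payload), [strand]))       # 2.0

# A hypothetical 4-nucleotide address prefix carries no user data,
# so it lowers the logical density of the same payload.
print(logical_density(8 * len(payload), ["ACGT" + strand]))  # 1.5
```

With four bases, 2 bits per nucleotide is the ceiling for this kind of direct mapping; any addressing, redundancy or sequence constraints added by an encoder push the figure below that.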

Access latency

The time needed to retrieve data.

Cite this article

Ceze, L., Nivala, J. & Strauss, K. Molecular digital data storage using DNA. Nat Rev Genet 20, 456–466 (2019). https://doi.org/10.1038/s41576-019-0125-3
