Review Article | Published:

Molecular digital data storage using DNA

Abstract

Molecular data storage is an attractive alternative for dense and durable information storage, which is sorely needed to deal with the growing gap between information production and the ability to store data. DNA is a clear example of effective archival data storage in molecular form. In this Review, we provide an overview of the process, the state of the art in this area and challenges for mainstream adoption. We also survey the field of in vivo molecular memory systems that record and store information within the DNA of living cells, which, together with in vitro DNA data storage, lies at the growing intersection of computer systems and biotechnology.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

The authors thank S. Yekhanin for input on coding methods and R. Carlson, D. Carmean, G. Seelig, B. Nguyen, L. Organick, Y.-J. Chen, K. Stewart, S. D. Ang, M. Willsey, C. Takahashi and R. Lopez for helpful general discussions on DNA data storage. This work was supported, in part, by sponsored research agreements with Microsoft and Oxford Nanopore Technologies and gifts from Microsoft and DARPA under the Molecular Informatics Program.

Reviewer information

Nature Reviews Genetics thanks R. Heckel and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

All authors contributed to all aspects of the manuscript.

Competing interests

L.C. is a consultant to Microsoft and a Venture Partner at Madrona Venture Group. K.S. is employed by Microsoft. J.N. is a consultant to Oxford Nanopore Technologies.

Correspondence to Luis Ceze.

Glossary

Archival storage

A method of retaining information outside of the internal memory of a computer.

Random access

The ability to select a portion of the data stored and thus avoid the need to read all the data in storage.

Erasures

The removal of writing, recorded material or data.

Error correcting codes

The results of mathematical manipulation of data to correct errors inserted in the data as bits are stored, transmitted and so on. The process typically involves computing a summary of the data and storing and/or transmitting it with the data and using the redundant information to correct those errors. An inner code refers to coding within a single strand to correct local errors. An outer code refers to whole new additional strands to deal with errors that are not covered by inner codes, for example, erasures.
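The division of labour between inner and outer codes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the scheme of any cited work: the inner code is a CRC32 checksum, which only detects damage within a strand (a practical inner code would also correct it), and the outer code is a single extra parity strand built with byte-wise XOR, which can repair exactly one erasure. The function names are hypothetical, and strands are modelled as byte strings rather than nucleotide sequences.

```python
import zlib
from functools import reduce


def add_inner_check(payload: bytes) -> bytes:
    """Inner code stand-in: append a CRC32 so damage within one strand is detectable."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_inner(strand: bytes):
    """Return the payload if its CRC matches, otherwise None (treated as an erasure)."""
    payload, crc = strand[:-4], strand[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_outer_parity(payloads):
    """Outer code stand-in: append one extra payload, the XOR of all the others."""
    return list(payloads) + [reduce(xor_bytes, payloads)]


def recover_one_erasure(received):
    """Rebuild a single missing entry (None) from the survivors, then drop the parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity strand can repair only one erasure")
    received = list(received)
    if missing:
        survivors = [p for p in received if p is not None]
        received[missing[0]] = reduce(xor_bytes, survivors)
    return received[:-1]


# Usage: corrupt one strand, let the inner check flag it, let the outer code repair it.
data = [b"ACGT", b"DATA", b"BITS"]
strands = [add_inner_check(p) for p in add_outer_parity(data)]
strands[1] = b"\x00" + strands[1][1:]          # simulate a local error in strand 1
received = [check_inner(s) for s in strands]   # the damaged strand becomes None
restored = recover_one_erasure(received)
```

In this sketch the inner check converts an undetected corruption into a known erasure, which is the case the extra strand is able to handle; tolerating more than one lost strand would require a stronger outer code.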

Physical redundancy

The number of copies of each DNA species stored. Physical redundancy is not always available in the referenced work in Table 1, so we used the sequencing coverage as an upper bound for this number.

Logical ‘exclusive-or’ operation

A logic operation on two inputs that outputs true only when the inputs differ (that is, 0 xor 0 = 0; 0 xor 1 = 1; 1 xor 0 = 1; and 1 xor 1 = 0).

Logical density

The number of bits per nucleotide in the DNA sequences produced by the encoder.
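As a worked illustration of this definition, the following Python sketch computes logical density for a toy encoder. The two-bits-per-nucleotide mapping, the function names and the 4-nucleotide flanking sequences are assumptions chosen for the example; they are not taken from any of the systems discussed in this Review.

```python
# Hypothetical mapping: each pair of bits becomes one nucleotide.
BITS_TO_NT = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode(data: bytes) -> str:
    """Map every byte to four nucleotides using the 2-bit table above."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_NT[bits[i:i + 2]] for i in range(0, len(bits), 2))


def logical_density(data: bytes, strands) -> float:
    """Payload bits divided by the total nucleotides the encoder emitted."""
    total_nt = sum(len(s) for s in strands)
    return 8 * len(data) / total_nt


payload = b"Hi"
bare = encode(payload)                  # payload nucleotides only
flanked = "ACGT" + bare + "TGCA"        # assumed 4-nt flanking sequences as overhead

print(bare)
print(logical_density(payload, [bare]))
print(logical_density(payload, [flanked]))
```

With this toy mapping the bare strand reaches the 2 bits per nucleotide ceiling of a four-letter alphabet, and any nucleotides spent on overhead (here the assumed flanking sequences) lower the figure.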

Access latency

The time needed to retrieve data.

Fig. 1: Timeline of major published works on digital data storage with DNA.
Fig. 2: Overview of the major steps of digital data storage in DNA.
Fig. 3: Overview of the encoding–decoding process.
Fig. 4: Overview of in vivo strategies for molecular recording and storage of data in DNA.