Random access DNA memory using Boolean search in an archival file storage system


DNA is an ultrahigh-density storage medium that could meet exponentially growing worldwide demand for archival data storage if DNA synthesis costs declined sufficiently and if random access of files within exabyte-to-yottabyte-scale DNA data pools were feasible. Here, we demonstrate a path to overcome the second barrier by encapsulating data-encoding DNA file sequences within impervious silica capsules that are surface labelled with single-stranded DNA barcodes. Barcodes are chosen to represent file metadata, enabling sets of files to be selected directly with Boolean logic, without amplification. We demonstrate random access of image files from a prototypical 2-kilobyte image database using fluorescence sorting with a selection sensitivity of one in 10^6 files, which enables one-in-10^(6N) selection capability using N optical channels. Our strategy thus offers a scalable concept for random access of archival files in large-scale molecular datasets.
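The scaling claim above can be made concrete with a short calculation. The following Python sketch assumes, hypothetically, that "one in 10^6" is the probability that a non-target capsule passes a single optical channel's sorting gate and that the N channels act independently, so their pass rates multiply. The function names are illustrative only, and the model ignores sorting losses and correlated errors.

```python
from fractions import Fraction


def combined_selectivity(per_channel: Fraction, channels: int) -> Fraction:
    """Probability that a non-target capsule passes all `channels` gates.

    Hypothetical model: each optical channel independently lets a non-target
    capsule through with probability `per_channel`, so the rates multiply.
    """
    if channels < 1:
        raise ValueError("need at least one optical channel")
    return per_channel ** channels


def expected_false_positives(pool_size: int, per_channel: Fraction,
                             channels: int) -> Fraction:
    """Expected number of non-target capsules selected from `pool_size`."""
    return pool_size * combined_selectivity(per_channel, channels)


# One in 10**6 per channel, as stated for a single fluorescence channel.
PER_CHANNEL = Fraction(1, 10**6)

for n in (1, 2, 3):
    rate = combined_selectivity(PER_CHANNEL, n)
    print(f"N={n}: one in {rate.denominator:,} non-target capsules passes")
```

Under these assumptions, three channels would leave on the order of one stray capsule in a pool of 10^18 non-target capsules, consistent with the one-in-10^(6N) figure. Exact `Fraction` arithmetic is used so that large powers of ten are not rounded.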

Fig. 1: Write–access–read cycle for a content-addressable molecular file system.
Fig. 2: Encapsulation of DNA plasmids into silica and surface barcoding.
Fig. 3: Single-barcode sorting.
Fig. 4: Fundamental Boolean logic gates.
Fig. 5: Arbitrary logic searching.
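The Boolean gates and arbitrary logic searches named in Figs. 4 and 5 can be mimicked in software to show what a metadata query selects. The minimal Python sketch that follows assumes each file capsule carries a set of barcode names and evaluates AND, OR and NOT queries against those sets. `FileCapsule`, `matches`, `select` and the barcode names are hypothetical, and the sketch models only the logic, not how fluorescent probes, optical channels or sorting gates physically realize it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple, Union

# A query is a barcode name (str) or a tuple: ("AND", q1, q2, ...),
# ("OR", q1, q2, ...) or ("NOT", q).
Query = Union[str, Tuple]


@dataclass(frozen=True)
class FileCapsule:
    """Hypothetical model of one encapsulated file and its surface barcodes."""
    name: str
    barcodes: FrozenSet[str]


def matches(query: Query, barcodes: FrozenSet[str]) -> bool:
    """Return True if a capsule carrying `barcodes` satisfies `query`."""
    if isinstance(query, str):
        # A bare barcode is true when the capsule carries it.
        return query in barcodes
    op, *args = query
    if op == "AND":
        return all(matches(a, barcodes) for a in args)
    if op == "OR":
        return any(matches(a, barcodes) for a in args)
    if op == "NOT":
        if len(args) != 1:
            raise ValueError("NOT takes exactly one operand")
        return not matches(args[0], barcodes)
    raise ValueError(f"unknown operator: {op!r}")


def select(pool: Iterable[FileCapsule], query: Query) -> List[str]:
    """Names of all capsules in `pool` that satisfy `query`, in pool order."""
    return [f.name for f in pool if matches(query, f.barcodes)]


# Hypothetical metadata barcodes for a toy image pool.
pool = [
    FileCapsule("cat.jpg", frozenset({"animal", "mammal", "domestic"})),
    FileCapsule("wolf.jpg", frozenset({"animal", "mammal"})),
    FileCapsule("eagle.jpg", frozenset({"animal", "bird"})),
    FileCapsule("boat.jpg", frozenset({"vehicle"})),
]

print(select(pool, ("AND", "animal", ("NOT", "domestic"))))
# ['wolf.jpg', 'eagle.jpg']
```

Nested tuples keep the query language small; because AND, OR and NOT together are functionally complete, any Boolean expression over the barcodes can be written with these three operators.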

Data availability

Gene sequences and plasmid maps are available from AddGene ( Insert sequences and barcoding sequences are given in Supplementary Tables 1 and 2. All the data files used to generate the plots in this manuscript are available from M.B. upon request.

Code availability

Software for sequence encoding and decoding is publicly available on GitHub (


Acknowledgements

We gratefully acknowledge discussions with C. Leiserson and T. B. Schardl on the scalability and generalizability of our barcoding approach. We thank G. Paradis, M. Jennings and M. Griffin of the Flow Cytometry Core at the Koch Institute at the Massachusetts Institute of Technology (MIT) and P. Rogers of the Flow Cytometry Facility at the Broad Institute of Harvard and MIT for assistance and discussions in developing the flow cytometry workflow. We also thank D. Mankus of the Nanotechnology Materials Core Facility at the Koch Institute at MIT for assistance in the imaging of the particles using the scanning electron microscope and A. Leshinsky of the Biopolymer and Proteomics Core at the Koch Institute at MIT for assistance in mass spectrometry characterization. M.B., J.L.B., T.R.S. and J.B. gratefully acknowledge funding from the Office of Naval Research (N00014-17-1-2609, N00014-16-1-2506, N00014-12-1-0621 and N00014-18-1-2290) and the National Science Foundation (CCF-1564025, CCF-1956054, HDR OAC-1940231 and CBET-1729397). Research was sponsored by the US Army Research Office and accomplished under cooperative agreement W911NF-19-2-0026 for the Institute for Collaborative Biotechnologies. Additional funding to J.B. was provided through a National Science Foundation Graduate Research Fellowship (grant no. 1122374). P.C.B. was supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. C.M.A. was supported by National Institutes of Health grant F32CA236425.

Author information




J.L.B., T.R.S. and M.B. designed the file labelling and selection scheme. J.L.B., T.R.S. and C.M.A. implemented the file selection scheme using fluorescence-activated sorting (FAS). J.B. and T.R.S. developed the encoding scheme and the metadata tagging of the images to DNA. T.R.S. designed the plasmid for encoding images. H.H. and T.R.S. performed the cloning, transformation and purification of the plasmids. J.L.B. synthesized and purified all the TAMRA- and AFDye-647-labelled DNA oligonucleotides, characterized the particles and developed the synthetic route for attaching DNA barcodes to the particle surfaces. J.L.B. also performed the encapsulation, barcoding and sorting of the particles, the reverse encapsulation of the particles after sorting, and desalting. T.R.S., H.H. and M.R. performed the sequencing. J.B. performed the computational validation of the orthogonality of the barcode sequences, and J.L.B. performed the experimental validation of the orthogonality of barcode and probe sequences. J.B. developed the computational workflow to analyse the sequencing data, including statistical analyses. M.B. conceived the file system and supervised the entire project. P.C.B. supervised the FAS selection and the sequencing workflow. All authors analysed the data and contributed equally to the writing of the manuscript.

Corresponding author

Correspondence to Mark Bathe.

Ethics declarations

Competing interests

The Massachusetts Institute of Technology has filed patents covering the encapsulation-based file system (US application number 16/097594) and microfluidics-based storage, access and retrieval of biopolymers using the same file system (US application number 16/012583) on behalf of the inventors (J.L.B., T.R.S., J.B. and M.B.). M.B. is the founder of Cache DNA and is a member of its Scientific Advisory Board. P.C.B. is a member of the Scientific Advisory Board of Cache DNA. H.H., M.R. and C.M.A. declare no competing interests.

Additional information

Peer review information Nature Materials thanks Reinhard Heckel, William L. Hughes and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Sections 1–14, Figs. 1–27 and Tables 1–8.

Reporting Summary

About this article

Cite this article

Banal, J.L., Shepherd, T.R., Berleant, J. et al. Random access DNA memory using Boolean search in an archival file storage system. Nat. Mater. (2021).
