DNA is an ultrahigh-density storage medium that could meet exponentially growing worldwide demand for archival data storage if DNA synthesis costs declined sufficiently and if random access of files within exabyte-to-yottabyte-scale DNA data pools were feasible. Here, we demonstrate a path to overcome the second barrier by encapsulating data-encoding DNA file sequences within impervious silica capsules that are surface labelled with single-stranded DNA barcodes. Barcodes are chosen to represent file metadata, enabling selection of sets of files with Boolean logic directly, without use of amplification. We demonstrate random access of image files from a prototypical 2-kilobyte image database using fluorescence sorting with selection sensitivity of one in 10^6 files, which thereby enables one in 10^(6N) selection capability using N optical channels. Our strategy thereby offers a scalable concept for random access of archival files in large-scale molecular datasets.
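The Boolean selection and the channel scaling described above can be illustrated with a minimal software sketch. It models only the selection logic, not the fluorescence sorting itself, and it is not taken from the authors' software. The names `Capsule`, `select`, `select_any` and `selection_capability`, as well as the metadata labels, are hypothetical. The scaling function assumes each of the N optical channels contributes an independent one-in-10^6 sensitivity, so the combined capability is (10^6)^N = 10^(6N).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capsule:
    """Hypothetical model of one encapsulated file and its surface barcodes."""
    file_id: str
    barcodes: frozenset


def select(pool, require=(), exclude=()):
    """Keep capsules carrying every barcode in `require` (AND)
    and none of the barcodes in `exclude` (NOT)."""
    require, exclude = set(require), set(exclude)
    return [c for c in pool
            if require <= c.barcodes and not (exclude & c.barcodes)]


def select_any(pool, options):
    """Keep capsules carrying at least one barcode in `options` (OR)."""
    options = set(options)
    return [c for c in pool if options & c.barcodes]


def selection_capability(n_channels, per_channel=10**6):
    """One-in-how-many selection capability with N channels,
    assuming independent channels: per_channel ** N."""
    return per_channel ** n_channels


# Illustrative pool; the labels are made up for this example.
pool = [
    Capsule("img_01", frozenset({"cat", "wild"})),
    Capsule("img_02", frozenset({"cat", "domestic"})),
    Capsule("img_03", frozenset({"dog", "domestic"})),
]

# "cat" AND NOT "wild"
print([c.file_id for c in select(pool, require={"cat"}, exclude={"wild"})])
# "dog" OR "wild"
print([c.file_id for c in select_any(pool, {"dog", "wild"})])
# Three channels give a one-in-10^18 selection capability.
print(selection_capability(3))
```

In this sketch each barcode behaves like a set-membership tag, so AND, OR and NOT queries reduce to subset and intersection tests on the barcode set of each capsule.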
Software for sequence encoding and decoding is publicly available on GitHub (https://github.com/lcbb/DNA-Memory-Blocks/).
We gratefully acknowledge discussions with C. Leiserson and T. B. Schardl on the scalability and generalizability of our barcoding approach. We thank G. Paradis, M. Jennings and M. Griffin of the Flow Cytometry Core at the Koch Institute at the Massachusetts Institute of Technology (MIT) and P. Rogers of the Flow Cytometry Facility at the Broad Institute of Harvard and MIT for assistance and discussions in developing the flow cytometry workflow. We also thank D. Mankus of the Nanotechnology Materials Core Facility at the Koch Institute at MIT for assistance in the imaging of the particles using the scanning electron microscope and A. Leshinsky of the Biopolymer and Proteomics Core at the Koch Institute at MIT for assistance in mass spectrometry characterization. M.B., J.L.B., T.R.S. and J.B. gratefully acknowledge funding from the Office of Naval Research (N00014-17-1-2609, N00014-16-1-2506, N00014-12-1-0621 and N00014-18-1-2290) and the National Science Foundation (CCF-1564025, CCF-1956054, HDR OAC-1940231 and CBET-1729397). Research was sponsored by the US Army Research Office and accomplished under cooperative agreement W911NF-19-2-0026 for the Institute for Collaborative Biotechnologies. Additional funding to J.B. was provided through a National Science Foundation Graduate Research Fellowship (grant no. 1122374). P.C.B. was supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. C.M.A. was supported by National Institutes of Health grant F32CA236425.
The Massachusetts Institute of Technology has filed patents covering the encapsulation-based file system (US application number 16/097594) and microfluidics-based storage, access and retrieval of biopolymers using the same file system (US application number 16/012583) on behalf of the inventors (J.L.B., T.R.S., J.B. and M.B.). M.B. is the founder of Cache DNA and is a member of its Scientific Advisory Board. P.C.B. is a member of the Scientific Advisory Board of Cache DNA. H.H., M.R. and C.M.A. declare no competing interests.
Peer review information: Nature Materials thanks Reinhard Heckel, William L. Hughes and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Banal, J.L., Shepherd, T.R., Berleant, J. et al. Random access DNA memory using Boolean search in an archival file storage system. Nat. Mater. 20, 1272–1280 (2021). https://doi.org/10.1038/s41563-021-01021-3