Abstract
DNA is an ultrahigh-density storage medium that could meet exponentially growing worldwide demand for archival data storage if DNA synthesis costs declined sufficiently and if random access of files within exabyte-to-yottabyte-scale DNA data pools were feasible. Here, we demonstrate a path to overcome the second barrier by encapsulating data-encoding DNA file sequences within impervious silica capsules that are surface labelled with single-stranded DNA barcodes. Barcodes are chosen to represent file metadata, enabling selection of sets of files with Boolean logic directly, without use of amplification. We demonstrate random access of image files from a prototypical 2-kilobyte image database using fluorescence sorting with selection sensitivity of one in 10^6 files, which thereby enables one in 10^(6N) selection capability using N optical channels. Our strategy thereby offers a scalable concept for random access of archival files in large-scale molecular datasets.
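The Boolean selection and the channel scaling described in the abstract can be illustrated with a short Python sketch. This is an in-silico toy model and not the authors' software or experimental workflow: the `Capsule` class, the `select` and `selection_capability` functions, the file names and the barcode labels are all hypothetical, and the scaling function simply assumes that each optical channel independently contributes a one-in-10^6 sensitivity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capsule:
    """Toy model of one encapsulated DNA file (hypothetical, for illustration)."""
    file_id: str
    barcodes: frozenset  # metadata labels standing in for surface ssDNA barcodes


def select(pool, all_of=(), any_of=(), none_of=()):
    """Return the capsules that satisfy a Boolean query over barcodes.

    all_of  -> AND: every listed barcode must be present
    any_of  -> OR:  at least one listed barcode must be present (if any listed)
    none_of -> NOT: none of the listed barcodes may be present
    """
    all_of, any_of, none_of = set(all_of), set(any_of), set(none_of)
    hits = []
    for capsule in pool:
        if not all_of <= capsule.barcodes:
            continue  # fails the AND condition
        if any_of and not (any_of & capsule.barcodes):
            continue  # fails the OR condition
        if none_of & capsule.barcodes:
            continue  # fails the NOT condition
        hits.append(capsule)
    return hits


def selection_capability(n_channels, per_channel=10**6):
    """One-in-X selectivity, assuming N independent channels that each
    resolve one file in `per_channel` (the 10^(6N) figure in the abstract)."""
    return per_channel ** n_channels


# Hypothetical three-file pool with made-up metadata labels.
pool = [
    Capsule("file_a", frozenset({"cat", "orange", "domestic"})),
    Capsule("file_b", frozenset({"cat", "orange", "wild"})),
    Capsule("file_c", frozenset({"dog", "domestic"})),
]

# Query: "cat" AND NOT "wild"
matches = select(pool, all_of=["cat"], none_of=["wild"])
print([c.file_id for c in matches])
print(selection_capability(3))
```

In this toy model a query is a set operation on labels, whereas the abstract describes the physical counterpart: fluorescent sorting of capsules by their surface barcodes, without amplification.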
Data availability
Gene sequences and plasmid maps are available from AddGene (https://www.addgene.org/browse/article/28206796/). Insert sequences and barcoding sequences are given in Supplementary Tables 1 and 2. All the data files used to generate the plots in this manuscript are available from M.B. upon request.
Code availability
Software for sequence encoding and decoding is publicly available on GitHub (https://github.com/lcbb/DNA-Memory-Blocks/).
Acknowledgements
We gratefully acknowledge discussions with C. Leiserson and T. B. Schardl on the scalability and generalizability of our barcoding approach. We thank G. Paradis, M. Jennings and M. Griffin of the Flow Cytometry Core at the Koch Institute at the Massachusetts Institute of Technology (MIT) and P. Rogers of the Flow Cytometry Facility at the Broad Institute of Harvard and MIT for assistance and discussions in developing the flow cytometry workflow. We also thank D. Mankus of the Nanotechnology Materials Core Facility at the Koch Institute at MIT for assistance in the imaging of the particles using the scanning electron microscope and A. Leshinsky of the Biopolymer and Proteomics Core at the Koch Institute at MIT for assistance in mass spectrometry characterization. M.B., J.L.B., T.R.S. and J.B. gratefully acknowledge funding from the Office of Naval Research (N00014-17-1-2609, N00014-16-1-2506, N00014-12-1-0621 and N00014-18-1-2290) and the National Science Foundation (CCF-1564025, CCF-1956054, HDR OAC-1940231 and CBET-1729397). Research was sponsored by the US Army Research Office and accomplished under cooperative agreement W911NF-19-2-0026 for the Institute for Collaborative Biotechnologies. Additional funding to J.B. was provided through a National Science Foundation Graduate Research Fellowship (grant no. 1122374). P.C.B. was supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. C.M.A. was supported by National Institutes of Health grant F32CA236425.
Author information
Contributions
J.L.B., T.R.S. and M.B. designed the file labelling and selection scheme. J.L.B., T.R.S. and C.M.A. implemented the file selection scheme using fluorescence-activated sorting (FAS). J.B. and T.R.S. developed the encoding scheme and metadata tagging of the images to DNA. T.R.S. designed the plasmid for encoding imaging. H.H. and T.R.S. performed the cloning, transformation and purification of the plasmids. J.L.B. synthesized and purified all the TAMRA- and AFDye-647-labelled DNA oligonucleotides. J.L.B. characterized the particles. J.L.B. developed the synthetic route to attach DNA barcodes on the surface of the particles. J.L.B. performed the encapsulation, barcoding, sorting, reverse encapsulation of the particles after sorting and desalting. T.R.S., H.H. and M.R. performed the sequencing. J.B. performed the computational validation of the orthogonality of the barcode sequences, and J.L.B. performed the experimental validation of the orthogonality of barcode and probe sequences. J.B. developed the computational workflow to analyse the sequencing data, including statistical analyses. M.B. conceived the file system and supervised the entire project. P.C.B. supervised the FAS selection and supervised the sequencing workflow. All authors analysed the data and equally contributed to the writing of the manuscript.
Ethics declarations
Competing interests
The Massachusetts Institute of Technology has filed patents covering the encapsulation-based file system (US application number 16/097594) and microfluidics-based storage, access and retrieval of biopolymers using the same file system (US application number 16/012583) on behalf of the inventors (J.L.B., T.R.S., J.B. and M.B.). M.B. is the founder of Cache DNA and is a member of its Scientific Advisory Board. P.C.B. is a member of the Scientific Advisory Board of Cache DNA. H.H., M.R. and C.M.A. declare no competing interests.
Additional information
Peer review information: Nature Materials thanks Reinhard Heckel, William L. Hughes and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Sections 1–14, Figs. 1–27 and Tables 1–8.
About this article
Cite this article
Banal, J.L., Shepherd, T.R., Berleant, J. et al. Random access DNA memory using Boolean search in an archival file storage system. Nat. Mater. 20, 1272–1280 (2021). https://doi.org/10.1038/s41563-021-01021-3
DOI: https://doi.org/10.1038/s41563-021-01021-3
This article is cited by
- DNA as a universal chemical substrate for computing and data storage. Nature Reviews Chemistry (2024)
- Modelling for Efficient Scientific Data Storage Using Simple Graphs in DNA. SN Computer Science (2024)
- Recent progress in DNA data storage based on high-throughput DNA synthesis. Biomedical Engineering Letters (2024)
- In-vitro validated methods for encoding digital data in deoxyribonucleic acid (DNA). BMC Bioinformatics (2023)
- DNA storage in thermoresponsive microcapsules for repeated random multiplexed data access. Nature Nanotechnology (2023)