Abstract
DNA is an excellent medium for archiving data. Recent efforts have illustrated the potential for information storage in DNA using synthesized oligonucleotides assembled in vitro [1–6]. A relatively unexplored avenue of information storage in DNA is the ability to write information into the genome of a living cell by the addition of nucleotides over time. Using the Cas1–Cas2 integrase, the CRISPR–Cas microbial immune system stores the nucleotide content of invading viruses to confer adaptive immunity [7]. When harnessed, this system has the potential to write arbitrary information into the genome [8]. Here we use the CRISPR–Cas system to encode the pixel values of black and white images and a short movie into the genomes of a population of living bacteria. In doing so, we push the technical limits of this information storage system and optimize strategies to minimize those limitations. We also uncover underlying principles of the CRISPR–Cas adaptation system, including sequence determinants of spacer acquisition that are relevant for understanding both the basic biology of bacterial adaptation and its technological applications. This work demonstrates that this system can capture and stably store practical amounts of real data within the genomes of populations of living cells.
References
Church, G. M., Gao, Y. & Kosuri, S. Next-generation digital information storage in DNA. Science 337, 1628 (2012)
Goldman, N. et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494, 77–80 (2013)
Gibson, D. G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010)
Clelland, C. T., Risca, V. & Bancroft, C. Hiding messages in DNA microdots. Nature 399, 533–534 (1999)
Adleman, L. M. Molecular computation of solutions to combinatorial problems. Science 266, 1021–1024 (1994)
Davis, J. Microvenus. Art J. 55, 70–74 (1996)
Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712 (2007)
Shipman, S. L., Nivala, J., Macklis, J. D. & Church, G. M. Molecular recordings by directed CRISPR spacer acquisition. Science 353, aaf1175 (2016)
Amitai, G. & Sorek, R. CRISPR–Cas adaptation: insights into the mechanism of action. Nat. Rev. Microbiol. 14, 67–76 (2016)
Sternberg, S. H., Richter, H., Charpentier, E. & Qimron, U. Adaptation in CRISPR–Cas Systems. Mol. Cell 61, 797–808 (2016)
van der Oost, J., Jore, M. M., Westra, E. R., Lundgren, M. & Brouns, S. J. CRISPR-based adaptive and heritable immunity in prokaryotes. Trends Biochem. Sci. 34, 401–407 (2009)
Deveau, H. et al. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J. Bacteriol. 190, 1390–1400 (2008)
Paez-Espino, D. et al. Strong bias in the bacterial CRISPR elements that confer immunity to phage. Nat. Commun. 4, 1430 (2013)
Westra, E. R. et al. Type I-E CRISPR–Cas systems discriminate target from non-target DNA through base pairing-independent PAM recognition. PLoS Genet. 9, e1003742 (2013)
Shmakov, S. et al. Pervasive generation of oppositely oriented spacers during CRISPR adaptation. Nucleic Acids Res. 42, 5907–5916 (2014)
Nuñez, J. K., Harrington, L. B., Kranzusch, P. J., Engelman, A. N. & Doudna, J. A. Foreign DNA capture during CRISPR–Cas adaptive immunity. Nature 527, 535–538 (2015)
Wang, J. et al. Structural and mechanistic basis of PAM-dependent spacer acquisition in CRISPR–Cas systems. Cell 163, 840–853 (2015)
Yosef, I., Goren, M. G. & Qimron, U. Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli. Nucleic Acids Res. 40, 5569–5576 (2012)
Diez-Villasenor, C., Almendros, C., Garcia-Martinez, J. & Mojica, F. J. Diversity of CRISPR loci in Escherichia coli. Microbiology 156, 1351–1361 (2010)
Weinberger, A. D. et al. Persisting viral sequences shape microbial CRISPR-based immunity. PLOS Comput. Biol. 8, e1002475 (2012)
Held, N. L., Herrera, A., Cadillo-Quiroz, H. & Whitaker, R. J. CRISPR associated diversity within a population of Sulfolobus islandicus. PLoS One 5, e12988 (2010)
Yosef, I. et al. DNA motifs determining the efficiency of adaptation into the Escherichia coli CRISPR array. Proc. Natl Acad. Sci. USA 110, 14396–14401 (2013)
Westra, E. R. et al. CRISPR immunity relies on the consecutive binding and degradation of negatively supercoiled invader DNA by cascade and Cas3. Mol. Cell 46, 595–605 (2012)
Semenova, E. et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc. Natl Acad. Sci. USA 108, 10098–10103 (2011)
Zhirnov, V., Zadegan, R. M., Sandhu, G. S., Church, G. M. & Hughes, W. L. Nucleic acid memory. Nat. Mater. 15, 366–370 (2016)
Hsiao, V., Hori, Y., Rothemund, P. W. & Murray, R. M. A population-based temporal logic gate for timing and recording chemical events. Mol. Syst. Biol. 12, 869 (2016)
McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016)
Frieda, K. L. et al. Synthetic recording and in situ readout of lineage information in single cells. Nature 541, 107–111 (2017)
Erlich, Y. & Zielinski, D. DNA Fountain enables a robust and efficient storage architecture. Science 355, 950–954 (2017)
O’Shea, J. P. et al. pLogo: a probabilistic approach to visualizing sequence motifs. Nat. Methods 10, 1211–1212 (2013)
Acknowledgements
S.L.S. is a Shurl and Kay Curci Foundation Fellow of the Life Sciences Research Foundation. The project was supported by grants from the National Institute of Mental Health (5R01MH103910), National Human Genome Research Institute (5RM1HG008525), and Simons Foundation Autism Research Initiative (368485) to G.M.C., the National Institute of Neurological Disorders and Stroke (5R01NS045523) to J.D.M., and an Allen Distinguished Investigator Award from the Paul G. Allen Frontiers Group to J.D.M. We thank G. Kuznetsov for comments on the manuscript.
Author information
Contributions
S.L.S. and J.N. conceived the study. S.L.S. designed the work, performed experiments, analysed data, wrote custom Python analysis software, and wrote the manuscript with input from J.N., J.D.M. and G.M.C. S.L.S., J.N., J.D.M. and G.M.C. discussed results and commented on the manuscript.
Ethics declarations
Competing interests
S.L.S., J.N., J.D.M. and G.M.C. are inventors on a provisional patent (62/296,812) filed by the President and Fellows of Harvard College that covers the work in this manuscript. A complete accounting of the financial interests of G.M.C. is listed at: http://arep.med.harvard.edu/gmc/tech.html.
Additional information
Reviewer Information: Nature thanks R. Barrangou and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Figure 1 Recording images into the genome.
a, Pixel values are encoded across many protospacers, which are electroporated into a population of bacteria that overexpress Cas1 and Cas2 to store the image data. These bacteria can be archived, propagated, and eventually sequenced to recall the image. b, Initial image to be encoded. c, Nucleotide-to-colour encoding scheme. d, Example of the encoding scheme. Sequence at top shows the protospacer linear view with pixet code (specifying a pixel set) followed by pixel values, which are distributed across the image. Pixet number is shown under the pixet nucleotides, with the binary-converted pixet and binary-to-nucleotide conversion reference below that. Small numbers (in colour) below the protospacer indicate individual pixels, identified by boxes on the image. Protospacer in minimal hairpin format for electroporation is shown on the right. e, Results of one replicate at a depth of 655,360 reads. White is shown if no information was recovered about the pixel value (owing to a pixet protospacer not being recovered after sequencing). f, Percentage of accurately recalled pixets as a function of read depth. Unfilled circles indicate points derived from 3 biological replicates. The black line is the mean of the replicates. g, Examples of the images that result from down-sampling the sequencing reads. h, Recall accuracy as a function of reads sampled when smaller pools of oligonucleotides are supplied and recalled. Individual points show 3 biological replicates; lines are the means of the replicates. i, Number of reads required to reach 50%, 60%, 70%, and 80% accuracy on a given oligonucleotide set as a function of oligonucleotides supplied (n = 3; linear regression of the 80% curve, R² = 0.9466; runs test of the 80% curve, P > 0.99). Additional statistical details in Supplementary Table 2.
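The pixet scheme described in panel d can be illustrated with a minimal Python sketch. The specific mappings here are assumptions for illustration only: the legend does not reproduce the binary-to-nucleotide table or the nucleotide-to-colour table, so `BITS_TO_NT`, `COLOUR_TO_NT` and the pixet code length `PIXET_NT` are hypothetical, and the PAM and hairpin structure of the real protospacers are not modelled.

```python
# Assumed 2-bits-per-base table (hypothetical; the real table is in the figure).
BITS_TO_NT = {"00": "C", "01": "T", "10": "A", "11": "G"}
NT_TO_BITS = {nt: bits for bits, nt in BITS_TO_NT.items()}

# Assumed one-base-per-pixel colour code for a four-colour image (hypothetical).
COLOUR_TO_NT = {0: "C", 1: "T", 2: "A", 3: "G"}
NT_TO_COLOUR = {nt: colour for colour, nt in COLOUR_TO_NT.items()}

PIXET_NT = 4  # hypothetical: 4 bases = 8 bits, enough to index 256 pixets


def encode_pixet(pixet_index, pixel_values):
    """Return the pixet code followed by one base per pixel value."""
    bits = format(pixet_index, "0{}b".format(2 * PIXET_NT))
    code = "".join(BITS_TO_NT[bits[i:i + 2]] for i in range(0, len(bits), 2))
    payload = "".join(COLOUR_TO_NT[value] for value in pixel_values)
    return code + payload


def decode_spacer(seq):
    """Recover (pixet_index, pixel_values) from a sequenced spacer."""
    bits = "".join(NT_TO_BITS[nt] for nt in seq[:PIXET_NT])
    return int(bits, 2), [NT_TO_COLOUR[nt] for nt in seq[PIXET_NT:]]


spacer = encode_pixet(5, [0, 1, 2, 3])
print(spacer, decode_spacer(spacer))
```

Because every spacer carries its own pixet code, spacers can be recovered in any order from any cell in the population and still be placed correctly in the image; a pixet that is never recovered simply leaves its pixels unknown, as in panel e.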
Extended Data Figure 2 Testing a minimal hairpin protospacer.
a, Percentage of arrays expanded with oligonucleotide-supplied spacers following electroporation of the sequences indicated below, aimed at testing PAM inclusion on both the top and bottom strands. Unfilled circles indicate biological replicates, bars are mean ± s.e.m (n = 3; one-way ANOVA: P < 0.0001; follow-up Dunnett’s multiple comparison (corrected), no PAM versus full PAM: P = 0.0001, no PAM versus bottom PAM: P = 0.0002). *P < 0.05. Oligonucleotides supplied at 3.125 μM each. b, Percentage of arrays expanded with oligonucleotide-supplied spacers following electroporation of the sequences indicated to the left, right, and below aimed at finding a minimal functional hairpin protospacer. Unfilled circles indicate individual biological replicates, bars are mean ± s.e.m (n = 4; one-way ANOVA effect of protospacer: P > 0.05). Oligonucleotides supplied at 3.125 μM. c, Percentage of arrays expanded following electroporation of different concentrations of the minimal hairpin oligonucleotide protospacer (n = 1). Additional statistical details in Supplementary Table 2.
Extended Data Figure 3 Cells surviving electroporation.
Colony-forming units per millilitre of starting culture before beginning electroporation, after pre-electroporation washes, immediately post-electroporation, and after 1 h of recovery. Cells in red were electroporated with a minimal hairpin oligonucleotide, those in blue were electroporated in water alone. Unfilled circles represent individual biological replicates (n = 3), filled circles are mean ± s.e.m.
Extended Data Figure 4 Optimization of protospacer sequence parameters.
a, Comparison of the percentage of arrays that were expanded after encoding handR and handF images (n = 3). b, Percentage of arrays expanded per oligonucleotide (single pool) or per subpool (subpooled) across a range of GC percentages. Unfilled black circles to the left represent individual oligonucleotide protospacer sequences (three biological replicates each), while black line shows mean ± s.e.m. Unfilled red circles to the right represent individual biological replicates. Bars are mean ± s.e.m (n = 3; one-way ANOVA on effect of GC percentage, single pool: P < 0.0001, subpooled P = 0.0011; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). c, Percentage of arrays expanded per oligonucleotide electroporated individually across a range of GC percentages. Unfilled red circles are individual biological replicates. Bars show mean ± s.e.m (n = 3; one-way ANOVA on effect of GC percentage: P = 0.0001; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). d, Gibbs free energy of minimal hairpin protospacers structures for each of the images, with protospacers ranked by overall acquisition frequency (n = 3; linear regression, handR: P = 0.0089, handF: P = 0.0004). e, Percentage of arrays expanded per oligonucleotide (single pool) or per subpool (subpooled) with different numbers of mononucleotide repeats (n = 3; one-way ANOVA on effect of mononucleotide repeats, single pool: P = 0.3843, subpooled: P = 0.0015; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). Panel attributes as in b. f, Percentage of arrays expanded per oligonucleotide (single pool) or per subpool (subpooled) with different numbers of internal PAMs (n = 3; one-way ANOVA on effect of internal PAMs, single pool: P = 0.0565, subpooled: P = 0.0052; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). Panel attributes as in b. *P < 0.05. 
Additional statistical details in Supplementary Table 2.
Extended Data Figure 5 Effect of the 3′ motif on protospacer acquisition when supplied as two complementary oligonucleotides.
Individual sequences designed to directly test the motif identified in Fig. 2b shown to the left. To the right, percentage of arrays expanded following electroporation of the sequences indicated as two complementary oligonucleotides (in dark red), rather than a minimal oligonucleotide hairpin (shown for comparison in pink). Unfilled circles indicate individual biological replicates. Bars show mean ± s.e.m. (n = 3; one-way ANOVA on effect of oligonucleotide: P = 0.0041; follow-up testing with Sidak’s multiple comparison (corrected), seqover versus seqover-CCT: P = 0.0103, sequnder versus sequnder-TGA: P = 0.0081). *P < 0.05. Additional statistical details in Supplementary Table 2.
Extended Data Figure 6 Recall of frame order over time based on position in the CRISPR array.
a, Initial set of rules to test the order of spacers within a pixet. Every time two spacers from the same pixet are found in a single array, their relative physical location (with respect to the leader) is extracted, as is the location of each spacer relative to spacers drawn from the genome or plasmid (G/P). The actual sequence of electroporated protospacers should occupy arrays in a predictable physical arrangement, as described by these ordering rules. Every possible permutation of spacers within a pixet is tested against each of these rules and, if a permutation satisfies all the rules, its spacers are assigned to frames. b, Second set of tests to compare between pixets. If no permutation satisfies all of the tests in a, spacers are compared pairwise with previously assigned spacers from other pixets when found in the same array. A larger set of rules will hold true for the actual sequence of electroporated protospacers when compared against previously assigned spacers. Again, all possible order permutations are tested, and order is assigned based on the best overall satisfaction of these ordering rules.
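The permutation test in this legend can be sketched in Python as follows. This is a simplified illustration, not the authors' analysis software: it uses a single assumed rule (a spacer from a later frame sits closer to the leader than a spacer from an earlier frame in the same array) and omits the comparisons against genome- or plasmid-derived spacers. The function name `best_frame_order` and the observation format are hypothetical.

```python
from itertools import permutations


def best_frame_order(spacers, observations):
    """Pick the frame order that satisfies the most pairwise observations.

    observations: list of (proximal, distal) pairs, one per array in which
    both spacers were found, with `proximal` closer to the leader.
    Assumed rule: the later-frame spacer should be the leader-proximal one.
    """
    best_order, best_score = None, -1
    for order in permutations(spacers):
        # Frame index of each spacer under this candidate order (0 = first).
        frame = {spacer: i for i, spacer in enumerate(order)}
        score = sum(1 for prox, dist in observations if frame[prox] > frame[dist])
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score


# Five arrays; the last observation is a contradictory (noisy) one.
obs = [("b", "a"), ("b", "a"), ("c", "b"), ("c", "a"), ("a", "b")]
order, score = best_frame_order(["a", "b", "c"], obs)
print(order, score, len(obs))
```

A permutation whose score equals the number of observations satisfies every rule, which corresponds to the strict test in a; taking the highest-scoring permutation otherwise mirrors the "best overall satisfaction" fallback in b. Exhaustive search is only practical because the number of spacers per pixet (one per frame) is small.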
Extended Data Figure 7 Quantification of errors by source.
Includes any instance of a called spacer that does not match the supplied protospacer.
Extended Data Figure 8 Methods of image encoding for error-correction.
a–d, Method used in Fig. 1. a, Triplet code to flexibly specify 21 colours. b, Example of a pixet to be encoded into nucleotide space with pixel values marked. c, Rules specifying how the protospacer will be built. d, Example of the build of the protospacer. The AAG introduced by the addition of pixel 4 is unacceptable and invokes the flexible switch to another triplet. In a test of the extendibility of this encoding scheme, we ran three random sets of 100 million different nine-colour orderings through the sequence build and found that 99.86 ± 0.07% of colour orders were able to satisfy the requirements we set out without optimization by hand. e–i, Method of alternating clusters for error correction. e, Triplet assignment to clusters A, B, and X. f, Example of a pixet to be encoded into nucleotide space with pixel values marked. g, Rules for adding new triplets in this scheme. h, Example of the build of the protospacer. The AAG introduced by the addition of pixel 4 is unacceptable and invokes the flexible switch to cluster X. i, Example of an error signal. j–l, Method of checksum error correction. j, Annotation of protospacer with the addition of a checksum. k, Annotation of the checksum itself. l, Full protospacer with checksum implemented.
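The flexible triplet code of panels a–d can be sketched in Python. The triplet-to-colour assignment below is hypothetical (63 of the 64 triplets dealt round-robin to 21 colours), and the only build rule modelled is the exclusion of an internal AAG; any further rules shown in panel c are not reproduced here.

```python
from itertools import product

FORBIDDEN = "AAG"  # motif that must not appear anywhere inside the protospacer
N_COLOURS = 21

# Hypothetical codebook: three synonymous triplets per colour.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)][:63]
CODEBOOK = {c: TRIPLETS[c::N_COLOURS] for c in range(N_COLOURS)}
DECODE = {t: c for c, triplets in CODEBOOK.items() for t in triplets}


def build_protospacer(colours):
    """Append one triplet per pixel, switching to a synonym if AAG would appear."""
    seq = ""
    for colour in colours:
        for triplet in CODEBOOK[colour]:
            if FORBIDDEN not in seq + triplet:
                seq += triplet
                break
        else:
            raise ValueError("no acceptable triplet for colour %d" % colour)
    return seq


def decode_protospacer(seq):
    """Map each triplet back to its colour, whichever synonym was used."""
    return [DECODE[seq[i:i + 3]] for i in range(0, len(seq), 3)]


seq = build_protospacer([0, 8, 2])
print(seq, decode_protospacer(seq))
```

In this sketch the check spans the junction between the existing sequence and the new triplet, so a motif created across two pixels triggers the switch just as a forbidden triplet would. Decoding needs no knowledge of which synonym was chosen, which is what makes the switch "flexible".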
Supplementary information
Supplementary Information
This file contains Supplementary Notes and Supplementary Tables 1-2. (PDF 324 kb)
Supplementary Data
This file contains Supplementary Table 3. (XLSX 73 kb)
Cite this article
Shipman, S., Nivala, J., Macklis, J. et al. CRISPR–Cas encoding of a digital movie into the genomes of a population of living bacteria. Nature 547, 345–349 (2017). https://doi.org/10.1038/nature23017
DOI: https://doi.org/10.1038/nature23017