CRISPR–Cas encoding of a digital movie into the genomes of a population of living bacteria

Abstract

DNA is an excellent medium for archiving data. Recent efforts have illustrated the potential for information storage in DNA using synthesized oligonucleotides assembled in vitro¹⁻⁶. A relatively unexplored avenue of information storage in DNA is the ability to write information into the genome of a living cell by the addition of nucleotides over time. Using the Cas1–Cas2 integrase, the CRISPR–Cas microbial immune system stores the nucleotide content of invading viruses to confer adaptive immunity⁷. When harnessed, this system has the potential to write arbitrary information into the genome⁸. Here we use the CRISPR–Cas system to encode the pixel values of black and white images and a short movie into the genomes of a population of living bacteria. In doing so, we push the technical limits of this information storage system and optimize strategies to minimize those limitations. We also uncover underlying principles of the CRISPR–Cas adaptation system, including sequence determinants of spacer acquisition that are relevant for understanding both the basic biology of bacterial adaptation and its technological applications. This work demonstrates that this system can capture and stably store practical amounts of real data within the genomes of populations of living cells.

Figure 1: An image into the genome.
Figure 2: Sequence determinants of acquisition.
Figure 3: Encoding a GIF in bacteria.

References

  1. Church, G. M., Gao, Y. & Kosuri, S. Next-generation digital information storage in DNA. Science 337, 1628 (2012)

  2. Goldman, N. et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494, 77–80 (2013)

  3. Gibson, D. G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010)

  4. Clelland, C. T., Risca, V. & Bancroft, C. Hiding messages in DNA microdots. Nature 399, 533–534 (1999)

  5. Adleman, L. M. Molecular computation of solutions to combinatorial problems. Science 266, 1021–1024 (1994)

  6. Davis, J. Microvenus. Art J. 55, 70–74 (1996)

  7. Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712 (2007)

  8. Shipman, S. L., Nivala, J., Macklis, J. D. & Church, G. M. Molecular recordings by directed CRISPR spacer acquisition. Science 353, aaf1175 (2016)

  9. Amitai, G. & Sorek, R. CRISPR–Cas adaptation: insights into the mechanism of action. Nat. Rev. Microbiol. 14, 67–76 (2016)

  10. Sternberg, S. H., Richter, H., Charpentier, E. & Qimron, U. Adaptation in CRISPR–Cas Systems. Mol. Cell 61, 797–808 (2016)

  11. van der Oost, J., Jore, M. M., Westra, E. R., Lundgren, M. & Brouns, S. J. CRISPR-based adaptive and heritable immunity in prokaryotes. Trends Biochem. Sci. 34, 401–407 (2009)

  12. Deveau, H. et al. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J. Bacteriol. 190, 1390–1400 (2008)

  13. Paez-Espino, D. et al. Strong bias in the bacterial CRISPR elements that confer immunity to phage. Nat. Commun. 4, 1430 (2013)

  14. Westra, E. R. et al. Type I-E CRISPR–Cas systems discriminate target from non-target DNA through base pairing-independent PAM recognition. PLoS Genet. 9, e1003742 (2013)

  15. Shmakov, S. et al. Pervasive generation of oppositely oriented spacers during CRISPR adaptation. Nucleic Acids Res. 42, 5907–5916 (2014)

  16. Nuñez, J. K., Harrington, L. B., Kranzusch, P. J., Engelman, A. N. & Doudna, J. A. Foreign DNA capture during CRISPR–Cas adaptive immunity. Nature 527, 535–538 (2015)

  17. Wang, J. et al. Structural and mechanistic basis of PAM-dependent spacer acquisition in CRISPR–Cas systems. Cell 163, 840–853 (2015)

  18. Yosef, I., Goren, M. G. & Qimron, U. Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli. Nucleic Acids Res. 40, 5569–5576 (2012)

  19. Diez-Villasenor, C., Almendros, C., Garcia-Martinez, J. & Mojica, F. J. Diversity of CRISPR loci in Escherichia coli. Microbiology 156, 1351–1361 (2010)

  20. Weinberger, A. D. et al. Persisting viral sequences shape microbial CRISPR-based immunity. PLOS Comput. Biol. 8, e1002475 (2012)

  21. Held, N. L., Herrera, A., Cadillo-Quiroz, H. & Whitaker, R. J. CRISPR associated diversity within a population of Sulfolobus islandicus. PLoS One 5, e12988 (2010)

  22. Yosef, I. et al. DNA motifs determining the efficiency of adaptation into the Escherichia coli CRISPR array. Proc. Natl Acad. Sci. USA 110, 14396–14401 (2013)

  23. Westra, E. R. et al. CRISPR immunity relies on the consecutive binding and degradation of negatively supercoiled invader DNA by cascade and Cas3. Mol. Cell 46, 595–605 (2012)

  24. Semenova, E. et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc. Natl Acad. Sci. USA 108, 10098–10103 (2011)

  25. Zhirnov, V., Zadegan, R. M., Sandhu, G. S., Church, G. M. & Hughes, W. L. Nucleic acid memory. Nat. Mater. 15, 366–370 (2016)

  26. Hsiao, V., Hori, Y., Rothemund, P. W. & Murray, R. M. A population-based temporal logic gate for timing and recording chemical events. Mol. Syst. Biol. 12, 869 (2016)

  27. McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016)

  28. Frieda, K. L. et al. Synthetic recording and in situ readout of lineage information in single cells. Nature 541, 107–111 (2017)

  29. Erlich, Y. & Zielinski, D. DNA Fountain enables a robust and efficient storage architecture. Science 355, 950–954 (2017)

  30. O’Shea, J. P. et al. pLogo: a probabilistic approach to visualizing sequence motifs. Nat. Methods 10, 1211–1212 (2013)

Acknowledgements

S.L.S. is a Shurl and Kay Curci Foundation Fellow of the Life Sciences Research Foundation. The project was supported by grants from the National Institute of Mental Health (5R01MH103910), National Human Genome Research Institute (5RM1HG008525), and Simons Foundation Autism Research Initiative (368485) to G.M.C., the National Institute of Neurological Disorders and Stroke (5R01NS045523) to J.D.M. and an Allen Distinguished Investigator Award from the Paul G. Allen Frontiers Group to J.D.M. We thank G. Kuznetsov for comments on the manuscript.

Author information

Contributions

S.L.S. and J.N. conceived the study. S.L.S. designed the work, performed experiments, analysed data, wrote custom Python analysis software, and wrote the manuscript with input from J.N., J.D.M. and G.M.C. S.L.S., J.N., J.D.M. and G.M.C. discussed results and commented on the manuscript.

Corresponding author

Correspondence to George M. Church.

Ethics declarations

Competing interests

S.L.S., J.N., J.D.M. and G.M.C. are inventors on a provisional patent (62/296,812) filed by the President and Fellows of Harvard College that covers the work in this manuscript. A complete accounting of the financial interests of G.M.C. is listed at: http://arep.med.harvard.edu/gmc/tech.html.

Additional information

Reviewer Information: Nature thanks R. Barrangou and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Recording images into the genome.

a, Pixel values are encoded across many protospacers, which are electroporated into a population of bacteria that overexpress Cas1 and Cas2 to store the image data. These bacteria can be archived, propagated, and eventually sequenced to recall the image. b, Initial image to be encoded. c, Nucleotide-to-colour encoding scheme. d, Example of the encoding scheme. The sequence at top shows the linear view of the protospacer: a pixet code (specifying a pixel set) followed by pixel values, which are distributed across the image. The pixet number is shown under the pixet nucleotides, with the binary-converted pixet and the binary-to-nucleotide conversion reference below that. Small numbers (in colour) below the protospacer indicate individual pixels, identified by boxes on the image. The protospacer in minimal hairpin format for electroporation is shown on the right. e, Results of one replicate at a depth of 655,360 reads. A pixel is shown in white if no information was recovered about its value (because its pixet protospacer was not recovered after sequencing). f, Percentage of accurately recalled pixets as a function of read depth. Unfilled circles indicate points derived from 3 biological replicates; the black line is the mean of the replicates. g, Examples of the images that result from down-sampling the sequencing reads. h, Recall accuracy as a function of reads sampled when smaller pools of oligonucleotides are supplied and recalled. Individual points show 3 biological replicates; lines are the means of the replicates. i, Number of reads required to reach 50%, 60%, 70%, and 80% accuracy on a given oligonucleotide set as a function of the number of oligonucleotides supplied (n = 3; linear regression of the 80% curve, R² = 0.9466; runs test of the 80% curve, P > 0.99). Additional statistical details in Supplementary Table 2.
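The protospacer layout in panel d (a pixet code converted from the pixet number via binary, followed by one value per pixel) can be sketched as a round trip in Python. This is a minimal sketch under stated assumptions, not the encoding used in the study. The 2-bit-per-nucleotide table, the 4-nt pixet code and the four grey levels (one nucleotide per pixel) are all hypothetical choices for illustration. The helpers `encode_pixet` and `decode_pixet` are hypothetical names.

```python
# Minimal sketch of a pixet protospacer: pixet code followed by pixel values.
# ASSUMPTIONS (hypothetical, not from the paper): 2 bits per nucleotide for the
# pixet code, a 4-nt code (up to 256 pixets), and one nucleotide per pixel
# encoding one of four grey levels.

BITS_TO_NT = {"00": "A", "01": "C", "10": "G", "11": "T"}
NT_TO_BITS = {nt: bits for bits, nt in BITS_TO_NT.items()}
SHADE_TO_NT = {0: "A", 1: "C", 2: "G", 3: "T"}  # hypothetical grey levels
NT_TO_SHADE = {nt: shade for shade, nt in SHADE_TO_NT.items()}
CODE_LEN = 4  # nucleotides reserved for the pixet code (hypothetical)


def encode_pixet(pixet: int, shades: list[int]) -> str:
    """Pixet number -> binary -> nucleotides, then one nucleotide per pixel."""
    if not 0 <= pixet < 4 ** CODE_LEN:
        raise ValueError("pixet number does not fit in the pixet code")
    bits = format(pixet, f"0{2 * CODE_LEN}b")
    code = "".join(BITS_TO_NT[bits[i:i + 2]] for i in range(0, len(bits), 2))
    return code + "".join(SHADE_TO_NT[s] for s in shades)


def decode_pixet(protospacer: str) -> tuple[int, list[int]]:
    """Invert encode_pixet: recover the pixet number and its pixel values."""
    code, body = protospacer[:CODE_LEN], protospacer[CODE_LEN:]
    pixet = int("".join(NT_TO_BITS[nt] for nt in code), 2)
    return pixet, [NT_TO_SHADE[nt] for nt in body]


print(encode_pixet(6, [0, 3, 1, 2]))  # AACGATCG
```

Each protospacer carries its own pixet code, so pixets can be recovered in any order. A pixet whose protospacer is never sequenced leaves its pixels unknown, which matches the white pixels in panel e.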

Extended Data Figure 2 Testing a minimal hairpin protospacer.

a, Percentage of arrays expanded with oligonucleotide-supplied spacers following electroporation of the sequences indicated below, which test PAM inclusion on both the top and bottom strands. Unfilled circles indicate biological replicates; bars are mean ± s.e.m. (n = 3; one-way ANOVA: P < 0.0001; follow-up Dunnett’s multiple comparison (corrected), no PAM versus full PAM: P = 0.0001, no PAM versus bottom PAM: P = 0.0002). *P < 0.05. Oligonucleotides supplied at 3.125 μM each. b, Percentage of arrays expanded with oligonucleotide-supplied spacers following electroporation of the sequences indicated to the left, right, and below, aimed at finding a minimal functional hairpin protospacer. Unfilled circles indicate individual biological replicates; bars are mean ± s.e.m. (n = 4; one-way ANOVA effect of protospacer: P > 0.05). Oligonucleotides supplied at 3.125 μM. c, Percentage of arrays expanded following electroporation of different concentrations of the minimal hairpin oligonucleotide protospacer (n = 1). Additional statistical details in Supplementary Table 2.
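These panels, like most of the Extended Data panels, report the percentage of arrays expanded as mean ± s.e.m. across biological replicates. That summary can be computed as shown below. This is a minimal sketch: the function names are hypothetical, and s.e.m. is taken as the sample standard deviation divided by √n. The counts in the usage lines are made up for illustration and are not data from this study.

```python
import math
import statistics


def percent_expanded(expanded_arrays: int, total_arrays: int) -> float:
    """Percentage of sequenced arrays carrying an oligonucleotide-supplied spacer."""
    return 100.0 * expanded_arrays / total_arrays


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Made-up counts for three biological replicates (not data from the paper).
replicates = [(12, 1000), (15, 1000), (18, 1000)]
pcts = [percent_expanded(e, t) for e, t in replicates]
m, s = mean_sem(pcts)
print(f"{m:.2f} ± {s:.2f} %")
```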

Extended Data Figure 3 Cells surviving electroporation.

Colony-forming units per millilitre of starting culture before beginning electroporation, after pre-electroporation washes, immediately post-electroporation, and after 1 h of recovery. Cells in red were electroporated with a minimal hairpin oligonucleotide; those in blue were electroporated in water alone. Unfilled circles represent individual biological replicates (n = 3); filled circles are mean ± s.e.m.

Extended Data Figure 4 Optimization of protospacer sequence parameters.

a, Comparison of the percentage of arrays that were expanded after encoding handR and handF images (n = 3). b, Percentage of arrays expanded per oligonucleotide (single pool) or per subpool (subpooled) across a range of GC percentages. Unfilled black circles to the left represent individual oligonucleotide protospacer sequences (three biological replicates each), while the black line shows mean ± s.e.m. Unfilled red circles to the right represent individual biological replicates. Bars are mean ± s.e.m. (n = 3; one-way ANOVA on effect of GC percentage, single pool: P < 0.0001, subpooled: P = 0.0011; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). c, Percentage of arrays expanded per oligonucleotide electroporated individually across a range of GC percentages. Unfilled red circles are individual biological replicates. Bars show mean ± s.e.m. (n = 3; one-way ANOVA on effect of GC percentage: P = 0.0001; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). d, Gibbs free energy of the minimal hairpin protospacer structures for each of the images, with protospacers ranked by overall acquisition frequency (n = 3; linear regression, handR: P = 0.0089, handF: P = 0.0004). e, Percentage of arrays expanded per oligonucleotide (single pool) or per subpool (subpooled) with different numbers of mononucleotide repeats (n = 3; one-way ANOVA on effect of mononucleotide repeats, single pool: P = 0.3843, subpooled: P = 0.0015; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). Panel attributes as in b. f, Percentage of arrays expanded per oligonucleotide (single pool) or per subpool (subpooled) with different numbers of internal PAMs (n = 3; one-way ANOVA on effect of internal PAMs, single pool: P = 0.0565, subpooled: P = 0.0052; follow-up testing with Tukey’s multiple comparison (corrected), see Supplementary Table 2). Panel attributes as in b. *P < 0.05. Additional statistical details in Supplementary Table 2.
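Panels b, c, e and f compare three features that can be computed per protospacer: GC percentage, mononucleotide repeats and internal PAMs. The sketch below computes them under assumptions that are labelled in the code as well. AAG is treated as the PAM motif, consistent with the AAG avoided during the encoding in Extended Data Fig. 8. PAMs are counted on the top strand only. A "mononucleotide repeat" is taken to be a run of at least four identical nucleotides. The threshold and the function names are hypothetical. Hairpin free energy (panel d) requires a nucleic-acid folding tool and is not reproduced here.

```python
from itertools import groupby

PAM = "AAG"      # ASSUMPTION: PAM motif, searched on the top strand only
MIN_RUN = 4      # ASSUMPTION: minimum run length counted as a repeat


def gc_percent(seq: str) -> float:
    """Percentage of G or C nucleotides in the sequence."""
    return 100.0 * sum(nt in "GC" for nt in seq) / len(seq)


def mononucleotide_repeats(seq: str, min_run: int = MIN_RUN) -> int:
    """Number of runs of a single nucleotide that are at least min_run long."""
    return sum(1 for _, run in groupby(seq) if len(list(run)) >= min_run)


def internal_pams(seq: str, pam: str = PAM) -> int:
    """Occurrences of the PAM motif inside the protospacer (overlaps allowed)."""
    return sum(seq.startswith(pam, i) for i in range(len(seq)))


for seq in ["GGCCAATT", "AAAACGTTTTTG", "AAGTTAAGAAG"]:
    print(seq, round(gc_percent(seq), 1), mononucleotide_repeats(seq), internal_pams(seq))
```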

Extended Data Figure 5 Effect of the 3′ motif on protospacer acquisition when supplied as two complementary oligonucleotides.

Individual sequences designed to directly test the motif identified in Fig. 2b are shown to the left. To the right, percentage of arrays expanded following electroporation of the indicated sequences as two complementary oligonucleotides (dark red) rather than as a minimal oligonucleotide hairpin (pink, shown for comparison). Unfilled circles indicate individual biological replicates. Bars show mean ± s.e.m. (n = 3; one-way ANOVA on effect of oligonucleotide: P = 0.0041; follow-up testing with Sidak’s multiple comparison (corrected), seqover versus seqover-CCT: P = 0.0103, sequnder versus sequnder-TGA: P = 0.0081). *P < 0.05. Additional statistical details in Supplementary Table 2.
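Supplying a protospacer as two complementary oligonucleotides means pairing the top strand with its reverse complement, so that the two anneal into a fully paired duplex. The sketch below shows only that relationship. The helper names are hypothetical, and the example sequence is arbitrary rather than one of the tested sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """5'->3' sequence of the strand that pairs with seq."""
    return seq.translate(COMPLEMENT)[::-1]


def complementary_pair(top: str) -> tuple[str, str]:
    """Top- and bottom-strand oligonucleotides, both written 5'->3'."""
    return top, reverse_complement(top)


def fully_paired(top: str, bottom: str) -> bool:
    """True if the two oligonucleotides are perfect reverse complements."""
    return len(top) == len(bottom) and reverse_complement(bottom) == top


top, bottom = complementary_pair("AAGCTTGACCT")  # arbitrary example sequence
print(top, bottom, fully_paired(top, bottom))
```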

Extended Data Figure 6 Recall of frame order over time based on position in the CRISPR array.

a, Initial set of rules to test the order of spacers within a pixet. Every time two spacers from the same pixet are found in a single array, their relative physical location (with respect to the leader) is extracted, as is the location of each spacer relative to spacers drawn from the genome or plasmid (G/P). Because new spacers are integrated at the leader-proximal end of the array, the actual sequence of electroporated protospacers should occupy arrays in a predictable physical arrangement, as described by these ordering rules. Every possible permutation of spacers within a pixet is tested against each of these rules and, if a permutation satisfies all the rules, spacers are assigned to frames. b, Second set of tests to compare between pixets. If no permutation satisfies all of the tests in a, spacers are compared pairwise to previously assigned spacers from other pixets when found in the same array. A larger set of rules will hold true for the actual sequence of electroporated protospacers when compared against previously assigned spacers. Again, all possible order permutations are tested, and order is assigned based on the best overall satisfaction of these ordering rules.
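The core of the within-pixet ordering rules in a can be sketched as a brute-force search over permutations. This minimal sketch rests on several simplifying assumptions:

- each array is a list of spacer IDs written leader-proximal first;
- a spacer nearer the leader is taken to have been acquired later;
- comparisons with genome- or plasmid-derived (G/P) spacers and the between-pixet fallback in b are omitted;
- rather than requiring that every rule hold, the sketch returns the ordering that satisfies the most pairwise observations.

The function names and example arrays are hypothetical.

```python
from itertools import combinations, permutations


def pairwise_observations(arrays, spacers):
    """(nearer, farther) pairs of same-pixet spacers found together in one array.

    Each array lists spacer IDs leader-proximal first, so within a pair the
    first spacer is assumed to have been acquired later than the second.
    """
    wanted = set(spacers)
    pairs = []
    for array in arrays:
        hits = [s for s in array if s in wanted]
        # combinations() keeps the input order, so the nearer spacer comes first.
        pairs.extend(combinations(hits, 2))
    return pairs


def assign_frames(arrays, spacers):
    """Return (order, satisfied, total) for the best frame order, earliest first."""
    pairs = pairwise_observations(arrays, spacers)
    best_order, best_score = None, -1
    for order in permutations(spacers):
        frame = {s: i for i, s in enumerate(order)}
        # A rule holds when the spacer nearer the leader has the later frame.
        score = sum(frame[near] > frame[far] for near, far in pairs)
        if score > best_score:
            best_order, best_score = list(order), score
    return best_order, best_score, len(pairs)


# Hypothetical arrays for spacers A, B, C (true frame order A, B, C);
# the last array is deliberately inconsistent.
arrays = [["C", "B", "A"], ["B", "A"], ["C", "A"], ["C", "B"], ["A", "C"]]
print(assign_frames(arrays, ["A", "B", "C"]))  # (['A', 'B', 'C'], 6, 7)
```

The search is factorial in the number of spacers compared, so this brute-force form suits only small sets.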

Extended Data Figure 7 Quantification of errors by source.

Errors include any instance of a called spacer that does not match the supplied protospacer.

Extended Data Figure 8 Methods of image encoding for error-correction.

a–d, Method used in Fig. 1. a, Triplet code to flexibly specify 21 colours. b, Example of a pixet to be encoded into nucleotide space with pixel values marked. c, Rules specifying how the protospacer will be built. d, Example of the build of the protospacer. The AAG introduced by the addition of pixel 4 is unacceptable and invokes the flexible switch to another triplet. In a test of the extensibility of this encoding scheme, we ran three random sets of 100 million different nine-colour orderings through the sequence build and found that 99.86 ± 0.07% of colour orders satisfied the requirements we set out without optimization by hand. e–i, Method of alternating clusters for error correction. e, Triplet assignment to clusters A, B, and X. f, Example of a pixet to be encoded into nucleotide space with pixel values marked. g, Rules for adding new triplets in this scheme. h, Example of the build of the protospacer. The AAG introduced by the addition of pixel 4 is unacceptable and invokes the flexible switch to cluster X. i, Example of an error signal. j–l, Method of checksum error correction. j, Annotation of protospacer with the addition of a checksum. k, Annotation of the checksum itself. l, Full protospacer with checksum implemented.
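The flexible triplet build in a–d can be illustrated with a toy code table. This minimal sketch assumes a hypothetical four-colour table with three interchangeable triplets per colour. The real scheme specifies 21 colours, and its table is not reproduced here. For each pixel, the build uses the first triplet that does not create the disallowed AAG. Decoding reads the protospacer back in triplets. The table and function names are hypothetical.

```python
# HYPOTHETICAL toy table: 4 colours, 3 alternative triplets each, no triplet
# shared between colours and none containing the disallowed motif.
TRIPLETS = {
    0: ["AAC", "GTA", "CTT"],
    1: ["GCA", "TGC", "CAT"],
    2: ["AGT", "TCA", "GGA"],
    3: ["ACC", "TAG", "CGT"],
}
FORBIDDEN = "AAG"
DECODE = {t: colour for colour, ts in TRIPLETS.items() for t in ts}


def build_protospacer(colours):
    """Append one triplet per pixel, switching to an alternative triplet
    whenever the first choice would create the forbidden motif.
    Returns None if no alternative works for some pixel."""
    seq = ""
    for colour in colours:
        for triplet in TRIPLETS[colour]:
            # Triplets never contain the motif themselves, so a new
            # occurrence can only span the junction: check last 2 nt + triplet.
            if FORBIDDEN not in seq[-2:] + triplet:
                seq += triplet
                break
        else:
            return None
    return seq


def read_colours(seq):
    """Decode a protospacer built above back into its colour order."""
    return [DECODE[seq[i:i + 3]] for i in range(0, len(seq), 3)]


seq = build_protospacer([1, 2, 0])
print(seq, read_colours(seq))
```

Decoding stays unambiguous because every triplet belongs to exactly one colour. This redundancy lets the build avoid a motif without changing the encoded colour.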

Supplementary information

Supplementary Information

This file contains Supplementary Notes and Supplementary Tables 1–2. (PDF 324 kb)

Supplementary Data

This file contains Supplementary Table 3. (XLSX 73 kb)

About this article

Cite this article

Shipman, S., Nivala, J., Macklis, J. et al. CRISPR–Cas encoding of a digital movie into the genomes of a population of living bacteria. Nature 547, 345–349 (2017). https://doi.org/10.1038/nature23017
