In Living Memory: The First Steps toward Genetic Data Storage

Drew Endy's group at Stanford has just published their latest paper¹—open access of course. It represents the first major step in a long-term ambition to create a reliable form of living memory—rewritable, retrievable digital information stored in living cells. Endy, a civil engineer by training but synthetic biologist in practice, has been at the forefront of recent innovations in genetic circuits and synthetic biology systems². In an interview with the New Yorker in 2009, he speculated: "If the cells in our bodies had a little memory, think what we could do." Specifically, he entertained the idea that genetic memory could be used to encode a counter that tracks cell divisions. Besides making aging studies technically easier, anti-cancer therapies could be interfaced with the counter to specifically target cancer cells that are dividing out of control.

Background

It isn't too surprising that living cells are (at least potentially) good media for information storage, since they have evolved a stable storage medium (DNA) and a high-efficiency, high-fidelity copying system (DNA polymerase and associated proteins). However, engineering rewritable DNA information storage presents a significant challenge. It requires building a system that reliably converts an environmental input supplied by researchers into a form that can be stably encoded in DNA and later retrieved. We don't have a good way to control mutations so that A, C, T, and G themselves could encode the genetic memory, and it's not at all clear that we could retrieve the data without killing the cell. Thus, encoding genetic memory requires a more sophisticated approach.

The authors of this paper chose DNA flipping as the mechanism for genetic data storage. The application of DNA flipping to synthetic biology was pioneered by the collaboration I work with, originally applied to solving difficult combinatorics problems³. In general, DNA flipping uses recombinases, proteins that invert a segment of DNA bounded by sequences that they recognize. That segment of DNA is equivalent to a bit of memory, since it can be in exactly two orientations: forward or reverse, 1 or 0. In the simplest systems, inverting the segment restores the original two recognition sites, so the segment in between can be flipped over and over again as long as the recombinases are present in the cell. This poses quite a challenge to using DNA flipping for information storage, since the flipping is hard to turn off! Even a single recombinase could flip the same segment of DNA several times, so it is not at all practical to rely on flipping mechanisms that can't tell the difference between a DNA segment that has been flipped and one that has not.
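To make the bit analogy concrete, here is a minimal Python sketch that treats inversion as taking the reverse complement of the segment between two recognition sites. The site sequences, the segment, and all function names are invented for illustration and are not taken from the paper; real recognition sites are longer, and the model assumes (as a simplification) that the sites are unchanged by each flip.

```python
# Toy model: a DNA segment flanked by two recognition sites acts as one bit.
# All sequences are invented for illustration; they are not real recombinase sites.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

LEFT_SITE = "GGGAAA"    # hypothetical recognition site
RIGHT_SITE = "TTTCCC"   # hypothetical recognition site
FORWARD = "ATGAAACCC"   # hypothetical segment in its "forward" orientation


def reverse_complement(seq: str) -> str:
    """Return the sequence as it reads on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def segment_of(construct: str) -> str:
    """Extract the DNA between the two recognition sites."""
    return construct[len(LEFT_SITE):len(construct) - len(RIGHT_SITE)]


def flip(construct: str) -> str:
    """Invert the segment; in this toy model the flanking sites are unchanged."""
    return LEFT_SITE + reverse_complement(segment_of(construct)) + RIGHT_SITE


def read_bit(construct: str) -> int:
    """Forward orientation reads as 1, reverse as 0."""
    return 1 if segment_of(construct) == FORWARD else 0


construct = LEFT_SITE + FORWARD + RIGHT_SITE
print(read_bit(construct))              # forward orientation
print(read_bit(flip(construct)))        # after one flip
print(read_bit(flip(flip(construct))))  # after two flips, back where it started
```

Because the sites look the same after every flip in this model, nothing stops a third or fourth flip. That is the continuous-flipping problem in miniature.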

Design

To sidestep the continuous flipping problem, the authors used a DNA flipping system from a bacteriophage (a virus that infects bacteria) in which the flippable segment is flanked by different sequences in the forward versus reverse orientations. Further, the flipping mechanism is different in the two directions, so continuous back-and-forth flipping should be avoidable. The mechanism relies on two proteins: integrase and excisionase. They refer to the entire data storage module as RAD: recombinase addressable data.

The key question starting off is what to put in the flippable segment of DNA itself. It has to be something directional, whose function is changed by flipping. They chose a promoter, which recruits RNA polymerase and initiates gene expression. If the promoter is pointed to the right, RNA polymerase is directed to express GFP (green fluorescent protein), turning the E. coli cells green. If pointed to the left, RNA polymerase is directed in the opposite direction to express RFP (red fluorescent protein), turning the cells red. The authors of the paper defined the "right" GFP state to be 0 and the "left" RFP state to be 1. Flipping from 0 to 1 (which they refer to as "set") requires expression of integrase only, and flipping from 1 to 0 (which they refer to as "reset") requires the expression of both integrase and excisionase. The figure at the top of the page summarizes the operation of the RAD module.
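The set/reset logic described above can be sketched as a tiny state machine. This is an idealized illustration, not the authors' model: the names `State`, `reporter`, and `apply_pulse` are hypothetical, and the sketch assumes that any combination of proteins other than the two described in the text leaves the bit unchanged.

```python
from enum import Enum


class State(Enum):
    ZERO = 0  # promoter points right: GFP expressed, cells are green
    ONE = 1   # promoter points left: RFP expressed, cells are red


def reporter(state: State) -> str:
    """Which fluorescent protein the promoter drives in a given state."""
    return "GFP" if state is State.ZERO else "RFP"


def apply_pulse(state: State, integrase: bool, excisionase: bool) -> State:
    """Idealized response of the bit to a pulse of protein expression.

    Assumption: only the two operations named in the text change the state.
    """
    if integrase and not excisionase and state is State.ZERO:
        return State.ONE    # "set": integrase alone flips 0 -> 1
    if integrase and excisionase and state is State.ONE:
        return State.ZERO   # "reset": integrase plus excisionase flips 1 -> 0
    return state            # otherwise the stored bit is held


bit = State.ZERO
bit = apply_pulse(bit, integrase=True, excisionase=False)   # set
print(bit, reporter(bit))
bit = apply_pulse(bit, integrase=False, excisionase=False)  # no input: hold
print(bit, reporter(bit))
bit = apply_pulse(bit, integrase=True, excisionase=True)    # reset
print(bit, reporter(bit))
```

The point of the design is visible in the last branch: once the bit is written, it should stay put with no proteins expressed at all.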

Testing

They subjected their RAD module to a battery of tests to understand its operation and confirm their final design was successful. After confirming that they could easily observe the difference between the 0 and 1 state (the difference was quite sharp), they tested the "set" and "reset" functions separately. The "set" function worked as expected (see figure). A culture of cells in the 0 state was switched to 1 with >95% efficiency, and the 1 state persisted after the inducer was removed. The "reset" function, however, did not initially work as expected. This was an instance where what worked well in vitro did not translate to success in vivo. Instead of one-way flipping from 1 to 0, they observed random back-and-forth flipping. In the figure, you can see that cells are initially overwhelmingly in the 1, or RFP, state. When the cells are induced to "reset," they produce both RFP and GFP, eventually settling into one state or the other when flipping ceases. After some serious troubleshooting involving hundreds of clones, which I won't get into (but which is described in the paper), they found a stoichiometry that worked. By combining computational design, existing synthetic biology parts, directed evolution experiments, and tireless clone screening, they finally obtained a RAD module that set and reset reliably.

To fulfill Endy's dream of having a counter that lasts hundreds of cell divisions, the RAD module needs to be evolutionarily robust. This is a huge problem in synthetic biology, since our constructs usually divert resources from the cell that could be used for reproducing faster. To test this, they grew bacteria carrying the RAD module for 10 days, starting new cultures each day with a sample of cells from the last. At the end of 10 days, the cells had undergone about 120 doublings, and the performance of the RAD module was not observably different. Not only did the module retain its encoded data (0 or 1), but after 90 doublings—the number they happened to choose—the switching mechanism still worked efficiently (see figures).

Conclusions

This project pushed the synthetic biology design process to its limits. As noted in the paper, they began without even knowing the optimal stoichiometry of components. They repeatedly tweaked their design, used libraries of alternative clones, relied on computational design, and even brought in an amusing trick seldom seen in synthetic biology: alternative start codons (substituting GTG for ATG to lower translational output). It is remarkable how much design work and testing went into designing a single bit of memory. Yet Endy speculates about an 8-bit (= 1 byte) memory system, which would be sufficient for the applications he originally spoke about, such as counting cell divisions to halt cancer cell proliferation. (It's not a perfectly accurate description of what actually happens, but it is interesting nonetheless that it takes only 47 doublings to get from a single cell to the 100 trillion that make up the human body [log₂ of 100 trillion ≈ 46.5].) This is definitely a project to watch over the coming years.
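For anyone who wants to check the arithmetic, here is a short Python sketch. It uses the 100 trillion cell figure quoted above and makes the same idealizing assumption of clean doubling from a single cell; the variable names are just for illustration.

```python
import math

CELLS_IN_BODY = 100 * 10**12   # 100 trillion, the figure quoted in the text

# Doublings needed to reach that many cells from one cell, assuming
# every cell divides at every step (an idealization).
exact_doublings = math.log2(CELLS_IN_BODY)
whole_doublings = math.ceil(exact_doublings)

# Largest value an 8-bit counter can hold.
BITS = 8
max_count = 2**BITS - 1

print(f"log2 of 100 trillion = {exact_doublings:.1f}")
print(f"whole doublings needed = {whole_doublings}")
print(f"an {BITS}-bit counter counts up to {max_count}")
```

Under these assumptions, a single byte leaves plenty of headroom for a division counter.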

Image Credits (in order of appearance): Fig. 1A, Fig. 2B and 2C, Figs. 2F and 2G (all from ref. 1)

References:

1. Bonnet, J., Subsoontorn, P., & Endy, D. Rewritable Digital Data Storage in Live Cells via Engineered Control of Recombination Directionality [Open Access]. PNAS. Published online May 21, 2012.

2. Specter, M. "A Life of Its Own: Where Will Synthetic Biology Lead Us?" [Open Access]. The New Yorker. September 28, 2009.

3. Haynes, K. A. et al. Engineering Bacteria to Solve the Burnt Pancake Problem [Open Access]. Journal of Biological Engineering 2, 8 (2008).

2 Comments

June 11, 2012 | 11:03 PM

Posted By: Eric Sawyer

Hi Khalil--
Drew Endy has done some amazing things for synthetic biology, and he's showing no signs of slowing down. The New Yorker article touches on a lot of things from a bunch of perspectives in the field, and I find myself looking at it occasionally too for general reference. --Eric

June 08, 2012 | 09:06 AM

Posted By: Khalil A. Cassimally

Wow, that's a really interesting study. Now bookmarking the New Yorker article!