Determining the gene-expression profile of a cell is crucial to understanding how its DNA blueprint gives rise to its physical characteristics and behaviours. The current standard approaches, RNA sequencing and single-cell imaging, generate detailed snapshots of gene-expression profiles. However, these techniques capture such profiles only at the moment of analysis, and kill the cells in the process. This makes it hard to capture fleeting gene-expression profiles, or to build a complete picture of cells going through major behavioural or environmental changes. Writing in Nature, Schmidt et al.1 report progress in overcoming this challenge by enlisting a bacterial-defence system that can create a DNA record of specific RNA sequences in a cell.
The CRISPR–Cas bacterial-defence systems are probably best known for their application in genetic engineering to cleave specific DNA sequences2. But another feature of these systems is the incorporation of snippets of DNA from unwanted intruders into a bacterium’s own genome. These stored sequences provide a permanent ‘memory’ of infection, which can enable a defensive response if the same sequences are encountered again. The nucleotides are added to the cell’s DNA in a configuration called a CRISPR array. The sequence of an array alternates between identical repeat sequences and the incorporated snippets, which are called spacers. As spacers are acquired, the array lengthens, and the positioning of spacers in the array reflects the order in which they were inserted3.
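The alternating repeat-and-spacer layout described above can be illustrated with a minimal Python sketch. The repeat, leader and spacer sequences here are invented toy values, and the function name `extract_spacers` is hypothetical; real arrays have much longer repeats, and real analyses would tolerate sequencing errors rather than rely on exact matching.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the spacers that sit between consecutive copies of `repeat`.

    Assumes exact, error-free repeats (a simplification for illustration).
    """
    parts = array.split(repeat)
    # parts[0] is the sequence before the first repeat and parts[-1] is the
    # sequence after the last repeat; everything in between is a spacer.
    return [p for p in parts[1:-1] if p]


# Toy values, not taken from any real organism.
REPEAT = "GTTTTAGA"
toy_array = "AAAA" + REPEAT + "CCCCC" + REPEAT + "ACACA" + REPEAT + "AA"

spacers = extract_spacers(toy_array, REPEAT)
print(spacers)
```

Because the spacers are returned in the order in which they appear in the array, their positions preserve the relative order of acquisition that the text describes.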
Almost all CRISPR–Cas systems acquire foreign genetic material by directly capturing DNA from an invader. Some previous work exploited this feature of CRISPR–Cas systems to record information in the form of acquired and stored nucleotide sequences. For instance, one approach4,5 used CRISPR–Cas-mediated acquisition of externally provided synthetic DNA to capture sequences in a specific order. The particular order of the nucleotides in the spacers was subsequently ‘decoded’ to link each CRISPR array to pixels in sequential images5. Another study6 used chemical cues from the environment to drive expression of a gene controlling the abundance of a form of circular DNA called a plasmid. As plasmid abundance rose in the cell, the plasmid became the preferred source of DNA snippets for new spacers; this linked the presence of the chemical cue to a stored spacer that matched the plasmid DNA. That study, in particular, set the stage for the use of CRISPR–Cas to record the expression of one or a few genes. Yet it was unclear how this approach could be extended to provide a comprehensive record of the gene-expression profile of a cell.
Schmidt and colleagues devised a creative solution by focusing on CRISPR–Cas systems that capture invading RNA rather than DNA7 (Fig. 1a). These systems need only two proteins to achieve this feat, with one protein making a DNA version of the RNA sequence that becomes the spacer. Being able to generate DNA from RNA raised the possibility that this DNA could be used to document the identity and abundance of RNA transcripts, and therefore capture a cell’s gene-expression profile.
To use these CRISPR–Cas systems, the authors first had to overcome two technological hurdles. One hurdle was finding efficient RNA-capturing Cas proteins, because previously characterized proteins were inefficient at this task. The authors tested a large and genetically diverse set of Cas proteins, and identified clear winners from the human gut bacterium Fusicatenibacter saccharivorans. The other hurdle was being able to conduct DNA sequencing that focuses on the few CRISPR arrays that had obtained a new spacer, because most arrays were unaltered. The authors overcame this hurdle by developing a simple approach that selectively isolates the CRISPR arrays that have newly acquired spacers.
With these advances made, the authors went on to develop a method they called Record-seq for capturing gene-expression profiles. They genetically engineered the bacterium Escherichia coli to contain the RNA-acquisition proteins from F. saccharivorans. They then verified that these proteins could incorporate spacers into the genetic information of the E. coli cell, and that RNA rather than DNA sequences determined the corresponding spacer DNA.
In Record-seq, the RNA-acquisition proteins are expressed during the recording of the gene-expression profile. At the end of this period, a sample of the cell population is taken. Newly expanded CRISPR arrays are isolated and sequenced, and the spacers are matched to the corresponding genomic sequences.
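The final step, matching sequenced spacers to the genomic sequences they came from, can be sketched in Python as follows. This is only an illustration under stated assumptions: the gene names and sequences are invented toy data, the helper names `reverse_complement` and `assign_spacers` are hypothetical, and exact substring matching on either strand stands in for whatever alignment procedure the authors actually used.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def assign_spacers(spacers: list[str], genes: dict[str, str]) -> Counter:
    """Count how many spacers match each gene.

    Assumption: a spacer is assigned to a gene if it (or its reverse
    complement) occurs exactly once among the genes; unmatched or
    ambiguous spacers are ignored.
    """
    counts: Counter = Counter()
    for spacer in spacers:
        rc = reverse_complement(spacer)
        hits = [name for name, seq in genes.items() if spacer in seq or rc in seq]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


# Invented toy data; real spacers are longer than these.
toy_genes = {
    "geneA": "ATGGCTAAAGGTCTGACC",
    "geneB": "ATGTTTCCGCACGATTGA",
}
toy_spacers = ["GCTAAAGG", "TTTCCGCA", "AAAGGTCT", "GGGGGGGG"]

spacer_counts = assign_spacers(toy_spacers, toy_genes)
print(spacer_counts)
```

The resulting per-gene counts are the kind of tally that could then be compared with transcript abundance measured by RNA sequencing.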
The next steps were to prove that the method could faithfully create a record of gene expression and to determine what could be discerned about the cellular environment during the recording period. The authors found that Record-seq could record hundreds to thousands of different RNA transcripts present in the cell at any time. Although there was a strong bias towards capture of highly abundant transcripts, the transcript abundance of particular RNAs, as assessed by RNA sequencing, generally correlated with the frequency with which a corresponding spacer sequence was acquired in the sample. Furthermore, the collection of spacers could form a particular pattern depending on the growth conditions in which the cells were cultivated, allowing the authors to use such a spacer ‘fingerprint’ as a way to discern the conditions that the cells had experienced.
One key outcome was that the authors determined the characteristics that seem to govern the selection of RNA snippets (typically averaging around 40 base pairs in length) by the CRISPR-acquisition machinery during the process that generates spacers. Schmidt and colleagues found that the snippets were rich in adenine and thymine nucleotides, and often came from either one of the two ends of an RNA transcript. Unexpectedly, the authors found no obvious preference for specific sequences flanking the RNA regions used to make RNA snippets. Such flanking sequences, often termed protospacer-adjacent motifs (PAMs), are needed for the recognition process that enables CRISPR–Cas defences to specifically cleave the intended target sequence in the invader but not to cleave the same sequence present in the array8. The system therefore might generate some spacers that will not enable an effective immune response to be launched because the corresponding target sequences are not flanked by a PAM. This possibility, and the ability of the RNA-acquisition proteins to acquire RNA snippets from the bacterium’s own transcripts, raises questions about whether, and, if so, how, these systems effectively defend cells from unwanted intruders.
Arguably the most important demonstration of their method came when the authors compared Record-seq with direct sequencing of RNA. In one key experiment, the authors evaluated how well each technique could capture bacterial cells’ transcriptional responses to a brief exposure to the toxic molecule paraquat. They found that only Record-seq could capture both transient and dosage-dependent features of the transcriptional response to paraquat exposure (Fig. 1b,c).
Schmidt and colleagues have laid the groundwork for using Record-seq to monitor complex gene-expression profiles over time, although there are some immediate technical limitations that must be overcome. One current limitation is that spacer acquisition remains highly inefficient, requiring at least 10 million bacterial cells to faithfully record an expression profile. Another is that the authors tested their system only in bacterial cells, whereas much of the future potential of Record-seq might lie with animal and plant cells. Last, Record-seq was used to sequence arrays that have only one or two spacers, for reasons relating to how the newly expanded arrays were isolated and sequenced. If the technique is modified to analyse longer arrays, this could provide a way of discerning the timing and intensity of more than one cellular event during the same recording period. The successful application of DNA-based CRISPR technologies in various multicellular organisms, along with ongoing advances in the engineering of Cas proteins9–11, offers hope that Record-seq might overcome these challenges and eventually provide a robust and widely used technology.
As Record-seq is further developed, it might have many applications. Could it be used to track spatio-temporal changes in gene-expression profiles in multicellular systems and shed light on the development of animal and plant tissues and organs? Perhaps microbial communities in fluctuating micro-environments or the interactions between a pathogen and its host during infection could be monitored using this technique. Finally, will it be possible to use cells engineered to perform Record-seq to monitor gene expression in difficult-to-access environments, such as the human gut, or to identify gene-expression profiles that are a signature of disease or abnormality? Schmidt and colleagues’ technique might transform how gene-expression profiles are monitored in vivo in cells, and it highlights yet another aspect of CRISPR–Cas systems that can be harnessed to make powerful technologies.
Nature 562, 347–349 (2018)