A longstanding goal of synthetic biology has been the programmable control of cellular functions. Central to this is the creation of versatile regulatory toolsets that allow for programmable control of gene expression. Of the many regulatory molecules available, RNA regulators offer the intriguing possibility of de novo design—allowing for the bottom-up molecular-level design of genetic control systems. Here we present a computational design approach for the creation of a bacterial regulator called Small Transcription Activating RNAs (STARs) and create a library of high-performing and orthogonal STARs that achieve up to ~ 9000-fold gene activation. We demonstrate the versatility of these STARs—from acting synergistically with existing constitutive and inducible regulators, to reprogramming cellular phenotypes and controlling multigene metabolic pathway expression. Finally, we combine these new STARs with themselves and CRISPRi transcriptional repressors to deliver new types of RNA-based genetic circuitry that allow for sophisticated and temporal control of gene expression.
RNA is increasingly recognized as a powerful biomolecule for controlling gene expression and engineering synthetic cellular functions1,2,3,4. One of the reasons for this is that natural and engineered RNA-based regulators are now available that can control almost every aspect of gene expression5. In addition these regulatory functions can be enacted and tuned by the programmable formation of specific RNA structures, which mediate interactions with cellular machinery to perform gene regulation. For example in bacteria, the formation of simple RNA structures such as hairpins within mRNAs can prevent their transcription and translation2. Moreover, these cis-acting RNA structures can be further controlled through interacting with trans-acting small RNAs (sRNAs) or binding of a ligand, to prevent or allow their formation—in effect creating inducible genetic control elements2, 3. This combination of versatile genetic regulation controlled by simple RNA structures creates the intriguing possibility of using nucleic acid design algorithms to create RNA regulators de novo 1, 2. Thus RNA as a substrate for molecular programming has a potential major advantage over less designable protein regulators, and there is great promise for RNA synthetic biology to allow for the bottom up molecular-level design of genetic control systems.
Recently, there has been great progress towards this vision, with numerous demonstrations of synthetic RNA regulators that have been rationally engineered from natural versions6,7,8,9,10,11 or designed de novo 12,13,14,15,16,17,18,19,20,21. However, while computational design of RNA regulators has been possible for many years, a major challenge has been the design of high-performing mechanisms that exert large fold changes in gene expression. This is recently beginning to change, most notably for RNA-based translational regulators. For example, improvements of biophysical models of translation initiation have resulted in more accurate design of constitutive22 and ligand switchable23 cis-acting translational regulators. In addition, there has been tremendous advances in the computationally-driven de novo design of orthogonal libraries of trans-acting sRNA translational activators called toehold switches, which function with dynamic ranges in the hundreds of fold activation21.
While significant advances have been made in the computational design of translational RNA regulators, it has been more difficult to design RNA regulators of transcription. This is an important challenge, as RNAs that control transcription have the potential to be more general, able to regulate both RNA synthesis and downstream protein synthesis. In addition, they also offer the distinct advantage of being able to implement genetic regulatory circuits entirely as RNAs7 that operate on the fast timescales of RNA degradation24. The challenge of the de novo design of RNA transcriptional regulators is due to several reasons. First, these regulators depend on the formation of transient RNA structures such as intrinsic terminator hairpins that form cotranscriptionally in a kinetically-driven, out-of-equilibrium folding regime25,26,27,28. However, RNA folding and design algorithms have largely been developed to model and design RNA structures that are in an equilibrium folding regime. Thus RNA design algorithms are not naturally suited to the design of transient RNA structures. Second, the RNA structures underpinning natural transcriptional RNA regulators such as riboswitches are often more complex than just simple hairpins, and depend on interactions that are harder to design such as pseudoknots and non-canonical base pairs29. While there has been some progress in designing riboswitches15, 16, these challenges have so far prevented the de novo design of sRNA transcriptional regulators that show protein-like dynamic ranges.
One solution to this challenge is to identify transcriptional regulatory mechanisms that function through simple RNA structures that use motifs that design algorithms can exploit. Towards this end, we recently created synthetic sRNA transcriptional activators called Small Transcription Activating RNAs (STARs) (Fig. 1a), which are based upon the conditional formation of simple RNA hairpins30. This regulator is composed of a target RNA placed upstream of a gene to be regulated that, once transcribed, folds into an intrinsic terminator hairpin that prevents transcription of the downstream gene (OFF state). To activate transcription, a STAR is expressed that interacts with the target RNA, preventing terminator formation and allowing transcription of the downstream gene (ON state). While our original development of STARs provided a simple transcriptional activation mechanism, our approach was dependent on the unpredictable reengineering of natural terminator hairpin sequences into STARs. This resulted in a small number of orthogonal STARs and often low fold of activation, which could only be modestly improved through RNA engineering strategies31.
However, the inherent simplicity of the STAR mechanism could potentially permit the application of nucleic acid design algorithms for the de novo design of STARs to address these current limitations. To deliver on this potential, we describe here the creation of a new STAR design motif that allowed for the computational design of large libraries of STARs with high-dynamic ranges. Moreover, our computational design approach offered an immediate solution to the often complex problem of creating independently acting, or orthogonal, RNA regulators.
These achievements then allowed us to demonstrate the versatility of designable RNA transcriptional control in a number of contexts of importance to synthetic biology approaches and applications. Because STARs control transcription, they can be easily interfaced synergistically with commonly used constitutive and inducible regulatory systems to switch and tune their performance. We also demonstrate that these STARs can function robustly in different genetic and environmental contexts by showing that they can control diverse mRNAs in vivo encoded on plasmids and the genome, and can tightly control gene expression within in vitro cell-free gene expression systems. We also show broad utility of these STARs—from reprogramming cellular phenotypes to controlling metabolic pathway expression.
Finally, we demonstrate several contexts in which STARs can be leveraged to create RNA-only genetic network motifs of increasing sophistication. This includes the creation of one-to-many control of gene expression with a STAR single-input module, the creation of an activation-activation STAR cascade, and the creation of a split STAR AND gate. Furthermore, interfacing the complementary regulation offered by both STARs and CRISPR interference (CRISPRi) transcriptional repressors32 allowed construction of an activation-repression cascade and a NIMPLY logic-gate, which when combined enabled us to create the first RNA-based incoherent feed-forward loop that creates a pulse of gene expression.
Computational design of STAR libraries
As a starting point towards computationally designing STARs, we first sought to uncover a simple STAR design motif by mechanistically analyzing the role of each sequence region of the STAR-target complex (Supplementary Fig. 1). Through this analysis we observed that while STAR sequestration of the terminator hairpin is required to prevent terminator formation and activate transcription, STAR binding of the linear region appeared to be critical for seeding this interaction. As such, we inferred that the interaction between the linear region of the target RNA and corresponding binding region of the STAR serves as the key recognition region of the STAR regulatory system. This observation raised the intriguing possibility that functionally diverse and orthogonal STARs could be simply created by varying the linear region while keeping constant the terminator scaffold of the most efficient AD1 STAR system30.
We next sought to use this design motif to create a library of STAR variants using the Nucleic Acids Package (NUPACK) RNA online design tool33, 34 (Supplementary Fig. 2 and Supplementary Note 1). NUPACK is a suite of algorithms that perform statistical thermodynamic calculations of RNA structures to either predict an RNA structure from a given sequence, or to design RNA sequences that will fold into user-defined structures. While NUPACK is a powerful suite of design tools, its application to designing STARs is nontrivial. This is because the statistical thermodynamic calculations behind NUPACK’s design and prediction algorithms are focused on RNA structures that are at equilibrium, whereas STAR regulation is likely governed by kinetic, out-of-equilibrium folding regimes. Nevertheless, by focusing the design on the STAR-target linear recognition region while maintaining the terminator scaffold we hypothesized that we could utilize NUPACK to design efficient interaction sequences that limited competing intramolecular structures, while maximizing intermolecular interactions (Supplementary Fig. 2).
Using our STAR design motif and NUPACK, we designed and created a library of 100 STAR variants. To functionally characterize this library, plasmids encoding target RNA-reporter fusions were transformed into E. coli cells and fluorescence measured in the presence of plasmids encoding the cognate STAR or a no-STAR control plasmid (Supplementary Fig. 3). Characterization of this STAR library revealed a broad-range of functionality, with each STAR showing distinct OFF and ON levels, and folds of activation (Fig. 1b and Supplementary Fig. 4), which we subsequently confirmed were due to changes in transcription using RT-qPCR (Supplementary Fig. 5). We observed that 37 variants show > 50 fold activation—comparable to that of the previously best performing STAR30 (labeled variant 28 in Fig. 1b). In addition, numerous variants showed significantly higher folds of activation up to ~ 400 fold (Fig. 1c). Furthermore, we were able to achieve 9171±3881 fold activation (± refers to standard deviation of n = 9 biological replicates) by combining the best STAR variant with a commonly used unstable GFP (GFPmut3b-ASV) that reduced the OFF-level fluorescence to be nearly indistinguishable from cellular autofluorescence (Fig. 1c). Finally, we showed that our best performing STAR compared favorably to the best performing computationally designed toehold switch translational activator21, with the distinct advantage of offering lower OFF levels in our experimental configuration (Supplementary Fig. 6). As such we showed that we can not only computationally design STARs, but in doing so attain some of the highest-dynamic ranges achieved by an RNA regulator to date.
We next aimed to determine whether we could uncover design principles from this functionally diverse STAR library. Studying the functional characteristics of the library first, we observed that a prerequisite for a STAR:target to achieve a high-dynamic range was tight control of gene expression in the OFF state. Thus the transcription termination efficiency of the target RNA represented a key determinant of overall performance. Computational analysis of the library revealed several sequence and structural features that suggested the formation of base stacking interactions and secondary structures within the linear region of the target RNA negatively impacted transcription termination efficiency (see Supplementary Note 2 for detailed discussion). Thus the best performing STARs more closely satisfied the design criterion of having unstructured linear regions.
We next assessed whether these observations represented a generalizable STAR design principle or were specific for the AD1 terminator hairpin that was used as a terminator scaffold. To test this, we used NUPACK to computationally design a small library of STARs using the terminator from the E. coli ribA gene as a terminator scaffold30. This ribA STAR library was characterized and shown to be functional, demonstrating that our STAR design strategy was generalizable to other terminator hairpins (Supplementary Fig. 7). Interestingly we again observed a strong negative correlation between termination efficiency and the presence of base stacking interactions and predicted secondary structure in the linear region, suggesting this to be a general design principle of STARs (see Supplementary Note 2).
Finally, we sought to explore whether this design strategy can yield orthogonal, or independently-acting, STAR variants. To do so, we developed a computational algorithm to predict the orthogonality between members of the STAR library (Supplementary Note 3). In this algorithm, NUPACK35 was first used to predict base pairing between all target RNA and STAR variants (10,201 combinations) (Supplementary Fig. 8). This data was then used to select predicted orthogonal subsets of STAR-target variants, by identifying different variants to be orthogonal if the number of predicted base pairs was less than a cut-off value of 13 that we experimentally determined (Supplementary Fig. 9). To validate these predictions, we selected a subset of 6 STAR variants that were predicted to be orthogonal (Supplementary Fig. 10) and experimentally confirmed that these STARs were indeed highly-orthogonal (Fig. 1d and Supplementary Fig. 11). This demonstrated that orthogonal STARs can be easily identified with purely computational methods, circumventing the need to experimentally screen thousands of possible combinations.
In summary, by combining nucleic acid design algorithms with a simple design motif we can computationally design and create de novo large libraries of STARs. Moreover, we showed that through the creation of this library we can identify high-performing and orthogonal variants, and also obtain a deeper understanding of the sequence-structure-function relationship of this regulatory system.
Harnessing STARs to switch and tune genetic control elements
Synthetic biology has created a wealth of tools for fine-tuning and optimizing multi-gene genetic programs36. Perhaps the most prevalent are strength variants of constitutive promoters and ribosome binding sites (RBS). We hypothesized that STARs could be synergistically combined with these constitutive control elements to create switchable versions, in which the STAR system provides tight-control of the OFF state and promoter:RBS variants provide tunable control of the ON state. To test this concept, we combined a library of promoter and RBS strength variants (Supplementary Fig. 12) with a target RNA expression cassette (Fig. 2a). Characterization revealed that through different combinations of promoter:RBS variants, highly-tunable activation levels can be achieved—all while maintaining stringent repression in the OFF state (Fig. 2a and Supplementary Fig. 13). Thus STARs provide a generalizable approach to create switchable control elements from the large number of existing libraries of constitutive regulatory parts37.
The ability of STARs to interface with constitutive control elements motivated us to examine the effects of interfacing STARs with inducible regulators. We first demonstrated that tunable activation could be achieved by using a chemically inducible promoter system to titrate STAR expression and consequently, the level of activation (Fig. 2b). Interestingly, we also observed that different STAR systems with distinct ON levels resulted in inducible systems with distinct transfer functions, varying in the maximum ON level of expression. Moreover, this ON level could be further fine-tuned through combining this inducible-STAR system with the target-RBS library described above (Supplementary Fig. 14). Thus we were able to simply tune the dynamic range of the inducible system without having to undertake the laborious task of either engineering the promoter-transcription factor interface, or the transcription factor itself.
These results demonstrated the power of interfacing easily designable RNA regulators with existing genetic control elements to switch and tune their regulatory properties, all without having to engineer the DNA or protein components.
STARs function in diverse genetic and environmental contexts
A prerequisite for a robust regulator is that it should predictably function in a variety of contexts. To demonstrate this for STARs, we first confirmed that STAR functionality is maintained independent of the gene being regulated. Three specific STARs were chosen that achieved high-dynamic ranges through a high ON level (STAR 6), low OFF level (STAR 5) or a combination (STAR 8). We combined these STARs to control the expression of three sequence diverse fluorescent proteins and confirmed that the different STARs functioned consistently, independent of the gene being regulated (Fig. 3a).
We next demonstrated that STARs can regulate genes located on the genome by integrating a STAR-controlled sfGFP expression cassette into the genome of E. coli. This led to a near infinite dynamic range because of the indistinguishable OFF level from cellular autofluorescence (Supplementary Fig. 15). We also demonstrated that STARs could regulate endogenous genes on the genome by engineering a strain of E. coli to only express the chemotactic regulator CheZ in the presence of a STAR. As CheZ controls chemotaxis in E. coli, we inferred CheZ expression by performing semi-solid agar motility assays. We observed a significant increase in motility in the presence of the cognate STAR, providing further evidence that STARs can control expression of genomic genes, and can even be used to reprogram cellular phenotype (Fig. 3b and Supplementary Fig. 16).
Finally, we demonstrated the ability of STARs to control gene expression within E. coli based cell-free transcription and translation (TX-TL) systems38, 39, which are gaining momentum for a range of applications including as a platform for rapid prototyping of genetic circuitry24, 40 and paper based diagnostics41, 42. We demonstrated STARs can tightly regulate the synthesis of both sfGFP (Supplementary Fig. 17) and the enzyme catechol 2,3-dioxygenase (C23DO) (Fig. 3c), all while maintaining their high-dynamic ranges observed in vivo. The C23DO regulation example could be particularly useful for alternative colorimetric outputs for diagnostic applications41, 42.
Taken together, we have shown that STARs function robustly to provide large dynamic range control of gene expression in a wide-variety of genetic and environmental contexts.
STARs allow control of multi-gene metabolic pathways
Metabolic engineering is a major application of synthetic biology and there has been much interest in applying genetic control strategies to maximize metabolic pathway productivity and yields. Towards this, we investigated how STARs can be used to control the expression of a multi-gene pathway for the production of deoxyviolacein—a purple compound with valuable anti-bacterial, anti-viral, and anti-tumor properties (Fig. 4a). To do this, a target RNA with a low OFF level was constructed upstream of the four gene deoxyviolacein pathway, and deoxyviolacein production characterized through solvent extraction and spectral quantification (Fig. 4b). This characterization revealed that STARs allowed for tight control of the deoxyviolacein pathway expression, with almost no deoxyviolacein production observed in the absence of STAR. This result confirmed STARs can tightly control metabolic pathways, and more generally be used to regulate multi-gene operon expression through a single regulatory locus.
STARs allow for new RNA regulatory network motifs
Construction of synthetic regulatory networks analogous to those seen in natural systems has been a long-standing goal of synthetic biology43, 44. Towards this goal, we aimed to utilize STARs for creating RNA-only regulatory network motifs in which signals are propagated through transcriptional regulation. We first aimed to use STARs to create a single-input module (SIM)45, in which a STAR is used to coordinate expression of multiple genes. In this manner, instead of a STAR controlling a single gene (one-to-one) in a cell, a STAR is used to control the expression of multiple genes simultaneously (one-to-many). To test this concept, we configured a high-performing target RNA with a low OFF state upstream of three genes encoding mRFP, sfGFP and the enzyme C23DO. STAR activation of each gene was characterized in cells containing only a single gene at a time (one-to-one), as well as in cells containing all three genes at once (one-to-many) (see Supplementary Fig. 18 for details of C23DO characterization). Characterization revealed the SIM to be functional and able to significantly activate all three genes simultaneously (Fig. 5a). Interestingly, when the one-to-many and one-to-one circuits were compared, a decrease in the sfGFP expression level in the presence of the STAR was observed in the one-to-many case. We hypothesized that this decrease could be due to either a retroactivity constraint of using the same pool of STAR to activate multiple targets46, or from contextual effects of expressing both mRFP and sfGFP from a single plasmid47.
We next sought to demonstrate that STARs can be used to construct an activation-activation cascade, whereby the transcriptional output of one STAR regulator is used to drive the expression of another STAR. Using two STARs from our orthogonal library, we constructed a three-stage STAR activation-activation cascade. In this design, a STAR activates a target RNA that is transcriptionally fused to an orthogonal STAR, which in turn activates the expression of an sfGFP output (Fig. 5b and Supplementary Fig. 19). An endoribonuclease RNA processing platform48 was used to separate the transcriptionally fused target RNA-STAR transcript of stage 2 of the cascade. Each of the cascade stages was constructed on separate plasmids and combinations of cascade stages were experimentally characterized (Fig. 5b). This revealed the cascade to be functional, showing maximal activation (~ 4 fold) only in the presence of the full cascade. We note that the fold of activation of the full cascade was lower than that of the lowest fold activation STAR:target pair which limits cascade performance (4 fold compared to 34 fold), which we hypothesized could be due to changes in the relative plasmid copy number encoding the STAR:target RNA.
Finally, we aimed to use STARs to construct an AND logic gate (A AND B). Previously we had shown that this could be achieved by transcriptionally fusing two orthogonal target RNAs in tandem30 to perform signal integration at the level of the target RNA. While successful, our mechanistic studies suggested a potentially simpler strategy was to use a single target RNA and perform signal integration at the level of the STAR. This is similar to a recently published strategy with split toehold translational activators49. We hypothesized that if a STAR was split into two halves between the linear and terminator binding regions (referred to as A and B), that neither half alone could activate transcription. However, if interaction sequences were designed between these two STAR halves, when both were present they would form a complex (AB) that would be capable of activating transcription (Fig. 5c). To test this concept, NUPACK was used to design the interaction sequences between the split STARs. Characterization revealed this AND logic gate design to be functional, only maximally expressing when both STAR halves A and B were present (Fig. 5d).
In summary we have demonstrated that STARs can be configured to expand the variety of RNA-only transcriptional network motifs that serve as the basic building blocks for the construction of entirely synthetic regulatory networks45.
Combining STARs with CRISPRi to create novel network motifs
CRISPRi has emerged as a powerful tool to repress transcription in bacteria32, while STARs represent a versatile tool to activate transcription. We therefore sought to determine whether STARs and CRISPRi are functionally composable into network motifs that can dynamically control gene expression. To do this, we first demonstrated that STARs can be used to control expression of a single guide RNA (sgRNA) from a CRISPRi system32, in effect creating an activation-repression cascade (Fig. 6a and Supplementary Fig. 20). As for the activation-activation cascade, an endoribonuclease RNA processing platform48 was used to cleave the transcriptionally fused target RNA-sgRNA transcript. By using a target RNA with low OFF level expression we were able to tightly control expression of the sgRNA in the absence of STAR, while achieving repression comparable to a constitutive sgRNA control when STAR was present (Fig. 6a).
We next used the combination of activation and repression offered by both systems to create a NIMPLY logic gate (A AND NOT B) (Supplementary Fig. 21). Characterization of the NIMPLY gate revealed it to be functional—expressing mRFP only when the STAR (Input A) was present (Fig. 6b).
Finally, the activation-repression cascade and the NIMPLY logic gate were combined to create the first RNA-based feed-forward loop45. The network architecture of an incoherent type 1 feed-forward loop (I1-FFL) results in a temporal gene expression profile that shows accelerated response compared to direct activation, and an overshooting of the steady-state to create a pulse of gene expression45 (Supplementary Fig. 21). We observed both of these characteristics in our STAR and CRISPRi I1-FFL—with a response time (defined as the time to reach half of the steady-state expression level) of 1.2 h compared to 3.4 h for direct activation, and a pulse of >25% of the final steady-state expression level (Fig. 6c and Supplementary Fig. 22).
Thus we have demonstrated that STARs and CRISPRi are highly composable, allowing for the creation of novel RNA-based network motifs. Given the numerous network motifs that require both activation and repression, the synergistic combination of high-performing CRISPRi and STARs provides a powerful toolset for the engineering of synthetic network motifs.
In this work, we have shown that by combining a simple RNA design motif with computational RNA structure design tools, we can easily design large libraries of high-performing RNA-based transcriptional activators called STARs. These new STARs achieve high functional performance, with the best-performing variants representing some of the highest dynamic ranges achieved by RNA-based regulators to date. Through analyzing the sequence-structure-function relationship of our STAR library, we uncovered design principles that we anticipate will further improve the robustness of our design strategy. In addition, we showed that orthogonal STARs can be identified entirely computationally, opening the door for large libraries of orthogonal and high-performing STARs to be engineered. Thus, the work described here represents a significant advancement of the STAR regulatory system and our ability to computationally design RNA transcriptional regulators.
More broadly, STARs complement an increasingly comprehensive bacterial toolbox of powerful RNA-based regulators that are achieving protein-like dynamic ranges, such as the toehold translational activators21. Thus alongside CRISPRi transcriptional repressors32, STARs now offer a powerful RNA regulatory tool for controlling transcription. Interestingly, we observed that a distinct advantage of STARs over toehold switches was the tight control of gene expression in the OFF state, which in certain genetic contexts, gave rise to infinite folds of activation. We hypothesize that this reflects an important distinction between RNA transcriptional and translational activators. Both regulatory systems are dependent on a de-repressive mechanism—by default folding into RNA structures that prevent gene expression that are then de-repressed through binding of a trans-acting sRNA to achieve activation. As such, one of the key determinants of the level of gene expression in the OFF state is likely to be the misfolding of these RNA structures into non-repressive conformations. Toehold switch hairpins control translation, and are thus present for longer times during their regulatory regime. This longer time inherently allows for more RNA structural fluctuations that can be exploited by ribosomes to initiate translation even in the OFF state50. STARs on the other hand operate cotranscriptionally, meaning there is inherently a limited time-window in which these misfolding events can occur before the regulatory decision has been made. It is clear from our results that certain RNA sequences can fold into terminator structures extremely efficiently, which combined with the short time window of regulation leads to near perfect OFF levels. Of course this perfection in OFF level may incur a metabolic cost, as many rounds of abortive transcripts may be terminated before a STAR can capture one to activate it. Overall the comparison between toeholds switches and STARs reflects fascinating tradeoffs between equilibrium vs. kinetic RNA design, and starts to uncover principles by which kinetically driven transient RNA structures may perform exquisite feats of regulation.
The emergence of high-performing RNA regulators has already seen their deployment in a variety of application spaces41, 42, 49, 51, 52. Contributing to this, we have demonstrated that STARs are ideally suited for a range of existing and novel applications. First, we found that STARs could be synergistically combined with existing regulators to create new composite systems for tunable control of gene expression. This included combining STARs with promoter:RBS strength variants to provide a new route to create switchable and tunable control elements from existing libraries. In addition we showed that combining STARs with inducible promoter systems could tune their transfer functions, which to date has been largely achieved through engineering of DNA promoters and protein transcription factors53. Thus STARs add to an increasingly powerful toolbox of regulators that can be used in composite with existing regulatory mechanisms to alter their regulatory properties54, 55.
We anticipate that the robustness of STAR regulation in diverse genetic and environmental contexts make them ideally suited for existing applications such as metabolic engineering. For example, the ability of STARs to regulate the expression of individual enzymes of metabolic pathways could be used to rapidly identify optimal expression regimes simply through titration of STAR expression level, circumventing the need to construct large libraries of pathway expression variants56. Moreover, the ability of STARs to regulate multigene metabolic pathways can provide new control points for pathway optimization, particular those that interfere with host metabolism and produce toxic intermediates or products. Finally, the relatively small size, functionality on the genome, and gene-independent regulation could make STARs ideally suited as a tool for strain engineering and reprogramming of cellular phenotypes. For example, STARs could be used to create synthetic switches for decoupling biomass and pathway production57, 58.
STARs also add new tools for construction of transcriptional RNA-based network motifs. Importantly, these STARs not only add high-performing orthogonal transcriptional activators to the RNA genetic circuitry toolbox, but are functionally composable with CRISPRi transcriptional repressors, unlocking network motifs that require both activation and repression. As the repertoire of these basic network motifs increases, more complex composite motifs will be possible, allowing for more sophisticated gene control such as the creation of temporal regulatory programs45. As RNA synthetic biology begins to match the complexity of motifs achieved using protein-based transcription factors, we anticipate the emergence of several advantages of RNA-only transcriptional networks including a smaller genetic footprint, potentially less cellular burden and faster network dynamics2, 24.
Finally, one of the major advantages of RNA synthetic biology is the wealth of computational and experimental tools available to understand sequence-structure-function relationships2. Interestingly, here we show that this can not only further design principles of synthetic RNA regulators but also uncover principles of natural RNA regulators. For example, while there have been a wealth of studies examining how sequence and structure of terminator hairpins impacts terminator efficiency26, 59, this work represents one of the few investigations into the effects of the 5’ sequence contexts of intrinsic terminators. Our results suggest that minimization of RNA structure upstream of a terminator hairpin can dramatically increase terminator efficiency. In agreement with this, single-molecule experiments have also shown that minimization of secondary structure upstream of terminator hairpins can increase terminator efficiency, suggesting terminator efficiency is not only an intrinsic property of the terminator hairpins but also of upstream sequence contexts28. As RNA synthetic biology begins to turn its attention to more diverse regulatory systems, we anticipate that through engineering we will uncover a deeper understanding of natural RNA regulators.
In summary, STARs represent a powerful class of RNA regulatory mechanisms that can be computationally designed to offer high-performing and orthogonal control of gene expression in diverse contexts. The uncovering of a kinetically driven RNA regulatory motif that can nonetheless be designed using equilibrium-focused computational design algorithms gives great hope for more advanced RNA synthetic biology. We anticipate that as we learn more about RNA cotranscriptional folding27, we will unlock even deeper principles of RNA design, and that the advances in RNA synthetic biology have only begun.
All plasmids used in this study can be found in Supplementary Data 1 with key sequences provided in Supplementary Datas 2-4 and Supplementary Tables 1-2. STAR and target expressing DNA plasmids were constructed using inverse PCR (iPCR). Schematic of representative DNA plasmid maps used are shown in Supplementary Fig. 3. All assembled plasmids were verified using DNA sequencing.
Integration of STAR regulated genes into the E. coli genome
Strains containing genomic insertions of STAR regulated genes were created using the clonetegration platform60 as summarized in Supplementary Table 3. The HK022 plasmid was used to integrate constructs into the attB site of the E. coli genome. Successful integrations were identified by selecting for the presence of an integrated antibiotic resistance cassette by growing bacteria in the presence of the corresponding antibiotic and by performing colony PCR to amplify the genome integrated cassette to confirm the size and sequence of this fragment.
In vivo fluorescence measurements
In vivo fluorescence characterization experiments were performed in E. coli strain TG1 except for Supplementary Fig. 15 as described in the figure legend. For the in absence of STAR (-STAR) condition the no-STAR control plasmid (JBL002) was used. Experiments were performed for nine biological replicates collected over three separate days. It should be noted that for flow cytometry if cell counts did not meet the required threshold as described below because of slower growth rate, a fourth day of three more biological replicates was used. For each day of fluorescence measurements, plasmid combinations were transformed into chemically competent E. coli cells and plated on LB + Agar (Difco) plates containing combinations of 100 μg mL−1 carbenicillin, 34 μg mL−1 chloramphenicol and/or 50 μg mL−1 spectinomycin depending on plasmids used, and incubated approximately 17 h (h) overnight at 37 °C. Plates were taken out of the incubator and left at room temperature for approximately 7 h. Three colonies were used to inoculate three cultures of 300 μL of LB containing antibiotics at the concentrations described above in a 2 mL 96-well block (Costar), and grown for approximately 17 h overnight at 37 °C at 1000 rpm in a VorTemp 56 (Labnet) bench top shaker. Four μL of each overnight culture were then added to 196 μL (1:50 dilution) of either supplemented M9 minimal media (1 × M9 minimal salts, 1 mM thiamine hydrochloride, 0.4% glycerol, 0.2% casamino acids, 2 mM MgSO4, 0.1 mM CaCl2) or for the activation-activation cascade and CRISPRi experiments (Figs. 5b, 6a, b and Supplementary Fig. 20) MOPS EZ rich defined media (Teknova) (1 x MOPS mixture, 1.32 mM K2HPO4, 1x ACGU, 1 x supplement EZ, 0.2% glucose) containing the selective antibiotics and grown for 6 h at the same conditions as the overnight culture. For Figs. 5b and 6a, b and Supplementary Fig. 20 4 μL of 5 μg mL−1 of anhydrotetracycline (aTc) was also added (1:50 dilution). For Fig. 2b and Supplementary Fig. 14 4 μL of acyl-homoserine lactone (AHL) (N-(β-ketocaproyl)-L-Homoserine lactone, Cayman Chemical) solution was also added (1:50 dilution) at the indicated AHL concentration. For Fig. 6c and Supplementary Fig. 22 16 μL of each overnight culture were added to 768 μL of MOPS EZ rich defined media containing selective antibiotics and grown for 4 h at the same conditions as the overnight culture. After 4 h 16 μL of either 5 μM of AHL or ddH20 were added and cultures grown for 7 h and fluorescence measured every 1 h. For all bulk fluorescence measurements after subculture 50 μL of this culture was then transferred to a 96-well plate (Costar) containing 50 μL of phosphate buffered saline (PBS). Fluorescence (FL) and optical density (OD) at 600 nm were then measured using a SynergyH1 plate reader (Biotek). The following settings were used: sfGFP fluorescence (485 nm excitation, 520 nm emission), mRFP fluorescence (560 nm excitation, 630 nm emission) or YFP fluorescence (510 nm excitation, 540 nm emission).
For in vivo flow cytometry measurements for Figs. 1b, d and 5d, Supplementary Fig. 4, Supplementary Fig. 11 and Supplementary Fig. 15, subcultures (6–12 μL) were transferred into a FACS round-bottom 96 well plate with 244 μL of PBS containing 2 mg mL−1 kanamycin to stop translation. The plate was then read on a BD LSR II using the high-throughput setting with the high-throughput sampler (HTS). The following parameters were collected on the BD LSR II: forward scatter (FSC), side scatter (SSC), and sfGFP fluorescence (488 nm excitation, 515–545 nm emission). Each sample was required to have at least 5000 counts or up to 50,000 counts. Counts were gated as explained in Supplementary Note 4 and Supplementary Fig. 23. sfGFP fluorescence values were recorded in relative channel number (1–262,144 corresponding to 18-bit data) and the geometric mean over the gated data was calculated for each sample.
For in vivo flow cytometry measurements for Fig. 1c and Supplementary Fig. 6, subcultures (1–12 μL) were transferred into a FACS round-bottom 96 well plate with 250 μL of PBS containing 2 mg mL−1 kanamycin to stop translation. The plate was then read on a BD Accuri C6 Plus CSampler. The following parameters were collected on the BD Accuri C6 Plus: forward scatter (FSC), side scatter (SSC), and sfGFP fluorescence (FL1: FITC 488 nm excitation, 518–548 nm emission). Each sample was measured up to 50,000 counts. Counts were gated as explained in Supplementary Note 4 and Supplementary Fig. 23. sfGFP fluorescence values were recorded in relative channel number (1–16,777,216 corresponding to 24-bit data) and the geometric mean over the gated data was calculated for each sample.
Bulk fluorescence data analysis
On each 96-well block there were two sets of controls; a media blank and E. coli TG1 cells transformed with combinations of control plasmids JBL001, JBL002 or JBL5999 (blank cells) and thus not expressing reporter genes (Supplementary Data 1 and Supplementary Fig. 3). The block contained three replicates of each control. OD and FL values for each colony were first corrected by subtracting the corresponding mean values of the media blank. The ratio of FL to OD (FL/OD) was then calculated for each well (grown from a single colony) and the mean FL/OD of blank cells was subtracted from each colony’s FL/OD value. Three biological replicates were collected from independent transformations, with three colonies characterized per transformation (9 colonies total). Means of FL/OD were calculated over replicates and error bars represent standard deviations (s.d). Normalized FL/OD was calculated by dividing the mean values of a specific condition indicated in the figure legend and propagating errors. For Fig. 6c and Supplementary Fig. 22 fluorescence data for each condition were individually normalized by dividing by the final fluorescence values at 7 h for each colony before calculating the mean and standard deviation at each time point.
Flow cytometry data analysis
Spherotech 8-Peak Rainbow Calibration Beads (Spherotech) were used to obtain a calibration curve to convert fluorescence intensity (geometric mean, relative channel number) into Molecules of Equivalent of Fluorescein (MEFL). For each experiment, E. coli cells transformed with combinations of control plasmids JBL001 and JBL002 (blank cells) not expressing sfGFP were used as a control. The mean MEFL value of blank cells without sfGFP expression was subtracted from each colony’s MEFL value. Mean MEFL values were calculated for at least 7 biological replicates and error bars represent the standard deviation (s.d). The ON level (+STAR) is the MEFL of cells containing the target expressing DNA plasmid and the STAR expressing plasmid and the OFF level (-STAR) is the MEFL of cells containing the target expressing DNA plasmid and a no-STAR control plasmid (JBL002). The fold activation was calculated by dividing the ON level by the OFF level (ON/OFF). For Fig. 1b a Welch’s t-test was performed on each -STAR and +STAR condition with all STAR variants showing statistically significant differences between -STAR and +STAR conditions (P < 0.05).
In vivo catechol 2,3-dioxygenase characterization
For in vivo catechol 2,3-dioxygenase absorbance measurements E. coli strain TG1 was used. Plasmids were transformed, grown overnight and subcultured as described for in vivo bulk fluorescence measurements. After subculture growth, OD at 600 nm was measured and cells diluted to an OD at 600 nm of 0.1 in 100 μL of M9 media. 98 μL of these cells were then transferred to a 96-well plate (Costar) and 2 μL of 25 mM catechol (Sigma Aldrich) added. Conversion of catechol to 2-hydroxymuconate semialdehyde was characterized by measuring OD at 385 nm in a BioTek SynergyH1 microplate reader for 25 min at 5 min intervals and reactions held at 33 °C. The maximum rate of catechol to 2-hydroxymuconate semialdehyde conversion was determined between 0 and 10 min as described in Supplementary Fig. 18. Repeats and controls were as described for bulk fluorescence measurements.
In vivo deoxyviolacein pathway characterization
For in vivo deoxyviolacein quantification E. coli strain DH5 alpha pir were used. Plasmids were transformed and grown overnight as described for in vivo bulk fluorescence measurements. 150 μL of overnight culture was then removed and pelleted by centrifugation at 13,000 xg for 5 min. The pellet was resuspended in 150 μL of 100% ethanol (EtOH) and cells lysed by boiling for 5 min, followed by centrifugation at 13,000 xg for 5 min. One hundred μL of supernatant was then carefully removed, added to a costar 96-well plate and the deoxyviolacein quantified by measuring OD at 525 nm in a BioTek SynergyH1 microplate reader. OD 525 nm values of EtOH extracts from blank cells containing control plasmids were subtracted from values. Repeats and controls were as described for bulk fluorescence measurements.
RNA extraction and reverse transcription quantitative PCR
For extraction of total RNA for reverse transcription quantitative PCR (RT-qPCR) experiments E. coli strain TG1 was used. Plasmids were transformed and subsequent colonies grown overnight as described for in vivo bulk fluorescence measurements. For each biological replicate, 20 μL of a single overnight culture was added to three wells containing 980 μL (1:50 dilution) of supplemented M9 minimal media containing the selective antibiotics and grown for 6 h at the same conditions as the overnight cultures. For each plasmid combination, 500 μL of cells were removed from three wells (grown from one colony) and combined into a 1.6 mL tube and pelleted by centrifugation at 13,000 xg for 1 min. The supernatant was removed and the remaining pellet resuspended in 750 μL of Trizol reagent (Life Technologies), homogenized by repetitive pipetting, incubated at room temperature for 5 min. 150 μL of chloroform (Sigma Aldrich) was added, mixed for 15 s and incubated at room temperature for 3 min. Following incubation, the samples were centrifuged for 15 min at 12,000 g at 4 °C and 200 μL of the top aqueous layer removed. One μL of glycogen (20 μg μL−1) (Life Technologies) and 375 μL of isopropanol were added to the aqueous phase, incubated at room temperature for 10 min and centrifuged for 15 min at 13,000 xg at 4 °C. Following centrifugation the isopropanol was carefully removed from the total RNA glycogen pellets, washed in 600 μL of chilled 70% EtOH and centrifuged for 2 min at 13,000 xg at 4 °C. EtOH was removed and tubes centrifuged for another 2 min at 13,000 xg at 4 °C to ensure all EtOH was effectively removed. Pellets were resuspended in 20 μL of RNase free double-distilled water (ddH20). Purified total RNA samples were quantified by the Qubit Fluorometer (Life Technologies) and were diluted to a concentration of 10 ng μL−1 in a total of 10 μL RNase free ddH20 and digested by Turbo DNase (Life Technologies) according to manufacturer’s protocol. After digestion 150 μL of RNase free ddH20 and 200 μL phenol chloroform (Acros Organics) was added, vortexed for 10 s and incubated for 3 min at room temperature and centrifuged for 10 min at 13,000 xg at 4 °C. After centrifugation, 190 μL of the top aqueous layer was carefully removed, 190 μL of chloroform added, samples vortexed for 10 s, incubated for 3 min at room temperature and centrifuged for 10 min at 13,000 xg at 4 °C. After centrifugation, 170 μL of the top aqueous layer was carefully removed, 170 μL of chloroform added, samples vortexed for 10 s, incubated for 3 min at room temperature and centrifuged for 10 min at 13,000 xg at 4 °C. After centrifugation, 120 μL of the top aqueous layer was carefully removed and added to 1 μL glycogen, 360 μL of chilled 100% EtOH and 12 μL of 3 M sodium acetate pH 5. Samples were vortexed for 10 s and stored at −80 °C for 1 h. Samples were then centrifuged for 30 min at 13,000 xg at 4 °C. Supernatant was removed and the pellets washed in 600 μL of chilled 70% EtOH. Samples were then centrifuged for 2 min at 13,000 xg at 4 °C and the EtOH removed. Samples were re-centrifuged for 2 min at 13,000 xg at 4 °C and residual EtOH removed, pellets air-dried for 10 min and eluted in 10 μL RNase free ddH2O.
Reverse transcription quantitative PCR measurements
To enable comparison between different samples, each DNase treated sample was normalized to contain the same total RNA concentration. Each sample was quantified using a Qubit Fluorometer and the sample diluted to 0.025 ng μL−1 of total RNA in 20 μL RNase free ddH20. For cDNA synthesis 1 μL of this total RNA, 1 μL of 2 μM reverse transcription primer (Supplementary Table 4), 1 μL of 10 mM dNTPs (New England Biolabs) and RNase free ddH20 up to 6.5 μL were incubated for 5 min at 65 °C and cooled on ice for 5 min. 0.25 μL of Superscript III reverse transcriptase (Life technologies), 1 μL of 100 mM Dithiothreitol (DTT) and 1x first-strand buffer (Life technologies), 0.5 μL RNaseOUT (Life Technologies) and RNase free H2O up to 3.5 μL were then added, incubated at 55 °C for 1 h, 75 °C for 15 min and then stored at −20 °C. qPCR was performed using 5 μL of Maxima SYBR green qPCR master mix (Thermo Scientific), 1 μL of cDNA solution and 0.5 μL of 2 μM sfGFP qPCR primers (Supplementary Table 4) and RNase free ddH2O up to 10 μL. A ViiA 7 real-time PCR machine (Applied Biosystems) was used for data collection using the following PCR program: 50 °C for 2 min, 95 °C for 10 min followed by 30 cycles of, 95 °C for 15 s and 60 °C for 1 min. All measurements were followed by melting curve analysis. A MicroAmp EnduraPlate Optical 384-well plate (Applied Biosystems) and an optically clear seal (Applied Biosystems) were used for all measurements. Results were analyzed using the ViiA 7 software (Applied Biosystems) by a relative standard curve. For quantification, a five point standard curve covering a 10,000-fold range of sfGFP DNA concentrations was run in parallel and used to determine the relative sfGFP cDNA abundance in each sample. It was shown that the sfGFP qPCR primer set had a primer efficiency ~ 80%. All cDNA samples were measured in triplicate and non-template controls run in parallel to control for contamination and non-specific amplification or primer dimers. In addition, qPCR was performed on total RNA samples to confirm no DNA plasmid was detected under the conditions used. Melting curve analysis and DNA gel electrophoresis were performed to confirm that only a single product was amplified.
Characterization of E. coli motility
Plasmids were transformed and subsequent colonies grown overnight as described for in vivo bulk fluorescence measurements in E. coli strains described in figure legends. For each biological replicate, 40 μL of a single overnight culture was added to a culture tube containing 1960 μL (1:50 dilution) of LB containing the selective antibiotics and grown for 6 h at the same conditions as the overnight cultures. We note that 17 μg mL−1 chloramphenicol was used for the genomically integrated chloramphenicol resistance cassettes. After subculture, OD at 600 nm was measured and cultures diluted to an OD at 600 nm of 0.25 with fresh LB containing selective antibiotics. 3 μL of diluted cultures were plated on to the center of 0.25% tryptone agar (1% tryptone, 0.5% NaCl, 0.25% agar) plates (each containing 25 ml of 0.25% tryptone agar) containing selective antibiotics, air dried for 10 min and incubated overnight at 30 °C on a flat surface. Plates were imaged under trans white light illumination using a Chemidoc XRS + (BioRad).
Characterization of STARs in TX-TL
TX-TL cell extract, amino acids and energy solution were gifted by the Noireaux lab (University of Minnesota) and prepared as described in Garamella et al.38 Extract, amino acids and energy solution and a buffer (containing Mg-glutamate, K-glutamate and PEG-8000) were combined and pre-incubated on a heat block at 37 °C for 20 min. After incubation 7.88 μL of the TX-TL mix was added to 2.63 μL of DNA and water solution. Each reaction contained 8 nM of a target DNA plasmid and either 15 nM of a no-STAR control plasmid or 15 nM of a DNA plasmid encoding a cognate STAR. Dilutions of DNA plasmids were calculated using a modified spreadsheet derived from Sun et al.39 For TX-TL characterization of STAR-regulated catechol 2,3-dioxygenase, 1 mM of catechol (Sigma Aldrich) was added while maintaining a total volume of 10.5 μL. 10 μL of each reaction was then placed into their respective wells in a 384-well plate (Nunc) and sealed with an optical clear seal (Thermo Scientific) and measured in a BioTek SynergyH1 microplate reader. For fluorescence measurements, reactions were held at 29 °C and measured every 5 min (485 nm excitation, 520 nm emission). For catechol 2,3-dioxygenase expression reactions were held at 33 °C and OD at 385 nm measured every 5 min. On each 384-well plate a control consisting of a TX-TL reaction containing no DNA was used and the FL or OD values of this reaction were subtracted from each condition to remove background signal from the TX-TL reaction. Rate of sfGFP production was determined by taking the time derivative of fluorescence measurements, and the average rate of sfGFP production calculated from the linear sfGFP production phase as described in Supplementary Fig. 17. n = 3 biological replicates were collected for each independent repeat, for a total of three independent repeats. Mean values were calculated from these n = 9 replicates and error bars represent standard deviations (s.d).
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The authors gratefully acknowledge the gift of TX-TL extract and buffers used in this work from Vincent Noireaux’s laboratory (University of Minnesota). The authors gratefully acknowledge the gift of dCas9 expression plasmids used in this work from Stanley Qi’s laboratory (University of Stanford). The authors gratefully acknowledge the gift of a deoxyviolacein expression plasmid from Robert Egbert of Adam Arkin’s laboratory (University of California, Berkley). The authors also thank John Dueber (University of California, Berkley) for helpful conversations about the deoxyviolacein pathway. The authors also thank Niles Pierce’s laboratory (California Institute of Technology) for helpful conversations about NUPACK and RNA design. This work was supported by an NSF CAREER Award (1452441 to J.B.L.), the Defense Advanced Research Projects Agency (contract HR0011-16-C-0134 to J.B.L.), and Searle Funds at The Chicago Community Trust (to J.B.L.).