Computational design of small transcription activating RNAs for versatile and dynamic gene regulation

A longstanding goal of synthetic biology has been the programmable control of cellular functions. Central to this is the creation of versatile regulatory toolsets that allow for programmable control of gene expression. Of the many regulatory molecules available, RNA regulators offer the intriguing possibility of de novo design, allowing for the bottom-up, molecular-level design of genetic control systems. Here we present a computational design approach for the creation of bacterial regulators called Small Transcription Activating RNAs (STARs) and create a library of high-performing and orthogonal STARs that achieve up to ~9000-fold gene activation. We demonstrate the versatility of these STARs, from acting synergistically with existing constitutive and inducible regulators, to reprogramming cellular phenotypes and controlling multigene metabolic pathway expression. Finally, we combine these new STARs with themselves and with CRISPRi transcriptional repressors to deliver new types of RNA-based genetic circuitry that allow for sophisticated and temporal control of gene expression.

Schematics and fluorescence characterization of STAR variants that were used to determine optimal lengths of (b) the linear binding sequence and (c) the stem and loop binding sequence of the AD1 STAR 5. STAR variants were created by truncating either the (b) 3' or (c) 5' end of the STAR while the target RNA was kept constant. Characterization revealed the optimal lengths were 21 nucleotides (nt) for the stem and loop binding sequence and ~40 nt for the linear binding sequence. In addition, it was observed that neither the stem and loop binding sequence itself (0 nt in (b), indicated by *) nor the linear binding sequence itself (0 nt in (c), indicated by ‡) of the STAR was sufficient to appreciably activate transcription.

Figure 4. Coefficient of variation for fluorescence characterization of STAR variants. Coefficient of variation (CV) was determined by calculating the ratio of the standard deviation to the mean of fluorescence measurements for the STAR:target variants. Data are derived from the fluorescence characterization described in Figure 1b, in both the absence (-STAR) and the presence (+STAR) of cognate STAR expression plasmids. The y-axis was limited to 1 to aid interpretation. The CV for variant 101 in the +STAR condition was 1.93.

[...] at t = 0 to induce STAR expression. STAR accumulates and at t = 1 an activation threshold is reached to activate mRFP transcription, resulting in an increase in mRFP levels. Simultaneously, sgRNA and dCas9 expression is activated and at t = 2 reaches a repression threshold that represses transcription of mRFP, resulting in a decrease in mRFP levels. As a result, the I1-FFL creates a pulse of mRFP expression 9. In addition, if mRFP is not completely repressed by CRISPRi, the I1-FFL should accelerate the response time towards steady state compared to direct activation 9.

Supplementary Note 2. Determining STAR design principles
Our goal was to determine whether we could uncover design principles from the computationally designed STAR library. We began by examining the relationship between target RNA regulatory characteristics and dynamic range. We first compared the fluorescence characterization of our target RNA library in both the absence (-STAR [OFF state]) and presence (+STAR [ON state]) of STAR to the fold activation of each STAR:target variant (Supplementary Fig. 24). We note that variant 101 was excluded because this design gave rise to no activation (Figure 1b). While we saw greater variation in the ON state fluorescence for designs with high dynamic range, we observed that designs with high dynamic range consistently had a low OFF state fluorescence. In other words, target RNA transcriptional termination efficiency in the absence of STAR appeared to be a key determinant of dynamic range.
Given this, we next aimed to determine the relationship between target RNA sequence and structure and the OFF state fluorescence, which we used as a proxy for transcriptional termination efficiency. We began by studying the effect of sequence composition of the computationally designed linear region of the target RNA (Supplementary Fig. 25). Overall, we observed a relatively even distribution of the four nucleotides across the computationally designed target RNAs, with a slightly decreased preference for guanosine (Supplementary Fig. 25a). We next compared how the percentage content of each nucleotide affected OFF state fluorescence. We note that, as percentage nucleotide content is often used as a proxy for free energy, we compared this to the natural log of fluorescence according to a model whereby the observed fluorescence is proportional to the equilibrium constant between the folded and unfolded states of the linear region. This model effectively assumes equilibration of the linear region before termination, which could be valid given the fast timescales of RNA folding and the timescale of polymerase pausing on a polyU tract. Interestingly, we observed a relatively strong negative correlation (R² of ~0.5) between both uracil (U) and cytosine (C) percentage content and the OFF state fluorescence (Supplementary Fig. 25b). Moreover, we observed a similar correlation when U and C content were combined, giving an R² of ~0.5 (Supplementary Fig. 25c). Taken together, this suggests that high UC content improves transcription termination efficiency of the target RNA. We hypothesized that this bias towards UC pyrimidine nucleotides for high termination efficiency may be attributed to base stacking interactions. Stacking interactions between the aromatic nucleotide bases are a major contributor to RNA structures.
For example, in single-stranded RNAs base stacking has been shown to influence structural properties such as rigidity and the formation of partial helical conformations 12. Moreover, it is well established that sequence is a major determinant of base stacking interactions, with decreasing stacking free energies in the order purine-purine, purine-pyrimidine, pyrimidine-purine and pyrimidine-pyrimidine 10. Indeed, when we calculated the stacking free energy of each target RNA's linear region according to previously determined stacking free energy values 10, we observed a negative correlation with OFF state fluorescence (Supplementary Fig. 25d).
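The composition analysis described above can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis script: it assumes the model stated in the text (OFF state fluorescence F proportional to an equilibrium constant, so that ln F is linear in a free-energy proxy such as UC content) and fits a least-squares line to obtain R². The function names `uc_fraction`, `linear_fit_r2` and `off_state_regression` are hypothetical.

```python
import math


def uc_fraction(seq):
    """Fraction of U and C nucleotides in an RNA sequence (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    return (seq.count("U") + seq.count("C")) / len(seq)


def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = a*x + b; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return a, b, r2


def off_state_regression(linear_regions, off_fluorescence):
    """Regress ln(OFF fluorescence) on the UC fraction of each linear region.

    Assumption: F is proportional to K_eq = exp(-dG/RT), so ln F should be
    linear in any quantity that scales with dG.
    """
    x = [uc_fraction(s) for s in linear_regions]
    y = [math.log(f) for f in off_fluorescence]
    return linear_fit_r2(x, y)
```

Under these assumptions, a negative slope returned by `off_state_regression` would correspond to the reported trend of lower OFF state fluorescence at higher UC content. The same `linear_fit_r2` helper could be applied to any other per-variant predictor, such as a summed stacking free energy, provided the values are supplied by the user.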
We next turned to the relationship between secondary structure and termination efficiency. Using NUPACK to predict secondary structure within the target RNA's linear region, we compared the predicted ensemble free energy of each target RNA to the OFF state fluorescence (Supplementary Fig. 26). We observed a negative correlation with an R² of 0.342, suggesting that secondary structure within the linear region of the target RNA negatively impacts the transcription termination efficiency.
Taken together, this suggested that both the presence of base stacking and secondary structure within the target RNA's linear region decreased transcription termination efficiency. We next sought to determine whether these observations represented a general STAR design principle or were specific to the AD1 terminator hairpin that was used as a terminator scaffold for the STAR library. To test this, we used NUPACK to computationally design a small library of STARs using the terminator from the E. coli ribA gene as a terminator scaffold 5. This library was constructed and functionally characterized (Supplementary Fig. 7). We again observed a strong negative correlation between the OFF state fluorescence and the UC content (R² = 0.976), stacking free energy (R² = 0.975) and ensemble free energy (R² = 0.754) of the target RNA's linear region (Supplementary Fig. 27). As such, this suggested that the negative impact of base stacking and secondary structures on transcription termination efficiency is a generalizable principle for the STAR regulatory system.

Supplementary Note 3. Predicting orthogonal STAR libraries
To predict orthogonal STAR:target RNA pairs, we developed an in-house algorithm that uses NUPACK 13 to model STAR:target interactions and select pairs with minimal interactions between non-cognate pairs. This algorithm first creates two NUPACK input files for each of the 101 target RNAs: <prefix>.in and <prefix>.list, where <prefix> is the target variant identifier. The <prefix>.in file specifies the number of strands (101 STARs and 1 target RNA), the 101 STAR variant sequences and the specific target variant sequence. It should be noted that only the linear binding sequences of the STARs and the linear regions of the target RNAs were used.
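As an illustration of this file-generation step, the sketch below writes the two input files for one target variant. It is not the authors' code. It assumes the NUPACK 3 `complexes` input layout (strand count, one sequence per line, then a maximum complex size) and that each line of the .list file gives the 1-based strand indices of one complex. The function name `write_complexes_inputs` and the default `max_complex_size=1` are hypothetical choices, not taken from the source.

```python
from pathlib import Path


def write_complexes_inputs(prefix, star_seqs, target_seq,
                           out_dir=".", max_complex_size=1):
    """Write <prefix>.in and <prefix>.list for one target RNA.

    Assumed layout (NUPACK 3 `complexes`):
      <prefix>.in   : number of strands, each sequence, max complex size
      <prefix>.list : one complex per line, as 1-based strand indices
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # STAR strands come first; the target RNA is the last strand.
    strands = list(star_seqs) + [target_seq]
    target_id = len(strands)

    in_lines = [str(len(strands)), *strands, str(max_complex_size)]
    # One STAR:target heterodimer per STAR variant.
    list_lines = [f"{i} {target_id}" for i in range(1, target_id)]

    in_path = out / f"{prefix}.in"
    list_path = out / f"{prefix}.list"
    in_path.write_text("\n".join(in_lines) + "\n")
    list_path.write_text("\n".join(list_lines) + "\n")
    return in_path, list_path
```

Calling this once per target variant, with the 101 STAR linear binding sequences and that target's linear region, would yield the 101 file pairs described in the text.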
The <prefix>.list file specifies the strand composition of the complexes to be analyzed, in this case all the STAR variants against a single target RNA. The partition function, equilibrium base-pairing probabilities and minimum free energy (MFE) structures of the STAR-target complexes are then calculated by running NUPACK locally using the test tube analysis complexes function 7, 11 with the following options:

complexes -T 37 -material rna -pairs -mfe -degenerate <prefix>
This results in an output file called <prefix>.ocx-mfe, which contains the predicted minimum free energy fold and a dot-bracket structure for each STAR-target complex. The algorithm then compiles the dot-bracket structures of each STAR-target complex and counts the nucleotides that are unpaired or involved in intramolecular structures within the STAR strand. This results in a count of unpaired nucleotides for all 10,201 possible complexes, from which the number of predicted base pairing interactions between the STAR and the target linear region is determined (Supplementary Fig. 8). Based upon experimental characterization (Supplementary Fig. 9), we predicted that sets of STAR:target RNAs that showed fewer than 13 base pairs of interaction in the target RNA for one of the combinations (i.e. either STAR 1:target 2 or STAR 2:target 1) would be orthogonal. To identify predicted orthogonal sets, we first sorted the list of pair interactions to identify pairs that were predicted to have fewer than 13 bases of interaction. Additional STAR:target combinations were added to each pair to identify combinations of three STAR:targets predicted to be orthogonal. This was repeated to identify sets of four, five, etc. orthogonal STAR:target pairs. An example is shown in Supplementary Fig. 10, which shows a set of 6 STARs that were identified using this approach and then experimentally validated (Figure 1d).
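The counting and set-building steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the in-house algorithm itself. It counts intermolecular base pairs directly from a two-strand structure string in which strands are separated by "+", rather than counting unpaired STAR nucleotides. It also reads the criterion "fewer than 13 base pairs for one of the combinations" as the minimum of the two cross-interactions being below the threshold. The names `count_intermolecular_pairs`, `is_orthogonal` and `grow_orthogonal_sets`, and the matrix `pairs[i][j]` (predicted base pairs between STAR i and target j), are hypothetical.

```python
from itertools import combinations


def count_intermolecular_pairs(structure):
    """Count base pairs that join different strands.

    `structure` is a dot-bracket string with '+' between strands,
    e.g. "((((....+....))))".
    """
    stack, strand, pairs = [], 0, 0
    for ch in structure:
        if ch == "+":
            strand += 1                # move on to the next strand
        elif ch == "(":
            stack.append(strand)       # remember which strand opened the pair
        elif ch == ")":
            if stack.pop() != strand:  # opened on a different strand
                pairs += 1
    return pairs


def is_orthogonal(pairs, a, b, threshold=13):
    """Assumed criterion: either cross-interaction is below the threshold."""
    return min(pairs[a][b], pairs[b][a]) < threshold


def grow_orthogonal_sets(pairs, size, threshold=13):
    """Extend orthogonal pairs one variant at a time up to `size` members."""
    n = len(pairs)
    sets = [(a, b) for a, b in combinations(range(n), 2)
            if is_orthogonal(pairs, a, b, threshold)]
    for _ in range(size - 2):
        # Only append higher indices so each set is generated exactly once.
        sets = [s + (c,) for s in sets for c in range(s[-1] + 1, n)
                if all(is_orthogonal(pairs, m, c, threshold) for m in s)]
    return sets
```

Exhaustive extension like this can grow combinatorially for a 101-variant library, so in practice one would likely cap or rank the candidate sets at each step. The source does not say how this was handled.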