Genetic circuit characterization by inferring RNA polymerase movement and ribosome usage

To perform their computational function, genetic circuits change states through a symphony of genetic parts that turn regulator expression on and off. Debugging is frustrated by an inability to characterize parts in the context of the circuit and identify the origins of failures. Here, we take snapshots of a large genetic circuit in different states: RNA-seq is used to visualize circuit function as a changing pattern of RNA polymerase (RNAP) flux along the DNA. Together with ribosome profiling, all 54 genetic parts (promoters, ribozymes, RBSs, terminators) are parameterized and used to inform a mathematical model that can predict circuit performance, dynamics, and robustness. The circuit behaves as designed; however, it is riddled with genetic errors, including cryptic sense/antisense promoters and translation, attenuation, incorrect start codons, and a failed gate. While not impacting the expected Boolean logic, they reduce the prediction accuracy and could lead to failures when the parts are used in other designs. Finally, the cellular power (RNAP and ribosome usage) required to maintain a circuit state is calculated. This work demonstrates the use of a small number of measurements to fully parameterize a regulatory circuit and quantify its impact on host.


II. Supplementary Tables
Supplementary Table 1: Sequences of cryptic sense and antisense promoters.

Supplementary Figure 2:
Size distribution of the mapped fragments. Length distributions of the mapped fragments from (a) RNA-seq and (b) ribosome profiling experiments across eight induction states are shown (Methods). For each sequencing experiment, all raw fragments that mapped to the reference sequences (E. coli DH10B genome and plasmids) were collected and their length distribution was generated. The circuit state is indicated by the presence or absence of the IPTG/aTc/Ara inducers (top right of each graph). Source data are provided as a Source Data file.
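The length distribution described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name length_distribution and the example lengths are hypothetical, and the mapping step that produces the fragment lengths is assumed to have been done already.

```python
from collections import Counter


def length_distribution(fragment_lengths):
    """Return {length: fraction of fragments} for a list of mapped fragment lengths."""
    counts = Counter(fragment_lengths)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Sort by length so the result can be plotted directly as a histogram.
    return {length: n / total for length, n in sorted(counts.items())}


# Hypothetical lengths (nt) of mapped fragments from one sequencing experiment.
lengths = [24, 25, 24, 30, 28, 24, 25, 31]
dist = length_distribution(lengths)
```

Running the same function once per induction state would give one distribution per graph.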

Supplementary Figure 3:
Comparison of conventional and end-enriching RNA-seq methods. The transcripts are shown when the average fragment size is 280 nt (a) and 24 nt (b) (Methods). Data in (a) are taken from Gorochowski et al. (ref. 4). The TSSs are indicated by the thick dashed lines and the regions used to calculate the promoter strength are in grey. Note that effects due to the upstream promoter and downstream ribozyme are obscured when the fragment size is large. The circuit state is shown in the upper left of each profile: IPTG/aTc/Ara. Source data are provided as a Source Data file.

Supplementary Figure 6:
Comparison of transcription and translation between native and circuit genes. (a) Each black dot is a native genome-encoded gene, measured using the control cell (Methods). The RD and FPKM of the circuit repressor genes across all circuit states are shown in blue if the gate is on (promoter controlling the repressor gene is off) and red if it is off (promoter controlling the repressor gene is on). (b) The distributions of RD and FPKM of the native genes shown in (a) are compared with those of the circuit repressor genes when their corresponding gate is off (repressor gene is transcribed). Histogram bars show the normalized counts (distribution) of each RD and FPKM within the data shown in (a), and lines are the normal distribution fits to the histograms, calculated using the stats.norm function of SciPy in Python. Source data are provided as a Source Data file.
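A normal fit of this kind can be sketched with scipy.stats.norm as follows. The caption names only the SciPy function, so the rest is assumed for illustration: the synthetic data, the choice to fit log10-transformed values, and the bin count are not taken from the source.

```python
import numpy as np
from scipy import stats

# Hypothetical log10(FPKM) values standing in for the native-gene measurements.
rng = np.random.default_rng(0)
log_fpkm = rng.normal(loc=1.5, scale=0.8, size=2000)

# Maximum-likelihood estimates of the mean and standard deviation.
mu, sigma = stats.norm.fit(log_fpkm)

# Normalized histogram (density) and the fitted curve evaluated at the bin centers.
density, edges = np.histogram(log_fpkm, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
fitted_curve = stats.norm.pdf(centers, loc=mu, scale=sigma)
```

Plotting density as bars and fitted_curve as a line would reproduce the general layout described for panel (b).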

Supplementary Figure 8:
Identification of 70 motifs in cryptic promoter sequences. The consensus sites for -10 and -35 box of 70 sequences are shown for 27 cryptic promoters with activity > 10 -5 RNAP/s per DNA (Supplementary Table 1). Logos were generated using WebLogo 5 . Consensus E. coli sigma 70 motifs are also shown for comparison.

Supplementary Figure 9:
Evidence for a cryptic antisense promoter in the ribozymes. (a) A schematic is shown of the RiboJ class of insulators. The variable region is different for each insulator, but the sequence shown is the hairpin shared between them. The putative -10 box of the cryptic antisense promoter is indicated as well as the TSS (triangle). The RNA-seq profiles in the sense (top, dark grey) and antisense (bottom, light grey) directions are shown. The locations of the TSSs are shown as triangles, and these points correspond to an increase in transcription in the antisense direction. Data were selected for a particular combination of inducers, shown at the bottom of each graph (IPTG/aTc/Ara), but the evidence for the antisense promoter is present for all the states where the gene it is fused to is transcribed.
The full sequences of all ribozymes are shown with the constant hairpin sequence underlined, and the TSS of the cryptic antisense promoter in bold. Source data are provided as a Source Data file.

Supplementary Figure 10:
Importance of the ribozyme in sensor and gate characterizations. (a) The cryptic antisense promoters in the riboJ insulators pose a problem in defining the boundary across which the RNAP flux is quantified as an input to the gate. These antisense promoters act to reduce the "real" RNAP flux produced by the input promoter (that is, the output promoter of a sensor or upstream gate). This effect was canceled out in our circuit design because the gates and all the sensors were characterized using riboJ upstream of the yfp reporter (and our reference promoter also uses riboJ). (b) A riboJ insulator controls the output of a sensor in two ways. It increases the mRNA stability by 2-fold (ref. 6), while its cryptic antisense promoter reduces the RNAP flux by around 10-fold, resulting in a net 5-fold drop in sensor output activity. When connecting the sensor to a gate, with and without the ribozyme, this 5-fold difference in sensor output (input to the gate) can result in an error in the gate's output activity.

Supplementary Figure 11:
Evidence of transcriptional attenuation. The RNA-seq profile is shown for the genes indicated at the top. The circuit state is shown in the upper left of each profile: IPTG/aTc/Ara. The dashed line is the maximum RNAP flux. When a >10-fold continuous decrease in RNAP flux along the gene sequence is observed, this is interpreted as evidence for attenuation and is marked with the fold-change. Source data are provided as a Source Data file.
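The >10-fold criterion can be illustrated with a short Python sketch. It rests on a simplifying assumption: the fold-change is taken as the maximum flux divided by the lowest flux downstream of that maximum, and the "continuous" (monotonic) nature of the decrease is not checked. The function names and the example profile are hypothetical.

```python
def attenuation_fold_change(flux):
    """Ratio of the maximum flux to the lowest flux found downstream of that maximum."""
    if not flux:
        raise ValueError("flux profile is empty")
    peak_index = max(range(len(flux)), key=flux.__getitem__)
    peak = flux[peak_index]
    downstream_min = min(flux[peak_index:])
    if downstream_min <= 0:
        # Flux drops to zero: treat the fold-change as unbounded.
        return float("inf")
    return peak / downstream_min


def is_attenuated(flux, threshold=10.0):
    """Flag a gene whose flux falls by more than `threshold`-fold after its peak."""
    return attenuation_fold_change(flux) > threshold


# Hypothetical RNAP flux sampled along one gene (arbitrary units).
profile = [0.8, 1.0, 0.6, 0.3, 0.1, 0.05]
fold = attenuation_fold_change(profile)
flagged = is_attenuated(profile)
```

A stricter version could additionally require the flux to be non-increasing between the peak and the minimum.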

Supplementary Figure 12:
Evidence for translational errors. Ribosome occupancy (per transcript) profiles demonstrate four different translational errors across the circuit. (a) Translation at the 5'-UTR. Black dashed lines are the annotated start codon for each gene and light blue dashed lines mark the position of start codons with upstream SD-like sequences found near the elevated part of the profiles. The inset shows the circuit state for which the data are shown: IPTG/aTc/Ara. To the right, the sequence corresponds to the 25 nucleotides upstream of the start codon (light blue ATG) with the SD sequences highlighted in red. SD was identified as the region hybridizing with the 16S rRNA sequence

An example of mRNA secondary structure is shown for the best (sarJ) and worst (riboJ57) performing ribozymes using data from circuit state -/+/- (IPTG/aTc/Ara). Blue structures show the hammerhead RNA regions. Red dots are the ribozyme cleavage sites. Structures "in isolation" show the secondary structure of the ribozyme alone, whereas structures "in circuit" show the mRNA secondary structure when transcribed from upstream promoters in state -/+/-. Light brown structures are mRNA regions upstream of the ribozyme. For the sarJ ribozyme, the hammerhead RNA structure remains intact even in the circuit context. However, because the hammerhead RNA structure in the riboJ57 ribozyme is much weaker, the ribozyme region undergoes a conformational change when transcribed as part of the long mRNA in the circuit, deactivating its cleavage capability. Source data are provided as a Source Data file.

Fitting the gate response functions to omics data. Input and output promoter activity data for each gate are shown across all eight circuit states (combinations of inducers). Points in red represent states of the BetI and HlyIIR gates that had failed. These outliers were disregarded when calculating the best fit for these two gates. The line indicates the best fit to the data (Equation 1), with fitted parameters K (in units of RNAP/s) and n (dimensionless). The fitted K is then converted to the repressor binding constant k in units of protein number using Equation 22. The resulting repressor binding constant k and n are presented in Table 2. Source data are provided as a Source Data file.
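A fit of K and n to input and output promoter activities can be sketched with scipy.optimize.curve_fit. Equation 1 is not reproduced in this section, so the sketch assumes a standard Hill-type repression function, y = y_min + (y_max - y_min) K^n / (K^n + x^n), with the plateaus y_min and y_max held fixed. The function names, the plateau values, the log-space fit, and the synthetic data are all illustrative assumptions rather than the authors' procedure.

```python
import numpy as np
from scipy.optimize import curve_fit


def response(x, K, n, y_min, y_max):
    """Assumed Hill-type repression: output falls from y_max to y_min as input x rises."""
    return y_min + (y_max - y_min) * K**n / (K**n + x**n)


def fit_gate(x, y, y_min, y_max, p0=(-1.5, 1.5)):
    """Fit K and n in log space; K is parameterized as 10**log10_K to keep it positive."""
    def log_model(x_in, log10_K, n):
        return np.log10(response(x_in, 10.0**log10_K, n, y_min, y_max))

    (log10_K, n), _ = curve_fit(log_model, x, np.log10(y), p0=p0)
    return 10.0**log10_K, n


# Hypothetical input promoter activities (RNAP/s) for the eight circuit states.
x = np.array([1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1])
# Synthetic, noise-free outputs generated from known parameters to exercise the fit.
y = response(x, K=0.01, n=2.0, y_min=1e-3, y_max=0.1)

K_fit, n_fit = fit_gate(x, y, y_min=1e-3, y_max=0.1)
```

States flagged as failed (the red points) would be removed from x and y before calling fit_gate, mirroring the exclusion of outliers described in the caption.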

Supplementary Figure 20:
Dynamic modelling of the genetic circuit using parameters extracted from omics data. See Methods for simulation details. These graphs show transitions between states, expanding on the one shown in Figure 5a. The switch between induction conditions is shown at the top (IPTG/aTc/Ara). The top panel indicates the changes in the sensor promoter activity, with P_BAD,2 and P_Tet,2 not shown as they are similar to P_BAD,1 and P_Tet,1, respectively. In both simulations, the transition in the activity of the output promoter (P_BM3R1) is monotonic, and the circuit is free of any glitches. Source data are provided as a Source Data file.