Quantitative modeling of transcription and translation of an all-E. coli cell-free system

Cell-free transcription-translation (TXTL) is expanding as a polyvalent experimental platform to engineer biological systems outside living organisms. As the number of TXTL applications and users is rapidly growing, some aspects of this technology could be better characterized to provide a broader description of its basic working mechanisms. In particular, developing simple quantitative biophysical models that grasp the different regimes of in vitro gene expression, using relevant kinetic constants and concentrations of molecular components, remains insufficiently examined. In this work, we present an ODE (Ordinary Differential Equation)-based model of the expression of a reporter gene in an all E. coli TXTL that we apply to a set of regulatory elements spanning several orders of magnitude in strengths, far beyond the T7 standard system used in most of the TXTL platforms. Several key biochemical constants are experimentally determined through fluorescence assays. The robustness of the model is tested against the experimental parameters, and limitations of TXTL resources are described. We establish quantitative references between the performance of E. coli and synthetic promoters and ribosome binding sites. The model and the data should be useful for the TXTL community interested either in gene network engineering or in biomanufacturing beyond the conventional platforms relying on phage transcription.


Results and Discussion
phenomenology. The transcription of the all-E. coli TXTL toolbox relies on the core RNA polymerase and the primary sigma factor 70 (RpoD), as discussed previously in several articles 6,19 . All the circuits executed in this system, commercialized under the name myTXTL, are booted up through this transcription mechanism. In our reference plasmid P70a-deGFP, the gene degfp encoding the reporter protein deGFP is cloned under the promoter P70a, specific to sigma 70 (Fig. S1). P70a, derived from the phage lambda, is one of the strongest E. coli promoters reported so far. The untranslated region (UTR), located between the promoter and the ATG, is the UTR downstream of promoter 14 from the phage T7 20 . It is the strongest bacterial UTR reported so far, and used in many standard plasmids to overexpress proteins in E. coli. It is defined as UTR1 in this work. The synthetic transcription terminator T500 is cloned downstream of the degfp gene. P70a-deGFP is designated as our reference plasmid because it delivers the strongest gene expression in vitro. We compare the performance of single regulatory elements (promoters, UTR, terminators) and of other plasmids to P70a-deGFP.
The typical kinetics of deGFP synthesis in a TXTL reaction, using P70a-deGFP as template, shows three phases (Fig. 1a). The first regime, that lasts 30 min to 1 h, is a transient regime when gene expression starts. The second regime, between 1-6 h, corresponds to a steady state. The reporter protein deGFP, which does not degrade in our study, accumulates linearly in time because the concentration of degfp messenger RNA (mRNA) is constant. The last regime, typically observed after 6 hours of incubation, is when gene expression curves towards a plateau. This regime is complex to interpret because it corresponds to a depletion of the biochemical building blocks (amino acids, ribonucleosides) and to a change of the biochemical conditions (pH drop for example, see 21 ). When the concentration of plasmid P70a-deGFP is varied, the maximum rate of deGFP synthesis in steady state is linearly proportional to the plasmid concentration below 5 nM (Fig. 1b). We observe a saturation of the rate above 5 nM of template. The transition from the linear to the saturated regime is sharp. The linear and saturated regimes observed for the rate of deGFP synthesis are also observed for the protein synthesis yield (Fig. S2). We performed the same experiments with the plasmid P70a-mCherry and observed the same trends for a different reporter protein (Fig. S2). It is this phenomenological observation that we model in this article. We hypothesize that this saturation occurs when either the transcription machinery (core RNA polymerase) or the translation machinery as suggested before 7 , or both, are entirely depleted. For instance, at a sufficiently large concentration of synthesized mRNA, all the ribosomes are performing translation. Therefore, adding more DNA template to the reaction does not convert to more protein produced. As we shall see, transcription in this system never saturates. Our goal is to (i) derive a simple model that captures this hypothesis, (ii) constrain the model by determining experimentally some of the kinetics constants and concentrations, (iii) and test the sensitivity of the model with respect to biochemical parameters.
Model. The schematic of TXTL of a reporter gene under a constitutive promoter (P70a-deGFP) (Fig. 1c), shows most of the major biochemical species that we include in the model: • E 0 : free core RNA polymerase • S 70 : sigma factor 70 • P 70 : promoter specific to sigma 70 (S70) • m: degfp mRNA • Rnase: ribonucleases responsible for mRNA degradation • R 0 : free ribosomes • deGFP dark : non-mature deGFP (not fluorescent) • deGFP mat : mature deGFP (fluorescent) • L m : length in nt of the mRNA (or gene) • C m : transcription rate in bp/s • C p : translation rate in b/s www.nature.com/scientificreports www.nature.com/scientificreports/ The model is based on only three ordinary differential equations (ODEs) and two equations for conservation: the total concentrations of RNA polymerases and ribosomes are constant (Fig. 1d). The biochemical constants and concentrations for our best fit are summarized in the Table Fig. 2. The model is derived using the following appropriate assumptions: • quasi-steady state for Michaelis-Menten terms. K M,70 , K M,m , and K M,R are the Michaelis-Menten constants for transcription, mRNA degradation and translation respectively. • nutrients necessary for gene expression (tRNA, amino acids, ribonucleosides) are in infinite supply during the steady state. • the concentration of holoenzyme RNA polymerase-Sigma 70 is larger than the concentration of template (i.e. larger than the concentration of promoter P70). • Sigma 70 is not limiting for transcription, which is confirmed by the sensitivity assay.
• the concentration of ribonucleases is smaller than the concentration of synthesized mRNA (m).
• the concentration of ribosomes (R 0 ) is larger than the concentration of synthesized mRNA (m).
• translation initiation factors are never limiting.
• the maturation of deGFP dark to deGFP mat is modeled by a first order kinetics, which fits very well to the data in the maturation assay (Supplementary Information). • none of the components of TX and TL are degraded until the end of the steady state: their concentration is constant. This hypothesis is supported by the fact that this system can be used in semi-continuous mode to express proteins for about a day 6,19 . It is the major difference with respect to the work by Stogbauer and coworkers 10 , whose model attributes saturation of the synthesis rate to a degradation of the TX and TL components.
Using these assumptions, the set of three ODEs that describes the kinetics of deGFP synthesis is the following: The term of mRNA degradation is re-written by taking k [R nase ] = k d,m (Eq. 4). Based on our previous work 6,22 , mRNA degradation in our system behaves as a first order kinetics which means that K M,m ≫ [m]. The mRNA degradation term is not written as a first order kinetics, however, for modeling purposes (to avoid a negative mRNA concentration in the execution of the Matlab program). The constants k d,m (6.6 nM s −1 ) and K M,m (8000 nM) were chosen so as to obtain k deg,m determined by the assay later described and so that K M,m ≫ [m], which is the case because [m] at the transition from the linear to saturated regimes (5 nM P70a-deGFP) is on the order of 100 nM (Fig. S3). The model is independent from the numerical values of k d,m and K M,m as long as their ratio is equal to k deg,m and K M,m ≫ [m].
The set of Equations (1-3) becomes: In the next step we build two equations of conservation for the core RNA polymerases and ribosomes. The sigma factor 70 has two forms, free (S 70free ) or complexed with the core RNA polymerase (E 70 ): We consider that the following biochemical reaction is at equilibrium all the time (i.e. it is a fast biochemical reaction with respect to the others): We call K 70 the dissociation constant: Therefore, using Eq. (8): The core RNA polymerase has three forms: free (E 0 ), complexed with S 70 (E 70 ), or performing transcription on mRNA (E m ). E tot is constant: The number of core RNA polymerases that are bound to DNA is (see) 23 : The first term in Eq. 12 corresponds to the core RNA polymerase on the promoter and the other term the core RNA polymerases that have engaged in transcription. We then get the conservation equation, Eq. 13, that has to be solved for E 0 : We proceed in a similar manner to construct the conservation of ribosomes. Note that here we assume that the translation initiation and termination factors are not limiting the process of translation. Ribosomes can be in two forms, free (R 0 ), and performing translation on mRNA (R m ): tot m 0 The number of ribosomes on mRNA is: The first term in Eq. 15 corresponds to the ribosomes on the ribosome binding site and the other term is for the ribosomes that have engaged into translation. Eq. 16 that has to be solved for R 0 : The final system of equations (using Eqs (5-7) and 10) is (also shown in Fig. 1d): We did not include protein degradation in the experiments. There are two reasons for this. First, protein degradation, achieved by the ClpXP complex in TXTL, is a zeroth order kinetic reaction that does not allow a steady state for proteins 6 . Consequently, the analysis is less interesting. Second, the concentration of ClpXP complex does not seem to remain constant in the TXTL reaction (data not shown), presumably due to the well-established instability of ClpX 24 . That would make the analysis and modeling complicated and phenomenological.
tX. The biochemical constants and other parameters (for our best fit) are summarized in the Table Fig. 2b.
In its simple expression, the initiation frequency k TX for TX depends on k cat,m , K M,70 and E 70 (Eqs 5 and 22). k TX varies over three orders of magnitude 25 , with a maximum that can reach 30 initiations per 60 seconds 26,27 . This puts a limit on k cat,m to 0.5 s −1 , especially at low plasmid concentration when free RNA polymerase (E 0 ) is an infinite reservoir and E 70 equals S 70 . The rate constant for mRNA synthesis k cat,m was estimated to be between 10 −1 and 10 −3 s −1 for E. coli promoters 25 . For a strong promoter like P70a, we expect k cat,m to be at the high end of these estimations. In our best fit, k cat,m = 0.065 s −1 . The Michaelis-Menten constant K M,70 is typically between 1 nM and 100 nM 25,28 . In our previous TXTL work 22 , based on the first version of the system 5 , K M,70 was estimated to be around 10 nM for the promoter P70a. In this work, we used the new version of this TXTL system 6 ; our best fit was with K M,70 = 1 nM. The concentration of core RNA polymerases in E. coli varies between 1500 and 11400 molecules per cell depending on the growth conditions 26 . Because the lysate is prepared from cells growing in a rich medium and collected in the exponential phase, the concentration of core RNA polymerase in the collected cells is considered to be on the high end at about 11000-12000 per cell. Taking into account a dilution factor of about 7-10 during the lysate preparation (200-320 mg/ml of proteins in the E. coli cytoplasm 29 , 30 mg/ml for the lysate), the maximal concentration of core RNA polymerase is around 1.5 µM if all the enzymes are released during the preparation. This estimation translates as a maximum of E tot = 500 nM of core RNA polymerase in a TXTL reaction, which contains a 1/3 volume fraction of lysate. The minimum concentration of free core RNA polymerase in TXTL is found by only considering the polymerases not bound to DNA 6,30 . Our best fit was found for E tot = 400 nM. The same calculation was made for the primary sigma factor 70 (RpoD), whose number density is around 500-700 copies per cell (about 500-700 nM for a cell volume of 1 femtoliter) 31,32 . In a TXTL reaction, sigma 70 is therefore at a maximum concentration of about S 70 = 30-35 nM, which works for our best fit. The dissociation constant between sigma 70 and the core RNA polymerase has been precisely determined: K 70 = 0.26 nM 32 . The rate constant of the deGFP mRNA degradation was determined by an assay (Fig. S4): 1/k deg,m = 8. 25 10 −4 s (20.2 min for the mean lifetime). This constant was written as k deg,m = k d,m /K M,m (Eq. 4) with k d,m = 6.6 nM/s and K M,m = 8000 nM. The concentration of promoter P 70 and gene (both equal to the plasmid concentration) was fixed experimentally. The length of the transcribed gene is L m = 750 bp, from the TX start to the TX terminator. The average speed of TX (speed of the core RNA polymerase on DNA) in the all E. coli TXTL was estimated by an assay (Fig. S5): C m ≈ 10 bp/s, which is about 4-8 times smaller than in vivo 26 . E 0 , the concentration of free core RNA polymerase, is determined by Eq. 20.
The mRNA steady state [m] SS (Eq. 23) is found by setting Eq. 17 to zero (Eq. 22). For low plasmid concentration (in the linear regime), one can assume that E 70 ≫ K M,70 (or that K 70 ≪ E 0 ) and therefore k cat,m ≈ k TX . The mRNA mean lifetime 1/k deg,m for the malachite green aptamer (MGapt) was estimated using an assay (Fig. S6): 1/k deg,m ≈ 27 min. Our measurements of [m] SS at low plasmid concentration, using the malachite green aptamer as an RNA probe (Fig. S7), gives us a value of k cat,m ≈ k TX = 1.5 10 −2 s −1 using [m] SS = 25 nM at 1 nM plasmid. This experiment, however, can only provide a low estimation for this constant (i.e. the value for k TX can only be underestimated because the assay may not report all the malachite green aptamers synthesized or fluorescent). In our simulations, we found that the best fit was obtained with k cat,m ≈ k TX = 6.5 10 −2 s −1 (Fig. 2 www.nature.com/scientificreports www.nature.com/scientificreports/ Note that for the deGFP mRNA, 1/k deg,m ≈ 20 min (Fig. S4), using k cat,m ≈ k TX = 6.5 10 −2 s −1 , we get that [m] SS ≈ 80 nM at 1 nM plasmid. A maximum theoretical value (1 nM plasmid ≈ 1 copy per E. coli) of [m] SS ≈ 600 nM in TXTL is obtained by taking k cat,m ≈ k TX = 0.5 s −1 and a 1/k deg,m ≈ 20 min. Experimentally, one can see that the TX machinery is never limiting in the system because the rate of mRNA synthesis keeps increasing even at plasmid (P70a-deGFP-MGapt) concentrations larger than 5 nM (Fig. S8). As we shall see below, it is the TL machinery that is limiting in the system, i.e. ribosomes are entirely depleted onto the mRNA at plasmid concentrations above 5 nM (P70a-deGFP). Because it is the strongest promoter-UTR pair, the protein synthesis rate or yield for any other promoter-UTR regulatory element is linear with respect to plasmid concentration up to 5 nM or more; saturation of the protein synthesis rate cannot be observed below 5 nM plasmid.
tL. Similarly to TX, in its simple expression, the initiation frequency k TL (Eq. 24) for TL depends on both k cat,p and K M,R , and R 0 . The translation initiation frequency can be as high as 0.5 s −1 33 . The Michaelis-Menten constant for translation was measured in vitro and estimated to be around 23 nM for the 70S ribosome with no tRNA and 10 nM with tRNA 34 . In a previous cell-free system, K M,R was fitted at 65.8 nM 10 . K M,R = 10 nM was used for our best fit. No estimation of the rate constant for protein synthesis k cat,p was found in the literature. At low mRNA concentration, one can expect that R 0 » K M,R , which puts a limit on k cat,p to 0.5 s −1 . The rate constant for the maturation of deGFP was determined by an assay described previously 6 and repeated in this work (Fig. S9). The average concentration of ribosomes in E. coli cells growing in a rich medium, with a doubling time between 20 and 30 minutes, is between 44000 and 73000 26 , which corresponds to 1450-2500 nM in a TXTL reaction. It is in excellent agreement with respect to previous measurements in cell-free systems 35 . R tot = 1100 nM was our best fit for active ribosomes in TXTL. Finally, we estimated the average translation speed (speed of ribosomes on mRNA) to be at least 1 amino acid s −1 (2.5 bp s −1 ) (Fig. S10).
The steady state for deGFP dark is: At 1 nM plasmid P70a-deGFP, we measure a maximum protein synthesis rate of 0.5 nM/s, which indicates that the product k cat,p *k cat,m = 4 10 −4 s −2 (taking k deg,m = 8.25 10 −4 s −1 for the deGFP mRNA). The value for k cat,p = 6 10 −3 s −1 was chosen based on this calculation using k cat,m = 6.5 10 −2 s −1 . A maximum theoretical value (1 nM plasmid ≈ 1 copy per E. coli) of 300 nM/s for the protein synthesis rate in TXTL is obtained by taking k cat,m ≈ k cat,p = 0.5 s −1 and a 1/k deg,m ≈ 20 min. As shown for plasmid P70a-deGFP concentrations of 1, 5 and 10 nM, the model also delivers reliable kinetics at steady state for the first few hours, below and above the transition from linear to saturated regimes (Fig. S11). A major hallmark of our approach is how the model grasps very well the sharpness between the linear and saturated regime (Fig. 2). A model describing a similar TXTL system, yet based on a different regeneration system, attributes the saturation to metabolic processes and energy efficiency 14 . When applied to P70a-deGFP, however, this approach neither captures the linear regime nor the sharpness of the response that we observed in this work (see Fig. S1 in 14 ). We assume that the behavior of cell-free expression (e.g. presence of a linear response regime and sharpness of the transition from linear to saturated) in both systems do not have the same origin. parts combinations and sensitivity analysis. We designed two other promoters, P70b and P70c, derived from P70a (strengths: P70a > P70b > P70c) and two other untranslated regions, UTR2 and UTR3, derived from UTR1 (strengths: UTR1 > UTR2 > UTR3) to create a set of nine combinations (sequences in Supplementary  Information). The −35 and −10 of P70a were mutated to get P70b and P70c. The ribosome binding site in UTR1 was mutated to get UTR2 and UTR3. These sets span two orders of magnitude in strengths. By changing the promoter and UTR strengths, we change the value of k cat,m and k cat,p , and of K M,70 and K M,R . Many k cat,m -K M,70 and k cat,p -K M,R pairs can be found to fit the results. However, because the system is only weakly sensitive to changes in the magnitude of the Michaelis-Menten contants K M,70 and K M,R (see thereafter), we only changed the value of k cat,m and k cat,p that we determined through the simulations to get the best fits (Fig. 3). We experimentally determined the rate of protein synthesis for the nine combinations with respect to plasmid concentration and performed sensitivity analysis on six biochemical parameters. The sensitivity analysis comprised of varying each of the six biochemical constants, while keeping all the others constants at their best numerical fit values, by one order of magnitude above and below the best fit value. As discussed for P70a-UTR1, translation is the limiting process responsible for saturation of the protein synthesis rate as plasmid concentration is increased. Consequently, the model and data are most sensitive to the ribosome concentration, especially for strong promoters (Fig. 3). As expected, for weak promoters and/or UTRs (e.g. P70c), the response is linear for any plasmid concentration (up to 30 nM tested in this work). In addition to the ribosome concentration, high sensitivity is observed for k deg,m (Fig. S12). As expected, if k deg,m is larger, the system does not saturate and the response remains linear. Conversely, if k deg,m is smaller, the systems saturates more quickly with respect to plasmid concentration. Some sensitivity is observed for k mat (Fig. S13) and for E tot (Fig. S14). Note that for E tot , saturation is not observed in the experiments (Fig. S8) as captured by the model. Limitations due to E tot in the plasmid range 0-30 nM (P70a-deGFP) would be observed if E 0 < 100 nM. The model shows very weak sensitivity to K M,70 and K M,R (Figs S15 and S16). The model was not sensitive to changes in S 70 (Fig. S17). For P70a-deGFP, the model predicts a sharp transition in the concentration of free ribosomes around 5 nM plasmid, while the concentration of free core RNA polymerase decreases sharply only at plasmid concentrations of about 50 nM (Fig. S18). www.nature.com/scientificreports www.nature.com/scientificreports/ Strengths of synthetic vs natural regulatory elements. Our next step consisted of testing natural promoters and UTRs from E. coli to establish quantitative references with respect to the synthetic parts used to develop the model. Note that the strengths of some promoters have been already compared in vivo and in vitro 36 . We chose the constitutive promoters of the following genes, some based on protein abundance 37 , that we isolated by coupling each of them to the strong UTR1 (Fig. 4): lacI, rpoH, rrsB, recA. We chose the UTRs of the following genes that we isolated by coupling each of them to the strong promoter P70a (Fig. 4): lacI, rpoH, rpsA, acpP. We measured the rates of deGFP synthesis for all these constructions over the same plasmid range, from 0 to 30 nM (Fig. 4). Most of these constructions showed a linear regime followed by a saturation. Only PrrsB (16S ribosomal RNA promoter) behaved differently with a response curve characterized by a sigmoidal response at low plasmid concentration. As expected, weak promoters such as PlacI never saturate. As importantly, we defined the rates of deGFP synthesis per plasmid concentration (deGFP/h/nM), for each construction in the linear regime, as an indicator of the promoter or UTR strengths (Fig. 4). Many other promoters and UTRs can be rapidly tested in TXTL using this method. This table serves as a minimal quantitative reference between several synthetic promoters/ UTRs used in TXTL and natural ones. tXtL load calculator. The last step of this work consisted of building a load calculator as a procedure and formula to determine the burden on the TXTL components, especially on the translation machinery. This approach requires making several plasmids to define the strengths of the parts and measuring the protein synthesis rate (using eGFP for instance) to define the linear and saturated regimes. In order to determine the concentration of DNA (nM) in a TXTL for which the translation machinery will limit the deGFP synthesis rate, we developed an equation that takes into account the promoter strength (P), the UTR strength (U) and the length of the gene being expressed (L m ) in the DNA construct. The equation was constructed by fitting power function to each variable individually against the approximate concentration of DNA for which the ribosomes became www.nature.com/scientificreports www.nature.com/scientificreports/ limiting based on the model (Fig. 5). The three fit equations were then combined to form the equation below, which accounts for variations in each of the three variables. In order to make use of the equation, the promoter and UTR strength must already be characterized. P is the strength of the promoter relative to P70a, where P70a is given as a strength of 1. U is the strength of the UTR relative to UTR1, where UTR1 is given as a strength of 1. If more than one DNA construct is being used in the TXTL reaction and a user wants to know if ribosomes will be limiting, the equation can be used to calculate approximately what fraction of the ribosomes will be used by each DNA construct. For example, if two DNA constructs will be used in a TXTL reaction, and if the equation determines that the limiting concentration of one DNA construct alone is 5 nM, and 1 nM will be used in the reaction, then the limiting concentration of the second DNA construct should be reduced by 1 nM/5 nM = 20%. This process can be repeated if more than two DNA constructs are being used in a TXTL reaction.

conclusions
As the field of cell-free expression is rapidly growing, developing models with constrained biochemical parameters is necessary to determine the TXTL biochemical regimes and provide users with quantitative information to set the strengths and stoichiometry of regulatory parts making circuits, either executed in batch mode reactions or other settings such as microfluidics chips and synthetic cells. Because each cell-free system is different, model should be specific and accompanied by relevant measurements for each platform. In this work, our model captures remarkably well the linear and saturated regime, and more importantly, the sharpness of the transition between the two regimes for the all-E. coli system. While powerful computer tools are available to develop complex and sophisticated models, some models should also remain practical and thus accessible.

Materials and Methods
tXtL reactions. The TXTL system used in this work is the myTXTL kit from Arbor Biosciences. This system has been described in several articles 6,19 . TXTL reactions were assembled using a Labcyte Echo 550 Acoustic Liquid Handler, to volumes of 2 µl, and incubated at 29 °C. At a scale of 2 µl, the reactions were not limited by oxygen consumption. Individual TXTL reaction components were added to the 384 well source plate (Labcyte PP-0200), dispensed into a 96 well v-bottom plate (Sigma-Aldrich CLS-3857) and sealed with a well plate storage mat (Sigma-Aldrich CLS-3080). Protein fluorescence kinetics measurements were performed with the reporter plasmid P70a-deGFP, expressing the truncated version of eGFP (25.4 kDA, 1 mg/mL = 39.38 µM) 19 . deGFP fluorescence was measured on either a Biotek Neo2 or Biotek H1 plate reader at excitation and emission wavelengths of 485 nM and 528 nM, respectively, typically measuring every 3 minutes for 16 hours, with an incubation temperature of 29 °C. Fluorescence on the plate readers was calibrated using pure eGFP (Cell Biolabs STA-201) following Model predictions for the approximate limiting DNA concentration for different promoter strengths, UTR strengths, and lengths of gene. Each data set was constructed by varying each specific constant (k cap,m , k cat,p , L m ) and fit to a power function, then combined to form the final Eq. (28). The factor of 250 in Eq. (28) is due to the length of gene not being normalized, 2.19 10 2 is divided by the limiting DNA concentration at the length of deGFP (800 nt), which is 4.43 nt. That value is multiplied by the limiting DNA concentration for P70a-deGFP, where P = 1, U = 1, and L m = 800, which is 5 nM. The power function is chosen only because it is a good fit phenomenologically, not for some physiological reasons. The load calculator is just a tool to approximate when the resources will become limiting and the protein synthesis rate will not be in the linear regime. The obtained value tells us how sensitive each parameter is to the limiting DNA concentration, such that they can be combined in one single TXTL load calculator equation.