Precise and reliable gene expression via standard transcription and translation initiation elements

Journal name: Nature Methods


An inability to reliably predict quantitative behaviors for novel combinations of genetic elements limits the rational engineering of biological systems. We developed an expression cassette architecture for genetic elements controlling transcription and translation initiation in Escherichia coli: transcription elements encode a common mRNA start, and translation elements use an overlapping genetic motif found in many natural systems. We engineered libraries of constitutive and repressor-regulated promoters along with translation initiation elements following these definitions. We measured activity distributions for each library and selected elements that collectively resulted in expression across a 1,000-fold observed dynamic range. We studied all combinations of curated elements, demonstrating that arbitrary genes are reliably expressed to within twofold relative target expression windows with ~93% reliability. We expect the genetic element definitions validated here can be collectively expanded to create collections of public-domain standard biological parts that support reliable forward engineering of gene expression at genome scales.
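
The reliability figure quoted above can be read as the fraction of constructs whose observed expression falls inside a fold-change window around its target. The sketch below is a minimal illustration under that assumption. The function name, the symmetric log2 window and all numbers are hypothetical and are not taken from the study, whose exact window definition is given in its Online Methods.

```python
import math

def fraction_within_fold(observed, targets, fold=2.0):
    """Fraction of observed values lying within `fold` of their target.

    Assumption: the window is symmetric in log2 space, i.e. a value counts
    as a hit when target / fold <= observed <= target * fold.
    """
    if len(observed) != len(targets) or not observed:
        raise ValueError("observed and targets must be non-empty and equal in length")
    limit = math.log2(fold)
    hits = sum(
        1 for obs, tgt in zip(observed, targets)
        if abs(math.log2(obs / tgt)) <= limit
    )
    return hits / len(observed)

# Hypothetical expression values in arbitrary units, for illustration only.
observed = [120.0, 45.0, 900.0, 10.0]
targets = [100.0, 100.0, 1000.0, 9.0]
reliability = fraction_within_fold(observed, targets)
print(reliability)
```

In this toy data set only the second construct (45 against a target of 100) falls outside the twofold window.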

At a glance


  1. Figure 1: Rules for regularizing gene expression.

    (a) We defined an expression operating unit (EOU) to set boundaries and junctions of functional genetic elements underlying the expression of heterologous genes (Supplementary Note). The variable regions within each element type (wider icons) and the standard junctions (labeled lines) between elements that best enable reliable reuse of elements in novel combinations are detailed. The bicistronic design (BCD) with its two Shine-Dalgarno motifs (SD1 and SD2) is shown. (b) Rank-ordered library of constitutive promoters that encode an expected common +1 mRNA boundary and 5′ UTR leader sequence. a.u., arbitrary units. (c) Rank-ordered library of SD2 sites that adhere to the BCD and resulting BCD:GOI junction as established here. Error bars, s.d. (n = 3).

  2. Figure 2: Standard translation initiation elements using a bicistronic design are reliably reusable.

    (a) Gene expression via a regularized medium-strength promoter (Ptrc; asterisk indicates an absent operator sequence) and 22 monocistronic design (MCD) 5′ UTRs of varying expression strength. Eight GOIs coding for a total of 14 chimeric reporter fusions with either gfp or rfp (columns) are shown. The 14 chimeric reporter GOIs are encoded via the first 36 nt of the N-terminal coding sequences of lacI, araC, rfp, gfp, tetR and genes encoding putative cellulase (Cell), phosphomevalonate kinase (PMK) and penicillin acylase (PA) and via the full-length coding sequence of tetR (Online Methods). Variance in mean-centered log2 expression (left) from each MCD across all GOI sequences (right) and average Spearman rank correlations (bottom) as given (Supplementary Fig. 8). a.u., arbitrary units. (b) The same SD sequences used in a, encoded within bicistronic designs (BCDs). Rank orderings for a and b were established using the data in b. Variance in mean-centered log2 expression from each BCD across all GOIs (right) and average Spearman rank correlations (bottom) as given (Supplementary Fig. 6). (c,d) Analysis of variance (Online Methods) in total protein synthesis levels realized using the MCDs (c) or BCDs (d). (e) Comparison of absolute GFP synthesis ranges produced using MCDs or BCDs across all tested GOIs. (f) Predicted hybridization free energies between 16S rRNA and SD sequences correlate better with expression for BCDs than for MCDs (Supplementary Figs. 11 and 12).

  3. Figure 3: Bicistronic designs (BCDs) retain functional reliability with alternate transcription systems and different leader cistrons.

    (a) Correlated gene expression levels from BCDs with an E. coli Ptrc* promoter (x axis) or bacteriophage T7 (y axis) promoter and RNA polymerase. The asterisk indicates that the promoter has no operator sequence and hence is constitutive in expression. a.u., arbitrary units. (b) Correlated gene expression levels from a phage T7 transcription system but with two GOIs. (c) Rank-ordered GFP expression for BCDs (WT-SD1-BCD) compared to expression for those in which SD1 is disrupted (Null-SD1-BCD, schematic). (d) Correlated expression levels from an E. coli promoter but with stop or rare codons inserted in the BCD leader cistron (schematic) across SD2 elements of different expression strengths (x axis, clustered groupings). Error bars, s.d. (n = 3).

  4. Figure 4: Precise and reliable gene expression via standard transcription-control and translation-initiation elements.

    (a) Standard promoters produce mRNA from a common +1 nucleotide position. Translation initiation is entirely encoded by a separate and independent bicistronic design (BCD). (b,c) Mean-centered log2 expression for green (b) and red (c) fluorescent proteins via a full combinatorial library of standardized promoters (14) and BCDs (22). a.u., arbitrary units. (d) Direct correlation of expression from b and c (red circles) against those generated by use of irregular transcription- and translation-control elements (blue diamonds, data from ref. 15). (e) Factorial analysis of variance for mean-normalized expression from the standard promoter and BCD combinatorial library, with element- and junction-specific contributions to total expression as noted (Online Methods). (f) Correlation of observed versus predicted protein expression for sequence-distinct GOIs, as predicted using expression data from a single GOI (GFP) to estimate activity scores for promoters and BCDs adhering to the method for forward-engineering gene expression developed here. Error bars: y axis, s.d. (n = 3); x axis, deviations in predicted values derived from the cross-validated model (Online Methods). Cellulase, putative cellulase; PMK, phosphomevalonate kinase; PA, penicillin acylase.
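
The Figure 4 caption describes estimating activity scores for promoters and BCDs from a combinatorial library and partitioning variance between elements and junctions. One simple way to picture this is an additive main-effects model in log2 space, sketched below. This is an assumption-laden illustration, not the cross-validated model of the Online Methods: the function names and the toy matrix are hypothetical, and the matrix is built to be perfectly multiplicative so that the main effects explain all of the variance.

```python
import math

def fit_additive_log2(expr):
    """Fit log2(expr[i][j]) ~ grand + promoter_score[i] + bcd_score[j].

    expr[i][j] is the expression of promoter i combined with BCD j (all > 0).
    Scores are row and column means of the log2 data, centered on the grand mean.
    """
    logs = [[math.log2(v) for v in row] for row in expr]
    n_p, n_b = len(logs), len(logs[0])
    grand = sum(sum(row) for row in logs) / (n_p * n_b)
    p_score = [sum(row) / n_b - grand for row in logs]
    b_score = [sum(logs[i][j] for i in range(n_p)) / n_p - grand
               for j in range(n_b)]
    return grand, p_score, b_score

def predict(grand, p_score, b_score, i, j):
    """Predicted expression for promoter i with BCD j under the additive model."""
    return 2.0 ** (grand + p_score[i] + b_score[j])

def main_effect_fraction(expr):
    """Share of log2 variance captured by the two main effects.

    The remainder is the promoter-by-BCD interaction (junction) term.
    """
    grand, p_score, b_score = fit_additive_log2(expr)
    ss_total = ss_resid = 0.0
    for i, row in enumerate(expr):
        for j, v in enumerate(row):
            y = math.log2(v)
            ss_total += (y - grand) ** 2
            ss_resid += (y - (grand + p_score[i] + b_score[j])) ** 2
    if ss_total == 0.0:
        raise ValueError("expression values show no variance")
    return 1.0 - ss_resid / ss_total

# Hypothetical 2-promoter x 3-BCD library in arbitrary units.
expr = [[2.0, 8.0, 32.0],
        [8.0, 32.0, 128.0]]
grand, p_score, b_score = fit_additive_log2(expr)
print(p_score, b_score, main_effect_fraction(expr))
```

With real measurements the explained fraction would fall below 1, and the shortfall gives a rough sense of how much expression depends on the specific promoter and BCD pairing rather than on each element alone.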


  1. Endy, D. Foundations for engineering biology. Nature 438, 449–453 (2005).
  2. Purnick, P.E. & Weiss, R. The second wave of synthetic biology: from modules to systems. Nat. Rev. Mol. Cell Biol. 10, 410–422 (2009).
  3. Ellis, T., Adie, T. & Baldwin, G.S. DNA assembly for synthetic biology: from parts to pathways and beyond. Integr. Biol. (Camb.) 3, 109–118 (2011).
  4. Carr, P.A. & Church, G.M. Genome engineering. Nat. Biotechnol. 27, 1151–1162 (2009).
  5. Gibson, D.G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010).
  6. Lu, T.K., Khalil, A.S. & Collins, J.J. Next-generation synthetic gene networks. Nat. Biotechnol. 27, 1139–1150 (2009).
  7. Keasling, J.D. Manufacturing molecules through metabolic engineering. Science 330, 1355–1358 (2010).
  8. Cardinale, S. & Arkin, A.P. Contextualizing context for synthetic biology—identifying causes of failure of synthetic biological systems. Biotechnol. J. 7, 856–866 (2012).
  9. Kittleson, J.T., Wu, G.C. & Anderson, J.C. Successes and failures in modular genetic engineering. Curr. Opin. Chem. Biol. 16, 329–336 (2012).
  10. Salis, H.M., Mirsky, E.A. & Voigt, C.A. Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotechnol. 27, 946–950 (2009).
  11. Cambray, G., Mutalik, V.K. & Arkin, A.P. Toward rational design of bacterial genomes. Curr. Opin. Microbiol. 14, 624–630 (2011).
  12. Canton, B., Labno, A. & Endy, D. Refinement and standardization of synthetic biological parts and devices. Nat. Biotechnol. 26, 787–793 (2008).
  13. Rosenfeld, N., Young, J.W., Alon, U., Swain, P.S. & Elowitz, M.B. Accurate prediction of gene feedback circuit behavior from component properties. Mol. Syst. Biol. 3, 143 (2007).
  14. Smolke, C.D. Building outside of the box: iGEM and the BioBricks Foundation. Nat. Biotechnol. 27, 1099–1102 (2009).
  15. Mutalik, V.K. et al. Quantitative estimation of activity and quality for collections of functional genetic elements. Nat. Methods advance online publication, doi:10.1038/nmeth.2403 (10 March 2013).
  16. Lou, C., Stanton, B., Chen, Y.J., Munsky, B. & Voigt, C.A. Ribozyme-based insulator parts buffer synthetic circuits from genetic context. Nat. Biotechnol. 30, 1137–1142 (2012).
  17. Qi, L., Haurwitz, R.E., Shao, W., Doudna, J.A. & Arkin, A.P. RNA processing enables predictable programming of gene expression. Nat. Biotechnol. 30, 1002–1006 (2012).
  18. Dreyfus, M. What constitutes the signal for the initiation of protein synthesis on Escherichia coli mRNAs? J. Mol. Biol. 204, 79–94 (1988).
  19. Welch, M., Villalobos, A., Gustafsson, C. & Minshull, J. You're one in a googol: optimizing genes for protein expression. J. R. Soc. Interface 6 (suppl. 4), S467–S476 (2009).
  20. Bonnet, J., Subsoontorn, P. & Endy, D. Rewritable digital data storage in live cells via engineered control of recombination directionality. Proc. Natl. Acad. Sci. USA 109, 8884–8889 (2012).
  21. Spanjaard, R.A. & Vanduin, J. Translational reinitiation in the presence and absence of a Shine and Dalgarno sequence. Nucleic Acids Res. 17, 5501–5507 (1989).
  22. Oppenheim, D.S. & Yanofsky, C. Translational coupling during expression of the tryptophan operon of Escherichia coli. Genetics 95, 785–795 (1980).
  23. Schümperli, D., McKenney, K., Sobieski, D.A. & Rosenberg, M. Translational coupling at an intercistronic boundary of the Escherichia coli galactose operon. Cell 30, 865–871 (1982).
  24. Das, A. & Yanofsky, C. A ribosome binding site sequence is necessary for efficient expression of the distal gene of a translationally-coupled gene pair. Nucleic Acids Res. 12, 4757–4768 (1984).
  25. Schoner, B.E., Belagaje, R.M. & Schoner, R.G. Translation of a synthetic two-cistron mRNA in Escherichia coli. Proc. Natl. Acad. Sci. USA 83, 8506–8510 (1986).
  26. Makoff, A.J. & Smallwood, A.E. The use of two-cistron constructions in improving the expression of a heterologous gene in E. coli. Nucleic Acids Res. 18, 1711–1718 (1990).
  27. Mendez-Perez, D., Gunasekaran, S., Orler, V.J. & Pfleger, B.F. A translation-coupling DNA cassette for monitoring protein translation in Escherichia coli. Metab. Eng. 14, 298–305 (2012).
  28. Takyar, S., Hickerson, R.P. & Noller, H.F. mRNA helicase activity of the ribosome. Cell 120, 49–58 (2005).
  29. Qu, X. et al. The ribosome uses two active mechanisms to unwind messenger RNA during translation. Nature 475, 118–121 (2011).
  30. Barrick, D. et al. Quantitative analysis of ribosome binding sites in E.coli. Nucleic Acids Res. 22, 1287–1295 (1994).
  31. Steitz, J.A. Polypeptide chain initiation: nucleotide sequences of the three ribosomal binding sites in bacteriophage R17 RNA. Nature 224, 957–964 (1969).
  32. Yusupova, G.Z., Yusupov, M.M., Cate, J.H. & Noller, H.F. The path of messenger RNA through the ribosome. Cell 106, 233–241 (2001).
  33. Kudla, G., Murray, A.W., Tollervey, D. & Plotkin, J.B. Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258 (2009).
  34. Iost, I., Guillerez, J. & Dreyfus, M. Bacteriophage T7 RNA polymerase travels far ahead of ribosomes in vivo. J. Bacteriol. 174, 619–622 (1992).
  35. Alper, H., Fischer, C., Nevoigt, E. & Stephanopoulos, G. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA 102, 12678–12683 (2005).
  36. Cox, R.S. III, Surette, M.G. & Elowitz, M.B. Programming gene expression with combinatorial promoters. Mol. Syst. Biol. 3, 145 (2007).
  37. Hook-Barnard, I.G. & Hinton, D.M. Transcription initiation by mix and match elements: flexibility for polymerase binding to bacterial promoters. Gene Regul. Syst. Bio. 1, 275–293 (2007).
  38. Kwok, R. Five hard truths for synthetic biology. Nature 463, 288–290 (2010).
  39. Sellers, W. A system of screw threads and nuts. J. Franklin Inst. 77, 344–350 (1864).
  40. Kozak, M. Initiation of translation in prokaryotes and eukaryotes. Gene 234, 187–208 (1999).
  41. Scherbakov, D.V. & Garber, M.B. Overlapping genes in bacterial and phage genomes. Mol. Biol. 34, 485–495 (2000).
  42. Chan, L.Y., Kosuri, S. & Endy, D. Refactoring bacteriophage T7. Mol. Syst. Biol. 1, 2005.0018 (2005).
  43. Temme, K., Zhao, D. & Voigt, C.A. Refactoring the nitrogen fixation gene cluster from Klebsiella oxytoca. Proc. Natl. Acad. Sci. USA 109, 7085–7090 (2012).
  44. Jaschke, P.R., Lieberman, E.K., Rodriguez, J., Sierra, A. & Endy, D. A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast. Virology (2012).
  45. Mutalik, V.K., Qi, L., Guimaraes, J.C., Lucks, J.B. & Arkin, A.P. Rationally designed families of orthogonal RNA regulators of translation. Nat. Chem. Biol. 8, 447–454 (2012).
  46. Liu, C.C., Qi, L., Yanofsky, C. & Arkin, A.P. Regulation of transcription by unnatural amino acids. Nat. Biotechnol. 29, 164–168 (2011).
  47. Chang, A.L., Wolf, J.J. & Smolke, C.D. Synthetic RNA switches as a tool for temporal and spatial control over gene expression. Curr. Opin. Biotechnol. 23, 679–688 (2012).
  48. Pfleger, B.F., Pitera, D.J., Smolke, C.D. & Keasling, J.D. Combinatorial engineering of intergenic regions in operons tunes expression of multiple genes. Nat. Biotechnol. 24, 1027–1032 (2006).
  49. Cobb, R.E., Si, T. & Zhao, H. Directed evolution: an evolving and enabling synthetic biology tool. Curr. Opin. Chem. Biol. 16, 285–291 (2012).
  50. Aitken, C.E., Petrov, A. & Puglisi, J.D. Single ribosome dynamics and the mechanism of translation. Annu. Rev. Biophys. 39, 491–513 (2010).
  51. Ausubel, F.M. Short Protocols in Molecular Biology 5th edn. (Wiley, New York, 2002).
  52. Pédelacq, J.D., Cabantous, S., Tran, T., Terwilliger, T.C. & Waldo, G.S. Engineering and characterization of a superfolder green fluorescent protein. Nat. Biotechnol. 24, 79–88 (2006).
  53. Campbell, R.E. et al. A monomeric red fluorescent protein. Proc. Natl. Acad. Sci. USA 99, 7877–7882 (2002).
  54. Lee, T.S. et al. BglBrick vectors and datasheets: a synthetic biology platform for gene expression. J. Biol. Eng. 5, 12 (2011).
  55. Lutz, R. & Bujard, H. Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1–I2 regulatory elements. Nucleic Acids Res. 25, 1203–1210 (1997).
  56. McDowell, J.C., Roberts, J.W., Jin, D.J. & Gross, C. Determination of intrinsic transcription termination efficiency by RNA polymerase elongation rate. Science 266, 822–825 (1994).
  57. Kireeva, M.L. & Kashlev, M. Mechanism of sequence-specific pausing of bacterial RNA polymerase. Proc. Natl. Acad. Sci. USA 106, 8900–8905 (2009).
  58. Davis, J.H., Rubin, A.J. & Sauer, R.T. Design, construction and characterization of a set of insulated bacterial promoters. Nucleic Acids Res. 39, 1131–1141 (2011).
  59. Brosius, J., Erfle, M. & Storella, J. Spacing of the −10 and −35 regions in the tac promoter. J. Biol. Chem. 260, 3539–3541 (1985).
  60. Saecker, R.M., Record, M.T. Jr. & Dehaseth, P.L. Mechanism of bacterial transcription initiation: RNA polymerase - promoter binding, isomerization to initiation-competent open complexes, and initiation of RNA synthesis. J. Mol. Biol. 412, 754–771 (2011).
  61. Gross, C.A. et al. The functional and regulatory roles of sigma factors in transcription. Cold Spring Harb. Symp. Quant. Biol. 63, 141–155 (1998).
  62. Shultzaberger, R.K., Malashock, D.S., Kirsch, J.F. & Eisen, M.B. The fitness landscapes of cis-acting binding sites in different promoter and environmental contexts. PLoS Genet. 6, e1001042 (2010).
  63. Rhodius, V.A., Mutalik, V.K. & Gross, C.A. Predicting the strength of UP-elements and full-length E. coli σE promoters. Nucleic Acids Res. 40, 2907–2924 (2012).
  64. Rhodius, V.A. & Mutalik, V.K. Predicting strength and function for promoters of the Escherichia coli alternative sigma factor, σE. Proc. Natl. Acad. Sci. USA 107, 2854–2859 (2010).
  65. Mutalik, V.K., Nonaka, G., Ades, S.E., Rhodius, V.A. & Gross, C.A. Promoter strength properties of the complete sigma E regulon of Escherichia coli and Salmonella enterica. J. Bacteriol. 191, 7279–7287 (2009).
  66. Bujard, H. et al. A T5 promoter-based transcription-translation system for the analysis of proteins in vitro and in vivo. Methods Enzymol. 155, 416–433 (1987).
  67. Miroslavova, N.S. & Busby, S.J. Investigations of the modular structure of bacterial promoters. Biochem. Soc. Symp. 73, 1–10 (2006).
  68. Szoke, P.A., Allen, T.L. & deHaseth, P.L. Promoter recognition by Escherichia coli RNA polymerase: effects of base substitutions in the -10 and -35 regions. Biochemistry 26, 6188–6194 (1987).
  69. Engler, C., Kandzia, R. & Marillonnet, S. A one pot, one step, precision cloning method with high throughput capability. PLoS ONE 3, e3647 (2008).
  70. Postle, K., Nguyen, T.T. & Bertrand, K.P. Nucleotide sequence of the repressor gene of the TN10 tetracycline resistance determinant. Nucleic Acids Res. 12, 4849–4863 (1984).
  71. Matsuda, A., Toma, K. & Komatsu, K. Nucleotide sequences of the genes for two distinct cephalosporin acylases from a Pseudomonas strain. J. Bacteriol. 169, 5821–5826 (1987).
  72. Master, E.R., Zheng, Y., Storms, R., Tsang, A. & Powlowski, J. A xyloglucan-specific family 12 glycosyl hydrolase from Aspergillus niger: recombinant expression, purification and characterization. Biochem. J. 411, 161–170 (2008).
  73. Redding-Johanson, A.M. et al. Targeted proteomics for metabolic pathway optimization: application to terpene production. Metab. Eng. 13, 194–203 (2011).
  74. Martin, V.J., Pitera, D.J., Withers, S.T., Newman, J.D. & Keasling, J.D. Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nat. Biotechnol. 21, 796–802 (2003).
  75. Blattner, F.R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).
  76. Gulevich, A.Y. et al. A new method for the construction of translationally coupled operons in a bacterial chromosome. Mol. Biol. 43, 505–514 (2009).
  77. Olins, P.O., Devine, C.S., Rangwala, S.H. & Kavka, K.S. The T7 phage gene 10 leader RNA, a ribosome-binding site that dramatically enhances the expression of foreign genes in Escherichia coli. Gene 73, 227–235 (1988).
  78. Olins, P.O. & Rangwala, S.H. A novel sequence element derived from bacteriophage T7 mRNA acts as an enhancer of translation of the lacZ gene in Escherichia coli. J. Biol. Chem. 264, 16973–16976 (1989).
  79. Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. 2, 2006.0008 (2006).
  80. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European molecular biology open software suite. Trends Genet. 16, 276–277 (2000).
  81. Needleman, S.B. & Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970).
  82. Markham, N.R. & Zuker, M. UNAFold: software for nucleic acid folding and hybridization. Methods Mol. Biol. 453, 3–31 (2008).
  83. Wu, C.F.J. & Hamada, M.S. Experiments: Planning, Analysis, and Optimization 2nd edn. (Wiley, Hoboken, New Jersey, USA, 2009).
  84. Wold, S., Sjöström, M. & Eriksson, L. PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58, 109–130 (2001).
  85. Freedman, D. & Diaconis, P. On the histogram as a density estimator: L2 theory. Z Wahrscheinlichkeit 57, 453–476 (1981).
  86. Saeed, A.I. et al. TM4 microarray software suite. Methods Enzymol. 411, 134–193 (2006).
  87. Crooks, G.E., Hon, G., Chandonia, J.M. & Brenner, S.E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).


Author information

  1. These authors contributed equally to this work.

    • Adam P Arkin &
    • Drew Endy


  1. BIOFAB International Open Facility Advancing Biotechnology, Emeryville, California, USA.

    • Vivek K Mutalik,
    • Joao C Guimaraes,
    • Guillaume Cambray,
    • Colin Lam,
    • Marc Juul Christoffersen,
    • Quynh-Anh Mai,
    • Andrew B Tran,
    • Morgan Paull,
    • Jay D Keasling,
    • Adam P Arkin &
    • Drew Endy
  2. Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, USA.

    • Vivek K Mutalik,
    • Jay D Keasling &
    • Adam P Arkin
  3. Department of Bioengineering, University of California, Berkeley, Berkeley, California, USA.

    • Vivek K Mutalik,
    • Joao C Guimaraes,
    • Guillaume Cambray,
    • Colin Lam,
    • Marc Juul Christoffersen,
    • Quynh-Anh Mai,
    • Andrew B Tran,
    • Jay D Keasling &
    • Adam P Arkin
  4. Department of Informatics, Computer Science and Technology Center, University of Minho, Campus de Gualtar, Braga, Portugal.

    • Joao C Guimaraes
  5. Department of Chemical & Biomolecular Engineering, University of California, Berkeley, Berkeley, California, USA.

    • Jay D Keasling
  6. Joint BioEnergy Institute, Emeryville, California, USA.

    • Jay D Keasling
  7. Department of Bioengineering, Stanford University, Stanford, California, USA.

    • Drew Endy


V.K.M., A.P.A. and D.E. conceived the study and designed the experiments. V.K.M., C.L., Q.-A.M., A.B.T. and M.P. performed the experiments. V.K.M., J.C.G., G.C., M.J.C., A.P.A. and D.E. analyzed the data. V.K.M., J.C.G., G.C., J.D.K., A.P.A. and D.E. wrote the manuscript. All authors discussed and commented on the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (8 MB)

    Supplementary Figures 1–32, Supplementary Table 1 and Supplementary Note

Excel files

  1. Supplementary Data 1 (475 KB)

    List of parts, plasmids and strains used in the present work. Columns as follows: A, number; B, vector backbone; C, abstract part number for promoter element, indicated as “apFAB#”; D, promoter name; E, abstract part number for 5' UTR element, indicated as “apFAB#”; F, 5' UTR name used in the main text; G, abstract part number for GOI element, indicated as “apFAB#”; H, GOI name; I, plasmid number “pFAB#”; J, antibiotics; K, replication origin; L, strain; M, strain number “sFAB#”; N, project name.

  2. Supplementary Data 2 (135 KB)

    List of primers used in the present work. Columns as follows: A, number; B, oligonucleotide number (“oFAB#”; primers used for sequencing are denoted as “soFAB#”); C, forward and reverse primers are indicated as FW and RV; D, information notes for the primer; E, primer sequence (5' to 3'); F, project name.
