Degeneracy in the genetic code, which enables a single protein to be encoded by a multitude of synonymous gene sequences, has an important role in regulating protein expression, but substantial uncertainty exists concerning the details of this phenomenon. Here we analyse the sequence features influencing protein expression levels in 6,348 experiments using bacteriophage T7 polymerase to synthesize messenger RNA in Escherichia coli. Logistic regression yields a new codon-influence metric that correlates only weakly with genomic codon-usage frequency, but strongly with global physiological protein concentrations and also mRNA concentrations and lifetimes in vivo. Overall, the codon content influences protein expression more strongly than mRNA-folding parameters, although the latter dominate in the initial ~16 codons. Genes redesigned based on our analyses are transcribed with unaltered efficiency but translated with higher efficiency in vitro. The less efficiently translated native sequences show greatly reduced mRNA levels in vivo. Our results suggest that codon content modulates a kinetic competition between protein elongation and mRNA degradation that is a central feature of the physiology and also possibly the regulation of translation in E. coli.
Chen, G. T. & Inouye, M. Role of the AGA/AGG codons, the rarest codons in global gene expression in Escherichia coli. Genes Dev. 8, 2641–2652 (1994)
Deana, A., Ehrlich, R. & Reiss, C. Synonymous codon selection controls in vivo turnover and amount of mRNA in Escherichia coli bla and ompA genes. J. Bacteriol. 178, 2718–2720 (1996)
Kudla, G., Murray, A. W., Tollervey, D. & Plotkin, J. B. Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258 (2009)
Tuller, T., Waldman, Y. Y., Kupiec, M. & Ruppin, E. Translation efficiency is determined by both codon bias and folding energy. Proc. Natl Acad. Sci. USA 107, 3645–3650 (2010)
Goodman, D. B., Church, G. M. & Kosuri, S. Causes and effects of N-terminal codon bias in bacterial genes. Science 342, 475–479 (2013)
Castillo-Méndez, M. A., Jacinto-Loeza, E., Olivares-Trejo, J. J., Guarneros-Pena, G. & Hernandez-Sanchez, J. Adenine-containing codons enhance protein synthesis by promoting mRNA binding to ribosomal 30S subunits provided that specific tRNAs are not exhausted. Biochimie 94, 662–672 (2012)
Bentele, K., Saffert, P., Rauscher, R., Ignatova, Z. & Bluthgen, N. Efficient translation initiation dictates codon usage at gene start. Mol. Syst. Biol. 9, 675 (2013)
Hunt, R. C., Simhadri, V. L., Iandoli, M., Sauna, Z. E. & Kimchi-Sarfaty, C. Exposing synonymous mutations. Trends Genet. 30, 308–321 (2014)
Spencer, P. S., Siller, E., Anderson, J. F. & Barral, J. M. Silent substitutions predictably alter translation elongation rates and protein folding efficiencies. J. Mol. Biol. 422, 328–335 (2012)
Li, G. W., Burkhardt, D., Gross, C. & Weissman, J. S. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635 (2014)
Li, G.-W., Oh, E. & Weissman, J. S. The anti-Shine–Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature 484, 538–541 (2012)
Gingold, H. & Pilpel, Y. Determinants of translation efficiency and accuracy. Mol. Syst. Biol. 7, 481 (2011)
Cannarozzi, G. et al. A role for codon order in translation dynamics. Cell 141, 355–367 (2010)
Sharp, P. M. & Li, W. H. The codon adaptation index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295 (1987)
Ninio, J. Fine tuning of ribosomal accuracy. FEBS Lett. 196, 1–4 (1986)
Tuller, T. et al. An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141, 344–354 (2010)
Wallace, E. W., Airoldi, E. M. & Drummond, D. A. Estimating selection on synonymous codon usage from noisy experimental data. Mol. Biol. Evol. 30, 1438–1453 (2013)
Caskey, C. T., Beaudet, A. & Nirenberg, M. RNA codons and protein synthesis. 15. Dissimilar responses of mammalian and bacterial transfer RNA fractions to messenger RNA codons. J. Mol. Biol. 37, 99–118 (1968)
Ikemura, T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151, 389–409 (1981)
Muramatsu, T. et al. Codon and amino-acid specificities of a transfer RNA are both converted by a single post-transcriptional modification. Nature 336, 179–181 (1988)
Zhang, S. P., Zubay, G. & Goldman, E. Low-usage codons in Escherichia coli, yeast, fruit fly and primates. Gene 105, 61–72 (1991)
Bulmer, M. The selection-mutation-drift theory of synonymous codon usage. Genetics 129, 897–907 (1991)
Dong, H., Nilsson, L. & Kurland, C. G. Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J. Mol. Biol. 260, 649–663 (1996)
Elf, J., Nilsson, D., Tenson, T. & Ehrenberg, M. Selective charging of tRNA isoacceptors explains patterns of codon usage. Science 300, 1718–1722 (2003)
Dittmar, K. A., Sorensen, M. A., Elf, J., Ehrenberg, M. & Pan, T. Selective charging of tRNA isoacceptors induced by amino-acid starvation. EMBO Rep. 6, 151–157 (2005)
Zhang, F., Saha, S., Shabalina, S. A. & Kashina, A. Differential arginylation of actin isoforms is regulated by coding sequence-dependent degradation. Science 329, 1534–1537 (2010)
Vivanco-Domínguez, S. et al. Protein synthesis factors (RF1, RF2, RF3, RRF, and tmRNA) and peptidyl-tRNA hydrolase rescue stalled ribosomes at sense codons. J. Mol. Biol. 417, 425–439 (2012)
Dana, A. & Tuller, T. The effect of tRNA levels on decoding times of mRNA codons. Nucleic Acids Res. 42, 9171–9181 (2014)
Pelechano, V., Wei, W. & Steinmetz, L. M. Widespread co-translational RNA decay reveals ribosome dynamics. Cell 161, 1400–1412 (2015)
Presnyak, V. et al. Codon optimality is a major determinant of mRNA stability. Cell 160, 1111–1124 (2015)
Drummond, D. A. & Wilke, C. O. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134, 341–352 (2008)
Shakin-Eshleman, S. H. & Liebhaber, S. A. Influence of duplexes 3′ to the mRNA initiation codon on the efficiency of monosome formation. Biochemistry 27, 3975–3982 (1988)
Quax, T. E. et al. Differential translation tunes uneven production of operon-encoded proteins. Cell Rep. 4, 938–944 (2013)
Letzring, D. P., Wolf, A. S., Brule, C. E. & Grayhack, E. J. Translation of CGA codon repeats in yeast involves quality control components and ribosomal protein L1. RNA 19, 1208–1217 (2013)
Ude, S. et al. Translation elongation factor EF-P alleviates ribosome stalling at polyproline stretches. Science 339, 82–85 (2013)
Iost, I. & Dreyfus, M. The stability of Escherichia coli lacZ mRNA depends upon the simultaneity of its synthesis and translation. EMBO J. 14, 3252–3261 (1995)
Iost, I., Guillerez, J. & Dreyfus, M. Bacteriophage T7 RNA polymerase travels far ahead of ribosomes in vivo. J. Bacteriol. 174, 619–622 (1992)
Acton, T. B. et al. Robotic cloning and protein production platform of the Northeast Structural Genomics Consortium. Methods Enzymol. 394, 210–243 (2005)
Price, W. N. et al. Large-scale experimental studies show unexpected amino acid effects on protein expression and solubility in vivo in E. coli. Microb. Inform. Exp. 1, 6 (2011)
Duval, M. et al. Escherichia coli ribosomal protein S1 unfolds structured mRNAs onto the ribosome for active translation initiation. PLoS Biol. 11, e1001731 (2013)
Reuter, J. S. & Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11, 129 (2010)
Lu, J. & Deutsch, C. Electrostatics in the ribosomal tunnel modulate chain elongation rates. J. Mol. Biol. 384, 73–86 (2008)
Ishihama, Y. et al. Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics 9, 102 (2008)
Chen, H., Shiroguchi, K., Ge, H. & Xie, X. S. Genome-wide study of mRNA degradation and transcript elongation in Escherichia coli. Mol. Syst. Biol. 11, 781 (2015)
dos Reis, M. Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res. 31, 6976–6985 (2003)
Nogueira, T., de Smit, M., Graffe, M. & Springer, M. The relationship between translational control and mRNA degradation for the Escherichia coli threonyl-tRNA synthetase gene. J. Mol. Biol. 310, 709–722 (2001)
Richards, J., Sundermeier, T., Svetlanov, A. & Karzai, A. W. Quality control of bacterial mRNA decoding and decay. Biochim. Biophys. Acta 1779, 574–582 (2008)
Ivanova, N., Pavlov, M. Y. & Ehrenberg, M. tmRNA-induced release of messenger RNA from stalled ribosomes. J. Mol. Biol. 350, 897–905 (2005)
Shoemaker, C. J., Eyler, D. E. & Green, R. Dom34:Hbs1 promotes subunit dissociation and peptidyl-tRNA drop-off to initiate no-go decay. Science 330, 369–372 (2010)
Chadani, Y., Ono, K., Kutsukake, K. & Abo, T. Escherichia coli YaeJ protein mediates a novel ribosome-rescue pathway distinct from SsrA- and ArfA-mediated pathways. Mol. Microbiol. 80, 772–785 (2011)
Xiao, R. et al. The high-throughput protein sample production platform of the Northeast Structural Genomics Consortium. J. Struct. Biol. 172, 21–33 (2010)
Acton, T. B. et al. Preparation of protein samples for NMR structure, function, and small-molecule screening studies. Methods Enzymol. 493, 21–60 (2011)
R Development Core Team. A Language and Environment for Statistical Computing; http://www.r-project.org/ (2012)
Akaike, H. A new look at the statistical model identification. IEEE Trans. Auto. Con. 19, 716–723 (1974)
Harrell, F. E. Jr. rms: Regression Modeling Strategies. R package version 4.2-0; http://CRAN.R-project.org/package=rms (2014)
Jansson, M. et al. High-level production of uniformly 15N- and 13C-enriched fusion proteins in Escherichia coli. J. Biomol. NMR 7, 131–141 (1996)
Keseler, I. M. et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 41, D605–D612 (2013)
Juncker, A. S. et al. Prediction of lipoprotein signal peptides in Gram-negative bacteria. Protein Sci. 12, 1652–1662 (2003)
Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580 (2001)
Novick, A. & Weiner, M. Enzyme induction as an all-or-none phenomenon. Proc. Natl Acad. Sci. USA 43, 553–566 (1957)
Jensen, P. R., Westerhoff, H. V. & Michelsen, O. The use of lac-type promoters in control analysis. Eur. J. Biochem. 211, 181–191 (1993)
Guzman, L. M., Belin, D., Carson, M. J. & Beckwith, J. Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J. Bacteriol. 177, 4121–4130 (1995)
This work was supported by NIGMS Protein Structure Initiative grant U54-GM094597 to the Northeast Structural Genomics Consortium to J.F.H. and G.T.M., and NIH grant GM106372 to D.P.A. We thank B. Klingenberg, R. Gonzalez, M. Gottesman and V. de Crécy-Lagard for advice.
Two patent applications have been submitted related to results reported in this paper. G.T.M. is affiliated with Nexomics Inc., and G.B., G.T.M., D.P.A. and J.F.H. are affiliated with OPTimum Protein Technologies.
Extended data figures and tables
Extended Data Figure 1 Phylogenic distribution of the proteins in the large-scale protein expression data set.
The colours in the cladogram encode the number of genes/proteins from each organism, as indicated by the legend. The data set includes 47 from eukaryotes (45 from humans and 2 from mouse), 809 from archaebacteria, and 96 from E. coli, with the remainder coming from other eubacteria. The organism contributing the largest number of proteins to the data set is the eubacterium Bacteroides thetaiotaomicron (150 proteins).
Extended Data Figure 2 Relationships between additional mRNA sequence parameters and results in the large-scale protein expression data set.
a, i, k, Histograms showing for each expression score the distribution of the overall G+C frequency (a), the frequency in all reading frames of the AGGA core sequence of the Shine–Dalgarno ribosome-binding sequence (i), and the amino acid repetition rate r (k; see Methods for definition). The parameter distributions in the E = 5 and E = 0 categories (n = 3,727 for both combined) are shown in a in dark and light blue, respectively, and in i and k in red and black, respectively. The symbols used for the histograms for the intermediate expression scores (n = 2,621 for all combined) are indicated in the legend for each panel. b–h, j, l–o, Plots showing the logarithm of the ratio of the number of proteins with E = 5 versus E = 0 scores as a function of parameter value. b, Data for the overall frequencies of the four individual nucleotide bases as well as the combined G + C frequency (labelled GC). c–e, The equivalent data separately for the first (c), second (d) and third (e) positions in the codons in the genes. f, Data for genes either not containing or containing at least one occurrence of the ATA–ATA di-codon (P = 2 × 10⁻³²). The error bars in this panel represent 95% confidence limits calculated from bootstrapping; the error bars for the genes without any occurrence of this di-codon are smaller than the size of the symbol. g, h, Data for the codon adaptation index (g; ref. 14) and tRNA adaptation index (h; ref. 16). j, Data for the frequency in all reading frames of the sequence AGGA. l, m, Data for the amino acid repetition rate r (l) and the codon repetition rate (m). n, o, Data for the statistical entropy of the amino acid (n) and codon sequences (o). The data in a–e, i and k are binned in equal ranges of the parameter value, while the data in g, h, j and l–o are binned in deciles containing equal populations.
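The decile-binned log-ratio plots described in this legend can be illustrated with a minimal sketch. It assumes the natural logarithm (the legend does not state the base) and uses a hypothetical function name and input layout; it is not the authors' analysis code.

```python
import math

def log_ratio_by_bin(values, scores, n_bins=10):
    """Return (mean parameter value, ln(#E5 / #E0)) for equal-population bins.

    values: one sequence-parameter value per gene (hypothetical input)
    scores: expression score per gene; only E = 0 and E = 5 are used
    """
    # Keep only the two extreme expression categories and sort by parameter.
    pairs = sorted((v, e) for v, e in zip(values, scores) if e in (0, 5))
    n = len(pairs)
    out = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        n5 = sum(1 for _, e in chunk if e == 5)
        n0 = len(chunk) - n5
        mean_v = sum(v for v, _ in chunk) / len(chunk)
        # The log ratio is undefined when either count is zero.
        ratio = math.log(n5 / n0) if n5 and n0 else None
        out.append((mean_v, ratio))
    return out
```

With `n_bins=10` each bin is a decile, so every point in the resulting plot rests on roughly the same number of genes.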
Extended Data Figure 3 Correlations between sequence parameters in the genes included in the large-scale protein expression data set.
a–c, Corrgrams representing the signed Pearson correlation coefficients between different mRNA sequence parameters in the genes in the E = 0 plus E = 5 categories in the data set (n = 3,727 for the two combined). The colour-coding is defined schematically on the left in a, with blue being used for positively correlated variables, red for negatively correlated variables, and white for uncorrelated variables. In a, E represents the expression score in the binary categories (0, 5), sall represents the mean value of our new codon-influence metric (coloured symbols in Fig. 3a) over the entire gene (without the LEHHHHH tag), s7–16 and s17–32 represent the mean values of this metric for codons 7–16 and 17–32, respectively, ΔGUH represents the predicted free energy of mRNA folding for the 5′-UTR from the pET21 expression vector plus the first 48 nucleotides in the gene, <∆GT>96 represents the mean value in the remainder of the gene of the predicted free energy of folding in 50% overlapping windows of 96 nucleotides, I represents an indicator variable that assumes a value of 0 or 1 if (ΔGUH < −39 kcal mol⁻¹) and (%GC2–6 > 0.65), dAUA assumes a value of 0 or 1 if there is at least one occurrence of the ATA–ATA di-codon, r represents the codon repetition rate (see Methods), and %GC represents the percentage content of G plus C bases in the gene. The variables aH, aH2, gH2 and u3H represent monomial functions of the fractional content of A, G and U bases in codons 2–6; the correlation coefficient for these nucleotide-composition terms was calculated using their sum weighted by their optimized coefficients from model M (Fig. 4 and Extended Data Table 1a), as given in the equation in the main text. b, Data for the frequencies of the codons positively correlated with expression score E. c, Data for the frequencies of the codons negatively correlated with expression score E.
d–g, Two-dimensional histograms illustrating the dependence of results in the large-scale protein-expression data set on pairs of sequence parameters. The colours encode the fractional excess of proteins with E = 5 versus E = 0 scores (that is, (#E5 − #E0)/(#E5 + #E0)), as calibrated by the scale bar on the right. The area of each square is proportional to the number of proteins in that bin in the two-dimensional parameter space. The variables sall, s7–16 and stail represent, respectively, the mean values of our new codon-influence metric for the entire gene, for codons 7–16, and for all of the remaining codons downstream in the gene. ΔGUH represents the predicted free energy of mRNA folding for the 5′-UTR from the pET21 expression vector plus the first 48 nucleotides in the gene, <∆GT>96 represents the mean value in the remainder of the gene of the predicted free energy of folding in 50% overlapping windows of 96 nucleotides, and r represents the amino acid repetition rate (as defined in Methods).
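The colour scale in panels d–g is the fractional excess (#E5 − #E0)/(#E5 + #E0) per bin, and the square area tracks the bin population. The sketch below shows one way to compute both quantities; the function name, the bin-edge arguments and the half-open binning convention are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

def fractional_excess_grid(x, y, scores, x_edges, y_edges):
    """Map each occupied 2D bin to (fractional excess, number of proteins).

    Bins are half-open [lo, hi); points outside the edges are skipped.
    Only genes scored E = 0 or E = 5 are counted.
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [n_E0, n_E5]
    for xv, yv, e in zip(x, y, scores):
        if e not in (0, 5):
            continue
        i = bisect_right(x_edges, xv) - 1
        j = bisect_right(y_edges, yv) - 1
        if 0 <= i < len(x_edges) - 1 and 0 <= j < len(y_edges) - 1:
            counts[(i, j)][1 if e == 5 else 0] += 1
    return {
        bin_id: ((n5 - n0) / (n5 + n0), n5 + n0)
        for bin_id, (n0, n5) in counts.items()
    }
```

The excess is +1 when every protein in a bin scored E = 5, −1 when every protein scored E = 0, and 0 when the two categories are equally populated.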
Extended Data Figure 4 Relationship of the new codon-influence metric to parameters assumed to influence translation efficiency in previous literature.
a, Average frequency of each non-stop codon in the genes in just the E = 0 plus E = 5 categories (dark grey) or in the E = 0 through E = 5 categories (light grey), with error bars representing the s.d. of the frequency among the genes in each set. b, Codon slopes from single-variable binary logistic regressions (dark grey symbols in Fig. 3a) segregated according to the identity of the nucleotide at each of the three positions in the codon. These slopes come from single-variable linear logistic regressions that were performed separately for each of the individual 61 non-stop codons. c, Codon slopes from the simultaneous multi-parameter binary logistic regression model M (Extended Data Table 1a and coloured symbols in Fig. 3a) segregated according to the identity of the nucleotide at each of the three positions in the codon. d–h, The codon slopes from model M plotted versus the codon sensitivity (ref. 24) in E. coli K12 (d), the relative synonymous codon usage (RSCU) in E. coli BL21 (e), the codon adaptation index (ref. 14) in E. coli K12 (f), the tRNA adaptation index (ref. 16) in E. coli K12 (g), and the concentration of exactly cognate tRNAs (ref. 23) in E. coli K12 (h). The shapes and colour-coding of the symbols in b–h, which are the same as in Fig. 3, encode structural and qualitative chemical characteristics of the amino acids.
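As an illustration of the single-variable binary logistic regressions mentioned in panel b, the sketch below regresses a binary outcome (1 for E = 5, 0 for E = 0) on the frequency of one codon and returns the fitted slope. It is a minimal stand-in written with Newton's method in plain Python, not the R code used in the study; the function names and the choice of optimizer are assumptions.

```python
import math

def codon_frequency(seq, codon):
    """Fraction of in-frame codons in seq that equal codon."""
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return codons.count(codon) / len(codons)

def fit_single_variable_logistic(x, y, n_iter=50, tol=1e-10):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method.

    x: predictor per gene (for example, the frequency of one codon)
    y: 1 for E = 5 and 0 for E = 0
    Returns (b0, b1); b1 plays the role of the codon slope.
    Perfectly separable data have no finite optimum and are not handled.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to b0
            g1 += (yi - p) * xi     # gradient with respect to b1
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("singular information matrix")
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

Repeating the fit once per non-stop codon, each time with that codon's frequency as `x`, would give 61 slopes of the kind plotted in panel b. A positive slope means that genes richer in the codon are more likely to fall in the E = 5 category.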
Extended Data Figure 5 Variation in codon influence as a function of position in the coding sequence.
Plots showing the reduction in the deviance of the computational model resulting from adding a term representing the average value of the codon slope (coloured symbols in Fig. 3a) in a window 5, 10 or 16 codons wide starting at the position indicated on the abscissa (that is, c through (c + 4) in blue, c through (c + 9) in red, or c through (c + 15) in purple, respectively, with c representing the number of the first codon in the window). The reduction in deviance was calculated relative to a base model containing codon frequencies in the entire coding sequence, head nucleotide composition terms (aH, aH2, u3H and gH2), the predicted free energy of RNA folding in the head plus the 5′-UTR (ΔGUH), the binary indicator variable for head folding effects I, the binary variable indicating the occurrence of an AUA–AUA di-codon dAUA, and the codon repetition rate r (n = 3,727). The mean slope of codons 2–6 presumably does not improve the model because the head-composition terms rather than codon content dominate the influence of this region on protein-expression level. This effect also probably accounts for the peaks in the plots for the 10-codon (c through (c + 9)) and 16-codon (c through (c + 15)) windows starting at codon 7. For reference, adding s7–16 and s17–32 terms to model M contributes 29.7 points (P = 5 × 10⁻⁸) and 12 points (P = 5 × 10⁻⁴) of model deviance, respectively (Extended Data Table 1 and Fig. 4a). Dropping out terms to measure their influence (Fig. 4a) shows every codon contributes on average (423.7/270) = 1.6 deviance units, while codons 7–16 each contribute on average an additional (29.6/10) = 3.0 deviance units. Therefore, individual codons at positions 7–16 contribute about 1.6 + 3.0 = 4.6 deviance units each, which makes them approximately three times more influential ((1.6 + 3.0)/1.6 ≈ 2.9) than those in the tail of the gene.
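The window terms in this legend are mean codon slopes over codons c through (c + w − 1). A minimal sketch of that calculation follows; the slope values in the usage note are made-up placeholders rather than values from model M, and numbering the start codon as position 1 is an assumption.

```python
def split_codons(seq):
    """Split a coding sequence into in-frame codons, dropping any partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def windowed_mean_slope(codons, slopes, start, width):
    """Mean codon slope over codons start..start + width - 1 (1-based).

    codons: list of codons in the gene
    slopes: dict mapping each codon to its slope (hypothetical values)
    """
    window = codons[start - 1:start - 1 + width]
    if len(window) < width:
        raise ValueError("window extends past the end of the coding sequence")
    return sum(slopes[c] for c in window) / width
```

For example, with placeholder slopes `{"ATG": 0.0, "AAA": 1.0, "GCT": -0.5}`, the call `windowed_mean_slope(split_codons("ATGAAAGCTAAA"), slopes, 2, 3)` averages the slopes of codons 2 to 4. Setting `start=7, width=10` would give a term of the s7–16 type.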
Extended Data Figure 6 Further experiments on synthetic genes designed to enhance protein expression.
a–d, Data for three additional proteins equivalent to the data presented in Fig. 5. The in vivo and in vitro expression properties from pET vectors are compared for inefficiently translated native (WT) genes and synonymous genes redesigned in the head or the tail or both using the 6AA, 31C-FO or 31C-FD methods. The type of sequence in the head (H) is indicated separately from that in the tail (T), and the name of the target protein is indicated on the left on each row. a, E. coli BL21(DE3) host cell growth curves at room temperature after induction of the target gene at time zero in chemically defined MJ9 medium. b, Coomassie-blue-stained SDS–PAGE gels of whole cells after overnight induction at 17 °C, with the amount loaded in each lane normalized to the A600 nm of the culture at the time of harvest. Black arrows indicate the migration positions of the target proteins. c, Autoradiographs of SDS–PAGE gels of in vitro translation reactions using fully purified translation components in the presence of [35S]methionine. Each reaction contained an equal amount of purified mRNA that was transcribed in vitro using T7 RNA polymerase. d, Northern blot analyses of the mRNA for the target protein after induction of expression in vivo. An equal amount of total RNA was loaded in each lane, and blots were hybridized with a probe matching the 5′-UTR. e, f, Coomassie-blue-stained SDS–PAGE gels (e) and anti-tetrahistidine western blots (f) showing that gene optimization has equivalent effects at physiological protein expression levels. Pairs of synonymous native (WT) and codon-optimized 31C-FOH/T genes with C-terminal hexahistidine tags were re-cloned under control of the arabinose-inducible promoter in a pBAD vector (ref. 62), and the concentration of arabinose in the growth medium was adjusted so the 31C-FOH/T genes yielded protein expression in the physiological range as assessed from Coomassie-blue-stained SDS–PAGE gels of whole cell extracts.
Black arrows indicate locations of the induced target proteins. Substantially lower protein expression from the wild-type genes compared to the synonymous 31C-FOH/T genes in these experiments demonstrates that equivalent codon-usage effects are observed when proteins are overexpressed using a pET vector or expressed at roughly physiological levels using a pBAD vector, despite changes explained in the online Methods in the polymerase used to transcribe the genes, the medium used to grow the cells, and the timescale and temperature of the protein-induction process. The constitutively expressed ~25-kDa protein that reacts with the anti-tetrahistidine antibody in the cells containing the 31C-FOH/T gene for YcaQ is probably an amino-terminally truncated protein synthesized from a 5′-truncated mRNA transcribed from an internal promoter sequence fortuitously introduced into this synthetic gene. Uncropped scans of the gels shown here are included in Supplementary Fig. 1.
Extended Data Figure 7 In vivo expression of synthetic genes with sequences optimized using the 31C-FO method.
a, Coomassie-blue-stained SDS–PAGE gels of whole-cell extracts after overnight induction at 17 °C of synthetic genes designed using the 31C-FOH method to encode 17 different proteins. All genes were cloned in-frame with a C-terminal hexahistidine tag in the same pET21 plasmid derivative used to generate our large-scale protein-expression data set (ref. 38). Equal volumes of induced cultures were loaded in all lanes. b, Coomassie-blue-stained SDS–PAGE gels of whole-cell extracts (top) and the corresponding soluble fractions (bottom) after overnight induction at 17 °C of 14 of the synthetic genes fused in-frame at the C terminus of the gene for the E. coli maltose-binding protein (MBP). The protein sequences come from the following source organisms: LCABL_04230 from Lactobacillus casei BL23; VIPARP466_2889 from Vibrio parahaemolyticus; AM1_4824 from Acaryochloris marina MBIC11017; CLO_0718 from Clostridium botulinum E1; ESAG_04692 from Escherichia sp. 3_2_53FAA; FTCG_00666 and FTCG_01175 from Francisella tularensis subsp. novicida GA99-3549; FTE_1275, FTE_1608, FTE_0420 and FTE_1020 from Francisella tularensis subsp. novicida FTE; FRANO wbtG and A1DS62_FRANO from Francisella novicida; FTBG_00988 and A7JEH2_FRATL from Francisella tularensis subsp. tularensis FSC033; FTN_1238 from Francisella tularensis subsp. novicida U112; O1O_09285 from Pseudomonas aeruginosa MPAO1/P1; Sthe_2331 from Sphaerobacter thermophilus DSM20745/S6022; SEVCU126_0606 from Staphylococcus epidermidis VCU126; and Y007_20720 from Salmonella enterica subsp. enterica serovar Montevideo 507440-20.
a, Final yield of mRNA purified from reactions conducted under identical conditions, as described in the Methods. The yields were calculated from the optical density at 260 nm. b–e, Kinetic analyses of in vitro transcription reactions using formaldehyde-agarose gel electrophoresis. Samples were taken at 0, 5, 10 and 30 min. The gels were stained with ethidium bromide. The ‘standard’ lane contains 1 μg of the same mRNA after purification to enable calibration for differences in the sensitivity of the molecules to staining. Reactions were started by addition of the wild-type or 31C-FOH/31C-FOT (31C-FOH/T) linearized plasmids encoding SRU_1983 (b), APE_0230.1 (c), SCO1897 (d), or Eco-YcaQ (e).
This file contains Supplementary Text, Supplementary References and Supplementary Figure 1, the uncropped gel presented in Extended Data Figures 6 and 7. (PDF 2305 kb)
This file contains the value, p-value and standard deviation for the Single parameter regressions and the Multiparameter Model M. (XLSX 60 kb)
This file contains the expression values, the sequences and the calculated parameters for the 6,348-protein data set. (XLSX 2158 kb)
This file contains the sequences and parameters of the optimized genes. (XLSX 66 kb)
About this article
Cite this article
Boël, G., Letso, R., Neely, H. et al. Codon influence on protein expression in E. coli correlates with mRNA levels. Nature 529, 358–363 (2016). https://doi.org/10.1038/nature16509