Abstract
We comprehensively assessed the contribution of the Shine-Dalgarno sequence to protein expression and used the data to develop EMOPEC (Empirical Model and Oligos for Protein Expression Changes; http://emopec.biosustain.dtu.dk). EMOPEC is a free tool that makes it possible to modulate the expression level of any Escherichia coli gene by changing only a few bases. Measured protein levels for 91% of our designed sequences were within twofold of the desired target level.
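The twofold criterion can be made concrete with a short check. This Python sketch assumes the criterion is symmetric on a log scale, so a measurement counts if it is at most twice and at least half the target. The function names and example values are hypothetical and are not part of EMOPEC.

```python
import math


def within_fold(measured: float, target: float, fold: float = 2.0) -> bool:
    """True if `measured` is within `fold`-fold of `target`, above or below."""
    return abs(math.log(measured / target)) <= math.log(fold)


def fraction_within_fold(pairs, fold: float = 2.0) -> float:
    """Fraction of (measured, target) pairs that meet the fold criterion."""
    hits = sum(within_fold(m, t, fold) for m, t in pairs)
    return hits / len(pairs)


# Hypothetical (measured, target) pairs in arbitrary units.
example = [(1.0, 1.5), (3.0, 1.0), (0.6, 1.0), (10.0, 4.0)]
print(fraction_within_fold(example))  # 0.5
```

Working on the log ratio treats a twofold overshoot and a twofold undershoot as equally far from the target.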


References
Mutalik, V.K. et al. Nat. Methods 10, 347–353 (2013).
Mutalik, V.K. et al. Nat. Methods 10, 354–360 (2013).
Kosuri, S. et al. Proc. Natl. Acad. Sci. USA 110, 14024–14029 (2013).
Goodman, D.B., Church, G.M. & Kosuri, S. Science 342, 475–479 (2013).
Lee, J.W. et al. Nat. Chem. Biol. 8, 536–546 (2012).
Woolston, B.M., Edgar, S. & Stephanopoulos, G. Annu. Rev. Chem. Biomol. Eng. 4, 259–288 (2013).
Wang, H.H. et al. Nature 460, 894–898 (2009).
Bonde, M.T. et al. ACS Synth. Biol. 4, 17–22 (2015).
Sommer, M.O., Church, G.M. & Dantas, G. Mol. Syst. Biol. 6, 360 (2010).
Klumpp, S., Zhang, Z. & Hwa, T. Cell 139, 1366–1375 (2009).
Gold, L. Annu. Rev. Biochem. 57, 199–233 (1988).
Shine, J. & Dalgarno, L. Proc. Natl. Acad. Sci. USA 71, 1342–1346 (1974).
Schurr, T., Nadir, E. & Margalit, H. Nucleic Acids Res. 21, 4019–4023 (1993).
Shultzaberger, R.K., Bucheimer, R.E., Rudd, K.E. & Schneider, T.D. J. Mol. Biol. 313, 215–228 (2001).
Salis, H.M. Methods Enzymol. 498, 19–42 (2011).
Reeve, B., Hargest, T., Gilbert, C. & Ellis, T. Front. Bioeng. Biotechnol. 2, 1–6 (2014).
Seo, S.W. et al. Metab. Eng. 15, 67–74 (2013).
Salis, H.M., Mirsky, E.A. & Voigt, C.A. Nat. Biotechnol. 27, 946–950 (2009).
Bonde, M.T. et al. Nucleic Acids Res. 42, W408–W415 (2014).
Farasat, I. et al. Mol. Syst. Biol. 10, 731 (2014).
Shaner, N.C. et al. Nat. Biotechnol. 22, 1567–1572 (2004).
Waldo, G.S., Standish, B.M., Berendzen, J. & Terwilliger, T.C. Nat. Biotechnol. 17, 691–695 (1999).
Söderström, B. et al. Mol. Microbiol. 92, 1–9 (2014).
Datsenko, K.A. & Wanner, B.L. Proc. Natl. Acad. Sci. USA 97, 6640–6645 (2000).
Cherepanov, P.P. & Wackernagel, W. Gene 158, 9–14 (1995).
Prasher, D.C., Eckenrode, V.K., Ward, W.W., Prendergast, F.G. & Cormier, M.J. Gene 111, 229–233 (1992).
Sharon, E. et al. Nat. Biotechnol. 30, 521–530 (2012).
Lorenz, R. et al. Algorithms Mol. Biol. 6, 26 (2011).
Griffith, K.L. & Wolf, R.E. Biochem. Biophys. Res. Commun. 290, 397–402 (2002).
Rappsilber, J., Mann, M. & Ishihama, Y. Nat. Protoc. 2, 1896–1906 (2007).
Bantscheff, M., Schirle, M., Sweetman, G., Rick, J. & Kuster, B. Anal. Bioanal. Chem. 389, 1017–1031 (2007).
Pfaffl, M.W. Nucleic Acids Res. 29, e45 (2001).
Herring, C.D. & Blattner, F.R. J. Bacteriol. 186, 6714–6720 (2004).
Acknowledgements
We thank H. Genee, A. Wallin, S. Cardinale and H. Wang for discussions and suggestions regarding this manuscript, and we thank A. Koza for assistance with DNA sequencing. The research leading to these results received funding from the Novo Nordisk Foundation through the Novo Nordisk Foundation Center for Biosustainability and the European Union Seventh Framework Programme (FP7-KBBE-2013-7-single-stage) under grant agreement 613745, Promys.
Author information
Contributions
M.T.B., M.P., M.S.K., S.I.J., T.W. and S.H. conducted the experiments. M.T.B., M.S.K. and M.J.H. conducted bioinformatics and data analysis. A.T.N. supervised the flow cytometry experiments. S.H. supervised the proteomics experiments. M.O.A.S., M.T.B., M.P. and M.S.K. designed the study. M.O.A.S. conceived and supervised the project. M.T.B., M.S.K. and M.O.A.S. wrote the manuscript, and all authors contributed to editing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 RBS Calculator values versus estimated expression.
Flow-seq-estimated RBS strength compared with RBS Calculator-estimated RBS strength. The highlighted data points (open red circles) are the 106 sequences measured as additional controls in Fig. 1e,f. The downloadable version 1 of the RBS Calculator was used to estimate RBS strength.
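The caption does not name a statistic for this comparison. Because the RBS Calculator reports strengths on its own arbitrary scale, a rank correlation is one scale-free way to summarize agreement. The Python sketch below computes Spearman's ρ, with tied values sharing their mean rank; it is an illustration under that assumption, not the analysis used for the figure, and the example values are made up.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical Flow-seq and RBS Calculator strengths for five sequences.
flowseq = [0.8, 2.1, 3.5, 1.2, 4.0]
calculator = [150.0, 900.0, 4000.0, 1200.0, 8000.0]
print(round(spearman(flowseq, calculator), 3))
```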
Supplementary Figure 2 Random Forest prediction of missing sequences: cross-validation.
Five-fold cross-validation of the Random Forest model used to predict the SD strength of sequences that were not identified with >50 reads in the Flow-seq dataset. The out-of-bag estimate is R² = 0.90, and the five-fold cross-validated R² is 0.89.
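The cross-validation procedure can be sketched in dependency-free Python. To stay short, the sketch swaps the Random Forest for a hypothetical k-nearest-neighbour regressor on Hamming distance between SD sequences, and the training data are synthetic; only the fold structure and the pooled out-of-fold R² mirror the caption. The out-of-bag estimate has no analogue here, because it relies on the bootstrap sampling inside a Random Forest.

```python
import random


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, seq: str, k: int = 3) -> float:
    """Stand-in regressor (hypothetical): mean value of the k training
    sequences closest to `seq` by Hamming distance."""
    nearest = sorted(train, key=lambda item: hamming(item[0], seq))[:k]
    return sum(value for _, value in nearest) / len(nearest)


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def kfold_r2(data, n_folds: int = 5, seed: int = 0) -> float:
    """Shuffle, split into n_folds, predict each held-out fold from the
    remaining folds, and return R^2 over all pooled out-of-fold predictions."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    y_true, y_pred = [], []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        for i in fold:
            seq, value = data[i]
            y_true.append(value)
            y_pred.append(knn_predict(train, seq))
    return r_squared(y_true, y_pred)


# Synthetic example: random 6-mers whose "strength" is their G count.
rng = random.Random(1)
seqs = ["".join(rng.choice("ACGT") for _ in range(6)) for _ in range(200)]
data = [(s, float(s.count("G"))) for s in seqs]
print(round(kfold_r2(data), 2))
```

With scikit-learn available, the stand-in could be replaced by a Random Forest regressor, which also exposes an out-of-bag score.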
Supplementary Figure 3 Change in secondary structure for optilib oligos.
(a) Change in secondary structure of the transcripts targeted by all 40,526 oligos designed for E. coli genes. The average change is 0.51 kcal/mol. * denotes previously observed changes in free energy that led to a significant change in expression levels (ref. 19). (b) Distribution of optilib oligos with 1–6 nucleotides changed from the wild-type sequence.
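Panel (b) tallies how many positions each designed oligo changes relative to the wild-type sequence. Below is a minimal sketch of that tally, assuming each designed sequence is aligned to its wild-type counterpart and has the same length; the example sequences are made up. The free-energy changes in panel (a) would additionally require an RNA folding package such as ViennaRNA, which is not shown.

```python
from collections import Counter


def mismatches(wild_type: str, designed: str) -> int:
    """Count positions where an aligned designed sequence differs from wild type."""
    if len(wild_type) != len(designed):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a != b for a, b in zip(wild_type, designed))


def mismatch_histogram(pairs) -> Counter:
    """Map number of changed nucleotides -> number of oligos."""
    return Counter(mismatches(wt, d) for wt, d in pairs)


# Hypothetical wild-type/designed pairs.
pairs = [("AGGAGG", "AGGAGA"), ("AGGAGG", "AGCAGA"), ("AGGAGG", "TCCTCC")]
print(sorted(mismatch_histogram(pairs).items()))  # [(1, 1), (2, 1), (6, 1)]
```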
Supplementary Figure 4 Distribution of predicted expression levels in the constrained optilib.
Predicted expression levels of the 40,526 oligos designed to change the expression of all E. coli genes. The oligos were designed under the constraint that they must not modify the coding sequence of overlapping genes. Even with this constraint, most designed oligos have a predicted expression level close to the intended target value.
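The design constraint can be illustrated as a filter followed by a nearest-to-target choice. In this Python sketch the predicted expression values, the candidate list and the set of locked positions (bases shared with an overlapping coding sequence) are all hypothetical inputs, and closeness is measured as the absolute log ratio to the target. It is not EMOPEC's actual design routine.

```python
import math


def respects_constraint(wild_type, candidate, locked_positions):
    """Allow a candidate only if every locked position (e.g. a base inside an
    overlapping gene's coding sequence) keeps its wild-type nucleotide."""
    return all(candidate[i] == wild_type[i] for i in locked_positions)


def pick_closest(wild_type, candidates, predicted, target, locked_positions):
    """Among allowed candidates, return the one whose predicted expression has
    the smallest absolute log ratio to the target (None if none are allowed)."""
    allowed = [c for c in candidates
               if respects_constraint(wild_type, c, locked_positions)]
    if not allowed:
        return None
    return min(allowed, key=lambda c: abs(math.log(predicted[c] / target)))


# Hypothetical inputs: the first two bases overlap an upstream coding sequence.
wild_type = "AGGAGG"
predicted = {"AGGAGA": 0.8, "AAGAGG": 0.45, "AGCAGG": 0.3, "TTTAGG": 0.05}
candidates = list(predicted)
print(pick_closest(wild_type, candidates, predicted, 0.4, {0, 1}))  # AGCAGG
```

With the first two bases locked, the best unconstrained candidate (AAGAGG) is excluded and AGCAGG is chosen instead, which shows how the constraint can pull a design slightly away from its target.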
Supplementary Figure 5 mCherry mRNA levels.
mRNA levels measured by real-time PCR. No notable difference was found between the mRNA levels of the different transcripts.
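The caption does not say how the real-time PCR data were normalized. The Pfaffl method (Pfaffl 2001, in the reference list) is one standard option. The sketch below implements its efficiency-corrected ratio, assuming amplification efficiencies E (2.0 means perfect doubling per cycle) and crossing points (CP) for a target and a reference gene in a control and a sample. Which reference gene and efficiencies were used is not stated here, so all values are hypothetical.

```python
def pfaffl_ratio(e_target, cp_target_control, cp_target_sample,
                 e_ref, cp_ref_control, cp_ref_sample):
    """Efficiency-corrected expression ratio of sample vs control (Pfaffl 2001):
    E_target**dCP_target / E_ref**dCP_ref, with dCP = CP(control) - CP(sample)."""
    d_target = cp_target_control - cp_target_sample
    d_ref = cp_ref_control - cp_ref_sample
    return e_target ** d_target / e_ref ** d_ref


# Hypothetical crossing points: the target crosses threshold one cycle earlier
# in the sample while the reference gene is unchanged, so at E = 2 the target
# is twofold up.
print(pfaffl_ratio(2.0, 25.0, 24.0, 2.0, 20.0, 20.0))  # 2.0
```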
Supplementary Figure 6 Predicted versus measured protein levels for EMOPEC and RBS Calculator.
Measured expression values of the mCherry, LacZ, Ppc, AspC, Can and AceA validation strains compared with the values predicted by EMOPEC and by the RBS Calculator (v2.0).
Supplementary Figure 7 Pooled measured expression values compared to EMOPEC and RBS Calculator predictions.
Measured expression levels for mCherry and the native E. coli genes LacZ, AceA, Can, Ppc and AspC, compared with EMOPEC predictions (left) and RBS Calculator predictions (right). Linear regression, p < 0.001 for both plots.
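The regression can be sketched as an ordinary least-squares fit. This version assumes the fit is done on log10-transformed values, which the caption does not state, and returns the slope, intercept and R². A p-value additionally needs the t distribution (for example via scipy.stats.linregress) and is omitted to keep the sketch dependency-free.

```python
import math


def linregress_log10(predicted, measured):
    """Least-squares fit of log10(measured) on log10(predicted).
    Returns (slope, intercept, r_squared)."""
    x = [math.log10(p) for p in predicted]
    y = [math.log10(m) for m in measured]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical values: measured is exactly twice predicted.
slope, intercept, r2 = linregress_log10([1, 10, 100], [2, 20, 200])
print(round(slope, 3), round(intercept, 3), round(r2, 3))  # 1.0 0.301 1.0
```

A slope of 1 on the log scale means predictions track measurements proportionally; the intercept then reflects a constant fold offset (here log10 2).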
Supplementary Figure 8 Signal distributions for each flow-sorted bin.
Flow cytometry signal distribution for each bin, with a Gaussian fit to the signal.
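A Gaussian fitted by maximum likelihood is simply the sample mean and the population standard deviation. The sketch below applies that to one bin's signals, assuming the fit is made on log10 fluorescence (flow cytometry signals are commonly viewed on a log axis). The input values are illustrative.

```python
import math
import statistics


def fit_gaussian_log10(signals):
    """Maximum-likelihood Gaussian on log10 signal: (mean, population SD).
    Non-positive values are dropped because they have no logarithm."""
    logs = [math.log10(s) for s in signals if s > 0]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def gaussian_pdf(x, mu, sigma):
    """Density of the fitted Gaussian, e.g. for overlaying on a histogram."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical signals from one bin.
mu, sigma = fit_gaussian_log10([10, 100, 1000, -5])
print(round(mu, 3), round(sigma, 3))  # 2.0 0.816
```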
Supplementary Figure 9 Read-count distribution.
Distribution of merged read counts across all bins for the SD sequences.
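Read counts like these are typically turned into expression estimates by averaging the bin intensities, weighted by where each sequence's reads fall. The sketch below is one common Flow-seq-style estimator, not necessarily the exact one used here. It optionally rescales counts by each bin's sequencing depth and by the fraction of cells sorted into that bin, and it reuses the >50-read cutoff from Supplementary Figure 2. Bin means and counts are hypothetical.

```python
def flowseq_estimate(bin_counts, bin_means, bin_read_totals=None,
                     bin_cell_fractions=None, min_reads=50):
    """Read-weighted mean of bin intensities for one sequence.

    bin_counts: reads of this sequence in each sorted bin.
    bin_means: mean (e.g. log10) fluorescence of each bin.
    If bin_read_totals and bin_cell_fractions are given, counts are rescaled to
    (reads / total reads in bin) * (fraction of cells sorted into bin).
    Returns None when the sequence has min_reads reads or fewer in total.
    """
    if sum(bin_counts) <= min_reads:
        return None
    weights = list(bin_counts)
    if bin_read_totals is not None and bin_cell_fractions is not None:
        weights = [c / r * f for c, r, f
                   in zip(bin_counts, bin_read_totals, bin_cell_fractions)]
    return sum(w * m for w, m in zip(weights, bin_means)) / sum(weights)


# Hypothetical four-bin sort with bin means on a log10 scale.
means = [1.0, 2.0, 3.0, 4.0]
print(flowseq_estimate([0, 30, 30, 0], means))  # 2.5
print(flowseq_estimate([10, 10, 10, 10], means))  # None (only 40 reads)
```

Rescaling matters when bins are sequenced to different depths: without it, a deeply sequenced bin would pull every estimate toward its own intensity.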
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–9 (PDF 1113 kb)
Supplementary Table 1
All 3,864 estimated GFP expression values. (XLSX 162 kb)
Supplementary Table 2
106 individually verified single colonies from the main library. (XLSX 14 kb)
Supplementary Table 3
EMOPEC-designed oligos for ten expression levels of every annotated gene in E. coli K-12 MG1655. (XLSX 8970 kb)
Supplementary Table 4
Strains, plasmids and oligonucleotides used in this study. (XLSX 15 kb)
Supplementary Table 5
Barcoded primers used for sequencing the flow-sorted bins. (XLSX 9 kb)
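Demultiplexing sequencing reads by bin barcode can be sketched as an exact prefix match. The read layout assumed here (barcode, then a fixed spacer, then the SD sequence) and the barcodes themselves are hypothetical; real data would also need error-tolerant matching and a prefix-free barcode set.

```python
from collections import Counter


def count_by_bin(reads, barcodes, sd_offset, sd_len):
    """Count (bin, SD sequence) pairs from reads that start with a bin barcode.

    barcodes: {bin_name: barcode}; assumed prefix-free so at most one matches.
    sd_offset: bases between the end of the barcode and the SD sequence.
    Returns (counts, number_of_unassigned_reads).
    """
    counts = Counter()
    unassigned = 0
    for read in reads:
        for bin_name, barcode in barcodes.items():
            if read.startswith(barcode):
                start = len(barcode) + sd_offset
                counts[(bin_name, read[start:start + sd_len])] += 1
                break
        else:  # no barcode matched this read
            unassigned += 1
    return counts, unassigned


# Hypothetical barcodes and reads: barcode + 2 spacer bases + 6-nt SD + ATG.
barcodes = {"bin1": "ACGT", "bin2": "TGCA"}
reads = ["ACGTCCAGGAGGATG", "TGCACCAAGAGGATG", "GGGGCCAGGAGGATG"]
counts, unassigned = count_by_bin(reads, barcodes, sd_offset=2, sd_len=6)
print(dict(counts), unassigned)
```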
Supplementary Table 6
Sequence counts from the flow-sorted bins. (XLSX 465 kb)
Supplementary Software
EMOPEC Python library. Source code for the EMOPEC algorithm as a Python library including the web server. (ZIP 1131 kb)
About this article
Cite this article
Bonde, M., Pedersen, M., Klausen, M. et al. Predictable tuning of protein expression in bacteria. Nat Methods 13, 233–236 (2016). https://doi.org/10.1038/nmeth.3727