Predictable tuning of protein expression in bacteria

Bonde, Mads T; Pedersen, Margit; Klausen, Michael S; Jensen, Sheila I; Wulff, Tune; Harrison, Scott; Nielsen, Alex T; Herrgård, Markus J; Sommer, Morten O A

doi:10.1038/nmeth.3727

Brief Communication
Published: 11 January 2016

Nature Methods volume 13, pages 233–236 (2016)

Abstract

We comprehensively assessed the contribution of the Shine-Dalgarno sequence to protein expression and used the data to develop EMOPEC (Empirical Model and Oligos for Protein Expression Changes; http://emopec.biosustain.dtu.dk). EMOPEC is a free tool that makes it possible to modulate the expression level of any Escherichia coli gene by changing only a few bases. Measured protein levels for 91% of our designed sequences were within twofold of the desired target level.
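The "within twofold" criterion in the abstract can be made concrete with a short sketch. The function names and the example values below are hypothetical and are not taken from the paper; the sketch only assumes that "within twofold" means the ratio of measured to target expression lies between 1/2 and 2.

```python
def within_fold(measured, target, fold=2.0):
    """Return True if measured is within `fold`-fold of target (both > 0)."""
    if measured <= 0 or target <= 0:
        raise ValueError("expression levels must be positive")
    ratio = measured / target
    return 1.0 / fold <= ratio <= fold


def fraction_within_fold(pairs, fold=2.0):
    """Fraction of (measured, target) pairs that fall inside the fold window."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs given")
    hits = sum(within_fold(m, t, fold) for m, t in pairs)
    return hits / len(pairs)


# Hypothetical values for illustration only, not data from the paper.
example_pairs = [(1.0, 1.0), (1.9, 1.0), (0.6, 1.0), (2.5, 1.0)]
example_fraction = fraction_within_fold(example_pairs)
```

Three of the four hypothetical pairs fall inside the window, so `example_fraction` is 0.75. A figure such as the reported 91% would be obtained the same way from the measured and target levels of the designed sequences.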

**Figure 1: Characterization of the *E. coli* SD sequence.**

**Figure 2: Experimental validation of the EMOPEC algorithm.**

References

Mutalik, V.K. et al. Nat. Methods 10, 347–353 (2013).
Mutalik, V.K. et al. Nat. Methods 10, 354–360 (2013).
Kosuri, S. et al. Proc. Natl. Acad. Sci. USA 110, 14024–14029 (2013).
Goodman, D.B., Church, G.M. & Kosuri, S. Science 342, 475–479 (2013).
Lee, J.W. et al. Nat. Chem. Biol. 8, 536–546 (2012).
Woolston, B.M., Edgar, S. & Stephanopoulos, G. Annu. Rev. Chem. Biomol. Eng. 4, 259–288 (2013).
Wang, H.H. et al. Nature 460, 894–898 (2009).
Bonde, M.T. et al. ACS Synth. Biol. 4, 17–22 (2015).
Sommer, M.O., Church, G.M. & Dantas, G. Mol. Syst. Biol. 6, 360 (2010).
Klumpp, S., Zhang, Z. & Hwa, T. Cell 139, 1366–1375 (2009).
Gold, L. Annu. Rev. Biochem. 57, 199–233 (1988).
Shine, J. & Dalgarno, L. Proc. Natl. Acad. Sci. USA 71, 1342–1346 (1974).
Schurr, T., Nadir, E. & Margalit, H. Nucleic Acids Res. 21, 4019–4023 (1993).
Shultzaberger, R.K., Bucheimer, R.E., Rudd, K.E. & Schneider, T.D. J. Mol. Biol. 313, 215–228 (2001).
Salis, H.M. Methods Enzymol. 498, 19–42 (2011).
Reeve, B., Hargest, T., Gilbert, C. & Ellis, T. Front. Bioeng. Biotechnol. 2, 1–6 (2014).
Seo, S.W. et al. Metab. Eng. 15, 67–74 (2013).
Salis, H.M., Mirsky, E.A. & Voigt, C.A. Nat. Biotechnol. 27, 946–950 (2009).
Bonde, M.T. et al. Nucleic Acids Res. 42, W408–W415 (2014).
Farasat, I. et al. Mol. Syst. Biol. 10, 731 (2014).
Shaner, N.C. et al. Nat. Biotechnol. 22, 1567–1572 (2004).
Waldo, G.S., Standish, B.M., Berendzen, J. & Terwilliger, T.C. Nat. Biotechnol. 17, 691–695 (1999).
Söderström, B. et al. Mol. Microbiol. 92, 1–9 (2014).
Datsenko, K.A. & Wanner, B.L. Proc. Natl. Acad. Sci. USA 97, 6640–6645 (2000).
Cherepanov, P.P. & Wackernagel, W. Gene 158, 9–14 (1995).
Prasher, D.C., Eckenrode, V.K., Ward, W.W., Prendergast, F.G. & Cormier, M.J. Gene 111, 229–233 (1992).
Sharon, E. et al. Nat. Biotechnol. 30, 521–530 (2012).
Lorenz, R. et al. Algorithms Mol. Biol. 6, 26 (2011).
Griffith, K.L. & Wolf, R.E. Biochem. Biophys. Res. Commun. 290, 397–402 (2002).
Rappsilber, J., Mann, M. & Ishihama, Y. Nat. Protoc. 2, 1896–1906 (2007).
Bantscheff, M., Schirle, M., Sweetman, G., Rick, J. & Kuster, B. Anal. Bioanal. Chem. 389, 1017–1031 (2007).
Pfaffl, M.W. Nucleic Acids Res. 29, e45 (2001).
Herring, C.D. & Blattner, F.R. J. Bacteriol. 186, 6714–6720 (2004).

Acknowledgements

We thank H. Genee, A. Wallin, S. Cardinale and H. Wang for discussions and suggestions regarding this manuscript, and we thank A. Koza for assistance with DNA sequencing. The research leading to these results received funding from the Novo Nordisk Foundation through the Novo Nordisk Foundation Center for Biosustainability and the European Union Seventh Framework Programme (FP7-KBBE-2013-7-single-stage) under grant agreement 613745, Promys.

Author information

Mads T Bonde, Margit Pedersen and Michael S Klausen: These authors contributed equally to this work.

Authors and Affiliations

Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Hørsholm, Denmark
Mads T Bonde, Margit Pedersen, Michael S Klausen, Sheila I Jensen, Tune Wulff, Scott Harrison, Alex T Nielsen, Markus J Herrgård & Morten O A Sommer

Contributions

M.T.B., M.P., M.S.K., S.I.J., T.W. and S.H. conducted the experiments. M.T.B., M.S.K. and M.J.H. conducted bioinformatics and data analysis. A.T.N. supervised the flow cytometry experiments. S.H. supervised the proteomics experiments. M.O.A.S., M.T.B., M.P. and M.S.K. designed the study. M.O.A.S. conceived and supervised the project. M.T.B., M.S.K. and M.O.A.S. wrote the manuscript, and all authors contributed to editing of the manuscript.

Corresponding author

Correspondence to Morten O A Sommer.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 RBS Calculator values versus estimated expression.

RBS strength estimated by Flow-seq compared with RBS strength estimated by the RBS Calculator. The highlighted data points (open red circles) are the 106 sequences measured as additional controls in Fig. 1e,f. The downloadable version 1 of the RBS Calculator was used to estimate RBS strength.

Supplementary Figure 2 Random Forest prediction of missing sequences: cross-validation.

Five-fold cross-validation of the Random Forest model for predicting the SD strength of sequences not identified with >50 reads in the Flow-seq data set. The out-of-bag estimate is R² = 0.90; the five-fold cross-validation estimate is R² = 0.89.
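As a rough illustration of how a k-fold cross-validated R² is computed, the sketch below uses only the Python standard library. It is not the authors' pipeline: the source reports a Random Forest, whereas this sketch swaps in an ordinary least-squares line as a stand-in model, and it assumes that held-out predictions from all folds are pooled before a single R² is computed (the source does not say whether R² was pooled or averaged across folds). All names and the toy data are hypothetical.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Stand-in model (NOT a Random Forest): ordinary least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def cross_validated_r2(xs, ys, k=5, seed=0):
    """Pool held-out predictions from all k folds, then compute one R^2."""
    folds = k_fold_indices(len(xs), k, seed)
    preds = [None] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])  # each point is predicted exactly once
    return r_squared(ys, preds)


# Hypothetical toy data: a noisy linear relationship, not data from the paper.
rng = random.Random(1)
toy_x = [float(i) for i in range(50)]
toy_y = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in toy_x]
toy_r2 = cross_validated_r2(toy_x, toy_y, k=5)
```

Every point is predicted by a model that never saw it during fitting, which is what distinguishes the cross-validated R² from a fit-quality R² on the training data. An out-of-bag estimate for a Random Forest follows the same idea, but uses the bootstrap samples left out of each tree instead of explicit folds.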

Supplementary Figure 3 Change in secondary structure for optilib oligos.

(a) Change in secondary structure of the transcripts for all 40,526 oligos targeting E. coli genes. The average change is 0.51 kcal/mol. * denotes previously observed changes in free energy that led to a significant change in expression levels¹⁹. (b) Distribution of oligos with 1–6 nucleotides changed from the wild type in optilib.

Supplementary Figure 4 Distribution of predicted expression levels in the constrained optilib.

The 40,526 oligos designed to change expression of all E. coli genes. The oligos were designed with the constraint to not modify the coding sequence of overlapping genes. Even with the constraints, it was possible to design sequences with predicted expression close to the intended target value for most sequences.

Supplementary Figure 5 mCherry mRNA levels.

mRNA levels measured by real-time PCR. No notable difference was found between the mRNA levels of the different transcripts.

Supplementary Figure 6 Predicted versus measured protein levels for EMOPEC and RBS Calculator.

Measured expression values of the mCherry, LacZ, Ppc, AspC, Can, and AceA validation strains compared with the EMOPEC predicted values and RBS Calculator (V2.0) predicted values.

Supplementary Figure 7 Pooled measured expression values compared to EMOPEC and RBS Calculator predictions.

Measured expression levels for mCherry and the native E. coli genes LacZ, AceA, Can, Ppc and AspC compared with predictions from EMOPEC (left) and the RBS Calculator (right). Linear regression, p < 0.001 for both plots.

Supplementary Figure 8 Signal distributions for each flow-sorted bin.

Flow cytometry signal distributions for each bin and Gaussian fit to the signal.
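A per-bin Gaussian fit of this kind can be sketched with the standard-library `statistics.NormalDist` class. The source does not state how the fit was performed; the sketch below assumes a simple moment-based fit (sample mean and sample standard deviation) and uses hypothetical bin names and signal values.

```python
from statistics import NormalDist


def fit_bin_gaussians(signals_by_bin):
    """Fit one Gaussian (mean, standard deviation) to each bin's signal values."""
    fits = {}
    for bin_name, signals in signals_by_bin.items():
        if len(signals) < 2:
            raise ValueError(f"bin {bin_name!r} needs at least two values")
        # from_samples estimates mu and sigma from the sample mean and stdev.
        fits[bin_name] = NormalDist.from_samples(signals)
    return fits


# Hypothetical log10 fluorescence values, not data from the paper.
example_bins = {
    "bin1": [1.8, 2.0, 2.2],
    "bin2": [2.9, 3.0, 3.1],
}
example_fits = fit_bin_gaussians(example_bins)
```

Each fitted `NormalDist` exposes `mean` and `stdev`, so the centre and spread of every sorted bin can be compared directly, for example to check how much neighbouring bins overlap.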

Supplementary Figure 9 Read-count distribution.

Distribution of merged read counts across all bins for the SD sequences.

Source data

Source data to Fig. 1

Source data to Fig. 2

About this article

Cite this article

Bonde, M., Pedersen, M., Klausen, M. et al. Predictable tuning of protein expression in bacteria. Nat Methods 13, 233–236 (2016). https://doi.org/10.1038/nmeth.3727

Received: 13 May 2015
Accepted: 13 November 2015
Published: 11 January 2016
Issue Date: March 2016
DOI: https://doi.org/10.1038/nmeth.3727
