Predictable tuning of protein expression in bacteria


We comprehensively assessed the contribution of the Shine-Dalgarno sequence to protein expression and used the data to develop EMOPEC (Empirical Model and Oligos for Protein Expression Changes). EMOPEC is a free tool that makes it possible to modulate the expression level of any Escherichia coli gene by changing only a few bases. Measured protein levels for 91% of our designed sequences were within twofold of the desired target level.
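
The twofold criterion can be made concrete with a small calculation. The following is a minimal sketch, not the authors' analysis code: the function name `fraction_within_fold` and the example values are hypothetical and only illustrate how such a hit rate could be computed.

```python
def fraction_within_fold(measured, target, fold=2.0):
    """Return the fraction of measured values within `fold` of their targets.

    A measurement counts as a hit when target / fold <= measured <= target * fold.
    """
    if len(measured) != len(target):
        raise ValueError("measured and target must have the same length")
    if not measured:
        raise ValueError("at least one measurement is required")
    hits = sum(
        1
        for m, t in zip(measured, target)
        if t > 0 and t / fold <= m <= t * fold
    )
    return hits / len(measured)


# Hypothetical expression levels in arbitrary units (not data from the paper).
targets = [10.0, 20.0, 40.0, 80.0]
measured = [12.0, 9.0, 45.0, 150.0]
print(fraction_within_fold(measured, targets))  # 3 of 4 are within twofold -> 0.75
```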

Figure 1: Characterization of the E. coli SD sequence.
Figure 2: Experimental validation of the EMOPEC algorithm.


  1. Mutalik, V.K. et al. Nat. Methods 10, 347–353 (2013).
  2. Mutalik, V.K. et al. Nat. Methods 10, 354–360 (2013).
  3. Kosuri, S. et al. Proc. Natl. Acad. Sci. USA 110, 14024–14029 (2013).
  4. Goodman, D.B., Church, G.M. & Kosuri, S. Science 342, 475–479 (2013).
  5. Lee, J.W. et al. Nat. Chem. Biol. 8, 536–546 (2012).
  6. Woolston, B.M., Edgar, S. & Stephanopoulos, G. Annu. Rev. Chem. Biomol. Eng. 4, 259–288 (2013).
  7. Wang, H.H. et al. Nature 460, 894–898 (2009).
  8. Bonde, M.T. et al. ACS Synth. Biol. 4, 17–22 (2015).
  9. Sommer, M.O., Church, G.M. & Dantas, G. Mol. Syst. Biol. 6, 360 (2010).
  10. Klumpp, S., Zhang, Z. & Hwa, T. Cell 139, 1366–1375 (2009).
  11. Gold, L. Annu. Rev. Biochem. 57, 199–233 (1988).
  12. Shine, J. & Dalgarno, L. Proc. Natl. Acad. Sci. USA 71, 1342–1346 (1974).
  13. Schurr, T., Nadir, E. & Margalit, H. Nucleic Acids Res. 21, 4019–4023 (1993).
  14. Shultzaberger, R.K., Bucheimer, R.E., Rudd, K.E. & Schneider, T.D. J. Mol. Biol. 313, 215–228 (2001).
  15. Salis, H.M. Methods Enzymol. 498, 19–42 (2011).
  16. Reeve, B., Hargest, T., Gilbert, C. & Ellis, T. Front. Bioeng. Biotechnol. 2, 1–6 (2014).
  17. Seo, S.W. et al. Metab. Eng. 15, 67–74 (2013).
  18. Salis, H.M., Mirsky, E.A. & Voigt, C.A. Nat. Biotechnol. 27, 946–950 (2009).
  19. Bonde, M.T. et al. Nucleic Acids Res. 42, W408–W415 (2014).
  20. Farasat, I. et al. Mol. Syst. Biol. 10, 731 (2014).
  21. Shaner, N.C. et al. Nat. Biotechnol. 22, 1567–1572 (2004).
  22. Waldo, G.S., Standish, B.M., Berendzen, J. & Terwilliger, T.C. Nat. Biotechnol. 17, 691–695 (1999).
  23. Söderström, B. et al. Mol. Microbiol. 92, 1–9 (2014).
  24. Datsenko, K.A. & Wanner, B.L. Proc. Natl. Acad. Sci. USA 97, 6640–6645 (2000).
  25. Cherepanov, P.P. & Wackernagel, W. Gene 158, 9–14 (1995).
  26. Prasher, D.C., Eckenrode, V.K., Ward, W.W., Prendergast, F.G. & Cormier, M.J. Gene 111, 229–233 (1992).
  27. Sharon, E. et al. Nat. Biotechnol. 30, 521–530 (2012).
  28. Lorenz, R. et al. Algorithms Mol. Biol. 6, 26 (2011).
  29. Griffith, K.L. & Wolf, R.E. Biochem. Biophys. Res. Commun. 290, 397–402 (2002).
  30. Rappsilber, J., Mann, M. & Ishihama, Y. Nat. Protoc. 2, 1896–1906 (2007).
  31. Bantscheff, M., Schirle, M., Sweetman, G., Rick, J. & Kuster, B. Anal. Bioanal. Chem. 389, 1017–1031 (2007).
  32. Pfaffl, M.W. Nucleic Acids Res. 29, e45 (2001).
  33. Herring, C.D. & Blattner, F.R. J. Bacteriol. 186, 6714–6720 (2004).


We thank H. Genee, A. Wallin, S. Cardinale and H. Wang for discussions and suggestions regarding this manuscript, and we thank A. Koza for assistance with DNA sequencing. The research leading to these results received funding from the Novo Nordisk Foundation through the Novo Nordisk Foundation Center for Biosustainability and the European Union Seventh Framework Programme (FP7-KBBE-2013-7-single-stage) under grant agreement 613745, Promys.

Author information


M.T.B., M.P., M.S.K., S.I.J., T.W. and S.H. conducted the experiments. M.T.B., M.S.K. and M.J.H. conducted bioinformatics and data analysis. A.T.N. supervised the flow cytometry experiments. S.H. supervised the proteomics experiments. M.O.A.S., M.T.B., M.P. and M.S.K. designed the study. M.O.A.S. conceived and supervised the project. M.T.B., M.S.K. and M.O.A.S. wrote the manuscript, and all authors contributed to editing of the manuscript.

Corresponding author

Correspondence to Morten O A Sommer.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 RBS Calculator values versus estimated expression.

RBS strength estimated by Flow-seq compared with RBS strength estimated by the RBS Calculator. The highlighted data points (open red circles) are the 106 sequences measured as additional controls in Fig. 1e,f. The downloadable version 1 of the RBS Calculator was used to estimate RBS strength.

Supplementary Figure 2 Random Forest prediction of missing sequences: cross-validation.

Five-fold cross-validation of the Random Forest model for predicting the SD strength of sequences not identified with >50 reads in the Flow-seq dataset. The out-of-bag estimate is R² = 0.90; the five-fold cross-validation estimate is R² = 0.89.
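
For readers unfamiliar with the procedure, k-fold cross-validation and the R² score can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the EMOPEC implementation: the helper names are hypothetical, the stand-in regressor is a nearest-neighbour lookup rather than a Random Forest, and R² is pooled over all held-out predictions (the legend does not say whether the reported value was pooled or averaged per fold).

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_r2(X, y, fit_predict, k=5, seed=0):
    """Predict every sample from a model trained on the other k-1 folds.

    `fit_predict(train_X, train_y, test_X)` is a hypothetical callback that
    trains any regressor and returns predictions for test_X.
    """
    preds = [None] * len(y)
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        fold_preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
        )
        for i, p in zip(fold, fold_preds):
            preds[i] = p
    return r_squared(y, preds)


def nearest_neighbour(train_X, train_y, test_X):
    """Stand-in regressor (NOT a Random Forest): copy the value of the
    training sequence with the smallest Hamming distance."""

    def hamming(a, b):
        return sum(c1 != c2 for c1, c2 in zip(a, b))

    return [
        train_y[min(range(len(train_X)), key=lambda j: hamming(train_X[j], x))]
        for x in test_X
    ]


# Toy sequences with a made-up "strength" (number of G's); not Flow-seq data.
seqs = ["AGGAGG", "AGGAGA", "AAGAGG", "TTTTTT", "AGGCGG",
        "CCCCCC", "AGGAGT", "ACGAGG", "GGGAGG", "ATGAGG"]
strength = [float(s.count("G")) for s in seqs]
print(cross_validated_r2(seqs, strength, nearest_neighbour, k=5))
```

Any regressor can be plugged in through the `fit_predict` callback, so the same loop would apply to a Random Forest from a machine-learning library.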

Supplementary Figure 3 Change in secondary structure for optilib oligos.

(a) Change in secondary structure for the transcripts of all 40,526 oligos targeting E. coli genes. The average change is 0.51 kcal/mol. * denotes previously observed changes in free energy that led to a significant change in expression levels (ref. 19). (b) Distribution of oligos with 1–6 nucleotides changed from the wild type in optilib.
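
The distribution in panel (b) amounts to counting, for each oligo, how many positions differ from the wild-type sequence. A minimal sketch of that tally follows; the function names and the sequence pairs are hypothetical and are not taken from optilib.

```python
from collections import Counter


def count_changes(wild_type, variant):
    """Number of positions at which two equal-length sequences differ."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(wild_type, variant))


def change_distribution(pairs):
    """Map number of changed nucleotides -> number of oligos."""
    return Counter(count_changes(wt, var) for wt, var in pairs)


# Hypothetical (wild-type SD, designed SD) pairs for illustration only.
pairs = [
    ("AGGAGG", "AGGAGA"),
    ("AGGAGG", "ACGAGA"),
    ("TAAGGA", "AGGAGG"),
    ("AGGAGG", "TGGAGG"),
]
print(sorted(change_distribution(pairs).items()))  # [(1, 2), (2, 1), (5, 1)]
```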

Supplementary Figure 4 Distribution of predicted expression levels in the constrained optilib.

The 40,526 oligos designed to change expression of all E. coli genes. The oligos were designed with the constraint to not modify the coding sequence of overlapping genes. Even with the constraints, it was possible to design sequences with predicted expression close to the intended target value for most sequences.

Supplementary Figure 5 mCherry mRNA levels.

mRNA levels measured by real-time PCR. No notable difference was found among the mRNA levels of the different transcripts.

Supplementary Figure 6 Predicted versus measured protein levels for EMOPEC and RBS Calculator.

Measured expression values of the mCherry, LacZ, Ppc, AspC, Can, and AceA validation strains compared with the EMOPEC predicted values and RBS Calculator (V2.0) predicted values.

Supplementary Figure 7 Pooled measured expression values compared to EMOPEC and RBS Calculator predictions.

Measured expression levels for mCherry and the native E. coli genes LacZ, AceA, Can, Ppc and AspC compared with predictions from EMOPEC (left) and the RBS Calculator (right). Linear regression, p < 0.001 for both plots.

Supplementary Figure 8 Signal distributions for each flow-sorted bin.

Flow cytometry signal distributions for each bin and Gaussian fit to the signal.
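
A Gaussian can be fitted to the signal of one sorted bin by estimating its mean and standard deviation from the per-cell values. The sketch below uses Python's standard-library `statistics.NormalDist.from_samples`. It is an illustration only: the legend does not state how the fit was performed (it may have been a curve fit to a histogram), and fitting on log10-transformed signal is an assumption of this sketch.

```python
import math
from statistics import NormalDist


def fit_gaussian_to_bin(signals, log_transform=True):
    """Fit a normal distribution to one bin's fluorescence signals.

    With log_transform=True the fit is done on log10(signal), which is an
    assumption of this sketch (fluorescence is often close to log-normal).
    Returns a NormalDist built from the sample mean and sample stdev.
    """
    values = [math.log10(s) for s in signals] if log_transform else list(signals)
    return NormalDist.from_samples(values)


# Hypothetical per-cell signals for a single sorted bin (arbitrary units).
bin_signals = [100.0, 1000.0, 10000.0]
fit = fit_gaussian_to_bin(bin_signals)
print(fit.mean, fit.stdev)
```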

Supplementary Figure 9 Read-count distribution.

Distribution of merged read counts across all bins for the SD sequences.
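
Read counts across sorted bins are commonly turned into an expression estimate by taking a read-weighted average of the bin signal levels. The sketch below shows that general idea under stated assumptions; it is not the authors' Flow-seq pipeline. The function name, the optional per-bin reweighting and the example numbers are hypothetical, and the only value taken from the legends is the >50 merged-read threshold.

```python
def estimate_expression(read_counts, bin_means, bin_fractions=None, min_reads=50):
    """Estimate one sequence's expression from its reads in each sorted bin.

    read_counts   -- reads for this sequence in each bin
    bin_means     -- mean (log) signal of each bin, e.g. from a Gaussian fit
    bin_fractions -- optional fraction of sorted cells per bin, used to
                     reweight reads; equal weights are assumed if omitted
    Returns None when the merged read count is not above `min_reads`
    or when all weights are zero.
    """
    if len(read_counts) != len(bin_means):
        raise ValueError("read_counts and bin_means must have the same length")
    if sum(read_counts) <= min_reads:
        return None
    if bin_fractions is None:
        bin_fractions = [1.0] * len(read_counts)
    weights = [c * f for c, f in zip(read_counts, bin_fractions)]
    total_weight = sum(weights)
    if total_weight == 0:
        return None
    return sum(w * m for w, m in zip(weights, bin_means)) / total_weight


# Hypothetical bin means (log10 signal) and read counts for one SD sequence.
bin_means = [1.0, 2.0, 3.0, 4.0]
print(estimate_expression([0, 20, 60, 20], bin_means))  # (40 + 180 + 80) / 100
print(estimate_expression([1, 2, 3, 4], bin_means))     # too few reads -> None
```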

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9 (PDF 1113 kb)

Supplementary Table 1

All 3,864 estimated GFP expression values. (XLSX 162 kb)

Supplementary Table 2

106 individually verified single colonies from the main library. (XLSX 14 kb)

Supplementary Table 3

EMOPEC designed oligos, ten expression levels for every annotated gene in E. coli K12 MG1655. (XLSX 8970 kb)

Supplementary Table 4

Strains, plasmids and nucleotide oligos used in this study. (XLSX 15 kb)

Supplementary Table 5

Barcoded primers used for sequencing the flow-sorted bins. (XLSX 9 kb)

Supplementary Table 6

Sequence counts from the flow-sorted bins. (XLSX 465 kb)

Supplementary Software

EMOPEC Python library. Source code for the EMOPEC algorithm as a Python library including the web server. (ZIP 1131 kb)


Cite this article

Bonde, M., Pedersen, M., Klausen, M. et al. Predictable tuning of protein expression in bacteria. Nat Methods 13, 233–236 (2016).
