Resource | Published:

Evaluation of 244,000 synthetic sequences reveals design principles to optimize translation in Escherichia coli

Nature Biotechnology volume 36, pages 1005–1015 (2018)

This article has been updated


Comparative analyses of natural and mutated sequences have been used to probe mechanisms of gene expression, but small sample sizes may produce biased outcomes. We applied an unbiased design-of-experiments approach to disentangle factors suspected to affect translation efficiency in E. coli. We precisely designed 244,000 DNA sequences implementing 56 replicates of a full factorial design to evaluate nucleotide, secondary structure, codon and amino acid properties in combination. For each sequence, we measured reporter transcript abundance and decay, polysome profiles, protein production and growth rates. Associations between designed sequence properties and these consequent phenotypes were dominated by secondary structures and their interactions within transcripts. We confirmed that transcript structure generally limits translation initiation and demonstrated its physiological cost using an epigenetic assay. Codon composition has a sizable impact on translatability, but only in comparatively rare elongation-limited transcripts. We propose a set of design principles to improve translation efficiency that would benefit from more accurate prediction of secondary structures in vivo.
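
As an illustration of the design-of-experiments idea, a full factorial design crosses every level of every factor, and replicates repeat that grid. The minimal Python sketch below enumerates such a grid. The factor names, level labels and replicate count are hypothetical placeholders chosen for illustration; they are not the properties, thresholds or replicate structure used in the study.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only; these are not
# the properties or thresholds used in the study.
FACTORS = {
    "gc_content": ["low", "mid", "high"],
    "structure_strength": ["weak", "strong"],
    "codon_adaptation": ["low", "high"],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def replicated_design(factors, n_replicates):
    """Repeat the full factorial grid, tagging each cell with a replicate index."""
    return [
        {"replicate": rep, **cell}
        for rep in range(n_replicates)
        for cell in full_factorial(factors)
    ]


design = replicated_design(FACTORS, n_replicates=2)
print(len(design))  # number of level combinations times number of replicates
```

Because every combination of levels appears equally often, the contribution of each factor and of each interaction can in principle be estimated without confounding from the others.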


Change history

  • 12 November 2018

    In the supplementary information originally posted for this article, the outer file extension for Supplementary Data 1, 2, 4–6, 9, 15, 22, 25 and 28 should have been zip instead of csv. Supplementary Data 16–21, 23, 24, 26, 27, 29–32, 34 and 35 should have had inner and outer file extensions of instead of just zip. In addition, the wrong version of Supplementary Code 28 was posted. These files have been reposted.


Primary accessions

Sequence Read Archive

Referenced accessions

NCBI Reference Sequence




We thank V. Mutalik, C. Liu, L. Jacob, M. Price, A. Deutschbauer, M. Samoilov, P. Shah, J. Plotkin, J. Savitskaya and L. Ciandrini for discussions. We are grateful to the Agilent Laboratories and the Synthetic Biology Institute (SBI) for providing the OLS array. We thank J. Sampson, P. Anderson and S. Laderman from Agilent Laboratories for discussing OLS setup and processing. G.C. was funded by the Human Frontier Science Program (LT000873/2011-l), J.C.G. by the Portuguese Fundação para a Ciência e Tecnologia (SFRH/BD/47819/2008). We acknowledge financial support by the Synthetic Biology Engineering Research Center (SynBERC under National Science Foundation grant 04-570/0540879). This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley (NIH S10 Instrumentation Grants S10RR029668 and S10RR027303).

Author information


  1. California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, California, USA.

    • Guillaume Cambray
    • Joao C Guimaraes
  2. DGIMI, Univ. Montpellier, INRA, Montpellier, France.

    • Guillaume Cambray
  3. Department of Bioengineering, University of California, Berkeley, Berkeley, California, USA.

    • Joao C Guimaraes
    • Adam Paul Arkin
  4. Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.

    • Adam Paul Arkin



G.C. and A.P.A. conceived the work; G.C. and J.C.G. designed sequences; G.C. performed experiments and processed data; G.C. and A.P.A. analyzed the data and J.C.G. contributed post hoc secondary structure analyses; G.C. and A.P.A. wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Guillaume Cambray or Adam Paul Arkin.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–14

  2. Life Sciences Reporting Summary

  3. Supplementary Tables

    Supplementary Tables 1–3

  4. Supplementary Notes

    Supplementary Note 1

Text files

  1. Supplementary Code 1

    Parameter file. Used to parameterize the Python scripts that process sequencing data (Supplementary Code 2–5).

  2. Supplementary Code 2

    De-multiplex fastq. Python script to identify and trim custom sequencing barcodes. Supports parallelization. Outputs a separate fastq file for each barcode.

  3. Supplementary Code 3

    Python wrapper for BWA and samtools. Maps the reads and runs quality checks by calling BWA and samtools. Supports parallelization.

  4. Supplementary Code 4

    Read counter. Python script to summarize the number of reads mapping to each target sequence from the bam files generated by Supplementary Code 3. Supports parallelization.

  5. Supplementary Code 5

    Count aggregator. Python script to aggregate count tables generated by Supplementary Code 4.

  6. Supplementary Code 6

    Processing of protein production under regular and facilitated initiation from FACS-seq data. R script to normalize, rescale and aggregate read count data from multiple FACS-seq replicate experiments. Converts the digital read distribution into a continuous linear measure of protein production ranging between 1 and 100 (PNI and PFI).

  7. Supplementary Code 7

    Computation of ANOVA's sum of squares for PNI. R script to run an ANOVA on PNI data and extract the sum of squares accounted for by design properties and their first-order interactions.

  8. Supplementary Code 8

    Computation of sum of squares for multiple linear regression of PNI on design properties. R script to run a multiple linear regression on PNI data and extract the ANOVA-like sum of squares accounted for by design properties and their first-order interactions.

  9. Supplementary Code 9

    Regression tree analysis for PNI. R script to run a CART analysis on PNI.

  10. Supplementary Code 10

    Computation of ANOVA's sum of squares for PFI. R script to run an ANOVA on PFI data and extract the sum of squares accounted for by design properties and their first-order interactions.

  11. Supplementary Code 11

    Computation of sum of squares for multiple linear regression of PFI on design properties. R script to run a multiple linear regression on PFI data and extract the ANOVA-like sum of squares accounted for by design properties and their first-order interactions.

  12. Supplementary Code 12

    Regression tree analysis for PFI. R script to run a CART analysis on PFI.

  13. Supplementary Code 13

    Effect of structure strength predicted across sliding windows of different sizes. R script to run linear regressions of PNI and PFI against minimal free energies computed over sliding windows of different lengths. Reports the ANOVA-like sum of squares.

  14. Supplementary Code 14

    Multiple linear regression of PNI and PFI on predicted nucleotide accessibilities. R script to run a multiple linear regression of protein production data on predicted nucleotide accessibilities. Reports the ANOVA-like sum of squares accounted for by every position.

  15. Supplementary Code 15

    Call to the RBS calculator web service. Python script to remotely run the RBS calculator on designed sequences.

  16. Supplementary Code 16

    Effect of predictions from the RBS calculator. R script to run linear regressions of PNI and PFI against RBS calculator outputs. Reports the ANOVA-like sum of squares.

  17. Supplementary Code 17

    Partial correlation between PFI and various codon metrics, given PNI. R script to compute various alternative codon metrics for the codon sequence and determine their partial correlations with PFI, accounting for PNI.

  18. Supplementary Code 18

    Processing of growth measurements from FIT-seq data collected under various conditions. R script to convert differential enrichment of read count data over time into an integrated measure of cell growth. Processes read count data from multiple replicate experiments. Converts read count ratios into aggregated measures of relative growth in a given environment (WNI, WFI, WUTX, WM).

  19. Supplementary Code 19

    Computation of sum of squares for multiple linear regression of WNI on PNI and design properties. R script to run a multiple linear regression of WNI against PNI, PNI² and design properties. Reports the ANOVA-like sum of squares.

  20. Supplementary Code 20

    Computation of sum of squares for multiple linear regression of WFI on PFI and design properties. R script to run a multiple linear regression of WFI against PFI, PFI² and design properties. Reports the ANOVA-like sum of squares.

  21. Supplementary Code 21

    Processing of RNA abundance and decay measurements from serial RNA-seq. R script to compute RNA decay after transcription arrest. Sample read counts are corrected using coefficients derived from ratioing counts of spiked-in RNA standards over time. Performs a nonlinear decay fit to the corrected count frequencies to estimate RNA abundance at steady state (RNASS), RNA half-life (RNAHL) and RNA protection (WPTX).

  22. Supplementary Code 22

    Compute 3D animation of the data. R scripts to produce the images necessary for Supplementary Video 1.

  23. Supplementary Code 23

    Processing of polysome profiles from DNA-seq of separate polysome fractions. R script to compute, from read counts, the distribution across polysome fractions (up to the fifth fraction) for each designed sequence.

  24. Supplementary Code 24

    Definition of sequence archetypes. R script to categorize sequences into the most relevant combinations of sequence properties. Calculates the series-wise means of various phenotypes for sequences belonging to these archetypes.

  25. Supplementary Code 25

    GenBank parser. A script to parse coding sequences from a GenBank file using BioPython.

  26. Supplementary Code 26

    D-Tailor module. Links to specific D-Tailor modules used in this work.

  27. Supplementary Code 28

    Seed generator for D-Tailor. Python script to generate a random input sequence for D-Tailor that maximizes the distance to other input sequences.
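
To give a sense of the kind of calculation described for Supplementary Code 21 (estimating RNA half-life from counts taken after transcription arrest), here is a minimal Python sketch. It assumes simple first-order decay, N(t) = N0 · exp(−k·t), and fits it by ordinary least squares on log-transformed values. That is a simplified stand-in for the nonlinear fit mentioned above, not the authors' R implementation; the function name and the synthetic data are hypothetical.

```python
import math


def fit_half_life(times, abundances):
    """Estimate initial abundance and half-life assuming N(t) = N0 * exp(-k * t).

    Uses ordinary least squares on log-transformed abundances, a simplified
    stand-in for a nonlinear fit. All abundances must be positive.
    """
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx                  # equals -k under the assumed model
    intercept = mean_y - slope * mean_t
    decay_rate = -slope
    return math.exp(intercept), math.log(2) / decay_rate


# Synthetic example (not data from the study): abundance halves every 2 time units.
times = [0, 2, 4, 6]
abundances = [100.0, 50.0, 25.0, 12.5]
n0, half_life = fit_half_life(times, abundances)
print(round(n0, 3), round(half_life, 3))
```

A log-linear fit weights low-count time points more heavily than a direct nonlinear fit would, which is one reason a nonlinear fit may be preferred on real count data.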

Zip files

  1. Supplementary Code 27

    Genome randomization. Perl modules to produce random genome variants that retain codon usage and protein amino acid composition.

  2. Supplementary Data 1

    E. coli's features and measurements. Dataset aggregating various measures of sequence properties for every gene in a reference E. coli genome and corresponding expression data for a subset (Taniguchi, 2009).

  3. Supplementary Data 2

    Mean hydropathy index over sliding windows. Calculation of the MHI over sliding windows for every gene in the reference E. coli genome.

  4. Supplementary Data 4

    Accessible bottleneck strengths. Calculation of bottleneck strength for random sequences cloned in the translation reporter.

  5. Supplementary Data 5

    E. coli's features and levels. Calculation of property scores and discrete categorisation for every gene in the E. coli genome, based on the properties and thresholds set for the Design of Experiments.

  6. Supplementary Data 6

    Random solutions. Calculation of property scores and categorization for random sequences cloned in the translation reporter context, based on the properties and thresholds set for the Design of Experiments.

  7. Supplementary Data 8

    Series logo. Position-wise nucleotide and amino acid frequency matrices for each series.

  8. Supplementary Data 9

    Sequencing count summary. A table reporting the number of counts associated with each design sequence for every sequencing library in this work.

  9. Supplementary Data 15

    Integrated phenotypic measurements. Consolidated dataset comprising design information, intermediates and fully processed phenotypic measurements for all 244,000 synthetic sequences.

  10. Supplementary Data 16

    ANOVA on PNI. An R object containing the sum of squares computed by running ANOVAs on the full dataset and independent series (Supplementary Code 7).

  11. Supplementary Data 17

    MLR on PNI. An R object containing the sum of squares computed by running multiple linear regressions on the full dataset and independent series (Supplementary Code 8).

  12. Supplementary Data 18

    CART on PNI. An R object containing the result of CART analysis (Supplementary Code 9).

  13. Supplementary Data 19

    ANOVA on PFI. An R object containing the sum of squares computed by running ANOVAs on the full dataset and independent series (output of Supplementary Code 7).

  14. Supplementary Data 20

    MLR on PFI. An R object containing the sum of squares computed by running multiple linear regressions on the full dataset and independent series (output of Supplementary Code 8).

  15. Supplementary Data 21

    CART on PFI. An R object containing the result of CART analysis (output of Supplementary Code 9).

  16. Supplementary Data 22

    Effect of minimum free energy over sliding windows. MFE predicted for sliding windows of different lengths on each designed sequence.

  17. Supplementary Data 23

    Sum of squares corresponding to regression of PNI on MFE over sliding windows (output of Supplementary Code 13).

  18. Supplementary Data 24

    Sum of squares corresponding to regression of PFI to the residuals of PNI's regression on MFEs (output of Supplementary Code 13).

  19. Supplementary Data 25

    Single nucleotide accessibilities. Predicted accessibilities at every position of each designed sequence.

  20. Supplementary Data 26

    Sum of squares for multiple linear regression of PNI on accessibilities (output of Supplementary Code 14).

  21. Supplementary Data 27

    Sum of squares for multiple linear regression of PFI on accessibilities (output of Supplementary Code 14).

  22. Supplementary Data 28

    RBS calculator predictions. Aggregation of outputs obtained by running each designed sequence in reporter context in the RBS calculator (output of Supplementary Code 15).

  23. Supplementary Data 29

    Sum of squares corresponding to the regression of PNI on RBS calculator's predictions (output of Supplementary Code 16).

  24. Supplementary Data 30

    Partial correlation of various codon-based metrics with PFI, given PNI (output of Supplementary Code 17).

  25. Supplementary Data 31

    Sum of squares for multiple linear regression of WNI on design properties and PNI (output of Supplementary Code 19).

  26. Supplementary Data 32

    Sum of squares for multiple linear regression of WFI on design properties and PFI (output of Supplementary Code 20).

  27. Supplementary Data 34

    Nonlinear decay fit. An R object containing fit data (output of Supplementary Code 21).

  28. Supplementary Data 35

    Phenotypic archetypes. Quartiles of series-wise mean for various phenotypes (output of Supplementary Code 24).

  29. Supplementary Data 36

    Random E. coli genomes. Result of constrained genome randomization (output of Supplementary Code 27).
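
The mean hydropathy index over sliding windows listed for Supplementary Data 2 can be sketched in a few lines of Python. This example assumes the Kyte-Doolittle hydropathy scale and full-length windows only; the listing does not state which scale or window handling was used, so both are assumptions, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the listing above does
# not say which scale was used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy_windows(protein, window):
    """Return the mean hydropathy of each full-length window along the protein."""
    if window <= 0 or window > len(protein):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy peptide, not a sequence from the study.
profile = mean_hydropathy_windows("MKIL", window=2)
print(profile)
```

A protein of length n yields n − window + 1 values, one per window start position.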

CSV files

  1. Supplementary Data 3

    tAI profiles for sfGFP and a designed variant. Calculates tAI over a sliding window.

  2. Supplementary Data 10

    Illumina lane description. Mapping of the different sequencing libraries to Illumina sequencing lanes.

  3. Supplementary Data 11

    TAG coupling upon activation by unnatural amino acids. Table reporting the mean fluorescence observed upon induction by increasing concentrations of the unnatural amino acid pAcF.

  4. Supplementary Data 12

    TAG coupling mutants. Table reporting the mean fluorescence observed in various mutants of the TAG position.

  5. Supplementary Data 13

    Growth of TAG mutants. Density of cell culture (OD600) over time for various mutants at the TAG position.

  6. Supplementary Data 14

    Number of cells sorted during FACS-seq. Reports the number of cells sorted in each bin during the FACS-seq experiments. Used to normalize read counts upon sequencing.

  7. Supplementary Data 33

    RNA standards. Counts of reads mapping to RNA standard sequences in RNA decay libraries.

Excel files

  1. Supplementary Data 7

    Intra-series distance. Collection of tables reporting Hamming distances between every pair of sequences within the same series.
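
For readers unfamiliar with the metric, the Hamming distance between two equal-length sequences is the number of positions at which they differ. The short Python sketch below computes it for every pair in a small set. The sequences and function names are hypothetical and only illustrate the calculation; they are not taken from the dataset.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(seqs):
    """Map each index pair (i, j) with i < j to the Hamming distance of its sequences."""
    return {
        (i, j): hamming(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }


# Toy sequences, not from the dataset.
series = ["ATGAAA", "ATGAAG", "ATGCCG"]
distances = pairwise_hamming(series)
print(distances)
```

For n sequences this produces n(n − 1)/2 distances, so the table grows quadratically with series size.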


  1. 3D animation of the data in RNA–Protein–Fitness space.
