Article

Cotranslational protein assembly imposes evolutionary constraints on homomeric proteins

  • Nature Structural & Molecular Biology volume 25, pages 279–288 (2018)
  • doi:10.1038/s41594-018-0029-5

Abstract

Cotranslational protein folding can facilitate rapid formation of functional structures. However, it can also cause premature assembly of protein complexes, if two interacting nascent chains are in close proximity. By analyzing known protein structures, we show that homomeric protein contacts are enriched toward the C termini of polypeptide chains across diverse proteomes. We hypothesize that this is the result of evolutionary constraints for folding to occur before assembly. Using high-throughput imaging of protein homomers in Escherichia coli and engineered protein constructs with N- and C-terminal oligomerization domains, we show that, indeed, proteins with C-terminal homomeric interface residues consistently assemble more efficiently than those with N-terminal interface residues. Using in vivo, in vitro and in silico experiments, we identify features that govern successful assembly of homomers, which have implications for protein design and expression optimization.


Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1.

    Elcock, A. H. Molecular simulations of cotranslational protein folding: fragment stabilities, folding cooperativity, and trapping in the ribosome. PLOS Comput. Biol. 2, e98 (2006).

  2.

    Sander, I. M., Chaney, J. L. & Clark, P. L. Expanding Anfinsen’s principle: contributions of synonymous codon selection to rational protein design. J. Am. Chem. Soc. 136, 858–861 (2014).

  3.

    Pechmann, S. & Frydman, J. Evolutionary conservation of codon optimality reveals hidden signatures of cotranslational folding. Nat. Struct. Mol. Biol. 20, 237–243 (2013).

  4.

    Levy, E. D. & Teichmann, S. Structural, evolutionary, and assembly principles of protein oligomerization. Prog. Mol. Biol. Transl. Sci. 117, 25–51 (2013).

  5.

    Goodsell, D. S. & Olson, A. J. Structural symmetry and protein function. Annu. Rev. Biophys. Biomol. Struct. 29, 105–153 (2000).

  6.

    Natan, E., Wells, J. N., Teichmann, S. A. & Marsh, J. A. Regulation, evolution and consequences of cotranslational protein complex assembly. Curr. Opin. Struct. Biol. 42, 90–97 (2017).

  7.

    Shieh, Y. W. et al. Operon structure and cotranslational subunit association direct protein assembly in bacteria. Science 350, 678–680 (2015).

  8.

    Borgia, M. B. et al. Single-molecule fluorescence reveals sequence-specific misfolding in multidomain proteins. Nature 474, 662–665 (2011).

  9.

    Wright, C. F., Teichmann, S. A., Clarke, J. & Dobson, C. M. The importance of sequence diversity in the aggregation and evolution of proteins. Nature 438, 878–881 (2005).

  10.

    Nissley, D. A. & O’Brien, E. P. Timing is everything: unifying codon translation rates and nascent proteome behavior. J. Am. Chem. Soc. 136, 17892–17898 (2014).

  11.

    Buhr, F. et al. Synonymous codons direct cotranslational folding toward different protein conformations. Mol. Cell 61, 341–351 (2016).

  12.

    Ahnert, S. E., Marsh, J. A., Hernández, H., Robinson, C. V. & Teichmann, S. A. Principles of assembly reveal a periodic table of protein complexes. Science 350, aaa2245 (2015).

  13.

    Kitagawa, M. et al. Complete set of ORF clones of Escherichia coli ASKA library (a complete set of E. coli K-12 ORF archive): unique resources for biological research. DNA Res. 12, 291–299 (2005).

  14.

    Waldo, G. S., Standish, B. M., Berendzen, J. & Terwilliger, T. C. Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17, 691–695 (1999).

  15.

    Ugrinov, K. G. & Clark, P. L. Cotranslational folding increases GFP folding yield. Biophys. J. 98, 1312–1320 (2010).

  16.

    Wells, J. N., Bergendahl, L. T. & Marsh, J. A. Co-translational assembly of protein complexes. Biochem. Soc. Trans. 43, 1221–1226 (2015).

  17.

    Gaglia, G., Guan, Y., Shah, J. V. & Lahav, G. Activation and control of p53 tetramerization in individual living cells. Proc. Natl. Acad. Sci. USA 110, 15497–15501 (2013).

  18.

    Lomax, M. E., Barnes, D. M., Hupp, T. R., Picksley, S. M. & Camplejohn, R. S. Characterization of p53 oligomerization domain mutations isolated from Li-Fraumeni and Li-Fraumeni like family members. Oncogene 17, 643–649 (1998).

  19.

    Rajagopalan, S., Huang, F. & Fersht, A. R. Single-molecule characterization of oligomerization kinetics and equilibria of the tumor suppressor p53. Nucleic Acids Res. 39, 2294–2303 (2011).

  20.

    Mateu, M. G., Sánchez Del Pino, M. M. & Fersht, A. R. Mechanism of folding and assembly of a small tetrameric protein domain from tumor suppressor p53. Nat. Struct. Biol. 6, 191–198 (1999).

  21.

    Nicholls, C. D., McLure, K. G., Shields, M. A. & Lee, P. W. Biogenesis of p53 involves cotranslational dimerization of monomers and posttranslational dimerization of dimers. Implications on the dominant negative effect. J. Biol. Chem. 277, 12937–12945 (2002).

  22.

    Mateu, M. G. & Fersht, A. R. Mutually compensatory mutations during evolution of the tetramerization domain of tumor suppressor p53 lead to impaired hetero-oligomerization. Proc. Natl. Acad. Sci. USA 96, 3595–3599 (1999).

  23.

    Mateu, M. G. & Fersht, A. R. Nine hydrophobic side chains are key determinants of the thermodynamic stability and oligomerization status of tumour suppressor p53 tetramerization domain. EMBO J. 17, 2748–2758 (1998).

  24.

    Iwasaki, S. & Ingolia, N. T. Seeing translation. Science 352, 1391–1392 (2016).

  25.

    Jones, D. D., Stott, K. M., Howard, M. J. & Perham, R. N. Restricted motion of the lipoyl-lysine swinging arm in the pyruvate dehydrogenase complex of Escherichia coli. Biochemistry 39, 8448–8459 (2000).

  26.

    Radford, S. E., Laue, E. D., Perham, R. N., Martin, S. R. & Appella, E. Conformational flexibility and folding of synthetic peptides representing an interdomain segment of polypeptide chain in the pyruvate dehydrogenase multienzyme complex of Escherichia coli. J. Biol. Chem. 264, 767–775 (1989).

  27.

    Lengyel, J. S. et al. Extended polypeptide linkers establish the spatial architecture of a pyruvate dehydrogenase multienzyme complex. Structure 16, 93–103 (2008).

  28.

    Natan, E. & Joerger, A. C. Structure and kinetic stability of the p63 tetramerization domain. J. Mol. Biol. 415, 503–513 (2012).

  29.

    Natan, E. et al. Interaction of the p53 DNA-binding domain with its N-terminal extension modulates the stability of the p53 tetramer. J. Mol. Biol. 409, 358–368 (2011).

  30.

    Tsai, C. J. et al. Synonymous mutations and ribosome stalling can lead to altered folding pathways and distinct minima. J. Mol. Biol. 383, 281–291 (2008).

  31.

    O’Brien, E. P., Vendruscolo, M. & Dobson, C. M. Prediction of variable translation rate effects on cotranslational protein folding. Nat. Commun. 3, 868 (2012).

  32.

    Zhang, G. & Ignatova, Z. Folding at the birth of the nascent chain: coordinating translation with co-translational folding. Curr. Opin. Struct. Biol. 21, 25–31 (2011).

  33.

    Xu, C., Wang, S., Thibault, G. & Ng, D. T. Futile protein folding cycles in the ER are terminated by the unfolded protein O-mannosylation pathway. Science 340, 978–981 (2013).

  34.

    Reid, B. G. & Flynn, G. C. Chromophore formation in green fluorescent protein. Biochemistry 36, 6786–6791 (1997).

  35.

    Shimizu, Y., Kanamori, T. & Ueda, T. Protein synthesis by pure translation systems. Methods 36, 299–304 (2005).

  36.

    O’Brien, E. P., Christodoulou, J., Vendruscolo, M. & Dobson, C. M. Trigger factor slows co-translational folding through kinetic trapping while sterically protecting the nascent chain from aberrant cytosolic interactions. J. Am. Chem. Soc. 134, 10920–10932 (2012).

  37.

    Niwa, T., Kanamori, T., Ueda, T. & Taguchi, H. Global analysis of chaperone effects using a reconstituted cell-free translation system. Proc. Natl. Acad. Sci. USA 109, 8937–8942 (2012).

  38.

    Jaenicke, R. Protein folding: local structures, domains, subunits, and assemblies. Biochemistry 30, 3147–3161 (1991).

  39.

    Schröder, H., Langer, T., Hartl, F. U. & Bukau, B. DnaK, DnaJ and GrpE form a cellular chaperone machinery capable of repairing heat-induced protein damage. EMBO J. 12, 4137–4144 (1993).

  40.

    Calloni, G. et al. DnaK functions as a central hub in the E. coli chaperone network. Cell Rep. 1, 251–264 (2012).

  41.

    Brandt, F. et al. The native 3D organization of bacterial polysomes. Cell 136, 261–271 (2009).

  42.

    Marsh, J. A. & Teichmann, S. A. Structure, dynamics, assembly, and evolution of protein complexes. Annu. Rev. Biochem. 84, 551–575 (2015).

  43.

    Levy, E. D., De, S. & Teichmann, S. A. Cellular crowding imposes global constraints on the chemistry and evolution of proteomes. Proc. Natl. Acad. Sci. USA 109, 20461–20466 (2012).

  44.

    Jaenicke, R. & Lilie, H. Folding and association of oligomeric and multimeric proteins. Adv. Protein Chem. 53, 329–401 (2000).

  45.

    Garcia-Seisdedos, H., Empereur-Mot, C., Elad, N. & Levy, E. D. Proteins evolve on the edge of supramolecular self-assembly. Nature 548, 244–247 (2017).

  46.

    Peisajovich, S. G., Rockah, L. & Tawfik, D. S. Evolution of new protein topologies through multistep gene rearrangements. Nat. Genet. 38, 168–174 (2006).

  47.

    Tam, S. et al. The chaperonin TRiC blocks a huntingtin sequence element that promotes the conformational switch to aggregation. Nat. Struct. Mol. Biol. 16, 1279–1285 (2009).

  48.

    Levy, E. D. PiQSi: protein quaternary structure investigation. Structure 15, 1364–1367 (2007).

  49.

    Marsh, J. A. & Teichmann, S. A. Protein flexibility facilitates quaternary structure assembly and evolution. PLoS Biol. 12, e1001870 (2014).

  50.

    Arnold, K., Bordoli, L., Kopp, J. & Schwede, T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22, 195–201 (2006).

  51.

    Smith, K. et al. CIDRE: an illumination-correction method for optical microscopy. Nat. Methods 12, 404–406 (2015).

  52.

    Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100 (2006).

  53.

    Hall, M. et al. The WEKA data mining software: an update. SIGKDD Explor. 11, 10–18 (2009).

  54.

    Sobott, F., Hernández, H., McCammon, M. G., Tito, M. A. & Robinson, C. V. A tandem mass spectrometer for improved transmission and analysis of large macromolecular assemblies. Anal. Chem. 74, 1402–1407 (2002).

  55.

    Hernández, H. & Robinson, C. V. Determining the stoichiometry and interactions of macromolecular assemblies from mass spectrometry. Nat. Protoc. 2, 715–726 (2007).

  56.

    Niwa, T. et al. Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins. Proc. Natl. Acad. Sci. USA 106, 4201–4206 (2009).

  57.

    Ermak, D. L. & McCammon, J. Brownian dynamics with hydrodynamic interactions. J. Chem. Phys. 69, 1352 (1978).

  58.

    Agirrezabala, X. et al. Structural insights into cognate versus near-cognate discrimination during decoding. EMBO J. 30, 1497–1507 (2011).

  59.

    Yang, F., Moss, L. G. & Phillips, G. N. Jr. The molecular structure of green fluorescent protein. Nat. Biotechnol. 14, 1246–1251 (1996).

  60.

    Jefferys, B. R., Kelley, L. A. & Sternberg, M. J. Protein folding requires crowd control in a simulated cell. J. Mol. Biol. 397, 1329–1338 (2010).

  61.

    Marsh, J. A. et al. Protein complexes are under evolutionary selection to assemble via ordered pathways. Cell 153, 461–470 (2013).

  62.

    Xiang, Z., Soto, C. S. & Honig, B. Evaluating conformational free energies: the colony energy and its application to the problem of loop prediction. Proc. Natl. Acad. Sci. USA 99, 7432–7437 (2002).

  63.

    Jeffrey, P. D., Gorina, S. & Pavletich, N. P. Crystal structure of the tetramerization domain of the p53 tumor suppressor at 1.7 angstroms. Science 267, 1498–1502 (1995).

  64.

    Antosiewicz, J., McCammon, J. A. & Gilson, M. K. The determinants of pKas in proteins. Biochemistry 35, 7819–7833 (1996).

  65.

    Winter, U. & Geyer, T. Coarse grained simulations of a small peptide: effects of finite damping and hydrodynamic interactions. J. Chem. Phys. 131, 104102 (2009).

  66.

    Frembgen-Kesner, T. & Elcock, A. H. Striking effects of hydrodynamic interactions on the simulated diffusion and folding of proteins. J. Chem. Theory Comput. 5, 242–256 (2009).

  67.

    Hogg, J. D. A DAG-based parallel Cholesky factorization for multicore systems. Report No. RAL-TR-2008-029 (Rutherford Appleton Laboratory, 2008).

  68.

    Velankar, S. et al. SIFTS: structure integration with function, taxonomy and sequences resource. Nucleic Acids Res. 41, D483–D489 (2013).


Acknowledgements

We are grateful to G. Kramer and B. Bukau (Deutsches Krebsforschungszentrum, Heidelberg, Germany) for their generous gift of trigger factor protein and A. Drummond (Department of Biochemistry & Molecular Biology, University of Chicago) for the generous gift of plasmids. We also thank L. Byung-Gil for useful advice and N. Sanchez De Groot for technical support. We thank C. Vogel, M.T. Burgas and E. Arbely for helpful suggestions and critical reading. E.N. thanks N. Weiner and the ISEF foundation for their support. M.M.B., T.F. and G.C. are supported by the Medical Research Council (MC_U105185859). T.F. was also supported by the Boehringer Ingelheim Fonds. B.P. and C.P. thank the ‘Lendület’ Programme of the Hungarian Academy of Sciences and the Wellcome Trust for supporting this work and the European Research Council (C.P.). B.K. is supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences and NKFI 120220. Z.M. is supported by GINOP-2.3.2-15-2016-00001. P.H. thanks the National Brain Research Programme and the TEKES Finland Distinguished Professor Grant for their support. S.A.T. thanks the Lister Institute, the MRC, the EMBL-European Bioinformatics Institute and the Wellcome Trust Sanger Institute. N.S. and T.E. were partly supported by Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), mostly Innovative Areas of “Chemistry for Multimolecular Crowding in Biosystems” (JSPS KAKENHI Grant No. JP17H06351) and MEXT-Supported Program for the Strategic Research Foundation at Private Universities (2014–2019) and The Hirao Taro Foundation of KONAN GAKUEN for Academic Research. J.M. is supported by an MRC Career Development Award (MR/M02122X/1). C.R. is supported by the Medical Research Council, Grant Reference MR/N020413/1. L.H.V. was supported by EMBO (award number ALTF 698-2012), Directorate-General for Research and Innovation (FP7-PEOPLE-2010-IEF, ThPLAST 274192) and an EMBL Interdisciplinary Postdoctoral fellowship, supported by H2020 Marie Sklodowska Curie Actions. B.P. and H.P. acknowledge funding from GINOP-2.3.2-15-2016-00026. A.H.E.’s work was supported by the National Institutes of Health through grant R01 GM099865. This work is dedicated to Jakob Natan and Shalom Marciano.

Author information

Affiliations

  1. The Aleph Lab Ltd, Oxford, UK

    • Eviatar Natan
  2. Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan

    • Tamaki Endoh
    • Naoki Sugimoto
  3. EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK

    • Liora Haim-Vilmovsky
  4. Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge, UK

    • Liora Haim-Vilmovsky
    • Sarah A. Teichmann
  5. MRC Laboratory of Molecular Biology, Cambridge, UK

    • Tilman Flock
    • Guilhem Chalancon
    • M. Madan Babu
  6. OMass Technologies Ltd, Centre for Innovation & Enterprise, Oxford, UK

    • Jonathan T. S. Hopper
  7. Synthetic and System Biology Unit, Biological Research Center of the Hungarian Academy of Sciences, Szeged, Hungary

    • Bálint Kintses
    • Peter Horvath
    • Lejla Daruka
    • Gergely Fekete
    • Csaba Pál
    • Balázs Papp
  8. Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland

    • Peter Horvath
  9. Institute of Plant Biology, Biological Research Center of the Hungarian Academy of Sciences, Szeged, Hungary

    • Erika Oszi
    • Zoltán Magyar
  10. MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK

    • Joseph A. Marsh
  11. Department of Biochemistry, University of Iowa, Iowa City, IA, USA

    • Adrian H. Elcock
  12. Department of Chemistry, University of Oxford, Oxford, UK

    • Carol V. Robinson
  13. Graduate School of Frontiers of Innovative Research in Science and Technology (FIRST), Konan University, Kobe, Japan

    • Naoki Sugimoto
  14. Cavendish Laboratory, University of Cambridge, Cambridge, UK

    • Sarah A. Teichmann

Authors

  1. Eviatar Natan
  2. Tamaki Endoh
  3. Liora Haim-Vilmovsky
  4. Tilman Flock
  5. Guilhem Chalancon
  6. Jonathan T. S. Hopper
  7. Bálint Kintses
  8. Peter Horvath
  9. Lejla Daruka
  10. Gergely Fekete
  11. Csaba Pál
  12. Balázs Papp
  13. Erika Oszi
  14. Zoltán Magyar
  15. Joseph A. Marsh
  16. Adrian H. Elcock
  17. M. Madan Babu
  18. Carol V. Robinson
  19. Naoki Sugimoto
  20. Sarah A. Teichmann

Contributions

The study was conceived by E.N. and S.A.T. The study was coordinated by E.N. and S.A.T. The experiments were designed by E.N., L.H.V., B.K., B.P., C.P. and P.H. The experiments were conducted by E.N., T.E., N.S., A.H.E., B.K., L.D., E.O. and Z.M. Bioinformatic analysis was conducted by T.F. and J.A.M. Simulations were run by A.H.E. Machine-learning analysis was conducted by P.H. Data analysis was conducted by E.N., T.E., A.H.E., T.F., B.K., G.F., H.P., B.P., C.P. and G.C. The manuscript was written by E.N. and S.A.T. with contributions from all authors.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Eviatar Natan or Sarah A. Teichmann.

Integrated supplementary information

  1. Supplementary Figure 1 Distribution of interface-forming residues in homomers and heteromers

    (a-b) Distribution of interface-forming residues for homomers, as in Fig. 2, divided into bacteria, eukaryotes and archaea. (a) Including all homomers in our dataset. (b) Including only full-length or nearly full-length homomers, in which the crystallized construct contains >90% of the residues present in the UniProt sequence of the full protein. There is no apparent C-terminal interface enrichment in archaea; because this is also by far the smallest group, it is difficult to say whether this reflects genuine biological differences or small numbers. (c) Distribution of interface-forming residues for all heteromeric subunits as a control, showing no C-terminal enrichment. (d) Relative enrichment in interface in the C-terminal halves of proteins compared to the N-terminal halves for all species with >100 non-redundant heteromeric subunit structures in our dataset. No significant enrichment for heteromers is evident, in contrast to homomers. (e) Distribution of interface-forming residues for heteromeric subunits from bacteria, eukaryotes and archaea. Again, no significant enrichment is evident. Error bars for all plots are calculated the same as in Fig. 2 with 10⁴ bootstrapping replicates. (f) Given the notable difference between the enrichment observed for humans and rats in Fig. 2b, we compared the C-terminal interface enrichment in homomers from humans and rats, considering only those structures that are closely related (>70% sequence identity) between the two groups. There is very little difference between the enrichment seen in human vs. rat structures. Error bars are calculated the same as in Fig. 2 with 10⁴ bootstrap replicates.

  2. Supplementary Figure 2 Comparison of the C-terminal interface enrichment for homomers of different lengths and from different symmetry groups

    (a) Homomers were split into three equally sized groups of short, medium-length and long proteins, and the interface enrichment was plotted as in Fig. 2a. (b) Homomers were grouped on the basis of the most common symmetry types, and the interface enrichment was plotted as in Fig. 2a. (c) Relative enrichment of interface and solvent-accessible surface area across the length of proteins. These plots are analogous to Fig. 2a, except that interface enrichment is not normalized by solvent-accessible surface area; instead, both are shown separately. Solvent-accessible surface area is calculated considering only monomeric subunits, neglecting interactions.

  3. Supplementary Figure 3 Workflow of the image-based high-throughput screen

    (a) 611 native E. coli homomers with C-terminal GFP labels were compiled from the complete set of the E. coli K-12 ASKA library. All proteins were overexpressed in 96-well plates and cells were imaged to determine the GFP signal. Using a supervised machine-learning algorithm, each cell was classified into phenotypes. Two types of cells were selected: (i) ‘Green cells’, which are cells with homogeneous and high GFP signal along the cell, and (ii) ‘Dark cells’, which are cells with GFP signal at background levels. Finally, each homomer was classified into one of these groups, depending on which phenotype was predominant in the corresponding cell population. (b-c) N-terminal regions in the ‘Dark’ homomers are enriched in interface-forming residues as compared to ‘Green’ homomers across all length and relative interface size categories. (b) Homomers were split into two equally sized groups of short (left) and long (right) proteins. The relative enrichment of interface-forming residues along the protein length is shown in green and grey for ‘Green’ and ‘Dark’ cells, respectively, as in Fig. 2c. Error bars represent standard errors calculated from 10⁴ bootstrapping replicates as before. In the ‘Dark’ group, N-terminal regions with significant interface enrichment (indicated with *) were observed as compared to ‘Green’ proteins, both in the long and short protein groups. (c) As for the length-based analysis, homomers were split into two equally sized groups based on relative interface size. Relative interface size was calculated for each protein by dividing the size of the homomer interface by the total available surface area of the protein. In both the small and large relative interface size categories, significant enrichment of interface-forming regions was observed for ‘Dark’ homomers as compared to the ‘Green’ ones. (d) The sub-group of cytoplasmic-only proteins was analyzed separately, and the observed enrichment trend was maintained. The dataset for membrane proteins was too small and is therefore not presented.

  4. Supplementary Figure 4 Constructs of the YFP sublibrary and flow cytometry and ESI-MS characterization

    (a) The different constructs are identical or almost identical in sequence composition. Three N-terminal variants were used in this work (monomeric, dimeric and tetrameric), differing by a single amino acid residue. In addition, a construct with a tetrameric oligomerization domain positioned at the C terminus was also used. (b) Flow cytometry measurements of all four variants. The lines along the dot plots indicate the intensity of the tetrameric (@N) and monomeric variants. The N-terminal tetrameric variants always show the lowest fluorescence level. The ratio is shown in (c). (d) Western blot of Tet-SL-YFP, YFP-SL-Tet and empty vector, as shown in Fig. 3b. The blot is representative; each variant was expressed from three different colonies (n=3) and three times from each colony (total n=9). (e-h) ESI-MS verification of the oligomeric state of four YFP sub-library constructs: (e) Tet-SL-YFP, (f) YFP-SL-Tet, (g) YFP-LL-Tet and (h) Mono-SL-YFP. All spectra possess charge state distributions with deconvoluted masses in agreement with the theoretical masses calculated from their amino acid sequence. Importantly, tandem MS (MS/MS) experiments confirmed that the first three constructs (e–g) are tetramers: applying high energy released a monomeric subunit and a trimeric complex, as shown in the insets.

  5. Supplementary Figure 5 Flow cytometry of constructs with tetrameric (Tet@N) or monomeric (Mono@N) oligomerization domain

    (a) Flow cytometry of YFP constructs with short, medium or long linkers. The lines along the dot plots indicate the intensity of the tetrameric and monomeric variants. The calculated ratio is shown in Fig. 4b. The plots are representative of all experiments (n>5). (b-c) Flow cytometry of long-linker GFP or fGFP constructs with a tetrameric (Tet@N) or monomeric (Mono@N) oligomerization domain at 37 °C and 18 °C. (b) The lines along the dot plots indicate the intensity of the monomeric constructs. As shown by confocal microscopy (Fig. 4c), there is only a very small difference between the fGFP variants. By contrast, the GFP tetrameric variant has significantly lower fluorescence than the monomeric variant. This difference is reduced if the strains are grown at 18 °C. (c) Analysis and ratios calculated from the data presented in (b). (Independent cell-culture replicates; **p-value < 0.01, *p-value < 0.05, two-sided t-test. Error bars represent s.d.)

  6. Supplementary Figure 6 Luciferase misassembly in vivo and in vitro is similar to the slow-folding GFP rather than the fast-folding fGFP and YFP

    The Luciferase (Luc) reporter was chosen because its fold differs significantly from that of YFP and GFP, and because of its slower folding rate. (a) In vivo assay of luminescence level after normalization to the number of cells. The Tet-LL-Luc shows almost no signal. A single amino acid substitution to generate a monomeric variant increases the signal dramatically. Increasing the linker length of the tetrameric variant from short to long also increases the signal dramatically. As with the short-linker variants, the monomeric long-linker variant had a much higher luminescence signal than the tetrameric variant. (b) In vitro results using polysomic conditions in the PURE system. The results were in agreement with the in vivo experiments. (c) Comparison of the monosomic conditions with the polysomic conditions (as presented in (b)). According to our hypothesis, reducing the local ribosome concentration will decrease the frequency of cotranslational assembly events, thus decreasing misassembly. The results align with the hypothesis. Moreover, the C-terminal tetrameric construct, which cannot assemble cotranslationally, did not show a significant difference between the monosomic and polysomic conditions. This further confirms our hypothesis. (d-e) Similarly to Fig. 5, we tested the same three chaperone groups: “KJE mix”, which includes DnaK, DnaJ and GrpE; “GroE mix”, which includes GroEL and GroES; and Trigger Factor (TF). Overall, the effect of chaperones was similar to, and even stronger than, that seen for the GFP sub-library (*p-value < 0.05, **p-value < 0.01; results are presented as the mean of the different replicates and error bars represent s.d.).

  7. Supplementary Figure 7 PURE System western blot of fGFP and GFP

    (a) Western Blot of the different GFP constructs examined. For each construct, both the polysomic and monosomic conditions are shown. (b) Average quantification of (a). (Error bars represent s.d.).

  8. Supplementary Figure 8 The effect of chaperones in vitro and in vivo

    (a–c) Summary of expression of the GFP, fGFP and Luc sub-libraries, with or without chaperones, using the PURE system. In each row, the tetrameric and monomeric constructs of a different sub-library are examined: (a) GFP, (b) fGFP and (c) Luc. Overall, the effect of chaperones correlated with oligomeric state (tetramer versus monomer) and with folding rate (fast- versus slow-folding proteins). The highest rescue effect was achieved by the KJE mix, particularly with the tetrameric slow-folding Luc and GFP. (Results are presented as the mean of the different replicates and error bars represent s.d.) (d–g) Analysis of the influence of chaperones on homomeric and heteromeric complexes in E. coli from ref. 25. (d) Depletion of misfolded homomeric and heteromeric protein complexes from the soluble fraction of an E. coli mutant with the ΔKJT deletion (DnaK/DnaJ and TF are deleted). (e) The change in abundance of homomeric and heteromeric protein complexes in the insoluble fraction is shown for the same E. coli mutant strain. (f) Interaction of homomeric and heteromeric complex proteins with DnaK (PD/BG ratio). The relative frequencies were normalized to account for the number of homomeric and heteromeric complexes. (g) Histograms of absolute numbers of interactions of homomeric and heteromeric complex proteins with DnaK (PD/BG ratio).

  9. Supplementary Figure 9 Snapshots of the endpoint of all in silico simulations

    The snapshots are of the three main constructs of the YFP sub-library: Tet-SL-YFP, YFP-SL-Tet and Tet-LL-YFP. Representative simulations can be watched in Supplementary Videos 1–3. (a) Endpoints of twenty simulations of Tet-SL-YFP. (b) Endpoints of ten simulations of the construct YFP-SL-Tet. (c) Endpoints of twenty simulations of Tet-LL-YFP. A symbol in the bottom left of some snapshots indicates that Tet was assembled.

  10. Supplementary Figure 10 Evolutionary constraints as a function of assembly

    (a) Real-time growth rate of strains that express the different constructs used in this work. Each curve represents the average of three replicates of the same culture. The plot is representative of three such averaged curves. All experiments show the same trend: N-terminal tetrameric constructs consistently grow more slowly than the other variants. Error bars represent s.d. (b) All available E. coli homomeric protein structures were analyzed to create a library of structures of proteins with (i) a discrete oligomerization domain and (ii) data predicting whether the protein folds post- or cotranslationally. Three such protein structures were found. Oligomerization domains are shown in red, domain linkers in cyan, and other domains in yellow. CTP Synthetase (PDB:1s1m) has an N-terminal oligomerization domain, which may be compensated by a linker (cyan). The other two proteins have C-terminal oligomerization domains and no linkers: aspartate-semialdehyde dehydrogenase (PDB:1t4b) and alpha-N-Acetyl-galactosaminidase (PDB:2p53). More information about these proteins is provided in Supplementary Data Set 4.
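
The per-homomer labelling and the relative interface size described for Supplementary Figure 3 can be illustrated with a short Python sketch. It is not the authors' pipeline: the per-cell labels are assumed to come from an upstream classifier, the label strings and function names are hypothetical, and returning `None` on a tie is an assumption made here for illustration.

```python
from collections import Counter


def classify_homomer(cell_labels):
    """Return the predominant per-cell phenotype for one homomer.

    cell_labels: iterable of phenotype strings, one per imaged cell
    (hypothetical labels such as "green" or "dark").
    Returns None when there are no cells or when the top two phenotypes tie
    (the tie rule is an assumption, not taken from the source).
    """
    counts = Counter(cell_labels)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def relative_interface_size(interface_area, total_surface_area):
    """Homomer interface area divided by the total available surface area."""
    if total_surface_area <= 0:
        raise ValueError("total_surface_area must be positive")
    return interface_area / total_surface_area


if __name__ == "__main__":
    print(classify_homomer(["green", "green", "dark"]))
    print(relative_interface_size(500.0, 10000.0))
```

A majority vote is the simplest reading of "predominant"; a stricter threshold (for example, requiring a minimum fraction of cells) would be an equally plausible choice.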

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–10, Supplementary Table 1 and Supplementary Note 1

  2. Life Sciences Reporting Summary

  3. Supplementary Dataset 1

    ‘Dark’ and ‘Green’ homomers from the genome wide in vivo screen. The table describes for each homomer the parameters from the in vivo screen, the structural parameters of the interface location and the Western Blot analysis.

  4. Supplementary Dataset 2

    Western blot analysis to test the expression of all the 136 ‘Dark’ and a selected set of 25 ‘Green’ homomers. The homomers were detected using a GFP-specific antibody. Both GFP negative (C–) and positive (C+) samples were loaded on each gel. Coomassie Brilliant Blue (CBB) stained separate gels (#1–5) or the membranes used for the Western blot assays (#6–14) are shown to justify the loading (shown in each case above the image of Western blotting). Asterisks show the expressed GFP-tagged proteins. Molecular masses (in kDa) are indicated on the left.

  5. Supplementary Dataset 3

    Protein complex immunoprecipitation (Co-IP). A strain with an empty vector and strains that express the tetrameric N-terminus (Tet-SL-YFP) and tetrameric C-terminus (YFP-SL-Tet) constructs were harvested a few hours after induction. Then, the cells’ contents were mixed with magnetic anti-HA antibody beads. The samples were washed and eluted. The eluted samples were run on an SDS gel, and selected bands were analyzed by MS. A list of the different proteins that were identified and their fold changes are indicated.

  6. Supplementary Dataset 4

    Characterization of examples of representative proteins as shown in Supplementary Fig. 10. The table describes E. coli structures that have oligomerization domains and have a known full-length protein structure as well as associated folding parameters.

  7. Supplementary Dataset 5

    Nonredundant sets of homomer structures. These are split into the sets of all complexes filtered for sequence redundancy across all structures, or at the species level, and the set of only full-length structures. The total amount of interface and monomer accessible surface area (in Å²) is given for the N-terminal and C-terminal halves of each protein. The same data is provided for heteromers.
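
The per-half interface and accessible surface totals described for Supplementary Dataset 5 lend themselves to a simple enrichment estimate with bootstrap error bars. The following is a minimal Python sketch under stated assumptions, not the authors' method: enrichment is assumed to be the pooled interface fraction of the C-terminal halves divided by that of the N-terminal halves, the standard error is taken from resampling proteins with replacement, and all function names and numbers are hypothetical.

```python
import random
from statistics import stdev

# Hypothetical records, one per protein (values in square angstroms, invented
# for illustration): (interface_N, surface_N, interface_C, surface_C).
records = [
    (300.0, 5000.0, 450.0, 5000.0),
    (200.0, 4000.0, 260.0, 4000.0),
    (500.0, 6000.0, 550.0, 6000.0),
    (100.0, 3000.0, 240.0, 3000.0),
]


def c_terminal_enrichment(rows):
    """Pooled interface fraction of C-terminal halves over that of N-terminal halves.

    Assumed definition: interface area is normalized by accessible surface
    area within each half before the two halves are compared.
    """
    interface_n = sum(r[0] for r in rows)
    surface_n = sum(r[1] for r in rows)
    interface_c = sum(r[2] for r in rows)
    surface_c = sum(r[3] for r in rows)
    return (interface_c / surface_c) / (interface_n / surface_n)


def bootstrap_standard_error(rows, n_replicates=10_000, seed=0):
    """Standard error of the enrichment from resampling proteins with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replicates):
        sample = [rng.choice(rows) for _ in rows]
        estimates.append(c_terminal_enrichment(sample))
    return stdev(estimates)


if __name__ == "__main__":
    print("enrichment:", c_terminal_enrichment(records))
    print("bootstrap SE:", bootstrap_standard_error(records, n_replicates=1000))
```

A value above 1 would indicate more interface per unit of accessible surface in the C-terminal halves. Real data would need a guard for resamples in which the pooled N-terminal interface area is zero.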

Videos

  1. Supplementary Video 1

    Simulation of Tet@N with short linker. For all movies, the red segment is the Tet, and the yellow segment is the YFP β-barrel. Both cotranslational folding and misassembly take place once the Tet domains emerge from the ribosome tunnels.

  2. Supplementary Video 2

    Simulation of Tet@C with short linker. There was no cotranslational assembly as the Tet of the leading ribosome leaves the ribosomal tunnel prior to the translation of the second ribosome, which allows it time to diffuse before the second Tet leaves the ribosome tunnel.

  3. Supplementary Video 3

    Simulation of Tet@N with long linker. Cotranslational assembly takes place, but not misassembly. As in Supplementary Videos 1 and 2, the red segment is the Tet and the yellow segment is the YFP reporter; the cyan segment is the long linker. The less frequent misassembly events are consistent with the observed in vivo and in vitro data.