Cotranslational protein assembly imposes evolutionary constraints on homomeric proteins


Cotranslational protein folding can facilitate rapid formation of functional structures. However, it can also cause premature assembly of protein complexes, if two interacting nascent chains are in close proximity. By analyzing known protein structures, we show that homomeric protein contacts are enriched toward the C termini of polypeptide chains across diverse proteomes. We hypothesize that this is the result of evolutionary constraints for folding to occur before assembly. Using high-throughput imaging of protein homomers in Escherichia coli and engineered protein constructs with N- and C-terminal oligomerization domains, we show that, indeed, proteins with C-terminal homomeric interface residues consistently assemble more efficiently than those with N-terminal interface residues. Using in vivo, in vitro and in silico experiments, we identify features that govern successful assembly of homomers, which have implications for protein design and expression optimization.

Fig. 1: Illustration of the possible cotranslational assembly of homomeric proteins.
Fig. 2: Interface residues of native homomers are C-terminally enriched, correlating with stability of the protein.
Fig. 3: Position of the oligomerization domain is crucial for protein solubility.
Fig. 4: Extending the linker decreases misassembly rates.
Fig. 5: Misassembly as a function of oligomerization, folding rate and ribosome density, using PURE in vitro translation system.
Fig. 6: In silico simulation of the translation of different constructs.
Fig. 7: Cotranslational (mis)assembly as a function of sequence-intrinsic features.


  1. Elcock, A. H. Molecular simulations of cotranslational protein folding: fragment stabilities, folding cooperativity, and trapping in the ribosome. PLOS Comput. Biol. 2, e98 (2006).

  2. Sander, I. M., Chaney, J. L. & Clark, P. L. Expanding Anfinsen’s principle: contributions of synonymous codon selection to rational protein design. J. Am. Chem. Soc. 136, 858–861 (2014).

  3. Pechmann, S. & Frydman, J. Evolutionary conservation of codon optimality reveals hidden signatures of cotranslational folding. Nat. Struct. Mol. Biol. 20, 237–243 (2013).

  4. Levy, E. D. & Teichmann, S. Structural, evolutionary, and assembly principles of protein oligomerization. Prog. Mol. Biol. Transl. Sci. 117, 25–51 (2013).

  5. Goodsell, D. S. & Olson, A. J. Structural symmetry and protein function. Annu. Rev. Biophys. Biomol. Struct. 29, 105–153 (2000).

  6. Natan, E., Wells, J. N., Teichmann, S. A. & Marsh, J. A. Regulation, evolution and consequences of cotranslational protein complex assembly. Curr. Opin. Struct. Biol. 42, 90–97 (2017).

  7. Shieh, Y. W. et al. Operon structure and cotranslational subunit association direct protein assembly in bacteria. Science 350, 678–680 (2015).

  8. Borgia, M. B. et al. Single-molecule fluorescence reveals sequence-specific misfolding in multidomain proteins. Nature 474, 662–665 (2011).

  9. Wright, C. F., Teichmann, S. A., Clarke, J. & Dobson, C. M. The importance of sequence diversity in the aggregation and evolution of proteins. Nature 438, 878–881 (2005).

  10. Nissley, D. A. & O’Brien, E. P. Timing is everything: unifying codon translation rates and nascent proteome behavior. J. Am. Chem. Soc. 136, 17892–17898 (2014).

  11. Buhr, F. et al. Synonymous codons direct cotranslational folding toward different protein conformations. Mol. Cell 61, 341–351 (2016).

  12. Ahnert, S. E., Marsh, J. A., Hernández, H., Robinson, C. V. & Teichmann, S. A. Principles of assembly reveal a periodic table of protein complexes. Science 350, aaa2245 (2015).

  13. Kitagawa, M. et al. Complete set of ORF clones of Escherichia coli ASKA library (a complete set of E. coli K-12 ORF archive): unique resources for biological research. DNA Res. 12, 291–299 (2005).

  14. Waldo, G. S., Standish, B. M., Berendzen, J. & Terwilliger, T. C. Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17, 691–695 (1999).

  15. Ugrinov, K. G. & Clark, P. L. Cotranslational folding increases GFP folding yield. Biophys. J. 98, 1312–1320 (2010).

  16. Wells, J. N., Bergendahl, L. T. & Marsh, J. A. Co-translational assembly of protein complexes. Biochem. Soc. Trans. 43, 1221–1226 (2015).

  17. Gaglia, G., Guan, Y., Shah, J. V. & Lahav, G. Activation and control of p53 tetramerization in individual living cells. Proc. Natl. Acad. Sci. USA 110, 15497–15501 (2013).

  18. Lomax, M. E., Barnes, D. M., Hupp, T. R., Picksley, S. M. & Camplejohn, R. S. Characterization of p53 oligomerization domain mutations isolated from Li-Fraumeni and Li-Fraumeni like family members. Oncogene 17, 643–649 (1998).

  19. Rajagopalan, S., Huang, F. & Fersht, A. R. Single-molecule characterization of oligomerization kinetics and equilibria of the tumor suppressor p53. Nucleic Acids Res. 39, 2294–2303 (2011).

  20. Mateu, M. G., Sánchez Del Pino, M. M. & Fersht, A. R. Mechanism of folding and assembly of a small tetrameric protein domain from tumor suppressor p53. Nat. Struct. Biol. 6, 191–198 (1999).

  21. Nicholls, C. D., McLure, K. G., Shields, M. A. & Lee, P. W. Biogenesis of p53 involves cotranslational dimerization of monomers and posttranslational dimerization of dimers. Implications on the dominant negative effect. J. Biol. Chem. 277, 12937–12945 (2002).

  22. Mateu, M. G. & Fersht, A. R. Mutually compensatory mutations during evolution of the tetramerization domain of tumor suppressor p53 lead to impaired hetero-oligomerization. Proc. Natl. Acad. Sci. USA 96, 3595–3599 (1999).

  23. Mateu, M. G. & Fersht, A. R. Nine hydrophobic side chains are key determinants of the thermodynamic stability and oligomerization status of tumour suppressor p53 tetramerization domain. EMBO J. 17, 2748–2758 (1998).

  24. Iwasaki, S. & Ingolia, N. T. Seeing translation. Science 352, 1391–1392 (2016).

  25. Jones, D. D., Stott, K. M., Howard, M. J. & Perham, R. N. Restricted motion of the lipoyl-lysine swinging arm in the pyruvate dehydrogenase complex of Escherichia coli. Biochemistry 39, 8448–8459 (2000).

  26. Radford, S. E., Laue, E. D., Perham, R. N., Martin, S. R. & Appella, E. Conformational flexibility and folding of synthetic peptides representing an interdomain segment of polypeptide chain in the pyruvate dehydrogenase multienzyme complex of Escherichia coli. J. Biol. Chem. 264, 767–775 (1989).

  27. Lengyel, J. S. et al. Extended polypeptide linkers establish the spatial architecture of a pyruvate dehydrogenase multienzyme complex. Structure 16, 93–103 (2008).

  28. Natan, E. & Joerger, A. C. Structure and kinetic stability of the p63 tetramerization domain. J. Mol. Biol. 415, 503–513 (2012).

  29. Natan, E. et al. Interaction of the p53 DNA-binding domain with its N-terminal extension modulates the stability of the p53 tetramer. J. Mol. Biol. 409, 358–368 (2011).

  30. Tsai, C. J. et al. Synonymous mutations and ribosome stalling can lead to altered folding pathways and distinct minima. J. Mol. Biol. 383, 281–291 (2008).

  31. O’Brien, E. P., Vendruscolo, M. & Dobson, C. M. Prediction of variable translation rate effects on cotranslational protein folding. Nat. Commun. 3, 868 (2012).

  32. Zhang, G. & Ignatova, Z. Folding at the birth of the nascent chain: coordinating translation with co-translational folding. Curr. Opin. Struct. Biol. 21, 25–31 (2011).

  33. Xu, C., Wang, S., Thibault, G. & Ng, D. T. Futile protein folding cycles in the ER are terminated by the unfolded protein O-mannosylation pathway. Science 340, 978–981 (2013).

  34. Reid, B. G. & Flynn, G. C. Chromophore formation in green fluorescent protein. Biochemistry 36, 6786–6791 (1997).

  35. Shimizu, Y., Kanamori, T. & Ueda, T. Protein synthesis by pure translation systems. Methods 36, 299–304 (2005).

  36. O’Brien, E. P., Christodoulou, J., Vendruscolo, M. & Dobson, C. M. Trigger factor slows co-translational folding through kinetic trapping while sterically protecting the nascent chain from aberrant cytosolic interactions. J. Am. Chem. Soc. 134, 10920–10932 (2012).

  37. Niwa, T., Kanamori, T., Ueda, T. & Taguchi, H. Global analysis of chaperone effects using a reconstituted cell-free translation system. Proc. Natl. Acad. Sci. USA 109, 8937–8942 (2012).

  38. Jaenicke, R. Protein folding: local structures, domains, subunits, and assemblies. Biochemistry 30, 3147–3161 (1991).

  39. Schröder, H., Langer, T., Hartl, F. U. & Bukau, B. DnaK, DnaJ and GrpE form a cellular chaperone machinery capable of repairing heat-induced protein damage. EMBO J. 12, 4137–4144 (1993).

  40. Calloni, G. et al. DnaK functions as a central hub in the E. coli chaperone network. Cell Rep. 1, 251–264 (2012).

  41. Brandt, F. et al. The native 3D organization of bacterial polysomes. Cell 136, 261–271 (2009).

  42. Marsh, J. A. & Teichmann, S. A. Structure, dynamics, assembly, and evolution of protein complexes. Annu. Rev. Biochem. 84, 551–575 (2015).

  43. Levy, E. D., De, S. & Teichmann, S. A. Cellular crowding imposes global constraints on the chemistry and evolution of proteomes. Proc. Natl. Acad. Sci. USA 109, 20461–20466 (2012).

  44. Jaenicke, R. & Lilie, H. Folding and association of oligomeric and multimeric proteins. Adv. Protein Chem. 53, 329–401 (2000).

  45. Garcia-Seisdedos, H., Empereur-Mot, C., Elad, N. & Levy, E. D. Proteins evolve on the edge of supramolecular self-assembly. Nature 548, 244–247 (2017).

  46. Peisajovich, S. G., Rockah, L. & Tawfik, D. S. Evolution of new protein topologies through multistep gene rearrangements. Nat. Genet. 38, 168–174 (2006).

  47. Tam, S. et al. The chaperonin TRiC blocks a huntingtin sequence element that promotes the conformational switch to aggregation. Nat. Struct. Mol. Biol. 16, 1279–1285 (2009).

  48. Levy, E. D. PiQSi: protein quaternary structure investigation. Structure 15, 1364–1367 (2007).

  49. Marsh, J. A. & Teichmann, S. A. Protein flexibility facilitates quaternary structure assembly and evolution. PLoS Biol. 12, e1001870 (2014).

  50. Arnold, K., Bordoli, L., Kopp, J. & Schwede, T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22, 195–201 (2006).

  51. Smith, K. et al. CIDRE: an illumination-correction method for optical microscopy. Nat. Methods 12, 404–406 (2015).

  52. Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100 (2006).

  53. Hall, M. et al. The WEKA data mining software: an update. SIGKDD Explor. 11, 10–18 (2009).

  54. Sobott, F., Hernández, H., McCammon, M. G., Tito, M. A. & Robinson, C. V. A tandem mass spectrometer for improved transmission and analysis of large macromolecular assemblies. Anal. Chem. 74, 1402–1407 (2002).

  55. Hernández, H. & Robinson, C. V. Determining the stoichiometry and interactions of macromolecular assemblies from mass spectrometry. Nat. Protoc. 2, 715–726 (2007).

  56. Niwa, T. et al. Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins. Proc. Natl. Acad. Sci. USA 106, 4201–4206 (2009).

  57. Ermak, D. L. & McCammon, J. A. Brownian dynamics with hydrodynamic interactions. J. Chem. Phys. 69, 1352–1360 (1978).

  58. Agirrezabala, X. et al. Structural insights into cognate versus near-cognate discrimination during decoding. EMBO J. 30, 1497–1507 (2011).

  59. Yang, F., Moss, L. G. & Phillips, G. N. Jr. The molecular structure of green fluorescent protein. Nat. Biotechnol. 14, 1246–1251 (1996).

  60. Jefferys, B. R., Kelley, L. A. & Sternberg, M. J. Protein folding requires crowd control in a simulated cell. J. Mol. Biol. 397, 1329–1338 (2010).

  61. Marsh, J. A. et al. Protein complexes are under evolutionary selection to assemble via ordered pathways. Cell 153, 461–470 (2013).

  62. Xiang, Z., Soto, C. S. & Honig, B. Evaluating conformational free energies: the colony energy and its application to the problem of loop prediction. Proc. Natl. Acad. Sci. USA 99, 7432–7437 (2002).

  63. Jeffrey, P. D., Gorina, S. & Pavletich, N. P. Crystal structure of the tetramerization domain of the p53 tumor suppressor at 1.7 angstroms. Science 267, 1498–1502 (1995).

  64. Antosiewicz, J., McCammon, J. A. & Gilson, M. K. The determinants of pKas in proteins. Biochemistry 35, 7819–7833 (1996).

  65. Winter, U. & Geyer, T. Coarse grained simulations of a small peptide: effects of finite damping and hydrodynamic interactions. J. Chem. Phys. 131, 104102 (2009).

  66. Frembgen-Kesner, T. & Elcock, A. H. Striking effects of hydrodynamic interactions on the simulated diffusion and folding of proteins. J. Chem. Theory Comput. 5, 242–256 (2009).

  67. Hogg, J. D. A DAG-based parallel Cholesky factorization for multicore systems. Report No. RAL-TR-2008-029 (Rutherford Appleton Laboratory, 2008).

  68. Velankar, S. et al. SIFTS: structure integration with function, taxonomy and sequences resource. Nucleic Acids Res. 41, D483–D489 (2013).


We are grateful to G. Kramer and B. Bukau (Deutsches Krebsforschungszentrum, Heidelberg, Germany) for their generous gift of trigger factor protein and to A. Drummond (Department of Biochemistry & Molecular Biology, University of Chicago) for the generous gift of plasmids. We also thank L. Byung-Gil for useful advice and N. Sanchez De Groot for technical support. We thank C. Vogel, M.T. Burgas and E. Arbely for helpful suggestions and critical reading. E.N. thanks N. Weiner and the ISEF foundation for their support. M.M.B., T.F. and G.C. are supported by the Medical Research Council (MC_U105185859). T.F. was also supported by the Boehringer Ingelheim Fonds. B.P. and C.P. thank the ‘Lendület’ Programme of the Hungarian Academy of Sciences and the Wellcome Trust for supporting this work, and C.P. thanks the European Research Council. B.K. is supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences and NKFI 120220. Z.M. is supported by GINOP-2.3.2-15-2016-00001. P.H. thanks the National Brain Research Programme and the TEKES Finland Distinguished Professor Grant for their support. S.A.T. thanks the Lister Institute, the MRC, the EMBL-European Bioinformatics Institute and the Wellcome Trust Sanger Institute. N.S. and T.E. were partly supported by Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), mainly through the Innovative Areas program “Chemistry for Multimolecular Crowding in Biosystems” (JSPS KAKENHI Grant No. JP17H06351), by the MEXT-Supported Program for the Strategic Research Foundation at Private Universities (2014–2019) and by The Hirao Taro Foundation of KONAN GAKUEN for Academic Research. J.M. is supported by an MRC Career Development Award (MR/M02122X/1). C.R. is supported by the Medical Research Council, Grant Reference MR/N020413/1. L.H.V. was supported by EMBO (award number ALTF 698-2012), the Directorate-General for Research and Innovation (FP7-PEOPLE-2010-IEF, ThPLAST 274192) and an EMBL Interdisciplinary Postdoctoral fellowship, supported by H2020 Marie Skłodowska-Curie Actions. B.P. and H.P. acknowledge funding from GINOP-2.3.2-15-2016-00026. A.H.E.’s work was supported by the National Institutes of Health through grant R01 GM099865. This work is dedicated to Jakob Natan and Shalom Marciano.

Author information




The study was conceived by E.N. and S.A.T. The study was coordinated by E.N. and S.A.T. The experiments were designed by E.N., L.H.V., B.K., B.P., C.P. and P.H. The experiments were conducted by E.N., T.E., N.S., A.H.E., B.K., L.D., E.O. and Z.M. Bioinformatic analysis was conducted by T.F. and J.A.M. Simulations were run by A.H.E. Machine-learning analysis was conducted by P.H. Data analysis was conducted by E.N., T.E., A.H.E., T.F., B.K., G.F., H.P., B.P., C.P. and G.C. The manuscript was written by E.N. and S.A.T. with contributions from all authors.

Corresponding authors

Correspondence to Eviatar Natan or Sarah A. Teichmann.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Distribution of interface-forming residues in homomers and heteromers

(a,b) Distribution of interface-forming residues for homomers, as in Fig. 2, divided into bacteria, eukaryotes and archaea. (a) All homomers in our dataset. (b) Only full-length or nearly full-length homomers, in which the crystallized construct contains >90% of the residues present in the UniProt sequence of the full protein. There is no apparent C-terminal interface enrichment in archaea; because this is also by far the smallest group, it is difficult to say whether this reflects genuine biological differences or small numbers. (c) Distribution of interface-forming residues for all heteromeric subunits as a control, showing no C-terminal enrichment. (d) Relative interface enrichment in the C-terminal halves of proteins compared with the N-terminal halves, for all species with >100 nonredundant heteromeric subunit structures in our dataset. In contrast to homomers, no significant enrichment is evident for heteromers. (e) Distribution of interface-forming residues for heteromeric subunits from bacteria, eukaryotes and archaea. Error bars for all plots are calculated as in Fig. 2, with 10⁴ bootstrap replicates. Again, no significant enrichment is evident. (f) Given the notable difference between the enrichment observed for humans and rats in Fig. 2b, we compared the C-terminal interface enrichment in homomers from humans and rats, considering only structures that are closely related (>70% sequence identity) between the two groups. There is very little difference between the enrichment seen in human and rat structures. Error bars are calculated as in Fig. 2, with 10⁴ bootstrap replicates.
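
The half-based enrichment in panel (d) and its bootstrap error bars can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' code: each protein is summarized by its interface area and monomer accessible surface area for the N- and C-terminal halves (the quantities listed in Supplementary Dataset 5), enrichment is assumed to be the interface density (interface/ASA) of C-terminal halves divided by that of N-terminal halves, and the error bar is the standard deviation of estimates obtained by resampling proteins with replacement. Function names and the toy values are hypothetical.

```python
import random
import statistics

# Each protein: (iface_n, asa_n, iface_c, asa_c) in Å², i.e. interface area and
# monomer accessible surface area of the N- and C-terminal halves (assumed layout).

def c_to_n_enrichment(proteins):
    """Interface density (interface / ASA) of C-terminal halves over that of N-terminal halves."""
    iface_n = sum(p[0] for p in proteins)
    asa_n = sum(p[1] for p in proteins)
    iface_c = sum(p[2] for p in proteins)
    asa_c = sum(p[3] for p in proteins)
    if iface_n == 0 or asa_n == 0 or asa_c == 0:
        raise ValueError("enrichment is undefined for this sample")
    return (iface_c / asa_c) / (iface_n / asa_n)

def bootstrap_se(proteins, n_boot=10_000, seed=0):
    """Standard error of the enrichment from resampling proteins with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(proteins, k=len(proteins))  # one bootstrap replicate
        estimates.append(c_to_n_enrichment(sample))
    return statistics.stdev(estimates)

if __name__ == "__main__":
    toy = [(100, 1000, 200, 1000), (50, 500, 150, 500)]  # hypothetical values
    print(c_to_n_enrichment(toy), bootstrap_se(toy))
```

Pooling areas before taking the ratio weights large proteins more heavily; averaging per-protein ratios instead is an equally plausible choice that the legend does not settle.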

Supplementary Figure 2 Comparison of the C-terminal interface enrichment for homomers of different lengths and from different symmetry groups

(a) Homomers were split into three equally sized groups of short, medium-length and long proteins, and the interface enrichment was plotted as in Fig. 2a. (b) Homomers were grouped by the most common symmetry types, and the interface enrichment was plotted as in Fig. 2a. (c) Relative enrichment of interface and solvent-accessible surface area across the length of proteins. These plots are analogous to Fig. 2a, except that interface enrichment is not normalized by solvent-accessible surface area; instead, both are shown separately. Solvent-accessible surface area is calculated for monomeric subunits only, ignoring intersubunit interactions.
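
The difference between the separate profiles in panel (c) and the normalized profile of Fig. 2a can be illustrated with a minimal Python sketch. It assumes per-residue interface areas and monomer ASAs, pooled across proteins into bins of relative sequence position; the binning scheme and all function names are assumptions for illustration.

```python
import math

def bin_by_position(proteins, n_bins=10):
    """Sum per-residue values into bins of relative sequence position (N to C).

    proteins: list of (iface, asa) pairs, each a per-residue list in N-to-C order.
    Returns (iface_bins, asa_bins).
    """
    iface_bins = [0.0] * n_bins
    asa_bins = [0.0] * n_bins
    for iface, asa in proteins:
        length = len(asa)
        for i, (a_i, s_i) in enumerate(zip(iface, asa)):
            b = min(int(i * n_bins / length), n_bins - 1)  # relative position -> bin
            iface_bins[b] += a_i
            asa_bins[b] += s_i
    return iface_bins, asa_bins

def relative_profile(bins):
    """Each bin relative to the mean bin (1.0 = no enrichment), shown separately as in panel (c)."""
    mean = sum(bins) / len(bins)
    return [x / mean for x in bins]

def normalized_enrichment(iface_bins, asa_bins):
    """Interface density per bin relative to the overall density (interface normalized by ASA)."""
    overall = sum(iface_bins) / sum(asa_bins)
    return [(i / s) / overall if s > 0 else math.nan for i, s in zip(iface_bins, asa_bins)]
```

For a toy four-residue chain whose interface lies entirely in the C-terminal half, `bin_by_position` with two bins gives an interface profile of 0 and 2 and an ASA profile of 1 and 1, so the normalized enrichment is also 0 and 2.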

Supplementary Figure 3 Workflow of the image-based high-throughput screen

(a) 611 native E. coli homomers with C-terminal GFP tags were compiled from the ASKA library, a complete set of E. coli K-12 ORF clones. All proteins were overexpressed in 96-well plates, and cells were imaged to determine the GFP signal. Each cell was classified into a phenotype using a supervised machine-learning algorithm. Two phenotypes were selected: (i) ‘Green’ cells, with a homogeneous and high GFP signal along the cell, and (ii) ‘Dark’ cells, with a GFP signal at background levels. Finally, each homomer was assigned to one of these groups according to the phenotype that predominated in the corresponding cell population. (b,c) Across all length and relative-interface-size categories, the N-terminal regions of ‘Dark’ homomers are enriched in interface-forming residues compared with those of ‘Green’ homomers. (b) Homomers were split into two equally sized groups of short (left) and long (right) proteins. The relative enrichment of interface-forming residues along the protein length is shown in green and grey for ‘Green’ and ‘Dark’ homomers, respectively, as in Fig. 2c. Error bars represent standard errors calculated from 10⁴ bootstrap replicates, as before. In both the long and short protein groups, ‘Dark’ homomers showed N-terminal regions with significant interface enrichment (indicated with *) compared with ‘Green’ homomers. (c) As in the length-based analysis, homomers were split into two equally sized groups, here based on relative interface size, calculated for each protein by dividing the homomer interface area by the total available surface area of the protein. In both the small and large relative-interface-size categories, ‘Dark’ homomers showed significant enrichment of interface-forming regions compared with ‘Green’ homomers. (d) The subgroup of cytoplasmic-only proteins was analyzed separately, and the enrichment trend was preserved. The membrane-protein dataset was too small to analyze and is not shown.
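
The per-homomer assignment in (a) and the equal-size splits in (b,c) can be sketched in Python as below. The sketch assumes that each cell already carries a phenotype label from the classifier, that a homomer is assigned to ‘Green’ or ‘Dark’ only when that label is the single most frequent one in its cell population (ties, or a different predominant phenotype, leave it unassigned), and that equal-size groups come from a split at the median rank. All names are hypothetical.

```python
from collections import Counter

def classify_homomer(cell_labels):
    """Return 'Green' or 'Dark' if it is the single predominant per-cell phenotype, else None."""
    ranked = Counter(cell_labels).most_common()
    if not ranked:
        return None
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie: no single predominant phenotype
    return top_label if top_label in ("Green", "Dark") else None

def relative_interface_size(interface_area, total_surface_area):
    """Homomer interface area divided by the total available surface area of the protein."""
    return interface_area / total_surface_area

def median_split(values):
    """Indices of the lower and upper halves, i.e. two equally sized groups ranked by value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    half = len(values) // 2
    return order[:half], order[half:]
```

The same `median_split` applies to protein length in (b) and to relative interface size in (c); with an odd number of homomers, the sketch places the extra protein in the upper group.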

Supplementary Figure 4 Constructs of the YFP sublibrary and flow cytometry and ESI-MS characterization

(a) The different constructs are identical or nearly identical in sequence composition. Three N-terminal variants were used in this work, monomeric, dimeric and tetrameric, which differ by a single amino acid residue. In addition, a construct with a tetrameric oligomerization domain positioned at the C terminus was used. (b) Flow cytometry measurements of all four variants. The lines along the dot plots indicate the intensities of the tetrameric (@N) and monomeric variants. The N-terminal tetrameric variants always show the lowest fluorescence level. The ratio is shown in (c). (d) Western blot of Tet-SL-YFP, YFP-SL-Tet and empty vector, as shown in Fig. 3b. The blot shown is representative; each variant was expressed from three different colonies (n = 3), three times per colony (total n = 9). (e–h) Verification of the oligomeric state of YFP sublibrary constructs by ESI-MS for four constructs: (e) Tet-SL-YFP, (f) YFP-SL-Tet, (g) YFP-LL-Tet and (h) Mono-SL-YFP. All spectra show charge-state distributions with deconvoluted masses in agreement with the theoretical masses calculated from the amino acid sequences. Importantly, tandem MS (MS/MS) experiments confirmed that the first three constructs (e–g) are tetramers: applying high energy ejected a monomeric subunit and left a trimeric complex, as shown in the insets.
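
The mass checks described for panels (e–h) rest on standard charge-state arithmetic for positive-mode ESI ions of the form [M + zH]z+. A minimal Python sketch follows; the 30 kDa subunit mass used in the check is hypothetical, as is the charge split in the MS/MS example (in collision-induced dissociation of protein complexes, the ejected monomer typically carries a disproportionately large share of the charge).

```python
PROTON_MASS = 1.007276  # Da

def mz(mass, z):
    """m/z of an [M + zH]z+ ion with neutral mass `mass` (Da)."""
    return (mass + z * PROTON_MASS) / z

def deconvolve_adjacent(mz_low, mz_high):
    """Charge and neutral mass from two adjacent peaks of one charge-state series.

    mz_low < mz_high; mz_low carries charge z + 1 and mz_high carries charge z.
    Returns (z, mass) for the mz_high peak.
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON_MASS)

def msms_fragments(subunit_mass, precursor_z, monomer_z, n_subunits=4):
    """m/z of the ejected monomer and of the remaining (n - 1)-mer after dissociation.

    The precursor charge is split between the two products.
    """
    remainder_z = precursor_z - monomer_z
    return (mz(subunit_mass, monomer_z),
            mz((n_subunits - 1) * subunit_mass, remainder_z))
```

For a hypothetical 120 kDa tetramer, feeding the 25+ and 24+ peaks into `deconvolve_adjacent` recovers z = 24 and the 120 kDa mass, which is the kind of agreement with sequence-derived masses the legend describes.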

Supplementary Figure 5 Flow cytometry of constructs with tetrameric (Tet@N) or monomeric (Mono@N) oligomerization domain

(a) Flow cytometry of YFP constructs with short, medium or long linkers. The lines along the dot plots indicate the intensities of the tetrameric and monomeric variants. The calculated ratio is shown in Fig. 4b. The plots are representative of all experiments (n > 5). (b,c) Flow cytometry of long-linker GFP or fGFP constructs with a tetrameric (Tet@N) or monomeric (Mono@N) oligomerization domain at 37 °C and 18 °C. (b) The lines along the dot plots indicate the intensities of the monomeric constructs. As shown by confocal microscopy (Fig. 4c), there is only a very small difference between the fGFP variants. In contrast, the tetrameric GFP variant has significantly lower fluorescence than the monomeric variant. This difference is diminished when the strains are grown at 18 °C. (c) Analysis and ratios calculated from the data presented in (b). (Independent cell-culture replicates; **P < 0.01, *P < 0.05, two-sided t-test; error bars represent s.d.)
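
The tetramer-to-monomer ratios in panel (c) and the accompanying two-sided t-test can be illustrated with a minimal Python sketch. It assumes one summary fluorescence value per replicate culture and construct, and a pooled-variance Student's t statistic; the legend does not state whether variances were pooled. The replicate values below are hypothetical. The two-sided P value follows from the t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_ind`, which computes this pooled statistic by default.

```python
import math
import statistics

def fluorescence_ratio(tet, mono):
    """Mean Tet@N fluorescence divided by mean Mono@N fluorescence across replicates."""
    return statistics.mean(tet) / statistics.mean(mono)

def students_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

if __name__ == "__main__":
    tet = [1.0, 2.0, 3.0]   # hypothetical replicate values
    mono = [4.0, 5.0, 6.0]
    print(fluorescence_ratio(tet, mono), students_t(tet, mono))
```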

Supplementary Figure 6 Luciferase misassembly in vivo and in vitro is similar to the slow-folding GFP rather than the fast-folding fGFP and YFP

The luciferase (Luc) reporter was chosen because its fold differs substantially from those of YFP and GFP and because it folds more slowly. (a) In vivo luminescence after normalization to the number of cells. Tet-SL-Luc shows almost no signal. A single amino acid substitution that generates a monomeric variant increases the signal dramatically. Increasing the linker length of the tetrameric variant from short to long also increases the signal dramatically. As with the short-linker variants, the monomeric long-linker variant had a much higher luminescence signal than the tetrameric variant. (b) In vitro results under polysomic conditions in the PURE system. The results agree with the in vivo experiments. (c) Comparison of monosomic with polysomic conditions (as presented in b). According to our hypothesis, reducing the local ribosome concentration should decrease the frequency of cotranslational assembly events and thus decrease misassembly. The results are consistent with this hypothesis. Moreover, the C-terminal tetrameric construct, which cannot assemble cotranslationally, showed no significant difference between monosomic and polysomic conditions, further supporting the hypothesis. (d,e) As in Fig. 5, we tested the same three chaperone groups: the “KJE mix” (DnaK, DnaJ and GrpE), the “GroE mix” (GroEL and GroES) and trigger factor (TF). Overall, the effect of chaperones was similar to, and even stronger than, that seen for the GFP sublibrary. (*P < 0.05, **P < 0.01; results represent the mean of replicates; error bars represent s.d.)

Supplementary Figure 7 PURE System western blot of fGFP and GFP

(a) Western blot of the different GFP constructs examined. For each construct, both the polysomic and monosomic conditions are shown. (b) Mean quantification of (a). (Error bars represent s.d.)

Supplementary Figure 8 The effect of chaperones in vitro and in vivo

(a–c) Summary of GFP, fGFP and Luc sublibrary expression with or without chaperones in the PURE system. In each row, the tetrameric and monomeric constructs of one sublibrary are examined: (a) GFP, (b) fGFP and (c) Luc. Overall, the effect of chaperones correlated with oligomeric state (tetramer versus monomer) and with folding rate (fast- versus slow-folding proteins). The strongest rescue was achieved by the KJE mix, particularly for the slow-folding tetrameric Luc and GFP constructs. (Results represent the mean of replicates; error bars represent s.d.) (d–g) Analysis of the influence of chaperones on homomeric and heteromeric complexes in E. coli, using data from ref. 25. (d) Depletion of misfolded homomeric and heteromeric protein complexes from the soluble fraction of an E. coli ΔKJT mutant (DnaK/DnaJ and TF deleted). (e) Change in abundance of homomeric and heteromeric protein complexes in the insoluble fraction of the same mutant strain. (f) Interaction of homomeric and heteromeric complex proteins with DnaK (PD/BG ratio). The relative frequencies were normalized to account for the numbers of homomeric and heteromeric complexes. (g) Histograms of absolute numbers of interactions of homomeric and heteromeric complex proteins with DnaK (PD/BG ratio).

Supplementary Figure 9 Snapshots of the endpoint of all in silico simulations

Snapshots are shown for the three main constructs of the YFP sublibrary: Tet-SL-YFP, YFP-SL-Tet and Tet-LL-YFP. Representative simulations can be viewed in Movies S1–S3. (a) Endpoints of twenty simulations of Tet-SL-YFP. (b) Endpoints of ten simulations of YFP-SL-Tet. (c) Endpoints of twenty simulations of Tet-LL-YFP. A symbol at the bottom left of a snapshot indicates that Tet assembled.

Supplementary Figure 10 Evolutionary constraints as a function of assembly

(a) Real-time growth of strains expressing the different constructs used in this work. Each curve is the average of three replicates of the same culture, and the plot is representative of three such averaged curves. All experiments show the same trend: strains expressing N-terminal tetrameric constructs consistently grow more slowly than those expressing the other variants. Error bars represent s.d. (b) All available E. coli homomeric protein structures were analyzed to compile a library of structures of proteins that (i) have a discrete oligomerization domain and (ii) have data predicting whether the protein folds post- or cotranslationally. Three such structures were found. Oligomerization domains are shown in red, domain linkers in cyan and other domains in yellow. CTP synthetase (PDB: 1s1m) has an N-terminal oligomerization domain, which may be compensated for by a linker (cyan). The other two proteins, aspartate-semialdehyde dehydrogenase (PDB: 1t4b) and alpha-N-acetylgalactosaminidase (PDB: 2p53), have C-terminal oligomerization domains and no linkers. More information about these proteins is provided in Supplementary Dataset 4.
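
Growth differences such as those in panel (a) are often summarized by the maximum specific growth rate. The legend reports curves rather than rates, so the following is only a minimal Python sketch of one common approach, assumed here: the steepest least-squares slope of ln(OD) over a sliding window of consecutive time points. The window size and function names are hypothetical.

```python
import math

def max_growth_rate(times_h, od, window=5):
    """Maximum specific growth rate (per hour): the steepest least-squares slope of
    ln(OD) versus time over any `window` consecutive time points."""
    if len(times_h) != len(od) or len(od) < window:
        raise ValueError("need at least `window` paired time points")
    log_od = [math.log(x) for x in od]
    best = -math.inf
    for start in range(len(od) - window + 1):
        t = times_h[start:start + window]
        y = log_od[start:start + window]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        best = max(best, sxy / sxx)  # keep the steepest window
    return best

def doubling_time(rate_per_h):
    """Doubling time (hours) for exponential growth at the given specific rate."""
    return math.log(2) / rate_per_h
```

A windowed fit is less sensitive to single noisy readings than a point-to-point derivative, at the cost of slightly underestimating very short exponential phases.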

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10, Supplementary Table 1 and Supplementary Note 1

Life Sciences Reporting Summary

Supplementary Dataset 1

‘Dark’ and ‘Green’ homomers from the genome-wide in vivo screen. For each homomer, the table lists the parameters from the in vivo screen, the structural parameters describing interface location and the Western blot analysis.

Supplementary Dataset 2

Western blot analysis testing the expression of all 136 ‘Dark’ homomers and a selected set of 25 ‘Green’ homomers. The homomers were detected using a GFP-specific antibody. GFP negative (C–) and positive (C+) controls were loaded on each gel. To verify loading, Coomassie Brilliant Blue (CBB)-stained separate gels (#1–5) or the membranes used for the Western blot assays (#6–14) are shown above each Western blot image. Asterisks mark the expressed GFP-tagged proteins. Molecular masses (in kDa) are indicated on the left.

Supplementary Dataset 3

Protein complex co-immunoprecipitation (Co-IP). A strain carrying an empty vector and strains expressing the N-terminal tetrameric (Tet-SL-YFP) and C-terminal tetrameric (YFP-SL-Tet) constructs were harvested a few hours after induction. The cell contents were then mixed with magnetic anti-HA antibody beads, and the samples were washed and eluted. The eluates were run on an SDS gel, and selected bands were analyzed by MS. The identified proteins and their fold changes are listed.

Supplementary Dataset 4

Characterization of the representative proteins shown in Supplementary Figure 10. The table describes E. coli proteins that have oligomerization domains and a known full-length structure, together with their associated folding parameters.

Supplementary Dataset 5

Nonredundant sets of homomer structures. These comprise all complexes filtered for sequence redundancy, either across all structures or at the species level, and a separate set containing only full-length structures. The total interface and monomer accessible surface areas (in Å²) are given for the N-terminal and C-terminal halves of each protein. The same data are provided for heteromers.
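To illustrate how per-half surface areas like those in this data set can express a C-terminal bias, the sketch below splits per-residue values at the sequence midpoint and compares the C-terminal share of interface area with the C-terminal share of monomer surface area. This is an assumption-laden illustration, not the authors' analysis: the per-residue input lists, the midpoint split (with an odd middle residue assigned to the C-terminal half), and the ratio-based `c_terminal_enrichment` metric are all hypothetical choices.

```python
def half_sums(values):
    """Sum per-residue values over the N- and C-terminal halves.

    Assumes values are in sequence order (N to C terminus); for odd lengths
    the middle residue is counted in the C-terminal half.
    """
    mid = len(values) // 2
    return sum(values[:mid]), sum(values[mid:])


def c_terminal_enrichment(interface_asa, monomer_asa):
    """Hypothetical metric: C-terminal share of interface ASA divided by
    C-terminal share of monomer ASA (both in Å², per residue).

    A value > 1 means interface surface is concentrated in the C-terminal
    half beyond what the distribution of total surface would predict.
    """
    if len(interface_asa) != len(monomer_asa):
        raise ValueError("per-residue lists must have equal length")
    int_n, int_c = half_sums(interface_asa)
    mon_n, mon_c = half_sums(monomer_asa)
    if int_n + int_c == 0 or mon_n + mon_c == 0:
        raise ValueError("total surface area must be non-zero")
    int_share = int_c / (int_n + int_c)
    mon_share = mon_c / (mon_n + mon_c)
    return int_share / mon_share


# Toy values, not taken from the data set
print(half_sums([0, 10, 0, 30, 20, 40]))
print(c_terminal_enrichment([0, 10, 0, 30, 20, 40], [50] * 6))
```

Normalizing by the monomer surface share guards against a trivial bias in which one half simply exposes more surface overall; other normalizations would be equally defensible.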


Supplementary Video 1

Simulation of Tet@N with a short linker. In all movies, the red segment is the Tet and the yellow segment is the YFP β-barrel. Both cotranslational folding and misassembly take place once the Tet domains emerge from the ribosome tunnels.

Supplementary Video 2

Simulation of Tet@C with a short linker. No cotranslational assembly occurs: the Tet of the leading ribosome leaves the ribosomal tunnel before the trailing ribosome has translated its own Tet, which gives the first chain time to diffuse away before the second Tet leaves the ribosome tunnel.

Supplementary Video 3

Simulation of Tet@N with a long linker. Cotranslational assembly takes place, but misassembly does not. As in Movies S1 and S2, the red segment is the Tet and the yellow segment is the YFP reporter; the cyan segment is the long linker. The reduced frequency of misassembly is consistent with the in vivo and in vitro observations.

About this article

Cite this article

Natan, E., Endoh, T., Haim-Vilmovsky, L. et al. Cotranslational protein assembly imposes evolutionary constraints on homomeric proteins. Nat Struct Mol Biol 25, 279–288 (2018).
