Cotranslational protein assembly imposes evolutionary constraints on homomeric proteins

Natan, Eviatar; Endoh, Tamaki; Haim-Vilmovsky, Liora; Flock, Tilman; Chalancon, Guilhem; Hopper, Jonathan T. S.; Kintses, Bálint; Horvath, Peter; Daruka, Lejla; Fekete, Gergely; Pál, Csaba; Papp, Balázs; Oszi, Erika; Magyar, Zoltán; Marsh, Joseph A.; Elcock, Adrian H.; Babu, M. Madan; Robinson, Carol V.; Sugimoto, Naoki; Teichmann, Sarah A.

doi:10.1038/s41594-018-0029-5

Article
Published: 12 February 2018

Nature Structural & Molecular Biology volume 25, pages 279–288 (2018)


Abstract

Cotranslational protein folding can facilitate rapid formation of functional structures. However, it can also cause premature assembly of protein complexes, if two interacting nascent chains are in close proximity. By analyzing known protein structures, we show that homomeric protein contacts are enriched toward the C termini of polypeptide chains across diverse proteomes. We hypothesize that this is the result of evolutionary constraints for folding to occur before assembly. Using high-throughput imaging of protein homomers in Escherichia coli and engineered protein constructs with N- and C-terminal oligomerization domains, we show that, indeed, proteins with C-terminal homomeric interface residues consistently assemble more efficiently than those with N-terminal interface residues. Using in vivo, in vitro and in silico experiments, we identify features that govern successful assembly of homomers, which have implications for protein design and expression optimization.
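To illustrate the kind of positional analysis summarized above, the following Python sketch computes a binned profile of interface-residue enrichment along relative chain position. It assumes each chain is given as its length plus a set of 0-based interface residue indices. The function name `interface_position_profile`, the ten-bin default and the residue-count weighting (rather than surface-area weighting) are hypothetical, illustrative choices, not the authors' published procedure.

```python
from typing import List, Sequence, Set, Tuple

def interface_position_profile(
    chains: Sequence[Tuple[int, Set[int]]], n_bins: int = 10
) -> List[float]:
    """Per-bin enrichment of interface residues along relative chain position.

    chains: one (chain_length, 0-based interface residue indices) pair per chain
    (hypothetical input format). Returns observed/expected ratios: 1.0 means no
    positional preference; values > 1 in the last bins indicate C-terminal enrichment.
    """
    residues = [0] * n_bins   # all residues assigned to each bin
    interface = [0] * n_bins  # interface residues assigned to each bin
    for length, iface in chains:
        for i in range(length):
            # Relative position of the residue centre, mapped to a bin index
            b = min(int((i + 0.5) / length * n_bins), n_bins - 1)
            residues[b] += 1
            if i in iface:
                interface[b] += 1
    overall = sum(interface) / sum(residues)  # pooled interface fraction
    return [
        (interface[b] / residues[b]) / overall if residues[b] and overall else float("nan")
        for b in range(n_bins)
    ]
```

For a single 10-residue chain whose last two residues form the interface, `interface_position_profile([(10, {8, 9})], n_bins=2)` gives 0.0 for the N-terminal half and 2.0 for the C-terminal half.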


**Fig. 1: Illustration of the possible cotranslational assembly of homomeric proteins.**

**Fig. 2: Interface residues of native homomers are C-terminally enriched, correlating with stability of the protein.**

**Fig. 3: Position of the oligomerization domain is crucial for protein solubility.**

**Fig. 4: Extending the linker decreases misassembly rates.**

**Fig. 5: Misassembly as a function of oligomerization, folding rate and ribosome density, using the PURE in vitro translation system.**

**Fig. 6: In silico simulation of the translation of different constructs.**

**Fig. 7: Cotranslational (mis)assembly as a function of sequence-intrinsic features.**


References

Elcock, A. H. Molecular simulations of cotranslational protein folding: fragment stabilities, folding cooperativity, and trapping in the ribosome. PLOS Comput. Biol. 2, e98 (2006).
Sander, I. M., Chaney, J. L. & Clark, P. L. Expanding Anfinsen’s principle: contributions of synonymous codon selection to rational protein design. J. Am. Chem. Soc. 136, 858–861 (2014).
Pechmann, S. & Frydman, J. Evolutionary conservation of codon optimality reveals hidden signatures of cotranslational folding. Nat. Struct. Mol. Biol. 20, 237–243 (2013).
Levy, E. D. & Teichmann, S. Structural, evolutionary, and assembly principles of protein oligomerization. Prog. Mol. Biol. Transl. Sci. 117, 25–51 (2013).
Goodsell, D. S. & Olson, A. J. Structural symmetry and protein function. Annu. Rev. Biophys. Biomol. Struct. 29, 105–153 (2000).
Natan, E., Wells, J. N., Teichmann, S. A. & Marsh, J. A. Regulation, evolution and consequences of cotranslational protein complex assembly. Curr. Opin. Struct. Biol. 42, 90–97 (2017).
Shieh, Y. W. et al. Operon structure and cotranslational subunit association direct protein assembly in bacteria. Science 350, 678–680 (2015).
Borgia, M. B. et al. Single-molecule fluorescence reveals sequence-specific misfolding in multidomain proteins. Nature 474, 662–665 (2011).
Wright, C. F., Teichmann, S. A., Clarke, J. & Dobson, C. M. The importance of sequence diversity in the aggregation and evolution of proteins. Nature 438, 878–881 (2005).
Nissley, D. A. & O’Brien, E. P. Timing is everything: unifying codon translation rates and nascent proteome behavior. J. Am. Chem. Soc. 136, 17892–17898 (2014).
Buhr, F. et al. Synonymous codons direct cotranslational folding toward different protein conformations. Mol. Cell 61, 341–351 (2016).
Ahnert, S. E., Marsh, J. A., Hernández, H., Robinson, C. V. & Teichmann, S. A. Principles of assembly reveal a periodic table of protein complexes. Science 350, aaa2245 (2015).
Kitagawa, M. et al. Complete set of ORF clones of Escherichia coli ASKA library (a complete set of E. coli K-12 ORF archive): unique resources for biological research. DNA Res. 12, 291–299 (2005).
Waldo, G. S., Standish, B. M., Berendzen, J. & Terwilliger, T. C. Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17, 691–695 (1999).
Ugrinov, K. G. & Clark, P. L. Cotranslational folding increases GFP folding yield. Biophys. J. 98, 1312–1320 (2010).
Wells, J. N., Bergendahl, L. T. & Marsh, J. A. Co-translational assembly of protein complexes. Biochem. Soc. Trans. 43, 1221–1226 (2015).
Gaglia, G., Guan, Y., Shah, J. V. & Lahav, G. Activation and control of p53 tetramerization in individual living cells. Proc. Natl. Acad. Sci. USA 110, 15497–15501 (2013).
Lomax, M. E., Barnes, D. M., Hupp, T. R., Picksley, S. M. & Camplejohn, R. S. Characterization of p53 oligomerization domain mutations isolated from Li-Fraumeni and Li-Fraumeni like family members. Oncogene 17, 643–649 (1998).
Rajagopalan, S., Huang, F. & Fersht, A. R. Single-molecule characterization of oligomerization kinetics and equilibria of the tumor suppressor p53. Nucleic Acids Res. 39, 2294–2303 (2011).
Mateu, M. G., Sánchez Del Pino, M. M. & Fersht, A. R. Mechanism of folding and assembly of a small tetrameric protein domain from tumor suppressor p53. Nat. Struct. Biol. 6, 191–198 (1999).
Nicholls, C. D., McLure, K. G., Shields, M. A. & Lee, P. W. Biogenesis of p53 involves cotranslational dimerization of monomers and posttranslational dimerization of dimers. Implications on the dominant negative effect. J. Biol. Chem. 277, 12937–12945 (2002).
Mateu, M. G. & Fersht, A. R. Mutually compensatory mutations during evolution of the tetramerization domain of tumor suppressor p53 lead to impaired hetero-oligomerization. Proc. Natl. Acad. Sci. USA 96, 3595–3599 (1999).
Mateu, M. G. & Fersht, A. R. Nine hydrophobic side chains are key determinants of the thermodynamic stability and oligomerization status of tumour suppressor p53 tetramerization domain. EMBO J. 17, 2748–2758 (1998).
Iwasaki, S. & Ingolia, N. T. Seeing translation. Science 352, 1391–1392 (2016).
Jones, D. D., Stott, K. M., Howard, M. J. & Perham, R. N. Restricted motion of the lipoyl-lysine swinging arm in the pyruvate dehydrogenase complex of Escherichia coli. Biochemistry 39, 8448–8459 (2000).
Radford, S. E., Laue, E. D., Perham, R. N., Martin, S. R. & Appella, E. Conformational flexibility and folding of synthetic peptides representing an interdomain segment of polypeptide chain in the pyruvate dehydrogenase multienzyme complex of Escherichia coli. J. Biol. Chem. 264, 767–775 (1989).
Lengyel, J. S. et al. Extended polypeptide linkers establish the spatial architecture of a pyruvate dehydrogenase multienzyme complex. Structure 16, 93–103 (2008).
Natan, E. & Joerger, A. C. Structure and kinetic stability of the p63 tetramerization domain. J. Mol. Biol. 415, 503–513 (2012).
Natan, E. et al. Interaction of the p53 DNA-binding domain with its n-terminal extension modulates the stability of the p53 tetramer. J. Mol. Biol. 409, 358–368 (2011).
Tsai, C. J. et al. Synonymous mutations and ribosome stalling can lead to altered folding pathways and distinct minima. J. Mol. Biol. 383, 281–291 (2008).
O’Brien, E. P., Vendruscolo, M. & Dobson, C. M. Prediction of variable translation rate effects on cotranslational protein folding. Nat. Commun. 3, 868 (2012).
Zhang, G. & Ignatova, Z. Folding at the birth of the nascent chain: coordinating translation with co-translational folding. Curr. Opin. Struct. Biol. 21, 25–31 (2011).
Xu, C., Wang, S., Thibault, G. & Ng, D. T. Futile protein folding cycles in the ER are terminated by the unfolded protein O-mannosylation pathway. Science 340, 978–981 (2013).
Reid, B. G. & Flynn, G. C. Chromophore formation in green fluorescent protein. Biochemistry 36, 6786–6791 (1997).
Shimizu, Y., Kanamori, T. & Ueda, T. Protein synthesis by pure translation systems. Methods 36, 299–304 (2005).
O’Brien, E. P., Christodoulou, J., Vendruscolo, M. & Dobson, C. M. Trigger factor slows co-translational folding through kinetic trapping while sterically protecting the nascent chain from aberrant cytosolic interactions. J. Am. Chem. Soc. 134, 10920–10932 (2012).
Niwa, T., Kanamori, T., Ueda, T. & Taguchi, H. Global analysis of chaperone effects using a reconstituted cell-free translation system. Proc. Natl. Acad. Sci. USA 109, 8937–8942 (2012).
Jaenicke, R. Protein folding: local structures, domains, subunits, and assemblies. Biochemistry 30, 3147–3161 (1991).
Schröder, H., Langer, T., Hartl, F. U. & Bukau, B. DnaK, DnaJ and GrpE form a cellular chaperone machinery capable of repairing heat-induced protein damage. EMBO J. 12, 4137–4144 (1993).
Calloni, G. et al. DnaK functions as a central hub in the E. coli chaperone network. Cell Rep. 1, 251–264 (2012).
Brandt, F. et al. The native 3D organization of bacterial polysomes. Cell 136, 261–271 (2009).
Marsh, J. A. & Teichmann, S. A. Structure, dynamics, assembly, and evolution of protein complexes. Annu. Rev. Biochem. 84, 551–575 (2015).
Levy, E. D., De, S. & Teichmann, S. A. Cellular crowding imposes global constraints on the chemistry and evolution of proteomes. Proc. Natl. Acad. Sci. USA 109, 20461–20466 (2012).
Jaenicke, R. & Lilie, H. Folding and association of oligomeric and multimeric proteins. Adv. Protein Chem. 53, 329–401 (2000).
Garcia-Seisdedos, H., Empereur-Mot, C., Elad, N. & Levy, E. D. Proteins evolve on the edge of supramolecular self-assembly. Nature 548, 244–247 (2017).
Peisajovich, S. G., Rockah, L. & Tawfik, D. S. Evolution of new protein topologies through multistep gene rearrangements. Nat. Genet. 38, 168–174 (2006).
Tam, S. et al. The chaperonin TRiC blocks a huntingtin sequence element that promotes the conformational switch to aggregation. Nat. Struct. Mol. Biol. 16, 1279–1285 (2009).
Levy, E. D. PiQSi: protein quaternary structure investigation. Structure 15, 1364–1367 (2007).
Marsh, J. A. & Teichmann, S. A. Protein flexibility facilitates quaternary structure assembly and evolution. PLoS Biol. 12, e1001870 (2014).
Arnold, K., Bordoli, L., Kopp, J. & Schwede, T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22, 195–201 (2006).
Smith, K. et al. CIDRE: an illumination-correction method for optical microscopy. Nat. Methods 12, 404–406 (2015).
Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100 (2006).
Hall, M. et al. The WEKA data mining software: an update. SIGKDD Explor. 11, 10–18 (2009).
Sobott, F., Hernández, H., McCammon, M. G., Tito, M. A. & Robinson, C. V. A tandem mass spectrometer for improved transmission and analysis of large macromolecular assemblies. Anal. Chem. 74, 1402–1407 (2002).
Hernández, H. & Robinson, C. V. Determining the stoichiometry and interactions of macromolecular assemblies from mass spectrometry. Nat. Protoc. 2, 715–726 (2007).
Niwa, T. et al. Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins. Proc. Natl. Acad. Sci. USA 106, 4201–4206 (2009).
Ermak, D. L. & McCammon, J. Brownian dynamics with hydrodynamic interactions. J. Chem. Phys. 69, 1352 (1978).
Agirrezabala, X. et al. Structural insights into cognate versus near-cognate discrimination during decoding. EMBO J. 30, 1497–1507 (2011).
Yang, F., Moss, L. G. & Phillips, G. N. Jr. The molecular structure of green fluorescent protein. Nat. Biotechnol. 14, 1246–1251 (1996).
Jefferys, B. R., Kelley, L. A. & Sternberg, M. J. Protein folding requires crowd control in a simulated cell. J. Mol. Biol. 397, 1329–1338 (2010).
Marsh, J. A. et al. Protein complexes are under evolutionary selection to assemble via ordered pathways. Cell 153, 461–470 (2013).
Xiang, Z., Soto, C. S. & Honig, B. Evaluating conformational free energies: the colony energy and its application to the problem of loop prediction. Proc. Natl. Acad. Sci. USA 99, 7432–7437 (2002).
Jeffrey, P. D., Gorina, S. & Pavletich, N. P. Crystal structure of the tetramerization domain of the p53 tumor suppressor at 1.7 angstroms. Science 267, 1498–1502 (1995).
Antosiewicz, J., McCammon, J. A. & Gilson, M. K. The determinants of pKas in proteins. Biochemistry 35, 7819–7833 (1996).
Winter, U. & Geyer, T. Coarse grained simulations of a small peptide: effects of finite damping and hydrodynamic interactions. J. Chem. Phys. 131, 104102 (2009).
Frembgen-Kesner, T. & Elcock, A. H. Striking effects of hydrodynamic interactions on the simulated diffusion and folding of proteins. J. Chem. Theory Comput. 5, 242–256 (2009).
Hogg, J. D. A DAG-based parallel Cholesky factorization for multicore systems. Report No. RAL-TR-2008-029 (Rutherford Appleton Laboratory, 2008).
Velankar, S. et al. SIFTS: structure integration with function, taxonomy and sequences resource. Nucleic Acids Res. 41, D483–D489 (2013).


Acknowledgements

We are grateful to G. Kramer and B. Bukau (Deutsches Krebsforschungszentrum, Heidelberg, Germany) for their generous gift of trigger factor protein and to A. Drummond (Department of Biochemistry & Molecular Biology, University of Chicago) for the generous gift of plasmids. We also thank L. Byung-Gil for useful advice and N. Sanchez De Groot for technical support. We thank C. Vogel, M.T. Burgas and E. Arbely for helpful suggestions and critical reading. E.N. thanks N. Weiner and the ISEF foundation for their support. M.M.B., T.F. and G.C. are supported by the Medical Research Council (MC_U105185859). T.F. was also supported by the Boehringer Ingelheim Fonds. B.P. and C.P. thank the ‘Lendület’ Programme of the Hungarian Academy of Sciences and the Wellcome Trust for supporting this work, and C.P. also thanks the European Research Council. B.K. is supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences and NKFI 120220. Z.M. is supported by GINOP-2.3.2-15-2016-00001. P.H. thanks the National Brain Research Programme and the TEKES Finland Distinguished Professor Grant for their support. S.A.T. thanks the Lister Institute, the MRC, the EMBL-European Bioinformatics Institute and the Wellcome Trust Sanger Institute. N.S. and T.E. were partly supported by Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), in particular Innovative Areas “Chemistry for Multimolecular Crowding in Biosystems” (JSPS KAKENHI Grant No. JP17H06351), the MEXT-Supported Program for the Strategic Research Foundation at Private Universities (2014–2019) and The Hirao Taro Foundation of KONAN GAKUEN for Academic Research. J.M. is supported by an MRC Career Development Award (MR/M02122X/1). C.R. is supported by the Medical Research Council, Grant Reference MR/N020413/1. L.H.V. was supported by EMBO (award number ALTF 698-2012), the Directorate-General for Research and Innovation (FP7-PEOPLE-2010-IEF, ThPLAST 274192) and an EMBL Interdisciplinary Postdoctoral fellowship, supported by H2020 Marie Skłodowska-Curie Actions. B.P. and H.P. acknowledge funding from GINOP-2.3.2-15-2016-00026. A.H.E.’s work was supported by the National Institutes of Health through grant R01 GM099865. This work is dedicated to Jakob Natan and Shalom Marciano.

Author information

Authors and Affiliations

The Aleph Lab Ltd, Oxford, UK
Eviatar Natan
Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan
Tamaki Endoh & Naoki Sugimoto
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
Liora Haim-Vilmovsky
Wellcome Trust Sanger Institute, Wellcome Genome Campus, Cambridge, UK
Liora Haim-Vilmovsky & Sarah A. Teichmann
MRC Laboratory of Molecular Biology, Cambridge, UK
Tilman Flock, Guilhem Chalancon & M. Madan Babu
OMass Technologies Ltd, Centre for Innovation & Enterprise, Oxford, UK
Jonathan T. S. Hopper
Synthetic and System Biology Unit, Biological Research Center of the Hungarian Academy of Sciences, Szeged, Hungary
Bálint Kintses, Peter Horvath, Lejla Daruka, Gergely Fekete, Csaba Pál & Balázs Papp
Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
Peter Horvath
Institute of Plant Biology, Biological Research Center of the Hungarian Academy of Sciences, Szeged, Hungary
Erika Oszi & Zoltán Magyar
MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
Joseph A. Marsh
Department of Biochemistry, University of Iowa, Iowa City, IA, USA
Adrian H. Elcock
Department of Chemistry, University of Oxford, Oxford, UK
Carol V. Robinson
Graduate School of Frontiers of Innovative Research in Science and Technology (FIRST), Konan University, Kobe, Japan
Naoki Sugimoto
Cavendish Laboratory, University of Cambridge, Cambridge, UK
Sarah A. Teichmann


Contributions

The study was conceived by E.N. and S.A.T. The study was coordinated by E.N. and S.A.T. The experiments were designed by E.N., L.H.V., B.K., B.P., C.P. and P.H. The experiments were conducted by E.N., T.E., N.S., A.H.E., B.K., L.D., E.O. and Z.M. Bioinformatic analysis was conducted by T.F. and J.A.M. Simulations were run by A.H.E. Machine-learning analysis was conducted by P.H. Data analysis was conducted by E.N., T.E., A.H.E., T.F., B.K., G.F., H.P., B.P., C.P. and G.C. The manuscript was written by E.N. and S.A.T. with contributions from all authors.

Corresponding authors

Correspondence to Eviatar Natan or Sarah A. Teichmann.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Distribution of interface-forming residues in homomers and heteromers

(a–b) Distribution of interface-forming residues for homomers, as in Fig. 2, divided into bacteria, eukaryotes and archaea. (a) Including all homomers in our dataset. (b) Including only full-length or nearly full-length homomers, in which the crystallized construct contains >90% of the residues present in the UniProt sequence of the full protein. There is no apparent C-terminal interface enrichment in archaea; because this is also by far the smallest group, it is difficult to say whether this reflects genuine biological differences or small numbers. (c) Distribution of interface-forming residues for all heteromeric subunits as a control, showing no C-terminal enrichment. (d) Relative enrichment of interface residues in the C-terminal halves of proteins compared with the N-terminal halves, for all species with >100 non-redundant heteromeric subunit structures in our dataset. No significant enrichment is evident for heteromers, in contrast to homomers. (e) Distribution of interface-forming residues for heteromeric subunits from bacteria, eukaryotes and archaea. Error bars for all plots are calculated as in Fig. 2, with 10⁴ bootstrap replicates. Again, no significant enrichment is evident. (f) Given the notable difference between the enrichment observed for humans and rats in Fig. 2b, we compared the C-terminal interface enrichment in homomers from humans and rats, considering only structures that are closely related (>70% sequence identity) between the two groups. There is very little difference between the enrichment seen in human and rat structures. Error bars are calculated as in Fig. 2, with 10⁴ bootstrap replicates.
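The C- versus N-terminal comparison in (d) and the bootstrapped error bars can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: proteins are taken as the resampling unit, enrichment is taken as the pooled ratio of C-half to N-half interface residues, and the standard error is taken as the standard deviation of 10⁴ bootstrap replicates. The names `c_vs_n_enrichment` and `bootstrap_se` are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple

def c_vs_n_enrichment(counts: Sequence[Tuple[int, int]]) -> float:
    """Pooled ratio of C-half to N-half interface residues (assumed metric).

    counts: one (n_half_interface, c_half_interface) pair per protein.
    Values > 1 indicate C-terminal enrichment.
    """
    n_total = sum(n for n, _ in counts)
    c_total = sum(c for _, c in counts)
    if n_total == 0:
        raise ValueError("no N-terminal interface residues in sample")
    return c_total / n_total

def bootstrap_se(
    counts: Sequence[Tuple[int, int]], n_boot: int = 10_000, seed: int = 0
) -> float:
    """Standard error of the enrichment, resampling proteins with replacement."""
    rng = random.Random(seed)
    replicates: List[float] = []
    for _ in range(n_boot):
        sample = rng.choices(counts, k=len(counts))
        if sum(n for n, _ in sample) == 0:
            continue  # skip degenerate replicates with an undefined ratio
        replicates.append(c_vs_n_enrichment(sample))
    return statistics.stdev(replicates)
```

With toy counts `[(3, 5), (2, 2), (4, 7), (1, 3)]` the pooled enrichment is 17/10 = 1.7; resampling whole proteins (rather than residues) keeps residues from the same chain together, which is the more conservative choice when residues within a chain are correlated.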

Supplementary Figure 2 Comparison of the C-terminal interface enrichment for homomers of different lengths and from different symmetry groups

(a) Homomers were split into three equally sized groups of short, medium-length and long proteins, and the interface enrichment was plotted as in Fig. 2a. (b) Homomers were grouped by the most common symmetry types, and the interface enrichment was plotted as in Fig. 2a. (c) Relative enrichment of interface and solvent-accessible surface area across the length of proteins. These plots are analogous to Fig. 2a, except that interface enrichment is not normalized by solvent-accessible surface area; instead, both are shown separately. Solvent-accessible surface area is calculated for monomeric subunits only, neglecting intersubunit interactions.
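Panel (c) separates the interface and solvent-accessible surface area (SASA) profiles that are combined in Fig. 2a. The sketch below shows one plausible way to compute both, plus their ratio, from per-residue areas. It assumes per-residue inputs of relative position, interface area and monomer SASA, and defines enrichment as a bin's share of the total area multiplied by the number of bins. The function name and this normalization are hypothetical illustrations, not the published method.

```python
from typing import List, Sequence, Tuple

def area_enrichment_profiles(
    residues: Sequence[Tuple[float, float, float]], n_bins: int = 10
) -> Tuple[List[float], List[float], List[float]]:
    """Binned enrichment of interface area, monomer SASA, and their ratio.

    residues: (relative_position in [0, 1], interface_area, monomer_sasa) per
    residue (hypothetical input). Enrichment = bin share of total area * n_bins,
    so a uniform distribution gives 1.0 in every bin.
    """
    iface = [0.0] * n_bins
    sasa = [0.0] * n_bins
    for pos, iface_area, monomer_sasa in residues:
        b = min(int(pos * n_bins), n_bins - 1)  # clamp pos == 1.0 into last bin
        iface[b] += iface_area
        sasa[b] += monomer_sasa
    total_iface, total_sasa = sum(iface), sum(sasa)
    iface_enr = [n_bins * x / total_iface for x in iface]
    sasa_enr = [n_bins * x / total_sasa for x in sasa]
    # SASA-normalized interface enrichment: interface relative to available surface
    normalized = [i / s if s > 0 else float("nan") for i, s in zip(iface_enr, sasa_enr)]
    return iface_enr, sasa_enr, normalized
```

Showing `iface_enr` and `sasa_enr` separately, as in panel (c), makes it possible to check that a C-terminal interface signal is not simply a consequence of C-terminal regions being more exposed.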

Supplementary Figure 3 Workflow of the image-based high-throughput screen

(a) 611 native E. coli homomers with C-terminal GFP labels were compiled from the ASKA library, the complete set of E. coli K-12 ORF clones. All proteins were overexpressed in 96-well plates, and cells were imaged to determine the GFP signal. Using a supervised machine-learning algorithm, each cell was classified into a phenotype. Two types of cells were selected: (i) ‘Green cells’, with a homogeneous and high GFP signal along the cell, and (ii) ‘Dark cells’, with GFP signal at background levels. Finally, each homomer was assigned to one of these groups, depending on which phenotype was predominant in the corresponding cell population. (b–c) The N-terminal regions of ‘Dark’ homomers are enriched in interface-forming residues compared with those of ‘Green’ homomers, across all length and relative-interface-size categories. (b) Homomers were split into two equally sized groups of short (left) and long (right) proteins. The relative enrichment of interface-forming residues along the protein length is shown in green and grey for ‘Green’ and ‘Dark’ homomers, respectively, as in Fig. 2c. Error bars represent standard errors calculated from 10⁴ bootstrap replicates, as before. In both the short and long protein groups, ‘Dark’ homomers show N-terminal regions with significant interface enrichment (indicated with *) compared with ‘Green’ proteins. (c) As for the length-based analysis, homomers were split into two equally sized groups based on relative interface size, calculated for each protein by dividing the size of the homomer interface by the total available surface area of the protein. In both the small and large relative-interface-size categories, significant enrichment of interface-forming regions was observed for ‘Dark’ homomers compared with ‘Green’ ones. (d) The subgroup of cytoplasmic-only proteins was analyzed separately, and the observed enrichment trend was maintained. The dataset for membrane proteins was too small to analyze and is not presented.
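The per-homomer classification and the relative-interface-size split described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, ties between phenotypes are left unclassified by assumption, and with an odd number of proteins the extra protein is placed in the upper group (also an assumption).

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

def classify_homomer(cell_labels: Iterable[str]) -> Optional[str]:
    """Return 'Green' or 'Dark' if it is the single most frequent cell phenotype."""
    ranked = Counter(cell_labels).most_common()
    if not ranked:
        return None
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie between phenotypes: left unclassified (assumption)
    return top_label if top_label in ("Green", "Dark") else None

def relative_interface_size(interface_area: float, total_surface_area: float) -> float:
    """Homomer interface size divided by the protein's total available surface area."""
    return interface_area / total_surface_area

def median_split(values: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split protein IDs into two (near-)equal groups by ascending value."""
    ranked = sorted(values, key=values.__getitem__)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]
```

For example, a population labelled `["Dark", "Dark", "Green"]` is classified as ‘Dark’, and proteins with relative interface sizes of 0.1, 0.2, 0.3 and 0.4 fall into lower and upper groups of two each.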

Supplementary Figure 4 Constructs of the YFP sublibrary and flow cytometry and ESI-MS characterization

(a) The different constructs are identical or nearly identical in sequence composition. Three N-terminal variants were used in this work (monomeric, dimeric and tetrameric), differing by a single amino acid residue. In addition, a construct with a tetrameric oligomerization domain positioned at the C terminus was also used. (b) Flow cytometry measurements of all four variants. The lines along the dot plots indicate the intensity of the tetrameric (@N) and monomeric variants. The N-terminal tetrameric variants always show the lowest fluorescence level. The ratio is shown in (c). (d) Western blot of Tet-SL-YFP, YFP-SL-Tet and empty vector, as shown in Fig. 3b. The blot is representative; each variant was expressed from three different colonies (n = 3), three times from each colony (total n = 9). (e–h) Verification of the oligomeric state of YFP sub-library constructs by ESI-MS for the following four constructs: (e) Tet-SL-YFP, (f) YFP-SL-Tet, (g) YFP-LL-Tet and (h) Mono-SL-YFP. All spectra show charge-state distributions with deconvoluted masses in agreement with the theoretical masses calculated from the amino acid sequences. Importantly, tandem MS (MS/MS) experiments confirmed that the first three constructs (e–g) are tetramers: activation at high energy ejected a monomeric subunit, leaving a trimeric complex, as shown in the insets.
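The legend does not state how the charge-state distributions were deconvoluted. As a hedged illustration of the standard relationship for protonated ions, m/z = (M + z·m_p)/z, the sketch below infers the charge from two adjacent peaks, recovers the neutral mass M, and rounds M to a number of subunits. The function names and the assumption of pure [M + zH]^z+ ions are illustrative; real spectra of native complexes carry adducts and are usually fitted across the whole charge-state series.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

def charge_from_adjacent_peaks(mz_z: float, mz_z_plus_1: float) -> float:
    """Charge z of the peak at mz_z, given the neighbouring peak at charge z + 1.

    Assumes positive-mode ions [M + zH]^z+, so mz_z > mz_z_plus_1.
    Derived from z * (mz_z - p) = (z + 1) * (mz_z_plus_1 - p).
    """
    return (mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1)

def neutral_mass(mz: float, z: int) -> float:
    """Deconvoluted neutral mass M from m/z = (M + z * PROTON_MASS) / z."""
    return z * (mz - PROTON_MASS)

def oligomeric_state(mass: float, monomer_mass: float) -> int:
    """Nearest integer number of subunits consistent with the measured mass."""
    return round(mass / monomer_mass)
```

With hypothetical numbers, a 100 kDa complex observed at charges 20+ and 21+ yields z ≈ 20 from the two peaks, M ≈ 100,000 Da, and an oligomeric state of 4 for a 25 kDa subunit.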

Supplementary Figure 5 Flow cytometry of constructs with tetrameric (Tet@N) or monomeric (Mono@N) oligomerization domain

(a) Flow cytometry of YFP constructs with short, medium or long linkers. The lines along the dot plots indicate the intensity of the tetrameric and monomeric variants. The calculated ratio is shown in Fig. 4b. The plots are representative of all experiments (n > 5). (b–c) Flow cytometry of long-linker GFP or fGFP constructs with tetrameric (Tet@N) or monomeric (Mono@N) oligomerization domains at 37 °C and 18 °C. (b) The lines along the dot plots indicate the intensity of the monomeric constructs. As shown by confocal microscopy (Fig. 4c), there is only a very small difference between the fGFP variants. In contrast, the tetrameric GFP variant has significantly lower fluorescence than the monomeric variant. This difference is reduced when the strains are grown at 18 °C. (c) Analysis and ratios calculated from the data presented in (b). (Independent cell-culture replicates; **p < 0.01, *p < 0.05, two-sided t-test; error bars represent s.d.)
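The tetramer-to-monomer fluorescence ratios and the two-sided t-test in (c) can be sketched with the Python standard library. This is a minimal sketch under assumptions: one mean fluorescence value per construct per independent culture, replicates paired by index, and Welch's unequal-variance t statistic (the legend does not say whether variances were pooled). The function names are hypothetical; for a two-sided p-value, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the corresponding Welch test.

```python
import math
import statistics
from typing import Sequence, Tuple

def ratio_summary(tet: Sequence[float], mono: Sequence[float]) -> Tuple[float, float]:
    """Mean and s.d. of per-replicate tetramer/monomer fluorescence ratios.

    Assumes tet[i] and mono[i] come from the same replicate experiment.
    """
    ratios = [t / m for t, m in zip(tet, mono)]
    return statistics.mean(ratios), statistics.stdev(ratios)

def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se
```

Comparing the ratio at 37 °C with the ratio at 18 °C would then amount to calling `welch_t` on the two lists of per-replicate ratios.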

Supplementary Figure 6 Luciferase misassembly in vivo and in vitro is similar to the slow-folding GFP rather than the fast-folding fGFP and YFP

The luciferase (Luc) reporter was chosen because its fold differs markedly from those of YFP and GFP, and because of its slower folding rate. (a) In vivo assay of luminescence, normalized to the number of cells. The Tet-LL-Luc shows almost no signal. A single amino acid substitution that generates a monomeric variant increases the signal dramatically. Increasing the linker length of the tetrameric variant from a short to a long linker also increases the signal dramatically. As with the short-linker variants, the monomeric long-linker variant had a much higher luminescence signal than the tetrameric variant. (b) In vitro results under polysomic conditions in the PURE system. The results agree with the in vivo experiments. (c) Comparison of monosomic and polysomic conditions (as presented in b). According to our hypothesis, reducing the local ribosome concentration should decrease the frequency of cotranslational assembly events and thus decrease misassembly. The results are consistent with this hypothesis. Moreover, the C-terminal tetrameric construct, which cannot assemble cotranslationally, did not show a significant difference between the monosomic and polysomic conditions, which further confirms our hypothesis. (d–e) As in Fig. 5, we tested the same three chaperone groups: “KJE mix” (DnaK, DnaJ and GrpE), “GroE mix” (GroEL and GroES) and trigger factor (TF). Overall, the effect of chaperones was similar to, and even stronger than, that observed for the GFP sub-library (*p < 0.05, **p < 0.01; results are presented as the mean of the replicates, and error bars represent s.d.).

Supplementary Figure 7 PURE system Western blot of fGFP and GFP

(a) Western blot of the different GFP constructs examined. For each construct, both the polysomic and monosomic conditions are shown. (b) Quantification of a, averaged over replicates. (Error bars represent s.d.)

Supplementary Figure 8 The effect of chaperones in vitro and in vivo

(a–c) Summary of expression of the GFP, fGFP and Luc sub-libraries in the PURE system, with or without chaperones. Each row shows the tetrameric and monomeric constructs of one sub-library: (a) GFP, (b) fGFP and (c) Luc. Overall, the effect of chaperones correlated with oligomeric state (tetramer versus monomer) and with folding rate (fast- versus slow-folding proteins). The strongest rescue was achieved by the KJE mix, particularly for the slow-folding tetrameric Luc and GFP constructs. (Results are shown as the mean of replicates, and error bars represent s.d.) (d–g) Analysis of the influence of chaperones on homomeric and heteromeric complexes in E. coli, using data from ref. 25. (d) Depletion of misfolded homomeric and heteromeric protein complexes from the soluble fraction of the E. coli ΔKJT mutant, in which DnaK/DnaJ and TF are deleted. (e) Change in abundance of homomeric and heteromeric protein complexes in the insoluble fraction of the same mutant strain. (f) Interaction of homomeric and heteromeric complex proteins with DnaK (PD/BG ratio). Relative frequencies were normalized to account for the numbers of homomeric and heteromeric complexes. (g) Histograms of the absolute numbers of interactions of homomeric and heteromeric complex proteins with DnaK (PD/BG ratio).
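Panel f normalizes interaction counts by the size of each protein class, so that homomers and heteromers can be compared even though they are not equally numerous. The following is a minimal sketch of such a normalization. It assumes two things: PD/BG ratios are binned into right-open intervals, and each bin count is divided by the total number of proteins in the class. The bin edges and the function name are hypothetical, and the exact normalization used for panel f may differ.

```python
from bisect import bisect_right


def normalized_histogram(pd_bg_ratios, n_class_total, bin_edges):
    """Fraction of all proteins in one class (e.g. homomers) per PD/BG bin.

    Bins are right-open, [edges[i], edges[i+1]); ratios outside the edges
    are ignored. Dividing by n_class_total (all proteins in the class, not
    only DnaK interactors) makes classes of different size comparable.
    """
    counts = [0] * (len(bin_edges) - 1)
    for r in pd_bg_ratios:
        i = bisect_right(bin_edges, r) - 1  # index of the bin containing r
        if 0 <= i < len(counts):
            counts[i] += 1
    return [c / n_class_total for c in counts]
```

Computing the homomer and heteromer histograms with the same edges gives directly comparable relative frequencies, as in panel f. The undivided counts correspond to the absolute numbers shown in panel g.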

Supplementary Figure 9 Snapshots of the endpoint of all in silico simulations

Snapshots show the three main constructs of the YFP sub-library: Tet-SL-YFP, YFP-SL-Tet and Tet-LL-YFP. Representative simulations are shown in Supplementary Videos 1–3. (a) Endpoints of twenty simulations of Tet-SL-YFP. (b) Endpoints of ten simulations of YFP-SL-Tet. (c) Endpoints of twenty simulations of Tet-LL-YFP. A symbol at the bottom left of a snapshot indicates that Tet was assembled.

Supplementary Figure 10 Evolutionary constraints as a function of assembly

(a) Real-time growth of strains expressing the different constructs used in this work. Each curve is the average of three replicate cultures, and the plot is representative of three such averaged curves. All experiments show the same trend: strains expressing N-terminal tetrameric constructs consistently grow more slowly than those expressing the other variants. Error bars represent s.d. (b) All available E. coli homomeric protein structures were analyzed to assemble a set of structures of proteins that have (i) a discrete oligomerization domain and (ii) data predicting whether the protein folds post- or cotranslationally. Three such structures were found. Oligomerization domains are shown in red, domain linkers in cyan and other domains in yellow. CTP synthetase (PDB 1s1m) has an N-terminal oligomerization domain, which may be compensated for by a linker (cyan). The other two proteins, aspartate-semialdehyde dehydrogenase (PDB 1t4b) and α-N-acetylgalactosaminidase (PDB 2p53), have C-terminal oligomerization domains and no linkers. More information about these proteins is provided in Supplementary Dataset 4.
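Growth comparisons like the one in panel a are often summarized as a maximum specific growth rate, that is, the steepest slope of ln(OD) against time. The sketch below illustrates one way such curves could be processed; it is an assumption, not the procedure used here. It averages replicate curves point by point and then fits least-squares lines over a sliding window of ln(OD) values. The function names and the default window size are hypothetical.

```python
import math
from statistics import mean


def average_curves(replicates):
    """Point-by-point mean of replicate OD curves sampled at the same times."""
    return [mean(values) for values in zip(*replicates)]


def max_specific_growth_rate(times, od, window=5):
    """Largest least-squares slope of ln(OD) vs. time over any `window` points.

    Assumes all OD values are positive and len(times) >= window.
    """
    log_od = [math.log(x) for x in od]
    best = -math.inf
    for start in range(len(times) - window + 1):
        t = times[start:start + window]
        y = log_od[start:start + window]
        t_mean, y_mean = mean(t), mean(y)
        num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        den = sum((ti - t_mean) ** 2 for ti in t)
        best = max(best, num / den)  # keep the steepest exponential phase
    return best
```

For a purely exponential curve OD(t) = OD₀·e^(μt), every window returns the same slope μ. With real data, the window size trades noise robustness against resolution of the exponential phase.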

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10, Supplementary Table 1 and Supplementary Note 1

Life Sciences Reporting Summary

Supplementary Dataset 1

‘Dark’ and ‘Green’ homomers from the genome-wide in vivo screen. For each homomer, the table lists the parameters from the in vivo screen, the structural parameters describing the interface location and the Western blot analysis.

Supplementary Dataset 2

Western blot analysis of the expression of all 136 ‘Dark’ homomers and a selected set of 25 ‘Green’ homomers. The homomers were detected using a GFP-specific antibody. GFP negative (C–) and positive (C+) control samples were loaded on each gel. To verify loading, Coomassie Brilliant Blue (CBB)-stained separate gels (#1–5) or the membranes used for the Western blots (#6–14) are shown above each Western blot image. Asterisks mark the expressed GFP-tagged proteins. Molecular masses (in kDa) are indicated on the left.

Supplementary Dataset 3

Protein complex immunoprecipitation (Co-IP). A strain carrying an empty vector and strains expressing the N-terminal tetrameric (Tet-SL-YFP) and C-terminal tetrameric (YFP-SL-Tet) constructs were harvested a few hours after induction. The cell contents were then mixed with magnetic anti-HA antibody beads, and the samples were washed and eluted. The eluted samples were run on an SDS gel, and selected bands were analyzed by mass spectrometry (MS). The identified proteins and their fold changes are listed.

Supplementary Dataset 4

Characterization of the representative proteins shown in Supplementary Fig. 10b. The table lists E. coli proteins that have an oligomerization domain and a known full-length structure, together with their associated folding parameters.

Supplementary Dataset 5

Nonredundant sets of homomer structures. The structures are split into sets of all complexes filtered for sequence redundancy either across all structures or at the species level, and a set containing only full-length structures. The total interface and monomer accessible surface areas (in Å²) are given for the N-terminal and C-terminal halves of each protein. The same data are provided for heteromers.
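Because this dataset reports interface and monomer accessible surface area separately for the N- and C-terminal halves, a C-terminal bias can be expressed relative to the surface available in each half. The sketch below shows one such hypothetical metric, which is not necessarily the statistic used in the paper. It divides the C-terminal share of interface area by the C-terminal share of monomer surface area, so values above 1 indicate C-terminal enrichment. It assumes per-residue areas (in Å²) ordered from the N to the C terminus and splits the chain at `len // 2`. Both function names are hypothetical.

```python
def half_sums(per_residue_area):
    """Sum per-residue areas (ordered N to C) over the N- and C-terminal halves.

    For odd lengths the extra residue is assigned to the C-terminal half.
    """
    mid = len(per_residue_area) // 2
    return sum(per_residue_area[:mid]), sum(per_residue_area[mid:])


def c_terminal_enrichment(interface_area, monomer_area):
    """C-terminal share of interface area relative to C-terminal share of surface.

    Values > 1 mean the interface is biased toward the C-terminal half beyond
    what the available surface alone would predict; the result is unitless.
    """
    i_n, i_c = half_sums(interface_area)
    m_n, m_c = half_sums(monomer_area)
    return (i_c / (i_n + i_c)) / (m_c / (m_n + m_c))
```

Normalizing by monomer surface area keeps a protein whose C-terminal half is simply more exposed from appearing C-terminally enriched for that reason alone.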

Videos

Supplementary Video 1

Simulation of Tet@N with a short linker. In all videos, the red segment is Tet and the yellow segment is the YFP β-barrel. Both cotranslational folding and misassembly take place once the Tet domains emerge from the ribosome exit tunnels.

Supplementary Video 2

Simulation of Tet@C with a short linker. No cotranslational assembly occurred: the Tet of the leading ribosome leaves the ribosomal tunnel before the trailing ribosome has translated its own Tet, giving the first Tet time to diffuse away before the second Tet leaves the ribosome tunnel.

Supplementary Video 3

Simulation of Tet@N with a long linker. Cotranslational assembly takes place, but misassembly does not occur in this simulation. As in Supplementary Videos 1 and 2, the red segment is Tet and the yellow segment is the YFP reporter; the cyan segment is the long linker. The lower frequency of misassembly with the long linker agrees with the in vivo and in vitro data.

About this article

Cite this article

Natan, E., Endoh, T., Haim-Vilmovsky, L. et al. Cotranslational protein assembly imposes evolutionary constraints on homomeric proteins. Nat Struct Mol Biol 25, 279–288 (2018). https://doi.org/10.1038/s41594-018-0029-5

Received: 07 April 2017
Accepted: 10 January 2018
Published: 12 February 2018
Issue Date: March 2018
DOI: https://doi.org/10.1038/s41594-018-0029-5