Key Points
- Multidomain proteins account for over 50% of the proteome (over 70% in eukaryotes), yet the vast majority of folding studies focus on individual domains.
- Single-domain folding studies have identified some broad principles; however, only a small fraction of the fold space has been studied. Over half of the individual domains studied have been extracted from multidomain proteins.
- The SCOP (Structural Classification of Proteins) and SUPERFAMILY databases have been used to identify and classify domain architecture in many genomes.
- Analysis of three-dimensional protein structures has revealed general features of the domain structure of multidomain proteins, including the extent to which domain interface geometry is conserved and the structural characteristics of domain interfaces.
- The few multidomain-folding studies completed so far are summarized, and some general features of multidomain-protein folding emerge from this limited data set: large, dense interfaces correlate with folding dependence between domains, although there are exceptions.
- Multidomain proteins use many mechanisms to protect themselves from high local domain concentrations and to avoid misfolding.
Abstract
Analyses of genomes show that more than 70% of eukaryotic proteins are composed of multiple domains. However, most studies of protein folding focus on individual domains and do not consider how interactions between domains might affect folding. Here, we address this by analysing the three-dimensional structures of multidomain proteins that have been characterized experimentally and observe that where the interface is small and loosely packed, or unstructured, the folding of the domains is independent. Furthermore, recent studies indicate that multidomain proteins have evolved mechanisms to minimize the problems of interdomain misfolding.
Acknowledgements
This work was funded by the Medical Research Council and the Wellcome Trust. J.C. is a Wellcome Trust Senior Research Fellow. We thank K. Levy and our colleagues in our laboratories for helpful discussions. L. Randles kindly provided Figure 2a.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary information S1 (table)
Single-domain proteins for which the folding pathway of more than one member of a fold has been studied, including genome analysis. (PDF 364 kb)
Supplementary information S2 (table)
Multidomain proteins with experimental characterisation of folding properties (including data from Table 2) (PDF 1723 kb)
Glossary
- Domain: A structural, functional and evolutionary component of proteins, which can often be expressed as a single unit.
- Domain interface: The surfaces of two domains where they interact with each other. In the context of this review, the two domains are part of the same polypeptide chain.
- Sequence clustering: Grouping of domain families using a sequence-similarity cut-off without any structural information. So, domain sequences from the same family are more closely related to each other than to domain sequences from other families.
- Family: A group of related proteins with obvious evolutionary relationships from the amino-acid sequence alone.
- Superfamily: A group of related proteins that have diverged beyond recognizable sequence similarity but with clear structural evolutionary relationships.
- Fold: Describes the number and the arrangement (topology) of secondary structural elements in a protein.
- Sequence profile: A sequence profile represents average sequence characteristics of aligned sequences. In particular, it gives position-dependent probability for each possible amino acid as well as for insertion or deletion events at each sequence position.
- All-α protein: A protein consisting entirely of α-helical structural elements.
- All-β protein: A protein consisting entirely of β-sheet elements.
- Contact order: A measure of the average sequence separation between contacting residues in the native state.
- Chain connectivity: The N- to C-terminal order of secondary structure elements (for example, helix, strand) in a domain.
- Fold space: The complete repertoire of identified folds.
- Protein domain architecture: Describes the domain content of a protein in N- to C-terminal order.
- Power law: One of the most frequent scaling laws that describes many natural phenomena. The relationship of the variables (for example, frequency versus number of domain partners per superfamily) following a power law can be written as y = ax^k. On a log–log graph, a power law is a straight line.
- Local atomic density: A measure of packing density for domain interfaces. This is a measure of the average number of atoms in an interface per unit surface area.
- Packing density: Packing density is the density of atoms at an interface.
- Fab fragment: Fab fragments represent the forked end of an antibody, which has a Y shape. They are generated by digesting an antibody with an enzyme called papain. A Fab fragment consists of two variable (V) and two constant (C) domains. One of each type of domain comes from a light (L) and a heavy (H) chain.
- Unfolding half-life: The time taken for half of the protein molecules to undergo an unfolding event (this is related to the unfolding rate constant). Proteins with a fast unfolding rate will have a short half-life, whereas those with a slow unfolding rate will have a long half-life.
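The glossary entry for unfolding half-life says only that the half-life is related to the unfolding rate constant. As a minimal sketch, assuming unfolding is a simple first-order process (an assumption made here, not a statement from the article), the relation is t½ = ln 2 / k. The function names and the example rate constants below are hypothetical and chosen for illustration only.

```python
import math


def unfolding_half_life(k_unfold: float) -> float:
    """Half-life in seconds for first-order unfolding.

    k_unfold is the unfolding rate constant in s^-1 (assumed first-order).
    """
    if k_unfold <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2) / k_unfold


def fraction_still_folded(k_unfold: float, t: float) -> float:
    """Fraction of molecules that have not yet unfolded after t seconds."""
    return math.exp(-k_unfold * t)


# Illustrative (made-up) rate constants: a fast and a slow unfolder.
fast_k = 1.0     # s^-1
slow_k = 0.01    # s^-1

fast_half_life = unfolding_half_life(fast_k)
slow_half_life = unfolding_half_life(slow_k)
```

Under this assumption the slow unfolder has a half-life of roughly 69 s and the fast one under 1 s, which matches the qualitative statement in the glossary: a faster unfolding rate gives a shorter half-life.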
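The glossary entry for power law notes that y = ax^k becomes a straight line on a log–log graph. The sketch below illustrates that point with an ordinary least-squares line fitted to log x and log y, which recovers k as the slope and a from the intercept. The data are synthetic and generated from a known power law; they are not taken from the article, and `fit_power_law` is a hypothetical helper name.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**k by least squares on log y = log a + k * log x.

    Returns (a, k). All x and y values must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    s_xx = sum((u - mean_x) ** 2 for u in log_x)
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    k = s_xy / s_xx                      # slope of the log-log line
    a = math.exp(mean_y - k * mean_x)    # intercept gives log a
    return a, k


# Synthetic example: frequencies that follow y = 100 * x**-2 exactly.
partners = [1, 2, 4, 8, 16]
frequency = [100.0 * x ** -2 for x in partners]

a_fit, k_fit = fit_power_law(partners, frequency)
```

With noise-free synthetic input the fit returns the generating parameters (a close to 100, k close to -2). Real counts are noisy, and a straight line on a log–log plot is only suggestive of a power law, so a fit like this should be read as a first check.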
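The glossary entry for contact order describes it as the average sequence separation between contacting residues in the native state. A minimal sketch of that calculation follows, assuming the contacts have already been extracted from a structure as pairs of residue indices. The contact list is a toy example, not real data. The relative form, which divides by chain length, follows the commonly used definition of relative contact order and is included as an assumption about how the measure is normalized.

```python
def absolute_contact_order(contacts):
    """Mean sequence separation |i - j| over all contacting residue pairs."""
    if not contacts:
        raise ValueError("at least one contact is required")
    return sum(abs(i - j) for i, j in contacts) / len(contacts)


def relative_contact_order(contacts, chain_length):
    """Absolute contact order normalized by the number of residues."""
    if chain_length <= 0:
        raise ValueError("chain length must be positive")
    return absolute_contact_order(contacts) / chain_length


# Toy contact map for a hypothetical 10-residue chain.
toy_contacts = [(1, 5), (2, 10), (3, 4)]   # separations 4, 8 and 1

aco = absolute_contact_order(toy_contacts)
rco = relative_contact_order(toy_contacts, chain_length=10)
```

Here the mean separation is 13/3 residues. Mostly local contacts give a low value, and many contacts between residues far apart in sequence give a high one.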
Cite this article
Han, JH., Batey, S., Nickson, A. et al. The folding and evolution of multidomain proteins. Nat Rev Mol Cell Biol 8, 319–330 (2007). https://doi.org/10.1038/nrm2144
DOI: https://doi.org/10.1038/nrm2144