Much of systems biology aims to predict the behaviour of biological systems on the basis of the set of molecules involved. Understanding the interactions between these molecules is therefore crucial to such efforts.
A full understanding of how molecules interact comes only from three-dimensional structures, but structural biology is still difficult for complexes of two or more macromolecules. Methods that predict the structural details of interactions are therefore crucially important.
The interactomes that are currently available for Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Helicobacter pylori, Escherichia coli and Homo sapiens can be readily complemented by methods that predict interactions on the basis of genome context, expression patterns and other data sources.
The molecular details of interactions can be predicted by protein docking, by homology modelling, or by identifying recurring interaction-sequence signatures, which consist of either a pair of domains or a domain and a short linear peptide.
Using these tools, it is possible to predict the structures of large molecular assemblies or the details of how cellular pathways operate.
Complementing the interactome with structural information will ultimately produce a more complete whole-cell framework at atomic-level detail, which will have a large impact on the study of biological systems.
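The idea of recurring interaction-sequence signatures mentioned in the key points above can be illustrated with a minimal sketch. It counts how often each pair of domains co-occurs across a set of known interactions and reports the pairs that recur. The protein names, domain assignments, interaction list and the `min_count` threshold are all hypothetical toy values, and raw counting is a simplification: published methods typically score over-representation against what would be expected by chance.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data (not from the article): domains per protein
# and a list of observed pairwise interactions.
domains = {
    "P1": {"SH3", "SH2"},
    "P2": {"Pro-rich"},
    "P3": {"SH3"},
    "P4": {"Pro-rich", "PDZ"},
    "P5": {"Kinase"},
}
interactions = [("P1", "P2"), ("P3", "P4"), ("P3", "P2"), ("P5", "P4")]


def count_domain_pairs(interactions, domains):
    """Count the number of interactions in which each unordered domain pair occurs."""
    counts = Counter()
    for a, b in interactions:
        # Every domain of protein a paired with every domain of protein b;
        # sorting makes the pair unordered, the set avoids double counting
        # within a single interaction.
        pairs = {tuple(sorted(p)) for p in product(domains[a], domains[b])}
        counts.update(pairs)
    return counts


def recurring_pairs(counts, min_count=2):
    """Keep only domain pairs seen in at least min_count interactions."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


counts = count_domain_pairs(interactions, domains)
signatures = recurring_pairs(counts)
print(signatures)
```

In this toy set only the SH3 and proline-rich pair recurs, so it would be the single candidate signature; any real application would need a statistical test and a much larger interaction set.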
Much of systems biology aims to predict the behaviour of biological systems on the basis of the set of molecules involved. Understanding the interactions between these molecules is therefore crucial to such efforts. Although many thousands of interactions are known, precise molecular details are available for only a tiny fraction of them. The difficulties that are involved in experimentally determining atomic structures for interacting proteins make predictive methods essential for progress. Structural details can ultimately turn abstract system representations into models that more accurately reflect biological reality.
We thank T. Gibson (European Molecular Biology Laboratory (EMBL), Heidelberg, Germany), R. Jackson (University of Leeds, UK) and A. Bonvin (University of Utrecht, The Netherlands) for helpful discussions.
The authors declare no competing financial interests.
- Structural genomics
Initiatives to solve X-ray or NMR structures in a high-throughput manner. They are usually focused on a single organism, pathway or disease, or are aimed at providing a complete set of protein folds (by solving representative structures, on the basis of which all other structures can be modelled).
- Homology modelling
A method of protein-structure prediction that uses a known structure as a modelling template for a homologue that has been identified on the basis of sequence similarity.
- Interactome
The protein-interaction equivalent of the genome. It denotes the set of interactions that occur in an organism.
- Chemical crosslinking
The process of chemically joining two molecules using a covalent bond. Chemical agents are used to determine near-neighbour relationships, to analyse protein structure, and to provide information on the distance between interacting molecules.
- Chemical footprinting
A method that takes advantage of chemical labelling to study protein–protein and protein–DNA interactions, by identifying the exact residues or DNA signature to which a protein binds.
- Protein arrays
Solid-phase, ligand-binding assays that use immobilized proteins on different surfaces (for example, glass or membranes). Bound proteins are normally identified using specific antibodies.
- Fluorescence resonance energy transfer
(FRET). The process of energy transfer between two fluorophores, which can be used to measure protein interactions in vivo. It can be used to determine the distance between two molecules or between two attachment positions in a macromolecule.
- Fluorescence cross-correlation spectroscopy
(FCCS). A technique that detects the synchronous movement of two biomolecules with different fluorescent labels. It can be applied to live cells.
- Protein-fold recognition (or threading)
A method of protein-structure prediction that attempts to find a suitable template on which to model a protein of unknown structure regardless of any sequence similarity (because dissimilar sequences can adopt similar protein folds). The sequence being queried is fitted, or threaded, onto a library of known structures to find out which one is most compatible (as measured by various structural criteria — for example, how well hydrophobic residues are buried).
- SH3 domain
(Src-homology-3 domain). A protein domain of about 50 amino acids that recognizes and binds to sequences that are rich in proline residues.
- SH2 domain
(Src-homology-2 domain). A protein domain that recognizes and binds to tyrosine-phosphorylated sequences, and thereby has a key role in relaying cascades of signal transduction.
- WW domain
A protein-interaction domain that is characterized by a pair of tryptophan residues that are 20–22 amino acids apart, and an invariant proline residue within a region of 40 amino acids. WW domains interact with proline-rich regions, including those containing phosphoserine or phosphothreonine.
- PDZ domain
(postsynaptic-density protein of 95 kDa, Discs large, Zona occludens-1). A protein-interaction domain that often occurs in scaffolding proteins and is named after the founding members of this protein family.
- RAD51
The early steps of recombination involving homologous pairing and strand exchange are promoted by proteins of the RecA/RAD51 family of recombinases in all organisms. Human RAD51 is a relatively small protein, but it is functional as a long helical polymer that is made up of hundreds of monomers.
- Exosome
A protein complex found in eukaryotes and archaea that has 3′→5′ exonuclease activity and is involved in RNA processing and degradation.
- Electron tomography
A structural technique in which a single cryo-frozen cell is imaged at a series of tilt angles in the electron beam, so that a three-dimensional image can be reconstructed.
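As a brief worked note on the fluorescence resonance energy transfer (FRET) entry above: the use of FRET to determine distances rests on the standard Förster relation between transfer efficiency and fluorophore separation. The symbols below are introduced here for illustration and are not taken from the article: $E$ is the transfer efficiency, $r$ the donor–acceptor distance and $R_0$ the Förster distance of the fluorophore pair, at which $E = 0.5$.

```latex
E = \frac{1}{1 + \left( r / R_0 \right)^{6}}
\qquad\Longleftrightarrow\qquad
r = R_0 \left( \frac{1}{E} - 1 \right)^{1/6}
```

Because of the sixth-power dependence, the efficiency changes steeply around $r \approx R_0$, which is why the measurement is most informative for separations close to the Förster distance.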
About this article
Cite this article
Aloy, P., Russell, R. Structural systems biology: modelling protein interactions. Nat Rev Mol Cell Biol 7, 188–197 (2006). https://doi.org/10.1038/nrm1859