Emerging methods in protein co-evolution

de Juan, David; Pazos, Florencio; Valencia, Alfonso

doi:10.1038/nrg3414

Review Article
Published: 05 March 2013

David de Juan¹, Florencio Pazos² & Alfonso Valencia¹

Nature Reviews Genetics volume 14, pages 249–261 (2013)

Key Points

Co-evolution is an essential component of evolution that helps to maintain the structure of ecological and molecular networks while allowing species, proteins and genes to change and adapt over time.
The signatures of co-evolution that computational methods detect in multiple sequence alignments of protein families are closely related to physical and functional interactions.
Co-evolutionary methods are applied at two different levels: inter-residue correlations in single proteins and correlations between evolutionary patterns of protein pairs or protein collections. Some hybrid methods combine both levels.
A new generation of methods that can single out direct interactions by efficiently dealing with complex networks of correlations has been successfully applied to the detection of protein interaction partners and to the construction of protein structure models.
Co-evolutionary methodology has been applied, in many cases in combination with experimental approaches, to protein modelling, detection of binding sites, deciphering protein mechanisms of action, prediction of protein–protein interaction partners and reconstruction of protein complexes and interaction networks.
Co-evolution-based methods have been independently developed and up to now have been considered unrelated. This general Review of the field prompts us to think that unifying co-evolutionary methods under a common framework would be an important step forward in the understanding of the molecular basis of co-evolution.

Abstract

Co-evolution is a fundamental component of the theory of evolution and is essential for understanding the relationships between species in complex ecological networks. A wide range of co-evolution-inspired computational methods has been designed to predict molecular interactions, but it is only recently that important advances have been made. Breakthroughs in the handling of phylogenetic information and in disentangling indirect relationships have resulted in an improved capacity to predict interactions between proteins and contacts between different protein residues. Here, we review the main co-evolution-based computational approaches, their theoretical basis, potential applications and foreseeable developments.

**Figure 1: Co-evolutionary features extracted from protein multiple sequence alignments.**

**Figure 2: Influence of phylogenetic history in the association of co-evolution and different types of molecular interactions.**

Acknowledgements

We thank J. Onuchic from the University of California, San Diego, USA, D. Jones from University College London, UK, C. Sander from the Computational Biology Center at the Memorial Sloan–Kettering Cancer Center, New York, USA, E. van Nimwegen from Biozentrum at the University of Basel, Switzerland, F. Gervasio and S. Marsili from the Computational Biophysics Group at CNIO, A. Rausell from the Swiss Institute of Bioinformatics Vital-IT & Institute of Microbiology of the University of Lausanne and D. Ochoa from the Computational Systems Biology Group at CNB–CSIC for interesting discussions, as well as the many authors and collaborators who have made important contributions to the field of molecular co-evolution in the past 20 years, many of which could not be included in this Review.

Author information

Authors and Affiliations

Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
David de Juan & Alfonso Valencia
Computational Systems Biology Group, National Centre for Biotechnology (CNB–CSIC), Madrid, Spain
Florencio Pazos

Corresponding author

Correspondence to Alfonso Valencia.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary information S1 (boxes)

Technical information on co-evolution methods. (PDF 406 kb)

Supplementary information S2 (table)

(PDF 122 kb)

Glossary

Molecular phylogenetics: The study of evolutionary phenomena using biomolecular data, generally in the form of sequences of nucleic acids or proteins.
Covarion model: A phylogenetic model in which the evolutionary rates of different codons are interdependent.
Protein family: A set of homologous proteins defined according to a given threshold of sequence similarity.
Homologues: Genes and proteins that have arisen from a common ancestor. In most cases, this common origin is traceable at the sequence level, although the sequence similarity can be very low and difficult to detect.
Correlated mutations: Relationship between two positions of a multiple sequence alignment in which the amino acid changes in one of the positions (mutational pattern) parallels that in the other.
Phylogenetic trees: Representations of the evolutionary relationships between a set of biological entities (such as proteins, genes or organisms).
Protein interfaces: Regions of the surface of a protein involved in the interaction with others.
Amino acid substitution matrix: A matrix containing, for every possible pair between the 20 canonical amino acids, a quantification of the 'interchangeability' of one by the other in the same protein site, as a proxy of the evolutionary feasibility of the corresponding change (mutation). They are often derived from curated sets of MSAs assumed to contain real representations of the amino acids allowed at a given protein site.
Benchmark: In bioinformatics, this term describes the assessment of the performance of a method using a set of examples of known outcome (the 'gold standard'), particularly by testing its predictive power relative to current best practice tools.
Clades: Groups of entities (such as genes or organisms) in a phylogenetic tree that have all arisen from a common ancestor.
Homology modelling: Protein structure prediction technique that, on the basis of the proven relationship between sequence similarity and structural similarity, models the three-dimensional structure of a protein based on the (experimentally determined) structure of a homologue (known as a 'template' in this context). Also known as 'comparative modelling'.
De novo protein modelling: Any approach for predicting protein structure that does not make use of information on other existing protein structures (such as those of homologues). Also known as 'ab initio modelling'.
Mutual information: In information theory, this is an entropy-based formulation for quantifying the interdependence between the values of two random categorical variables.
Continuous-time Markov process: A process in which a system moves over time between the states of a finite 'state space' in such a way that the Markov property is satisfied. This property means that the probability distribution of the system at a given time point, conditional on the whole history of the process up to a previous time, depends only on the state of the system at that previous time.
Monte Carlo algorithm: An algorithm based on simulated repeated random sampling to obtain approximate solutions to complex mathematical and statistical problems.
Heuristic approaches: Methods that make use of approximations or assumptions so as to reduce the search space but that consequently do not guarantee that the exact solution will be found.
Bayesian network: Probabilistic model in which a set of random variables (nodes) and their conditional dependencies (directed edges) are arranged in a network representation.
Residue entropy: Quantification of the evolutionary variability of the position of a multiple sequence alignment corresponding to a given protein residue based on the 'entropy' parameter of information theory.
Orthologues: Homologous genes or proteins split in a speciation event, ending up in different organisms.
Horizontal gene transfer: (HGT). Transmission of genetic material between organisms by routes other than the transfer from parents to offspring ('vertical transfer'). Also known as 'lateral gene transfer'.
Principal component analysis: (PCA). Multivariate data analysis technique that consists of calculating a lower dimensionality space in which the axes explain most of the variability of the original data. The rationale is that such lower dimensionality space is easy to handle and to visualize, whereas most of the information of the original data (for example, in terms of relative distances) is retained and some contributions of noise are removed.
Multiple correspondence analysis: (MCA). Multivariate data analysis technique similar to principal component analysis but more suitable for categorical data.
Spectral decomposition: Decomposition of a square matrix (A) as the product of the matrix of its eigenvectors (V), the diagonal matrix of its eigenvalues (D) and the inverse of the eigenvector matrix: A = V·D·V⁻¹. Also known as 'eigendecomposition'.
Paralogues: Homologous genes or proteins split in a gene duplication event, resulting in two copies of the parental gene in the same organism that later diverge in sequence and function.
Protein domains: Pieces of a protein defined according to given criteria: for example, structural domains or functional domains.
Genetic saturation: Apparent reduction with time of the observed divergence between two genes owing to factors such as reversed or convergent mutations.
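
As an illustration of two of the glossary entries above ('Residue entropy' and 'Mutual information'), the following minimal sketch computes both quantities for the columns of a small alignment. It is not the implementation of any method discussed in this Review: the four-sequence alignment and the function names `column_entropy` and `mutual_information` are hypothetical, gaps and sequence weighting are ignored, and no correction for phylogenetic bias or indirect correlations is attempted.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two columns: H(i) + H(j) - H(i, j)."""
    joint = list(zip(col_i, col_j))  # paired symbols, one pair per sequence
    return column_entropy(col_i) + column_entropy(col_j) - column_entropy(joint)


# Hypothetical toy alignment (assumption for illustration): 4 sequences, 4 positions.
msa = ["AKLD", "AKLE", "GRLD", "GRLE"]

# Transpose the alignment so that each entry is one column (one protein position).
columns = ["".join(seq[k] for seq in msa) for k in range(len(msa[0]))]

for k, col in enumerate(columns):
    print(f"position {k}: entropy = {column_entropy(col):.2f} bits")

print(f"MI(0, 1) = {mutual_information(columns[0], columns[1]):.2f} bits")
print(f"MI(0, 3) = {mutual_information(columns[0], columns[3]):.2f} bits")
```

In this toy case, positions 0 and 1 change together (A with K, G with R) and therefore share one bit of information, whereas position 3 varies independently of position 0 and the fully conserved position 2 has zero entropy. Real alignments require many more sequences for these estimates to be meaningful.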

About this article

Cite this article

de Juan, D., Pazos, F. & Valencia, A. Emerging methods in protein co-evolution. Nat Rev Genet 14, 249–261 (2013). https://doi.org/10.1038/nrg3414

Published: 05 March 2013
Issue Date: April 2013
DOI: https://doi.org/10.1038/nrg3414
