Co-evolution is an essential component of evolution that helps to maintain the structure of ecological and molecular networks while allowing species, proteins and genes to change and adapt over time.
The signatures of co-evolution that computational methods detect in multiple sequence alignments of protein families are closely related to physical and functional interactions.
Co-evolutionary methods are applied at two levels: inter-residue correlations in single proteins, and correlations between the evolutionary patterns of protein pairs or protein collections. Some hybrid methods combine both levels.
A new generation of methods that can single out direct interactions by efficiently dealing with complex networks of correlations has been successfully applied to the detection of protein interaction partners and to the construction of protein structure models.
Co-evolutionary methods have been applied, in many cases in combination with experimental approaches, to protein modelling, the detection of binding sites, the deciphering of protein mechanisms of action, the prediction of protein–protein interaction partners and the reconstruction of protein complexes and interaction networks.
Co-evolution-based methods have been developed independently and have so far been considered unrelated. This general Review of the field suggests that unifying co-evolutionary methods under a common framework would be an important step forward in understanding the molecular basis of co-evolution.
Co-evolution is a fundamental component of the theory of evolution and is essential for understanding the relationships between species in complex ecological networks. A wide range of co-evolution-inspired computational methods has been designed to predict molecular interactions, but it is only recently that important advances have been made. Breakthroughs in the handling of phylogenetic information and in disentangling indirect relationships have resulted in an improved capacity to predict interactions between proteins and contacts between different protein residues. Here, we review the main co-evolution-based computational approaches, their theoretical basis, potential applications and foreseeable developments.
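The idea of disentangling indirect relationships can be illustrated with a deliberately simple stand-in: the three-variable partial correlation. This is only a sketch under stated assumptions, not the direct-coupling or sparse inverse covariance methods used in the field. The numeric profiles `a`, `b` and `c` are hypothetical data constructed so that `a` and `c` each depend on `b` but not directly on each other, and the helper names `pearson` and `partial_correlation` are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical profiles: b drives both a and c; the added noise terms are
# chosen to be uncorrelated with each other.
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
a = [v + e for v, e in zip(b, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
c = [v + e for v, e in zip(b, [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5])]

raw = pearson(a, c)                     # high: a and c look strongly coupled
direct = partial_correlation(a, c, b)   # near zero once b is accounted for
print(f"raw correlation a-c:     {raw:.2f}")
print(f"partial correlation a-c: {direct:.2f}")
```

In this toy case the raw correlation between `a` and `c` is high, whereas the partial correlation is close to zero, which suggests that the apparent coupling is mediated by `b`. Methods for real alignments have to handle many positions at once and categorical amino acid data, which this sketch does not attempt.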
We thank J. Onuchic from the University of California, San Diego, USA, D. Jones from University College London, UK, C. Sander from the Computational Biology Center at the Memorial Sloan–Kettering Cancer Center, New York, USA, E. van Nimwegen from Biozentrum at the University of Basel, Switzerland, F. Gervasio and S. Marsili from the Computational Biophysics Group at CNIO, A. Rausell from the Swiss Institute of Bioinformatics Vital-IT & Institute of Microbiology of the University of Lausanne and D. Ochoa from the Computational Systems Biology Group at CNB–CSIC for interesting discussions, as well as the many authors and collaborators who have made important contributions to the field of molecular co-evolution in the past 20 years, many of whom could not be included in this Review.
The authors declare no competing financial interests.
- Molecular phylogenetics
The study of evolutionary phenomena using biomolecular data, generally in the form of sequences of nucleic acids or proteins.
- Covarion model
A phylogenetic model in which the evolutionary rates of different codons are interdependent.
- Protein family
A set of homologous proteins defined according to a given threshold of sequence similarity.
- Homologues
Genes or proteins that have arisen from a common ancestor. In most cases, this common origin is traceable at the sequence level, although the sequence similarity can be very low and difficult to detect.
- Correlated mutations
Relationship between two positions of a multiple sequence alignment in which the pattern of amino acid changes at one position (its mutational pattern) parallels that at the other.
- Phylogenetic trees
Representations of the evolutionary relationships between a set of biological entities (such as proteins, genes or organisms).
- Protein interfaces
Regions of the surface of a protein involved in the interaction with others.
- Amino acid substitution matrix
A matrix containing, for every possible pair of the 20 canonical amino acids, a quantification of the 'interchangeability' of one with the other at the same protein site, as a proxy for the evolutionary feasibility of the corresponding change (mutation). Such matrices are often derived from curated sets of MSAs that are assumed to be faithful representations of the amino acids allowed at a given protein site.
- Benchmarking
In bioinformatics, this term describes the assessment of the performance of a method using a set of examples of known outcome (the 'gold standard'), particularly by testing its predictive power relative to current best-practice tools.
- Clades
Groups of entities (such as genes or organisms) in a phylogenetic tree that have all arisen from a common ancestor.
- Homology modelling
Protein structure prediction technique that, on the basis of the proven relationship between sequence similarity and structural similarity, models the three-dimensional structure of a protein based on the (experimentally determined) structure of a homologue (known as a 'template' in this context). Also known as 'comparative modelling'.
- De novo protein modelling
Any approach for predicting protein structure that does not make use of information on other existing protein structures (such as those of homologues). Also known as 'ab initio modelling'.
- Mutual information
In information theory, this is an entropy-based formulation for quantifying the interdependence between the values of two random categorical variables.
- Continuous-time Markov process
Process in which a system visits, over time, different states of a finite 'state space' in such a way that the Markov property is satisfied. This property means that the probability distribution of the system at a given time point, conditional on the whole history of the process up to a previous time, depends only on the state of the system at that previous time.
- Monte Carlo algorithm
An algorithm based on simulated repeated random sampling to obtain approximate solutions to complex mathematical and statistical problems.
- Heuristic approaches
Methods that make use of approximations or assumptions to reduce the search space but that consequently do not guarantee that the exact solution is found.
- Bayesian network
Probabilistic model in which a set of random variables (nodes) and their conditional dependencies (directed edges) are arranged in a network representation.
- Residue entropy
Quantification of the evolutionary variability of the position of a multiple sequence alignment corresponding to a given protein residue based on the 'entropy' parameter of information theory.
- Orthologues
Homologous genes or proteins split in a speciation event, ending up in different organisms.
- Horizontal gene transfer
(HGT). Transmission of genetic material between organisms by a route other than that from parents to offspring ('vertical transfer'). Also known as 'lateral gene transfer'.
- Principal component analysis
(PCA). Multivariate data analysis technique that consists of calculating a lower dimensionality space in which the axes explain most of the variability of the original data. The rationale is that such lower dimensionality space is easy to handle and to visualize, whereas most of the information of the original data (for example, in terms of relative distances) is retained and some contributions of noise are removed.
- Multiple correspondence analysis
(MCA). Multivariate data analysis technique similar to principal component analysis but more suitable for categorical data.
- Spectral decomposition
Decomposition of a square matrix (A) as the product of the matrix of its eigenvectors (V), the diagonal matrix of its eigenvalues (D) and the inverse of the eigenvector matrix: A = V·D·V⁻¹. Also known as 'eigendecomposition'.
- Paralogues
Homologous genes or proteins split in a gene duplication event, resulting in two copies of the parental gene in the same organism that later diverge in sequence and function.
- Protein domains
Pieces of a protein defined according to given criteria: for example, structural domains or functional domains.
- Genetic saturation
Apparent reduction with time of the observed divergence between two genes owing to factors such as reversed or convergent mutations.
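
The glossary entries for residue entropy, mutual information and correlated mutations can be made concrete with a small worked example. The following is a minimal sketch that assumes plain Shannon entropy in bits and no correction for phylogeny, gaps or sampling bias. The four-sequence alignment `msa` is hypothetical toy data, and the function names are illustrative rather than taken from any published tool.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of the symbols observed in one alignment column."""
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in Counter(symbols).values())


def mutual_information(col_i, col_j):
    """MI(i, j) = H(i) + H(j) - H(i, j), in bits, for two alignment columns."""
    joint = list(zip(col_i, col_j))  # paired residues, one pair per sequence
    return entropy(col_i) + entropy(col_j) - entropy(joint)


# Toy alignment (hypothetical): one string per homologous sequence.
msa = ["AKL", "AKV", "DEL", "DEV"]
columns = list(zip(*msa))  # transpose rows (sequences) into columns (positions)

# Columns 0 and 1 always change together (A with K, D with E).
print(mutual_information(columns[0], columns[1]))  # 1.0
# Columns 0 and 2 vary independently of each other.
print(mutual_information(columns[0], columns[2]))  # 0.0
```

In this toy alignment the first two columns behave as correlated mutations and share one bit of information, whereas the third column is just as variable (entropy of one bit) but carries no information about the first. With real protein families, raw mutual information is also affected by the shared evolutionary history of the sequences and by column entropy, so uncorrected values should be read with caution.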
About this article
Cite this article
de Juan, D., Pazos, F. & Valencia, A. Emerging methods in protein co-evolution. Nat Rev Genet 14, 249–261 (2013). https://doi.org/10.1038/nrg3414