Emerging methods in protein co-evolution

Key Points

  • Co-evolution is an essential component of evolution that helps to maintain the structure of ecological and molecular networks while allowing species, proteins and genes to change and adapt over time.

  • The signatures of co-evolution that computational methods detect in multiple sequence alignments of protein families are closely related to physical and functional interactions.

  • Co-evolutionary methods are applied at two different levels: inter-residue correlations within single proteins, and correlations between the evolutionary patterns of protein pairs or collections of proteins. Some hybrid methods combine both levels.

  • A new generation of methods that can single out direct interactions by efficiently dealing with complex networks of correlations has been successfully applied to the detection of protein interaction partners and to the construction of protein structure models.

  • Co-evolutionary methodology, in many cases combined with experimental approaches, has been applied to protein modelling, the detection of binding sites, deciphering protein mechanisms of action, the prediction of protein–protein interaction partners, and the reconstruction of protein complexes and interaction networks.

  • Co-evolution-based methods have been developed independently and, until now, have been considered unrelated. This general Review of the field suggests that unifying co-evolutionary methods under a common framework would be an important step forward in understanding the molecular basis of co-evolution.


Co-evolution is a fundamental component of the theory of evolution and is essential for understanding the relationships between species in complex ecological networks. A wide range of co-evolution-inspired computational methods has been designed to predict molecular interactions, but it is only recently that important advances have been made. Breakthroughs in the handling of phylogenetic information and in disentangling indirect relationships have resulted in an improved capacity to predict interactions between proteins and contacts between different protein residues. Here, we review the main co-evolution-based computational approaches, their theoretical basis, potential applications and foreseeable developments.

Figure 1: Co-evolutionary features extracted from protein multiple sequence alignments.
Figure 2: Influence of phylogenetic history in the association of co-evolution and different types of molecular interactions.



Acknowledgements

We thank J. Onuchic from the University of California, San Diego, USA, D. Jones from University College London, UK, C. Sander from the Computational Biology Center at the Memorial Sloan–Kettering Cancer Center, New York, USA, E. van Nimwegen from Biozentrum at the University of Basel, Switzerland, F. Gervasio and S. Marsili from the Computational Biophysics Group at CNIO, A. Rausell from the Swiss Institute of Bioinformatics Vital-IT & Institute of Microbiology of the University of Lausanne and D. Ochoa from the Computational Systems Biology Group at CNB–CSIC for interesting discussions, as well as the many authors and collaborators with important contributions to the field of molecular co-evolution in the past 20 years, many of which could not be included in this Review.

Author information



Corresponding author

Correspondence to Alfonso Valencia.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary information S1 (boxes)

Technical information on co-evolution methods. (PDF 406 kb)

Supplementary information S2 (table)

(PDF 122 kb)


Molecular phylogenetics

The study of evolutionary phenomena using biomolecular data, generally in the form of sequences of nucleic acids or proteins.

Covarion model

A phylogenetic model in which the evolutionary rates of different codons are interdependent.

Protein family

A set of homologous proteins defined according to a given threshold of sequence similarity.


Homologues

Genes and proteins that have arisen from a common ancestor. In most cases, this common origin is traceable at the sequence level, although the sequence similarity can be very low and difficult to detect.

Correlated mutations

A relationship between two positions of a multiple sequence alignment in which the pattern of amino acid changes at one position (its mutational pattern) parallels that at the other.

Phylogenetic trees

Representations of the evolutionary relationships between a set of biological entities (such as proteins, genes or organisms).

Protein interfaces

Regions of the surface of a protein involved in the interaction with others.

Amino acid substitution matrix

A matrix containing, for every possible pair between the 20 canonical amino acids, a quantification of the 'interchangeability' of one by the other in the same protein site, as a proxy of the evolutionary feasibility of the corresponding change (mutation). Such matrices are often derived from curated sets of MSAs assumed to contain real representations of the amino acids allowed at a given protein site.


Benchmarking

In bioinformatics, this term describes the assessment of the performance of a method using a set of examples of known outcome (the 'gold standard'), particularly by testing its predictive power relative to current best practice tools.


Clades

Groups of entities (such as genes or organisms) in a phylogenetic tree that have all arisen from a common ancestor.

Homology modelling

Protein structure prediction technique that, on the basis of the proven relationship between sequence similarity and structural similarity, models the three-dimensional structure of a protein based on the (experimentally determined) structure of a homologue (known as a 'template' in this context). Also known as 'comparative modelling'.

De novo protein modelling

Any approach for predicting protein structure that does not make use of information on other existing protein structures (such as those of homologues). Also known as 'ab initio modelling'.

Mutual information

In information theory, this is an entropy-based formulation for quantifying the interdependence between the values of two random categorical variables.
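
As an illustration only, the following minimal sketch computes mutual information between two columns of a multiple sequence alignment. The function name `mutual_information` and the toy columns are assumptions made for this example; probabilities are estimated as raw frequencies, with no pseudocounts, sequence weighting or phylogenetic correction.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns.

    Hypothetical helper: probabilities are plain observed frequencies.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)               # residue counts in the first column
    count_b = Counter(col_b)               # residue counts in the second column
    count_ab = Counter(zip(col_a, col_b))  # counts of co-occurring residue pairs
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        ratio = n_ab * n / (count_a[a] * count_b[b])
        mi += p_ab * math.log2(ratio)
    return mi


# Toy columns: each string holds one residue per aligned sequence.
covarying = mutual_information("AAVV", "LLII")    # residues change together
independent = mutual_information("AAVV", "LILI")  # no association
```

In this toy case the perfectly covarying pair of columns yields 1 bit and the independent pair yields 0 bits.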

Continuous-time Markov process

Process in which a system moves over time between the states of a finite 'state space' in such a way that the Markov property is satisfied. This property means that the probability distribution of the system at a given time point, conditioned on the whole history of the process up to a previous time, depends only on the state of the system at that previous time.

Monte Carlo algorithm

An algorithm based on simulated repeated random sampling to obtain approximate solutions to complex mathematical and statistical problems.
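
A generic sketch of the idea, not tied to any co-evolution method: the area of a quarter circle, and hence π, is approximated by repeated random sampling. The function name `estimate_pi` and the fixed seed are assumptions made for this example.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Approximate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point falls inside the quarter circle
            inside += 1
    # The fraction of points inside approximates pi / 4.
    return 4.0 * inside / n_samples


approx = estimate_pi(100_000)
```

The error of such an estimate shrinks roughly with the square root of the number of samples, which is why Monte Carlo approaches trade exactness for tractability.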

Heuristic approaches

Methods that make use of approximations or assumptions so as to reduce the search space but that consequently do not guarantee that the exact solution will be found.

Bayesian network

Probabilistic model in which a set of random variables (nodes) and their conditional dependencies (directed edges) are arranged in a network representation.

Residue entropy

Quantification of the evolutionary variability of the position of a multiple sequence alignment corresponding to a given protein residue based on the 'entropy' parameter of information theory.
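
A minimal sketch of one common way to compute such a quantity, assuming the Shannon entropy of the observed residue frequencies in a single alignment column. The function name `column_entropy` is hypothetical, and gap characters, if present, would simply be counted as one more symbol here.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one multiple sequence alignment column."""
    n = len(column)
    entropy = 0.0
    for count in Counter(column).values():
        p = count / n                    # observed frequency of this residue
        entropy += p * math.log2(1 / p)  # contribution -p * log2(p)
    return entropy


conserved = column_entropy("AAAA")  # a fully conserved position
variable = column_entropy("AVLI")   # four different residues, equally frequent
```

The conserved column gives 0 bits, whereas the column with four equally frequent residues gives 2 bits.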


Orthologues

Homologous genes or proteins split in a speciation event, ending up in different organisms.

Horizontal gene transfer

(HGT). Transmission of genetic material between organisms by routes other than inheritance from parents to offspring ('vertical transfer'). Also known as 'lateral gene transfer'.

Principal component analysis

(PCA). Multivariate data analysis technique that consists of calculating a lower dimensionality space in which the axes explain most of the variability of the original data. The rationale is that such lower dimensionality space is easy to handle and to visualize, whereas most of the information of the original data (for example, in terms of relative distances) is retained and some contributions of noise are removed.

Multiple correspondence analysis

(MCA). Multivariate data analysis technique similar to principal component analysis but more suitable for categorical data.

Spectral decomposition

Decomposition of a square matrix (A) as the product of the matrix of its eigenvectors (V) times the diagonal matrix of its eigenvalues (D) times the inverse of the eigenvector matrix: A = V·D·V⁻¹. Also known as 'eigendecomposition'.
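
As a small worked instance, chosen here purely for illustration, a symmetric 2 × 2 matrix with eigenvalues 3 and 1 and eigenvectors (1, 1) and (1, −1) decomposes as follows:

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \quad
V = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad
D = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}, \quad
V^{-1} = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

V D V^{-1}
  = \begin{pmatrix} 3 & 1 \\ 3 & -1 \end{pmatrix}
    \cdot \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
  = \frac{1}{2} \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}
  = A
```

The columns of V are the eigenvectors, listed in the same order as the eigenvalues on the diagonal of D.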


Paralogues

Homologous genes or proteins split in a gene duplication event, resulting in two copies of the parental gene in the same organism that later diverge in sequence and function.

Protein domains

Pieces of a protein defined according to given criteria: for example, structural domains or functional domains.

Genetic saturation

Apparent reduction with time of the observed divergence between two genes owing to factors such as reversed or convergent mutations.

About this article

Cite this article

de Juan, D., Pazos, F. & Valencia, A. Emerging methods in protein co-evolution. Nat Rev Genet 14, 249–261 (2013).
