Protein interfaces often display coevolving residues. This idea has been explored for predicting protein interaction partners, but high-throughput characterization of previously unknown protein–protein interactions (PPIs) across an organism’s proteome has been a formidable task. David Baker and colleagues at the University of Washington hypothesized that the most strongly coevolving protein pairs were the most likely to interact physically, and that aligning homologs of each protein and examining correlated changes within each aligned protein pair would reveal PPIs.

The researchers demonstrated this using 4,262 proteins from Escherichia coli. Each of the 40,607 available reference bacterial proteomes was searched for putative homologs of each protein. Aligning the homologs of each protein and then combining the alignments into all possible protein pairs yielded more than 9 million paired alignments. To evaluate each protein pair systematically, they first used mutual information, a measure of the mutual dependence between two sequences, and then applied further screens using previously established coevolution-based methods and molecular docking. In total, this coevolution screen identified 1,618 PPIs. Impressively, the method outperformed experimental methods in identifying PPIs when tested on benchmark datasets. Baker adds, “We were basically surprised at how well it worked [...], all this information on protein–protein interactions just sitting there in the sequence databases!”
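To make the mutual information idea concrete, the following is a minimal Python sketch, not the authors' pipeline. It scores two alignment columns, one from each protein, with residues listed in the same species order. The function name `column_mutual_information` and the toy columns are assumptions made for illustration, and the sketch omits the corrections and follow-up screens that a real analysis would need. It also checks the number of pairs that 4,262 proteins can form.

```python
from collections import Counter
from math import comb, log2


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b each hold one residue per species, in the same
    species order (hypothetical toy input, not real alignment data).
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)               # residue counts in column A
    count_b = Counter(col_b)               # residue counts in column B
    count_ab = Counter(zip(col_a, col_b))  # joint residue-pair counts
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns: residues that always change together versus independently.
coupled = column_mutual_information("AAKK", "DDEE")
independent = column_mutual_information("AAKK", "DEDE")

# Number of unordered protein pairs that 4,262 proteins can form.
n_pairs = comb(4262, 2)

print(coupled, independent, n_pairs)
```

In this toy case the perfectly coupled columns score 1 bit and the independent columns score 0 bits, while 4,262 proteins give 9,080,191 possible pairs, consistent with the "more than 9 million" paired alignments mentioned above.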

They also investigated the proteome of Mycobacterium tuberculosis, a more challenging case with fewer known homologs, and were able to predict 911 PPIs with an expected precision of 83%. The group is working on expanding the study. Baker mentions, “We’re scaling this up to do quite a variety of other organisms, and we’ll make those results available online”.

The importance of the technique ultimately lies in the new biological interactions that it can uncover, which could facilitate a better understanding of PPI networks and their evolutionary dynamics, as well as the discovery of novel targets for drug design.