Around the time that scientists celebrated the completion of the draft sequence of the human genome, papers from two separate groups described results of another project that tested all the possible pairings of thousands of yeast proteins to see whether they interact1, 2.
The importance of protein–protein interactions is beyond dispute. Little happens in a cell without one protein 'touching' another. Whether a cell divides, secretes a hormone or triggers its own death, protein–protein interactions make the event happen. Consequently, comprehensive maps showing which proteins came together in a yeast cell were much anticipated.
But the results took scientists aback. Although the two research groups had explored the full collection of proteins in the same organism using the same yeast two-hybrid (Y2H) assay, the two papers found fewer than 150 interactions in common — only 10% of the findings that either team dubbed high quality. Most scientists regarded the results as so riddled with artefacts that they were useless.
Multiple replicated experiments and sophisticated statistics reveal 497 interactions between 16 HIV proteins (blue) and hundreds of human factors. (Image: M. Shales/Krogan Lab/UCSF)
“As you can imagine, people were extremely critical. They just couldn't believe that you would get such different results when you were studying the same thing,” recalls Peter Uetz, who studies protein interactions at the Center for the Study of Biological Complexity at Virginia Commonwealth University in Richmond, and was a co-author on one of the papers1. Even today, many researchers look askance at the Y2H assays used in the studies.
But Marc Vidal, a systems biologist at the Dana-Farber Cancer Institute in Boston, Massachusetts, says that the technique has come a long way in a decade. Not only have researchers found ways to recognize and reduce false positives, but gruelling follow-up studies show that the startlingly low overlap between the two reports arose not because the assays found so many interactions that do not exist, but because they missed so many that do3.
Understanding these interactions is as important as ever. Protein interactomes (maps of protein interactions) are raw fuel for systems biologists. Promising techniques to block protein–protein interactions in cancer and other diseases have launched a string of biotechnology deals. Considering disease in terms of protein–protein interactions rather than individual genes and proteins could help to make sense of seemingly contradictory observations. For example, mutations in the same protein could lead to different diseases by disrupting different interactions. Similarly, mutations in different proteins that disrupt the same interaction could lead to the same disease.
A good reference map of interactions would be comparable to the completed human genome sequence, says Vidal, and could spawn further efforts to study genetic variation and function. A validated network would give scientists a jumping-off point for more experiments. “My guess is that as these networks grow, we will get more elaborate ways of understanding where these interactions take place, when and why,” he says. “We are getting a sense of a cell's organizational self by doing this.”
Interaction maps can help to explain protein function and identify new ways to fight disease.
First described in 1989, the Y2H assay tests whether two proteins interact by fusing each of them to one half of a transcription factor4. If the proteins come together, the transcription factor is reconstituted and activates reporter genes that allow the yeast to grow on selective medium. Companies including Hybrigenics in Paris and Dualsystems Biotech in Zurich, Switzerland, run Y2H as a service.
“Yeast two-hybrid has an enormous advantage, which might also be a disadvantage: it can detect low affinity,” says Erich Wanker, a neuroproteomics researcher at the Max Delbrück Center for Molecular Medicine in Berlin, and co-editor of a book on the topic5. In other words, the assay can identify weak, transient pairings such as those that relay signals within cells. But it also detects proteins that randomly bump together, and these chance encounters have prompted almost philosophical discussions. “At what point do we really believe that it's an interaction?” asks Wanker.
Scientists have also found ways to detect and avoid many sorts of false positive. Artefacts from 'sticky' proteins, which bind non-specifically to many other proteins, can be identified and excluded. So can 'autoactivators', in which a single introduced protein, rather than a reconstituted transcription factor, switches on the reporter and promotes growth.
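The two filters just described can be sketched in a few lines of Python. This is a minimal illustration, not any group's published pipeline. It assumes three things: screen hits arrive as (bait, prey) pairs; each bait was also tested alone to flag autoactivators; and a hypothetical `max_partners` cut-off decides when a protein counts as 'sticky'.

```python
from collections import defaultdict


def filter_y2h_hits(hits, autoactivators, max_partners=20):
    """Remove hits that involve autoactivating baits or promiscuous proteins.

    hits: iterable of (bait, prey) pairs that scored positive.
    autoactivators: baits that switched on the reporter with no prey present.
    max_partners: hypothetical threshold; proteins seen with more distinct
        partners than this are treated as non-specific ('sticky') binders.
    Returns (kept_hits, sticky_proteins).
    """
    hits = set(hits)

    # Record the distinct partners of every protein, whichever side it was on.
    partners = defaultdict(set)
    for bait, prey in hits:
        partners[bait].add(prey)
        partners[prey].add(bait)

    sticky = {protein for protein, seen in partners.items() if len(seen) > max_partners}

    kept = {
        (bait, prey)
        for bait, prey in hits
        if bait not in autoactivators and bait not in sticky and prey not in sticky
    }
    return kept, sticky


# Toy screen: "S" binds everything, "C" activates the reporter on its own.
hits = [("A", "B"), ("C", "D"), ("S", "P1"), ("S", "P2"), ("S", "P3")]
kept, sticky = filter_y2h_hits(hits, autoactivators={"C"}, max_partners=2)
print(kept, sticky)  # {('A', 'B')} {'S'}
```

In practice the cut-off would more likely be derived from the distribution of partner counts in the screen itself. A fixed number is used here only to keep the example short.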
Precise systems also exist to make sure that all desired combinations are tested. Rather than introducing genes for both potential interaction partners into the same yeast cell, researchers transform separate yeast strains with individual genes, mate the strains in pools and assay the progeny for growth. Robotic systems mix yeast precisely and run multiple replicates of each assay. The number of times that the same interaction is seen becomes part of a quality score. “Our view is that Y2H can give reliable and reproducible results,” says Wanker.
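A replicate-based quality score of the kind Wanker describes might look like the following sketch. The scoring rule (the fraction of replicates showing growth) and the `min_fraction` threshold are assumptions chosen for illustration. Real pipelines combine replicate counts with other evidence.

```python
def replicate_scores(outcomes):
    """Score each pair by the fraction of independent replicates that grew.

    outcomes: dict mapping (bait, prey) to a list of booleans, one per replicate.
    Pairs with no replicates are skipped.
    """
    return {pair: sum(runs) / len(runs) for pair, runs in outcomes.items() if runs}


def reproducible_pairs(outcomes, min_fraction=0.75):
    """Keep pairs seen in at least min_fraction of replicates (hypothetical cut-off)."""
    return {pair for pair, score in replicate_scores(outcomes).items() if score >= min_fraction}


outcomes = {
    ("A", "B"): [True, True, True, False],    # seen in 3 of 4 replicates
    ("C", "D"): [True, False, False, False],  # seen once: likely noise
}
print(replicate_scores(outcomes))
print(reproducible_pairs(outcomes))  # {('A', 'B')}
```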
Still, some interactions will not be observed in Y2H. For example, the interacting proteins have to allow the two halves of the transcription factor to reunite, and the proteins must be able to reach the nucleus to activate the reporter gene. Thus, interactions involving membrane proteins or proteins confined to particular organelles are largely invisible to the assay.
Besides Y2H, lower-throughput tests, mostly performed in mammalian cells, can be used to screen interactions. These include the luminescence-based mammalian interactome (LUMIER) assay, the mammalian protein–protein interaction trap (MAPPIT), protein arrays and protein-fragment complementation assays (PCA). Although these are orders of magnitude slower than Y2H, they can probe interactions in a more physiologically relevant context.
MAPPIT is one of the highest-throughput mammalian screens. Instead of a yeast transcription factor, a mammalian cytokine receptor is split, and it becomes capable of cell signalling only when reconstituted. In 2009, Jan Tavernier, a network biologist at VIB, a life-sciences research institute in Ghent, Belgium, described a higher-throughput version of MAPPIT6. In this version, plasmids encoding potential interaction partners, each linked to one cytokine-receptor fragment, can be individually spotted into wells and stored. To begin the experiment, the wells are filled with cells expressing the other cytokine-receptor fragment linked to the selected 'bait'. When an interaction occurs, the reconstituted receptor signals and switches on a reporter gene encoding the light-emitting enzyme luciferase.
Using multiwell plates, it costs about €2,000 (US$2,600) to screen one bait protein against the human ORFeome (a complete set of cloned protein-encoding open reading frames), says Tavernier, who hopes to describe techniques to run MAPPIT on microchips later this year. Miniaturized assays should reduce the cost to €100 per bait and allow the ORFeome to be tested against 100 baits a week.
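The difference that miniaturization would make is simple arithmetic. The per-bait figures (about 2,000 on plates versus a projected 100 on chips, in euros) and the throughput of 100 baits a week are Tavernier's projections from the text. The 1,000-bait campaign below is a hypothetical example.

```python
PLATE_COST_PER_BAIT = 2_000  # euros per bait on multiwell plates (figure quoted in the text)
CHIP_COST_PER_BAIT = 100     # euros per bait, projected for miniaturized assays
CHIP_BAITS_PER_WEEK = 100    # projected chip throughput against the full ORFeome


def screen_cost(n_baits, cost_per_bait):
    """Total cost of screening n_baits against the ORFeome."""
    return n_baits * cost_per_bait


def weeks_needed(n_baits, baits_per_week=CHIP_BAITS_PER_WEEK):
    """Weeks of chip time, rounding up because a partial week must still be run."""
    return -(-n_baits // baits_per_week)


n_baits = 1_000  # hypothetical campaign size
print(screen_cost(n_baits, PLATE_COST_PER_BAIT))  # 2000000
print(screen_cost(n_baits, CHIP_COST_PER_BAIT))   # 100000
print(weeks_needed(n_baits))                      # 10
```

At these figures a chip-based screen costs one-twentieth as much per bait. The chip numbers are projections, not measured costs.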
At this throughput and cost, Tavernier says, new kinds of experiments become feasible. Instead of restricting screens to yeast cells, “you start mapping full interactomes in the appropriate species”, he says. In addition, Tavernier plans to compare how interactomes change when cells are treated with agents such as drugs or toxic chemicals. He is hoping to commercialize the technology, and is working with Vidal and other scientists to map human protein interactions using both MAPPIT and Y2H assays.
LUMIER assays are also relatively high-throughput and can be used to test whether particular interactions are affected by drugs, hormones or other additives. For these assays, cells are transiently transfected with genes encoding two proteins. One protein is fused to a short hydrophilic peptide tag called FLAG, and potential interaction partners are fused to luciferase. The cells are lysed and the FLAG-tagged proteins are captured. Any interacting partners pulled down with them are then detected by the light their luciferase tag gives off7.
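One way to turn LUMIER readings into interaction calls is sketched below. It assumes, for illustration, that luciferase activity is measured both in the FLAG pull-down and in the total lysate. It also assumes that a control pull-down (for example, with an unrelated FLAG-tagged protein) defines the background. The three-fold cut-off is a hypothetical choice.

```python
def recovered_fraction(bound_rlu, input_rlu):
    """Fraction of the luciferase-tagged partner's light recovered with the FLAG bait.

    bound_rlu: luminescence (relative light units) in the anti-FLAG pull-down.
    input_rlu: luminescence in the total lysate before capture.
    """
    return bound_rlu / input_rlu


def fold_over_background(sample, control):
    """How much more partner was recovered than in the control pull-down."""
    return sample / control


def is_interaction(sample, control, min_fold=3.0):
    """Call an interaction when recovery exceeds background by min_fold (hypothetical)."""
    return fold_over_background(sample, control) >= min_fold


control = recovered_fraction(500, 100_000)    # background binding
untreated = recovered_fraction(5_000, 100_000)
treated = recovered_fraction(1_000, 100_000)  # same pair after adding a drug

print(fold_over_background(untreated, control))  # roughly 10
print(fold_over_background(treated, control))    # roughly 2
print(is_interaction(untreated, control), is_interaction(treated, control))
```

Comparing treated and untreated samples in this way shows how a drug that disrupts a pairing would be detected: the signal falls back towards background.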
Protein-fragment complementation assays, which can be conducted in yeast as well as mammalian cells, rely on reconstituting a wide range of 'reporters', often enzymes or fluorescent proteins. Since the reporters can signal throughout the cell, interactions can be detected where they naturally occur.
In a collection of articles published in January 2009, Vidal, Wanker and others described what Vidal terms an empirical framework for assessing protein interactions found in high-throughput screens3. In practice, this means repeating experiments using different types of assay and comparing the results with sets of controls. The positive controls are a reference set of about 100 well-established interactions carefully selected from the literature. The negative controls are some 100 randomly assigned pairs that have never been observed together. Conditions of the assays are adjusted to boost detection of positive controls without raising the detection of random interactions.
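The calibration logic can be sketched as follows. It assumes that each assay gives a numerical readout per pair, and that readouts are available for both the positive reference pairs and the random pairs. The code picks the threshold that detects the most positive controls while keeping random-pair detection at or below a chosen ceiling. The readouts, thresholds and ceiling are all illustrative, not values from the framework papers.

```python
def detection_rate(readouts, threshold):
    """Fraction of pairs whose readout reaches the threshold."""
    return sum(r >= threshold for r in readouts) / len(readouts)


def choose_threshold(positive_readouts, random_readouts, thresholds, max_random_rate=0.0):
    """Return (threshold, positive_rate, random_rate) for the threshold that detects
    the most positive controls without exceeding max_random_rate on random pairs,
    or None if no threshold qualifies."""
    best = None
    for t in thresholds:
        pos = detection_rate(positive_readouts, t)
        rnd = detection_rate(random_readouts, t)
        if rnd <= max_random_rate and (best is None or pos > best[1]):
            best = (t, pos, rnd)
    return best


positive = [0.9, 0.8, 0.7, 0.4, 0.2]       # toy readouts for well-established pairs
random_pairs = [0.5, 0.1, 0.05, 0.0, 0.3]  # toy readouts for random pairs
print(choose_threshold(positive, random_pairs, thresholds=[0.3, 0.6]))  # (0.6, 0.6, 0.0)
```

The lower threshold would catch more positive controls (80% rather than 60%), but it also lets in two random pairs. The search therefore settles on the stricter setting.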
Under the framework put forward in Nature Methods8, results from interaction studies should be confirmed using different types of assay: the more methods that find an interaction, the more confident researchers can be. Still, even collectively, these assays detect only about 70% of the positive reference set (see 'Beyond binary interactions').
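A few sets show why combining assays helps. In this hypothetical example, each assay recovers a different subset of a small positive reference set. The union recovers more than any single assay, and the number of assays supporting each pair gives a crude confidence measure. The pair names and assay results are invented for illustration.

```python
from collections import Counter


def recall(detected, reference):
    """Fraction of the reference set that was detected."""
    return len(detected & reference) / len(reference)


def combined_recall(detected_by_assay, reference):
    """Recall of the union of all assays' hits."""
    union = set().union(*detected_by_assay.values())
    return recall(union, reference)


def support(detected_by_assay):
    """Number of assays that detected each pair."""
    counts = Counter()
    for hits in detected_by_assay.values():
        counts.update(hits)
    return counts


reference = {f"pair{i}" for i in range(1, 9)}  # 8 toy positive-control pairs
detected_by_assay = {
    "Y2H": {"pair1", "pair2"},
    "MAPPIT": {"pair2", "pair3"},
    "LUMIER": {"pair3", "pair4", "other"},  # 'other' is not in the reference set
}
for assay, hits in detected_by_assay.items():
    print(assay, recall(hits, reference))               # 0.25 each
print(combined_recall(detected_by_assay, reference))    # 0.5
print(support(detected_by_assay)["pair2"])              # 2
```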
Box 1: Beyond binary interactions
High-throughput experiments are not the only way to identify protein–protein interactions. Several databases, such as the Biological General Repository for Interaction Datasets (BioGRID; see thebiogrid.org) and IntAct (see www.ebi.ac.uk/intact), compile lists of interactions as they are published in the literature. These draw on both small-scale and high-throughput experiments, as well as on predicted interactions inferred from other analyses. But these lists are far from complete, says Sandra Orchard, a proteomics service coordinator at the European Bioinformatics Institute in Hinxton, UK, who helped to develop minimal information standards for sharing and evaluating interaction data. “We will be lucky if as much as 30% of the yeast interactome has been observed,” she says. For the human interactome, she estimates that the figure is less than 10%, even counting published results that are not captured in the databases.
When to believe
Biologists rely on interaction data in several ways. They often layer protein–protein interaction networks onto other networks. After identifying transcription factors that regulate a gene, for example, they search databases and the literature for those transcription factors' interaction partners. Researchers also explore how sets of proteins are connected to each other, and then ask questions based on the structure of the network, such as which proteins have the most interaction partners.
But not all interaction data are equal, warns Russell Finley, a network biologist at Wayne State University School of Medicine in Detroit, Michigan, who believes that incorporating quality measures could make the data substantially more powerful. At present, he says, savvy researchers filter out interactions unless they have been observed more than once by different methods, but these 'intuitive filters' can be biased. For example, the more often a protein is studied, the more interactions will be found for it. Finley says that a better approach would be to consider all the available data and assign each interaction a score reflecting the likelihood that it is real. Computer analyses could then take more interactions into account, giving more weight to those with higher confidence scores.
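A deliberately simple scoring scheme can illustrate Finley's suggestion. The sketch below assigns a made-up reliability to each detection method. It combines observations with a 'noisy-OR' rule, which gives the probability that at least one observation is genuine. It then weights each protein's partner count by those confidences instead of discarding low-scoring edges. Neither the reliabilities nor the combination rule come from Finley's work.

```python
from collections import defaultdict

# Hypothetical reliabilities: the chance that one observation from a method
# reflects a real interaction. Real values would need to be estimated,
# for instance against reference sets.
METHOD_RELIABILITY = {"Y2H": 0.3, "MAPPIT": 0.4, "co-IP": 0.5}


def confidence(observations):
    """Noisy-OR combination: 1 minus the probability that every observation
    is spurious, treating observations as independent."""
    p_all_spurious = 1.0
    for method in observations:
        p_all_spurious *= 1.0 - METHOD_RELIABILITY[method]
    return 1.0 - p_all_spurious


def weighted_degree(edges):
    """Sum of edge confidences per protein: a soft version of counting partners."""
    degree = defaultdict(float)
    for (a, b), observations in edges.items():
        w = confidence(observations)
        degree[a] += w
        degree[b] += w
    return dict(degree)


edges = {
    ("A", "B"): ["Y2H", "Y2H"],    # seen twice by the same method
    ("A", "C"): ["Y2H", "co-IP"],  # seen by two different methods
    ("B", "C"): ["MAPPIT"],        # seen once
}
for pair, obs in edges.items():
    print(pair, round(confidence(obs), 2))
print({p: round(d, 2) for p, d in weighted_degree(edges).items()})
```

Treating repeated observations from the same method as independent is generous, because that method's artefacts tend to recur. A more careful score would discount such repeats.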
Yeast two-hybrid assays can probe hundreds of thousands of potential protein interaction pairs a week.
But an interaction can occur and have no actual consequences. “The real question is what interactions have meaning in the first place,” says Stephen Michnick, a biochemist at the University of Montreal, Canada. “An interaction can be quite good, that is, reproducible in multiple assays, but not be biologically important.” In other words, the interaction has no discernible effects: it does not start or stop a molecular machine, activate an enzyme or send another protein to destruction.
Michnick came to these conclusions after conducting a comprehensive study that examined protein interactions in a more natural context. In his team's protein-fragment complementation assays, interacting proteins reconstituted an enzyme that the yeast needed to survive under the selective culture conditions used9. The screen identified about 3,000 new interactions, many of them involving membrane proteins and other proteins that cannot reach the cell nucleus.
But thousands of other protein interactions were observed with less confidence. “We were surprised that there were known proteins that made too many interactions or made interactions that didn't make biological sense,” Michnick recalls. “We thought we had the perfect method, and so we would get perfect results. So we thought, if we are seeing junk interactions and other people are seeing junk, what is the junk?” The answer, he believes, is that these are naturally occurring 'junk' interactions that, like sections of DNA that do not seem to have a function, simply exist.
Michnick believes that perhaps as many as half of the interactions observed even in rigorous screens have no biological function. Abundant proteins should be treated with particular scepticism. But if the same pairs of proteins are consistently found together and not with other proteins, that interaction is more likely to be real. The same is true of interactions identified across multiple species. “The parts that are functional have to be dissected from the rest of what's there,” Michnick says.
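Michnick's first rule of thumb, that a pair seen consistently together and not with others is more believable, can be expressed as a number. The sketch below uses a Jaccard-style specificity score, which is one reasonable choice assumed here rather than the measure Michnick's group used. It divides the number of times a and b were seen together by the number of observations involving either protein.

```python
def specificity(observations, a, b):
    """Jaccard-style specificity of the a-b pair, between 0 and 1.

    observations: list of (protein1, protein2) pairs, one entry per time an
    interaction was observed (repeats allowed, order ignored).
    A score of 1 means a and b were only ever seen with each other.
    """
    target = {a, b}
    together = 0     # observations of the a-b pair itself
    involving_a = 0  # observations in which a appears
    involving_b = 0  # observations in which b appears
    for p, q in observations:
        pair = {p, q}
        if pair == target:
            together += 1
        if a in pair:
            involving_a += 1
        if b in pair:
            involving_b += 1
    # Observations involving a or b, counting the shared a-b observations once.
    either = involving_a + involving_b - together
    return together / either if either else 0.0


# Toy data: A and B keep turning up together; H is a hub seen with everything.
obs = [("A", "B"), ("B", "A"), ("A", "B"), ("H", "A"), ("H", "X"), ("H", "Y"), ("H", "Z")]
print(round(specificity(obs, "A", "B"), 2))  # 0.75
print(round(specificity(obs, "H", "A"), 2))  # 0.14
```

Conservation across species could be layered on top as a further, independent line of evidence.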
Trey Ideker, a network biologist at the University of California, San Diego, is more worried that only a small fraction of all interactions has been observed at all. “It's not clear how you can shortcut to the functional interactions without some unbiased way of getting all the interactions,” he says. “We have a flashlight illuminating 20% of the yard, but the other 80% is dark.” In fact, no one yet knows how big the universe of interactions is, he says, “but everyone agrees that we are not even close to having mapped it”.
Nonetheless, more interactions have been identified than can be individually investigated. For Ideker, the best approach is to think in terms of databases. “I have this big 'gamish' of interactions, how do I best query it?”
One strategy is combining diverse data sets around focused questions. For example, Ideker decided to conduct a Y2H screen that would pull out interactions involved in the mitogen-activated protein kinase (MAPK) signalling cascade — an important drug target that regulates processes such as cell growth, differentiation and survival. Ideker and his colleagues picked 150 proteins associated with the pathway and hunted for their interaction partners using Y2H assays. This revealed more than 2,000 interactions among about 1,500 proteins.
From these they selected a dozen or so proteins that had not previously been associated with the MAPK cascade and used RNA interference to knock down their expression. In about one-third of the cases, knockdown altered gene expression within the cascade, indicating that these interactions were functional. Follow-up studies provided the first experimental evidence that a protein called NHE-1 serves as a MAPK scaffold10.
By starting with the interactions and whittling them away with other data, the researchers can uncover new biology, says Ideker. “It's the superposition of biophysical and functional data that is really going to save the day here.”
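The 'superposition' strategy can be sketched as two filters applied in turn. The first pulls out proteins that interact with known pathway members but are not yet annotated to the pathway. The second keeps only those whose knockdown shifts a pathway readout. Everything below is hypothetical: the candidate names, the knockdown effect sizes (as log2 fold changes) and the 0.5 cut-off are placeholders. MAPK1 and MAP2K1 simply stand in for annotated cascade members.

```python
def new_candidates(interactions, pathway):
    """Partners of pathway members that are not themselves annotated to the pathway."""
    candidates = set()
    for a, b in interactions:
        if a in pathway and b not in pathway:
            candidates.add(b)
        elif b in pathway and a not in pathway:
            candidates.add(a)
    return candidates


def functional_hits(candidates, knockdown_effect, min_effect=0.5):
    """Candidates whose RNAi knockdown shifted a pathway readout by at least min_effect.

    knockdown_effect: dict of protein -> log2 fold change in the readout (hypothetical).
    Candidates that were not tested are treated as having no effect.
    """
    return {p for p in candidates if abs(knockdown_effect.get(p, 0.0)) >= min_effect}


pathway = {"MAPK1", "MAP2K1"}
interactions = [
    ("MAPK1", "P1"), ("MAP2K1", "P2"), ("P3", "MAPK1"),
    ("MAPK1", "MAP2K1"),  # already-known pathway interaction
    ("P4", "P5"),         # unrelated to the pathway
]
candidates = new_candidates(interactions, pathway)
print(sorted(candidates))  # ['P1', 'P2', 'P3']
print(sorted(functional_hits(candidates, {"P1": -1.2, "P2": 0.1, "P3": 0.8})))  # ['P1', 'P3']
```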
Researchers can also glean insight from how proteins interact physically. This year, Haiyuan Yu and his colleagues at Cornell University, Ithaca, New York, showed how combining data about protein–protein interactions and protein structure could suggest how certain mutations cause disease11.
They combined several established data sets of protein–protein interactions, structural data on those interactions and genetic data. Together, these showed that disease-causing mutations that do not prevent a protein from being expressed are more likely to occur at the interface between interacting proteins than elsewhere in the protein. “For the past decade, biologists have been using this mathematical definition. Every protein is a mathematical dot. But we know that protein structure is fundamentally important for function,” says Yu.
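The kind of comparison Yu describes can be illustrated with a 2-by-2 table. It counts how many disease-causing mutations fall at an interaction interface, compared with a set of mutations not linked to disease. The counts below are invented. A one-sided Fisher's exact test is used here as a standard choice for such tables, not necessarily the statistic the Cornell team applied.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]].

    Rows: disease-causing mutations, control mutations.
    Columns: at an interaction interface, elsewhere in the protein.
    """
    return (a * d) / (b * c) if b * c else float("inf")


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment of the top-left cell.

    Under the null hypothesis, a follows a hypergeometric distribution:
    N = a+b+c+d mutations, a+c of them at interfaces, a+b disease mutations drawn.
    """
    n_total = a + b + c + d
    n_interface = a + c
    n_disease = a + b
    upper = min(n_disease, n_interface)
    tail = sum(
        comb(n_interface, k) * comb(n_total - n_interface, n_disease - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_disease)


# Invented counts: 30 of 100 disease mutations vs 10 of 100 controls at interfaces.
print(round(odds_ratio(30, 70, 10, 90), 2))   # 3.86
print(fisher_greater(30, 70, 10, 90) < 0.01)  # True
```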
Information about whether an interaction occurs in a specific cell type or under certain conditions could go a long way to revealing its function, says Anne-Claude Gavin, who studies protein complexes at the European Molecular Biology Laboratory in Heidelberg, Germany. “Interactions have to be context-dependent; they have to start at one time and stop at another.” But these studies are difficult and are rarely done. “This is a level of sophistication that we just don't understand,” she says.
To understand protein–protein interactions in context, researchers need to single them out for focused studies.
Sometimes, screening techniques can be adapted to follow particular interactions in depth. For example, complementation assays with fluorescent proteins or luciferase can be used to follow interacting proteins. Because fluorescent proteins of different colours are so similar, one protein can be tested for interactions with two or more partners in the same cell. One protein is labelled with a fragment of yellow fluorescent protein, a second with a fragment of cyan fluorescent protein, and the protein being interrogated carries a fragment common to both fluorescent proteins. The colour that appears shows which interactions are occurring and where in the cell they occur.
Complementation assays can also be multiplexed using luciferases that emit different colours. They have the advantage that the reconstituted enzyme readily breaks apart and reforms, allowing researchers to study how interactions are disrupted.
Imaging techniques such as bioluminescence resonance energy transfer (BRET) and fluorescence resonance energy transfer (FRET) can be used in living cells. They rely on proteins genetically tagged with a donor and an acceptor. When the tagged proteins come close together, energy passes from the donor to the acceptor, which then emits light at its own characteristic wavelength. Because this readout is straightforward to measure, the techniques are used in a variety of assays. Other assays label each of two proteins and then monitor whether they move together in cells.
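Resonance energy transfer reports on contact because of its steep distance dependence. For FRET, the fraction of energy transferred follows the Förster equation, E = 1 / (1 + (r/R0)^6). Here r is the donor–acceptor distance and R0, the Förster radius (typically a few nanometres), is the distance at which half the energy is transferred. BRET follows the same form. The sketch below assumes R0 = 5 nm purely for illustration.

```python
def transfer_efficiency(distance_nm, r0_nm=5.0):
    """Förster equation: fraction of donor energy passed to the acceptor.

    distance_nm: donor-acceptor separation in nanometres.
    r0_nm: Förster radius for the chosen pair (5 nm is an assumed, illustrative value).
    """
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


for r in (2.5, 5.0, 10.0):
    print(f"{r:>4} nm: {transfer_efficiency(r):.3f}")
```

Doubling the separation from R0 to 2R0 cuts transfer from 50% to under 2%. A strong signal therefore implies that the tagged proteins are in, or very near, contact.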
Although slower and more expensive than large-scale screening efforts, one-at-a-time explorations of interactions are essential, says Uetz. “Eventually you want to drill down into the actual interactions.”
1. Nature 403, 623–627 (2000).
2. Proc. Natl Acad. Sci. USA 98, 4569–4574 (2001).
3. Nature Methods 6, 83–90 (2009).
4. Nature 340, 245–246 (1989).
5. Two Hybrid Technologies: Methods and Protocols (Humana, 2012).
6. J. Proteome Res. 8, 877–886 (2009).
7. Science 307, 1621–1625 (2005).
8. Nature Methods 6, 91–97 (2009).
9. Science 320, 1465–1470 (2008).
10. Nature Methods 7, 801–805 (2010).
11. Nature Biotechnol. 30, 159–164 (2012).
12. Nature 481, 365–370 (2012).