Nature Biotechnology
22, 43 - 44 (2004)
doi:10.1038/nbt0104-43
Protein interaction maps on the fly
Peter Uetz & Michael J Pankratz
Peter Uetz and Michael J. Pankratz are at the Institute of Genetics, Forschungszentrum Karlsruhe, Box 3640, D-76021 Karlsruhe, Germany.
Correspondence should be addressed to Peter Uetz peter.uetz@itg.fzk.de
A protein interaction map for the fruit fly Drosophila melanogaster promises to facilitate functional analysis of many eukaryotic proteins.
Unraveling the ways in which proteins interact with each other on a genome-wide level is one of the main goals of proteomics research. Until now, large-scale protein interaction maps have been published only for yeast1,2,3 and bacteria4. Now, in a paper published in Science, Giot et al.5 report the largest protein interaction map to date and the first genome-wide study for a multicellular organism, the fly. Key to this work was the development of computational and statistical methodologies to extract relevant protein-protein interactions, which are presented in this issue6.
Building the interaction map required a tour de force of PCR to amplify all 14,000 predicted D. melanogaster open reading frames (ORFs), of which more than 12,000 worked successfully. These PCR products were then cloned into two-hybrid bait (protein of interest) and prey (interactors) vectors, yielding roughly 11,000 clones each. This achievement is amazing by itself, given that sequencing of the PCR products represents a serious expressed sequence tag project on its own, which may allow biologists to identify many new exons and introns if investigated further. Next, all of the clones were screened against each other using methods Giot et al. have employed for the yeast genome. The fly project, however, proceeded on a vastly greater scale, requiring isolation of 45,417 two-hybrid positive colonies, from which 35,151 prey clones were obtained and sequenced. This approach generated 10,021 protein interactions involving 4,500 proteins.
Giot et al. then went one step further, using their whole bait collection to screen a two-hybrid cDNA library prepared from embryonic, larval and adult tissues. These screens yielded another 45,962 positives, of which 31,760 clones were successfully sequenced. This more traditional approach produced 10,782 interactions involving 5,200 proteins. Altogether, Giot et al. churned out a whopping 20,405 interactions, dwarfing all previous efforts by at least four- to fivefold.
Nevertheless, two-hybrid screens are infamous for their false-positive rate; thus, one wonders about the reliability of these data. To address this issue, Giot et al. have developed fairly sophisticated statistics for the analysis of their protein network and Bader et al.6 elaborate on this work, using yeast data as a model. In brief, both papers exploit topological and other information in a network to calculate confidence scores for each protein interaction. For example, two-hybrid interactions are considered low-confidence interactions if the interacting proteins are not found in a protein complex. Similarly, if proteins that have been found in a complex are far apart from each other in a two-hybrid map, the confidence score for this interaction decreases. Giot et al. use many more such parameters, including the reproducibility of interactions, the frequency with which bait and prey show up in a screen (indicating sticky and unspecific interactions) and evolutionary conservation of an interaction when compared with the yeast data set. After optimizing their statistical model, Bader et al. validated it by correlating their calculated confidence scores to independently collected information, such as gene ontology annotation (functional classification and subcellular localization) and gene expression data. As a result of such analyses, Giot et al. estimated that the fraction of 'biologically relevant' interactions may be as low as 11% in their whole data set, but around 38% in their high-confidence subset of 4,679 proteins and 4,780 interactions.
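The idea of combining several lines of evidence into one confidence score can be sketched as follows. This is only an illustration, not the model of Giot et al. or Bader et al.: it assumes a logistic function over three of the parameters named above (reproducibility, how often bait and prey turn up, and conservation in yeast), and the class, the field names and the weights are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Interaction:
    bait: str
    prey: str
    times_observed: int       # reproducibility; assumed to be >= 1
    bait_partners: int        # number of preys found with this bait; >= 1
    prey_partners: int        # number of baits that found this prey; >= 1
    conserved_in_yeast: bool  # an equivalent interaction is known in yeast


# Hypothetical weights chosen for illustration only; a real model would
# fit them to a training set of trusted and untrusted interactions.
WEIGHTS = {
    "intercept": -1.0,
    "reproducibility": 0.8,
    "promiscuity": -0.5,
    "conservation": 1.5,
}


def confidence(ix: Interaction) -> float:
    """Map the evidence for one interaction to a score between 0 and 1."""
    z = (
        WEIGHTS["intercept"]
        + WEIGHTS["reproducibility"] * math.log(ix.times_observed)
        + WEIGHTS["promiscuity"] * math.log(ix.bait_partners * ix.prey_partners)
        + WEIGHTS["conservation"] * (1.0 if ix.conserved_in_yeast else 0.0)
    )
    return 1.0 / (1.0 + math.exp(-z))


# Made-up examples: a reproducible, conserved pair and a 'sticky' pair.
specific = Interaction("A", "B", times_observed=3, bait_partners=2,
                       prey_partners=1, conserved_in_yeast=True)
sticky = Interaction("C", "D", times_observed=1, bait_partners=40,
                     prey_partners=25, conserved_in_yeast=False)
print(confidence(specific) > confidence(sticky))
```

With these made-up weights, the reproducible and conserved pair scores higher than the pair whose bait and prey each show up with dozens of partners, which is the qualitative behavior the text describes.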
Given these quality-controlled protein interactions, what have we gained from this plethora of information? First, extrapolating from our experience with yeast, it is clear that although Giot et al. identified a huge number of interactions, they may represent only a fraction of all protein interactions in flies, which may harbor 50,000 or more. In addition, the number of essential interactions remains unresolved.
Second, the fly network has generated many novel hypotheses for biologists working on individual proteins, pathways or processes to investigate (see Fig. 1).
Figure 1. The awesome power of comparative interactomics. A comparison of interaction maps identifies evolutionarily conserved pathways in fly and yeast, such as those involving reptin and pontin, which are homologs of DNA helicases. Reptin has just one high-confidence interactor5, pontin (indicated by thick lines in the figure), whereas pontin returned two high-confidence proteins, one of which is reptin. Previous data showed reptin-pontin interactions in yeast, flies and vertebrates, and their role in chromatin remodeling8, but those interactions and complexes involved different proteins in each model. In multicellular organisms, the pair is involved in wingless/β-catenin signaling. The other high-confidence partner of pontin, Or67c, is a membrane olfactory receptor. Although yeast does not have wingless/β-catenin homologs, it does have a membrane receptor for mating factor alpha, and mutants in reptin or pontin homologs show an impaired response to alpha factor8. Despite different interaction patterns in each model, comparative interactomics suggests both additional components in the pathway and the hypothesis that the ancestral role of reptin and pontin was in response to pheromone or chemical signals. Maps are from refs. 3 and 5.
Third, 44% of all fly genes have homologs in the human genome, making D. melanogaster an excellent model system for studying normal and pathological processes; therefore, the protein interaction map should provide new avenues of investigation for medical research. In fact, Giot et al. explicitly highlight disease-related proteins in their network, such as homologs of the oncoprotein Src.
Fourth, and most importantly, bioinformatic analysis will extract a wealth of new information from the fly and other protein networks. Comparing interaction networks of different model organisms is not only a powerful way to validate inherently unreliable data; it also allows systems biologists to identify conserved pathways and complexes7. Comparative analysis has already shown how proteins and their interactions evolve and, together with data from structural genomics, will tell us which residues in a three-dimensional structure are necessary for interactions and therefore function. In addition, interaction networks can be harnessed to predict protein interaction domains and motifs, and identify connections between pathways and processes that we would not have imagined otherwise. Topological analysis of protein networks also gives us an idea of why living systems are as robust to perturbations as they are and what complexity means in molecular terms.
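One simple form of such a cross-species comparison is to ask which fly interactions have a counterpart between the homologous proteins in yeast. The sketch below assumes that a homolog table is already available; the function name, the yeast protein labels and the toy data are hypothetical and serve only to show the lookup.

```python
def conserved_interactions(fly_edges, yeast_edges, fly_to_yeast):
    """Return the fly interactions whose homologs also interact in yeast.

    fly_edges, yeast_edges: iterables of (protein, protein) pairs.
    fly_to_yeast: dict mapping a fly protein to its yeast homolog.
    """
    # Interactions are undirected, so store each yeast pair as a frozenset.
    yeast_pairs = {frozenset(edge) for edge in yeast_edges}
    conserved = []
    for a, b in fly_edges:
        ya, yb = fly_to_yeast.get(a), fly_to_yeast.get(b)
        if ya is None or yb is None:
            continue  # no known homolog, so nothing to compare
        if frozenset((ya, yb)) in yeast_pairs:
            conserved.append((a, b))
    return conserved


# Toy data; the yeast names are placeholders, not real gene names.
fly_edges = [("reptin", "pontin"), ("pontin", "Or67c")]
yeast_edges = [("yeast_pontin_homolog", "yeast_reptin_homolog")]
fly_to_yeast = {
    "pontin": "yeast_pontin_homolog",
    "reptin": "yeast_reptin_homolog",
}

print(conserved_interactions(fly_edges, yeast_edges, fly_to_yeast))
```

In this toy case only the reptin-pontin pair is reported, because Or67c has no entry in the homolog table. A real comparison would also have to handle proteins with several homologs and interactions of uncertain quality.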
Nevertheless, a great deal remains to be done. Most interactions have yet to be discovered, although we have only a few clues as to how many there are. Two-hybrid interactions need to be verified by other methods, such as mass spectrometry analysis or systematic subcellular localization experiments. Interactions between proteins that are not localized to the same subcellular compartment or expressed in the same cell type are not likely to be biologically relevant. Finally, for a full understanding of how a cell works, we need the structures of all components at atomic resolution. Structural genomics has a long way to go here.
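The localization argument above can be turned into a simple filter. The following is a minimal sketch that assumes each protein carries a set of annotated compartments; the function name and the example proteins are hypothetical, and keeping pairs with missing annotation is a design choice of this sketch, not something the text prescribes.

```python
def plausible_interactions(edges, compartments):
    """Keep interactions whose partners share an annotated compartment.

    edges: iterable of (protein, protein) pairs.
    compartments: dict mapping a protein to a set of compartment names.
    Pairs in which either protein lacks annotation are kept, because
    missing data is not evidence against an interaction.
    """
    kept = []
    for a, b in edges:
        loc_a, loc_b = compartments.get(a), compartments.get(b)
        if loc_a is None or loc_b is None or loc_a & loc_b:
            kept.append((a, b))
    return kept


# Hypothetical proteins and annotations, for illustration only.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
compartments = {
    "P1": {"nucleus"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"mitochondrion"},
}

print(plausible_interactions(edges, compartments))
```

Here the P1-P3 pair is dropped because the two proteins share no compartment, whereas P2-P4 is retained because P4 has no annotation at all. The same pattern would apply to cell-type expression data.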
But this wish list is mostly directed towards experimentalists. With more and more functional genomics data, bioinformatic analysis increasingly becomes a limiting factor. All these data sources have to be integrated so that a click on a protein in an interaction map pulls up more than a protein's structure or a multiple alignment of its relatives. Context-sensitive menus should pop up to give us a range of options to access a whole knowledge base of data. In fact, protein interaction maps are still static maps of squares and circles instead of realistic representations of what happens dynamically in a cell in four dimensions. So, experimentalists and programmers, roll up your sleeves and prepare for the exciting dawn of systems biology.
REFERENCES
1. Uetz, P. et al. Nature 403, 623–627 (2000).
2. Ito, T. et al. Proc. Natl. Acad. Sci. USA 98, 4569–4574 (2001).
3. Schwikowski, B., Uetz, P. & Fields, S. Nat. Biotechnol. 18, 1257–1261 (2000).
4. Rain, J.C. et al. Nature 409, 211–216 (2001).
5. Giot, L. et al. Science 302, 1727–1736 (2003).
6. Bader, J.S., Chaudhuri, A. & Chant, J. Nat. Biotechnol. 22, 78–85 (2004).
7. Kelley, B.P. et al. Proc. Natl. Acad. Sci. USA 100, 11394–11399 (2003).
8. Jónsson, Z.O. et al. J. Biol. Chem. 276, 16279–16288 (2001).