Abstract
The genome-wide identification of pairs of interacting proteins is an important step in the elucidation of cell regulatory mechanisms [1,2]. Much of our present knowledge derives from high-throughput techniques such as the yeast two-hybrid assay and affinity purification [3], as well as from manual curation of experiments on individual systems [4]. A variety of computational approaches based, for example, on sequence homology, gene co-expression and phylogenetic profiles, have also been developed for the genome-wide inference of protein–protein interactions (PPIs) [5,6]. Yet comparative studies suggest that the development of accurate and complete repertoires of PPIs is still in its early stages [7,8,9]. Here we show that three-dimensional structural information can be used to predict PPIs with an accuracy and coverage that are superior to predictions based on non-structural evidence. Moreover, an algorithm, termed PrePPI, which combines structural information with other functional clues, is comparable in accuracy to high-throughput experiments, yielding over 30,000 high-confidence interactions for yeast and over 300,000 for human. Experimental tests of a number of predictions demonstrate the ability of the PrePPI algorithm to identify unexpected PPIs of considerable biological interest. The surprising effectiveness of three-dimensional structural information can be attributed to the use of homology models combined with the exploitation of both close and remote geometric relationships between proteins.
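The abstract does not say how structural and non-structural clues are combined. As an illustration only, the sketch below assumes a naive Bayes scheme in which each evidence source contributes a likelihood ratio and the sources are treated as independent. The function names, evidence labels, prior odds and numerical values are all hypothetical and are not taken from the paper.

```python
from math import prod


def combined_likelihood_ratio(evidence_lrs):
    """Multiply per-source likelihood ratios.

    Assumption: the evidence sources are conditionally independent,
    so their likelihood ratios can simply be multiplied.
    """
    return prod(evidence_lrs.values())


def posterior_probability(prior_odds, evidence_lrs):
    """Convert prior odds and combined evidence into a probability."""
    posterior_odds = prior_odds * combined_likelihood_ratio(evidence_lrs)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratios for one candidate protein pair.
example_lrs = {
    "structural_modeling": 50.0,
    "co_expression": 4.0,
    "phylogenetic_profile": 3.0,
}

# Hypothetical prior odds that a randomly chosen pair interacts.
prior_odds = 1.0 / 600.0

p = posterior_probability(prior_odds, example_lrs)
print(f"combined LR = {combined_likelihood_ratio(example_lrs):.1f}")
print(f"posterior probability = {p:.3f}")
```

Under this assumption, a pair would be reported as high-confidence when its combined likelihood ratio exceeds a chosen cutoff; the cutoff itself is a separate choice that this sketch does not model.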
Change history
06 March 2013
Nature 490, 556–560 (2012); doi:10.1038/nature11503 In this Letter, one of the points shown in Fig. 2 and Supplementary Figs 8, 9 and Supplementary Table 4 reflects the presence of interactions that had been erroneously deposited from a previous publication1 into the IntAct database. We have now used the MINT database to retrieve these interactions, and Fig.
References
1. Bonetta, L. Protein–protein interactions: interactome under construction. Nature 468, 851–854 (2010)
2. Vidal, M., Cusick, M. E. & Barabasi, A. L. Interactome networks and human disease. Cell 144, 986–998 (2011)
3. Shoemaker, B. A. & Panchenko, A. R. Deciphering protein–protein interactions. Part I. Experimental techniques and databases. PLOS Comput. Biol. 3, e42 (2007)
4. Reguly, T. et al. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J. Biol. 5, 11 (2006)
5. Shoemaker, B. A. & Panchenko, A. R. Deciphering protein–protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLOS Comput. Biol. 3, e43 (2007)
6. Salwinski, L. & Eisenberg, D. Computational methods of analysis of protein–protein interactions. Curr. Opin. Struct. Biol. 13, 377–382 (2003)
7. von Mering, C. et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399–403 (2002)
8. Braun, P. et al. An experimentally derived confidence score for binary protein–protein interactions. Nature Methods 6, 91–97 (2009)
9. Deane, C. M., Salwinski, L., Xenarios, I. & Eisenberg, D. Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol. Cell. Proteomics 1, 349–356 (2002)
10. Pieper, U. et al. MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 34, D291–D295 (2006)
11. Mirkovic, N., Li, Z., Parnassa, A. & Murray, D. Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization. Proteins 66, 766–777 (2007)
12. Henrick, K. & Thornton, J. M. PQS: a protein quaternary structure file server. Trends Biochem. Sci. 23, 358–361 (1998)
13. Aloy, P. & Russell, R. B. Interrogating protein interaction networks through structural biology. Proc. Natl Acad. Sci. USA 99, 5896–5901 (2002)
14. Lu, L., Lu, H. & Skolnick, J. MULTIPROSPECTOR: an algorithm for the prediction of protein–protein interactions by multimeric threading. Proteins 49, 350–364 (2002)
15. Davis, F. P. et al. Protein complex compositions predicted by structural similarity. Nucleic Acids Res. 34, 2943–2952 (2006)
16. Tuncbag, N., Gursoy, A., Guney, E., Nussinov, R. & Keskin, O. Architectures and functional coverage of protein–protein interfaces. J. Mol. Biol. 381, 785–802 (2008)
17. Zhang, Q. C., Petrey, D., Norel, R. & Honig, B. H. Protein interface conservation across structure space. Proc. Natl Acad. Sci. USA 107, 10896–10901 (2010)
18. Gao, M. & Skolnick, J. Structural space of protein–protein interfaces is degenerate, close to complete, and highly connected. Proc. Natl Acad. Sci. USA 107, 22517–22522 (2010)
19. Wass, M. N., Fuentes, G., Pons, C., Pazos, F. & Valencia, A. Towards the prediction of protein interaction partners using physical docking. Mol. Syst. Biol. 7, 469 (2011)
20. Chen, H. L. & Zhou, H. X. Prediction of interface residues in protein–protein complexes by a consensus neural network method: test against NMR data. Proteins 61, 21–35 (2005)
21. Liang, S., Zhang, C., Liu, S. & Zhou, Y. Protein binding site prediction using an empirical scoring function. Nucleic Acids Res. 34, 3698–3707 (2006)
22. Zhang, Q. C. et al. PredUs: a web server for predicting protein interfaces using structural neighbors. Nucleic Acids Res. 39, 283–287 (2011)
23. Yu, H. et al. High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110 (2008)
24. Lefebvre, C. et al. A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Mol. Syst. Biol. 6, 377 (2010)
25. Jansen, R. et al. A Bayesian networks approach for predicting protein–protein interactions from genomic data. Science 302, 449–453 (2003)
26. von Mering, C. et al. STRING: known and predicted protein–protein associations, integrated and transferred across organisms. Nucleic Acids Res. 33, D433–D437 (2005)
27. Stolovitzky, G., Prill, R. J. & Califano, A. Lessons from the DREAM2 challenges. Ann. NY Acad. Sci. 1158, 159–195 (2009)
28. Keskin, O., Nussinov, R. & Gursoy, A. PRISM: protein–protein interaction prediction by structural matching. Methods Mol. Biol. 484, 505–521 (2008)
29. Ewing, R. M. et al. Large-scale mapping of human protein–protein interactions by mass spectrometry. Mol. Syst. Biol. 3, 89 (2007)
30. Levitt, M. Nature of the protein universe. Proc. Natl Acad. Sci. USA 106, 11079–11084 (2009)
31. Apweiler, R. et al. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 32, D115–D119 (2004)
32. Letunic, I., Doerks, T. & Bork, P. SMART 6: recent updates and new developments. Nucleic Acids Res. 37, D229–D232 (2009)
33. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000)
34. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
35. Sanchez, R. & Sali, A. Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc. Natl Acad. Sci. USA 95, 13597–13602 (1998)
36. Petrey, D. & Honig, B. GRASP2: visualization, surface properties, and electrostatics of macromolecular structures and sequences. Methods Enzymol. 374, 492–509 (2003)
37. Yang, A. S. & Honig, B. An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. J. Mol. Biol. 301, 665–678 (2000)
38. Krissinel, E. & Henrick, K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372, 774–797 (2007)
39. The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nature Genet. 25, 25–29 (2000)
40. Mewes, H. W., Albermann, K., Heumann, K., Liebl, S. & Pfeiffer, F. MIPS: a database for protein sequences, homology data and yeast genome information. Nucleic Acids Res. 25, 28–30 (1997)
41. Huynen, M., Snel, B., Lathe, W., III & Bork, P. Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 10, 1204–1210 (2000)
42. Sun, L. et al. Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell 9, 287–300 (2006)
43. Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res. 39, D1005–D1010 (2011)
44. Enault, F., Suhre, K. & Claverie, J. M. Phydbac “Gene Function Predictor”: a gene annotation tool based on genomic context analysis. BMC Bioinformatics 6, 247 (2005)
Acknowledgements
This work is supported by National Institutes of Health grants GM030518 and GM094597 (B.H.), CA121852 (A.C. and B.H.), DK057539 (D.A.), CA082683 (T.H.) and R01NS043915 (T.M.). L.D. thanks the China Scholarship Council for scholarship 2010626059. We thank U. Pieper from A. Sali’s laboratory for help with ModBase, and H. Lee for help with SkyBase.
Author information
Contributions
Q.C.Z., D.P., A.C. and B.H. designed the research; Q.C.Z. performed the computational work; Q.C.Z., D.P., A.C. and B.H. analysed the data; L.D. set up the PrePPI web server; L.Q., Y.S., C.A.T. and B.B. performed co-immunoprecipitation studies; Q.C.Z., D.P., A.C. and B.H. wrote the paper, including text from C.L., D.A., T.H. and T.M.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Information
This file contains Supplementary Figures 1-16, Supplementary Tables 1-6 and additional references. Supplementary Figures 8, 9, 10C and Supplementary Table 4 were corrected on 7 March 2013; please see the Corrigendum associated with the main paper for details. (PDF 3846 kb)
About this article
Cite this article
Zhang, Q., Petrey, D., Deng, L. et al. Structure-based prediction of protein–protein interactions on a genome-wide scale. Nature 490, 556–560 (2012). https://doi.org/10.1038/nature11503
DOI: https://doi.org/10.1038/nature11503