Letter

Structure-based prediction of protein–protein interactions on a genome-wide scale

Nature volume 490, pages 556–560 (25 October 2012)

  • A Corrigendum to this article was published on 06 March 2013

Abstract

The genome-wide identification of pairs of interacting proteins is an important step in the elucidation of cell regulatory mechanisms [1,2]. Much of our present knowledge derives from high-throughput techniques such as the yeast two-hybrid assay and affinity purification [3], as well as from manual curation of experiments on individual systems [4]. A variety of computational approaches, based for example on sequence homology, gene co-expression and phylogenetic profiles, have also been developed for the genome-wide inference of protein–protein interactions (PPIs) [5,6]. Yet comparative studies suggest that the development of accurate and complete repertoires of PPIs is still in its early stages [7,8,9]. Here we show that three-dimensional structural information can be used to predict PPIs with an accuracy and coverage that are superior to predictions based on non-structural evidence. Moreover, an algorithm, termed PrePPI, which combines structural information with other functional clues, is comparable in accuracy to high-throughput experiments, yielding over 30,000 high-confidence interactions for yeast and over 300,000 for human. Experimental tests of a number of predictions demonstrate the ability of the PrePPI algorithm to identify unexpected PPIs of considerable biological interest. The surprising effectiveness of three-dimensional structural information can be attributed to the use of homology models combined with the exploitation of both close and remote geometric relationships between proteins.
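The abstract describes PrePPI as combining structural evidence with other functional clues into a single confidence estimate. One standard way to combine such evidence, in the spirit of the Bayesian approach cited as ref. 25, is to treat each source as conditionally independent and multiply the per-source likelihood ratios (naive Bayes). The Python sketch below illustrates only that general idea; it is not the PrePPI implementation, and the function names, evidence labels, likelihood-ratio values and prior odds are hypothetical placeholders, not figures from this Letter.

```python
import math
from typing import Mapping


def combined_likelihood_ratio(source_lrs: Mapping[str, float]) -> float:
    """Combine per-source likelihood ratios under a naive Bayes assumption.

    Each value is LR = P(evidence | interacting) / P(evidence | not interacting).
    If the sources are conditionally independent given the class, the
    combined LR is the product of the individual LRs.
    """
    if not source_lrs:
        raise ValueError("at least one evidence source is required")
    if any(lr <= 0 for lr in source_lrs.values()):
        raise ValueError("likelihood ratios must be positive")
    # Sum log-ratios instead of multiplying directly, to avoid overflow/underflow.
    return math.exp(sum(math.log(lr) for lr in source_lrs.values()))


def posterior_probability(combined_lr: float, prior_odds: float) -> float:
    """Bayes' rule in odds form: posterior odds = LR * prior odds."""
    posterior_odds = combined_lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical evidence for one candidate protein pair (illustrative values only).
    evidence = {
        "structural_model": 40.0,
        "coexpression": 3.0,
        "shared_go_terms": 5.0,
    }
    lr = combined_likelihood_ratio(evidence)
    # Hypothetical prior odds that a randomly chosen pair interacts.
    p = posterior_probability(lr, prior_odds=1 / 600)
    print(f"combined LR = {lr:.1f}, posterior P(interaction) = {p:.3f}")
```

Working in log space keeps the product numerically stable when many sources are combined. The independence assumption is the main simplification: sources such as co-expression and shared annotations can be correlated, which would make a simple product overconfident.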

References

  1. Protein–protein interactions: interactome under construction. Nature 468, 851–854 (2010)

  2. Interactome networks and human disease. Cell 144, 986–998 (2011)

  3. Deciphering protein–protein interactions. Part I. Experimental techniques and databases. PLOS Comput. Biol. 3, e42 (2007)

  4. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J. Biol. 5, 11 (2006)

  5. Deciphering protein–protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLOS Comput. Biol. 3, e43 (2007)

  6. Computational methods of analysis of protein–protein interactions. Curr. Opin. Struct. Biol. 13, 377–382 (2003)

  7. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399–403 (2002)

  8. An experimentally derived confidence score for binary protein–protein interactions. Nature Methods 6, 91–97 (2009)

  9. Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol. Cell. Proteomics 1, 349–356 (2002)

  10. MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 34, D291–D295 (2006)

  11. Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization. Proteins 66, 766–777 (2007)

  12. PQS: a protein quaternary structure file server. Trends Biochem. Sci. 23, 358–361 (1998)

  13. Interrogating protein interaction networks through structural biology. Proc. Natl Acad. Sci. USA 99, 5896–5901 (2002)

  14. MULTIPROSPECTOR: an algorithm for the prediction of protein–protein interactions by multimeric threading. Proteins 49, 350–364 (2002)

  15. Protein complex compositions predicted by structural similarity. Nucleic Acids Res. 34, 2943–2952 (2006)

  16. Architectures and functional coverage of protein–protein interfaces. J. Mol. Biol. 381, 785–802 (2008)

  17. Protein interface conservation across structure space. Proc. Natl Acad. Sci. USA 107, 10896–10901 (2010)

  18. Structural space of protein–protein interfaces is degenerate, close to complete, and highly connected. Proc. Natl Acad. Sci. USA 107, 22517–22522 (2010)

  19. Towards the prediction of protein interaction partners using physical docking. Mol. Syst. Biol. 7, 469 (2011)

  20. Prediction of interface residues in protein–protein complexes by a consensus neural network method: test against NMR data. Proteins 61, 21–35 (2005)

  21. Protein binding site prediction using an empirical scoring function. Nucleic Acids Res. 34, 3698–3707 (2006)

  22. PredUs: a web server for predicting protein interfaces using structural neighbors. Nucleic Acids Res. 39, 283–287 (2011)

  23. High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110 (2008)

  24. A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Mol. Syst. Biol. 6, 377 (2010)

  25. A Bayesian networks approach for predicting protein–protein interactions from genomic data. Science 302, 449–453 (2003)

  26. STRING: known and predicted protein–protein associations, integrated and transferred across organisms. Nucleic Acids Res. 33, D433–D437 (2005)

  27. Lessons from the DREAM2 challenges. Ann. NY Acad. Sci. 1158, 159–195 (2009)

  28. PRISM: protein–protein interaction prediction by structural matching. Methods Mol. Biol. 484, 505–521 (2008)

  29. Large-scale mapping of human protein–protein interactions by mass spectrometry. Mol. Syst. Biol. 3, 89 (2007)

  30. Nature of the protein universe. Proc. Natl Acad. Sci. USA 106, 11079–11084 (2009)

  31. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 32, D115–D119 (2004)

  32. SMART 6: recent updates and new developments. Nucleic Acids Res. 37, D229–D232 (2009)

  33. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000)

  34. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)

  35. Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc. Natl Acad. Sci. USA 95, 13597–13602 (1998)

  36. GRASP2: visualization, surface properties, and electrostatics of macromolecular structures and sequences. Methods Enzymol. 374, 492–509 (2003)

  37. An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. J. Mol. Biol. 301, 665–678 (2000)

  38. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372, 774–797 (2007)

  39. Gene ontology: tool for the unification of biology. Nature Genet. 25, 25–29 (2000)

  40. MIPS: a database for protein sequences, homology data and yeast genome information. Nucleic Acids Res. 25, 28–30 (1997)

  41. Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 10, 1204–1210 (2000)

  42. Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell 9, 287–300 (2006)

  43. NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res. 39, D1005–D1010 (2011)

  44. Phydbac “Gene Function Predictor”: a gene annotation tool based on genomic context analysis. BMC Bioinformatics 6, 247 (2005)

Acknowledgements

This work is supported by National Institutes of Health grants GM030518 and GM094597 (B.H.), CA121852 (A.C. and B.H.), DK057539 (D.A.), CA082683 (T.H.), R01NS043915 (T.M.). L.D. thanks the China Scholarship Council scholarship 2010626059. We thank U. Pieper from A. Sali’s laboratory for help with ModBase, and H. Lee for help with SkyBase.

Author information

Author notes

Qiangfeng Cliff Zhang and Donald Petrey contributed equally to this work.

Affiliations

  1. Howard Hughes Medical Institute, Columbia University, New York, New York 10032, USA
     Qiangfeng Cliff Zhang, Donald Petrey & Barry Honig

  2. Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
     Qiangfeng Cliff Zhang, Donald Petrey, Lei Deng, Chan Aye Thu, Tom Maniatis, Andrea Califano & Barry Honig

  3. Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, New York 10032, USA
     Qiangfeng Cliff Zhang, Donald Petrey, Lei Deng, Brygida Bisikirska, Celine Lefebvre, Andrea Califano & Barry Honig

  4. Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
     Lei Deng

  5. Naomi Berrie Diabetes Center, Department of Medicine, College of Physicians & Surgeons of Columbia University, New York, New York 10032, USA
     Li Qiang & Domenico Accili

  6. Molecular and Cell Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, California 92037, USA
     Yu Shi & Tony Hunter

  7. Institute of Cancer Genetics, Columbia University, New York, New York 10032, USA
     Celine Lefebvre & Andrea Califano

  8. Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA
     Andrea Califano

Authors

Qiangfeng Cliff Zhang, Donald Petrey, Lei Deng, Li Qiang, Yu Shi, Chan Aye Thu, Brygida Bisikirska, Celine Lefebvre, Domenico Accili, Tony Hunter, Tom Maniatis, Andrea Califano & Barry Honig

Contributions

Q.C.Z., D.P., A.C. and B.H. designed the research; Q.C.Z. performed the computational work; Q.C.Z., D.P., A.C. and B.H. analysed the data; L.D. set up the PrePPI web server; L.Q., Y.S., C.A.T. and B.B. performed co-immunoprecipitation studies; Q.C.Z., D.P., A.C. and B.H. wrote the paper, including text from C.L., D.A., T.H. and T.M.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Andrea Califano or Barry Honig.

Supplementary information

Supplementary Information (PDF file)

This file contains Supplementary Figures 1–16, Supplementary Tables 1–6 and additional references. Supplementary Figures 8, 9 and 10C and Supplementary Table 4 were corrected on 7 March 2013; please see the Corrigendum associated with the main paper for details.

About this article

DOI

https://doi.org/10.1038/nature11503