Protein-protein interactions (PPIs) are useful for understanding signaling cascades, predicting protein function, associating proteins with disease and fathoming drug mechanism of action. Currently, only ∼10% of human PPIs may be known, and about one-third of human proteins have no known interactions. We introduce FpClass, a data mining–based method for proteome-wide PPI prediction. At an estimated false discovery rate of 60%, we predicted 250,498 PPIs among 10,531 human proteins; 10,647 PPIs involved 1,089 proteins without known interactions. We experimentally tested 233 high- and medium-confidence predictions and validated 137 interactions, including seven novel putative interactors of the tumor suppressor p53. Compared to previous PPI prediction methods, FpClass achieved better agreement with experimentally detected PPIs. We provide an online database of annotated PPI predictions (http://ophid.utoronto.ca/fpclass/) and the prediction software (http://www.cs.utoronto.ca/~juris/data/fpclass/).
Cusick, M.E. et al. Literature-curated protein interaction datasets. Nat. Methods 6, 39–46 (2009).
Pastrello, C. et al. Integration, visualization and analysis of human interactome. Biochem. Biophys. Res. Commun. 445, 757–773 (2014).
Bork, P. et al. Protein interaction networks from yeast to human. Curr. Opin. Struct. Biol. 14, 292–299 (2004).
Stumpf, M.P. et al. Estimating the size of the human interactome. Proc. Natl. Acad. Sci. USA 105, 6959–6964 (2008).
Venkatesan, K. et al. An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90 (2009).
Edwards, A.M. et al. Too many roads not taken. Nature 470, 163–165 (2011).
Braun, P. et al. An experimentally derived confidence score for binary protein-protein interactions. Nat. Methods 6, 91–97 (2009).
Brückner, A., Polge, C., Lentze, N., Auerbach, D. & Schlattner, U. Yeast two-hybrid, a powerful tool for systems biology. Int. J. Mol. Sci. 10, 2763–2788 (2009).
Wodak, S.J., Pu, S., Vlasblom, J. & Séraphin, B. Challenges and rewards of interaction proteomics. Mol. Cell. Proteomics 8, 3–18 (2009).
Schwartz, A.S., Yu, J., Gardenour, K.R., Finley, R.L. Jr. & Ideker, T. Cost-effective strategies for completing the interactome. Nat. Methods 6, 55–61 (2009).
Rhodes, D.R. et al. Probabilistic model of the human protein-protein interaction network. Nat. Biotechnol. 23, 951–959 (2005).
Scott, M.S. & Barton, G.J. Probabilistic prediction and ranking of human protein-protein interactions. BMC Bioinformatics 8, 239 (2007).
Kim, J.H. & Pearl, J. in Proc. IJCAI 190–193 (Morgan Kaufmann, 1983).
Petschnigg, J. et al. The mammalian-membrane two-hybrid assay (MaMTH) for probing membrane-protein interactions in human cells. Nat. Methods 11, 585–592 (2014).
Elefsinioti, A. et al. Large-scale de novo prediction of physical protein-protein association. Mol. Cell. Proteomics 10, M111.010629 (2011).
Zhang, Q.C. et al. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 490, 556–560 (2012).
D'haeseleer, P. & Church, G.M. in Proc. IEEE Comput. Syst. Bioinform. Conf. 216–223 (IEEE, 2004).
Kang, H.S. et al. NABP1, a novel RORg-regulated gene encoding a single-stranded nucleic-acid-binding protein. Biochem. J. 397, 89–99 (2006).
Krokeide, S.Z. et al. Human NEIL3 is mainly a monofunctional DNA glycosylase removing spiroimindiohydantoin and guanidinohydantoin. DNA Repair (Amst.) 12, 1159–1164 (2013).
Menendez, D., Inga, A. & Resnick, M.A. The expanding universe of p53 targets. Nat. Rev. Cancer 9, 724–737 (2009).
Wang, W. et al. Identification of rare DNA variants in mitochondrial disorders with improved array-based sequencing. Nucleic Acids Res. 39, 44–58 (2011).
Vaseva, A.V. & Moll, U.M. The mitochondrial p53 pathway. Biochim. Biophys. Acta 1787, 414–420 (2009).
Gordon, S., Akopyan, G., Garban, H. & Bonavida, B. Transcription factor YY1: structure, function, and therapeutic implications in cancer biology. Oncogene 25, 1125–1142 (2006).
Tanikawa, C. et al. Regulation of protein citrullination through p53/PADI4 network in DNA damage response. Cancer Res. 69, 8761–8769 (2009).
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Hunter, S. et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D211–D215 (2009).
Imming, P., Sinning, C. & Meyer, A. Drugs, their targets and the nature and number of drug targets. Nat. Rev. Drug Discov. 5, 821–834 (2006).
Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101, 6062–6067 (2004).
Roth, R.B. et al. Gene expression analyses reveal molecular relationships among 20 regions of the human CNS. Neurogenetics 7, 67–80 (2006).
Wang, E.T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
Krupp, M. et al. RNA-Seq Atlas—a reference database for gene expression profiling in normal tissue by next-generation sequencing. Bioinformatics 28, 1184–1185 (2012).
Uhlen, M. et al. Towards a knowledge-based Human Protein Atlas. Nat. Biotechnol. 28, 1248–1250 (2010).
The UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38, D142–D148 (2010).
Brown, K.R. & Jurisica, I. Unequal evolutionary conservation of human protein interactions in interologous networks. Genome Biol. 8, R95 (2007).
Piccinin, S. et al. A “twist box” code of p53 inactivation: twist box: p53 interaction promotes p53 degradation. Cancer Cell 22, 404–415 (2012).
Hupp, T.R., Hayward, R.L. & Vojtesek, B. Strategies for p53 reactivation in human sarcoma. Cancer Cell 22, 283–285 (2012).
Sprinzak, E. & Margalit, H. Correlated sequence-signatures as markers of protein-protein interaction. J. Mol. Biol. 311, 681–692 (2001).
Zhang, Y. et al. Systematic analysis, comparison, and integration of disease based human genetic association data and mouse genetic phenotypic information. BMC Med. Genomics 3, 1 (2010).
Osborne, J.D. et al. Annotating the human genome with Disease Ontology. BMC Genomics 10 (suppl. 1), S6 (2009).
Davis, A.P. et al. The Comparative Toxicogenomics Database: update 2011. Nucleic Acids Res. 39, D1067–D1072 (2011).
Kotlyar, M., Fortney, K. & Jurisica, I. Network-based characterization of drug-regulated genes, drug targets, and toxicity. Methods 57, 499–507 (2012).
Maglott, D., Ostell, J., Pruitt, K.D. & Tatusova, T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 35, D26–D31 (2007).
Hedges, S.B., Dudley, J. & Kumar, S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22, 2971–2972 (2006).
Toll-Riera, M. et al. Origin of primate orphan genes: a comparative genomics approach. Mol. Biol. Evol. 26, 603–612 (2009).
Barshir, R. et al. The TissueNet database of human tissue protein-protein interactions. Nucleic Acids Res. 41, D841–D844 (2013).
Birzele, F., Gewehr, J.E. & Zimmer, R. AutoPSI: a database for automatic structural classification of protein sequences and structures. Nucleic Acids Res. 36, D398–D401 (2008).
Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F. & Jones, D.T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. 337, 635–645 (2004).
This research was supported by grants from Genome Canada via the Ontario Genomics Institute, Ontario Research Fund (GL2-01-030, RE-03-020 to I.J.), the Canadian Institutes of Health Research (#99745, #93579 to I.J., A.J.), the Natural Sciences Research Council (#203475 to I.J.), US Army Department of Defense W81XWH-12-1-0501 (to I.J.), the Italian Association for Cancer Research, the Friuli Venezia-Giulia and CRO 5xmille Intramural Grant (to R.M.), the Friuli Venezia-Giulia Exchange Program (to C.P.), the Ontario Genomics Institute (#303547 to I.S.), the Canadian Institutes of Health Research (Catalyst-NHG99091, PPP-125785 to I.S.), the Canadian Cystic Fibrosis Foundation (#300348 to I.S.), the Canadian Cancer Society (2010-700406 to I.S.), Genentech and University Health Network (GL2-01-018 to I.S.), US National Institutes of Health (NIH) PO1/PPG grant 01CA0099031 (to G.B.M., I.J.) and NCI R21 CA126700 (to Z.D., G.B.M.). Computational resources were supported by grants from the Canada Foundation for Innovation (CFI #12301, #203373, #29272, #22540a, #30865) and IBM (to I.J.). I.J. is supported by the Canada Research Chair program. This research was also supported by the University of Toronto McLaughlin Centre and the Ontario Ministry of Health and Long-Term Care (OMOHLTC). The views expressed do not necessarily reflect those of the OMOHLTC. We thank M. Vidal, D. Hill, F. Roth and the Center for Cancer Systems Biology (Dana-Farber Cancer Institute) for prepublication release of protein interaction data, funded by NIH NHGRI grant R01 HG001715.
The authors declare no competing financial interests.
Integrated supplementary information
(a-b) We used the approach of D'haeseleer and Church (ref. 1) to estimate FDR. This approach calculates the FDR of a PPI dataset, D, by analyzing intersections among three PPI datasets, D, R, and D′, where R is a reference set of trusted PPIs and D′ is a set of PPIs from a method similar to that of D. It is assumed that the overlap of any two datasets contains largely true positive PPIs. The number of non-overlapping true positives, IV, is calculated from the numbers of shared PPIs: IV = (II × III) / I. Then, the number of false positives, V, and the FDR are calculated. The FDR tends to be low if D has a high overlap with either D′ or R.
(c) To calculate the FDR of FpClass, we initially set D to our top 35,000 proteome-wide predictions, excluding any PPIs used in training. We subsequently calculated FDR for larger sets of FpClass predictions (panels d-g). We defined R as a set of experimentally detected interactions and D′ as the union of high-confidence predictions from previous studies by Rhodes et al., 2005, Scott et al., 2007, Elefsinioti et al., 2011, and Zhang et al., 2012 (refs. 2–5). Using a similar approach, we calculated FDRs for high-confidence predictions from these previous studies. For example, to calculate the FDR for Rhodes et al., we defined D as high-confidence predictions from that study, and D′ as the union of top FpClass predictions and high-confidence predictions from the three remaining previous studies. To ensure that estimated FDRs were not due to biases of a particular reference set, we repeated FDR calculations using six reference sets. We calculated FDRs using each reference set, except when the intersection of datasets D, D′, and R comprised fewer than 5 PPIs. In such cases the FDR is indicated as NA.
(d-g) Using the approach of D'haeseleer and Church, we estimated FDRs of predicted networks of various sizes from FpClass and four previous prediction methods. The approach of D'haeseleer and Church requires a trusted reference set of PPIs. We tried four ways of defining this set: (d) using six reference sets (panel c) individually, and then calculating the median of the six resulting FDR estimates, (e) using the union of PPIs from methods that detect direct interactions (Y2H and LUMIER reference sets), (f) using the union of our six reference sets, and (g) using the union of Y2H reference sets.
1. D'haeseleer, P. & Church, G. M. Estimating and improving protein interaction error rates. Proc IEEE Comput Syst Bioinform Conf 216–223 (2004).
2. Rhodes, D. R. et al. Probabilistic model of the human protein-protein interaction network. Nat Biotechnol 23, 951–959 (2005).
3. Scott, M. S. & Barton, G. J. Probabilistic prediction and ranking of human protein-protein interactions. BMC Bioinformatics 8, 239 (2007).
4. Elefsinioti, A. et al. Large-scale de novo prediction of physical protein-protein association. Mol Cell Proteomics 10, M111.010629 (2011).
5. Zhang, Q. C. et al. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 490, 556–560 (2012).
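The three-dataset FDR estimate described in panels a-b can be sketched in Python. This is a minimal illustration and not the FpClass code. The legend does not state which Venn regions the labels I to V denote, so the sketch assumes that I is the overlap of D, D′ and R, II is the part of D shared only with R, III is the part of D shared only with D′, and IV and V are the true and false positives among PPIs found only in D. It also assumes FDR = V / |D|. The function names and the clamping of IV to the size of the D-only region are hypothetical choices made for the example.

```python
from typing import Iterable, Optional, Set, Tuple

Pair = Tuple[str, str]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Treat PPIs as unordered pairs so (A, B) and (B, A) match."""
    return {tuple(sorted(p)) for p in pairs}


def estimate_fdr(d: Iterable[Pair],
                 d_prime: Iterable[Pair],
                 r: Iterable[Pair],
                 min_triple_overlap: int = 5) -> Optional[float]:
    """Estimate the FDR of dataset D from its overlaps with D' and R.

    Returns None (reported as NA in the legend) when the overlap of all
    three datasets has fewer than `min_triple_overlap` PPIs.
    """
    d, d_prime, r = normalize(d), normalize(d_prime), normalize(r)

    region_i = len(d & d_prime & r)        # assumed: shared by all three
    if region_i < min_triple_overlap:
        return None

    region_ii = len((d & r) - d_prime)     # assumed: D and R only
    region_iii = len((d & d_prime) - r)    # assumed: D and D' only
    d_only = len(d - d_prime - r)          # PPIs unique to D

    # IV = (II x III) / I, clamped so it cannot exceed the D-only region.
    region_iv = min(d_only, region_ii * region_iii / region_i)
    region_v = d_only - region_iv          # estimated false positives

    return region_v / len(d)
```

With these assumptions, a larger overlap of D with either R or D′ raises II or III, which raises IV and lowers the estimated FDR, matching the behavior described in the legend.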
(a) Predicted interactions tested by Co-IP assays. (b-c) Predicted interactions tested by GST pull-down assays. (d) Predicted interaction partners of p53 include some of its known partners and d0 proteins (proteins without experimentally detected interactions). The x-axis indicates the number of top predicted partners, ranked from 1 to 2377. The y-axis indicates the number of known partners and d0 proteins among the top predicted partners.
(a-c) GO analysis includes genes without GO annotations. (d-f) GO analysis excludes genes without GO annotations. P-values were calculated by hypergeometric tests and adjusted for multiple testing using FDR.
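The enrichment statistics named in this legend can be illustrated with a short, standard-library Python sketch. It is not the authors' pipeline. The legend says only that P-values were "adjusted for multiple testing using FDR", so the Benjamini-Hochberg procedure used here is an assumption, as are the one-sided (upper-tail) test, the function names and the parameter names.

```python
from math import comb
from typing import List


def hypergeom_pvalue(k: int, population: int, successes: int, draws: int) -> float:
    """Upper-tail hypergeometric P-value, P(X >= k).

    population: number of genes in the background
    successes:  background genes carrying the annotation
    draws:      genes in the tested set
    k:          tested genes carrying the annotation
    """
    total = comb(population, draws)
    upper = min(draws, successes)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, one P-value would be computed per GO term with `hypergeom_pvalue`, and the full list passed once to `benjamini_hochberg`.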
Supplementary Figure 4 Percentages of d0 proteins in drug-target classes and structural properties of d0 proteins.
(a) Main drug target classes and (b) receptor drug target classes, as defined by Imming et al. (ref. 6). Dashed lines indicate the percentage of d0 proteins in the proteome. P-values were calculated by hypergeometric tests and adjusted for multiple testing using FDR. (c) SCOP structural classes (ref. 7). P-values were calculated by hypergeometric tests and adjusted for multiple testing using FDR. (d) Protein lengths from UniProt (ref. 8) and (e) protein disorder, predicted with DISOPRED (ref. 9). P-values for protein length and disorder were calculated by two-sided Mann-Whitney U tests.
6. Imming, P., Sinning, C. & Meyer, A. Drugs, their targets and the nature and number of drug targets. Nat Rev Drug Discov 5, 821–834 (2006).
7. Andreeva, A. et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res 36, D419–25 (2008).
8. The UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res 38, D142–8 (2010).
9. Ward, J. J., Sodhi, J. S., McGuffin, L. J., Buxton, B. F. & Jones, D. T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol 337, 635–645 (2004).
P-values were calculated by two-sided Mann-Whitney U tests. (a-d) Median expression of d0 and dk genes in healthy human tissues. Gene expression data were taken from (a) Su et al., 2004 (ref. 10), (b) Roth et al., 2006 (ref. 11), (c) Wang et al., 2008 (ref. 12), and (d) Krupp et al., 2012 (ref. 13). (e-h) Maximum expression of d0 and dk genes in the same datasets.
10. Su, A. I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A 101, 6062–6067 (2004).
11. Roth, R. B. et al. Gene expression analyses reveal molecular relationships among 20 regions of the human CNS. Neurogenetics 7, 67–80 (2006).
12. Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
13. Krupp, M. et al. RNA-Seq Atlas--a reference database for gene expression profiling in normal tissue by next-generation sequencing. Bioinformatics 28, 1184–1185 (2012).
Supplementary Figures 1–5, Supplementary Tables 1–10 and Supplementary Note (PDF 2171 kb)
FpClass code (ZIP 239668 kb)
Positive cases in our largest training set. (TXT 279 kb)
Predicted probabilities for protein pairs from Braun et al., 2009 (TXT 8 kb)
Cross-validation data: protein pairs and predicted probabilities. (TXT 48197 kb)
Experimentally tested predictions. (TXT 12 kb)
Fp60 network: predicted interactions with estimated FDR of 60%. (TXT 8784 kb)
D0 proteins: human proteins without experimentally detected interactions in I2D 1.95. (TXT 49 kb)
About this article
Cite this article
Kotlyar, M., Pastrello, C., Pivetta, F. et al. In silico prediction of physical protein interactions and characterization of interactome orphans. Nat Methods 12, 79–84 (2015). https://doi.org/10.1038/nmeth.3178