We performed a systematic, large-scale analysis of human protein complexes containing gene products implicated in many different categories of human disease, and used it to create a phenome-interactome network. To build this network, we integrated quality-controlled interactions between human proteins with a validated, computationally derived phenotype similarity score. This allowed us to identify previously unknown complexes likely to be associated with disease. Using a phenomic ranking of disease-linked protein complexes, we developed a Bayesian predictor that correctly ranks the known disease-causing protein as the top candidate in 298 of 669 linkage intervals. In 870 intervals with no identified disease-causing gene, the predictor provides novel candidates implicated in disorders such as retinitis pigmentosa, epithelial ovarian cancer, inflammatory bowel disease, amyotrophic lateral sclerosis, Alzheimer disease, type 2 diabetes and coronary heart disease. Our publicly available draft of pathology-associated protein complexes comprises 506 complexes. These complexes reveal functional relationships between disease-promoting genes that will inform future experimentation.
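The supplementary material listed for this article indicates that the phenotype similarity score uses tf-idf term weighting and a cosine similarity measure. The snippet below is a minimal sketch of that general technique, not the authors' scoring pipeline. It assumes each phenotype record has already been reduced to a list of vocabulary terms. The record IDs and terms are made-up toy data, and the plain `count * log(N / df)` weighting is one common tf-idf variant, which may differ from the one used in the study.

```python
import math
from collections import Counter


def tfidf_vectors(records):
    """Turn each record (a list of terms) into a sparse tf-idf vector (dict term -> weight)."""
    n = len(records)
    # Document frequency: number of records that contain each term at least once.
    df = Counter(term for terms in records.values() for term in set(terms))
    vectors = {}
    for rec_id, terms in records.items():
        tf = Counter(terms)
        # Weight = raw term count * log(N / df); a term present in every record gets weight 0.
        vectors[rec_id] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy records: two retinal phenotypes and one unrelated bowel phenotype.
records = {
    "rec_A": ["retina", "night_blindness", "photoreceptor", "degeneration"],
    "rec_B": ["retina", "photoreceptor", "degeneration", "cataract"],
    "rec_C": ["colon", "inflammation", "diarrhea"],
}
vecs = tfidf_vectors(records)
print(cosine(vecs["rec_A"], vecs["rec_B"]))  # shared terms, so a score between 0 and 1
print(cosine(vecs["rec_A"], vecs["rec_C"]))  # no shared terms, so 0.0
```

Cosine similarity normalises for record length, so a long, detailed record is not automatically judged more similar to everything else. The idf factor down-weights terms that appear in many records and therefore carry little discriminating information.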
The authors wish to thank Ulrik de Lichtenberg and Thomas Skøt Jensen for critical reading of the manuscript, editing and help in developing the protein interaction score. We also thank Christopher Workman and Zoltan Szallasi for valuable discussions and help with the manuscript. Y.M. is supported by K.U. Leuven GOA AMBioRICS, CoE EF/05/007 SymBioSys, BELSPO IUAP P6/25 BioMaGNet, EU-FP6-NoE Biopattern and EU-FP6-MC-EST Bioptrain. Z.M.S. is supported by an EU Biosapiens (NoE), FP6 grant. The Center for Biological Sequence Analysis and the Wilhelm Johannsen Center for Functional Genome Research are supported by the Danish National Research Foundation.
The authors declare no competing financial interests.
Measuring phenotype association scores between OMIM records. (PDF 287 kb)
Benchmark of the phenotype association score. (PDF 81 kb)
Benchmark of protein interaction score. (PDF 115 kb)
EOC candidate complex. (PDF 153 kb)
IBD candidate complex. (PDF 290 kb)
ALS with frontotemporal dementia candidate complex. (PDF 219 kb)
Influence of tf-idf weight on phenotype similarity scoring scheme. (PDF 83 kb)
Robustness of the cosine phenotype similarity measure. (PDF 79 kb)
Connection profiles. (PDF 101 kb)
Performance of phenotype similarity scheme on phenotypes with same molecular basis. (PDF 107 kb)
A randomly selected subset of 100 OMIM record pairs crossreferenced by the OMIM curators. (PDF 35 kb)
A list of 113 candidates identified by the Bayesian predictor. (PDF 30 kb)
Comparison of the different computational methods that have been tested by ranking candidate genes in linkage intervals. (PDF 7 kb)
Normalized connectivity of phenotypes from the training set and the prediction set. (PDF 12 kb)
Benchmarking subset where we ranked the correct gene as number one out of all candidates in the interval. (PDF 46 kb)
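The abstract and the supplementary items refer to a Bayesian predictor that ranks candidate genes within linkage intervals. They also describe a benchmark that counts how often the known gene is ranked first (298 of 669 intervals, a top-1 rate of about 44.5%). The sketch below illustrates that kind of ranking and benchmark in generic form; it is not the authors' model. It assumes exactly one causal gene per interval and treats each candidate's evidence as independent. Each candidate is also assumed to already have a likelihood ratio P(evidence | causal) / P(evidence | not causal). How such ratios would be derived from phenotype similarity within protein complexes is not specified here, and the gene names and values are hypothetical.

```python
def rank_interval(likelihood_ratios, priors=None):
    """Rank candidate genes in one linkage interval by posterior probability.

    likelihood_ratios: dict gene -> P(evidence | causal) / P(evidence | not causal)
    priors: optional dict gene -> prior probability; defaults to uniform over the interval.
    Returns a list of (gene, posterior) pairs sorted from most to least probable.
    """
    genes = list(likelihood_ratios)
    if priors is None:
        priors = {g: 1.0 / len(genes) for g in genes}
    # With exactly one causal gene and independent per-gene evidence, the posterior
    # for gene g is proportional to prior(g) * LR(g), normalised over all candidates.
    unnormalised = {g: priors[g] * likelihood_ratios[g] for g in genes}
    total = sum(unnormalised.values())
    posterior = {g: w / total for g, w in unnormalised.items()}
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


def top1_rate(benchmark):
    """Fraction of intervals whose known disease gene is ranked first.

    benchmark: list of (likelihood_ratios, known_gene) pairs, one per interval.
    """
    hits = sum(1 for lrs, known in benchmark if rank_interval(lrs)[0][0] == known)
    return hits / len(benchmark)


# Hypothetical interval with three candidates; GENE2 has the strongest evidence.
ranking = rank_interval({"GENE1": 0.8, "GENE2": 4.0, "GENE3": 1.2})
print(ranking[0])  # GENE2 with posterior 2/3 under uniform priors
```

With uniform priors, the ranking simply follows the likelihood ratios. The normalised posteriors are still useful, however, because they indicate how decisively one candidate stands out from the rest of the interval.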