A human phenome-interactome network of protein complexes implicated in genetic disorders


We performed a systematic, large-scale analysis of human protein complexes comprising gene products implicated in many different categories of human disease to create a phenome-interactome network. This was done by integrating quality-controlled interactions of human proteins with a validated, computationally derived phenotype similarity score, permitting identification of previously unknown complexes likely to be associated with disease. Using a phenomic ranking of protein complexes linked to human disease, we developed a Bayesian predictor that in 298 of 669 linkage intervals correctly ranks the known disease-causing protein as the top candidate, and in 870 intervals with no identified disease-causing gene, provides novel candidates implicated in disorders such as retinitis pigmentosa, epithelial ovarian cancer, inflammatory bowel disease, amyotrophic lateral sclerosis, Alzheimer disease, type 2 diabetes and coronary heart disease. Our publicly available draft of protein complexes associated with pathology comprises 506 complexes, which reveal functional relationships between disease-promoting genes that will inform future experimentation.
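As a rough illustration of how a text-derived phenotype similarity score might feed a candidate ranking, the sketch below computes tf-idf weighted cosine similarity between phenotype term lists (the supplementary figure titles mention both tf-idf weighting and a cosine measure) and then orders candidates by the best similarity found among their interaction partners' phenotypes. It is a simplified stand-in, not the Bayesian predictor described above. The function names, the max-similarity scoring rule and all toy records are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(records):
    """Map each record name to a sparse {term: tf-idf weight} vector.

    records: dict of record name -> list of terms (hypothetical input format).
    """
    n = len(records)
    doc_freq = Counter()
    for terms in records.values():
        doc_freq.update(set(terms))  # count each term once per record
    vectors = {}
    for name, terms in records.items():
        term_freq = Counter(terms)
        vectors[name] = {
            t: count * math.log(n / doc_freq[t]) for t, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query, partner_phenotypes, vectors):
    """Rank candidates by the best phenotype match among their partners.

    partner_phenotypes: dict of candidate -> phenotype records linked to its
    interaction partners. Using the maximum similarity is an assumed,
    simplified scoring rule, not the published method.
    """
    scores = {
        cand: max((cosine(vectors[query], vectors[p]) for p in phenos), default=0.0)
        for cand, phenos in partner_phenotypes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up phenotype records and candidates (not real data).
toy_records = {
    "phenotype_A": ["retina", "degeneration", "night", "blindness"],
    "phenotype_B": ["retina", "degeneration", "vision"],
    "phenotype_C": ["colon", "inflammation"],
}
toy_vectors = tfidf_vectors(toy_records)
toy_partners = {"gene_1": ["phenotype_C"], "gene_2": ["phenotype_B"]}
toy_ranking = rank_candidates("phenotype_A", toy_partners, toy_vectors)
```

In this toy setup, `gene_2` ranks first because its interaction partner is linked to a phenotype that shares terms with the query record, whereas `gene_1` has no overlapping terms and scores zero.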

Figure 1: Steps in scoring each candidate in a linkage interval.
Figure 2: Performance of the Bayesian predictor.
Figure 3: Case studies of four candidate complexes.




The authors wish to thank Ulrik de Lichtenberg and Thomas Skøt Jensen for critical reading of the manuscript, editing and help in developing the protein interaction score. We also thank Christopher Workman and Zoltan Szallasi for valuable discussions and help with the manuscript. Y.M. is supported by K.U. Leuven GOA AMBioRICS, CoE EF/05/007 SymBioSys, BELSPO IUAP P6/25 BioMaGNet, EU-FP6-NoE Biopattern and EU-FP6-MC-EST Bioptrain. Z.M.S. is supported by an EU Biosapiens (NoE), FP6 grant. The Center for Biological Sequence Analysis and the Wilhelm Johannsen Center for Functional Genome Research are supported by the Danish National Research Foundation.

Author information

Correspondence to Søren Brunak.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Measuring phenotype association scores between OMIM records. (PDF 287 kb)

Supplementary Fig. 2

Benchmark of the phenotype association score. (PDF 81 kb)

Supplementary Fig. 3

Benchmark of protein interaction score. (PDF 115 kb)

Supplementary Fig. 4

EOC candidate complex. (PDF 153 kb)

Supplementary Fig. 5

IBD candidate complex. (PDF 290 kb)

Supplementary Fig. 6

ALS with frontotemporal dementia candidate complex. (PDF 219 kb)

Supplementary Fig. 7

Influence of tf-idf weight on phenotype similarity scoring scheme. (PDF 83 kb)

Supplementary Fig. 8

Robustness of the cosine phenotype similarity measure. (PDF 79 kb)

Supplementary Fig. 9

Connection profiles. (PDF 101 kb)

Supplementary Fig. 10

Performance of phenotype similarity scheme on phenotypes with same molecular basis. (PDF 107 kb)

Supplementary Table 1

A randomly selected subset of 100 OMIM record pairs crossreferenced by the OMIM curators. (PDF 35 kb)

Supplementary Table 2

A list of 113 candidates identified by the Bayesian predictor. (PDF 30 kb)

Supplementary Table 3

Comparison of the different computational methods that have been tested by ranking candidate genes in linkage intervals. (PDF 7 kb)

Supplementary Table 4

Normalized connectivity of phenotypes from the training set and the prediction set. (PDF 12 kb)

Supplementary Table 5

Benchmarking subset where we ranked the correct gene as number one out of all candidates in the interval. (PDF 46 kb)

Supplementary Data (PDF 23 kb)

Supplementary Methods (PDF 148 kb)
