Analysis

A human phenome-interactome network of protein complexes implicated in genetic disorders

Nature Biotechnology volume 25, pages 309–316 (2007)

Abstract

We performed a systematic, large-scale analysis of human protein complexes comprising gene products implicated in many different categories of human disease to create a phenome-interactome network. This was done by integrating quality-controlled interactions of human proteins with a validated, computationally derived phenotype similarity score, permitting identification of previously unknown complexes likely to be associated with disease. Using a phenomic ranking of protein complexes linked to human disease, we developed a Bayesian predictor that in 298 of 669 linkage intervals correctly ranks the known disease-causing protein as the top candidate, and in 870 intervals with no identified disease-causing gene, provides novel candidates implicated in disorders such as retinitis pigmentosa, epithelial ovarian cancer, inflammatory bowel disease, amyotrophic lateral sclerosis, Alzheimer disease, type 2 diabetes and coronary heart disease. Our publicly available draft of protein complexes associated with pathology comprises 506 complexes, which reveal functional relationships between disease-promoting genes that will inform future experimentation.
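The supplementary figure titles for this article mention a tf-idf weight and a cosine phenotype similarity measure. As a rough illustration of what such a score can look like, the following is a minimal sketch that assumes each phenotype record has already been reduced to a list of terms. The record names, the terms, and the exact weighting formula are hypothetical and are not taken from the article; the authors' actual scoring scheme is described in their Supplementary Methods.

```python
import math
from collections import Counter


def tfidf_vectors(records):
    """Map each record id to a {term: tf-idf weight} dictionary.

    `records` maps a record id to a list of terms. The weighting used
    here (raw term count times log(N / document frequency)) is one
    common textbook variant, chosen for illustration only.
    """
    n = len(records)
    # Document frequency: number of records that contain each term.
    df = Counter()
    for terms in records.values():
        df.update(set(terms))
    vectors = {}
    for rec_id, terms in records.items():
        tf = Counter(terms)
        vectors[rec_id] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy records; real input would be terms mapped from
# phenotype descriptions, which this sketch does not attempt.
records = {
    "phenotype_A": ["retinal", "degeneration", "night", "blindness"],
    "phenotype_B": ["retinal", "degeneration", "visual", "loss"],
    "phenotype_C": ["motor", "neuron", "degeneration", "weakness"],
}
vecs = tfidf_vectors(records)
sim_ab = cosine(vecs["phenotype_A"], vecs["phenotype_B"])
sim_ac = cosine(vecs["phenotype_A"], vecs["phenotype_C"])
```

In this toy data the term "degeneration" occurs in every record, so its weight is zero and it contributes nothing to any score. Records A and B still share the rarer term "retinal", so `sim_ab` is positive while `sim_ac` is zero, which is the kind of down-weighting of uninformative terms that tf-idf is meant to provide.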

Acknowledgements

The authors wish to thank Ulrik de Lichtenberg and Thomas Skøt Jensen for critical reading of the manuscript, editing and help in developing the protein interaction score. We also thank Christopher Workman and Zoltan Szallasi for valuable discussions and help with the manuscript. Y.M. is supported by K.U. Leuven GOA AMBioRICS, CoE EF/05/007 SymBioSys, BELSPO IUAP P6/25 BioMaGNet, EU-FP6-NoE Biopattern and EU-FP6-MC-EST Bioptrain. Z.M.S. is supported by an EU Biosapiens (NoE), FP6 grant. The Center for Biological Sequence Analysis and the Wilhelm Johannsen Center for Functional Genome Research are supported by the Danish National Research Foundation.

Author information

Author notes

    • Kasper Lage
    • E Olof Karlberg

    These authors contributed equally to this work.

Affiliations

  1. Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, Building 208, DK-2800 Lyngby, Denmark.

    • Kasper Lage
    • E Olof Karlberg
    • Zenia M Størling
    • Páll Í Ólason
    • Anders G Pedersen
    • Olga Rigina
    • Anders M Hinsby
    • Søren Brunak

  2. Wilhelm Johannsen Centre for Functional Genome Research, Department of Cellular and Molecular Medicine, The Panum Institute, University of Copenhagen, Blegdamsvej 3, DK-2200, Copenhagen N, Denmark.

    • Zeynep Tümer
    • Niels Tommerup

  3. Institute for Clinical Science, University of Lund, SE-22100 Lund, Sweden.

    • Flemming Pociot

  4. Steno Diabetes Center, Niels Steensesvej 2, DK-2820 Gentofte, Denmark.

    • Flemming Pociot

  5. Department of Electrical Engineering, Faculty of Engineering, Katholieke Universiteit Leuven, B-3001 Heverlee, Belgium.

    • Yves Moreau

Authors

  1. Kasper Lage
  2. E Olof Karlberg
  3. Zenia M Størling
  4. Páll Í Ólason
  5. Anders G Pedersen
  6. Olga Rigina
  7. Anders M Hinsby
  8. Zeynep Tümer
  9. Flemming Pociot
  10. Niels Tommerup
  11. Yves Moreau
  12. Søren Brunak

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Søren Brunak.

Supplementary information

PDF files

  1. Supplementary Fig. 1: Measuring phenotype association scores between OMIM records.
  2. Supplementary Fig. 2: Benchmark of the phenotype association score.
  3. Supplementary Fig. 3: Benchmark of protein interaction score.
  4. Supplementary Fig. 4: EOC candidate complex.
  5. Supplementary Fig. 5: IBD candidate complex.
  6. Supplementary Fig. 6: ALS with frontotemporal dementia candidate complex.
  7. Supplementary Fig. 7: Influence of tf-idf weight on phenotype similarity scoring scheme.
  8. Supplementary Fig. 8: Robustness of the cosine phenotype similarity measure.
  9. Supplementary Fig. 9: Connection profiles.
  10. Supplementary Fig. 10: Performance of phenotype similarity scheme on phenotypes with same molecular basis.
  11. Supplementary Table 1: A randomly selected subset of 100 OMIM record pairs cross-referenced by the OMIM curators.
  12. Supplementary Table 2: A list of 113 candidates identified by the Bayesian predictor.
  13. Supplementary Table 3: Comparison of the different computational methods that have been tested by ranking candidate genes in linkage intervals.
  14. Supplementary Table 4: Normalized connectivity of phenotypes from the training set and the prediction set.
  15. Supplementary Table 5: Benchmarking subset where we ranked the correct gene as number one out of all candidates in the interval.
  16. Supplementary Data
  17. Supplementary Methods

About this article

DOI

https://doi.org/10.1038/nbt1295