  • Analysis

Creation and implications of a phenome-genome network

Abstract

Although gene and protein measurements are increasing in quantity and comprehensiveness, they do not characterize a sample's entire phenotype in an environmental or experimental context. Here we comprehensively consider associations between components of phenotype, genotype and environment to identify genes that may govern phenotype and responses to the environment. Context from the annotations of gene expression data sets in the Gene Expression Omnibus is represented using the Unified Medical Language System, a compendium of biomedical vocabularies with nearly 1 million concepts. After showing how data sets can be clustered by annotative concepts, we find a network of relations between phenotypic, disease, environmental and experimental contexts as well as genes with differential expression associated with these concepts. We identify novel genes related to concepts such as aging. Comprehensively identifying genes related to phenotype and environment is a step toward the Human Phenome Project (ref. 5).

Figure 1: The method of extracting and relating genome, phenome and envirome data from GEO data sets.
Figure 2: Hierarchical clustering of 448 GEO data sets by context, created by treating each data set as a vector representing the presence or absence of a mapping from that data set to each UMLS concept, then calculating binary distance between data sets and clustering using complete linkage.
Figure 3: Network of relations between 46 biomedical concepts extracted from the annotations of data sets in Gene Expression Omnibus and 444 genes with differential expression associated with the presence or absence of the concept.
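
The clustering summarized in the Figure 2 caption can be illustrated with a minimal sketch. It assumes that "binary distance" means the Jaccard-style measure (among concepts present in at least one of two data sets, the fraction present in only one); the caption does not define it further. The data set labels and the five-concept vectors are hypothetical toy values, not data from the paper, and the function names are illustrative only.

```python
from itertools import combinations


def binary_distance(a, b):
    """Assumed Jaccard-style binary distance between two presence/absence vectors."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # two all-absent vectors are treated as identical
    differ = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return differ / either


def complete_linkage(vectors):
    """Agglomerative clustering with complete linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of row indices into `vectors`.
    """
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Complete linkage: cluster distance is the largest pairwise distance.
            d = max(binary_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
        merges.append((c1, c2, d))
    return merges


# Hypothetical toy input: rows are data sets, columns are concepts (1 = mapped).
labels = ["ds_a", "ds_b", "ds_c", "ds_d"]
vectors = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
]

for left, right, dist in complete_linkage(vectors):
    names_left = [labels[i] for i in left]
    names_right = [labels[i] for i in right]
    print(names_left, "+", names_right, "at distance", round(dist, 3))
```

The pairwise search is written for readability rather than speed; for hundreds of data sets, a library implementation of hierarchical clustering would be the more practical choice.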

References

  1. Carson, J.P. et al. Pharmacogenomic identification of targets for adjuvant therapy with the topoisomerase poison camptothecin. Cancer Res. 64, 2096–2104 (2004).

  2. Zhukov, T.A., Johanson, R.A., Cantor, A.B., Clark, R.A. & Tockman, M.S. Discovery of distinct protein profiles specific for lung tumors and pre-malignant lung lesions by SELDI mass spectrometry. Lung Cancer 40, 267–279 (2003).

  3. Bhattacharjee, A. et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl Acad. Sci. USA 98, 13790–13795 (2001).

  4. Yanagisawa, K. et al. Proteomic patterns of tumour subsets in non-small-cell lung cancer. Lancet 362, 433–439 (2003).

  5. Anthony, J.C., Eaton, W.W. & Henderson, A.S. Looking to the future in psychiatric epidemiology. Epidemiol. Rev. 17, 240–242 (1995).

  6. Freimer, N. & Sabatti, C. The human phenome project. Nat. Genet. 34, 15–21 (2003).

  7. Mahner, M. & Kary, M. What exactly are genomes, genotypes and phenotypes? And what about phenomes? J. Theor. Biol. 186, 55–63 (1997).

  8. Stoll, M. et al. A genomic-systems biology map for cardiovascular function. Science 294, 1723–1726 (2001).

  9. Wheeler, D.L. et al. Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res. 32 Database issue, D35–40 (2004).

  10. Brazma, A. et al. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat. Genet. 29, 365–371 (2001).

  11. Spellman, P.T. et al. Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 3, RESEARCH0046 (2002).

  12. Bodenreider, O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32 Database issue, D267–70 (2004).

  13. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).

  14. International Classification of Diseases. Clinical Modification (ICD-9-CM), 9th revision, (Centers for Medicare & Medicaid Services, Washington DC, 2003).

  15. Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).

  16. Jo, K., Rutten, B., Bunn, R.C. & Bredt, D.S. Actinin-associated LIM protein-deficient mice maintain normal development and structure of skeletal muscle. Mol. Cell. Biol. 21, 1682–1687 (2001).

  17. Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA 101, 6062–6067 (2004).

  18. Rikans, L.E. & Moore, D.R. Influence of aging on rat liver enzymes involved in glutathione synthesis and degradation. Arch. Gerontol. Geriatr. 13, 263–270 (1991).

  19. Ninfali, P., Aluigi, G. & Pompella, A. Postnatal expression of glucose-6-phosphate dehydrogenase in different brain areas. Neurochem. Res. 23, 1197–1204 (1998).

  20. Cocco, P. et al. Mortality in a cohort of men expressing the glucose-6-phosphate dehydrogenase deficiency. Blood 91, 706–709 (1998).

  21. Kyng, K.J., May, A., Kolvraa, S. & Bohr, V.A. Gene expression profiling in Werner syndrome closely resembles that of normal aging. Proc. Natl Acad. Sci. USA 100, 12259–12264 (2003).

  22. Muschen, M. et al. Molecular portraits of B cell lineage commitment. Proc. Natl Acad. Sci. USA 99, 10014–10019 (2002).

  23. Bhattacharya, B. et al. Gene expression in human embryonic stem cell lines: unique molecular signature. Blood 103, 2956–2964 (2004).

  24. Zhao, Y. et al. Cloning and characterization of human DDX24 and mouse Ddx24, two novel putative DEAD-Box proteins, and mapping DDX24 to human chromosome 14q32. Genomics 67, 351–355 (2000).

  25. Mirochnitchenko, O. et al. Acetaminophen toxicity. Opposite effects of two forms of glutathione peroxidase. J. Biol. Chem. 274, 10349–10355 (1999).

  26. Sandre, C. et al. Early evolution of selenium status and oxidative stress parameters in rat models of thermal injury. J. Trace Elem. Med. Biol. 17, 313–318 (2004).

  27. Topsakal, C. et al. Effects of prostaglandin E1, melatonin, and oxytetracycline on lipid peroxidation, antioxidant defense system, paraoxonase (PON1) activities, and homocysteine levels in an animal model of spinal cord injury. Spine 28, 1643–1652 (2003).

  28. Sharma, G.D., He, J. & Bazan, H.E. p38 and ERK1/2 coordinate cellular migration and proliferation in epithelial wound healing: evidence of cross-talk activation between MAP kinase cascades. J. Biol. Chem. 278, 21989–21997 (2003).

  29. Schulz, R. et al. Ischemic preconditioning preserves connexin 43 phosphorylation during sustained ischemia in pig hearts in vivo. FASEB J. 17, 1355–1357 (2003).

  30. Nicholson, B., Manner, C.K. & MacLeod, C.L. Cat2 L-arginine transporter-deficient fibroblasts can sustain nitric oxide production. Nitric Oxide 7, 236–243 (2002).

  31. Nicholson, B., Manner, C.K., Kleeman, J. & MacLeod, C.L. Sustained nitric oxide production in macrophages requires the arginine transporter CAT2. J. Biol. Chem. 276, 15881–15885 (2001).

  32. Wang, Y. et al. Modeling human congenital disorder of glycosylation type IIa in the mouse: conservation of asparagine-linked glycan-dependent functions in mammalian physiology and insights into disease pathogenesis. Glycobiology 11, 1051–1070 (2001).

  33. Gilbertson, R.J. & Clifford, S.C. PDGFRB is overexpressed in metastatic medulloblastoma. Nat. Genet. 35, 197–198 (2003).

  34. Tettelin, H. & Parkhill, J. The use of genome annotation data and its impact on biological conclusions. Nat. Genet. 36, 1028–1029 (2004).

  35. Perou, C.M. Show me the data! Nat. Genet. 29, 373 (2001).

  36. Aronson, A.R. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc. AMIA Symp., 17–21 (2001).

  37. Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional activity of expression modules in cancer. Nat. Genet. 36, 1090–1098 (2004).

  38. Ihaka, R. & Gentleman, R.R. A language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).

  39. Butte, A.J., Tamayo, P., Slonim, D., Golub, T.R. & Kohane, I.S. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc. Natl Acad. Sci. USA 97, 12182–12186 (2000).

  40. Lee, H.K., Hsu, A.K., Sajdak, J., Qin, J. & Pavlidis, P. Coexpression analysis of human genes across many microarray data sets. Genome Res. 14, 1085–1094 (2004).

  41. Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P. Numerical Recipes in C: The Art of Scientific Computing (Cambridge University Press, Cambridge, UK, 1993).

Acknowledgements

We thank Tarangini Deshpande for critical comments on and suggestions for the manuscript. The authors thank Partners Healthcare Research Computing for use of and assistance with the Linux High Performance Computing Cluster. The work was supported by grants from the Lucille Packard Foundation for Children's Health, National Institutes of Health National Center for Biomedical Computing (U54 LM008748), the National Library of Medicine (K22 LM008261), National Institute of Diabetes and Digestive and Kidney Diseases (K12 DK63696, R01 DK62948 and R01 DK060837), the Harvard-MIT Division of Health Sciences and Technology and the Lawson Wilkins Pediatric Endocrine Society.

Author information

Corresponding author

Correspondence to Atul J Butte.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Example relations between genes and phenotypes and environment. (PDF 406 kb)

Supplementary Fig. 2

An illustrative subset of concepts and relations from UMLS. (PDF 142 kb)

Supplementary Table 1

Number of concepts mapped to each of the seven GEO free-text annotations. (PDF 30 kb)

Supplementary Note (PDF 49 kb)

About this article

Cite this article

Butte, A., Kohane, I. Creation and implications of a phenome-genome network. Nat Biotechnol 24, 55–62 (2006). https://doi.org/10.1038/nbt1150

  • DOI: https://doi.org/10.1038/nbt1150
