Creation and implications of a phenome-genome network


Although gene and protein measurements are increasing in quantity and comprehensiveness, they do not characterize a sample's entire phenotype in an environmental or experimental context. Here we comprehensively consider associations between components of phenotype, genotype and environment to identify genes that may govern phenotype and responses to the environment. Context from the annotations of gene expression data sets in the Gene Expression Omnibus is represented using the Unified Medical Language System, a compendium of biomedical vocabularies with nearly 1 million concepts. After showing how data sets can be clustered by annotative concepts, we find a network of relations between phenotypic, disease, environmental and experimental contexts as well as genes with differential expression associated with these concepts. We identify novel genes related to concepts such as aging. Comprehensively identifying genes related to phenotype and environment is a step toward the Human Phenome Project (ref. 5).

Figure 1: The method of extracting and relating genome, phenome and envirome data from GEO data sets.
Figure 2: Hierarchical clustering of 448 GEO data sets by context, created by treating each data set as a vector representing the presence or absence of a mapping from that data set to each UMLS concept, then calculating binary distance between data sets and clustering using complete linkage.
Figure 3: Network of relations between 46 biomedical concepts extracted from the annotations of data sets in Gene Expression Omnibus and 444 genes with differential expression associated with the presence or absence of the concept.
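
The clustering described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that "binary distance" means the asymmetric binary (Jaccard-type) distance, that is, the fraction of concepts present in exactly one of two data sets among the concepts present in either. The data set names and concept labels below are hypothetical placeholders, not real GEO or UMLS identifiers.

```python
from itertools import combinations


def binary_distance(a, b):
    """Fraction of concepts found in exactly one set, among concepts found in either."""
    union = a | b
    if not union:
        return 0.0  # two data sets with no concepts are treated as identical
    return len(a ^ b) / len(union)


def complete_linkage(items, k):
    """Agglomerative clustering that stops when k clusters remain.

    Under complete linkage, the distance between two clusters is the
    largest pairwise distance between their members.
    """
    clusters = [[name] for name in sorted(items)]

    def cluster_distance(c1, c2):
        return max(binary_distance(items[x], items[y]) for x in c1 for y in c2)

    while len(clusters) > k:
        # Find the pair of clusters (i < j) that is closest under complete linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Hypothetical data sets, each represented by the set of concepts mapped to it.
# A set of present concepts is equivalent to a presence/absence vector.
annotations = {
    "dataset_A": {"Lung", "Carcinoma"},
    "dataset_B": {"Lung", "Carcinoma", "Smoking"},
    "dataset_C": {"Aging", "Liver"},
    "dataset_D": {"Aging", "Liver", "Rat"},
}

clusters = complete_linkage(annotations, 2)
print(clusters)
```

Representing each data set as a set of concepts keeps the sketch short. A full analysis of 448 data sets would more likely use a presence/absence matrix and a library routine for hierarchical clustering, which also yields the complete dendrogram rather than a fixed number of clusters.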


References

1. Carson, J.P. et al. Pharmacogenomic identification of targets for adjuvant therapy with the topoisomerase poison camptothecin. Cancer Res. 64, 2096–2104 (2004).
2. Zhukov, T.A., Johanson, R.A., Cantor, A.B., Clark, R.A. & Tockman, M.S. Discovery of distinct protein profiles specific for lung tumors and pre-malignant lung lesions by SELDI mass spectrometry. Lung Cancer 40, 267–279 (2003).
3. Bhattacharjee, A. et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl Acad. Sci. USA 98, 13790–13795 (2001).
4. Yanagisawa, K. et al. Proteomic patterns of tumour subsets in non-small-cell lung cancer. Lancet 362, 433–439 (2003).
5. Anthony, J.C., Eaton, W.W. & Henderson, A.S. Looking to the future in psychiatric epidemiology. Epidemiol. Rev. 17, 240–242 (1995).
6. Freimer, N. & Sabatti, C. The human phenome project. Nat. Genet. 34, 15–21 (2003).
7. Mahner, M. & Kary, M. What exactly are genomes, genotypes and phenotypes? And what about phenomes? J. Theor. Biol. 186, 55–63 (1997).
8. Stoll, M. et al. A genomic-systems biology map for cardiovascular function. Science 294, 1723–1726 (2001).
9. Wheeler, D.L. et al. Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res. 32 Database issue, D35–40 (2004).
10. Brazma, A. et al. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat. Genet. 29, 365–371 (2001).
11. Spellman, P.T. et al. Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 3, RESEARCH0046 (2002).
12. Bodenreider, O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32 Database issue, D267–70 (2004).
13. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
14. International Classification of Diseases. Clinical Modification (ICD-9-CM), 9th revision, (Centers for Medicare & Medicaid Services, Washington DC, 2003).
15. Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
16. Jo, K., Rutten, B., Bunn, R.C. & Bredt, D.S. Actinin-associated LIM protein-deficient mice maintain normal development and structure of skeletal muscle. Mol. Cell. Biol. 21, 1682–1687 (2001).
17. Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA 101, 6062–6067 (2004).
18. Rikans, L.E. & Moore, D.R. Influence of aging on rat liver enzymes involved in glutathione synthesis and degradation. Arch. Gerontol. Geriatr. 13, 263–270 (1991).
19. Ninfali, P., Aluigi, G. & Pompella, A. Postnatal expression of glucose-6-phosphate dehydrogenase in different brain areas. Neurochem. Res. 23, 1197–1204 (1998).
20. Cocco, P. et al. Mortality in a cohort of men expressing the glucose-6-phosphate dehydrogenase deficiency. Blood 91, 706–709 (1998).
21. Kyng, K.J., May, A., Kolvraa, S. & Bohr, V.A. Gene expression profiling in Werner syndrome closely resembles that of normal aging. Proc. Natl Acad. Sci. USA 100, 12259–12264 (2003).
22. Muschen, M. et al. Molecular portraits of B cell lineage commitment. Proc. Natl Acad. Sci. USA 99, 10014–10019 (2002).
23. Bhattacharya, B. et al. Gene expression in human embryonic stem cell lines: unique molecular signature. Blood 103, 2956–2964 (2004).
24. Zhao, Y. et al. Cloning and characterization of human DDX24 and mouse Ddx24, two novel putative DEAD-Box proteins, and mapping DDX24 to human chromosome 14q32. Genomics 67, 351–355 (2000).
25. Mirochnitchenko, O. et al. Acetaminophen toxicity. Opposite effects of two forms of glutathione peroxidase. J. Biol. Chem. 274, 10349–10355 (1999).
26. Sandre, C. et al. Early evolution of selenium status and oxidative stress parameters in rat models of thermal injury. J. Trace Elem. Med. Biol. 17, 313–318 (2004).
27. Topsakal, C. et al. Effects of prostaglandin E1, melatonin, and oxytetracycline on lipid peroxidation, antioxidant defense system, paraoxonase (PON1) activities, and homocysteine levels in an animal model of spinal cord injury. Spine 28, 1643–1652 (2003).
28. Sharma, G.D., He, J. & Bazan, H.E. p38 and ERK1/2 coordinate cellular migration and proliferation in epithelial wound healing: evidence of cross-talk activation between MAP kinase cascades. J. Biol. Chem. 278, 21989–21997 (2003).
29. Schulz, R. et al. Ischemic preconditioning preserves connexin 43 phosphorylation during sustained ischemia in pig hearts in vivo. FASEB J. 17, 1355–1357 (2003).
30. Nicholson, B., Manner, C.K. & MacLeod, C.L. Cat2 L-arginine transporter-deficient fibroblasts can sustain nitric oxide production. Nitric Oxide 7, 236–243 (2002).
31. Nicholson, B., Manner, C.K., Kleeman, J. & MacLeod, C.L. Sustained nitric oxide production in macrophages requires the arginine transporter CAT2. J. Biol. Chem. 276, 15881–15885 (2001).
32. Wang, Y. et al. Modeling human congenital disorder of glycosylation type IIa in the mouse: conservation of asparagine-linked glycan-dependent functions in mammalian physiology and insights into disease pathogenesis. Glycobiology 11, 1051–1070 (2001).
33. Gilbertson, R.J. & Clifford, S.C. PDGFRB is overexpressed in metastatic medulloblastoma. Nat. Genet. 35, 197–198 (2003).
34. Tettelin, H. & Parkhill, J. The use of genome annotation data and its impact on biological conclusions. Nat. Genet. 36, 1028–1029 (2004).
35. Perou, C.M. Show me the data! Nat. Genet. 29, 373 (2001).
36. Aronson, A.R. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc. AMIA Symp., 17–21 (2001).
37. Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional activity of expression modules in cancer. Nat. Genet. 36, 1090–1098 (2004).
38. Ihaka, R. & Gentleman, R. R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).
39. Butte, A.J., Tamayo, P., Slonim, D., Golub, T.R. & Kohane, I.S. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc. Natl Acad. Sci. USA 97, 12182–12186 (2000).
40. Lee, H.K., Hsu, A.K., Sajdak, J., Qin, J. & Pavlidis, P. Coexpression analysis of human genes across many microarray data sets. Genome Res. 14, 1085–1094 (2004).
41. Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P. Numerical Recipes in C: The Art of Scientific Computing (Cambridge University Press, Cambridge, UK, 1993).



Acknowledgements

We thank Tarangini Deshpande for critical comments on and suggestions for the manuscript, and Partners Healthcare Research Computing for use of and assistance with the Linux High Performance Computing Cluster. The work was supported by grants from the Lucile Packard Foundation for Children's Health, the National Institutes of Health National Center for Biomedical Computing (U54 LM008748), the National Library of Medicine (K22 LM008261), the National Institute of Diabetes and Digestive and Kidney Diseases (K12 DK63696, R01 DK62948 and R01 DK060837), the Harvard-MIT Division of Health Sciences and Technology and the Lawson Wilkins Pediatric Endocrine Society.

Author information

Correspondence to Atul J Butte.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Example relations between genes and phenotypes and environment. (PDF 406 kb)

Supplementary Fig. 2

An illustrative subset of concepts and relations from UMLS. (PDF 142 kb)

Supplementary Table 1

Number of concepts mapped to each of the seven GEO free-text annotations. (PDF 30 kb)

Supplementary Note (PDF 49 kb)

About this article

Cite this article

Butte, A., Kohane, I. Creation and implications of a phenome-genome network. Nat Biotechnol 24, 55–62 (2006).
