Although gene and protein measurements are increasing in quantity and comprehensiveness, they do not characterize a sample's entire phenotype in an environmental or experimental context. Here we comprehensively consider associations between components of phenotype, genotype and environment to identify genes that may govern phenotype and responses to the environment. Context from the annotations of gene expression data sets in the Gene Expression Omnibus is represented using the Unified Medical Language System, a compendium of biomedical vocabularies with nearly 1 million concepts. After showing how data sets can be clustered by annotative concepts, we find a network of relations between phenotypic, disease, environmental and experimental contexts as well as genes with differential expression associated with these concepts. We identify novel genes related to concepts such as aging. Comprehensively identifying genes related to phenotype and environment is a step toward the Human Phenome Project (ref. 5).
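The idea of grouping data sets by their annotative concepts can be illustrated with a minimal sketch. It is not the authors' implementation: the data set identifiers, the concept labels and the choice of Jaccard similarity are all assumptions made for illustration, and the abstract does not state which similarity measure or clustering method was used.

```python
from itertools import combinations

# Hypothetical annotations: data set ID -> set of concept labels.
# In practice these would be UMLS concepts mapped from GEO free text;
# the IDs and labels here are made up for illustration.
annotations = {
    "DS_A": {"aging", "muscle", "mouse"},
    "DS_B": {"aging", "liver", "rat"},
    "DS_C": {"lung carcinoma", "human"},
    "DS_D": {"aging", "muscle", "human"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def concept_index(annotations):
    """Invert the mapping: concept -> set of data sets annotated with it."""
    index = {}
    for data_set, concepts in annotations.items():
        for concept in concepts:
            index.setdefault(concept, set()).add(data_set)
    return index


def pairwise_similarity(annotations):
    """Similarity between every pair of data sets, keyed by sorted ID pair."""
    return {
        (x, y): jaccard(annotations[x], annotations[y])
        for x, y in combinations(sorted(annotations), 2)
    }


index = concept_index(annotations)
similarity = pairwise_similarity(annotations)
```

Under these assumptions, `index["aging"]` lists the data sets that share the concept, and `similarity` could be passed to any standard clustering routine to group data sets with overlapping annotations.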
Carson, J.P. et al. Pharmacogenomic identification of targets for adjuvant therapy with the topoisomerase poison camptothecin. Cancer Res. 64, 2096–2104 (2004).
Zhukov, T.A., Johanson, R.A., Cantor, A.B., Clark, R.A. & Tockman, M.S. Discovery of distinct protein profiles specific for lung tumors and pre-malignant lung lesions by SELDI mass spectrometry. Lung Cancer 40, 267–279 (2003).
Bhattacharjee, A. et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl Acad. Sci. USA 98, 13790–13795 (2001).
Yanagisawa, K. et al. Proteomic patterns of tumour subsets in non-small-cell lung cancer. Lancet 362, 433–439 (2003).
Anthony, J.C., Eaton, W.W. & Henderson, A.S. Looking to the future in psychiatric epidemiology. Epidemiol. Rev. 17, 240–242 (1995).
Freimer, N. & Sabatti, C. The human phenome project. Nat. Genet. 34, 15–21 (2003).
Mahner, M. & Kary, M. What exactly are genomes, genotypes and phenotypes? And what about phenomes? J. Theor. Biol. 186, 55–63 (1997).
Stoll, M. et al. A genomic-systems biology map for cardiovascular function. Science 294, 1723–1726 (2001).
Wheeler, D.L. et al. Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res. 32 Database issue, D35–40 (2004).
Brazma, A. et al. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat. Genet. 29, 365–371 (2001).
Spellman, P.T. et al. Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 3, RESEARCH0046 (2002).
Bodenreider, O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32 Database issue, D267–70 (2004).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
International Classification of Diseases. Clinical Modification (ICD-9-CM), 9th revision, (Centers for Medicare & Medicaid Services, Washington DC, 2003).
Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Jo, K., Rutten, B., Bunn, R.C. & Bredt, D.S. Actinin-associated LIM protein-deficient mice maintain normal development and structure of skeletal muscle. Mol. Cell. Biol. 21, 1682–1687 (2001).
Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA 101, 6062–6067 (2004).
Rikans, L.E. & Moore, D.R. Influence of aging on rat liver enzymes involved in glutathione synthesis and degradation. Arch. Gerontol. Geriatr. 13, 263–270 (1991).
Ninfali, P., Aluigi, G. & Pompella, A. Postnatal expression of glucose-6-phosphate dehydrogenase in different brain areas. Neurochem. Res. 23, 1197–1204 (1998).
Cocco, P. et al. Mortality in a cohort of men expressing the glucose-6-phosphate dehydrogenase deficiency. Blood 91, 706–709 (1998).
Kyng, K.J., May, A., Kolvraa, S. & Bohr, V.A. Gene expression profiling in Werner syndrome closely resembles that of normal aging. Proc. Natl Acad. Sci. USA 100, 12259–12264 (2003).
Muschen, M. et al. Molecular portraits of B cell lineage commitment. Proc. Natl Acad. Sci. USA 99, 10014–10019 (2002).
Bhattacharya, B. et al. Gene expression in human embryonic stem cell lines: unique molecular signature. Blood 103, 2956–2964 (2004).
Zhao, Y. et al. Cloning and characterization of human DDX24 and mouse Ddx24, two novel putative DEAD-Box proteins, and mapping DDX24 to human chromosome 14q32. Genomics 67, 351–355 (2000).
Mirochnitchenko, O. et al. Acetaminophen toxicity. Opposite effects of two forms of glutathione peroxidase. J. Biol. Chem. 274, 10349–10355 (1999).
Sandre, C. et al. Early evolution of selenium status and oxidative stress parameters in rat models of thermal injury. J. Trace Elem. Med. Biol. 17, 313–318 (2004).
Topsakal, C. et al. Effects of prostaglandin E1, melatonin, and oxytetracycline on lipid peroxidation, antioxidant defense system, paraoxonase (PON1) activities, and homocysteine levels in an animal model of spinal cord injury. Spine 28, 1643–1652 (2003).
Sharma, G.D., He, J. & Bazan, H.E. p38 and ERK1/2 coordinate cellular migration and proliferation in epithelial wound healing: evidence of cross-talk activation between MAP kinase cascades. J. Biol. Chem. 278, 21989–21997 (2003).
Schulz, R. et al. Ischemic preconditioning preserves connexin 43 phosphorylation during sustained ischemia in pig hearts in vivo. FASEB J. 17, 1355–1357 (2003).
Nicholson, B., Manner, C.K. & MacLeod, C.L. Cat2 L-arginine transporter-deficient fibroblasts can sustain nitric oxide production. Nitric Oxide 7, 236–243 (2002).
Nicholson, B., Manner, C.K., Kleeman, J. & MacLeod, C.L. Sustained nitric oxide production in macrophages requires the arginine transporter CAT2. J. Biol. Chem. 276, 15881–15885 (2001).
Wang, Y. et al. Modeling human congenital disorder of glycosylation type IIa in the mouse: conservation of asparagine-linked glycan-dependent functions in mammalian physiology and insights into disease pathogenesis. Glycobiology 11, 1051–1070 (2001).
Gilbertson, R.J. & Clifford, S.C. PDGFRB is overexpressed in metastatic medulloblastoma. Nat. Genet. 35, 197–198 (2003).
Tettelin, H. & Parkhill, J. The use of genome annotation data and its impact on biological conclusions. Nat. Genet. 36, 1028–1029 (2004).
Perou, C.M. Show me the data! Nat. Genet. 29, 373 (2001).
Aronson, A.R. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc. AMIA Symp., 17–21 (2001).
Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional activity of expression modules in cancer. Nat. Genet. 36, 1090–1098 (2004).
Ihaka, R. & Gentleman, R.R. A language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).
Butte, A.J., Tamayo, P., Slonim, D., Golub, T.R. & Kohane, I.S. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc. Natl Acad. Sci. USA 97, 12182–12186 (2000).
Lee, H.K., Hsu, A.K., Sajdak, J., Qin, J. & Pavlidis, P. Coexpression analysis of human genes across many microarray data sets. Genome Res. 14, 1085–1094 (2004).
Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P. Numerical Recipes in C: The Art of Scientific Computing (Cambridge University Press, Cambridge, UK, 1993).
We thank Tarangini Deshpande for critical comments on and suggestions for the manuscript, and Partners Healthcare Research Computing for use of and assistance with the Linux High Performance Computing Cluster. The work was supported by grants from the Lucile Packard Foundation for Children's Health, the National Institutes of Health National Center for Biomedical Computing (U54 LM008748), the National Library of Medicine (K22 LM008261), the National Institute of Diabetes and Digestive and Kidney Diseases (K12 DK63696, R01 DK62948 and R01 DK060837), the Harvard-MIT Division of Health Sciences and Technology and the Lawson Wilkins Pediatric Endocrine Society.
The authors declare no competing financial interests.
Example relations between genes and phenotypes and environment. (PDF 406 kb)
An illustrative subset of concepts and relations from UMLS. (PDF 142 kb)
Number of concepts mapped to each of the seven GEO free-text annotations. (PDF 30 kb)
Butte, A., Kohane, I. Creation and implications of a phenome-genome network. Nat Biotechnol 24, 55–62 (2006). https://doi.org/10.1038/nbt1150