Abstract
Although gene and protein measurements are increasing in quantity and comprehensiveness, they do not characterize a sample's entire phenotype in an environmental or experimental context. Here we comprehensively consider associations between components of phenotype, genotype and environment to identify genes that may govern phenotype and responses to the environment. Context from the annotations of gene expression data sets in the Gene Expression Omnibus is represented using the Unified Medical Language System, a compendium of biomedical vocabularies with nearly 1 million concepts. After showing how data sets can be clustered by annotative concepts, we find a network of relations between phenotypic, disease, environmental and experimental contexts as well as genes with differential expression associated with these concepts. We identify novel genes related to concepts such as aging. Comprehensively identifying genes related to phenotype and environment is a step toward the Human Phenome Project (ref. 5).
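The idea of clustering data sets by their annotative concepts can be illustrated with a minimal sketch. The abstract does not state which similarity measure or clustering algorithm was used, so the Jaccard similarity, the single-linkage grouping, the threshold, and all data-set names and concept labels below are assumptions chosen purely for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of concept labels."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_concepts(annotations, threshold=0.5):
    """Group data sets whose concept sets overlap strongly.

    Single-linkage grouping (an assumed technique, not the paper's):
    two data sets share a cluster if they are connected by a chain of
    pairs with Jaccard similarity >= threshold.
    """
    parent = {name: name for name in annotations}

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(annotations, 2):
        if jaccard(annotations[a], annotations[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in annotations:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical data sets and concept labels, for illustration only.
example = {
    "dataset_A": {"aging", "muscle", "mouse"},
    "dataset_B": {"aging", "muscle", "human"},
    "dataset_C": {"lung", "carcinoma", "human"},
    "dataset_D": {"lung", "carcinoma", "smoking"},
}
clusters = cluster_by_concepts(example, threshold=0.5)
print(clusters)
```

In this toy input, the two aging-related data sets group together and the two lung-related data sets group together, because each pair shares two of four distinct concepts. A real analysis would work from concept identifiers mapped out of the free-text annotations rather than hand-written labels.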
References
Carson, J.P. et al. Pharmacogenomic identification of targets for adjuvant therapy with the topoisomerase poison camptothecin. Cancer Res. 64, 2096–2104 (2004).
Zhukov, T.A., Johanson, R.A., Cantor, A.B., Clark, R.A. & Tockman, M.S. Discovery of distinct protein profiles specific for lung tumors and pre-malignant lung lesions by SELDI mass spectrometry. Lung Cancer 40, 267–279 (2003).
Bhattacharjee, A. et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl Acad. Sci. USA 98, 13790–13795 (2001).
Yanagisawa, K. et al. Proteomic patterns of tumour subsets in non-small-cell lung cancer. Lancet 362, 433–439 (2003).
Anthony, J.C., Eaton, W.W. & Henderson, A.S. Looking to the future in psychiatric epidemiology. Epidemiol. Rev. 17, 240–242 (1995).
Freimer, N. & Sabatti, C. The human phenome project. Nat. Genet. 34, 15–21 (2003).
Mahner, M. & Kary, M. What exactly are genomes, genotypes and phenotypes? And what about phenomes? J. Theor. Biol. 186, 55–63 (1997).
Stoll, M. et al. A genomic-systems biology map for cardiovascular function. Science 294, 1723–1726 (2001).
Wheeler, D.L. et al. Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res. 32 Database issue, D35–40 (2004).
Brazma, A. et al. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat. Genet. 29, 365–371 (2001).
Spellman, P.T. et al. Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 3, RESEARCH0046 (2002).
Bodenreider, O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32 Database issue, D267–70 (2004).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
International Classification of Diseases. Clinical Modification (ICD-9-CM), 9th revision, (Centers for Medicare & Medicaid Services, Washington DC, 2003).
Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Jo, K., Rutten, B., Bunn, R.C. & Bredt, D.S. Actinin-associated LIM protein-deficient mice maintain normal development and structure of skeletal muscle. Mol. Cell. Biol. 21, 1682–1687 (2001).
Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA 101, 6062–6067 (2004).
Rikans, L.E. & Moore, D.R. Influence of aging on rat liver enzymes involved in glutathione synthesis and degradation. Arch. Gerontol. Geriatr. 13, 263–270 (1991).
Ninfali, P., Aluigi, G. & Pompella, A. Postnatal expression of glucose-6-phosphate dehydrogenase in different brain areas. Neurochem. Res. 23, 1197–1204 (1998).
Cocco, P. et al. Mortality in a cohort of men expressing the glucose-6-phosphate dehydrogenase deficiency. Blood 91, 706–709 (1998).
Kyng, K.J., May, A., Kolvraa, S. & Bohr, V.A. Gene expression profiling in Werner syndrome closely resembles that of normal aging. Proc. Natl Acad. Sci. USA 100, 12259–12264 (2003).
Muschen, M. et al. Molecular portraits of B cell lineage commitment. Proc. Natl Acad. Sci. USA 99, 10014–10019 (2002).
Bhattacharya, B. et al. Gene expression in human embryonic stem cell lines: unique molecular signature. Blood 103, 2956–2964 (2004).
Zhao, Y. et al. Cloning and characterization of human DDX24 and mouse Ddx24, two novel putative DEAD-Box proteins, and mapping DDX24 to human chromosome 14q32. Genomics 67, 351–355 (2000).
Mirochnitchenko, O. et al. Acetaminophen toxicity. Opposite effects of two forms of glutathione peroxidase. J. Biol. Chem. 274, 10349–10355 (1999).
Sandre, C. et al. Early evolution of selenium status and oxidative stress parameters in rat models of thermal injury. J. Trace Elem. Med. Biol. 17, 313–318 (2004).
Topsakal, C. et al. Effects of prostaglandin E1, melatonin, and oxytetracycline on lipid peroxidation, antioxidant defense system, paraoxonase (PON1) activities, and homocysteine levels in an animal model of spinal cord injury. Spine 28, 1643–1652 (2003).
Sharma, G.D., He, J. & Bazan, H.E. p38 and ERK1/2 coordinate cellular migration and proliferation in epithelial wound healing: evidence of cross-talk activation between MAP kinase cascades. J. Biol. Chem. 278, 21989–21997 (2003).
Schulz, R. et al. Ischemic preconditioning preserves connexin 43 phosphorylation during sustained ischemia in pig hearts in vivo. FASEB J. 17, 1355–1357 (2003).
Nicholson, B., Manner, C.K. & MacLeod, C.L. Cat2 L-arginine transporter-deficient fibroblasts can sustain nitric oxide production. Nitric Oxide 7, 236–243 (2002).
Nicholson, B., Manner, C.K., Kleeman, J. & MacLeod, C.L. Sustained nitric oxide production in macrophages requires the arginine transporter CAT2. J. Biol. Chem. 276, 15881–15885 (2001).
Wang, Y. et al. Modeling human congenital disorder of glycosylation type IIa in the mouse: conservation of asparagine-linked glycan-dependent functions in mammalian physiology and insights into disease pathogenesis. Glycobiology 11, 1051–1070 (2001).
Gilbertson, R.J. & Clifford, S.C. PDGFRB is overexpressed in metastatic medulloblastoma. Nat. Genet. 35, 197–198 (2003).
Tettelin, H. & Parkhill, J. The use of genome annotation data and its impact on biological conclusions. Nat. Genet. 36, 1028–1029 (2004).
Perou, C.M. Show me the data! Nat. Genet. 29, 373 (2001).
Aronson, A.R. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc. AMIA Symp., 17–21 (2001).
Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional activity of expression modules in cancer. Nat. Genet. 36, 1090–1098 (2004).
Ihaka, R. & Gentleman, R.R. A language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).
Butte, A.J., Tamayo, P., Slonim, D., Golub, T.R. & Kohane, I.S. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc. Natl Acad. Sci. USA 97, 12182–12186 (2000).
Lee, H.K., Hsu, A.K., Sajdak, J., Qin, J. & Pavlidis, P. Coexpression analysis of human genes across many microarray data sets. Genome Res. 14, 1085–1094 (2004).
Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P. Numerical Recipes in C: The Art of Scientific Computing (Cambridge University Press, Cambridge, UK, 1993).
Acknowledgements
We thank Tarangini Deshpande for critical comments on and suggestions for the manuscript. We also thank Partners Healthcare Research Computing for use of and assistance with the Linux High Performance Computing Cluster. The work was supported by grants from the Lucile Packard Foundation for Children's Health, the National Institutes of Health National Center for Biomedical Computing (U54 LM008748), the National Library of Medicine (K22 LM008261), the National Institute of Diabetes and Digestive and Kidney Diseases (K12 DK63696, R01 DK62948 and R01 DK060837), the Harvard-MIT Division of Health Sciences and Technology and the Lawson Wilkins Pediatric Endocrine Society.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1
Example relations between genes and phenotypes and environment. (PDF 406 kb)
Supplementary Fig. 2
An illustrative subset of concepts and relations from UMLS. (PDF 142 kb)
Supplementary Table 1
Number of concepts mapped to each of the seven GEO free-text annotations. (PDF 30 kb)
Cite this article
Butte, A., Kohane, I. Creation and implications of a phenome-genome network. Nat Biotechnol 24, 55–62 (2006). https://doi.org/10.1038/nbt1150
DOI: https://doi.org/10.1038/nbt1150