Cross-phenotype associations and pleiotropy have been identified in human studies.
Phenome-wide association studies (PheWAS) are an emerging approach for identifying cross-phenotype associations.
Phenomes have been characterized using electronic health records, which provide a real-time clinical representation of an individual's health conditions.
Phenomes have also been developed by aggregating data from traditional epidemiological studies, which provide a snapshot of a participant's health, lifestyle and environmental exposures.
PheWAS have been performed using data from clinical health records or from epidemiological studies; because neither approach is designed to capture all aspects of the human phenome, the two should be considered complementary.
Active areas of research for PheWAS include the addition of diverse populations, establishment of 'genome–phenome-wide' significance, and development of methods for the analysis and visualization of these complex associations.
Advances in genotyping technology have, over the past decade, enabled the focused search for common genetic variation associated with human diseases and traits. With the recently increased availability of detailed phenotypic data from electronic health records and epidemiological studies, the impact of one or more genetic variants on the phenome is starting to be characterized both in clinical and population-based settings using phenome-wide association studies (PheWAS). These studies reveal a number of challenges that will need to be overcome to unlock the full potential of PheWAS for the characterization of the complex human genome–phenome relationship.
The authors declare no competing financial interests.
- Genotype frequency
Humans are diploid and therefore have two copies of each chromosome, representing maternal and paternal contributions. When a gene or locus is polymorphic (having more than one allele in a population), the genotype represents the maternal and paternal allele at that locus. For example, for a diallelic locus with alleles 'A' and 'a', the possible genotypes at that site are 'AA', 'Aa' and 'aa'. The genotype frequency is therefore the frequency of these combinations in the population.
- Genome-wide association studies
(GWAS). Studies wherein common genetic variants across the genome (hundreds of thousands to millions) are each tested for an association with one or a handful of common human diseases or traits.
- Cross-phenotype associations
A phenomenon whereby multiple phenotypes are associated with the same gene or genetic variant. Cross-phenotype associations may be due to pleiotropy or other underlying causes.
- Pleiotropy
The phenomenon whereby a single genetic variant or gene affects more than one distinct phenotype.
- Precision medicine
Often described as prescribing the right drug at the right dose for the first time in an individual patient. In general, precision medicine is the use of multiple data types (genomics, electronic health records, environmental exposures, and so on) to determine the best prevention or treatment options for an individual patient.
- Phenome-wide association study
(PheWAS). A study wherein a single genetic variant or a set of genetic variants are tested for an association with an assemblage of human diseases and/or traits (the phenome). The genetic variants often considered in PheWAS already have a known statistical association with a phenotype or are otherwise functional.
- Phenome
The set of all phenotypes expressed by a cell, tissue, organ, organism or species.
- Regression modelling
A statistical approach to assess the relationship between variables. In genetic association studies, the relationship between genetic polymorphisms (the independent variable) and disease status (the dependent variable) is often assessed using logistic regression, and association with a continuous value (such as blood lipid levels) is often assessed using linear regression.
- Multiple testing
The assessment of a potential association for a single variant is usually formulated as a hypothesis test with a specified false positive rate (usually 5%). As tens of thousands of such tests may be performed in the analysis of genetic data, the P values from the individual tests must be adjusted to avoid numerous false positive results; this procedure is known as multiple testing correction.
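As a minimal sketch of the simplest such adjustment, the Bonferroni correction divides the overall false positive rate by the number of tests performed. The function name and the example P values below are illustrative assumptions and do not come from any study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and a significance flag for each P value."""
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one P value")
    # Dividing alpha by the number of tests keeps the family-wise
    # false positive rate at or below alpha.
    threshold = alpha / m
    return threshold, [p <= threshold for p in p_values]


# Hypothetical P values from four tests
threshold, flags = bonferroni([0.001, 0.02, 0.04, 0.30])
print(threshold, flags)
```

Bonferroni is conservative when tests are correlated, as phenotypes in a phenome often are, so it is shown here only as the most basic form of correction.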
- Electronic health record
(EHR). A digital version of a patient's paper medical chart. An EHR can be distinguished from an electronic medical record (EMR) in that EHRs also include information relevant to the total health of the patient as opposed to being limited mostly to diagnosis and treatment of the patient.
- Biorepository
A biological materials repository that collects, processes, stores and distributes biospecimens to support future scientific investigation.
- Effect sizes
The percentages of genetic variance or risk explained by a specific locus, ranging from less than 1% for many common traits up to 100% for some Mendelian diseases.
- eMERGE Network
The Electronic Medical Records and Genomics (eMERGE) Network is a collaboration in the United States of biobanks linked to electronic health records (EHRs). eMERGE was established in 2007 to explore the utility of EHRs in genomic research through funding from the US National Human Genome Research Institute with five biorepositories. The eMERGE Network is now in its third cycle with nine biobanks linked to EHRs, and the research scope has expanded to include domains in genomic medicine implementation.
- Billing codes
Codes that are assigned to services rendered in the clinic for reimbursement purposes. Billing codes include diagnostic codes, procedure codes and pharmaceutical codes, among others.
- Chi-squared tests
Statistical tests that are used to determine whether the observed frequencies are different to those expected. For typical case–control genetic association studies, the frequency of alleles or genotypes at each locus in cases with disease is compared with the frequency of alleles or genotypes at that same locus in controls without disease.
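The allelic comparison described in this entry can be sketched as a Pearson chi-squared test on a 2x2 table of allele counts. This is an illustrative sketch using only the Python standard library, without a continuity correction. The function name and the counts are hypothetical.

```python
import math


def allelic_chi_squared(case_counts, control_counts):
    """Pearson chi-squared test on a 2x2 table of allele counts (1 degree of freedom)."""
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were the same in cases and controls
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts of alleles (A, a) in cases and in controls
stat, p = allelic_chi_squared((60, 40), (40, 60))
print(stat, p)
```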
- Population stratification
The presence of a systematic difference in allele frequencies between subpopulations from a larger population, possibly owing to different ancestry, especially in the context of association studies. (Population stratification is also referred to as population structure in this context.) If not properly accounted for in association studies, population stratification can lead to spurious associations.
- CPT codes
Current procedural terminology (CPT) codes represent medical, surgical and diagnostic services rendered in the clinic for reimbursement purposes. CPT codes differ from International Classification of Diseases (ICD) codes in that they are assigned when the service is performed as opposed to being assigned as part of a diagnosis.
- PAGE
Population Architecture using Genomics and Epidemiology (PAGE) is a collaborative network of large epidemiological and clinic-based studies with an emphasis on racially or ethnically diverse populations. The PAGE I study was funded in 2008 by the US National Human Genome Research Institute and consisted of four study sites with access to seven epidemiological studies and one clinic-based study. The research focus of PAGE I was the generalization of genome-wide association findings to diverse populations and the identification of environmental modifiers. PAGE is currently in its second cycle with a research focus on genomic discovery in diverse populations for common human diseases including cancers (for example, breast cancer, melanoma or prostate cancer) and cardiovascular disease.
- Comorbidity
The co-occurrence of two or more chronic diseases or conditions in a patient.
- Electronic phenotyping
The secondary use of electronic health records (EHRs) to define cases, controls, environmental exposures and other covariates for genetic association studies. Electronic phenotyping typically requires semi-automated or automated algorithms for accessing structured and unstructured data in the EHR.
- Positive predictive value
(PPV). The probability that a patient identified as a case is truly a case. The PPV is often calculated as a performance metric for diagnostic testing. In electronic phenotyping, the PPV is used to assess the performance of algorithms designed to use EHR data to identify cases and controls for downstream genetic association studies.
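As a small worked example, the PPV is the number of true cases divided by all patients that an algorithm flags as cases. The function name and the counts below are hypothetical, standing in for the result of a manual chart review of algorithm-identified cases.

```python
def positive_predictive_value(true_positives, false_positives):
    """Fraction of algorithm-identified cases confirmed as true cases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no patients were identified as cases")
    return true_positives / flagged


# Hypothetical review: 90 of 100 flagged patients are confirmed as cases
ppv = positive_predictive_value(true_positives=90, false_positives=10)
print(ppv)
```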
- Allele frequency
A gene or locus can have different forms, termed alleles. Genes or loci with more than one form are said to be polymorphic. The allele frequency is the frequency at which a particular form of the gene or locus is found in the population. If only one form of the gene is found in the population, the locus is said to be monomorphic.
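The allele frequency and genotype frequency entries in this glossary can be illustrated with a short calculation for a diallelic locus. This is a sketch under simple assumptions: each genotype is written as a two-letter string, and the function name and sample are hypothetical.

```python
from collections import Counter


def genotype_and_allele_frequencies(genotypes):
    """Compute genotype and allele frequencies from diploid genotype strings."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("need at least one genotype")
    # Sort the two alleles so that 'aA' and 'Aa' count as the same genotype
    genotype_counts = Counter("".join(sorted(g)) for g in genotypes)
    # Each diploid individual contributes two alleles
    allele_counts = Counter(allele for g in genotypes for allele in g)
    genotype_freq = {g: c / n for g, c in genotype_counts.items()}
    allele_freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    return genotype_freq, allele_freq


# Hypothetical sample of ten individuals
sample = ["AA"] * 5 + ["Aa"] * 4 + ["aa"] * 1
genotype_freq, allele_freq = genotype_and_allele_frequencies(sample)
print(genotype_freq, allele_freq)
```

In this sample, allele 'A' appears on 14 of the 20 chromosomes, so its frequency is 0.7, while the 'AA' genotype frequency is 0.5.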
Cite this article
Bush, W., Oetjens, M. & Crawford, D. Unravelling the human genome–phenome relationship using phenome-wide association studies. Nat Rev Genet 17, 129–145 (2016). https://doi.org/10.1038/nrg.2015.36