Electronic health record phenotypes associated with genetically regulated expression of CFTR and application to cystic fibrosis



The increasing use of electronic health records (EHRs) and biobanks offers unique opportunities to study Mendelian diseases. We describe a novel approach that summarizes clinical manifestations from patient EHRs into phenotypic evidence for cystic fibrosis (CF), with the potential to flag patients whose disease is unrecognized.


We estimated genetically predicted expression (GReX) of the cystic fibrosis transmembrane conductance regulator gene (CFTR) and tested it for association with clinical diagnoses in the Vanderbilt University biobank (N = 9142 persons of European descent, including 71 cases of CF). The top associated EHR phenotypes were then combined into a phenotype risk score (PheRS) and assessed for discriminating CF case status in two further cohorts: an additional 2.8 million patients from Vanderbilt University Medical Center (VUMC), and 125,305 adult patients, including 25,314 CF cases, from MarketScan, an independent external cohort.
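A phenotype risk score of the kind described above can be sketched as a weighted sum over the scored phenotypes present in a patient's record. The sketch below assumes that form; the phenotype labels, the weight values, and the function name phenotype_risk_score are hypothetical and are not taken from the study.

```python
from typing import Dict, Set


def phenotype_risk_score(patient_codes: Set[str], weights: Dict[str, float]) -> float:
    """Sum the weights of the scored phenotypes found in one patient's record."""
    return sum(w for code, w in weights.items() if code in patient_codes)


# Hypothetical phenotype labels and weights, for illustration only.
example_weights = {
    "bronchiectasis": 2.0,
    "chronic_sinusitis": 1.0,
    "pancreatic_insufficiency": 1.5,
}

# Two hypothetical patients; phenotypes outside the weight table add nothing.
patient_a = {"bronchiectasis", "pancreatic_insufficiency", "hypertension"}
patient_b = {"hypertension"}

score_a = phenotype_risk_score(patient_a, example_weights)
score_b = phenotype_risk_score(patient_b, example_weights)
```

Under this assumed form, swapping one weighting scheme for another changes only the weight table, which is what makes alternative schemes easy to compare on the same patients.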


GReX of CFTR was associated with EHR phenotypes consistent with CF. A PheRS constructed from the EHR phenotypes and weights discovered through the genetic associations showed better discriminative power for CF than the initially proposed PheRS in both VUMC and MarketScan.
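One common way to quantify discriminative power is the area under the ROC curve (AUC), the probability that a randomly chosen case scores higher than a randomly chosen control. The abstract does not name the metric used, so AUC is an assumption here, and the scores below are made up for illustration.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Fraction of case/control pairs where the case scores higher (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, for illustration only.
cases = [3.5, 2.0, 1.0]
controls = [0.0, 1.0, 0.5, 2.5]

example_auc = auc(cases, controls)
```

Computing this quantity for two scoring schemes on the same cases and controls gives a direct comparison: the scheme with the higher value ranks cases above controls more often. The pairwise loop is quadratic, so a rank-based formulation would be preferable for cohorts of the size described here.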


Our study demonstrates the power of EHRs for the clinical description of CF and the benefits of a genetics-informed weighting scheme in constructing a phenotype risk score. This approach may find broad application in phenomic studies of Mendelian disease genes.


Fig. 1: Workflow of the study.
Fig. 2: Genetically regulated expression (GReX) of CFTR in brain hypothalamus correlates with dosage of DF508proxy.
Fig. 3: Haplotype-level genetically regulated expression (hGReX) of CFTR stratified by the presence of cystic fibrosis (CF) alleles.
Fig. 4: Phenotype risk score (PheRS) construction for cystic fibrosis (CF) and performance evaluation.




This work was funded by the National Institutes of Health (NIH) grants R01MH113362, U01HG009086, R35HG010718, R01HL122712, 1P50MH094267, and U01HL108634-01. A.R. also acknowledges support from the Defense Advanced Research Projects Agency (DARPA) Big Mechanism program under Army Research Office (ARO) contract W911NF1410333, the King Abdullah University of Science and Technology (KAUST), and a gift from Liz and Kent Dauten. BioVU and the Synthetic Derivative of Vanderbilt University Medical Center are supported by the National Center for Advancing Translational Science grant UL1TR000445 from NIH; the genotypes in BioVU used for the analyses described were funded by NIH grants RC2GM092618 and U01HG004603.

Author information



Corresponding authors

Correspondence to Xue Zhong PhD or Nancy J. Cox PhD.

Ethics declarations


E.R.G. receives an honorarium from the journal Circulation Research of the American Heart Association, as a member of the Editorial Board. He performed consulting on pharmacogenetic analysis with the City of Hope/Beckman Research Institute. The other authors declare no conflicts of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Zhong, X., Yin, Z., Jia, G. et al. Electronic health record phenotypes associated with genetically regulated expression of CFTR and application to cystic fibrosis. Genet Med 22, 1191–1200 (2020). https://doi.org/10.1038/s41436-020-0786-5



  • Mendelian
  • cystic fibrosis
  • CFTR
  • cis-regulated expression
  • phenotype risk score