Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery

Zhang, Xingmin Aaron; Yates, Amy; Vasilevsky, Nicole; Gourdine, J. P.; Callahan, Tiffany J.; Carmody, Leigh C.; Danis, Daniel; Joachimiak, Marcin P.; Ravanmehr, Vida; Pfaff, Emily R.; Champion, James; Robasky, Kimberly; Xu, Hao; Fecho, Karamarie; Walton, Nephi A.; Zhu, Richard L.; Ramsdill, Justin; Mungall, Christopher J.; Köhler, Sebastian; Haendel, Melissa A.; McDonald, Clement J.; Vreeman, Daniel J.; Peden, David B.; Bennett, Tellen D.; Feinstein, James A.; Martin, Blake; Stefanski, Adrianne L.; Hunter, Lawrence E.; Chute, Christopher G.; Robinson, Peter N.

doi:10.1038/s41746-019-0110-4

Article
Open access
Published: 02 May 2019

npj Digital Medicine volume 2, Article number: 32 (2019)

Abstract

Electronic Health Record (EHR) systems typically define laboratory test results using the Logical Observation Identifiers Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resources (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHRs to be reused for deep phenotyping and exploits the hierarchical structure of the HPO to integrate distinct tests that have comparable medical interpretations for association studies.

Introduction

Electronic health records (EHRs) have been widely adopted in US hospitals since the Health Information Technology for Economic and Clinical Health (HITECH) Act was passed in 2009, and offer an unprecedented opportunity to accelerate translational research because of advantages of scale and cost efficiency as compared with traditional cohort-based studies.¹ In particular, EHRs contain rich phenotype information that can be utilized to stratify diseases and to develop hypotheses. For instance, phenome-wide association studies (PheWAS) can exploit EHR data to define case–control cohorts for disease diagnoses or laboratory traits and then analyze associations with hundreds of thousands of genetic variants.^2,3,4 Despite the great potential of EHR data, patient phenotyping from EHRs is still challenging because the phenotype information is distributed across many EHR locations (laboratories, notes, problem lists, imaging data, etc.) and because EHRs have vastly different structures across sites. This lack of integration represents a substantial barrier to widespread use of EHR data in translational research.

Laboratory tests provide a critical resource for phenotype extraction. Deep phenotyping, i.e., comprehensive and precise phenotyping of individual disease manifestations, is an essential component of precision medicine and could potentially extend the reach of PheWAS studies.^5,6 Laboratory tests have broad applicability for translational research, but EHR-based research using laboratory data has been challenging because of the diversity of the tests and the lack of standardization in reporting their results. For instance, some tests measure nitrite level in urine using an automated machine, whereas others use a test strip. Some report the value in mg/dL, whereas others report a qualitative value of positive/negative. If any of these tests were abnormal, the medical interpretation would be that nitrituria is present, yet current informatics frameworks do not easily support such inferences. Therefore, substantial challenges exist for standardization and integration of laboratory data for deep phenotyping and EHR-based translational research.

Recent advances in the standardization of EHR systems and phenotype ontologies make it feasible to extract patient phenotypes from laboratory tests at a large scale. The Fast Healthcare Interoperability Resources (FHIR) standard was introduced in 2013 and provides a standardized interface to individual EHR systems for healthcare-related data.⁷ FHIR separates healthcare-related data into granular components called “resources”, such as observation, medication, patient identity, and insurance claims. Each resource has a standard definition and associated semantic bindings, so resources can be computationally integrated even when they are created by different methods and organizations. Laboratory tests, encoded as observations in FHIR, are uniquely identified with Logical Observation Identifiers Names and Codes (LOINC), which is a universal code system that defines various kinds of clinical laboratory tests and other measurements (~86,000 entries).⁸ The outcome of a FHIR observation can be represented by a term in the Human Phenotype Ontology (HPO), which is a logically defined vocabulary for describing medically relevant abnormal phenotypes.⁹ The HPO has become the de facto standard for computational phenotype analysis in genomics and rare disease.^10,11,12 The HPO currently contains 14,184 terms (February, 2019) including a comprehensive representation of laboratory abnormalities such as hyperglycemia, thrombocytopenia, nitrituria, and increased urine alpha-ketoglutarate concentration. Here, we present a computational method that semantically harmonizes FHIR, LOINC, and HPO. The software rolls up LOINC terms for tests whose outcomes are medically comparable into common categories and interprets the outcome as HPO terms, thereby automatically extracting detailed, deep phenotypic profiles of laboratory results for downstream studies.

Results

Overview of strategy

We present an approach to mapping the outcomes of laboratory tests as encoded in EHRs with LOINC terms for the tests and FHIR Observation resources representing the test results as HPO terms. A LOINC term by itself does not specify the outcome of a test. But if the outcome of a test (such as “high” or “low”) and the nature of the test are known, we can then infer the phenotypic abnormality. For example, LOINC 32710-6 “Nitrite [Presence] in Urine” together with the outcome “positive” implies the phenotypic abnormality Nitrituria (HP:0031812).

LOINC-coded laboratory tests can be grouped broadly into three categories: those with a quantitative outcome (Qn), those with an ordered categorical outcome (ordinal, or Ord), and those with an unordered categorical outcome (nominal, or Nom). A quantitative test for an analyte has a normal range, and there are three types of mappings depending on the result of the test: L (lower than normal), N (normal), and H (higher than normal). Take, for instance, a test for the concentration of potassium in the blood (LOINC:6298-4, Fig. 1a). If the result is high, our procedure infers the corresponding HPO term for Hyperkalemia (HP:0002153). Analogously, a low result is mapped to Hypokalemia (HP:0002900). The HPO is an ontology of abnormal phenotypes, and thus there is no term that specifically represents a normal test result. However, computational analysis can record negated HPO terms, and the normal test result is represented as NOT Abnormal blood potassium concentration (HP:0011042).

Ordinal tests can have a series of ordered outcomes. The majority of the ordinal LOINC tests were mapped to two possible outcomes, POS (positive) or NEG (negative). For instance, the result of the test Nitrite in urine by test strip can be positive (present) or negative (absent) (Fig. 1a). If present, then our approach infers the HPO term Nitrituria (HP:0031812); if absent, our approach infers NOT Nitrituria (HP:0031812).

Nominal tests have a series of outcomes that lack a natural ordering. Yet, some nominal result values are considered abnormal. One example is LOINC 5778-6, color of urine. Currently, nine abnormal results of this test are mapped to the nine child terms of abnormal urinary color (HP:0012086), including red urine (HP:0040318) and dark urine (HP:0040319).
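
The three kinds of mapping described above can be illustrated with a small lookup table keyed by LOINC code and outcome. The following Python sketch is illustrative only and is not the authors' implementation: the names `HpoTerm`, `ANNOTATIONS`, and `to_hpo` are hypothetical, as is the outcome label `"red"` for the nominal test. The LOINC codes and HPO identifiers are the ones quoted in the text.

```python
from typing import NamedTuple, Optional


class HpoTerm(NamedTuple):
    """An HPO term plus a flag saying whether it is negated (normal result)."""
    hpo_id: str
    label: str
    negated: bool = False


# Hypothetical in-memory excerpt of a LOINC-to-HPO annotation table.
ANNOTATIONS = {
    # Quantitative (Qn): potassium in blood
    ("6298-4", "L"): HpoTerm("HP:0002900", "Hypokalemia"),
    ("6298-4", "N"): HpoTerm("HP:0011042", "Abnormal blood potassium concentration", negated=True),
    ("6298-4", "H"): HpoTerm("HP:0002153", "Hyperkalemia"),
    # Ordinal (Ord): nitrite presence in urine
    ("32710-6", "POS"): HpoTerm("HP:0031812", "Nitrituria"),
    ("32710-6", "NEG"): HpoTerm("HP:0031812", "Nitrituria", negated=True),
    # Nominal (Nom): color of urine; the outcome label "red" is an assumption
    ("5778-6", "red"): HpoTerm("HP:0040318", "Red urine"),
}


def to_hpo(loinc_code: str, outcome: str) -> Optional[HpoTerm]:
    """Return the annotated HPO term, or None if the pair is not annotated."""
    return ANNOTATIONS.get((loinc_code, outcome))
```

A normal potassium result, for example, is returned as the negated parent term rather than as a dedicated "normal" term, mirroring the convention described above.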

A LOINC to HPO mapping library

We have mapped 2923 LOINC terms to HPO terms. In all, 80.4% of the mapped LOINC tests are Qn, 18.8% Ord, and 0.8% Nom (Fig. 2a). Taken together, these LOINC terms mapped to a total of 719 distinct HPO terms. We analyzed the distribution of the number of distinct LOINC terms that were mapped to an individual HPO term. In 54.8% of the cases, two or more LOINC terms are mapped to the same HPO term (mean 7.5) (Fig. 2b), reflecting the fact that multiple laboratory tests (and associated LOINC terms) have outcomes that we consider to have an equivalent clinical interpretation.

Algorithm for converting LOINC-coded laboratory tests into HPO-coded phenotypes

We designed an algorithm that inspects elements of a FHIR resource for laboratory tests and converts the outcome into an HPO term. A standard FHIR resource for laboratory tests (a FHIR Observation) contains patient information, test identification, test result, normal reference ranges, and interpretations (Fig. 1b). The algorithm either compares the numerical result with the normal reference ranges to assign an interpretation code such as “L” or “POS” (Table 1), or makes use of the interpretation codes when they are present, and then maps the result to the corresponding HPO term (Supplementary Fig. 1). Overall, the algorithm handles all three major types of LOINC-coded laboratory tests (Qn, Ord, and Nom) when combined with the LOINC to HPO annotation data.
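
The interpretation step for a quantitative result can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the published algorithm (which is given in Supplementary Fig. 1): the function name `interpret` is hypothetical, a reported interpretation code is assumed to take precedence over the reference range, and the potassium range in the usage note is only an illustrative value.

```python
from typing import Optional


def interpret(value: float,
              low: Optional[float],
              high: Optional[float],
              reported_code: Optional[str] = None) -> Optional[str]:
    """Assign an interpretation code to a numeric laboratory result.

    Assumption: if the observation already carries an interpretation code,
    it is used as-is; otherwise the value is compared with the reference range.
    Returns None when neither a code nor any reference bound is available.
    """
    if reported_code is not None:
        return reported_code
    if low is None and high is None:
        return None  # nothing to compare against
    if low is not None and value < low:
        return "L"
    if high is not None and value > high:
        return "H"
    return "N"
```

For instance, `interpret(5.9, 3.5, 5.0)` yields `"H"` under an assumed reference range of 3.5 to 5.0 mmol/L, which an annotation table would then map to Hyperkalemia (HP:0002153).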

Table 1 FHIR codes for test outcomes

HPO on FHIR

To demonstrate conversion of FHIR-encoded LOINC tests into HPO, we created a SMART on FHIR application that uses the mapping library. SMART (Substitutable Medical Applications, Reusable Technologies) on FHIR is an application platform for EHRs that allows applications to run on different FHIR-enabled EHR systems.¹³ Our application, HPO on FHIR, transforms a bundle of laboratory observations for a patient into a list of HPO codes (Fig. 3). We have also developed a command-line application that can iterate through all laboratory tests in a FHIR-enabled server, convert each into an HPO term and store them in a relational database for translational research.

LOINC to HPO demonstration with asthma

To test our method for semantic integration of laboratory tests, we analyzed a de-identified EHR dataset from the University of North Carolina (UNC) comprising 15,681 patients who had a history of asthma or asthma-like symptoms. The cohort is skewed toward female (58.9%) and older patients (median age: 61.5 years, Fig. 4a). The median tracking period of patients in this cohort is 3.1 years. The dataset contains ~54 million records of LOINC-encoded clinical test results, medication prescriptions, diagnosis codes, procedure codes, patient information, and other supporting records (Fig. 4b). Using our LOINC to HPO conversion algorithm, we successfully transformed 9.9 out of 11 million (88.6%) laboratory tests into HPO terms (Fig. 4c). For the entire cohort, on average, each HPO term was mapped from 1.8 distinct types of laboratory tests (Fig. 4d), indicating that the transformation successfully integrated distinctly coded laboratory tests that have the same clinical interpretation. The mapping procedure assigned an average of 633 laboratory test-derived HPO terms per individual patient, many of which were from the same laboratory tests performed at different visits. The tests corresponded to a mean of 57.7 unique HPO terms, of which 22.2 were abnormalities and the remainder were normal phenotypes (Fig. 4e). The hierarchical structure of the HPO allows inferences to be propagated up to parent terms and their ancestors;¹⁴ using this method, we inferred an additional 51.2 HPO terms (total 73.5) based on 22.2 abnormalities for each patient (Supplementary Fig. 2).
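
Propagating abnormal findings to ancestor terms amounts to computing a transitive closure over the ontology's is-a edges. The Python sketch below uses a toy fragment of the hierarchy. The `PARENTS` table and the function `with_ancestors` are hypothetical, and the edges are assumptions made for illustration (in particular, the direct edge from low serum calciferol to abnormality of vitamin metabolism is a simplification); a real analysis would load the full HPO.

```python
# Toy child -> parents table; edges are assumed for illustration only.
PARENTS = {
    "HP:0001880": ["HP:0020064"],  # eosinophilia -> abnormal eosinophil count
    "HP:0020064": ["HP:0001879"],  # -> abnormal eosinophil morphology
    "HP:0012053": ["HP:0100508"],  # low serum calciferol -> abnormality of vitamin metabolism
    "HP:0100508": ["HP:0032245"],  # -> abnormal metabolism
}


def with_ancestors(terms):
    """Return the given abnormal terms together with all of their ancestors."""
    closure = set(terms)
    stack = list(terms)
    while stack:
        term = stack.pop()
        for parent in PARENTS.get(term, []):
            if parent not in closure:
                closure.add(parent)
                stack.append(parent)
    return closure
```

Only abnormal (non-negated) terms should be passed in: a normal result for one test does not rule out an abnormality elsewhere under the same parent term.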

As a proof-of-principle, we tested the ability of our procedure to identify phenotypic abnormalities associated with a diagnosis of asthma or with frequent prednisone use. About one-third of the patients in this cohort had an ICD 9/10 diagnosis of asthma, and the remaining patients had ICD 9/10 codes reflecting other, potentially asthma-like, respiratory complaints. In all, 14.2% of patients who had a diagnosis of asthma were administered or prescribed prednisone >3 times within a tracking period between 2004 and 2016; 8.5% of the remaining patients had been administered prednisone more than three times. Prednisone is a corticosteroid drug used for severe asthma treatment with multiple other indications.¹⁵ We reasoned that both the diagnosis of asthma and the history of treatment with prednisone would likely be correlated with different but overlapping sets of laboratory abnormalities. Using logistic regression, we assessed the contribution of frequent prednisone prescription and the presence of acute asthma diagnosis to each phenotypic abnormality.

Prednisone usage was significantly associated with an increased odds ratio for exhibiting many abnormal phenotypes that are consistent with the known effects of prednisone (Table 2), such as hypoalbuminemia (HP:0003073),¹⁶ neutrophilia (HP:0011897),¹⁷ monocytosis (HP:0012311),¹⁸ leukocytosis (HP:0001974),¹⁸ hypokalemia (HP:0002900),¹⁹ and elevated serum creatine phosphokinase (HP:0003236).²⁰ An acute asthma diagnosis was significantly associated with seven phenotypes: abnormal metabolism (HP:0032245), abnormality of vitamin metabolism (HP:0100508), increased red blood cell count (HP:0020059), increased VLDL cholesterol concentration (HP:0003362), eosinophilia (HP:0001880), and two ancestor terms of eosinophilia, abnormal eosinophil count (HP:0020064) and abnormal eosinophil morphology (HP:0001879). Eosinophilia is a well-established marker for acute allergic asthma.²¹ Several studies have linked vitamins A, B, C, D, and E with asthma.^22,23,24 In this study, we applied a threshold minimum number of patients before performing statistical analysis, and none of the specific subtypes of abnormality of vitamin metabolism (HP:0100508; n = 111 patients) passed this threshold. However, a number of patients were found to have increased blood folate (HP:0040087; n = 33 patients), vitamin B12 deficiency (HP:0200502; n = 6 patients), low serum calciferol (HP:0012053; n = 56 patients), and low serum calcitriol (HP:0012052; n = 6 patients). Thus, the hierarchical structure of the HPO allowed us to infer the parent phenotype (Abnormality of vitamin metabolism) and aggregate enough data to find that it is associated with acute asthma diagnosis (Supplementary Fig. 3). The term abnormal metabolism (HP:0032245) was also flagged, but this was solely related to the 111 patients annotated to abnormality of vitamin metabolism, which is a child term of abnormal metabolism.
Although there have been some conflicting results,²⁵ a number of studies have shown a positive correlation between asthma and increased total, high-density, or low-density lipoprotein cholesterol or triglycerides (Supplementary Fig. 2).^26,27,28,29,30 An increased red blood cell count is not a recognized biomarker of asthma, but it could conceivably reflect a number of factors, including hypoxemia (11.1% of patients with an acute asthma diagnosis also had a chronic obstructive pulmonary disease diagnosis) or hemoconcentration resulting from acute dehydration during an asthma attack. The retrospective nature of this study does not allow us to consult the full medical records to investigate this.

Table 2 Odds ratio of phenotypes for frequent prednisone prescription and acute asthma diagnosis

Discussion

In this report, we present an approach to the semantic integration of laboratory tests and results in EHR data. Our approach connects a widely used system for denoting laboratory tests, LOINC, with a current standard for transmitting healthcare information, FHIR, and a computational resource for deep phenotyping, HPO, that was previously used mainly in the context of rare disease research and diagnostics. Previous work such as OntoServer provides lookup services of different terminologies and maps similar concepts that originate from different terminologies.³¹ The focus of our tool in contrast is to provide a means of interpreting the outcomes of laboratory tests using an ontology of phenotypic abnormalities. Normalizing laboratory tests with HPO terms is an effective solution for two fundamental issues in clinical research: data integration and deep phenotyping. Laboratory test results support a large proportion of medical decisions.³² It is common that different laboratory tests may lead to results that have very similar or identical clinical interpretations. These different tests are recorded in the EHR using distinct codes (for instance, currently, there are four different LOINC terms for different tests of urine nitrite). This level of granularity can create difficulties for the semantic integration of comparable test results. By converting the results of laboratory tests to HPO-encoded phenotypes, our method provides an effective way for integrating laboratory tests that have the same clinical interpretation but different LOINC codes. 
Extracted patient phenotypes can be directly utilized for PheWAS studies, which is important because phenotyping patients is a major bottleneck for conducting PheWAS studies.³³ The Electronic Medical Records and Genomics (eMERGE) network develops EHR-derived phenotyping algorithms by combining diagnosis codes, procedure codes, medication, narratives, and subsets of laboratory tests, and iteratively refines them to identify control and disease cohorts for genome-wide association studies and PheWAS.^1,3,33,34,35 Our method complements existing phenotyping algorithms because it extracts additional phenotypic information by systematically interrogating the vast amount of data in laboratory tests.

The analysis of UNC EHR data demonstrated the potential of combining deep phenotypes from our tool with EHR data for biomarker discovery. Our current mapping library allowed us to convert the majority of the laboratory tests into HPO terms and assign an average of 57.7 unique phenotypes to each patient. The statistical analysis identified phenotypic abnormalities that are associated with frequent prescriptions of prednisone and/or acute asthma diagnosis. The cohort used for this analysis is biased toward senior and female patients and may not be reflective of asthma patient distributions, but the fact that our analysis identified numerous abnormalities that are associated with either prednisone use or asthma suggests that our approach can be useful for the investigation of EHR data for laboratory-based biomarkers of diseases and conditions. We have demonstrated the utility of our approach on the UNC dataset using a simple logistic regression approach as a proof-of-principle; we envision that our mapping approach could be used together with a variety of statistical and algorithmic analysis strategies to address a variety of topics in EHR-based translational research, and we have therefore coded our foundational approach in a way that can easily be integrated into other statistical analysis pipelines. A particularly attractive direction is to incorporate temporal information to build predictive models based on longitudinal phenotypic timelines.^36,37

Some practical issues need to be considered when adopting our approach. Although LOINC has been widely adopted by healthcare providers and is increasingly mandated by various federal agencies, it is still not a universal system. Since we used LOINC for the mapping, locally coded laboratory tests cannot be mapped to HPO terms with our tools. Similarly, the SMART on FHIR tool reported here can only be utilized in FHIR-enabled hospital systems. However, our annotation file and the algorithmic approach we adopted can be used independently of FHIR.

Several other use cases for our approach are conceivable. Rule-based algorithms could be applied to infer HPO terms from the primary phenotypic abnormalities. For instance, the combination of decreased hemoglobin concentration (HP:0020062) and decreased mean corpuscular volume (HP:0025066) implies microcytic anemia (HP:0001935). The HPO is widely used in rare disease diagnostics, but one bottleneck is that in many settings, HPO terms need to be entered manually into the analysis software. A recent study used text-mining to extract detailed patient phenotypes through natural language processing of clinical narratives in EHR, and used the resulting lists of HPO terms for genomic diagnostics.¹¹ Our tool could supplement such approaches by providing a computational representation of laboratory findings to genomic diagnostic software such as Exomiser.^38,39,40 In principle, our tool could be used to support other tasks related to EHR data, including decision support and cohort recruitment. In the future, we anticipate that semantic integration of a wider range of EHR data will become the norm to support data-driven translational research and precision medicine.
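
The rule-based inference mentioned above can be sketched as a small forward-chaining loop over sets of HPO terms. This Python example is hypothetical and goes beyond what the software described here provides: the `RULES` table and the function `apply_rules` are assumptions, and the single rule encodes only the microcytic anemia example from the text.

```python
# Each rule: (set of premise terms that must all be present, inferred term).
RULES = [
    # decreased hemoglobin concentration + decreased mean corpuscular volume
    # -> microcytic anemia
    (frozenset({"HP:0020062", "HP:0025066"}), "HP:0001935"),
]


def apply_rules(terms):
    """Add every term whose premises are all present, until nothing changes."""
    inferred = set(terms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= inferred and conclusion not in inferred:
                inferred.add(conclusion)
                changed = True
    return inferred
```

Repeating the loop until no rule fires lets the conclusion of one rule serve as a premise of another, should more rules be added.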

Methods

Mapping LOINC terms to HPO terms

We performed manual biocuration to construct a mapping library from each potential outcome of a LOINC test to the corresponding HPO term (Fig. 1a). The test outcome is represented using a subset of FHIR codes (Table 1, primary code), such as “lower than normal”, “normal”, or “higher than normal”. For quantitative tests that report a numeric measurement, we use the FHIR interpretation codes “L” and “H” to indicate lower or higher than normal, and “N” and “A” to indicate that the result is normal or abnormal. For ordinal tests that have a binary outcome, i.e., presence or absence of the test target, we use the FHIR interpretation code “POS” to indicate presence and “NEG” to indicate absence. In addition, other interpretation codes defined by FHIR are first mapped to primary codes. For example, FHIR codes “LL” (critically low) and “<” (off scale low) are both mapped to “L” (Table 1).
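
The reduction of other interpretation codes to primary codes can be sketched as a dictionary lookup. In this Python illustration, only the mappings of “LL” and “<” to “L” are taken from the text; the mappings of “HH” and “>” to “H” are assumed by symmetry, and the names `PRIMARY_CODES`, `TO_PRIMARY`, and `to_primary` are hypothetical. The complete mapping is the one given in Table 1.

```python
# Primary codes named in the text.
PRIMARY_CODES = {"L", "N", "H", "A", "POS", "NEG"}

# Secondary -> primary. "LL" and "<" come from the text;
# "HH" and ">" are assumed by symmetry for this sketch.
TO_PRIMARY = {
    "LL": "L",
    "<": "L",
    "HH": "H",
    ">": "H",
}


def to_primary(code):
    """Return the primary code for an interpretation code, or None if unknown."""
    code = code.strip().upper()
    if code in PRIMARY_CODES:
        return code
    return TO_PRIMARY.get(code)
```

Returning `None` for unrecognized codes lets a caller record the result as a conversion failure rather than guess an interpretation.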

The value for a map entry is an HPO term accompanied by a boolean value to indicate whether it should be negated. That is, while an abnormal test outcome is mapped to a particular HPO term, the normal outcome for that test is mapped to the negated form, since the HPO contains only terms for abnormal phenotypes. Figure 1a shows three examples of mappings for Qn, Ord, or Nom LOINC terms.

To perform the biocuration needed to generate the LOINC mappings efficiently, we developed a JavaFX-based annotation tool that recommends candidate HPO terms for a LOINC test based on lexical matching between HPO term definitions and the name of the laboratory test. The recommended HPO terms were then manually vetted by one of five biocurators (i.e., one MD and four PhDs who have biomedical training and are major contributors to the HPO project) and cross-validated by a different annotator. Mapping problems were tracked as GitHub issues (https://github.com/TheJacksonLaboratory/loinc2hpoAnnotation/issues) and discussed during regular meetings. Source code and an executable version of the biocuration application are freely accessible at https://github.com/monarch-initiative/loinc2hpo. In addition, a subset (n = 160) of pediatric-specific laboratory tests was independently validated by five domain experts (i.e., three pediatric clinicians, a PhD-level molecular biologist, and a master’s-level epidemiologist). To perform this validation, a Qualtrics survey was designed so that each question featured a laboratory test description and a set of reasonable HPO concepts. The survey was completed by all experts between October and December 2018. After completion, any laboratory test mapping that did not meet agreement by at least one clinician and both the biologist and the epidemiologist was re-evaluated with one clinician until consensus was reached. The pediatric terms were additionally vetted on the loinc2hpoAnnotation GitHub tracker by the entire team of biocurators.

LOINC to HPO mapping file

The LOINC to HPO mapping file contains records of mappings from LOINC test outcomes to the corresponding HPO terms. The annotation data are serialized as a tab-separated value (TSV) file. Each line records the LOINC code, the test outcome, the mapped HPO term, and whether the mapped term should be negated. The annotation file is deposited at GitHub and can be accessed at https://w3id.org/loinc2hpo/annotations. An excerpt is shown in Supplementary Table 1.
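
A file with the four fields described above can be read with Python's standard `csv` module. The sketch below is an assumption-laden illustration: the column headers (`loinc_code`, `outcome`, `hpo_term`, `negated`), the `true`/`false` encoding of the negation flag, and the function name `load_annotations` are hypothetical and may differ from the headers and layout of the published annotation file.

```python
import csv
import io

# Hypothetical excerpt; the real file's column names and layout may differ.
SAMPLE_TSV = (
    "loinc_code\toutcome\thpo_term\tnegated\n"
    "6298-4\tL\tHP:0002900\tfalse\n"
    "6298-4\tN\tHP:0011042\ttrue\n"
    "6298-4\tH\tHP:0002153\tfalse\n"
)


def load_annotations(handle):
    """Read TSV rows into {(loinc_code, outcome): (hpo_term, negated)}."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["loinc_code"], row["outcome"])
        table[key] = (row["hpo_term"], row["negated"].strip().lower() == "true")
    return table


annotations = load_annotations(io.StringIO(SAMPLE_TSV))
```

To read from disk instead, an open file handle (for example from `open(path, newline="")`) can be passed in place of the `io.StringIO` object.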

HPO on FHIR

We created a SMART on FHIR application, HPO on FHIR, to query FHIR-enabled EHR servers and return patient laboratory results with LOINC codes and their corresponding HPO terms. The web interface of the application aggregates identical HPO terms for visualization and also allows users to display the source laboratory tests, including subject, LOINC code, FHIR resource id, effective time, and the corresponding HPO term. The application was written in the Java language with the Spring framework. The application implements the LOINC to HPO conversion algorithm described in Supplementary Fig. 1. The application is deposited at GitHub and can be accessed at https://github.com/OCTRI/poc-hpo-on-fhir.

Command-line application for gathering FHIR server statistics

We created a command-line application that finds all laboratory tests for a patient on a FHIR server and attempts to convert them to HPO terms. The conversion results, both successes and failures, are stored in a relational database to aid in translational research. We ran the application on seven common FHIR sandboxes and gathered statistics about the LOINC codes encountered, the rate of success in conversion, and the underlying causes of failure. The application was written in the Java language with the Spring framework. Source code, results, and a backup of the database can be accessed at https://github.com/OCTRI/f2hstats.

Analysis of UNC data on patients with asthma or an asthma-like condition

For the purposes of demonstrating the potential utility of our library, we examined a de-identified EHR dataset extracted from the Carolina Data Warehouse for Health (CDWH) at the UNC. The data were accessed under a fully executed Data Use Agreement between The Jackson Laboratory and UNC. The CDWH is UNC Health Care System’s (UNCHCS) enterprise data warehouse, and contains EHR data for all UNCHCS patients from 2004 through 2016. The sample used for this investigation contains 15,681 patients with one or more encounters at UNCHCS with an asthma or asthma-like diagnosis (Supplementary Table 2). The data were exported from the UNC EHR system as eight separate comma-separated value (CSV) files containing clinical observations in a variety of data domains, including demographics, encounter details, diagnoses, procedures, medications, vital signs, and LOINC-coded lab results. Prior to transmission from UNC, the dataset was de-identified according to the Safe Harbor method of the Health Insurance Portability and Accountability Act (HIPAA), and all dates were shifted ±50 days. The project methods and use of the de-identified EHR-derived dataset were reviewed by The Jackson Laboratory Institutional Review Board and confirmed to be compliant with relevant guidelines and regulations and approved for data access on 19 December 2017.

Using the extracted laboratory data, we converted each LOINC-coded test result into an HPO term. We note, however, that not every laboratory test result was captured in the available dataset. For each patient, we combined test records mapped to the same HPO term and recorded the count of observations for each HPO term. Then, we inferred additional phenotypic abnormalities based on the hierarchical structure of the HPO, i.e., if a patient was assigned an HPO term, we inferred that the patient also had the phenotypic abnormalities encoded by its parent and other ancestor terms (Supplementary Fig. 2). We reasoned that an isolated abnormal measurement might represent an artifact or might not be typical of the clinical course of the patient, and therefore used a threshold of three observations over the entire observation period in order to classify a patient with the corresponding HPO-encoded phenotypic abnormality. We classified a patient as not having an HPO-encoded phenotypic abnormality only when the patient had never been assigned the HPO term in question. Patient age was calculated by subtracting the birth date from the date of the last hospital visit and is subject to an inaccuracy of ±50 days due to the de-identification procedure (see above). Patients who rarely visited hospitals were less likely to receive laboratory tests and thus had fewer phenotypes, so we excluded those who had medical encounters on <10 days. Patients who received >3 prednisone prescriptions were considered frequent users.
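
The per-patient classification rule described above can be sketched in Python as follows. The function name `classify` and the three-valued return convention (1, 0, or `None`) are assumptions made for illustration, as is the reading of the threshold as "at least three observations"; the authors' own analysis code is in the repository cited under Statistics.

```python
from collections import Counter


def classify(observed_terms, candidate_terms, min_observations=3):
    """Classify one patient for each candidate HPO term.

    observed_terms: abnormal HPO term ids, one entry per converted lab result
    (with ancestors already added if hierarchical inference is wanted).
    Returns {term: 1} if seen at least `min_observations` times,
    {term: 0} if never seen, and {term: None} if seen too rarely to decide.
    """
    counts = Counter(observed_terms)
    status = {}
    for term in candidate_terms:
        n = counts[term]
        if n >= min_observations:
            status[term] = 1
        elif n == 0:
            status[term] = 0
        else:
            status[term] = None  # isolated findings are left unclassified
    return status
```

Patients left as `None` for a term would simply be omitted from the regression for that term under this reading.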

Statistics

We applied a logistic regression model to determine the weights of being a frequent prednisone user (values 0 or 1) and having an acute asthma diagnosis (values 0 or 1) in determining whether a patient had an HPO-encoded phenotype (values 0 or 1). We excluded from analysis HPO terms for which the great majority (95%) of the cohort had the same value (all 0 or all 1). The natural exponentials of the weights and of the weights ± 1.96 standard errors were taken as the odds ratio and 95% confidence interval for each variable.
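
The conversion from a fitted logistic regression weight to an odds ratio with a confidence interval is a short calculation. The Python sketch below is illustrative: the function name `odds_ratio_ci` is hypothetical, and the default multiplier of 1.96 is the standard normal quantile for a two-sided 95% interval.

```python
import math


def odds_ratio_ci(beta, std_err, z=1.96):
    """Convert a logistic regression weight and its standard error into
    (odds ratio, lower bound, upper bound) of the confidence interval."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * std_err)
    upper = math.exp(beta + z * std_err)
    return odds_ratio, lower, upper
```

A weight of 0 corresponds to an odds ratio of 1 (no association), and an interval that excludes 1 indicates a significant association at the chosen level.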

Data cleaning, normalization, wrangling, and table joining were conducted with a combination of the “tidyverse” and “RSQLite” packages in R, SQLite, and Java. Logistic regression was conducted with the “glm” function in R. All source code is deposited at GitHub and can be accessed through https://github.com/TheJacksonLaboratory/HUSHDataAnalysis.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The patient EHR dataset can be acquired from UNC under a Data Use Agreement.

Code availability

Computer code used in this study is openly accessible with the links provided in the Methods section.

Acknowledgements

We acknowledge colleagues from the Monarch Initiative for comments on this project. Research reported in this work was supported by the National Institutes of Health’s National Center for Advancing Translational Sciences, Grant Number U24TR00230, the Biomedical Data Translator program (awards OT3TR002019 and OT3TR002020), and the Clinical and Translational Science program (awards UL1TR002489 and UL1TR002369). The project also received support from the Intramural Research Program within the National Library of Medicine, National Institutes of Health and the National Human Genome Research Institute, National Institutes of Health (award NR24OD011883). This work was also supported by the U.S. National Library of Medicine contract HHSN276201400008C. Dr. Feinstein was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under Award Number K23HD091295. Dr. Hunter was supported by National Institutes of Health R01LM008111. Tiffany Callahan was supported by Colorado Biomedical Informatics Training Program T15LM009451. Dr. Peden was supported by EPA Cooperative Agreement CR 83578501. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, nor should any endorsements be inferred by NIH or the U.S. Government. This material contains content from LOINC® (http://loinc.org), which is copyright © 1995-2018, Regenstrief Institute, Inc. and the Logical Observation Identifiers Names and Codes (LOINC) Committee and is available at no cost under the license at http://loinc.org/license.

Author information

Authors and Affiliations

The Jackson Laboratory for Genomic Medicine, Farmington CT, 06032, USA
Xingmin Aaron Zhang, Leigh C. Carmody, Daniel Danis, Vida Ravanmehr & Peter N. Robinson
Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR, 97239, USA
Amy Yates, Nicole Vasilevsky, J. P. Gourdine, Justin Ramsdill & Melissa A. Haendel
Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, 97239, USA
Nicole Vasilevsky & Melissa A. Haendel
Library, Oregon Health and Science University, Portland, OR, 97239, USA
J. P. Gourdine
Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
Tiffany J. Callahan, Adrianne L. Stefanski & Lawrence E. Hunter
Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Marcin P. Joachimiak & Christopher J. Mungall
North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
Emily R. Pfaff, James Champion, Kimberly Robasky & David B. Peden
Genetics Department, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
Kimberly Robasky
School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
Kimberly Robasky
Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
Hao Xu & Karamarie Fecho
Genomic Medicine Institute, Geisinger Health System, Danville, PA, 17822, USA
Nephi A. Walton
Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD, 21202, USA
Richard L. Zhu & Christopher G. Chute
Charité Centrum für Therapieforschung, Charité - Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, 10117, Germany
Sebastian Köhler
Einstein Center Digital Future, Berlin, 10117, Germany
Sebastian Köhler
Linus Pauling Institute and Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR, 97331, USA
Melissa A. Haendel
Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Clement J. McDonald
Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
Daniel J. Vreeman
Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, IN, 46202, USA
Daniel J. Vreeman
Division of Allergy, Immunology and Rheumatology, Department of Pediatrics, University of North Carolina, Chapel Hill, NC, 27599, USA
David B. Peden
University of North Carolina Center for Environmental Medicine, Asthma and Lung Biology, University of North Carolina, Chapel Hill, NC, 27599, USA
David B. Peden
Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Tellen D. Bennett & Blake Martin
Adult and Child Consortium for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine, Aurora, CO, 80045, USA
James A. Feinstein
Institute for Systems Genomics, University of Connecticut, Farmington, CT, 06032, USA
Peter N. Robinson

Contributions

X.A.Z.: software engineering, curation, data analysis, and interpretation; A.Y.: software engineering; N.V., J.P.G., L.C.C., T.J.C., P.N.R.: data curation; D.D.: software engineering; M.P.J., V.R., K.R.: data analysis; E.R.P., J.C., K.F.: data collection and interpretation; D.B.P., H.X., R.Z., J.R., N.A.W., S.K., C.M., D.V., C.G.C., L.H., C.J.M., M.A.H.: data interpretation; T.D.B., J.A.F., B.M., A.L.S.: manual verification of curated annotations; X.A.Z., P.N.R.: designed study, wrote manuscript.

Corresponding author

Correspondence to Peter N. Robinson.

Ethics declarations

Competing interests

D.J.V. is the President of Blue Sky Premise, LLC and participates in the development, maintenance, and distribution of LOINC. The remaining authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhang, X.A., Yates, A., Vasilevsky, N. et al. Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery. npj Digit. Med. 2, 32 (2019). https://doi.org/10.1038/s41746-019-0110-4

Received: 26 November 2018
Accepted: 18 April 2019
Published: 02 May 2019
DOI: https://doi.org/10.1038/s41746-019-0110-4
