Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery

Electronic Health Record (EHR) systems typically define laboratory test results using the Laboratory Observation Identifier Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resource (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHR to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.


Introduction
Electronic health records (EHRs) have been widely adopted in US hospitals since the Health Information Technology for Electronic and Clinical Health Act (HITECH) was passed in 2009, and offer an unprecedented opportunity to accelerate translational research because of advantages of scale and cost-efficiency as compared to traditional cohort-based studies 1 . In particular, EHRs contain rich phenotype information that can be utilized to stratify diseases and to develop hypotheses. For instance, phenome-wide association studies (PheWAS) can exploit EHR data to define case control cohorts for disease diagnoses or laboratory traits and then analyze associations with hundreds of thousands of genetic variants [2][3][4] . Despite the great potential of EHR data, patient phenotyping from EHRs is still challenging because the phenotype information is distributed in many EHR locations (laboratories, notes, problem lists, imaging data, etc.) and with EHRs having vastly different structures across sites. This lack of integration represents a substantial barrier to widespread use of EHR data in translational research.
Laboratory tests provide a critical resource for phenotype extraction. Deep phenotyping, i.e., comprehensive and precise phenotyping of individual disease manifestations, is an essential component of precision medicine and could potentially extend the reach of PheWAS studies 5,6 . Laboratory tests have broad applicability for translational research, but EHR-based research using laboratory data has been challenging because of the lack of standardization among different EHR systems. For instance, some tests measure nitrite level in urine using an automated machine, whereas others use a test strip. Some report the value in mg/dL whereas others report a qualitative value of positive/negative. If any of these tests were abnormal, the medical interpretation would be that nitrituria is present, yet current informatics frameworks do not easily support such inferences. Therefore, substantial challenges exist for standardization and integration of laboratory data for deep phenotyping and EHR-based translational research.
Recent advances in the standardization of EHR systems and phenotyping ontologies make it feasible to extract patient phenotypes from laboratory tests at a large scale. The Fast Healthcare Interoperability Resource (FHIR) was introduced in 2013 and provides a standardized interface to individual EHR systems for healthcare-related data 7 . FHIR separates healthcare-related data into granular components as "resources" such as observation, medication, patient identity and insurance claims, that have a standard definition and associated semantic bindings that can be computationally integrated even when they are created by different methods and organizations. Laboratory tests, encoded as observations in FHIR, are uniquely identified with Laboratory Observation Identifier Names and Codes (LOINC), which is a universal code system that defines various kinds of clinical laboratory tests and other measurements (~86,000 entries) 8 . The outcome of a FHIR observation can be represented by a term in the Human Phenotype Ontology (HPO), which is a logically defined vocabulary for describing human abnormal phenotypes 9 . The HPO has become the de facto standard for computational phenotype analysis in genomics and rare disease [9][10][11] . The HPO currently contains 13,608 terms including a comprehensive representation of laboratory abnormalities such as Hyperglycemia, Thrombocytopenia, and Increased urine alpha-ketoglutarate concentration. Here, we present a computational method that semantically harmonizes FHIR, LOINC, and HPO. The software rolls up LOINC terms for tests whose outcomes are medically comparable into common categories and interprets the outcome as HPO terms, thereby automatically extracting detailed, deep phenotypic profiles of laboratory results for downstream studies.

Overview of strategy
We present an approach to mapping the outcomes of laboratory tests as encoded in EHRs with LOINC terms for the tests and FHIR Observation resources representing the test results as HPO terms. A LOINC term by itself does not specify the outcome of a test. But if the outcome of a test (such as "high" or "low") and the nature of the test are known, we can then infer the phenotypic abnormality. For example, LOINC 32710-6 "Nitrite [Presence] in Urine" together with the outcome "positive" implies the phenotypic abnormality Nitrituria (HP:0031812). LOINC-coded laboratory tests can be grouped broadly into three categories, those with a quantitative outcome (Qn), an ordered categorical outcome (ordinal or Ord) and an unordered categorical outcome (nominal, or Nom). A quantitative test for an analyte has a normal range, and there are three types of mappings depending on the result of the test: L (lower than normal), N (normal), and H (higher than normal). Take, for instance, a test for the concentration of potassium in the blood (LOINC:6298-4, Fig. 1A). If the result is high, our procedure infers the corresponding HPO term for Hyperkalemia (HP:0002153). Analogously, a low result is mapped to Hypokalemia (HP:0002900). The HPO is an ontology of abnormal phenotypes, and thus there is no term that specifically represents a normal test result. However, computational analysis can record negated HPO terms, and the normal test result is represented as NOT Abnormality of potassium homeostasis (HP:0011042).
Ordinal tests can have a series of ordered outcomes. The majority of the ordinal LOINC tests were mapped to two possible outcomes, POS (positive) or NEG (negative). For instance, the result of the test Nitrite in urine by test strip can be positive (present) or negative (absent) (Fig.  1A). If present, then our approach infers the HPO term Nitrituria (HP:0031812); if absent, our approach infers NOT Nitrituria (HP:0031812). Representative examples of LOINC to HPO mapping. Potassium in blood, a quantitative (Qn) LOINC test, has three potential outcomes, L (lower than normal), N (normal) and H (higher than normal) and is mapped to three corresponding HPO terms. Presence of nitrite in urine, an ordinal (Ord) test, has two possible outcomes, POS (positive) or NEG (negative) and is mapped to either Nitrituria (HP:0031812) or NOT Nitrituria (HP:0031812), respectively. Color of urine, a nominal (Nom) test, has a list of likely outcomes (represented as Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) codes) and each one is mapped to an HPO term. B. Schematic representation of the relevant contents of a FHIR observation for laboratory tests. Each FHIR Observation resource for a LOINC-encoded laboratory test includes an identifier (id) and the name of the patient, the LOINC code and name, the normal reference range and the observed value as well as an interpretation of the result (see Table 1 for a complete list).

A LOINC to HPO mapping library
We have mapped 2421 LOINC terms to HPO terms. 77.8% of the mapped LOINC tests are Qn, 21.6% Ord and 0.6% Nom ( Fig. 2A). Taken together, these LOINC terms mapped to a total of that were mapped to an individual HPO term. In 56.4% of the cases, two or more LOINC terms are mapped to the same HPO term (mean = 8.5) (Fig. 2B), reflecting the fact that multiple laboratory tests (and associated LOINC terms) have outcomes that we consider to have an equivalent clinical interpretation and can therefore be mapped to the same HPO term.

Algorithm for converting LOINC-coded laboratory tests into HPO-coded phenotypes
We designed an algorithm that inspects elements of a FHIR resource for laboratory tests and converts the outcome into an HPO term. A standard FHIR resource for laboratory tests (a FHIR Observation) contains patient information, test identification, test result, normal reference ranges and interpretations (Fig. 1B). The algorithm compares the numerical result with the normal reference ranges to assign an interpretation code such as "L" or "POS" (Table 1) , or make use of the interpretation codes when they are present, to map the result to the corresponding HPO term ( fig. S1). Overall, the algorithm handles all three major types of LOINC-coded laboratory tests (Qn, Ord, and Nom) when combined with the LOINC to HPO annotation data.  Fig. 4A). The median tracking period of patients in this cohort is 3.1 years. The dataset contains ~54 million records of LOINC-coded clinical test results, medication prescriptions, diagnosis codes, procedure codes, patient information and other supporting records (Fig. 4B). Using our LOINC to HPO conversion algorithm, we successfully transformed 9.3 out of 11 million (83.1%) laboratory tests into HPO terms (Fig. 4C). For the entire cohort, on average, each HPO term was mapped from 1.8 distinct types of laboratory tests (Fig. 4D), indicating that the transformation successfully integrated distinctly coded laboratory tests that have the same clinical interpretation. The mapping procedure assigned an average of 594 laboratory test-derived HPO terms per individual patient, many of which were from the same laboratory tests performed at different visits. The tests corresponded to a mean of 53.5 unique HPO terms, of which 20.8 were abnormalities and the remainder were normal phenotypes (Fig. 4E). The hierarchical structure of the HPO allows inferences to be propagated up to parent terms and their ancestors 13 ; using this method, we inferred an additional 46.1 HPO terms (total 66.9) based on 20.8 abnormalities to each patient (Fig. S2).
As a proof-of-principle, we tested the ability of our procedure to identify phenotypic abnormalities associated with a diagnosis of asthma or with frequent prednisone use. About one third of the patients in this cohort had an ICD-9/10 diagnosis of asthma, and the remaining patients had ICD-9/10 codes reflecting other, potentially asthma-like, respiratory complaints. 14.2% of patients that had a diagnosis of asthma were administered or prescribed prednisone >3 times within a tracking period between 2004-2016; 8.5% of the remaining patients had been administered prednisone more than three times. Prednisone is a corticosteroid drug used for severe asthma treatment with multiple other indications 14 . We reasoned that both the diagnosis of asthma and the history of treatment with prednisone would likely be correlated with different but overlapping sets of laboratory abnormalities. Using logistic regression, we assessed the contribution of frequent prednisone prescription and the presence of acute asthma diagnosis to each phenotypic abnormality. Prednisone usage was significantly associated with an increased odds ratio for exhibiting many abnormal phenotypes that are consistent with the known effects of prednisone (Table 2), such as Hypoalbuminemia (HP:0003073) 15 , Neutrophilia (HP:0011897) 16 , Monocytosis (HP:0012311) 17 , Leukocytosis (HP:0001974) 17 , Hypokalemia (HP:0002900) 18 , and Elevated serum creatine phosphokinase (HP:0003236) 19 . An acute asthma diagnosis was significantly associated with five phenotypes, Increased red blood cell count (HP:0020059), Increased VLDL cholesterol concentration (HP:0003362) and Eosinophilia (HP:0001880), and two ancestor terms of Eosinophilia, Abnormal eosinophil count (HP:0020064) and Abnormal eosinophil morphology (HP:0001879). Eosinophilia is a well established marker for acute allergic asthma 20 . Although there have been some conflicting results 21 , a number of studies have shown a positive correlation between increased total, high or low-high-density lipoprotein cholesterol, or triglycerides and asthma [22][23][24][25][26] . An increased red blood cell count is not a recognized biomarker of asthma, but could conceivably reflect a number of factors including hypoxemia (11.1% with an acute asthma diagnosis also had a chronic obstructive pulmonary disease diagnosis), or hemoconcentration resulting from acute dehydration during an asthma attack, but the nature of this retrospective study does not allow us to consult the full medical records to investigate this. Percentage of laboratory tests in our dataset that could be converted into HPO terms (the remaining unmapped tests did not have LOINC to HPO annotations). D. Number of LOINC terms mapped to a given HPO term in the UNC dataset. E. Distribution of patients by the number of unique HPO terms that are mapped to each patient.

Discussion
In this report, we present an approach to the semantic integration of laboratory tests and results in EHR data. Our approach connects a widely used system for denoting laboratory tests, LOINC, with a current standard for transmitting health care information, FHIR, and a computational resource for deep phenotyping that was previously used mainly in the context of rare disease research and diagnostics.  28 . The Electronic Medical Records and Genomics (eMerge) network develops EHR-derived phenotyping algorithms by combining diagnosis codes, procedure codes, medication, narratives and subsets of laboratory tests and iteratively refine them to identify control and disease cohorts for genome-and phenome-wide association studies 1,3,28-30 . Our method complements existing phenotyping algorithms because it extracts additional phenotypic information by systematically interrogating the vast amount of data in laboratory tests.
The analysis of UNC EHR data demonstrated the potential of combining deep phenotypes from our tool with EHR data for biomarker discovery. Our current mapping library allowed us to convert the majority of the laboratory tests into HPO terms and assign an average of 53.5 unique phenotypes to each patient. The statistical analysis identified phenotypic abnormalities that are associated with frequent prescriptions of prednisone and/or acute asthma diagnosis. The cohort used for this analysis is biased toward senior and female patients and may not be reflective of asthma patient distributions, but the fact that our analysis identified numerous abnormalities that are associated with either prednisone use or asthma suggests that our approach can be useful for the investigation of EHR data for laboratory-based biomarkers of diseases and conditions. We have demonstrated the utility of our approach on the UNC dataset using a simple logistic regression approach as a proof of principle; we envision that our mapping approach could be used together with a variety of statistical and algorithmic analysis strategies to address a variety of topics in EHR-based translational research, and we have therefore coded our foundational approach in a way that can easily be integrated into other statistical analysis pipelines.
Several other use cases for our approach are conceivable. Rule-based algorithms could be applied to infer HPO terms from the primary phenotypic abnormalities. For instance, the combination of Decreased hemoglobin concentration (HP:0020062) and Decreased mean corpuscular volume (HP:0025066) implies Microcytic anemia (HP:0001935). The HPO is widely used in rare disease diagnostics, but one bottleneck is that in many settings, HPO terms need to be entered manually into the analysis software. A recent study used text-mining to extract detailed patient phenotypes through natural language processing of clinical narratives in EHR, and used the resulting lists of HPO terms for genomic diagnostics 11 . Our tool could supplement such tools by providing a computational representation of laboratory findings. In principle, our tool could be used to support other tasks related to EHR data, including decision support and cohort recruitment. In the future, we anticipate that semantic integration of a wider range of EHR data will become the norm to support data-driven translational research and precision medicine.

Mapping LOINC terms to HPO terms
We performed manual biocuration to construct a mapping library from each potential outcome of a LOINC test to the corresponding HPO term (Fig. 1A). The test outcome is represented using a subset of FHIR codes (Table 1, primary code), such as 'lower than normal', 'normal', or 'higher than normal'. For quantitative tests that report a numeric measurement, we use FHIR interpretation code "L" and "H" to indicate lower or higher than normal, and "N" and "A" to indicate the result is normal or abnormal. For ordinal tests that have a binary outcome, i.e. present or absent of the test target, we use FHIR interpretation code "POS" to indicate present and "NEG" to indicate absent. Additionally, other interpretation codes defined by FHIR are first mapped to primary codes. For example, FHIR codes "LL" (critically low) and "<" (off scale low) are both mapped to "L" ( Table 1) .
The value for a map entry is an HPO term accompanied by a boolean value to indicate whether it should be negated. That is, while an abnormal test outcome is mapped to a particular HPO term, the normal outcome for that test is mapped to the negated form, since the HPO contains only terms for abnormal phenotypes. Fig. 1A shows three examples of mappings for Qn, Ord or Nom LOINC terms.

LOINC to HPO mapping file
The LOINC to HPO mapping file contains records of mapping from LOINC test outcomes to the corresponding HPO terms. The annotation data is serialized as a tab-separated value (TSV) file.
Each line records the LOINC code, test outcome, the mapped HPO term, and whether the mapped term should be negated. The annotation file is deposited at Github and can be accessed at https://w3id.org/loinc2hpo/annotations. An excerpt is shown in Supplemental Table 1.

HPO on FHIR
We created a SMART on FHIR application, HPO on FHIR, to query a FHIR-enabled EHR servers and return patient laboratory results with LOINC codes and their corresponding HPO terms. The web interface of the application aggregates identical HPO terms together for visualization and also allows users to display source laboratory tests including subject, LOINC code, FHIR resource id, effective time and the corresponding HPO term. The app was written in the Java language with the Spring framework. The app uses the LOINC to HPO conversion algorithm described in fig. S1. The app is deposited at Github and can be accessed at (https://github.com/OCTRI/poc-hpo-on-fhir).

Command-Line application for gathering FHIR server statistics
We created a command-line application that finds all laboratory tests for a patient on a FHIR server and attempts to convert them to HPO. The conversion results, both successes and failures, are stored in a relational database to aid in translational research. We ran the application on 7 common FHIR sandboxes and gathered statistics about the LOINCs encountered, the rate of success in conversion, and the underlying causes of failure. The application was written in the Java language with the Spring framework. Source code, results, and a backup of the database, can be accessed at (https://github.com/OCTRI/f2hstats). Using the extracted laboratory data, we converted each LOINC-coded test into an HPO term. We note, however, that not every laboratory test result was captured in the available dataset. For each patient, we combined test records mapped to the same HPO terms and recorded the counts of observations for each HPO term. Then we inferred additional phenotypic abnormalities based on the hierarchical structure of HPO, i.e. if a patient was assigned with an HPO term, we infer that the patient automatically had phenotypic abnormalities encoded by parent and other ancestor terms ( fig. S2). We reasoned that an isolated abnormal measurement might represent an artefact or might not be typical of the clinical course of the patient, and therefore used a threshold of 3 observations over the entire observation period in order to classify a patient with the corresponding HPO-encoded phenotypic abnormality. We classified a patient not having an HPO-encoded phenotypic abnormality only when the patient had never been assigned for such an HPO term. Patient age was calculated from the last hospital visit date subtracting the birth date and is subject to an inaccuracy of +/-50 days due to the deidentification procedure (see above). Patients that rarely visited hospitals were less likely to receive laboratory tests and thus had less phenotypes, so we excluded those that had medical encounters on less than 10 days. Patients received more than 3 prednisone prescriptions were considered frequent users.

Statistics
We applied logistic regression model to determine the weights of being frequent prednisone user (values 0 or 1) and having acute asthma diagnosis (values 0 or 1) in determining a patient having an HPO-encoded phenotype (values 0 or 1). We excluded HPO terms from analysis of which the majority (95%) of the cohort had universal values (all 0 or 1). The natural exponential of the weights ± 1.98 standard deviations were converted to the odd ratio and 95% confidence intervals for each variable.
Data cleaning, normalization, wrangling and table joining were conducted by a combination of "tidyverse", "RSQLite" packages in R, SQLite and Java. Logistic regression was conducted with the "glm" package in R. All source code is deposited at Github and can be accessed through https://github.com/TheJacksonLaboratory/HUSHDataAnalysis. Fig. S1. FHIR to HPO conversion algorithm. Fig. S2. Inference of phenotypic abnormalities with the hierarchy of HPO. Table S1. Excerpt of LOINC to HPO annotation file. Table S2. ICD codes used by UNC asthma dataset to identify asthma and asthma-like patients. Table S3. Odds ratio of phenotypes for frequent prednisone prescription and acute asthma diagnosis