Heart failure clinical care analysis uncovers risk reduction opportunities for preserved ejection fraction subtype

Heart failure (HF) has no cure and, for HF with preserved ejection fraction (HFpEF), no life-extending treatments. Defining the clinical epidemiology of HF could facilitate earlier identification of high-risk individuals. We define the clinical epidemiology of HF subtypes (HFpEF and HF with reduced ejection fraction [HFrEF]), identified among 2.7 million individuals receiving routine clinical care. Differences in patterns and rates of accumulation of comorbidities, frequency of hospitalization, and use of specialty care were defined for each HF subtype. Among 28,156 HF cases, 8322 (30%) were HFpEF and 11,677 (42%) were HFrEF. HFpEF was the more prevalent subtype among older women. A total of 177 phenotypes were differentially associated with HFpEF versus HFrEF. HFrEF was more frequently associated with diagnoses related to ischemic cardiac injury, while HFpEF was associated more with non-cardiac comorbidities and HF symptoms. These comorbidity patterns were frequently present 3 years prior to a HFpEF diagnosis. HF subtypes demonstrated distinct patterns of clinical comorbidities and disease progression. For HFpEF, these comorbidities were often non-cardiac and manifested prior to the onset of a HF diagnosis. Recognizing these comorbidity patterns along the care continuum may present a window of opportunity to identify individuals at risk for developing incident HFpEF.

Identifying high-risk individuals is essential for effective prevention, and this requires a clear understanding of the evolving epidemiology of HF [8][9][10][11]. In particular, a clearer understanding of the spectrum and temporality of the clinical manifestations of early HF could better enable primary care providers to identify high-risk patients with sufficient lead time for early intervention to delay or prevent the onset of HF and its sequelae 12,13. Defining the relevant clinical epidemiology requires large, contemporary clinical cohorts, such as those who seek care at large medical centers [14][15][16][17][18][19][20].
We leveraged electronic health records (EHR) from ~ 2.7 million individuals seeking routine clinical care at Vanderbilt University Medical Center. We identified a large collection of HF cases and characterized the comorbidity profiles of HF subtypes prior to and at the time of their clinical diagnosis. We asked if there were distinct longitudinal patterns of comorbidity accumulation between the HF subtypes, as patients moved through the healthcare system, and if this could have implications for HF management.

Methods
All individuals were derived from Vanderbilt University Medical Center's Synthetic Derivative (SD) database, a research tool for conducting epidemiological studies using de-identified EHR data, especially suited to work with tools of machine learning and big data [21][22][23][24]. This resource comprises inpatient and outpatient clinical data from multiple sources including diagnostic and procedure codes (ICD-9 [International Classification of Disease, Ninth revision] and CPT [Current Procedural Terminology]), demographics, text from clinical notes, laboratory values, procedural reports (e.g., echocardiograms), and medications extracted from individual clinical records [23][24][25]. The study was reviewed and approved by Vanderbilt's Institutional Review Board and was determined to be non-human subjects research. These data contain no HIPAA or other personal identifiers.
Clinical phenotypes. Medication data were extracted using the validated MEDEX tool 26 . Keyword features were extracted from source documents (problem lists and clinical documents) and occurrences were excluded if a negation term (such as "not" or "ruled out") was within 100 characters of the keyword (Supplementary Tables 1 and 2) 27,28 . For each individual, sex, race (white, black or other), blood pressure, heart rate, body mass index (BMI), BNP values, and clinician visit dates and types (i.e., inpatient/outpatient, clinic type) were extracted from structured tables in the SD. Diagnoses of myocardial infarction (MI), coronary artery disease (CAD), hypertension, dyslipidemia, atrial fibrillation (AF), type 2 diabetes (T2D), and chronic kidney disease (CKD) were based on previously validated EHR algorithms (Supplementary Table 3) 29 .
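The negation rule for keyword features can be illustrated with a minimal sketch. It assumes that "within 100 characters" means a window on either side of the keyword and uses only the two example negation terms quoted above; the function name and the term list are hypothetical and are not taken from the study's code.

```python
import re

# Hypothetical negation terms: only the two examples quoted in the text.
NEGATION_TERMS = ("not", "ruled out")
WINDOW = 100  # characters on either side of the keyword (an assumption)


def keyword_mentions(text, keyword, negations=NEGATION_TERMS, window=WINDOW):
    """Return start offsets of keyword occurrences with no negation term nearby."""
    lowered = text.lower()
    pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
    kept = []
    for match in re.finditer(pattern, lowered):
        start = max(0, match.start() - window)
        end = min(len(lowered), match.end() + window)
        context = lowered[start:end]
        # Whole-word matching, so "noted" does not count as "not".
        negated = any(
            re.search(r"\b" + re.escape(term) + r"\b", context)
            for term in negations
        )
        if not negated:
            kept.append(match.start())
    return kept
```

A fixed character window is a deliberately blunt rule: it is easy to audit, but it can discard a true mention that happens to sit near an unrelated negation.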
Echocardiographic measures of cardiac structure and function. Measures, extracted from transthoracic echocardiography (TTE) reports, using previously described approaches 29,30 , included left ventricular wall thickness, left ventricular ejection fraction (LVEF), diastolic function, and cardiac chamber dimensions. For each HF case with one or more LVEF measurements, the following HF subtypes were defined: (1) HFpEF: All LVEF measurements ≥ 50%.
PheWAS phenotypes. Phenome-wide association study (PheWAS) codes are validated groupings of related ICD-9 billing codes that capture the extended range of clinical diagnoses within an EHR data set 31,32. A complete list of PheWAS codes and their mappings to ICD-9 codes can be found at https://www.vumc.org/cpm/cpm-blog/phewas-phenome-wide-association-studies. For a given PheWAS code, cases are individuals with 1 or more codes and controls are individuals with no related codes. After excluding phenotypes affecting a single sex or with ≤ 100 cases, 1224 clinical phenotypes were included in analyses.
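As a rough sketch of the case definition and the case-count filter described above, the following assumes a toy ICD-9-to-PheWAS mapping supplied as a dictionary. The function names and the example codes used with them are hypothetical, and the single-sex exclusion is not shown.

```python
from collections import defaultdict

MIN_CASES = 100  # phenotypes with <= 100 cases are dropped, per the text


def build_phenotype_cases(patient_codes, icd9_to_phecode):
    """Map each PheWAS code to the set of patients with 1 or more mapped ICD-9 codes."""
    cases = defaultdict(set)
    for patient_id, icd9_codes in patient_codes.items():
        for icd9 in icd9_codes:
            phecode = icd9_to_phecode.get(icd9)
            if phecode is not None:  # unmapped codes are ignored
                cases[phecode].add(patient_id)
    return dict(cases)


def filter_phenotypes(cases, min_cases=MIN_CASES):
    """Keep only phenotypes with more than min_cases cases."""
    return {code: ids for code, ids in cases.items() if len(ids) > min_cases}
```

Controls for a phenotype are then the individuals absent from its case set, subject to the "no related codes" rule, which this sketch does not model.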
Universal definition of heart failure. We examined how many of the cases used in the Validation set met the Bozkurt et al. 20 universal heart failure algorithm criteria based on the features available to the machine learning algorithm. In line with the guidelines, we tested for the presence of any of the following symptoms: pulmonary edema, pleural effusion, orthopnea, paroxysmal nocturnal dyspnea, nocturnal cough, lower extremity edema, dyspnea on exertion, jugular venous distension/elevated jugular venous pressure/elevated JVP, cardiomegaly, third heart sound, hepatomegaly, rales, ejection fraction < 50%, and an outpatient NT-proBNP ≥ 125 pg/ml. The proportion of individuals who met all of these criteria was determined.
Analysis. The random forest machine learning classifier, implemented in scikit-learn (v0.18.1) 33, was used to identify heart failure cases 34. The approach and implementation are detailed in the Supplementary Methods. In brief, the machine learning model was developed on a training set (n = 1091) and validated on a testing set (n = 468). Based on the final set of features selected, an independent testing set demonstrated that the model had a positive predictive value (PPV) of 0.92 and a case sensitivity and specificity of 0.67 and 0.99, respectively, as compared to manual clinical record review by a cardiologist (Supplementary Tables 4 and 5). This validated model was then deployed in the EHR population to identify heart failure cases. For cases with TTE measurements, heart failure subtypes were assigned based on LVEF (see above). While a manual record review by a cardiologist confirmed the heart failure diagnosis among all cases used in the Validation set, only 34.8% met the Universal Definition of HF (described above) based on features employed by the random forest classifier.

Descriptive statistics were generated for all HF cases as well as for each subtype (HFpEF, HFrEF, HFmEF) separately. Between-group differences were assessed using the Mann-Whitney U test or Chi-square test for continuous and dichotomous variables, respectively.
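For readers less familiar with the performance measures quoted for the classifier, the sketch below shows how PPV, sensitivity, and specificity follow from a comparison of predicted labels against adjudicated labels. It is a generic illustration with a hypothetical function name, not the study's evaluation code, and it assumes labels coded as 1 (HF case) and 0 (non-case).

```python
def classifier_metrics(y_true, y_pred):
    """Compute PPV, sensitivity, and specificity from binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "ppv": tp / (tp + fp),          # predicted cases that are true cases
        "sensitivity": tp / (tp + fn),  # true cases that were detected
        "specificity": tn / (tn + fp),  # non-cases correctly rejected
    }
```

A high PPV paired with a lower sensitivity, as reported above, describes a classifier tuned to avoid false cases at the cost of missing some true ones.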
We also conducted stratified analyses comparing HF cases identified before and after 2005, the approximate midpoint of the time range for which EHR data were available, to characterize secular trends in HF epidemiology. Subjects were classified by LVEF in 10-year age ranges, centered around decades for which LVEF data were present, and stratified by sex and date of HF diagnosis.
PheWAS analyses were used to comprehensively scan the medical phenome to identify clinical diagnoses differentially associated with HFrEF versus HFpEF, since these subtypes had the largest numbers of individuals. Because PheWAS phenotypes are highly correlated, we conducted a 2-step analysis to identify phenotypes independently associated with the HF subtypes. First, multivariable logistic regression analysis adjusting for age at HF diagnosis, sex, and self-reported race was used to test the association between each PheWAS code and the HF subtype (odds-ratios are the association with HFpEF, as compared to HFrEF). Phenotypes with a Benjamini-Hochberg (B-H) false discovery rate (FDR) 35 q-value < 0.1 were then jointly analyzed using a multivariable logistic model in conjunction with a stepwise selection feature (using Proc Logistic in SAS) that retained all phenotypes with an independent association p < 0.05. PheWAS analyses were stratified by 10-year age ranges.
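The first screening step relies on the Benjamini-Hochberg procedure. A minimal sketch of how B-H q-values can be computed from a list of p-values is given here; the function name is hypothetical, and the second step (stepwise selection in SAS) is not reproduced.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def select_by_fdr(p_values, threshold=0.1):
    """Return indices of tests with q-value below the threshold."""
    q_values = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(q_values) if q < threshold]
```

Phenotypes whose index is returned by `select_by_fdr` would be the ones carried forward to the joint model under the q < 0.1 rule.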
The ML algorithm also assigned a diagnosis date for each case. To better define the clinical events leading up to a HF diagnosis, we extracted ICD-9 based diagnoses during each of the three years prior to diagnosis of HF and mapped them to corresponding PheWAS phenotypes. Because ascertainment of diagnoses during this time may be incomplete in individuals referred for specialty care, analyses were limited to those individuals receiving medical care at VUMC prior to diagnosis of HF, defined as two or more primary care or cardiology outpatient visits over the three-year period. Descriptive statistics were generated for PheWAS diagnoses and medical encounters prior to HF diagnosis for both HFpEF and HFrEF subjects and were also stratified by 10-year age groups.
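The eligibility rule for the pre-diagnosis analyses can be sketched as follows. The visit representation (date, clinic type, setting), the clinic labels, and the approximation of three years as 3 × 365 days are all assumptions made for illustration.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 3 * 365  # approximate three-year window (assumption)
MIN_VISITS = 2           # "two or more" qualifying visits, per the text
ELIGIBLE_CLINICS = {"primary care", "cardiology"}  # hypothetical labels


def has_prior_care(diagnosis_date, visits,
                   lookback_days=LOOKBACK_DAYS, min_visits=MIN_VISITS):
    """Check for enough qualifying outpatient visits before the HF diagnosis.

    visits is an iterable of (visit_date, clinic_type, setting) tuples.
    """
    window_start = diagnosis_date - timedelta(days=lookback_days)
    count = sum(
        1
        for visit_date, clinic_type, setting in visits
        if window_start <= visit_date < diagnosis_date
        and setting == "outpatient"
        and clinic_type in ELIGIBLE_CLINICS
    )
    return count >= min_visits
```

Restricting to individuals who pass a check of this kind trades sample size for more complete ascertainment of antecedent diagnoses.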

Results
The ML classifier identified 28,156 HF cases (Fig. 1). Of these, 8322 (30%) were classified as HFpEF, 11,677 (42%) as HFrEF, and 1958 (7%) as HFmEF; 21% had no LVEF data. HFrEF cases were more likely to be male (66% vs. 42%) and to have coronary heart disease (78% vs. 63%), as compared to HFpEF cases (Table 1). Additional results on the ML model are presented in the Supplementary Methods. A significant portion of the comorbidities presented in Table 1 were diagnosed after the HF diagnosis; the fractions recorded before and after HF diagnosis are presented in Supplementary Table 6.

Figure 1.
Overview of case identification and algorithm development. Individual-level data were derived from an EHR comprising 2.7 million records. The machine learning algorithm was trained and tested on manually adjudicated sets of HF cases and non-cases. The final algorithm was deployed in an Implementation set to identify HF cases across the EHR. Performance measures for the HF classifier are shown for the Testing Set.

(Fig. 2). Among both sexes, the proportion of individuals with HFrEF decreased with age. Table 6A). The trend in increasing age and diagnoses of hypertension, T2D, dyslipidemia, and CKD was seen across all subclasses. The prevalence of CAD remained similar in all HF subclasses, and median BMI was above 28 and 30 for HFrEF and HFpEF, respectively. Among HFrEF subjects, those entering the medical system after 2005 were more likely to have a history of myocardial infarction (MI) (34% vs 30%; p = 1.96 × 10⁻⁸) and a slightly higher LVEF nadir (25% vs 20%; p = 1 × 10⁻¹⁵).

Phenome-wide comparisons identified clinical diagnoses differentially associated with HFpEF versus HFrEF cases across multiple age ranges. Among all age strata, there was a directionally consistent association between an ICD-9 based diagnosis of a heart failure subtype (HFpEF or HFrEF) and the assigned ML subtype (Fig. 3 Table 7).
Across all ages, most diagnoses prevalent among HFrEF cases were related to the circulatory system (Supplementary Fig. 1). HFrEF was consistently associated with ischemic heart disease (IHD) and MI across age groups and was associated with valve disease and sequelae of IHD in older age groups, including ventricular arrhythmias and ECG abnormalities such as left bundle branch block (Fig. 3 and Supplementary Fig. 2).
In contrast, diseases associated with HFpEF cases were more heterogeneous and reflected a higher burden of comorbidities across the clinical disease spectrum. Among these were diagnoses related to symptomatic HF, such as respiratory failure, volume overload and edema, as well as obesity and related complications. For example, HFpEF was associated with sleep apnea at all ages, a diagnosis of obesity after age 45, and non-alcoholic liver disease after age 55 (OR = 1.59 [1.08-2.32], p = 1.8 × 10⁻²). It was also associated with a diagnosis of hypertrophic obstructive cardiomyopathy among individuals ages 65-75 (OR = 6.37 [3.81-10.7], p = 1.9 × 10⁻¹²).
In the youngest age group, the patterns of comorbidities were suggestive of HF etiologies secondary to congenital heart disease, extra-cardiac etiologies (e.g., pulmonary hypertension), as well as metabolic risk factors (Fig. 3, Supplemental Fig. 3).
Figure 3. Summary of clinical phenotypes associated with HF subtypes, by age group. A forward selection logistic regression model adjusting for age, sex and race was used to identify clinical phenotypes independently associated with HFpEF, versus HFrEF. An odds-ratio (OR) < 1 indicates that the phenotype is more prevalent among individuals with HFrEF and an OR > 1 indicates that the phenotype is more prevalent among individuals with HFpEF.

Hospitalizations and cardiovascular evaluations became increasingly common in the years immediately prior to HF diagnosis (Fig. 4). In the year immediately preceding a HF diagnosis, approximately one quarter of both HFpEF and HFrEF subjects were hospitalized and over half were evaluated in outpatient cardiology clinics. During the three years prior to diagnosis, subjects were increasingly diagnosed with cardiovascular comorbidities including IHD, hypertension, dyslipidemia, diabetes, and coronary atherosclerosis as well as symptoms such as shortness of breath, edema, and chest pain (Supplementary Fig. 3).
Overall patterns were similar between HFpEF and HFrEF, though IHD was more common in HFrEF and hypertension more common in HFpEF. HFpEF subjects also accumulated comorbid diagnoses for a slightly longer duration before a HF diagnosis, as compared to subjects with HFrEF (~2 years vs ~1 year). We examined the timing of the first mention of diuretic use, as a marker of early symptomatic heart failure, in the 3 years prior to a HF diagnosis. For HFpEF and HFrEF, respectively, 98% and 99% of loop diuretics were first mentioned only within the 6 months prior to a diagnosis (Supplementary Fig. 7, Supplementary Table 8).

Discussion
This paper defined the comorbidity profiles and patterns of healthcare utilization for a large collection of HF cases receiving routine healthcare. There were clear differences in the profiles for the HF subtypes; overall, HFpEF-associated comorbidities were less likely to be cardiac diseases, as compared to HFrEF comorbidities (summarized in Fig. 4). Many of these comorbidities were present for up to 3 years before a diagnosis of HF was made. For HFpEF, these comorbidities were often presentations of symptomatic HF as well as complications often attributable to obesity. Recognition of these comorbidity patterns may present a window of opportunity to identify individuals at risk for developing incident HFpEF prior to fulminant disease.
Our rationale for pursuing these studies was motivated by the need to provide a fuller understanding of mechanisms contributing to HF morbidity, especially HFpEF, for which there are no proven life-extending therapies. The 2020 National Heart, Lung, and Blood Institute Working Group Summary on HFpEF research priorities has proposed potential molecular mechanisms of cardiac re-modelling in HFpEF attributable to comorbidities 6 . In particular, non-cardiac mechanisms underlying HF development need more research, and successful implementation of therapies targeting these mechanisms requires delineation of the presence and timing of these comorbidities in the context of the natural history of the disease 4 .
Figure 4. Intersection of the care continuum and the development of heart failure. Individuals accumulate comorbidities, for up to 3 years, as they progress through the hospital system receiving care, before a diagnosis of HF. This phenome-wide study demonstrates that non-cardiac antecedent comorbidities preferentially associate with the development of heart failure with preserved ejection fraction (HFpEF), as compared to heart failure with reduced EF (HFrEF). (c) The layered continuum of HFpEF. HFpEF is a complex, layered continuum of varied etiologies that, over time, converge at the physiological endpoint of a "stiff heart", as the most apparent clinical feature. The antecedent period may represent the preclinical and subclinical forms of HFpEF, where potential screening and therapeutic opportunities might exist.

The timing and nature of the comorbidities associated with HFpEF were indicative of a frequently insidious disease course characterized by an accumulation of diagnoses related to heart failure symptomology and volume overload such as edema, respiratory symptoms and "fluid overload". There were also diagnoses indicative of organ damage due to poorly controlled hypertension, such as hypertensive heart disease. Other associations were indicative of advanced disease, such as anemia 36, and chronic conditions that are downstream sequelae of chronic heart disease such as hearing loss 37. The comorbid association patterns among 30-45-year-olds also highlight mechanisms particular to the development of HF symptoms in this age group that are not classically associated with the heart failure syndrome. These include congenital heart defects and primary pulmonary hypertension, though the latter is often misclassified in EHR data sets and typically represents secondary pulmonary hypertension. Thus, this age group is likely enriched in HF subtypes not representative of the more common forms of HF seen in older adults.
HFpEF was also associated with a collection of diagnoses related to chronic obesity, including morbid obesity, sleep apnea and non-alcoholic liver cirrhosis, highlighting the significant contribution of obesity to adverse LV remodeling, including diastolic dysfunction 38. There was a temporality to these associations. For instance, sleep apnea was more common at a younger age, suggesting that it may be an important early clinical biomarker of future HFpEF risk. In contrast, older ages were associated with the sequelae of end-organ failure, including non-alcoholic liver disease and HFpEF. It has been demonstrated that obesity is associated with an increased natriuretic peptide receptor type C/type A ratio. It has been postulated that this altered ratio may cause breakdown and deficiency of natriuretic peptide, leading to impaired cardiac function and possible progression to HFpEF 4,6.
In contrast to HFpEF, HFrEF was associated with cardiovascular conditions that adversely impacted left ventricular systolic function. These associations included intrinsic (e.g., idiopathic) cardiomyopathies among individuals < 55 years old, structural heart disease (e.g., valve disease) at older ages, and ischemic heart disease across all age ranges. HFmEF patients in our study, comprising 10% of the HF cases, were a hybrid of HFpEF (female, HTN, DM2), and HFrEF (CAD burden) features, in agreement with prior observations 39 .
Our overall findings are corroborated by a recent study that examined the burden of 15 comorbidities among participants in the Atherosclerosis Risk In Communities (ARIC) study diagnosed with acute, decompensated heart failure 40 . In that study, HFpEF had a higher average burden of comorbidities, as compared to HFrEF. Importantly, a higher comorbidity burden was associated with higher mortality rates. Our study extends that by demonstrating that the higher comorbidity burden extends to a far broader spectrum of comorbidities. We also add to their findings by demonstrating that these comorbidities are antecedent in nature, are along the care continuum, and may carry a differential risk for developing incident HFpEF.
The majority of encounters occurring prior to a HF diagnosis were with non-cardiologist providers, especially for patients who went on to develop HFpEF. Approximately 50% of HFpEF patients received loop diuretics before their HF diagnosis, and most (98%) received them in the 6 months prior to HF diagnosis. Sensitizing primary providers that a loop diuretic requirement may suggest the onset of symptomatic heart failure could advance the lead time for a HF diagnosis and potentially enhance outcomes.
A large portion of research related to HFpEF has focused on adverse cardiac remodeling including diastolic dysfunction, cardiac hypertrophy and myocardial fibrosis 4 . Less research focuses on the multi-organ system processes that may play a role in the development of HFpEF. Our findings support the notion that HFpEF is a layered continuum characterized by premorbid states, such as obesity, that ultimately can lead to HFpEF (Fig. 4). This period of premorbid conditions may represent preclinical HFpEF, a timepoint where biomarkers like natriuretic peptide may be informative for early risk stratification. Most therapeutic trials recruit participants with fulminant HFpEF, a timepoint where refractory pathological changes have likely set in [3][4][5][6][7] . Better characterization of the premorbid state may identify timepoints where therapies may demonstrate greater benefit.
There are several novel aspects of the current study. The combination of a large data set, a broad representation of cardiac and non-cardiac phenotypes and large numbers of individuals across the full adult age spectrum provided an opportunity to identify important drivers of HF across the life course. In the younger age group, ischemic heart disease was a less common HF comorbidity, as compared to older ages. Cardiometabolic risk factors were common among this age group, and structural heart disease related to congenital heart defects was also more prevalent. Across all ages, HFpEF is a condition associated with a broad range of comorbidities, many of which are related to obesity. Prior to a HF diagnosis, most clinical encounters were with non-cardiologists for both HF subtypes, especially HFpEF. Importantly, the methods used here are easily portable to other EHR environments. Implementation of this approach across a broad range of EHR data sets will enable efficient, contemporaneous characterization of the evolving epidemiology of HF and will highlight important differences across diverse populations, which will directly enhance treatment and prevention.
There are limitations to this study. The VUMC EHR is an observational, single-site data set, which can be associated with differentially missing data elements and ascertainment biases. Thus, for instance, TTE measurements performed at outside institutions were not available for these analyses. VUMC is a tertiary care center and may represent a sicker cohort than the general population. The data elements available for analysis are influenced by clinical decision making and practice patterns, which can lead to systematic biases. There was no active follow-up of study subjects, which can lead to differential censoring of data.
Finally, these analyses used a random forest algorithm to identify cases. While the threshold to define HF cases had a PPV of 92%, up to 8% of cases may be misclassified. Importantly, the machine-learning algorithm heavily loaded on the frequency of instances that an individual was assigned billing codes for a heart failure diagnosis. Thus, the case definition is heavily based on the summative assessment of the clinical provider using clinical criteria relevant to the time period when the diagnosis was made. When cases used in the Validation set were evaluated using a Universal heart failure definition 20, only 35% had sufficient data available in an electronic format to confirm that they met the case definition. Thus, the validity of the case definition with respect to a contemporary epidemiological case definition of heart failure is not well-defined. To fully address the robustness of the approach, it will need to be systematically evaluated in other native clinical environments.

In summary, we characterized the clinical epidemiology of HF subtypes derived from a large clinical data set. HF subtypes demonstrated distinct patterns of clinical comorbidities and disease progression. Of direct clinical relevance, we demonstrate that, for HFpEF, these comorbidities are often not related to cardiac disease and manifest prior to the onset of a HF diagnosis. Awareness of these stereotypical patterns of clinical presentation will enhance early detection and prevention strategies, which are essential for HFpEF, a condition with no proven life-extending therapies.

Data availability
The aggregate data supporting the findings of this study are available from the corresponding author upon reasonable request; however, due to institutional data use agreements, access to individual level data is limited.