Machine learning for the development of diagnostic models of decompensated heart failure or exacerbation of chronic obstructive pulmonary disease

Heart failure (HF) and chronic obstructive pulmonary disease (COPD) are two chronic diseases with the greatest adverse impact on the general population, and early detection of their decompensation is an important objective. However, very few diagnostic models have achieved adequate diagnostic performance. The aim of this trial was to develop diagnostic models of decompensated heart failure or COPD exacerbation with machine learning techniques based on physiological parameters. A total of 135 patients hospitalized for decompensated heart failure and/or COPD exacerbation were recruited. Each patient underwent three evaluations: one in the decompensated phase (during hospital admission) and two more consecutively in the compensated phase (at home, 30 days after discharge). In each evaluation, heart rate (HR) and oxygen saturation (Ox) were recorded continuously (with a pulse oximeter) during a period of walking for 6 min, followed by a recovery period of 4 min. To develop the diagnostic models, predictive characteristics related to HR and Ox were initially selected through classification algorithms. Potential predictors included age, sex and baseline disease (heart failure or COPD). Next, diagnostic classification models (compensated vs. decompensated phase) were developed through different machine learning techniques. The diagnostic performance of the developed models was evaluated according to sensitivity (S), specificity (E) and accuracy (A). Data from 22 patients with decompensated heart failure, 25 with COPD exacerbation and 13 with both decompensated pathologies were included in the analyses. Of the 96 characteristics of HR and Ox initially evaluated, 19 were selected. Age, sex and baseline disease did not provide greater discriminative power to the models. The techniques with S and E values above 80% were the logistic regression (S: 80.83%; E: 86.25%; A: 83.61%) and support vector machine (S: 81.67%; E: 85%; A: 82.78%) techniques. The diagnostic models developed achieved good diagnostic performance for decompensated HF or COPD exacerbation. To our knowledge, this study is the first to report diagnostic models of decompensation potentially applicable to both COPD and HF patients. However, these results are preliminary and warrant further investigation to be confirmed.


Sample.
The criteria for admission to this study and the recruitment process have been previously reported 19 .
Patients older than 55 years who were able to walk at least 30 m, with a main diagnosis of decompensated HF and/or exacerbation of COPD and hospitalized in the Department of Internal Medicine, Cardiology or Pneumology were included.Participants with a pacemaker or intracardiac device, domiciliary oxygen therapy users prior to admission and patients with HF functional class IV of the New York Heart Association (NYHA) classification were excluded 29 .
Four hospitals participated: two tertiary university hospitals (600-900 hospital beds) and two regional secondary care hospitals (150-400 hospital beds) in the provinces of Barcelona and Madrid.
Each center had a trained interviewer, and each department had a referring physician who was accessible to the interviewer.Each day, the interviewer contacted the referring physician to review the hospitalization census and identify patients with the diagnosis of interest.Next, the interviewer confirmed the main diagnosis (decompensated HF and/or exacerbation of COPD) with the physician responsible for the patient and then contacted the participant (the same day or the next day) to obtain informed consent and verify compliance with all admission criteria of this study.The sample was obtained through convenience sampling, and all patients were enrolled consecutively as they were identified.
The recruitment and follow-up periods lasted 18 months starting in November 2010.
Evaluation of the participants.Each patient underwent three identical evaluations: the first in the hospitalization unit (V1) and the other two consecutively and at least 24 h apart in the participant's home 30 days after hospital discharge (V2 and V3).Thus, each participant underwent one evaluation in the decompensated phase (V1) and two in the compensated phase (V2, V3) of their disease.The evaluation protocol 19 included documentation of symptoms (dyspnea according to the NYHA 29 and Modified Medical Research Council (mMRC) 30 scales) and physiological parameters (HR and Ox) in two consecutive periods: effort (walking at a normal pace and on flat terrain for a maximum of 6 min) and recovery (seated for 4 min after the end of the effort period).HR and Ox were considered time series with a sample frequency of 1 Hz and were collected throughout the evaluation with a pulse oximeter (Model 3100, brand Nonin® Medical, Inc., Plymouth, MN, USA) placed on the left index finger.
Reference standard diagnostic test.Given the absence of a single standard diagnostic test to verify whether a patient was in the compensated or decompensated phase of their disease, the clinical judgment of the participant's responsible physician was considered a standard diagnostic test.Thus, in the decompensated phase, the diagnosis of decompensated HF and/or COPD exacerbation corresponded to the confirmed diagnosis from the participant's attending physician (in cases of diagnostic doubt, the patient was excluded).For the compensated phase, a standard diagnosis of compensated HF and/or stable COPD was confirmed by a study physician through telephone contact with the participant 30 days after hospital discharge.During this telephone interaction, the patient was considered to be in the compensated phase if none of the following events had occurred since hospital discharge: increased cough, sputum or dyspnea; initiation of or an increase in corticosteroid use; and initiation of antibiotic treatment or medical consultation for worsening of the clinical situation from any cause.In cases of doubt or if the compensated phase could not be confirmed, successive telephone contacts were made until the phase could be confirmed.The interviewer scheduled home visits for the respective evaluations (V2, V3) only after confirmation and within 24-48 h of receiving confirmation.

Index test: diagnostic algorithms. Initial preparation of potentially predictive variables or characteristics.
Given the objective of this study (development of an "online" algorithm capable of detecting the onset of an exacerbation from HR and Ox data), various characteristics of each of the evaluations were extracted (V1, V2, V3).For this purpose, the effort phase (walking) and recovery phase of each evaluation were separated by verifying the times recorded manually in the data collection records at the beginning and end of each phase of the test and visually reviewing the signals to confirm the manual records.Once the signals were separated according to the evaluation phase, the corresponding characteristics of the available measures were extracted.
Numerous characteristics were extracted from the signals.During each of the tests, two different phases were considered: effort and recovery, which were treated separately.From each of the phases, three signals were considered: HR, Ox and the normalized difference between these variables.From each of these three temporal signals, the characteristics of the temporal (the mean, standard deviation, and range) and frequency domains (the characteristics of the first and second harmonics, the distribution of the harmonics [kurtosis and skewness], the sum of all harmonics and the six first indices of the principal component analysis [PCA] for the normalized fast Fourier transform [FFT] of the signal) were extracted.Accordingly, 16 characteristics were obtained from each phase (effort and recovery) of each signal (HR, Ox, and the normalized difference between these), resulting in a total of 96 characteristics for each evaluation.The normalized difference between Ox and HR was defined using the sklearn standardscaler function (the mathematical formula is available at https:// scikit-learn.org/ stable/ modul es/ gener ated/ sklea rn.prepr ocess ing.Stand ardSc aler.html), and PCA was applied to the HR and Ox time series using the sklearn.decomposition.PCA function (formula available at https:// scikit-learn.org/ stable/ modul es/ gener ated/ sklea rn.decom posit ion.PCA.html).Regarding the selection of the first 6 components of the PCA, this decision was made based on the researchers' criteria, considering that typically in this type of analysis, the first 3 to 6 components are considered.
Labeling and definition of events to be detected.Given that the main objective of this study was the detection of a transition from a state considered normal or stable (HF or COPD in the compensated phase [V2, V3]) to a state of decompensation or exacerbation (decompensated phase [V1]), a methodological scheme was applied based on calculation of the differences between the evaluations of each available characteristic.Thus, if a patient had three evaluations (V1, V2 and V3), six differences or useful comparative signals were obtained from these evaluations (V1-V2, V1-V3, V2-V1, V2-V3, V3-V1, V3-V2).The label of each of these comparative signals is illustrated in Fig. 1.
Although the differences V1-V2 and V1-V3 might be more appropriately considered "decompensation recovery" rather than "no decompensation", we decided to discard a third label category ("decompensation recovery") due to the small sample size and because the main objective of the trial was the detection of decompensation.

Selection of predictor variables or characteristics.
In a first approximation, potential predictive characteristics were selected using the random forest 31 , gradient boosting classifier 31 and light gradient-boosting machine (LGBM) 32 classification algorithms, which integrate the functions of characteristic selection by importance within the decision.We selected the top 10 features based on their importance ranking within the structure of each classifier model.
Figure 2 shows an outline of the process for preparation and selection of the characteristics of the signals.
Vol:.( 1234567890) www.nature.com/scientificreports/During the process of selecting characteristics, all those that were redundant or had very low variabilities were discarded.In this study, by definition, we did not have variables with perfect separation that could cause overestimation of the diagnostic capacity of the models (overfitting) 26 .
In addition to the characteristics selected from the HR and Ox signals, the age, sex and baseline disease (HF or COPD) of the patients were considered potential predictors.

Development and validation of algorithms.
For the development of the algorithms, the ML techniques most used in the studies of classification models were considered: (i) decision trees, (ii) random forest, (iii) k-nearest neighbor (KNN), (iv) support vector machine (SVM), (v) logistic regression, (vi) naive Bayes classifier, (vii) gradient-boosting classifier and (viii) LGBM.
For each of these techniques, hyperparameters were selected based on a brute force scheme using all available data through a cross-validation scheme (K-fold cross-validation, k = 5).A normalization process based on the medians and interquartile ranges (IQRs) was applied to all characteristics 31 .
Once the best parameters of each technique were identified, internal validation was performed with a leaveone-patient-out method.Thus, a new model was calculated for each patient by replacing the model's data from the training and validation sets with the patient's data.Figure 3 shows an outline of the training and validation process.
The observation units (inputs) on which the algorithms were applied were the differences between two different evaluations, as illustrated in Fig. 1.Thus, the algorithms classified the evaluated difference as a state of "no decompensation" (label = 0) or "a change to decompensation" (label = 1).Therefore, the following parameters were defined: • True positive (TP) "a change to decompensation" as the classification result for a V3-V1 or V2-V1 compari- son.
• True negative (TN) "no decompensation" as the classification result for a V1-V2, V1-V3, V2-V3 or V3-V2 comparison.The parameters used to evaluate the diagnostic performance of the algorithms were S, E and accuracy (A).Each patient could have up to six observation units or inputs; therefore, up to six classification results were obtained, which were then defined as TP, TN, FP or FN.Then, the S, E and A were obtained for each patient.The final S, E and A of the entire sample were calculated from the mean of the parameters obtained from each patient.
The predictive values were not considered because the proportions of evaluations in the decompensated phase (33% [V1]) and compensated phase (66% [V2, V3]) did not correspond to the usual proportion found in clinical practice (the vast majority of patients in the community are usually in the compensated phase).
Missing data, excluded data and indeterminate results.Missing data were not included in the analysis, but patients with missing data were not excluded (all available patient data were included in the analysis).
No imputation of the missing data was performed.
During the process of signal review and verification of the start and end times of each evaluation from the manual records, missing sections of HR and/or Ox data due to poor contact between the skin and the sensor were observed.This incidence caused the introduction of some filters to be applied to exclude these missing sections from the analysis.Thus, an evaluation was excluded if it had a loss rate (missing measures divided by the total number of measures) greater than 10% in any phase.In addition, evaluations performed at home (V2, V3) that did not reveal an improvement in the sensation of dyspnea for the patient (of at least one point according to the mMRC scale 30 ) with respect to the decompensated phase evaluation (V1) were also excluded to ensure that home assessments were performed in the "compensated phase".
No indeterminate results were noted in the index test (algorithms); in all cases, the model produced a "no decompensation" or "a change to decompensation" result.On the other hand, all evaluations were always performed after a definitive result of the standard diagnostic reference test: clinical diagnosis of the decompensated phase by the doctor responsible for the patient in the hospital evaluation (V1) and clinical diagnosis of the compensated phase by the doctor who contacted the patients by phone before home evaluations (V2, V3).Thus, the algorithms were developed and applied on evaluations clearly labeled as the compensated or decompensated phase by the reference diagnostic test.

Diagnostic algorithms.
The diagnostic performance of the algorithms developed according to the technique used is shown in Table 3.The techniques with S and E values above 80% were logistic regression and SVM.

Discussion
Main results.The present study reported diagnostic models that achieved a good detection capacity for exacerbation of COPD or HF decompensation (S and E greater than 80%).Although the S and E were slightly lower than those of models in two other studies (Vamos et al. 15 for HF and Wu et al. 17 for COPD), we highlighted that the models in our study, unlike these previous models, do not require complex devices such as intradomiciliary sensors or cardiac defibrillators for their implementation in clinical practice.A study that is potentially www.nature.com/scientificreports/more comparable to ours in terms of the technology used and the method developed for the algorithms is that of Stehlik et al. 16 .The study reported HF decompensation detection models developed through ML from the monitoring of physiological parameters of 100 patients collected through a cutaneous patch at the thoracic level.3. Diagnostic capacity of the algorithms developed according to the technique used.*These parameters were obtained from the mean of all patients (since not all had the same number of evaluations, the mean does not necessarily correspond to that obtained from the total true positive, false negative, true negative and false positive data available in the entire sample).www.nature.com/scientificreports/ The models developed obtained an S of 76 to 88% and an E of 85%, values similar to those of the models in our study.Recently, Morrill et al. 33 reported diagnostic models of decompensated HF developed with ML techniques with an S of 100% and E of 73% but based on simulated clinical situations and not real patients.Another important result was that the underlying disease (COPD or HF) did not influence the development or diagnostic performance of the models; thus, to our knowledge, this is the first study to report diagnostic models of decompensation potentially applicable to patients affected by COPD and by HF, which may be relevant given the increasing proportion of patients affected by both pathologies.However, our study can only be considered preliminary at this point because the trial size was modest and the design was not robust enough to confirm that this result is generalizable.Therefore, this result warrants further investigation.As a hypothesis, we propose the coexistence of pathophysiological mechanisms in the decompensation of both diseases, with HR, Ox and their relationship serving as parameters that could represent a relevant common denominator for decompensation of both pathologies.Ox has already shown considerable utility in the detection of acute HF in previous studies 34 and has been considered the physiological parameter with the greatest discriminative power in COPD 10,35 .In addition, the cutoff point of Ox for the detection of acute HF does not seem to be modified in patients who also have COPD 34 .Our study also proposes HR and its relationship with Ox as parameters of interest in the pathophysiological mechanisms related to decompensation of both diseases because although most of the characteristics chosen for the development of the models (eight of 19) were only related to Ox, four were exclusively related to HR, and the rest (seven of 19) were related to the combination of the two parameters (HR-Ox).In any case, further research should explore this hypothesis.
Validity.With the methodological approach considered, we believe that none of the selected characteristics or the other potentially predictive variables evaluated were associated with a possible phenomenon of "information leakage" from the outcome variable (compensated or decompensated phase) to the predictor variables ("outcome leakage") 26 .However, we must recognize a possible "validation leakage" 26 because we could not use a completely independent sample for validation of the diagnostic models developed (the sample size we had led us to prioritize the development of the models with the maximum available sample), and we must recognize the possibility of some overestimation in the diagnostic performance obtained.
We began this study with a cohort of patients in the decompensated phase.This design allowed us to have sufficient observations for both categories of the outcome variable and to develop and evaluate the models obtained (if we had started with a cohort of stable patients, only a small proportion would have presented with decompensation).In addition, the design allowed each patient to act as his or her own control.Although choosing hospitalization as a reference for the decompensated phase was not ideal because the ultimate goal of these algorithms was to detect clinical decompensation in an earlier phase, the evaluation during hospitalization was performed once the patients were clinically stable and were able to walk at least 30 m, so the hospital evaluation (V1) was actually carried out once the most acute phase of decompensation had passed.In the same way, we did not take admission parameters into account because that moment represents the most severe phase of exacerbation, and the aim of this study was to develop an algorithm based on parameters as similar as possible to an earlier phase of exacerbation.
The time interval between the standard diagnostic method (confirmation of the compensated or decompensated phase by a doctor) and data collection for the development of the diagnostic models was quite short (24-48 h); therefore, we do not believe that considerable changes in the clinical states of the patients occurred between these events to influence the results for the diagnostic performance of the models developed.
In terms of the extrapolation of our results to other populations, the models were designed to detect the most severe exacerbations of the disease (those that prompt hospital admission) and not milder exacerbations (such as those requiring only outpatient management).The inclusion of centers with different levels of complexity in two different geographical areas allowed us to include a sample of patients representing a large part of the clinical spectrum of both pathologies.

Clinical implications. Pending external validation and demonstration of their efficacy in routine clinical
practice, the models developed in this study are designed for implementation in minimally invasive or nondisruptive devices for routine, continuous out-of-hospital monitoring of certain patients.Although the data in this study were collected from a pulse oximeter, various commonly used devices (for example, smartwatches) are capable of continuously monitoring the physiological parameters included in the diagnostic models developed.
Limitations.In addition to the limitations mentioned in the previous paragraphs, we must recognize the high proportion of valuations missing or excluded from the analysis, which may have adversely influenced the diagnostic performance achieved by the models developed.Thus, we must accept the possibility of a nonnegligible selection bias in the final sample available for analysis.This limitation, along with the modest sample size of our study, prevented us from delving deeper into certain aspects of the analysis and interpretation of the results.We were unable to perform analyses on specific subgroups (e.g., the subgroup of patients with both conditions in a decompensated state), identify phenotypes of the evaluated conditions, or compare the diagnostic performance of the models between different evaluations (e.g., V1-V2 vs. V1-V3).
We also emphasize that the conditions in which the assessments were performed were controlled (a specific protocol of walking and recovery was followed), and evaluations in more "real" conditions are still pending.Finally, although the high proportion of evaluations in the decompensated phase allowed us to enhance model development, this proportion was considerably higher than that in the real world (in usual conditions, most patients are in the compensated phase of their disease); therefore, a compensated/decompensated phase

Figure 1 .
Figure 1.Labeling and interpretation of comparative signals.

Figure 2 . 6 •
Figure 2. Process for preparation and selection of the characteristics of the evaluations.

Figure 4 .
Figure 4. Flow of the study participant selection process.

Figure 5 .
Figure 5. Ox and HR time series for a patient during each evaluation of the trial.The orange and blue lines represent SpO2 and HR, respectively.The HR and SpO2 values correspond to those displayed in the left and right columns, respectively.

Table 1 .
Baseline characteristics of the patients analyzed.

Table 2 .
Selected predictor characteristics or variables.HR heart rate, Ox oxygen saturation, PC principal component, FFT fast Fourier transform.