Prediction of oxygen requirement in COVID-19 patients using dynamic change of inflammatory markers: CRP, hypertension, age, neutrophil and lymphocyte (CHANeL)

The objective of the study was to develop and validate a prediction model that identifies COVID-19 patients at risk of requiring oxygen support based on five parameters: C-reactive protein (CRP), hypertension, age, and neutrophil and lymphocyte counts (CHANeL). This retrospective cohort study included 221 consecutive COVID-19 patients, who were randomly assigned to a training set and a test set in a ratio of 1:1. Logistic regression, logistic LASSO regression, Random Forest, Support Vector Machine, and XGBoost analyses were performed based on age, hypertension status, serial CRP, and neutrophil and lymphocyte counts during the first 3 days of hospitalization. The ability of the model to predict oxygen requirement during hospitalization was tested. During hospitalization, 46 (41.8%) patients in the training set (n = 110) and 41 (36.9%) in the test set (n = 111) required supplementary oxygen support. The logistic LASSO regression model exhibited the highest AUC for the test set, with a sensitivity of 0.927 and a specificity of 0.814. An online risk calculator for oxygen requirement using CHANeL predictors was developed. “CHANeL” prediction models based on serial CRP, neutrophil, and lymphocyte counts during the first 3 days of hospitalization, along with age and hypertension status, provide a reliable estimate of the risk of supplementary oxygen requirement among patients hospitalized with COVID-19.

www.nature.com/scientificreports/ or mild disease, elderly patients and those with comorbidities (such as cardiovascular disease, diabetes mellitus, hypertension, chronic lung disease, cancer, and chronic kidney disease) are at an increased risk of death from respiratory failure and sepsis [1][2][3][4] . In the absence of effective and/or preventive treatments, the outcome for critically ill COVID-19 patients depends on the availability of supportive intensive medical care 5 . The rapid spread of COVID-19 as a global pandemic has brought extraordinary challenges to the healthcare system. When the healthcare system is overwhelmed by a massive influx of patients, mortality increases 6 . In the face of limited resources, it is critical to reliably identify COVID-19 patients who require close monitoring and intensive care, including supplementary oxygen and/or mechanical ventilation, while those patients with a good prognosis can be monitored at home or managed at a living and treatment center 7 . A prediction model that can identify patients at high risk of respiratory failure at an early stage would help to optimize the allocation of limited resources.
During the early stage of COVID-19 infection, immunologic responses differ between survivors and nonsurvivors 2 . Since clinical and laboratory parameters (especially inflammatory markers) are subject to dynamic change, trends (i.e., time-series measurements) might better capture the onset of a potentially lethal hyper-inflammatory immune response, which is associated with a severe clinical course and a poor outcome 8 .
Here, we aimed to construct a prediction model that identifies COVID-19 patients at high risk of developing respiratory failure. Based on our previous findings, we a priori selected five parameters: CRP, hypertension status, age, and neutrophil and lymphocyte counts (CHANeL). We hypothesized that the pattern of CRP, and neutrophil and lymphocyte counts during the first 3 days of hospitalization is predictive of the type (e.g., hyper-inflammatory) of inflammatory response likely to occur during the course of infection. We constructed several prediction models, including logistic regression, logistic LASSO regression, a Random Forest model, a Support Vector Machine, and XGBoost. We found that the logistic LASSO regression model showed high sensitivity and specificity for identifying patients with COVID-19 who are at high risk of respiratory failure during hospitalization.

Results
Baseline characteristics. Between January 24, 2020 and July 10, 2020, 280 consecutive patients with COVID-19 were enrolled. After excluding patients with an unclear diagnosis (n = 3) and missing data (n = 56), 221 patients were assigned randomly to a training set (n = 110) or a test set (n = 111) (Fig. 1). The mean age of the patients in the training and test sets was 56.0 and 55.0 years, respectively, and 58.2% and 65.8%, respectively, were male. The clinical characteristics of the patients in the training and test sets at the time of admission are shown in Table 1. There was no difference in the baseline pulse oximetric saturation/fraction of inspired oxygen (SpO2/FiO2) ratio or in other clinical and laboratory features between the groups. The prevalence of hypertension, diabetes, and chronic kidney disease was similar in the training set and the test set (Table 1). Forty-six patients (41.8%) in the training set and 41 (36.9%) in the test set required supplementary oxygen during hospitalization. The patients received supplementary oxygen therapy when clinically indicated (SpO2 < 92% or any shortness of breath on room air). The mode of oxygen administration was subject to change based on the patient's condition, as described in Supplementary Table S2.
Prediction models. We developed multivariate risk prediction models to assess the primary outcome (i.e., requirement of supplementary oxygen during hospitalization) based on five variables. All five models showed a high AUC (> 0.9) for both the training set and the test set. Among them, the logistic LASSO regression model showed the highest AUC for the test set (Fig. 2A,B).
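As an illustration of how an AUC such as those reported here can be computed, the following minimal sketch uses the rank-based definition: the AUC is the probability that a randomly chosen patient who required oxygen receives a higher predicted probability than a randomly chosen patient who did not, with ties counted as one half. The function name `auc_score` and the example labels and scores are hypothetical and are not study data; the original analysis was performed in R, so this is not the authors' code.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = required oxygen, 0 = did not (hypothetical coding).
    scores: predicted probabilities from any model.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities, not taken from the study.
example_labels = [1, 1, 1, 0, 0, 0, 0]
example_scores = [0.90, 0.65, 0.30, 0.40, 0.20, 0.10, 0.05]
example_auc = auc_score(example_labels, example_scores)
print(f"AUC = {example_auc:.3f}")  # 11 of 12 pairs ranked correctly
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of this size; a rank-sum formulation would be preferable for large data sets.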
Sensitivity and specificity of the prediction models. The probability cut-off for each model was set at 0.3 to increase the sensitivity (at the cost of specificity). Sensitivity, specificity, predictive values, and accuracy of the prediction models for both the training and test sets were high (Table 2 and Fig. 2C,D). Among the tested models, the logistic LASSO regression model showed the highest sensitivity (0.927) and specificity (0.814) for the test set. All models had a high negative predictive value (NPV). When the probability cut-off value was set to > 0.4, the specificity (for the training and test sets) and accuracy (for the training set) improved, but the sensitivity decreased.
In the logistic LASSO regression model, the CRP value on Day 3 had the highest impact, whereas the CRP levels on Days 1 and 2 played less of a role (Supplementary Table S1). In the Random Forest model, variables from all of the first 3 days were important.
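The trade-off between sensitivity and specificity at different probability cut-offs can be illustrated with a short sketch. It classifies a patient as requiring oxygen when the predicted probability exceeds the cut-off and evaluates the cut-offs 0.3, 0.4, and 0.5. The function name `sensitivity_specificity` and all labels and probabilities below are hypothetical and chosen only to make the pattern visible; they are not study data.

```python
def sensitivity_specificity(labels, probs, cutoff):
    """Return (sensitivity, specificity) when prob > cutoff is called positive."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p > cutoff)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p <= cutoff)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p <= cutoff)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels (1 = oxygen required) and predicted probabilities.
demo_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
demo_probs = [0.92, 0.55, 0.45, 0.35, 0.42, 0.33, 0.25, 0.15, 0.08, 0.02]

for cutoff in (0.3, 0.4, 0.5):
    se, sp = sensitivity_specificity(demo_labels, demo_probs, cutoff)
    print(f"cut-off > {cutoff}: sensitivity = {se:.3f}, specificity = {sp:.3f}")
```

In this toy example, raising the cut-off lowers the sensitivity and raises the specificity, which is the same direction of change described for the models when the cut-off was moved from 0.3 to 0.4.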
Construction of a calculator. An

Discussion
To the best of our knowledge, this study is the first attempt to include the pattern of routine inflammatory markers during the early stage of disease in a model to predict the requirement for supplementary oxygen (i.e., respiratory failure) among hospitalized patients with COVID-19. All models based on CHANeL (age, hypertension, serial CRP, and neutrophil and lymphocyte counts during the first 3 days of hospitalization) showed high accuracy.
The unique strength of the CHANeL prediction models is the hypothesis-driven a priori selection of the five predictors. We showed previously that a hyper-inflammatory immune response, characterized by high CRP levels, high neutrophil counts, and low lymphocyte counts, was associated with a requirement for supplementary oxygen support and a worse outcome, whereas a normal inflammatory response, characterized by minimal elevation of CRP, a normal neutrophil count, and a normal lymphocyte count, was associated with an excellent outcome 8 . The inflammatory markers were similar on the day of admission and started to differ between patients who required supplementary oxygen and those who did not in the first few days of illness. The difference became prominent in the second week of hospitalization. Thus, the dynamic changes (i.e., patterns) in common inflammatory markers (CRP, and neutrophil and lymphocyte counts) early in the disease course were strongly associated with the overall inflammatory response and clinical severity of COVID-19 8 . A retrospective study of 136 COVID-19 patients showed that initial clinical and laboratory characteristics at admission were not predictive of subsequent deterioration, further supporting the view that parameters measured at a single time point might not be sensitive enough to identify patients at risk 9 . To increase the model accuracy, we included two known risk factors (age and hypertension), which have been identified consistently as demographic characteristics associated with a worse outcome 2,5 .
Numerous laboratory parameters have been suggested as risk factors for a worse outcome of COVID-19 disease; these include increased neutrophil counts, decreased lymphocyte counts (and, thus, the neutrophil/lymphocyte ratio), elevated CRP levels, and elevated D-dimer levels 2,4,10-13 . Others identified serum hydrogen sulfide and soluble urokinase plasminogen activator receptor as potential predictors for severe pneumonia in COVID-19 14,15 . Furthermore, blood levels of interleukin (IL)-1, IL-6, IL-8, and tumor necrosis factor (TNF) are associated with severity and prognosis of COVID-19 16 . TNF and IL-6 drive hepatic synthesis of CRP, whereas IL-8 increases neutrophil recruitment. Therefore, the levels of these cytokines are reflected indirectly by CRP levels and neutrophil counts in the CHANeL model. Liang et al. developed a clinical risk score to predict the probability of developing a critical illness. The scoring system was based on ten variables measured at admission, all of which were selected from an initial 72 candidates 17 . Similarly, other prediction models for severe COVID-19, such as the CANPT score or the CMR tool, are based on scoring numerous parameters at admission 18,19 . By contrast, the CHANeL model is based on the hypothesis that the inflammatory response ultimately determines the clinical course of COVID-19. Since clinical manifestations (such as hemoptysis, dyspnea, chest X-ray abnormalities, and mental status change) and other laboratory parameters are considered to be the result (not the cause) of a systemic inflammatory response to viral infection, they were not included. Despite, or because of, its simplicity, the performance of the CHANeL-based prediction models was high; all models had an AUC of > 0.9 (Fig. 2).
The five different models were indirectly compared with regard to sensitivity, specificity, positive predictive value, negative predictive value, and accuracy, and the logistic LASSO model and the Random Forest model showed the best sensitivity and specificity (Table 2, Fig. 2C); therefore, they were used to develop a risk calculator for bedside use (Fig. 3). Interestingly, in the logistic LASSO model, the day 3 CRP level (among the values from the first 3 days) had the highest impact. However, in other models, the values on days 1-3 (the "trend" over the first 3 days) were important (Supplementary Table S1), reflecting the different algorithms used in the five prediction models. Identifying patients with a hyper-inflammatory immune response early during the disease course may enable timely treatment of those at risk of high mortality. This is of particular interest since progression to acute respiratory distress syndrome or sepsis often marks the "point of no return", where most treatment options (including high-dose glucocorticoids) become ineffective 20 . Therefore, targeted blockade of additional detrimental hyperinflammatory responses using early glucocorticoid and/or monoclonal antibody (neutralizing proinflammatory IL-6) therapy might prevent exacerbation 21,22 . Optimally, early risk identification can facilitate allocation of limited resources during a pandemic (and prevent the collapse of the healthcare system); patients at a low risk can be discharged from hospital safely after 3 days of observation to self-quarantine at home or in a living and treatment center 7 , whereas patients at a high risk should remain in hospital for close monitoring and intensive treatment. Further studies are needed to investigate whether implementing the CHANeL model will save more lives and/or shorten hospital stay.
This study has several limitations. First, this study included only hospitalized Korean patients. External validation of the CHANeL models in different ethnic groups is required. Second, the mortality in this cohort was 2.7%, whereas the current mortality of COVID-19 in Korea is 1.6% 23 . As this cohort included only hospitalized patients, the mortality was expected to be significantly higher than that in the general population; the modest difference indicates that relatively mild COVID-19 cases were included (58.2% of patients in the training set and 63.1% in the test set did not require any oxygen supplementation). This is, in part, due to the low incidence of COVID-19 in Korea, which allowed many patients with mild COVID-19 to be treated as inpatients. The high proportion of patients who did not require oxygen, however, helped in building the model. Third, information on arterial blood gas analysis or the PaO2/FiO2 (PF) ratio was not available for all patients. Instead, we utilized the SpO2/FiO2 ratio, which correlates with the PF ratio 24 . Last but not least, the primary aim of the study was to identify high-risk patients who require more intensive monitoring and treatment (i.e., oxygen requirement as a surrogate marker for more severe disease). Therefore, an ideal study population would be patients who have just been diagnosed with COVID-19. Accordingly, the prediction models need to be validated in a prospective cohort of patients who are diagnosed with COVID-19.
In conclusion, CHANeL prediction models based on serial measurements of CRP, ANC, and ALC during the first 3 days of hospitalization, along with age and hypertension, provide an accurate estimate of the risk of supplementary oxygen requirement among hospitalized patients with COVID-19. Further studies are needed to examine whether implementing this model at the bedside can improve outcomes and shorten hospital stays.
Table 2. Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of the models for the training and test sets (probability cut-off > 0.3, > 0.4, or > 0.5). NPV, negative predictive value; PPV, positive predictive value; Se, sensitivity; Sp, specificity.

Methods
Patients and data collection. This retrospective cohort study included COVID-19 patients who were treated at five medical centers designated for treatment of COVID-19 by the South Korean government. A diagnosis of COVID-19 was confirmed by a positive SARS-CoV-2 real-time reverse transcriptase-polymerase chain reaction (RT-PCR) result from a respiratory sample; RT-PCR was performed at the participating institutions or at the Korea Centers for Disease Control and Prevention. The cohort included 280 consecutive patients with COVID-19, all of whom were admitted to one of the five hospitals from January 24, 2020 through July 10, 2020. After excluding patients with incomplete information about medications, the patients were assigned randomly to a training set and a test set in a ratio of 1:1. Of note, the patients included in this study were the same as the patients included in our prior study 8 . Demographic and laboratory data were obtained from electronic medical records. The study was conducted in accordance with the principles of the Declaration of Helsinki and Good Clinical Practice guidelines. The study was approved by the institutional review board of each participating center (NMC, SNUBH, SNUH, Armed Forces Capital Hospital, Myongji Hospital), which waived informed consent because the study involved minimal risk to the patients and no identifiable information was used.
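A random 1:1 assignment of 221 patients yields sets of 110 and 111 patients, as reported in the Results. The following minimal sketch shows one way such a split could be made reproducibly. The function name `split_half`, the seed value, and the use of integer patient identifiers are assumptions for illustration only; the source does not describe the randomization procedure in this detail.

```python
import random


def split_half(patient_ids, seed=0):
    """Shuffle the identifiers with a fixed seed and split them 1:1.

    With an odd number of patients, the second set receives the extra one.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # local generator, global state untouched
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical identifiers 0..220 standing in for the 221 included patients.
train_ids, test_ids = split_half(range(221), seed=2020)
print(len(train_ids), len(test_ids))  # 110 111
```

Fixing the seed makes the assignment repeatable, which matters when several models are to be compared on the same training and test sets.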
Outcome. The primary outcome was a requirement for supplementary oxygen during the hospitalization period. Supplementary oxygen requirement, ranging from nasal prongs to mechanical ventilation, is a marker of COVID-19 severity and an important indication for close monitoring and treatment. A previous study showed that all patients with COVID-19 who did not require supplementary oxygen had a mild disease course and a good prognosis 8 .
Selection of CHANeL predictors. Two demographic variables (age and history of hypertension) were selected a priori; both of these are known risk factors for severe COVID-19 disease 25 . In addition, three routine inflammatory markers (CRP, absolute neutrophil count (ANC), and absolute lymphocyte count (ALC)) during the first 3 days of hospitalization were selected. Predictor selection was based on the previous observation that longitudinal patterns of CRP, ANC, and ALC are highly associated with a particular type of inflammatory response and clinical outcome, including oxygen requirement and death 8 .
Missing values were imputed using linear interpolation between the non-missing values immediately before and after the missing time point, with a calculated variation that follows the shape of the population's average trajectory 26 . Patients for whom missing data could not be imputed reliably were excluded.
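The linear interpolation step can be sketched as follows. This simplified example fills a missing value only from the observed values immediately before and after it, and deliberately omits the additional variation term that follows the population's average trajectory, which is described in reference 26 and not reproduced here. The function name `interpolate_missing` and the CRP values are hypothetical.

```python
def interpolate_missing(values):
    """Fill interior None entries by linear interpolation between neighbours.

    Gaps at the start or end of the series have no value on one side and are
    left as None (such a series could not be imputed by this simple rule).
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical CRP values on Days 1-3 with the Day 2 measurement missing.
crp_days = [2.0, None, 8.0]
print(interpolate_missing(crp_days))  # [2.0, 5.0, 8.0]
```

With only three time points, plain interpolation can recover a missing Day 2 value but not a missing Day 1 or Day 3 value.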
Construction of prediction models. Logistic regression, logistic LASSO regression, Random Forest, Support Vector Machine, and XGBoost analyses were tested using the five CHANeL predictors. The ability of each model to predict supplementary oxygen requirement was assessed by calculating the area under the receiver operating characteristic curve (AUC). A training set and a test set were used to test each model for sensitivity (proportion of oxygen requirement cases predicted correctly), specificity (proportion of no-oxygen requirement cases predicted correctly), and accuracy (proportion of cases predicted correctly).
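The definitions above can be made concrete with a small worked example. The counts used below are back-calculated, as an assumption, from the reported test-set results for the logistic LASSO model: with 41 patients who required oxygen and 70 who did not, a sensitivity of 0.927 corresponds to 38/41 and a specificity of 0.814 to 57/70. The predictive values and accuracy printed by the sketch follow arithmetically from these counts and are not quoted from Table 2. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),  # oxygen cases predicted correctly
        "specificity": tn / (tn + fp),  # no-oxygen cases predicted correctly
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / total,  # all cases predicted correctly
    }


# Counts inferred from the reported rates (assumption, see lead-in):
# 38 of 41 oxygen cases and 57 of 70 no-oxygen cases classified correctly.
lasso_test = diagnostic_metrics(tp=38, fn=3, tn=57, fp=13)
for name, value in lasso_test.items():
    print(f"{name}: {value:.3f}")
```

Lowering the probability cut-off moves cases from the false-negative to the true-positive cell at the price of more false positives, which is why a cut-off of 0.3 favors sensitivity and NPV.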

Statistical analysis. Continuous variables were compared using t-tests or the Mann-Whitney U-test, and categorical variables were compared using the chi-squared test or Fisher's exact test, as appropriate. Statistical analysis was performed using RStudio (version 1.2; Boston, MA, USA) and SPSS (IBM SPSS Statistics for Windows, Version 25.0). A P-value < 0.05 was considered statistically significant.
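As an illustration of a chi-squared comparison of a categorical variable between the two sets, the sketch below applies the Pearson chi-squared test (without continuity correction) to the proportion of patients who required supplementary oxygen: 46 of 110 in the training set and 41 of 111 in the test set. This particular comparison is chosen here only as an example and is not reported in the source; the analysis in the study was run in R and SPSS, whose defaults may differ (for instance, in applying a continuity correction). The function name `chi_squared_2x2` is hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). No continuity correction is applied.
    For 1 degree of freedom, the upper-tail probability of the chi-squared
    distribution equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: training set, test set. Columns: oxygen required, not required.
chi2_stat, chi2_p = chi_squared_2x2(46, 64, 41, 70)
print(f"chi-squared = {chi2_stat:.3f}, P = {chi2_p:.3f}")
```

With these counts the P-value is well above 0.05, which is consistent with the two randomly assigned sets being similar with respect to the outcome.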