Introduction

As evidence accumulates, it has become clear that the presence of cardiovascular disease (CVD) is closely related to the prognosis of COVID-19 patients1,2. Early studies reported that patients who required intensive care were more likely to have CVD3, and myocardial injury was associated with fatal outcomes in COVID-194. More recently, Cereda et al. reported that the coronary calcium score contributes to stratifying the risk of complications in COVID-19 patients5. Similarly, virus-related cardiac injury has also been highlighted through investigations of hospitalized patients6. Viral infections are often associated with lymphopenia, and COVID-19 is no exception to this. Several studies have found a correlation between disease severity and lymphopenia7,8,9. On the other hand, the role of neutrophils in COVID-19 is also attracting increasing attention10. A recent study by Parackova et al. demonstrated that neutrophils from COVID-19 patients induced T-cell polarization, leading to reduction in the percentage of Th1 cells11. In this context, it is suggested that the balance between neutrophils and lymphocytes reflects the disease activity of COVID-19. The neutrophil-to-lymphocyte ratio (NLR) is a biomarker of systemic inflammatory status that can easily be obtained from differential white blood cell count12. NLR, calculated as a simple ratio between the neutrophil and lymphocyte counts, reflects the balance between acute and chronic inflammation and is predictive of mortality, even in the general population13. A series of recent studies have shown that NLR is an independent risk factor for critical illness and hospital mortality in COVID-19 patients14,15. In 2020, Qin et al. reported that plasma T lymphocyte levels were significantly reduced, while neutrophil levels were augmented in patients with severe COVID-19 compared with those in patients with mild symptoms7. Importantly, there were significantly more cases of CVD in patients with severe symptoms. Therefore, we believe that the comorbidity of CVDs should be considered when investigating the clinical significance of NLR in COVID-19 patients. In this study, we sought to investigate the prognostic significance of NLR in patients with COVID-19 complicated with CVD and/or its risk factors (CVDRF).

Methods

Study design and population

This was a retrospective analysis conducted using Clinical Outcomes of COVID-19 Infection in Hospitalized Patients with Cardiovascular Diseases and/or Risk Factors (CLAVIS-COVID registry). This study was approved by the Ethics Committee of Ehime Prefectural Central Hospital (no. 02-22), and the study protocol complied with the tenets of the Declaration of Helsinki. The CLAVIS-COVID registry is a retrospective, observational, national, multicenter study that included an adult population with CVDRF hospitalized for COVID-19 in Japan. The main aim of the registry was to evaluate the characteristics and clinical outcomes of hospitalized COVID-19 patients with CVDRF. The study protocol including the opt-out consent method was approved by the review board of each institution, and all patients provided informed consent to participate in the study. This clinical study was registered with the University Hospital Medical Information Network Clinical Trial Registry (UMIN-ID: UMIN000040598; further details accessible at https://upload.umin.ac.jp/cgi-open-bin/ctr_e/ctr_view.cgi?recptno=R000046132) before the first patient was enrolled, in accordance with the International Committee of Medical Journal Editors. Detailed inclusion/exclusion criteria, decision of hospitalization/discharge, and definition of collected data are described in the original article16. Briefly, a total of 1518 patients were recruited from 49 hospitals from January 1 to May 31, 2020. Among all participants, 693 were complicated with CVDRF. Cardiovascular risk factors were defined as hypertension, diabetes mellitus, and dyslipidemia. Pre-existing CVD was defined as a history and/or manifestations upon admission for any of the following: heart failure, coronary artery disease, myocardial infarction, peripheral artery disease, valvular heart disease, cardiac arrhythmia, pericarditis, myocarditis, congenital heart disease, pulmonary hypertension, deep vein thrombosis, pulmonary embolism, aortic dissection, aortic aneurysm, cerebral infarction/transient ischemic attack, heart transplantation, and cardiac arrest and the use of cardiac devices (e.g., a pacemaker, implantable cardioverter-defibrillator, cardiac resynchronization therapy device, and left ventricular assist device). COVID-19 was diagnosed based on a positive polymerase chain reaction test of nasal or pharyngeal swab specimens in all patients. All patients admitted to the participating hospital and enrolled in this study were discharged by November 8, 2020, the deadline for data transfer. Clinical data, including symptoms, demographics, medical history, home medications, baseline comorbidities, physical findings, laboratory test results, radiography and chest computed tomography findings, electrocardiography and cardiac echocardiography results, treatment information, and outcomes, were obtained from electronic medical records using data collection forms. All laboratory and imaging data were obtained at the time of admission. We defined the data at “follow-up” as the results of the final blood test performed before hospital discharge, regardless of the form of discharge. According to this definition, baseline and follow-up values will be the same in cases wherein only one blood test was performed during hospitalization.

A list of all studied and excluded variables is made available through a Mendeley data repository (available at https://doi.org/10.17632/66djg6mmzf.2). Patients were divided into four groups according to the quartiles of baseline NLR values as previously reported13,14, to outline the characteristics of the cohort as the first step of the analysis. Patients whose NLR data were not available were excluded from the analysis. A schematic of the study population is shown in Fig. 1. The contribution of NLR to in-hospital mortality was analyzed as the primary endpoint using the statistical methods described below.

Figure 1
figure 1

Study population. Patients included in the analysis are in the shaded boxes, N = 601. Q1–Q4 correspond to quartiles of the baseline NLR.

Conventional statistical analysis

Data are shown as percentage for categorical variables and median (interquartile ranges, IQR) for continuous variables. One-way ANOVA and Kruskal–Wallis tests for continuous variables and χ2 tests for qualitative variables were used for between-group comparisons based on data distribution. Kaplan–Meier survival analysis was performed to compare 30-day in-hospital mortality among individuals in each NLR quartile. Multivariable Cox proportional hazards models were used to determine the hazard ratios and 95% CI for each factor. The validity of the proportional hazard assumption was verified using scaled Schoenfeld residuals. Receiver operating characteristic (ROC) curves were generated to obtain the area under the curve (AUC) as a predictor of mortality from the NLR values. The optimal cut-off value for predicting 30-day mortality was determined based on the Youden index.

Statistical significance was defined as a P-value of < 0.05. All statistical analyses were performed using SciPy, a Python library, and SPSS statistical package (Version 12, SPSS Inc, Chicago IL, USA).

Clustering analysis and data visualization

We used t-stochastic neighborhood embedding (t-SNE)17 to visualize patients’ clinical characteristics. Scikit-learn, a Python machine learning library, was used for the analysis. All demographic and clinical variables at baseline and follow-up were included in the analysis, except for those that were missing in > 30% of patients. In the case of missing values, data were replaced with the item mean as previously described18. After confirming that the major clusters were consistently identified through different values of perplexity and iteration numbers (Supplementary Fig. S1), the default parameters of Scikit-learn (dimension = 2, perplexity = 30, learning rate = 200, and iteration number = 1000) were used to create a 2D map. Parameters of interest (mortality, intubation, baseline NLR, and follow-up NLR) were displayed on each data point as a heat map.

Results

Baseline characteristics and in-hospital outcomes of COVID-19 patients according to the quartiles of NLR

The curated dataset included 601 patients with 260 variables. The median number of days from symptom onset to hospital admission was seven (IQR 3–10). The median age of the cohort was 70 (IQR 58–80) years, and 65.4% of the participants were male. Further, 23.1% of the patients (n = 139) required mechanical ventilation and the overall number of in-hospital deaths was 98 (16.3%). Table 1 compares the baseline characteristics and in-hospital outcomes of COVID-19 patients according to the quartiles of NLR (Q1 0–2.57 n = 141, Q2 2.58–4.44 n = 155, Q3 4.45–7.48 n = 151, and Q4 7.49–84 n = 154). The median age was significantly different between the groups (p = 0.013). The higher quartile included a significantly higher number of male patients (p = 0.003). Significant differences were observed in the prevalence of diabetes mellitus and previous myocardial infarction (p = 0.008 and p = 0.028, respectively). Shortness of breath and dyspnea were more prevalent in the group of patients with higher NLR values (Q1 22.7%, Q2 25.8%, Q3 39.7%; Q4, 46.1%; p < 0.001). Antiviral drugs and steroids were used more frequently in patients with a high baseline NLR. Regarding clinical outcomes, the length of hospital stay (days) was significantly longer in patients with a higher baseline NLR (Q1 15 (10–22), Q2 17 (11–27.5) Q3 18 (12–28), and Q4 20 (12–31.5); p < 0.013). Mechanical ventilation with intubation was significantly frequent among the higher NLR quartiles (Q1 6.4%, Q2 16.1%, Q3 27.8%, and Q4 40.9%; p < 0.001). Hospital mortality significantly increased with an increase in the baseline NLR quartile (Q1 6.3%, Q2 11.0%, Q3 20.5%, and Q4 26.6%; p < 0.001).

Table 1 Demographics and clinical characteristics of COVID-19 patients according to baseline NLR quartiles.

The median NLR at follow-up was 3.05 (IQR 1.95–5.77), which was significantly higher in the patients who showed high NLR at baseline. The median number of days from hospital admission to the last blood test (follow-up) was 13 (IQR 6–23), while the median number of days of hospitalization was 17 (IQR 11–27).

Baseline laboratory parameters of COVID-19 patients according to baseline NLR quartiles

Laboratory data at admission were also compared among the baseline NLR quartiles. As shown in Table 2, the absolute white blood cell count was significantly higher in the higher NLR quartile groups (Q1 4400 (3700–5800), Q2 5100 (4150–6400), Q3 6100 (4950–7430), and Q4 8300 (6125–11,122); p < 0.001). The median NLR at follow-up was 3.05 (IQR 1.95–5.77), which was significantly higher in the patients who showed high NLR at baseline. The hemoglobin levels were significantly different among the groups. NLR at follow-up was higher in the group with high baseline NLR. As for other parameters, significant differences were observed in D-dimer, CRP, and LDH levels.

Table 2 Laboratory parameters of COVID-19 patients according to baseline NLR quartiles.

Survival analysis according to NLR quartiles

Kaplan–Meier curves show the different in-hospital mortality according to NLR quartiles, as shown in Fig. 2. Briefly, cumulative mortality increased as the quartile of baseline NLR increased. The paired log-rank test revealed significant differences in survival for Q1 vs Q3 (p = 0.017), Q1 vs. Q4 (p < 0.001), Q2 vs. Q3 (p = 0.034), and Q2 vs. Q4 (p < 0.001). There were no significant differences in survival between Q1 and Q2 (p = 0.706) or between Q3 and Q4 (p = 0.121). Multivariable Cox regression analysis revealed that older age, male sex, higher BMI, higher creatinine, and higher CRP values were significantly associated with 1-month mortality, while the baseline NLR was not (Table 3).

Figure 2
figure 2

Kaplan–Meier curves for 30-day in-hospital survival among quartiles of baseline NLR. Cumulative probabilities of survival with increasing NLR values were shown. Q1–Q4 correspond to quartiles of the baseline NLR.

Table 3 Factors associated with 30-day mortality in hospitalized patients with COVID-19.

Prognostic value of the baseline and follow-up NLR for disease fatality

As shown in Fig. 3, the AUC for predicting in-hospital death based on baseline NLR was only 0.682. In contrast, the AUC for predicting in-hospital death by the follow-up NLR was 0.893. The optimal cut-off value of the baseline NLR for predicting in-hospital death was 5.39.

Figure 3
figure 3

Receiver operating characteristics (ROC) for the prediction of mortality from NLR levels. ROC curves for mortality prediction were shown for baseline NLR (red) and follow-up NLR (blue) respectively. AUC, area under the curve.

Data visualization using t-SNE

Based on the 2D mapping shown in Fig. 4, we identified two distinct clusters in the fourth quadrant (lower right), one of which was characterized by high in-hospital mortality. The population that required mechanical ventilation was also seen in the fourth quadrant and seemed to overlap with the population with a high baseline NLR. Patients who survived in the fourth quadrant had a lower NLR at the follow-up.

Figure 4
figure 4

Two-dimensional mapping of study population by t-SNE. Axes represents arbitrary unit determined based on t-SNE. Target variables are indicated by color-bar in each mapping.

Discussion

To date, several studies have reported that severe systemic inflammation is associated with higher incidence of CVD12,19,20,21,22. The results of the current study illustrate the clinical implications of the NLR in COVID-19 patients with CVDRF. The main findings were that in-hospital mortality was higher among patients with higher baseline NLR, but the predictive performance of baseline NLR for fatality was insufficient. Similarly, survival analysis showed that baseline NLR was not significantly associated with 30-day mortality, unlike other parameters, including age and sex. In contrast, NLR at follow-up was significantly higher in deceased patients than in those who survived, leading to a high area under the ROC curve for the prediction of mortality. Further analysis using unsupervised machine learning-based visual mapping revealed that a cluster with a high baseline NLR tended to require mechanical ventilation but did not necessarily show high mortality.

Previous reports have revealed an association between NLR and disease severity or mortality in COVID-19 patients. However, the cut-off values for prognosis prediction vary across studies, probably due to the heterogeneity of the cases studied23,24,25,26. In addition, the cut-off value calculated in this study (baseline NLR > 5.39) was higher than any of the previously reported values. According to a study by Caillon et al., NLR was not selected as an important variable in their mortality prediction model26,27. A possible factor for the varying results could be the timing of data collection. A recent study by Jimeno et al. showed that the peak NLR value and the rate of NLR increase, but not the NLR value at hospital admission, are significantly associated with mortality in COVID-19 patients28. Although the prevalence of comorbidities has not been reported in detail, their findings are consistent with our results that later data are more reflective of disease severity. Interestingly, the median number of days from symptom onset to hospital admission in the data of Jimeno et al. was the same as ours, with a median of 7 days. During the study period, the Japanese government mandated the hospitalization of all patients with COVID-19 regardless of disease severity during patient enrollment29. Therefore, laboratory data at the time of hospitalization may have been collected before the onset of the disease or at a relatively mild stage compared with in reports from other countries. In any case, it should be noted that the follow-up blood tests in our study were collected at the end of hospitalization; therefore, they are not clinically useful in predicting disease severity.

Another unique aspect of our research lies in our application of the cluster analysis method using unsupervised machine learning (specifically, t-SNE). Analysis using machine learning has become a trend in the field of cardiovascular medicine owing to its advantages over existing statistical methods30. A nonlinear dimensionality reduction technique, t-SNE is a manifold learning algorithm that is commonly used for the visualization of high-dimensional data in genomic analysis. The use of this algorithm is not limited to genomic data but also includes the analysis of electronic medical record data and posturography data in neurodegenerative diseases31,32. Recently, De Canniere et al. reported the efficacy of two-dimensional (2D) visualization of 6-min walking test data using t-SNE-based mapping33. The 2D map allows for the simultaneous assessment of the relative similarity of all subjects in our dataset, along with the distribution of their clinical characteristics. In this study, we visualized the different distributions of NLR quartiles according to the time at which the data were obtained (admission or follow-up). We believe that this method is useful for phenotyping patient groups, as it can represent high-dimensional data in a human-interpretable (2D) way.

In summary, we have shown that while NLR clearly reflects disease activity, it does not necessarily predict future disease severity based on values at admission. Further validation with therapeutic intervention is required to confirm the usefulness of the NLR as a biomarker.

Limitations

The current study has several limitations. First, this was a retrospective study, and there were considerable missing values for baseline serum biomarkers, especially cTn and BNP/NT-proBNP. The missing data might result in our univariate and multivariate analyses of cardiac biomarkers differing from the results of previous studies. In addition, patients with mild disease had only one blood test performed during hospitalization, which may have led to bias due to missing data. The second limitation was the study population. As mentioned earlier, this study used a registry that enrolled patients with relatively mild disease, which may not necessarily reflect the current status of most hospitalized patients. Recently, the proportion of new variant strains of SARS-CoV-2 has increased in hospitalized patients, and the variant strains have been reported to differ from conventional strains in infectivity and severity of disease34,35. Therefore, the influence of the variant strains is a major limitation that cannot be addressed in this study. Third, we were unable to show the clear benefit of using the ratio of neutrophils to lymphocytes, rather than using their sole values. As shown in Supplementary Fig. S2, the advantage of using NLR only emerged at the time of follow-up. Fourth, when we excluded CRP from the covariates in the multivariable Cox proportional hazards regression model, the baseline NLR became an independent factor for predicting 1-month mortality (Supplementary Table S1). Therefore, multicollinearity was suspected between CRP level and NLR. Lastly, the CLAVIS-COVID registry analyzed in this study focuses on patients with CVDRF, and no blood test data are available for the control group (subjects without CVDRF). Therefore, it is unclear whether patients with CVDRF exhibit a higher NLR than those without.

Conclusion

NLR values may have prognostic implications among hospitalized COVID-19 patients with CVDRF; however, their significance depends on the timing of data collection. In predicting the prognosis of COVID-19 patients with CVDRF using the NLR, one should focus on changes in values over time.