Introduction

The severity of COVID‐19 ranges widely, from asymptomatic or mild, self‐limiting illness to severe progressive pneumonia, multiorgan failure, and death1.

In addition to respiratory manifestations, several somewhat specific complications have been shown to be associated with COVID-19 illness, including cardiac and cardiovascular complications2, thromboembolic complications3, neurologic complications4, and inflammatory manifestations5. As the clinical picture is quite variable, the question arises as to which factors steer the disease toward a specific course.

The case of cardiac involvement is considerably diverse. Although cardiac complications are common and are associated with increased mortality6, cardiac involvement is heterogeneous, including right ventricular (RV) dysfunction or dilatation, left ventricular (LV) diastolic dysfunction and systolic dysfunction (10%)2.

As patients’ baseline characteristics and disease manifestations vary, we hypothesized that distinct prototypical patterns of disease manifestation may be identified. To identify them, we used machine learning-based clustering. Unsupervised discovery of subtypes of a single disease has been widely used in cardiology7, infectious disease8, and critical care medicine9 to find better pathologic explanations and improve current treatments for these conditions.

Although several studies have attempted unsupervised clustering of COVID-19 illness10,11, these included no comprehensive data regarding cardiac performance. As the echocardiographic data of patients with COVID-19 illness are essential for elucidation of both pathogenesis and prognosis12, we consider this addition imperative for an enhanced clustering of COVID-19 illness.

Methods

Details about data acquisition and gathering have been specified previously2. In brief, we prospectively studied consecutive adult patients (aged ≥ 18 years) admitted between March 21, 2020, and September 16, 2020, to the Tel Aviv Medical Center due to COVID‐19 infection. All patients had a diagnosis of COVID‐19 infection confirmed by a positive reverse‐transcriptase polymerase chain reaction assay. Demographic data, comorbid conditions, medications, physical examination, laboratory, and ECG findings were systematically recorded. All patients underwent comprehensive transthoracic echocardiography within 48 h of admission as part of a predefined step‐by‐step protocol. Clinical and imaging data were collected prospectively. Mortality analysis started at the time of baseline echocardiographic examination and included in‐hospital mortality. Mortality was ascertained until the end of follow-up, beyond hospitalization and irrespective of discharge date, for all patients by telephone calls and was complete for all the patients.

Ethics approval

Since the data were evaluated retrospectively and pseudonymously and were obtained solely for treatment purposes, the ethics committee of the Tel Aviv Medical Center approved the study (institutional review board number 0196‐20‐TLV) and waived the requirement of informed consent. The research was performed in accordance with the Declaration of Helsinki.

Echocardiography

Echocardiography was performed in a standard manner with the same equipment (CX 50; Philips Medical Systems, Bothell, WA) by cardiologists with expertise in echocardiographic recording and interpretation. In accordance with current guidelines13, the following measures were undertaken to minimize the risk of infection: (1) All echocardiographic studies were bedside studies performed at the designated COVID‐19 intensive care or internal ward units. (2) All echocardiographic examinations were performed with small, dedicated scanners because of their easier disinfection. (3) Echocardiographic scanners were set aside in each COVID‐19-designated ward to minimize the risk of infection spread. (4) Personal protection at the time of echocardiographic recordings included N‐95 respirator masks, fluid‐resistant gowns, gloves, head covers, and eye shields. (5) Electrocardiographic monitoring during imaging was omitted, and all measurements were performed offline to reduce exposure and contamination. LV diameters, ejection fraction, and mass were measured as recommended14. Measurements of mitral inflow included the peak early filling (E wave) and late diastolic filling (A wave) velocities, E/A ratio, and deceleration time of early filling velocity. Early diastolic mitral septal and lateral annular velocities (e′) were measured in the apical 4‐chamber view15. Left atrial volume was calculated with the biplane area‐length method at end systole. Forward stroke volume was calculated from the LV outflow tract with subsequent calculation of cardiac output.

From 4‐chamber views encompassing the entire RV, end‐systolic and end‐diastolic RV areas and tricuspid annulus were measured. RV function was evaluated by tricuspid annular plane systolic excursion (TAPSE), systolic tricuspid lateral annular velocity measured in the apical 4‐chamber view, and fractional area change14,16. Hemodynamic right‐sided assessment included the measurement of the pulmonic flow acceleration time to assess pulmonary vascular resistance17.

Patients underwent another comprehensive echocardiographic test whenever there was any clinical deterioration, i.e., the need for mechanical ventilation and hemodynamic support, according to the treating physician’s judgment. This test was performed in the same manner as the first test performed upon arrival.

The cohort

Patients with a signed do-not-resuscitate/do-not-intubate (DNR/DNI) order (n = 24) were removed from the cohort, resulting in 506 patients. For each patient, the input for clustering consisted of the measurements taken at admission and the first echocardiography result. Variables describing outcomes and treatments (n = 23) and medications (n = 43) were not part of the input data for clustering and were later used for evaluation of the clinical significance of the clusters. Variables that were missing in more than 2/3 of the cohort were also excluded (n = 23). Because the data are sparse, we allowed a relatively high missing rate to retain a large number of variables. The missing rate of each variable is given in Supplementary 3. This process left 85 continuous variables and 56 categorical variables. Thirty-one of the continuous variables and two of the categorical variables were from echocardiography.

Computational methods

Imputation and normalization

Missing values in continuous variables were imputed using the Iterative Imputer algorithm based on MICE18. The continuous variables were normalized using the Yeo-Johnson power transform for nonnegative variables19 (see Supplementary 1). For the categorical variables, missing values were imputed with the most frequent value; another method for imputing the categorical variables achieved similar results (for details, see Supplementary 1).
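The preprocessing steps above can be illustrated with a minimal, dependency-free sketch. It is not the pipeline used in the study: the Yeo-Johnson parameter `lam` is assumed to be given (in practice it is usually estimated from the data), the MICE-style iterative imputation is not reproduced, and all function names are hypothetical.

```python
import math
from collections import Counter


def impute_most_frequent(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def yeo_johnson(y, lam):
    """Yeo-Johnson power transform of one value for a given lambda (assumed known)."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


def standardize(values):
    """Center to zero mean and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


# Hypothetical toy data: one categorical and one continuous variable.
smoker = impute_most_frequent(["no", None, "yes", "no"])
crp = standardize([yeo_johnson(v, lam=0.5) for v in [5.0, 40.0, 130.0, 12.0]])
```

With `lam = 1` the transform is the identity, which gives a quick sanity check of the implementation.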

Clustering

We used the k-prototypes20 algorithm, which is a distance-based algorithm that allows the use of mixed data, i.e., both categorical and continuous variables. We chose k-prototypes because it was reported as one of the best performers in a recent benchmark study on mixed-data clustering algorithms21. The algorithm receives as input the number of clusters \(k\) and uses the distance function between variable vectors \(x,y\):

$$d\left(x,y\right)= \sum_{j=1}^{p}{\left({x}_{j}-{y}_{j}\right)}^{2}+\gamma \sum_{j=p+1}^{m}\delta \left({x}_{j},{y}_{j}\right) \tag{1}$$

where \({x}_{1}, \dots, {x}_{p}\) are numerical variables, \({x}_{p+1}, \dots, {x}_{m}\) are categorical variables and \(\delta\) is the Hamming (simple matching) distance function, equal to 0 when the two categories match and 1 otherwise. \(\gamma\) defines the relative weight assigned to the categorical variables. We used \(\gamma =3\) and \(k=4\) (see Supplementary 2, 3).
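As a worked illustration of Eq. (1), the following sketch computes the mixed-data distance between two hypothetical patient vectors. The function name, the argument layout (numerical variables first, categorical variables after) and the toy values are assumptions made for the example only.

```python
def k_prototypes_distance(x, y, num_numeric, gamma):
    """Mixed-data dissimilarity of Eq. (1).

    The first `num_numeric` entries of x and y are numerical; the rest are categorical.
    """
    # Squared Euclidean distance over the numerical variables.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x[:num_numeric], y[:num_numeric]))
    # Number of mismatching categorical variables (delta = 1 on mismatch).
    categorical_part = sum(1 for a, b in zip(x[num_numeric:], y[num_numeric:]) if a != b)
    return numeric_part + gamma * categorical_part


# Two hypothetical patients: two normalized continuous values, two categories.
patient_a = [0.5, -1.0, "male", "hypertension"]
patient_b = [1.5, -1.0, "female", "hypertension"]
distance = k_prototypes_distance(patient_a, patient_b, num_numeric=2, gamma=3)
```

Here the numerical part contributes 1.0 and the single categorical mismatch contributes 3, so one mismatch weighs as much as a difference of about 1.7 standard units in a single normalized continuous variable when \(\gamma = 3\).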

To obtain robust clustering, we applied consensus clustering22 to multiple clustering results obtained on subsampled data. In each repetition, we randomly chose a fraction \(r\) of the patients and clustered them using k-prototypes. The process was repeated \(n\) times. For patients \(i\) and \(j\), define \(M(i,j)\) as the fraction of times in which they were in the same cluster, and the distance between them as \(D(i,j)=1-M(i,j)\). Applying (regular) k-means on the distance matrix \(D\) gives the final clusters. We used \(r=0.85\) and \(n=50\) (see Supplementary 2).
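The co-clustering step can be sketched as follows. The sketch assumes that \(M(i,j)\) is normalized by the number of repetitions in which both patients were sampled, which is a common convention in consensus clustering but is not stated explicitly above; pairs that were never sampled together are assigned the maximal distance, also an assumption. The final k-means step on \(D\) is not reproduced, and the function name and toy runs are hypothetical.

```python
from itertools import combinations


def consensus_distance(runs, num_patients):
    """Build D(i, j) = 1 - M(i, j) from a list of subsampled clusterings.

    Each run is a dict mapping patient index -> cluster label and contains
    only the patients sampled in that run.
    """
    together = [[0] * num_patients for _ in range(num_patients)]
    sampled = [[0] * num_patients for _ in range(num_patients)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    dist = [[0.0] * num_patients for _ in range(num_patients)]
    for i in range(num_patients):
        for j in range(num_patients):
            if i != j:
                # Assumption: pairs never sampled together get distance 1.
                m = together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
                dist[i][j] = 1.0 - m
    return dist


# Four toy repetitions over four patients (each run omits one patient).
example_runs = [
    {0: 0, 1: 0, 2: 1},
    {0: 1, 1: 1, 3: 0},
    {1: 0, 2: 1, 3: 1},
    {0: 0, 2: 0, 3: 1},
]
D = consensus_distance(example_runs, num_patients=4)
```

Cluster labels only need to be consistent within a run, since the matrix depends on whether two patients share a label and not on the label itself.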

Evaluation of clusters

To evaluate the quality of the clusters, we looked for clinically significant characteristics of the patients composing the different clusters. For each variable, we tested its significance under the null hypothesis that it does not vary between clusters. For the continuous variables, we performed ANOVA, and for the categorical variables, we performed the chi-squared test, both implemented in SciPy23. P-values were corrected for multiple testing by controlling the false discovery rate (FDR).
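The multiple-testing correction can be illustrated with a short sketch. It assumes the Benjamini-Hochberg procedure, the most common FDR method; the text above does not specify which FDR procedure was used, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four variables.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
```

A variable is then called significant when its adjusted p-value falls below the chosen FDR level.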

To find variables that are most discriminative between clusters, we computed the absolute standardized mean differences (ASMD) score (see Supplementary 1).
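For orientation, a two-group version of the score can be sketched as below. The exact definition used here, including how the score is aggregated over four clusters, is given in Supplementary 1 and is not reproduced; the pooled-standard-deviation form and the function name are assumptions.

```python
import math
from statistics import mean, variance


def asmd(group_a, group_b):
    """Absolute standardized mean difference between two groups.

    Assumes the common form |mean_a - mean_b| / sqrt((var_a + var_b) / 2),
    with sample variances.
    """
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2.0)
    return abs(mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical values of one variable in two clusters.
score = asmd([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
```

Because the difference is divided by a standard deviation, the score is unitless and can be compared across variables measured on different scales.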

To visualize the different characteristics of the clusters, we created a radar plot of selected variables. Each variable was normalized to [0.2, 1], where 0.2 is the lowest cluster average or percentage and 1 is the highest.
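The radar-plot scaling amounts to a min-max rescaling of the per-cluster averages. The sketch below uses the cluster average ages reported in the Results as input; the function name and the `reverse` flag (for variables where a lower raw value is worse) are illustrative assumptions.

```python
def radar_scale(cluster_values, low=0.2, high=1.0, reverse=False):
    """Map per-cluster averages (or percentages) of one variable onto [low, high]."""
    v_min, v_max = min(cluster_values), max(cluster_values)
    span = v_max - v_min
    scaled = []
    for v in cluster_values:
        frac = (v - v_min) / span if span else 0.0
        if reverse:
            # Flip the axis so that the largest plotted value is the worst one.
            frac = 1.0 - frac
        scaled.append(low + (high - low) * frac)
    return scaled


# Average age in clusters 0 to 3.
age_scaled = radar_scale([43.63, 61.27, 76.67, 76.22])
```

Starting the scale at 0.2 rather than 0 keeps the cluster with the lowest value visible on the plot instead of collapsing it onto the center.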

To analyze the survival trends in different clusters, we plotted the Kaplan–Meier survival curves24 for each cluster and computed the conditional multivariate log-rank test to compare the survival plots across the clusters. To measure the predictability of the clusters for in-hospital mortality, we estimated survival times from the Cox proportional hazard model25 with binary variables that represent the cluster’s membership as the covariates of the model and calculated the c-index26.
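To make the c-index concrete, the following is a simplified sketch of Harrell's concordance index. It is not the lifelines implementation: tied event times are ignored, a higher score is taken to mean higher risk, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index: fraction of comparable pairs ordered correctly.

    A pair (i, j) is comparable when the patient with the shorter follow-up
    time had the event. Higher risk should go with shorter survival.
    Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data: follow-up days, death indicator, and a per-cluster risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [3, 2, 2, 1]
c = concordance_index(times, events, risk)
```

When cluster membership is the only covariate, all patients in a cluster share one risk score, so tied pairs are frequent and each contributes one half. A value of 0.5 corresponds to no discrimination and 1 to perfect ordering.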

We also wished to evaluate whether the clusters showed distinct survival patterns among the patients who received respiratory support. For that, we built another Cox proportional hazard model for in-hospital mortality using only the patients who received respiratory support, with cluster labels as the covariates. We excluded Cluster 0, where only one patient received such support, and computed hazard ratios per cluster (relative to cluster 1). Implementations of Kaplan–Meier, Cox proportional hazard, C-index and log-rank were taken from lifelines27.

Evaluating the contribution of echocardiography

To evaluate the contribution of the 33 echocardiography variables to creating meaningful clusters, we tested the change in c-index obtained after randomly permuting across patients the values for each of the 33 variables, independently for each variable. In this way, we kept the distribution of each variable and the number of input variables unchanged. Next, we performed a similar process where we randomly chose 31 continuous variables and 2 categorical variables (the same numbers as for the echocardiography variables) and permuted their values to compare the contribution of the echocardiography data to randomly chosen variables. We also tested the change in echocardiography measurements over time using the results of a second echocardiography that some of the patients underwent. Full details of this analysis are found in Supplementary 6.
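The permutation step can be sketched as below: each selected variable is shuffled across patients independently, which breaks its association with the other variables while leaving its marginal distribution intact. The row-of-dicts layout, the variable names and the seeding are assumptions for illustration; reclustering and recomputing the c-index are not shown.

```python
import random


def permute_columns(rows, columns, seed=0):
    """Return a copy of `rows` with each listed column shuffled independently across patients."""
    rng = random.Random(seed)
    permuted = [dict(row) for row in rows]
    for col in columns:
        values = [row[col] for row in rows]
        rng.shuffle(values)  # a separate shuffle per column
        for row, value in zip(permuted, values):
            row[col] = value
    return permuted


# Hypothetical cohort with one clinical and two echocardiography variables.
cohort = [{"age": 40 + i, "tapse": float(i), "lvef": 50 + i} for i in range(20)]
shuffled = permute_columns(cohort, ["tapse", "lvef"], seed=1)
```

Repeating the call with different seeds gives the repeated shuffles over which an average c-index can be computed.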

Results

The final cohort included 506 hospitalized patients with PCR-positive COVID-19, with an average age of 62.31 years; 36.96% were female (n = 187). Further demographic and clinical characteristics are shown in Table 1.

Table 1 Demographics and selected clinical features of the cohort.

Identifying distinct patient subgroups

We clustered the patients into four clusters labeled 0 to 3, with sizes of 128, 195, 112, and 71, respectively. We analyzed several possible values for the number \(k\) of clusters and selected k = 4 as the appropriate number (see full analysis in Supplementary 2). As we demonstrate below, the echocardiography parameters had a major contribution to the quality of the results.

The main parameters that significantly distinguish between the clusters are presented in Table 2, and the full list is shown in Supplementary 3. Based on these parameters, the most prominent characteristics of the clusters are as follows:

  • Cluster 0: young patients (average age: 43.63) with less medical history of significant diseases, with mild signs of inflammation at admission (low CRP) and no deaths.

  • Cluster 1: relatively young patients (average age: 61.27) with a more severe inflammatory process. Of these, 27% received respiratory support, and 10% died while hospitalized.

  • Cluster 2: older patients (average age: 76.67) with fewer inflammatory markers upon admission, with a richer medical history (e.g., 75% with hypertension) and more females than any other cluster (69%).

  • Cluster 3: older patients (average age: 76.22) with marked inflammatory processes and a significant medical history (e.g., 83% with hypertension). These patients had the highest in-hospital mortality rate (54%).

Table 2 Statistics of selected variables that were significantly different among the clusters.

To further assess the discriminative value of the different variables between the clusters, we calculated the absolute standardized mean differences (ASMD) scores for all variables. Figure 1 shows the variables with the highest scores. Among the variables with high ASMD are some natural COVID-19 parameters, such as CRP for inflammatory state, pulmonary ultrasound findings and findings in chest X-ray for lung function, and MEWS and SOFA, established warning scores for clinical deterioration. Age had the highest score, and several echocardiography variables, in particular left ventricle variables (E′ lat, E′ sept, E/e average), were scored high. We also evaluated the clinical merit of the clusters by calculating their discriminative power with respect to outcomes, which were not part of the input for clustering. Among them, in-hospital mortality and respiratory support-related variables scored high.

Figure 1

Variables with the highest ASMD scores. In red are outcomes that were not used for the clustering; in green are echocardiography variables. Total B + C: a lung ultrasound parameter combining consolidation and B-line results.

Figure 2 shows a radar plot of significantly distinguishing variables (see Supplementary 4) that were selected to represent different aspects of the disease and the characteristics of the patients. Cluster 0 stands out as having the best results in most health-related parameters. In contrast, cluster 3 has the most severe results in most parameters. Cluster 2 patients have an age distribution very similar to that of cluster 3 and higher rates of dementia, and both clusters show high rates of hypertension and other comorbidities (likely due to the high average age). Remarkably, however, the COVID-19-related parameters of cluster 2 are much better: low CRP and high O2 saturation (this variable was reversed in the plot, so a high plotted value means low O2 saturation). On the other hand, patients in cluster 1 were younger with fewer background diseases but had worse COVID-19-related variables, such as high CRP, worse chest X-ray findings and lower O2 saturation.

Figure 2

Radar plot for selected variables in clusters. Each variable is normalized to [0.2, 1], where 0.2 is the lowest cluster average/percentage and 1 is the highest. For ease of interpretation, the scale of some variables was reversed so that an increase from 0.2 to 1 always indicates a worse condition. Underlined variables were reversed. *Echocardiography features: E/e average, left ventricle feature; At, right ventricle feature; SVI, stroke volume index. **Hypertension and dementia are shown as examples of past diseases. Other past diseases showed similar trends (Supplementary 3). CRP, C-reactive protein, a marker of inflammation; MEWS, COVID-19 Modified Early Warning Score for clinical deterioration28; eGFR, estimated glomerular filtration rate.

Interestingly, sex differed significantly among the clusters. Cluster 2 is significantly enriched with female patients and is the only cluster with a majority of females, while clusters 1 and 3 have a percentage of females below the cohort average (37%). A comparison of outcomes between males and females aged 80 and above in the full cohort showed that males were significantly more likely to receive respiratory support (p-value = 0.02), which is a sign of deterioration. All other outcomes were worse in males but not significantly so, perhaps due to the small sample size. For full details see Supplementary 7.

We also wished to evaluate to what extent the clusters differ in terms of in-hospital mortality. Figure 3 shows the Kaplan–Meier survival curves of the four subgroups. We can see three distinct survival patterns, with no deaths in cluster 0, an intermediate survival pattern for clusters 1 and 2, and the worst survival in cluster 3. The log-rank test to assess differences in survival functions across clusters produced a highly significant p-value of 2.73e−12. The c-index for the Cox proportional hazard model (see “Methods”) was 0.77, which is relatively high.

Figure 3

Kaplan–Meier survival curves for each cluster for the event “In-hospital mortality”. P is the log-rank p-value.

Note that clusters 1 and 2 are very different in terms of age but have similar survival plots. For a detailed comparison of clusters 1 and 2, see Supplementary 5. While many parameters seem to be age related, Fig. 3 suggests that the course of the disease is not entirely dominated by age.

Figure 4 shows the fraction of patients in each cluster who received respiratory and hemodynamic support. Patients in Clusters 1 and 3 suffered from severe disease, and therefore, a higher percentage of them were treated with respiratory or hemodynamic support (drug or mechanical).

Figure 4

Percentage of patients who received respiratory support and hemodynamic support in each cluster.

Clusters 1–3 included a substantial number of patients who needed respiratory support. To test whether their risk differed across clusters, we built a Cox proportional hazard model for in-hospital mortality based only on those patients, using cluster labels as the covariates. Cluster 0 was excluded because only one patient in that group was ventilated. Cluster 1 was used as a reference. The hazard ratios were 1.25 (0.50, 3.14) for cluster 2 and 4.27 (2.42, 7.52) for cluster 3 (95% confidence interval in parentheses). Hence, ventilated patients in cluster 3 were at higher risk than those in cluster 1, although their inflammatory condition was similar at admission (CRP 129 vs. 133, see Table 1). Ventilated patients in cluster 2 were at a similar risk to those in cluster 1 despite the age difference between the groups. Although patients in cluster 2 were relatively older, they arrived in better inflammatory condition. The Cox model for the ventilated patients shows similar trends in survival to the full cohort, as was observed in the Kaplan–Meier curves in Fig. 3 (results not shown).

There were no significant differences between clusters in terms of treatment. For more details see Supplementary 8.

Echocardiography contribution

Full statistics of the echocardiography variables are presented in Table 3. To further assess the contribution of the echocardiography data to the formation of meaningful clusters, we shuffled the echocardiography values, reclustered the resulting data 50 times (see “Methods”) and recomputed the c-index and log-rank p-values in each case. The average log-rank p-value was 1.97e−06 ± 6.54e−06 with a median of 3.98e−07, far less significant than on the original data (2.73e−12, Fig. 3). The average c-index was 0.74 ± 0.01, a decline of 0.03 compared to the initial clustering with the echocardiography variables. A clustering solution computed without the echocardiography variables yielded a c-index of 0.74 ± 0.01 and a log-rank p-value of 4.66e−06 ± 5.2e−06. While the gaps are modest, they are statistically significant. Together, these tests suggest that clusters obtained with echocardiography data are more predictive of mortality.

Table 3 Statistics of all echocardiography variables.

As another test of the contribution, we repeated the process, each time choosing at random 31 of the 85 continuous variables and 2 of the 56 categorical variables and shuffling their values. The average c-index over 50 random choices was 0.73 ± 0.03, similar to that obtained when shuffling only the echocardiography parameters. The average p-value of the log-rank test was 2.29e−04 ± 8.88e−04 with a median of 2.45e−08. This means that the echocardiography data are roughly as meaningful for forming the clustering as the rest of the parameters. It also shows redundancy in the contribution of different variables, as the changes are relatively small.

Echocardiography over time

Forty-eight patients who suffered clinical deterioration underwent multiple echocardiography measurements adjacent to the time of deterioration (only the first echocardiography measurement was used as input for the clustering). The differences in selected echocardiography variables between the first and second echocardiography results are shown in Supplementary 6. Due to the small sample size, the differences are not statistically significant, but some parameters show a trend.

Discussion

In this study, we have shown that hospitalized COVID-19 patients can be divided at admission into different clusters that have significant implications regarding disease course and final prognosis.

Although some aspects of the clusters seem self-evident (e.g., young patients have a better prognosis than old patients), others are less obvious, raising interest in the model’s ability to influence our understanding of COVID-19’s pathophysiology and progression and possibly help tailor treatments based on patient characteristics. The clusters significantly differed in natural COVID-19 parameters, such as CRP, chest X-ray findings and MEWS score, suggesting that the clusters are different in terms of the state of the disease. Moreover, the clusters are also separated by mortality- and respiratory-related variables that were not part of the input data, showing good separation among the clusters in terms of these outcomes. A further analysis of the survival patterns shows that the clusters manifest very distinct survival patterns.

The main point of interest is the difference between clusters 1 and 2. Patients in cluster 1 were younger, with fewer chronic diseases and less cardiac involvement on the one hand and higher levels of inflammatory markers and markers of pulmonary involvement on the other. Cluster 2 had the highest average patient age but lower levels of inflammatory markers and, notably, was the only cluster enriched with females. This suggests that older infected males may be more prone to deterioration than females in the same age group. This is also supported by the significantly higher rate of respiratory support provided to older males than to older females across the whole cohort. An interesting finding is that pulmonary artery acceleration time was similar between the clusters, possibly signifying that it is a marker of both pulmonary dysfunction due to increased afterload caused by pulmonary vasoconstriction and intrinsic cardiac dysfunction. Although these clusters had very different starting points on admission, overall patient survival was similar in both. However, there were still significant differences in the need for hemodynamic and respiratory support, with patients in cluster 1 requiring more support. This difference might be explained by a different mechanism of deterioration in the two groups, possibly due to different pathological responses in the heart and lung systems between clusters. These differences show that the clusters are not only statistically and computationally justified but may also signify slightly different clinical entities.

Another noteworthy comparison is between cluster 1 and cluster 3. While the inflammatory state at admission was similar in the two clusters, the survival of cluster 3 was much worse, both when comparing the full clusters and when comparing only the ventilated patients. We can attribute the higher mortality to other factors, particularly age and comorbidity burden28.

Variance in clinical deterioration of clusters

Another interesting finding is the difference between clusters among patients with clinical deterioration. Patients in clusters 1 and 3 had a decrease in stroke volume upon clinical deterioration together with an increase in right ventricular diastolic area (Table 3). This combination of an increase in RV preload (increase in right ventricular diastolic area) and a decrease in cardiac output (stroke volume) suggests that in patients in clusters 1 and 3, the right ventricle worked on the flat portion of the Frank–Starling curve30, indicating significant right ventricular failure. This is in contrast with patients from cluster 2, who had an increase in stroke volume with unchanged RV area, suggesting normal RV filling pressure and high output and indicating that in these patients clinical deterioration was due to cytokine storm and vasodilatation and not to RV failure.

Effect of echocardiography on clusters

Although previous studies have tried to cluster COVID-19 patients, none have used echocardiography in the process. As previously demonstrated, echocardiography has added value over clinical characteristics alone in determining prognosis. In this study, we have shown that echocardiography data contributed to patient clustering in a modest but statistically significant manner and that clustering performed using echocardiographic data yields clusters that are better at predicting prognosis. However, because conducting echocardiography is far more complex than obtaining the other clinical variables, its contribution, although significant, does not seem to justify the additional collection burden.

Limitations

The study was performed only on patients who were admitted to the hospital and thus cannot be generalized to the entire population of patients with COVID-19. However, hospitalized patients were needed to perform echocardiography on each patient, and admitted patients are usually higher risk patients and thus of higher interest to study.

Furthermore, the study was performed on patients admitted during the first wave of COVID-19, before the emergence of new treatments, the availability of vaccines and the appearance of new strains of COVID-19, such as the Delta and Omicron strains, and thus may not be relevant to the current manifestations of disease with newer treatments, vaccines and strains. Our group plans to perform a similar study on current patients to evaluate the clusters in a contemporary setting.

Additionally, this study was performed on patients admitted only to one center, and the results may not be generalizable to other cohorts of patients. Although we tried to find validation cohorts to compare the results and prevent overfitting, none were available that had comprehensive echocardiographic data. Furthermore, we used consensus clustering and carefully chose the clustering algorithm’s hyperparameters to ensure robust clustering results.

Regarding the echocardiography contribution, although the current results do not support performing echocardiography at baseline in all patients, we did not include direct cardiac complications (such as myocardial infarction, myocarditis, and cardiac arrhythmias) as outcomes. Therefore, we cannot estimate the predictive ability of baseline echocardiography and cluster assignment for these conditions.

In conclusion, hospitalized COVID-19 patients were segregated into different clusters according to demographic, clinical and echocardiographic data at admission. These clusters of patients showed different disease courses and proved valuable in determining prognosis.