Introduction

Coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a pandemic with widespread increased mortality1. The spectrum of COVID-19 severity is broad and ranges from an asymptomatic and mild presentation to severe and critical illness2,3,4,5. There is increasing awareness of the cardiovascular manifestations of COVID-19 and their adverse effects on disease prognosis6. Acute cardiac injury has been reported in 8–62% of patients hospitalized with COVID-19 and is associated with greater disease severity, including the need for mechanical ventilation and death7,8,9.

Because of rapid fluctuations in infection rates and limitations in medical systems, the demand for tertiary medical services has increased. However, it is difficult to identify at the initial stage whether a patient will have a good or poor prognosis, especially when patients are treated at home. Therefore, the early prediction of disease severity and prognosis has an important effect on clinical outcomes in patients with COVID-19.

SARS-CoV-2 infections result in electrocardiographic changes that enable the use of artificial intelligence-enhanced electrocardiography (AI-ECG) as a rapid screening test with a high negative predictive value2. Thus, AI-ECG may become a leading tool to assess the extent of cardiac involvement in patients with COVID-19, owing to its low cost, the feasibility of point-of-care testing, and the possibility of remote evaluations10. COVID-19 results in recognizable changes in the AI-ECG, and the absence of these changes can help exclude an acute coronavirus infection, facilitating point-of-care screening.

The rapid influx of COVID-19 hospitalizations has placed a heavy load on the limited healthcare system; therefore, an efficient and streamlined risk stratification tool is required to predict the prognosis of patients. In this study, we aimed to assess whether AI using initial 12-lead ECGs could assist in the early prediction of COVID-19 disease severity.

Methods

Study population

We enrolled 1,453 adult patients (aged ≥ 18 years) who were diagnosed with COVID-19 and admitted to our tertiary hospital between March 2020 and June 2022. COVID-19 was diagnosed if the patient had a positive result in the SARS-CoV-2 polymerase chain reaction test. We included patients with comprehensive data, encompassing a 12-lead ECG indicating sinus rhythm, laboratory parameters, oxygen requirement status, clinical course, and outcomes. These patients had also undergone an ECG prior to any severity classification and before transitioning from the emergency department (ED) to COVID-19 dedicated wards, as our objective was to evaluate the predictive value of the AI algorithm using ECG data for early assessment of severity in COVID-19 patients. Patients with atrial fibrillation or atrial flutter were excluded due to their potential association with adverse clinical outcomes in COVID-19, which could introduce a confounding bias in prognosis prediction. We also excluded patients without 12-lead ECG data, those with discrepant admission data, those with an unclear discharge status, and those who had undergone ECG after transitioning from the ED to dedicated isolation rooms or the intensive care unit (ICU), implying their severity classification had already been determined. Dataset A, acquired from March 2020 to December 2021, was used for development and internal validation, and dataset B, acquired from January 2022 to June 2022 after the development of the AI, was used for external validation (Fig. 1).
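The temporal split between the two datasets can be illustrated with a minimal sketch. The month boundaries come from the text; the exact first and last days, the record layout, and the function name `assign_dataset` are assumptions made only for illustration.

```python
from datetime import date

# Month boundaries are from the text; the exact days are assumed.
DATASET_A_START = date(2020, 3, 1)
DATASET_A_END = date(2021, 12, 31)
DATASET_B_END = date(2022, 6, 30)


def assign_dataset(ecg_date):
    """Return 'A' (development and internal validation), 'B' (external validation), or None."""
    if DATASET_A_START <= ecg_date <= DATASET_A_END:
        return "A"
    if DATASET_A_END < ecg_date <= DATASET_B_END:
        return "B"
    return None  # outside the enrollment window


# Hypothetical records, not study data.
records = [
    {"patient_id": "p1", "ecg_date": date(2020, 11, 5)},
    {"patient_id": "p2", "ecg_date": date(2022, 2, 14)},
    {"patient_id": "p3", "ecg_date": date(2023, 1, 1)},
]
split = {r["patient_id"]: assign_dataset(r["ecg_date"]) for r in records}
print(split)
```

Splitting by acquisition date rather than at random means that the external validation set contains only ECGs recorded after the development period.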

Figure 1

Study flow diagram showing the selection of patients with COVID-19 and the creation of the study datasets. ECGs were allocated to the training, internal validation, and external validation datasets using datasets A and B. ECG electrocardiography; COVID-19 coronavirus disease 2019.

The Institutional Review Board of the Inha University Hospital (2021-10-006) approved the study protocol and waived the need for informed consent owing to the impracticality of obtaining consent and the minimal harm resulting from the study. The study complied with the principles of the Declaration of Helsinki.

COVID-19 severity classification

After transitioning from the ED to the COVID-19 dedicated wards, we classified patients based on the World Health Organization guideline into two categories: group 1 with mild-to-moderate illness, defined as requiring no oxygen therapy or only low-flow oxygen therapy (< 5 L via nasal prongs); and group 2 with severe-to-critical illness, characterized by the need for high-flow oxygen, continuous positive airway pressure, invasive mechanical ventilation, or extracorporeal membrane oxygenation (ECMO)11,12,13.
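The two-group rule above can be written as a small function. This is a sketch under stated assumptions: the category labels, the unit (L/min), and the choice to place nasal-prong flows of 5 L or more in group 2 are our reading of the text, not a rule stated by the guideline.

```python
# Support types that define group 2 in the text (the label strings are hypothetical).
HIGH_LEVEL_SUPPORT = {"high_flow_oxygen", "cpap", "invasive_ventilation", "ecmo"}
LOW_FLOW_LIMIT = 5.0  # threshold from the text; the unit L/min is assumed


def severity_group(support, flow_l_per_min=0.0):
    """Return 1 for mild-to-moderate illness and 2 for severe-to-critical illness."""
    if support in HIGH_LEVEL_SUPPORT:
        return 2
    if support == "none":
        return 1
    if support == "nasal_prongs":
        # Assumption: a flow at or above the limit no longer counts as low-flow.
        return 1 if flow_l_per_min < LOW_FLOW_LIMIT else 2
    raise ValueError("unknown respiratory support: %r" % (support,))


print(severity_group("none"), severity_group("nasal_prongs", 3.0), severity_group("ecmo"))
```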

Data collection and covariates

All ECG data were acquired at a sampling rate of 500 Hz using a GE-Marquette ECG machine (General Electric Healthcare, Chicago, Illinois, United States). The raw data were stored as XML documents using the MUSE data management system in relational databases. All ECG data were manually adjudicated by two electrophysiologists. We included the demographic, laboratory, clinical, and ECG covariates in our prediction models. The demographic covariates included age, sex, ethnicity, and insurance type, and the vital signs included oxygen saturation, mean blood pressure, body temperature, and ventricular rate.

AI algorithm model for predicting COVID-19 severity

The AI algorithm was developed using Long Short-Term Memory Fully Convolutional Networks (LSTM-FCN) to manage sequential data reflecting the ECG characteristics. With an attention mechanism, the AI algorithm can automatically capture the most important ECG characteristics and classify the data. We extracted and analyzed the XML data from the MUSE data management system, and to minimize the artifacts, all data files were stored in the XML format on a GE ECG machine (General Electric Healthcare, Chicago, Illinois, United States).

The ECGs were originally recorded from 12 leads; however, because of the device’s data storage method, only data from eight leads were stored, excluding leads III, aVR, aVL, and aVF. The data for those four leads can be calculated from the stored leads using simple arithmetic operations, and it is common to apply these processes to approximate the data14. Therefore, only the eight recorded signals of leads I, II, V1, V2, V3, V4, V5, and V6 were used in this study. The signals from each lead were simultaneously measured for 10 s, and when the Base64-encoded value was read, eight one-dimensional arrays for each XML file were obtained. As a 10-s signal has multiple pulses and heart rate varies from person to person, we obtained approximately 10 or more pulses per person (Fig. 2). We specified the position of the P, QRS, and T waves and analyzed those waves separately to avoid any bias from the variable heart rate. We used an algorithm to detect the R peaks, and the P, QRS, and T waves were located afterwards. We then analyzed each wave using AI, which calculated the scores for each wave. The final result was the mean of these scores. Additionally, we utilized the Class Activation Map (CAM) to highlight the ECG segments that contributed significantly to the severity classification in COVID-19 patients15.
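The arithmetic relating the four unstored leads to leads I and II can be sketched as follows, using the standard Einthoven and Goldberger relations (III = II − I, aVR = −(I + II)/2, aVL = I − II/2, aVF = II − I/2). The study itself used only the eight stored leads, so this block only illustrates why the remaining four carry no independent information. The function names are hypothetical, and the sample encoding (signed 16-bit little-endian integers) is an assumption about the stored Base64 payload, not something stated in the text.

```python
import base64
import struct


def decode_lead(b64_text):
    """Decode Base64 text into a list of samples.

    Assumption: samples are signed 16-bit little-endian integers.
    """
    raw = base64.b64decode(b64_text)
    count = len(raw) // 2
    return list(struct.unpack("<%dh" % count, raw[: count * 2]))


def derive_limb_leads(lead_i, lead_ii):
    """Compute leads III, aVR, aVL, and aVF sample by sample from leads I and II."""
    pairs = list(zip(lead_i, lead_ii))
    return {
        "III": [b - a for a, b in pairs],
        "aVR": [-(a + b) / 2 for a, b in pairs],
        "aVL": [a - b / 2 for a, b in pairs],
        "aVF": [b - a / 2 for a, b in pairs],
    }


# Toy samples (arbitrary units), not study data.
lead_i = [100, 200, -50]
lead_ii = [300, 100, 50]
encoded_i = base64.b64encode(struct.pack("<3h", *lead_i)).decode("ascii")
decoded_i = decode_lead(encoded_i)
derived = derive_limb_leads(decoded_i, lead_ii)
print(derived["III"])
```

A quick consistency check on any derived set is that aVR + aVL + aVF equals zero at every sample.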

Figure 2

Description of the artificial intelligence algorithm for predicting the severity in patients with COVID-19. COVID-19 coronavirus disease 2019.

Confirmation of the performance of AI-ECGs for predicting COVID-19 severity

We trained and validated the AI-enhanced ECGs to assess the severity of COVID-19 in patients who underwent their initial ECG at our ED before severity classification. We tested the accuracy of the AI-ECG using an external dataset. We used the area under the receiver operating characteristic curve (AUROC) to confirm the accuracy of the developed AI-ECG. The AUROC of the AI-ECG for detecting severe-to-critical illness in COVID-19 patients was compared with those of the Early Warning Scores (EWSs), including the Modified Early Warning Score (MEWS), National Early Warning Score (NEWS), and the Worthing Physiological Scoring System (WPS). Those scores were calculated after relevant data assessment16,17,18.
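For readers less familiar with the AUROC, the following sketch computes it as the probability that a randomly chosen severe-to-critical patient receives a higher score than a randomly chosen mild-to-moderate patient, with ties counted as one half. It is a generic illustration that applies equally to a model output or an early warning score; it is not the software used in the study, and the labels and scores are invented.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = severe-to-critical, 0 = mild-to-moderate.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(round(auroc(labels, scores), 3))
```

The pairwise form is quadratic in the number of patients, which is acceptable for cohorts of this size; rank-based implementations give the same value faster.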

Statistical analysis

Continuous variables are reported as means ± standard deviations or medians and interquartile ranges, and categorical variables are presented as frequencies and percentages. Comparisons between groups were performed using the independent sample t-test or chi-square test. The performance of the AI model was measured using the AUROC, accuracy, recall (sensitivity), specificity, and F1 score. Recall is the ratio of correctly predicted positive observations to all actual positive observations, while the F1 score (balanced F-score) is the harmonic mean of the precision and recall. In addition, to predict mortality during admission in COVID-19 patients, we performed a Cox proportional-hazards regression analysis. For all variables, p < 0.05 was considered statistically significant. Statistical analyses were performed using SPSS statistical software for Windows (version 21.0; IBM, Armonk, New York, United States).
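The threshold-based metrics named above follow directly from the confusion matrix. The sketch below is a generic illustration with invented labels; the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here for simplicity.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, recall, specificity, precision, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0  # convention: undefined ratios become 0.0

    recall = ratio(tp, tp + fn)          # sensitivity: TP / all actual positives
    specificity = ratio(tn, tn + fp)     # TN / all actual negatives
    precision = ratio(tp, tp + fp)       # TP / all predicted positives
    f1 = ratio(2 * precision * recall, precision + recall)  # harmonic mean
    accuracy = ratio(tp + tn, len(pairs))
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Toy labels: 4 positives and 6 negatives, with one miss and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```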

Results

Patient characteristics

The baseline characteristics, comorbidities, and laboratory and electrocardiographic findings of the enrolled patients are shown in Table 1. The mean age of the 1,453 participants was 59.7 ± 20.1 years, and 54.2% of the patients were male. Group 1 (mild-to-moderate illness, requiring no oxygen therapy or only low-flow oxygen therapy) included 892 patients, while group 2 (severe-to-critical illness, requiring more than low-flow oxygen therapy [< 5 L via nasal prongs]) included 561 patients. For both datasets A and B, the proportions of patients with hypertension (p < 0.001), diabetes mellitus (p < 0.001), and stroke (p < 0.001) were significantly greater in group 2 than in group 1. Regarding the laboratory findings, the white blood cell and platelet counts and the C-reactive protein, N-terminal pro-B-type natriuretic peptide, creatine phosphokinase, creatine kinase-MB, blood urea nitrogen, and serum creatinine levels were also higher in group 2 than in group 1 in datasets A and B. On comparing the ECG findings between the two groups, we found that the patients in group 2 had a higher heart rate, longer QRS duration, and longer corrected QT (QTc) interval than those in group 1.

Table 1 Patient characteristics and laboratory and electrocardiographic findings at enrollment.

Clinical outcomes and the EWS according to the COVID-19 classification

The in-hospital mortality rate was 8.3% (121 patients), and all patients who died belonged to group 2. The proportions of heart failure, ICU care, invasive mechanical ventilation, and ECMO were significantly higher in group 2 than in group 1 (p < 0.001). Overall, the duration of hospitalization was significantly longer in group 2 than in group 1 (p < 0.001; Table 2). In both datasets A and B, the MEWS, NEWS, and WPS scores were significantly higher in group 2 than in group 1 (p < 0.001; Table 3).

Table 2 Clinical outcomes according to the COVID-19 classification.
Table 3 A comparison among the Modified Early Warning Score, National Early Warning Score, and Worthing Physiological Scoring System according to disease severity in patients with COVID-19.

Performance of the AI model for predicting the severity and prognosis of COVID-19

During the internal and external validations, the AUCs of the AI model for predicting severe-to-critical illness in patients with COVID-19 were 0.725 (95% CI: 0.712–0.738) and 0.729 (95% CI: 0.724–0.734), respectively (Fig. 3A,B; Table 4). During the external validation, the AUCs of the MEWS, NEWS, and WPS for detecting severe-to-critical illness in patients with COVID-19 were 0.714 (95% CI: 0.672–0.756), 0.822 (95% CI: 0.786–0.858), and 0.795 (95% CI: 0.757–0.833), respectively. As shown in Fig. 3, the AI tool combined with the EWS showed reliable performance for identifying patients with severe-to-critical COVID-19, with an AUC of 0.802 (95% CI: 0.798–0.806) during internal validation and 0.833 (95% CI: 0.830–0.835) during external validation.

Figure 3

Multiclass ROC curves with deep neural networks. (A) Internal validation for predicting the severity of COVID-19 patients using dataset A. (B) External validation for predicting the severity of COVID-19 patients using dataset B. COVID-19 coronavirus disease 2019; ROC receiver operating characteristic.

Table 4 AI model performance for predicting COVID-19 severity in hospitalized patients.

AI-ECG as a significant predictor of mortality risk in admission of COVID-19 patients

Table 5 presents the analysis of risk factors associated with mortality in COVID-19 patients during hospitalization. In the Cox proportional-hazards models for mortality during admission, after adjusting for age, sex, and relevant variables, including the EWS systems, the AI-ECG was a significant predictor of mortality, with a hazard ratio of 2.019 (95% CI: 1.156–3.525, p = 0.014; Table 5).
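As a reading aid, a hazard ratio and its confidence interval relate to the underlying Cox coefficient as HR = exp(β) and CI = exp(β ± z·SE). The sketch below back-calculates an approximate standard error from the reported interval and then reconstructs the interval. It assumes a Wald-type interval that is symmetric on the log scale, which the text does not state, so the recovered SE is an approximation rather than a reported value. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def se_from_ci(lower, upper, z=Z_95):
    """Approximate the coefficient's standard error from a Wald-type CI of the hazard ratio."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def hazard_ratio_ci(beta, se, z=Z_95):
    """Return (HR, lower, upper) from a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Values reported in the text for the AI-ECG.
beta = math.log(2.019)
se = se_from_ci(1.156, 3.525)  # derived here under the Wald assumption, not reported
hr, lower, upper = hazard_ratio_ci(beta, se)
print(round(hr, 3), round(lower, 2), round(upper, 2))
```

The reconstructed bounds agree with the reported interval to within rounding, which is consistent with an interval computed on the log scale.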

Table 5 Cox regression analysis for mortality in admission of COVID-19 patients.

ECG wave analysis using class activation maps

We generated CAMs to visualize the ECG waveform regions that the model relied on across severity classifications and to better understand the impact of COVID-19 on the ECG. As illustrated in the Supplementary Figure, the activation map identified the P wave, the onset of the QRS complex, and the T wave as pivotal regions for patients with mild-to-moderate illness, while the QRS complex and the T wave were prominently highlighted for patients with severe-to-critical illness.

Discussion

We developed a new AI algorithm using initial 12-lead ECGs to identify disease severity and prognosis in patients hospitalized with COVID-19. The algorithm demonstrated reasonable accuracy for internal and external validations. To the best of our knowledge, this is the first study to develop a deep neural network that assesses the severity of COVID-19 based on initial ECGs at admission. Our algorithm can help identify patients who are more likely to develop severe-to-critical illness, thus enabling the effective deployment of medical resources and provision of adequate patient care in the early stages of a large-scale outbreak. Our AI algorithm showed the predictive value of an ECG in identifying COVID-19 severity using a deep learning algorithm. Compared to the previously commonly used physiological scoring systems, the AI-ECG had reliable performance in estimating the severity of COVID-19 in patients. The AI-ECG, combined with the EWS, performed better in predicting the severity of COVID-19 (AUC of 0.833 [95% CI: 0.830–0.835], recall of 0.747, F1 score of 0.747, and overall accuracy of 0.745) than the previous physiological scoring systems.

Efficient initial patient triage using the AI-ECG

A prior AI model (using a single 12-lead ECG) was created to develop a screening test to exclude those with COVID-19 infection from the general population2. Our AI algorithm may support physicians’ decision-making regarding patient referral and assist in screening patients at high risk of progressing to severe disease within the limitations of medical resources. Rapid and accurate point-of-care testing using this AI method can improve patient prognosis by focusing on effective critical care treatment in a limited healthcare system. Furthermore, AI-ECG algorithms have the potential to be applied to recently available smartphones and wearable ECGs. Therefore, AI-ECG provides a fast, reliable, efficient, inexpensive, harmless, and easily accessible method for severity screening and predicting the prognosis of COVID-19. Further, in response to the pandemic, most countries have established community treatment centers for COVID-19 patients or advocated for home isolation to manage medical resources efficiently, particularly regarding bed availability. The rapid clinical deterioration typically experienced by COVID-19 patients, often progressing within a few days from disease onset, underscores the importance of timely transfers from these facilities to hospitals equipped to manage severe to critical conditions19,20,21. The use of relatively simple, non-invasive, and cost-effective examinations, like an ECG, can be advantageous in these circumstances. This study was conducted with the anticipation that this approach would facilitate the efficient allocation of medical resources and consequently improve patient prognoses in upcoming pandemic scenarios similar to COVID-19.

Impact of COVID-19 on ECG

In this study, patients with severe-to-critical illness had a higher heart rate and longer PR interval, QRS duration, and corrected QT interval than patients with mild-to-moderate illness. This may be explained by the effect of coronaviruses on both cardiac function and electrophysiology22,23,24. COVID-19 affects the QT interval independently of factors that may cause QT prolongation; additionally, it is associated with severe cardiac inflammation and renin–angiotensin system activation, known to affect repolarization18, 23, 25, 26. Therefore, acute COVID-19 may affect the ECG in subtle and multiple ways27. Furthermore, cardiac depolarization and repolarization are complex and delicate processes that can be affected by cardiac dysfunction, metabolic and electrolyte imbalances, and medications, which are factors that affect patients with COVID-19. Moreover, QT prolongation is also a marker of systemic illness severity and increased mortality, as well as an independent risk factor for sudden death both in the general population and those in the ICU22.

Previous studies indicate that several ECG changes, such as prolonged PR interval, P wave duration, QT interval, and left ventricular hypertrophy, have been identified in ICU patients who died28. Heart failure and asymptomatic severe left ventricular dysfunction have both been successfully detected by deep neural networks based on the ECG29. Analyzing ECG waveforms of COVID-19 patients across severity classifications, our CAM analysis revealed distinct patterns. In patients with mild-to-moderate illness, the algorithm highlighted the importance of the P wave, the onset of the QRS complex, and T wave. However, the QRS complex and the T wave emerged as critical areas for those with severe-to-critical disease. Although we cannot fully understand and interpret the decision-making approach in deep learning algorithms due to the “black box” limitation, our results from this analysis support the assumption that ECG changes in mild-to-moderate illness are related to atrial electrical abnormalities, early alterations in ventricular depolarization patterns, and ventricular repolarization abnormalities. Conversely, the severe-to-critical disease exhibited more extensive ventricular depolarization and repolarization abnormalities. These observations suggest atrial and ventricular electrical remodeling and their potential impact on the decision-making process in deep learning algorithms30. Thus, such electrocardiographic changes may help with the risk stratification of severity and prognosis in patients with COVID-19.

AI-ECG and previous early warning scoring systems predict the severity in patients with COVID-19

EWSs are widely used in clinical practice to help doctors estimate the risk of deterioration, monitor the patient's evolution, and make clinical decisions to enhance the critical patient's safety. Many EWS models have been developed, including the NEWS, MEWS, and WPS31. These models are based on the effects of COVID-19 on the cardiovascular and pulmonary systems and several extrapulmonary organs32. However, limitations in assessing the vital signs, consciousness, oxygen saturation, and other indirect indicators may be overcome by the AI-based approach based on the ECG.

In a recent study, the AUROCs for the NEWS and MEWS in predicting mortality were shown to be 0.809 (95% CI: 0.727–0.891) and 0.670 (95% CI: 0.573–0.767), respectively31. We demonstrated a reasonable accuracy of COVID-19 severity prediction in both internal and external validations. In our study, the developed AI using the initial ECG combined with the EWS for detecting severe-to-critical illness in COVID-19 presented a better performance compared with that of the physiologic scoring systems, MEWS, NEWS, and WPS (AUC of 0.833 [95% CI: 0.830–0.835]). In the early stage of COVID-19, ECG-based AI demonstrated better performance in predicting the progression to severe-to-critical illness than the physiologic scoring systems.

This study had some limitations. First, as this was a retrospective study conducted in a single tertiary hospital in Korea, it is necessary to validate the model with patients in other hospitals and countries. A prospective study is warranted to establish the model's usefulness as a new, feasible, and noninvasive screening tool. Second, although we used CAM to visualize ECG waveforms for COVID-19 patients across various severity classifications to better understand COVID-19's impact on the ECG, the interpretation of deep learning models and the underlying rationale of AI decision-making remain inherently challenging due to the nature of AI. Third, given the heterogeneity of the patient population, it is possible that the use of drugs that affect the ECG (e.g., antiarrhythmic drugs) may also have affected the network output. Fourth, it remains unclear whether the changes in the ECGs in the presence of a fever or acute respiratory distress associated with other infectious agents differ from those of COVID-19. Moreover, SARS-CoV-2 is constantly changing. Many notable variants have emerged, including Alpha, Beta, Delta, and Omicron, and it remains unclear whether COVID-19-related ECG changes differ when a new variant is more aggressive, more contagious, vaccine-resistant, or able to cause more severe illness than the original strain of the virus. Thus, prospective research may be required to determine whether our AI algorithm predicts severity accurately for newer variants. Fifth, despite the favorable performance of our deep learning algorithm, overcoming false positives and negatives to identify the optimal treatment and predict the prognosis remains a critical issue. Although it is difficult to fully rely on the AI-ECG, the algorithm could predict disease severity using the initial 12-lead ECG, which is a rapid, simple, and inexpensive point-of-care test.
Sixth, utilizing ECGs obtained from local health centers, private clinics, and primary and secondary hospitals might potentially be more closely aligned with the initial onset following a COVID-19 diagnosis. However, almost all patients were rapidly transferred to our hospital's ED without ECGs, resulting in a minimal time discrepancy from disease onset. Seventh, while our research robustly tested our model compared to established ones and used a separate dataset for validation, the single-center nature coupled with challenges from an imbalanced dataset and limited patients underscores the need for a large-scale study. Finally, recent studies have linked COVID-19 exposure to a higher risk of adverse cardiovascular outcomes, even after recovery from acute illness33, 34. Consequently, further research with long-term follow-up in patients with COVID-19 complicated with cardiovascular involvement is required to better understand the long-term cardiovascular consequences of COVID-19 on the AI-ECG.

In conclusion, AI using the initial 12-lead ECG demonstrated reasonable performance for predicting COVID-19 severity in hospitalized patients. This AI algorithm could significantly improve COVID-19 severity screening, both efficiently and inexpensively, considering the limited availability of medical resources in a recurrent pandemic.