iCOVID: interpretable deep learning framework for early recovery-time prediction of COVID-19 patients

Wang, Jun; Liu, Chen; Li, Jingwen; Yuan, Cheng; Zhang, Lichi; Jin, Cheng; Xu, Jianwei; Wang, Yaqi; Wen, Yaofeng; Lu, Hongbing; Li, Biao; Chen, Chang; Li, Xiangdong; Shen, Dinggang; Qian, Dahong; Wang, Jian

doi:10.1038/s41746-021-00496-3

Download PDF

Article
Open access
Published: 16 August 2021

iCOVID: interpretable deep learning framework for early recovery-time prediction of COVID-19 patients

npj Digital Medicine volume 4, Article number: 124 (2021) Cite this article

4770 Accesses
10 Citations
4 Altmetric
Metrics details

Subjects

Abstract

Most prior studies focused on developing models for the severity or mortality prediction of COVID-19 patients. However, effective models for recovery-time prediction are still lacking. Here, we present a deep learning solution named iCOVID that can successfully predict the recovery-time of COVID-19 patients based on predefined treatment schemes and heterogeneous multimodal patient information collected within 48 hours after admission. Meanwhile, an interpretable mechanism termed FSR is integrated into iCOVID to reveal the features greatly affecting the prediction of each patient. Data from a total of 3008 patients were collected from three hospitals in Wuhan, China, for large-scale verification. The experiments demonstrate that iCOVID can achieve a time-dependent concordance index of 74.9% (95% CI: 73.6-76.3%) and an average day error of 4.4 days (95% CI: 4.2-4.6 days). Our study reveals that treatment schemes, age, symptoms, comorbidities, and biomarkers are highly related to recovery-time predictions.

Generalizable machine learning approach for COVID-19 mortality risk prediction using on-admission clinical and laboratory features

Article Open access 10 February 2023

Predicting critical state after COVID-19 diagnosis: model development using a large US electronic health record dataset

Article Open access 20 July 2021

A machine learning model for predicting deterioration of COVID-19 inpatients

Article Open access 16 February 2022

Introduction

Since the outbreak of coronavirus disease 2019 (COVID-19), artificial intelligence (AI) has played an essential role in the global fight against the pandemic, including (1) contactless telehealth systems for remote diagnosis to protect doctors and patients from the high risk of viral exposure¹ and (2) computer-aided diagnosis of the infection based on X-ray or computed tomography (CT) images to reduce the workload of healthcare workers^{2,3,4,5,6,7,8,9}. In clinical practice, it is routine for COVID-19 patients to undergo various laboratory examinations, such as blood tests, liver function tests, and CT scans. Meanwhile, patients may suffer from different symptoms^10,11 and comorbidities¹², producing large quantities of heterogeneous multimodal clinical data. Such heterogeneity represents a substantial challenge for clinicians aiming to manually analyze the complicated clinical information and provide an appropriate treatment scheme for patients. Consequently, there is a great need for automatic data analysis methods to aid clinical treatment planning for COVID-19, which has also received widespread attention over the past year^{13,14,15,16,17}. Some studies have demonstrated that biomarkers, symptoms, comorbidities, and even CT images can be applied for various prognostic prediction tasks, including the prediction of mortality risk^18,19, progression to a severe or critical state^17,20,21,22, and intensive care unit admission^23,24.

The modeling methods used in the above-mentioned studies can be roughly classified into the following two categories: (1) pure nonlinear methods^21,22 and (2) linear and nonlinear hybrid methods^17,19,20. The former directly construct deep learning models using heterogeneous multimodal data for specific tasks. For example, Ning et al.²² fused image features extracted by a deep convolutional neural network (DCNN) with other clinical features for severity-level prediction of patients. Deep learning methods can build a nonlinear relationship between the model inputs and the corresponding outputs, which can achieve promising performance. However, deep models are black boxes lacking the interpretability of the prediction results²⁵. Generally, clinicians are eager to know the clinical factors that are highly related to the prediction result rather than simply the prediction result. In contrast, hybrid methods are more practical. These methods usually first use linear analysis methods (e.g., multivariable regression or LASSO regression) to select statistically linear-significant clinical features and then train machine learning or deep learning models via the preselected features. However, these statistical analysis methods still cannot provide individual interpretability of the model prediction of each patient.

More importantly, the disease status of COVID-19 patients changes over time, i.e., a dynamic process of mutual influence between treatments and patient covariates (i.e., symptoms, comorbidities, and biomarkers)²⁶. However, treatment information was not considered in the model developed in most prior studies, and the models were only implemented as classification tasks^27,28, e.g., severity-level classification. A more practical model should focus on directly predicting the recovery time of patients based on treatment information rather than only classifying patients’ severity levels. However, it is challenging to construct models for this purpose mainly due to the following complicated characteristics of clinical data sets: (1) data sets contain a large proportion of patients with unknown outcomes who were transferred to other hospitals and thus lost to follow-up (so-called censored data in the survival analysis field). Thus, how to reasonably utilize these data when constructing models for recovery-time prediction remains problematic; and (2) there are individual differences in recovery times among similar patients. For example, two similar patients might have different recovery times even if they were treated with identical treatment schemes. It might be difficult to converge a model at the training stage owing to this time-variant issue. To avoid the above-mentioned issues, Cox’s proportional hazard (CPH) model, which is the most commonly used method in the survival analysis field²⁹, assumes a time-invariant linear combination of patients’ clinical features to simplify the model construction at the cost of poor performance.

In this study, we present an end-to-end deep learning framework termed iCOVID that considers treatment information for the early prediction of COVID-19 recovery time (Fig. 1a). iCOVID can fully use heterogeneous multimodal data (i.e., CT images, biomarkers, symptoms, comorbidities, and treatment information) from patients with different outcomes to learn the time-variant nonlinear relationship between the data and predictions. Furthermore, a feature significance ranking (FSR) mechanism is proposed to learn the nonlinear regression coefficients reflecting the significance of each feature to the prediction outputs. Extensive experiments based on multicenter data are performed to demonstrate the effectiveness of the proposed method (Fig. 1b).

**Fig. 1: Prognostic model development and study design for recovery-time prediction.**

The main contributions of this study can be summarized as follows: (1) we develop a deep learning method (i.e., the iCOVID) for recovery-time prediction of COVID-19 patients based on a large quantity of multimodal clinical data. Particularly, treatment information is considered an important factor in our work. (2) The proposed iCOVID is a time-dependent regression model, rather than a classification model, that can predict a “recovery probability distribution” within a time range since admission (see Fig. 1a). (3) An interpretable mechanism (i.e., the FSR) is designed to learn the significance of clinical features in an end-to-end manner, thereby avoiding the preselection of features.

Results

Data acquisition and preparation

To develop and evaluate iCOVID, we built a relatively large-scale data set containing retrospective data collected from a total of 2530 COVID-19 patients from Huoshenshan Hospital in Wuhan, China. From each patient, we collected the following information: (1) used treatment schemes, (2) primitive CT scans, (3) clinical features, (4) severity-level, (5) patient outcome (recovered, decreased, or censored), and (6) outcome occurring days since admission. Each treatment scheme consisted of 19 types of drugs or treatment tools, while the clinical features included two demographics (age and gender), 10 types of symptoms, 7 types of comorbidities, and 27 types of biomarkers (Supplementary Tables 1–2). In this study, all patients were randomly divided into subsets for fivefold cross-validation (Supplementary Fig. 1d). To test the generalization of iCOVID, we also built two additional cohorts as external validation sets with data collected from two hospitals in the epicenter of Wuhan (Taikang Tongji Wuhan Hospital, and Hubei Maternity and Child Healthcare Guanggu Hospital). The patient statistics are summarized in Fig. 2a and Supplementary Fig. 1a–c. More details of the data acquisition are provided in the Methods section.

Formally, each patient can be defined as a tuple $\left( {\overrightarrow {\boldsymbol{\tau}} _n,{{{\boldsymbol{I}}}}_n,\overrightarrow {\boldsymbol{x}} _n,s_n,\sigma _n,t_n} \right)$ representing the above-mentioned six categories of information. Here, $\overrightarrow {\boldsymbol{\tau}} _n$ is a 19-dimensional vector of treatment schemes, with each element represented by a binary value of 1 or 0, indicating whether a specific treatment or drug was used for the patient. I_n is an image matrix of the CT scan. $\overrightarrow {\boldsymbol{x}} _n$ refers to a vector consisting of the 46 clinical feature values. s_n is an indicator of the severity-level as follows: mild|s_n = 0, moderate|s_n = 1, severe|s_n = 2, and critical|s_n = 3. σ_n is an indicator of the outcome type as follows: censored data |σ_n = 0, recovery|σ_n = 1, and death|σ_n = 2. t_n is the day on which the outcome occurred. Figure 2b shows some examples of the tuple information. Figure 2c lists the top 15 clinical features identified via the FSR mechanism, which are highly related to the recovery-time prediction of COVID-19 patients.

Network architecture for recovery-time prediction

Figure 3a illustrates the main architecture of iCOVID, which incorporates treatment schemes, lung CT images, and clinical features as inputs. Convolutional features are extracted from the lung images using the VGG-16 network³⁰, which are then combined with clinical features and treatment schemes using fully connected layers for recovery-time prediction. The output component is a softmax layer with T neurons estimating a probability distribution ${\overrightarrow {P}} = \left[ {P_1, \ldots ,P_t, \ldots ,P_T} \right]$ within a predefined day range {1, 2, …, T} for each patient. In this expression, each element $P_t \in [0,1]$ indicates the possibility of recovery on the t^th day after admission. Considering that the number of patients who required >30 days to recover was generally low (see Supplementary Fig. 1c), we assumed that the recovery day of patients who recovered after 30 days was 31 and that of patients who died was 32. Hence, the maximum day T was set to a value of 32.

**Fig. 3: Framework architecture and predictions of four patient examples.**

To address the “black box” issue of the deep model, the FSR mechanism is incorporated in the framework as a subnetwork to estimate the significance of each clinical feature for the final predictions. Specifically, the FSR can automatically produce a weighting vector (denoted by ${\overrightarrow {\boldsymbol{\omega}}} = \left[ {\omega _1,\omega _2, \ldots ,\omega _K} \right],K = 46$) for each input clinical feature vector (i.e., ${\overrightarrow {\boldsymbol{x}}} = \left[ {x_1,x_2, \ldots ,x_K} \right],K = 46$), where each element in the weighting vector represents the significance of the corresponding clinical feature. This mechanism allows us to determine the most significant clinical features for the prediction of each patient. The FSR can be trained with the whole framework end-to-end using a multi-event loss function that comprises four losses, i.e., the censor, recover, death, and ranking losses in Fig. 3a. The former three losses are designed to address censored, recovered, and decreased patients, whereas the ranking loss³¹ is introduced to address the time-variant issue among recovered patients. The relevant details are further discussed in the Methods section.

In clinical applications, the probability distribution (i.e., the softmax output ${\overrightarrow {\boldsymbol{P}}}$) produced by iCOVID can aid in visually assessing the risk of patients. Figure 3c plots the probability distribution of four patients as follows: patients #1–#3 recovered on the 6^th, 15^th, and 20^th day, and patient #4 died on the 13^th day after admission (see Fig. 3b; the black horizontal lines and the red triangles indicate normal and abnormal biomarkers, respectively). The predicted recovery day of each patient can be estimated by the specific day with the maximum probability, i.e., ${{{\mathrm{argmax}}}}( {\overrightarrow {\boldsymbol{P}} })$ (see the peak highlighted by the vertical dashed lines in Fig. 3c). In addition, calculating the cumulative incidence function (CIF measures the possibility of a patient recovering within a specific time range, see Eq. 4 in Methods) can assess patient risk more reliably. For example, patient #2 can be considered to have a higher risk than patient #1, as the CIF $P\left( {t \le 10} \right) = 25.4{{{\mathrm{\% }}}}$ of patient #2 is much smaller than $P\left( {t \le 10} \right) = 61.8{{{\mathrm{\% }}}}$ of patient #1 (see Fig. 3d). Figure 3e demonstrates the significance of nine clinical features obtained via the FSR (three for symptoms, comorbidities, and biomarkers each), revealing the important features corresponding to each patient’s prediction (the box length indicates the significance of the corresponding feature). It can be observed that the biomarker albumin (AM) and hemoglobin (HG) are important for the prediction of recovered patients #1–#3, whereas the comorbidity shock (SK) and acute respiratory distress syndrome (ARDS) play a more significant role in the identification of deceased patient #4.

Impact of treatment schemes and CT images on recovery-time prediction

We evaluated the model performance quantitatively by calculating the time-dependent concordance index (TD-CI)³², which is a variant of the ordinary concordance index (CI) that is widely used as a discriminative index for prognostic estimation. In contrast to the CI, the TD-CI considers time and thus can reflect the potential change in outcome over time (see Eq. 8 in Methods). A larger value of TD-CI indicates the superior performance of the model. Furthermore, we assume that the predicted recovery day of each patient is the day with the maximum probability in the day range (see Fig. 3c). Then, to further validate the performance, we calculated the mean absolute day error (MADE) between the predicted and real recovery day of all recovered patients. Intuitively, the smaller the MADE value, the better the prediction of the model.

To validate the impact of treatment schemes and CT images on the prediction, we designed ablation experiments of iCOVID without using any treatment scheme or any CT images (clinical features, i.e., demographics, symptoms, comorbidities, and biomarkers were used as baseline information in all models, see Methods). The statistical results tabulated in Table 1 reveal that iCOVID can achieve promising performance with a TD-CI value as high as 74.9% (95% CI: 73.6%–76.3%) and a MADE value as low as 4.4 days (95% CI: 4.2–4.6 days) for all 1969 recovered patients. However, when the treatment scheme is ignored, the performance considerably worsens (TD-CI = 69.1% and MADE = 6.0 days). This phenomenon demonstrates that the treatment scheme is indeed an important factor in estimating how long a patient requires to recover. In addition, the results demonstrate that iCOVID also achieves inferior performance in both the TD-CI and MADE metrics when CT image information is ignored. However, the absolute difference is only 0.3% for TD-CI and 0.3 days for MADE, indicating that CT images are not as significant as treatment schemes for recovery-time prediction. Alternative DCNNs, such as ResNet-34³³, MobileNet-v3³⁴, InceptionNet-v4³⁵, and EfficientNet-b3³⁶, were also adopted as CNN feature extractors. The experimental results show that the choice of CNN model only has a trivial influence on the overall performance (see Supplementary Table 4).

Table 1 Fivefold cross-validation results: impact of treatments and CT images.

Full size table

Figure 4a–f plot the day error statistics of patients corresponding to different treatment/drug groups: antiviral drugs (ABD: arbidol; RV: ribavirin; and OV: oseltamivir), antibacterial drugs (PPL: piperacillin; CPP: cephalosporins; LFN: levofloxacin; LZD: linezolid; and MFN: moxifloxacin), traditional Chinese medicine (LQC: Lianhua Qingwen capsule and XBJ: Xuebijing), immunotherapy drugs (CP: convalescent plasma; IGN: immunoglobulin; and TB: tocilizumab), apophlegmatisant (ABX: ambroxol and ACN: acetylcysteine), and others (HPN: heparin; MPN: methylprednisolone; HFNC: high-flow nasal cannula oxygen; and VC: vitamin C). It can be observed that the median values of most boxes are very close to zero, regardless of whether the treatment schemes are considered (dark-blue boxes) or not (red boxes). This phenomenon confirms the effectiveness of iCOVID in the recovery-time prediction of COVID-19 patients. Although the median values in most dark-blue boxes are similar to their counterparts, the main difference is that almost all dark-blue boxes have much smaller interquartile ranges than the red boxes. This finding proves that iCOVID can indeed achieve more stable predictions by considering treatment schemes. It is recognized that different patients might be treated using various treatment schemes. For example, critically ill patients normally received more treatments (Supplementary Fig. 2a). The number of treatment schemes might be an implication for the model predictions. However, our experimental results demonstrate that the treatment rather than the number of treatments is more significant on the prediction (Supplementary Fig. 2b).

**Fig. 4: Distribution and statistics of the day error between the average predicted and true recovery days.**

We also analyzed the distribution of the average day error between the predicted and real recovery days in the following different patient groups: high-risk (severe and critical, Fig. 4g) and low-risk (mild and moderate, Fig. 4h) patients. The performance of iCOVID considering treatment schemes was much better than that without considering treatment schemes, especially for high-risk patients (Fig. 4g). In addition, both prediction performances were reduced for low-risk patients who recovered 24 days after admission (Fig. 4h). We attribute this issue to the constructed data set in which the number of collected patients who recovered after 24 days is very limited (Supplementary Fig. 1c), increasing the difficulty in estimating their recovery days.

Model interpretation and significant clinical features for the prediction

To understand the regions of the image and the types of clinical features that are highly related to the recovery-time prediction, we visualized convolutional feature maps using the Grad-CAM³⁷ technique and calculated the average significance of each clinical feature based on the FSR mechanism. Figure 5a shows feature maps of four representative patients and demonstrates that the proposed model mainly focused on the lesion regions of CT images to make decisions regarding moderate, severe, or critical patients. Among mild patients, almost no lesion can be observed on CT images, and the proposed model mainly relied on the whole lung region to make predictions.

**Fig. 5: Network visualization and statistics of the feature significance for the model prediction.**

Then, we created a heatmap of the average significance of each clinical feature and the recovery days (see Fig. 5b). The heatmap demonstrates that the biomarkers AM, HG, and TP, the symptoms EPC, DH, and FV, and the comorbidities DB, ARDS, and SK are important for the predictions. However, the level of significance of these features differs. For example, AM, HG, and TP are the top 3 significant features for the prediction of recovered patients, whereas the biomarkers LDH, α-HBDH, and comorbidities SK, ARDS, and DB are the most important features for the prediction of deceased patients (recovery day 32 indicates death). To further illustrate the difference, we plotted the statistical significance of the top 15 clinical features in Fig. 5c, which reveals that biomarkers (i.e., AM, HG, and TP), symptoms (i.e., EPC, DH, and FV), and comorbidities (i.e., DB, ARDS, and SK) are indeed important for the prediction. To verify the reliability of the result, we also performed a Pearson analysis³⁸ to calculate the correlation coefficients between each feature and the recovery time. The Pearson coefficients (Fig. 5d) demonstrate that the top 15 features shown in Fig. 5c indeed are strongly correlated with the recovery time of COVID-19 patients (p value < 0.001, except for the discrete features EPC, DH, and SK). Finally, we also conducted a statistical analysis of the main biomarkers among the recovered and deceased patients. The value distribution of AM, HG, TP, and LDH is plotted in Fig. 6a, which demonstrates that the recovery day is statistically negatively correlated with AM, HG, and TP but positively correlated with LDH (see the red lines). Compared with the recovered patients, patients who died normally had lower levels of AM, HG, and LDH and higher levels of TP (Fig. 6b).

**Fig. 6: Value distribution of the main biomarkers among all recovered and deceased patients.**

Comparison with benchmark survival models

Subsequently, we compared the proposed iCOVID model with two benchmark survival models that are widely used in the field of survival analysis, including CPH²⁹ model and the random survival forest (RSF)³⁹ model (see Methods for the implementation details). CPH is a linear model and assumes that the “possibility” of experiencing an event remains constant over time (i.e., the proportional hazard assumption), whereas the RSF model does not have this restriction by predicting a score for every time point, which is similar to the iCOVID model (i.e., time-dependent prediction). In this study, we utilized the time-dependent AUC⁴⁰ to validate the overall performance of all models. Figure 7 demonstrates that the iCOVID model can achieve much superior performance than both the CPH and RSF models regardless of whether treatment or image is considered. When both treatment and image information are considered, iCOVID achieves a mean AUC score of 0.841 ± 0.024, whereas the mean AUC scores of CPH and RSF are only 0.770 ± 0.045 and 0.799 ± 0.049, respectively (Fig. 7a). If treatment information is ignored (Fig. 7b), the performances of all models are worse, especially the CPH model, which obtains a mean AUC score of only 0.563 ± 0.021. However, the iCOVID and RSF models still achieve promising performance with the mean AUC scores of 0.804 ± 0.048 and 0.775 ± 0.053, respectively. When image information is ignored (Fig. 7c), the performance of the iCOVID model is slightly inferior, with a mean AUC score of 0.837 ± 0.027. Interestingly, the performances of the CPH model and the RSF model are even improved when image information is not considered.

**Fig. 7: Comparison across different survival models.**

External validation

To evaluate the generalization ability of iCOVID, we also tested iCOVID’s performance using two additional cohorts, i.e., Taikang and Guanggu. Table 2 shows the mean and standard deviation results achieved by the five models trained with fivefold cross-validation using the Huoshenshan data set. The following three main conclusions can be drawn: (1) iCOVID can still achieve promising performance in both external data sets despite its inferior performance compared with that using the Huoshenshan data set. In particular, the TD-CI score and the MADE obtained in Guanggu cohort (71.5 ± 1.8%; 4.8 ± 4.0 days) are relatively close to those obtained in the Huoshenshan data set (76.3%; 4.4 ± 3.9 days). (2) We can observe that both TD-CI and MADE are worse when treatment schemes are ignored. This phenomenon further proves that treatment schemes indeed have a significant impact on recovery-time predictions. (3) Once CT images are ignored, iCOVID can achieve inferior but still comparative performance as demonstrated by the results corresponding to iCOVID and iCOVID without CT images.

Table 2 External validation results of fivefold cross-validation models.

Full size table

Discussion

In this paper, we proposed a deep learning-based time-to-event analysis framework named iCOVID that can successfully achieve early recovery-time prediction of COVID-19 patients at admission within 48 hours. Extensive experiments and statistical analysis of multicenter data demonstrated that the average error between the predicted and true recovery days was ~4.5 days (see Table 1). Most importantly, we investigated a large number of clinical features as listed in Supplementary Table 1 that might be relevant for the prediction, including demographics (age and gender), symptoms, comorbidities, and biomarkers. Our experimental results revealed that albumin, hemoglobin, total protein (TP), expectoration, diarrhea, soreness, fever, cough, diabetes, ARDS, and shock were highly related to the recovery-time prediction (see Fig. 5b), which was consistent with prior studies^41,42. In addition to the above-mentioned features, we also investigated the impact of treatment schemes on the predictions, which has not been considered in most previous studies to the best of our knowledge. Our experimental results demonstrated that treatment schemes were indeed significant in the prediction of the recovery time (see Table 1 and Figs. 4, 7). Since iCOVID considers treatment schemes, it can be integrated with a computer-aided diagnosis system of COVID-19 to help clinicians determine the optimal treatment from various predefined treatment schemes, which can reduce patients’ recovery time to the greatest extent.

The results shown in Fig. 7 demonstrate that the proposed deep learning method can achieve much superior performance over the benchmark CPH model and the RSF model. A previous study⁴³ demonstrated that the CPH model normally has limited performance owing to the proportional hazard assumption, and the RSF model is more suitable for complicated applications as it can build a nonlinear relationship between the variables and outcomes. However, RSF is a nondifferentiable model and is widely recognized in processing discrete variables (e.g., the symptoms and comorbidities investigated in this study). In contrast, the deep learning method is more adaptive to both discrete and continuous variables (e.g., biomarkers). In addition, treatment schemes and baseline clinical features are much more significant than CT images for the prediction, regardless of model type, further revealing that the recovery of patients is more relative to precisely individualized treatment schemes. Furthermore, the external validation demonstrates that iCOVID has promising generalization ability, even though the performance is inevitably reduced (see Table 2) owing to the variation of data distribution across the multi-site data sets (see Supplementary Tables 1–3).

We reviewed representative studies concerning computer-aided COVID-19 prognosis (see Supplementary Table 5). Most studies^{17,19,20,44,45,46} focused on developing deep learning or machine learning classifiers for risk prediction (severity or mortality) of COVID-19 patients. The iCOVID model developed in our study can also be applied for risk assessment of COVID-19 patients as illustrated in Fig. 3c, d. For example, if a patient is predicted to recover after a long time (e.g., 10 days), he/she should be triaged as high-risk²⁰. In addition, we considered deceased patients and assumed that their recovery day was 32 in the model development, which allowed us to screen patients at a high risk of death. These patients can be easily identified by observing the curve shape as illustrated by patient #4 in Fig. 3d. These patients should have a flat curve with a peak on the last day of the time range. To validate the performance of iCOVID in identifying deaths, we drew ROC curves (see Supplementary Fig. 3) and calculated the AUC scores based on the predicted probability P_T of all patients. The AUC scores obtained using the Huoshenshan, Taikang, and Guanggu data sets were 94.8 ± 3.3%, 94.4 ± 2.6%, and 73.6 ± 8.3%, respectively. Data imbalance damages the AUC score obtained using the Guanggu data set, because only two deaths suffered from shock (2 of 20 deaths), which is one of the most important features for the identification of deaths (see Fig. 5b). Thus, some deaths in the Guanggu data set might be treated by iCOVID as patients who have a high probability of recovering. Promisingly, iCOVID still considers these patients at a high-risk level, and the average predicted recovery time of these patients is as high as 25.8 ± 6.5 days. It is not informative that predicting poor outcomes for patients suffering from shock or ARDS. However, ~50% of deceased patients investigated in this study were shock-free and ARDS-free within 48 hours after admission (see Supplementary Table 3). The AUC scores demonstrate that iCOVID can also precisely identify these high-risk patients.

We found only two studies that were related to recovery-time prediction. Yue et al.⁴⁷ trained a random forest model to classify patients with different hospital stay using 1218 radiomic features⁴⁸ extracted from CT images, whereas Liu et al.⁴⁹ used a Kaplan–Meier analysis⁵⁰ to determine the risk factors associated with the length of hospital stay. However, these studies had the following two main drawbacks: (1) a shortage of data for model development (<100 patients) and (2) a modeling method that was too simplistic for complicated clinical scenes, such as using heterogeneous multimodal data to predict the length of hospital stay. In particular, the Kaplan–Meier method can only be used for univariate analysis. In contrast, the proposed iCOVID solution is more practical. During the training stage, iCOVID can fully use data from patients with different outcomes (i.e., recovered, deceased, and censored) to learn the time-variant nonlinear relationship between multimodal information and events. Then, during the testing stage, iCOVID can directly estimate the length of days the patient needs to recover. Furthermore, knowledge of clinical factors, especially biomarkers that are highly correlated with the recovery time of COVID-19 patients, is also clinically important. For this purpose, an FSR mechanism is designed and incorporated with the model as a subnetwork to learn the feature significance, allowing us to screen the most important clinical features and provide strong individual interpretability of the prediction (see Fig. 3e) rather than simply statistical interpretability.

Notably, the future work includes the following limitations that are planned to be addressed. First, the model was developed and evaluated using data collected only from three hospitals and the number of patients for the external validation is relatively small. To further validate the clinical application value of the iCOVID model, we plan to perform prospective validation with cooperative hospitals. Second, the iCOVID model uses a variety of information (i.e., treatment schemes, CT images, and clinical features) as input to make predictions. However, it might be difficult to simultaneously collect all information in clinical practice. Therefore, we also validated the performance of iCOVID fed with only baseline features (all 46, top 20, top 15, top 10, and top 5 clinical features). The results demonstrate that iCOVID can still achieve promising performance, especially when considering only the top 20 and top 15 features, respectively, which can be normally obtained within 48 hours after admission (Supplementary Fig. 4). Third, the proposed FSR mechanism can reveal the significance of each clinical feature but cannot reflect the positive or negative correlation between the features and the recovery time as demonstrated by the red lines in Fig. 6. Therefore, the FSR mechanism should be further improved in future work to enhance its applicability in clinical practice. Finally, our experiments demonstrated that treatment schemes have a significant impact on the model performance. However, a more comprehensive investigation of the relationship between treatments and the recovery of COVID-19 patients needs to be performed.

In conclusion, we developed a deep learning-based time-dependent prognostic analysis framework that is applicable for the early recovery-time prediction of COVID-19 patients. We demonstrated that considering both treatment schemes and patient covariates (i.e., CT scans, demographics, symptoms, comorbidities, and biomarkers) for model development can significantly improve the prediction performance. The proposed work is not only vital for the study of COVID-19 pneumonia but also universally significant for the early prognostic prediction of other respiratory infectious diseases, especially viral pneumonia.

Methods

Ethics

The study was approved by the Ethics Committee of the First Affiliated Hospital of Army Medical University with approval number KY2020277, and the study was performed according to the principles of the Declaration of Helsinki. Since it is a retrospective study and presents no more than minimal risk, a waiver for informed consent was granted by the Ethics Committee.

Materials

We collected the information of 2530 COVID-19 patients from Huoshenshan Hospital, which was built temporarily for the emergency treatment of patients in Wuhan, China. We also collected the information of 398 and 80 patients from Taikang Tongji (Wuhan) Hospital and Maternity and Child Healthcare Hospital (Guanggu) in Wuhan, respectively. SARS-CoV-2 infection was confirmed by reverse transcription polymerase chain reaction (RT-PCR) among all patients between 1 February and 31 March 2020. All specimens were extracted from nasal and throat swabs using the same standardized protocol. Confirmed cases of COVID-19 were defined as positive RT-PCR according to World Health Organization interim guidance⁵¹. Strict recovery criteria were executed according to the diagnostic and treatment guideline for COVID-19 issued by the Chinese National Health Committee (version seventh)⁵². All the following recovery criteria had to be met for hospital discharge or discontinuation of quarantine: (1) normal temperature lasting longer than 3 days, (2) resolved respiratory symptoms, (3) substantially improved acute exudative lesions on chest CT images, and (4) two consecutively negative RT-PCR test results separately by at least 1 day. As summarized in Supplementary Table 1, the patient information included age, gender, symptoms, comorbidities, and biomarkers, which were acquired within 2 days of the patients’ admission to the hospitals (average 0.58 ± 1.52 days). The COVID-19 severity level, the number of days of hospital stay, the treatment type, and the outcome (censored data, recovery, or death) of each patient were also collected to build the survival analysis data set. The outcome-time since admission was as follows: recovery:14.5 ± 8.2 days (minimum/maximum: 3/61 days) and death: 13.5 ± 9.6 days (minimum/maximum: 3/50 days). All patients with censored data were lost to follow-up as they were transferred to other hospitals, and it is only known that these patients were in a remission state on the date of transfer. Finally, many previous studies^53,54,55,56 have reported that CT images can provide vital clues for prognostic estimation. Therefore, we also collected the primitive CT scan of each patient following admission within 48 hours. However, we only selected scans that were reconstructed with a slice thickness of ≤3 mm. Owing to these selection criteria and other unknown reasons, the imaging data of a total of 1492 patients were not considered during the data collection process. Thus, only 1516 patients (mild and moderate: 922; severe and critical: 594) had CT scan information in our data set.

Data preprocessing

Different features have different magnitude ranges. For example, the lymphocytic absolute value is generally lower than 5.0/L, whereas the value of TP is usually larger than 50 g/L. We found that the framework performance can be adversely affected if the framework is trained with inputs of the original feature values. In particular, it is difficult for the FSR mechanism to learn regression coefficients that precisely reflect the significance of each feature. To reduce this impact, we normalized all features to ensure that their values ranged between 0 and 1 before feeding them into the framework (missing values were set to 0 by default). Regarding the CT scans, we first resized the scans to the same voxel size of 1 mm ×1 mm × 1 mm using bilinear interpolation to reduce variation across different scans, especially the slice thickness. Subsequently, we obtained the lung region mask of each scan using 3D-Unet trained for lung region segmentation from chest CT images². This mask was consequently used to calculate a hull convex region for cropping a refined lung-only CT scan intended to remove image noise outside the lungs and reducing the cost of GPU memory. All cropped subvolumes were downsampled to the same size of 48 × 48 × 48 before feeding to the framework due to memory limitations and computational efficiency.

Network details

The network architecture of the framework is shown in Fig. 3a. The inputs to the framework were the treatment scheme (a 19-dimensional vector), lung CT images sized 48 × 48 × 48, and clinical features (age, gender, symptoms, comorbidities, and biomarkers) represented by a 46-dimensional vector. The convolutional neural network (CNN) VGG-16³⁰ was modified to a 3D version and used to extract a 128-dimensional imaging feature vector from the lung CT images. Subsequently, the 46-dimensional feature vector was fed to the FSR module to generate a weighted feature vector. This vector was finally combined with the 128-dimensional imaging feature vector and the 19-dimensional treatment scheme vector using cascaded fully-connected layers (i.e., the hidden layers in Fig. 3a) for the prognostic estimation of the number of days a patient needs to recover. In our implementation, the hidden layers were composed of four fully connected layers with 256, 512, 512, and 256 neurons. A rectified linear unit was empirically selected as the activation function, and dropout⁵⁷ was applied after each fully connected layer during the training stage to avoid overfitting.

FSR mechanism

The FSR mechanism is designed as a subnetwork to learn a 46-dimensional coefficient vector with each element representing the significance of each feature in the 46-dimensional feature vector, which intuitively provides interpretability to the prediction result. For example, by sorting the learned coefficients, we can determine which features make the greatest contribution to the prediction. Formally, let the feature vector and coefficient vector be represented by ${\overrightarrow {\boldsymbol{x}}} = [x_1,x_2, \ldots ,x_K]$ and ${\overrightarrow {\boldsymbol{\omega}}} = [\omega _1,\omega _2, \ldots ,\omega _K]$ (K = 46), respectively, and the aim is to generate a weighted feature vector ${\overrightarrow {\boldsymbol{x}}} ^\prime = [\omega _1x_1,\omega _2x_2, \ldots ,\omega _Kx_K]$ that is finally fed to the hidden layers for the prediction. Each coefficient ω_k in the vector ${\overrightarrow {\boldsymbol{\omega}}}$ is obtained by the softmax function as follows:

$$\omega _k = \frac{{{{{\mathrm{exp}}}}\left\{ {f\left( {\left. {\overrightarrow {\boldsymbol{x}} } \right|\theta } \right)_k} \right\}}}{{\mathop {\sum }\nolimits_{i = 1}^K {{{\mathrm{exp}}}}\left\{ {f\left( {\left. {\overrightarrow {\boldsymbol{x}} } \right|\theta } \right)_i} \right\}}},$$

(1)

where f(·) indicates a subnetwork with trainable parameters θ. Since the weighting coefficients are calculated using the softmax function, they are subject to $\mathop {\sum }\nolimits_{k = 1}^K \omega _k = 1$. Intuitively, the FSR module can be simply implemented by cascading fully connected layers with each layer followed by an activation layer (e.g., SeLU⁵⁸). The final fully connected layer consists of K neurons that are connected to a softmax layer to produce the weighting coefficients. Notably, the fully connected layer must be initialized with 1.0 to guarantee that all features have an identical impact at the beginning of training.

Multi-event loss function

The network was trained by minimizing a multi-event loss, comprising the following five parts:

$${{{\mathcal{L}}}} = \lambda _1{{{\mathcal{L}}}}_{{{{\mathrm{censor}}}}} + \lambda _2{{{\mathcal{L}}}}_{{{{\mathrm{recover}}}}} + \lambda _3{{{\mathcal{L}}}}_{{{{\mathrm{rank}}}}} + \lambda _4{{{\mathcal{L}}}}_{{{{\mathrm{death}}}}} + \lambda _5\left\| {\overrightarrow {\boldsymbol{\omega}} } \right\|_1,$$

(2)

where ${{{\mathcal{L}}}}_{{{{\mathrm{censor}}}}}$, ${{{\mathcal{L}}}}_{{{{\mathrm{recover}}}}}$, and ${{{\mathcal{L}}}}_{{{{\mathrm{death}}}}}$ indicate the loss for handling censored data, recoveries, and deaths in the prognostic estimation task, respectively. ${{{\mathcal{L}}}}_{{{{\mathrm{rank}}}}}$ is a raking loss that is applied to the recoveries to address the time-variant issue. The ranking loss adapts the idea of concordance⁵⁹ as follows: a patient who recovered on day t_* should have a higher probability of recovering on day t_* than any patient who did not yet recover on day t_*. The last term $\left\| {\overrightarrow {\boldsymbol{\omega}} } \right\|_1$ is the L1-norm, which helps learn the sparse coefficient vector. λ₁, λ₂, λ₃, λ₄, and λ₅ are hyperparameters used to control the contribution of each term in Eq. 2. These hyperparameters are empirically set to 1, 2, 1, 5, and 1. The details are further explained as follows:

(a)
Loss ${{{\mathcal{L}}}}_{{{{\mathrm{censor}}}}}$ is defined as follows:
$${{{\mathcal{L}}}}_{{{{\mathrm{censor}}}}} = - \frac{1}{{N_{{{{\mathrm{censor}}}}}}}\mathop {\sum }\limits_{n = 1}^N \left\{ {{{{\mathrm{sgn}}}}\left( {\sigma _n = 0} \right) \cdot \log \left[ {1 - F\left( {t_n\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,{{{\boldsymbol{I}}}}_n,\overrightarrow {\boldsymbol{\tau}} _n} \right)} \right]} \right\},$$
(3)
where sgn(·) denotes an indicator function. N_censor indicates the number of censored patients in the minibatch with size N. F(*) is the CIF, which is defined by:
$$F\left( {t_n\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,I_n,\overrightarrow {\boldsymbol{\tau}} _n} \right) = \overrightarrow {\boldsymbol{P}} \left( {t \le t_n\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,I_n,\overrightarrow {\boldsymbol{\tau}} _n} \right) = \mathop {\sum }\limits_{n = 1}^{t_n} P\left( {t\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,I_n,{\overrightarrow {\boldsymbol{\tau}} _n}} \right),$$
(4)
where ${\overrightarrow {\boldsymbol{P}}} ( \ast )$ is the estimated probability distribution. The target of Eq. 3 minimizes all probabilities $\left\{ {P_1,P_2, \ldots ,P_{t_n}} \right\}$ based on the prior knowledge that each patient with censored data did not yet recover on the last recorded day t_n.
(b)
Loss ${{{\mathcal{L}}}}_{{{{\mathrm{recover}}}}}$ is defined as follows:
$${{{\mathcal{L}}}}_{{{{\mathrm{recover}}}}} = - \frac{1}{{N_{{{{\mathrm{recover}}}}}}}\mathop {\sum }\limits_{n = 1}^N \left\{ {{{{\mathrm{sgn}}}}\left( {\sigma _n = 1} \right) \cdot \log \left[ {P(t_n\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,{{{\boldsymbol{I}}}}_n,{\overrightarrow {\boldsymbol{\tau}} _n)}} \right]} \right\},$$
(5)
where N_recover is the number of recovered patients in the minibatch. Equation 5 drives the network to learn a maximum probability on the t_n day when the nth patient recovers after admission.
(c)
Ranking loss ${{{\mathcal{L}}}}_{{{{\mathrm{rank}}}}}$ is calculated as follows:
$${{{\mathcal{L}}}}_{{{{\mathrm{rank}}}}}=\mathop {\sum}\limits_{n \ne m}\bigg\{{{{\mathrm{sgn}}}}(t_n\,<\,t_m)\cdot {{{\mathrm{exp}}}}\bigg[\frac{{F\left({t_n\left|{\overrightarrow {\boldsymbol{x}}_m}\right.,{{{\boldsymbol{I}}}}_m,\overrightarrow{\boldsymbol{\tau}}_m} \right)-F\left({t_n\left| {\overrightarrow{\boldsymbol{x}} _n} \right.,{{{\boldsymbol{I}}}}_n,\overrightarrow {\boldsymbol{\tau}} _n} \right)}}{\alpha }\bigg]\bigg\} ,$$
(6)
where α denotes a hyperparameter that is empirically set to 0.2 in this study. Since exp(*) is a convex function, minimizing Eq. 6 equals maximizing the distance between $F\left( {t_n\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,{{{\boldsymbol{I}}}}_n,\overrightarrow {\boldsymbol{\tau}} _n} \right)$ and $F\left( {t_n\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,{{{\boldsymbol{I}}}}_m,\overrightarrow {\boldsymbol{\tau}} _m} \right)$ subject to $F\left( {t_n\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,{{{\boldsymbol{I}}}}_n,\overrightarrow {\boldsymbol{\tau}} _n} \right) > F\left( {t_n\left| {\overrightarrow {\boldsymbol{x}} _m} \right.,{{{\boldsymbol{I}}}}_m,\overrightarrow {\boldsymbol{\tau}} _m} \right)$.
(d)
Loss ${{{\mathcal{L}}}}_{{{{\mathrm{death}}}}}$ is defined as follows:
$${{{\mathcal{L}}}}_{{{{\mathrm{death}}}}} = - \frac{1}{{N_{{{{\mathrm{death}}}}}}}\mathop {\sum }\limits_{n = 1}^N \left\{ {{{{\mathrm{sgn}}}}\left( {\sigma _n = 2} \right) \cdot \log \left[ {1 - F\left( {T - 1\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,{{{\boldsymbol{I}}}}_n,\overrightarrow {\boldsymbol{\tau}} _n} \right)} \right]} \right\},$$
(7)
where N_death is the number of deceased patients in the minibatch, and T is the last day in the estimation time range (T = 32 in this study). The target of Eq. 7 minimizes all probabilities {$P_1,P_2, \ldots ,P_{T - 1}$} and maximizes P_T corresponding to each deceased patient. We can observe that Eq. 7 is similar to Eq. 3. According to this definition, deceased patients are treated as a special type of patients with censored data.

Evaluation metrics

The TD-CI and the MADE were calculated to evaluate the performance of the recovery event. Given the CIF in Eq. 4, the TD-CI C^tdis defined as follows:

$$C^{{{{\mathrm{td}}}}} = \frac{{\mathop {\sum }\nolimits_{n \ne m} {{{\mathrm{sgn}}}}(t_n\, < \,t_m) \cdot {{{\mathrm{sgn}}}}(F\left( {t_n\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,{{{\boldsymbol{I}}}}_n,\overrightarrow {\boldsymbol{\tau}} _n} \right) > F\left( {t_n\left| {\overrightarrow {\boldsymbol{x}} _n} \right.,{{{\boldsymbol{I}}}}_m,\overrightarrow {\boldsymbol{\tau}} _m} \right))}}{{\mathop {\sum }\nolimits_{n \ne m} {{{\mathrm{sgn}}}}(t_n\, < \,t_m)}},$$

(8)

which counts the number of predictions that correctly abide by the idea of concordance⁵⁹. The MADE d^ma is calculated as follows:

$$d^{{{{\mathrm{ma}}}}} = \frac{1}{N}\mathop {\sum }\limits_{n = 1}^N \left| {t_n - {{{\mathrm{argmax}}}}\left( {\overrightarrow {{\boldsymbol{P}}_n} } \right)} \right|,$$

(9)

where ${\overrightarrow {{\boldsymbol{P}}_n}}$ is the predicted probability distribution of the n^th patient. For the internal validation, the performance was evaluated statistically in terms of 95% confidence interval of the above-mentioned TD-CI and MADE metrics. The 95% CI values were calculated using the bootstrap method⁶⁰.

Fivefold cross-validation

The five subsets $\left\{ {{{{\mathrm{Cohort}}}}\_i\left| {i = 1,2, \ldots ,5} \right.} \right\}$ of the Huoshenshan data set were used to train five independent models $\left\{ {{{{\mathrm{M}}}}_j{{{\mathrm{|}}}}j = 1,2, \ldots ,5} \right\}$for internal validation. Each model M_j was trained using four subsets $\{ {{{\mathrm{Cohort}}}}\_i|i = 1,2, \ldots ,5\,{{{\mathrm{and}}}}\,i \,\ne \,j\}$ and tested using the remaining subset.

Ablation experiments

To validate the impact of treatments and CT images on the prediction performance, we also trained iCOVID models without considering any treatment information, i.e., setting all τ in the ground-truth treatment scheme to zero during the training stage, and iCOVID models without using any CT image information, i.e., setting all voxel values in the image matrix to zero during the training stage. The clinical data, i.e., demographics, symptoms, comorbidities, and biomarkers, were used as baseline information in all models.

Implementation of benchmark models

The fivefold CPH models and the RSF models were trained using CoxnetSurvivalAnalysis and RandomSurvivalForeset (with 100 trees) implemented in the python library Scikit-survival: https://scikit-survival.readthedocs.io/en/latest/index.html. Because the CPH model and the RSF model cannot directly process CT images, we first extracted the convolutional feature vectors from the CT images using the CNN encoders of the iCOVID models, and then, we used the feature vectors as the input to the benchmark models. Similar to the aforementioned ablation study, we also trained the benchmark models without considering any treatment or image information for comparison.

Training details

The framework was implemented using Google TensorFlow (version 2.0 with Keras API) on an NVIDIA RTX 2080Ti GPU. During the training stage, the networks were optimized by gradient descending with gradients estimated by the Adam optimizer under the constraint of minimizing the multi-event loss. The learning rate was 0.001, decaying every 100 iterations with an exponential rate of 0.96. The total number of iterations was 2k (20 epochs multiplied by 100 iterations). At each iteration, a minibatch of 72 samples was fed to the networks. We augmented the CT scans by randomly rotating each scan to 0, 90, 180, and 270 degrees, and randomly flipping the scans in the X, Y, and Z axes. For those patients without CT images, we directly input a volume of size 48 × 48 × 48 with zero values. To avoid the overfitting issue, only the model that achieved a minimum MADE using the subset applied for the online evaluation (i.e., Cohort_6, see Supplementary Fig. 1d) was saved.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The survival data sets used for modeling are not publicly available owing to privacy concerns. However, researchers can contact the corresponding author to obtain the de-identified data upon ethical approval from the Ethics Committee of Southwest Hospital, Third Military Medical University, and signature of a data usage agreement. The remaining data are available in the article and supplementary files.

Code availability

The code, pre-trained models, and a total of 258 samples are available at: https://github.com/wangjuncongyu/covid19_recovery.

References

Ullah, S. M. A. et al. Scalable telehealth services to combat novel coronavirus (COVID-19) pandemic. SN Comput. Sci. 2, 18 (2021).
Article PubMed PubMed Central Google Scholar
Wang, J. et al. Prior-attention residual learning for more discriminative COVID-19 screening in CT images. IEEE Trans. Med. Imaging 39, 2572–2583 (2020).
Article PubMed Google Scholar
Fan, D. P. et al. Inf-Net: Automatic COVID-19 lung infection segmentation from CCT images. IEEE Trans. Med. Imaging 39, 2626–2637 (2020).
Article PubMed Google Scholar
Islam, M. M., Karray, F., Alhajj, R. & Zeng, J. A review on deep learning techniques for the diagnosis of novel coronavirus (COVID-19). IEEE Access 9, 30551–30572 (2021).
Article PubMed Google Scholar
Islam, M. Z., Islam, M. M. & Asraf, A. A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images. Inform. Med. Unlocked 20, 100412 (2020).
Article PubMed PubMed Central Google Scholar
Saha, P., Sadi, M. S. & Islam, M. M. EMCNet: automated COVID-19 diagnosis from X-ray images using convolutional neural network and ensemble of machine learning classifiers. Inform. Med. Unlocked 22, 100505 (2021).
Article PubMed Google Scholar
Islam, M. M., Islam, M. Z., Asraf, A. & Ding, W. Diagnosis of COVID-19 from X-rays using combined CNN-RNN architecture with transfer learning. medRxiv https://doi.org/10.1101/2020.08.24.20181339 (2020).
Javaheri, T. et al. CovidCTNet: an open-source deep learning approach to diagnose covid-19 using small cohort of CT images. npj Digit. Med. 4, 29 (2021).
Article PubMed PubMed Central Google Scholar
Lee, E. H. et al. Deep COVID DeteCT: an international experience on COVID-19 lung detection and prognosis using chest CT. npj Digit. Med. 4, 11 (2021).
Article PubMed PubMed Central Google Scholar
Islam, M. M. et al. Wearable technology to assist the patients infected with novel coronavirus (COVID-19). SN Comput. Sci. 1, 320 (2020).
Article PubMed PubMed Central Google Scholar
Zoabi, Y., Deri-Rozov, S. & Shomron, N. Machine learning-based prediction of COVID-19 diagnosis based on symptoms. npj Digit. Med. 4, 3 (2021).
Article PubMed PubMed Central Google Scholar
Islam, M. M., Ullah, S. M. A., Mahmud, S. & Raju, S. M. T. U. Breathing aid devices to support novel coronavirus (COVID-19)Infected Patients. SN Comput. Sci. 1, 274 (2020).
Article PubMed PubMed Central Google Scholar
Guo, W. et al. Diabetes is a risk factor for the progression and prognosis of COVID-19. Diabetes. Metab. Res. Rev. 36, e3319 (2020).
Article CAS Google Scholar
Yu, T. et al. Association Between Clinical Manifestations and Prognosis in Patients with COVID-19. Clin. Ther. 42, 964–972 (2020).
Article CAS PubMed PubMed Central Google Scholar
Chen, H. J. et al. Early chest CT features of patients with 2019 novel coronavirus (COVID-19) pneumonia: relationship to diagnosis and prognosis. Eur. Radiol. 30, 6178–6185 (2020).
Article CAS PubMed PubMed Central Google Scholar
Muhammad, L. J., Islam, M. M., Usman, S. S. & Ayon, S. I. Predictive data mining models for novel coronavirus (COVID-19) infected patients’ recovery. SN Comput. Sci. 1, 206 (2020).
Article CAS PubMed PubMed Central Google Scholar
Lassau, N. et al. Integrating deep learning CT-scan model, biological and clinical variables to predict severity of COVID-19 patients. Nat. Commun. 12, 634 (2021).
Article CAS PubMed PubMed Central Google Scholar
Li, X. et al. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan. J. Allergy Clin. Immunol. 146, 110–118 (2020).
Article CAS PubMed PubMed Central Google Scholar
Gao, Y. et al. Machine learning based early warning system enables accurate mortality risk prediction for COVID-19. Nat. Commun. 11, 1–9 (2020).
Article Google Scholar
Liang, W. et al. Early triage of critically ill COVID-19 patients using deep learning. Nat. Commun. 11, 1–7 (2020).
Article Google Scholar
Meng, L. et al. A deep learning prognosis model help alert for COVID-19 patients at high-risk of death: a multi-center study. IEEE J. Biomed. Health. Inform. 24, 3576–3584 (2020).
Article PubMed Google Scholar
Ning, W. et al. Open resource of clinical data from patients with pneumonia for the prediction of COVID-19 outcomes via deep learning. Nat. Biomed. Eng. 4, 1197–1207 (2020).
Article PubMed PubMed Central CAS Google Scholar
Kim, L. et al. Risk factors for intensive care unit admission and in-hospital mortality among hospitalized adults identified through the US Coronavirus Disease 2019 (COVID-19)-associated hospitalization surveillance network (COVID-NET). Clin. Infect. Dis. 72, e206–e214 (2020).
Article CAS Google Scholar
Grasselli, G. et al. Risk factors associated with mortality among patients with COVID-19 in intensive care units in Lombardy, Italy. JAMA Intern. Med. 180, 1345–1355 (2020).
Article CAS PubMed Google Scholar
Arrieta, A. B. et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020).
Article Google Scholar
Xu, X. et al. Effective treatment of severe COVID-19 patients with tocilizumab. Proc. Natl. Acad. Sci. 117, 10970–10975 (2020).
Article CAS PubMed PubMed Central Google Scholar
Wynants, L. et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 369, 1–16 (2020).
Google Scholar
Roberts, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 3, 199–217 (2021).
Article Google Scholar
Lin, D. Y. & Wei, L. J. The robust inference for the Cox proportional hazards model. J. Am. Stat. Assoc. 84, 1074–1078 (1989).
Article Google Scholar
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings. https://arxiv.org/abs/1409.1556 (2015).
Lee, C., Zame, W. R., Yoon, J. & Van Der Schaar, M. DeepHit: a deep learning approach to survival analysis with competing risks. in 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 (2018).
Antolini, L., Boracchi, P. & Biganzoli, E. A time-dependent discrimination index for survival data. Stat. Med. 24, 3927–3944 (2005).
Article PubMed Google Scholar
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition 770–778 (2016).
Howard, A. G. et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. https://arxiv.org/abs/1704.04861 (2017).
Szegedy, C., Ioffe, S., Vanhoucke, V. & Alemi, A. A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence (2017).
Tan, M. & Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (eds. Chaudhuri, K. & Salakhutdinov, R.) 97, 6105–6114 (PMLR, 2019).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. https://doi.org/10.1007/s11263-019-01228-7 (2020).
Pearson Correlation. In A Practical Approach to Using Statistics in Health Research https://doi.org/10.1002/9781119383628.ch17 (2018).
Ishwaran, H., Kogalur, U. B., Blackstone, E. H. & Lauer, M. S. Random survival forests. Ann. Appl. Stat. 2, 841–860 (2008).
Article Google Scholar
Hung, H. & Chiang, C. Estimation methods for time-dependent AUC models with survival data. Can. J. Stat. 38, 8–26 (2010).
Google Scholar
Yang, J. et al. Elevated cardiac biomarkers may be effective prognostic predictors for patients with COVID-19: A multicenter, observational study. Am. J. Emerg. Med. 39, 34–41 (2021).
Article Google Scholar
Lippi, G. & Plebani, M. Laboratory abnormalities in patients with COVID-2019 infection. Clin. Chem. Lab. Med. 58, 1131–1134 (2020).
Article CAS PubMed Google Scholar
Kantidakis, G. et al. Survival prediction models since liver transplantation-comparisons between Cox models and machine learning techniques. BMC Med. Res. Methodol. 20, 1–14 (2020).
Article Google Scholar
Schwab, P. et al. Real-time prediction of COVID-19 related mortality using electronic health records. Nat. Commun. 12, 1058 (2021).
Article CAS PubMed PubMed Central Google Scholar
Barda, N. et al. Developing a COVID-19 mortality risk prediction model when individual-level data are not available. Nat. Commun. 11, 4439 (2020).
Article CAS PubMed PubMed Central Google Scholar
Feng, Z. et al. Early prediction of disease progression in COVID-19 pneumonia patients with chest CT and clinical characteristics. Nat. Commun. 11, 4968 (2020).
Article CAS PubMed PubMed Central Google Scholar
Yue, H. et al. Machine learning-based CT radiomics method for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection: a multicenter study. Ann. Transl. Med. 8, 859 (2020).
Article PubMed PubMed Central CAS Google Scholar
Lambin, P. et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat. Rev. Clin. Oncol. 14, 749–762 (2017).
Article PubMed Google Scholar
Liu, X. et al. Risk factors associated with disease severity and length of hospital stay in COVID-19 patients. J. Infect. https://doi.org/10.1016/j.jinf.2020.04.008 (2020).
Efron, B. Logistic regression, survival analysis, and the Kaplan-Meier curve. J. Am. Stat. Assoc. 83, 414–425 (1988).
Article Google Scholar
World Health Organization. Clinical management of severe acute respiratory infection when novel coronavirus (2019-nCoV) infection is suspected: interim guidance 21 (2020).
China, N. H. C. of. Diagnosis and treatment protocol for novel coronavirus pneumonia (Trial version 7). Chin. Med. J. 133, 1087–1095 (2020).
Article Google Scholar
Li, Y. & Xia, L. Coronavirus disease 2019 (COVID-19): role of chest CT in diagnosis and management. Am. J. Roentgenol. 214, 1280–1286 (2020).
Article Google Scholar
Zhao, W., Zhong, Z., Xie, X., Yu, Q. & Liu, J. Relation between chest CT findings and clinical conditions of coronavirus disease (covid-19) pneumonia: a multicenter study. Am. J. Roentgenol. 214, 1072–1077 (2020).
Article Google Scholar
Rubin, G. D. et al. The role of chest imaging in patient management during the covid-19 pandemic: a multinational consensus statement from the fleischner society. Radiology 158, 106–116 (2020).
CAS Google Scholar
Wang, S. et al. A fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis. Eur. Respir. J. 56, 2000775 (2020).
Article CAS PubMed PubMed Central Google Scholar
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
Google Scholar
Klambauer, G., Unterthiner, T., Mayr, A. & Hochreiter, S. Self-normalizing neural networks. In Advances in Neural Information Processing Systems 972-981 (NIPS, 2017).
Harrell, F. E., Califf, R. M., Pryor, D. B., Lee, K. L. & Rosati, R. A. Evaluating the yield of medical tests. JAMA J. Am. Med. Assoc. 247, 2543–2546 (1982).
Article Google Scholar
DiCiccio, T. J. & Efron, B. Bootstrap confidence intervals. Stat. Sci. 11, 189–228 (1996).
Article Google Scholar

Download references

Acknowledgements

This research is partially supported by the Chinese Postdoctoral Science Foundation under Grant 2020M681306, the Chongqing Medical Research Program under Grant 2021MSXM052, the National Natural Science Foundation of China under Grant 919591260, 81974276, and 62001292, and Application demonstration and evaluation of innovative medical devices under Grant WX2017-007-05.

Author information

These authors contributed equally: Jun Wang, Chen Liu, Jingwen Li, Cheng Yuan, and Lichi Zhang

Authors and Affiliations

School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
Jun Wang, Cheng Yuan, Lichi Zhang, Jianwei Xu, Yaofeng Wen & Dahong Qian
Department of Radiology, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing, China
Chen Liu & Jian Wang
Department of Gastroenterology, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing, China
Jingwen Li
Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
Cheng Jin
College of Media, Communication University of Zhejiang, Hangzhou, China
Yaqi Wang
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Hongbing Lu
Department of Nuclear Medicine, Ruijin Hospital, Shanghai, China
Biao Li & Dahong Qian
Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
Chang Chen
Department of Radiology, General Hospital of Southern Theatre Command, PLA, Guangzhou, China
Xiangdong Li
Department of Radiology, Huoshenshan Hospital, Wuhan, China
Xiangdong Li
School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
Dinggang Shen
Department of Research and Development, Shanghai United Imaging Intelligence, Co., Ltd, Shanghai, China
Dinggang Shen

Authors

Jun Wang
View author publications
You can also search for this author in PubMed Google Scholar
Chen Liu
View author publications
You can also search for this author in PubMed Google Scholar
Jingwen Li
View author publications
You can also search for this author in PubMed Google Scholar
Cheng Yuan
View author publications
You can also search for this author in PubMed Google Scholar
Lichi Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Cheng Jin
View author publications
You can also search for this author in PubMed Google Scholar
Jianwei Xu
View author publications
You can also search for this author in PubMed Google Scholar
Yaqi Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yaofeng Wen
View author publications
You can also search for this author in PubMed Google Scholar
Hongbing Lu
View author publications
You can also search for this author in PubMed Google Scholar
Biao Li
View author publications
You can also search for this author in PubMed Google Scholar
Chang Chen
View author publications
You can also search for this author in PubMed Google Scholar
Xiangdong Li
View author publications
You can also search for this author in PubMed Google Scholar
Dinggang Shen
View author publications
You can also search for this author in PubMed Google Scholar
Dahong Qian
View author publications
You can also search for this author in PubMed Google Scholar
Jian Wang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

J.W., C.L., J.W.L, C.Y., and L.C.Z. are the co-first authors of this paper. L.C., D.G.S., X.D.L., D.H.Q., and J.W. contributed to the study conception and design. J.W. designed, implemented, and evaluated the deep learning framework. C.L. and J.W.L. collated the data set. C.Y., L.C.Z., C.J., and J.W.X. contributed to the initial draft of the manuscript. All authors contributed to the data preparation and revision of the manuscript for important content.

Corresponding authors

Correspondence to Xiangdong Li, Dinggang Shen, Dahong Qian or Jian Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wang, J., Liu, C., Li, J. et al. iCOVID: interpretable deep learning framework for early recovery-time prediction of COVID-19 patients. npj Digit. Med. 4, 124 (2021). https://doi.org/10.1038/s41746-021-00496-3

Download citation

Received: 31 March 2021
Accepted: 21 July 2021
Published: 16 August 2021
DOI: https://doi.org/10.1038/s41746-021-00496-3

This article is cited by

Evaluation of VQC-LSTM for disability forecasting in multiple sclerosis using sequential multisequence MRI
- John Mayfield
- Issam El Naqa
Quantum Machine Intelligence (2024)
A Comprehensive Survey on Pandemic Patient Monitoring System: Enabling Technologies, Opportunities, and Research Challenges
- Charu Krishna
- Dinesh Kumar
- Dharmender Singh Kushwaha
Wireless Personal Communications (2023)
COVID-19 imaging, where do we go from here? Bibliometric analysis of medical imaging in COVID-19
- Ru Wen
- Mudan Zhang
- Xianchun Zeng
European Radiology (2023)
Machine Learning-Based COVID-19 Patients Triage Algorithm Using Patient-Generated Health Data from Nationwide Multicenter Database
- Min Sue Park
- Hyeontae Jo
- Hyung Ju Hwang
Infectious Diseases and Therapy (2022)