Study on the prognosis predictive model of COVID-19 patients based on CT radiomics

Making timely assessments of disease progression in patients with COVID-19 could help offer the best personalized treatment. The purpose of this study was to explore an effective model to predict the outcome of patients with COVID-19. We retrospectively included 188 patients (124 in the training set and 64 in the test set) diagnosed with COVID-19. Patients were divided into aggravation and improvement groups according to disease progression. Three kinds of models were established: a radiomics model, a clinical model, and a combined model. Receiver operating characteristic curves, decision curves, and DeLong's test were used to evaluate and compare the models. Our analysis showed that all the established models performed well in predicting the progression and outcome of COVID-19.

that radiomics from unenhanced chest CT images can accurately predict the extent and type of pulmonary opacities, as well as the patient outcome, by comparing the radiomics and clinical variables, with an AUC reaching 0.99 in predicting disease severity 12 . Li et al. constructed a model integrating information from radiomics and deep learning features to discriminate critical cases from severe cases using CT images, with an AUC up to 0.909 13 .
From the perspective of disease progression and patient outcome in COVID-19, this study aims to explore the factors that affect prognosis and to develop a predictive model. Such a model is not only conducive to furthering our understanding of the disease, but may also provide significant clinical guidance for the management of patients, the rational allocation of medical resources, and the selection of appropriate treatment modalities.

Methods
Patients. The study was approved by the Ethics Committee of the Second Affiliated Hospital of Harbin Medical University (KY2019-183). Because of the retrospective study design, the requirement for informed consent was waived by the Ethics Committee of the Second Affiliated Hospital of Harbin Medical University, in accordance with the ethical guidelines for Medical and Health Research Involving Human Subjects. All experiments were performed in accordance with relevant guidelines and regulations. The inclusion criteria were as follows: (a) RT-PCR confirmed COVID-19 diagnosis; (b) chest CT imaging data were available; and (c) baseline information could be obtained. The training cohort comprised 124 patients diagnosed with COVID-19 between January 25 and July 30, 2020. We reviewed the medical record system to obtain case data, including demographics, comorbidities, physiological data, laboratory tests, and CT images from the picture archiving and communication system (PACS). We used the first laboratory examination and the first CT images obtained during each patient's hospitalization for statistical analysis and modeling. The exclusion criteria were as follows: (a) substantial motion artifacts in the CT images (n = 15); (b) small or inconspicuous lesions that could not be identified by CT (n = 24); and (c) unavailable imaging or clinical data (n = 21). In total, 60 patients were excluded from the study.
The included patients were separated into two groups according to the course of the disease: aggravation and improvement. Aggravation was defined as a composite endpoint reached when patients developed respiratory failure, acute respiratory distress syndrome (ARDS), acute liver or kidney injury, or death. Similar methods have been used in previous studies to assess the severity of serious infectious diseases 14,15 . Patients without any of these conditions were placed in the "improvement" group. In addition, another 64 cases from Provincial tuberculosis hospital were collected as an external validation cohort (the test cohort), using the same strict screening criteria described above to minimize selection bias.
Of the 188 patients described in our study, 56 were included in another published work, which was a descriptive study focusing on the clinical characteristics of patients with or without entry into the Intensive Care Unit (ICU) 16 .
CT imaging protocol. Image acquisition was performed using Philips 256iCT and GE 11800i scanners. Patients were scanned supine and head first, with continuous scanning from the top to the bottom of the lung. The chest CT protocol was as follows: tube voltage, 120 kV; tube current, 200 mA; slice thickness, 5 mm; thin-slice reconstruction to 1-2 mm with the standard algorithm. No patients received intravenous contrast medium.
Image processing and lesion segmentation. Two radiologists with 3 and 7 years' experience, respectively, in chest imaging diagnosis identified the lesions using the Dr. Wise multimodal scientific research platform (version V1.6, http://keyan.deepwise.com/), which automatically recognizes and segments signs of pneumonia. The regions of interest (ROIs) were then manually reviewed and selected by the radiologists for feature extraction. Typical viral pneumonia lesions were retained, such as patchy ground-glass opacity (GGO) and interstitial changes in the early stage (especially in the peripheral zone of the lung) and multiple infiltrative GGO of both lungs or consolidation at later stages. Isolated small nodules below 3 mm, suspected simple bacterial infection, suspected old tuberculosis, and suspected tumor lesions were removed. Questionable lesions were confirmed by a third senior doctor. All the doctors were blinded to the patient's outcome during this process. The schematic diagram of lesion segmentation is illustrated in Fig. 1.
Feature extraction and modeling. For image standardization, all CT images were resampled to 1.0 × 1.0 × 1.0 mm³ voxels using B-spline interpolation, which reduces the influence of differing scanner parameters. The PyRadiomics package (Version 2.1.0, https://pyradiomics.readthedocs.io/) was used to extract image features from all labeled ROIs. The original CT images were pre-processed with wavelet and Laplacian of Gaussian transforms, and features were extracted from both the pre-processed and the original images.
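As an illustration only, the resampling and filtering choices described above could be written down as a settings dictionary in the layout of a PyRadiomics parameter file. The sketch below is not the configuration used in this study: the Laplacian of Gaussian sigma values are placeholders that the text does not report, and `resampled_shape` is a hypothetical helper that only approximates the grid size after resampling (the exact rounding used by PyRadiomics may differ).

```python
# Hypothetical extraction settings mirroring the text above.
# The LoG sigma values are NOT reported in the paper; they are placeholders.
params = {
    "setting": {
        "resampledPixelSpacing": [1.0, 1.0, 1.0],  # 1 mm isotropic voxels
        "interpolator": "sitkBSpline",             # B-spline interpolation
    },
    "imageType": {
        "Original": {},                 # unfiltered CT image
        "Wavelet": {},                  # wavelet-transformed images
        "LoG": {"sigma": [1.0, 3.0]},   # assumed sigma values, in mm
    },
}


def resampled_shape(size, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Approximate voxel grid size after resampling to new_spacing (mm).

    size: number of voxels along each axis in the original image.
    spacing: original voxel spacing (mm) along each axis.
    """
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(size, spacing, new_spacing)
    )


# Example: a 512 x 512 x 60 volume with an assumed 0.7 mm in-plane spacing
# and the 5 mm slice thickness of the protocol.
shape_1mm = resampled_shape((512, 512, 60), (0.7, 0.7, 5.0))
```

Resampling to a common isotropic grid means that texture features computed on images from the two scanners refer to the same physical neighborhood size.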
Intra-class correlation coefficient (ICC) was calculated to evaluate the reliability and repeatability of observers. The first observer completed the segmentation of all lesions. After an interval of 14 days, 20 patients were randomly selected and segmented again to evaluate the repeatability within the observer. The second observer performed another segmentation of the above 20 patients, and the imaging features obtained by the first and second observers were compared to evaluate the repeatability between observers. The features with ICC ≥ 0.75 were considered to be reliable and stable to build models. Features were then screened by the F-test method. The radiomics models were established by fivefold cross-validation and five classical machine learning algorithms, namely Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost).
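The reliability screening described above can be sketched in a few lines. The text does not state which form of the ICC was used, so this example assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1); the function names and the toy feature values are hypothetical.

```python
def icc2_1(ratings):
    """ICC(2,1) for a table of ratings.

    ratings: list of rows, one row per patient, each row holding the value
    of one feature as measured in each segmentation (k columns).
    The choice of ICC(2,1) is an assumption; the paper does not specify it.
    """
    n = len(ratings)        # number of patients
    k = len(ratings[0])     # number of segmentations (raters or sessions)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-patient mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square

    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        return 0.0  # constant feature: treat as uninformative
    return (msr - mse) / denom


def select_stable_features(feature_ratings, threshold=0.75):
    """Keep the names of features whose ICC is at least the threshold."""
    return [
        name for name, ratings in feature_ratings.items()
        if icc2_1(ratings) >= threshold
    ]


# Toy data: two segmentations of three patients for two made-up features.
toy = {
    "feature_a": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # perfect agreement
    "feature_b": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]],  # poor agreement
}
stable = select_stable_features(toy)
```

In the study's setting, each row would hold one feature's value for one of the 20 re-segmented patients, and the same function would be applied once for the intraobserver pair and once for the interobserver pair.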
Meanwhile, we constructed the clinical model with demographic and laboratory examination results, and the combined model integrating both clinical and radiomics models. The workflow of our study is shown in Fig. 2.

Statistical analysis. Software R (Version 3.6.3) 17 was used for data analysis and figure plotting. For the analysis of measurement data, the Kolmogorov-Smirnov test was first used to evaluate normality, and the data with normal distribution and homogeneity of variance were analyzed by independent sample t-test and expressed as mean [standard deviation]. Otherwise, the data were analyzed by the Mann-Whitney U test and expressed as median [IQR]. The chi-square test and Fisher's exact probability method were used to compare categorical variables between groups. DeLong's test was used to compare the differences among the AUCs using the pROC package. Differences with P < 0.05 were considered to be statistically significant.

Decreased lymphocyte counts and increased D-dimer, C-reactive protein, and lactate dehydrogenase (LDH) levels showed a similar trend in P-value and were selected to build the clinical model, together with age and sex, which are prognostic factors that cannot be ignored in clinical practice even though they did not reach statistical significance. Comparison of the basic information between the two cohorts is presented in Table 1. The demographic and laboratory characteristics are presented and compared in detail (see Supplementary Table S1 online).

ICC test results. In this study, a total of 1218 imaging features were extracted from the segmented ROIs in the training set. Only 28 features had an ICC below 0.75 and were removed, indicating that the automatic recognition function has good stability and that the established models are reliable. Bar plots of the intra- and interobserver ICCs are shown in Supplementary Fig. S1 online.
Establishment and evaluation of the diagnostic models. We used five machine learning classifiers (LR, SVM, DT, RF, and XGBoost) to establish the radiomics models. The radiomics score, calculated as the sum of the final features' values multiplied by their corresponding regression coefficients, differed significantly between the improvement and aggravation groups in the five models (p < 0.05; Fig. 3). Nineteen features were eventually screened to establish the radiomics model, including 7 first-order features, 3 shape features, and 9 texture features (1 gldm feature, 6 glszm features, and 2 glrlm features). Details of the selected features in the LR model are shown in Supplementary Table S2. The statistical efficacy of the different models in the test set is presented in Table 2. The precision-recall graph of the LR radiomics model is shown in Supplementary Fig. S4 online. According to DeLong's test, there were no significant differences among the radiomics, clinical and combined models in the test set as shown in Table S3. Predictive efficacy was also compared within the radiomics models (see Supplementary Table S4 online).
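The radiomics score described above is a coefficient-weighted sum of feature values, which can be illustrated as follows. This is a minimal sketch: the feature names and coefficient values are invented for the example, and the optional intercept term is an assumption, since the text mentions only the coefficients and the feature values.

```python
def radiomics_score(coefficients, features, intercept=0.0):
    """Weighted sum of feature values.

    coefficients: dict mapping feature name -> regression coefficient.
    features: dict mapping feature name -> value for one patient.
    intercept: optional constant term (an assumption, not stated in the text).
    """
    return intercept + sum(
        coef * features[name] for name, coef in coefficients.items()
    )


# Invented coefficients and feature values, for illustration only.
coefs = {"firstorder_example": 0.5, "glszm_example": -2.0}
patient = {"firstorder_example": 4.0, "glszm_example": 1.0}

score = radiomics_score(coefs, patient)                   # 0.5*4.0 - 2.0*1.0
score_with_intercept = radiomics_score(coefs, patient, intercept=1.0)
```

In a logistic regression model, such a score would typically be passed through the logistic function to obtain a predicted probability of aggravation.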
The overfitting evaluation of the models in the external validation showed no statistically significant differences in AUC values between the training set and the test cohort (Table 3). The decision curves indicated that the models have good clinical application value 18 . The gray line is the net benefit of assuming that all patients were aggravated; the black line is the net benefit of assuming that no patients were aggravated; and the green, pink, and red lines are the expected net benefits based on the predictive models, with the combined model (red line) showing the highest net benefit (Fig. 5).
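For readers unfamiliar with decision curves, the net benefit plotted on such a curve can be computed with the standard formula, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The sketch below uses that general formula with made-up labels and probabilities; it is not the code used to produce Fig. 5.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    y_true: list of 0/1 outcomes (1 = aggravation).
    y_prob: list of predicted probabilities of aggravation.
    threshold: threshold probability, strictly between 0 and 1.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of assuming that every patient will aggravate."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def net_benefit_treat_none():
    """Net benefit of assuming that no patient will aggravate."""
    return 0.0


# Made-up outcomes and predicted probabilities for four patients.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.1]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
```

A model is useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison the curves in a decision-curve plot make across a range of thresholds.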

Discussion
In this study, we comprehensively explored the clinical and radiomics characteristics of patients with COVID-19, focusing on establishing models to predict the course of the disease. The results showed that radiomics features from the first CT scan, together with clinical factors, can help predict whether the disease will aggravate or improve, supporting timely clinical decisions.
Due to advancements in imaging technology, diagnosis is developing from qualitative to quantitative, which can provide an objective evaluation of the heterogeneity of lesions 19 . Radiomics was initially used in the field of oncology, where changes in voxels (reflected in changes of radiomics features such as intensity, shape, texture, or wavelet) can predict the response to a given treatment 20 and the survival time of the patient 21 ; however, there are only a few radiomics studies in pneumonia. Rivka et al. explored a radiomics model to predict pneumonitis induced by immunotherapy 22 . A review by Chumbita et al. highlighted recently published artificial intelligence approaches being used to support clinical decision-making processes in pneumonia 23 . With regard to COVID-19, Chen et al. 24 constructed a radiomics nomogram based on CT images to predict disease prognosis. However, they divided patients into an absorption group and a consolidation (progression) group based only on radiological progression, rather than on the development of the patients' actual clinical condition as in our study. Furthermore, they included only 40 patients and segmented 180 ROIs, regarding each ROI as an independent case for grouping and analysis. In our study, we took the patient as the unit, not the lesion, which could minimize the overfitting caused by different ROIs from the same patient. Chao et al. 25 used holistic information containing imaging and clinical data for COVID-19 outcome prediction. Their grouping standard (endpoint) was the need for ICU admission. However, the decision on whether a patient requires admission to the ICU is determined by many uncertain factors and is a relatively subjective judgement. Our endpoint is relatively objective, based on whether the patient actually developed serious complications or died.
The clinical and laboratory characteristics of COVID-19 patients also play an important role in predicting the development of the disease. A study of critically ill patients with COVID-19 by Yang et al. 26 showed a worse prognosis in older patients: non-survivors were older than survivors (64.6 vs 51.9 years old). Cho et al. reported that the mortality rate of men is higher, especially for patients aged 50-64 or ≥ 65 years 27 . In our study, there were no significant differences in demographic characteristics, which may be due to the small sample size. With regard to laboratory examination results, we observed a decrease in the number of lymphocytes and the content of hemoglobin, while the levels of C-reactive protein, D-dimer, and LDH were significantly increased. Elevated levels of C-reactive protein reflect an active inflammatory response in the body. According to previous studies, increased D-dimer levels indicate an increased risk of thrombosis, embolism, and disseminated intravascular coagulation (DIC), and are strongly associated with disease progression and prognosis 28,29 . The level of LDH increased significantly in aggravated patients and was related to organ injury. Other laboratory examination results differed between the two groups, but the values were either in the normal range or the differences were not significant in the test set, so they were not included in the clinical model. Although the difference was not statistically significant according to DeLong's test, the combined model showed better performance than the clinical or radiomics models alone. Therefore, in a clinical setting, we still need to make a comprehensive judgment by considering all aspects of the patient's information, rather than just one aspect.
The results of our study can help medical personnel judge disease development promptly and take appropriate measures in the early stage, such as timely admission to the ICU for close monitoring, and early application of visceral protective drugs to avoid more serious consequences that may endanger people's lives. Further, our study may help guide the reasonable allocation of medical resources, providing a new method for the management of patients with COVID-19.
Our research has some limitations. First, due to the short period for collecting case information, the patients were not followed up after discharge, and only the disease course and prognosis during hospitalization were analyzed; post-discharge follow-up will be addressed in a subsequent study. Second, because of the retrospective study design, there is a certain bias in the selection of cases. Third, the overall number of cases is relatively small; the sample size must be expanded and the models verified by multicenter studies in the future. Fourth, this study does not include the treatment of patients during hospitalization and cannot rule out the impact of medical intervention on disease progression and thus on prognosis. Lastly, patients received no intravenous contrast medium, so acute pulmonary embolism, which may carry prognostic information, could not be assessed.
In conclusion, the predictive models we established based on clinical and radiomics factors can effectively predict the development of COVID-19 by predicting, at the early stage of hospitalization, whether the patient's condition will aggravate, so that healthcare providers can take corresponding measures in advance and improve the efficiency of clinical decision-making.

Table 3. Overfitting evaluation of the prediction models. The P-value reflects the difference between the training and test cohorts, and P < 0.05 (two-sided) was considered statistically significant. AUC area under the curve; CI confidence interval; LR logistic regression; SVM support vector machine; DT decision tree; RF random forest; XGBoost extreme gradient boosting.