Deep radiomics-based survival prediction in patients with chronic obstructive pulmonary disease

Heterogeneous clinical manifestations and progression of chronic obstructive pulmonary disease (COPD) affect patient health risk assessment, stratification, and management. Pulmonary function tests are used to diagnose and classify the severity of COPD, but they cannot fully represent the type or range of pathophysiologic abnormalities of the disease. To evaluate whether deep radiomics from chest computed tomography (CT) images can predict mortality in patients with COPD, we designed a convolutional neural network (CNN) model for extracting representative features from CT images and then performed random survival forest to predict survival in COPD patients. We trained CNN-based binary classifier based on six-minute walk distance results (> 440 m or not) and extracted high-throughput image features (i.e., deep radiomics) directly from the last fully connected layer of it. The various sizes of fully connected layers and combinations of deep features were experimented using a discovery cohort with 344 patients from the Korean Obstructive Lung Disease cohort and an external validation cohort with 102 patients from Penang General Hospital in Malaysia. In the integrative analysis of discovery and external validation cohorts, with combining 256 deep features from the coronal slice of the vertebral body and two sagittal slices of the left/right lung, deep radiomics for survival prediction achieved concordance indices of 0.8008 (95% CI, 0.7642–0.8373) and 0.7156 (95% CI, 0.7024–0.7288), respectively. Deep radiomics from CT images could be used to predict mortality in COPD patients.

www.nature.com/scientificreports/ Radiomics has been proposed to explore the correlation among medical images, other -omics, and clinical parameters, and interest in its application has been growing since it has the potential to provide significant interpretive and predictive information to support decision making. Radiomics is the process of extracting high-throughput quantitative features from radiographic images and building predictive models relating image features to genomic patterns and clinical outcomes 8 . In the past few years, a number of radiomics models have been proposed for tumor classification [9][10][11] , survival prediction 12,13 , and recurrence prediction 14,15 .
In radiomics-based analysis, high-throughput feature extraction (i.e., radiomics) is a critical task. In previous studies, most extracted features were designed by hand or explicitly. In the field of COPD, quantitative CT imaging methods have been proposed to provide more precise and reproducible estimates of the severity and distribution of emphysema and airway disease 16 . In the early days of quantitative imaging biomarkers, research was weighted toward quantification of emphysema, and there has recently been an increasing number of publications targeting the airway component of COPD, which includes direct airway parameter measurements and quantification of air trapping as functional manifestations of small airway disease 17 . Moreover, quantitative pulmonary vascular features turned out to be associated with COPD severity and emphysema extent 18 . Although the number of handcrafted features can reach tens of thousands, these features are shallow and low order. They may not fully characterize image heterogeneity and may limit the potential of radiomics models. Therefore, it is necessary to assess deeper and higher-order features that may improve the predictive performance of radiomics models.
Recently, the performance of deep learning has been intensively demonstrated in computer vision. In particular, a convolutional neural network (CNN), which uses a trainable filter bank with an extensive weight-sharing scheme, can quickly outperform state-of-the-art approaches in many computer vision tasks, including image classification and segmentation [19][20][21][22] . These deep learning-based approaches have also impressive results for CT analysis in COPD 23 .
CNN can be incorporated into current radiomics models by extracting many deep features from hidden layers 24,25 . These deep features, extracted not by feature engineering (handcrafting) but by feature learning, could contain more representative and high-level medical image information and provide more predictive patterns compared to handcrafted features. In this paper, we propose a deep feature-based radiomics model for predicting the overall survival of COPD patients.

Materials and methods
Patients. There were two groups of patients: (1) a discovery cohort with 344 patients from the Korean Obstructive Lung Disease (KOLD) cohort and (2) an external validation cohort with 102 patients from Penang General Hospital in Malaysia. The inclusion and exclusion criteria for the discovery cohort have been published previously 26 . In short, patients over 18 years of age with chronic respiratory symptoms and airflow limitations or bronchial hyperresponsiveness were included. Between June 2005 and April 2012, 344 patients with an established COPD diagnosis and available volumetric chest CT scans taken at the time of registration were enrolled. Subjects underwent PFTs within two weeks of volumetric chest CT image acquisition. Baseline clinical characteristics, PFT results, six-minute walk distance (6MWD) results, and survival information were documented for all patients. Our institutional review board approved this study, and written informed consent was obtained from all patients. The external validation cohort included patients in the chest clinic of the 1200-bed Penang General Hospital, which is part of the Asian Network of Obstructive Lung Disease 27 . All patients with stable COPD who were referred to or followed up were invited to participate. Prospective data of 112 eligible COPD subjects was available for mortality analysis. Inclusion and exclusion criteria have been published previously 28 . Of 112 patients initially eligible, 10 were excluded due to poor CT image quality. The median follow-up time was 1000 days (range, 60-1400). Quantitative CT and clinical demographic data were collected at the time of study entry. Written informed consent was obtained from all participants. Research and ethical approval was obtained from the National Research and Ethics Committee of Malaysia (NNMR-13-313-15138). All methods were performed in accordance with the relevant guidelines and regulations.
Volumetric chest data acquisition. Discovery cohort volumetric chest CT scans were obtained using 16-or 64-slice multidetector CT (MDCT) scanners produced by two different manufactures (259 CT scans using SOMATOM Sensation 16 or SOMATOM Definition AS from Siemens Healthineers AG, Bonn, Germany; 85 CT scans using Philips Brilliance 16, 40, or 64 from Philips Medical Systems, Best, Netherlands). Patients were scanned craniocaudally in the supine position during full inspiration. Routine administration of intravenous contrast media was not required for image acquisition using either type of scanner. CT scan parameters were: collimation of 16 × 0.75 mm, effective mAs of 100, kVp of 140, and pitch of 1. CT data were reconstructed at a 0.75-mm slice thickness and 0.7-mm increment using a B30f kernel for Siemens scanners and a 0.8-mm slice thickness and 0.8-mm increment using a standard reconstruction algorithm for Philips scanners.
External validation cohort volumetric chest CT scans were obtained using a 64-slice MDCT scanner (SOMATOM Sensation 64; Siemens Healthineers AG, Forchheim, Germany) at the Loh Guan Lye Specialist Centre in Penang, Malaysia. CT scans were obtained using standardized protocol from the Research Institute of Radiology of the Asan Medical Center in Seoul, South Korea 4,29 . The CT scan parameters were a collimation of 0.75 mm, effective mAs of 100, kVp of 140, and pitch of 1. Patients were scanned craniocaudally in the supine position during full inspiration. Images were reconstructed using a soft kernel (B30f; Siemens Healthineers AG) from thoracic inlet to lung base. Image quality and protocol compliance were verified by the Asan Medical Center.

Deep features extraction.
We trained a CNN model to obtain high-level representative information from medical images, and high-throughput image features (i.e., deep radiomics) were directly extracted from the last fully connected layer. The CNN performed binary classification based on 6MWD testing, one of the most impor-  30 , so 440 m was selected as the optimal threshold. Therefore, training the classifier with deep learning using 6MWD results > 440 m could not only reveal known prognostic features but also potentially identify previously unknown ones. With the value of 440 m, the discovery cohort consisted of 157 patients with 6MWD over 440 m and 187 patients with 6MWD less than 440 m, not causing data imbalance in binary classification. The baseline clinical characteristics and the Kaplan-Meier-estimated cumulative survivals of these two groups were compared in Table 1 and Fig. 1, respectively, and a CNN-based binary classifier was designed to distinguish these two groups. Due to the nature of the discovery cohort and the limited processing capabilities of existing graphical processing units, the whole slices were not available. Even if it is possible to utilize the whole slices with reduced resolution, there should be a sufficient number of datasets to be able to train them. Considering the number and mortality of our dataset, training the whole slices was difficult. So, with the advices of expert radiologists, 11 representative CT slices were selected for each patient based on predetermined anatomic landmarks and used for CNN input: (1) three coronal slices at the vertebral body, center of the tracheal carina, and superior vena cava, (2) two sagittal slices at the center of the left and right lung, and (3) six axial slices (two uppers, center and three lowers) at 2-cm intervals at the center of the tracheal carina (Fig. 2). Anatomic landmarks were selected via consensus of two thoracic radiologists (each with more than 20 years of experience) with the idea that information on the distribution of emphysema or airway changes should be necessary, and 11 CT slices in both cohorts were manually selected by an experienced research assistant, and then finally confirmed by a thoracic radiologist. In the case of axial slices, 2-cm intervals were adopted as a way of obtaining the maximum information with a small number of slices. To classify low-risk patients based on 6MWD results, the CNN was designed to have five convolution blocks and one fully connected layer (Fig. 3). Each convolution block consisted of a convolutional layer with 32 learnable filters followed by batch normalization, rectified linear unit activation, and max-pooling. To prevent overfitting, a connection dropout probability of 0.5 was added to the fully connected layer. Finally, low-risk probabilities were calculated using the softmax function. Using the same CNN architecture, 11 models were separately trained with each of the 11 selected slices and then used to extract deep features from each image. The performances of 11 models were validated using five-fold cross-validation (Supplementary Table S1). We could not verify them with the external validation cohort since there was no 6MWD information, but performance can be inferred Table 1. Baseline clinical characteristics and mortality data. Continuous variables are presented as mean ± standard deviation and categorical data are presented as the number of patients with percentages in parentheses. 'Follow-up' and '6MWD' are presented as the mean with minimum and maximum values in parentheses. 6MWD six-minute walk distance, PFT pulmonary function test, FEV 1 forced expiratory volume in 1 s, FVC forced vital capacity, GOLD Global Initiative for Chronic Obstructive Lung Disease. *Overall survival was investigated as study end-point, which is defined as the time until death from any cause. **6MWD information was only in the discovery cohort, and Supplementary Figure S2 shows the distribution of it.

Survival analysis and statistical comparison.
To predict overall survival in COPD patients, random survival forest (RSF) with deep features of each slice was performed. Our deep features were predominantly black box features so that it was difficult to effectively reduce multiple collinear and correlated predictors that could produce unstable estimates and might overfit predictions. RSF is a censored data extension of the Random Forest method, where the ensemble survival function is constructed by aggregating tree-based estimator 31 . We expected that RSF would effectively ensemble the features extracted from our censored data to predict mortality. RSF analysis was implemented using the randomForestSRC R package with default settings. RSF calculated a survival curve for each patient, grew a forest using log-rank splitting, and then averaged the results of the forest, obtaining a stable result. Using RSF, we could predict the survival and cumulative hazard function of individuals. The performance of the proposed deep radiomics-based survival prediction model was evaluated in two independent datasets: (1) the discovery cohort (KOLD 26 ) and (2) the external validation cohort (Malaysia 28 ). For a quantified comparison, we computed the concordance probability (C-index) and time-dependent area under the receiver operating characteristic (ROC) curve (AUC). C-index is the frequency of concordant pairs among all pairs of subjects and can be used to measure and compare the discriminative power of a risk prediction model. The time-dependent AUC has incorporated time dependency in AUC in time-event data for individuals instead of using the standard ROC curve approach 32,33 , dealing with censored data and yielding different values of AUC at each time point. Internal validation used ROC curves for 3-and 5-year survival, but external validation was not calculated time-dependent AUC because of its follow-up duration. The time-dependent AUC was implemented using the timeROC R package.

Results
There were two groups of patients: (1) a discovery cohort with 344 patients from KOLD cohort 26 and (2) an external validation cohort with 102 patients from Penang General Hospital in Malaysia 28 . Baseline clinical characteristics and mortality data are summarized in Table 1. Overall survival was investigated as the study end-point, which is defined as the time until death from any cause, and Supplementary Figure S1 shows the Kaplan-Meier survival curves of two cohorts.
Internal validation. Deep feature-based survival analysis was performed via five-fold cross-validation of the discovery cohort to determine which features provide more high-level medical image information and predictive patterns. In order to find the optimal number of features suitable for predicting overall survival in COPD patients, we trained the binary classifier of 6MWD test results with various sizes of fully connected layers of 128, 256, 512, 1024, and then extracted deep features using the representative 11 CT slices. Mortality prediction performance was evaluated using combinations of deep features of each slice, which were obtained by pooling (e.g., for the combination of C1 and C2 with 256 deep features, 512 deep features were used as the input to the RSF). We performed RSF for all combinations of 11 slices. Depending on the number of selected samples, the following numbers of combinations were produced, and RSF was performed for a total of 2047 combinations: 1-and 10-selection, 11 combinations; 2-and 9-selection, 55 combinations; 3-and 8-selection, 165 combinations; 4-and   www.nature.com/scientificreports/ External validation. The deep radiomics model was evaluated against an external validation cohort (102 CT scans). Top 5 combinations of deep features were used to evaluate the proposed method since they had the best results on internal validation (Table 2). In the external validation, the Rank 1 combination of the internal validation showed also the best C-index of 0.7156 (95% CI, 0.7024-0.7288).

Discussion
The current study found that a deep radiomics approach for survival prediction in COPD patients was feasible and showed acceptable performance, which was confirmed by concordant results in an external validation cohort. The deep features from the CNN model using COPD patients' chest CT data were found to be significant and independent predictors of mortality in both the discovery and external validation groups. Many methodologies using quantitative CT features for quantitative assessment of different COPD components have been studied, but few studies have adapted an integrative approach 3,34,35 . A deep radiomics approach using a CNN to extract many learned features from chest CT images may be helpful in developing clinically useful decision support models.
In this study, a chest CT-based deep radiomics approach with a CNN was used for the first time to predict survival in COPD patients. The CNN performed binary classification based on 6MWD results (> 440 m or not). In the discovery and external validation cohorts, with combining the coronal slice of the vertebral body and two sagittal slices of the left/right lung (C1 + S1 + S2, 256 × 3 deep features), deep radiomics for survival prediction achieved C-indices of 0.8008 (95% CI, 0.7642-0.8373) and 0.7156 (95% CI, 0.7024-0.7288), respectively, and AUC for 3-and 5-year survival was and 0.8878 (95% CI, 0.7900-0.9856) and 0.8411 (95% CI, 0.7901-0.8922), respectively. Comparing performances of the best combination to using all slices for each number of deep features (Supplementary Table S2), it is meaningful to use slices selectively. Among studies using quantitative CT features as a predictor of survival in COPD patients [36][37][38] , Cho et al. 38 reported the performance using the same datasets of ours. Five features were selected as the final radiomics signature; (1) a percentage of low attenuation area; (2) airway wall thickness of 6th generation bronchus at an internal perimeter of 10 mm; (3) heterogeneity of percentage wall area; (4) heterogeneity of airway wall thickness at an internal perimeter of 10 mm; (5) average pulmonary vessel cross-sectional area measured at 18 mm from the pleural surface. C-indices of five final radiomics signature were 0.699, 0.531, 0.615, 0.542, and 0.605, respectively, and the combinations of radiomics signature were 0.774. In the same datasets, our deep feature-based survival prediction model outperformed compared to the quantitative CT features. Moll et al. 37 proposed a survival prediction model using a combination of clinical and quantitative CT features which reported a C-index ≥ 0.7 and showed 6MWD as the most important predictor. In our discovery cohort, a prediction model with only 6MWD achieved C-index of 0.6072 (95% CI, 0.6014-0.6130), and our model surpassed it. The mortality prediction performance of our model was externally validated in a separate group of patients. Patient enrollment for the discovery cohort began much earlier (June 2005) than for the external validation cohort (September 2013), which inevitably led to differences in follow-up duration (mean, 69.8 vs. 32.9 months). Although there were some differences in characteristics between two groups including a considerable difference in follow-up duration, external validation was performed to demonstrate the generalizability and transportability of our model.
Both spirometry and multidimensional indices are limited in that they cannot fully represent the type and range of morphologic alterations that may be detectable before functional parameters begin to deteriorate. With a deep radiomics approach, essential information related to phenotypic heterogeneity and pathophysiology may be learned from medical images and used to improve medical decision-making in COPD patients. In the current study, the deep radiomics approach was confined to survival prediction in COPD patients. However, we believe that a deep radiomics approach could potentially be applied to other facets of COPD, such as reliable phenotyping, predicting acute exacerbation, and monitoring treatment response.
The current study is subject to several limitations. First, although our deep feature-based survival prediction model has been integrally analyzed in the discovery and external validation cohorts, a larger study population would have been beneficial, especially in the external validation group. Second, because all patients included in this study were Asian, the results may not be applicable to patients of other ethnicities. However, a major strength of the current research is that the positive discovery group findings were externally validated using a group of patients of a different nationality. That said, further validation of the findings in a larger-scale study with patients of different ethnicities is warranted. Third, the proportion of males in our discovery and external validation cohorts amount to 98.3% and 93.1% respectively, resulting in the striking gender imbalance. The imbalance can partly be explained by the cultural environment where the smoking rate in men (36.7% in South Korea, and 43.0% in Malaysia) is overwhelmingly higher than that of women (7.5% in South Korea, and 1.4% in Malaysia) in general population 39,40 , and sexual difference in COPD prevalence [40][41][42] . Nevertheless, we should admit that our model is likely to be bias/specific for male lungs due to the anatomical and physiological differences between males and females. Fourth, our model requires that 11 representative CT slices should be manually selected by an expert. It can be an additional workload for the radiologists, and it has the possibility of incorrect selection of them. The development of a deep learning-based system for detecting anatomic landmarks could be helpful. Fifth, the deep radiomics features were predominantly black box features; therefore, they need to be interpreted by case review and other technical methods. Lastly, the survival prediction performance of the deep radiomics model was not directly compared with that of other various clinical risk-scoring systems such as the Body mass index, airflow Obstruction, Dyspnea, and Exercise (BODE) index, and the incremental value of the deep radiomics model was not fully investigated. In the future, evaluating the relationship between deep features and traditional lung density measurements would be useful and potential added value.
In conclusion, a deep radiomics approach for survival prediction was feasible. The performances of 20 models (top 5 combinations in each size 128, 256, 512, and 1024 of deep features) were compared, and the www.nature.com/scientificreports/ highest C-index of 0.8088 8008 (95% CI, 0.7642-0.8373) was obtained by combining 256 features each from a coronal slice and two sagittal slices (C1 + S1 + S2), as confirmed by concordant results (C-index, 0.7156; 95% CI, 0.7024-0.7288) in an external validation group. The models with 256 deep features performed superior, C1 + S1 + S2 performed best, but there is a risk of false discovery because the differences in the results of different combinations are insignificant.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.