A novel CT-based radiomics model for predicting response and prognosis of chemoradiotherapy in esophageal squamous cell carcinoma

No clinically relevant biomarker has been identified for predicting the response of esophageal squamous cell carcinoma (ESCC) to chemoradiotherapy (CRT). Herein, we established a CT-based radiomics model with artificial intelligence (AI) to predict the response and prognosis of CRT in ESCC. A total of 44 ESCC patients (stage I-IV) were enrolled in this study and divided into training (n = 27) and validation (n = 17) cohorts. First, we extracted a total of 476 radiomics features from three-dimensional CT images of cancer lesions in the training cohort, selected 110 features associated with the CRT response by ROC analysis (AUC ≥ 0.7), and identified 12 independent features by excluding correlated features with Pearson’s correlation analysis (r ≥ 0.7). Based on the 12 features, we constructed 5 prediction models using different machine learning algorithms (Random Forest (RF), Ridge Regression, Naive Bayes, Support Vector Machine, and Artificial Neural Network models). Among these, the RF model showed the highest AUC for predicting the CRT response in the training cohort (0.99 [95%CI 0.86–1.00]) as well as in the validation cohort (0.92 [95%CI 0.71–0.99]). Additionally, Kaplan-Meier analysis of the validation cohort and of all patients showed significantly longer progression-free and overall survival in the high-prediction score group than in the low-prediction score group of the RF model. Univariate and multivariate analyses revealed that the radiomics prediction score and lymph node metastasis were independent prognostic biomarkers for CRT of ESCC. In conclusion, we have developed a CT-based radiomics model using AI, which may have the potential to predict the CRT response as well as the prognosis of ESCC patients non-invasively and cost-effectively.

ESCC, even at the palliative stage. Despite the effectiveness of CRT for ESCC, a certain population of patients who undergo CRT experience subsequent recurrence within a relatively short period [6]. The resistance to CRT is one of the major causes of treatment failure in patients with ESCC [7]. However, the molecular characterization of CRT resistance is very complex, and it is extremely challenging to identify and decode the mechanism of CRT resistance using a basic biological approach. Therefore, it is necessary to find optimal clinical biomarkers that can distinguish responding and non-responding patients with ESCC.
Radiomics is a new quantitative analysis approach to medical imaging. The information generated about a large number of image features within tumors, including their spatial and temporal heterogeneity, can be applied to create diagnostic, prognostic, and predictive models. Radiomics analysis can be performed by extracting quantitative radiomics features from multimodality medical images, such as ultrasound (US), computed tomography (CT), magnetic resonance (MR), and positron emission tomography (PET) scans [8-12]. Recently, the technology of CT analysis has enabled high-level quantitative evaluation of features and pixel-based textures for tumor characterization [13]. Furthermore, machine-learning algorithms of artificial intelligence (AI) using CT images are boosting the power of radiomics to predict treatment response and prognosis [14]. Such recent advances in radiomics technologies have opened a new era of radiomics-based biomarker discovery, which can reveal in-depth tumor characterization. In addition, the availability of large amounts of medical data, together with advanced computerized image analysis approaches with AI, has paved a new path for the identification of more precise and robust biomarkers.
Therefore, in this study, using a systematic and comprehensive biomarker discovery process with AI, we compared CT-based radiomics features of ESCC between responders and non-responders, established a novel, non-invasive, radiomics prediction model, and then validated the model in an additional cohort. Moreover, using Kaplan-Meier survival analysis, we evaluated the model's performance in predicting the prognosis of CRT in ESCC patients. Finally, we performed univariate and multivariate Cox regression analyses to show the superiority of the AI-based radiomics model.

Patients and study design
This was a retrospective single-institution study at Tokushima University Hospital (Tokushima, Japan). We consecutively enrolled a total of 50 patients with pathologically proven ESCC who underwent CRT as first-line treatment from February 2009 to September 2019, and generated datasets on February 24, 2022. Of these patients, 6 were excluded from this study due to lack of clinical information such as accurate survival time, and the remaining 44 were analyzed. Among the 44 patients, 27 were admitted and received CRT in the Department of Gastroenterology and Oncology, and 17 were in the Department of Thoracic, Endocrine Surgery and Oncology of Tokushima University Hospital. Because the ratio of training to validation cohort sizes is reported to be approximately 6:4 [15], we used the former as a training cohort and the latter as a validation cohort.
To identify a novel CT-based radiomics model associated with the CRT response in ESCC patients, we designed this study in 3 phases: a discovery phase for the selection of candidate radiomics features and construction of the prediction model, a validation phase with an independent CRT clinical validation cohort to assess the performance of the radiomics prediction score as a CRT response marker, and a development phase with the validation cohort (n = 17) and all enrolled CRT patients (n = 44) to assess and advance our CRT response marker as a prognosis-prediction marker as well (Fig. 1).
Cancer staging was performed according to the Union for International Cancer Control TNM staging system (8th Edition). All patients underwent esophageal endoscopy for endoscopic and histological evaluation of the effect of CRT 1 month after completion of CRT. The median follow-up time of all the patients analyzed was 63.4 months (95%CI 46.5-80.2). This study was conducted in accordance with the Declaration of Helsinki and approved by the ethics committee of Tokushima University Hospital. Informed consent was obtained from all patients prior to the collection of any data.

CT imaging protocol
All patients were examined using a 16-detector row Aquilion LB model CT scanner (Toshiba, Tokyo, Japan). The CT scanning parameters included a tube voltage of 120 kV, tube current auto, pixel size 0.976 × 0.976 mm², and slice thickness 2.5 mm. All raw data were reconstructed with a 0.625 mm section thickness for the routine axial CT images. No patient received intravenous contrast medium.

Treatment evaluation
A responder to CRT was defined as 'complete response (CR) of primary lesion in the radiation field' maintained for more than 1 year. Evaluation of the CRT response was performed 1 month after the completion of CRT by CT scan and endoscopy. CT scans were then taken every 3-6 months for 2 years, and approximately every 6 months thereafter in all patients. According to the criteria from the 11th edition of the Esophageal Cancer Handling Regulations, endoscopic primary lesions were evaluated as follows: (1) all endoscopic findings suggestive of neoplastic lesions have disappeared, (2) no cancer is detected pathologically by endoscopic biopsy of the primary lesion that was present before treatment, (3) the entire esophagus can be observed by endoscopy, (4) there are no endoscopic findings suggestive of active esophagitis (no swelling alteration, no white moss). A complete response was achieved when all of the above findings (1) to (4) were satisfied [16]. Patients who did not meet the response definition were categorized as non-responders.

Feature extraction
We extracted radiomics features from each pretreatment CT image. A schematic illustration of the process of extracting radiomics features is shown in Supplementary Fig. S1. First, the volume of interest (VOI), which is equivalent to gross tumor volume (GTV) in the treatment planning of radiotherapy, was manually delineated by the same radiologist (T.K.) to mitigate intra-observer delineation variability. The VOI was set on the three-dimensional (3D) CT image for all patients, and then 8 features depending only on the shape and size of the VOIs were extracted. A 3D wavelet transform was applied to each CT dataset to decompose it into 8 components for extraction of the histogram and texture features. All decomposed images as well as the original image were resampled isotropically with a 2-mm scale and were requantized with a 25-Hounsfield unit bin size. Then, 10 × 9 (90) histogram features were extracted from the 8 component images and the original image. Similarly, 42 × 9 (378) texture features were extracted. Thus, each case had 476 features (8 + 90 + 378) extracted from the original and wavelet-filtered images. For the feature extraction process, we modified MATLAB programming tools for radiomics feature extraction [17,18].
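The requantization step and the feature count can be illustrated with a minimal Python sketch. It is not the MATLAB code used in the study: the function name `requantize`, the choice to anchor the first bin at the lowest value in the VOI, and the sample Hounsfield values are all assumptions made for illustration.

```python
import math

def requantize(hu_values, bin_width=25.0):
    """Map Hounsfield-unit values to integer gray levels with fixed-width bins.

    Assumption: the first bin starts at the lowest value in the VOI.
    Other toolkits anchor the bins differently.
    """
    lowest = min(hu_values)
    # Level 1 covers [lowest, lowest + bin_width), level 2 the next bin, and so on.
    return [int(math.floor((v - lowest) / bin_width)) + 1 for v in hu_values]

# Feature count described in the text: 8 shape features, plus 10 histogram and
# 42 texture features on each of 9 images (original + 8 wavelet components).
N_SHAPE, N_HISTOGRAM, N_TEXTURE = 8, 10, 42
N_IMAGES = 1 + 8
n_features = N_SHAPE + (N_HISTOGRAM + N_TEXTURE) * N_IMAGES

# Hypothetical voxel values in Hounsfield units.
levels = requantize([-30.0, -5.0, 0.0, 19.9, 45.0])
print(n_features, levels)
```

A fixed bin width, as opposed to a fixed number of bins, keeps the gray levels comparable across patients because each level always spans the same 25-HU interval.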

Feature selection
In the discovery phase, we selected candidate radiomics features from the CT images that were associated with the CRT response in the training cohort. We calculated the AUC value of each feature for response by receiver operating characteristic (ROC) analysis using c-statistics, and selected radiomics features that were significantly associated with responders (AUC ≥ 0.7, p < 0.05). Furthermore, to avoid redundancy among the selected features, we used Pearson's correlation coefficient analysis and limited the feature space by discarding features that were highly correlated with the others. In this study, we used r ≥ 0.7 (p < 0.05) as the threshold value for the pairwise correlation [19-21].
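The two-stage selection described above can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the function names, the toy feature values, the greedy rule of keeping the highest-AUC member of each correlated group, and the use of |r| in place of r are assumptions, and the p < 0.05 criteria are omitted.

```python
import math

def auc(scores, labels):
    """c-statistic: chance that a responder scores higher than a non-responder (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_features(features, labels, auc_min=0.7, r_max=0.7):
    """Keep features with AUC >= auc_min, then drop any that correlate
    (|r| >= r_max) with an already kept, higher-AUC feature."""
    ranked = sorted(((auc(v, labels), name) for name, v in features.items()),
                    key=lambda item: (-item[0], item[1]))
    kept = []
    for value, name in ranked:
        if value < auc_min:
            continue
        if all(abs(pearson_r(features[name], features[k])) < r_max for k in kept):
            kept.append(name)
    return kept

# Toy data: 3 responders (1) and 3 non-responders (0); feature names are hypothetical.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "f1": [5, 6, 7, 1, 2, 3],      # discriminative
    "f2": [10, 12, 14, 2, 4, 6],   # redundant with f1 (perfectly correlated)
    "f3": [1, 2, 3, 3, 2, 1],      # uninformative
    "f4": [3, 1, 3, 2, 1, 1],      # discriminative, weakly correlated with f1
}
selected = select_features(features, labels)
print(selected)
```

In this toy example `f2` is discarded as redundant with `f1` and `f3` fails the AUC threshold, so one representative per correlated group survives.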

Machine learning
We applied 5 commonly used machine learning algorithms to obtain the best predictive model. These machine learning algorithms, namely the Random Forest (RF), Naive Bayes (NB), Ridge Regression (RR), Artificial Neural Network (ANN), and Support Vector Machine (SVM) models, were compared based on ROC curves, and the best-performing prediction model was selected [22-24]. In the validation phase, we evaluated the models constructed in the discovery phase to discriminate between responders and non-responders by ROC analysis for the validation cohort.

Prognosis analysis
In the development phase, to evaluate whether our radiomics model is able to predict prognosis as well, Kaplan-Meier analysis was performed comparing progression-free survival (PFS) and overall survival (OS) between the high-prediction score group and low-prediction score group of the RF model. The data of all the patients (n = 44) were used and a p value was calculated by log-rank test. PFS was defined as the time from the date of CRT initiation to the date of first radiologic confirmation of tumor progression or death from any cause. OS was defined as the time from the date of CRT initiation to the date of death due to any cause. The follow-up endpoint was set at February 24, 2022. To find possible factors associated with PFS, we used the Cox proportional hazards model for univariate and multivariate analyses.
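For readers unfamiliar with the Kaplan-Meier estimator, the following Python sketch shows how a survival curve and a median survival time are obtained from follow-up times with censoring. The function names and the toy follow-up data are hypothetical, and the study itself used dedicated statistical software rather than this code.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times:  follow-up durations (e.g. months from CRT initiation)
    events: 1 if the event (progression or death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t (events and censored).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical follow-up data in months for six patients.
times = [2, 3, 3, 5, 8, 12]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve, median_survival(curve))
```

Censored patients contribute to the number at risk up to their last follow-up but never lower the curve themselves, which is what distinguishes this estimator from a simple proportion of survivors.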

Statistical analysis
The CR rate of CRT for esophageal squamous carcinoma patients was expected to be 29.6%, according to a previous study [25]. Assuming that the AUC value of our radiomics algorithm for CRT response is 0.9, the sample size (validation cohort) was calculated to be 17, with 80% power and 5% significance level, as determined using Medcalc statistical software. In general, the ratio of validation to training cohort sample sizes should reportedly be 4:6 [15]. Therefore, we set the validation and training cohort sample sizes as 17 and 27, totaling 44 patients. Statistical differences were analyzed using the χ² test, Fisher exact test, or Student t-test. All statistical analyses were performed using R software version 4.0.3, Medcalc statistical software (v.12.7.7., Medcalc Software bvba, Ostend, Belgium), GraphPad Prism version 9.0 (GraphPad Software, San Diego, CA), and JMP software (10.0.2., SAS Institute, Cary, NC). Pearson's correlation coefficient (r) was used to evaluate the linear relationship between 2 variables. For time-to-event analysis, survival estimates were calculated using Kaplan-Meier analysis, and groups were compared by log-rank test. ROC curves were established to discriminate between CRT responders and non-responders, and Youden's index was used to determine the optimal cutoff threshold of the prediction score for predicting the CRT response. The prediction score was calculated using the RF model, as described in "Supplementary methods". According to this formula, a higher score is more likely to show a better response, whereas a lower score is more likely to show a poorer response. The AUCs were compared using DeLong's test. All p values were 2-sided, and those less than 0.05 were considered statistically significant.
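Youden's index, J = sensitivity + specificity − 1, is maximized over candidate cutoffs to split patients into high- and low-prediction score groups. The Python sketch below illustrates the idea on made-up scores; the function name, the convention that a score at or above the cutoff counts as a predicted responder, and the data are assumptions rather than details taken from the study.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A score >= cutoff is treated as a predicted responder (assumed convention).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical prediction scores for 3 responders (1) and 4 non-responders (0).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
cutoff, j = youden_cutoff(scores, labels)
print(cutoff, round(j, 3))
```

Because J weights sensitivity and specificity equally, the selected cutoff can shift noticeably in small cohorts when a single patient changes group.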

Radiomics features associated with CRT responders
To select the optimal predictive radiomics features associated with the tumor response to CRT, we calculated AUCs for each of the 476 features in the training cohort of ESCC patients. We selected 110 radiomics features with AUCs of 0.7 or more. We then calculated correlation coefficients among those features, and grouped features with a correlation coefficient (r ≥ 0.7) into 12 groups. The 12 groups and their constituent features are shown in Supplementary Table S1 [...] although NB and RR did not show any significant difference. Thus, the RF model showed the highest performance (AUC 0.92; accuracy, 82.4%; sensitivity, 83.3%; specificity, 90.0%) for the validation cohort. All the prediction scores of the RF model in the validation cohort are shown in Supplementary Table S4. The NB and RR models also showed high AUC values of more than 0.8.

Survival analysis
Since the RF model had the highest prediction performance in the validation phase, we performed survival analyses comparing the high-prediction score group and low-prediction score group in the RF model. In all patients, the PFS in the high-prediction score group was significantly longer than that in the low-prediction score group (55.6 vs 5.9 months; HR:0.25 [95%CI 0.11-0.52]; p < 0.001) (Fig. 3A). Similarly, the OS in the high-prediction score group was significantly longer than that in the low-prediction score group (100.4 vs 13.4 months; HR:0.26 [95%CI 0.10-0.57]; p < 0.001) (Fig. 3B). Univariate and multivariate Cox regression analyses associated with PFS and OS are shown in Tables 2 and 3. The T stage, lymph node metastasis, and radiomics prediction score were significantly associated with both PFS and OS in the univariate analysis. Furthermore, multivariate analysis revealed significant differences in lymph node metastasis (HR:0.41 [95%CI 0.19-0.83]; p = 0.013) and radiomics prediction score (HR:0.35 [95%CI 0.14-0.77]; p = 0.009) in Table 2, and the T stage (HR:0.26 [95%CI 0.06-0.79]; p = 0.014), lymph node metastasis (HR:0.34 [95%CI 0.15-0.70]; p = 0.003) and radiomics prediction score (HR:0.44 [95%CI 0.17-0.98]; p = 0.056) in Table 3. Similar results were obtained in Kaplan-Meier analysis, and univariate and multivariate analyses in the validation cohort (Supplementary Fig. S2, Tables S5 and S6). Thus, the radiomics prediction score was shown to be an important prognostic factor for ESCC patients treated with CRT.
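The p values for the group comparisons above come from the log-rank test named in the methods. As a rough illustration of what that test computes, the Python sketch below implements the standard two-group log-rank statistic; the function name and the follow-up data are hypothetical, and the hazard ratios reported above come from Cox regression, which this sketch does not cover.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                 # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)    # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e == 1)      # events at t, both groups
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (1.0 - n_a / n) * (n - d) / (n - 1)
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value

# Hypothetical follow-up (months): a high-score group and a low-score group.
high_times, high_events = [10, 12, 15, 20], [0, 1, 0, 0]
low_times, low_events = [2, 3, 5, 8], [1, 1, 1, 1]
chi_square, p_value = logrank_test(high_times, high_events, low_times, low_events)
print(round(chi_square, 2), p_value < 0.05)
```

The statistic compares, at every event time, the number of events observed in one group with the number expected if both groups shared the same hazard.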

Discussion
In this study, we performed a comprehensive CT-based radiomics analysis to identify candidate features for CRT response from 27 ESCC patients in a training cohort and subsequently identified 12 radiomics features for the CRT response. In addition, we developed a radiomics prediction model for the CRT response with 5 commonly used machine learning algorithms. Thus, we were able to validate high diagnostic performance of the model using another independent CRT cohort of 17 ESCC patients. Furthermore, we expanded survival evaluation and showed a prognostic ability to predict PFS as well as OS. This is the first study proposing a CT-based radiomics model associated with high initial response as well as long-term response after CRT in ESCC patients. Notably, we showed that the radiomics prediction score had superior survival predictability compared with serum SCC-Ag, the most commonly used conventional clinical serological marker for ESCC.
In previous studies, the CRT response was evaluated only a few months after treatment, and radiomics features were analyzed based on such short-term responses because most patients in these studies underwent surgical resection [19,21]. However, in the present study, we defined a CRT responder as 'CR of primary lesion in the radiation field maintained for more than 1 year'. Consequently, our model could successfully predict the long-term response after CRT (the median PFS time, 55.6 months). Furthermore, though our study included a variety of patients from early stage to palliative stage and with both resectable and unresectable cancers, most previous studies analyzed only patients who underwent neoadjuvant CRT, i.e., resectable patients. Owing to our systematic and comprehensive biomarker approach using the medical data of a total of 44 patients, our radiomics model provided greater predictability and higher diagnostic accuracy (AUC: 0.92, p < 0.001) in comparison with these previous studies [26-30]. Furthermore, the greatest strength of our study is that our radiomics model could predict not only the response to CRT but also the prognosis of ESCC patients who received CRT.
Among the 5 machine learning algorithms (RF, NB, RR, ANN, SVM) used in this study, all the models were predictive with high accuracy rates, especially the RF, NB, and RR models. Our data suggest that the 12 selected features can appropriately predict the CRT response. In particular, the RF model showed the best performance compared with the other models. The RF algorithm uses a number of decision trees and predicts more accurately by averaging their outputs in the case of regression and by voting in the case of classification [31]. The RF algorithm can also be used with a wide range of sample sizes including small sample sizes. The characteristics of the RF algorithm may be suitable for the analysis of our data from a relatively small sample size consisting of a wide range of stages (Stage I-IV).
A limitation of this study is that the sample size was comparatively small and that it was a single-institution retrospective analysis, although radiomics studies with small sample sizes at a single institution, similar to our study, have been reported [28,32,33]. Therefore, large multicenter and prospective cohort studies are needed to optimize the generality, robustness, and clinical usefulness of our model. Another limitation is that inter-observer consistency was not evaluated in this study. Intraclass correlation coefficient analysis for this model should be performed in the future.
In conclusion, we used a comprehensive biomarker discovery process with 2 independent clinical cohorts to develop and validate a novel CT-based radiomics model for the prediction of the response to CRT as well as the prognosis of ESCC patients after CRT. Our radiomics model of RF using 12 radiomics features, which is clinically useful, cost-free, and non-invasive, may have the potential to contribute to more effective treatment strategies as a promising and personalized decision-making tool to decrease ESCC mortality.

Figure 1. Study design for the identification and validation of the CT-based radiomics model for predicting response to and survival following CRT in ESCC. Among 50 patients with esophageal squamous cell carcinoma (ESCC) who underwent CRT, we excluded 6 patients from this study due to lack of clinical information such as accurate survival time. Ultimately, 44 patients were enrolled. Radiomics features were extracted and selected from the CT images for each patient. We created 5 machine learning models using radiomics features from the training cohort (n = 27) and selected the best model using ROC analysis. We evaluated the best-performing prediction model using a validation cohort (n = 17). Survival analysis was performed using the validation cohort (n = 17) and all cases (n = 44).

Figure 2. ROC curves plotted by prediction models for each machine learning algorithm. The diagnostic abilities of 5 machine learning models (RF, NB, RR, ANN, and SVM) were evaluated using ROC curves in the training (A) and validation (B) cohorts. A. Among the 5 machine learning models, the RF model exhibited the highest AUC (0.99 [95%CI 0.86-1.00]), although the differences among the 5 models were not significant. B. The RF model showed the highest AUC (0.92 [95%CI 0.71-0.99]), which was significantly higher compared with ANN and SVM by DeLong's test (p < 0.05). The NB and RR models did not show any significant difference compared with any of the other models.

Figure 3. Kaplan-Meier analysis of PFS and OS comparing high- and low-prediction score groups of ESCCs in the RF model. All patients (n = 44) were analyzed using the RF model, categorized as high- or low-prediction score groups, and Kaplan-Meier curves were drawn. A. Kaplan-Meier curves of PFS. The median PFS in the high-prediction score group was significantly longer than that in the low-prediction score group (55.6 vs 5.9 months; HR:0.25 [95%CI 0.11-0.52]; p < 0.001). B. Kaplan-Meier curves of OS. The median OS in the high-prediction score group was significantly longer than that in the low-prediction score group (100.4 vs 13.4 months; HR:0.26 [95%CI 0.10-0.57]; p < 0.001).
The clinical characteristics of the patients are shown in Table 1. A total of 44 patients were enrolled in this study, including 27 in the training cohort and 17 in the validation cohort. The mean ages were 73.4 years (range 56-96 years) and 68.6 years (range 47-88 years), respectively. A majority of the patients were male (92.6% and 76.5%, respectively). Most patients were T3/4 (85.1% and 70.6%, respectively). In both groups, most patients had clinical stage IV disease without metastatic lesions (M0), namely locally advanced lesions. There were 6 (22.2%) CRT responders in the training cohort and 6 (35.3%) in the validation cohort. No significant difference was observed in any of the factors between the 2 groups.

Table 2. Univariate and multivariate analyses of possible factors associated with PFS. PFS, progression-free survival; HR, hazard ratio; SCC, squamous cell carcinoma. Significant values are in bold.

Table 3. Univariate and multivariate analyses of possible factors associated with OS. Significant values are in bold.