Prognostic value of FDG-PET radiomics with machine learning in pancreatic cancer

Patients with pancreatic cancer have a poor prognosis; identifying tumor characteristics associated with prognosis is therefore important. This study aimed to investigate the utility of radiomics with machine learning using 18F-fluorodeoxyglucose (FDG)-PET in patients with pancreatic cancer. We enrolled 161 patients with pancreatic cancer who underwent pretreatment FDG-PET/CT. The area of the primary tumor was semi-automatically contoured with a threshold of 40% of the maximum standardized uptake value, and 42 PET features were extracted. To identify relevant PET parameters for predicting 1-year survival, the Gini index was measured using a random forest (RF) classifier. Twenty-three patients were censored within 1 year of follow-up, and the remaining 138 patients were used for the analysis. Among the PET parameters, 10 features showed statistical significance for predicting overall survival. Multivariate analysis using Cox hazard ratio (HR) regression revealed gray-level zone length matrix (GLZLM) gray-level non-uniformity (GLNU) as the only PET parameter showing statistical significance. In the RF model, GLZLM GLNU was the most relevant factor for predicting 1-year survival, followed by total lesion glycolysis (TLG). The combination of GLZLM GLNU and TLG stratified patients into three groups according to risk of poor prognosis. Radiomics with machine learning using FDG-PET in patients with pancreatic cancer provided useful prognostic information.

Pancreatic cancer is associated with poor prognosis 1 and is the fourth most common cause of cancer death in Japan, the USA, and Europe [2][3][4] . Despite advances in surgery, radiation therapy, and chemotherapy over the past decades, the 5-year survival rate remains less than 9% 5 . Therefore, identifying particular tumor characteristics associated with poor prognosis is important at the initial assessment. Numerous 18F-fluorodeoxyglucose (FDG)-PET reports have demonstrated the efficacy of conventional PET features such as maximum standardized uptake value (SUVmax), metabolic tumor volume (MTV), and total lesion glycolysis (TLG) for predicting therapeutic response and prognosis [6][7][8][9] . However, those conventional PET features do not represent spatial tumoral heterogeneity, which is deeply associated with cellular and molecular characteristics such as cellular proliferation and necrosis 10,11 . Texture analysis, an essential tool for "radiomics", has recently been identified as a volume-based method for quantifying tumor properties that are beyond the capability of visual interpretation or simple metrics 12,13 . Radiomics is defined as the conversion of digital medical images into high-dimensional quantitative features, enabling data to be extracted and applied to the improvement of diagnostic and prognostic accuracy. This field has increased in importance for cancer research in recent years. Radiomics offers new opportunities for developing a better understanding of oncological processes, enabling personalized therapy 6,11 . Some recent radiomics studies have used machine-learning methods such as support vector machines, neural networks, and random forest (RF) classifiers 14-16 that can improve the robustness of the statistical analysis 12 . However, few studies have explored the prognostic value of radiomics in pancreatic cancer using FDG-PET/CT with texture analysis [17][18][19][20] .
To the best of our knowledge, no study has evaluated the prognostic value of FDG-PET/CT radiomics with machine learning in pancreatic cancer.
We hypothesized that radiomics with machine learning can provide a useful combination of clinical information, volume-based PET imaging parameters, and PET texture features that provide prognostic information for

Univariate and multivariate Cox hazard regression analysis. Among the clinical characteristics, clinical stage and surgical treatment were identified as significantly important factors for predicting overall survival (Table 2). Among the PET parameters, 10 features showed statistical significance (log-rank p < 0.001) for predicting overall survival; of these, multivariate analysis with Cox HR regression revealed gray-level zone length matrix (GLZLM) gray-level non-uniformity (GLNU) as the only statistically significant PET parameter (Table 3). Kaplan-Meier curves for GLZLM GLNU are shown in Fig. 1.
Machine learning analysis. GLZLM GLNU was an independent risk factor for poor prognosis regardless of clinical stage and surgical status (Table 4). In the RF model, GLZLM GLNU was the most relevant factor for predicting 1-year survival, followed by total lesion glycolysis (TLG) (Fig. 2). The combination of GLZLM GLNU and TLG appropriately stratified patients into three groups according to risk for poor prognosis (Fig. 3). This combination was also effective in a subgroup analysis of patients who had received surgical treatment alone (Supplemental Figure S1).

Discussion
The present study appears to be the first to evaluate the prognostic value of FDG-PET radiomics with machine learning in pancreatic cancer. Among the various PET parameters, GLZLM GLNU was the most relevant feature for predicting prognosis in both multivariate analysis and machine learning analysis with RF. In addition, GLZLM GLNU combined with TLG, which was the second most important factor in the RF model, enabled stratification of patients into three groups according to their risk for poor prognosis. We selected an RF classifier for the machine-learning approach. Random forest is an ensemble approach that computes multiple decision-tree-based classifiers using implicit feature selection 21 . Although a number of studies of malignant diseases have reported the clinical implications of intratumoral heterogeneity on FDG-PET, a lack of standardization complicates the comparison of these results. In their critical review, Hatt et al. described common issues in recent studies of texture analysis such as variability of nomenclature, workflow complexity, and redundancy of features; moreover, they recommended using robust machine-learning techniques to achieve better redundancy analysis and feature selection/combination 12 . Among the various machine-learning techniques, RF can make predictions non-parametrically even when some features are collinear with others, which suggests its suitability for texture analysis. Indeed, Ahn et al. reported that an RF classifier provided higher diagnostic performance compared with other machine-learning algorithms, including support vector machine and neural network algorithms, for predicting the prognosis of lung cancer on FDG-PET 14 . The RF classifier technique shows promise for extraction of the most prognostic PET features.
In multivariate analysis, GLZLM GLNU was the only PET parameter that showed statistical significance, and it was the most important factor for predicting prognosis in the RF model, outperforming conventional FDG-PET parameters such as SUVmax and metabolic tumor volume. The gray-level zone length matrix (GLZLM, also termed gray level size zone matrix [GLSZM]) is a regional textural feature. It provides information regarding the size of homogeneous zones for each gray level in three dimensions. Gray-level non-uniformity (GLNU) is a measure of the similarity of gray-level values throughout the image 22 ; as with many other textural features, the value of GLZLM GLNU increases if the lesion is heterogeneous [23][24][25] . Intratumoral heterogeneity is associated with tumor aggressiveness, treatment response, and prognosis 12,26 . Many studies have demonstrated the clinical [17][18][19][20] . These studies were all FDG-PET-based and primarily assessed the prognostic value of intratumoral heterogeneity for predicting survival. Hyun et al. investigated the utility of texture analysis on FDG-PET in 137 patients with pancreatic cancer who underwent diverse treatment and supportive care. In time-dependent ROC curve analysis for 2-year survival prediction, entropy (a global textural feature) and heterogeneity index showed the highest AUC value (0.720), followed by TLG (AUC = 0.697) 18 . In the present study, "entropy" corresponds to "Entropy(log2)" in the Globaltextural Histogram and was ranked 23rd out of 42 features in the RF analysis (Supplemental Table S1), but direct comparisons are difficult to make because the present study deals with a larger number of features than did previous studies (36 features). Furthermore, the present study included many patients with stage 1 pancreatic cancer (45%). Although this may explain the difference in results compared with the study of Hyun et al. (no stage 1), it provides an advantage in predicting patient prognosis at an early stage. Although there are subtle differences in the feature types, the results are consistent with our findings, in that textural features reflecting intratumoral heterogeneity and the volumetric parameter TLG are the two most important prognostic factors. Intratumoral heterogeneity assessed by texture analysis and conventional volumetric PET parameters are complementary, and in combination they enable more accurate prognostic analysis in pancreatic cancer.
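To make the GLZLM GLNU feature more concrete, the following minimal Python sketch computes it for a small quantized image. It assumes the commonly used definition GLNU = (1/N_z) Σ_i (Σ_j z(i,j))², where z(i,j) is the number of zones of gray level i and size j and N_z is the total number of zones. The function names are hypothetical, and the sketch uses a 2D image with 8-connectivity for brevity; it is not the LIFEx implementation, which works on 3D volumes after gray-level resampling and may differ in detail.

```python
from collections import deque


def zone_sizes_by_level(image):
    """Group connected zones of identical gray level (2D, 8-connectivity).

    Returns a dict mapping each gray level to the list of its zone sizes.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    zones = {}
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            level = image[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            # Breadth-first flood fill over neighbours with the same gray level.
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and image[ny][nx] == level):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            zones.setdefault(level, []).append(size)
    return zones


def glzlm_glnu(image):
    """Gray-level non-uniformity of the zone matrix (assumed definition above)."""
    zones = zone_sizes_by_level(image)
    n_zones = sum(len(sizes) for sizes in zones.values())
    return sum(len(sizes) ** 2 for sizes in zones.values()) / n_zones


# Toy 3x3 image with three gray levels (illustrative values only).
toy = [[1, 1, 2],
       [1, 2, 2],
       [3, 3, 1]]
print(glzlm_glnu(toy))  # 4 zones: two of level 1, one of level 2, one of level 3 -> 1.5
```

In this toy image the zone counts per gray level are 2, 1, and 1, so the value is (4 + 1 + 1) / 4 = 1.5; a perfectly uniform image forms a single zone and yields 1.0.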
The results of the present study revealed surgical treatment as the strongest prognostic factor among the clinical features; however, this important information is not available at the point of clinical decision making. In addition, GLZLM GLNU was identified as an independent risk factor for poor prognosis regardless of surgical treatment, and high GLZLM GLNU and/or TLG were associated with worse survival both in patients who had undergone surgery and in the overall cohort. The use of these imaging biomarkers could help improve risk stratification and enhance cancer management.
Several limitations must be considered in this study. First, this was a retrospective study in which the patients had undergone various treatment protocols. All patients underwent FDG-PET prior to any treatment, but had different clinical courses. Second, this was a single-center study; nevertheless, it included a relatively large number of patients compared with previous studies. Our study results need to be validated in a prospective multi-center study with external data. Third, lesions without significant uptake were excluded from analysis. This limitation is not specific to our study, and is inevitable in appropriate texture analysis 28 .
In conclusion, radiomics with machine learning using FDG-PET in pancreatic cancer extracted factors of useful prognostic value; in particular, the combination of GLZLM GLNU and TLG appropriately stratified patients according to their risk for poor prognosis. This information could be beneficial in pretreatment clinical decision making in patients with pancreatic cancer, enabling personalized medicine such as risk-based follow-up and enhanced chemotherapy. Further prospective validation studies are required before FDG-PET radiomics with machine learning can be applied to practical clinical use.

Table S1) including conventional features (e.g., SUVmax, MTV, TLG) and global, local, and regional texture features were measured using the LIFEx package 29 . Texture features were calculated only for VOIs of ≥ 64 voxels because textural features cannot be accurately quantified for small regions 28 . All PET/CT images were assessed by two nuclear medicine physicians (M.H. and Y.T., with 12 and 10 years of experience in CT and 5 and 4 years of expertise in PET, respectively), with decisions made in consensus. In cases of disagreement, a final consensus was achieved by discussion. The study endpoint was overall survival (OS), defined as the time from pretreatment FDG-PET/CT scan to cancer-related death. Outcome data were collected from the medical records of each patient. Surviving patients were censored at the time of last clinical follow-up.
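As an illustration of the conventional features, the following Python sketch derives SUVmax, MTV, and TLG from the SUV values of a candidate region using the 40% of SUVmax threshold, and flags whether the resulting volume reaches the 64-voxel minimum for texture analysis. It is a simplified sketch, not the LIFEx workflow: the function name, the input layout (a flat list of SUVs plus a voxel volume in mL), and the use of "greater than or equal to" at the threshold are assumptions, and spatial connectivity of the contour is ignored. TLG is taken as SUVmean × MTV, its standard definition.

```python
def conventional_pet_features(suv_values, voxel_volume_ml,
                              fraction=0.40, min_voxels=64):
    """Threshold a region at a fraction of SUVmax and summarise it.

    suv_values: SUVs of the voxels in a candidate region (hypothetical input).
    voxel_volume_ml: volume of one voxel in millilitres.
    """
    suv_max = max(suv_values)
    threshold = fraction * suv_max
    # Keep only voxels at or above the threshold (assumed convention).
    inside = [v for v in suv_values if v >= threshold]
    mtv_ml = len(inside) * voxel_volume_ml
    suv_mean = sum(inside) / len(inside)
    return {
        "SUVmax": suv_max,
        "SUVmean": suv_mean,
        "MTV_ml": mtv_ml,
        "TLG": suv_mean * mtv_ml,
        "n_voxels": len(inside),
        # Texture features are computed only for VOIs of at least 64 voxels.
        "texture_eligible": len(inside) >= min_voxels,
    }


# Illustrative values only, not patient data.
features = conventional_pet_features([10.0, 8.0, 5.0, 4.5, 3.0, 1.0],
                                     voxel_volume_ml=0.5)
print(features)
```

With these toy values the threshold is 4.0, four voxels are retained, and the region is far too small to be eligible for texture analysis.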
Machine learning and statistical analysis. All statistical analyses were performed using R version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria). Kaplan-Meier analysis with the log-rank test was performed for PET parameters and clinical features. Optimal cutoff values of the PET parameters were obtained by Classification and Regression Tree (CART) analysis using the "rpart" R package. CART is a tree-building-based technique in which several predictor variables are tested to determine their impact on an outcome such as overall survival 30 . The cutoff values for age and BMI were set at 60 years and 22 kg/m², respectively, based on their clinical importance. Receiver-operating characteristic (ROC) analysis was performed to identify the optimal cutoff values for tumor markers CA19-9 and CEA. For PET parameters, the p value threshold for statistical significance was set at < 0.0012 (0.05/42) following Bonferroni correction. For the other analyses, p values < 0.05 were regarded as significant. Univariate and multivariate analyses were performed using Cox hazard ratio (HR) regression. To identify the PET parameters important for prediction of 1-year survival, the mean decrease in Gini index was evaluated using an RF classifier with the "randomForest" R package, in the population excluding patients who had been censored within 1 year. Random forest is an ensemble technique that computes multiple decision-tree-based classifiers using implicit feature selection. The Gini index is a computationally efficient approximation of entropy. It is calculated at each node split of the RF and reflects how well the data could be split into two classes at a particular node in each tree. It measures the probability that a randomly chosen sample at a node would be wrongly classified if it were labeled at random according to the class distribution at that node 21,31 .
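The two steps behind the RF importance measure, building 1-year survival labels while excluding early-censored patients and scoring a node split by its decrease in Gini index, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the "randomForest" implementation: the function names and the (follow-up months, died) input layout are hypothetical, and a death at exactly 12 months is counted as an event. In the R package, the mean decrease in Gini aggregates such per-split decreases for each variable across the trees of the forest.

```python
def one_year_labels(patients, horizon_months=12.0):
    """Map patient index -> label (1 = died within horizon, 0 = alive at horizon).

    patients: list of (follow_up_months, died) tuples (hypothetical layout).
    Patients censored before the horizon have an unknown 1-year outcome
    and are left out, mirroring the exclusion described in the text.
    """
    labels = {}
    for index, (months, died) in enumerate(patients):
        if died and months <= horizon_months:
            labels[index] = 1
        elif months >= horizon_months:
            labels[index] = 0
        # else: censored before the horizon, excluded
    return labels


def gini_impurity(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(k) / n) ** 2 for k in set(labels))


def gini_decrease(parent, left, right):
    """Impurity decrease achieved by splitting parent into left and right."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children


# Illustrative follow-up data only, not patient data.
cohort = [(3.0, True), (14.0, True), (6.0, False), (20.0, False)]
print(one_year_labels(cohort))                      # third patient is excluded
print(gini_decrease([0, 0, 1, 1], [0, 0], [1, 1]))  # perfect split of a balanced node
```

A balanced two-class node has a Gini index of 0.5, and a split that separates the classes perfectly removes all of it, so the decrease in the last line is 0.5.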
The RF classifier was optimized for the number of trees (ntree) (100, 250, 500, 750, 1000, 1500) with repeated (n = 100) tenfold cross-validation using the "caret" R package, and the optimal ntree and number of variables tried at each split (mtry) were determined (ntree = 750, mtry = 1). Using the two most relevant PET parameters from the RF model, CART analysis was performed to classify patients into subgroups according to their survival risk.
Ethical statement. This study was approved by the local Ethics Committee and was carried out in accordance with the principles of the 1964 Declaration of Helsinki.
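Returning to the tuning of the RF classifier, the repeated tenfold cross-validation scheme (100 repeats over the 138 analyzed patients) can be sketched in Python as below. This is only an illustration of how the splits are formed: the function name and seed are hypothetical, the folds are not balanced by outcome class (which "caret" may do), and no model is fitted. In an actual tuning loop, each candidate ntree value would be scored on every held-out fold and the scores averaged.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=100, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            test = order[fold::n_folds]  # every n_folds-th shuffled index
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield repeat, fold, train, test


splits = list(repeated_kfold(138))
print(len(splits))  # 100 repeats x 10 folds = 1000 train/test splits

# Within one repeat, the ten test folds partition all 138 patients.
first_repeat = [test for repeat, fold, train, test in splits if repeat == 0]
print(sorted(len(test) for test in first_repeat))
```

With 138 samples and ten folds, eight folds hold 14 patients and two hold 13, so every patient is held out exactly once per repeat.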