Peritumoral and intratumoral radiomic features predict survival outcomes among patients diagnosed in lung cancer screening

The National Lung Screening Trial (NLST) demonstrated that screening with low-dose computed tomography (LDCT) is associated with a 20% reduction in lung cancer mortality. One potential limitation of LDCT screening is overdiagnosis of slow growing and indolent cancers. In this study, peritumoral and intratumoral radiomics was used to identify a vulnerable subset of lung patients associated with poor survival outcomes. Incident lung cancer patients from the NLST were split into training and test cohorts and an external cohort of non-screen detected adenocarcinomas was used for further validation. After removing redundant and non-reproducible radiomics features, backward elimination analyses identified a single model which was subjected to Classification and Regression Tree to stratify patients into three risk-groups based on two radiomics features (NGTDM Busyness and Statistical Root Mean Square [RMS]). The final model was validated in the test cohort and the cohort of non-screen detected adenocarcinomas. Using a radio-genomics dataset, Statistical RMS was significantly associated with FOXF2 gene by both correlation and two-group analyses. Our rigorous approach generated a novel radiomics model that identified a vulnerable high-risk group of early stage patients associated with poor outcomes. These patients may require aggressive follow-up and/or adjuvant therapy to mitigate their poor outcomes.


Radiomic analyses.
A total of 109 peritumoral features were extracted, of which 56 were found to be both stable and reproducible, and a total of 155 intratumoral features were extracted, of which 35 were stable and reproducible. Therefore, a total of 91 stable and reproducible radiomics features (peritumoral and intratumoral) were subjected to univariable analysis. In univariable analyses, 40 of the 91 radiomic features (26 peritumoral and 14 intratumoral) were significantly associated with OS in the training cohort (Supplemental Table 1) and 30 of the 40 features were eliminated because they were correlated. The 10 remaining features were reduced to four highly informative features using backward elimination (Supplemental Table 2). Among the four features, three were peritumoral (average co-occurrence joint entropy, NGTDM busyness, and average co-occurrence angular second moment) and one was intratumoral (statistical root mean square).
Classification and regression tree (CART) analysis. The four remaining radiomic features were subjected to Classification And Regression Tree (CART) analysis in the training cohort and based on 2 radiomic features (NGTDM Busyness and Statistical Root Mean Square). CART analysis classified patients into four risk groups: low-risk, intermediate-risk-1, intermediate-risk-2, and high-risk (Supplemental Fig. 1). The four risk groups were reduced to three risk groups by combining the two intermediate-risk groups (Fig. 1B) and 3 risk groups from the CART model was replicated in the test cohort.
Patient characteristics of the three risk groups in the total data set (N = 234). There were no statistically significant differences between the three risk groups by age, smoking status, number of pack-years smoked, self-reported COPD, and family history of lung cancer, histological subtypes, and treatment (Table 2). However, there were statistically significant differences across the risk groups for sex (P = 0.04) and stage of disease (P = 0.001). Specifically, 92% of the patients in the high-risk group in were male vs. 54% in the low-risk group (P = 0.04) ( Table 2). In term of lung cancer stage, 33% of the patients in the high-risk group had lung cancer early-stage vs. 80% in the low-risk group (P = 0.001).
Validation dataset. Using non-screen detected we attempted to replicate the risk groups obtained from the CART model. Among early-stage adenocarcinoma lung cancers (Fig. 3A), the high-risk group was associated with worse OS (HR = 2.63; 56% 2.5-year and 42% 5-year OS, log-rank P = 0.112) compared to the low-risk group (HR = 1.00; 75% 2.5-year and 75% 5-year OS). Among late-stage patients (Fig. 3B), the risk groups were not associated with survival (log-rank P = 0.432).

Multivariable analyses.
Multivariable Cox regression models were used to adjust for potential confounding factors including sex, treatment, and stage. In the training cohort, the high-risk group was associated with an elevated hazard ratio (OS: HR = 9.71; 95% Confidence Interval: [3.85, 24.48 performance of the multivariable model was estimated using the Harrell's C index. The multivariable model showed a better discrimination capability with a higher C indices in the training and test cohort when analyzing OS (C-index: 0.83 and 0.81, training and test cohort respectively) compared to PFS (C-index: 0.81 and 0.80, training and test cohort respectively). Using the multivariable models, time-dependent areas under the curve (AUROC) for OS were generated and produced an AUROC of 0.878 at 2-years and 0.800 for 4-years. Among early-stage patients, the AUROC was 0.702 for 2-years and 0.669 and 4-years (Fig. 4). Similar results were found for PFS (Supplemental Fig. 3). www.nature.com/scientificreports/

Discussion
Predictive biomarkers that identify aggressive cancers from those that are either indolent, or at lower-risk of poor survival outcomes, are a critical unmet need in the lung cancer screening setting. In this study, we utilized peritumoral and intratumoral radiomic features to generate a model that is able to detect vulnerable group of screen-detected early stage lung cancer patients that have high-risk of experiencing poor survival outcomes. Specifically, we identified a model that contained two radiomics features, one peritumoral and one intratumoral, which stratified patients into three risk-groups: Radiomics is a non-invasive approach that utilizes standard-of-care imaging to generate quantitative image features that can be used for risk prediction, diagnostic discrimination, prognostication, and to predict treatment response 14,15,[21][22][23] . Prior studies have shown that peritumoral features, extracted from the area surrounding the tumor parenchyma, and intratumoral features, extracted from the area within the tumor, have prognostic and predictive utility in cancers such as lung, breast, brain, gastric, and head and neck 1,24-29 . For example, Dong et al. (29) developed an individualized nomogram using radiomic features from primary tumor and from the Table 3. Multivariable Cox proportional hazards models for overall survival in the training and test cohorts. Abbreviations: HR: Hazard Ratios; NOS: Not otherwise specified; Data in parentheses are 95% CIs. 1 BAC and adenocarcinoma were combined into one group. 2 "All patients" combines the training and test sets into a single cohort.  26 . In, Xu et al. identified a radiomic score using features from the peritumoral region of hepatocellular carcinoma tumors that predicts microvascular involvement 30 . Cumulative, the evidence of our study and others have demonstrate the utility of using peritumoral features alone or in combination with intratumoral features. In this study, we identified a highly informative peritumoral feature (NGTDM busyness) and a highly informative intratumoral feature (statistical RMS). NGTDM, a texture feature, captures intensity values of a neighborhood of pixels to characterize the difference between a center voxel within the neighborhood 31,32 . NGTDM parameters are coarseness, contrast, and busyness. Coarseness describes the granularity of an image, contrast relates to the dynamic range of intensity, and busyness relate to the rate of intensity change within an image 32 . NGTDM busyness has a high predictive power in differentiating between glioblastoma and primary central nervous system lymphoma 33 . Studies found that NGTDM busyness extracted from positron emission tomography (PET) is useful for discriminating benign from malignant solid pulmonary nodules 34 . Intensity-based features are derived from image histograms which represent the intensity distribution of the image. Parmar et al. 2015 showed a correlation between intensity feature and patient survival in lung and head and neck cancer. Statistical RMS, an intensity feature, is a first-order statistic that calculates the root mean square of the voxel's intensity value 31,35 . A previous study showed that statistical RMS was able to predict pathological response after chemoradiation in non-small cell lung cancer (NSCLC), by identifying gross residual response 35 . Statistical RMS combined with other intensity statistical features was able to distinguish bladder tumor tissue from other normal tissues in fluorodeoxyglucose-positron emission tomography (FDG-PET) scan 36 .
The radiogenomics analyses revealed that the most informative intratumoral radiomic feature, RMS, was significantly associated with expression of FOXF2 and LOC285043 which is an uncharacterized gene 37 . We observed a trend for lower expression of FOXF2 in intermediate and high-risk groups versus the low-risk group. However, this trend in high and low-risk group was not statistically significant. FOXF2 is expressed in the lung and functions as an activator or inhibitor of gene transcription 38 and up-regulation of FOXF2 expression induces EMT, migration, invasion and metastasis in breast cancer 39 . To support our findings that low expression of FOXF2 is a negative prognostic factor, a prior study demonstrated that patients with stage I NSCLC who had low FOXF2 expression had significantly shorter overall survival compared to patients with high FOXF2 expression 40 . RABGAP1L was found to be significantly associated with NGTDM Busyness in the pairwise analyses and we revealed that the intermediate and high-risk groups had higher expression of RABGAP1L versus the low-risk www.nature.com/scientificreports/ group. At present, there is no known role for RABGAP1L in lung cancer. However, RABGAP1L has been shown to deregulate the tyrosine-kinase signaling pathway in acute myeloid leukemia 41 and regulates the activity of GTPases which is essential to transport cell adhesion proteins and migrating cells 42 . We acknowledge some limitations of this study. First, the sample size is somewhat modest because we utilized lung cancer cases with specific inclusion/exclusion criteria from the NLST. Then when we split the available number of cases into training and test cohorts based on a 70:30 ratio; the resulting sample sizes likely attributed to the poorly calibrated model based on its ability to predict 5-years survival outcomes in the training and test cohorts. However, we applied a rigorous feature reduction approach to eliminate correlated and non-reproducible features and utilized a backward reduction approach to identify a parsimonious model containing the most www.nature.com/scientificreports/ important features to reduce false positive findings. Although the overall population of lung cancer patients were treated heterogeneously; however, among the early stage patients, 92.74% of the patients had surgery as their only treatment. We also recognize that there were limited numbers of patients in the high-risk groups for these cohorts. Finally, our validation cohort was limited to patients with lung adenocarcinoma. Additional research is needed to validate the biological underpinnings of these features. The results from our analyses produced a parsimonious radiomic model that identified a vulnerable subset of screen-detected lung cancers that are associated with poor outcome. These findings could support more aggressive treatment and follow-up for such high-risk patients. Nonetheless, additional research will be needed to inform the potential translational implications of these findings, to fully elucidate the biology these high-risk www.nature.com/scientificreports/ screen-detected tumors, to assess whether these findings are consistent across screening trials and cohorts, and how best to personalize cancer management in these vulnerable patients.
Methods nLSt study population. Deidentified data and LDCT images were accessed from the National Cancer Institute (NCI) Cancer Data Access System (CDAS) 43 . The NLST study design and main findings have been described previously 1,28 . NLST eligibility criteria included current and former smokers aged 55-74 years with a minimum 30 pack-years smoking history and former smokers had to have quit within the past 15 years. Based on the schema described in Schabath et al. 44 , this analysis considered 321 NLST participants who had a negative or positive baseline screening (T0) result and were diagnosed with a screen-detected, incidental lung cancer on follow-up screening intervals 12 (T1) or 24 (T2) months after T0. Positive screens were defined as abnormalities on baseline screens or at follow-up screens that were new, stable or evolved. Negative screens were defined as a computerized tomography (CT) scan with no abnormalities, minor abnormalities, or significant abnormalities not suspicious for lung cancer. Among the 321 lung cancer patients, 196 had a baseline positive screen that was not diagnosed as lung cancer but evolved and diagnosed as lung cancer at T1 or T2 screening intervals. The remaining 125 lung cancer patients had a baseline negative screen result but developed a nodule that was diagnosed as lung cancer at either T1 or T2. Lung cancer patients who had multiple nodules at time of their diagnosis were excluded (N = 58) since we are unable to verify which nodule(s) were cancer. Complications with segmentation, such as a nodule attached to lung wall, led to 29 patients being excluded. The final dataset of 234 screen-detected lung cancers were randomly split into a training cohort (N = 161) and a test cohort (N = 73). non-screen detected lung cancer validation dataset. The radiomics data were further validated for OS in a prior published dataset was comprised of 62 adenocarcinoma patients who underwent surgical resection as first course therapy at the Moffitt Cancer Center and had pre-surgery CTs within 2 months prior surgery 45 . Radiogenomics dataset. A previously described dataset 46 of surgically resected adenocarcinoma lung cancers (N = 103) who had pre-surgery CTs and gene expression data was used to identify potential biological underpinnings of the final two informative radiomic features (RMS and NGTDM Busyness). The gene expression data were IRON-normalized and batch-corrected for RNA quality Pathway and Gene Ontology Enrichment using Clarivate Analytics MetaCore 46 .
Radiomics. Nodule identification and tumor segmentation is described elsewhere 15 . The tumor mask images (i.e., tumor delineations) were imported into in-house radiomic feature extraction toolboxes created in MAT-LAB® 2015b (The Mathworks Inc., Natick, Massachusetts) and C+ + (https ://isocp p.org). Using cubic interpolation, the images were resampled to a single voxel spacing of 1 mm × 1 mm × 1 mm to standardize spacing across all images. Hounsfield units (HU) in all images were resampled into fixed bin sizes of 25 HUs discretized from -1,000 to 1,000 HU.
Using standardized radiomic algorithms from the Image Biomarker Standardization Initiative (IBSI) v5 47 , a total of 264 radiomic features were extracted from semi-automatic segmented intratumoral region (n = 155) and from the peritumoral region (n = 109) 3 mm outside of tumor boundary. The peritumoral regions were generated as an extension of the tumor segmentations using morphological image processing operations as previously mentioned 48 . Peritumoral regions were bounded by a lung parenchyma mask to exclude the region of interest (ROI) outside of the lung parenchyma. Shape and size based peritumoral features were excluded as they explicitly describe (i.e., correlate) the intratumoral ROI. Only the most stable and reproducible intratumoral and peritumoral radiomic features that were previously found on another study of our group 48 were utilized for analyses to create repeatable models. Further details of the feature selection process are presented in the statistical analysis section below (Fig. 1A). Overall survival (OS) and progression-free survival (PFS) were the main end-points for the analysis and were assessed from date of lung cancer diagnosis to the date of an event or last follow up. For OS, an event was defined as death and for PFS an event was established as death or progression of cancer. All survival data were right censored at 5-years. To generate a parsimonious radiomics model, we first performed univariable analyses using Cox Proportional Hazard regression and retained features with a P value < 0.10. Among the remaining features after univariable analyses, we removed features that were correlated based on Pearson's correlation coefficient > 0.80. If two or more features were correlated based on an absolute Pearson's correlation coefficient > 0.80, the feature with the smaller p-value from the univariable analyses was retained. The remaining radiomic features were subjected to backward elimination approach using a pre-specified more stringent P value < 0.01 for inclusion. Among the remaining covariates, Classification And Regression Tree analysis (CART) was used to stratify patients into risk groups. CART is a nonparametric data-mining tool that can identify hierarchical interactions and segment covariates into novel and meaningful terminal subgroups (i.e., nodes). The hazard ratios from the risk groups were generated using Cox Proportional Hazard regression. The risk groups were also analyzed by Kaplan Meier curves and log-rank tests. The risk groups based on the most informative radiomic features in the training cohort were validated in the test cohort and further replicated in the adenocarcinoma cohort. The Harrell's concordance index (C-index) was used to evaluate the multivariable model. Time-dependent area under www.nature.com/scientificreports/ the receding operating curve (AUROC) analyses was used to assess accuracy of the Cox regression models at different time-points using R packages survival 49 , survminer 50 , and survivalROC 51 .

Statistical analysis.
Using the radiogenomics dataset, analyses were conducted to determine if the two final radiomic features (RMS and NGTDM Busyness) were associated with the gene probesets using two different approaches: correlation and two-group analysis. For the correlation analysis, gene probesets were filtered and determined as statistically significant using the following criteria: Pearson's correlation with a threshold |R|> 0.4, an expression filter with max expression of gene > 5, and an inter-quartile filter (IQR > log2 (1.2 FC)). For the two-group analyses, gene probesets were filtered and determined as significant using the following criteria based on a Student's t test p < 0.001 and mean log fold-change between high and low prognostic radiomic feature oflfc > log2 (1.4 FC). The significant probesets from the two-group analyses were intersected yielding a final list of probesets significantly associated with the most informative radiomic feature. ANOVA and Tukey pairwise mean comparison was performed to analyzed gene expression across the risk groups.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.