## Introduction

The precision medicine paradigm in healthcare decision-making has shown promise to improve cancer treatment outcomes17,18,19. Biomarkers from genomic and proteomic data derived in primary cancers of the lung, breast, ovary or brain can be used to stratify patients into different diagnostic/prognostic groups leading to more-effective treatment paths20,21,22,23,24,25. Acquiring genomic and proteomic biomarkers are, however, challenging in terms of implementation as they are invasive, not always technically feasible, and in some cases not robust as they may not represent the entirety of the tumour26,27. A reason for this observation is the fact that tumours are often spatially heterogeneous, and hard to characterize completely using core biopsy specimens28,29.

A number of recent studies have proposed quantitative imaging techniques to characterize different malignancies, including breast, lung, prostate and brain cancer, and monitor/predict their response to anti-cancer therapies30,31,32,33,34,35,36,37,38,39. To identify the quantitative imaging biomarkers with high sensitivity and specificity in such applications, computational methods have been adapted through an emerging field of research named radiomics40,41,42,43,44. The radiomic analysis often utilizes the images obtained as part of the standard of patient care, and can be implemented in the clinical routines with minimal overload. Further, the 3D volumetric images frequently applied in radiomics permits a comprehensive assessment of the entire tumour volume and surrounding regions. Recent studies have reported strong links between the radiomic signature of various tumours and their phenotype and genomic and proteomic profiles45,46,47. Aert et al. demonstrate that the computed tomography (CT) based radiomic features characterizing the spatial heterogeneity in lung and head and neck tumours are associated with underlying gene-expression patterns45. In another study, Zhu et al. indicate that in invasive breast cancer the transcriptional activities of various genetic pathways and miRNA expressions are associated with the tumour shape and texture in dynamic contrast-enhanced magnetic resonance (MR) images46. Similarly, Grossmann et al. report that the intra-tumour heterogeneity and intensity dispersion in CT images acquired from lung cancer patients are associated with the molecular characteristics of the tumour, including the activity of RNA polymerase transcription and the autodegration pathway of a ubiquitin ligase47.

A number of previous studies have investigated the efficacy of different radiomic biomarkers in cancer diagnosis, treatment response monitoring, and outcome prediction48,49,50,51,52,53. A study by Gómez Flores et al. suggests that ultrasound radiomic features can be used to characterize benign versus malignant breast lesions48. Li et al. demonstrate the potential of MR radiomic features to predict OS in Glioblastoma multiform49. Other studies have correlated radiomic biomarkers derived from positron emission tomography (PET) and CT images with loco-regional recurrence (LR), distance metastasis (DM), and OS in head and neck50, and lung cancer patients51.

In this study, quantitative MRI (qMRI) biomarkers was investigated through a radiomics analysis framework (Fig. 1) to predict the outcome of LF in brain metastasis patients treated with SRT. MR images were acquired from 100 patients (133 tumours) triaged for hypo-fractionated SRT before and within three months after the treatment, as part of our institutional standard of care. Various geometrical and textural features were extracted from gadolinium-contrast-enhanced-T1-weighted (T1w) and T2-weighted-fluid-attenuation-inversion-recovery (T2-FLAIR) images within the tumour and edema regions and the corresponding margins. A multi-step feature reduction and selection method consisting of correlation-based feature reduction, feature ranking, and forward feature selection based on bootstrap 0.632 + area under the curve (AUC) was used to construct the optimal qMRI biomarkers (consisted of four or five features). The feature selection results demonstrate that the qMRI biomarkers extracted from the edema, tumour margin and the lesion margin have more prognostic power compared to features extracted from the tumour itself. A support vector machine (SVM) classifier was used to predict the SRT outcome in terms of LC or LF. The results demonstrate a good potential of qMRI biomarkers to predict the survival-linked LC/LF in brain metastasis patients treated with SRT.

## Results

Figure 1 demonstrates the scheme of the radiomics analysis framework applied in this study to develop qMRI signatures of metastatic brain lesions treated with SRT for the LC outcome evaluation. A total of 3072 imaging features were extracted from four sub-regions of each lesion: the tumour, edema, tumour-margin, and the lesion-margin, as described in the Methods.

The heat maps of the derived qMRI features are detailed in Fig. 2. Figure 2a shows the R-squared heat map generated using the Pearson correlation coefficient obtained for each pair of the extracted features. The heat map identifies several clusters of redundant features with high levels of inter-feature correlation. An R-squared threshold of 0.8 was used to reduce clusters of highly-correlated features to a single representative feature with the largest natural dynamic range. This process reduced the number of features from 3072 to 927. A non-parametric Mann-Whitney U test was applied to quantify the level of statistical difference exhibited by each feature between the lesions with different local outcomes (LC versus LF). Figure 2b demonstrates the p-value heat maps obtained for different sub-regions of the lesion after the redundant features are eliminated. The heat maps highlight several qMRI features demonstrating a statistically significant difference between the lesions with an overall LC versus LF outcome (p-value < 0.05). Similar results were obtained for the 6-month and 12-month LC/LF outcomes (Supplementary Figures 1 and 2).

The Mann-Whitney U test was used in conjunction with a 50-fold sampling scheme to rank the features for the overall, 6-month, and 12-month LC/LF outcomes, followed by a forward feature selection with bootstrap 0.632 + AUC $$({\widehat{AUC}}_{.632+})$$. Table 1 demonstrates the optimal qMRI biomarkers obtained for each outcome. The optimal qMRI biomarker consists of five features for the overall LC/LF outcome, and four features for the 6-month and 12-month outcomes. Among the 13 distinct features selected, four features (LM-WLHL-G-MaxProb-T2, TM-WLHL-G-MaxProb-T2, LM-WHLL- G-MaxProb-T2, E-WHLH-H-Max-T2) were statistically extremely significant (p-value < 0.001), seven features (E-WHHL-H-Max_T2, T-H-Min-T2, TM-WHLH-H-Min-T1, E-MedianLBP-T1, T-MedianLBP-T1, E-WHLH-H-MeanAbsDev-T2, TM-GE-Convexity-T1) were statistically highly significant (p-value < 0.01), and two features (TM-WHLH-H-Min-T2, LM-WHLL-H-Range-T2) were statistically significant (p-value < 0.05), between the LC versus LF outcomes. The selected features mainly characterize the heterogeneity in the surrounding regions of tumour including edema, tumour-margin, and lesion-margin.

Figure 3 contrasts two representative tumours, one with LC on long-term follow up (Fig. 3a) and the other one with a LF (Fig. 3b). The parametric maps at first follow up show significantly different changes from the baseline between the tumours. Specifically, the LM-WLHL-G-MaxProb-T2, E-WHHL-H-Max_T2, T-H-Min-T2, TM-WHLH-H-Min-T2, and LM-WHLL-H-Range-T2 parametric maps demonstrate a mean change of 88% versus −40%, −52% versus 16%, −59% versus −2%, −95% versus −65%, and −54% versus 27%, respectively, for the LC vs LF case, respectively.

### Early prediction of SRT outcome

Table 2 summarizes the results of LC/LF outcome prediction using the optimal qMRI biomarkers with an SVM classifier. The table presents the prediction results in terms of $${\widehat{AUC}}_{.632+}$$ as well as the leave-one-patient-out (LOPO) cross-validated sensitivity, specificity, and accuracy. The maximum $${\widehat{AUC}}_{.632+}$$ was 0.82, using 7 features for the 6-month and overall LC/LF, and 6 features for the 12-month local outcome. The optimal qMRI biomarkers (the smallest set of features as presented in Table 1) resulted in an $${\widehat{AUC}}_{.632+}$$ of 0.80, 0.81, and 0.79 for the 6-month, 12-month and overall LC/LF, respectively. In addition, the optimal qMRI biomarkers could predict the 6-month, 12-month and overall LC/LF outcomes with a cross-validated accuracy of 80%, 82% and 80%, respectively.

### Risk assessment

The performance of the qMRI optimal biomarkers presented in Table 1 was further evaluated through long-term (five years) risk assessments in terms of the local outcome and overall survival using the Kaplan-Meier (KM) analysis. The risk assessment was performed for two separate cohorts: those predicted with a LC (cohort 1) versus LF (cohort 2) outcome by the radiomic framework. Figure 4 demonstrates the estimated long-term LC outcome of the tumours and overall survival of the patients. The plots show the KM clinical local control/survival curves obtained for the tumours/patients categorized based on the cross-validated 6-month, 12-month and overall LC/LF predicted outcomes at the first follow up using the optimal qMRI biomarkers. A statistically extremely significant difference (p-value < 0.001) was observed between the clinical local control curves of the two tumour cohorts categorized based on the optimal qMRI biomarkers of all three outcomes. Also, the survival curves obtained for the two patient cohorts categorized using the optimal qMRI biomarkers of the overall, 6-month, and 12-month outcomes demonstrated a statistically significant (p-value = 0.01), statistically highly significant (p-value < 0.01), and statistically extremely significant (p-value ≤ 0.001) difference, respectively.

## Discussion

In this study, the qMRI biomarkers derived from conventional T1w and T2-FLAIR images were investigated through a radiomics-machine learning framework to predict the local outcome in patients with brain metastasis treated with SRT. A total of 3072 geometrical and textural features were extracted from MR images of 100 patients (133 lesions) who underwent SRT for brain metastases. The features were extracted within the tumour, edema, tumour-margin, and lesion-margin before and at the first follow up after the treatment. The relative change from the baseline was calculated for each feature. The number of features was reduced to 927 using a correlation-based analysis. Subsequently, the best features were selected using Mann-Whitney U test in conjunction with a 50-fold sampling scheme, and a forward selection strategy to construct optimal biomarkers of overall, 6-month and 12-month LC/LF outcomes. The optimal qMRI biomarker consisted of 5 features for the overall LC/LF, and 4 features for the 6-month and 12-month local outcomes. All the selected features demonstrated a statistically significant, highly significant, or extremely significant difference between the lesions that had an outcome of LC versus LF.

The optimal biomarkers mainly consisted of features that characterize the heterogeneity within the edema, tumour-margin, and lesion-margin. Specifically, 11 out of the 13 distinct features in the three optimal biomarkers quantify changes in the heterogeneity in the region surrounding the tumour. These marginal regions are likely to contain malignant cells, but the number of these cells is not enough to result in an evident image contrast on standard MRI. The heterogeneity in these regions may characterize the frequency and distribution of cancerous cells and, therefore, could be linked to the LC/LF outcome of the treatment. This is in agreement with the findings of other studies in which the heterogeneity in the micro-structure of the tumour margin showed a good correlation to tumour response to therapy54,55.

Previous studies have demonstrated the prognostic power of the radiomics features extracted from ultrasound56,57,58,59,60, CT61,62, MR63,64, and pathology65 images. Such features have been reported to be predictive of different therapy outcomes (e.g. pathological response66, distant metastasis51,61, LC50, and OS45), and for various cancer sites including the lung45,61 brain67, breast48,68, and head and neck51 cancer. Whereas most previous studies have focused on the tumour itself, the results obtained from this study encourage conducting a comprehensive assessment of a larger region encompassing the entire lesion as well as its margin. Specifically, the qMRI biomarkers extracted from the edema and tumour/lesion margins demonstrated more prognostic power compared to features extracted from the tumour itself.

The results of the feature selection demonstrated a good potential of multi-wavelet filtered images for characterizing outcome-related attributes of metastatic lesions. In particular, four out of five, two out of four, and three out of four features selected in the optimal biomarkers of the overall, 6-month and 12-month LC/LF outcomes, respectively, were derived from the multi-wavelet filtered T1w or T2-FLAIR images. The multi-wavelet filtered images are obtained by filtering the original image in x, y, and z directions using the wavelet transform (Supplementary Figure 5). Such images contain the decomposed textural information of the original image in different bandwidths of spatial frequency for all directions. The results obtained in this study are in agreement with the findings of previous work that demonstrate the diagnostic and prognostic value of texture features extracted from wavelet images for tumour characterization and therapy outcome evaluation45,61,65. In this study, the features extracted from T2-FLAIR images demonstrated high prognostic power for prediction of LC/LF, where the top selected feature in all three optimal qMRI biomarkers (nine out of 13 selected features) were extracted from T2-FLAIR images. This observation further highlights the links between the characteristics of the edema micro-structure and LC in brain metastasis as T2-FLAIR images provide a better contrast for the edema region, compared to T1w images. Tumour size, SRT dose and number of fractions may impact LC/LF in brain metastasis. The average size of the tumours investigated in this study was 2.1 ± 1.1 cm, with a relatively small standard deviation. The majority of the tumours received a total dose of 30 Gy in 5 fractions. The total dose was adjusted to 25, 27.5, 32.5 or 35 Gy (in 5 fractions) for a few tumours based on their size. The LC/LF for each tumour was determined independent of the size, total dose and the number of fractions.

The optimal qMRI biomarkers were also applied to predict LF using an SVM classifier with the bootstrap 0.632 + and LOPO cross-validation methods. The optimal biomarkers could predict the overall, 6-month, and 12-month local outcomes with an $${\widehat{AUC}}_{.632+}$$ of 0.79, 0.80, and 0.80 respectively. The cross-validated accuracy of the predictive model was found to be 80% for overall and 6-month LC/LF, and 82% for the 12-month LC/LF. The performance of the qMRI optimal biomarkers was further evaluated through long-term risk assessments using the KM analysis. A statistically significant difference was observed between the clinical local control curves, and survival curves obtained for the patients with a predicted LC versus LF outcome using the optimal qMRI biomarkers. The proposed model can potentially be used for clinical risk assessment, and treatment planning of brain metastasis patients. The results of the outcome prediction and risk assessment obtained in this study for 100 patients are promising. However, investigations on larger cohorts of patients (including an independent validation cohort) are required to assess further the efficacy and robustness of the technique in the clinic.

In conclusion, this study demonstrated a good potential of qMRI biomarkers derived from conventional T1w and T2-FLAIR images to predict the local outcome in metastatic brain tumours early after the treatment with SRT. The qMRI features identified in this study were (extremely/highly) significantly different between the tumours with a LF versus LC outcome. The proposed biomarkers were also found to have a good cross-validated sensitivity and specificity to distinguish the tumours with an outcome of LF within three months following the treatment. The promising results presented in this study imply that the qMRI biomarkers, as potential early surrogates of LC/LF in brain metastasis patients, can possibly facilitate effective changes in treatment on an individual patient basis that may improve patient outcomes. Patients predicted to have LF can potentially be treated with salvage surgical or radiation treatments earlier to prevent downstream morbidity of progressive metastatic disease in the brain.

## Methods

### Study protocol and data acquisition

The radiomics framework in this study applied the images acquired at the baseline and the first follow up. The baseline images for each patient were acquired right before the radiation treatment planning, whereas the first follow-up images were acquired within three months after the treatment. The in-plane image resolution was 0.5 mm for both T1w and T2-FLAIR images. The slice thickness was 1.5 mm and 5 mm for T1w and T2-FLAIR images, respectively.

### Feature extraction

The quantitative features investigated in this study included two groups of geometrical and textural features. The geometrical features were calculated from the binary masks. The texture features were extracted from the T1w and T2-FLAIR images within the four sub-lesion masks. A total of 3072 geometrical and textural features were derived as described in detail in the supplementary information. The textural features included the first order histogram features derived from the MR images and the corresponding local binary pattern (LBP) parametric images, as well as the second order grey-level co-occurrence matrix (GLCM) features74,75. These features were extracted from both the original images and the multi-wavelet filtered images as described in the supplementary information76. The interpolated MR images with an isotropic voxel size were used to extract 3D features (GLCM and multi-wavelet features). The other 2D features were derived from the original images. All features were extracted from both the baseline and first follow up images and relative changes from the baseline at the first follow up were calculated.

### Feature reduction/selection

The optimal qMRI biomarkers for LC/LF outcome prediction were constructed using a multi-step feature reduction/selection procedure. The features were first assessed using a Pearson correlation analysis to obtain the coefficient of determination (R2) for each feature pair (Fig. 2a). Clusters of highly correlated features were identified from the R-squared matrix using a threshold of R2 = 0.8. Next, in each cluster of correlated features, the one with the largest dynamic range was selected as the representative feature. Using this approach, the 3072 features were reduced to 927. Subsequently, a two-step feature selection procedure was utilized to construct the optimal qMRI biomarkers. First, the features were ranked using the p-values obtained from the Mann-Whitney U test in conjunction with a 50-fold sampling scheme. In each sampling step, the p-values were calculated over 49 folds of the patients and the 15 features with the smallest p-values were identified. The features were then ranked using their frequency of occurrence in the best 15 features over the 50 samples. In the second step of feature selection, the $${\widehat{AUC}}_{.632+}$$ was used with a forward feature selection scheme to find the feature sets leading to the maximum performance. The $${\widehat{AUC}}_{.632+}$$ is the average of the $$AU{C}_{.632+}$$ values obtained for all bootstrap 0.632 + samples drawn from the data set. The $$AU{C}_{.632+}$$ is a weighted sum of the re-substitution and test AUCs computed for each bootstrap sample (supplementary information). In the forward feature selection, the $${\widehat{AUC}}_{.632+}$$ was calculated for each feature set using 250 bootstrap samples and an SVM classifier. The best feature set obtained from the forward selection was further reduced using an ANOVA test to find the smallest set of features (referred to as the optimal qMRI biomarker) leading to a statistically similar performance. The details of feature reduction/selection are described in supplementary information. The proposed feature selection procedure was performed independently for each of the overall, 6-month, and 12-month LC/LF outcomes.

### Outcome prediction and risk assessment

The outcomes prediction was performed independently after the feature selection step using the optimal qMRI biomarkers. An SVM classifier was used in conjunction with bootstrap 0.632 + and LOPO cross-validation methods to predict the LC/LF outcomes. The $${\widehat{AUC}}_{.632+}$$ was calculated using another 250 bootstrap samples. A Gaussian kernel was used in the SVM for predicting the overall and 12-month outcomes. A linear SVM kernel was used for the 6-month outcome prediction. In order to compensate for the imbalance of the dataset, a random under-sampling was applied on the majority class (LC outcome) before each bootstrap sampling77. The under-sampling was applied on a patient level to have a balanced sample with an equal number of lesions with LC and LF outcomes.

The accuracy, sensitivity, and specificity of the model were calculated using a LOPO cross-validation. Each LOPO cross-validation step involved leaving one patient out, training the SVM classifier independently on 250 bootstrap samples from the training set, and characterizing the test lesions (associated with the left-out patient) with a majority vote of the 250 trained classifiers. The classifier’s parameters were similar to those used for calculating the $${\widehat{AUC}}_{.632+}$$.

Long-term risk analyses were performed to generate Kaplan-Meier clinical local control and survival curves for the two cohorts with different predicted outcomes. The overall, 6-month, and 12-month predicted outcomes (LC/LF) for each lesion was determined using the LOPO cross validation scheme. A patient was categorized into the LF cohort in case they have one tumour with a cross-validated LF predicted outcome at the first follow up. The Kaplan-Meier curves were compared using a log-rank test to assess for statistically significant differences between the clinical local control (lesion level) and survival (patient level) curves.