Quantitative MRI Biomarkers of Stereotactic Radiotherapy Outcome in Brain Metastasis

About 20–40% of cancer patients develop brain metastases, causing significant morbidity and mortality. Stereotactic radiation treatment is an established option that delivers high dose radiation to the target while sparing the surrounding normal tissue. However, up to 20% of metastatic brain tumours progress despite stereotactic treatment, and it can take months before it is evident on follow-up imaging. An early predictor of radiation therapy outcome in terms of tumour local failure (LF) is crucial, and can facilitate treatment adjustments or allow for early salvage treatment. In this study, an MR-based radiomics framework was proposed to derive and investigate quantitative MRI (qMRI) biomarkers for the outcome of LF in brain metastasis patients treated with hypo-fractionated stereotactic radiation therapy (SRT). The qMRI biomarkers were constructed through a multi-step feature extraction/reduction/selection framework using the conventional MR imaging data acquired from 100 patients (133 lesions), and were applied in conjunction with machine learning techniques for outcome prediction and risk assessment. The results indicated that the majority of the features in the optimal qMRI biomarkers characterize the heterogeneity in the surrounding regions of tumour including edema and tumour/lesion margins. The optimal qMRI biomarker consisted of five features that predict the outcome of LF with an area under the curve (AUC) of 0.79, and a cross-validated sensitivity and specificity of 81% and 79%, respectively. The Kaplan-Meier analyses showed a statistically significant difference in local control (p-value < 0.0001) and overall survival (p = 0.01). Findings from this study are a step towards using qMRI for early prediction of local failure in brain metastasis patients treated with SRT. This may facilitate early adjustments in treatment, such as surgical resection or salvage radiation, that can potentially improve treatment outcomes. Investigations on larger cohorts of patients are, however, required for further validation of the technique.


Image Registration/Segmentation and Mask Generation
To obtain the regions of interest (ROIs) for sub-lesion mask generation, a semi-automatic registrationsegmentation framework was developed ( Figure 1). The baseline T2-weighted-fluid-attenuationinversion-recovery (T2-FLAIR) images were initially registered to their corresponding Gadoliniumcontrast-enhanced-T1-weighted (T1w) images using an affine registration method with mutual information as the metric 1 . The edema region was segmented on the registered T2-FLAIR images using a region growing algorithm 2 . The region growing segmentation was followed by manual refinements performed in 3D slicer 3 . Finally, the inverse of the registration transformation matrix was used to warp the edema, and the tumour masks generated for the T1w images on the T2-FLAIR images. A similar procedure was performed on the follow-up images for generation of the sub-lesion masks. In addition to the tumour and edema masks, the lesion-margin and tumour-margin masks were generated using morphological image analysis, covering a 1, 3, 5, and 10 mm margin of the lesion (tumour + edema), and tumour, respectively.

Radiomics Features
A total number of 3072 were extracted from the segmented ROIs. The derived features included four groups of: 1) geometrical features, 2) histogram features, 3) gray-level co-occurrence matric (GLCM) features, and 4) multi-wavelet features.

1) Geometrical Features
The geometrical features were extracted from 2D axial slices of the binary masks. The extracted features were subsequently averaged over all axial slices. The following geometrical features were extracted using the MATLAB software package: 4. Aspect Ratio aspect ratio = width of ROI height of ROI

convexity =
The perimeter of convex hull P 6. Eccentricity: Eccentricity of the ellipse that has the same second-moments as the ROI. The eccentricity is the ratio of the distance between the foci of the ellipse and its major axis length.

2) Histogram Features
The histogram statistics describe the distribution of voxel intensities within an image. In this study, the histogram features were extracted from both the MR images and the corresponding local binary pattern (LBP) parametric images (described below).
The histogram features were computed for each 2D image slice, and subsequently averaged over all 2D slices. The following first-order statistical features were extracted:

Mean Absolute Deviation
9. Median: The median intensity value.

Uniformity
X(i) represents the intensity value of pixel i, and P(i) is the probability of the intensity value i.

LBP
LBP is a very efficient texture operator which labels the pixels of an image by thresholding the neighborhood of each pixel and considers the result as a binary number 4 . The LBP feature vector, in its simplest form, is created in the following manner:

3) Gray-Level Co-occurrence Matrix (GLCM) Features
The GLCM features are second-order texture features describing the relative position of the various gray-level intensities over the image that are computed using the gray level co-occurrence matrix 5 . To compute the gray level co-occurrence matrix, voxel intensities were linearly quantized using a binwidth of 25 gray levels 6 . The discretization step helps with noise removal and consistent texture extraction among all patients.
The gray-level co-occurrence matrix is a × matrix defined as ( , , , ). The entry ( , ) of P is computed by finding the relative number of times (probability) that occurs in direction , and distance of pixels from . Here, is the number of discrete gray level intensities.
Page 9 of 16 In this study, was a distance of 1 to 4 pixels, and was each of the 13 directions in a 3D matrix (45° rotation between the adjacent directions). The final GLCM matrix was the accumulation of all the GLCM matrices calculated for each , and .
For the co-occurrence matrix ( , ) with distance , direction , and number of intensity levels , the following parameters are calculated: , the mean of ( , ), The texture features are then computed using the following formula: 1. Autocorrelation: Entropy 10. Homogeneity homogeneity = ∑ ∑ (i, j) 1 + |i − j| Variance

4) Wavelet Features
The last group of features were derived from multi-wavelet filtered images which were acquired by filtering the MR images in x, y, and z directions using the wavelet filters 7 . The wavelet filters extract textural information by decomposing the image in low (L), and high (H) frequencies. The procedure for applying the three dimensional wavelet transform on each image is shown in Figure 5 where the first layer corresponds to filtering along the x axis. The second and thirds layers correspond to filtering along the y, and z axes, respectively. Once the multi-wavelet images were computed, the histogram and GLCM features were extracted from each multi-wavelet image.

Feature Reduction/Selection
The optimal features for LC/LF prediction were obtained through a multi-step feature reduction/selection procedure. The features reduction procedure involved using a Pearson correlation analysis to obtain the coefficient of determination (R 2 ) for each feature pair (Figures 1   and 2). Clusters of highly correlated features were identified from the R-squred matrix using a threshold of R 2 = 0.8. Next, in each cluster of correlated features, the one which was associated with the largest dynamic range was selected as the representative feature. The feature reduction step was followed by a two-step feature selection procedure. First, the features were ranked using the pvalues obtained from the Mann-Whitney U test in conjunction with a 50-fold sampling scheme. In each sampling step, the p-values were calculated over 49 folds of the patients and the 15 features with the smallest p-values were identified. The features were then ranked using their frequency of occurrence over the 50 samples. In the second step of feature selection, the ̂. 632+ was used with a forward feature selection scheme to find the feature sets leading to the maximum performance.
The ̂. 632+ is the average of the .632+ values obtained for all bootstrap .632+ samples drawn from the dataset X. In this study, a balancing step was performed before each bootstrap sampling to compensate for the imbalance of the dataset. For the dataset X consisting of N observations, let X bal be a balanced set obtained by keeping all the observations from the minority group (n=N1), and random sampling of N1 observations from the majority group. Next, let X train bal−b be the selected bootstrap training sample at step b (b = 1, 2 … B), and X test bal−b the corresponding test sample obtained by excluding the training samples from X bal .
For each step, the bootstrap AUC, AUC b , can be obtained by testing the trained classifier on the bootstrap test set. AUC b is known to be upward biased. As such, the bias is accounted for using the following definition: and AUC x is the AUC obtained from testing the trained classifier on the trained set (re-substitution AUC). AUC x is known to be downward biased. Finally, the overall AUC for bootstrapping test is defined as: In the forward feature selection, the ̂. 632+ was calculated for each feature set using 250 bootstrap samples and an SVM classifier. The best feature set obtained from the forward selection was further reduced using an ANOVA test to find the smallest set of features (referred to as the optimal qMR biomarker) leading to a statistically similar performance. Specifically, in each step of forward selection The feature selection procedure is demonstrated in Figure 4. The feature extraction, reduction, selection and classification algorithms were implemented in MATLAB software package.