Quantitative MRI-based radiomics analysis identifies blood flow feature associated to overall survival for rectal cancer patients

Radiomics objectively quantifies image information through numerical metrics known as features. In this study, we investigated the stability of magnetic resonance imaging (MRI)-based radiomics features in rectal cancer using both anatomical MRI and quantitative MRI (qMRI), when different methods to define the tumor volume were used. Second, we evaluated the prognostic value of stable features associated to 5-year progression-free survival (PFS) and overall survival (OS). On a 1.5 T MRI scanner, 81 patients underwent diagnostic MRI, an extended diffusion-weighted sequence with calculation of the apparent diffusion coefficient (ADC) and a multiecho dynamic contrast sequence generating both dynamic contrast-enhanced and dynamic susceptibility contrast (DSC) MR, allowing quantification of Ktrans, blood flow (BF) and area under the DSC curve (AUC). Radiomic features were extracted from T2w images and from ADC, Ktrans, BF and AUC maps. Tumor volumes were defined with three methods; machine learning, deep learning and manual delineations. The interclass correlation coefficient (ICC) assessed the stability of features. Internal validation was performed on 1000 bootstrap resamples in terms of discrimination, calibration and decisional benefit. For each combination of image and volume definition, 94 features were extracted. Features from qMRI contained higher prognostic potential than features from anatomical MRI. When stable features (> 90% ICC) were compared with clinical parameters, qMRI features demonstrated the best prognostic potential. A feature extracted from the DSC MRI parameter BF was associated with both PFS (p = 0.004) and OS (p = 0.004). In summary, stable qMRI-based radiomics features was identified, in particular, a feature based on BF from DSC MRI was associated with both PFS and OS.


Materials and methods
Figure 1 provides a flowchart detailing the input of MRI parameters and tumor volumes to the radiomics model as well as how radiomics features were extracted and used in outcome analysis together with clinical parameters.

Patient cohort
Patients included in this analysis were part of the prospective observational OxyTarget study (NCT01816607).The study was performed in accordance with the Helsinki Declaration, and written informed consent was obtained from all participants.Approval was obtained from the Institutional Review Board and the Regional Committee for Medical and Health Research Ethics.Between October 2013 and December 2017, a total of 192 patients with suspected rectal cancer were consecutively enrolled.Eligible participants were older than 18 years and did not have any previous treatment for rectal cancer.Routine and study-specific MRI sequences were carried out at baseline before treatment.For patients receiving neoadjuvant treatment (radiotherapy and/or chemotherapy), a second MRI was performed after treatment completion.Selection of patients for neoadjuvant treatment was determined by the multidisciplinary team, applying the 2013 ESMO Clinical Practice Guidelines 19 and according to the imaging updates detailed in the 2017 version 20 .Diagnostic T and N stages (mrTN) were determined using at the diagnostic MRI.The histopathological assessment (pTN or ypTN stage) was performed by experienced pathologists after surgery.Distant metastasis (M) was detected with routine thoracoabdominal computed tomography (CT) as recommended by the national follow-up program for colorectal cancer or with individual, supplemental examinations due to clinical suspicion.All patients are followed for five years, the last censoring date was January 31 st , 2022.No patients were lost to follow-up.The PFS was calculated from study enrolment to first progression (local recurrence, metastasis, or death) or patient censoring due to reaching the maximum follow-up time of 5 years.The OS was calculated from enrolment to death or patient censoring.For the present study, we included data from patients who were enrolled in a previous study on automatic tumor segmentation.Figure 2 shows a flowchart of the number of patients eligible for analysis.Table 1 summarizes the patient characteristics.For the stability analysis of radiomics features, all 81 patient datasets were used.For the subsequent statistical analysis to identify whether clinical and/or stable radiomics features can predict PFS or OS, 19 patients who presented distant metastases at the time of primary diagnosis were excluded.

Magnetic resonance imaging
MRI was performed on a Philips Achieva 1.5 T system (Philips Healthcare, Best, The Netherlands).In addition to routine T2-weighted high-resolution fast spin echo (T2w) images, an extended echo-planar imaging based DW MRI sequence with seven b-values of b = 0, 25, 50, 100, 500, 1000 and 1300 s/mm 2 and a dynamic multi-echo contrast MRI sequence with three echoes with echo times (TE) = 4.6, 13.9 and 23.2 ms was collected.The latter was acquired using a split dynamic acquisition 9 , using a bolus injection of 0.2 ml/kg body weight of Dotarem® (279.3 mg/ml, Guerbert Roissy, France) directly followed by a 20 ml saline solution.Further details regarding the image acquisitions are found in Supplementary Table S1 and in 11,17 .To reduce bowel movement, glucagon (1 mg/

Image pre-processing
While the T2w images were z-score normalized and used directly as input to the radiomics analysis, voxel-wise quantitative parameter maps were calculated from the DW and multi-echo MR images.For the DW images, the apparent diffusion coefficient (ADC) was calculated using the standard mono-exponential method 21 .From the dynamic multi-echo contrast MRI sequence, both T1w DCE-and T2*w DSC curves were extracted, which were further processed to parameter maps.Details regarding this have been published previously 10 .In brief, the DCE curves were fitted to an extended Tofts model 22 for estimation of the volume transfer constant (K trans ), the rate constant (k ep ), the plasma volume (v p ) and the extravascular extracellular volume (v e ).Based on previous studies 9,10 , the K trans parameter has shown the highest prognostic potential and was used as input to the radiomics analysis.For the DSC analysis, the T2*w signal was used in a model-free approach 23 for estimation of the perfusion parameter BF and area under the curve (AUC).The BF is calculated by deconvolution of the contrast curve with the arterial input function.The AUC is a model free description of the contrast enhancement curve.The analysis of dynamic multi-echo data was done in NordicICE, version 4.0 (NordicNeuroLab, Bergen, Norway) and analysis of DW data was done in Python v3.7.The T2w images and the ADC, K trans , BF and AUC maps were used as input to the radiomics analysis.

Tumor volume definition
We have previously developed methods for semi-automatic and automatic tumor volume definition 17,18 .This allowed us to explore whether the method of tumor volume definition affect radiomics feature extraction and their correlation to clinical outcome in this analysis.Tumor volumes defined by three different methods were used; (1) manually by two radiologists (Manual-A and Manual-B) with 14 and 7 years of experience with abdominal MRI delineating the tumor volume on the T2w images with DW images available as guidance where the median interobserver per patient Sørensen Dice similarity coefficient (DSC P ) is 0.87 (interquartile range (IQR): 0.07) as described in 18 ; (2) semi-automatically based on classical machine learning using voxel-wise classification www.nature.com/scientificreports/via adaptive boosting (ADA) combined with morphological postprocessing 17 .This model was trained on raw image data from the T2w, DW, and dynamic multi-echo MRI of the same cohort and reported a DSC P of 0.72 (IQR: 0.16) 17 .And (3), volumes which were automatically segmented using a DL 2D U-Net trained on the T2w and b500 DW images, giving a DSC P of 0.77 (IQR: 0.14), as described in 18 .

Feature extraction
Radiomic features were extracted using Pyradiomics v2.2 24 in Python v3.7, see Supplementary material Appendix A1 for the configuration file.Features were extracted for the T2w images and the four parameter maps (ADC, K trans , BF and AUC).The extraction was repeated with all four tumor volume definitions (Manual-A, Manual-B, Automatic, Semi-Automatic).The tumor masks were originally defined on the T2w image grid, and rigid image registration was used to transfer the masks to the corresponding parameter map coordinate system.For each combination of image and volume definition, 94 features were extracted.These features can be classified into six feature families; first order statistics (FO, n = 19), gray level cooccurrence matrix (GLCM, n = 24), gray level run length matrix (GLRLM, n = 16), gray level size zone matrix (GLSZM, n = 16), gray level dependence matrix (GLDM, n = 14) and neighboring gray tone difference matrix (NGTDM, n = 5).Detailed extraction configurations can be found in the Supplementary Appendix A1, and the exact definitions of the individual features can be found in the Pyradiomics documentation.For the feature extraction, the bin width parameter (binWidth) was adjusted for each image type to split the intensity range of the corresponding image into approximately 100 bins.

Assessment of feature stability
To assess the stability of each radiomics feature from the different image inputs under the varying tumor volume definitions, the interclass correlation coefficient (ICC) was calculated.The two-way random effects, absolute agreement, single rater ICC(2,1) was used, following the definition in 25 : where rater here stands for the tumor volume definition, MS R is mean square for rows (patient samples), MS E is mean square for error (average extent to which the rater's scores equal), MS C is mean square for columns (raters), k is number of raters and n the number of subjects.Following the same guidelines, the reliability of the radiomics features was classified as poor (ICC < 0.5), moderate (0.5 < = ICC < 0.75), good (0.75 < = ICC < 0.9) or excellent (0.9 < = ICC).In the subsequent analysis identifying radiomics features with prognostic potential, only features with excellent reliability were included, derived from the manual tumor contour of the most experienced radiologist (Manual A).The calculation of ICC(2,1) was performed with pingouin v0.4 26 in Python v3.7.

Statistical analysis
Three models (clinical, radiomics and combined clinical and radiomics) were made to identify whether clinical variables and/or the stable radiomics features could predict the patients' outcome.Univariable and multivariable Cox regression were performed to find significant predictors.Considering the small number of events, a maximum number of two candidate predictors were used to avoid overfitting.The predictors with the highest hazard ratio (HR) in univariable analysis and no significant correlation (using the Spearman correlation test) were chosen to build the radiomic model.Selected predictors in the clinical and radiomic models were used to build the combined model.Internal validation was performed on 1000 bootstrap resamples to estimate the optimism-corrected area under the receiver operating characteristic curve (AUC).The Youden index was used to determine the threshold for calculating the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy.The Youden index is defined as sensitivity (%) + specificity (%) − 100.Sensitivity (%) is defined as true positives(TP) TP+false negatives(FN) and specificity (%) as true negatives(TN) TN+false positives(FP) , whereas PPV is defined as TP TP+FP , and NPV as TN TN+FN .Calibration, which presents the agreement between the actual outcomes and predicted probabilities, was evaluated using the scatter plot where the x = y line indicates perfect calibration.Decision curve analysis (DCA) was performed to visualize the net benefit of models considering "treat none" and "treat all" as the benchmarking strategies.The net benefit was calculated as a function of relative harms related to the false predictions for each threshold of predicted probability.A nomogram presenting a sample patient was developed to improve the interpretability and reusability of the prediction models.Variables were compared by the Mann-Whitney U test, Chi-square test or Fisher's exact test.Differences in survival were assessed by the log-rank test and visualized with Kaplan-Meier plots based on separating patients in two groups over/under the optimal cutoff of the

Feature extraction and stability analysis
Figure 3 shows a T2w image and the ADC, K trans , BF and AUC maps for one patient, as an example of the input to the radiomics analysis.From each of the five input images, 94 different radiomics features were extracted.Figure 4 shows a list of all these features separated into the six different feature classes.The color maps show the distribution of the mean ICCs over all patients, which were calculated based on the four different volumes as a means to identify the stability of the features when the volume is varying.Overall, the features from the T2w images had a low ICC.Oppositely, the features from the qMRI parameters, in particular K trans and BF, showed high ICC.The Supplementary Figure S2 shows a more detailed overview of the ICC per feature and image type.
In Fig. 5, the ICC is summarized per feature class and image type, confirming the low ICC with high variation in features from T2w images and high ICC with low variation in features from K trans and BF.Five of the six feature classes presented very similar trends, where the performance of the last class (NGTDM) was inferior to the others.In order to select stable and robust features for the outcome analysis, we set the cutoff at 90% ICC, i.e., excellent reproducibility.

Outcome analysis
Based on the clinical parameters (Table 1) and the identified stable radiomics features with ICC > 90%, outcome prediction models consisting of either clinical or radiomics features or a combination, were developed for both PFS and OS.The clinical parameters age, sex, body mass index (BMI), T stage and N stage were candidate parameters to the clinical prognostic model (Table 2A and B).The two parameters that were statistically most different between the groups were selected.In univariable analysis, sex and T stage were found to have significant associations with PFS, and BMI and T stage showed significant associations with OS.
Table 3A and B shows the results from the clinical model, the radiomics model, and the combined model using the best clinical and radiomics parameters as input.For the analysis against PFS (Table 3A), sex was most important for the clinical model, whereas for the radiomics model the best feature was a feature based on the K trans parameter, the FO mean absolute.In the combined model, the feature based on K trans remained as the most significant variable, with an HR of 1.63 (95% CI = 1.05-2.52,p = 0.03).For the analysis against OS (Table 3B), high BMI was important for the clinical model, whereas for the radiomics model the best feature was a feature based on the BF parameter, the BF GLCM id parameter.In the combined model, the feature based on BF remained as the most significant variable, with an HR of 2.16 (95% CI = 1.18-3.95,p = 0.013).
Figure 6 shows the ROC analysis, calibration and DCA for PFS and OS for the three models.The models perform relatively similar, although for PFS the radiomics model had the highest AUC of 68.0% (standard deviation (SD): 19%), with a PPV of 51% and a NPV of 95%, and for OS the combined model had the highest AUC of 69.4% (SD: 22%), with a PPV of 46% and NPV of 92%.In the calibration plots, it can be seen that for the PFS the clinical model underestimates the actual risk, whereas for the OS all three models performs quite similar, although the clinical model underestimates the risk for high event probabilities.In the DCA for PFS, it can be seen that the clinical model provides a more accurate prediction compared to the radiomics model for middle-risk patients (threshold probability 0.3-0.6).However, for high-risk patients (threshold probability 0.7-0.9),only the radiomics model provided predictions that were acceptable, since that was the only model showing net benefits above both benchmarking lines.The DCA of the models for OS showed similar results.However, for the high-risk patients the combined model with both radiomics and clinical input were better than the radiomics only model.Nomograms of the combined model for predicting the 5-year PFS and OS are shown in Supplementary Figure S3.
A log-rank test with associated Kaplan-Meier plots were performed for the three radiomics features that were identified in Table 3 (K trans FO mean absolute, ADC GLCM joint average, BF GLSZM id).When evaluated against both PFS and OS, the log-rank tests revealed that only the BF GLSZM id feature was significant (p = 0.004 for PFS and p = 0.004 for OS). Figure 7 shows the Kaplan-Meier plot for the BF GLSZM id feature for PFS and OS, when separating patients above and below the optimal cutoff.For PFS, the difference in progression at 60 months was 54% for the group of patients above the cutoff, and 19% for the group of patients below the cutoff.For OS, the survival difference at 60 months was 38% for the group of patients above the cutoff and 8% for patients below the cutoff.

Discussion
In this study we aimed to identify radiomics features with prognostic potential from MR images of rectal cancer, which were stable against interobserver variability in tumor contouring.From the stable radiomics features, our main finding was that features based on qMRI contained higher prognostic potential than features based on highresolution T2-weighted MRI sequences.Also, when the radiomics features were compared to clinical parameters, the qMRI radiomics features demonstrated the best prognostic potential, in particular the radiomics feature based on the DSC MRI parameter BF, BF GLSZM id, which was significantly associated with both PFS and OS.
Given that we had several input volumes available (two sets of manual contours, semi-automatic machine learning contours and deep-learning contours) we had the opportunity to identify which radiomics features were stable and robust across the different input volumes.By doing such a stability analysis, one may reduce the noise and probability of random findings in the feature extraction process.Previously, few studies have investigated stability of qMRI-based radiomics, however, two studies using the ADC from DW MRI exist.One study assessed the stability of ADC-based radiomics features in rectal cancer 27 , finding that shape features were strongly affected by delineation quality whereas reproducibility of textural features was poor when the image pre-processing was varied.In contrast, features from intensity distributions were less sensitive to variation in both pre-processing and delineations.A study by Peerling et al. showed low test-retest stability of ADC-based radiomics features in a multicenter study in lung cancer, ovarian cancer and liver metastases of colorectal cancer 28 .In addition, they showed that the feature stability was affected by the type of MRI scanner and the field strength of the scanner.In the stability analysis in our study, we show that relevant and stable radiomics features can be extracted from volumes defined by automatic deep learning algorithms.This represents an important step towards an automated workflow, which is essential if radiomics analysis is to be integrated as a clinical tool.
The stability analysis was conducted using the measure recommended by 25 , the two-way random effects, absolute agreement, single rater ICC(2,1) measure.We discovered that many T2w features were removed because their ICC was low, whereas the ICC results were overall higher and more consistent when qMRI was used as www.nature.com/scientificreports/(Pyradiomics) adds robustness to our results.Given the few studies that have addressed stability of qMRI radiomics features, we believe our approach may provide an example on how this can be conducted.After we identified the robust radiomics features, their prognostic potential against 5-year PFS and OS was compared with the prognostic potential of common clinical parameters, as well as with a combined model based on both radiomics and clinical parameters.The models successfully predicted both PFS and OS.Interestingly, in the combined models, the radiomics features based on qMRI remained as the most significant (K trans FO mean absolute for PFS and BF GLSZM id for OS) (Table 3A and B).Furthermore, the DCA analysis revealed that the radiomics features were important in predicting both PFS and OS for high-risk patients (Fig. 6).For PFS, the radiomics model alone was most valuable, whereas for OS the combined radiomics and clinical model was the most optimal.The result that the clinical model alone provides inferior predictions for high-risk patients, provides support for including radiomics features into clinical decision support.In line with this finding, it was previously found in the same patient cohort showed that the K trans and BF parameters are important.In one study we found that K trans was associated with higher probability of lymph node metastasis 9 , whereas the BF parameter has been associated with both CRT response and OS 11 .The finding that these parameters remain significant also in a radiomics study, supports that these parameters are stable and important markers to assess and predict the development of rectal cancer progression.
Ideally, we should have validated our results in independent cohorts.However, since our study-specific DCE MRI and DSC MRI sequences are not commonly acquired in rectal cancer, we were unable to find suitable validation cohorts.This highlights the need for standardization in both image acquisition and post-processing of qMRI 13 in order to achieve validated models that can qualify for clinical integration 33 .Furthermore, MRI sequences not requiring intravenous contrast agents are preferable due to storage of gadolinium in the brain.However, these patients have a life-threatening disease and the addition of a gadolinium-based contrast agent is justified.Table 3. Clinical, radiomic and combined models for prediction of (A) progression-free survival (PFS) and (B) overall survival (OS).PFS: progression free survival; OS: overall survival; HR: hazard ratio; K trans : volume transfer constant; FO: first order; ADC: apparent diffusion constant; GLCM: gray level co-occurrence matrix; BF: blood flow; ID: inverse difference.

Conclusion
We provide an approach to identify stable qMRI-based radiomics features with prognostic value.In particular, a radiomics feature based on BF from DSC MRI was stable and associated with both PFS and OS.qMRI as input to radiomics for robust outcome analysis is novel, but further studies are needed to fully identify the potential benefit of qMRI for radiomics.Kaplan-Meier plot for the blood flow (BF) gray level size zone matrix (GLSZM) id feature for progression-free survival (PFS) and overall survival (OS), when separating patients above and below the optimal cutoff.For PFS, the difference in progression at 60 months was 54% for the group of patients above the cutoff, and 19% for the group of patients below the cutoff.For OS, the survival difference at 60 months was 38% for the group of patients above the cutoff and 8% for patients below the cutoff.The log-rank tests for BF GLSZM id against resulted in p = 0.004 for PFS and p = 0.004 for OS.

Figure 1 .
Figure1.Flowchart illustrating the input to the radiomics analysis consisting of different MRI parameters and different methods to define the tumor volume, before radiomics features were extracted.The stable radiomics features having an interclass correlation coefficient (ICC) above 0.9 were used in outcome analysis (5-year overall survival (OS) and progression-free survival (PFS)), where a model with the radiomics features was compared to a clinical model, but also to a combined model with both clinical and radiomics features.ADC = apparent diffusion coefficient, ADA = adaptive boosting, AUC = area under the curve, BF = blood flow, BMI = body mass index, DL = deep learning, FO = first order, GLCM = gray level cooccurrence matrix, GLDM = gray level dependence matrix, GLRLM = gray level run length matrix, GLSZM = gray level size zone matrix, K trans = plasma transfer constant, ML = machine learning, NGTDM = neighboring gray tone difference matrix, TNM = tumor node metastasis, T2w = T2-weighted MRI.

Figure 3 .Figure 4 .
Figure 3. Illustration of images used as input to the radiomics analysis.The T2-weighted (T2w) magnetic resonance (MR) image is shown together with the tumor contour made by the more experienced radiologist.In addition, the quantitative parameter maps for the apparent diffusion coefficient (ADC), plasma transfer constant (K trans ), blood flow (BF) and area under the curve (AUC) derived from the dynamic susceptibility contrast (DSC) MR images are shown.

Figure 5 .
Figure5.Overview of the interclass correlation coefficient (ICC), ICC(2,1) values, as measurement of feature stability under contour variation for the different feature classes and the different images and parameter maps used as input.FO: first order, GLCM: gray level cooccurrence matrix, GLRLM: gray level run length matrix, GLSZM: gray level size zone matrix, GLDM: gray level dependence matrix, NGTDM: neighboring gray tone difference matrix, ADC: apparent diffusion coefficient, K trans : plasma transfer constant, BF: blood flow, AUC: area under the curve.

Figure 6 .
Figure 6.Receiver operative curve (ROC) analysis for progression-free survival (PFS) and overall survival (OS) (A and D), calibration plots for PFS and OS (B and E), and decision curve analysis (DCA) (C and F) for PFS and OS for the clinical model alone (red), radiomics model alone (blue) and the combined clinical and radiomics model (green).

Figure 7 .
Figure 7. Kaplan-Meier plot for the blood flow (BF) gray level size zone matrix (GLSZM) id feature for progression-free survival (PFS) and overall survival (OS), when separating patients above and below the optimal cutoff.For PFS, the difference in progression at 60 months was 54% for the group of patients above the cutoff, and 19% for the group of patients below the cutoff.For OS, the survival difference at 60 months was 38% for the group of patients above the cutoff and 8% for patients below the cutoff.The log-rank tests for BF GLSZM id against resulted in p = 0.004 for PFS and p = 0.004 for OS.

Table 1 .
Patient characteristics.Data are number of patients, with percentages in parantheses, or mean ± standard deviation.BMI = body mass index, TNM = tumor node metastasis, PFS = progression-free survival, OS = overall survival.variable presented.All tests were two-sided.A p-value < 0.05 was considered statistically significant.Analyses were performed in R v.4.1.2(R Foundation for Statistical Computing) or in Python v3.7.