Predicting distant metastasis and chemotherapy benefit in locally advanced rectal cancer

Distant metastasis (DM) is the main cause of treatment failure in locally advanced rectal cancer. Adjuvant chemotherapy is usually used for distant control. However, not all patients can benefit from adjuvant chemotherapy, and particularly, some patients may even get worse outcomes after the treatment. We develop and validate an MRI-based radiomic signature (RS) for prediction of DM within a multicenter dataset. The RS is proved to be an independent prognostic factor as it not only demonstrates good accuracy for discriminating patients into high and low risk of DM in all the four cohorts, but also outperforms clinical models. Within the stratified analysis, good chemotherapy efficacy is observed for patients with pN2 disease and low RS, whereas poor chemotherapy efficacy is detected in patients with pT1–2 or pN0 disease and high RS. The RS may help individualized treatment planning to select patients who may benefit from adjuvant chemotherapy for distant control.


Supplementary Methods
All patients were enrolled with strict inclusion and exclusion criteria, which are shown as

Sample size evaluation
In this study, the proportional hazards regression model was used for the prognosis of distant metastasis free survival of the patients. To avoid potential overfitting of the model, the minimum sample size for the primary and validation cohorts was estimated.
For the sample size evaluation of the primary cohort, the rule of thumb is that the number of predictors should remain within 1/15-1/10 of the sample size in the training dataset (i.e., the primary cohort) (1,2). In this study, 4 imaging features were selected for radiomic signature, while the sample size of the primary cohort was 176. Thus, the relative between the number of patients in the primary cohort and the selected features for the radiomic signature was acceptable.
For the sample size evaluation of the validation cohort, a sample size estimation method proposed in the book "Sample Size Calculations in Clinical Research 2 nd Ed" was performed.
According to the book, the sample size calculation could be estimated using following formula: Letting θ represent the hazard ratio, the hypotheses of interest are: The sample size and power are calculated respectively: where, n is the sample size for model validation, Φ is the standard Normal distribution function, α is Type I error, β is Type II error, 1 − β is power, θ 0 is the hazard ratio hypothesized under the null hypothesis, θ is the hazard ratio, p E is the overall probability of the event occurred, and σ 2 is the variance of the covariate.
In our study, the overall probability of the event (DM ratio) in 3 years occurred in the pilot experiment was 0.29 (p E , 51/176, data in the primary cohort is used as a pilot experiment), the log hazards ratio associated with a one-unit change and the variance of radiomic signature were 6.628 (ln(θ)) and 0.801 (σ 2 ), respectively. ln(θ 0 ) was set to 0 for the null hypothesis. The results showed that the sample size needed in the validation cohort was calculated to be 16 with the desired two-sided significance level α = 0.05 and power 1 − β = 95%, referring to the primary cohort. In this study, 154, 150, and 149 patients were enrolled into the three validation cohorts, respectively.
In conclusion, both the sample size for primary and validation cohorts in this study was enough.

Tumor masking
Five radiologists (1 from each participating hospital) with at least 10 years' experience in rectal MR imaging were chiefly responsible for the evaluation of tumor masking. ROIs were drawn along the contour of the tumor as visualized by T2WI (slightly high signal), containing the surrounding chords and burrs. ROIs were placed on the high signal intensity region on DWI on each slice. Due to the higher resolution of DWI compared with ADC maps, ROIs were detected with a b-value of 1,000 s/mm 2 first and then copied to the corresponding ADC maps for further analysis. Inter-and intra-observer reproducibility of tumor masking and radiomic feature extraction were initially analyzed with the T2WI data of 50 randomly selected patients for ROIbased radiomic feature generation in a blinded fashion by these 5 radiologists.
To ensure reproducibility, each radiologist repeated the tumor masking and generation of radiomic features twice with an interval of at least 1 month, following the same procedure. Intraclass correlation coefficients (ICCs) were utilized for evaluating the intra-and inter-observer agreement in terms of feature extraction. We interpreted an ICC of 0.81-1.00 as almost perfect agreement, 0.61-0.80 as substantial agreement, 0.41-0.60 as moderate agreement, 0.21-0.40 as fair agreement, and 0-0.20 as poor or no agreement. An ICC greater than 0.6 was considered a mark of satisfactory inter-and intra-observer reproducibility.
To ensure the accuracy of tumor masking, the tumor masks were evaluated by other radiologists from the same hospital for each hospital, following the same guideline describing how to define the boundary of tumors.

Radiomic feature extraction
Four groups of imaging features were extracted from both T2 images and ADC maps: Group 1 with 8 shape-and size-based features, Group 2 with 15 first order statistical features, Group 3 with 53 textural features, Group 4 with 544 wavelet features. Following showed the details of the features: (1) Shape and size-based features In this group of features, we included descriptors of the three-dimensional shape-and size of the tumor region. Let in the following definitions V denote the volume and A the surface area of the volume of interest. We determined the following shape and size-based features: 1. Compactness 1:

Maximum 3d diameter:
The maximum three-dimensional tumor diameter is measured as the largest pairwise Euclidean distance, between voxels on the surface of the tumor volume.

Spherical disproportion:
6. Surface area: The surface area is calculated by triangulation (i.e. dividing the surface into connected triangles) and is defined as: Where N is the total number of triangles covering the surface and a, b and c are edge vectors of the triangles.

7.
Surface to volume ratio:

Volume:
The volume (V) of the tumor is determined by counting the number of pixels in the tumor region and multiplying this value by the voxel size.
(2) First order statistical features The following 15 statistical features were extracted.
Let X be the three-dimensional image matrix with N voxels of the ROI and P be the first order histogram distribution with Ng discrete intensity levels.
1. IntensityMax: The maximum intensity value of X.
2. IntensityMin: The minimum intensity value of X.
3. Median: The median intensity value of X.

Range:
The range of intensity values of X.

Textural features
Second order statistic texture features, and higher order statistic texture features were extracted.
Twenty-two second order statistic texture features could be calculated from the Gray Level Cooccurrence Matrix (GLCM). Thirty-one high order statistic texture features were calculated from the Gray Level Size Zone Matrix (GLSZM), Gray Level Run Length Matrix (GLRLM), and Neighborhood Gray Tone Difference Matrix (NGTDM).

Gray-Level Co-Occurrence Matrix based features (GLCM)
GLCM based features were second-order statistical texture features, which are defined as a matrix M (i, j; δ, θ) to indicate the relative frequency with intensity values of pixels (i and j) at the distance of δ in direction θ. Texture matrices were determined considering 26 connected voxels (i.e. voxels were considered to be neighbors in all 13 directions in three dimensions). In this study, δ was set to 1 and θ to each of the 13 directions in three dimensions, yielding a total of 13 gray GLCM for each 3D image. From these GLCM matrices, 22 textural features are derived.
Each 3D GLCM based feature was then calculated as the mean of the feature calculations for each of the 13 directions. Let: M (i, j) be the co-occurrence matrix for an arbitrary δ and θ, set δ=1 and θ=0 and 45, Ng be the number of discrete intensity levels in the images, set as 25, μ be the mean of M (i, j), be the marginal column probabilities, and uy ,μx, be the mean of mx .and 1. Energy: Wavelet transform effectively decouples textural information by decomposing the original image into low-and high-frequencies. In this study, a 3D Coiflet wavelet transform was applied to each MR image, which decomposes the original image X into 8 decompositions. Consider L and H to be low-pass and high-pass function, the wavelet decompositions of X to be labeled as XLLL, XLLH, XLHL, XLHH, XHLL, XHLH, XHHL and XHHH. For example, XLLH is then interpreted as the high-pass band, resulting from directional filtering of X with a low-pass filter along x-direction, a low pass filter along y-direction and a high-pass filter along z-direction and is constructed as: Imaging feature normalization Following normalization procedure was conducted to balance the deviance of radiomics features extracted from the four cohorts relying on different MRI protocols and environments and to generate appropriate input for further analysis.
First, all quantitative imaging features were normalized with a max-minimum normalization method in each cohort.
Second, a posteriori harmonization statistical method named Combat, which was proposed initially for genomics analysis to correct batch effect and applying in radiomic studies, was used for balancing the deviation of radiomic features extracted from the four cohorts. In this step, we first set the four cohorts as four batches, and then Combat was used.
Third, a method referred to inverse probability of treatment weighting (IPTW) was used to estimate the propensity scores. Specifically, principal component analysis (PCA) was applied to extract the main components of the radiomics features balanced by Combat, and 75% explained components were used as the predictors to construct a logistic model for estimating the probability of a sample belonging to the primary cohort. The inverse of probabilities was used as the propensity scores. Here, DM patients and non-DM patients (the status at three years after surgery was used, without the censored data) were analyzed separately.
Finally, radiomic features were weighted based on the propensity scores to eliminate the covariable effect between the four cohorts.

Nomogram construction
To demonstrate the incremental value of the radiomics signature, a radiomic nomogram and two nomograms based on clinicopathologic risk factors were constructed and presented in validation cohorts.
For the nomogram based on the fifteen clinicopathologic risk factors ( Abbreviations: FOV, field of view; TR, repetition time; TE, echo time; T2WI, T2 weighted imaging; DWI, diffusion weighted imaging; CE, contrast enhancement.