A Novel Methodology using CT Imaging Biomarkers to Quantify Radiation Sensitivity in the Esophagus with Application to Clinical Trials

Personalized cancer therapy seeks to tailor treatment to an individual patient's biology, which requires a means of characterizing radiosensitivity. In this study, we investigated radiosensitivity of the normal esophagus in 134 non-small-cell lung cancer patients using esophageal expansion, an imaging biomarker of radiation-response and esophageal toxicity, and applied K-Means clustering to group patients by esophageal radiosensitivity. Patients in the cluster with higher response at lower dose were labelled radiosensitive. This label was used as a variable in toxicity prediction modelling (lasso logistic regression), and the resulting model performance was quantified and compared with that of models built without radiosensitivity information. The esophageal expansion-response varied widely between patients, even at similar radiation doses. K-Means clustering identified three patient subgroups of radiosensitivity: radiosensitive, radio-normal, and radioresistant. Including the radiosensitivity variable improved lasso logistic regression model performance compared with models lacking radiosensitivity information. Esophageal radiosensitivity can be quantified using esophageal expansion and K-Means clustering to improve toxicity prediction modelling, and this methodology may be applied in clinical trials to validate pre-treatment biomarkers of esophageal toxicity.

One hurdle to determining whether a potential biomarker is useful for characterizing dose-response for radiation esophagitis is the manner in which toxicity is quantified. Pre-treatment biomarkers of toxicity are typically investigated in multivariate predictive models by comparing models that include the biomarker with models that do not. These models predict an outcome, toxicity severity in our case, and a biomarker is considered validated if including it substantially improves the model's predictive performance. Using this approach, several studies have identified certain single nucleotide polymorphisms (SNPs) as potential pre-treatment biomarkers of radiation sensitivity 10,16-21 .
Traditionally, grading criteria such as the Common Terminology Criteria for Adverse Events (CTCAE) have been the standard clinical method of quantifying toxicity 22 . While such criteria are of great practical importance for clinical symptom management, they are suboptimal as outcome measures for predictive models and for investigating pre-treatment biomarkers. This is because grading criteria assign an ordinal score for toxicity severity based on the patient's perceived symptom severity and physician-chosen interventions, which are subjective and non-continuous 11 . This concern highlights the need for objective endpoint measures of toxicity severity, as well as outcome endpoints that relate directly to the individual patient's radiation-response in the esophagus. For these reasons, the need for objective imaging biomarkers has been raised in several review articles in radiation oncology 23,24 .
The radiation-induced swelling response in the esophagus, termed esophageal expansion, has previously been validated as an imaging biomarker of radiation-response and toxicity 25,26 . Using a baseline CT scan (the radiation therapy planning CT) and a CT scan acquired towards the end of treatment, the relative amount of swelling, or expansion, can be quantified. When combined with the corresponding radiation dose, this gives a precise, patient-specific quantification of the response to the delivered dose. Because this biomarker objectively quantifies toxicity in the esophagus, it is a suitable endpoint for validating pre-treatment biomarkers of esophageal radiation sensitivity.
The goals of this study were: (i) to quantify the inter-patient variability of esophageal response (also referred to in this study as normal tissue toxicity) by using esophageal expansion, together with the corresponding radiation dose, to quantify each patient's dose-response; (ii) to determine whether patient subgroups of radiation sensitivity can be identified in a mathematically reproducible manner using K-Means clustering; and (iii) to determine whether patient radiation sensitivity subgroup information can be used in the predictive modelling process to improve toxicity prediction models, thereby demonstrating the feasibility of this methodology as a validation procedure for pre-treatment biomarkers of radiation sensitivity.

Methods and Materials
Patient Population. One hundred and thirty-four patients were identified from a prospective, randomized clinical trial for the treatment of stage III NSCLC with concurrent chemoradiation therapy (paclitaxel and carboplatin), with tumor prescription doses of 60 (n = 4), 66 (n = 28), or 74 (n = 53) Gy in 2-Gy fractions over 6-8 weeks at the University of Texas-MD Anderson Cancer Center. The prescribed dose was the highest of the three prescription levels that met the critical structure constraints. These constraints were: mean lung dose ≤22 Gy, lung volume receiving ≥20 Gy ≤40%, mean esophageal dose ≤45 Gy, 33% of esophageal volume ≤65 Gy, 66% of esophageal volume ≤55 Gy, maximum spinal cord dose ≤50 Gy to any 2 cm³ volume, and mean heart dose ≤33 Gy. Inclusion criteria were pathologically proven, unresected stage II-IIIB NSCLC; suitability for concurrent chemoradiation therapy; age between 18 and 85 years; and informed consent obtained before enrollment. Exclusion criteria were small-cell histology, prior radiotherapy to the thoracic region, and pregnancy. Intensity-modulated radiation therapy (IMRT) and passive-scatter proton therapy (PSPT) were used in 85 and 49 of the study patients, respectively. During radiation therapy, patients underwent weekly four-dimensional computed tomography (4DCT) imaging and prospective esophagitis scoring according to the Common Terminology Criteria for Adverse Events (CTCAE) version 3.0. Our study was approved by the University of Texas-MD Anderson Cancer Center Institutional Review Board, including obtaining informed consent for all study patients, and was compliant with Health Insurance Portability and Accountability Act (HIPAA) regulations. A summary of study patient demographics is shown in Table 1.
CT scans were acquired on General Electric Lightspeed Discovery ST or Lightspeed RT16 (GE Healthcare, Waukesha, WI) or Philips Brilliance 64 (Philips Healthcare, Bothell, WA) CT scanners operated at 120 kV. Voxel dimensions were 0.98 × 0.98 × 2.50 mm in the right-left, anterior-posterior, and superior-inferior directions, respectively, with a 512 × 512-pixel matrix. Treatment planning and segmentation were performed in the Pinnacle treatment planning system, version 9.8 (Philips Healthcare), with esophageal contours segmented in the axial plane from the cricoid cartilage to the gastroesophageal junction.

Quantification of Esophageal Response.
Esophageal expansion has previously been validated as a radiation-response measure in the esophagus 25 . Expansion is a surrogate quantification of esophageal swelling, measured as the relative volume change between a corresponding pair of 4DCT scans (the radiotherapy planning CT and a CT acquired at the end of radiation therapy). An example of esophageal expansion is illustrated in Fig. 1.
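As an illustration only, the relative change can be computed from two binary esophageal segmentations. The sketch below assumes the two masks are already aligned voxel-for-voxel (for example, after image registration), which the text does not describe, and the function names are hypothetical. It reports both the overall relative volume change and a per-slice relative change in cross-sectional area, since the expansion-response described next is centered on the slice of maximum axial expansion. Voxel dimensions cancel in both ratios, so they are not needed.

```python
import numpy as np


def relative_volume_change(mask_plan, mask_end):
    """Overall relative volume change (V_end - V_plan) / V_plan of two binary masks.

    Voxel volume cancels in the ratio, so voxel counts are sufficient.
    """
    v_plan = np.count_nonzero(mask_plan)
    v_end = np.count_nonzero(mask_end)
    if v_plan == 0:
        raise ValueError("planning-CT mask is empty")
    return (v_end - v_plan) / v_plan


def axial_expansion_profile(mask_plan, mask_end):
    """Per-slice relative area change (A_end - A_plan) / A_plan.

    Masks are boolean arrays shaped (n_slices, rows, cols), assumed (hypothetically)
    to be aligned so that slice z is the same anatomical level in both scans.
    Slices with an empty planning contour are returned as NaN.
    """
    n_slices = mask_plan.shape[0]
    area_plan = mask_plan.reshape(n_slices, -1).sum(axis=1).astype(float)
    area_end = mask_end.reshape(n_slices, -1).sum(axis=1).astype(float)
    expansion = np.full(n_slices, np.nan)
    valid = area_plan > 0
    expansion[valid] = (area_end[valid] - area_plan[valid]) / area_plan[valid]
    return expansion
```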
The expansion-response for a given patient was quantified as the mean expansion, and the corresponding mean delivered radiation dose, within an isotropic esophageal sub-volume centered at the slice of maximum axial expansion. To maintain uniform sampling, expansion was quantified at the imaging time point nearest fraction 30 (mean = 30.5, standard deviation = 2.2 fractions). Delivered dose was quantified as the voxel dose accumulated at the time of expansion quantification; this is typically less than the planned dose, because fraction 30 was not the last fraction of treatment for many of the study patients. The pair formed by the expansion value and the corresponding delivered dose at the time of expansion quantification constitutes the expansion-response for a given patient.

The underlying premise of using clustering to identify patient subgroups of differing radiosensitivity is that a particular cluster must have a proportionally higher expansion per delivered dose than the other clusters. Under this premise, we expected the following three clusters: the radiosensitive cluster, with the highest expansion per delivered dose; the radioresistant cluster, with high delivered dose but proportionally lower expansion than the radiosensitive cluster; and the radionormal cluster, with lower expansion and delivered dose than the other two clusters.
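A minimal sketch of forming one patient's expansion-response pair, under stated assumptions: the isotropic sub-volume is approximated here by a window of axial slices centered on the peak-expansion slice (the half-width is a hypothetical parameter, not a value from the study), and the dose delivered by a given fraction is approximated by scaling the planned dose linearly with the number of fractions delivered. The study itself used the accumulated voxel dose at the time of the scan.

```python
import numpy as np


def expansion_response(expansion, dose_per_slice_gy, half_width=2):
    """Mean expansion and mean dose in a window of slices centered on the peak-expansion slice.

    expansion: per-slice relative expansion (NaN where undefined).
    dose_per_slice_gy: mean delivered dose per slice at the time of the scan.
    half_width: hypothetical number of slices taken on each side of the peak.
    """
    expansion = np.asarray(expansion, dtype=float)
    dose = np.asarray(dose_per_slice_gy, dtype=float)
    peak = int(np.nanargmax(expansion))
    lo = max(peak - half_width, 0)
    hi = min(peak + half_width + 1, expansion.size)
    return float(np.nanmean(expansion[lo:hi])), float(np.nanmean(dose[lo:hi]))


def delivered_dose(planned_dose_gy, fractions_delivered, total_fractions):
    """Assumption: dose accrues linearly with fraction number."""
    return planned_dose_gy * fractions_delivered / total_fractions
```

For example, a voxel planned to receive 74 Gy over 37 fractions would, under this linear assumption, have received 60 Gy by fraction 30.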
The expansion-response pairs quantified towards the end of treatment, around fraction 30, were clustered using K-Means 27 . K-Means is closely related to clustering by Gaussian mixture modelling, which assigns patients to a finite number of unique clusters under the assumption that the observed data distribution is a mixture of several Gaussian distributions; K-Means corresponds to the limiting case with equal, spherical covariances and hard cluster assignments. These underlying Gaussian components represent the patient radiosensitivity clusters we seek to identify.
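For reference, the relationship between the Gaussian mixture model and K-Means can be written out. Here x_i = (d_i, e_i) denotes patient i's delivered sub-volume dose and expansion, and K = 3; the notation is ours rather than the study's.

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k > 0 .

% With \boldsymbol{\Sigma}_k = \sigma^2 \mathbf{I} for every k and \sigma^2 \to 0,
% each patient's posterior cluster responsibility collapses to a hard
% assignment to the nearest mean, which is the K-Means assignment rule:
z_i = \arg\min_{k} \, \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2 .
```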
K-Means clustering is a commonly used technique in which the squared Euclidean distance serves as the dissimilarity measure 27-29 . The solution is found by minimizing the total within-cluster dissimilarity of the data points (patients) for a given number of clusters. Once this is minimized, each patient is assigned to the cluster with the nearest centroid, which partitions the expansion-response data into unique groups.
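The minimization can be sketched with Lloyd's algorithm, which alternates between assigning each patient to the nearest centroid and moving each centroid to the mean of its members; neither step can increase the within-cluster sum of squared Euclidean distances. This is a generic NumPy sketch, not the MATLAB routine used in the study; the k-means++ seeding and the number of restarts are illustrative choices.

```python
import numpy as np


def _kmeanspp_init(X, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability proportional
    to its squared distance from the nearest centroid chosen so far."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k=3, n_init=10, max_iter=300, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, within-cluster sum of squares)
    for the best of n_init restarts."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = _kmeanspp_init(X, k, rng)
        for _ in range(max_iter):
            # Assignment step: nearest centroid by squared Euclidean distance.
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Update step: centroid = mean of its members (empty clusters keep theirs).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            converged = np.allclose(new, centroids)
            centroids = new
            if converged:
                break
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        inertia = float(d2[np.arange(len(X)), labels].sum())
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best
```

Because squared Euclidean distance is scale-sensitive, whether dose (in Gy) and expansion are rescaled before clustering can change the resulting clusters; the text does not state which choice was made.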
Before clustering, patients with a sub-volume dose less than 20 Gy were excluded from the analysis (n = 8), because this dose was insufficient to elicit an expansion-response. All excluded patients were asymptomatic. After the remaining 126 patients were clustered, the radiosensitive cluster was identified and used in the toxicity prediction modelling process.
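The exclusion rule and the labelling step can be sketched as follows. Here the radiosensitive cluster is taken to be the one whose members have the largest ratio of mean expansion to mean delivered dose; this is one hypothetical way to operationalize "highest expansion per delivered dose", not necessarily how the clusters were interpreted in the study. The 0/1 output corresponds to the dichotomous radiosensitivity covariate used in the modelling.

```python
import numpy as np


def dose_inclusion_mask(subvolume_dose_gy, min_dose_gy=20.0):
    """True for patients kept for clustering (sub-volume dose of at least min_dose_gy)."""
    return np.asarray(subvolume_dose_gy, dtype=float) >= min_dose_gy


def radiosensitivity_indicator(dose_gy, expansion, labels):
    """1 for members of the cluster with the highest mean-expansion / mean-dose ratio, else 0."""
    dose = np.asarray(dose_gy, dtype=float)
    expansion = np.asarray(expansion, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    ratio = [expansion[labels == c].mean() / dose[labels == c].mean() for c in clusters]
    sensitive = clusters[int(np.argmax(ratio))]
    return (labels == sensitive).astype(int)
```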

Toxicity Prediction Modelling.
In this study, least absolute shrinkage and selection operator (lasso) logistic regression was used to create the toxicity prediction models, which were then used to determine whether radiosensitivity cluster membership substantially improves esophagitis prediction. Lasso logistic regression is a robust model-building method that reduces overfitting by applying an L1 penalty, which shrinks coefficients and sets some of them exactly to zero 28,30 . Lasso toxicity prediction models were constructed for the 126 study patients with adequate esophageal dose using a repeated cross-validation procedure of 1000 iterations, illustrated in Fig. 2. Repeated cross-validation gives a more reliable estimate of model generalizability and reduces the effect of any single random partition of the patient data 28,30,31 .
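For readers unfamiliar with the lasso, the penalized objective and a simple solver can be sketched as below. The model minimizes the mean logistic loss plus lambda times the L1 norm of the coefficients, with the intercept left unpenalized; the L1 term drives some coefficients exactly to zero, which is what makes predictor recurrence countable. This sketch uses proximal gradient descent (ISTA) as a swapped-in solver; glmnet, used in the study, fits the same objective by coordinate descent along a path of lambda values, and the choice of lambda is not shown here.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * |z| (elementwise)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_logistic(X, y, lam, n_iter=5000, tol=1e-8):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes (1/n) * sum(log(1 + exp(eta)) - y * eta) + lam * ||beta||_1,
    with eta = b0 + X @ beta; the intercept b0 is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])          # intercept column first
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2    # 1/L, L = Lipschitz constant of the gradient
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xa @ w)))
        grad = Xa.T @ (prob - y) / n
        w_new = w - step * grad
        w_new[1:] = soft_threshold(w_new[1:], step * lam)  # penalize coefficients only
        done = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if done:
            break
    return w[0], w[1:]


def predict_proba(b0, beta, X):
    """Predicted complication probability for each row of X."""
    return 1.0 / (1.0 + np.exp(-(b0 + np.asarray(X, dtype=float) @ beta)))
```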
In summary, dosimetric and clinical factors were used as covariates (predictor variables) to create toxicity prediction models for grade ≥3 esophagitis according to CTCAE version 3.0. In each iteration, approximately 75% of the patients were randomly drawn into a training set, which was used to build the prediction model. The remaining 25% of patients comprised the test set, which was used to quantify the model's predictive performance in terms of the area under the receiver operating characteristic curve (AUC) and the Brier score. This process was repeated for 1000 iterations to minimize the influence of any single random draw of patients into the training or test sets. The dosimetric and clinical factors used as covariates are listed in Table 2. The recurrence of model features was quantified by recording the variables selected in each model for every iteration of the cross-validation procedure.
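The repeated training/test procedure and the two performance metrics can be sketched generically. AUC is computed through its Mann-Whitney interpretation (the probability that a randomly chosen patient with the complication receives a higher predicted risk than one without, with ties counting one half), and the Brier score is the mean squared difference between predicted probability and observed outcome. The `fit` argument is a hypothetical callable standing in for the lasso fit; it must return a prediction function and a boolean mask of selected predictors, whose per-iteration tally gives predictor recurrence. Splitting within each outcome class (stratification) is an assumption added here so that every test set contains both outcomes; the text states only that patients were drawn at random.

```python
import numpy as np


def auc(y_true, score):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    y_true = np.asarray(y_true).astype(bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y_true], score[~y_true]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)


def brier(y_true, prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return float(np.mean((np.asarray(prob, dtype=float) - np.asarray(y_true, dtype=float)) ** 2))


def repeated_holdout(X, y, fit, n_iter=1000, train_frac=0.75, seed=0):
    """Repeated stratified random splits; returns per-iteration test AUC and Brier
    scores and the fraction of iterations in which each predictor was selected."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(int)
    rng = np.random.default_rng(seed)
    classes = [np.flatnonzero(y == 1), np.flatnonzero(y == 0)]
    aucs, briers = np.empty(n_iter), np.empty(n_iter)
    counts = np.zeros(X.shape[1])
    for it in range(n_iter):
        train = np.concatenate([rng.permutation(idx)[: int(round(train_frac * idx.size))]
                                for idx in classes])
        test = np.setdiff1d(np.arange(y.size), train)
        predict, selected = fit(X[train], y[train])
        p = predict(X[test])
        aucs[it], briers[it] = auc(y[test], p), brier(y[test], p)
        counts += np.asarray(selected, dtype=bool)
    return aucs, briers, counts / n_iter
```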
The prediction model construction process was then repeated with radiosensitivity as an additional covariate. The clustering technique described above was used to identify patients with a proportionally higher expansion-response than the other study patients, and this information was encoded as a dichotomous variable (1 for radiosensitive patients, 0 otherwise) in the lasso model construction process. Model performance was assessed, and recurring model predictors were cataloged, for every iteration of the model construction process. The results of the two model construction scenarios (with and without the radiosensitivity predictor) were then compared.
Computational Implementation. Predictor variables were standardized by subtracting the mean over all patients from each patient's value and dividing the result by the standard deviation 32 . All computations were performed in MATLAB version 8.2 (MathWorks, Natick, MA). K-Means clustering was computed using MATLAB's Statistics and Machine Learning Toolbox. Lasso models were constructed using the open-source glmnet package for MATLAB 33 . A p-value < 0.05 was considered statistically significant.
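The standardization step amounts to a column-wise z-score. The sketch uses the sample standard deviation (n - 1 denominator, the default in MATLAB's `std` and `zscore`) and leaves constant columns at zero; both are assumptions, as the text does not specify them.

```python
import numpy as np


def standardize(X):
    """Column-wise z-score: (x - mean) / sample SD, computed over all patients (rows)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # avoid division by zero for constant predictors
    return (X - mu) / sd, mu, sd
```

Here the statistics are computed over all patients, as described; estimating `mu` and `sd` on each training partition and reusing them on the matching test partition is a stricter alternative within repeated cross-validation.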

Results
Expansion-Response and Radiosensitivity Clustering. The expansion-response quantified towards the end of treatment is shown for all 134 study patients in Fig. 3A. An overall trend of increasing toxicity severity with dose is observed, but with high patient-to-patient variability within a given dose range. The expansion per delivered dose is also markedly variable. The distribution of expansion in 10-Gy dose partitions, from 20 Gy to 70 Gy, is shown in Fig. 3B; expansion varies widely among patients with similar doses. The standard deviation of expansion within each dose partition is also shown, with a standard deviation of approximately 30% being typical.
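The per-partition summary can be sketched as a simple binning of patients by sub-volume dose. Partitions are taken here as half-open intervals [20, 30), [30, 40), and so on up to [60, 70) Gy, so a dose of exactly 70 Gy would fall outside them; the bin convention and the use of the sample standard deviation are assumptions.

```python
import numpy as np


def expansion_by_dose_bin(dose_gy, expansion, edges=(20, 30, 40, 50, 60, 70)):
    """Per dose partition: (low, high, n, mean expansion, SD of expansion)."""
    dose = np.asarray(dose_gy, dtype=float)
    values = np.asarray(expansion, dtype=float)
    bin_index = np.digitize(dose, edges) - 1  # bin j covers [edges[j], edges[j+1])
    rows = []
    for j in range(len(edges) - 1):
        v = values[bin_index == j]
        mean = v.mean() if v.size else np.nan
        sd = v.std(ddof=1) if v.size > 1 else np.nan
        rows.append((edges[j], edges[j + 1], v.size, mean, sd))
    return rows
```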
The 8 patients excluded from the clustering and toxicity analyses can be identified in Fig. 3A as those with mean sub-volume doses under 20 Gy. The resulting K-Means clustering of the expansion-response is shown in Fig. 3C, with clusters indicated by color: radiosensitive (brown), radio-normal (green), and radioresistant (purple). The radiation sensitivity characteristics of the assigned clusters met the expansion-response assumptions described in the Methods. The distributions of toxicity severity by cluster membership are shown in Fig. 3D. No grade 0 patients were found in the radiosensitive cluster, whereas many grade 2 and grade 3 patients were. The radioresistant (purple) cluster contained the most patients, and all esophagitis grades were observed within it.
The lasso model construction procedure yielded similar distributions of recurring predictors whether or not radiosensitivity was included as a predictor variable. The predictor recurrence for models constructed without and with the radiosensitivity variable is shown in Fig. 4A and B, respectively. In models constructed with the radiosensitivity variable, this predictor recurred most often and was selected in over 99% of the 1000 iterations of model construction (topmost bar in Fig. 4B). Mean esophageal dose was the second most recurring predictor in models including radiosensitivity information and the most recurring predictor in models without it.
The toxicity prediction model performance with and without radiosensitivity information is summarized in Table 3. Models that used K-Means clustering to identify radiosensitive patients outperformed models lacking radiosensitivity information. Models using clustering had significantly higher AUC Training and AUC Test than models not using radiation sensitivity information (paired t-test on the AUC values of corresponding cross-validation iterations, p < 0.05). Both types of Brier score were also better (lower) for the clustering/radiosensitivity models than for models without radiosensitivity information.

Table 2. Predictor variables used in the NTCP model construction process. Abbreviations: MED = mean esophagus dose; Dmax = maximum esophagus dose; V10 = volume of esophagus receiving at least 10 Gy; LE10 25% = esophageal length with at least 10 Gy to at least 25% of the cross-sectional area of the axial slice of the esophagus; LE10 100% = esophageal length with at least 10 Gy to 100% of the cross-sectional area of the axial slice of the esophagus.
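The paired comparison of per-iteration AUC values can be sketched as a paired t statistic on the iteration-wise differences. Only the statistic and its degrees of freedom are computed here; `scipy.stats.ttest_rel` returns the same statistic together with a p-value. Because resampled training and test sets share patients, per-iteration differences are not fully independent, so such p-values are usually read with some caution.

```python
import numpy as np


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b
    (e.g. AUC with and without the radiosensitivity covariate, per iteration)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return float(t), n - 1
```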

Discussion
In this study, K-Means clustering was applied to the esophageal expansion-response, quantified at approximately the 30th treatment fraction, to identify patients' inherent radiosensitivity. Radiosensitive patients identified in this way were encoded as a dichotomous variable, which was then used in the toxicity prediction modelling process, with lasso-penalized logistic regression in a repeated cross-validation procedure, in an attempt to improve esophagitis prediction models.
The expansion-response of these patients was highly variable regardless of delivered esophageal dose. For similar sub-volume doses, many patients had vastly different amounts of expansion, as well as differing toxicity severity. This illustrates a potential pitfall of toxicity prediction modelling that does not account for inherent radiation sensitivity: variability in individual patients' responses can outweigh the study population's average observed response. This variability among patients with similar delivered doses may make dose-response effects difficult to detect if patient radiosensitivity is not considered.
Toxicity prediction models using the radiosensitivity predictor variable outperformed models without radiosensitivity information for a grade 3 maximum esophagitis endpoint. This performance is more notable because eight low-dose, low-response, asymptomatic patients were excluded from model construction. In typical modelling scenarios, such patients are easily classified and raise the model performance metrics used to quantify predictive ability. Excluding them makes the classification of esophagitis severity more challenging. That the models with radiosensitivity information retained high predictive ability in this more challenging scenario suggests that including radiosensitivity information yields more robust toxicity prediction models.
A prime application of the framework presented in this study is the use of radiosensitivity, as quantified by the expansion dose-response, as a validation methodology for clinical trials. With the push into radiogenomics and the desire to quantify pre-treatment biomarkers to aid personalized medicine, the question of how to validate such biomarkers arises. Traditional esophagitis grade endpoints are subjective and variable and do not objectively quantify radiation-response. Since esophageal expansion is an objective radiation-response biomarker, the expansion-response can be used to validate any prospectively investigated pre-treatment biomarker of radiation-response when esophageal response is a trial endpoint. A simple workflow of this concept is shown in Fig. 5. Furthermore, this validation methodology could be applied to any prospective comparison of two radiation therapy treatments (e.g., different modality or fractionation scheme) to objectively validate differences in esophageal radiation-response.

This work was not without limitations. The clustering process is unsupervised with respect to esophagitis outcome and therefore requires some assumptions for interpretation. As described in the Methods, the radiosensitivity assignment of each cluster was based on assumptions about the relative expansion-response within the study population. These findings must be validated on an external dataset, which would also show whether cluster assignment and shape vary with new patient data. Another limitation is that the radiosensitivity information was used only dichotomously (radiosensitive or not radiosensitive). It would be of interest to analyze the utility not only of the radiosensitive cluster but also of patients labelled as radionormal and radioresistant. The radioresistant cluster would be of particular interest in dose-escalation studies.
In addition to expansion, other imaging biomarkers of radiation-response could be used with a methodology similar to that presented in this study. Esophageal uptake on 18F-fluorodeoxyglucose positron emission tomography (FDG-PET) has been shown to quantify radiation-response and toxicity in the esophagus 34,35 . It is therefore feasible to apply the clustering methodology to esophageal FDG uptake to identify patients with inherent radiation sensitivity in the esophagus. Another application of this work is the analysis of potential variation in radiosensitivity along the length of the esophagus. With a controlled cohort irradiated similarly along different regions of the esophagus, it may be possible to determine whether response varies between specific esophageal sub-regions.
In conclusion, clustering techniques can be applied to the esophageal expansion-response to determine patient radiosensitivity. This radiosensitivity information can be used in the esophagitis prediction modelling process to improve toxicity prediction performance. Inherent patient radiosensitivity can be assessed towards the end of radiation therapy and may be applicable to outcome assessment in clinical trials that investigate response in the esophagus.

Data Availability. The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.