Improved prediction of radiation-induced hypothyroidism in nasopharyngeal carcinoma using pre-treatment CT radiomics

When planning radiation therapy, late effects due to the treatment should be considered. One of the most common complications of head and neck radiation therapy is hypothyroidism. Although clinical and dosimetric data are routinely used to assess the risk of hypothyroidism after radiation, the outcome is still unsatisfactory. Medical imaging can provide additional information that improves the prediction of hypothyroidism. In this study, pre-treatment computed tomography (CT) radiomics features of the thyroid gland were combined with clinical and dosimetric data from 220 participants to predict the occurrence of hypothyroidism within 2 years after radiation therapy. The findings demonstrated that the addition of CT radiomics consistently and significantly improves upon conventional model, achieving the highest area under the receiver operating characteristic curve (AUCs) of 0.81 ± 0.06 with a random forest model. Hence, pre-treatment thyroid CT imaging provides useful information that have the potential to improve the ability to predict hypothyroidism after nasopharyngeal radiation therapy.

RIH with moderate area under the receiver operating characteristic curve (AUCs) of 0.64.Therefore, an effective predictive model is needed to improve treatment planning and reduce the occurrence of RIH.
Medical imaging provides essential information about the pathophysiology of the tumor that is useful for cancer treatment planning, monitoring, and post-therapy evaluation.In addition to qualitative visual inspection of radiological images by clinicians, quantitative numerical data can also be extracted from radiological images via radiomics approaches.Radiomics features describes various metrics of the shape, size, and heterogeneity of the tumor that can complement clinical and dosimetric data when developing predictive machine learning models for clinical applications.Past studies shown the benefits of radiomics information on the prediction of locoregional recurrence, response to treatment, survival, and complications in head and neck cancer patients 6 .Khadija et al. 7 reported that radiomics features from pre-treatment computed tomography (CT) and magnetic resonance (MR) imagings of salivary glands might be able to reflect the functional states of the glands and are predictive of post-radiation xerostomia.Hence, the incorporation of radiomics features of the thyroid gland should improve the ability to predict RIH compared to conventional methods that rely on only clinical and dosimetric data.
In this study, the occurrence of RIH in nasopharyngeal cancer patients within 2 years after treatment were predicted using radiomics features from pre-treatment contrast-enhanced CT images together with clinical and dosimetric data.Conventional models using only clinical and dosimetric data were compared to the combined models that utilized radiomics features to evaluate the benefits of radiomics features on RIH prediction.Multiple machine learning models were explored to assess the reproducibility of the findings.The results can lead to the development of a pre-treatment planning tool that helps clinicians optimize radiation dosage to manage the risk of RIH.

Results
Table 1 summarizes the demographic and clinical characteristics of the 220 nasopharyngeal cancer patients, 106 (48.18%) of which developed RIH within 2 years after radiation therapy.The average age was 48.28 ± 11.71 years, with the majority being male (72.27%).Most patients were of clinical staged 3 (51.82%),N stage 2 (51.82%), and T stages 2 or 1 (33.18% and 31.36%,respectively).Two pre-treatment clinical variables, the level of thyroidstimulating hormone (TSH) and the thyroid volume, were significantly different between the patients with and without RIH (adjusted Mann-Whitney U test p-values < 0.05).The average levels of TSH were 2.68 ± 7.08 μU/ml in patients with RIH and 1.83 ± 2.30 μU/ml in patients without RIH.The average volumes of the thyroid glands were 13.23 ± 6.43 cm 3 in patients with RIH and 15.06 ± 7.07 cm 3 in patients without RIH.There was no significant difference in dosimetric variables between the two groups of patients as shown in Table 2.
A total of 1,288 radiomics features were extracted from pre-treatment contrast-enhanced CT images and categorized into four classes: shape, first-order statistics, texture, and filter-based.The robustness of each radiomics features to the variation in regions of interest drawn by different radiation oncologists were evaluated using  2, AUC = 0.64-0.65).Highly predictive radiomics features include the wavelet-HLL_glcm_MaximumProbability, log-sigma-1-0-mm-3D_ngtdm_Coarseness, wavelet-LLH_ngtdm_Strength, and wavelet-LLH_ngtdm_Strength.Five machine learning models were trained and cross-validated on different combinations of clinical, dosimetric, and radiomics data (Table 3).The areas under the receiver operating characteristic curve (AUC) of the models that incorporated radiomics data were compared to the performance of the conventional models to assess the benefits of radiomics.In all cases, the addition of radiomics data significantly improved the validation AUCs (adjusted signed rank test p-values < 0.05).The combined logistic regression model which incorporated all data types achieved a validation AUC of 0.80 ± 0.06 compared to the AUCs of up to 0.68 ± 0.07 when only clinical and dosimetric data were used.Similarly, the combined random forest model achieved a validation AUC of 0.81 ± 0.06 compared to the AUCs of up to 0.71 ± 0.06 when only clinical and dosimetric data were used.The highest performing models based on support vector machine (SVM) with radial basis kernel, extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost) achieved slightly lower validation AUCs of 0.77 ± 0.06, 0.77 ± 0.05, and 0.78 ± 0.05, respectively, but still significantly outperformed the model variants that utilized only clinical and dosimetric data.
The top performing models, namely the combined logistic regression model and the combined random forest model, were than evaluated on a held-out test dataset (Fig. 1 and Table 4).The combined models again outperformed the models variants that utilized only clinical and dosimetric data in almost all metrics.The only exception is sensitivity where the combined models did not achieve the best performances.It should be noted that although the combined logistic regression model and the combined random forest model achieved similar AUCs (0.72 and 0.74, respectively), they have different tradeoffs.While the combined random forest model achieved higher sensitivity in the high specificity range (> 0.70), the opposite is true in the intermediate specificity range (Fig. 1c).The confusion matrices also suggested that the combined logistic regression model tends to produce slightly more false positives than false negatives while the combined random forest model behaves in the opposite manner (Fig. 2).Hence, multiple metrics should be considered when selecting the best model and cutoff value.
During model development, features that did not strongly contribute to the prediction were iteratively removed.Starting from more than 800 input features, the final combined logistic regression model consisted of three clinical features, namely bilateral neck metastasis status, pre-treatment TSH level, and age, one dosimetric feature, namely the percentage of thyroid volume that received at least 40 (TR V 40 ), and 26 radiomics features (Fig. 3).The final combined random forest model contains one clinical feature, namely pre-treatment TSH level, and one dosimetric feature, namely the mean dose to thyroid (TR mean), and 34 radiomics features.

Discussion
This study aimed to develop a new predictive model for radiation-induced hypothyroidism (RIH) using the combination of clinical, dosimetric, and pre-treatment CT radiomics data and to evaluate the benefits of radiomics data in this context.The results showed that the combined models which incorporated all three data types significantly outperformed the conventional models that utilized only clinical and dosimetric data.In good www.nature.com/scientificreports/agreement with prior studies 4,8-10 , the best combined models assigned high importance to the pre-treatment TSH level, age, and metastasis status which have been implicated in RIH.Here, younger age, positive nodes, and high pre-treatment TSH levels were associated with a higher risk of developing RIH.Among dosimetric features, TR V 40 and TR mean selected by the best combined models had been reported as good predictors for RIH 10,11 .Moreover, the best combined models (validation AUCs of 0.80-0.81and test AUCs of 0.72-0.74)also compared favorably to recent normal tissue complication probability (NTCP) models 12,13 which achieved an AUC of only 0.58 on this cohort.These NTCP models considered only age, TR mean, and thyroid volume as predictors.
The majority of radiomics features selected by the best combined models came from the texture and filter groups, such as log-sigma_ngtdm_contrast, log-sigma_ngtdm_coarseness, and wavelet-HHL_glszm_SmallAr-eaEmphasis.Texture features describe the spatial distribution of voxel intensity levels in a region of interest as fine, coarse, grainy, or smooth, which may reflect the pathophysiological status of the organs or malignancies.Ishibashi N et al. 14 reported that decreased thyroid gland CT intensity might indicate a higher risk of RIH.As subtle changes in the textures of the thyroid may not be discernable by human eyes, radiomics was required to obtain a more precise assessment (Fig. 4).Although radiomics has been successfully applied to enhance the prediction of radiation-induced complications in various studies 6,7,15 , there are also reports that the incorporation of radiomics features did not significantly improve the prediction of RIH some cancer types, such as oropharyngeal cancer 16 .One possible reason for the contradictory findings could be due to the difference in etiologies, which would influence the association between radiomics features and the clinical status of the tumors.However, our study is not without limitations.First, the retrospective design of the study did not allow for a complete standardization of the CT imaging acquisition protocols.Moreover, RIH requires a long follow-up period which resulted in a relatively small sample size.
In conclusion, while the mechanism of RIH remains unclear, ionizing radiation is expected to damage the thyroid gland, altering the morphology, vessel structure, and immune response of the organ.Our results demonstrated that the combination of information from pre-treatment CT imaging with clinical and dosimetric data significantly improves the prediction of RIH across many performance metrics.We contend that these findings will lead to the development of a better pre-treatment planning tool that helps radiation oncologists optimize dose constraints on the thyroid gland to reduce the risk of hypothyroidism.

Population and sample
This study is retrospective and has been approved by the Ethic Committee at the Faculty of Medicine of Chulalongkorn University (IRB No. 745/61).The need for written informed consent was waived by the same Ethic Committee at the Faculty of Medicine, Chulalongkorn University.All methods were performed in accordance with relevant guidelines and regulations.A total of 220 patients with nasopharyngeal cancer (NPC) whose hypothyroidism statuses were confirmed within 2 years after radiation therapy treatment by the Division of Radiation Oncology, Department of Radiology, King Chulalongkorn Memorial Hospital during the period 2010-2020 were included.The inclusion criteria were age of at least 18 years, treated with definite RT (IMRT or VMAT) with or without chemotherapy, and received a radiation dose of 70 Gy in 33-35 fractions.The patients must also have a normal baseline thyroid function, and no history of pre-existing thyroid disease, thyroid surgery, or radiation therapy in the neck.

CT image acquisition
All patients underwent a CT simulation before radiation therapy treatment.A 64 detector-row CT simulator was used to acquire CT images (Revolution CT; GE Healthcare, Chicago, IL, USA).Acquisition protocols included a non-contrast phase and a contrast-enhanced phase in helical mode at 120 kV with smart mA by 2.5 mm slice thickness.

Thyroid segmentation
Manual 3D segmentation of the thyroid gland was performed by radiation oncologists.The region of interest (ROI) covering the thyroid gland in contrast-enhanced CT images were drawn using the Eclipse Contouring software (Varian Medical System, Inc: version 15.5).To assess the robustness of each radiomics feature, an intraclass correlation (ICC) test was performed by having three radiologists segment the same set of thirty randomly selected patients.Robust radiomics features should maintain the same values across ROIs delineated by different radiologists for the same CT image.ICC cutoffs of 0.5 and 0.75 were then applied to select robust radiomics features.

Radiomics features extraction
Radiomics features were extracted from contrast-enhanced CT images using the PyRadiomics package (version 3.0) through the 3D slicer software (version 4.11.2) 17,18 .There were 14 shape-based features, 18 first-order statistics features, 73 texture-based features, and 1183 filter-based features.The bin width parameter was varied between 0.05, 0.1, 0.15, and 0.2.

Clinical and dosimetric data collection
Clinical variables were collected from the hospital information system of the King Chulalongkorn Memorial Hospital.Dosimetric variables were calculated from the dose volume histogram, namely V 40 , V 50 , V 60 , Pit 50 , Pit 55 (V x : Percentage of thyroid volume that has received at least × Gy radiation, Pit x : Percentage of pituitary volume that has received at least × Gy radiation), VS 40 , VS 50 , VS 60 (VS x : Percentage of thyroid volume preserved from × Gy of radiation), the mean dose of thyroid and pituitary gland, the maximum dose of thyroid and pituitary gland, and the minimum dose of thyroid and pituitary gland dose, via the treatment planning system.

Model development
The dataset was first divided into training (80%) and testing (20%).Then during model development, the training set was further divided into 5 equal partitions to perform a fivefold cross-validation.The fivefold cross-validation process was repeated 20 times with different random partitioning of the training set.Five machine learning model families, namely regularized logistic regression, random forest, support vector machine (SVM), gradient boosting trees (XGBoost), and adaptive boosting (AdaBoost) were developed through the scikit-learn and xgboost packages 19,20 .Multiple combinations of clinical data, dosimetric data, and radiomics data were considered as inputs.For each initial set of input features, recursive feature elimination was performed to iteratively remove unimportant features that do not strongly contribute to the prediction.The area under the receiver operating characteristic curve (AUC) was used to rank the performance of the models.Figure 5 summarized data acquisition and model development workflow.

Statistical analysis
The mean and standard deviation (SD) values were calculated for continuous variables, while the counts and percentages were used to summarize categorical features.The differences in feature values between patients with and without RIH were evaluated using the Mann-Whitney U tests and Chi-square tests.The difference in predictive performance of the models were evaluated using signed rank tests.The Benjamini-Hochberg procedure was performed to control for multiple testing.A p-value cutoff of 0.05 was set to define statistical significance.

Figure 1 .
Figure 1.Performances of the logistic regression and random forest models.(a) Distributions of the areas under the receiver operating characteristic curve (AUCs) for the logistic regression models trained on different input data combination.(b) Similar visualization for the AUCs of the random forest models.(c) The receiver operating characteristic curves of the best combined models on the held-out test dataset.

Figure 2 .
Figure 2. Confusion matrices for the best combined models on the held-out test dataset.(a) The combined logistic regression model.(b) The combined random forest model.

Figure 3 .
Figure 3. Coefficients and importance of features selected by either the best combined logistic regression model (top) or the best combined random forest model (bottom).Red bars indicate clinical and dosimetric features.

Figure 4 .
Figure 4. CT images of the thyroid gland with different radiomics feature values.(a) Thyroid gland in a patient who developed RIH.(b) Thyroid gland in a patient without RIH.

Table 3 .
Model performance on the training and validation sets.*Compared with the model utilizing only clinical and dosimetric data.

Table 4 .
Model performance on the held-out test dataset.