
# Integration of machine learning and genome-scale metabolic modeling identifies multi-omics biomarkers for radiation resistance

## Abstract

Resistance to ionizing radiation, a first-line therapy for many cancers, is a major clinical challenge. Personalized prediction of tumor radiosensitivity is not currently implemented clinically due to insufficient accuracy of existing machine learning classifiers. Despite the acknowledged role of tumor metabolism in radiation response, metabolomics data is rarely collected in large multi-omics initiatives such as The Cancer Genome Atlas (TCGA) and consequently omitted from algorithm development. In this study, we circumvent the paucity of personalized metabolomics information by characterizing 915 TCGA patient tumors with genome-scale metabolic Flux Balance Analysis models generated from transcriptomic and genomic datasets. Metabolic biomarkers differentiating radiation-sensitive and -resistant tumors are predicted and experimentally validated, enabling integration of metabolic features with other multi-omics datasets into ensemble-based machine learning classifiers for radiation response. These multi-omics classifiers show improved classification accuracy, identify clinical patient subgroups, and demonstrate the utility of personalized blood-based metabolic biomarkers for radiation sensitivity. The integration of machine learning with genome-scale metabolic modeling represents a significant methodological advancement for identifying prognostic metabolite biomarkers and predicting radiosensitivity for individual patients.

## Introduction

Despite being one of the oldest forms of cancer therapy and still a primary treatment modality, radiation therapy is not effective for over one-fifth of cancer patients distributed across almost all cancer types1,2. While biological understanding of radiation resistance has been advanced, use of a priori prediction of radiation response for individual cancer patients is not yet implemented clinically3. Early studies that identified biomarkers for radiation response focused on tumor histology, clinical factors including staging and Karnofsky performance score, and physiological parameters such as tumor oxygenation status4,5,6. As methods for transcriptomic analysis have improved, gene expression-based classifiers for radiation response have proliferated (recently curated in the RadiationGeneSigDB database)7. To date, however, these radiation response classifiers do not integrate multiple -omics modalities, owing in part to a lack of available -omics datasets for individual patient tumors. Specifically, while genomic and transcriptomic data are becoming more widely available for large numbers of patient tumors through initiatives such as The Cancer Genome Atlas (TCGA), metabolomic data associated with tumor biobanks are rarely captured, limiting inclusion of tumor metabolic features in predictive models for radiation therapy response2.

Given the lack of available tumor metabolomic data, genome-scale metabolic modeling approaches such as flux balance analysis (FBA) are becoming increasingly popular for predicting metabolic phenotypes8,9. By combining a curated reconstruction of the human metabolic network with constraints on metabolic reaction activities and an objective function to maximize a particular metabolic phenotype, predictions of steady-state reaction fluxes or metabolite production rates under physiological constraints can be obtained at a genome scale10. We previously developed a bioinformatics pipeline for integrating genomic, transcriptomic, kinetic, and thermodynamic parameters into personalized FBA models of 716 radiation-sensitive and 199 radiation-resistant patient tumors from TCGA across multiple cancer types11. Using these metabolic models, we identified differences in redox metabolism between radiation-sensitive and -resistant tumors, as well as personalized gene targets for inhibiting antioxidant production and clearance of reactive oxygen species. By validating model predictions using a panel of matched radiation-sensitive and -resistant cancer cell lines, we demonstrated that genome-scale metabolic models provide accurate predictions of tumor metabolism and can identify diagnostic and therapeutic biomarkers for radiation response.
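In its basic form, FBA is a linear program: maximize an objective flux subject to steady-state mass balance (S v = 0, with S the stoichiometric matrix) and lower/upper bounds on each reaction flux. The sketch below solves that LP for a hypothetical three-metabolite network using SciPy's `linprog`. It is not the authors' pipeline, which uses Recon3D plus genomic, transcriptomic, kinetic, and thermodynamic constraints. The network, the bounds, and the helper name `fba` are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: metabolites A, B, C (rows) x reactions R0..R4 (columns)
# R0: -> A (uptake), R1: A -> B, R2: A -> C, R3: B -> (objective), R4: C -> (export)
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0, -1,  0],   # B
    [0,  0,  1,  0, -1],   # C
], dtype=float)

# Flux bounds (lb, ub); the cap on R1 stands in for an expression-derived constraint
bounds = [(0, 10), (0, 6), (0, 1000), (0, 1000), (0, 1000)]

def fba(S, bounds, objective_index):
    """Maximize one reaction flux subject to S v = 0 and the flux bounds."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun, res.x

opt, fluxes = fba(S, bounds, objective_index=3)
print(opt, fluxes)  # the objective is limited by the capacity of R1
```

Tightening or loosening a single bound (here R1) shows how personalized constraints change the predicted optimum without any change to the network topology.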

While machine learning methods have been previously combined with genome-scale metabolic models to improve prediction of metabolic phenotypes, most studies combining these two methodologies have focused on microbiological applications rather than applications to cancer metabolism or predicting treatment outcomes12,13,14. We hypothesize that predictions from genome-scale metabolic models of patient tumors would provide additional information for distinguishing pathophysiological differences between radiation-sensitive and -resistant tumors, as well as for prediction of radiation response.

In this work, we utilize personalized FBA models of TCGA patient tumors to predict genome-scale metabolite production rates for incorporation into machine learning classifiers and identification of metabolite biomarkers associated with radiation resistance. In addition, through integration with clinical, genomic, and transcriptomic datasets, we develop gene expression, multi-omics, and non-invasive classifiers that outperform previous predictors of radiation response, as well as provide personalized diagnostic biomarker panels for individual patient tumors.

## Results

### Gene expression classifier implicates cellular metabolism

Gene set enrichment analysis (GSEA) of these 782 genes among the Hallmarks of Cancer showed significantly increased enrichment of the “Deregulating cellular energetics” hallmark, with very low enrichment of the “Genome instability & mutation” hallmark (Fig. 1c)29,30. Hierarchical clustering of the hallmark enrichment ranks for each gene set in RadiationGeneSigDB revealed two major clusters: a larger cluster with very high rank of “Genome instability & mutation,” and a smaller cluster with much higher ranks for other hallmarks involved in cellular metabolism, angiogenesis, and metastasis (Fig. 1d). This dichotomy suggests that although the biological response to radiation therapy certainly involves genomic instability and DNA damage repair, other biological processes such as cellular metabolism may play critical roles as well31,32. GSEA of cancer expression modules additionally showed increased enrichment of many modules involved in cellular metabolism, including amino acid and sulfur metabolism, redox metabolism, and lipid metabolism (Fig. 1e)33. Finally, GSEA of Recon3D metabolic subsystems demonstrated increased enrichment of pathways involved in central carbon metabolism and lipid metabolism, with the majority of genes being associated with increased probability of radiation resistance (Fig. 1f)10. Together, analysis of this gene expression classifier suggests that radiation-resistant tumors exhibit dysregulated cellular metabolic networks, and that additional features describing the metabolism of radiation-sensitive and -resistant tumors could substantially benefit machine learning classifiers for radiation response.

### FBA models accurately predict relative metabolite production

Personalized genome-scale FBA models of radiation-sensitive and -resistant TCGA tumors were generated to obtain metabolic features that could be used in machine learning classifiers for radiation response (see Methods section). These FBA models were developed through integration of gene expression and mutation information from individual patient tumors, as well as kinetic and thermodynamic parameters from publicly available repositories11. By systematically creating artificial metabolite sinks in the Recon3D metabolic network and evaluating fluxes to these sinks, the production rates of different metabolites were predicted and compared between radiation-sensitive and -resistant tumors (Fig. 2a and Supplementary Data 5). Figure 2b shows that many of the metabolite classes implicated by the gene expression classifier had significantly increased production in radiation-resistant tumors. These included antioxidant and cysteine-containing metabolites (including precursors of glutathione, an antioxidant with previously implicated roles in radiation response), lipid and fatty acid metabolites (including those previously implicated in lipid peroxidation in response to ionizing radiation), and immune system mediators34,35,36. While fewer metabolites were predicted to be significantly downregulated in radiation-resistant tumors, many metabolites involved in nucleotide metabolism were among this group.
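The sink strategy can be illustrated with a small sketch. For each metabolite, a temporary sink reaction that consumes only that metabolite is appended to the network, and the flux through it is maximized. Comparing these maxima between two models gives a per-metabolite production difference. The toy network, the two hypothetical "tumor" models, and the function name `sink_production_rates` are assumptions for illustration. The paper's models are built on Recon3D with additional constraints.

```python
import numpy as np
from scipy.optimize import linprog

def sink_production_rates(S, bounds, sink_ub=1000.0):
    """Maximal steady-state production rate of every metabolite.

    For metabolite m, a temporary sink reaction (m ->) is appended as a new
    column of S and its flux is maximized; all other constraints are unchanged.
    """
    n_met, n_rxn = S.shape
    rates = np.full(n_met, np.nan)
    for m in range(n_met):
        sink = np.zeros((n_met, 1))
        sink[m, 0] = -1.0                      # the sink consumes metabolite m
        S_aug = np.hstack([S, sink])
        c = np.zeros(n_rxn + 1)
        c[-1] = -1.0                           # linprog minimizes, so negate
        res = linprog(c, A_eq=S_aug, b_eq=np.zeros(n_met),
                      bounds=list(bounds) + [(0.0, sink_ub)], method="highs")
        if res.success:
            rates[m] = -res.fun
    return rates

# Hypothetical toy network: metabolites A, B, C; reactions
# R0: -> A, R1: A -> B, R2: A -> C, R3: B ->, R4: C ->
S = np.array([[1, -1, -1,  0,  0],
              [0,  1,  0, -1,  0],
              [0,  0,  1,  0, -1]], dtype=float)

# Two hypothetical tumor models that differ only in the capacity of R1,
# standing in for a difference in expression of the enzyme catalyzing it.
model_a = [(0, 10), (0, 6), (0, 1000), (0, 1000), (0, 1000)]
model_b = [(0, 10), (0, 9), (0, 1000), (0, 1000), (0, 1000)]

rates_a = sink_production_rates(S, model_a)
rates_b = sink_production_rates(S, model_b)
print(rates_b - rates_a)  # only metabolite B differs between the two models
```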

Regression of experimental metabolite concentrations among the NCI-60 cancer cell line panel with cell line surviving fraction at 2 Gy radiation (SF2) showed up- and downregulation of the same metabolite classes predicted from FBA models (Fig. 2c)37. Many lipid and fatty acid metabolites positively correlate with radiation resistance (including cholesterol, which had the most positive correlation among all metabolites tested); antioxidant metabolites including glutathione positively correlate as well. On the other hand, many nucleotide metabolites are anti-correlated with radiation resistance (including UDP-MurNAc, which had the most negative correlation among all metabolites tested). Regression of experimental metabolite concentrations with cell line radiation response among the larger CCLE cancer cell line panel yielded similar findings, with upregulation of lipid/antioxidant metabolites and downregulation of nucleotide metabolites in radiation-resistant cell lines (Supplementary Fig. 3).

### Machine learning architecture for radiation response

To integrate FBA model predictions of metabolite production rates with other TCGA datasets into multi-omics machine learning classifiers, a dataset-independent ensemble architecture was developed (Fig. 3a). Multiple independent “base learner” classifiers are each trained on a single -omics dataset (clinical, genomics, transcriptomics, or metabolomics data), as described in Supplementary Fig. 1. Subsequently, by comparing predicted class probabilities from each individual base learner to known radiation responses, a “meta-learner” classifier is trained to determine which base learner provides the most accurate prediction of radiation response based on the multi-omics features of individual samples (Fig. 3b)40. For an individual testing sample, each base learner outputs the predicted probability of radiation resistance (pi), and the meta-learner outputs the predicted probability that each base learner will provide the most accurate prediction (wi); the final probability of radiation resistance is the average of the pi values weighted by the corresponding wi (Fig. 3c). This dataset-independent ensemble architecture performs better across multiple performance metrics compared to the common practice of first combining all -omics datasets and training a single classifier (Fig. 3d and Supplementary Figs. 9 and 10). Overall, this machine learning architecture is a robust platform for integrating multi-omics data and providing accurate predictions of radiation response in individual patient tumors.

### Multi-omics classifier identifies clinical patient subgroups

Using the dataset-independent ensemble architecture described above, a multi-omics machine learning classifier integrating clinical, gene expression, mutation, and FBA-predicted metabolite production rates from TCGA tumors was developed (Supplementary Data 9). With an AUROC of 0.906 ± 0.004, this classifier has significantly greater performance compared to previously developed machine learning classifiers for radiation response (Figs. 4a and 1b)7,41. In addition, the threshold for separating radiation-sensitive and -resistant classes can be altered to optimize sensitivity, specificity, or a balance of both. In all, 725 of the 52,223 features from the four datasets (1.39%) were identified as significant in the classification of radiation response as determined by a 95% cumulative sum threshold on mean absolute SHAP values (mean |ΔP|) (Fig. 4b and Supplementary Data 10 and 11). While the majority of these 725 features were gene expression (48.3%) and metabolite (32.6%) features, clinical features including tumor histology, chemotherapeutic response, and cancer type contributed more than half of the total mean |ΔP| scores (60.1%; Fig. 4c and Supplementary Fig. 11). Mutations with significant mean |ΔP| scores included those directly involved in redox metabolism (IDH1 R132H) and lipid metabolism (BRAF V600E)42,43. Many of these significant mutation features were among those with the largest differences in mutation rates between breast cancer tumors exhibiting LRR versus CTL in an independent patient-derived dataset (Supplementary Fig. 12)28.
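One way to read the "95% cumulative sum threshold on mean absolute SHAP values" is as follows. Rank features by mean |ΔP| across patients and keep the smallest top-ranked set whose importances account for at least 95% of the total. The sketch below implements that reading with NumPy. The function name, the toy SHAP matrix, and the tie-handling are assumptions, not the authors' code.

```python
import numpy as np

def select_by_cumulative_shap(shap_values, feature_names, threshold=0.95):
    """Smallest set of top-ranked features whose mean |SHAP| reaches
    `threshold` of the total mean |SHAP| (assumed reading of the rule).

    shap_values: (n_samples, n_features) array of SHAP values (ΔP).
    """
    importance = np.abs(shap_values).mean(axis=0)           # mean |ΔP| per feature
    order = np.argsort(importance)[::-1]                    # most important first
    cumulative = np.cumsum(importance[order]) / importance.sum()
    # first position where the cumulative share reaches the threshold
    n_keep = min(int(np.searchsorted(cumulative, threshold)) + 1, len(order))
    return [feature_names[i] for i in order[:n_keep]], importance

# Hypothetical SHAP values for 2 patients x 4 features
shap = np.array([[0.6, -0.25, 0.12, 0.03],
                 [-0.6, 0.25, -0.12, -0.03]])
selected, importance = select_by_cumulative_shap(shap, ["f0", "f1", "f2", "f3"])
print(selected)
```

In this toy case, the top three features carry 97% of the total mean |ΔP|, so the fourth is dropped.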

The contribution of a particular dataset toward radiation response classification for each individual patient was calculated by summing the patient-specific absolute SHAP values (|ΔP|) for all features within the particular dataset, normalized by the sum of patient-specific |ΔP| values across all datasets (Fig. 4d). Individual samples varied significantly in the contribution of different datasets toward radiation response classification. Using unsupervised clustering, three clusters of patients with varying contributions of clinical features were identified (Fig. 4e and Supplementary Fig. 13a). While “High Clinical” patients were characterized by large clinical feature contributions and small contributions from multi-omics datasets, multi-omics features provided the majority of cumulative SHAP values for “Low Clinical” patients, with metabolic features alone providing nearly as much utility as clinical features (Fig. 4f and Supplementary Figs. 14–18). For this “Low Clinical” cluster, certain clinical features including chemotherapeutic response have diminished utility, whereas multi-omics features including IDH1 SNP and lipid metabolite levels have much higher importance scores compared to the overall patient cohort. Significant heterogeneity in clinical clusters was observed based on patient clinical factors, especially cancer type and tumor histology (Fig. 4g–i). Output weights from the meta-learner provide an accurate prediction of clinical cluster, effectively differentiating between “Low Clinical” and “Medium/High Clinical” patients; this provides a valuable strategy for determining whether clinical information from electronic medical records is sufficient to accurately predict radiation response in an individual patient, or whether multi-omics features from tumor biopsy samples are needed (Fig. 4j).
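The per-patient dataset contribution described above is a grouped sum followed by a row normalization. The NumPy sketch below assumes a SHAP matrix and a per-feature dataset label. Both inputs and the function name are hypothetical, and the clustering step that follows in the paper is not shown.

```python
import numpy as np

def dataset_contributions(shap_values, dataset_of_feature, datasets):
    """Fraction of each patient's total |SHAP| attributable to each dataset.

    shap_values: (n_patients, n_features) SHAP values (ΔP)
    dataset_of_feature: length-n_features sequence naming each feature's dataset
    Returns an (n_patients, len(datasets)) array whose rows sum to 1
    (undefined for a patient whose SHAP values are all zero).
    """
    abs_shap = np.abs(shap_values)
    labels = np.asarray(dataset_of_feature)
    per_dataset = np.column_stack(
        [abs_shap[:, labels == d].sum(axis=1) for d in datasets]
    )
    return per_dataset / per_dataset.sum(axis=1, keepdims=True)

# Hypothetical example: 2 patients, 4 features from 3 datasets
shap = np.array([[0.30, -0.10, 0.05, 0.05],
                 [0.01, 0.01, -0.40, 0.18]])
feature_datasets = ["clinical", "clinical", "expression", "metabolic"]
contrib = dataset_contributions(shap, feature_datasets,
                                ["clinical", "expression", "metabolic"])
print(contrib.round(3))  # patient 0 is clinically dominated, patient 1 is not
```

Rows of `contrib` are the per-patient profiles that an unsupervised clustering method could then group into "High", "Medium", and "Low Clinical" patients.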

### Metabolic biomarkers identified for radiation response

Metabolite set enrichment analysis of the 236 significant metabolite features from the multi-omics classifier indicated significant enrichment of several metabolic pathways involved in central carbon metabolism, lipid metabolism, and nucleotide metabolism (Fig. 5a). To identify individual metabolites with the largest impact on radiation response prediction, the Spearman correlation between SHAP value (ΔP) and predicted metabolite production rate across all patients was calculated for each metabolite (Supplementary Fig. 19). Figure 5b highlights many of the significant metabolic features, as well as metabolism-related gene expression and mutation features. Significant glycolytic and TCA cycle metabolites (fructose 1,6-bisphosphate, 3-phosphoglyceric acid, succinyl-CoA, and succinate) were all positively correlated with radiation resistance, while genes promoting gluconeogenesis (PCK2 and LDHC) were associated with radiation sensitivity. Fructose 2,6-bisphosphate, an allosteric regulator of PFK-1 that activates glucose breakdown, had one of the most positive correlation values. In addition, many metabolites in early mannose metabolism had positive correlation values, in accordance with previously observed radiation-induced upregulation of mannose-6-phosphate receptors and high-mannose type N-glycan production44,45.
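The direction of each metabolite's effect is summarized by a per-feature Spearman correlation between its SHAP value and its predicted production rate across patients. A positive ρ means higher production pushes the prediction toward radiation resistance (class 1). The sketch below uses `scipy.stats.spearmanr` on hypothetical toy data. The function name is an assumption.

```python
import numpy as np
from scipy.stats import spearmanr

def shap_direction(shap_values, feature_values):
    """Per-feature Spearman rho between feature value and SHAP value (ΔP).

    shap_values, feature_values: (n_patients, n_features) arrays.
    rho > 0: higher values push predictions toward radiation resistance.
    """
    rhos = []
    for j in range(feature_values.shape[1]):
        rho, _pvalue = spearmanr(feature_values[:, j], shap_values[:, j])
        rhos.append(rho)
    return np.array(rhos)

# Hypothetical production rates and SHAP values for 4 patients x 2 metabolites
rates = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 3.0], [4.0, 2.0]])
shap = np.array([[-0.2, -0.2], [-0.1, -0.1], [0.1, 0.0], [0.3, 0.2]])
rhos = shap_direction(shap, rates)
print(rhos)  # metabolite 0 tracks resistance, metabolite 1 tracks sensitivity
```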

Greater glycolytic fluxes in radiation-resistant tumor models resulted in increased production of the majority of significant lipid and fatty acid metabolites, including many with previously identified roles in antioxidation such as capric acid, butyric acid, eicosatrienoic acid, and γ-linolenic acid (Fig. 5c)46,47,48,49. On the other hand, significant nucleotide metabolites were highly correlated with radiation sensitivity, in agreement with the observed downregulation in radiation-resistant cancer cell lines (Fig. 5d). While production of energy metabolites including ATP was correlated with radiation sensitivity, FBA models predict significantly greater conversion of ADP to ATP in radiation-resistant tumors, in agreement with previous experimental findings (Fig. 5e and Supplementary Fig. 20)50,51. Finally, increased production of membrane phospholipids and arachidonic acid precursors resulted in significant correlations between inflammation-mediating eicosanoids and radiation resistance, corroborating previous evidence of radiation-sensitizing effects of cyclooxygenase inhibitors including aspirin (Fig. 5f)52. Together, these findings suggest that metabolic features from multiple interconnected pathways including central carbon, lipid, and nucleotide metabolism are viable diagnostic biomarkers for prediction of radiation sensitivity.

### Non-invasive classifier implicates blood metabolic features

Because non-invasive metabolic predictors of radiation response could be rapidly applied for informing patient-specific treatment, we refined machine learning classification to only integrate clinical data derived from non-invasive means (excluding any pathologic staging or histological information from tumor biopsies) with FBA-predicted production rates of metabolites known to be quantifiable in human blood samples (Fig. 6a)53. This non-invasive classifier performed similarly overall to the multi-omics classifier, with increased sensitivity and decreased specificity; this suggests that the non-invasive classifier may be optimal as a first-line screening test, followed by the multi-omics classifier as a second-line diagnostic test (Fig. 6b)54. A total of 97 of the 363 features from the two datasets (26.7%) were identified as significant in the classification of radiation response (Fig. 6c and Supplementary Data 12 and 13). Similar to the multi-omics classifier (Fig. 4e), individual patient contributions of clinical features formed a bimodal distribution of “Low Clinical” and “High Clinical” groups (Fig. 6d and Supplementary Fig. 13b). Blood metabolite features—including many lipid, nucleotide, and inflammation-mediating metabolites previously identified from the multi-omics classifier—provided almost one-half of the cumulative mean absolute SHAP values (mean |ΔP|) for “Low Clinical” patients (Fig. 6e). Dataset contributions and SHAP values for individual cancer patients can identify personalized biomarkers with maximal diagnostic utility (Fig. 6f–h). Overall, these findings demonstrate the value of blood-based biomarkers as a non-invasive approach toward personalized prediction of radiation response.

## Discussion

Despite significant interest in methodologies for the a priori prediction of radiation response in cancer patients, machine learning algorithms have yet to be used in the clinical setting for informing radiation treatment43. Recently developed classifiers for predicting tumor radiation response have focused mainly on gene expression data, rather than the integration of multiple -omics datasets7,55. This may be in part due to a lack of metabolomics datasets from tumor biobanks including TCGA, limiting inclusion of metabolic features in machine learning classifiers for radiation response. Here, we propose the strategy of utilizing personalized genome-scale FBA models of radiation-sensitive and -resistant patient tumors to predict the production rates of metabolites across the Recon3D metabolic network, leveraging the accessibility of genomic and transcriptomic tumor datasets to generate metabolic insight. These metabolic features are subsequently integrated with clinical, genomic, and gene expression data from TCGA tumors to generate gene expression, multi-omics, and non-invasive classifiers for radiation response. These classifiers provide more accurate predictions of tumor radiation response compared to previously developed classifiers, as well as multi-omics biomarkers associated with radiation sensitivity.

Integration of FBA model predictions into multi-omics machine learning classifiers for radiation response was performed by employing a dataset-independent ensemble architecture (Fig. 3). This approach was based on the concept of stacked generalization (having multiple “base learners” make predictions that are used as inputs for a separate “meta-learner”), which was shown to improve predictive accuracy in this study as well as in multiple previous medical applications67,68,69. However, whereas previous studies supplied a single input dataset to all of the base learners, we instead supply a different -omics dataset to each base learner. The benefit of this dataset-independent approach is that the meta-learner can subsequently be used to predict which individual datasets will provide the most utility for determining radiation response in individual patients. For example, the meta-learner can accurately differentiate between “Low Clinical” patients (with large contributions of gene expression, mutation, and metabolic datasets from patient biopsy samples and genome-scale metabolic modeling) and “High Clinical” patients (with greater contribution of clinical data from electronic medical records) (Fig. 4). This stratification of patient populations allows for optimal resource allocation for collecting biological measurements with maximal diagnostic utility for individual cancer patients. Moreover, the use of gradient boosting machine (GBM) models as the base and meta-learners provides a significant amount of embedded feature selection; this decrease in model complexity not only lowers the cost of measuring biological features needed for prediction, but also improves the interpretability of models, increasing the likelihood of adoption by clinicians70.

In addition to demonstrating the utility of multi-omics data for the classification of radiation response, we found that a classifier utilizing non-invasive clinical information and blood-based metabolic biomarkers can predict radiation sensitivity with comparable accuracy (Fig. 6). Blood-based diagnostic tools are garnering attention for their use in early detection, monitoring, and optimal treatment identification for cancer patients71. Identification of circulating biomarkers through the integration of machine learning and genome-scale metabolic modeling could provide significant utility in adaptive radiotherapy to modify patient treatment with radiation or radiation-sensitizing chemotherapies in response to the observed efficacy of previous treatments72.

Metabolomic profiling is powerful for understanding cancer pathophysiology, identifying and monitoring clinical biomarkers, and predicting patient outcomes, but challenging to retrospectively analyze in specimen biobanks for inclusion in multi-omics data mining77. In this study, we demonstrate that integration of machine learning and genome-scale metabolic modeling methodologies allows for improved biomarker identification and prediction of radiation response in individual patient tumors without direct metabolomics measurements. This approach is generalizable toward other applications in guiding patient treatment, such as the prediction of chemotherapeutic response as well as identification of metabolic targets for pharmacological inhibition and treatment sensitization. The synergistic integration of machine learning and genome-scale metabolic modeling will inevitably yield additional insights for improving precision medicine and long-term care of cancer patients.

## Methods

### TCGA data retrieval and processing

Clinical data from TCGA patients were obtained from the GDC data portal (https://portal.gdc.cancer.gov; clinical drug, clinical patient, and clinical radiation files) and the Synapse TCGA_Pancancer project (https://www.synapse.org/#!Synapse:syn300013/wiki/70804; biological sample files)2. Drug names were standardized according to the nomenclature of the Gene-Drug Interactions for Survival in Cancer database78. Categorical clinical features were one-hot encoded before being input into machine learning classifiers. RNA-Seq gene expression data were obtained from Rahman et al.’s alternative preprocessing method (GEO: GSE62944)79. Data from this preprocessing method showed fewer missing values, more consistent expression between replicates, and improved prediction of biological pathway activity compared to the original TCGA pipeline. Mutation data called with the MuTect variant caller were obtained from the GDC data portal2,80. For all data types, only features with at least two unique non-missing values were included.
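The two generic preprocessing rules here are (1) keep only features with at least two distinct non-missing values and (2) one-hot encode categorical clinical features. Both can be sketched with pandas. The column names and values below are hypothetical, and leaving missing categories as all-zero dummy rows (the `get_dummies` default) is an assumption about how missing clinical values were treated. Drug-name standardization is not shown.

```python
import pandas as pd

def preprocess_features(df):
    """Drop uninformative features, then one-hot encode categorical ones.

    A column is kept only if it has at least two distinct non-missing values.
    Non-numeric columns are one-hot encoded; a missing category becomes a row
    of zeros in that column's dummy variables.
    """
    informative = df.columns[df.nunique(dropna=True) >= 2]
    df = df[informative]
    categorical = df.select_dtypes(exclude="number").columns
    return pd.get_dummies(df, columns=list(categorical), dtype=float)

# Hypothetical clinical table
clinical = pd.DataFrame({
    "histology": ["adenocarcinoma", "squamous", None, "adenocarcinoma"],
    "age": [61, 55, 70, 48],
    "center": ["A", "A", "A", "A"],          # one unique value: dropped
    "stage_num": [None, None, 2.0, None],    # one non-missing value: dropped
})
out = preprocess_features(clinical)
print(out)
```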

TCGA samples were classified into radiation-sensitive and radiation-resistant classes according to their reported sensitivity to radiation therapy based upon the RECIST classification method. Patients with a complete or partial response to radiation (greater than 30% decrease in tumor size) were classified as radiation-sensitive, and patients with stable or progressive disease (either less than 30% decrease in tumor size, or increase in tumor size) were classified as radiation-resistant. If a patient received multiple courses of radiation therapy, they were classified based on the response to their first course.
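The labeling rule (complete or partial response maps to sensitive, stable or progressive disease maps to resistant, and only the first radiation course counts) can be sketched as a small lookup. The response strings and the `(patient, course_number, response)` record layout are hypothetical and may not match the exact TCGA vocabulary. Skipping unrecognized responses is also an assumption. Labels follow Eq. (1): 0 means sensitive and 1 means resistant.

```python
# Hypothetical response vocabulary; the actual TCGA strings may differ.
RESPONSE_TO_CLASS = {
    "complete response": 0,      # radiation-sensitive
    "partial response": 0,       # radiation-sensitive
    "stable disease": 1,         # radiation-resistant
    "progressive disease": 1,    # radiation-resistant
}

def label_patients(courses):
    """courses: iterable of (patient_id, course_number, response_string).

    Returns {patient_id: 0 or 1} using each patient's first radiation course;
    patients whose first-course response is not recognized are skipped.
    """
    first = {}
    for patient, course, response in courses:
        if patient not in first or course < first[patient][0]:
            first[patient] = (course, response)
    labels = {}
    for patient, (_, response) in first.items():
        key = response.strip().lower()
        if key in RESPONSE_TO_CLASS:
            labels[patient] = RESPONSE_TO_CLASS[key]
    return labels

records = [("P1", 2, "Stable Disease"), ("P1", 1, "Complete Response"),
           ("P2", 1, "Progressive Disease"), ("P3", 1, "Partial Response")]
print(label_patients(records))  # P1 is labeled from its first course only
```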

### Data splitting

Supplementary Fig. 21 provides an overview of data splitting for machine learning classifier training and testing. The collection of 716 radiation-sensitive and 199 radiation-resistant samples was randomly split into training+validation (80% of all samples) and testing (20% of all samples) groups. Within the training+validation group, five-fold cross-validation was performed to optimize hyperparameter values. The training (80% of training+validation samples) group was used for training the model with a given set of hyperparameters; within this training group, 87.5% was directly used for training, and 12.5% was used to identify the iteration at which to perform early stopping during training. The validation (20% of training+validation samples) group was used to assess model performance with the given set of hyperparameters. The average validation performance across all five folds was used to determine the optimized set of hyperparameters; once this set was determined, the model was retrained on the entire training+validation group, and the testing group (20% of all samples) was used to assess overall model performance. Twenty iterations of randomized training+validation/testing splitting were performed to analyze model predictions and performance metrics over multiple instances. All data splits were performed using stratified shuffle splitting, where the proportion of radiation-sensitive and -resistant samples was kept the same (refer to Supplementary Fig. 21).
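For the 915 samples, the nested scheme works out to roughly 183 testing samples and 732 training+validation samples per repeat. Each of the five folds then holds out about 146 to 147 samples for validation, and the remaining roughly 585 are divided 87.5%/12.5% into fitting and early-stopping sets. The sketch below generates these index sets with scikit-learn's stratified splitters. Using scikit-learn, the seeds, and the generator name `nested_splits` are assumptions, since the text specifies only stratified shuffle splitting.

```python
import numpy as np
from sklearn.model_selection import (StratifiedKFold, StratifiedShuffleSplit,
                                     train_test_split)

def nested_splits(y, n_repeats=20, seed=0):
    """Yield (repeat, fold, fit_idx, stop_idx, val_idx, test_idx) index arrays.

    Outer: stratified 80/20 training+validation/testing split, repeated.
    Inner: stratified five-fold CV on training+validation; each training fold
    is further split 87.5/12.5 into fitting and early-stopping samples.
    """
    X = np.zeros((len(y), 1))  # features are irrelevant to index generation
    outer = StratifiedShuffleSplit(n_splits=n_repeats, test_size=0.2,
                                   random_state=seed)
    for r, (trval, test) in enumerate(outer.split(X, y)):
        inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + r)
        for f, (tr, val) in enumerate(inner.split(X[trval], y[trval])):
            train = trval[tr]
            fit, stop = train_test_split(train, test_size=0.125,
                                         stratify=y[train], random_state=seed + r)
            yield r, f, fit, stop, trval[val], test

y = np.array([0] * 716 + [1] * 199)  # 0 = sensitive, 1 = resistant
for r, f, fit, stop, val, test in nested_splits(y, n_repeats=1):
    print(f, len(fit), len(stop), len(val), len(test))
```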

### Base learners

Nd base learners were each trained on a single -omics dataset (clinical, gene expression, mutation, or metabolic data), where Nd is the number of individual datasets used for the classifier. Each base learner is a GBM model that performs two-class classification (predicting either radiation sensitivity or resistance for each patient) using features from an individual dataset, such as clinical, genomics, or transcriptomics. GBM models using decision tree ensembles have many useful characteristics compared to other machine learning algorithms, including embedded feature selection, the capability of handling missing values (which are common in clinical datasets), and efficient management of high-dimensional datasets (where the number of features greatly exceeds the number of samples)81,82. XGBoost (v0.90) was used to develop GBM base learners and meta-learners83.

Bayesian optimization was performed to optimize hyperparameter values for each GBM model. At each iteration of Bayesian optimization, five-fold cross-validation was used to calculate the performance of a particular set of hyperparameters. Weighted log loss was used as the performance metric for both GBM model training and evaluating model performance on validation sets:

$${\rm{Weighted}}\,{\mathrm{Log}}\,{\rm{Loss}}=\frac{1}{{N}_{s}}\mathop{\sum }_{i=1}^{{N}_{s}}[-({w}_{R}{y}_{i}\,\log ({p}_{i})+(1-{y}_{i})\log (1-{p}_{i}))]$$
(1)

where yi is the true class label of sample i (yi = 0 if sensitive, yi = 1 if resistant), pi is the predicted probability of sample i being radiation-resistant (belonging to class 1), wR is the weight given to radiation-resistant samples (wR = no. of sensitive samples/no. of resistant samples), and Ns is the total number of samples. The weight given to radiation-resistant samples accounts for there being more radiation-sensitive than radiation-resistant samples, and prevents classifiers from focusing exclusively on optimizing performance for radiation-sensitive samples. The mean weighted log loss plus one standard error over all five folds of cross-validation is used to choose the hyperparameter set with best performance. During model training, early stopping is employed to prevent overfitting.
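Eq. (1) translates directly into NumPy. For the full cohort, wR = 716/199 ≈ 3.6, so each resistant sample counts about 3.6 times as much as a sensitive one. In the sketch below, wR is computed from the labels passed in, and probabilities are clipped to avoid log(0). Both choices are assumptions. In XGBoost, a similar positive-class weighting is available through the `scale_pos_weight` parameter, though the text does not say whether the authors used it.

```python
import numpy as np

def weighted_log_loss(y, p, eps=1e-15):
    """Class-weighted binary log loss of Eq. (1).

    y: true labels (0 = sensitive, 1 = resistant)
    p: predicted probability of radiation resistance
    Resistant samples are up-weighted by w_R = n_sensitive / n_resistant.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)  # avoid log(0)
    w_r = (y == 0).sum() / (y == 1).sum()
    losses = -(w_r * y * np.log(p) + (1 - y) * np.log(1 - p))
    return losses.mean()  # divide by N_s, as in Eq. (1)

# Toy check: 3 sensitive and 1 resistant sample, so w_R = 3
loss = weighted_log_loss([0, 0, 0, 1], [0.1, 0.2, 0.1, 0.7])
print(round(loss, 4))
```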

For individual samples, each of the Nd base learners outputs the predicted probability of radiation resistance (p1, p2, …, pNd) using features from the individual data type. Each base classifier receives the same training/validation/testing split of samples.

### Meta-learner

For every sample within the five validation sets used for the base learners, each base learner’s output prediction of radiation resistance (pi) is compared to the sample’s true radiation response class (yi). The meta-learner is trained to predict the optimal base learner that provides the most accurate prediction of radiation response for each sample, based on the sample’s multi-omics features. This meta-learner performs an Nd-class classification, where Nd is the number of independent base learners. The features this meta-learner is trained on include all features from the Nd datasets that have non-zero SHAP values from their respective base learners; features that do not impact base learner predictions are not included, which increases the training speed while maintaining meta-learner accuracy. Because validation samples from the five-fold cross-validation were not directly used in base learner training, they can be used to train this meta-learner without overfitting or inflation of model performance metrics.
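The text does not define exactly how the "most accurate" base learner is chosen for each validation sample. A plausible reading is that it is the learner whose out-of-fold probability is closest to the true label. The sketch below builds the meta-learner's Nd-class targets under that assumption (absolute error, with ties going to the lowest index). The function name and toy values are hypothetical.

```python
import numpy as np

def optimal_base_learner(base_probs, y):
    """Meta-learner training targets under an assumed definition.

    base_probs: (n_samples, N_d) out-of-fold P(resistant) from each base learner
    y: (n_samples,) true labels, 0 = sensitive, 1 = resistant
    Returns, per sample, the index of the base learner with the smallest
    absolute error |p_k - y| (ties resolved toward the lowest index).
    """
    errors = np.abs(np.asarray(base_probs, dtype=float)
                    - np.asarray(y, dtype=float)[:, None])
    return errors.argmin(axis=1)

# Hypothetical out-of-fold predictions from 3 base learners for 3 samples
base_probs = [[0.2, 0.9, 0.6],
              [0.4, 0.1, 0.3],
              [0.7, 0.8, 0.95]]
targets = optimal_base_learner(base_probs, [1, 0, 1])
print(targets)
```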

Implementation of the meta-learner is analogous to that of each base learner, using a GBM model, Bayesian optimization, early stopping, and five-fold cross-validation. Multiclass log loss was used as the performance metric for both GBM model training and evaluating model performance84:

$${\rm{Multiclass}}\,{\mathrm{Log}}\,{\rm{Loss}}=-\frac{1}{{N}_{s}}\mathop{\sum }_{i=1}^{{N}_{s}}\mathop{\sum }_{k=1}^{{N}_{k}}{y}_{i,k}\,\log ({p}_{i,k})$$
(2)

where yi,k is 1 if dataset k is the true optimal dataset of sample i and 0 otherwise, pi,k is the predicted probability of dataset k being the optimal dataset of sample i, Ns is the total number of samples, and Nk is the total number of datasets (equal to Nd). The mean multiclass log loss plus one standard error over all five folds of cross-validation is used to choose the best-performing hyperparameter set.
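Because yi,k is one-hot, the inner sum of Eq. (2) reduces to the log probability assigned to each sample's true optimal dataset. The sketch below implements that, together with the "mean plus one standard error" selection score over cross-validation folds. Computing the standard error with the sample standard deviation (`ddof=1`) is an assumption, as are the function names.

```python
import numpy as np

def multiclass_log_loss(labels, probs, eps=1e-15):
    """Eq. (2). labels: integer index of the optimal dataset per sample;
    probs: (n_samples, N_k) predicted probabilities (rows sum to 1)."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    labels = np.asarray(labels)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def cv_score(fold_losses):
    """Hyperparameter selection score: mean fold loss plus one standard
    error (lower is better). SE uses the sample standard deviation."""
    fold_losses = np.asarray(fold_losses, dtype=float)
    se = fold_losses.std(ddof=1) / np.sqrt(len(fold_losses))
    return fold_losses.mean() + se

loss = multiclass_log_loss([0, 2], [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
score = cv_score([0.4, 0.5, 0.6, 0.5, 0.5])  # five-fold CV losses
print(round(loss, 4), round(score, 4))
```

Adding one standard error penalizes hyperparameter sets whose performance varies widely across folds, not only those with a high average loss.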

For individual samples, the meta-learner outputs $N_d$ probabilities ($w_1, w_2, \ldots, w_{N_d}$) that each base learner is optimal for that sample; these probabilities sum to 1. Once the meta-learner has been trained on the base learners' predicted probabilities, the base learners and meta-learner act independently of each other when applied to new testing samples.

Each testing sample is run through (1) all $N_d$ base learners to obtain the predicted probabilities of radiation resistance from each of the $N_d$ individual datasets ($p_1, p_2, \ldots, p_{N_d}$), and (2) the meta-learner to obtain the predicted probabilities that each of the $N_d$ base learners/datasets is optimal for that sample ($w_1, w_2, \ldots, w_{N_d}$). The final predicted probability of radiation resistance for the testing sample is the weighted average of the base learner probabilities, with the meta-learner probabilities as weights:

$$p = w_1 p_1 + w_2 p_2 + \cdots + w_{N_d} p_{N_d} \tag{3}$$

Samples with p < 0.5 are classified as radiation-sensitive, while samples with p > 0.5 are classified as radiation-resistant.
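Equation (3) and the 0.5 threshold reduce to a dot product followed by a comparison. A minimal sketch for a single testing sample; the text does not say how $p = 0.5$ exactly is handled, so assigning it to "sensitive" here is an illustrative assumption.

```python
import numpy as np

def stacked_probability(base_probs, weights) -> float:
    """Eq. (3): p = w_1 p_1 + ... + w_Nd p_Nd for one testing sample."""
    base_probs = np.asarray(base_probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("meta-learner probabilities should sum to 1")
    return float(np.dot(weights, base_probs))

def classify(p: float) -> str:
    # Assumption: the unspecified tie p == 0.5 is labeled "sensitive"
    return "resistant" if p > 0.5 else "sensitive"

# Base learners predict 0.8, 0.3, 0.6; the meta-learner trusts the first most
p = stacked_probability([0.8, 0.3, 0.6], [0.5, 0.2, 0.3])
print(round(p, 2), classify(p))  # 0.64 resistant
```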

### Bayesian optimization

Bayesian optimization was used to optimize GBM hyperparameters for both the base learner and meta-learner classifiers. This iterative approach automates the hyperparameter search by computing an acquisition function that estimates the expected benefit of sampling a particular point in hyperparameter space toward finding hyperparameters with minimal cross-validation error. At each iteration, the point in hyperparameter space with the largest acquisition function value is chosen, five-fold cross-validation is used to evaluate those hyperparameters, and the acquisition function is updated to determine the next point to sample. Hyperopt (v0.1.2) was used to perform Bayesian optimization85. Supplementary Table 3 lists the eight GBM hyperparameters optimized for both base learner and meta-learner classifiers, along with their search ranges. A total of $2^8 = 256$ iterations of Bayesian optimization were performed for each classifier.
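The acquisition-function loop described above can be illustrated with a small, self-contained sketch. It swaps in a Gaussian-process surrogate with expected improvement over a single hyperparameter scaled to [0, 1]; Hyperopt's default Tree-structured Parzen Estimator models the search differently, so this is not the study's implementation. The toy objective stands in for five-fold cross-validation error, and the kernel length scale and grid are arbitrary assumptions.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    """Expected improvement below the current best (we are minimizing error)."""
    z = (best - mu) / sigma
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2)))
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(objective, n_iter=16, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)       # candidate values of one hyperparameter
    x_obs = rng.uniform(0.0, 1.0, size=3)   # small random initial design
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        # Standardize observed errors so the unit-variance prior is reasonable
        y_mean, y_std = y_obs.mean(), y_obs.std() + 1e-9
        y_norm = (y_obs - y_mean) / y_std
        mu, sigma = gp_posterior(x_obs, y_norm, grid)
        ei = expected_improvement(mu, sigma, y_norm.min())
        x_next = grid[np.argmax(ei)]        # point with the largest acquisition value
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))  # stands in for cross-validation
    best = int(np.argmin(y_obs))
    return x_obs[best], y_obs[best]

# Toy stand-in for cross-validation error as a function of one scaled hyperparameter
best_x, best_err = bayes_opt(lambda x: (x - 0.3) ** 2)
print(f"best x = {best_x:.3f}, error = {best_err:.5f}")
```

In the study, each evaluation of the objective would be a full five-fold cross-validation of a GBM with eight hyperparameters, repeated for 256 iterations.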

### Classifier performance metrics

Final classifier performance was assessed on testing samples across the 20 iterations of randomized training+validation/testing splitting. The following performance metrics were used:

1. Weighted log loss: Eq. (1)

2. Area under the receiver operating characteristic curve (AUROC)

3. Balanced accuracy, an accuracy metric that corrects for unequal numbers of radiation-sensitive and -resistant patients:

   $${\rm{Balanced}}\,{\rm{Accuracy}}=\frac{1}{2}\left(\frac{TP}{TP+FN}+\frac{TN}{TN+FP}\right) \tag{4}$$

4. Sensitivity:

   $${\rm{Sensitivity}}=\frac{TP}{TP+FN} \tag{5}$$

5. Specificity:

   $${\rm{Specificity}}=\frac{TN}{TN+FP} \tag{6}$$

6. Positive predictive value:

   $${\rm{Positive}}\,{\rm{Predictive}}\,{\rm{Value}}=\frac{TP}{TP+FP} \tag{7}$$

7. Negative predictive value:

   $${\rm{Negative}}\,{\rm{Predictive}}\,{\rm{Value}}=\frac{TN}{TN+FN} \tag{8}$$
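Equations (4) to (8) all derive from the four confusion-matrix counts, and AUROC can be computed as the probability that a randomly chosen resistant sample receives a higher score than a randomly chosen sensitive one. A plain-Python sketch, assuming radiation-resistant (label 1) is the positive class; the toy labels and probabilities are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """TP, TN, FP, FN with radiation-resistant (1) treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def threshold_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sens = tp / (tp + fn)                          # Eq. (5)
    spec = tn / (tn + fp)                          # Eq. (6)
    return {
        "balanced_accuracy": 0.5 * (sens + spec),  # Eq. (4)
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),                     # Eq. (7)
        "npv": tn / (tn + fn),                     # Eq. (8)
    }

def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if p > 0.5 else 0 for p in probs]
print(threshold_metrics(y_true, y_pred))
print(auroc(y_true, probs))
```

Balanced accuracy averages sensitivity and specificity, so a classifier that labels every patient as the majority class scores 0.5 regardless of class imbalance.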

### SHAP values

The importance of individual features for the prediction of radiation response, both averaged across all samples and for individual samples, was determined by calculating SHAP values for each classifier15,16. Each SHAP value (ΔP) represents the change in the predicted probability of radiation resistance for patient i attributed to feature j. A positive SHAP value indicates that patient i's value of feature j increases that patient's predicted probability of radiation resistance; a negative value indicates that it decreases it. Larger absolute SHAP values (|ΔP|) indicate larger contributions to the prediction, in either direction. The mean absolute SHAP value across all samples (Mean |ΔP|) indicates the overall importance of a feature in the classifier's prediction of radiation response. SHAP values were averaged across the 20 training+validation/testing splits using a weighted average, with weights proportional to the inverse of the weighted log loss on the testing set for that split86. This weighting makes the analysis more reflective of the more accurate models, so that identified biomarkers are more likely to be true biomarkers rather than artifacts of poorly performing predictions. Values were normalized by the difference between prior and posterior probabilities of radiation resistance for each sample. SHAP v0.29.1 was used to calculate SHAP values16.
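The inverse-loss weighting across splits and the Mean |ΔP| importance score can be sketched with NumPy. This assumes each split provides a SHAP value for the same set of samples and features, which the text does not state explicitly; the array shapes and names are illustrative.

```python
import numpy as np

def weighted_shap_average(shap_by_split, test_log_losses):
    """Average per-split SHAP matrices with weights proportional to 1 / test log loss.

    shap_by_split:   shape (n_splits, n_samples, n_features), ΔP values per split
    test_log_losses: shape (n_splits,), weighted log loss on each split's testing set
    """
    shap_by_split = np.asarray(shap_by_split, dtype=float)
    w = 1.0 / np.asarray(test_log_losses, dtype=float)
    w = w / w.sum()                                # normalize weights to sum to 1
    return np.tensordot(w, shap_by_split, axes=1)  # shape (n_samples, n_features)

def mean_abs_shap(shap_matrix):
    """Global importance of each feature: mean |ΔP| across samples."""
    return np.mean(np.abs(shap_matrix), axis=0)

# Two splits, one sample, two features; the first split has half the test loss
avg = weighted_shap_average([[[0.2, -0.1]], [[0.4, 0.1]]], [0.5, 1.0])
print(avg, mean_abs_shap(avg))
```

Because the first split's test loss is half that of the second, its SHAP values receive twice the weight (2/3 versus 1/3).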

### Comparison of machine learning algorithms

The scikit-learn v0.21.2 functions sklearn.ensemble.RandomForestClassifier() and sklearn.linear_model.LogisticRegression() were used to implement the random forest and L1-regularized logistic regression classifiers, respectively87. Keras v2.3.1 was used to implement the L1-regularized neural network classifier (https://github.com/keras-team/keras). Weighted log loss (Eq. (1)) was used as the loss function for the neural network classifier, and early stopping was performed. Before training the random forest, logistic regression, and neural network classifiers, missing values were imputed using sklearn.impute.SimpleImputer() and features were scaled using sklearn.preprocessing.StandardScaler(). Supplementary Tables 4–6 provide the hyperparameters and value ranges used for Bayesian optimization with each algorithm; 256 iterations of Bayesian optimization were performed for each classifier. Each classifier, including the GBM classifier, was run using the same training, validation, and testing samples in each of the 20 training+validation/testing splits so that performance could be compared directly.
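A NumPy sketch of what the imputation and scaling steps compute, mirroring SimpleImputer's default mean strategy followed by StandardScaler's zero-mean, unit-variance transform. Fitting both on training samples only and then applying them unchanged to held-out samples is an assumption here (the usual practice to avoid leakage), not a detail given in the text.

```python
import numpy as np

def fit_impute_scale(X_train):
    """Learn column means (for imputation) and mean/std (for scaling) from training data."""
    X_train = np.asarray(X_train, dtype=float)
    col_means = np.nanmean(X_train, axis=0)
    X_imp = np.where(np.isnan(X_train), col_means, X_train)
    mu, sd = X_imp.mean(axis=0), X_imp.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features unscaled
    return col_means, mu, sd

def transform(X, col_means, mu, sd):
    """Fill missing values with training means, then standardize with training statistics."""
    X = np.asarray(X, dtype=float)
    X_imp = np.where(np.isnan(X), col_means, X)
    return (X_imp - mu) / sd

params = fit_impute_scale([[1.0, np.nan], [3.0, 4.0], [5.0, 8.0]])
print(transform([[np.nan, 6.0]], *params))  # a missing value maps to the training mean
```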

### Comparison of gene expression datasets

Eleven gene expression sets for oxic radiation response in the RadiationGeneSigDB database were compared to the set of 782 significant genes from the gene expression classifier7. Gene names from RadiationGeneSigDB gene sets were converted to Entrez gene IDs and gene symbols; genes for which no matching Entrez gene ID or gene symbol could be found were removed. Genes not present in both the TCGA and CCLE gene expression datasets were also removed.
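The two filtering steps amount to a dictionary lookup followed by set intersections. A minimal sketch; the lookup table `name_to_entrez` and the example sets are hypothetical stand-ins for whatever gene-identifier mapping was actually used.

```python
def harmonize_gene_set(gene_names, name_to_entrez, tcga_genes, ccle_genes):
    """Map signature genes to Entrez IDs and keep only those measured in both datasets.

    name_to_entrez: hypothetical lookup {gene name or symbol -> Entrez ID}
    tcga_genes, ccle_genes: sets of Entrez IDs present in each expression dataset
    """
    mapped = {name_to_entrez[g] for g in gene_names if g in name_to_entrez}  # drop unmatched names
    return mapped & tcga_genes & ccle_genes                                  # require both datasets

signature = ["TP53", "ATM", "FAKE1", "BRCA1"]
lookup = {"TP53": 7157, "ATM": 472, "BRCA1": 672}
kept = harmonize_gene_set(signature, lookup, tcga_genes={7157, 472, 672}, ccle_genes={7157, 672})
print(sorted(kept))  # [672, 7157]
```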

To compare the performance of gene expression sets on TCGA data, classification models were trained to predict the radiation-sensitive or -resistant class of TCGA tumor samples using gene expression data from only the genes in an individual set. Model performance was assessed using weighted log loss (Eq. (1)) and AUROC metrics. To compare the performance of gene expression sets on CCLE data, regression models were trained to predict the radiation response of CCLE cell lines (reported as the area under the curve of survival vs. radiation dose) using gene expression data from only the genes in an individual set39. Model performance was assessed using mean absolute error and mean squared error metrics.

### Flux balance analysis (FBA)

Generation of personalized FBA models of individual TCGA tumor samples was performed as described in Lewis et al.11. To predict the maximum production of a particular metabolite in FBA models, the following objective function was used:

$$1\,{\rm{met}}[{\rm{all}}] \to \varnothing \tag{9}$$

where “met” is the metabolite to be maximized, and “[all]” denotes that the objective applies across all cellular compartments in which the metabolite is located. This creates an artificial sink for the metabolite in the Recon3D metabolic network, so that maximizing flux through the sink maximizes the reaction fluxes generating the metabolite. We hypothesized that this objective function would yield accurate predictions for metabolites with large differences in production between radiation-sensitive and -resistant tumors: such metabolites would be particularly beneficial to one tumor class, so the metabolic network of those tumors would be optimized to maximize levels of the metabolite.
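Flux balance analysis with a sink objective such as Eq. (9) is a linear program: maximize the sink flux subject to steady state ($S v = 0$) and flux bounds. The sketch below uses a hypothetical five-reaction toy network (not Recon3D) and SciPy's `linprog` with the HiGHS solver (available in SciPy 1.6 and later).

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network; columns are reactions:
#   v0: uptake -> A      v1: A -> B      v2: A -> M
#   v3: B -> M           v4: sink M -> ∅ (an Eq. (9)-style objective for metabolite M)
# Rows are metabolites A, B, M; steady state requires S @ v = 0.
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0, -1,  0],   # B
    [0,  0,  1,  1, -1],   # M
], dtype=float)
bounds = [(0, 10), (0, 1000), (0, 4), (0, 3), (0, 1000)]  # lower/upper flux bounds

# linprog minimizes c @ v, so maximize the sink flux v4 by minimizing -v4
c = np.zeros(S.shape[1])
c[4] = -1.0
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
max_production = -res.fun
print(res.status, max_production)
```

Here the maximal production of M is 7, because the two routes into M are capped at 4 (v2) and 3 (v3) while uptake (up to 10) is not limiting. In a personalized model, the bounds would instead be constrained by each tumor's transcriptomic and genomic data.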

The modeled external compartment contained all metabolites found in DMEM/F-12 cell culture media (Thermo Fisher Scientific, Cat#11320) and in fetal bovine serum (FBS), matching the cell culture media used for experimental validation88. All 871 metabolites in the Recon3D human metabolic reconstruction that (1) had KEGG database IDs, (2) were not present in the extracellular media, and (3) could be produced by all FBA tumor models were included in the FBA metabolite production screen.

### NCI-60 data retrieval and processing

Experimental metabolomics data from NCI-60 cancer cell lines were obtained from the Developmental Therapeutics Program of the National Cancer Institute (https://wiki.nci.nih.gov/display/NCIDTPdata/Molecular+Target+Data). Normalized concentration entries without metabolite names, or corresponding to isobars, were excluded. Cell line surviving fraction at 2 Gy (SF2) values were obtained from Amundson et al.37.

### CCLE data retrieval and processing

TPM-normalized RNA-Seq transcriptomic data from the CCLE cancer cell lines were obtained from the Broad Institute CCLE database (https://data.broadinstitute.org/ccle/CCLE_RNAseq_rsem_genes_tpm_20180929.txt.gz). Experimental metabolomics data were obtained from Li et al.38. Normalized log10-transformed concentration entries were utilized. Cell line radiation responses measured using the area under the radiation response curve were obtained from Yard et al.39.

### Keene et al. data retrieval and processing

RNA-Seq transcriptomic data from breast cancer tumors exhibiting LRR or CTL from Keene et al. were obtained from the Gene Expression Omnibus (GEO; accession GSE119937)28. RPKM-normalized values were converted to TPM, and only genes with no missing values were retained. For duplicate gene entries, the entry with the largest mean value across all samples was kept. Mutation data were obtained from Table S1 of Keene et al.
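Converting RPKM to TPM only requires rescaling each sample so that its values sum to one million, because both units are already normalized by gene length. A sketch of the three processing steps on a toy genes-by-samples matrix; the ordering (missing-value filtering before conversion) is an assumption, since the TPM sum is undefined when a sample contains missing values.

```python
import numpy as np

def drop_missing(genes, values):
    """Keep only genes (rows) with no missing values in any sample."""
    ok = ~np.isnan(values).any(axis=1)
    return [g for g, keep in zip(genes, ok) if keep], values[ok]

def rpkm_to_tpm(rpkm):
    """Rescale each sample (column) so that its values sum to one million."""
    return rpkm / rpkm.sum(axis=0, keepdims=True) * 1e6

def collapse_duplicates(genes, values):
    """For repeated gene symbols, keep the row with the largest mean across samples."""
    best = {}
    for i, g in enumerate(genes):
        if g not in best or values[i].mean() > values[best[g]].mean():
            best[g] = i
    keep = sorted(best.values())
    return [genes[i] for i in keep], values[keep]

genes = ["A", "B", "B", "C", "D"]
rpkm = np.array([[10, 20], [30, 40], [5, 10], [55, 30], [1, np.nan]], dtype=float)
genes, rpkm = drop_missing(genes, rpkm)
genes, tpm = collapse_duplicates(genes, rpkm_to_tpm(rpkm))
print(genes)  # ['A', 'B', 'C']
```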

### Cell culture

Supplementary Table 2 provides the matched radiation-sensitive and radiation-resistant cell lines used for experimental validation of metabolite levels predicted from FBA models. All cell lines were maintained in DMEM/F-12 cell culture media (Thermo Fisher Scientific, Cat #11320) with 10% FBS (Sigma-Aldrich, Cat #F4135) at 37 °C and 5% CO2, and were free of Mycoplasma.

### Metabolomics

Three biological replicates of each cell line were grown in separate T-25 flasks with the cell culture conditions described above. Cell pellets with approximately 1 million cells were obtained from trypsinization, centrifugation, and removal of supernatant. Samples were reconstituted in 90% MeOH, 10% H2O at a ratio of 200 μL/1 million cells. Aliquots of the supernatant were combined to create a pooled sample used for quality control. Aliquots of the samples were transferred to LC vials and stored at 4 °C.

Untargeted metabolomics was performed using hydrophilic interaction chromatography-tandem mass spectrometry (HILIC-MS/MS). Chromatography parameters were as follows: BEH HILIC Column, 150 mm × 2.1 mm, 1.7 μm; mobile phase A: 80% H2O/20% ACN, 10 mM ammonium formate, 0.1% FA; mobile phase B: 100% ACN, 0.1% FA; column temperature: 40 °C; 2 μL sample injection. MS parameters were as follows: resolution: 240,000; scan range: 70–1050 m/z; polarity: positive/negative; AGC target: 1e5. MS2 parameters were as follows: isolation window: 0.8 m/z; detector: Orbitrap; polarity: positive/negative; fragmentation method: HCD; collision energy: 15, 30, 45; resolution: 30,000.

Compound Discoverer 3.1 was used to perform quality control, putative metabolite identification, and quantification of metabolite levels. Results for positive and negative ion modes were combined. Metabolites with no identified name were removed from the analysis. If duplicate metabolites with the same identification were obtained, then the entry with the largest maximum area was used. KEGG IDs for each metabolite were manually identified based on metabolite name, molar mass, and chemical formula. Metabolites from experimental metabolomics were matched to those from FBA analysis by matching KEGG IDs.
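The de-duplication and KEGG matching steps can be sketched with plain dictionaries. The record fields (`name`, `kegg`, `max_area`) are hypothetical stand-ins for columns of a Compound Discoverer export; the KEGG IDs shown are the real identifiers for glutathione (C00051) and L-glutamate (C00025), but the peak areas are made up.

```python
def dedupe_by_max_area(entries):
    """Keep one entry per identified metabolite: the one with the largest maximum peak area."""
    best = {}
    for e in entries:
        if e["name"] is None:  # metabolites with no identified name are removed
            continue
        if e["name"] not in best or e["max_area"] > best[e["name"]]["max_area"]:
            best[e["name"]] = e
    return list(best.values())

def match_by_kegg(experimental, fba_kegg_ids):
    """Pair experimental metabolites with FBA-screened metabolites via shared KEGG IDs."""
    return {e["kegg"]: e for e in experimental if e["kegg"] in fba_kegg_ids}

entries = [
    {"name": "Glutathione", "kegg": "C00051", "max_area": 5e6},  # e.g., one ion mode
    {"name": "Glutathione", "kegg": "C00051", "max_area": 8e6},  # e.g., the other ion mode
    {"name": None, "kegg": None, "max_area": 1e7},               # unidentified feature
    {"name": "L-Glutamate", "kegg": "C00025", "max_area": 2e6},
]
deduped = dedupe_by_max_area(entries)
matched = match_by_kegg(deduped, fba_kegg_ids={"C00051"})
print(len(deduped), sorted(matched))  # 2 ['C00051']
```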

For the comparison of model-predicted and experimentally measured metabolite levels, all metabolites within the following Recon3D subsystems that were matched with experimental metabolites were included in the analysis:

• Nucleotide metabolism: “Nucleotide interconversion,” “Nucleotide salvage pathway,” “Pentose phosphate pathway,” “Purine catabolism,” “Purine synthesis,” “Pyrimidine catabolism,” “Pyrimidine synthesis”.

• Lipid metabolism: “Cholesterol metabolism,” “Fatty acid oxidation,” “Fatty acid synthesis,” “Glycosphingolipid metabolism,” “Phosphatidylinositol phosphate metabolism,” “Sphingolipid metabolism,” “Steroid metabolism”.

• Cysteine/antioxidant metabolism: “Glutathione metabolism,” “ROS detoxification,” plus metabolite “Lipoamide”.

• Immune system mediators: “Arachidonic acid metabolism,” “Eicosanoid metabolism”.

### Statistics

Descriptions of the statistics used to create boxplots and error bars in bar charts, as well as the sample sizes for these plots, are provided in the respective figure legends. In all cases except Supplementary Figs. 4–7, distinct samples were used; in Supplementary Figs. 4–7, n = 3 biological replicates were used for each cell line. The symbols used to denote statistical significance, and the statistical tests used, are described at the end of the respective figure legends.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

TCGA, NCI-60, CCLE, and Keene et al. datasets used in this study are available as cited above in their respective Methods subsections. Information about KEGG database IDs is available at https://genome.jp. The following datasets generated from this study are available at https://github.com/kemplab/ML-radiation (https://doi.org/10.5281/zenodo.4540314)89:

Supplementary Data 1. TCGA samples included in the analysis, with corresponding radiation response and patient/tumor factors.

Supplementary Data 2. SHAP values (ΔP) from the gene expression classifier, for individual TCGA patients.

Supplementary Data 3. Mean absolute SHAP values (mean |ΔP|) for individual features from the gene expression classifier.

Supplementary Data 4. 782 significant genes from the gene expression classifier.

Supplementary Data 5. FBA model-predicted metabolite production rates in TCGA tumors.

Supplementary Data 6. Experimental metabolomics data from radiation-sensitive and -resistant cancer cell lines.

Supplementary Data 7. Comparison of model-predicted and experimentally validated metabolite levels in radiation-sensitive and -resistant cancers.

Supplementary Data 8. Breast, colorectal, glioma, and upper aerodigestive cancer cell lines within the CCLE panel analyzed for associations between experimental metabolomics and radiation response.

Supplementary Data 9. Frequency of SNPs within each gene among all 915 TCGA samples from this study.

Supplementary Data 10. SHAP values (ΔP) from the multi-omics classifier, for individual TCGA patients.

Supplementary Data 11. Mean absolute SHAP values (mean |ΔP|) for individual features from the multi-omics classifier.

Supplementary Data 12. SHAP values (ΔP) from the non-invasive classifier, for individual TCGA patients.

Supplementary Data 13. Mean absolute SHAP values (mean |ΔP|) for individual features from the non-invasive classifier. Source data are provided with this paper.

## Code availability

Jupyter notebooks containing Python code for running and analyzing the gene expression, multi-omics, and non-invasive classifiers for radiation response are available at https://github.com/kemplab/ML-radiation (https://doi.org/10.5281/zenodo.4540314)89. In addition, the gene sets and code used to compare the significant gene list from our gene expression classifier to those from the RadiationGeneSigDB database are available. Jupyter notebooks containing Python code related to the generation of personalized genome-scale FBA models of TCGA tumors are available at https://github.com/kemplab/FBA-pipeline (https://doi.org/10.5281/zenodo.4540330)11,90.

## References

1. Therasse, P. et al. New guidelines to evaluate the response to treatment in solid tumors. J. Natl Cancer Inst. 92, 205–216 (2000).

2. Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120, https://doi.org/10.1038/ng.2764 (2013).

3. Kim, B. M. et al. Therapeutic implications for overcoming radiation resistance in cancer therapy. Int. J. Mol. Sci. 16, 26880–26913, https://doi.org/10.3390/ijms161125991 (2015).

4. Vogin, G. & Foray, N. The law of Bergonie and Tribondeau: a nice formula for a first approximation. Int. J. Radiat. Biol. 89, 2–8, https://doi.org/10.3109/09553002.2012.717732 (2013).

5. Griffin, T. W. et al. Predicting the response of head and neck cancers to radiation therapy with a multivariate modelling system: an analysis of the RTOG head and neck registry. Int. J. Radiat. Oncol. Biol. Phys. 10, 481–487, https://doi.org/10.1016/0360-3016(84)90027-0 (1984).

6. Fyles, A. W. et al. Oxygenation predicts radiation response and survival in patients with cervix cancer. Radiother. Oncol. 48, 149–156, https://doi.org/10.1016/s0167-8140(98)00044-9 (1998).

7. Manem, V. S. & Dhawan, A. RadiationGeneSigDB: a database of oxic and hypoxic radiation response gene signatures and their utility in pre-clinical research. Br. J. Radiol. 92, 20190198, https://doi.org/10.1259/bjr.20190198 (2019).

8. Yizhak, K. et al. Phenotype-based cell-specific metabolic modeling reveals metabolic liabilities of cancer. eLife 3, https://doi.org/10.7554/eLife.03641 (2014).

9. Nilsson, A. & Nielsen, J. Genome scale metabolic modeling of cancer. Metab. Eng. 43, 103–112, https://doi.org/10.1016/j.ymben.2016.10.022 (2017).

10. Brunk, E. et al. Recon3D enables a three-dimensional view of gene variation in human metabolism. Nat. Biotechnol. 36, 272–281, https://doi.org/10.1038/nbt.4072 (2018).

11. Lewis, J. E., Forshaw, T. E., Boothman, D. A., Furdui, C. M. & Kemp, M. L. Personalized genome-scale metabolic models identify targets of redox metabolism in radiation-resistant tumors. Cell Syst. 12, 68–81.e11, https://doi.org/10.1016/j.cels.2020.12.001 (2021).

12. Zampieri, G., Vijayakumar, S., Yaneske, E. & Angione, C. Machine and deep learning meet genome-scale metabolic modeling. PLoS Comput. Biol. 15, e1007084, https://doi.org/10.1371/journal.pcbi.1007084 (2019).

13. Kavvas, E. S., Yang, L., Monk, J. M., Heckmann, D. & Palsson, B. O. A biochemically-interpretable machine learning classifier for microbial GWAS. Nat. Commun. 11, 2580, https://doi.org/10.1038/s41467-020-16310-9 (2020).

14. Yang, J. H. et al. A white-box machine learning approach for revealing antibiotic mechanisms of action. Cell 177, 1649–1661.e1649, https://doi.org/10.1016/j.cell.2019.04.016 (2019).

15. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, Inc., 2017).

16. Lundberg, S. M. et al. Explainable AI for trees: from local explanations to global understanding. Preprint at arXiv:1905.04610 (2019).

17. Ghashghaei, M. et al. Identification of a radiosensitivity molecular signature induced by enzalutamide in hormone-sensitive and hormone-resistant prostate cancer cells. Sci. Rep. 9, 8838, https://doi.org/10.1038/s41598-019-44991-w (2019).

18. Hino, S. et al. Cytoplasmic TSC-22 (transforming growth factor-beta-stimulated clone-22) markedly enhances the radiation sensitivity of salivary gland cancer cells. Biochem. Biophys. Res. Commun. 292, 957–963, https://doi.org/10.1006/bbrc.2002.6776 (2002).

19. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29, https://doi.org/10.1038/75556 (2000).

20. Luo, J. et al. mRNA and methylation profiling of radioresistant esophageal cancer cells: the involvement of Sall2 in acquired aggressive phenotypes. J. Cancer 8, 646–656, https://doi.org/10.7150/jca.15652 (2017).

21. Gong, L. et al. Differential radiation response between normal astrocytes and glioma cells revealed by comparative transcriptome analysis. Onco Targets Ther. 10, 5755–5764, https://doi.org/10.2147/ott.S144002 (2017).

22. Dahan, P. et al. Ionizing radiations sustain glioblastoma cell dedifferentiation to a stem-like phenotype through survivin: possible involvement in radioresistance. Cell Death Dis. 5, e1543, https://doi.org/10.1038/cddis.2014.509 (2014).

23. Heddleston, J. M. et al. Hypoxia inducible factors in cancer stem cells. Br. J. Cancer 102, 789–795, https://doi.org/10.1038/sj.bjc.6605551 (2010).

24. Niu, N. et al. Radiation pharmacogenomics: a genome-wide association approach to identify radiation response biomarkers using human lymphoblastoid cell lines. Genome Res. 20, 1482–1492, https://doi.org/10.1101/gr.107672.110 (2010).

25. Deng, Q. et al. Chemotherapy and radiotherapy downregulate the activity and expression of DNA methyltransferase and enhance Bcl-2/E1B-19-kDa interacting protein-3-induced apoptosis in human colorectal cancer cells. Chemotherapy 58, 445–453, https://doi.org/10.1159/000345916 (2012).

26. Hurov, K. E., Cotta-Ramusino, C. & Elledge, S. J. A genetic screen identifies the Triple T complex required for DNA damage signaling and ATM and ATR stability. Genes Dev. 24, 1939–1950, https://doi.org/10.1101/gad.1934210 (2010).

27. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607, https://doi.org/10.1038/nature11003 (2012).

28. Keene, K. S. et al. Molecular determinants of post-mastectomy breast cancer recurrence. NPJ Breast Cancer 4, 34, https://doi.org/10.1038/s41523-018-0089-z (2018).

29. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674, https://doi.org/10.1016/j.cell.2011.02.013 (2011).

30. Kiefer, J. et al. Abstract 3589: a systematic approach toward gene annotation of the hallmarks of cancer. Cancer Res. 77, 3589–3589, https://doi.org/10.1158/1538-7445.Am2017-3589 (2017).

31. Morgan, W. F., Day, J. P., Kaplan, M. I., McGhee, E. M. & Limoli, C. L. Genomic instability induced by ionizing radiation. Radiat. Res. 146, 247–258 (1996).

32. Powell, S. & McMillan, T. J. DNA damage and repair following treatment with ionizing radiation. Radiother. Oncol. 19, 95–108, https://doi.org/10.1016/0167-8140(90)90123-e (1990).

33. Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional activity of expression modules in cancer. Nat. Genet. 36, 1090–1098, https://doi.org/10.1038/ng1434 (2004).

34. Mitchell, J. B. & Russo, A. The role of glutathione in radiation and drug induced cytotoxicity. Br. J. Cancer Suppl. 8, 96–104 (1987).

35. Yurkova, I., Shadyro, O., Kisel, M., Brede, O. & Arnhold, J. Radiation-induced free-radical transformation of phospholipids: MALDI-TOF MS study. Chem. Phys. Lipids 132, 235–246, https://doi.org/10.1016/j.chemphyslip.2004.08.006 (2004).

36. Laiakis, E. C. et al. Metabolic phenotyping reveals a lipid mediator response to ionizing radiation. J. Proteome Res. 13, 4143–4154, https://doi.org/10.1021/pr5005295 (2014).

37. Amundson, S. A. et al. Integrating global gene expression and radiation survival parameters across the 60 cell lines of the National Cancer Institute Anticancer Drug Screen. Cancer Res. 68, 415–424, https://doi.org/10.1158/0008-5472.Can-07-2120 (2008).

38. Li, H. et al. The landscape of cancer cell line metabolism. Nat. Med. 25, 850–860, https://doi.org/10.1038/s41591-019-0404-8 (2019).

39. Yard, B. D. et al. A genetic basis for the variation in the vulnerability of cancer to DNA damage. Nat. Commun. 7, 11428, https://doi.org/10.1038/ncomms11428 (2016).

40. Wolpert, D. H. Stacked generalization. Neural Netw. 5, 241–259 (1992).

41. Deist, T. M. et al. Machine learning algorithms for outcome prediction in (chemo)radiotherapy: an empirical comparison of classifiers. Med. Phys. 45, 3449–3459, https://doi.org/10.1002/mp.12967 (2018).

42. Dang, L. et al. Cancer-associated IDH1 mutations produce 2-hydroxyglutarate. Nature 462, 739–744, https://doi.org/10.1038/nature08617 (2009).

43. Kang, J., Schwartz, R., Flickinger, J. & Beriwal, S. Machine learning approaches for predicting radiation therapy outcomes: a clinician’s perspective. Int. J. Radiat. Oncol. Biol. Phys. 93, 1127–1135, https://doi.org/10.1016/j.ijrobp.2015.07.2286 (2015).

44. Kim, S. et al. Radiation-induced autophagy potentiates immunotherapy of cancer via up-regulation of mannose 6-phosphate receptor on tumor cells in mice. Cancer Immunol. Immunother. 63, 1009–1021, https://doi.org/10.1007/s00262-014-1573-4 (2014).

45. Jaillet, C. et al. Radiation-induced changes in the glycome of endothelial cells with functional consequences. Sci. Rep. 7, 5290, https://doi.org/10.1038/s41598-017-05563-y (2017).

46. Lee, S. I. & Kang, K. S. Function of capric acid in cyclophosphamide-induced intestinal inflammation, oxidative stress, and barrier function in pigs. Sci. Rep. 7, 16530, https://doi.org/10.1038/s41598-017-16561-5 (2017).

47. Kumar, A. P., Chougala, M., Nandini, C. & Salimath, P. Effect of butyric acid supplementation on serum and renal antioxidant enzyme activities in streptozotocin‐induced diabetic rats. J. Food Biochem. 34, 15–30 (2010).

48. Gavino, V. C., Miller, J. S., Ikharebha, S. O., Milo, G. E. & Cornwell, D. G. Effect of polyunsaturated fatty acids and antioxidants on lipid peroxidation in tissue cultures. J. Lipid Res. 22, 763–769 (1981).

49. Cameron, N. E. & Cotter, M. A. Interaction between oxidative stress and gamma-linolenic acid in impaired neurovascular function of diabetic rats. Am. J. Physiol. 271, E471–E476, https://doi.org/10.1152/ajpendo.1996.271.3.E471 (1996).

50. Bhatt, A. N. et al. Transient elevation of glycolysis confers radio-resistance by facilitating DNA repair in cells. BMC Cancer 15, 335, https://doi.org/10.1186/s12885-015-1368-9 (2015).

51. Lu, C. L. et al. Tumor cells switch to mitochondrial oxidative phosphorylation under radiation via mTOR-mediated hexokinase II inhibition—a Warburg-reversing effect. PLoS ONE 10, e0121046, https://doi.org/10.1371/journal.pone.0121046 (2015).

52. Choy, H. & Milas, L. Enhancing radiotherapy with cyclooxygenase-2 enzyme inhibitors: a rational advance? J. Natl Cancer Inst. 95, 1440–1452, https://doi.org/10.1093/jnci/djg058 (2003).

53. Wishart, D. S. et al. HMDB: the Human Metabolome Database. Nucleic Acids Res. 35, D521–D526, https://doi.org/10.1093/nar/gkl923 (2007).

54. Maxim, L. D., Niebo, R. & Utell, M. J. Screening tests: a review with examples. Inhal. Toxicol. 26, 811–828, https://doi.org/10.3109/08958378.2014.955932 (2014).

55. Kang, J., Coates, J. T., Strawderman, R. L., Rosenstein, B. S. & Kerns, S. L. Radiogenomics models in precision radiotherapy: from mechanistic to machine learning. Preprint at arXiv:1904.09662 (2019).

56. Lewis, J. E. et al. Genome-scale modeling of NADPH-driven beta-lapachone sensitization in head and neck squamous cell carcinoma. Antioxid. Redox Signal. 29, 937–952, https://doi.org/10.1089/ars.2017.7048 (2018).

57. Monk, J. M. et al. Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments. Proc. Natl Acad. Sci. USA 110, 20338–20343, https://doi.org/10.1073/pnas.1307797110 (2013).

58. Mims, J. et al. Energy metabolism in a matched model of radiation resistance for head and neck squamous cell cancer. Radiat. Res. 183, 291–304, https://doi.org/10.1667/rr13828.1 (2015).

59. Werner, E. et al. Ionizing radiation induction of cholesterol biosynthesis in lung tissue. Sci. Rep. 9, 12546, https://doi.org/10.1038/s41598-019-48972-x (2019).

60. Bogue, M. A. et al. Mouse Phenome Database: a data repository and analysis suite for curated primary mouse phenotype data. Nucleic Acids Res. 48, D716–D723, https://doi.org/10.1093/nar/gkz1032 (2020).

61. Svenson, K. L. et al. Multiple trait measurements in 43 inbred mouse strains capture the phenotypic diversity characteristic of human populations. J. Appl. Physiol. (1985) 102, 2369–2378, https://doi.org/10.1152/japplphysiol.01077.2006 (2007).

62. Chen, Y. A. et al. Simvastatin sensitizes radioresistant prostate cancer cells by compromising DNA double-strand break repair. Front. Pharmacol. 9, 600, https://doi.org/10.3389/fphar.2018.00600 (2018).

63. Efimova, E. V. et al. HMG-CoA reductase inhibition delays DNA repair and promotes senescence after tumor irradiation. Mol. Cancer Ther. 17, 407–418, https://doi.org/10.1158/1535-7163.Mct-17-0288 (2018).

64. Kim, K. Y., Seol, J. Y., Jeon, G. A. & Nam, M. J. The combined treatment of aspirin and radiation induces apoptosis by the regulation of bcl-2 and caspase-3 in human cervical cancer cell. Cancer Lett. 189, 157–166, https://doi.org/10.1016/s0304-3835(02)00519-0 (2003).

65. 65.

Gash, K. J., Chambers, A. C., Cotton, D. E., Williams, A. C. & Thomas, M. G. Potentiating the effects of radiotherapy in rectal cancer: the role of aspirin, statins and metformin as adjuncts to therapy. Br. J. Cancer 117, 210–219, https://doi.org/10.1038/bjc.2017.175 (2017).

66. 66.

Jacobs, C. D. et al. Aspirin improves outcome in high risk prostate cancer patients treated with radiation therapy. Cancer Biol. Ther. 15, 699–706, https://doi.org/10.4161/cbt.28554 (2014).

67. 67.

Ma, Z., Wang, P., Gao, Z., Wang, R. & Khalighi, K. Ensemble of machine learning algorithms using the stacked generalization approach to estimate the warfarin dose. PLoS ONE 13, e0205872, https://doi.org/10.1371/journal.pone.0205872 (2018).

68. 68.

Bhatt, S. et al. Improved prediction accuracy for disease risk mapping using Gaussian process stacked generalization. J. R. Soc. Interface 14, https://doi.org/10.1098/rsif.2017.0520 (2017).

69. 69.

Grenet, I. et al. Stacked generalization with applicability domain outperforms simple QSAR on in vitro toxicological data. J. Chem. Inf. Model 59, 1486–1496, https://doi.org/10.1021/acs.jcim.8b00553 (2019).

70. 70.

Kelly, C. J., Karthikesalingam, A., Suleyman, M., Corrado, G. & King, D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 17, 195, https://doi.org/10.1186/s12916-019-1426-2 (2019).

71. 71.

Hanash, S. M., Baik, C. S. & Kallioniemi, O. Emerging molecular biomarkers—blood-based strategies to detect and monitor cancer. Nat. Rev. Clin. Oncol. 8, 142 (2011).

72. 72.

Yan, D. & Georg, D. Adaptive radiation therapy. Z. Med. Phys. 28, 173–174, https://doi.org/10.1016/j.zemedi.2018.03.001 (2018).

73. 73.

Cooper, L. A. et al. PanCancer insights from The Cancer Genome Atlas: the pathologist’s perspective. J. Pathol. 244, 512–524, https://doi.org/10.1002/path.5028 (2018).

74. 74.

Mobadersany, P. et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. USA 115, E2970–E2979, https://doi.org/10.1073/pnas.1717139115 (2018).

75. 75.

Miousse, I. R., Kutanzi, K. R. & Koturbash, I. Effects of ionizing radiation on DNA methylation: from experimental biology to clinical applications. Int J. Radiat. Biol. 93, 457–469, https://doi.org/10.1080/09553002.2017.1287454 (2017).

76. 76.

Czochor, J. R. & Glazer, P. M. microRNAs in cancer cell response to ionizing radiation. Antioxid. Redox Signal 21, 293–312, https://doi.org/10.1089/ars.2013.5718 (2014).

77. 77.

Madssen, T. S. et al. Historical biobanks in breast cancer metabolomics—challenges and opportunities. Metabolites 9, https://doi.org/10.3390/metabo9110278 (2019).

78.

Spainhour, J. C. G., Lim, J. & Qiu, P. GDISC: a web portal for integrative analysis of gene-drug interaction for survival in cancer. Bioinformatics 33, 1426–1428, https://doi.org/10.1093/bioinformatics/btw830 (2017).

79.

Rahman, M. et al. Alternative preprocessing of RNA-Sequencing data in The Cancer Genome Atlas leads to improved analysis results. Bioinformatics 31, 3666–3672, https://doi.org/10.1093/bioinformatics/btv377 (2015).

80.

Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219, https://doi.org/10.1038/nbt.2514 (2013).

81.

Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Statist. 29, 1189–1232 (2001).

82.

Saeys, Y., Inza, I. & Larranaga, P. A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507–2517, https://doi.org/10.1093/bioinformatics/btm344 (2007).

83.

Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016).

84.

Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).

85.

Bergstra, J., Komer, B., Eliasmith, C., Yamins, D. & Cox, D. D. Hyperopt: a python library for model selection and hyperparameter optimization. Comput. Sci. Discov. 8, 014008 (2015).

86.

Yousefi, S., Shaban, A., Amgad, M., Chandradevan, R. & Cooper, L. A. Learning clinical outcomes from heterogeneous genomic data sources. Preprint at arXiv:1904.01637 (2019).

87.

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

88.

Price, P. J. & Gregory, E. A. Relationship between in vitro growth promotion and biophysical and biochemical properties of the serum supplement. In Vitro 18, 576–584, https://doi.org/10.1007/bf02810081 (1982).

89.

Lewis, J. E. & Kemp, M. L. ML-radiation datasets. GitHub https://doi.org/10.5281/zenodo.4540314 (2021).

90.

Lewis, J. E., Forshaw, T. E., Boothman, D. A., Furdui, C. M. & Kemp, M. L. FBA-pipeline datasets. GitHub https://doi.org/10.5281/zenodo.4540330 (2021).

## Acknowledgements

The authors gratefully acknowledge support for this work from an NIH/NCI F30 CA224968 fellowship (PI: J.E.L.; Sponsor: M.L.K.) and an NIH/NCI U01 CA215848 grant (PI: M.L.K.). We wish to acknowledge David Gaul and the core facilities at the Parker H. Petit Institute for Bioengineering and Bioscience at the Georgia Institute of Technology for the use of their shared equipment, services, and expertise.

## Author information

### Contributions

Conceptualization: J.E.L. and M.L.K.; methodology, software, validation, formal analysis, and investigation: J.E.L.; resources: M.L.K.; data curation and writing (original draft): J.E.L.; writing (review and editing): J.E.L. and M.L.K.; visualization: J.E.L.; supervision: M.L.K.; and project administration and funding acquisition: J.E.L. and M.L.K.

### Corresponding author

Correspondence to Melissa L. Kemp.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Peer review information Nature Communications thanks Pedro Carmona-Sáez and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Lewis, J.E., Kemp, M.L. Integration of machine learning and genome-scale metabolic modeling identifies multi-omics biomarkers for radiation resistance. Nat Commun 12, 2700 (2021). https://doi.org/10.1038/s41467-021-22989-1