Prediction model for short-term mortality after palliative radiotherapy for patients having advanced cancer: a cohort study from routine electronic medical data

We developed a predictive scoring system for 30-day mortality after palliative radiotherapy using predictors from routine electronic medical records. Patients with metastatic cancer receiving a first course of palliative radiotherapy from 1 July, 2007 to 31 December, 2017 were identified. Odds ratios for 30-day mortality and the mortality probabilities associated with the predictive score were obtained using a multivariable logistic regression model. Overall, 5,795 patients were included. Median follow-up was 39.6 months (range, 24.5–69.3) for all surviving patients. 5,290 patients died over a median of 110 days, of whom 995 (17.2%) died within 30 days of radiotherapy commencement. The most important mortality predictors were primary lung cancer (odds ratio: 1.73, 95% confidence interval: 1.47–2.04) and log peripheral blood neutrophil lymphocyte ratio (odds ratio: 1.71, 95% confidence interval: 1.52–1.92). The developed predictive scoring system had 10 predictor variables and 20 points. The cross-validated area under the curve was 0.81 (95% confidence interval: 0.79–0.82). The calibration suggested a reasonably good fit for the model (likelihood-ratio statistic: 2.81, P = 0.094), providing an accurate prediction for almost all 30-day mortality probabilities. The predictive scoring system accurately predicted 30-day mortality among patients with stage IV cancer. Oncologists may use it to tailor palliative therapy for patients.


Short-term mortality predictive risk factors.
To transform raw EMR data into variables usable in a prediction model, we first collected all data from the 180- to 365-day period (depending on the particular variable) ending the day before palliative RT initiation; we did not exclude patients based on absence of data during this period. Raw data were aggregated into potential predictors in the following categories: demographics, prescribed medications, comorbidities and other grouped ICD-9 diagnoses, surgical procedures, health care resource use, and laboratory results. No data on the first course palliative RT itself (e.g., dose-fractionation and techniques) were used in the predictive model. More precise information on the variables used as short-term mortality predictors is provided in Appendix 1 and Supplementary Table 3.
Outcomes. Our primary outcome was 30-day overall mortality, which was calculated from the start of the first course palliative RT until death or censoring (April 1, 2019). The start date of RT was used because it is closer than the end of treatment to the date when the clinical decision to treat was made, and because it provides a uniform time point across all fractionation regimens.
Model selection, performance, and scoring. We used multivariable logistic regression models to evaluate the predictive performance for the primary outcome, 30-day mortality 45,46. The model's predictor functions were pre-specified based on subject matter knowledge (Table 2). We assumed a pattern of randomness in the missing data and created one imputed dataset using a fully conditional specification based on a multivariate normal distribution 47. Different combinations of the 13 covariates were chosen for the regression models (Table 2). The 13 covariates were age, sex, Royal College of Surgeons modified comorbidity score 48, log peripheral white blood cell count, log peripheral blood neutrophil lymphocyte ratio (NLR), log plasma urea, log serum bilirubin, serum albumin, lactate dehydrogenase (LDH), red cell distribution, attendance to emergency room, sites receiving palliative RT, and primary lung cancer.
Data-adaptive methods based on cross-validation and mean absolute error for predictions (MAE) were used to evaluate the predictive performance of different model specifications. We used ten-fold cross-validation to reduce the risk of overfitting the final model to the training set 49 . The cross-validation procedure involved fitting a candidate model for the primary outcome, using data from nine of the ten blocks (the "derivation set"), and evaluating its performance in the held-out block (the "validation set"). We repeated this process ten times, each time using a different block as the validation set, and then averaged the performance over the ten validation sets.
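The cross-validation loop described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's Stata code: the function names (`k_fold_blocks`, `cross_validated_mae`), the random seed, the synthetic data, and the toy intercept-only model that stands in for the multivariable logistic regression are all assumptions introduced here.

```python
import random


def k_fold_blocks(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal blocks."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and outcomes."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def cross_validated_mae(rows, outcomes, fit, predict, k=10, seed=0):
    """Fit on k-1 blocks, score on the held-out block, average over k folds."""
    fold_maes = []
    for held_out in k_fold_blocks(len(rows), k, seed):
        held = set(held_out)
        derivation = [i for i in range(len(rows)) if i not in held]
        model = fit([rows[i] for i in derivation],
                    [outcomes[i] for i in derivation])
        predictions = [predict(model, rows[i]) for i in held_out]
        observed = [outcomes[i] for i in held_out]
        fold_maes.append(mean_absolute_error(predictions, observed))
    return sum(fold_maes) / k


# Hypothetical stand-in model: predicts the derivation-set event rate for
# everyone. A real analysis would fit a logistic regression here instead.
def fit_event_rate(rows, outcomes):
    return sum(outcomes) / len(outcomes)


def predict_event_rate(model, row):
    return model


# Synthetic data (not study data): 50 patients, 10 of whom die within 30 days.
rows = [[x] for x in range(50)]
outcomes = [1 if i % 5 == 0 else 0 for i in range(50)]
cv_mae = cross_validated_mae(rows, outcomes, fit_event_rate, predict_event_rate)
print(f"cross-validated MAE: {cv_mae:.3f}")
```

Passing `fit` and `predict` as arguments lets the same loop score each candidate model specification, which is how the averaged MAE can be compared across models.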
As the overall performance metric, we used the MAE, which measures the average absolute difference between predicted and observed outcomes in the validation set, i.e., the average prediction error 50. This represents the closeness of the predictions to the eventual outcomes. Our measure of model discrimination was the cross-validated area under the receiver operating characteristic (ROC) curve 51,52. An ROC curve is a plot of the sensitivity of a model (the vertical axis) vs 1 minus the specificity (the horizontal axis) for all possible cut-off values that might be used to separate patients predicted to die within 30 days from patients predicted not to die within 30 days 51. Given any 2 random patients, one who died within 30 days and one who did not, the probability that the model will correctly classify the patient with the outcome as higher risk is equal to the area under the ROC curve (AUC) 53. We calculated 95% confidence intervals (CIs) of the AUC following the method of DeLong et al. 54. We evaluated the model calibration by observing the agreement between observed outcomes and predictions 55. We used a graphical assessment of calibration, with predictions on the x-axis and the observed outcome on the y-axis. We performed a sensitivity analysis to evaluate the robustness of model performance by testing different model specifications.
Finally, we produced a point score system from the best model we developed. In the system, points were assigned based on the predictor values for a patient; the total score corresponds to the risk of 30-day overall mortality 56. The steps to develop the point score system are summarized in Appendix 2. For each point score, we summarized the positive predictive value (PPV) and negative predictive value (NPV), which respectively represent the probability that the disease is present given a positive test result and that the disease is absent given a negative test result 57.
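To make the two performance metrics concrete, the following minimal Python sketch computes the MAE and the AUC directly from the pairwise definition given above (the probability that a patient who died is ranked above one who did not). The function names and the five synthetic predictions are assumptions for illustration, and counting tied predictions as half-concordant is a common convention that the text does not specify.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted risk and 0/1 outcome."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def auc_by_pairs(predicted, observed):
    """AUC as the share of (died, survived) pairs ranked correctly.

    Ties in predicted risk count as half a concordant pair (assumed
    convention, equivalent to the Mann-Whitney statistic).
    """
    died = [p for p, o in zip(predicted, observed) if o == 1]
    survived = [p for p, o in zip(predicted, observed) if o == 0]
    concordant = 0.0
    for p_died in died:
        for p_survived in survived:
            if p_died > p_survived:
                concordant += 1.0
            elif p_died == p_survived:
                concordant += 0.5
    return concordant / (len(died) * len(survived))


# Synthetic example (not study data): predicted 30-day mortality risks
# and observed outcomes (1 = died within 30 days).
predicted = [0.9, 0.6, 0.4, 0.3, 0.1]
observed = [1, 0, 1, 0, 0]

print(f"MAE: {mean_absolute_error(predicted, observed):.2f}")  # 0.34
print(f"AUC: {auc_by_pairs(predicted, observed):.3f}")          # 0.833
```

In this example, five of the six (died, survived) pairs are ordered correctly, giving an AUC of 5/6. The pairwise loop is quadratic in cohort size, so it serves as a definition check rather than an efficient implementation.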
Statistical analysis. Descriptive analyses were conducted to describe the cohort of patients receiving first course palliative RT. We used frequencies and proportions for categorical variables and means with standard deviations (when normally distributed) or medians with interquartile ranges (when not normally distributed) for continuous variables. To describe the association between patient factors and an increased or decreased short-term mortality, we reported odds ratios (OR) from univariable and multivariable logistic regressions with their respective 95% CI.
The analysis was performed using Stata v. 15

Results
Description of the cohort. We identified 5,795 patients who commenced palliative RT between July 1, 2007 and December 31, 2017. Patient characteristics are summarized in Table 1. The median age was 64 (interquartile range: 55-75) years; 61.8% were male. Patients with lung cancer (39.7%) constituted the highest proportion of the cohort. In all, 55.1%, 29.2%, and 15.7% had Royal College of Surgeons modified Charlson scores of 0, 1, and ≥2, respectively. A total of 5,291 patients died during the follow-up period (median follow-up 3 months), of whom 995 patients (17.2%) died within 30 days from the start of RT. Data were complete except for those on albumin, peripheral blood cell counts, urea, bilirubin, and LDH, which were imputed 21.
Thirty-day mortality and model performance. Of models 1-4 from the regression analyses (Table 2). The most important predictors of short-term mortality were primary lung cancer (OR: 1.73, 95% CI: 1.47-2.04), log peripheral blood NLR (OR: 1.71, 95% CI: 1.52-1.92), and log plasma urea (OR: 1.55, 95% CI: 1.32-1.82). Figure 1 shows good discrimination for the candidate models, as judged by their ROC curves. Figure 2 shows the 10-fold cross-validated receiver operating characteristic (cv-ROC) curve for 30-day mortality prediction from the best model (model 2 in Table 2). Model 2 showed the highest discrimination, i.e., its predictive accuracy was good, with a cross-validated area under the curve (cvAUC) of 0.81 (95% CI: 0.79-0.82) (Figs. 1,2). Tables 3 and 4 show the point score and average predicted probabilities of 30-day mortality based on model 2, respectively. For ease of interpretation, values of the predictors in log-scale were back-transformed to their original scale.
A point score cut-off value of 6 (positive predictive value: 33.0%; negative predictive value: 94.2%; sensitivity: 80.3%; specificity: 66.2%) showed the greatest Youden's index (46.5), corresponding to maximum joint sensitivity and specificity on the ROC curve.
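The figures reported for the cut-off of 6 can be cross-checked with a short Python sketch. Youden's index is sensitivity plus specificity minus 1; PPV and NPV follow from sensitivity, specificity, and prevalence by Bayes' rule. The function names are illustrative, and using the cohort's 17.2% 30-day mortality as the prevalence is an assumption made here, not a calculation described by the authors.

```python
def youden_index(sensitivity, specificity):
    """Youden's J: joint sensitivity and specificity, on a 0-1 scale."""
    return sensitivity + specificity - 1.0


def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(died within 30 days | score at or above the cut-off)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sensitivity, specificity, prevalence):
    """P(alive at 30 days | score below the cut-off)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Values reported in the text for a point score cut-off of 6.
sensitivity = 0.803
specificity = 0.662
prevalence = 0.172  # assumed: 995 of 5,795 patients died within 30 days

j = youden_index(sensitivity, specificity)
ppv = positive_predictive_value(sensitivity, specificity, prevalence)
npv = negative_predictive_value(sensitivity, specificity, prevalence)

print(f"Youden's index: {100 * j:.1f}")  # 46.5
print(f"PPV: {100 * ppv:.1f}%")          # 33.0%
print(f"NPV: {100 * npv:.1f}%")          # 94.2%
```

Under that prevalence assumption, the recomputed values agree with the reported Youden's index, PPV, and NPV. The low PPV alongside the high NPV reflects the fact that most patients survive beyond 30 days.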
Supplementary Figure 1 shows the predicted probabilities of 30-day mortality by mortality status. The model calibration suggests a reasonably good fit for the model (Supplementary Fig. 2, likelihood-ratio statistic: 2.81, P = 0.094), which provides accurate predictions for almost the entire range of the death probability. The predicted probabilities stay close to the ideal calibration line for low and high probabilities of death. Sensitivity analyses showed no association between comorbidities and 30-day mortality and no interaction between comorbidities and age. Likewise, systemic treatments, including chemotherapy, showed no association with 30-day mortality, and no increase in predictive performance was observed, regardless of whether comorbidity was included in the model. Additionally, in sensitivity analyses we assessed whether our model was consistent across different time windows (0-29, 0-35 and 0-45 days), and we applied our point score system to predict 3- and 6-month mortality in the same patient cohort. We found similar values of NPV and PPV (Supplementary Tables 1 and 2).

Discussion
We found that primary lung cancer, peripheral blood NLR, and plasma urea were strong predictors of short-term mortality among patients with stage IV cancer. Our score system was a good predictor of short-term mortality; performance metrics by ROC curves and calibration curves showed high model discrimination and calibration, respectively.
More recent and successful studies on predictive models for survival after palliative RT in patients with advanced cancer were conducted by Chow et al. and Krishnan et al. [22][23][24]. These studies are similar to ours; however, ours has important advantages. We developed a scoring system that uses objective measurements (complete blood counts, liver and renal function tests within 180 days) to determine the 30-day mortality of patients receiving palliative RT. Furthermore, our data were obtained from routine practice; this increased the model's clinical applicability, unlike those by Chow et al., whose Radiotherapy Rapid Response Programme was established with an aim   Krishnan et al.).
The concordance (C)-statistic is a measure of goodness-of-fit for binary outcomes in a logistic regression model. It represents the probability that the predicted and observed outcomes are concordant for a randomly selected pair of patients in the predictive model 25. The C-statistics for the TEACHH model and for Chow's model based on 3 risk factors were 0.59 and 0.65, respectively 23,24, while the AUC (equivalent to the C-statistic) in our model was higher, at 0.81. Our AUC was cross-validated, which reduces the optimism bias that may affect the other two estimates; this internal validation provides higher consistency in the absence of an external validation. Our score is an easy-to-calculate tool for patients with metastatic cancer who are referred for palliative RT and who account for 20-40% of patients treated in radiation oncology departments [26][27][28][29]. Furthermore, given the mandatory status of death certification in Hong Kong and the automated nature of RT and vital status data collection, data on the dates of the first course of palliative RT and death were reliable. The referring clinician's indication was included in our definition of palliative RT, which was better than merely using predefined dose and fractionation schedules. Our model performed reasonably well across a range of cancer types and other variables, despite lacking genetic data, cancer-specific biomarkers, or any detailed information beyond the EMR. This emphasizes that commonly available EMR data contain important predictors of clinically relevant outcomes in patients with cancer under palliative care. Most of the inputs to the model are standard structured data components in the EMR. The model's algorithm could easily be integrated into existing clinical management systems, importing the data directly from the EMR without specialized infrastructure. Additionally, once the tool is implemented, its predictive power can be validated continuously and independently in an ongoing prospective cohort. This is important to reflect secular trends in cancer epidemiology, treatment variations, and referral patterns in an evolving real-world setting.
The model outperformed clinician estimates of survival and can guide clinical judgment on treatment, resource allocation, and early palliative care referral with advance care planning 30. The NPV exceeded 90%, which means that patients have a very high chance of staying alive beyond 30 days if the model predicts so. This could be a better starting point for dialogue with patients. Realistic and honest disclosure of prognosis can encourage shared decision-making between the patient and the care team, allowing the patient to settle personal, family, and financial issues earlier, instead of embarking on another course of treatment based on an inaccurate prognosis. However, if the patient still opts for RT after thorough discussions with the patient and family, despite a reasonable chance of early mortality, we argue that hypofractionation is preferred to avoid a protracted course of RT near death, given the well-documented evidence for equivalent effects in a range of symptoms 31.
Regarding the choice of covariates and development of the model, patients referred for palliative RT had often received oncological treatment and blood work beforehand; hence, we included commonly performed biochemical and hematological markers. Clinical experience has shown that patients with lung cancer generally die earlier than patients with other cancers, such as breast cancer 14, and patients with certain sites of metastases (e.g., bone only) live longer than patients with others, such as brain and spinal metastases with cord compression [32][33][34][35]. Since no data were available on sites of metastatic disease, we substituted irradiation data. Hence, we included primary cancer site and irradiation site in the score determination. Age may influence not only recommendations for treatment but also prediction of remaining lifespan, and was therefore analyzed in our model. Moreover, clinician estimates of survival were excluded because they were likely based on experience and training, poorly reproducible, and not commonly recorded in routine electronic databases.
Our study had limitations. First, the prediction model was built on data from patients treated with RT and might not be accurate for untreated patients. Second, our procedure for categorizing the predictor variables may not identify the cutoff values with the best discriminating capacity. Third, some important prognostic factors may have been omitted. For example, data on performance status, patient quality of life evaluated using validated scales, and frailty status 36,37, considered prognostic in previous studies, were not analyzed 22. However, we introduced patient comorbidities as a proxy for patient frailty. Fourth, palliative RT use was at the oncologists' discretion in some cases when curative and palliative intent treatment could not be distinguished (e.g., patients having limited metastasis receiving higher dose RT for better local control). Finally, we considered only the first course of palliative RT, without considering the effects of subsequent RT courses and other treatments.
A prediction tool using EMR data, retrieved from routine clinical practice, can accurately predict short-term mortality among patients with advanced cancer starting radiotherapy. Such a tool could facilitate shared decision-making among the patients, family, and medical care team. Additionally, it could help clinicians identify patients unlikely to benefit from RT beyond 30 days and those who may instead benefit from earlier palliative care referral and end-of-life planning. Machine learning techniques have the potential to improve clinical decision-making by identifying those at increased risk of mortality 38. In 3 studies summarized by a systematic review, machine learning techniques performed better than routine logistic regression in building models for mortality prediction in older and/or hospitalized adults, provided enough data were obtained [38][39][40][41]. Future research is needed to incorporate machine learning techniques and to determine the generalizability and feasibility of applying the prediction tool in clinical settings.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.