Development of a claims-based risk-scoring model to predict emergency department visits in older patients receiving anti-neoplastic therapy

This study developed and validated a risk-scoring model, with a particular emphasis on medication-related factors, to predict emergency department (ED) visits among older Korean adults (aged 65 and older) undergoing anti-neoplastic therapy. Utilizing national claims data, we constructed two cohorts: the development cohort (2016–2018) with 34,642 patients and the validation cohort (2019) with 10,902 patients. The model included a comprehensive set of predictors: demographics, cancer type, comorbid conditions, ED visit history, and medication use variables. We employed least absolute shrinkage and selection operator (LASSO) regression to refine and select the most relevant predictors. Out of 120 predictor variables, 12 were retained in the final model, including seven related to medication use. The model demonstrated acceptable predictive performance in the validation cohort with a C-statistic of 0.76 (95% CI 0.74–0.77), indicating reasonable discrimination. This risk-scoring model, after further clinical validation, has the potential to assist healthcare providers in the effective management and care of older patients receiving anti-neoplastic therapy.


Model development and validation
In the initial stage of model development, 59 of the 120 predictors initially screened (Table S1) were excluded because their frequency was less than 1%. These comprised two specific cancer diagnoses, 16 general potentially inappropriate medications (PIMs), 15 geriatric drug-drug interactions (DDIs), and 26 disease-specific PIMs. We then assessed multicollinearity among the remaining variables; none were excluded, as all had a variance inflation factor below 10. This left 61 candidate predictors for the least absolute shrinkage and selection operator (LASSO) regression analysis. We selected a lambda value of λ = 0.00129, corresponding to log(λ) = −2.9, after plotting the binomial deviance curve against log(λ) and the coefficient profile against the log(λ) sequence (Fig. S1). This choice retained 39 variables in the LASSO regression model. Further refinement, excluding variables whose regression coefficients were below 1 when multiplied by 100, narrowed these down to 12 final variables, which formed the basis of the predictive model. The final model included a history of ED visits within the past 3 months, a diagnosis of lung cancer, two specific comorbid conditions (atrial fibrillation and major bleeding), a Charlson comorbidity index score of 6 or higher, and seven medication-related factors. These factors were the initiation of anti-neoplastic therapy, anti-neoplastic therapy with cytotoxic agents, chemotherapeutic DDIs, use of three or more central nervous system (CNS)-active drugs, regular use of opioids without laxatives, use of megestrol, and the use of 10 or more medications.
Each variable received a weighted score based on its coefficient in the LASSO regression model, as shown in Table 2: a history of ED visits within 3 months was assigned the highest weight of 13 points, followed by lung cancer, anti-neoplastic therapy with cytotoxic agents, use of three or more CNS-active drugs, and regular use of opioids without laxatives, each assigned 3 points. Two points each were allocated for the initiation of anti-neoplastic therapy, chemotherapeutic DDIs, and the use of megestrol. The risk scores ranged from 0 to 35 (Table 2).
For validation, the area under the receiver operating characteristic curve (AUROC) of the prediction model was 0.76 in both the development and external validation cohorts (95% confidence interval (CI) 0.75–0.77 and 0.74–0.77, respectively) (Fig. 2a). The calibration of the model was generally accurate, though a slight overestimation was observed for scores above 25 (Fig. 2b and Table S2).
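The scoring described above can be illustrated with a minimal Python sketch. The factor names are hypothetical labels chosen for this illustration. The points for eight factors are those stated in the text. The 1-point values for the remaining four factors are not stated here and are an inference from the reported maximum score of 35 (35 minus the 31 stated points leaves 4 points for four retained factors, each of which must score at least 1). Table 2 remains the authoritative source.

```python
# Points per risk factor. Keys are hypothetical labels for this sketch.
RISK_POINTS = {
    # Stated in the text
    "ed_visit_past_3_months": 13,
    "lung_cancer": 3,
    "cytotoxic_therapy": 3,
    "cns_active_drugs_3_or_more": 3,
    "opioids_without_laxatives": 3,
    "therapy_initiation": 2,
    "chemotherapeutic_ddi": 2,
    "megestrol": 2,
    # ASSUMPTION: inferred from the reported 0-35 range, not stated in the text
    "atrial_fibrillation": 1,
    "major_bleeding": 1,
    "charlson_index_6_or_more": 1,
    "medications_10_or_more": 1,
}


def risk_score(present_factors):
    """Sum the points of the risk factors present for one patient."""
    factors = set(present_factors)
    unknown = factors - RISK_POINTS.keys()
    if unknown:
        raise ValueError(f"Unknown risk factors: {sorted(unknown)}")
    return sum(RISK_POINTS[name] for name in factors)


# Hypothetical patient: recent ED visit, lung cancer, and megestrol use.
example_score = risk_score(
    ["ed_visit_past_3_months", "lung_cancer", "megestrol"]
)
```

Under these assumptions, a patient with every factor present reaches the reported maximum of 35 points, and a patient with none scores 0.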

Risk stratification
The cut-off value for high-risk classification was set at 7 points. Table 3 presents the stratification results for both the development and external validation cohorts. In the external validation cohort, 30.9% (3365 patients) were categorized as high-risk and 69.1% (7537 patients) as low-risk. Among these groups, 590 (17.5%) of the high-risk and 291 (3.9%) of the low-risk patients eventually visited the ED. The sensitivity and specificity of the model were determined to be 67.0% (95% CI 63.9–70.1%) and 72.3% (95% CI 71.4–73.2%), respectively. The top three cases predicted as high-risk by our model are detailed in Table S3.

Table 1. Baseline characteristics of study participants in the development and external validation cohorts. ED emergency department, CNS central nervous system. a CNS-active drugs: antiepileptics; antipsychotics; benzodiazepines; nonbenzodiazepine, benzodiazepine receptor agonist hypnotics; tricyclic antidepressants; selective serotonin reuptake inhibitors; serotonin-norepinephrine reuptake inhibitors; and opioids.
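The sensitivity and specificity reported for the external validation cohort can be recomputed from the stratification counts. The sketch below does this in Python. The use of a Wald (normal-approximation) confidence interval is an assumption, since the interval method is not stated in the text, although it gives bounds that agree with the reported ones to one decimal place.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.959964):
    """Return (estimate, lower, upper) using a normal-approximation interval.

    ASSUMPTION: the Wald interval is used here for illustration only.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# Counts reported for the external validation cohort.
high_risk_total, high_risk_ed = 3365, 590
low_risk_total, low_risk_ed = 7537, 291

true_pos = high_risk_ed                       # high-risk patients who visited the ED
false_neg = low_risk_ed                       # low-risk patients who visited the ED
false_pos = high_risk_total - high_risk_ed    # high-risk patients without an ED visit
true_neg = low_risk_total - low_risk_ed       # low-risk patients without an ED visit

sensitivity = proportion_with_wald_ci(true_pos, true_pos + false_neg)
specificity = proportion_with_wald_ci(true_neg, true_neg + false_pos)

print("sensitivity: {:.1%} ({:.1%} to {:.1%})".format(*sensitivity))
print("specificity: {:.1%} ({:.1%} to {:.1%})".format(*specificity))
```

Here sensitivity is the share of the 881 patients with an ED visit who were classified as high-risk, and specificity is the share of the 10,021 patients without an ED visit who were classified as low-risk.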

Post-hoc subpopulation sensitivity analysis
To assess the breadth of applicability and the consistency of performance of the developed risk-scoring model, we conducted a sensitivity analysis across subpopulations (Table 4). Notably, model performance was highest in patients with breast cancer and lowest in patients with colon cancer. However, performance did not differ significantly among the different types of anti-neoplastic agents.

Discussion
We developed a claims-based risk-scoring model focusing on medication variables as a screening tool to identify older patients receiving anti-neoplastic therapy who are at high risk of ED visits. The characteristics included in the risk score are easily accessible to clinicians or pharmacists and can be detected in medical records or prescription data, facilitating the calculation of personalized risk estimates for ED visits.
In the external validation cohort, our model demonstrated acceptable and moderately strong prediction performance, with a C-statistic of 0.76. This performance is notably better than those reported in previous studies 10,11,15,16. For instance, Brooks et al. developed two models with C-statistics of 0.71 and 0.69, respectively 10,11, and Sutradhar et al.'s model achieved a C-statistic of 0.737 16. Moreover, Grant et al. 15 developed a logistic regression model to predict acute care use in cancer patients initiating systemic cancer therapy, which showed a C-statistic of 0.61 in a population-based cohort of 12,162 patients. Their model included three variables: a combination of cancer type and treatment regimen, age, and ED visits in the previous year. Our study, while not focusing on individual chemotherapy regimens, successfully developed a distinct predictive model for ED visits in older patients receiving anti-neoplastic therapy, placing special emphasis on modifiable variables like medication use.
In this study, the incidence of ED visits was relatively low, accounting for only 8.1% of the study population. This finding is in contrast with those of previous studies (ranging from 14.5 to 61.0%) 15,18,19. These differences could be attributed to the outcome definitions, patient populations, clinical settings, and databases used for analysis.
The strongest predictor of ED visits during anti-neoplastic therapy was an ED visit within the 3 months before anti-neoplastic therapy. This finding aligns with previous studies investigating ED visit prediction models for patients with cancer, where prior ED visits were identified as significant predictors 15,20–22. Our results suggest that patients with a history of ED visits may require additional precautions to prevent future ED visits.
Among comorbidities, atrial fibrillation and major bleeding were included in the final model, consistent with previous studies showing that these were among the main reasons for ED visits in older patients or patients with cancer 23,24.
Among the final predictors, medication variables encompassed a general PIM (megestrol), geriatric DDIs (use of three or more CNS-active drugs, regular use of opioids without laxatives), chemotherapeutic DDIs, and the use of 10 or more medications. This is in line with our previous study 5, which demonstrated that one or more PIMs, geriatric DDIs, chemotherapeutic DDIs, and 10 or more medications increased the risk of ED visits in older patients receiving anti-neoplastic therapy, although individual PIMs were not analyzed. Furthermore, lung cancer and anti-neoplastic therapy with cytotoxic agents, included as final predictors, were more likely to indicate inappropriate polypharmacy in older adults receiving anti-neoplastic therapy 5.
To the best of our knowledge, this study is the first to investigate risk predictors, including medication variables that may lead to ED visits, among older patients receiving anti-neoplastic therapy, including solid and hematologic malignant neoplasms, using data derived from a national claims database. Our study has several implications for clinical practice and policies. The risk-scoring model developed in our study can be used as a screening tool to identify high-risk older patients receiving anti-neoplastic therapy who are prone to ED visits. This enables the early identification of high-risk patients and targeted interventions, such as medication review and counseling, to prevent ED visits and improve patient outcomes. Furthermore, healthcare policymakers can utilize this model to develop interventions and allocate resources effectively to reduce the risk of ED visits. This could include targeted outreach and education programs for patients and healthcare providers and the development of specialized clinics or telemedicine services to provide timely and appropriate care to patients at high risk of ED visits. While our study establishes a foundation for a novel predictive model, we recognize potential barriers in integrating this model into existing healthcare systems. These may include technical challenges, as well as resistance from policymakers and healthcare professionals. Additionally, conducting a cost-effectiveness analysis is crucial to determine the feasibility and economic viability of implementing this model in clinical settings. Addressing these aspects is essential for the successful adoption and practical application of the model.
This study has several limitations. First, our model tends to overestimate the likelihood of ED visits, particularly at higher risk scores, which could be interpreted as an indicator for increased support and intervention. Therefore, while our model may tend to overestimate, this characteristic could be deemed acceptable when the model is used in tandem with clinical assessments. Second, the model's sensitivity of 0.67 means that approximately 33% of patients who visited the ED were not classified as high-risk. This, coupled with wide confidence intervals, indicates variability in performance that must be considered in clinical application. Third, the limitations inherent in the claims data precluded the consideration of factors such as social determinants of health, specific laboratory results, and medications available without a prescription, all crucial for understanding patient health outcomes, particularly in older populations that often have complex health needs. Fourth, our study did not account for the individual anti-neoplastic therapy regimens and their intensity due to limitations in obtaining accurate dosage and treatment schedules from claims data. Fifth, the external validation's effectiveness might be overestimated since it used the same administrative data source as our development cohort. Further validation in diverse real-world settings is necessary to confirm the model's robustness. Sixth, our study design and analysis are primarily geared towards predictive modeling, not towards establishing causation. This distinction is important for interpreting and applying our results in a clinical setting, as the identified associations do not necessarily imply causal relationships. Lastly, our study, constrained by the nature of claims data, did not differentiate between avoidable and unavoidable ED visits, focusing instead on overall acute care utilization. This limits our ability to identify preventable acute care use, underlining the need for future research in this area.
In conclusion, our study developed and validated a risk-score model for predicting ED visits among older patients receiving anti-neoplastic therapy, demonstrating moderate discrimination. While this model provides valuable insights and holds potential for identifying high-risk patients, it requires further validation to confirm its effectiveness across diverse healthcare settings and to comprehensively evaluate its impact on patient care.

Study design and database source
We conducted a retrospective cohort study utilizing the annual National Adult Patient Sample (APS) database sourced from the Korean Health Insurance Review and Assessment Service (HIRA) spanning the years 2016 to 2019. This dataset comprises 4,766,420 patients aged 65 years or older, representing 20% of the elderly population before 2016 and reducing to 10% after 2017. The HIRA-APS compilation, conducted annually, involved anonymizing personal information and utilizing stratified sampling. This probabilistic approach of sample extraction, based on sex and age, was employed to ensure data representativeness 25.
Within HIRA-APS, comprehensive data are available on demographic characteristics, healthcare utilization, prescriptions, and diagnoses. Notably, it provides detailed drug information covering both inpatient and outpatient prescriptions for all patients involved. This extensive data availability was a pivotal factor in selecting a retrospective cohort design for our study. This approach enabled us to utilize a large volume of existing data, crucial for evaluating the impact of medication-related factors on ED visits over multiple years. The retrospective design also helped reduce selection bias and improve generalizability by including a diverse patient population from a national database. Additionally, it provided an efficient way to analyze historical data, avoiding the complexities and time constraints associated with prospective data collection. This method aligns well with our goal to assess a wide array of medication-related factors in older patients on anti-neoplastic therapy and their impact on health outcomes.
This study was conducted in accordance with the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement 26. This study was approved by the Seoul National University Institutional Review Board (No. E2002/001-008). The requirement for informed consent from the participants was waived by the Institutional Review Board because this study used de-identified data. All methods were performed according to relevant guidelines and regulations.

Study population and outcome definition
To identify the study population, we selected patients who were prescribed anti-neoplastic agents and also had diagnostic codes for cancer. The anti-neoplastic agents were identified using their Anatomical Therapeutic Chemical (ATC) codes and classified as cytotoxic agents (ATC code L01, except for targeted therapy), targeted agents (ATC codes L01XC, L01XE), and endocrine agents (ATC code L02). The diagnostic codes for cancer were based on the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10), which included codes C00-C96 (cancer) and D37-D48 (neoplasms of uncertain or unknown behavior, polycythemia vera, and myelodysplastic syndromes).
Given that the HIRA-APS database is an annual database, we included only those who received anti-neoplastic therapy after July each year. This timing was selected to establish a baseline period that allows for confirmation of comorbidities, healthcare utilization patterns, and history of anti-neoplastic therapy. We also excluded patients hospitalized for more than 150 days from January to June, as prolonged hospitalization during this period could significantly affect our outcome. Similarly, patients hospitalized for more than 30 days from their entry date were excluded because the outcome could not occur. The development cohort was constructed from the 2016–2018 APS database, whereas the external validation cohort was constructed from the 2019 APS (Fig. 1). The entry date was defined as the first date on which the anti-neoplastic agent was prescribed after July. The outcome was an initial visit to the ED within 30 days of the entry date (Fig. S2).
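The cohort rules above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's actual extraction code. All function and parameter names are hypothetical. "After July" is taken to mean on or after 1 July, consistent with the January to June baseline period. The 30-day outcome window is assumed to include the entry date itself.

```python
from datetime import date, timedelta


def entry_date(prescription_dates, year):
    """First anti-neoplastic prescription date in the second half of the year.

    ASSUMPTION: 'after July' is read as on or after 1 July of the given year.
    """
    start = date(year, 7, 1)
    eligible = [d for d in prescription_dates if d.year == year and d >= start]
    return min(eligible) if eligible else None


def is_eligible(baseline_inpatient_days, post_entry_inpatient_days):
    """Apply the two hospitalization-based exclusion criteria."""
    return baseline_inpatient_days <= 150 and post_entry_inpatient_days <= 30


def has_outcome(entry, ed_visit_dates, window_days=30):
    """True if any ED visit falls within window_days of the entry date.

    ASSUMPTION: the window includes the entry date and the final day.
    """
    end = entry + timedelta(days=window_days)
    return any(entry <= d <= end for d in ed_visit_dates)


# Hypothetical patient from the 2019 sample.
prescriptions = [date(2019, 3, 1), date(2019, 8, 15), date(2019, 7, 10)]
entry = entry_date(prescriptions, 2019)
outcome = has_outcome(entry, [date(2019, 8, 9)])
```

In this hypothetical case the March prescription falls in the baseline period, so the entry date is 10 July and an ED visit on 9 August counts as the outcome.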

Predictor variables
Predictor variables were selected based on factors previously identified as being associated with ED visits in a national claims data study 5. That study assessed variables highlighted in earlier literature and the lists of potentially inappropriate medications for the elderly, as outlined in established guidelines 27,28. The collected data included (1) demographic information such as age, sex, and insurance type; (2) ED visit history; (3) Charlson comorbidity index score; (4) presence or history of comorbid conditions such as anemia, congestive heart failure, diabetes, major bleeding, and hypertension (Table S4); (5) cancer diagnosis; and (6) medication variables including the type of anti-neoplastic agents, initiation of anti-neoplastic therapy, number of chronic medications, general and disease-specific PIMs, and DDIs. General and disease-specific PIMs prescribed during anti-neoplastic therapy were evaluated based on the 2019 American Geriatrics Society Beers Criteria® 27 and the Screening Tool of Older People's Prescriptions (STOPP) criteria 28 for inappropriate use in the geriatric population. We screened for clinically significant DDIs, categorizing them as geriatric and chemotherapeutic DDIs. Geriatric DDIs are drug interactions that should be avoided in older adults according to the 2019 American Geriatrics Society Beers Criteria® and STOPP criteria, and chemotherapeutic DDIs are potentially significant interactions involving anti-neoplastic agents according to a reference database. For chemotherapeutic DDIs, those categorized as "D" or "X" by Lexicomp Online™ or those categorized as "major" or "contraindications" in severity by Micromedex® were considered clinically significant (Table S5). Medication exposure was assessed based on medications used for more than three days during the one week before the entry date (Fig. S2).
Since we utilized a claims database, it was assumed that the absence of a record corresponded to the absence of the corresponding condition.No missing data was observed for demographic factors such as age group, sex, and insurance type.

Risk-scoring model development, validation and statistical analysis
We summarized the baseline characteristics of the study population using descriptive statistics. To facilitate integration into a risk-scoring model, we transformed continuous variables into categorical variables. In the training cohort, we developed an ED visit risk-scoring model using the LASSO regression method. The choice of LASSO was driven by its efficiency in handling high-dimensional data, a feature of our dataset, and its capability to minimize variable collinearity 29,30. Unlike methods using L2 regularization (e.g., Ridge regression), LASSO employs L1 regularization, penalizing less significant variables by shrinking their coefficients to zero 30. This feature selection effectively simplifies the model and reduces overfitting risk. With a large number of potential predictors in our dataset, LASSO's dual function of variable selection and regularization was particularly advantageous, enhancing the model's predictive accuracy and interpretability by focusing on the most significant variables 29,30.
After labeling the outcome and generating the predictor variables, the risk-scoring model was built as follows. First, the frequencies of variables were evaluated, and those with a prevalence of less than 1% were excluded. Second, we assessed multicollinearity between the variables using the variance inflation factor. Third, we performed LASSO regression on the training dataset to select predictor variables and to fit the model. The optimal penalty parameter λ for maximizing model performance was determined using tenfold cross-validation. Furthermore, we eliminated variables from the final list if their regression coefficient values, multiplied by 100, were below 1 to enhance the usability of the prediction scores. Subsequently, we obtained the regression coefficients for each variable from the final LASSO regression. A risk-score model was developed by assigning a risk score to each variable, multiplying its β coefficient by 100, and rounding it to the nearest integer. Individual risk was determined by summing the weighted scores of the risk factors present in each patient.
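The conversion from coefficients to points described above can be sketched in Python. The coefficient values and variable names below are hypothetical and serve only to illustrate the arithmetic. The fitted coefficients themselves are reported in Table 2.

```python
def coefficients_to_points(coefficients):
    """Convert regression coefficients to integer risk points.

    Variables whose coefficient multiplied by 100 is below 1 are dropped;
    the rest are rounded to the nearest integer.
    """
    points = {}
    for name, beta in coefficients.items():
        scaled = beta * 100
        if scaled < 1:
            continue  # too small to contribute a usable score
        # Note: Python's round() resolves exact .5 ties to the even integer.
        points[name] = round(scaled)
    return points


# HYPOTHETICAL coefficients, not the values fitted in the study.
example_coefficients = {
    "prior_ed_visit": 0.1312,
    "lung_cancer": 0.0287,
    "rare_factor": 0.004,
    "shrunk_to_zero": 0.0,
}
example_points = coefficients_to_points(example_coefficients)
```

With these illustrative inputs, the first two variables receive 13 and 3 points, while the last two are removed because their scaled coefficients fall below 1.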
In the external validation cohort, we assessed the model performance in terms of discrimination and calibration 31. Discrimination was assessed using AUROC. Calibration, which compares the predicted and observed risks, was assessed using a calibration plot. We categorized patients into low- and high-risk groups for clinical decision-making based on their risk score distribution and predicted probability in the development cohort. The Youden index, which balances sensitivity and specificity, was used to determine the cut-off point for the high-risk group 32. Sensitivity and specificity were evaluated for these cut-off values. SAS version 9.4 (SAS Institute, Cary, NC, USA) was used for data management and descriptive statistics. LASSO regression was performed using the 'glmnet' package in the R statistical software (R Foundation for Statistical Computing, Vienna, Austria).
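As an illustration of the cut-off selection, the following Python sketch picks the threshold that maximizes the Youden index J = sensitivity + specificity − 1. The toy scores and outcomes are invented for the example, and treating a score equal to the cut-off as high-risk is an assumption, since the text does not state whether the boundary is inclusive.

```python
def youden_cutoff(scores, outcomes):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    ASSUMPTION: patients with score >= cutoff are classified as high-risk.
    outcomes holds 1 for patients with the event and 0 otherwise.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("Both outcome classes are required.")

    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y == 1)
        true_neg = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if best is None or j > best[1]:
            best = (cutoff, j)  # ties keep the lowest cutoff
    return best


# Toy data for illustration only; not study data.
toy_scores = [0, 2, 3, 5, 7, 8, 13, 16]
toy_outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
best_cutoff, best_j = youden_cutoff(toy_scores, toy_outcomes)
```

On the toy data the selected cut-off is 7, where all three events are captured and four of the five non-events fall below the threshold.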

Post-hoc subpopulation sensitivity analysis
The performance of the risk-scoring model was assessed in subgroups of patients with diverse cancer diagnoses and those who were treated with different types of anti-neoplastic agents: cytotoxic agent-based, targeted agent-based, and endocrine agent-only.

Data availability
The dataset supporting the conclusions of this article is available from the Korea National Health Insurance Service (KNHIS) Data Sharing Service homepage (https://nhiss.nhis.or.kr/bd/ab/bdaba001cv.do) but restrictions

Figure 1. Flow chart depicting the methodology of patient selection.

Table 2. The risk scores and beta coefficients in the prediction model for emergency department visits in older patients receiving anti-neoplastic therapy. ED emergency department, CNS central nervous system.

Table 3. Accuracy of the risk-scoring model using the cut-off values. ED emergency department.

Table 4. Performance of the risk-scoring model among different subpopulations. AUROC area under the receiver operating characteristic curve, CI confidence interval.