Introduction

Cancer pain is common, particularly during the advanced stages of the disease, when its prevalence is estimated to be more than 40%. It contributes to the poor physical and emotional state of patients1, 2. Prolonged survival, following advances in the diagnosis and treatment of cancer, has increased the number of patients experiencing persistent pain3, 4. This trend has also been documented in hematology patients at the time of diagnosis, during therapy, and in the last month of life5, 6. According to the GLOBOCAN 2020 estimates, the incidence of cancer is increasing and will exceed 19 million cases7. Thus, cancer-related pain is likely to be a major issue in global healthcare systems.

Acute exacerbation of cancer pain is a challenging clinical problem that negatively impacts the patient’s daily life8,9,10,11. Some guidelines suggest that three to four cancer pain exacerbation (CPE) episodes per day are acceptable12. However, even less frequent CPE could affect patients’ quality of daily life. In addition, the interval between the onset of pain and the onset of drug effect could worsen the patients’ quality of life. In particular, hospitalized patients can receive short-acting opioids only after informing the nurse, who must then inform the doctor for a decision. The patients are likely to be left in severe pain during this processing time. From this point of view, accurate prediction of CPE could reduce the frequency of CPE and the delay before treatment, and thereby improve patients’ daily well-being.

Recently, time series models based on deep learning algorithms have gained popularity owing to their remarkable performance13, 14. Cancer pain may reflect the status of cancer invasiveness, exposure to cancer treatment, pharmacodynamics of the opioids, patient’s lifestyle, and the health care system of the hospital, including the management of doctors and nurses1. Therefore, we hypothesized that the exacerbation of cancer pain recurs according to the patient’s previous patterns and could thus be predictable. In this study, we aimed to investigate the clinical relevance of deep learning models that predict the time of breakthrough pain onset in cancer patients.

Patients and methods

Patients and data collection

This single-center retrospective study aimed to evaluate the efficacy of a deep learning model in predicting the onset of CPE in cancer patients. This study was conducted in accordance with the Declaration of Helsinki and was approved by the institutional review board (IRB) at the Samsung Medical Center (approval number 2020–09-073). The Samsung Medical Center IRB waived the need for informed consent because of the retrospective nature of this study. All pain records were retrospectively collected for 34,304 patients who were admitted to the department of hematology and oncology of the Samsung Medical Center in Korea between July 2016 and February 2020. Clinical data were obtained from the medical records using a de-identified clinical data warehouse15. Of all the patients, we excluded 2,697 patients who underwent surgery during hospitalization and 28,173 patients with fewer than 20 non-zero numerical rating scale (NRS) score records. The selected 3,431 patients contributed pain log data for 4,870 admissions, which were split 80%/20% into training and test sets (2,745/686 patients with 3,896/974 admissions) (Fig. 1).

Figure 1

The study cohort.

Pre-processing

Nurse rounding occurs every day at 5:00 h, 13:00 h, and 21:00 h. During nurse rounding, nurses usually record the patient’s self-reported pain scores using the NRS, an ordinal scale ranging from 0 (no pain) to 10 (severe and unbearable pain)16. Patients could also ask the nurse for management at other times in the case of sudden pain; in that case, the nurse records the additional pain scores and notifies the doctors. As there was no quantitative consensus on the definition of CPE, we defined CPE as an NRS score of 4, a commonly used indication for opioid intervention in cancer pain management guidelines2. Pre-processing consists of the binning and transformation steps described below. Figure 2 illustrates the entire process.

Figure 2

Data processing, modeling, and evaluation schemes.

Binning

Originally, the pain scores were recorded with timestamps at a resolution of minutes. However, considering the highly sparse signal of the data, we binned the entire NRS record into time-bins of an arbitrary length \(\tau\). If multiple records fell within the same bin, the highest NRS score was taken as the record for that period.
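The binning step can be sketched as follows. This is a minimal illustration rather than the study's actual code: the function name, the `(timestamp, score)` record format, and the choice of counting bins from a given `start` time are all assumptions made for the example.

```python
from datetime import datetime


def bin_nrs_records(records, start, tau_hours):
    """Aggregate (timestamp, NRS) records into tau-hour bins.

    Returns a dict mapping bin index (counted from `start`) to the highest
    NRS score recorded in that bin. Bins without any record are absent.
    """
    bin_seconds = tau_hours * 3600
    binned = {}
    for timestamp, score in records:
        index = int((timestamp - start).total_seconds() // bin_seconds)
        # Keep the highest score when several records share a bin.
        binned[index] = max(score, binned.get(index, 0))
    return binned


# Hypothetical records for one patient (illustrative values only).
start = datetime(2020, 1, 1, 0, 0)
records = [
    (datetime(2020, 1, 1, 5, 0), 2),
    (datetime(2020, 1, 1, 5, 40), 6),
    (datetime(2020, 1, 1, 13, 0), 3),
]
hourly = bin_nrs_records(records, start, tau_hours=1)    # {5: 6, 13: 3}
four_hourly = bin_nrs_records(records, start, tau_hours=4)  # {1: 6, 3: 3}
```

The two records at 5:00 h and 5:40 h fall into the same 1-h bin, so only the higher score (6) is kept.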

Transformation

CPE could differ among patients (inter-patient) and could also differ according to the specific situation of each patient (intra-patient). The intra-patient pain pattern refers to a patient's unique pain pattern, which depends on the patient’s medical status. The inter-patient pain pattern refers to the pattern of medical practice, including rounding time and clinician’s management. All pain records were processed in 24-h increments to reflect these characteristics. Zero padding was used to set the start and end of all pain records to 0:00 h. For time points with no NRS observation in the episode, the patient was considered pain-free and the value was imputed as zero. Accordingly, pain records for \(n\) days with \(\tau\)-length time-bins were transformed from a \(1\times ((24/\tau )\times n)\) vector to a \((24/\tau )\times n\) matrix.
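A minimal sketch of this transformation is given below. It assumes that the binned record is available as a mapping from bin index (counted from 0:00 h of the first day) to NRS score, and that matrix rows correspond to time-of-day bins and columns to days; the function name and this orientation are illustrative assumptions, not details confirmed by the text.

```python
def to_daily_matrix(binned, n_days, tau_hours):
    """Turn sparse binned NRS scores into a (24 / tau) x n_days matrix."""
    bins_per_day = 24 // tau_hours
    # Zero padding and zero imputation: every bin without a record stays 0.
    vector = [0] * (bins_per_day * n_days)
    for index, score in binned.items():
        vector[index] = score
    # Assumed layout: row = time-of-day bin, column = day.
    return [
        [vector[day * bins_per_day + slot] for day in range(n_days)]
        for slot in range(bins_per_day)
    ]


# Two days of 6-h bins (4 bins per day); values are illustrative only.
binned = {1: 6, 3: 3, 7: 5}
matrix = to_daily_matrix(binned, n_days=2, tau_hours=6)
# matrix == [[0, 0], [6, 0], [0, 0], [3, 5]]
```

Each row then holds the scores of the same time interval on consecutive days, which is the view that exposes the daily pattern.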

Modeling

We explored six deep learning-based time series forecasting architectures for the prediction of CPE across various input lengths and time-bin sizes \((\tau )\) of pain records. The time-bin size \((\tau )\) was investigated by grid search over the divisors of 24, which suit the transformation, excluding 24 itself in view of clinical application (\(\tau \in \{1, 2, 3, 4, 6, 8, 12\}\) h). The comparison covered the recurrent neural network (RNN)17, long short-term memory (LSTM)18, gated recurrent unit (GRU)19, bidirectional long short-term memory (Bi-LSTM)20, a hybrid of the convolutional neural network and long short-term memory (CNN-LSTM)21, and the transformer22. Each CPE prediction model was implemented according to its respective basic recipe, and the models were constructed with a non-autoregressive prediction structure in which three stacked basic blocks were followed by a dense layer (Fig. 2). To make a fair comparison, the numbers of parameters of the models were kept as close as possible (differences within 3,000 parameters). Each model was trained for 300 epochs with a batch size of 100. A balanced cross-entropy loss was used, and the model was optimized by stochastic weight averaging23 with an initial learning rate of 1e-4, a start averaging of 5, and an average period of 1. Our models were implemented in Python 3.7 with TensorFlow 2.4.1, and the experiments were run on an NVIDIA GeForce RTX 2080.
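The search space for the time-bin size can be enumerated with a short sketch such as the one below. The helper names are hypothetical, and the 120-h input length is used only as an example; the sketch merely shows how each \(\tau\) determines the number of input steps and the number of labels to predict for the next 24 h.

```python
def candidate_bin_sizes(hours_per_day=24):
    """Divisors of hours_per_day, excluding hours_per_day itself."""
    return [tau for tau in range(1, hours_per_day) if hours_per_day % tau == 0]


def sequence_lengths(input_hours, tau_hours):
    """Number of input bins and number of output labels (next 24 h)."""
    return input_hours // tau_hours, 24 // tau_hours


bin_sizes = candidate_bin_sizes()
# bin_sizes == [1, 2, 3, 4, 6, 8, 12]

# Input/output lengths for an example input window of 120 h.
shapes = {tau: sequence_lengths(120, tau) for tau in bin_sizes}
# e.g. shapes[1] == (120, 24) and shapes[12] == (10, 2)
```

With larger bins the forecasting problem becomes shorter and coarser: a 12-h bin leaves only two labels to predict per day.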

Evaluation

The prediction of CPE can be regarded as a set of independent binary classifications (0s and 1s) over the next 24 h (Fig. 2). The number of positive labels depends on the time-bin size \((\tau )\), which results in class imbalance with a dominant class. Therefore, we evaluated the performance of the model based on the Matthews correlation coefficient (MCC), a reliable statistical rate that produces a high score only if the predictions yield good results in all four categories of the confusion matrix (true positive [TP], false positive [FP], true negative [TN], and false negative [FN]), proportionally to the sizes of the positive and negative classes in the dataset24. The range of MCC values is [−1, 1], where −1 indicates total disagreement between the predictions and the true labels, and +1 indicates perfect prediction. In addition to MCC, the performance of the model was also evaluated with the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
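For reference, the MCC can be computed from the four confusion-matrix counts as \((TP \cdot TN - FP \cdot FN)/\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}\). The sketch below is a plain-Python illustration with hypothetical function names; returning 0 when the denominator is zero is a common convention and is assumed here, since the text does not specify it.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels given as 0/1 sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient in [-1, 1]."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator


# Illustrative labels for six time bins of one forecast day.
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
score = mcc(y_true, y_pred)  # TP=1, FP=1, TN=3, FN=1 -> 2 / 8 = 0.25
```

Unlike accuracy, which would be 4/6 here, the MCC stays low because only one of the two positive bins was found and one alarm was false.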

Ethics approval and patient consent statement

The ethical review board of Samsung Medical Center approved the study protocol. As per the regulations in Korea, the review board waived the requirement for informed consent for this study as it was a retrospective analysis.

Results

Baseline characteristics of patients

The baseline characteristics of the 3,431 patients and 4,870 admissions are shown in Supplementary Table 1. The median age was 58 years (range, 15–89), and 2,047 (59.7%) were male. The most common type of cancer was lung cancer (n = 745, 21.7%), followed by lymphoma (n = 491, 14.3%). The median hospitalization duration was 14.96 days (range, 5.46–195.40), and patients with aplastic anemia had the longest hospitalization (29.50 days [range, 7.79–93.10]). The median frequencies of NRS records and CPE per day were 2.74 (range, 0.17–12.22) and 1.01 (range, 0–1.31), respectively. Patients with head and neck cancer presented CPE most frequently (1.57 per day [range, 0–7.85]).

Characteristics of pain records

Pain scores were mostly recorded at 8:00–10:00 h and 16:00–18:00 h, which were the regular rounding times at the ward (Fig. 3A). Among the 1-h time-binned records (n = 1,311,240), 78,376 (6.0%) were CPE, whereas CPE was more frequent with larger time bins, accounting for 44.5% (n = 48,663) of the 12-h time-binned records (n = 109,270) (Supplementary Table 2). Figure 3B describes the correlation between the daily pain records of the forecast day and those of previous days. The Pearson correlation coefficients increased as the records approached the forecast period and as the time-bin size increased. With 1-h time-binned records, the NRS records in the interval between 96 and 120 h before the forecast day showed a coefficient of 0.08. The coefficient increased closer to the forecast date, reaching 0.20 for records made 24 h before the forecast date. The coefficients also increased with larger time bins, reaching 0.53 for the 12-h time-binned NRS records of the previous day. The correlation analysis of time-matched NRS scores between the forecast day and the day before it indicated that the intervals 11:00–12:00 h (coefficient: 0.17) and 18:00–19:00 h (coefficient: 0.18) showed relatively high correlations compared with the others (Fig. 3C).
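The kind of lagged correlation reported here can be illustrated with the sketch below. It works on a single patient's daily matrix (rows as time-of-day bins, columns as days, the last column being the forecast day); how the study pooled records across patients is not described in this section, so this layout and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one series is constant
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(daily_matrix, lag_days):
    """Correlate the forecast day (last column) with the day lag_days earlier."""
    forecast = [row[-1] for row in daily_matrix]
    previous = [row[-1 - lag_days] for row in daily_matrix]
    return pearson(forecast, previous)


# Illustrative 6-h binned matrix for two days: the second day repeats the
# first day's pattern at twice the intensity, so the correlation is 1.
daily_matrix = [[0, 0], [2, 4], [0, 0], [3, 6]]
r = lagged_correlation(daily_matrix, lag_days=1)
```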

Figure 3

Characteristics of pain records. (A) Heatmap of the daily pain pattern of all patients. (B) Correlation analysis of the NRS scores between the forecast date and prior records. We used the Pearson correlation coefficient to evaluate the correlation between pain records according to the days. (C) Correlation analysis of the time-matched NRS scores between the forecast date and the previous day.

Comparison of performance

Table 1 presents the MCC values for each deep learning model across the range of input lengths and time-bins. In all settings, there was no significant difference in CPE prediction performance according to the structural variation of each model. The average difference between the highest- and lowest-performing models for each experimental condition was 0.0182, and the standard deviation was 0.0087. Nevertheless, the LSTM-based model performed well: among the 21 experimental settings, it achieved the best performance nine times and the second-best performance eight times. The model based on GRU, a simplified version of LSTM, also performed well, with eight best and seven second-best performances. Through this exploratory investigation, we selected the LSTM block as the backbone structure of the representative CPE prediction model. Meanwhile, all deep learning-based models consistently achieved a higher MCC than the base model, which makes predictions from the average scores of the same time intervals on the previous days in the window period.
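One possible reading of the base model is sketched below: for each time-of-day bin, the scores of the previous days in the window are averaged, and CPE is predicted when the average reaches the CPE cut-off. The thresholding of the average at 4, the function name, and the matrix layout (rows as time-of-day bins, columns as previous days) are assumptions made for illustration, since the text does not spell out how the averages are converted to predictions.

```python
def baseline_predict(daily_matrix, threshold=4):
    """Predict CPE per time-of-day bin from the mean score of previous days."""
    predictions = []
    for slot_scores in daily_matrix:
        mean_score = sum(slot_scores) / len(slot_scores)
        # Assumed decision rule: flag CPE when the mean reaches the cut-off.
        predictions.append(1 if mean_score >= threshold else 0)
    return predictions


# Illustrative window: 6-h bins (4 rows) over two previous days (2 columns).
window = [[0, 0], [6, 4], [0, 0], [3, 5]]
predicted = baseline_predict(window)
# Row means are 0, 5, 0 and 4, so predicted == [0, 1, 0, 1]
```

Such a rule needs no training, which makes it a natural reference point for judging whether the learned models add value.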

Table 1 Performance comparisons for various models according to input and time-bin length.

To investigate the efficacy of the transformation, we evaluated the LSTM-based model with the NRS records before transformation as input (Supplementary Table 3). In the setting of 24 h input and 1 h time bin, the LSTM-based model using transformed data showed an MCC of 0.1721, which was better than its performance with the original NRS records as input (MCC: 0.1686). The model performance was consistently improved after transformation across the various input and time-bin lengths. In contrast, the performance was not significantly improved with a larger input length. In particular, the model with 120 h of NRS records as input did not perform significantly better than the model with 72 h of NRS records across the various time-bin lengths. Instead, the performance was greatly improved with larger time bins. In this study, the best performance was an MCC of 0.4927, obtained by the LSTM-based model with an input length of 120 h and a 12-h time bin, with an AUROC of 0.8080 and an AUPRC of 0.7340 (Fig. 4). This model performed best in patients with aplastic anemia (MCC: 0.663), followed by those with head and neck cancer (MCC: 0.594) (Supplementary Table 4).

Figure 4

AUROC and AUPRC of LSTM-based model with different sizes of time binning and input lengths. \(\tau\): time bin size.

Cases of prediction for onset timing of breakthrough pain

Figure 5A depicts the case of a 76-year-old male patient with bladder cancer and retroperitoneal lymph node metastasis who was admitted for supportive care. The LSTM-based model with 1-h time binning and 5 days of input data performed well for this patient during hospitalization (MCC: 0.3452, AUROC: 0.9051, AUPRC: 0.2067). He complained of back pain, was administered a 12.5 mcg/h fentanyl patch, and received 5 mg of morphine (intravenous) when he presented breakthrough pain. On the tenth day, the patient complained of back pain at 10:00 h and 22:00 h, and the LSTM-based model predicted the onset time of breakthrough pain between 11:00–13:00 h and 18:00–23:00 h. The predictions remained consistently close to the patient’s actual complaints after the tenth day of hospitalization. In addition, the LSTM-based model was tested in patients with hematologic malignancy. Figure 5B shows the case of a 53-year-old female patient with aplastic anemia for whom the prediction was relatively accurate during hospitalization (MCC: 0.1843, AUROC: 0.2015, and AUPRC: 0.1767). She was admitted for allogeneic peripheral blood stem cell transplant (allo-PBSCT). The patient received conditioning chemotherapy on the first hospital day, followed by allo-PBSCT 7 days later. On the 14th hospital day, the patient complained of fibromyalgia at 13:00 h and 21:00 h, and the LSTM-based model predicted onset times at 11:00 h and 20:00 h.

Figure 5

Representative cases for predicting the onset time of cancer pain exacerbation using serial pain records derived from the patients. (A) The case of a patient with bladder cancer who complained of back pain. Blue dots and lines indicate NRS pain scores during the window period, and yellow dots and lines are the predictions of the presence of CPE, derived from the LSTM-based model with a 5-day input length and 1-h time-binned data. The green dots and lines indicate the real-world records of the presence of CPE over time. Yellow dots and lines in the black box show the predicted probabilities of CPE on the forecast day. (B) The case of a patient with aplastic anemia who underwent allo-PBSCT.

Supplementary Figure 1A shows a case in which the model captured breakthrough pain incorrectly. The patient, who had renal cell carcinoma with bone and pleural metastasis, was admitted for management of pleural effusion and back pain. The LSTM-based model incorrectly predicted CPE during hospitalization (MCC: 0.0417, AUROC: 0.5843, and AUPRC: 0.1767). He complained of back pain frequently on the fifth and sixth hospital days, the day after the second thoracentesis. On the seventh hospital day, the model predicted CPE in a pattern similar to that of the sixth hospital day. However, this patient complained of CPE less often after doctors escalated the dose of long-acting oxycodone from 5 mg to 10 mg on the sixth hospital day. Supplementary Figure 1B depicts a patient with stomach cancer with peritoneal seeding. This patient complained of severe abdominal pain and was hospitalized for management of afferent loop syndrome. He underwent percutaneous transhepatic biliary drainage (PTBD) and L-tube insertion in the ER. However, the PTBD tube was removed accidentally on the fourth hospital day. At first, the pain related to the PTBD tube was alleviated. However, the afferent loop syndrome was aggravated on the sixth hospital day, and the patient complained of abdominal pain more frequently on that day. The LSTM-based model underestimated the frequency of CPE compared with the real-world pain records. The performance of the LSTM-based model during hospitalization was an MCC of 0.0467, an AUROC of 0.4691, and an AUPRC of 0.1379.

Discussion

In this study, we explored the feasibility of deep learning methods for predicting the onset time of CPE in hospitalized cancer patients. The NRS pain records showed circadian patterns and correlated with the NRS pain patterns of the previous days. In particular, the correlation between NRS scores increased with proximity to the forecast date and with larger time bins. The LSTM-based model performed well, achieving the best performance in the experiments with a 24 h input length and 1-h time bin (MCC: 0.1721). The performance improved in the experiments with longer input data and larger bin sizes, and the best performance was obtained with a 120 h input length and 12 h bin length (MCC: 0.4927). Considering that this performance was significantly better than that of the base model (Table 1), our study showed that NRS pain could be predicted using deep learning-based models.

The NRS pain records showed circadian pain patterns and were mostly recorded during 8:00–10:00 h and 16:00–18:00 h, near the rounding time (Fig. 3A). All recorded pain episodes were pre-processed in 24-h increments so that the data would reflect this circadian pattern. Zero padding was performed to set the start and end of all CPE episodes to 0:00 h. After this pre-processing, the performance was significantly improved (Supplementary Table 3). Meanwhile, the input record obtained one day before the prediction had the highest correlation and similarity with the forecast period compared with the records of other days (Fig. 3B). This pain pattern may be affected by pain management in the hospital, including increases in the dose or frequency of opioids and other interventions. This data characteristic could be the reason why unidirectional RNN models, including LSTM and GRU, showed competitive performance compared with the others, although unidirectional RNN models are well known to be vulnerable to long-term dependence. Meanwhile, as described in Supplementary Figure 1B, pain patterns could be related to various factors, including acute events and treatment patterns. As the current models could not predict pain patterns reflecting such acute changes, time series data reflecting these factors could make the model perform better.

Pérez-Hernández et al. investigated breakthrough pain characteristics and patterns using the Alberta breakthrough pain assessment tool25. That study showed that 42.6% of patients could correctly predict the occurrence of breakthrough pain. Another 20.5% of patients could accurately estimate breakthrough pain on certain occasions. This is mostly because CPE is associated with the posture or movement of patients. In an earlier study, 81.5% of the patients answered that the time from onset to peak intensity of breakthrough pain was less than 30 min. Considering that the duration of breakthrough pain is about twice the onset-to-peak time, we first tried to build a predictive model using serial pain log data divided into 1-h intervals as input. However, as we imputed missing values as zero, the data were highly sparse after the 1-h binning (CPE: 6.0% of the total dataset). Therefore, we performed ablation studies with larger binning intervals of up to 12 h, which was the coarsest resolution considered clinically usable, and the sparsity of CPE was reduced (CPE: 44.5% of the total dataset) (Supplementary Table 2). Meanwhile, the length of the input record also affected the model performance. However, its effect was smaller than that of the time-bin length. It is necessary to take these data characteristics into account and optimize them according to the clinical setting.

Our study has certain limitations. The most prominent limitation is its single-center design, which might limit the generalizability of our data. In addition, we used NRS records divided by hours and simply defined breakthrough pain as any time interval containing a record with an NRS above 4. By following this protocol, we excluded many other characteristics of cancer pain, which might limit the interpretation. Additionally, our study investigated univariable models with simple structures, as our goal was to explore the feasibility of predicting pain patterns. Nevertheless, our models showed adequate performance even though there were few input data types. Subsequent validation studies that include detailed data and more sophisticated model structures would make the model perform better and be more applicable.

In conclusion, our study showed that cancer pain could be predicted using deep learning models. Although our exploratory study has limitations, further research could improve the model performance, and validation studies could make our model applicable in real-world practice.