Introduction

Worldwide, 15–30% of patients admitted to an intensive care unit (ICU) die in hospital [1]. A large proportion of these deaths are preceded by withholding or withdrawing life-sustaining treatment (WLST) that is no longer considered to be of benefit for the dying patient. Globally, withholding treatment is the most common form of limiting care, albeit with large regional variability [2]. However, the proportion of withdrawal has recently increased, and in some regions it is already the dominant form of limitation of life-sustaining treatment (LST) [3]. In addition to circulatory support, all non-comfort medications, nutrition, dialysis and mechanical ventilation (MV) are usually also withdrawn. Procedural aspects of withdrawal of MV are understudied, but its most common forms are either terminal weaning (a gradual reduction of ventilatory support, such as reducing the FiO2, positive end-expiratory pressure or minute ventilation, or switching the patient to spontaneous ventilation, whilst the endotracheal tube is left in place) or immediate removal of the endotracheal tube (further referred to as terminal extubation, TE) [4].

TE in patients nearing the end of life in intensive care can prevent gagging and facial distortion, thereby increasing the perceived comfort and dignity of dying. On the other hand, it may lead to grunting and/or gasping due to the loss of the airway [5], and there are concerns that it may hasten death. Current European guidelines recommend an individualized approach to ensure patient comfort [6], even though the impact of extubation on patients, their families and the psychological well-being of healthcare providers may be difficult to predict in each individual case [5]. In practice, the decision on technical aspects of terminal care is often based on local practices, following consensus among staff and relatives. The extent to which these measures are tailored to patients’ needs may also differ significantly among centres. It is unknown whether specific patient characteristics influence (consciously or subconsciously) healthcare providers’ decision to perform TE. These characteristics are likely to have a complex non-linear relationship that can be explored using machine learning methods.

We sought to determine: (1) factors associated with the decision to perform TE as part of WLST; and (2) whether TE influences the time to death after these factors have been taken into consideration.

Methods

The reporting of this study is in accordance with the Strengthening the Reporting of Observational studies in Epidemiology (STROBE) statement [7].

Study design

We performed a post hoc secondary analysis of data collected as part of a large, multi-national, prospective, observational study on the process of WLST in dying patients (the Death Prediction and Physiology after Removal of Therapy, or DePPaRT study), which collected baseline data and ECG, pulse oximeter and arterial waveforms from WLST until 30 min after determination of death. Patients in participating intensive care units in Canada, the Czech Republic, and the Netherlands were enrolled between May 2014 and December 2018. The DePPaRT study protocol, including secondary data analyses, was approved by the relevant institutional review board or ethics committee at each site (see Supplementary Table S8), and all patients’ surrogate decision makers provided written prospective informed consent for participation in the study. The protocol is compliant with the Declaration of Helsinki, and data storage and analyses were performed in accordance with national legislation and the General Data Protection Regulation of the European Union.

Patients

A subgroup of DePPaRT study patients (age ≥ 18 years who died after WLST in an intensive care unit) who had their lower airway secured with an orotracheal or tracheostomy cannula were included in the analysis. As per the DePPaRT study protocol, patients were excluded if they had a neurological determination of death, a functional cardiac pacemaker, or no arterial catheter at the time of WLST. Patients who died in the ICU due to unsuccessful cardiopulmonary resuscitation were also excluded from the study.

Data collection

For this secondary analysis, a subset of available parameters (features) was selected for the machine learning model based on a literature review and the consensus of three experts (PW, MS, FD). We also decided, a priori, to exclude parameters with more than 20% missing values (Fig. 1).

The following data were included in the data set:

The dependent variable was TE (yes vs. no). Independent variables used for modelling (N = 28) included:

Patients’ characteristics at baseline: study centre (binarized at 50% of terminal extubation frequency; group 1 ≤ 50%, group 2 > 50%), age, sex, body mass index (BMI), acute physiology and chronic health evaluation II score at admission (APACHE) [8], chronic pre-existing medical condition (PreCond, yes/no), cardiac arrest with resuscitation before study inclusion (CPR, yes/no), admission diagnosis (3 most common categories one-hot encoded into dummy features: neurologic disorder (ADM_neuro), respiratory failure (ADM_resp), sepsis (ADM_sepsis); yes/no).

Patients’ characteristics at WLST: Glasgow coma scale (GCS), pupillary reflex (PUP, present vs. absent), cough (present vs. absent), mechanical ventilation mode (MV mode, controlled vs. supported), respiratory rate (RR, breaths per minute), inspiratory fraction of oxygen (FiO2, %), positive end-expiratory pressure (PEEP, cmH2O), peak inspiratory pressure (PIP, cmH2O), intubation route (route, orotracheal intubation vs. tracheostomy), mean arterial pressure (MAP, mmHg), heart rate (HR, bpm), lactate (mmol/L), arterial pH, arterial partial pressure of oxygen (PaO2, mmHg), arterial partial pressure of carbon dioxide (PaCO2, mmHg), total ranked circulatory drugs dose (circ_total_ranked), total ranked sedation drugs dose (sedation_total_ranked) and total ranked analgesics drugs dose at WLST (analgetics_total_ranked). Attempted donation after circulatory death was also recorded (DCD, yes/no).
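The two derived baseline features described above (the binarized study centre and the one-hot encoded admission diagnosis) can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the function names, record layout and toy records are hypothetical, and the tie-breaking among equally frequent diagnoses is an assumption.

```python
from collections import Counter, defaultdict

def centre_groups(records):
    """Assign each centre to group 1 (TE frequency <= 50%) or group 2 (> 50%)."""
    counts = defaultdict(lambda: [0, 0])  # centre -> [number with TE, total]
    for r in records:
        counts[r["centre"]][0] += r["te"]
        counts[r["centre"]][1] += 1
    return {c: 1 if te / n <= 0.5 else 2 for c, (te, n) in counts.items()}

def one_hot_top_diagnoses(records, top_n=3):
    """One-hot encode the top_n most common admission diagnoses (yes = 1, no = 0)."""
    top = [d for d, _ in Counter(r["adm"] for r in records).most_common(top_n)]
    return [{f"ADM_{d}": int(r["adm"] == d) for d in top} for r in records]

# Hypothetical toy records, not study data.
records = [
    {"centre": "A", "te": 1, "adm": "neuro"},
    {"centre": "A", "te": 1, "adm": "neuro"},
    {"centre": "A", "te": 0, "adm": "resp"},
    {"centre": "B", "te": 0, "adm": "sepsis"},
    {"centre": "B", "te": 1, "adm": "trauma"},
    {"centre": "B", "te": 0, "adm": "neuro"},
    {"centre": "B", "te": 0, "adm": "resp"},
    {"centre": "B", "te": 0, "adm": "sepsis"},
]
groups = centre_groups(records)
dummies = one_hot_top_diagnoses(records)
```

A patient whose diagnosis is outside the three most common categories receives zeros in all three dummy features, which is how the remaining diagnoses are represented implicitly.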

Statistical analysis

All analyses and data processing were performed in R v 4.2.1 [9] and RStudio v 2022.02.3 [10]. Exploratory data analysis was performed for all parameters. Univariate analysis was done using the Wilcoxon rank sum test for continuous features and Pearson’s Chi-squared test for categorical features.

Data pre-processing

Features were kept in their original form to simplify the interpretation of the eventual nonlinear relationship between a feature and the probability of TE.

Normalization was performed only for drugs so that we could combine doses of different drugs in the same drug group (circulatory, sedation and analgesics drugs). To normalize doses, we used a rank doses method similar to that described by Trace et al. [11]. Patients were ranked for each drug from lowest to highest dose administered in the hour prior to withdrawal. The rank was then divided by the total number of patients who received that drug. This gave the relative rank dose for each drug. The relative rank doses of the drugs in each drug group were then summed. This gave the total rank dose for each drug group. The total ranked dose of circulatory drugs was calculated as the sum of ranked doses of norepinephrine, epinephrine, vasopressin, and phenylephrine. Similarly, we calculated the total ranked sedation drug dose (midazolam and propofol) and the total ranked analgesics drugs dose (morphine, fentanyl, and hydromorphone).
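The rank dose calculation can be illustrated with a short sketch. It is written in Python for illustration only (the study used R), and it rests on two assumptions not stated in the text: patients who did not receive a drug contribute a relative rank dose of 0, and tied doses share their average rank. The function names and the toy doses are hypothetical.

```python
from collections import defaultdict

def relative_rank_doses(doses):
    """Map patient -> rank / (number of patients who received the drug).

    doses: dict patient -> dose in the hour before withdrawal (0 = not given).
    Assumptions: non-recipients get 0; tied doses share their average rank.
    """
    received = sorted(((p, d) for p, d in doses.items() if d > 0),
                      key=lambda pd: pd[1])
    n = len(received)
    result = {p: 0.0 for p in doses}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and received[j + 1][1] == received[i][1]:
            j += 1  # extend over a run of tied doses
        avg_rank = ((i + 1) + (j + 1)) / 2  # ranks are 1-based
        for k in range(i, j + 1):
            result[received[k][0]] = avg_rank / n
        i = j + 1
    return result

def total_ranked_dose(drug_group):
    """Sum the relative rank doses of all drugs in one drug group, per patient."""
    totals = defaultdict(float)
    for doses in drug_group.values():
        for patient, rel in relative_rank_doses(doses).items():
            totals[patient] += rel
    return dict(totals)

# Hypothetical toy doses, not study data.
circulatory = {
    "norepinephrine": {"p1": 0.1, "p2": 0.4, "p3": 0.0, "p4": 0.2},
    "vasopressin": {"p1": 0.0, "p2": 2.0, "p3": 0.0, "p4": 1.0},
}
circ_total_ranked = total_ranked_dose(circulatory)
```

In this toy example, patient p2 receives the highest dose of both drugs and therefore has a total ranked dose of 2, whereas p3 receives neither drug and has a total of 0.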

Data imputation

Missing data (both continuous and categorical) were imputed using a random forest model (package missForest v 1.4 [12]).

Machine learning (ML) analysis

The mlr3 toolbox was used for ML modelling (mlr3verse package v. 0.2.5 [13]). Two classification ML models were used: logistic regression (LR) and random forest (RF). LR was chosen as a "glass box" that is well interpretable and known to the general medical community. RF was chosen as a robust "black box", capable of modelling even complex non-linear relationships and interactions between features while not requiring normalization or scaling of features.

For RF, we used the ranger package v. 0.14.1 [14] and tuned the number of trees, maximal tree depth and minimal node size. LR was performed with feature selection by Lasso regularization (package glmnet 4.1-4 [15]). For hyperparameter tuning we used 5-fold cross-validation with classification error (CE) as the performance measure, and for the evaluation of model performance (CE, ROC AUC, Brier score [16]) we used 5-fold cross-validation repeated 5 times. The best performing model (for both RF and LR) was used to create the final model on the full dataset.
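The three performance measures and the repeated cross-validation scheme can be sketched as follows. This is an illustrative Python sketch, not the mlr3 code used in the study; the 0.5 classification threshold, the unstratified fold assignment, the function names and the toy predictions are assumptions.

```python
import random

def classification_error(y_true, p_hat, threshold=0.5):
    """Share of patients whose predicted class differs from the observed class."""
    wrong = sum((p >= threshold) != bool(y) for y, p in zip(y_true, p_hat))
    return wrong / len(y_true)

def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

def roc_auc(y_true, p_hat):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def repeated_kfold(n, k=5, repeats=5, seed=0):
    """Yield (train, test) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]
            held_out = set(test)
            yield [i for i in idx if i not in held_out], test

# Hypothetical toy predictions, not study data.
y = [1, 1, 0, 0, 1]
p = [0.9, 0.6, 0.4, 0.7, 0.3]
ce, bs, auc = classification_error(y, p), brier_score(y, p), roc_auc(y, p)
splits = list(repeated_kfold(616))
```

With 616 patients, 5 folds and 5 repetitions, this yields 25 train/test splits, and the reported performance corresponds to the average of the measures over such splits.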

Feature importance, measured as the factor by which the model's prediction error increases when the feature is shuffled, was calculated [17]. Accumulated local effects (ALE) plots [18], describing how features influence the prediction of a machine learning model (terminal extubation) on average and independently of other features, were created for individual features of the random forest model using the iml package 0.10.1 [19]. Finally, the overall interaction strength of each feature with all other features was calculated.
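Permutation feature importance, expressed as the ratio of the permuted to the original classification error, can be sketched as follows. This is a simplified Python illustration and not the iml implementation used in the study; the toy model, the toy data, the number of repetitions and the function names are hypothetical.

```python
import random

def classification_error(model, rows, labels):
    """Share of rows for which the model's predicted class is wrong."""
    return sum(model(r) != y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Mean ratio (permuted CE / original CE) after shuffling one feature."""
    rng = random.Random(seed)
    base = classification_error(model, rows, labels)
    if base == 0:
        raise ValueError("original error is zero; the ratio is undefined")
    ratios = []
    for _ in range(n_repeats):
        values = [r[feature] for r in rows]
        rng.shuffle(values)  # break the link between the feature and the outcome
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, values)]
        ratios.append(classification_error(model, permuted, labels) / base)
    return sum(ratios) / n_repeats

# Hypothetical toy model: predicts TE whenever the patient is in centre group 2.
def toy_model(row):
    return int(row["centre_group"] == 2)

# Hypothetical toy data, not study data.
rows = [{"centre_group": 1 if i < 5 else 2, "age": 50 + i} for i in range(10)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

imp_centre = permutation_importance(toy_model, rows, labels, "centre_group")
imp_age = permutation_importance(toy_model, rows, labels, "age")
```

A feature that the model does not use, such as age in this toy model, has an importance of exactly 1, because shuffling it leaves the error unchanged; values above 1 indicate that the model relies on the feature.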

Survival analysis

Survival analysis was performed using univariate and multivariate Cox regression with feature selection using Lasso regularization (package glmnet 4.1-4 [15]) with all features mentioned above. Adjusted survival curves and adjusted median survival times with 95% confidence intervals were created using the package adjustedCurves 0.9.0 [20].

Results

Six hundred and sixteen patients from 20 centres in Canada (N = 355, 57.6%), the Czech Republic (N = 219, 35.6%) and the Netherlands (N = 42, 6.8%) were included in the analysis (see flowchart in Fig. 1). The most common admission diagnoses were neurologic disorder (48.5%), respiratory failure (15.1%) and sepsis (14.3%). Further patient characteristics, along with univariate comparisons of individual features between extubated and non-extubated patients, are shown in Table 1 and Supplementary Tables S1–S4. Three hundred and ninety-six (64.3%) patients were terminally extubated. Eighty-seven patients (14.1%) underwent attempted organ donation after circulatory determination of death (also known as donation after cardiac death, DCD) and sixty (9.7% of the total sample, 69% of attempted DCD) proceeded to DCD. The median time from initiation of WLST to death was 1 h (IQR: 0.3–4.7 h). Correlation between features can be seen in Supplementary Fig. S3. Multicollinearity was tested by the variance inflation factor (VIF) and was low (below 3) for all features.

Figure 1

Flowchart of the study enrolment. Note: TTD = time to death; WLST = withdrawal of life-sustaining treatments.

Table 1 Characteristics of enrolled patients (N = 616) (imputed data), stratified by terminal extubation.

The performance of the RF and LR model with the five-fold cross-validation on the test set was similar, with an average ROC AUC of 0.91 and 0.90, classification error 16.6 and 16.4% (see Supplementary Fig. S4 and Supplementary Table S6).

By far the most important feature in both models was the study centre (binarized by frequency of terminal extubation). The frequency of TE across centres is shown in Fig. 2. The proportion of patients with TE was 97.6% in the Netherlands, 81.4% in Canada and 30.1% in the Czech Republic. Four Canadian centres extubated all patients enrolled at their site. Every centre in the study performed TE in at least one patient.

Figure 2

Frequency of terminal extubation across centres. Y-axis: study centre ID and country. Note: Group 1 includes all centres in the Czech Republic and one Canadian centre. Group 2 includes all remaining centres in Canada and the centre in the Netherlands.

A visual interpretation of the RF model (the 15 most important of the 28 features) can be seen in Fig. 3, and the feature importance of all features is given in Supplementary Table S7.

Figure 3

Visual interpretation of the random forest model for the 15 most important features: (a) Accumulated local effects plots describing how features influence the prediction of an RF model (terminal extubation) on average (red horizontal line). Y-axis: % change in probability of extubation. The validity of the curves is limited in areas with few data (see the rug plot on the x-axis). The corresponding partial dependence plot (PDP) curves are shown in Supplementary Fig. S5. (b) Permutation feature importance measured as the factor by which the model’s classification error (CE) increases when the feature is shuffled in the test data (permuted CE/original CE [4.7%]).

Other important patient characteristics associated with the likelihood of TE include the “circulation status” and “respiratory status” features at the time of WLST. Patients on higher vasopressor support, with lower mean arterial pressure, higher lactate, and lower pH are less likely to be terminally extubated. Similarly, patients with higher peak inspiratory pressure, higher FiO2, lower PaO2, higher PaCO2 and higher respiratory rate (i.e., in respiratory failure) are also less likely to be extubated. The most important parameters in this group are FiO2 and peak inspiratory pressure. Lastly, patients on a higher dose of opioids or with a GCS of 3 are less likely to be extubated.

The output of the LR with Lasso feature selection is shown in Supplementary Table S5. LR captures the same patterns as RF, with the centre being the strongest feature and features describing the severity of circulatory and respiratory failure acting as negative predictors of TE. In addition, LR selected the intubation route: patients with a tracheostomy have a lower chance of TE.

Time to death after WLST did not differ between patients with and without TE (univariate Cox regression: hazard ratio 0.98 [95% CI: 0.81; 1.18], p = 0.83; median survival time extubated vs. not extubated: 60 [95% CI: 46; 76] vs. 58 [95% CI: 45; 75] min). After adjustment for confounders, time to death was significantly shorter in patients with TE (multivariate Cox regression: adjusted hazard ratio 1.46 [95% CI: 1.11; 1.92], p = 0.007; median survival time extubated vs. not extubated: 49 [95% CI: 40; 62] vs. 85 [95% CI: 61; 115] min); see Table 2 and Fig. 4.

Table 2 Multivariate Cox regression with feature selection (Lasso regularization).
Figure 4

Plot of survival after WLST derived from the Cox regression model. (a) Univariate Cox regression comparing patients with and without terminal extubation. (b) Adjusted Cox regression comparing patients with and without terminal extubation. (c) Adjusted Cox regression comparing patients with different total circulatory drugs doses. (d) Adjusted Cox regression comparing patients with peak inspiratory pressure above and below 30 cmH2O. Note: Circ_total_ranked = total ranked circulatory drugs dose, PIP = peak inspiratory pressure.

Discussion

In this study we sought to determine which factors influence healthcare providers' decision to perform TE at the end of life and whether TE influences time to death. Our analyses demonstrate that the probability of TE was influenced more by the study centre than by patients’ characteristics, suggesting that local protocols or habits may dominate over individualisation of care according to individual patients’ needs. This is consistent with the findings of the large epidemiological studies Ethicus-1 [21] and Ethicus-2 [2], which described significant regional differences in the way life-sustaining treatment is limited in the ICU. Yet, some patient-related factors were indeed associated with the probability of TE. Most importantly, patients without circulatory or respiratory failure are more likely to be extubated.

The reason behind this pattern might be the belief that unstable patients are expected to die shortly after the withdrawal of vasopressors and/or ventilatory support, whereas in more stable patients, healthcare providers may be concerned about potential suffering during protracted dying, perhaps making them more likely to perform TE. The survival curves of patients with and without TE follow almost identical trajectories. Without knowing the factors that had influenced the decision to perform TE, this could be interpreted as meaning that TE does not influence time to death, which was also the conclusion of previous observational studies by Suntharalingam et al. [22] and Wind et al. [23]. Of note, the effect of TE on time to death becomes apparent and significant after adjustment for the factors that are associated with the decision to perform TE. Even though the death-hastening effect of TE is still smaller than the effect of withdrawal of high doses of vasopressors (Fig. 4 and Supplementary Fig. S8), we believe it is of clinical importance. Nonetheless, the finding that patients on low doses of opioids are more likely to be extubated may reflect healthcare providers’ intention to sustain or restore spontaneous breathing efforts before the intended TE, which argues against a conscious or subconscious intention to shorten the process of dying. It may also reflect the fact that healthcare providers believe the removal of the endotracheal tube will relieve patients’ suffering that would otherwise require the administration of opioids.

Another interesting finding is that although over 95% of patients with attempted DCD were terminally extubated, this parameter was eliminated from both multivariate analyses (both RF and regularized LR). A possible explanation is that the patients in whom DCD was attempted had predominantly neurological injuries and were more likely to have normal gas exchange and circulation, i.e., features typical for patients that are terminally extubated. Therefore, at least from the point of view of TE, healthcare providers did not treat the DCD cohort differently. On the other hand, attempted DCD remains a significant parameter in multivariate Cox regression and these patients die faster.

Interpretation of the individual influence of each feature on TE can be aided by principal component analysis (PCA), which supports the above explanations and can be found in the Supplementary appendix (see Supplementary Fig. S9).

Investigation of technical aspects of WLST is very difficult, and to the best of our knowledge, there are no randomised controlled trials in the field. Answering important questions, such as what the effects of TE and the motivators for it are, relies on observational data. Our study is a multi-centre, multi-national, observational study with the largest sample size published so far. Such a sample allowed us to use state-of-the-art machine learning techniques to explore the complex and nonlinear relationships between variables associated with TE. Of note, the performance of RF and regularized LR was very similar. The high performance of LR is due to the low interaction rate of individual features (Supplementary Fig. S6). We have chosen RF as the main model because it allows an intuitive visualization of very complex relationships between TE and features using ALE (Fig. 3) and partial dependence plots (PDPs) (Supplementary Fig. S5). In addition, RF can automatically detect interactions between features. ALE plots are an unbiased alternative to PDPs [24], which means they still work even when features are correlated [25]. On the other hand, ALE plots can produce misleading interpretations when features strongly interact [26], which is not the case here. The validity of both plots is limited in areas with few data. All multivariate models used require a dataset with no missing data. Otherwise, the model eliminates subjects with missing data, which may cause selection bias and reduce the power of the analysis. For this reason, missing data were imputed under the assumption that they are missing at random. Many machine learning algorithms require transformation of continuous parameters in the form of normalization or standardization. Our goal was not to create the best possible predictive model, but to explore the potential relationship of each parameter under study to TE while maximizing interpretability, which is limited if the parameters are transformed. For this reason, we limited ourselves to normalizing the drugs so that their doses could be summarized across drug groups. For the same reason, we used a random forest algorithm that is robust to untransformed data. An objection could be that we only used internal and not external validation. This was because of the limited number of patients and the primarily exploratory, not predictive, purpose of our work. Thus, the performance of the models as presented in Supplementary Table S6 and Supplementary Fig. S4 is likely to be overestimated compared with the performance on an external data set on which the models were not trained.

From the clinical perspective, it is important that TE is likely to have the potential to hasten death in stable ICU patients at the end of life, but to a smaller extent than withdrawing the vasopressors in unstable patients. This information may help healthcare providers to tailor the technical aspects of compassionate care to patients’ and families’ wishes and values, including the decision to proceed with DCD.

The primary limitation of our study is its non-randomised nature, which means that despite the sophisticated methodology, the discovered relations will always remain associative, without conclusive evidence of causality. The cohort of extubated patients also included patients with tracheostomy who were decannulated (n = 19, 3.1%), but these numbers are too small to allow generalisability of our results to all patients with cuffed airways. In addition, this secondary data analysis was not specified a priori and therefore the spectrum of the analysed features is limited and may not include all motivators to TE. Lastly, we made assumptions about healthcare providers’ motivators to perform TE, based on quantitative data, without directly interviewing them. Future research should use qualitative methodology to gain a deeper insight into the conscious motivators of healthcare providers to perform TE and possibly explore other ML-based methods [27].

In conclusion, the decision to terminally extubate is associated with specific centres and with lower levels of respiratory and/or vasopressor support. In this context, terminal extubation was associated with a shorter time to death.