Prediction of prolonged mechanical ventilation in trauma patients of the intensive care unit according to initial medical factors: a machine learning approach

The goal of this study was to develop a machine learning model to predict the risk of prolonged mechanical ventilation (PMV) in patients admitted to the intensive care unit (ICU), with a focus on laboratory and arterial blood gas (ABG) data. This retrospective cohort study included ICU patients admitted to Rajaei Hospital in Shiraz between 2016 and March 20, 2022. Data from all adult patients who required mechanical ventilation and ICU admission were analyzed. Six PMV definitions (PMV more than 3, 5, 7, 10, 14, and 23 days) were each modeled with five machine learning algorithms. The predictors were patients' demographic characteristics, APACHE II, laboratory information, ABG, and comorbidity. Logistic regression (LR), artificial neural networks (ANN), support vector machines (SVM), random forest (RF), and C.5 decision tree (C.5 DT) were used to predict PMV. The study enrolled 1138 eligible patients, excluding brain-dead patients and those without mechanical ventilation or a tracheostomy. The PMV > 14 days model showed the best performance (accuracy: 83.63–98.54%). The essential ABG variables in our two optimal models (artificial neural network and decision tree) for PMV > 14 days were FiO2, PaCO2, and PaO2. This study provides evidence that machine learning methods outperform traditional methods and offers a perspective for achieving a consensus definition of PMV. It also introduces ABG and laboratory information as the two most important groups of variables for predicting PMV. Therefore, there is significant value in deploying such models in clinical practice and making them accessible to clinicians to support their decision-making.

Abbreviations: FiO2, fraction of inspired oxygen; PaO2, partial pressure of oxygen; PaCO2, partial pressure of carbon dioxide; pH, potential of hydrogen; GCS, Glasgow Coma Scale; Temp, temperature.

About 30% of severely ill patients need prolonged mechanical ventilation (PMV) 1 . Because there is no specific definition for long-term mechanical ventilation, these patients cannot be evaluated accurately. Based on a previous study, however, the prevalence of long-term mechanical ventilation was estimated at 7.4 per 100,000 people 2 . A tracheostomy is eventually performed on 10% of patients who require at least 3 days of mechanical ventilation 3 . In addition, while mechanical ventilation (MV) is a life-saving procedure, it carries a host of risks, including death, ventilator-associated pneumonia (VAP), ventilator-associated lung damage, and prolonged hospitalization 4,5 . This issue may also interact with other considerations, such as patients' rights as perceived by the patients themselves 6 . These risks are heightened with PMV 1,5 . As a result, identifying patients at risk of PMV is critical in assisting clinicians in developing individualized care plans to reduce that risk [7][8][9][10] . Moreover, policy makers and executive managers of the health system in Iran must prepare a comprehensive strategic plan for the improvement of hospitals to ensure a proper, timely, and up-to-date response 11 . However, the timing of tracheostomy remains controversial 12,13 . Many studies have been conducted to identify predictors of PMV, but determining a set of crucial predictors remains challenging because of differences in the clinical characteristics of patients and in clinical settings 14 . Another difficulty in PMV studies is the lack of a consensus definition. Some studies define PMV as ventilation for more than 7 days 7,9,10,15 , some as 10 days 16 , some as 14 days 8,15 , and some as 21 days [17][18][19] .
Most previous studies aimed at predicting PMV used multivariate techniques, particularly logistic regression, and achieved low to moderate accuracy 7,10 . Machine learning has shown relatively higher performance in predicting PMV than conventional prediction models 7 . In addition, a previous study 20 placed less emphasis on laboratory data and arterial blood gas. Given that the timing of tracheostomy is controversial and that there is no consensus definition for PMV, we decided to investigate the predictive performance of machine learning models in this study. This study therefore aims to use supervised machine learning to predict PMV time and tracheostomy time in ICU patients, and in doing so to determine a specific definition for PMV and to identify the important and influential variables.

Methods
Study design and patients. This retrospective cohort study included all patients who were admitted to the ICU of Iran's Shahid Rajaei Hospital (the largest trauma center in Shiraz) between January 1, 2016 and March 20, 2022. To ensure data quality and to complete patient information, we obtained the data from three different sources in the hospital: the Shiraz Intensive Care Intelligent Registry Data (in collaboration with the Anesthesia and Intensive Care Research Center with the Australian and New Zealand Intensive Care Association), the patients' records in the hospital medical records, and the patient information extracted at the Trauma Research Center. We then merged the information based on the patients' file numbers.
Trauma patients over 14 years of age and patients who underwent intubation within the first 24 h after injury, either at the scene by emergency medical personnel or in the hospital, were included in the study. The exclusion criteria (Fig. 1) were: (1) a diagnosis of brain death, (2) lack of data at the beginning of hospital admission, (3) transfer of the patient to another hospital, and (4) patients who had a tracheostomy performed and patients who were not intubated.
The ethics committee of Shiraz University of Medical Sciences approved this study (IR.SUMS.SCHEANUT.REC.1400.006). We confirm that all research was performed in accordance with relevant guidelines and regulations. Depending on the condition of the patients, informed consent was obtained from their legal guardians. We reported the findings according to the STROBE guidelines for strengthening the reporting of observational studies (Supplementary Table S1).

Data collection.
We collected and recorded data on the following.
The durations of ICU stay and hospitalization were calculated from the dates of ICU admission and hospital admission, respectively. We also calculated the number of days the patient was connected to the ventilator via an endotracheal tube.
Comorbidity was defined as the presence or absence of Immune Disease, AIDS, Leukemia, Myeloma, Metastases, Lymphoma, Hepatic failure, Cirrhosis, Chronic liver failure, Chronic respiratory, Chronic Cardiovascular, and Chronic renal failure.
Arterial blood gas results were recorded during the first 24 h in the ICU. If arterial blood gases were not tested within the first 24 h of ICU admission, the ABG values from one hour before admission were used. We also used a conversion table to convert oxygen to tail oxygen concentration.
To determine the level of consciousness, we used the lowest level recorded in the first 24 h in the ICU in the absence of sedatives, relaxants, or neuromuscular blocking drugs. For patients who received sedatives, we recorded the level of consciousness before administration. It is worth mentioning that the level of consciousness before receiving a sedative is not always the lowest level of consciousness; nevertheless, the pre-sedation value was the one recorded. GCS scoring was done in four sections, which were then added together. (3) Verbal: orientated = 5, confused = 4, inappropriate words = 3, incomprehensible sounds = 2, no response = 1; (4) Verbal intubated: appears orientated = 5, ability to converse in doubt = 3, unresponsive = 1.
In the first 24 h of hospitalization in the ICU, patients' physiological data, including vital signs and laboratory data, were recorded as their maximum, minimum, mean, and suffering. If these items were absent within the first 24 h, the values from one hour before admission to the ICU were used.

Statistical analysis. Several supervised machine learning methods were used to compare their performance with previous studies and to find a model characterized by optimum performance and maximum applicability to support clinical decision-making. We chose random forest (RF), logistic regression (LR), decision tree C.5 (C.5 DT), artificial neural networks (ANN), and support vector machines (SVM) to establish a baseline comparison of performance. Hyper-parameter values for each algorithm are given in Table S2. We used RapidMiner Studio 9.10.008 and SPSS Modeler ver. 18.0 for the analysis. Because the classes were imbalanced, we used the SMOTE sampling method to balance them. We then divided the collected data into a training set (70%) and a testing set (30%) to avoid over-fitting and to validate model performance. Because accuracy alone is inadequate for assessing the overall performance of a model, we also considered AUC, sensitivity, negative predictive value (NPV), F-score, and specificity. The F1 score, which is the harmonic mean of precision and recall, is applicable when the costs of a false negative and a false positive differ. Accuracy can be used if the costs of a false negative and a false positive are nearly the same. However, if the data are unevenly distributed across classes (for example, 90% patients and 10% healthy), it is preferable to use precision, recall, or F1 (F1 score = 2 × (Precision × Recall)/(Precision + Recall)).
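The evaluation criteria listed above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch, not the RapidMiner or SPSS Modeler implementation used in the study; the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives (positive class = PMV).
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 0.0           # negative predictive value
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": recall,
        "specificity": specificity,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts, for illustration only (not study data).
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print({name: round(value, 4) for name, value in m.items()})
```

With imbalanced classes, accuracy can stay high while sensitivity or precision collapses, which is why the additional criteria are reported alongside it.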
We also selected the most important variables in our best model using the Brute Force algorithm and the importance of each variable was reported based on the gain ratio (Table S3).
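For readers unfamiliar with the gain ratio, the sketch below shows how it can be computed for a single categorical predictor: the information gain of the split divided by the entropy of the split itself. This is an illustrative stand-alone version, not the routine used by the analysis software; the helper names `entropy` and `gain_ratio` and the toy data are assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Gain ratio of a categorical feature with respect to class labels."""
    n = len(labels)
    # Group the class labels by feature value.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalizes features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy example (hypothetical data): a feature that separates the classes perfectly.
vasopressor = ["yes", "yes", "no", "no"]
pmv = [1, 1, 0, 0]
print(gain_ratio(vasopressor, pmv))
```

A feature that perfectly separates the classes scores 1, and a feature that carries no information about the class scores 0.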

Logistic regression (LR).
Logistic regression is one of the popular techniques used for the prediction of binary, multinomial, or ordinal outcomes 21 . This study uses stepwise LR (backward stepwise method based on likelihood) to control for confounding variables and also calculate independent risk factors for PMV after ICU admission.
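In logistic regression, a fitted coefficient is usually reported as an odds ratio by exponentiating it, and a Wald 95% confidence interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below illustrates this conversion; the function name `odds_ratio_with_ci` and the coefficient and standard error in the example are hypothetical and are not values from this study.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient to an odds ratio with a Wald CI.

    beta: estimated coefficient (log-odds change per one-unit increase)
    se:   standard error of the coefficient
    z:    normal quantile (1.96 for an approximate 95% interval)
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
or_, lo, hi = odds_ratio_with_ci(beta=0.5, se=0.1)
print(f"OR = {or_:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

An odds ratio above 1 indicates higher odds of the outcome per unit increase in the predictor, a value below 1 indicates lower odds, and the point estimate always lies inside its own interval.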

Random forest (RF).
Random forest is a robust supervised machine learning method commonly employed to solve classification problems 21,22 . RF has been reported to be more accurate than several other machine learning methods. This accuracy is attributable to the fact that random forest uses bootstrap sampling to grow a forest of decorrelated trees with a high degree of randomness in feature selection, which helps to reduce errors significantly 23 .
Support vector machine (SVM). The support vector machine is a robust classification algorithm that can be applied to linear and nonlinear data sets 24 . When using SVM for classification, it is critical to determine which kernel function best yields the optimal hyperplane separating the classes 25 . This study used the radial basis function (RBF) kernel because it provided better predictive efficiency in the initial evaluation.
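As a brief illustration of the kernel mentioned above, the RBF kernel measures the similarity of two feature vectors as exp(-gamma * squared Euclidean distance). The sketch below is a plain-Python version for exposition only and is not the RapidMiner implementation; the function name `rbf_kernel` and the example vectors are assumptions, and the default gamma of 1.0 mirrors the setting reported in the Results.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical, already-scaled feature vectors for two patients.
patient_a = [0.2, 0.7, 0.1]
patient_b = [0.3, 0.5, 0.4]
print(rbf_kernel(patient_a, patient_a))  # identical vectors give similarity 1.0
print(rbf_kernel(patient_a, patient_b))  # similarity decays with distance
```

Larger gamma values make the similarity fall off faster with distance, which yields a more flexible decision boundary that is also more prone to overfitting.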
Artificial neural networks (ANNs). Artificial neural networks are machine learning methods for pattern recognition and classification 24 . Their capability of supporting clinical practice through interaction with evidence-based medicine is well recognized 26 . The present investigation employs the ANN multilayer perceptron (MLP) because it performed better than the radial basis function (RBF) network in the initial analysis. We used a standard feed-forward neural network with three layers: an input layer, a hidden layer, and an output layer. A multilayer perceptron is a layered feed-forward network in which the input and output layers connect the network to the outside world. Between them, a typical multilayer perceptron contains one or more layers of neurons; because these layers are not directly accessible, their units are known as hidden neurons 27 . Hidden neurons are responsible for extracting the critical features from the input data. The neural network is typically optimized by partitioning the data into separate training and test datasets to avoid overfitting, and the training process continues until the error is reduced 28 . When used for classification, an artificial neural network can be viewed as a collection of interconnected input/output units in which each connection has an associated weight that indicates the connection strength between the units 29 .
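To make the three-layer structure concrete, the sketch below runs a single forward pass through a feed-forward network with one hidden layer and a sigmoid output. It is an illustration only: the weights are hypothetical rather than trained, the sigmoid activation is an assumption (the study does not state the activation used), and the function names are invented for this example.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input layer -> one hidden layer -> single output unit.

    x:        list of input features
    w_hidden: one weight list per hidden neuron (each as long as x)
    b_hidden: one bias per hidden neuron
    w_out:    one weight per hidden neuron for the output unit
    b_out:    bias of the output unit
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical, untrained weights for 3 inputs and 2 hidden neurons.
x = [0.5, 0.1, 0.9]
w_hidden = [[0.4, -0.6, 0.2], [-0.3, 0.8, 0.5]]
b_hidden = [0.0, 0.1]
w_out = [1.2, -0.7]
b_out = 0.05
print(mlp_forward(x, w_hidden, b_hidden, w_out, b_out))  # value in (0, 1)
```

Training adjusts the weights and biases to reduce the error between this output and the observed class; the forward pass itself stays the same.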
C.5 decision tree (DT). The C.5 decision tree classification algorithm is the successor to the C4.5 decision tree classification algorithm. The DT is a classification algorithm in which every non-leaf node represents a test on one of the attributes of the input items, every branch corresponds to a test outcome, and each leaf node represents a class prediction 30 . Decision trees are logical, robust classification algorithms that are easy to understand and interpret 31 .

Ethics approval and consent to participate. The Ethics Committee of Shiraz University of Medical Sciences approved this study (IR.SUMS.SCHEANUT.REC.1400.006). Informed consent was obtained from all subjects or their legal guardians to use their data for research.

Results
Data mining algorithms' performance. Table 1 shows the performance evaluation criteria for the five machine learning techniques on the test data partition. Across all models, accuracy ranged from 61.11 to 85.27%. Random forest is the preferred model for deployment because it has a higher discriminating power (AUC = 0.821), which is critical for classification. Within the sets of models, logistic regression performed as well as or better than support vector machines and artificial neural networks. When the discrimination power between the three sets is compared, set E, which defines PMV as more than 14 days, outperforms the other groups, with AUC ranging from 83.70 to 90.20. This value demonstrates that the detection power and accuracy were optimal when PMV was defined as more than 14 days.

In this study (Table 2), 1138 ICU patients with a mean age of 44.50 ± 21.42 years (max: 100, min: 18) participated. Of all participants, 929 (81.6%) were male. The median hospitalization time for all participants was 29.50 days (8, 47). APACHE II Risk (%) was higher in patients ventilated for more than 14 days (P = 0.026). ICU stay was longer in the PMV ≥ 14 days group (7 vs. 49 days, P ≤ 0.001).

Logistic regression (LR).
The relationship between PMV ≥ 14 days and hospitalization days, APACHE II Risk, ICU days, mean systolic pressure, mean diastolic pressure, mean arterial pressure, urine output, FiO2, PaCO2, and inotropes vasopressor was investigated using logistic regression. The odds of PMV ≥ 14 days increased 1.08-fold (95% CI (1.006, 1.010)) for each additional day in the hospital. The odds increased 9.47-fold (95% CI (1.246, 72.091)) for a one-unit increase in FiO2 and were multiplied by 0.966 (95% CI (0.944, 0.989)), that is, decreased slightly, for a one-unit increase in PaCO2. The odds of PMV ≥ 14 days were 1.959 times higher (95% CI (1.339, 2.867)) for patients who did not receive inotropes vasopressor than for those who did.

Decision tree. As shown in Fig. 2, the variable of the initial split in the DT model was inotropes vasopressor. The second split's variable was pH, with an optimal cut-off value of 7.21; this split was identified among the patients with inotropes vasopressor = yes. On the third split, Potassium, with an optimal cut-off value of 5.9, and FiO2, with an optimal cut-off value of 0.470, were identified. The diagnosis was then identified as the final split. According to the DT analysis, the variables in order of importance were: inotropes vasopressor, FiO2, diagnosis, Potassium, and pH. Figure 3 depicts the relative importance of the variables.
Random forest (RF). The random forest model's test classification accuracy was 81.99%, and its F1 score was 79.92%. PaCO2, pH, and hemoglobin were the three most important predictors of prolonged mechanical ventilation.

Support vector machines (SVM).
We ran the SVM model with a radial kernel with a gamma of 1.0, a kernel cache of 200, a C value of zero, a convergence epsilon of 0.001, and a maximum of 100,000 iterations. The model's accuracy was 82.52%. The classification error and F1 score were 17.48% and 81.20%, respectively (Table 1). The total number of support vectors and the bias (offset) were 1332 and -0.567, respectively. The most important predictors of prolonged mechanical ventilation were diagnosis, gender, age, inotropes vasopressor, APACHE II, PaCO2, PaO2, glucose, Potassium, and systolic pressure (Figs. 4 and 5). ROC curves comparing all 5 machine learning algorithms in the 6 models under stratified tenfold cross-validation are given in Fig. S1.

Discussion
The current study was designed to investigate the prediction of PMV. We obtained the following results. First, the most critical variables in predicting PMV were vasopressor inotropes, FiO2, PaO2/FiO2, PaCO2, and APACHE II. Second, most models found Potassium to be the most crucial laboratory variable in predicting PMV. Finally, based on modeling accuracy and the output ROC plots, we chose 14 days as the cut-off point for distinguishing PMV from non-PMV. Consequently, when PMV can be predicted, it is critical to plan tracheostomy early and possibly to transfer the patient to the institution most appropriate for managing PMV and its complications. Furthermore, according to this study, supervised machine learning techniques, especially ANN, achieved moderate to high overall performance across the six sets of models.
The use of vasopressor inotropes was one of the most significant variables in predicting PMV in the current study. Fluid resuscitation and vasoactive therapy are critical in managing hypotensive patients to support organ perfusion [32][33][34] . Current guidelines from the 2016 Surviving Sepsis Campaign (SSC) recommend early initiation of vasopressors targeting a mean arterial pressure ≥ 65 mmHg 35 . The guidelines specify that the need for vasopressor therapy should be evaluated when hemodynamic instability persists despite fluid resuscitation, but they are not specific about when to initiate vasopressors. Recent studies have shown that delaying the initiation of vasopressors is associated with a higher mortality rate, fewer vasopressor-free days, and a longer time to achieve the target mean arterial pressure 36,37 . Following hemodynamic instability, PMV occurs in patients who have not received vasopressor inotropes. FiO2 and PaO2 were also among our essential variables for predicting PMV. PaO2/FiO2 on the third day of intubation was also found to be a predictor of PMV by Sellers et al. 38 . Long-term excessive FiO2 exposure may be associated with deterioration of pulmonary function in a dose-response manner. Excess FiO2 exposure was not associated with mortality in one study 39 . However, in another large retrospective study of mechanically ventilated ICU patients in the Netherlands, high FiO2 values during the ICU stay were associated with hospital mortality 40 . In the same study, high FiO2 in the first 24 h of hospitalization was linearly related to in-hospital mortality rates for high and low PaO2 values 40 . Previous studies have shown that long-term exposure to FiO2 has a linear correlation with a worsening oxygenation index. Furthermore, the oxygenation index increased regardless of FiO2, and this increase was linked to a longer duration of mechanical ventilation and ICU stay.
The Oxygenation index, combining airway pressure and oxygenation, is a reliable predictor of worsening lung function, particularly in patients with acute lung injury [41][42][43] . Nash et al. performed the first detailed pathological examination of pulmonary changes following exposure to higher oxygen concentrations and treatment duration. They observed significant pathological changes in interstitial edema with prolonged treatment that progressed to fibrosis 44 .
Another of the principal variables for PMV prediction is PaCO 2 . In another study 45 , PaCO 2 was the only predictor of the need for PMV as a component of arterial blood gas analysis. However, it only predicted the need for PMV in two studies of patients admitted to the ICU for various reasons 46,47 . Interestingly, while low bicarbonate levels did not affect the likelihood of experiencing PMV 48 , a pH less than 7.25 significantly predicted the need for PMV for more than 14 days 48,49 . Our study is similar to the results of this study which shows the importance of pH value after PaCO 2 . Our study discovered a pH cut-off point of 7.21, consistent with previous research 48,49 .
APACHE II is the most widely used tool for predicting the result worldwide. Almost all staff, including doctors, nurses, physiotherapists, and social workers, are well-versed in it. Also, there is no single predictor of clinical outcome in PMV patients, as it is determined by a combination of respiratory and non-respiratory factors 50 . As a result, one of our goals was to assess the APACHE II scoring system's ability to predict PMV. We discovered that APACHE II has high predictive power for PMV. This finding differs from the previous study 45 , which found that ICU admission severity scores did not have a high predictive power when using the statistical method. One plausible explanation is that these scores may lose predictive power in patients with extended hospital stays 51 . Likewise, Rojek-Jarmuła et al. 50 discovered that in patients admitted to a weaning center for PMV, the APACHE II score could not predict successful freedom from mechanical ventilation or tracheostomy tube removal. However, predictive ability appears relatively better in studies with short PMV values 46 .
As a result, systematic reviews on the predictive ability of ICU scores for PMV are recommended as future collective evidence. APACHE II, on the other hand, has been validated by several researchers in predicting weaning success 52,53 . One reason could be that each center has different admission criteria and a different patient population regarding demographics, primary diagnoses, and comorbidities. Third, no single scoring system applies to

One of the crucial variables observed is Potassium. According to our findings, Potassium is one of the laboratory variables that can help predict PMV. In another study, Potassium disorder was one of the significant side effects that can lead to death, long-term mechanical ventilation (more than 21 days), and prolonged hospitalization (more than 36 days) 55 . Another study found that a patient's Potassium level at the start of intensive care was strongly associated with the risk of death, even if it was only slightly above the normal range 56 . In addition, another study conducted on Covid-19 patients hospitalized in the ICU mentioned Potassium level as one factor indicating a higher risk of death in this group 57 . Abnormal plasma Potassium levels may be a symptom of an acid-base imbalance in patients suffering from acute respiratory failure. They can also result in cardiac arrhythmia, bradyarrhythmia, complete heart block, and circulatory arrest 56,58 . Previous research has examined the link between Potassium and mortality in the intensive care unit. According to those findings, the Potassium-mortality association is independent of AKI, selective β1 blockade, or ACEi/ARB use. There was no correlation between Potassium and mortality in patients with K > 5.5 mEq/l who received blood transfusions and Potassium supplements before Potassium administration. The Potassium-mortality relationship was not fully adjusted for the presence of K-promoting drugs.
A decrease in Potassium of ≥ 1 mEq/l within 48 h of the start of intensive care eliminates the Potassium-mortality association. The mechanism underlying the link between Potassium and mortality is unknown. The most apparent effect of hyperkalemia is a decrease in the myocardium's resting membrane potential. This effect leads to a reduction in the speed of myocardial cell conduction and an increase in the repolarization rate 59 .
One of our hypotheses is that PMV may have contributed to increased mortality in patients with Potassium deficiencies. However, the degree of hyperkalemia is not associated with the risk of life-threatening arrhythmias in general 60 . Nonetheless, the progression of arrhythmias in hyperkalemia from benign to fatal is unpredictable 61 . Also, Potassium concentration may be an indicator of disease severity or may be related to an unmeasured patient factor that may be a cause of PMV in and of itself.
Finally, Set E outperformed the other sets regarding stability and discrimination power, with an average AUC of 87.075 compared to the other sets' average AUC. This finding is significant because the optimal predictive performance is achieved when PMV is defined as more than 14 days, which is the optimal period for primary tracheostomy. However, unlike the previous study 15 , the current work did not record a consistently better performance of machine learning than traditional forecasting techniques; the traditional technique achieved nearly the same level of accuracy and precision as machine learning. Nevertheless, machine learning, particularly the ANN and decision tree methods, outperformed the traditional logistic regression method in terms of accuracy. Accordingly, machine learning techniques outperform conventional analytical techniques and can provide clinicians with more support for higher-quality decision-making, which improves patient treatment outcomes.

This study has its strengths; however, several potential limitations should be noted. First, the data come from the province's largest trauma center, and we used three data sources to ensure the accuracy of the information. However, our analysis is based on single-center retrospective data, so our findings have limited external validity and cannot be generalized to a larger international population.
Second, to the best of our ability, we attempted to track the patients using the case codes. Furthermore, we tried to analyze six different models using five machine learning methods that were thought to be the most powerful statistical methods available.
Further, the laboratory and ABG data, which have received less attention in previous studies, are regarded as one of the study's strengths because these two variables can be measured and investigated relatively simply, allowing doctors to make decisions based on them. However, we did not include comprehensive data on sedation protocols and spontaneous breathing tests. Finally, we lack follow-up data on hospital readmission rates, MV reestablishment, and long-term mortality rates.

Conclusion
This study found that ABG (particularly FiO2, PaO2, and PaCO2) and laboratory data (particularly Potassium) could assist physicians in predicting PMV. This study also demonstrated that a machine-learning approach could improve predictive power. However, improving data quality in registries or electronic medical records is more important than the prediction method itself, because better data in turn enhance prediction quality. Furthermore, there is significant value in deploying such models in clinical practice and making them accessible to clinicians to support their decision-making. Also, like the results of the previous study 62 , the results of our study can help nurses working in the intensive care unit to make a better decision about when to perform tracheostomy and to be trained for it. We suggest that future studies pay attention to measurements of variables over time in modeling, as well as to other variables such as hospital readmission rates, MV reestablishment, and long-term mortality rates. We also suggest the use of multi-center data in future studies, which can largely control for differences between treatment protocols and increase the external validity of the study.

Data availability
The data that support the findings of this study are available from the corresponding author, [HGH], upon reasonable request.