Introduction

Gallstone disease (GD) is one of the most common diseases in the world, affecting about 18% of the population1. In 90% of cases2, acute cholecystitis results from obstruction of the cystic duct by these stones. Laparoscopic Cholecystectomy (LC) is currently the gold standard for the management of gallstone disease, has remained so even in the CoViD-19 era3, and is performed in about 96% of cases4. Compared with Open Cholecystectomy (OA), it offers the advantages of a shorter hospital stay, faster healing and less visible scars5. Approximately 3–5% of LCs, however, are converted to OA during surgery. Risk factors for conversion include age, sex, anatomical features, severity of disease, intraoperative complications (bleeding, trauma to internal organs), previous abdominal procedures and the lack of adequate laparoscopic instruments6,7. The most affected categories are women and the elderly8. The Italian Multicenter Study of Cholelithiasis (MICOL) showed that the overall rate of gallstone disease was 18.8% in women and 9.5% in men1. For both sexes, age is the main risk factor for the development of gallstone disease1,9,10, with a prevalence of more than 80% in patients older than 90 years11. The feasibility of LC in the over-80 population has been evaluated in several studies, which confirmed a safety and efficacy comparable to those obtained in younger patients, making it the most common procedure for this age group12,13,14,15. The high incidence in this population category intersects with the demographic change in industrialized countries: the rapid aging of the world population seen in recent years will lead to an increase in the total prevalence of GD and in the use of the related surgical procedure16,17. Furthermore, this category of patients generally has higher pre- and post-operative morbidity and mortality rates and a longer hospital stay18.
Today there is growing interest in improving the quality and efficiency of healthcare processes, especially with a view to cost containment. This highlights the need to define and use quality and efficiency indicators19. Several tools have been successfully applied to data derived from healthcare processes20,21,22,23,24 or used to support management25,26,27,28. Length of Stay (LOS) is an important performance indicator for hospital costs and management and a key measure of national health system efficiency29. Comorbidities such as heart disease, lung disease and diabetes30,31, commonly found in elderly patients, negatively affect this parameter. For example, Valent et al.32 used regression models to show that patients who have diabetes mellitus as a comorbidity have a higher risk of in-hospital death and a longer LOS. Similarly, using a regression model, Ofori-Asenso et al.33 showed that the median LOS was 1.1 days longer for patients with a high Charlson comorbidity index (CCI) than for those with a low CCI. These two articles, although set in different contexts, support the use of regression models in the study of LOS and demonstrate the influence of comorbidities. An increase in LOS inevitably increases costs and reduces the number of hospitalizations that can be accommodated34. The Italian government, Italy being the reference country for this study, has also moved in this direction by defining in the National Outcome Plan (in Italian, PNE) several indicators that assess the volumes of activity and the outcomes of some of the referral health services35. For LC in particular, several indicators were defined to evaluate complications at 30 days and post-operative LOS. After a careful review of the literature, in which post-operative LOS varies between 3 and 5 days, the upper limit was set at 3 days.
Therefore, it becomes strategic for a hospital to reduce LOS, which is only possible once the main factors that negatively influence it are fully understood and preventive measures can be adopted. To meet the national goal, the “San Giovanni di Dio e Ruggi d’Aragona” University Hospital of Salerno (Italy), the target facility of the study, has already analyzed the situation from a Lean Six Sigma perspective36 to understand the critical aspects of the process and identify the most appropriate corrective actions.

However, reducing LOS does not only have economic benefits. Many hospitals are successfully performing LC on an outpatient basis, ensuring lower costs and faster turnover37. Smith et al.38 state that 80% of patients undergoing elective LC can be safely discharged 4–6 h after surgery, with no difference compared with 23-h monitoring. The authors make it clear that, if surgery is uncomplicated and no symptoms occur within 3–6 h, discharge can take place without the need to remain in hospital for days. However, as Topal et al.39 show, making this possible requires specific pathways and specific patient selection criteria. This new pathway, combined with early LC40, can offer significant benefits to the patient, resulting in a high rate of satisfaction and a faster return to normality.

In this study, regression models and classification algorithms were applied to a dataset consisting of all patients undergoing LC at the “San Giovanni di Dio e Ruggi d’Aragona” University Hospital of Salerno in the years 2010–2020, in order to identify the variables that most influence LOS and to create predictive models that can support the proper management of hospital resources. This study is an extension of our previous paper, in which Multiple Linear Regression (MLR) was applied to a limited set of years and variables21.

Materials and methods

Data collection

In this study, data from 2352 patients undergoing elective LC at the “San Giovanni di Dio e Ruggi d’Aragona” University Hospital of Salerno in the years 2010–2020 were processed. The information was extracted from the QuaniSDO hospital information system with which the hospital manages and computerizes hospital discharge forms. The dataset was obtained using the inclusion and exclusion criteria provided for the calculation of the indicator “Laparoscopic Cholecystectomy: post-operative hospitalization less than or equal to 3 days” provided by the PNE. The inclusion criteria are as follows:

  • LC surgery.

  • Primary or secondary diagnosis of gallbladder and bile duct lithiasis.

The exclusion criteria are as follows:

  • Patients not resident in Italy.

  • Patients younger than 18 years.

  • Hospitalizations with a diagnosis of trauma.

  • Hospitalization for pregnancy.

  • Hospitalization with a diagnosis of malignant tumor of the digestive system.

  • Admissions with OA surgery.

  • Admissions with patient discharged deceased.

  • Admissions in which the patient is transferred from another hospital.

  • Admissions for other abdominal surgeries, such as stomach or duodenum/small intestine surgeries, etc.

  • Patients undergoing intraoperative cholangiogram or common bile duct exploration.

  • Patients with a medical admission for another reason.

We decided to exclude cases that generate high LOS variability (such as urgent cases or patients discharged deceased) in order to analyze a more standardized condition, namely elective surgery. The following information was extracted:

  • Gender.

  • Age.

  • Date of admission.

  • Date of surgery.

  • Date of discharge.

  • Comorbid conditions:

    • Hypertension (yes/no).

    • Diabetes (yes/no).

    • Cardiovascular disease (yes/no).

    • Obesity (yes/no).

    • Allergies (yes/no).

    • Presence of hernia (yes/no).

    • Respiratory disorders (yes/no).

    • Surgery with complications (yes/no).
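
As an illustration of how the three extracted dates can be turned into LOS variables, the following minimal Python sketch splits a stay into pre-operative, post-operative and total days. The function name, the field names and the day-counting convention (simple date differences) are assumptions made for this example; the hospital information system may count days differently.

```python
from datetime import date


def length_of_stay(admission: date, surgery: date, discharge: date) -> dict:
    """Split a hospital stay into pre-operative, post-operative and total days."""
    if not (admission <= surgery <= discharge):
        raise ValueError("expected admission <= surgery <= discharge")
    return {
        "pre_operative_los": (surgery - admission).days,
        "post_operative_los": (discharge - surgery).days,
        "total_los": (discharge - admission).days,
    }


# Invented example record (not taken from the study dataset).
stay = length_of_stay(date(2019, 3, 4), date(2019, 3, 6), date(2019, 3, 8))
print(stay)
```

With this convention the pre-operative and post-operative components always sum to the total LOS.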

Multiple linear regression

MLR is a highly flexible method for analyzing the relationship between one or more independent variables, called predictors, and a single dependent variable, called the criterion. The model assumes that the dependent variable is linearly related to the predictors. The relationship describing the model is the following:

$$\mathrm{y}={\upbeta }_{0}+{\upbeta }_{1}{\mathrm{x}}_{1}+{\upbeta }_{2}{\mathrm{x}}_{2}+{\upbeta }_{3}{\mathrm{x}}_{3}+{\upbeta }_{4}{\mathrm{x}}_{4}+\dots {+\upbeta }_{\mathrm{n}}{\mathrm{x}}_{\mathrm{n}}+e,$$

where y is the dependent variable, xi are the independent variables, which in this case are Gender, Age, Year of Discharge, Comorbid conditions (Hypertension, Diabetes, Cardiovascular disease, Obesity, Allergies, Presence of Hernia, Respiratory Disorders, Surgery with Complications) and pre-operative LOS, the βi values are the coefficients of the model to be determined and e is the error, a random variable. Before implementation, it is necessary to test six assumptions that determine the applicability of the linear model, assessing the relationship between variables, the nature of the residuals and the presence of outliers21,41. IBM SPSS Statistics Version 26.0 software (IBM Corp, Armonk, NY, USA) was used for model construction and hypothesis testing.
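
For reference, and assuming the coefficients are estimated by ordinary least squares (the usual choice for MLR, although the estimator is not stated explicitly here), the model can be written in matrix form, where X is the matrix of predictors with a leading column of ones for the intercept:

$$\mathbf{y}=\mathbf{X}\boldsymbol{\upbeta }+\mathbf{e},\qquad \widehat{\boldsymbol{\upbeta }}={\left({\mathbf{X}}^{\mathrm{T}}\mathbf{X}\right)}^{-1}{\mathbf{X}}^{\mathrm{T}}\mathbf{y},$$

so that each estimated coefficient gives the expected change in y for a unit change in the corresponding predictor, with all other predictors held fixed.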

Classification algorithms

In addition to the construction of the MLR model, classification algorithms were used both for regression model building and as classifiers. The Google Colaboratory (Colab) cloud platform was chosen for the implementation. The following algorithms were implemented as classifiers: Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes (NB) and Multilayer Perceptron (MLP). These algorithms were selected for their wide range of applications and excellent results in healthcare42,43.

For the study, the dataset was divided into three groups according to the value assumed by the LOS, as follows:

  • Group 0: LOS ≤ 3 days.

  • Group 1: 4 ≤ LOS ≤ 8 days.

  • Group 2: LOS > 8 days.
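
The grouping above can be expressed as a small helper. This is only a sketch: the function name is hypothetical and LOS is assumed to be recorded in whole days.

```python
def los_group(los_days: int) -> int:
    """Map a LOS in whole days to Group 0 (<= 3), Group 1 (4-8) or Group 2 (> 8)."""
    if los_days < 0:
        raise ValueError("LOS cannot be negative")
    if los_days <= 3:
        return 0
    if los_days <= 8:
        return 1
    return 2


# Boundary values of the three groups.
groups = [los_group(d) for d in (0, 3, 4, 8, 9)]
print(groups)
```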

The thresholds used to create the groups were chosen arbitrarily, starting from the 3-day threshold set by the national indicator. The other values were set to ensure a substantial number of observations in each group. No post-processing techniques were used to balance the dataset. To apply the algorithms, the dataset was divided into a training set (80% of the total) and a test set (20% of the total), the latter used to calculate the evaluation metrics for the classification analysis. Both hyperparameter optimization and cross-validation were implemented to improve the performance of the algorithms. The scikit-learn library, used to implement the classification algorithms, provides cross-validation tools that partition the dataset into n separate (training, test) pairs to evaluate a particular set of parameters. The output is the evaluation metric averaged over the models built on each partition, so that it does not depend on one particular split. With the GridSearchCV tool, on the other hand, the hyperparameters of the algorithms are optimized by searching for the combination that yields the best results for the particular case study. Table 1 shows the parameter values that were arbitrarily selected for the chosen algorithms.

Table 1 Selected values of each hyperparameter.
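
The combination of an 80/20 split, cross-validation and grid search can be sketched with scikit-learn as follows. This is a minimal illustration, not the pipeline used in the study: the data are synthetic, the grid values are hypothetical stand-ins for those in Table 1, and the number of folds (5) and the random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset: 3 classes, 12 features.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5, n_classes=3, random_state=0
)

# 80% training / 20% test, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical grid; the values actually searched are those of Table 1.
param_grid = {"max_depth": [3, 5, None], "criterion": ["gini", "entropy"]}

# Each combination is scored by 5-fold cross-validation on the training set.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# The best model, refitted on the whole training set, is scored on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the test set out of the grid search means the reported accuracy is not influenced by the hyperparameter selection.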

Having obtained the best algorithms the Voting Classifier (VC) was implemented. VC uses the prediction of the 5 classifiers to determine the predicted value of each sample through an ensemble technique based on a majority policy. In other words, the sample will be associated with the prediction that receives more than half of the votes, i.e., the predicted value from at least 3 classifiers.
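
The voting rule described above can be illustrated for a single sample with a few lines of Python. This is a sketch only: the function name is hypothetical, and the tie-breaking rule (smallest label wins when no label has a clear lead) is an assumption of the example, since with three classes and five voters a strict majority is not guaranteed.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the label that receives the most votes for one sample."""
    counts = Counter(predictions)
    # Highest vote count wins; ties go to the smallest label (assumption of this sketch).
    return min(counts, key=lambda label: (-counts[label], label))


# Invented predictions from five classifiers (e.g. DT, RF, SVM, NB, MLP) for one patient.
clear_majority = hard_vote([0, 2, 2, 1, 2])
tie_case = hard_vote([0, 0, 1, 1, 2])
print(clear_majority, tie_case)
```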

Ethics approval and consent to participate

The authors declare that all methods were performed in accordance with the Declaration of Helsinki. The institutional review board of “San Giovanni di Dio e Ruggi d’Aragona” University Hospital approved the study and provided a waiver of informed consent. Our data, provided by the Hospital’s Health Department, are completely anonymous, and no personal information is linked or linkable to a specific person.

Results

First, data from the 2352 patients were analyzed using the MLR model. Before implementing the model, however, it was necessary to verify the six assumptions introduced in the Multiple linear regression section. After checking the linearity of the relationship between the independent variables and the dependent variable, Tolerance and VIF were calculated. In all cases, a Tolerance value greater than 0.2 and a VIF value less than 10 were obtained, values that indicate the absence of multicollinearity. Assumptions 3 and 4 on the residuals were assessed graphically. Figure 1 shows the graph of the standardized residual regression vs the regression standardized predicted value.

Figure 1 Standardized residual regression vs the regression standardized predicted value.

It can be seen from Figure 1 that the residuals are distributed around 0, which suggests that the homoscedasticity assumption is not violated.

The absence of influential outliers is supported by a Cook’s distance of less than 1 for each record. Finally, the independence of the residuals is supported by the Durbin–Watson test result, which falls within the acceptability range44 of 0.0 to 4.0. At this point, the MLR was implemented. Table 2 shows the results of the regression model and of the Durbin–Watson test.

Table 2 Model summary.
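
For reference, the multicollinearity and independence diagnostics used above are conventionally defined as follows (standard textbook definitions, not reproduced from the SPSS output). Here R_j² is the coefficient of determination obtained by regressing the j-th predictor on all the other predictors, e_t is the residual of the t-th observation and N is the number of observations:

$$\mathrm{Tolerance}_{j}=1-{R}_{j}^{2},\qquad \mathrm{VIF}_{j}=\frac{1}{1-{R}_{j}^{2}}=\frac{1}{\mathrm{Tolerance}_{j}},$$

$$d=\frac{\sum_{t=2}^{N}{\left({e}_{t}-{e}_{t-1}\right)}^{2}}{\sum_{t=1}^{N}{e}_{t}^{2}}.$$

Because VIF is the reciprocal of Tolerance, a Tolerance above 0.2 already implies a VIF below 5. The Durbin–Watson statistic d lies between 0 and 4, and values close to 2 indicate uncorrelated residuals.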

The value of R² greater than 0.5 showed that the model was robust enough for this particular type of application45. Table 3 shows the values of the coefficients βi, the t-test statistics and the p-values obtained for the independent variables considered. A coefficient is considered statistically significant when its p-value is less than 0.05.

Table 3 Regression coefficients and results of t-test.

According to the p-value column, the only variable that significantly affected the LOS was the pre-operative LOS. The selected classification algorithms were then implemented. Before proceeding, it was necessary to arbitrarily partition the LOS into classes. The baseline characteristics of the 3 Groups are shown in Table 4.

Table 4 Baseline characteristics.

The classes showed significant differences not only in pre-operative LOS but also in the presence of complications during surgery, in the Age and in some cardiovascular-related comorbidities. Before presenting the results, the hyperparameters of each algorithm obtained as a result of the optimization process are shown in Table 5.

Table 5 Best parameters.

Table 6 reports the results obtained in terms of accuracy, precision, recall and F-measure.

Table 6 Performance metrics of all selected algorithms.

With an accuracy of 83.0%, DT had the best performance, followed by RF with an accuracy of 81.0%, SVM with an accuracy of 80.0% and finally NB and MLP with an accuracy of 74.0%. For all algorithms, the worst results were obtained in the classification of the second class, i.e. patients with a post-operative LOS between 4 and 8 days. The best results in terms of F-measure, however, were recorded for the third class, that of prolonged LOS, which is the one of most interest to health care management. Details of the classification for the best algorithm are shown in Table 7.

Table 7 Decision tree confusion matrix.
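
To make explicit how accuracy, precision, recall and F-measure follow from a confusion matrix, the sketch below computes them for a three-class case. The matrix is an invented toy example, not the content of Table 7, and the function name is hypothetical.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    report = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        report.append(
            {"precision": precision, "recall": recall, "f_measure": f_measure}
        )
    return accuracy, report


# Invented toy confusion matrix (rows: true Group 0/1/2, columns: predicted).
toy_cm = [[8, 2, 0],
          [1, 3, 1],
          [0, 1, 4]]
accuracy, report = per_class_metrics(toy_cm)
print(accuracy, report)
```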

Figure 2 shows the ROC curves for DT.

Figure 2 ROC curves.

It can be seen from Figure 2 that the largest area relative to the “no benefit” line (black dashed line) is obtained for the third class, for which the algorithm returned the best results. On average, the area still reaches a notable value of 0.88. Figure 3 reports the results of Permutation Feature Importance.

Figure 3 Permutation feature importance.

Permutation feature importance shows which of the independent variables most influence the model by measuring its performance each time on a version of the data in which one of the variables has been corrupted. Figure 3 shows that pre-operative LOS was the variable that most influenced the output, followed by the Year of Discharge. Age and complications occurring during surgery, on the other hand, had little effect. Finally, the VC was implemented. The accuracy achieved by a ‘Hard’ voting technique, which classifies the input data according to the mode of the predictions of the various classifiers, reaches 84.0%, improving by 1.0 percentage point on the value achieved by DT.
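
The permutation feature importance procedure described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the data and the rule-based stand-in for a fitted classifier are invented, accuracy is used as the performance metric, and the number of repetitions and the seeds are arbitrary.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # corrupt feature j only
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Invented toy data: column 0 plays the role of pre-operative LOS, column 1 of age.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 10), data_rng.randint(18, 90)] for _ in range(200)]
y = [int(row[0] > 3) for row in X]


def toy_model(row):
    """Stand-in for a fitted classifier; it only looks at column 0."""
    return int(row[0] > 3)


importances = permutation_importance(toy_model, X, y)
print(importances)
```

A feature the model never uses keeps an importance of exactly zero, while shuffling a feature the model relies on lowers the accuracy. scikit-learn provides a comparable utility, `sklearn.inspection.permutation_importance`; whether that function was used in the study is not stated.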

Discussion

In this study, the LOS of patients who underwent LC was analyzed. Specifically, starting from the computerized hospital discharge forms, Age, Gender, Presence of Comorbidities, Date of Admission, Date of Discharge and Date of the LC procedure were extracted for each patient. From the date variables, it was possible to calculate the pre-operative LOS, which was used, together with the others, as an independent variable. To allow the facility to test how the patients’ clinical and demographic variables affected LOS, an MLR model was built and classification algorithms were implemented. The MLR model had an R² of 0.537, which indicated that the model was robust enough to make predictions in this particular application. The statistical tests showed that the only variable that significantly affected LOS was the pre-operative LOS, an expected result since the pre-operative LOS is part of the total LOS. This result becomes informative when considering that all the procedures analysed were elective: the pre-operative phase nevertheless showed high variability, which could be limited by pre-hospitalisation. Then, the classification algorithms were implemented. For this purpose, the patients were divided into three groups according to LOS (≤ 3 days, 4–8 days, > 8 days). The best algorithm was DT, with an accuracy of 83.0%, followed by RF, SVM, MLP and NB. Class 1, consisting of all patients with a LOS between 4 and 8 days, was the one for which the worst results were obtained, owing to its smaller number of entries. On the other hand, the best results were obtained for Class 2, which includes patients with prolonged LOS, on whom health care management needs to focus most for waste reduction. Permutation feature importance applied to the best algorithm made it possible to identify which of the variables considered have a significant effect on the classification.
In addition to pre-operative LOS, already highlighted by the MLR model, Year of Discharge, Age and complications occurring during surgery also had an effect, although minimal, on the output. Over the years, the laparoscopic procedure has been used increasingly even in older patients, with significant benefits46. Older patients, however, have shown prolonged hospital stays in several studies. In particular, Chang et al.47 showed that patients over 80 years old had a significantly longer perioperative hospital stay, while those over 65 generally had a higher risk of minor complications. The same result is confirmed by Firilas et al.48, who, although they found no differences in complication rates between patients aged 65–75 and patients over 75 years, highlighted that younger patients had a significantly shorter mean length of hospitalization. Complications during surgery, which in severe cases can lead to conversion to OA, can result in a longer hospital stay and greater risk for the patient. This condition has been studied, as shown previously, for the older population, which generally presents with worse clinical conditions that could lead to more complicated surgery49. The dependence on the year of discharge can be explained by a previous study of ours describing the interventions that were put in place, following Lean Six Sigma logic, to reduce LOS36. Finally, the voting technique further improved the final performance, achieving an accuracy of 84.0%.

Although a single model was created and new variables were added, we could not replicate the excellent results of a previous study21, in which MLR models were compared before and after the implementation of corrective actions to reduce post-operative LOS.

The clinical implications of this study concern primarily healthcare planning and programming activities. Knowing the variables that most affect inpatient stays, and building predictive models that help estimate LOS a priori, could support both the identification of corrective actions for process optimization (such as a pre-hospitalization phase or an increase in operating sessions36) and bed management and waiting list management activities.

The study has multiple limitations. The analysis does not include the need for re-intervention, conversion to OA, the degree of complexity of the comorbidities considered or the presence of confounding factors, such as infections. These limits are mainly related to the source of the data, the hospital discharge form, which does not provide a clear clinical picture. In addition, although the study includes a significant number of years of observation and of patients, it is a single-center study, so the results cannot be generalized. From a methodological point of view, the subdivision of LOS into classes for the classification algorithms was arbitrary and not based on scientific evidence, and the inclusion of pre-operative LOS may have obscured the effect of some comorbidities. Finally, the effects of CoViD-19 on the complexity of the treated cases were not taken into consideration50.

Conclusion

In this study, Age, Gender, Pre-operative LOS and Presence of selected Comorbidities were used as independent variables in the construction of an MLR model and classification algorithms in order to predict the LOS of patients undergoing LC at the “San Giovanni di Dio e Ruggi d’Aragona” University Hospital of Salerno (Italy) in the years 2010–2020. The MLR model and the implemented algorithms proved valid in predicting the LOS of LC hospitalizations as well as in identifying, among the variables considered, those with the most significant effect.

Future developments of this work include expanding the set of independent variables provided as input to the models and the number of patients included in the study, in order to improve the performance of the algorithms and provide the most accurate tool possible.