Prognosis prediction in traumatic brain injury patients using machine learning algorithms

Predicting treatment outcomes in traumatic brain injury (TBI) patients is challenging worldwide. The present study aimed to identify the most accurate machine learning (ML) algorithms for predicting the outcomes of TBI treatment by evaluating demographic features, laboratory data, imaging indices, and clinical features. We used data from 3347 patients admitted to a tertiary trauma centre in Iran from 2016 to 2021. After the exclusion of incomplete data, 1653 patients remained. We used ML algorithms such as random forest (RF) and decision tree (DT) with ten-fold cross-validation to develop the best prediction model. Our findings reveal that among the variables included in this study, the motor component of the Glasgow coma scale, the condition of the pupils, and the condition of the cisterns were the most reliable features for predicting in-hospital mortality, while the patients' age takes the place of the cisterns' condition when considering the long-term survival of TBI patients. We also found that the RF algorithm is the best model for predicting the short-term mortality of TBI patients, whereas the generalized linear model (GLM) algorithm showed the best performance (with an accuracy rate of 82.03% ± 2.34%) in predicting the long-term survival of patients. Our results show that, using appropriate markers and with further development, ML has the potential to predict TBI patients' survival in the short and long term.


Literature review
In 2009, Guler et al. 13 investigated the application of artificial neural networks (ANN) to develop a diagnostic system and determine the severity of TBI. This small study analyzed simple clinical features among 32 cases, including vital signs, GCS, and electroencephalography (EEG), using a 3-layered ANN to quantify similarities. The study reported that neurological and systemic features of TBI cases were more than 90% similar.
Rughani et al. 14 used 11 clinical inputs to predict hospital survival in individuals with head injury by an ANN and compared it with clinician diagnosis and regression models. The data analysis of 7769 patients showed that ANN models are more accurate, sensitive, and discriminating than clinicians and regression models. The specificity, however, was the same across all models. Although this study showed that ANN would represent a more efficient model for predicting the outcomes of patients with head injuries, there is still a significant gap between the present models and the actual clinical scenarios.
In a study by Shi et al. 5 , ANN was used to develop more accurate predictor models for in-hospital mortality after TBI surgery. The clinical inputs of 16,956 patients were analyzed to compare the performance of ANN and logistic regression (LR) models. Like previous observations, this study showed that the ANN model is significantly more accurate, sensitive, and specific. Moreover, the ANN model demonstrated a higher area under the curve (AUC), positive predictive value (PPV), and negative predictive value (NPV). The findings showed that hospital volume, Charlson comorbidity index, length of stay, sex, and age would provide the best prediction of in-hospital mortality after TBI surgery.
Chong et al. 15 compared the efficiency of ML and LR in predicting TBI. This retrospective case-control study included 39 TBI cases and 156 age-matched controls hospitalized from 2006 to 2014. Then, the performance of ML and LR in the prediction of TBI was compared using receiver operating characteristics (ROC). The findings indicated that analysis of four novel features (involvement in road traffic accidents, loss of consciousness, vomiting, and signs of a base of skull fracture) by ML improved diagnostic parameters (sensitivity (94.9% vs 82.1%), specificity (97.4% vs 92.3%), PPV (90.2% vs 72.7%), NPV (98.7% vs 95.4%), and area under the curve (0.98 vs. 0.93)) in comparison with LR.
In 2015, Lu et al. 16 investigated the application of ANN in predicting long-term outcomes in TBI cases. This study included different clinical variables, such as GCS (at admission, 7th day, and 14th day), gender, blood sugar, white blood cells, history of diabetes and hypertension, pupil size, and diagnosis, to predict the 6-month GOS using ANN, Naïve Bayes (NB), DT, and LR. The findings from 128 adult participants showed that ANN has the best performance among the different models (AUC of 96.13%, sensitivity of 83.5%, and specificity of 89.73%).
Another study, by Beliveau et al. 17 , tried to optimize the prediction models of the one-year functioning of patients with TBI. Using clinical data from 3142 cases, this prospective study improved the diagnostic parameters of AI through novel techniques, including train and test subset selection. The results indicated that ANN and other models, like LR, generally have high accuracy with the same AUC.
The study by Pourahmad et al. 18 was another attempt to optimize outcome prediction models in TBI patients. The clinical features of 410 cases (including age, gender, CT scan findings, pulse rate, respiratory rate, pupil size, reactivity, and cause of injury) admitted to Shahid Rajaee Hospital with GCS ≤ 10 were analysed by a 4-layered ANN combined with DT. This hybrid model improved the accuracy (86.3% vs. 82.2%), sensitivity (55.1% vs. 47.6%), specificity (93.6% vs. 91.1%), and AUC (0.705 vs. 0.695) of the prediction of 6-month GOS in patients with TBI.
In 2019, Hale et al. 19 applied computed tomography (CT) scans in broadly diagnosing TBI. In this study, six clinical features and 17 different variables of CT scan of 480 patients (< 18 years old) were included in an analysis by a two-layer feed-forward ANN with 11 sigmoid hidden and softmax output neurons. The results of this study showed that applying a CT scan to diagnose clinically relevant TBI would significantly increase all diagnostic parameters and achieve a highly optimized predictive model in the future.
A recent study by Abujaber et al. 20 investigated the application of ML models to predict in-hospital mortality for patients with TBI. The clinical and demographic features of 1620 patients, alongside their CT scan findings, were included in this study to develop efficient models using ANN and support vector machines (SVM). The results showed that SVM is more sensitive (73 vs. 62), accurate (95.6 vs. 91.6), and specific (99 vs. 96) than ANN and has a higher AUC (96 vs. 93.5) and F-score (0.8 vs. 0.64) in predicting the in-hospital mortality.
Recently, Thara et al. 21 conducted a novel study comparing ML and nomogram performance in predicting intracranial injury in children with TBI. Initially, the clinical parameters of 964 young patients with mild TBI, such as age, sex, road traffic injury, loss of consciousness, amnesia, hemiparesis, scalp injury, bleeding per nose or ear, hypotension, bradycardia, seizure, GCS at the emergency department (ED), and pupillary light reflex, were fed to various classifiers, namely SVM, LR, NB, k-nearest neighbours, DT, RF, gradient boosting classifier (GBC), and ANN. The findings showed that RF best predicts pediatric TBI using different clinical features, especially CT scans.
In 2021, Hodel et al. 22 explored databases such as EBSCOhost CINAHL Complete, PubMed, and IEEE Xplore to find all publications that developed prediction models for spinal cord injury (SCI). The searches showed that twelve different predictive models were developed in seven unique studies to predict various clinical outcomes in patients with SCI. This review clearly showed that providing a comprehensive overview of patients with neurological traumas using different ML models would improve clinical decision-making in the future and minimize mistakes.
Mawdsley et al. 23 conducted a study to systematically review the efficiency of ML models in predicting different psychosocial aspects of TBI cases. This comprehensive study found nine studies that included eleven types of ML to predict various outcomes. The findings showed that although these models could successfully develop predictive models, there is a lack of evidence for choosing ML algorithms as a reliable tool in clinical decision-making.
In 2017, a critical review by Alanazi et al. 24 evaluated the quality of ML models in predicting patients' outcomes across different disorders. This study showed that AI can provide several promising models to predict these outcomes using patients' clinical, demographic, and imaging data. However, we still face some limitations in applying these models in clinical situations: some studies indicated that these novel models can demonstrate significant errors and low efficiency even when using the same database. Therefore, further studies are required to increase the reliability of the provided models in the future.
In 2022, Choi et al. 25 developed new models to predict the diagnosis and prognosis of TBI patients at the prehospital stage. This multi-center retrospective study included 1169 TBI cases that were admitted from 2014 to 2018 in different hospitals in Korea. Various features, such as intracranial hemorrhage, admission with/without the ED, and other demographic characteristics, were applied in five ML models, including LR, extreme gradient boosting, SVM, RF, and elastic net (EN). The findings of this study confirmed that EN would significantly develop the overview of the prediction of TBI outcomes at the prehospital stage by increasing AUC, specificity, and sensitivity.
In the same year, Daley et al. 26 tried to provide effective ML-based models to predict severe TBI in admitted patients. This study used neurological and biological data, such as partial thromboplastin time (PTT), the motor component of GCS, serum glucose, fixed pupil(s), platelet count, and creatinine, to evaluate the predictive performance of different ML algorithms in the prediction of TBI in 196 admitted children. The findings of this study showed that the optimized models achieve the highest available accuracy (82%) and AUC (0.90).
There are inconsistencies in choosing the best clinical or para-clinical features and the most accurate machine learning model to predict the TBI patients' outcomes. Hence, the present study is designed to address these problems by recruiting a large population and a wide range of variables using different ML and regression algorithms.
Dataset description. We used data from 3347 patients in the present study, collected from patients admitted to Shahid Rajaee Hospital (Tertiary Trauma Centre), Shiraz, from 2016 to 2021. After the exclusion of patients with incomplete data, 1653 patients remained. The mean ± SD age of the final studied population was 39.55 ± 19.41 years, and it consisted of 1371 men (82.9%). The set of features gathered from the studied patients is available in Table 1.
To use the dataset in this research regarding diagnostic and therapeutic purposes, institutional approval was granted on the grounds of existing datasets. Informed consent was obtained from all subjects and/or their legal guardian(s). All methods were compliant with relevant guidelines and regulations. To use data, ethical approval was obtained from Shahid Rajaee Hospital (Tertiary Trauma Centre), Shiraz, Iran.
The demographic features included age, gender, smoking (smoker, non-smoker), opium (addicted, non-addicted), health status, hypertension, diabetes mellitus, and cardiovascular disease, obtained by asking the patients while taking their history. Also, GCS and pupil condition (anisocoric/brisk/fixed/sluggish/unable to check/bilateral non-reactive) were measured during a physical exam. The laboratory data of patients, including international normalized ratio (INR), blood sugar (BS), and fibrinogen level, were recorded from reported measurements in electronic documents. The Marshall score, subarachnoid hemorrhage (SAH), intraventricular hemorrhage (IVH), epidural hematoma (EDH), subdural hematoma (SDH), intracerebral hemorrhage (ICH), base of skull fracture, depressed skull fracture, and cisterns were evaluated using CT-scan imaging. The GOS (1 = dead / 2 = vegetative state / 3 = severe disability / 4 = moderate recovery / 5 = good recovery) and GOSE (1 = dead / 2 = vegetative state / 3 = lower severe disability / 4 = upper severe disability / 5 = lower moderate disability / 6 = upper moderate disability / 7 = lower good recovery / 8 = upper good recovery) were measured on the discharge day (GOSE0) and after 6 months (fGOSE) by trained specialists. The validity and equality of the specialists' measurements were confirmed in a session evaluating 10 cases.

Methodology
We tested a few state-of-the-art ML algorithms on the dataset according to the flowchart shown in Fig. 1. The target features of our dataset (i.e. the extended GOS of recovered TBI patients at discharge (GOSE0) and at 6 months (fGOSE)) have eight values ({1, 2, …, 8}) that reflect the level of consciousness. A target value of 1 means the patient has died; at the other extreme, a target value of 8 means the patient can take care of his/her personal affairs. Unfortunately, when the target feature has 8 values (8 defined classes), the performance of the classification algorithms was poor. Therefore, we converted the target to a 5-class dataset according to the physicians' suggestion. To this end, classes 3 and 4, 5 and 6, and 7 and 8 were merged. As a result, the performance of the classification algorithms improved significantly.
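The class merging described above can be expressed as a simple lookup table (a sketch; the class codes follow the 8-level GOSE described earlier):

```python
# Merge the eight GOSE levels into five classes:
# 1 -> 1 (dead), 2 -> 2 (vegetative state), {3, 4} -> 3 (severe disability),
# {5, 6} -> 4 (moderate disability), {7, 8} -> 5 (good recovery).
GOSE_TO_5CLASS = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4, 6: 4, 7: 5, 8: 5}

def merge_gose(labels):
    """Map a list of 8-level GOSE labels onto the merged 5-class scheme."""
    return [GOSE_TO_5CLASS[y] for y in labels]

print(merge_gose([1, 4, 6, 8]))  # -> [1, 3, 4, 5]
```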
Given that multiple ML methods have been evaluated during our experiments, they are reviewed briefly in the rest of this section. This review will aid the understanding of the results achieved in the conducted experiments.

Naïve Bayes (NB). The name Naïve Bayes stems from the fact that this method naïvely assumes that the features representing input samples are independent 27 . This assumption is not always valid. The classification of input samples is based on the Bayes rule, and parameter estimation is done using maximum likelihood estimation. Suppose $C = \{C_1, \ldots, C_K\}$ is the set of possible classes; then the probability that sample $x = [x_1\ x_2\ \ldots\ x_n]$ belongs to class $C_k$ is computed as

$P(C_k \mid x) = \frac{1}{Z}\, P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k),$

where $Z$, called the evidence, is computed as

$Z = \sum_{k=1}^{K} P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k).$

Random forest (RF). One of the classic ML methods capable of handling both classification and regression is the random forest (RF), which is an ensemble approach. As the name implies, an RF is made of multiple decision trees, each of which consists of multiple decision and leaf nodes. For a classification problem with C classes, the training dataset features are used to create the nodes of the decision trees such that the Gini impurity measure is minimized 28 :

$\mathrm{Gini}(n) = 1 - \sum_{i=1}^{C} p(i)^2,$

where $p(i)$ is the probability that a sample from class $i$ is picked in node $n$. After creating the RF, a received test sample is passed down each decision tree, level by level, until it reaches a leaf node. The final step of RF is the aggregation of the decision tree outputs. For regression tasks, the aggregation is done by computing the average of the decision tree outputs. For classification tasks, majority voting is performed on the classes predicted by the decision trees to obtain the final output. The schematic of RF inference is shown in Fig. 2. As can be seen, each tree is built using a subset of features of the dataset samples.
After feeding the input sample to decision trees, majority voting is performed on their predictions to get the predicted class 29 .
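The two ingredients above, node purity and vote aggregation, can be sketched in a few lines (an illustration only, not the RapidMiner implementation used in this study):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum over classes of p(i)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def forest_predict(tree_outputs):
    """Aggregate per-tree class predictions by majority vote."""
    return Counter(tree_outputs).most_common(1)[0][0]

print(gini(["alive", "alive", "dead", "dead"]))   # -> 0.5 for a 50/50 node
print(forest_predict(["alive", "dead", "alive"])) # -> alive
```

A pure node (all labels identical) has zero impurity, which is why tree construction favours splits that separate the classes.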

K-nearest-neighbour (KNN). K-nearest neighbour (KNN) is a simple and powerful non-parametric
supervised method, which can be used for classification and regression. To classify a test sample, the K samples that are closest to the test sample (according to some distance metric) are chosen from the training dataset. In the case of regression, the predicted output for the test sample is computed by taking the average of the target values corresponding to the K chosen training samples. For classification tasks, the dominant label among the target labels of the K chosen training samples is chosen as the predicted label for the test sample. A typical classification using KNN with K = 8 is shown in Fig. 3. As can be seen, the training dataset contains three classes, the samples of which are shown with triangles, squares, and circles. The test sample is shown with a star. Assuming K = 8, the eight nearest neighbours of the test sample are the ones within the neighbourhood circle of the test sample. Given that the majority of the eight neighbours are squares, the label of the test sample is predicted as square 30 .
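A minimal pure-Python sketch of KNN classification (Euclidean distance, majority vote among the K neighbours); the toy training points and labels are illustrative:

```python
import math
from collections import Counter

def knn_classify(train, test_point, k):
    """train: list of (feature_vector, label) pairs; predict the label of test_point."""
    neighbours = sorted(train, key=lambda s: math.dist(s[0], test_point))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

train = [((0, 0), "square"), ((0, 1), "square"), ((1, 0), "square"),
         ((5, 5), "triangle"), ((5, 6), "triangle")]
print(knn_classify(train, (0.5, 0.5), k=3))  # -> square
```

For regression, the final line would average the neighbours' target values instead of voting.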

Rule induction (RI).
One of the ML methods closely related to decision trees is rule induction (RI), which extracts formal rules from observations such that information gain is maximized. The rules are in "if-then" format and are iteratively grown and pruned during the rule extraction process. The advantages of RI are that its rules are expressible in first-order logic and that prior knowledge is easy to encode in them 31 .
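A sketch of how an induced if-then rule list is applied at prediction time; the rules, feature names, and thresholds here are hypothetical, not extracted from the study's data:

```python
# Hypothetical if-then rules of the kind RI induces; the first matching
# rule fires, and a default conclusion covers unmatched samples.
rules = [
    (lambda p: p["pupil"] == "bilateral non-reactive", "dead"),
    (lambda p: p["gcs_motor"] <= 2, "dead"),
]

def rule_predict(patient, default="alive"):
    """Return the conclusion of the first matching rule, else the default."""
    for condition, conclusion in rules:
        if condition(patient):
            return conclusion
    return default

print(rule_predict({"pupil": "brisk", "gcs_motor": 6}))  # -> alive
```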
Deep learning (DL). DL is one of the most promising ML methods, capable of efficient feature extraction from high-dimensional data. Since the emergence of DL, many challenging high-dimensional problems have been solved. The primary building blocks of DL models are trainable filters (kernels) that are convolved with the previous layer's output (or the input sample) to extract salient features depending on the learning problem's objective. The process of convolving a typical 2 × 2 kernel with a 3 × 3 input image is depicted in Fig. 4: the kernel slides over the input, and at each position the element-wise products are summed to produce one entry of the output feature map.
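The 2 × 2-kernel-over-3 × 3-image operation can be reproduced with a small "valid" (no-padding) convolution routine; the pixel values and kernel weights below are illustrative, not those of Fig. 4:

```python
def conv2d(image, kernel):
    """Valid (no-padding) 2-D convolution: slide the kernel over the image
    and sum the element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]  # sums each pixel with its lower-right neighbour
print(conv2d(image, kernel))  # -> [[6, 8], [12, 14]]
```

As expected, a 2 × 2 kernel over a 3 × 3 input yields a 2 × 2 feature map.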

Gradient boosting trees (GBT).
Ensemble learning has proved to be robust and reliable in challenging learning tasks. Gradient Boosting Trees (GBT) employs an ensemble of decision trees (weak learners) to achieve good classification/regression performance while keeping the computational complexity manageable. To this end, decision trees are constrained to be shallow in depth. As shown in Fig. 5, GBT builds the first shallow decision tree using the available training samples. The samples that are misclassified by the first decision tree (set S 1 ) are then used to build the second tree. The sample set S 2 that has been misclassified by the second decision tree is used to build the third decision tree. The process continues until all of the training samples are classified correctly. The set of built decision trees forms the GBT ensemble classifier. During testing, all decision trees classify the given test samples, and their predictions are aggregated to compute the final output of the GBT 33 .
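The stage-wise idea, in which each new weak learner corrects the errors left by the ensemble so far, can be sketched for regression as follows. Constant predictors stand in for shallow trees here, and the learning rate `lr` is an illustrative choice; real GBT fits shallow decision trees to the residuals:

```python
def fit_gbt(y, n_stages=3, lr=0.5):
    """Stage-wise boosting sketch: each stage fits the residuals of the
    current ensemble with a trivial weak learner (the mean residual)."""
    pred = [0.0] * len(y)
    stages = []
    for _ in range(n_stages):
        residuals = [t - p for t, p in zip(y, pred)]
        step = sum(residuals) / len(residuals)  # weak learner: mean residual
        stages.append(lr * step)
        pred = [p + lr * step for p in pred]
    return stages, pred

stages, pred = fit_gbt([2.0, 4.0, 6.0])
print(pred)  # -> [3.5, 3.5, 3.5]: predictions approach the target mean stage by stage
```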

K-fold cross-validation.
In ML problems, it is customary to split the available dataset into K disjoint subsets of equal size and repeat the training process K times. In the kth training trial, the kth subset is used for testing and the remaining K − 1 subsets are used as training data 34 . As an example, the process of splitting the dataset into K = 3 subsets (also known as folds) is shown in Fig. 6.a. The three subsets {D 1 , D 2 , D 3 } have no sample in common and are completely disjoint. After splitting the dataset, the training process is repeated K times. In the ith training trial, D i is used as the test set. The configuration of training and test sets for K = 3 is shown in Fig. 6.b. The K training trials yield K values per performance metric, which are averaged to report the final performance of the ML methods. The motivation behind K-fold cross-validation is the possibility of testing ML methods on all available samples. Moreover, aggregating the performance metrics via averaging leads to a more reliable performance evaluation of the methods mentioned above.
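The splitting scheme can be sketched as follows; the interleaved fold assignment is one simple choice among many:

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k disjoint, (near-)equal folds; in trial i,
    fold i is the test set and the remaining folds form the training set."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    trials = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        trials.append((train, test))
    return trials

for train, test in k_fold_indices(6, k=3):
    print("test:", test, "train:", train)
```

Every sample appears in exactly one test set across the K trials, which is what makes the averaged metrics representative of the whole dataset.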

Results
In this section, the obtained results are presented. In all of the remaining tables, the abbreviations Acc, Prec, Rec, and Avg stand for accuracy, precision, recall, and average, respectively. We first applied some of the most important classification algorithms to predict the patients' condition at hospital discharge, which yielded the results in Table 2. The classification algorithms used in this work are NB 42 , RF 43 , KNN (k = 5) 44 , KNN (k = 6), DT 45 , RI 46 , DL 47 , and gradient boosting trees (GBT) 48 , implemented in RapidMiner v9.10 49 . RapidMiner is a comprehensive data science platform with visual workflow design and full automation, and is one of the most popular data science tools. The platform was run on a personal computer with an Intel(R) Core(TM) i5-4570 3.20 GHz processor and 4 GB of RAM. According to the obtained results, GBT, DL, and RF have the best accuracy rates of 47.67% ± 2.65%, 46.22% ± 1.60%, and 45.37% ± 1.53%, respectively, while KNN (k = 5) has the worst, with an accuracy rate of 33.82% ± 2.07%.
As we have more than two classes in this test, only recall for each class was calculated. Both accuracy and recall of investigated algorithms are shown in Table 2. Table 3 shows the top 10 features with a higher role in classification and their weights. The weights are calculated by information gain 50 . The GCS motor component on admission (GCSM0), pupil, and Cisterns are the most significant features in classification, respectively.
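A minimal sketch of the information-gain computation underlying the feature weights: the entropy reduction obtained by splitting the labels on a feature. The toy labels and feature values below are hypothetical, not taken from the study's dataset:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy reduction from splitting the labels on a discrete feature."""
    n = len(labels)
    split = {}
    for v, y in zip(feature_values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in split.values())
    return entropy(labels) - remainder

# Hypothetical toy example: a perfectly informative binary feature.
labels = ["dead", "dead", "alive", "alive"]
feature = ["fixed", "fixed", "brisk", "brisk"]
print(information_gain(labels, feature))  # -> 1.0 (one full bit)
```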
Six months after leaving the hospital, when the target feature is fGOSE, the patients' conditions were investigated again. As shown in Table 4, GBT, RF, and DL have the best accuracy rates of 64.97% ± 1.62%, 64.97% ± 2.72%, and 64.37% ± 1.56%, respectively, while KNN (k = 5) has the worst, with an accuracy rate of 55.89% ± 3.72%. As we have more than two classes in this test, only the recall of each class was calculated, and it is shown in Table 4. In addition, comparing the average accuracy in Table 2 with that of Table 4 shows that predicting the condition of the patients according to the selected features is more reliable at 6 months. Table 5 shows the top 10 features with a higher role in classification and their weights, calculated by information gain. GCSM0, pupil, and age are the most significant features in classification, respectively. Compared to Table 3, the importance of age has increased, and its role is now more important than that of the cisterns.
We also checked the system's performance when the patients were classified into only two groups, dead and alive. In this case, in addition to the classification algorithms mentioned above, two more algorithms, LR and GLM, were also investigated; these can be applied only to two-class classification problems. Here, the performance of the classification algorithms was again improved compared with the 5-class dataset. The results of classifying patients as either dead or alive when they leave the hospital are shown in Table 6. Accordingly, the accuracy rates of all algorithms are more than 80%, which shows a significant improvement compared with the classification algorithms applied to the 5-class dataset. In addition, there is no significant difference between the accuracy rates of most of these algorithms; all have a performance between 80 and 85%. The precision, recall, and AUC are also shown in this table.
Table 3. Feature weights calculated by information gain, applied to the 5-class dataset of GOSE0. GCSM0: motor component of GCS on admission, DC: decompressive craniotomy, INR: international normalized ratio, IVH: intraventricular hemorrhage, BS: blood sugar.
According to the results shown in Table 6, RF, GLM, and RI have the best accuracy rates, respectively. The confusion matrix of the best-performing RF classifier is shown in Table 7. Table 8 shows the top 10 features with a higher role in classification and their weights, calculated by information gain. As in Tables 3 and 5, the pupil has a significant role in classification, and the order of the other features does not differ substantially from those tables.
The results of applying the classification algorithms to the 2-class dataset six months after leaving the hospital are shown in Table 9, and Tables 10 and 11 show the importance of the features in classification. Comparing the average accuracies in Tables 6 and 9 shows that the accuracy rate does not change significantly after six months of the patient's discharge. Finally, the confusion matrix of the best-performing GLM algorithm is shown in Table 10.
Overall, according to the results shown in Tables 2 and 4, GBT has the best performance, with RF and DL in the next ranks. Meanwhile, the accuracy ranks in Tables 6 and 9 show that GLM, LR, and RF perform better than the other compared algorithms on these data. Finally, it should be noted that LR and GLM could only be applied to the 2-class datasets of Tables 6 and 9. The rank-based analysis of the investigated algorithms is shown in Table 12.

Discussion
The present longitudinal study primarily aimed to predict the GOS of recovered TBI patients at discharge and six months after discharge. Our findings showed that the different machine learning algorithms applied in this study provide acceptable performance using the collected health status, demographic features, clinical physical exams, and laboratory data. The first steps of prediction begin with classifying the severity of TBI cases by baseline features. There have been controversies about the ability of ML to outperform human neurologists: it has previously been claimed that ML algorithms were no more efficient than neurologists 13 . However, Rughani et al. showed that an ANN can outperform regression models and clinicians' categorizations in the survival prediction of TBI patients, achieving an accuracy of 73% 14 .
The first aim of this paper was to find the most reliable prognostic markers related to TBI. Several features have been introduced as the most reliable variables in recent years. Shi et al. achieved acceptable predictive DL models for in-hospital mortality in patients with TBI based on clinical and demographic features such as gender, age, and the Charlson comorbidity index 5 . Other features, including vomiting, signs of a skull base fracture, loss of consciousness (LOC), and history of traffic accidents, have been introduced as well 15 . However, our assessments identified a different set of top-ranked features 51 . Several factors may account for the different findings among the studies, such as entering different variables into the analysis. For instance, we utilized the motor component of GCS rather than the total GCS, which is broadly used in various trials 16 . Supporting our findings, previous studies confirmed that using the motor component of GCS would provide more accurate models than the total GCS 26 .
The second aim of the present study was to provide efficient ML and statistical models to predict the short- and long-term outcomes of TBI patients. As discussed earlier, the outcomes of TBI can be appropriately predicted using the clinical features of the first day of admission 9 . The first evaluations emphasized that all prediction models, whether based on ML or LR, would achieve a high success rate 17 . According to our findings, the RF, LR, and GLM models are the most accurate models for predicting the in-hospital mortality of patients (based on the 2-class GOS).
On the other hand, GLM (with an accuracy of 82%) was found to be the most accurate predictor of 6-month mortality. Instead, when using the 5-class GOS, GBT was the most accurate predictor of both in-hospital and 6-month follow-up morbidity and mortality. However, as described in the results, the accuracy on the 5-class GOS is lower than on the 2-class GOS. Matsuo et al. found that RF is the best model for predicting in-hospital outcomes following TBI, which supports our results 52 . Lu et al. conducted a study to compare the efficacy of different ML models and LR in predicting 6-month GOS; ANN showed the best performance using clinical features, with an AUC of 0.96 16 .
Applying CT scans in prediction models based on ANN achieved promising outcomes in forecasting the TBI prognosis 19 . As an example, Abujaber et al. employed CT scans as part of their feature set and reported SVM as the best method for in-hospital mortality prediction of TBI patients 20 . In a similar attempt, Steyerberg et al. introduced the Marshall score (a CT scan index) as a major feature for predicting TBI outcomes, alongside glucose, hemoglobin, hypotension, and hypoxia 10 .
The race toward achieving reliable ML models for robust clinical decision-making continues 53 . For example, Lang et al. provided clinical decision support for TBI patients capable of reducing 7-day mortality, showing the potential of ML in clinical decision-making 54 . On the contrary, ML failed to outperform LR in predicting the outcome in a large database of patients with moderate to severe TBI 55 . As a result, it has been suggested that the main focus must be on including valuable prognostic markers rather than on the choice of ML algorithm. Using a more limited number of features and lacking serologic markers, Bruschetta et al. 56 also reported that LR and ML may have similar performance. Finally, Kazim et al. 57 reported that ML performance is similar to correlation and multiple linear regression analysis; however, the reported results were based on only 168 patients with severe TBI. To present our contribution relative to the ML-based TBI diagnosis methods reviewed above, they have been summarized in Table 13.
The novelties of our proposed model are as follows:
1. We obtained high performance using simple ML algorithms.
2. We employed a larger number of patients and more features than the existing literature.
3. We gathered a TBI dataset in Iran.
4. New features, such as INR, fibrinogen level, and CVD/CVA, which have not been considered in previous studies, were investigated.
5. Well-known classic ML methods (NB, RF, KNN, DT, RI, GBT), as well as DL, were benchmarked on TBI survival prediction.
6. The collected dataset was analysed to determine the features with a significant impact on fGOSE and GOSE0.
The calculated weights have been reported in Table 3, Table 5, Table 8, and Table 10.
The limitations of our automated system are as follows:

Conclusion
In this work, we have used ML methods such as RF and GLM for the survival prediction of TBI patients over short- and long-term periods. However, significant development must be made before ML methods are ready for deployment in safety-critical applications such as medical diagnosis. According to our findings, the condition of the pupils, GCSM, the condition of the cisterns, and the patients' age are the best predictors of their survival. As future work, the investigated models must be further evaluated. To this end, we plan to prepare larger and more versatile datasets from multiple medical centers; access to larger datasets leads to more robust model training and more reliable evaluation. While we focused only on the mortality rate of TBI patients, investigating patients' conditions after a predefined amount of time is worthy of future research.

Data availability
The datasets used and analysed during the current study are available from the corresponding author upon request.