Supervised Machine Learning for Risk Stratification of Influenza-Like Illness: A Model to Prioritize Emerging Infectious Disease Disaster Responses

Emerging infectious diseases (EIDs), including the recent COVID-19 pandemic, have emerged and caused global public health crises in recent decades. Without pre-existing protective immunity, an EID may spread rapidly and cause mass casualties within a very short time. It is therefore imperative to identify patients at risk of disease progression so that medical resources can be allocated optimally when medical facilities are overwhelmed by a flood of patients. This study aimed to exploit machine learning technologies to cope with this challenge. The study was based on 83,227 hospital admissions with influenza-like illness, and we analysed the effects of 19 comorbidities, along with age and gender, on the risk of severe illness or mortality. The experimental results revealed that conventional decision tree (DT) models built with only 6 features, namely age, gender, and four comorbidities, delivered the same level of prediction accuracy as state-of-the-art deep neural network models built with 18 features. Accordingly, we further studied how DT models with different sensitivity levels can be used to determine patient triage and optimize medical resource allocation in different stages of an EID disaster, to aid frontline clinicians and policy-makers. In conclusion, our study demonstrated an approach to exploit machine learning technologies to cope with the challenges posed by the outbreak of an EID.

Response" for your consideration as an original research article in Scientific Reports. Our study was motivated by the observation that emerging infectious diseases (EIDs), including the recent COVID-19 pandemic, have led to several global medical and public health disasters. An EID may spread rapidly and overwhelm medical facilities when medical demand far exceeds their capacity. Consequently, it is vital to triage patients according to their risk of severe disease in order to optimize the utilization of medical resources. In this study, we aimed to address this challenge with machine learning.

This interdisciplinary study was based on a national health insurance database collected in Taiwan. We analyzed 83,227 subjects admitted with influenza-like illness for risk of fatality or of requiring critical care. The risk factors considered included age, gender, and 19 comorbidities. We investigated the performance delivered by three different types of prediction models, namely decision tree (DT) models, deep neural network (DNN) models, and logistic regression models. Our results showed that the DT prediction models built with only 6 features delivered performance comparable to the state-of-the-art DNN prediction models. As the structures of the DT models explicitly reveal the prediction logic, we concluded that DT models can be easily adopted by clinicians to provide high-risk patients with the best medical treatments. Furthermore, DT models with alternative sensitivity levels can be applied in different stages of an EID disaster to optimize medical resource allocation effectively.

There is no conflict of interest in the submission of this manuscript, and the manuscript has been approved by all authors for publication. On behalf of all co-authors, we declare that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part.

We deeply appreciate your consideration of our manuscript for publication, and we look forward to receiving comments from the reviewers. If you have any questions, please do not hesitate to contact us at the following addresses.

emerging infectious disease may spread efficiently and lead to large numbers of severe cases and deaths in the community [5].
In particular, with a highly contagious novel respiratory infectious disease [6], medical resources, including medications and personal protective and life-supporting equipment, may be quickly exhausted once hospitals are overwhelmed with infected patients [7,8]. This may inevitably cause excess mortality, as demonstrated in many countries during the 2020-2021 COVID-19 pandemic [8,9]. As the clinical spectrum of emerging respiratory infections may range from asymptomatic infection or mild respiratory symptoms to severe pneumonia or acute respiratory distress syndrome [10,11], it is imperative for first-line physicians to prioritize scarce medical resources for critically ill patients and for early symptomatic patients at high risk of rapid progression and death [9,12]. However, in the early stage of the outbreak of a novel respiratory infectious disease, there is usually no prior knowledge or available guideline to help physicians optimize medical decisions. Accordingly, it is of interest to investigate how machine learning (ML) technologies can be exploited to cope with this challenge.

In recent years, ML technologies have been widely exploited in medical and public health research [12-14].

ML algorithms are highly effective at analyzing interactions among multiple complex variables in clinical databases and at making accurate predictions that might otherwise require a medical practitioner months or even years of accumulated experience to arrive at. However, there is a wide range of ML algorithms with very different characteristics and design goals. At one end of the spectrum, advanced ML algorithms such as the deep neural network (DNN) [14,15] and the support vector machine (SVM) [16] employ complicated non-linear transformations to achieve superior prediction accuracy. Because of these complicated transformations, however, it is extremely difficult to determine how such algorithms arrive at their predictions. At the other end of the spectrum, ML algorithms such as decision trees (DT) [17-19] and the naïve Bayesian classifier follow highly interpretable decision processes to make predictions [20] but may suffer from inferior prediction accuracy because their prediction process lacks non-linear transformations. The trade-off between prediction accuracy and interpretability among alternative ML algorithms may be an everlasting dilemma whose resolution depends on the clinical application. As pointed out by Flaxman and Vos, for some applications an explainable approach is more understandable and favourable for physicians even when it results in a slight reduction in accuracy [20].

In this study, we conducted a comprehensive investigation of how ML algorithms can be exploited to cope with the challenges described above. We analyzed how different types of prediction models, namely conventional decision tree (DT) [21,22] prediction models, state-of-the-art deep neural network (DNN) prediction models [15], and logistic regression (LR) based prediction models, performed in identifying patients at high risk of rapid progression and death. Based on the experimental results, our discussion focuses on how DT prediction models can be incorporated to facilitate physicians' decisions on triaging patients by risk of clinical deterioration and on prioritizing scarce medical resources for the best population outcomes during an EID pandemic.

We conducted this study based on the reimbursement data of one million randomly sampled subjects extracted from the de-identified National Health Insurance Research Database (NHIRD) in Taiwan. Figure 1 shows the process used to generate the cohort. We began with 92,376 hospitalized ILI cases during

Presence of a comorbidity was defined by whether the patient had been coded with the corresponding ICD-9-CM codes within 12 months prior to the index date of the ILI-related hospitalisation.

From the initial 92,376 hospitalized ILI cases, we excluded 250 cases with unrecognized identity, who absconded from hospitals, or who committed suicide, and an additional 2,687 cases with incomplete records. Then, we merged two consecutive records of the same patient if they were within 14 days of each other. In the end, a cohort containing 83,227 cases was created (Figure 1).
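The record-merging step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Admission` record layout is hypothetical (the NHIRD schema is not described here), the 14-day window is assumed to run from one discharge to the next admission, and a chain of closely spaced admissions is assumed to collapse into a single episode. By the counts given, the two exclusion steps leave 92,376 - 250 - 2,687 = 89,439 records, so merging accounts for the remaining reduction to 83,227.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass
class Admission:
    # Hypothetical record layout for illustration only.
    patient_id: str
    admitted: date
    discharged: date


def merge_consecutive(records, window_days=14):
    """Merge a patient's consecutive admissions when the next admission starts
    no more than `window_days` after the previous discharge (assumed rule)."""
    merged = []
    ordered = sorted(records, key=lambda r: (r.patient_id, r.admitted))
    for _, stays in groupby(ordered, key=lambda r: r.patient_id):
        episode = None
        for stay in stays:
            if episode is not None and (stay.admitted - episode.discharged).days <= window_days:
                # Extend the current episode rather than opening a new case.
                episode = Admission(episode.patient_id, episode.admitted,
                                    max(episode.discharged, stay.discharged))
            else:
                if episode is not None:
                    merged.append(episode)
                episode = stay
        if episode is not None:
            merged.append(episode)
    return merged
```

Because records are sorted by patient and admission date before grouping, the sketch does not depend on the input order of the claims.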

The outcome of interest was severe ILI, defined as the occurrence of fatality or of critical care, such as intubation, ventilator support, extracorporeal membrane oxygenation treatment, or admission to an intensive care unit, during the hospitalization period. This study was approved by the Research Ethics Committee of National Taiwan University Hospital (#201603086RINB).
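The outcome definition above amounts to a logical OR over death and the listed critical-care events. A minimal sketch, assuming a hypothetical per-hospitalization record in which those events have already been mapped (from procedure or claim codes not given here) to the illustrative labels below:

```python
from dataclasses import dataclass, field

# Hypothetical labels for the critical-care events named in the outcome definition.
CRITICAL_CARE_EVENTS = {"intubation", "ventilator_support", "ecmo", "icu_admission"}


@dataclass
class Hospitalization:
    died: bool
    events: set = field(default_factory=set)


def is_severe_ili(stay):
    """Severe ILI: in-hospital death or at least one critical-care event."""
    return stay.died or bool(stay.events & CRITICAL_CARE_EVENTS)
```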

Experimental Procedures

Figure 2 shows the experimental procedure employed in this study to analyze the performance delivered by different types of prediction models. The analysis began with a 2-stage feature selection process. In the first stage, we employed conventional logistic regression (LR) analysis to eliminate features that were uncorrelated with the outcome variable. In the second stage, two advanced multivariate analysis methods, namely the least absolute shrinkage and selection operator (LASSO) method [31] and the ensemble variant of the minimum redundancy maximum relevance (mRMRe) method [32,33], were employed along with the proposed DT-based method to determine minimal feature subsets without compromising prediction performance. With the three feature sets output by LASSO, mRMRe, and the proposed DT-based method, we proceeded to build the DT [17-19], LR [34], and deep neural network (DNN) [15] prediction models. Finally, the performance of these prediction models was evaluated using 10-fold cross-validation [35].
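As a rough illustration of the 10-fold cross-validation step, the sketch below splits shuffled sample indices into ten folds of near-equal size and averages a caller-supplied score over the folds. It uses plain (unstratified) folds because the text does not say whether the folds were stratified by outcome; `fit` and `score` are hypothetical placeholders for whichever model-building and evaluation routines are used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(fit, score, X, y, k=10, seed=0):
    """Mean of score(model, X_test, y_test) over k folds (hypothetical interface)."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / len(scores)
```

Every sample lands in exactly one test fold, so each case contributes once to the held-out estimate.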

The feature selection process began with the 21-variable feature set shown in Table 1. Table 1 also shows the results of the first-stage LR based analysis. Since the p-values for mental illness, hypothyroidism, and hyperthyroidism were higher than 0.05, these three comorbidities were excluded. With the remaining 18 variables, we proceeded to carry out the DT-based multivariate analysis proposed in this study. In this procedure, the DT package shown in Supplementary Table S3 was employed, and the prior and cp parameters [18] were set to different values in order to generate models with various sensitivity levels. Supplementary Table S4 shows the DT models that delivered sensitivity at the 85%, 90%, and 95% levels. We then selected the 6 variables that were consistently present in all of these DT models.
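The two selection rules in this paragraph, dropping variables whose first-stage p-values exceed 0.05 and keeping only variables that appear in every DT model, reduce to simple set operations. The sketch below assumes the p-values and the variables used by each tree have already been extracted from the fitted models; the feature names and p-values in it are placeholders, not values from Table 1 or Supplementary Table S4.

```python
def screen_by_p_value(p_values, alpha=0.05):
    """Stage 1: keep variables whose univariate LR p-value does not exceed alpha."""
    return {name for name, p in p_values.items() if p <= alpha}


def consistent_features(tree_feature_sets):
    """Stage 2 (DT-based): variables present in every tree, e.g. the 85%, 90%,
    and 95% sensitivity models."""
    sets = [set(s) for s in tree_feature_sets]
    return set.intersection(*sets) if sets else set()


# Placeholder inputs for illustration only.
kept = screen_by_p_value({"age": 0.001, "gender": 0.02, "hypothyroidism": 0.40})
core = consistent_features([{"age", "gender", "x1"}, {"age", "gender"}, {"age", "gender", "x2"}])
```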

To evaluate the effectiveness of the proposed DT-based multivariate analysis, we further applied the LASSO [31] and mRMRe [32,33] methods to extract two additional 6-variable feature sets from the 18-variable feature set output by the first-stage feature selection process. We then built DT, LR, and DNN models based on these three 6-variable feature sets for performance evaluation.
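For readers unfamiliar with minimum redundancy maximum relevance selection, the sketch below shows the classic greedy mRMR criterion on discrete features: at each step it adds the feature whose mutual information with the outcome, minus its mean mutual information with the already selected features, is largest. This is a swapped-in simplification for illustration. It is not the ensemble mRMRe procedure [32,33] used in the study, it assumes discrete inputs (a continuous variable such as age would first need binning), and LASSO is not sketched.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) * log(p(a,b) / (p(a) p(b))), written in terms of raw counts.
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr_select(features, target, k):
    """Greedy mRMR: maximize relevance minus mean redundancy (difference form)."""
    selected = []
    candidates = list(features)

    def mrmr_score(name):
        relevance = mutual_information(features[name], target)
        if not selected:
            return relevance
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance - redundancy / len(selected)

    while candidates and len(selected) < k:
        best = max(candidates, key=mrmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

The redundancy term is what keeps a near-duplicate of an already chosen feature from being picked next, even when it is individually just as relevant.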

In this study, we developed two types of machine learning-based prediction models, namely DT [17-19] and DNN models [15]. The performance of the DT models is of interest because of the explicit decision rules produced by the DT algorithm, a unique feature favored by clinicians. However, the algorithm for building a DT model relies on univariate splits, each of which considers a single variable, and does not incorporate any linear or non-linear transformation. As a result, the prediction performance of DT models may not match that of more advanced prediction models on datasets in which different classes of samples are separated by non-linear boundaries. In this respect, with the advantage of non-linear transformations, state-of-the-art DNN models can generally deliver superior prediction performance compared with other types of prediction models [36]. However, a DNN model typically contains a large number of coefficients, so it is almost impossible for clinicians to discern the logic embedded in the prediction process. We further investigated the performance of conventional LR models [34,37] because logistic regression is widely used in medical and epidemiological research. Supplementary Table S3 summarizes the software packages and parameter settings employed to build the DT models and the main characteristics of the DNN models. With respect to the structure of the DNN models, we also investigated the performance of more complicated networks and observed that the simple network

We evaluated the performance of our prediction models with 10-fold cross-validation [35]. As shown in Supplementary Table S3, to generate DT models with alternative performance characteristics, e.g. different levels of sensitivity, we set the prior and cp parameters [18] to various values.
To generate LR and DNN models with alternative performance characteristics, we varied the cutoff values applied to the model outputs to discretize the numerical outputs into binary states. Model performance was evaluated based on several metrics, including accuracy, sensitivity, specificity, positive predictive value (PPV), and two additional metrics designed to report the overall performance of a prediction model, the F1 score [38] and the Matthews correlation coefficient (MCC) [38] (Supplementary Table S5). The F1 score is defined as the harmonic mean of the PPV and the sensitivity delivered by a prediction model and is a widely used performance metric in the machine learning research community. In recent years, biomedical researchers have also started to use the F1 score to report performance data [39].
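The metrics listed above follow standard confusion-matrix definitions, and the cutoff procedure for the LR and DNN outputs can be illustrated by scanning candidate thresholds. The sketch below is a minimal illustration, not the authors' evaluation code. It reports undefined ratios as 0.0, predicts the positive class (severe ILI) when a score is at or above the cutoff, and picks the largest observed score that still reaches a target sensitivity; all three conventions are assumptions.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts; undefined ratios give 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * ppv * sensitivity, ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    mcc = ratio(tp * tn - fp * fn, sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "f1": f1, "mcc": mcc}


def confusion_at_cutoff(scores, labels, cutoff):
    """Binarize numeric outputs (positive if score >= cutoff) and count outcomes."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= cutoff
        if predicted_positive and y:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def cutoff_for_sensitivity(scores, labels, target):
    """Largest observed score usable as a cutoff that reaches the target sensitivity."""
    for cutoff in sorted(set(scores), reverse=True):
        counts = confusion_at_cutoff(scores, labels, cutoff)
        if binary_metrics(*counts)["sensitivity"] >= target:
            return cutoff
    return None
```

Lowering the cutoff can only keep or raise sensitivity, so scanning from the highest score downward returns the most specific cutoff that meets the target.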

To conduct a comprehensive performance analysis, we built different types of prediction models with alternative feature sets. Table 2 summarizes the F1 scores delivered by these prediction models, and the comprehensive performance data are shown in Supplementary Tables S6(a)-(c). The alternative feature sets incorporated to build the prediction models included the three 6-variable feature sets identified by the proposed DT-based analysis, mRMRe, and LASSO, along with the 18-variable feature set identified by the logistic regression based analysis in the first stage of the feature selection process. With respect to performance data shown in Table 2  identified by the proposed DT-based method, mRMRe, and LASSO were 0.446, 0.438, and 0.437, respectively.

As decision-makers need to know how to allocate resources most appropriately under different scenarios, Figure 3(a)-(c) shows the DT models that delivered 95%, 90%, and 85% sensitivity, respectively. Since age was placed at the root of the tree in all three models, age appeared to be the most crucial factor. The DT model with 95% sensitivity revealed that patients aged over 37.79 or under 0.54 years were at high risk of severe ILI. Furthermore, the following two groups of patients were also at high risk of severe ILI: (1) patients aged between 14.21 and 37.79 with heart disease, CVA, diabetes, metastatic cancer; and (2) male patients aged between 34.46 and 37.79 (Figure 3(a)). The DT model with 90% sensitivity revealed that patients older than 66.04 years had the highest risk of progression to severe illness. Furthermore, female patients aged between 41.46 and 66.04 with CVA, diabetes, heart disease, and metastatic cancer were also at high risk of severe ILI (Figure 3(b)). The DT model with 85% sensitivity identified the following three groups of patients that suffered

We have conducted a comprehensive analysis of how machine learning algorithms can be exploited to stratify the risk of severe illness or death among hospitalized ILI patients. There were three major findings in this study. Firstly, the three different types of prediction models investigated in this study, namely the DNN models, the LR models, and the proposed DT based models, delivered comparable performance in predicting severe ILI after hospitalization. Secondly, the tree structures of the DT models explicitly illustrated how predictions were made and provide valuable guidelines for clinicians to develop effective strategies for risk stratification of ILI patients.
Thirdly, clinicians can employ DT models with appropriate sensitivity levels according to the availability of medical resources and public health needs in different epidemic stages of an EID disaster.
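To illustrate how the explicit structure of a DT model can be applied at the bedside, the sketch below transcribes the high-risk groups that the text reports for the 95%-sensitivity model in Figure 3(a). It is a hypothetical encoding, not the fitted tree. The text does not state whether the four comorbidities in group (1) are required jointly or individually, so "any of them" is assumed here, and the handling of values exactly at each age cutoff is also assumed. The `Patient` record and the comorbidity labels are illustrative names.

```python
from dataclasses import dataclass, field

# Comorbidities listed for group (1) of the 95%-sensitivity model (illustrative labels).
GROUP1_COMORBIDITIES = {"heart_disease", "cva", "diabetes", "metastatic_cancer"}


@dataclass
class Patient:
    age: float  # years
    male: bool
    comorbidities: set = field(default_factory=set)


def high_risk_95(p):
    """Hypothetical reading of the high-risk groups reported for the 95% model.

    Assumptions: group (1) means 'any of' the listed comorbidities, and the
    inclusive/exclusive treatment of each age boundary is guessed.
    """
    if p.age > 37.79 or p.age < 0.54:
        return True
    if 14.21 <= p.age <= 37.79 and p.comorbidities & GROUP1_COMORBIDITIES:
        return True  # group (1)
    if p.male and 34.46 <= p.age <= 37.79:
        return True  # group (2)
    return False
```

For example, under these assumptions a 20-year-old woman with diabetes would be flagged, whereas a 5-year-old with diabetes would not, because group (1) starts at 14.21 years.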

With respect to the performance of the different types of prediction models, namely the DT models, the LR models, and the DNN models, our results may be confusing for some machine learning experts who of prediction models may deliver comparable performance. In other words, the DNN models may not prevail in this case, which was exactly what we observed in this study. In fact, we observed a similar result in one of our recent studies on dengue [13].

With the DT models able to deliver performance comparable to the state-of-the-art DNN models, the explicit prediction rules presented in the DT structures provide valuable references for developing effective clinical strategies. All the studied DT models with different sensitivities identified older age as the most critical risk factor for severe ILI. This result is consistent with clinical experience, as older age, along with comorbid medical conditions such as diabetes [2,26], cirrhosis [41], and malignant diseases [26,42], has been recognized as one of the crucial risk factors for severe ILI. Furthermore, the cutoffs employed by the DT models to partition age groups are consistent with clinical insights. Moreover, these cutoffs, along with the comorbidities identified in the DT structures, provide clinicians with systematic clues regarding how to treat patients most effectively when facing an EID.

The DT models with different levels of sensitivity can be employed by first-line clinicians and policy-makers to adjust patient triage and resource allocation in different stages of an EID disaster (Figure 4). In the early stage of an EID disaster, when healthcare capacity is adequate, DT models with high sensitivity levels should be employed to identify patients at risk of disease progression so that they can be hospitalized and receive early and optimal treatment [9,43] to minimize deaths. As the EID disaster progresses, the tremendous increase in patient numbers and in the demand for medical resources may rapidly exceed the surge capacity of medical facilities. In this scenario, clinicians may be forced to discharge patients with mild symptoms at the early stage of infection, irrespective of their potential risk of subsequent deterioration [44]. Accordingly, DT models with a lower sensitivity should be employed to identify for hospitalization only those patients at high risk of progressing to severe infection or death, and thereby avoid the collapse of medical facilities. For example, with the DT model that delivered 85% sensitivity in our study, 48.3% of the admitted ILI patients might be released from medical facilities. This strategy is of particular value in EID pandemics such as COVID-19, since nearly all countries with community outbreaks experienced unprecedented mortality due to the collapse of their healthcare systems.
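The stage-dependent strategy described above can be framed as choosing, among the available DT models, the most sensitive one whose expected admission load fits current capacity. The sketch below is a hypothetical decision aid, not a procedure from this study: each model is summarized by its sensitivity and the share of admitted ILI patients it would keep hospitalized. Only one such share follows from the text (the 85% model releases 48.3% of patients, so it retains 51.7%); shares for the other models would have to be estimated from the fitted trees.

```python
def choose_model(models, capacity_share):
    """Return the most sensitive (sensitivity, retained_share) pair whose
    retained_share does not exceed capacity_share, or None if none fits.

    retained_share: fraction of admitted ILI patients the model flags for
    hospitalization; capacity_share: fraction the facilities can retain.
    """
    feasible = [m for m in models if m[1] <= capacity_share]
    return max(feasible, key=lambda m: m[0]) if feasible else None


retained_85 = round(1 - 0.483, 3)  # 0.517, derived from the 48.3% release figure
```

As capacity shrinks during the epidemic, the same rule moves automatically from the 95% model toward the 85% model, mirroring the progression described in the text.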

There are several limitations in the current study. Firstly, the diagnosis of ILI was based on ICD-9-CM codes without laboratory confirmation of influenza. Nevertheless, ILI-related clinical syndromes may be the best surrogate diagnostic category representative of patients with community-onset respiratory infections that may progress towards severe illness and death [24,26]. Secondly, other potential confounding factors that may influence the prognosis of respiratory infections, including obesity [45,46], smoking [47], geographic distribution [48], socioeconomic conditions [49], and the use of antiviral treatments [50,51], were not available in the NHIRD database. Thirdly, we did not investigate the performance of other advanced machine learning algorithms such as support vector machines, random forests, and Bayesian networks. Nevertheless, it is generally observed that DNN based prediction models can deliver comparable or even superior performance compared with other advanced machine learning algorithms. Fourthly, as our experimental data were extracted from a single national insurance reimbursement database, readers should be cautious in generalizing our findings before further validation studies are conducted.

In conclusion, our results showed that the DT-based prediction models delivered performance comparable to the DNN models in predicting ILI severity. The explicit prediction logic shown in the DT structures may be exploited to facilitate clinicians' decision-making. Furthermore, DT models with alternative sensitivity levels can be exploited in different stages of an EID disaster to optimize medical resource allocation, which is crucial in the response to a large-scale epidemic of emerging infectious disease.

The analysis began with a 2-stage feature selection process. In the first stage, conventional logistic regression analysis was employed to eliminate features that were uncorrelated with the outcome variable. In the second stage, the proposed DT-based method, along with two advanced multivariate analysis methods, namely the least absolute shrinkage and selection operator (LASSO) method [31] and the ensemble variant of the minimum redundancy maximum relevance (mRMRe) method [32,33], was employed to generate three 6-variable feature sets. These three 6-variable feature sets were then employed to build the DT, LR, and DNN prediction models. Finally, the performance of the alternative prediction models was evaluated with 10-fold cross-validation.

*Abbreviations: CVA (cerebrovascular accident).

In the early stage of an EID disaster, when healthcare capacity is adequate, DT models with high sensitivity levels should be employed to identify patients at risk of infection progression so that they can be hospitalized and receive the best treatment [9,43]. In the later stages of an EID disaster, the available medical resources may be exhausted due to a tremendous increase in patients.
In this scenario, the DT models with a lower sensitivity should be employed to recommend only those patients with a high risk of progressing to severe infection or death for hospitalization and thereby avoid collapse of medical facilities.

interests that might be perceived to influence the results and/or discussion reported in this paper.
Transparency: On behalf of all authors, the lead author affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.