Efficient management strategy of COVID-19 patients based on cluster analysis and clinical decision tree classification

Early classification and risk assessment of COVID-19 patients are critical for improving their terminal prognosis and for preventing deterioration into severe or critical illness. We performed a retrospective study of 222 COVID-19 patients treated in Wuhan between January 23rd and February 28th, 2020. Cluster analysis and multiple-factor logistic regression were performed to assess the predictive value of presumptive clinical diagnosis and of clinical features, including the characteristic signs and symptoms of COVID-19 patients. Therapeutic efficacy was evaluated by Kaplan–Meier survival curve analysis and Cox risk regression. The 222 patients were clustered into two groups: cluster I (common type) and cluster II (high-risk type). High-risk cases can be identified from their clinical characteristics, including age > 50 years and chest CT images with multiple ground glass or infiltrating shadows, among others. Based on the classification and risk factor analyses, a decision tree algorithm and a management flow chart were established, which can help identify individuals who need hospitalization and improve the clinical prognosis of COVID-19 patients. Our risk factor analysis and management process suggestions are useful for improving the overall clinical prognosis and optimizing the utilization of public health resources during the treatment of COVID-19 patients.

www.nature.com/scientificreports/
same clinical treatment methods. Furthermore, the existing SARS-CoV-2 nucleic acid testing methods and reagents show significant differences and deviations, with high false negative and false positive rates, which blurs the statistical estimate of the true positive rate 9. Objective clinical features are relatively stable across individuals and have been shown to help predict the prognosis of various diseases. Therefore, although the nucleic acid test is crucial for confirming SARS-CoV-2 infection, the objective clinical features of COVID-19 patients should be more convincing.
According to the clinical data, the estimated incubation time for COVID-19 is 4 days (interquartile range: 2-7 days). About 81% of COVID-19 patients have uncomplicated or mild illness, while 19% might develop severe or critical illness 1,10,11. Older patients and those with comorbidities have been shown to be at high risk of progressing to severe or critical illness, or even death 11,12. Heterogeneity of COVID-19 has been reported: patients can present distinct prognoses following treatment. However, the factors associated with the different prognoses of COVID-19 patients, and the clinical identification of severe or critical cases at an early stage, remain unclear and are not resolved by the current diagnosis and treatment guidance for COVID-19. Therefore, it is urgent to establish effective clinical pathways and processes for the clinical classification of cases, in order to distinguish severe and critical cases at an early stage among confirmed and suspected cases, and to identify and prevent deterioration of the disease.
In the present study, we performed a retrospective classification of confirmed COVID-19 cases in Wuhan with similar early clinical features, including both nucleic acid positive and negative cases. Although the true positive and negative rates of these cases were not confirmed, we consider that these patients can fully represent the overall picture of COVID-19. Compared with previous studies, we examined the different disease courses and prognoses of COVID-19 patients and classified these cases clinically based on objective clinical features. We further derived an efficient flow chart for the prompt diagnosis and appropriate management of COVID-19 patients. Our research offers an identification method and a clear treatment process for general COVID-19 patients, distinguishing cases that might deteriorate into severe or critical illness and improving their terminal prognosis.

Methods
Data collection. All data were collected from COVID-19 patients admitted to the General Hospital of Chinese PLA Central Theater Command between January 23rd, 2020 and February 28th, 2020. The patients were ordinary citizens of Wuhan who had either been cured or had died after clinical treatment. They were confirmed to be infected by a positive nucleic acid test or by clinical diagnosis. All cases were negative for other respiratory viruses, including respiratory syncytial virus and influenza viruses. The study was approved by the General Hospital of Chinese PLA Central Theater Command Ethics Committee. All methods were performed in accordance with the relevant guidelines and regulations. Because this is a retrospective study, the need for informed consent was waived by the General Hospital of Chinese PLA Central Theater Command Ethics Committee.
The data were collected using standard case report forms and included basic information, clinical symptoms, course of disease, comorbidities, chest CT findings, first blood test results, initial outcomes (cured, aggravation or death), and final outcomes (cured or death). Clinical classification of the cases in the database was performed by cluster analysis based on objective clinical indexes of the patients, including gender, age, course of disease, comorbidities, clinical symptoms and chest CT images.
The prognosis of the patients was classified into early and terminal prognosis. Early prognosis is defined as the status of each patient at the time of the first disease transition: cured, aggravation or death. Terminal prognosis is defined as the final outcome of each patient: cured or dead. Acute exacerbation was defined as progressive worsening along the sequence of mild, common, severe and critically ill. Respiratory failure or death of mild or common cases after three days of hospitalization was also defined as acute exacerbation.
Statistical analysis. Cluster analysis was used to explore the factors influencing disease prognosis and the clinical typing of the disease. Survival analysis and Cox regression analysis were performed to evaluate the effects of treatment interventions and the associated prognostic risks. The K-means method was adopted for the cluster analysis; data processing and calculation were performed in SPSS statistical software version 26.0 (IBM Corp, Armonk, NY, USA, 2011). The decision tree model was built using the exhaustive CHAID method (exhaustive chi-squared automatic interaction detection) and validated by confusion matrix analysis. Count data were expressed as percentages, and measurement data were expressed as mean ± standard deviation (SD). The chi-square test and Fisher's exact test were used to compare count data. The independent-samples t test was used for the analysis of measurement data. A p value < 0.05 was considered statistically significant.
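As a rough illustration of the K-means step, the sketch below clusters binary feature vectors into two groups in plain Python. It is not the SPSS procedure used in the study: the feature columns, the six toy records, the function names and the deterministic farthest-point initialization are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, max_iter=100):
    """Minimal K-means with k = 2 (Lloyd's algorithm).

    Initialization is an assumption of this sketch: the first centroid is
    the first record, the second is the record farthest from it.
    """
    centroids = [list(points[0]),
                 list(max(points, key=lambda p: sq_dist(p, points[0])))]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [min((0, 1), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical records, NOT study data. Columns (illustrative only):
# age > 50, any comorbidity, cough, multiple ground glass shadows on CT.
toy_records = [
    (0, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 0),
    (1, 1, 1, 1), (1, 1, 0, 1), (1, 0, 1, 1),
]
labels, centroids = two_means(toy_records)
print(labels)
```

On real data the result of K-means depends on the initialization and on how the variables are coded, so a sketch like this would need several starts and a check of cluster stability before any clinical interpretation.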

Results
A total of 222 confirmed COVID-19 cases were included in this study; their clinical features are listed in Table 1 and Fig. 1. According to the final outcome, the cases were divided into two groups (recovery group and death group), and their clinical features were compared (Table 1). These data also demonstrated that clinical diagnosis, rather than nucleic acid testing alone, is essential for confirming COVID-19. Epidemic history, objective clinical features and chest CT manifestations should be considered first for the timely treatment of COVID-19. No significant difference was found between the two groups (Chi square 0.020, P = 0.887), suggesting that the time at which the nucleic acid test result became negative has little effect on the final prognosis of COVID-19 patients. Therefore, the main goal of treatment for COVID-19 patients should not be nucleic acid negativity alone.
The following factors were finally applied in the cluster analysis: (1) age (> 50 years); (2) comorbidities including smoking, diabetes, hypertension, coronary heart disease, cerebral infarction and chronic renal failure; (3) clinical symptoms including cough, fatigue, anorexia and chest tightness; (4) chest CT manifestations such as multiple small patchy shadows, multiple ground glass shadows or infiltrating shadows. The 222 patients were then divided into two groups (Table 2 and Table S1). Based on their clinical characteristics and prognosis, the two groups were named cluster I (common type) and cluster II (high-risk type). As depicted in Fig. 2, the mean survival time for cluster II patients was 40.4 days (95% CI 37.8-43.0 days), which was significantly shorter (Kaplan-Meier survival curve analysis, chi square 8.873, P = 0.003) than that for cluster I patients (55.1 days, 95% CI 54.4-57.4 days).
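For readers unfamiliar with the Kaplan-Meier method used for this comparison, the following minimal sketch shows how the survival estimate is built up at each observed death time. The follow-up times and event flags are invented toy values, not study data, and the function name is hypothetical; the study itself used SPSS.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time of each patient (e.g. days)
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    curve = []
    surv = 1.0
    # Only times at which at least one death occurred change the estimate.
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Toy follow-up data (hypothetical): five patients, two censored.
toy_times = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"day {t}: S(t) = {s:.2f}")
```

Censored patients leave the risk set without lowering the curve, which is why the patient censored on day 8 still counts as at risk for the death on day 8 but not for the one on day 12.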
The main clinical features of cluster II patients were age > 50 years, cough, fatigue, anorexia and chest CT images with multiple ground glass shadows or infiltrates (Table 2). Other typical clinical features of cluster II patients included comorbidities such as smoking, diabetes, hypertension, coronary heart disease, cerebral infarction and chronic renal failure, as well as a hyper-inflammatory state (Table 2 and Table S1). According to the logistic regression analysis, shortness of breath, smoking, diabetes, hypertension, coronary heart disease, and multiple ground glass or infiltrative shadows on chest CT were risk factors for progression to severe or critical disease, while disease severity was not related to age, fever or positive nucleic acid test results (Table S2). Two types of CT manifestations were related to prognosis: multiple small patchy shadows, and multiple ground glass or infiltrating shadows. Patients with multiple small patches presented a better prognosis, with a lower exacerbation rate and mortality (Table S2). For patients with multiple ground glass or infiltrative shadows, the estimated mean time of progression to severe disease was 16.9 days (95% CI 13.6-20.2 days), significantly shorter than the 29.8 days (95% CI 29.3-30.2 days) for those without such shadows (Kaplan-Meier survival curve analysis, Chi square 43.687, P < 0.001). All patients were then classified into 4 groups according to their chest CT images (Table S3).
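The link between the chi-square statistics and the P values quoted in the Results can be checked with a few lines of Python. The sketch assumes each test has one degree of freedom (a comparison of two groups), which the text does not state explicitly; under that assumption the upper-tail probability has the closed form erfc(sqrt(x / 2)). The function name is hypothetical.

```python
import math


def chi2_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For 1 df the statistic is the square of a standard normal variable,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Chi-square statistics quoted in the Results; df = 1 is an assumption.
reported_statistics = [0.020, 8.873, 43.687]
p_values = [chi2_p_value_df1(s) for s in reported_statistics]
for stat, p in zip(reported_statistics, p_values):
    print(f"chi-square {stat}: p = {p:.3g}")
```

Under this assumption the first two statistics give values close to the reported 0.887 and 0.003, and the third gives a probability far below 0.001.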
Since no specific drug targeting COVID-19 has been established, multiple drugs are applied in treating the patients. Here we performed Kaplan-Meier survival curve analysis and Cox risk regression using the cluster analysis factors, aiming to give a brief evaluation of these drugs (Figure S1 and Table 3). Both nucleic acid negative and positive patients with anorexia presented an increased risk of death, which might be improved by using oseltamivir (Table 3). Treatment with lopinavir and ritonavir could reduce the risk of death in all patients, especially in nucleic acid positive patients. Oseltamivir can prolong the survival time of nucleic acid negative patients, whereas glucocorticoid and immunoglobulin can significantly shorten the survival time of nucleic acid positive patients; lopinavir and ritonavir, however, did not improve survival in either nucleic acid negative or positive patients (Figure S1). Based on the conclusions above, we established a decision tree for identifying severe or critical cases by combining our clinical experience with the analysis of factors involved in acute exacerbation and risk of death after the early prognosis (Figure S2 and Figure S3). This tree classification model can predict 90.1% of the patients at risk of acute exacerbation and death after early prognosis (risk probability 9.9%; Figure S2). In addition, we built a decision tree model without chest CT results, using shortness of breath, fever, number of comorbidities and age (> 50 years) as independent variables (Figure S3). This model can predict 86% of the patients at risk of acute exacerbation and death after early prognosis (risk probability 14%). Confusion matrix analysis was also employed to validate the decision tree models (Tables S4-S7).
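Confusion matrix validation of a binary classifier such as these decision trees can be sketched as follows. The labels are invented toy values rather than the contents of Tables S4-S7, and the function name is hypothetical. One possible reading, offered here only as an interpretation, is that the reported "risk probability" is the misclassification rate, since 90.1% + 9.9% and 86% + 14% both sum to 100%.

```python
def confusion_metrics(actual, predicted):
    """Confusion matrix counts and summary rates for binary labels (1 = event)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    total = len(pairs)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / total,
        "misclassification": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels (hypothetical): 1 = acute exacerbation or death, 0 = otherwise.
toy_actual = [1, 1, 1, 0, 0, 0, 0, 1]
toy_predicted = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = confusion_metrics(toy_actual, toy_predicted)
print(metrics)
```

Accuracy and misclassification always sum to one, while sensitivity and specificity show separately how well the high-risk and low-risk patients are recognized, which matters when the two outcomes are unevenly represented.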
Finally, we suggest an efficient therapeutic scheme for the treatment of COVID-19 patients in general areas (Fig. 3) and in areas with limited medical resources (Fig. 4). In both flow charts, the epidemiological history of the patient is considered first, followed by the clinical symptoms. For confirmed cases, a chest CT scan is then suggested to examine for pneumonia. For patients in areas with limited medical resources where chest CT examination is unavailable, age is recommended as the key indicator (Fig. 4). Patients without pneumonia should be checked by a further nucleic acid test or antibody test, which is essential for subsequent treatment (Fig. 3). General treatment should be applied to SARS-CoV-2 positive patients. These tests can be omitted in areas with limited medical resources (Fig. 4). By combining the clinical diagnosis and CT images, patients with pneumonia can be classified into 4 groups (A, B, C and D). Detailed medical therapy for these 4 groups can be further confirmed by classifying their clinical phenotypes (Figs. 3 and 4). Specifically, these patients can be divided into two types (common type and high-risk type) by cluster analysis using chest CT manifestations and negative clinical features such as age and comorbidities. Consequently, these patients can be treated promptly with the appropriate therapies, which is important before the nucleic acid test result is confirmed.

Discussion
A positive SARS-CoV-2 nucleic acid test is now considered crucial for confirming a COVID-19 case. However, a large number of nucleic acid negative patients in the endemic area had an epidemiological history and the same clinical manifestations and chest CT findings as COVID-19 patients, and they were neglected in the initial COVID-19 treatment plan. Considering the uncertain false negative rate of the nucleic acid test and the fact that their pneumonia could not be attributed to known viruses or other pathogens, these cases were included in our study, which was critical for diagnosing COVID-19 by reference to chest CT and clinical manifestations in the epidemic area. Routine testing for non-SARS-CoV-2 respiratory pathogens during the COVID-19 pandemic was considered unlikely to provide clinical benefit unless a positive result would change disease management (e.g., neuraminidase inhibitors for influenza in appropriate patients) 13. All patients in the current study were confirmed to be influenza virus negative, and their symptoms could not be alleviated by the anti-influenza drug arbidol; thus they cannot simply be regarded as patients with influenza virus-negative influenza pneumonia. In the absence of autopsy studies of these patients, we speculate that a positive nucleic acid test result might not be an early manifestation of COVID-19. Moreover, given the limitations of current nucleic acid detection technology or the characteristics of COVID-19, further research on the treatment of these patients is urgently needed. Our data also indicated that the clinical response and terminal prognosis of patients with similar chest CT and objective clinical manifestations were not affected by the timing or results of the nucleic acid test, or by when the nucleic acid test result changed from positive to negative.
We thus believe that objective clinical performance and the final clinical prognosis, rather than the nucleic acid negative conversion ratio alone, should be considered for the effective treatment of COVID-19 cases. We also suggest that improving the survival rate of COVID-19 patients, rather than alleviating their clinical symptoms, is the crucial criterion for evaluating the therapeutic effects of drugs and treatments. Our cluster analysis indicated that COVID-19 patients can be divided into groups with different clinical prognoses based on their chest CT features, objective clinical manifestations and related risk factors. Therefore, because of the heterogeneity of COVID-19, classified management of patients is essential in their isolation, protection and clinical treatment. Although a decrease in oxygen saturation is considered an indicator of severe cases, it is not feasible to identify severe illness effectively by referring to oxygen partial pressure and oxygen saturation. Here we found that patients with dry cough, abdominal pain and anorexia are prone to progress to severe illness or death. Radiography is an essential examination for the confirmation and early diagnosis of COVID-19 14. Multiple ground glass shadows and infiltrating shadows may occur before a positive nucleic acid test and are consistent with the pathological manifestations 15,16. Therefore, chest CT manifestations can be used to predict the immune state and the pathological and physiological conditions of COVID-19 patients.
Older COVID-19 patients and those with comorbidities have an increased risk of severe disease and death 11,17-21. We also found that age could be used for the clinical classification and prognosis of patients in areas without chest CT facilities. Nucleic acid positive cases with abdominal pain and anorexia, cases with more than two comorbidities, and cases with dyspnea, anorexia, and multiple ground glass or infiltrating shadows on chest CT images had higher risks of acute aggravation of illness and need timely hospital admission. We therefore consider that identifying the objective clinical manifestations of patients is more important for the timely management of COVID-19 patients in the epidemic area. The corresponding treatment should be given according to the risk evaluation, regardless of the nucleic acid test results, which is instructive for mild patients under home quarantine or hospital isolation.
Management of COVID-19 patients brings great challenges and stress to the health-care systems of epidemic areas. Therefore, effective clinical pathways and processes for the clinical classification of COVID-19 patients are essential to distinguish severe and critical cases at an early stage among confirmed and suspected cases, and to identify and prevent deterioration. Multiple studies have proposed factors such as comorbidities, inflammatory cytokines and lymphocytes as predictors of disease severity in COVID-19 patients, which can help to identify severe and critical cases in time 17-19,21,22. A recent article proposed developing and validating a clinical score at hospital admission to predict which patients with COVID-19 will develop critical illness 23. The flow chart we propose will help members of the general population assess themselves when they develop clinical features of COVID-19, and will also support the timely and efficient treatment of confirmed COVID-19 patients through fast classification.
Our study also has some limitations. (1) Some of the patients remained negative even after several SARS-CoV-2 nucleic acid tests. Since these patients could not be distinguished from nucleic acid positive patients with the same clinical features, we decided to include them in our study. Although this is close to real clinical practice, these patients might have been misdiagnosed, considering the possibility of false negative results for known respiratory viruses, including influenza viruses, as well as the absence of autopsy studies for such cases. (2) Since various methods were used for SARS-CoV-2 nucleic acid testing, the positive rate, false negative rate and negative predictive value of the test methods were not effectively evaluated. Although objective clinical features were emphasized, deviations might also exist in the evaluation of clinical data for these nucleic acid negative patients. We will perform antibody tests to further confirm SARS-CoV-2 infection in these patients in subsequent clinical practice. (3) This was a single-center study with few mild cases; further research will be performed to strengthen our conclusions.
The innovations of our study include: (1) The clinical classification based on objective clinical manifestations is helpful for the early identification of high-risk patients and their further clinical treatment; the epidemic history and objective clinical features of patients should be considered for early prognosis; (2) High-risk patients can be identified from their clinical characteristics (age > 50 years, chest CT images with multiple ground glass or infiltrating shadows, etc.); (3) The clinical effects of current treatments were evaluated; (4) The timing and purpose of the nucleic acid test are proposed based on prognostic classification; (5) A clear flow chart for the efficient management of COVID-19 patients is proposed, which can help allocate medical resources effectively.