Risk factors analysis of COVID-19 patients with ARDS and prediction based on machine learning

Xu, Wan; Sun, Nan-Nan; Gao, Hai-Nv; Chen, Zhi-Yuan; Yang, Ya; Ju, Bin; Tang, Ling-Ling

doi:10.1038/s41598-021-82492-x


Published: 03 February 2021


Scientific Reports volume 11, Article number: 2933 (2021)



Abstract

COVID-19 is a newly emerging infectious disease to which humans are generally susceptible, and it has caused enormous damage to human health. Acute respiratory distress syndrome (ARDS) is one of the common clinical manifestations of severe COVID-19 and is also responsible for the current worldwide shortage of ventilators. This study aims to analyze the clinical characteristics of COVID-19 patients with ARDS and to establish a diagnostic system based on artificial intelligence (AI) methods to predict the probability of ARDS in COVID-19 patients. We collected clinical data of 659 COVID-19 patients from 11 regions in China. The clinical characteristics of COVID-19 patients with and without ARDS were compared in detail, and both traditional machine learning algorithms and a deep learning-based method were used to build prediction models. The median age of ARDS patients was 56.5 years, significantly older (by 7.5 years) than that of non-ARDS patients. Male patients and patients with BMI > 25 were more likely to develop ARDS. The clinical features of ARDS patients included cough (80.3%), polypnea (59.2%), lung consolidation (53.9%), secondary bacterial infection (30.3%), and comorbidities such as hypertension (48.7%). Abnormal laboratory indicators such as lymphocyte count, CK, NLR, AST, LDH, and CRP were all strongly associated with the development of ARDS. Furthermore, among the AI methods used for modeling on the above risk factors and evaluated for predictive performance, the decision tree achieved the best AUC, accuracy, sensitivity and specificity in identifying mild patients at risk of developing ARDS, which could help deliver proper care and optimize the use of limited resources.


Introduction

The coronavirus disease 2019 (COVID-19) is an acute infectious pneumonia caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a virus previously unknown in humans. The virus spreads mainly through respiratory droplets and close contact and causes mild symptoms in the majority of cases, the most common being fever, dry cough, and fatigue^1,2.

The disease is characterized by fast transmission and strong infectivity³. Since the outbreak began in Wuhan, China, in early December 2019, it has rapidly developed into a worldwide pandemic, with more than 3 million confirmed cases in more than 200 countries, and the true number of infections is probably much higher. As of April 30, 2020, 217,769 people had died of COVID-19. Despite public health responses aimed at containing the disease and delaying its spread, the surge in demand for hospital beds, the shortage of medical equipment and the lack of specific drugs mean that patients with underlying diseases or of advanced age are more likely to progress to severe disease and death. Recent reports show that 14.1–33.0% of infected patients progress to severe disease, and the mortality rate of critical cases is 61.5%, increasing sharply with age and underlying comorbidities^4,5,6,7. Furthermore, medical staff may also be infected, confronting many countries with a critical care crisis. COVID-19 poses an important and urgent threat to global health.

Acute respiratory distress syndrome (ARDS) is a common and devastating critical illness⁸. It has been reported that 67% of patients with severe COVID-19 developed ARDS, which is the main cause of death⁹. However, in the early stage of the disease, many patients have no obvious clinical symptoms, which makes it difficult to identify them before ARDS occurs. Predicting which patients are more likely to develop ARDS, and thus face a greater risk of complications including death, is particularly important in a novel and accelerating outbreak¹⁰. Such predictions would also be useful for estimating the public health burden and resource demand on a large scale, e.g. in a city or a province.

Artificial intelligence (AI) has begun to tackle these difficult challenges in healthcare and can provide clinical decision support if used carefully¹¹. Currently reported COVID-19 prediction models mainly focus on epidemic trends, early screening, CT diagnosis, and prognosis^12,13,14,15. Few models have been developed for early identification of the patients most likely to develop ARDS or for recommending interventions. Xiang Bai et al. established a long short-term memory (LSTM) model that combined 75 clinical features with quantitative CT sequence data obtained at different time points to predict the malignant progression of COVID-19, achieving an AUC of 0.954¹⁶. Xiangao Jiang et al. used traditional machine learning methods such as decision tree (DT), random forest (RF), and support vector machine (SVM) to predict progression to ARDS in COVID-19 patients, with an overall accuracy of 70–80%¹⁰. That model was built on only 53 patients, which may explain its lower accuracy. The most frequently reported predictors of severe progression in COVID-19 include age, sex, features derived from computed tomography (CT) scans, C-reactive protein, lactate dehydrogenase, and lymphocyte count, and the C index of these models ranged from 0.85 to 0.98¹⁷. However, most reports did not describe the study population or the intended use of the models, and most were rated at high risk of bias. Early detection of patients who are likely to develop critical illness is of great importance and may help deliver proper care and optimize the use of limited resources. We aimed to develop a COVID-19 ARDS clinical decision support system using machine learning algorithms and to deploy it in the electronic medical record (EMR) system to help physicians identify severe patients at the time of hospital admission.

Results

Characteristics of COVID-19 patients

Tables 1, 2 and 3 list the distribution of demographic, epidemiological and clinical characteristics in the COVID-19 ARDS and non-ARDS populations.

Table 1 Demographic and epidemiological characteristics of the study patients.


Table 2 Clinical characteristics and underlying diseases.


Table 3 Radiologic, laboratory findings and complications.


Demographics and epidemiology

In this study, we collected data on a total of 659 patients with confirmed COVID-19 from Wuhan and non-Wuhan areas, of whom 76 (11.5%) developed ARDS. A total of 447 patients (70.9%) had contact with infected persons, and 44.3% had a family cluster of infection. The median incubation period was 5 days (interquartile range, 3 to 9), and the average times from onset to ARDS and from admission to ARDS were 10 days and 3 days, respectively. The median age of the patients was 50 years (interquartile range, 37 to 62), and 50.4% were male. Patients with ARDS were significantly older than non-ARDS patients, by a median of 7.5 years (56.5 vs. 49 years), and men were more likely to develop ARDS (76.3% of ARDS patients were male). More than 50% of ARDS patients had a BMI greater than 25. However, the exposure histories of the two groups were similar (Table 1).

Clinical characteristics and underlying diseases

On severity evaluation at admission, 75.4% of COVID-19 patients were classified as the ordinary type, whereas 80.3% of patients with ARDS were classified as severe or critical. The most common clinical symptoms at onset were fever (66.6%), cough (68.7%), expectoration (39.6%), fatigue (34.2%) and dry cough (29.7%). Encephalopathy (0.5%), hemoptysis (1.6%), vomiting (3.0%) and stuffy nose (3.8%) were uncommon. Compared with non-ARDS patients, ARDS patients more frequently had cough (80.3% vs. 67.2%) and dyspnea (59.2% vs. 11.6%). The median body temperature was 37.5 °C; in ARDS patients it was 0.5 °C higher than in non-ARDS patients (37.9 °C vs. 37.4 °C), a statistically significant difference (P < 0.001).

Overall, comorbidities were more common among ARDS patients than among non-ARDS patients (56.6% vs. 39.8%). Patients with ARDS had a much higher incidence of hypertension (48.7% vs. 23%) and diabetes (17.8% vs. 9.5%). Two of the five patients co-infected with other viruses developed ARDS. ARDS also occurred in one patient who was treated with immunosuppressive agents (Table 2).

Radiologic, laboratory findings and complications

Table 3 shows the radiologic and laboratory findings on admission and the complications. On chest CT, 74.7% of the patients presented ground-glass opacities and 28.3% presented consolidation. Both imaging features were more frequent in ARDS patients than in non-ARDS patients (80.8% vs. 73.9% and 53.9% vs. 24.7%, respectively). In ARDS patients, the median number of quadrants with consolidation was two.

Within 48 h of admission, lymphocytopenia was present in 36.1% of the patients and leukopenia in 24.8%. However, 19.7% of ARDS patients had an elevated white blood cell count, which may indicate secondary infection. The neutrophil-to-lymphocyte ratio (NLR) was greater than 3 in 45.3% of COVID-19 patients overall and in 82.7% of ARDS patients, with a median of 6.11 in the ARDS group. Elevated C-reactive protein and lactate dehydrogenase levels were found in 47.4% and 32.2% of patients, respectively. A small number of patients had elevated alanine aminotransferase (ALT), aspartate aminotransferase (AST), creatine kinase (CK) and D-dimer levels. Laboratory abnormalities were more severe in ARDS patients than in non-ARDS patients. In addition, the median myoglobin and fasting glucose levels in ARDS patients were 85.9 μg/L and 8.1 mmol/L, respectively, which exceeded the normal reference ranges and differed significantly from those of the non-ARDS group.
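
The laboratory indicators above were dichotomized at the cut-offs listed in the risk factor analysis (Table 4): NLR ≥ 3, ALT and AST > 40 U/L, CK > 185 U/L, LDH > 250 U/L and CRP > 10 mg/L. A minimal sketch of that encoding follows; the dictionary keys and function names are hypothetical, and the values are assumed to be already in the units shown.

```python
# Cut-offs from the risk factor analysis (Table 4); key names are hypothetical.
LAB_CUTOFFS = {
    "alt_u_per_l": 40.0,   # ALT > 40 U/L
    "ast_u_per_l": 40.0,   # AST > 40 U/L
    "ck_u_per_l": 185.0,   # CK > 185 U/L
    "ldh_u_per_l": 250.0,  # LDH > 250 U/L
    "crp_mg_per_l": 10.0,  # CRP > 10 mg/L
}


def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """NLR from absolute counts given in the same unit (e.g. 10^9/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def lab_flags(labs):
    """Return 0/1 indicators for the dichotomized laboratory risk factors."""
    flags = {name: int(labs[name] > cutoff) for name, cutoff in LAB_CUTOFFS.items()}
    nlr = neutrophil_lymphocyte_ratio(labs["neutrophils"], labs["lymphocytes"])
    flags["nlr_ge_3"] = int(nlr >= 3.0)  # Table 4 groups NLR as < 3 vs >= 3
    return flags


# Hypothetical patient values for illustration only.
example = {"neutrophils": 5.2, "lymphocytes": 0.8, "alt_u_per_l": 35,
           "ast_u_per_l": 52, "ck_u_per_l": 190, "ldh_u_per_l": 240,
           "crp_mg_per_l": 25.0}
print(lab_flags(example))
```

Values exactly at a cut-off (for example ALT = 40 U/L) fall in the lower group, matching the "≤ 40 vs. > 40" grouping.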

During hospitalization, 91.3% of patients were diagnosed with pneumonia, with no statistically significant difference between the ARDS and non-ARDS groups. However, patients with ARDS had a higher incidence of shock and secondary bacterial infection (5.5% and 30.3%) than non-ARDS patients (0 and 4.3%), and 45.2% of them were admitted to the ICU (Tables 2, 3).

Prediction of risk factors for COVID-19 ARDS

After removing variables with a missing rate > 20%, a total of 98 variables covering demographics, epidemiology, clinical symptoms, underlying diseases, complications, CT image features and laboratory results were extracted from the structured and unstructured data of the electronic medical record (EMR), based on literature review and expert clinical opinion. We then selected 19 significant risk factors for ARDS in COVID-19 patients by univariate analysis in SPSS. Among all risk factors, severity evaluation at admission (odds ratio [OR], 13.206; 95%CI, 8.550–20.397; P < 0.001), gender (OR, 3.312; 95%CI, 1.979–5.544; P < 0.001), age (≥ 70 years) (OR, 19.811; 95%CI, 4.473–87.741; P < 0.001), BMI (< 23 vs. > 25) (OR, 3.717; 95%CI, 1.966–7.062; P < 0.001), temperature (> 39 °C) (OR, 5.279; 95%CI, 2.305–12.090; P < 0.001), hemoptysis (OR, 7.307; 95%CI, 2.263–23.595; P < 0.001), cough (OR, 2.574; 95%CI, 1.429–4.542; P < 0.001), shortness of breath (OR, 11.281; 95%CI, 6.883–18.490; P < 0.001), hypertension (OR, 4.105; 95%CI, 2.572–6.554; P < 0.001), diabetes (OR, 2.176; 95%CI, 1.161–4.078; P < 0.001), secondary bacterial infection (OR, 9.686; 95%CI, 5.146–18.323; P < 0.001), lung consolidation (OR, 4.264; 95%CI, 2.668–6.815; P < 0.001), lymphocyte count (OR, 0.145; 95%CI, 0.080–0.263; P < 0.001), neutrophil-to-lymphocyte ratio (NLR) (< 3 vs. ≥ 3) (OR, 7.211; 95%CI, 3.980–13.064; P < 0.001), ALT (≤ 40 vs. > 40 U/L) (OR, 2.710; 95%CI, 1.639–4.482; P < 0.001), AST (≤ 40 vs. > 40 U/L) (OR, 5.139; 95%CI, 3.100–8.520; P < 0.001), CK (≤ 185 vs. > 185 U/L) (OR, 4.114; 95%CI, 2.312–7.319; P < 0.001), lactate dehydrogenase (LDH) (≤ 250 vs. > 250 U/L) (OR, 8.104; 95%CI, 4.733–13.876; P < 0.001) and C-reactive protein (CRP) (≤ 10 vs. > 10 mg/L) (OR, 5.959; 95%CI, 3.510–10.119; P < 0.001) were all strongly associated with ARDS (Table 4).
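
For a single dichotomized factor, the odds ratio from a univariate logistic regression equals the cross-product ratio of the 2×2 table, and its Wald 95% CI equals the Woolf (log-scale) interval. Assuming the ORs in Table 4 were obtained this way (the SPSS output is not shown, and factors reported without a cut-off, such as lymphocyte count, are outside the scope of this sketch), the calculation can be sketched in plain Python. The function name and the counts are hypothetical and are not the study data.

```python
import math


def odds_ratio_woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: factor present, ARDS      b: factor present, no ARDS
    c: factor absent, ARDS       d: factor absent, no ARDS
    z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        # A zero cell makes the log odds ratio undefined; a continuity
        # correction (e.g. adding 0.5 to every cell) would be needed.
        raise ValueError("all cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not the study data).
or_, lo, hi = odds_ratio_woolf_ci(a=20, b=80, c=10, d=160)
print(f"OR {or_:.3f}; 95%CI {lo:.3f}-{hi:.3f}")
```

Because the interval is symmetric on the log scale, its bounds multiply to OR², which is a quick sanity check when reading intervals such as those in Table 4.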

Table 4 Risk factor analysis for COVID-19 ARDS.


Development and verification of predictive models

Based on the above univariate analysis, we selected 19 risk factors (severity evaluation at admission, gender, age, BMI, temperature, cough, shortness of breath, hemoptysis, hypertension, diabetes, secondary bacterial infection, lung consolidation, lymphocyte count, CK, NLR, ALT, AST, LDH, and CRP) as model inputs to predict whether COVID-19 patients would develop ARDS. We tried five modeling algorithms: logistic regression (LR), random forest (RF), support vector machine (SVM), decision tree (DT) and deep neural networks (DNN). Table 5 shows the mean ± standard deviation (std.) of AUC and accuracy from 10-fold cross-validation. DT, LR and RF all exceeded an AUC of 0.85, and the mean accuracy of every algorithm was above 0.8. To further verify the models, the performance of the five algorithms was evaluated on the external test set. Table 6 and Fig. 1 show that DT, LR, RF and DNN all performed well in terms of AUC, accuracy and specificity. The sensitivity of DT and LR was much higher than that of the other three models. Because the dataset is imbalanced, we also evaluated the balanced accuracy of each model; DT and DNN reached 0.98 and 0.93, respectively. The SVM model performed worst of the five. An ARDS diagnostic tool needs both high sensitivity and high accuracy, and DT achieved the best value on every metric, with an AUC of 0.99, an accuracy of 0.97 and a sensitivity of 1.0. We therefore selected the decision tree model as the optimal tool for ARDS prediction.
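
Balanced accuracy is the mean of sensitivity and specificity, so unlike plain accuracy it is not inflated by the majority (non-ARDS) class. The sketch below computes the metrics discussed above from confusion-matrix counts. The function name is our own, and the counts are hypothetical, sized to the external test set (14 ARDS and 57 non-ARDS cases); they are not the confusion matrices behind Table 6.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and balanced accuracy from counts.

    tp/fn: ARDS cases predicted as ARDS / non-ARDS
    tn/fp: non-ARDS cases predicted as non-ARDS / ARDS
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts on a 14 ARDS / 57 non-ARDS test set.
good_model = confusion_metrics(tp=14, fn=0, tn=55, fp=2)
# A trivial model that labels every patient non-ARDS.
all_negative = confusion_metrics(tp=0, fn=14, tn=57, fp=0)
print(good_model)
print(all_negative)
```

The all-negative model reaches about 0.80 accuracy while detecting no ARDS case at all; its balanced accuracy of 0.5 exposes this, which is why balanced accuracy is informative on an imbalanced dataset.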

Table 5 10-fold cross-validation results of 5 algorithms.


Table 6 Performance of the five algorithms on external testing dataset in predicting the occurrence of COVID-19 ARDS.


Discussion

In this study, we comprehensively compared the clinical characteristics of all confirmed COVID-19 patients with and without ARDS and selected 19 features for modeling. All included variables were strongly associated with disease progression. Age (> 70 years), gender, hypertension, diabetes and severity evaluation are recognized risk factors for developing ARDS in COVID-19 patients¹⁸. Clinical manifestations such as fever, cough, hemoptysis, shortness of breath and lung consolidation reflect the progression of COVID-19^19,20. Viral infections predispose patients to secondary bacterial infections, which often lead to a more severe clinical course, and secondary bacterial infection has been considered a critical risk factor for severity and mortality in COVID-19 despite antimicrobial therapy^21,22. Lymphopenia and high concentrations of CRP and LDH may indicate a severe acute inflammatory reaction in the lung and cell damage^23,24,25, and they have been reported as risk factors for severe COVID-19²⁶. ALT and AST are markers of acute liver injury. Studies have found that abnormal liver tests in patients with COVID-19 were associated with progression to severe pneumonia, and that the liver damage was mainly related to the use of lopinavir/ritonavir during hospitalization; liver function should therefore be monitored and evaluated frequently during treatment^27,28. NLR is an indicator of systemic inflammation²⁹ that has been studied mainly in tumor-related diseases, autoimmune diseases, bacterial pneumonia and tuberculosis^30,31,32,33. COVID-19-triggered inflammation has been reported to increase NLR, which was significantly associated with poor clinical outcomes³⁴. We found that CK was a high-risk factor for ARDS. On the one hand, elevated CK might be associated with heart injury in critically ill patients with COVID-19³⁵; on the other hand, it is related to rhabdomyolysis^36,37, and several cases of rhabdomyolysis with a marked increase of CK have been reported in patients with severe COVID-19^38,39,40.

We tried five modeling algorithms, and the decision tree performed best. In clinical prediction research, decision trees are frequently used to build binary classifiers, for example for cancer prediction and prognosis⁴¹. As a machine learning method, a decision tree is nonparametric, so it makes fewer assumptions about the data, and it can accommodate collinear independent variables⁴². It is also less sensitive to outliers and more robust to high-dimensional data, which have many independent variables relative to outcomes⁴³. The main advantage of a decision tree is its simple structure, which makes its classification rules easy to extract and interpret. Our model uses 19 clinical variables, all relatively inexpensive and easy to obtain from clinical symptoms and routine laboratory tests. At the same time, the system showed good sensitivity, specificity and AUC in the external test cohort. Compared with the model of Jiang et al.¹⁰, our model achieved a higher overall accuracy (91% vs. 70%).
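
The entropy criterion named in the Methods (for the ID3 tree and the random forest) picks, at each node, the split with the largest information gain, and the resulting thresholds are what make a tree readable as rules. A minimal sketch for one numeric feature follows; it uses observed values as cut points for the rule `value > t` (libraries such as scikit-learn use midpoints between adjacent values instead). The function names and the LDH values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_threshold(values, labels):
    """Threshold t maximizing information gain for the rule `value > t`."""
    parent = entropy(labels)
    best_t, best_gain = None, 0.0
    # Every observed value except the largest is a candidate cut point,
    # so both children are always non-empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if parent - child > best_gain:
            best_t, best_gain = t, parent - child
    return best_t, best_gain


# Hypothetical LDH values (U/L) and ARDS labels (1 = ARDS).
ldh = [180, 210, 230, 245, 260, 300, 320, 400]
ards = [0, 0, 0, 0, 1, 0, 1, 1]
print(best_threshold(ldh, ards))
```

A tree grows by applying this search to every feature at every node and keeping the best split; a cap such as the 5 leaf nodes used in this study stops the growth early and keeps the rule set short.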

Our study has several strengths. First, we used machine learning to analyze clinical datasets and developed a diagnostic aid system, which has been deployed in the electronic medical record system for early identification of ARDS in COVID-19 patients. By submitting clinical information online, medical staff can triage patients at hospital admission based on the predicted risk and arrange treatment plans accordingly, so that patients receive early treatment and medical resources are allocated efficiently. Second, to ensure the reliability of the conclusions, we used multicenter data with a large sample for modeling and verification. Third, we found that CK (> 185 U/L) and NLR were strongly associated with ARDS, and they may serve as new candidate biomarkers for early identification of severe COVID-19.

Our study also has several limitations, and much work remains. First, although we collected data on 659 COVID-19 patients from multiple centers, the number of ARDS cases was limited. Second, we did not collect CT image data, and the quantitative CT diagnostic information was not sufficiently detailed. Third, D-dimer has been reported as a risk factor for COVID-19 severity, but because of the large amount of missing data, we could not reach a similar conclusion. Finally, it would be of great clinical value to study the interventions and prognosis of COVID-19 patients before and after the development of ARDS and to integrate them into the diagnostic system to provide personalized treatment recommendations.

Conclusion

We retrospectively analyzed the clinical characteristics of COVID-19 patients with and without ARDS from Zhejiang Province and Wuhan and identified 19 risk factors. Based on these risk factors, we built models with five methods, four of which showed good predictive performance. The decision tree performed best, with an accuracy of 97%. We have deployed it in the infectious disease electronic medical record system to help physicians identify patients at risk of severe COVID-19 early.

Method

Patient population and clinical data

Data on a total of 659 consecutive COVID-19 patients from January 22 to April 1, 2020 were retrospectively collected in hospitals from 11 regions: NingBo, ZhouShan, HuBei, Lishui, Jiaxin, HangZhou, TaiZhou, DongYang, ShaoXing, WenZhou, HuZhou. Patient ages ranged from 14 to 90 years. All patients were diagnosed by positive nucleic acid tests for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), according to WHO interim guidance. Clinical information, including demographics, comorbidities, epidemiological history of exposure to COVID-19, vital signs, clinical symptoms, biochemical indices, routine blood tests, infection-related biomarkers, CT findings, therapeutic measures, and timing information from onset to admission, was collected from routine clinical practice. The date of disease onset was defined as the day when symptoms (i.e. fever, dry cough, expectoration, polypnea, fatigue, myalgia, pharyngalgia, dyspnea, headache, vomiting) first appeared. ARDS was defined according to the Berlin definition. Severity evaluation on admission was based on the Guidelines for the Diagnosis and Treatment of Novel Coronavirus (2019-nCoV) Infection (Trial Version 7), a comprehensive evaluation index with important clinical diagnostic value. Secondary bacterial infection was diagnosed if either of the following criteria was met: bacteria were found in a normally sterile site, or the patient had a fever unrelated to the initial disease accompanied by elevated CRP. This study was approved by the Ethics Committee of Shulan Hangzhou Hospital. Written informed consent was obtained during hospitalization from patients or their parents.

Data analysis

Continuous variables were expressed as medians and interquartile ranges or simple ranges, as defined by experts. Categorical variables were summarized as counts and percentages. Differences between ARDS and non-ARDS patients were assessed with the two-sample t test or the Mann–Whitney U test for continuous variables, depending on whether the data were parametric or non-parametric, and with the chi-square test for categorical variables. Tests were two-sided, with the significance level set at 0.05. All statistical analyses were performed using IBM SPSS Ver. 19.0. The Python programming language (Python Software Foundation, version 3.6.6, https://www.python.org/downloads/) was used to build the models.
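
For a 2×2 table (group × characteristic), the chi-square statistic has one degree of freedom, and because a chi-square variable with one degree of freedom is the square of a standard normal variable, its p-value is erfc(√(χ²/2)). The sketch below computes the uncorrected Pearson statistic; the Methods do not state whether a continuity correction or Fisher's exact test was used for small cells. The function name and counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows: ARDS / non-ARDS; columns: characteristic present / absent.
    Returns (statistic, p-value) with 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for illustration only.
stat, p = chi_square_2x2(10, 20, 20, 10)
print(f"chi2 = {stat:.3f}, P = {p:.4f}")
```

When an expected count falls below about 5, as can happen for rare findings such as encephalopathy or hemoptysis, Fisher's exact test is the usual alternative to this approximation.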

Machine learning model establishment and evaluation

Datasets

All data were divided into three separate parts with no overlapping subjects: training, validation, and external test sets (Table 7).

Table 7 Details of modeling datasets.


For COVID-19 ARDS prediction:

Training and validation datasets: 236 subjects (189 non-ARDS and 47 ARDS cases from 11 regions in Wuhan and Zhejiang) were assigned to the training and validation datasets at a 9:1 ratio and further cross-validated 10 times. These datasets were used to train the model parameters.
External test dataset: 57 non-ARDS and 14 ARDS cases from 11 regions in Wuhan and Zhejiang. This dataset was used to evaluate and compare the performance of the different models and to select the best model for the AI system.
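
Because ARDS cases are the minority (47 of 236 in the training and validation data), a stratified split keeps roughly the same ARDS proportion in every fold, and with k = 10 each fold gives the 9:1 training/validation ratio described above. The Methods do not say whether the folds were stratified; the sketch below assumes they were and uses only the Python standard library. The function name and seed are hypothetical.

```python
import random


def stratified_kfold_indices(labels, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for stratified k-fold cross-validation.

    Indices of each class are shuffled and dealt round-robin into k folds,
    so every fold holds about 1/k of each class.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for i in idx:
            folds[position % k].append(i)
            position += 1
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


# Class sizes from the training/validation data: 189 non-ARDS (0), 47 ARDS (1).
labels = [0] * 189 + [1] * 47
splits = list(stratified_kfold_indices(labels))
print([sum(labels[i] for i in val) for _, val in splits])  # ARDS cases per fold
```

Repeating the procedure with different seeds gives repeated cross-validation; the mean ± std. of a metric across folds is what Table 5 reports.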

Algorithms

Four conventional machine learning algorithms (decision trees, random forests, support vector machines and logistic regression) and one deep learning method (deep neural networks, DNN, with the ReLU activation function) were used to develop the ARDS prediction model for COVID-19 patients. Support vector machines were implemented with the RBF kernel. The ID3 decision tree was constructed with a maximum of 5 leaf nodes, and the random forest consisted of 40 decision trees using the entropy split criterion. The pipeline of the DNN model is shown in Figure S1. The input was a 19-dimensional vector containing the patient's clinical data. The DNN was a 4-layer network with 64, 32, 8 and 1 neurons, respectively. A sigmoid layer was added at the top of the network to output the probability of ARDS, and the network was trained for 100 epochs.
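
The layer sizes above (a 19-dimensional input; dense layers of 64, 32 and 8 units; one output unit passed through a sigmoid) can be illustrated with a forward pass in plain Python. This is only a shape sketch: it assumes ReLU on every hidden layer, the weights are random placeholders rather than trained values, and the framework, initialization, loss and optimizer used by the authors are not stated. Function names are our own.

```python
import math
import random

LAYER_SIZES = [19, 64, 32, 8, 1]  # input, three hidden ReLU layers, output unit


def init_params(sizes, seed=0):
    """Random placeholder weights and zero biases for each dense layer (not trained)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)  # He-style scaling, an assumption
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def forward(x, params):
    """Dense layers with ReLU on hidden layers and a sigmoid on the output."""
    for layer, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if layer < len(params) - 1:
            x = [max(0.0, v) for v in z]
        else:
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]
    return x[0]  # probability of ARDS


params = init_params(LAYER_SIZES)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in params)
patient = [0.5] * 19  # hypothetical, already-scaled 19-dimensional input
print(n_params, forward(patient, params))
```

With these layer sizes the network has 3,633 trainable weights and biases.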

Evaluation

Model performance was assessed by 10-fold cross-validation (10-fold CV) and on the external test set. Specifically, we randomly divided the training and validation datasets into 10 parts: 9 parts were used to train the algorithms and 1 part was used to estimate prediction performance. The mean AUC and accuracy over the 10 folds were used as indicators of prediction performance, and this process was repeated 10 times. Furthermore, we verified the predictive performance of the models on the external test dataset using receiver operating characteristic (ROC) curves, classification accuracy, F-measure, sensitivity and specificity.
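
The area under the empirical ROC curve equals the probability that a randomly chosen ARDS patient receives a higher predicted score than a randomly chosen non-ARDS patient, with ties counted as one half (the Mann–Whitney formulation). The sketch below computes it, together with the F-measure (F1, the harmonic mean of precision and sensitivity), from predicted probabilities and labels. The 0.5 threshold, the function names and the scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the empirical ROC curve via pairwise comparisons (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(scores, labels, threshold=0.5):
    """F-measure for the ARDS class at a given probability threshold."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical predicted probabilities and true labels (1 = ARDS).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels), f1_score(scores, labels))
```

The pairwise loop is O(n²), which is fine for a test set of 71 patients; AUC is threshold-free, whereas F1, sensitivity and specificity all depend on the chosen threshold.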

Application development

The best algorithm for ARDS risk prediction was embedded into the EMR system and could be accessed via the link https://ai-ards.rubikstack.com/#/login. The Anaconda Distribution (Anaconda Inc, Austin, Texas), Visual Studio Code version 1.45.1 (Microsoft, Redmond, Washington), and Python version 3.6 (Python Software Foundation, Wilmington, Delaware) were used for data analysis, model creation, and web application development.

Ethics approval and consent to participate

This study was approved by the ethics committee of ShuLan (Hangzhou) Hospital. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Written informed consent was obtained during hospitalization from patients or their parents. The data used in this study were anonymized before use.

References

Guan, W. J. et al. Clinical characteristics of coronavirus disease 2019 in China. N. Engl. J. Med. 382(18), 1708–1720 (2020).
Huang, C. et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395(10223), 497–506 (2020).
Ding, Q., Lu, P., Fan, Y., Xia, Y. & Liu, M. The clinical characteristics of pneumonia patients coinfected with 2019 novel coronavirus and influenza virus in Wuhan, China. J. Med. Virol. https://doi.org/10.1002/jmv.25781 (2020).
Yang, X. et al. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Respir. Med. 8(5), 475–481 (2020).
Liu, W. et al. Analysis of factors associated with disease outcomes in hospitalized patients with 2019 novel coronavirus disease. Chin. Med. J. 133(9), 1032–1038 (2020).
Zhao, X.-Y. et al. Clinical characteristics of patients with 2019 coronavirus disease in a non-Wuhan area of Hubei Province, China: a retrospective study. BMC Infect. Dis. 20(1), 311 (2020).
Li, K. et al. The clinical and chest CT features associated with severe and critical COVID-19 pneumonia. Investig. Radiol. 55(6), 327–331 (2020).
Bellani, G. et al. Epidemiology, patterns of care, and mortality for patients with acute respiratory distress syndrome in intensive care units in 50 countries. JAMA 315(8), 788–800 (2016).
Yang, X. et al. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Respir. Med. 8(5), 475–481 (2020).
Jiang, X. et al. Towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity. Comput. Mater. Continua. 62(3), 537–551 (2020).
Shortliffe, E. H. & Sepulveda, M. J. Clinical decision support in the era of artificial intelligence. JAMA 320(21), 2199–2200 (2018).
Wang, S. et al. A fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis. Eur. Respir. J. https://doi.org/10.1183/13993003.00775 (2020).
Gong, J. et al. A tool to early predict severe corona virus disease 2019 (COVID-19): a multicenter study using the risk Nomogram in Wuhan and Guangdong, China. Clin. Infect. Dis. https://doi.org/10.1183/13993003.00775-2020 (2020).
Meylan, S. et al. An early warning score to predict ICU admission in COVID-19 positive patients. J. Infect. https://doi.org/10.1016/j.jinf.2020.05.047 (2020).
Yang, Z. et al. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J. Thorac. Dis. 12(3), 165–174 (2020).
Bai, X. et al. Predicting COVID-19 malignant progression with AI techniques. medRxiv 17:42 (2020).
Wynants, L. et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ 369, m1328 (2020).
Zheng, Z. et al. Risk factors of critical & mortal COVID-19 cases: a systematic literature review and meta-analysis. J. Infect. 81, e16–e25 (2020).
Wu, X. et al. Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in Wuhan, China. JAMA Intern. Med. https://doi.org/10.1001/jamainternmed.2020.0994 (2020).
Liang, W. et al. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern. Med. 180(8), 1081–1089. https://doi.org/10.1001/jamainternmed.2020.2033 (2020).
Mirzaei, R. et al. Bacterial co-infections with SARS-CoV-2. IUBMB Life https://doi.org/10.1002/iub.2356 (2020).
Manna, S., Baindara, P. & Mandal, S. M. Molecular pathogenesis of secondary bacterial infection associated to viral infections including SARS-CoV-2. J. Infect. Public Health. https://doi.org/10.1016/j.jiph.2020.07.00 (2020).
Agassandian, M., Shurin, G. V., Ma, Y. & Shurin, M. R. C-reactive protein and lung diseases. Int. J. Biochem. Cell Biol. 53, 77–88 (2014).
Liu, Y. et al. Clinical and biochemical indexes from 2019-nCoV infected patients linked to viral loads and lung injury. Sci. China Life Sci. 63(3), 364–374 (2020).
Li, X. et al. Clinical characteristics of 25 death cases with COVID-19: a retrospective review of medical records in a single medical center, Wuhan, China. Int. J. Infect. Dis. 94, 128–132 (2020).
Li, X. et al. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan. J. Allergy Clin. Immunol. https://doi.org/10.1016/j.jaci.2020.04.006 (2020).
Cai, Q. et al. COVID-19: Abnormal liver function tests. J. Hepatol. https://doi.org/10.1016/j.jhep.2020.04.006 (2020).
Fan, Z. et al. Clinical features of COVID-19-related liver functional abnormality. Clin. Gastroenterol. Hepatol. 18(7), 1561–1566 (2020).
Guthrie, G. J. et al. The systemic inflammation-based neutrophil-lymphocyte ratio: experience in patients with cancer. Crit. Rev. Oncol. Hematol. 88(1), 218–230 (2013).
Shimoyama, Y. et al. The neutrophil to lymphocyte ratio is superior to other inflammation-based prognostic scores in predicting the mortality of patients with pneumonia. Acta Med. Okayama. 72(6), 591–593 (2018).
Jeon, Y., Lee, W. I., Kang, S. Y. & Kim, M. H. Neutrophil-to-monocyte-plus-lymphocyte ratio as a potential marker for discriminating pulmonary tuberculosis from nontuberculosis infectious lung diseases. Lab Med. 50(3), 286–291 (2019).
Ying, H. Q. et al. The prognostic value of preoperative NLR, d-NLR, PLR and LMR for predicting clinical outcome in surgical colorectal cancer patients. Med. Oncol. 31(12), 305 (2014).
Uslu, A. U. et al. Two new inflammatory markers associated with Disease Activity Score-28 in patients with rheumatoid arthritis: neutrophil-lymphocyte ratio and platelet-lymphocyte ratio. Int. J. Rheum. Dis. 18(7), 731–735 (2015).
Yang, A. P. et al. Infection with SARS-CoV-2 causes abnormal laboratory results of multiple organs in patients. Aging (Albany NY). https://doi.org/10.18632/aging.103255 (2020).
Shi, S. et al. Association of cardiac injury with mortality in hospitalized patients with COVID-19 in Wuhan, China. JAMA Cardiol. https://doi.org/10.1001/jamacardio.2020.0950 (2020).
Chan, K. H., Farouji, I., Abu Hanoud, A. & Slim, J. Weakness and elevated creatinine kinase as the initial presentation of coronavirus disease XXX (COVID-19). Am. J. Emerg. Med. https://doi.org/10.1016/j.ajem.2020.05.015 (2020).
Cabral, B. M. I., Edding, S. N., Portocarrero, J. P. & Lerma, E. V. Rhabdomyolysis. Dis. Mon. 2020, 101015 (2020).
Suwanwongse, K. & Shabarek, N. Rhabdomyolysis as a presentation of 2019 novel coronavirus disease. Cureus. 12(4), e7561 (2020).
Jin, M. & Tong, Q. Rhabdomyolysis as potential late complication associated with COVID-19. Emerg. Infect. Dis. 26(7), 1618 (2020).
Gefen, A. M. et al. Pediatric COVID-19-associated rhabdomyolysis: a case report. Pediatr. Nephrol. https://doi.org/10.1007/s00467-020-04617-0 (2020).
Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis, M. V. & Fotiadis, D. I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17 (2015).
Visweswaran, S., Ferreira, A., Ribeiro, G. A., Oliveira, A. C. & Cooper, G. F. Personalized modeling for prediction with decision-path models. PLoS ONE 10(6), e0131022 (2015).
Che, D., Liu, Q., Rasheed, K. & Tao, X. Decision tree and ensemble learning algorithms with their applications in bioinformatics. Adv. Exp. Med. Biol. 696, 191–199 (2011).


Acknowledgements

We would like to thank the medical staff in the Department of Infectious Disease for their support in providing patient information. We also thank the patients for their willingness to participate. Finally, we thank Prof. Peilin Yu and Dr. Lin Wei for their generous help. This work was supported by the COVID-19 Emergency Research Project of the Zhejiang Provincial Department of Science and Technology (2020C03123) and the National Science and Technology Major Project (No. 2017ZX10204401).

Funding

The results reported herein correspond to specific aims of grant 2020C03123 to investigator Ling-Ling Tang from the COVID-19 Emergency Research Project of the Zhejiang Provincial Department of Science and Technology. This work was also supported by grant No. 2017ZX10204401 from the National Science and Technology Major Project.

Author information

These authors contributed equally: Wan Xu and Nan-Nan Sun.

Authors and Affiliations

Hangzhou Xiaoshan District Center for Disease Control and Prevention, Hangzhou, China
Wan Xu
Hangzhou Wowjoy Information Technology Co., Ltd, Hangzhou, China
Nan-Nan Sun, Zhi-Yuan Chen & Bin Ju
State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Centre for Infectious Diseases, Collaborative Innovation Centre for Diagnosis and Treatment of Infectious Diseases, the First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, 310003, Zhejiang Province, China
Ya Yang
Department of Infectious Diseases, ShuLan (Hangzhou) Hospital Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China
Hai-Nv Gao & Ling-Ling Tang


Contributions

W.X. wrote the main manuscript text and prepared Tables 1, 2, 3 and 4. N.N.S. and W.X. prepared Tables 5, 6 and 7 and all figures. H.N.G. provided the original dataset. Z.Y.C. and Y.Y. revised the main manuscript. B.J. and L.L.T. are the corresponding authors of the article and guided the research. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Bin Ju or Ling-Ling Tang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Xu, W., Sun, NN., Gao, HN. et al. Risk factors analysis of COVID-19 patients with ARDS and prediction based on machine learning. Sci Rep 11, 2933 (2021). https://doi.org/10.1038/s41598-021-82492-x


Received: 01 September 2020
Accepted: 11 January 2021
Published: 03 February 2021
DOI: https://doi.org/10.1038/s41598-021-82492-x
