A multipurpose machine learning approach to predict COVID-19 negative prognosis in São Paulo, Brazil

Fernandes, Fernando Timoteo; de Oliveira, Tiago Almeida; Teixeira, Cristiane Esteves; Batista, Andre Filipe de Moraes; Dalla Costa, Gabriel; Chiavegatto Filho, Alexandre Dias Porto

doi:10.1038/s41598-021-82885-y

Article
Open access
Published: 08 February 2021


Fernando Timoteo Fernandes^1,2, Tiago Almeida de Oliveira^1,3, Cristiane Esteves Teixeira^1,4, Andre Filipe de Moraes Batista^1, Gabriel Dalla Costa^5 & Alexandre Dias Porto Chiavegatto Filho^1

Scientific Reports volume 11, Article number: 3343 (2021)

Abstract

The new coronavirus disease (COVID-19) is a challenge for clinical decision-making and the effective allocation of healthcare resources. An accurate prognostic assessment is necessary to improve the survival of patients, especially in developing countries. This study proposes to predict the risk of developing critical conditions in COVID-19 patients by training multipurpose algorithms. We followed a total of 1040 patients with a positive RT-PCR diagnosis for COVID-19 at a large hospital in São Paulo, Brazil, from March to June 2020, of whom 288 (28%) presented a severe prognosis, i.e. Intensive Care Unit (ICU) admission, use of mechanical ventilation or death. We used routinely-collected laboratory, clinical and demographic data to train five machine learning algorithms (artificial neural networks, extra trees, random forests, catboost, and extreme gradient boosting). We used a random sample of 70% of patients to train the algorithms, and the remaining 30% were left for performance assessment, simulating new unseen data. To assess whether the algorithms could capture general severe prognostic patterns, each model was trained by combining two out of three outcomes to predict the other. All algorithms presented very high predictive performance (average AUROC of 0.92, sensitivity of 0.92, and specificity of 0.82). The three most important variables for the multipurpose algorithms were the ratio of lymphocytes per C-reactive protein, C-reactive protein and the Braden Scale. The results highlight the possibility that machine learning algorithms are able to predict unspecific negative COVID-19 outcomes from routinely-collected data.

Introduction

The consequences of long hospital stays and the demand for hospital resources due to COVID-19 have been disastrous for health systems in low- and middle-income countries (LMICs)^1,2, requiring immediate clinical decisions, especially when dealing with limited resources^3,4. An accurate COVID-19 prognosis assessment is crucial for screening and treatment procedures and may increase patient survival^5,6. In Brazil⁷, many cities are at their saturation capacity for the provision of clinical care, especially regarding ICU beds and mechanical ventilators^8,9,10. Data-driven solutions are needed to support decision-making¹¹.

COVID-19 has been shown to worsen rapidly a few days after infection^12,13. The median time from disease onset to ICU admission is 9–12 days^14,15. About 26–32% of hospitalized patients are eventually admitted to the ICU, and mortality in this group ranges from 39 to 72%, depending on the local characteristics of patients^14,15. The median length of ICU stay and of mechanical ventilation use is approximately 9 days (95% CI 6.5–11.2) and 8.4 days (95% CI 1.6–13.7), respectively¹⁶.

Previous studies have used blood tests¹⁷, CT images^18,19, and sociodemographic data and comorbidity history²⁰ to develop COVID-19 diagnostic and prognostic models, including machine learning techniques^21,22,23. Biomarkers from blood tests have emerged as important indicators of poor prognosis²⁴ and are a promising tool in poorer regions, due to their low cost and inclusion in standard protocols for clinical care. However, the majority of studies²⁵ rely on algorithms trained on a single prognostic outcome, which in theory requires training a specific algorithm for each distinct negative outcome.

This study proposes to develop multipurpose machine learning algorithms to analyze if it is possible to predict overall poor prognosis for COVID-19 patients. We aim to test if the algorithms can generalize risk patterns for severe conditions, so they can be used as tools to assist in the prognosis of distinct negative outcomes for COVID-19 patients.

Results

Descriptive statistics

Table 1 shows the descriptive statistics for the demographic characteristics of the patients. The study sample (1040 patients with COVID-19) was composed mostly of men (53.3%), with an average age of 51.7 years, and the majority of patients (63.8%) were white. The full descriptive statistics for all variables are presented in Supplementary Table 1.

Table 1 Descriptive statistics of the demographic characteristics of the sample, BP Hospital—A Beneficência Portuguesa de São Paulo, Brazil, 2020.

Algorithms performance

We analyzed the predictive performance of the algorithms for three negative prognostic outcomes: ICU admission (n = 263, 25.5%), mechanical ventilation (MV) intubation (n = 106, 10.2%) and death (n = 92, 9.4%).

First, we tested the predictive performance of the machine learning algorithms for a specific individual outcome (e.g. death) to get a baseline for comparison. Then, we used observations from patients who had the other two outcomes (in this specific example, mechanical ventilation and ICU admission) to train an aggregated model. In the aggregated model, we tested the performance when predicting the severe outcome not included in training (e.g. death). Finally, we compared the performance of the two strategies (i.e. individual against aggregated models) using the 95% confidence interval of the area under the receiver operating characteristic curve (AUROC).
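The two labelling strategies can be sketched as follows. This is a minimal illustration under one plausible reading of the procedure (the aggregated training label is positive when a patient had either of the two outcomes other than the held-out one); the field names `icu`, `mv` and `death` and the toy patients are hypothetical, not taken from the study data.

```python
from typing import Dict, List

# Hypothetical outcome flags; the real variable names are not given in the paper.
OUTCOMES = ("icu", "mv", "death")


def aggregated_label(patient: Dict[str, int], held_out: str) -> int:
    """Training label for the aggregated model: 1 if the patient had
    either of the two outcomes other than `held_out` (assumed reading)."""
    if held_out not in OUTCOMES:
        raise ValueError(f"unknown outcome: {held_out}")
    return int(any(patient[o] for o in OUTCOMES if o != held_out))


def individual_label(patient: Dict[str, int], outcome: str) -> int:
    """Label for the individual (baseline) model and for evaluation."""
    return int(patient[outcome])


# Toy patients, invented for illustration only.
patients: List[Dict[str, int]] = [
    {"icu": 1, "mv": 0, "death": 0},
    {"icu": 1, "mv": 1, "death": 1},
    {"icu": 0, "mv": 0, "death": 0},
    {"icu": 0, "mv": 0, "death": 1},
]

# Train on "ICU or MV", then evaluate against death, which was never used as a label.
train_labels = [aggregated_label(p, held_out="death") for p in patients]
eval_labels = [individual_label(p, "death") for p in patients]
```

The last toy patient shows why the task is harder than the baseline: the patient died without ICU admission or MV, so the aggregated model never saw that case labelled as positive.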

Table 2 shows the results of the models trained with the aggregated outcomes and the models trained with a single outcome. Every model, even the ones trained with different outcomes, presented high predictive performance, always with an AUROC over 0.91 in the test set. The individual models presented a better AUROC than the aggregated models when predicting ICU admission, MV or death, with AUROC over 0.959, 0.945 and 0.972, respectively.

Table 2 Predictive performance comparison in the test set for aggregated and individual models, BP Hospital—A Beneficência Portuguesa de São Paulo, Brazil, 2020.

Although the individual models were better overall, the differences between the aggregated and individual models were all within the 95% confidence intervals. Supplementary Fig. 1 shows the AUROC for each model. The sensitivity and specificity of the machine learning algorithms were also very high, in most cases over 0.8, with an average sensitivity of 0.92 and specificity of 0.82.
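The text does not state how the 95% confidence intervals were obtained. As a hedged sketch, one common choice is a percentile bootstrap over the test set; the functions `auroc`, `bootstrap_ci` and `intervals_overlap` below are illustrative helpers written for this example, not the authors' code, and the overlap check is only one way to read "within the confidence intervals".

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive case outranks a negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUROC (assumed method)."""
    auroc(y_true, scores)  # fails early if a class is missing
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True when two confidence intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

With this sketch, an individual and an aggregated model would be compared by computing `bootstrap_ci` for each on the same test set and passing the two intervals to `intervals_overlap`.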

The positive predictive values (PPV) for the aggregated models were higher than the individual models when predicting mechanical ventilation and ICU, reaching 0.398 and 0.729 respectively, while for death there was a decrease to 0.290. This means that two out of three of the aggregated models had higher PPV when predicting which patients would develop severe illness and require hospital resources than the individual models. In Supplementary Table 2 we present the final hyperparameters for each model.
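PPV depends on how common the outcome is, which helps explain why it is much lower for rare outcomes even when sensitivity and specificity are high. The sketch below applies Bayes' rule; the sensitivity and specificity are the averages reported above, while the prevalences 0.10 and 0.25 are rounded, illustrative values rather than figures from Table 2.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule:
    true positives / (true positives + false positives)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Average sensitivity and specificity reported in the text.
SENS, SPEC = 0.92, 0.82

# Illustrative prevalences only (roughly a rare and a more common outcome).
ppv_rare = ppv(SENS, SPEC, prevalence=0.10)
ppv_common = ppv(SENS, SPEC, prevalence=0.25)
```

Under these assumptions the same classifier yields a PPV of about 0.36 at 10% prevalence and about 0.63 at 25% prevalence.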

Interpretability

Figure 1 presents the prediction density for each individual outcome according to the different training strategies. The results point to a low overlap between negative and positive cases, indicating a good discriminative ability of the algorithms irrespective of the training strategy.

Figure 2 presents the top five variables that most contributed to predict a severe outcome in the aggregated models, according to the Shapley values. The variables are ranked according to the contribution for each specific algorithm. The Braden score played an important role in the aggregated outcome algorithms, ranking as the most important predictor in two of the three models. Also, the C-reactive protein and ratio of lymphocytes per C-reactive protein were found to be good predictors, appearing in the top five in all three models. Urea, age, creatinine, and arterial lactate were important for only one of the aggregated models.
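To make the idea of a Shapley value concrete, the sketch below computes exact Shapley values by brute force for a toy linear risk score. It is not the implementation used in the study (which is not specified); the weights, the feature names `crp`, `braden` and `age`, and all numbers are arbitrary toy values, not clinical estimates. Features absent from a coalition are set to a reference value, which is one common convention.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float],
                   x: Features, baseline: Features) -> Features:
    """Exact Shapley values: average marginal contribution of each feature
    over all orderings, starting from the baseline input."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]      # switch this feature to its real value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    n_orderings = factorial(len(names))
    return {name: total / n_orderings for name, total in totals.items()}


def toy_risk(f: Features) -> float:
    """Hypothetical linear score: higher CRP and age raise risk, higher Braden lowers it."""
    return 0.02 * f["crp"] - 0.05 * f["braden"] + 0.01 * f["age"]


patient = {"crp": 120.0, "braden": 12.0, "age": 70.0}     # toy patient
reference = {"crp": 20.0, "braden": 20.0, "age": 50.0}    # toy reference patient
phi = shapley_values(toy_risk, patient, reference)
```

The contributions sum to the difference between the patient's score and the reference score, which is the property that makes Shapley values convenient for ranking variables. Brute-force enumeration grows factorially with the number of features, so practical tools rely on approximations or model-specific algorithms.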

Discussion

Previous studies have used machine learning to develop early COVID-19 prognostic models for a specific severe outcome with overall good performance^21,23, frequently reaching over 0.90 AUROC²⁶. We used a different approach, combining severe outcomes to train algorithms to predict another outcome, in order to test their potential for predicting multiple untrained outcomes.

We found that machine learning algorithms were able to predict negative prognostic outcomes with high overall performance for COVID-19, even when the specific outcome was not included in the training of the algorithms. All models presented an AUROC higher than 0.91 (average of 0.92) in the test set, with high sensitivity and specificity (average of 0.92 and 0.82, respectively). The results highlight the possibility that high-performance machine learning algorithms are able to predict unspecific negative COVID-19 outcomes using routinely-collected data.

The development of multipurpose prognostic algorithms, i.e. algorithms that identify nonspecific outcomes and overall future clinical deterioration, can be used in a large number of situations, especially in the case of complex and unknown diseases that lead to the development of several different negative outcomes. Instead of having to develop a different algorithm for each of the specific outcomes, multipurpose models can provide more comprehensive and clinically relevant information about the risks of future health problems of patients. The algorithms can be embedded in an app for smartphones or in electronic medical records to be used with routinely-collected data to perform simple predictions for each incoming patient, thus supporting screening procedures and decision-making. In the case of developing countries, while the issue of current availability of electronic medical records in poorer areas is still a challenge, in Brazil there have been promising recent advances regarding the use of electronic medical records²⁷.

Brazil is currently the third country in the world in total number of cases and second in deaths from COVID-19²⁸. There is a growing demand in Brazil, and in many other developing countries, for decision support in the allocation of scarce hospital resources, especially in relation to the availability of ICU beds and mechanical ventilators^29,30. From a clinical standpoint, knowledge about immediate risks of negative prognosis can also contribute to the early start of preventive measures and new interventions, and thereby increase patient survival^5,6.

For every outcome, variable importance analysis identified that age, C-reactive protein (CRP), creatinine, urea and the Braden Scale were usually among the most important variables. While the age of the patient is widely found to be an important predictor for most negative health outcomes, CRP has been increasingly included among the main inflammatory biomarkers for the prognosis of cardiovascular³¹ and respiratory diseases³². High levels of CRP have also been previously associated with individual severity of SARS-CoV-2^33,34. Interestingly, previous studies have also identified that chronic kidney disease is associated with developing severe conditions in COVID-19 patients^35,36,37, where it has been observed that patients with higher levels of creatinine and urea are more at risk³⁸. The Braden Scale, which is often used as a predictor for pressure ulcers and is also a common clinical classification scale for predicting pneumonia³⁹, is assessed during clinical reception; in this study, it was an important predictor for negative prognosis in COVID-19 patients. The scale assigns scores between 1 (worst score) and 4 (best score) to factors that include sensory perception, skin moisture, activity, mobility, nutritional status and friction⁴⁰. The percentage of lymphocytes in the blood has been described as a strong predictor of prognosis for the severity of the new coronavirus. A descriptive and predictive study by Tan et al.⁴¹ suggested that, in most confirmed cases, the percentage of lymphocytes was reduced to 5% within 2 weeks after the onset of COVID-19, in line with the findings of other studies⁴².

The study has a few limitations that need to be mentioned. First, some of the outcomes overlap, which may have helped the performance of the aggregated models, even though in the majority of cases the outcomes were independent. In the case of ICU admission, 55% of the patients did not die or use MV, while in the case of MV and death, 63% and 70% of the training data for their respective aggregated models came from other outcomes. Ideally, the outcomes would never overlap, but this is clinically unfeasible given the interlaced nature of negative prognostic outcomes. Another limitation is that we analyzed data from an urban COVID-19 hotspot in Brazil, in a period when clinical protocols for the disease were still being established, so this could affect the incidence of prognostic outcomes, and the results may not directly generalize to other periods.

In conclusion, we found that machine learning algorithms can predict severe outcomes in COVID-19 patients with high performance, including previously unobserved outcomes, using only routinely-collected laboratory, clinical and demographic data. The use of multipurpose algorithms for the prediction of overall negative prognosis is a promising new area that can support doctors with clinical and administrative decisions, especially regarding priorities for hospital admission and monitoring.

Methods

Data source

We followed a cohort of 3280 patients with an RT-PCR diagnostic exam for COVID-19 from a large hospital chain in the city of São Paulo (BP-A Beneficência Portuguesa de São Paulo) between March 1, 2020, and June 28, 2020. Of these, 1040 (31.7%) patients were positive for COVID-19 and were included in the analysis. The study was approved by the Institutional Review Board (IRB) of BP—A Beneficência Portuguesa de São Paulo (CAAE:31177220.4.3001.5421), including a waiver of informed consent. The study followed the guidelines of the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD)⁴³.

Individual patient data was collected from electronic medical records. We included as predictors only variables collected in early hospital admission, i.e. within 24 h before and 24 h after the RT-PCR exam. From a total of 82 routinely-collected variables from the hospital, 57 variables were selected for the development of the predictive models, after removing variables with 90% or higher missing values, highly-correlated variables (above 0.9) and identifying variables such as patient number and hospital identification variables. The flowchart for feature selection is described in Supplementary Fig. 2 and the complete variable list, including demographic data, laboratory tests and vital signs is described in Supplementary Table 1. Figure 3 illustrates the overall process.
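The filtering rules described above (drop identifiers, drop variables with 90% or more missing values, drop one of each pair of variables correlated above 0.9) can be sketched in plain Python. The column names and values are hypothetical, and keeping the first variable of a correlated pair is an assumption made for this example; the actual procedure is described in Supplementary Fig. 2.

```python
from math import sqrt
from typing import Dict, List, Optional, Set

Column = List[Optional[float]]  # None marks a missing value


def missing_fraction(col: Column) -> float:
    return sum(v is None for v in col) / len(col)


def pearson(a: Column, b: Column) -> float:
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_features(table: Dict[str, Column], id_columns: Set[str],
                    max_missing: float = 0.9, max_corr: float = 0.9) -> List[str]:
    """Greedy filter: a column is kept unless it is an identifier, is mostly
    missing, or is highly correlated with a column that was already kept."""
    kept: List[str] = []
    for name, col in table.items():
        if name in id_columns:
            continue
        if missing_fraction(col) >= max_missing:
            continue
        if any(abs(pearson(col, table[k])) > max_corr for k in kept):
            continue
        kept.append(name)
    return kept


crp: Column = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]          # toy values
table: Dict[str, Column] = {
    "patient_id": list(range(10)),                      # identifier
    "crp": crp,
    "crp_doubled": [2 * v for v in crp],                # perfectly correlated with crp
    "rare_lab": [None] * 9 + [3.0],                     # 90% missing
    "age": [50, 40, 60, 30, 70, 20, 80, 10, 90, 55],
}
selected = select_features(table, id_columns={"patient_id"})
```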

Machine learning techniques

Five of the most popular machine learning models for structured data (artificial neural networks⁴⁴, extra trees⁴⁵, random forests⁴⁶, catboost⁴⁷, and extreme gradient boosting⁴⁸) were trained with 70% of the data and tested on the other 30%, simulating new unknown data. All the results reported in this study are from the test set. K-fold cross-validation with 10 folds was used to adjust the hyperparameters with Bayesian optimization (HyperOpt). Due to the unbalanced nature of the outcomes, random undersampling was performed in the training set, by randomly selecting examples from the majority class for exclusion. This technique was implemented using the RandomUnderSampler class from the imbalanced-learn library⁴⁹.
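The split and undersampling steps can be illustrated without external libraries. The study used the RandomUnderSampler class for the second step; the functions below are a hand-written stand-in for illustration, the seed is arbitrary, and the toy label vector only mimics the roughly 28% share of severe cases.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split_indices(n: int, test_fraction: float = 0.3,
                             seed: int = 42) -> Tuple[List[int], List[int]]:
    """Random (non-stratified) split of row indices into train and test."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def random_undersample(indices: Sequence[int], labels: Sequence[int],
                       seed: int = 42) -> List[int]:
    """Drop random majority-class rows until both classes have the same size."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Toy labels: 28 severe cases out of 100 patients.
labels = [1] * 28 + [0] * 72
train_idx, test_idx = train_test_split_indices(len(labels))

# Undersampling is applied to the training rows only; the test set keeps
# its natural class balance so that the reported metrics are not distorted.
balanced_idx = random_undersample(train_idx, labels)
```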

Variables with more than two categories were represented by a set of dummy variables, with one variable for each category. Continuous variables were standardized using the z-score. Variables with a correlation greater than 0.90 (mean arterial pressure, total bilirubin, and creatine kinase) were discarded, and missing values were imputed by the median. To assess the performance of the models, measures such as accuracy, sensitivity (also known as recall), specificity, positive predictive value (PPV) (also known as precision), negative predictive value (NPV), and F1 score were analyzed. The value of the AUROC was used to select the best model. To understand the individual contribution of each variable to the predictive models, we calculated their respective Shapley values. All the analyses were performed using the Python programming language with the scikit-learn library.
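The three preprocessing steps named above (median imputation, z-score standardization, dummy variables) are simple enough to write out by hand. The sketch below uses only the standard library for illustration, whereas the study used scikit-learn; the sample values and the category names are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Optional, Sequence


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed ones."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize to mean 0 and unit variance (population standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def one_hot(values: Sequence[str]) -> Dict[str, List[int]]:
    """One dummy variable per category, as described in the text."""
    return {f"is_{c}": [int(v == c) for v in values] for c in sorted(set(values))}


crp_raw = [10.0, None, 30.0, 20.0]              # toy values with one missing entry
crp_imputed = impute_median(crp_raw)
crp_scaled = zscore(crp_imputed)
dummies = one_hot(["ward", "emergency", "ward"])  # hypothetical categories
```

In a train/test setting, the median, mean and standard deviation would normally be estimated on the training set and then reused for the test set, so that no information from the test patients leaks into the model; the text does not state how this was handled.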

Data availability

The data come from electronic medical records from BP—A Beneficência Portuguesa de São Paulo Hospital in Brazil and are not publicly available, as they contain sensitive patient information.

Code availability

All the code written to process and analyze the data can be made available upon request to the corresponding author.

References

1. Bong, C.-L. et al. The COVID-19 pandemic: Effects on low- and middle-income countries. Anesth. Analg. 131, 86–92 (2020).
2. Stewart, R., El-Harakeh, A. & Cherian, S. A. Evidence synthesis communities in low-income and middle-income countries and the COVID-19 response. Lancet 396, 1539–1541 (2020).
3. Walker, P. G. T. et al. The impact of COVID-19 and strategies for mitigation and suppression in low- and middle-income countries. Science 369(6502), 413–422. https://doi.org/10.1126/science.abc0035 (2020).
4. Da Silveira, M. R. COVID-19: Intensive care units, mechanical ventilators, and latent mortality profiles associated with case-fatality in Brazil. Cad. Saude Publica 36(5), 1–12 (2020).
5. Cheng, F.-Y. et al. Using machine learning to predict ICU transfer in hospitalized COVID-19 patients. J. Clin. Med. 9(6). https://doi.org/10.3390/jcm9061668 (2020).
6. Cao, X. COVID-19: Immunopathology and its implications for therapy. Nat. Rev. Immunol. 20, 269–270. https://doi.org/10.1038/s41577-020-0308-3 (2020).
7. Candido, D. et al. Evolution and epidemic spread of SARS-CoV-2 in Brazil. Science 369(6508), 1255–1260. https://doi.org/10.1126/science.abd2161 (2020).
8. Noronha, K. V. M. S. et al. The COVID-19 pandemic in Brazil: Analysis of supply and demand of hospital and ICU beds and mechanical ventilators under different scenarios. Cad. Saude Publica 36, 1–17 (2020).
9. Palamim, C. V. C. & Marson, F. A. L. COVID-19—The availability of ICU beds in Brazil during the onset of pandemic. Ann. Glob. Heal. 86, 100 (2020).
10. Castro, M. C., Carvalho, L. R. De, Chin, T. & Kahn, R. Demand for hospitalization services for COVID-19 patients in Brazil. medRxiv. https://doi.org/10.1101/2020.03.30.20047662 (2020).
11. Souza, W. M. et al. Epidemiological and clinical characteristics of the COVID-19 epidemic in Brazil. Nat. Hum. Behav. 4, 856–865 (2020).
12. Zhou, F. et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study. Lancet 395(10229), 1054–1062. https://doi.org/10.1016/S0140-6736(20)30566-3 (2020).
13. Hirayama, A. et al. The characteristics and clinical course of patients with COVID-19 who received invasive mechanical ventilation in Osaka, Japan. Int. J. Infect. Dis. 102, 282–284 (2020).
14. CDC. Interim clinical guidance for management of patients with confirmed coronavirus disease (COVID-19). (2020). https://www.cdc.gov/coronavirus/2019-ncov/hcp/clinical-guidance-management-patients.html. (Accessed 7 December 2020).
15. Yang, X. et al. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: A single-centered, retrospective, observational study. Lancet Respir. Med. 8, 475–481 (2020).
16. Serafim, R. B., Póvoa, P., Souza-Dantas, V., Kalil, A. C. & Salluh, J. I. F. Clinical course and outcomes of critically ill patients with COVID-19 infection: A systematic review. Clin. Microbiol. Infect. https://doi.org/10.1016/j.cmi.2020.10.017 (2020).
17. Zhang, L. et al. D-dimer levels on admission to predict in-hospital mortality in patients with COVID-19. J. Thromb. Haemost. 18, 1324–1329 (2020).
18. Qin, L. et al. A predictive model and scoring system combining clinical and CT characteristics for the diagnosis of COVID-19. Eur. Radiol. https://doi.org/10.1007/s00330-020-07022-1 (2020).
19. Wang, S. et al. A fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis. Eur. Respir. J. https://doi.org/10.1183/13993003.00775-2020 (2020).
20. DeCaprio, D. et al. Building a COVID-19 vulnerability index. medRxiv. https://doi.org/10.1101/2020.03.16.20036723 (2020).
21. Yan, L. et al. An interpretable mortality prediction model for COVID-19 patients. Nat. Mach. Intell. 2, 283–288 (2020).
22. Batista, A. F. M., Miraglia, J. L., Donato, H. R. & Chiavegatto Filho, A. D. P. COVID-19 diagnosis prediction in emergency care patients: A machine learning approach. medRxiv. https://doi.org/10.1101/2020.04.04.20052092 (2020).
23. Heldt, F. S. et al. Early risk assessment for COVID-19 patients from emergency department data using machine learning. medRxiv. https://doi.org/10.1101/2020.05.19.20086488 (2020).
24. Terpos, E. et al. Hematological findings and complications of COVID-19. Am. J. Hematol. 95(7), 834–847 (2020).
25. Wynants, L. et al. Prediction models for diagnosis and prognosis of COVID-19 infection: Systematic review and critical appraisal. BMJ 369. https://doi.org/10.1136/bmj.m1328 (2020).
26. Gao, Y. et al. Machine learning based early warning system enables accurate mortality risk prediction for COVID-19. Nat. Commun. 11, 5033 (2020).
27. Junior, J. C., Andrade, A. B. & Carvalho, W. B. Evaluation of the use of electronic medical record systems in Brazilian intensive care units. Rev. Bras. Ter. Intensiva 30, 338–346 (2018).
28. WHO. Coronavirus disease (COVID-19) weekly epidemiological update and weekly operational update. (2020). https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports. (Accessed 9 December 2020).
29. Satomi, E. et al. Fair allocation of scarce medical resources during COVID-19 pandemic: ethical considerations. Einstein 18. https://doi.org/10.31744/einstein_journal/2020ae5775 (2020).
30. Dondorp, A. M., Hayat, M., Aryal, D., Beane, A. & Schultz, M. J. Respiratory support in COVID-19 patients, with a focus on resource-limited settings. Am. J. Trop. Med. Hyg. 102, 1191–1197 (2020).
31. Rath, D. et al. Impaired cardiac function is associated with mortality in patients with acute COVID-19 infection. Clin. Res. Cardiol. https://doi.org/10.1007/s00392-020-01683-0 (2020).
32. Bajwa, E. K. et al. Plasma C-reactive protein levels are associated with improved outcome in ARDS. Chest 136(2), 471–480 (2009).
33. Chen, W. et al. Plasma CRP level is positively associated with the severity of COVID-19. Ann. Clin. Microbiol. Antimicrob. 19, 18 (2020).
34. Wang, G. et al. C-Reactive protein level may predict the risk of COVID-19 aggravation. Open Forum Infect. Dis. 7. https://doi.org/10.1093/ofid/ofaa153 (2020).
35. Kermali, M., Khalsa, R. K., Pillai, K., Ismail, Z. & Harky, A. The role of biomarkers in diagnosis of COVID-19—A systematic review. Life Sci. 254, 117788 (2020).
36. Henry, B. M. & Lippi, G. Chronic kidney disease is associated with severe coronavirus disease 2019 (COVID-19) infection. Int. Urol. Nephrol. 52(6), 1193–1194 (2020).
37. Cheng, Y. et al. Kidney disease is associated with in-hospital death of patients with COVID-19. Kidney Int. 97, 829–838 (2020).
38. Xiang, J. et al. Potential biochemical markers to identify severe cases among COVID-19 patients. medRxiv. https://doi.org/10.1101/2020.03.19.20034447 (2020).
39. Ding, Y. et al. Braden scale for assessing pneumonia after acute ischaemic stroke. BMC Geriatr. 19, 259 (2019).
40. Suttipong, C. & Sindhu, S. Predicting factors of pressure ulcers in older Thai stroke patients living in urban communities. J. Clin. Nurs. 21(3–4), 372–379 (2011).
41. Tan, L. et al. Lymphopenia predicts disease severity of COVID-19: A descriptive and predictive study. Signal Transduct. Target Ther. 5(1), 33. https://doi.org/10.1038/s41392-020-0148-4 (2020).
42. Huang, I. & Pranata, R. Lymphopenia in severe coronavirus disease-2019 (COVID-19): Systematic review and meta-analysis. J. Intensive Care 8, 36 (2020).
43. Moons, K. G. M. et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): Explanation and elaboration. Ann. Intern. Med. 162(1), W1-73 (2015).
44. Bishop, C. Neural Networks for Pattern Recognition (Oxford University Press, Oxford, 1995).
45. Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach. Learn. https://doi.org/10.1007/s10994-006-6226-1 (2006).
46. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
47. Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv. https://arxiv.org/abs/1810.11363 (2018).
48. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining https://doi.org/10.1145/2939672.2939785 (2016).
49. He, H. & Ma, Y. Imbalanced Learning: Foundations, Algorithms, and Applications (Wiley, New York, 2013).

Acknowledgements

We would like to thank the BP—A Beneficência Portuguesa de São Paulo Hospital for its willingness to contribute to the research. This work was supported by National Council for Scientific and Technological Development (CNPq) under Grant Number 402626/2020-6 and Paraíba Research Foundation FAPESQPB with Grant Number 206/2020.

Author information

Authors and Affiliations

School of Public Health, University of São Paulo, São Paulo, SP, Brazil
Fernando Timoteo Fernandes, Tiago Almeida de Oliveira, Cristiane Esteves Teixeira, Andre Filipe de Moraes Batista & Alexandre Dias Porto Chiavegatto Filho
Fundacentro, São Paulo, SP, Brazil
Fernando Timoteo Fernandes
Statistics Department, Paraíba State University, Paraíba, PB, Brazil
Tiago Almeida de Oliveira
Bioinformatics and Computational Biology Lab, Brazilian National Cancer Institute, Rio de Janeiro, RJ, Brazil
Cristiane Esteves Teixeira
BP-A Beneficência Portuguesa de São Paulo, São Paulo, SP, Brazil
Gabriel Dalla Costa

Contributions

Initial study concept and design: A.D.P.C.F. Acquisition of data: G.D.C. Model training: F.T.F., T.A.O., C.E.T., A.F.M.B. Analysis and interpretation of data: F.T.F., T.A.O., C.E.T., G.D.C., A.D.P.C.F. Drafting of the paper: all authors contributed to drafting the manuscript. Critical revision of the manuscript: all authors provided critical review of the manuscript and approved the final draft for publication.

Corresponding author

Correspondence to Fernando Timoteo Fernandes.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Fernandes, F.T., de Oliveira, T.A., Teixeira, C.E. et al. A multipurpose machine learning approach to predict COVID-19 negative prognosis in São Paulo, Brazil. Sci Rep 11, 3343 (2021). https://doi.org/10.1038/s41598-021-82885-y

Received: 19 August 2020
Accepted: 14 January 2021
Published: 08 February 2021
DOI: https://doi.org/10.1038/s41598-021-82885-y
