Sepsis is the third leading cause of death worldwide and the main cause of mortality in hospitals1,2,3, but the best treatment strategy remains uncertain. In particular, evidence suggests that current practices in the administration of intravenous fluids and vasopressors are suboptimal and likely induce harm in a proportion of patients1,4,5,6. To tackle this sequential decision-making problem, we developed a reinforcement learning agent, the Artificial Intelligence (AI) Clinician, which extracted implicit knowledge from a volume of patient data many-fold greater than the lifetime experience of human clinicians and learned optimal treatment by analyzing a myriad of (mostly suboptimal) treatment decisions. We demonstrate that the value of the AI Clinician’s selected treatment is on average reliably higher than that of human clinicians. In a large validation cohort independent of the training data, mortality was lowest in patients for whom clinicians’ actual doses matched the AI decisions. Our model provides individualized and clinically interpretable treatment decisions for sepsis that could improve patient outcomes.
MIMIC-III is openly available. Access to the eRI data is restricted to the Philips eICU Research Institute. The eICU Collaborative Research Database contains a sample of over 200,000 patient stays from the eRI database that is freely available. The databases were queried in pgAdmin 4 v 1.3, and computations were implemented in Matlab R2017a (MathWorks, Inc.). Access to the computer code used in this research is available by request to the corresponding authors. To facilitate the reproduction of our results, we provide the list of anonymous patient identifiers for both databases in Supplementary Data 1 and 2.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Gotts, J. E. & Matthay, M. A. Sepsis: pathophysiology and clinical management. Br. Med. J. 353, i1585 (2016).
Torio, C. M. & Andrews, R. M. National Inpatient Hospital Costs: The Most Expensive Conditions by Payer, 2011: Statistical Brief #160. in Healthcare Cost and Utilization Project (HCUP) Statistical Briefs (Agency for Health Care Research and Quality, Rockville, MD, USA, 2013).
Liu, V. et al. Hospital deaths in patients with sepsis from 2 independent cohorts. J. Am. Med. Assoc. 312, 90–92 (2014).
Byrne, L. & Van Haren, F. Fluid resuscitation in human sepsis: time to rewrite history? Ann. Intensive Care 7, 4 (2017).
Marik, P. E. The demise of early goal-directed therapy for severe sepsis and septic shock. Acta Anaesthesiol. Scand. 59, 561–567 (2015).
Marik, P. & Bellomo, R. A rational approach to fluid therapy in sepsis. Br. J. Anaesth. 116, 339–349 (2016).
Singer, M. et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). J. Am. Med. Assoc. 315, 801–810 (2016).
Waechter, J. et al. Interaction between fluids and vasoactive agents on mortality in septic shock: a multicenter, observational study. Crit. Care Med. 42, 2158–2168 (2014).
Bai, X. et al. Early versus delayed administration of norepinephrine in patients with septic shock. Crit. Care 18, 532 (2014).
Marik, P. E., Linde-Zwirble, W. T., Bittner, E. A., Sahatjian, J. & Hansell, D. Fluid administration in severe sepsis and septic shock, patterns and outcomes: an analysis of a large national database. Intensive Care Med. 43, 625–632 (2017).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction. 1st edn (MIT Press, Cambridge, MA, USA, 1998).
Bennett, C. C. & Hauser, K. Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach. Artif. Intell. Med. 57, 9–19 (2013).
Schaefer, A. J., Bailey, M. D., Shechter, S. M. & Roberts, M. S. Modeling Medical Treatment Using Markov Decision Processes. in Operations Research and Health Care (eds. Brandeau, M. L., Sainfort, F. & Pierskalla, W. P.) 593–612 (Springer, Boston, 2005).
Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. J. Am. Med. Assoc. 316, 2402–2410 (2016).
Prasad, N., Cheng, L.-F., Chivers, C., Draugelis, M. & Engelhardt, B. E. A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units. Preprint at https://arxiv.org/abs/1704.06300 (2017).
Bothe, M. K. et al. The use of reinforcement learning algorithms to meet the challenges of an artificial pancreas. Expert Rev. Med. Devices 10, 661–673 (2013).
Lowery, C. & Faisal, A. A. Towards efficient, personalized anesthesia using continuous reinforcement learning for propofol infusion control. in International IEEE/EMBS Conference on Neural Engineering 1414–1417 (IEEE, San Diego, CA, USA, 2013).
Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016).
Elixhauser, A., Steiner, C., Harris, D. R. & Coffey, R. M. Comorbidity measures for use with administrative data. Med. Care 36, 8–27 (1998).
Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. (Wiley-Interscience, Hoboken, NJ, USA, 2014).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction. 2nd edn (MIT Press, Cambridge, MA, USA, 2018).
Thomas, P. S., Theocharous, G. & Ghavamzadeh, M. High-Confidence Off-Policy Evaluation. in Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI, Palo Alto, CA, USA, 2015).
Hanna, J. P., Stone, P. & Niekum, S. Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation. Preprint at https://arxiv.org/abs/1606.06126 (2016).
Thomas, P. S., Theocharous, G. & Ghavamzadeh, M. High confidence policy improvement. in Proceedings of the 32nd International Conference on Machine Learning 2380–2388 (PMLR, Lille, France, 2015).
Acheampong, A. & Vincent, J.-L. A positive fluid balance is an independent prognostic factor in patients with sepsis. Crit. Care 19, 251 (2015).
Johnson, A. E. W. et al. Machine learning and decision support in critical care. Proc. IEEE Inst. Electr. Electron Eng. 104, 444–466 (2016).
Vincent, J.-L. The future of critical care medicine: integration and personalization. Crit. Care Med. 44, 386–389 (2016).
Chen, J. H. & Asch, S. M. Machine learning and prediction in medicine—beyond the peak of inflated expectations. N. Engl. J. Med. 376, 2507–2509 (2017).
Gordon, A. C. et al. Levosimendan for the prevention of acute organ dysfunction in sepsis. N. Engl. J. Med. 375, 1638–1648 (2016).
Ranieri, V. M. et al. Drotrecogin alfa (activated) in adults with septic shock. N. Engl. J. Med. 366, 2055–2064 (2012).
Seymour, C. W. et al. Assessment of clinical criteria for sepsis: for the third international consensus definitions for sepsis and septic shock (Sepsis-3). J. Am. Med. Assoc. 315, 762–774 (2016).
Raith, E. P. et al. Prognostic accuracy of the SOFA Score, SIRS Criteria, and qSOFA score for in-hospital mortality among adults with suspected infection admitted to the intensive care unit. J. Am. Med. Assoc. 317, 290–300 (2017).
Hug, C. W. Detecting hazardous intensive care patient episodes using real-time mortality models. PhD thesis, Massachusetts Institute of Technology. (2009).
Tutz, G. & Ramzan, S. Improved methods for the imputation of missing data by nearest neighbor methods. Comput. Stat. Data Anal. 90, 84–99 (2015).
Arthur, D. & Vassilvitskii, S. K-means++: The Advantages of Careful Seeding. in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms 1027–1035 (Society for Industrial and Applied Mathematics, Philadelphia, 2007).
Jones, R. H. Bayesian information criterion for longitudinal and clustered data. Stat. Med. 30, 3050–3056 (2011).
Brown, S. M. et al. Survival after shock requiring high-dose vasopressor therapy. Chest 143, 664–671 (2013).
Norris, J. R. Discrete-time Markov chains. in Markov Chains (Cambridge University Press, Cambridge, UK, 1997).
Jiang, N. & Li, L. Doubly robust off-policy value evaluation for reinforcement learning. Preprint at https://arxiv.org/abs/1511.03722 (2015).
Thomas, P. S. & Brunskill, E. Data-efficient off-policy policy evaluation for reinforcement learning. Preprint at https://arxiv.org/abs/1604.00923 (2016).
Precup, D., Sutton, R. S. & Singh, S. P. Eligibility Traces for off-policy policy evaluation. in Proceedings of the Seventeenth International Conference on Machine Learning 759–766 (Morgan Kaufmann Publishers Inc., Burlington, MA, USA, 2000).
Munos, R., Stepleton, T., Harutyunyan, A. & Bellemare, M. G. Safe and efficient off-policy reinforcement learning. Preprint at https://arxiv.org/abs/1606.02647 (2016).
We are grateful to F. Doshi-Velez and O. Gottesman for their assistance with the methodology. We are grateful for support from the National Institute for Health Research (NIHR) Comprehensive Biomedical Research Centre based at Imperial College Healthcare NHS Trust and Imperial College London. We are thankful to the Laboratory of Computational Physiology at the Massachusetts Institute of Technology and the eICU Research Institute for providing the data used in this research. M.K. and this project are funded by the Engineering and Physical Sciences Research Council and an Imperial College President’s PhD Scholarship. A.C.G. is funded by an NIHR Research Professorship award (RP-2015-06-018). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.