Abstract
Interest in machine-learning applications within medicine has been growing, but few studies have progressed to deployment in patient care. We present a framework, context and ultimately guidelines for accelerating the translation of machine-learning-based interventions in health care. To be successful, translation will require a team of engaged stakeholders and a systematic process from beginning (problem formulation) to end (widespread deployment).
Change history
19 September 2019
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Acknowledgements
The authors would like to thank the participants in the MLHC Conference 2018 (http://www.mlforhc.org), specifically the organizers and participants of the pre-meeting workshop that served as the genesis for this manuscript, for providing valuable feedback on the initial ideas through a panel discussion.
Ethics declarations
Competing interests
J.W., F.D.-V., D.K. and K.J. are on the board of Machine Learning for Healthcare, a non-profit organization that hosts a yearly academic meeting; they are reimbursed for registration and travel expenses. F.D.-V. consults for DaVita, a healthcare company. S.T.-I. serves on the board of Scients (https://scients.org/) and is reimbursed for travel expenses. S.S. is a founder of, and holds equity in, Bayesian Health. The results of the study discussed in this publication could affect the value of Bayesian Health. This arrangement has been reviewed and approved by Johns Hopkins University in accordance with its conflict-of-interest policies. S.S. is a member of the scientific advisory board for PatientPing. M. Sendak is a named inventor of the Sepsis Watch deep-learning model, which was licensed from Duke University by Cohere Med, Inc. M. Sendak does not hold any equity in Cohere Med, Inc. M. Saeed is a founder and Chief Medical Officer at HEALTH at SCALE Technologies and holds equity in this company. P.O. consults for Roche-Genentech, from whom she has received travel reimbursement and consulting fees of less than $4,000/year. A.G., K.H., M.G. and V.L. have no conflicts to declare.
Additional information
Peer review information: Joao Monteiro was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
About this article
Cite this article
Wiens, J., Saria, S., Sendak, M. et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med 25, 1337–1340 (2019). https://doi.org/10.1038/s41591-019-0548-6
DOI: https://doi.org/10.1038/s41591-019-0548-6