
The potential of artificial intelligence to improve patient safety: a scoping review


Artificial intelligence (AI) represents a valuable tool that could be used to improve the safety of care. Major adverse events in healthcare include: healthcare-associated infections, adverse drug events, venous thromboembolism, surgical complications, pressure ulcers, falls, decompensation, and diagnostic errors. The objective of this scoping review was to summarize the relevant literature and evaluate the potential of AI to improve patient safety in these eight harm domains. A structured search was used to query MEDLINE for relevant articles. The scoping review identified studies that described the application of AI for prediction, prevention, or early detection of adverse events in each of the harm domains. The AI literature was narratively synthesized for each domain, and findings were considered in the context of incidence, cost, and preventability to make projections about the likelihood of AI improving safety. Three-hundred and ninety-two studies were included in the scoping review. The literature provided numerous examples of how AI has been applied within each of the eight harm domains using various techniques. The most common novel data were collected using different types of sensing technologies: vital sign monitoring, wearables, pressure sensors, and computer vision. There are significant opportunities to leverage AI and novel data sources to reduce the frequency of harm across all domains. We expect AI to have the greatest impact in areas where current strategies are not effective, and integration and complex analysis of novel, unstructured data are necessary to make accurate predictions; this applies specifically to adverse drug events, decompensation, and diagnostic errors.


Adverse events related to unsafe care represent one of the top ten causes of death and disability worldwide, and a third to a half appear preventable1. Investments in reducing harm can lead to substantial savings, and more importantly improve patient outcomes.

Twenty years after the Institute of Medicine’s “To Err Is Human” report, problems with safety remain all too common2 despite patient-centered strategies to create a culture of safety, such as implementation of inpatient checklists and computerization of prescribing and bar-coding3,4,5,6. Safety issues outside the hospital have received much less attention than hospital safety, even though care is increasingly being shifted outside the hospital.

The application of artificial intelligence (AI) has tremendous potential as a tool for improving safety, both inside and outside of the hospital, by helping to predict harms, to collect a variety of data (both new and already available), and to support quality improvement initiatives. For instance, AI can provide decision support by identifying patients at high risk of hospital harm to guide prevention and early intervention strategies. Similarly, AI can be applied in outpatient, community, and home settings. When coupled with digital approaches, these technologies can improve communication between patients and healthcare providers to reduce the frequency of preventable harms. While existing data will be helpful, new data will become available through technologies such as sensors and should improve predictions.

AI techniques, such as machine learning (ML), can be leveraged to provide clinical risk prediction to improve patient safety. Data-driven ML algorithms have advantages over rule-based approaches for risk prediction, as they allow simultaneous consideration of multiple data sources to identify predictors and outcomes. Healthcare organizations are increasingly implementing ML and other forms of AI to improve patient care and outcomes. However, substantial impacts on safety and reductions in the costs associated with safety issues will require further acceptance of these technologies across the larger ecosystem, including regulatory agencies and the marketplace.

Evidence suggests that the majority of healthcare harms fall into the following domains: healthcare-associated infections (HAIs), adverse drug events (ADEs), venous thromboembolism (VTE), surgical complications, pressure ulcers, falls, insufficient decompensation detection, and diagnostic errors—including missed and delayed diagnoses7,8. These domains are centered around hospital harm, and other issues undoubtedly play a role, but these adverse events account for the bulk of harm in hospitals. The goal of this paper was to conduct a scoping review to evaluate if AI has the potential to improve healthcare safety by reducing the frequency of adverse events within these eight major domains of harm.


This scoping review is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR)9.

Search strategy

A structured search was used to query MEDLINE (Ovid) for relevant articles published on or before October 25, 2019. Two main concepts of AI and patient safety, including the eight harm domains, were mapped to the most relevant controlled vocabulary using Medical Subject Headings (MeSH), and free-text terms were added where necessary. The full search strategy is provided in Supplementary Note 1.

Inclusion and exclusion criteria

The scoping review included studies that focused on the application of AI for prediction, prevention, and/or early detection of events in each of the harm domains in hospital, outpatient, community, and home settings. No comparisons were required, and all study designs were considered for inclusion. Articles were excluded if they were not published in the English language or reported on the use of AI to measure the frequency of harm events (e.g., post-marketing surveillance of drugs). Applications in robotics were also excluded. Detailed inclusion and exclusion criteria are provided in Supplementary Table 1.

Screening and data abstraction

Articles were screened in two stages using Covidence (Australia), a web-based review management tool. Titles and abstracts were screened for relevance, and eligible records were evaluated based on full-text articles by a single reviewer. Additional articles were identified through handsearching. For each article included in the scoping review, citation information was exported from Covidence into an Excel spreadsheet and harm domains were manually abstracted by a single reviewer.

Scoping review

The characteristics of studies that reported on the use of AI to improve patient safety were summarized. The literature was narratively synthesized for each harm domain highlighting key examples of how AI can be leveraged for prediction, prevention, and/or early detection of patient harms. Selected examples of traditional and novel data sources that could be used to develop AI algorithms to improve patient safety were summarized in tabular form.

Evaluation of the potential for AI to improve patient safety

The findings of the scoping review were considered in the context of incidence, cost, and preventability of events to evaluate the potential of AI for improving safety. Current literature reporting on incidence, cost, and preventability was summarized for the eight harm domains in tabular form. Cost estimates were adjusted to United States dollars (USD, 2019) using the Producer Price Index to facilitate comparisons across the domains10. Projections around the likelihood of AI to improve safety in each of the harm domains were made and attractive early targets were identified as part of the Discussion.
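The cost adjustment described above amounts to scaling each estimate by a ratio of price index values. The following is a minimal sketch of that arithmetic, not the authors' actual procedure: the function name, the index values, and the example cost are all hypothetical and chosen only for illustration; real values would come from the Producer Price Index series cited in ref. 10.

```python
def adjust_to_base_year(cost, index_source_year, index_base_year):
    """Scale a cost from its source year to the base year using a price index ratio."""
    if index_source_year <= 0 or index_base_year <= 0:
        raise ValueError("index values must be positive")
    return cost * index_base_year / index_source_year


# Hypothetical index values for illustration only (NOT actual BLS figures).
hypothetical_index = {2012: 100.0, 2019: 112.0}

# Made-up cost estimate reported in 2012 dollars, in billions.
cost_2012 = 25.0

cost_2019 = adjust_to_base_year(
    cost_2012, hypothetical_index[2012], hypothetical_index[2019]
)
print(f"{cost_2019:.1f} billion (USD, 2019)")
```

Expressing every estimate in the same base year is what makes the cost figures comparable across the eight harm domains.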


Characteristics of included studies

From 2677 unique records, 392 articles met the inclusion criteria for the scoping review and are presented in Supplementary Table 2. A modified Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram is provided in Fig. 1. The majority of studies were pre-clinical and relied on retrospective analyses of data. Most algorithms were not externally validated or tested prospectively. The incidence, cost, and preventability of events for each harm domain are presented in Table 1. Traditional and novel data sources that can be used to develop AI algorithms are presented in Table 2.

Fig. 1: PRISMA flow diagram showing disposition of articles.

The asterisk denotes that some studies addressed multiple harm domains.

Table 1 Incidence, cost, and preventability of events in the eight harm domains from the peer-reviewed literature.
Table 2 Traditional and novel (italicized) data sources that can be used to develop artificial intelligence algorithms to improve patient safety; selected examples.

Healthcare-associated infections

Approximately 3.2% of inpatients experienced HAIs in 2015 (ref. 11). The estimated annual cost for five significant HAIs is 10.7 billion (USD, 2019)12. Up to 70% of specific HAIs are considered preventable using existing evidence-based strategies13. The scoping review identified 54 articles (see Supplementary Table 2) describing the use of AI for prediction or early detection of HAIs.

ML and fuzzy logic (i.e., logical reasoning models based on incomplete or ambiguous data) have been applied for early detection of HAIs. Most algorithms were developed using claims-based data and information captured in electronic health records (EHRs) including laboratory test results and diagnostic imaging. With the integration of novel complex data, AI-based analytics could expedite detection and further improve diagnostic accuracy. For example, data from eNoses (i.e., chemical vapor sensors) have been analyzed using ML methods to rapidly detect ventilator-associated pneumonia (area under the curve (AUC) = 0.98), differentiate between six common wound pathogens (accuracy = 78%), and classify various strains of Clostridium difficile (sensitivities >80%; specificities >73%)14,15,16.

AI can also contribute to infection control by providing real-time, accurate predictions of HAI risk to guide patient-specific interventions before an infection occurs. For example, a random forest classification algorithm can predict onset of central line-associated bloodstream infections with an AUC of 0.82 (ref. 17).
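Because AUC is the performance measure quoted most often in this review, a small worked example may help readers interpret it. The sketch below computes AUC as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The labels and scores are made-up values, not data from any cited study, and the function name is an assumption for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = patient developed the adverse event, 0 = did not.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.4]
example_auc = auc(labels, scores)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a reported value of 0.82 means the model ranks an affected patient above an unaffected one in roughly 82% of such pairs.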

AI can also play a role in improving adherence to existing safety protocols; for instance, computer vision using a convolutional network classifier has been applied to monitor hand hygiene compliance in the hospital setting (accuracy = 75%). Similarly, an ML algorithm was developed to provide real-time hand hygiene alerts in the outpatient setting based on data from multiple types of sensors, improving compliance from 54% to 100% (refs. 18,19). These technologies are increasingly being applied to complex problems and could be used to improve other aspects of infection control, including sanitation or adherence to condition-specific safety protocols20,21.

Adverse drug events

In 2014, ADEs were associated with 1.6 million hospitalizations in the U.S., totaling an estimated 30.0 billion (USD, 2019), with ~½ million ADEs occurring during hospital stays (2.1% of inpatients) and ~1 million present on admission (5.1% of admissions)22. About one in four ADEs are considered preventable given what is known today23. The review located 52 papers (see Supplementary Table 2) about leveraging AI to reduce the frequency of ADEs.

AI-based analytics can be applied to predict previously unreported ADEs based on drug similarities including chemical structure, mechanism of action, and polypharmacy side effects24,25. Deep learning methods using neural fingerprints have been shown to not only predict adverse drug reactions with an AUC of ~0.85, but also identify the associated molecular sub-structures26. These algorithms can inform the evidence-based development of safer medications. Similar techniques can be applied to predict drug–drug interactions for untested combinations of drugs24.

At the point of care, ML can be applied to analyze multiple datasets, combining traditional patient data documented in EHRs (e.g., medical history, laboratory test results) with novel data (e.g., bioactivity of single nucleotide polymorphisms (SNPs)), to provide personalized ADE risk estimates and treatment recommendations to support decision making. Using genomic sequencing data, an artificial neural network (ANN) algorithm was developed to guide safer and more effective dosing of warfarin, predicting therapeutic dose with an accuracy of 83% in patients with international normalized ratios (INRs) >3.5 (ref. 27).

Venous thromboembolism

Approximately 3.3% of inpatients develop VTEs, including deep venous thromboses (DVT) and pulmonary emboli (PE), with an estimated cost of 15.1–30.4 billion (USD, 2019) annually7,28. Adherence to current evidence-based strategies could prevent up to 70% of healthcare-associated VTEs29.

AI techniques can be used to identify patients at high risk for VTEs. The review located 26 articles (see Supplementary Table 2) about AI algorithms to prevent or safely rule out VTE. One study applied a super learner ensemble approach to identify inpatients at higher risk of future VTEs with an AUC of 0.69 (ref. 30). Prediction can also be applied to manage at-risk populations in the outpatient setting; for example, a multiple kernel learning algorithm was developed to predict VTE risk among patients undergoing chemotherapy with a sensitivity of 89%, markedly outperforming the recommended Khorana score (sensitivity = 11%)31.

AI methods could also recommend optimal patient-specific treatments. As described above, ML leveraging genomic sequencing data was used to guide safer warfarin dosing resulting in a reduced time to achieving a therapeutic INR (OR = 6.7) compared with standard clinical dosing27.

To date, AI has mostly contributed to VTE detection through the analysis of diagnostic imaging or radiologic reports. ML methods can also be applied to guide appropriate use of diagnostic imaging. For example, an ANN was applied to safely rule out DVT without ultrasonography in 38% of patients with a false-negative rate of only 0.2% (ref. 32). Similarly, an ANN model was developed to guide computed tomography use for diagnosis of PE33. The algorithm achieved an AUC of 0.90 using an internal validation sample and 0.71 using external data, reiterating the importance of external validation for all AI or ML models.

Surgical complications

Surgical complications are common; 16.0% of patients receiving invasive procedures experience a post-operative complication within 30 days34. Annual U.S. costs associated with complications following emergency general surgery are 7.5 billion (USD, 2019)35. It is estimated that 42.1% of complications following emergency non-trauma surgery are preventable36.

ML use cases include predicting adverse events in both the operative and post-operative setting. Eighty-one papers that leveraged AI to reduce surgical complications were located through the scoping review (see Supplementary Table 2). Predicting blood loss, need for prolonged post-operative intubation, post-operative mortality, pain, nausea, and vomiting all represent areas with demonstrated improvements over current risk tools37,38,39,40. For example, an ANN-based model achieved an accuracy of 92% at stratifying post-operative bleeding risk in patients undergoing cardiopulmonary bypass37. Another ANN algorithm was developed to predict the need for prolonged ventilation after coronary bypass grafting (AUC = 0.71–0.73)38. Early intervention in these situations could translate into substantial improvements in patient safety.

An area of active research is the use of ML to recognize critical procedural steps in intra-operative videos. ANNs have been trained to identify the steps of laparoscopic sleeve gastrectomy procedures with an accuracy of 82%, and to determine whether the critical view of safety had been achieved in laparoscopic cholecystectomy videos, yielding an accuracy of 95% (refs. 41,42). ML algorithms that can identify key operative components might be used in the future during procedures to warn surgeons of deviations from an expected sequence of steps or omission of critical elements. Other ML approaches in surgery on the horizon include computer precision pre-operative evaluation, augmented reality in the operating room, technical skills augmentation such as suturing, and ultimately autonomous robotic surgery43.

Pressure ulcers

Approximately 2.7% of hospitalized patients in the U.S. develop a pressure ulcer44. The annual financial burden associated with treatment is estimated to be 28.2 billion (USD, 2019)45. Up to 97% of hospital-acquired pressure ulcers are preventable46.

The scoping review identified 18 articles (see Supplementary Table 2) that used AI for management of pressure ulcers. To date, most AI research in this area has focused on using sensor data for early detection; as such, using AI to predict future risk remains an area of opportunity. A recent study developed a random forest model using EHR data to classify critical care patients based on their risk of developing pressure ulcers (AUC = 0.79 vs. 0.68 for the Braden Scale)47. Earlier studies tested the feasibility of using smart beds and wheelchair cushions for pressure ulcer detection using fuzzy logic and ML models, respectively48,49. By tracking data from embedded sensors, these algorithms detected a lack of movement and identified specific areas of skin that were at risk of developing an ulcer. Although the models achieved detection accuracies of up to 90% in experimental settings, their application and utility in notifying care providers and promoting early intervention remain uncertain.


Falls

In 2014, 7.0 million fall-related injuries occurred among adults aged 65 and older50. These falls are estimated to cost 53.4 billion (USD, 2019)51. In the hospital, ~1.1% of inpatients experience a fall and 87.5% of these falls are considered preventable7,46. Forty-seven articles (see Supplementary Table 2) identified through the scoping review described the use of AI for prediction or early detection of falls.

AI approaches could be used to predict fall risk at the point of care using existing data from EHRs. For example, a support vector machine model was able to predict inpatient falls based on data documented from the previous day52. However, the model showed a sensitivity of 65% and a specificity of 70%, which are comparable to existing clinical risk assessments.
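To make these figures concrete, the sketch below shows how sensitivity and specificity are derived from confusion-matrix counts, and how the share of flagged patients who actually fall (the positive predictive value) depends on how common falls are. The counts are hypothetical and were chosen only to reproduce 65% and 70%. Applying the ~1.1% inpatient fall rate quoted above as the prevalence is an assumption for illustration; the population in the cited study may differ.

```python
def sensitivity(tp, fn):
    """Fraction of patients who fell that the model flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of patients who did not fall that the model left unflagged."""
    return tn / (tn + fp)


def positive_predictive_value(sens, spec, prevalence):
    """Fraction of flagged patients who actually fall, via Bayes' rule."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts chosen to give 65% sensitivity and 70% specificity.
tp, fn, tn, fp = 65, 35, 700, 300

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

# Assumed prevalence: the ~1.1% inpatient fall rate quoted in the text.
ppv = positive_predictive_value(sens, spec, prevalence=0.011)

print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.3f}")
```

Under these assumptions only a few percent of flagged patients would go on to fall, which illustrates why moderate sensitivity and specificity can translate into many false alerts when the event is rare.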

Many studies have applied ML methods for the early detection of falls. Classification models using data from wearable sensors in a laboratory setting showed accuracies of 54–84% at stratifying subjects based on their risk of falls53,54. Using data from cameras, smart carpets, and wearable sensors intended for use in the home environment, support vector machine classifiers have been developed to detect falls, as well as to identify deviating gait patterns as predictors of future falls55,56. These models achieved accuracies of up to 100% in fall detection based on experimental and training datasets; however, their usability and applicability in real-world settings need further testing.


Decompensation

Clinical deterioration in the hospital remains common. For example, 3.6% of inpatients develop sepsis, costing an estimated 25.7 billion (USD, 2019) annually57. The failure-to-rescue rate following complications of trauma surgery, such as sepsis, is estimated at 13.2%, and one in four of these deaths are considered preventable58. However, prediction and early detection of decompensation remain a challenge in all areas of medicine.

The review located 84 papers (see Supplementary Table 2) that used AI to predict or detect the early signs of decompensation. Most research has focused on sepsis detection, where AI has shown improvements over traditional methods although, as with most ML algorithms, generalizability may be poor59,60,61,62,63. Detection of decompensation is likely to improve with the addition of new categories of data, including data from biometric sensors (e.g., continuous telemetry) and motion activity sensors (e.g., time spent in the bathroom or bedroom), novel biomarkers, and relevant patient-reported measures64,65,66,67,68,69. For example, ML has been used for early detection of sepsis using novel gene expression biomarkers with AUCs ranging from 0.86 to 0.92 (ref. 68). An AI tool based on a random forest model has also been developed to predict nocturnal hypoglycemia (from midnight to 6 am) with an AUC of 0.84 using continuous glucose monitoring data, providing real-time feedback before the patient goes to sleep to inform optimal diabetes management70.

Diagnostic errors

Diagnostic errors—both missed and delayed diagnoses—are relatively common in both inpatient and outpatient settings and estimated to occur in at least 5.1% of the U.S. population each year, with associated costs exceeding 100 billion (USD, 2016) annually71,72.

The scoping review identified 73 articles (see Supplementary Table 2) that leveraged AI to reduce diagnostic error. ML has been widely shown to reduce errors in the interpretation of imaging73. It has also proven beneficial for early diagnosis of lung cancer by analyzing exhaled breath using an eNose sensor; a support vector machine was able to classify cancer patients vs. non-cancer controls with a sensitivity of 87% and a specificity of 71% (ref. 74). AI techniques are also being applied to reduce delays for critical diagnoses; for example, a clinical decision support system based on fuzzy logic was able to appropriately triage patients presenting to an emergency department with an accuracy of >99%, a 13% increase compared with traditional methods75.

A recent issue of the journal Diagnostics was devoted to this area76, and its articles addressed diagnosis of a number of conditions. Another recent review summarized the main classes of problems that its authors believed AI systems are well suited to solve77.


Based on epidemiologic evidence and our scoping review, we believe that there are major opportunities to improve safety using data and AI across the eight domains to reduce the frequency of harm (Table 3). We expect AI to have the greatest impact in areas where current strategies are not effective, and integration and complex analysis of novel, unstructured data are necessary to make accurate predictions, which applies specifically to ADEs, decompensation, and diagnostic errors.

Table 3 Evaluation of the potential of artificial intelligence to improve patient safety in the eight harm domains.

However, the application of AI and ML to improve patient safety is an emerging field and most of these algorithms have not yet been externally validated or tested prospectively. Promising performance based on development or internal validation samples may not translate into improvements in real-world practice. Algorithms may be limited in generalizability, and performance may be affected by the clinical context where the solution is implemented. Although the level of evidence is modest for all domains, we are highlighting what we believe to be the most promising areas.

Future research must focus on careful evaluation of clinical decision support systems based on AI analytics prior to widespread implementation to ensure safety and accuracy. From a technical perspective, candidate algorithms and tools should be validated at other sites, account for differential performance in subgroups, and explicitly report the uncertainty around any estimates or recommendations78. Furthermore, papers describing model development and performance assessments should adhere to reporting standards for transparency and provide important information about validity, biases, and generalizability to other settings79. Once high-quality AI solutions are developed, additional factors beyond performance must be considered to increase the likelihood of successful implementation and adoption by individual providers. There is an active area of research focused on identifying key barriers and facilitators to implementation of AI-based tools in healthcare78,80,81.
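As one illustration of reporting subgroup performance together with its uncertainty, the sketch below computes accuracy separately for two groups and attaches a percentile bootstrap confidence interval to each. This is a generic technique offered as an example, not a method prescribed by the cited references. The group names, the per-patient outcomes, and the function name are all hypothetical.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n = len(outcomes)
    means = []
    for _ in range(n_resamples):
        # Resample n outcomes with replacement and record the mean.
        sample = [outcomes[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-patient results (1 = prediction correct, 0 = incorrect).
correct_by_group = {
    "site_A": [1] * 85 + [0] * 15,  # 100 patients
    "site_B": [1] * 28 + [0] * 12,  # 40 patients
}

results = {}
for group, outcomes in correct_by_group.items():
    accuracy = sum(outcomes) / len(outcomes)
    low, high = bootstrap_ci(outcomes)
    results[group] = (accuracy, low, high)
    print(f"{group}: accuracy={accuracy:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Smaller subgroups produce wider intervals, so reporting the interval alongside the point estimate shows when an apparent performance gap between groups may simply reflect limited data.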

With data available today, especially laboratory information, imaging, and continuous vital sign data, it should be possible to reduce the frequency of many types of harm. However, even when data are available, they are often unstructured, not recorded in any documented form, or disputed. High-quality, large annotated databases will prove quite fruitful in minimizing patient harm in the future. New types of data, especially from the huge array of sensing technologies becoming available, but also from other sources such as information supplied directly by patients, genomic sequencing, and social media, offer new opportunities to improve predictions as the first step toward development of preventive interventions to improve safety. These types of data are becoming available and more accessible over time for research and to drive innovation82,83,84.

In addition, automated detection of safety issues of all types, but especially harm outside the hospital (e.g., post-marketing surveillance of drugs), will make routine measurement of the frequency of harm possible. While some of this will be rule-based, data-driven AI will also undoubtedly play a role.

This study has several limitations. The search query extracted evidence from a single database to identify published articles focused on the eight harm domains, and other literature may be available. Screening and data abstraction were completed by a single reviewer. The projections were informed by the incidence, cost, and preventability of harm as well as effectiveness of current strategies and promise of AI solutions.


Overall, AI has great potential to improve the safety of care (Fig. 2). In our view, harm domains including ADEs, decompensation, and diagnostic errors represent particularly attractive early targets. Transparent population-based datasets, which include diverse traditional (e.g., EHR, claims) and novel data (e.g., sensors, wearables, broader determinants of health), will be essential to build robust and equitable models. For AI to be effective, implementation of data-driven analytics will require organizations to develop, support, and iterate clinician, team, and system workflows for continued patient safety improvements.

Fig. 2: Summary of major domains of harm and key points.

The first panel highlights the eight major domains of harm. The second panel summarizes the key points from the article.

Data availability

All data generated or analyzed during this study are included in this published article and its Supplementary Information.


References

1. Kohn, L., Corrigan, J. & Donaldson, M. To Err Is Human (National Academies Press, 2000).
2. Bates, D. W. & Singh, H. Two decades since to err is human: an assessment of progress and emerging priorities in patient safety. Health Aff. 37, 1736–1743 (2018).
3. Pronovost, P. et al. An intervention to decrease catheter-related bloodstream infections in the ICU. N. Engl. J. Med. 355, 2725–2732 (2006).
4. Haynes, A. B. et al. A surgical safety checklist to reduce morbidity and mortality in a global population. N. Engl. J. Med. 360, 491–499 (2009).
5. Bates, D. W. et al. Effect of computerized physician order entry and a team intervention on prevention of serious medication errors. JAMA 280, 1311 (1998).
6. Poon, E. G. et al. Effect of bar-code technology on the safety of medication administration. N. Engl. J. Med. 362, 1698–1707 (2010).
7. Jha, A. K. et al. The global burden of unsafe medical care: analytic modelling of observational studies. BMJ Qual. Saf. 22, 809–815 (2013).
8. Jha, A. K., Chan, D. C., Ridgway, A. B., Franz, C. & Bates, D. W. Improving safety and eliminating redundant tests: cutting costs in U.S. hospitals. Health Aff. 28, 1475–1484 (2009).
9. Tricco, A. C. et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann. Intern. Med. 169, 467–473 (2018).
10. U.S. Bureau of Labor Statistics. Producer price index by industry: selected health care industries (PCUASHCASHC). (2020).
11. Magill, S. S. et al. Changes in prevalence of health care–associated infections in U.S. hospitals. N. Engl. J. Med. 379, 1732–1744 (2018).
12. Zimlichman, E. et al. Health care–associated infections. JAMA Intern. Med. 173, 2039 (2013).
13. Umscheid, C. A. et al. Estimating the proportion of healthcare-associated infections that are reasonably preventable and the related mortality and costs. Infect. Control Hosp. Epidemiol. 32, 101–114 (2011).
14. Liao, Y.-H. et al. Machine learning methods applied to predict ventilator-associated pneumonia with pseudomonas aeruginosa infection via sensor array of electronic nose in intensive care unit. Sensors 19, 1866 (2019).
15. Saviauk, T. et al. Electronic nose in the detection of wound infection bacteria from bacterial cultures: a proof-of-principle study. Eur. Surg. Res. 59, 1–11 (2018).
16. Kuppusami, S., Clokie, M. R. J., Panayi, T., Ellis, A. M. & Monks, P. S. Metabolite profiling of Clostridium difficile ribotypes using small molecular weight volatile organic compounds. Metabolomics 11, 251–260 (2015).
17. Beeler, C. et al. Assessing patient risk of central line-associated bacteremia via machine learning. Am. J. Infect. Control 46, 986–991 (2018).
18. Haque, A. et al. Towards vision-based smart hospitals: a system for tracking and monitoring hand hygiene compliance. Mach. Learn. Healthc. Conf. (2017).
19. Geilleit, R. et al. Feasibility of a real-time hand hygiene notification machine learning system in outpatient clinics. J. Hosp. Infect. 100, 183–189 (2018).
20. Mehra, R., Bianconi, G. M., Yeung, S. & Fei-Fei, L. Depth-based activity recognition in ICUs using convolutional and recurrent neural networks. Mach. Learn. Healthc. Conf. 1–9 (2017).
21. Suresh, H. et al. Clinical intervention prediction and understanding using deep networks. Mach. Learn. Healthc. Conf. 68, 1–16 (2017).
22. Weiss, A., Freeman, W., Heslin, K. & Barrett, M. Adverse drug events in U.S. hospitals, 2010 versus 2014. (2018).
23. Bates, D. W. et al. Incidence of adverse drug events and potential adverse drug events. Implications for prevention. ADE prevention study group. JAMA 274, 29–34 (1995).
24. Zitnik, M., Agrawal, M. & Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34, i457–i466 (2018).
25. Ogallo, W. & Kanter, A. S. Towards a clinical decision support system for drug allergy management: are existing drug reference terminologies sufficient for identifying substitutes and cross-reactants? Stud. Health Technol. Inform. 216, 1088 (2015).
26. Dey, S., Luo, H., Fokoue, A., Hu, J. & Zhang, P. Predicting adverse drug reactions through interpretable deep learning framework. BMC Bioinformatics 19, 476 (2018).
27. Pavani, A. et al. Artificial neural network-based pharmacogenomic algorithm for warfarin dose optimization. Pharmacogenomics 17, 121–131 (2016).
28. Mahan, C. E. et al. Venous thromboembolism: annualised United States models for total, hospital-acquired and preventable costs utilising long-term attack rates. Thromb. Haemost. 108, 291–302 (2012).
29. Zeidan, A. M. et al. Impact of a venous thromboembolism prophylaxis “smart order set”: improved compliance, fewer events. Am. J. Hematol. 88, 545–549 (2013).
30. Nafee, T. et al. Machine learning to predict venous thrombosis in acutely ill medical patients. Res. Pract. Thromb. Haemost. 4, 230–237 (2020).
31. Ferroni, P. et al. Risk assessment for venous thromboembolism in chemotherapy-treated ambulatory cancer patients. Med. Decis. Making 37, 234–242 (2017).
32. Willan, J., Katz, H. & Keeling, D. The use of artificial neural network analysis can improve the risk‐stratification of patients presenting with suspected deep vein thrombosis. Br. J. Haematol. 185, 289–296 (2019).
33. Banerjee, I. et al. Development and performance of the pulmonary embolism result forecast model (PERFORM) for computed tomography clinical decision support. JAMA Netw. Open 2, e198719 (2019).
34. Corey, K. M. et al. Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): a retrospective, single-site study. PLoS Med. 15, e1002701 (2018).

    PubMed  PubMed Central  Article  Google Scholar 

  35. 35.

    Scott, J. W. et al. Use of national burden to define operative emergency general surgery. JAMA Surg. 151, e160480 (2016).

    PubMed  Article  Google Scholar 

  36. 36.

    Linnebur, M. et al. Preventable complications and deaths after emergency nontrauma surgery. Am. Surg. 84, 1422–1428 (2018).

    PubMed  Article  Google Scholar 

  37. 37.

    Huang, R. S. P. et al. Post-operative bleeding risk stratification in cardiac pulmonary bypass patients using artificial neural network. Ann. Clin. Lab. Sci. 45, 181–186 (2015).

    CAS  PubMed  Google Scholar 

  38. 38.

    Wise, E. S. et al. Prediction of prolonged ventilation after coronary artery bypass grafting: data from an artificial neural network. Heart Surg. Forum 20, E007–E014 (2017).

    PubMed  Article  Google Scholar 

  39. 39.

    Bertsimas, D., Dunn, J., Velmahos, G. C. & Kaafarani, H. M. A. Surgical risk is not linear: derivation and validation of a novel, user-friendly, and machine-learning-based predictive optimal trees in emergency surgery risk (POTTER) calculator. Ann. Surg. 268, 574–583 (2018).

    PubMed  Article  PubMed Central  Google Scholar 

  40. 40.

    Wu, H.-Y. et al. Predicting postoperative vomiting among orthopedic patients receiving patient-controlled epidural analgesia using SVM and LR. Sci. Rep. 6, 27041 (2016).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  41. 41.

    Hashimoto, D. A. et al. Computer vision analysis of intraoperative video: automated recognition of operative steps in laparoscopic sleeve. Ann. Surg. 270, 414–421 (2019).

    PubMed  PubMed Central  Article  Google Scholar 

  42. 42.

    Namazi, B., Sankaranarayanan, G., Devarajan, V. & Fleshman, J. A deep learning system for automatically identifying critical view of safety in laparoscopic cholecystectomy videos for assessment. In SAGES 2017 Annual Meeting (Sages, Houston, TX, 2017).

  43. 43.

    Hashimoto, D. A., Rosman, G., Rus, D. & Meireles, O. R. Artificial intelligence in surgery. Ann. Surg. 268, 70–76 (2018).

    PubMed  PubMed Central  Article  Google Scholar 

  44. 44.

    Gardiner, J. C., Reed, P. L., Bonner, J. D., Haggerty, D. K. & Hale, D. G. L. Incidence of hospital-acquired pressure ulcers - a population-based cohort study. Int. Wound J. 13, 809–820 (2016).

    PubMed  Article  Google Scholar 

  45. 45.

    Padula, W. V. & Delarmente, B. A. The national cost of hospital‐acquired pressure injuries in the United States. Int. Wound J. 16, 634–640 (2019).

    PubMed  Article  Google Scholar 

  46. 46.

    Landrigan, C. P. et al. Temporal trends in rates of patient harm resulting from medical care. N. Engl. J. Med. 363, 2124–2134 (2010).

    CAS  PubMed  Article  Google Scholar 

  47. 47.

    Alderden, J. et al. Predicting pressure injury in critical care patients: a machine-learning model. Am. J. Crit. Care 27, 461–468 (2018).

    PubMed  PubMed Central  Article  Google Scholar 

  48. 48.

    Hsiao, R.-S. et al. Body posture recognition and turning recording system for the care of bed bound patients. Technol. Health Care 24, S307–S312 (2015).

    PubMed  Article  Google Scholar 

  49. 49.

    Luboz, V. et al. Personalized modeling for real-time pressure ulcer prevention in sitting posture. J. Tissue Viability 27, 54–58 (2018).

    PubMed  Article  CAS  Google Scholar 

  50. 50.

    Bergen, G., Stevens, M. R. & Burns, E. R. Falls and fall injuries among adults aged ≥65 years — United States, 2014. MMWR Morb. Mortal. Wkly. Rep. 65, 993–998 (2016).

    PubMed  Article  Google Scholar 

  51. 51.

    Florence, C. S. et al. Medical costs of fatal and nonfatal falls in older adults. J. Am. Geriatr. Soc. 66, 693–698 (2018).

    PubMed  PubMed Central  Article  Google Scholar 

  52. 52.

    Yokota, S., Endo, M. & Ohe, K. Establishing a classification system for high fall-risk among inpatients using support vector machines. CIN Comput. Inform. Nurs. 35, 408–416 (2017).

    PubMed  Google Scholar 

  53. 53.

    Howcroft, J., Kofman, J. & Lemaire, E. D. Prospective fall-risk prediction models for older adults based on wearable sensors. IEEE Trans. Neural Syst. Rehabil. Eng. 25, 1812–1820 (2017).

    PubMed  Article  Google Scholar 

  54. 54.

    Howcroft, J., Lemaire, E. D. & Kofman, J. Wearable-sensor-based classification models of faller status in older adults. PLoS ONE 11, e0153240 (2016).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  55. 55.

    Alazrai, R., Mowafi, Y. & Hamad, E. A fall prediction methodology for elderly based on a depth camera. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2015, 4990–4993 (2015).

    PubMed  Google Scholar 

  56. 56.

    Juang, L.-H. & Wu, M.-N. Fall down detection under smart home system. J. Med. Syst. 39, 107 (2015).

    PubMed  Article  Google Scholar 

  57. 57.

    Torio, C. M. & Moore, B. J. National Inpatient Hospital Costs: The Most Expensive Conditions by Payer, 2013: Statistical Brief #204. Healthcare Cost and Utilization Project (HCUP) Statistical Briefs. (Agency for Healthcare Research and Quality, Rockville, MD, 2016).

  58. 58.

    Kuo, L. E. et al. Failure-to-rescue after injury is associated with preventability: the results of mortality panel review of failure-to-rescue cases in trauma. Surgery 161, 782–790 (2017).

    PubMed  Article  Google Scholar 

  59. 59.

    Sanchez-Pinto, L. N., Venable, L. R., Fahrenbach, J. & Churpek, M. M. Comparison of variable selection methods for clinical predictive modeling. Int. J. Med. Inform. 116, 10–17 (2018).

    PubMed  PubMed Central  Article  Google Scholar 

  60. 60.

    Ward, L., Paul, M. & Andreassen, S. Automatic learning of mortality in a CPN model of the systemic inflammatory response syndrome. Math. Biosci. 284, 12–20 (2017).

    PubMed  Article  Google Scholar 

  61. 61.

    Taylor, R. A. et al. Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data-driven, machine learning approach. Acad. Emerg. Med. 23, 269–278 (2016).

    PubMed  PubMed Central  Article  Google Scholar 

  62. 62.

    Islam, M. M. et al. Prediction of sepsis patients using machine learning approach: a meta-analysis. Comput. Methods Prog. Biomed. 170, 1–9 (2019).

    Article  Google Scholar 

  63. 63.

    Wetzel, R. C., Aczon, M. & Ledbetter, D. R. Artificial intelligence: an inkling of caution. Pediatr. Crit. Care Med. 19, 1004–1005 (2018).

    PubMed  Article  Google Scholar 

  64. 64.

    Vandendriessche, B., Abas, M., Dick, T. E., Loparo, K. A. & Jacono, F. J. A framework for patient state tracking by classifying multiscalar physiologic waveform features. IEEE Trans. Biomed. Eng. 64, 2890–2900 (2017).

    PubMed  PubMed Central  Article  Google Scholar 

  65. 65.

    Hackmann, G. et al. Toward a two-tier clinical warning system for hospitalized patients. AMIA Annu. Symp. Proc. 2011, 511–519 (2011).

    PubMed  PubMed Central  Google Scholar 

  66. 66.

    Brown, H., Terrence, J., Vasquez, P., Bates, D. W. & Zimlichman, E. Continuous monitoring in an inpatient medical-surgical unit: a controlled clinical trial. Am. J. Med. 127, 226–232 (2014).

    PubMed  Article  Google Scholar 

  67. 67.

    Sutherland, A. et al. Development and validation of a novel molecular biomarker diagnostic test for the early detection of sepsis. Crit. Care 15, R149 (2011).

    PubMed  PubMed Central  Article  Google Scholar 

  68. 68.

    Taneja, I. et al. Combining biomarkers with EMR data to identify patients in different phases of sepsis. Sci. Rep. 7, 10800 (2017).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  69. 69.

    Hassan, U., Zhu, R. & Bashir, R. Multivariate computational analysis of biosensor’s data for improved CD64 quantification for sepsis diagnosis. Lab Chip 18, 1231–1240 (2018).

    CAS  PubMed  Article  Google Scholar 

  70. 70.

    Vu, L. et al. Predicting nocturnal hypoglycemia from continuous glucose monitoring data with extended prediction horizon. AMIA Annu. Symp. Proc. 2019, 874–882 (2019).

    PubMed  Google Scholar 

  71. 71.

    Newman-Toker, D. The team sport of diagnosis: a culture shift can reduce missed diagnoses. The Healthcare Blog (2016).

  72. 72.

    Singh, H., Meyer, A. N. D. & Thomas, E. J. The frequency of diagnostic errors in outpatient care: estimations from three large observational studies involving US adult populations. BMJ Qual. Saf. 23, 727–731 (2014).

    PubMed  PubMed Central  Article  Google Scholar 

  73. 73.

    Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).

    CAS  PubMed  Article  Google Scholar 

  74. 74.

    Tirzīte, M., Bukovskis, M., Strazda, G., Jurka, N. & Taivans, I. Detection of lung cancer in exhaled breath with an electronic nose using support vector machine analysis. J. Breath Res. 11, 036009 (2017).

    Article  Google Scholar 

  75. 75.

    Dehghani Soufi, M., Samad-Soltani, T., Shams Vahdati, S. & Rezaei-Hachesu, P. Decision support system for triage management: a hybrid approach using rule-based reasoning and fuzzy logic. Int. J. Med. Inform. 114, 35–44 (2018).

    PubMed  Article  Google Scholar 

  76. 76.

    Neri, E. & Pinker-Domenig, K. (eds) Special issue “Artificial Intelligence in Diagnostics”. (2020).

  77. 77.

    Dias, R. & Torkamani, A. Artificial intelligence in clinical and genomic diagnostics. Genome Med. 11, 70 (2019).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  78. 78.

    Bates, D. W., Auerbach, A., Schulam, P., Wright, A. & Saria, S. Reporting and implementing interventions involving machine learning and artificial intelligence. Ann. Intern. Med. 172, S137–S144 (2020).

    PubMed  Article  Google Scholar 

  79. 79.

    Hernandez-Boussard, T., Bozkurt, S., Ioannidis, J. P. A. & Shah, N. H. MINIMAR (MINimum Information for Medical AI Reporting): developing reporting standards for artificial intelligence in health care. J. Am. Med. Inform. Assoc. 27, 2011–2015 (2020).

    PubMed  PubMed Central  Article  Google Scholar 

  80. 80.

    Watson, J. et al. Overcoming barriers to the adoption and implementation of predictive modeling and machine learning in clinical care: what can we learn from US academic medical centers? JAMIA Open 3, 167–172 (2020).

    PubMed  PubMed Central  Article  Google Scholar 

  81. 81.

    Shaw, J., Rudzicz, F., Jamieson, T. & Goldfarb, A. Artificial intelligence and the implementation challenge. J. Med. Internet Res. 21, e13659 (2019).

    PubMed  PubMed Central  Article  Google Scholar 

  82. 82.

    Bates, D. W., Heitmueller, A., Kakad, M. & Saria, S. Why policymakers should care about “big data” in healthcare. Health Policy Technol. 7, 211–216 (2018).

    Article  Google Scholar 

  83. 83.

    Open Data Science (ODSC). 15 Open datasets for healthcare. Medium (2019).

  84. 84.

    AltexSoft. Best public datasets for machine learning and data science: sources and advice on the choice. AltexSoft (2019).

Download references


Acknowledgements

The authors would like to thank Dr. Paul Bain for assistance in developing the search strategy for this scoping review. Dr. Syrowatka is supported by a Fellowship Award from the Canadian Institutes of Health Research. This work has been supported by IBM Watson Health (Cambridge, MA), which is not responsible for the content or recommendations made.

Author information




D.W.B., D.L., A.S., K.J.T.C., G.P.J., and K.R. were responsible for study conception and design; A.S., M.K., and A.R. reviewed the literature; D.W.B., D.L., A.S., and M.K. analyzed and interpreted the data; D.W.B., D.L., A.S., and M.K. drafted the manuscript, and all remaining authors revised it. All authors have approved the manuscript.

Corresponding author

Correspondence to David W. Bates.

Ethics declarations

Competing interests

Dr. Bates consults for EarlySense, which makes patient safety monitoring systems. He receives cash compensation from CDI (Negev), Ltd, which is a not-for-profit incubator for health IT startups. He receives equity from ValeraHealth, which makes software to help patients with chronic diseases. He receives equity from Clew, which makes software to support clinical decision-making in intensive care. He receives equity from MDClone, which takes clinical data and produces deidentified versions of it. He receives equity from AESOP, which makes software to reduce medication error rates. Drs. Craig, Jackson, and Rhee are all employed by IBM Watson Health. The other coauthors have no disclosures.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

About this article

Cite this article

Bates, D.W., Levine, D., Syrowatka, A. et al. The potential of artificial intelligence to improve patient safety: a scoping review. npj Digit. Med. 4, 54 (2021).
