Main

Artificial intelligence (AI) tools are rapidly maturing for medical applications, with many studies determining that their performance can exceed or complement human experts for specific medical use cases1,2,3. Unimodal supervised learning AI tools have been assessed extensively for medical image interpretation, especially in the field of radiology, with some success in recognizing complex patterns in imaging data3,4. Surgery, however, remains a sector of medicine where the uptake of AI has been slower, but the potential is vast5.

Over 330 million surgical procedures are performed annually, with increasing waiting lists6,7 and growing demands on surgical capacity8. Substantial global inequities exist in terms of access to surgery, the burden of complications and failure to rescue (that is, post-complication mortality) after surgery9,10,11,12,13. A multifaceted approach to surgical system strengthening is required to improve patient outcomes, including better access to surgery, surgical education, the detection and management of postoperative complications and optimization of surgical system efficiencies. To date, minimally invasive surgery has been a dominant driver of improvements in surgical outcomes, reducing postoperative infections, length of stay and postoperative pain and improving long-term recovery and wound healing. Enhanced recovery programs14, improved patient selection, broadening adjuvant approaches and organ-sparing treatments have also been important contributors. We are now entering an era where data-driven methods will become increasingly important to further improving surgical care and outcomes15. AI tools hold the potential to improve every aspect of surgical care: preoperatively, with regard to patient selection and preparation; intraoperatively, for improving procedural performance, operating room workflows and surgical team functioning; and postoperatively, to reduce complications, reduce mortality from complications and improve follow-up.

Current AI applications in surgery have been mostly limited to unimodal deep learning (Box 1). Transformers are a recent breakthrough in neural network architecture that has been very effective empirically in several areas, owing to their improved computational efficiency through parallelizability16 and their enhanced scalability (with models able to handle vast numbers of input parameters). Such transformer models have been pivotal in enabling multimodal AI and foundation models, with substantial potential in surgery.

Emerging applications of AI in surgery include clinical risk prediction17,18, automation and computer vision in robotic surgery19, intraoperative diagnostics20,21, enhanced surgical training22, postoperative monitoring through advanced sensors23,24, resource management25, discharge planning26 and more. The aim of this state-of-the-art Review is to summarize the current state of AI in surgery and identify themes that will help to guide its future development.

Preoperative

There is much room for improvement of preoperative surgical care, encompassing areas of active surgical research such as diagnostics, risk prognostication, patient selection, operative optimization and patient counseling—all aspects of the preoperative pathway of patients receiving surgery where AI has emerging capabilities.

Preoperative diagnostics

Patient selection and surgical planning have become increasingly evidence based, but are still contingent on experiential intuitions (and biases), with profound individual and regional variabilities. The influence of AI may emerge most rapidly in the context of preoperative imaging for early diagnosis and surgical planning. As an example, a model-free reinforcement learning algorithm showed promise when applied to preoperative magnetic resonance images to identify and maximize tumor tissue removal while minimizing the impact on functional anatomical tissues during neurosurgery27. Technically challenging cases with high between-patient anatomical variations, such as in pulmonary segmentectomy, have been met with pioneering approaches to enhance preoperative planning with novel amalgamations of virtual reality and AI-based segmentation systems. In a pilot study by Sadeghi et al.28, AI segmentation with virtual reality resulted in critical changes to surgical approaches in four out of ten patients.

Accurate preoperative diagnosis is an important area of surgical practice with substantial influence on clinical decision-making and therapeutic planning. For example, in the context of breast cancer diagnostics, the RadioLOGIC algorithm extracts unstructured radiological report data from electronic health records to enhance radiological diagnostics29. Extraction of unstructured reports using transfer learning (applying cross-domain knowledge to boost model performance on related tasks) showed high accuracy for the prediction of breast cancer (accuracy >0.9), and pathological outcome prediction was superior with transfer learning (area under the receiver operating characteristic curve (AUROC) = 0.945 versus 0.912). This report emphasizes the value of integrating natural language processing of unstructured text within existing infrastructures for promoting preoperative diagnostic accuracy. Another prominent example is a three-dimensional convolutional neural network that detected pancreatic ductal adenocarcinoma on diagnostic computed tomography and visually occult pancreatic ductal adenocarcinoma on prediagnostic computed tomography with AUROC values of 0.97 and 0.90, respectively30. Streamlining preoperative diagnostics can optimize integrated multidisciplinary surgical treatment pathways and facilitate early detection and intervention where timely management is prognostically critical31. Progress with large language models and integration with electronic healthcare record systems—particularly the utility of foundation models empowering the analysis of unlabeled datasets—could be transformative in enabling earlier disease diagnosis and early treatment before disease progression32,33.
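Because AUROC is the headline metric for several of the diagnostic models cited here, a brief sketch of how it can be computed may help readers interpret values such as 0.945 or 0.97. The example below is illustrative only: the function name `auroc` and the toy labels and scores are assumptions made for demonstration and are not taken from any of the cited studies. It uses the pairwise definition, in which AUROC is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the AUROC.

    labels: 1 for a positive case (for example, cancer present), 0 otherwise.
    scores: model outputs, where a higher score means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three positive and four negative cases.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)  # 11 of 12 pairs ranked correctly
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The metric does not depend on any single decision threshold, which is one reason it is not directly comparable with the accuracy figures reported by some studies.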

AI-based diagnosis is one of the most mature areas of surgical AI where model accuracy and generalizability are seeing early clinical translation. Numerous in-depth and domain-specific explorations of the efficacy of AI in endoscopic34,35, histological36, radiological37 and genomic38 diagnostics have been outlined elsewhere (see ref. 39). These task-specific advances are enabling more accurate diagnostics and disease staging in the oncology space, with substantial potential to optimize surgical planning. However, to date, all applications have been unimodal; novel transformer models that are able to integrate vastly more data, both in quantity and format, could spur further progress in the near future.

Clinical risk prediction and patient selection

High-accuracy risk prediction seeks to enable enhanced patient selection for operative management to improve outcomes, reduce futility40, better inform patient consent and shared decision-making41, triage resource allocation and enable pre-emptive intervention. It remains an elusive goal of surgical research18. Numerous critical reports have highlighted the high risk of bias42 and overall inadequacy of the vast majority of clinical risk scores in the literature, with few penetrating routine clinical practice43. It is important to remember that in the pursuit of enhanced predictive capabilities, the novelty of a tool (such as AI) should not supersede its utility. Finlayson et al.44 expertly summarize the often false dichotomy of machine learning and statistics; they argue that dichotomizing machine learning as separate to classical statistics neglects its underlying statistical principles and conflates innovation and technical sophistication with clinical utility.

The majority of current AI-based risk prediction tools offer few advances over existing tools, and few are used in clinical practice45. The COVIDSurg mortality score is one machine learning prediction score based on a generalized linear model (chosen for its superiority to random forest and decision tree alternatives) that shows a validation cohort AUROC of 0.80 (95% confidence interval = 0.77–0.83)46. Numerous other machine learning risk scores exist for preoperative prediction of postoperative morbidity and mortality45,47,48,49,50,51. One notable example is the smartphone app-based POTTER calculator, which uses optimal classification trees and outperforms most existing mortality predictors with an accuracy of 0.92 at internal validation52, 0.93 in an external emergency surgery context48 and 0.80 in an external validation cohort of patients >65 years of age receiving emergency surgery47. Notably, POTTER also showed improved predictive accuracy compared with surgeon gestalt53. Deep learning methods have also shown utility in neonatal cardiac transplantation outcomes, with high accuracy for predicting mortality and length of stay (AUROC values of 0.95 and 0.94, respectively)54.

The use of AI in surgical risk prediction remains an emerging field that is lacking in randomized trials55 and external validation, and has a high risk of bias56. Future work should move toward predictions of relevance to clinicians and patients57 and prioritize compliance with the CONSORT-AI extension58, TRIPOD (and its upcoming AI extension)59,60, DECIDE-AI61 and PRISMA AI62 and other relevant reporting guidelines, to advance the field in a standardized, safe and efficient manner while minimizing research waste.

Preoperative optimization

Preoperative optimization is still an underdeveloped concept that is beginning to receive more attention in surgical research63 and could benefit from multimodal inputs. Patients’ cardiovascular fitness, frailty, muscle function and optimizable biopsychosocial factors could be accurately characterized through multimodal AI approaches leveraging the full gamut of -omics data64,65. For example, research using AI to detect ventricular function using 12-lead electrocardiograms could rapidly streamline preoperative cardiovascular assessment66,67,68. While more information does not necessarily correlate with improved risk prediction, a more holistic understanding of patient factors in the preoperative setting could be leveraged to optimize characteristics such as sarcopenia, anemia, glycemic control and more, to facilitate improved surgical outcomes.

Patient-facing AI for consent and patient education

Large language models (LLMs)—a form of generative AI—are a generational breakthrough, with the emergence and adoption of ChatGPT occurring at an unprecedented pace and other LLMs emerging at an equally rapid pace. These models have attained high scores on medical entrance exams69,70 and contextualized complex information as competently as surgeons71, and there is the potential for patients to interact with them as an initial clinical contact point72,73. AI models can augment clinician empathy74, contribute to reliable informed consent41,71 and reduce documentation burdens. A recent report demonstrates promising readability, accuracy and context awareness of chatbot-derived material for informed consent compared with surgeons71. These advances offer a unique opportunity for tailored patient-facing interventions.

While clinical implementation of AI is a work in progress, there is great potential for superior patient-facing digital healthcare. A pilot clinical trial by the company Soul Machines (Auckland, New Zealand) highlights the potential power of amalgamating LLMs and avatar digital health assistants (or digital people)75. OpenAI’s fine-tuned generative pretrained transformers and assistant application programming interfaces could be leveraged for such a purpose if solutions to trust and privacy concerns are found76,77. The COVID-19 pandemic highlighted the value of decentralized digital health strategies to enable wider access to healthcare, and as global healthcare demands rise, these promising reports offer a valuable augment to the delivery of healthcare. These concepts also offer a step toward a hospital-at-home future that aims to further democratize healthcare delivery. Such innovations have particular utility in surgical care, where preoperative counseling, surgical consent and postoperative recovery and follow-up could all be augmented by patient-facing AI models validated to show high reliability for target indications41,71,78,79.

In current practice, informed consent and nuanced discussions about surgical care plans are frequently confined to time-limited clinic appointments. Chatbots powered by accurate LLMs offer an opportunity for patients to ask more questions, facilitating ongoing communication and better-informed care. Integrated with accurate deep learning-based risk prediction, such AI communication platforms could offer a personalized risk profile, answer questions about preoperative optimization and postoperative recovery and guide patients through the surgical journey, including postoperative follow-up consultations80. Early generative AI models are probably already primed for translation to such clinical education settings81, with many more rapidly emerging (for example, Hippocratic AI; Sparrow82 and Gemini (Google DeepMind); BlenderBot 3 (Meta Platforms); HuggingChat; and more). Nuanced appreciations of real-world complexity74 and the introduction of multi-agent conversational frameworks will be key for the testing and implementation of medical AIs83. At present, these models are yet to incorporate the vast and historic corpus of the medical literature; however, with specialized fine tuning and advances in unsupervised learning, the accuracy and generality of these tools are likely to improve. Nevertheless, further work to improve the transparency and reliability of such integrations is required84, as evidenced by recent examples of inaccurate and unreliable information from LLMs in breast cancer screening85.

In summary, multimodal approaches may transform the preoperative patient flow paradigm. The use of unstructured text from electronic health records, in conjunction with preoperative computed or positron emission tomography, genomics, microbiomics, laboratory results, environmental exposures, immune phenotypes, personal physiologies, sensor inputs and more, will enable deep phenotyping at the individual patient level to optimize personalized risk prediction and operative planning. Such advances are highly sought after to improve shared decision-making and patient selection and to offer individualized targeted therapy.

Intraoperative

The intraoperative period is a data-rich environment, with continuous monitoring of physiological parameters amid complex insults and alterations to anatomy and physiology. This time is the core of surgical practice. Advances in intraoperative computer vision have enabled preliminary progress in the analysis of anatomy, including assessment of tissue characteristics and dissection planes, as well as pathology identification. Likewise, progressing the reliable identification of instruments and stage of operation and the prediction of procedural next steps are important foundations for future autonomous systems and data-driven improvements in surgical techniques19.

Events inside the operating theater have substantial impacts on recovery, postoperative complications and oncological outcomes86. Yet, despite their pivotal importance, minimal data are currently collected, recorded or analyzed in this setting. Valuable data streams from the intraoperative period should be harnessed to contribute to advances in surgical automation and to underpin the utility of AI in the theater space15. We envision a future operating room with real-time access to patient-specific anatomy, operative plans, personalized risks and dashboards that integrate information in real time throughout a case, updating based on surgeon and operating team prompts, actions and decisions (Fig. 1).

Fig. 1: Integration of novel AI-powered digital interventions in the intraoperative setting.

Operating room components with the potential for AI integration are shown in blue. Traditional laparoscopic towers could be integrated with virtual or augmented reality to facilitate improved three-dimensional views, adjustable overlaid annotations and warning systems for aberrant anatomy. They could also overlay the individual patient’s imaging with AI diagnostics to improve R0 resections in oncological surgery, identify anatomical differences and better identify complex planes. Existing diathermy towers could incorporate voice assistants and black box-type systems for audit and quality control. An intraoperative dashboard aligned with the entire theater team could enable virtual consultations, virtual supervision for trainee surgeons and AI-powered access to the corpus of medical knowledge and surgical techniques, all contextualized to the operative plan. In addition, continuous vital signs, anesthetic inputs and patient-centered risks (for example, of hypotension) could be available as the operation progresses, to help the planning of pre-emptive actions and postoperative care. Personalized screens for scrub nurses, indicating stock location, phase detection169 and the predicted next instrument needed could also improve efficiency.

Intraoperative decision-making

Most pragmatically, enhanced pathological diagnostics from tissue specimens (as overviewed above in the section ‘Preoperative diagnostics’) could optimize surgical resection margins, reduce operative durations and optimize surgical efficiency87,88. One such example is a recent patient-agnostic transfer-learned neural network that used rapid nanopore sequencing to enable accurate intraoperative diagnostics within 40 minutes88, enabling early information for operative decision-making. Multimodal AI interrogation of the surgical field could aid the determination of relevant and/or aberrant anatomy (with major strides toward such surgical vision already occurring in laparoscopic cholecystectomy19,89), augment the surgeon’s visual reviews (for example, by employing a second pair of AI 'eyes' to run the bowel when looking for perforation), inform the need for biopsies and quantify the risk of malignancy90.

The advantages of AI in hypothetico-deductive surgical decision-making are expertly overviewed by Loftus et al.25. Deep learning (in particular neural networks) is devised in an attempt to replicate human intuition, a key element of rapid decision-making among experienced surgeons91. One of the first machine learning tools developed for intraoperative decision-making92, which has undergone validation and translation to clinical use, is the hypotension prediction index93, which has shown benefit in two randomized trials20,94. This represents an early example of a supervised machine learning algorithm that has undergone external validation and the gold standard of randomized clinical testing to demonstrate benefit. Notably, since its advent, numerous advances in AI methods have emerged to strengthen algorithmic performances95.

Such models could be improved in the future through continuous learning, ongoing iteration with constant refinement, and external validation. This requires collaboration with regulatory bodies to facilitate safe and monitored development and maturation of algorithms as the field rapidly advances. The iterative nature of these models may pose challenges for clinical evidence requirements to keep pace with the rate of innovation.

The operative team

Early efforts to gather data in the operating room include the OR Black Box, the aim of which is to provide a reliable system for auditing and monitoring intraoperative events and practice variations96,97. A particularly novel advance toward optimizing surgical teamwork comes from preliminary work toward an AI coach to infer the alignment of mental models within a surgical team98. Shared mental models, whereby teams have a collective understanding of tasks and goals, have been identified as a critical component to decreasing errors and harm in safety-critical fields such as aviation and healthcare. Such approaches require further interrogation within a real-world operating room context, but highlight the breadth of opportunities for digital innovation in surgery.

On the theme of surgical teamwork, multimodal digital inputs, including physiological inputs (for example, skin conductance and heart rate variability) for the identification of operative stress, anesthetic inputs (continuous pharmacological and vitals outputs), nursing team staffing inputs and equipment stock and availability inputs, are all routine elements of the operating room experience that could be quantified digitally and integrated into a digital pathway suitable for automation and optimization. The expansion of multimodal inputs and use of generative AI models incorporating both patient and environmental inputs within the operating room present opportunities to augment nontechnical skills that are pivotal in surgery, including communication, situational awareness and operative team functioning99,100. Operative fatigue, anesthesiologist–surgeon miscommunication, staffing changeovers and shortages, and equipment unavailability are common causes for intraoperative mistakes and are all amenable to digital tracking. A digitized surgical platform can therefore be envisaged to facilitate an AI-enhanced future. The importance of investment toward the platform itself to leverage utility from digital innovations was, for example, embraced by Mayo Clinic in a recent CEO overview101.

Surgical robotics and automation

While there has been much progress19,102, early attempts at computer vision have been limited to specific tasks and have lacked external validation. AI has been applied to unicentric, unimodal video data to identify surgical activity103, gestures104, surgeon skill105,106 and instrument actions107. A demonstrative advance has been made by Kiyasseh et al.108, who have developed a unified surgical AI system that accurately identifies and assesses the quality of surgical steps and actions performed by the surgeon using unannotated videos (area under the curve >0.85 at external validation for needle withdrawal, handling and driving). This procedure-agnostic, multicentric approach, with a view to generalizability, facilitates integration into real-world practice108. Technical advances, such as Meta’s self-supervised (SEER) model, currently offer particular promise in the realm of computer vision109. Similar efforts in the future that aim to improve the feedback available to surgeons, tactile responses from laparoscopic and robotic systems and the identification of optimal surgical actions in the intraoperative window could be advanced through multimodal inputs, including rich physiological monitoring, rapid histological diagnostics and virtual reality-based guidance (for example, toward identification of aberrant anatomy and tissue planes, perfusion assessments and more). Low-risk opportunities for the integration of these emerging technologies include co-pilot technologies for operative note writing, with surgeon oversight providing verification and the potential for iterative model improvement110,111,112.

Computer vision, surgical robotics and autonomous robotic surgery are at the very early stages of development, with incremental but exciting strides occurring. Importantly, robust frameworks have recently been developed to progress the development of surgical robotics and complementary AI technologies, with guidelines for evaluation, comparative research and monitoring throughout clinical translational phases113. Several reviews offer more in-depth analysis of these emerging topics108,114,115.

Operative education

Surgical education has long been entrenched in apprenticeship models of learning, with little progress toward objective metrics and useful mechanisms for feedback to trainee operators. As discussed above, the operating room setting is a data-rich environment that could be leveraged toward automated, statistical approaches to tailored learning. Reliable feedback results in improved surgical performance116,117,118, and data-driven optimization of surgical skill assessment has the potential to have a trans-generational impact on surgical practice119,120. Recently, the addition of human explanations to the supervised AI assessment of surgical videos improved reliability across different groups of surgeons at different stages of training, such that equitable and robust feedback could be generated through AI approaches108. This approach mitigates against differences in feedback quality between surgeon sub-cohorts121. This is a promising example of the nuanced approaches to model development that will be key to the translation and implementation of AI models in real-world surgical education. An exploration of the potential biases of AI explanations in surgical video assessment122 (namely under- and overskilling biases, based on the surgeon’s level of training) highlights the importance of comparisons with current gold standards and utilizing AI outputs as data with which to iterate, learn and optimize toward real-world benefits. In another example, an AI coach had both positive and negative impacts on the proficiency of medical students performing neurosurgical simulation, including improved technical performance at the expense of reduced efficiency123. This example also shows the importance of expert guidance in the development and implementation of AI tools in specialized domains, as well as the need for ongoing assessment of such programs. The opportunity to harness AI in operative education is evidenced by the growing number of registered randomized trials evaluating this approach22,124,125.

The intraoperative period is, therefore, a data-rich environment for surgical AI with early success seen with intraoperative diagnostic and surgical training models, as well as early emerging capabilities in computer vision and automation. Intraoperative applications are diverse and critical for the future of surgery, with vast potential to optimize nontechnical intraoperative functions such as communication, teamwork and skill assessment. Ongoing work toward computer vision systems will lay the groundwork for future autonomous surgical systems.

Postoperative

Postoperative monitoring

The aim of transforming hospital-based healthcare through hospital-at-home services is to liberalize and democratize healthcare and to improve equity and access while unburdening overloaded hospitals. Such a future will enable patients to recover in a familiar environment and will optimize patient recovery, convalescence and their return to functioning in society. Major strides have been made toward reducing postoperative lengths of stay, facilitating early discharge from hospital and improving functional recovery, largely through minimally invasive surgical approaches, encouragement of earlier return to normal activities, enhanced postoperative monitoring, early warning systems and better appreciation of important contributors to recovery. The implementation of enhanced recovery after surgery programs has been pivotal toward this goal.

However, the postoperative period frequently remains devoid of data-driven innovations, crippling further progress. Many hospitals still rely on four-hourly nurse-led observations, unnecessarily prolonged postoperative stays driven by historic protocols and a 'one size fits all' approach to the immediate postoperative period. Ample opportunity exists for wearables to offer continuous patient monitoring, enabling multimodal inputs of physiological parameters that can contribute toward data-driven, patient-specific discharge planning. This would have the added benefit of unburdening nursing staff from cumbersome vital sign rounds, freeing up time and capacity for more patient-centered nursing care. Leveraging postoperative data can further guide discharge rehabilitation goals and interventions, inform analgesic prescriptions and prognosticate adverse outcomes.

One systematic review highlights 31 different wearable devices capable of monitoring vital signs, physiological parameters and physical activity23, but further work is required to realize the potential of these data, including improving the quality of research and reporting126,127. We envision a future where continuous inputs can be integrated into predictive analytics and dashboard-style interfaces to enable rapid escalation, earlier prognostication of complications and reduced mortality from surgical complications128,129. Intensive care units are an example of a highly controlled, data-rich environment where such interventions are emerging, with the potential to modify the postoperative course130. Classical machine learning approaches, such as random forests, have been robustly applied in other heterogenous, multimodal time-series applications and stand to have particular value in the postoperative monitoring setting. For example, the explainable AI-based Prescience system monitors vital signs, predicts hypoxemic events five minutes before they happen and provides clinicians with real-time risk scores that continuously update with transparent visualization of considered risk factors131,132.
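To make the idea of turning continuous sensor inputs into escalation signals more concrete, the following is a minimal sketch of a rolling-window alert on a stream of oxygen saturation readings. It is deliberately a simple threshold rule and not a machine learning model; it is not the Prescience system or any other cited tool. The function name `flag_desaturation`, the window length, the 92% threshold and the example readings are all hypothetical choices made for illustration and carry no clinical validity.

```python
from collections import deque
from statistics import mean
from typing import Deque, Iterable, List, Tuple


def flag_desaturation(
    spo2_readings: Iterable[float],
    window: int = 5,          # assumed number of consecutive readings to average
    threshold: float = 92.0,  # assumed alert threshold (%), illustrative only
) -> List[Tuple[int, float]]:
    """Return (reading index, rolling mean) for each point at which the mean
    of the most recent `window` readings falls below `threshold`."""
    recent: Deque[float] = deque(maxlen=window)  # oldest reading drops out automatically
    alerts: List[Tuple[int, float]] = []
    for index, value in enumerate(spo2_readings):
        recent.append(value)
        if len(recent) < window:
            continue  # wait until a full window is available
        rolling_mean = mean(recent)
        if rolling_mean < threshold:
            alerts.append((index, rolling_mean))
    return alerts


# Hypothetical stream of SpO2 readings from a wearable sensor.
example_readings = [97.0, 96.0, 96.0, 95.0, 94.0, 93.0, 91.0, 90.0, 89.0, 88.0]
example_alerts = flag_desaturation(example_readings)
```

Averaging over a window rather than reacting to single readings is one simple way to reduce false alarms from sensor noise, at the cost of a short delay. Predictive systems of the kind described above would go further by learning from many physiological inputs to anticipate an event before any threshold is crossed.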

To enable multimodal data-driven insights in postoperative sensors, a plethora of novel medical devices and sensors are being pursued (Fig. 2). Real-time physiological sensing of wound healing133,134, remote identification of superficial skin infections135,136 and cardiorespiratory sensors137,138 are all putative technologies to enhance postoperative monitoring.

Fig. 2: Sensor inputs for peri- and postoperative continuous monitoring.

Examples of innovative sensors include chest- and axilla-based electrocardiogram, respiratory rate, tidal volume, temperature and skin impedance sensors. In the postoperative setting, when patients are mobilizing and discharged home, wrist- and finger-based sensors offer a safety netting system for the monitoring of sympathetic stress (via heart rate variability and skin impedance), postoperative arrhythmia and wound healing (for the early identification of superficial skin infection and/or wound dehiscence). Sensor-based technologies can be categorized as continuous inpatient monitoring and early post-discharge monitoring to enable hospital-at-home services.

Complication prediction

The prediction of complications after surgery has been the goal of many academic studies139,140 and presents a formidable challenge in a complex postoperative setting, with myriad variables affecting care and outcomes. However, the early detection of complications138,141—in particular, devastating outcomes such as anastomotic leaks after rectal cancer surgery and postoperative pancreatic fistulas after pancreatic surgery142,143—is likely to have a substantial impact on the ability of healthcare systems to reduce mortality following complications144,145. MySurgeryRisk represents one of the few advances in complication prediction, using a machine learning algorithm50. However, despite promising performance in single-center studies68, there is little understanding of how to scale these algorithms to other health systems. The value of algorithmic approaches to complication prediction18 and postoperative monitoring after pancreatic resections has been demonstrated in The Netherlands146, serving as a reproducible model to aspire to. Wellcome Leap’s US$50 million SAVE Program has identified failure to rescue from postoperative complications as a leading cause of death and the third most common cause of death globally147, and has prioritized this as a target for innovation. Its goals focus on advanced sensing, monitoring and pattern recognition148. This remains a nascent field with numerous attempts but few breakthroughs, making complication prognostication a high-value target for AI-based technologies, particularly as sensors149, wearables23 and devices capable of enabling multimodal, temporally rich inputs emerge.

Home-based recovery

In the United States, 50% of those who undergo a surgical procedure are over 65 years of age150. With advancing age, recovery can be prolonged and periods of return to baseline activities of daily living (ADLs) can extend beyond several months. Kim et al.151 have proposed a multidimensional, AI-driven, home-based recovery model enabled by frequent, noninvasive assessments of ADLs. Their proposed paradigm shift rests on three elements: (1) continuous real-time data collection; (2) nuanced assessment of relevant measures of ADLs; and (3) innovative assessments of ADLs that can be leveraged in the postoperative, post-discharge and home-based setting. Again, these innovations would be driven by sensor technologies152, including the continuous detection of video, location, audio, motion and temperature data in various home-based settings, integrated to provide a continuous assessment of activity patterns. These data contribute toward phenotyping recovery patterns and predicting adverse outcomes (for example, falls), informing care needs and personalized interventions in conjunction with multidisciplinary teams (such as occupational therapists). Systems-level implementation, data privacy safeguards and real-world prospective validation are still awaited. ClinAIOps (clinical AI operations) is a recent framework for integrating AI into continuous therapeutic monitoring in a way that could be directly translated toward postoperative home-based monitoring153.

As innovations proliferate toward the goal of remote postoperative monitoring, including mobile technologies24, sensors149, wearables23 and hospital-at-home services, we have identified the key limitations to advancement to be the lack of routine large-scale implementation efforts, collaborations and comprehensive innovation evaluations in line with the IDEAL (idea, development, exploration, assessment and long-term follow-up) framework154.

Building the evidence base

Emerging AI technologies need to be robustly evaluated in line with existing innovation frameworks113,155 and, with the advent of multimodal and generative models, regulatory oversight and monitored implementation are pivotal. Complex intervention frameworks provide a robust tool to facilitate ongoing monitoring and rapid troubleshooting156. Engagement with all stakeholders, including patients, administrators, clinicians, industry and scientists, will be important to align visions and work concertedly toward improved surgical care.

While emerging models demonstrate promise, robust, prospective, randomized evidence is required to demonstrate improvements in patient care. To date, only six randomized trials of AI exist in surgery (Table 1), all of which employed unimodal approaches, but the increasing number of trial registrations on the topic of assessing the efficacy of AI interventions is promising. AI offers diverse strengths and potential across many fields, but development alone is insufficient. Evaluation, validation, implementation and monitoring are required. The implementation of AI platforms at the pre-, intra- and postoperative phases should be guided by robust evidence of the benefits, such as a more accurate and timely diagnosis, reduced complications and improved systems efficiencies.

Table 1 Summary of the six published randomized controlled trials of AI in surgery

Future of surgical AI

Medicine is entering an exciting phase of digital innovation, with clinical evidence now beginning to accumulate behind advances in AI applications. Domain-specific excellence is emerging, with vast potential for translational progress in surgery. A sector of medical practice that once lagged behind in terms of evidence-based medicine157, surgery has evolved to thrive on world-class research and evidence158. Surgery now equals other fields, such as cardiology, in terms of the quantity of randomized trials in AI applications, only lagging behind frontrunner fields such as gastroenterology and radiology, where task-specific applications are opportune, particularly around image processing55. In this Review, we have highlighted many of the most pragmatic and innovative emerging use cases of AI in surgery, with a particular focus on direct feasibility and preparedness for clinical translation, but there remain numerous additional examples and untapped avenues for further pursuit. As we pioneer surgical AI, the values of privacy, data security, accuracy, reproducibility, mitigation of biases, enhancement of equity, widening access and, above all, evidence-based care should guide our technological advances.

Reviews of AI in surgery frequently speculate about autonomous robotic surgeons. In our view, this is the most distant of the realizable goals of surgical AI systems. While much attention has also been given to surgical automation159, robotics and computer vision, these efforts should be viewed in the context of an era in which robotic surgery has yet to definitively demonstrate its advantage over other minimally invasive approaches159,160,161. In a resource-limited global surgical landscape, it remains to be seen whether AI-driven automation may offer the scalability to robotic surgical platforms that may help define its clinical value.

Surgery poses specific challenges for AI integration that are distinct from other areas of medicine. There is a paucity of digital infrastructure in most healthcare settings such that annotated datasets and digitized intraoperative records are rarely available162. In addition, procedural heterogeneity, acuity and rapidly changing clinical parameters represent a challenging and dynamic environment in which AI interventions will be required to deliver accurate and evolving output. Despite these known challenges, targeted work in these areas, including growing priority toward digital infrastructure, data security and privacy, as well as unsupervised AI paradigms, demonstrates substantial promise.

Transformer models are poised to enable real-time analytics of multi-layered data, including patient anatomy, biomarkers of physiology, sensor inputs, -omics data, environmental data and more. When leveraged by a fine-tuned understanding of the corpus of medical knowledge, such models stand to have a vast impact on surgical care64. At the time of writing, few examples exist for novel generative AI models in surgery. In the sections above, we present several opportunities for such generalizable AI models unburdened by labeling needs to be implemented in surgical care as the generalist AI surgeon augmenter. These approaches are common to AI in medicine, with the majority of approaches using decision trees, neural networks and reinforcement learning55. Early implementations of existing LLMs for text generation, data extraction and patient care are undoubtedly underway163, with notable caveats such as model accuracy degradation, output overconfidence, lack of data privacy and regulatory approvals and a deficiency of prospective clinical trials yet to be overcome.

Numerous apprehensions remain with regard to the integration of AI into surgical practice, with many clinicians perceiving limited scope in a field dominated by experiential decision-making competency, apprentice model teaching structures and hands-on therapies. However, with the rapid development of AI in software, hardware and logistics, these perceived limitations in scope will be continuously tested. We envision a collaborative future between surgeons and AI technologies, with surgical innovation guided first and foremost by patient needs and outcomes.

AI in surgery is a rapidly developing and promising avenue for innovation; the realization of this potential will be underpinned by increased collaboration154, robust randomized trial evidence55, the exploration of novel use cases164 and the development of a digitally minded surgical infrastructure to enable this technological transformation. The role of AI in surgery is set to expand dramatically and, with correct oversight, its ultimate promise is to effectively improve both patient and operator outcomes, reduce patient morbidity and mortality and enhance the delivery of surgery globally.