Introduction

Major vascular injury during minimal access, endoscopic or robotic-assisted surgery can impair visualization and requires immediate action1,2. Despite maximal efforts, including conversion from minimally invasive to ‘open’ surgery, 13–60% of major vascular injuries result in patient death2,3,4,5,6. Surgeons must immediately assess the likelihood of achieving hemostasis and the need for blood transfusion; however, inexperience, inability7,8,9,10,11 and stress1,3,12,13 impair decision-making. Accordingly, surgeon self-assessments of the likelihood of controlling an unexpected vascular complication are uncorrelated with their actual performance14. Inaccurate predictions of blood loss and task outcome risk patient harm by delaying changes in technique, aid from surgical colleagues, or transfusion of blood products. Rather than waiting for a patient’s clinical deterioration, early prediction of difficulty in achieving hemostasis and of high-volume blood loss using computer vision (CV) techniques could optimize patient outcomes.

We created SOCAL (Simulated Outcomes Following Carotid Artery Laceration), a video dataset of attending and resident surgeons (otorhinolaryngologists and neurosurgeons) controlling life-threatening internal carotid artery injury (ICAI) in a validated, high-fidelity bleeding cadaveric simulator14,15,16,17,18. Carotid injury is a catastrophic complication of endonasal surgery and results in up to 30% mortality, similar to vascular injuries during minimally invasive abdominal and thoracic surgery5,19,20. In prior work, we applied artificial intelligence (AI) methods to SOCAL video and developed tools that quantify blood loss and measure surgeon performance metrics from video21,22. Using these tools, we showed that video contains signals of surgical task outcome, but we do not know whether the model can detect predictive signals early in a bleeding episode, nor how its performance compares to that of gold-standard human experts.

We provided human experts (fellowship-trained skull-base neurosurgeons) with the first minute of 20 videos from SOCAL (‘Test Set’) and collected predictions of blood loss and task success over the entire unseen task. Experts’ predictions of outcome and blood loss established a benchmark of human performance. We then built a deep learning neural network (DNN) trained on the SOCAL video dataset (excluding the Test Set), called SOCALNet, and compared model performance on the Test Set to the expert benchmarks. We validated SOCALNet predictions in subsequent experiments. To the authors’ knowledge, this is the first comparison of DNN-derived surgical video outcome prediction with human experts viewing the same video.

Methods

Experimental design

Experimental setup, data collection, consent and implementation parameters for the dataset are found in Appendix 1. Seventy-five surgeons ranging from junior trainees to world experts in endoscopic endonasal approaches (EEA) were recorded in a nationwide, validated, high-fidelity training exercise. Surgeons attempted to control an ICAI in a cadaveric head perfused with blood substitute. In short, task success was defined as the operating surgeon achieving hemostasis with a crushed muscle patch within 5 min; failure to do so within this window constituted simulated patient mortality. Blood loss was additionally measured and recorded for each trial. Performance data and intraoperative video were used to develop the SOCAL database14,15,16,17,18,23. The SOCAL database was developed in concordance with previously published methods and is publicly available23,24,25,26. The SQUIRE reporting guidelines were followed27. The study was approved by the IRB of the University of Southern California. All research was performed in accordance with relevant regulations/guidelines. No patient data were used; therefore, patient-level informed consent was waived. Participating surgeons’ consent was obtained for intraoperative video recording. Surgeon-expert consent was obtained.

Datasets

The 147 videos in SOCAL were divided into a training set of 127 videos and a separate test set of 20 videos. Ten videos depicting success and ten depicting failure were initially chosen at random for the test set; ultimately, 11 success videos and 9 failure videos were used because of ease of video formatting. Videos were truncated after 60 s. Only videos in the test set were shown to experts for grading.

SOCALNet model architecture

SOCALNet utilized two distinct neural network architectures and a transfer learning approach to generate predictions from video. The first layer, a ResNet, analyzes each individual frame to generate a vector representation of features that correspond with success/failure of a trial or with the amount of blood lost. However, because video (rather than individual frames) must be analyzed, a temporal layer was added after the ResNet. This temporal layer uses an LSTM architecture, a type of recurrent neural network containing an input, an output and a forget gate. These gates modify information from the current frame as well as from prior frames before passing this modified information to the subsequent cell, effectively regulating the flow of information across a temporal sequence. This enables SOCALNet to take individual frame predictions generated by the ResNet and link them together in a temporal sequence using the LSTM. A schematic of SOCALNet is shown in Fig. 1.
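For reference, a standard LSTM cell (the general formulation, not parameters specific to SOCALNet) updates its gates and states at each time step $t$ as follows, where $x_t$ is the per-frame input, $h_{t-1}$ the previous hidden state, $\sigma$ the sigmoid function and $\odot$ element-wise multiplication:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$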

Figure 1

SOCALNet architecture. Deep learning model used to predict blood loss and task success in a critical hemorrhage-control task. (A) Video is sampled into individual frames. (B) A pretrained ResNet convolutional neural network (CNN) is fine-tuned on SOCAL images from (A) to predict blood loss and task success in each individual frame. The penultimate layer of the network was removed and a 1 × 4 matrix of values predictive of success/failure or blood loss was obtained. This is repeated for all frames, generating a new matrix with N (number of frames) rows and 4 columns. The output matrix from (B) and Tool Presence Information (C) [e.g. ‘Is suction present? Yes (check); is muscle present? No (X)’; encoded as 8 binary values per frame (N × 8)] are input into a temporal layer. (D) Temporal layer: a long short-term memory (LSTM) modified recurrent neural network allowing for temporal analysis across all frames. The 2D matrix of ResNet features and Tool Presence Information (‘check mark’, ‘X’) from each frame is fed into the temporal layer. All LSTM predictions are consolidated in one dense layer and (E) a final prediction of success/failure and blood loss (in mL) is output.

SOCALNet model implementation

See eSupp1 for model code. Video was sampled at 1 frame per second (fps) and input into two layers: a feature-generating layer and a temporal-analysis layer (Fig. 1). The output of the model was a binary prediction of surgical ability (trial success or failure) and the estimated blood loss over the entire trial (in milliliters).
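As an illustration of the 1-fps sampling step, the following sketch decimates a video into frames using OpenCV; the function name and fallback frame rate are placeholders, and this is not the authors' preprocessing code (see eSupp1 for that).

```python
# Illustrative 1-fps frame sampling (assumes OpenCV; not the authors' pipeline).
import cv2

def sample_frames(video_path, target_fps=1.0):
    """Yield frames from `video_path` at roughly `target_fps` frames per second."""
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata is missing
    step = max(int(round(native_fps / target_fps)), 1)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield frame  # BGR array passed on to the feature-generating layer
        idx += 1
    cap.release()
```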

For the feature generator, we used a transfer-learning approach in which a Residual Learning Neural Network (ResNet) model was pretrained on the ImageNet 2012 classification dataset28,29. ResNet is a single-stage convolutional neural network (CNN) that uses skip (residual) connections, allowing very deep networks to effectively bypass layers that would otherwise degrade overall performance. ResNet has become ubiquitous for object detection and classification in computer vision (CV)29. All weights from pretraining on ImageNet were retained in our model; however, the final three layers of the ResNet were retrained on SOCAL images to predict blood loss or task success. The values of the four output nodes from the penultimate layer of the ResNet were extracted, representing a 1 × 4 matrix of values predictive of task success/failure or blood loss within that individual frame. This matrix is combined with tool presence information encoded as an array of eight binary values (a 1 × 8 matrix per frame, representing whether specific surgical instruments were present within the frame). This process is repeated for all frames, and the resulting 2D matrix is passed into a bi-layer Long Short-Term Memory (LSTM) recurrent neural network30. Instrument annotations alone are inadequate for outcome prediction; successful detectors incorporate instrument data and image features21.
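A minimal sketch of this two-stage design is shown below, assuming PyTorch/torchvision; the ResNet-50 backbone, hidden size, and output heads are illustrative placeholders rather than the authors' released implementation (see eSupp1), and the per-frame feature extraction is simplified to a single replaced classification layer.

```python
# Illustrative sketch of the SOCALNet design (CNN feature generator + LSTM
# temporal layer). Assumes PyTorch/torchvision; backbone choice, hidden size,
# and output heads are placeholders, not the authors' released code.
import torch
import torch.nn as nn
from torchvision import models

class SOCALNetSketch(nn.Module):
    def __init__(self, n_tools=8, lstm_hidden=64):
        super().__init__()
        # Frame-level feature generator: ImageNet-pretrained ResNet whose final
        # classification layer is replaced to emit 4 values per frame.
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        backbone.fc = nn.Linear(backbone.fc.in_features, 4)
        self.frame_encoder = backbone
        # Temporal layer: two-layer LSTM over per-frame features + 8 tool flags.
        self.lstm = nn.LSTM(input_size=4 + n_tools, hidden_size=lstm_hidden,
                            num_layers=2, batch_first=True)
        self.success_head = nn.Linear(lstm_hidden, 1)     # trial success/failure
        self.blood_loss_head = nn.Linear(lstm_hidden, 1)  # blood loss in mL

    def forward(self, frames, tools):
        # frames: (batch, time, 3, H, W); tools: (batch, time, n_tools)
        b, t = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1)).view(b, t, 4)
        sequence = torch.cat([feats, tools], dim=-1)      # (batch, time, 4 + n_tools)
        out, _ = self.lstm(sequence)
        summary = out[:, -1]                              # final time-step state
        return torch.sigmoid(self.success_head(summary)), self.blood_loss_head(summary)
```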

Expert assessment

Experts were four skull-base fellowship-trained neurosurgeon instructors in ICAI management. Experts watched the twenty 1-min test videos and provided blood loss estimates (in mL), outcome predictions (success/failure), and surgeon grades (1–5 Likert scale, where 1 represents novice and 5 represents master). Experts also reported self-confidence in their outcome prediction (1–5 Likert scale; 5 represents most confident). Each expert was surveyed in a standardized fashion via the following questions: based on the 1 min of video viewed, (1) do you feel the operating surgeon will succeed or fail in controlling bleeding within 5 min? (2) How much blood (in mL) will be lost by the end of the trial? (3) On a Likert scale of 1–5, how skilled is the operator? (4) How confident are you in this prediction? To provide baselines prior to grading, experts were shown three anchoring videos demonstrating predetermined novice, average, and master performances with the respective outcome data. Anchoring videos were not contained in the Test Set and were chosen as representative videos of each skill level by adjudication of the study team. Experts were not given additional data (e.g. years of practice, attending/resident status) on participating surgeons and relied solely on intraoperative video. Grading sessions were conducted in double-blinded fashion by the lead author (DJP) and individual experts (BS, MR, GZ, DAD, referred to as S1–S4). Given high concordance, the mean and mode are reported for experts (‘S’).

Validation analysis

We conducted two experiments to evaluate model and expert concordance. In the first experiment, two videos were identified in the Test Set in which a critical error occurred shortly after the 1-min video sample concluded (i.e., not shown to the model or surgeons). The model and all surgeons had incorrectly predicted that both videos were successes. New 1-min clips showing the critical error and its aftermath were generated. These new clips were evaluated by one of the human experts and by SOCALNet.

In a second experiment, the three best (least blood loss, successes) and worst (most blood loss, failures) videos were identified from within the Test-Set. Composite ‘best’ and ‘worst’ videos were constructed by combining the first 20 s of each of the three best and worst trials in each possible order permutation (6 ‘best’, 6 ‘worst’ videos). The twelve composite videos were then presented to SOCALNet.
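As an illustration of how such composite clips could be assembled programmatically, the sketch below enumerates the 3! = 6 orderings and concatenates the first 20 s of each clip; it assumes the moviepy 1.x API, and the file names are hypothetical placeholders rather than the study's actual editing workflow.

```python
# Illustrative assembly of composite 'best' clips (assumes moviepy 1.x;
# file names are hypothetical, not the study's editing workflow).
from itertools import permutations
from moviepy.editor import VideoFileClip, concatenate_videoclips

best_clips = ["best_1.mp4", "best_2.mp4", "best_3.mp4"]  # placeholder names

for order in permutations(best_clips):  # 3! = 6 order permutations
    segments = [VideoFileClip(path).subclip(0, 20) for path in order]  # first 20 s
    composite = concatenate_videoclips(segments)
    out_name = "composite_" + "_".join(name[:-4] for name in order) + ".mp4"
    composite.write_videofile(out_name)
```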

Statistical analysis

Blood loss prediction was reported using mean error, root mean square error (RMSE), and Pearson’s correlation coefficients. Categorical inter-rater reliability was calculated using Cohen’s kappa, and Krippendorff’s alpha for more than two raters. Continuous inter-rater reliability was calculated using Pearson’s correlation coefficient and the intraclass correlation coefficient (ICC) for more than two groups, using a two-way random-effects ICC model31. We used Fisher’s exact test for categorical comparisons. We performed analysis in Python with SciPy32.
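For illustration, the core metrics can be computed with NumPy/SciPy as in the sketch below; the arrays and the 2 × 2 table contain placeholder values, not the study data or analysis script.

```python
# Illustrative computation of the reported metrics (placeholder data).
import numpy as np
from scipy import stats

predicted = np.array([400.0, 250.0, 900.0, 120.0])   # predicted blood loss (mL)
observed = np.array([520.0, 200.0, 1100.0, 100.0])   # measured blood loss (mL)

mean_error = np.mean(predicted - observed)            # signed bias
rmse = np.sqrt(np.mean((predicted - observed) ** 2))  # root mean square error
r, p_corr = stats.pearsonr(predicted, observed)       # Pearson correlation

# Fisher's exact test on a 2x2 correct/incorrect contingency table (placeholder counts)
table = [[17, 3],
         [55, 25]]
odds_ratio, p_value = stats.fisher_exact(table)
```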

Results

Table 1 lists predictions and ground truth data. There were 11 successful trials and 9 failed trials in the Test Set, with mean blood loss of 568 mL (range 20–1640 mL, mean success = 323 mL, mean failure = 868 mL). Experts correctly predicted outcome in 55/80 predictions (69%, Sensitivity: 79%, Specificity: 56%). Expert predictions were concordant, with one dissent in 80 ratings (Fleiss’ kappa = 0.95). The average root mean square error (RMSE) for blood loss prediction of surgeons was 351 mL (mean error = − 131 mL, average R2 = 0.70). Expert ICC was high at 0.72.

Table 1 Results comparing the deep learning model with expert surgeons.

Figure 2 and Supplemental Table 1 demonstrate the relationship between prediction confidence, surgeon skill and prediction accuracy. Experts were most accurate when maximally confident (5/5 confidence, accuracy 88%) or when viewing a surgeon they rated as having minimal (Likert scale 1, accuracy 92%) or maximal skill (Likert scale 5, accuracy 79%). Predictions with non-maximal confidence (levels 2–4) were only marginally better than chance (53%, p = 0.02 compared to maximal confidence). Predictions for intermediate-skill surgeons were also less accurate (levels 2–4, 63%, p = 0.04 compared to composite 1/5 and 5/5 skill).

Figure 2

Association between expert confidence, surgeon skill level and accuracy of prediction. Experts are most accurate when viewing trials of surgeons with low or high skill, or where they (experts) are maximally confident. For those with moderate skill or when experts have moderate confidence, prediction accuracy is lower. Size of circle denotes number of trials. Color denotes accuracy.

SOCALNet correctly predicted outcome in 17/20 trials (85%, Sensitivity: 100%, Specificity: 66%), noninferior to surgeons (p = 0.12). The model predicted blood loss with an RMSE of 295 mL (mean error = − 57 mL, R2 = 0.74) (Fig. 3). The model and experts all predicted outcome correctly in 13/20 trials. In four trials the model was correct and all experts were incorrect; in one trial the model was incorrect and all experts were correct; and in two trials all were incorrect (Fig. 4). Correlations (R2) between blood loss estimates of the model, the experts and ground truth are shown in Supplemental Fig. 1 and range from 0.53 to 0.93. The correlation between the model and the average surgeon blood loss estimate was 0.73, ranging from 0.53 to 0.74 for individual surgeons (Table 1).

Figure 3

Expert and SOCALNet blood loss quantification. Predicted versus observed blood loss estimations by individual surgeons (grey), surgeon mean (blue), and model (green). Red points represent measured blood loss (ground truth).

Figure 4

Outcome predictions of experts and SOCALNet. Outcomes of experts (Blue) and model (Red) in predicting task success using 1 min of video. Circle size denotes number of trials (N). Success (S) and failure (F) denoted underneath each N. When the union of successful predictions is taken, the model + expert grouping would successfully predict outcome in 18/20 cases. In the 2 remaining cases (bottom left quadrant), a critical error took place following the cessation of the video and was evaluated in subsequent counterfactual experiments.

We then evaluated trials above the 50th percentile for blood loss, in which blood loss exceeded 500 mL and transfusion might be needed. The model predicted a blood loss above 500 mL in 80% of these trials (8/10) compared with the experts’ 47.5% (19/40); this difference was not statistically significant (p = 0.09).

Exploratory model-validation

Supplemental Table 2 reports model-validation experiments. In two trials, experts and SOCALNet predicted success, but the surgeon failed due to a critical error shortly after the end of the 1-min clip (therefore unseen by experts and SOCALNet). When we included the critical error, the model accurately predicted ‘failure’, as did an expert. In a second experiment, SOCALNet viewed six composite ‘Best’ trials and uniformly predicted success with low blood loss (328–473 mL); conversely, in six composite ‘Worst’ videos the model uniformly predicted failure with high blood loss (792–794 mL).

Discussion

To address the need for datasets depicting surgical adverse events, we created SOCAL, a public video dataset of 147 attempts to control carotid injury in high-fidelity perfused cadavers. In this work we compared human expert predictions of outcome, made from 1 min of video from 20 trials in the dataset, to those of a DNN (SOCALNet). SOCALNet met or surpassed the expert benchmarks despite its relatively primitive architecture and a training dataset that is small by CV standards. We synthesized counterfactual videos of excellent and poor surgeon performance to challenge SOCALNet, and it correctly predicted the outcomes in these challenges. SOCALNet and other CV methods can aid surgeons by quantifying and predicting outcome during surgical events and by enabling automated video review. The absence of video datasets containing adverse events is a critical unmet need that prevents the development of predictive models to improve surgical care.

Benchmark performance of human experts

Expert predictions were highly concordant, indicating that experts detected similar signals of blood loss and outcome (cross-correlation: R2 = 0.74–0.93, Kappa for success prediction = 0.95). Experts had uniform definitions of success (hemostasis) and were familiar with the stepwise progression of a well-described technique18,33. Thus, it is reasonable to conclude that using the first minute of video of a bleeding event, human experts detect signals predictive of blood loss and task outcome.

Although experts’ outcome and blood loss predictions were reasonably accurate (69% accuracy, R2 = 0.7), experts systematically overestimated surgeon success and underestimated bleeding: 4/6 of expert errors were false ‘success’ predictions, experts underestimated blood loss by 131 mL on average, and experts failed to identify 52% of high blood loss (above 500 mL) events. This post-hoc cutoff of 500 mL represents a potential clinical marker of the need for transfusion. The tendency of human experts to underestimate blood loss is well documented34,35,36,37, is corroborated by our findings, and may result in delayed recognition of life-threatening hemorrhage.

To validate individual ratings, we asked experts to provide their confidence in each prediction, and perceived skill rating of the participating surgeon. Maximally confident predictions were more likely to be correct, as expected from prior work34,35,38. Similarly, predictions were most accurate when evaluating highest and lowest-skilled surgeons (skill rating 1 or 5), but scarcely better than chance when evaluating intermediate surgeons. Intermediate skill surgeons comprised half of all surgeons and may benefit greatly from performance assessments.

During a real vascular injury, the estimation ability of the average surgeon is likely to be inferior to that of our experts, who calmly rated a single stereotyped task after training with videos of known blood loss. Experts’ systematic underestimation of blood loss and their difficulty assessing the performance of intermediate surgeons represent a chasm in surgeon-assessment proficiency. Surgical patients may benefit from novel methods that improve on these benchmarks.

SOCALNet performance compared to experts

We designed a primitive deep-learning architecture containing a standard CNN and a recurrent neural network, which we call SOCALNet. We provided SOCALNet with short videos from a much smaller training dataset than is customary in CV. Despite these disadvantages, SOCALNet made statistically non-inferior (and numerically superior) outcome predictions and superior blood loss predictions compared to human experts. SOCALNet’s predictions of blood loss had a smaller mean underestimation and standard error. Unlike experts, SOCALNet predictions were accurate for intermediate-skill surgeons.

The advantages of SOCALNet support the development of computer vision tools for surgical video review and as potential teammates for surgeons39. SOCALNet demonstrates that CV models can provide accurate, clinically meaningful analyses of surgical outcome from video. Future models could leverage the vast but largely untapped collections of surgical videos. Workflows developed in building SOCALNet can guide model deployment for other surgical adverse events. Human-AI teaming is a validated concept in other domains40,41,42. A combined SOCALNet-and-expert team (with the model as a tiebreaker, particularly when expert confidence was low) would have generated 18/20 correct predictions. Furthermore, the only two inaccurate predictions from this teaming occurred when a critical error was made after the video ceased, and both the model and an expert detected these errors when shown extended clips in the validation experiments. If utilized at scale, AI-driven video analysis may quantify comparisons of surgical technique, provide real-time feedback for trainees, or provide guidance during rare scenarios that a surgeon may not have encountered (e.g. vascular injury) but on which the model has been trained39.

SOCALNet has room for improvement. For adverse events, (1) accurate estimation of high-volume blood loss and (2) detection of task failures should be prioritized, as exsanguination is life-threatening. SOCALNet blood loss predictions exhibited a more robust central tendency than the experts’, resulting in better predictions for typical performances. However, when grading the edge cases of the two worst surgeons in the Test Set, SOCALNet underestimated blood loss (absolute error of 790–800 mL on videos exceeding 1.5 L of blood loss). In predicting failure (specificity), both experts and SOCALNet showed limitations (specificity = 0.56 and 0.66, respectively); however, improving expert predictions is challenging, and most surgeons are non-experts. Accordingly, applying CV optimization techniques to AI models (e.g. cost-sensitive classification, oversampling) may be preferable43,44.
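As a minimal sketch of one such technique, the example below up-weights the failure class in a binary cross-entropy loss (cost-sensitive classification); it assumes PyTorch, and the weight value is a hypothetical placeholder rather than a parameter tuned for SOCALNet.

```python
# Illustrative cost-sensitive loss: penalize missed failures more heavily
# (assumes PyTorch; the cost ratio below is a hypothetical placeholder).
import torch
import torch.nn as nn

failure_weight = torch.tensor([3.0])  # hypothetical cost for the failure class
criterion = nn.BCEWithLogitsLoss(pos_weight=failure_weight)

logits = torch.randn(8, 1)                     # model outputs for a batch of trials
labels = torch.randint(0, 2, (8, 1)).float()   # 1 = failure (the costly class)
loss = criterion(logits, labels)               # failures contribute 3x to the loss
```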

Surgical adverse event video datasets: an unmet need in surgical safety

A growing body of evidence supports the quantitative analysis of surgical video22,45,46,47,48. One fundamental discovery has been the detection of signals in surgical video that predict patient outcome: surgeons have heterogeneous skill resulting in heterogeneous outcomes14,45,46,49. Although low-skill surgeons are more likely to have adverse intraoperative events, video of these events has not been systematically studied. Instead of studying surgical video, studies describe adverse events using textual medical records, radiography, and laboratory results. Analysis of these extra-operative records and correlations with pre-operative risk factors and post-operative management can be useful50,51,52,53,54. However, this research omits a crucial determinant of the outcome of the surgical patient: the surgical event itself. This omission limits root-cause analysis to only the extra-operative universe and prevents evaluation of the technical maneuvers and patient anatomic conditions that make adverse events more likely. Unlike textual records, surgical video depicts all visualized surgeon movements and patient anatomy, making video uniquely suited for the study of operative events. The results of the present study begin to demonstrate the value of studying video of surgical adverse events.

We propose the creation of large, multi-center datasets of surgical video that include adverse events55,56. Video datasets of surgical adverse events can be leveraged using predictive models (e.g., SOCALNet) that detect intraoperative events, evaluate performance and quantify technique. This study was supported by the North American Skull Base Society, whose mission is to promote scientific advancement, share outcomes data for education and advance outcomes research. Groups such as the Michigan Bariatric Surgery Collaborative and the Michigan Urologic Surgery Improvement Consortium have conducted similar work, and we hope to call their attention to adverse events in addition to routine procedures57,58. National organizations capable of soliciting large bodies of data should prioritize collecting adverse event videos and apply technical innovations adopted by other medical fields to ensure privacy and confidentiality59,60,61. National organizations can also facilitate the scaling of expert labeling. Small groups face long delays in accruing sufficient cases and labeling video: in this study, despite our team’s long-term track record of collaboration, it took 2 months for our experts to review 20 min of aggregated video62. Collaborative efforts may be able to require video review as a condition of membership. This work is important given the potential of AI models to augment human performance. In the context of ICAI, an AI model may be useful in predicting high volumes of blood loss, or in cases where outcomes are more uncertain. However, the volume of video required for appropriate statistical power to demonstrate clinical utility would require significant collaboration between institutions and expert surgeon reviewers. We are in the process of establishing a data-sharing collective aimed at providing a secure mechanism for surgeons to share anonymized video and corresponding outcomes. This effort mirrors other quality improvement efforts already underway in surgical fields, with the added modality of surgical video and computer vision analysis. It is our hope that these efforts can accelerate the collection of surgical video and its analysis with DNN methods such as those described in this manuscript.

Finally, high-fidelity simulation enables analysis of rare surgical events. Curating 150 videos of real carotid injuries would require tens of thousands of cases, an impossible task without streamlined data-sharing mechanisms; using perfused cadavers and real instruments we collected hundreds of observations of this otherwise rare event. Videos in the simulated environment can complement surgical video datasets that otherwise depict thousands of uncomplicated cases and only a few rare events14,15,17,18,63,64,65,66. As more surgical video datasets are developed, we can follow the ‘sim-to-real’ process where models are trained on virtual data and then fine-tuned and validated in the real environment67,68,69.

Limitations

Our study has several limitations. First, validation on clinical video is a clear next step, although accruing a corpus of carotid injury video would likely require substantial national effort. Second, individualized models are needed that incorporate surgeon experience, response to hemorrhage, and patient-specific factors into a predictive model; this is a necessary step in the development of deep learning models and for human-AI teaming. Concepts such as the ‘OR Black Box’ may be able to incorporate factors that are not captured in purely intraoperative video (e.g. a surgeon’s appropriate response to an injury)70. Additionally, results from carotid injuries may not transfer to other vascular injuries, and vascular injuries differ from other adverse events. Finally, this task was performed in a constrained, simulated environment with clear endpoints, which is far removed from the realities of clinical practice. Rather than diminishing our results, these complementary challenges showcase the depth of unmet need within surgical-video data science. Separately from these study-design limitations, SOCALNet ingests ground-truth tool annotations as input, which requires pre-processing of data and is thus not fully automated71,72,73. The lack of curated surgical video datasets remains a major limitation for future work.

Conclusion

Experts and a neural network can predict the outcome of surgical hemorrhage from the first minute of video of the adverse event. Neural network-based architectures can already achieve human or supra-human performance at predicting clinically relevant outcomes from video. To improve outcomes of surgical patients, advances in quantitative and predictive methods should be applied to newly collected video datasets containing adverse events.