Introduction

Advancements in optical and robotic technology over the last 30 years have significantly enhanced the ability of the surgeon’s eyes and hands, thereby facilitating safe and precise surgery in a minimally invasive manner1,2. Meanwhile, technology that supports the brain, which mediates between the surgeon’s eyes and hands, has lagged behind. The steps of recognising the anatomy based on the visual information acquired by the eyes and of determining the actions to be applied to the target still depend entirely on the surgeon’s knowledge and experience. However, if surgeons are inexperienced or their attention is impaired by fatigue or distraction, there is a risk of misidentification. Approximately 30% of surgical complications are caused by intraoperative misidentification3. If this recognition step can be supported by artificial intelligence (AI), it will greatly assist inexperienced surgeons in performing safe and accurate surgeries.

In recent years, the performance of visual information analysis using deep-learning technology, such as that used in automated driving systems, has been rapidly advancing, and its application in society has been progressing. Using similar technology, we developed an AI surgical support system that recognises anatomical structures in surgical images using deep learning with convolutional neural networks and presents the recognition results in real time4. It can recognise various anatomical structures that are important in thoracic and abdominal surgery, such as the nerves, vessels, ureters and pancreas, as well as appropriate dissection layers. During lymph node dissection in lung cancer surgery, it is necessary to identify and securely preserve important thoracic nerves such as the vagus, recurrent laryngeal and phrenic nerves, as their palsy significantly impairs postoperative quality of life. The current study aimed to evaluate the accuracy of the AI surgical support system in recognising and presenting the thoracic nerves in real time during lung cancer surgery.

Results

Ten patients were included in this study, in which the surgical support system was used during actual lung cancer surgery. The procedures performed were seven left-lung surgeries (left upper lobectomy, n = 1; left lower lobectomy, n = 3; and left upper division segmentectomy, n = 3) and three right-lung surgeries (right upper lobectomy, n = 3). All patients underwent mediastinal lymph node dissection with exposure of the vagus and phrenic nerves. All seven patients with left-lung cancer underwent dissection of the left recurrent laryngeal nerve. The nerve recognition results were presented successfully in all cases.

The computational evaluation results of the created thoracic nerve recognition model were relatively favourable given the task of recognising numerous thin structures, with a Dice index of 0.56 and a Jaccard index of 0.39.

The expert evaluation results for the accuracy of neural recognition were satisfactory, with a recall score of 4.5 ± 0.4 and a precision score of 4.0 ± 0.9 (Table 1). The pleura and bronchial wall were prone to being misidentified as nerves by the AI system (Fig. 1).

Table 1 Surgical procedures and evaluation scores of each patient.
Figure 1

Accuracy of the surgical support system in recognising thoracic nerves. (A) The left recurrent laryngeal nerve is well recognised immediately after slight exposure. (B) Thoracoscopic image after dissection of the dorsal side of the station #4L lymph nodes. The recurrent nerve is well delineated to the periphery.

The video quality of the AI monitor was also favourable (Supplementary Video). Almost no time lag (score, 4.9 ± 0.3) or difference in image quality (score, 4.6 ± 0.5) was observed between the thoracoscopy monitor and the AI monitor, although there was a slight difference in smoothness of movement (score, 3.2 ± 0.4). The neural recognition results did not differ regardless of the orientation, insertion position or diameter (10 vs. 5 mm) of the thoracoscope used.

Discussion

The results showed that the accuracy of the deep-learning-based AI surgical support system in recognising the thoracic nerves was satisfactory as judged by expert thoracic surgeons. The system could present the recognition results in real time during actual surgery without time lag or degradation of image quality.

The computational evaluation results of the created thoracic nerve recognition model were relatively favourable given the task of recognising numerous thin structures, with a Dice index of 0.56 and a Jaccard index of 0.39. Kumazu et al. reported that the Dice and Jaccard indices of the dissection-layer recognition model built with our AI surgical support system reached 0.554 and 0.383, respectively4. These values indicate high performance considering that the best reported values for neuronal cell segmentation under an optical microscope are 0.525 for the Dice index and 0.356 for the Jaccard index, according to a Kaggle competition in which data scientists around the world competed in machine-learning-based data analysis5.

Similar to this study, other studies have made various attempts to use AI for intraoperative guidance on surgical anatomy6. Madani et al.7 developed and evaluated the performance of AI models that can identify safe and dangerous zones of dissection during laparoscopic cholecystectomy. Further, Mascagni et al.8 established an AI system that automatically segments the hepatocystic anatomy. Sato et al.9 reported that the real-time detection ability of AI for the recurrent laryngeal nerve in thoracoscopic oesophagectomy was higher than that of general surgeons. Recognising anatomical structures via deep-learning-based image segmentation is common to these previous studies and the present one. The most substantial difference between those AI models and the one used in this study appears to be versatility. The model reported by Madani et al. can identify safe and unsafe areas in the standardised operative field of laparoscopic cholecystectomy. This segmentation was based on the visual image information of the safe area and the adjacent vessels and organs. Annotation of the safe area in the training images did not reflect the actual microanatomical structures in detail. Therefore, the segmentation boundaries are vague and likely to lack versatility if the operative procedure or field of view is changed. In the study by Sato et al., the model learned to recognise only the recurrent nerve, not the vagus nerve or other nerve fibres. The model therefore recognises the pixel patterns of nerve fibres and then distinguishes the recurrent nerve from other nerves under certain conditions, which may involve the anatomical position relative to other vessels and organs and are thus not universal.

In contrast, the AI model used in this study recognises the image pattern of the nerve itself, because all nerve fibres in the training images were annotated with minimal consideration of anatomical positioning. This makes the model significantly more versatile than those reported in previous studies. Our AI system can recognise not only the recurrent nerve but also the vagus, phrenic and various abdominal nerves with the same accuracy, regardless of the surgical procedure (thoracic or abdominal), the position and rotation of the scope or the organ exposure method used. Surgical Vision Eureka was commercialised as an educational tool in 2022 (https://anaut-surg.com/), and approval as a medical device is being pursued through further deep learning using training data generated from surgical images taken under various conditions to improve the accuracy and universality of anatomical recognition.

This study had some limitations. It was a single-institution study, and the number of cases was limited. As identifying the left vagus and left recurrent laryngeal nerves was considered clinically important, cases involving station #4L lymph node dissection were prioritised. Recognition accuracy was evaluated using captured still images; however, video-based evaluation would better reflect use in actual surgery. The system highlights only what is visible and cannot predict the location of what is not visible. Therefore, a nerve completely covered by mediastinal fat cannot be recognised. Even in such cases, however, the AI system can recognise and highlight the small portion of the nerve that becomes momentarily visible as the fat is dissected. Although the accuracy of our AI model was unlikely to be affected by anatomical positioning, it might change with differences in surgical procedures. Thus, further research should be conducted to verify whether this system can provide universally accurate recognition across various types of surgeries and in obese patients.

In conclusion, the AI surgical support system based on deep learning was sufficiently accurate in recognising the thoracic nerves. It presented the recognition results in real time during actual surgery without time lag or degradation of image quality.

Methods

Ethics statement

All procedures were conducted in accordance with the ethical standards of the institutional and national committees on human experimentation and with the 1964 Declaration of Helsinki and its later amendments. This study was approved by the Institutional Review Board of Clinical Research of the Cancer Institute Hospital of the Japanese Foundation for Cancer Research on 1 February 2022 (referral no. 2021-GB-067). Written informed consent was obtained from the participants prior to surgery to record surgical images and use them for AI analysis.

Study design

This retrospective, observational study included patients who underwent video-assisted thoracoscopic lobectomy or segmentectomy with mediastinal lymph node dissection for lung cancer between June 2022 and April 2023. Because the study aimed to assess the accuracy of nerve recognition under a fixed surgical technique, only cases in which the first author (Junji Ichinose) was the primary surgeon were included. Surgery was performed using four ports (7, 7, 15 and 30 mm) with a confronting upside-down monitor setting, in which the patient was viewed horizontally as in open thoracotomy10,11. In left-sided surgery, a 30° 10-mm scope was inserted via the fifth intercostal space.

Surgical Vision Eureka (Anaut Inc., Tokyo, Japan) was used as the AI surgical support system. This system consists of a workstation and a display monitor, which can be connected to a conventional endoscopy system with a single cable. All equipment used in this study was borrowed free of charge from Anaut Inc. The use of this system is currently approved only for educational purposes; it is yet to be approved as a medical device. Therefore, the physicians who performed the surgery did not view the AI system’s monitor, and the recognition results were evaluated by a physician who did not participate in the surgery.

Development of the AI surgical support system

The recognition model in Eureka was developed as described in a previous study4. Deep learning was performed using an algorithm based on the convolutional neural network U-Net architecture on a workstation with a Tesla V100 GPU (NVIDIA Corp., Santa Clara, CA) with 32 GB of memory (Fig. 2). A total of 682 still images depicting the vagus, left recurrent laryngeal or phrenic nerve were captured from videos of lung cancer surgery and saved in BMP format at 1920 × 1080 pixels (aspect ratio, 16:9). All neural tissues in each frame were accurately annotated by board-certified surgeons (N. K. and K. K.) and used as training data (Fig. 3).
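
For readers unfamiliar with this class of model, the sketch below illustrates the general U-Net pattern named above: a convolutional encoder-decoder with skip connections that outputs a per-pixel nerve probability map. This is a minimal illustration in PyTorch with assumed layer widths and training settings; the actual Eureka architecture and hyperparameters are not disclosed beyond being U-Net-based.

```python
# Minimal U-Net-style encoder-decoder for binary nerve segmentation.
# Illustrative sketch only: depth, channel widths and training settings
# are assumptions, not the Eureka implementation.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.enc1 = double_conv(in_ch, 64)
        self.enc2 = double_conv(64, 128)
        self.bottleneck = double_conv(128, 256)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = double_conv(256, 128)   # 128 skip + 128 upsampled channels
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = double_conv(128, 64)    # 64 skip + 64 upsampled channels
        self.head = nn.Conv2d(64, out_ch, 1)  # per-pixel nerve logit

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

# Training would minimise a per-pixel loss between the predicted map and
# the surgeon-annotated binary nerve masks, e.g.:
model = MiniUNet()
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

At inference time, thresholding the sigmoid of the output logits yields the binary nerve overlay that the system superimposes on the live video.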

Figure 2

Deep-learning architecture implementing U-Net used in the surgical support system. conv, convolution; concat, concatenation.

Figure 3

Development of the thoracic nerve recognition model. (A) Still images depicting the thoracic nerves were captured from videos of lung cancer surgery. (B) All neural tissues in each frame were accurately annotated by board-certified surgeons (N. K. and K. K.) and used as training data. Nerve fibres are marked in yellow-green. (C) The system can present the recognition results on the AI monitor in real time at 30 frames per second.

Computational evaluation of the AI model

The created thoracic nerve recognition model was applied to surgical videos for validation, and 47 frames depicting the vagus, left recurrent laryngeal or phrenic nerve were extracted from the videos of the recognition results. Two annotators, different from the creators of the training data, manually annotated the corresponding original images to create the ground truth. The agreement over the entire image between the AI-recognised area and the annotated area in the ground truth was evaluated using the Dice index (F1 score) and the Jaccard index (intersection over union). Both indicators are frequently used to evaluate segmentation accuracy in machine learning; a comprehensive analysis of biomedical image analysis challenges revealed that the Dice index was used in 92% of all 383 segmentation tasks12. The Dice index and the Jaccard index are calculated using the following formulas, where TP, FN and FP represent true-positive, false-negative and false-positive pixel counts, respectively.

$$Dice \; index = \frac{TP}{TP + \frac{1}{2} \left(FP + FN\right)},$$
$$Jaccard \; index = \frac{TP}{TP + FP + FN}.$$
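
As a concrete illustration, both indices can be computed directly from a predicted mask and a ground-truth mask by counting pixels, as in the following minimal NumPy sketch (not the study’s actual evaluation pipeline):

```python
# Dice and Jaccard indices from binary segmentation masks, matching the
# TP/FP/FN formulas above. Minimal illustration; the study's evaluation
# software is not described.
import numpy as np

def dice_jaccard(pred: np.ndarray, truth: np.ndarray):
    """pred, truth: boolean arrays of shape (H, W); True marks nerve pixels."""
    tp = np.logical_and(pred, truth).sum()    # true positives
    fp = np.logical_and(pred, ~truth).sum()   # false positives
    fn = np.logical_and(~pred, truth).sum()   # false negatives
    dice = 2 * tp / (2 * tp + fp + fn)        # equals TP / (TP + (FP+FN)/2)
    jaccard = tp / (tp + fp + fn)             # intersection over union
    return dice, jaccard

# Tiny worked example: TP = 2, FP = 1, FN = 1
pred  = np.array([[1, 1, 0],
                  [0, 1, 0]], dtype=bool)
truth = np.array([[1, 0, 0],
                  [0, 1, 1]], dtype=bool)
print(dice_jaccard(pred, truth))  # (0.666..., 0.5)
```

Note that the two indices are monotonically related, with Jaccard = Dice/(2 − Dice); the reported values of 0.56 and 0.39 satisfy this relationship.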

Expert evaluation of the AI model

The clinical usefulness of the surgical support system should be evaluated not only computationally but also by expert surgeons. Five still images were captured from the recognition result video of each case, selecting images in which the thoracic nerves were sufficiently within the field of view to allow assessment of the accuracy of neural recognition. Four board-certified general thoracic surgeons skilled in mediastinal lymph node dissection (J. I., A. S., Y. M. and M. N.) assessed and rated the images on a 5-point scale, with 5 being the best and 1 being the worst, based on two aspects: a recall score (how infrequently the AI fails to recognise neural areas) and a precision score (how infrequently the AI overdetects non-neural areas).
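
For clarity, the sketch below shows one plausible way such ratings could be aggregated into the reported mean ± standard deviation summaries. The rating values and the aggregation scheme (averaging per case before summarising across cases) are hypothetical, as the paper reports only the final summary statistics.

```python
# Hypothetical aggregation of expert ratings into mean ± SD summaries.
# Four surgeons rate five images per case for ten cases on a 1-5 scale;
# the values below are random placeholders, not the study data.
import numpy as np

rng = rng = np.random.default_rng(42)
# shape: (4 raters, 10 cases, 5 images); placeholder integer ratings 3-5
recall = rng.integers(3, 6, size=(4, 10, 5))
precision = rng.integers(3, 6, size=(4, 10, 5))

def summarise(ratings: np.ndarray) -> str:
    # Average over raters and images within each case, then summarise
    # across the ten cases (one assumed scheme among several possible).
    per_case = ratings.mean(axis=(0, 2))
    return f"{per_case.mean():.1f} ± {per_case.std(ddof=1):.1f}"

print("recall score:", summarise(recall))
print("precision score:", summarise(precision))
```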

Video quality of the AI monitor

In addition, to evaluate whether it is possible to perform surgery while watching the AI monitor displaying the recognition results, a physician who did not participate in the surgery compared the AI monitor with the thoracoscopy monitor during surgery and rated it on a 5-point scale based on three aspects: time lag, difference in image quality and difference in smoothness of movement. Smoothness of movement indicated that the video was not jerky and was updated instantaneously without delay.

Statistical analysis

Continuous variables were expressed as mean ± standard deviation. No comparison tests were performed in this study. All statistical analyses were performed using EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan), a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria).

Meeting presentation

We presented this study at the 60th Annual Meeting of The Society of Thoracic Surgeons on January 27–29, 2024.