Automated recognition of objects and types of forceps in surgical images using deep learning

Analysis of operative data with convolutional neural networks (CNNs) is expected to improve the knowledge and professional skills of surgeons. Identification of objects in videos recorded during surgery can be used for surgical skill assessment and surgical navigation. The objectives of this study were to recognize objects and types of forceps in surgical videos acquired during colorectal surgeries and evaluate detection accuracy. Images (n = 1818) were extracted from 11 surgical videos for model training, and another 500 images were extracted from 6 additional videos for validation. The following 5 types of forceps were selected for annotation: ultrasonic scalpel, grasping, clip, angled (Maryland and right-angled), and spatula. IBM Visual Insights software was used, which incorporates the most popular open-source deep-learning CNN frameworks. In total, 1039/1062 (97.8%) forceps were correctly identified among 500 test images. Calculated recall and precision values were as follows: grasping forceps, 98.1% and 98.0%; ultrasonic scalpel, 99.4% and 93.9%; clip forceps, 96.2% and 92.7%; angled forceps, 94.9% and 100%; and spatula forceps, 98.1% and 94.5%, respectively. Forceps recognition can be achieved with high accuracy using deep-learning models, providing the opportunity to evaluate how forceps are used in various operations.

Study information was captured in the electronic medical records. For all other research subjects, information will also be disclosed by posting a document approved by the Ethics Committee on the Tokyo Women's Medical University website; this posting will also mention the possibility of refusing to participate as a research subject.
Datasets. The colorectal surgical videos used for annotation were recorded during surgeries conducted at Tokyo Women's Medical University. A total of 1173 images were extracted from 11 surgical videos for model training, and another 500 images were extracted from 6 additional videos for validation. The following 5 types of forceps in the videos were selected for annotation: grasping, ultrasonic, clip, angled (Maryland and right-angled), and spatula forceps. A separate surgical video with a 60 s run time, not included in the training or validation sets, was used to verify the model.
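As an illustrative sketch only (the paper does not describe its sampling procedure), still images might be extracted from a surgical video as below. OpenCV, the file paths, and the one-frame-per-second sampling rate are all assumptions, not details from the study.

```python
# Minimal sketch: sample still images from a surgical video at a fixed
# interval. The paths and sampling rate are illustrative assumptions.
import cv2

def extract_frames(video_path: str, out_dir: str, every_n_seconds: float = 1.0) -> int:
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0          # fall back if FPS metadata is missing
    step = max(1, int(round(fps * every_n_seconds))) # frames between saved samples
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{index:06d}.png", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# e.g. extract_frames("colectomy_case01.mp4", "training_images")
```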

Imaging data and model deployment.
Abdominal endoscopic images were extracted from surgical videos (Fig. 1). In total, 1173 images were extracted to train a forceps-type recognition model. Five types of forceps were selected for manual annotation, performed by a single researcher: grasping forceps, ultrasonic scalpel, clip forceps, angled forceps, and spatula forceps (Table 1 and Fig. 2). The model was deployed, and the 500 test images, showing forceps at various angles and in different configurations, were input into the deployed model to verify its diagnostic accuracy (Fig. 3).
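IBM Visual Insights is operated through a graphical interface, so the exact training and deployment calls are not exposed in the paper. As a hedged stand-in, the sketch below shows how an equivalent five-class forceps detector could be run over test images using torchvision's Faster R-CNN; the class names, checkpoint file, and score threshold are assumptions, and this is not the software the study used.

```python
# Illustrative stand-in for the deployed model: run a (separately trained)
# object detector over test frames and keep detections above a confidence
# threshold. The study itself used IBM Visual Insights, not this code.
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

CLASSES = ["background", "grasping", "ultrasonic", "clip", "angled", "spatula"]

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=len(CLASSES)  # 5 forceps types + background
)
model.load_state_dict(torch.load("forceps_detector.pt"))  # hypothetical checkpoint
model.eval()

@torch.no_grad()
def detect(image_path: str, score_threshold: float = 0.5):
    img = convert_image_dtype(read_image(image_path), torch.float)
    output = model([img])[0]  # dict with 'boxes', 'labels', 'scores'
    keep = output["scores"] >= score_threshold
    return [
        (CLASSES[label], box.tolist(), float(score))
        for label, box, score in zip(
            output["labels"][keep], output["boxes"][keep], output["scores"][keep]
        )
    ]
```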
Performance metrics.
Accuracy: percentage of correctly labeled images.
Mean average precision (mAP): mean of the average precision calculated for each object class.
Precision: percentage of images with a correctly labeled object out of all labeled images that contain an object.
Recall: percentage of images labeled as containing an object out of all tested images that contain that object.
Intersection over Union (IoU): overlap between a predicted label box and its ground-truth box, measuring localization accuracy.
Confidence score: the model's estimated probability that a detected object is present.
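These definitions can be made concrete with a short sketch. The greedy IoU-based matching below is one common convention; the paper does not state its exact matching rule or threshold, so both are assumptions.

```python
# Minimal sketch of detection metrics. A prediction counts as a true
# positive when it matches an unmatched ground-truth box of the same
# class with IoU >= 0.5 (threshold and matching rule are assumptions).
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    # predictions and ground_truth are lists of (class_name, box) pairs
    matched, tp = set(), 0
    for cls, box in predictions:
        for i, (gt_cls, gt_box) in enumerate(ground_truth):
            if i not in matched and cls == gt_cls and iou(box, gt_box) >= iou_threshold:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```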

Results
The accuracy, mAP, precision, recall, and IoU of the model were 90%, 100%, 92%, 100%, and 77%, respectively (Fig. 4). The total number of forceps identified in the 500 test images was 1062. Of these, 1039 (97.8%) were correctly detected, and there were 31 false positives. The recall and precision of each type of forceps calculated from the outcome values were as follows: grasping forceps, 98.1% and 98.0%; ultrasonic scalpel, 99.4% and 93.9%; clip forceps, 96.2% and 92.7%; angled forceps, 94.9% and 100%; and spatula forceps, 98.1% and 94.5%, respectively.
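As a sanity check, the headline figures can be re-derived from the reported counts; the overall precision below is not stated in the text but follows directly from them.

```python
# Re-deriving overall figures from the counts reported above.
tp, gt_total, fp = 1039, 1062, 31
recall = tp / gt_total          # 1039/1062 = 0.978 -> 97.8%, as reported
precision = tp / (tp + fp)      # 1039/1070 = 0.971 -> overall precision ~97.1%
print(f"recall={recall:.1%}, precision={precision:.1%}")
```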

Discussion
In the field of surgery, AI-based decision support systems have provided a broad range of technological approaches to augment the information available to surgeons, accelerating applications such as intraoperative pathology and surgical step recommendation19. Accurate and efficient object representation and segmentation, based on the annotation of objects within a common framework, are necessary for multilabel object classification in surgery21. Further, skill and motion assessments in surgical videos using CNNs have been reported in recent years22-24.
In this study, we demonstrated the recognition of forceps, including forceps type, from surgical images using a CNN. In most test results, all 5 types of forceps were detected correctly with high confidence scores, and the corresponding recall and precision values were likewise high. The trained model was able to accurately detect the forceps at various angles (Fig. 4a-i). These results indicate that the model recognized the shapes and colors of each type of forceps with high precision.
Although small in number, some forceps were not detected, and some outcomes were false positives. Based on the incorrect outcome images, we found that errors arose when only part of a forceps was visible in the image (Fig. 5a,b) or when its shape resembled that of another type of forceps (Fig. 5c,d). Additionally, the results suggest that image resolution considerably affects the validation outcome. Because the forceps are in motion during surgery, they are sometimes blurred in the videos or appear closed in the extracted still images; as a result, the model could not identify them or recognized them as another type of forceps.
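Since motion blur is identified above as an error source, one generic way to flag blurred frames before inference is the variance-of-Laplacian heuristic. This is a sketch of a standard technique, not something the study describes, and the threshold is an assumption.

```python
# Generic blur check: low variance of the Laplacian indicates few sharp
# edges, i.e. a blurry frame. The threshold of 100 is a common heuristic,
# not a value from the paper.
import cv2

def is_blurry(image_path: str, threshold: float = 100.0) -> bool:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold
```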
The potential of automatic video indexing and surgical skill assessment has been reported with the use of 300 laparoscopic sigmoidectomy videos from multiple institutions in Japan25. In the present study, the recall and precision values were good despite the limited training data, likely because the commercial software IBM Visual Insights combines several established open-source deep-learning frameworks.
The results of our study will aid the development of a system that manages, delivers, and retrieves surgical instruments for surgeons upon request. Object recognition in surgery has reached performance levels feasible for widespread clinical use, and with further development based on these results, forceps recognition could provide real-time object information during surgeries. By integrating and developing these technologies, the digitalization of surgical scenes and techniques becomes possible; the ability to evaluate what procedure was performed, and how, is significant. Moreover, these innovations will enable surgical technique evaluation and surgical navigation. AI is expected to contribute not only to medical treatment, such as the prevention and diagnosis of diseases, but also to settings with insufficient resources and to risk management for preventing medical accidents.
This study had some limitations. First, because the model was built with IBM Visual Insights, it is difficult to tune the model itself other than by changing the training data. Second, only a limited set of forceps types was covered, trained on colorectal cancer videos from a single facility.

Conclusion
In this study, we evaluated the recognition of different types of forceps using a CNN and obtained positive results with high accuracy. The results demonstrate the opportunity to evaluate how forceps are used, and to support navigation, in surgeries.

Figure 5. Examples of incorrect outcomes: (a) one grasping forceps and one spatula forceps were detected accurately, but one of the two grasping forceps in the image was not detected; (b) the clip forceps was not identified correctly; (c) the angled forceps was incorrectly recognized as an ultrasonic scalpel; and (d) the clip forceps was identified correctly but was also recognized as an ultrasonic scalpel.