Vision-based tracking system for augmented reality to localize recurrent laryngeal nerve during robotic thyroid surgery

We adopted a vision-based tracking system for augmented reality (AR) and evaluated whether it helped surgeons localize the recurrent laryngeal nerve (RLN) during robotic thyroid surgery. We constructed AR images of the trachea, common carotid artery, and RLN from CT images. During surgery, the AR images of the trachea and common carotid artery were overlaid on the physical structures after they were exposed. The vision-based tracking system was then activated so that the AR image of the RLN followed the camera movement. After the RLN was identified, the distance between the AR image of the RLN and the actual RLN was measured. Eleven RLNs (7 right, 4 left) were tested. The mean distance between the RLN AR image and the actual RLN was 1.9 ± 1.5 mm (range 0.5 to 3.7 mm). RLN localization using AR and the vision-based tracking system was successfully applied during robotic thyroidectomy, and there were no cases of RLN palsy. This technique may allow surgeons to identify hidden anatomical structures during robotic surgery.


Results
In the pilot study, seven RLNs (4 right and 3 left) from six patients were used to measure the distance between the actual RLN and the trachea (Table 1). The mean distance was 7.5 mm.
In this prospective study, nine patients were enrolled and 11 RLNs (7 right and 4 left) were tested (Table 2). After activation of the vision-based tracking system, the AR images moved in line with the actual organs (Supplementary Video). The distance between the AR image of the RLN and the actual RLN was 1.9 ± 1.5 mm (range 0.5 to 3.7 mm; Table 3). There were no cases of RLN palsy on postoperative laryngoscopic examination.

Discussion
Identification of hidden structures is more challenging during robotic surgery than during open surgery because of the loss of tactile feedback and difficulties with hand-eye coordination of instruments 7 . Identifying the RLN, the most time-consuming step in thyroid surgery, is accordingly more challenging during robotic thyroid surgery than during open surgery. AR is a technology that superimposes images of objects onto physical objects 8,9 . AR approaches are employed in surgery for hidden or deeply located tumors such as parathyroid, liver, or brain tumors [10][11][12] . AR technology is also applied in robotic surgery 7 . However, the AR image is unable to follow camera movement during surgical procedures, so it requires continuous manual overlaying or additional instruments for image localization 1,13,14 .

Table 3. Distance between the AR image of the RLN and the actual RLN (mean ± SD: 1.9 ± 1.5 mm). AR, augmented reality; RLN, recurrent laryngeal nerve; RMSE, root mean squared error.
In this study, we developed a semi-automatic registration method in which an AR image is automatically overlaid on the actual organs and then fine-tuned manually. Constructing the AR images of the common carotid artery, trachea, and RLN took about 30 minutes in total. For the automatic overlay, the color and axis of the trachea were used as landmarks. The actual tracheal region was segmented using a thresholding method, a technique that detects color differences. The angles of the long and short axes of the segmented tracheal region were then determined, and the AR image of the trachea was overlaid onto the actual tracheal region at the predetermined angle. To enable the AR image of the RLN to follow camera movement, we applied simultaneous localization and mapping (SLAM) technology 15,16 .
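The automatic-overlay step above (threshold segmentation followed by estimation of the region's axes) can be sketched in a few lines. This is a minimal illustration in plain NumPy, not the paper's OpenCV-for-Unity implementation; the helper names and the synthetic bar image standing in for the tracheal region are assumptions:

```python
import numpy as np

def segment_by_threshold(gray, lo, hi):
    """Binary mask of pixels whose intensity falls inside [lo, hi]."""
    return (gray >= lo) & (gray <= hi)

def principal_axis_angle(mask):
    """Angle (radians) of the long axis of a binary region, found by PCA
    on the pixel coordinates (equivalent to second-order image moments)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)                    # center the coordinates
    cov = pts.T @ pts / len(pts)               # 2x2 covariance matrix
    evals, evecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    long_axis = evecs[:, np.argmax(evals)]     # direction of largest spread
    return np.arctan2(long_axis[1], long_axis[0])

# A synthetic "trachea": a bright horizontal bar in a dark image.
img = np.zeros((100, 100))
img[45:55, 10:90] = 200.0
mask = segment_by_threshold(img, 100, 255)
angle = principal_axis_angle(mask)   # long axis is horizontal, so angle ~ 0 or pi
```

In the actual system the angle would then be used to place the trachea AR model onto the segmented region before manual fine-tuning.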
SLAM technology constructs a 3D map of the surrounding environment in near real-time while simultaneously tracking an object's location [17][18][19] . The proposed tracking system is also capable of detecting respiratory movement, as shown in the Supplementary Video. This technology does not require training with images and is widely used across a range of fields including navigation, robotic mapping, odometry for virtual reality, and AR [20][21][22][23] . In this study, vision-based tracking was performed in three steps: (1) SLAM application on the physical surgical image, (2) semi-automatic organ overlay, and (3) vision-based organ tracking so that AR images follow camera movement in real-time.
There are limitations to this study. We were unable to display AR on the surgeon's binocular monitor (on the robot) because integration of AR into the robotic monitor is prohibited by the equipment's license; instead, we used a separate monitor to display and manipulate the AR. Integration will become possible when manufacturers make surgical robots accessible to AR software. Another limitation is that the current system does not compensate for tissue deformation. We constructed AR images for a non-deformable structure (trachea) and deformable structures (common carotid artery and RLN). However, we did not use a deformable registration method in this study for two reasons. First, although the common carotid artery and RLN are deformable in nature, their positions are relatively fixed, so they can be regarded as fixed organs. Second, a minor inaccuracy in the location of the RLN is not clinically significant and was acceptable to the surgeon. The role of AR in this study was not to predict the exact location of the RLN, but to suggest its probable location and help the surgeon explore the area around it.
In summary, we developed a vision-based tracking system for AR using SLAM technology and showed that the RLN can be successfully localized during robotic thyroidectomy. This technique may help surgeons identify hidden anatomical structures during robotic surgery.

AR image construction.
Figure 1 demonstrates AR image construction. A thresholding method was used for the trachea AR image (threshold range −2,650 to −240 HU), and the AR images of the common carotid artery and the RLN were segmented manually. 3D volume rendering was performed using a marching cubes algorithm on the segmented section images 24 . The surface of the AR image was post-processed with smoothing using MeshMixer (v.3.5.474, Autodesk Inc., San Rafael, CA, USA).
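The construction step can be sketched as follows: a CT volume is windowed to the paper's HU range, and the boundary voxels of the resulting mask approximate the surface that a marching cubes pass would triangulate. The synthetic volume, its dimensions, and the boundary-voxel shortcut are illustrative assumptions; the actual pipeline ran marching cubes on real CT data and smoothed the mesh in MeshMixer:

```python
import numpy as np

# Synthetic CT volume: soft tissue (~40 HU) containing an air-filled
# tube (~-1000 HU) running along the z axis, standing in for the trachea.
vol = np.full((32, 32, 32), 40.0)
zz, yy, xx = np.mgrid[0:32, 0:32, 0:32]
tube = (yy - 16) ** 2 + (xx - 16) ** 2 <= 36
vol[tube] = -1000.0

# HU window from the paper: -2,650 to -240 HU selects the airway.
mask = (vol >= -2650) & (vol <= -240)

# A voxel is interior if all six face-neighbors are inside the mask;
# the remaining mask voxels form the surface shell that marching cubes
# would turn into triangles.
interior = mask.copy()
interior[1:-1, 1:-1, 1:-1] &= (mask[:-2, 1:-1, 1:-1] & mask[2:, 1:-1, 1:-1] &
                               mask[1:-1, :-2, 1:-1] & mask[1:-1, 2:, 1:-1] &
                               mask[1:-1, 1:-1, :-2] & mask[1:-1, 1:-1, 2:])
surface = mask & ~interior
```

Only the thin shell of `surface` voxels contributes to the rendered mesh, which is why the subsequent smoothing step operates on a surface rather than the full volume.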

Study protocol.
In the pilot study, AR images of the trachea and common carotid artery were constructed based on CT images. The images were overlaid on the actual organs during surgery. After the RLN was identified, the distance between the actual RLN and the AR image of the trachea was measured.
In the prospective study, AR images of the trachea, common carotid artery, and RLN were constructed using CT images. The AR image of the RLN was positioned lateral to the trachea at the distance determined by the pilot study. During robotic thyroid surgery, the AR images of the trachea and common carotid artery were overlaid on the actual structures after they were exposed. The vision-based tracking system was then activated so that the AR image of the RLN followed the camera movement. After the actual RLN was identified, the distance between the AR image of the RLN and the actual RLN was measured.

Hardware of tracking system in robotic surgery.
Figure 2 shows the hardware of the tracking system used during robotic surgery. The AR screen is branched from the screen of the master surgical robot using a capture board and connected to a laptop computer running the vision-based tracking system. We used the IS4000 8 mm camera provided with the Xi model of the da Vinci Surgical System. Although it is a 3D camera with two lenses, only a single image captured by one lens, treated as a 2D monocular camera, was used in this study. The camera tip angle was 30 degrees, the working distance ranged from 20 mm to 40 mm, the field of view was 70 to 80 degrees, and the focal length was 8 mm. The video specification was 1920 × 1080 resolution at 30 fps.

Procedures of AR application using vision-based tracking system.
Figure 3 demonstrates the process of AR application and vision-based organ tracking. After exposing the trachea and common carotid artery, SLAM technology was applied to create a 3D map of the operative field with the ability to hold AR images after overlay. We used a 3D-2D registration method to apply the constructed 3D AR image to the 2D image obtained through the monocular surgical camera. The AR images of the trachea and common carotid artery were overlaid semi-automatically: the color and axis of the segmented tracheal region were used for the initial automatic overlay, and the location of the AR image was fine-tuned manually through translation (x, y, z), rotation (roll, pitch, yaw), and zoom (in, out) using mouse control. After developing an environment for the tracking system to operate within, we built the tracking system using OpenCV (v2.3.8) for Unity software (v.2018.3.7f1, Unity Software Inc., San Francisco, CA, USA), as shown in Fig. 4.
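The manual fine-tuning described above amounts to composing translation (x, y, z), rotation (roll, pitch, yaw), and uniform zoom into one homogeneous transform applied to the AR model's vertices. A minimal NumPy sketch; the function name, parameter order, and rotation convention (Z-Y-X) are assumptions, not the paper's API:

```python
import numpy as np

def pose_matrix(tx, ty, tz, roll, pitch, yaw, zoom):
    """Compose zoom, roll/pitch/yaw rotation (Z-Y-X order), and translation
    into a single 4x4 homogeneous transform for the AR model."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])   # roll about x
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])   # pitch about y
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])   # yaw about z
    R = Rz @ Ry @ Rx
    M = np.eye(4)
    M[:3, :3] = zoom * R
    M[:3, 3] = [tx, ty, tz]
    return M

# Nudge the overlay 2 mm along x, rotate 10 degrees about the view axis,
# and zoom in by 5 percent.
M = pose_matrix(2.0, 0.0, 0.0, 0.0, 0.0, np.radians(10), 1.05)
v = M @ np.array([0.0, 0.0, 0.0, 1.0])   # model origin lands at the translation
```

Each mouse interaction simply updates one of the seven parameters and re-applies the composed matrix, which is why the fine-tuning feels like independent translate/rotate/zoom controls.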
SLAM technology, using MAXST AR SDK software (v4.1.4, MAXST, Seoul, Korea), was applied to create a 3D map of the operative field with the ability to hold AR images after semi-automatic overlay. This technology can track an object or a space in real-time without any prior knowledge, using only visual information from a camera 25,26 . SLAM achieves 3D point mapping via stereo matching, using the motion parallax between the first and subsequent camera images. The points are tracked, and the 6 degrees of freedom (6-DoF) camera pose is estimated. The tracking algorithm first calculates a rough relative pose between contiguous images by frame-to-frame matching and projects the 3D points onto the current image using that pose. The algorithm then searches the current image for correspondences to gather matched 3D-2D pairs 27 , and the final pose is optimized iteratively from these pairs 28 . If the image includes an unknown scene, feature points are extracted from it, triangulated by epipolar geometry, and added to the 3D point map 29 .
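The step of projecting the 3D map points onto the current image by the estimated pose can be sketched with a pinhole camera model. The function name and the intrinsics values (focal length 800 px, principal point at the center of a 1920 × 1080 frame) are illustrative assumptions:

```python
import numpy as np

def project(points3d, R, t, f, cx, cy):
    """Pinhole projection of 3D map points into the current image,
    given a 6-DoF camera pose (R, t) and intrinsics (f, cx, cy)."""
    cam = points3d @ R.T + t          # world -> camera coordinates
    uv = cam[:, :2] / cam[:, 2:3]     # perspective divide by depth
    return uv * f + np.array([cx, cy])

# Identity orientation, camera 5 units back from three map points.
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
pts = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
uv = project(pts, R, t, f=800.0, cx=960.0, cy=540.0)
```

In a full SLAM pipeline, the correspondence search then looks for image features near each projected `uv`, and the resulting 3D-2D matches drive the iterative pose refinement.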
To apply SLAM in this study, the surgeon moved the robotic camera continuously for about 5-10 seconds at various angles to create an initial 3D point map. The vision-based tracking system was then applied so that the AR images followed camera movement in line with the actual organs in real-time. This method employs a coarse-to-fine strategy to cope with motion blur or fast camera movement, and an efficient second-order minimization (ESM) algorithm 30 for precise tracking. When a camera moves fast, the correspondence search often fails because of the large search range. To overcome this problem, the input image is configured into a four-level image pyramid, and the smallest image is utilized in the coarse step, where a rotation-only 3-DoF pose is calculated quickly between