Ability of artificial intelligence to detect T1 esophageal squamous cell carcinoma from endoscopic videos: supportive effects of real-time assistance

Diagnosis using artificial intelligence (AI) with deep learning could be useful in endoscopic examinations. We investigated the ability of AI to detect superficial esophageal squamous cell carcinoma (ESCC) from esophagogastroduodenoscopy (EGD) videos. We retrospectively collected 8428 EGD images of esophageal cancer to develop a convolutional neural network through deep learning. We evaluated the detection accuracy of the AI diagnosing system compared with that of 18 endoscopists. We used 144 EGD videos for the two validation sets. First, we used 64 EGD observation videos of ESCC using both white light imaging (WLI) and narrow-band imaging (NBI). We then evaluated the system using 80 EGD videos from 40 patients (20 with superficial ESCC and 20 without ESCC). In the first set, the AI system correctly diagnosed 100% of ESCCs. In the second set, it correctly detected 85% (17/20) of ESCCs; 75% (15/20) and 55% (11/20) were detected by WLI and NBI, respectively, and the positive predictive value was 36.7%. The endoscopists correctly detected 45% (range, 25-70%) of ESCCs. With real-time AI assistance, the sensitivities of the endoscopists improved significantly compared with their performance without AI assistance (p<0.05). AI can detect superficial ESCC from EGD videos with high sensitivity and, with real-time support, improve endoscopists' detection of ESCC.


Introduction
Esophageal cancer is the sixth most common cause of cancer mortality worldwide, accounting for almost 508,000 deaths annually [1,2]. Esophageal squamous cell carcinoma (ESCC) is the most common histological type of esophageal cancer throughout Asia, specifically Japan [3,4]. ESCC diagnosed in advanced stages often requires invasive treatment and has a poor prognosis; therefore, early detection is important for optimal prognosis [4]. However, early diagnosis remains difficult, and early-stage disease can be overlooked during endoscopic examination.
It can be challenging to correctly diagnose ESCC at early stages using only white light imaging (WLI).
Iodine staining can improve ESCC detection with high sensitivity and specificity; however, it can cause severe discomfort and increases the procedure time [5,6,7]. It is therefore used only for high-risk patients. Narrow-band imaging (NBI) is a revolutionary optical image-enhanced endoscopy technology that facilitates ESCC detection without iodine staining [8][9][10]. NBI is easier to use than iodine staining and does not cause patient discomfort. However, NBI has insufficient sensitivity (53%) for detecting ESCC when used by inexperienced endoscopists [11]. Therefore, there is an urgent and unmet need to improve ESCC detection for less experienced practitioners.
Computer-aided diagnosis using artificial intelligence (AI) with deep learning methods could be a useful adjunct to endoscopic examination that could improve detection of early cancers [12][13][14]. Our group was the first to report good diagnostic performance of AI using deep learning to detect esophageal cancer, including ESCC and adenocarcinoma, from still endoscopic images. In that study, AI had a sensitivity of 98% and could distinguish superficial from advanced cancer with an accuracy of 98% [12]. In superficial cancers, the AI diagnosing system could differentiate pathological mucosal and submucosal microinvasive (SM1) cancers from submucosal deep invasive (SM2) cancers; this can help determine the appropriate treatment course for each patient [15].
In this study, we evaluated the ability of AI to detect ESCC from esophagogastroduodenoscopy (EGD) videos. When analyzing still images, AI can evaluate only a limited area, and numerous images are required to screen the entire esophagus, which takes considerable time. In EGD videos, the whole esophagus can be evaluated without taking pictures of non-cancerous areas. To ensure that the AI could detect ESCC in fast-moving situations (for example, when an inexperienced endoscopist examines the esophagus too quickly and misses a lesion), we prepared two validation video sets: a slow-speed set and a high-speed set. Analysis of AI diagnosis using videos will help realize real-time support from AI diagnosing systems during endoscopic examination.

Methods
This study was approved by the Institutional Review Board of the Cancer Institute Hospital (No. 2016-1171) and the Japan Medical Association (ID JMA-II A00283). Informed consent or an acceptable substitute was obtained from all patients.

Preparation of training image sets and construction of a convolutional neural network (CNN) algorithm
For this single-center retrospective study, we obtained EGD images taken from February 2016 to April 2017 at the Cancer Institute Hospital, Tokyo, Japan, as described previously [12]. Briefly, we collected 8428 training images of esophageal lesions histologically confirmed to be ESCC or adenocarcinoma. The training esophageal cancer images included 397 ESCC lesions, comprising 332 superficial cancers and 65 advanced cancers. The training set included 6026 images obtained using WLI and 2402 images obtained using NBI endoscopy. Poor-quality images resulting from halation, blur, defocus, mucus, and poor air insufflation were excluded. Magnified images obtained by magnifying endoscopy were also excluded. All images of esophageal cancer lesions were manually marked by a well-experienced endoscopist. These images were used to develop a deep learning algorithm for an AI system diagnosing esophageal cancer.
To develop our AI-based diagnosing system, we used a deep neural network architecture referred to as a "Single Shot MultiBox Detector" (https://arxiv.org/abs/1512.02325) [12].

AI system to detect ESCC in videos
The AI diagnosing system processed video as 30 continuous still-image frames per second and detected ESCC in the same manner as in the analysis of still images. When the AI detected a candidate cancer, it reviewed the preceding 0.5 sec of video (15 frames). If the reviewed section included a cancer image in more than 3 frames, and the maximum interval from the latest cancer image was 0.1 sec (3 frames), the AI diagnosed the lesion as cancer and gave a discovery signal (Figure 1a). This setting was based on a small number of videos that were independent of the validation dataset and obtained in a preliminary examination. The AI diagnosing system inserted the image of the recognized cancer on the left side of the monitor (Figure 1b, c, d), indicating that it had diagnosed the lesion as cancerous. If the inserted image included any part of an ESCC, we considered it positive; if it did not include ESCC, we considered it a false-positive result. Endoscopists could thus easily verify whether the AI diagnosing system had correctly detected the cancer.
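The temporal confirmation rule above can be sketched in Python. This is an illustrative reconstruction, not the authors' code: the function and constant names are ours, we read "more than 3 frames" literally as at least 4 candidate frames, and we interpret the 0.1-sec condition as a maximum gap of 3 frames between consecutive candidate frames.

```python
# Illustrative sketch of the temporal confirmation rule described above
# (our reconstruction, not the authors' implementation). The video runs at
# 30 frames/sec; when a candidate cancer frame appears, the preceding
# 0.5 sec (15 frames) is reviewed before a discovery signal is raised.

REVIEW_FRAMES = 15  # 0.5 sec at 30 frames/sec
MIN_POSITIVE = 3    # must be exceeded, i.e., at least 4 candidate frames
MAX_GAP = 3         # 0.1 sec maximum interval between candidate frames

def confirm_cancer(window):
    """window: booleans for the last REVIEW_FRAMES frames
    (True = the frame-level detector flagged a candidate lesion)."""
    positives = [i for i, hit in enumerate(window[-REVIEW_FRAMES:]) if hit]
    if len(positives) <= MIN_POSITIVE:
        return False
    # every candidate frame must lie within MAX_GAP frames of the next one
    gaps = [b - a for a, b in zip(positives, positives[1:])]
    return all(g <= MAX_GAP for g in gaps)
```

Under this reading, a run of at least four candidate frames with no gap longer than three frames raises a discovery signal, while isolated single-frame detections are suppressed as transient noise.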

Validation EGD video set and AI diagnosis
The performance of the AI diagnosing system was evaluated using independent validation EGD videos.
We used a total of 144 EGD videos for the two validation sets. As a slow-speed video validation set, we prepared a dataset of 64 videos of 32 ESCC patients obtained using both WLI and NBI from August 2018 to August 2019 at the Cancer Institute Hospital. In these EGD videos, the ESCC was observed while the endoscope was moving slowly, and each whole lesion was observed for 5 to 15 sec. When the AI diagnosing system recognized a cancer, it indicated it with a bordering square and inserted the image on the left side of the monitor. Because all videos included cancer, we examined only sensitivity in this validation set.
As a high-speed video validation set, we prepared a dataset of 80 videos of WLI and NBI endoscopies performed for 40 patients from August 2018 to August 2019 at the Cancer Institute Hospital. The dataset included 20 patients with 22 superficial ESCC lesions and 20 patients without ESCC. In these EGD videos, the endoscope was inserted from the cervical esophagus to the esophagogastric junction (EGJ) at a speed of 2 cm/sec without stopping or focusing on specific lesions. These videos simulated the speed at which an endoscopist might pass by a lesion without noticing it during routine examination.
Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the AI diagnosing system for detecting ESCC from each EGD video were calculated as follows: sensitivity, the number of EGD videos in which the AI diagnosing system correctly diagnosed cancer divided by the total number of EGD videos with cancer; specificity, the number of EGD videos in which the AI diagnosing system determined that no cancerous lesion existed divided by the total number of videos without cancer; PPV, the number of EGD videos in which the AI diagnosing system accurately detected cancer divided by the total number of videos in which the AI diagnosing system detected cancer; and NPV, the number of EGD videos in which the AI diagnosing system accurately determined that no cancerous lesion existed divided by the total number of videos that the AI diagnosing system determined as not having a cancerous lesion. In the comprehensive analysis, when the AI diagnosing system detected ESCC in either the WLI or the NBI video, we defined this as a correct diagnosis.
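These per-video definitions correspond to the standard confusion-matrix measures; a minimal sketch (the function name is ours, and the example counts other than the 17/20 sensitivity are hypothetical):

```python
def video_metrics(tp, fp, fn, tn):
    """Per-video detection metrics as defined above.
    tp: cancer videos with a correct AI cancer signal
    fp: non-cancer videos with an AI cancer signal
    fn: cancer videos with no AI signal
    tn: non-cancer videos correctly left unsignaled"""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Example: 17 of 20 cancer patients detected, as in the comprehensive
# high-speed analysis (the fp/tn split here is hypothetical).
m = video_metrics(tp=17, fp=14, fn=3, tn=6)
print(m["sensitivity"])  # 0.85
```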

Comparison with endoscopists
We prepared two sets of validation videos for diagnosis by endoscopists. One set comprised the same high-speed videos that the AI diagnosed. The other set was identical but included real-time diagnostic assistance from the AI, which indicated cancers with a rectangular border without inserting the image on the left side of the monitor. With this video set, we examined the additive effect of the AI system on the diagnostic ability of the endoscopists. These validation video sets were diagnosed by 18 endoscopists, including 7 endoscopists board-certified by the Japan Gastroenterological Endoscopy Society and 11 non-certified endoscopists. Endoscopists watched the high-speed videos on a personal computer and pushed a button when they detected an ESCC (correct answer). The answer was considered incorrect when an endoscopist failed to push the button and did not recognize the ESCC in the video. Moreover, if an endoscopist noticed the lesion but could not confirm that it was ESCC while the lesion was on the monitor, the answer was considered incorrect. These rules were strictly adhered to in order to ensure the accuracy of this analysis. The endoscopists could push the button as many times as they detected ESCC. Each endoscopist diagnosed one set of videos chosen randomly. After a one-month washout, the endoscopists diagnosed the other validation video set. Between the two rounds of analysis, the endoscopists were not given feedback on their performance or the correct answers.

Statistical analysis
All continuous variables are expressed as median and range. Differences in AI sensitivity and specificity between WLI and NBI were compared using McNemar's test.
The sensitivities of the endoscopists with or without AI assistance were compared using the Mann-Whitney test with GraphPad Prism software (GraphPad Software, Inc., La Jolla, CA, USA). A p value of <0.05 was considered statistically significant.
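McNemar's test for the paired WLI-vs-NBI comparison depends only on the discordant pairs; for small counts, an exact binomial version can be written directly. This is a self-contained sketch of the test's logic (the study itself used standard statistical software, and the example counts are hypothetical):

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for paired binary outcomes.
    b: videos positive by one method only; c: positive by the other only.
    Under H0 the b + c discordant pairs split 50/50, so the p-value is
    twice the binomial tail probability, capped at 1."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Example: 10 discordant videos split 1 vs 9 yields p ≈ 0.021,
# below the 0.05 significance threshold used in this study.
p = mcnemar_exact(1, 9)
```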

Results

AI diagnosis in the slow-speed validation video set
The characteristics of the patients and lesions in the slow-speed validation video set are summarized in Table 1a. In this set, there were more men than women, the median age was 67.5 years, and half the lesions were located in the middle thoracic esophagus. These characteristics are typical for ESCC in the Japanese population [16]. The median tumor size was 17 mm, and most lesions were mucosal ESCC (T1a) (Table 1a). The sensitivity of the AI diagnosing system was 100% for both WLI (32/32) and NBI (32/32).

AI diagnosis in the high-speed validation video set
In the high-speed validation set, 90% of the patients were men and the median age was 70 years. The median tumor size was 17 mm, with 95% of lesions being T1a and 5% being T1b (Table 1b). The sensitivity of the AI diagnosing system was 85% (17/20) for the comprehensive diagnosis, whereas the sensitivities based on WLI (Supplementary Video S1) and NBI (Supplementary Video S2) were 75% and 55%, respectively (Figure 2). The specificity with NBI was significantly higher than that with WLI (80% vs 30%, p<0.01) (Table 2).
Causes of false-positive and false-negative results in the high-speed video set

The most frequent cause of false-positive results (Table 3) was a shadow in the esophageal lumen (Figure 3a), which accounted for 41% of all false-positives. Normal structures and benign lesions, such as the EGJ (Figure 3b), post-endoscopic resection scars (Figure 3c), and mucosal inflammation (Figure 3d), were also misdiagnosed as cancer.
With regard to false-negative results (Table 3), nearly half of the false-negative images were due to esophageal inflammation in the background mucosa (Figure 4a). Other common causes were anterior wall lesions (Figure 4b) and obscure ESCC lesions, particularly with WLI endoscopy (Figure 4c), which were sometimes difficult to diagnose even for expert endoscopists. The AI diagnosing system also missed a lesion measuring 5 mm in diameter (Figure 4d), the smallest lesion in the EGD videos, which endoscopists could detect as a submucosal tumor-like elevated lesion.

Outcomes of endoscopists in the high-speed validation video set
The median sensitivity of the endoscopists for the comprehensive cancer diagnosis was 45% (range, 25-70%), whereas the median sensitivities based on the WLI and NBI videos were 25% (range, 15-45%) and 35% (range, 15-60%), respectively (Figure 5). There was no difference in sensitivity between board-certified and non-certified endoscopists.
With real-time AI assistance indicating cancers with a rectangle, the sensitivities of the endoscopists were significantly improved relative to their sensitivities without AI assistance (p<0.05). Sensitivity improved in 13 of the 18 endoscopists, by a median of 10% (range, 5-25%).

Discussion
We evaluated computer-aided detection of ESCC from EGD videos using an AI-based CNN with deep learning. AI diagnosis of ESCC in videos has been reported recently in other studies [17,18]. In the videos of those studies, endoscopists carefully observed the ESCC lesions and diagnosed them using AI; the videos were similar to the slow-speed video sets in our study, and the results were good. However, it is impossible to observe the whole esophagus carefully for every patient in routine screening examinations, as this would require an extended amount of time. Because endoscopists must first detect lesions during routine screening examinations, we also tested high-speed video sets. Furthermore, we have shown that the endoscopists' diagnosis of cancers improved with AI assistance.
We first evaluated slow-speed videos and achieved a detection rate of 100% with both WLI and NBI. We then examined the AI's performance using the high-speed validation videos. The sensitivity of the AI diagnosing system in the high-speed videos was 85%, which was much higher than the 45% sensitivity of the endoscopists. Their sensitivity was significantly improved to 52.5% with AI assistance.
Videos are very different from still images. However, the AI processes 1 sec of video as a sequence of 30 still frames and detects ESCC in the same manner as still images of the same quality. In the slow-speed videos, the AI detection rate was 100%, which is consistent with previous reports [17]. Although this is an important step in using AI to diagnose cancer, these results are not sufficient to prove that AI is useful for detecting cancers that humans overlook. To address this in a clinical situation, we used a high-speed validation video set, in which the sensitivity of the AI diagnosing system was 85%. The difference between these two results can be explained by the smaller number of focused, clear images in the high-speed videos, which contained more unclear bridging images between clear images because the endoscope moved continuously without stopping or focusing on any lesion. It was difficult to detect ESCCs that the scope passed quickly in all consecutive frames of the high-speed videos. It was also more challenging to detect ESCCs during peristalsis, when the ESCC appeared bent or shrunken on the moving esophageal wall, because we trained the AI system only on well-extended esophageal walls. To address these weaknesses, training videos for an AI system should include plenty of bridging images to achieve higher robustness.
Although the sensitivity of NBI was low and the sensitivity of WLI was slightly higher, there was no significant difference between the two observation methods. The number of cases was not large in this study; however, the sensitivity of NBI was sufficiently high in the slow-speed validation set and in still-image evaluation in previous studies [12]. In addition, it was still higher than the reported PPV of endoscopists examining NBI endoscopies (45% for experienced endoscopists and 35% for less experienced endoscopists) [11]. In daily clinical practice, false-positive results in cancer screening are considered more acceptable than false-negative results. Adding magnifying endoscopy reportedly improves PPV [18][19]. However, we believe that the AI system without magnifying endoscopy presented here would be most useful for primary detection in clinics or hospitals without well-experienced endoscopists on staff, so we specifically aimed to develop a non-magnifying system in this study.
We also analyzed the causes of false-positive and false-negative results. False-positives were often caused by shadows in the esophageal lumen and the EGJ (Table 3), similar to our still-image analysis [12]. Nearly half of the false-negative results were due to inflammation of the background mucosa, which can also be difficult for endoscopists to differentiate. The second most common cause of false-negatives was anterior wall lesions, which can be difficult for endoscopists to detect on tangential views.
The sensitivity of the AI was better than that of 15 of the endoscopists using the same videos. The median sensitivity of diagnosis by the endoscopists was 45%, demonstrating the difficulty of obtaining a proper diagnosis. Moreover, this result suggests that the AI could identify 40% of the ESCCs that were missed by the endoscopists. We hypothesize that the low sensitivity of the endoscopists was due to the increased speed of the videos and the strict criteria for correct answers. However, the AI could diagnose ESCC in fast-moving situations that were difficult for endoscopists. Furthermore, we showed the improved diagnostic ability of the endoscopists with AI assistance.
Endoscopists usually move the endoscope quickly through the esophagus, as in the high-speed validation videos, until a suspected cancerous lesion is noticed. They confirm the presence of cancer by magnified observation or biopsy, although many point out cancers by non-magnifying observation. After detecting a lesion, endoscopists can diagnose it as ESCC through careful observation, as in the slow-speed video set and still images. AI may help clinicians detect these cancers in real time. After identifying a lesion, the endoscopist should stop to examine it more carefully, as in the slow-speed validation videos. The image of the ESCC that appears on the monitor lets the clinician know that the AI has identified the suspected lesion as cancerous. Diagnostic assistance using the AI system would be helpful in both slow-moving and fast-moving situations.

This study has several limitations. First, it was a single-center, retrospective study; however, we think the results are reliable because the ESCC diagnoses were objectively verified. Second, although we could have moved the endoscope at several speeds to imitate a screening endoscopy, we evaluated outcomes at only two speeds: the AI detected 100% of ESCCs in the slow-speed videos that imitated careful lesion observation and 85% in the high-speed videos that imitated endoscopy without careful lesion inspection. Third, we validated a limited number of ESCCs, but we believe that our previous analysis of still images compensates for the lack of variety of cancers in this study.

Conclusion
The AI-based diagnostic system demonstrated high diagnostic accuracy in detecting ESCC from EGD videos. Moreover, the detection rate of endoscopists was improved by the real-time assistance of the AI diagnosing system in high-speed videos. Next, we plan to demonstrate that the AI diagnosing system is helpful for detecting ESCCs in a clinical study. We hope that the AI-based diagnostic system presented here will improve ESCC detection and facilitate earlier diagnosis in daily clinical practice in the near future.

Declarations
Shiroma and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Human rights statement and informed consent: All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1964 and later versions. Informed consent or a substitute for it was obtained from all patients included in the study.

Table 2: Detailed results of the AI-based diagnosis for each case. PPV: positive predictive value, NPV: negative predictive value, AI: artificial intelligence, WLI: white light imaging, NBI: narrow-band imaging