Enhanced multi-class pathology lesion detection in gastric neoplasms using deep learning-based approach and validation

This study developed a new convolutional neural network (CNN) model to detect and classify gastric lesions as malignant, premalignant, or benign. We used 10,181 white-light endoscopy images from 2606 patients, split in an 8:1:1 ratio into training, validation, and test sets. Lesions were categorized as early gastric cancer (EGC), advanced gastric cancer (AGC), gastric dysplasia, benign gastric ulcer (BGU), benign polyp, and benign erosion. We assessed the lesion detection and classification model using six-class, cancer versus non-cancer, and neoplasm versus non-neoplasm categories, as well as T-stage estimation in cancer lesions (T1, T2-T4). The lesion detection rate was 95.22% (219/230 patients) on a per-patient basis: 100% for EGC, 97.22% for AGC, 96.49% for dysplasia, 75.00% for BGU, 97.22% for benign polyps, and 80.49% for benign erosion. The six-class category exhibited an accuracy of 73.43%, sensitivity of 80.90%, specificity of 83.32%, positive predictive value (PPV) of 73.68%, and negative predictive value (NPV) of 88.53%. The sensitivity and NPV were 78.62% and 88.57% for the cancer versus non-cancer category, and 83.26% and 89.80% for the neoplasm versus non-neoplasm category, respectively. The T-stage estimation model achieved an accuracy of 85.17%, sensitivity of 88.68%, specificity of 79.81%, PPV of 87.04%, and NPV of 82.18%. The novel CNN-based model effectively detected and classified malignant, premalignant, and benign gastric lesions and accurately estimated gastric cancer T stages.

Deep learning algorithms are widely used in various fields owing to their growing clinical relevance in the medical domain. These algorithms can assist clinicians in decision-making 22, enhance lesion detection, and alleviate the fatigue experienced by endoscopists 23-25.
In this study, we aimed to develop a novel algorithm for the detection and classification of gastric cancer and premalignant and benign gastric lesions that are commonly identified through upper gastrointestinal endoscopy, while predicting the depth of invasion of gastric cancer.

Dataset
We retrospectively gathered still-image white-light endoscopy images of pathologically confirmed gastric lesions from patients who underwent upper gastrointestinal endoscopy between January 1, 2018, and December 31, 2021, at Seoul National University Hospital (SNUH). These included cases of gastric cancer (early gastric cancer [EGC] and advanced gastric cancer [AGC]), gastric premalignant lesions (low-grade and high-grade dysplasia), benign gastric lesions (benign gastric ulcers [BGU], benign polyps [hyperplastic and fundic gland polyps], and benign erosions), and normal endoscopy cases (normal gastric mucosa with no visible lesions). The exclusion criteria were as follows: (1) inappropriate images (low resolution, blurring, artifacts, bubbles, shadowing, inadequate air inflation, etc.) and (2) images without pathology results (except for images of a normal stomach). The models for lesion detection and invasion depth classification were designed as shown in Fig. 1.
Table 1 shows the composition of image categories in the datasets used in this study. A total of 10,181 white-light images from 2606 participants were included, with an 8:1:1 ratio maintained for the training, validation, and test data to ensure that patient images did not overlap between the sets. All endoscopic procedures were performed and reviewed by experienced endoscopists, each with prior experience of more than 6000 cases. Gastric cancers and adenomas were treated with either endoscopic submucosal dissection or surgery, and the pathological results of the resected tumors were reviewed.
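A patient-level 8:1:1 split of this kind (no patient's images shared across sets) can be sketched as follows. This is a minimal illustration, not the study's actual code; the record structure (dictionaries with a `patient_id` key) and the fixed random seed are assumptions.

```python
import random

def patient_level_split(image_records, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split image records into train/val/test by patient ID so that
    no patient's images appear in more than one set."""
    patients = sorted({rec["patient_id"] for rec in image_records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n = len(patients)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train_ids = set(patients[:n_train])
    val_ids = set(patients[n_train:n_train + n_val])
    splits = {"train": [], "val": [], "test": []}
    for rec in image_records:
        if rec["patient_id"] in train_ids:
            splits["train"].append(rec)
        elif rec["patient_id"] in val_ids:
            splits["val"].append(rec)
        else:
            splits["test"].append(rec)
    return splits
```

Splitting by patient rather than by image is what prevents near-duplicate views of the same lesion from leaking between training and test sets.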
The lesions were classified by combining endoscopic findings with the pathology reports reviewed by the endoscopists (HSC and BKK). Endoscopic images were classified into six categories: EGC, AGC, gastric dysplasia, BGU, benign polyps, and benign erosions. Images were also classified according to their malignant potential: neoplasm versus benign and cancer versus non-cancer. Cancers included EGC and AGC, whereas neoplasms included both gastric cancer (EGC and AGC) and gastric dysplasia (low-grade dysplasia [LGD] or high-grade dysplasia [HGD]). For gastric cancers, the pathology results of the resected specimens were reviewed, and the depth of invasion was identified as mucosal cancer (T1a), submucosal invasion (T1b), proper muscle invasion (T2), subserosal invasion (T3), or serosal invasion or invasion of adjacent structures (T4). The training dataset for the model that classified the depth of invasion is presented in Table 2.

Characteristics of the included images
The data were categorized into six classes, as listed in Table 1. Specifically, 48.24% (4911/10,181) of the entire dataset fell into the "neoplasm" category (including EGC, AGC, HGD, and LGD), 23.38% (2380/10,181) were classified as "non-neoplasm" (which included BGU, benign polyps, and benign erosions), and 28.39% (2890/10,181) as "normal mucosa." Within the neoplasm category, dysplasia images comprised the highest proportion at 20.71% (2108/10,181), followed by AGC at 13.80% (1405/10,181) and EGC at 13.73% (1398/10,181). In the non-neoplasm category, benign polyp images constituted the largest portion at 8.99% (915/10,181), followed by erosions at 8.07% (822/10,181) and BGU at 6.32% (643/10,181). Normal mucosa images were not separately categorized during training; however, they were used as background images and negative examples in the test set. Endoscopic images were extracted from the picture archiving and communication system of SNUH in PNG format and were captured using Olympus Medical Systems endoscopes (GIF-H290) and video processing systems (EVIS LUCERA ELITE CV-290; Olympus, Tokyo, Japan). To anonymize the patient data, sections corresponding to patient information were cropped and removed from the original endoscopic images. Consequently, only the images corresponding to the field of view of the gastrointestinal endoscope were retained after preprocessing (minimum resolution, 371 × 322 pixels).

In medical datasets, achieving a natural balance can be challenging, resulting in imbalances in the number of data points across different lesions when the images are used for classification training. Various methods have been used to address this issue. In this study, we adopted data augmentation techniques (Fig. 2), including horizontal flip, HSV channel translation, affine augmentation, polar augmentation 26, mosaic augmentation 27, and copy-paste augmentation 28. Lesion images from patients undergoing upper endoscopy typically comprise approximately three to four images per patient, taken from various angles and distances; therefore, we addressed class imbalance by employing image stitching 29 in the validation set (Fig. 2). The authors assert that all procedures contributing to this work complied with the ethical standards of the relevant national and institutional committees on human experimentation and with the Declaration of Helsinki of 1975, as revised in 2008. The requirement for written consent was waived by the Institutional Review Board (IRB) of Seoul National University Hospital (No. 2108-030-1242; approved August 31, 2021).
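Two of the simpler augmentations above, horizontal flip and HSV channel translation, can be illustrated in pure Python. The study itself used the imgaug library; this standalone sketch operates on a nested list of RGB tuples with channels in [0, 1] and is only meant to show the underlying operations.

```python
import colorsys

def horizontal_flip(image):
    """Flip an image (list of rows of (r, g, b) tuples) left-to-right."""
    return [list(reversed(row)) for row in image]

def hsv_shift(image, dh=0.1, ds=0.0, dv=0.0):
    """Translate each pixel in HSV space: rotate hue by dh (wrapping at 1.0)
    and shift saturation/value by ds/dv, clamped to [0, 1]."""
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            h = (h + dh) % 1.0
            s = min(max(s + ds, 0.0), 1.0)
            v = min(max(v + dv, 0.0), 1.0)
            new_row.append(colorsys.hsv_to_rgb(h, s, v))
        out.append(new_row)
    return out
```

In practice such transforms are applied on-the-fly during training so that each epoch sees slightly different versions of the minority-class images.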

Model development and main outcome
All deep learning models were developed using the Python programming language (version 3.9.0) 30 and PyTorch 1.11.0 31. The imgaug library (version 0.4.0) 26 was used for data augmentation. We employed YOLOv7 32 to develop a multiclass detection model for the six classified lesions. To identify the optimal hyperparameter configuration for the best-performing model, we employed hyperparameter optimization with a genetic algorithm for YOLOv7 32 and present the results as an optimized parameter table (Table 3). The hardware setup used for training included two RTX 3090 Ti graphics processing units, a 12th-generation Intel® Core™ i9-12900K processor, and 32 GB of RAM.
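Genetic-algorithm hyperparameter search of this kind follows a mutate-evaluate-select loop: perturb the current best configuration, retrain or evaluate, and keep the mutant only if it scores better. The toy sketch below illustrates that loop; it is not the actual YOLOv7 implementation, and the search-space format, Gaussian mutation scheme, and parameter names are our own assumptions.

```python
import random

def evolve(search_space, fitness, generations=20, sigma=0.2, seed=0):
    """Toy genetic-algorithm hyperparameter search: start from the midpoint
    of each (lo, hi) range, mutate the best candidate each generation, and
    keep a mutant only if it improves the fitness score."""
    rng = random.Random(seed)
    best = {k: (lo + hi) / 2 for k, (lo, hi) in search_space.items()}
    best_fit = fitness(best)
    for _ in range(generations):
        cand = {}
        for k, (lo, hi) in search_space.items():
            # Multiplicative Gaussian mutation, clipped to the allowed range
            val = best[k] * (1.0 + rng.gauss(0.0, sigma))
            cand[k] = min(max(val, lo), hi)
        f = fitness(cand)
        if f > best_fit:
            best, best_fit = cand, f
    return best, best_fit
```

In the real setting, `fitness` is expensive (a full or shortened training run scored on validation mAP), which is why the loop is kept greedy and the number of generations small.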
We also developed a classification model to distinguish the depth of invasion in cancer images. Based on the T stage from the pathological reports of resected specimens from patients with gastric cancer, we developed a binary classification model for T-stage estimation.
Notably, we evaluated our model, especially on images showing discrepancies between the initial endoscopic impression and the actual T stage reported from the resected specimen.These included images from 13 patients who were initially thought to have EGC based on endoscopic findings but were upstaged to AGC after resection, and 75 patients who were initially thought to have AGC based on endoscopic findings but were downstaged to EGC after resection.
The primary outcome was the lesion detection rate of the detection model. Additional performance metrics included the following:
- Positive predictive value (PPV), defined as true positive / (true positive + false positive).
- Sensitivity, defined as true positive / (true positive + false negative).
- Specificity, defined as true negative / (true negative + false positive).
- Negative predictive value (NPV), defined as true negative / (true negative + false negative).
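These definitions follow directly from the cells of a 2×2 confusion matrix. The helper below is a simple sketch; the function name and dictionary interface are our own, not taken from the study's code.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute the diagnostic performance metrics used in this study
    from the four cells of a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

For example, `diagnostic_metrics(8, 2, 9, 1)` yields a PPV of 0.80, an NPV of 0.90, and an accuracy of 0.85.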

Comparative performance analysis with experts
To compare the performance with experts, we conducted an additional analysis using our AI model and four expert endoscopists. We collected an additional set of 104 anonymized endoscopic images, which were not part of our model's development dataset and were from a different period (2023-2024). These images were evenly distributed across the six diagnostic classes. Both the model and the four expert endoscopists independently reviewed these images without any prior knowledge of each other's assessments. Following their evaluations, we compiled and analyzed the results to compare the diagnostic performance of the AI model and the experts.

Ethics declarations
Approval of all ethical and experimental procedures and protocols was granted by the Institutional Review Board (IRB) of Seoul National University Hospital (IRB No. 2108-030-1242). Owing to the retrospective nature of the study, the IRB waived the requirement for informed consent.

Test performance of the computer-aided detection (CADe) model
A schematic of the established lesion detection system is depicted in Fig. 3.

Test performance of the T-stage classification model
In this study, we developed an algorithm to classify cancer images and their depth of invasion into T stages. For this purpose, we employed the EfficientNet-B3 model. When the T stage was classified as T1 versus T2-T4, the model achieved an accuracy of 85.17%, sensitivity of 88.68%, specificity of 79.81%, PPV of 87.04%, and NPV of 82.18% (Table 4). Notably, among 75 patients with pathologically proven AGC for whom the endoscopist's initial impression was EGC, our model accurately predicted the T stage in 65 patients. In addition, among 13 patients initially diagnosed by the endoscopist as having AGC whose actual T stage was EGC, the model accurately predicted the T stage in nine cases. The detailed training parameters for the lesion detection model were as follows: batch size, 48; epochs, 100.

Comparative performance analysis with experts
In our expanded analysis involving four expert endoscopists and our CNN model, we found that the model performed robustly across a diverse dataset of six lesion types. We analyzed the model's performance across lesion types, including its ability to distinguish between cancer and non-cancer, neoplasm and non-neoplasm, and each of the six lesion types (Supplementary Tables 1-3). In the classification of cancer versus non-cancer, which is the most important issue in endoscopic examination, the NPV and sensitivity of the AI were 88.89% and 98.51%, respectively, superior to those of the experts (78.38% and 88.06%, respectively) (Supplementary Table 1). We further identified every case in which our model correctly classified the lesion type while the expert endoscopists did not (Supplementary Table 4), with representative images (Fig. 4). Our model accurately recognized dysplasia in cases where some experts categorized the same lesions as benign erosion (Fig. 4A, B) or EGC (Fig. 4C). Furthermore, our model successfully identified benign polyps in cases where they were misclassified as dysplasia (Fig. 4D, E) or AGC (Fig. 4F) by some experts. This is of particular importance, as such distinctions can have significant implications for subsequent clinical management, including treatment decisions and follow-up endoscopy scheduling.

Discussion
In this study, we developed an automated system for detecting and classifying malignant, premalignant, and benign gastric lesions. Previous studies 25 aimed to classify lesions using a separate artificial intelligence model applied to images detected by an anomaly detection algorithm. However, this approach requires an additional step to indicate the need for histological examination, even after lesion detection, deviating from the primary goal of reducing workload and fatigue 23-25. To overcome this limitation, we developed a multiclass detection algorithm with real-time processing to eliminate the need for dual processing. Although detection algorithms with high sensitivity can identify subtle lesions, there is a risk of overprediction. Misclassifying non-lesions as lesions can inundate clinicians with false alarms, possibly overshadowing true lesions and misguiding lesion classification 9,10. This can compound clinician fatigue, a challenge that CADe aims to address. In addition, a major concern during endoscopic examinations is the possibility of missed lesions, such as cancers. Considering these challenges, our algorithm was primarily tailored to yield a high NPV for cancerous and neoplastic lesions 11. From the data encompassing 2606 patients, we achieved an NPV of 88.53% and a sensitivity of 80.90% for the six-class classification. Notably, cancerous lesions had an NPV of 88.57% and a sensitivity of 78.62%, and neoplastic lesions had an NPV of 89.80% and a sensitivity of 83.26%. Overall, the detection rate was 95.22% (219/230) among the patients evaluated. The decision to perform a biopsy is often based on an assessment of the malignant and premalignant potential of a lesion, a nuanced judgment built on extensive endoscopic experience. Our methodology can substantially aid in refining biopsy decisions and in classifying lesions as cancerous, neoplastic, or benign.
In addition, when estimating the depth of invasion for gastric cancer, discrepancies between endoscopist impressions and the actual pathological T stage are often observed. Our model achieved high performance in predicting the T stage, even in cases that showed discrepancies between the endoscopic and pathological T stages. Because treatment options for gastric cancer, such as endoscopic or surgical resection, often rely on the estimated T stage, this model could aid in making accurate decisions regarding optimal treatment.
However, this study has a few limitations. The data used in this study were exclusively obtained from a single institution. Therefore, the algorithms employed in this study require external validation. Moreover, owing to the nature of Seoul National University Hospital, which is a tertiary hospital, there was a higher proportion of advanced cancer cases than of EGC or BGU cases, resulting in somewhat lower evaluation metrics, especially in the BGU and EGC classes. Additionally, data imbalance raises the possibility of bias towards cancerous lesions. Hence, securing multi-institutional data is recommended for external validation in future studies. Furthermore, because we could not compare the performance of the developed algorithm with that of an endoscopist using the same images, it is impossible to determine the true extent of the algorithm's enhancement of the lesion detection rate or its potential impact.
Compared with previous studies 12,16,22,23, the strength of this study is its comprehensive inclusion of various lesions. Not only did we account for diverse cancers and premalignant lesions, but we also incorporated common benign lesions that show diverse endoscopic appearances, including BGUs, benign polyps, and benign erosions. This wide-ranging analysis ensured a thorough and representative evaluation, thus enhancing the applicability and robustness of the findings.
In summary, our model, which was tailored for detecting and classifying gastric cancers, dysplasia, and various benign lesions, demonstrated an outstanding performance and has the potential to assist clinicians in decision-making during endoscopic procedures.

Figure 1 .
Figure 1. Schematic diagram of the automated multi-class lesion detection and T-stage classification model.

Figure 2 .
Figure 2. Application examples of image augmentation: augmentation methods using the imgaug library, with affine transformations on the left and polar augmentation on the right of the original image. Image stitching with homography aligns multiple multi-angle lesion images for augmentation.

Figure 4 .
Figure 4. Representative cases in which the AI correctly classified the lesion while experts did not. (A, B) Cases in which the AI accurately recognized dysplasia while experts categorized the same lesions as benign erosion. (C) A case in which the AI accurately recognized dysplasia while experts categorized the same lesion as EGC. (D, E) Cases in which the AI accurately recognized benign polyps that some experts misclassified as dysplasia. (F) A case in which the AI accurately recognized a benign polyp that some experts misclassified as AGC.

Table 1 .
Data distribution of the training, validation, and test sets for the detection model of gastric lesions.

Table 2 .
Data distribution of the training, validation, and test sets for the model classifying the depth of invasion.

Table 3 .
Hyperparameters of the detection model after optimization with the genetic algorithm.

Table 4 .
Diagnostic performance of the model in classifying lesions on endoscopic images.