Bleeding risk factors for gastroesophageal varices (GEV) detected by endoscopy in cirrhotic patients determine the prophylactic treatment patients will undergo in the following 2 years. We propose a methodology for measuring these risk factors. We create an artificial intelligence system (ENDOANGEL-GEV) containing six models to segment GEV and to classify the grades (grades 1–3) and red color signs (RC, RC0–RC3) of varices. It also summarizes, in real time, how these results change with region. ENDOANGEL-GEV is trained using 6034 images from 1156 cirrhotic patients across three hospitals (dataset 1) and validated on multicenter datasets with 11009 images from 141 videos (dataset 2) and in a prospective study recruiting 161 cirrhotic patients from Renmin Hospital of Wuhan University (dataset 3). In dataset 1, ENDOANGEL-GEV achieves intersection over union values of 0.8087 for segmenting esophageal varices and 0.8141 for gastric varices. In dataset 2, the system maintains fair accuracy across images from three hospitals. In dataset 3, ENDOANGEL-GEV surpasses attending endoscopists in detecting RC of GEV and classifying grades (p < 0.001). When ranking the risk of patients combined with the Child‒Pugh score, ENDOANGEL-GEV outperforms endoscopists for esophageal varices (p < 0.001) and shows comparable performance for gastric varices (p = 0.152). Compared with endoscopists, ENDOANGEL-GEV may help 12.31% (16/130) more patients receive the right intervention. We establish an interpretable system for the endoscopic diagnosis and risk stratification of GEV. It will assist in accurately detecting the risk factors for first bleeding and in expanding the scope of quantitative measurement of diseases.
Decompensated cirrhosis is defined in terms of development of ascites, variceal hemorrhage, or hepatic encephalopathy1. Gastroesophageal varices (GEV) are severe complications of cirrhosis present in 85% of decompensated cirrhotic patients, and consequent variceal hemorrhage is life-threatening2,3. Although noninvasive methods have been adopted to exclude patients who are unlikely to develop GEV, endoscopy is the gold standard for diagnosing GEV and predicting the risk of hemorrhage within 2 years4. Patients diagnosed with cirrhosis should undergo endoscopy to detect varices and rank the risk of variceal bleeding. Endoscopic risk rank determines different treatment recommendations for primary prophylaxis in the following 1–2 years5. Cirrhotic patients with grade 1 varices and red color signs (RC)/Child‒Pugh C or grade 2–3 varices should receive prophylactic treatments according to guidelines6,7. Several studies also confirmed the significance of endoscopic risk factors for predicting variceal hemorrhage, one of the manifestations of decompensated cirrhosis8,9.
However, the endoscopic description of risk factors is operator dependent. Consistency among endoscopists on the grade, RC, and size of GEV is low10,11. Incorrect diagnoses by endoscopists compromise patient safety and raise medical costs. Nonselective beta-blockers or variceal band ligation should be primary prophylaxis for high-risk patients. If patients with high-risk varices are missed, the rupture rate of varices is ~15% per year, and the mortality is up to 25% within 6 weeks8,12. Low-risk patients should be screened every 2 years. If low-risk patients are misdiagnosed as high-risk patients, they might experience the side effects of prophylaxis, including postoperative bleeding and bradycardia, without benefiting from it. What is worse, similar research indicated that subjectivity is inherent in humans, and training may improve it only marginally13. A clinical method that provides a quantitative and accurate assessment of endoscopic risk factors is urgently needed.
With the significant advances of artificial intelligence (AI) in endoscopy, AI has made up for the shortcomings of endoscopists and normalized the diagnoses made by endoscopists13,14. AI has been successfully used to help endoscopists detect colorectal adenomas during colonoscopy and to reduce blind spots during esophagogastroduodenoscopy. Several studies used computed tomography to diagnose high bleeding risk esophageal varices (EV)15,16,17,18. However, these studies were limited by small sample sizes, retrospective designs, and low areas under the curve. More accurate methods and larger multicenter validations are still needed. Deep convolutional neural networks (DCNN) were also used in endoscopy to detect EV and gastric varices (GV)10,19. These systems conclude whether varices appear in an image or video, but do not indicate where the lesion is. A growing number of people believe that the “black box” nature of AI undermines its reliability20. Although algorithms perform excellently in a broad spectrum of diseases including varices, why they make such decisions is difficult to interpret21. Interpretable AI is attracting much interest in medicine, but most of the attempts so far, such as Shapley values, were for developers, not end-users. The Shapley value of features on the image is calculated by retraining models after removing the features, but its long computing time does not allow Shapley values to be displayed to users in time22,23. To address these limitations, we developed an AI system that is explainable for both developers and end-users to delineate GEV.
In this study, we provide visualized, objective, and quantitative deep learning measurements to predict the risk factors for GEV hemorrhage. The system is designed to identify patients with high bleeding risk by segmenting the varices and the RC on the varices and then classifying the grade (size) of varices and the density and distribution of RC. The changes and accumulated percentages of grade (size) and RC during esophagogastroduodenoscopy are calculated to visualize the prediction of the system. The system shows robust performance in the observational study. The system will increase the effectiveness of interventions tailored to the risk of hemorrhage, improve health outcomes of cirrhotic patients, and reduce spending on healthcare.
The system consists of main models for GEV diagnosis and risk factor detection, and supportive models for removing unqualified images. The main models are an EV segmentation model (model 1), an RC segmentation model (for both EV and GV, model 2), an RC classification model (for EV, model 3), a grade classification model (for EV, model 4), a GV segmentation model (model 5), and a size classification model for GV (model 6).
From July 1st, 2020, to April 30th, 2021, 174 cirrhotic patients undergoing endoscopic screening for varices were eligible for inclusion in the study. Thirteen patients were excluded because of malignancy (n = 1), refusal to participate in this study (n = 8), or incomplete endoscopy (n = 4). Therefore, 161 patients were analyzed in this research (117 men, 44 women; mean age 57.41 years, range 32–79 years). The demographic data are summarized in Table 1. The flowchart of the dataset preparation is shown in Fig. 1.
The performance of ENDOANGEL-GEV on dataset 1 (testing dataset)
The system’s performance is summarized in Table 2 and Supplementary Tables 2–3. Representative images of the system are shown in Fig. 2. Model 1 (EV segmentation model) detected EV with a sensitivity of 93.49% (95% confidence interval (CI), 92.04–94.74%) per varix (1351 varices). Model 1 delineated the outlines of EV with a mean intersection over union (mIoU) of 0.8087 (95% CI, 0.7968–0.8206). Model 2 (RC segmentation model) achieved an accuracy of 97.80% (95% CI, 96.10–98.90%) for detecting RC of EV. Model 3 (RC classification model) reached an accuracy of 94.40% (95% CI, 92.01–96.25%) for the classification of RC, with accuracies of 89.24% (95% CI, 80.68–94.44%), 91.67% (95% CI, 83.04–96.30%), and 95.83% (95% CI, 89.07–98.65%) for RC1, RC2, and RC3, respectively. Model 4 (grade classification model) correctly classified 93.00% (95% CI, 90.40–95.08%) of images for the grade of EV. Model 4 reached accuracies of 90.00% (95% CI, 83.05–94.68%), 93.19% (95% CI, 89.87–95.68%), and 98.27% (95% CI, 90.76–99.96%) for classifying grades 1, 2, and 3, respectively.
Model 5 (GV segmentation model) achieved a sensitivity of 95.93% (95% CI, 94.33–97.18%) per varix (811 varices) for detecting GV and delineated GV with a mIoU of 0.8141 (95% CI, 0.8087–0.8195). Model 2 (RC segmentation model) achieved an accuracy of 90.73% (95% CI, 87.99–93.02%), a sensitivity of 90.03% (95% CI, 86.09–92.98%), and a specificity of 91.70% (95% CI, 87.15–94.79%) for detecting RC of GV.
The performance of ENDOANGEL-GEV on dataset 2 (validation dataset)
Model 1 (EV segmentation model) achieved a sensitivity of 90.79% (95% CI, 89.55–91.91%) for detecting EV (2402 varices). It delineated EV with a mIoU of 0.8890 (95% CI, 0.8811–0.8970). Model 2 (RC segmentation model) achieved a sensitivity of 99.79% (95% CI, 99.65–100.00%) and a specificity of 92.54% (95% CI, 91.23–93.66%) for predicting RC of EV. Model 3 (RC classification model) achieved an accuracy of 93.43% (95% CI, 92.51–94.24%) for the classification of RC. Model 4 (grade classification model) classified EV grades 1–3 with accuracies of 94.84% (95% CI, 91.62–96.04%), 93.67% (95% CI, 90.57–94.75%), and 93.88% (95% CI, 89.74–96.82%), respectively.
Model 5 (GV segmentation model) diagnosed GV with a sensitivity of 88.34% (95% CI, 87.76–88.90%) per varix (12383 varices). Model 5 delineated the outlines of GV with a mIoU of 0.8551 (95% CI, 0.8524–0.9077). Model 2 (RC segmentation model) achieved a sensitivity of 91.10% (95% CI, 89.85–92.20%) and a specificity of 91.58% (95% CI, 90.80–92.30%) for detecting RC of GV.
Comparison between ENDOANGEL-GEV and endoscopists on dataset 3 (prospective study)
The videos in dataset 3 were processed according to Fig. 3. Representative original images and qualified images filtered by supportive models are shown in Supplementary Figs. 1 and 2. Figure 4 presents the diagnostic yields of ENDOANGEL-GEV and endoscopists on dataset 3. The sensitivity of ENDOANGEL-GEV for detecting EV was comparable to that of endoscopists (100.00%, 95% CI 96.44–100.00% vs. 99.23%, 95% CI 95.19–99.96%, p = 1.000). The accuracy of ENDOANGEL-GEV for classifying RC was significantly higher than that of endoscopists (94.62%, 95% CI 89.11–97.56% vs. 66.92%, 95% CI 58.44–74.44%, p < 0.001). ENDOANGEL-GEV also classified grade more accurately than endoscopists (94.57%, 95% CI 89.14–99.90% vs. 75.97%, 95% CI 67.66–83.05%, p < 0.001).
ENDOANGEL-GEV showed comparable performance with endoscopists in detecting GV (97.52%, 95% CI 93.57–99.25% vs. 98.76%, 95% CI 95.30–99.95%, p = 0.625). The accuracy of ENDOANGEL-GEV in classifying the RC of GV was significantly higher than that of endoscopists (94.92%, 95% CI 89.26–98.11% vs. 69.49%, 95% CI 60.34–7.63%, p < 0.001).
Regarding the risk stratification of EV using endoscopic findings and Child-Pugh score, ENDOANGEL-GEV significantly outperformed endoscopists (97.69%, 95% CI, 93.14–99.51% vs. 85.38%, 95% CI, 78.22–90.52%, p < 0.001) (Table 3). ENDOANGEL-GEV and endoscopists showed similar metrics for ranking the risk of GV (95.76%, 95% CI, 90.21–98.43% vs. 85.38%, 95% CI, 78.22–90.52%, p = 0.152) (Table 3). More results are shown in Supplementary Tables 4–13 and Supplementary video 1.
The median follow-up was 12.12 months. Six patients experienced rebleeding (3 EV and 3 GV), and none of the patients died during follow-up (Supplementary Table 14). The results of the satisfaction questionnaire are shown in Supplementary Fig. 4.
We developed an interpretable, quantitative, expert-level system for detecting the risk factors of first bleeding of GEV in cirrhosis. Our system yielded high predictive accuracy, detailed assessment, and interpretable results, providing a means for further endoscopic exploration of cirrhosis. ENDOANGEL-GEV outperformed the expert panel in ranking the risk of bleeding in the real world. Moreover, the high consistency between cohorts with high variance in endoscopy brands and image quality indicates a substantial degree of generalizability. These achievements will facilitate the application of explainable AI in medical training and expand the scope of quantification in medicine.
All cirrhotic patients suspected to have GEV will undergo endoscopy to assess the risk of rupture within 2 years. Misdiagnosis will expose high-risk patients to the risk of bleeding, and low-risk patients will suffer the side effects of unnecessary treatment. Noninvasive tests accurately identify patients without varices but are not recommended to diagnose GEV24. Endoscopy is still the gold standard for diagnosing GEV. The presence of varices indicates a higher portal pressure level, but the endoscopic findings are not correlated with a specific portal pressure level. Higher portal pressure, large varices, and RC are positively correlated with variceal bleeding, while there is no close correlation between changes in portal pressure and changes in the endoscopic findings of varices25,26. Therefore, the endoscopic findings of varices are relatively independent indicators of variceal bleeding. Classifying the endoscopic findings of GEV is essential to predict the risk of variceal bleeding.
Quantification is a key point in the endoscopic assessment of GEV: the size of varices and the density and distribution of RC are divided into three groups for further measurement. Quantifying lesion features is also challenging for doctors. For example, endoscopists could not classify the bowel preparation into four groups well enough27. The thresholds between groups are unclear and qualitative, which leads to high inconsistency between endoscopists28. These refined tasks are also challenging for DCNNs, which perform classification based on the whole image. In our previous study, the DCNN model achieved an accuracy of 63.44% (95% CI, 58.30–68.30%) for ranking the grade of varices10. Therefore, we introduced fully convolutional networks (FCNs) to help endoscopists perform more delicate tasks such as quantifying complex features and detecting small targets29. Rather than using FCNs only for semantic segmentation, this study extended their application to disease classification, presenting the results directly on endoscopic images. We linked an FCN to a DCNN and achieved a significantly higher accuracy of 93.00% (95% CI, 90.40–95.08%) for classifying grade. For RC, we adopted a slightly different method, linking the FCN to DBSCAN to classify RC density and distribution and generating intuitive density and distribution maps for endoscopists. Erosion (25/31, 80.64%), ulcers (3/31, 9.67%), and other mucosal injuries (3/31, 9.67%) on the esophagus and stomach are usually mistaken for RC by endoscopists (Supplementary Fig. 4). In comparison, ENDOANGEL-GEV could detect real RCs, which appear as dark red spots under the mucosa. Erosion is also the leading cause (3/4, 75%) of false positives of ENDOANGEL-GEV.
Explainability has long been a concern for AI in medicine. The Federal Trade Commission, reporting on the use of AI and algorithms, stated that models should explain their decisions to the consumer. If models are used to assign risk scores to consumers, they should disclose and rank the factors that affected the results30. An article published in Nature Medicine also pointed out, “AI in medicine must be explainable”31. According to our interview and related articles, end-users also need an explainable interface to build trust32. Compared with our previously published study, this system will help end-users to understand how it reaches its conclusions33. DCNN models conceal the features supporting their predictions, preventing people from exploring or optimizing them. Compared with DCNN models, ENDOANGEL-GEV estimates every pixel on the image, providing an intuitive prediction of the diseased region. Interpretable and direct presentation exposes the model’s logic, directly paving the way for fixing mistakes, explaining to end-users, and training. The questionnaire results also indicated that explainable AI systems are more likely to be accepted by endoscopists (Supplementary Fig. 4).
Articles analyzing the risk factors for first variceal bleeding vary considerably in the level of detail for measuring endoscopic findings7,34. One of the reasons may be that it is difficult to unify the criteria for classifying the findings. GV are described in less detail than EV because the prevalence of GV is lower than that of EV, and GV bleeding is less correlated with portal vein pressure than EV35,36. In our study, endoscopists showed a higher false-positive rate in both RC and grade, while ENDOANGEL-GEV maintained high specificity and remained fairly sensitive. ENDOANGEL-GEV outperformed endoscopists in classifying high-risk patients and low-risk patients. The system will help more patients receive prophylactic therapy and reduce health care waste by freeing 52.63% (10/19) more low-risk patients from unnecessary treatment. In summary, ENDOANGEL-GEV will resolve the inconsistencies and assess infrequent features accurately, contributing to a more detailed clinical analysis.
Lesions in the digestive tract can be divided into solitary lesions (polyps, cancers, etc.) and diffuse lesions (inflammatory bowel disease, gastritis, etc.). Previous research mainly provided algorithms more suitable for solitary lesions, such as detecting scattered lesions or diagnosing lesions37. However, it is equally important to describe how the lesions change with location and summarize the features of lesions, which is another quantitative problem. For endoscopists who detect lesions while operating, this distraction will undermine their analysis. Therefore, we quantified the change in varices with time and location, relieving endoscopists of the burden of summarizing the features of long varices.
Reporting guidelines for clinical trials involving AI suggest describing how the input data were acquired and selected for the AI intervention. We trained a supportive system to standardize the input images. The system automatically removes poor-quality images, ensuring that the input data are standardized across different trial sites38.
The limitations of the current study must be acknowledged. First, a total of 9.31% of patients in this article were admitted with advanced liver failure (Child‒Pugh C), because this study was conducted in a tertiary hospital, where patients with severe cirrhosis are more common. Second, this was a single-arm study rather than a randomized trial, although the performance of ENDOANGEL-GEV was compared with that of endoscopists. Third, this system did not classify GV according to their location, because the endoscopic location is not the gold standard for determining the supplying and drainage vessels of GV. Instead, we performed contrast-enhanced computed tomography before treatment.
In conclusion, the present study provided an accurate and interpretable deep learning-based system for the diagnosis and risk stratification of GEV, and our system was validated in a prospective study. The system will increase the effectiveness of interventions tailored to the risk of hemorrhage, thus improving health outcomes and reducing spending on healthcare. This explainable, expert-level system will expand the application scope of AI in quantitative measurement in medicine.
Datasets and preprocessing
The flowchart of the dataset preparation is shown in Fig. 1. Endoscopic images of GEV used for training, validation, and testing (dataset 1) were collected from Renmin Hospital of Wuhan University, Jingzhou Second People’s Hospital, and Wuhan No. 1 Hospital from January 2nd, 2015, to April 30th, 2019. A doctoral student excluded images with inferior quality (blurs, repetition, or poor preparation). A total of 6034 images from 1156 GEV patients were used to train the models for EV segmentation (model 1), RC segmentation (for both EV and GV, model 2), RC and grade classification (for EV, model 3 and model 4), and GV segmentation (model 5). The size classification model for GV (model 6) has been published10. If models segment suspicious varices (or RC) area on an image, the image is classified as varices (RC) positive. If models don’t identify suspicious varices (or RC), the image is classified as varices (or RC) negative. Images from one individual were not split into different datasets.
All images were captured by Olympus (Medical Systems, Tokyo, Japan; GIF-H260Z, CF-HQ290) and Fujifilm systems (Kanagawa, Japan; EC-590WM, EC-600WM). The distribution of images is shown in Supplementary Table 1.
To develop the models, three experts who had more than 10 years of GEV experience (both endoscopists and hepatologists) reviewed all images and classified the images as follows:
(1) EV/normal esophagus.
(2) RC are graded as 0, 1, 2, or 3 according to their density and distribution: (a) RC0 = absent; (b) RC1 = small in number and localized; (c) RC2 = intermediate between RC1 and RC3; and (d) RC3 = large in number and circumferential.
(3) (a) Grade 1 lesions are straight, small-caliber varices. Small venous dilatations that disappear upon insufflation of the esophagus are not included in this subgroup. (b) Grade 2 lesions are moderately enlarged, beady varices. (c) Grade 3 lesions are markedly enlarged, nodular or tumor shaped varices.
(1) GV/normal stomach;
(2) RC (0)/RC (1).
(a) RC0 = absent; and (b) RC1 = GV with RC.
(3) Size big (diameter ≥5 mm)/size small (diameter < 5 mm).
All of the above items were classified according to general rules for recording endoscopic findings of GEV28. The gold standard for each image was the result agreed upon by two or more experts. Images on which the experts did not reach a consensus in the first classification were discussed and then assigned to a final category. Then, two experts delineated the GEV margins and RC on the images.
Fully convolutional networks (UNet++) were used to train models 1, 2, and 5 for EV, GV and RC segmentation29. Original images were input into the framework regardless of resolution, and the UNet++ models were trained in Keras with the experts’ labeled maps as the output. Keras is a neural network application programming interface for Python. There was no overlap among the training, validation and test datasets. Cut-off values were chosen to segment the regions of EV, GV, and RC according to the results of the validation datasets. In the later part of the article, the training and validation dataset of dataset 1 is referred to as dataset 1 (training and validation dataset), and the testing dataset of dataset 1 is referred to as dataset 1 (testing dataset).
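The cut-off step, together with the image-level rule stated under Datasets and preprocessing (an image is positive if any suspicious area is segmented), can be illustrated with a minimal sketch. The function name `segment_and_flag`, the toy probability map, and the cut-off of 0.7 are hypothetical; the cut-off values actually chosen on the validation datasets are not reported here.

```python
def segment_and_flag(prob_map, cutoff=0.5):
    """Binarize a per-pixel probability map and flag the image.

    prob_map: 2D list of probabilities in [0, 1], one per pixel
              (a stand-in for a segmentation network's output).
    cutoff:   hypothetical threshold; the study chose its cut-offs
              on the validation datasets.
    Returns (mask, positive): mask is a 2D list of 0/1 values, and
    positive is True if at least one pixel is segmented.
    """
    mask = [[1 if p >= cutoff else 0 for p in row] for row in prob_map]
    positive = any(any(row) for row in mask)
    return mask, positive


# Toy 3 x 3 probability map (illustrative values only).
prob_map = [
    [0.02, 0.10, 0.08],
    [0.05, 0.91, 0.77],
    [0.01, 0.64, 0.12],
]
mask, positive = segment_and_flag(prob_map, cutoff=0.7)
```

With this toy input, two pixels exceed the cut-off, so the image would be treated as varices (or RC) positive.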
As guidelines suggest, RC are graded according to their density and distribution28. Therefore, RC could be regarded as a group of points, the number and distribution of which were graded. Density-based spatial clustering of applications with noise (DBSCAN) was used to classify the rank of RC (model 3)39. DBSCAN is parameterized by ε (the neighborhood radius) and the minimum number of points required to form a dense region (minPts)39. Based on ε and minPts, DBSCAN classifies the points into core points, reachable points, and outliers. Core points reach n (n ≥ minPts) points within the distance ε. Reachable points can reach core points through a chain of points that are directly reachable from one another. If a reachable point cannot reach more than minPts points, it is the cluster’s edge.
All results were compared with the gold standards, retaining the best model with minPts as 1 and ε as math.sqrt(w*h/6.5).
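To make the clustering step concrete, the following is a minimal pure-Python sketch of DBSCAN run with the retained parameters (minPts = 1, ε = math.sqrt(w*h/6.5)). It is not the study's implementation. We assume here that w and h are the image width and height in pixels and that each RC region is reduced to one point (for example, its centroid); the image size and coordinates below are hypothetical. How clusters are mapped to RC1–RC3 is not described in this section, so that step is omitted.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        xi, yi = points[i]
        # The neighborhood includes the point itself.
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become an edge point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical image size and RC centroids (pixels).
w, h = 640, 480
eps = math.sqrt(w * h / 6.5)  # about 217 pixels for this image size
rc_points = [(100, 100), (150, 120), (200, 90), (600, 450)]
labels = dbscan(rc_points, eps=eps, min_pts=1)
```

With minPts = 1 every point is a core point, so no point is labeled as noise and the clusters are simply groups of points linked by gaps of at most ε. In the toy input, the first three points form one cluster and the distant fourth point forms its own.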
Model 4 and model 6 were deep learning convolutional neural networks trained based on Fast.ai to classify the results of model 1 and model 5.
Supportive model 1 removed the unqualified images, including images with blurring, digital chromo, biopsy forceps, and flushing water. 38,422 endoscopic images were classified into 15,084 qualified images and 23,338 unqualified images (blurry, digital chromo, biopsy forceps, duplicate, flushing water) by doctoral students to develop supportive model 1. A deep convolutional neural network was trained based on ResNet 50.
Supportive model 2 was used to classify the esophagogastroduodenoscopy images into 26 sites and retain images in the esophagus, squamocolumnar junction, fundus lesser curvature, fundus anterior wall, fundus greater curvature, and fundus posterior wall14,40.
Supportive model 3 removed images with inadequate inflation, whether the section was inadequately inflated or the inadequate inflation was caused by breathing. It was trained using 3813 inadequate inflation images and 6392 adequate inflation images based on ResNet 50.
Hardware parameters: All models were trained on the Windows 10 Professional operating system. The CPUs are Intel® Core™ processors (two clock speeds, one of them 3.19 GHz). The GPU is an NVIDIA GeForce RTX 2080 (memory size: 8 GB, memory bandwidth: 256 bits, frequency: 7000 MHz).
Software Environment: Programming language is Python 3.6.5. Deep Learning Frameworks are TensorFlow 1.12.2 and Keras 2.2.5.
Python packages: OpenCV-python 184.108.40.206, NumPy 1.19.5, and Pandas 1.1.5.
Validation dataset (dataset 2)
To validate the ability of the system to diagnose and classify risk factors in real-time, ENDOANGEL-GEV was tested using sequential images clipped from 141 esophagogastroduodenoscopy videos (25 frames per second) from 3 independent cohorts (Wuhan Puren Hospital, Central Hospital of Enshi Tujia and Miao Autonomous Prefecture and Renmin Hospital of Wuhan University). Experts affiliated with the hospitals established the gold standard. Smoothing was used by taking the results of three or more images out of five consecutive qualified images as the prediction result.
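The smoothing rule can be sketched as a majority vote over the last five qualified frames. This is a minimal illustration under two assumptions that the text does not settle: the window slides one frame at a time, and no result is reported (`None`) when the window is not yet full or no label reaches three votes. The function name and the frame labels are hypothetical.

```python
from collections import Counter, deque


def smooth_predictions(frame_labels, window=5, min_votes=3):
    """Report a label only when it wins >= min_votes of the last `window` frames."""
    recent = deque(maxlen=window)  # keeps only the most recent frames
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        if len(recent) < window:
            smoothed.append(None)  # not enough frames yet
            continue
        top_label, votes = Counter(recent).most_common(1)[0]
        smoothed.append(top_label if votes >= min_votes else None)
    return smoothed


# Hypothetical per-frame RC predictions for consecutive qualified images.
frames = ["RC0", "RC1", "RC1", "RC0", "RC1", "RC2", "RC2", "RC2"]
smoothed = smooth_predictions(frames)
```

Isolated single-frame flips, such as the stray "RC0" in the fourth frame, do not change the reported result, which is the purpose of the smoothing.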
Prospective study (dataset 3)
The system was installed on computers in the endoscopy unit of Renmin Hospital of Wuhan University, and endoscopic videos (7 frames per second) of prospective cirrhotic patients were analyzed to validate the system in the clinic. Endoscopists were blinded to the results of the system. The gold standards were the same as those in the training dataset. Three supportive models were added to the system and activated in order to process the videos.
This prospective observational study was conducted at Renmin Hospital of Wuhan University from July 1st, 2020, to April 30th, 2021. Cirrhotic patients presenting to Renmin Hospital of Wuhan University were invited to participate in this study. The inclusion criteria were as follows: (1) cirrhosis diagnosed by histology or by both blood samples and two methods of imaging, ultrasound and computed tomography/magnetic resonance imaging; (2) age between 18 and 80 years; and (3) no previous EV or GV bleeding and no previous endoscopic treatment, surgical treatment, or transjugular intrahepatic portosystemic shunt for EV or GV. The exclusion criteria included (1) gastrointestinal malignancies before participation; (2) a history of esophagus or stomach surgery; (3) severe diseases of other organs or infections with a prehepatic or posthepatic origin; and (4) refusal to give informed consent to participate in the study.
Data on the presence or absence of ascites, jaundice, and hepatic encephalopathy were collected before endoscopy. A blood sample under fasting conditions was taken before endoscopy to assess liver disease etiology and severity (Child‒Pugh score).
All eligible patients underwent endoscopy, performed using CF-HQ290, CF-Q260AI (Olympus Optical, Tokyo, Japan) EC-590WM, or EC-600WM systems (Fujifilm, Kanagawa, Japan). The endoscopists were six staff members of the Gastroenterology Department in Renmin Hospital of Wuhan University, Wuhan, China, with an endoscopic experience of 6.67 ± 2.58 years.
EV and GV were classified and recorded according to the general rules of the Japan Research Society for Portal Hypertension. All patients were treated by the endoscopists mentioned above according to the latest guidelines2. The indication for primary prophylaxis was small varices (grade 1) with Child‒Pugh C, or the presence of medium (grade 2) to large varices (grade 3) with or without RC on varices. All patients were followed up for at least 6 months. Adverse events were defined as either of the following: upper gastrointestinal hemorrhage from variceal bleeding confirmed by endoscopy, or death.
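The indication rule in this paragraph can be written as a small decision function. This is only a sketch of the rule as worded here, not clinical software. The function name is hypothetical, and grade 0 is our assumption for a patient without varices.

```python
def needs_primary_prophylaxis(grade, child_pugh_class):
    """Apply the primary prophylaxis indication as worded in this section.

    grade:            varix grade, 0-3 (0 = no varices, an assumption).
    child_pugh_class: "A", "B", or "C".
    """
    if grade not in (0, 1, 2, 3):
        raise ValueError("grade must be 0, 1, 2, or 3")
    if child_pugh_class not in ("A", "B", "C"):
        raise ValueError("child_pugh_class must be 'A', 'B', or 'C'")
    if grade >= 2:
        # Medium to large varices: indicated with or without RC.
        return True
    # Small varices: indicated only with Child-Pugh C.
    return grade == 1 and child_pugh_class == "C"


# Illustrative calls with hypothetical patients.
small_varices_class_c = needs_primary_prophylaxis(1, "C")
small_varices_class_a = needs_primary_prophylaxis(1, "A")
large_varices_class_a = needs_primary_prophylaxis(3, "A")
```

Because grade 2–3 varices are indicated with or without RC, RC does not appear as an input to this particular rule.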
The study was carried out in compliance with the Declaration of Helsinki. The study protocol was approved by the ethics committees of the Renmin Hospital of Wuhan University (Reference number: 2019K-K094(Y01)). Written informed consent was obtained from all prospective patients. The ethics committee waived the requirement of informed consent for retrospectively collected information.
A questionnaire on satisfaction with ENDOANGEL-GEV
Five endoscopists were asked to watch three videos processed by ENDOANGEL-GEV and ENDOANGEL (previously published)10. They filled in a questionnaire after watching the videos. The questionnaire contained three questions on the two systems’ accuracy, helpfulness, and trustworthiness. Each question was answered on five levels of agreement: strongly agree, agree, neutral, disagree, and strongly disagree.
The primary outcome of the study was the accuracy of ENDOANGEL-GEV in detecting GEV on dataset 3 (prospective study). The secondary outcomes were the metrics of ENDOANGEL-GEV and endoscopists to detect and rank risk factors for GEV, the comparison results between ENDOANGEL-GEV and endoscopists, and the diagnostic value of six endoscopists for detecting risk factors and risk stratification.
We assumed that ENDOANGEL-GEV could reach a diagnostic accuracy of 90% in a single-arm study with objective performance criteria. With a power of 90% and a two-sided significance level of 0.05, 158 patients were required. Assuming a drop-out rate of 5%, the target sample size was 166. The sample size was calculated using Power Analysis and Sample Size 15.
Precision, recall, and Intersection over union (IoU) were calculated to assess the segmentation.
IoU was defined as the relative overlap between the predicted bounding box and the ground-truth bounding box.
Precision = True positive area/(True positive area + False positive area)
Recall = True positive area/(True positive area + False negative area)
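The area-based definitions above can be illustrated with a minimal sketch. We assume here that the areas are counted in pixels on binary masks, which matches the precision and recall formulas; the bounding-box wording used for IoU is replaced by the pixel-level overlap, IoU = true positive area / (true positive area + false positive area + false negative area). The function name and the toy masks are hypothetical.

```python
def segmentation_metrics(pred_mask, true_mask):
    """Pixel-level precision, recall, and IoU for two binary masks."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1  # predicted and labeled: true positive area
            elif p:
                fp += 1  # predicted only: false positive area
            elif t:
                fn += 1  # labeled only: false negative area
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou


# Toy 2 x 3 masks (1 = lesion pixel), illustrative values only.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
precision, recall, iou = segmentation_metrics(pred_mask, true_mask)
```

In this toy case two pixels overlap, one is predicted but not labeled, and one is labeled but not predicted, giving precision and recall of 2/3 and an IoU of 0.5.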
Accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were calculated. Categorical variables were compared by using the chi-square test (McNemar test). P values < 0.05 were considered statistically significant. All calculations were performed using SPSS 23 (IBM, Chicago, Illinois, USA).
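For paired comparisons of this kind, where the system and the endoscopists assess the same patients, the McNemar test uses only the discordant pairs. The sketch below implements the exact (binomial) two-sided form as an illustration. Whether SPSS was run with the exact or the chi-square form is not stated here, so this choice is an assumption, and the counts in the example are hypothetical. It uses `math.comb`, which requires Python 3.8 or later.

```python
import math


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the discordant pair counts.

    b: cases that method 1 got right and method 2 got wrong.
    c: cases that method 1 got wrong and method 2 got right.
    Under the null hypothesis, min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for two raters on the same patients.
p_value = mcnemar_exact(1, 3)
```

Concordant pairs (both right or both wrong) do not enter the calculation, which is why two methods with very different accuracies on paper can still yield a non-significant result when few pairs disagree.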
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Individual de-identified data, the pretrained models, and the source code reported in this article will be shared with investigators 12 months after article publication. Data requesters can contact the corresponding author to gain access. The code is available on GitHub (https://github.com/endo-angel/gastroesophageal-varices).
The project was supported by the Hubei Province Major Science and Technology Innovation Project (grant no. 2018-916-000-008) and the Fundamental Research Funds for the Central Universities (grant no. 2042022kf1099).
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Wang, J., Wang, Z., Chen, M. et al. An interpretable artificial intelligence system for detecting risk factors of gastroesophageal variceal bleeding. npj Digit. Med. 5, 183 (2022). https://doi.org/10.1038/s41746-022-00729-z