A deep-learning pipeline to diagnose pediatric intussusception and assess severity during ultrasound scanning: a multicenter retrospective-prospective study

Ileocolic intussusception is one of the most common acute abdominal emergencies in children and is first diagnosed urgently using ultrasound. Manual diagnosis requires extensive experience and skill, and identifying surgical indications when assessing disease severity is even more challenging. We aimed to develop a real-time lesion-visualization deep-learning pipeline to address this problem. This multicenter retrospective-prospective study used 14,085 images from 8736 consecutive patients (median age, eight months) with ileocolic intussusception who underwent ultrasound at six hospitals to train, validate, and test the deep-learning pipeline. Subsequently, the algorithm was validated on an internal image test set and an external video dataset. Furthermore, the performances of junior, intermediate, and senior sonographers and of junior sonographers with AI-assistance were prospectively compared in 242 volunteers using the DeLong test. The tool recognized 1,086 images showing three ileocolic intussusception signs with an average area under the receiver operating characteristic curve (average-AUC) of 0.972. It classified 184 patients as having no intussusception, nonsurgical intussusception, or surgical intussusception in 184 ultrasound videos with an average-AUC of 0.956. In the prospective pilot study of 242 volunteers, the performance of junior sonographers improved significantly with AI-assistance (average-AUC: 0.966 vs. 0.857, P < 0.001; median scanning time reduced from 9.46 min to 3.66 min, P < 0.001) and became comparable to that of senior sonographers (average-AUC: 0.966 vs. 0.973, P = 0.600). Thus, we report that a deep-learning pipeline that localizes lesions in real time and is interpretable during ultrasound scanning can assist sonographers in improving the accuracy and efficiency of diagnosing intussusception and identifying surgical indications.


INTRODUCTION
Intussusception is one of the most common acute abdominal emergencies in children, with the ileocolic type being the most prevalent; it is usually diagnosed urgently using ultrasound1. Nonsurgical enemas administered within 24 h of onset relieve symptoms in approximately 84% of patients2,3. Less than one-third of patients present with the classic triad of symptoms (abdominal pain, palpable mass, and blood-stained stools), and some pediatric diseases have similar clinical manifestations4. Thus, the diagnosis can easily be delayed or missed during the initial emergency visit, and delayed treatment can cause sepsis or even hypovolemic shock5. Patients with severe symptoms or failed enemas require prompt identification of surgical indications6,7.
The sensitivity and specificity of ultrasound for diagnosing intussusception can reach 92-100%8,9. However, an ultrasound scan runs at more than 25 frames per second (FPS) and comprises thousands of image frames, of which only a few with clear pathological features are useful for diagnosing intussusception; these are easily missed by the human eye. In addition, noise, distortion, and artifacts degrade the quality of ultrasound images10. Therefore, specialized imaging knowledge and skilled sonographers are required to diagnose intussusception, and recognizing surgical indications to assess disease severity is even more challenging.
Artificial intelligence (AI) has demonstrated broad applicability in diagnosing pediatric diseases from medical images11,12. Previous studies have proposed deep-learning (DL) approaches to extract frames with complete pathological features from ultrasound videos for diagnosis13,14, and AI trained on images labeled with a priori knowledge has been used to diagnose intussusception. However, these studies considered neither the algorithm's speed nor surgical indications15,16. Furthermore, intuitively understanding the internal decision-making process of DL is challenging, which hinders its translation into clinical practice.
We aimed to develop and validate a deep-learning pipeline for real-time navigation of diagnostic planes during ultrasound scanning to identify ileocolic intussusception and provide surgical indications, using heterogeneous multicenter datasets of images and videos for retrospective testing and external validation. The tool's scalability was prospectively evaluated using a real-world clinical dataset, and the performance of junior sonographers with AI-assistance was compared with that of junior, intermediate, and senior sonographers. We also attempted to visually interpret the "black box" of DL's internal decision-making to boost sonographers' confidence in the algorithm.

RESULTS

Characteristics of patients
Figure 1 illustrates the flow diagram of this study. Epidemiological characteristics, medical image features, and final diagnoses for the three datasets are summarized in Table 1. In the retrospective datasets of 14,085 images from 8736 children with ileocolic intussusception, the median age was eight months (range, 3-36), 63.5% of patients were male, 8.4% required surgery, and the classic triad of symptoms was observed in less than one-third of patients.

Model generalisation to an external retrospective video dataset
We added the "Con-best.py" program to the YOLOv5 model to select the optimal standard diagnostic plane with the highest Confidence in each video. The AI system was then tested using 184 ultrasound videos from 184 patients with no intussusception, nonsurgical intussusception, or surgical intussusception, yielding AUCs of 0.958 (95% CI, 0.909-1.000), 0.953 (95% CI, 0.919-0.988), and 0.956 (95% CI, 0.902-1.000), respectively, with an average-AUC of 0.956 (95% CI, 0.961-0.991) and a median FPS of 91 (range, 83-101) (Table 3). If the model detected no nonsurgical doughnut, nonsurgical sleeve, or surgical sign in a video, the patient was considered not to have ileocolic intussusception.

Visual interpretation of deep-learning internal decision-making
Four images were cropped at arbitrary angles, and their brightness and contrast were adjusted. They were then stitched into a "Mosaic"-enhanced image (Fig. 3a). The learning effect of each convolutional layer can be explained using visualized feature maps. Here, the last convolutional layer of YOLOv5 was chosen; its features were mapped to the range 0-255 and converted into images. The binary grayscale image revealed that the convolutional layer had learned to recognize the doughnut sign (Fig. 3b). A heat map, which visualizes the activation of convolutional-layer features, further aided in determining whether the model correctly identified image features. The heat map was created by extracting the activation values from the last convolutional layer of YOLOv5 and multiplying them by the average gradient feature map value. The red and yellow regions in Fig. 3c indicate the model's identification of the lesion area. The AI system automatically identified the images as nonsurgical doughnut, nonsurgical sleeve, and surgical signs and labeled the corresponding lesion areas with Confidences of 0.97, 0.97, and 0.98, respectively. The values in Fig. 3d-f represent the Confidence, calculated using Eq. 1, in which P(object) was assigned 1 when the category was accurately predicted and 0 otherwise.
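The heat-map construction described here (channel activations from the last convolutional layer weighted by their average gradients, then rescaled to 0-255) follows the general Grad-CAM recipe. The following NumPy sketch illustrates that computation only; it is not the authors' code, and the array shapes and the `gradcam_heatmap` name are assumptions:

```python
import numpy as np

def gradcam_heatmap(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Build a Grad-CAM-style heat map from one convolutional layer.

    activations: (C, H, W) feature maps from the last conv layer.
    gradients:   (C, H, W) gradients of the score w.r.t. those maps.
    Returns an (H, W) uint8 map scaled to 0-255, as in Fig. 3.
    """
    # Average each channel's gradient to get a per-channel weight.
    weights = gradients.mean(axis=(1, 2))            # shape (C,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to 0-255 so the map can be rendered as an image.
    if cam.max() > 0:
        cam = cam / cam.max()
    return (cam * 255).astype(np.uint8)

# Toy example: 2 channels on a 2x2 grid.
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 2.0], [0.0, 0.0]]])
grads = np.ones_like(acts)
heat = gradcam_heatmap(acts, grads)
```

Overlaying such a map on the input image produces the red and yellow lesion highlights shown in Fig. 3c.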
Generalizing the model to real-world ultrasound diagnostic scenarios and comparing the performances of four groups of sonographers

We connected this tool to ultrasound machines to assess the performance of the deep-learning pipeline in a real-world ultrasound diagnostic scenario (Table 4 and Supplementary Fig. 1).

Fig. 1 The AI system for diagnosing ileocolic intussusception and providing surgical indications. a Model development for the AI system and selection of the best-performing model. The system includes a pipeline consisting of an image normalization module, an image enhancement module, and an image analysis module. b Application and evaluation of the AI system.

DISCUSSION
In this study, we developed a real-time multiobjective detection and tracking deep-learning model, using multicenter heterogeneous datasets and ultrasound imaging characteristics, to diagnose ileocolic intussusception and provide surgical indications, and we tested it retrospectively and prospectively. The model achieved average-AUCs of 0.972 and 0.956 on the internal image test set and the external video dataset, respectively. In real-world ultrasound diagnostic scenarios in 242 volunteers, the performance of junior sonographers improved significantly with AI-assistance (average-AUC: 0.966 vs. 0.857, P < 0.001; median scanning time: 9.46 min vs. 3.66 min, P < 0.001), surpassing that of intermediate sonographers (average-AUC: 0.966 vs. 0.919, P = 0.039; median scanning time: 9.46 min vs. 6.98 min, P < 0.001) and becoming comparable to that of senior sonographers (average-AUC: 0.966 vs. 0.973, P = 0.600). Overall, this diagnostic tool can assist sonographers in managing children with ileocolic intussusception.
Accurate and timely diagnosis of ileocolic intussusception and recognition of surgical indications are critical for selecting treatment plans and achieving positive treatment outcomes1-3,7. Studies have proposed using deep learning to diagnose intussusception in children from plain abdominal radiographs15,16. However, X-rays are less sensitive and specific for diagnosing intussusception than ultrasound17, and our algorithm achieved a higher AUC using three ultrasound datasets. Additionally, this tool processed images at a median FPS of 91 (range, 83-101) during the ultrasound scan. It displayed Confidence values and anchor boxes to guide the sonographer in adjusting the scan position in real time, thereby enhancing the diagnostic accuracy and efficiency of less experienced sonographers. The examination time of junior sonographers was reduced from 9.46 min (IQR, 7.91-11.17) to 3.66 min (IQR, 2.91-4.21), which is particularly valuable because children younger than three years often do not cooperate during ultrasound scans and tend to cry. Furthermore, our algorithm identified surgical indications, facilitating the assessment of disease severity and increasing confidence in selecting appropriate treatment options. Nevertheless, intussusception is a dynamic disease. Even when an experienced sonographer diagnoses a surgical indication, a few patients whose abdomens are opened do not require partial bowel resection. Therefore, in clinical practice, even with surgical indications, preference is given to conservative, nonsurgical enemas to minimize surgical risk; surgery is considered only after 1-3 failed enemas, depending on the status of each patient's surgical indication. The proposed model is stable and compatible because it was trained using a multicenter, multi-device dataset and tested with image, video, and real-world clinical datasets. YOLOv5 has three sets of multiscale adaptive anchor boxes that can accommodate deviations in the physical size of images caused by different ultrasound systems18. The tool can be easily applied in clinical practice by connecting it to an ultrasound machine, which is particularly helpful for junior and intermediate sonographers. Furthermore, we added a "Con-best.py" file to YOLOv5, which selects the optimal standard plane with the highest Confidence from a post-examination ultrasound video. This addition will further improve the accuracy of the sonographer's diagnosis, because even skilled senior sonographers find it challenging to select the standard plane with the highest Confidence during a high-speed ultrasound scan.
An AI system can detect differential and subtle features of medical images, even beyond the observational ability and comprehension of clinicians19,20. Differences in ultrasound imaging between nonsurgical and surgical intussusception have also been studied21-27. Based on the combined evidence, we suggest that a deep-learning pipeline trained on a dataset labeled with a priori knowledge can diagnose intussusception and provide surgical indications.
Our study has some limitations. First, the model was trained and validated using ileocolic intussusception ultrasound datasets and did not involve other types of intussusception, such as the ileoileocolic, enteroenteric (including jejunojejunal and ileoileal), and colocolic types. Although over 90% of intussusceptions are ileocolic3,28, this limitation affects the generalizability of the model: when used to diagnose all types of intussusception, false negatives may delay treatment, whereas false positives may result in unnecessary enemas or surgical interventions. Second, the small number of surgical intussusception samples might have affected the model's performance, despite increasing the number of images using image enhancement techniques. Third, the AI system cannot diagnose intussusception in adults because of the high gas and fat content of the adult intestine and the resulting poor quality of ultrasound images, which necessitate computed tomography scans or X-rays.
In conclusion, a deep-learning pipeline based on heterogeneous multicenter ultrasound datasets and their imaging features can assist sonographers in diagnosing pediatric ileocolic intussusception and provide surgical indications for assessing disease severity. Further training and validation using datasets involving additional types of intussusception, as well as new techniques, are needed to enhance the generalizability and performance of the model.

METHODS

Datasets
The heterogeneous multicenter dataset included a retrospective dataset of images for training and internal testing, a retrospective dataset of videos for external testing, and a prospective dataset of volunteers for comparing the performance of junior sonographers with AI-assistance with that of junior, intermediate, and senior sonographers (Fig. 1). After screening, the final eligible data were collected from six hospitals: three regional hospitals affiliated with the Guangzhou Women's and Children's Medical Center, namely the Children Branch (4781 images from 2842 patients, 61 videos from 61 patients, and 242 volunteers), the Zengcheng Branch (2360 images from 1409 patients and 27 videos from 27 patients), and the Zhujiang New Town Branch (1748 images from 1184 patients and 33 videos from 33 patients); the Children's Hospital of Zhengzhou University (2791 images from 1,651 patients and 25 videos from 25 patients); Kaifeng Children's Hospital (1361 images from 981 patients and 17 videos from 17 patients); and Dongguan Children's Hospital (1044 images from 669 patients and 25 videos from 25 patients).
All parents of the volunteers agreed to their children's participation in the prospective study. For the retrospective dataset, parents were informed in the initial admission form that their children's clinical data might be used for research, and the data of those whose parents did not object were included. The study was approved by the local ethics committee and institutional review board of each hospital (Guangzhou Women and Children's Medical Center: [2021] No. 486B01; Children's Hospital Affiliated to Zhengzhou University: 2022-H-K29; Kaifeng Children's Hospital: [2021] 127; Dongguan Children's Hospital: LL2022121501).

Retrospective image datasets
The initial image dataset included 15,776 images of ileocolic intussusception showing nonsurgical doughnut, nonsurgical sleeve, and surgical signs in 9725 patients aged 3-36 months who underwent ultrasound between January 2017 and December 2021, with 1-8 images with typical pathological features per child retrieved from the electronic medical record systems. Inclusion criteria were based on the patients' discharge outcomes, image quality, and the consensus of three ultrasound experts who reviewed the data (Prof. Haiwei Cao, Prof. Hongkui Yu, and Prof. Hongying Wang, with 13, 15, and 21 years of experience, respectively). In total, 1,048 initially extracted but incorrect diagnostic planes from 629 patients were excluded: (1) 264 from 147 patients with incomplete or poor-quality pathologic features; (2) 614 from 379 patients whose initial ultrasounds were misdiagnosed as other abdominal conditions but who were ultimately diagnosed with nonsurgical or surgical intussusception; (3) 143 from 91 patients initially diagnosed with nonsurgical or surgical intussusception on ultrasound but eventually diagnosed with other abdominal conditions; and (4) 27 from 12 patients misdiagnosed as requiring surgery for intussusception but who did not need partial bowel resection at laparotomy.

Retrospective video datasets
The video dataset included patients first seen in the emergency department for acute abdominal conditions between October 2021 and June 2022; only those whom the emergency department physicians initially diagnosed with suspected intussusception, based on experience and rapid laboratory tests, were sent to our ultrasound department for further examination. According to the actual outcome at each patient's final discharge, patients with other types of intussusception (n = 15) and other abdominal diseases (n = 7) were excluded, as were poor-quality videos (n = 11). The final 184 patients, with ultrasound diagnoses of no intussusception, nonsurgical ileocolic intussusception, or surgical ileocolic intussusception, were selected, with one video per patient.

Prospective volunteer datasets
Patients initially diagnosed with suspected intussusception in the emergency department and referred to our ultrasound department for further examination were recruited as volunteers between April 2023 and May 2023. Based on the patients' final discharge records, those with other types of intussusception (n = 21) and other acute abdominal diseases (n = 27) were excluded from the statistical analysis. The final 242 patients, including those with no intussusception, nonsurgical ileocolic intussusception, and surgical ileocolic intussusception, were used to prospectively validate the scalability of the system and to compare the performance of the four sonographer groups with different skill levels in real-world scenarios.

Ultrasound equipment
The ultrasound equipment used was as follows: (1

Ultrasound image analysis
According to the guidelines for managing intussusception in children29, ultrasound imaging of intussusception displays typical features: a doughnut sign in the transverse view and a sleeve sign in the longitudinal view. Images with clear and complete pathological features are standard planes8,9. Compared with temporary intussusception, surgical intussusception presents more coexisting features, such as a longer intussusception, a thicker edematous intestinal wall, a larger doughnut diameter, pneumoperitoneum, signs of peritonitis, peritoneal fluid, and "trapped" fluid between the intestinal walls, which may indicate a higher surgical risk (Fig. 5)21-24. In this study, the aforementioned evidence was the 'gold standard', while the patient's final discharge record and the ultrasound experts' labeling of the images were employed as the 'silver standard' for diagnosis.
Two pediatric ultrasound experts (Prof. Haiwei Cao and Prof. Hongkui Yu) selected 14,085 standard planes from 8,736 patients with ileocolic intussusception by consensus. Three types of lesions (nonsurgical doughnut, nonsurgical sleeve, and surgical sign) were then labeled using their expertise in conjunction with the patients' discharge records, with an online image labeling and intelligence enhancement tool (Roboflow, https://www.roboflow.com). Because some ultrasound images of intussusception cannot easily be classified as surgical or nonsurgical, and some initially extracted standard planes were incorrect, producing inconsistencies between the initial diagnoses and the actual outcomes at final discharge, such cases were reviewed against the discharge records and excluded by the ultrasound experts. In cases of disagreement, a third pediatric ultrasound expert (Prof. Hongying Wang) was consulted.

AI Model
We modified the YOLOv5 algorithm (https://github.com/ultralytics/yolov5) for real-time "intelligent navigation" of the standard planes to diagnose ileocolic intussusception and provide surgical indications during ultrasound scanning (Fig. 6). Previous studies have also proposed the term "intelligent navigation"30,31. Here, "intelligent navigation" refers to the automatic recognition of standard planes during ultrasound scanning: the model displays anchor boxes and Confidence values on the lesions, prompting the sonographer to adjust the position and direction of the scan to capture the optimal standard plane.
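For context, the Confidence value displayed on each anchor box in YOLO-family detectors is conventionally the product of the objectness probability and the intersection over union (IoU) between the predicted and ground-truth boxes, which is consistent with the description of Eq. 1 elsewhere in this paper (P(object) set to 1 for a correct category, 0 otherwise). A hypothetical sketch with illustrative box coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def confidence(p_object, pred_box, truth_box):
    """Eq. 1-style score: Confidence = P(object) x IoU(pred, truth).

    P(object) is 1 when the category is predicted correctly, else 0.
    """
    return p_object * iou(pred_box, truth_box)

# A correctly classified box overlapping the ground truth by half.
c = confidence(1, (0, 0, 10, 10), (0, 0, 10, 5))
```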
YOLOv5 predicts the boundaries of targets in images, classifies and localizes them probabilistically, and achieves end-to-end image detection, processing images at 45-155 FPS18. Within the YOLOv5 framework, 'depth_multiple' specifies the model's depth, determining the number of modules (number × depth), and 'width_multiple' specifies the model's width, regulating the number of convolutional channels (number × width). We selected 'depth_multiple' = 0.33 and 'width_multiple' = 0.50. In this configuration, the network's depth is reduced to one-third and the number of convolutional channels is halved, which increases image-processing speed and decreases reliance on high-end computer hardware.
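Under the stated settings ('depth_multiple' = 0.33, 'width_multiple' = 0.50), the scaling can be sketched as follows. The function names are illustrative, and rounding channels up to a multiple of 8 mirrors, but is not copied from, YOLOv5's make_divisible helper:

```python
import math

def scaled_depth(n_modules: int, depth_multiple: float) -> int:
    """Number of repeats of a block after depth scaling (at least 1)."""
    return max(round(n_modules * depth_multiple), 1)

def scaled_width(n_channels: int, width_multiple: float, divisor: int = 8) -> int:
    """Channel count after width scaling, rounded up to a multiple of
    `divisor` to keep channel counts hardware-friendly."""
    return math.ceil(n_channels * width_multiple / divisor) * divisor

d = scaled_depth(9, 0.33)     # a 9-repeat block shrinks to 3 repeats
w = scaled_width(1024, 0.50)  # 1024 base channels shrink to 512
```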
Data augmentation is used to expand datasets to improve convolutional neural network performance, making models more robust and preventing overfitting32. Here, augmentation adjusted the brightness, contrast, and geometry of the images. The "Mosaic" enhancement feature of YOLOv5 was used to stitch four images together into a single image, which significantly improves the model's ability to recognize images with weak features33.
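A simplified illustration of the "Mosaic" idea, stitching four single-channel images into one four-quadrant canvas. The real YOLOv5 implementation additionally applies random scaling and placement; this sketch and its function name are assumptions:

```python
import numpy as np

def mosaic(imgs, out_h=256, out_w=256):
    """Stitch four grayscale images into one 'Mosaic' training image.

    Each input is placed into a quadrant, cropped or zero-padded to
    fit; the real YOLOv5 version also randomizes scale and offset.
    """
    assert len(imgs) == 4
    qh, qw = out_h // 2, out_w // 2
    canvas = np.zeros((out_h, out_w), dtype=np.uint8)
    slots = [(0, 0), (0, qw), (qh, 0), (qh, qw)]
    for img, (top, left) in zip(imgs, slots):
        patch = np.zeros((qh, qw), dtype=np.uint8)
        h, w = min(qh, img.shape[0]), min(qw, img.shape[1])
        patch[:h, :w] = img[:h, :w]
        canvas[top:top + qh, left:left + qw] = patch
    return canvas

# Four flat 128x128 test images with distinct gray levels.
quads = [np.full((128, 128), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
m = mosaic(quads)
```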
The model outputs standard planes with different Confidence values during each ultrasound scan. Therefore, we added a "Con-best.py" file to select the standard plane with the highest Confidence value from each video after the ultrasound scan.
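The frame-selection logic attributed to "Con-best.py" can be paraphrased as picking, across all frames of a video, the detection with the highest Confidence. A hypothetical sketch; the data layout and function name are assumptions, not the authors' implementation:

```python
def select_best_frame(detections):
    """Pick the frame whose top detection has the highest Confidence.

    `detections` maps frame indices to lists of (label, confidence)
    pairs produced by the detector; frames with no detections are
    skipped. Returns (frame_index, label, confidence), or None when
    no frame contains a detection (i.e., no intussusception sign).
    """
    best = None
    for frame_idx, dets in detections.items():
        for label, conf in dets:
            if best is None or conf > best[2]:
                best = (frame_idx, label, conf)
    return best

video = {
    0: [],
    1: [("nonsurgical doughnut", 0.81)],
    2: [("nonsurgical doughnut", 0.97), ("nonsurgical sleeve", 0.64)],
}
best = select_best_frame(video)
```

A None result corresponds to the paper's rule that a video with no detected sign indicates no ileocolic intussusception.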

Statistical analysis
Normalized confusion matrices were used to depict the classification results for the three types of patients. AUCs with 95% confidence intervals (CIs) of the four sonographer groups (junior, intermediate, and senior sonographers, and junior sonographers with AI-assistance) and the three AI models (Faster R-CNN [https://github.com/rbgirshick/py-faster-rcnn], YOLOv5, and the modified YOLOv5) were compared using the DeLong test and the "pROC" package. The Wilcoxon signed-rank test was used to compare the scanning times of the four observer groups and the median FPS of the three AI models, because the non-normality of these distributions had been established beforehand using the Kolmogorov-Smirnov test. The performance of the four sonographer groups was evaluated using accuracy, sensitivity, specificity, AUC, and Fleiss' kappa. The shortest two-sided 95% CIs were reported for each experiment. Data were analyzed using R statistical software (version 4.1.1, R Core Team, 2021). P < 0.05 was considered indicative of a statistically significant difference.
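The average-AUC reported throughout is defined (Table 2 footnote) as the arithmetic mean of the per-class AUCs. Although the study's analysis used R's pROC package, the quantity itself can be sketched in Python as one-vs-rest AUCs averaged over the three classes; the pairwise AUC estimator and all variable names here are illustrative:

```python
def binary_auc(labels, scores):
    """AUC via pairwise comparison (the Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_auc(labels, score_matrix, classes):
    """Arithmetic mean of one-vs-rest AUCs across the classes."""
    aucs = []
    for k, cls in enumerate(classes):
        binary = [1 if y == cls else 0 for y in labels]
        aucs.append(binary_auc(binary, [row[k] for row in score_matrix]))
    return sum(aucs) / len(aucs), aucs

classes = ["no intussusception", "nonsurgical", "surgical"]
labels = ["no intussusception", "nonsurgical", "surgical", "nonsurgical"]
scores = [[0.9, 0.05, 0.05],   # per-class scores for each patient
          [0.1, 0.80, 0.10],
          [0.2, 0.10, 0.70],
          [0.3, 0.60, 0.10]]
avg, per_class = average_auc(labels, scores, classes)
```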

Fig. 2 Prediction results of the deep-learning system on the internal test set of 1,086 ultrasound images in 632 patients. a Normalized confusion matrix of images. b AUCs of images. c Normalized confusion matrix of patients. d AUC of patients.

Fig. 5 Three types of ileocolic intussusception lesions were labeled by ultrasound experts.a NSD Nonsurgical doughnut sign.b NSS Nonsurgical sleeve sign.c, d SSI Surgical sign.

Fig. 6 The modified YOLOv5 for diagnosing pediatric intussusception and providing surgical indications. Its structure consists of a backbone network to extract features, a feature pyramid to obtain multiscale features, a prediction head to generate target predictions, and a loss function to optimize the model.

Table 1 .
Datasets for training, validation, testing, and the prospective pilot study, and characteristics of patients.
N = number of patients, images, or videos. Data are presented as n (%) or median (range). *Percentages may not add to exactly 100% because of rounding. NON Normal or no intussusception, NSI Nonsurgical intussusception patients, SIP Surgical intussusception patients, NSD Nonsurgical doughnut sign, NSS Nonsurgical sleeve sign, SSI Surgical sign.

Table 2 .
Performance of the deep-learning system on the internal test set of 1086 ultrasound images in 632 patients.
AUC The area under the receiver operating characteristic curve, Average-AUC the arithmetic mean of the AUC for each class, FK Fleiss' Kappa, FPS Frames per second.

Table 3 .
Performance of the deep-learning system on the external retrospective dataset of 184 videos in 184 patients.

Table 4 .
Performance of the four groups of sonographers in the diagnosis of 242 volunteers.