Introduction

Necrotising enterocolitis (NEC) is one of the most serious conditions in newborns, affecting up to 10% of very-low-birth-weight infants. In the most premature population, mortality rates can increase to as high as 60%.1

The precise aetiology remains unclear. However, fundamental risk factors, such as prematurity, enteral feeding, intestinal colonisation, and bowel ischaemia, have been well established.2 NEC mainly affects the distal ileum and colon. Clinical symptoms include feeding intolerance, abdominal distension, and bloody stools. In the most severe cases, patients may present with abdominal wall erythema, apnoeic spells, lethargy, and septic shock.3 The suspected diagnosis is confirmed with typical findings on abdominal radiography (AR), including pneumatosis intestinalis (PI), portal vein gas (PVG), and in extreme cases, pneumoperitoneum.4

In 1978, Bell developed a staging system for NEC, which was modified by Walsh and Kliegman in the mid-1980s.5 A combination of clinical symptoms and AR findings allows for the grading of interventions and standardised treatment. This widely adopted classification is based on plain radiography findings, even though the first reports on ultrasound (US) use for NEC diagnosis were published in the early 1980s,6,7 before the modification by Walsh and Kliegman.5 Abdominal ultrasonography can depict PI, PVG, and pneumoperitoneum (in some cases, ahead of AR); it also provides other crucial information, such as bowel wall viability (thickening or thinning) and free abdominal fluid. These additional findings are helpful for the diagnosis and management of NEC.

Despite several studies, NEC remains a conundrum. Timely diagnosis and treatment implementation remain a major challenge for neonatologists worldwide. Despite numerous efforts, morbidity rates have not improved over time. This is mainly due to the fact that early symptoms of NEC are often non-specific and difficult to distinguish from benign illnesses, such as apnoea of prematurity or feeding intolerance. Moreover, NEC is a rare disease; hence, it is difficult to gain professional experience and develop expertise in its timely diagnosis and treatment. The rarity of NEC also hinders the creation of effective clinical protocols owing to the lack of high-quality data. Electronic health records are often incomplete and lack crucial information, such as the outcomes of AR or US studies. Moreover, data collection procedures vary from country to country, increasing the difficulty of creating and benchmarking NEC diagnosis protocols.

Artificial intelligence (AI) is a broad discipline with the main purpose of providing machines the ability to perform actions that require domain knowledge and intelligence (driving a car, conducting surgery, patient diagnosis, etc.). Over the last decades, we have witnessed rapid progress in machine learning (ML), a subdomain of AI that has also made big strides in healthcare.8 ML is a data-driven method that can assist in decision making in healthcare. Data-driven methods are constructed solely based on retrospective data with no expert input. Despite several reported successes with retrospective datasets, AI methods have yet to have an impact on clinical practice8 and experience commercial success.9

ML is divided into three categories: unsupervised, supervised, and reinforcement learning. Supervised learning is used when the ground truth (GT) is available (e.g., whether the newborn has NEC or not).10 The supervised ML model predicts the GT from the input data. In data science, the input variables are often referred to as features or explanatory variables. Among the most popular supervised ML models are support vector machines (SVMs), decision trees (DTs), linear regressions (LRs), and naive Bayes (NB). Unsupervised learning is often used when there is no available GT or when the goal is to find new patterns in the data. Typical tasks in unsupervised learning are clustering (grouping data samples) and anomaly detection (e.g., detecting arrhythmia in the ECG signal). Reinforcement learning is devoted to training an agent’s (a car, AI-controlled computer game player, or an autonomous robotic surgeon) behaviour to achieve the maximum reward (e.g., successful surgery).11 This type of ML is frequently used in robotics.
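
The supervised setting described above can be sketched in a few lines. This is a minimal illustration only: it uses a 1-nearest-neighbour rule (chosen for brevity, not one of the models listed above), and the two features, their values, and the labels are synthetic and hypothetical rather than clinical data.

```python
import math


def nearest_neighbour_predict(train_features, train_labels, query):
    """Return the ground-truth label of the training sample closest to `query`."""
    distances = [math.dist(sample, query) for sample in train_features]
    closest = distances.index(min(distances))
    return train_labels[closest]


# Synthetic toy samples (invented for illustration, not clinical data):
# each row holds two normalised, hypothetical features.
train_features = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_labels = ["non-NEC", "non-NEC", "NEC", "NEC"]

print(nearest_neighbour_predict(train_features, train_labels, (0.7, 0.7)))
print(nearest_neighbour_predict(train_features, train_labels, (0.0, 0.3)))
```

The labels play the role of the GT and the tuples play the role of the features; an unsupervised method would receive only the tuples.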

The greatest limitation of AI systems is that they do not adequately generalise their behaviour to an unknown environment (e.g., diagnosing a patient with previously unseen symptoms). With the rise of computational capabilities and of the models they enable, such as artificial neural networks (ANNs), this issue has been partially overcome. Deep ANNs have opened the path to a new area of ML: deep learning (DL). Deep refers to the fact that, in deep networks, information is processed by many layers (hundreds or even thousands), whereas in shallow networks, there are only a few. DL has quickly become the go-to tool for several tasks owing to its robustness and the many DL architectures developed for different problems. For example, the early architectures of deep convolutional neural networks (CNNs) improved performance on image recognition tasks by 20% (e.g., if provided a picture of a horse, the algorithm classifies the picture as showing a horse) compared with standard computer vision algorithms.12 Currently, state-of-the-art CNNs surpass human performance in image recognition tasks. Furthermore, deep ANNs have completely transformed the abilities of computer programmes to understand spoken or written languages.13
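
The difference between shallow and deep networks can be made concrete with a small sketch. The code below is an illustration under stated assumptions: the layer sizes, the ReLU non-linearity, and the random (untrained) weights are all hypothetical choices, and the function names are not taken from any library.

```python
import random


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def make_network(layer_sizes, seed=0):
    """Build randomly initialised (untrained) layers for the given sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, inputs):
    """Pass the inputs through every layer in turn."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:  # no non-linearity on the output layer
            activations = relu(activations)
    return activations


shallow = make_network([4, 8, 1])        # one hidden layer
deep = make_network([4, 8, 8, 8, 8, 1])  # four hidden layers
print(len(shallow), len(deep))
```

Both networks share the same forward pass; only the number of stacked layers differs, which is what the term deep refers to.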

This narrative review aimed to summarise the currently available literature on the use of AI in diagnosing NEC, highlight open issues, and identify future directions for implementing AI in clinical neonatal practice (Fig. 1).

Fig. 1: Clinical and radiographic features of necrotising enterocolitis.

Panel a shows an infant with a shiny, distended abdomen with periumbilical erythema. (Photograph courtesy of Dr David Kays, Department of Pediatric Surgery, University of Florida.) In the radiograph shown in panel b, the upper arrow points to portal air, and the lower arrow points to a ring of intramural gas, which is indicative of pneumatosis intestinalis. (Radiograph courtesy of Dr Jonathan Williams, Department of Pediatric Pathology, University of Florida.) In panel c, the arrow points to an area of necrotic bowel in a patient with necrotising enterocolitis. (Photograph courtesy of Dr David Kays, Department of Pediatric Surgery, University of Florida.) Figure reused from Neu and Walker3 with permission from the New England Journal of Medicine.

Methods

Our study was a narrative review based on a systematic search strategy to gather facts from the available literature. This is in contrast to a classic systematic review, which is designed to provide an answer to a defined empirical question. Our aim was to gather data on the use of AI in NEC diagnosis. Reports were included or excluded if they met our inclusion or exclusion criteria, respectively (Table 1).

Table 1 Inclusion and exclusion criteria used in literature search.

PubMed, Embase, arXiv, and IEEE Xplore databases were searched for this narrative review. In consultation with a research librarian (K.W.), a standardised search strategy was employed using a standardised set of keywords and operators, which are listed in the Appendix. No other filtering or restrictions were applied in the search strategy. Additional strategies to identify studies included manual reviews of reference lists from key articles that fulfilled our eligibility criteria and the use of the ‘related articles’ feature in PubMed. The electronic database search was supplemented by searching for grey literature: trial protocols through clinical registers (ISRCTN registry and ClinicalTrials), theses and dissertations (sourced through NDLTD and EthOS), conference proceedings (searched through Web of Science and Embase), and other grey literature databases (OpenGrey and Trip database). Details of the search strategy are presented in the Appendix.

Results

Eight relevant publications were identified. Figure 2 shows a flow diagram of the selection process. We divided the articles into two groups based on the type of ML used.

Fig. 2: Flow diagram of the study selection process.

The search identified 118 publications; this number was reduced to only 8 by duplicate removal, screening, and applying eligibility criteria.

Classic ML in decision support in NEC

Classic ML algorithms (such as DTs, SVMs, and LRs) are often utilised for supervised learning using clinical data. Clinical data and other numerical values are often in the form of tabulated values, and classic ML methods are designed for use with such datasets. In a paper published by Mueller et al.,14 an ANN was used to diagnose NEC using a retrospective clinical dataset from a single institution. The authors included a relatively small dataset of 197 premature infants, of whom 67 were diagnosed with NEC. Data were obtained from the Perinatal Information System (PINS) database of the Medical University of South Carolina. Fifty-seven variables from the PINS database were selected as features after a literature review and a discussion within an expert panel on which features may be relevant to the diagnosis of NEC. The authors investigated the importance of variables that can be used to perform accurate decision making in the case of NEC prediction. This study did not report statistical results for NEC prediction.

Ntonfo et al.15 presented a different approach for NEC detection. An infrared camera was used for thermal image acquisition instead of collecting clinical data or radiological images. After initial image preprocessing operations, thermal signatures were extracted from newborn abdominal thermal images. The statistical features acquired from the signatures can be fed into a classifier for NEC diagnosis. The features showed different characteristics in children with and without NEC. Unfortunately, this method was only tested in two newborn children and should be further investigated.

Irles et al.16 developed two estimation models for intestinal perforation (IP) based on a back-propagation ANN: (a) at birth and (b) at birth and during hospitalisation. The study cohort included three groups: (1) control group without NEC (N = 27), (2) NEC group (N = 23), and (3) IP (Bell’s stage IIIB) (N = 26). They excluded 15 cases with incomplete clinical information, IP that was spontaneous or not associated with NEC, or digestive tract malformations. For further analysis, they chose 113 variables of maternal and neonatal clinical, feeding, and laboratory parameters from the medical record data of the neonatal intensive care unit (NICU) in a single institution. This study aimed to obtain an ANN-based model to estimate IP associated with NEC diagnosis and investigate key factors for prediction. The regression coefficient between the experimental and predicted data was R² > 0.97. They found that male sex was a highly predictive parameter for NEC-associated IP. However, more studies are needed to confirm that male infants are more likely to progress to IP. These models may allow for quality improvement in medical practice. The main limitations of this study are its single-centre nature and relatively small dataset.
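
For readers unfamiliar with this measure of agreement, the sketch below shows one common way to compute it. It assumes that the value reported by Irles et al. is the coefficient of determination between observed and predicted values, which the source does not state explicitly; the function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Perfect predictions give 1.0; always predicting the mean gives 0.0.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

A value close to 1 on a small single-centre dataset does not by itself show that a model will generalise to other institutions.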

Lure et al.17 used random forest and ridge logistic regression to discriminate between NEC and spontaneous IP (SIP). These diseases are difficult to differentiate without bowel visualisation; however, ML algorithms accurately separate NEC and SIP. The risk factors for NEC, including very low gestational age at birth, placental abruption, and asphyxia, were used as explanatory variables (tabulated input values). It has been shown that this method can improve the clinical decision-making process prior to any surgical intervention. The experiments were conducted using a dataset of 40 patients collected from the University of Florida.

Lueschow et al.18 suggested that no formal comparison between the multiple existing NEC definitions has been performed. They investigated the performance of these definitions and applied ML techniques to test their ability to diagnose NEC. To conduct the experiments, a cohort dataset of >200 patients acquired over 10 years from a single institution was analysed. The features (explanatory variables) selected for the experiments were those required for different NEC diagnosis methods: Bell staging,5 modified Bell staging,19 and non-Bell NEC definitions.20,21 For each NEC definition, six ML classifiers (K-nearest neighbours, simple neural network, NB, random forest, SVM, and DT) were trained on the features required by the definition. NEC diagnosis with ML outperformed traditional criteria in terms of specificity and sensitivity and opened a discussion for further examination and the development of new NEC definitions. Moreover, newer definitions were more accurate than the Bell-based criteria. In addition, feature importance analysis was performed, and the authors suggested that features taking values from a range, such as the volume of feeding at NEC onset and gestational age, can be more informative than simple, binary (yes/no) features.
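
Sensitivity and specificity, the two measures used for this comparison, can be computed from binary predictions as sketched below. The function name and the label encoding (1 for NEC, 0 for non-NEC) are assumptions made for illustration, and the example labels are synthetic.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 marks the disease."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = true_pos / (true_pos + false_neg)  # share of diseased cases detected
    specificity = true_neg / (true_neg + false_pos)  # share of healthy cases cleared
    return sensitivity, specificity


# Synthetic example: 3 diseased and 4 healthy cases.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))
```

Reporting both numbers matters for a rare disease, because a classifier that labels every infant as healthy would reach high accuracy while detecting no cases.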

Use of DL in decision support

In this section, we describe publications that use DL algorithms for NEC diagnosis. Van Druten et al.22 proposed a computer-aided diagnosis (CAD) system that consists of an ensemble (outputs of multiple algorithms are combined) of conventional ML and DL algorithms. The authors used AR images of participants with radiological patterns consistent with NEC as well as those without these patterns. Radiologists identified NEC-related patterns on AR images. No information about the study cohort used was noted. The classic ML algorithms include a feature extraction algorithm (e.g., texture-based feature extraction), feature selection, and classification. This study aimed to produce heatmaps for various imaging features to highlight NEC pathology on ARs. In the DL algorithm, ARs were automatically analysed using a deep neural network for their automatic classification. The CAD-based system compares and qualifies the prediction accuracy of conventional ML and DL approaches. As the final output, the algorithm uses a visualisation technique that highlights areas on the AR images with NEC features. The authors did not provide any information on the number of datasets used to develop the methods. Moreover, no quantitative evaluation was conducted.
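
The idea of an ensemble can be illustrated with a short sketch. The source does not describe how Van Druten et al. combine the outputs of their algorithms, so the weighted averaging used here, the 0.5 threshold, and the function names are assumptions for illustration only.

```python
def ensemble_probability(model_probabilities, weights=None):
    """Combine per-model probabilities into one score by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(model_probabilities)
    if len(weights) != len(model_probabilities):
        raise ValueError("one weight is required per model output")
    weighted_sum = sum(p * w for p, w in zip(model_probabilities, weights))
    return weighted_sum / sum(weights)


def ensemble_label(model_probabilities, threshold=0.5, weights=None):
    """Turn the combined probability into a binary decision."""
    return ensemble_probability(model_probabilities, weights) >= threshold


# Hypothetical outputs of three models for the same image.
print(ensemble_probability([0.9, 0.7, 0.2]))
print(ensemble_label([0.9, 0.7, 0.2]))
```

Averaging is only one of several combination rules; majority voting or a trained meta-classifier are common alternatives.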

Gao et al.23 proposed a multimodal AI-based system consisting of feature engineering, ML, and DL algorithms. Feature engineering in data science is a process of creating additional features from existing features (e.g., having features a and b, we engineer a new feature a/b). A multimodal (as opposed to unimodal) AI model uses different types of data when computing the output (e.g., the probability of NEC). They evaluated the proposed system using ARs and clinical data from a single institution. The study cohort included 2234 infants, including 1201 non-NEC, 622 NEC, and an independent group of 411 NEC patients, including surgical and medical NEC. Some infants were excluded from the study due to the lack of complete clinical parameters or poor image quality, and 827 infants (342 NEC and 385 non-NEC) were selected for the analysis. They used ARs and clinical data; hence, a multimodal approach. The authors identified significant features of ARs and clinical data that were closely related to the AI diagnosis and prediction of the success of surgical intervention for NEC. The authors found that the AI system was capable of predicting which NEC patients would have a higher likelihood of successful surgery. The limitations of this study include the lack of genetic information, microbiome data, and biochemical parameters of the infants. The authors concluded that AI could be used as an auxiliary diagnostic tool, with the results still to be confirmed in a prospective clinical trial.
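
The two concepts introduced above, feature engineering and multimodal input, can be sketched as follows. This is a generic illustration, not the pipeline of Gao et al.: the function names, the feature names a and b, and the example values are hypothetical.

```python
def add_ratio_feature(record, numerator, denominator, new_name):
    """Return a copy of `record` with an extra engineered feature numerator / denominator."""
    enriched = dict(record)
    divisor = record[denominator]
    # A zero divisor has no defined ratio, so the new feature is marked missing.
    enriched[new_name] = record[numerator] / divisor if divisor != 0 else None
    return enriched


def combine_modalities(image_features, clinical_features):
    """Concatenate features from two data types into one multimodal input vector."""
    return list(image_features) + list(clinical_features)


sample = {"a": 6.0, "b": 3.0}
enriched = add_ratio_feature(sample, "a", "b", "a_over_b")
print(enriched)

# Hypothetical image-derived and clinical feature vectors for one infant.
print(combine_modalities([0.12, 0.80], [enriched["a"], enriched["b"], enriched["a_over_b"]]))
```

Concatenation is the simplest way to fuse modalities; real systems often learn a separate representation for each data type before combining them.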

Lin et al.24 proposed a novel, interpretable neural network-based architecture for independent microbiota DNA sequences. The multicentre dataset came from two centres and contained 3595 stool samples from 261 at-risk infants, 75 of whom developed NEC. The data were collected noninvasively. In addition, 10 clinical metadata features that were collected and reported in both historical studies, including maternal information, such as age, parity, and biometric data, and details about the birth, were used. In this study, the authors used DL neural networks to estimate NEC risk. They introduced a novel ML method called the ‘growing bag’ analysis, which models time evolution. They found that NEC predictions can be made, on average, 8 days before disease onset. The system described in this study generates a longitudinal NEC risk score from a limited set of bacterial taxa and basic clinical metadata. It allows early and accurate NEC prediction, with a mean sensitivity and specificity of 86% and 90%, respectively.

Discussion

The number of publications on the use of ML in NEC diagnosis is small, as we discovered in this study. We found only eight publications using the inclusion and exclusion criteria. Some of those publications were conference papers, which typically undergo a less strict peer-review process, and some were missing key information, such as the number of patients. One commonality of all the papers that we found is the limitations, sometimes severe, in the size of the data that was used. ML models, particularly DL models, require high-quality large datasets for model training and testing. Obtaining such datasets for NEC is difficult owing to the rarity of the disease, which, in our opinion, will hinder the development of such models in the future compared with other AI applications for more common diseases. Medical institutions differ in the ways NEC patients are diagnosed and cared for, which makes the process of developing robust and generalisable AI models even more difficult. This heterogeneity of data from different sources puts even more demand on the amount needed to create an algorithm that can be successfully used across different institutions.

Notwithstanding the difficulties associated with the lack of data, we also found results that encourage the pursuit of AI as a decision support tool in NEC diagnosis. The best example is the recent work of Lin et al.24 The authors predicted the future occurrence of NEC in patients with impressive sensitivity and specificity of 86% and 90%, respectively. These results are encouraging and demonstrate the potential of AI in improving the care of patients with NEC.

In the reviewed work, AI was used to generate inferences from data acquired at the neonatal stage to diagnose NEC. AI can be used before conception and during the perinatal and neonatal stages. ML algorithms can help predict the success rate of in vitro fertilisation treatment and outcomes,25,26 as well as live births from embryo data.27 Using images of embryos, AI can increase the chances of successful implantation and development into pregnancy28 during the selection process. In the perinatal period, AI solutions help improve maternal and foetal care, thereby increasing the chances of successful delivery. The most common solutions are preterm birth risk assessment,29,30,31 foetal biometry measurement,32,33,34 foetal heart disease detection,35,36,37 and computer-assisted fetoscopic surgical treatment.38,39 In NICUs, AI is useful for monitoring patient vital signs and has been shown to predict life-threatening situations, including birth asphyxia, seizures, respiratory distress symptoms, or sepsis.40,41

NEC has an unknown aetiology, and factors present in the preconception, perinatal, and neonatal stages may contribute to the development of the disease in newborns. As such, an AI model capable of accurately diagnosing NEC may need to consider a whole range of factors and various types of retrospective data from all stages. As described in the previous paragraph, there are efforts to construct AI models applied to different tasks, and the idea of combining all those models and using the data on the entire spectrum leading to NEC may be a future solution to create a robust AI system for NEC diagnosis. However, we note that such a combined model would place even greater demands on the amount of data needed.

Recommendations from decision support systems are largely opaque, which means that recommendations are provided without an explanation of why they were made. This is a serious limitation, especially in high-stakes decision making, such as in medicine. Suppose that an AI tool diagnoses NEC in newborn children and indicates that a specific child with severe symptoms does not have to undergo surgery: how can a physician reconcile such a recommendation if he/she is of a different opinion? This is especially difficult if a recommendation comes without an explanation.42

Explaining the complex computations behind AI recommendations is a subject of worldwide research efforts and will be important for successful AI solutions used for NEC diagnosis. A new research area devoted to reliable AI systems, called explainable artificial intelligence (XAI), has emerged.43 XAI focuses on developing new algorithms or DL architectures that help understand the decisions made by AI systems or generate a proper explanation. For instance, gradient-weighted class activation mapping44 is a method for generating explanations (heatmaps) that can be superimposed over images analysed by AI. If AI assesses an AR for patterns of PI or other patterns consistent with NEC, the heatmap highlights the image regions where these structures are present. Other algorithms have also been designed to explain these decisions. These include model-agnostic explanations,45 which use linear relationships to simplify more complex models, and Shapley additive explanations, which generate explanations based on game theory.46 Unfortunately, as of 2022, all XAI methods have significant drawbacks and cannot be sufficiently generalised to make them more widely used, particularly in medicine.
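
The game-theoretic quantity behind Shapley-based explanations can be computed exactly for a handful of features, as sketched below. This is not the API of any XAI library, which typically rely on approximations; the function names, the feature names a, b, and c, and the toy value function are hypothetical.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    totals = {feature: 0.0 for feature in features}
    orderings = list(permutations(features))
    for ordering in orderings:
        present = set()
        for feature in ordering:
            before = value_fn(frozenset(present))
            present.add(feature)
            after = value_fn(frozenset(present))
            totals[feature] += after - before
    return {feature: total / len(orderings) for feature, total in totals.items()}


def toy_model_output(present_features):
    """Toy model score given the set of features that are available (hypothetical)."""
    score = 0.0
    if "a" in present_features:
        score += 2.0
    if "b" in present_features:
        score += 1.0
    if "a" in present_features and "b" in present_features:
        score += 1.0  # interaction between a and b, shared equally between them
    return score


print(shapley_values(toy_model_output, ["a", "b", "c"]))
```

The attributions sum to the difference between the model output with all features and with none, and the unused feature c receives zero. The number of orderings grows factorially with the number of features, which is why practical tools approximate this computation.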

There are opportunities to utilise AI decision support systems for the diagnosis and treatment of NEC. Improvements in the interpretation of the available data during the diagnostic process are a natural avenue for using AI, which is also reflected in the literature. NEC diagnosis is time and resource consuming and is often performed with AR imaging that uses ionising radiation. Non-ionising imaging using US is an alternative imaging technique for NEC with no harmful side effects. The value of US for NEC diagnosis has been widely recognised in recent years.47 However, it is difficult to perform and interpret, and AI may help physicians in performing high-quality examinations as well as in interpreting US images.

AI has the potential to be invaluable, especially for clinical practices with little experience in NEC diagnosis. The papers reviewed here demonstrate the potential of this approach. However, the road to clinical implementation is unclear, and no studies have documented successful AI implementation in a clinical setting. We found only eight relevant papers on the topic of this review, which is an important limitation of this study. Drawing strong conclusions about trends in the literature based on such a small sample size is impossible.

Conclusions

In this narrative review, we present the currently available literature on the use of AI and ML to diagnose NEC in newborns. Only a small number of publications relevant to this topic were found. We recognise that there is a substantial need for further research to fill this gap. AI, especially DL, has the potential to improve NEC diagnosis and provide predictions of treatment outcomes, as shown by reviewed work; however, no literature exists showing its clinical impact. We emphasise that the opaque predictions of DL models (black-box predictions) and the lack of large multi-institutional datasets evident from the review will hinder and slow down the development and implementation of clinical AI systems for NEC diagnosis in the near future.