The past, current, and future of neonatal intensive care units with artificial intelligence: a systematic review

Machine learning and deep learning are two subsets of artificial intelligence that involve teaching computers to learn and make decisions from data of many kinds. Most recent developments in artificial intelligence come from deep learning, which has proven revolutionary in almost all fields, from computer vision to the health sciences. Deep learning has significantly changed conventional clinical practice in medicine. Although some sub-fields of medicine, such as pediatrics, have been relatively slow to receive the critical benefits of deep learning, related research in pediatrics has also started to accumulate to a significant level. Hence, in this paper, we review recently developed machine learning and deep learning-based solutions for neonatology applications. Following the PRISMA 2020 guidelines, we systematically evaluate the roles of both classical machine learning and deep learning in neonatology applications, define the methodologies, including algorithmic developments, and describe the remaining challenges in the assessment of neonatal diseases. To date, the primary areas of focus in neonatology regarding AI applications have included survival analysis, neuroimaging, analysis of vital parameters and biosignals, and retinopathy of prematurity diagnosis. We have categorically summarized 106 research articles from 1996 to 2022 and discussed the pros and cons of each. We aimed to make this systematic review as comprehensive as possible. We also discuss possible directions for new AI models and the future of neonatology with the rising power of AI, suggesting roadmaps for the integration of AI into neonatal intensive care units.


Introduction
AI Tsunami.
Advances in artificial intelligence (AI) are constantly changing almost all fields, including healthcare; it is challenging to track the changes brought by AI as there is no single diagnosis), detection (i.e., localization) and segmentation (i.e., pixel-level classification in medical images). The evaluation metrics in those studies were the standard ones, such as sensitivity (true-positive rate), specificity (true-negative rate), false-positive rate, false-negative rate, receiver operating characteristic (ROC) curves, area under the ROC curve (AUC), and accuracy.
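To make these threshold-based metrics concrete, the following is a minimal sketch (not taken from any of the reviewed studies) that derives them from binary reference labels and binary model predictions. The function name `confusion_metrics` and the toy labels are illustrative assumptions; 1 denotes the positive class (e.g., disease present).

```python
def confusion_metrics(y_true, y_pred):
    """Threshold-based metrics for a binary task (1 = positive, 0 = negative).

    Illustrative helper; the name and return format are assumptions.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty
        # (e.g., the model never predicts the positive class).
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)  # true-positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true-negative rate
    ppv = ratio(tp, tp + fp)          # positive predictive value (precision)
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "FPR": ratio(fp, fp + tn),    # equals 1 - specificity
        "FNR": ratio(fn, fn + tp),    # equals 1 - sensitivity
        "PPV": ppv,
        "NPV": ratio(tn, tn + fn),
        "F1": ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }


# Usage with invented labels: 4 true positives in the reference, model finds 2.
m = confusion_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 0, 0, 1, 0, 0, 1])
print(m["sensitivity"], m["specificity"])  # 0.5 0.75
```

The example deliberately yields different sensitivity and specificity values, which illustrates why a single metric rarely suffices to describe a model.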
We review the past, current, and future of AI-based diagnostic and monitoring tools that might aid neonatologists in patient management and follow-up. We discuss several AI designs for electronic health records, image processing, and signal processing, analyze the merits and limits of newly created decision support systems, and outline future perspectives that clinicians and neonatologists might adopt in their routine diagnostic activities. AI has made significant breakthroughs in overcoming the limitations of conventional imaging approaches by identifying clinical variables and imaging features not easily visible to the human eye. Improved diagnostic skills could prevent missed diagnoses and aid diagnostic decision-making. The overview of our study is structured as illustrated in Figure 2. Briefly, our objectives in this review are:
• To explain the various AI models and evaluation metrics thoroughly and describe the principal features of the AI models.
• To categorize neonatology-related AI applications into macro-domains and to explain their sub-domains and the important elements of the applicable AI models.

How do ML and DL work?
AI is a broad concept covering the application of computing algorithms that can categorize, predict, or generate valuable conclusions from enormous data sets 5. Algorithms such as Naive Bayes, Genetic Algorithms, Fuzzy Logic, Clustering, Neural Networks, Support Vector Machines, Decision Trees, and Random Forests have been used as Machine Learning (ML) methods for more than three decades for detection, diagnosis, classification, and risk assessment in medicine 12,13. Conventional ML approaches for image classification rely on hand-engineered features, that is, visual descriptors and annotations derived from radiologists' expertise and encoded into algorithms.
Images, signals, gene expression data, electronic health records (EHR), and vital signs are examples of the various, often unstructured, data sources that make up medical data (Figure 3). Because these data are complex and heterogeneous, DL frameworks can take advantage of this heterogeneity by learning highly abstract representations during data analysis.
While ML requires manual (hand-crafted) selection of features from the incoming data and of the related transformation procedures, DL learns these features and transformations automatically, often more efficiently and effectively 5,12,13. DL discovers these components by analyzing a large number of samples with a high degree of automation 8. The literature on these ML approaches was already extensive before the development of DL 3,4,8. It is essential for clinicians to understand how a proposed ML model should enhance patient care. Since no single metric can capture all the desirable attributes of a model, it is customary to describe a model's performance with several different metrics. Unfortunately, many end-users find these measurements difficult to comprehend. In addition, it can be difficult to objectively compare models from different studies, and there is currently no method or tool that can compare models based on the same performance measures 14. In this part, the common ML and DL evaluation metrics are explained so that neonatologists can apply them in their own research design and in their reading of upcoming articles 14,15.

Term: Definition

True Positive (TP)
The number of positive samples that have been correctly identified

True Negative (TN)
The number of samples that were accurately identified as negative

False Positive (FP)
The number of samples that were incorrectly identified as positive

False Negative (FN)
The number of samples that were incorrectly identified as negative.

Accuracy (ACC)
The proportion of correctly identified samples to the total number of samples in the assessment dataset. Accuracy is limited to the range [0, 1], where 1 represents correctly predicting all positive and negative samples and 0 represents correctly predicting none of them.

Recall (REC)
Also called sensitivity or True Positive Rate (TPR): the proportion of correctly classified positive samples to all samples that actually belong to the positive class, TP / (TP + FN).

Specificity (SPEC)
The negative-class counterpart of recall (sensitivity): the proportion of correctly classified negative samples to all samples that actually belong to the negative class, TN / (TN + FP).

Precision (PREC)
The ratio of correctly classified samples to all samples assigned to the class; for the positive class, TP / (TP + FP).

Positive Predictive Value (PPV)
The proportion of correctly classified positive samples to all samples classified as positive, TP / (TP + FP); for the positive class it equals precision.

Negative Predictive Value (NPV)
The ratio of samples accurately identified as negative to all samples classified as negative.

F1 score (F1)
The harmonic mean of precision and recall, 2 × PREC × REC / (PREC + REC), which penalizes models that achieve a high value of one at the expense of the other.

Cross Validation
A validation technique often employed during model training in which the data are split into folds and each fold is used once for validation, with no duplication among the validation folds.

AUROC (Area under ROC curve -AUC)
The area under the ROC curve, which summarizes the true-positive rate (sensitivity) achieved across all false-positive rates. It is limited to the range [0, 1], where 1 represents perfect discrimination between cases and non-cases, 0.5 corresponds to chance-level ranking, and 0 represents a model whose rankings are completely reversed.

ROC
A curve created by plotting sensitivity (true-positive rate) against the false-positive rate (1 - specificity) at varying decision thresholds; it illustrates the performance of a particular predictive algorithm and allows readers to easily grasp the algorithm's value.

Overfitting
A modeling failure in which the model fits the training data too closely, yielding high training performance but low performance on test data.

Underfitting
A modeling failure in which the model is inadequately trained and performs poorly on both training and test data.

Dice Similarity Coefficient
Used in image segmentation analysis to measure the overlap between a predicted and a reference segmentation. It is limited to the range [0, 1], where 1 represents perfect overlap and 0 represents no overlap.

3. AI in Neonatology
AI is commonly used everywhere, from daily life to high-risk applications in medicine. Although adoption has been slower than in other fields, numerous studies investigating the use of AI in neonatology have begun to appear in the literature. These studies have used various imaging modalities, electronic health records, and ML algorithms, few of which have made it into the clinical workflow. However, no systematic review or forward-looking discussion has focused specifically on this field 16-18. Many studies were dedicated to introducing these systems into neonatology, but their success has been limited. Lately, research in this field has been moving in a more favorable direction thanks to exciting new advances in DL. This section covers neonatology applications with ML approaches.

ML Applications in Neonatal Mortality
Motivation: Neonatal mortality is a major contributor to child mortality. According to the World Health Organization, neonatal deaths account for 47 percent of all deaths in children under the age of five 19. Minimizing infant mortality worldwide by 2030 is therefore a priority 20,21. Approach: ML has been used to investigate infant mortality, its causes, and its prediction 21-27. A recent review covered studies enrolling 1.26 million infants born between 22 and 40 weeks of gestational age. Predictions were made as early as 5 minutes of life and as late as 7 days. On average, four models were developed per study; neural networks, random forests, and logistic regression were the most frequently used (58.3%). Two studies (18.2%) completed external validation, while five (45.5%) published calibration plots. Eight studies reported AUC, and five reported sensitivity and specificity. AUCs ranged from 58.3% to 97.0%. Average sensitivities ranged from 63% to 80% and specificities from 78% to 98%. Linear regression analysis was the best overall model, despite having 17 features. This analysis highlighted the most prevalent AI measures and predictors of neonatal mortality. Future research should focus on external validation, calibration, and implementation in healthcare applications 26.
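The AUC values quoted above summarize how well a model ranks infants who die above infants who survive, across all decision thresholds. As a minimal sketch, assuming each infant has a model risk score and a binary outcome (the scores and outcomes below are invented for illustration, not data from the reviewed studies), the AUC can be computed either as the probability that a randomly chosen positive case outscores a randomly chosen negative case (the Mann-Whitney formulation) or as the trapezoidal area under the empirical ROC curve; the two agree.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold from the highest score down."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_rank(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented example: 1 = outcome occurred, scores = hypothetical model risk estimates.
y = [1, 0, 1, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(round(auc_rank(y, s), 3))  # 0.778
```

The pairwise loop is quadratic in the number of cases, which is acceptable for illustration; library implementations use sorting instead.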

ML Applications in Neurodevelopmental Outcome
Motivation: Recent advances in neonatal care have decreased the incidence of severe prenatal brain injury and increased the survival rates of preterm babies 28. However, even when routine radiological imaging reveals no signs of brain damage, this population remains at significant risk of adverse neurodevelopmental outcomes 29-32. It is essential to discover early indicators of abnormal brain development that might guide the treatment of preterm children at greater risk of adverse neurodevelopmental consequences 33,34.
Approach: Morphological studies have demonstrated that preterm birth is linked to reduced brain volume and to alterations in cortical folding, axonal integrity, and microstructural connectivity 35,36. Studies concentrating on functional markers of brain maturation, such as those derived from resting-state functional connectivity (rsFC) analyses of blood-oxygen-level-dependent (BOLD) fluctuations, have revealed further impacts of prematurity on the developing connectome, including decreased network-specific connectivity 34,37,38. Many studies have investigated brain connectivity in preterm infants 37,39,40, brain structure in neonates 41, and neonatal brain segmentation 42 with the help of ML methods. Similarly, one of the most important neurodevelopmental outcomes is the neurocognitive evaluation at 2 years of age. Studies have evaluated morphological brain changes in relation to neurocognitive outcome 43-45 and brain age prediction 46,47. Near-term regional white matter (WM) microstructure on diffusion tensor imaging (DTI) has been found to predict neurodevelopment in preterm infants using exhaustive feature selection with cross-validation 44, and multivariate models of near-term structural MRI and WM microstructure on DTI might help identify preterm infants at risk for language impairment and guide early intervention 43,45 (Table 4). One study that evaluated the effects of PPAR gene activity on brain development with ML methods 48 revealed a strong association between abnormal brain connectivity and PPAR gene signaling in abnormal white matter development: inhibited brain growth in individuals exposed to early extrauterine stress is controlled by genetic variables, and PPARG signaling has a previously unknown role in cerebral development 48 (Table 2).
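Exhaustive feature selection with cross-validation evaluates every candidate feature subset by its cross-validated performance and keeps the best one. The sketch below is a hypothetical illustration only: it uses a simple nearest-centroid classifier, contiguous k-fold splits, and invented toy data in standard-library Python, and it is not the model, features, or data of the study cited as 44. The helper names are assumptions.

```python
from itertools import combinations


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, non-overlapping validation folds.

    In practice the data would be shuffled (or stratified) first.
    """
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val, features):
    """Fit per-class centroids on the chosen features; return validation accuracy."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def predict(x):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(features, centroids[c])))

    return sum(1 for x, y in zip(X_val, y_val) if predict(x) == y) / len(y_val)


def exhaustive_selection(X, y, k=3, max_size=2):
    """Return (best feature subset, mean k-fold accuracy) over all subsets up to max_size."""
    best_subset, best_score = None, -1.0
    for size in range(1, max_size + 1):
        for subset in combinations(range(len(X[0])), size):
            scores = []
            for fold in k_fold_indices(len(X), k):
                held_out = set(fold)
                train = [i for i in range(len(X)) if i not in held_out]
                scores.append(nearest_centroid_accuracy(
                    [X[i] for i in train], [y[i] for i in train],
                    [X[i] for i in fold], [y[i] for i in fold], subset))
            mean_score = sum(scores) / len(scores)
            if mean_score > best_score:  # strict: ties keep the smaller, earlier subset
                best_subset, best_score = subset, mean_score
    return best_subset, best_score


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.1, 5.0, 3.0], [0.9, 4.0, 1.0], [0.2, 1.0, 2.0],
     [1.1, 2.0, 3.0], [0.0, 3.0, 1.0], [1.0, 5.0, 2.0]]
y = [0, 1, 0, 1, 0, 1]
print(exhaustive_selection(X, y, k=3))  # ((0,), 1.0)
```

The number of subsets grows combinatorially with the number of features, so exhaustive search is practical only for small candidate sets such as a handful of regional DTI measures.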
As an alternative to morphological studies, neuromonitoring has proven to be an important tool for which ML methods have frequently been employed, for example in automatic seizure detection from video EEG 49-51 and from EEG biosignals in infants and neonates with HIE 52-56. ML methods have also been used to detect artifacts 57,58, sleep states 51, rhythmic patterns 59, and burst suppression in extremely preterm infants 60,61 from EEG records. EEG records are also often used for HIE grading 62. Across different neonatal EEG datasets, these studies reported AUCs from 89% to 96% 52,53,63 and accuracies of 78% to 87% 62,64 for seizure detection with different ML methods (Table 3).
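Many classical ML pipelines for EEG follow the same pattern: split the recording into short epochs, compute hand-crafted features for each epoch, and pass the features to a classifier. As a hypothetical sketch of the first two steps (the 2-second epoch length, the line-length and standard-deviation features, the sampling rate, and the fixed threshold are illustrative assumptions, not parameters from the cited studies), a simple threshold stands in for the trained classifier:

```python
import math


def epochs(signal, fs, epoch_sec):
    """Split a 1-D signal sampled at fs Hz into non-overlapping epochs of epoch_sec seconds."""
    step = int(fs * epoch_sec)
    return [signal[i:i + step] for i in range(0, len(signal) - step + 1, step)]


def line_length(epoch):
    """Sum of absolute sample-to-sample differences; grows with high-amplitude rhythmic activity."""
    return sum(abs(b - a) for a, b in zip(epoch, epoch[1:]))


def epoch_features(epoch):
    """Small hand-crafted feature vector for one epoch."""
    mean = sum(epoch) / len(epoch)
    variance = sum((x - mean) ** 2 for x in epoch) / len(epoch)
    return {"line_length": line_length(epoch), "std": math.sqrt(variance)}


def flag_epochs(signal, fs, epoch_sec=2.0, ll_threshold=50.0):
    """Flag epochs whose line length exceeds a threshold (placeholder for a trained classifier)."""
    return [epoch_features(e)["line_length"] > ll_threshold for e in epochs(signal, fs, epoch_sec)]


# Synthetic signal at an assumed 16 Hz: 2 s of low-amplitude background, then 2 s of
# high-amplitude 3 Hz rhythmic activity.
fs = 16
quiet = [math.sin(2 * math.pi * n / fs) for n in range(32)]
burst = [20 * math.sin(2 * math.pi * 3 * n / fs) for n in range(32)]
print(flag_epochs(quiet + burst, fs))  # [False, True]
```

In the reviewed work, the threshold would be replaced by a model such as an SVM or random forest trained on many such features per epoch.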

ML Applications in Predictions of Prematurity Complications (BPD, PDA, NEC, ROP)
Motivation: Patent ductus arteriosus (PDA) is another important cause of mortality and morbidity in the NICU. The ductus arteriosus is a normal fetal vessel that lets blood bypass the lungs while oxygenation is supplied by the mother through the placenta; in newborns, it closes functionally by 72 h of age 65. Nevertheless, 20-50% of infants with a gestational age (GA) <32 weeks still have a patent ductus arteriosus on day 3 of life 66, as do up to 60% of neonates with a GA <29 weeks. The presence of PDA in preterm neonates is associated with higher mortality and morbidity, and physicians should weigh whether PDA closure might improve the likelihood of survival against the burden of adverse effects 67-70.
Approach: ML methods have been applied to PDA detection from EHR 71 and from auscultation records 72. In one study, 47 perinatal factors from 10,390 very low birth weight infants were analyzed with 5 different ML methods, predicting PDA with an accuracy of 76% 71; in another, 250 auscultation records were analyzed with XGBoost, with an accuracy of 74% 72 (Table 2).
Bronchopulmonary dysplasia (BPD) is a leading cause of infant death and morbidity in preterm births. While various biomarkers have been linked to the development of respiratory distress syndrome (RDS), no clinically relevant prognostic tests for BPD are available at birth 73. ML studies have aimed to predict BPD from data available at birth 74, from gastric aspirate content 73, and from genetic data 75. BPD could be predicted with an accuracy of up to 83.25% in the best-case scenario 74 (Table 5); ML analysis of responsible genes predicted BPD development with an AUC of 90% 75 (Table 3); and SVM analysis combining gastric aspirate obtained after birth with clinical information predicted BPD development with a sensitivity of 88% 73 (Table 5).
In the context of ML-based BPD prediction, long-term invasive ventilation is considered one of the most important risk factors for BPD, nosocomial infections, and prolonged hospital stay. Several ML studies have used long-term invasive ventilation data to predict extubation failure 76-78 and the optimal weaning time 79. These studies predicted extubation failure with accuracies from 83.2% to 87% 76-78 (Tables 2 and 3).
Retinopathy of prematurity (ROP) is another area of interest for machine learning in neonatology 80. ROP is a serious complication of prematurity that affects the blood vessels of the retina; it is a leading cause of childhood blindness in high-income and middle-income countries, including the United States, among very low-birthweight (<1500 g), very preterm (28-32 weeks), and extremely preterm (less than 28 weeks) infants 80. Because of a shortage of ophthalmologists available to treat ROP patients, there has been increased interest in telemedicine and artificial intelligence as solutions for diagnosing ROP 80. ML methods such as Gaussian mixture models have been employed to diagnose and classify ROP from retinal fundus images 49,80,81, and the i-ROP system 81 was reported to classify pre-plus and plus disease with 95% accuracy. This was close to the performance of three individual experts (96%, 94%, and 92%, respectively) and much higher than the mean performance of 31 non-experts (81%) 81 (Table 2).

Other ML Applications in Neonatal Diseases
Motivation: Electronic health records (EHR) and medical records have been used as inputs to ML algorithms for the diagnosis of congenital heart defects 82, HIE (hypoxic ischemic encephalopathy) 83, IVH (intraventricular hemorrhage) 84, and neonatal jaundice 85, and for the prediction of NEC (necrotizing enterocolitis) 86, neurodevelopmental outcome in ELBW (extremely low birth weight) infants 87,88, and neonatal surgical site infections 89 (Table 5).
Approach: Electronically captured physiologic data can be treated as signal data and have been analyzed with ML to detect artifact patterns 90 and to predict infant morbidity 91. Vital parameters (respiratory rate, heart rate) captured electronically during the first 3 hours of life in 138 infants (≤34 weeks' gestation, birth weight ≤2000 g) predicted overall morbidity with an AUC of 91% 91 (Table 5).
In addition to physiologic data, clinical data collected up to 12 hours after cardiac surgery in infants with HLHS (hypoplastic left heart syndrome) and TGA (transposition of the great arteries) were analyzed to predict the occurrence of PVL (periventricular leukomalacia) after surgery 92. The F-scores for infants with and without HLHS were 88% and 100%, respectively 92 (Table 5).
Voice recordings have been used to detect respiratory phases in infant cry 93, to classify neonatal diseases from infant cry 94, and to evaluate asphyxia from infant cry recordings 95. Voice recordings of 35 infants were analyzed with an ANN, with an accuracy of 85% 94. Cry recordings of 14 infants in their first year of life were analyzed with SVM and GMM, quantifying respiratory phases and crying rate with an accuracy of 86% 93 (Table 3). Support Vector Machine (SVM) was the most commonly used method in the diagnosis of metabolic disorders of newborns, including MMA (methylmalonic acidemia) 96, PKU (phenylketonuria) 97,98, and MCADD (medium-chain acyl-CoA dehydrogenase deficiency) 97. In the Bavarian newborn screening program, ML analysis of dried blood samples increased the positive predictive value for PKU (71.9% versus 16.2%) and for MCADD (88.4% versus 54.6%) 97 (Table 3).

Neonatology with Deep Learning
In this section, the main uses of Deep Learning (DL) in clinical image analysis are grouped into three categories: classification, detection, and segmentation. Classification assigns a label to an image (for example, the presence of a specific finding), detection locates one or more findings within an image, and segmentation divides an image into multiple regions by labeling each pixel 8,12,112-118.
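A classifier therefore outputs one label per image, a detector outputs a set of bounding boxes, and a segmenter outputs a per-pixel mask. Detection outputs are commonly scored by the intersection-over-union (IoU) between predicted and reference boxes, with a prediction counted as a true positive when its IoU exceeds a chosen threshold. The following is a minimal sketch under stated assumptions: boxes are given as (x_min, y_min, x_max, y_max), matching is greedy, and the 0.5 threshold is a common convention rather than a value taken from the reviewed studies.

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(predicted, reference, iou_threshold=0.5):
    """Greedily match each predicted box to at most one unmatched reference box."""
    unmatched = list(reference)
    tp = 0
    for p in predicted:
        best = max(unmatched, key=lambda r: box_iou(p, r), default=None)
        if best is not None and box_iou(p, best) >= iou_threshold:
            unmatched.remove(best)  # each reference finding can be matched only once
            tp += 1
    return tp


# Hypothetical example: one prediction matches a reference finding, one is a false positive.
predicted = [(0, 0, 2, 2), (10, 10, 12, 12)]
reference = [(0, 0, 2, 2), (5, 5, 6, 6)]
print(count_true_positives(predicted, reference))  # 1
```

The resulting true-positive count, together with the unmatched predictions (false positives) and unmatched references (false negatives), feeds into the same sensitivity and precision definitions used for classification.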

Neuroradiological Evaluation with AI in Neonatology
Motivation: Neonatal neuroimaging can establish early indicators of neurodevelopmental abnormality, enabling early intervention during a period of maximal neuroplasticity and rapid cognitive and motor development 31,44. Deep Learning (DL) methods may enable diagnosis earlier than clinical signs would allow.
Approach: Imaging an infant's brain with MRI can be challenging because of lower tissue contrast, substantial tissue inhomogeneities, regionally heterogeneous image appearance, large age-related intensity variations, and severe partial volume effects due to the smaller brain size. Since most existing tools were created for adult brain MRI data, infant-specific computational neuroanatomy tools have only recently been developed. A typical pipeline for early prediction of neurodevelopmental disorders from infant structural MRI (sMRI) consists of three basic phases: (1) image preprocessing, tissue segmentation, regional labeling, and extraction of image-based features; (2) surface reconstruction, surface correspondence, surface parcellation, and extraction of surface-based features; and (3) feature preprocessing, feature extraction, AI model training, and prediction for unseen subjects 119. Segmentation of the newborn brain is difficult because of the decreased SNR (signal-to-noise ratio) resulting from the shorter scanning durations imposed by anticipated motion, and because of the small size of the neonatal brain. In addition, the cerebrospinal fluid (CSF)-gray matter border has an intensity profile comparable to that of the mostly unmyelinated white matter (WM), resulting in significant partial volume effects. Furthermore, the high variability resulting from the fast growth of the brain and the ongoing myelination of WM imposes additional constraints on the development of effective segmentation techniques. Several non-DL approaches for segmenting newborn brains have been presented over the years. These methods may be broadly classified as parametric 120-122, classification-based 123, multi-atlas fusion 124,125, and deformable models 126,127. The Dice Similarity Coefficient is used to evaluate image segmentation: the higher the Dice score, the better the agreement with the reference segmentation 13 (Table 1).
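For a predicted segmentation A and a reference segmentation B, the Dice Similarity Coefficient is DSC = 2|A ∩ B| / (|A| + |B|), computed separately for each tissue class. The sketch below is a minimal illustration that assumes 2-D label maps stored as nested lists and a hypothetical label coding (1 = CSF, 2 = gray matter, 3 = WM); real pipelines would typically operate on 3-D image arrays.

```python
def dice(mask_a, mask_b, label=1):
    """Dice coefficient for one label between two label maps with the same number of pixels."""
    a = [v == label for row in mask_a for v in row]
    b = [v == label for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Convention used here: if the label is absent from both masks, they agree perfectly.
    return 2.0 * intersection / total if total else 1.0


def per_tissue_dice(pred, ref, labels):
    """Dice per tissue class; labels maps integer codes to names (hypothetical coding)."""
    return {name: dice(pred, ref, code) for code, name in labels.items()}


# Toy 3 x 3 label maps (invented): the prediction misses one CSF and one WM pixel.
ref = [[0, 1, 1], [0, 2, 2], [3, 3, 0]]
pred = [[0, 1, 0], [0, 2, 2], [3, 0, 0]]
scores = per_tissue_dice(pred, ref, {1: "CSF", 2: "GM", 3: "WM"})
```

Reporting Dice per tissue class, rather than one pooled score, exposes exactly the kind of difficulty noted for myelinated versus unmyelinated WM, where errors in a small class can be hidden by good agreement in large ones.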
In the NeoBrainS12 2012 MICCAI Grand-Challenge (https://neobrains12.isi.uu.nl), T1W and T2W images were presented with manually segmented structures to assess strategies for segmenting neonatal tissue 120 . Most methods were found to be accurate, but classification-based approaches were particularly precise and sensitive. However, segmentation of myelinated vs unmyelinated WM remains a difficulty since the majority of approaches 120 failed to consistently obtain reliable results.
Future research in neonatal brain segmentation is expected to involve more comprehensive neural segmentation networks. Current studies aim to identify efficient networks capable of producing accurate and dependable segmentations and to compare them with existing conventional computer vision techniques. When comparing previous efforts on newborn brain segmentation, the small amount of high-quality labeled data must also be recognized as a significant restriction 127.

Evaluation of Prematurity Complications with DL in Neonatology
Motivation: In the discussion above, we addressed the primary applications of deep learning (DL) in disease prediction. These include DL for analyzing conditions such as PDA (patent ductus arteriosus) 116 (Table 6).
Sleep-state detection to protect sleep in the NICU is another ongoing topic. DL has been used to analyze infant EEGs and identify sleep states, since interruptions of sleep states have been linked to problems in neuronal development 140. Automated sleep state detection from EEG records 141,142 and from ECG monitoring parameters 143 has been demonstrated with DL. The underperformance of the all-state classification (kappa score 0.33 to 0.44) was likely due to the difficulty of differentiating subtle changes between states and to a lack of sufficient training data for minority classes 143 (Table 6).
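The kappa scores above measure agreement between automated and reference sleep-state labels beyond what would be expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected if both labelings were independent. A minimal sketch of Cohen's kappa for multi-class labels follows; the state abbreviations (AS = active sleep, QS = quiet sleep, W = wake) and the label sequences are illustrative assumptions, not the labeling scheme or data of the cited study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both labelings pick the same class independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both labelings use one identical class throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical per-epoch labels from an expert and from a model.
expert = ["AS", "AS", "QS", "QS", "AS", "W"]
model = ["AS", "QS", "QS", "QS", "AS", "AS"]
print(round(cohen_kappa(expert, model), 3))  # 0.429
```

In this toy example the raw agreement is 67%, but kappa is considerably lower because much of that agreement would be expected by chance given how often each state occurs; this is why kappa is preferred for imbalanced sleep-state data with minority classes.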
Deep Learning (DL) has been found to be effective in the real-time evaluation of cardiac MRI for congenital heart disease 144. Studies have shown that DL can accurately calculate ventricular volumes from images reconstructed using a residual UNet, with values not statistically different from those of gold-standard cardiac MRI. This technology has the potential to be particularly beneficial for infants and critically ill individuals who are unable to hold their breath during imaging 144 (Table 6).
DL-based 3D CNN algorithms have been used to demonstrate automated classification of brain dysmaturation from neonatal brain MRI 145. In one study, brain MRIs of 90 term neonates with congenital heart diseases and 40 term-born healthy controls were analyzed with this method, which achieved an accuracy of 98%. This technique could be useful in detecting brain dysmaturation in neonates with congenital heart diseases 145 (Table 6).
DL algorithms have been used to classify neonatal diseases from thermal images 146-149. These studies analyzed neonatal thermograms to determine the health status of infants and achieved good AUC scores 146-149. However, these studies did not report the sample size or any clinical information (Table 6).
Two studies with large sample sizes reported notable results on the effect of nutrition practices in the NICU 128 and on wireless sensors in the NICU 150. The nutrition study found that nutrition practices were associated with discharge weight and BPD 128, illustrating how unbiased ML techniques can be used to effectively bring about changes in clinical practice 128. Novel wireless sensors can improve monitoring, prevent iatrogenic injuries, and encourage family-centered care 150. Early validation results show performance equal to that of standard-of-care monitoring systems in high-income nations. Furthermore, reusable sensors and compatibility with low-cost mobile phones may reduce monitoring expenses in the future. Large-scale deployment of these techniques in low-income countries may improve outcomes for newborns 150 (Table 6).

Discussions
The studies in neonatology with AI were categorized according to the following criteria:
i) whether the study was performed with ML or DL; ii) whether imaging or non-imaging data were used; iii) the aim of the study: diagnosis or other predictions.
Most studies in neonatology were performed with ML methods, reflecting the pre-DL era. We have listed 12 ML studies that used imaging data for diagnosis and 33 studies that used non-imaging data for diagnosis. Imaging data studies cover biliary atresia (BA) diagnosis from stool color 99, postoperative enteral nutrition in neonatal high intestinal obstruction 100, and functional brain connectivity in preterm infants 34. DL applications have been studied less than ML applications. DL studies with imaging and non-imaging data focused on brain segmentation 117,127,133,135,145, IVH diagnosis 115, EEG analysis 141,142, neurocognitive outcome 134, and PDA and ROP diagnosis 129-131. Upcoming articles and research, however, are likely to come predominantly from the DL field.
It is worth noting that several other articles and studies on the application of AI in neonatology have also been published. However, the majority of these studies do not contain enough detail, are difficult to compare side by side, and do not give the clinician a thorough picture of the applications of AI in the general healthcare system 25,26,41,43-45,47,73-75,87,89,92,105,111,127,132,135,142,145,152.
There are several limitations in the application of AI in neonatology, including the lack of prospective designs, lack of clinical integration, small sample sizes, and single-center evaluations. DL has shown promise in bioscience and biosignals, in extracting information from clinical images, and in combining unstructured and structured data in EHR. However, some issues limit the success of DL in medicine. In the following paragraphs, we examine the key concerns related to DL, which we group into six components: 1) difficulties in clinical integration, including the selection and validation of models; 2) the need for expertise in decision mechanisms, including the requirement for human involvement in the process; 3) lack of data and annotations, including the quality and nature of medical data, the distribution of data in the input database, and the lack of open-source algorithms and reproducibility; 4) lack of explanations and reasoning, including the lack of explainable AI to address the "black-box" problem; 5) lack of collaboration across multiple institutions; and 6) ethical concerns 3,7,12,13,153,154.  106. The algorithm is not available, and the code is not shared. Another is a study showing the physiologic effects of music in premature infants 156; however, we could not find any AI analysis in this study. The third, "Rebooting Infant Pain Assessment: Using Machine Learning to Exponentially Improve Neonatal Intensive Care Unit Practice (BabyAI)", has been newly posted and is recruiting 157. In summary, only one prospective multicenter randomized AI study has been published with its results.

The need for expertise in the decision mechanisms
For neonatologists to decide whether to implement a system's recommendation, the system may need to present supporting evidence 43,44,73,152.
Many proposed AI solutions in medicine are not intended to replace the doctor's decision or expertise, but rather to serve as helpful assistance. In the effort to improve neonatal survival without sequelae, AI may be a game changer in neonatology. Neonatology relies on multidisciplinary collaboration in patient management, and AI has the potential to achieve previously unimaginable levels of efficacy in neonatology if more resources and physician support were allocated to it.
Given that many medical professionals have a limited understanding of DL, establishing contact and communication between data scientists and medical specialists may be difficult. Many medical professionals, including pediatricians and neonatologists in our case, are unfamiliar with AI and its applications because of a lack of exposure to the field as end users. However, we also acknowledge the growing efforts of many scientists and institutions to build bridges through conferences, workshops, and courses; clinicians have successfully started to lead AI efforts, including software coding schools run by clinicians 158-162.

Lack of imaging data and annotations and reproducibility problems
There is rising interest in building deep learning approaches to predict neurological abnormalities from connectome data; however, their use in preterm populations has been limited 33,37-40. As in most DL applications, training such models often requires big datasets 163; however, large neuroimaging datasets are either unavailable or difficult and expensive to acquire, especially in pediatrics. The success of DL methods currently relies on well-labeled data and on high-capacity models that require many iterative updates across many labeled examples, and obtaining millions of labeled examples is an extreme challenge; as a result, neonatal AI applications have not yet made a comparable leap.
As a side note, accurate labeling always requires physician effort and time, which overcomplicates the current challenges. Unfortunately, there is no established collaboration between physicians and data scientist at a large scale that can ease some of the challenges (data gathering/sharing and labeling). Nonetheless, once these problems are addressed, DL can be used in prevention and diagnosis programs for optimal results, radically transforming clinical practice. In the following, we envision the potential of DL to transform other imaging modalities in the context of neonatology and child health.
As mentioned earlier, the requirement for a massive volume of data is a significant barrier. The quantity of data an AI or ML system needs can grow with the sophistication of its underlying architecture; deep neural networks, for example, have particularly high data needs. It is not enough for the data to be sufficient in quantity; they also need to be of good quality in terms of cleaning and variability (both ANNs and DNNs are less likely to overfit when the training data are sufficiently varied). Collecting a substantial amount of clean, verified, and varied data for the many possible uses in neonatology may be difficult. For this reason, a data repository including EHR 152 and clinical variables has been shared with neonatal researchers. Approaches for addressing the lack of labeled, annotated, verified, and clean datasets include (1) building and training a very shallow network (only a few thousand parameters) and (2) data augmentation. However, data augmentation techniques have been reported to be of limited help in medical imaging and other medical settings 164 .
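The phrase "only a few thousand parameters" can be made concrete with simple arithmetic: a fully connected layer with n_in inputs and n_out outputs has n_in × n_out weights plus n_out biases. The Python sketch below counts trainable parameters for two hypothetical multilayer perceptrons; the layer widths are illustrative assumptions, not architectures taken from the reviewed studies.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes: list[int]) -> int:
    """Total trainable parameters of a multilayer perceptron.

    layer_sizes lists the width of every layer, input first, output last.
    """
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical shallow classifier: 64 input features, one hidden layer
# of 32 units, 2 output classes.
shallow = [64, 32, 2]
# Hypothetical wider network on 4096 inputs (e.g., a flattened 64x64 image).
wide = [4096, 1024, 1024, 2]

print(mlp_params(shallow))  # → 2146
print(mlp_params(wide))     # → 5246978
```

The shallow network has roughly two thousand parameters while the wider one has over five million, which helps explain why data needs grow with architectural sophistication.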
In neonatal imaging, high-quality labels and medical imaging data are exceedingly scarce. One of the few comparable neonatal data sets available that we are aware of includes just ten individuals 124,165,166 . This pattern holds even in more recent research, with the majority of studies involving little more than 20 individuals 123 . Regardless of sample size and technology, the ability to generalize to new data is crucial in image segmentation, especially given the wide range of MRI contrasts and the variations in scanners and sequences between institutions. Moreover, DL models are known to generalize poorly to unseen data. This is especially important for translating research into practice, since (1) images acquired under different conditions exhibit a distribution shift and (2) models must be retrained as such images become available. Adopting a continual learning strategy is the most practical way to handle this challenge: deep models are progressively retrained while avoiding the loss of knowledge acquired from previously seen data sets (catastrophic forgetting), which may no longer be available during retraining. This field is expected to advance 127 . Another new technique is transfer learning 167 , which stores knowledge learned from solving problems in a data-rich domain and applies it to a new problem in a data-scarce domain. Transfer learning is a key tool for tackling the fundamental difficulty of DL, namely insufficient training data 168-170 .
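Transfer learning can be sketched as reusing a frozen feature extractor learned on a data-rich source domain and training only a small classification head on the scarce target data. In the minimal NumPy sketch below, the "pretrained" backbone `W_BACKBONE` is a hypothetical fixed random projection standing in for real pretrained weights, and the 40-sample target task is synthetic; fine-tuning some backbone layers is a common alternative not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# HYPOTHETICAL stand-in for a backbone pretrained on a large source domain.
# A real pipeline would load pretrained network weights; a fixed random
# projection followed by ReLU keeps this sketch self-contained.
W_BACKBONE = rng.normal(size=(20, 64)) / np.sqrt(20)


def extract_features(x: np.ndarray) -> np.ndarray:
    """Frozen feature extractor: W_BACKBONE is read, never updated."""
    return np.maximum(x @ W_BACKBONE, 0.0)


def predict_proba(feats: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Logistic-regression head: sigmoid of a linear score."""
    return 1.0 / (1.0 + np.exp(-(feats @ w + b)))


def log_loss(p: np.ndarray, y: np.ndarray) -> float:
    """Binary cross-entropy, clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def train_head(feats: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 500):
    """Fit only the small head on the target data by gradient descent."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = predict_proba(feats, w, b) - y  # d(loss)/d(logit) per sample
        w -= lr * feats.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


# Small, data-scarce target task (synthetic: 40 samples, 20 features each).
x_target = rng.normal(size=(40, 20))
y_target = (x_target[:, 0] + x_target[:, 1] > 0).astype(float)

feats = extract_features(x_target)
w, b = train_head(feats, y_target)
print(log_loss(predict_proba(feats, w, b), y_target))
```

Only the 65 head parameters are fitted to the 40 target samples, while the backbone's 1,280 weights are reused unchanged, which is the sense in which knowledge is carried over from the data-rich domain.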
Most of the papers did not release their algorithms as open source. Even when an algorithm is available, it should be clear whether separate training and testing datasets were used, and studies should state which validation method was chosen. Reproducibility is crucial for comparing the success of algorithms.
Methodological bias is another issue. Research is frequently based on databases and guidelines from other nations that may or may not have patient populations similar to ours 44 . Ideally, a database would contain only data applicable to the specific problem to be solved; however, obtaining the relevant information may be difficult given the number of databases.
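One common way to keep training and testing data separate is to split at the patient level, so that samples from the same infant (for example, several MRI slices or monitoring windows) never appear on both sides. The standard-library sketch below implements a hypothetical `patient_level_kfold` helper under that assumption; scikit-learn's `GroupKFold` offers a library implementation of the same idea.

```python
import random
from collections import defaultdict


def patient_level_kfold(records, k=5, seed=42):
    """Yield (train, test) lists in which no patient appears in both.

    records: list of (patient_id, sample) pairs; several samples may
    belong to one patient.
    """
    by_patient = defaultdict(list)
    for pid, sample in records:
        by_patient[pid].append((pid, sample))
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)       # reproducible shuffle
    folds = [patients[i::k] for i in range(k)]  # assign patients, not samples
    for i in range(k):
        test_ids = set(folds[i])
        train = [r for pid in patients if pid not in test_ids for r in by_patient[pid]]
        test = [r for pid in folds[i] for r in by_patient[pid]]
        yield train, test


# Synthetic example: 10 patients with 3 samples each.
records = [(f"patient{p:02d}", s) for p in range(10) for s in range(3)]
for fold, (train, test) in enumerate(patient_level_kfold(records, k=5)):
    overlap = {pid for pid, _ in train} & {pid for pid, _ in test}
    print(fold, len(train), len(test), overlap)  # e.g. "0 24 6 set()"
```

Reporting the split unit (patient vs. sample), the number of folds, and the random seed is what allows another group to reproduce the evaluation.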

Lack of explanations and reasoning
The trustworthiness of algorithms is another obstacle 171 . The most widely used deep learning models follow a black-box methodology: the model receives an input and outputs a prediction without explaining its reasoning. In high-stakes medical settings, this can be dangerous. Some models, on the other hand, incorporate human judgment (human in the loop) or provide interpretability maps or explainability layers to illuminate the decision-making process. Trustworthiness is essential for widespread adoption, especially in neonatology, where AI is expected to have a significant impact.
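Interpretability maps can be produced in several ways. One simple, model-agnostic option is occlusion sensitivity: mask one image patch at a time and record how much the model's output drops. The NumPy sketch below assumes a model that maps a 2-D image to a scalar score; `toy_model` is a hypothetical stand-in that only looks at the upper-left 8×8 region, so the resulting map should highlight exactly that region.

```python
import numpy as np


def occlusion_map(model, image: np.ndarray, patch: int = 4, baseline: float = 0.0) -> np.ndarray:
    """Occlusion sensitivity: score drop when each patch is set to a baseline value."""
    h, w = image.shape
    ref = model(image)
    heat = np.zeros_like(image, dtype=float)
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = baseline
            # Large positive values mark regions the prediction depends on.
            heat[r:r + patch, c:c + patch] = ref - model(occluded)
    return heat


def toy_model(img: np.ndarray) -> float:
    """HYPOTHETICAL model whose score depends only on the upper-left 8x8 region."""
    return float(img[:8, :8].mean())


image = np.ones((16, 16))
heat = occlusion_map(toy_model, image, patch=4)
print(heat[0, 0], heat[15, 15])  # → 0.25 0.0
```

The map is computed only from inputs and outputs, so the same procedure applies to any image model, at the cost of one forward pass per patch.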

Lack of collaboration efforts (multi-institutions) and privacy concerns
New collaborations have been forged because early detection and treatment of diseases affecting children, who make up a large portion of the world's population, will change treatment and follow-up. Monitoring systems that track mortality and treatment activity across multi-site data will help. At the same time, privacy and security must be protected; one example is the need for consent to the processing of personal health data by AI systems 44 .

AI Ethics
While AI holds great promise for enhancing healthcare, it also raises significant ethical concerns. Ethical issues in health AI include informed consent, bias, safety, transparency, patient privacy, and resource allocation, and their solutions are complicated and difficult to negotiate 172 . In neonatology, crucial decisions frequently carry a complex and challenging ethical component, and interdisciplinary approaches are required for progress 173 . The limits of viability, life-sustaining treatments 174 , and differing regulations worldwide further complicate the use of AI in neonatology. How an ethics framework should be implemented in neonatal AI has not yet been reported, and transparency is needed for AI to be trustworthy.

Concluding Remarks
Real-world applications of AI could bring several benefits, including faster execution; reduced direct and indirect costs; improved diagnostic accuracy; more efficient healthcare delivery ("algorithms work without a break"); and access to clinical information even for people who would otherwise be unable to benefit from healthcare because of geographic or economic constraints 4 .
New DL technologies are expected to help achieve accurate diagnoses while limiting the number of additional invasive procedures, and easy-to-implement platforms will enable regular, complete follow-up of health data for patients who cannot access their records owing to a physician shortage, thereby reducing health costs.
AI will likely have a profound impact on the future of neonatal intensive care units and of healthcare in general. This article aims to give neonatologists in the AI era a reference guide to the information they might require. We defined AI, its levels and techniques, and the distinctions between the approaches used in the medical field; we examined the possible advantages, pitfalls, and challenges of AI; and we attempted to present a picture of its potential future implementation in standard neonatal practice. AI in pediatrics requires clinicians' support, and AI researchers and clinicians need to work together cooperatively. AI in neonatal care is in high demand, and there is a fundamental need for a human (pediatrician) to be involved in AI-backed applications, in contrast to more technically advanced systems that involve fewer healthcare professionals.

Competing Interests
Dr E. Keles has no COI. Dr U. Bagci discloses Ther-AI LLC. This work is partially supported by the NIH NCI funding: R01-CA246704 and R01-CA240639.
Dr. E. Keles is a senior clinical research associate in the Machine and Hybrid Intelligence Lab at the Northwestern University Feinberg School of Medicine, Department of Radiology.
Dr. U. Bagci directs the Machine and Hybrid Intelligence Lab and is an Associate Professor in the Department of Radiology, Northwestern University Feinberg School of Medicine.