Artificial intelligence model comparison for risk factor analysis of patent ductus arteriosus in nationwide very low birth weight infants cohort

Despite the many comorbidities and high mortality rate in preterm infants with patent ductus arteriosus (PDA), therapeutic strategies vary depending on the clinical setting, and most studies of the related risk factors are based on small sample populations. We aimed to compare the performance of artificial intelligence (AI) analysis with that of conventional analysis to identify risk factors associated with symptomatic PDA (sPDA) in very low birth weight infants. This nationwide cohort study included 8369 very low birth weight (VLBW) infants. The participants were divided into an sPDA group and an asymptomatic PDA or spontaneously close PDA (nPDA) group. The sPDA group was further divided into treated and untreated subgroups. A total of 47 perinatal risk factors were collected and analyzed. Multiple logistic regression was used as a standard analytic tool, and five AI algorithms were used to identify the factors associated with sPDA. Combining a large database of risk factors from nationwide registries and AI techniques achieved higher accuracy and better performance of the PDA prediction tasks, and the ensemble methods showed the best performances.

www.nature.com/scientificreports/ learning, semisupervised learning and reinforcement learning, depending on the problem and dataset 12 . Al technologies are growing in use in the fields of imaging, diagnosis, therapy selection, risk prediction, disease stratification, and precision medicine 13 . For example, prior studies have predicted the risks of heart transplantation and in-hospital mortality by supervised learning 14,15 . More recently, an ensemble model aggregating four different classifiers has been developed to predict agitation in invasive mechanical ventilation patients 16 . Along with advances in the accuracy of AI analytics, methodologies for explainable AI (XAI) have also evolved. XAI accounts for the rationale underlying the decision-making process and shows which risk factors contribute the most to the decision-making process 10 . This study is the first report to analyze the perinatal risk factors for preterm sPDA cases registered in a nationwide cohort database and to suggest the feasibility of supervised learning-based AI in newborn screening for this disease. To date, respiratory distress syndrome (RDS), birth weight, sex, and gestational age have been deemed risk factors for sPDA 17,18 . In addition to the factors commonly considered by clinicians, we tried to determine whether various factors affect PDA; thus, the model was configured to include all the registered variables as much as possible. Ultimately, we tried to diagnose sPDA as soon as possible using only perinatal factors without imaging findings. This study aimed to investigate the perinatal risk factors leading to sPDA and sPDA treatments for very low birth weight (VLBW) infants in a nationwide cohort registry and to compare the performance of AI analysis with that of conventional analysis. This study may support the idea that an integrated combination of Al and conventional analysis can synergistically aid clinical risk prediction and therapy selection in medicine.

Methods
Study design. Patients and data collection. In this study, we derived data from infants registered in the Korean Neonatal Network (KNN), a nationwide prospective web-based registry of VLBW infants. These data were collected from patients admitted to 74 neonatal intensive care units (NICUs) in Korea and analyzed retrospectively. The KNN registry was approved by the institutional review board of each participating hospital. Informed consent was obtained from the parents of each infant prior to participation in the KNN registry. All the methods were performed in accordance with the relevant guidelines and regulations. This study was supported by the Korea Centers for Disease Control and Prevention (2019-ER7103-01) 19 and was approved by the Hanyang University Institutional Review Board (IRB No. 2013-06-025-043).
The cohort data comprised 10,390 VLBW infants born between January 5, 2013, and November 19, 2017, weighing less than 1500 g. Infants who had died before three postnatal days since the confirmation of sPDA was impossible, those who had received prophylactic or presymptomatic PDA treatment, and those who had major congenital anomalies were excluded. Infants with missing or unknown PDA treatment policies were also excluded. After this exclusion process, 8,369 infants from the KNN were eligible for sPDA prediction and risk factor analysis. After the group without PDA was excluded, the data of 2,982 patients remained and were used to analyze the treatment-determining factors of sPDA (Fig. 1). www.nature.com/scientificreports/ Definitions. According to the PDA treatment strategy, KNN classified the population as follows: group 1, prophylactic PDA treatment without clinical symptoms or abnormal echocardiographic findings; group 2, PDA treatment performed without clinical symptoms due to PDA although PDA was confirmed by echocardiography; group 3, sPDA treatment as PDA treatment performed because of clinical symptoms due to PDA; group 4, only conservative and supportive treatment although with clinical symptoms due to PDA; group 5, asymptomatic PDA or spontaneously closed PDA (nPDA); group 6, missing or unknown PDA treatment policy. sPDA was defined as the presence of more than 2 of the following 5 clinical symptoms with/without echocardiographic confirmation of a large left-to-right ductal flow: (1) a systolic or continuous murmur; (2) a bounding pulse or hyperactive precordial pulsation; (3) hypotension; (4) respiratory difficulty; and (5) evidence from a chest radiograph (pulmonary congestion, cardiomegaly). According to the therapeutic strategies for PDA registered in the KNN database, we further stratified the sPDA group into the following two subgroups to compare the risk factors in the treatment and nonintervention groups that mediated sPDA closure: treated group (sPDA_tx), comprising patients who had received any treatment for sPDA; untreated group (sPDA_nontx), comprising patients who had received only conservative treatment or no treatment for sPDA. The term "treatment" refers to medication (indomethacin, ibuprofen, and other NSAIDs) or ligation.
Artificial intelligence analysis. Dataset Table 2). The imputation was conducted using the medians of numerical variables and modes of nominal variables. To prevent unseen information from being used in the training process, the median or mode was calculated using only the training set and then imputed to the same values as the test data.
Artificial intelligence algorithms. Five classical AI algorithms (AAs) were selected in this study because their specific properties are appropriate for risk prediction analyses: a random forest (RF), a decision tree-based theory used to avoid overfitting; a light gradient boosting machine (L-GBM), a low-bias model formed by combining sequential weak models with a light computational algorithm; a multilayer perceptron (MLP), a feedforward artificial neural network that has excellent pattern extraction capability; a support vector machine (SVM), a model optimized exploiting the kernel trick for highly complex problems in cases where linear separation is not possible; and k-nearest neighbors (k-NN), which perform classification based on most of the nearby data points. Among these AAs, the RF and L-GBM are decision tree-based ensemble models. The AAs used in the present study have been previously used to predict hypertension and cardiovascular risk 12,20 . Further detailed information concerning these AAs is provided in Supplementary Methods 1 and 2.
Hyperparameter optimization. The study population was divided into a training set and a test set at a ratio of 80:20 using stratified random sampling 21 . To avoid AA methods overfitting the test set, we applied fivefold crossvalidation to the training set in validating procedure, and the area under the receiver operating curve (AUC) was averaged over all the data fold sets 22,23 . The stratified cross-validation method ensures that each training and test fold has a similar distribution of outcomes with the entire dataset to reduce bias in the training and evaluating processes. Additionally, cross-validation improves generalization performance by estimating the performance of the model by averaging the results of multiple validations 24 . We used a grid search to find the optimal hyperparameter that maximizes the AUC by performing cross-validation on all possible hyperparameter combinations. A significant difference was found in the number of positive and negative classes in the study population. If the data are unbalanced, the problem arises that the model gives a higher weight to the majority class; thus, the sensitivity to the minority class would be reduced. We resolved the class imbalance using the synthetic minority oversampling technique (SMOTE), an oversampling method in which a new minority class sample is synthesized by adding a random value to the sample of the original minority class 25 . We evaluated the performance of the AAs using the AUC and accuracy metrics, and 95% confidence intervals (CIs) were calculated using bootstrapping 26 . We implemented the AAs using Python 3.8.5 (Python Software Foundation, https:// www. python. org/) and a compatible package-i.e., Scikit-Learn version 0.24 (https:// scikit-learn. org/) 27 .
Shapely additive explanation. After training the AAs, we analyzed the associations between the risk factors and outcomes. AI classifiers are black boxes that do not reveal their internal working processes, making it challenging to understand the associations between specific factors and decisions 28 . To give clinicians a convincing reason to trust the decisions, we used a game theory-based AI interpretation method called SHAP (Shapely Additive explanations) 29 . SHAP is a leading algorithm to identify the main risk factors that drive the decisions of a model. The SHAP value of each factor was calculated as the average difference in the prediction probabilities between the combinations of risk factors in which the target risk factor was included and not included. Because of their computational nature, SHAP values can be positive or negative depending on the side to which the given risk factor pushes the model's predictions. www.nature.com/scientificreports/ MLR analysis. We also evaluated the predictive accuracies of the examined risk factors using a conventional analysis. A multiple logistic regression (MLR) approach was used as the reference method, and the raw data were stratified into a binomial distribution. Variables with a threshold p-value of 0.15 were selected to remain in the model according to backward selection, starting with the full model, and all the regression coefficients were tested using Wald statistics at α = 0.05. The effects whose p-values were less than 0.05 were regarded as significant. The goodness of fitness of the final model was tested using Hosmer-Lemeshow's method at α = 0.05. The identification of significant effects was based on SAS 9.4 (SAS, Inc., NC, USA, https:// www. sas. com/), and prediction analyses were performed using Python 3.7.5 (https:// www. python. org/).

Interpretation of the correlations among the factors.
In addition to the main AI-based risk factor analysis, we analyzed the statistical correlations among all the risk factors and outcomes. A correlation matrix was calculated using formulas such as Spearman's rank, point biserial coefficients, and pi coefficients, depending on the type of variable (continuous, ordinal, nominal), across the entire study cohort (n = 8369). The correlation results were visualized as dendrograms, heatmaps and networks through a hierarchical clustering process using the Ward distance method and the Force Atlas function of the gephi 0.9.2 program (https:// gephi. org/) [30][31][32] . Positive/negative correlation analysis. The summary plot of SHAP in Supplementary Fig. 2 shows the quantitative contributions of the top 10 factors for sPDA/nPDA and sPDA_tx/sPDA_nontx) in the AI analysis. For example, we found that invasive mechanical ventilator treatment and the number of administered surfactants were positively associated with sPDA, and gestational age, sepsis, birth weight and birth height were negatively correlated. For sPDA_tx, supplemental oxygen, the need for oxygen supplementation at birth, and noninvasive mechanical ventilator treatment were positively associated with sepsis. Antenatal steroid use was negatively correlated.

Results
Relationships among risk factors. Hierarchical clustering was used to cluster highly correlated factors in a dendrogram (Fig. 3a). sPDA was clustered with the gestational period, birth height, birth weight, and birth head circumference. sPDA_tx was clustered with sepsis and fungal infections, and its cluster was closest to the cluster comprising noninvasive mechanical ventilation, oxygen inhalation, and birth temperature, among others. According to the heatmap (Fig. 3b), sPDA was negatively correlated with the gestational period, birth height, birth weight, Apgar score, and head circumference. By contrast, sPDA was positively correlated with surfactant administration, positive airway pressure, and endotracheal intubation. sPDA_tx showed a negative correlation with sepsis and fungal infection and a positive correlation with noninvasive mechanical ventilation and oxygen inhalation.
sPDA was relatively close to the Apgar score, physical measurement, resuscitation, and ventilator treatment factors (Fig. 3c). Thus, whether those factors were positively or negatively correlated, they were highly correlated. However, parental factors were far from sPDA, indicating that they were not correlated with sPDA. sPDA_tx was located near sepsis and oxygen supplementation. The factors that were highly correlated with sPDA or sPDA_tx are also shown as important factors in Table 3. Therefore, the important factors derived from the AAs were somewhat consistent.

Discussion
The analysis of risk factors for symptomatic PDA and determination of PDA treatment in VLBW infants using AI showed higher accuracy and better performance than the conventional analysis. The ensemble model showed a better prediction accuracy and AUC than the other methods when the performances of the models were evaluated. Ultimately, conventional analysis and AI analysis were incorporated to create a new diagram containing the relationships between each factor and sPDA to allow medical staff to intuitively apply these results in actual clinical practice.
According to a relatively recent large-scale study conducted in another country, RDS, birth weight, female sex, gestational age, and the 5-min Apgar score were suggested as risk factors for sPDA in preterm infants 17 . In a study that analyzed 18 factors for hemodynamically significant PDA in preterm infants within 22-29 weeks of gestational age, a lower gestational age, pregnancy-induced hypertension (PIH), and surfactant use were analyzed as risk factors 18 . In the present study, a very low gestational age, a low birth weight and height, and the number of administered surfactants showed close correlations with sPDA. However, the presence of RDS and PIH did not significantly affect the prediction of sPDA. Instead, in the case of invasive ventilator care, a clear prediction of sPDA was shown. In the case of sepsis, the opposite result was obtained with MLR and other AAs. Regarding the national data used in this study, the definition of sepsis was limited to patients who had positive blood cultures www.nature.com/scientificreports/ or had received more than 5 days of systemic antibiotic treatment. Therefore, the definition of neonatal sepsis is unclear 33 , and the inability to include culture-negative sepsis in particular led to the creation of statistical bias. For the prediction of sPDA treatment, AI showed very high accuracy and good performance. Because no study has determined the presence or absence of treatment for sPDA, knowing exactly which model selects the most accurate factors is impossible. Generally, a relatively high probability exists that PDA will be treated with absent sepsis when supplemental oxygen is provided during hospitalization (O2) or is needed at birth (O2_R) and when noninvasive ventilator care is required (NI_VENT). When examining each analysis method in detail, some differences were found. These differences were considered due to differences in the strategies used at various hospitals, even in cases in which the sPDA situations were the same and when the time point of sPDA and that at which treatment was started were different. www.nature.com/scientificreports/ In recent years, AI has been applied and used in various fields beyond simple engineering domains, and advances in machine learning have begun to affect real-world decisions in many areas, including politics, economics, finance, and medicine 34,35 . The applications of AI in health care include image analysis, treatment, the diagnosis and prognosis of diseases, health care, the improvement of medical administration and management systems, and drug development. In some studies, AI has shown sufficient or rather high disease risk prediction ability compared with existing models 20,36 .
To our best knowledge, this study is the first to use AAs to predict sPDA and sPDA_tx and to analyze the main risk factors for sPDA using large-scale cohort data comprising only electronic records and structured factors (excluding images). The proposed AI classifiers can classify patients well, even when nonlinear relationships exist in the data. This nonlinear characteristic makes it difficult to interpret the prediction processes of AI classifiers. However, by introducing a game-theoretical contribution-based explanation algorithm (SHAP), we identified the main factors. That the AI models' performances exceeded AUCs of 0.8 and that the main factors were identified using a mathematically fair explanation method support the study's validity. Additionally, the main factors derived from AI and those that were statistically highly correlated with the outcomes coincided, enhancing the consistency of the AI and correlation analysis results.
According to the SHAP interpretation, too many factors were considered in the case of analysis other than the ensemble models, indicating that these models overfitted trivial factors and had lower performances (Table 3). However, the tree-based ensemble models achieved the highest performances because they are designed to overcome overfitting. Ensemble learning has been demonstrated as a solution to construct balanced datasets to enhance prediction performance.
However, AI analysis still has some limitations, such as representation, accuracy, and homogeneity, which occur during the data collection process 37 , and the nature of self-extracting data from large datasets makes it difficult to determine how an AI method produces results and why errors occur 38,39 . Overreliance on AI models when making decisions or analyzing images may lead to automation bias, and it is difficult to analyze the basis of a given judgment 40 . Furthermore, because an SHAP value is a measure of the corresponding factor's contribution to the model result, predict the amount of change induced in a model's prediction based on a change in factor value is impossible. Additionally, the data collected by the KNN are not focused on PDA; thus, limited factors are included. The lack of information, including vital signs, may reduce the performance of AI. In addition to the variables studied in this study, better results will be obtained if more individual data, such as chest radiographs Table 3. Top significant variables for sPDA and sPDA_tx Prediction. sPDA, symptomatic patent ductus arteriosus; nPDA, asymptomatic PDA or spontaneously closed PDA; sPDA_tx, symptomatic patent ductus arteriosus with any treatment; sPDA_nontx, symptomatic patent ductus arteriosus without treatment; RF, random forest; L-GBM, light gradient boosting machine; MLP, multilayer perceptron; SVM, support vector machine; k-NN, k-nearest neighbors. The abbreviations for all factors are shown in Supplementary  Table 1. Feature importance describes how relevant a factor is to the model's predictions. In MLR, the feature importance values were selected according to a p-value of 0.05 during the testing procedure. These are listed in descending order as the absolute values of the coefficients for the MLR and as the average absolute SHAP values for the AAs. The variables in italics indicate positive associations between the selected factors and sPDA or sPDA_tx. a The factor analysis with MLR as the standard reference method. To overcome the abovementioned limitations, this study attempted to enhance objectivity by conducting integrated analysis of MLR, which has been widely used, and AAs. To consider the effects of multicollinearity, Fig. 2c was used to understand the interrelationships among the factors so that the factors most strongly correlated with PDA could recognize the influence of one another more readily than they recognize their influence on PDA.
Using the present study, a follow-up study is planned that prospectively analyzes risk factors and applies management for PDA. Additionally, although this study used existing AAs, in future studies, we will try to study more advanced techniques to improve generalization performance, analyze risk factors by developing a Figure 3. Relationships among the risk factors. (a) Dendrogram visualizing hierarchical clustering based on the obtained correlation coefficients. The dendrogram's x-axis comprises sPDA, sPDA_tx and all risk factors, and highly correlated factors are forced to be adjacent through hierarchical clustering. Each horizontal line indicates that the two associated subclusters are merged into one cluster, and the y-height indicates the distance between the two subclusters. We divided the factors into 9 clusters with a threshold of 1.15 and marked each cluster by color. (b) Heatmap of the correlation matrix. The x-axis and y-axis of the heatmap follow the arrangement of factors generated by hierarchical clustering, and the correlation coefficients are depicted in red or blue at the intersection of the factors. According to the color bar on the right, red represents a positive correlation, and blue represents a negative correlation. A darker color indicates a higher correlation, while a lighter color indicates a lower correlation. (c) Schematic diagram of the relationships among the factors. The circles (nodes) represent the risk factors, connected by the absolute value of the correlation coefficients (edges). In this network, the edges act as attraction forces, bringing highly correlated nodes closer together and pushing less-correlated nodes away from each other. The color of each cluster is the same as that in the dendrogram in (a www.nature.com/scientificreports/ new model explanation method that detects the amounts of changes in risk factors, analyze the relationships between treatment methods and long-term prognosis according to the timing of a given treatment and propose the best treatment policy. This AI analysis using a nationwide cohort registry is the first study of VLBW infants in the NICU. We evaluated risk factor variables associated with and potentially causally linked to sPDA and sPDA_tx and showed that the ensemble models (RF and L-GBM) were the best among the examined AAs at predicting specific disease development trends, yielding higher accuracy than that of an established risk prediction approach. The use of these readily available online AAs underlined their applicability as an auxiliary means of risk prediction and therapy selection.

Data availability
According to the Korean Neonatal Network (KNN) Publication Ethics Policy, all information about patients is confidential. The information contained in the data must be protected as confidential, and only available to individuals who have access for the permitted research activity.