Introduction

Newborns are one of the main populations receiving blood transfusions because of their small blood volume and immature hematopoietic system. As the survival rate of newborns continues to improve1, the number of infants requiring long-term stays in the neonatal ward has increased significantly. Consequently, the overall transfusion rate has grown exponentially over the past few years2. Nearly 58% of low birth weight infants and more than 90% of very low birth weight infants need at least one blood transfusion during hospitalization3. Red blood cell (RBC) transfusions increase the oxygen-carrying capacity of the blood and thus oxygen delivery to tissues. This reduces the likelihood of apneic episodes and promotes weight gain and growth in premature newborns4.

Nevertheless, a growing body of evidence links transfusion exposure to adverse outcomes. Existing studies indicate that blood transfusion is closely associated with mortality in neonates. It may also be associated with complications of prematurity such as intraventricular hemorrhage (IVH), bronchopulmonary dysplasia (BPD), and necrotizing enterocolitis (NEC)5,6,7. These complications not only threaten newborns' lives but also prolong hospitalization. Several studies have shown that transfused infants have a longer hospital or neonatal intensive care unit (NICU) length of stay (LOS) than non-transfused infants8,9. Prolonged hospitalization lengthens exposure to the medical environment, which increases the risk of hospital-acquired infections and other complications10. It also places a heavy burden on neonatal care. Moreover, an extended LOS may hinder parent-infant bonding and interaction, potentially causing significant emotional and financial stress for the family11. The long LOS of these patients in the NICU has therefore become a worrisome issue, making accurate prediction of LOS in the neonatal ward increasingly important. However, little research on LOS has focused specifically on transfused patients.

Our objective was therefore to use NICU data to identify predictors of prolonged hospital stay in NICU patients who received blood transfusions and to develop a predictive nomogram. The aim was to provide evidence that could help prevent long hospital stays and optimize resource allocation for NICU patients.

Methods

This retrospective study analyzed data from newborns registered at the First Affiliated Hospital of Xinjiang Medical University. The research protocol was approved by the Ethics Committee of the First Affiliated Hospital of Xinjiang Medical University in Urumqi. Given the retrospective nature of the study, the requirement for written informed consent was waived.

Study population

Neonates who received blood transfusions at the First Affiliated Hospital of Xinjiang Medical University between May 1, 2021, and May 1, 2023, were included in this study.

The inclusion criteria were:
(1) at least one blood transfusion during hospitalization;
(2) age at admission of less than 1 day; and
(3) admission to the NICU rather than a general ward.

The exclusion criteria were:
(1) rapid discharge, or discharge without a physician's order, for non-medical reasons;
(2) death during NICU hospitalization or before admission;
(3) chromosomal abnormalities or severe congenital malformations;
(4) surgery; and
(5) more than 10% missing data across general conditions, complications, and laboratory measurements.

Data collection

The following 57 variables were collected for each enrolled neonate.

(1) General conditions of the neonate: gestational age (GA; < 28 weeks), birth weight (BW; < 1000 g), gender, 1-min and 5-min Apgar scores, feeding pattern, respiratory support type (invasive ventilation, noninvasive ventilation, or other: nasal cannula oxygen or no oxygen), antenatal glucocorticoids, umbilical venous catheterization (UVC), amniotic fluid contamination, and placental abruption.

(2) Complications: neonatal respiratory distress syndrome (NRDS), NEC, IVH, pneumorrhagia, sepsis, congenital heart disease (CHD), hemolytic jaundice, and rescue frequency (≥ 3 times).

(3) Laboratory parameters: white blood cell (WBC) count, platelet (PLT) count, red cell distribution width (RDW), hematocrit (HCT), neutrophils, lymphocytes, procalcitonin (PCT), C-reactive protein (CRP), interleukin (IL), potassium (K), sodium (Na), pH, arterial partial pressure of carbon dioxide (PaCO2), arterial partial pressure of oxygen (PaO2), bicarbonate (HCO3), total bilirubin (TBIL), albumin (ALB), aspartate aminotransferase (AST), alanine aminotransferase (ALT), lactate dehydrogenase (LDH), creatine kinase (CK), lactic acid (LA), triglyceride (TG), total cholesterol (TC), high-density lipoprotein (HDL), low-density lipoprotein (LDL), creatinine, and urea.

(4) Maternal factors: maternal age, primipara, multiple births, cesarean section, pregnancy anemia, hypertensive disorders of pregnancy (HDP), intrahepatic cholestasis of pregnancy (ICP), gestational diabetes mellitus (GDM), hypothyroidism, and idiopathic thrombocytopenic purpura (ITP).

Laboratory data were taken from the first blood tests performed in the NICU. Missing data (< 10%) were imputed with the MICE package (version 3.14.0)12.
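The exclusion rule for patients with more than 10% missing data can be illustrated with a short Python sketch. This is not the authors' code; the source states that R was used. The record layout and the function names `missing_fraction` and `apply_missingness_rule` are hypothetical, and the subsequent MICE imputation is not reproduced.

```python
from typing import Optional

# Hypothetical record layout: variable name -> value, with None marking a missing value.
Record = dict[str, Optional[float]]


def missing_fraction(record: Record, variables: list[str]) -> float:
    """Fraction of the listed variables that are missing (None or absent) for one patient."""
    n_missing = sum(1 for v in variables if record.get(v) is None)
    return n_missing / len(variables)


def apply_missingness_rule(records: list[Record], variables: list[str],
                           max_missing: float = 0.10) -> tuple[list[Record], list[Record]]:
    """Split records into (retained, excluded): excluded means MORE than 10% missing."""
    retained, excluded = [], []
    for rec in records:
        if missing_fraction(rec, variables) > max_missing:
            excluded.append(rec)
        else:
            retained.append(rec)
    return retained, excluded


if __name__ == "__main__":
    variables = [f"x{i}" for i in range(10)]        # 10 hypothetical variables
    complete = {v: 1.0 for v in variables}
    one_missing = dict(complete, x0=None)           # 10% missing: retained (not > 10%)
    two_missing = dict(complete, x0=None, x1=None)  # 20% missing: excluded
    kept, dropped = apply_missingness_rule([complete, one_missing, two_missing], variables)
    print(len(kept), len(dropped))  # 2 1
```

Records that pass this rule still contain some gaps. In the study, those gaps were filled by multiple imputation rather than dropped.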

Definitions

Long LOS was defined as an LOS exceeding the 75th percentile of LOS in the cohort13,14. When gestational age and birth weight were analyzed as continuous variables, they did not differ significantly between groups (p > 0.05). We therefore explored several categorizations of gestational age and birth weight. Dichotomizing gestational age (< 28 weeks vs. ≥ 28 weeks) and birth weight (< 1000 g vs. ≥ 1000 g) yielded more pronounced between-group differences in our study than the alternatives.
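The long-LOS label can be sketched as follows. The source does not state which percentile definition was used. This Python sketch assumes linear interpolation (`statistics.quantiles` with `method="inclusive"`, which corresponds to R's default quantile type 7), and the LOS values are hypothetical. Because long LOS is defined as *exceeding* the 75th percentile, an infant whose LOS equals the cutoff is labeled normal LOS.

```python
import statistics


def long_los_threshold(los_days: list[float]) -> float:
    """75th percentile of LOS by linear interpolation (assumed definition)."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is the 75th percentile.
    return statistics.quantiles(los_days, n=4, method="inclusive")[2]


def label_long_los(los_days: list[float]) -> list[bool]:
    """True where LOS strictly exceeds the cohort's 75th percentile."""
    cutoff = long_los_threshold(los_days)
    return [d > cutoff for d in los_days]


if __name__ == "__main__":
    los = [5, 8, 12, 14, 20, 21, 30, 45]   # hypothetical LOS values in days
    print(long_los_threshold(los))        # 23.25
    print(label_long_los(los))
```

Other percentile definitions, such as Python's default `method="exclusive"`, can shift the cutoff slightly in small samples. That shift can move infants near the boundary between groups.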

Development and assessment of the nomogram

The subjects were randomly split into two groups. Two-thirds formed the training set, which was used to identify predictors of long hospitalization in transfused newborns and to develop the prediction model. The remaining one-third formed the validation set, which was used to evaluate the model's performance. The least absolute shrinkage and selection operator (LASSO) method was used to select the optimal predictors from the candidate variables15. LASSO regression adds an L1 penalty on the coefficients, which shrinks them toward zero and sets some exactly to zero, so that only a subset of variables is retained. Because the penalty tends to keep one variable from a group of highly correlated predictors and shrink the others to zero, it mitigates the impact of multicollinearity. We therefore used LASSO regression to handle highly correlated predictors and to obtain more stable estimates. The variables with nonzero LASSO coefficients were entered into a multivariable logistic regression analysis to build the prediction model for long LOS, which was presented as a nomogram. Goodness of fit was assessed with the Hosmer–Lemeshow test and the coefficient of determination (R2). Discrimination was assessed with receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC). Calibration was assessed with calibration curves.
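The way an L1 penalty drives some coefficients exactly to zero can be illustrated with a small solver. This sketch is not the authors' R code. It uses proximal gradient descent (ISTA), which is a different algorithm from the coordinate descent used by common LASSO packages, to fit an L1-penalized logistic regression with an unpenalized intercept. The toy data, the penalty values, and the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of the L1 penalty: shrink toward zero, exactly 0 inside the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_logistic(X: list[list[float]], y: list[int], lam: float,
                   step: float = 1.0, n_iter: int = 2000) -> tuple[float, list[float]]:
    """Minimize mean log-loss + lam * sum(|beta_j|) by proximal gradient descent (ISTA).

    The intercept is not penalized. `step` must not exceed 1/L, where L is the Lipschitz
    constant of the log-loss gradient; 1.0 is safe for the 0/1 toy data below.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus outcome) at the current coefficients.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grads = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * grad0                                  # unpenalized intercept
        beta = [soft_threshold(b - step * g, step * lam)    # gradient step, then L1 shrinkage
                for b, g in zip(beta, grads)]
    return b0, beta


if __name__ == "__main__":
    # Hypothetical binary predictors: x1 tracks the outcome, x2 is unrelated noise.
    X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [1, 1], [0, 0]]
    y = [1, 1, 1, 0, 0, 0, 1, 0]
    for lam in (0.05, 2.0):
        _, beta = lasso_logistic(X, y, lam)
        print(lam, [round(b, 3) for b in beta])
```

With the large penalty, both coefficients stay exactly zero. With the small one, x1 receives a large positive coefficient while x2 stays at or near zero. Predictors are commonly standardized before penalization so that the penalty treats them comparably. The toy predictors here are all 0/1, so that step is omitted.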
Decision curve analysis (DCA) was used to assess the clinical net benefit of the model. Discrimination and calibration were checked by bootstrapping with 1000 resamples. Finally, to further assess the robustness of the model evaluation, we used fivefold cross-validation. The dataset was divided into five subsets of approximately equal size. The model was trained iteratively on four subsets and validated on the remaining one, so that each subset served once as the validation set.
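The fold assignment for fivefold cross-validation can be sketched as follows. The source does not say whether folds were stratified by outcome, so this sketch uses simple random folds. The seed and function names are hypothetical. Model fitting and metric averaging within each fold are omitted.

```python
import random
from typing import Iterator


def kfold_indices(n: int, k: int = 5, seed: int = 2023) -> list[list[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validation_splits(n: int, k: int = 5,
                            seed: int = 2023) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, validation) index lists; each fold is the validation set exactly once."""
    folds = kfold_indices(n, k, seed)
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


if __name__ == "__main__":
    # 539 = number of infants analyzed in this study.
    for train, validation in cross_validation_splits(539):
        print(len(train), len(validation))  # four folds of 108 and one of 107
```

Dealing shuffled indices round-robin (`idx[i::k]`) keeps the fold sizes within one of each other without any special handling of the remainder.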

Statistical analysis

Statistical analyses were performed with R software, version 4.1.3 (https://www.R-project.org). Categorical data are presented as numbers and percentages. Continuous variables are reported as mean ± standard deviation (SD) if normally distributed and as median (interquartile range [IQR]) otherwise. Categorical variables were compared with the χ2 test or Fisher's exact test, and normally distributed continuous variables were compared with independent-samples t-tests. All statistical tests were two-sided, and p ≤ 0.05 was considered statistically significant.
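For a two-by-two comparison such as exposure versus long/normal LOS, the χ2 test can be sketched as follows. This is not the authors' R code, and the counts are hypothetical. The sketch computes the Pearson statistic without a continuity correction. R's `chisq.test` applies Yates's continuity correction to 2×2 tables by default, so results can differ slightly. Fisher's exact test is not shown.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]], no continuity correction.

    Returns (statistic, p-value) with 1 degree of freedom. Assumes all row and column
    totals are positive.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)) for standard normal Z.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: rows = exposure yes/no, columns = long/normal LOS.
    print(chi2_2x2(30, 10, 10, 30))
```

The expected counts are what independence between rows and columns would predict. Fisher's exact test is preferred when any expected count is small.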

Ethics approval and consent to participate

This study followed the Declaration of Helsinki and was approved by the Ethics Committee of the First Affiliated Hospital of Xinjiang Medical University. Because of the retrospective nature of the study, the Ethics Committee waived the need for informed consent. All methods were carried out in accordance with relevant guidelines and regulations. No biological specimens were used in this study.

Results

Baseline characteristics of included neonates

A total of 539 infants were analyzed. Of these, 398 had hospital stays at or below the 75th percentile (normal LOS), and 141 had stays above it (long LOS) (Fig. 1). The baseline characteristics of the two groups are presented in Table 1. Compared with the normal LOS group, the long LOS group had a significantly higher proportion of infants with a gestational age of less than 28 weeks (19.9% vs. 1.76%; p < 0.001) and a birth weight of less than 1000 g (29.8% vs. 2.01%; p < 0.001). The two groups also differed significantly (p < 0.05) in Apgar score, feeding pattern, respiratory support, UVC, NRDS, NEC, pneumorrhagia, sepsis, rescue frequency (≥ 3 times), HCO3, ALB, CK, and urea (Table 1).

Figure 1

Study flowchart.

Table 1 Baseline characteristics of patients.

Variables selection

Using the training set, we performed LASSO regression to select predictors of long LOS. LASSO reduced the initial 57 perinatal variables to six potential predictors, a 9.5:1 reduction (Fig. 2A,B). The six predictors were GA (< 28 weeks), BW (< 1000 g), respiratory support type, UVC use, sepsis, and rescue frequency (≥ 3 times) (Fig. 2C).

Figure 2

Feature selection. (A) Selection of the tuning parameter (λ) in the LASSO logistic regression model. The left dashed line marks the λ that minimizes the cross-validated criterion. The right dashed line marks the λ chosen by the 1-SE rule (within one standard error of the minimum), which was used as the optimal λ. (B) Coefficient profiles of the 57 features. (C) Features with nonzero coefficients selected by LASSO.
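The 1-SE rule described in the caption can be stated in a few lines. The cross-validated error values below are hypothetical. Packages such as R's glmnet report the same two quantities as `lambda.min` and `lambda.1se`, but the source does not name the package used.

```python
def one_se_lambda(lambdas: list[float], cv_mean: list[float],
                  cv_se: list[float]) -> tuple[float, float]:
    """Return (lambda_min, lambda_1se) from a cross-validated error curve.

    lambda_min minimizes the mean CV error. lambda_1se is the largest lambda (strongest
    penalty, hence fewest predictors) whose mean error is within one standard error
    of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    limit = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= limit)
    return lambdas[i_min], lambda_1se


if __name__ == "__main__":
    # Hypothetical CV curve (e.g., deviance) over a decreasing lambda grid.
    lambdas = [0.20, 0.10, 0.05, 0.02, 0.01]
    cv_mean = [1.10, 0.95, 0.90, 0.88, 0.89]
    cv_se = [0.03] * 5
    print(one_se_lambda(lambdas, cv_mean, cv_se))  # (0.02, 0.05)
```

Choosing lambda_1se trades a small, statistically indistinguishable increase in CV error for a sparser model.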

Risk prediction nomogram development

A logistic regression model was constructed with the six predictors identified by LASSO: GA (< 28 weeks), BW (< 1000 g), respiratory support type, UVC use, sepsis, and rescue frequency (≥ 3 times). Table 2 presents the coefficients and p-values of the six predictors. The coefficients indicate the strength and direction of each predictor's association with the outcome (long LOS). As shown in Fig. 3, each transfused patient's risk of an extended hospital stay can be estimated from the total points assigned on the nomogram. A higher total score indicates a greater likelihood of long hospitalization. Because previous research has identified other clinical features as predictors of long LOS, we also tested models that incorporated these additional features and evaluated their discriminative ability (Table 3). Adding these variables did not significantly improve performance in the validation set and may have led to overfitting. Consequently, the six-predictor nomogram was retained as the final model.
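The link between nomogram points and predicted risk can be sketched under a common convention, assumed here. Each level's contribution to the logit (coefficient × value) is rescaled so that the predictor with the widest contribution range spans 0 to 100 points. Total points are then mapped back to a probability through the logistic function. The intercept and contributions below are hypothetical and are not the coefficients in Table 2. The three-level predictor merely mimics a categorical variable such as respiratory support type.

```python
import math


def build_point_scale(contrib: dict[str, dict[str, float]]
                      ) -> tuple[dict[str, dict[str, float]], float, float]:
    """contrib[predictor][level] = coefficient x value, i.e. that level's logit contribution.

    Returns (points, offset, scale). Each predictor's lowest level gets 0 points, and the
    predictor with the widest contribution range spans the full 0-100 point axis.
    """
    widest = max(max(c.values()) - min(c.values()) for c in contrib.values())
    points = {name: {lvl: 100 * (v - min(c.values())) / widest for lvl, v in c.items()}
              for name, c in contrib.items()}
    offset = sum(min(c.values()) for c in contrib.values())  # logit of the zero-point profile
    return points, offset, widest / 100


def risk_from_points(total_points: float, intercept: float, offset: float, scale: float) -> float:
    """Convert total nomogram points back to a predicted probability."""
    logit = intercept + offset + total_points * scale
    return 1 / (1 + math.exp(-logit))


if __name__ == "__main__":
    intercept = -3.0  # hypothetical
    contrib = {       # hypothetical logit contributions
        "predictor_A": {"no": 0.0, "yes": 2.0},
        "predictor_B": {"no": 0.0, "yes": 1.0},
        "predictor_C": {"level_1": 0.0, "level_2": 0.5, "level_3": 1.5},
    }
    points, offset, scale = build_point_scale(contrib)
    total = (points["predictor_A"]["yes"] + points["predictor_B"]["yes"]
             + points["predictor_C"]["level_2"])
    print(total, round(risk_from_points(total, intercept, offset, scale), 3))  # 175.0 0.622
```

Because the point scale is a linear rescaling of the logit, reading risk off the total-points axis gives the same probability as evaluating the logistic model directly.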

Table 2 Predictive factors for long LOS in infants undergoing blood transfusion.
Figure 3

Nomogram predicting the risk of long length of stay in infants receiving blood transfusions. The long LOS risk nomogram was developed in the cohort using GA, BW, respiratory support type, UVC, sepsis, and rescue frequency.

Table 3 Different model performance.

Evaluation of the performance of the predictive model

The discriminative ability of the model was validated in both the training and validation sets. The AUC of the nomogram was 0.851 (95% CI 0.805–0.891) in the training set (Fig. 4A) and 0.859 (95% CI 0.789–0.920) in the validation set (Fig. 4B), indicating good discrimination. The calibration curves showed close agreement between predicted and observed probabilities (Fig. 5A,B). With 1000 bootstrap resamples, the mean absolute error (MAE) of the calibration curve was 0.024 for the training set (n = 154) and 0.01 for the validation set (n = 385). The Hosmer–Lemeshow test indicated no significant difference between predicted and observed outcomes (p > 0.05). The R2 of the model was 0.331. Finally, fivefold cross-validation was used to evaluate the generalizability of the model, and the results (Fig. 6) show satisfactory performance.
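The Hosmer–Lemeshow statistic compares observed and expected event counts within groups of patients ranked by predicted risk. The source reports only that p > 0.05 and does not state the number of groups. This sketch assumes the common choice of 10 groups (8 degrees of freedom). It uses the closed-form chi-square tail for even degrees of freedom so that no external library is needed. The data in the example are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y: list[int], prob: list[float], groups: int = 10) -> tuple[float, int, float]:
    """Hosmer-Lemeshow statistic over equal-size groups of sorted predicted risk.

    Assumes len(y) >= groups and predicted probabilities strictly between 0 and 1.
    Returns (statistic, degrees of freedom, p-value).
    """
    pairs = sorted(zip(prob, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)        # expected events in the group
        observed = sum(obs for _, obs in chunk)    # observed events in the group
        mean_p = expected / n_g
        # (O - E)^2 / (n_g * pbar * (1 - pbar)) combines the event and non-event cells.
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


if __name__ == "__main__":
    # Hypothetical, perfectly calibrated data: 20 infants at each predicted risk level.
    probs, ys = [], []
    for p in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
        events = round(p * 20)
        probs += [p] * 20
        ys += [1] * events + [0] * (20 - events)
    print(hosmer_lemeshow(ys, probs))
```

A large p-value, as reported in the text, means the grouped observed counts are consistent with the predicted ones. It does not by itself demonstrate good calibration in small samples.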

Figure 4

ROC curves. (A) Training set (ROC). (B) Validation set (ROC). ROC = receiver operating characteristic, AUC = area under the ROC curve.
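The AUC reported for Fig. 4 has an equivalent rank (Mann–Whitney) interpretation. It is the probability that a randomly chosen long-LOS infant receives a higher predicted risk than a randomly chosen normal-LOS infant, with ties counting one half. The sketch below is an illustrative O(n²) implementation with hypothetical inputs. It does not compute the confidence intervals reported in the text, whose method the source does not specify.

```python
def auc_mann_whitney(y: list[int], score: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, t in zip(score, y) if t == 1]
    neg = [s for s, t in zip(score, y) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical outcomes (1 = long LOS) and predicted risks.
    print(auc_mann_whitney([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, and 1.0 corresponds to perfect separation of long-LOS and normal-LOS infants.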

Figure 5

Calibration curves. (A) Agreement between predicted probabilities of long LOS and observed outcomes in the training set. (B) Agreement between predicted probabilities of long LOS and observed outcomes in the validation set. X-axis: predicted probability; Y-axis: observed proportion of long LOS. The ‘Ideal’ line represents perfect agreement and serves as the benchmark; deviations from it indicate miscalibration. The ‘Apparent’ curve is the uncorrected, optimistically biased estimate, and the ‘Bias-corrected’ curve is the bootstrap-corrected estimate. Tick marks show percentiles. Mean absolute error (MAE) summarizes overall calibration accuracy, and the sample size (n) is reported for each estimate.
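The MAE in Fig. 5 is derived from a bootstrap bias-corrected, smoothed calibration curve, which is not reproduced here. As a simplified and hypothetical analogue of the idea, patients can be sorted by predicted risk and split into equal-size bins. The absolute gap between mean predicted risk and observed proportion is then averaged across bins.

```python
def binned_calibration(y: list[int], prob: list[float],
                       bins: int = 5) -> list[tuple[float, float]]:
    """Sort by predicted risk, split into equal-size bins, return (mean predicted, observed rate).

    Assumes len(y) >= bins so that no bin is empty.
    """
    pairs = sorted(zip(prob, y))
    n = len(pairs)
    rows = []
    for b in range(bins):
        chunk = pairs[b * n // bins:(b + 1) * n // bins]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(outcome for _, outcome in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows


def binned_mae(rows: list[tuple[float, float]]) -> float:
    """Average absolute gap between mean predicted risk and observed rate across bins."""
    return sum(abs(pred - obs) for pred, obs in rows) / len(rows)


if __name__ == "__main__":
    # Hypothetical outcomes and predicted risks.
    y = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
    prob = [0.2] * 5 + [0.8] * 5
    rows = binned_calibration(y, prob, bins=2)
    print(rows, binned_mae(rows))
```

Plotting each (mean predicted, observed) pair against the diagonal gives a crude version of the calibration curve. Points on the diagonal correspond to the ‘Ideal’ line.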

Figure 6

Fivefold cross-validation. The dataset is divided into five approximately equal parts; four parts are used for training and one for validation. This process is repeated five times so that each part serves once as the validation set, and the results are averaged to evaluate the model's performance.

Clinical effectiveness of the model

Figure 7 shows the decision curve analysis of the long LOS nomogram for children undergoing blood transfusion. The model provides a net benefit across a wide range of threshold probabilities.

Figure 7

Decision curve analysis of the nomogram. Across most threshold probabilities (> 2%), using the nomogram to predict long length of stay yields a greater net benefit than the strategies that use no prediction model (treat-all or treat-none).
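The net benefit plotted in a decision curve weighs true positives against false positives at each threshold probability. At threshold p_t, net benefit is TP/n − FP/n × p_t/(1 − p_t). Treat-none always has a net benefit of zero. The sketch below uses hypothetical outcomes and predictions. It classifies an infant as high risk when the predicted risk is at least the threshold, and conventions on this boundary vary.

```python
def net_benefit(y: list[int], prob: list[float], threshold: float) -> float:
    """Net benefit of labeling prob >= threshold as high risk (0 < threshold < 1)."""
    n = len(y)
    tp = sum(1 for t, p in zip(y, prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y, prob) if p >= threshold and t == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y: list[int], threshold: float) -> float:
    """Reference strategy that labels every infant high risk (treat-none is always 0)."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Hypothetical outcomes (1 = long LOS) and predicted risks.
    y = [1, 1, 0, 0, 1, 0, 0, 0]
    prob = [0.9, 0.6, 0.4, 0.1, 0.3, 0.2, 0.05, 0.15]
    for pt in (0.05, 0.1, 0.2, 0.3, 0.5):
        print(pt, round(net_benefit(y, prob, pt), 3), round(net_benefit_treat_all(y, pt), 3))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all curve and zero (treat-none).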

Discussion

LOS in the NICU has been a focal point of research. Long LOS is associated with hospital-acquired conditions and adverse healthcare events16,17. Blood transfusion is frequently performed in neonates who require intensive care, especially preterm neonates. However, little literature has addressed LOS prediction specifically in this high-risk population.

Early identification of transfused NICU neonates at risk of long LOS is crucial for counseling families and may also guide decisions on optimal clinical interventions. In this study, we therefore used historical NICU clinical data to identify important predictors and to develop a model predicting long NICU LOS in neonates receiving blood transfusions. We used LASSO, a widely used machine learning method, to select the relevant predictors. Six characteristics were identified: GA (< 28 weeks), BW (< 1000 g), respiratory support type, UVC, sepsis, and rescue frequency (≥ 3 times). The model showed strong discrimination in both the training and validation sets, with AUCs of 0.851 (95% CI 0.805–0.891) and 0.859 (95% CI 0.789–0.920), respectively. These results indicate good performance in identifying infants with long LOS.

GA and BW are frequently used to evaluate newborn infants. We found that a GA below 28 weeks and a BW below 1000 g were associated with a markedly higher probability of long LOS. This is consistent with previous research identifying birth weight and gestational age as the primary predictors of long NICU stay18,19. Anemia is common in premature infants because of their smaller circulating blood volume, the shorter lifespan of their red blood cells, and an immature bone marrow response to anemia. In premature infants, the immature hepatic sensors that regulate erythropoietin (EPO) production are relatively insensitive to tissue hypoxia, and plasma EPO levels are low20. This deficiency is most pronounced in the smallest and least mature infants21. In addition, premature infants require close monitoring of many parameters, which often involves repeated blood sampling for laboratory analysis and thus adds to blood loss. During the initial hospitalization, premature infants may therefore require more red blood cell transfusions than full-term infants to raise hemoglobin levels and improve the oxygen-carrying capacity of the blood22.

In this study, transfused NICU infants who required respiratory support such as mechanical ventilation generally had longer hospital stays than those who needed only supplemental oxygen or no respiratory support. This finding is similar to previous research23 and suggests a consistent association between the need for invasive respiratory support and LOS in transfusion-dependent infants. Patients who require invasive respiratory support may have more severe conditions, such as serious illnesses or injuries, that necessitate ongoing supportive treatments including blood transfusions. Transfusions help improve oxygenation by providing sufficient hemoglobin for oxygen transport23,24. These infants may also face a higher risk of complications such as infection25, which can further prolong hospitalization. Moreover, mechanically ventilated patients may take longer to wean from the ventilator and may require respiratory physiotherapy after extubation to promote recovery of lung function. The need for invasive respiratory support is therefore an important predictor of LOS in transfusion-dependent patients. Healthcare professionals should consider it when developing care plans and allocating hospital resources. Appropriate management of invasive respiratory support may help shorten hospital stays and potentially improve prognosis.

UVC is a common invasive procedure in neonates26. It involves inserting a catheter into the umbilical vein for purposes such as blood transfusion, fluid administration, medication delivery, or hemodynamic monitoring27. Although umbilical venous catheterization is considered effective and safe, UVC was associated with a longer duration of hospitalization in our study. The procedure involves local anesthesia, manipulation, and fixation of the neonate, which may cause discomfort or complications such as infection and bleeding28,29,30. These may extend hospitalization in neonates requiring blood transfusion. In addition, severely ill neonates may need umbilical venous catheterization for ongoing treatment or monitoring. In such cases, the presence of the catheter is associated with longer periods of observation and treatment, which extend the hospital stay of transfusion-dependent neonates.

Nosocomial infection is common in hospitalized newborns and has serious consequences. It is widely recognized as one of the most frequent adverse events during hospitalization31. Newborns have relatively weak immune systems and are therefore vulnerable to various pathogens, and infections can lead to severe complications and even death32,33. In our study, sepsis was a predictor of long LOS in transfused infants, consistent with previous studies identifying sepsis as a risk factor for long hospitalization10,34,35. Sepsis may require prolonged symptom management, antimicrobial therapy, shock management, and nutritional support. It also carries a high risk of complications, all of which extend treatment time and LOS. The median LOS of infected newborns has been reported to be twice that of uninfected newborns19, indicating that infection has a substantial impact on LOS. In transfused infants, sepsis may necessitate additional treatment and rehabilitation, which also increases medical costs and the burden on families.

We also identified the number of resuscitation attempts (rescue frequency) as an important predictor of LOS. This predictor has not been reported in previous research, possibly because of variations in healthcare policies across regions. Newborns requiring multiple resuscitation attempts, particularly those needing blood transfusions, often have severe conditions with multiple organ dysfunction or systemic disease. These infants require prolonged monitoring, treatment, and rehabilitation to stabilize their condition. Frequent resuscitation attempts can also induce fatigue and physical stress in newborns. To ensure their safety and stability, healthcare teams typically extend hospital observation, providing the necessary rehabilitation and monitoring to prevent further deterioration.

Our research provides healthcare professionals with a visual predictive tool for identifying transfused infants at higher risk of long LOS. By identifying high-risk patients in advance, clinicians can plan for sufficient beds, equipment, and staff in the NICU. The model also identifies potentially modifiable predictors associated with long hospital stays in transfused infants. Future research could study whether modifying care to optimize these predictors shortens LOS.

Limitations

This study has several limitations. First, discharge criteria may differ across hospitals, which may confound the results. Second, although internal validation demonstrated good calibration and discrimination, external validation with additional datasets is still required to confirm the model's reliability. Third, because the study was conducted at the largest children's medical center in the region, there may be a selection bias towards critically ill premature infants. Our findings should therefore be generalized with caution to settings with different patient populations. Furthermore, prospective multicenter studies are needed to establish the robustness and reliability of our results. Finally, future research could explore additional predictors to improve the model's performance.

Conclusions

This study introduced a novel nomogram with satisfactory accuracy for helping clinicians assess the risk of long LOS in infants undergoing blood transfusion. GA (< 28 weeks), BW (< 1000 g), respiratory support type, UVC, sepsis, and rescue frequency (≥ 3 times) were associated with an increased risk of long NICU stay in these infants. The model can help healthcare professionals stratify the risk of long hospital stay in transfused children in the NICU, plan appropriate clinical interventions, and allocate medical resources effectively.