Introduction

Respiratory syncytial virus (RSV) is the most common cause of lower respiratory tract infections in children.1 Worldwide, there are approximately 34 million new cases of RSV-related lower respiratory tract infections per year in children under 5 years of age, of which 10% require hospitalization.2 RSV presents with signs and symptoms of upper and/or lower respiratory tract infection, ranging in severity from rhinorrhea, nasal congestion, fever, and myalgia to wheezing, prolonged expiration, and respiratory failure.3 The primary management for RSV is supportive care, including respiratory support, antipyretics, and hydration.3 Respiratory support ranges from humidified oxygen therapy to endotracheal intubation and mechanical ventilation, in which a breathing tube is inserted into the airway and a machine breathes for the child.4 Children at risk for severe respiratory disease may be given palivizumab, an immunoprophylaxis medication, but its use is restricted to certain populations, such as preterm infants (born before 29 weeks gestational age, or before 34 weeks gestational age with chronic lung disease) and children with severe heart disease.1 Palivizumab is given to prevent RSV infection; it cannot treat the infection once it has been contracted.1 The hospital course for RSV is challenging to predict, partly because care is supportive and there are few evidence-based therapies for this population.5

The length of stay during hospital admission for RSV is estimated to range between 3 and 7 days and depends on the severity of illness and on hospital-level factors that affect efficiency.5,6,7 Prolonged length of stay has been defined as a length of stay greater than or equal to the 90th percentile for a particular patient population or group.8 Prolonged length of stay increases the risk of hospital-related adverse effects and the costs to the healthcare system, patient, and family.6,7,9

Studies have evaluated variables that are associated with the severity of disease or death caused by RSV. Associated comorbidities include preterm birth, congenital heart disease, neuromuscular disease, chronic lung disease, airway anomalies, asthma, trisomy 21, and conditions that impact immune function.10,11,12 Prolonged length of stay for RSV has been associated with several patient-level variables, including younger age, prematurity, co-infection, co-morbidities, and interventions such as intubation and mechanical ventilation.5,6,8,13,14,15

Prediction models estimate an individual patient’s risk (probability) of an outcome and can classify patients as being at higher risk; this focus on individual prediction can often give them greater clinical impact than traditional logistic regression models used to examine associations.16 A model to predict prolonged length of stay in children with RSV would help clinicians identify these patients and prepare families for a longer hospital stay. Logistic regression models have been built to predict the severity of RSV17 or hospitalization for RSV.18 To our knowledge, no model has been built to predict prolonged length of stay in children with RSV.

The overarching aim of this study is to predict prolonged length of stay in children admitted to hospital with RSV. The specific research objectives are: to develop and internally validate a prediction model for prolonged length of stay in children admitted to hospital with RSV; and to externally validate the prediction model using temporal validation.

Methods

This report adheres to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines.19

Data source

We used the National Inpatient Sample (NIS), a sample obtained through a stratified, self-weighted, single-stage cluster design that approximates 20% of all hospitalizations in the United States.20 Data from 2016 were used to develop and internally validate the prediction model, and data from 2017 were used for temporal external validation. Because the NIS contains publicly available, de-identified data, this study was exempt from ethics review by the Conjoint Health Research Ethics Board at the University of Calgary.

Sample

Hospitalizations were included if the patient was under 2 years of age at admission and had a primary diagnosis of RSV in 2016 or 2017. RSV diagnoses were identified using International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes: J12.1 (RSV pneumonia), J20.5 (acute bronchitis due to RSV), J21.0 (acute bronchiolitis due to RSV), and B97.4 (RSV).21 An estimated 75–90% of children admitted to hospital with RSV are ≤12 months old,22 so including children up to 2 years of age should capture the population of interest and is consistent with the literature in the field. Hospitalizations were excluded if the length of stay was <1 day, to ensure that we captured only children who were admitted to hospital. Hospitalizations were also excluded if the child died during the admission or was transferred out to another acute care hospital, as including these may bias the results.
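The inclusion and exclusion rules above can be expressed as a single filter over discharge records. The sketch below is illustrative only: the `Discharge` record and its field names are hypothetical (administrative files use their own variable names), and it assumes diagnosis codes may be stored with or without the decimal point.

```python
from dataclasses import dataclass

# RSV ICD-10-CM codes from the Sample section, stored without the decimal point.
RSV_CODES = {"J121", "J205", "J210", "B974"}


@dataclass
class Discharge:
    # Hypothetical fields; real administrative files use their own names.
    age_years: int              # age at admission, in completed years
    dx1: str                    # primary diagnosis code
    los_days: int               # length of stay in days
    died: bool                  # died during the admission
    transferred_to_acute: bool  # transferred out to another acute care hospital


def in_cohort(d: Discharge) -> bool:
    """Return True if a discharge meets the inclusion/exclusion criteria."""
    if d.age_years >= 2:                         # include only children under 2 years
        return False
    if d.dx1.replace(".", "") not in RSV_CODES:  # primary diagnosis must be RSV
        return False
    if d.los_days < 1:                           # exclude stays shorter than 1 day
        return False
    if d.died or d.transferred_to_acute:         # exclude deaths and transfers out
        return False
    return True


def select_cohort(discharges):
    return [d for d in discharges if in_cohort(d)]


# Illustrative records (not study data).
example = [
    Discharge(0, "J21.0", 3, False, False),  # kept
    Discharge(1, "J121", 8, False, False),   # kept
    Discharge(3, "J210", 4, False, False),   # excluded: 3 years old
    Discharge(0, "J189", 2, False, False),   # excluded: not an RSV code
    Discharge(0, "J210", 0, False, False),   # excluded: stay < 1 day
    Discharge(0, "J210", 5, False, True),    # excluded: transferred out
]
print(len(select_cohort(example)))  # 2
```

Keeping each criterion as a separate early return makes it easy to count how many records each rule removes, which is what a cohort flow diagram needs.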

Variables

The primary outcome was prolonged length of stay, defined as a length of stay at or above the 90th percentile in the data. The patient-level variables included in the initial model were age, sex, race, comorbidities, transfer from another site, household income, health insurance status, and intubation. Age was categorized as <1 month, 1 month to <12 months, and ≥12 months. Sex was included in the initial model despite conflicting evidence on whether male sex is associated with RSV hospitalization and severity.11,12,23,24,25,26 Race was included because others have found an association between non-Hispanic white race and RSV hospitalization.23 The comorbidities included in the model were congenital cardiac disease, chronic lung disease, neuromuscular disease, trisomy 21, and immunosuppression; these were chosen based on the literature and their availability in the NIS database. See Supplemental Information for the ICD-10-CM codes used to identify these variables. Others have found that children who required transfer from an acute care hospital have increased odds of severe RSV,10 so we included this variable. Household income (in quartiles) and health insurance (government insurance, private insurance, self-pay, or other) were included in the initial model because they relate to the child’s socioeconomic status and access to healthcare. Intubation and mechanical ventilation have been associated with increased length of stay in children hospitalized for RSV,13 so intubation was also included. The hospital-level variables considered for the model were urban or rural hospital status and teaching or non-teaching hospital status.20 We anticipated that urban and teaching hospitals would have higher-acuity patients and, consequently, longer lengths of stay.27
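A minimal sketch of how the outcome and the age categories could be derived. Software packages offer several percentile definitions, and with a discrete outcome such as length of stay in whole days they can give different thresholds; the nearest-rank rule used here is one assumed choice, not necessarily the one used in the study. The `is_neonate` flag is a hypothetical input, assumed to approximate the "<1 month" category.

```python
def nearest_rank_percentile(values, pct: int):
    """Smallest value with at least pct% of observations at or below it (pct is an integer)."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct/100 * n) in integer arithmetic
    return ordered[rank - 1]


def flag_prolonged(los_days, pct: int = 90):
    """Return the threshold and a True/False 'prolonged' flag (LOS >= threshold) per stay."""
    threshold = nearest_rank_percentile(los_days, pct)
    return threshold, [los >= threshold for los in los_days]


def age_group(age_years, is_neonate):
    """Three age categories; assumes a neonatal flag approximates '<1 month'."""
    if age_years >= 1:
        return ">=12 months"
    return "<1 month" if is_neonate else "1 to <12 months"


# Illustrative lengths of stay in days (not study data).
threshold, flags = flag_prolonged([1, 2, 2, 3, 3, 3, 4, 5, 7, 9])
print(threshold, sum(flags))  # 7 2
```

Because the outcome is defined with "≥", ties at the threshold all count as prolonged, so the prolonged group can hold somewhat more than 10% of stays.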

Statistical analyses

Data analyses were performed using Stata/IC 15 (Stata Statistical Software: Release 15. College Station, TX: StataCorp LLC). Descriptive statistics were used to examine the data, accounting for the sampling design, and discharge weights were used to produce national estimates.20 Logistic regression models were developed using the 2016 NIS data. Variables associated with the outcome in bivariate analysis (p < 0.2) were included in the initial logistic regression model. All variables were dichotomized or categorized for ease of interpretation. Multicollinearity was assessed using the variance inflation factor.28 Stepwise selection was then used to remove variables, retaining those significantly associated with prolonged length of stay (p < 0.05).
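Two of these steps can be sketched in a few lines: a discharge-weighted proportion and a bivariate screen at p < 0.2. This is a simplified sketch under stated assumptions. The screen uses an unweighted Pearson chi-square test on a 2×2 table for a binary candidate variable, which ignores the survey design (strata, clusters, weights) that design-based survey commands account for, and variables with more than two levels would need an r×c test instead.

```python
import math


def weighted_proportion(outcomes, weights):
    """Discharge-weighted proportion of a 0/1 outcome: sum(w * y) / sum(w)."""
    total = sum(weights)
    return sum(w * y for y, w in zip(outcomes, weights)) / total


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the 2x2 table
    [[a, b], [c, d]] (rows: exposed / unexposed; columns: prolonged / not)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


def screen(tables, alpha=0.2):
    """Keep candidate variables whose bivariate p-value is below alpha."""
    return [name for name, cells in tables.items() if chi2_2x2(*cells)[1] < alpha]


# Illustrative counts (not study data).
tables = {"toy_var_strong": (30, 70, 10, 90), "toy_var_null": (10, 10, 10, 10)}
print(screen(tables))                       # ['toy_var_strong']
print(weighted_proportion([1, 0], [3, 1]))  # 0.75
```

The lenient p < 0.2 cut-off is meant to avoid discarding variables that only become informative after adjustment; the stricter p < 0.05 rule is applied later, during stepwise selection.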

Comparison of model performance (internal validation)

For internal validation, each regression model was refit in bootstrap samples of the same size as the original sample. Bootstrapping is preferred over split-sample or cross-validation designs because it uses the full sample for model development rather than setting part of it aside.29 Bootstrap samples were drawn with replacement, and the procedure was replicated 500 times to generate precise confidence intervals and an optimism correction. The c-statistic, which for dichotomous outcomes is equivalent to the area under the receiver operating characteristic (ROC) curve, was used to quantify the discriminative ability of each model. Discrimination quantifies how well a model separates children hospitalized with RSV who will have a prolonged length of stay from those who will not.30 Calibration, the agreement between predicted and observed outcomes, was assessed using the Hosmer–Lemeshow goodness-of-fit test and calibration slopes.30 Both discrimination and calibration are essential to assess in prediction models, because together they quantify whether the model predicts what it should.
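The bootstrap optimism correction can be sketched for the c-statistic as follows. This assumes the common Harrell-style procedure (refit in each bootstrap sample, compare its performance there with its performance on the original data, subtract the mean difference); the section does not spell out the exact variant. To stay self-contained, the "model" is a deliberately simple stand-in, the observed event rate per level of one categorical predictor, not the study's multivariable logistic regression. In a full analysis every modeling step, ideally including variable selection, would be repeated inside each bootstrap sample.

```python
import random


def c_statistic(probs, outcomes):
    """Concordance (ROC area): share of event/non-event pairs in which the event
    has the higher predicted probability; ties count as one half."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in nonevents:
            score += 1.0 if pe > pn else 0.5 if pe == pn else 0.0
    return score / (len(events) * len(nonevents))


def fit_group_rates(groups, outcomes):
    """Stand-in model: observed event rate per level of one categorical predictor
    (equal to the logistic-regression MLE for a single categorical predictor)."""
    overall = sum(outcomes) / len(outcomes)
    totals = {}
    for g, y in zip(groups, outcomes):
        n, k = totals.get(g, (0, 0))
        totals[g] = (n + 1, k + y)
    rates = {g: k / n for g, (n, k) in totals.items()}
    return lambda new_groups: [rates.get(g, overall) for g in new_groups]


def optimism_corrected_c(groups, outcomes, n_boot=500, seed=2016):
    """Apparent c-statistic minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    n = len(outcomes)
    apparent = c_statistic(fit_group_rates(groups, outcomes)(groups), outcomes)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]       # resample with replacement
        bg = [groups[i] for i in idx]
        by = [outcomes[i] for i in idx]
        if not 0 < sum(by) < n:                           # need both outcome classes
            continue
        model = fit_group_rates(bg, by)                   # refit in the bootstrap sample
        c_boot = c_statistic(model(bg), by)               # performance where it was fit
        c_test = c_statistic(model(groups), outcomes)     # performance on original data
        optimism.append(c_boot - c_test)
    return apparent, apparent - sum(optimism) / len(optimism)


# Illustrative data (not study data): event rates of 20% and 60% in two groups.
groups = ["a"] * 20 + ["b"] * 20
outcomes = [1] * 4 + [0] * 16 + [1] * 12 + [0] * 8
apparent, corrected = optimism_corrected_c(groups, outcomes)
print(round(apparent, 3), round(corrected, 3))
```

The pairwise loop is quadratic in sample size, which is fine for a sketch; on a sample the size of the NIS cohort a rank-based formula would be used instead.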

Comparison of model performance (external validation)

The 2017 data were used to assess the external validity of the models. Discrimination and calibration of the models in the 2017 sample were assessed using the same methods described above.
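One common way to compute a calibration slope in a validation sample is to apply the development model's coefficients unchanged to the new data, take the resulting log-odds (the linear predictor), and fit a logistic regression of the observed outcome on it. A slope near 1 and an intercept near 0 indicate good calibration; a slope below 1 suggests predictions that are too extreme. The coefficients and data below are invented placeholders, and the section does not state which routine was used, so treat this as a sketch of the idea.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(intercept, coefs, x):
    """Log-odds from a frozen development-year model (no re-estimation)."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def calibration_slope(lp, y, max_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = a + s * lp by Newton-Raphson and return (a, s)."""
    a, s = 0.0, 1.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(lp, y):
            p = sigmoid(a + s * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det  # Newton step: inverse information times score
        ds = (h00 * g1 - h01 * g0) / det
        a, s = a + da, s + ds
        if abs(da) + abs(ds) < tol:
            break
    return a, s


# Illustrative data (not study data): the frozen model's log-odds are -/+ ln 3
# (risks 25% and 75%), but observed risks are 10% and 90% (log-odds -/+ 2 ln 3).
lp = [-math.log(3)] * 10 + [math.log(3)] * 10
y = [1] + [0] * 9 + [1] * 9 + [0]
a, s = calibration_slope(lp, y)
print(round(s, 6))  # 2.0
```

Here the slope of 2 says the frozen model's predictions are not extreme enough for the new data. Note that for a logistic model evaluated on its own development data, this apparent slope is exactly 1 by construction, so the slope is most informative in bootstrap or external samples.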

Results

The total sample size was 9589 hospitalizations. The mean length of stay was 3.7 days (95% confidence interval (CI): 3.6–3.8). The 90th percentile of length of stay was 7 days, so the outcome was dichotomized into ≥7 days and <7 days. Hospitalizations of children <1 month old had the highest proportion of prolonged length of stay (20.1%), compared with those aged 1 to <12 months (10.8%) and those ≥12 months (7.9%). Children transferred from another acute care hospital had a higher proportion of prolonged length of stay (18.3%) than those not transferred (10.0%). Most hospitalizations that required intubation had a prolonged length of stay (94.1%). See Table 1 for the sample characteristics. The mean total charges for the hospital stay were $16,709 USD (95% CI: $16,389–17,032) for hospitalizations <7 days and $100,052 USD (95% CI: $91,688–108,416) for hospitalizations lasting 7 days or longer.

Table 1 Sample demographics.

The first model (Model 1) included age, transfer from an acute care hospital, intubation, comorbidities (congenital cardiac disease, chronic lung disease, neuromuscular disease, trisomy 21, and immunosuppression), urban or rural status, and teaching status (Table 2). Model 1 had an area under the ROC curve of 0.73 (95% CI: 0.71–0.75), demonstrating good predictive ability (Fig. 1a). The Hosmer–Lemeshow goodness-of-fit test was non-significant (p = 0.44), indicating no evidence of disagreement between the predicted and observed outcomes. The calibration slope was 1.0, consistent with agreement between the predicted and observed outcomes (Fig. 1b). Using a classification cut-off of 0.2, the model’s sensitivity was 33.7% and specificity was 96.2%; the positive predictive value was 52.4%, and the negative predictive value was 92.1%.
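Unlike the ROC area, sensitivity, specificity, and predictive values depend on the classification cut-off (here 0.2), and the predictive values also depend on how common the outcome is. The sketch below computes the four measures from predicted probabilities and shows the Bayes relation between them. With an assumed prevalence of 11% (an illustrative value in line with a 90th-percentile cut-off on length of stay in whole days, not a reported figure), Model 1's reported sensitivity and specificity imply a positive predictive value near 52% and a negative predictive value near 92%, close to the reported values.

```python
def _ratio(num, den):
    return num / den if den else float("nan")


def classification_metrics(probs, outcomes, cutoff=0.2):
    """Sensitivity, specificity, PPV, and NPV when predicted risk >= cutoff counts as positive."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, outcomes):
        positive = p >= cutoff
        if positive and y == 1:
            tp += 1
        elif positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by Bayes' theorem for a given outcome prevalence."""
    ppv = sensitivity * prevalence / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
    npv = specificity * (1 - prevalence) / (
        specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)
    return ppv, npv


ppv, npv = predictive_values(0.337, 0.962, 0.11)
print(round(ppv, 3), round(npv, 3))  # 0.523 0.922
```

The same arithmetic explains why a model with 96% specificity still has a modest positive predictive value: when only about one stay in nine is prolonged, even a small false-positive rate among the many short stays is comparable in size to the true positives.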

Table 2 Model 1—β coefficients and odds ratios.
Fig. 1: Internal and external validation.

a ROC curve for Model 1 (internal validation); b calibration slope for Model 1 (internal validation); c ROC curve for Model 1 (external validation); d calibration slope for Model 1 (external validation).

A second model (Model 2) was developed to test whether the specific comorbidities were highly predictive, or whether the number of comorbidities a child had could replace these variables. There were 454 (4.7%) RSV hospitalizations in which the child had one comorbidity and 73 (0.7%) in which the child had two. This model included age, transfer status, intubation, the number of comorbidities, urban or rural status, and teaching status. Model 2 had an area under the ROC curve of 0.73 (95% CI: 0.71–0.75), demonstrating predictive ability similar to that of Model 1. The Hosmer–Lemeshow goodness-of-fit test was non-significant (p = 0.38), indicating no evidence of disagreement between the predicted and observed outcomes. Again, the calibration slope was 1.0, consistent with agreement between the predicted and observed outcomes. With a classification cut-off of 0.2, Model 2 had similar sensitivity to and slightly higher specificity than Model 1 (33.3% and 96.3%, respectively).
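Models 1 and 2 differ only in how comorbidity enters the model: five separate indicators versus a single count. The sketch below shows the two encodings; the field names are hypothetical, and whether the count was entered as a linear term or as categories is not stated here. If it is entered as a single linear term, the model assumes each additional comorbidity shifts the log-odds by the same amount regardless of which comorbidity it is, in exchange for estimating one parameter instead of five.

```python
# The five study comorbidities, as hypothetical boolean field names.
COMORBIDITIES = (
    "congenital_cardiac",
    "chronic_lung",
    "neuromuscular",
    "trisomy_21",
    "immunosuppression",
)


def model1_features(flags):
    """Model 1 style: one 0/1 indicator per comorbidity."""
    return [int(bool(flags.get(name, False))) for name in COMORBIDITIES]


def model2_feature(flags):
    """Model 2 style: a single count of comorbidities present."""
    return sum(model1_features(flags))


child = {"chronic_lung": True, "trisomy_21": True}
print(model1_features(child), model2_feature(child))  # [0, 1, 0, 1, 0] 2
```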

A third model (Model 3) was developed to test how well the model performed without intubation; it included age, transfer status, the number of comorbidities, urban or rural status, and teaching status. Model 3 had an area under the ROC curve of 0.70 (95% CI: 0.69–0.71), indicating slightly lower discrimination than Models 1 and 2. The Hosmer–Lemeshow goodness-of-fit test was non-significant (p = 0.32), suggesting adequate model fit.
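The Hosmer–Lemeshow p-values reported for the three models come from grouping hospitalizations by predicted risk and comparing observed with expected events in each group. The sketch below assumes equal-sized groups formed after sorting by predicted probability, with ties split by input order; statistical packages may group differently, particularly when predictions are tied. The p-value uses the closed-form upper tail of the chi-square distribution, which holds for an even number of degrees of freedom (groups minus 2, so 8 for the usual ten groups).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic with `groups` equal-sized risk groups (df = groups - 2)."""
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])  # stable: ties keep input order
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


# 15.507 is the usual 5% critical value of chi-square with 8 df.
print(round(chi2_sf_even_df(15.507, 8), 3))  # 0.05
```

A non-significant result means the test found no evidence of miscalibration; with large samples the test becomes sensitive to trivial departures, and with small ones it has little power, which is why the calibration slope is reported alongside it.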

Sex, race, household income, and health insurance were considered in the initial model development for all models; however, none of these variables was found to predict prolonged length of stay for RSV hospitalizations.

All three models were externally validated using the 2017 NIS data (see Table 1 for the sample characteristics of these data). The models performed similarly in the external validation analysis. For Model 1, the area under the ROC curve in the 2017 data was 0.74 (95% CI: 0.72–0.76), demonstrating similar predictive ability in a different sample (Fig. 1c). The calibration slope for Model 1 in the 2017 data was 1.0, demonstrating agreement between the predicted and observed outcomes (Fig. 1d). For Model 3, the area under the ROC curve in the 2017 data was 0.74 (95% CI: 0.72–0.76) and the calibration slope was 1.0, showing that removing intubation did not change the model’s predictive ability or fit in the new dataset.

Returning to the 2016 NIS data, 99 hospitalizations had a predicted probability of prolonged length of stay of ≥95% under the second model. The majority of these patients (69.4%) were between 1 month and 1 year old. Only 7.1% of the total sample was under 1 month old; however, this age group accounted for 19.4% of the hospitalizations with a ≥95% predicted probability of prolonged length of stay. Most (75.7%) of the hospitalizations with a ≥95% predicted probability of prolonged length of stay involved intubation. In 52.5% of these hospitalizations the patient had one comorbidity, and in 9.1% the patient had two. Almost all occurred in urban hospitals (90.1%) and teaching hospitals (85.9%). The majority did not involve a transfer from another acute care hospital (69.7%); however, a higher percentage of hospitalizations in this group involved a transfer (30.3%) than in the overall sample (12.1%).

Comment

The variables included in this prediction model are consistent with the literature. Others have found that younger age,12,13,31 transfer from another site,10 and comorbidities, including trisomy 21,14 congenital cardiac disease,13,32 and bronchopulmonary dysplasia,12,31 are associated with increased severity of RSV. Other studies have found factors with moderate associations with severity of illness, such as maternal smoking during pregnancy,25 siblings ≥2 years of age,22,25 and daycare exposure;22 we were unable to include these variables in the model because such data were not available. Intubation for RSV has been associated with increased length of stay in other cohorts,10,13 which makes sense from a clinical perspective: children who require intubation have a higher severity of illness and need the highest level of respiratory support. However, one limitation of the NIS data is that the variables are not time-stamped, so we were unable to determine when during the hospital stay the child was intubated. We therefore developed Model 3, excluding intubation, to provide a model that included only variables that would be known at the time of admission. Interestingly, all three models performed similarly in external validation, suggesting that removing intubation from the model did not affect its performance.

The internal validation demonstrated good discrimination for the models. All three models had high specificity, correctly identifying >95% of the patients who did not have a prolonged length of stay. High specificity, together with the high negative predictive value (92.1% for Model 1), is clinically relevant: a child classified as low risk will most likely have a length of stay <7 days. Clinicians can use this information to identify the children who will most likely have a shorter, less costly hospital stay, reassuring families and the healthcare team. Conversely, the models correctly identified only about one-third of the patients who actually had a prolonged length of stay, demonstrating low sensitivity. This may be due to variables that are not captured in the NIS database, such as birth history and pre-existing conditions. As with any administrative database, information is captured only if it was documented during the admission.33 Other studies have demonstrated that the validity of certain conditions, such as congenital anomalies, varies with the database used, which can bias the analyses.34 To address this concern, future research could externally validate the model using clinical data. Nonetheless, an administrative dataset offered this study low data collection cost and labor and a large sample size.

The descriptive analyses also revealed a large difference in estimated mean charges between children admitted for RSV with and without a prolonged length of stay ($100,000 vs $17,000 USD). The higher charges likely reflect not only the longer hospitalization but also the greater acuity associated with a prolonged length of stay, which requires additional interventions such as intubation and intensive care admission. Given these stark differences, predicting which patients are at risk for a prolonged length of stay should be of interest to hospital administrators and unit managers.

This study has several strengths, including its large sample size and external validation in a temporally distinct sample. The large sample size provided adequate power to develop and validate the models. The external validation produced results very similar to the internal validation measures, supporting the transportability of the models.30,35 This suggests that the models are generalizable to the population this sample represents (hospitalizations in the United States). Future studies could explore the generalizability of the model in another population, separate from the NIS database.

Using an administrative database has limitations. We relied on ICD-10 codes to identify the relevant diagnoses and procedures; if a hospitalization record was coded incorrectly, there is a risk of both selection and misclassification bias. When possible, we used ICD-10 codes that had been validated in other studies. A recent study evaluated ICD-10 codes for RSV surveillance and found them to have low sensitivity (6%, 95% CI: 3–12%) but high specificity (99.8%, 95% CI: 99.6–99.9%).21 The low sensitivity may lead to underestimation of the true prevalence of RSV in the population, and some patients admitted to hospital with RSV may have been excluded; however, given the high specificity, we are confident that most hospitalizations identified as RSV are true positive cases.

Missing data are another concern when using the NIS.36 Race, for example, is missing in approximately 8% of cases because not all sites provide these data. To maintain a large sample size and avoid listwise deletion, we coded missing race as “unknown.” We were also unable to determine whether a child was represented twice in the database, as the NIS contains individual admissions, not individually identified patients. Based on prior research demonstrating low readmission rates for pediatric lower respiratory tract infections,37 we expect the number of patients admitted for RSV twice in one year to be small and unlikely to affect our findings.

Certain variables associated with prolonged length of stay with RSV, such as a history of preterm birth,18 are not captured in the NIS database. To mitigate this limitation, we used the ICD-10 code for chronic respiratory disease originating in the perinatal period (classified as chronic lung disease) to capture infants born preterm who experience lasting respiratory effects. Additionally, we were unable to categorize age into smaller groups, which limits the analysis because the risk of severe infection is not uniform across the first year of life.38 Given these limitations, we believe that developing a clinical decision-making tool would be misleading at this time. Next steps should involve capturing clinically relevant variables and collaborating with clinicians to develop probability thresholds that could alert the healthcare team to the children most likely to have a prolonged length of stay. This could benefit patients and families by preparing them for a longer hospital stay, and could help ensure that hospital units are appropriately staffed.

Conclusions

The analysis suggests that the variables that predict a prolonged length of stay for RSV include younger age, transfer from an acute care hospital, intubation, an increased number of comorbidities (or, specifically, one or more of congenital cardiac disease, chronic lung disease, neuromuscular disease, trisomy 21, or immunosuppression), and admission to an urban or teaching hospital. These results are consistent with previous literature demonstrating that these variables are associated with increased severity of disease or increased length of stay with RSV.1,10,11,12,26 Future studies could explore clinical presentations of children with RSV that predict prolonged length of stay, which may improve the predictive ability of this model. With electronic medical records in use in most acute care hospitals,39 access to clinical data such as vital signs, respiratory assessments, oxygen requirements, and interventions may facilitate the development of a more clinically usable prediction model.