Introduction

Acute respiratory infections (ARIs) in children are a major reason for healthcare visits, especially in ambulatory settings. In a national survey, ARIs were the third most common primary diagnosis in office visits after hypertension and routine well-child checks in children and adults.1 Children have 1–8 ARIs per year depending on age, exposures, and individual susceptibility.2,3,4,5 In the United States, these non-influenza viral respiratory tract infections cost over 40 billion dollars annually, including both direct healthcare costs and indirect costs related to missed days of school and work.6 Three billion dollars are spent annually on over-the-counter medications for symptomatic relief of ARIs despite little evidence to support their efficacy in children.6

Accurate measurement of ARI symptoms is critical for both epidemiologic studies and clinical trials to further understand illness severity, duration, and response to treatment. Various validated assessment instruments exist for children with bronchiolitis,7,8,9,10,11,12,13,14,15,16,17 but few are available for children older than 2 years of age with ARIs. Randomized controlled trials in these children,18,19 predominantly focusing on symptomatic therapies, usually use their own assessment measures that lack previous validation. This lack of consistent and validated measurement tools for assessing children in the ambulatory setting makes it difficult to conduct or compare observational and therapeutic studies.

Earlier assessment instruments for children with ARIs concentrated solely on symptom severity. The Canadian Acute Respiratory Illness and Flu Scale was the first ARI severity measure in children to assess functional and parental impact as well as symptom severity.20 The addition of these components is essential to understanding the full scope of illness in children who are not always able to communicate their symptoms effectively.

Recognizing the importance of including both symptom severity and functional impact, the Wisconsin Upper Respiratory Symptom Survey (WURSS) was created.21 WURSS is an illness-specific quality of life instrument that was developed and validated to assess the impact of ARIs in adults.22 It is available in versions of different lengths and in more than 20 languages, and has been used in dozens of clinical trials in several countries.23,24,25,26

The aim of this study was to assess the validity and reliability of a pediatric version of the WURSS, the WURSS for Kids (WURSS-K). WURSS-K is a 15-item instrument that focuses on both illness-specific symptoms and impact on quality of life, and is designed for use in children 4–10 years of age.

Methods

Phase 1: Development of WURSS-K

Phase 1 involved the development of the initial WURSS-K instrument aimed at assessing symptoms and quality of life using visual representation of happy and sad faces to assist Likert scale ratings, similar to Wong–Baker faces.27 The WURSS-21 provided the initial questionnaire format.23 Iterative review and refinement of the evolving questionnaire instrument was carried out by a multidisciplinary team, including two pediatricians (Wald, Gern), a family physician (Barrett), a psychometrics expert (Brown), a physician in postdoctoral research fellowship (Hayer), and an expert in questionnaire design from the University of Wisconsin Survey Center (Dykema). Research nurses trained in cognitive debriefing strategies then interviewed children regarding content, format, and ability to understand each questionnaire item. Audio recordings of these interviews were reviewed individually by team members and then discussed in a group format. Following these discussions and revisions, the WURSS-K daily symptom report was finalized in 2014.

Participants in this development process included 14 children, 4–6 years of age, who were enrolled in an ongoing research study.3 These children had upper respiratory symptoms starting 1–3 days before the interviews. The child’s parent was present during the interview. For children requiring assistance, the parent was allowed to explain questions or response options.

WURSS-K includes 15 items, 14 of which are answered on a 4-point ordinal Likert-type scale (0 = not sick/do not have this/not at all, 3 = very sick/very bad/very hard) (Fig. 1). Questionnaire items include those related to global illness severity (item 1), severity of symptoms (items 2–7), functional impacts (items 8–14), and a comparison item to evaluate change over days of illness (item 15). Happy and sad face representations are included along with the ordinal scales to facilitate survey completion by children. A question at the end of the form asks children to indicate whether the questionnaire was completed: (1) all by myself, (2) with some help, or (3) with a lot of help.

Fig. 1: Wisconsin Upper Respiratory Symptom Survey for Kids (WURSS-K)—daily symptom report.

WURSS-K questionnaire items include global illness severity (item 1), severity of symptoms (items 2-7), functional impacts (items 8-14), and evaluation of change over time (item 15).

Scoring of the daily survey in Fig. 1 uses the sum of unweighted scores for each item. The sum of the 14 items on a 4-point ordinal scale results in a global total score ranging from 0 (low symptoms and low functional impairment) to 42 (high symptoms and high functional impairment). Item 15, which evaluates for change between days of illness, is not included in the overall score. For the two factors of symptoms and functionality, the total symptoms span from 0 (low) to 18 (high) and the total functionality spans from 0 (low impairment) to 21 (high impairment).
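The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' scoring software: the function name `score_wurss_k`, the dictionary input format, and the choice to raise an error on missing or out-of-range responses are assumptions made for the example.

```python
from typing import Dict

SCORED_ITEMS = range(1, 15)    # items 1-14; item 15 is not part of the score
SYMPTOM_ITEMS = range(2, 8)    # items 2-7
FUNCTION_ITEMS = range(8, 15)  # items 8-14


def score_wurss_k(responses: Dict[int, int]) -> Dict[str, int]:
    """Sum unweighted item ratings (0-3) into global and subscale scores.

    `responses` maps item number to rating. Hypothetical input format;
    missing items are rejected here rather than imputed.
    """
    for item in SCORED_ITEMS:
        if item not in responses:
            raise ValueError(f"missing response for item {item}")
        if responses[item] not in (0, 1, 2, 3):
            raise ValueError(f"item {item} must be rated 0-3")
    return {
        "global": sum(responses[i] for i in SCORED_ITEMS),           # 0-42
        "symptoms": sum(responses[i] for i in SYMPTOM_ITEMS),        # 0-18
        "functionality": sum(responses[i] for i in FUNCTION_ITEMS),  # 0-21
    }
```

Note that the global score includes item 1 (global illness severity), so it can exceed the sum of the two subscale scores by up to 3 points.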

Phase 2: Administration of WURSS-K

Following development of the WURSS-K instrument, psychometric testing was done using two separate populations: (1) a locally recruited sample in Madison, Wisconsin (Madison sample) and (2) participants in an ongoing multicity research project (Urban Environment and Childhood Asthma (URECA) sample).28

For the Madison sample, volunteers were recruited by advertisements posted in the local community from October 2014 to December 2016. Inclusion criteria were an age of 4–10 years and willingness to answer daily symptom surveys during an ARI episode. The majority of participants were enrolled before exhibiting ARI symptoms, and were asked to call study personnel at the first sign of cold symptoms.

Study team members met with parents and children to obtain parental informed consent and child assent, review study design, explain the WURSS-K instrument, and collect baseline demographic information. When the child developed ARI symptoms, a parent called the study phone and was asked “Do you believe that your child has a cold?”. After the parent answered “Yes,” the parent was asked questions about symptoms using modified Jackson criteria29 (Supplementary Table 1). The child was classified as having an ARI if the parent answered “Yes” to the initial question and to one of the questions regarding nasal symptoms or sore throat. Once the child was classified as having an ARI, the child (or parent as proxy) filled out a WURSS-K booklet daily during the ARI episode. An ARI episode was considered complete when the child answered “Not sick” 2 days in a row on item 1 of WURSS-K. Booklets were mailed back and the data were entered into a REDCap database.30 Participants received minimal compensation for joining the study and for each booklet returned during an ARI.
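The classification and episode-completion rules in this paragraph can be expressed as a short Python sketch. The function and parameter names are hypothetical, the actual screening questions are those in Supplementary Table 1, and "one of the questions" is read here as "at least one", which is an assumption.

```python
from typing import Optional, Sequence

NOT_SICK = 0  # item 1 rating corresponding to "Not sick"


def is_ari(parent_believes_cold: bool, nasal_symptoms: bool,
           sore_throat: bool) -> bool:
    """Parent says the child has a cold AND reports nasal symptoms or sore throat."""
    return parent_believes_cold and (nasal_symptoms or sore_throat)


def episode_end_index(item1_by_day: Sequence[int]) -> Optional[int]:
    """Return the 0-based index of the second consecutive "Not sick" day.

    `item1_by_day` holds the daily item 1 ratings in order. Returns None
    if the episode never meets the two-days-in-a-row completion rule.
    """
    for day in range(1, len(item1_by_day)):
        if item1_by_day[day] == NOT_SICK and item1_by_day[day - 1] == NOT_SICK:
            return day
    return None
```

For example, a child who reports "Not sick" on one day but is sick again the next day does not end the episode; only two consecutive "Not sick" days do.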

In the URECA sample, the WURSS-K was administered to 9- and 10-year-old children at their routine study follow-up visits. URECA is an observational prospective study that enrolled pregnant women living in urban areas throughout the United States (Baltimore, Boston, New York City, and St. Louis) and has followed their children from birth.28 This cohort consists mainly of ethnic minorities; enrollment criteria included having at least one parent with allergic disease or asthma, and residence in an area in which at least 20% of the families have a household income below the poverty line. URECA study team members asked participants the first item of WURSS-K (“How sick do you feel today?”) and noted their responses. If the response was other than “Not sick,” the remainder of WURSS-K was completed by the participant (or a proxy). In order to avoid potential confounders, children with a diagnosis of asthma were removed from our analysis.

This study was approved by the University of Wisconsin School of Medicine and Public Health Institutional Review Board.

Phase 3: Psychometric analysis of WURSS-K

The WURSS-K questionnaire items are grouped into symptoms (six items) and functionality (seven items). Descriptive statistics of these 13 items included measures of central tendency, dispersion, and a polychoric correlation matrix, which optimizes the analysis of ordinal data. Missing data were assessed as either missing completely at random (MCAR) or missing at random.31 Multiple imputation by chained equations was used for missing values if the assumption of MCAR was accepted.32 Since some of the participants experienced multiple ARIs during the study period, psychometric structural analysis was based on ARI episodes and not on subjects. A cluster factor analysis33 was then used to model the ARIs clustered within subjects to adjust the standard errors for nested dependency.

Analyses were conducted in five stages, using classical test theory for the first four and invariance across the two sample populations for the last one. Each of these stages is important in the evaluation of an instrument to determine (1) construct validity in the first stage, (2) internal reliability in the second stage, (3) whether to use as a global or a two-factor assessment (construct validity) in the third stage, (4) difficulty of items in the fourth stage, and (5) how it performs in different populations in the fifth stage.

In the first stage, confirmatory factor analysis was used to model the factor (congeneric) structure of WURSS-K. The congeneric model is one type of measurement in which a factor is measured by several observed items.34 WURSS-K was defined as a two-factor correlated model, with factors of (1) symptoms and (2) functionality.

In the second stage, reliability was assessed by a series of hierarchical measurement models, ranging from least to most restrictive: factor structure, tau equivalent, partial tau equivalent, and parallel.35 The tau-equivalent model assumes that each of the individual items measures the same factor on the same scale with the same degree of precision, but with different amounts of error. The corresponding reliability coefficient of the tau-equivalent model is Cronbach’s alpha. Byrne, Shavelson, and Muthen36 have proposed relaxing some of the tau-equivalency restrictions to obtain a partial tau-equivalent model, in which equality constraints are imposed on some but not all loadings. This less restrictive model then allows the model fit to be reassessed after some loadings are freely estimated. The parallel model assumes that each individual item measures the same domain, on the same scale with the same degree of precision and the same amount of error. Each of these models was assessed for goodness of fit to determine the best representation of the data, and to assess which measure of reliability would be considered the most appropriate.
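For readers unfamiliar with Cronbach's alpha, the coefficient for k items is k/(k − 1) × (1 − Σ item variances / variance of the total score). The sketch below computes it from raw item scores using only the Python standard library. It is an illustration of the textbook formula, not the software used in this study, and the function name and row-per-respondent input layout are assumptions.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from raw scores; each row is one respondent's item ratings."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha equals 1 when all items move together perfectly and approaches 0 when items are unrelated. It is an appropriate reliability estimate only when the tau-equivalent model holds, which is why the fit of that model is tested before alpha is interpreted.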

In the third stage, bifactor analysis37 was used to evaluate the unidimensional or global nature of WURSS-K. Along with parameter and fit measures, explained common variance (ECV) and factor determinacy (FD) were estimated. ECV provides the extent to which an item’s responses are accounted for by variation on the single global factor. ECV > 0.70 demonstrates that the common variance can be regarded as unidimensional with a slight relative bias.38 FD assesses the correlation between the estimated score and the underlying factor; values range from 0 to 1 with FD > 0.90 being ideal.39 Finally, relative parameter bias was estimated, which is the difference between an item’s loading in the unidimensional solution and its general factor loading in the bifactor, divided by the general factor loading in the bifactor. According to Muthén et al., parameter bias <10–15% is acceptable.40,41
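The two bifactor summary indices described above can be illustrated with a short Python sketch. It assumes standardized loadings have already been estimated by some factor-analysis software; the function names are hypothetical, and the ECV formula used is the common textbook definition (squared general-factor loadings divided by all squared common loadings), which may differ in detail from the software used in this study.

```python
from typing import Sequence


def explained_common_variance(general: Sequence[float],
                              specific: Sequence[float]) -> float:
    """Share of common variance attributable to the general factor.

    `general` holds each item's general-factor loading; `specific` holds
    each item's loading on its own group factor (symptoms or functionality).
    """
    g = sum(loading ** 2 for loading in general)
    s = sum(loading ** 2 for loading in specific)
    return g / (g + s)


def mean_abs_relative_bias(unidimensional: Sequence[float],
                           general: Sequence[float]) -> float:
    """Average over items of |(unidimensional loading - general loading) / general loading|."""
    if len(unidimensional) != len(general):
        raise ValueError("loading vectors must have the same length")
    biases = [abs((u - g) / g) for u, g in zip(unidimensional, general)]
    return sum(biases) / len(biases)
```

With these definitions, an ECV above 0.70 and an average bias below roughly 0.10 to 0.15 would correspond to the thresholds cited in the text.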

The fourth stage addresses item importance based on item response theory (IRT), using a two-faceted Rasch partial credit model. This model uses infit mean square (INFIT MNSQ) and outfit mean square (OUTFIT MNSQ) statistics to compare the fit of the observed data to values expected by the Rasch model.42,43,44 While item fit indices of 1.0 are ideal,45 Wright and Linacre have suggested that items are “fit” if their MNSQ falls within the range 0.6–1.4, with the range 0.5–1.5 considered acceptable.44 Higher scores indicate a wider variation in the response to an item, which suggests more difficulty, whereas lower (or negative) scores indicate less difficulty.
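As an illustration of how the fit statistics are formed, the sketch below computes the standard infit and outfit mean squares for one item and applies the ranges quoted above. It assumes the model-expected scores and model variances for each response have already been obtained from a fitted Rasch model (no model is fitted here). The function names and the "misfit" label for values outside 0.5–1.5 are assumptions for the example.

```python
from typing import Sequence


def outfit_mnsq(observed: Sequence[float], expected: Sequence[float],
                variance: Sequence[float]) -> float:
    """Unweighted mean of squared standardized residuals."""
    z2 = [(x - e) ** 2 / w for x, e, w in zip(observed, expected, variance)]
    return sum(z2) / len(z2)


def infit_mnsq(observed: Sequence[float], expected: Sequence[float],
               variance: Sequence[float]) -> float:
    """Information-weighted mean square: sum of squared residuals over sum of variances."""
    squared_residuals = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return squared_residuals / sum(variance)


def classify_fit(mnsq: float) -> str:
    """Apply the 0.6-1.4 ("fit") and 0.5-1.5 ("acceptable") ranges from the text."""
    if 0.6 <= mnsq <= 1.4:
        return "fit"
    if 0.5 <= mnsq <= 1.5:
        return "acceptable"
    return "misfit"  # label chosen for this sketch
```

Outfit is more sensitive to unexpected responses from children far from an item's difficulty, whereas infit weights responses near the item's difficulty more heavily, which is why both are reported.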

In the last stage, measurement invariance analysis was used to compare the URECA and Madison samples to see whether the structure remained the same despite any potential differences between the two study populations. If measurement invariance fails, it is evidence that the items do not measure the same factor in the same way in different situations. The most basic requirement is configural or pattern invariance, which addresses whether the configuration of the model is similar across groups. If this is not accepted, the usefulness of the scale in alternate populations should be questioned.

Results

There were 249 ARI episodes with a complete WURSS-K instrument, comprising 130 and 119 ARI episodes among 58 and 105 children, respectively, from the Madison and URECA samples (Fig. 2). Children in the Madison group completed a WURSS-K symptom report once daily throughout each illness episode, up to a maximum of 21 days; there was a mean of 2.2 (SD 1.5) ARI illness episodes per participant during the study period.

Fig. 2: Flow chart of participants.

Children included in the validation of WURSS-K with the Madison sample (on the left) and URECA sample (on the right).

Participant demographics are shown in Table 1. The mean age at time of WURSS-K completion was 6.7 (SD 1.8) years in the Madison sample and 9.5 (SD 0.5) years in the URECA sample. In addition to being older, the URECA sample had more participants who were female, African American, or of Hispanic ethnicity, and had a lower annual household income.

Table 1 Participant demographics for the children included in the Madison and URECA sample populations.

WURSS-K total mean scores from the Madison sample were 9.6 (SD 5.0) on day 1, 6.8 (SD 5.4) on day 3, and 4.6 (SD 4.9) on day 10 (Fig. 3). The URECA sample had a mean global score of 8.2 (SD 6.5), with symptom and functionality scores of 4.8 and 3.4. Psychometric structural analysis in the Madison sample focused on day 3 of symptoms as this was the day during an ARI episode when the symptom complex stabilized. On day 3, the Madison sample had a mean global score of 6.8 (SD 5.4), with the symptom and functionality scores of 4.7 and 2.0. Missing data analysis showed completeness of 96% (p = 0.02) for day 3 in the Madison sample and 95% (p = 0.04) for the URECA sample. Based on Little’s MCAR test,31 the missing values in both samples indicated that the assumption of MCAR was accepted, allowing imputation using multiple imputation by chained equations.32

Fig. 3: Change of WURSS-K scores over time during ARI episodes in the Madison sample using notched boxplots.

a WURSS-K global score; b WURSS-K total symptom score; c WURSS-K total functionality score. n, number of ARI episodes included on each day. Closed circles represent outliers.

There were 1029 and 119 total days of ARI illness with a completed WURSS-K from the Madison and URECA samples, respectively. The last item on this report (“I completed this page: all by myself, with some help, or with a lot of help”) was completed by 97% of children with 39% (433/1114) “all by myself,” 44% (490/1114) “with some help,” and 17% (191/1114) “with a lot of help.” Mean ages of children were 8.0 (SD 1.5), 6.7 (SD 1.9), and 6.1 (SD 1.6) years for responses of “all by myself,” “with some help,” and “with a lot of help,” respectively.

Psychometric results

Table 2 includes the goodness of fit for the models from the first four stages. The two-factor congeneric model (Model 1) was supported with high FDs of 0.98 for symptom factor and 0.92 for the functionality factor.

Table 2 Psychometric analysis of WURSS-K using goodness of fit with five models.

The tau-equivalent model (Model 2) was not supported based on rejection of χ2 difference test (χ2 diff 40.05, p < 0.001). Cronbach’s alpha coefficients were 0.67 and 0.82 for symptom and functionality factors. Since the two-factor tau-equivalency model was rejected, the use of Cronbach’s alpha may be biased, underestimating the reliability. Therefore, a partial tau-equivalent model (Model 3) was subsequently pursued, in which two items were freed, which was accepted by the χ2 difference test (χ2 diff 16.85, p = 0.052). The corresponding omega reliability assessment coefficients are 0.72 and 0.91 for symptoms and functionality.

Since evidence supported two separate factors (symptom and functionality), a global assessment was evaluated with a bifactor model (Model 5). The bifactor model yielded an FD of 0.98, with reliability coefficients of Cronbach’s alpha 0.85 and omega 0.90. ECV was 0.72, greater than the acceptable value of 0.70. The average absolute relative parameter bias across items was 10.4%, which is within the accepted range40,41 and allows the use of WURSS-K as either a single global factor or a two-factor assessment.

Item response theory analysis of the WURSS-K indicated adequate item fit, with all INFIT and OUTFIT MNSQ values within 0.5–1.5 (Table 3). The Rasch analysis indicated strong support for unidimensional integrity, explaining 59% of the variance; Reckase suggests that this level is acceptable for the unidimensionality assumption.46,47 The easiest items were those under “How bad are your cold symptoms”: “stuffy nose” (item 3), “runny nose” (item 2), and “cough” (item 6). The items that appeared to be most difficult for participants were “How hard has it been to think” (item 8), “How hard has it been to go to school” (item 13), “How hard has it been to play with friends” (item 14), and “How hard has it been to walk, climb stairs, exercise” (item 12).

Table 3 WURSS-K items sorted by item difficulty from easiest to hardest using infit mean square (INFIT MNSQ) and outfit mean square (OUTFIT MNSQ) statistics.

The two sample populations (Madison vs URECA) were compared using a two-factor configural invariance model, which was accepted (Supplementary Table 4). The weak invariance model, based on the partial tau-equivalent model, fit the data fairly well across the two groups (χ2 difference 13.97, p = 0.23) according to suggested fit criteria.48 Further levels of invariance across groups, both strong and strict, were not supported, with χ2 differences of 45.68 (p = 0.004) and 54.84 (p < 0.001), respectively. These findings suggest that the measurement of the two-factor WURSS-K model did not differ substantially across the two sample populations, thereby demonstrating that the instrument was robust across groups with different demographics.

Discussion

These analyses demonstrate that WURSS-K is a valid and reliable instrument for assessing ARI in children 4–10 years old. The congeneric (Model 1) and bifactor (Model 5) model results support the validity of WURSS-K as either a two-factor structure or a single global structure. The acceptance of the partial tau-equivalent model (Model 3), with strong corresponding Cronbach’s alpha and omega coefficients, demonstrates internal consistency across the two factors of symptoms and functionality.

Comparing the WURSS-K global structure across the Madison and URECA samples, structural invariance was accepted for both configural and weak invariance, providing evidence for the robustness of this instrument across two pediatric populations with different demographic characteristics.

Item response analysis suggested that the functionality items were more difficult for children to answer. Rating the impact of the illness on ability to “think” was the most difficult, followed by impact on ability to “go to school.” This is understandable, as these assessments require a certain level of comprehension that may be difficult for younger children to evaluate or for parents to assess. In this study, 39% of daily reports were completed by children themselves without any help (mean age 8.0 years) and 17% required a lot of help (mean age 6.1 years). To our knowledge, this is the first standardized and validated assessment tool for ARI that children can complete by themselves if able.

This study has several limitations, including sample size and potential selection bias. Another limitation is that the URECA sample population collected information at a single time point, thus limiting information about the illness over time. Further research is needed to address instrument responsiveness and the clinical significance of cumulative score values.

Conclusion

WURSS-K shows strong psychometric properties for validity and reliability in children aged 4–10 years during ARI episodes. This instrument will be useful in clinical trials or observational studies when assessing children with ARI in the ambulatory setting.

WURSS-K and other WURSS instruments are available free of charge for educational and public health purposes with registration at the following website: https://www.fammed.wisc.edu/wurss/. Commercial for-profit use will need to be licensed by the Wisconsin Alumni Research Foundation.