Introduction

Neurogenic bladder dysfunction affects a heterogeneous group of people with varying bladder symptoms [1]. Assessment and care of this population is challenging due to different functional abilities, comorbidities, and bladder management strategies. The impact of a neurogenic bladder on a person’s quality of life (QOL) can be significant, and genitourinary complications are a source of morbidity for many people [2, 3]. While there are urinary specific QOL measures, such as the Qualiveen [4] and the SCI-QOL Bladder Management Complications tool [5], they focus primarily on the impact of bladder issues on QOL, rather than directly measuring symptom burden. The Neurogenic Bladder Symptom Score (NBSS) is a validated 24 item questionnaire that measures bladder symptoms across 3 different domains: incontinence (scored 0–29), storage and voiding (scored 0–22), and consequences (scored 0–23); there is a single general urinary QOL question scored from 0 (pleased) to 4 (unhappy) [6]. For all domains, a higher score represents a worse symptom burden or QOL.

The NBSS has only been validated in the original study population which included a mix of people with a spinal cord injury (SCI), multiple sclerosis (MS) or congenital neurogenic bladder [6]. Reviews of the SCI measurement literature have suggested that a validation in a predominantly SCI population would add to its face validity [7]. The objective of this study was to complete a secondary assessment of the validity and reliability of the NBSS using a large cohort of people with a SCI.

Methods

Data were drawn from an ongoing multicenter prospective observational study measuring bladder-related complications and QOL of people with a SCI over time (clinicaltrials.gov NCT02616081). Institutional ethics board approval was obtained from all participating institutions, and all applicable institutional and governmental regulations concerning the ethical use of human volunteers were followed during the course of this research.

People were recruited from neurourology and rehabilitation clinics in the United States, and through an open online portal, between December 2015 and September 2016 [8]. Briefly, the inclusion criteria were age of at least 18 years and an acquired SCI. People completed an extensive standardized telephone interview and patient reported outcome measures (including the NBSS) at initial enrollment. They then completed assessments (including patient reported outcome measures such as the NBSS) at 3-month intervals over a 1-year period. These assessments were carried out independently by the participants and were triggered using a reminder system and a standardized online gateway (that was unchanged between time points). Three sets of additional variables were used for the hypothesis testing validity assessment. The first was the Short Form-12 (SF-12), an extensively studied general QOL tool; scores are standardized to a mean of 50 and a standard deviation of 10, and a higher score is interpreted as better QOL [9]. The physical domain questions were modified for people with SCI [10]. The second was the computer adaptive version of the SCI-QOL Bladder Management Complications tool [5], which generates a standardized score (mean of 50 and standard deviation of 10), with a higher score indicating more complications. The third was the set of responses to questions about a renal stone procedure in the 3 months prior to enrollment (binary variable), the number of urinary infections in the past year (continuous variable), and a hospitalization for urinary infections in the prior year (binary variable). Our hypotheses regarding these relationships are outlined in Table 2 of the results. Because the study was ongoing, an interim data set after 9 months of study recruitment was used for this analysis.

Statistical analysis

Medians and interquartile ranges were used, except for the calculation of measurement error which required the mean and standard deviation of the NBSS total score and individual domains. For our reliability assessment where the NBSS was measured at two time points, we used an intraclass correlation coefficient (ICC2,1) = 0.85 with a lower confidence bound of 0.75, alpha = 0.05, and beta = 0.20 to estimate a minimal sample size for the reliability analysis of 79 people [11].
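The sample size reasoning above can be illustrated with a short sketch. It assumes the approximation for reliability studies commonly attributed to Walter, Eliasziw and Donner, with a one-sided alpha; whether reference [11] used exactly this formula is an assumption, and the function name `icc_sample_size` is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def icc_sample_size(rho0, rho1, k=2, alpha=0.05, beta=0.20):
    """Approximate number of people needed for an ICC reliability study.

    rho0  : lowest acceptable ICC (the lower confidence bound)
    rho1  : expected ICC
    k     : number of measurements per person (two time points here)
    alpha : one-sided type I error (assumed one-sided in this sketch)
    beta  : type II error (power = 1 - beta)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + (2 * (z_alpha + z_beta) ** 2 * k) / (log(c0) ** 2 * (k - 1))
    return ceil(n)


# Inputs taken from the text: expected ICC 0.85, lower bound 0.75, two time points.
print(icc_sample_size(rho0=0.75, rho1=0.85))
```

Under these assumptions the sketch returns 79, which agrees with the minimal sample size stated above.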

Internal consistency was assessed using Cronbach’s alpha (with values ≥0.70 considered good internal consistency) [12], and with item-to-domain correlations (a Pearson correlation coefficient ≥0.30 was considered a moderate to strong association). While the full cohort was used to assess validity, a subset of this cohort (based on data availability and treatment stability) was used to assess reliability. Reliability was assessed using a test–retest methodology comparing the original NBSS score with the repeat NBSS score at the 3-month reassessment using an ICC2,1 [13]; this analysis was restricted to the subset of people who had experienced no change in their urinary health based on the available data. They were identified by selecting people who self-reported no hospitalizations, surgeries, changes to medications, or changes to bladder management method between enrollment and the first 3-month follow-up. The validity of the domains was assessed by testing hypothesized correlations between NBSS domains and other variables collected in the study. Correlations >0.70 were considered strong, those from 0.30 to 0.70 moderate, and those <0.30 weak [14]. Spearman’s rank (for ordinal variables), point-biserial (for binary variables), or Pearson’s (for continuous variables) correlation coefficients were used as appropriate.
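As an illustration of the internal consistency measures described above, the following is a minimal sketch of Cronbach’s alpha and item-to-domain correlations. The function names and the example data are hypothetical, and the item-to-domain correlation is computed here against the uncorrected domain sum (including the item itself); whether the study used a corrected version is not stated.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                                   # number of items in the domain
    item_vars = [variance(col) for col in zip(*rows)]  # sample variance of each item
    total_var = variance([sum(r) for r in rows])       # variance of the summed domain score
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_to_domain(rows):
    """Correlation of each item with the (uncorrected) domain total."""
    totals = [sum(r) for r in rows]
    return [pearson(list(col), totals) for col in zip(*rows)]


# Hypothetical responses from five people to a three-item domain.
example = [[0, 1, 1], [1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 4]]
print(cronbach_alpha(example))
print(item_to_domain(example))
```

Values of alpha ≥0.70 and item-to-domain correlations ≥0.30 would then be read against the thresholds given above.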

Finally, the measurement error for each of the domains and for the total score was determined. This was estimated using both the standardized mean difference (SMD, half of the between-person standard deviation at baseline) and the standard error of measurement (SEM, calculated as the between-person standard deviation at baseline multiplied by the square root of 1 minus the ICC2,1) [15]. The smallest real difference (SRD), using a 90% confidence interval, was calculated from the SEM for both group level comparisons (SEM multiplied by 1.64) and individual comparisons (SEM multiplied by 2.31) [16]. The group level SRD is the most relevant, as it identifies real differences between groups, whereas the individual SRD identifies a real difference when considering change in a single individual’s score.
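The measurement error quantities described above reduce to a few lines of arithmetic. The sketch below uses the multipliers given in the text (1.64 and 2.31); the function name and the input values in the example are hypothetical and are not study results.

```python
from math import sqrt

GROUP_MULTIPLIER = 1.64       # 90% confidence, group level comparison (from the text)
INDIVIDUAL_MULTIPLIER = 2.31  # individual comparison (from the text); close to 1.64 * sqrt(2)


def measurement_error(sd_baseline, icc):
    """Return the half-SD estimate, the SEM, and the group and individual SRD."""
    sem = sd_baseline * sqrt(1 - icc)  # SEM = SD * sqrt(1 - ICC2,1)
    return {
        "half_sd": 0.5 * sd_baseline,  # the SMD as defined in the text
        "sem": sem,
        "srd_group": GROUP_MULTIPLIER * sem,
        "srd_individual": INDIVIDUAL_MULTIPLIER * sem,
    }


# Hypothetical inputs: baseline SD of 10 points and an ICC2,1 of 0.80.
print(measurement_error(sd_baseline=10.0, icc=0.80))
```

Because the SEM shrinks as the ICC rises, a higher reliability estimate gives a smaller SRD for the same baseline standard deviation.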

SAS 9.4 and R 3.4.0 were used for all statistical analyses. A two-sided p < 0.05 was considered significant, and 95% confidence intervals were reported where applicable. The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist was used to ensure complete study reporting [17].

Results

There were 644 people with a confirmed SCI who had initiated study enrollment during the interim study period. Of these, data from 14 people were not used as they failed to complete the enrollment process, and data from 21 people could not be used due to missing NBSS data. Our final group of 609 people had a median age of 48 (IQR 36–57) years, 67% were male, and most used clean intermittent catheterization (CIC; 63%, Table 1). Our cohort was representative of various socioeconomic groups and educational backgrounds. The median NBSS total score at study enrollment was 22 (IQR 15–30), and the median NBSS QOL was “Mixed” (IQR “Mostly unsatisfied” to “Mostly satisfied”). The median NBSS domain scores were 9 (IQR 2–14) for incontinence, 7 (IQR 4–10) for storage and voiding, and 6 (IQR 5–8) for consequences.

Table 1 Description of the study cohort demographics

Validity assessment

The item-to-domain correlations were all moderate to strong (≥0.30) for the incontinence and storage and voiding domains; 3 questions from the consequences domain had only weak item-to-domain correlations (Supplementary Appendix 1). Cronbach’s alpha was calculated for the incontinence domain (0.93), the storage and voiding domain (0.76), the consequences domain (0.49), and the total score (0.85). Correlations were assessed between the NBSS or its components and other variables collected as part of the existing study protocol (Table 2). People with missing data for the additional measurement tools required for the hypothesis testing validity assessment were excluded from that specific correlation (<1%). There was a moderate correlation (r = 0.50) between the SCI-QOL Bladder Management Complications tool and the NBSS consequences domain, and weak correlations between the SCI-QOL Bladder Management Complications tool and both the NBSS total score (r = 0.28) and the QOL question (r = 0.29). The SF-12 physical and mental domains had a weak negative correlation with the NBSS QOL question. The NBSS consequences domain had weak to moderate correlations with prior UTIs and renal/bladder stone procedures.

Table 2 Assessment of hypothesized relationships between NBSS components and external measures

Reliability assessment

Of the 609 people, 349 had 3-month follow-up data. Of these, 163 were excluded due to a potential change in their bladder function or general health (they reported a change to their medications or bladder management strategy, or a new surgery or hospitalization), and an additional 12 were excluded due to missing NBSS data in the follow-up assessment. In our final cohort of 174 presumably urologically stable people, the test–retest reliability based on an ICC2,1 was >0.75 for all domains, the QOL question, and the total score (Table 3).
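For readers who wish to reproduce this type of estimate, the ICC2,1 (two-way random effects, absolute agreement, single measurement, following the Shrout and Fleiss formulation) can be computed from the mean squares of a two-way analysis of variance. The sketch below is illustrative only; the function name and the example scores are hypothetical, and it is not the SAS or R code used in the study.

```python
def icc_2_1(scores):
    """ICC(2,1) for a list of people, each a list of k repeated measurements."""
    n = len(scores)      # number of people
    k = len(scores[0])   # number of administrations (2 for test-retest)
    grand = sum(sum(r) for r in scores) / (n * k)
    row_means = [sum(r) / k for r in scores]        # per-person means
    col_means = [sum(c) / n for c in zip(*scores)]  # per-administration means

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                   # between-person mean square
    ms_cols = ss_cols / (k - 1)                   # between-administration mean square
    ms_error = ss_error / ((n - 1) * (k - 1))     # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical baseline and 3-month total scores for five people.
example = [[22, 24], [15, 14], [30, 27], [9, 12], [18, 18]]
print(icc_2_1(example))
```

Because this form of the ICC penalizes systematic shifts between administrations as well as random disagreement, a real change in symptoms over the re-test interval lowers the estimate.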

Table 3 Test–retest reliability of the NBSS over a 3 month period (n = 174)

Measurement error

The measurement error for the domains and the total score was very similar between the SMD and SEM (Table 4). The smallest real difference for group level comparisons (as would commonly be done in clinical research) ranged from 0.9 for the QOL question to 7.7 for the NBSS total score.

Table 4 Measurement error and smallest real difference

Discussion

Choosing the right measurement tool for a research project can be challenging. In the field of neurourology, there are few tools that have been developed specifically for bladder-related QOL or symptoms, and most studies have used cross-validated instruments from other populations or questionnaires which have not been validated in a neurourology population [18]. Our multicenter prospective cohort study of the internal consistency, validity, and reliability of the NBSS in a large population of people with a SCI yielded results similar to those originally reported in a group of people with different reasons for neurogenic bladder dysfunction [6]. For example, the Cronbach’s alpha for the overall NBSS was quite similar (0.85 vs. 0.89); however, for the consequences domain it was lower than previously measured (0.49 vs. 0.69). Cronbach’s alpha is higher when the underlying construct is more consistent, and therefore a domain trying to capture urologic morbidity (which is quite variable among people) would be expected to have lower internal consistency. The initial validation study included a population of people with a SCI or MS with lower consequences scores, and this may explain why Cronbach’s alpha for this domain was higher in the initial study population. People in this larger cohort, who were motivated to enroll in a bladder-related QOL study (and a significant number of whom were recruited from tertiary neurourology clinics), likely have higher urologic morbidity and more urinary consequences, and this intensifies the variability within this domain.

In general, the NBSS was consistent in its relationships with other measurement tools and clinical variables. Urinary specific QOL (the NBSS QOL question) only had a weak correlation with overall QOL (measured by the SF-12), which was expected given the multiple physical and social factors that determine overall QOL. The validated SCI-QOL Bladder Management Complications tool (with questions about urinary tract infections, and the impact of bladder issues) was moderately correlated with the NBSS consequences domain, demonstrating the expected link between urinary morbidity and QOL. The magnitude of these correlations is in keeping with validation studies of other questionnaires in the neurogenic population (for example the I-QOL and Qualiveen-SF) [19, 20]. The test–retest reliability was appropriate, with ICC values >0.75 [21]. The values were, however, lower than the previously reported values of 0.86–0.91 (which were calculated in the original study using a median 3-week re-test period, as compared to the 3-month re-test period in this study). The longer the period between questionnaire administrations, the more likely it is that a real (and potentially undetected) change will have occurred, which results in a lower ICC. This likely explains the differences in the reliability measurement; in addition, it is possible that some people did experience a real change in bladder symptoms that was not identified with the questions we had available to detect change.

Statistically significant differences in questionnaire scores are dependent on sample size, which is why the SEM and the SRD are useful characteristics to know. A difference between two groups of people that is greater than the SEM (which indicates the smallest detectable change) can be generalized as “a little better”, whereas a difference greater than the SRD (which indicates a more meaningful change) can be generalized as “a good deal better” [15]. For the total NBSS score, a change of 5–8 points, or 0.5–1.0 points on the QOL question, is likely to represent a small but real change. This magnitude of change is consistent with other symptom scales, for example the American Urological Association Symptom Score for benign prostatic hyperplasia [22].

Limitations of our study are important to acknowledge. People were recruited through social media and neurourology/physiatry clinics, and therefore our data may be skewed towards those with a higher level of technological engagement or urologic complications; this potentially limits the external generalizability of our results. Our assessment of validity was limited to the variables and questionnaires that were included for the primary objective of this study. Our reliability measurement likely underestimates the true reliability of the NBSS given the long time between administrations and the lack of a question specifically asking about a change in bladder function or symptoms. As the SRD is derived from this reliability measurement, the true SRD may be lower. As an example, when the original reliability estimates are used [6], the group level SRD for the total NBSS score is 5.1 as opposed to 7.7. Finally, while minimally important clinical change (which uses a relevant indicator of change that would influence management) is an attractive metric of meaningful change, we could not determine it with the current study data.

Conclusions

The NBSS shows good internal consistency, validity, and reliability in a large population of people with SCI. However, the NBSS consequences domain had a low internal consistency, and this should be taken into account if it is to be used as a stand-alone domain.