Introduction

The life expectancy of individuals with spinal cord injury (SCI) has improved dramatically over time as a result of medical advances [1], and urologic complications are no longer the primary cause of death [2]. However, neurogenic lower urinary tract dysfunction (NLUTD) and associated urologic complications are among the most prevalent and severe health issues after SCI [3, 4]. Furthermore, recovery of bladder function is a priority for individuals with SCI [5]. Now that a level of care enabling long-term survival has been established, quality of life (QoL) becomes increasingly important for both individuals with SCI and caregivers.

The strong interaction between QoL and NLUTD management as well as urologic complications [6,7,8] highlights the importance of also measuring QoL when documenting and evaluating the response to treatment and rehabilitative care. Without considering the effect on QoL, measures regarding NLUTD cannot be assessed comprehensively. The prerequisites for suitable instruments to assess the impact of NLUTD on QoL are that they have been developed specifically for individuals with NLUTD and that the presence of other health conditions does not reduce their sensitivity [9]. The Quality of Life Index (QLI) and the Qualiveen questionnaire are currently the only validated patient-reported outcome measures with good sensitivity to the effects of NLUTD [9, 10]. The Qualiveen questionnaire has strong psychometric properties and contains 30 items for assessing the impact of NLUTD on limitations, fears, and feelings [11, 12]. It has been used extensively and has been validated in several languages (originally French), including English [13], German [14], Portuguese [15], Italian [16], Spanish [17], and Persian [18]. However, the length of the questionnaire has been mentioned as a limitation in validation studies [15, 17]. A short-form containing only eight items has therefore been developed and validated in French and English [19], Dutch [20], Greek [21], Russian [22], Polish [23], and Turkish [24]. The short-form has shown similarly strong psychometric properties, and both questionnaires have been recommended for assessing the impact of NLUTD on QoL by international expert panels [10, 25].

The objectives of this investigation were to validate and evaluate the measurement properties of the German Qualiveen short-form questionnaire in individuals with chronic NLUTD resulting from SCI.

Methods

Participants

Individuals with chronic (>12 months) NLUTD resulting from traumatic or non-traumatic SCI presenting for an annual urodynamic follow-up examination in a tertiary neuro-urologic referral center from July 2019 to November 2020 were asked to participate in this prospective validation study. From February 2020 to July 2020, no patients were recruited due to pandemic measures. The following exclusion criteria were applied: age younger than 18 years, concomitant neurological or psychological illness, cognitive impairment or insufficient German language skills. The study had been approved by the competent ethics committee (2019-00652), and all applicable institutional and governmental regulations were followed. The enrollment goal was set at 50 participants based on the guidelines for the validation of questionnaires [26].

Collected data

Study participants were asked to complete both the full version QUALIVEEN-30® (All rights reserved 2008, Véronique Bonniaud and Coloplast Laboratories) [12] and the short-form (SF-)QUALIVEEN® (All rights reserved 2007, Dr Véronique Bonniaud, Pr Dianne Bryant, Pr Gordon Guyatt, Pr Bernard Parratte and Coloplast Laboratories) [19] questionnaires twice within three weeks. The questionnaires contain 30 and 8 items (i.e., questions), respectively, assigned to the four domains “bother with limitations”, “fears”, “feelings” and “frequency of limitations”. Each item is evaluated using a five-point Likert scale and scored from 0 to 4. The number of items assigned to each domain varies from 5 to 9 in the QUALIVEEN-30®. In the SF-QUALIVEEN®, each domain is evaluated by two items. The domain scores are calculated as the average of the item scores, and the overall score is calculated as the average of the domain scores. The lower the score, the lower the impact of NLUTD on QoL. The participants completed the questionnaires without the help of a third person (self-administered). For the first evaluation, paper questionnaires were administered, and for the second evaluation, participants could choose to complete either paper or electronic (web-based) versions of the questionnaires.
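The scoring rule described above (domain score as the mean of its items, overall score as the mean of the four domain scores) can be sketched as follows. This is a minimal illustration, not the official scoring procedure: the item-to-domain positions in `SF_DOMAINS` and the function name `score_sf_qualiveen` are assumptions made for the example, and no reverse scoring of individual items is applied because the text does not describe any.

```python
from statistics import mean

# Assumed item-to-domain layout (0-based positions in the response list).
# The real assignment should be taken from the official scoring manual.
SF_DOMAINS = {
    "bother with limitations": (0, 1),
    "fears": (2, 3),
    "feelings": (4, 5),
    "frequency of limitations": (6, 7),
}


def score_sf_qualiveen(responses):
    """Return (domain_scores, overall_score) for eight item scores in 0..4."""
    if len(responses) != 8:
        raise ValueError("expected eight item scores")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("item scores must be integers from 0 to 4")
    # Each domain score is the mean of its two items.
    domain_scores = {
        name: mean(responses[i] for i in positions)
        for name, positions in SF_DOMAINS.items()
    }
    # The overall score is the mean of the four domain scores.
    overall = mean(domain_scores.values())
    return domain_scores, overall


# Hypothetical responses of one participant, not study data.
domains, overall = score_sf_qualiveen([2, 1, 0, 1, 1, 1, 3, 2])
print(domains)
print(overall)
```

Because every domain of the short-form has two items, averaging the domain scores gives the same overall score as averaging all eight items; in the 30-item version, where domains differ in size, the two approaches would weight items differently.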

Individual (sex, age) and SCI characteristics (etiology, level, completeness, severity), data regarding NLUTD, bladder management, urinary incontinence, urinary tract infections and concurrent urologic medication were extracted from electronic patient charts.

Statistical analyses

The data are presented as mean and standard deviation (SD), median and lower / upper quartiles, or frequency and percentage, where appropriate.

The criterion validity was evaluated by calculating the intraclass correlation coefficients (ICC) (two-way mixed effects model, absolute agreement) between the scores of the SF-Qualiveen and the Qualiveen-30. Internal consistency between all items of the SF-Qualiveen and between the two items of each domain was evaluated by calculating Cronbach’s alpha. The test-retest reliability of the SF-Qualiveen and the Qualiveen-30 was also evaluated by calculating the ICC (two-way mixed effects model, absolute agreement). Finally, the cross-sectional construct validity was evaluated by calculating Spearman’s rank correlations between the scores of the SF-Qualiveen and the Qualiveen-30 for bladder evacuation (intermittent catheterization, indwelling catheterization, no-catheter), urinary continence (yes/no) and urinary tract infection (UTI) (yes/no) sub-groups at the first evaluation. The values for criterion validity, internal consistency, test-retest reliability and cross-sectional construct validity were categorized as follows: poor: <0.5, moderate: 0.5–0.69, good: 0.7–0.89, excellent: ≥0.9.
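As an illustration of the absolute-agreement ICC and the category thresholds named above, the following sketch computes a two-way, absolute-agreement coefficient from the ANOVA mean squares. It assumes the single-measure form, since the text does not state whether single or average measures were used, and it is not a reproduction of the SPSS analysis. The function names and the example scores are hypothetical.

```python
def icc_absolute_agreement(rows):
    """Single-measure, two-way, absolute-agreement ICC.

    rows: one list per subject, each holding k scores (e.g. SF and 30-item).
    """
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, measurements, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    # The measurement (column) variance enters the denominator, so a
    # systematic offset between the two instruments lowers the coefficient.
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def categorize(value):
    """Map a coefficient to the categories used in this study."""
    if value < 0.5:
        return "poor"
    if value < 0.7:
        return "moderate"
    if value < 0.9:
        return "good"
    return "excellent"


# Hypothetical paired scores: the second column is always 1 point higher.
shifted = [[1, 2], [2, 3], [3, 4]]
icc_shifted = icc_absolute_agreement(shifted)
print(icc_shifted, categorize(icc_shifted))
```

In this toy example the two columns are perfectly correlated, yet the absolute-agreement ICC is only about 0.67 because of the constant offset. This is the property that makes absolute agreement stricter than a plain correlation when two versions of a questionnaire are compared.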

The statistical analyses were performed using the SPSS software (Version 25, IBM, Somers, NY, USA). A p value of ≤0.05 was considered significant.

Results

In order to enroll 50 study participants, 125 patients presenting for an annual urodynamic follow-up examination were asked to participate. All study participants completed both questionnaires at the first evaluation time point, and 35 and 37 participants completed the SF-Qualiveen and Qualiveen-30, respectively, at the second evaluation time point. The characteristics of the 50 evaluated patients are presented in Table 1. The mean (SD) age was 53 (14) years (range 26–78 years). The median duration of NLUTD was 14.9 years (7.8 / 29.0 years, range 1.3–56.0 years). A median of 24.0 days (14.5 / 52.5 days, range 11–154 days) had elapsed from the first to the second evaluation.

Table 1 The characteristics of the evaluated individuals.

Domain and overall scores

The domain and overall scores of the SF-Qualiveen and Qualiveen-30 at the two evaluation time points are presented in Table 2. The mean scores of the SF-Qualiveen were higher than the Qualiveen-30 scores at the two evaluation time points. The mean differences between the two questionnaires ranged between 0.23 and 0.41. The highest mean scores were reported for the domains “frequency of limitations” (mean scores ranged from 1.66 to 2.06) and “bother with limitations” (mean scores ranged from 1.35 to 1.76). The mean change in the SF-Qualiveen values from the first to the second evaluation ranged from 0.03 to 0.11.

Table 2 Domain and overall scores of the SF-Qualiveen and Qualiveen-30 at the two evaluation time points.

Criterion validity

The ICC values for the criterion validity of the different domain scores and the overall score at the two evaluation time points were all greater than 0.8 (Table 3).

Table 3 Criterion validity, internal consistency and test-retest reliability of the SF-Qualiveen.

Internal consistency

The SF-Qualiveen overall score and the domains “bother with limitations” as well as “feelings” showed good internal consistency (Cronbach’s alpha >0.75) at both evaluation time points (Table 3). However, the internal consistency of the domains “frequency of limitations” and “fears” was moderate (Cronbach’s alpha 0.65/0.59) and moderate to poor (Cronbach’s alpha 0.68/0.37), respectively.

Test-retest reliability

The test-retest reliability (ICC) for the different SF-Qualiveen domain scores and the overall score ranged from 0.91 to 0.94, similar to the reliability of the Qualiveen-30, which ranged from 0.92 to 0.96 (Table 3).

Construct validity

The results of the evaluation of the cross-sectional construct validity are presented in Table 4. The correlation coefficients between the SF-Qualiveen and the Qualiveen-30 scores ranged from 0.75 to 0.92 in the urinary continence and UTI sub-groups. In the sub-group “indwelling catheterization” (n = 5), there were no significant correlations apart from the domain “feelings” (r = 0.97, p = 0.005). In the sub-group “no-catheter” (n = 15), the correlation coefficients of the following domains were smaller than 0.7: “bother with limitations” (0.60), “fears” (0.64) as well as the overall score (0.65).
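The sub-group correlations reported above could be computed along the following lines. This is a minimal sketch of Spearman's rank correlation (Pearson correlation of average ranks) applied per sub-group; the helper names and the records are hypothetical and are not study data, and no p-values are computed.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical (sub-group, SF score, 30-item score) records, not study data.
records = [
    ("intermittent", 1.5, 1.2), ("intermittent", 2.0, 1.9), ("intermittent", 0.5, 0.4),
    ("no-catheter", 1.0, 1.1), ("no-catheter", 2.5, 1.8), ("no-catheter", 1.8, 2.0),
]
by_group = {}
for group, sf, full in records:
    pair = by_group.setdefault(group, ([], []))
    pair[0].append(sf)
    pair[1].append(full)
rho = {group: spearman(sf, full) for group, (sf, full) in by_group.items()}
print(rho)
```

With very small sub-groups, a single swapped pair of ranks changes the coefficient substantially, which is one reason why estimates from a sub-group of five participants should be read with caution.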

Table 4 Cross-sectional construct validity of the SF-Qualiveen at the first evaluation.

Discussion

The SF-Qualiveen showed good to excellent criterion validity with correlation coefficients greater than 0.8. The internal consistency generally ranged from good to moderate. The test-retest reliability was excellent with correlation coefficients greater than 0.9. Finally, the cross-sectional construct validity of the SF-Qualiveen ranged from good to excellent in the urinary continence and UTI sub-groups and from excellent to moderate in the bladder evacuation sub-groups.

The concurrent criterion validity was evaluated in order to determine whether the SF-Qualiveen can be used in place of the Qualiveen-30 questionnaire. There were strong correlations (>0.8) between the scores of the SF-Qualiveen and those of the Qualiveen-30 at both evaluation time points. Thus, the original version can be replaced with the short version. The criterion validity of the French and English SF-Qualiveen versions [19] showed very similar values (from 0.7 to 0.92) to those of the present evaluation. Similarly, the correlation coefficients of the domains “bother with limitations” and “fears” were slightly smaller than those of the other domains and the overall score. In validation studies of other language versions of the SF-Qualiveen [20,21,22,23,24], criterion validity had been evaluated by calculating correlations between the short-form scores and the scores of other instruments assessing the impact of urinary symptoms on QoL. Based on these results, the authors ascribed good criterion validity to the SF-Qualiveen. In the present study, the criterion validity at the second evaluation was similar to that at the first evaluation, ~3 weeks earlier. Reuvers et al. [20] also reported similar criterion validity for two evaluation time points which were ~2 weeks apart.

The internal consistency of the SF-Qualiveen was evaluated for each domain and the overall score at both evaluation time points. Internal consistency is a measure of how well questions correlate with each other and thus shows whether questions assess the same underlying concept [26]. The values were >0.75 for most domains and the overall score at both evaluation time points. This is in accordance with other investigations reporting generally good internal consistency for the SF-Qualiveen [20,21,22,23]. In most of these studies [20,21,22] as well as the present one, the internal consistency for the domains “fears” and “frequency of limitations”, however, ranged from poor to moderate. The Cronbach’s alpha values for the domain “fears” ranged from 0.26 to 0.62. The two questions in the domain “fears” (i.e., “Do you worry about your bladder problems worsening?” and “Do you worry about smelling of urine?”) do not seem to assess the same underlying concept. In the domain “frequency of limitations”, interviewees seem to answer the second question (“Can you go out without planning anything in advance?”) not solely in relation to their bladder problems. Furthermore, the reduction of the questions for the SF-Qualiveen was based on the level of responsiveness: the two most responsive questions were chosen for each domain [19]. The domains “fears” and “frequency of limitations” showed the lowest responsiveness values of all domains (standardized response mean of 0.76 and 0.94, respectively) [27]. These may be the reasons for the weaker internal consistency of these two domains. Reuvers et al. [20] identified two components within the SF-Qualiveen based on a factor analysis of the eight questions. The first component entails the first seven questions and the second component only the last question, which may therefore be excluded. However, the last question should not be excluded from the SF-Qualiveen, because interviewees have confirmed the importance (content validity) of all eight questions [20, 23, 24]. Nevertheless, the categorization into four domains should be reconsidered, and the overall score should be used primarily.
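To make the internal consistency argument concrete, the sketch below computes Cronbach's alpha from item scores using the standard formula (number of items, sum of item variances, variance of the total score). With only two items per domain, alpha is driven entirely by how strongly the two items covary, so a pair of questions that tap different concepts yields a low value. The function name and the item scores are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of k lists, each holding one item's scores across the same
    respondents (sample variances are used throughout).
    """
    k = len(items)
    # Total score per respondent across all k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical two-item domain answered by five respondents.
item_a = [0, 1, 2, 3, 4]
item_b = [2, 0, 3, 1, 4]  # correlates only moderately with item_a (r = 0.5)

alpha_identical = cronbach_alpha([item_a, item_a])
alpha_moderate = cronbach_alpha([item_a, item_b])
print(alpha_identical, alpha_moderate)
```

In this example the two items have equal variances, so alpha equals 2r / (1 + r) with r = 0.5, i.e. about 0.67, which would be categorized as moderate under the thresholds used in this study.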

The test-retest reliability for the different SF-Qualiveen domain scores and the overall score was excellent and similar to the reliability of the Qualiveen-30. The correlation coefficients ranged from 0.91 to 0.94, which is generally in accordance with previous reports [19, 20, 22,23,24]. In previous reports, the test-retest reliability was evaluated after two weeks, compared with three weeks in the present investigation. The excellent test-retest reliability further supports the use of the SF-Qualiveen version instead of the Qualiveen-30 version.

The cross-sectional construct validity of the SF-Qualiveen was almost exclusively good to excellent, which is in accordance with previous reports [19, 20, 22, 23]. Only in the bladder evacuation sub-group “no-catheter” did the correlation coefficients of some domains range from 0.60 to 0.65. In the sub-group “indwelling catheterization”, a significant correlation was only observed in the domain “feelings” (r = 0.97, p = 0.005) as a result of the small number of study participants in this sub-group (n = 5). Other authors have reported weak to moderate correlation coefficients regarding incontinence and voiding [22] or moderate correlation for intermittent catheterization [23].

The SF-Qualiveen mean scores were all slightly higher than the Qualiveen-30 mean scores at the two evaluation time points. There seems to be a systematic effect toward higher scores in the SF-Qualiveen. However, the mean score differences between the two questionnaires were not greater than 0.41. Furthermore, systematic errors were considered in the evaluation of criterion validity, and the results showed good to excellent absolute agreement. The mean changes in the SF-Qualiveen scores from the first to the second evaluation time point were all considerably smaller (0.03–0.11) than the mean minimal clinically important differences reported for the overall and domain scores (0.31–0.67) [19]. The greatest SF-Qualiveen mean scores were observed in the domains “bother with limitations” (1.76) and “frequency of limitations” (2.06). This is in accordance with data observed in individuals with NLUTD as a result of multiple sclerosis (1.7 and 2.13) [20]. In the investigated individuals with chronic SCI, the impact of NLUTD on QoL was medium overall and in the domains “bother with limitations” and “frequency of limitations” and small in the domains “fears” and “feelings”.

The recruitment of the study participants in a single tertiary neuro-urologic referral center over a relatively long period of time and the low response rate may limit the external validity of the present data and are thus limitations of the present investigation. However, the characteristics of the evaluated patients (Table 1) reflect the characteristics of the general population with chronic NLUTD resulting from SCI, and the results obtained are in accordance with other validation studies of the SF-Qualiveen [19,20,21,22,23,24]. The comparison of the cross-sectional construct validity with other published reports was limited because construct validity had been assessed differently [19, 20, 22, 23]. However, the sub-groups chosen to evaluate construct validity in the present study represent main factors affecting QoL in individuals with NLUTD [6, 7, 28]. Finally, the responsiveness and minimal clinically important differences of the SF-Qualiveen were not investigated.

The management of NLUTD aims to preserve renal function and to maintain the best possible QoL. As the association between symptoms and urodynamic findings is insufficient [29], regular, standardized video-urodynamic evaluation combined with a validated questionnaire assessing QoL should be established in affected individuals in order to pursue both goals. Quality of life is a crucial aspect in the management and treatment of chronic conditions such as SCI, and measures regarding NLUTD cannot be assessed comprehensively without considering the effect on QoL. The SF-Qualiveen has been shown to have good criterion validity, internal consistency and construct validity as well as excellent test-retest reliability. Containing only eight items, it can be integrated into any urologic follow-up assessment with little additional patient burden.

Conclusions

The German SF-Qualiveen has shown excellent reliability and validity, with variable internal consistency, ranging from poor to excellent. Its brevity will increase compliance, and we therefore recommend including the SF-Qualiveen in urologic assessments.