Introduction

To better assess and subsequently enhance the social participation of adults with disabilities in Taiwan, the Taiwanese government developed a new disability evaluation tool: the Functioning Disability Evaluation Scale (FUNDES) [1]. FUNDES is based on the International Classification of Functioning, Disability and Health (ICF) of the World Health Organization (WHO) [2] and on the WHO Disability Assessment Schedule 2.0 (WHODAS 2.0), which was chosen as the main instrument to assess the activity and participation domains in FUNDES [1, 3]. All domains of FUNDES based on WHODAS 2.0 address the performance and capability dimensions in this study. The difficulty levels of the performance dimension were judged with typical assistive technology and personal assistance, and those of the capability dimension without the aid of devices or personal assistance [4].

In 2014, the FUNDES group developed its own norm values of WHODAS 2.0 for adults with disabilities in Taiwan [5]. The results, obtained with a classical test theory (CTT) approach, showed that the WHODAS 2.0 was valid and reliable in evaluating the activity and participation function (AP function) problems of adults with a disability [4]. However, we still lack information about the validity of WHODAS 2.0 among people with spinal cord injury (SCI) in a cultural setting such as Taiwan, including information about the measurement properties of the WHODAS 2.0 when analyzed with modern test theory. Modern test theory, in addition to CTT, provides information on how and why participants responded to an item in the way they did [6]. Such information would be useful, especially to know whether WHODAS 2.0 can capture the needs of people with a high level of disability, as in SCI [7].

Persons with SCI living in Taiwan face major challenges in the areas of health care, financial and family support, and community integration, which includes finding suitable employment and dealing with social prejudice and negative attitudes from others [8]. Thus far, most SCI studies in Asian countries have focused on epidemiology, quality of life, and comparison with other health conditions [9, 10]. Therefore, integrating WHODAS 2.0 into FUNDES in Taiwan provides relevant additional information for a better and more comprehensive understanding of the impact of SCI on individual functioning. However, the reliability and validity of WHODAS 2.0 in the SCI population in Taiwan remain unexplored. To help investigate the utility of WHODAS 2.0 for the disability assessment of persons with SCI in Taiwan, this study aims to examine the psychometric properties of WHODAS 2.0 in terms of its reliability and construct validity in the SCI population in Taiwan based on classical and modern test theory.

Methods

Participants

The participants who registered from November 2013 to January 2015 in the National Disability Determination System in Taiwan were selected. Data were collected in 239 authorized hospitals by trained interviewers in face-to-face interviews, as part of the official disability evaluation in Taiwan.

The sample was defined using the SCI-related diagnostic codes 344, 806, and 952 from the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). We included 521 adults over 18 years of age with chronic SCI (more than 1 year since injury). We excluded those cases where missing data exceeded 50% in every domain of WHODAS 2.0. If the missing data were less than 50%, we used that domain’s mean score for imputation, which is the approach recommended by Garin (2009) to handle missing values in WHODAS [11]. The study was approved by the Tzu-Chi research ethics committee (IRB-104-4-A).
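
The imputation rule described above can be sketched as follows. This is a minimal illustration and not the study's actual procedure: the function name `impute_domain` is hypothetical, the "domain mean" is assumed to be the participant's own mean over the observed items of that domain, and a case with exactly 50% missing items is assumed to be imputable, since the text does not specify either point.

```python
from typing import List, Optional, Sequence


def impute_domain(
    responses: Sequence[Optional[float]], max_missing: float = 0.5
) -> Optional[List[float]]:
    """Mean-impute one participant's item responses for one domain.

    Missing items are passed as None. Returns None when the share of
    missing items exceeds max_missing, so that the case can be excluded.
    Assumption: the imputed value is the mean of the observed items.
    """
    observed = [r for r in responses if r is not None]
    missing_share = 1 - len(observed) / len(responses)
    if not observed or missing_share > max_missing:
        return None
    domain_mean = sum(observed) / len(observed)
    return [domain_mean if r is None else r for r in responses]
```

A return value of `None` marks a domain with too many missing items, so the exclusion decision stays with the caller.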

Variables

Sociodemographic information included sex, age, work status, education, living status, and injury characteristics. Living status was defined as living in the community or in an institution. The variable severity of disability was defined as the degree of disability resulting from the lesion, based on the body function and structure components of the ICF, and corresponds to the physician’s rating of the level of disability.

The criteria for neurological status (paraplegia, tetraplegia, incomplete lesion, and complete lesion) are based on the ICF codes of body function and body structure, which were developed by professionals and physicians during 2009–2012. The criteria are the following: tetraplegia was defined by the codes b730a (Muscle power functions, Upper limbs) and b730b, and paraplegia by the code b730b (Muscle power functions, Lower limbs). A complete lesion was defined as b730b.3 (MRC classification grade 0 or 1), and an incomplete lesion as b730b.1 or b730b.2 (MRC classification grade 2 or 3) [12].

WHODAS 2.0 was used to evaluate the difficulty experienced in an individual's daily life over the past 30 days, and the difficulty levels of the performance dimension were judged with typical assistive technology and personal assistance. WHODAS 2.0 contains six domains with a total of 36 items, including D1 “Understanding and communicating” (6 items), D2 “Getting around” (5 items), D3 “Self-care” (4 items), D4 “Getting along with people” (5 items), D5 “Life activities” (household and school/work, 8 items), and D6 “Participation in society” (8 items). In the present study, D5 “Life activities” included only the items D5.1 to D5.4, which relate to “Household activities”. The “Work” items D5.5–D5.8 were not asked when participants were not working; according to the WHODAS 2.0 manual, the 32-item score can then be used in place of the full 36-item version [13]. Response options for the WHODAS 2.0 to the stem question “In the past 30 days, how much difficulty did you have…” ranged from 1 = “None” to 5 = “Extreme or cannot do”. Domain raw scores were calculated and converted into an ordered metric ranging from 0 to 100, using the algorithm provided by WHO [13].
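
To illustrate the idea of a 0 to 100 domain metric, the sketch below rescales summed item responses linearly. It is only an assumption-laden simplification: the WHO algorithm referred to above applies item-specific recoding that is not reproduced here, and the function name `simple_domain_score` is hypothetical.

```python
from typing import Sequence


def simple_domain_score(responses: Sequence[int]) -> float:
    """Rescale summed 1-5 item responses of one domain to 0-100.

    0 means no difficulty on any item; 100 means extreme difficulty
    on every item. This is a plain linear rescaling, NOT the WHO
    scoring algorithm with item-specific recoding.
    """
    if not responses or any(r < 1 or r > 5 for r in responses):
        raise ValueError("responses must be non-empty and coded 1 to 5")
    raw = sum(r - 1 for r in responses)  # recode 1-5 to 0-4, then sum
    max_raw = 4 * len(responses)         # highest possible raw sum
    return 100.0 * raw / max_raw
```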

Data analysis

Data were analyzed with IBM SPSS 20.0 for classical statistical tests and the software R with package eRM for the Rasch analyses.

The descriptive analysis of the score distribution by domain included floor and ceiling effects and a normality check. The ceiling effect was defined as the proportion of participants with the maximum score of 100, and the floor effect as the proportion with the minimum score of 0. Floor or ceiling effects were considered present when more than 15% of participants obtained the minimum or maximum score [14]. Skewness and kurtosis were used to assess the normality of the WHODAS 2.0 score distribution. Skewness and kurtosis values of 0 indicate a normal distribution. A skewness value below 0 indicates a negatively skewed distribution, that is, more cases with higher scores. A kurtosis value below 0 indicates a flatter peak than that of the normal distribution.
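
The descriptive checks above can be sketched as follows. The function names are hypothetical, and the skewness and kurtosis use simple moment estimators, which may differ slightly from the bias-corrected values reported by statistical packages such as SPSS.

```python
from typing import Sequence, Tuple


def floor_ceiling(
    scores: Sequence[float], lo: float = 0.0, hi: float = 100.0
) -> Tuple[float, float]:
    """Return the proportions of scores at the minimum and the maximum."""
    n = len(scores)
    floor = sum(1 for s in scores if s == lo) / n
    ceiling = sum(1 for s in scores if s == hi) / n
    return floor, ceiling


def skewness_kurtosis(scores: Sequence[float]) -> Tuple[float, float]:
    """Moment-based skewness and excess kurtosis (both 0 for a normal curve)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    m4 = sum((s - mean) ** 4 for s in scores) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def effect_present(proportion: float, cutoff: float = 0.15) -> bool:
    """Apply the 15% criterion to a floor or ceiling proportion."""
    return proportion > cutoff
```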

Internal consistency was analyzed using Cronbach’s α, a coefficient that estimates the degree to which the items of a questionnaire are interrelated. Values of Cronbach’s α above 0.9 were considered excellent, 0.8–0.9 good, and 0.6–0.8 acceptable [15].
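
As a minimal sketch, Cronbach’s α can be computed from a persons-by-items table with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The function names are hypothetical, and the handling of values exactly on a boundary (0.8 or 0.9) is an assumption, as the text does not specify it.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a table with one row per person, one column per item."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def label_alpha(alpha: float) -> str:
    """Map alpha to the verbal labels used in the text (boundary handling assumed)."""
    if alpha > 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.6:
        return "acceptable"
    return "below acceptable"
```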

We used exploratory factor analysis to identify the underlying relationships between the measured variables of the WHODAS 2.0. Applied to the sample of individuals with SCI in Taiwan, the exploratory factor analysis allowed us to determine the factorial structure of WHODAS 2.0 and hence its construct validity. We assumed that the results of the exploratory factor analysis would confirm the original factor structure provided by WHO. As a prerequisite, we first checked sampling adequacy with the Kaiser–Meyer–Olkin (KMO) measure and Bartlett’s test of sphericity (p-value < 0.05). KMO values between 0.8 and 1 are considered acceptable.
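
The two prerequisite checks can be sketched from their textbook definitions. Bartlett’s statistic is χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and KMO compares squared correlations with squared partial correlations. The helper names are hypothetical, the p-value is omitted because it would require a chi-square distribution function, and this is not the SPSS implementation used in the study.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def correlation_matrix(rows: Sequence[Sequence[float]]) -> Matrix:
    """Pearson correlation matrix for a persons-by-items table."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    dev = [[r[j] - means[j] for j in range(p)] for r in rows]
    ss = [math.sqrt(sum(d[j] ** 2 for d in dev)) for j in range(p)]
    return [[sum(d[i] * d[j] for d in dev) / (ss[i] * ss[j])
             for j in range(p)] for i in range(p)]


def invert_with_logdet(m: Matrix) -> Tuple[Matrix, float]:
    """Gauss-Jordan inversion with partial pivoting; also returns ln|det|."""
    p = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
         for i, row in enumerate(m)]
    logdet = 0.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        piv = a[col][col]
        logdet += math.log(abs(piv))
        a[col] = [v / piv for v in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[p:] for row in a], logdet


def bartlett_sphericity(rows: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    n, p = len(rows), len(rows[0])
    _, logdet = invert_with_logdet(correlation_matrix(rows))
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    return chi2, p * (p - 1) // 2


def kmo(rows: Sequence[Sequence[float]]) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = correlation_matrix(rows)
    inv, _ = invert_with_logdet(corr)
    p = len(corr)
    r2 = a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                a2 += partial ** 2
    return r2 / (r2 + a2)
```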

Subsequently, a Rasch analysis taking into consideration the findings of the exploratory factor analysis was performed. The Rasch analysis provides more information on metric properties at item level in addition to the model reliability and scale dimensionality questions [16]. Rasch’s probabilistic measurement approach assumes that the probability of a given response is a function of the difference between the ability of the individual and the difficulty of the questionnaire item. The degree to which the data comply with a series of Rasch assumptions determines the quality of its measurement properties. An important assumption is that items in a summary score are locally independent, with correlations of the standardized analysis residuals below 0.2 [17]. Locally dependent items may lead to biased and inflated analysis estimates. In the presence of local item dependencies (LID), correlating items can be aggregated to enter the analysis as a so-called testlet [18]. Lesion level and type were not included in the differential item functioning analysis, as they are on the pathway of functioning and it can be expected that the WHODAS 2.0 items favor participants with paraplegia.
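
The local dependency check can be illustrated as a screening of residual correlations. The sketch below assumes that a persons-by-items table of standardized residuals is already available from a fitted Rasch model (the fitting itself, done with eRM in the study, is not shown); the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_local_dependency(
    residuals: Sequence[Sequence[float]], cutoff: float = 0.2
) -> List[Tuple[int, int, float]]:
    """Return item pairs (i, j, r) whose residual correlation exceeds cutoff."""
    n_items = len(residuals[0])
    columns = [[row[j] for row in residuals] for j in range(n_items)]
    flagged = []
    for i, j in combinations(range(n_items), 2):
        r = pearson(columns[i], columns[j])
        if r > cutoff:
            flagged.append((i, j, r))
    return flagged
```

Item pairs flagged in this way are candidates for aggregation into a testlet.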

Monotonicity of the item response options is another assumption that requires items to display strictly increasing levels of difficulty. The Rasch analysis may reveal disordered thresholds that can be repaired by collapsing the reversed response options into one single option. In general, collapsing of disordered response options of testlets is not required. While strict monotonicity is required, equidistance of the intervals between response thresholds is not expected, as this can be handled by the specific Rasch model applied in this study, i.e., the partial credit model [19].
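
For illustration, the category probabilities of the partial credit model and the ordering check on thresholds can be sketched as below. In the model, the probability of responding in category x is proportional to exp(Σ_{k≤x} (θ − τ_k)), where θ is the person location and τ_k are the item thresholds. The function names and threshold values are hypothetical and are not estimates from this study.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities under the partial credit model.

    For m thresholds there are m + 1 response categories (0..m).
    """
    cumulative = [0.0]  # empty sum for category 0
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    peak = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds: Sequence[float]) -> bool:
    """True when thresholds increase strictly, i.e. no disordering."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))
```

Unequal spacing between thresholds is allowed here, which mirrors the point that equidistance is not required.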

Unidimensionality of the instrument requires that the loading structure of standardized analysis residuals indicates the presence of one single dimension with eigenvalues below 2 [20]. It is expected that component loadings of a principal component analysis (PCA) of the residuals confirm the dimensional structure found with the factor analytical approach. Finally, the Person Separation Index (PSI) gives the reliability of the model. It can be interpreted similarly to Cronbach’s α, ranging between 0 and 1. Usually, a PSI reliability of 0.70 is required for analysis at the group level, and values of 0.85 and higher for individual use [21].
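
A common way to express the PSI is as the share of the observed variance of the person estimates that is not attributable to measurement error. The sketch below uses that definition; the function names are hypothetical, the inputs (person estimates and their standard errors) are assumed to come from a fitted Rasch model, and the exact estimator implemented in a given software package may differ.

```python
from statistics import variance
from typing import Sequence


def person_separation_index(
    estimates: Sequence[float], standard_errors: Sequence[float]
) -> float:
    """PSI = (observed variance - mean error variance) / observed variance."""
    observed_var = variance(estimates)
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    return (observed_var - error_var) / observed_var


def intended_use(psi: float) -> str:
    """Apply the 0.70 (group) and 0.85 (individual) thresholds from the text."""
    if psi >= 0.85:
        return "individual"
    if psi >= 0.70:
        return "group"
    return "insufficient"
```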

The fit of the WHODAS 2.0 items to the Rasch model is given with the infit statistic derived from the mean squared standardized residuals. In the present setting, infit values within 0.8–1.2 are considered sufficient for high stakes assessment purposes [22]. The Rasch analysis was first performed for the entire WHODAS 2.0, mainly to verify and confirm the dimensional structure revealed by the exploratory factor analysis. Then the relevant dimensions or factors were analyzed separately and adjustments were undertaken towards creating reliable summary scores for WHODAS 2.0. Finally, a conjoint analysis of all domains of the WHODAS 2.0 was performed, using a testlet per domain approach to overcome multidimensionality issues and to come up with one single interval-scaled WHODAS 2.0 summary score. Based on the metric information derived from the Rasch analysis, the quality and validity of domain and summary scores are discussed.
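
The infit statistic can be illustrated with the information-weighted mean square, computed as the sum of squared residuals divided by the sum of the model variances. For brevity, the sketch below assumes a dichotomous Rasch item rather than the polytomous partial credit model applied in this study, and all function names are hypothetical.

```python
import math
from typing import Sequence


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def infit_mean_square(
    responses: Sequence[int], thetas: Sequence[float], difficulty: float
) -> float:
    """Information-weighted mean square fit for one dichotomous item."""
    squared_residuals = 0.0
    information = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, difficulty)
        squared_residuals += (x - p) ** 2  # observed minus expected
        information += p * (1.0 - p)       # model variance of the response
    return squared_residuals / information


def acceptable_infit(value: float, lower: float = 0.8, upper: float = 1.2) -> bool:
    """Check the 0.8-1.2 range used in the text."""
    return lower <= value <= upper
```

Values below the range point towards overfit and values above it towards underfit.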

Finally, to explore the sensitivity of the WHODAS 2.0 scores in the light of the level of disability, the 0–100 Rasch-transformed person scores for each WHODAS 2.0 domain and the WHODAS 2.0 summary score (testlet analysis) were compared to the physician’s rating of the level of disability (1 = mild, 2 = moderate, 3 = severe, or 4 = profound).

Results

Characteristics of participants and the distribution of AP function

The study population was 59 years old on average, and 67% of the participants were male. About 54% had sustained paraplegia and 46% tetraplegia; 86% had an incomplete lesion and 14% a complete lesion. A little more than 25% of the study population had mild disability, 31% moderate, 37% severe, and 7% profound disability. Further characteristics of the study population are shown in Table 1. Table 2 outlines the average and median scores of the six WHODAS 2.0 domains, as well as ceiling and floor effects, for the overall SCI population. The internal consistency (Cronbach’s α) of the six domains ranged from 0.87 to 0.99. The skewness and kurtosis values were −0.091 and −0.459, respectively, indicating a negatively skewed distribution with a flatter peak than the normal distribution.

Table 1 Sociodemographic characteristics of the study population (n = 521)
Table 2 The distribution of WHODAS 2.0 scores with floor and ceiling effects based on interval scores (0–100) for the people with spinal cord injury

Construct validity: factor analysis

The KMO measure of sampling adequacy was >0.91 and Bartlett's test of sphericity was significant (p < 0.05), meaning that exploratory factor analysis could safely be applied to the obtained dataset. As hypothesized, the factor loadings of the exploratory factor analysis confirmed the original WHODAS 2.0 scale for the most part, explaining a total variance of 72.5%. Yet three items cross-loaded on different scales: D4.3 “Getting along with close people” loaded strongly on D1 “Understanding and communicating”, and two questions from D2 “Getting around” loaded on D3 “Self-care” (see Table 3).

Table 3 The results of exploratory factor analysis for people with SCI (n = 521)

Measurement properties: Rasch analysis

The Rasch analysis was performed in several steps. First, the Rasch analysis of the entire WHODAS 2.0 provided further support for the findings of the factor analysis. The PCA indicated multidimensionality with at least 5 dimensions (Supplement 1). The dependency structure, based on the correlation matrix of the standardized residuals, showed that mainly items from the same domain were strongly related (r > 0.2). The only exception was the item D6.4 “How much time did you spend on your health condition, or its consequences”, which was uncorrelated with any other WHODAS 2.0 item. In addition, items of domain D1 “Understanding and communicating”, i.e., D1.1 Concentration, D1.3 Problem-solving, D1.4 Learning a new task, and D1.6 Conversation, appeared associated with items from domain D4 “Getting along with people”, and items of domain D2 “Getting around”, i.e., D2.3 Moving around inside home and D2.4 Getting out of home, with those of the domain D3 “Self-care” (Fig. 1).

Fig. 1 Local item dependency structure for the WHODAS 2.0

In the next steps, each WHODAS 2.0 domain was analyzed separately with the Rasch model and then, if necessary, testlet-based solutions for the six domain scores and the summary score were proposed. The results of the analysis by domain before (start model) and after Rasch-based adjustments (final model) are shown in Table 1 of the supplementary material.

Table 4 shows the general fit of the Rasch models by domain or across domains, including the model before (start model) and after Rasch-based adjustments to achieve good fit (final model). Two domains, D1 “Understanding and communicating” and D5(1) “Household activities”, did not need to be adjusted at all, as the model and their items complied with all Rasch assumptions. All analyses by domain showed a high PSI, ranging between 0.87 for D2 “Mobility” and 0.98 for D5(1) “Household activities”.

Table 4 Start and final model targeting fit of entire WHODAS 2.0, each subscale, and the calibration of domains as items

Uniform differential item functioning for age was found for the testlet aggregating items D6.6 and D6.7 in the adjusted analysis of domain D6 “Participation in society”. Uniform differential item functioning for gender was not found in any of the final models. All items of the final adjusted models showed good fit, with a tendency towards overfit, meaning lower discrimination for scores in the middle of the range compared to more extreme scores, especially in domains D3 “Self-care” and D5(1) “Household activities”.

Finally, a testlet approach for the domain analysis was undertaken to derive one single summary score as a measure of health and disability. This analysis confirmed the association between the D1 “Understanding and communicating” and D4 “Getting along with people” domains, as well as between the D2 “Getting around” and D3 “Self-care” domains. In a final step, the correlated domains were aggregated. In the basic analysis across the original six WHODAS 2.0 domains, only D5(1) “Household activities” showed an insufficient fit. The fit of D5(1) increased in the final testlet solution. Unsurprisingly, the PSI dropped for the testlet models: the first approach with the six original domains had a PSI of 0.85, while the final testlet approach, free of local dependencies, showed a PSI good enough for group level measurement (>0.7). The PSI for WHODAS 2.0 aggregated across domains was only 0.79.

The analysis of the scores per WHODAS 2.0 domain, as well as a Rasch based WHODAS 2.0 total score matched significantly with the physician’s general disability rating. The changes in the distribution of WHODAS 2.0 scores for the entire SCI-sample by disability level for each domain and across domains are illustrated in Fig. 2.

Fig. 2 Boxplot of WHODAS 2.0 0–100 transformed scores by disability level

Discussion

The current study examined the reliability and validity of WHODAS 2.0 with a classical and a modern psychometric approach, and provided evidence from both approaches that the instrument is reliable and valid for measuring disability in a Taiwanese SCI population. It is indeed important to discuss the psychometric properties of WHODAS 2.0 in Taiwan, especially for rare subgroups, such as the SCI population, whose demographic characteristics differ from those of the total population with disabilities. Because WHODAS 2.0 is the main instrument used to evaluate people with disabilities under the national social welfare policy, eligibility and the severity of disability are strongly linked to welfare allocation and employment-oriented services in Taiwan. Therefore, measuring disability status precisely would be helpful for people with SCI.

Nevertheless, it must be noted that the items related to work in domain D5 “Life activities” were excluded from the analysis because less than 6% of participants reported being employed. Not being employed may be a characteristic of the SCI population in Taiwan in the first few years after SCI, but international data suggest that the employment rate of persons with SCI increases over the years [23]. Therefore, the consequence of not assessing the work items must be carefully evaluated in the light of disability eligibility in the long-term evaluation of functioning of people with SCI.

WHODAS 2.0 showed good to excellent internal consistency for adults with chronic SCI, with Cronbach’s α ranging from 0.85 to 0.99, which is slightly above the findings of other studies [24]. The PSI of the WHODAS 2.0 total score was 0.79, below the 0.85 threshold, which implies that the total score of WHODAS 2.0 should only be used at the population or group level [25]. The observed difference between PSI and Cronbach’s α may result from the fact that the PSI is sensitive to a skewed distribution of scores and increases as the scores become more extreme, while there is no such effect for Cronbach’s α [26]. Even though the distribution of the WHODAS 2.0 total score among SCI patients was negatively skewed (skewness = −0.091), this does not affect the applicability of the measurement tool. The exploratory factor analysis confirmed the original 6-factor structure for the most part, and therefore all items were included in the subsequent Rasch analysis [27].

To a large extent, both analyses, the exploratory factor analysis and the Rasch analysis, support the multidimensionality of the WHODAS and the validity of the construct used by WHO. Similar to the Rasch analysis, which indicates a certain independence of the item D6.4 “How much time did you spend on your health condition, or its consequences”, the exploratory factor analysis loadings for this item were lowest on its domain factor and low on the other factors. The exploratory factor analysis and Rasch analysis both support the strong association of items D2.3 and D2.4 with the items from D3, the Self-care domain. Interestingly, the exploratory factor analysis showed that item D4.3 loaded on Factor D1 “Understanding and communicating”. Also, the Rasch analysis showed high local dependency for all items of D1 and D4 “Getting along with people”, suggesting that the items from both domains could be joined to form one common domain. The association between D1 and D4 items could be explained by the Taiwanese culture, in which the items from the domain D4 are mostly perceived as a cognitive ability [4]. If researchers want to use WHODAS 2.0 as an assessment instrument to evaluate adults with SCI in Taiwan or other Asian countries, it could be sensible to consider a formal cross-cultural adaptation to ensure proper understanding of the items.

The decrease in reliability between the WHODAS 2.0 analyses without and with testlets can be explained: the local dependencies caused a reliability inflation in the first Rasch analysis, which included all WHODAS 2.0 domains [28]. This inflation was controlled in the testlet approach, which treats each domain like a super-item. Further, the testlet strategy allows overcoming the dimensionality issues found in the first analysis and provides the true reliability of the WHODAS 2.0 scale, as well as information about the reliability of the raw scores. A WHODAS 2.0 summary score for the entire SCI population could be confirmed. This testlet approach was also successfully applied in a stroke population [29].

In the light of disability eligibility determination, it will be crucial to see whether our solutions can be replicated in a chronic SCI population or in different diagnostic groups. Although we expected a substantial influence of cultural background, for example through the social roles of males and females in Asian countries, where household tasks are traditionally performed by women, no differential item functioning for gender was found in item D5.1 “Taking care of your household responsibilities”. This means that the item was understood and answered in the same way by male and female participants. Only the domain D6 “Participation in society” showed uniform differential item functioning for age in the overall score, which would be reasonable given the changing participation patterns in different life phases. Typically, differential item functioning can be solved by applying a so-called item split [30]. Splitting an item for differential item functioning would result in estimating different item difficulties across the levels of the person factor that causes differential item functioning. However, one must keep in mind that adjusting the item difficulties would then favor the subgroup with more difficulties on an item to adjust for the difference. This adjustment, of course, would impact the comparability of the person ability across groups. Also, different cut-off scores must then be calculated, as the subgroups may not be comparable anymore. For these reasons, we decided to report differential item functioning without undertaking any adjustments.

Study limitation

Our participants first registered in the National Disability Determination System in Taiwan between 2013 and 2015. The study period was therefore limited, and it would be valuable to collect more information about participants’ work status, especially the work items of D5-2. Nevertheless, our findings indicate that WHODAS 2.0 is valid to assess and document the level of activity and participation of adults with SCI in Taiwan. External validity must be considered carefully, because the results might not apply to subpopulations that were not adequately represented in the study.

Conclusion

The present study is the first to examine the psychometric properties of WHODAS 2.0 in a population of persons with chronic SCI in Taiwan using both a classical and a modern test theory approach. Our study could provide reference information for the Taiwanese government to assess functioning based on WHODAS 2.0 for disability policy in the future.