Background

Rheumatoid arthritis (RA) is a chronic, systemic, inflammatory disease of unknown etiology. It affects 0.3–1% of the population worldwide1. The progressive course of RA may result in deformity and destruction of bones and joints. If untreated or unresponsive to therapy, it can lead to functional disability (e.g. difficulties in conducting daily activities), impaired psychological and social functioning and premature death2,3,4. In China, RA is a leading cause of disability and places a heavy burden on patients, families and society. For instance, the annual cost of RA in China was estimated at approximately $13.9–22.4 billion5,6.

In recent decades, the concept of quality of life (QOL) has received increasing attention in the evaluation of clinical and medical interventions. The World Health Organization (WHO) defines QOL as “a broad ranging concept incorporating in a complex way the person’s physical health, psychological state, level of independence, social relationships, person’s beliefs and their relationship to salient features of the environment”7. RA is a major cause of impaired QOL. The disability and symptoms related to RA (e.g. pain, stiffness, fatigue) have significant impacts on patients’ physical, psychological and social health8,9,10,11. The assessment of QOL is therefore integral to evaluating the impact of this disease on patients’ health and wellbeing, as well as the effectiveness of medical treatment or health interventions9,12,13.

Several instruments have been developed to assess the QOL of RA patients. Examples are the Quality of Life-Rheumatoid Arthritis scale (QOL-RA)14, the Rheumatoid Arthritis Quality of Life Questionnaire (RAQoL)15, the McMaster Toronto Arthritis Patient Preference Disability Questionnaire, the Cedars-Sinai Health-Related Quality of Life in Rheumatoid Arthritis instrument (CSHQ-RA)16,17, the Juvenile Arthritis Quality of Life Questionnaire (JAQQ)18, the Arthritis Impact Measurement Scales (AIMS)19,20 and the Rheumatoid Arthritis Quality of Life Scale (RAQOL)21,22. These instruments were mostly developed and validated in industrialized countries, where they showed relatively good feasibility, validity and sensitivity to change in the QOL of RA patients. However, they were not developed with the popular modular approach, that is, a general (core) module plus specific modules. A popular trend in scale development has been to establish comprehensive measurement tools that can capture both the similarities and the differences among diseases. By creating a general module for a class of diseases and additional modules for the variations specific to individual diseases, researchers hope to provide a more accurate assessment of patients’ quality of life. For example, both the QLQs (Quality of Life Questionnaires) from the EORTC (European Organisation for Research and Treatment of Cancer) and the FACIT (Functional Assessment of Chronic Illness Therapy) system in the USA have been developed with this modular approach23,24.

In addition, QOL is culturally dependent. In Chinese culture, family relationships and kinship play very important roles in daily life. Food culture is also highly valued, and thus good appetite, sleep and energy are highly regarded in daily life. Taoism and traditional medicine emphasize good temper and high spirits. This cultural dependence is not reflected in most QOL instruments developed in other languages25,26.

Considering these needs, we developed a QOL system, the Quality of Life Instruments for Chronic Diseases (QLICD), which combines a general module with disease-specific modules25,26. The second edition of QLICD comprises a general module (QLICD-GM), which can be used for all chronic diseases, and 34 specific modules tailored to different diseases such as hypertension, psoriasis and chronic gastritis27,28,29. Each specific module is designed exclusively for the relevant disease to ensure precise evaluation. For example, the hypertension scale (QLICD-HY V2.0) was formed by combining the QLICD-GM (V2.0) with the specific module for hypertension27. Similarly, the chronic gastritis instrument (QLICD-CG V2.0) was formed by combining the QLICD-GM (V2.0) with the specific module for that disease29.

For rheumatoid arthritis, we developed the QLICD-RA (V2.0) under this system. It is a multidimensional, disease-specific, self-administered questionnaire for measuring the QOL of RA patients. This study aims to present the development and validation of the QLICD-RA in an RA patient population, including its reliability, validity and responsiveness.

Methods

Establishment of the general module QLICD-GM(V2.0)

QLICD-GM (V2.0) was developed on the basis of the first edition25. To give the theoretical structure a clear hierarchy, the theoretical framework was proposed after several rounds of qualitative work, including nominal group and focus group discussions and in-depth interviews with doctors and patients. In addition, for comprehensiveness, the structure was further refined to the facet level. The test (beta) version had a relatively large number of items (36): 13 for physical, 13 for psychological and 10 for social functions. During item screening, pilot and pre-test data were used to select items by quantitative statistical procedures, including variation analysis, correlation analysis, factor analysis, and doctors’ and patients’ importance ratings. In-depth interviews on the items with doctors and patients and several rounds of focus group discussions were also carried out. After this quantitative and qualitative work, 7 items were deleted and the formal version of QLICD-GM (V2.0) was formed, containing 10 facets and 29 items. After two years of practical application and re-evaluation in large samples, the urination item was removed and the will and personality facets were merged. The resulting modified formal version of QLICD-GM (V2.0), revised in 2015, includes 9 facets and 28 items: 9 items for physiological function, 11 for psychological function and 8 for social function28,29.

Establishment of the specific module

Based on a comprehensive literature review and experts’ experience, the members of the research group independently proposed 53 non-repeating items, which formed the candidate item pool. As for the general module, the theoretical framework and item screening were carried out through several rounds of qualitative work, including focus group discussions and in-depth interviews with doctors and patients. Specifically, some of the less important items were deleted and some items reflecting specific social and psychological functions were added. For instance, we separated item 1 into swelling and pain, replaced item 13 “muscles” with “muscle atrophy”, and added an item “feel dry mouth”. The resulting initial specific module contained 42 items.

We conducted a pretest of this 42-item questionnaire among RA patients (n = 30) and medical staff (n = 26) in two hospitals in Kunming and Zhanjiang to evaluate whether all the items were sensitive, representative and comprehensive. Apart from the 42 items, the questionnaires for RA patients and medical staff differed: the former focused on a cognitive interview and the latter on an evaluation interview. As before, quantitative statistical procedures (variation analysis, correlation analysis, and doctors’ and patients’ importance ratings) and in-depth interviews with doctors and patients were used to analyze and evaluate the items. The results of the pretest were discussed in two rounds by the research group members. After the first round of discussion, 22 items were deleted. Later, in order to decrease the response burden on patients, a further 5 items were deleted based on measurements with the test version. Finally, the specific module with 15 items was formed, classified into 3 facets: Limitation of activity (LOA), Complications (COM) and treatment side effects, and Joint pain and deformity (JPD).

The steps taken to form the final version of QLICD-RA are presented in Fig. 1.

Figure 1 Steps towards the development and validation of QLICD-RA (V2.0).

Validation of the QLICD-RA

Data collection and scoring

The research protocol, along with the informed consent document, was approved by the Institutional Review Boards (IRBs) at the investigators’ respective institutions and the associated hospital. In terms of sample size, based on our experience and on estimates of the variation of the 0–100 standardized scores30,31, 200 cases are sufficient to validate the scale, because sensitive statistical methods such as Pearson correlation analysis and paired t-tests are used.

We recruited 379 patients diagnosed with RA from the Affiliated Hospital of Kunming Medical University and the Affiliated Hospital of Guangdong Medical University in China. Participants were screened by their treating physicians and the investigative team. Those enrolled were capable of understanding and completing the questionnaires at the various stages of treatment and volunteered to participate in this study.

All participants completed the QLICD-RA and the Chinese version of the SF-3632 by themselves on the first day of admission to the hospital. Among them, 47 were measured a second time on the second day of hospitalization to assess test–retest reliability, and 223 were measured again on the day before discharge to assess responsiveness. Each time, the investigators checked the answers immediately to ensure that they were complete. If missing values were found, the questionnaire was returned to the patient to fill in the missing items.

Each item of QLICD-RA is rated on a five-level scale: not at all, a little bit, somewhat, quite a bit, and very much. Positively stated items are scored directly from 1 to 5 points, and negatively stated items are reverse-scored. The domain and overall scores are obtained by summing the item scores within each domain. For comparison, all domain scores were linearly converted to a 0–100 scale using the formula SS = (RS − Min) × 100/R, where SS, RS, Min and R represent the standardized score, raw score, minimum possible score and range of scores, respectively. A higher score indicates better QOL.
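
The scoring rule can be illustrated with a minimal sketch. Only the 1–5 coding, the reversal of negatively stated items and the formula SS = (RS − Min) × 100/R come from the text; the function names, the three-item domain and the choice of which item is negatively stated are hypothetical.

```python
def reverse_item(score, low=1, high=5):
    """Reverse a negatively stated item on a low..high scale (1<->5, 2<->4)."""
    return low + high - score


def standardized_score(item_scores, low=1, high=5):
    """Convert a raw domain score to the 0-100 scale: SS = (RS - Min) * 100 / R."""
    n_items = len(item_scores)
    raw = sum(item_scores)                 # RS: sum of within-domain item scores
    minimum = n_items * low                # Min: lowest possible raw score
    score_range = n_items * (high - low)   # R: highest minus lowest possible raw score
    return (raw - minimum) * 100 / score_range


# Hypothetical three-item domain; assume the third item is negatively stated.
answers = [4, 3, 2]
negative_positions = {2}
scored = [reverse_item(a) if i in negative_positions else a
          for i, a in enumerate(answers)]
print(scored, round(standardized_score(scored), 1))
```

With this coding, a domain in which every item scores 1 maps to 0 and a domain in which every item scores 5 maps to 100, so domains with different numbers of items become directly comparable.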

Psychometric analysis

To evaluate the internal consistency of our measurement tool, we conducted several statistical analyses. We calculated Cronbach’s alpha coefficients for each domain/facet of the scale, and item-to-domain score correlations using Pearson correlation coefficients. A Cronbach’s alpha coefficient exceeding 0.70 was considered indicative of good internal consistency, while an item-to-total score correlation exceeding 0.40 indicated good item internal consistency33.
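
As an illustration of these two statistics, the sketch below computes Cronbach’s alpha and an item-to-domain Pearson correlation in plain Python. The response matrix is invented for the example and is not study data. The sketch also assumes that the item is included in its own domain total (an uncorrected item-to-domain correlation), which the text does not specify.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (same items, same order)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical responses: five respondents, one three-item domain.
responses = [[1, 2, 1], [2, 2, 3], [3, 3, 3], [4, 5, 4], [5, 4, 5]]
alpha = cronbach_alpha(responses)
domain_totals = [sum(r) for r in responses]
item1_r = pearson_r([r[0] for r in responses], domain_totals)
print(round(alpha, 3), alpha > 0.70, round(item1_r, 3), item1_r > 0.40)
```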

To assess the test–retest reliability of the QLICD-RA instrument, we used correlation coefficients (r) and intra-class correlation coefficients (ICCs). An ICC above 0.80 was taken as the threshold for acceptable test–retest reliability.
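
The text does not state which form of ICC was used. Purely as an assumption, the sketch below implements the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), from the ANOVA mean squares. The scores are invented for illustration.

```python
def icc_2_1(test, retest):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    Assumed form; the paper does not say which ICC variant was computed.
    """
    rows = list(zip(test, retest))
    n, k = len(rows), 2
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]            # one mean per patient
    col_means = [sum(test) / n, sum(retest) / n]      # one mean per occasion
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical standardized domain scores on day 1 and day 2 (not study data).
day1 = [60, 70, 80, 50, 90]
day2 = [62, 68, 83, 49, 91]
icc = icc_2_1(day1, day2)
print(round(icc, 3), icc > 0.80)
```

Unlike the Pearson r, this form of the ICC is lowered by a systematic shift between the two occasions, which is one reason to report both coefficients.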

We evaluated convergent and discriminant validity, which together represent construct validity. Pearson correlation analysis was applied to assess the correlation between item scores and domain scores. The correlation coefficients were interpreted according to the following criteria27,34: (1) convergent validity is supported when an item-domain correlation is greater than 0.40; (2) discriminant validity is supported when the correlation between the score of an individual item and the score of its designated domain is stronger than the correlation between that item and the non-designated domains. We additionally evaluated the construct validity of the specific module and the general module separately by confirmatory factor analysis using structural equation modeling. A CFI (comparative fit index) and TLI (Tucker–Lewis index) greater than 0.90, an RMSEA (root mean square error of approximation) less than 0.08 and an SRMR (standardized root mean square residual) less than 0.10 were taken to reflect a good fit of the model to the data35,36. Criterion-related validity was evaluated by calculating the correlation coefficients between the domain scores of QLICD-RA and the domain scores of the Chinese SF-36 (the 36-item Short Form Health Survey)32.
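
The two item-level criteria can be expressed as a small check. The function name and data layout below are hypothetical. The three correlations are those reported in the Discussion for item GPS1 (‘Attention’), whose designated domain is the psychological domain.

```python
def item_validity(correlations, designated, threshold=0.40):
    """Apply the two criteria to one item.

    correlations: mapping of domain name -> item-domain Pearson r.
    Returns (convergent_ok, discriminant_ok).
    """
    own = correlations[designated]
    others = [r for domain, r in correlations.items() if domain != designated]
    convergent_ok = own > threshold                # criterion (1)
    discriminant_ok = all(own > r for r in others)  # criterion (2)
    return convergent_ok, discriminant_ok


# Correlations reported for GPS1; its designated domain is psychological.
gps1 = {"psychological": 0.42, "physical": 0.57, "social": 0.49}
print(item_validity(gps1, "psychological"))
```

For GPS1 the first criterion is met (0.42 > 0.40) but the second is not, which matches the exception noted in the Results.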

Responsiveness was assessed by comparing the mean scores before and after treatment, with the standardized response mean (SRM) as the effect size.
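
The SRM is conventionally the mean of the paired score changes divided by the standard deviation of those changes. The sketch below shows this, together with the paired t statistic that the sample-size section suggests was used for the significance tests. The scores are invented, and the p-value lookup is omitted because it needs a t distribution that the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_response_mean(pre, post):
    """SRM = mean of the paired changes divided by their standard deviation."""
    changes = [b - a for a, b in zip(pre, post)]
    return mean(changes) / stdev(changes)


def paired_t_statistic(pre, post):
    """Paired t statistic: mean change over its standard error (df = n - 1)."""
    changes = [b - a for a, b in zip(pre, post)]
    return mean(changes) / (stdev(changes) / sqrt(len(changes)))


# Hypothetical standardized scores for five patients (not study data).
pre_treatment = [50, 55, 60, 65, 70]
post_treatment = [54, 55, 66, 64, 75]
srm = standardized_response_mean(pre_treatment, post_treatment)
t_stat = paired_t_statistic(pre_treatment, post_treatment)
print(round(srm, 2), round(t_stat, 2))
```

The two quantities differ only by a factor of √n, so a small SRM can still be statistically significant in a large sample.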

All statistical analyses were performed with SPSS (version 22.0).

Ethical approval and consent to participate

The study protocol was approved by the Institutional Review Board (IRB) of the First Affiliated Hospital of Guangdong Medical University (PJ2013037). The investigators explained the aims of the trial and the instrument to the patients and obtained informed consent from those who agreed to participate in the study and met the inclusion criteria. Patients were assured that all information would be kept confidential. They had the right to decline to participate and to discontinue participation in the study without penalty. The study followed the ethical guidelines of the Declaration of Helsinki.

Results

Socio-demographic characteristics of the participants

The socio-demographic characteristics of the participants are presented in Table 1. Among the 379 participants, around half were older than 50 years; 80% were female; more than 70% had a moderate educational level (i.e. high school); around 90% were married; and 97% were of Han ethnicity. Most participants had a diagnosis of typical RA and were at the chronic stage. Around 35% of participants received immunosuppressant treatment, and 40% received both hormone and immunosuppressant treatment. Public insurance covered 34% of participants, while around half paid all costs themselves.

Table 1 Socio-demographic characteristics of the sample (n = 379).

Reliability

Table 2 shows Cronbach’s α and the test–retest reliability coefficients (correlation r and ICC) for the domains, the modules and the overall instrument. Cronbach’s α ranged from 0.77 to 0.94 at the domain level, exceeding 0.70 in every case. Forty-seven patients completed the questionnaires for the test–retest reliability analysis. The correlation r ranged from 0.86 to 0.99, and all ICC values were greater than 0.80.

Table 2 Reliability of the quality of life instrument QLICD-RA(V2.0) (n = 379 for α, and floor and ceiling effects, n = 47 for r, ICC).

Validity

Construct validity

Table 3 shows the correlation coefficients between the items and domains of QLICD-RA. All correlation coefficients between item scores and their designated domains were greater than 0.40, which indicates good convergent validity. For every item except ‘Attention’ (GPS1), the correlation with the designated domain was greater than the correlations with the non-designated domains, which indicates good discriminant validity.

Table 3 Correlation coefficients r among items and domains of QLICD-RA(V2.0) (n = 379).

Structural equation modeling showed that the structure of the general module of the QLICD-RA was roughly consistent with the conceptual theoretical construct (three domains, nine facets), although the goodness-of-fit indices were relatively modest: χ² = 1059.817 (P < 0.001), df = 332, χ²/df = 3.192, Tucker–Lewis index (TLI) = 0.805, comparative fit index (CFI) = 0.829, root mean square error of approximation (RMSEA) = 0.076, standardized root mean square residual (SRMR) = 0.116. See Table 4 and Fig. 2 for details.

Table 4 The results of SEM analysis on the general module of QLICD-RA (n = 379)*.

Figure 2 The structure of the general module of QLICD-RA by structural equation modeling.

Structural equation modeling showed that the structure of the specific module of the QLICD-RA was consistent with the conceptual theoretical construct (three facets), with the following goodness-of-fit indices: χ² = 268.393 (P < 0.001), df = 84, χ²/df = 3.195, Tucker–Lewis index (TLI) = 0.927, comparative fit index (CFI) = 0.942, root mean square error of approximation (RMSEA) = 0.076, standardized root mean square residual (SRMR) = 0.056. See Table 5 and Fig. 3 for details.

Table 5 The results of SEM analysis on the specific module of QLICD-RA (n = 379)*.

Figure 3 The structure of the specific module of QLICD-RA by structural equation modeling.
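
The reported indices for the two modules can be set against the cut-offs stated in the Methods (CFI and TLI above 0.90, RMSEA below 0.08, SRMR below 0.10). The sketch below does only this arithmetic and does not refit any model. The function name and dictionary keys are hypothetical, and the numbers are those reported above.

```python
def check_fit(chi2, df, tli, cfi, rmsea, srmr):
    """Compare reported fit indices with the cut-offs stated in the Methods."""
    return {
        "chi2/df": round(chi2 / df, 3),
        "TLI > 0.90": tli > 0.90,
        "CFI > 0.90": cfi > 0.90,
        "RMSEA < 0.08": rmsea < 0.08,
        "SRMR < 0.10": srmr < 0.10,
    }


# Values as reported for the general and the specific module.
general = check_fit(1059.817, 332, tli=0.805, cfi=0.829, rmsea=0.076, srmr=0.116)
specific = check_fit(268.393, 84, tli=0.927, cfi=0.942, rmsea=0.076, srmr=0.056)
for name, result in (("general", general), ("specific", specific)):
    print(name, result)
```

On these cut-offs the specific module meets all four criteria, whereas the general module meets only the RMSEA criterion in the RA data.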

Criterion-related validity

Table 6 shows the Pearson correlation coefficients between the physical, psychological, social and specific domains of QLICD-RA and the eight domains (or subscales) of SF-36. Correlations between the same or similar domains are generally higher than those between different, non-similar domains. For example, the correlation coefficient between the physical domain of QLICD-RA and the physical function domain of SF-36 (r = 0.71) is higher than that with the other domains of SF-36. Relatively high correlation coefficients are also seen between the psychological domain of QLICD-RA and the mental health domain of SF-36 (r = 0.61), and between the social domain of QLICD-RA and the social function domain of SF-36 (r = 0.51).

Table 6 Correlation coefficients among domains scores of QLICD-RA(V2.0) and SF-36 (n = 379).

Responsiveness

A total of 223 patients completed the questionnaire at the third assessment, which was used to evaluate responsiveness. As shown in Table 7, more than half of the facets (seven out of 12) showed significant differences in scores before and after treatment (p < 0.05). In addition, the changes in the scores of the physical and social domains of the general module were significant (p = 0.027 and p = 0.001, respectively). The SRM ranged from 0.07 to 0.27 for the significant domains and facets, with the largest SRM found for the facet ‘Joint pain and deformity’ in the specific module of the instrument.

Table 7 Responsiveness of the quality of life instrument QLICD-RA(V2.0) (n = 223).

Discussion

This paper focused on the development and validation of the QLICD-RA (V2.0), the rheumatoid arthritis scale within the System of Quality of Life Instruments for Chronic Diseases. The instrument demonstrated good psychometric properties in terms of reliability and validity in Chinese-speaking adult RA patients.

In terms of reliability, internal consistency (Cronbach’s α) and test–retest reliability (Pearson r and ICC) were assessed in the current study. All domains and the overall score of the QLICD-RA demonstrated excellent internal consistency, with relatively high Cronbach’s α values (range 0.77–0.94), indicating that the items within each domain measure the same construct. The subscale and domain scores of QLICD-RA had high test–retest reliability (both Pearson r and ICC ranged from 0.86 to 0.99), based on the correlation between the first and second measurements. The instrument therefore has excellent reliability, considering that internal consistency coefficients above 0.70 and test–retest reliability coefficients above 0.80 are generally accepted as satisfactory.

The interval between these two measurements was one day, which may introduce a memory effect. In previous studies, the interval between test and retest was normally 14 days or four weeks for healthy people17,37,38. Given practical factors such as the relatively short hospital stay, the potentially rapid changes in QOL caused by therapy, and the discussion of the expert panel, we decided to conduct the retest one day after the first measurement. In our experience25,27, patients’ condition is stable over a one-day interval, and the memory effect is limited because the instrument contains many items and patients do not know that the measurement will be repeated.

The present study demonstrates good validity of the QLICD-RA. More specifically, our data support good convergent and discriminant validity, because all the item-domain correlation coefficients are greater than 0.40, and all the correlations between items and their designated domains are higher than those between items and non-designated domains, except for item GPS1 (‘Attention’). Item GPS1 is “Can you focus attention on what you are doing?” The correlations between GPS1 and the physical and social domains (0.57 and 0.49, respectively) are higher than that with the psychological domain (0.42), which is not consistent with our hypothesis. This finding may be related to how patients perceived and understood this item. From the patients’ perspective, attention problems were probably related more to their physical and social health status when living with RA than to psychological health. We suggest that future studies ask RA patients how they perceive the wording of this item, which may then be rephrased.

Moreover, construct validity was further examined by structural equation modeling, which showed an excellent fit for the specific module, consistent with the theoretical construct of the instrument35,36. In contrast, the fit for the general module was only basically acceptable in the RA data of this study. However, the general module can be used for all patients with chronic diseases, and SEM on data from 11 diseases has confirmed an excellent fit (reported elsewhere). We therefore consider the general module of the QLICD-RA to be consistent with the conceptual theoretical construct (three domains, nine facets), although its fit in the RA data was only basically acceptable.

With regard to criterion-related validity, our study shows, as expected, a good correlation between the physical domain of QLICD-RA and the physical function domain of SF-36, as well as between the psychological domain of QLICD-RA and the mental health domain of SF-36. However, the social domain of QLICD-RA was significantly correlated not only with the social function domain (0.51) but also with the vitality domain (0.55) of SF-36. The specific module score is highly significantly correlated with the physical function and role-physical domains of SF-36. This finding is consistent with the nature of the items in these domains: the questions of the specific module are mainly about the discomforting symptoms and physical limitations due to RA. These correlation coefficients also reveal convergent and divergent validity to some extent, which again supports good construct validity.

Methods for assessing responsiveness can be divided into two categories: internal and external39,40. In this paper we focused on internal responsiveness, with the hypothesis that a sensitive instrument should detect changes in response to treatment when assessed after treatment. We did not find a significant change before and after treatment in the scores of the overall instrument, the general and specific modules, or the psychological domain. We did find significant changes in the physical and social domain scores: the physical domain score increased after treatment, and the social domain score decreased. A possible explanation is that treatment in hospital relieved the discomforting symptoms and improved patients’ physical health, while hospital admission may have limited patients’ social functioning. The same explanation could apply to the changes in the scores of specific facets, such as “energy and discomfort”, “interpersonal communication”, “social support and security”, “social roles”, “treatment side effects” and “joint pain and deformity”.

To the best of our knowledge, the present study is the first in China to validate the QLICD-RA (V2.0) in a relatively large sample of clinical patients with RA. We evaluated a comprehensive set of psychometric parameters, including reliability, validity and responsiveness, which provides valuable evidence for clinical professionals applying this instrument in clinical research and daily practice. The QLICD-RA has several advantages over existing instruments. First, it allows QOL to be compared across diseases through the general module, while the specific module captures symptoms and side effects. Second, it has a strong Chinese cultural background. For example, Chinese culture pays particular attention to family relationships and kinship, diet, temperament and high spirits, which are all captured in the QLICD-RA by items focusing on appetite (GPH1), sleep (GPH2), energy (GPH9) and family support (GSO2, GSO4, etc.).

However, several limitations warrant attention. First, the sample used to evaluate test–retest reliability is relatively small, and the interval between the two tests is relatively short. Second, although we reduced the number of items through several rounds of expert panels and the pretest, the QLICD-RA (V2.0) still contains 43 items in total, which may cause response burden for patients in certain circumstances. We recommend that future research carefully assess the time spent completing the questionnaire and the barriers to understanding and completing it. Third, given the composition of our sample, patients were more often of relatively low socio-economic status, which may limit the generalizability of our findings. In addition, all the patients in our study were inpatients. We suggest replicating the validation of the QLICD-RA in a more representative population, in outpatients, and in other geographic areas of China, where socio-cultural characteristics may differ from those of the area in which our patients resided and may influence the psychometric performance of this instrument.

Conclusions

Our study shows that the second version of QLICD-RA has good reliability, validity and responsiveness. It can be used to measure QOL among patients with RA in mainland China. Versions in other languages can be developed from this scale through rigorous translation procedures. We suggest that future studies replicate the present study in other settings, such as hospital outpatients with RA and populations with different socio-demographic backgrounds, to extend the evidence on the validity of this instrument.