Comparative Assessment of Different Health Utility Measures in Systemic Lupus Erythematosus

In a time of increasing economic constraints, it is crucial that health systems optimize their resource use to generate the maximum possible health gain. Health interventions therefore need to be evaluated and compared across therapeutic boundaries, and such an evaluation requires a generic utility-based measure. However, it remains uncertain whether the utility values obtained by direct and indirect methods are comparable and which approach is most appropriate in the systemic lupus erythematosus (SLE) population. In this study, we compared the utility values obtained by an indirect method (EQ-5D) with those obtained by direct utility instruments, the standard gamble (SG) and the visual analog scale (VAS), in SLE patients. The correlations between VAS, EQ-5D and LupusQoL were significant, and relatively good intraclass correlations or kappa coefficients indicated the reliability of these instruments. A model incorporating the SLEDAI score and the LupusQoL domains of emotional health and pain was a good predictor of VAS. The SLEDAI score was a good predictor in the SG regression model. These findings suggest that the VAS and EQ-5D may be valid and reliable measures for assessing health-related quality of life in SLE patients and represent promising outcome measures for future research in this population.

1 for optimal health and 0 for death (health states considered worse than death can be represented by negative values) 9. Utility preference scores are key elements in cost-effectiveness research and can be obtained in a number of ways, including various "off the shelf" instruments [Health Utilities Index Mark 3 (HUI-3, http://www.healthutilities.com; Health Utilities Inc, Hamilton, Ontario, Canada), EuroQol (EQ-5D, http://www.euroqol.org; EuroQol Group, Rotterdam, the Netherlands), and Short Form 6D (SF-6D, http://www.qualitymetric.com; Quality Metric Inc, Lincoln, RI)] or direct instruments [visual analog scale (VAS), standard gamble (SG) and time trade-off (TTO)] 10. To date there is no consensus on which method is the most appropriate 11. However, if the scores produced by different methods differ significantly, this can affect the resulting cost-effectiveness estimates and may lead to discrepant or uncertain conclusions as to whether an intervention should be recommended or funded.
In the present study, we address the concurrent validity and reliability of the following utility or preference measures in patients with SLE: VAS, SG, and EQ-5D. We also examine the ability of these measures to distinguish between subgroups of individuals with different levels of SLE severity, and compare them with a disease-specific instrument, LupusQoL. Predictors of VAS and SG utility were determined with multiple regression models.

Methods
This study was approved by the Institutional Review Board of Shanghai Jiao Tong University and the Ethics Committee of Ren Ji Hospital, and all subjects consented to participation. The methods were carried out in accordance with the approved guidelines. All research was performed on the basis of the principles expressed in the Declaration of Helsinki.
Patients. A total of 245 consecutive SLE patients followed at Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, from March 2013 to May 2014 were included. All enrolled patients satisfied the 1997 revised American College of Rheumatology classification criteria for SLE 12 and had been on a stable treatment regimen for at least 2 months at study entry. Each patient took part in at least one standardized interview with the same trained interviewer. One hundred patients were randomly selected to participate in a second interview 2 to 4 weeks later. At each interview, all patients underwent assessments of disease activity [SLE Disease Activity Index (SLEDAI)] and damage [Systemic Lupus Collaborating Clinics/American College of Rheumatology Damage Index (SLICC/ACR DI)], and completed a disease-specific HRQoL measure, LupusQoL, and 3 utility measures: VAS, SG, and EQ-5D. The questionnaires were self-administered on the same day as the baseline evaluation. Questionnaires were administered in a different order each day to minimize the effect of sequence on the outcomes of interest.
Utility measures. EQ-5D. EQ-5D is a generic preference-based measure of health developed by a multidisciplinary group of researchers, and is one of the most popular methods for obtaining health state values to calculate QALYs 13,14. This instrument has a structured health state descriptive system with five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression) and the EQ visual analogue scale (EQ VAS). There are two versions of the questionnaire, one with five-level responses and one with three-level responses. We chose the five-level version, EQ-5D-5L (no problems, slight problems, moderate problems, severe problems, and extreme problems), because it offers improved discriminatory power and is considered more user-friendly. The five dimensions together define a total of 5⁵ (3,125) health states formed by different combinations of levels. EQ-5D-5L health states, defined by the EQ-5D-5L descriptive system, may be converted into utility values according to country-specific value sets. Previous research has already demonstrated its construct and criterion validity in SLE patients 15.
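The size of the EQ-5D-5L descriptive system can be checked by simple enumeration. The following is a minimal illustrative sketch, not part of the study's analysis: it lists every five-digit profile (one digit per dimension, 1 = no problems to 5 = extreme problems). The names `DIMENSIONS`, `LEVELS` and `all_health_states` are our own, and no country-specific value set is applied.

```python
from itertools import product

# Dimension order follows the descriptive system named in the text.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = (1, 2, 3, 4, 5)  # 1 = no problems ... 5 = extreme problems


def all_health_states():
    """Return every EQ-5D-5L profile as a 5-digit string such as '11111'."""
    return ["".join(str(level) for level in combo)
            for combo in product(LEVELS, repeat=len(DIMENSIONS))]


states = all_health_states()
print(len(states))            # 3125
print(states[0], states[-1])  # 11111 55555
```

Converting a profile into a utility value would additionally require a published value set, which is deliberately left out here.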
VAS. The VAS is usually represented by a line with well-defined end-points, on which respondents can indicate their judgments, values or feelings. It has been used in the context of health as a measure of symptoms and various domains of health, and to provide a single index measure of HRQoL. This measure has been identified as a possible economic evaluation tool for over four decades, and has become one of the most widely used measures for economic purposes 16. In this study, we obtained the VAS utility using the EQ VAS of the EQ-5D-5L. Using interval markings of 0-100, where 0 corresponded to the worst imaginable health and 100 corresponded to the best imaginable health, patients were asked to indicate their state of health on the day of the interview. Reported VAS scores were divided by 100 prior to analysis to make the scores comparable to the 0-1 scale of the utility score.
SG. The SG, one of the best-known approaches to the explicit assessment of preferences, is regarded as the gold standard for eliciting utilities because of its grounding in utility theory. Patients in our study were asked repeatedly to choose between a certain intermediate outcome and a gamble between perfect health and death with a varying probability, until they were indifferent between the two alternatives 8. The probability of perfect health at the point of indifference was taken as the utility of the health state.
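The reason the indifference probability can be read directly as a utility follows from expected utility theory. The short derivation below is a textbook sketch rather than a formula taken from the study; the symbols U (utility), h (the health state being valued) and p* (the probability of perfect health at indifference) are our own notation, with the anchors U(perfect health) = 1 and U(death) = 0.

```latex
% At indifference, the certain outcome h and the gamble have equal expected utility.
\begin{align}
U(h) &= p^{*}\,U(\text{perfect health}) + (1 - p^{*})\,U(\text{death}) \\
     &= p^{*}\cdot 1 + (1 - p^{*})\cdot 0 \\
     &= p^{*}
\end{align}
```

For example, a hypothetical respondent who becomes indifferent when the gamble offers an 80% chance of perfect health and a 20% chance of death would be assigned a utility of 0.80 for the health state.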
Disease specific HRQoL questionnaire. LupusQoL is a lupus-specific HRQoL questionnaire consisting of 34 items grouped in eight domains: physical health (PH), pain (PN), planning (PL), intimate relationships (IR), burden to others (BU), emotional health (EH), body image (BI) and fatigue (F) 17. This tool was modified and validated for applicability to Chinese patients with SLE in our previous study 18.
Scientific Reports | 5:13297 | DOI: 10.1038/srep13297

Statistical analyses. Spearman's correlations between the LupusQoL domains and the utility measures at the first visit were calculated to assess validity.
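As an illustration of the validity analysis, Spearman's rho can be computed as the Pearson correlation of the ranked scores. The sketch below is a standard-library reimplementation for exposition only (the study itself used SPSS). The function names and the six pairs of scores are invented and are not study data.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for six hypothetical patients (NOT study data).
lupusqol_pain = [40, 55, 60, 75, 80, 95]
eq5d_utility = [0.45, 0.60, 0.58, 0.80, 0.85, 0.90]
rho = spearman_rho(lupusqol_pain, eq5d_utility)
print(round(rho, 3))  # 0.943
```

A significance test for rho is omitted here; in practice a statistics package would report it alongside the coefficient.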
Patients were divided into two groups using a SLEDAI score cutoff of 4 and, separately, a SLICC/ACR DI score cutoff of 1. To test the construct validity of the utility measures, we compared the utility scores between the two groups in each split using the Kruskal-Wallis test. Our hypothesis was that utility scores would differ between the groups.
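The known-groups comparison can be sketched as follows. This is an illustrative standard-library version of the Kruskal-Wallis test restricted to two groups, not the SPSS procedure used in the study. The patient tuples are invented, the function names are our own, and the p value relies on the large-sample chi-square approximation (1 degree of freedom), which is only approximate for a sample this small.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Return (H, p) for two groups, with the usual tie correction.

    Assumes the pooled values are not all identical.
    """
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_a = sum(ranks[:len(group_a)])
    sum_b = sum(ranks[len(group_a):])
    h = 12 / (n * (n + 1)) * (sum_a ** 2 / len(group_a)
                              + sum_b ** 2 / len(group_b)) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h = max(h / (1 - ties / (n ** 3 - n)), 0.0)
    # Upper tail of a chi-square distribution with 1 df is erfc(sqrt(H / 2)).
    return h, math.erfc(math.sqrt(h / 2))


# (SLEDAI score, utility) for ten hypothetical patients (NOT study data).
patients = [(0, 0.85), (2, 0.90), (4, 0.78), (2, 0.95), (0, 0.88),
            (6, 0.60), (8, 0.72), (12, 0.55), (5, 0.70), (10, 0.65)]
low_activity = [u for sledai, u in patients if sledai <= 4]
high_activity = [u for sledai, u in patients if sledai > 4]
h_stat, p_value = kruskal_wallis_two_groups(low_activity, high_activity)
print(round(h_stat, 2), round(p_value, 3))  # 6.82 0.009
```

With only two groups, the Kruskal-Wallis test is equivalent to the two-sided Mann-Whitney U test.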
To assess reliability, intraclass correlations (for VAS and SG) or kappa coefficients (for EQ-5D) were calculated between the first and second assessments, performed 2-4 weeks apart, in patients who rated their quality of life as unchanged on a 15-point health status change scale (−7 to +7).
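For the categorical EQ-5D dimensions, test-retest agreement can be illustrated with Cohen's kappa. The sketch below assumes the unweighted form (the text does not state whether a weighted kappa was used) and first keeps only respondents who reported no change (a rating of 0). The records are invented and the function name is our own; the intraclass correlation used for VAS and SG is not shown.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two paired lists of categorical ratings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    freq_first, freq_second = Counter(first), Counter(second)
    # Chance agreement from the marginal category frequencies.
    expected = sum(freq_first[c] * freq_second[c] for c in freq_first) / n ** 2
    return (observed - expected) / (1 - expected)


# (change rating, level at visit 1, level at visit 2) for one EQ-5D dimension.
# Hypothetical records, NOT study data.
records = [(0, 1, 1), (0, 1, 1), (0, 2, 2), (0, 2, 3), (0, 3, 3), (0, 1, 1),
           (0, 2, 2), (0, 3, 3), (0, 1, 2), (0, 2, 2), (3, 1, 3), (-2, 3, 1)]
stable = [(v1, v2) for change, v1, v2 in records if change == 0]
visit1 = [v1 for v1, _ in stable]
visit2 = [v2 for _, v2 in stable]
kappa = cohen_kappa(visit1, visit2)
print(round(kappa, 3))  # 0.697
```

The function is undefined when chance agreement equals 1 (both visits constant and identical), a case that a production implementation would need to handle.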
Multiple regression models were fitted for VAS and SG to determine predictors of utility. Variables included in the models were SLEDAI, SDI, and selected domains of LupusQoL that were considered to best characterize the values of patients' health states. We selected the best models according to approximate Bayes factors calculated from the Bayesian Information Criterion (BIC), an approach with better out-of-sample predictive properties than backward or forward model selection algorithms.
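The link between BIC and an approximate Bayes factor can be sketched in a few lines. This shows the textbook approximation (twice the log Bayes factor is roughly the difference in BIC), not the specific R routine used in the study, which the text does not name. The residual sums of squares are invented, n = 245 is borrowed from the cohort size purely for scale, and the function names are our own.

```python
import math


def bic_linear(rss, n, k):
    """BIC of a Gaussian linear model, up to a constant shared by all models
    fitted to the same n observations; k counts estimated coefficients."""
    return n * math.log(rss / n) + k * math.log(n)


def approx_bayes_factor(bic_a, bic_b):
    """Approximate Bayes factor in favour of model A over model B."""
    return math.exp((bic_b - bic_a) / 2)


n = 245  # sample size used only for illustration
# Hypothetical fits (NOT study results):
bic_full = bic_linear(rss=4.0, n=n, k=4)     # intercept + three predictors
bic_reduced = bic_linear(rss=4.5, n=n, k=2)  # intercept + one predictor
bayes_factor = approx_bayes_factor(bic_full, bic_reduced)
print(bayes_factor > 1)  # True: the larger model is favoured in this example
```

Because BIC penalizes each extra coefficient by ln(n), a larger model is preferred only when its reduction in residual error outweighs that penalty.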
Categorical data were compared between groups using Pearson's chi-square or Fisher's exact tests, as appropriate, and continuous variables were compared using analysis of variance or the Kruskal-Wallis test, as appropriate. Correlations were evaluated with Spearman's or Pearson's correlation tests, as appropriate. A strong correlation was defined as ≥ 0.70, moderate to substantial as 0.30-0.70 and weak as < 0.30. All reported p values were 2-sided and p values < 0.05 were considered statistically significant. We used SPSS software, version 10.0, to analyze the data. Multiple regression analysis was performed in the R software environment (version 3.1.1; R Development Core Team, Vienna, Austria).

Results
The means (SD) of the three health utility measures are shown in Table 2. The strengths of the correlations between the utility scores of the three measures and the eight LupusQoL domains are shown in Table 3. We found significant positive correlations between the EQ-5D score and all domains of LupusQoL (p values < 0.01). The correlations were moderate to strong (0.4-0.7) for all domains except intimate relationships (r = 0.252) and body image (r = 0.179). A similar pattern was found for the correlations between VAS scores and the LupusQoL domains. However, SG scores correlated poorly with six domains of LupusQoL (maximum r = 0.293; minimum r = 0.151). The remaining two domains, burden to others and body image, did not correlate significantly with SG scores (r = 0.104 and 0.057, respectively).
All three utility measures discriminated between patients with lower versus higher disease activity (SLEDAI 0-4 versus > 4) and damage scores (SDI ≤ 1 versus > 1). The differences in utility between these groups were significant (Table 4).
Multiple regression analysis showed that a model integrating the SLEDAI score and the pain and emotional health domains of LupusQoL was a good predictor of VAS (R² = 0.543) (Table 5). The SLEDAI score was also a good predictor in the regression model for SG (R² = 0.226) (Table 5).

Discussion
Utility scores are widely used to compare the effect of different disease states on overall functional status and quality of life. They are also essential tools for comparing the cost-effectiveness of different treatments for a given disease. Both direct and indirect measures can be used to obtain utility values. Both have a solid theoretical basis, and each has its advantages and disadvantages 8. However, it remains uncertain whether the utility values obtained by these methods are comparable and which approach is the most appropriate for patients with SLE. This is the first study to examine the construct validity of three such instruments simultaneously in a relatively large sample of participants with SLE. Our data indicate that all three utility measures can be used to assess health status in patients with SLE. The mean utility values derived from the three methods, VAS, SG and EQ-5D, were at a comparable level.
However, as in previous studies, the values obtained with each method displayed a relatively large standard deviation (SD). The most likely reason is that rheumatic diseases, especially SLE, have extremely variable health states. SLE, as a chronic relapsing disease, has a wide spectrum of clinical manifestations. These disease characteristics were well reflected in the utility values.
In our study, the VAS and EQ-5D measures showed better overall correlation with LupusQoL than the SG instrument. The VAS and EQ-5D both reflected the general health status of SLE patients, and their correlations with the domains reflecting general health status (e.g. physical health, pain, planning and fatigue) were stronger. By contrast, SG was not significantly correlated with the disease-specific domains for SLE (burden to others and body image), and the correlations between SG and the remaining domains of LupusQoL were also weak.
Based on the axioms of expected utility theory, SG has traditionally been viewed as the "gold standard" for measuring the utility associated with particular health states 8. Under the basic principles of SG, participants in a hypothetical scenario are presented with the risk of immediate death when choosing between two alternatives, which generally leads patients to choose the more conservative strategy. As a result, utilities derived from SG may theoretically be slightly higher than those derived from other methods 19. However, in our study the mean SG utility was found to be lower than the mean VAS and EQ-5D utilities. This finding casts some doubt on the applicability of SG in this patient population. In China, patients living with chronic illnesses commonly face formidable challenges in securing employment, education and marriage, and suffer from discrimination in society. As such, some patients may have a strong desire to rid themselves of chronic disease, even at the cost of valuable life. We believe this kind of psychology may have had a significant impact on our results. Furthermore, although it is well established that the SG is normatively more valid than the other methods of health utility measurement, its descriptive validity is less certain 8. Some experts therefore recommend that utilities derived from SG be adjusted for probability weighting using a weighting function, thereby improving descriptive validity without sacrificing normative validity 20.
As indicated by good intraclass correlations or kappa coefficients between the two interviews, the three methods, VAS, SG and EQ-5D, were all reliable in our population. Only one domain, anxiety/depression in EQ-5D, was found to be less reliable. However, the emotional states of patients with SLE, such as anxiety and depression, can be affected by many factors, only some of which are associated with disease state. It is therefore possible that some patients in relatively stable disease states experienced emotional changes over a 2- to 4-week period for other reasons.
A model integrating the SLEDAI score and the pain and emotional health domains of LupusQoL was a good predictor of VAS, suggesting that the health utility score depends not only on the physical level of lupus activity but also on the patient's state of mental health. The finding also demonstrated that pain is one of the most important factors affecting quality of life in this population.
This study had a few important limitations. All patients were recruited from a single clinical site in China, which limits the generalizability of our findings. However, because our department is one of the most reputable departments of rheumatology in China, our patient population consisted of a diverse range of SLE cases referred from a large catchment area, which should mitigate the influence of this weakness on our results. We also did not measure the minimal clinically important difference for each instrument. Further studies measuring the responsiveness of utility measures to change following treatment of SLE are also required.

Conclusions
The results of this study lend support to the construct validity of two instruments (VAS and EQ-5D) in SLE and provide some detail regarding their limitations and strengths. SG, if adjusted for probability weighting, might become a more efficient method for utility calculation in SLE; however, this requires further examination. Our study provides a foundation for future studies evaluating utility data in larger samples of patients with SLE and provides an important theoretical basis for the reasonable allocation of health resources for this population.