Introduction

Recent developments in the medical treatment of individuals with spinal cord injury (SCI) have resulted in greater life expectancy post injury.1 Accordingly, enabling patients to attain an acceptable quality of life is considered by many to be the primary goal of health-care providers following SCI.2 To pursue such an objective in a rigorous and quantifiable manner, the evaluation of alternative health-care interventions and/or rehabilitation services requires outcome instruments that can adequately measure health-related quality of life (HRQoL) in this patient population. The difficulties of capturing quality of life constructs, whether health-related or not, have been discussed previously in the peer-reviewed literature. These include the lack of consensus on a general definition of quality of life,3 the need to distinguish between conceptually distinct subjective and objective measurement perspectives,4 and the capacity of individuals to adapt to their condition.5, 6

The broad concept of HRQoL measurement includes a subset of instruments that are used primarily within cost-utility analysis, so-called ‘preference-based’ measures of HRQoL (also known as multi-attribute utility scales, preference-based health-state classification questionnaires, or utility measures). Cost-utility analysis is a form of economic evaluation that facilitates comparison of the ‘value’ of interventions across clinical specialties for resource allocation purposes, through the use of a generic measure of health, the quality-adjusted life year (QALY).7

Given the increasing support for economic evaluation within a cost-utility framework,8 the measurement properties of preference-based HRQoL instruments have been evaluated in many clinical areas to ensure that they provide practical, reliable and empirically valid estimates of health benefit.9, 10 However, the SCI research community has not embraced such measures to the same extent. A recent systematic review of studies that have assessed the measurement properties of quality of life instruments within SCI populations identified only two papers relating to preference-based measures, both of which evaluated a single instrument.11 This study reports a comprehensive review of the adoption and assessment of preference-based HRQoL measures within the context of SCI research. It provides valuable information for researchers interested in economic evaluation and preference measurement, and for all readers who seek to interpret the findings from studies that incorporate preference-based measures (for example, population surveys, economic evaluations and psychometric assessments). Secondary objectives of this multi-component review were to identify knowledge gaps regarding preference-based measurement and SCI, and to highlight important areas for future research.

Preference-based HRQoL instruments

Preference-based HRQoL measures are made up of two constituent parts: a descriptive system and a valuation system. The descriptive system defines respondents’ HRQoL as one of a finite number of health states, that is, each combination of item responses defines a particular health state. Given that preference-based HRQoL measures are developed for use across the complete spectrum of clinical conditions, the descriptive content should capture a broad range of health dimensions and provide response options that enable respondents to accurately describe their current health state. The valuation component of preference-based HRQoL instruments is a procedure for scoring each health state defined by the questionnaire. These single index scores represent the relative value that society places on living in each health state (often known as ‘community-derived’ or ‘societal’ preferences), and fall on a scale where 1 indicates full health and 0 represents a health state equivalent to death. Negative index scores can be generated, which represent health-state valuations considered to be worse than death.
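To make the two components concrete, the sketch below enumerates a descriptive system and scores its states with a valuation function. It borrows the EQ-5D-3L layout (five dimensions, three response levels, hence 3^5 = 243 health states) as the example descriptive system. The per-dimension decrements are hypothetical numbers chosen for illustration only; they are not a published value set, and real value sets derived from valuation studies typically have a more complex form (for example, constant or interaction terms). The function name `index_score` is likewise hypothetical.

```python
from itertools import product

# Example descriptive system laid out like the EQ-5D-3L: five dimensions, three levels each.
DIMENSIONS = ["mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression"]
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some problems, 3 = extreme problems

# Each combination of item responses defines one health state, coded e.g. "11111".
ALL_STATES = ["".join(str(level) for level in combo)
              for combo in product(LEVELS, repeat=len(DIMENSIONS))]

# HYPOTHETICAL additive decrements (keyed by level, listed in the dimension order above).
# Illustrative only: these are NOT taken from any published EQ-5D value set.
HYPOTHETICAL_DECREMENTS = {
    2: [0.05, 0.05, 0.05, 0.10, 0.05],
    3: [0.25, 0.20, 0.10, 0.40, 0.25],
}


def index_score(state: str) -> float:
    """Score a health state on a scale where 1 = full health and 0 = dead."""
    score = 1.0  # start from full health and subtract a decrement per reported problem
    for dim_index, level_char in enumerate(state):
        level = int(level_char)
        if level > 1:
            score -= HYPOTHETICAL_DECREMENTS[level][dim_index]
    return round(score, 3)


if __name__ == "__main__":
    print(len(ALL_STATES))        # number of distinct health states
    print(index_score("11111"))   # full health
    print(index_score("33333"))   # below zero: valued as worse than death
```

With these illustrative decrements the worst state, "33333", scores below 0, mirroring how value sets can assign worse-than-death valuations.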

Multiple applications have been touted for preference-based measures, including the determination of profile scores across individual dimensions or a global index of HRQoL for use in population-based studies.12 However, their primary role is to provide utility estimates for the purposes of generating QALYs in cost-utility analysis. QALYs represent the benefit of a health-care intervention in terms of time spent in a series of quality-weighted health states, incorporating the effects of changes in mortality (quantity of life) and morbidity (quality of life) within a single measure.7 Mortality is a relatively simple outcome because an individual is either alive or dead. With regard to morbidity, preference-based HRQoL measures provide the ‘quality-adjustment’ with which to weight periods of time spent in different health states.
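The quality adjustment can be illustrated with a short calculation: QALYs are the sum, over successive periods, of the time spent in each health state multiplied by that state's utility. The sketch below is undiscounted (economic evaluations usually also discount future QALYs), and the utility values and durations are hypothetical.

```python
def qalys(profile: list[tuple[float, float]]) -> float:
    """Undiscounted QALYs for a sequence of (utility, years) periods.

    Utility is the preference-based index score for the health state occupied
    during the period; death ends the profile, so no further periods are added.
    """
    return sum(utility * years for utility, years in profile)


# Hypothetical profiles over the same three-year horizon.
usual_care = [(0.50, 3.0)]                 # 3 years at utility 0.50, i.e. 1.5 QALYs
intervention = [(0.50, 1.0), (0.70, 2.0)]  # health improves after the first year

gain = qalys(intervention) - qalys(usual_care)
print(f"Incremental QALYs: {gain:.2f}")    # prints "Incremental QALYs: 0.40"
```

The same function handles changes in mortality: a shorter survival simply means fewer periods in the profile, so quantity and quality of life are combined in one number.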

Many preference-based measures have been developed and evaluated since the late 1970s, and a number of different questionnaires are currently used in economic evaluation. To allow for a comprehensive summary of the use of preference-based measures in SCI research to date and to highlight recent developments regarding the alternative options available to researchers, this study focuses on six instruments: the 15D,12 the Assessment of Quality of Life (AQoL),13 the EQ-5D,14 the Health Utilities Index (HUI),15 the Quality of Well-Being Scale (QWB),16 and the SF-6D.17 This review focuses solely on generic preference-based HRQoL instruments, that is, questionnaires that are used in the conventional economic evaluation framework to reflect societal preferences.7 Studies that consider direct preference elicitation techniques (such as time trade-off or standard gamble), reflecting patient/individual preferences, are not within the scope of this review.

Specifying the instruments under consideration for any study involving preference-based HRQoL measures requires particular attention. Multiple formats (and translations) exist across the six instruments and many of the formats have undergone subsequent valuation studies to provide index scores for country-specific populations. Table 1 reports key properties relating to 10 instrument formats, providing a concise summary of the descriptive systems and the valuation studies.

Table 1 Key properties of 10 formats of preference-based HRQoL instruments^a

Materials and methods

Search strategy

A review of the peer-reviewed literature from January 1995 to May 2011 was conducted. Electronic database sources were: Medline, PsycINFO, Excerpta Medica Database (EMBASE), Cumulative Index to Nursing and Allied Health Literature (CINAHL), Hispanic American Periodicals Index (HaPI), EconLit, Cochrane Central Register of Controlled Trials (CENTRAL), Cochrane Methodology Register (CMR), NHS Economic Evaluation Database (NHSEED), the Health Technology Assessment database (HTA) and the database held by the Patient-reported Health Instruments (PHI) group at Oxford University (http://phi.uhce.ox.ac.uk/). The search strategy comprised a combination of SCI-specific search terms and the names and abbreviations of the six instrument ‘families’ described above. Additional search terms relating to QALYs, quality-adjusted life expectancy and utility were also included to account for papers where the preference-based measure was not identifiable from the abstract, keywords or database indexing. The search strategy is provided in Appendix A.
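The general shape of such a strategy (population terms combined with OR, instrument and outcome terms combined with OR, and the two blocks joined with AND) can be sketched as below. The terms shown are a hypothetical subset for illustration, not the strategy reported in Appendix A, and each database has its own syntax for field tags, truncation and subject headings.

```python
def or_block(terms: list[str]) -> str:
    """Join terms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(population: list[str], instruments: list[str],
                general: list[str]) -> str:
    """Population block AND (instrument names OR general outcome terms)."""
    return or_block(population) + " AND " + or_block(instruments + general)


# Hypothetical subset of terms, for illustration only.
query = build_query(
    population=["spinal cord injury", "tetraplegia", "paraplegia"],
    instruments=["EQ-5D", "SF-6D", "15D"],
    general=["QALY", "utility"],
)
print(query)
```

Placing the general terms (QALY, utility) in the same OR block as the instrument names is what lets the search catch papers whose abstract does not name the specific preference-based measure.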

To complement the database search, a bibliographic search of the publication histories of lead authors of included papers was also conducted (Medline only) to identify potentially suitable papers.

Inclusion criteria

Inclusion criteria were applied in two stages. First, an identified article was deemed potentially eligible for inclusion if its abstract (including keywords) referred to one of the aforementioned preference-based HRQoL instruments or, more generally, to QALYs, quality-adjusted life expectancies or utility measures. Further requirements that papers be written in English and published in peer-reviewed journals were also applied at this first stage. After definite exclusions had been identified, full-text versions of the remaining articles were obtained. Given the incorporation of broad search terms (QALY, utility and so on), the second stage of the inclusion process was to verify that at least one of the target instruments was present in the retrieved studies with specific regard to a traumatic or non-traumatic SCI population. For studies whose primary focus was a broader patient population (trauma, injury, disability and so on), the inclusion criteria for this review required explicit reference to an SCI group in relation to the preference-based measure. The same inclusion criteria were applied to articles identified from the database and bibliographic searches.
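The two-stage logic can be expressed as a pair of filters applied in sequence. This is a sketch under assumptions: the `Record` fields are hypothetical stand-ins for judgements that the reviewers made manually, and the term sets simply echo the instrument families and general terms named in the text.

```python
from dataclasses import dataclass, field

TARGET_INSTRUMENTS = {"15D", "AQoL", "EQ-5D", "HUI", "QWB", "SF-6D"}
GENERAL_TERMS = {"QALY", "quality-adjusted life expectancy", "utility"}


@dataclass
class Record:
    # Hypothetical fields standing in for manual reviewer judgements.
    abstract_terms: set[str]  # instrument names or general terms found in abstract/keywords
    language: str
    peer_reviewed: bool
    # Target instruments reported for an explicit SCI group in the full text.
    sci_instruments: set[str] = field(default_factory=set)


def passes_stage_one(r: Record) -> bool:
    """Abstract screen: any instrument or general term, English, peer-reviewed."""
    mentions = r.abstract_terms & (TARGET_INSTRUMENTS | GENERAL_TERMS)
    return bool(mentions) and r.language == "English" and r.peer_reviewed


def passes_stage_two(r: Record) -> bool:
    """Full-text screen: at least one target instrument used with an SCI group."""
    return bool(r.sci_instruments & TARGET_INSTRUMENTS)


def screen(records: list[Record]) -> list[Record]:
    stage_one = [r for r in records if passes_stage_one(r)]
    return [r for r in stage_one if passes_stage_two(r)]


# A broad "utility" hit passes stage one but fails stage two when no
# target instrument is reported for an SCI group in the full text.
broad_hit = Record({"utility"}, "English", True)
sci_study = Record({"EQ-5D"}, "English", True, {"EQ-5D"})
print(len(screen([broad_hit, sci_study])))  # 1
```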

Both stages of the study selection process were performed, independently, by two of the authors (DGTW and SB); a third author (VKN) provided assistance with regard to clinical queries. Disagreements between reviewers were resolved through discussion. Reasons for exclusion were documented at each stage. The bibliographic search was conducted by the lead author only.

Analytic considerations

Specific analytic considerations consisted of establishing how preference-based instruments have been used in research studies, tallying the frequency of use for different instruments and identifying the extent to which mean index scores have been reported for defined SCI (sub)groups. A further goal was to collate and appraise evidence for measurement properties from psychometric evaluations identified through the systematic search.

The identification of mean utility values in patient (sub)groups is particularly important when the use of preference-based measures in primary research has been limited. Secondary sources of data are frequently used in decision analytic models to provide parameter estimates and, therefore, published utility estimates provide a valuable resource for future modelling exercises. Pooling reported utility scores can also provide insight into the direct comparability of different preference-based HRQoL measures within defined SCI populations. By considering reported mean index scores alongside evidence of measurement properties within a single review, readers are better placed to judge the validity of published utility values.

A final explicit analytic consideration of this multi-component review was to explore how the appropriateness of alternative preference-based measures for SCI research has been discussed, for example, the justifications given for selecting particular instruments or the range of measures considered in a review article. Although subjective in nature, this analytic focus arose from the expectation that few studies had explored the measurement properties of preference-based instruments in an SCI context; it therefore attempts to identify the nature of opinions in the literature that are not evidence-based.

Results

A total of 420 unique abstracts were identified from the database search strategy, with 19 papers meeting the inclusion criteria. The bibliographic search resulted in an additional 3 papers being included in the review (22 in total). A breakdown of the search strategy retrievals, by database, and the reasons for exclusions at each stage are reported in Appendix Tables B1 and B2, respectively.

An overview of the field: usage, frequency and index scores

Table 2 reports brief study details for the 22 papers, identifying which preference-based measures feature in each manuscript and the role that the respective measures had in the analysis. Four review papers were identified: one systematic review of published measurement properties for quality of life instruments in individuals with SCI11 and three semi-analytic commentaries.40, 41, 42 The QWB-SA and the SF-6D are the only preference-based HRQoL instruments to have been subject to psychometric evaluation within an SCI context (discussed further in the following section), although the EQ-5D was the measure that featured most frequently in the 18 non-review papers (eight times). Neither the recently developed EQ-5D-5L nor the AQoL instruments have been considered in any empirical study or commentary.

Table 2 Details of the (a) 18 non-review studies^a and (b) four review studies^a

Only two studies used preference-based measures within a decision-making context as part of an economic evaluation. However, neither of these two decision model evaluations had patient-level data to calculate utility estimates, instead using subjective judgement from expert panels24 and researchers30 to provide representative responses for ‘typical’ patients.

Eleven (61%) of the 18 non-review studies report at least one (sub)group mean utility score relating specifically to individuals with SCI.22, 23, 31, 32, 33, 34, 35, 36, 37, 38, 39 In total, 55 mean scores were retrieved, with the SF-6D being the dominant contributor (21 (sub)group means, all reported in Lee et al.34), while only one mean utility score is reported for each of the HUI-2 and HUI-3.38 The 55 mean scores covered 49 different (sub)groups of patients; for one subgroup, utility estimates are reported for the EQ-5D, HUI-2 and HUI-3,38 while SF-6D scores calculated from both the SF-36 and SF-12 are reported for four distinct subgroups.34 The group mean data are deliberately not reported in this manuscript because of the paucity of empirical validity evidence and the importance of study-specific details, that is, patient samples and injury characteristics.
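The relationship between the 55 mean scores and the 49 (sub)groups can be checked with simple arithmetic, assuming (as the text implies) that the two overlaps described are the only (sub)groups with more than one reported mean. The same snippet confirms the rounding of 11 of 18 studies to 61%.

```python
subgroups = 49

# One subgroup has means for three instruments (EQ-5D, HUI-2, HUI-3): 2 extra means.
extra_three_instruments = 3 - 1
# Four subgroups have SF-6D means from both the SF-36 and SF-12: 1 extra mean each.
extra_sf6d_versions = 4 * (2 - 1)

total_means = subgroups + extra_three_instruments + extra_sf6d_versions
share_reporting = round(100 * 11 / 18)  # percentage of non-review studies with SCI means

print(total_means, share_reporting)  # 55 61
```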

Differences in study characteristics make it difficult to draw definitive conclusions about the comparability of instruments in SCI populations from the current literature. In the single case in which the same individuals completed multiple instruments, index scores varied across the EQ-5D (0.63), HUI-2 (0.81) and HUI-3 (0.68).38 Indirect comparisons also suggest variation between QWB-SA and SF-6D (SF-36) index scores: reported values for different subgroups of tetraplegic patients were 0.53 (QWB-SA)22 and 0.68 (SF-6D),34 while paraplegic patients reported mean scores of 0.56 (QWB-SA)22 and 0.73 (SF-6D).34

Evidence of measurement properties from psychometric evaluations

Table 3 collates the evidence for measurement properties of the QWB-SA and SF-6D (SF-36) in SCI populations, identified from the studies of Andresen et al.22 and Lee et al.34 Polinder et al.38 explored convergent and construct validity in a comparative evaluation of the EQ-5D, HUI-2 and HUI-3. However, the study sample consisted of general injury patients across all levels of severity and, therefore, the findings were not specific to SCI.

Table 3 Measurement properties specific to preference-based HRQoL instruments in individuals living with SCI^a

Methods of analysis within the two psychometric evaluations primarily focused on issues of acceptability and feasibility, floor and ceiling effects, and discriminative validity (that is, the ability of an instrument to distinguish between subgroups that are expected to differ with regard to their HRQoL). In addition, responsiveness was assessed for the SF-12 and SF-36 versions of the SF-6D, with both versions showing evidence of responsiveness in individuals with SCI who had developed urinary tract infections. A minimal important difference of 0.03 was reported for the SF-6D (SF-36 version) for respondents who reported being somewhat worse or somewhat better compared with one year earlier.34 Both papers conclude that the respective instruments (QWB-SA and SF-6D) are appropriate for research within SCI populations, while highlighting the need for further investigation.
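As an illustration of how a minimal important difference (MID) might be applied, the sketch below classifies a change in mean SF-6D (SF-36) scores against the reported 0.03 threshold. The before/after values are hypothetical, and treating a change of at least one MID in either direction as important is a simplifying assumption; in practice MIDs are usually interpreted for group-level changes and alongside the uncertainty around them.

```python
# MID reported by Lee et al. for the SF-36 version of the SF-6D.
SF6D_SF36_MID = 0.03


def classify_change(before: float, after: float, mid: float = SF6D_SF36_MID) -> str:
    """Label a change in mean index score relative to the MID."""
    change = after - before
    if change >= mid:
        return "important improvement"
    if change <= -mid:
        return "important deterioration"
    return "change smaller than the MID"


# Hypothetical group mean scores before and after an intervention.
print(classify_change(0.65, 0.70))
print(classify_change(0.70, 0.69))
print(classify_change(0.70, 0.66))
```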

The QWB-SA and SF-6D validity studies can only be considered partial evaluations, and neither provides convincing evidence. A priori defined hypotheses were confirmed for 50% (2 of 4) and 43% (3 of 7) of constructs for the QWB-SA and SF-6D, respectively, which falls short of the quality criteria proposed for the assessment of measurement properties.43 In addition, because data were collected during interviews in both studies, no information exists on instrument-completion rates (that is, the proportion of respondents that provide sufficient information to enable calculation of an index score) or item-completion rates (to identify whether response patterns are consistent across items within an instrument). These fundamental measurement properties are important considerations given the regular use of postal questionnaires in health services research.
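The two checks discussed above can be made explicit. The first part compares the proportion of confirmed hypotheses with a cut-off; the 75% figure used here is an assumption about the cited quality criteria, not a value stated in this review. The second part computes instrument- and item-completion rates from hypothetical questionnaire returns, with `None` marking a missing answer and assuming every item must be answered for an index score to be calculated (some instruments permit limited imputation).

```python
ASSUMED_CUTOFF = 0.75  # assumed threshold for a positive construct-validity rating

# (hypotheses confirmed, hypotheses tested), as reported in this review.
confirmed = {"QWB-SA": (2, 4), "SF-6D": (3, 7)}
for instrument, (hits, tested) in confirmed.items():
    share = hits / tested
    verdict = "meets" if share >= ASSUMED_CUTOFF else "falls short of"
    print(f"{instrument}: {share:.0%} {verdict} the assumed cut-off")


def instrument_completion_rate(responses: list[list]) -> float:
    """Share of respondents who answered every item (so an index can be scored)."""
    complete = sum(all(answer is not None for answer in row) for row in responses)
    return complete / len(responses)


def item_completion_rates(responses: list[list]) -> list[float]:
    """Share of respondents answering each item, in item order."""
    n_items = len(responses[0])
    return [sum(row[i] is not None for row in responses) / len(responses)
            for i in range(n_items)]


# Hypothetical returns from four postal questionnaires with three items each.
returns = [[1, 2, 1], [1, None, 2], [3, 3, 3], [None, None, 1]]
print(instrument_completion_rate(returns))  # 0.5
print(item_completion_rates(returns))       # [0.75, 0.5, 1.0]
```

Interview-administered data, as in both studies, would typically show near-complete returns by design, which is why these rates cannot be estimated from them.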

General consideration of ‘appropriateness’ for SCI populations

The database and bibliographic searches identified three review papers that address the general concept of quality of life measurement. Two papers reviewed measures for quality of life assessment specific to SCI,40, 42 while Stadhouder et al.41 reviewed outcome measures for the broader concept of spinal trauma. Within the spinal trauma review paper, published in 2010, the EQ-5D and HUI were the only instruments identified through a literature search. The study reported that no validity evidence for spinal trauma populations exists for either instrument, although the authors state that the EQ-5D is being used increasingly as an outcome measure in spine research.

Both SCI-specific reviews considered published evidence for quality of life instruments in addition to subjective expert opinion (from clinical and/or quality of life experts, as opposed to experts in preference-based HRQoL). In their 2002 review, Wood-Dauphinée et al. discuss only the QWB, citing the psychometric evaluation discussed earlier (Andresen et al.22) in recommending the instrument as an appropriate generic measure of HRQoL.42 Similarly, Dijkers40 highlights only two SCI studies that report using preference-based measures (the QWB in both instances22, 36), although the EQ-5D, HUI-3 and SF-6D are named as available alternatives.

Amending existing questionnaires in order for them to be more acceptable to SCI populations was discussed with regard to the EQ-5D and SF-6D. Despite concluding that the SF-6D is a reliable measure for persons with SCI, Lee et al.32 stated that the exclusion of questions that ask about walking or climbing stairs may make the instrument more acceptable.34 In three of the retrieved studies, which focused on injury patients across all levels of severity, a cognitive dimension was added to the EQ-5D (sometimes referred to as the EQ-6D or EQ-5D+C) because of the perceived inability of the EQ-5D to capture important health-status consequences regarding cognitive function due to, in particular, dementia and mental retardation.35, 37, 38 Responses to this additional dimension provided a descriptive outcome for cognitive impairment only and did not contribute to the derivation of utility scores. In a fourth EQ-5D study,23 researchers changed the wording of the mobility dimension response options, so that references to walking ability were replaced with the ability to move with a wheelchair. No further details are provided with regard to this amendment, for example, validation studies or permission from the EuroQol Group. This suggests that the reported index scores are not valid estimates of preference-based HRQoL and, therefore, their derivation should not be attributed to the EQ-5D instrument.

Discussion

Key findings

This systematic review provides the first comprehensive report documenting the adoption and assessment of generic preference-based HRQoL instruments in the context of SCI. Despite a previous study concluding that there are numerous promising quality of life instruments for SCI research,11 the same cannot be said for the preference-based subset. Without any restrictions on study design (that is, primary studies and review studies were eligible) and with a broad search strategy (comprising terms for SCI, six ‘families’ of preference-based instruments and general outcome terminology common in the area of economic evaluation), only 22 papers were identified.

Two studies used preference-based measures within a decision-making context, in both cases as inputs to decision-analytic models for economic evaluation. However, both models relied on subjective judgement to provide representative responses for ‘typical’ patients.24, 30 Although other economic evaluations exist in the SCI literature (which have looked at SCI-specific measures of outcome in a cost-effectiveness framework44 or used direct elicitation techniques to estimate patient-specific utilities45), an important finding from this review is that no studies have used preference-based measures in their conventional form, that is, to calculate QALYs using patient-level data in a cost-utility analysis.

Recently, an economic evaluation of electrical stimulation therapy for pressure ulcers in SCI was published, with the incremental cost-effectiveness ratio (the primary outcome measure) representing the incremental cost per wound healed. In discussing the study limitations, the authors state that they ‘did not consider QALY as an outcome because there is a paucity in the literature of health preference values’.44 While such evidence is valuable, this study highlights how SCI lags behind other fields with regard to the state of knowledge needed in modern health-care provision, where cost-utility analysis is becoming a staple component of technology appraisals.8
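For readers less familiar with the outcome measure mentioned above, an incremental cost-effectiveness ratio (ICER) divides the difference in costs between two options by the difference in their effects. The unit of effect determines what the ratio means: wounds healed in the cited study, or QALYs in a cost-utility analysis. The figures below are hypothetical.

```python
def icer(cost_new: float, cost_old: float,
         effect_new: float, effect_old: float) -> float:
    """Incremental cost per additional unit of effect (new option vs comparator)."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return (cost_new - cost_old) / delta_effect


# Hypothetical mean costs and QALYs per patient.
cost_per_qaly = icer(cost_new=12_000, cost_old=10_000, effect_new=2.0, effect_old=1.5)
print(cost_per_qaly)  # 4000.0
```

A negative ratio is ambiguous on its own (the new option may be cheaper and more effective, or costlier and less effective), so incremental costs and effects are normally reported alongside the ratio. Only when effects are expressed in QALYs can the ratio be compared across clinical specialties.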

A second key finding concerns the lack of psychometric evaluation. For example, there is no published evidence regarding the reliability of any preference-based HRQoL measure in SCI populations. Currently, evidence for the empirical validity of utility scores for individuals living with SCI is only available for the QWB-SA and SF-6D; no evidence exists for the EQ-5D or HUI, which are widely used in all clinical areas across Europe and North America.

A third observation was an apparent lack of awareness of the range of measures available. For example, neither the AQoL instruments nor the 15D were mentioned in the identified review/commentary papers. This is particularly surprising for the 15D, given that two studies assessing quality of life in individuals with SCI (using the 15D) were published at least 4 years before the first review paper. A commonality between the AQoL-6D and 15D instruments is the incorporation of items relating to sexual relationships (the AQoL-4D also refers to general relationships with friends, partners or parents). This content is relevant because sexual dysfunction is a major secondary complication associated with SCI.46 Given the desire for comparable methods in economic evaluation and the unavoidable level of pragmatism needed in applied health research, it is not surprising that research focuses on the most common preference-based HRQoL measures (that is, the EQ-5D, SF-6D or HUI).9, 10, 47 However, to act in the best interests of individuals with SCI (by using instruments that enable people to adequately describe their health state) and to be objective in the consideration of alternative measures, it seems unjustified to disregard instruments that may have face validity and content validity advantages.

Areas for future research

Comparative evaluation of alternative measures within the same study sample is commonly highlighted as a necessary area of research within particular clinical conditions. However, for SCI populations this may be premature. Qualitative research that explores the suitability of the descriptive systems of alternative instruments would provide an appropriate starting point, allowing evidence-based judgement as to which measures should be considered for further empirical, quantitative evaluation.

In addition to the concerns regarding the appropriate choice of HRQoL measure for economic evaluation, criticisms of the health-related focus of the current QALY framework have centred on its failure to capture important sources of value to patients and to society in areas such as mental health, long-term disability, social and informal care, and public health.48 To capture the full benefits of many health- and social-care interventions, broader measures of wellbeing that go beyond a narrow health focus may be more suitable; SCI provides an opportunity to explore a clinical area where broader, non-health attributes may be better suited to capturing the benefits of health care that are valued by patients.49

Study limitations

A limitation of this review was the potential for relevant articles to be missed by the database and bibliographic searches; 3 of 22 (14%) papers were not identified through the database search. Systematic reviews of quality of life outcome measures can be problematic owing to the absence of formal indexing standards and the subjective reporting styles of authors. Within SCI research, it has previously been stated that authors need to improve the quality of abstracts to make retrieval and screening of relevant papers more effective and efficient.50 A comprehensive search strategy, broad search terms and a two-stage inclusion process were used to address these potential problems and to reflect the broad nature of the research question.

Concluding comments

There is a substantial lack of evidence regarding the appropriateness of preference-based HRQoL instruments for SCI populations. Although research in many clinical areas has progressed from psychometric assessments of individual instruments to evaluations of the direct comparability of alternative measures, there is a dearth of empirical data to support the use of any preference-based instrument for SCI research.

Numerous challenges exist with regard to quantifying the quality of life of individuals with SCI.40 However, the objective of the present study was not to assess the pros and cons of alternative preference-based HRQoL measures, or the pitfalls of preference-based measurement per se, in the context of SCI. National bodies advocate the use of preference-based HRQoL measures for resource allocation decision making; the question is therefore not whether they should be used, but how we move forward to meet the urgent need for appropriate measures of patient benefit suitable for economic evaluation.