Introduction

The use of genetic tests in Europe is expanding at a rapid pace.1, 2 Even though technical improvements are leading to decreasing laboratory costs per tested mutation, overall expenditure on genetic tests is likely to increase, also as a result of the challenges of interpreting DNA sequences. On the basis of theoretical considerations, a recent study demonstrated that, even today, funding is insufficient to provide all potentially beneficial tests to the public.3 Faced with a large number of novel genetic tests, service providers are therefore likely to face limited budgets that do not allow them to offer all valuable tests to their patients.

Therefore, decisions may have to be made on how to allocate resources between competing tests. This is usually referred to as ‘prioritisation’, ie placing genetic tests into a rank order to choose those that are most important to provide.4 Up to now, there has been no practical guidance on how to prioritise genetic tests. As a result, decisions are likely to be left to the providers of genetic tests, resulting in heterogeneity within and across health-care systems.5 In a recent study, Canadian decision makers reported that, in the absence of transparent and principle-based methodologies, decisions are often made on an ad hoc basis.6 Ad hoc decision making carries the risk of an unfair distribution of benefits and disadvantages.7

To prepare guidance for fair and more harmonised prioritisation, it is desirable to obtain evidence of the actual value judgements of key stakeholder groups (such as policy makers, funders, scientists, clinicians, industry and patients). Indeed, the inclusion of multiple perspectives may be considered the most important element in fair priority setting.8 The aim of this study is to contribute to a more structured approach in eliciting and comparing preferences across key stakeholders in order to support the development of an explicit approach for prioritising genetic tests.

Establishing the ‘value’ of health-care interventions is a complex task. In the case of genetic testing, multiple benefits need to be accounted for.9 Clinical studies may conclude that genetic tests offer no value if testing does not change the course of treatment or improve patient outcomes. However, patients, doctors and other stakeholders may place great value on genetic testing for a variety of non-medical reasons (especially the value of knowledge for making decisions about life planning rather than for guiding clinical interventions10, 11). Indeed, it has been argued that only by considering health and non-health criteria simultaneously can the full value of genetic tests be captured.12

Different approaches are available for analysing multiple criteria to guide health-care decision making. In the last decade, discrete-choice experiments (DCE) have become widespread practice for analysing respondents’ perspectives on resource allocation decisions, for example to determine the public’s distribution preferences within the UK NHS13 or to consider the implementation of lung health programmes in Nepal14 (see De Bekker-Grob et al15 for an overview). However, until now, no DCEs have been conducted to support resource allocation decisions for genetic testing. Best-worst scaling (BWS) is an alternative preference eliciting technique that is assumed to have a number of advantages over the DCE. In particular, it may impose less cognitive burden on respondents, as it presents profiles one by one rather than two (or more) at the same time. However, BWS is a relatively new method, and there is a lack of evidence demonstrating its superiority.

In this paper, we are interested in testing the feasibility of both the DCE and the BWS approach in the context of decision making for priority setting in genetic testing. The study was undertaken with the objective of informing the design of a larger survey on priority setting criteria. It is the first stage in a project designed to establish priority scores for different genetic testing options. Ultimately, the project seeks to integrate priority scores into the Clinical Utility Gene Cards16 in order to assist in appraising the tests’ relative importance under resource scarcity.

Materials and methods

Priority setting criteria and levels

DCE and BWS experiments are both preference eliciting methods that analyse the relative importance of several attributes and their corresponding levels. We developed attributes for inclusion in the experiments from a qualitative review of the literature on the criteria proposed for priority setting relevant to genetic testing. We hypothesised that both the characteristics of the individuals undergoing the test and the attributes of the test itself influence the order of priority.

As there is growing awareness that incorporating patients’ (or patient representatives’) opinions in priority setting decisions can help to ensure the fairness and legitimacy of the decision outcomes,17 we subsequently conducted a series of qualitative interviews with patient representatives from different countries in Europe. The overall aim of this was to: (1) assess whether characteristics identified in the literature review were confirmed as being important from the patients’ point of view; and (2) establish whether patients think that there are other essential factors not mentioned in the literature. On the basis of the literature research and the results of qualitative interviews, a preliminary list of attributes was established. Next, the criteria were checked for conceptual overlap as this would hinder accurate analysis.18

Table 1 presents the levels that were assigned to each attribute. These levels were selected to represent the available testing options. To be confident that the levels reflected clinical scenarios and were therefore meaningful for the respondents, members of the research team identified a suitable clinical situation for each level from the published literature.

Table 1 Explanation of prioritisation criteria and levels in the DCE

Design of the discrete-choice experiment

In the DCE, the respondents were asked to state their choice between two hypothetical alternatives. On the basis of five criteria with two levels and one criterion with three levels, 96 (2^5 × 3^1) different scenarios could be defined (full factorial design), resulting in over 4500 combinations of pairwise choices. Clearly, evaluating all of these would be an impossible task for any respondent. Therefore, we reduced the number of choice sets using a D-efficient design generated with SAS software (SAS Institute Inc., Cary, NC, USA, Version 9.2).15 Following this approach, we created 12 choice tasks similar to that shown in Figure 1 (resulting in a D-efficiency of 90.57).
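To illustrate the combinatorics, the following minimal Python sketch enumerates the full factorial design and counts the possible pairwise choice sets; the attribute names and levels are paraphrased from the description of Table 1 and are assumptions for illustration only:

```python
from itertools import combinations, product

# Hypothetical reconstruction of the six attributes: five with two levels
# and one (prevalence) with three levels. Names are illustrative.
attributes = {
    "purpose":        ["diagnostic", "predictive"],
    "severity":       ["high", "low"],
    "prevalence":     ["high", "medium", "low"],
    "treatment":      ["treatment/prevention available", "personal knowledge only"],
    "alternatives":   ["no alternative to genetic testing", "alternative available"],
    "infrastructure": ["established", "not established"],
}

# Full factorial design: every combination of levels
profiles = list(product(*attributes.values()))
print(len(profiles))   # 96 = 2^5 * 3^1 scenarios

# Every unordered pair of distinct profiles is a potential pairwise choice task
pairs = list(combinations(profiles, 2))
print(len(pairs))      # 4560, ie 'over 4500' combinations
```

A D-efficient design selects a small subset of these pairs (here, 12) that still allows all coefficients to be estimated with acceptable precision.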

Figure 1: Example DCE question.

Each choice task involved a forced choice between the two genetic testing options presented. The forced-choice design was chosen because neither opting out nor keeping the status quo appeared to be a realistic decision option in clinical practice. We decided in favour of a generic presentation of the scenarios, using the labels ‘testing option A/B’, to ensure that participants based their decisions on the attribute levels rather than on the disease name.19 Findings from earlier DCEs support a generic presentation of genetic testing options.20, 21

Additionally, a dominant choice task, in which one testing option was assumed to be superior to the other in all attributes, was included in the questionnaire to test participants’ understanding of the DCE format. We assumed that participants would prefer diagnostic testing for highly prevalent, treatable and severe conditions where no alternative to genetic testing exists and the infrastructure for delivering the testing is widely established.

Design of the BWS experiment

The BWS experiment used the same attributes as the DCE, but the nature of the choice task was different. Rather than deciding which of two profiles they preferred, respondents were presented with a single profile and had to choose the ‘most important’ item (from which they derive the highest utility) as well as the ‘least important’ item (from which they derive the lowest utility), based on the levels displayed in the given situation. For the BWS experiment, the full factorial design resulting from all possible combinations of attribute levels was reduced using an orthogonal array, resulting in 24 profiles. As we assumed that 24 choice tasks would still be too many for one respondent to evaluate, we blocked the design into two versions (using the SAS Block procedure). Figure 2 shows an example of a BWS exercise.

Figure 2: Example BWS question.

BWS has a potential for bias when so-called ‘easy choices’ are presented, ie profiles in which one attribute is at the top (bottom) level whereas all other attributes are at the bottom (top) level. When such profiles appear, this is likely to result in greater choice consistency (variance heterogeneity).22 Therefore, we tried all possible coding schemes and chose the one that minimised the problem arising from easy choices.
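The coding-scheme search is not specified beyond this description; as a rough sketch of how such a screen might work, the Python functions below flag easy-choice profiles in a candidate design, assuming each level has been assigned a rank within its attribute (0 = bottom level):

```python
def is_easy_choice(profile, ranks):
    """Flag profiles in which exactly one attribute sits at its top level
    while all others sit at their bottom level, or the reverse.
    profile: {attribute: level}; ranks: {attribute: {level: rank}},
    with rank 0 as the bottom level (an assumed encoding)."""
    n = len(profile)
    top = sum(ranks[a][lvl] == max(ranks[a].values()) for a, lvl in profile.items())
    bottom = sum(ranks[a][lvl] == 0 for a, lvl in profile.items())
    return (top == 1 and bottom == n - 1) or (bottom == 1 and top == n - 1)

def count_easy_choices(design, ranks):
    # Compare candidate coding schemes: the fewer easy choices, the better.
    return sum(is_easy_choice(p, ranks) for p in design)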

Administering the questionnaire

Both experiments were conducted alongside the European Human Genetics Conferences in Amsterdam, the Netherlands (28–31 May 2011) and Nürnberg, Germany (23–26 June 2012). Attendees were approached during breaks and asked to participate in a face-to-face interview. A single interviewer (FS) administered the questionnaire in order to standardise and maintain the quality of the interviews and data collection.

Respondents received either the DCE or one of the BWS questionnaires. The questionnaires were presented in three sections. The first section collected socio-demographic information, including gender, profession and nationality. The second section consisted of the DCE or the BWS experiment. The attributes and levels relating to the choices were described to each participant. While answering the choice tasks, participants were asked to imagine that they were clinical geneticists faced with the difficult choice of how to allocate their budget. The third section contained a series of follow-up questions regarding participants’ understanding of the choice format and potential improvements in the survey instrument. During the interviews, participants were encouraged to discuss any problems arising.

Data analysis

Discrete-choice experiment

The choices observed in a DCE are assumed to reveal the true latent but unobservable utility (ie a measure of individual value or benefit). As shown in Equation (1), the utility U_{ijt} that individual i derives from each of j choice alternatives in choice question t can be decomposed into a deterministic part V_{jt} and a random error term:

U_{ijt} = V_{jt} + \varepsilon_{ijt} = \sum_{k} \beta_{ik} X_{kjt} + \varepsilon_{ijt}   (1)

where \beta_{ik} represents the marginal utility of an additional unit of attribute k, X_{kjt} is the value of attribute k shown in alternative j of question t, and \varepsilon denotes the random error term. Assuming that \varepsilon is independently and identically distributed following a type 1 extreme value distribution, the conditional logit regression model can be applied. The probability of observing a particular sequence of choices C_i over the choice questions t, each with j alternatives, is then given by:

P(C_i) = \prod_{t} \frac{\exp(V_{ijt})}{\sum_{j'} \exp(V_{ij't})}   (2)

Specifically, the econometric model estimated in our study has the following form (one effect-coded term per attribute level, with the three-level prevalence attribute contributing two terms):

V_{jt} = \beta_1 Diagnostic_{jt} + \beta_2 SeverityHigh_{jt} + \beta_3 PrevalenceHigh_{jt} + \beta_4 PrevalenceMedium_{jt} + \beta_5 TreatmentAvailable_{jt} + \beta_6 NoAlternative_{jt} + \beta_7 InfrastructureEstablished_{jt}   (3)
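As a minimal numerical sketch of Equation (2), the Python code below computes conditional logit choice probabilities and the log-likelihood of an observed choice sequence; the data layout and names are illustrative assumptions, not the study's estimation code:

```python
import numpy as np

def choice_probabilities(V):
    """Conditional logit probabilities for one choice task (Equation 2).
    V: deterministic utilities, one entry per alternative."""
    expV = np.exp(V - V.max())          # subtract max for numerical stability
    return expV / expV.sum()

def log_likelihood(beta, X, chosen):
    """Log-likelihood of a respondent's observed choices.
    X: (T, J, K) effect-coded attributes for T tasks, J alternatives and
    K coefficients; chosen: index of the picked alternative per task."""
    ll = 0.0
    for t in range(X.shape[0]):
        p = choice_probabilities(X[t] @ beta)
        ll += np.log(p[chosen[t]])
    return ll
```

Maximising this log-likelihood over the β vector is, in essence, how conditional logit coefficients such as those in Table 3 are obtained.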

A main effect model was used, which assumes that all attributes have an independent effect on the choices (ie all interactions between attributes were zero), as including attribute interactions would increase the size requirements of the experimental design. The exclusion of interaction terms may be justified, as main effects typically account for 70–90% of the explained variance in DCEs.23

Effect coding was used to transform each attribute’s L levels into L−1 effect-coded variables. Given this coding, we expected all β coefficients to have positive signs, indicating an increase in utility.
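For concreteness, a small sketch of such a coding (level names and the choice of reference level are assumed for illustration): the three-level prevalence attribute becomes two coded variables, with the omitted reference level coded −1 on both so that the level effects sum to zero.

```python
import numpy as np

# Effect coding: L = 3 levels -> L - 1 = 2 coded variables; the reference
# level ('low', an assumed choice) is coded -1 on both columns.
PREVALENCE_CODES = {
    "high":   np.array([1, 0]),     # columns: (prevalence_high, prevalence_medium)
    "medium": np.array([0, 1]),
    "low":    np.array([-1, -1]),   # reference level
}

# Two-level attributes each reduce to a single +1/-1 column, eg severity:
SEVERITY_CODE = {"high": 1, "low": -1}
```

Unlike dummy coding, the reference level's part-worth is then the negative sum of the estimated coefficients for that attribute rather than zero, which places all levels on a symmetric scale.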

BWS

Although the choice task differed in the BWS experiment—respondent i always chooses the most and the least important item from a given set of items t—this task can still be modelled in the discrete-choice framework outlined above:22 a utility function can be specified for each possible most/least important pair, measuring the difference in utility between the item chosen as most important and the item chosen as least important. Thus, with six attribute levels displayed per profile, respondents could choose between 30 most/least important pairs in each choice profile. Following Potoglou et al,24 the utility of choosing, for example, ‘high severity’ as most important and ‘personal knowledge’ as least important within a given choice question t can be specified by:

U_{(SeverityHigh, PersonalKnowledge),t} = \beta_{SeverityHigh} − \beta_{PersonalKnowledge} + \varepsilon_{t}
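A minimal sketch of this pair model under illustrative assumptions (six displayed levels, made-up utility names): each profile expands into 6 × 5 = 30 ordered best-worst pairs, and a pair's utility is the difference between the ‘best’ and ‘worst’ level utilities:

```python
import numpy as np
from itertools import permutations

def best_worst_pair_probabilities(levels_shown, beta):
    """Choice probabilities over all ordered (best, worst) pairs in one
    BWS profile. levels_shown: the six displayed attribute levels;
    beta: {level name: utility estimate} (names are illustrative)."""
    pairs = list(permutations(levels_shown, 2))      # 6 * 5 = 30 ordered pairs
    diffs = np.array([beta[b] - beta[w] for b, w in pairs])
    probs = np.exp(diffs) / np.exp(diffs).sum()      # conditional logit over pairs
    return dict(zip(pairs, probs))
```

The pair with the largest utility difference, eg ('severity high', 'personal knowledge'), receives the highest predicted probability.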

Whereas it is necessary to fix one level of each attribute in the DCE to avoid overspecification of the model, in BWS experiments this needs to be done for only one attribute level. In our model, the preferences for each attribute level were estimated relative to infrastructure ‘not established’. All estimates of the model can then be interpreted relative to this omitted level.

Results

Respondents

A sample of 31 respondents completed the DCE questionnaire and 26 (13 per version) completed the BWS questionnaires. Table 2 provides an overview of the sample’s socio-demographic characteristics. On average, it took participants about 25 min to complete the interview for both DCE and BWS questions.

Table 2 Sample characteristics

Model estimation

Table 3 presents the estimated coefficients of the conditional logit model using data from the DCE and BWS experiments, respectively.

Table 3 Results from the conditional logit models

In the DCE, one respondent failed to answer the ‘dominant choice task’ (in which one option dominated the alternative in all attributes) correctly. We tested the effect of excluding this ‘irrational responder’ and found that exclusion did not have major effects on the results. Therefore, data from this respondent were included in the analysis, as suggested by Lancsar and Louviere.25 All coefficients resulting from the DCE analysis have a positive sign and thus the expected direction. The staging of the attribute ‘prevalence of the condition’ is consistent across attribute levels (ie β_Prevalence_high > β_Prevalence_medium). Except for the coefficient for infrastructure ‘established’, all coefficients were significant (P<0.05). Coefficients in the DCE represent the part-worth utilities associated with changes in each attribute level relative to the reference level of that attribute. For example, a genetic test leading to a mortality/morbidity reduction through available treatment or prevention options yields a higher utility than a testing option conducted only for personal knowledge (everything else being equal).

In the BWS experiments, the preferences for each attribute level were estimated relative to infrastructure ‘not established’. As such, each coefficient can be interpreted as a utility on a common underlying preference scale.26 Coefficients with a positive sign indicate a higher level of utility compared with the reference level, and vice versa for negative coefficients. In contrast to the data derived from the DCE, the staging of the two lower attribute levels of ‘prevalence of the condition’ is not consistent (ie β_Prevalence_medium < β_Prevalence_low). However, these parameter estimates were not statistically significant. The staging of the other attribute levels was in line with a priori expectations. The level estimate for the availability of alternatives was also not statistically significant.

Comparison of the values derived from DCE and BWS

In order to compare estimates derived from the DCE and the BWS experiments, the differences in preferences within an attribute have to be examined. Therefore, we calculated the marginal value of moving from the lowest attribute level to a higher level, as outlined by Potoglou et al.24 As BWS and DCE have different underlying preference scales, the coefficients cannot be compared directly.27 However, their relative sizes can be assessed by dividing all attribute-level values by the value of one fixed attribute level, thus scaling all levels relative to it.24 For this division, we chose the ‘severity’ level ‘high’.
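A minimal sketch of this rescaling, with illustrative level names and made-up values:

```python
def rescale(coefs, anchor="severity_high"):
    """Divide every marginal value by a fixed anchor level so that DCE and
    BWS estimates, which sit on different underlying scales, become
    comparable (the anchor takes the value 1.0). Names are illustrative."""
    return {level: value / coefs[anchor] for level, value in coefs.items()}

# Hypothetical usage: apply the same anchoring to each experiment's estimates
dce_rescaled = rescale({"severity_high": 1.8, "prevalence_high": 2.2, "treatment": 2.5})
bws_rescaled = rescale({"severity_high": 1.1, "prevalence_high": 1.5, "treatment": 1.9})
```

Applied separately to the DCE and BWS estimates, the anchored values can then be compared side by side, as in Figure 3.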

Figure 3 presents an overview of the rescaled values derived from the DCE and BWS experiments. As can be seen, the attribute levels ‘availability of treatment options to reduce mortality and morbidity’ and ‘highly prevalent condition’ were valued most highly in both experiments, even though the utility weights differ. Although the middle and rear parts of the rank ordering of attribute levels differ in detail, the results suggest that, in both experiments, participants attached a higher utility to testing for severe conditions than to the non-availability of alternatives or to diagnostic testing.

Figure 3: Comparison of rescaled DCE and BWS coefficients.

Follow-up: participants’ cognitive ability

Table 4 presents data on respondents’ self-reported difficulty in understanding and answering the choice formats. The majority of DCE respondents reported that the choice tasks were not very difficult to understand; however, they did report that the DCE questions were difficult to answer. In contrast, more than half of the participants in the BWS exercise found the BWS tasks difficult or very difficult to understand, and nearly half found the questions very difficult to answer.

Table 4 Difficulty in understanding and answering DCE/BWS questions

On conclusion of the DCE/BWS exercise, we discussed the choice tasks with the respondents in order to learn about potential improvements in survey design. Issues covered in the follow-up questions included the attributes and levels as well as the choice exercise as a whole. Further information on the follow-up questions and the attribute selection is available in a Supplementary Appendix, obtainable from the corresponding author on request.

Discussion

To our knowledge, this is the first study testing the feasibility of the DCE and BWS formats for weighting different key considerations in priority setting for genetic testing. Given the exploratory nature of the study, the findings are not generalisable. Rather, the study exemplifies how quantitative preference eliciting methods can be used to inform decision makers about aggregated preferences.

The results derived from the DCE and BWS experiments indicate that the genetic experts surveyed in this study consider clinical utility (in terms of the availability of treatment and prevention options) the most decisive criterion for prioritising genetic tests. Thus, genetic tests that lack a treatment strategy and therefore merely provide information have an increased probability of being rejected. This finding corresponds with the observation that, generally, the effectiveness of a health technology and thus improved health outcomes are seen as key criteria in priority setting decision making.28 Most regulatory and reimbursement agencies assess the scientific evidence of a technology’s effectiveness when explicit coverage decisions are made.5

Independent of whether or not treatment is available, the participants exhibited a very strong preference for testing for highly prevalent conditions. This finding may indicate that participants accounted for the criterion of related health need when prioritising genetic tests or that they aimed at maximising health benefits from scarce resources.

Comparing DCE and BWS

Although we found similar preference patterns, as outlined above, the values derived from the DCE and BWS differed in detail. While this could reflect true differences in preferences between the DCE and BWS participants, it might also result from methodological effects. Flynn22 discusses how the different nature of the choice tasks might lead to disagreement between the parameter estimates. Although Potoglou et al24 found similar preference patterns for quality-of-life domains in BWS and DCE experiments, the degree of agreement between DCE and BWS findings remains unclear within the context of priority setting. Further research is needed to assess under which circumstances, and for what reasons, the results derived from the BWS approach differ from the values derived from the DCE.

Comparing the DCE and BWS approaches, BWS is cited as having a number of advantages because it presents respondents with profiles one by one rather than two at a time.29 In particular, BWS has the advantage that all coefficients can be estimated on a common scale, which allows comparison of utilities across all attribute levels (and not just of differences within each attribute).30 Moreover, although all preference eliciting methods are cognitively burdensome for participants, the one-profile approach is often assumed to be less cognitively demanding. However, within this study, we could not find any evidence supporting this assumption in the context of priority setting. Indeed, respondents found answering BWS questions even more burdensome than answering traditional DCE questions.

In ongoing research on priority setting in genetic testing, the advantages of the BWS approach must be traded off against the potential for bias resulting from the additional cognitive burden placed on respondents, as well as against the lack of published experience with this novel elicitation technique. For the project on prioritising genetic tests, it therefore appears more suitable to rely on results from the DCE approach, as it has been applied and discussed more widely in the scientific literature and seems less cognitively burdensome for respondents to answer.

Feasibility of the choice formats

The findings from this pilot study also provided insight into the feasibility and practicality of the BWS and DCE approaches. In general, the individuals we approached seemed engaged in the experiments, which might be viewed as an important indicator of survey acceptability. During this pilot study, we found broad support for the necessity of the research project as such and did not encounter fundamentally negative reactions (eg, participants opposing such research because they believed unlimited genetic testing should be made available).

Respondents participating in the DCE generally revealed fewer problems in answering and completing the questions than participants in the BWS. The difficulties with the BWS may be alleviated by modifying the introduction to its choice tasks. In the DCE, the staging and direction of the coefficients were consistent with a priori expectations and thus support the theoretical validity of the choice experiment.31 The BWS experiment, in contrast, revealed an inconsistent ordering of the two lower levels of ‘prevalence of the condition’, which might be due to bias resulting from cognitive overload.

Methodological limitations

The methodological designs used in this pilot study are not complex: only small factorial designs were used in both experiments, and the analysis was carried out using a main effect conditional logit model. This might be a methodological limitation, as the model assumes no correlation in unobserved factors across alternatives (independence of irrelevant alternatives, IIA). This assumption might not be realistic for genetic testing in every situation. However, the relatively simple approach helps to keep the design straightforward and makes the findings communicable to audiences with little experience of preference eliciting methodology. Further research should use more sophisticated methods for data analysis and compare the results with the approach used here.

The findings presented here are the combined results for all participants regardless of their nationality or profession. The small sample size limits our ability to conduct separate analyses for different subgroups. Further research based on a revised version of this survey among a larger number of geneticists and patient representatives is necessary to document the similarities and differences in value judgements across countries and professions. The questions of which stakeholders should be involved in the final decision, and whose opinions should be given more weight if priorities differ across groups, will then present a new challenge.

Further ethical considerations

Empirical data about a population’s value judgements can be considered a relevant input into decisions about prioritising health-care resources. However, without ethical reflection, such data can hardly be claimed to have sufficient normative power to determine decisions as important as those about providing or withholding health care. Therefore, such empirical evidence should be complemented by theoretical considerations about which criteria can be considered a reasonable basis for fair decision making.32

Despite the use of the best available evidence and thorough ethical reflection, reasonable people may still disagree about which criteria should have a role in decision making and about what their relative importance should be. It has therefore been claimed that decisions about limits in health care can only be legitimate if they also meet criteria of procedural fairness. To achieve this, Daniels and Sabin33 have proposed conditions of accountability for reasonableness, which include the recommendations that decisions and their rationales should be made transparent, that they should rest on reasons that fair-minded parties can agree are relevant to the decision, and that a mechanism of challenge and appeal should be available to the relevant stakeholders.

Such theoretical considerations as well as a criterion of procedural fairness should therefore complement the use of DCE or BWS experiments when developing guidance for prioritising genetic tests.

Conclusion

Scientific evidence about value judgements regarding different prioritisation criteria can provide important insights for a rational approach to priority setting. This exploratory study presents an example of how DCE and BWS experiments can be used to collect such evidence for priority setting in genetic testing.

The findings presented here show that the methods are feasible but that they exhibit particular strengths and limitations. This underlines the importance of pilot testing the DCE and BWS formats before their wider application, using both qualitative and quantitative methodology. Further research needs to trade off the methodological advantages of the BWS approach against the potentially biased results that may arise when using this new elicitation method.