Surveillance of cirrhosis for hepatocellular carcinoma: a cost–utility analysis

Using a decision-analytic model, we evaluated the effectiveness and cost-effectiveness of surveillance for hepatocellular carcinoma (HCC) in individuals with cirrhosis. Separate cohorts with cirrhosis due to alcoholic liver disease, hepatitis B and hepatitis C were simulated. Results were also combined to approximate a mixed aetiology population. Comparisons were made between a variety of surveillance algorithms using α-foetoprotein (AFP) assay and/or ultrasound at 6- and 12-monthly intervals. Parameter estimates were obtained from comprehensive literature reviews. Uncertainty was explored using one-way and probabilistic sensitivity analyses. In the mixed aetiology cohort, 6-monthly AFP+ultrasound was predicted to be the most effective strategy. The model estimates that, compared with no surveillance, this strategy may triple the number of people with operable tumours at diagnosis and almost halve the number of people who die from HCC. The cheapest strategy employed triage with annual AFP (incremental cost-effectiveness ratio (ICER): £20 700 per quality-adjusted life-year (QALY) gained). At a willingness-to-pay threshold of £30 000 per QALY the most cost-effective strategy used triage with 6-monthly AFP (ICER: £27 600 per QALY gained). The addition of ultrasound to this strategy increased the ICER to £60 100 per QALY gained. Surveillance appears most cost-effective in individuals with hepatitis B-related cirrhosis, potentially due to younger age at diagnosis of cirrhosis. Our results suggest that, in a UK NHS context, surveillance of individuals with cirrhosis for HCC should be considered effective and cost-effective. The economic efficiency of different surveillance strategies is predicted to vary markedly according to cirrhosis aetiology.

Hepatocellular carcinoma (HCC) occurs mainly in cases of cirrhosis, which, in turn, tend to be secondary to either alcoholic liver disease (ALD) or infection with hepatitis B or C viruses (HBV/HCV). Symptomatic HCC typically presents late with a bleak prognosis, whereas HCCs detected during formal surveillance are smaller, more likely to be uninodular and more commonly amenable to curative treatment (De Masi et al, 2005). For this reason, UK (Ryder, 2003), European (Bruix et al, 2001) and American (Bruix and Sherman, 2005) clinical guidelines recommend routine surveillance for HCC among individuals with cirrhosis, and approximately three-quarters of UK gastroenterologists undertake such a programme, mostly using a combination of periodic serum α-foetoprotein (AFP) testing and ultrasound (US) (Lai et al, 2002).
However, consensus has not been reached as to the optimal surveillance strategy. We therefore developed a decision-analytic model to assess the effectiveness and cost-effectiveness of a range of different surveillance strategies in the United Kingdom. We considered populations with cirrhosis secondary to HBV or HCV infection, and as a result of ALD.

Overview of model
We developed a state-transition (Markov) model using TreeAge Pro 2005 (TreeAge Software, Williamstown, MA, USA) to compare alternative surveillance strategies. In this approach, disease progression is modelled as movement between different health states over time. Time is modelled as a series of fixed cycles, in this case 1 month, with probabilities of movement between states calculated per cycle. Costs and utility values are attached to each state, and the differences between the aggregated costs and health outcomes in each simulation are used to estimate the cost-effectiveness of surveillance, expressed as incremental cost per quality-adjusted life-year (QALY) (Sonnenberg and Beck, 1993). Costs and QALYs were discounted at 3.5% per year (HM Treasury, 2003). The perspective of the analysis is that of the UK NHS.
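As a rough illustration of the state-transition approach described above, the following sketch runs a three-state cohort through 1-month cycles and discounts costs and QALYs at 3.5% per year. It is written in plain Python rather than TreeAge, and the states, transition probabilities, costs and utilities are hypothetical placeholders, not the parameters or structure of our model.

```python
# Minimal Markov cohort sketch. All states and numbers below are hypothetical
# placeholders chosen for illustration; they are NOT the model's parameters.

STATES = ["cirrhosis", "hcc", "dead"]

# Monthly transition probabilities (each row sums to 1); hypothetical values.
TRANSITIONS = {
    "cirrhosis": {"cirrhosis": 0.995, "hcc": 0.003, "dead": 0.002},
    "hcc": {"cirrhosis": 0.0, "hcc": 0.96, "dead": 0.04},
    "dead": {"cirrhosis": 0.0, "hcc": 0.0, "dead": 1.0},
}
MONTHLY_COST = {"cirrhosis": 100.0, "hcc": 800.0, "dead": 0.0}  # GBP, hypothetical
UTILITY = {"cirrhosis": 0.75, "hcc": 0.45, "dead": 0.0}  # hypothetical


def monthly_discount_factor(cycle, annual_rate=0.035):
    """Discount factor for a given monthly cycle at an annual discount rate."""
    return 1.0 / (1.0 + annual_rate) ** (cycle / 12.0)


def run_cohort(n_cycles, annual_rate=0.035):
    """Return (discounted cost, discounted QALYs) per person over n_cycles months."""
    occupancy = {"cirrhosis": 1.0, "hcc": 0.0, "dead": 0.0}
    total_cost = 0.0
    total_qalys = 0.0
    for cycle in range(n_cycles):
        df = monthly_discount_factor(cycle, annual_rate)
        # Accumulate the cost and utility attached to each occupied state.
        for state, share in occupancy.items():
            total_cost += share * MONTHLY_COST[state] * df
            total_qalys += share * UTILITY[state] / 12.0 * df
        # Move the cohort one cycle forward.
        new_occupancy = {s: 0.0 for s in STATES}
        for src, share in occupancy.items():
            for dst, p in TRANSITIONS[src].items():
                new_occupancy[dst] += share * p
        occupancy = new_occupancy
    return total_cost, total_qalys
```

Running the same cohort under two strategies (for example, with and without surveillance-dependent transitions) and differencing the two result pairs would give the incremental cost and incremental QALYs from which a cost per QALY is formed.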

Simulated populations
The population of interest is people with compensated cirrhosis aged 70 years or less with no pre-existing medical conditions that might preclude treatment with orthotopic liver transplantation (OLT) or hepatic resection (including current alcohol or intravenous drug use). The model considers three cirrhosis aetiologies (ALD, HBV and HCV). Results were also combined to simulate a mixed aetiology cohort, consisting of the following proportions (based on an average of estimates provided by gastroenterologists in several centres around the United Kingdom): 57.6% ALD, 7.3% HBV and 35.1% HCV.
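One plausible reading of how the aetiology-specific results are combined is a simple weighted average of per-person outcomes, using the proportions stated above. The sketch below assumes exactly that; the function name and the example QALY values are hypothetical and serve only to show the arithmetic.

```python
# Aetiology weights as stated in the text (proportions of the mixed cohort).
WEIGHTS = {"ALD": 0.576, "HBV": 0.073, "HCV": 0.351}


def mixed_cohort_value(per_aetiology):
    """Weighted average of a per-person outcome (e.g. cost or QALYs).

    Assumes the mixed cohort is a simple weighted combination of the
    three aetiology-specific cohorts.
    """
    if set(per_aetiology) != set(WEIGHTS):
        raise ValueError("expected exactly one value per aetiology")
    return sum(WEIGHTS[a] * per_aetiology[a] for a in WEIGHTS)


# Hypothetical per-person QALYs for one strategy, for illustration only.
example_qalys = {"ALD": 9.0, "HBV": 11.0, "HCV": 10.0}
mixed_qalys = mixed_cohort_value(example_qalys)
```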

Model structure
The structure of the model is shown in Figure 1. For the natural history component of the model, we defined three classes of HCC: small (<2 cm in diameter), medium (2–5 cm) and large (>5 cm). Tumour characteristics in terms of detectability and treatability are reflected in the transition probabilities. The following simplifying assumptions were made: (i) progression from cirrhosis to decompensated cirrhosis is irreversible; (ii) the rate of incidence of HCC is the same in compensated and decompensated livers; (iii) tumour diameter is used as a surrogate index of all characteristics of tumour progression (multifocal tumours are not modelled separately, since tumour nodularity is heterogeneous in the populations from which parameters have been drawn; therefore, what our model defines as 'medium' tumours would, in real-world practice, include those with multiple, small nodules that would not preclude transplantation, and a 'large' tumour may be thought to include those which are diffuse in nature); (iv) the presence of an HCC is only associated with additional mortality risk when it becomes 'large' (at which point it is also very likely to become symptomatic) and (v) compensated cirrhosis is not subject to an excess mortality rate, as the primary causes of death in these individuals are those already accounted for in the model (i.e., progression to decompensation and/or HCC).
The surveillance programme is superimposed onto the disease process. We simulated six different surveillance regimes, consisting of AFP-led, US-led and combined screening at 6- or 12-monthly intervals (Figure 2). These were based on European guidelines, which recommend that the diagnosis of HCC be based on two coincident imaging techniques (Bruix et al, 2001). For comparison, we also modelled an arm simulating no surveillance. The possibility of incidental/symptomatic presentation of HCC is modelled at all stages of disease. All confirmatory imaging is by CT scan. We assumed 100% compliance with the surveillance programme in the base-case analysis.
We modelled a mixed treatment approach using OLT and resection. People can enter the OLT waiting list following diagnosis of either a surgically treatable HCC or decompensated cirrhosis. Each person is as likely to receive a liver as any other, regardless of the reason for listing. While on the waiting list, people are subject to the same natural history process as those prelisting, and no 'bridging' therapies are simulated. People who undergo successful surgical treatment enter a simplified disease process in which excess mortality rates and associated costs and utilities encompass the spectrum of possible post-treatment experiences. There is no waiting list for liver resection. Some people are deemed unsuitable for either surgical treatment. A proportion of people with surgically untreatable small and medium-sized tumours receive palliative treatment with radiofrequency ablation and percutaneous ethanol injection. On progression to 'terminal HCC large', an excess mortality with associated costs and utilities is applied that reflects the palliation provided by transarterial chemoembolisation for a proportion of people.

Parameters
Model parameters are listed in Tables 1–3 (full details and strategies available from the authors). We sought data that fulfilled the following criteria: large, recent studies of UK patients with a diagnosis of cirrhosis (with details of aetiology). For parameters in which there were no UK-based studies available, we sought data from countries with a similar disease profile. Cost data were obtained from national (UK NHS; Department of Health (DoH), 2005) sources where available, supplemented by data from a recent UK-based observational study of patients with HCV (Wright et al, 2006). Historical values were inflated to 2004 prices.
Figure 2 Decision trees illustrating the three screening algorithms investigated in the model.

Analysis of uncertainty
Extensive one-way sensitivity analyses were undertaken to explore which of the input parameters had the greatest impact on results. For simplicity, these were performed using a single core comparison: 6-monthly AFP+US vs no surveillance. Owing to the paucity of reliable estimates for US sensitivity and the possibility that these estimates do not accurately reflect current practice, we examined the impact of simultaneously varying the sensitivity of US for detecting tumours over ranges of correlated values: 5–50% for small, 10–75% for medium and 50–100% for large tumours. We also performed scenario analyses testing less optimistic assumptions about patient compliance. Probabilistic sensitivity analysis was also undertaken. Ten thousand Monte Carlo simulations per aetiology were run, with key input values randomly drawn from probability density functions in each iteration. Distributions can be obtained from the authors.
Table 4 summarises the effectiveness of each surveillance strategy in the mixed aetiology cohort. Six-monthly AFP+US was the most effective strategy across all outcomes, more than tripling the number of HCCs diagnosed while operable, and almost halving the number dying from HCC, when compared with no surveillance. However, the cheapest strategy, annual AFP-triage, still achieved substantial gains: for example, more than doubling the number of operable HCCs found and increasing the number of small tumours found more than six-fold.

Cost–utility of surveillance
Cost–utility results are shown in Table 5. In an incremental analysis, neither of the US-only strategies would be considered (since they are both slightly less effective and more costly than surveillance at the same frequency with AFP-triage). Therefore, in the mixed aetiology cohort the cheapest surveillance strategy is annual AFP-triage, with incremental cost–utility of £20 700 per QALY. Doubling the frequency of surveillance would increase the mean number of QALYs by 0.035 at a cost of £1000 each, giving an incremental cost-effectiveness ratio (ICER) of £27 000 per QALY gained.
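The incremental reasoning above can be sketched in a few lines: a strategy is excluded when another is both cheaper and more effective, and the ICER is the extra cost divided by the extra QALYs. The function names below are hypothetical, and the worked figure uses only the rounded increments quoted in the text, so it comes out close to, but not exactly at, the reported ratio (presumably because the published increments are rounded).

```python
def icer(delta_cost, delta_qalys):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if delta_qalys <= 0:
        raise ValueError("ICER is only meaningful for a positive QALY gain")
    return delta_cost / delta_qalys


def is_dominated(candidate, comparator):
    """True if candidate is no cheaper and no more effective than comparator,
    and strictly worse on at least one of the two. Each argument is a
    (cost, qalys) pair.
    """
    c_cost, c_qalys = candidate
    k_cost, k_qalys = comparator
    no_better = c_cost >= k_cost and c_qalys <= k_qalys
    strictly_worse = c_cost > k_cost or c_qalys < k_qalys
    return no_better and strictly_worse


# Rounded increments quoted in the text: about 0.035 extra QALYs for about
# 1000 GBP extra cost when moving from annual to 6-monthly AFP-triage.
approx_icer = icer(1000.0, 0.035)  # roughly 28 600 GBP per QALY
```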

Deterministic sensitivity analyses
The cost–utility estimates appear to be most sensitive to changes in tumour growth rate, mortality following OLT and excess mortality associated with undiagnosed large tumours. Quality of life in compensated cirrhosis and following OLT are also important, as are costs associated with US and OLT. When the cirrhosis aetiologies are considered separately, the mean age at diagnosis in individuals with HBV-related cirrhosis becomes an important variable. Increases in US sensitivity lead to improved effectiveness in all surveillance strategies. At both annual and 6-monthly frequencies, US-led surveillance becomes more effective than AFP-triage surveillance when it can be assumed that US is at least sensitive enough to detect one in five small tumours, one in three medium tumours and two in three large tumours. However, when costs are also considered, it is only in the HBV cohort that US becomes more cost-effective than AFP-triage surveillance at the same frequencies.
(Sheu et al, 1985; Ebara et al, 1986; Cottone et al, 1988; Oka et al, 1990; Cottone et al, 1994; Zoli et al, 1996).
l Because, in the algorithm simulated by the model, the use of CT is limited to a second-line, confirmatory context, our base-case assumption was that all such scans would accurately identify HCC, where present. Although CT is subject to a significant false-negative rate when used in blinded, first-line assessments (Taouli et al, 2004; Baron and Brancatelli, 2004), we made the assumption that such fallibility should not be apparent in this context. The impact of this assumption was tested in sensitivity analyses.
m Source: Valls et al (2004).
n Source: Spreafico et al (1997).
o Authors' assumption, in the absence of a suitable published data source, on the basis of the rationale that sensitivity should be greater than the evidence-based figure adopted for HCC M, but less than 100%.
When we examined imperfect patient compliance with surveillance, we found a noticeable reduction in effectiveness. However, there was a commensurate reduction in costs, so incremental cost-effectiveness results were not greatly altered.
The model is extremely sensitive to the discount rate applied (3.5% per year for both costs and utilities in the base case). The ICER for 6-monthly AFP+US compared with no surveillance ranges from £19 400 per QALY if no discounting is applied to £35 800 per QALY if rates of 6% are used for both costs and utilities.
(2001), as re-analysed by Wright et al (2006).
e Assumed to be the same as the annual cost for a weighted mix of compensated and decompensated cirrhosis, plus an amount to reflect HCC recurrence (at a rate of 70% by 5 years) and treatment (33% OLT; 33% repeat resection; 33% palliative treatments).
f In addition to costs of underlying cirrhosis and costs of HCC.
g Assuming 33% of patients receive RFA on an average of once per year; and 33% of patients receive PEI on an average of six times per year.
h Assuming 33% of patients receive TACE on an average of once per year.
i Assumes that the discovery of a false-positive diagnosis would involve one additional CT scan, one additional MRI scan and two additional outpatient visits.
j Assumes that confirming a symptomatic or incidental HCC diagnosis would entail one AFP test, one liver ultrasound and one CT.

Probabilistic sensitivity analysis
Figure 3 shows cost-effectiveness acceptability curves (Fenwick et al, 2001; Fenwick et al, 2004) for ALD, HBV, HCV and the mixed cohort. These graphs show the probability that each strategy would be considered the most cost-effective (in terms of highest net monetary benefit) at different levels of willingness to pay for a QALY. At a willingness-to-pay threshold of £30 000 per QALY, the most intensive surveillance protocol simulated (6-monthly AFP+US) is only likely to be considered cost-effective in individuals with HBV-related cirrhosis. In those with HCV-related cirrhosis, 6-monthly AFP-triage is more likely to be considered cost-effective; indeed, willingness to pay would have to rise to around £65 000 per QALY before 6-monthly AFP+US becomes most likely to be considered cost-effective in this population. In individuals with ALD-related cirrhosis, there is uncertainty about which strategy would be most cost-effective at a willingness-to-pay threshold of £30 000 per QALY, with no surveillance, annual AFP-triage and 6-monthly AFP-triage having approximately equal likelihood of maximal cost-effectiveness.
In the mixed aetiology cohort, which approximates the decision framework if a single strategy is to be adopted across all aetiologies, surveillance of any kind can only be recommended if willingness to pay approaches £30 000 per QALY. At this level, 6-monthly AFP-triage appears to be the most cost-effective surveillance protocol, and remains the foremost option until willingness to pay reaches very high levels. This analysis suggests that society would have to be prepared to spend nearly £70 000 per QALY gained before the most effective strategy (6-monthly AFP+US) could confidently be assumed to provide best value for money.
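The construction of a cost-effectiveness acceptability curve can be sketched as follows: for each Monte Carlo draw, compute the net monetary benefit of every strategy at a given willingness to pay, record which strategy is highest, and report the share of draws each strategy wins. The sketch assumes net monetary benefit is defined in the usual way (willingness to pay × QALYs − cost). The function names, the two strategies shown and the normal distributions with their means and spreads are hypothetical illustrations, not the distributions used in our analysis.

```python
import random


def net_monetary_benefit(qalys, cost, wtp):
    """Net monetary benefit at willingness-to-pay `wtp` per QALY."""
    return wtp * qalys - cost


def ceac_point(draws, wtp):
    """Probability that each strategy has the highest net monetary benefit.

    `draws` is a list of dicts mapping strategy name -> (cost, qalys),
    one dict per Monte Carlo iteration.
    """
    wins = {strategy: 0 for strategy in draws[0]}
    for draw in draws:
        best = max(
            draw,
            key=lambda s: net_monetary_benefit(draw[s][1], draw[s][0], wtp),
        )
        wins[best] += 1
    return {strategy: n / len(draws) for strategy, n in wins.items()}


def simulate_draws(n, seed=0):
    """Generate hypothetical (cost, QALY) draws for two strategies."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        draws.append({
            # Means and standard deviations are invented for illustration.
            "no surveillance": (rng.gauss(20000, 500), rng.gauss(9.00, 0.05)),
            "6-monthly AFP-triage": (rng.gauss(21500, 500), rng.gauss(9.06, 0.05)),
        })
    return draws


# One point per willingness-to-pay level traces out the curve.
draws = simulate_draws(1000)
curve = {wtp: ceac_point(draws, wtp) for wtp in (0, 10000, 30000, 70000)}
```

Plotting the probabilities in `curve` against willingness to pay, one line per strategy, gives an acceptability curve of the kind shown in Figure 3.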

Summary of main findings
Our analysis suggests that, in patients with cirrhosis, surveillance strategies for HCC are effective, and can often be considered cost-effective. The most effective strategy for a mixed aetiology cohort of individuals with cirrhosis is AFP assay combined with US imaging on a 6-monthly basis. However, when costs are taken into account, using AFP as a triage step may be preferable. Surveillance is much more likely to be cost-effective in those with HBV-related cirrhosis, while surveillance of people with ALD-related cirrhosis appears least economically efficient.

Interpretation of findings
According to our simulation, the economic efficiency of different surveillance strategies can be expected to vary substantially according to cirrhosis aetiology. As a result, the most efficient mode of resource allocation, from a purely decision-analytic viewpoint, would be to offer surveillance of differing intensity to each subgroup. However, apart from the practical complications of establishing different recall pathways for different patient groups, concerns might also be raised about the ethical implications of this approach. For instance, one-way sensitivity analysis suggested that the particularly good value offered in the HBV group may be substantially due to the younger age of the cohort. By implication, then, there may be further subgroups of individuals with HCV and ALD, diagnosed with cirrhosis at a younger age, in which more intensive surveillance might be particularly cost-effective.
Our results also suggest that reports of the AFP test's demise (Sherman, 2001) may be exaggerated, particularly if one adopts a cost-effectiveness perspective. We believe one reason for this is that previous authors may have failed to account fully for tumour size in their analyses. Although the AFP test has been found to be relatively insensitive for detecting HCC in general (Trevisani et al, 2001), it has the substantial advantage that its sensitivity is only weakly correlated with tumour size, meaning that it should be a valuable aid to the detection of some of the smallest, most easily treated tumours. According to the evidence used in our model, 65% of tumours less than 2 cm in diameter secrete 20 ng ml⁻¹ or more of AFP (although this figure may be an overestimate, with the true proportion around 46%; Farinati et al, 2006). In contrast, our model is configured to simulate US sensitivity of only 10.7% for the smallest category of tumour (Bennett et al, 2002). Although this is at the pessimistic end of the range of available evidence, it should not be seen as an outlying estimate. Every study that has used an optimal reference standard (explant pathology) to investigate the diagnostic capabilities of US in this setting has reported disappointing sensitivity for tumours ≤2 cm in diameter: 13.8% (Kim et al, 2001), 22.2% (Rode et al, 2001) and 30.0% (Liu et al, 2003). Even if one were to adopt the most positive estimate available, all robust evidence suggests that US is less sensitive than AFP assay for the detection of the smallest tumours.
Additionally, AFP is a very cheap test and, inevitably, this is a crucial consideration from a cost-effectiveness perspective. The fact that current guidelines do not recommend using AFP screening may reflect the poor quality of current evidence or a lack of explicit attention to cost-effectiveness considerations during the development of clinical guidelines. However, a strategy led by one single diagnostic modality will always run the risk of serial false-negative findings, and this setting provides a good example (some tumours never secrete AFP and will therefore never be detected using an AFP-led approach and, equally, some tumours will infiltrate diffusely and resist US detection).
Because tumour growth rate had a clear influence on cost-effectiveness in the one-way sensitivity analysis, we investigated this factor further. We performed stratified scenario analyses, simulating three mixtures of slow-growing and fast-growing tumours, instead of applying one average growth rate throughout the model. While the combination of growth rates had an impact on the effectiveness and cost-effectiveness of surveillance (the more fast-growing HCCs, the more cost-effective surveillance becomes), the incremental relationship between surveillance strategies was preserved. The only practical implication of these additional findings is to suggest that, if the true mix of HCCs features a preponderance of slow-growing tumours, it may not be cost-effective to offer any surveillance strategy at 6-monthly intervals.

Strengths of the evaluation
This is the first analysis of the effectiveness and cost-effectiveness of surveillance of cirrhosis for HCC in the UK NHS. Comprehensive literature searches were conducted to inform the model parameters, wherever possible choosing data either derived from the UK population or most likely to be applicable to the UK population. Extensive exploration of model structures and uncertainty suggests that our model more appropriately captures the disease and surveillance process and impacts than previous studies in this field. While we have assumed that decompensated cirrhosis is irreversible, we showed in one-way sensitivity analysis that our results are not greatly influenced by alterations in the annual rate of progression from compensated to decompensated cirrhosis. From this, it is reasonable to infer that allowing some regression from decompensated back to compensated cirrhosis would not significantly alter our main results. By accounting for the substantial differences in age-related incidence, natural history and response to treatment that exist according to cirrhosis aetiology, we predict that different approaches to surveillance may be justified according to different causes of cirrhosis.

Limitations of the evaluation
First, there is very little published evidence on which to base many of the parameter estimates for the model, and few data originate within the United Kingdom. This was particularly apparent for defining US performance. Second, as the primary focus of this evaluation was the effectiveness and cost-effectiveness of surveillance, we have used a simplified approach to modelling treatment in which OLT and resection are the only curative options available. We performed limited modelling of nonsurgical/ablative therapies as the evidence at the time of the analysis was inconclusive (Di Bisceglie, 2005). Recent evidence, predominantly from Asia (Izumi et al, 2007; Peng et al, 2007; Zytoon et al, 2007), suggests that such therapies may improve survival in patients with small tumours. If similarly promising findings could be shown in the United Kingdom, we would expect widespread adoption of such techniques to have beneficial cost-effectiveness implications, since a minimally invasive approach is much cheaper than OLT, and might also be expected to have less negative impact on quality of life. Third, we have assumed that entry to the surveillance programme is confined to those more likely to be considered eligible for the available curative treatment options (OLT or resection), and have therefore assumed that high-risk activity (excessive alcohol consumption, intravenous drug use and so on) has ceased; this may not be realistic. Fourth, we have assumed that the three cirrhosis aetiologies are mutually exclusive, although we acknowledge that many people develop cirrhosis as a result of multiple causes.

Comparison with other studies
There are no other published studies from Europe. We identified three comparable studies from our literature searches (Arguedas et al, 2003; Lin et al, 2004; Patel et al, 2005); all were conducted in the United States and simulated HCV-related cirrhosis only. In these studies, 6-monthly surveillance using AFP and US produced utility gains of between 0.23 and 0.49 QALYs, compared with no surveillance, giving ICERs of between $24 500 and $46 600 per QALY gained. Apart from obvious dissimilarities in input values and assumptions, which reflect the different populations and health-care systems simulated, there are substantial differences in the structures of the models. Crucially, ours is the only model to have accounted for tumour size in simulating the sensitivity of surveillance.

Implications for future research
Further research is required in the following areas: (i) further modelling studies using alternative modelling methods, such as individual patient sampling techniques, could be used to account for heterogeneity in the patient population, so that factors such as tumour growth rate, tumour characteristics and the variability in individual patients' serial test results could be assessed. Such methods could also be used to assess the optimal surveillance strategy, the optimal surveillance interval and the effects of surveillance on waiting lists for OLT; (ii) further modelling studies could also investigate innovative surveillance strategies not currently undertaken in clinical practice (e.g., alternating AFP and US investigations at 6- or 12-monthly intervals, or measuring change in AFP levels following serial tests rather than absolute levels with fixed cutoff points); (iii) further empirical and modelling analysis of the impact of age at diagnosis of cirrhosis on the effectiveness and cost-effectiveness of surveillance; (iv) empirical evaluation of newer imaging techniques (e.g., contrast-enhanced US) to detect HCCs; and (v) assessment of the effectiveness and cost-effectiveness of surveillance in other aetiologies (e.g., nonalcoholic fatty liver disease) and using other curative treatment options (e.g., ablative techniques).