Defining and analysing symptom palliation in cancer clinical trials: a deceptively difficult exercise

The assessment of symptom palliation is an essential component of many treatment comparisons in clinical trials, yet an extensive literature search revealed no consensus as to its precise definition, which could embrace relief of symptoms, time to their onset, duration, degree, as well as symptom control and prevention. In an attempt to assess the importance of these aspects and to compare different methods of analysis, we used one symptom (cough) from a patient self-assessment questionnaire (the Rotterdam Symptom Checklist) in a large (>300 patient) multicentre randomized clinical trial (conducted by the Medical Research Council Lung Cancer Working Party) of palliative chemotherapy in small-cell lung cancer. The regimens compared were a two-drug regimen (2D) and a four-drug regimen (4D). No differences were seen between the regimens in time of onset of palliation or its duration. The degree of palliation was strongly related to the initial severity: 90% of the patients with moderate or severe cough at baseline reported improvement, compared with only 53% of those with mild cough. Analyses using different landmark time points gave conflicting results: the 4D regimen was superior at 1 month and at 3 months, whereas at 2 months the 2D regimen appeared superior. When improvement at any time up to 3 months was considered, the 4D regimen showed a significant benefit (4D 79%, 2D 60%, P = 0.02). These findings emphasize the need for caution in interpreting results, and the importance of working towards a standard definition of symptom palliation. The current lack of specified criteria makes analysis and interpretation of trial results difficult, and comparison across trials impossible. A standard definition of palliation for use in the analysis of clinical trials data is proposed, which takes into account aspects of onset, duration and degree of palliation, and symptom improvement, control and prevention. © 1999 Cancer Research Campaign

In published reports, ‘palliative treatment’ may simply reflect noncurative treatment, with no information provided on relief of presenting symptoms [e.g. trials of palliative chemotherapy in small-cell lung cancer (James et al, 1996) and hormonal palliation of ovarian cancer (Ahlgren et al, 1993)]. In other reported trials, palliation is defined as the treatment given to prevent medical conditions arising [e.g. pamidronate to prevent hypercalcaemia and bone pain in patients with bone metastases (van Holten-Verzantvoort et al, 1993)].
When relief of symptoms is considered, the most straightforward analyses are those in which the primary end point is the resolution of a specific symptom [e.g. sucralfate to palliate acute radiation oesophagitis (Sur et al, 1994) or granisetron as an antiemetic in cytostatic-induced emesis (Marty et al, 1992)]. Occasionally, when treatment affects a number of symptoms, the individual symptoms are listed and their palliation reported [e.g. neurological symptoms in patients with brain metastases (Kurtz et al, 1981) or general symptoms resulting from hepatic metastases (Mohiuddin et al, 1996)].
When assessments have been made over several symptoms or time points, authors have used a variety of methods of simplifying analysis and presentation by summarizing the symptoms or the time points. For example, Lewington et al (1991) combined change in general condition, mobility, analgesic use and pain into a symptom response. In contrast, Barr et al (1990) compared the ‘mean’ and ‘best’ dysphagia score against baseline, and Ball et al (1997) calculated an average grade for each symptom.
All too often, results presented give scant information on how symptoms were measured, by whom, when, for how long, and what the severity was at baseline and thereafter (e.g. Haie-Meder et al, 1993; Buroker et al, 1994). Nevertheless, a number of groups have investigated more innovative ways of analysing palliation in the context of their trials. Burris and Storniolo (1997) attempted to demonstrate ‘clinical benefit’, and categorized their patients as responders based on changes in pain (analgesia use or intensity), Karnofsky performance status and weight. Such an approach is potentially useful, but requires considerable preparatory work. Tong et al (1982), investigating radiotherapy schedules to palliate symptomatic bone metastases, considered the promptness of pain relief. However, as the authors point out, the comparison was based not on when relief occurred, but when it was reported, which introduces the possibility of bias. Tannock et al (1996), in a trial of chemotherapy for prostate cancer, devised a palliative response based on pain intensity and analgesic intake. They then used Kaplan–Meier plots to show the duration of palliation. Leibel et al (1987) considered the palliation achieved within groups of patients who had different levels of initial severity. This indicated that a greater proportion of patients with moderate or severe abdominal pain reported relief after treatment than those with mild pain, but that complete response was less often achieved.
A palliation index in patients with lung cancer (Muers and Round, 1993) was based on the duration of response divided by the duration of survival. Importantly, this definition included the concept of symptom control, so that palliation encompassed patients with initially no symptoms or mild symptoms which did not get worse.
There is little evidence, however, that any of these potentially useful methods have been repeated by the authors or applied more widely.
The MRC LCWP have used a variety of methods of examining clinician-rated symptom relief in a series of trials of palliative radiotherapy in non-small-cell lung cancer (NSCLC) and palliative chemotherapy in small-cell lung cancer (SCLC). In three trials (MRC LCWP, 1991, 1992), the duration of improvement of individual presenting symptoms was calculated and then (as with Muers and Round, 1993) presented as a proportion of survival time. In addition, the proportion of patients achieving complete resolution of presenting symptoms and duration of palliation was reported, and better palliation in patients with initially moderate or severe symptoms was noted. In two further trials (MRC LCWP, 1996a, b), Kaplan–Meier plots were used to estimate the frequency of palliation of individual patient-reported symptoms by specified time points. Finally, in the most recently published trial (MRC LCWP, 1996c), a palliation score was devised to incorporate four key symptoms, and changes in this score from baseline to 3 months, importantly including patients who died before 3 months as non-palliated, were compared.
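The ratio underlying the palliation index of Muers and Round (1993), and the presentation of duration of improvement as a proportion of survival time, can be written as a short sketch. The function name, the use of days as the unit and the example values are illustrative assumptions, not taken from the cited papers.

```python
def palliation_index(days_improved: float, days_survived: float) -> float:
    """Duration of symptom response as a proportion of survival time.

    Both arguments are durations in the same unit (days assumed here).
    """
    if days_survived <= 0:
        raise ValueError("survival time must be positive")
    if not 0 <= days_improved <= days_survived:
        raise ValueError("response cannot be negative or outlast survival")
    return days_improved / days_survived


# Hypothetical patient: symptom improved for 60 of 120 days survived.
example_index = palliation_index(60, 120)
print(f"palliation index = {example_index:.2f}")
```

Expressing response relative to survival means that a short response in a patient who dies early is not automatically scored as worse than a long response in a long survivor.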

AIM
The aim of this paper is to report exploratory analyses that highlight some of the advantages and disadvantages of published methods, and to try to develop a generally applicable definition of symptom palliation in clinical trials. To do this, we have used data from patient-completed Rotterdam Symptom Checklist (RSCL) questionnaires in an MRC LCWP trial because this is typical of the data available in many trials.
However, the concepts are relevant to any assessments of symptoms, whether they be by the patient or the clinician or on any standard quality of life (QL) questionnaire.

DATA USED
Data from the MRC LCWP LU12 trial (MRC LCWP, 1996b) were used in this analysis. In the LU12 trial, patients with extensive disease SCLC, or limited disease but poor performance status [WHO grade 3 or 4 (WHO, 1979)], were allocated at random to receive three 3-weekly cycles of either a four-drug regimen (4D: etoposide, cyclophosphamide, methotrexate and vincristine) or a two-drug regimen (2D: etoposide and vincristine) as palliative treatment. The 4D regimen had been shown in previous MRC trials (MRC LCWP, 1989a, b) to be an effective treatment for SCLC, and the aim of this trial was to see if, in poor prognosis patients, the 2D regimen would prove equally effective but less toxic.
A total of 310 patients (154, 4D; 156, 2D) were randomized from 23 centres in the UK. There was no difference in survival between the two treatment groups (medians: 4D, 141 days; 2D, 137 days), and levels of palliation were reported to be similar, but patients on the 4D regimen had more haematological toxicity and more early, possibly treatment-related, deaths (Stephens et al, 1994).

Assessment of Quality of Life
QL was assessed by patients using the RSCL, a patient-completed categorical scale consisting of 38 core items to which four lung cancer-specific symptoms (chest pain, cough, hoarseness and coughing up blood) were added. Patients were asked to tick one of four boxes to indicate how they had been feeling during the past week. In this paper, the four response categories are referred to as nil, mild, moderate and severe. Questionnaires were completed pretreatment, at each attendance for protocol chemotherapy, then monthly to 6 months from randomization, 2 monthly to 1 year, and 3 monthly thereafter.

Analyses
For the sake of simplicity, we have focused on one major disease-related symptom (cough). Compliance was assessed according to the method proposed by Hopwood et al (1994), which gave an overall compliance of 56% in the first 6 months, falling from 85% at baseline to 31% at 6 months. Such low levels of compliance, which are not unusual in palliative trials, affect the analysis of palliation. For example, to compare the treatments in terms of improving the severity of cough by one grade or more at any follow-up from baseline, Table 1 shows that, of the 154 4D and 156 2D patients randomized, 16 and 29, respectively, did not complete a baseline RSCL; a further one patient in each group completed the form outside the specified time window; 49 and 23 completed no follow-up questionnaires; and 17 and 17 patients did not report having a cough at baseline. There thus remain 71 4D and 86 2D patients for any comparison of improvement of cough.
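The attrition described above can be checked with a few lines of arithmetic. The sketch below simply subtracts the exclusions quoted from Table 1 from the numbers randomized; the variable and function names are illustrative.

```python
# Patient counts per regimen, as quoted in the text from Table 1.
randomized = {"4D": 154, "2D": 156}
exclusions = {
    "no baseline RSCL": {"4D": 16, "2D": 29},
    "baseline form outside time window": {"4D": 1, "2D": 1},
    "no follow-up questionnaire": {"4D": 49, "2D": 23},
    "no cough reported at baseline": {"4D": 17, "2D": 17},
}


def remaining(arm: str) -> int:
    """Patients left in one arm for the comparison of improvement of cough."""
    return randomized[arm] - sum(step[arm] for step in exclusions.values())


evaluable = {arm: remaining(arm) for arm in randomized}
for arm, n in evaluable.items():
    share = n / randomized[arm]
    print(f"{arm}: {n} of {randomized[arm]} randomized remain ({share:.0%})")
```

Roughly half of the randomized patients in each arm contribute to the comparison, and the reasons for exclusion are not balanced between the arms (for example, 49 versus 23 patients with no follow-up questionnaire).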

Aspects of palliation
By synthesizing some of the ideas in the published papers, we propose that palliation can be viewed as a three-dimensional concept, namely, time of onset, duration and degree.

Onset of symptom improvement
Considering the 157 patients with data available for analysis (Table 1), the majority of patients reported an almost immediate improvement, but 12 (11% of the patients who reported improvement) reported this later than 2 months from starting treatment. In one of these 12 patients, this was simply due to the absence of any earlier assessments, but the remaining 11 patients had intermediate assessments at which they reported no change or indeed a worsening cough.
When assessing time to onset of palliation, the timings and frequency of questionnaire administration and patient compliance are vitally important, and the time period over which palliation is to be assessed should be limited to that period when trial regimens, rather than additional interventions, will be influencing events.
One possible method of treatment comparison of the time of onset of symptom relief uses Kaplan–Meier plots: the time that improvement was first reported is taken as the event, with patients not reporting improvement being censored at the date of their last QL questionnaire completion (MRC LCWP, 1996a, b). The main advantage of this method is that all patients reporting cough at baseline can be included. The major disadvantage is that it assumes that censored patients [those not achieving the event (palliation), whether they be alive or dead] follow the same pattern as those who do. Thus, a treatment with poor survival but good palliation among the survivors can appear better than a treatment with good survival and variable palliation, even if the actual number of patients alive and with palliation is exactly the same in both treatments. Similarly, any unequal reduction in group sample sizes must be considered worrying, e.g. Buroker et al (1994), from initial group samples of 183 and 179, compared 122 with 90 in terms of palliation.
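The effect of censoring described above can be illustrated with a minimal product-limit (Kaplan–Meier) estimator. This is a sketch on six hypothetical patients, not the trial data or the software used for the published analyses; the function name and the data are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the proportion still without the event.

    times  : follow-up time for each patient (days)
    events : True if improvement was first reported at that time,
             False if the patient was censored (no improvement reported
             by the last questionnaire, whether alive or dead)
    Returns a list of (time, proportion not yet improved) at event times.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    at_risk = len(times)
    not_improved = 1.0
    curve = []
    for t in sorted(set(times)):
        d = event_counts.get(t, 0)
        if d:
            not_improved *= 1 - d / at_risk
            curve.append((t, not_improved))
        # Everyone whose follow-up ends at t leaves the risk set.
        at_risk -= sum(1 for x in times if x == t)
    return curve


# Hypothetical data: four patients improve, two are censored.
times = [21, 21, 42, 42, 63, 84]
events = [True, True, True, False, False, True]

curve = kaplan_meier(times, events)
crude_proportion = sum(events) / len(events)
for t, s in curve:
    print(f"day {t}: estimated proportion improved = {1 - s:.2f}")
print(f"crude proportion improved = {crude_proportion:.2f}")
```

In this example the estimate of the proportion improved reaches 100% by day 84, although only four of the six patients ever reported improvement. The two censored patients are implicitly assumed to behave like those who remained under observation, which is the assumption questioned above.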
If time to onset is too difficult a concept to include in an overall definition of palliation, at least limiting the period over which palliation is assessed ensures that it is related to the treatment being studied.

Duration of improvement
Kaplan–Meier plots can be used to estimate and compare the proportion of patients who remain alive and with symptom improvement because it is reasonable to assume that patients censored will die or relapse at the same rate as their contemporaries. Figure 1 is based on 56 4D and 52 2D patients who reported improvement in cough. There was no statistically significant difference between the regimens (P = 0.32) using this method of comparison.
Of course, the accuracy of these figures relates to the frequency of assessment. The dates of symptom improvement and deterioration have to be taken as the date of the assessment at which the event was first reported, which could be weeks or even months after the previous assessment and after the actual event. One simple way of incorporating duration into any definition of palliation is to ensure that the palliation reported is sustained, not transient or due to day-to-day changes. Thus, as with the WHO definition of tumour response (WHO, 1979) and as used by Tannock et al (1996), improvement should only be accepted if it is reported on two successive assessments at least 3 weeks apart.
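The requirement for sustained improvement can be expressed as a simple rule. The sketch below is one possible reading of it, assuming that symptom grades are coded 0 (nil) to 3 (severe) and that each assessment is stored as a (date, grade) pair; the coding, the function name and the example dates are illustrative assumptions.

```python
from datetime import date


def sustained_improvement(baseline, assessments, min_gap_days=21):
    """True if improvement over baseline is reported on two successive
    assessments that are at least `min_gap_days` apart.

    baseline    : baseline grade, assumed coded 0 = nil ... 3 = severe
    assessments : list of (date, grade) follow-up assessments in date order
    """
    for (d1, g1), (d2, g2) in zip(assessments, assessments[1:]):
        both_improved = g1 < baseline and g2 < baseline
        if both_improved and (d2 - d1).days >= min_gap_days:
            return True
    return False


# Hypothetical patient with moderate cough (grade 2) at baseline.
transient = [(date(1999, 1, 22), 1), (date(1999, 2, 12), 2)]
sustained = [(date(1999, 1, 22), 1), (date(1999, 2, 12), 1)]

print(sustained_improvement(2, transient))
print(sustained_improvement(2, sustained))
```

A single improved assessment followed by a return to the baseline grade is not counted, so day-to-day fluctuation is less likely to be scored as palliation.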

Degree
The degree of symptom relief was analysed according to the severity at presentation. Improvement in severity of cough was related to the reported baseline severity (as shown in Table 2). Thus, in the 4D group, of the patients who reported severe cough at baseline, all 16 (100%) reported improvement, compared with 18 (95%) of those who reported moderate cough at baseline and only 22 (61%) who reported a mild cough. The corresponding figures for the 2D group were 83%, 80% and 47%.
Therefore, different criteria for ‘palliation’ are required for patients presenting with different symptom severity. These can be referred to as improvement, control and prevention.

Improvement
The intuitive definition of palliation is that the symptom is relieved. Given the wide use of a four-point categorical severity scale, this is inevitably defined as an improvement in symptom rating by one or more categories (Sur et al, 1994; MRC LCWP, 1991, 1992). Consequently, patients without the symptom at presentation are excluded.

Control
As suggested by Muers and Round (1993), for those patients with absent or mild symptoms at presentation it makes clinical sense to consider control, i.e. the symptom does not develop or, if mild, get worse.

Prevention
Another way of including the patients reporting no initial symptoms is to include prevention as part of palliation (e.g. van Holten-Verzantvoort et al, 1993).
Thus, all patients can be encompassed in a definition which requires improvement in those with initially moderate or severe symptoms, improvement or control in those with mild symptoms, and prevention in those with no symptoms. This accords with the definition of palliation outcome measures described by Maher et al (1994).
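These baseline-dependent criteria can be sketched as a small classification function. The numeric coding of the four categories and the function name are assumptions made for illustration, and the handling of a mild symptom that first improves and later worsens is one possible reading of 'improvement or control'.

```python
NIL, MILD, MODERATE, SEVERE = range(4)  # assumed coding of the four categories


def classify_palliation(baseline, follow_up):
    """Return 'improvement', 'control' or 'prevention' if the criterion
    for the patient's baseline severity is met, otherwise None.

    baseline  : baseline grade (NIL..SEVERE)
    follow_up : non-empty list of follow-up grades
    """
    if not follow_up:
        raise ValueError("at least one follow-up assessment is required")
    improved = any(grade < baseline for grade in follow_up)
    never_worse = all(grade <= baseline for grade in follow_up)
    if baseline >= MODERATE:
        return "improvement" if improved else None
    if baseline == MILD:
        if improved:
            return "improvement"
        return "control" if never_worse else None
    # No symptom at baseline: the symptom must not develop.
    return "prevention" if never_worse else None


print(classify_palliation(SEVERE, [SEVERE, MODERATE]))
print(classify_palliation(MILD, [MILD, MILD]))
print(classify_palliation(NIL, [NIL, MILD]))
```

Because every baseline category has its own criterion, no patient with follow-up data is excluded on the grounds of baseline severity.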

SELECTING THE TIME POINT FOR ANALYSIS
The landmark analysis, calculating the change from baseline to a specific time point, is most often used in the analysis of palliation. However, there are two distinct methods, both with inherent problems. Using all the available data results in a diminishing sample [e.g. Twelves et al (1994) compared 14 patients at baseline with 12 at 6 weeks], which is not strictly comparable and does not account for the fact that any improvement seen may simply be due to patients who are more ill dropping out. Alternatively, using only those patients with data at both time points does not allow for the fact that such patients, who must be survivors and compliers, may not be representative of the whole sample, e.g. Brewster et al (1995) could use only 161 patients with data at baseline and at 6 weeks out of their sample of 197, and Richards et al (1992) could use only 42 out of 53 patients at pre- and post-treatment.
Our paper discussing this problem (Hopwood et al, 1994) suggests that, in order to utilize all the data, it is first necessary to show that the scores from patients completing different numbers of questionnaires (for whatever reason) are similar. Glimelius et al (1996) produced charts to show similar patterns of symptom severity in subgroups of patients within each regimen, and more importantly differences between regimens. However, as there are no simple statistical methods to compare plots composed of diminishing populations over time, such comparisons can only be descriptive. Thus, using only those patients with data at both baseline and the landmark time point is the preferred option.
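A landmark analysis restricted to patients with data at both baseline and the landmark time point can be sketched as follows. The data structure, the grade coding (0 = nil to 3 = severe) and the five patients are hypothetical and serve only to show how the denominator changes with the time point chosen.

```python
def landmark_improvement(patients, landmark_month):
    """Count patients whose grade at the landmark is lower than at baseline.

    Only patients who reported the symptom at baseline AND have an
    assessment at the landmark month enter the denominator.
    Returns (number improved, denominator).
    """
    eligible = [
        p for p in patients
        if p["baseline"] is not None
        and p["baseline"] > 0
        and landmark_month in p["follow_up"]
    ]
    improved = sum(
        1 for p in eligible if p["follow_up"][landmark_month] < p["baseline"]
    )
    return improved, len(eligible)


# Hypothetical patients: grades keyed by month from randomization.
patients = [
    {"baseline": 2, "follow_up": {1: 1, 2: 2, 3: 1}},
    {"baseline": 1, "follow_up": {1: 1, 2: 0}},
    {"baseline": 3, "follow_up": {1: 2}},
    {"baseline": 0, "follow_up": {1: 0, 2: 0, 3: 0}},   # no cough at baseline
    {"baseline": None, "follow_up": {1: 1}},            # no baseline form
]

results = {m: landmark_improvement(patients, m) for m in (1, 2, 3)}
for month, (improved, n) in results.items():
    print(f"month {month}: {improved}/{n} improved")
```

Even in this small example the denominator shrinks from three patients to one between the first and third month, so the proportions at different landmarks describe different groups of survivors and compliers.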
In Table 3, three time points were considered: 1, 2 and 3 months from randomization. For each, the denominator was the number of patients with a reported cough at baseline and follow-up data at that specific time point, and the proportion of patients is reported whose grade of cough improved compared with baseline. At 1 month and at 3 months, the 4D regimen emerged superior (and at 1 month significantly so, P = 0.02), whereas at 2 months the 2D regimen appeared favourable. The selection of time point appears critical, and it is very concerning to find that conflicting results can be obtained.
Considering several time points results in multiple comparisons and the need to adjust the level of statistical significance. Therefore, reporting improvement at any one assessment over a specified period of time is probably more reliable than choosing one landmark time point.
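The consequence of adjusting for multiple comparisons can be shown with a short calculation. The sketch below uses the Bonferroni correction, which is chosen here purely for illustration and is not stated in the text as the adjustment the authors had in mind; it is applied to the three landmark time points and the P-value of 0.02 reported at 1 month.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under the Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("at least one comparison is required")
    return alpha / n_comparisons


alpha = 0.05
threshold = bonferroni_threshold(alpha, 3)   # landmarks at 1, 2 and 3 months
p_one_month = 0.02                           # reported for the 1-month landmark

significant_unadjusted = p_one_month < alpha
significant_adjusted = p_one_month < threshold
print(f"adjusted threshold = {threshold:.4f}")
print(f"significant without adjustment: {significant_unadjusted}")
print(f"significant with adjustment:    {significant_adjusted}")
```

Under this particular adjustment the 1-month difference would no longer be declared significant, which illustrates why a single prespecified period of assessment is easier to interpret than several landmarks.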

ANALYSIS OF PALLIATION: EXAMPLES OF METHODS IN USE
Five different definitions of palliation, each of which is commonly reported or presented, were applied to the MRC data set, and the proportions of patients reporting palliation of cough compared. The results are summarized in Table 4. In addition, to avoid the possibility that differences in survival biased results, for each analysis we added a subsidiary analysis in which the patients excluded because of early death were included as non-palliated (MRC LCWP, 1996c).
(i) Improvement from baseline by one or more categories at any subsequent assessment.
This is the most commonly used definition (MRC LCWP, 1991, 1992). The population for this analysis is the 157 patients who reported having a cough at baseline and had at least one follow-up assessment (see Table 1). If the score for cough on any of the follow-up assessments is lower than that reported at baseline, that patient is considered to have been palliated. The analysis suggests a benefit for the 4D group (4D, 79%; 2D, 60%; P = 0.02).
However, if the patients who reported having a cough at baseline, but who died before completing the first follow-up QL questionnaire are added to the denominator, then this difference disappears: 4D, 61%; 2D, 54%; P = 0.39.
(ii) Disappearance of symptom at any subsequent assessment.
In some circumstances, the aim of treatment is to eradicate symptoms completely. Using the same 157 patients as in definition (i), in this case only those who reported ‘nil’ at any subsequent assessment were considered palliated. The 4D regimen again emerged as superior to 2D (4D, 54%; 2D, 49%), but not statistically significantly so.
Adding in the patients who reported cough at baseline, but who died, to the denominator changed the apparent outcome, with the 2D regimen now achieving 43% palliation and the 4D achieving 41%.
(iii) Improvement from baseline by one or more categories at two successive assessments.
In an attempt to ensure that palliation obtained is clinically beneficial and not merely transient, palliation may be defined as symptom improvement sustained over two successive follow-up assessments (Tannock et al, 1996). In our data, the study population was reduced from the 157 used in definitions (i) and (ii) to 123 (58, 4D; 65, 2D), as 34 patients (13 and 21 respectively) only had one follow-up QL questionnaire assessment and were excluded. The benefit appeared to be associated with the 4D regimen: 4D, 57% palliated; 2D, 42%; P = 0.13.
In addition to the 21 and 11 patients, respectively, who reported a cough at baseline but who died before the first follow-up assessment and, therefore, can be classed as ‘not palliated’, we can also include three and two patients, respectively, who reported a cough at baseline, completed a QL questionnaire at first follow-up, but died before the second follow-up. Adding these patients into the denominator gives palliation rates of 40% and 35% for the 4D and 2D patients respectively, P = 0.58.
(iv) Improvement from baseline of moderate or severe to mild or nil at any assessment.
There may be situations in which the aim of palliative treatment is to concentrate on patients with more severe symptoms. In this definition, only those with moderate or severe cough at presentation are considered, and only those who report this reduced to mild or nil are ‘palliated’. In our dataset, the initial sample size was thus reduced from 157 patients to 68. The level of palliation achieved with this definition appeared higher than with previous definitions (4D, 86%; 2D, 82%) because, as already shown, improvement is easier to demonstrate in patients with moderate or severe symptoms. This analysis suggests that there is no significant difference between the regimens.
Using this definition, we can add in the 16 patients (12, 4D; 4, 2D) who reported a moderate or severe cough at baseline and who died before the first follow-up assessment. This, as with definition (ii), suggests a non-significant benefit from the 2D regimen (4D, 64%; 2D, 73%).
(v) Improvement from baseline of moderate or severe to mild or nil, or no deterioration in mild or nil.
Other groups have suggested that the aim of palliation is not merely to relieve, but to control and to prevent symptoms. One major benefit of this definition is that it allows the inclusion of 34 patients (see Table 1), who reported no cough at baseline assessment but completed at least one follow-up questionnaire. Thus, those patients whose cough was moderate or severe at baseline must improve to nil or mild at some point (definition iv), and those with nil or mild at baseline must not get worse at any subsequent assessment. The analysis shows a non-significant benefit from the 4D regimen (4D, 70%; 2D, 64%).
In addition to the 32 patients (21, 4D; 11, 2D) who reported a cough at baseline but died before their first follow-up, we can also include the 13 (12, 4D; 1, 2D) who reported no cough at baseline, but who died. This suggests a slight benefit from the 2D regimen (4D, 51%; 2D, 57%).
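The effect of adding early deaths to the denominator can be reproduced for definition (i) from counts quoted in the text: 56 4D and 52 2D patients reported improvement, out of 71 and 86 evaluable patients, with 21 and 11 early deaths respectively. The sketch below is not the authors' analysis. The statistical test used is not stated in this section; a chi-square test with continuity correction is assumed here because it gives P-values close to those reported.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test (1 df) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, two-sided P-value). The continuity (Yates)
    correction is applied by default; this choice is an assumption.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    statistic = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2))  # upper tail, 1 df
    return statistic, p_value


improved = {"4D": 56, "2D": 52}      # reported improvement in cough
evaluable = {"4D": 71, "2D": 86}     # cough at baseline plus follow-up data
early_deaths = {"4D": 21, "2D": 11}  # cough at baseline, died before follow-up


def compare(denominators):
    """Palliation rate per arm and P-value for the given denominators."""
    a, c = improved["4D"], improved["2D"]
    b, d = denominators["4D"] - a, denominators["2D"] - c
    rates = {arm: improved[arm] / denominators[arm] for arm in improved}
    return rates, chi_square_2x2(a, b, c, d)[1]


rates_excl, p_excl = compare(evaluable)
with_deaths = {arm: evaluable[arm] + early_deaths[arm] for arm in evaluable}
rates_incl, p_incl = compare(with_deaths)

for label, rates, p in (("deaths excluded", rates_excl, p_excl),
                        ("deaths as not palliated", rates_incl, p_incl)):
    print(f"{label}: 4D {rates['4D']:.0%}, 2D {rates['2D']:.0%}, P = {p:.2f}")
```

The numerators are identical in the two analyses; only the denominators change, and the imbalance in early deaths between the arms is enough to remove the apparent difference.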

Summary
It is reassuring that, if the early deaths are not included as non-palliated, for all five definitions the same regimen emerged as the more effective of the two. However, the level of palliation, the size of the difference and the associated P-values varied widely. Clinicians’ opinions of the palliative capacity of regimens, as well as the importance that they are likely to attach to differences between regimens, could therefore depend to a considerable extent on the definition used.
However, when account was taken of early deaths, by scoring these patients as ‘not palliated’, the results were inconsistent, and, although never reaching statistical significance, the benefit tended to favour the 2D regimen.

APPLICATION OF THE PROPOSED DEFINITION OF PALLIATION
Because of the observed inconsistencies, we believe it is necessary for a definition of palliation to encompass improvement, control, prevention, degree, duration and onset. The results of applying such a definition to the cough data are shown in detail in Table 5. The period of assessment was limited to the first 3 months from randomization; patients with moderate or severe cough at baseline were required to show improvement (a reduction to nil or mild on two successive assessments), those with mild cough at baseline to show control (not getting worse), and those with no symptoms to show no deterioration (prevention). To be included in this analysis, therefore, patients had to have completed questionnaires at baseline and at two successive follow-up assessments in the first 3 months, and those who died before the second follow-up but with complete data were included as ‘not palliated’.
Using these criteria, a total of 117 patients (38%) were excluded, mainly because of lack of data, leaving 104 4D and 89 2D patients. Of the patients who started with moderate or severe cough, this improved to no or mild cough in 62% of the 4D patients and 55% of the 2D patients. Of the patients who started with a mild cough, this either remained mild or improved to nil in 83% 4D and 72% 2D. Finally, of the patients who started treatment without a cough, this did not develop in 42% 4D and 62% 2D. In total, therefore, 67% 4D and 65% 2D were considered to be palliated using this definition.
Adding in as ‘not palliated’ those patients who died, the balance shifts towards the 2D regimen, with 43% 4D and 55% 2D achieving palliation.
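The proposed definition can be summarized as a decision rule for a single patient. The sketch below is an interpretation rather than the algorithm used to produce Table 5: the grade coding, the 91-day reading of '3 months', the requirement that mild or absent cough must not worsen at any assessment in the window, and all names are assumptions.

```python
NIL, MILD, MODERATE, SEVERE = range(4)  # assumed coding of the four categories
WINDOW_DAYS = 91                        # "first 3 months", assumed to be 91 days


def proposed_palliation(baseline, follow_up, died_early=False):
    """Classify one patient as 'palliated', 'not palliated' or 'excluded'.

    baseline   : baseline grade, or None if no baseline questionnaire
    follow_up  : list of (day from randomization, grade)
    died_early : True if the patient died before a second follow-up
                 assessment could be completed
    """
    if baseline is None:
        return "excluded"
    grades = [g for day, g in sorted(follow_up) if day <= WINDOW_DAYS]
    if len(grades) < 2:
        # Early deaths count against the regimen; other gaps are missing data.
        return "not palliated" if died_early else "excluded"
    if baseline >= MODERATE:
        # Improvement: nil or mild on two successive assessments.
        ok = any(g1 <= MILD and g2 <= MILD
                 for g1, g2 in zip(grades, grades[1:]))
    else:
        # Control (mild) or prevention (nil): never worse than baseline.
        ok = all(g <= baseline for g in grades)
    return "palliated" if ok else "not palliated"


print(proposed_palliation(SEVERE, [(21, MILD), (42, NIL)]))
print(proposed_palliation(NIL, [(21, NIL), (42, MILD)]))
print(proposed_palliation(MODERATE, [(21, MILD)], died_early=True))
```

Keeping 'excluded' separate from 'not palliated' makes explicit how many patients are lost to missing data and how many are counted as treatment failures because of early death.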

CONCLUSION
Symptom palliation is likely to be a major end point in cancer therapy trials when it is unrealistic to aim for cure, as well as in other comparisons of treatments aimed at symptom relief or control (e.g. analgesics, antiemetics). Clinicians would benefit greatly from being able to compare published trial results and extract clinically relevant outcomes to discuss treatment decisions with their patients. Agreeing a definition of palliation in this context is an essential next step, and one that warrants urgent consideration given the amount of QL data currently being generated in palliative treatment trials.
For simplicity in our analyses, we used a single symptom, cough. In analysing the QL aspects of a clinical trial, it would be unwise, because of the dangers of multiple comparisons, as well as unwieldy, to present the results of every individual item. Nevertheless, we would also argue that combining items into subscales or domains can sometimes mask important differences in individual symptoms. The recommended approach is to set out predefined hypotheses, which also will have the advantage of helping to choose the correct questionnaire, deciding on the timing of administration, and of calculating a suitable sample size.
We recognize that our methods are not exhaustive or necessarily the best, but our aim has been to draw attention to the problems inherent in assessing palliation, to tease out the advantages and disadvantages and make some preliminary recommendations.