Clinical relevance of single item quality of life indicators in cancer clinical trials

We investigated the hypothesis that global single-item quality-of-life indicators are less precise for specific treatment effects (discriminant validity) than multi-item scales but similarly efficient for overall treatment comparisons and changes over time (responsiveness), because they reflect the summation of the individual meaning and importance of various factors. Linear analogue self-assessment (LASA) indicators for physical well-being, mood and coping were compared with the Hospital Anxiety and Depression Scale (HAD), the Mood Adjective Check List (MACL) and the emotional behaviour and social interaction scales of the Sickness Impact Profile (SIP) in 84 patients with early breast cancer receiving adjuvant therapy. Discriminant validity was investigated by multitrait-multimethod correlation, responsiveness by the standardized response mean (SRM). Discriminant validity of the indicators was present at baseline but weaker under treatment. Responsiveness was demonstrated by the expected pattern among treatments (P = 0.008). In patients without chemotherapy, the SRMs indicated moderate (0.5–0.8) to large (>0.8) improvements in physical well-being (0.70), coping (0.92), HAD anxiety (0.89) and depression (1.19), and MACL mental well-being (0.68). In patients receiving chemotherapy for the first 3 months, small but clinically significant improvements (>0.2) were seen in mood (0.38), coping (0.41), HAD anxiety (0.31) and MACL mental well-being (0.35). Patients receiving 6 months of chemotherapy showed no changes. The indicators also reflected mood disorders (HAD) and marked psychosocial dysfunction (SIP) at baseline and under treatment according to pre-defined cut-off levels. Global indicators were confirmed to be efficient for evaluating treatments overall and changes over time. The lower reliability of single-item as opposed to multi-item scales primarily affects their discriminant validity. This is less decisive in large samples. © 2001 Cancer Research Campaign http://www.bjcancer.com

In implementing quality-of-life (QL) endpoints in cancer clinical trials, the plea for practical measures has become commonplace. The Australian New Zealand Breast Cancer Trials Group (ANZBCTG) and the International Breast Cancer Study Group (IBCSG) use a limited set of patient-rated indicators for assessing the impact of chemo- and endocrine therapy on QL in breast cancer clinical trials. These are single-item measures in the linear analogue self-assessment (LASA) format (Priestman and Baum, 1976), also known in the social sciences as visual analogue scales (VAS).
The advantages of simple LASA indicators for data collection are clear-cut. However, these measures are generally expected to have lower reliability (i.e., less statistical precision) than sound multi-item measures (McHorney et al, 1992), resulting in lower responsiveness. For example, in an extensive investigation of a LASA indicator for mood (Hürny et al, 1996a), the coarse indicator was less efficient than the multi-item reference scale for detection of chemotherapy side-effects, especially in situations with a low impact, such as completion of chemotherapy.
It is less widely recognized that, in particular situations, single-item scales may be as efficient as multi-item scales. In the study cited above, the responsiveness of the indicator increased with the subjective impact of the clinical event and even exceeded that of the multi-item scale in the case of disease recurrence, a major event (Hürny et al, 1996a). The indicator was probably more influenced by event-related factors other than mood, whereas the multi-item scale, which assesses mood more precisely, was less subject to such influences. In other words, the impaired discriminant validity of the indicator was associated with increased responsiveness. A measure has discriminant validity when it correlates more highly with measures of the concept it is intended to capture than with measures of concepts it is not intended to capture. Responsiveness to chemotherapy and to the course of disease are key criteria for clinical validity.
To further investigate the relationship between discriminant validity and responsiveness of these indicators, we compared them with standard measures of mental well-being and psychosocial functioning in patients with early breast cancer. Our hypothesis was that global single-item indicators are less precise for specific treatment effects (i.e., less discriminant validity) than multi-item scales but similarly efficient for overall treatment comparisons and changes over time because they reflect the summation of the individual meaning and importance of various factors for each patient.

Sample
This study included a consecutive sample of patients with operable breast cancer from Sahlgrenska University Hospital in Göteborg who were randomized into one of the following IBCSG adjuvant therapy trials: Trial VI, for pre- and peri-menopausal, node-positive (N+) patients; Trial VII, for postmenopausal N+ patients; Trial VIII, for pre- and peri-menopausal, node-negative (N-) patients; Trial IX, for postmenopausal N- patients. These trials studied varying schedules of chemotherapy, endocrine therapy and their combinations. The chemotherapy consisted of CMF (cyclophosphamide, methotrexate and 5-fluorouracil); the endocrine therapy was tamoxifen or an LH-RH (luteinizing hormone-releasing hormone) analogue. In patients with conservative surgery (quadrantectomy or lumpectomy), radiotherapy was started 2 weeks after the last chemotherapy course, or within 3 months in the case of endocrine therapy alone. The randomization in Trials VI and VII was stratified by institution, type of surgery and oestrogen receptor (ER) status; that in Trials VIII and IX was stratified by institution, ER status and radiotherapy.
Trials VI and VII started in July 1986 and were closed in April 1993 (Hürny et al, 1996b; International Breast Cancer Study Group, 1996, 1997). Trial VIII started in March 1990 and was closed in October 1999. Trial IX started in October 1988 and was closed in August 1999. For this investigation, patients were enrolled between April 4, 1990, and November 27, 1992. Patient characteristics of the study sample were compared with those of all patients randomized into Trials VI to IX in Sweden between July 22, 1986, and November 24, 1993 (total Swedish sample).

Data collection procedure
Patients were approached by a research nurse within 6 weeks after primary surgery, after randomization into 1 of the 4 IBCSG trials but before the start of adjuvant treatment. In addition to the IBCSG QL form completed in hospital, patients who agreed to participate in this study were asked to fill in a set of additional questionnaires at home and to return them to the local data manager. Both the IBCSG QL form and the additional questionnaires were administered at baseline and at months 3 and 6 of adjuvant therapy. Clinical and sociodemographic data were taken from the documentation of the IBCSG trials.

Indicator and standard measures
Four LASA indicators were incorporated in the IBCSG QL form: physical well-being (PWB) (Priestman and Baum, 1976), mood (Priestman and Baum, 1976; Hürny et al, 1996a) and effort to cope (PACIS) (Hürny et al, 1993) were designed as global indicators, and appetite as a more specific indicator of cytotoxic side-effects (Bernhard et al, 1997). All indicators were scored by measuring in millimetres from 0 to 100 and were reversed, so that higher numbers reflect better QL (e.g., less effort to cope). Concurrent validity (Butow et al, 1991), test-retest reliability (Coates et al, 1990) and responsiveness to chemotherapy (Hürny et al, 1992) have previously been documented. A 28-item adjective checklist for emotional well-being (Bf-S) (Zerssen, 1986) was also included in the IBCSG QL form. The Bf-S was transformed into scores from 0 to 100, with higher numbers reflecting better emotional well-being. In clinical trials, the global indicators have proved particularly relevant endpoints (Coates et al, 1987; Hürny et al, 1996b; Bernhard et al, 1999b), and we build on this experience here.
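The scoring conventions above can be sketched in a few lines of Python. This is a minimal illustration, not the IBCSG scoring procedure: it assumes a 100 mm line, uses a hypothetical flag `better_at_zero` to say whether the 0 mm anchor describes the better state, and, for the checklist, takes a raw range supplied by the user (the Bf-S raw range is not given in the text; the 0-56 range in the example is an assumption).

```python
def lasa_score(mark_mm: float, better_at_zero: bool, line_mm: float = 100.0) -> float:
    """Convert a LASA mark (mm from the left anchor) to 0-100, higher = better QL.

    `better_at_zero` is a hypothetical flag: True when the left anchor (0 mm)
    describes the better state, so the raw distance must be reversed.
    """
    if not 0.0 <= mark_mm <= line_mm:
        raise ValueError("mark must lie on the line")
    score = 100.0 * mark_mm / line_mm
    return 100.0 - score if better_at_zero else score


def rescale_0_100(raw: float, raw_min: float, raw_max: float, higher_is_better: bool) -> float:
    """Linearly map a raw summary score onto 0-100, with 100 = best state."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must exceed raw_min")
    if not raw_min <= raw <= raw_max:
        raise ValueError("raw score outside its range")
    frac = (raw - raw_min) / (raw_max - raw_min)
    return 100.0 * (frac if higher_is_better else 1.0 - frac)


# A coping mark 30 mm from a "no effort" anchor placed on the left
print(lasa_score(30, better_at_zero=True))                # 70.0
# Raw checklist total 14 on an assumed 0-56 scale where higher = worse
print(rescale_0_100(14, 0, 56, higher_is_better=False))  # 75.0
```

Reversal and rescaling are kept as separate functions because the single-item marks and the checklist totals start from different raw metrics but must end on the same "higher = better" 0-100 scale.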
To target the broad construct of psychosocial adaptation, we selected measures covering different domains. Mental well-being was measured by the Mood Adjective Check List (MACL) (Sjöberg et al, 1979). It contains 71 adjectives, which are aggregated into 6 bipolar dimensions: pleasantness/unpleasantness, activation/deactivation, calmness/tension, extraversion/introversion, positive/negative social orientation and control/lack of control. Each dimension and an overall score (MACL TOT) are scored from 1 to 4, with higher numbers reflecting better mood. In various chronic conditions, the first 3 dimensions have proved of particular clinical relevance (Sullivan et al, 1993), and we therefore focused on them.
The Hospital Anxiety and Depression Scale (HAD) (Zigmond and Snaith, 1983) was used as a complement to the MACL. The HAD contains 14 items, which are aggregated into summary scores for anxiety and depression, each ranging from 0 to 21, with higher numbers reflecting more mood disturbance. The validated classification of psychiatric morbidity into non-cases (scores 0-7), possible cases (scores 8-10) and probable cases (scores 11-21) has also been tested for the Swedish version in patients with chronic disease or injury (Sullivan et al, 1993). We dichotomized the scores at a cut-off of ≥8 (possible or probable case).
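The three-level HAD classification and the dichotomization used in the analyses amount to a small lookup. The following minimal Python sketch uses the cut-offs stated above; the function names are hypothetical.

```python
def had_category(score: int) -> str:
    """Classify a HAD anxiety or depression subscale score (0-21)."""
    if not 0 <= score <= 21:
        raise ValueError("HAD subscale scores range from 0 to 21")
    if score <= 7:
        return "non-case"
    if score <= 10:
        return "possible case"
    return "probable case"


def had_is_case(score: int, cutoff: int = 8) -> bool:
    """Dichotomized criterion for known-groups comparisons (score >= 8)."""
    if not 0 <= score <= 21:
        raise ValueError("HAD subscale scores range from 0 to 21")
    return score >= cutoff


print([had_category(s) for s in (5, 8, 11)])  # ['non-case', 'possible case', 'probable case']
print([had_is_case(s) for s in (7, 8)])       # [False, True]
```

With the default cut-off of 8, a "case" in the dichotomized analysis is simply any possible or probable case in the three-level scheme.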
Emotional behaviour (EB) and social interaction (SI), the main psychosocial dimensions of the Sickness Impact Profile (SIP) (Bergner et al, 1981), were chosen to assess health-related dysfunction in personal and social life (Ahlmén et al, 1990; Sullivan et al, 1993). The SIP/EB contains 9 statements indicative of depression, anxiety, low self-esteem and lack of control; the SIP/SI includes 20 statements on the quality and quantity of social interaction within and outside the family. For each dimension, the percentage of maximum dysfunction is calculated according to predetermined weights, ranging from 0 to 100 (most dysfunction). Based on experience in Sweden (Sullivan et al, 1986), limits for no dysfunction (score = 0), slight-to-moderate dysfunction (scores 1-10) and marked dysfunction (scores 11-100) were defined (Augustinsson et al, 1989). We dichotomized the scores at a cut-off of ≥11 (marked dysfunction).
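A minimal Python sketch of the percentage-of-maximum scoring and the Swedish dysfunction levels. The statement weights below are invented placeholders, not the published SIP scale values. Treating non-integer scores between 0 and 11 as slight-to-moderate is our assumption, since the text states integer limits.

```python
def sip_percent_dysfunction(endorsed: set[int], weights: dict[int, float]) -> float:
    """Percentage of maximum dysfunction for one SIP dimension (0-100).

    Sum of the weights of endorsed statements divided by the sum of all
    statement weights in the dimension, times 100.
    """
    unknown = set(endorsed) - set(weights)
    if unknown:
        raise ValueError(f"unknown statement ids: {sorted(unknown)}")
    return 100.0 * sum(weights[i] for i in endorsed) / sum(weights.values())


def sip_level(score: float) -> str:
    """Dysfunction level: none (0), slight-to-moderate (>0 and <11), marked (>=11)."""
    if score == 0:
        return "none"
    return "marked" if score >= 11 else "slight to moderate"


# Hypothetical weights for a 4-statement dimension (illustration only)
weights = {1: 40.0, 2: 30.0, 3: 20.0, 4: 10.0}
score = sip_percent_dysfunction({3}, weights)
print(score, sip_level(score))  # 20.0 marked
```

Because the score is weighted, endorsing a single heavily weighted statement can already place a patient in the "marked" category, which is why the ≥11 cut-off is defined on the percentage rather than on a count of statements.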

Statistical methods
Submission rates of the IBCSG QL form (including the indicator measures) and of the questionnaire set (including the comparison measures) were calculated separately for each time point as the number of questionnaires received divided by the number expected from all patients randomized at the participating hospital during the study period.
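The submission-rate calculation is a per-time-point ratio. The Python sketch below uses hypothetical counts chosen only to illustrate the arithmetic; they are not the study's data.

```python
def submission_rates(expected: dict[str, int], received: dict[str, int]) -> dict[str, float]:
    """Received divided by expected questionnaires, separately for each time point."""
    rates = {}
    for time_point, n_expected in expected.items():
        if n_expected <= 0:
            raise ValueError(f"no questionnaires expected at {time_point}")
        rates[time_point] = received.get(time_point, 0) / n_expected
    return rates


# Hypothetical counts for illustration only
expected = {"baseline": 50, "month 3": 50, "month 6": 50}
received = {"baseline": 48, "month 3": 45, "month 6": 40}
for tp, rate in submission_rates(expected, received).items():
    print(f"{tp}: {rate:.0%}")
# baseline: 96%
# month 3: 90%
# month 6: 80%
```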
Convergent and discriminant validity of the indicators were investigated by multitrait-multimethod correlation analysis (Ahlmén et al, 1990; Sullivan et al, 1993), based on hypotheses about which measures target the same concept (convergent validity) and which target different concepts (discriminant validity). Matrices were computed for baseline and month 6. Correlation coefficients were considered low (<0.4), moderate-to-high (0.4-0.7) and substantial (>0.7) (Ware et al, 1993).
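The classification of correlation coefficients and the discriminant-validity criterion (a measure correlating more strongly with its convergent partner than with measures of other concepts) can be sketched as follows. This minimal Python illustration is our own, with hypothetical function names and toy data. Classifying the absolute value of r is an assumption, made because HAD and SIP scores run in the opposite direction to the indicators.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def classify_r(r: float) -> str:
    """Low (<0.4), moderate-to-high (0.4-0.7) or substantial (>0.7), judged on |r|."""
    a = abs(r)
    if a < 0.4:
        return "low"
    return "moderate-to-high" if a <= 0.7 else "substantial"


def shows_discriminant_pattern(scores: dict[str, list[float]], target: str,
                               convergent: str, others: list[str]) -> bool:
    """True if |r(target, convergent)| exceeds |r(target, o)| for every o in others."""
    r_conv = abs(pearson_r(scores[target], scores[convergent]))
    return all(r_conv > abs(pearson_r(scores[target], scores[o])) for o in others)


# Hypothetical toy scores
scores = {"mood": [1, 2, 3, 4, 5], "checklist": [2, 4, 6, 8, 10], "social": [2, 1, 2, 1, 2]}
print(classify_r(pearson_r(scores["mood"], scores["social"])))               # low
print(shows_discriminant_pattern(scores, "mood", "checklist", ["social"]))  # True
```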
Responsiveness to chemotherapy and changes over time were assessed by the standardized response mean (SRM: mean change divided by the SD of this change) (Liang et al, 1990; Katz et al, 1992). Randomized treatment assignments were grouped across the whole study sample separately for the first 3 and 6 months on study, as shown in Table 1. Chemo-endocrine therapy was grouped together with chemotherapy. The SRMs were interpreted as trivial (<0.2), small (0.2-0.5), moderate (0.5-0.8) or large (>0.8) effect sizes (Cohen, 1977). We used Cohen's criteria as an illustrative measure and compared his threshold for a small effect with the minimal clinically significant change defined in an adjuvant breast cancer trial using the same indicators (Hürny et al, 1996b).
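The SRM and the effect-size labels can be computed as in the following minimal Python sketch. Two details are our assumptions rather than statements from the text. First, the SD of change is the sample SD (n - 1 denominator). Second, each lower boundary is treated as inclusive, so an SRM of exactly 0.8 counts as large, as in the Results. For scales where higher scores mean worse status (HAD, SIP), the sign is flipped so that improvement is positive.

```python
from statistics import mean, stdev


def srm(baseline: list[float], follow_up: list[float], higher_is_better: bool = True) -> float:
    """Standardized response mean: mean within-patient change / SD of that change."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least 2 patients")
    changes = [f - b for b, f in zip(baseline, follow_up)]
    sd = stdev(changes)  # sample SD of the individual change scores
    if sd == 0:
        raise ValueError("SD of change is zero; SRM undefined")
    value = mean(changes) / sd
    # Flip the sign for scales where a decrease means improvement (e.g. HAD)
    return value if higher_is_better else -value


def cohen_label(effect: float) -> str:
    """Trivial (<0.2), small (0.2-0.5), moderate (0.5-0.8) or large (>=0.8)."""
    a = abs(effect)
    if a < 0.2:
        return "trivial"
    if a < 0.5:
        return "small"
    return "moderate" if a < 0.8 else "large"


# Hypothetical paired scores for four patients (illustration only)
value = srm([50, 60, 40, 70], [60, 65, 55, 70])
print(round(value, 2), cohen_label(value))  # 1.16 large
```

Unlike an effect size based on the baseline SD, the SRM divides by the spread of the change scores themselves, so it rewards measures whose changes are consistent across patients.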
We expected a substantial improvement in QL, reflecting adaptation to the disease, in patients receiving endocrine therapy only; a smaller improvement, reflecting treatment burden, in those receiving chemotherapy for the first 3 months; and no improvement in those receiving chemotherapy for the first 6 months (Hürny et al, 1996a, 1996b; Bernhard et al, 1997). Among the indicators, adaptation was expected to be expressed most clearly in the coping (PACIS) scores, and chemotherapy side-effects in the coping and physical well-being scores. The 6-month grouping was selected as the primary comparison and tested across all measures by the Friedman test. Because this test uses only the rank pattern, it is robust against variation in the individual measurements. A sample size of n = 70 was considered sufficient.
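One reading of this test treats each measure as a block and ranks the SRMs of the 3 treatment groups within it. The Python sketch below illustrates that reading. It is a minimal textbook implementation of the Friedman statistic without tie correction, with the chi-square p-value in closed form, which holds only for 3 groups (2 degrees of freedom). The SRM values are hypothetical and do not reproduce the reported P = 0.008. With few blocks, the chi-square approximation is rough.

```python
from math import exp


def average_ranks(values: list[float]) -> list[float]:
    """Ranks 1..k within one block; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(blocks: list[list[float]]) -> tuple[float, list[float]]:
    """Friedman chi-square statistic (no tie correction) and rank sums per group."""
    n, k = len(blocks), len(blocks[0])
    if n < 2 or k < 2 or any(len(b) != k for b in blocks):
        raise ValueError("need >= 2 blocks, each with the same k >= 2 groups")
    rank_sums = [0.0] * k
    for block in blocks:
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, rank_sums


def p_value_2df(q: float) -> float:
    """Upper tail of chi-square with 2 df, exp(-q/2); valid only for k = 3 groups."""
    return exp(-q / 2)


# Hypothetical SRMs per measure: (no early chemo, 3-month chemo, 6-month chemo)
blocks = [[0.7, 0.3, 0.1], [0.9, 0.4, 0.0], [0.8, 0.4, 0.2],
          [0.6, 0.35, -0.2], [1.1, 0.3, 0.1], [0.75, 0.46, 0.05]]
q, sums = friedman_statistic(blocks)
print(sums, round(q, 2), round(p_value_2df(q), 4))  # [18.0, 12.0, 6.0] 12.0 0.0025
```

Only the ordering within each measure enters the statistic, which is the sense in which the test is robust against variation in the individual SRMs.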
As a further aspect of clinical validity, we explored whether the indicators distinguish subgroups of patients defined by their levels of mental well-being and psychosocial functioning. The HAD and SIP scores were chosen as criterion measures.
Known-groups comparisons based on the dichotomized absolute criterion scores were performed at baseline and at month 6. Lines indicating 95% confidence intervals (CI) around the observed group means were plotted to show the consistency of the patterns.
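The following minimal Python sketch shows such a known-groups comparison. Indicator scores are split at the criterion cut-off, and approximate 95% CIs of the group means are checked for overlap. The normal quantile (about 1.96) is a simplification chosen to stay within the standard library. The text does not say how the CIs were computed, and for very small groups a t quantile would give wider intervals. The data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci95(values: list[float]) -> tuple[float, float]:
    """Approximate 95% CI of a mean (normal quantile; a t quantile is wider for small n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 values")
    half = NormalDist().inv_cdf(0.975) * stdev(values) / sqrt(n)
    m = mean(values)
    return m - half, m + half


def known_groups(indicator: list[float], criterion: list[float], cutoff: float):
    """Split indicator scores by criterion >= cutoff ('case') and compare 95% CIs."""
    if len(indicator) != len(criterion):
        raise ValueError("indicator and criterion must be paired")
    cases = [s for s, c in zip(indicator, criterion) if c >= cutoff]
    non_cases = [s for s, c in zip(indicator, criterion) if c < cutoff]
    ci_non, ci_case = mean_ci95(non_cases), mean_ci95(cases)
    overlap = ci_case[0] <= ci_non[1] and ci_non[0] <= ci_case[1]
    return ci_non, ci_case, overlap


# Hypothetical mood scores (0-100) and HAD anxiety scores, cut-off >= 8
mood = [70, 75, 80, 72, 78, 30, 35, 40, 32]
had_anxiety = [2, 3, 5, 4, 1, 12, 9, 14, 10]
ci_non, ci_case, overlap = known_groups(mood, had_anxiety, cutoff=8)
print(overlap)  # False
```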

Sample description and baseline scores
During the study period, 101 patients were randomized into IBCSG Trials VI to IX at Sahlgrenska University Hospital and were asked to participate in this additional investigation. Of these, 88 (87%) agreed, but 4 were ineligible for the IBCSG trials. In the remaining sample (n = 84), the submission rate of both the IBCSG QL form including the indicators and the questionnaire set including the comparison measures was 96% at baseline, 90% at month 3 and 89% at month 6. At each time point, the sample size varied by QL measure because of missing data in returned questionnaires (LASA indicators: 0-3%; Bf-S: 4-13%; comparison measures: 0-2%, by measure and time point).
Biomedical and sociodemographic characteristics of the study sample and of the total Swedish sample are summarized in Table 1. A majority of patients in the study sample underwent chemotherapy during the first 6 months on study. Eleven percent of all patients started radiotherapy before month 3, and 21% between months 3 and 6. No case of disease recurrence was registered within the first 6 months.
The baseline scores of the indicators are shown in Table 2. The 2 samples showed comparable scores. A tendency toward higher scores (i.e., better QL) was present in all indicators. Overall, the mood and coping scores were most impaired.

Convergent and discriminant validity
The multitrait-multimethod matrix for the scores at baseline and month 6 is shown in Table 3. Measures targeting the same (convergent) versus different concepts (discriminant validity) were investigated both within the indicators and standard measures and among all measures. Note that convergent measures included in the same questionnaire are generally expected to be more highly correlated than those in separate questionnaires.
Regarding the indicators at baseline, the 2 measures of emotional well-being (mood, Bf-S) showed the highest correlation (r = 0.72). The PACIS was moderately related to both physical and emotional measures (0.42-0.62), suggesting that it reflects a separate construct. Among the standard measures, emotional scales of different instruments (MACL subscales, HAD anxiety and depression, SIP emotional behaviour) were more closely correlated with each other (0.38-0.76) than with SIP social interaction (0.29-0.53).
Taking into account both indicator and standard measures at baseline, the mood indicator was most strongly correlated with MACL pleasantness (r = 0.77) and total score (r = 0.71) and with HAD depression (r = 0.69). The complementary adjective checklist Bf-S showed the same pattern with higher correlations. The indicators for coping (0.53-0.65) and physical well-being (0.42-0.62) showed lower correlations with the mental well-being measures than the mood indicator (0.61-0.77), and they were more highly correlated with mental well-being and emotional functioning than with social functioning; SIP social interaction was only marginally associated.
The matrix at month 6 is based on scores from patients with different adjuvant treatments. The correlation coefficients were not adjusted for treatment group, in order to investigate discriminant validity under treatment overall. Among the indicators, mood and Bf-S were both strongly correlated with physical well-being (r = 0.80 and 0.78, respectively). Coping was again moderately correlated with both physical and emotional measures (0.48-0.62). Among the standard measures, emotional scales of different instruments (MACL subscales, HAD anxiety and depression, SIP emotional behaviour) were again more closely correlated with each other (0.48-0.84) than with SIP social interaction (0.43-0.66). The coefficients between corresponding indicator and standard measures were generally lower than at baseline. The mood indicator was again most strongly, but only moderately, correlated with MACL pleasantness (r = 0.61) and total score (r = 0.60) and showed the same pattern of relationships as the Bf-S. Coping was again more highly correlated with mental well-being and emotional functioning (0.52-0.63) than with SIP social interaction (r = 0.45). Physical well-being showed a similar pattern, despite its high correlation with mood.
In summary, the correlation analyses at baseline showed convergent and discriminant patterns among the indicator and standard measures in accordance with their constructs. In contrast to the standard measures, the patterns among the indicators showed less discriminant validity under treatment than at baseline.
Table 3 notes: All scales range from 0 to 100, with higher scores indicating better QL. The Pearson correlation coefficients above the diagonal represent the scores at baseline, those below the diagonal the scores at month 6. Correlation coefficients are considered low (<0.4), moderate-to-high (0.4-0.7) and substantial (>0.7; in bold). Varying sample size is due to partial missing data. (b) The Bf-S is a 28-item scale included in the indicator form as an internal reference measure.

Responsiveness to chemotherapy and changes over time
Responsiveness of the indicators and the mental well-being measures to chemotherapy and changes over time was evaluated over the first 3 and 6 months from randomization. The mood indicator provided reference data for a minimal clinically significant change. In IBCSG Trial VII, postmenopausal patients who did not receive prior chemotherapy showed an average within-patient deterioration of 3.6% of the full scale range (i.e., 0-100; P = 0.05) at the beginning of delayed chemotherapy (Hürny et al, 1996b). This effect corresponds to an SRM of 0.14 in the group without early chemotherapy and of 0.18 in the group with chemotherapy for 6 months, close to the threshold value of 0.2 for a small effect (Figures 1 and 2). The chemotherapy group included different treatment schedules, and the number of patients in each group was small; therefore, only the main effects should be interpreted.
For the 3-month comparison, the 4 indicators were compared with the HAD anxiety and depression scales, the MACL pleasantness, activity and calmness scales and the MACL total score. This comparison included patients receiving chemotherapy (with or without endocrine therapy) versus endocrine therapy only for the first 3 months. Figure 1 shows the SRMs separately for each measure and treatment group between baseline and month 3. In patients without early chemotherapy, the SRMs indicated the expected improvement of at least moderate degree in all measures except appetite (0.01) and MACL activity (0.35). Large effects were present for coping (1.33) and MACL pleasantness (0.80). In patients receiving chemotherapy in this period, the SRMs indicated a small improvement in mood (0.31), coping (0.21), MACL pleasantness (0.27), calmness (0.37) and total score (0.27). Physical well-being did not change and thus differed from the emotional measures, in agreement with the MACL activity scale. The coping indicator was most responsive to the presence or absence of chemotherapy, followed by physical well-being and the HAD depression scale. For the 6-month comparison, we selected those measures with an SRM of at least moderate degree in either group of the 3-month comparison.
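Because the SRM is the mean change divided by the SD of the change, the reference values from Trial VII (a 3.6-point change corresponding to SRMs of 0.14 and 0.18) imply how widely individual changes were spread. They also imply how large a mean change would be needed to reach the small-effect threshold of 0.2. The back-of-envelope Python sketch below simply inverts the SRM definition. It assumes that the same 3.6-point change underlies both SRMs, which is our reading of the text. The function names are hypothetical.

```python
def implied_sd_of_change(mean_change: float, srm: float) -> float:
    """Invert SRM = mean change / SD of change to recover the SD of change."""
    if srm == 0:
        raise ValueError("SRM must be non-zero")
    return mean_change / srm


def change_for_srm(target_srm: float, sd_change: float) -> float:
    """Mean change (scale points) needed to reach a given SRM."""
    return target_srm * sd_change


# 3.6 points on the 0-100 scale, corresponding to SRMs of 0.14 and 0.18
for group, reported_srm in [("no early chemotherapy", 0.14), ("6 months chemotherapy", 0.18)]:
    sd = implied_sd_of_change(3.6, reported_srm)
    print(group, round(sd, 1), round(change_for_srm(0.2, sd), 1))
# no early chemotherapy 25.7 5.1
# 6 months chemotherapy 20.0 4.0
```

On this reading, a mean change of roughly 4 to 5 points on the 0-100 scale would be needed to cross the 0.2 threshold in this sample, which is consistent with the text's statement that the clinically derived change lies close to Cohen's small-effect boundary.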
This comparison included patients receiving chemotherapy (with or without endocrine therapy) for 6 months versus chemotherapy (with or without endocrine therapy) for the first 3 months versus endocrine therapy only for the first 6 months. Figure 2 shows the SRMs separately for the selected measures and the 3 treatment groups between baseline and month 6. In patients without early chemotherapy, the SRMs showed the expected improvement of moderate to large degree in all measures except mood (SRM = 0.35). In patients receiving chemotherapy for the first 3 months, the SRMs indicated the same pattern of small improvement as over the first 3 months, with comparable responsiveness of the indicators for mood (0.38) and coping (0.41), the MACL pleasantness (0.37), calmness (0.46) and total score (0.35), and HAD anxiety (0.31). In patients receiving chemotherapy for 6 months, the SRMs were again smaller, except for mood (0.49), indicating no or only small changes. The only deterioration relative to baseline, a small one, was noted for MACL calmness (-0.23). This scale was the most responsive to the distinction between 3 and 6 months of chemotherapy, followed by MACL pleasantness and HAD anxiety. The predicted dose-response pattern was present both for the standard measures (HAD anxiety, MACL) and for the indicators (physical well-being, coping) (P = 0.008).
In summary, the indicators reflected the presence or absence of early chemotherapy at least as well as the standard measures but were less sensitive to the duration of chemotherapy.

Distinguishing groups by levels of mental distress and psychosocial dysfunction
The indicators' responsiveness to clinically validated levels of mental distress and psychosocial dysfunction (i.e., 'case' versus 'non-case') was investigated for scores at baseline and month 6. The evaluation at month 6 was based on absolute scores without adjustment for baseline or treatment. Figure 3 shows the indicator scores according to HAD anxiety (non-cases: n = 54; cases: n = 27) and depression (non-cases: n = 74; cases: n = 7) at baseline. All of the indicators reflected the presence or absence of a possible or probable mood disorder in the expected direction. The only marked overlap of confidence intervals between non-cases and cases concerned physical well-being classified by depression, where the number of cases was at the lower limit for this type of illustration. The distinction was most pronounced for mood and coping. Patients with anxiety at or above the cut-off level reported similar levels of mood (mean = 43) and coping (mean = 37), which were markedly lower than those of the non-cases (mood: mean = 71; coping: mean = 68). Depression yielded similar figures, with low levels in cases (mood: mean = 28; coping: mean = 30) and substantially higher levels in non-cases (mood: mean = 65; coping: mean = 61). Figure 4 shows the indicator scores according to SIP emotional behaviour (non-cases = no or slight-to-moderate dysfunction: n = 56; cases = marked dysfunction: n = 25) and social interaction (non-cases: n = 67; cases: n = 14) at baseline. Regarding emotional behaviour, all of the indicators reflected the presence or absence of dysfunction in the expected direction. Mood and coping were again most sensitive. Cases with marked emotional dysfunction reported low levels of mood (mean = 47) and coping (mean = 42), which contrasted clearly with those of the non-cases (mood: mean = 68; coping: mean = 65).
There were similar findings regarding social interaction (SIP), with some overlap of confidence intervals for physical well-being, mood and coping, probably due to the small number of cases.
At month 6, the findings were consistent with those at baseline. All of the indicators reflected the absence (n = 54) or presence (n = 21) of anxiety (HAD) and the absence (n = 58) or presence (n = 17) of marked emotional dysfunction (SIP), with the largest differences again for mood and coping. There was some overlap of confidence intervals in all indicators regarding non-cases (n = 68) and cases (n = 7) of depression (HAD) and non-cases (n = 63) and cases (n = 12) of marked social dysfunction (data not shown).

DISCUSSION
In a consecutive Swedish sample of patients with early breast cancer, we investigated the hypothesis that global QL indicators assessed with single items have less discriminant validity and thus are less precise for specific treatment effects than multi-item scales but similarly efficient for overall treatment comparisons and changes over time.
The standard measures indicated the expected impairment in this situation (Fallowfield et al, 1990; Maunsell et al, 1996), characterized by anxiety rather than depression (Maraste et al, 1992). At baseline, the indicators and standard measures showed convergent and discriminant patterns in accordance with their constructs. For example, the mood indicator showed the same pattern of correlations with the standard measures as the adjective checklist for emotional well-being (Bf-S), but lower correlations as a consequence of its lower reliability.
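The argument that lower reliability lowers observed correlations follows from the classical attenuation formula (Spearman). Under that formula, the observable correlation equals the true-score correlation multiplied by the square root of the product of the two reliabilities. The minimal Python sketch below uses hypothetical reliabilities, not estimates from this study.

```python
from math import sqrt


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Observable correlation given the true-score correlation and two reliabilities."""
    for rel in (rel_x, rel_y):
        if not 0 < rel <= 1:
            raise ValueError("reliabilities must lie in (0, 1]")
    return true_r * sqrt(rel_x * rel_y)


def disattenuated_r(observed_r: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation (inverse of attenuated_r)."""
    for rel in (rel_x, rel_y):
        if not 0 < rel <= 1:
            raise ValueError("reliabilities must lie in (0, 1]")
    return observed_r / sqrt(rel_x * rel_y)


# Hypothetical reliabilities: single item 0.6, multi-item checklist 0.9, reference scale 0.9
true_r = 0.8
print(round(attenuated_r(true_r, 0.6, 0.9), 2))  # 0.59 (single item vs reference)
print(round(attenuated_r(true_r, 0.9, 0.9), 2))  # 0.72 (checklist vs reference)
```

The same pattern of correlations with lower absolute values, as seen for the mood indicator relative to the Bf-S, is what this formula predicts when two measures share a construct but differ in reliability.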
Under treatment, these patterns were less convergent and less discriminant than at baseline. In particular, mood and physical well-being were substantially correlated. Evidently, the indicator and standard measures were affected differently by the various treatment regimens. To obtain an overall impression, we did not adjust this analysis for treatment. The question is how the lower discriminant validity of the indicators affects their responsiveness to chemotherapy and changes over time.
In patients without chemotherapy, both the global indicators and the standard measures reflected adaptation to the disease. Among the indicators, this change was most clearly expressed in perceived coping effort, the most subjective measure (Hürny et al, 1993). This finding suggests a summative effect of various factors.
In patients receiving 3 months of chemotherapy, responsiveness was comparable between the indicators for mood and coping and the MACL pleasantness and calmness scales, whereas the HAD depression scores were almost stable. Reflecting the treatment burden, patients receiving chemotherapy for 6 months showed no improvement, with the exception of mood. This indicator and that for physical well-being showed clearly different patterns, despite sharing an unusually high proportion of variance (64%, the square of their month-6 correlation of r = 0.80).
Overall, the standard measures reflected the distinction between 3 and 6 months of chemotherapy better than the indicators. However, in regard to the more sharply contrasting situations of patients with and without chemotherapy, the indicators showed at least comparable performance. In other words, their lower discriminant validity did not result in less responsiveness to 2 markedly different clinical situations.
From a psychometric point of view, the lower precision of the indicators under treatment calls their validity as outcome measures into question. This may be less so from a clinical point of view. It is common sense that patients' perception of disease and treatment burden is more global in nature than subdivided into the highly specific domains assessed by the standard measures. The contributing clinical factors may change substantially over time. The pattern of response among the indicators again suggests a summative effect reflecting the individual meaning and importance of various factors.
This property may also explain the responsiveness of the indicators to mental distress and psychosocial dysfunction. All indicators reflected the criterion measures in the expected direction and, in accordance with their constructs, the indicators for mood and coping were the most sensitive. Adjustment to breast cancer is known to be associated with mental distress and psychiatric morbidity (Watson et al, 1991), irrespective of any causal interaction. Physical well-being and appetite also reflected the criterion measures well. Patients under higher psychological distress are expected to report more physical symptoms (Watson and Pennebaker, 1989), as shown in the setting of adjuvant therapy for breast cancer (Manne et al, 1994). Serious psychosocial impairment is only partly determined by cytotoxic side-effects and is influenced by multiple individual factors, such as a history of depression (Maunsell et al, 1992). With relatively mild regimens, as in this study, there is evidence that adaptation to the disease is more important for patients' QL than cytotoxic side-effects (Hürny et al, 1996b). Identifying patients with poor adaptation is relevant for subgroup analyses, for example in developing risk-adapted treatment strategies or supportive interventions. Given the large sample sizes of phase III trials, these indicators may carry this type of information sufficiently well. However, the sensitivity of the HAD as a screening instrument has recently been questioned, especially regarding depression (Hall et al, 1999). Our findings have to be interpreted within these limitations.
The evaluation of single-item measures has frequently been restricted to cross-sectional comparison with standard measures (McCormack et al, 1988; Cunny and Perri, 1991). McHorney et al investigated how precisely different methods for measuring general health status discriminated between different groups of patients (McHorney et al, 1992). Their results suggested that roughly twice the sample size would be required for a single-item measure to achieve the precision of a long-form (multi-item) measure. A cancer clinical trial is a different situation: given that disease and treatment factors may change considerably, only a longitudinal comparison gives sufficient information to judge the properties of QL measures.
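A standard psychometric result shows why a less reliable measure needs a larger sample. Random measurement error inflates the observed SD, so the observable standardized effect shrinks by the square root of the reliability, and the required sample size grows in proportion to 1/reliability. The Python sketch below applies this to the usual normal-approximation sample-size formula for a two-group comparison. The effect size and reliabilities are hypothetical, and this is not the method used by McHorney et al. It only shows that halving reliability roughly doubles the required n.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group, two-sided two-group comparison."""
    if d <= 0:
        raise ValueError("standardized effect must be positive")
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


def n_with_reliability(d_true: float, reliability: float, **kwargs) -> int:
    """Random error inflates the observed SD, shrinking d by sqrt(reliability)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    return n_per_group(d_true * reliability ** 0.5, **kwargs)


# Hypothetical: true effect d = 0.5; multi-item reliability 0.90, single item 0.45
print(n_with_reliability(0.5, 0.90), n_with_reliability(0.5, 0.45))  # 70 140
```

The same arithmetic underlies the closing argument of this paper: in phase III trials with several hundred patients per arm, the extra sample size needed to offset a single item's lower reliability is often already available.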
The rather small sample limits the generalization of our findings, although it was sufficiently large to demonstrate the expected pattern, in agreement with previous studies in early (Hürny et al, 1996b) and advanced disease (Coates et al, 1987). The sensitivity of these indicators to performance status (Bernhard et al, 1999a), disease recurrence (Hürny et al, 1996b) and tumour response (Bernhard et al, 1999b), together with their prognostic value for survival, adds evidence of their clinical relevance.
In conclusion, LASA indicators were confirmed to be responsive to cytotoxic side-effects, mental distress and psychosocial dysfunction in patients with early breast cancer. Consistent with our hypothesis, the lower reliability of single-item as opposed to multi-item scales affects their discriminant validity more than their responsiveness. This is less decisive for treatment comparisons with large samples.