Evaluating treatment strategies in advanced Waldenström macroglobulinemia: use of quality-adjusted survival analysis


Abstract

A randomized phase II multicenter clinical trial comparing the efficacy of fludarabine (FAMP) to that of the combination of cyclophosphamide, doxorubicin and prednisone (CAP) in 92 patients with Waldenström's macroglobulinemia in first relapse or with primarily resistant disease was conducted on behalf of the ‘Groupe Coopératif Macroglobulinémie’. The main analysis of this study failed to demonstrate a clear-cut benefit of FAMP in terms of overall survival (OS), although a significant benefit in terms of time to disease progression and event-free survival (EFS) was noted. In this rare disorder, where few randomized trials have been conducted, we took advantage of this trial to assess treatment differences while integrating quality of life considerations. We thus performed a quality-adjusted survival analysis, using the quality-adjusted time without symptoms or toxicity (Q-TWiST) approach. Four health states differing in terms of quality of life (QoL) were defined, namely treatment-related toxicity, treatment free of toxicity, no treatment or symptoms, and relapse. The average times spent in these health states (TOX, CT, TWiST and REL, respectively) were then weighted by utility coefficients reflecting their QoL value relative to that of TWiST and summed to give the so-called Q-TWiST. No difference was found between randomized groups in terms of mean CT. Mean TOX was also similar in the two groups, except when alopecia was considered a relevant toxic event. By contrast, mean TWiST was 5.9 months longer in the FAMP group than in the CAP group (P = 0.006). Unsurprisingly, given the absence of difference in OS but the difference in EFS in favor of the FAMP group, mean REL was 6.8 months longer in the CAP group (P = 0.047). As a result, the benefit of FAMP in terms of average Q-TWiST depended only on the value of the utility coefficient attributed to REL (UREL), with a significant benefit when UREL ranged from 0 to 0.28, ie in patients experiencing poor QoL after relapse, which is likely.


Introduction

Waldenström's macroglobulinemia (WM) is a rare malignant disease characterized by the proliferation of well-differentiated plasmacytic lymphocytes producing a monoclonal immunoglobulin (IgM). Historically, first-line therapy consisted of alkylating agents with or without steroids.1 Recent studies have suggested the efficacy of purine analogs, such as fludarabine and 2-chlorodeoxyadenosine, in previously untreated and treated patients. However, no consensus exists for the treatment of patients with either first relapse or primary resistant disease.2,3,4,5,6,7,8 The ‘Groupe Coopératif Macroglobulinémie’ conducted a randomized phase II multicenter clinical trial comparing the efficacy of fludarabine (FAMP) to that of cyclophosphamide, doxorubicin and prednisone (CAP) in 92 patients with WM in first relapse or with primarily resistant disease.

The main statistical analysis of this trial, performed at the reference date of 16 April 1999, failed to show any clear-cut benefit of FAMP over CAP in terms of overall survival (P = 0.62), although a significant benefit in terms of time to disease progression (P < 0.001) and event-free survival (P < 0.001) was reported.

This study addressed the potential difference in the quantity of survival between CAP and FAMP, and was not originally designed to provide insight into the quality of that survival.

Secondary end points such as ability to tolerate treatment, the impact of treatment on functional status and overall quality of life (QoL) may distinguish one choice from the other, particularly when treatment options achieve similar long-term survival. In order to integrate QoL considerations into the treatment comparison, quality-adjusted survival analyses have been proposed.9,10,11,12 They allow ordered end points such as treatment-related toxicity, disease progression and death to be combined as a basis for comparing the effect of different treatments on the average course for patients. The so-called quality-adjusted time without symptoms and toxicity (Q-TWiST) methodology consists of estimating the duration of health states that differ in terms of QoL and weighting these durations according to their respective QoL to derive a quality-adjusted survival end point. We decided to retrospectively derive quality-adjusted survival in the setting of this randomized phase II trial.

Materials and methods

The study design and the main results of the trial have been reported elsewhere13 and will only be briefly summarized here.

From December 1993 to December 1997, 92 patients (pts) with resistant or relapsing Waldenström's macroglobulinemia were randomly allocated to receive monthly courses of either fludarabine (FAMP) (46 pts) or CAP (cyclophosphamide, doxorubicin, prednisone) (46 pts). Response was evaluated after six courses of chemotherapy, as follows: treatment was considered a failure in case of mixed response, no response, or progressive disease after completion of the protocol, as well as in case of progressive disease after three courses of chemotherapy or withdrawal from the study. Clinical and biological criteria of response are detailed in the main report of this trial.13

For each course of chemotherapy, duration and intensity of toxic events were collected by the referring physician, using the WHO scale for each organ or symptom. Toxic events were considered in the analysis when their intensity was such that they had a possible impact on QoL, as defined in the Appendix.

Derivation of the Q-TWiST

Quality-adjusted survival is based on the partitioning of survival times in distinct successive time lengths according to relevant health states. Three states are usually defined: time with treatment-related toxicity (TOX), time without disease symptoms and toxicity (TWiST) and time from treatment failure to death (REL). We defined duration of treatment without treatment-related toxicity (CT) as an additional clinically relevant health state.

Each duration is then weighted according to patients' preferences. This yields the so-called Q-TWiST, given by:

Q-TWiST = (UTOX × TOX) + (UCT × CT) + TWiST + (UREL × REL)

where TOX is the duration of treatment with toxic side-effects (as detailed below), CT is the duration of treatment without toxicity, TWiST is the time without disease symptoms and treatment-related toxicity, and REL is the time from treatment failure to death. UTOX, UCT and UREL are utility weights on a scale from 0 (as bad as death) to 1 (as good as TWiST), reflecting the value of time spent in the health states TOX, CT and REL, respectively, relative to TWiST.
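As a minimal sketch of the weighted sum above, assuming all durations share one time unit, the following Python function evaluates Q-TWiST for one set of utility weights. The function name `q_twist` and the example durations are illustrative and are not taken from the trial.

```python
def q_twist(tox, ct, twist, rel, u_tox, u_ct, u_rel):
    """Quality-adjusted survival from four health-state durations.

    All durations must use the same time unit. Utilities lie in [0, 1];
    time in TWiST carries a weight of 1 by definition.
    """
    for name, u in (("u_tox", u_tox), ("u_ct", u_ct), ("u_rel", u_rel)):
        if not 0.0 <= u <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {u}")
    return u_tox * tox + u_ct * ct + twist + u_rel * rel


# Hypothetical durations in months, chosen only to illustrate the arithmetic.
example = q_twist(tox=1.0, ct=4.0, twist=10.0, rel=20.0,
                  u_tox=0.2, u_ct=0.5, u_rel=0.25)
```

With all three utilities set to 1 the function returns the plain sum of the four durations, and with all three set to 0 it returns TWiST alone, which brackets the range explored by a threshold analysis.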

Statistical analysis

The Q-TWiST analysis was performed on the basis of the intent-to-treat and reference date (16 April 1999) principles. The mean duration of each health state (TOX, CT, TWiST and REL) was estimated from the clinical trial data, and treatments were compared in terms of average Q-TWiST.14

Time to each relevant health state was estimated by the Kaplan–Meier method.15 We estimated time with toxicity (TOX) from the recorded individual duration of toxic events, as the number of days with at least one relevant toxicity, as defined in the Appendix. However, since effective duration of alopecia was not recorded, two analyses were performed: (1) excluding alopecia from the set of possible toxic events and (2) estimating mean duration of alopecia at 90 days (see Appendix). Overall duration of treatment was estimated from the first day of the first chemotherapy to either the last day of the final course of chemotherapy or the last day of toxicity. Time to treatment failure and overall survival (OS) were estimated from randomization.
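The day-counting rule for TOX can be sketched as follows. This is an illustration under stated assumptions: episodes are represented as inclusive (first day, last day) pairs, a data layout chosen here and not described in the article, and the function names are hypothetical. The second function shows the alternative computation mentioned in the Results, in which days are summed across toxicities so that overlapping days count more than once.

```python
def days_with_any_toxicity(episodes):
    """Count distinct days covered by at least one toxic episode.

    `episodes` is a list of (first_day, last_day) pairs, inclusive, in days
    since the start of treatment. Overlapping episodes are counted once.
    """
    covered = set()
    for first_day, last_day in episodes:
        if last_day < first_day:
            raise ValueError("episode ends before it starts")
        covered.update(range(first_day, last_day + 1))
    return len(covered)


def summed_toxicity_days(episodes):
    """Sum the length of every episode, counting overlapping days repeatedly."""
    return sum(last_day - first_day + 1 for first_day, last_day in episodes)


# Hypothetical patient: two overlapping episodes and one isolated day.
episodes = [(1, 3), (2, 5), (10, 10)]
distinct_days = days_with_any_toxicity(episodes)   # days 1-5 and day 10
total_days = summed_toxicity_days(episodes)        # 3 + 4 + 1
```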

Restricted mean time spent in each health state was estimated separately for each treatment arm by the area under the non-parametric Kaplan–Meier curve, with data truncation at median follow-up interval, ie median time elapsed from inclusion to reference date (43 months), as previously recommended.14,16,17,18 Mean duration of CT was estimated by difference between mean durations of treatment and TOX, respectively, while mean duration of REL was estimated by difference between mean OS and mean time to treatment failure.
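The restricted mean described above is the area under the Kaplan–Meier step function up to the truncation time. The following standard-library sketch illustrates the idea; it is not the authors' SAS or Splus code, and the function names and toy data are assumptions made for the example.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate).

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def restricted_mean(times, events, horizon):
    """Area under the Kaplan-Meier step function from 0 to `horizon`."""
    area = 0.0
    previous_time = 0.0
    survival = 1.0
    for t, s in kaplan_meier(times, events):
        if t >= horizon:
            break
        area += survival * (t - previous_time)
        previous_time, survival = t, s
    return area + survival * (horizon - previous_time)


# Toy data (months): three patients, the second one censored at 4 months.
toy_mean = restricted_mean([2, 4, 6], [1, 0, 1], horizon=10)
```

Under this scheme, mean REL would be obtained as `restricted_mean` of overall survival minus `restricted_mean` of time to treatment failure, both truncated at the same horizon (43 months in the trial).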

Since no patient-level data were available to estimate utility weights, results are presented as threshold utility analyses over the range of the different utility weights. Two-sided P values are given using a t-test based on observed differences between treatments, with standard errors computed by the bootstrap method (based on 999 independent replications).14 Statistical analysis was performed on SAS (SAS Inc, Cary, NC, USA) and Splus (MathSoft Inc, Seattle, WA, USA) software packages.
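The bootstrap step can be sketched as below. Several simplifications are assumptions of this example and not the authors' procedure: the statistic is a plain mean of uncensored values (the trial used restricted means from censored data), each arm is resampled independently, and the P value uses a normal approximation to the test statistic.

```python
import random
from statistics import NormalDist, mean, stdev


def bootstrap_se_of_difference(group_a, group_b, statistic=mean,
                               replications=999, seed=0):
    """Bootstrap standard error of statistic(group_a) - statistic(group_b).

    Each arm is resampled with replacement, independently of the other.
    """
    rng = random.Random(seed)
    differences = []
    for _ in range(replications):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        differences.append(statistic(resample_a) - statistic(resample_b))
    return stdev(differences)


def two_sided_p(observed_difference, standard_error):
    """Two-sided P value from a normal approximation (an assumption here)."""
    z = abs(observed_difference) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical per-patient quality-adjusted durations (months) in two arms.
arm_a = [12.0, 30.5, 8.2, 41.0, 19.7, 25.3]
arm_b = [10.1, 14.8, 6.0, 22.4, 9.9, 17.2]
se = bootstrap_se_of_difference(arm_a, arm_b)
p_value = two_sided_p(mean(arm_a) - mean(arm_b), se)
```

Passing a different `statistic`, for instance a function returning a restricted mean Q-TWiST, would bring the sketch closer to the analysis described, at the cost of resampling patients together with their censoring indicators.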


Results

At the reference date of 16 April 1999, median follow-up was 43 months and no patient had been lost to follow-up. Forty patients had died: 19 in the CAP arm and 21 in the FAMP arm.

Analysis of the trial failed to demonstrate any significant difference in terms of survival between the two groups (P = 0.62, by the log-rank test). However, event-free (EFS) and disease-free (DFS) survivals were both significantly higher in patients allocated to the FAMP arm (P < 0.001 each, by the log-rank test).

Figure 1 displays the partitioning of survival times for each treatment group into the clinical health states defined for the Q-TWiST model, ie TOX, CT, TWiST and REL, while the resulting restricted mean durations are given in Table 1.

Figure 1

Partitioning of survival time for each treatment group into the four clinically relevant health states retained for defining the Q-TWiST model, namely CT, TOX, TWiST and REL. In this figure, TOX was estimated as the number of days with at least one relevant toxic event, excluding alopecia (see Appendix). As indicated in the figure, the areas between the survival curves represent the mean health state durations restricted to the median follow-up interval of 43 months.

Table 1 Components of the Q-TWiST for the overall sample of the trial

No difference was found between the two randomized groups in terms of mean duration of treatment free of toxicity, ie CT (4.6 months in both groups). When considering time with toxicity as the number of days with at least one relevant toxicity excluding alopecia, mean times spent in TOX in the FAMP and the CAP groups were similar (6.1 vs 7.9 days, respectively, P = 0.40; Figure 1). When computing TOX as the sum of the days spent with any relevant toxicity except alopecia, results were only slightly modified (mean TOX: 7.5 days in the FAMP group vs 9.4 days in the CAP group, P = 0.51). However, introducing alopecia as a relevant toxic event affecting QoL for 90 days yielded a significantly shorter mean duration of TOX in the FAMP group (10.0 days) than in the CAP group (29.4 days, P = 0.005). Nevertheless, because the differences in TOX duration between the two randomized groups were small (from 2 to 20 days) relative to mean OS (0.2% to 2%) whatever the computation used, the first estimates of mean TOX (6.1 days and 7.9 days) were used in the subsequent analysis. By contrast, mean time in REL was 6.8 months longer in the CAP group than in the FAMP group (24.2 vs 17.4 months, respectively, P = 0.047). As a result, a gain in mean TWiST of 5.9 months was observed in the FAMP group (8.4 vs 2.5 months in the CAP group, P = 0.006).

Secondly, we performed a threshold utility analysis by introducing utility coefficients. As expected, results were almost independent of the utility weights associated with TOX or CT. Indeed, mean durations of CT and TOX were very close in both groups. As a result, whatever the values of UTOX or UCT, the potential benefit of FAMP in terms of average Q-TWiST was only related to the value of the utility coefficient attributed to REL (UREL). This is illustrated in Figure 2 where UTOX and UCT have been arbitrarily fixed at 0.2 and 0.5, respectively. For UREL ranging from 0 to 0.28, average Q-TWiST was significantly higher for the FAMP group than for the CAP group with a benefit (ΔQ-TWiST) between 125 and 200 days. For UREL values ranging from 0.28 to 0.88, average Q-TWiST was higher for the FAMP group, although not significantly. It is only for high values of UREL (>0.88) that average Q-TWiST was higher in the CAP group than in the FAMP group, but never significantly.
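The point estimate of the threshold analysis can be retraced approximately from the rounded restricted means reported above. The sketch below is illustrative only: the days-to-months conversion factor is an assumption, the significance limits (which required the bootstrap) are not reproduced, and because the inputs are rounded the break-even value lands near, not exactly on, the reported 0.88.

```python
DAYS_PER_MONTH = 30.4375  # assumed conversion; the article does not state one

# Differences in restricted means, FAMP minus CAP, in months (from the Results).
DELTA_TOX = (6.1 - 7.9) / DAYS_PER_MONTH   # TOX was reported in days
DELTA_CT = 4.6 - 4.6
DELTA_TWIST = 8.4 - 2.5
DELTA_REL = 17.4 - 24.2


def delta_q_twist(u_rel, u_tox=0.2, u_ct=0.5):
    """Mean Q-TWiST difference (FAMP minus CAP) in months for given utilities."""
    return (u_tox * DELTA_TOX + u_ct * DELTA_CT
            + DELTA_TWIST + u_rel * DELTA_REL)


def break_even_u_rel(u_tox=0.2, u_ct=0.5):
    """UREL at which the two arms have equal mean Q-TWiST."""
    return -(u_tox * DELTA_TOX + u_ct * DELTA_CT + DELTA_TWIST) / DELTA_REL


for u_rel in (0.0, 0.28, 0.88, 1.0):
    print(f"UREL={u_rel:.2f}  delta={delta_q_twist(u_rel):+.2f} months")
print(f"break-even UREL is about {break_even_u_rel():.2f}")
```

Because the differences in TOX and CT are close to zero, changing `u_tox` or `u_ct` barely moves the line, which is why the comparison hinges on UREL.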

Figure 2

Threshold utility analyses: plot of the mean difference in Q-TWiST between the FAMP and the CAP arm (ΔQ-TWiST) against UREL coefficient. UTOX and UCT were arbitrarily fixed at 0.2 and 0.5, respectively. The dashed lines represent limits of statistical significance (α = 5%) for ΔQ-TWiST. Nevertheless, the plotted curve was slightly dependent on values chosen for UTOX and UCT (see text for more details).

When considering all the possible values of UTOX and UCT, the threshold value of UREL below which average Q-TWiST was significantly higher in the FAMP group than in the CAP group was only slightly modified, ranging from 0.27 to 0.30.


Discussion

Treatment of advanced-stage Waldenström's macroglobulinemia is still a challenging problem for clinicians. Indeed, despite progress in supportive care and the introduction of purine analogs, the disease remains incurable.1,3,4,19,20,21,22 In this setting, evaluation of quality of life is of prime interest.

We performed a randomized trial comparing fludarabine to CAP in patients with WM in first relapse or with primary refractory disease. The main analysis of this trial failed to demonstrate a benefit of FAMP over CAP in terms of survival, although the duration of response and event-free survival were significantly longer with FAMP than with CAP.

The Q-TWiST approach was first used in the evaluation of cancer clinical trials, notably in breast cancer,23,24 malignant melanoma,25,26 colorectal cancer23,27 and prostate cancer.28,29 Such an approach has also been used more recently in the evaluation of hematological malignancies, such as follicular lymphoma,16 childhood acute myeloid leukemia17 and multiple myeloma.18 It is especially appropriate when no significant difference in overall survival is observed between the randomized groups, whilst differences in secondary end points (affecting quality of life) are observed. Indeed, the Q-TWiST approach allows assessment of potential average differences in treatment effect when integrating quality of life considerations, which appear of prime interest, at least from a palliative point of view.

The application of the Q-TWiST method first requires the definition of a sequence of relevant health states, usually three: time with treatment-related toxicity, time without disease symptoms or toxicity, and time after treatment failure. These states are weighted by utility coefficients to incorporate QoL considerations. As detailed information on the duration of treatment and toxicity was available in this trial, we chose to define four health states (TOX, CT, TWiST, REL), ie by splitting treatment duration according to whether or not toxicity occurred. However, this was poorly informative, since the mean durations of both TOX and CT were close between randomized groups. Indeed, our findings showed no differences between the FAMP and CAP groups in terms of either mean CT (4.6 months in both groups) or mean TOX (6.1 and 7.9 days, respectively). Interestingly, mean duration of TOX was very short in both groups, accounting for less than 1% of the median follow-up, although taking alopecia into account produced a 20-day difference between the two groups (10.0 vs 29.4 days, respectively, P = 0.005). Nevertheless, this difference only accounted for 1.6% of the median follow-up, so that its potential influence on the resulting difference in mean TWiST was negligible (6 months whether alopecia was excluded or included). By contrast, the difference of 6.8 months between the FAMP and CAP groups in mean REL (accounting for 14% of median follow-up) was more important, and likely accounts for the 6-month longer mean TWiST in the FAMP group than in the CAP group. Indeed, patients with advanced WM have a poor outcome whatever the proposed treatment. A benefit of about 6 months in terms of TWiST, which represents 15% of their median survival time, does not seem negligible, given the low toxicity profile of both treatment groups.

Because this trial did not incorporate any quality-of-life objective in its original design, patient-assigned utility values were not available to estimate mean Q-TWiST. Thus, a threshold utility analysis was performed. In any case, self-reported measures of patient utility may be susceptible to the effects of cognitive biases, which make a straightforward interpretation of the measure of utility problematic, as recently summarized by Hanita.30 The threshold analysis, which gives information for all possible sets of utility coefficients, circumvents these problems.

As expected, given the difference in mean times spent in REL between the two groups, results depended almost entirely on the utility coefficient attributed to REL (UREL). We found that, when a low utility coefficient was attributed to the relapse state (ie UREL < 0.3), average Q-TWiST was significantly higher in the FAMP group than in the CAP group. Moreover, there was no situation, whichever utility coefficients were used, where mean Q-TWiST was significantly higher in the CAP group than in the FAMP group. Finally, in these WM patients refractory or relapsing after first-line treatment, low values of the utility coefficient attributed to REL appear realistic, despite the underlying heterogeneity of this state. Therefore, a clinical benefit of FAMP over CAP could be expected for most of these patients.

In summary, the Q-TWiST approach provides a useful method for interpreting clinical trials dealing with chronic diseases. In patients with relapsing or refractory WM, it showed that, for clinically pertinent choices of utility coefficient values, FAMP could be significantly beneficial over CAP in terms of quality-adjusted survival.

Appendix: Codification of the toxicity

Mucositis, hair loss, pericarditis, consciousness were considered when WHO grading was >0. The following toxicities were considered when WHO grading was >1: hemorrhage, nausea, diarrhea, pulmonary toxicity, fever, allergic reaction, cutaneous toxicity, infection, cardiac rhythm, cardiac function, neurological toxicity, pain.
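The codification rule above amounts to a per-toxicity grade threshold. A minimal sketch follows; the function name, the lower-case labels and the `include_alopecia` switch (mirroring the two analyses with and without alopecia) are choices made for this example and not part of the trial's data dictionary.

```python
# Toxicities counted from WHO grade 1 upwards (grade > 0), per the Appendix.
GRADE_ABOVE_0 = {"mucositis", "hair loss", "pericarditis", "consciousness"}

# Toxicities counted from WHO grade 2 upwards (grade > 1), per the Appendix.
GRADE_ABOVE_1 = {
    "hemorrhage", "nausea", "diarrhea", "pulmonary toxicity", "fever",
    "allergic reaction", "cutaneous toxicity", "infection", "cardiac rhythm",
    "cardiac function", "neurological toxicity", "pain",
}


def is_relevant(toxicity, who_grade, include_alopecia=True):
    """Return True if a graded toxic event counts towards TOX."""
    name = toxicity.strip().lower()
    if name == "hair loss" and not include_alopecia:
        return False
    if name in GRADE_ABOVE_0:
        return who_grade > 0
    if name in GRADE_ABOVE_1:
        return who_grade > 1
    raise KeyError(f"toxicity not listed in the Appendix: {toxicity!r}")
```

For instance, grade 1 nausea would not count towards TOX under this rule, whereas grade 1 mucositis would.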


References

  1. Dimopoulos MA, Alexanian R. Waldenstrom's macroglobulinemia. Blood 1994; 83: 1452–1459
  2. Liu ES, Burian C, Miller WE, Saven A. Bolus administration of cladribine in the treatment of Waldenstrom macroglobulinaemia. Br J Haematol 1998; 103: 690–695
  3. Halaburda K, Hellmann A. Fludarabine therapy in a patient with progressive symptomatic Waldenstrom's macroglobulinemia. Acta Haematol Pol 1994; 25: 63–67
  4. Hellmann A, Lewandowski K, Zaucha JM, Bieniaszewska M, Halaburda K, Robak T. Effect of a 2-hour infusion of 2-chlorodeoxyadenosine in the treatment of refractory or previously untreated Waldenstrom's macroglobulinemia. Eur J Haematol 1999; 63: 35–41
  5. Dimopoulos MA, Weber D, Delasalle KB, Keating M, Alexanian R. Treatment of Waldenstrom's macroglobulinemia resistant to standard therapy with 2-chlorodeoxyadenosine: identification of prognostic factors. Ann Oncol 1995; 6: 49–52
  6. Dimopoulos MA, Kantarjian H, Weber D, O'Brien S, Estey E, Delasalle K, Rose E, Cabanillas F, Keating M, Alexanian R. Primary therapy of Waldenstrom's macroglobulinemia with 2-chlorodeoxyadenosine. J Clin Oncol 1994; 12: 2694–2698
  7. Fridrik MA, Jager G, Baldinger C, Krieger O, Chott A, Bettelheim P. First-line treatment of Waldenstrom's disease with cladribine. Arbeitsgemeinschaft Medikamentose Tumortherapie. Ann Hematol 1997; 74: 7–10
  8. Foran JM, Rohatiner AZ, Coiffier B, Barbui T, Johnson SA, Hiddemann W, Radford JA, Norton AJ, Tollerfield SM, Wilson MP, Lister TA. Multicenter phase II study of fludarabine phosphate for patients with newly diagnosed lymphoplasmacytoid lymphoma, Waldenstrom's macroglobulinemia, and mantle-cell lymphoma. J Clin Oncol 1999; 17: 546–553
  9. Schwartz CE, Mathias SD, Pasta DJ, Colwell HH, Rapkin BD, Genderson MW, Henning JM. A comparison of two approaches for assessing patient importance weights to conduct an extended Q-TWiST analysis. Qual Life Res 1999; 8: 197–207
  10. Glasziou PP, Cole BF, Gelber RD, Hilden J, Simes RJ. Quality adjusted survival analysis with repeated quality of life measures. Stat Med 1998; 17: 1215–1229
  11. Gelber RD, Cole BF, Gelber S, Goldhirsch A. Comparing treatments using quality-adjusted survival: the Q-TWiST method. Am Stat 1995; 49: 161–169
  12. Feldstein ML. Quality-of-life-adjusted survival for comparing cancer treatments. A commentary on TWiST and Q-TWiST. Cancer 1991; 67 (3 Suppl.): 851–854
  13. Leblond V, Lévy V, Maloisel F, Cazin B, Fermand JP, Harousseau JL, Remenieras L, Porcher R, Gardembas M, Marit G, Deconinck E, Desablens B, Guilhot F, Philippe G, Stamatoullas A, Guibon O, on behalf of the French Cooperative group on CLL and Macroglobulinemia. Results of a multicentric randomized study comparing the efficacy of fludarabine to that of cyclophosphamide, doxorubicine and prednisone in 92 patients with Waldenström's macroglobulinemia in first relapse or primary refractory disease (in press)
  14. Glasziou PP, Simes RJ, Gelber RD. Quality adjusted survival analysis. Stat Med 1990; 9: 1259–1276
  15. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958; 53: 457–481
  16. Cole BF, Solal-Celigny P, Gelber RD, Lepage E, Gisselbrecht C, Reyes F, Sebban C, Sugano D, Tendler C, Goldhirsch A. Quality-of-life-adjusted survival analysis of interferon alfa-2b treatment for advanced follicular lymphoma: an aid to clinical decision making. J Clin Oncol 1998; 16: 2339–2344
  17. Parsons SK, Gelber S, Cole BF, Ravindranath Y, Ogden A, Yeager AM, Chang M, Shuster J, Weinstein HJ, Gelber RD. Quality-adjusted survival after treatment for acute myeloid leukemia in childhood: a Q-TWiST analysis of the pediatric oncology group study 8821. J Clin Oncol 1999; 17: 2144–2152
  18. Zee B, Cole B, Li T, Browman G, James K, Johnston D, Sugano D, Pater J. Quality-adjusted time without symptoms or toxicity analysis of interferon maintenance in multiple myeloma. J Clin Oncol 1998; 16: 2834–2839
  19. Leblond V, Ben-Othman T, Deconinck E, Taksin AL, Harousseau JL, Delgado MA, Delmer A, Maloisel F, Mariette X, Morel P, Clauvel JP, Duboisset P, Entezam S, Hermine O, Merlet MY, Akoub-Agha I, Guibon O, Caspard H, Fort N. Activity of fludarabine in previously treated Waldenstrom's macroglobulinemia: a report of 71 cases. Groupe Cooperatif Macroglobulinemie. J Clin Oncol 1998; 16: 2060–2064
  20. Legouffe E, Rossi JF, Laporte JP, Isnard F, Oziol E, Fabbro M, Janbon C, Joudan J, Najman A. Treatment of Waldenstrom's macroglobulinemia with very low doses of alpha interferon. Leuk Lymphoma 1995; 19: 337–342
  21. Case DC Jr, Ervin TJ, Boyd MA, Redfield DL. Waldenstrom's macroglobulinemia: long-term results with the M-2 protocol. Cancer Invest 1991; 9: 1–7
  22. Martino R, Shah A, Romero P, Brunet S, Sierra J, Domingo-Albos A, Fruchtman S, Isola L. Allogeneic bone marrow transplantation for advanced Waldenstrom's macroglobulinemia. Bone Marrow Transplant 1999; 23: 747–749
  23. Gelber RD, Goldhirsch A, Cole BF, Wieand HS, Schroeder G, Krook JE. A quality-adjusted time without symptoms or toxicity (Q-TWiST) analysis of adjuvant radiation therapy and chemotherapy for resectable rectal cancer. J Natl Cancer Inst 1996; 88: 1039–1045
  24. Trippoli S, Becagli P, Messori A. Adjuvant cyclophosphamide, methotrexate and fluorouracil for node-positive breast cancer: a lifetime cost-utility analysis based on a modified Q-TWIST method. Eur J Clin Pharmacol 1997; 53: 281–282
  25. Agarwala SS, Kirkwood JM. Adjuvant interferon treatment for melanoma. Hematol Oncol Clin North Am 1998; 12: 823–833
  26. Cole BF, Gelber RD, Kirkwood JM, Goldhirsch A, Barylak E, Borden E. Quality-of-life-adjusted survival analysis of interferon alfa-2b adjuvant treatment of high-risk resected cutaneous melanoma: an Eastern Cooperative Oncology Group study. J Clin Oncol 1996; 14: 2666–2673
  27. DeCosse JJ, Cennerazzo WJ. Re: A quality-adjusted time without symptoms or toxicity (Q-TWiST) analysis of adjuvant radiation therapy and chemotherapy for resectable rectal cancer. J Natl Cancer Inst 1996; 88: 1686
  28. Pummer K, Lehnert M, Stettner H, Hubmer G. Randomized comparison of total androgen blockade alone versus combined with weekly epirubicin in advanced prostate cancer. Eur Urol 1997; 32 (Suppl. 3): 81–85
  29. Rosendahl I, Kiebert GM, Curran D, Colen BF, Weeks JC, Denis LJ, Hall RR. Quality-adjusted survival (Q-TWiST) analysis of EORTC trial 30853: comparing goserelin acetate and flutamide with bilateral orchiectomy in patients with metastatic prostate cancer. European Organization for Research and Treatment of Cancer. Prostate 1999; 38: 100–109
  30. Hanita M. Self-report measures of patient utility: should we trust them? J Clin Epidemiol 2000; 53: 469–476


Acknowledgements

We acknowledge all the participants of the ‘French Cooperative group on CLL and Macroglobulinemia’ and Schering SA France for making this research possible. We also thank Mr Hervé Finel for technical assistance. This work was supported by a grant from the Association pour la Recherche sur le Cancer (ARC) No. 6531.

Author information

Correspondence to V Lévy.

Cite this article

Lévy, V., Porcher, R., Leblond, V. et al. Evaluating treatment strategies in advanced Waldenström macroglobulinemia: use of quality-adjusted survival analysis. Leukemia 15, 1466–1470 (2001) doi:10.1038/sj.leu.2402221


  • quality of life
  • Q-TWiST
  • macroglobulinemia
  • Waldenström
