Introduction

Response assessment in multiple myeloma (MM) is based upon the reduction of the M-protein in serum and urine protein electrophoresis, with different levels and/or speed of response showing an association with clinical outcome in the majority of trials.1, 2, 3, 4, 5, 6, 7 The introduction of novel therapies has seen an increasing number of patients achieving deep levels of response, including in relapsed/refractory myeloma.8, 9, 10 Despite achieving deep responses, some of these patients relapse,11 possibly reflecting the presence of minimal residual disease (MRD)12, 13 and impaired normal plasma cell recovery.14, 15, 16, 17

With more patients experiencing profound responses, many during induction therapy,18, 19 stringent, immunophenotypic and molecular complete response (CR) categories have been defined as distinct response entities beyond conventional CR.20, 21 While these have proven powerful predictors of survival, serological sampling remains the centrepiece of serial monitoring before invasive bone marrow biopsies are required; efforts should therefore focus on adapting serological approaches to new monitoring demands and on determining the optimal timing for MRD testing.

An alternative strategy for monitoring M-proteins has recently become available in the form of automated serum heavy+light chain (HLC) immunoassays (Hevylite), which separately measure the intact immunoglobulin of each light chain type, and from which κ/λ ratios (HLC ratios) can be derived to provide an indication of clonality.22 Comparative studies have shown HLC responses to be generally equivalent to those assigned by conventional assessment.23 However, discrepancies have been noted at the deepest levels of response, where HLC ratios provided additional sensitivity in some cases,24 and showed better agreement with MRD assessment than electrophoretic methods.25 Abnormal HLC ratios can be driven by HLC-pair suppression (for example, suppressed immunoglobulin G κ (IgGκ) levels in an IgGλ patient), hence giving a sensitive indication of tumour activity and of polyclonal reconstitution after treatment. Notably, polyclonal plasma cell recovery and reconstitution of immune function have been associated with better survival outcomes,14, 16, 26, 27 irrespective of the presence of residual disease.15 The survival benefit may be the consequence of improved immunological control of the disease,28, 29 an increased ability to fight infection30, 31 and reversion of the malignant plasma cell clone to a non-aggressive, monoclonal gammopathy of undetermined significance (MGUS)-like state, with little impact on bone marrow integrity.32 However, how HLC-pair suppression relates to other methods of tumour monitoring, and whether HLC assays can aid patient management, has yet to be established. Here we compare response assignment determined by electrophoresis with that determined by HLC assays, and evaluate for the first time the clinical importance of discordance between the two.

Subjects and methods

Patients

Between November 2010 and December 2012, 700 newly diagnosed MM patients under the age of 66 years were enrolled in the IFM 2009 trial. Patients were randomised to either arm A, which comprised 8 × RVD (lenalidomide, bortezomib and dexamethasone) treatment cycles, or arm B, which comprised 3 × RVD cycles followed by high-dose melphalan (with autologous stem cell rescue) and two further RVD consolidation cycles. All patients received 1 year of lenalidomide maintenance therapy. Exclusion criteria were serum creatinine >25 mg/l or creatinine clearance <60 ml/min. Of the 700 patients enrolled in the therapeutic trial, we analysed 509 with intact immunoglobulin MM (IIMM; 116 IgA and 393 IgG) and measurable disease, defined as ≥10 g/l monoclonal immunoglobulin (M-Ig) on serum protein electrophoresis (SPEP). Light chain only myeloma (n=113), oligosecretory (n=42), other (n=21) and non-evaluable (n=15) patients were excluded.

Laboratory methods

SPEP and immunofixation electrophoresis (IFE; Sebia, France) were performed using standard laboratory procedures. Serum IgGκ, IgGλ, IgAκ and IgAλ HLC concentrations were measured using Hevylite (The Binding Site Group Ltd, Birmingham, UK) on a SPA PLUS turbidimeter (The Binding Site Group Ltd). HLC concentrations and ratios were considered abnormal if they were outside the 95% reference ranges provided by the manufacturer: IgGκ (3.84–12.07 g/l), IgGλ (1.91–6.74 g/l), IgGκ/IgGλ (1.12–3.21), IgAκ (0.57–2.08 g/l), IgAλ (0.44–2.04 g/l) and IgAκ/IgAλ (0.78–1.94). Free light chain (FLC) concentrations were measured using Freelite (The Binding Site Group Ltd) on a SPA PLUS turbidimeter. Serum FLC concentrations and ratios were considered abnormal if they were outside the reference ranges provided by the manufacturer: κFLC (3.3–19.4 mg/l), λFLC (5.7–26.3 mg/l) and κ/λ FLC ratio (0.26–1.65).
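The ratio-based rules above can be sketched as follows. This is a minimal Python illustration, not the laboratory's software: the function names are hypothetical, and treating a value exactly on a range limit as normal is an assumption.

```python
# Manufacturer's reference ranges as quoted in the text.
# HLC concentrations in g/l, FLC concentrations in mg/l; ratios are unitless.
HLC_RATIO_RANGES = {
    "IgG": (1.12, 3.21),  # IgGκ/IgGλ
    "IgA": (0.78, 1.94),  # IgAκ/IgAλ
}
FLC_RATIO_RANGE = (0.26, 1.65)  # κ/λ FLC


def outside(value: float, bounds: tuple) -> bool:
    """True if value lies outside [low, high]; the limits count as normal (assumption)."""
    low, high = bounds
    return value < low or value > high


def hlc_ratio_abnormal(isotype: str, kappa_g_l: float, lambda_g_l: float) -> bool:
    """Flag an abnormal κ/λ HLC ratio for one isotype ("IgG" or "IgA")."""
    return outside(kappa_g_l / lambda_g_l, HLC_RATIO_RANGES[isotype])


def flc_ratio_abnormal(kappa_mg_l: float, lambda_mg_l: float) -> bool:
    """Flag an abnormal κ/λ free light chain ratio."""
    return outside(kappa_mg_l / lambda_mg_l, FLC_RATIO_RANGE)


# Hypothetical IgGλ patient: IgGκ 2.0 g/l and IgGλ 10.0 g/l give a ratio of 0.2.
print(hlc_ratio_abnormal("IgG", 2.0, 10.0))  # True
```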

Response assessment

Responses were compared at the end of consolidation therapy in 463 patients with matched electrophoretic and HLC data. Standard response assignment was carried out following International Myeloma Working Group (IMWG) guidelines.21, 33 HLC responses were modelled on the IMWG criteria and based on changes in the difference between the involved (tumour-derived; iHLC) and uninvolved (uHLC) HLC concentrations (dHLC = iHLC − uHLC). In summary: progressive disease was defined as a >25% increase in M-Ig or dHLC; stable disease as a <50% decrease in M-Ig or dHLC; partial response (PR) as a 50–89% decrease in M-Ig or dHLC; and very good partial response (VGPR) as a ≥90% decrease in M-Ig (or negative SPEP with positive IFE) or in dHLC. Patients were classified as having achieved complete response or better (CR) if they had <5% plasma cells on bone marrow biopsy, were negative when screened for plasmacytomas, and had either no detectable M-Ig by IFE or a normal HLC ratio. (Patients with negative IFE or a normal HLC ratio but for whom bone marrow examination was not performed were assigned VGPR.)
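A minimal sketch of the HLC response rules summarised above, assuming dHLC is compared with its value at diagnosis and that a patient with a normal HLC ratio but ≥5% bone marrow plasma cells or a plasmacytoma falls back to the dHLC-based categories (the text does not specify this case). The function and parameter names are hypothetical, and IMWG details not restated here are deliberately omitted.

```python
from typing import Optional


def d_hlc(ihlc: float, uhlc: float) -> float:
    """dHLC = iHLC - uHLC (g/l)."""
    return ihlc - uhlc


def hlc_response(baseline_dhlc: float,
                 current_dhlc: float,
                 hlc_ratio_normal: bool,
                 bmpc_percent: Optional[float],
                 plasmacytoma_free: bool = True) -> str:
    """Assign an HLC-based response category (sketch of the rules in the text).

    bmpc_percent is None when no bone marrow examination was performed.
    """
    if hlc_ratio_normal:
        # CR needs a normal ratio, <5% marrow plasma cells and no plasmacytoma.
        if bmpc_percent is not None and bmpc_percent < 5 and plasmacytoma_free:
            return "CR"
        # Normal ratio without marrow examination is assigned VGPR.
        if bmpc_percent is None:
            return "VGPR"
        # Otherwise fall through to the dHLC rules (assumption).
    change = 100.0 * (current_dhlc - baseline_dhlc) / baseline_dhlc
    if change > 25:
        return "PD"
    decrease = -change
    if decrease >= 90:
        return "VGPR"
    if decrease >= 50:
        return "PR"
    return "SD"


# Hypothetical patient: dHLC falls from 28.0 to 2.0 g/l (about 93% decrease).
print(hlc_response(28.0, 2.0, hlc_ratio_normal=False, bmpc_percent=None))  # VGPR
```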

In addition to the techniques above, MRD was assessed by multiparametric flow cytometry (MFC) using a seven-colour antibody panel with a sensitivity of 10⁻⁴, that is, able to detect one malignant plasma cell among 10 000 bone marrow cells. MRD was considered detected if at least 50 plasma cells were observed in the bone marrow. HLC-pair suppression was defined as an abnormal HLC ratio together with uninvolved HLC levels below the published normal ranges (IgGκ <3.84, IgGλ <1.91, IgAκ <0.57 and IgAλ <0.44 g/l), and severe suppression as a reduction of more than 50% below these limits.
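The suppression definitions above can be sketched as a small classifier. Reading "more than 50% below these limits" as an uninvolved HLC value below half the lower limit of normal is an interpretation, and the function name is hypothetical; "severe" is returned as a separate level here, although severely suppressed patients also meet the general suppression definition.

```python
# Lower limits of normal for each HLC (g/l), as quoted in the text.
LOWER_LIMITS_G_L = {"IgGκ": 3.84, "IgGλ": 1.91, "IgAκ": 0.57, "IgAλ": 0.44}


def hlc_pair_suppression(ratio_abnormal: bool, uninvolved: str, uhlc_g_l: float) -> str:
    """Classify HLC-pair suppression as 'none', 'suppressed' or 'severe'.

    uninvolved names the uninvolved HLC, e.g. "IgGκ" for an IgGλ patient.
    """
    lln = LOWER_LIMITS_G_L[uninvolved]
    # A normal ratio (HLC CR) or an uninvolved HLC at or above the limit: not suppressed.
    if not ratio_abnormal or uhlc_g_l >= lln:
        return "none"
    # Interpretation (assumption): severe means below half of the lower limit.
    if uhlc_g_l < 0.5 * lln:
        return "severe"
    return "suppressed"


# Hypothetical IgGλ patient with an abnormal ratio and IgGκ of 1.5 g/l (< 0.5 × 3.84).
print(hlc_pair_suppression(True, "IgGκ", 1.5))  # severe
```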

Statistical analysis

Differences in progression-free survival (PFS) between patient groups were analysed using Kaplan–Meier survival curves, with the log rank test used to assess significance. Hazard ratios were calculated using a Cox proportional hazards model. To avoid bias from early progressions, survival was landmarked and calculated from the end of consolidation therapy. Model fit and discriminatory power were assessed with the Akaike information criterion and Harrell’s C-statistic, respectively, using Stata software. All other statistical analyses were performed using SPSS v23 (IBM, Chicago, IL, USA). Survival graphs were generated using GraphPad Prism 5 software. Concordance between methods for response assignment was assessed by quadratic weighted kappa analysis; a weighted kappa value >0.81 corresponds to near-perfect agreement.34
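Quadratic weighted kappa can be computed directly from the cross-tabulation of paired response categories. The sketch below is a plain-Python illustration with a hypothetical function name, not the software used in the study. Disagreements are penalised by the squared distance between ordered categories, so a PR versus CR disagreement counts more than a VGPR versus CR one.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, categories):
    """Cohen's kappa with quadratic weights for two paired, ordered ratings.

    categories must list every category in order, e.g.
    ["PD", "SD", "PR", "VGPR", "CR"].
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)
    # Observed cross-tabulation (rows: method A, columns: method B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at the extremes
            expected = row_totals[i] * col_totals[j] / n  # chance agreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - weighted_observed / weighted_expected


ORDER = ["PD", "SD", "PR", "VGPR", "CR"]
imwg = ["VGPR", "VGPR", "CR", "PR"]  # hypothetical paired assignments
hlc = ["CR", "VGPR", "CR", "PR"]
print(quadratic_weighted_kappa(imwg, hlc, ORDER))
```

As a cross-check, scikit-learn's `cohen_kappa_score(y1, y2, labels=ORDER, weights="quadratic")` computes the same statistic; passing `labels` in clinical order matters, because string labels would otherwise be sorted alphabetically.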

Ethical considerations

The study was approved by the local ethics committee and conducted in agreement with the Declaration of Helsinki and Good Clinical Practice Guidelines. Written informed consent from participating patients was required.

Results

The patient characteristics at diagnosis are detailed in Table 1. All patients included in the study were SPEP-positive, while the HLC ratio was abnormal in 116/116 (100%) of the IgA-MM patients and 388/393 (99%) of the IgG-MM patients. The median (range) concentrations of iHLC and M-Ig (by SPEP) were 33.9 (7.9–120.7) g/l and 36.8 (10.0–103.1) g/l, respectively (P=0.060).

Table 1 Patient characteristics

A comparison of the response assignments (HLC-based vs standard IMWG) is presented in Table 2. Overall, there was moderate agreement between the two systems (weighted kappa 0.61). There were too few patients in the progressive disease and stable disease categories for a meaningful comparison, while agreement was good for the PR and CR categories. However, concordance between methods was poor within the VGPR category, principally because 105 of the 225 patients assigned VGPR by IMWG criteria were assessed as having CR by HLC measurements (that is, a normal HLC ratio and <5% bone marrow plasma cells (BMPCs)). This re-stratification of VGPR patients occurred in both treatment arms: 56% of transplant and 34% of non-transplant patients at conventional VGPR achieved CR by the HLC method (data not shown). Repeating the weighted kappa analysis in all patients after excluding the 225 VGPR patients gave a value of 0.98 (95% CI: 0.92–1) (data not shown), indicating an excellent level of agreement.

Table 2 IMWG and HLC responses at the end of consolidation therapy

Since VGPR assignment is based on either a ≥90% reduction of the M-protein on SPEP or a negative SPEP with positive IFE, we investigated the discrepancy between methods further. Among the 225 VGPR patients, response was assigned on the basis of a ≥90% reduction of the M-protein by SPEP in 124 (55%) cases, and solely on the basis of a positive IFE in the remaining 101 (45%). HLC (plus bone marrow) assessment identified CR in 45 (36%) patients from the former group and in 60 (59%) from the latter.

Figure 1 shows the hazard ratios for PFS according to depth of response. For both IMWG and HLC assessments, VGPR patients had a significantly increased risk of progression compared with those achieving CR, although hazard ratios were higher for the HLC categories. Of note, patients at VGPR or CR by IMWG assessment could be separated into different groups according to their HLC response, and these subgroups had statistically significant or near-significant differences in PFS. Conversely, patients within the HLC-derived VGPR and CR groups showed no PFS differences on the basis of their IMWG assessment. Kaplan–Meier analyses demonstrated that HLC assessment could further stratify IMWG responses into meaningful subgroups with significantly different PFS (Figures 2a and b).
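The Kaplan–Meier curves and median PFS values in this section can be illustrated with a minimal product-limit estimator. This is a sketch with hypothetical function names, assuming follow-up times measured from the end-of-consolidation landmark and the common convention that the median is the earliest time at which estimated survival falls to 0.5 or below; the study itself used SPSS and GraphPad Prism.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times: follow-up (months) from the landmark; events: 1 = progression, 0 = censored.
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing this time (events and censorings).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def median_survival(curve):
    """Earliest time with survival <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical group of five patients (months from landmark, 1 = progression).
print(median_survival(kaplan_meier([3, 8, 12, 20, 25], [1, 0, 1, 1, 0])))  # 20
```

A `None` result corresponds to "not reached" in the figures; the log rank comparison between groups is not included in this sketch.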

Figure 1

IMWG and HLC responses at the end of consolidation therapy are associated with outcome. Forest plot showing hazard ratios (HR) and 95% confidence intervals (95% CI) for VGPR vs CR responses, using either IMWG or HLC assessment. Hazard ratios (95% CI) were calculated by univariate Cox regression analysis.

Figure 2

HLC responses further stratify standard VGPR and CR patients. (a) In 225 patients at VGPR by IMWG at the end of consolidation therapy, HLC assessment identified patients at PR, VGPR and CR, with median PFS of 21.3 months, 28.9 months and not reached, respectively. (b) In 141 patients at CR by IMWG, HLC assessment identified patients at VGPR and CR, with median PFS of 35.1 months and not reached, respectively. HR: hazard ratio of patients at VGPR vs CR. Log rank test: P≤0.05 indicates statistical significance. Red line: PR; green line: VGPR; blue line: CR. The number of patients (events) in each group is shown.

In addition, VGPR patients (by IMWG criteria) could be stratified by normal vs abnormal FLC ratio (median PFS: 34.5 vs 26.1 months, respectively; P<0.001). In 199 patients with matched HLC and FLC data, the two methods were concordant in 120 (60%) cases, whereas 37 (19%) had an abnormal HLC ratio only and the remaining 42 (21%) an abnormal FLC ratio only. Survival analyses demonstrated that patients with both ratios normal, one ratio abnormal, or both ratios abnormal had significantly different outcomes (median PFS: not reached, 29.6 and 14.0 months, respectively; P<0.001; Supplementary Figure 1).
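The three-way grouping and the concordance figure used in this analysis can be expressed as simple counts over per-patient flags. These are hypothetical helpers for illustration; they assume each ratio has already been flagged as normal or abnormal against its reference range.

```python
def combined_ratio_group(hlc_abnormal: bool, flc_abnormal: bool) -> str:
    """Group a patient by how many of the two serum ratios are abnormal."""
    n_abnormal = int(hlc_abnormal) + int(flc_abnormal)
    return ("both normal", "one abnormal", "both abnormal")[n_abnormal]


def concordance(hlc_flags, flc_flags) -> float:
    """Fraction of patients whose HLC and FLC ratios agree (both normal or both abnormal)."""
    pairs = list(zip(hlc_flags, flc_flags))
    return sum(h == f for h, f in pairs) / len(pairs)


print(combined_ratio_group(True, False))  # one abnormal
```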

Our results suggest that standard assessment may underestimate responses in a significant proportion of patients, and that HLC monitoring may be a more suitable tool for identifying patients achieving CR. Consequently, we compared the IMWG and HLC models for their ability to stratify CR vs non-CR patients. Survival analyses showed a significant advantage for CR compared with non-CR patients in both models (median PFS not reached for CR in both models vs 31.0 (IMWG) and 28.7 (HLC) months for non-CR patients; P<0.001). The Akaike information criterion (AIC) and C-statistic indicated better model fit and discriminatory power, respectively, for the HLC groups than for the IMWG groups (Figure 3).
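The two model-comparison criteria can be sketched as follows, assuming a binary risk score (1 for non-CR, 0 for CR) and a log partial likelihood taken from an already fitted Cox model. The function names are hypothetical, and this simplified Harrell's C skips pairs with tied follow-up times, which dedicated implementations may handle differently. A lower AIC indicates better fit; a C-statistic closer to 1 indicates better discrimination.

```python
def harrell_c(times, events, risk_scores) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i progressed before j's follow-up
    time; it is concordant when i also has the higher risk score, and ties in
    risk score count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical data: two non-CR patients (risk 1) progress before two CR patients (risk 0).
print(harrell_c([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0]))
```

With a two-level grouping, many pairs are tied in risk score, which limits the attainable C-statistic; this is one reason the reported values should be compared between models rather than read in isolation.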

Figure 3

IMWG and HLC models for response assessment at the end of consolidation therapy stratify patients for progression-free survival (PFS). (a) Median PFS for IMWG response categories was 31.0 months for VGPR and not reached for CR. (b) Median PFS for HLC response categories was 28.7 months for VGPR and not reached for CR. Red line: VGPR; blue line: CR. HR: hazard ratio of patients at VGPR vs CR. Log rank test: P≤0.05 indicates statistical significance. The number of patients (events) in each group is shown. The Akaike information criterion (AIC) and C-statistic for each model are shown.

In addition to serum IFE and HLC ratio, MFC was used to detect MRD after completion of consolidation therapy. In 327 patients with matched data for all three methods, serum IFE was positive in 62% of patients, compared with 34% who were MFC-positive and 37% with an abnormal HLC ratio (Figure 4a). MFC and HLC ratio gave concordant results (both normal or both abnormal) in 268/327 (82%) patients and discordant results in the remaining 18%. We therefore assessed the significance of MFC at different levels of response by IMWG and HLC criteria. Most patients at PR by either method were also MFC-positive (87 and 82%, respectively); this proportion decreased in patients at VGPR (43 and 66%, respectively) and decreased further in CR patients (6 and 12%, respectively) (Supplementary Figure 2). In Kaplan–Meier analyses, MRD positivity was associated with shorter PFS in both the VGPR and CR categories by IMWG criteria. By contrast, MRD had no prognostic value in patients at VGPR by HLC assessment and was associated with shorter PFS only in those at CR (Figure 5).

Figure 4

MRD assessment by serum immunofixation (IFE), HLC ratio and multiparametric flow cytometry (MFC). (a) In 327 patients with matched data by all methods at the end of consolidation therapy, 62% had a positive IFE, 37% an abnormal HLC ratio and 34% malignant plasma cells in the bone marrow detected by seven-colour MFC.

Figure 5

Prognostic significance of multiparametric flow cytometry (MFC) assessment at different levels of response. Detection of disease by MFC was associated with shorter PFS in patients at (a) VGPR and (b) CR by IMWG response criteria. Using HLC response assignment, a positive MFC result had no prognostic value in (c) patients at VGPR, whereas in (d) those at CR, disease detection by MFC was associated with shorter PFS. HR: hazard ratio. Log rank test: P≤0.05 indicates statistical significance. Red line: MFC positive; blue line: MFC negative. The number of patients (events) in each group is shown.

Finally, HLC-pair suppression was identified in 175/461 (38%) patients at the end of consolidation therapy. These patients had significantly shorter PFS than those without HLC-pair suppression (median PFS: 30.6 months vs not reached, respectively; P<0.001; Figure 6a). Likewise, in the subset of 142 patients at CR by IMWG assessment, 15/142 (11%) had HLC-pair suppression and showed a trend towards shorter PFS (P=0.060, not shown), whereas 7/142 (5%) had severe HLC-pair suppression and significantly poorer outcomes than all other patients (median PFS: 22.9 months vs not reached, respectively; P=0.004; Figure 6b).

Figure 6

HLC-pair suppression at the end of consolidation therapy is associated with poorer outcome. (a) Patients with HLC-pair suppression had shorter PFS than patients with no suppression (median PFS 30.6 months vs not reached, respectively). (b) In patients at CR by IMWG assessment, those with severe HLC-pair suppression had shorter PFS than all other patients (median PFS 22.9 months vs not reached, respectively). HR: hazard ratio. Log rank test: P≤0.05 indicates statistical significance. Red line: HLC-pair suppression; blue line: no or moderate HLC-pair suppression. The number of patients (events) in each group is shown.

Discussion

Current multidrug regimens for MM have led to improved and faster CR rates in both newly diagnosed and relapsed/refractory patients,8, 9, 18, 19, 35 with a realistic expectation of even higher rates in the near future following the introduction of targeted therapies using monoclonal antibodies such as the anti-CD38 antibody daratumumab.36 However, most of these patients eventually relapse, indicating persistent disease not detectable by conventional methods. MRD can now be captured in these patients using immunophenotypic and molecular approaches with sensitivities of up to 10⁻⁵ and 10⁻⁶, respectively.37 To date, these techniques are not widely available outside clinical trials and, importantly, the appropriate timing for MRD testing remains unclear.38 Consequently, measurement of M-proteins remains the cornerstone of response monitoring in the majority of patients. Standard electrophoretic techniques can make M-protein quantification challenging, particularly at the low concentrations typically found in good responders. With new effective therapies, the need to integrate old and new methods to define deep responses accurately and in a timely manner, and to guide patient management, has therefore never been more clinically relevant.

Our results suggest that HLC measurements could add value to current monitoring methods, on the basis of three supporting arguments. First, we found good agreement between standard IMWG (electrophoretic) and HLC assays for response assignment, with the significant exception of patients at conventional VGPR, in approximately 50% of whom HLC ratios had normalised and BMPCs were <5%, suggesting a deeper level of response (CR). Similar differences have been reported elsewhere: Suehara et al.39 and Batinic et al.40 found normal HLC ratios in 55 and 41%, respectively, of a combined 170 patients at VGPR, whereas Ludwig et al.24 identified 12/35 (34%) IFE-positive patients at VGPR or near-complete response (nCR) as having normal HLC ratios. Here, for the first time, we also show that the substantial proportion of patients at standard VGPR whose HLC ratios have normalised have a significantly lower risk of progression. These results illustrate the heterogeneous nature of conventional VGPR as a response category, which has led many studies to combine the VGPR and CR categories when reporting results.11 The PFS of patients achieving CR by both methodologies was similar, indicating that adoption of HLC responses does not undermine the validity of conventional CR.

HLC ratios have been consistently associated with PFS during monitoring of myeloma patients.24, 41 Likewise, FLC ratios have been shown to be prognostic of PFS and overall survival (OS), regardless of the level of response.4, 42 In a recent study, FLC but not HLC ratios were reported to be associated with PFS in a small cohort of myeloma patients in CR.43 Notably, in our study HLC ratios were prognostic of PFS in both VGPR and CR patients, and in VGPR patients the combination of HLC and FLC ratios strongly discriminated between patients with both parameters normal and those with both abnormal (median PFS not reached vs 14 months, respectively). These results suggest that integrating HLC and FLC assessments earlier during monitoring may reflect response to therapy accurately and add relevant clinical information in the era of novel agents. In line with this need for precise monitoring, a new technology, matrix-assisted laser desorption ionisation time-of-flight mass spectrometry, which relies upon immunoglobulin ratios for identifying and quantifying the monoclonal component, may represent an additional tool for the diagnosis and monitoring of patients with monoclonal gammopathies.44, 45

Second, HLC results were supported by MRD assessment in bone marrow samples using MFC with a sensitivity of 10⁻⁴. At the end of consolidation, approximately two thirds of patients were MFC-negative and a similar proportion had normal HLC ratios; by contrast, immunofixation remained positive in over 60% of patients at this time. Paiva et al.46 also noted that some patients achieving an immunophenotypic response remain IFE-positive, and that the monoclonal immunoglobulin disappears by the time of follow-up analysis, whereas Olivero et al.25 observed that IFE was positive in more patients than the HLC ratio or MFC MRD when assessed before or after high-dose therapy and after consolidation. In addition, Rawstron et al.13 reported that ~26% of patients who were MFC MRD-negative failed to achieve a conventional CR.

The discordance between IFE and both HLC and MFC results may partly be explained by the extended half-life of the IgG M-protein when present at low concentrations (as expected in patients with deep responses), which may allow it to persist in the circulation after the tumour cells that produced it have died.47 In these circumstances, MRD may be negative and HLC ratios normalised owing to the restoration of polyclonal IgG production, while monoclonal bands are still present on IFE, thereby misrepresenting the true response in these patients. Indeed, of 122 patients in our cohort who were IFE-positive but had normal HLC ratios before maintenance, the vast majority (101, 83%) had IgG myeloma.
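The persistence argument can be made concrete with a first-order clearance model. This sketch assumes no further M-protein production once the clone is eradicated and a single constant half-life; the function name and all numbers in the usage example are hypothetical illustrations, not values from the study.

```python
import math


def days_to_fall_below(c0_g_l: float, threshold_g_l: float, half_life_days: float) -> float:
    """Days for a protein cleared by first-order kinetics to fall below a threshold.

    Assumes C(t) = C0 * 2 ** (-t / half_life) with no new production.
    """
    if c0_g_l <= threshold_g_l:
        return 0.0
    return half_life_days * math.log2(c0_g_l / threshold_g_l)


# Hypothetical: a 0.8 g/l residual band, an assumed 0.1 g/l detection limit
# and an assumed 21-day half-life give three half-lives.
print(days_to_fall_below(0.8, 0.1, 21.0))  # 63.0
```

Because the time scales linearly with the half-life, any lengthening of IgG half-life at low concentration extends the window in which IFE can remain positive after the clone has gone.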

Importantly, in our study MFC added prognostic value to both the VGPR and CR categories as determined by IMWG criteria, but only to the CR category when using HLC categorisation. In addition to more accurate monitoring, these results raise the possibility of using HLC assessment as a serological tool to guide decisions on the appropriate timing of bone marrow testing, as recently suggested in a small study in which post-transplant HLC and FLC ratios demonstrated 100% specificity for MFC MRD.48
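Specificity in this context treats MFC as the reference method: it is the proportion of MFC-negative patients whom the serological test also classifies as negative. A minimal sketch with a hypothetical function name, taking per-patient boolean results:

```python
def diagnostic_accuracy(test_positive, reference_positive):
    """Sensitivity and specificity of a binary test against a reference method.

    test_positive: e.g. abnormal HLC and/or FLC ratio per patient (bools).
    reference_positive: MFC MRD result per patient (bools).
    Returns (sensitivity, specificity); None where the denominator is zero.
    """
    tp = fn = tn = fp = 0
    for test, ref in zip(test_positive, reference_positive):
        if ref and test:
            tp += 1
        elif ref:
            fn += 1
        elif test:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return sensitivity, specificity


# Hypothetical five patients.
print(diagnostic_accuracy([True, True, False, False, False],
                          [True, False, True, False, False]))
```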

Third, previous studies have already identified HLC absolute values and ratios as prognostic throughout the course of MM.24, 26, 39, 49, 50 The prognostic significance of HLC ratios appears to result from both the elevation of iHLC (clonal) and the suppression of uHLC (non-clonal) concentrations. We defined HLC-pair suppression in patients with an abnormal HLC ratio and uninvolved HLC levels below the lower limit of normal. In line with reports suggesting that normalisation of HLC ratios may be a surrogate for immune recovery,51 patients in HLC CR (that is, with normal HLC ratios) were classed as not suppressed. Interestingly, some of these patients had suppression of both the involved and uninvolved HLC components, which may reflect a successful response to therapy (similar to reports on FLC suppression52) and thus contribute to the positive prognostic value of normal HLC ratios.

In our study, HLC-pair suppression was clearly associated with shorter PFS. Suppression of uninvolved immunoglobulins is presumably a consequence of tumour-related, cytokine-mediated inhibition of polyclonal immunoglobulin production.16 Conversely, a patient’s ability to recover normal immunoglobulin levels (and immune function) probably reflects both effective tumour kill and a lack of treatment-related toxicity. The clinical significance of monitoring uHLC levels in MM patients has already been identified by Ludwig et al.,27 who reported a correlation between HLC-pair recovery and deeper levels of response, and that severe HLC-pair suppression is an independent risk factor for shorter survival. Likewise, Harutyunyan et al.53 found that uHLC levels following treatment predict clinical status and are associated with PFS. Tovar et al.51 expanded on these results and suggested that abnormal HLC ratios in patients in complete remission after myeloablative transplantation may be a surrogate for immune recovery. These and our own results position HLC measurements as an indicator of both residual tumour cells and normalisation of the immune system, and thus support the current view that HLC assessment could form part of a composite end-point, together with cellular, molecular and possibly imaging techniques, to define an MRD-negative state20 that identifies the patients most likely to experience long-term survival.

Our study has limitations. Patients with light chain myeloma and oligosecretory disease were not included in our analysis. Comparisons were restricted to a single time-point, namely before maintenance therapy. The prognostic value of HLC and its agreement with MRD should be evaluated against next-generation techniques with a sensitivity of 10⁻⁵, at other clinically relevant milestones including pre-transplant and day +100 post-transplant, and in different treatment and risk groups. At the time of analysis, median overall survival had not been reached; follow-up analyses must explore the significance of our results for survival. Finally, although overall there is reasonable concordance between immunophenotypic, molecular and imaging methods for MRD detection, HLC should be compared independently against each of these techniques.