Implications and prognostic impact of mass spectrometry in patients with newly-diagnosed multiple myeloma

Mass spectrometry (MS) is a promising tool for monitoring monoclonal protein in plasma cell dyscrasias. We included 480 transplant-eligible newly-diagnosed multiple myeloma (MM) patients from the GMMG-MM5 trial (EudraCT No. 2010-019173-16) and performed a retrospective MS analysis at baseline (480 patients) and at the pre-defined, consecutive time points after induction (444 patients), prior to maintenance (305 patients) and after one year of maintenance (227 patients). We found that MS negativity was significantly associated with improved progression-free survival (PFS) even in patients with complete response (CR) at all investigated follow-up time points. The prognostic impact was independent of established risk factors, such as the revised International Staging System. Combining MS and baseline cytogenetics improved the prediction of outcome: MS-positive patients with high-risk cytogenetics had a dismal PFS of 1.9 years (95% confidence interval [CI]: 1.6–2.3 years) from the start of maintenance. Testing the value of sequential MS prior to and after one year of maintenance, patients converting from MS positivity to negativity had an excellent PFS (median not reached) while patients converting from MS negativity to positivity progressed early (median 0.6 years, 95% CI: 0.3-not reached). Among patients with sustained MS positivity, the baseline high-risk cytogenetic status had a significant impact and defined a group with poor PFS. Combining minimal residual disease (MRD) in the bone marrow and MS allowed the identification of double negative patients with a favorable PFS (median 3.33 years, 95% CI: 3.08-not reached) and no overall survival events. Our study provides strong evidence that MS is superior to conventional response monitoring, highlighting the potential of MS to become a new standard. Our data indicate that MS should be performed sequentially and combined with baseline disease features and MRD to improve its clinical value. 
Clinical Trials Register: EudraCT No. 2010-019173-16


INTRODUCTION
Novel therapeutics have significantly improved response rates as well as the depth of response in patients with multiple myeloma (MM) [1]. However, despite complete responses (CR), patients stratify into those who achieve long-term remission and those who relapse within a few months [2]. Hence, there is a strong need for techniques that track residual disease with increased sensitivity and specificity as compared to the conventional response criteria [3]. Molecular techniques, such as next-generation flow cytometry or next-generation sequencing, fulfill both criteria but come with drawbacks. They require inconvenient and painful procedures to obtain bone marrow samples and do not account for the potential inhomogeneous distribution of residual disease (e.g. focal intramedullary or extramedullary disease), since they are based on a sample from a randomly selected site at the iliac crest [4].
Mass spectrometry (MS), a minimally invasive technology that is amenable to automation, is emerging as a promising approach for detecting and monitoring monoclonal proteins in the peripheral blood (PB) [5][6][7][8][9]. MS has been shown to be superior to standard electrophoretic methods for the detection of monoclonal immunoglobulins, such as serum immunofixation (IFE) [10][11][12][13]. Furthermore, recent data suggest a role for MS as a complementary approach for the detection of minimal residual disease (MRD) [11,13], overcoming the limitations of bone marrow-based methods for identifying systemic disease.
To provide further evidence of the clinical value of MS and to explore whether it provides independent prognostic information, we have retrospectively tracked treatment response by serum MS in patients who had been enrolled in the German-speaking Myeloma Multicenter Group (GMMG) multicenter phase III GMMG-MM5 trial (EudraCT No. 2010-019173-16). The primary endpoint of the trial investigated continuation versus omission of lenalidomide maintenance for patients achieving a CR [14], which allowed us to compare the prognostic impact of MS in CR patients with or without maintenance. Other strengths of this study include 1) a rather homogeneous treatment with a proteasome inhibitor-containing induction therapy followed by high-dose melphalan (HDM) and autologous stem-cell transplantation (ASCT) and lenalidomide consolidation/maintenance, 2) the availability of a baseline sample in all patients, which allowed us to determine the unique mass of the monoclonal protein for tracking, and 3) the availability of comprehensive clinical and cytogenetic data.

PATIENTS AND METHODS
Study design and participants
For quantitative immunoprecipitation mass spectrometry (QIP-MS), we included peripheral blood (PB) serum samples from 480 GMMG-MM5 patients at baseline, from 444 of these patients after three cycles of either VCD or PAd induction therapy and prior to HDM/ASCT, from 305 patients prior to maintenance treatment or observation in case of CR in arm B of the trial, and from 227 patients after one year (±3 months) of maintenance treatment or observation. The design and main outcomes of the prospective, open-label, multicenter phase III GMMG-MM5 trial, which enrolled a total of 604 transplant-eligible patients with newly-diagnosed MM, have been previously reported [14]. The study design is also shown in Supplementary Fig. 1. Response within the trial was assessed according to the International Myeloma Working Group (IMWG) criteria as described [14,15]. Response assessment was performed using serum electrophoresis (SPEP) and IFE in the serum or urine to quantify and detect the monoclonal protein. For CR assessment, a bone marrow puncture was mandatory and less than 5% plasma cells (by cytology or histology) in the bone marrow were required. Yet, bone marrow punctures were not mandatory for patients in arm A. All patients provided written informed consent. The trial was approved by the ethics committee of the University of Heidelberg and all participating sites and was conducted according to the European Clinical Trial Directive and the Declaration of Helsinki.

Mass spectrometry for detection of monoclonal immunoglobulins
QIP-MS was carried out using the automated EXENT® assays and system (The Binding Site Group Ltd., UK; assays and system in development). Briefly, sheep polyclonal antibodies (anti-IgG, -IgA, -IgM, -total κ, -total λ, free κ and free λ) covalently attached to paramagnetic microparticles were separately incubated with serum to enrich for immunoglobulins. The microparticles were washed and treated to simultaneously elute and reduce patient immunoglobulins into their constituent heavy and light chains. Light chain mass spectra were acquired by matrix-assisted laser desorption ionization time-of-flight MS (MALDI-TOF MS). Readout and interpretation were performed using proprietary software to yield immunoglobulin isotype and mass-to-charge ratio (m/z). The patient's specific molecular mass of the monoclonal light chain was defined at baseline and was used to track the presence of the monoclonal protein during follow-up. A positive score by mass spectrometry in follow-up samples was based on the presence of a monoclonal protein at the same m/z (±10 for the doubly charged light chain) as determined at baseline. Two experienced analytical scientists blinded to all clinical information reported on the MS spectra.
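The tracking rule described above can be sketched in a few lines. This is only an illustration and not the proprietary readout software: the function name, the representation of a spectrum as a list of already-detected monoclonal peak positions, and the example m/z values are all hypothetical. Only the ±10 m/z window for the doubly charged light chain is taken from the text.

```python
from typing import Iterable

# +/- window around the baseline m/z of the doubly charged light chain (from the text)
MZ_TOLERANCE = 10.0


def is_ms_positive(baseline_mz: float,
                   followup_peaks_mz: Iterable[float],
                   tolerance: float = MZ_TOLERANCE) -> bool:
    """Return True if any follow-up peak lies within `tolerance` of the baseline m/z.

    Peak detection itself is assumed to have happened upstream; this helper
    (a hypothetical name) only applies the matching window.
    """
    return any(abs(peak - baseline_mz) <= tolerance for peak in followup_peaks_mz)


# Hypothetical values, for illustration only
baseline = 11750.0
print(is_ms_positive(baseline, [11320.4, 11756.2]))  # one peak is 6.2 away from baseline
print(is_ms_positive(baseline, [11320.4, 11790.0]))  # nearest peak is 40.0 away
```

A fixed absolute window keeps the rule simple, and the baseline sample is what makes the match patient-specific.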
Allele-specific oligonucleotide polymerase chain reaction for detection of minimal residual disease
For MRD analyses, DNA was extracted after density gradient separation of lymphocytes from bone marrow (BM) aspirates, which was then stored at −20°C until analysis. We used patient-specific quantitative allele-specific oligonucleotide PCR (qASO-PCR) assays on immunoglobulin heavy chain (IgH) and kappa/lambda (κ/λ) light chain as recently described [20,21]. For MRD-negative results, a minimum of 10^6 cell equivalents had to be tested without any positive amplification if this amount of material was available to reach a sensitivity for MRD negativity of at least 1 × 10^−6.

Statistical design and analysis
The Kaplan-Meier method was used for survival analyses. PFS time was measured from the respective landmark to relapse or death from any cause, whichever occurred first. OS was defined as time from the respective landmark until death from any cause. For analyses of sustained MS from start of maintenance/observation until one year, PFS and OS were calculated from the second time point (after one year) and only patients who did not have a prior progression event were included. MS test results were evaluated in a multivariable Cox regression model including established risk factors. Main analyses were undertaken using R (v4.0.4) software.
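The landmark definition and the Kaplan-Meier estimate described above can be illustrated with a small, dependency-free sketch. The trial analyses were run in R; the Python code below is not that code, all function names are hypothetical, and the follow-up data are invented for illustration. The median is taken as the first event time at which the estimated survival falls to 0.5 or below, which is one common convention.

```python
from typing import List, Optional, Tuple


def from_landmark(times: List[float], events: List[bool],
                  landmark: float) -> Tuple[List[float], List[bool]]:
    """Keep patients still event-free and in follow-up at the landmark; reset time zero."""
    kept = [(t - landmark, e) for t, e in zip(times, events) if t > landmark]
    return [t for t, _ in kept], [e for _, e in kept]


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return [(event_time, survival_probability)] for right-censored data."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations that share the same time point
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time the estimate drops to <= 0.5; None means 'not reached'."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented follow-up in years from study entry; True = progression or death
times = [0.4, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
events = [True, True, False, True, True, False, False]
lm_times, lm_events = from_landmark(times, events, landmark=1.0)
curve = kaplan_meier(lm_times, lm_events)
print(curve)
print(median_survival(curve))
```

Returning None for a median that is never crossed mirrors the "not reached" entries reported in the results.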

RESULTS
Mass spectrometry results are associated with outcome at multiple time points
We used QIP-MS for longitudinal monitoring of 480 patients with PB serum samples at baseline and at least one additional time point. Combining MS for intact immunoglobulins and free light chains, a monoclonal immunoglobulin could be identified at baseline for each patient, which allowed us to longitudinally track the respective tumor clone in all patients. An example is shown in Fig. 1. Baseline characteristics of the cohort are presented in Table 1. Median follow-up of the cohort was 57 months for both PFS (interquartile range: 49-64 months) and OS (interquartile range: 50-66 months) with 296 PFS and 131 OS events, respectively.
A recent study demonstrated that the extended half-life of IgG could impact disease monitoring by MS [22]. In line with this, only 2% of IgG but 18% of Bence Jones (BJ) MM patients were MS negative after induction in our study (p < 0.001, Supplementary Table 1). A significant difference between these two groups (24% vs. 54%, p < 0.001) was also seen before maintenance/observation. To account for the extended half-life of IgG, Abeykoon et al. proposed to perform MS at least 6 months from ASCT or thereafter. In support of this notion, there was no significant difference in MS negativity between IgG and BJ MM patients (40% vs. 53%, p = 0.2) after one year of maintenance/observation in our study, which contributes to the strong prognostic value of single MS testing at this later time point.
The mass spectrometry test result constitutes an independent prognostic factor
Next, we evaluated the prognostic value of MS testing at the three defined time points in a multivariable model, which included age at diagnosis, gender, treatment arm, R-ISS stages and gain(1q21) status (Fig. 3A, B). Furthermore, we included conventional response (CR vs. no CR), with CR being defined as <5% plasma cells in the bone marrow and absence of the monoclonal protein in the serum and urine according to SPEP and IFE. We focused on PFS and the two time points prior to and after one year of maintenance/observation. For completeness, OS results for these time points are shown in Suppl. Fig. 2A, B. The MS test result was an independent prognostic factor for PFS (prior to maintenance therapy: HR = 0.60, 95% CI: 0.40-0.90, p = 0.01; and after 1 year of maintenance therapy/observation: HR = 0.28, 95% CI: 0.17-0.45, p < 0.001). Other significant prognostic factors for PFS were R-ISS stage III and gain(1q21).
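For readers who want to relate a reported hazard ratio, its confidence interval and its p-value, the following sketch back-calculates the log hazard ratio and its standard error from the rounded figures above. It assumes that the intervals are Wald intervals, symmetric on the log scale, which is the usual output of a Cox model but is not stated in the text. The function names are hypothetical, and because the inputs are rounded the results are approximate.

```python
import math


def log_hr_and_se(hr: float, ci_low: float, ci_high: float,
                  z_crit: float = 1.96) -> tuple:
    """Recover the log hazard ratio and its standard error from a 95% Wald CI."""
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    return beta, se


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value of the Wald test under a standard normal reference."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))


# Rounded values as reported in the text
for label, hr, lo, hi in [
    ("prior to maintenance", 0.60, 0.40, 0.90),
    ("after 1 year", 0.28, 0.17, 0.45),
]:
    beta, se = log_hr_and_se(hr, lo, hi)
    print(label, round(beta, 3), round(se, 3), wald_p_value(beta, se))
```

Under these assumptions the back-calculated p-values agree with the reported p = 0.01 and p < 0.001.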

Mass spectrometry improves the prognostic value of established risk markers
The prognostic value of the depth of response, as assessed by conventional response, MRD in the BM or functional imaging, is impacted by baseline disease features, such as high-risk cytogenetics [23,24]. The results of our multivariable survival model suggest that the same holds true for response assessment by MS. Indeed, when combining the FISH high-risk markers of the R-ISS, del(17p13), t(4;14), and t(14;16), as well as gain(1q21) at baseline with MS test results before maintenance/observation, patient groups with excellent or dismal outcomes could be defined (Fig. 3C). The best outcome was seen for patients without high-risk cytogenetics and MS negativity (median PFS: 4.8 years, 95% CI: 3.3-not reached), while patients with high-risk FISH and MS positivity had a median PFS of just 1.9 years (95% CI: 1.6-2.3 years) from the start of maintenance/observation. Highlighting the value of combining molecular and response data, patients with high-risk cytogenetics but a negative MS test had a significantly better outcome compared to high-risk patients with MS positivity (HR = 0.54, 95% CI: 0.34-0.87, p = 0.01). As shown in Fig. 3D, separation was even more pronounced when combining MS test results and cytogenetic risk status after one year of maintenance/observation. The best PFS outcome was again observed for patients without high-risk cytogenetics and MS negativity (median PFS: not reached), while MS testing was able to significantly discriminate favorable vs. dismal outcome in patients with high-risk cytogenetics (median PFS, MS negative: 3.3 years, 95% CI: 2.6-not reached vs. MS positive: 1.3 years, 95% CI: 0.9-1.9; HR = 0.29, 95% CI: 0.16-0.52, p < 0.001). OS results for the two time points are shown in Supplementary Fig. 2C, D.
Mass spectrometry in patients with complete response and the impact of lenalidomide maintenance
We recently showed that omitting maintenance in CR patients resulted in significantly worse outcomes [14], suggesting a high proportion of CR patients with significant residual disease requiring continued treatment. In line with this observation, we ascertained MS positivity in 41% (40/98) of CR patients prior to maintenance/observation, and in a landmark analysis from start of maintenance these patients had a median PFS of just 1.7 years (Fig. 4A, Supplementary Fig. 3A).
The high positivity rate of MS in CR patients and its prognostic impact indicate that MS is superior to IFE in terms of sensitivity, independent of IMWG response criteria. Indeed, 78% (69/89), 50% (78/156) and 37% (49/133) of all patients with a negative IFE were still positive in MS after induction, prior to maintenance/observation and after one year of maintenance/observation, respectively (Supplementary Table 2).
To address the impact of lenalidomide maintenance in MS positive and negative CR patients, we compared CR patients in arm A of the GMMG-MM5 trial, who received lenalidomide maintenance, with CR patients in arm B without maintenance (Fig. 4B and Supplementary Fig. 3B). Lenalidomide increased PFS in both groups, MS positive (1.4 (95% CI: 0.6-3.7) years vs. 2.1 (95% CI: 1.5-not reached) years) and MS negative (not reached vs. 3.3 (95% CI: 2.3-not reached) years) patients, suggesting that even MS negative CR patients benefit from lenalidomide maintenance. Yet, likely due to the small sample size of these subgroups, results did not reach statistical significance (p = 0.14 and p = 0.06, respectively).

Sequential testing improves the prognostic value of mass spectrometry
Long-term deep responses have been associated with improved survival in MM [25,26]. Thus, we aimed to determine the prognostic value of sequential MS testing and chose the time points prior to maintenance/observation and after 1 year. In a landmark analysis after one year of maintenance/observation, the best outcome was seen for 28 patients who converted from MS positivity to negativity (median PFS not reached, Fig. 5A). Among those, we observed a trend towards an enrichment of IgG MM patients (18% [22/125] IgG, 9% [6/66] non-IgG, p = 0.14), and only 7% (2/28) of these patients had not received lenalidomide maintenance (patients in CR in the observation arm).
The worst outcome was seen for 6 patients who converted from MS negativity to positivity with a median PFS of only 0.6 years (95% CI: 0.3 years-not reached) reflecting early disease progression. Sustained negativity (MS tests at both time points negative), which was seen in 56 patients, was associated with improved PFS (median PFS: 3.5 years, 95% CI: 2.7 years-not reached) as compared to sustained positivity (n = 101, median PFS: 1.9 years, 95% CI: 1.4-2.9 years; HR = 0.51, 95% CI: 0.31-0.83, p = 0.007). We did not observe significant OS differences between the four groups, likely due to the small number of total OS events (n = 20, Supplementary Fig. 4A).
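The four sequential groups compared above follow directly from the paired MS results at the two time points. The sketch below makes the grouping explicit; the function name and group labels are hypothetical, and the patient-level list is reconstructed purely from the group sizes reported in the text (101, 28, 6 and 56), not from trial data.

```python
from collections import Counter


def conversion_group(ms_before: bool, ms_after: bool) -> str:
    """Map MS status (True = positive) at the two time points to one of four groups."""
    if ms_before and ms_after:
        return "sustained positive"
    if ms_before and not ms_after:
        return "positive to negative"
    if not ms_before and ms_after:
        return "negative to positive"
    return "sustained negative"


# Paired results rebuilt from the reported group sizes (illustration only)
pairs = (
    [(True, True)] * 101      # sustained positivity
    + [(True, False)] * 28    # converted to negativity
    + [(False, True)] * 6     # converted to positivity
    + [(False, False)] * 56   # sustained negativity
)
counts = Counter(conversion_group(before, after) for before, after in pairs)
print(counts)
print(sum(counts.values()))
```

The total of 191 patients matches the denominators given for the isotype comparison (125 IgG plus 66 non-IgG).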

Mass spectrometry complements bone marrow minimal residual disease assessment
The position of MS regarding detection of MRD in the BM is one important question. MRD data were not systematically collected within the GMMG-MM5 trial. However, for 45 patients BM MRD (sensitivity 1 × 10^−6) and MS data were available for one time point after HDM/ASCT. Details on time points of MRD evaluations are shown in Suppl. Table 3. Residual disease with at least one method, either MS or MRD, was detectable in 36 patients (80%). The worst PFS from the time point of MRD evaluation post HDM/ASCT was seen for the 17 double-positive patients (median: 2.15 years, 95% CI: 1.32 years-not reached, Fig. 6A). Yet, we did not detect a significant difference when comparing them to patients who were only positive for MRD (2.40 years, 95% CI: 2.00-not reached) or MS (2.45 years, 95% CI: 0.75-not reached) (p > 0.05). As expected, the best PFS was seen for the double-negative patients but we still observed disease relapses in this group (median PFS: 3.33 years, 95% CI: 3.08-not reached). We did not observe deaths in this subgroup within the observation time (Fig. 6B).

DISCUSSION
MS has been proposed as a minimally invasive complementary approach for monitoring of residual disease in MM patients [5].
Here we show that QIP-MS is superior to conventional response assessment in newly-diagnosed MM in terms of sensitivity and prognostic value, which is in line with recently published data from the STAMINA trial using a comparable MALDI-TOF MS approach [13]. Compared to this study, we observed a lower proportion of [...] In both the STAMINA [13] and our own study, MS positivity had a significant negative impact on PFS even in CR patients, suggesting that positivity does not just reflect residual circulating monoclonal protein but derives from treatment-resistant tumor cells that constitute a source of disease relapse. However, we also show that the long half-life of the monoclonal protein in patients with IgG MM impacts disease monitoring, especially at early time points. We no longer observed a significant difference between isotypes after one year of maintenance/observation, but the optimal time point for testing still remains to be determined. In contrast to PFS, we observed only a minor or no impact on OS, probably due to limited follow-up and heterogeneous salvage therapies.
From a clinical point of view, an important question is whether QIP-MS results can be used to guide treatment decisions. While we show a significant impact of single MS testing on PFS, we demonstrate that sequential MS testing, and combination of test results with baseline biological tumor features, could strongly improve the clinical value of this technique. For instance, there were GMMG-MM5 patients who converted from MS positivity to negativity during lenalidomide maintenance. Although this subgroup showed a trend towards an enrichment of IgG MM, the excellent PFS indicates that an extended half-life of the monoclonal protein did not solely underlie this observation. Of note, the patients were not exposed to an immunomodulatory agent prior to HDM/ASCT. Thus, this subgroup most probably reflects an explicitly lenalidomide-sensitive treatment group. In contrast, patients who converted from negative to positive MS had the worst outcome, highlighting MS as a method for early detection of emerging relapse in line with other studies [11,27]. These findings exemplify how MS testing could inform therapeutic decision making in both directions: to proceed with the current treatment if the patient becomes negative or switch to alternative treatment if the patient converts from negativity to positivity or remains MS positive, especially if the patient has high-risk status at baseline. We fully appreciate that the value of MS in therapeutic decision making needs to be addressed in the setting of prospective clinical trials.
In line with recent studies using conventional CR or MRD negativity to define the level of response [25,26,28], long-term deep responses according to MS were associated with improved outcome in our study. Yet, we need to emphasize that even MS negative patients benefited from lenalidomide maintenance. A significant proportion of patients who were still negative after one year of maintenance suffered from progressive disease during the observation period of the clinical trial, indicating that QIP-MS alone is not sensitive enough to identify patients who are eligible for treatment holidays. One potential solution would be to combine MS with highly sensitive BM MRD techniques. Besides its retrospective nature, the lack of comprehensive MRD data in the GMMG-MM5 trial constitutes one of the major limitations of our study. BM MRD results were only available for a subset of patients and a comparison with MS indicates that the two approaches are complementary, which is in line with recent studies [13,29]. In these studies, only double-positive patients had poor outcome [13,29]. We did not detect a significant difference between double-positive patients and patients who were positive just by one technique, which could be explained by the limited patient number and follow-up. Interestingly, we observed disease progression but no deaths in double-negative patients. Disease progression in this subgroup was probably due to the limited duration of lenalidomide maintenance (maximum up to 2 years) in our trial. However, it also highlights that even a combined residual disease approach with a highly sensitive BM MRD tool was not sufficient to identify disease-free MM patients. Yet, we fully appreciate that further studies are needed to evaluate the complementary or combinatorial, and sequential value of MS testing and MRD status as well as combination with other methods, such as functional imaging. These attempts are ongoing.
In conclusion, MS is a promising tool for monitoring treatment response. Recent data from the STAMINA trial [13], a further study from the Mayo Clinic [7], and our own study strongly indicate that MS could replace immunofixation for the definition of CR and should be considered for IMWG response criteria. Though single MS testing should not be used to assess residual disease or prognosis, combination with baseline disease features and MRD from the BM as well as sequential MS testing improve outcome prediction. Further studies are warranted to determine the optimal time for testing and the utility of this novel approach in combination with MRD testing from the BM or imaging-based techniques and whether MS can guide therapeutic decisions.

DATA AVAILABILITY
Data from the GMMG-MM5 trial and QIP-MS samples are not publicly available. For requests, please contact the corresponding author.