Introduction

In recent years, the landscape of treatment of multiple myeloma (MM) has substantially changed1, 2, 3, 4 leading to significantly increased complete response (CR) rates and survival.5, 6, 7, 8, 9, 10, 11 Therefore, CR has become a major goal in newly diagnosed MM, even in elderly patients.5, 12, 13, 14, 15 Despite these advances, most patients in CR will ultimately relapse.7, 10 Consequently, better insight into the depth of treatment response is required, and more sensitive methods are needed for the detection of minimal residual disease (MRD), particularly in cases that have reached CR and stringent CR (sCR).16

Conventional 4–8-color flow cytometry,17, 18, 19, 20, 21, 22, 23, 24 and to a lesser extent also allele-specific oligonucleotide quantitative PCR (ASOqPCR) and next generation sequencing (NGS),25, 26, 27, 28, 29 are increasingly being used to monitor MRD in the bone marrow (BM) of MM patients after therapy.16, 30 These studies have confirmed the relevance of MRD measurements for the identification of MM patients at higher risk of relapse.16, 22 However, despite the greater sensitivity of MRD approaches (vs classical CR/sCR criteria), distinguishing, among MRD− cases, patients who will eventually relapse from those who are potentially cured still remains a challenge, implying that more sensitive MRD approaches are needed.16, 31

High sensitivity and broad applicability have both become mandatory requirements for MRD monitoring in MM.16 Early studies have shown that conventional 4–6-color flow-MRD is applicable in virtually all MM patients (95%), whereas ASOqPCR and NGS have a more restricted applicability (50–90% of cases), mainly due to the high number of somatic hypermutations, which cause variable levels of primer annealing with unpredictable amplification/quantitation results.16, 25, 32, 33, 34 However, the sensitivity of conventional flow-MRD (<10⁻⁴) remains (systematically) lower than that of ASOqPCR and NGS (<10⁻⁴ to 10⁻⁶).25, 26, 29, 35 More recently, several studies in MM have shown that conventional 8-color flow-MRD assays have an increased sensitivity (limit of detection (LOD) of between <10⁻⁴ and <10⁻⁵), leading to a significantly improved prediction of outcome.24, 36 However, current flow-MRD assays and NGS still suffer from a major limitation: lack of standardization.37 Indeed, different markers and antibody panels, distinct numbers of cells measured, and highly variable criteria for MRD positivity are currently applied worldwide.38, 39 Therefore, standardization efforts have been made, and consensus recommendations and guidelines have recently been proposed.40, 41, 42 However, such consensus recommendations still rely on subjective ‘expert-shared’ knowledge and experience and do not completely solve the lack of technical standardization, and prospectively validated flow-MRD approaches are still missing.39

Here we report on the design of a EuroFlow-based next generation flow (NGF) approach for highly sensitive and standardized detection of MRD in MM, and the results of its validation vs a conventional 8-color flow-MRD method24 and NGS. The novel NGF-MRD approach takes advantage of innovative tools and procedures recently developed by the EuroFlow Consortium for sample preparation, antibody panel construction (including choice of type of antibody and fluorochrome), and automatic identification of plasma cells (PC) against reference databases of normal and patient BM.43, 44, 45, 46 Prospective validation of the whole procedure at two distinct centers confirmed its robustness and significantly greater sensitivity vs conventional 8-color flow-MRD approaches, comparable to current NGS methods, with an improved prediction of patient outcome.

Subjects and methods

Patients, controls and samples

A total of 375 BM and 10 peripheral blood EDTA-anticoagulated samples from 53 controls and 332 adult plasma cell disorder (PCD) patients were studied (Supplementary Table 1) in order to: design Version 1 of the MM-MRD antibody panel (n=94; 31 normal/reactive and 63 MM samples studied at diagnosis); compare the performance of the antibody reagents evaluated in Versions 2–5 of the NGF-MRD panel (19 diagnostic BM patient samples); evaluate distinct sample preparation protocols (n=10 peripheral blood and 8 BM samples); and validate NGF (Version 5) against conventional 8-color flow-MRD (n=244 consecutively recruited samples corresponding to 16 healthy donors, 66 PCD samples studied at diagnosis and 162 MM patient samples investigated during follow-up, including 110 follow-up BM samples from MM patients evaluated at very good partial response (VGPR) or better: 39 VGPR, 52 CR and 19 sCR cases). Twenty-one VGPR BM samples and 10 additional BM samples (22/31 with very low MRD levels, ≤10⁻⁴) were further used to blindly compare NGF vs NGS (Supplementary Table 1).

The study was approved by the local ethics committees and written informed consent was given by each donor according to the Declaration of Helsinki. All samples were processed at each participating center (USAL/HUSAL, UNAV, EMC, UNIKIEL, IPOLFG, SUM) within 24 h after collection.

Immunophenotypic studies for selection of plasma cell (PC)-associated markers

BM samples used to establish Version 1 (Table 1) of the MM-MRD panel were stained with the EuroFlow PCD 8-color panel for a total of 12 different markers (CD38, CD138, CD45, CD19, CD27, CD28, CD56, CD81, CD117, CyIgκ, CyIgλ and β2-microglobulin), using previously described EuroFlow sample preparation protocols.44, 45 For data analysis, events from both 8-color tubes (per sample) were merged, and the values of all parameters measured per tube were mathematically calculated for the individual PC events using the merge and calculation functions of the Infinicyt software (Cytognos SL, Salamanca, Spain).43, 44, 45 Subsequently, phenotypic data on normal PC (nPC) from 31 normal/reactive BM samples were merged into a reference database. PC were identified based on their unique pattern of expression of CD38, CD138 and CD45 and their light scatter features, following consensus recommendations.40, 41, 42 Principal component analysis44, 45 of single PC events was used to compare the reference database of nPC vs aberrant/clonal PC (aPC) from each of the 63 MM BM samples studied at diagnosis, in order to identify the markers that best discriminated between (reference) nPC and aPC from individual MM patients (Figure 1) and to establish the applicability of the method. CD200 expression was also evaluated on PC in a subset of 28 MM cases.
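The principal component analysis used to rank markers can be illustrated with a minimal NumPy sketch. It assumes that per-event, already compensated and transformed marker intensities are available as arrays (rows = PC events, columns = markers); it only centres the data, and it defines a marker's percentage contribution to PC1 as its squared PC1 loading × 100. These choices, the function name `pc1_marker_contributions` and the simulated data are illustrative assumptions, not the Infinicyt implementation.

```python
import numpy as np

def pc1_marker_contributions(npc, apc, markers):
    """Rank markers by their contribution to PC1 of pooled nPC + aPC events.

    npc, apc: arrays of shape (n_events, n_markers) with per-event intensities.
    Contribution (an assumption) = squared PC1 loading x 100, so all sum to 100.
    """
    X = np.vstack([npc, apc]).astype(float)
    X -= X.mean(axis=0)                        # centre each marker (no scaling)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc1 = vt[0]                                # unit-length PC1 axis (loadings)
    contrib = 100.0 * pc1 ** 2                 # squared loadings sum to 100 %
    order = np.argsort(contrib)[::-1]          # most discriminating marker first
    scores = X @ pc1                           # PC1 coordinate of every event
    return [(markers[i], float(contrib[i])) for i in order], scores

# Simulated data: aPC differ from nPC only by a strong loss of CD19.
rng = np.random.default_rng(0)
markers = ["CD19", "CD56", "CD81"]
npc = rng.normal(size=(500, 3))
apc = rng.normal(size=(500, 3))
apc[:, 0] -= 5.0
ranking, scores = pc1_marker_contributions(npc, apc, markers)
print(ranking[0][0])                           # CD19 dominates PC1
```

Standardizing each marker before the SVD is a common alternative; it changes the relative weights, so the choice should match how the reference database was built.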

Table 1 Multiple myeloma NGF-MRD: 8- and 10-color antibody panels evaluated from the first (Version 1) to the final version (Version 5)
Figure 1

Diagram illustrating the process used for the selection and evaluation of markers for the NGF MM MRD panel. Left and lower right panels show the sequential steps followed to select those markers providing the best resolution between aPC and nPC, including principal component 1 (PC1) vs PC2 (automatic population separator 1, APS1) plots illustrating the comparisons described for steps 1, 2 and 4, respectively. (a, right) APS1 plot corresponding to the simultaneous evaluation of 31 normal/reactive BM samples vs all 63 MM patients studied at diagnosis, illustrating the resolution power of the EuroFlow diagnostic PCD antibody panel combination. Please note that PC from five samples (highlighted by the black arrows) showed suboptimal separation in the overall comparison; nonetheless, when individually compared vs the normal/reactive reference PC pool (b–f), these cases also showed sufficient phenotypic discrimination from nPC. Markers contributing to PC1 (and their percentage contribution) for each panel on the right include: (a) CD19(20%), CD56(17%), CD81(13%), CD45(11%), CD117(10%), CD27(8%), CyIgλ(5%), CD38(5%), CyIgκ(4%), β2 micro(3%), CD138(3%), CD28(1%); (b) CyIgκ(27%), CyIgλ(13%), β2 micro(12%), CD38(11%), CD56(10%), CD138(7%), CD28(16%), CD45(4%), CD27(4%), CD117(4%), CD19(2%), CD81(0%); (c) CyIgκ(20%), CD45(16%), CD56(11%), CD28(10%), CD19(10%), CD27(8%), CD117(6%), CD81(6%), CD38(6%), CyIgλ(4%), CD138(3%), β2 micro(0%); (d) CD19(29%), β2 micro(13%), CyIgκ(13%), CD56(13%), CD45(11%), CD38(5%), CD27(4%), CyIgλ(3%), CD28(3%), CD138(2%), CD117(2%), CD81(2%); (e) CD19(23%), β2 micro(13%), CD45(12%), CyIgλ(11%), CD81(9%), CD56(7%), CyIgκ(6%), CD38(6%), CD28(4%), CD138(4%), CD27(4%), CD117(1%); (f) CD19(20%), CD117(19%), CD81(14%), CD45(13%), CD56(12%), CyIgλ(9%), β2 micro(6%), CD38(3%), CD27(2%), CD28(1%), CD138(1%), CyIgκ(0%).
In all PC1 vs PC2 plots, solid circles represent median values for the 12 fluorescence-associated parameters evaluated, and inner (dotted) and outer (solid) lines represent the first and second standard deviations for individual PC. nPC and aPC are depicted in green and red (dots, circles and lines), respectively.

Antibody reagents directed against the same marker (Supplementary Table 2) were compared based on their staining profiles on nPC vs aPC (vs other non-PC BM populations), as defined by median fluorescence intensity (range: 0–262,144 arbitrary units) and stain index values, as previously described47 (Supplementary text). The EuroFlow reagent evaluation criteria44 were used to discard or accept individual reagents, based on: (i) increased background fluorescence; (ii) dim fluorescence of PC vs other BM populations (CD38, CD138, CyIgκ+ or CyIgλ+) or of the positive vs negative BM reference populations (CD19, CD27, CD56, CD45, CD81 vs CD117); and (iii) interaction with the staining pattern of other reagents.
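Stain index values can be computed as in the sketch below. It assumes the widely used definition SI = (MFI_pos − MFI_neg) / (2 × SD_neg), with medians used as MFI; the exact variant applied in reference 47 may differ (for example, by using a robust SD), and the function name and example intensities are hypothetical.

```python
import statistics

def stain_index(positive, negative):
    """Stain index = (median_pos - median_neg) / (2 * SD_neg).

    positive / negative: fluorescence values (arbitrary units) of one reagent
    on the positive (e.g. PC) and negative reference populations.
    """
    mfi_pos = statistics.median(positive)
    mfi_neg = statistics.median(negative)
    sd_neg = statistics.stdev(negative)        # spread of the negative population
    return (mfi_pos - mfi_neg) / (2 * sd_neg)

# Two hypothetical conjugates of the same marker, same negative population.
neg = [90, 100, 110]
conj_a = [5000, 5100, 5200]
conj_b = [1100, 1200, 1300]
si_a = stain_index(conj_a, neg)                # 250.0
si_b = stain_index(conj_b, neg)                # 55.0: dimmer, weaker separation
```

A higher stain index means better separation of positive from negative cells relative to the background spread, which is the property the dim-fluorescence criterion above is screening for.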

Design and evaluation of sample preparation protocols

For the evaluation of different sample preparation protocols, eight BM and 10 peripheral blood samples were stained in parallel with the CD138-HV450 CD45-PacO CD56-PE CD5-PerCPCy5.5 CD19-PECy7 CD3-APC antibody combination under five different conditions: (i) the EuroFlow standard operating procedure (SOP) for staining of 50 μl of sample with cell surface markers,44 and (ii) four different ammonium chloride-based bulk-lysis procedures followed by staining of 5 × 10⁶ cells in 100 μl/tube (final concentration of 5 × 10⁴ cells/μl), as described in detail in the Supplementary text.

Validation of the NGF MM-MRD method

Overall, 228 MM diagnostic (n=66) and follow-up (n=162) BM samples (n=110 in VGPR or CR/sCR) were evaluated in parallel with the NGF MRD approach vs local routine flow-MRD methods (that is, the conventional 8-color flow-MRD technique).24 Detailed descriptions of these BM samples, related patient clinical data, disease status and time points at evaluation are provided in Supplementary Tables 1, 3 and 4. Briefly, conventional flow-MRD was based on staining of 300 μl of whole BM with a single 8-color antibody combination (CD45-PacB CD138-OC515 CD38-FITC CD56-PE CD27-PerCPCy5.5 CD19-PECy7 CD117-APC CD81-APCH7), as previously described.24 In turn, for the NGF approach a median volume of 1.5±1.3 ml (range: 0.1–5.3 ml) was employed, adding up to a median total sample volume of 1.8 ml (maximum of 5.6 ml). PC populations that coexisted in individual BM samples were identified based on a combination of the CD38, CD138 and CD45 PC-associated markers and light scatter characteristics, the presence vs absence of myeloma-associated phenotypes, plus CyIg light chain restriction in the case of NGF, as described elsewhere.48 According to consensus recommendations,42 the limit of quantitation and the LOD of the NGF MRD method were calculated at <5 × 10⁻⁶ and <2 × 10⁻⁶ aPC, based on the identification of 50 and 20 aPC among 10⁷ events, respectively. More detailed information about instrument conditions, data acquisition and analysis, and the specific reagents used in the present study is provided as Supplementary Material.
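The LOD and limit of quantitation (LOQ) quoted above follow from dividing the minimum aPC event counts (20 and 50) by the number of acquired events. The sketch below encodes that arithmetic; `classify_result` is a hypothetical helper showing how a raw aPC count might be labelled against these thresholds, an illustration rather than the study's reporting rule.

```python
def lod_loq(total_events, lod_events=20, loq_events=50):
    """Return (LOD, LOQ) as fractions of all acquired events.

    Uses the event counts quoted in the text: 20 aPC for the LOD and
    50 aPC for the limit of quantitation.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return lod_events / total_events, loq_events / total_events

def classify_result(apc_events, lod_events=20, loq_events=50):
    """Illustrative label for a sample based on its raw aPC event count."""
    if apc_events >= loq_events:
        return "quantifiable"
    if apc_events >= lod_events:
        return "detected, below LOQ"
    return "below LOD"

lod, loq = lod_loq(10**7)          # 2e-06 and 5e-06 for 10^7 events
lod_small, _ = lod_loq(3 * 10**5)  # about 6.7e-05 with 3 x 10^5 events
```

Because the thresholds are fixed event counts, the achievable LOD scales inversely with the number of cells acquired, which is why acquiring 10⁷ events matters.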

Automatic identification and enumeration of aPC was performed in 110 MM BM follow-up (VGPR or CR/sCR) samples using the automatic gating function of the Infinicyt software and previously described procedures,49, 50 and the results were compared against the conventional expert-guided PC-identification/gating approach. For automatic gating, a database consisting of a subset of 14 normal BM samples stained with Version 5 of the antibody panel was constructed and used.50
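The idea of automated aPC identification against a normal reference database can be illustrated with a generic outlier rule. The sketch below is explicitly a stand-in, not the Infinicyt automatic gating algorithm (whose internals are not described here): it flags events whose Mahalanobis distance from the normal-PC reference cloud exceeds a cut-off. The function name, the default cut-off of 3 and the simulated reference are hypothetical.

```python
import numpy as np

def flag_outside_reference(reference, events, cutoff=3.0):
    """Flag events lying outside a normal-PC reference cloud.

    reference: (n_ref, n_markers) events from normal BM samples.
    events: (n_events, n_markers) PC events from the sample under study.
    Generic Mahalanobis-distance rule; cutoff is a hypothetical choice.
    """
    mu = reference.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(reference, rowvar=False))  # tolerates collinearity
    d = events - mu
    dist = np.sqrt(np.einsum("ij,jk,ik->i", d, inv_cov, d))    # per-event distance
    return dist > cutoff

rng = np.random.default_rng(0)
reference = rng.normal(size=(2000, 4))          # simulated normal reference PC
events = np.array([[0.0, 0.0, 0.0, 0.0],        # phenotypically normal event
                   [6.0, 6.0, 0.0, 0.0]])       # clearly aberrant event
flags = flag_outside_reference(reference, events)
pct_flagged = 100 * flags.mean()
```

Unlike this simple rule, a production algorithm also has to enforce a minimum cluster size, which is consistent with the Results noting that very small aPC populations can be missed by automatic gating.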

In a subset of 31 MM follow-up BM samples with low MRD levels (for example, ≤10⁻⁴) for which enough DNA was available, MRD was also evaluated by NGS. For this purpose, patient-specific VDJH rearrangements were amplified and directly sequenced from DNA extracted from diagnostic samples using the DNAzol reagent (MRC, Cincinnati, OH, USA) and IGHV family-specific primers that covered framework regions 1 (FR1) and FR2, plus a JH consensus primer, as described elsewhere.26, 32 VDJH rearrangements identified at diagnosis were used as MRD-target sequences for subsequent follow-up samples. Follow-up DNA samples were amplified using the LymphoTrack IGH Assay (InVivoScribe Technologies, San Diego, CA, USA) and sequenced on an Illumina MiSeq platform (Illumina, San Diego, CA, USA). A known amount of DNA from the MWCL-1 cell line was added to all reactions as a reference control for cell enumeration. The generated Fastq files were analyzed with the LymphoTrack/MiSeq Software (InVivoScribe/Illumina). The number of MRD cells was calculated from the number of reads for the diagnostic VDJH target rearrangements relative to the number of reads of the reference rearrangement; percentage MRD was calculated as the number of MRD cells divided by the total number of cells in the reaction, × 100.
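The read-count arithmetic described above can be sketched as follows. This is a minimal illustration assuming that the target and the MWCL-1 reference rearrangements amplify with equal efficiency, so that MRD cells scale linearly with the read ratio. The function names are hypothetical, and the conversion of 6.5 pg of DNA per diploid cell, used to estimate the total number of cells in the reaction, is an assumed approximation rather than a value stated in the text.

```python
def ngs_mrd_percent(target_reads, reference_reads, reference_cells, total_cells):
    """Percentage of MRD cells in an NGS reaction spiked with a reference cell line.

    Assumes equal amplification efficiency, so that
    MRD cells = target_reads / reference_reads * reference_cells.
    """
    if reference_reads <= 0 or total_cells <= 0:
        raise ValueError("reference_reads and total_cells must be positive")
    mrd_cells = target_reads / reference_reads * reference_cells
    return 100.0 * mrd_cells / total_cells

def cells_from_dna(dna_ng, pg_per_cell=6.5):
    """Approximate number of diploid cells in dna_ng nanograms of DNA (assumed 6.5 pg/cell)."""
    return dna_ng * 1000.0 / pg_per_cell

# Hypothetical reaction: 650 ng of sample DNA, spiked with DNA equivalent to 100 MWCL-1 cells.
total = cells_from_dna(650)                          # about 100,000 cells
pct = ngs_mrd_percent(50, 1000, 100, total)          # 5 MRD cells -> 0.005 %
```

The spike-in converts relative read counts into absolute cell numbers; without it, only the fraction of reads, not the fraction of cells, would be known.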

Statistical methods

The number of samples required in the assay development and validation experiments was calculated using the hypothesis contrast strategy for the comparison of paired quantitative data (EPIDAT 4.0 software, Consellería de Sanidade, Xunta de Galicia, Spain). The Kolmogorov–Smirnov test was used to evaluate the distribution of MRD data. The Wilcoxon (two groups) or Friedman (more than two groups) tests and the Mann–Whitney U (two groups) or Kruskal–Wallis (more than two groups) tests were used to assess the (two-sided) statistical significance of differences between groups for paired and unpaired variables, respectively. For correlation studies, the (two-sided) Spearman’s rho (ρ) for non-parametric paired data was employed. The Kaplan–Meier method and the (two-sided) log-rank test were used to plot and compare progression-free survival (PFS) curves. PFS was defined as the time from MRD assessment to disease progression; patients without progression were censored at their last follow-up visit. Statistical significance was set at P<0.05. All samples were analyzed blindly during the experimental phase.
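For readers reproducing the survival analysis, a minimal pure-Python Kaplan–Meier estimator is sketched below, with patients without progression censored at their last follow-up. The helper `time_to_survival` (a hypothetical name) returns the first time at which the estimated PFS falls to or below a given level, which is how a "75% PFS" time can be read from a curve, and returns None when that level is not reached (NR). Comparing curves with the log-rank test is left to an established statistics package (for example, `logrank_test` in the `lifelines` Python library).

```python
def kaplan_meier(times, progressed):
    """Kaplan-Meier estimate of progression-free survival.

    times: months from MRD assessment to progression or last follow-up.
    progressed: 1 if progression was observed, 0 if censored at last follow-up.
    Returns [(time, PFS)] at each time where at least one progression occurred.
    """
    data = sorted(zip(times, progressed))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, events, censored = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # pool ties at time t
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk       # product-limit step
            curve.append((t, surv))
        at_risk -= events + censored             # remove from the risk set after t
    return curve

def time_to_survival(curve, level=0.75):
    """First time at which PFS drops to <= level; None means not reached (NR)."""
    for t, s in curve:
        if s <= level:
            return t
    return None

curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
pfs75 = time_to_survival(curve)   # PFS falls from 0.8 to 0.6 at month 3
```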

Results

Antibody panel construction for optimal identification of MM plasma cells at MRD levels

Comparison of the whole immunophenotypic profile of aPC from individual MM patients (n=63) vs the normal/reactive BM PC database (n=31) showed multiple aberrant phenotypes in every case (Figure 1). Eight of the 12 markers evaluated contributed most frequently to the discrimination between aPC and nPC based on principal component analysis of single PC phenotypes: CD19 (97% of cases); CD45 (89%); CD56 (86%); CD81 (86%); CyIgλ (73%); CD27 (71%); CD117 (60%); and CyIgκ (56%). Re-evaluation of the utility of the combination of these eight top markers alone in the same 63 MM BM confirmed clear-cut distinction between aPC and nPC in the database, in every case. Consequently, the six surface membrane markers of this list, plus the CD138 and CD38 PC-identification markers, were selected to be combined in a single 8-color tube. In a second 8-color antibody combination, CyIgκ and CyIgλ were added to the CD138 and CD38, together with CD229 as an extra PC-identification antigen, plus the three most informative markers (CD19, CD45, CD56), for parallel confirmation of immunoglobulin light chain (κ vs λ) restriction of PC suspected to be (clonal) myeloma PC (Versions 2 and 3).

Optimization of the two 8-color MM MRD antibody combinations

Subsequently, evaluation of the same 63 MM diagnostic samples using the two 8-color antibody combinations selected above, but focusing now on the detection of minimal numbers (that is, 0.02–0.1%) of MM PC, was performed using virtual (software) dilution experiments of decreasing numbers of PC in the nPC reference database, as previously described.51 This revealed suboptimal reagent performance (for example dim staining) for two fluorochrome positions (that is, CD138-PacO and CD81/CyIgλ-APCH7).
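A virtual (software) dilution of this kind can be sketched as below: a chosen number of aPC events, sampled from a diagnostic file, is mixed into a background of reference events so that aPC make up a target fraction of the merged file. This is an illustration under stated assumptions (events stored as NumPy arrays; the fraction expressed relative to all events in the merged file; hypothetical function name and simulated data), not the published procedure of reference 51.

```python
import numpy as np

def virtual_dilution(apc_events, reference_events, fraction, rng):
    """Mix aPC events into reference events at a target aPC fraction.

    fraction: desired proportion of aPC among all merged events
    (e.g. 0.001 for 0.1%). aPC are sampled without replacement.
    """
    n_ref = len(reference_events)
    # Solve k / (n_ref + k) = fraction for the number of aPC events to add.
    k = int(round(fraction * n_ref / (1.0 - fraction)))
    if k > len(apc_events):
        raise ValueError("not enough aPC events for the requested fraction")
    idx = rng.choice(len(apc_events), size=k, replace=False)
    mixed = np.vstack([reference_events, apc_events[idx]])
    is_apc = np.concatenate([np.zeros(n_ref, dtype=bool), np.ones(k, dtype=bool)])
    return mixed, is_apc

rng = np.random.default_rng(1)
reference_events = np.zeros((100_000, 3))   # simulated reference events
apc_diagnostic = np.ones((1_000, 3))        # simulated diagnostic aPC events
mixed, is_apc = virtual_dilution(apc_diagnostic, reference_events, 0.001, rng)
```

Repeating the mix at decreasing fractions (for example 0.1% down to 0.02%) and checking whether the aPC remain separable is the kind of test that exposes dim or noisy fluorochrome positions.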

From this initial Version 1 until the final version of the antibody panel (Version 5), four other versions with different fluorochrome-conjugated reagents for the same markers were tested (Table 1), as described in detail in the Supplementary text. Briefly, in Version 2, the suboptimal CD138-PacO reagent was replaced by CD138-HV500C; in Version 3, the suboptimal CD81-APCH7 and CyIgλ-APCH7 reagents were replaced by CD81-APCC750 and CyIgλ-APCC750. For Version 4, the CD138-HV500C reagent, found to be still suboptimal, was replaced together with the CD27-PerCPCy5.5 and CD45-PacB fluorochrome positions by CD138-HV450, CD27-HV500C and CD45-PerCPCy5.5, respectively. Finally, in Version 5, CD138-HV450, CD27-HV500C and CD38-FITC were replaced by the optimized CD138-BV421 and CD27-BV510 conjugates and the multi-epitope CD38-FITC antibody, respectively. The latter CD38 reagent showed performance equivalent to that of the original CD38 antibody clone in diagnostic MM samples, but a much better resolution in BM samples from MM patients who had received Daratumumab therapy (Supplementary Figure 1). Moreover, CD229 was excluded from tube 2, since this marker did not identify all aPC in 4/49 (8%) MM cases tested, and it was not specific for PC, being also (strongly) expressed on plasmacytoid dendritic cells and a subset of lymphocytes.52

Evaluation of sample preparation protocols

Overall, bulk-lysis procedures were systematically associated with acquisition of a significantly (P<0.05) greater number of cells vs the conventional BD FACS Lysing Solution (BD Biosciences, San Jose, CA, USA)-based (FACS-lyse) SOP (Supplementary Figure 2A). However, all bulk-lysis conditions except that using low bovine serum albumin (0.5% bovine serum albumin) and a FACS-lysing-fixation step (protocol A1 in Supplementary Figure 2) showed a significantly higher proportion of debris and dead cells (P<0.05 vs FACS-lyse protocols), associated with similar numbers (P>0.05) of cell doublets (Supplementary Figure 2B). Of note, detailed analysis of the specific recovery of the major leukocyte populations (that is, eosinophils, neutrophils, monocytes, mature lymphocytes and nucleated red cells), as well as of nPC and aPC, showed no significant differences between the conditions evaluated (P>0.05), except for higher nPC percentages for the bulk-lysis protocol B1 (Supplementary Figures 2C and D). Therefore, the bulk-lysis procedure including a FACS-lysing-fixation step and 0.5% bovine serum albumin (protocol A1 in Supplementary Figure 2) was selected as the reference SOP and used to further titrate the individual antibody reagents selected, for staining of 10⁷ cells/tube (Supplementary Table 5).
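How much BM must be processed to reach a target of 10⁷ cells per tube follows from simple arithmetic, sketched below under the assumption that the nucleated cell count of the sample is known. The `recovery` factor (the fraction of cells surviving bulk lysis and washing), the example cell count and the function name are hypothetical, not values reported here.

```python
def bm_volume_ul(target_cells, nucleated_cells_per_ul, recovery=1.0):
    """Microlitres of BM to process so that target_cells cells remain for staining.

    nucleated_cells_per_ul: measured nucleated cell count of the sample.
    recovery: hypothetical fraction of cells retained after bulk lysis/washes.
    """
    if nucleated_cells_per_ul <= 0 or not 0 < recovery <= 1:
        raise ValueError("count must be positive and 0 < recovery <= 1")
    return target_cells / (nucleated_cells_per_ul * recovery)

# Hypothetical sample with 20,000 nucleated cells per microlitre
full = bm_volume_ul(10**7, 20_000)          # 500 ul at 100% recovery
partial = bm_volume_ul(10**7, 20_000, 0.8)  # 625 ul if 20% of cells are lost
```

Poorly cellular or hemodiluted samples push the required volume up quickly, which is consistent with the wide range of BM volumes processed per sample in the validation.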

Validation of the EuroFlow-based NGF MM-MRD method

The final NGF MM-MRD approach described above was validated against conventional 8-color flow-MRD in 228 MM BM samples studied at diagnosis (n=66) or after therapy (n=162), particularly focusing on 110 BM samples from patients in VGPR or CR/sCR. A strong correlation was found between both methods in diagnostic and follow-up samples with relatively high tumor burden from patients in partial response, stable disease and progressive disease (ρ=0.96; P<0.001; Figure 2a). Most importantly, a fairly good overall correlation was also observed among cases in VGPR and CR/sCR (ρ=0.77; P<0.001; Figure 2b), albeit significantly different rates of MRD+ samples were detected with the two methods: 37/110 (34%) for conventional flow-MRD vs 52/110 (47%) for EuroFlow-NGF (P=0.003; Figure 2b). This was due to a relatively high number of discrepant cases (n=21), which were mostly MRD+ by EuroFlow-NGF but MRD− by conventional flow (18/21, 86%) rather than the reverse (3/21, 14%). Of note, such discrepant NGF+ cases typically showed MRD levels <10⁻⁴ by EuroFlow-NGF, with median (range) MRD levels of 0.001% (0.0001–0.03%) vs 0.02% (0.0008–1.79%) for cases MRD+ by both methods (P<0.001; Figures 2b and 3). Interestingly, in the three cases that were MRD− by EuroFlow-NGF, MRD+ results at relatively high levels (median of 0.01%; range: 0.006–0.02%) were observed by conventional flow-MRD (Table 2). Evaluation of cytoplasmic κ/λ expression in the suspicious PC from these three patients by EuroFlow-NGF demonstrated the polyclonal nature of the suspected aPC in 2/3 cases, indicating false positivity of conventional flow-MRD; in contrast, no clear explanation was found for the discrepant result in the remaining patient. Overall, the frequency of aberrant expression profiles for individual markers on clonal PC by NGF was as follows: CD45, 96%; CD19, 96%; CD56, 96%; CD27, 89%; CD81, 79%; CD38, 77%; CD117, 48%; and CD138, 37%.
In 8/52 MRD+ cases, confirmation of light chain restriction among small numbers of suspicious PC carrying slightly aberrant phenotypes was required. No significant differences were observed in the validation phase between the participating centers with respect to rate and type of MRD-discrepant cases (P=0.63). Importantly, >7 × 10⁶ cells were evaluated in all but 7/110 cases (median: 10.4 × 10⁶ cells), and the number of cells acquired compromised the sensitivity of the method (that is, the LOD was not reached) in only 2 cases (1.8%). Interestingly, an alarm for a decreased percentage of CD117hi mast cells (≤0.002%), suggesting blood contamination, was raised in 17/110 samples: 11/17 MRD− and 6/17 MRD+ samples (Table 3).
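Because both assays were applied to the same 110 samples, their MRD+ rates form paired data. The 2 × 2 table implied by the counts above is: 34 samples MRD+ by both methods, 18 MRD+ by NGF only, 3 MRD+ by conventional flow-MRD only and 55 MRD− by both. An exact McNemar test, which uses only the discordant pairs, is sketched below as one possible choice; the text does not state which test produced P=0.003, and this sketch (which gives about 0.0015 here) is not claimed to reproduce that value. The function name is hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test.

    b: pairs positive by method 1 only; c: pairs positive by method 2 only.
    Under H0 each discordant pair is equally likely to fall either way,
    so min(b, c) is compared with a Binomial(b + c, 0.5) lower tail.
    """
    n = b + c
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)

both_pos, ngf_only, conv_only, both_neg = 34, 18, 3, 55
p = mcnemar_exact(ngf_only, conv_only)
agreement = (both_pos + both_neg) / (both_pos + ngf_only + conv_only + both_neg)
```

Concordant samples carry no information about which assay detects more positives, so the test depends only on the 18 vs 3 split of discordant samples.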

Figure 2

Validation of the new NGF method for MRD detection in MM against both conventional 8-color flow-MRD (a, b) and NGS (c), including expert-based vs automated NGF-MRD data analysis (d). In a, the comparison between NGF and conventional flow-MRD is shown for diagnostic and follow-up samples from patients with stable/progressive disease and partial response (n=118), while in b the two flow methods are specifically compared for follow-up samples from MM patients in VGPR and CR/sCR (n=110). In d, the correlation between expert-based and automated PC identification MRD levels is shown for the same 110 BM samples as in b. In turn, c shows the correlation between NGS and NGF MRD levels for those 27 (low-level) MRD samples analyzed by both methods. *Samples proven polyclonal by CyIgκ/λ staining (2/2 and 2/3 in a and b, respectively). †Samples positive by NGS at the limit of quantitation of the technique. White and black circles in b and d represent NGF MRD levels below and above the limit of quantitation of the technique, respectively.

Figure 3

Graphical representation of the performance of the NGF method based on the analysis of (merged) data files corresponding to a BM MM sample (>10⁷ cells) with low-level MRD stained with the NGF-MM MRD panel (Version 5). Left panels show classical bivariate dot plot representations in which PC (blue and red dots) were gated using a conventional manual analysis strategy. nPC (blue dots) display characteristic normal patterns of expression for the surface membrane markers used, with a cytoplasmic (Cy) Igκ vs CyIgλ ratio of 1.6. In contrast, clonal/aberrant PC (red dots) can be clearly discriminated from nPC based on their more homogeneous phenotypic profile, the presence of myeloma-associated phenotypes (CD138hi, CD38dim, CD19−, CD81−, CD117− and CD27dim) and a restricted pattern of expression of CyIgλ. Other non-PC BM populations are depicted as gray dots. In turn, the top right panel shows the results of principal component analysis (automatic population separator 1 (APS1) view of principal component 1 (PC1) vs PC2), demonstrating a clearly different overall immunophenotypic profile of normal and aberrant PC in this sample. In this latter plot, circles represent median values for all phenotypic parameters measured in the two tubes except CyIg, while inner (dotted) and outer (solid) lines represent the first and second s.d. of the distribution of the PC events in the multidimensional space, respectively. The table on the right shows the top 6 parameters contributing to the separation between nPC and aPC in the above PC1 vs PC2 plot and their percentage contribution to the separation. Please note that, in this sample, PC corresponded to 0.005% of all nucleated BM cells; in turn, aberrant PC (127 PC events) corresponded to 0.001% of the whole BM cellularity, with an assay sensitivity (in the quantitative range) of <5 × 10⁻⁶.

Table 2 Distribution of aberrant (aPC) and normal (nPC) plasma cells in BM samples from MM patients in VGPR, CR and sCR with discrepant MRD results (MRD+ vs MRD−) by NGF vs conventional flow-MRD assays
Table 3 Distribution of distinct BM-associated populations as identified by the NGF antibody combination (version 5) and percentages of MRD− and MRD+ cases with decreased levels

Automatic identification and enumeration of aPC showed an excellent correlation with expert-based gating, also in VGPR and CR/sCR cases (ρ=0.96; P<0.001; Figure 2d). However, due to the minimum number of events required by the software algorithm, aPC were not identified in 3/110 cases (2.7%) with low MRD levels by NGF (and MRD− by conventional flow-MRD; median (range) % aPC: 0.0006% (0.0001–0.005%); Figure 2d).

Parallel assessment of MRD by NGF and NGS in a subset of 31 samples showed a higher applicability for the EuroFlow-NGF approach: 31/31 (100%) vs 27/31 (87%) cases, respectively (P<0.001). Among the 27 cases assessed by both methods (22 of them with MRD levels ≤10⁻⁴), a good correlation was found between the percentage of residual aPC by NGF and NGS (ρ=0.62; P=0.001; Figure 2c). However, NGF tended to show a higher sensitivity than NGS, with 19/27 (70%) vs 14/27 (52%) MRD+ samples (P=0.06) and higher MRD levels (mean percentage±s.d. of MRD+ cells: 0.01±0.04% vs 0.006±0.02%, respectively; P=0.07). This was due to 7/27 discrepant cases: six were MRD+ by NGF and MRD− by NGS (median (range) percentage aPC of 0.001% (0.0002–0.07%)), and one was MRD+ by NGS (0.0004% aPC) but negative (LOD <0.0002% aPC) by NGF (Figures 2c and 3). In fact, 8/27 samples (30%) were NGF-positive (quantifiable) but NGS-negative or discrepantly low positive.

From the prognostic point of view, MM patients who were MRD− by NGF had a significantly (P=0.01) longer PFS than MRD+ cases (75% PFS not reached (NR) vs 10 months; Figure 4a), and this also held for those who were MRD+ by NGF but MRD− by conventional 8-color flow (75% PFS of 10 months; Figure 4b); similar results were observed when the analysis was restricted to patients in CR/sCR (P=0.02; 75% PFS NR vs 7 and 5 months, respectively; Figures 4c and d). Of note, 2/6 patients who were MRD+ by NGF and MRD− by NGS also showed disease progression, while the only NGF−/NGS+ discrepant patient remained in continuous CR after 14 months of follow-up.

Figure 4

PFS curves of MM patients grouped according to their BM MRD status as assessed by NGF (a, c) and by both NGF and conventional flow-MRD (b, d). In a and b, the impact of MRD status is shown for MM patients in VGPR, CR and sCR (n=79), while in c and d the PFS analyses were restricted to MM patients who were in CR or sCR at the time of MRD assessment (n=50).

Discussion

MRD detection in BM has proven clinically meaningful for MM monitoring after therapy, particularly to predict outcome of patients that reach CR12, 13, 14, 15, 16, 29 independently of therapy.22, 53 Currently, several different flow- and PCR-based MRD approaches are available.30 Flow-MRD has clear advantages vs PCR-based methods because of its relative simplicity, high speed, greater clinical applicability and worldwide availability.16, 25 However, major concerns have been recently raised about the lack of standardization of flow-MRD in MM38 and its lower sensitivity vs PCR-based approaches, particularly NGS.25, 29, 53, 54, 55

Here we describe an innovative EuroFlow-based, highly sensitive, standardized and validated NGF-MRD method which can be applied to virtually every MM patient for MRD monitoring in BM after therapy. Overall, our results show a similar applicability but a significantly increased sensitivity for the novel EuroFlow-NGF approach vs conventional flow-MRD, with around one fourth of all samples that were MRD-negative by conventional flow becoming MRD-positive by NGF. To the best of our knowledge, this is the first time that a validated high-sensitivity flow-MRD assay has been described with a LOD close to 10⁻⁶ (ability to identify down to 20 tumor PC among 10⁷ evaluated BM cells) and a limit of quantitation of <5 × 10⁻⁶ (ability to accurately quantify tumor PC percentages at levels down to five cells per million cells, that is, 0.0005%), calculated following consensus recommendations.42 Importantly, the EuroFlow-NGF approach provided similar results in different centers, which further confirms the high standardization level of the method. From the clinical point of view, despite the (still) limited follow-up time, the NGF approach already showed a significant prognostic impact on PFS of MM patients, even among those in CR/sCR, significantly improving the predictive clinical value of conventional flow-MRD.

The greater sensitivity of NGF vs conventional flow-MRD was mostly due to the use of an optimized combination of fluorochromes and antibody reagents for increased specificity at very low MRD levels, together with the 10-fold increase in the number of cells evaluated, in the context of fully standardized laboratory protocols. The proposed two-tube approach also allows confirmation, in a second independent measurement, of the clonal nature of (low numbers of) suspicious PC through evaluation of the cytoplasmic κ/λ restriction of phenotypically aberrant PC, which proved to be required in a significant number of cases. Interestingly, the phenotypic markers selected as most informative did not differ from those considered to be essential by expert consensus.40, 41, 48 However, we showed here that the selection of optimal fluorochrome-conjugated antibody clones per marker could not be predicted from pre-existing (shared) expertise. Thus, identification of the optimal marker combinations for the 2-tube 8-color antibody panel required five rounds of optimization of what we initially considered to be a potentially ‘optimal’ panel (that is, Version 1). The major limitations of suboptimal reagents were: (i) increased non-specific and background fluorescence; (ii) too dim fluorescence intensity; (iii) specific interactions among mixed reagents; and/or (iv) suboptimal staining or reduced reactivity on nPC vs aPC, particularly in Daratumumab-treated vs non-treated patients. Altogether, these results indicate that (extensive) prospective testing is mandatory to define optimal combinations of reagents for flow-MRD monitoring in MM, given the problems encountered and the significantly different staining profiles obtained with distinct combinations of reagents directed against the same CD markers.
As an example, only two of the many (n=9) CD38 antibody clones evaluated proved efficient for detecting CD38 on PC from Daratumumab-treated MM patients (Supplementary Figure 1); moreover, only one of these two clones proved effective for detecting CD38 on MM PC treated in vitro with the Isatuximab antibody (Supplementary Figure 1).

Although the specific combination(s) of markers used is a key factor for optimal identification of BM PC and discrimination between nPC and aPC, another critical factor in building a sensitive flow-MRD technique is the number of cells analyzed.42, 56 To the best of our knowledge, this is the first report in which >5 × 10⁶ cells per patient sample (usually >10⁷ cells) were consistently investigated. With the most frequently used stain-lyse-and-then-wash flow-MRD sample preparation procedures, 100,000 to 1–2 million cells from 300 μl of BM are analyzed. In contrast, the EuroFlow SOP described here assured acquisition of >10⁷ events in most MM BM MRD samples by staining a median of 1.5 ml of total BM sample, with a proven limit of quantitation for the NGF MRD method of <5 × 10⁻⁶ and a LOD of <2 cells per million. These features help explain the higher sensitivity of EuroFlow-based NGF vs conventional flow-MRD and NGS. The relatively high frequency of discrepant results (30%) in the NGF vs NGS comparison among cases with low (<10⁻⁴) MRD levels might be caused by suboptimal annealing of the NGS-PCR primers due to high levels of somatic hypermutation in the IG genes of nPC and aPC,16, 32, 34 and deserves further investigation.
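Why the number of acquired cells matters can be made concrete with an idealized calculation. Assuming that tumor PC are sampled as a Poisson process, the probability of acquiring at least 20 aPC events (the LOD criterion used here) at a given true tumor fraction depends only on the expected count, tumor fraction × cells acquired. The function names below are hypothetical, and real samples add variability (for example, hemodilution) that this model ignores.

```python
import math

def prob_at_least(k, expected):
    """P(X >= k) for X ~ Poisson(expected), via the complementary CDF."""
    cdf = sum(math.exp(-expected) * expected**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def detection_probability(tumor_fraction, cells_acquired, min_events=20):
    """Probability of acquiring >= min_events tumor PC events (Poisson idealization)."""
    return prob_at_least(min_events, tumor_fraction * cells_acquired)

p_1e6 = detection_probability(5e-6, 10**6)   # expected 5 aPC events
p_1e7 = detection_probability(5e-6, 10**7)   # expected 50 aPC events
```

At a true tumor fraction of 5 × 10⁻⁶, acquiring 10⁶ cells gives an expected 5 aPC events and almost no chance of reaching 20, whereas 10⁷ cells give an expected 50 events and near-certain detection.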

Independently of its potentially greater sensitivity, NGF has additional advantages over NGS:37 it is faster (<4 h), standardized and reproducible, and more broadly applicable (98%). It also does not require a diagnostic sample or patient-specific probes, which potentially leads to lower variability in the sensitivity reached per patient.16 However, EuroFlow-based NGF requires fresh material analyzed within 24 h after sampling. This is feasible in virtually all countries, since the standardized EuroFlow procedures have now been implemented worldwide. In addition, the costs of the NGF reagents (100 USD) and assay (350 USD) are estimated to be lower than those of NGS (350 USD and 700 USD, respectively).16

Importantly, NGF can also provide an overall assessment of the quality of the patient sample in two ways. First, it identifies a significant decrease in non-PC BM cell populations in hemodiluted BM samples (Table 3), for example CD117hi mast cells, CD45− sideward-scatter (SSC)lo nucleated red blood cells, CD117+ myeloid precursors, CD19+CD38hiCD45lo B-cell precursors and CD19− nPC. Second, it provides full insight into the normal B-cell compartment by identifying the following populations in addition to myeloma PC (Supplementary Figure 3):
- normal residual BM B-cell precursors (CD19+CD38hiCD45lo and CD45int);
- immature B-lymphocytes (CD19+CD45hiCD38lo);
- naive B cells (CD19+CD38−CD27−);
- memory B-lymphocytes (CD19+CD38−CD27+);
- nPC (CD19+CD56−, CD19−CD56− and CD19−CD56lo).

Of note, decreased numbers of mast cells were found in 17/110 VGPR BM samples, particularly among MRD− cases, including the only two patients who showed disease progression (Table 3). This points to the need to evaluate MRD− cases carefully for blood contamination. Whether an additional sample from the same patient should be evaluated in such cases remains to be established.
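The B-cell compartment phenotypes listed above can be read as a small rule table. The sketch below is illustrative only. It assumes each event has already been reduced to discrete expression levels ("-", "lo", "int", "hi", "+"), whereas real NGF analysis gates on continuous fluorescence in multidimensional space. It also assumes that the CD45lo and CD45int precursors share the CD19+CD38hi pattern and treats them as one population. The rule table, level labels and function name are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical rule table transcribed from the phenotypes in the text.
# Each rule maps a marker to the set of discrete levels it may take;
# markers not mentioned in a rule are ignored for that rule.
Rule = Tuple[str, Dict[str, Set[str]]]

B_CELL_RULES: List[Rule] = [
    ("B-cell precursor", {"CD19": {"+"}, "CD38": {"hi"}, "CD45": {"lo", "int"}}),
    ("immature B-lymphocyte", {"CD19": {"+"}, "CD45": {"hi"}, "CD38": {"lo"}}),
    ("naive B cell", {"CD19": {"+"}, "CD38": {"-"}, "CD27": {"-"}}),
    ("memory B-lymphocyte", {"CD19": {"+"}, "CD38": {"-"}, "CD27": {"+"}}),
]


def classify_b_cell(event: Dict[str, str]) -> Optional[str]:
    """Return the first population whose marker pattern matches the event, else None."""
    for name, pattern in B_CELL_RULES:
        # A missing marker yields None, which never matches a level set.
        if all(event.get(marker) in levels for marker, levels in pattern.items()):
            return name
    return None


print(classify_b_cell({"CD19": "+", "CD38": "-", "CD27": "+", "CD45": "hi"}))
# memory B-lymphocyte
```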

The relatively short time needed for the complete EuroFlow-NGF procedure (<4 h) can be further reduced by implementing automated sample preparation procedures, pre-mixed and dried antibody cocktails, and software algorithms for automated data analysis. Such improvements are ongoing and will further help prevent diagnostic errors.

A major challenge we faced during the design phase was to determine whether the 2-tube 8-color NGF approach could be replaced by a single 10-color tube to decrease reagent costs and data acquisition time, as suggested by others.57 We directly compared Version 5 of the 2-tube 8-color antibody panel with three different versions of a single 10-color antibody combination (Supplementary text), and the two formats gave broadly comparable results. However, the 2-tube 8-color method emerged as the more robust assay because (i) higher numbers of cells were measured; (ii) the second tube had confirmatory value when small populations of suspicious PC were found in the first tube; and (iii) replicate measurements were more consistent and precise than single measurements.
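Counting statistics can illustrate part of the precision gain from replicate measurements. The sketch below swaps in a simple Poisson model: it assumes aberrant PC events in each tube follow a Poisson distribution, so the coefficient of variation of a count with mean n is 1/√n. It also assumes both tubes acquire similar numbers of cells. The event numbers and function name are hypothetical and are not taken from the study.

```python
import math


def poisson_cv(events: int) -> float:
    """Coefficient of variation (SD / mean) of a Poisson count with the given mean."""
    if events <= 0:
        raise ValueError("events must be positive")
    return 1.0 / math.sqrt(events)


# Hypothetical example: 50 aberrant PC events expected per tube.
events_per_tube = 50
single = poisson_cv(events_per_tube)
# For counting error, averaging two independent tubes with similar cell
# numbers is equivalent to pooling their events.
replicate = poisson_cv(2 * events_per_tube)
print(f"single tube CV: {single:.3f}")  # single tube CV: 0.141
print(f"two-tube CV: {replicate:.3f}")  # two-tube CV: 0.100
```

Under this model, the replicate measurement reduces counting error by a factor of √2. The model does not capture other sources of variability, such as staining or gating differences between tubes.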

In summary, we describe here a novel, validated EuroFlow-NGF assay for highly sensitive, fast and standardized quantification of MRD in MM. The assay overcomes the limitations of conventional flow-MRD approaches and improves prediction of patient outcome. This method is ready to use and well suited for implementation in clinical trials to establish the diagnostic role of MRD in MM.