Standardized assay for assessment of minimal residual disease in blood, bone marrow and apheresis from patients with plasma cell myeloma

Recent advances in myeloma treatment have resulted in significantly better outcomes, defined as increased progression-free survival (PFS) and overall survival (OS). Since there is a proven correlation between the extent of response and prolonged survival, there is an urgent need for highly sensitive assays for the detection of minimal residual disease (MRD). Next generation flow cytometry has become a valuable approach for sensitive evaluation of the depth of complete response (CR). Here, we report the diagnostic performance and validation results of a single-tube 9-color panel assay. The validation design included intra-assay analysis measuring accuracy, inter-assay analysis estimating the method's linearity and precision, and inter-assay analysis evaluating repeatability. Furthermore, inter-operator analysis assessed the comparability of results obtained by different operators. Staining stability was evaluated in age-of-stain experiments. Our validation results show that reliable detection of residual myeloma cells is feasible down to a detection level of 10⁻⁵ with a single-tube assay for a variety of materials (peripheral blood, bone marrow and stem cell apheresis). This study establishes a highly sensitive, fully standardized approach for MRD detection in myeloma that is ready for implementation in routine diagnostic laboratories.

Plasma cell myeloma is a hematologic neoplasm characterized by the proliferation of malignant plasma cell clones. With targeted therapies available, a considerable number of patients can achieve complete response and have a significantly better outcome, defined as increased progression-free survival and overall survival 1,2. However, only 3 to 10% of plasma cell myeloma patients who have received high dose therapy will remain in complete remission for more than ten years 3, while the majority will eventually relapse and undergo further treatment. Since there is a correlation between the extent of response and prolonged survival, there is an urgent need for highly sensitive assays for the detection of minimal residual disease (MRD) 4,5. MRD is a more sensitive measure of response than conventional criteria and was shown to have an enhanced predictive value in comparison to standard methods 5. Thus, MRD detection is very important for deciding whether a patient will undergo relapse-appropriate treatment 2,6.
Multiparameter flow cytometry enables robust and cost-effective monitoring of minimal residual disease 7 in plasma cell myeloma patients. Because of the increased number of fluorochromes that can be used simultaneously, a huge variety of cells and subtypes with different characteristics can be assessed. This enables estimation of the MRD by detecting and differentiating normal and abnormal plasma cells.
In order for MRD assays to be highly specific and sensitive, a combination of immunophenotypic markers that are able to identify and discriminate between normal and abnormal plasma cells is required 1,8-10. CD38 and CD138 were used as gating markers, while CD19, CD27, CD45, CD56, CD81, CD200 and CD117 allowed for the identification of the most frequent deviations from the normal plasma cell phenotype. In addition, the inclusion of CD45 allowed for further phenotypic characterization of plasma cells and their quantification relative to the leukocyte count.
In order to obtain a limit of quantification (LOQ) 11, defined as the lowest concentration at which the analyte can be quantified, in the magnitude of ≤10⁻⁵ (i.e. one abnormal plasma cell detected in a population of ≥100,000 leucocytes), the sample has to be enriched to a total leucocyte count of 3-5 million in a small volume (e.g. 100 μl) following blood cell counting. The obtained cell suspension has to be stained according to a standard operating procedure (SOP) 11,12.
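The link between target sensitivity and the number of leucocytes to acquire can be sketched as a simple multiplication. The snippet below is an illustrative sketch only: the function name is hypothetical, and it assumes the minimum-event criteria this study applies later (20 events for detection, 50 events for quantification).

```python
def required_leukocytes(min_events: int, one_in: int) -> int:
    """Leukocytes to acquire so that a population present at a frequency of
    1 in `one_in` cells is expected to contribute at least `min_events` events.

    Integer arithmetic is used on purpose to avoid floating-point rounding.
    """
    if min_events <= 0 or one_in <= 0:
        raise ValueError("both arguments must be positive")
    return min_events * one_in


# 10^-5 corresponds to one abnormal plasma cell in 100,000 leukocytes.
ONE_IN = 100_000

# Assumed minimum-event criteria: 20 (detection) and 50 (quantification).
for label, min_events in (("detection", 20), ("quantification", 50)):
    print(f"{label}: {required_leukocytes(min_events, ONE_IN):,} leukocytes")
```

Under these assumptions, 2 million leucocytes are needed for detection and 5 million for quantification at 10⁻⁵, which is in line with the 3-5 million cells targeted per tube.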
In this study, we present a highly sensitive and standardized procedure for assessing minimal residual disease in patients with plasma cell myeloma in peripheral blood, bone marrow and apheresis product. Our results show that the assay, owing to its highly discriminative combination of antibodies and effective gating strategy, can be easily applied and validated in high throughput flow cytometry laboratories.

Materials and Methods
Qualification of instruments and good manufacturing practice (GMP) training. Qualification of all cytometers used in the study was preceded by a risk analysis using an Ishikawa (fishbone) diagram and a risk mitigation strategy performed according to failure modes and effects analysis (FMEA) 13. Moreover, all cytometers underwent qualification based on written SOPs. All procedures were described in SOPs and the technical staff was adequately trained in using the SOP Guard Software.
Blood and apheresis specimen collection. The study was approved by the Ethics Committee of the Charité - Universitätsmedizin, Berlin, Germany. All experiments were performed in accordance with relevant guidelines and regulations. Healthy individuals and plasma cell myeloma patients undergoing stem cell apheresis at the Charité - Universitätsmedizin, Berlin, Germany were recruited for this study. Written informed consent was obtained from all participants. Blood was collected into vacutainers (BD, Heidelberg, Germany) containing EDTA for anticoagulation. Apheresis samples were collected with the Spectra Optia® Apheresis System (Terumo BCT) using the Continuous Mononuclear Cell Collection (CMNC) protocol.
Briefly, the number of leucocytes in the specimens was determined using the hematology analyzer (Sysmex IP300 (XP-350), Sysmex). Erythrocytes were then lysed using Versafix solution (VersaLyse, Beckman Coulter, supplemented with 0.25% IOTest Fixative Solution) at room temperature. Cell suspensions containing 3-5 × 10⁶ leukocytes were then centrifuged at 300 × g and resuspended in 100 μl phosphate buffered saline (PBS, Gibco). Cell suspensions were then transferred to DuraClone RE PC tubes and an appropriate volume of CD117 ECD antibody was added. After 15 min incubation at room temperature, cells were washed with PBS and centrifuged at 150 × g. The final pellet was resuspended in 500 μl PBS containing fixative solution (10% IOTest3 fixative solution, Beckman Coulter).

Data acquisition.
Sample acquisition was performed on 10-color, 3-laser NAVIOS flow cytometers (Beckman Coulter) using predefined settings. Debris was excluded by appropriate adjustment of the FSC recording trigger. Each sample was run twice in order to increase the number of recorded events. These duplicate readings were merged using the Kaluza analysis software prior to the final analysis.
Acquisition settings were defined according to the manufacturer's instructions using the eight DuraClone RE PC compensation tubes as well as single CD117 or CD3 staining. Obtained photomultiplier tube (PMT) voltages were used to define target channels for all scatter and fluorescence detectors using calibration bead particles (Flow-Set Pro beads, Beckman Coulter). Matching of target channels was verified daily with a new calibration run to prevent target mismatch. Furthermore, all instruments underwent daily verification of optical alignment and fluidics using other calibration bead particles (Flow Check beads, Beckman Coulter).

Data analysis.
All acquired data files were analyzed using the Kaluza software, version 1.3 (Beckman Coulter). The two data files of each stained sample were merged and analyzed. Cell doublets were excluded by selecting events with either the highest FSC peak signals or the smallest width of the FSC signal. Furthermore, cell debris was excluded from the analysis by using a forward scatter time versus forward scatter dot plot. Dye aggregates were excluded from the analysis as outliers with high fluorescence intensities on the FITC detector and/or the near-infrared APC-AF750 detector. Absolute counts of the subpopulations were calculated in all panels by correlating CD45+ events with the white blood cell count obtained from all samples. Plasma cells were identified as events with high CD138 and high CD38 expression density. Abnormal phenotypes varied amongst patients and were identified by a combination of the following features: diminished expression of CD19, CD27, CD38, CD45, and/or CD81, overexpression of CD56, and asynchronous expression of CD117 and CD200. Clusters of normal and abnormal plasma cells were identified using a 2D projection of all 9 fluorescent parameters (radar plot, Kaluza Analysis Software).
For the analysis of the inter-assay test, intra-assay test and inter-operator analysis, template analysis protocols were created for each set of experiments. Data files of the same sample were analyzed by loading the data files of each time point (inter-assay test) or parallel staining (intra-assay test) into the appropriate template. Accordingly, for the inter-operator analysis, data files were analyzed by 5 different operators using the same templates. Furthermore, each analysis included verification of compensation and, if applicable, minor adjustments. For the age-of-stain analysis (the 8-, 18- and 24-h specimens), the side scatter (SSC) parameter was adjusted when necessary.
www.nature.com/scientificreports
Gated event counts were exported to Excel (Microsoft, Redmond, WA, USA) for the calculation of the frequencies of the subpopulations together with the respective mean fluorescence intensities and related standard deviations. Coefficients of variation (CVs) were calculated for each subpopulation frequency from replicates prepared from the same sample. The CVs obtained in the intra- and inter-assay analyses were compared with the CVs expected from the characteristics of the Poisson distribution.
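The comparison of observed CVs with Poisson expectations can be illustrated as follows. This is a minimal sketch, not the spreadsheet calculation used in the study: it assumes the usual counting-statistics result that a population of N events has an expected CV of 100/√N percent, and the replicate counts are hypothetical.

```python
import math
import statistics


def observed_cv(counts):
    """CV in percent of replicate event counts (sample SD divided by mean)."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return 100.0 * statistics.stdev(counts) / mean


def poisson_cv(mean_count):
    """CV in percent expected from Poisson counting noise alone: 100 / sqrt(N)."""
    if mean_count <= 0:
        raise ValueError("mean_count must be positive")
    return 100.0 / math.sqrt(mean_count)


# Hypothetical triplicate counts of abnormal plasma cell events (not study data).
replicates = [16, 25, 19]

cv_obs = observed_cv(replicates)
cv_exp = poisson_cv(statistics.mean(replicates))
print(f"observed CV = {cv_obs:.1f}%, Poisson-expected CV = {cv_exp:.1f}%")
```

An observed CV close to the Poisson value suggests that counting noise dominates, whereas a clearly larger observed CV points to additional variability from sample handling, staining or gating.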

Design of validation approach. The design of the validation of the residual plasma cell measurement was based on the current recommendations 10,22,23. Specifically, the validation design included intra-assay analysis measuring accuracy, inter-assay analysis estimating the method's linearity and precision, and inter-assay analysis evaluating repeatability (Table 1). Furthermore, the comparability of results obtained by different operators was assessed in an inter-operator analysis. Stability of the staining was evaluated in age-of-stain experiments. For the intra-assay test, blood samples from healthy individuals were spiked with different concentrations of stem cell apheresis product from patients with plasma cell myeloma. For the inter-assay analysis of linearity and precision, blood from five healthy donors was collected and spiked with U266 cells at six different concentrations: 0.5000%; 0.0500%; 0.0050%; 0.0010%; 0.0005%; 0.0003%.
For the inter-assay analysis of repeatability, whole blood, stem cell apheresis and bone marrow samples were taken from three patients with multiple myeloma. For the inter-operator analysis, a data set consisting of 25 files (22 unique files, 3 file doublets) was analyzed independently by five trained operators.

Results
Intra-assay analysis: Accuracy. The aim of the intra-assay analysis was to evaluate the accuracy of assessment of minimal residual disease in patients with plasma cell myeloma. Eighteen samples from normal donors with 6 different concentrations of spiked U266 cells were assayed, recording 1,799 × 10³ ± 481 × 10³ CD45+ events. Samples were created by spiking whole blood from a healthy donor with patient apheresis product of known plasma cell frequencies as assayed by flow cytometry analysis. Final target frequencies ranged from 0.008% to 0.0005% of CD45+ anomalous plasma cells, representing 8 anomalous plasma cells in 1 × 10⁵ CD45+ cells to 1 anomalous plasma cell in 2 × 10⁵ CD45+ cells, respectively. Table 3 shows the results of normal samples spiked with patient apheresis products. CVs were in the expected range (Poisson noise) for all tested samples. Recovery of spiked cells averaged 85% for samples above the theoretical lower limit of quantitation (LLOQ), from 0.008% to 0.004%, and 75% for samples above the theoretical lower limit of detection (LLOD), from 0.008% down to 0.002%. The theoretical lower limit of detection was calculated as the percentage represented by a minimum of 20 positive events in the total number of leucocytes acquired. Furthermore, the theoretical limit of quantification was calculated as the percentage represented by a minimum of 50 positive events in the total number of leucocytes acquired 11,24,25.
Verification of drop-in options: Accuracy. The aim of the drop-in verification was to compare the accuracy of the assessment of minimal residual disease in patients with plasma cell myeloma while using additional antibodies. Five apheresis products and three bone marrow samples were stained in parallel in DuraClone RE PC tubes without additional antibody, with CD3 drop-in or with CD117 ECD drop-in. The addition of a drop-in antibody did not compromise the performance of the DuraClone RE PC tube staining (see Fig. 1B).
Final frequencies of malignant plasma cells between different stainings of the same sample ranged from 0% to 0.01% of CD45+ cells whereas normal plasma cell frequencies ranged from 0% to 0.07% of CD45+ cells.
Furthermore, CD3 drop-in antibody was used to extend the analysis of the immune compartment. The frequencies of residual T, B, and NK cells analyzed in 5 leukapheresis products are shown in Table 4.
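The theoretical LLOD and LLOQ defined in the intra-assay analysis above follow directly from the number of events acquired. The sketch below applies that definition to the mean CD45+ event count reported for the intra-assay runs; the function name is hypothetical, and treating the CD45+ event count as the total number of leucocytes acquired is an assumption made for illustration.

```python
def theoretical_limit_percent(min_events: int, total_leukocytes: int) -> float:
    """Frequency (in percent) represented by `min_events` positive events
    among `total_leukocytes` acquired leukocytes."""
    if min_events <= 0 or total_leukocytes <= 0:
        raise ValueError("both arguments must be positive")
    return 100.0 * min_events / total_leukocytes


# Mean CD45+ events recorded in the intra-assay runs (1,799 x 10^3),
# used here as an assumed stand-in for the total leukocytes acquired.
ACQUIRED = 1_799_000

llod = theoretical_limit_percent(20, ACQUIRED)  # minimum of 20 positive events
lloq = theoretical_limit_percent(50, ACQUIRED)  # minimum of 50 positive events
print(f"theoretical LLOD = {llod:.4f}%, theoretical LLOQ = {lloq:.4f}%")
```

With this event count the limits come out at roughly 0.0011% and 0.0028%. Individual samples will differ, since the limits scale inversely with the number of events actually acquired.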
Inter-assay analysis: Precision/linearity. The aim of the first set of experiments of the inter-assay analysis was to evaluate the precision and linearity of assessment of minimal residual disease in patients with plasma cell myeloma. Ninety samples with 6 different concentrations of U266 cells across 5 different whole blood samples from normal donors were assayed, recording 1,157 × 10³ ± 176 × 10³ CD45+ events. Samples were created by spiking whole blood from a healthy donor with U266 cells to create samples with known frequencies of this cell line. Final target frequencies ranged from 0.5% to 0.0005% of CD45+ anomalous plasma cells, representing 1 anomalous plasma cell in 2 × 10² CD45+ cells to 1 anomalous plasma cell in 2 × 10⁵ CD45+ cells, respectively (Fig. 1). The relationship between target and measured frequencies was approximated through linear regression by the linear equation $y = 0.963x - 1 \times 10^{-4}$ with a coefficient of determination of $R^2 = 0.948$ (Fig. 2). Table 5 shows the results of normal samples spiked with U266 cells. CVs were higher than expected (Poisson noise) for samples with frequencies from 0.5% to 0.005%. Recovery of spiked U266 cells averaged 95% for samples above the theoretical lower limit of detection (LLOD) or quantitation (LLOQ), from 0.5% to 0.005%. LLOD and LLOQ were calculated as described above.
Table 3. Results from validation runs on intra-assay variation/accuracy (* represents values below the theoretical LLOQ; ** represents spiked values below the theoretical LLOD).
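To see what the reported regression line implies for recovery, the fitted equation can be evaluated at the upper target frequencies. This is an illustrative sketch: the slope and intercept are taken from the fit reported above, while the assumption that both axes are expressed in percent of CD45+ cells is ours and is not stated in the text.

```python
# Coefficients of the reported linear fit (measured = SLOPE * target + INTERCEPT).
SLOPE = 0.963
INTERCEPT = -1e-4  # assumed to be in percent of CD45+ cells, like the targets


def predicted_measured(target_percent: float) -> float:
    """Measured frequency (percent) predicted by the fit for a target frequency."""
    return SLOPE * target_percent + INTERCEPT


def recovery_percent(measured: float, target: float) -> float:
    """Recovery: measured frequency as a percentage of the target frequency."""
    if target <= 0:
        raise ValueError("target must be positive")
    return 100.0 * measured / target


# Target frequencies above the theoretical LLOQ in the dilution series.
for target in (0.5, 0.05, 0.005):
    measured = predicted_measured(target)
    print(f"target {target}% -> predicted {measured:.6f}% "
          f"(recovery {recovery_percent(measured, target):.1f}%)")
```

Under this reading, the fit predicts recoveries of roughly 94% to 96% between 0.5% and 0.005%, which agrees with the average recovery of 95% reported for that range. The fixed negative intercept weighs more heavily at lower target frequencies.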
Inter-operator analysis. To evaluate the inter-operator variability a comparison of results from five different data analyses conducted by five independent operators was performed. Trained operators analyzed 25 datasets of apheresis and whole blood samples. The validation data set included 22 unique files and 3 file doublets. Analyses of data sets with average event counts above 50 plasma cell events (malignant or normal), showed high consistency, with CVs lower than 25% (Fig. 3). Outliers of inter-operator CVs were observed in samples with low frequencies ranging from 200% CV for 25 plasma cells total to 0% for 8 plasma cells total.
Age-of-stain analysis. Age-of-stain analysis was performed in order to evaluate the stability of the staining over time. Ten replicates from a single bone marrow sample of a myeloma patient were analyzed immediately after staining as well as 4, 6 and 8 h later. The variability relative to the baseline measured directly after staining ranged from 2% to 13% across the different time points. Table 6 and Fig. 4 show the bone marrow age-of-stain analysis with comparable cell frequencies and CVs across all time points.
Although the reproducibility of the analysis was shown across all sample types (bone marrow, peripheral blood and apheresis; data not shown) even after 24 h, a moderate shift in granularity, shown by altered SSC, was observed and required adaptation of the gating.
Background from blank. In order to determine the limit of blank (LOB), defined as the highest measurement result that is likely to be observed for a blank sample, 25 samples from healthy donors were collected and stained according to standard protocols. The number of detected plasma cells with abnormal phenotype ranged between 0 and 2, while the mean number of CD45+ cells was 1,472.5 × 10³ and the mean number of normal plasma cells recorded was 330. The analysis was performed with a standard protocol and the LOB was calculated from the mean and the standard deviation (SD) of the blank results, following the formula $\mathrm{LOB} = \mathrm{mean}_{\mathrm{blank}} + 1.645 \times \mathrm{SD}_{\mathrm{blank}}$. The calculated LOB was 0.76.
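The LOB formula above can be written out in a few lines. The following is a sketch under stated assumptions: the blank counts are hypothetical placeholders rather than the 25 donor results, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which estimator was applied.

```python
import statistics


def limit_of_blank(blank_results):
    """LOB = mean of blank results + 1.645 * SD of blank results.

    Uses the sample standard deviation (n - 1 denominator), which is an
    assumption; the source text does not state the estimator.
    """
    if len(blank_results) < 2:
        raise ValueError("at least two blank results are required")
    mean_blank = statistics.mean(blank_results)
    sd_blank = statistics.stdev(blank_results)
    return mean_blank + 1.645 * sd_blank


# Hypothetical counts of abnormal-phenotype plasma cells in blank samples
# (values within the reported 0 to 2 range, but not the study data).
blanks = [0, 0, 1, 0, 2, 0, 1, 0]
print(f"LOB = {limit_of_blank(blanks):.2f} events")
```

The factor 1.645 corresponds to the one-sided 95th percentile of a normal distribution, so about 5% of blank measurements would be expected to exceed the LOB if the blank results were normally distributed.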

Discussion
Flow cytometry is a reliable, easy and cost-effective tool for the assessment of minimal residual disease in patients with plasma cell myeloma 26. Novel drugs significantly improve patients' outcomes by achieving longer remissions that cannot be measured reliably with standard methods such as immunofixation and electrophoresis 5. On the other hand, molecular tests like allele-specific oligonucleotide (ASO)-PCR 27 or next generation sequencing (NGS) of rearranged immunoglobulin genes 28,29 require pre-treatment evaluation, have a relatively high cost per sample and are time consuming. Although flow cytometry MRD testing is being performed by many laboratories, there are major differences in antibody panels, gating strategies and minimal event counts 30. Efforts to minimize the laborious workflow and cost while preserving robustness and sensitivity have recently been reported by the EuroFlow Consortium 31 and the MSKCC group 32 (see Table 2).
The presented standardized assay addresses the urgent need for an easily applicable, lean and sensitive flow cytometry-based method for the evaluation of treatment efficacy that can be routinely used in the clinical setting. Furthermore, this validation evaluated the capability of using different starting materials, such as mobilized peripheral blood and stem cell apheresis, for MRD evaluation.
The tested flow cytometry panel was designed based on current recommendations and results of clinical trials 1,2,10 and took advantage of room temperature stable dried antibodies, preformatted for one test in an assay tube, as well as an automated instrument setup and compensation routine 33. Furthermore, analysis using a predefined template, including dynamic gates and radar plots 34, allowed for a high level of standardization independent of the personnel performing the analysis. This leads to a simplified and easy protocol that can be expanded by including further markers (e.g. antibodies conjugated to ECD or APC-AF700), as shown here with the addition of the CD117 or CD3 antibody. Additional antibodies used as drop-ins did not compromise the performance of the assay.
Reliable assessment of minimal residual disease can only be achieved upon stringent validation of the diagnostic assay. Here, we presented a validation study that follows the current recommendations summarized in the consensus guidelines 12,21,23, with the aim of demonstrating the ability of this assay to detect, in a reproducible and reliable manner, the presence of minimal residual disease.
Validation performed in compliance with ISO 15189:2012 accreditation guidelines for clinical laboratories requires a verification of the accuracy. To validate the accuracy of an assay it is necessary to compare the average values obtained with a conventional true value. This presents a technical challenge, since there are currently no fully characterized reference materials available. Furthermore, there are no external quality assessment programs, while other techniques for MRD assessment like ASO-PCR or NGS have different sensitivities and specificities, making it difficult to use them as a comparison method 35. In order to overcome these difficulties and determine the lower limit of detection (LLOD), we used whole blood from a healthy donor and spiked it with apheresis product from a patient with plasma cell myeloma with active disease as estimated by standard means. Due to the amount and nature of the original sample, the percentage of malignant plasma cells in the samples varied between 0.008% and 0.0005% for CD45+ anomalous plasma cells. CVs were in the expected range of 7-25% (Poisson noise) for all samples. The calculated LLOQ lay at 0.002% and the LLOD was shown for 0.001%. In cases of rare populations where the LLOD is 0.01% or lower, a CV < 30% is acceptable 22.
Inter-assay experiments demonstrated linearity of the assay with a linear regression slope of 0.96 and a coefficient of determination of $R^2 = 0.948$. Precision of the assay was analyzed in the dilution experiment with samples created by spiking whole blood from a healthy donor with U266 cells to create samples with known frequencies of this cell line. The CVs were higher than expected (Poisson noise) for samples with frequencies from 0.5% to 0.005%; this might be attributed to the modified gating strategy needed for identification of the U266 cellular phenotype as compared to native malignant plasma cells. Analysis of the blank samples from healthy donors showed a low LOB of 0.76. Furthermore, repeatability showed high consistency of the results, regardless of the material used, for an age-of-stain of up to 24 h.
The development and introduction of a highly standardized analysis procedure, together with the use of radar plots, significantly simplified the analysis. The inter-operator analysis showed that inter-operator variability, which is highly dependent on the subset abundance 33, can be significantly reduced. According to our analysis, the consistency in data evaluation for average event counts above 50 plasma cell events (abnormal or normal) can be very high, with CVs lower than 25% 11.
Although this standardized test has been validated to detect at least 20 abnormal plasma cells and to quantify cell concentrations above 50 plasma cells, the actual sensitivity will depend on adherence to the protocol and the quality of the sample 11,24,25. Since the age of the sample significantly influences the variability, especially in low-abundance cell populations 36, our assay was performed immediately after sample collection. Age-of-stain analysis showed that stained samples could be measured up to 24 h after staining, with variability remaining comparable over this period.
Our validation results show that reliable detection of residual myeloma cells was feasible down to a detection level of 0.001% (10⁻⁵) with a single-tube assay for a variety of materials (peripheral blood, bone marrow and stem cell apheresis).
Our MRD detection approach is applicable to more than 98% of plasma cell myeloma patient samples run in our laboratory. Furthermore, valid estimation of the presence of minimal residual disease was possible from peripheral blood and stem cell products. Due to the high sensitivity and robustness of the assay there was no need of assessment of pretreatment samples from the patients 37 .
In summary, the presented validation results demonstrate a highly standardized, well streamlined and lean workflow approach for the assessment of minimal residual disease in plasma cell myeloma patients. The presented validation of the test followed the FDA-NCI roundtable guidelines 7 and international consensus recommendations for myeloma flow cytometry based MRD quality control 12,23 .
Expanding the sample material to include peripheral blood and apheresis product will help to collect long-term data on their predictive value as well as extend their usage. Furthermore, wider usage of this standardized MRD assessment approach will allow reliable comparison between laboratories, setting new standards in the routine evaluation of patients' response to treatment.

Data Availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.