Introduction

Treatment of chronic lymphocytic leukemia (CLL) with chemoimmunotherapy such as fludarabine, cyclophosphamide and rituximab (FCR) results in high response rates with prolonged progression-free survival (PFS) and overall survival (OS).1, 2 Demonstrating that new treatments are more effective using standard end points (e.g. PFS) would therefore require clinical trials enrolling very large numbers of patients with long follow-up, as recently acknowledged by regulatory agencies.3, 4

The detection of minimal residual disease (MRD) above a 0.010% (10⁻⁴) threshold is an independent predictor of PFS and OS in patients with CLL treated with chemoimmunotherapy.5, 6, 7, 8, 9 Although novel therapies such as B-cell receptor (BCR) signal inhibitors can result in prolonged survival without achieving MRD negativity,10, 11 and the prognostic value of achieving an MRD-negative status with therapies other than chemoimmunotherapy (e.g. FCR) remains to be established, MRD studies continue to be necessary to evaluate treatment strategies aimed at disease eradication and cure, including those in which new agents are combined with cytotoxic drugs (e.g. FLAIR trial, ISRCTN 01844152).12 Moreover, using MRD as a surrogate for treatment effectiveness would allow the efficacy of new therapies to be determined without the need for prolonged observation.

The European Research Initiative on CLL (ERIC) has previously harmonized flow cytometry methods to detect residual disease using 4-color (4 tubes)13 and 6-color (2 tubes)14 panels. Although these approaches are effective at the 0.010% threshold recommended by the International Workshop on CLL to define absence of detectable MRD,15 both have several technical and practical limitations, including the need to distribute the blood sample across multiple tubes, which can impair sensitivity in cases with poor cellularity. The majority of new flow cytometry instruments offer 8- or 10-color analysis, allowing the required antibodies to be combined in a single tube. This could provide an equivalent level of specificity and sensitivity while allowing more events to be acquired per analysis, potentially lowering the limit of detection below 0.010% (10⁻⁴).16, 17 In addition, high-throughput sequencing (HTS) has already shown potential to detect MRD at the 10⁻⁶ level.18

In light of these developments, ERIC undertook this study, whose primary aim was to identify and validate in multiple centers a single-tube assay that is (i) reliable for MRD detection at the levels required by the International Workshop on CLL guidelines;15 (ii) independent of instrument and reagent characteristics; and (iii) flexible enough to incorporate and validate new, additional markers in the future. The secondary aim was to explore the relative merits of the flow cytometry assay and HTS for detecting MRD.

Patients and methods

Patient samples

The diagnosis of CLL was based on International Workshop on CLL criteria.15 Leukocytes for analysis by flow cytometry and/or HTS were prepared from a total of 128 samples from 108 patients with CLL or monoclonal B-cell lymphocytosis, studied either at diagnosis or after FCR-based treatment (detailed in Supplementary methods). Normal leukocytes were separated from anonymized waste peripheral blood samples from healthy women aged 18–30 years or from leukodepletion filters. Informed consent for sample collection was obtained in all cases. Ethical approval was obtained for assay development using anonymized surplus waste material and patient samples sent for diagnosis or detection of residual disease (UK NRES 04/Q2150/125), and for the HTS comparison (ViViCLL protocol).

Flow cytometry and dilution series

For development of the core marker panel, leukocytes were prepared by ammonium chloride lysis from five patients with CLL at presentation or relapse, each diluted into normal leukocytes in five serial 1:10 dilutions (25 samples in total), and from 27 CLL cases sent for routine analysis of MRD levels after treatment. The sample numbers were selected to meet the validation criteria for cellular assays.19 Two million leukocytes from the dilution series were incubated with the CLL MRD antibody cocktail for 30 min and washed twice, and the cells were acquired on a FACSCanto II and analyzed using FACSDiva software (BD Biosciences, Oxford, UK). The antibody clones, fluorochromes and reagent volumes are specified in Supplementary methods. For the comparison between the 4-color standard assay, the six-marker core panel and HTS, leukocytes from CLL patients were prepared by ammonium chloride lysis as above, diluted into normal leukocytes at concentrations ranging from 40 × 10⁶/ml down to 40/ml, and acquired on an FC500 cytometer (Beckman-Coulter, Milan, Italy), starting from the lowest concentration to avoid cross-contamination. The electronic manipulation of data files to identify superfluous antibodies and the preparation of samples for developing the reagent specification are detailed in the Supplementary methods, along with a description of technical aspects and of how to calculate the limit of detection (LOD) and limit of quantification (LOQ).
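The observed-versus-expected comparisons in the dilution experiments depend on knowing the expected CLL level at each step. A minimal sketch of that calculation follows, assuming (as a simplification, not a stated protocol detail) that each 1:10 step mixes one part of the previous sample with nine parts of CLL-free normal leukocytes by cell number. The function name and the 90% starting level are hypothetical, chosen only for illustration.

```python
# Expected CLL percentage at each step of a serial dilution (illustrative sketch).
# Assumption: each step mixes 1 part of the previous sample with (factor - 1)
# parts of CLL-free normal leukocytes, by cell number, so the CLL fraction
# falls by exactly `factor` per step.


def expected_levels(start_percent: float, factor: int, steps: int) -> list[float]:
    """Return the expected CLL percentage after each of `steps` dilutions."""
    levels = []
    level = start_percent
    for _ in range(steps):
        level /= factor  # one dilution into normal leukocytes
        levels.append(level)
    return levels


if __name__ == "__main__":
    # Hypothetical sample containing 90% CLL cells, five serial 1:10 dilutions
    for step, level in enumerate(expected_levels(90.0, 10, 5), start=1):
        print(f"dilution {step}: {level:.2g}% CLL cells")
```

Observed levels can then be compared with these expected values on a log scale, as in the linearity analyses reported in the Results.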

ClonoSEQ

Immunoglobulin heavy chain (IGH) complementarity determining region 3 (CDR3) sequences were amplified and sequenced using the ClonoSEQ platform (Adaptive Biotechnologies, Seattle, WA, USA) from: (i) peripheral blood samples of 13 patients with CLL sent for analysis of MRD (see Supplementary data), using 400 ng of DNA if untreated or 6–7 μg (all available DNA) if treated; (ii) samples generated by dilution of three CLL cases into leukocytes from leukodepletion filters in serial 1:10 dilutions (n=18); and (iii) 57 peripheral blood samples from CLL patients at diagnosis (n=51) or individuals with high-count monoclonal B-cell lymphocytosis (>0.5 × 10⁹/l clonal B cells, n=6), for whom an IG sequence had previously been determined by Sanger sequencing,20, 21 to assess the efficacy of the ClonoSEQ platform in detecting CLL-related IGH gene rearrangements. For the latter analysis, the selected samples were purposely biased toward ‘difficult’ cases, according to the following selection criteria: (a) CDR3 features (namely length, usage of particular IGHV, IGHD and IGHJ genes, and somatic hypermutation load); (b) presence of multiple rearrangements; (c) absence of any detectable or productive IGH rearrangement; and (d) availability of multiple samples at different time points from the same case.

The ClonoSEQ platform consists of a set of multiplexed forward primers matching IGH variable (IGHV) and diversity (IGHD) gene sequences, combined with a set of reverse primers matching the joining (IGHJ) gene sequences. In this way all possible mature VDJ and immature DJ IGH rearrangements can be amplified. Sequencing started at the 3′ end of the rearranged J gene and extended 87 base pairs upstream, covering the whole IGH CDR3 region. Rearranged IGH CDR3 sequences that contained insertions or deletions resulting in frameshifts or premature stop codons were classified as non-productive. ClonoSEQ analysis was performed and results provided without knowledge of any previously determined IG sequence or MRD level.
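The productive/non-productive rule above can be sketched in a few lines. This is a simplification, not the ClonoSEQ pipeline: it assumes the CDR3 nucleotide sequence is already supplied in the reading frame of the rearranged V gene, so a length that is not a multiple of three indicates a frameshift, and any in-frame TAA, TAG or TGA codon is treated as a premature stop. The function name `is_productive` is hypothetical.

```python
# Classify an IGH CDR3 nucleotide sequence as productive or non-productive.
# Simplified sketch: assumes the sequence is supplied in the reading frame of
# the rearranged V gene (real pipelines align V and J genes first).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(cdr3_nt: str) -> bool:
    """Return False for frameshifted or stop-containing CDR3 sequences."""
    seq = cdr3_nt.upper()
    # A frameshifting insertion/deletion leaves a length not divisible by 3
    if len(seq) % 3 != 0:
        return False
    # Read codons in frame and look for a premature stop codon
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    # Hypothetical in-frame sequence without a stop codon
    print(is_productive("TGTGCGAGAGATTGGTTC"))
```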

Statistical methods

Assay comparison analysis was performed using Microsoft Excel 2013. Linearity (LINEST function), correlation coefficient (PEARSON function), Bland–Altman plots, mean difference (AVERAGE function) and 95% limit of agreement, reported as ±1.96 s.d. (STDEV function), were calculated from log-transformed data. The minimum population sizes for the lower limit of detection and the limit of quantification of CLL cells in a multiparameter analysis have been demonstrated to be 20 and 50 events, respectively;13, 14 the limit of detection (as a percentage of leukocytes) is therefore defined as 100 × 20/total leukocytes and the limit of quantification as 100 × 50/total leukocytes. Percentage values are reported to two significant figures. Values above the limit of quantification were used for method comparison and dilution analyses. Concordance was considered acceptable for quantitative method comparison if the 95% limit of agreement was within ±2-fold (±0.3 log), based on acceptable performance for BCR-ABL quantitative PCR.22 Concordance was considered acceptable for qualitative method comparison if there was at least 90% agreement in detection of MRD at the 0.010% vs <0.010% level, as indicated by the ICCS/ICSH guidelines for validation of cellular methods.19

For the analysis of IG genes, amplified rearrangement sequences were analyzed and delineated according to established methods.23 A standard algorithm for junction analysis and IGHV, IGHD and IGHJ gene identification was applied.24

Results

Identifying a core set of markers required for reproducible detection of MRD in CLL

An eight-color combination comprising CD19, CD20, CD5, CD43, CD79b, CD81, CD22 and CD3 was assembled by merging the markers used in the previously published 2-tube 6-color ERIC-harmonized panel (Supplementary methods). The 8-color panel was assessed in dilution studies comprising 5 × 1:10 dilutions of five CLL cases, and the results showed good linearity to 0.0010% (Figure 1a). Interoperator variation was also within acceptable limits using this assay (Supplementary Figure S1).

Figure 1

(a) 8-CLR 1-tube panel dilution analysis. Data from serial dilution analysis of 5 × 1:10 dilutions on five CLL cases were analyzed using a single-tube eight-marker panel. Markers with a dark gray fill indicate results above the limit of quantitation; markers with a light gray fill indicate results below the limit of quantitation but above the limit of detection; and markers with no fill indicate results below the limit of detection. For log-transformed data above the LOQ, linearity=1.01, correlation coefficient Pearson R=0.99. (b) Confirmation that six markers are sufficient for detection of MRD: Bland–Altman plot comparing MRD level calculated using the single-tube eight-marker combination against the MRD level calculated using a six-marker core panel, that is, excluding CD3 and CD22. For log-transformed data above the LOQ, linearity=1.00, correlation coefficient Pearson R=1.00, average difference=−0.0026 log, 95% limit of agreement ±0.012 log. LOQ, limit of quantification.

To determine whether any markers could be excluded because of redundancy, the dilution study files and data from 27 CLL cases sent for routine analysis of MRD levels after treatment were first analyzed with all markers present and then reanalyzed after excluding single markers. The exclusion of CD5, CD43, CD79b or CD81 had a substantial impact on the ability to detect MRD (data not shown), but the exclusion of CD3 and/or CD22 did not affect results below the limit of detection or above the 0.010% limit of quantification (Figure 1b). Differences were seen for results in the 0.0010–0.010% (10⁻⁵–10⁻⁴) range, leading to the conclusions that (i) CD3 is not required in all cases but may be informative if very high accuracy in the 0.0010–0.010% range is necessary, and (ii) including both CD20 and CD22 is redundant in cases with typical expression of CD5, CD79b, CD43 and CD81. Based on these findings, a core panel comprising six markers (CD19, CD20, CD5, CD43, CD79b and CD81) was defined as the most reliable and convenient for identifying typical CLL.

Validation of the six-marker core panel against harmonized 4-color and 6-color assays

The six-marker core panel was validated in eight CLL cases diluted into normal peripheral blood leukocytes in serial 1:10 (n=5) or 1:5 (n=3) dilutions. Observed and expected CLL cell levels showed good concordance (for log-transformed data above the limit of quantification, linearity=1.02, correlation coefficient (Pearson R)=0.996, average difference=−0.018 log, 95% limit of agreement ±0.18 log; Figure 2a). In three of the dilution series, which used cells from leukodepletion filters as the diluent, enough total cells could be acquired to demonstrate a limit of detection of 0.0010% (10⁻⁵) and a limit of quantification of 0.0025% (2.5 × 10⁻⁵), based on identification of CLL-phenotype cells above the 20- and 50-event thresholds, respectively, with results in the quantitative range showing acceptable concordance with the expected level (±0.3 log). These series also contained enough cells for comparison with the 4-tube 4-color ERIC-harmonized panel. Even though 2 × 10⁶ events were acquired for each tube, both the limit of detection and the limit of quantification of the 4-tube panel were only 0.0050%, based on detection at the 20- and 50-event thresholds and acceptable concordance with the expected level; this is inferior to the one-tube six-marker core panel. The single-tube panel therefore improved detection and quantification compared with the 4-tube 4-color analysis, while also reducing acquisition time and reagent use. Analytical variation was tested using 19 operators (Supplementary data) and showed acceptable interoperator variability (95% limit of agreement ±0.27 log; Figure 2b).
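The single-tube limits reported above follow directly from the event-count definitions in Statistical methods. As an illustrative input, assuming 2 × 10⁶ leukocytes are evaluated, 20 events correspond to 0.0010% and 50 events to 0.0025%. The constant and function names below are hypothetical.

```python
# Limits of detection and quantification from minimum event counts.
# Definitions from Statistical methods: LOD = 100 x 20 / total leukocytes and
# LOQ = 100 x 50 / total leukocytes, both expressed as a percentage.

LOD_EVENTS = 20  # minimum CLL-phenotype events for detection
LOQ_EVENTS = 50  # minimum CLL-phenotype events for quantification


def lod_percent(total_leukocytes: int) -> float:
    return 100 * LOD_EVENTS / total_leukocytes


def loq_percent(total_leukocytes: int) -> float:
    return 100 * LOQ_EVENTS / total_leukocytes


if __name__ == "__main__":
    total = 2_000_000  # illustrative input: 2 x 10^6 leukocytes evaluated
    print(f"LOD = {lod_percent(total):.4f}%, LOQ = {loq_percent(total):.4f}%")
```

Because both limits scale inversely with the number of leukocytes analyzed, halving the acquired cells doubles the LOD and LOQ.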

Figure 2

(a) Six-marker core panel dilution analysis. Data from serial dilution analysis of eight CLL cases diluted into normal peripheral blood leukocytes in serial 1:10 (n=5) or 1:5 dilutions (n=3). Markers with a dark gray fill indicate results above the limit of quantitation; markers with a light gray fill indicate results below the limit of quantitation but above the limit of detection; and markers with no fill indicate results below the limit of detection. For log-transformed data above the LOQ, linearity=1.02, correlation coefficient Pearson R=1.00. (b) Acceptable interoperator variation in analysis of the six-marker core panel: analytical variation was tested using 19 operators with experience of flow cytometry but not direct experience of MRD analysis in CLL using the six-marker core panel. The results showed good concordance at the 0.010% threshold with an acceptable 95% limit of agreement of ±0.27 log for results above the limit of quantitation. For log-transformed data above the LOQ, linearity=1.02, correlation coefficient Pearson R=0.99, average difference=0.013 log, 95% limit of agreement ±0.27 log.

Identifying a platform-independent reagent specification

Neoplastic and normal leukocytes were prelabeled with different CD19 conjugates, then mixed and incubated with varying concentrations of the markers used to differentiate CLL cells from normal B cells. Prelabeling allowed the percentage of CLL cells overlapping in expression with normal B cells to be calculated in each case for each antibody dilution (n=30). Separation was considered adequate if the overlap was less than 10% higher than that of the optimal separation in the dilution series from the same case. In addition, the relative signal for each marker at each dilution was calculated by dividing the median fluorescence intensity on an internal positive control population by the median fluorescence intensity on an internal negative control population. The calculations are described in more detail in Supplementary methods and Supplementary Figure S2. Figure 3 shows the cumulative proportion of cases with suboptimal discrimination of CLL cells from normal B cells plotted against the relative signal. An appropriate reagent should provide optimal separation of CLL cells from normal B cells in >95% of cases, and the minimum relative signal required to achieve this was calculated for each antibody. The minimum and preferred relative signal levels are specified in Table 1.

Figure 3

Identification of the optimal reagent specification required for MRD analysis. CLL cells and normal B cells were separately labeled with CD19 PE-Cy7 and CD19 PerCP-Cy5.5, respectively, prior to washing and mixing to create five samples, which were then incubated with serial 1:3 dilutions of antibodies (neat to 1:243). This permitted calculation of the degree to which CLL cells overlapped with normal B cells in fluorescence intensity for each marker across a range of signal intensities. The signal intensities were calculated using internal positive and negative controls and plotted against the proportion of cases with suboptimal separation of CLL cells from normal B cells, where suboptimal separation was defined as an increase in overlap of 10% or more compared with the lowest overlap for each dilution series.

Table 1 Target values for markers used in CLL MRD analysis

High-throughput versus Sanger sequencing

The application of the ClonoSEQ platform led to the identification of a dominant clonotypic IGH CDR3 in all 57 CLL/monoclonal B-cell lymphocytosis samples tested. The size of the dominant clone ranged from 29% to virtually 100% with an average value of 89% and a median value of 99%. The dominant IGH CDR3 sequence was productive in 56/57 samples (98%).

Sanger sequencing was informative in 52/57 (91%) samples. The five uninformative samples came from two CLL cases with two samples each (from different time points) and from one CLL case with a single sample. When the HTS dominant clonotype of 50 samples (from 46 CLL cases) was compared with the IGH CDR3 sequence identified by Sanger sequencing, the clonal IGH CDR3 sequences were identical in 42/50 (84%) samples. In five of the eight discordant samples, Sanger sequencing identified a single unproductive rearrangement, whereas ClonoSEQ identified a dominant productive clone together with the unproductive clone identified by Sanger. In another two CLL samples, the Sanger clone was also present among the HTS reads but was not the dominant clone.

Among samples with two productive rearrangements and a phenotype consistent with a single monoclonal population, both productive IGH rearrangements were identified by HTS in all six such samples, with one of the two being the dominant IGH CDR3 sequence. To assess the reproducibility of the HTS method, we included seven cases (six CLL and one high-count monoclonal B-cell lymphocytosis) with two sequential samples each. The dominant IGH rearrangement was identical in both samples in all seven cases.

Comparison between the six-marker core panel and HTS

HTS was compared with the six-marker core flow panel (CD19/CD5/CD22/CD43/CD79b/CD81) in patient samples and dilution series. The results demonstrated good linearity to the 10⁻⁶ level (Figure 4). HTS detected CLL IGHV-D-J sequences in 23% (7/31) of samples with no detectable CLL cells by flow cytometry (i.e. CLL level 0.0001–0.0010%; 3/13 patient samples and 4/18 dilution samples). There was acceptable (>90%) concordance at the 0.010% threshold, with 3/31 discrepancies (MRD level 0.0080% vs 0.039%, 0.027% vs 0.0040% and 0.56% vs <0.0010% by HTS vs flow cytometry). Although HTS had a clearly superior limit of detection, the 95% limits of agreement between the two techniques were wide (±1.3 log) for data within the quantitative range (down to 0.010%/10⁻⁴; Figure 5b).
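The qualitative criterion (agreement in classifying results as ≥0.010% or <0.010%) can be checked with the sketch below, which classifies the three reported HTS vs flow cytometry discrepancies. It assumes a result reported as "<0.0010%" can be entered as 0.0010, since any value below the threshold is classified the same way; the function names are illustrative.

```python
# Qualitative agreement between two MRD methods at the 0.010% threshold.
# A result is classed as MRD-positive when it is >= 0.010%; a result reported
# as "<x" with x below the threshold can be passed as x.

THRESHOLD_PERCENT = 0.010


def mrd_positive(level_percent: float) -> bool:
    return level_percent >= THRESHOLD_PERCENT


def concordance(pairs: list[tuple[float, float]]) -> float:
    """Fraction of paired results classified identically by both methods."""
    agree = sum(mrd_positive(a) == mrd_positive(b) for a, b in pairs)
    return agree / len(pairs)


def acceptable(pairs: list[tuple[float, float]], required: float = 0.90) -> bool:
    return concordance(pairs) >= required


# The three HTS vs flow cytometry discrepancies reported in the text;
# "<0.0010%" is passed as 0.0010.
DISCORDANT = [(0.0080, 0.039), (0.027, 0.0040), (0.56, 0.0010)]
```

Adding the 28 concordant pairs gives 28/31 ≈ 90.3% agreement, which meets the 90% criterion.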

Figure 4

ClonoSEQ high-throughput sequencing (HTS) shows good linearity to one CLL cell in one million leukocytes. Analysis of three CLL cases diluted into leukocytes from leukodepletion filters in serial 1:10 dilutions. Each CLL clone was tagged with two sequences, one productive and one non-productive. The plot shows the CLL sequence as a percentage of nucleated genomes. Each case has a different marker shape (square for case 1, circle for case 2 and diamond for case 3) with no fill for the productive sequence data, gray fill for the non-productive sequence and black fill for the average. For log-transformed data above the limit of detection, linearity=1.12, correlation coefficient Pearson R=0.98.

Figure 5

Comparison of the six-marker core MRD flow assay with the 4-CLR 4-tube MRD flow assay and ClonoSEQ HTS. Data from serial dilution studies (n=18) and from patient samples after FCR-based therapy (n=12) were analyzed using the harmonized 4-CLR ERIC panel, the six-marker core panel and ClonoSEQ high-throughput sequencing. (a) Comparison of the six-marker core MRD flow assay with the ERIC 4-tube 4-CLR panel: For log-transformed data above the LOQ, linearity=0.99, correlation coefficient Pearson R=1.00, average difference=−0.044 log, 95% limit of agreement ±0.17 log. (b) Comparison of the six-marker core MRD flow assay with ClonoSEQ HTS: For log-transformed data above the LOQ, linearity=0.89, correlation coefficient Pearson R=0.75, average difference=−0.12 log, 95% limit of agreement ±1.3 log.

Discussion

Quantification of residual disease continues to be an important tool for evaluating treatment efficacy. This international ERIC project identified a simple and comprehensive approach to the detection of MRD in CLL that can be adopted by most laboratories using cytometers with six or more colors, providing reliable detection of residual CLL cells down to 0.0010% (10⁻⁵) with a single-tube assay. This approach is directly comparable to the previous ERIC-designed 4-color (4 tubes)13 and 6-color (2 tubes)14 assays.

The majority of new flow cytometry instruments offer 8- or 10-color analysis, and the six-marker core panel identified in this study (i.e. CD19, CD20, CD5, CD43, CD79b and CD81) may therefore be combined with additional markers as required: for example, CD45 or CD3 to facilitate leukocyte and CLL cell gating and enumeration; CD200 (ref. 25) or CD23 (ref. 26) to streamline diagnosis and monitoring; or alternative CLL MRD markers such as ROR1 (ref. 27) or CD160.28 To allow this, CD3 and CD22 can be excluded in situations where they do not add significant discriminatory value. CD3 was required in previous harmonized panels to exclude contaminating CD3+CD19+ events phenotypically similar to CLL cells in marker combinations lacking CD81/CD43.14 Here we demonstrated that CD3 adds no value in samples with CLL cells above the limit of quantification when combined with the six-marker core panel, which incorporates CD81/CD43. CD22 was included in previous panels to discriminate CLL cells from normal B cells in patients exposed to anti-CD20 monoclonal antibodies, but it has already been demonstrated that all normal mature B cells are absent in patients undergoing anti-CD20 treatment.29 Although CD20 remains a better discriminator of CLL cells from normal B cells in other circumstances,30 panels using either CD20 or CD22 may be equally effective.31 Combining the required markers into a single tube allows a larger number of cells to be acquired, thereby improving the limit of MRD detection. Nevertheless, the key benefit is likely to be greater reproducibility and wider access to MRD detection at the 0.010% (10⁻⁴) threshold, with greater confidence in the MRD status. The optimal approach for MRD detection may vary depending on the setting, and the recommended options are shown in Table 2.

Table 2 Harmonized methods for residual disease detection using ERIC-harmonized approaches

The approach to MRD detection presented here is applicable to more than 95% of typical CLL cases.13 A pretreatment sample is not essential unless the diagnostic phenotype is atypical, in which case the applicability of the MRD assay should be confirmed prior to treatment. Flow cytometry assays typically target a coefficient of variation below 20% (ref. 19), but this differs from the criterion used by other MRD approaches such as BCR-ABL quantitative PCR, which target a 95% limit of agreement of ±2-fold (±0.3 log).22 Using this target, interoperator variation was within acceptable limits for laboratories either with established experience in flow cytometry analysis or after an education session of approximately 1 h. The identification of a platform-independent reagent specification means that individual laboratories are not restricted to reagents from specific manufacturers and can rapidly determine the applicability of a panel using normal peripheral blood. In most cases the MRD analysis will contain sufficient internal controls to validate the assay after completion.

The development of this assay was primarily based on peripheral blood analysis because of the logistical difficulty in obtaining normal bone marrow. Moreover, it has previously been shown that hematogones, plasmablasts and plasma cells can be easily differentiated from CLL cells, primarily on the basis of CD81 and CD5 expression.13, 31 However, the core panel has since been tested, or is being tested, prospectively in several trials including ADMIRE, ARCTIC, COSMIC, FLAIR and GALACTIC (ISRCTN references 42165735, 16544962, 51382468, 01844152 and 64035629, respectively, http://medhealth.leeds.ac.uk/info/443/haematological) and the LLR TAP IcICLLe trial of ibrutinib monotherapy (ISRCTN12695354), with available data confirming that CLL MRD analysis in bone marrow is readily achievable with the six-marker core panel (abstract S794 EHA Learning Center. Rawstron A. Jun 14, 2015; 103177). Examples of analysis using the six-marker core panel in patients treated with venetoclax, ofatumumab and ibrutinib are shown in Supplementary Figure S3.

The comparison between Sanger sequencing and HTS supports the use of HTS techniques, such as the ClonoSEQ platform, to generate high-quality, high-volume IGH CDR3 data in the majority of CLL patients, irrespective of IGH sequence properties, the number of productive IGH rearrangements or the presence of two clonal markers. This analysis also demonstrated that HTS can detect productive rearrangements not identified by Sanger sequencing. Of note, the ClonoSEQ assay showed good concordance with flow cytometry for detection of MRD at the 0.010% (10⁻⁴) threshold, with much better sensitivity and good linearity across the range of MRD levels down to 1 in a million (10⁻⁶). HTS does not require fresh material and can be applied to stored DNA, so it might be easier to use in clinical trials than flow cytometry, which typically requires samples less than 48 h old. However, the variation in quantification between flow cytometry and HTS was higher than desirable, and further work is warranted to standardize quantitative analysis by HTS.

CLL-associated IGH sequences are frequently unmutated and stereotyped (i.e. identical sequences in different patients) and may be present in the normal IG repertoire. HTS could therefore detect a CLL-associated IGH sequence in an unrelated disease-free sample, but analysis of the data in this series indicates that this would affect <5% of cases, with a maximum false-positive level below 0.0020% (see Supplementary results). Because the IG rearrangement remains unaltered over time, knowing each patient's clonotype at diagnosis allows a very sensitive determination of the upper limit of residual disease. Our results therefore confirm previous studies demonstrating the potential to detect one neoplastic cell in a million (10⁻⁶) normal leukocytes, a sensitivity that is limited only by the amount of DNA that can be analyzed.
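The dependence of HTS sensitivity on DNA input can be made concrete with simple arithmetic. This is a minimal sketch assuming roughly 6.6 pg of DNA per diploid human cell, a commonly quoted approximation rather than a value from this study; it ignores extraction losses and amplification efficiency, so it gives a best case only. The function names are hypothetical.

```python
# Best-case HTS sensitivity limited by the amount of DNA analysed (sketch).
# Assumption: about 6.6 pg of DNA per diploid human cell, a commonly quoted
# approximation; real yields and amplification efficiency will vary.

PG_PER_DIPLOID_CELL = 6.6


def cell_equivalents(dna_ng: float) -> float:
    """Approximate number of diploid genomes in dna_ng nanograms of DNA."""
    return dna_ng * 1000 / PG_PER_DIPLOID_CELL


def one_cell_fraction(dna_ng: float) -> float:
    """Fraction represented by a single clonal cell among all genomes tested."""
    return 1 / cell_equivalents(dna_ng)


if __name__ == "__main__":
    for dna_ng in (400, 6600):  # 400 ng and about 6.6 ug of DNA
        print(f"{dna_ng} ng: ~{cell_equivalents(dna_ng):,.0f} cells, "
              f"one cell = {one_cell_fraction(dna_ng):.1e}")
```

Under this assumption, 400 ng (the input used for untreated samples) holds roughly 6 × 10⁴ cell equivalents, so a single clonal cell corresponds to about 1.7 × 10⁻⁵, whereas the 6–7 μg used for treated samples brings one cell down to about 10⁻⁶.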

Although the impact of BCR signal inhibitors and other small molecules on CLL therapy is currently unclear, trials are already using these agents in combination with the aim of disease eradication. MRD studies are important to assess the degree of response in trials aimed at eradicating CLL and could expedite the evaluation of efficacy for new CLL treatments. To achieve these goals, the approach to determining MRD in CLL should be reliable, easy to perform and simple to interpret so that it can be applied routinely. This paper presents a method that fulfills those conditions, providing a core set of markers that can easily be revalidated in an individual laboratory. Moreover, the combination of markers presented here attains a lower detection limit than the 0.010% (10⁻⁴) achieved by current harmonized methods. It is conceivable that achieving an MRD level below 0.0010% (10⁻⁵) will translate into better clinical outcomes, but this would need to be investigated prospectively. In line with this, we also found that HTS can reliably detect disease below the levels that can be assessed by flow cytometry. HTS, either alone or in combination with flow cytometry, may therefore prove to be a valuable resource to improve MRD detection.