Minimal residual disease (MRD) diagnostics is used for treatment stratification in childhood acute lymphoblastic leukemia. We aimed to identify and solve potential problems in multicenter MRD studies to achieve and maintain consistent results between the AIEOP/BFM ALL-2000 MRD laboratories. As the dot-blot hybridization method was replaced by the real-time quantitative polymerase chain reaction (RQ-PCR) method during the treatment protocol, special attention was given to the comparison of MRD data obtained by both methods and to the reproducibility of RQ-PCR data. Evaluation of all key steps in molecular MRD diagnostics identified several pitfalls that resulted in discordant MRD results. In particular, guidelines for RQ-PCR data interpretation appeared to be crucial for obtaining concordant MRD results. The experimental variation of the RQ-PCR was generally less than three-fold, but logically became larger at low MRD levels below the reproducible sensitivity of the assay (<10⁻⁴). Finally, MRD data obtained by dot-blot hybridization were comparable to those obtained by RQ-PCR analysis (r²=0.74). In conclusion, MRD diagnostics using RQ-PCR analysis of immunoglobulin/T-cell receptor gene rearrangements is feasible in multicenter studies but requires standardization; particularly strict guidelines for interpretation of RQ-PCR data are required. We further recommend regular quality control for laboratories performing MRD diagnostics in international treatment protocols.
Several studies have shown that detection of minimal residual disease (MRD) has prognostic relevance in childhood acute lymphoblastic leukemia (ALL).1, 2, 3, 4, 5, 6, 7, 8 On the basis of MRD analysis during the early phases of treatment, preferably at two different time points, MRD-based risk groups can be recognized. Within the International BFM Study Group (I-BFM-SG), patients were classified according to MRD levels at day 33 and day 78 of therapy, and three risk groups could be distinguished: low-risk patients (LR), having MRD negativity at both time points (about 45% of patients; 5-year relapse rate of 2%); patients at high-risk (HR), having high (≥10⁻³) MRD levels at both time points (about 15% of patients; 5-year relapse rate of 80%); and the remaining patients at intermediate risk (IR; 5-year relapse rate of 22%).3, 9 Of note, for recognition of LR patients, the MRD assay had to reach a sensitivity of at least 10⁻⁴.3, 9 On the basis of these results, MRD diagnostics for treatment stratification is currently applied in many childhood ALL treatment protocols, including the ongoing AIEOP/BFM ALL-2000 and DCOG-ALL10 protocols.
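The three-group classification described above can be written as a short decision rule. The following Python sketch is illustrative only: the function name, the use of 0.0 to encode an MRD-negative result, and the treatment of the high-MRD cutoff as inclusive (at or above 10⁻³) are assumptions, and the sketch does not check that the assay reached the 10⁻⁴ sensitivity required to recognize LR patients.

```python
HIGH_MRD_THRESHOLD = 1e-3  # "high" MRD level; inclusive cutoff is an assumption


def classify_risk(mrd_day33: float, mrd_day78: float) -> str:
    """Return 'LR', 'IR' or 'HR' from MRD levels at two time points.

    Hypothetical encoding: 0.0 means MRD negative, any value > 0 is the
    measured MRD level (e.g. 1e-4 = one leukemic cell in 10,000).
    """
    if mrd_day33 < 0 or mrd_day78 < 0:
        raise ValueError("MRD levels cannot be negative numbers")
    # Low risk: MRD negative at both time points.
    if mrd_day33 == 0.0 and mrd_day78 == 0.0:
        return "LR"
    # High risk: high MRD level at both time points.
    if mrd_day33 >= HIGH_MRD_THRESHOLD and mrd_day78 >= HIGH_MRD_THRESHOLD:
        return "HR"
    # Every other combination falls into the intermediate-risk group.
    return "IR"
```

For example, a patient with a high level at day 33 who is negative at day 78 falls into the IR group under this rule, because neither the LR nor the HR condition holds at both time points.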
Analysis of MRD is mostly performed using polymerase chain reaction (PCR) analysis of immunoglobulin (Ig) and T-cell receptor (TCR) gene rearrangements, as this method is applicable in the vast majority of childhood ALL patients and generally reaches sensitivities of 10⁻⁴ required for identification of LR patients.3, 9, 10 Within the I-BFM-SG, PCR analysis was initially followed by dot-blot hybridization using a radio-labeled junctional region-specific probe, resulting in a semi-quantitative analysis of MRD levels.3, 9 In the meantime, real-time quantitative PCR (RQ-PCR) analysis has become available and offers an easier, faster and more quantitative method for MRD analysis.11
MRD detection by PCR analysis of rearranged Ig/TCR genes is however a complex process, involving many steps (Figure 1). Identification of pitfalls in this process is of importance in order to ensure comparable MRD results between the MRD-PCR laboratories of multicenter national or international treatment protocols. Furthermore, the move from a laboratory research tool used for retrospective analysis of clinical trials to a diagnostic tool for stratification of patients necessitates uniformity in MRD data not only within single treatment protocols but also between different treatment protocols.
Within the MRD Task Force of the I-BFM-SG, we therefore aimed to identify and solve potential problems in multicenter MRD studies and to achieve and maintain consistent MRD results between the MRD-PCR laboratories participating in the AIEOP/BFM ALL-2000 protocol. To this end, we evaluated several steps in PCR-based MRD detection, including detection and sequencing of Ig/TCR gene rearrangements, MRD analysis of follow-up samples, and interpretation of RQ-PCR MRD data. As the dot-blot hybridization method was fully replaced by RQ-PCR techniques during the course of the AIEOP-BFM ALL-2000 protocol, we particularly focused on the comparison of MRD data obtained by both methods and on the reproducibility of the RQ-PCR methods, both experimental variation and variation in the interpretation of RQ-PCR data.
Materials and methods
DNA was isolated as described previously.12 The presence of IGK-Kde, TCRG and TCRD rearrangements in diagnostic samples was determined using various primer combinations.3, 13, 14 Complete IGH rearrangements were detected using five VH family primers in combination with one consensus JH primer.15 Sequence analysis was performed as described previously.3
MRD levels in follow-up samples were either analyzed by dot-blot hybridization3 or by RQ-PCR analysis. Four laboratories performed RQ-PCR analysis using the ABI Prism equipment (ABI Prism 7700, 7900 or 7000) and single PCR assay with hydrolysis (TaqMan) probes;11, 15, 16, 17 one laboratory performed a nested PCR assay in which the second PCR was run on the Light Cycler using SYBR Green I.18 As these two approaches theoretically differ considerably, the results are shown separately where relevant.
Exchange of samples and data
Several steps of blinded testing were conducted on DNA samples from 18 patients with childhood ALL in the MRD laboratories of the AIEOP-ALL 2000 and ALL-BFM 2000 protocols: Vienna, Heidelberg/Hannover and Monza/Padova. The samples were not chosen randomly but selected by the Rotterdam MRD laboratory, based on their potential to identify and understand pitfalls that might cause discrepancies in MRD results. In addition, 20 RQ-PCR data files were selected by Rotterdam and circulated for independent interpretation.
All experiments were performed under routine conditions in parallel to the ongoing MRD diagnostics. All results were discussed in depth during closed meetings with participation of all laboratories, including scientists, technicians and clinicians. The regular and open discussions were essential for the learning process and for making agreements on the standardization of the MRD analysis.
RQ-PCR reproducibility experiments
To evaluate the reproducibility of the RQ-PCR analysis and the interpretation of RQ-PCR data, several analyses were performed in the MRD laboratories of Vienna, Heidelberg, Monza, Rotterdam and Sydney, the latter performing the MRD analysis for the I-BFM-SG-related Australian ANZCHOG Study VIII clinical trial. First, each of the five participating laboratories repeated the RQ-PCR MRD assays for a number of patient cases (total number of patients: 74). These repetitions were performed using new DNA dilutions but the same oligonucleotides, one to several months after the initial analysis. Second, the newly obtained RQ-PCR data were interpreted by both the executing laboratory and a second laboratory.
All data were analyzed by the department of Immunology, Rotterdam (VHJvdV). Data were presented non-blinded to facilitate the identification of the underlying causes of discrepancies and the discussion of how to overcome the pitfalls and achieve concordance.
Results and discussion
MRD diagnostics using PCR analysis of Ig/TCR gene rearrangements includes three main steps: (1) MRD-PCR target identification; (2) sensitivity testing; and (3) MRD analysis of follow-up samples (Figure 1). These three main steps were evaluated by comparing the results obtained in the laboratories of the I-BFM-SG MRD task force using centrally provided samples and data files.
Evaluation of step 1: MRD-PCR target identification
Potential MRD-PCR targets were identified by PCR-heteroduplex analysis in eight ALL patients. These eight patients were not chosen randomly, but were selected based on the availability of sufficient DNA, the presence of particular rearrangements and/or the presence of subclonal rearrangements. As shown in Table 1, a total of 40 clonal Ig/TCR gene rearrangements could be detected by at least one of the four participating laboratories. Twenty-five out of these 40 rearrangements (63%) were identified in all four laboratories. Discrepancies in target identification between the four laboratories were particularly caused by: 1, lack of detection of clonal Ig/TCR gene rearrangements; 2, sequencing errors; and 3, errors in sequence interpretation (Figure 1a).
Lack of detection of clonal Ig/TCR gene rearrangements. The detection of Ig/TCR gene rearrangements is dependent on factors such as the applied primer set, the PCR conditions, the amount of DNA input, the quality of the DNA and the amount of PCR product used for heteroduplex analysis. For example, the missed VH3-JH rearrangement in TF2 (see Table 1) might be owing to the use of a consensus FR3 primer instead of VH family-specific FR1 primers. Consequently, all four laboratories agreed to use the BIOMED-1 primer sets and PCR protocol.13
By PCR-heteroduplex analysis, several clonal PCR products showed a (very) weak band on the gel, suggesting a subclonal origin. In case of weak clonal bands in heteroduplex analysis (e.g., the Vγ3–Jγ2.3 rearrangement in TF9; Table 1), further identification of the rearrangements by sequencing was not performed in all laboratories, resulting in apparent discrepancies in the reported rearrangements between the four laboratories.
The presence of Ig/TCR gene rearrangements in a (minor) subclone may hamper its identification. To facilitate the interpretation of the PCR data, Southern blot-analysis was performed in one laboratory (Rotterdam).19, 20 Indeed, the IGH and TCRD rearrangements in TF2 and the IGH rearrangement in TF12, which were detected by PCR in only some of the laboratories, appeared to be oligoclonal according to Southern-blot analysis.
It was agreed that the use of an oligoclonal PCR target should preferably be avoided for MRD analysis, because only a part of the leukemic clone will be monitored and it is not known at diagnosis which subclone may eventually cause a relapse. If alternative monoclonally appearing rearrangements are not available, oligoclonal or subclonal appearing IGH gene rearrangements should be checked for the presence of a common DH-JH stem. An ASO primer designed in the common DH-JH stem will enable the simultaneous monitoring of multiple subclones containing this common stem. By this approach, one can avoid false-negative MRD results that might occur owing to ongoing clonal evolution if VH-DH specific primers were used.
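The idea of a common DH-JH stem can be illustrated as a search for the sequence shared at the 3' end of several subclonal junctions. The Python sketch below is a toy illustration, not the procedure used by the laboratories: it assumes the junction sequences are written 5' to 3' on the coding strand (so the DH-JH part lies at the 3' end), and the function name and any sequences passed to it are hypothetical.

```python
from typing import List


def common_3prime_stem(sequences: List[str]) -> str:
    """Return the longest suffix shared by all sequences.

    Assumption: each sequence is a VH-DH-JH junction read 5'->3', so a
    shared DH-JH stem appears as a common 3' end (suffix).
    """
    if not sequences:
        return ""
    shortest = min(len(s) for s in sequences)
    shared = 0
    # Walk backwards from the 3' end while all sequences agree.
    while shared < shortest and len({s[-1 - shared] for s in sequences}) == 1:
        shared += 1
    first = sequences[0]
    # Slice by index from the front to avoid the s[-0:] pitfall when shared == 0.
    return first[len(first) - shared:]
```

A primer placed within the returned stem would, in principle, match every subclone that carries it, whereas a primer overlapping the divergent 5' part would follow only one subclone. Real ASO primer design involves further constraints (length, melting temperature, position of the junction) that this sketch ignores.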
Sequencing errors resulting in an incorrect sequence of the junctional region. In three cases, the sequence of the junctional region appeared not to be correct. These sequences were not obtained in the MRD-PCR laboratory itself, but were outsourced via a company that performed the sequencing reaction as well as the sequence interpretation. It is therefore of importance to re-check commercially obtained sequences by evaluating the original sequencing file, which is now routinely performed.
To obtain a reliable junctional region, it was agreed that each clonal PCR product should preferably be sequenced from both directions. In case of doubt, a second (independent) clonal PCR product should be sequenced.
Sequence interpretation errors. In 16 out of 40 rearrangements, the interpretation of the sequences obtained from the detected clonal Ig/TCR gene rearrangements differed between the four laboratories. Such misinterpretation may result in non-optimal design of ASO primers. For appropriate analysis of junctional regions, it was therefore agreed to use databases available on the worldwide web, such as IMGT (http://imgt.cines.fr), V-BASE/DNAPLOT (http://vbase.mrc-cpe.cam.ac.uk), Blast (www.ncbi.nlm.nih.gov/BLAST/), or IgBlast (www.ncbi.nlm.nih.gov/igblast/).
Incorrect interpretation of the junctional region may also be owing to alignment of too short sequences, resulting in inappropriate recognition of the involved gene segment. Finally, it was agreed that at least one-third of a germline D-segment sequence, with a minimum of five nucleotides, should be present for assigning D-segments in the junctional region sequence.
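The agreed rule for assigning D-segments is simple enough to express as a check. The following Python sketch is one possible reading of it: the function name is hypothetical, and it assumes that "one-third" is rounded up to whole nucleotides and that the matched stretch is counted as a single contiguous run.

```python
MIN_D_NUCLEOTIDES = 5  # minimum stated in the agreed rule


def d_segment_assignable(matched_nt: int, germline_d_length: int) -> bool:
    """Check whether a matched stretch is long enough to assign a D-segment.

    matched_nt: nucleotides in the junction matching the germline D-segment
                (assumed contiguous).
    germline_d_length: full length of that germline D-segment.
    """
    if matched_nt < 0 or germline_d_length <= 0:
        raise ValueError("lengths must be positive")
    # One-third of the germline length, rounded up (integer ceiling division).
    one_third = (germline_d_length + 2) // 3
    required = max(MIN_D_NUCLEOTIDES, one_third)
    return matched_nt >= required
```

For a 12-nucleotide germline D-segment, one-third is 4 nucleotides, so the five-nucleotide minimum decides; for a 31-nucleotide D-segment, the one-third criterion (11 nucleotides under the rounding assumed here) is the stricter one.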
Evaluation of step 2: sensitivity testing
First, diagnostic samples from 10 ALL cases as well as the sequence of one or two Ig/TCR gene rearrangements were provided. ASO primers were designed by the different laboratories and evaluated for their sensitivity (Table 2). Two potential variables were identified: the ASO primer design and the interpretation of the sensitivity of the RQ-PCR assay (Figure 1).
ASO primer design. The specificity and characteristics of the ASO primer will affect the sensitivity. However, there was no straightforward relation between the designed ASO primers and the obtained sensitivities. Furthermore, in TF14 (Vγ3–Jγ1.1), the ASO primer designed by three laboratories was identical, but sensitivities obtained were not (lower sensitivity in one laboratory: 10⁻² versus 10⁻⁴). This likely was due to the use of different control DNA samples, resulting in variable levels of background amplification (non-specific amplification observed in control DNA).
RQ-PCR data interpretation. During the discussion of the RQ-PCR results, it was clear that the interpretation of sensitivity varied between laboratories. Therefore a set of guidelines for RQ-PCR interpretation was drafted, focusing on definitions for reproducibility and reproducible sensitivity, definition of maximal sensitivity, definition of background and criteria for acceptable standard curves. Considerations for the design of these guidelines have been published previously.11 For evaluation of the guidelines for interpretation of RQ-PCR sensitivity, RQ-PCR data files from 20 ALL patients were subsequently analyzed by all four laboratories and discordant results were obtained in 14 out of 20 cases. After discussion and modification of the guidelines for RQ-PCR sensitivity interpretation, re-interpretation of the 20 cases still showed a different interpretation in eight cases. Further evaluation of these guidelines was performed in the ‘reproducibility experiments’ (see below).
Evaluation of step 3: MRD analysis of follow-up samples
To identify potential pitfalls and problems in MRD detection in follow-up samples, initially samples from 10 ALL cases were exchanged. On the basis of the obtained results (Table 2), two major factors affecting the reported MRD results were recognized: 1, the obtained sensitivity of the assay; and 2, the interpretation of obtained RQ-PCR MRD data (Figure 1).
Sensitivity of the MRD analysis. Logically, the level of MRD that can be detected in follow-up samples is dependent on the sensitivity of the applied method. Therefore, in case of low sensitivity, a sample may be considered negative, whereas it can be found positive in a more sensitive experiment (e.g., the second follow-up sample of TF3; Table 2). It should be noted that MRD-based risk group stratification in the AIEOP/BFM ALL-2000 protocols requires the availability of two targets with a sensitivity of at least 10⁻⁴.
Interpretation of RQ-PCR MRD data. The interpretation of RQ-PCR MRD results varied between laboratories, in particular for low to negative MRD results. Guidelines for RQ-PCR MRD data interpretation were therefore drafted, focusing on criteria for MRD positivity, MRD negativity and criteria for the calculation of MRD levels. Considerations for the design of these guidelines have been published previously.11 To evaluate and optimize the guidelines for RQ-PCR MRD data interpretation, RQ-PCR data from 20 ALL patients were interpreted by all laboratories. Discordant MRD results were obtained in 12 out of 50 follow-up samples that were analyzed; this would have led to discrepant MRD-based risk group stratification in two out of the 20 cases. Re-interpretation of these data applying optimized guidelines still resulted in discordant MRD levels in six follow-up samples, all with very low MRD levels (<10⁻⁴). Yet, after re-interpretation of the data no differences were observed in MRD-based risk group stratification (based on the maximal MRD level of two MRD-PCR targets analyzed at two time points). Further evaluation of the guidelines was performed in the ‘reproducibility experiments’ (see below).
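Two ingredients of the stratification mentioned above, taking the maximal MRD level over two targets at each time point and requiring two targets with a sensitivity of at least 10⁻⁴, can be sketched as follows. This Python example is a simplified illustration under stated assumptions: the class and function names are hypothetical, 0.0 encodes a negative result, and a sensitivity of "at least 10⁻⁴" is read as a numeric value of 10⁻⁴ or smaller. It does not reproduce the full interpretation guidelines.

```python
from dataclasses import dataclass
from typing import List, Tuple

REQUIRED_SENSITIVITY = 1e-4  # two targets must reach at least this sensitivity


@dataclass
class TargetResult:
    name: str           # hypothetical label, e.g. "IGH" or "TCRD"
    sensitivity: float  # assay sensitivity; a smaller value is more sensitive
    mrd_tp1: float      # MRD level at the first time point (0.0 = negative)
    mrd_tp2: float      # MRD level at the second time point (0.0 = negative)


def maximal_mrd(targets: List[TargetResult]) -> Tuple[float, float]:
    """Return the highest MRD level across targets for each time point."""
    if not targets:
        raise ValueError("at least one target is required")
    return (max(t.mrd_tp1 for t in targets), max(t.mrd_tp2 for t in targets))


def has_two_sensitive_targets(targets: List[TargetResult]) -> bool:
    """Check the requirement of two targets with sensitivity of at least 1e-4."""
    sensitive = [t for t in targets if t.sensitivity <= REQUIRED_SENSITIVITY]
    return len(sensitive) >= 2
```

Taking the maximum per time point means that a single positive target is enough to move a sample out of the negative category, which is the conservative choice when the aim is to avoid false-negative results.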
Further evaluation of Step 3: reproducibility of RQ-PCR experiments
The implementation of RQ-PCR-based MRD diagnostics during the course of the AIEOP/BFM ALL-2000 protocol and the observed variation in interpretation of RQ-PCR data (see above) necessitated a more detailed evaluation of the experimental reproducibility of the RQ-PCR methods as well as of the RQ-PCR data interpretation. For evaluation of the experimental reproducibility, all five laboratories repeated the RQ-PCR MRD analysis of several ALL patients that were previously analyzed within the same laboratory. The results of these repeated MRD assays were interpreted by both the executing laboratory and a second laboratory.
Experimental reproducibility of RQ-PCR. Repetition of RQ-PCR assays resulted in comparable reproducible sensitivities in 80 out of 136 cases (59%). However, in 22% of cases the repeated experiment showed a lower reproducible sensitivity, whereas an improved reproducible sensitivity was obtained in 18% of cases.
As shown in Figure 2, comparison of MRD levels of individual targets and maximal MRD levels (highest MRD value for all targets analyzed per follow-up sample) showed concordant results in the majority of samples (<3-fold difference between MRD level in initial and repeated experiment, see legend Figure 2). In the single PCR approach with hydrolysis (TaqMan) probes, discordant MRD results were obtained in 52 out of 198 samples (26%) and maximal MRD levels were discordant in 27 out of 104 samples (26%). In the nested PCR with SYBR Green I detection, discordant MRD results were obtained in 25 out of 72 samples (35%) and maximal MRD levels were discordant in 18 out of 46 samples (39%). It should be emphasized that the main difference between the ‘single step PCR hydrolysis (TaqMan) probe’ approach and the ‘nested PCR SYBR Green I’ approach is not related to the use of an ABI RQ-PCR machine or the Light Cycler, but is related to the single PCR versus nested PCR approach. Discordant results were mainly observed in samples with low MRD levels (<10⁻⁴). First, some samples were considered to be positive (but not quantifiable) in one experiment but negative in the other experiment. These differences can be explained by the fact that low MRD levels are often detected below the reproducible range of the RQ-PCR assay; by definition repetition of experiments with MRD results below the reproducible range may give different results. It should be noted that within protocols aimed at therapy reduction for MRD-based low-risk patients (such as the AIEOP/BFM ALL-2000 protocol), prevention of false-negative MRD data may result in some non-specific (background) amplification being interpreted as a very low positive MRD level. A second type of discordant results was observed in samples that were considered positive (but not quantifiable) in one experiment, whereas they could be quantified in the other experiment. These differences reflect variation in the reproducible sensitivity of the two experiments, which may result in identical MRD levels being quantified in the experiment with the highest reproducible sensitivity only (and being considered positive, not quantifiable in the other).
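The concordance criterion used for quantifiable results, a less than three-fold difference between the initial and the repeated experiment, can be sketched as a small helper. This Python example is a minimal illustration: the function names are hypothetical, and it covers only pairs of quantified MRD levels. How "positive, not quantifiable" and negative results were scored is described in the legend of Figure 2 and is not modeled here.

```python
def fold_difference(mrd_a: float, mrd_b: float) -> float:
    """Return the fold difference (always >= 1) between two quantified MRD levels."""
    if mrd_a <= 0 or mrd_b <= 0:
        raise ValueError("fold difference is only defined for quantified (positive) levels")
    return max(mrd_a, mrd_b) / min(mrd_a, mrd_b)


def is_concordant(mrd_initial: float, mrd_repeat: float, max_fold: float = 3.0) -> bool:
    """Concordant if the two levels differ by less than max_fold (three-fold by default)."""
    return fold_difference(mrd_initial, mrd_repeat) < max_fold
```

Because the comparison is a ratio, it is symmetric in the two experiments and independent of the absolute MRD level: 10⁻³ versus 2×10⁻³ is concordant, whereas 10⁻³ versus 5×10⁻³ is not.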
In order to prevent potential differences in MRD results, all five laboratories now use the single PCR approach employing hydrolysis probes (no nested PCR with SYBR Green I detection anymore).
MRD-based risk group stratification according to the AIEOP/BFM ALL-2000 protocol was identical between the initial and repeated experiment in 73% of cases (47 out of 64 patients in which both the day 33 and 3 months sample could be repeated with both targets). Discordant results concerned: HR → IR (2; 3%), IR → HR (2; 3%), LR → IR (2; 3%), IR → LR (4; 6%); all four patients who shifted between HR and IR groups had MRD levels just around the cutoff value. In the initial experiment, MRD-based risk group stratification could not be made in two patients (3%); both cases were IR in the repeated experiment. Five patients (8%), initially classified as LR (1) or IR (4), could not be stratified on the basis of the repeated experiment, owing to insufficient reproducible sensitivities in combination with negative MRD results. A comparable variation in MRD results has recently been reported for paired bone marrow samples.21
Reproducibility of RQ-PCR MRD data interpretation. The RQ-PCR data of the repeated experiments were sent to a second laboratory for re-interpretation of the data using the guidelines for RQ-PCR data interpretation. In 50% of cases, the reported reproducible sensitivity was identical between the two laboratories. As shown in Figure 3a and c, the re-interpretation by the second laboratory resulted in comparable MRD levels in 72% (single PCR using hydrolysis probe) and 77% (nested PCR using SYBR Green) of cases, but some clear discrepancies were observed as well. Particularly, MRD levels were quantified by two laboratories, whereas they were considered ‘positive, below reproducible sensitivity’ in the other laboratories.
Re-evaluation of MRD-based risk group stratification gave concordant results in 81% of cases (56 out of 69 evaluated patients). Discordant results concerned: HR → IR (1; 1%), IR → HR (1; 1%), LR → IR (1; 1%), unclassifiable → IR (3; 4%), IR → unclassifiable (4; 6%) and LR → unclassifiable (3; 4%). The discrepancies in considering a patient unclassifiable according to MRD results were mainly owing to a different interpretation of the reproducible sensitivity of the MRD-PCR targets.
On the basis of these results, the guidelines for RQ-PCR data interpretation were re-evaluated and adapted, and discordant cases were re-interpreted by the laboratories again. This resulted in identical reproducible sensitivities in 68% of cases. The second round interpretation significantly improved the individual results for MRD levels, but some discrepancies remained (Figure 3b and d). Furthermore, concordance in MRD-based risk group stratification between the two laboratories was increased to 86% of cases. One patient was HR versus IR, and seven patients were considered as not appropriate for MRD-based stratification (based on the lack of two sensitive targets) by one laboratory but were stratified by the other laboratory. All cases with discordant results were subsequently discussed within the MRD Task Force and consensus on the interpretation was reached in all cases.
Further evaluation of Step 3: dot-blot hybridization versus RQ-PCR
During the course of the AIEOP-BFM ALL-2000 protocol, the dot-blot hybridization method was replaced by RQ-PCR analysis. Therefore, we compared MRD data obtained by both methods. To this end, 46 patients previously analyzed by dot-blot technology were re-analyzed using RQ-PCR (62 targets, 109 samples); particularly patients with detectable MRD levels were selected for this purpose. As shown in Figure 4, the data showed a good correlation between the two methods (y = 0.9993x + 0.1778; R² = 0.7381). It should be noted that quantification of very high ‘MRD’ levels (10⁻²) by the dot-blot method is not accurate and consequently all MRD levels higher than 10⁻² were reported as 10⁻². Of importance, in only two patients (4%) MRD-based risk group stratification would have been different between the two applied MRD methods.
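A comparison of this kind can be sketched as a least-squares fit between paired MRD values from the two methods. The Python example below is an illustration under several assumptions that the text does not confirm: that the fit was made on log10-transformed MRD levels, that one method is simply placed on each axis, and that the capping at 10⁻² is applied before the fit. All function names are hypothetical and no study data are reproduced.

```python
import math
from typing import List, Tuple

DOT_BLOT_CAP = 1e-2  # levels higher than this were reported as 1e-2


def cap_dot_blot(level: float) -> float:
    """Report any dot-blot MRD level above the cap as the cap itself."""
    return min(level, DOT_BLOT_CAP)


def log10_levels(levels: List[float]) -> List[float]:
    """Log-transform positive MRD levels (assumed scale of the comparison)."""
    return [math.log10(v) for v in levels]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept, with R squared."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

In use, one would cap the dot-blot values with `cap_dot_blot`, transform both series with `log10_levels`, and pass them to `fit_line`; a slope close to 1 then indicates that the two methods scale together, while R² summarizes the scatter around the fitted line.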
Within clinical treatment protocols, it is essential to obtain comparable MRD results in the involved laboratories. However, our data show that Ig/TCR-based MRD diagnostics is complex and that results may differ between laboratories. In our study, run in parallel to the MRD-based AIEOP/BFM ALL-2000 protocol, we identified several pitfalls in MRD analysis and made agreements on how to circumvent potential problems and to achieve and maintain uniform MRD data. Two topics appeared to be of utmost importance: standardization of experimental approaches and strict guidelines for interpretation of RQ-PCR data.
The standardization of the experimental approaches needs to address all steps of MRD analysis, including isolation of DNA, detection and identification of Ig/TCR gene rearrangements, interpretation of Ig/TCR sequences, ASO primer design, RQ-PCR technique and analysis of RQ-PCR data (Figure 1). Although it may not be necessary to replicate every step in exact detail, a certain level of standardization is required in order to achieve uniformity in MRD results, thereby ensuring the comparability of patient risk groups in different clinical trials. Given the complexity of the MRD-PCR procedure, it is advised to limit the number of MRD-PCR laboratories per treatment protocol. These laboratories need to have a detailed knowledge of the structure and composition of Ig/TCR genes and thorough experience in analyzing the rearrangement patterns. Furthermore, in order to achieve and maintain a minimal level of experience, the number of laboratories should preferably be limited to one laboratory per 10–14 million inhabitants (or one laboratory for smaller countries).
Guidelines for interpretation of RQ-PCR data are a prerequisite for clinical MRD studies and make it possible to compare results of different treatment protocols. The implementation of guidelines within this group was greatly facilitated by regular meetings with open discussion of non-blinded results. The guidelines for interpretation of RQ-PCR data as developed by the I-BFM-SG MRD task force are currently evaluated within the European Study Group on MRD detection in ALL (ESG-MRD-ALL), a consortium of 32 laboratories involved in MRD analysis of ALL patients. Within the ESG-MRD-ALL, the guidelines are further being optimized, in particular with respect to readability and practical applicability (van der Velden et al. Leukemia, in press).
The overall aims of the MRD Task Force were to identify and solve pitfalls in achieving consistent results in multicenter MRD studies. Indeed, the concordance in percentage of patients that could be MRD-stratified and the relative distribution of patients over the three MRD-based risk groups increased over time between the MRD laboratories of the AIEOP/BFM ALL-2000 protocol (data not shown). In addition, the in-depth discussions of all results also contributed to achieving a higher level of efficiency in the MRD-PCR laboratories, because experimental procedures and approaches were attuned and optimized. Furthermore, the experience obtained in the distribution of samples and in the analysis and reporting of the results was highly valuable for the set-up of a quality control program. Such a program, consisting of two quality control rounds per year that focus on all laboratory aspects of Ig/TCR-based MRD analysis, is currently being organized by the ESG-MRD-ALL. This quality control program is required for the implementation of RQ-PCR-based MRD diagnostics in clinical protocols.
We are grateful to Dr Martin Zimmermann (Hannover, Germany) for advice on statistical issues and to Marieke Comans-Bitter for preparing the figures. We acknowledge the Kind-Phillip Stiftung, BMBF, Deutsche Krebshilfe, St Anna Kinderkrebsforschung, Fondazione Tettamanti, Fondazione Cariplo, Fondazione Città Della Speranza, Associazione Italiana per la Ricerca sul Cancro (AIRC), MIUR PRIN 2005 no. 2005069388_001, NH & MRC and Cancer Council (Australia) for financial support.