Quality control and quantification in IG/TR next-generation sequencing marker identification: protocols and bioinformatic functionalities by EuroClonality-NGS

Assessment of clonality, marker identification and measurement of minimal residual disease (MRD) of immunoglobulin (IG) and T cell receptor (TR) gene rearrangements in lymphoid neoplasms using next-generation sequencing (NGS) is currently under intensive development for use in clinical diagnostics. So far, however, there is a lack of suitable quality control (QC) options with regard to standardisation and quality metrics to ensure robust clinical application of such approaches. The EuroClonality-NGS Working Group has therefore established two types of QCs to accompany the NGS-based IG/TR assays. First, a central polytarget QC (cPT-QC) is used to monitor the primer performance of each of the EuroClonality multiplex NGS assays; second, a standardised human cell line-based DNA control is spiked into each patient DNA sample to work as a central in-tube QC and calibrator for MRD quantification (cIT-QC). Having integrated those two reference standards in the ARResT/Interrogate bioinformatic platform, EuroClonality-NGS provides a complete protocol for standardised IG/TR gene rearrangement analysis by NGS with high reproducibility, accuracy and precision for valid marker identification and quantification in diagnostics of lymphoid malignancies.


Introduction
Identification and assessment of clonal immunoglobulin (IG) and T cell receptor (TR) gene rearrangements is a widely used tool for the diagnosis of lymphoid malignancies, and is also essential for monitoring minimal residual disease (MRD) [1][2][3][4][5][6].
Next-generation sequencing (NGS) of IG/TR gene rearrangements is gaining popularity in clinical laboratories, as it avoids laborious design of patient-specific real-time quantitative (RQ)-PCR assays and provides the capability to sequence multiple rearrangements and rearrangement types within a single sequencing run. It also allows detection of MRD with a more specific readout than RQ-PCR [7]. Hence, several methods have already been described for high-throughput profiling of IG/TR rearrangements at diagnosis and follow-up in acute lymphoblastic leukaemia (ALL), chronic lymphocytic leukaemia (CLL) and other lymphoid malignancies [8][9][10][11][12][13].
NGS assays, especially those based on amplicons, pose major challenges, as multiple primers need to anneal under the same reaction conditions, while many technical variables may be introduced by library preparation, sequencing and bioinformatics, potentially leading to inaccurate results [14]. Particularly in a clinical context, strategies for standardisation of laboratory protocols and quality control (QC) of each component of an NGS assay are highly desirable, if not required.
Reference standards are essential for the evaluation of wet-lab and in silico NGS processes to ensure the analytical validity of test results prior to implementation of an NGS technology into clinical practice [15][16][17]. Reference DNA materials should be stable sources of rearrangements that can be sequenced and used for measuring qualitative and quantitative properties. However, previously published standards have a limited scope and utility, since they (1) do not cover all relevant IG/TR loci, (2) do not report on the quality of the sequencing run or the performance of samples and primers and/or (3) are synthetic constructs that may not reflect the complexity of native genomic DNA [9,18,19].
The EuroClonality-NGS Working Group was initiated to develop, standardise and validate protocols for IG/TR NGS applications, as introduced in Langerak et al. [20] and described in the accompanying manuscripts by Brüggemann et al. [21] and Scheijen et al. [22]. Innovatively, the EuroClonality-NGS assays include two types of QCs, both based on basic assay components, and both fully integrated in ARResT/Interrogate [23], the interactive bioinformatics platform developed within the Working Group:
1. A central polytarget QC (cPT-QC) consisting of a standardised mixture of lymphoid specimens, representing a full repertoire of IG/TR genes. It serves to assess performance biases or unusual amplification shifts in a sequencing run by tracking primer usage and comparing it with stored reference profiles.
2. A central in-tube quality/quantification control (cIT-QC) consisting of human B and T cell lines with well-defined IG/TR rearrangements. The cIT-QC is added directly to a sample to undergo concurrent library preparation and sequencing, acting as an in-tube qualitative and quantitative standard that is subjected to the same technical downstream variables.
Here we describe, evaluate and showcase these concepts and functionalities. We tested the developed protocol on a dataset of polyclonal samples, B-ALL and T-ALL diagnostic materials and follow-ups of patients with substantial treatment-induced shifts in IG/TR repertoires. We show its successful application and robustness for clinical laboratories that want to implement the EuroClonality-NGS assays for marker identification and quantification. Figure 1 provides an overview of the study.

ARResT/Interrogate
ARResT/Interrogate uses a web browser-based interface to (1) run an analytical pipeline to identify different types of rearrangements ('junction classes') across all IG/TR loci (Supplementary Table S1), (2) store, retrieve and report on runs, (3) allow highly varied analyses and visualisations and (4) enable purpose-built meta-analyses and applications. Bioinformatic analyses were performed with ARResT/Interrogate and purpose-built tools unless otherwise stated. Further implementation details are provided below and as Supplementary Information. The platform is currently freely available at arrest.tools/interrogate, hosted at the MetaCentrum and CERIT-SC centres in the Czech Republic.

Sources and methods
The cPT-QC consists of genomic DNA isolated from healthy human thymus, tonsil and peripheral blood mononuclear cells (MNCs) in a 1:1:1 ratio (see Supplementary Information). The cPT-QC undergoes library preparation alongside the investigated samples (Figs. 1 and 2).

Implementation
Primers are bioinformatically identified in the reads of each of the eight cPT-QC tubes of the run and their abundances compared to stored cPT-QC reference results using the test of proportions.
Stored reference results are the output of ARResT/Interrogate from the analysis of a cPT-QC sample. These results should be confirmed through replicate runs over time in each lab to account for technical variability (see Discussion). The results (and not the raw NGS data) are stored to ensure that the bioinformatic analysis is not compromised inadvertently by the user; this means that the results are updated with every major release of ARResT/Interrogate to ensure compatibility with new runs.
Issues with abundances of primers of a specific primer set are used to tag the corresponding cPT-QC samples and all user samples of the same primer set as 'QC-failed'.
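The comparison described above can be illustrated with a minimal sketch. It assumes a pooled two-proportion z-test as the "test of proportions" and reuses the 1e−200 p value threshold reported in the Results; the function names, primer labels and read counts are hypothetical, and this is not the ARResT/Interrogate implementation.

```python
import math


def two_proportion_p_value(count_a, total_a, count_b, total_b):
    """Two-sided p value of a pooled two-proportion z-test (assumed test)."""
    pooled = (count_a + count_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        # Primer absent (or the only primer) in both libraries: no difference.
        return 1.0
    z = (count_a / total_a - count_b / total_b) / std_err
    # P(|Z| > |z|) for a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))


def flag_primers(run_counts, reference_counts, threshold=1e-200):
    """Return primers whose relative abundance differs from the reference.

    Both arguments map a primer name to its read count in one cPT-QC tube.
    """
    total_run = sum(run_counts.values())
    total_ref = sum(reference_counts.values())
    flagged = {}
    for primer in sorted(set(run_counts) | set(reference_counts)):
        p_value = two_proportion_p_value(
            run_counts.get(primer, 0), total_run,
            reference_counts.get(primer, 0), total_ref,
        )
        if p_value < threshold:
            flagged[primer] = p_value
    return flagged


# Made-up counts for two hypothetical primers of one primer set.
reference = {"primer_A": 50000, "primer_B": 50000}
replicate = {"primer_A": 50200, "primer_B": 49800}
perturbed = {"primer_A": 5000, "primer_B": 95000}

print(flag_primers(replicate, reference))          # nothing flagged
print(sorted(flag_primers(perturbed, reference)))  # both primers flagged
```

Note that in the perturbed example the second primer is flagged as well, because relative abundances within a primer set are not independent of each other.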

Replicates
As reproducibility is important for a QC of this type, we performed replicate runs of cPT-QC and also of MNC (four libraries in total); MNCs are regularly used and could serve as an alternative. Relative abundances of 5′ primers were compared employing the test of proportions.

Primer perturbations
To investigate whether and how the cPT-QC can be used to detect issues with primer performance, artificial perturbations of primer concentrations were created to simulate omitting a primer during pipetting or pipetting the wrong primer concentration.
First, 5′ primer usage was analysed in a cPT-QC sample. Two primers of differing abundances were selected from each primer set, skipping intron-Kde, which has only two primers: IGH-VJ-FR1-M-1, IGHV-FR1-O-1; IGHD-B-1, IGHD-E-1; IGK-V-G-1, IGK-V-I-1; TRB-V-AD-1, TRB-V-G-1; TRB-D-A-1, TRB-D-B-1; TRG-V-F-1, TRG-V-E-1; TRD-D-A-1, TRD-V-B-1. Second, these primers were perturbed by fully excluding them from the primer pool (0%), by reducing their concentration to 10% and by increasing it to 200%. Replicate runs of these three primer-perturbed cPT-QC libraries (six in total) were performed; however, since the replicates were consistent (data not shown), only the first replicate of each is shown in Results. Finally, relative abundances of 5′ primers were compared between normal replicates, and between normal replicates and the perturbed libraries, using the test of proportions.

Sources and methods
In total, 59 human B (n = 30) and T (n = 29) lymphoid cell lines were obtained from the American Type Culture Collection (ATCC, Manassas, VA, USA; www.lgcpromochem-atcc.com) and the German Collection of Microorganisms and Cell Cultures GmbH (DSMZ, Braunschweig, Germany; www.dsmz.de), or were derived from internal cell line banks. Supplementary Table S2 gives an overview of the cell lines. DNA from cultured cell lines was isolated using a phenol-chloroform extraction protocol, followed by ethanol precipitation and elution in Tris-ethylenediaminetetraacetic acid buffer. Alternatively, DNA was isolated with the GenElute Mammalian Genomic DNA Miniprep Kit (Sigma-Aldrich, St. Louis, MO, USA) according to the manufacturer's protocol.

Multiplex amplification and Sanger sequencing
according to the BIOMED-2 protocol: PCR products were checked for fragment sizes and clonality in the QIAXCEL Advanced System [24,25]. Clonal PCR products were subjected to heteroduplex analysis and sequenced on either an ABI 3130 or ABI 3500 platform (Applied Biosystems, Foster City, CA, USA).
IG/TR rearrangement profiles of all cell lines were compared between the different methods.
For cases with discrepant results between the three methods, IG/TR allele-specific PCR assays were designed for digital droplet PCR (ddPCR) (QX200™ Droplet Digital™ PCR System, Bio-Rad) to verify the respective rearrangement. Absolute quantification of IG/TR gene rearrangements by ddPCR was performed using two different genomic DNA amounts (50 ng, 100 ng) (Supplementary Information). Each experiment included a polyclonal MNC control and a no-template control.

Cell line selection criteria
For establishment of the cIT-QC from the spectrum of IG/TR gene rearrangements of the 59 cell lines, the following selection criteria were defined:
1. The final set should consist of as few cell lines as possible, while covering each primer set by at least three different rearrangements, hence aiming for ALL cell lines harbouring not only lineage-characteristic but also cross-lineage rearrangements.
2. The rearrangements should be unambiguously detectable with Sanger sequencing and amplicon-based NGS.
3. The variable region of IGHV-(IGHD)-IGHJ gene rearrangements should preferably be unmutated in order to avoid issues with primer annealing.

Implementation
For cIT-QC mixture preparation see Supplementary Information. Bioinformatically, cIT-QC reads are identified using an immunogenetic annotation-based approach that is extremely fast while allowing for variations in sequence, avoiding compute-intensive and potentially inaccurate alignment.
For QC, we expect identification of at least one read per cIT-QC rearrangement and of at least as many total cIT-QC reads as total cIT-QC cells; otherwise the tube is tagged as 'QC-failed' (see below for how this is used in ARResT/Interrogate).
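The two cIT-QC criteria can be written down as a short check. This is only a sketch of the rule as stated; the function name, the rearrangement labels and the counts are hypothetical, and the input is assumed to list every expected cIT-QC rearrangement of the tube, with a count of 0 for those not found.

```python
def cit_qc_tag(reads_per_rearrangement, total_cit_qc_cells):
    """Tag a tube based on its cIT-QC reads.

    reads_per_rearrangement maps each expected cIT-QC rearrangement
    of the primer set to its read count (0 if not identified).
    """
    every_rearrangement_seen = all(
        reads >= 1 for reads in reads_per_rearrangement.values()
    )
    total_cit_qc_reads = sum(reads_per_rearrangement.values())
    enough_reads = total_cit_qc_reads >= total_cit_qc_cells
    if every_rearrangement_seen and enough_reads:
        return "QC-passed"
    return "QC-failed"


# Made-up example: three expected rearrangements, 120 cIT-QC cells in the tube.
print(cit_qc_tag({"rearr_1": 900, "rearr_2": 450, "rearr_3": 30}, 120))
print(cit_qc_tag({"rearr_1": 900, "rearr_2": 0, "rearr_3": 30}, 120))
```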
Quantification applies the quantification factor (calculated per primer set by dividing total cIT-QC cells by total cIT-QC reads) to convert read counts of a clonotype to cell counts, and then calculates its relative abundance against the total sample input cells.
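As a worked illustration of this calculation, the following sketch follows the description step by step. The function names and all numbers are made up for the example; only the arithmetic is taken from the text.

```python
def quantification_factor(total_cit_qc_cells, total_cit_qc_reads):
    """Cells represented by one read, computed per primer set."""
    if total_cit_qc_reads <= 0:
        raise ValueError("no cIT-QC reads: the tube cannot be quantified")
    return total_cit_qc_cells / total_cit_qc_reads


def clonotype_percent_cells(clonotype_reads, factor, sample_input_cells):
    """Relative abundance of a clonotype as a percentage of input cells."""
    clonotype_cells = clonotype_reads * factor
    return 100.0 * clonotype_cells / sample_input_cells


# Made-up tube: 120 cIT-QC cells yielded 60,000 reads, so each read
# corresponds to 0.002 cells. A clonotype with 10,000 reads then
# represents about 20 cells, i.e. about 0.2% of 10,000 input cells.
factor = quantification_factor(120, 60000)
print(factor)
print(clonotype_percent_cells(10000, factor, 10000))
```

A clonotype that dominates the reads of such a tube can therefore still correspond to a small fraction of the input cells, which is the situation described for the aplastic samples in the Results.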

Creation of a test dataset
To evaluate and showcase the aforementioned concepts and functionalities, we compiled a test dataset with: 1. Four diagnostic bone marrow B-/T-ALL samples with high leukaemic infiltration (assessed by routine cytomorphology to be 60-80%). 2. Four samples of patients with B/T cell aplasia after antibody treatment. The two samples with B cell aplasia were CLL samples after Rituximab (anti-CD20) treatment and the two samples with T cell aplasia were T cell prolymphocytic leukaemia samples after Alemtuzumab (anti-CD52) treatment. In all these samples lineage-specific aplasia was confirmed by flow cytometry. 3. cPT-QC for all primer sets, but with the TRB-VJ primer set results swapped with perturbed results from experiments outlined above. To showcase generic QC functionalities, one diagnostic sample was subsampled to <1000 random reads.
The diagnostic samples and the cPT-QC were run with all primer sets as described in the accompanying manuscript by Brüggemann et al. [21], while the aplastic follow-up samples only with the corresponding primer sets, that is, the IG sets for samples with B cell aplasia, and the TR sets for samples with T cell aplasia. Figure 1 includes a schematic of the test dataset. Finally, the follow-up samples were run without the addition of MNC to test that the addition of cIT-QC is sufficient to stabilise the samples for sequencing without compromising their immunogenetic profile.

Results
The resulting protocol and functionalities for QC and quantification in IG/TR NGS marker identification are depicted in Fig. 2. We present and further discuss the underlying results below.

cPT-QC allows assessment of primer performance
We compared normal cPT-QC and MNC replicate libraries and primer-perturbed cPT-QC replicate libraries (10 libraries in total) to investigate the use of cPT-QC in assessing primer performance. We applied the test of proportions to 5′ primer relative abundances in those libraries, which showed a clear difference in p values between un-perturbed primers (high p values indicating insignificant changes) and perturbed primers (low p values). In other words, p values of the differences in abundance of the perturbed primers are noticeably lower, an observation we can use to highlight such cases.
Table 1 presents a simplified view of the results, focusing on perturbed primers plus at least one other un-perturbed primer per primer set, either to show their normal behaviour or to discuss their abnormal behaviour. At a p value threshold of 1e−200 none of the primers are flagged in the cPT-QC (white cells), which highlights the reproducibility of the assay, while all the perturbed primers are flagged in the perturbed libraries (light/dark grey cells). Significant changes in abundance are also visible in other cells; the most likely explanation is that those primers were indirectly affected by perturbations of other primers, that is, a primer 'taking over' when an initially abundant primer was excluded. An example is IGHV-FR1-D-1 when IGH-VJ-FR1-M-1 is perturbed either way, especially since these primers amplify partially overlapping lists of genes. Supplementary Table S3 presents the full set of results, including the actual p values and results from the replicate MNC libraries.

Composing the cIT-QC sample from human B and T cell lines
Following the criteria outlined above, we selected six B cell lines: ALL/MIK (ALL), Raji (Burkitt lymphoma), REH (B cell precursor ALL), TMM (CML-BC/EBV + B-LCL), TOM-1 (ALL) and WSU-NHL (B cell lymphoma, histiocytic lymphoma); and three T cell lines: JB6 (ALCL), Karpas299 (ALCL) and MOLT-13 (ALL). The nine cell lines featured a total of 46 rearrangements, all of which are used as part of the cIT-QC. All rearrangements were detected by all three sequencing methods, except for two that were not detected by capture NGS. Another two were of very low abundance and/or trimmed in the capture NGS data, but since the junction segmentation was clearly the same, they were still tagged as confirmed.

Table 1 cPT-QC: replicates and primer perturbations. Relative abundances (%) of selected 5′ primers across all primer sets. Top group of primers were perturbed as described in Materials and methods; bottom group is a selection of primers that were left un-perturbed: one per primer set selected alphabetically, plus two examples where the primer behaviour is of interest to the discussion (see text). Results are shown from two cPT-QC replicates (blue column) and from replicate 1 of the blue column ("rep1") vs. cPT-QC libraries where primers were excluded (0%, orange column), reduced to 10% (yellow column) and increased to 200% (green column). Changes in abundance compared to cPT-QC rep1 are shown separately (column "% or rep1", in italics) and coloured from red (0%) to white (100%, i.e. no change) to green (200%). Actual primer abundances are coloured based on the p value from the test of proportions, with grey indicating a noticeable change according to our threshold of 1e−200 (p value <1e−199 highlighted in dark grey, and <1e−99 in light grey, otherwise in white).

QC aspects can be evaluated in ARResT/Interrogate
Information on the in silico QC based on both the cPT-QC and cIT-QC is available in ARResT/Interrogate (Supplementary Figure S1). Generic QC is also performed on samples, specifically to check for low number of raw reads and low percentage of reads with an identified junction. Such samples are tagged as 'QC-failed' and excluded by default to prevent the user from their unintended use. However, the user is notified and has the option to include them back in the analysis.
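The generic QC and the default exclusion of failed samples can be sketched as follows. The text does not state the actual cut-offs, so both threshold values below are placeholders chosen for illustration, and the function names and sample records are hypothetical.

```python
def generic_qc_tag(raw_reads, reads_with_junction,
                   min_raw_reads=1000, min_junction_pct=10.0):
    """Tag a sample by raw read count and percentage of reads with junction.

    Both default thresholds are placeholders, not values from the protocol.
    """
    if raw_reads < min_raw_reads:
        return "QC-failed"
    junction_pct = 100.0 * reads_with_junction / raw_reads
    if junction_pct < min_junction_pct:
        return "QC-failed"
    return "QC-passed"


def samples_for_analysis(samples, include_failed=False):
    """Exclude QC-failed samples by default; the user may include them back."""
    selected = []
    for name, raw_reads, reads_with_junction in samples:
        tag = generic_qc_tag(raw_reads, reads_with_junction)
        if tag == "QC-passed" or include_failed:
            selected.append((name, tag))
    return selected


# Made-up samples: (name, raw reads, reads with an identified junction).
samples = [("dx_1", 250000, 200000), ("dx_2_subsampled", 900, 700)]
print(samples_for_analysis(samples))
print(samples_for_analysis(samples, include_failed=True))
```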

Marker identification and quantification
Abundances of lymphocyte subpopulations are frequently not available for samples of patients with lymphoid malignancies. Furthermore, as IG/TR NGS only reflects relative representation of the rearrangements, it was important to establish a calibrator that would allow us to normalise sequencing reads to input cells. Analysis of our test dataset showed the utility of the cIT-QC in marker identification and quantification. Excluding cIT-QC reads, both diagnostic and aplastic samples seem to harbour a few highly abundant clones if judged simply by the number of reads (Fig. 3, Supplementary Table S5). However, the very high number of reads from only a very limited number of cIT-QC cells (120-440, dependent on the number of cIT-QC rearrangements per primer set), in all aplastic and a few of the diagnostic samples, is an indirect yet clear indication of the restricted numbers of patient cells harbouring rearrangements in those samples. From another perspective, the total percentage of cIT-QC reads is much greater than that of patient rearrangements in those samples, suggesting that cIT-QC cells are also more numerous than patient cells with rearrangements. Consequently, after quantification with the cIT-QC, marker abundances fall well below the threshold indicating clonality. On the other hand, and as expected, in most diagnostic samples cIT-QC reads constitute a minority, indicating the true abundant presence of patient cells with clonal rearrangements. Hence, using the cIT-QC, a marker can be more accurately quantified and identified.

ARResT/Interrogate user mode for marker identification
A critical aspect of bioinformatic-based protocols is their standardisation and usability, as evident from our experiences within EuroClonality-NGS and EuroMRD. We have thus designed ARResT/Interrogate to be flexible but also 'lockable'. Flexibility comes from a deep parameterisation of many aspects of the pipeline and the browser. At the same time, we can lock down important parameters so that users cannot inadvertently compromise the analysis. This concept is called 'user mode' in ARResT/Interrogate, and as a result of this study we have created a marker identification user mode.
In this user mode, EuroClonality-NGS primer sets and cIT-QC sequences are pre-selected and locked, as are other pipeline options. A special samplesheet is available to annotate samples with metadata, including providing numbers of sample input cells for quantification. The user interface is simplified, with many non-essential functionalities (including many of the visualisations normally available) hidden from view, and with fewer user actions required to load results. The minimum read-based percentage abundance for a clonotype is pre-set to 5% for marker identification.
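The pre-set abundance filter amounts to a simple selection step, sketched below. The sketch assumes that the percentage is taken over the reads with junction excluding cIT-QC reads; the function name, clonotype labels and counts are hypothetical.

```python
def candidate_markers(clonotype_reads, usable_reads, min_pct=5.0):
    """Keep clonotypes whose read-based abundance reaches the minimum.

    usable_reads is assumed to be the number of reads with junction,
    excluding cIT-QC reads (an assumption of this sketch).
    """
    markers = {}
    for clonotype, reads in clonotype_reads.items():
        pct = 100.0 * reads / usable_reads
        if pct >= min_pct:
            markers[clonotype] = pct
    return markers


# Made-up sample with 1000 usable reads.
print(candidate_markers({"clonotype_1": 600, "clonotype_2": 40}, 1000))
```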

Discussion
In this study, we introduce protocols developed within the EuroClonality-NGS Working Group for QC and quantification in NGS-based IG/TR marker identification. Both laboratory and bioinformatic protocols are presented and showcased on clinically relevant data.
The cPT-QC is used to monitor the primer performance of each of the EuroClonality multiplex NGS assays; the cIT-QC is spiked into each patient DNA sample for QC and quantification. The use of 'central' highlights that these controls should be as stable as possible and thus centrally available at an applicable level (minimum at an intralaboratory level); this is further discussed below in the context of the cPT-QC.
Our experiments show that the cPT-QC is a valuable tool to monitor reproducibility of results and to identify primer perturbations and other deviations in the wet-lab protocol, as they introduce detectable changes to the sequencing profile. Adding the cPT-QC to each analysis makes it possible to check primer and assay performance after sequencing. Accidental deviations in the concentrations of single primers within the multiplexed IG/TR primer sets can be detected, performance failures of single primers can be traced and consequences for the IG/TR analysis can be estimated by analysis of cPT-QC data.
In our study, replicates of cPT-QC demonstrated high reproducibility. Nevertheless, we are aware that reproducibility across labs may be affected by a large number of other variables, from consumables and equipment to users. Only centralised access to consumables, for example, in the form of a kit, and a comprehensive protocol, including the equipment used, will further improve inter-laboratory comparability of results. Besides, activities such as the QC rounds organised bi-annually by ESLHO (eslho.org) are an opportunity to gather data and experience, compare assay performance and identify relevant factors introducing variations. Until full inter-laboratory standardisation is guaranteed, the implementation of the cPT-QC will require that the reference samples are analysed in each laboratory separately, and updated with every new batch of reagents, while keeping track of equipment and users. These reference data can then be stored in ARResT/Interrogate, which has the ability to store as many different such sets of reference data as needed, for example, linking a specific set to a specific user if necessary.

Fig. 3 (beginning of caption not recovered) [...], it is what we term 'usable' reads with junction, which excludes cIT-QC reads; this may lead to sums of those two numbers that exceed 100% per sample. b Abundance of markers before (in orange) and after (in green) cIT-QC-based quantification to percentage of patient input cells ("%cells"). Quantification of markers in the aplastic samples places their abundances below the 5% threshold routinely used in marker identification and in the EuroClonality-NGS protocols. Note: When cIT-QC read counts are very low, indicating clonality, quantification factors may lead %cells to exceed 100%; three such cases in the test dataset are indicated by an asterisk ("*").
In this study we also highlighted a number of unique and advantageous properties of the cIT-QC. First, in contrast to plasmids or synthetic reference templates, cIT-QC cell lines are particularly well suited to be used as control because they are sources of large quantities of genomic DNA. Second, the nine cell lines with a total of 46 rearrangements represent as few cell lines as possible while covering each primer set by at least three different rearrangements, taking advantage of ALL cell lines harbouring not only lineage-associated but also cross-lineage rearrangements. Third, the rearrangements are unambiguously detectable with amplicon-based NGS. Fourth, the variable region of IGHV-(IGHD)-IGHJ gene rearrangements is not/lowly mutated and therefore minimises issues with primer annealing. Fifth, cIT-QC rearrangements represent 2/3 of the amplifiable junction classes (in italics in Supplementary Table S1) over all eight primer sets, and thus offer an opportunity to highlight a number of issues, most obviously over-/under-amplification, but also bioinformatic misidentification. Additionally, cIT-QC rearrangements can replace MNC for PCR stability without influencing the patient immune repertoire (since cIT-QC rearrangements are identified and by default excluded from the results).
Our cIT-QC enables the conversion from reads to cells, which is of utmost importance for clinical use. Diagnostic material being analysed for MRD marker identification can show abundances of particular clonotypes that do not reflect the clonal composition of the sample. For example, if the diagnostic sample is highly infiltrated by a lymphoid malignancy that does not harbour a targetable rearrangement, the (few) residual lymphoid cells would generate the whole spectrum of detectable rearrangements; in such situations minor accompanying physiological B or T cell clones could be misassigned as clones with leukaemic markers. In the accompanying study by Brüggemann et al. [21], where 134 clonal signals with abundance >5% were detected by NGS but not by Sanger sequencing, cIT-QC quantification reduced the abundances of 71 (53%) of them below the 5% threshold.
In addition to its use in marker identification, and as exemplarily shown for B and T cell depletion in aplastic follow-up samples, the cIT-QC is of utmost relevance for MRD quantification in samples on or after treatment, in particular if B or T cell-directed therapy was applied, which minimises the background of polyclonal gene rearrangements. If the relative tumour burden is calculated by the ratio of leukaemia-specific reads to all annotated reads without any quantification, the quotient reflects the marker frequency only among cells carrying a particular type of rearrangement (e.g. IG rearrangements in B cells) and might thus heavily overestimate the tumour load [26].
Quantification values over 100% (examples in Fig. 3b and Supplementary Table S5) show that using the cIT-QC is still a semi-quantitative approach, potentially affected by amplification biases. However, there is to date no other scientific or commercial solution available that exceeds our methodology in its broad applicability (universal IG/TR approach) and/or allows precise absolute quantification [12,27,28,29].
Finally, the QC protocols are embedded in ARResT/Interrogate, which informs users with reports and messages and allows them, for example, to include the QC-failed samples back into the analysis. The logic behind this is that the 'fail' flag simply indicates that our pre-defined QC criteria were not met, and not that the data are corrupt beyond use. Nevertheless, flagged data should always be used with caution, and dependent on the application or question.
In summary, our study showcases the applicability of two reference standards, developed by the EuroClonality-NGS Working Group, which allow standardised analysis of IG/TR NGS data (using the EuroClonality-NGS primer sets) with high reproducibility, accuracy and precision in marker identification. With ARResT/Interrogate, a complete in silico solution accompanying the in vitro assays was built, enabling an analysis of IG/TR sequences including all quality criteria and quantification concepts necessary for valid marker identification in lymphoid malignancies.

[...] advisory board for Janssen, Abbvie, Gilead; PG: speaker for Gilead. The other authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.