Main

Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous entity with remarkably variable clinical outcome. Among the many biomarkers investigated so far, only the molecular subtypes by cell of origin (COO) classification, chromosome translocations involving MYC, and TP53 mutation have been consistently shown to bear prognostic value in the setting of rituximab-containing chemotherapy regimens. The COO-classification by whole-genome expression profiling (GEP) classifies DLBCL into activated B-cell like (ABC), germinal center B-cell like (GCB), and Type-III (unclassified) subtypes, with the ABC-DLBCL characterized by a poor prognosis and constitutive NF-κB activation.1, 2, 3, 4, 5 The original classification was based on similarity of DLBCL gene expression to that of activated peripheral blood B cells or normal germinal center B cells by hierarchical clustering analysis.1 Subsequently, Wright et al3 identified 27 genes that were most discriminative in their expression between ABC and GCB-DLBCL, and developed a linear predictor score (LPS) algorithm for COO-classification. These seminal works are entirely based on retrospective investigations of fresh-frozen (FF) lymphoma tissues. A major challenge for the application of this COO-classification in clinical practice is to establish a robust clinical assay amenable to routine formalin-fixed paraffin-embedded (FFPE) diagnostic biopsies.

Several immunohistochemistry-based algorithms have been investigated to recapitulate the COO-classification by GEP, but all suffer from poor reproducibility and, in particular, low efficacy in survival separation between the ABC and GCB subtypes.6, 7, 8, 9 Several studies have investigated the possibility of COO-classification of DLBCL using FFPE tissues by quantitative measurement of mRNA expression, including quantitative nuclease protection assay,10 GEP with the Affymetrix HG U133 Plus 2.0 platform or the Illumina whole-genome DASL assay,11, 12, 13 and NanoString technology.14 In general, these studies demonstrated high confidence of COO-classification of DLBCL using FFPE tissues and a robust separation in overall survival between ABC and GCB-DLBCL.

Apart from the above technologies for GEP, the expression of a small number of genes can be quantified by high-throughput real-time PCR. In comparison with the microarray-based approach, high-throughput real-time PCR is likely more sensitive and accurate in data acquisition, and the data analysis is expected to be much easier and more robust. Real-time PCR has been successfully used for construction of the 6-gene prediction model in DLBCL,15 but this model, unlike COO classification, does not depict the underlying molecular mechanism, and its utility in the context of new therapeutic trials remains unknown. In this study, we have developed and validated a protocol for COO-classification of DLBCL using FFPE tissues by high-throughput real-time PCR with the Fluidigm BioMark HD system together with a newly validated classifier.

MATERIALS AND METHODS

Patient Materials and Data Sets

A total of 152 cases of DLBCL were experimentally investigated in the present study, and all were previously studied for COO classification by Illumina WG-DASL array using FFPE lymphoma tissues.13, 16 In each case, an mRNA sample from FFPE lymphoma tissue was available from the previous study. Local ethical guidelines were followed for the use of archival tissues for research with the approval of the ethics committees of the involved institutions.

The following GEP data sets, which had companion clinical follow-up data, were retrieved and used for construction and validation of the DLBCL classifier: the FF data set by Lymphochip (http://llmpp.nih.gov/DLBCLpredictor/);3 the FF data set by Affymetrix U133 plus 2.0 (GSE10846: this was further split according to treatment with CHOP or R-CHOP and analyzed independently);17 the Monti FF data set by Affymetrix U133A&B (http://www.ncbi.nlm.nih.gov/gds/);18 and the FFPE data set (GEO: GSE32918) by Illumina WG-DASL array (Haematological Malignancy Diagnostic Service, St James Institute of Oncology, Leeds).13, 16

Primer Design and Validation

PCR primer pairs were designed for the 20 classifier genes that are commonly present in different microarray platforms, 5 NF-κB target genes that are characteristically enriched in their expression in ABC-DLBCL,4, 5 and 3 reference genes (Supplementary Table S1). The reference genes were selected because they are stably expressed in lymphoid tissues, are not affected by genomic copy number changes in lymphoma, and are not involved in lymphomagenesis. The following criteria were applied in primer design: (a) targeting a small fragment of the coding sequence, with all amplicons in the range of 70–130 bp, thus being amenable to FFPE tissues; (b) where possible, flanking an intron, hence preventing amplification of any contaminating genomic DNA; (c) targeting as many known transcript variants as possible; (d) giving a Tm value at or close to 60 °C (http://www.oligoevaluator.com); (e) avoiding any known SNPs and GC-rich sequence regions. The specificity of the designed primers and their potential to form primer dimers were checked with Primer Blast (www.ncbi.nlm.nih.gov/tools/primer-blast/) and Oligos 9.1, then further assessed with the in silico PCR tool (http://genome.ucsc.edu/cgi-bin/hgPcr?command=start) before purchase from Thermo Fisher Scientific.
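The design criteria listed above lend themselves to a simple screening step. The following Python sketch is illustrative only: the `PrimerCandidate` record, its field names, and the 2 °C tolerance around 60 °C are assumptions introduced here, not part of the published protocol, and criterion (c) on transcript variants is not modelled.

```python
from dataclasses import dataclass


@dataclass
class PrimerCandidate:
    """Hypothetical record describing one designed primer pair."""
    gene: str
    amplicon_bp: int          # amplicon length in base pairs
    spans_intron: bool        # True if the primers flank an intron
    tm_forward: float         # melting temperature of forward primer (deg C)
    tm_reverse: float         # melting temperature of reverse primer (deg C)
    overlaps_known_snp: bool  # True if a primer sits on a known SNP


def design_issues(p, tm_target=60.0, tm_tolerance=2.0):
    """Return the design criteria (a, b, d, e) that a candidate violates.

    tm_tolerance is an assumed reading of "at or close to 60 C".
    """
    issues = []
    if not 70 <= p.amplicon_bp <= 130:
        issues.append("amplicon outside 70-130 bp")
    if not p.spans_intron:
        # The text requires this only "where possible", so treat as a warning.
        issues.append("does not flank an intron")
    if any(abs(tm - tm_target) > tm_tolerance
           for tm in (p.tm_forward, p.tm_reverse)):
        issues.append("Tm not close to 60 C")
    if p.overlaps_known_snp:
        issues.append("overlaps a known SNP")
    return issues
```

A candidate returning an empty list would then move on to the specificity checks and to experimental validation.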

The PCR primer pairs were experimentally validated for quantitative reverse transcription PCR (qRT-PCR), with RNA samples extracted from FFPE tonsil and lymphoma tissue specimens, using the iCycler iQ system (BioRad). Primers that failed to give specific results or a high efficiency of amplification, or that worked inconsistently, were rejected, and new primers were designed until a satisfactory primer set was obtained for each of the genes investigated.

RNA Extraction, cDNA Synthesis, and Quantitative PCR

RNA was purified from FFPE tissues using the RecoverAll Total Nucleic Acid isolation Kit (Life Technologies), followed by TURBO DNase (Life Technologies) treatment to remove genomic DNA. A total of 200 ng RNA was subjected to cDNA synthesis in a 10 μl reaction mixture with random hexamers using the Superscript III Kit (Invitrogen, Life Technologies) according to the manufacturer’s instructions. An aliquot (2 μl) of the cDNA was pre-amplified in a 10 μl reaction using TaqMan PreAmp mastermix (Life Technologies) with all 28 pairs of gene-specific primers, and the PCR cycle conditions were 95 °C for 10 min, followed by 19 cycles of 95 °C (15 s) and 60 °C (4 min). This protocol was shown to yield unbiased amplification (Supplementary Figure S1). The pre-amplified products were treated with exoSAP-IT (Affymetrix), diluted fivefold using DNA suspension buffer (TEKnova) and stored at −80 °C until use for quantitative real-time PCR.

Quantitative PCR with the iCycler iQ system (BioRad) was carried out using 2.7 μl of the diluted pre-amplified product and SSO Evagreen supermix with low ROX (BioRad) according to the manufacturer’s instructions. All reactions were performed in triplicate. Controls included an RNA sample to check for genomic DNA contamination and a no-template cDNA control to monitor any cross-contamination. PCR cycle conditions were 95 °C for 1 min, followed by 30 cycles of 96 °C (5 s) and 60 °C (20 s). Melting curve analysis was routinely performed to confirm specific amplification. The expression level of each gene was calculated by the ΔCT method. The amplification efficiency of each primer pair was obtained by quantitative PCR of serial dilutions of the respective specific sequence product. The primer pairs that were validated experimentally with a high amplification efficiency proceeded to quantitative PCR with the Fluidigm BioMark HD system (Fluidigm Corporation, CA, USA).
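As an illustration of the two calculations named above, the Python sketch below estimates amplification efficiency from a serial dilution and computes a ΔCt value. It assumes the commonly used standard-curve relation E = 10^(−1/slope) − 1, where the slope comes from a least-squares fit of Ct against log10 input; the text does not state which formula was applied, and the function names are hypothetical.

```python
import math


def amplification_efficiency(log10_input, ct_values):
    """Estimate efficiency from a dilution series: E = 10**(-1/slope) - 1.

    The slope is the least-squares slope of Ct against log10(template input).
    An efficiency of 1.0 corresponds to perfect doubling per cycle.
    """
    n = len(ct_values)
    mean_x = sum(log10_input) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_input)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_input, ct_values))
    slope = sxy / sxx
    return 10 ** (-1.0 / slope) - 1.0


def delta_ct(ct_target, ct_reference):
    """Delta Ct: target Ct minus reference Ct (higher means lower expression)."""
    return ct_target - ct_reference


# Tenfold dilutions of a template that doubles every cycle shift the Ct
# by log2(10) cycles per dilution step.
dilutions = [0.0, -1.0, -2.0, -3.0]
cts = [20.0 + math.log2(10) * step for step in range(4)]
efficiency = amplification_efficiency(dilutions, cts)
```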

Massive Parallel Quantitative PCR with Fluidigm BioMark HD System

This was carried out essentially according to the manufacturer’s instructions (Figure 1). In brief, a sample mixture was prepared by mixing 2.7 μl of the diluted pre-amplified product, 3 μl SSO Evagreen supermix with low ROX (BioRad) and 0.3 μl 20 × sample-loading reagent. Each sample was investigated in duplicate. Separately, an assay mixture was prepared for each primer pair and this included 3 μl of 10 μM forward and reverse primer and 3 μl 2 × assay-loading reagent. The dynamic array was first primed with control line fluid, and then loaded with the sample and assay mixtures via the appropriate inlets using an IFC controller. The array chip was placed in the BioMark Instrument for PCR at 95 °C for 10 min, followed by 30 cycles at 95 °C for 15 s and 60 °C for 1 min according to the protocol GE Fast 48x48 PCR+Melt v2.pcl. The data were analyzed with real-time PCR analysis software in the BioMark HD instrument (Fluidigm Corporation).

Figure 1

Outline of classifier training, validation and testing, and application of the best classifier to qRT-PCR data. The LLMPP lymphochip data set by Wright et al was used for training and selecting the top basic machine-learning classifiers using the Weka 3.7.7 package (http://www.cs.waikato.ac.nz/ml/weka/). A total of 26 representative basic machine-learning classifiers were trained and cross-validated on the Wright data set using the default settings. The trained Weka classifiers gave a prediction for each of the three classes (that is, ABC, GCB, Type-III), and the class with the highest probability was taken as the predicted class. The performance of these basic classifiers was ranked according to F-measure and ROC area value. The resulting six top basic machine-learning classifiers were systematically combined, trained, and cross-validated on the Wright data set. All the meta classifiers except one yielded F-measure and ROC area values similar to the top six basic classifiers. The performance of these meta classifiers together with the six top basic classifiers was ranked according to survival separation between the assigned ABC and GCB cases and the least number of cases assigned to Type-III. The resulting best classifier, SimpleLogistic, was then tested on the Affymetrix GSE10846 data set from FF tissues,17 the Illumina WG-DASL GSE32918 data set from FFPE tissues,13, 16 and the Affymetrix data set from FF tissues by Monti et al (http://www.ncbi.nlm.nih.gov/gds/).18 Finally, the validated best classifier was applied to the qRT-PCR data generated on the FFPE tissues in the present study.

Normalization and Analysis of Fluidigm qRT-PCR Data

This was carried out using the R statistical software and the Bioconductor HT-qPCR package. For each sample, the Ct values of the two replicates were averaged and then normalized for each primer pair according to its amplification efficiency. The expression level of each gene was calculated by the ΔCT method using the mean from the three reference genes. On the basis of the amplification efficiency in our experimental system, any qPCR reaction with a ΔCt value above the cutoff (25) for linear amplification was set to 26, as described previously.19 For a small proportion of PCR reactions, there was no evidence of amplification at the maximum of 30 cycles set by the manufacturer’s default protocol, commonly due to low levels of gene expression (for example GCB genes in ABC-DLBCL or vice versa) or rarely as a result of failed amplification. Any cases with >15% of targets, that is, 4/28 genes, showing a negative result were considered unreliable and excluded from data analysis.
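The steps above were run in R; the Python sketch below only illustrates the logic and is not the HT-qPCR implementation. Several points are assumptions made for the example: the cutoff of 25 is applied to the averaged Ct value, wells without amplification are represented as `None` and also set to 26, the failure threshold is read as a fraction of targets strictly above 0.15, and the efficiency correction is omitted because its form is not specified in the text.

```python
def mean_ct(replicates):
    """Average replicate Ct values; None marks a well with no amplification."""
    observed = [ct for ct in replicates if ct is not None]
    return sum(observed) / len(observed) if observed else None


def normalise_sample(raw, reference_genes,
                     cutoff=25.0, capped=26.0, max_failed=0.15):
    """Return gene -> delta Ct for one sample, or None if the sample fails QC.

    raw maps each gene (targets and references) to its replicate Ct values.
    """
    ct = {gene: mean_ct(reps) for gene, reps in raw.items()}

    # Exclude the sample when too many targets show no amplification at all.
    failed = [gene for gene, value in ct.items() if value is None]
    if len(failed) / len(ct) > max_failed:
        return None

    # Values beyond the linear range (or missing) are set to the capped value.
    ct = {gene: capped if (value is None or value > cutoff) else value
          for gene, value in ct.items()}

    # Delta Ct against the mean Ct of the reference genes.
    ref_mean = sum(ct[g] for g in reference_genes) / len(reference_genes)
    return {gene: value - ref_mean
            for gene, value in ct.items() if gene not in reference_genes}


# Hypothetical sample with three reference genes and two target genes.
refs = ["REF1", "REF2", "REF3"]
sample = {"REF1": [18.0, 18.2], "REF2": [19.0, 19.0], "REF3": [20.0, 19.8],
          "GENE_A": [22.0, 22.4], "GENE_B": [27.0, 27.5]}
normalised = normalise_sample(sample, refs)
```

In this toy sample, `GENE_B` exceeds the cutoff and is capped before the reference mean is subtracted.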

Data Preparation for DLBCL Classifier Validation

For each of the GEP data sets by the Affymetrix platform, the probe annotation was updated according to Release 33 (30 October 2012), whereas for the GEP data set by Illumina WG-DASL assay, the specificity of the classifier gene probes was further checked by a search of the NCBI human database. Any probe found to be nonspecific was excluded from the analysis. The median value across the probes for each gene was selected and used for DLBCL classification, as it was shown to be more informative in a recent study.13 The median value for each classifier gene in a given data set was transformed into a quantile score, as opposed to a Z-score, because the expression values of a high proportion of the classifier genes in all the data sets tested were not normally distributed (Supplementary Table S2). The quantile-transformed values were converted to ARFF files for training and testing with the machine-learning classifiers in Weka 3.7.7 (http://www.cs.waikato.ac.nz/ml/weka/).
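The two data-preparation steps described above can be sketched in Python as follows. The exact definition of the quantile score is not given in the text; the version here, the fraction of samples in the data set with a value less than or equal to the sample's value, is an assumption chosen because it makes no demand on the shape of the distribution. Function names and example values are hypothetical.

```python
from bisect import bisect_right
from statistics import median


def gene_level_values(probe_values):
    """Collapse probes to genes by taking the per-sample median across probes.

    probe_values maps gene -> list of probes, each probe being a list with
    one expression value per sample (same sample order for every probe).
    """
    return {gene: [median(per_sample) for per_sample in zip(*probes)]
            for gene, probes in probe_values.items()}


def quantile_scores(values):
    """Rank-based score in (0, 1]: share of samples with a value <= this one."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_right(ordered, v) / n for v in values]


# One gene measured by three probes in two samples (made-up numbers).
collapsed = gene_level_values({"GENE_A": [[1.0, 4.0], [3.0, 2.0], [2.0, 9.0]]})
scores = quantile_scores([5.0, 1.0, 3.0, 9.0])
```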

DLBCL Classifier Testing and Ranking

The LLMPP lymphochip data set by Wright et al was used for training and selecting the top basic machine-learning classifiers using the Weka 3.7.7 package (http://www.cs.waikato.ac.nz/ml/weka/). A total of 26 representative basic machine-learning classifiers were trained and cross-validated on the Wright data set using the default settings (Figure 1). As in the original study, the training and validation were performed on the identical series of 160 and 80 cases, respectively.2, 3 The trained Weka classifiers gave a prediction for each of the three classes (that is, ABC, GCB, Type-III), and the class with the highest probability was taken as the predicted class. The performance of these basic classifiers was ranked according to F-measure and ROC area value as described previously.13 The resulting top basic machine-learning classifiers were systematically combined, trained, and cross-validated on the Wright data set, and the resulting best classifier was then tested on the Affymetrix GSE10846 data set from FF tissues,17 the Illumina WG-DASL GSE32918 data set from FFPE tissues,13, 16 and the Affymetrix data set from FF tissues by Monti et al (http://www.ncbi.nlm.nih.gov/gds/).18 Finally, the validated best classifier was applied to the qRT-PCR data generated on the FFPE tissues in the present study.
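Two of the operations described above, taking the most probable class as the prediction and scoring a classifier by F-measure, are shown in the short Python sketch below. It is not Weka code; the function names and toy labels are assumptions, and the F-measure is computed per class as the harmonic mean of precision and recall.

```python
def predicted_class(probabilities):
    """Pick the class with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def f_measure(true_labels, predicted_labels, label):
    """Per-class F-measure: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up probabilities and labels.
call = predicted_class({"ABC": 0.2, "GCB": 0.7, "Type-III": 0.1})
truth = ["ABC", "ABC", "GCB", "GCB", "Type-III"]
calls = ["ABC", "GCB", "GCB", "GCB", "ABC"]
f_abc = f_measure(truth, calls, "ABC")
f_gcb = f_measure(truth, calls, "GCB")
```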

Comparison of NF-κB Target Gene Expression between ABC and GCB-DLBCL

The expression of the NF-κB target genes BCL2, CCND2, CCR7, CD44, cFLIP, IκBα, and IRF4 was compared between the assigned ABC and GCB subgroups using the non-parametric Mann–Whitney U-test.
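For readers unfamiliar with the test, the Python sketch below computes the Mann–Whitney U statistic with a two-sided p-value, followed by the Benjamini and Hochberg adjustment that the figure legends mention for multiple testing. It is a simplified stand-in for a statistics package: the p-value uses the normal approximation without tie or continuity correction, which is an assumption of this example and would be too crude for very small groups.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Adjust p-values for multiple testing (step-up false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


u_stat, p_value = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```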

RESULTS

Identification of the Best Classifier Amenable to Data Sets from Both FF and FFPE Tissues

The DLBCL automatic classifier (DAC) was developed for COO classification by Illumina WG-DASL profiling using FFPE tissues.13 The meta-classifier DAC utilized a balanced voting between the individual classifiers LMT, J48, RF100, and SMO, and was shown to classify GEP data from both FF and FFPE tissues with high confidence.13 In the initial analysis, we applied DAC to our qRT-PCR data, but found the classification unsatisfactory in comparison with that obtained by WG-DASL.13

While exploring the data analysis, we found that the expression values of a high proportion of classifier genes in the published data sets were not normally distributed (Supplementary Table S2), and thus not suitable for Z-score transformation. In addition, we found that the models generated by logistic-based classifiers, such as Logistic, SimpleLogistic, LMT, and FT, were not always compatible with the expected function of individual classifier genes in ABC and GCB assignment. Depending on the individual classifier, a proportion of the ABC classifier genes was given a coefficient favoring GCB rather than ABC class assignment, or vice versa (Supplementary Table S3). This is most likely caused by multicollinearity owing to high correlation of expression among the classifier genes.

To circumvent the above issues, we first used the quantile score for data transformation, which is amenable to quantitative data regardless of their distribution. To overcome multicollinearity, we converted the expression of the 20 individual classifier genes into two variable indices, by summing the quantile scores of all ABC or GCB genes in each case, respectively. On the basis of the characterized expression pattern of the classifier genes, one would expect ABC-DLBCL to be characterized by a high ABC but low GCB index, GCB-DLBCL by a high GCB but low ABC index, and Type-III DLBCL by low ABC and GCB indices.
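The reduction from individual genes to two indices amounts to the following Python sketch. The gene names and scores are placeholders rather than the actual classifier genes, and the function name is hypothetical.

```python
def subtype_indices(quantile_score, abc_genes, gcb_genes):
    """Sum the quantile scores of the ABC genes and of the GCB genes.

    quantile_score maps gene -> quantile score for one case.
    Returns (ABC index, GCB index), the two inputs given to the classifier.
    """
    abc_index = sum(quantile_score[gene] for gene in abc_genes)
    gcb_index = sum(quantile_score[gene] for gene in gcb_genes)
    return abc_index, gcb_index


# Placeholder gene names; a case with high ABC and low GCB scores.
case = {"ABC_1": 0.9, "ABC_2": 0.8, "GCB_1": 0.1, "GCB_2": 0.2}
abc_index, gcb_index = subtype_indices(
    case, abc_genes=["ABC_1", "ABC_2"], gcb_genes=["GCB_1", "GCB_2"])
```

Because the classifier then sees only two predictors, highly correlated genes within each group can no longer receive coefficients of opposite sign.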

On the basis of the above principle, we systematically trained and validated a series of basic and meta machine-learning classifiers and identified the classifier that gave the best performance across different microarray platforms and data sets. Of the 27 classifier genes defined by Wright et al,3 only 20 were commonly present among the Lymphochip, Affymetrix, and Illumina WG-DASL platforms. A previous study showed that the FFPE data set was best classified using the 20 classifier genes with the LLMPP lymphochip data set by Wright et al for training.13 Accordingly, we calculated the ABC and GCB indices based on the same 20 classifier genes and used the Wright data set for initial classifier training (n=160 cases) and validation (n=80 cases). We first tested a total of 26 representative basic machine-learning classifiers and ranked them according to F-measure and ROC area value.13 Seven basic classifiers showed F-measure and ROC area values above 0.80 and 0.90, respectively, and six of these, namely NaiveBayes, Logistic, FT, MultilayerPerceptron, RBFnetwork, and SimpleLogistic, yielded the Type-III class at a relatively low frequency. These six classifiers were systematically combined, trained, and validated on the Wright data set.

All the meta classifiers except one showed F-measure and ROC area values similar to the top six basic classifiers (Supplementary Table S5). The performance of the 6 basic classifiers and 16 of their derived meta classifiers was ranked according to survival separation between the assigned ABC and GCB subtypes and the least number of cases assigned to Type-III. This identified SimpleLogistic as the best choice, and we then further tested this classifier on the Affymetrix GSE10846 data set based on FF tissues (including two cohorts: one treated with CHOP and the other treated with R-CHOP),17 and the Illumina WG-DASL GSE32918 data set based on FFPE tissues.13, 16 SimpleLogistic consistently gave excellent performance for all three data sets, overall better than the original class assignment as measured by overall survival separation between the ABC and GCB groups, and the least number of cases assigned to Type-III (Figure 2, Supplementary Table S6).

Figure 2

Testing of the SimpleLogistic classifier for COO classification of DLBCL using published data sets. The probability of ABC/GCB/Type-III assignment is shown at the top of the heatmap illustration for each data set. Sum_ABC: the summed ABC gene score; Sum_GCB: the summed GCB gene score. The middle panel shows a comparison of overall survival among the ABC, GCB, and Type-III groups. The table on the right displays the significance of the difference in the expression of the NF-κB target genes between the assigned ABC and GCB-DLBCL. *The Benjamini and Hochberg method was used to correct for multiple testing.

We further tested SimpleLogistic on the data set by Monti et al.18 The Monti data set lacked 4 of the 20 classifier genes, including GCET1. On the basis of the remaining 16 classifier genes, SimpleLogistic also showed a comparable ABC/GCB/Type-III assignment and survival separation between the ABC and GCB classes (Supplementary Table S6), confirming the robustness of the classifier.

Finally, to further verify the ABC/GCB class assigned by SimpleLogistic, we compared the NF-κB target gene expression between the two classes as assigned by SimpleLogistic and by the original classifier. For each of the three data sets investigated, SimpleLogistic and the original classifier showed nearly identical results for NF-κB target gene expression between the assigned ABC and GCB groups, with the expression of IRF4, CCND2, CD44, cFLIP, BCL2, and CCR7 being significantly higher in ABC than in GCB-DLBCL, but no difference in NFKBIA expression between the two groups (Figure 2). These results are very similar to the NF-κB target gene signature in DLBCL ABC/GCB subtypes, which was defined in the original studies.4, 5

DLBCL COO Classification by qRT-PCR with Fluidigm Dynamic Array

In total, 143 of the 152 cases included were successfully investigated by qRT-PCR with the Fluidigm BioMark HD system, whereas the remaining nine cases failed owing to insufficient quantity of RNA or incomplete data acquisition. We applied the SimpleLogistic classifier to this qRT-PCR data set as described above. The distribution of the ABC (28.7%), GCB (53.1%), and Type-III subtypes assigned by the qRT-PCR/SimpleLogistic classifier was nearly identical to that by the Illumina WG-DASL/DAC classifier.

Of the 143 cases successfully investigated by qRT-PCR, 120 were treated with R-CHOP and were further analysed for correlation between treatment outcome and COO subtype. As shown in Figure 3b, there was a significant difference in the overall survival between the ABC and GCB-DLBCL assigned by the qRT-PCR/SimpleLogistic classifier. As expected, the expression of NF-κB target genes IRF4, CCND2, CD44, cFLIP, and CCR7, with the exception of BCL2, was significantly higher in ABC than GCB-DLBCL (Figure 3).

Figure 3

COO classification of DLBCL using RNA samples from FFPE tissues by parallel qRT-PCR and the SimpleLogistic classifier. (a) Heatmap showing the level of expression of the 20 classifier genes used for classification; (b) Difference in overall survival between ABC, GCB, and Type-III DLBCL in 120 cases of DLBCL treated with R-CHOP; (c) Difference in NF-κB target gene expression between ABC and GCB-DLBCL. *The Benjamini and Hochberg method was used to correct for multiple testing.

DISCUSSION

COO-classification of DLBCL has two significant clinical implications. First, the classification divides DLBCL into different prognostic subgroups, with ABC-DLBCL showing worse survival than GCB-DLBCL in both CHOP and R-CHOP treatment settings. Second, ABC-DLBCL is characterized by constitutive NF-κB activation, and may be treated by inhibitors of the NF-κB pathway. For example, addition of bortezomib to chemotherapy significantly improved the treatment response and overall survival of patients with ABC-DLBCL, but not those with GCB-DLBCL.20 Currently, a prospective phase-III randomized controlled clinical trial, known as REMoDL-B, is being conducted to compare the clinical efficacy of R-CHOP and bortezomib plus R-CHOP according to ABC and GCB molecular subtype. Therefore, the survival separation between the ABC and GCB subtypes and their difference in NF-κB activities are the key parameters in the assessment of methodologies for COO-classification of DLBCL. Many factors affect the performance of COO-classification, and the critical elements include the nature of the lymphoma specimen, the experimental methods for data collection, data normalization and transformation, the classifier, and the level of probability used for subtype assignment.

Immunohistochemical studies of surrogate protein markers and several algorithms have been extensively investigated to recapitulate the COO-classification by GEP, but all suffered from poor reproducibility and low efficacy in survival separation between the ABC and GCB-DLBCL.8, 9 GEP using RNA samples from FF tissues has been shown to be highly consistent in COO-classification, and this approach has also been applied to FFPE tissues with very encouraging results. Nonetheless, the GEP approach is cumbersome in experimental setup and data analysis and, in addition, not cost-effective. For clinical application, a quantitative measurement of the expression of the classifier genes, rather than the whole genome, would be preferable. In a recent study, Care et al13 showed that a panel of 20 of the 27 classifier genes defined by Wright et al3 gave the best performance for COO-classification after testing a series of classifier genes.

To develop a tailor-made clinical assay for COO-classification of DLBCL, we established a robust protocol to measure the expression of the 20 classifier genes together with the NF-κB target genes characteristically overexpressed in ABC-DLBCL, using RNA samples from FFPE tissues by parallel qRT-PCR with the Fluidigm BioMark HD system. We demonstrated that the expression of classifier genes could be reproducibly measured using the protocol established. In comparison with microarray-based GEP, the qRT-PCR-based approach offers several notable advantages, including high sensitivity and reproducibility, ease of use, low running cost, and a short turnaround time.

Apart from high-quality data acquisition, generation of an accurate classifier is another challenge for application of COO-classification to a qRT-PCR data set from FFPE tissue, as the choice of classifier appears to depend on the method/platform of data acquisition and the nature (FF or FFPE) of the tissue specimen used. For example, the LPS algorithm successfully used on the LLMPP FF data set by Lymphochip was not suitable for the FFPE data set by Illumina WG-DASL.13 Similarly, we found that the meta classifier DAC developed on the basis of the FFPE data set by Illumina WG-DASL was not amenable to our qRT-PCR data set. The potential reasons for these incompatibilities are many, and the critical ones may include variations in data distribution, normalization, and transformation, as well as classifier selection and validation.

There are many machine-learning classifiers, and a proportion of them are based on the logistic model, which is potentially suitable for assessing the relationship between a categorical dependent variable (DLBCL subtype) and its explanatory variables (expression of classifier genes). However, their utility in COO-classification of DLBCL is restricted by the presence of multicollinearity among the expression of the classifier genes (Supplementary Table S3). To overcome this, we combined all the ABC and GCB gene expression values into two indices, respectively, thus making the data amenable to a wide range of machine-learning classifiers.

To develop a classifier that is amenable to the qRT-PCR-based data set, we have systematically trained and validated 26 basic machine-learning classifiers and their derived meta classifiers, using the LLMPP lymphochip data set by Wright et al.2, 3 As with the study by Care et al,13 the selection of the top classifier was based on the significance of survival separation between the ABC and GCB groups, and the least number of cases assigned to Type-III. This combined assessment avoided selection bias toward classifiers that gave significant survival separation between the ABC and GCB groups, but at the expense of the number of cases that were successfully assigned to these biological subtypes. The top classifier identified was then tested with additional GEP data sets from both FF (by Affymetrix platform) and FFPE lymphoma specimens (Illumina WG-DASL). Analyses of survival separation and NF-κB target gene expression between the assigned ABC and GCB groups, and of the distribution of the three molecular subtypes, consistently showed that SimpleLogistic, the top classifier identified, gave a similar or better performance than the respective original classifier. Finally, the validated top classifier was applied to our qRT-PCR data set. As the final classifier used for our qRT-PCR data set was sequentially validated using a series of published data sets derived from both FF and FFPE specimens by different microarray platforms, the above process of classifier generation has little risk of overfitting, a common issue in classifier generation.

Although SimpleLogistic and the original classifier yielded similar performance, including survival separation and differential NF-κB target gene expression between the ABC and GCB subtypes in a range of data sets, the concordance rate in class assignment between these different classifiers was 80% across all data sets investigated. These findings are very similar to those of previous studies irrespective of the nature (fresh frozen or FFPE) of the DLBCL specimens investigated.10, 13, 14, 21 In general, the samples assigned to discrepant classes by different classifiers had a low confidence score for either ABC or GCB class assignment.13 It is possible that these cases are in a ‘molecular grey zone’, and their class cannot be accurately defined, no matter which classifier is used. In support of this speculation, there is emerging evidence of overlap in somatic mutation profiles between ABC and GCB DLBCL.22, 23 Nonetheless, it remains to be investigated whether the cases not amenable to an accurate COO classification represent an intermediate subset with overlapping genetic changes.

In summary, we have developed a robust protocol for COO-classification of DLBCL using RNA samples from FFPE tissues by qRT-PCR using the Fluidigm BioMark HD system with the SimpleLogistic classifier. The ABC and GCB-DLBCL assigned show the respective characteristics in their clinical outcome and NF-κB target gene expression. The methodology may provide a robust approach for DLBCL sub-classification using routine FFPE diagnostic biopsies in a clinical setting.