Gene expression profiling-based risk prediction and profiles of immune infiltration in diffuse large B-cell lymphoma

The clinical risk stratification of diffuse large B-cell lymphoma (DLBCL) relies on the International Prognostic Index (IPI) to identify high-risk disease. Recent studies suggest that the immune microenvironment plays a role in predicting treatment response and survival in DLBCL. This study developed a risk prediction model and evaluated its biological implications in association with the estimated profiles of immune infiltration. Gene expression profiling was performed for 718 patients with DLBCL, using RNA sequencing data and clinical covariates obtained from Reddy et al. (2017). Unsupervised and supervised machine learning methods were used to identify survival-associated gene signatures, from which a multivariable model of survival was constructed. Tumor-infiltrating immune cell compositions were enumerated using CIBERSORT deconvolution analysis. A four gene-signature-based score was developed that separated patients into high- and low-risk groups. Combining the gene-expression-based score with the IPI improved discrimination on the validation and complete sets. The gene signatures were successfully validated with the deconvolution output. When the deconvolution findings were correlated with the gene signatures and risk score, CD8+ T-cells and naïve CD4+ T-cells were associated with favorable prognosis. By analyzing the gene-expression data with a systematic approach, a risk prediction model that outperforms the existing risk assessment methods was developed and validated.


Characteristics of diffuse large B-cell lymphoma patients
summarizes the clinical characteristics of 718 patients with complete overall survival information.

Gene expression profiling
Hierarchical clustering was applied to the gene expression data using the unweighted pair-group method (average linkage), as implemented in the R hclust function. Pearson correlation was used as the distance measure for clustering the arrays. We examined the cluster sizes to determine a criterion for which clusters to include in the statistical analyses. The distribution of the number of genes per cluster under the pre-defined cutoff clustering method is illustrated in Figure S1.

Figure S4: Gene expression profiling of good-prognosis genes and the hierarchical clustering with gene clusters identified by the Dynamic Cut Tree method.
Panel title: Training set (n=502), genes associated with favorable prognosis.

Figure S5: Gene expression profiling of poor-prognosis genes and the hierarchical clustering with gene clusters identified by the Dynamic Cut Tree method.
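The clustering step described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not the R hclust function used in the study, and it assumes a distance of 1 minus the Pearson correlation with average linkage. The function names (`pearson`, `upgma_clusters`), the toy gene names, and the expression values are all hypothetical, and the tree is cut at a fixed number of clusters rather than with the Dynamic Cut Tree method.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def upgma_clusters(profiles, n_clusters):
    """Average-linkage (UPGMA) agglomerative clustering on 1 - Pearson r.

    profiles: dict mapping a gene name to its expression vector.
    Merging stops once n_clusters clusters remain (a fixed cut,
    used here as a simple stand-in for a dynamic tree-cutting rule).
    """
    names = list(profiles)
    # Pairwise correlation distances between all genes
    dist = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]

    def average_distance(c1, c2):
        # UPGMA: mean of all pairwise distances between the two clusters
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_distance(clusters[ij[0]],
                                                   clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Toy expression matrix (hypothetical values): two co-expressed gene pairs
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [9.8, 8.1, 6.0, 4.2, 2.0],
}
clusters = upgma_clusters(toy, n_clusters=2)
cluster_sizes = sorted(len(c) for c in clusters)
```

The `cluster_sizes` list is the kind of summary that could be inspected when deciding which clusters are large enough to carry into the statistical analyses. If SciPy is available, `scipy.cluster.hierarchy.linkage(X, method="average", metric="correlation")` computes the same type of linkage far more efficiently.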

Figures S6 and S7 show the correlations between the gene signatures identified by the Dynamic Cut Tree method and the cell type abundances. Table S10 presents the most significant correlations between the immune cell subtypes and the gene signatures identified by the Dynamic Cut Tree method. Figure S8 shows the correlations between the risk score and the immune cell fractions in the tumor microenvironment (TME) and tumor, with the corresponding p-values. Note that a higher risk score corresponds to worse survival. In the absence of B-cells, naive CD4+, resting memory CD4+, regulatory and follicular helper T-cells and M0 macrophages were negatively associated with the risk score, implying that increased infiltration of these cell subtypes is associated with improved survival (all p < 0.05). In the presence of B-cells, resting memory CD4+, follicular helper and regulatory T-cells and M0 macrophages were again significantly associated with favorable prognosis (all p < 0.005). Lastly, memory B-cells were negatively correlated with the risk score (p = 0.04), whereas naive B-cells were positively correlated (p = 0.04), implying that an increase in naive B-cells is associated with unfavorable prognosis.
There was no association between naive CD4+ T-cells and the risk score when B-

Figure S8: The association between the survival-predictor score and the immune cell fractions.
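The kind of association summarized in Figure S8 can be sketched as a correlation between a per-patient risk score and an estimated immune cell fraction, together with a p-value. The source does not state which correlation coefficient or test was used, so this sketch assumes a Pearson correlation with a two-sided permutation p-value. The function names and all data values are hypothetical and are not taken from the study.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y.

    y is shuffled n_perm times; the p-value is the share of shuffles whose
    absolute correlation is at least as large as the observed one
    (with the usual +1 correction so the estimate is never exactly zero).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-patient values: risk score and an estimated cell fraction
risk_score = [0.2, 0.5, 0.9, 1.4, 1.8, 2.3, 2.9, 3.1]
cell_fraction = [0.30, 0.27, 0.25, 0.20, 0.18, 0.12, 0.10, 0.07]

r = pearson(risk_score, cell_fraction)
p = permutation_p_value(risk_score, cell_fraction)
```

A negative `r` here would be read the same way as in the text: because a higher risk score corresponds to worse survival, a cell type whose fraction falls as the score rises is associated with favorable prognosis. A rank-based coefficient such as Spearman's could be substituted if the relationship is not expected to be linear.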