The EGR3 regulome of infant KMT2A-r acute lymphoblastic leukemia identifies differential expression of B-lineage genes predictive for outcome

KMT2A-rearranged acute lymphoblastic infant leukemia (KMT2A-r iALL) is associated with outsize risk of relapse and relapse mortality. We previously reported strong upregulation of the immediate early gene EGR3 in KMT2A::AFF1 iALL at relapse; now we provide analyses of the EGR3 regulome, which we assessed through binding and expression target analysis of an EGR3-overexpressing t(4;11) cell culture model. Our data identify EGR3 as a regulator of early B-lineage commitment. Principal component analysis of 50 KMT2A-r iALL patients at diagnosis and 18 at relapse provided strictly dichotomous separation of patients based on the expression of four B-lineage genes. Absence of B-lineage gene expression translates to more than two-fold poorer long-term event-free survival. In conclusion, our study presents four B-lineage genes with prognostic significance, suitable for gene expression-based risk stratification of KMT2A-r iALL patients.

Our group recently reported the immediate early gene Early Growth Response 3 (EGR3) as relapse-associated, with 100-fold increased EGR3 gene expression levels at the time of relapse compared to primary diagnosis [5]. This prompted us to investigate the role of EGR3 in the context of KMT2A::AFF1 iALL in more detail.
Here we explore the EGR3 regulome of KMT2A::AFF1 proB-ALL in detail through integration of data derived from massive analysis of cDNA ends-sequencing (MACE-Seq) and chromatin immunoprecipitation DNA-sequencing (ChIP-Seq) of an EGR3-overexpression SEM cell model. Our study identifies EGR3 as a regulator of early B-lineage specification and commitment. Additionally, gene expression and principal component analysis of 50 KMT2A-r iALL patients at diagnosis and 18 at relapse provides strictly bimodal clustering of patients based on the expression of the identified B-lineage genes. Absence of B-lineage gene expression translates to dismal outcome with more than two-fold poorer long-term event-free survival.
Flow cytometry of SEM::EGR3 and SEM::mock Cells were blocked using Human BD Fc Block TM (BD) and stained with FACS antibodies (BD) according to the manufacturer's protocol. Cells were analyzed using a BD FACSVerse TM . Cells of interest were gated out of all cells using FSC-H and FSC-A. Subsequently, single cells were gated and assessed for CD19, IgM and CD79A protein surface expressions with gates set using fluorescence minus one (FMO) controls. Flow cytometric analysis was performed with four biological replicates of SEM::mock and SEM::EGR3 on a BD FACSAria™ III Cell Sorter. FACS plots were created using FlowJo™ Software. Antibodies used for flow cytometry are described in Table 4.

Massive analysis of cDNA ends-sequencing (MACE-Seq)
MACE-Seq is 3′ single end mRNA sequencing enabling high resolution transcription profiling of RNA extracted from three biological replicates of SEM::mock and SEM::EGR3 48 h after induction with Doxycycline 1 µg/mL. Screen tape analysis of RNA was performed using the bioanalyzer Agilent 2200 TapeStation assessing the RNA integrity number (RIN). RIN values above 8.5 were considered as tolerable. MACE-Seq of extracted RNAs was performed by GenXPro GmbH. MACE-Seq data are available at GEO with accession code GSE225710.

Chromatin immunoprecipitation DNA-sequencing (ChIP-Seq)
ChIP-Seq data of SEM::EGR3 immunoprecipitated with an α-FLAG antibody in comparison to input were already obtained during our former study (GSE205652). ChIP-Seq data are available at GEO with accession code GSE205652.

Statistics and data analysis
Appropriate statistical tests were performed within DeSeq2, GSEA, or BETA plus algorithms in context of differential expression and gene set enrichment analyses of MACE-Seq data or binding and expression target analysis of ChIP-and MACE-Seq data, respectively. The level of significance is indicated by p values (DeSeq2), false discovery rates (GSEA) or rank products (BETA plus). Used expression and binding data meet the demands of respective algorithms and associated statistical tests in terms of normality and equal-variance assumptions. The phenotypic populations assessed by flow cytometry were compared using two-tailed t tests performed using GraphPad Prism 9.5.0 software. Principal component analyses (PCA) were conducted as singular value decompositions using ClustVis [22] with applied unit variance scaling for rows.
Sample sizes were chosen depending on experimental context. qRT-PCR was performed as technical triplicates of 68 patient samples in total. MACE-Seq was performed using three biological triplicates of SEM::mock and SEM::EGR3 RNA. Flow cytometric analysis was performed using four biological replicates of SEM::mock and SEM::EGR3.
Gene set enrichment analysis (GSEA) was conducted using GSEA 4.2.3 according to the developer's protocol [23].
Integration of MACE-Seq and ChIP-Seq data as well as transcription factor motif scanning was performed using the online tool BETA plus according to the developer's protocol [24].
The 3890 up-and 3107 downregulated direct EGR3 target genes identified with BETA were uploaded to the PANTHER 17.0 classification system [25]. The functional classification considering protein class, biological function/gene ontology (GO) term, and chromosomal location was plotted using Microsoft Excel as net plots.
Survival analysis was performed using GraphPad Prism 9.4.1. Event-free survival (EFS) was defined as the time from diagnosis to first failure including induction failure, relapse, death, or second malignant neoplasm according to the Interfant-99 protocol [2]. Time was censored at last followup if no events were observed. Curves were computed with the Kaplan-Meier estimator, standard errors (SE) with the Greenwood formula, and curves were compared with the log-rank test.

RESULTS
Overexpression of EGR3 in the KMT2A::AFF1 proB cell line SEM upregulates B-lineage specification and commitment genes We recently identified EGR3 as a relapse-associated factor in KMT2A-r iALL, whose gene expression is~100-fold increased in patients at relapse (rel) compared to primary diagnosis (dx) [5]. Surprisingly, re-analysis of the Leukemia MILE study data involving 2096 patient samples using the online database BloodSpot revealed B cell malignancies to be characterized by a decreased EGR3 gene expression in comparison to healthy bone marrow (BM) (Supplementary Fig. 1A) [26,27]. Regarding healthy hematopoietic cells, the BloodSpot DMAP dataset demonstrates low EGR3 expression in proB cells which strongly elevates with differentiation to naïve B cells and their progeny ( Supplementary  Fig. 1B) [28].
Gene set enrichment analysis (GSEA) was used to functionally characterize the EGR3 transcriptome considering 13,938 gene sets, of which 9910 were significantly enriched in the phenotype EGR3 at a nominal p value below 0.01. 'Signaling by the B cell receptor' (Reactome, R-HSA-983705) was identified as the highest ranked gene set (NES = 2.96, FDR = 0.050) ( Table 5 [29][30][31]. The top scored upregulated genes in the three sets were IGHM, CD79A, BLK, PTPN6, CD22, and CD19 ( Fig. 1D-F). Importantly, also the pre-BCR surrogate light chain gene IGLL1 was found to be We assessed the surface expression of CD19 and CD79A in SEM::EGR3 and SEM::mock using flow cytometry, as the genes encoding these surface receptors were highly ranked and part of the core enrichment of the Reactome BCR signaling gene set ( Fig. 2A). Remarkably, although the SEM cell line is per se CD19 + , EGR3 overexpression resulted in an approximately ten-fold increase of the CD19 median fluorescence intensity (MFI) of the single cell population (p < 0.0001) (Fig. 2B, C). Furthermore, we observed a significant relative expansion of the CD19 hi CD79A + population of SEM::EGR3 (p = 0.0006) (Fig. 2D). This expansion of phenotypic B cells (CD19 hi CD79A + ) was characterized by a significant increase of the CD79A MFI compared to SEM::mock (p = 0.0008) (Fig. 2E).
These results demonstrate that upregulation of CD19 and CD79A by EGR3 led to an increase of the receptors' surface expressions. EGR3 overexpression induced B-lineage specification indicated by relative expansion of phenotypic B cells, thereby enabling functional BCR signaling as indicated by GSEA.

EGR3 and downstream intermediate factors transactivate B-lineage specification and commitment genes
To explore the EGR3 regulome in detail, we performed binding and expression target analysis using the open source application BETA plus [24]. The BETA software algorithm integrates transcription factor ChIP-Seq data with differential gene expression data to deduce direct target genes, and is a standard processing pipeline for transcription factor binding studies [24,32].
BETA enabled integration of MACE-Seq transcriptome data with EGR3 chromatin immunoprecipitation DNA-sequencing (ChIP-Seq) data of SEM::EGR3 and SEM::mock. ChIP-Seq data were already obtained during our former study (GSE205652).
The BETA algorithm ranked genes according to the regulatory potential score and assigned to the cumulative percentage of genes. Plotting this assignment as a graph visualized that EGR3 owns a direct activating and repressive function, and thus, acts as a direct transactivator and -repressor in KMT2A::AFF1 proB-ALL (Fig. 3A).
In total, 3890 directly upregulated and 3107 downregulated EGR3 target genes were identified, referred to as the EGR3 regulome. We used the PANTHER classification system [25] to functionally characterize the EGR3 regulome. This analysis revealed that EGR3 transactivates and -represses the same classes of genes ( Fig. 3B), involved in the same biological functions (Fig. 3C) and located on the same chromosomes (Fig. 3D). Especially the class 'gene-specific transcriptional regulator' comprised more directly activated (n = 361) than repressed (n = 212) genes. Concordant with this, a higher number of transactivated (n = 1229) than -repressed (n = 840) genes were related to the Gene Ontology (GO) term 'biological regulation' (Fig. 3C).
We compared the BCR pathway-related genes of the EGR3 transcriptome with those of the regulome (Fig. 3E, F) and identified 64 of 114 genes (56.1%) as direct EGR3 targets, including CD79A, BLK, PTPN6, CD19, and CD22 (Table 6). Thus, the remaining 50 genes including IGHM were indirect EGR3 targets and transactivated by unknown intermediate transcription factors. To uncover these transcription factors, a motif analysis of all differentially expressed genes (MACE-Seq data) and all direct EGR3 targets (ChIP-Seq data) was performed using BETA plus. As expected, the highest scoring and significant consensus sequence found in all up-and downregulated target genes matched that one of EGR3 and other transcription factors binding to the same or a highly similar motif including EGR1, EGR2, EGR4, KLF16, SP3, and SP8 (Fig. 4A). Lower scoring binding motifs of up-and downregulated genes were assigned to ZEB1, ZNF354C, and SOX10, indicating potential alternate regulation of EGR3 target genes by these transcription factors. Scanning for enriched motifs in differentially expressed but not direct EGR3-regulated genes identified a set of motif-assigned transcription factors. Of these were GATA3, FOXO6, and E2F1 strongly upregulated direct EGR3 targets, and PAX5 an indirect EGR3 target with strong differential expression (log2fc = 9.47, p = 4.95 × 10 −41 ) (Fig. 4B). Especially PAX5 has been described as a mediator of B cell identity and B-lineage commitment [33][34][35]. Furthermore, analysis of the Leukemia MILE study data set using BloodSpot revealed the strongest gene correlations of PAX5 to be CD19 and CD79A.
Concluding this, the binding and expression target analysis followed by motif scanning uncovered GATA3, FOXO6, E2F1, and PAX5 as intermediate factors in the EGR3 transcriptomic network, regulating B-lineage specification and commitment gene expression.
IGHM, CD79A, BLK and PTPN6 expressions correlate strongly with each other and partly with EGR3 among 50 infant KMT2A::AFF1 proB-ALL patients We aimed to investigate the regulation of B-lineage specification and commitment gene expression by EGR3 in patient material and therefore assessed the transcription levels of IGHM, CD79A, BLK, and PTPN6 in 50 infant KMT2A::AFF1 proB-ALL patients at diagnosis (Table 1). This patient cohort was already investigated in our recent study [5], from which we obtained the EGR3 expressions. cDNA of peripheral blood was used for quantitative real-time PCR (qRT-PCR) based gene expression measurement and subsequent Pearson correlation testing of ΔC T values was performed. The resulting Pearson correlation matrix demonstrated a very strong and highly significant correlation between the IGHM, CD79A, BLK, and PTPN6 gene expressions with Pearson r values above 0.70, suggesting their collective belonging to a distinct gene expression program (Fig. 5A, B).     The transcription levels of EGR3 correlated less strongly with IGHM and BLK (Pearson r = 0.35 and 0.34, respectively) ( Fig. 5C-G), and not significantly with CD79A and PTPN6 (Fig. 5C, H-K). Apart from that, higher patient age correlated with higher EGR3 expression (Pearson r = −0.31), whereas this was not the case for IGHM, CD79A, BLK, and PTPN6 (Fig. 5L).
These results confirm the B-lineage phenotype associated with engineered EGR3-overexpression suggesting direct causality between EGR3 expression and B-lineage specification. Furthermore, other factors are likely to influence the EGR3 target gene expression as well. The latter conclusion is concordant with the motif scan revealing that EGR1, EGR2, EGR4, KLF16, SP3, SP8, ZEB1, ZNF354C, and SOX10 bind the same genes as EGR3, suggesting these factors as possible co-regulators.
Low IGHM, CD79A, BLK, and PTPN6 expressions indicate a patient subgroup with inferior EFS The strong correlation between the IGHM, CD79A, BLK, and PTPN6 gene expressions prompted us to analyze their distribution within the patient cohort. For that purpose, a principal component analysis (PCA) of the expression levels was conducted using the open-source tool ClustVis [22], uncovering a distinct bimodal distribution of patient clusters (Fig. 6A). The corresponding heatmap visualized that patients showed either a high (BCR hi , n = 37) or low (BCR lo , n = 13) expression of the analyzed B-lineage-related genes (Fig. 6B). This bimodal distribution points to differences in the B-lineage specification process, with BCR lo patients having a less committed B cell identity, indicated by the low expression of B-lineagerepresenting genes. Accordingly, the high expression of IGHM, CD79A, BLK, and PTPN6 in the BCR hi group indicates a more mature proB-cell phenotype. The strict bimodal clustering suggests that the development from BCR lo to BCR hi is rather a stepwise maturation process than a fluent transition, and presumably reflects the developmental stages of early and late proB cells.
We compared the event-free survival (EFS) for all patients with available outcome data (n = 43) considering their assignment to the BCR lo (n = 9) or BCR hi (n = 34) group. Four-year-EFS of BCR lo patients was significantly poorer, reaching only 22.2 ± 13.9% compared to 49.8 ± 9.3% of the BCR hi group (p = 0.0205, Fig. 6C). Notably, all events within the BCR lo group occurred within the first 12 months from diagnosis. To test for age and HOXA status as possible confounders, we performed the same PCA analysis with patients assigned to age groups (0-6 months vs. 6-12 months) (Fig. 6D, E) and to HOXA9 status groups (HOXA9 high vs. low) (Fig. 6F, G). We observed that both factors did not affect dichotomous separation of patients as young and old as well as HOXA9 high and low groups overlap almost completely.

BCR lo vs. BCR hi clustering of patients is sustained at relapse
Considering the elevated level of EGR3 expression at relapse in infant KMT2A-r proB-ALL, we aimed to examine the IGHM, CD79A, BLK, and PTPN6 gene expressions at the time of relapse. To do so, we performed the same gene expression analysis with the relapse (rel) cohort of our former study. This cohort comprised 18 infant KMT2A-r proB-ALL patients (14 KMT2A::AFF1 cases) at the time of relapse, composing an independent cohort, not matched to the diagnosis (dx) cohort (Table 2). . Plots display the respective single cell populations and were normalized to mode. C CD19 median fluorescence intensities (MFI) of SEM::mock and SEM::EGR3 singlets. Significance was tested using a two-tailed t test (p < 0.0001). Error bars indicate standard deviation. D Mean percentages of the CD19 hi CD79A + population relative to singlets. Error bars indicate standard deviation and significance was tested using a two-tailed t test (p = 0.0006). E CD79A median fluorescence intensities (MFI) of the CD19 hi CD79A + populations of SEM::mock and SEM::EGR3. Significance was tested using a two-tailed t test (p = 0.0008). Error bars indicate standard deviation. Fig. 3 Integration of MACE-Seq and ChIP-Seq data revealed the EGR3 regulome of infant KMT2A::AFF1 ALL. A Assignment of gene ranks to the cumulative fraction of genes using BETA identifies EGR3 as a direct activator and repressor of target genes. Analysis of the EGR3 regulome using the PANTHER database with up-and downregulated target genes subclassified considering their corresponding protein class (B), biological function/GO term (C) and chromosomal location (D). Numbers indicate the count of up-(blue) and downregulated (red) genes. E Volcano plot visualizing genes of the Reactome BCR geneset according to their differential expression indicated as the log2-fold change (log2fc) and p value (MACE-Seq data). F Volcano plot visualizing genes of the Reactome BCR geneset according to their differential expression (log2fc) and direct EGR3 regulation (rank product) (MACE-Seq and BETA data). Table 6. Genes of the set 'signaling by the B cell receptor' (Reactome, R-HSA-983705), their type of regulation by EGR3 and differential expression.  As for the dx cohort, a PCA revealed co-segregation of patients into a BCR lo and a BCR hi group (Fig. 7A). Contrasting the dx cohort, clustering of relapsed patients was less cohesive, with patients REZ2, REZ14, and REZ16 exhibiting low IGHM, CD79A, and BLK expressions, but increased PTPN6 gene expression levels reaching those of the BCR hi group (Fig. 7B).
Pearson correlation testing of the ΔC T values showed strong correlations between the IGHM, CD79A, BLK, and PTPN6 gene expressions, almost resembling the Pearson r and corresponding p values of the dx cohort ( Fig. 7C-H). Contrasting the dx cohort, the rel cohort did not indicate a significant correlation for any of the aforementioned gene expressions with that of EGR3 (Fig. 7I). The transcription level of EGR3 was approximately 100-fold elevated at relapse compared to diagnosis, independent of BCR lo or BCR hi classification (Fig. 7K). Unlike EGR3, the expressions of IGHM, CD79A, BLK, and PTPN6 were not generally elevated at relapse, rather were the expression level differences between the BCR lo and BCR hi groups almost the same comparing the diagnosis and relapse cohorts ( Fig. 7M-P). These results indicate that BCR lo vs. BCR hi clustering of patients was sustained at relapse and elevated EGR3 levels were not concomitant with IGHM, CD79A, BLK and PTPN6 gene expressions. This suggests a minor role for EGR3 in B-lineage-specific gene regulation at the time of relapse and demonstrates preservation of both developmental proB cell stages during relapse formation, although to a lesser extent.

DISCUSSION
This study identified EGR3 as a regulator of B-lineage specification and commitment processes in the context of infant KMT2A::AFF1 acute lymphoblastic leukemia. This characterization is in line with the fact that EGR3 is generally downregulated in B cell malignancies according to the Leukemia MILE study [27] and upregulated in naïve and mature B cells referring to the Bloodspot DMAP dataset [26] (Supplementary Fig. 1B). Furthermore, the murine homologues Egr3 and Egr2 are required for B cell proliferation upon antigen receptor stimulation [20,36]. Accordingly, our study and these data imply the limitation of EGR3 expression to be concomitant with a differentiation block of the B-lineage in hematologic malignancies.
That EGR3 is involved in B-lineage specification and commitment is further strengthened by the identification of PAX5 as an intermediate factor of the EGR3 regulome. PAX5 is a known activator of B cell identity regulating the gene expression of CD19, CD79B, and EBF1 [33,37]. In addition, PAX5 is a regulator of B cell development [38] and B-lineage commitment [35,39,40]. In this regard, an important difference between EGR3 and PAX5 is that the latter not only binds DNA response elements but also regulates chromatin accessibility at target promoters [41]. This points to a collaborative effect of EGR3 and PAX5, regulating in part the same genes by different means.
Finally, CD19 is commonly accepted as a hallmark B cell commitment marker [42,43], and we demonstrate direct transcriptional upregulation of CD19 by EGR3 resulting in approximately ten-fold increased surface expression of the corresponding protein (Fig. 2). Importantly, CD19-directed therapies including CAR T cells and Blinatumomab were reported to lead to lineage switch of infant KMT2A::AFF1 ALL [44][45][46]. This process has been shown to be accompanied by a loss of the Blineage-specific transcription factors EBF1 and PAX5 [47]. As our data suggest a collaborative effect between PAX5 and EGR3, the role of EGR3 during lineage switch of KMT2A-r B ALL needs further investigation.
In contrast to primary diagnosis, elevated EGR3 expression at relapse was not accompanied by increased expression of B lineage-associated direct targets. Consequently, EGR3 expression at relapse does not indicate a more committed or mature proB cell identity, but could reflect external stress stimuli including response to chemotherapy or inflammatory mediators in the leukemic microenvironment as EGR3 is involved in rapid and transient stress responses and inflammatory signaling [48,49].
Fetal pre-pro B cells are the earliest B-lineage-committed progenitors giving rise to proB cells [50]. In this context, both progenitor subtypes can be distinguished regarding their expression of B cell-associated genes including CD79A, CD19, PAX5, MME, EBF1, DNTT among many others [50]. As we worked with RNA extracted from peripheral blood, our gene expression analysis considered mainly the blast population. Therefore, it is very likely that clustering of patients into the BCR lo and BCR hi group at diagnosis and relapse represents an early vs. late proB state of the blast population, with BCR lo representing pre-proB cells and BCR hi proB cells. This interpretation is in line with the understanding of B-lineage commitment as a stepwise process, orchestrated by transcription factor-mediated gene regulatory networks [51,52]. Furthermore, the inferior survival of the BCR lo group could be explained by decreased maturity of blasts within the proB cell state, likely to go along with elevated lineage plasticity, a hallmark of KMT2A-r leukemia and  Fig. 4 Motif scanning of all differentially expressed genes. A Motif scanning result of all direct EGR3 target genes. T score, p value, transcription factor and sequence logo were identified using BETA plus. Upper T score and p value apply to upregulated genes, lower T score and p value apply to downregulated genes. B Motif scanning result of differentially expressed but not directly EGR3-regulated genes. T score, p value, transcription factor and sequence logo were identified by BETA plus. Upper T score and p value apply to upregulated genes, lower T score and p value apply to downregulated genes.
implicated in therapy resistance [53,54]. On the other hand, the undifferentiated phenotype of the BCR lo group could indicate a more aberrant oncogenic signaling program interfering with B-lineage gene expression networks. Nevertheless, the identification of four B-lineage genes whose expression reflects outcome enables early gene expression-based risk-stratification of patients. That we identified both developmental proB stages at diagnosis and relapse with similar gene expression differences indicates preservation of the blast cell identity from diagnosis through the minimal residual disease phase until relapse formation. Accordingly, it could well be that this cell identity is determined by the cell of origin, which is suggested to belong to the fetal liverderived hematopoietic progenitor cell compartment [55][56][57]. However, further studies are needed to uncover mechanisms mediating the developmental arrest of blasts either in the pre-proB or proB stage.
In summary, analysis of the EGR3 regulome of infant KMT2A-r iALL identified EGR3 as a regulator of B-lineage commitment. Besides, our study presents four B-lineage genes with prognostic  D PCA of the diagnosis cohort among the IGHM, CD79A, BLK, and PTPN6 expressions with patients assigned to the age groups 0-6 months and 6-12 months. E Heatmap of the PCA visualizing bimodal clustering of patients is not affected by patient age at diagnosis. F PCA of the diagnosis cohort among the IGHM, CD79A, BLK, and PTPN6 expressions with patients assigned to HOXA status groups HOXA9 high (hi) and low (lo). G Heatmap of the PCA visualizing bimodal clustering of patients is not affected by HOXA status. significance, suitable for gene expression-based risk stratification of KMT2A-r iALL patients.