DNA methylation and copy number variation profiling of T-cell lymphoblastic leukemia and lymphoma

Despite overlapping immunophenotypic and morphological features, T-cell lymphoblastic leukemia (T-ALL) and lymphoma (T-LBL) have distinct clinical manifestations and may represent separate diseases. We investigated and compared the epigenetic and genetic landscape of adult and pediatric T-ALL (n = 77) and T-LBL (n = 15) patient samples by high-resolution genome-wide DNA methylation and copy number variation (CNV) BeadChip arrays. DNA methylation profiling identified the presence of CpG island methylator phenotype (CIMP) subgroups within both pediatric and adult T-LBL and T-ALL. An epigenetic signature of 128 differentially methylated CpG sites was identified that clustered T-LBL and T-ALL separately. The most significant differentially methylated gene loci included the SGCE/PEG10 shared promoter region, previously implicated in lymphoid malignancies. CNV analysis confirmed overlapping recurrent aberrations between T-ALL and T-LBL, including 9p21.3 (CDKN2A/CDKN2B) deletions. A significantly higher frequency of chromosome 13q14.2 deletions was identified in T-LBL samples (36% in T-LBL vs. 0% in T-ALL). This deletion, encompassing the RB1, MIR15A and MIR16-1 gene loci, has been reported as a recurrent deletion in B-cell malignancies. Our study reveals epigenetic and genetic markers that can distinguish between T-LBL and T-ALL and that deepen the understanding of the biology underlying the diverse disease localization.

Introduction
T-cell acute lymphoblastic leukemia (T-ALL) and T-cell lymphoblastic lymphoma (T-LBL) are precursor lymphoid neoplasms, characterized by the uncontrolled proliferation of progenitor T-cells. A lymphoid neoplasm in which malignant blast cells infiltrate more than 20-25% of the bone marrow is classified as T-ALL 1. However, if the bone marrow infiltration is less than 25% and the primary disease localizes in the mediastinum, lymph nodes or the extramedullary tissues, it is classified as T-LBL 1. Despite differences in clinical manifestation, T-ALL and T-LBL have an overlapping immunophenotype and recurrent genetic aberrations, such as NOTCH1 activating mutations, CDKN2A/B deletions (cell cycle defects) and loss-of-heterozygosity in chromosome 6q 2,3. Previous studies 4,5 have identified gene expression differences between pediatric T-ALL and T-LBL, but no genetic or epigenetic markers have yet been defined to distinguish between the two malignancies, which are currently differentiated solely based on bone marrow infiltration. DNA methylation classification has been implicated as a prognostic and diagnostic marker in various cancers including hematological malignancies 6-8. We have identified a CpG island methylator phenotype (CIMP) panel, consisting of 1293 specific CpG sites, which classified pediatric T-ALL patients into clinically relevant subgroups 6,7. The CIMP− (low methylation) T-ALL patient subgroup had a significantly worse prognosis compared to the CIMP+ (high methylation) subgroup 6,7,9.
The aim of this study was to further analyze DNA methylation-based heterogeneity in T-cell lymphoblastic malignancies, with focus on investigating if the previously reported CIMP subgroups identified in pediatric T-ALL were also present in T-LBL and adult T-ALL patients. Furthermore, the methylomic and genomic landscape of T-ALL and T-LBL were compared using high-resolution genome-wide DNA methylation and copy number variation (CNV) detection arrays to investigate and identify molecular markers that could distinguish between T-ALL and T-LBL. Such epigenetic and genetic markers can give an insight into the biological mechanisms underlying the divergent clinical manifestation of the two neoplasms, as well as reveal novel therapeutic targets and strategies.

Patient samples
Bone marrow and fresh frozen lymph node tissue samples from diagnostic (n = 76) and late relapsed (n = 1) T-ALL and diagnostic (n = 15) T-LBL patients were retrieved along with complete remission bone marrow samples (n = 4). The adult (age > 18 years, n = 7) and pediatric (age ≤ 18 years, n = 8) T-LBL patients and the adult T-ALL patients (age > 18 years, n = 12) were diagnosed at the University Hospital of Umeå, Sweden, between 1998 and 2012. The study included all available T-ALL and T-LBL samples diagnosed during the specified time period and no further selection was done. The diagnosis was based on morphologic, cytogenetic and immunophenotypic analysis, according to the WHO classification of lymphoid neoplasms 10. Patients were classified as T-ALL or T-LBL based on blast cell infiltration in the bone marrow, according to previously described guidelines 1. The 65 Nordic pediatric (age < 18 years) T-ALL patient samples have been previously described and analyzed in our lab by HumanMethylation450K arrays (data deposited in the NCBI Gene Expression Omnibus (GEO) repository, accession no. GSE69954) 6.
The study was approved by the Regional and/or National Ethics Committees and the patients and/or their guardians provided informed consent in accordance with the Declaration of Helsinki.

DNA Extraction and bisulfite conversion
DNA extracted from freshly frozen lymph node and bone marrow tissue samples was sodium bisulfite converted using the Zymo EZ DNA methylation kit (Zymo Research, CA, USA) according to the manufacturer's protocol.

DNA methylation array analysis, CIMP classification, and epigenetic and mitotic age prediction
Illumina's HumanMethylation450K BeadChip arrays (Illumina Inc., San Diego, CA) were used for genome-wide methylation profiling of bisulfite-converted DNA. These arrays interrogate 485,577 CpG sites across the genome, and the average beta (avg. β) methylation level of each CpG site ranges from 0 (unmethylated) to 1 (fully methylated).
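As a minimal illustration of the 0-to-1 beta scale, the sketch below computes a beta value from methylated and unmethylated probe intensities. The offset of 100 follows the commonly used Illumina convention. The function name and the example intensities are hypothetical and are not taken from this study's pipeline, which used GenomeStudio.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Return the methylation beta value for one CpG site.

    Negative intensities (possible after background subtraction) are
    clamped to zero. The offset stabilises the ratio at low intensity.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


# Hypothetical intensities, for illustration only.
examples = {
    "mostly_unmethylated": beta_value(200.0, 9700.0),
    "half_methylated": beta_value(4950.0, 4950.0),
    "mostly_methylated": beta_value(9700.0, 200.0),
}

for label, beta in examples.items():
    print(f"{label}: beta = {beta:.3f}")
```

Because of the offset, beta never quite reaches 1 even for a fully methylated site, which is one reason array beta values cluster slightly inside the theoretical limits.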
The raw methylation data were extracted using the GenomeStudio software (Illumina), and the data were preprocessed, normalized and filtered, as described previously 6 (Supplementary Fig. S1). The methylation data were corrected for the bias introduced by different bead types in the methylation array (BMIQ normalization) 13, and batch-effect corrected using the ChAMP Bioconductor package version 2.8.9 14 with default parameters.
DNA methylation-based models were used to predict epigenetic and mitotic age 15,16.

Differential methylation analysis
Differential methylation analysis of promoter-associated CpGs, i.e., CpG sites located up to 1500 base pairs upstream of genes (n = 146,035 CpGs), was performed between T-ALL and T-LBL using ChAMP 14 (Supplementary Fig. S1). Batch-effect corrected methylation data were analyzed using default parameters, and significant differentially methylated CpGs (DM-CpGs) with adjusted p-value < 0.05 and absolute delta beta ≥ 0.3 were retained (Supplementary Fig. S1). Functional analysis was performed using the functional annotation clustering tool in the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 17. Clusters including more than 10 genes and with an enrichment score > 1.0 were selected as significant (Supplementary Table S1).
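The two retention criteria above (adjusted p-value < 0.05 and absolute delta beta ≥ 0.3) can be sketched as a simple filter. This is an illustration only, not the ChAMP implementation: it assumes Benjamini-Hochberg adjustment, takes raw p-values as given rather than fitting a model, and uses hypothetical probe names and values.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_dm_cpgs(probes: List[str],
                   mean_beta_t_all: List[float],
                   mean_beta_t_lbl: List[float],
                   p_values: List[float],
                   alpha: float = 0.05,
                   min_abs_delta: float = 0.3) -> List[str]:
    """Keep probes passing both the adjusted p-value and delta beta cut-offs."""
    adjusted = benjamini_hochberg(p_values)
    selected = []
    for probe, b_all, b_lbl, adj_p in zip(probes, mean_beta_t_all,
                                          mean_beta_t_lbl, adjusted):
        delta_beta = b_lbl - b_all
        if adj_p < alpha and abs(delta_beta) >= min_abs_delta:
            selected.append(probe)
    return selected


# Hypothetical group means and raw p-values for four probes.
probes = ["probe_1", "probe_2", "probe_3", "probe_4"]
selected = select_dm_cpgs(
    probes,
    mean_beta_t_all=[0.80, 0.50, 0.10, 0.45],
    mean_beta_t_lbl=[0.30, 0.45, 0.70, 0.05],
    p_values=[0.001, 0.0005, 0.002, 0.2],
)
print(selected)
```

In this toy input, probe_2 is significant but changes by only 0.05, and probe_4 changes by 0.40 but is not significant, so neither is retained. Requiring both criteria favours sites with large and reproducible differences.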

CNV analysis by DNA methylation arrays
The raw signal intensity data from the HumanMethylation450K arrays were imported to R by the minfi package 18 and CNV analysis was performed by the conumee package 19, where two T-ALL complete remission bone marrow samples were used as reference. Parameters and limits for deletions and gains were set for each individual array through manual inspection. The CNV analysis included four complete remission samples as negative controls in order to identify possible false positives due to technical biases or germline variation. CNVs identified by the segmentation with fewer than 10 observations, as defined by the conumee package, were excluded from the analysis. CNV segments overlapping in three or more samples in either T-ALL or T-LBL and in fewer than three of the four negative controls were summarized with the minimal common region. One of the samples was excluded as the limits could not be set with any certainty due to high variation. Group comparisons of CNVs were performed using Fisher's exact test on individual CNVs. Human genome GRCh37 (NCBI)/hg19 (UCSC) was used for assigning all chromosome positions.
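The recurrence rule and the minimal common region described above can be sketched as follows. This is a simplified illustration, not the conumee workflow: it assumes the overlapping segments have already been grouped per region on one chromosome, and the sample names and coordinates are hypothetical.

```python
from typing import Dict, Optional, Tuple

Segment = Tuple[int, int]  # (start, end) in base pairs on one chromosome


def minimal_common_region(segments: Dict[str, Segment]) -> Optional[Segment]:
    """Return the interval shared by every segment, or None if there is none."""
    if not segments:
        return None
    start = max(seg_start for seg_start, _ in segments.values())
    end = min(seg_end for _, seg_end in segments.values())
    return (start, end) if start < end else None


def is_recurrent(n_cases: int, n_controls: int,
                 min_cases: int = 3, max_controls: int = 2) -> bool:
    """Apply the rule: at least three cases and fewer than three controls."""
    return n_cases >= min_cases and n_controls <= max_controls


# Hypothetical deletion segments from three samples (not real coordinates).
deletions = {
    "sample_A": (48_000_000, 51_000_000),
    "sample_B": (48_500_000, 50_500_000),
    "sample_C": (47_000_000, 50_000_000),
}

region = minimal_common_region(deletions)
recurrent = is_recurrent(n_cases=len(deletions), n_controls=0)
print(region, recurrent)
```

The minimal common region is the stretch deleted (or gained) in every affected sample, so it narrows the list of candidate genes compared with any single sample's segment.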

CNV analysis by genome-wide SNP array
Genome-wide copy number variation (CNV) analysis was performed using whole-genome single-nucleotide polymorphism (SNP) Infinium CytoSNP-850K v1.1 BeadChip microarrays (Illumina), covering approximately 850,000 SNPs. In accordance with the manufacturer's protocol, 200 ng DNA was hybridized on a BeadChip after whole-genome amplification, after which the arrays were scanned using the HiScan instrument (Illumina). Genotyping results were visualized, normalized and clustered using the Genotyping module of the GenomeStudio software (Illumina). The cnvPartition 3.2.0 algorithm (Illumina) was applied for CNV detection by retrieving the Log R Ratio (LRR, the log2 ratio between the observed and the expected probe intensity) and the B Allele Frequency (BAF). Deviations from the expected values indicate copy number alterations. Human genome GRCh37 (NCBI)/hg19 (UCSC) was used for assigning all chromosome positions.
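To make the two signals concrete, the sketch below shows the idealised values of LRR and BAF for different copy numbers. It assumes a pure, diploid-baseline sample. Real array data are noisier and the LRR shift is usually smaller than the theoretical value. The function names are hypothetical and unrelated to the cnvPartition software.

```python
import math
from typing import List


def log_r_ratio(observed_intensity: float, expected_intensity: float) -> float:
    """LRR: log2 of observed over expected total probe intensity."""
    return math.log2(observed_intensity / expected_intensity)


def expected_lrr(copy_number: int, baseline: int = 2) -> float:
    """Idealised LRR for a given copy number relative to a diploid baseline."""
    if copy_number <= 0:
        raise ValueError("LRR is undefined for zero copies in this idealisation")
    return math.log2(copy_number / baseline)


def expected_baf_clusters(copy_number: int) -> List[float]:
    """Idealised BAF values: the fraction of B alleles among all copies."""
    if copy_number <= 0:
        raise ValueError("BAF is undefined for zero copies")
    return [b_copies / copy_number for b_copies in range(copy_number + 1)]


for cn in (1, 2, 3):
    print(cn, round(expected_lrr(cn), 3), expected_baf_clusters(cn))
```

A heterozygous deletion therefore shows a negative LRR together with loss of the BAF cluster at 0.5, whereas a single-copy gain shows a positive LRR with BAF clusters near 1/3 and 2/3.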

Bioinformatic and statistical analysis
Principal component analysis (PCA) of centrally scaled DNA methylation (avg. β values) and log2 scaled gene expression (average signal values) data was carried out using SIMCA version 14 (Umetrics, Umeå, Sweden). For statistical analysis, the Statistical Package for the Social Sciences (SPSS Inc., Chicago, IL) software version 24 was used along with R.
Statistical tests included Fisher's exact test for comparing categorical variables and the independent samples t-test for comparing continuous variables. Welch's two-sample t-test was used to compare log2-transformed gene expression data. Cluster analyses were performed using Ward's method with the Euclidean distance metric.
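As an illustration of the categorical comparison, the sketch below implements a two-sided Fisher's exact test with the standard library and applies it to the 13q14.2 deletion counts reported in this paper (5 of 14 T-LBL vs. 0 of 77 T-ALL). It is a stand-alone re-implementation for clarity, not the SPSS or R routine used in the study, and it assumes the common two-sided definition that sums all tables no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at most as probable as the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Rows: T-LBL, T-ALL. Columns: with 13q14.2 deletion, without.
p_value = fisher_exact_two_sided(5, 9, 0, 77)
print(f"p = {p_value:.2e}")
```

An exact test is a natural choice here because one cell of the table is zero and the T-LBL group is small, conditions under which a chi-square approximation would be unreliable.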

Demographic data
A total of 15 T-LBL (7 adult and 8 pediatric) and 77 T-ALL (12 adult and 65 pediatric) patient samples, and four complete remission samples, were analyzed using DNA methylation arrays. Publicly available DNA methylation array data of normal bone marrow and lymph node samples 12, as well as sorted CD3+ and CD34+ cells 11, were further included as reference samples. There were no significant differences in age and gender distribution between T-ALL and T-LBL patients, and both malignancies showed higher prevalence in males (Table 1).
The T-ALL and T-LBL patient samples were CIMP classified according to the previously defined CIMP panel, consisting of 1293 CpGs 6. Irrespective of disease (T-ALL vs. T-LBL) and age group, the frequency of CIMP+ was similar (61.5-71.4%) (Table 1).

Genome-wide DNA methylation profiling
Genome-wide DNA methylation patterns across adult and pediatric T-ALL and T-LBL patient samples, along with reference samples, were explored by principal component analysis (PCA) of batch-effect corrected DNA methylation array data (Fig. 1, Supplementary Fig. S2). The PCA plots were generated using center-scaled beta (avg. β) values of the genome-wide CpGs retained after filtering (n = 397,316), comprising 76% gene-related and 24% intergenic CpGs. In the first and second principal components, the PCA separated the samples neither by disease (T-ALL vs. T-LBL) nor by patient age group (Fig. 1). However, the non-malignant reference samples (bone marrow and lymph node origin) [...] the presence of CIMP subgroups in all age groups of T-LBL and T-ALL (Fig. 2a).
The CIMP subgroups in T-LBL and adult T-ALL had characteristics similar to those of the CIMP subgroups in pediatric T-ALL 9 (Fig. 2a). The methylation profile of the CIMP− malignant samples was closer to the CIMP profile of the normal reference samples (complete remission samples and normal sorted CD34+ and CD3+ T-cells), whereas the CIMP+ samples were more hypermethylated compared to the reference samples (Fig. 2a).
The CIMP subgroups in pediatric T-ALL have previously been shown to have differential replicative histories 9, reflected by significant differences in predicted epigenetic and mitotic age 9,15,16. Similar differences in epigenetic age and mitotic age were shown in the CIMP subgroups within T-LBL and adult T-ALL, with the CIMP+ samples having an older predicted epigenetic 15 and mitotic age 16 (Fig. 2b).

Differential DNA methylation analysis between T-ALL and T-LBL
Despite the overlapping global DNA methylation patterns, T-ALL and T-LBL have different patterns of disease distribution and clinical manifestation. In order to investigate methylomic differences between T-ALL and T-LBL, differential methylation analysis was performed using the ChAMP algorithm, focusing on promoter-associated CpG sites (n = 146,035). To avoid identifying age-related DNA methylation differences, differential methylation analysis was carried out firstly between all T-ALL (n = 77) and T-LBL (n = 15) samples, secondly between adult T-ALL (n = 12) and adult T-LBL (n = 7), and lastly between pediatric T-ALL (n = 65) and pediatric T-LBL (n = 8) (Supplementary Fig. S1, Supplementary Table S2). A total of 634 differentially methylated CpGs (DM-CpGs), with an adjusted p-value < 0.05 and absolute delta β (absΔβ) value ≥ 0.3, were common between all three analyses (Fig. 3a-b).
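The step of retaining only DM-CpGs common to all three comparisons amounts to a set intersection, sketched below. The probe identifiers are hypothetical placeholders, and the function name is illustrative rather than part of ChAMP.

```python
from typing import Iterable, Set


def common_dm_cpgs(*analyses: Iterable[str]) -> Set[str]:
    """Return the CpG identifiers present in every analysis."""
    sets = [set(analysis) for analysis in analyses]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical DM-CpG lists from the three comparisons.
all_patients = ["cpg_01", "cpg_02", "cpg_03", "cpg_04"]
adults_only = ["cpg_02", "cpg_03", "cpg_05"]
pediatric_only = ["cpg_02", "cpg_03", "cpg_04", "cpg_06"]

shared = common_dm_cpgs(all_patients, adults_only, pediatric_only)
print(sorted(shared))
```

Requiring a site to be significant in the adult-only and pediatric-only comparisons as well as in the pooled one is what guards against differences driven by the unequal age composition of the two disease groups.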
Hierarchical clustering separated the 634 DM-CpGs, and analysis was focused on the two clusters (clusters 2 and 4) that were not associated with CIMP class (Fig. 3b, Supplementary Fig. S3) [...] with CIMP profiles (Fig. 3c, Supplementary Fig. S3). PCA using avg. β values of the 128 DM-CpGs (from clusters 2 and 4) clearly separated T-ALL and T-LBL samples, without discriminating between different patient age groups or the CIMP phenotype (Fig. 3d). Furthermore, the methylation signature of the 128 DM-CpGs did not separate the non-malignant lymph node and bone marrow reference samples, showing that the DM-CpGs did not represent tissue-specific methylation differences between T-ALL and T-LBL (Fig. 3d).

Functional relevance of DM-CpGs between T-ALL and T-LBL evaluated by integrated gene expression analysis
The potential functional relevance of the 128 DM-CpGs was evaluated by retrieving publicly available gene expression array data of T-ALL (n = 10) and T-LBL (n = 20) 5. These data were used to evaluate whether the transcriptomic profile of the genes corresponding to the identified 128 DM-CpGs also distinguished between T-ALL and T-LBL. The 128 DM-CpGs mapped to a total of 110 unique genes, of which gene expression data were available for 100 genes (219 gene expression probes) in 10 T-ALL and 20 T-LBL pediatric patients 5. PCA of the log2-transformed average signal values of the 219 gene expression probes clustered T-ALL and T-LBL samples separately (Fig. 4a). This showed that the differentially methylated CpG loci identified in our cohort of T-ALL and T-LBL patients also separated the two diseases based on the corresponding gene expression profile in a separate cohort of T-ALL and T-LBL patients 5 (Fig. 4a). Gene enrichment analysis of these 110 genes revealed significant enrichment of transmembrane and membrane-associated proteins (Supplementary Table S1).
An integrated DNA methylation and gene expression analysis using Basso's cohort was performed for the ten most significant DM-CpGs (Supplementary Table S3). Among the most significant DM-CpGs, two sites that were hypomethylated in T-LBL compared to T-ALL mapped to the shared promoter region of the SGCE and PEG10 genes (Fig. 4b, Supplementary Table S3). SGCE and PEG10 were significantly differentially expressed in Basso's cohort (SGCE log2FC 3.98, p < 0.001 and PEG10 log2FC 1.87, p < 0.001), with higher gene expression in T-LBL than in T-ALL (Fig. 4c, Supplementary Table S3).
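For readers less used to log2 fold changes, the short sketch below converts the reported values to linear fold changes. It assumes the log2FC is expressed as T-LBL relative to T-ALL, consistent with the higher expression in T-LBL stated above. The function name is hypothetical.

```python
def log2fc_to_fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fc


# log2FC values as reported in the text for Basso's cohort.
reported_log2fc = {"SGCE": 3.98, "PEG10": 1.87}

fold_changes = {
    gene: log2fc_to_fold_change(value)
    for gene, value in reported_log2fc.items()
}

for gene, fold in fold_changes.items():
    print(f"{gene}: about {fold:.1f}-fold")
```

On the linear scale this corresponds to roughly 15.8-fold higher SGCE and 3.7-fold higher PEG10 expression in T-LBL.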

Copy number variation analysis identified a higher frequency of Ch13q14.2 deletions in T-LBL
To explore and compare the genomic landscape of T-ALL and T-LBL, we screened for recurrent deletions or gains in patient samples using the HumanMethylation450K arrays. A total of 17 chromosomal regions with recurrent copy number variations were identified, the majority of which were common to both T-ALL and T-LBL patients (Fig. 5a, Table 2). Deletion in the CDKN2A/2B locus (9p21.3), a known recurrent deletion in T-ALL and T-LBL, was detected in 10% of T-ALL and 14% of T-LBL in our cohort (Table 2).
In order to validate the 13q14.2 deletions and gains in chromosome 5 in T-LBL, all fifteen T-LBL samples were further analyzed by CytoSNP-850K arrays (Table 3, Supplementary Table S4). SNP array analysis verified the gain of all or parts of chromosome 5 in three (21%) of the T-LBL samples (Supplementary Table S4). The recurrent 13q14.2 deletions in five (36%) of the T-LBL samples were also confirmed by SNP array analysis (Table 3, Fig. 5c).

Discussion
The biology behind the different clinical manifestations of T-ALL and T-LBL is still not well established. Despite overlapping immunophenotypic and morphological features, T-ALL and T-LBL have divergent clinical manifestations, with lymph node/extramedullary tissue infiltration associated with T-LBL and predominant bone marrow infiltration in T-ALL. By high-resolution genome-wide DNA methylation and copy number variation detection arrays, we aimed to explore the methylomic and genomic landscape in T-ALL and T-LBL, to identify molecular markers that could differentiate between the two diseases and help evaluate the biology behind the differential pattern of primary disease distribution. Neither global nor genome-wide promoter-focused DNA methylation analysis by PCA could separate T-ALL patients from T-LBL patients or distinguish between adult and pediatric patients. It has been shown previously that adult and pediatric T-ALL have significant overlap of leukemia-specific genetic and cytogenetic lesions 20,21. It can therefore be speculated that adult and pediatric cases also have similar methylation profiles, as demonstrated by the co-clustering of adult and pediatric patients in our multivariate analysis. The largest variation in global and promoter-associated DNA methylation pattern between the patient samples was based on CIMP status, verifying the presence of epigenetic CIMP subgroups in a broader set of T-cell malignancies including pediatric and adult T-LBL and adult T-ALL patients, which has not been shown before 6,7,9. Even though the prognostic significance of CIMP profiling in T-LBL and adult T-ALL could not be validated because of limited cohort size, the CIMP subgroups in these diagnoses showed similar differences in cellular proliferative history to those we have shown previously in pediatric T-ALL CIMP subgroups 9. The CIMP+ T-LBL and adult T-ALL patients had an older predicted epigenetic and mitotic age compared to the CIMP− patients, which is in line with our previously published data 9.
In order to identify specific genomic loci with differential DNA methylation between T-ALL and T-LBL, we performed differential methylation analysis in the promoter-associated CpG sites, considering the functional relevance of DNA methylation in promoters. A set of 128 DM-CpG sites was identified that separated T-ALL and T-LBL as distinct clusters, without reflecting methylomic variations based on tissue type, age, or CIMP heterogeneity. Functional analysis of genes corresponding to the 128 DM-CpGs between T-ALL and T-LBL revealed an overrepresentation of membrane and transmembrane-associated proteins. Gene expression signatures discriminating T-ALL and T-LBL, identified in previous studies, were also shown to be enriched in genes encoding cellular adhesion proteins and extracellular matrix proteins 4,5. This implies that T-ALL and T-LBL have biological differences that might govern their different disease manifestations. The most significant differentially methylated sites mapped to the shared CpG island-rich promoter region of Sarcoglycan-epsilon (SGCE) and its neighbor, paternally expressed gene 10 (PEG10), which are both maternally imprinted genes on chromosome 7q21 22. PEG10 plays a vital role in placental formation and differentiation of adipocytes, whereas the SGCE gene encodes a transmembrane protein that links the actin cytoskeleton to the extracellular matrix 23. However, deregulated PEG10 expression is associated with malignant transformation, affecting cell proliferation and apoptosis 24. Both PEG10 and SGCE are known to be overexpressed in high-risk B-cell chronic lymphocytic leukemia (B-CLL), with promoter DNA methylation regulating the gene expression 24. Overexpression of PEG10 has also been implicated in progression of various solid cancers including hepatocellular carcinoma 25, pancreatic cancer 26, breast cancer 27 and prostate cancer 24. Overexpression of PEG10 in these solid tumors was shown to correlate with higher TNM stage and lymph node metastasis 28. Interestingly, long non-coding RNA PEG10, also encoded from 7q21, was shown to be upregulated in diffuse large B-cell lymphoma (DLBCL), which correlated with worse prognosis 29. One of the oncogenic effects of PEG10 in solid cancers is the promotion of metastatic migration and epithelial-mesenchymal transition of neoplastic cells 24. The overlapping epigenetic and transcriptomic disparities identified between T-ALL and T-LBL in our study might contribute to the different manifestations of the malignancies. However, the functional relevance has to be further evaluated before any conclusion can be drawn about the epigenetic contribution to disease distribution.
Furthermore, we explored possible similarities and differences in genetic alterations between T-LBL and T-ALL. The most frequently observed CNV common to T-ALL and T-LBL was the deletion of 9p21.3, including the CDKN2A/2B loci. Previous studies have also identified 9p21.3 deletions as commonly recurrent in both T-ALL and T-LBL 2,30-32. Interestingly, we identified and validated a frequently deleted genomic region on chromosome 13q14.2 present in 36% (5/14) of the T-LBL samples that was not observed in any T-ALL sample (0/77). Similarly, gain of all or parts of chromosome 5 was observed in T-LBL patients in our study cohort. Trisomy 5, gain of the whole chromosome 5, is recurrent in hematological malignancies with a hyperdiploid karyotype (chromosome number > 50). Our findings are in line with a previous study that reported hyperdiploid karyotypes to be significantly more frequent in T-LBL than in T-ALL patients 33.
The presence of 13q14.2 deletions in T-LBL samples has also been described in a previous study that reported 13q14.2 deletions in 2 out of 12 T-LBL patients (17%) 31. However, that study also observed 13q14.2 deletions in 6 out of 57 (11%) T-ALL patients, which were not seen in our Nordic T-ALL cohort 31. The 13q14.2 region partially overlaps with the most commonly deleted region in B-cell chronic lymphocytic leukemia (B-CLL) 34-36 and contains the RB1, KCNRG, TRIM13, DLEU1/2, MIR16-1, and MIR15A genes. In addition, 13q14 deletions have been observed in other hematological malignancies including mantle cell leukemia, multiple myeloma and acute myeloid leukemia 37-40. One of the downstream inhibitory targets of MIR16-1 and MIR15A is the anti-apoptotic BCL2 gene. In B-CLL, loss of the 13q14.2 region has been shown to be associated with reduced MIR16-1 and MIR15A expression and increased BCL2 mRNA levels 41. Another study showed, through zebrafish modeling, that increased expression of BCL2 favored the development of T-LBL over T-ALL 42. However, further functional analyses are required to elucidate the link between 13q14.2 deletions, BCL2 overexpression and T-LBL biology. If 13q14.2 deletions and subsequent BCL2 overexpression are validated as potential oncogenic events in T-LBL, the next step would be to evaluate the efficacy of drugs like venetoclax for treating 13q14.2-deleted T-LBL patients. Venetoclax, a selective inhibitor of BCL2, is currently used as a targeted therapy for treating CLL 43 and acute myeloid leukemia 44, but its efficacy in treating T-LBL patients is yet to be determined.
In conclusion, we showed that DNA methylation CIMP subgroups, with prognostic significance in pediatric T-ALL, are also present in adult T-ALL, as well as in pediatric and adult T-LBL, suggesting a broader relevance of CIMP classification in T-cell malignancies. Furthermore, epigenetic and genetic profiling revealed molecular differences between T-ALL and T-LBL, which may contribute to the differential biology of the two neoplasms.