DNA methylation of chronic lymphocytic leukemia with differential response to chemotherapy

Acquired resistance to chemotherapy is an important clinical problem and can also occur without detectable cytogenetic aberrations or gene mutations. Chronic lymphocytic leukemia (CLL) is molecularly well characterized and has been elemental for establishing central paradigms in oncology. This prompted us to check whether specific epigenetic changes at the level of DNA methylation might underlie development of treatment resistance. We used Illumina Infinium HumanMethylation450 BeadChips to obtain DNA methylation profiles of 71 CLL patients with differential responses. Thirty-six patients were categorized as relapsed/refractory after treatment with fludarabine or bendamustine and 21 of them had genetic aberrations of TP53. The other 35 patients were untreated at the time of sampling and 15 of them had genetic aberration of TP53. Although we could not correlate chemoresistance with epigenetic changes, the patients were comprehensively characterized regarding relevant prognostic and molecular markers (e.g. IGHV mutation status, chromosome aberrations, TP53 mutation status, clinical parameters), which makes our dataset a unique and valuable resource that can be used by researchers to test alternative hypotheses. Measurement(s) DNA methylation Technology Type(s) methylation profiling by array Factor Type(s) TP53 mutation status • response to fludarabine or bendamustine • chromosomal aberration • IGHV mutation status Sample Characteristic - Organism Homo sapiens Measurement(s) DNA methylation Technology Type(s) methylation profiling by array Factor Type(s) TP53 mutation status • response to fludarabine or bendamustine • chromosomal aberration • IGHV mutation status Sample Characteristic - Organism Homo sapiens Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12006624


Background & Summary
Chronic lymphocytic leukemia (CLL) is the most common leukemia in the Western world and mainly affects elderly patients 1 . Its incidence rate was 8.3 cases per 100 000 men and 5.8 cases per 100 000 women in Germany in 2014 2 . CLL is characterized by accumulation of small B lymphocytes with a mature appearance in blood, bone marrow, lymph nodes and other lymphoid tissues 3 . The clinical course of CLL differs depending on the biological characteristics of the disease (hypermutation status of the immunoglobulin heavy-chain genes (IGHV), presence of specific genomic aberrations and/or recurrent mutations in oncogenes and tumor suppressor genes) [4][5][6] . Some of these genetic features are associated with distinct epigenetic profiles, e.g. CLL tumours with high level of IGHV somatic hypermutation (M-CLL) have distinct DNA methylation patterns compared to CLL tumours with a low or absent IGHV mutational load (U-CLL) 7 .
Chemoimmunotherapeutic regimens like fludarabine, cyclophosphamide and rituximab (FCR) or bendamustine and rituximab (BR) achieve durable remissions in the majority of treatment-naïve CLL patients [8][9][10][11] . Although novel targeted and effective treatments for CLL were introduced in the past five years, FCR is not inferior to them as first-line therapy in the subgroup of young and fit patients with M-CLL without 17p deletion and/or TP53 mutation (del(17p)/TP53mut) 12,13 . Additionally, the high cost of novel targeted drugs limits their use in developing countries where conventional cytotoxic chemotherapy is still a viable option 14,15 . Thus, drugs like fludarabine and bendamustine will continue to be used in the future for treatment of CLL and development of resistance to these classical chemotherapeutics remains an important problem to study.
Chemorefractoriness of CLL is most often caused by functional impairment of the ATM-p53 DNA damage response pathway, mostly as a result of cytogenetic aberrations or mutations 16,17 . Del(17p) is found in 5% to 10% of patients at diagnosis but in up to 40% of patients relapsing after fludarabine-based treatment regimens 18  www.nature.com/scientificdata www.nature.com/scientificdata/ Del(17p) causes loss of one allele of the tumour suppressor TP53 but in about 80% of the cases the other allele is also inactivated by somatic mutation 6,18 . Nevertheless, even monoallelic aberrations of TP53 confer poor prognosis. Interestingly, some cases of chemorefractory CLL show dysfunction of the ATM-p53 pathway without respective genetic lesions 16,17 . Additional genes and pathways have been implicated in development of resistance to fludarabine, although also in these cases mutations are not always detectable 17,19,20 . These observations leave the possibility that chemoresistance in CLL can also be driven by epigenetic mechanisms. In order to find epigenetic changes associated with chemoresistance, we selected samples from patients that were relapsed/refractory after treatment with fludarabine or bendamustine and/or had del(17p)/TP53mut, as well as samples from CLL patients without del(17p)/TP53mut who had treatment-naïve disease or who achieved prolonged remission after treatment with fludarabine-or bendamustine-based regimens. The grouping of the samples is shown in Fig. 1. This selection of samples allows comparing relapsed/refractory patients to untreated patients after stratification for the presence or absence of aberrations affecting the TP53 locus. In our opinion, this stratification is important because presence of TP53 aberrations could obscure the effect of epimutations, as TP53 aberrations themselves are a strong determinant of chemoresistance 8,21,22 . On the other hand, the chosen design of the study could allow to detect epimutations that additionally occur in the subgroup of TP53-disrupted CLL tumours to further reduce their sensitivity to chemotherapy. Genome-wide DNA methylation in all selected samples (N = 72) was quantified using Illumina Infinium HumanMethylation450 BeadChips. The resulting raw signal data and a normalized data matrix are provided here as a resource for studying relationships between epigenetics and chemoresistance in CLL. Basic clustering and principal component analyses did not intuitively show grouping of samples according to chemoresistance status. However, we cannot exclude that more sophisticated analyses will be able to extract relevant differences and correlations. Notably, the dataset is unique with the high proportion of patients with del17p and/or mutated TP53. This dataset thus allows comparison of epigenetic profiles of CLL patients with negative prognostic markers to profiles of patients with chemosensitive CLL and CLL not harbouring TP53 defects.

Methods
Patient sample selection and molecular characterization. The biological and molecular characteristics of the 71 CLL patients included in the study are listed in Table 1 and Online-only Table 1. Fifty-one of the patients were subjects of the multi-centre CLL2O clinical trial (clinicaltrials.gov: NCT01392079) and were subdivided here into 4 subgroups depending on their del(17p)/TP53mut and treatment/response statuses as follows: groups A (N = 15), B (N = 10) and C (N = 11) consisted of patients with del(17p) and/or TP53 mutation and group E (N = 15) consisted of patients without del(17p) or TP53 mutation. Patients in group A were not treated previously but required treatment, patients in group B had relapsed after treatment with fludarabine-or bendamustine-containing regimens and patients in groups C and E were refractory to fludarabine or bendamustine. Additional 20 cases (group D) were patients whose tumours did not harbour del(17p) or TP53 mutation and who were not previously treated but some of whom required treatment and responded to subsequent therapy with fludarabine-or bendamustine-containing regimens (N = 6, Online-only Table 1). All patients had a confirmed diagnosis of CLL by flow cytometry; their IGHV mutational status and cytogenetics were also determined during the diagnostic workup. Unmutated IGHV gene (≥98% homology to germline) was detected by sequencing in 58 patients (81.7%). Fluorescent in-situ hybridization (FISH) analysis revealed the presence of del(17p) in 35 of a total 36 patients in groups A, B and C, as well as the absence of such an aberration in all patients from groups D and E. Mutated TP53 was detected in 30 of total 36 patients in groups A, B and C, and in none of the patients in groups D and E. One of the patients in group D had two consecutive samples taken with a time difference of 40 months (Online-only Table 1). All patients provided informed consent to subsequent analysis and research in accordance with the Declaration of Helsinki and under a protocol approved by the ethical committee of the University of Ulm.  www.nature.com/scientificdata www.nature.com/scientificdata/ equally divided among the 6 batches to mitigate possible batch effects. DNA was extracted from the cell pellets by the Qiagen AllPrep kit (#80204) and quantification and quality control were performed using a NanoDrop ND-1000 UV-Vis Spectrophotometer (Thermo Scientific, USA). One and a half micrograms of DNA from each sample were sent to the Genomics and Proteomics Core Facility of the German Cancer Research Center (DKFZ) for bisulfite conversion and hybridization to Illumina Infinium HumanMethylation450 BeadChips, according to the manufacturer's instructions. The bisulfite conversion was performed using the EZ DNA Methylation Kit (Zymo Research) and then the converted DNA was whole-genome amplified and fragmented. The processed samples were distributed randomly among 6 Illumina Infinium HumanMethylation450 BeadChips. The core facility was blinded regarding the identity of the samples and the experimental groups to which they belonged. After hybridization, single-base extension and staining, BeadChips were scanned using an Illumina iScan reader, and the fluorescence intensity raw data for each sample was recorded as two IDAT files, one for the green (Cy3) and one for the red (Cy5) channel 23 . Quality control of the whole procedure was performed using the Methylation Module of Illumina's GenomeStudio software.
Data processing and statistics. After acquiring the raw data, we performed quality control, preprocessing and basic analysis using R/Bioconductor with the RnBeads package 24 . Illumina probes known to be cross-reactive or overlapping known SNPs 25 were excluded from analysis. This was also done for probes giving unreliable measurements as determined by the Greedycut algorithm implemented in RnBeads. The data from the remaining probes were subjected to background subtraction using the Noob method 26 and beta-mixture quantile normalization (BMIQ) 27 . In a subsequent step, probes of non-CpG context, probes binding to sequences on sex chromosomes and probes with low standard deviation were filtered out. CpG sites on the sex chromosomes were excluded to avoid gender-specific methylation bias, as groups within our study did not contain equal numbers of males and females. CpG sites with low standard deviation are generally not informative and removing them from the analysis is a common approach to increase power for detection of differentially methylated CpGs and to improve sensitivity of clustering 28,29 . The data obtained by the remaining probes 23  Both multidimensional scaling (MDS) and principal-component analysis (PCA) were used as dimension reduction techniques. Hierarchical clustering was carried out using the Manhattan distance metric and complete linkage criteria.

Data records
The complete DNA methylation microarray dataset has been deposited in the NCBI Gene Expression Omnibus (GEO) database and consists of the raw data in the form of 72 pairs (red/green fluorescence) of raw Intensity Data files (.idat), the processed data matrix and a metadata table describing the samples and their groups 23 . For convenience, Online-only Table 1 lists all patients and samples with their characteristics, as well as experimental and analytical procedures and output data file names.

Technical Validation
Quality control of genomic DNA. Genomic DNA 260 nm/280 nm absorbance ratios were determined using a NanoDrop ND-1000 UV-Vis Spectrophotometer (Thermo Scientific, USA). All samples had ratios in the range 1.8-2.0, as expected for DNA of high purity (Online-only Table 2). Quality control of bisulfite conversion and Infinium 450k data. Quality control of bisulfite conversion and of data obtained by the Illumina Infinium HumanMethylation450 BeadChips was performed independently by the team of the core facility using the Methylation Module of Illumina's GenomeStudio software (Supplementary File 1 and Online-only Table 3) and by us using the rnb.run.qc command of the RnBeads package (Fig. 2). Both analyses ascertained the correct execution of the separate steps of the whole experimental procedure: bisulfite conversion, hybridization, single-base extension and stripping. Figure 2a,b demonstrate bisulfite conversion efficiency as reported by control probes of Infinium I or II design, respectively. Overall hybridization performance was assessed using synthetic reference targets that are present in the hybridization buffer at three concentrations (low, medium and high) and that resulted in signals with well separable intensity intervals, as expected (Fig. 2c). The extension controls showed high efficiency of extension with any of the 4 nucleotides (Fig. 2d) and the staining controls demonstrated high efficiency and sensitivity of the staining step (Fig. 2e). The overall performance of the assay from amplification to detection is summarized by the signal from probes that query non-polymorphic bases in the genome -one probe for each nucleotide (Fig. 2f).
The Infinium 450k BeadChip contains 65 genotyping probes that are useful for identification of sample mix-ups. These probes produced highly similar signal patterns in two of our samples, which was expected as these two samples (07PB1887 and 10PB6041) came from the same patient (Fig. 3).
Quality control of normalization procedure. Preprocessing of the raw data was performed using R/ Bioconductor with the RnBeads package 24 . Probes known to be cross-reactive (43 230) or overlapping known single-nucleotide polymorphisms (SNPs; 8 704) 25 , as well as probes giving unreliable measurements (884 as determined by the Greedycut algorithm) were excluded from analysis. The data from the remaining 432 759 probes were subjected to background subtraction using the Noob method 26 and beta-mixture quantile normalization (BMIQ) 27 . This normalization strategy successfully mitigated the inherent bias in β-value distributions between the two different types of probes (Infinium I and II) that are present on the HumanMethylation450 BeadChip 30 , as shown in Fig. 4. In a subsequent step, probes of non-CpG context (1 251), probes binding to sequences on sex chromosomes (9 917) and probes with standard deviation <0.005 (69 867) were filtered out. Thus, the data obtained by the remaining 351 724 probes qualified for downstream analysis. (2020) 7:133 | https://doi.org/10.1038/s41597-020-0456-0 www.nature.com/scientificdata www.nature.com/scientificdata/ Check for batch effects. Dimension reduction techniques are a powerful way of visualizing associations between different variables and global trends in DNA methylation data 24 . Applying PCA on the 10000 most variable CpGs in our data did not result in visible grouping of samples according to the BeadChip that they were applied on (Fig. 5). The lack of batch effects was further verified by Kruskal-Wallis one-way analysis of variance taking into account the first 8 primary components (Table 2).
Biological validation of the DNA methylation data. A technically sound dataset would allow confirmation of known facts. Using our dataset, we could replicate the finding that M-CLL and U-CLL are associated with distinct DNA methylation profiles 7 . In addition to the PCA in Fig. 5 and Table 2, we performed unsupervised hierarchical clustering using the 5000 most variable CpG sites. The resulting dendrograms and heatmap of β-values are presented in Fig. 6. In both analyses, samples were well separated according to IGHV mutation status, with the bigger cluster consisting only of U-CLL cases and the smaller cluster comprising all M-CLL cases www.nature.com/scientificdata www.nature.com/scientificdata/ plus five U-CLL cases, three of which had a considerable level of IGHV hypermutation (98.2-98.3% homology to germline; see also Online-only Table 1), thus possibly belonging to the intermediate CLL group as defined by Kulis et al. 7 .
The highly similar DNA methylation profiles of the two serial samples from one patient (Fig. 6) demonstrated the reproducibility and robustness of the whole analytical procedure (the two samples were hybridized to different BeadChips) and in addition supported previous observations that evolution of DNA methylation in CLL is limited to cases that acquire high-risk genetic alterations 31 .

Usage notes
In our exploratory analysis, we could not observe obvious clustering of samples based on patients' sensitivity or resistance to chemotherapy (Figs. 5 and 6). Accordingly, a differential methylation test using the limma method with the threshold for difference of β-values set to 0.1 and the false discovery rate set to 0.05 could not find any differentially methylated CpGs between chemoresistant and untreated/chemosensitive patients, neither in the subgroup of patients with del(17p)/TP53mut (groups B and C vs. group A), nor in the subgroup without these aberrations (group E vs. group D). Nevertheless, as multiple testing correction inflates the type II error rate considerably, we cannot exclude that some of the tested CpGs have truly different methylation between the groups and a causative role in chemoresistance development. In this regard, our dataset can be a valuable resource for conducting hypothesis-driven research addressing questions of chemoresistance in CLL.

Fig. 6
Unsupervised hierarchical clustering of the 72 CLL samples based on the β-values of the 5000 most variable CpG sites. Clustering was based on Manhattan distance with complete linkage. Columns represent samples and rows CpGs. Two of the samples originate from the same patient (see the main text) and are marked by an asterisk. The high similarity of DNA methylation patterns between them affirms the reproducibility of the methodology.