The role of reciprocal fusions in MLL-r acute leukemia: studying the chromosomal translocation t(4;11)

Leukemia patients bearing the t(4;11)(q21;q23) translocation can be divided into two subgroups: those expressing both reciprocal fusion genes, and those expressing only the MLL-AF4 fusion gene. Moreover, a recent study has demonstrated that patients expressing both fusion genes have a better outcome than patients expressing the MLL-AF4 fusion protein alone. Together, these observations may point to a clonal process in which the reciprocal AF4-MLL fusion gene is lost during disease progression, and this loss may select for a more aggressive type of leukemia. This prompted us to investigate the decisive role of the AF4-MLL fusion protein at an early timepoint of disease development. We designed an experimental model system in which the MLL-AF4 fusion protein was constitutively expressed, while an inducible AF4-MLL fusion gene was expressed for only 48 h. Subsequently, we investigated genome-wide changes by RNA-Seq and ATAC-Seq at distinct timepoints. These analyses revealed that expressing AF4-MLL for only 48 h was sufficient to significantly change the genomic landscape (transcription and chromatin), even over a longer time scale. We therefore conclude that the AF4-MLL fusion protein works through a hit-and-run mechanism, which is probably necessary to establish pre-leukemic conditions but dispensable for later disease progression.


ATAC-Seq experiments
Briefly, cells were grown in 6-well plates and treated with 1 µg/ml doxycycline for 48 hours. Cells were then harvested and the viability of every sample was checked; viability exceeded 90% in all samples. Cells were resuspended in cold PBS and counted. For each sample, 50,000 cells were centrifuged at 500 RCF for 5 minutes at 4°C in a fixed-angle centrifuge. After centrifugation, 900 µl of supernatant was aspirated with a P1000 pipette, and the remaining 100 µl was carefully aspirated with a P200 pipette tip to avoid disturbing the cell pellet. Cells were resuspended in 50 µl cold ATAC-Seq Resuspension Buffer (RSB; 10 mM Tris-HCl pH 7.4, 10 mM NaCl, and 3 mM MgCl2 in water) containing 0.1% NP40, 0.1% Tween-20, and 0.01% digitonin by pipetting up and down 3 times. Cells were incubated on ice for 3 minutes, and then 1 ml of cold RSB containing 0.1% Tween-20 but no NP40 or digitonin was added. Tubes were inverted 3 times to mix and centrifuged at 500 RCF for 10 minutes at 4°C in a fixed-angle centrifuge to pellet the nuclei. The supernatant was removed in two pipetting steps as described above, and the nuclei were resuspended in 50 µl of transposition mix (25 µl 2× TD buffer (20 mM Tris-HCl pH 7.4, 10 mM MgCl2, 20% dimethylformamide in water), 2.5 µl transposase 26 (100 nM final), 16.5 µl PBS, 0.5 µl 1% digitonin, 0.5 µl 10% Tween-20, and 5 µl water) by pipetting up and down six times. Transposition reactions were incubated at 37°C for 30 minutes in a thermomixer shaking at 1000 rpm. The subsequent steps of the ATAC-Seq experiment were performed by GenXPro (https://genxpro.net/) using an Illumina HiSeq, and the resulting data were analysed with bioinformatic tools.
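The transposition mix above is a 50 µl per-reaction recipe. When several samples are processed in parallel it is convenient to prepare a master mix; the minimal Python sketch below scales the listed volumes, assuming a 10% pipetting overage (our assumption, not part of the protocol). It also rearranges C1·V1 = C2·V2 to show that 2.5 µl of transposase giving 100 nM in 50 µl implies a 2 µM stock, a value derived here from the stated volumes rather than reported in the original.

```python
# Per-reaction volumes (µl) of the 50 µl transposition mix, as listed in the protocol.
PER_REACTION_UL = {
    "2x TD buffer": 25.0,
    "transposase": 2.5,
    "PBS": 16.5,
    "1% digitonin": 0.5,
    "10% Tween-20": 0.5,
    "water": 5.0,
}


def master_mix(n_samples: int, overage: float = 0.10) -> dict:
    """Scale the per-reaction recipe to n_samples plus a fractional overage.

    The 10% default overage is an assumption, not part of the protocol.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples * (1.0 + overage)
    return {reagent: round(vol * factor, 2) for reagent, vol in PER_REACTION_UL.items()}


def stock_needed(final_conc: float, volume_added_ul: float, total_ul: float = 50.0) -> float:
    """Rearranged C1*V1 = C2*V2: stock concentration that gives final_conc after dilution."""
    return final_conc * total_ul / volume_added_ul


if __name__ == "__main__":
    # The recipe should add up to the 50 µl reaction volume.
    assert abs(sum(PER_REACTION_UL.values()) - 50.0) < 1e-9
    for reagent, vol in master_mix(6).items():
        print(f"{reagent:>14}: {vol:7.2f} µl")
    # 100 nM final from 2.5 µl in 50 µl implies a 2000 nM (2 µM) stock.
    print(f"implied transposase stock: {stock_needed(100.0, 2.5):.0f} nM")
```

Rounding to two decimals keeps the printed volumes pipettable; the overage can be set to 0 to reproduce the single-reaction recipe exactly.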

Differential gene expression profiling by MACE-Seq and Bioinformatic analyses
The chimeric genes were either expressed constitutively or induced for 48 h with 1 µg/ml doxycycline, and total RNA was isolated from the transfected cell lines. After confirming correct expression of both transgenes, differential gene expression (DGE) profiles were obtained by MACE-Seq (Massive Analysis of cDNA Ends) following the manufacturer's protocol (GenXPro, Frankfurt, Germany). Data from 3 biological replicates of the cell line at 4 different time points (day 0, day 3, day 12 and day 28) were compared with 3 biological replicates of mock-transfected cells. The MACE libraries were prepared at GenXPro GmbH using the Massive Analysis of cDNA Ends (MACE) Library Preparation Kit (v2.0) from GenXPro GmbH. First, cDNA was generated using Oligo(dT) primers carrying sample-specific Oligo IDs, allowing subsequent pooling of up to 24 samples. After pooling, the cDNA was fragmented to an average size of 200 bp using a Bioruptor Plus sonicator (Diagenode, Belgium). The distribution of cDNA fragment sizes was monitored on a LabChip GXII Touch HT automated microfluidic electrophoresis platform (PerkinElmer, USA). The poly(A)-containing cDNA fragments were purified using solid phase reversible immobilization (SPRI) beads (Agencourt AMPure XP, USA), end-repaired and ligated to distinct 8-bp UMI adapters (also called TrueQuant adapters). The library of labelled, fragmented cDNA was then amplified by PCR, purified with SPRI beads (Agencourt AMPure XP, USA) and sequenced strand-specifically on a HiSeq 2500 (Illumina, USA). Bioinformatic analysis followed the GenXPro GmbH analysis pipeline for MACE libraries. The unique Oligo IDs and UMIs on each transcript allowed initial demultiplexing and subsequent removal of PCR duplicates.
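As a hedged illustration of how Oligo IDs and UMIs permit demultiplexing and duplicate removal, the sketch below groups pooled reads by sample barcode and keeps one read per UMI and insert sequence within each sample. The `Read` record, its field names and the rule that duplicates share both UMI and sequence are assumptions for illustration; the GenXPro pipeline itself is not described here, and many tools instead collapse reads that share a UMI and a mapping position after alignment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Read(NamedTuple):
    oligo_id: str  # hypothetical field: sample barcode from the Oligo(dT) primer
    umi: str       # hypothetical field: 8-bp UMI from the adapter
    sequence: str  # insert sequence


def demultiplex_and_dedup(reads: Iterable[Read], known_ids: Set[str]) -> Dict[str, List[Read]]:
    """Assign reads to samples by Oligo ID and drop PCR duplicates within each sample.

    A read is treated as a PCR duplicate if another read of the same sample
    already carried the same UMI and the same insert sequence (an assumption).
    """
    kept: Dict[str, List[Read]] = defaultdict(list)
    seen: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for read in reads:
        if read.oligo_id not in known_ids:
            continue  # barcode not in the sample sheet: read cannot be assigned
        key = (read.umi, read.sequence)
        if key in seen[read.oligo_id]:
            continue  # same molecule already counted for this sample
        seen[read.oligo_id].add(key)
        kept[read.oligo_id].append(read)
    return dict(kept)


if __name__ == "__main__":
    pooled = [
        Read("S1", "AAAACCCC", "GATTACAGATTACA"),
        Read("S1", "AAAACCCC", "GATTACAGATTACA"),  # PCR duplicate
        Read("S1", "GGGGTTTT", "GATTACAGATTACA"),  # same sequence, new molecule
        Read("S2", "AAAACCCC", "GATTACAGATTACA"),  # same UMI, other sample
        Read("S9", "AAAACCCC", "GATTACAGATTACA"),  # unknown barcode
    ]
    for sample, reads in demultiplex_and_dedup(pooled, {"S1", "S2"}).items():
        print(sample, len(reads))
```

Keying the duplicate set per sample matters: identical UMIs in different samples come from different molecules and must not be collapsed.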
The remaining reads were trimmed to retain high-quality, adapter-free sequences and aligned to the human reference genome (Genome Reference Consortium Human Build 38 patch release 13, GRCh38.p13) using Bowtie 2. All output data from the Bioconductor software for our RNA-Seq and ATAC-Seq experiments were imported into the database program FileMaker for further analysis. All final data sets were exported from FileMaker as individual Excel documents for publication. Moreover, FileMaker was used to develop the GUDC, DAGT and DAGE/ST modules. In addition, we used the following web servers for further data analysis: ClustVis (https://biit.cs.ut.ee/clustvis/) for heatmaps; VolcaNoseR (https://huygens.science.uva.nl/VolcaNoseR/) for volcano plots; and ShinyGO v0.61 (http://bioinformatics.sdstate.edu/go/) for pathway analyses.
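The trimming step can be illustrated with a generic 3' quality and adapter trimmer. This is a minimal sketch, not the GenXPro implementation: the Phred threshold (20), minimum adapter overlap (3 nt), minimum read length (18 nt) and the adapter sequence are assumptions. The common Illumina adapter prefix AGATCGGAAGAGC is used only as a placeholder, since the TrueQuant adapter sequence is not given.

```python
from typing import List, Optional, Tuple

# Placeholder adapter (common Illumina adapter prefix); the actual TrueQuant
# adapter sequence is not given in the text.
ADAPTER = "AGATCGGAAGAGC"


def quality_trim_3prime(seq: str, quals: List[int], min_q: int = 20) -> Tuple[str, List[int]]:
    """Remove bases from the 3' end while their Phred quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def adapter_trim(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Cut at a full adapter match, else at a partial adapter prefix at the read end."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Try the longest partial adapter first; never accept an overlap shorter than min_overlap.
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def trim_read(
    seq: str, quals: List[int], min_q: int = 20, min_len: int = 18
) -> Optional[Tuple[str, List[int]]]:
    """Quality-trim, then adapter-trim; discard reads shorter than min_len (assumed cut-off)."""
    seq, quals = quality_trim_3prime(seq, quals, min_q)
    seq = adapter_trim(seq)
    quals = quals[: len(seq)]  # keep qualities in register with the trimmed sequence
    return (seq, quals) if len(seq) >= min_len else None


if __name__ == "__main__":
    read = "ACGT" * 5 + "AGATCGG"  # 20 nt insert followed by a partial adapter
    print(trim_read(read, [30] * len(read)))
```

Quality trimming runs first so that low-quality tails do not hide a partial adapter; short partial matches below `min_overlap` are left alone to avoid cutting genuine sequence.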

Figure S1: Data dissection of the MACE- and ATAC-Seq experiments
The dissected data obtained from our MACE- and ATAC-Seq experiments are summarized. The upper panels display the gene signatures identified from day 0 to day 28; the gene signatures were dissected into pseudogenes (PG), non-annotated genes (NA), LincRNA genes (LINC), microRNA genes (MIR), SNO genes (SNO), mitochondrial genes (MT) and protein coding genes (PCG), respectively. The lower panels display the dissected ATAC-Seq data. The first panel shows the total number of entries and the numbers of accessible (total up) and non-accessible (total down) chromatin regions. The last 4 columns show the total numbers of up- and down-regulated chromatin regions and their associated genes. Up-regulated chromatin regions were defined by a minimum of 2 reads, a p-value < 0.05 and a log2 fold change > 2, whereas down-regulated regions were defined by a minimum of 2 reads in the mock sample, a p-value < 0.05 and a log2 fold change < -2. In the lower panel, the same ATAC-Seq data were dissected into pseudogenes (PG), non-annotated genes (NA), LincRNA genes (LINC), microRNA genes (MIR), SNO genes (SNO), mitochondrial genes (MT) and protein coding genes (PCG), respectively. Green and red numbers represent the total numbers of identified and classified up- and down-regulated signatures, respectively.
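The thresholds in this legend translate directly into a classification rule. The sketch below encodes them; it assumes (the legend does not say) that the 2-read minimum for up-regulated regions refers to the fusion-expressing sample, and the `Region` fields are hypothetical names for columns of a differential accessibility table.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Region:
    name: str
    reads_treated: int  # hypothetical column: reads in the fusion-expressing sample
    reads_mock: int     # hypothetical column: reads in the mock sample
    log2fc: float       # log2 fold change, treated vs. mock
    pvalue: float


def classify(r: Region, min_reads: int = 2, alpha: float = 0.05, min_lfc: float = 2.0) -> str:
    """Apply the Figure S1 thresholds to one chromatin region."""
    if r.pvalue >= alpha:
        return "unchanged"
    # Assumption: the 2-read minimum for "up" applies to the treated sample.
    if r.log2fc > min_lfc and r.reads_treated >= min_reads:
        return "up"
    # Stated in the legend: the 2-read minimum for "down" applies to the mock sample.
    if r.log2fc < -min_lfc and r.reads_mock >= min_reads:
        return "down"
    return "unchanged"


def tally(regions: Iterable[Region]) -> Counter:
    """Count regions per class, analogous to the 'total up' and 'total down' columns."""
    return Counter(classify(r) for r in regions)


if __name__ == "__main__":
    demo = [
        Region("r1", 10, 1, 3.3, 0.01),   # up
        Region("r2", 0, 5, -2.5, 0.001),  # down
        Region("r3", 10, 1, 3.3, 0.2),    # not significant
        Region("r4", 1, 0, 4.0, 0.01),    # too few reads
    ]
    print(tally(demo))
```

Requiring reads in the mock sample for "down" calls mirrors the legend's logic: a region can only lose accessibility if it was measurably open in the control.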

Figure S2: Profiling mitochondrial gene transcription
Several mitochondrial genes were strongly transcribed in cells expressing MLL-AF4 or both t(4;11) fusion proteins, even when the AF4-MLL transgene was shut down after day 3. Among these genes, ATP6, ATP8, CO1-CO3, CYB, ND4, as well as both ribosomal RNAs RNR1 and RNR2, were highly expressed. The physiological consequences for the cells remain unclear, because respiration experiments did not show any significant difference between mock cells and cells expressing these t(4;11) fusion proteins.

Figure S3: GUDC module analysis
All 4 gene signatures were analyzed for the number of deregulated genes on each chromosome, displayed as a percentage of all genes present on that chromosome. Left: up-regulated genes in the t(4;11) cell culture model at all 4 timepoints (day 0 = blue; day 3 = green; day 12 = red; day 28 = orange). Right: down-regulated genes. For each timepoint, the mean of deregulated genes across all chromosomes is given in percent. The graphs display a 'fingerprint' of gene usage on each chromosome. The resulting graphical pattern can be used to deduce the genome-wide impact of the tested fusion proteins. It also demonstrates that target gene usage is not a random process.

Figure S4: Venn diagram and detailed pathway analysis of de novo genes
A. De novo gene signatures were analyzed across all 4 timepoints. All idiosyncratic and shared signature genes were subjected to pathway analysis. Signatures for which pathways could be identified are indicated by green numbers; for signatures shown in grey, no pathway was identified. Pathways were found for days 3 and 28, as well as for the intersections comprising 41, 95 and 189 genes, respectively. These pathways included G protein-coupled signaling (including Ca2+ signaling), innate and humoral immune responses, and B cell proliferation. B. The Signature Tracing module revealed the same pathways but, in addition, attributed them to the protein coding genes that arose at d3 and to the combined signatures derived from d3 and d28.
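The per-chromosome percentages behind the GUDC fingerprint in Figure S3 can be sketched in a few lines of Python; this is an illustration, not the FileMaker GUDC module itself. Because the legend does not say how the mean over chromosomes is computed, the sketch returns both the unweighted mean of per-chromosome percentages and the pooled percentage (all deregulated genes divided by all genes); the gene counts in the usage example are invented toy values.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def chromosome_fingerprint(
    deregulated: Iterable[Tuple[str, str]], genes_per_chrom: Dict[str, int]
) -> Tuple[Dict[str, float], float, float]:
    """Percent of each chromosome's genes that are deregulated.

    deregulated: (gene_id, chromosome) pairs; a gene listed twice is counted once.
    genes_per_chrom: total number of annotated genes per chromosome.
    Returns (per-chromosome percentages, unweighted mean of those percentages,
    pooled percentage over all chromosomes).
    """
    gene_to_chrom = dict(deregulated)
    counts = Counter(gene_to_chrom.values())
    percent = {
        chrom: 100.0 * counts.get(chrom, 0) / total
        for chrom, total in genes_per_chrom.items()
        if total > 0
    }
    if not percent:
        raise ValueError("no chromosome with a positive gene count")
    mean_percent = sum(percent.values()) / len(percent)
    pooled = 100.0 * sum(counts[c] for c in percent) / sum(genes_per_chrom[c] for c in percent)
    return percent, mean_percent, pooled


if __name__ == "__main__":
    # Toy data: 10 deregulated genes on chr1, 5 on chr2, none on chrY.
    hits = [(f"A{i}", "chr1") for i in range(10)] + [(f"B{i}", "chr2") for i in range(5)]
    per_chrom, mean_pct, pooled_pct = chromosome_fingerprint(
        hits, {"chr1": 200, "chr2": 100, "chrY": 50}
    )
    print(per_chrom, round(mean_pct, 2), round(pooled_pct, 2))
```

The two summaries diverge when chromosomes differ greatly in gene number: the unweighted mean gives a gene-poor chromosome such as chrY the same weight as chr1, whereas the pooled value is dominated by gene-rich chromosomes.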

Figure S5: Venn diagram and detailed pathway analysis of shut-down genes
A. Shut-down gene signatures were analyzed across all 4 timepoints. All idiosyncratic and shared signature genes were subjected to pathway analysis. Signatures for which pathways could be identified are indicated by red numbers; for signatures shown in grey, no pathway was identified. Pathways were found for days 0, 3, 12 and 28. These signatures all show 3 major pathways, namely adhesion, migration and secretion. Idiosyncratic and intersection signatures comprising 202 and 153 genes again identified innate immune pathways and cell adhesion. B. The Signature Tracing module revealed the same pathways (adhesion and migration) but, in addition, attributed them to the protein coding genes that arose at d0 and d3. Furthermore, down-regulated T cell-specific functions could be attributed to the d0-derived signature at day 12, and T-helper immune functions to the d3-derived signature at day 28.

Figure S6: MACE- or ATAC-Seq experiments were performed, and the resulting data were analyzed with Bioconductor software to create output Excel files. Various other bioinformatic tools were used to analyze these data (volcano plots with huygens.science.uva.nl; heatmaps with biit.cs.ut.ee/clustvis; pathway analysis with bioinformatics.sdstate.edu). Circos plots display the genome-wide changes identified by RNA-Seq or ATAC-Seq experiments. Finally, all Bioconductor output files were imported into the FileMaker database program, which allowed the creation of novel modules (GUDC, DAGT, DAGE/ST) for refined analyses not provided by the Bioconductor package.
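The idiosyncratic and shared signatures in the Venn diagrams of Figures S4 and S5 correspond to a partition of genes by the exact set of timepoints in which they occur, assuming the diagrams follow the usual convention that each region counts genes present in exactly that combination of timepoints. The minimal sketch below computes this partition; the gene lists in the usage example are toy data, and each resulting gene list would then be submitted to pathway analysis (ShinyGO in this study).

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def venn_regions(signatures: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Partition genes by the exact set of timepoints whose signature contains them.

    Keys with one timepoint hold idiosyncratic genes; larger keys hold shared genes.
    """
    membership: Dict[str, Set[str]] = defaultdict(set)
    for timepoint, genes in signatures.items():
        for gene in genes:
            membership[gene].add(timepoint)
    regions: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for gene, timepoints in membership.items():
        regions[frozenset(timepoints)].add(gene)
    return dict(regions)


if __name__ == "__main__":
    toy = {"d0": {"A", "B", "C"}, "d3": {"B", "C", "D"}, "d12": {"C"}, "d28": {"E"}}
    for key, genes in sorted(venn_regions(toy).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print("+".join(sorted(key)), sorted(genes))
```

Because every gene lands in exactly one region, the region sizes sum to the number of distinct genes across all timepoints, which is a quick consistency check for the numbers printed in a Venn diagram.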