Introduction

Chromosomal rearrangements involving the human MLL gene are recurrently associated with acute leukemias.1, 2 The presence of distinct MLL rearrangements is an independent adverse prognostic factor, while only a few MLL rearrangements are associated with a good or intermediate outcome.3, 4 Recent studies have also shown that monitoring patients during treatment and adjusting therapy on the basis of minimal residual disease (MRD) levels has a very strong impact on outcome.5, 6, 7 For this purpose, we established a diagnostic network that allows different study groups and clinical centers to obtain genomic MLL breakpoint sequences that can be used directly to quantify MRD levels in patients. The current workflow to identify MLL rearrangements combines a prescreening step (cytogenetic analyses,8, 9 split-signal fluorescence in situ hybridization10, 11, 12 or reverse-transcription PCR) with long-distance inverse PCR performed on small amounts (~1 μg) of isolated genomic DNA.13 This allowed us to readily identify reciprocal translocations, complex chromosomal rearrangements, gene-internal duplications, deletions or inversions on chromosome 11q, and insertions of MLL into other chromosomes or, vice versa, insertions of partner chromosome material into the MLL gene located at 11q23.

To gain insight into the frequency of distinct MLL rearrangements, all prescreened samples from infant, pediatric and adult leukemia patients were sent to the Frankfurt Diagnostic Center of Acute Leukemia (DCAL) for analysis. All prescreened MLL rearrangements were successfully analyzed at the Frankfurt DCAL, and patient-specific MLL fusion sequences for MRD monitoring were obtained.

Materials and methods

Patient material

Genomic DNA was isolated from bone marrow and/or peripheral blood samples of leukemia patients and sent to the DCAL (Frankfurt/Main, Germany). Patient samples were obtained from different study groups (the AMLCG study group, Munich; the GMALL study group, Berlin; the Polish Pediatric Leukemia and Lymphoma Study Group, Zabrze; the I-BFM network) and from diagnostic centers in Europe (Aarhus, Berlin, Barcelona, Bordeaux, Bratislava, Brest, Bristol, Catania, Copenhagen, Ekaterinburg, Frankfurt, Giessen, Granada, Graz, Grenoble, Haifa, Hamburg, Hanover, Heidelberg, Jena, Jerusalem, Kiel, Lille, Lisbon, Madrid, Minsk, Montpellier, Monza, Moscow, Munster, Munich, Nancy, Nantes, Newcastle upon Tyne, Olomouc, Padua, Paris, Porto, Prague, Reims, Rotterdam, Strasbourg, Tampere, Tel Hashomer, Toulouse, Turku, Ulm, Valenciennes, Vienna, Zabrze and Zurich) or outside Europe (Adelaide, Boston, Brisbane, Buenos Aires, Hong Kong, Houston, Rio de Janeiro, Seoul, Sydney and Tohoku), where acute leukemia patients are enrolled in local study groups. Informed consent was obtained from all patients or their parents/legal guardians, and from control individuals.

Long distance inverse PCR experiments

All DNA samples were processed and analyzed as described previously.13, 14, 15, 16 Briefly, 1 μg of genomic patient DNA was digested with restriction enzymes and re-ligated to form DNA circles before long-distance inverse PCR analysis. Restriction-polymorphic PCR amplimers were isolated from the gel and sequenced to obtain the patient-specific fusion sequences. This genomic fusion sequence is unique to each leukemia patient and was made available to the sender of the DNA sample. The average processing time was about 5 working days.
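The sequence-analysis step can be pictured as locating the first base at which an amplimer read stops matching the known MLL sequence; everything downstream of that point belongs to the partner. The sketch below is a minimal illustration, assuming error-free reads that start inside MLL and an exact-match anchor. The function name and toy sequences are hypothetical, and real breakpoint analysis also has to handle sequencing errors, microhomologies and non-templated nucleotides at the junction.

```python
from typing import Optional, Tuple


def find_fusion_junction(read: str, mll_ref: str,
                         min_anchor: int = 10) -> Optional[Tuple[int, int]]:
    """Locate where an MLL-anchored read diverges from the MLL reference.

    Returns (read_pos, ref_pos): the first read position that no longer
    matches MLL, and the corresponding MLL reference position.
    Returns None if the read cannot be anchored in MLL, or if no divergence
    is seen before either sequence ends.
    """
    anchor = read[:min_anchor]
    ref_start = mll_ref.find(anchor)  # exact placement of the MLL-derived start of the read
    if len(anchor) < min_anchor or ref_start < 0:
        return None
    i = 0
    # Walk along read and reference in parallel until they disagree.
    while (i < len(read) and ref_start + i < len(mll_ref)
           and read[i] == mll_ref[ref_start + i]):
        i += 1
    if i == len(read) or ref_start + i == len(mll_ref):
        return None  # read ends in MLL, or the supplied reference ran out
    return i, ref_start + i


# Toy example (hypothetical sequences): 16 MLL bases followed by partner bases.
mll_ref = "GATTACAGGCTAGCTAGGCATCGATCGTACG"
read = mll_ref[4:20] + "AAACCCGGGTTT"
junction = find_fusion_junction(read, mll_ref)
print(junction)              # (16, 20)
print(read[junction[0]:])    # AAACCCGGGTTT
```

Returning positions in both the read and the reference keeps the MLL breakpoint coordinate and the partner-derived sequence available separately.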

Data evaluation and statistical analyses

All clinical and experimental patient data were entered into a database program (FileMaker Pro, FileMaker Inc., Santa Clara, CA, USA) for further analysis. Data on individual patients were used to compare the defined subgroups and to test for significant correlations. χ2-tests were performed to identify significant deviations from mean values.
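For the simplest case, a 2×2 table that cross-tabulates a binary patient attribute (for example, gender) against the presence of a given fusion, the χ2 test can be sketched as follows. This is a minimal illustration using Pearson's statistic without continuity correction and the closed-form p-value for one degree of freedom. The counts are hypothetical, and the exact test settings used in the study are not stated in the text.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test of independence for the table [[a, b], [c, d]].

    Rows: e.g. male / female; columns: fusion present / absent.
    Returns (statistic, p-value) with 1 degree of freedom, no continuity correction.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


# Hypothetical counts: 30/100 males vs 10/100 females carry the fusion.
stat, p = chi2_2x2(30, 70, 10, 90)
print(round(stat, 2), p < 0.001)   # 12.5 True
```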

Nomenclature

We are aware of the changes in HUGO gene nomenclature over the past years. However, for readability, we use the following gene nomenclature throughout the text: MLL (KMT2A); AF4 (AFF1); LAF4 (AFF3); AF5 (AFF4); ENL (MLLT1); AF9 (MLLT3); AF6 (MLLT4); AF17 (MLLT6); AF10 (MLLT10); and AF1Q (MLLT11).

Results

The study cohort

To analyze the recombinome of the human MLL gene, 2381 prescreened acute leukemia samples were obtained from the above-mentioned centers between 2003 and 2016. In all cases, we first used PCR combined with sequence analysis to characterize the direct MLL fusion allele; if this failed or the rearrangement was complex, the reciprocal MLL fusion allele was analyzed. Analysis was successful for all patient samples. In 31 cases, we were only able to characterize the reciprocal MLL fusion allele, which still enabled subsequent MRD experiments. Complete data (gender, age at diagnosis, disease type and subtype, and de novo or secondary leukemia) were available for 2345/2381 cases. Genetic and clinical information on these 2345 patients is summarized in Table 1. The 36 excluded cases had the following MLL rearrangements: 9 × MLL–AF9; 8 × MLL–AF4; 4 × MLL–ENL; 4 × MLL–AF10; 3 × MLL–AF6; 2 × MLL–AF17; 2 × MLL–EPS15; 1 × MLL–GAS7; 1 × MLL–LOC100128568; 1 × MLL–CREBBP; and 1 × MLL–PTD. The exclusion of these 36 patients did not affect the general conclusions of this study.
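The cohort bookkeeping in this paragraph can be checked with a few lines of arithmetic. The counts below are copied from the text, and the script only confirms that the listed exclusions account for the difference between received and fully annotated cases.

```python
# Rearrangements of the 36 cases excluded for incomplete clinical data (from the text).
excluded = {
    "MLL-AF9": 9, "MLL-AF4": 8, "MLL-ENL": 4, "MLL-AF10": 4, "MLL-AF6": 3,
    "MLL-AF17": 2, "MLL-EPS15": 2, "MLL-GAS7": 1, "MLL-LOC100128568": 1,
    "MLL-CREBBP": 1, "MLL-PTD": 1,
}
received = 2381                      # prescreened samples, 2003-2016
n_excluded = sum(excluded.values())  # should equal the 36 excluded cases
analyzed = received - n_excluded     # cases with complete data (Table 1)
print(n_excluded, analyzed)          # 36 2345
```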

Table 1 Overview of all investigated TPGs

Age distribution according to clinical subtypes

We first analyzed our cohort according to age at diagnosis. As displayed in Figure 1, the age distribution is similar to the distributions known from other cancer registries. The incidence of MLL-r acute lymphoblastic leukemia (ALL) peaks in the first 2 years, declines during childhood and young adulthood, and then increases steadily with age. A similar picture was observed for MLL-r acute myeloid leukemia (AML) patients, but without the postnatal peak seen in infant ALL. For the purpose of our study, we separated our cohort into an ‘infant acute leukemia group’ (0.03–12 months; n=876: 692 ALL, 160 AML and 24 NA (not annotated)), a ‘pediatric acute leukemia group’ (>12 months–18 years; n=671: 313 ALL, 339 AML and 19 NA) and an ‘adult acute leukemia group’ (>18 years; n=798: 415 ALL, 373 AML and 10 NA). As shown in Figure 1, we also added information about therapy-induced leukemia (n=110). Fifty-three patients could not be categorized as ‘ALL’ or ‘AML’, either because they had other diagnoses (mixed lineage leukemia (MLL)=38, myelodysplastic syndrome=7 and lymphoma=4) or because no information was available from the corresponding center (unknown disease type=4).
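The age classes and subgroup sizes above can be expressed as a small helper plus a consistency check. This sketch assumes that age is given in years, that exactly 12 months (1.0 year) counts as infant and that exactly 18 years counts as pediatric, following the ‘0.03–12 months’, ‘>12 months–18 years’ and ‘>18 years’ boundaries. The function name is hypothetical.

```python
def age_group(age_years: float) -> str:
    """Map age at diagnosis (years) to the study's three age classes."""
    if age_years <= 1.0:      # 0.03-12 months
        return "infant"
    if age_years <= 18.0:     # >12 months-18 years
        return "pediatric"
    return "adult"            # >18 years


# Subgroup sizes from the text (NA = not annotated as ALL or AML).
cohort = {
    "infant":    {"ALL": 692, "AML": 160, "NA": 24},
    "pediatric": {"ALL": 313, "AML": 339, "NA": 19},
    "adult":     {"ALL": 415, "AML": 373, "NA": 10},
}
group_sizes = {g: sum(c.values()) for g, c in cohort.items()}
per_type = {t: sum(c[t] for c in cohort.values()) for t in ("ALL", "AML", "NA")}
print(group_sizes)   # {'infant': 876, 'pediatric': 671, 'adult': 798}
print(per_type)      # {'ALL': 1420, 'AML': 872, 'NA': 53}
```

The per-type totals reproduce the ALL (n=1420) and AML (n=872) denominators used for the fusion-partner percentages, and the 53 NA cases match the unclassified patients.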

Figure 1

Age distribution of investigated patients. The age distribution of all analyzed patients (n=2345) is summarized. Upper part: diagram of ALL and AML patients. Age at diagnosis was divided into infants (0–1 year), pediatric (1–18 years) and adult patients (>18 years). The numbers of ALL, AML and other patients are listed below. For each age group, we also added information on therapy-induced leukemia (TIL) patients and the number of complex MLL rearrangements (CL), and specified the ‘non-ALL’ and ‘non-AML’ patients (MLL, MDS, lymphoma and other) in more detail. The precise numbers of all patient cases are summarized on the right.

Identification of MLL rearrangements and their distribution in clinical subgroups

The most frequent MLL rearrangements in these six subgroups are summarized in Figure 2. Infant ALL (n=692) displayed 338 AF4, 113 AF9, 154 ENL, 39 AF10, 1 AF6 and 16 EPS15 gene fusions. An additional 31 MLL rearrangements were identified (9p13.3, 11q23.3, ACER, AF1Q, LAF4, AF5, BTBD18, CLTA, DCP1A, EEFSEC, 14 cases (NA) with no der(11) and only a reciprocal fusion allele, PICALM, PRPF19 and TNRC18).

Figure 2

Classification of patients according to age class and disease type. Top: frequencies of the most frequent TPGs in the investigated cohort of MLL-r acute leukemia patients (n=2345), divided into ALL (left) and AML patients (right). Gene names are written in black; percentages are indicated as white numbers. Fifty-three patients could not be classified as either ALL or AML. Middle: TPG frequencies for the infant, pediatric and adult patient groups. Bottom: subdivision of all three age groups into ALL and AML patients. Negative numbers again refer to the number of patients classified as neither ‘ALL’ nor ‘AML’.

Infant AML (n=160) was represented by 4 AF4, 40 AF9, 2 ENL, 43 AF10, 24 ELL, 2 AF6 and 1 EPS15 gene fusions. Another 44 MLL rearrangements were characterized (11q24, ABI1, ABI2, AF1Q, FLNA, FNBP1, GAS7, KIAA1524, MYO1F, 14 cases (NA) with no der(11) and only a reciprocal fusion allele, NEBL, NRIP3, PICALM, SEPT5, SEPT6 and SEPT9).

Pediatric ALL (n=313) displayed 139 AF4, 56 AF9, 56 ENL, 12 AF10, 16 AF6 and 6 EPS15 gene fusions. Another 31 MLL rearrangements were characterized (21q22, AF17, LAF4, BCL9L, CBL, FOXO3A, MAML2, MKL1, 5 cases (NA) with no der(11) and only a reciprocal fusion allele, NUP153, PFDN4, PICALM, RUNDC3B, SEPT5, SEPT11, TET1 and TNRC18).

Pediatric AML (n=339) displayed 3 AF4, 132 AF9, 21 ENL, 66 AF10, 24 ELL, 6 MLL PTDs, 28 AF6 and 5 EPS15 gene fusions. Another 54 MLL rearrangements were identified (6q27, 11q23.3, ABI1, ACACA, ACTN4, AF1Q, AF17, ARHGAP26, ARHGEF17, BUD13, CLTC, DCP1A, FLNA, KLN1, LAMC3, an MLL gene-internal deletion, MYO1F, 5 cases (NA) with no der(11) and only a reciprocal fusion allele, NEBL, SEPT2, SEPT5, SEPT6, SEPT9, SEPT11, TET1 and VAV1).

Adult ALL (n=415) displayed 332 AF4, 9 AF9, 50 ENL, 1 AF10, 1 ELL, 1 MLL PTD, 9 AF6 and 4 EPS15 gene fusions. An additional 8 MLL rearrangements were identified (11q23, ACTN4, CEP164, KIF2A, MAML2, PRRC1, PTD and TET1).

Adult AML (n=373) displayed 3 AF4, 90 AF9, 14 ENL, 33 AF10, 45 ELL, 98 MLL PTDs, 38 AF6 and 5 EPS15 gene fusions. Another 47 MLL rearrangements were detected (1p13.1, 11q23, AF1Q, AKAP13, AP2S2, ARHGEF12, C2CD3, CASP8AP2, CBL, CEP170B, DCPS, GMPS, ME2, AF17, MYH11, 2 cases (NA) with no der(11) and only a reciprocal fusion allele, NOX4, PDS5A, PICALM, SEPT5, SEPT6, SEPT9, SMAP1, TCF12, TET1 and TOP3A).

On the basis of the above distribution, about 96% of all ALL patients (n=1420) carried one of the fusion genes MLL–AF4 (~57%), MLL–ENL (~18%), MLL–AF9 (~13%), MLL–AF10 (~4%), MLL–EPS15 (~2%) and MLL–AF6 (~2%). About 83% of all AML patients (n=872) carried one of the fusion genes MLL–AF9 (~30%), MLL–AF10 (~16%), MLL–ELL (~11%), MLL PTDs (~12%), MLL–AF6 (~8%), MLL–ENL (~4%) and MLL–SEPT6 (~2%). These results are in line with recently published data on the frequency and distribution of different MLL fusion partner genes.16, 17, 18 This updated information is important for diagnosis and has already been used to establish a fast reverse-transcription PCR-based multiplex screening method.19 Additional information on the distribution of translocation partner genes (TPGs) in major disease subgroups (B-cell developmental stages and T-ALL for ALL patients; French–American–British (FAB) classes M0–M7 for AML patients) is summarized in Supplementary Figures S1 and S2. The FAB classes of AML show distinct patterns of fusion genes, with some of the major fusion partners absent from individual FAB classes. For example, AF4 and ELL fusions are absent in FAB M0, ENL fusions in FAB M1, and AF4, AF10 and AF6 fusions in FAB M2. In FAB classes M6 and M7, only a few distinct fusion genes were identified. In the B-ALL stages, no such exclusion patterns were observed, but rather shifts in specific fusion genes, while T-ALL is mainly composed of ENL and AF6 gene fusions (see also below).
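The overall percentages can be reproduced from the per-subgroup counts listed in the preceding paragraphs. This sketch sums infant, pediatric and adult counts per partner gene and divides by the ALL (n=1420) or AML (n=872) totals. MLL–SEPT6 is omitted because its counts are not listed above, and infant AML is entered as 0 MLL PTDs because none is listed.

```python
# (infant, pediatric, adult) counts per partner gene, taken from the text above.
all_counts = {
    "AF4": (338, 139, 332), "ENL": (154, 56, 50), "AF9": (113, 56, 9),
    "AF10": (39, 12, 1), "EPS15": (16, 6, 4), "AF6": (1, 16, 9),
}
aml_counts = {
    "AF9": (40, 132, 90), "AF10": (43, 66, 33), "ELL": (24, 24, 45),
    "PTD": (0, 6, 98), "AF6": (2, 28, 38), "ENL": (2, 21, 14),
}


def shares(counts: dict, n_total: int) -> dict:
    """Percentage of patients per partner gene, pooled over all age groups."""
    return {gene: round(100 * sum(c) / n_total, 1) for gene, c in counts.items()}


print(shares(all_counts, 1420))  # AF4 57.0, ENL 18.3, AF9 12.5, AF10 3.7, EPS15 1.8, AF6 1.8
print(shares(aml_counts, 872))   # AF9 30.0, AF10 16.3, ELL 10.7, PTD 11.9, AF6 7.8, ENL 4.2
```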

Breakpoint distribution according to clinical subtypes

We also investigated the distribution of chromosomal breakpoints within the MLL breakpoint cluster region in all clinical subgroups. The major breakpoint cluster region is located between MLL exon 9 and MLL intron 11, where the majority of patients (93.5%, n=2192) had their individual breakpoints. Only 153 patients (6.5%) had breakpoints outside the major breakpoint cluster region (Supplementary Figures S3–S5 and Supplementary Table S6).

Because the localization of breakpoints may affect cancer biology and clinical behavior, we analyzed the breakpoint distribution for all clinical subgroups and compared it with the ‘mean distribution’ (MD) observed for all 2345 patients. We decided not to use a ‘random distribution model’ of chromosomal breakpoints, because such a model is based only on the length of each DNA region. Specific features of MLL intron 9 (four Alu repetitive elements, three of which are transcriptionally active) and of MLL intron 11 (sensitivity to cytotoxic drugs, a DNase I hypersensitive site,20 an apoptotic cleavage site,21 an RNA polymerase II-binding site22 and topoisomerase II-binding sites23) may increase the rate of DNA double-strand breaks in these regions because of specific molecular features of the chromatin. Alternatively, breakpoint distributions may differ because of selection for the resulting MLL fusion proteins.

For our analyses, we subdivided the MLL breakpoint cluster region into three subregions: (A) exon 9–intron 9 = 1761 bp; (B) exon 10–intron 10 = 679 bp; (C) exon 11–intron 11–exon 12–intron 12–exon 13 = 5026 bp. The observed ‘MD’ for these three MLL breakpoint subregions was A=37.0%, B=19.8% and C=40.1% for all 2345 patients listed in Supplementary Table S7. In these analyses, the fusion partner gene of each patient was correlated with age at diagnosis, gender, patient group, therapy-induced leukemia, complex genetic rearrangements, origin of the patient and breakpoint distribution. A significant deviation from the ‘MD’ was observed for AF1Q, AF6, AF10, ENL, EPS15, SEPT6, SEPT9, AF17 and MLL PTDs. The fusion partner genes ENL and SEPT6 were preferentially associated with MLL intron 11 breaks, while all others tended to carry MLL intron 9 recombination events. Of interest, therapy-induced acute leukemias also differed significantly from the ‘MD’, with a tendency toward MLL intron 11 breaks.
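The contrast between a length-based ‘random distribution model’ and the observed MD, and a goodness-of-fit test of a subgroup against the MD, can be sketched as below. The subregion lengths and MD values are those given above. The ‘other’ category (breaks outside subregions A–C) is an assumption introduced here so that the expected fractions sum to 1, and the subgroup counts are hypothetical. The p-value uses a standard series for the regularized lower incomplete gamma function rather than any specific statistics package.

```python
import math

# Subregion lengths (bp) of the MLL breakpoint cluster region, as defined in the text.
lengths = {"A": 1761, "B": 679, "C": 5026}
total_bp = sum(lengths.values())                               # 7466 bp
random_model = {k: v / total_bp for k, v in lengths.items()}   # ~A 0.236, B 0.091, C 0.673

# Observed mean distribution (MD) for all 2345 patients; "other" is the remainder (assumption).
md = {"A": 0.370, "B": 0.198, "C": 0.401}
md["other"] = 1.0 - sum(md.values())                           # ~0.031


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-squared probability via the lower incomplete gamma series."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:          # sum of z**n / (a (a+1) ... (a+n))
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def gof_vs_md(observed: dict, expected_fraction: dict):
    """Pearson goodness-of-fit of observed breakpoint counts against expected fractions."""
    n = sum(observed.values())
    stat = sum((observed[k] - n * p) ** 2 / (n * p) for k, p in expected_fraction.items())
    return stat, chi2_sf(stat, len(expected_fraction) - 1)


# Hypothetical subgroup of 100 patients with an excess of subregion A (intron 9) breaks.
subgroup = {"A": 60, "B": 15, "C": 22, "other": 3}
stat, p = gof_vs_md(subgroup, md)
print(round(stat, 1), p < 0.001)   # 23.6 True
```

Under the length-proportional model, subregion C would be expected to carry about two-thirds of all breaks, whereas the observed MD places only about 40% there. This gap illustrates why the MD, rather than the random model, serves as the reference.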

A detailed analysis of the most frequent MLL fusion partner genes is given in Supplementary Table S8. Here, patients were separated by fusion partner and patient subgroup (infant I, pediatric P and adult A) and analyzed with regard to several other parameters such as age, gender, therapy-induced leukemia, complex translocation, origin and disease type. The overall breakpoint distribution of the seven most frequent genetic aberrations (together more than 2000 patients) did not deviate significantly from the MD of all patients. However, significant deviations were observed for patient subgroups bearing chromosomal translocations to AF4 (I and A), AF9 (A), ENL (I and P), AF10 (P), ELL (I, P and A) and AF6 (I, P and A), and for MLL PTD patients (P and A). This demonstrates that certain fusion genes have a selective preference for distinct breakpoints, most likely because of the function of the resulting fusion protein. For example, AF6 fusions in ALL and AML patients occur mostly in MLL intron 9 (or even upstream), while infant AF4 and infant/pediatric ENL fusions tend to have breakpoints within MLL intron 11. Similar observations were made for the rarer fusion partners AF1Q (significantly toward MLL intron 9), AF17 (significantly toward MLL introns 8 and 9), EPS15 (significantly toward MLL intron 11 in adult patients), SEPT6 (significantly toward MLL intron 11 in pediatric and adult patients) and SEPT9 (significantly toward MLL introns 7–9).

To evaluate these data further, we correlated the breakpoint distribution with patient age for ALL and AML patients (Supplementary Figures S9 and S10). These analyses revealed that disease subtype divides patients into two groups (ALL with more MLL intron 11 breakpoints; AML with more MLL intron 9 breakpoints). However, these breakpoint tendencies change with age. Young ALL patients usually display MLL intron 11 breakage, while older ALL patients have more breaks in MLL intron 9. This holds for all investigated subgroups (AF4, ENL and AF9) with enough patients to perform this type of analysis and obtain a significant result. Conversely, young AML patients usually display MLL intron 9 breakage, while older AML patients have more breaks in MLL intron 11; this was also analyzed for the AF9, AF10 and ELL subgroups. The cross-over points were 10–14 years in ALL patients and 75 years in AML patients. MLL–ELL patients within the AML group differ from all other subgroups, because they show a preference for MLL intron 11 from very early on (all patients above 1 year of age) and display no cross-over point. These breakpoint preferences and their change with age suggest that two different molecular mechanisms drive MLL rearrangements: one is a transcription-coupled hot spot that is highly sensitive to external cytotoxic triggers (MLL intron 11), while the other is presumably based on transcriptionally active Alu repeats, where Pol III and Pol II transcription causes torsional DNA stress.
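The cross-over point can be read as the age at which MLL intron 9 and intron 11 breaks are equally frequent. A minimal sketch follows, assuming age-binned shares of intron 11 breaks among intron 9 plus intron 11 breaks and linear interpolation between bins. The series below are hypothetical and do not reproduce Supplementary Figures S9 and S10, whose exact estimation procedure is not described here.

```python
from typing import List, Optional, Tuple


def crossover_age(points: List[Tuple[float, float]]) -> Optional[float]:
    """Age at which the intron 11 share of breaks first crosses 50%.

    points: (age in years, share of intron 11 among intron 9 + intron 11 breaks),
    sorted by age. Uses linear interpolation between the two bracketing bins.
    Returns None if the share never crosses 50%.
    """
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if s0 != s1 and (s0 - 0.5) * (s1 - 0.5) <= 0:
            return a0 + (0.5 - s0) * (a1 - a0) / (s1 - s0)
    return None


# Hypothetical ALL-like series: intron 11 share falls with age.
all_like = [(0.5, 0.80), (5.0, 0.65), (12.0, 0.48), (30.0, 0.35), (60.0, 0.30)]
# Hypothetical ELL-like series: intron 11 preferred at all ages, no cross-over.
ell_like = [(1.0, 0.60), (40.0, 0.70), (70.0, 0.80)]
print(round(crossover_age(all_like), 1))  # 11.2
print(crossover_age(ell_like))            # None
```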

Another important point is that patient outcome is linked to the distribution of chromosomal breakpoints within the MLL breakpoint cluster region.24 The outcome of leukemia patients with breakpoints in MLL intron 11 was worse than that of patients with upstream breakpoints. A rational explanation for this observation came from functional studies of the plant homeodomain (PHD) region of the MLL protein, encoded by MLL exons 11–16 (Supplementary Figure S11). This region consists of PHD1, PHD2 and PHD3, which are separated from an enhanced PHD4 by the adjacent bromodomain. PHD3 has an important dual function, because it binds either to the CYP33/PPIE protein25, 26 or to methylated lysine-4 residues of histone H3.27 Binding of PHD3 to H3K4me2/3 peptides is greatly enhanced by the adjacent bromodomain,28 whereas binding of the peptidyl-prolyl isomerase CYP33/PPIE confers cis–trans isomerization of proline-1665. This enables binding to BMI1 and associated repressor proteins (HDAC/CBX4/KDM5B). This molecular switch determines whether the human MLL protein acts as a transcriptional activator/maintenance factor or as a transcriptional repressor. Notably, PHD2 and PHD3 also bind to E3 ligases (CDC34 and ASB2, respectively), which control the steady-state stability of the MLL protein.29, 30 As shown recently by our group, breakpoints within MLL intron 11 destroy the dimerization capacity of the PHD1–3 domain.31 This also disables binding to the BMI1 repressor complex.32 Thus, a breakpoint within MLL intron 11 has functional consequences for the resulting fusion proteins, which may explain the altered clinical outcome.24

The MLL recombinome

On the basis of the results obtained in the present and previous studies,13, 14, 15, 16 a total of 84 direct TPGs and their specific breakpoint regions have now been identified, all of which generate an in-frame MLL fusion protein (Table 2A). An additional 10 MLL fusions were characterized in which MLL was fused out of frame to the TPG (Table 2B). In the latter cases, alternative splicing may still allow an MLL fusion protein to be generated; however, this was not investigated here. Another 6 loci were cloned in which the 5′ portion of MLL was not fused to another gene (Table 2C). 3′-RACE and reverse-transcription PCR experiments with several exon combinations were performed to identify potential fusion transcripts, but no in-frame fusion RNAs could be identified. Therefore, these 16 unusual MLL rearrangements, in which neither a dimerization nor a transcriptional activation domain is present, probably represent a subclass of MLL abnormalities that per se have no or only a weak ability to transform hematopoietic cells and are identified only in the context of other genetic abnormalities in hematopoietic stem cells.33, 34
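Whether a genomic MLL fusion is in frame can be decided from exon phases. After splicing, the last retained MLL exon is joined to the first retained partner exon, and the fusion preserves the partner's reading frame only if both sides sit at the same codon position. The sketch assumes simple exon-to-exon splicing with no alternative splicing (which was not investigated here) and uses hypothetical coding lengths rather than real MLL or partner exon coordinates.

```python
def codon_phase(coding_bp: int) -> int:
    """Codon position (0, 1 or 2) reached after coding_bp nucleotides of CDS."""
    return coding_bp % 3


def is_in_frame(mll_coding_bp: int, partner_skipped_bp: int) -> bool:
    """True if MLL and partner reading frames line up across the splice junction.

    mll_coding_bp: coding nucleotides of MLL up to and including the last retained exon.
    partner_skipped_bp: coding nucleotides of the partner upstream of its first retained exon.
    """
    return codon_phase(mll_coding_bp) == codon_phase(partner_skipped_bp)


# Hypothetical coding lengths, for illustration only.
print(is_in_frame(4200, 1350))  # True  (both at phase 0)
print(is_in_frame(4201, 1350))  # False (phase 1 vs phase 0)
```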

Table 2 Overview of the MLL recombinome 2017

In 31 additional cases, we were not able to identify a der(11) fusion gene at all. This could be attributed either to a technical problem (for example, a genomic fragment too long to amplify) or to the absence of a der(11) in these few patients. In 22/31 cases, we successfully identified a reciprocal MLL fusion allele, while 9 cases carried no detectable direct or reciprocal fusion gene. This subgroup (n=31) was included in the group of ‘complex MLL rearrangements’ (n=247), thereby extending the class of ‘reciprocal MLL fusion genes’ (Supplementary Table S12). Within this group of patients with complex MLL rearrangements, a total of 32 reciprocal MLL fusions are in-frame fusions (marked in red in Supplementary Table S12), while 215 are out-of-frame fusions at the genomic DNA level (88 gene loci/127 partner genes; shown in black in Supplementary Table S12). Finally, 35 chromosomal translocations of the human MLL gene have been characterized in the past by cytogenetic methods but were never analyzed at the molecular level (Table 2D). Thus, the MLL recombinome to date comprises 94 different ‘direct TPGs’ (retaining the MLL N terminus) and 6 different loci to which 5′-MLL is fused only to genomic DNA. In addition, there are now 247 ‘reciprocal TPGs’ (bearing the MLL C terminus) that derive from complex rearrangements with already-known ‘direct TPGs’. Notably, nearly all of these reciprocal MLL fusions (243 out of 247) are per se able to express only the 3′-MLL portion, named MLL*, owing to a gene-internal promoter located upstream of MLL exon 12.22
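The category counts in this and the preceding paragraph fit together arithmetically. The sketch below only re-adds the numbers stated in the text (Tables 2A–C and Supplementary Table S12) and computes the in-frame share of reciprocal fusions.

```python
# Category sizes of the MLL recombinome as stated in the text.
direct_in_frame = 84        # Table 2A: in-frame direct TPGs
direct_out_of_frame = 10    # Table 2B: out-of-frame direct fusions
mll_to_locus = 6            # Table 2C: 5'-MLL joined to genomic DNA without a partner gene
reciprocal_in_frame = 32
reciprocal_out_of_frame = 88 + 127   # gene loci + partner genes

direct_tpgs = direct_in_frame + direct_out_of_frame          # "94 direct TPGs"
unusual = direct_out_of_frame + mll_to_locus                  # "16 unusual rearrangements"
reciprocal_total = reciprocal_in_frame + reciprocal_out_of_frame
in_frame_pct = round(100 * reciprocal_in_frame / reciprocal_total, 1)
print(direct_tpgs, unusual, reciprocal_total, in_frame_pct)  # 94 16 247 13.0
```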

Novel translocation partner genes

Apart from the many new MLL fusion genes that have been discovered at the DCAL and published in the last decade (Tables 2A–C; n=40), we present 11 additional novel TPGs (marked as ‘not published yet’): GIGYF2 (GRB10-interacting GYF protein 2; 2q37.1; 1299 amino acids (aa)); KIF2A (kinesin heavy chain member 2A; 5q12.1; 706 aa); CLIP2 (CAP-GLY domain-containing linker protein 2; 7q11.23; 1046 aa); CLTA (clathrin, light chain A; 9p13.3; 248 aa); RABGAP1 (RAB GTPase-activating protein 1; 9q33.2; 1069 aa); TCF12 (transcription factor 12; 15q21.3; 682 aa); CLTC (clathrin, heavy chain; 17q23.1; 1675 aa); PFDN4 (prefoldin subunit 4; 20q13.2; 134 aa); MKL1 (megakaryoblastic leukemia (translocation) 1; 22q13.1; 931 aa); NUP153 (nucleoporin 153 kDa; 6p22.3; 1475 aa); and NOX4 (NADPH oxidase 4; 11q14.3; 578 aa).

The Drosophila ortholog of GIGYF2 was shown to be a modulator of autophagy that controls neuron and muscle homeostasis.35 GIGYF2 binds directly to AGO2 and is involved in siRNA-mediated post-transcriptional silencing.36 A specific feature of GIGYF2 is that it forms a complex with eIF4E and ZNF598 to selectively block the translation of distinct capped mRNAs.37 Several papers have also linked GIGYF2 to Parkinson’s disease; however, these data are so far not conclusive.

KIF2A is a member of the kinesin-13 family and is involved in spindle assembly at the metaphase I–anaphase I transition in oocytes.38, 39 Moreover, mutations in the motor domain of this protein are associated with cortical malformation syndromes such as microcephaly.40 Conversely, overexpression of KIF2A has been found in different cancers, because KIF2A expression and phosphorylation influence microtubule dynamics, which are important for tumor cell migration and metastasis.41

CLIP2 was identified as an overexpressed biomarker of radiation exposure in papillary thyroid carcinomas, usually accompanied by a gain of chromosome band 7q11.42 Papillary thyroid carcinoma was frequently diagnosed as a consequence of the Chernobyl accident.

CLTA (clathrin light chain A) is involved in vesicle trafficking and endocytosis. In addition, a recent paper demonstrated that CLTA has a role in tumor cell migration.43 This is in part because CLTA interacts with Huntingtin-interacting protein, which is involved in the regulation of the actin cytoskeleton. Upon depletion of clathrin light chains, a steady-state downregulation of β1-integrins was observed because of defects in vesicle recycling.

RABGAP1 is highly specific for RAB6A and has a role in microtubule nucleation at the centrosome. It also participates in a RAB6A-mediated pathway involved in the metaphase–anaphase transition (Mad2-spindle checkpoint).44

TCF12, also known as HeLa E-box binding protein, is of interest because it controls the osteogenic differentiation of mesenchymal stem cells in the bone marrow.45 This basic helix–loop–helix transcription factor was also found to be mutated in anaplastic oligodendroglioma.46 TCF12 binds TWIST1 and is involved in the early differentiation steps of thymic T cells (DN3 to DN4 and ISP to DP).47

CLTC has been identified in complex chromosomal rearrangements leading to expression of a CLTC–ALK fusion in blastic plasmacytoid dendritic cell neoplasm.48 More recent work found that CLTC mutations are associated with neuronal malformations and intellectual developmental delay.49 This is due to the involvement of clathrin heavy and light chains (CLTC and CLTA) in vesicle trafficking, vesicle recycling and neurotransmitter release.

PFDN4 has been linked to colorectal cancer, where low expression is associated with a poorer outcome. Knockdown of this gene was correlated with increased cell growth and invasiveness.50

MKL1 interacts with the transcription factor myocardin, a key regulator of smooth muscle cell differentiation. The encoded protein is predominantly nuclear and may help transduce signals from the cytoskeleton to the nucleus. MKL1 is involved in a specific translocation, t(1;22)(p13;q13), that fuses it to the RNA-binding motif protein 15 gene (RBM15); this translocation has been associated with the development of acute megakaryocytic leukemia.51

NUP153 is a highly versatile protein involved in nuclear pore function and architecture, nuclear import and export, and de novo pore formation after mitosis; it is also degraded during apoptosis.52

NOX4 (NADPH oxidase 4) is important in the regulation of glycolysis and glutamate metabolism. Disruption of NOX4 by CRISPR/Cas9 inhibits the growth of HeLa cells, indicating that NOX4 is an important metabolic regulator in tumor cells.53 NOX4 has been implicated in many tumor types.

T-ALL cases

A small fraction of the investigated patients were diagnosed with T-ALL (n=59) (Supplementary Figure S13). This group of patients is mainly characterized by MLL fusions with AF6 (n=23) and ENL (n=22). Other fusions involved AF4, AF9, AF10, MAML2, PFDN4, RUNDC3B, SEPT5, SEPT11 and TNRC18, plus 1 reciprocal USP20–MLL fusion. Unusual MLL breakpoints (n=4) were observed only in the cohort of MLL–AF6 patients, where the chromosomal breakpoint was located within MLL introns 21 and 23. This is important because such far-downstream breakpoints incorporate the complete PHD1–3, the bromodomain and the complete enhanced PHD4 domain of MLL into the fusion protein with AF6 (Supplementary Figure S11). These additional 581 amino acids may point to an important role of these MLL domains in T-ALL. The PHD1–3 and bromodomain exert important regulatory functions on the MLL N terminus, such as chromatin reading, control of protein stability and CYP33 binding. In the latter case, recruitment of the BMI1 repressor complex could reverse the function of the MLL–AF6 fusion by repressing gene transcription. This opens a new line of research on MLL–AF6.

Therapy-induced leukemia cases

We also investigated the therapy-induced leukemia cases (n=110; Supplementary Table S14). The dominant partner genes are AF9 (n=41), ELL (n=11), AF4 (n=11) and ENL (n=10). All other fusions (n=23) were each identified one to four times. Surprisingly, the AF9 cases showed a shift from MLL intron 9 breaks to MLL intron 11 breaks. Some MLL fusions were found only in therapy-induced acute leukemia and not in patients with de novo disease: ACTN4; AKAP13; ARHGEF12; FOXO3A; GMPS; LAMC3; ME2; PDS5A; PRRC1; and TCF12. As expected, therapy-induced acute leukemias were diagnosed only in pediatric and adult patients, not in infants.

Spliced fusions

Spliced fusions are rare events, except for ENL fusions (n=302). In the latter, about 50% of all breakpoints are located far upstream of ENL exon 1 (n=153; Supplementary Table S15). In these cases, no reciprocal fusion protein can be made, only an MLL–ENL fusion transcript.54 A similar scenario was found for other partner genes: a 3′-truncated MLL gene was recombined upstream of PRPF19 (1 out of 1 case), ELL (8 out of 97 cases), MYO1F (1 out of 4 cases), EPS15 (9 out of 38 cases), AF4 (1 out of 839 cases), AF6 (3 out of 95 cases) and AF9 (2 out of 449 cases). In total, 180 cases with this unusual feature were identified.

Discussion

Here we present an update of the ‘MLL recombinome’ associated with different hematologic malignancies, in particular acute leukemia (ALL and AML). All analyses were performed using small amounts of genomic DNA isolated from bone marrow or peripheral blood samples (n=2345) of leukemia patients. In some cases, we analyzed cDNA from a given patient to validate the presence of spliced MLL fusions or to investigate alternative splicing of RNA transcribed from the investigated MLL fusion genes. The results of this study allow several conclusions to be drawn.

The applied long-distance inverse PCR technique allowed the identification of direct and reciprocal MLL fusions, MLL gene-internal duplications, chromosome 11 inversions, chromosome 11 deletions and insertions of chromosome 11 material into other chromosomes or, vice versa, insertions of material from other chromosomes into the MLL gene. Notably, no other technique, including next-generation sequencing, has so far achieved such a high identification rate of chromosomal fusion sites. Even paired-end mRNA analysis by next-generation sequencing has a discovery rate of only 60–70%, and RNA-based technologies do not provide the patient-specific chromosomal fusion sequences needed for MRD analyses. Thus, this ‘old-fashioned’ method is still state of the art and will continue to be used to gain additional information on the MLL recombinome.

Moreover, we extended our knowledge of complex MLL rearrangements (n=247), leading to a large collection of reciprocal MLL fusions (Supplementary Table S12). About 13% represent in-frame fusions that can be readily expressed as reciprocal fusion proteins. All others represent out-of-frame fusions with either a chromosomal locus or a reciprocal TPG. Out-of-frame fusions such as IKZF1–MLL, PBX1–MLL or JAK1–MLL most likely represent situations in which these TPGs are destroyed, creating a typical loss-of-function situation. However, even such out-of-frame MLL fusions still allow the transcription and expression of a 5′-truncated MLL protein, termed MLL*.22 This shorter version of MLL cannot bind Menin, LEDGF or MYB, but still carries all enzymatic functions necessary for H4K16 acetylation by the associated MOF protein and H3K4 methylation by the SET domain complex.

The analysis of 2345 MLL fusion alleles led to the discovery of 51 novel TPGs in the past 12 years, of which 40 have already been described (Tables 2A–C). Eleven TPGs are completely new and have not been published yet. Together with 49 MLL fusions described by others, a total of 94 MLL fusions have now been characterized at the molecular level, in addition to 6 MLL translocations to different genetic loci (with no obvious gene fusion). These MLL fusions provide a rich source for future analyses of oncogenic MLL protein variants. We have summarized all currently known MLL fusion partner genes in Figure 3, according to the disease type/subtype in which they were diagnosed.

Figure 3

Classification of all currently known fusion partner genes by disease. All identified TPGs were grouped by the disease type in which they were diagnosed: ALL, t-ALL, t-AML, AML, T-ALL, MLL, bilineal acute leukemia (BAL), MDS, t-MDS, chronic myelogenous leukemia (CML), t-CML, juvenile myelomonocytic leukemia (JMML) and lymphoma. Genes in an intersection belong to two different groups. TPGs marked in bold are the most frequent ones.

According to our data, the 7 most common MLL rearrangements are distributed significantly differently among the cohorts of infant, pediatric and adult leukemia patients. We also observed significant associations of individual fusion genes with gender or age at diagnosis. For example, MLL–AF10 fusions occurred more frequently in male patients (P=0.0024), whereas females were more frequently affected by MLL–AF4 fusions (P=0.00576). The most striking finding was that breakpoint distributions differ significantly between distinct TPGs and age groups. It is well known that breakpoints in infants occur more frequently in MLL intron 11. We validated this finding for infants with MLL–AF4 fusions, infant/pediatric patients with MLL–ENL fusions and pediatric/adult patients with MLL–ELL fusions. However, we observed the opposite situation in adult patients with MLL–AF9 fusions and in pediatric patients with MLL–AF10 fusions. Surprisingly, the breakpoint distribution of MLL–AF6 fusions displayed a clear preference for recombination within MLL intron 9. Again, these deviations from the observed mean breakpoint distribution (MD) argue for differences in the biology of the resulting fusion proteins with respect to oligomerization or factor-binding dependency. Alternatively, they may reflect differences in the biology of the transformed cell types, or different circumstances at the onset of these translocations (in utero exposure to toxic agents vs postnatal acquisition).
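The text does not state which statistical test produced the gender-related P-values. A common choice for a 2×2 table (for example, one fusion type vs all other fusions, cross-tabulated against male vs female patients) is Fisher's exact test. The minimal Python sketch below, using only the standard library, illustrates that test under this assumption; it is not a reproduction of the authors' analysis, and the function name and table layout are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = fusion of interest / all other fusions,
    columns = male / female patients.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def p_table(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = p_table(a)
    # Range of top-left values compatible with the fixed margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [p_table(x) for x in range(lo, hi + 1)]
    # Sum all tables no more likely than the observed one; the small
    # relative tolerance keeps exact ties despite floating-point rounding.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

As a check on a toy table unrelated to this study, `fisher_exact_two_sided(1, 9, 11, 3)` returns approximately 0.0028. Real counts per fusion and sex would have to be taken from the study's supplementary data.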

An important translational aspect of this study is the establishment of patient-specific DNA sequences that can be used for monitoring MRD by quantitative PCR techniques. Because a given MLL fusion allele is genetically stable and represents a mono-allelic marker present in every tumor cell, residual tumor cells can be quantified and traced more reliably. For each of these 2384 acute leukemia patients, at least one MLL fusion allele was identified and characterized by sequencing. Several prospective studies have already been initiated, and the first published data have confirmed the reliability of these genomic markers for MRD monitoring.4, 5, 6, 7 Therefore, these MRD markers will contribute to a better stratification of leukemia patients in the future, which will help to further improve outcome. In particular for infant ALL patients, who have relatively few potential IG/TR MRD-PCR targets, the availability of a genomic MLL fusion sequence has a high impact on the clinical application of MRD monitoring.

The analysis of the MLL recombinome allows MLL fusion partner genes to be classified into functional categories. As discussed above, only very few TPGs are recurrently identified with a significant frequency. On the basis of the present study, these TPGs are AF4, AF6, AF9, AF10, ELL and ENL. At least for the AF4, AF9, ENL and AF10 proteins, a functional relationship exists, as all these proteins are organized within a protein complex (or different subcomplexes) that affects transcriptional elongation. AF4 is the docking platform for AF9 or ENL, both of which interact with DOT1L (via AF10).55, 56 DOT1L methylates lysine 79 of histone H3, a prerequisite for the maintenance of RNA transcription.57, 58 With its N-terminal portion, AF4 binds the P-TEFb kinase, which phosphorylates the largest subunit of RNA polymerase II, DSIF, the NELF complex and UBE2A. This converts RNA POL A into POL E and allows gene transcription.59 As a result, increased and extended H3K79 methylation signatures seem to accompany the presence of several fusion proteins (MLL–AF4, AF4–MLL, MLL–AF9, MLL–ENL, MLL–AF10 and MLL–AF6),60 whereas an additional increase in H3K4 methylation has only been demonstrated in the presence of the reciprocal AF4–MLL fusion,59 which causes proB ALL in C57Bl6 mice61 and was shown to cooperate with the RUNX1 protein.62 Thus, all the major MLL fusions share a common, functionally related pathway that offers new and interesting avenues for developing drugs against these leukemias, for example, DOT1L inhibitors.63 The fusion proteins MLL–ENL, MLL–AF9 and MLL–AF10 thereby recruit the AF4 complex, whereas the reciprocal AF4–MLL fusion protein is able to perform exactly the same actions on RNA polymerase II and DOT1L. Consequently, future therapies that inhibit DOT1L or P-TEFb, or block the interaction of the MLL N terminus with MENIN1/LEDGF/MYB, are promising new ways to address these leukemias.
In addition, the inhibition of Taspase1 would help to inactivate the AF4–MLL fusion protein, as the uncleaved fusion protein is rapidly degraded by SIAH1 and SIAH2.64

For all the other MLL fusion partners identified so far, a systematic classification of their function(s) has recently been described in great detail.65 However, further functional studies are needed to elucidate the mechanisms involved in, or causative for, their leukemogenic activity. Such studies will provide the basis for developing new therapeutic strategies in the future.