The KMT2A recombinome of acute leukemias in 2023

Chromosomal rearrangements of the human KMT2A/MLL gene are associated with de novo as well as therapy-induced infant, pediatric, and adult acute leukemias. Here, we present the data obtained from 3401 acute leukemia patients who were analyzed between 2003 and 2022. Genomic breakpoints within the KMT2A gene, the involved translocation partner genes (TPGs), and KMT2A-partial tandem duplications (PTDs) were determined. Including the published data from the literature, a total of 107 in-frame KMT2A gene fusions have been identified so far. A further 16 rearrangements were out-of-frame fusions, 18 patients had no partner gene fused to 5’-KMT2A, two patients had a 5’-KMT2A deletion, and one ETV6::RUNX1 patient had a KMT2A insertion at the breakpoint. The seven most frequent TPGs and PTDs account for more than 90% of all recombinations of KMT2A; 37 occur recurrently and 63 have been identified only once so far. This study provides a comprehensive analysis of the KMT2A recombinome in acute leukemia patients. Besides the scientific gain of information, genomic breakpoint sequences of these patients were used to monitor minimal residual disease (MRD). Thus, this work may be directly translated from the bench to the bedside of patients and meet the clinical needs to improve patient survival.


INTRODUCTION
Chromosomal rearrangements involving the human KMT2A gene (NM_001412597.1) are recurrently associated with the disease phenotype of acute leukemias [1,2]. The presence of distinct KMT2A rearrangements is an independent dismal prognostic factor, while very few KMT2A rearrangements confer either a good or intermediate outcome [3,4]. It has also become clear from recent studies that the follow-up of patients during their treatment and therapy adjustments based on individual MRD monitoring have a very strong impact on outcome [5][6][7]. For this purpose, we established more than 20 years ago a diagnostic network that allowed different study groups and clinical centers to obtain genomic KMT2A breakpoint sequences that can be directly used for quantifying MRD levels in their patients. The current workflow to identify KMT2A rearrangements still includes a pre-screening step (cytogenetic analyses [8,9], split-signal fluorescence in situ hybridization (FISH) [10][11][12], RT-PCR [13] or RNA-Seq) at study/diagnostic centers. Pre-screened samples derived from infant, pediatric, and adult leukemia patients were then sent for analysis to the Frankfurt Diagnostic Center of Acute Leukemia (DCAL).
These patient samples were then analyzed by a combination of long-distance inverse PCR (LDI-PCR) [14], LD multiplex PCR, and targeted sequencing of full-length KMT2A by next-generation sequencing (NGS) [15,16]. This allowed us to identify reciprocal translocations, complex chromosomal rearrangements, gene-internal duplications, deletions or inversions on chromosome 11, and KMT2A gene insertions into other chromosomes, or vice versa, the insertion of partner chromosome material into the KMT2A gene located at 11q23.3. As a result, at least one patient-specific KMT2A fusion sequence was obtained and used for establishing patient-specific qPCR assays to monitor MRD of the patient in the clinical setting. It also allowed us to identify unknown fusion partner genes. The results of this effort are presented, statistically analyzed, and discussed below.

Patient material
Genomic DNA was isolated from bone marrow and/or peripheral blood samples of leukemia patients and sent to DCAL. Patient samples were obtained from different diagnostic centers worldwide involved in different study groups (Australia, Austria, Brazil, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Israel, Italy, Netherlands, Poland, Portugal, Slovakia, Spain, Russian Federation, Switzerland, United Kingdom). Informed consent was obtained from all patients or patients' parents/legal guardians and control individuals.

Detection of chromosomal breakpoints by LDI-PCR, LD multiplex PCR, and targeted NGS
For LDI-PCR, all DNA samples were treated and analyzed as described [17][18][19][20]. Briefly, genomic patient DNA was digested with restriction enzymes and re-ligated to form DNA circles prior to LDI-PCR analyses. Restriction polymorphic PCR amplimers were isolated from the gel and subjected to DNA sequence analyses. The most frequent TPGs were in general analyzed by LD multiplex PCR. Alternatively, genomic patient DNA was subjected to targeted NGS, as previously described [15,16]. For three patients, the fusion sites were obtained by either RNA-Seq (n = 2) or RT-PCR experiments (n = 1). Idiosyncratic genomic DNA fusion sequences were made available to the sender of the DNA sample for patient-specific MRD surveillance.

Data evaluation and statistical analyses
All clinical and experimental patient data were implemented into a database program (FileMaker Pro) for further analysis. Information about all individual patients was used to compare all defined subgroups and to perform statistical analyses. Chi-Square distribution analyses were performed by using the following website: www.mathsisfun.com/data//chisquare-calculator.html. The database offers the possibility to analyze specific patient cohorts and we will share these data with requesting scientists.
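The kind of Chi-Square comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculator used in the study: the function names (`chi_square_statistic`, `chi_square_sf`, `goodness_of_fit`) and the example counts and reference fractions are hypothetical, and the p-value is computed from a standard series expansion of the incomplete gamma function rather than from any statistics package.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic for paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf(x, df, max_terms=10000):
    """Upper-tail probability P(X >= x) of a chi-square distribution.

    Uses the power series of the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < 1e-15 * total:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def goodness_of_fit(observed_counts, reference_fractions):
    """Compare subgroup counts with a reference distribution (hypothetical helper)."""
    n = sum(observed_counts)
    scale = sum(reference_fractions)  # renormalize in case fractions do not sum to 1
    expected = [n * f / scale for f in reference_fractions]
    stat = chi_square_statistic(observed_counts, expected)
    return stat, chi_square_sf(stat, df=len(observed_counts) - 1)


# Hypothetical subgroup of 100 patients spread over three breakpoint regions,
# tested against hypothetical reference fractions.
statistic, p_value = goodness_of_fit([30, 20, 50], [0.4, 0.2, 0.4])
print(f"chi2 = {statistic:.2f}, p = {p_value:.4f}")
```

A subgroup would be flagged as deviating from the reference distribution when the p-value falls below the chosen significance level; the threshold itself is a choice of the analyst and is not specified here.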

RESULTS
The study cohort
To analyze the recombinome of the human KMT2A gene, prescreened and not-prescreened acute leukemia samples were obtained between 2003 and 2022. As described in Methods, patient genomic DNA was analyzed either by PCR or by targeted NGS to obtain the genetic information of rearranged KMT2A fusion alleles. For most of the investigated cases, we obtained all clinical information (gender, age at diagnosis, disease type and subtype, or information about de novo or secondary leukemia) that is necessary for subsequent data processing. Patients for whom we were unable to obtain the relevant clinical data, or who were KMT2A-r negative in our analyses, were excluded from the present study. The results of the remaining 3401 patients are summarized in Fig. 1 and Table 1.

Identification of KMT2A rearrangements and their distribution in clinical subgroups
The most frequent KMT2A rearrangements in the two disease subgroups ALL and AML are summarized in Fig. 2A (left side). ALL patients (n = 2182) displayed the following rearrangements: AFF1 (n = 1233; 56.5%), MLLT1 (n = 404; 18.5%), MLLT3 (n = 258; 11.8%), [...]

Table 1 (excerpt; values per age group are given as ALL / AML / other)
No.  TPG    Infant     Pediatric  Adult      Not age classified  Total
56   GMPS   0 / 0 / 0  0 / 0 / 0  0 / 1 / 0  0 / 0 / 0           1
57   ITSN1  0 / 1 / 0  0 / 0 / 0  0 / 0 / 0  0 / 0 / 0           1
58   KIF2A  0 / 0 / 0  0 / 0 / 0  1 / 0 / 0  0 / 0 / 0           1
All fusion genes that have been analyzed at the DCAL and their distribution between infant, pediatric and adult leukemia patients are shown. Total numbers are given for each patient group separated into ALL, AML and other (diseases). The 11 most frequent direct TPGs and KMT2A-PTDs were separated from the other 29 fusion genes that were recurrently diagnosed. An additional 54 TPGs were identified so far only once, of which 15 were out-of-frame fusions (marked in bold). We also identified 16 KMT2A rearrangements with no direct KMT2A fusion gene, but in some cases with a reciprocal KMT2A fusion (#95), and an additional 18 patients had no partner gene fused to 5'-KMT2A (#96). Two cases were identified with a deletion of 5'-KMT2A (#97), and one patient had an ETV6-RUNX1 fusion gene with a KMT2A insertion at the breakpoint (#98).

These results are in line with previously published data about the frequency and distribution of different KMT2A fusion partner genes [21,22]. This updated information is highly relevant for diagnostic purposes and the establishment of RT-PCR-based multiplex screening methods [13].
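The partner-gene frequencies quoted for ALL patients are simple proportions of the subgroup size. The following minimal sketch recomputes them from the counts given in the text; the helper name `partner_frequencies` and the dictionary layout are illustrative assumptions, and only the three partner genes whose counts appear in the text are included.

```python
def partner_frequencies(counts, cohort_size):
    """Return each partner gene's share of the cohort in percent (one decimal)."""
    return {gene: round(100.0 * n / cohort_size, 1) for gene, n in counts.items()}


# Counts for ALL patients (n = 2182) as stated in the text.
ALL_COHORT_SIZE = 2182
all_counts = {"AFF1": 1233, "MLLT1": 404, "MLLT3": 258}

all_frequencies = partner_frequencies(all_counts, ALL_COHORT_SIZE)
for gene, percent in all_frequencies.items():
    print(f"{gene}: n = {all_counts[gene]}, {percent}%")
```

The same helper could be applied to any other subgroup once its partner-gene counts are taken from Table 1.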

Breakpoint distribution according to clinical subtypes
We also investigated the breakpoint distribution of the KMT2A recombinome. The major breakpoint cluster region (BCR1) can be mapped between KMT2A intron 7 and KMT2A exon 13, and the minor BCR (BCR2) between intron 20 and exon 24. The majority of patients (n = 3336; 98%) showed breakpoints in BCR1, while a minority (n = 47; 1.4%) was found in BCR2. When this distribution analysis is restricted to our NGS data, the distribution between BCR1 and BCR2 is 94% vs. 6%, respectively. The remaining breakpoints (n = 17; 0.6%) were found upstream of BCR1 (n = 4), between BCR1 and BCR2 (n = 8), and downstream of BCR2 (n = 5) (see Fig. 4). We also analyzed the data according to the leukemia phenotype AML or ALL. While ALL breakpoints are found in BCR1 and BCR2, breakpoints in AML patients occur nearly exclusively in BCR1.
The distribution of the 13 most important TPGs and PTDs (AFF1, MLLT3, MLLT1, MLLT10, ELL, KMT2A-PTD, AFDN, EPS15, USP2, MLLT11, MLLT6, SEPTIN6, SEPTIN9, and all others) is summarized in Suppl. Table S2. The table also contains information about gender, age, disease classification, TIL, and complex leukemia (CL) cases regarding their breakpoint distribution. Excluded from Suppl. Table S2 were again patients with no gender information (n = 57) or no age information at diagnosis (n = 43). It has recently been demonstrated that the localization of breakpoints, particularly within the major BCR, has an impact on cancer biology and clinical behavior: breakpoints within KMT2A intron 11 are associated with poorer outcome [23]. Therefore, we first compiled the breakpoint distribution for all 3401 patients. Specific features of KMT2A intron 9 (4 Alu repetitive elements, of which 3 are transcriptionally active) and KMT2A intron 11 (sensitivity against cytotoxic drugs, a DNase1 hypersensitive site [24], an apoptotic cleavage site [25], an RNA Polymerase II binding site [26] and Topoisomerase II binding sites [27]) may account for an increase of DNA double-strand breaks in these regions. As shown in Suppl. Table S2, deviations from the mean distribution in the major BCR were observed for MLLT1, KMT2A-PTD, AFDN, MLLT11, SEPTIN6, SEPTIN9 and MLLT6. The fusion partner genes MLLT1 and SEPTIN6 preferentially had KMT2A intron 11 breaks, while all others tended to bear KMT2A intron 9 or upstream recombination events, e.g. SEPTIN9, where breakpoints shift to regions even upstream of intron 9 because of the intron phase of the BCR of this partner gene. Of interest, therapy-induced acute leukemias also shifted significantly to KMT2A intron 11 breakpoints. None of the other parameters (gender, age classes at diagnosis or disease subtype) displayed a significant variation from the overall breakpoint distribution.
For a more detailed analysis, we subdivided the KMT2A BCR1 into three subregions: (A) exon 9 - intron 9 = 1761 bp; (B) exon 10 - intron 10 = 679 bp; (C) exon 11 - intron 11 - exon 12 - intron 12 and exon 13 = 5026 bp. The functional cut is between regions A-B and C (separating the regions from exon 9 until intron 10 from the region of exon 11 to exon 13). The observed 'mean distribution' (MD) for these three KMT2A breakpoint regions was A = 37.2%, B = 19.8% and C = 39.5% for all 3401 patients as listed in Suppl. Table S2. We decided not to use a 'random distribution model' (RDM) of chromosomal breakpoints, because this is only based on the length of each DNA region, which does not take into account the above-mentioned molecular features. In these subsequent analyses, all patients were investigated for their fusion partner gene in correlation with age class at diagnosis (I/P/A), gender, TIL, CL, disease subtypes and the precise breakpoint distribution.

Fig. 3 Classification of all fusion partner genes by disease phenotype and age classification. All 3401 diagnosed patients were grouped by their diagnosed disease type (ALL: 2182; AML: 1116; 103 pts had other diseases listed on the right). Since we had for 43 pts no age information at diagnosis, they were excluded from being further subdivided into the age groups infant (n = 1224), pediatric (n = 1021) and adult patients (n = 1113). All 3 age groups were again subdivided in ALL or AML subgroups (infant ALL = 987 pts; infant AML = 197 pts; pediatric ALL = 530 pts; pediatric AML = 465 pts; adult ALL = 647 pts; adult AML = 441 pts). Number of patients with missing information or different disease subtypes are indicated (grey letters). The mean age for all 6 subgroups is given below, either in months or years ± SD. The distribution of the 7 most frequent fusion partners is given by different colors (color code on top) and their frequency in percent. The additional number of identified fusion partner genes are given by blue numbers for each subgroup.
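To make the difference between a purely length-based random distribution model and the observed mean distribution concrete, the sketch below derives the length-based expectation for regions A, B and C from the stated region sizes and sets it against the reported MD values. It is an illustration only: the function names and the enrichment ratio are assumptions introduced here, and the MD values are renormalized to 100% within regions A-C because the reported percentages do not cover breakpoints outside these regions.

```python
# Region sizes (bp) and observed mean distribution (%) as stated in the text.
REGION_LENGTHS_BP = {"A": 1761, "B": 679, "C": 5026}
OBSERVED_MD_PERCENT = {"A": 37.2, "B": 19.8, "C": 39.5}


def length_based_expectation(lengths):
    """Percent of breakpoints expected per region if only region length mattered."""
    total = sum(lengths.values())
    return {region: 100.0 * bp / total for region, bp in lengths.items()}


def renormalize(percentages):
    """Rescale percentages so that the listed regions sum to 100%."""
    total = sum(percentages.values())
    return {region: 100.0 * p / total for region, p in percentages.items()}


rdm = length_based_expectation(REGION_LENGTHS_BP)
md = renormalize(OBSERVED_MD_PERCENT)

# Hypothetical summary measure: observed share divided by length-based share.
enrichment = {region: md[region] / rdm[region] for region in rdm}

for region in ("A", "B", "C"):
    print(f"region {region}: length-based {rdm[region]:.1f}%, "
          f"observed {md[region]:.1f}%, ratio {enrichment[region]:.2f}")
```

Under these assumptions, regions A and B carry more breakpoints than their length alone would predict and region C fewer, which illustrates why a length-only model is a poor reference for subgroup comparisons.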
A more detailed analysis (Suppl. Table S3) showed that more fusion partners diverged from the mean distribution and revealed 10 subgroups with breakpoints in KMT2A exons 11-13, and 25 subgroups with a preference for KMT2A exon 9 to intron 10 (all marked in orange). This finding clearly argues that certain fusion genes have a selective preference for distinct breakpoints, most likely because of specific functions of the respective fusion proteins. As an example, infant KMT2A::AFF1 patients show breakpoints predominantly localizing to KMT2A intron 11, while adult patients displayed a shift to KMT2A intron 9 and intron 10. KMT2A::MLLT10 patients of the pediatric group display a shift towards KMT2A intron 9. KMT2A::ELL patients show the opposite of KMT2A::AFF1 patients, namely that infant patients have a preference for KMT2A intron 9 breakpoints, while pediatric and adult patients have a clear preference for KMT2A intron 11. In KMT2A::AFDN patients the breakpoints mostly occur in KMT2A intron 9. Similar observations were made for the rarer fusion partners MLLT11 (significantly shifting towards KMT2A introns 9 and 10), MLLT6 (significantly towards KMT2A introns 8 and 9), EPS15 (significantly towards KMT2A intron 11 in adult patients), SEPTIN6 (significantly shifting towards KMT2A intron 11 in pediatric and adult patients) and SEPTIN9 (significantly shifting towards KMT2A introns 7-9). Notably, the observed shift of KMT2A breakpoints towards intron 11 in the adult patient group with MLLT3 fusions was clearly linked to therapy-induced leukemia. This was not the case for ELL, EPS15 or SEPTIN6 fusions.
Whether these findings have an impact on clinical outcome is as yet unclear, but it has recently been shown that breakpoints upstream of exon 11 leave the PHD domain I structurally intact for the reciprocal fusion protein, while breakpoints within exon 11 or downstream of it seem to result in a different folding of the PHD domain I, leading to an impairment of CYP33 binding and of the homo-dimerization capacity of the PHD domain I [28,29].
We also correlated the number of breakpoints within the two regions exon 9 - intron 10 and exons 11-13 with the age of individual patients who exhibited either an ALL or AML disease phenotype (Fig. 5). In both disease subgroups, breakpoint tendencies seem to change with age. In ALL patients, the infant group displays a clear preference for KMT2A intron 11 fusions. This preference appears to switch at about 6 months, when the majority of patients display a preference for KMT2A intron 9 fusions. Conversely, AML patients preferentially display a KMT2A intron 9 breakage, which decreases slightly with age. These "breakpoint preferences" in the two disease subgroups and their change with age potentially indicate that "infant ALL" (<6 months) represents a unique group, which differs from pediatric and adult ALL. Most likely, "infant ALL", especially cells with t(4;11)/KMT2A::AFF1 translocations, derives from rapidly growing proB fetal liver cells (CD10-, CD19+, CD34+), while all other disease subgroups derive from bone marrow hematopoietic stem/precursor cells [30,31].
The minor BCR of KMT2A
A total of 47 breakpoints were found in the minor BCR of KMT2A. The most frequent partner genes are USP2 (n = 29), AFDN (n = 9), and USP8 (n = 3). Other fusions, comprising AFF1, ARHGAP32, CREBBP, ELL, MLLT3 and MLLT10, have been identified only once. USP2 cases were mainly associated with B-ALL (n = 21) and MPAL (n = 7), while one patient was diagnosed with AML. The cases involving AFDN were mainly associated with T-ALL (n = 8), while one patient was diagnosed with B-ALL. All other cases displayed an ALL (n = 7) or MPAL (n = 2) disease phenotype. From this analysis it became clear that breakpoints in the minor BCR of KMT2A are an ALL-specific feature, which is nearly absent (only 1 patient) in AML patients. Notably, the 4 breakpoints upstream of the major BCR were associated with ALL, the interim breakpoints (between intron 12 and exon 20) with ALL, AML and MPAL, while the breakpoints downstream of the minor BCR were associated with ALL, AML, MPAL and NHL.

Fig. 4 [...] The most prominent areas for major and minor BCR are indicated by darker colors (major BCR is intron 9 - intron 11; minor BCR is intron 21 - intron 23).

Notably, in the major BCR the four most frequent partner genes AFF1, MLLT3, MLLT1, and MLLT10 are responsible for 80% of the cases. By contrast, partner genes identified in the minor BCR were USP2, AFDN, and USP8, which account for 85% of these cases. While USP2 and USP8 are exclusively found in the minor BCR, the others are found in BCR1 and BCR2.

Fig. 5 [...] Table 3, the breakpoints in KMT2A distributed differently in infant, pediatric and adult patients. Here, patients were categorized by disease subtype (ALL or AML) and age at diagnosis in years (indicated under the plots). The number of breakpoints in the region KMT2A ex9-in10 (region A/B; blue lines) was compared to the number of breakpoints in the region KMT2A ex11-ex13 (region C; red lines). From this analysis it became clear that ALL patients below 6 months at diagnosis have many more breakpoints in region C than in region A/B. After 6 months, this changes into the opposite distribution, ending with 90% of breakpoints within region A/B and only 10% of breakpoints in region C. This is completely different in AML patients, where breakpoints in region A/B start at 75% in the first months of life and slowly decrease with age. Vice versa, breakpoints in region C slightly increase with age in AML patients, starting from 25% and ending in elderly patients at much higher rates. This again demonstrates that infant ALL patients up to 6 months at diagnosis are probably different from all other patients.
The KMT2A recombinome
Based on the results obtained in the present and previous studies [13][14][15][16][17], a total of 79 direct TPGs and their specific breakpoint regions have been identified, all of which generate an in-frame KMT2A fusion protein [...] (Table 2D). In one case, the insertion of KMT2A material was found between ETV6 and RUNX1 (Table 1, #98, Table 2E). Therefore, these 37 KMT2A rearrangements (Table 1, #95-98) probably represent a subclass of KMT2A abnormalities for which other genetic abnormalities may account for the transformed phenotype of the leukemia cells [34,35].
In our cohort of 3401 patients, a total of 426 patients displayed complex rearrangements involving KMT2A. Within this group of patients, a total of 40 reciprocal KMT2A fusions represent in-frame fusions, while 386 fusions were either non-functional or out-of-frame gene fusions at the genomic DNA level (167 chromosome loci / 219 partner genes, see also Suppl. Table S4).

Novel translocation partner genes
Apart from the many new KMT2A fusion genes that have already been discovered at the DCAL and published in the last decade (see Table 2; 38 in-frame fusions, 9 out-of-frame fusions, 6 chromosome loci), we present eight additional novel in-frame fused TPGs and four out-of-frame fused TPGs (marked as "KMT2A [...]").

Novel in-frame fusions to KMT2A
ACTN2 (Actinin Alpha 2) is a protein-coding gene. Diseases associated with ACTN2 include cardiomyopathy with or without left ventricular noncompaction and myopathy. Congenital ACTN2 mutations are associated with structured cores and Z-line abnormalities [36]. ACTN2 encodes a muscle-specific alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. ACTN2 forms antiparallel homodimers or heterodimers with ACTN3 and interacts with ADAM12, MYOZ1, MYOZ2 and MYOZ3.
FAM13A (Family With Sequence Similarity 13 Member A) is a protein-coding gene. Diseases associated with FAM13A include polycystic kidney disease 2 with or without polycystic liver disease and interstitial lung disease 2. FAM13A is also implicated in chronic obstructive pulmonary disease (COPD). FAM13A is predicted to be involved in the regulation of small GTPase-mediated signal transduction and to be located in the cytosol. Of interest, FAM13A overlaps at the C-terminal portion with a convergently expressed FAM13A-AS lncRNA gene. Downregulation of this particular lncRNA was associated with overexpression of miR-205-3p and downregulation of DDI2 in cervical cancers. Overexpression of FAM13A-AS reversed this effect and caused tumor growth impairment (growth, migration, invasion) and the induction of apoptosis [37].
MATR3 (Matrin 3) is a protein-coding gene. Diseases associated with MATR3 include amyotrophic lateral sclerosis 21 and distal myopathy with vocal cord weakness. This gene encodes a nuclear matrix protein, which is proposed to stabilize certain messenger RNA species. Matrin 3 plays a role in transcription or may interact with other nuclear matrix proteins to form the internal fibrogranular network. In association with the SFPQ-NONO heteromer, MATR3 may play a role in the nuclear retention of defective RNAs, and it is involved in the regulation of the DNA virus-mediated innate immune response. It is also part of a complex that serves as a platform for IRF3 phosphorylation and subsequent innate immune response activation. Matrin 3 binds to N6-methyladenosine (m6A)-containing mRNAs, e.g. m6A-containing MYC mRNAs, which may interfere with MYC protein synthesis. Among several tumors, overexpression of MATR3 has been associated with the development of hepatocellular carcinoma (HCC) and stage I/II non-small cell lung cancer (NSCLC), while MATR3 has tumor-suppressive activity in basal-like breast cancer [38][39][40]. Importantly, Matrin 3 has been described as essential for the stabilization of chromatin architecture and the regulation of differentiation processes [41].
SNX9 (Sorting Nexin 9) is a protein-coding gene. Diseases associated with SNX9 include Wiskott-Aldrich syndrome and trichothiodystrophy 3. This gene encodes a member of the sorting nexin family. Members of this family contain a phosphoinositide binding domain and are involved in intracellular trafficking. The encoded protein does not contain a coiled-coil region, like some family members, but instead an SRC homology domain near its N-terminus. The protein has been reported to have a variety of interaction partners, including adaptor protein 2, dynamin, tyrosine kinase non-receptor 2, Wiskott-Aldrich syndrome-like, and ARP3 actin-related protein 3. SNX9 is implicated in several stages of intracellular trafficking, including endocytosis, macropinocytosis, and F-actin nucleation. SNX9 has been described to be important for metastasis by regulating specific surface protein patterns and RhoGTPases [42][43][44][45][46].
RANBP3 (RAN Binding Protein 3) is a protein-coding gene. Among its related pathways are degradation of ß-catenin and cytoskeletal signaling. This gene encodes a protein with a RanBD1 domain that is found in both the nucleus and cytoplasm and acts as a cofactor for XPO1/CRM1-mediated nuclear export. It is a negative regulator of TGF-beta signaling through interaction with the R-SMAD proteins SMAD2 and SMAD3, mediating their nuclear export. RANBP3 regulates melanoma cell proliferation and ß-catenin import in colorectal cancer [47,48].
STK4 (Serine/Threonine Kinase 4, also known as MST1) is a protein-coding gene. Diseases associated with loss of STK4 include T-cell immunodeficiency, recurrent infections, autoimmunity and cancer progression. The protein encoded by this gene is a cytoplasmic kinase that is structurally similar to the yeast Ste20p kinase, which acts upstream of the stress-induced mitogen-activated protein kinase cascade. STK4 has been described to regulate the Hippo pathway [49]. STK4 itself undergoes autophosphorylation and can phosphorylate myelin basic protein. A caspase-cleaved fragment of the encoded protein has also been shown to be capable of phosphorylating histone H2B. The particular phosphorylation catalyzed by this protein has been correlated with apoptosis, and it is possible that this protein induces the chromatin condensation observed in this process. Phosphorylation of YAP1 by LATS2 inhibits its translocation into the nucleus to regulate cellular genes important for proliferation, cell death, and cell migration. STK4 also phosphorylates FOXO3 upon oxidative stress, which results in its nuclear translocation and cell death initiation. Similarly, it also phosphorylates SIRT1 and inhibits SIRT1-mediated TP53 deacetylation, thereby promoting TP53-dependent transcription and apoptosis upon DNA damage. In addition, STK4 acts as an inhibitor of AKT1. Downregulation of STK4 promotes colon cancer invasion/migration [50].
BCAS4 (Breast Carcinoma Amplified Sequence 4) is a protein-coding gene. Diseases associated with BCAS4 include breast cancer. BCAS4 is either amplified, overexpressed or fused with the last two exons of BCAS3 to BCAS4 in breast cancer [51]. Overexpression of BCAS4 was also detected in endometrial cancer [52].
ITSN1 (Intersectin 1) is a protein-coding gene. Diseases associated with ITSN1 include autosomal dominant nonsyndromic intellectual disability and esophageal atresia. The protein encoded by this gene is a cytoplasmic membrane-associated protein that indirectly coordinates endocytic membrane traffic with the actin assembly machinery. In addition, ITSN1 may regulate the formation of clathrin-coated vesicles and could be involved in synaptic vesicle recycling. This protein has been shown to interact with dynamin, CDC42, SNAP23, SNAP25, SPIN90, EPS15, EPN1, EPN2, and STN2. ITSN1 is PI3KC2ß-dependent and has been linked to tumorigenesis of neuroblastoma and malignant glioma [53][54][55].
Novel out-of-frame fusions to KMT2A
DDX6 (DEAD-Box Helicase 6) is a protein-coding gene. Diseases associated with DDX6 include intellectual developmental disorder with impaired language and dysmorphic facies and non-specific syndromic intellectual disability. This gene encodes a member of the DEAD box protein family. The protein is an RNA helicase found in P-bodies and stress granules, and functions in translation suppression and mRNA degradation [56]. It is required for microRNA-induced gene silencing. DDX6 is implicated in the regulation of MYC expression in gastric cancer [57]. DDX6 has also been linked to the transfer of P-TEFb from the 7SK snRNP to the AF4 super elongation complex (SEC) [58].
OPCML (Opioid Binding Protein/Cell Adhesion Molecule Like) is a protein-coding gene. Diseases associated with OPCML include ovarian cancer and hypogonadotropic hypogonadism 14 with or without anosmia. This protein is localized in the plasma membrane. The opioid binding-cell adhesion molecule encoded by the rat gene binds opioid alkaloids in the presence of acidic lipids, exhibits selectivity for mu ligands and acts as a GPI-anchored protein. Since the protein has been highly conserved across species during evolution, it may have a fundamental role in mammalian systems. Differential expression or DNA methylation of OPCML has been linked to several types of cancers [59][60][61].
MGMT (O-6-Methylguanine-DNA Methyltransferase) is a protein-coding gene. Diseases associated with MGMT include oligodendroglioma and gliosarcoma. MGMT is a DNA repair protein that is involved in cellular defense against mutagenesis and toxicity from alkylating agents. The protein catalyzes the transfer of methyl groups from O(6)-alkylguanine and other methylated moieties of the DNA to its own molecule, which repairs the toxic lesions. Methylation of the MGMT promoter or inactivating mutations have been associated with several cancer types, including colorectal cancer, lung cancer, prostate cancer, lymphoma, glioblastoma, and astrocytoma [62][63][64].
ARHGAP32 (Rho GTPase Activating Protein 32) is a protein-coding gene. The encoded GTPase-activating protein (GAP) promotes GTP hydrolysis on the small GTPases RHOA, CDC42 and RAC1. The encoded protein may be involved in the differentiation of neuronal cells during the formation of neurite extensions. It is also involved in N-methyl-D-aspartate (NMDA) receptor activity-dependent actin reorganization in dendritic spines.
In summary, the complete "KMT2A recombinome 2023" comprises 107 in-frame fusion partner genes, 16 out-of-frame gene fusions, 18 patients with fusions to chromosomal loci, 2 patients with a 5'-KMT2A deletion but with the presence of a reciprocal fusion allele, one patient with a KMT2A insertion between ETV6 and RUNX1, and finally, 16 patients in whom a 5'-KMT2A fusion could not be identified, but with the presence of a reciprocal fusion allele (Table 2).
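The categories of this summary can be tallied with a short script, which also makes it easy to check that the four categories without a direct partner gene add up to the 37 rearrangements mentioned earlier (Table 1, #95-98). This is a bookkeeping sketch only: the dictionary keys are shorthand labels introduced here, and the total mixes partner genes and patients, so it counts entries of the recombinome rather than a single biological unit.

```python
# Category sizes as stated in the summary of the "KMT2A recombinome 2023".
RECOMBINOME_2023 = {
    "in-frame fusion partner genes": 107,
    "out-of-frame gene fusions": 16,
    "fusions to chromosomal loci": 18,
    "5'-KMT2A deletion with reciprocal fusion allele": 2,
    "KMT2A insertion between ETV6 and RUNX1": 1,
    "reciprocal fusion allele only": 16,
}

# Shorthand grouping (an assumption of this sketch): categories in which
# no direct partner gene is fused to 5'-KMT2A.
NO_DIRECT_PARTNER_GENE = (
    "fusions to chromosomal loci",
    "5'-KMT2A deletion with reciprocal fusion allele",
    "KMT2A insertion between ETV6 and RUNX1",
    "reciprocal fusion allele only",
)

total_entries = sum(RECOMBINOME_2023.values())
without_partner_gene = sum(RECOMBINOME_2023[key] for key in NO_DIRECT_PARTNER_GENE)
direct_gene_fusions = total_entries - without_partner_gene

print(f"total entries: {total_entries}")
print(f"direct gene fusions (in-frame + out-of-frame): {direct_gene_fusions}")
print(f"rearrangements without a direct partner gene: {without_partner_gene}")
```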

DISCUSSION
Herein, we present an updated 'KMT2A recombinome 2023' associated mainly with the acute leukemias ALL and AML. Our analyses of 3401 samples were performed by using only small amounts of genomic DNA isolated from bone marrow or peripheral blood collected at diagnosis. Of these patients, 2702 were analyzed by our well-established PCR methods [14], while 696 were analyzed by state-of-the-art targeted next-generation sequencing (NGS) of KMT2A [15,16].
The applied techniques allowed us to identify direct and reciprocal KMT2A fusions, KMT2A gene-internal duplications, chromosome 11 inversions, chromosomal 11q deletions and the insertion of chromosome 11 material into other chromosomes, or vice versa, the insertion of chromatin material of other chromosomes into the BCR of the KMT2A gene. The different LD-PCR technologies (inverse and multiplex PCR) that have been used in the past had a discovery rate of about 95%, while the KMT2A-targeting NGS method has a discovery rate of nearly 100%. This is in contrast to diagnostic techniques based on RNA technologies, which neither provide patient-specific chromosomal fusion sequences that may be used for MRD studies nor, in the case of paired-end mRNA analysis, achieve discovery rates greater than 90%, owing to variability in gene transcription and bioinformatic problems. However, RNA-Seq methods provide insights into alternative splice events, which could be quite important, e.g. in the case of "out-of-frame" fusions (see our 386 reciprocal cases), where a genomic analysis cannot provide any functional information.
Based on our own analyses (Table 1) and data from the literature, we can provide an updated status of the KMT2A recombinome (Table 2), which currently comprises 107 direct in-frame KMT2A fusions (Table 2A), 16 direct out-of-frame KMT2A fusions (Table 2B), 18 KMT2A-r patients with a translocation to a chromosomal locus where no gene is present (Table 2C), two patients with a deletion of the 5'-KMT2A, but with reciprocal fusion genes (Table 2D), one ETV6::RUNX1 patient with a KMT2A insertion (Table 2E), and finally, 16 cases in which no direct KMT2A fusion but only the reciprocal KMT2A fusion could be detected (Table 2F).
Moreover, we successfully extended the current knowledge by analyzing more cases with complex KMT2A rearrangements (n = 426). During these analyses a large collection of reciprocal KMT2A fusions was identified, of which 40 were in-frame, while 386 fusions were either non-functional or out-of-frame gene fusions at the genomic DNA level (167 chromosome loci / 219 partner genes, see Suppl. Table S4). However, the majority of the reciprocal out-of-frame KMT2A fusions may still be transcribed and encode a 5'-truncated KMT2A protein, termed KMT2A*, due to a gene-internal promoter upstream of KMT2A exon 12 [26]. This shorter version of the KMT2A protein cannot bind Menin1, LEDGF or MYB, but still carries all enzymatic functions and the domains necessary to bind known binding proteins that carry out H4K16 acetylation (by the MOF protein) or H3K4 methylation (by the SET domain complex). This aberrant KMT2A* protein complex thus retains the capacity to bind, read and modify chromatin. This may also explain the finding that this particular 5'-truncated KMT2A* protein exhibited oncogenic potential in a focus formation assay [65].
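The case numbers given above can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: the variable names are hypothetical, and reading "167 chromosome loci / 219 partner genes" as a split of the 386 non-functional or out-of-frame reciprocal fusions is an assumption based on the fact that both figures add up to 386.

```python
# Counts taken from the text; variable names are illustrative only.
complex_cases = 426              # complex KMT2A rearrangements analyzed
reciprocal_in_frame = 40         # reciprocal fusions that were in-frame
reciprocal_non_functional = 386  # non-functional or out-of-frame reciprocal fusions

# Assumption: the 386 cases split into fusions to gene-free loci and to partner genes.
fused_to_loci = 167
fused_to_partner_genes = 219

# Consistency checks on the reported numbers.
assert reciprocal_in_frame + reciprocal_non_functional == complex_cases
assert fused_to_loci + fused_to_partner_genes == reciprocal_non_functional

# Share of in-frame reciprocal fusions among all complex cases.
share_in_frame = reciprocal_in_frame / complex_cases
print(f"In-frame reciprocal fusions: {share_in_frame:.1%}")
```

Under this reading, fewer than one in ten reciprocal fusions from complex rearrangements is in-frame at the genomic DNA level.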
Moreover, recent studies with two reciprocal fusion proteins (AFF1::KMT2A and AFDN::KMT2A) [32,66] demonstrated their important function as "chromatin opening protein complexes", which subsequently allowed the corresponding direct KMT2A fusion proteins to activate ~10-fold more target genes. The tremendously increased number of deregulated genes ("gain of target genes") changed over time in an evolutionary selection process leading to the final oncogenic gene expression signature [66]. Thus, reciprocal fusion proteins are probably key elements for the onset of pre-leukemic clones that are then selected to become overt leukemic cells by the bone marrow environment. Since this process may be initiated by reciprocal fusion proteins and maintained even after their shutdown, we can assume that they are, for some TPGs, required only for the onset of the pre-leukemic state. Shutting down their gene transcription, or even deleting these reciprocal fusion alleles, may even support the manifestation of an oncogenic gene expression pattern. This is also in line with two recent publications that reported a better outcome of t(4;11)/KMT2A::AFF1 pro-B ALL patients when both the direct and the reciprocal fusion alleles were expressed [67,68]. If a given transcriptome is strongly enhanced by the presence of the reciprocal fusion protein, then this also causes the expression of more druggable target proteins. Under chemotherapy this may translate into a better outcome, because more druggable targets may result in a higher sensitivity of these tumor cells. Thus, leukemia cells that lack a reciprocal fusion protein, and therefore display a more restricted transcriptome, could potentially exhibit a more resistant phenotype.
The analysis of 3401 KMT2A fusion alleles over the past 20 years has led to the discovery of 48 novel TPGs (Table 2), of which 13 TPGs have not been published yet. Together with 59 TPGs described in the literature, we can today present a total of 107 direct KMT2A fusions that have been characterized at the molecular level. We have summarized all currently known KMT2A TPGs in Fig. 6, according to the genetic aberration in which they have been diagnosed.
Fig. 6 The KMT2A recombinome 2023. All known KMT2A gene rearrangements are subclassified into reciprocal (balanced) chromosomal translocations (n = 91), spliced fusions (n = 3 + 9), inversions on chromosome 11 (n = 6 + 1), deletions on chromosome 11 (n = 3) and TPG chromatin fragment insertions into the KMT2A gene, or vice versa, KMT2A gene fragment insertions into the TPGs (n = 12). A few of the possible gene rearrangements are depicted at the bottom, where different genetic scenarios are indicated. Since we had analyzed more than 400 complex KMT2A rearrangements, most of these scenarios have been identified, apart from chromothripsis, which is a known mechanism to generate a multitude of gene fusions in solid cancer, but not in hemato-malignant tumors.
The seven most frequent TPGs of the KMT2A gene are AFF1, MLLT3, MLLT1, MLLT10, ELL, AFDN and EPS15, which together with KMT2A-PTDs form the eight most frequent rearrangement classes. Their occurrence differs significantly between the cohorts of infant, pediatric and adult leukemia patients. We also observed significant differences in the gender distribution of individual fusion genes (see Suppl. Table S1): KMT2A::MLLT10 occurs more frequently in male patients (p = 0.00379), while females were more affected by KMT2A::AFF1 fusions (p = 0.00148). The most striking finding was that the breakpoint distributions differ significantly for distinct TPGs and age groups. It is well known that breakpoints in infants occur more frequently in KMT2A intron 11 (Suppl. Table S3). These significant preferences clearly argue for a different biology or oncological mechanism behind the fusion proteins with respect to oligomerization capacity, exerted functions or requirements for a HOXA signature. At least KMT2A::AFF1 patients that have a breakpoint in intron 9, exon 10 or intron 10 of the major BCR tend to display strong HOXA signatures (HOXA-hi), while breakpoints within or downstream of exon 11 display a HOXA-lo gene signature. Therefore, it will be important to analyze other KMT2A::TPG entities for their HOXA-hi or HOXA-lo signatures. Significant differences in breakpoint distribution for different TPG classes may either be linked to the requirement of HOXA signatures for leukemia development, or simply reflect different cells of origin, e.g., fetal liver or definitive hematopoietic stem cells. Future analyses may help to unravel this as yet underinvestigated phenomenon.
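The relation between breakpoint position and HOXA signature described for KMT2A::AFF1 patients can be written down as a simple lookup. The sketch below is purely illustrative and not part of our analysis pipeline: the function name, the list of BCR elements (restricted to those named in the text) and the string labels are assumptions, and the rule reflects a tendency rather than a strict prediction.

```python
# Elements of the major KMT2A breakpoint cluster region in 5'->3' order.
# Assumption: only the elements named in the text are listed here.
BCR_ELEMENTS = ["intron 9", "exon 10", "intron 10", "exon 11", "intron 11"]

# Breakpoints within or downstream of this element tend to be HOXA-lo.
CUT_POINT = "exon 11"


def expected_hoxa_signature(breakpoint_element: str) -> str:
    """Return the HOXA signature that KMT2A::AFF1 cases tend to display.

    Raises ValueError if the element is not part of BCR_ELEMENTS.
    """
    position = BCR_ELEMENTS.index(breakpoint_element)
    cut_position = BCR_ELEMENTS.index(CUT_POINT)
    # Upstream of exon 11: strong HOXA signature; otherwise weak.
    return "HOXA-hi" if position < cut_position else "HOXA-lo"


for element in BCR_ELEMENTS:
    print(element, "->", expected_hoxa_signature(element))
```

The same cut point, the border between intron 10 and exon 11, is the one that separates the outcome groups discussed next, so a single ordered list of BCR elements suffices for both comparisons.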
As already mentioned above, the outcome of leukemia patients has been linked to the distribution of chromosomal breakpoints within the KMT2A breakpoint cluster region [23]. Basically, the outcome of leukemia patients with breakpoints in KMT2A intron 11 was worse compared to that of patients with upstream breakpoints. The cut point was exactly at the border between intron 10 and exon 11. A rational explanation for this observation is provided by functional studies of the PHD domain of the KMT2A protein, encoded by KMT2A exons 11-16. This domain is built up of PHD1, PHD2 and an enhanced PHD3 (ePHD3). This PHD domain is followed by the adjacent bromodomain (BD) and another ePHD4 domain (encoded by exons 17-21). PHD3 has an important dual function, because it binds either to the CYP33/PPIE protein [69,70] or to methylated lysine-4 residues of histone H3 [71]. Binding of PHD3 to H3K4me2/3 peptides is greatly enhanced by the adjacent bromodomain [72], but binding of the peptidyl-prolyl isomerase CYP33/PPIE confers a cis-trans isomerization of proline-1665. This enables the recruitment of BMI1 and associated repressor proteins (HDAC/CBX4/KDM5B) to the CXXC domain of wildtype KMT2A to repress gene transcription. Notably, PHD2 and PHD3 also bind to E3 ligases (CDC34 and ASB2, respectively), which control the steady-state stability of the KMT2A protein [73,74]. As recently shown, breakpoints within KMT2A intron 11 destroy the dimerization capacity of the PHD1-3 domain [28] and disable binding to the BMI1 repressor complex [75]. Thus, whether a breakpoint lies upstream of or within KMT2A intron 11 has functional consequences for the resulting fusion proteins, which may provide an explanation for the altered outcome of these patients [23].
An important translational aspect of this study is the establishment of patient-specific DNA fusion junction sequences that can be used for monitoring MRD by quantitative PCR techniques. Since a given KMT2A fusion allele is genetically stable and a monoallelic marker for each tumor cell, a more reliable quantification and tracing of residual tumor cells becomes feasible. For each of the 3401 acute leukemia patients, at least one KMT2A fusion allele was identified and characterized by sequencing. Several prospective studies have already been initiated, and the first published data have verified the reliability of these genomic markers for MRD monitoring [4][5][6][7]. Therefore, the use of these MRD markers will in the future contribute to a better stratification of leukemia patients, which in turn will help to further improve their outcome. In infant ALL in particular, MRD monitoring has a high impact on the outcome of the patients.
For the majority of KMT2A TPGs identified so far, a systematic classification of their function(s) has recently been described in comprehensive detail [76]. However, further functional studies are required to elucidate the mechanisms that cause their leukemogenic activity. Such studies may provide the basis for developing new therapeutic strategies in the future.

DATA AVAILABILITY
Patient data from our FileMaker database, containing breakpoint information and data from the investigated acute leukemia patients, can be made available to scientists upon request.