Integrative genomic analysis of adult mixed phenotype acute leukemia delineates lineage associated molecular subtypes

Mixed phenotype acute leukemia (MPAL) is a rare subtype of acute leukemia characterized by leukemic blasts presenting myeloid and lymphoid markers. Here we report data from integrated genomic analysis on 31 MPAL samples and compare molecular profiling with that from acute myeloid leukemia (AML), B cell acute lymphoblastic leukemia (B-ALL), and T cell acute lymphoblastic leukemia (T-ALL). Consistent with the mixed immunophenotype, both AML-type and ALL-type mutations are detected in MPAL. Myeloid-B and myeloid-T MPAL show distinct mutation and methylation signatures that are associated with differences in lineage-commitment gene expressions. Genome-wide methylation comparison among MPAL, AML, B-ALL, and T-ALL sub-classifies MPAL into AML-type and ALL-type MPAL, which is associated with better clinical response when lineage-matched therapy is given. These results elucidate the genetic and epigenetic heterogeneity of MPAL and its genetic distinction from AML, B-ALL, and T-ALL and further provide proof of concept for a molecularly guided precision therapy approach in MPAL.

A cute leukemia is a clonal hematopoietic malignancy that is characterized by increased proliferation and disorganized differentiation of hematopoietic cells. Although acute leukemia generally presents with either a myeloid or lymphoid lineage, rare cases present with blasts that show immunophenotypes of both myeloid and lymphoid lineages (biphenotypic) or with multiple blasts each having different lineage of immunophenotypes (bi-lineal). Such types of acute leukemia are classified as mixed phenotype acute leukemia (MPAL), which accounts for 1-3% of acute leukemias in adults 1   World Health Organization (WHO) classification of hematopoietic and lymphoid tumors defines five subtypes of MPAL: MPAL with t(9;22)(q34;q11.2), MPAL with MLL rearrangement, MPAL B/myeloid not otherwise specified (NOS), MPAL T/myeloid NOS, and MPAL NOS rare types 2 . Morphologically, the blasts of MPAL are indistinguishable from those of acute myeloid leukemia (AML) or acute lymphoid leukemia (ALL), and diagnosis relies on immunophenotyping. Because of the rarity of the disease and the mixed phenotype presentation, standard therapy for MPAL has not been clearly defined, leading to inconsistency in treatment choices between AML-directed regimens and ALL-directed regimens 3,4 . Patients with MPAL often present with abnormal karyotypes or complex chromosomal abnormalities, and their long-term prognosis is poor 3,5 . The underlying molecular pathophysiology that accounts for the mixed phenotype and distinct characteristics of MPAL is not well understood. Here, we performed multimodal molecular analyses of 31 adult MPAL samples using targeted-capture DNA sequencing, genome-wide methylation array, and RNA sequencing, with the purpose of delineating the genetic basis of MPAL and laying the groundwork for a precision therapy approach in MPAL. The analysis reveals the genetic and epigenetic heterogeneity of MPAL and potential link between molecular subtype and clinical response to therapy.

Results
Clinical characteristics of patients with adult MPAL. The search of our institution's patient database identified 69 patients with the diagnosis of MPAL seen between 2000 and 2015. A total of 55 patients met the WHO diagnostic criteria for MPAL. Among those, 31 patients were untreated at the time of presentation and had pretreatment bone marrow samples available for analysis. The clinical characteristics of these 31 patients are summarized in Table 1. There was no significant difference in the characteristics between these 31 patients and 24 patients who were not eligible for the study because of prior therapy or nonavailability of the samples (Supplementary Table 3). The median age of the studied cohort was 53 years (interquartile range: . Thirteen (42%) of the patients had an immunophenotype consistent with myeloid-B MPAL, while 18 (58%) had an immunophenotype-consistent myeloid-T MPAL. Karyotypes were abnormal in 21 of the 31 cases (68%), and 8 (26%) had complex karyotype abnormalities. Four cases of myeloid-B phenotype were positive for Philadelphia chromosome (Ph+) and one case of myeloid-T phenotype had 11q23 rearrangement (t [11;19][q23;p13.3]). The myeloid marker, myeloperoxidase (MPO), was detected in 85% of myeloid-B cases and 89% of myeloid-T cases. The cases in which MPO were not detected were positive for other myeloid markers such as non-specific esterase, CD11c, CD14, CD64, or lysozyme. All myeloid-T cases were positive for a T cell-specific markers (cytoplasmic CD3 or surface CD3), while all myeloid-B cases were positive for a B cell-specific marker (CD19). Most of the MPAL cases (87%) were positive for stem cell marker CD34, which was consistent with a previous report 3 .
Landscape of high-confidence somatic mutations in adult MPAL. The targeted capture deep sequencing of 295 leukemiaenriched cancer genes revealed 65 high-confidence somatic single-nucleotide variants (SNVs) and 35 small insertions and deletions (indels) in 38 genes in 29 of 31 adult MPAL samples (94%) (Fig. 1a and Supplementary Table 2). The two cases with no detectable point mutations were Ph+ cases. The median number of cancer gene mutations was 2 (range: 0-7) per sample. The most frequently mutated genes were NOTCH1 in nine samples (29%, all myeloid-T), RUNX1 in eight samples (26%, six myeloid-B and two myeloid-T), and DNMT3A and IDH2 in seven samples each (23%, one myeloid-B and six myeloid-T for both). The myeloid-B and myeloid-T phenotypes showed distinct patterns of mutations (Fig. 1b), in which RUNX1 mutations were significantly enriched in myeloid-B (P = 0.039, odds ratio test), whereas NOTCH1 mutations showed significant enrichment in myeloid-T (P = 0.029, odds ratio test). Assessment of clonality based on the estimated cancer cell fraction (CCF) of the mutations revealed that RUNX1, DNMT3A, IDH2, SRSF2, KRAS, and TET2 mutations were frequently clonal, whereas mutations in FLT3, ETV6, TP53, and other rare genes were often subclonal or minimally subclonal (Fig. 1c, d).
We then compare the frequency of cancer gene mutations in MPAL to those of other lineage-committed acute leukemias (AML, B-ALL, and T-ALL) that were sequenced using the same platform (Fig. 1e) Mutations in TP53, NRAS, and DNMT3A were shared across all types of acute leukemia, albeit with some difference in frequency. In contrast, NPM1 mutations were specific to AML and were not found in MPAL, which was consistent with a previous study 6 . Mutations in IL7R, FBXW7, and NOTCH1 were specific to myeloid-T MPAL and T-ALL, suggesting a strong functional role of these genes in T cell lineage leukemia. While myeloid-T MPAL and T-ALL shared a number of mutations in common, there were also differences. PHF6 and JAK3 mutations, each detected in 21.4% of T-ALL cases, were not detected in myeloid-T MPAL (Fig. 1e). In contrast, ASXL1 (11.1%) and FLT3 (11.1%) mutations were detected in myeloid-T MPAL but not in T-ALL. Almost all of the mutations detected in myeloid-B MPAL were shared with AML, although the high prevalence of RUNX1 mutations (46.2%) in myeloid-B MPAL is noteworthy (Fig. 1e). Overall, consistent with the mixed immunophenotype presentation, both AML-type and ALL-type mutations were detected in MPAL ( Supplementary   Fig. 2). These data suggest that the aberrant immunophenotypes of MPAL may be partially driven by the mixture of both AMLand ALL-type mutations.

DNA sequencing of longitudinal samples from MPAL patients.
Since all of the DNMT3A mutations detected in the current MPAL samples were non-R882 mutations (Supplementary Table 2) and were clonal in bone marrow (median estimated CCF = 1.0), we suspected that some MPAL cases are outgrowths of pre-leukemic clonal hematopoiesis, as clonal hematopoiesis is often associated with this type of DNMT3A mutations 7,8 . To address this question, we analyzed longitudinal bone marrow samples taken at the time of complete remission (CR) in eight MPAL cases, of which three had pretreatment DNMT3A mutations. As we expected, persistent DNMT3A mutations were detected in the CR samples in all three cases while other cooccurring mutations were cleared ( Supplementary Fig. 3). These data suggest that the DNMT3A mutations detected in the MPAL cases are likely of pre-leukemic origin. We sequenced relapse samples available for two of the eight cases with longitudinal samples. In one of these cases (MDA016, myeloid-B), mutations in GATA3 and ARID2 were acquired at relapse. Interestingly, this      NOTCH1  RUNX1  DNMT3A  IDH2  NRAS  ASXL1  FLT3  SRSF2  IL7R  KRAS  ETV6  SH2B3  TP53  IDH1  IKZF1  NF1  TET2  GATA3  BCOR  CBL  CCND3  Copy number alterations (CNAs) in MPAL. Using data from the methylation array, we inferred genome-wide CNAs in the 31 MPAL samples ( Supplementary Fig. 4). Recurrently detected arm-level CNAs were loss in chromosomes 5q (19%), 7 (6%), and 12p (13%), and gain in chromosome 4 (10%). Analysis of focal CNAs revealed copy number loss or deletions in genes that were often affected by loss-of-function mutations: CDKN2A (6%), IKZF1 (3%), FBXW7 (10%), ETV6 (6%), and CEBPA (3%).
Fusion transcripts in MPAL. RNA sequencing data were available for 24 of the 31 patients (77%). Analysis of these data revealed, in addition to the translocation detected by karyotyping (BCR-ABL1 [t(9;22)] and MLL rearrangement), five additional in-frame, non-recurrent fusion genes in three patients (Fig. 2). Case MDA020 (myeloid-T) had a KMT2A (MLL)-MLLT4 fusion that was not apparent by karyotyping (cytogenetics showed normal karyotype), suggesting a cryptic abnormality. An NSD1-NUP98 fusion was detected in one case of myeloid-B (MDA022), which has been described as a cryptic fusion in~16 and 2% of pediatric and adult AML cases, respectively 9 . In addition to these two previously described leukemic fusions, we detected three additional in-frame fusions, FOXP1-DNAJC15 and TNKS-LYZ in a myeloid-T case (MPAL27) and NOP14-PLEC in a myeloid-B case (MDA022). The pathogenic significance of these fusion transcripts is not clear. However, FOXP1 translocations have been described previously in both lymphoid and myeloid leukemias (B-ALL 10,11 and myeloproliferative disorder 12 ) with various partner genes such as ABL1, PAX5, and PDGFR, suggesting a role of this gene in both myeloid and lymphoid lineage leukemias. In a pediatric MPAL series, translocation of ZNF384 was detected in~40% of myeloid-B phenotype cases (personal communication with Dr. Charles Mullighan). We did not detect ZNF384 fusion transcripts through our RNA sequencing. We also screened our cohort for the most common ZNF384 fusions (ZNF384-EP300, ZNF384-CREBBP, ZNF384-TAF15, and ZNF384-TCF3) by RT-PCR but the fusions were not detected ( Supplementary Fig. 5).
DNA methylation and gene expression profiles in MPAL. To further explore the potential molecular distinctions between the myeloid-T and myeloid-B phenotypes, we analyzed the genome-wide DNA methylation profile in the 31 MPAL samples. Consensus k-means clustering of the top 10,000 variably methylated probes identified two distinct clusters in MPAL (Fig. 3). These two methylation clusters showed significant correlation with immunophenotype: 73% of cases expressing cluster 1 were myeloid-B, whereas 88% of cases expressing cluster 2 were myeloid-T (P < 0.001). Overall, the myeloid-T cases had more hypermethylated CpG probes than the myeloid-B cases (Fig. 4), which was observed consistently among various CpG locations ( Supplementary Fig. 6). Because IDH1 and IDH2 mutations were detected frequently in the current MPAL cohort (29%) and they are known to cause the CpG island hypermethylated phenotype (CIMP) in AML 13 , we analyzed methylation differences between myeloid-T and myeloid-B phenotypes in IDH1/2 wild-type and mutant cases separately. The methylation difference between the two phenotypes was still significant in IDH1/2 wild-type cases, suggesting that this methylation difference is not conditional on IDH1/2 mutation status ( Supplementary Fig. 7).
We next analyzed promoter CpG probes that were differentially methylated between myeloid-T and myeloid-B phenotypes. Pathway analysis of differentially methylated CpG probes (DMP) revealed that promoter CpGs in genes associated with T cell receptor (TCR) signaling (CD3D, PRKCQ, LCK, and CD247) were hypo-methylated in myeloid-T cases (Fig. 5a). Integrated analysis of promoter methylation and mRNA expression data also revealed that TCR signaling genes, such as CD3D, PRKCQ, CCR9, and CD7, were differentially methylated and expressed between myeloid-T and myeloid-B cases (Fig. 5b, c). These data suggest that aberrant expression of CD3 in myeloid-T MPAL is mediated by promoter hypomethylation. We also performed motif enrichment analysis of promoter DMPs. This analysis revealed that IRF8 and IRF4 binding motifs were significantly hypo-methylated in myeloid-B cases compare to myeloid-T cases (Fig. 5d). Both IRF8 and IRF4 are transcription factors that play critical roles in pre-B cell development, differentiation, and function 14 . Consistent with the findings, PAX5, CXCR4, RAG1, and RAG2, downstream targets of IRF4/IRF8 and essential to B cell development, were significantly upregulated in myeloid-B cases compare to myeloid-T cases (Fig. 5e). This is likely a mechanism of CD19 expression in myeloid-B MPAL, because  15 . Additionally, gene set enrichment analysis (GSEA) of differentially expressed genes between the two phenotypes showed that the B cell receptor (BCR) and NF-κB pathways were upregulated in myeloid-B cases compare to myeloid-T cases (Fig. 5f). This is in line with the activation of the IRF4 pathway in the myeloid-B phenotype, because the BCR-NF-κB axis is an upstream inducer of IRF4 16 . Altogether, these data suggest that the immunophenotypic difference between myeloid-T and myeloid-B MPAL is affected substantially by differential activation of lineage-defining transcription factors and promoter methylation differences.
Methylome comparison among MPAL and other acute leukemias. We then compare the promoter CpG methylation patterns of AML, B-ALL, and T-ALL with that of MPAL to further explore differences between lineage-committed and mixed lineage leukemias. Unsupervised hierarchical clustering of the most variable promoter CpGs from AML, B-ALL, T-ALL, and MPAL cases differentiated AML, B-ALL, and T-ALL into distinct clusters (Fig. 6a). Because 18 of the 31 (59%) MPAL cases clustered with B-ALL or T-ALL and 13 (41%) clustered with AML, we designated these MPAL cases as ALL-like or AML-like MPAL, respectively (Fig. 6b). Of the18 myeloid-T MPAL cases, 13 (72%) clustered with T-ALL, 5 (28%) clustered with AML, and none clustered with B-ALL. Of the 13 myeloid-B MPAL cases, 8 (62%) clustered with AML, 4 (31%) clustered with B-ALL, and 1 (7%) clustered with T-ALL. This case of myeloid-B MPAL that clustered with T-ALL had strong CD7 positivity along with CD19 and MPO positivity, which might explain why it clustered with T-ALL.
To better understand the difference between methylationdefined AML-like MPAL and ALL-like MPAL, we compare the mutation profiles and immunophenotypes of these two subtypes. Mutations in RUNX1 and SRSF2 were significantly associated with AML-like MPAL (Fig. 6c). Moreover, strong expression of CD19 was associated with ALL-like myeloid-B MPAL (Fig. 6d) and strong association of CD7 with ALL-like myeloid-T MPAL (Fig. 6e).
Correlation with clinical outcomes in MPAL. Among the 31 MPAL patients analyzed in this study, 29 (94%) received induction chemotherapy (Supplementary Table 4). Of those, 17 (59%) received an ALL-directed regimen, 9 (31%) received an AMLdirected regimen, and 3 (10%) received other types of therapy. Twelve of the 18 (67%) ALL-like MPAL cases were treated with an ALL-directed regimen, while 6 of 13 (46%) AML-like MPAL cases received AML-directed therapy. Among the 27 patients with evaluable response data, 18 received therapy that matched with their methylation-defined phenotype (i.e., AML-directed therapy for AML-like MPAL and ALL-directed therapy for ALL-like MPAL) and 9 patients received un-matched therapy. Notably, patients who received matched therapy were significantly more likely to achieve CR than patients who received un-matched therapy (CR rate: 72% vs. 22%, P = 0.037, Fisher's exact test, Fig. 7a). However, this did not translate to a difference in composite CR rate (CR + CR with insufficient blood recovery, Fig. 7a) and overall survival (Fig. 7b).

Discussion
In this study, we performed an integrative molecular analysis of 31 adult MPAL samples by using targeted gene sequencing, methylation array, and RNA sequencing. We also compare patterns of mutations and DNA methylation of MPAL to AML, B-ALL, and T-ALL to further define the molecular underpinnings of this unique clinical entity. Our data elucidated the genetic and epigenetic heterogeneity associated with the distinct myeloid-B and myeloid-T immunophenotypes of MPAL. While the myeloid-B and myeloid-T phenotypes shared some cancer gene mutations in common, the pattern of somatic mutations differed   between the two phenotypes, with significant differences in the frequency of NOTCH1 and RUNX1 mutations. Furthermore, myeloid-B and myeloid-T MPAL cases showed distinct DNA methylation patterns that were associated with differences in expression of key genes essential to hematopoietic lineage commitment. In comparison with AML, B-ALL, and T-ALL, the mutation landscape and methylation pattern were similar between myeloid-T MPAL and T-ALL although there were some differences. For example, myeloid-T MPAL lacked JAK3 and PHF6 mutations which were detected in~20% of T-ALL. Additionally, based on the RNA sequencing, we did not detect oncogenic transcription factor fusions that are often  characteristic to T-ALL such as but not limited to TAL1, TLX1, TLX3, NKX2.1, LMO1, and LMO2 fusions. These differences might represent the unique molecular feature of MPAL. DNMT3A mutations were frequently detected in myeloid-T MPAL (33%) and all mutations were nonsense or frameshift mutations with high VAF leading to loss of function (Supplementary Table 2). This is similar to the mutation characteristics in adult T-ALL 17 . Furthermore, loss-of-function DNMT3A mutations have been shown to cooperate with NOTCH1 activation in T-ALL mice model 18 . In fact, in our cohort, four of six (67%) DNMT3A mutated myeloid-T had co-occurrence of NOTCH1 mutation, supporting the cooperative mechanism between these two genes in T cell lineage malignancy. Interestingly, there was a strong co-occurrence of DNMT3A and IDH2 mutations (five of six DNMT3A-mutated cases had IDH2 mutations) in myeloid-T, which is not seen in T-ALL and maybe unique to myeloid-T MPAL. As a result of this co-occurrence, we found no difference in global methylation level between DNMT3A mutated and wild-type myeloid-T cases (Supplementary Fig. 8), because IDH2 and DNMT3A have been shown to have epigenetic antagonism 19 . One of the significant DMPs between DNMT3A mutated and wild-type myeloid-T cases was the CpGs in gene body of NOTCH1 (Supplementary Fig. 8) and these loci overlapped with one of the permissive enhancers detected in FANTOM project 20 . Functional consequence of this hypo-methylated enhancer region is unclear and we did not find difference in gene expression of NOTCH1 by DNMT3A mutation.
The clonal origin of MPAL and the mechanism underlying its mixed-phenotype presentation have been debated since the firstcase description of MPAL 4,21,22 . One theory, "lineage promiscuity", refers to a mechanism whereby leukemic transformation occurs at the stage of early hematopoietic stem/progenitor cells (HSC/HSPC) with multi-lineage potential, allowing promiscuous expression of myeloid and lymphoid lineage markers 23 . Another theory, "lineage infidelity", argues that oncogenic mutations misconfigure differentiation program of leukemia cells, leading to the mixed phenotype presentation 24 . Although our study was not designed to provide a definitive answer to this controversy, our data suggest that these two theories may not be mutually exclusive and that both mechanisms may be in play for MPAL development. Frequent positivity of early stem cell markers, CD34 and CD117, and the fact that MPAL is most often myeloid-T or myeloid-B, not T-B, support the lineage promiscuity concept that MPAL arises from early HSC/HSPCs that have multi-lineage potential 3 . At the same time, our mutation data support the lineage infidelity mechanism, in that oncogenic mutations in lineage-defining transcription factor genes such as NOTCH1 and RUNX1 were detected frequently in MPAL. RUNX1 mutations were previously reported to have significant association with minimally differentiated AML (M0 by the French-American-British [FAB] classification), and analysis of gene expression profiles showed that RUNX1-mutated AML had significant upregulation of the BCR pathway [25][26][27] . This is in line with our data indicating that myeloid-B MPAL, 40% of which had RUNX1 mutations, had BCR-NFκB pathway activation, which likely resulted in IRF4-PAX5 activation and subsequent CD19 expression. Similarly, activating mutations in NOTCH1 and IL7R in myeloid-T MPAL may explain the aberrant expression of T cell markers 28 . Taken together, our results suggest a model of MPAL development in which HSCs/HSPCs with multilineage potential acquire mutations in lineage-defining transcription factor genes that facilitate expression of the mixed immunophenotype. The putative candidate cell of origin is the multi-lymphoid progenitor (MLP) because it has bi-potential capacity to develop into lymphoid (B and T) and myelomonocytic lineages 29 . It is still unknown whether myeloid-T and myeloid-B phenotypes arise from the same clonal origin. The significant differences in methylation signatures between the two phenotypes, which was independent of IDH mutations, raises the possibility that the two phenotypes inherited distinct methylation signatures from different cell types, because cell-specific methylation signatures are generally well preserved during hematopoietic differentiation and can be used as a clonal fingerprint 30 . A further analysis of the mutation mapping in purified HSC compartments from MPAL patients might help in identifying the precise clonal origin.
Clinically, MPAL often generates diagnostic and therapeutic challenges 4 . No standard therapy has been defined for MPAL, and treatment choice has been inconsistent between AML-type and ALL-type regimens. Previous retrospective studies of MPAL suggested that an ALL-type induction regimen produces higher response and survival rates 3,5 . This is consistent with our data, because the majority of the MPAL cases in our cohort were classified as ALL-like MPAL based on methylation profile, and thus an ALL-type regimen may render a higher response rate overall. However, an ALL-type regimen might not be recommended for all MPAL patients, because 40% of MPAL showed a methylation pattern similar to that of AML. In our retrospective analysis, patients who received therapy "matched" to their methylation profile had significantly higher rates of CR to induction therapy than patients who received "un-matched" therapy. These data serve as proof of concept for the molecularly guided precision therapy approach in MPAL. Currently, genomewide methylation analysis is not widely implemented in the clinic, but a recent study showed the potential feasibility and reproducibility of methylation assays in the clinical setting 31 . Future studies in a larger cohort of MPAL patients correlating clinical profiles and outcomes with molecular analysis, especially methylation analysis, are needed to confirm our results. Another clinically relevant finding from this study is that~50% of MPAL patients carried at least one clinically actionable mutation (IDH2 and FLT3). Midostaurin (FLT3 inhibitor) and enasidenib (IDH2 inhibitor) are clinically effective in AML with FLT3 and IDH2 mutations, respectively 32,33 . The clinical activity of Fig. 5 Transcriptomic changes between myeloid-T and myeloid-B. a Pathway analysis of differentially methylated promoter CpG probes (FDR <0.1 and delta beta value >0.15) between myeloid-B and myeloid-T phenotypes showing significant enrichment of T cell receptor pathways. P value was calculated by Fisher's exact test and FDR was calculated by Benjamini-Hochsberg method. b Starburst plot integrating analysis of gene expression and promoter methylation. Red dots represent promoter CpG probes with significantly differential methylation that are also associated with significant differences in expression between myeloid-B and myeloid-T. c Log2 fold differences of transcription levels of genes that were significantly different between myeloid-B and myeloid-T phenotypes and were associated with significant promoter methylation differences. P value was calculated by Wald test and adjusted for multiple testing by Benjramini-Hochberg method. d Motif enrichment analysis of promoter CpG probes differentially methylated between the two phenotypes showing significant enrichment of IRF8 and IRF4 recognition motifs. P value was calculated by Wilcoxon rank-sam test. e Log2 fold differences of transcription levels of key downstream target genes of IRF8 and IRF4 between myeloid-B (My-B) and myeloid-T (My-T) phenotypes. P value was calculated by Wald test and adjusted for multiple testing by Benjramini-Hochberg method. f Gene set enrichment analysis (GSEA) comparing gene expression data from RNA sequencing between myeloid-B and myeloid-T phenotypes showing significant enrichment of B cell receptor (BCR) and NFκB pathways in myeloid-B MPAL. The method of estimating nominal P value and FDR adjustment is described elsewhere 52 these agents in MPAL is not known, but its important MPAL cases are screened for these clinically actionable mutations for potential inclusion of these patients in molecularly targeted agent trials.
This study has several limitations. First, the sample size is small and therefore some of the analysis has limited statistical power to draw a definitive conclusion. For example, DNMT3A, IDH2, and FLT3 were more frequently mutated in myeloid-T compare to myeloid-B but the difference was not statistically significant. We might identify more immunophenotype-specific mutations if the cohort is large. Given the rarity of the disease, this is a limitation of a retrospective study from a single center and it highlights the need of a multi-institutional large study. Second, due to the lack of normal control, we did not perform whole exome/genome DNA sequencing and instead performed targeted DNA sequencing. Although our targeted sequencing panel covered 295 genes   that are recurrently mutated in both myeloid and lymphoid malignancies, there is a chance that missed novel MPAL genes. Third, an interpretation of the results on methylation-defined subtype and its correlation with clinical response/outcome needs caution because of the small sample size and variability in post-induction therapy.
In summary, we report data from an integrated molecular analysis of 31 adult MPAL cases. To the best of our knowledge, this is the most comprehensive and detailed molecular analysis of MPAL. Our data delineate distinct genetic and epigenetic bases for myeloid-T and myeloid-B MPAL phenotypes and suggest potential molecular mechanisms for the mixed immunophenotype presentation.

Methods
Patients. We searched our institution's medical record database for patients with acute leukemia evaluated between 2000 and 2015; from that subset, we identified patients whose disease was diagnosed as MPAL. Because MPAL cases diagnosed before 2008 were classified as "bi-phenotypic leukemia" according to the European Group of Immunological Markers for Leukemia (EGIL) criteria 34 , we reclassified those diagnoses according to the 2008 WHO criteria 35 . Also, for the purpose of this study, we included only MPAL cases with a bi-phenotypic presentation; bi-lineal cases were excluded because of the presumed biological and clonal differences of these two diseases. This resulted in 55 patients meeting the diagnostic criteria of MPAL by 2008 WHO Classification. We then selected patients whose MPAL was untreated at the time of presentation at MD Anderson and who had pretreatment bone marrow samples available for analysis. For some of these MPAL patients, bone marrow samples taken at the time of complete remission (CR) and/or relapse were available for longitudinal analysis. Mutation frequencies in these samples were compare with those in samples from patients with other types of acute leukemia, including previously untreated AML (N = 194), precursor B cell ALL (pre-B-ALL, N = 71), and T cell ALL (T-ALL, N = 14). Written informed consent for sample collection and analysis was obtained from all patients. The study protocols adhered to the Declaration of Helsinki and were approved by the Institutional Review Board at The University of Texas MD Anderson Cancer Center.
Targeted gene next-generation sequencing and mutation calling. We used a SureSelect custom panel of 295 genes (Agilent Technologies, Santa Clara, CA) that are recurrently mutated in hematologic malignancies (Supplementary Table 1). This panel was designed to cover a wide range of recurrently mutated genes in both myeloid and lymphoid malignancies, making it suitable for the analysis of mixed phenotype leukemia. Details of the sequencing methods have been described previously 36 . Briefly, genomic DNA was extracted from diagnostic bone marrow aspirates, as well as CR and relapse specimens when available, using an Autopure extractor (QIAGEN/Gentra, Valencia, CA). DNAs were fragmented and baitcaptured in solution according to the manufacturer's protocols. Captured DNA libraries were then sequenced using a HiSeq 2000 sequencer (Illumina, San Diego, CA) with 76 base pair paired-end reads.
Bioinformatics analysis calling high-confidence somatic mutations. The bioinformatic pipelines calling high-confidence somatic single-nucleotide variants (SNVs) and indels from targeted capture DNA sequencing were described previously 37 . Cancer cell fraction (CCF) of each mutation was estimated by a series of formulas. Briefly, raw ploidy was calculated from the copy number segmentation data, adjusted by tumor purity, and rounded to the next integer. The roundedadjusted ploidy, along with raw variant allele frequency and tumor purity, were then used to estimate the CCF. Detailed formulas are listed below. Diploid status was assumed. Mutations in sex chromosomes were skipped and only adjusted by tumor purity.
Adjusted VAF ¼ Raw VAF Purity Rounded adjusted ploidy þ 2 ð1 À PurityÞ ð Þ =Purity=2 DNA methylation analysis and data processing. DNA methylation analysis was performed using Illumina's Infinium MethylationEPIC assay (EPIC), which covers 850,000 CpG positions in the human genome. Briefly, genomic DNA from bone marrow samples was treated with sodium bisulfite (Zymo Research, Irvine, CA) that coverts un-methylated cytosine residues to uracil and then processed by whole-genome amplification, enzymatic fragmentation, and hybridization to bead chips at 48°C for 17 h according to the manufacturer's protocol. After array hybridization, specific probes designed to interrogate bisulfite-converted loci were single-base extended by incorporating DNP or biotinlabeled ddNTP and stained with a fluorescent reagent to determine the signal intensity ratio of methylated versus un-methylated residues. After washing and staining, the bead chips were scanned on iScan (Illumina) to generate IDAT files. We used the ChAMP algorithm for data processing and normalization following the program's default procedures 38 . The IDAT files were taken as input files and raw beta values were generated. Following initial filtering and quality check, the data were normalized using the BMIQ method 39 . The batch effect was assessed by using the singular value decomposition (SVD) method and corrected if necessary 40 . Differential methylation analysis was performed by using the limma algorithm 41  Difference between two groups was tested by Fisher's exact test. c Comparison of overall survival in patients who received a therapy matched to their methylation status and in patients who received an un-matched therapy. Difference between two grous was tested by log-rank test methylation data from 194 AML samples were downloaded from The Cancer Genome Atlas Data Portal (https://tcga-data.nci.nih.gov/docs/publications/ laml_2012/) and methylation data from 505 B-ALL and 101 T-ALL samples analyzed for a previously published study were kindly provided by Drs. Syvanen and Nordlund from Uppsala University 42,43 . Raw IDAT files from both data sources were taken as raw input files and processed and normalized using the same pipeline already described. Since both of these data sets were generated by Illumina's previous platform, the Infinium HumanMethylation450 (HM450) array that covered~485,000 CpG positions, we analyzed only the CpG probes that overlapped between EPIC and HM450 (91% of the HM450 probes overlapped with the EPIC probes, Supplementary Fig. 1).
RNA sequencing and data processing. cDNA was synthesized via Ribo-SPIA Technology (NuGEN, San Carlos, CA) per the manufacturer's instructions. The synthesized cDNA library was sequenced by Illumina's HiSeq 2000 platform. Raw sequencing data from the Illumina platform were converted to fastq files and aligned to the reference genome (hg19) using the Spliced Transcripts Alignment to a Reference (STAR) algorithm 44 . HTSeq-count was then utilized to generate the raw counts for each gene 45 . Raw counts were then analyzed by DESeq2 for data processing, normalization, and differential expression analysis according to standard procedures 46 . To identify candidate fusion transcripts, we ran three different structural variation (SV) detection algorithms: TopHat-Fusion 47 , FusionMap 48 , and MapSplice 49 . SVs that met the following criteria were considered candidate fusion transcripts: (1) SVs supported consistently by 2 or more algorithms, (2) inframe fusion transcripts, (3) reads that were not mapped to different transcripts by Blat search, (4) supporting seed reads count ≥4, and (5) junction located at the exon-intron boundaries.
Estimation of copy number alterations (CNAs). Genome-wide CNAs were estimated from methylation array data using a combination of the Conumee algorithm (http://bioconductor.org/packages/conumee/) and our in-house segmentation pipeline. Briefly, IDAT files were loaded into Conumee and combined intensity values were generated and normalized. The raw log2 ratio for each CpG site was then calculated by comparing against a set of eight normal internal controls. The calculated log2 ratios were subjected to segmentation by the circular binary segmentation method 50 . Ploidy value was calculated from the segmentation log2 ratios and then adjusted by tumor purity that was estimated from bone marrow blast percentage. The final adjusted ploidy was then plotted by using the Copynumber R package 51 .
Gene set enrichment analysis. Normalized counts data from DESeq2 was taken as input for Gene Set Enrichment Analysis (GSEA) 52 . The permutation parameter was set as the gene set, while the remaining parameters were kept as the defaults.
Motif enrichment analysis. We defined differentially methylated CpG probes (DMP) with a difference between myeloid-T and myeloid-B phenotypes corresponding to a false discovery rate (FDR) value <0.1 and a delta beta value >0.15. The list of DMPs and non-DMPs (as controls) was then mapped to the genome and the nearest transcription start site (TSS) was identified. Genomic sequences 2000 bp upstream and downstream of the TSS were extracted using BEDTools 53 . Extracted sequences were then taken as input for the motif enrichment analysis using the Analysis of Motif Enrichment (AME) software 54 .
Reverse-transcriptase polymerase chain reaction (RT-PCR). Total RNA was isolated from bone marrow aspiration samples and were reverse transcribed to generate complementary DNA (cDNA) using iSCript cDNA Synthesis Kit, (Bio-Rad). We screened for various ZNF384 fusions using the following primer pairs. EP300-ZNF384: Forward primer TCTAGGGGTGGGTCAACAGT, Reverse primer CTGTCAGCAAGGTGGGGTAG, TCF3-ZNF384: Forward primer CAGCCTCATGCACAACCAC Reverse primer CCAGTGTGGATTCGTGTGTG, CREBBP-ZNF384: Forward primer CTCTCGGACTCCCCTACATGA Reverse primer TCAGCAAGGTGGGGTAGTGA, TAF15-ZNF284: Forward primer GGAAGCCAAGGTGGAAGAG Reverse primer ACAGCCCTTCTCTGGCAAC. To complement the lack of positive control, we used two internal controls. First to validate the RT-PCR procedure by confirming expression of GAPDH: Forward primer GAGTCAACGGATTTGGTCGT Reverse primer TTGATTTTGGAGGGA TCTCG. Second to confirm expression of ZNF384: Forward primer AACCCTT-CAAGTGCCACAAC Reverse primer GCACCTGTTGCTGAAGATCA.
Statistical analysis. The chi-square or Fisher's exact test was used to assess differences in categorical variables, and the Mann-Whitney U test or Student's t test was used to analyze differences in continuous variables. Survival outcomes were plotted by the Kaplan-Meier method and the difference in survival was assessed by the log-rank test. Statistical analyses were performed using SPSS (version 24; IBM Corporation, Armonk NY) and R (ver. 3.1.3).
Code availability. We used publicly available computer codes to generate results and all codes are cited in the method.
Data availability. De-identified mutation data and clinical data are available in the Supplementary Information. The data sets generated from RNA sequencing and methylation array are available in the Gene Expression Omnibus repository with the following accession numbers: GSE113601 (RNA sequencing) and GSE113545 (methylation array).