Integrated genomic sequencing in myeloid blast crisis chronic myeloid leukemia (MBC-CML), identified potentially important findings in the context of leukemogenesis model

Chronic myeloid leukemia (CML) is a model of leukemogenesis in which the exact molecular mechanisms underlying blast crisis still remained unexplored. The current study identified multiple common and rare important findings in myeloid blast crisis CML (MBC-CML) using integrated genomic sequencing, covering all classes of genes implicated in the leukemogenesis model. Integrated genomic sequencing via Whole Exome Sequencing (WES), Chromosome-seq and RNA-sequencing were conducted on the peripheral blood samples of three CML patients in the myeloid blast crisis. An in-house filtering pipeline was applied to assess important variants in cancer-related genes. Standard variant interpretation guidelines were used for the interpretation of potentially important findings (PIFs) and potentially actionable findings (PAFs). Single nucleotide variation (SNV) and small InDel analysis by WES detected sixteen PIFs affecting all five known classes of leukemogenic genes in myeloid malignancies including signaling pathway components (ABL1, PIK3CB, PTPN11), transcription factors (GATA2, PHF6, IKZF1, WT1), epigenetic regulators (ASXL1), tumor suppressor and DNA repair genes (BRCA2, ATM, CHEK2) and components of spliceosome (PRPF8). These variants affect genes involved in leukemia stem cell proliferation, self-renewal, and differentiation. Both patients No.1 and No.2 had actionable known missense variants on ABL1 (p.Y272H, p.F359V) and frameshift variants on ASXL1 (p.A627Gfs*8, p.G646Wfs*12). The GATA2-L359S in patient No.1, PTPN11-G503V and IKZF1-R208Q variants in the patient No.3 were also PAFs. RNA-sequencing was used to confirm all of the identified variants. In the patient No. 3, chromosome sequencing revealed multiple pathogenic deletions in the short and long arms of chromosome 7, affecting at least three critical leukemogenic genes (IKZF1, EZH2, and CUX1). The large deletion discovered on the short arm of chromosome 17 in patient No. 2 resulted in the deletion of TP53 gene as well. Integrated genomic sequencing combined with RNA-sequencing can successfully discover and confirm a wide range of variants, from SNVs to CNVs. This strategy may be an effective method for identifying actionable findings and understanding the pathophysiological mechanisms underlying MBC-CML, as well as providing further insights into the genetic basis of MBC-CML and its management in the future.

Chronic Myeloid Leukemia (CML) is a unique model for the evolution of cancer and is classified as a triphasic myeloproliferative neoplasm based on clinical and pathological characteristics. Premalignant leukemia stem cells (LSCs) are, in fact, generated in the bone marrow nich by unknown mutagenesis processes 1,2 . In this step, the disease may be undetectable for a decade or more. Further oncogenic processes cause LSCs to develop into leukemia progenitor cells (LPCs) 3 . Leukemogenic alterations mainly affect five classes of regulatory proteins: signaling pathway components, transcription factors (TFs), epigenetic regulators (ERs), tumor suppressor genes (TSGs), and components of the spliceosome 4 .
Without therapeutic interventions or in the case of drug resistance, CML will progress to accelerated phase (AP) and then into an acute leukemia phase that is called blastic phase or blast crisis 5 . Even in the advent of Tyrosin Kinase Inhibitor therapy, blastic phase survival is less than 12 months. In general, blast crisis in chronic myeloid leukemia is still a fatal disease 6,7 .
Over the past decade using Massively Parallel Sequencing (MPS) or Next Generation Sequencing (NGS) the multi-dimensional genomic view with higher resolution of molecular profiling has become possible 8 . It was shown that the combination of exome, genome and RNA Sequencing data 9 , called integrative sequencing [10][11][12] paves the way to better understand the mechanisms and pathogenesis of the cancer and can potentially improve the clinical management and finding new therapeutic targets specially in the advanced phase patients 13 .
In the present study, we conducted integrated genomic sequencing through an in-house filtering algorithm to uncover the leukemogenic important/actionable findings and investigate the myeloid leukemogenesis model in three Iranian myeloid blast crisis CML patients.

WES data analysis (variant filtering).
Paired-end sequence reads were mapped to the refseq (NCBI) human reference genome (GRCh38 assembly) by BWA-MEM. Picard software was used to mark duplicate reads. Genome Analysis Tool Kit (GATK) software Mutect2 package was used to call single nucleotide variants (SNVs) and short insertions or deletions (Indels). The intronic variants were removed, however the 100 bp flanking each exon was included in VCF files using interval/bed file.
Several tools were used to annotate and interpret variants including CGI, Varsome, InterVar, CancerVar, and Franklin. The annotated VCF files were filtered to find Tier1 or Tier2 variants according to AMP classification and pathogenic (P), likely pathogenic (LP), or Leaning pathogenic variants based on ACMG classification.
All Tier 1 or Tier 2 variants in AMP classification and the Tier 3 variants that were pathogenic (P), likely pathogenic (LP), or Leaning pathogenic VUS in ACMG classification were selected as potentially important findings (PIFs). PIFs known to be therapeutic, prognostic or diagnostic biomarkers were annotated as potentially actionable findings (PAFs). RNA sequencing. RNA from whole blood in PaxGene tube was extracted and processed by the TruSeq Stranded Total RNA with Ribo-Zero Gold Kit (Illumina Inc., CA, USA) using 1 µg of DNAse treated RNA. Then paired-end sequencing was performed on Illumina platforms. CLC genomics workbench 20 was used for RNA sequencing and fusion gene analyses. RNA sequencing results were used for determining the outlier expression analysis of genes harboring important/actionable variants, analyzing the fusion transcripts as well as confirmation of variants found in WES.

Chromosome-sequencing. Vista™ Chromosome Sequencing-100 K an NGS based Whole Genome
Sequencing was performed in BGI Clinical Laboratories to evaluate copy number variations (≥ 100 Kb).
Cross validation of the identified variants. The identified SNVs and small Indels in WES were visually validated in the mapping read-tracks after importing WES BAM files in the CLC genomics workbench 20. Besides, the validations of identified WES variants were performed using RNASeq BAM files by confirming both variant and gene expression.
The detected CNVs in chromosome-seq data were validated by CNV analysis from WES data using CLC genomics workbench 20.
Ethics approval and consent to participate. The study was approved by the Ethical Committee of the Iran University of Medical Sciences (Code: IR.IUMS.REC 1395.95-04-87-30235). Informed consent was obtained from all subjects and/or their legal guardian(s) for participation.

Results
Clinical findings. The patient (I), was a 66-year-old female presented by fever, hepatosplenomegaly, leukocytosis, anemia, thrombocytopenia. The blast level in the peripheral blood was 25%. Bone marrow aspiration and immunophenotyping confirmed the myeloid blastic crisis. The patient had a history of thyroid cancer that had been treated many years prior to the development of CML. She was taken to the hospital at the age of 63 with a primary diagnosis of chronic phase CML in terms of leukocytosis with myeloid precursors in the peripheral blood and a positive Philadelphia chromosome in karyotype.
The patient (II), a 55-year old female presented by leukocytosis, anemia, thrombocytopenia. The diagnosis of chronic phase CML was established 15 years before blastic phase at the age of 40 years. There was 36% blast in the peripheral blood. Bone marrow aspiration and immunophenotyping confirmed the myeloid blastic crisis. At blast crisis, the peripheral blood karyotype revealed a clone that was positive for both the philadelphia chromosome and isochromosome 17q, as well as a clone with paracentric inversion in 15q. www.nature.com/scientificreports/ Patient (III) was a 45-year old male presented by leukocytosis, anemia and thrombocytopenia. The diagnosis of chronic phase CML was established two years before blastic phase at the age of 43 years. There was 40% blast in the peripheral blood. Bone marrow aspiration and immunophenotyping confirmed the myeloid blastic crisis.
BCR-ABL1 fusion transcript (b13-a2) was also found in the RNA-seq fusion gene study of all patients during the blast crisis phase. Table 1 summarizes the patients' detailed clinical and paraclinical findings at chronic phase CML diagnosis. The clinical and paraclinical findings at myeloid blast crisis are presented in the Table 2.  (Table 3).
Genetic findings in the patient no.1. The patient was a BCR-ABL1 positive CML case in the myeloid blast crisis phase. She had abnormal karyotype with Philadelphia chromosome. In chromosome-seq there were no specific copy number alterations.
In WES she had six PIFs which contributed in all classes of leukemogenic genes ( Fig. 1) except splicing components (class V). The p.Y272H variant on ABL1 (class I), p.A627Gfs*8 on ASXL1 (class III) and p.L359S on GATA2 (class II) were PAFs.
In WES she had six PIFs that contribute in all classes of leukemogenic genes (Fig. 2). Considering the TP53 involvement by 17p deletion, this patient has seven PIFs. The p.F359V variant in ABL1 (class I), p.G646Wfs*12 in ASXL1 (class III) and TP53 (class IV) deletion were PAFs.
Genetic findings in the patient no.3. The patient was a BCR-ABL1 positive myeloid blast crisis CML.
He also had four PIFs in WES (Fig. 3). In terms of the pathogenic deletions in the short and long arms of chromosome 7 affecting EZH2 and CUX1, this patient has six PIFs which involve four classes of leukemogenic genes. PTPN11:p.G507V (class I) and IKZF1:p.R65Q variants (class II) were PAFs. The WT1:p.R435X (class II) and ATM:p.C117Y (class IV) variants were not PAFs.

Discussion
The ability to analyze and model the leukemogenesis process may aid in the effective management and treatment of hematologic malignancies. According to the proposed "Slot machine" model of leukemogenesis by Murati et al. 4 , progression from chronic phase into secondary acute myeloid leukemia (blast crisis) can be modeled by the combination of at least four steps that one classes of leukemogenic genes is affected in each step.
Using integrated genomic sequencing to assess the genes affected by SNVs, Indels and CNVs, in the present study all MBC-CML patients had at least five leukemogenic genes involved. The detected SNV and Indel variants in WES represent all classes of leukemogenic genes (Fig. 4)  The detected variants in different signaling pathway related genes including ABL1, PIK3CB and PTPN11 in MBC-CML patients may affect ErbB, PI3K-Akt, Ras-MAPK cascade and JAK-STAT signaling pathways [15][16][17] . ABL1 was involved in the first and second patients. Four important missense variants were detected in ABL1 (p.Y272H, p.I418T, p.F359V, p.E459G). All variants were located in the tyrosine kinase domain of ABL1 and were previously reported in CML (COSM12576, COSM12605, COSM1460549) 18 . Mutations in the ABL1 kinase domain, in particular, have been identified in CML patients in the blastic phase 19 . Occurrence of 2 or more mutations in BCR-ABL1 fusion gene following TKI therapy has been reported in CML progression 20 .
PIK3CB-R231H is an important missense variant in the Ras Binding domain (RB domain). This deleterious variant in PIK3CB gene has not been reported in the hematological malignancies including CML (https:// www. mycan cerge nome. org/ conte nt/ gene/ pik3cb/). This gene plays a critical role in solid tumor development 16,21 .   There were several important variants in different transcription factor genes including GATA2, PHF6, WT1 and IKZF1 in MBC-CML patients. In addition, TP53, IKZF1 and CUX1 genes were affected by chromosomal deletions/CNVs. GATA2-L359S variant in the first patient is located on the zinc finger GATA domain. It has been indicated that GATA2 germline mutations strongly predispose patients to leukemia 22 . GATA2 mutations were described in MBC-CML; however p.L359S variant has not been previously reported in blastic CML. p.L359S variant was suggested as a predisposing germline likely pathogenic variant in MDS/AML 23 .
PHF6-R274Q variant in patient No.2 is located in the PHD-like zinc binding domain. Mutations in PHF6 also have been reported in the blastic phase CML while p.R274Q variant is a novel mutation in CML which only reported in Early T-cell precursor (ETP) acute lymphoblastic leukemia (COSM306061).
Biallelic involvement of IKZF1 was observed in the third patient. The detected missense variant was located in the C2H2 zinc finger domain of the IKZF1 (p.R65Q). Because of the 7p deletion, this variant showed a high variant allele fraction (VAF = 100%) in WES analysis. IKZF1 mutations had been found in both chronic and blastic phase CML 24 .The patient with IKZF1 mutation also had a mutation in PTPN11. It has been shown that in AML patients, mutations in IKZF1 and PTPN11 are associated with aggressive clinical course and primary resistant to chemotherapy 25 . CUX1 gene involvement was detected in the third patient in the cytogenetic studies in terms of 7q deletion. CUX1 is a transcription factor regulating TP53 and ATM 26 . 17p deletion in TP53 gene was established by cytogenetic analysis in the second patient. TP53 is a transcription factor located in 17p13.1. TP53 regulate cell cycle, DNA repair and apoptosis. It was observed that 45% of blast phase MPN (MPN-BP) patients have a TP53-related defect such as TP53 gene mutations or haploinsufficiency 27 . Involvement of TP53 has been described in both chronic and blastic phases of CML 28,29 .
Additional chromosomal abnormalities (ACAs) have been discovered in the second and the third patients. Patients with high-risk ACA, such as i(17q) and -7/7q-, are known to have a poor response to TKIs and a greater risk of disease progression. -7/7q-are less common in BC than i(17q), although they have a greater unfavorable influence on prognosis 30 . The patient number 3 with multiple deletions in the chromosome 7 has shorter time to MBC.
WT1-R435X is another significant nonsense variant discovered in the third patient. WT1 is a tumor suppressor gene which has known function as a transcriptional regulator 31 . This mutation causes loss of function in WT1 by the activation of NMD (Nonsense mediated decay) mechanism. Mutations in WT1 were reported in the blastic phase CML 32 , Wilms tumor and desmoplastic small round cell tumor (COSM21401). Pathogenic variants in this gene are correlated with poor prognosis in the patients with myelodysplastic syndromes 33 . The p.R435X germline mutation in WT1 has been classified as pathogenic which is associated to Wilms tumor in the clinvar (RCV000003671). www.nature.com/scientificreports/ Two frameshift single nucleotide deletions were identified in ASXL1 (A627Gfs*8, G646Wfs*12). ASXL1 mutations are more frequent in secondary AML and may contribute to the development of CML to AML. Moreover, ASXL1 mutation might be associated with an aggressive phenotype in myeloid malignancies 4 .
It has been shown that mutations in ABL1 kinase domain, ASXL1, IKZF1, TP53 are common in CML patients who develop BC 12,30,34 . ASXL1 mutation is the most frequent mutation in poor outcome patients. In addition, individuals with an ASXL1 mutation had a longer median time to develop BC 12 . In the current study, the first and the second patients with ASXL1 mutation had longer time to MBC than the third patient.
EZH2 gene deletion was established by chromosome-seq in the third patient. EZH2 gene is an epigenetic regulator gene (H3K27 methyltransferase). Loss-of-function mutations in EZH2 have been reported in primary myelofibrosis (PMF), MDS and MDS/MPN overlap syndromes 27 .
Several important variants have been documented in the tumor suppressor genes specially the genes involved in the DNA damage-recognition and DNA repair pathways including CHEK2, ATM and BRCA2. The CHEK2 c.-4C > T variant is an important Tier3 variant. CHEK2 has a role in DNA damage-recognition and repair, cell cycle and P53 signaling pathway. The c.-4C > T variant is located in the kozak sequence which may affect translation 35 . This variant has been reported in clinvar as VUS for hereditary cancer-predisposing syndrome (RCV000131287).
ATM-C117Y variant is a Tier2 missense variant. ATM is involved in homologous recombination, cell cycle and P53 signaling pathway 36 . ATM has a key role in the DNA damage-response pathway and loss of function mutations in ATM may promote acceleration of the blast crisis in CML 37 . The ATM-CHEK2-p53 axis has been shown to play an important role against cancer initiation by inducing apoptosis, cell cycle arrest or senescence in the cancer cells 27 .
BRCA2-P3292L variant is a missense Tier2 variant. BRCA2 is a tumor suppressor gene which contributes to homologous recombination 38 . It has been indicated that 20% of the patients with Therapy-Related Myeloid Neoplasms (t-MN) among breast cancer survivors have a germline mutation in the BRCA1/2, CHEK2, TP53, or PALB2 genes, which are important in DNA repair pathways 27 .   39 . PRPF8-G1796R variant in the second patient is a missense Tier3 variant. This variant has been classified as driver mutation and has not been previously reported.
The self-renewal, differentiation and proliferation are three main processes which are impaired in malignancies. In the current study, the integrated genomic sequencing detected important variants in twelve different cancer genes. Mutations in signaling genes may induce proliferation of malignant cells while mutations in the epigenetic regulators (ERs) or tumor suppressor genes (TSG) may promote self-renewal in the progenitor cells and transcription factors mutations may affect differentiation. Mutations in the spliceosome component may also involve differentiation and self-renewal of progenitor cells 40 . Therefore, the alterations in ABL1, PTPN11 and PIK3CB signaling molecules may potentiate the proliferation process in MBC-CML. Variants in the CHEK2, BRCA2 and ATM tumor suppressor genes, as well as the ASXL1 and EZH2 Epigenetic Regulators, may support the self-renewal of the proliferating progenitors. Variants in the GATA2, PHF6, WT1, IKZF1 and CUX1 transcription factors may operate as differentiation blockers. Finally, the variant in the PRPF8 as spliceosome components may act as differentiation blocker or may cause self-renewal of the proliferating progenitor.
Finally, a recent integrated model of blast crisis 41 showed that one of the most important mechanism associated to the BC progression is the epigenetic reprogramming related to the abnormal Polycomb Repressive Complex (PRC) function. The components of the PRC complex as an epigenetic complex, repress the expression of certain genes including leukemogenic HOXA cluster. On the other hand, specific gene products interact with PRC components to add/remove repressive marks or recruitment of PRC components to distinct loci. Based on recent studies on BC genome and transcriptome, it has been shown that the components of the PRC complex have mutated or altered in expression respectively. These changes overally affect the gene expression signature of BC cells. ASXL1 gene products interact with PRC2 components including EZH2. IKZF1 also recruits PRC2 to the target gene loci. According to this model, in the first and second patients the frameshift mutations in the ASXL1 may have a critical role in BC progression. In the third patient the large deletions in the chromosome 7 which affect IKZF1 (7p12.2) and EZH2 (7q35-q36) may accelerate BC.
A suggested model of leukemogenesis for the hierarchy of events is presented in the Fig. 5. In this model the initial step is the presence of mildly-deleterious variants (VUS-P) mainly in the tumor suppressor genes (TSG)/ DNA damage repair (DDR) genes, transcription factors (TF) and/or components of splisosome. These variants may increase the individual's susceptibility and predisposition to malignancies. The BCR-ABL1 fusion gene and mutations in the epigenetic regulator (ER)/transcription factor (TF) genes, which mainly influence the function of PRC, are the key genetic discoveries in the second and third steps. The final step in the accelerated and blast crisis phase may be started by the mutations in signaling molecules and/or additional chromosomal abnormalities (ACAs). ABL1 mutations specifically in the kinase domain lead to TKI resistance and delay in TKI switching may affect prognosis and survival by blast crisis. ACAs which involve leukemogenic genes (such as ER, TF or TSG) also may exacerbate the clinical prognosis and enter the disease to the blastic phase. www.nature.com/scientificreports/ Although mutations in ASXL1 or IKZF1 genes may play a key role in BC, they are more often found at the time of diagnosis in the chronic phase, in individuals with poor prognosis. Therefore, mutations in oncogenes such as ABL1 and PTPN11 may act as an accelerator.
In this study, samples related to the different phases of disease progression were not available. Future studies using genomic sequencing in different serial time points may allow detecting the sequence of occurrences related to the disease progression. Furthermore, recent advances in single cell sequencing may help to more accurately clarify the disease mechanisms.

Conclusions
The integrated genomic sequencing in this study effectively identified large spectrum of important and actionable variants from SNVs to CNVs. Mildly-deleterious passenger variants mainly in DNA damage repair (DDR) genes may increase the individual's predisposition to cancer. BCR-ABL1 fusion, mutations in signaling and epigenetic regulator genes along with ACAs may act as cancer drivers. Epigenetic reprogramming caused by ASXL1 or IKZF1 mutations seems to be one of the most critical mechanisms linked with BC development. ABL1 domain mutations, mutations in other signaling genes and high risk ACAs may promote CML to BC.
Using this strategy we have identified several actionable findings which may improve targeted therapies. These findings may help better understanding the underlying pathophysiological mechanisms and may provide further insights into MBC-CML genetic basis in future. Further studies will shed light on the clinical usefulness of the integrated genomic analysis.

Data availability
The datasets generated and/or analyzed during the current study are available in the ENA https:// www. ebi. ac. uk/ ena/ brows er/ view/ PRJEB 51084.

Figure 5.
A suggested model of leukemogenesis. In this model the initial step is the presence of mildlydeleterious variants which may affect the tumor suppressor genes (TSG), transcription factors (TF) and/or components of splisosome and increase susceptibility to cancer. In the second and third steps, the major genetic findings are BCR-ABL1 fusion gene and mutation in the epigenetic regulator (ER)/transcription factor (TF) genes. The final step in the accelerated and blast crisis phase may be started by the mutations in the signaling molecules and/or additional chromosomal abnormalities (ACAs).