Comprehensive analysis of genetic factors predicting overall survival in Myelodysplastic syndromes

Myelodysplastic syndromes (MDS) are a group of clonal hematological diseases with a high risk of progression to AML. Accurate risk stratification is important for the proper management of MDS, and genetic lesions (cytogenetic abnormalities and molecular mutations) are known to help in prognosticating MDS patients. We studied 152 MDS patients using cytogenetics and next-generation sequencing (NGS). By cytogenetic prognostic group, the majority (92.1%) of patients were classified into the good (81.6%) or intermediate (10.5%) group. NGS identified mutations in 38 different genes in our cohort. Among the 111 MDS patients with mutations, the most frequently mutated genes were SF3B1 (25.2%), SRSF2 (19%), U2AF1 (14.4%), ASXL1 (9.9%), RUNX1 (9.9%), TET2 (9%), TP53 (9%), ATM (6.3%), NRAS (5.4%) and JAK2/3 (5.4%). Survival analysis revealed that mutations in TP53, JAK2/3, KRAS, NRAS and ASXL1 were significantly (P < 0.05) associated with poor survival. Univariate and multivariate Cox analyses suggested that age, marrow morphology, cytogenetics and gene mutations should be considered together with the IPSS-R when prognosticating MDS patients. We propose the M-IPSS-R, which changed the risk stratification: 66.3% of patients had a decreased risk, whereas 33.75% had an increased risk compared with the IPSS-R. Survival analysis also showed that the M-IPSS-R separated patients by risk more significantly than the IPSS-R alone. This change in risk stratification could help guide treatment planning.

Myelodysplastic syndromes (MDS) are clonal hematopoietic stem cell disorders with marked clinical heterogeneity and genetic diversity. MDS are mainly characterized by varying degrees of cytopenia due to ineffective hematopoiesis and by dysplasia in one or more hematopoietic lineages. MDS is primarily a disease of the elderly and carries a risk of transformation to acute myeloid leukemia (AML) 1,2 .
Chromosomal abnormalities, somatic mutations and epigenetic changes represent key pathogenic defects in MDS. Recurrent chromosomal aberrations, which are associated with distinct clinical outcomes, remain the most important prognostic factors for treatment planning in MDS. However, approximately 50% of MDS patients are cytogenetically normal, suggesting that distinct molecular events contribute to the disease phenotype and transformation 3,4 .
Although clinical risk stratification tools have been successful in prognosticating MDS patients, neither the International Prognostic Scoring System (IPSS) nor its revision (IPSS-R) considers somatic mutations, which could not only aid in the diagnosis of early-stage disease with ambiguous morphology but also account for the clinical heterogeneity arising from inter-patient variability in somatic mutations [5][6][7] . Comprehensive molecular studies have identified a set of recurrently mutated genes involved in cellular processes such as signal transduction, RNA splicing, and epigenetic and transcriptional regulation. Current treatment protocols rely on the IPSS-R for prognostication of MDS patients; however, because of the clinical and biological heterogeneity of the disease, more refined prognostic evaluation of individual patients is needed to optimize treatment strategies.
Several studies have evaluated the prognostic significance of genetic lesions in MDS, with varying findings across study populations 1,2,[8][9][10] . In recent years, the rapid development of high-throughput technologies such as next-generation sequencing has greatly improved the genetic profiling of MDS 1,2,[8][9][10] . The identified mutations mainly involve genes for DNA methylation (DNMT3A, TET2, IDH1/IDH2), chromatin modification (EZH2, ASXL1), transcription factors (RUNX1, GATA1/GATA2), RNA splicing (SF3B1, U2AF1, SRSF2 and ZRSR2) and signal transduction (JAK2, KRAS/NRAS, CBL) [8][9][10][11][12][13][14] . These recurrent mutations often occur in combination, have distinct prognostic implications, and are associated with leukemic transformation, response to hypomethylating agents and overall survival 8,[11][12][13][14] . The cytogenetic profile of MDS has been explored in the Indian subcontinent; however, data on the mutational profile of MDS are lacking 4,15,16 . Knowledge of the mutations implicated in MDS pathogenesis continues to grow [17][18][19] , which implies that targeted sequencing could offer a cost-effective basis for treatment strategies in MDS patients. Incorporating mutations into current prognostic scoring systems alongside cytogenetic and clinical data is needed and should be explored in different populations to define an inclusive scoring system for prognostic evaluation. The present study describes the mutational spectrum of MDS in India and its importance in disease prognostication.

Results
Patient characteristics. The clinical characteristics of our MDS patient cohort in comparison with Haferlach et al. 8 …
Gene mutations in MDS patients. Next-generation sequencing revealed at least one non-synonymous gene mutation in 111 (73%) patients. In total, 203 genomic lesions causing amino acid changes in coding regions were detected. Of these, 159 (77.9%) were single nucleotide substitutions, followed by deletions (21, 10.2%), insertions (12, 6.1%) and duplications (10, 5%) (Supplementary Table S6).
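The breakdown of lesions by type (substitutions, deletions, insertions, duplications) can be tallied directly from HGVS coding-DNA notation. The sketch below is a minimal illustration, assuming each variant is available as an HGVS `c.` string; the variant strings are hypothetical, and the simple substring rules are not the annotation pipeline used in the study.

```python
from collections import Counter


def hgvs_variant_type(hgvs_c: str) -> str:
    """Classify a coding-DNA HGVS string (e.g. 'c.100A>G') by variant type.

    Simplified substring rules; 'delins' is checked first because it
    contains both 'del' and 'ins'.
    """
    if "delins" in hgvs_c:
        return "delins"
    if "dup" in hgvs_c:
        return "duplication"
    if "del" in hgvs_c:
        return "deletion"
    if "ins" in hgvs_c:
        return "insertion"
    if ">" in hgvs_c:
        return "substitution"
    return "other"


def tally_variant_types(variants):
    """Return {variant type: (count, percent of all variants)}."""
    counts = Counter(hgvs_variant_type(v) for v in variants)
    total = sum(counts.values())
    return {kind: (n, round(100 * n / total, 1)) for kind, n in counts.items()}


# Hypothetical variants, not taken from the study cohort
example = ["c.100A>G", "c.200C>T", "c.300dupA", "c.400_410del", "c.500_501insT"]
print(tally_variant_types(example))
```

The percentage for each type is its count divided by the total number of lesions, so all types must be counted against the same total.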
The multivariate analysis revealed that cytogenetics (HR = 1.95, p = 0.007), marrow morphology (HR = 2.8, p = 0.014), TP53 (HR = 5.9, p = 0.001), RUNX1 (HR = 2.87, p = 0.022) and NOTCH1 (HR = 3.8, p = 0.03) were significantly associated with poor OS (Table 5).
www.nature.com/scientificreports/
… risk (p = 0.08) and intermediate-risk versus high-risk (OS = 72 months) subgroups (p = 0.913) (Fig. 2A). The proportion of MDS patients with genetic lesions increased to 88.8% when gene mutation data (73%) were combined with cytogenetic data (40%). Cox proportional hazards regression analysis suggested that age, marrow morphology (number of dysplastic lineages) and gene mutations are significant predictors of OS; we therefore incorporated these factors into the IPSS-R and proposed a new scoring system, the molecular IPSS-R (M-IPSS-R). The scores for each proposed factor and for the selected gene mutations are described in Table 6 and Supplementary Table S7. In the proposed model, a single gene mutation was assigned a score of 1, and the score increased with the number of mutations. Based on the M-IPSS-R, we reclassified the 152 MDS patients into low-risk (score 1.5-3), intermediate-risk (score 3.5-5.5) and high-risk (score > 5.5) groups. Kaplan-Meier survival analysis showed significant differences in OS between the low-risk (median OS not reached) and intermediate-risk (median OS = 52 months) groups (p = 0.02) and between the intermediate-risk and high-risk (median OS = 12 months) groups (p < 0.0001), suggesting that the M-IPSS-R more accurately separated patients into three distinct prognostic subgroups (Fig. 2B).
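The grouping step of the M-IPSS-R can be sketched as a function from a total score to the three reported risk groups. This is a minimal sketch under stated assumptions: the gene list comes from the Table 6 footnote, but the one-point-per-selected-gene rule in `mutation_points` is a hypothetical simplification (the text says only that one mutation scores 1 and that the score rises with the number of mutations), and `other_points` is a hypothetical stand-in for the IPSS-R, age and morphology points defined in Table 6 and Supplementary Table S7.

```python
# Genes listed as "selected genes" in the Table 6 footnote
SELECTED_GENES = {"TET2", "ASXL1", "TP53", "RUNX1", "NRAS", "KRAS", "NOTCH1"}
NOT_COVERED = "not covered by the reported cut-offs"


def mutation_points(mutated_genes):
    """Hypothetical mutation component: 1 point per selected gene mutated.

    ASSUMPTION: the exact points per number of mutations are given in
    Table 6, which is not reproduced here.
    """
    return len(SELECTED_GENES & set(mutated_genes))


def m_ipss_r_group(total_score):
    """Map a total M-IPSS-R score onto the three groups reported in the text."""
    if 1.5 <= total_score <= 3:
        return "Low"
    if 3.5 <= total_score <= 5.5:
        return "Intermediate"
    if total_score > 5.5:
        return "High"
    return NOT_COVERED


other_points = 2.0  # hypothetical sum of IPSS-R, age and morphology points
total = other_points + mutation_points(["TP53", "ASXL1", "SF3B1"])  # SF3B1 not selected
print(total, m_ipss_r_group(total))
```

Scores that fall outside the reported bands (below 1.5, or between 3 and 3.5) are returned as not covered rather than forced into a group, because the text does not say how such scores were handled.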

Discussion
MDS is a heterogeneous group of hematologic neoplasms that may occur de novo or secondary to various insults to the bone marrow. MDS is usually a disease of the elderly, and its incidence increases with advancing age. In our study, the median age at diagnosis was 55 years (range 16-90 years), which is similar to reports from Asian countries: India (55 years) 16 , Pakistan (51 years) 3 and China (62 and 51 years) 21,22 . In contrast, the median age at diagnosis in our cohort is lower than that reported from the US (77 years) 23 , Europe (76 years) 24 and France (~78 years) 24 . Younger patients (≤ 55 years) predominated (55.9%) in our study, similar to Chinese MDS populations (~52%) 22 but different from the USA (13.5%) 25 , Greece (28.1%) 26 and Poland (~12%) 27 . Analyses of the Surveillance, Epidemiology and End Results (SEER)-Medicare database have also reported a lower median age at diagnosis in Asian countries than in Western countries 24 . The tendency toward MDS development at a younger age in India and other Asian countries may be due to different genetic susceptibilities among ethnic groups, geographical and dietary factors and, importantly, occupational and environmental stressors such as toxin exposure.
Table 6. M-IPSS-R proposed scoring system for prognosis of MDS patients. ¶ Selected genes: TET2, ASXL1, TP53, RUNX1, NRAS, KRAS, NOTCH1. *Presence of a mutation in any 1 of the selected genes; **in any 2 of the selected genes; ***in any 3 of the selected genes; ****in any 4 of the selected genes; >****in more than 4 of the selected genes.
The MDS patients recruited in our study were classified according to the 2016 revision of the WHO classification of myeloid neoplasms and acute leukemia 28 . … A study 16 from India reported monosomy 7 as the most frequent cytogenetic abnormality (31.6%), followed by del(5q) (21%) and trisomy 8 (16%). We also compared our cytogenetic data with those reported by Papaemmanuil et al. (2013) 12 . The frequencies of chromosomal aberrations were similar, except for del(20q) (3% vs. 8.5%) and complex karyotype (8% vs. 1.9%). These differences may be due to the younger age of our MDS cohort and suggest that the type and frequency of aberrations may vary with ethnic and regional background. The prognosis of the patients was assessed using the IPSS-R, which combines scores for five main factors (hemoglobin, platelet count, neutrophil count, cytogenetics and marrow blast percentage), among … Table 1). The IPSS-R was developed in untreated patients; the non-significant difference in OS between the intermediate- and high-risk groups in our study (p = 0.913; Fig. 2A) may therefore be due to treatment with hypomethylating agents (HMAs), chemotherapy or HSCT. Genetic data, however, may provide a better understanding of OS. Advances in molecular technologies have improved genome-wide analysis of mutations in MDS 8,34 , and the reported frequency of gene mutations in MDS patients ranges from 51% to 91.4% across countries 24 .
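The OS comparisons between risk groups (for example, p = 0.913 between the IPSS-R intermediate- and high-risk groups) come from Kaplan-Meier curves. The text does not name the test used; the log-rank test is the usual choice for such comparisons, and the sketch below is a minimal two-group version written for illustration, not the software used in the study. Survival times in the usage line are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events are 1 for an observed death and 0 for a censored observation.
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n  # deaths expected in group A if survival is equal
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical OS in months; 1 = death observed, 0 = censored
chi2, p = logrank_test([3, 8, 12, 20], [1, 1, 1, 0], [15, 30, 45, 60], [1, 0, 1, 0])
print(round(chi2, 2), round(p, 3))
```

A large p-value such as 0.913 means the observed deaths in each group are close to those expected under equal survival; it does not by itself show that the groups have the same prognosis, especially when treatment differs between them.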
In our study, 111 (73%) MDS patients had at least one mutation. This overall mutation frequency is similar to that reported by Papaemmanuil et al. (78%, 576/738) 12 , whereas a study from China reported 91.4% 35 and Haferlach et al. 8 reported 89.5%. The most frequently mutated genes among the 111 mutated patients were SF3B1, SRSF2, U2AF1, ASXL1, RUNX1, TET2, TP53, ATM, NRAS and JAK2/3, each mutated in more than 5% of these patients (Supplementary Table S6). Although similar genes were mutated in other MDS cohorts 8,12 , our study showed lower frequencies of TET2 (9% vs. 37.6% vs. 30.2%), ASXL1 (9.9% vs. 26.6% vs. 16.6%) and DNMT3A (4.5% vs. 15% vs. 12%), and a higher frequency of U2AF1 (14% vs. 8.6% vs. 8%) (Supplementary Fig. S3). These differences could be explained by differences in cohort characteristics, such as median age (55 vs. 72.8 8 vs. 68 12 years) and marrow morphology: SLD (29.3% vs. 16% 8 vs. 33% 12 ), MLD (32.6% vs. 40% 8 vs. 25% 12 ) and excess blasts (30.2% vs. 40.6% 8 vs. 23% 12 ). The International Working Group for the Prognosis of MDS reported that approximately half of MDS patients carry somatic mutations in splicing factor genes, SF3B1 being the most commonly mutated 36 . Our data are consistent with this: of the 111 mutated patients, 58.5% had RNA splicing factor gene mutations, and SF3B1 was the most frequently mutated gene (25.2%). MDS patients with SF3B1 mutations were predominant in the low-risk group, whereas patients with NRAS, KRAS, IDH2, TP53, RUNX1, ASXL1, ZRSR2, U2AF1, ATM, SETBP1 and SRSF2 mutations were predominant in the high-risk group (Table 3). Haferlach et al. 8 and Makishima et al. 37 also reported these mutations to be frequent in high-risk MDS.
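The gene frequencies in this paragraph are computed among the 111 mutated patients, whereas the overall mutation rate uses all 152 patients, so the same count gives different percentages depending on the denominator. The short sketch below uses only counts stated in the text (TET2-mutated N = 10 is taken from the survival discussion); when comparing cohorts, it may help to check which denominator each study used.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


COHORT_SIZE = 152       # all patients sequenced
MUTATED_PATIENTS = 111  # patients with at least one mutation
TET2_MUTATED = 10       # TET2-mutated patients (N = 10)

print(pct(MUTATED_PATIENTS, COHORT_SIZE))   # 73.0
print(pct(576, 738))                        # 78.0 (Papaemmanuil et al. cohort)
print(pct(TET2_MUTATED, MUTATED_PATIENTS))  # 9.0, frequency among mutated patients
print(pct(TET2_MUTATED, COHORT_SIZE))       # 6.6, frequency in the whole cohort
```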
Of the 6 patients with MDS-RS morphology in our study, 5 (83%) carried SF3B1 mutations, consistent with previous studies 8,36 . SF3B1 mutations were also identified in 22 other MDS patients in our cohort; these patients had a low percentage of ring sideroblasts (< 15%) and therefore could not be classified as MDS-RS according to the WHO classification. Xiong et al. 38 suggested that SF3B1 mutation, rather than the presence of ring sideroblasts, identifies a distinct subtype with independent prognostic value for survival and leukemic transformation.
The pathogenic role of recurrent gene mutations in MDS has been suggested by several groups 8,12,[34][35][36] . These mutations drive disease evolution from asymptomatic clonal hematopoiesis to MDS and, ultimately, to AML. Different study groups [39][40][41] have classified these genes into a limited number of cellular processes, including RNA splicing, epigenetic and transcriptional regulation, and signal transduction. In our study, older patients had a significantly (p = 0.03) higher incidence of mutations in DNA methylation- and hydroxymethylation-related genes (23.2%) (DNMT3A, IDH1/2 and TET2), whereas younger patients had a higher incidence of mutations in activated signaling genes (27.3%) (CBL, GNAS, JAK2/3, KRAS, NRAS, PTPN11 and NOTCH1) and transcription factor-related genes (27.3%) (CUX1, GATA2, IKZF1, RUNX1, PHF6, ETV6, TP53 and WT1); however, these latter differences between the age groups were not significant (Supplementary Fig. S2). These findings suggest that mutation detection provides genetic information that may be clinically applicable to current treatment approaches.
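The age-group comparison above contrasts the proportion of mutated patients in two groups. The text does not state which test produced p = 0.03; Fisher's exact test is one common choice for such 2x2 comparisons in small cohorts, and the pure-Python sketch below illustrates it with hypothetical counts, not the study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be age groups and columns mutated / not mutated.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties)
    return sum(
        p for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts: [mutated, not mutated] in older vs. younger patients
print(fisher_exact_two_sided(3, 1, 1, 3))
```

With larger groups a chi-square test gives similar results; the exact test avoids relying on large-sample approximations when some cells are small.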
In our study, NPM1 mutations were identified in 4 (2.6%) MDS patients, all female (Supplementary Table S6), including MDS-MLD (n = 2) and MDS-EB (n = 2), with normal karyotype (Supplementary Table S3). Montalban-Bravo et al. 42 studied a large MDS cohort and reported a low frequency of NPM1 mutations (31/1900, 1.6%), which were more common in females (55% vs. 33%, p = 0.02) and associated with a high frequency of normal karyotype (81% vs. 47%, p = 0.001). Although we also found a low frequency of NPM1 mutations (4/152, 2.6%), these patients need careful characterization because they are prone to developing AML. Nucleophosmin 1 (NPM1) is a nucleolar protein involved in multiple cellular functions and protein-protein interactions 43 . NPM1 mutations are detected in 20-30% of AML patients and in 50-60% of karyotypically normal AML patients. NPM1 mutations in AML are known to be associated with favorable outcomes with intensive chemotherapy 44 . In contrast, MDS and MDS/MPN patients with NPM1 mutations have a poor clinical course with a high rate of AML transformation 45 . The distinction between AML and MDS is based on a threshold of 20% blasts, and treatment decisions change accordingly. Hence, it is important to characterize NPM1-mutant MDS patients, as they are more likely than NPM1 wild-type patients to progress to AML, regardless of blast percentage. Some groups have suggested that myeloid neoplasms with NPM1 mutations may be classified as AML even with < 20% bone marrow (BM) blasts 45 .
In our study, a median OS below 25 months was considered poor survival. Kaplan-Meier survival analysis revealed that ASXL1, TP53, KRAS, NRAS and JAK2/3 mutations were significantly (p < 0.05) associated with poor survival and prognosis independent of IPSS; similar results were reported by Jiang et al. 46 and Bejar et al. 47 . We observed that KRAS and NRAS mutations were frequent in high-risk patients; Muhammad et al. 48 showed that MDS patients with RAS mutations progressed to AML and that RAS mutation is an unfavorable indicator of survival in AML. Patients with RUNX1 and NOTCH1 mutations also showed poor survival, but the difference from wild-type patients was not statistically significant (p > 0.05) (…). … was predominant in high-risk patients in our study (13.2% versus 4.7%; Table 3), and the same frequency has been reported by Bejar et al. 47 , who also showed shorter survival. NOTCH1 is known to be associated with leukemogenesis in lymphocytic leukemias, and its mutations have been reported more frequently in AML than in MDS, with poor patient survival 49 . Our study failed to show an association between TET2 mutation and survival, possibly owing to the small number of TET2-mutated patients (N = 10) or a better response to HMAs 14 . This is consistent with the study by Jiang et al., who suggested that TET2 mutation burden, rather than TET2 mutation status, is associated with OS 50 . Overall, patients with mutational factors had shorter OS than those without such factors within the same IPSS-R risk group. Mutation data could therefore help identify low-risk MDS patients with poor survival. In addition, prognosis worsened as the number of mutations increased; hence, identifying mutations in low-risk patients is important for disease management.
Several groups 1,2,8 have suggested including gene mutations in the IPSS-R prognostic system. Haferlach et al. 8 used predictors such as age, gender, IPSS-R and 14 gene mutations to build a novel prognostic model (model-1) that separated patients into four risk groups with significantly different 3-year survival rates. Nazha et al. 51 incorporated EZH2, SF3B1 and TP53 mutations into the IPSS-R and improved its predictive ability in MDS. We also proposed the M-IPSS-R scoring system (Table 6) … (Supplementary Table S4).
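The survival statements above (median OS not reached, 52 months, 12 months, and the 25-month cut-off for poor survival) are read off Kaplan-Meier curves. The sketch below is a minimal pure-Python Kaplan-Meier estimator with a median lookup; the `is_poor_survival` helper simply applies the 25-month cut-off from the text, and the survival times are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) steps.

    events[i] is 1 if death was observed at times[i] and 0 if censored.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Handle all observations at the same time together; patients censored
        # at t still count as at risk for deaths at t
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def median_survival(curve):
    """Earliest time at which S(t) <= 0.5; None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def is_poor_survival(curve, cutoff_months=25):
    """Apply the text's definition: median OS below 25 months."""
    median = median_survival(curve)
    return median is not None and median < cutoff_months


# Hypothetical OS in months for a small group
curve = kaplan_meier([6, 9, 14, 14, 30, 41], [1, 1, 1, 0, 1, 0])
print(median_survival(curve), is_poor_survival(curve))
```

A median of None corresponds to "median OS not reached": fewer than half of the patients have died by the last follow-up, as in the M-IPSS-R low-risk group.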
In summary, ASXL1, TP53, RUNX1, NRAS, KRAS, NOTCH1 and TET2 mutations, along with the number of dysplastic lineages, are important predictors of survival in MDS. Our study highlights that integrating mutation status, lineage dysplasia and age into the IPSS-R may improve the risk stratification of patients with MDS and help identify those with a worse-than-expected prognosis who may be candidates for more aggressive treatment.

Materials and methods
Subjects. A total of 152 patients with primary MDS (83 males and 69 females), referred to our laboratory from various centers in India, were enrolled in the study. Patients with secondary/therapy-related MDS, toxic bone marrow damage or congenital bone marrow failure syndromes were excluded. Clinical and demographic details were recorded from the patients' medical records. Bone marrow aspirate (BMA) and peripheral blood (PB) samples were collected in heparin (4 cc) and EDTA (4 cc) vacutainers. Written informed consent was obtained from all study participants. The study protocols were approved by the Institutional Ethics Committee on human subjects of ICMR-National Institute of Immunohaematology, Parel, Mumbai, India, and all methods were performed in accordance with the relevant guidelines and regulations.
Bone marrow morphology. Giemsa staining of BMA/PB smears was carried out according to standard procedures to classify MDS patients using the clinico-pathomorphological criteria of the WHO classification (2016) 28 . The type and degree of dysplasia in myeloid lineages were also evaluated by experienced hematologists for IPSS-based prognostication. Dysplasia in a lineage was considered significant when dysplastic features (pseudo-Pelger neutrophils, ring sideroblasts, micromegakaryocytes and increased blast count) were present in at least 10% of the nucleated cells of that lineage. At least 500 marrow cells and 200 blood cells were evaluated.
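The morphology criteria above reduce to two checks: the minimum number of cells counted (500 marrow, 200 blood) and the 10% threshold for significant dysplasia in each lineage. The sketch below encodes those two checks and counts dysplastic lineages, the quantity behind single- versus multilineage dysplasia (SLD/MLD) that the text uses as a prognostic factor. The lineage names and counts are hypothetical, and morphological judgement of what counts as a dysplastic cell is of course not captured.

```python
# Thresholds stated in the methods text
MIN_CELLS_COUNTED = {"marrow": 500, "blood": 200}
DYSPLASIA_THRESHOLD = 0.10  # at least 10% of nucleated cells in the lineage


def enough_cells(sample_type, cells_counted):
    """True if the count meets the stated minimum for the sample type."""
    return cells_counted >= MIN_CELLS_COUNTED[sample_type]


def significant_dysplasia(dysplastic_cells, lineage_cells):
    """True if dysplastic cells make up at least 10% of the lineage."""
    if lineage_cells == 0:
        return False
    return dysplastic_cells / lineage_cells >= DYSPLASIA_THRESHOLD


def count_dysplastic_lineages(lineage_counts):
    """lineage_counts maps lineage name -> (dysplastic cells, cells assessed)."""
    return sum(significant_dysplasia(d, n) for d, n in lineage_counts.values())


# Hypothetical marrow counts for one patient
counts = {
    "erythroid": (30, 200),      # 15% dysplastic
    "granulocytic": (10, 250),   # 4% dysplastic
    "megakaryocytic": (4, 30),   # about 13% dysplastic
}
print(enough_cells("marrow", 500), count_dysplastic_lineages(counts))
```

One significantly dysplastic lineage corresponds to SLD and two or more to MLD.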
Conventional cytogenetic study. BMA samples (2 ml) were cultured for 24-72 h in F-10 nutrient medium (Sigma-Aldrich, USA) with 20% fetal bovine serum (FBS). BMA samples were also harvested directly after metaphase arrest with colchicine (50 μg/ml; Sigma-Aldrich, USA). After hypotonic treatment with 0.075 M KCl, cells were fixed in methanol:acetic acid (3:1 v/v). Chromosome preparations were made by dropping the cell suspension onto pre-chilled slides, aged for 48 h at room temperature and then subjected to GTG banding. A minimum of 20 metaphases was analysed for each case, and karyotypes were described according to the International System for Human Cytogenomic Nomenclature (ISCN) 2020.
… (4) LSI D20S108 (20q12) SO probe for chromosome 20 were used in the study. Excess non-hybridized probe was removed with wash solutions kept at 80 °C, followed by nuclear counterstaining with DAPI for 15-20 min at room temperature. Analysis was carried out under a fluorescence microscope (Nikon 90i), and digital images were analysed using GenASIs software (Applied Spectral Imaging, Israel). A total of 200 intact, non-overlapping nuclei were assessed by 2 independent investigators, and the percentages of positive nuclei were averaged.
WHO classification of MDS patients and prognostic risk stratification. Patients were diagnosed and sub-grouped according to the WHO 2016 classification criteria by experienced haematologists. Risk stratification was carried out according to the IPSS-R scoring system 7 , considering the cytogenetic category, bone marrow blast percentage and complete blood count. Treatment details and follow-up data for survival analysis were collected and recorded during the disease course.
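The IPSS-R stratification described above adds points for cytogenetic category, marrow blasts, hemoglobin, platelets and absolute neutrophil count, then maps the total to five risk groups. The sketch below uses the point values and cut-offs of the published IPSS-R (ref. 7); it assumes the karyotype has already been assigned to an IPSS-R cytogenetic category (that mapping is not implemented), and the patient values in the usage line are hypothetical.

```python
# Published IPSS-R points for the cytogenetic prognostic categories
CYTOGENETIC_POINTS = {"very good": 0, "good": 1, "intermediate": 2, "poor": 3, "very poor": 4}


def ipss_r_score(cyto_category, blast_pct, hb_g_dl, platelets_1e9_l, anc_1e9_l):
    """IPSS-R score; Hb in g/dL, platelets and ANC in x10^9/L."""
    score = CYTOGENETIC_POINTS[cyto_category]
    # Bone marrow blasts: <=2% 0, >2 to <5% 1, 5-10% 2, >10% 3
    if blast_pct > 10:
        score += 3
    elif blast_pct >= 5:
        score += 2
    elif blast_pct > 2:
        score += 1
    # Hemoglobin: >=10 0, 8 to <10 1, <8 1.5
    if hb_g_dl < 8:
        score += 1.5
    elif hb_g_dl < 10:
        score += 1
    # Platelets: >=100 0, 50 to <100 0.5, <50 1
    if platelets_1e9_l < 50:
        score += 1
    elif platelets_1e9_l < 100:
        score += 0.5
    # Absolute neutrophil count: >=0.8 0, <0.8 0.5
    if anc_1e9_l < 0.8:
        score += 0.5
    return score


def ipss_r_risk(score):
    """Map an IPSS-R score to its risk category."""
    if score <= 1.5:
        return "Very low"
    if score <= 3:
        return "Low"
    if score <= 4.5:
        return "Intermediate"
    if score <= 6:
        return "High"
    return "Very high"


s = ipss_r_score("good", 3, 9.0, 120, 1.0)  # hypothetical patient
print(s, ipss_r_risk(s))
```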
Next generation sequencing. Genomic DNA was extracted from BMA and/or PB collected in EDTA tubes using the QIAamp DNA Blood Mini Kit (Qiagen). A custom capture kit was used for target enrichment, followed by clinical exome sequencing at Med-Genome Labs Pvt Ltd, Bangalore, India. Libraries were prepared and sequenced on an Illumina platform to a mean depth of 200-250X, with approximately 98% to 100% coverage of the target genes. Reads were aligned to the human reference genome (GRCh37/hg19) using BWA 52,53 and analysed with Picard and GATK version 3.6 54,55 to identify clinically relevant variants. VEP 56 was used to annotate variants against the Ensembl release 91 human gene model 57 .
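VEP reports each variant's predicted effect as Sequence Ontology consequence terms; multiple terms for one transcript are joined with "&" in the VCF CSQ field and with "," in the default tab-delimited output. The sketch below keeps protein-altering variants from already-parsed VEP records, roughly matching the "non-synonymous" mutations counted in the Results. The set of retained terms, the record layout (dicts with `SYMBOL` and `Consequence` keys) and the gene names are assumptions for illustration, not the filtering criteria used in the study.

```python
# ASSUMPTION: consequence terms treated as protein-altering in this sketch
PROTEIN_ALTERING = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_insertion", "inframe_deletion",
    "splice_acceptor_variant", "splice_donor_variant", "protein_altering_variant",
}


def is_protein_altering(consequence_field):
    """True if any consequence term in a VEP field is in PROTEIN_ALTERING.

    Accepts '&'-joined (VCF CSQ) or ','-joined (tab output) term lists.
    """
    terms = consequence_field.replace("&", ",").split(",")
    return any(term.strip() in PROTEIN_ALTERING for term in terms)


def filter_protein_altering(records):
    """Keep records whose 'Consequence' value includes a protein-altering term."""
    return [r for r in records if is_protein_altering(r["Consequence"])]


# Hypothetical annotated records
records = [
    {"SYMBOL": "GENE_A", "Consequence": "missense_variant&splice_region_variant"},
    {"SYMBOL": "GENE_B", "Consequence": "synonymous_variant"},
    {"SYMBOL": "GENE_C", "Consequence": "frameshift_variant"},
]
print([r["SYMBOL"] for r in filter_protein_altering(records)])
```

Splice-site terms are not strictly non-synonymous, but they are often retained alongside coding changes; whether to include them is a design choice that affects the mutation counts.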