Exploration of the role of gene mutations in myelodysplastic syndromes through a sequencing design involving a small number of target genes

Novel sequencing designs are necessary to supplement the recognized knowledge of myelodysplastic syndrome (MDS)-related genomic alterations. In this study, we sequenced 28 target genes in 320 Chinese MDS patients but obtained 77.2% of recall factors and 82.8% of genetic abnormalities (including karyotype abnormalities). In addition to known relationships among mutations, some specific chromosomal abnormalities were found to link to specific gene mutations. Trisomy 8 tended to be linked to U2AF1 and ZRSR2 mutations, and 20q- exhibited higher SRSF2/WT1 and U2AF1 mutation frequency. Chromosome 7 involvement accounted for up to 50% of RUNX1 mutations and 37.5% of SETBP1 mutations. Patients carrying a complex karyotype were prone to present TP53 mutations (36.1%). However, individuals with normal karyotypes rarely possessed mutations in the TP53, RUNX1 and U2AF1. Moreover, DNMT3A, TP53, SRSF2, STAG2, ROBO1/2 and WT1 predicted poor survival and high AML transformation. By integrating these predictors into international prognostic scoring system (IPSS) or revised IPSS, we built a set of mutation-based prognostic risk models. These models could layer different degrees of risk in patients more satisfactorily. In summary, this sequencing design was able to detect a number of gene mutations and could be used to stratify patients with varied prognostic risk.


Methods and Materials
Patients and samples. Samples were obtained from 320 MDS patients without characteristics of MPN or transformed AML at our center between January 2009 and August 2014. MDS was diagnosed in accordance with the minimum diagnostic criteria (Vienna, 2006) 13 . The patients were subclassified into the refractory cytopenia with unilineage dysplasia (RCUD), refractory anemia with ringed sideroblasts (RAS), refractory cytopenias with multilineage dysplasia or with ringed sideroblasts (RCMD-RS), refractory anemia with excess blasts (RAEB-1), RAEB-2, and MDS with sole 5q-groups according to the WHO 2008 classification 14 . Their prognosis was evaluated based on the IPSS-R 11 . Bone marrow (BM) samples were obtained via aspiration from all of the patients when the diagnosis was made, prior to receiving any treatment with chemotherapy or hypomethylation agents, and mononuclear cells were subsequently collected via density gradient centrifugation. The ethics committee of the Sixth Hospital affiliated with Shanghai Jiao Tong University approved this research. All subjects provided informed consent in accordance with the Declaration of Helsinki. The methods were carried out in accordance with the approved guidelines.
Genomic DNA preparation. Genomic DNA (gDNA) was extracted from bone marrow mononuclear cells or matched oral mucosa epithelia from the MDS patients. The purity (OD260/280 > 1.8) and concentration (50 ng per μ l) of the gDNA met the sequencing requirements.
Targeted sequencing. The 28 selected mutational gene targets related to MDS were examined for mutations in all 320 cases from the cohort through MiSeq sequencing (Illumina, San Diego, CA, USA). To identify mutations in the highlighted genes, we designed PCR primers using the primer XL pipeline. A total of 560 oligonucleotide pairs were produced, encompassing all of the coding sequences (CDS) and most of the UTRs of the 28 genes. The amplification reactions were conducted using an ABI 2720 Thermal Cycler with the following cycling conditions: 95 °C for 2 min; 11 cycles of 94 °C for 20 s, 63 °C per cycle for 40 s and 72 °C for 1 min; 24 cycles of 94 °C for 20 s, 65 °C for 30 s and 72 °C for 1 min; and 72 °C for 2 min. The PCR products were used to generate a library for further detection, and the DNA adapter-ligated and adapter-indexed fragments from 10 libraries were then pooled and hybridized. After hybridization of the sequencing primer, base incorporation was performed in a single lane using a MiSeq Benchtop Sequencer following the manufacturer's standard cluster generation and sequencing protocols, for 250 cycles of sequencing per read to generate paired-end reads including 250 bps at each end and 8 bps of the index tag.
Mutation calling. Sequence data were analyzed through our established pipeline to detect possible somatic mutations. All of the sequencing reads were aligned to the human reference genome (hg19) using BWA version 0.5.8 with default parameters. After all of the duplicated reads and low-quality reads and bases were removed, the allele variable frequencies (AVFs) of single-nucleotide variants (SNVs) and indels were calculated at each genomic position by enumerating the relevant reads using SAMtools. All of the variants showing an AVF > 10% were extracted and annotated with ANNOVAR for further consideration only if they were found in > 5 positive reads among > 10 total reads. All of the synonymous variants and ambiguous (unknown) candidates were discarded. In addition, known SNVs with a frequency greater than 0.01 in the 1000 Genomes databases were removed, though SNVs that were found in the COSMIC database were rescued as somatic mutations.
Statistical analysis. Statistical analyses were conducted using SPSS software version 18.0. The association of the mutations with clinical characteristics was analyzed via the χ 2 test. Comparisons of two independent samples were assessed using the two-tailed Student's t-test. Fisher's exact test was applied to determine the co-occurrence of highly recurrent genes. Overall survival (OS) was defined as the time from the date of diagnosis to death or survival at the last follow-up (censored). AML-free survival was calculated from diagnosis to AML progression at the last follow-up (censored). Kaplan-Meier analysis was used to evaluate the time to survival and time to progression. Univariate analyses were performed by using COX proportional hazards regression analyses, among the 308 patients with available survival data (12 patients are lost to follow-up) to assess the impact of age, gender, IPSS-R and gene mutations, as variables, on OS and AML-free survival and to screen the main prognostic factors. The mutation index obtained using the weighted coefficients (log HR) and P values from COX analyses of each prognostic factor was used to build the prognostic models. For survival model, the mutations with univariate COX P < 0.05 (corresponding log HR range 1.59-2.34, mean 2.08) were scored 2 and the mutations with univariate COX P > 0.05 (corresponding log HR range 0.38-1.77, mean 1.16) were scored 1. For AML transformation model, in view of improving the integration of model with IPSS or IPSS-R and reduce bias caused by limited cases with AML transformation, the mutations with univariate COX P < 0.05 (corresponding log HR range 1.83-4.09, mean 3.6) were also scored 2 and the mutations with univariate COX P > 0.05 (corresponding log HR range 0.29-2.3, mean 1.4) were scored 1. All P values were based on 2-sided tests, and P values less than 0.05 were considered statistically significant.
Cooperative or exclusive relationships among distinct gene categories. The landscape of somatic alterations identified in our analysis revealed several known and unknown associations of gene mutations and indicated mutual exclusion of other genetic alterations (Fig. 2a). In detail, Fig. 2b shows q value < 0.001 for the cooperative relationships between the following three gene mutation pairs: the cohesin family gene STAG2 and the spliceosome gene UPF3A; WT1 and SRSF2; and IDH2 and DNMT3A. A q value < 0.01 for the cooperative relationship was observed between STAG2 and DHX9; FZR1 and ASIC2; ROBO2 and IDH2; ROBO1 and UPF3A; GATA2 and SF3B1; BCOR and ASXL1; ANKRD11 and ZRSR2; IDH2 and DHX9; and EZH2 and ASXL1. Thereafter, the cooperative relationships were relatively weak (q < 0.05) including those between TP53 and ANKRD11; ROBO1 and SRSF2; and RUNX1 and U2AF1. On the other hand, mutually exclusive models could exist between genes in the same functional categories. For example, TP53 mutations (which have been recognized as a predictor of poor prognosis) did not coexist with the reported disease progression-related RUNX1/ SETBP1 mutation. Mutual exclusion of gene mutations was also observed among different splicing modifiers. Interestingly, the so-called "good" TET2 mutation showed mutual exclusion in relation to RUNX1 and SETBP1.
Cooperative relationships among distinct karyotypes and gene categories. We analyzed the cooperative or exclusive relationships among distinct karyotypes and gene mutations (Fig. 3). Interestingly, most of the recurrent abnormal karyotypes presented some relationship with specific gene mutations. Trisomy 8 frequently coexisted with U2AF1 and ZRSR2 mutations; 20q deletion coexisted with SRSF2, WT1 and U2AF1 mutations; − 7/7q-frequently coexisted with RUNX1 and SETBP1 mutations (at frequencies reaching 50% and 37.5%, respectively) and 5q-coexisted with SF3b1 and TP53 mutations (Supplemental Table 1). Notably, the patients with complex karyotypes exhibited a close correlation with TP53 mutations (36.1% recurrence). "Very good" 11q-presented a positive relationship with poor ROBO1 mutations, although 11q-was found in only three patients (two of the three cases harbored only 11q-, while the third harbored 11q-plus 12p-). The analysis of the 196 patients with normal chromosome G-binding results showed the following characteristics: the frequencies of ASXL1, DNMT3A and TET2 were very close to the frequencies in all 320 cases (12.8% vs. 12.8%, 10.7 vs. 10.6%, and 10.7 vs. 13.4%, respectively), but these normal karyotype patients rarely exhibited mutations in TP53, RUNX1 and U2AF1 (Fig. 3).

Impact of individual mutations on AML transformation.
Regarding AML-free survival, univariate analyses showed that advanced age (P = 1.0E-03, HR = 1.026), a high IPSS-R score (P = 1.0E-13, HR = 1.709)  Novel prognostic model including molecular markers. Based on the significant impact of gene mutations on OS, we constructed a molecular marker-based model for risk stratification using the following definitions: Low: no mutations; Intermediate-1: presence of 1 common mutation (mutations in all genes except for TP53, STAG2, DNMT3A, EZH2, RUNX1, ROBO1/2, SRSF2 and WT1); Intermediate-2: presence of 2-3 common mutations or 1 poor mutation (TP53, STAG2, DNMT3A, EZH2, RUNX1, ROBO1/2, SRSF2 and WT1) plus 0-2 common mutations; High: presence of ≥ 4 mutations (common or poor mutations) or two poor mutations (Supplemental Table 2). Using this system, the 320 patients could be significantly divided into 4 risk groups (P = 6.7E-11). To further improve the prognostic stratification, we integrated this mutation-based system into the IPSS or IPSS-R scoring system (Supplemental Tables 3 and 4). The results suggested that whether the four subgroups of IPSS or five subgroups of IPSS-R were employed, the addition of mutation scoring significantly separated the survival curves (P = 5.6E-12; P = 1.0E-13) (Fig. 5a).
Similar to the OS analysis, we constructed a predictive model for AML transformation based on the significant gene mutations associated with AML transformation using the following definitions: Low: 0-1 common mutations (mutations in all genes except for DNMT3A, TP53, WT1, SRSF2, IDH1/2, STAG2 and ROBO1/2); Intermediate: 1 driving mutation (DNMT3A, TP53, WT1, SRSF2, IDH1/2, STAG2 and ROBO1/2) or 2 common mutations; High: ≥ 2 driving mutations or the presence of ≥ 3 common mutations (Supplemental Table 5). The 320 patients could be significantly divided into 3 risk groups according to this system, based on which AML transformation risk was well distinguished (P = 1.0E-13) (Fig. 5b). We integrated this mutation-based model into IPSS and IPSS-R (Supplemental Tables 6 and 7), and the results suggested that the IPSS-M or IPSS-MR system achieved better risk stratification for AML transformation than the primary system (P = 1.0E-13; P = 1.0E-13) (Fig. 5b).

Discussion
Several next-generation target-sequencing studies on MDS have been reported thus far [7][8][9][10] . However, there are no available data from multiple-target gene-sequencing based on a large Chinese population. As we described in INTRODUCTION, the detectable coverage varies according to the number of target genes 7,8,16-18 . Additionally, the incidence of MPN-related (JAK2, CALR and MPL) and de novo AML-related (N-RAS, K-RAS and FLT-3) gene mutations is low in typical MDS subsets 7,8,19,20 . In this study, 18 genes showing a high frequency of mutation in MDS were adopted for sequencing analysis, including the epigenetic modifiers ASXL1, DNMT3A, EZH2, TET2, IDH1/2 and BCOR; the RNA spliceosome genes SF3B1, SRSF2, U2AF1, and ZRSR2; the transcription factors CEBPA, GATA2, RUNX1, and SETBP1; the adhesion molecule STAG2; and the tumor suppressors WT1 and TP53. Additionally, 10 novel recurrently mutated genes (which showed a higher incidence in MDS in our previous sequencing work and have been reported to be related to carcinogenesis) were employed to obtain a small group of 28 target genes. The ten novel genes were ROBO1/2, ITIH3, PTPRD, KIF20b, ANKRD11, UPF3A, DHX9, ASIC2 and FZR1 12   Previous studies have found specific cooperative or exclusive relationships among specific gene mutations 7,8 . Our results did not exhibit absolute consistency with previous reports, possibly due to the different races, disease subsets, and choice of target genes involved 7,8 . Because of the especially low incidence of SRSF2 mutations in Chinese MDS cases (3.4% vs. 13-15% in foreign literature) 7,8,20 , the cooperative relationship between the mutations in SRSF2 and STAG2/RUNX1/ASXL1/IDH2 confirmed in the published literature was not validated by our results 7,8 . However, we obtained consistent findings with Papaemmanuil et al. 7 and Haferlach et al. 8 regarding the cooperation between EZH2 and ASXL1/RUNX1. Additionally, the exclusive relationship between spliceosomes was explicitly validated 7,8 . Interestingly, TP53 mutations rarely co-occurred with RUNX1 or SETBP1 mutations in the same individuals (exclusive q value of 0.001). Furthermore, RUNX1 and SETBP1 were rarely mutated together. All three gene mutations have been considered to be closely related to MDS progression 4,21-23 . It has been proposed that different signaling pathways may participate in the development MDS in individual cases, and a key mutation may be sufficient for the disease to undergo transformation. Thus, MDS progression could occur through a heterogeneous pathway, unlike a homogeneous model (controlled by signal-transducing and transcription factors) as observed for de novo AML.
A comprehensive analysis was carried out between chromosome karyotypes and gene mutations. First, 30 patients with trisomy 8 presented a higher incidence of U2AF1 or ZRSR2 mutations (20% and 16.7%, respectively). However, the "good" karyotype 20q-showed a marked cooperative relationship with WT1, SRSF2 and U2AF1 mutations, all of which are related to poor outcomes according to the literature and our results 7,24,25 . Patients with − 7 or 7q-showed a very close cooperative relationship with RUNX1/SETBP1 (frequencies reaching 50% and 37.5%, respectively), which is highly consistent with the poor outcome for this subset of MDS. However, 5q-MDS patients tended to harbor TP53 mutations, consistent with the findings of Papaemmanuil E 7 . Patients with complex karyotypes in this study showed a highly cooperative relationship with TP53 mutations (also consistent with Papaemmanuil E 7 ) (incidence as high as 36%), and the inferior personal predictive effects of TP53 mutations and complex chromosomes were highly consistent 26,27 . Notably, two of 3 patients with the "very good" (according to IPSS-R) 11q-presented ROBO1 mutations. Mutation detection approached nearly 75% in the 196 patients with normal karyotypes. Two typical MDS-related mutations showed nearly equal frequencies in normal karyotype patients and patients with abnormal chromosomes, in the genes ASXL1 (12.8% vs. 12.8%) and  28,29 , suggesting that normal karyotypes do not always point to a good prognosis, although we do not consider the outcomes of these chromosomally normal patients to be the same as those of high-risk patients because they rarely display mutations in genes such as TP53, RUNX1 or U2AF1 that have been confirmed to predict an undesirable outcome. These chromosome-related mutation analyses provide new insights. IPSS or IPSS-R based on prognostic factors including bone marrow blasts, cytogenetics and cytopenias (hemoglobin, platelets and ANC) is considered the most preferred prognostic system 3,11 . However, IPSS or IPSS-R does not include gene mutations that have been shown to be associated with disease prognosis in recent studies 4,7,8 . In studies by our group and others, certain gene mutations have been shown to coexist with specific chromosomal abnormalities 8,16 , though this does not mean that they can be equated, considering, for example, that 74.2% (142/192) of MDS patients with a normal karyotype exhibit at least one gene mutation. In addition, IPSS or IPSS-R analysis did not distinguish the survival curves for each group well in the present study, especially among lower-risk groups or higher-risk groups. Therefore, it is necessary to integrate gene mutations into the prognostic system. Mutation-based survival analyses showed that the combination of a group of significant high-risk markers (TP53, STAG2, DNMT3A, EZH2, RUNX1, ROBO1/2, SRSF2 and WT1) and the cumulative effects of mutations in 19 other genes predicted four subgroups with significantly different survival. Therefore, we established a new comprehensive IPSS-M or IPSS-RM system through integrating clinical and mutation parameters, which appeared to generate perfect stratification curves. Similarly, gene mutations were also included in a new AML-predicting model. The combination of a group of significant risk markers of AML transformation (DNMT3A, TP53, WT1, SRSF2, IDH1/2, STAG2 and ROBO1/2) and the cumulative effects of mutations in 19 other genes predicted three subgroups with significantly different AML transformation rates. The IPSS-M and IPSS-RM based on this model generated better-separated curves for leukemia-free survival than the primary system. Similar prognostic models were also reported in previous studies 8, 30 . In a recent study, Nassau A et al. established and validated a predictive model in treated MDS patients through incorporation of 62 mutated genes into IPSS-R. Integrated model also significantly improved the prognostic evaluation of MDS patients regardless of their initial or subsequent therapies. Specifically, EZH2 and TP53 were considered as independent factors predicting poor survival, which is consistent with our findings 31 . Further validating work on our risk model including more newly diagnosed MDS patients (over 300 cases) is being carried out. Validating model will be built according to the calculation mode from the training model. The concordance index will be used to compare the statistical power of the training with validating model. Based on the integration of mutations information from training cohort with validating cohort, independently prognostic factors would be screened to supplement into IPSS or IPSS-R, and improve their prognosis evaluation. In brief, the aim of construction of prognostic models is to supply better prognostic prediction, therapeutic interventions and our knowledge of MDS pathogenesis.
In summary, using a sequencing design involving 28 target genes, we were able to detect at least one gene mutation in over 77.2% of MDS patients. This design is simple and effective and can provide us with valuable information on the pathogenesis and prognostic evaluation of MDS as well as clinical interventions for MDS.