Introduction

Myeloid neoplasms are a heterogeneous group of clonal disorders of mutant hematopoietic stem and progenitor cells, characterized by diverse clinical presentations and outcomes [1, 2]. Mutations in the family of genes encoding the multimeric protein complex cohesin [3] are present across the spectrum of myeloid neoplasms, including myelodysplastic neoplasm (MDS), myeloproliferative neoplasms (MPN), MDS/MPN overlap syndromes, acute myeloid leukemia (AML), and chronic myeloid leukemia (CML) [4,5,6,7,8,9]. In human somatic cells, the proteins encoded by SMC1, SMC3, RAD21, and one of the paralogous genes STAG1 or STAG2 form a ring-like structure that wraps around the DNA and is loaded and unloaded by the modulator proteins MAU2-NIPBL and PDS5-WAPL, respectively [3]. The cohesin complex dynamically shapes genome architecture and regulates gene expression and DNA integrity, and loss-of-function mutations in the hematopoietic system have been shown to affect hematopoietic stem cell self-renewal and differentiation, leading to the development of myeloid neoplasia [10,11,12,13,14,15,16,17,18,19].

Cohesin mutations have been recognized as genetic drivers in myeloid malignancies for almost a decade; however, a detailed examination of cohesin gene-specific disease characteristics and prognostic impact is lacking. STAG2 mutations have previously been grouped with other secondary AML ontogeny-defining mutations [20, 21], and the 2022 European LeukemiaNet (ELN) guidelines classify them in the subgroup of adverse-risk AML [22]. However, it is unclear whether a secondary AML ontogeny attribution and adverse prognostic impact can also be assigned to the less frequent cohesin complex mutations in SMC1A, SMC3, RAD21, and PDS5B [21]. Furthermore, an independent prognostic value for any cohesin gene mutation has not previously been established [23, 24].

In the largest cohort of cohesin-mutated myeloid neoplasms reported to date, we characterized the incidence, clinical presentation, genomic landscape, and clinical outcomes of cohesin subunit mutations. We identified subunit-specific associations with disease ontogeny and prognosis, which point to distinct biology among these important genetic drivers.

Methods

Patient cohort

We analyzed 2 independent cohorts of patients as described below (Supplementary Fig. 1). Cohort 1 (“DFCI cohort”) included 5,191 patients seen at the Dana-Farber Cancer Institute (DFCI) from August 2014 to November 2021 with a confirmed hematologic malignancy as defined by the 2016 World Health Organization (WHO) classification [25] based on morphology and cytogenetic findings. For these cases, the WHO diagnoses were retrospectively translated to the WHO 2022 classification [2] based on published diagnostic criteria. The classification of all cohesin-mutant (MT) cases underwent independent hematopathology review. The subset of 759 patients with any detectable variant (defined by previously established allele frequency thresholds [26]) in a cohesin complex gene, regardless of disease entity, was extracted, and 311 (40.1%) of these patients were found to have a pathogenic cohesin mutation (as defined in the “Mutational profiling” section below). From these, 256 cohesin-MT AML, MDS, or MDS/MPN patients were compared to 3,109 wild-type (WT) cases in terms of demographics, clinical characteristics, and outcomes.

Cohort 2 (“MLL cohort”) included a total of 479 patients treated across Germany who underwent diagnostic workup of a suspected or confirmed myeloid malignancy at the Munich Leukemia Laboratory (MLL) between 2005 and 2022 and were found to have 1 or more pathogenic mutations in STAG2, SMC1A, SMC3, or RAD21 (PDS5B mutation status was not assessed). Diagnoses from peripheral blood and bone marrow were made based on cytomorphology, cytogenetics, and molecular genetics as previously described [27,28,29] in accordance with the 2016 WHO classification and reviewed by 2 board-certified hematopathologists. All cases were classified into specific subgroups according to the WHO 2022 classification. All 479 cases were used for the demographics and disease type analyses. Only 134/479 MLL cases (28%) had sufficient clinical annotation, with an available date of diagnosis and follow-up, for outcome analysis. In addition, a comparison cohort of 1,378 cases (838 MDS and 540 AML) without evidence of a cohesin mutation and with available follow-up data was selected from the MLL dataset based on the availability of comprehensive sequencing data and used to compare outcomes. All patients gave their consent for genetic analyses and the use of laboratory results for research purposes. The study was approved by the DFCI Institutional Review Board (IRB) and the MLL IRB.

Mutational profiling

For the DFCI cohort, cytogenetic data were extracted from clinical reports of karyotype and fluorescence in situ hybridization (FISH) generated by the DFCI clinical cytogenetics laboratory. Molecular data were obtained from reports of clinical next-generation sequencing (NGS) performed using the DFCI Rapid Heme Panel (RHP) at diagnosis and relapse, as previously reported [26]. Genes included in the RHP were selected based on their known or suspected involvement in the pathogenesis of myeloid or lymphoid cancers, or inherited or acquired bone marrow failure syndromes, and are listed in Supplementary Table 1. One of two versions of the RHP was used for sequencing analysis of study cases. Samples acquired between August 2014 and October 2019 were analyzed using RHP version 2, which was based on a custom amplicon-based approach; a minimum of 10 variant reads, or 5–9 variant reads at >33% allelic frequency, was required for mutation calling. RHP version 3 was used for samples acquired after November 2019; the revised platform used unique molecular identifiers (UMIs) for error suppression to allow reliable variant calls at a variant allelic fraction (VAF) of 0.01 or greater, requiring a minimum of 3 mutant reads. Median mutation coverage was 568x (95% confidence 100x-2036x). All truncation, frameshift, or splice site mutations in STAG2, RAD21, SMC1A, SMC3, and PDS5B were considered pathogenic, and all missense mutations were manually reviewed for non-germline allelic frequency and damaging PolyPhen score (>0.85), or for evidence in OncoKB [30] or COSMIC [31] for significant mutations in cancer [21]. Mutational and cytogenetic analyses for the MLL cohort were performed as previously reported and based on whole genome sequencing (WGS), with validation by targeted deep sequencing [32, 33]. A total of 763 samples were assayed by WGS and analyzed as described in previous reports [7, 34, 35]. There were 1157 cases assayed by targeted sequencing, which were analyzed during routine diagnostic workup or for research purposes [7]. WGS data confirmed all mutations detected by targeted NGS panels and were further consulted for completing the mutational analysis of the 73 genes.
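The read-support thresholds stated above for the two RHP versions can be written as simple predicates. The following is a minimal sketch for illustration only and is not the clinical pipeline: the class and function names are hypothetical, and for version 3 it assumes that the supplied read counts are already UMI-collapsed.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical container for the read support of one candidate variant."""
    variant_reads: int  # reads supporting the variant allele
    total_reads: int    # total read depth at the position

    @property
    def vaf(self) -> float:
        # Variant allelic fraction; 0.0 if the position has no coverage.
        return self.variant_reads / self.total_reads if self.total_reads else 0.0


def passes_rhp_v2(call: VariantCall) -> bool:
    """RHP v2 rule: >=10 variant reads, or 5-9 variant reads at >33% allelic frequency."""
    if call.variant_reads >= 10:
        return True
    return 5 <= call.variant_reads <= 9 and call.vaf > 0.33


def passes_rhp_v3(call: VariantCall) -> bool:
    """RHP v3 rule: VAF >= 0.01 with a minimum of 3 mutant reads.

    Assumption: read counts passed in are already UMI-collapsed.
    """
    return call.variant_reads >= 3 and call.vaf >= 0.01


if __name__ == "__main__":
    low_depth = VariantCall(variant_reads=6, total_reads=15)   # VAF 0.40
    low_vaf = VariantCall(variant_reads=6, total_reads=500)    # VAF 0.012
    print(passes_rhp_v2(low_depth), passes_rhp_v2(low_vaf))
    print(passes_rhp_v3(low_vaf))
```

Under these rules, a variant with 6 supporting reads at 1.2% VAF would fail the version 2 criteria but could be called on version 3, which is consistent with the lower detection limit described for the UMI-based platform.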

Transcriptomic analysis of BeatAML

Raw counts from patients with reported mutations [36] in STAG2, RAD21, SMC3 or SMC1A were normalized using DESeq2. Differentially expressed genes were called between any cohesin subunits using DESeq2 at an FDR < 5%. All of these genes were used for unsupervised k-means clustering, and the annotated cohesin mutations were superimposed on this clustering. Genes from each k-means cluster were queried against Metascape [37], and gene sets with FDR < 0.01 were plotted.
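For readers unfamiliar with count normalization, DESeq2 by default scales each sample by a median-of-ratios size factor. The sketch below illustrates that idea in plain Python on a toy matrix; it is not the DESeq2 implementation, the function names and input layout are hypothetical, and it assumes at least one gene has non-zero counts in every sample.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts (same gene order).
    Genes with a zero count in any sample are excluded from the reference.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples (None if any count is zero).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor = median ratio of a sample's counts to the per-gene reference.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide each sample's raw counts by its size factor."""
    sf = size_factors(counts)
    return {s: [c / sf[s] for c in counts[s]] for s in counts}


if __name__ == "__main__":
    # Toy data: sample B was sequenced twice as deeply as sample A.
    toy = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}
    print(size_factors(toy))
    print(normalize(toy))
```

In the toy example the two samples differ only in sequencing depth, so their normalized counts coincide for every gene that is expressed in both.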

Statistical analysis

All statistical analyses were conducted using R v4.2.1. A significance level α of 0.05 was used throughout. Normality was assessed using the Shapiro-Wilk test. If the normality assumption was met, a two-sided unpaired t-test was applied to compare continuous variables between 2 groups unless stated otherwise. Non-normally distributed continuous variables were compared between 2 groups using a Wilcoxon rank sum test. For multiple group comparisons, an ANOVA was first performed, followed by an emmeans test for the indicated comparisons using the rstatix package v0.7.1. Mutational co-occurrence was calculated as the pairwise odds ratio (OR) for any given gene between the respective cohesin-MT cases and WT cases. Statistical significance was derived from the Fisher exact test and adjusted for multiple testing using the Benjamini-Hochberg procedure. Outcome analyses were carried out using the Kaplan-Meier method stratified by the presence or absence of the respective mutation. Statistical comparison was conducted using Cox proportional hazards regression to estimate hazard ratios (HR) (coxph, survival package v3.2-11) and a two-sided log-rank test with default parameters using survfit. For multivariate modeling, only parameters that were significant in univariate analysis were carried forward into a Cox proportional hazards model, using transplantation as a time-dependent variable. Unless stated otherwise, only results that held significance in the merged datasets are reported in the main text; individual cohorts are presented in the supplement.
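The co-occurrence analysis described above combines three steps: an odds ratio from a 2×2 table, a two-sided Fisher exact test, and Benjamini-Hochberg adjustment across genes. The analyses in this study were run in R; the following standard-library Python sketch only illustrates the logic. The function names and table layout are hypothetical, and the sample odds ratio is used here for simplicity (an assumption; R's fisher.test reports a conditional maximum-likelihood estimate instead).

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for a 2x2 table.

    Hypothetical layout: a = cohesin-MT with co-mutation, b = cohesin-MT without,
    c = WT with co-mutation, d = WT without.
    """
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value: sum of all table probabilities that are
    no larger than the probability of the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x co-mutated cases in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up counts for illustration; not data from this study.
    print(odds_ratio(3, 1, 1, 3), fisher_exact_two_sided(3, 1, 1, 3))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

One 2×2 table is formed per gene and cohesin subunit, and the adjustment is then applied across the resulting list of p-values.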

Results

Cohesin-mutant hematologic malignancies have distinct disease characteristics

We investigated 2 large cohorts of patients diagnosed with a hematologic malignancy for the presence of a mutation in the cohesin complex (Supplementary Fig. 1). In total, we identified 790 patients with a pathogenic mutation in any of the cohesin complex genes (Fig. 1A). The incidence of cohesin mutation was 10% in MDS, 5% in MDS/MPN, and 8% in AML patients in the DFCI cohort (Supplementary Table 2). Mutations in different cohesin subunits were largely mutually exclusive, with only 12/790 (2%) cases characterized by mutations in more than 1 cohesin subunit, and they were spread throughout the coding sequence without any hotspots (Fig. 1B and Supplementary Fig. 2). STAG2 mutations were the most common and were present in 610 patients (77%), followed by RAD21 in 104 cases (13%), SMC3 in 26 cases (3%), SMC1A in 22 cases (3%), and PDS5B in 16 cases (2%). Frameshift indel mutations were the most frequent type of mutation for most cohesin genes except for STAG2, where nonsense mutations leading to a premature stop codon comprised more than 50% of all mutations (Supplementary Fig. 3A). Patients with mutations in the cohesin complex were diagnosed with AML, MDS, and MDS/MPN in 374, 351, and 63 cases, respectively (Table 1, Supplementary Fig. 3B). In the DFCI cohort, we identified 55 patients (17.7%) with a pathologist-validated non-myeloid hematologic malignancy, including indolent and high-grade lymphomas (chronic lymphocytic leukemia (n = 7), diffuse large B cell lymphoma (n = 7), multiple myeloma (n = 3), and acute lymphoblastic leukemia (n = 8)) (Supplementary Fig. 3C). In all subsequent analyses, we excluded cases with non-myeloid hematologic malignancies and focused on patients with AML, MDS, and MDS/MPN only.

Fig. 1: Molecular characterization of cohesin mutations in hematologic malignancies demonstrates subunit-specific differences.

A Oncoprint of all patients with cohesin mutations (DFCI and MLL cohorts combined), n = 790. B Lollipop plot panel of cohesin mutations for the combined cohort. C Pie charts of distribution of cohesin mutations across MDS, AML, and MDS/MPN for the combined cohort. D Box plot of the total number of pathogenic mutations identified by targeted sequencing of patients with cohesin-mutant MDS and AML at the time of diagnosis, stratified by cohesin status for the combined cohort. Wilcoxon test was used to determine significance.

Table 1 Patient and disease characteristics of cohesin-mutant versus cohesin-wild type MDS, MDS/MPN, and AML patients.

Patients were divided into a cohesin-WT (n = 4487) or cohesin-MT (n = 735) cohort. We then systematically compared the clinical and demographic features of these 2 cohorts. Cohesin-MT patients were older at the time of diagnosis (AML: 69 vs. 65 years; MDS: 73 vs. 69 years; p < 0.0001) and had different patterns of AML and MDS subtypes than their cohesin-WT counterparts. AML with myelodysplasia-related defining genetic abnormalities (AML-MR) was present in 73% of cohesin-MT compared to 34% of cohesin-WT cases (p < 0.001). Conversely, AML without genetically defined lesions (summarized as AML by differentiation) and AML with NPM1 mutation were more common among cohesin-WT than cohesin-MT cases (31% vs. 8.3%, p < 0.001; 14% vs. 11%, p = 0.14, respectively). Within MDS, the cohesin-MT cohort had a higher fraction of more advanced MDS diagnoses than the cohesin-WT cohort (MDS-IB1: 35% vs. 16%, p < 0.0001; MDS-IB2: 36% vs. 17%, p < 0.0001). Consistent with these findings, the fraction of patients with documented progression from MDS to AML was higher in MDS patients with cohesin mutations than in MDS patients without these mutations (32% vs. 21%, p = 0.005; data only available for the DFCI cohort). Notably, MDS with bi-allelic TP53 inactivation, del(5q), and SF3B1-associated MDS were nearly mutually exclusive with cohesin-mutant MDS (Table 1, Supplementary Table 2). These data demonstrate that cohesin mutations segregate with distinct clinical features linked to MDS and subsequent secondary AML.

Cohesin complex mutations have distinct clinical features and AML ontogeny

We next aimed to delineate differences among mutations of the cohesin complex components in patients with AML, MDS, and MDS/MPN overlap syndromes (Table 2, Supplementary Table 3). Given their high prevalence, our analysis focused on the comparison between cases with STAG2 versus RAD21 mutations, and examined the impact of SMC1A, SMC3, and PDS5B mutations as a group due to their significantly lower numbers (hereafter referred to as SMC1A/SMC3/PDS5B). We observed that patients with STAG2 mutations were older than patients with mutations in other cohesin genes at the time of AML diagnosis (70 vs. 64 vs. 64/57/50 years, p < 0.001, Supplementary Fig. 3D) but not at the time of MDS diagnosis (Supplementary Fig. 3E). We observed a significant difference in the distribution of cohesin subunit mutations, with 72.5% of all AML cases, but 87.4% of all MDS cases (p < 0.001) and 93% of MDS/MPN cases (p = 0.001) carrying a STAG2 mutation (Fig. 1C, Table 2). Furthermore, there was a significant difference in the AML ontogeny among different cohesin mutations. STAG2 mutations were almost exclusively associated with an AML-MR diagnosis (256/271, 94%), with rare cases of NPM1 and bi-allelic CEBPA (8/271 (3%) and 5/271 (1.8%), respectively) (Table 2). Conversely, patients with RAD21 and SMC1A/SMC3/PDS5B mutations presented more frequently with de novo AML with NPM1 mutations (27/69 (39%), 2/12 (17%), 3/11 (27%), and 2/6 (33%) cases, respectively) versus STAG2-mutant patients (p < 0.0001). Core binding factor leukemia t(8;21)/RUNX1::RUNX1T1 was found in 10/69 (14%) of RAD21-mutant patients and 2/29 (7%) of SMC1A/SMC3/PDS5B-mutant patients but in only 1/271 (0.4%) of STAG2-mutant patients (p < 0.0001 and 0.026, Table 2).

Table 2 Patient and disease characteristics of different cohesin complex mutations in MDS, MDS/MPN, and AML patients.

Consistently, RAD21 and SMC1A/SMC3/PDS5B mutations represented a significantly greater proportion of cohesin-mutant AML than MDS as compared to STAG2 mutations (Fig. 1C). This suggests that STAG2 mutations tend to be acquired at the MDS stage, and RAD21 and the other cohesin subunit mutations may be more likely acquired at the AML stage and lead to rapid leukemic transformation rather than a slower increase in blast count over time, as may be expected in MDS. Indeed, patients with RAD21 and SMC1A/SMC3/PDS5B mutations trended towards a higher percentage of blasts in their diagnostic AML bone marrow biopsy compared to patients with STAG2 mutations (median morphology-defined blast count of 47% for RAD21 vs. 28% for STAG2-mutant AML, p = 0.11, Supplementary Fig. 3F, data available for the DFCI cohort only).

To further investigate whether STAG2 mutations may be preferentially acquired at the MDS stage and lead to development of secondary AML, we extracted all patients with available longitudinal mutation data and identified 23 patients diagnosed with STAG2-mutant AML with at least one mutational assessment before AML diagnosis. The median time from the first mutational assessment to AML diagnosis was 13 months (range 6–53 months) (Supplementary Fig. 4A, B and 5A). Out of 23 cases, 17 patients (74%) had a STAG2 mutation that was detected prior to AML diagnosis, of which 13 patients (76%) were diagnosed with MDS, and 4 patients (24%) with MDS/MPN. There were 5/23 patients (22%) that showed the first emergence of a STAG2 mutation at AML diagnosis, and only 1/23 patients (4%) acquired the STAG2 mutation upon AML disease relapse. Furthermore, we observed a relatively stable STAG2 VAF leading up to AML diagnosis (Supplementary Fig. 4C, p = 0.85), suggesting that the STAG2-mutant clone had already dominated the bone marrow in non-leukemic cells before the AML diagnosis was made. These data collectively suggest that the clinical presentation of patients with mutations in different cohesin subunits is not uniform, with different mutations affecting disease biology and AML ontogeny in distinct ways. Furthermore, our data demonstrate that STAG2 mutations do not act as AML-defining lesions and are usually acquired at the MDS stage.

Cohesin complex mutations are associated with unique co-mutational, cytogenetic and transcriptional profiles

Having established a unique pattern of disease characteristics of cohesin mutations, we next examined the genetic characteristics associated with each mutation. To dissect the genetic makeup of STAG2 and other cohesin gene mutations, we first compared the total number of additional detected mutations. We observed that patients with STAG2 mutations had a higher number of co-mutations compared to cohesin-WT patients (median 4 vs. 3, p < 0.001), and all other cohesin mutations (median 4 vs. 3, p < 0.003, Fig. 1D, Supplementary Fig. 3H). Next, we analyzed differences in the co-mutational patterns between STAG2 and RAD21-mutant AML and MDS patients (Fig. 2, Supplementary Figs. 5 and 6). The most frequently co-occurring mutations with STAG2 were secondary type mutations, including mutations in ASXL1 (64%), SRSF2 (45%) and RUNX1 (37%). In contrast, RAD21 mutations were much more likely to co-occur with de novo or pan-AML mutations, including NPM1 (32%) and FLT3 (23%). We also observed t(8;21) in 14% of RAD21-mutated cases, with and without additional KIT mutations, but not in a single case of STAG2-mutant disease. By systematically comparing the co-mutational landscape of STAG2 mutations and other cohesin gene mutations (Fig. 3A), we found a significant positive enrichment of STAG2 being co-mutated with secondary ontogeny-defining mutations, including ASXL1, SRSF2, BCOR, and RUNX1 (OR = 5.6, 7.5, 2.5, 3.5; false discovery rate (FDR) = 4×10⁻²⁹, 5×10⁻³¹, 7×10⁻⁵, 2×10⁻¹³, respectively; Supplementary Table 4, Supplementary Fig. 7A-B). Conversely, mutations in NPM1 and DNMT3A were underrepresented in patients carrying a STAG2 mutation (OR = 0.7 and 0.5; FDR = 0.5 and 0.004, respectively). In contrast, the co-mutational pattern for RAD21, SMC3, and SMC1A was distinct from STAG2, and confirmed enrichment for de novo and pan-AML ontogeny-defining mutations NPM1 and FLT3 (OR = 4.4 and 2.3; FDR = 1×10⁻⁵ and 0.01, respectively). Interestingly, STAG2 and RAD21 mutations shared a near-mutual exclusivity with TP53 mutations (OR = 0.04–0.06, FDR < 0.001), which was evident for both MDS and AML cases (STAG2-MDS: OR = 0.07; STAG2-AML: OR = 0.05; RAD21-MDS: OR = 0; RAD21-AML: OR = 0.08).

Fig. 2: Co-mutational patterns suggest distinct AML ontogeny for STAG2 versus RAD21, SMC3, SMC1A and PDS5B mutations.

Oncoprint for MDS, AML, and MDS/MPN patients from the combined cohort. Cases with STAG2, RAD21, SMC3, SMC1A, and PDS5B mutations were sorted by co-mutational pattern based on disease ontogeny [21] and association with cohesin subunits. A 2% allelic frequency cutoff was used.

Fig. 3: Mutations in different cohesin subunits display unique mutational and chromosomal abnormality characteristics.

Balloon plot for relative enrichment of co-occurrence of cohesin subunit mutations with other myeloid driver mutations (A) and chromosomal aberrations (B). The cohesin-WT cohort was used as the reference to calculate enrichment, which is indicated as log2 odds ratio (OR). Combinations with q < 0.05 or 5% mutational frequency in the total cohort are shown in (A). Missing balloons indicate OR = 0. False discovery rate (FDR) is as indicated: ***<0.0001, **<0.001, *<0.05 and corresponds to dot size. C Gene expression heatmap of cohesin-mutant patients with annotated cohesin mutations and clustered by k-means (1328 genes). Samples were clustered by hierarchical clustering and annotated by cohesin mutation. Color indicates z-score-transformed CPM per gene.

We similarly observed a distinct pattern of co-occurring cytogenetic aberrations among cohesin-MT cases (Fig. 3B, Supplementary Fig. 7C, D). Trisomy 8 was enriched in STAG2-mutant versus cohesin-WT cases (OR = 10.6, FDR = 2.2×10⁻⁵⁰). In contrast, mutations in RAD21, SMC1A, and SMC3 were enriched for t(8;21) (OR = 134, 131, 37; FDR = 4.1×10⁻¹⁹, 3.8×10⁻⁵, 0.14, respectively). Complex karyotypes were less common among all cohesin-MT patients compared to cohesin-WT patients (STAG2: OR = 0.12, FDR = 2.1×10⁻²²; RAD21: OR = 0.24, FDR = 0.006), which would be expected given their near-mutual exclusivity with TP53 mutations.

Our genetic and cytogenetic analyses supported the hypothesis that STAG2- and non-STAG2- mutant myeloid diseases represent distinct biology and ontogeny. To investigate whether this was supported by distinct gene expression programs, we analyzed transcriptomic profiles of the cohesin-mutant AML cases in the BeatAML cohort [36]. Using unsupervised k-means clustering, we observed distinct transcriptional profiles of STAG2- and RAD21/SMC3/SMC1A-mutant cases (Fig. 3C). Gene set enrichment analysis (GSEA) highlighted differential expression of viral response and interferon signaling, as well as metabolic programs and extracellular matrix-associated pathways (Supplementary Fig. 8). In summary, the distinct co-mutational, cytogenetic and molecular landscapes of different cohesin mutations suggest unique patterns of disease development driven by different cohesin mutations.
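The heatmap in Fig. 3C is described as showing z-score-transformed counts per million (CPM) per gene. The sketch below shows one way such a transformation can be computed; it is an assumption-laden illustration and not the authors' plotting code. The function names are hypothetical, the sample standard deviation (n − 1) is assumed, and no log transformation is applied.

```python
from statistics import mean, stdev


def counts_per_million(sample_counts):
    """Scale one sample's raw counts to counts per million (CPM)."""
    total = sum(sample_counts)
    return [c * 1e6 / total for c in sample_counts]


def zscore_per_gene(expression):
    """Row-wise z-score.

    expression: list of gene rows, each a list of values across samples
    (at least 2 samples per row). Genes with zero variance are set to 0.
    """
    scaled = []
    for row in expression:
        mu, sd = mean(row), stdev(row)
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


if __name__ == "__main__":
    # Toy values for illustration only.
    print(counts_per_million([10, 30, 60]))
    print(zscore_per_gene([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]))
```

Scaling each gene to mean 0 and unit variance puts genes with very different expression levels on a common color scale, which is why the transformation is applied per gene rather than per sample.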

STAG2 mutations have prognostic impact in MDS and AML

Having established unique disease and genetic characteristics for different cohesin mutations, we next assessed their impact on clinical outcomes. We conducted independent analyses of overall survival (OS) and progression-free survival (PFS) in MDS and AML. The median follow-up time for the entire patient cohort was 73.8 months (95% confidence interval (CI) = 69.6–80.5 months) for MDS, and 49.4 months (95% CI = 45.4–54.4 months) for AML. We first compared outcomes for STAG2-mutant MDS to cohesin-WT MDS. STAG2 mutations conferred poor risk, with a median OS of 30.3 versus 58.9 months (HR: 1.44, 95% CI 1.17–1.78, p < 0.001, combined cohort, Supplementary Fig. 9). Given our observations of near-mutual exclusivity of cohesin and TP53 mutations (Figs. 2 and 3A), and the well-established association of TP53 mutations with poor outcomes [38, 39], we next compared the OS and AML-free survival of patients with STAG2-mutant MDS to TP53-mutant MDS and cohesin/TP53-WT MDS. We observed a significantly worse OS of STAG2-mutant MDS compared to cohesin/TP53-WT MDS (HR = 1.73, 95% CI = 1.4–2.14, median OS 30.3 vs. 69.8 months, p < 0.001, Fig. 4A), and a similar risk of leukemic transformation in STAG2- and TP53-mutant MDS cases (median AML-PFS of 15.4 months for STAG2 and 12.1 months for TP53, p = 0.3, DFCI cohort only, Fig. 4B). In a multivariable regression analysis to ascertain the effect of mutations, cytogenetics, diagnostic blast count, and age at the time of MDS diagnosis, the presence of a STAG2 mutation did not reach significance as an independent predictor of MDS outcome (Fig. 4E).
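The median OS values quoted above are read from Kaplan-Meier curves. As an illustration of how such an estimate is obtained, the sketch below implements the product-limit estimator on made-up follow-up data; it is not the R survfit code used in the study, and the function names and toy values are hypothetical. The median is taken as the first event time at which the estimated survival drops to 0.5 or below (a simplification: survfit handles the case of survival exactly equal to 0.5 differently).

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times:  follow-up time for each patient
    events: 1 if the patient died at that time, 0 if censored
    Returns [(time, survival_probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First event time with survival <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Toy follow-up in months; not data from this study.
    months = [5, 10, 15, 20, 25, 30]
    died = [1, 0, 1, 1, 0, 1]
    km = kaplan_meier(months, died)
    print(km)
    print(median_survival(km))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate differs from a simple proportion of deaths at each time point.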

Fig. 4: Prognostic impact of STAG2 mutations in MDS and AML as secondary ontogeny mutations.

Survival analysis using the Kaplan-Meier method and log-rank test for overall survival in MDS (A), AML progression-free survival in MDS (DFCI cohort only) (B), and AML survival stratified by cohesin subunit mutational status and cohesin-WT group by AML-MR or AML-non-MR (C) and TP53 mutation (D). HR = Hazard ratio. Statistical significance was determined using the log-rank test. E Forest plot for multivariate prognostic impact of STAG2 mutations for MDS OS and (F) for AML OS.

For our outcome analysis in AML, we first compared STAG2-mutant AML to cohesin-WT cases separated into cohesin-WT AML associated with myelodysplasia-related changes (hereafter referred to as “AML-MR”) and cohesin-WT AML not associated with myelodysplasia-related changes (hereafter referred to as “AML-non-MR”) according to the WHO 2022 classification. We observed that STAG2-mutant AML had significantly worse OS than AML-non-MR (HR = 0.62, 95% CI = 0.5–0.76, median OS 16 vs. 35 months, p < 0.001), and only a modestly better OS than AML-MR (HR = 1.43, 95% CI = 1.16–1.77, median OS 10.3 months, p < 0.001, Fig. 4C). Given the near-mutual exclusivity of STAG2 and TP53 mutations in AML, we performed a subset analysis of cohesin-WT AML excluding TP53-mutant cases. This removed most of the difference and showed very similar, and numerically even favorable, outcomes between STAG2-mutant AML and AML-MR without TP53 mutations (HR = 0.80, 95% CI = 0.64–0.99, median OS 13.6 vs. 11.8 months, p = 0.04, Fig. 4D). This poor outcome was also evident for the rare STAG2-mutant cases that were not diagnosed as AML-MR because of competing classifying mutations (e.g., NPM1 and/or CEBPA, Supplementary Fig. 10A, B).

Importantly, this pattern was distinct from the outcomes of RAD21-mutant AML, which were almost identical to those of AML-non-MR and significantly better than those of STAG2-mutant AML (HR = 0.56, 95% CI = 0.34–0.93, median OS 48 vs. 16 months, p = 0.024) (Fig. 4C, Supplementary Fig. 11A, B). This effect was most apparent in the DFCI cohort, although we observed the same trend in the MLL cohort, with differences likely being driven by intrinsic variability in treatment and selection biases between DFCI and MLL (Supplementary Fig. 11C, D). Allogeneic stem cell transplantation cases accounted for 41% of DFCI but only 12% of MLL cases (Supplementary Table 5), and response rates to induction therapy (Supplementary Table 6) were similar between groups. The effects of STAG2 and RAD21 mutations on OS remained significant when censored for allogeneic stem cell transplantation (Supplementary Fig. 10B, Supplementary Fig. 12), although neither one reached statistical significance as an independent predictor of outcome in a multivariable regression analysis of known clinical co-variables (age and transplantation as time-dependent variables) and co-mutation with ASXL1, SRSF2, RUNX1 and TP53 (Fig. 4F).

In summary, our findings suggest that only STAG2 mutations confer a negative impact on AML outcomes, which is attributed to secondary ontogeny and a genetic makeup of preceding myeloid dysplasia. Notably, the prognostic impact of RAD21 mutations is shared with de novo AML.

Discussion

Our study establishes a role for different cohesin subunit mutations in distinct subtypes of MDS and AML, which has significant prognostic implications, expands our current understanding of this important group of driver genes, and points to unique biology of different cohesin subunits. We assembled and analyzed the largest existing cohort of 790 patients with cohesin-mutant hematologic malignancies and demonstrated that mutations in STAG2 and RAD21 shape the presentation and outcome of AML in unique ways, which can be explained by distinct co-mutational patterns and AML ontogeny. Furthermore, the size of our cohort strongly supports the conclusion that this prognostic impact is driven by disease ontogeny in both MDS and AML, which was under-appreciated in significantly smaller cohorts [23, 24].

We demonstrated that STAG2 mutations are associated with secondary AML ontogeny, are usually acquired at MDS or MDS/MPN stage, and co-occur with other secondary ontogeny-defining mutations, such as ASXL1, SRSF2, and RUNX1. Our data are in agreement with initial reports identifying STAG2 as one of the eight secondary AML ontogeny defining lesions [21], as well as the 2022 International Consensus Classification which uses STAG2 as an AML-MR defining mutation for classification of AML [40, 41]. In contrast, we found that RAD21 mutations are associated with de novo AML, are rarely preceded by MDS or MDS/MPN, and are associated with de novo or pan AML molecular abnormalities, such as t(8;21), FLT3, and NPM1 mutations [24]. These differences are also reflected in distinct gene expression patterns between STAG2- and non-STAG2-mutant AML, and the unique co-mutational and cytogenetic patterns likely contribute to distinct biological trajectories of leukemic evolution and warrant further investigation in preclinical models.

Importantly, clinical outcomes reflect the different ontogeny associated with STAG2 and non-STAG2 cohesin mutations, including RAD21, SMC1A, and SMC3 mutations. While we did not find them to be independent prognostic markers, the distinct pattern of outcomes is reflective of their different disease ontogeny. We observed that STAG2 mutations conferred overall survival similar to AML-MR, while RAD21-mutant cases displayed overall survival similar to AML-non-MR cases. In addition, SMC1A and SMC3-mutant cases similarly share clinical and molecular features with AML-non-MR. We therefore propose that RAD21, SMC1A, and SMC3-mutant AML should be considered as AML-non-MR. Our data demonstrate that within the family of cohesin complex mutations, only STAG2 mutations are indicative of secondary ontogeny and are associated with worse clinical outcomes.

We observed that STAG2-mutant cases may have higher numbers of co-mutations (as determined by targeted sequencing panels, with their inherent limitations), which could be a clinical proxy of an intrinsically increased genomic instability. This is in line with several prior studies demonstrating that STAG2 deficiency is coupled with replication fork stalling, impaired DNA damage repair, and accumulation of DNA damage [14, 42,43,44]. These findings have contributed to the therapeutic window for inhibitors of poly(ADP-ribose) polymerase (PARP) [14], which are currently being investigated in a proof-of-concept clinical trial of single-agent and combination treatment with hypomethylating agents for cohesin-mutant AML and MDS (Clinicaltrials.gov identifier NCT03974217). Furthermore, the association of STAG2 mutations with trisomy 8 is intriguing considering recent findings suggesting that RAD21 is the driver of chromosome 8 gain to mitigate replication stress in Ewing sarcoma [45], a disease characterized by frequent STAG2 mutations. Currently, our data do not allow us to predict the order of STAG2 and trisomy 8 acquisition, or whether trisomy 8 may affect response to DNA damage repair inhibitors or replication fork stressors, such as PARP inhibitors or hydroxyurea.

The strengths of our study include the large, well-annotated patient cohorts that were representative of clinical practice in Europe and the United States, although such retrospective analyses have an inherent selection bias found in both cohorts to different extents. We therefore note that the retrospective nature of this approach limits the generalizability of our results. Furthermore, the numbers of SMC1A and SMC3-mutant cases in our cohort were significantly smaller than the number of STAG2 and RAD21-mutant cases, which may limit some of our conclusions about mutations in these cohesin subunits.

In summary, our study contributes to a better understanding of the distinct effects of cohesin gene mutations in myeloid malignancies. Although the biology underlying these differences is not yet known, our work supports the notion that not all cohesin subunit mutations were created equal and that the distinct pattern of cohesin mutations across cancer types may be driven by the unique biology of cohesin subunits.