Introduction

Recurrent gene fusions involving EWSR1/FUS with members of the cAMP response element binding protein (CREB) family (ATF1, CREB1 and CREM) are shared among multiple tumor types spanning a wide clinicopathologic spectrum. Despite sharing related gene fusions, members of the EWSR1::CREB family of translocation-associated tumors exhibit significantly different clinicopathologic characteristics. The prototypic example is angiomatoid fibrous histiocytoma (AFH) vs clear cell sarcoma (CCS), two morphologically distinct tumors: the former is mostly associated with benign behavior, whereas the latter is an aggressive sarcoma with high metastatic potential and poor outcome, as illustrated by the survival analysis of our cohort. Clear cell sarcoma-like tumor of gastrointestinal tract (GICCS, also known as gastrointestinal neuroectodermal tumor) and hyalinizing clear cell carcinoma of salivary gland (HCCC) had intermediate overall survival relative to AFH and CCS.

To elucidate the molecular mechanisms underlying their differences, we performed a comprehensive genomic analysis of fusion transcript variants, secondary recurrent genetic alterations (mutations, copy number alterations), gene expression and methylation profiles across a large cohort of tumor types defined by EWSR1/FUS::CREB gene fusions. Specifically, the tumors included in this study encompassed AFH, CCS, GICCS, HCCC, clear cell odontogenic carcinoma (CCOC), malignant epithelioid neoplasm with predilection for mesothelial-lined cavities (ME), mesothelioma (Meso), myxoid mesenchymal tumor (MMT), and primary pulmonary myxoid sarcoma (PPMS).

Materials and methods

Case selection and study cohort

After approval from the Institutional Review Board, cases were identified from the Memorial Sloan Kettering Cancer Center (MSKCC) surgical pathology archives, or from collaborating institutions, based on tumor types and/or presence of EWSR1/FUS::ATF1/CREB1/CREM fusions. The diagnosis of all 137 cases included molecular confirmation of both fusion partners: 37 cases by fluorescence in situ hybridization, 24 cases by reverse transcription PCR (RT-PCR), 29 cases by Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT) only, 10 cases by MSK-Fusion only, 10 cases by both MSK-IMPACT and MSK-Fusion, 21 cases by TruSight RNA Fusion Panel (Illumina, San Diego, CA), with the remaining cases based on NGS testing performed by referring institutions. The meta-analysis of the published literature was based on an exhaustive literature search on any fusions or gene rearrangements reported in all of the listed entities in Supplementary Table 1 that we could identify on PubMed.

DNA seq and RNA seq

Detailed descriptions of the workflow and data analysis for MSK-IMPACT, a hybridization capture-based targeted DNA NGS assay for solid tumors, and for MSK-Fusion, an amplicon-based targeted RNA NGS assay using the Archer™ FusionPlex™ standard protocol, have been published previously1,2.

850k methylation array

Details of the methylation array protocol were described previously3. Briefly, for each sample, 250 ng of input DNA was used for bisulfite conversion (EZ DNA Methylation Kit; Zymo Research; catalog number D5002), followed by an FFPE restoration step using the Infinium HD FFPE DNA Restore Kit (Illumina; catalog number WG-321-1002). All samples were processed on the Infinium MethylationEPIC 850k BeadChip array and scanned using the Illumina iScan. Each CpG site interrogated by the Infinium array was identified by a unique cg identifier in the format of cg#, where # is a number. The methylation level for each CpG site was quantified using β values (continuous values between zero and one), calculated as the ratio of the methylated signal to the total signal plus an offset. 850k methylation array profiling was performed on a total of 80 samples: 7 AFH, 4 CCS, and 8 GICCS were compared to 51 soft tissue tumors of various histotypes (4 angiosarcomas, 27 gastrointestinal stromal tumors, 1 HCCC, 1 ME, 3 Meso, 11 paragangliomas, 4 small blue round cell tumors) and 10 normal tissues (8 human peripheral white blood cell samples, 2 normal adrenal medulla). For differential methylation analysis by t test, a minimum cutoff of log2FC (fold change) >1.0 and FDR < 0.01 was used, focusing on the comparison of the 7 AFH against all other samples and of the 4 CCS against all other samples. Unsupervised clustering analysis was performed by the t-distributed stochastic neighbor embedding (t-SNE) method using the aforementioned samples and additional raw IDAT files downloaded from the Heidelberg sarcoma methylation classifier reference cohort4 (see Supplementary Fig. 3 for sample and method details).
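The β value definition above (methylated signal divided by total signal plus an offset) can be illustrated with a minimal sketch. The offset of 100 is an assumption (a commonly used default); the source does not state the value used. The function name, probe labels, and intensities are hypothetical and are not study data.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Return the methylation beta value for one CpG probe.

    beta = M / (M + U + offset), where M and U are the methylated and
    unmethylated signal intensities. The default offset of 100 is an
    assumption, not a value taken from the study.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("signal intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities for three probes (illustrative only).
probes = {
    "cg_example_1": (9000.0, 900.0),   # mostly methylated
    "cg_example_2": (500.0, 9400.0),   # mostly unmethylated
    "cg_example_3": (0.0, 0.0),        # no signal; the offset avoids division by zero
}
betas = {name: beta_value(m, u) for name, (m, u) in probes.items()}
```

The offset keeps the ratio defined when both intensities are very low, at the cost of pulling β slightly toward zero for weak probes.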

Sarcoma classification by DNA methylation profiling

Details of the DNA methylation-based machine learning sarcoma classification algorithm were described in Koelsche et al.4. This random forest-based machine learning algorithm was developed at the German Cancer Research Center (DKFZ) in Heidelberg, Germany. Briefly, the method defines 62 methylation classes based on a reference cohort of 1077 samples encompassing a broad range of sarcomas. The classifier quantifies the confidence of the sample’s assigned methylation class using a calibrated score between 0 and 1. The sum of all calibrated scores across all methylation classes is 1.0. A confident match is generally considered >0.9 and a poor match <0.54. The 22 cases analyzed with the DNA methylation classification algorithm comprised 6 AFH, 4 CCS, 8 GICCS, 1 ME, and 3 Meso among the 80 samples that underwent methylation profiling.
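How a set of calibrated scores is read can be sketched as follows. This is only an interpretation helper for classifier output, not the DKFZ classifier itself. It uses the >0.9 threshold for a confident match stated above; the function name, class labels, and scores are hypothetical.

```python
def top_methylation_class(scores: dict[str, float], confident: float = 0.9) -> tuple[str, float, bool]:
    """Return (best_class, best_score, is_confident) from calibrated scores.

    The scores are expected to sum to 1.0 across all methylation classes,
    as described for the classifier output.
    """
    total = sum(scores.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"calibrated scores should sum to 1.0, got {total}")
    best_class = max(scores, key=scores.get)
    best_score = scores[best_class]
    return best_class, best_score, best_score > confident


# Hypothetical score vectors, collapsed to three classes for brevity.
confident_call = top_methylation_class({"CCS": 0.99, "AFH": 0.005, "other": 0.005})
uncertain_call = top_methylation_class({"AFH": 0.75, "DSRCT": 0.15, "other": 0.10})
```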

Affymetrix microarray gene expression analysis

Details of the microarray protocol were described previously5,6. RNA was isolated using RNAwiz RNA isolation reagent (Ambion) and run through a column with RNase-free DNase (Qiagen). Ten micrograms of labeled and fragmented cRNA were then hybridized onto a Human Genome U133A expression array (Affymetrix, Santa Clara, CA). Post-hybridization staining, washing, and scanning were done according to the manufacturer's instructions (Affymetrix). The raw expression data were derived using the Affymetrix Microarray Analysis 5.0 (MAS 5.0) software. The data were normalized using a scaling target intensity of 500 to account for differences in the global chip intensity. The expression values were log2-transformed. Affymetrix U133A gene expression array analysis was performed on a total of 58 samples: 3 AFH, 4 CCS, and 1 GICCS were compared to 44 soft tissue tumors of various histotypes (3 adult fibrosarcomas, 5 angiosarcomas, 3 leiomyosarcomas, 10 gastrointestinal stromal tumors, 3 myxoid liposarcomas, 6 paragangliomas, 4 small blue round cell tumors, 4 solitary fibrous tumors, 3 synovial sarcomas, 3 undifferentiated pleomorphic sarcomas) and 6 normal tissues (adrenal gland, brain, kidney, small intestine, stomach, testis). For differential gene expression analysis by t test, a minimum cutoff of log2FC (fold change) >1.0 and FDR-adjusted p < 0.01 was used. We compared one histotype against all other samples in each analysis, for example, CCS (4 cases) vs all others (54 cases) in one analysis, and AFH (3 cases) vs all others (55 cases) in a different analysis. Unsupervised hierarchical clustering was performed using the pheatmap R package version 1.0.12 with Ward’s linkage and Euclidean distance.
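The cutoff step described above can be sketched as follows. This is an illustrative outline only. The source states that an FDR-adjusted p value was used but does not name the adjustment procedure, so the Benjamini-Hochberg method is assumed here. The raw p values are taken as already computed by the t test, only the upregulated direction is shown, and all gene names and numbers are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_upregulated(genes, log2_fc, p_values, fc_cutoff=1.0, fdr_cutoff=0.01):
    """Keep genes with log2FC > fc_cutoff and adjusted p < fdr_cutoff."""
    adjusted = benjamini_hochberg(p_values)
    return [g for g, fc, q in zip(genes, log2_fc, adjusted)
            if fc > fc_cutoff and q < fdr_cutoff]


# Hypothetical one-vs-rest comparison (illustrative values only).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2_fc = [2.0, 3.0, 0.5, 4.0]
raw_p = [0.001, 0.02, 0.0005, 0.5]
hits = select_upregulated(genes, log2_fc, raw_p)
```

In this toy input only GENE_A passes both filters: GENE_B and GENE_D fail the adjusted p cutoff and GENE_C fails the fold-change cutoff.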

Integration of gene expression and methylation analysis

First, we performed differential gene expression and differential methylation analysis by setting a false discovery rate (FDR) adjusted p value of 0.01 and a minimum log2FC (fold change) of 1.0 for t test, comparing one histotype against all other tumors each time (e.g., CCS vs all others, AFH vs all others). Thereafter, for integration of transcriptomic and methylation data, we matched all the genes that were both differentially expressed based on log2FC >1.0 and FDR < 0.01 and differentially methylated based on log2FC > 2.0 and FDR < 0.01 for the CCS vs all others and AFH vs all others comparisons. Out of the 3 AFH, 4 CCS and 1 GICCS on the Affymetrix U133 microarray, 1 AFH and 2 CCS did not overlap with the samples used for the methylation array.
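The matching step can be sketched as a simple join between the two filtered result sets. This is a sketch under stated assumptions: both inputs are taken to have already passed the FDR < 0.01 filter, hypomethylation is represented as a negative methylation log2FC below the negative of the cutoff, and the probe-to-gene assignment is supplied by the caller. All identifiers and values are hypothetical.

```python
def upregulated_and_hypomethylated(expr_log2fc, probe_hits, expr_cutoff=1.0, meth_cutoff=2.0):
    """Map each upregulated gene to its hypomethylated CpG probes.

    expr_log2fc : dict of gene -> expression log2FC (histotype vs all others)
    probe_hits  : iterable of (cg_id, gene, methylation log2FC)
    """
    matched = {}
    for cg_id, gene, meth_log2fc in probe_hits:
        upregulated = expr_log2fc.get(gene, 0.0) > expr_cutoff
        hypomethylated = meth_log2fc < -meth_cutoff
        if upregulated and hypomethylated:
            matched.setdefault(gene, []).append(cg_id)
    return matched


# Hypothetical inputs (not study data).
expression = {"GENE_A": 2.5, "GENE_B": 0.4, "GENE_C": 3.0}
probes = [
    ("cg_h1", "GENE_A", -2.6),  # upregulated and hypomethylated: kept
    ("cg_h2", "GENE_A", -1.0),  # methylation change below the cutoff
    ("cg_h3", "GENE_B", -3.0),  # hypomethylated but not upregulated
    ("cg_h4", "GENE_C", 2.4),   # upregulated but hypermethylated
]
integrated = upregulated_and_hypomethylated(expression, probes)
```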

Results

Clinicopathologic summary

A total of 137 cases were identified [76 females, 61 males, mean age 37 (range 2–86)], including: 40 CCS (29%), 36 AFH (26%), 20 GICCS (15%), 14 ME (10%), 10 HCCC (7%), 8 Meso (6%), 5 MMT (4%), 3 PPMS (2%), and 1 clear cell odontogenic carcinoma (CCOC) (1%) (Fig. 1A). The mean ages in HCCC, PPMS and CCOC were higher than those in AFH, CCS, GICCS, MMT, and ME (Table 1). As expected, the primary sites were predominantly soft tissue for AFH and CCS, gastrointestinal tract/pelvis for GICCS, brain for MMT, thoracic or abdominopelvic cavities for Meso and ME, lung for PPMS, and major and minor salivary glands for HCCC.

Fig. 1: Distribution of tumor types and EWSR1/FUS fusion partners.

A Study cohort showing number of cases and percentages by tumor type. B Distribution of EWSR1/FUS fusion partners (ATF1, CREB1, CREM) by diagnosis (number of cases for each histotype indicated in parentheses). Abbreviations: AFH angiomatoid fibrous histiocytoma, CCS clear cell sarcoma, CCOC clear cell odontogenic carcinoma, GICCS clear cell sarcoma-like tumor of gastrointestinal tract, HCCC hyalinizing clear cell carcinoma of salivary gland, ME malignant epithelioid neoplasm with predilection for mesothelial-lined cavities, Meso mesothelioma, PPMS primary pulmonary myxoid sarcoma.

Table 1 Clinicopathologic summary of the study cohort.

Fusion types and transcript variants by diagnosis

The distribution of the EWSR1/FUS fusion partners, ATF1, CREB1, and CREM, was significantly different across tumor types (chi-square P < 0.0001) (Fig. 1B). Specifically, EWSR1::ATF1 fusions were the only fusion type in HCCC (100%) and the predominant fusion type in CCS (85%) and Meso (88%); EWSR1::CREB1 fusions were the only fusion type in PPMS (100%) and the predominant fusion type in AFH (60%); EWSR1/FUS::CREM fusions were the predominant fusion type in ME (86%). CREB1 and CREM fusions were equally distributed in MMT (40% each). ATF1 and CREB1 fusions were equally distributed in GICCS. The single case of CCOC had an EWSR1::ATF1 fusion. Of the 137 cases, only 5 (4%) harbored FUS fusions: four were FUS::CREM fusions in ME, and one was a FUS::ATF1 fusion in a Meso.

The exon usage of the fusion transcript variants for each tumor type was derived from MSK-Fusion and/or MSK-IMPACT testing and was available in 48 cases (8 AFH, 18 CCS, 9 GICCS, 7 HCCC, 3 Meso, 1 PPMS, 2 ME) (Table 2). The predominant fusion transcript variants were EWSR1ex8::ATF1ex4 in CCS and GICCS; EWSR1ex7::ATF1ex5 and EWSR1ex7::CREB1ex7 in AFH; EWSR1ex11::ATF1ex3 in HCCC; and FUSex8::CREMex5/7 in ME (Fig. 2). Supplementary Table 1 summarizes the CREB family fusion variants of various tumor types derived from our meta-analysis of published studies in comparison to those detected in the current cohort.

Table 2 Distribution of the most prevalent fusion transcript variants by exon usage in the current study.
Fig. 2: Schematics of predominant fusion transcript variants, for AFH, CCS, GICCS, HCCC, and ME. Exon numbers were based on canonical transcripts for each gene.

Percentage indicates frequency of the fusion transcript variant within the corresponding histotype subgroup. RefSeq accession number: ATF1 (NM_005171); CREB1 (NM_134442); CREM (NM_181571); EWSR1 (NM_005432); FUS (NM_004960). Abbreviations—AFH angiomatoid fibrous histiocytoma, CCS clear cell sarcoma, GICCS clear cell sarcoma-like tumor of gastrointestinal tract, HCCC hyalinizing clear cell carcinoma of salivary gland, ME malignant epithelioid neoplasm with predilection for mesothelial-lined cavities, Meso mesothelioma.

Clinically significant recurrent genetic alterations

Thirty-nine cases [6 AFH, 14 CCS, 9 GICCS, 5 HCCC, 3 Meso, 1 PPMS, 1 clear cell odontogenic carcinoma (CCOC)] were analyzed by MSK-IMPACT. Only clinically significant variants with OncoKB annotations (Chakravarty 2017) (or known recurrent hotspots) and secondary recurrent genetic alterations (events occurring more than once in our cohort) were included. Variants of unknown significance were excluded. Of the 39 cases that underwent targeted NGS testing, 18 (46%) had OncoKB mutations or copy number alterations (29 secondary genetic events in total), of which 15 (52%) were recurrent. Specifically, TERT promoter hotspot mutations (n = 5) and CDKN2A X51_splice and P81Lfs*30 mutations (n = 2) were mutually exclusive and identified in CCS only. Other secondary recurrent genetic alterations identified were: TP53 R248Q and T155Pfs*15 mutations (n = 2, 1 CCS, 1 GICCS), 9p21.3 (CDKN2A/CDKN2B) copy number loss (homozygous deletion) (n = 4, 2 AFH, 1 CCS, 1 HCCC), and DIS3 D479G and D488N mutations (n = 2, both GICCS) (Fig. 3A). No secondary recurrent genetic alterations were identified in any of the 3 Meso, 1 PPMS, or 1 CCOC. The type of secondary recurrent genetic alterations did not correlate with the EWSR1/FUS fusion partner type (Supplementary Fig. 1).
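The counts and percentages in this paragraph can be cross-checked with a short tally. The numbers below are taken directly from the text; the variable names are illustrative.

```python
# Recurrent secondary events as reported in the paragraph above.
recurrent_events = {
    "TERT promoter hotspot mutation": 5,
    "CDKN2A mutation": 2,
    "TP53 mutation": 2,
    "9p21.3 (CDKN2A/CDKN2B) homozygous deletion": 4,
    "DIS3 mutation": 2,
}
cases_sequenced = 39   # cases analyzed by MSK-IMPACT
cases_altered = 18     # cases with OncoKB mutations or copy number alterations
total_events = 29      # secondary genetic events in total

recurrent_total = sum(recurrent_events.values())
pct_cases_altered = round(100 * cases_altered / cases_sequenced)
pct_events_recurrent = round(100 * recurrent_total / total_events)
```

The tally reproduces the reported figures: 15 recurrent events, 46% of sequenced cases altered, and 52% of events recurrent.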

Fig. 3: Recurrent genomic alterations in AFH, CCS, GICCS, and HCCC.

A Recurrent genomic alterations identified by MSK-IMPACT, including OncoKB mutations and copy number alterations84, in 6 AFH, 14 CCS, 9 GICCS, and 5 HCCC. Only genomic alteration events occurring >1 were included. B Presence of TERT, CDKN2A, and CDKN2B alterations in AFH with or without metastatic disease. C Presence of TERT, CDKN2A, CDKN2B and TP53 alterations in living vs deceased CCS patients. Data generated from cBioPortal and visualized using OncoPrint85. Abbreviations—AFH angiomatoid fibrous histiocytoma, CCS clear cell sarcoma, GICCS clear cell sarcoma-like tumor of gastrointestinal tract, HCCC hyalinizing clear cell carcinoma of salivary gland.

Interestingly, CDKN2A/CDKN2B homozygous deletions in AFH (n = 2, 33%) were found exclusively in metastatic cases, whereas the remaining AFH cases without CDKN2A/CDKN2B alterations were non-metastatic (Fig. 3B). On the other hand, CCS cases with TERT promoter mutations or CDKN2A loss-of-function mutations (frameshift and splice site mutations) (n = 7, 50%) had significantly decreased overall survival (Mantel-Haenszel chi-square P = 0.0196) (Fig. 3C), with a median survival of 5.13 months vs 22.85 months in non-altered CCS cases (n = 7, 50%). The presence of DIS3 mutations was not correlated with metastatic or survival status in GICCS.
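For readers unfamiliar with the test, a two-group log-rank (Mantel-Haenszel) chi-square comparison of altered vs non-altered cases can be sketched as below. This is a generic textbook formulation with 1 degree of freedom, not the software or exact procedure used in the study. The follow-up times are hypothetical and do not reproduce the reported P value.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Return (chi_square, p_value) for a two-group log-rank test (1 df).

    times  : follow-up in months
    events : 1 if death was observed, 0 if censored
    Raises ZeroDivisionError if the variance is zero (e.g. no events).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e == 1}):
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group B
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical follow-up data in months (illustrative only).
altered = ([1, 2, 3], [1, 1, 1])
non_altered = ([10, 11, 12], [1, 1, 1])
chi2, p = logrank_two_groups(*altered, *non_altered)
```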

Methylation and gene expression correlation

Gene expression profiling was performed on the Affymetrix U133A expression array comparing 3 AFH, 4 CCS, and 1 GICCS cases to a group of 44 soft tissue tumors of various histotypes and 6 normal tissue samples (Supplementary Fig. 2 and Supplementary Table 2). Methylation profiling was performed comparing 7 AFH, 4 CCS, 4 GICCS to a group of 51 soft tissue tumors of various histotypes and 10 normal tissue samples on the 850k methylation array (Supplementary Table 3). The goal was to identify correlates between differential gene expression (1.5 log2 FC, FDR 0.01) and differential methylation (4 log2 FC and FDR 0.01) for EWSR1::ATF1-rearranged CCS and EWSR1::CREB1-rearranged AFH, respectively, in relation to other tumor types. Gene expression profiling revealed upregulation of PMP22, MITF, SLC7A5, CDH19, WIPI1, FYN, PARVB, and PFKP in CCS but not AFH, and upregulation of SGK1, S100A4, XAF1 and LY96 expression in AFH but not CCS. Additional methylation data from the study by Koelsche et al.4 (8 AFH, 7 CCS, and soft tissue tumors of various histotypes and normal tissue samples) were retrieved and included in unsupervised t-SNE clustering analysis. Unsupervised clustering of methylation profiles showed that CREB family translocation tumors (AFH, CCS, GICCS, Meso) each formed tight, distinct clusters that lay near each other (Supplementary Fig. 3 and Supplementary Table 3).

Thereafter, differentially expressed genes were matched to CpG sites based on chromosomal locations. We matched all the genes that were both differentially expressed based on log2FC > 1.0 and FDR < 0.01 and differentially methylated based on log2FC > 2.0 and FDR < 0.01. We focused on upregulated genes with corresponding hypomethylation. Our analyses revealed genes (MITF, CDH19, PARVB, and PFKP) with increased expression and hypomethylation in CCS but not AFH (Fig. 4A), and genes (S100A4, XAF1) with increased expression and hypomethylation in AFH but not CCS (Fig. 4B). MITF is involved in melanogenesis and overexpressed in CCS as part of its core gene signature5,7. CDH19 and PARVB are involved in cell adhesion and were highly expressed in primary melanoma8. S100A4 has been implicated in cell migration and cancer metastases9. XAF1 is a proapoptotic tumor suppressor gene10.

Fig. 4: Differential gene upregulation corresponding to hypomethylation on matched CpG sites.

A CCS. B AFH. Affymetrix U133A was performed comparing 3 AFH, 4 CCS cases, and 1 GICCS to a group of 44 tumors of various histotypes and 6 normal tissues, using a log2-fold change threshold of 1 and P < 0.01. Infinium 850k methylation array was performed comparing 7 AFH, 4 CCS, 4 GICCS to a group of 29 tumors of various histotypes and 8 normal tissues, using a log2-fold change threshold of 2 and P < 0.01. Differentially expressed genes were matched to CpG sites, each identified by a unique cg identifier in the format of cg#. The numbers of CpG sites assigned to each of these 4 genes on the 850k array were as follows: 8 for S100A4, 14 for XAF1, 27 for MITF, and 3 for CDH19. Of these, the numbers of CpG sites that showed negative correlation with gene expression were as follows: 3 for S100A4, 6 for XAF1, 3 for MITF, and 3 for CDH19. The most representative CpG site from each gene is displayed in this figure. Abbreviations: AFH angiomatoid fibrous histiocytoma, CCS clear cell sarcoma, GICCS clear cell sarcoma-like tumor of gastrointestinal tract.

Tumor type prediction by the sarcoma methylation classifier

The DNA methylation-based sarcoma classification algorithm described in Koelsche et al.4 was applied to 22 cases (6 AFH, 4 CCS, 8 GICCS, 1 ME, and 3 Meso) (Table 3). This algorithm matched all four CCS cases (100%) to the correct methylation class (calibrated score = 0.99 in all cases), but only 33% (2 of 6) of AFH cases (calibrated score = 0.75 and 0.33, respectively). GICCS was not a methylation class in the original classifier. Interestingly, the algorithm matched 1 GICCS to AFH (calibrated score = 0.56) and 2 GICCS to CCS (calibrated score = 0.65 and 0.96, respectively).

Table 3 Tumor type prediction by sarcoma methylation classifier.

Survival analysis

Overall survival differed significantly across AFH, CCS, GICCS, and HCCC (log rank P = 0.023), with CCS associated with the worst survival (median survival 15 months), followed by HCCC (median survival 36 months) and then GICCS (median survival 43 months). All AFH patients remained alive across the follow-up period of 42 months (Fig. 5).
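The median survival times quoted above are the kind of quantity read off a Kaplan-Meier curve. A minimal sketch of the estimator follows. It is a generic product-limit implementation, not the software used in the study, and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times  : follow-up in months
    events : 1 if death was observed, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which survival is 0.5 or below, or None if not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical cohort: follow-up in months, 1 = died, 0 = censored.
times = [5, 8, 10, 15, 20, 30]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
median = median_survival(curve)
```

When no deaths are observed, as for the AFH group in this cohort, the curve never drops and the median is not reached; `median_survival` returns `None` in that situation.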

Fig. 5: Comparison of overall survival in 6 AFH, 14 CCS, 9 GICCS, and 5 HCCC.

Median survival time (months) for each tumor type listed beneath Kaplan–Meier curves. Hazard ratios compared using log-rank analysis. Abbreviations—AFH angiomatoid fibrous histiocytoma, CCS clear cell sarcoma, GICCS clear cell sarcoma-like tumor of gastrointestinal tract, HCCC hyalinizing clear cell carcinoma of salivary gland.

Discussion

The EWSR1/FUS::CREB family of translocation-associated tumors encompasses a wide and heterogeneous clinicopathologic spectrum. To understand the pathogenesis that sets them apart, we performed a comprehensive genomic analysis of fusion variants, secondary recurrent genetic alterations (mutations, copy number alterations), gene expression and methylation profiles across a large cohort of the EWSR1::CREB family of translocation-associated tumors, with emphasis on AFH, CCS, GICCS, and HCCC.

Although the analysis of fusion transcript variants in our cohort largely paralleled the published literature, some interesting new findings emerged. For AFH, the most commonly reported fusion transcript variant is EWSR1::CREB1 (ex7-ex7) (58%)7,11,12,13,14,15. However, we identified a significant proportion (39%) of AFH cases with EWSR1::ATF1 (largely ex7-ex5). A minority (3%) of AFH harbored EWSR1::CREM fusions. Interestingly, MMT16,17,18,19,20,21, whose relationship to a myxoid, intracranial variant of AFH remains disputed by some authors22,23,24,25,26,27, harbors roughly equal proportions of EWSR1::CREM and EWSR1::CREB1 fusions, with a minority harboring EWSR1::ATF1. For CCS, the predominant fusion transcript is EWSR1::ATF1 (ex8-ex4)15,28,29,30,31,32,33,34. This pattern is mirrored by a subtype of Meso, initially described by our group, which occurs in younger patients without a history of asbestos exposure and is driven predominantly by EWSR1::ATF1 (ATF1 exon 5)35,36,37. Of interest, in contrast to prior data, GICCS showed similar proportions of EWSR1::ATF1 (mostly ex8-ex4) and EWSR1::CREB1 fusions5,34,38. On the other hand, the recently described distinct tumor type, the so-called “malignant epithelioid neoplasm with predilection for mesothelial-lined cavities”39, subsequently validated by Shibayama et al.40, most commonly harbors fusions between either EWSR1 or FUS and exon 7 of CREM. In contrast, PPMS is almost exclusively driven by EWSR1::CREB1 (mostly ex7-ex7)41,42,43,44,45,46,47,48 except for a rare case with EWSR1::ATF149. Some authors have proposed that PPMS and AFH exist on a morphologic and molecular spectrum43,49. Finally, both HCCC50,51,52,53,54,55 and CCOC56,57,58,59,60 harbor only EWSR1::ATF1, supporting the notion that HCCC and CCOC are likely related tumors.
Nevertheless, it is evident from our meta-analysis of the published literature and from the current study that there is significant intertumoral overlap as well as intratumoral heterogeneity of fusion transcript variants and exon usage across the different CREB family translocation tumors. Here, intratumoral heterogeneity refers to variation of fusion transcript variants, e.g., EWSR1::ATF1 and EWSR1::CREB1 in GICCS, and exon usage within the same histotype, e.g., FUS::CREM exon 5 and FUS::CREM exon 7 in ME.

This is the first study to report secondary recurrent genomic alterations in CCS, AFH and GICCS. In CCS, we identified the presence of recurrent TERT promoter and CDKN2A hotspot mutations, which were mutually exclusive but in combination strongly associated with worse overall survival. TERT promoter somatic mutations and amplifications are frequently found across multiple tumor types61,62. In soft tissue tumors, TERT promoter mutations have been identified in myxoid liposarcomas63, atypical fibroxanthoma/pleomorphic dermal sarcomas64, chondrosarcomas65, and solitary fibrous tumors66; this is reported to be associated with a worse prognosis in the latter. Our findings suggest that TERT promoter and CDKN2A mutations may serve as prognostic biomarkers for worse survival in CCS.

In AFH, we identified CDKN2A/CDKN2B homozygous deletions exclusively among cases with metastasis. Genomic profiling of multiple sarcoma types has revealed secondary recurrent CDKN2A alterations67,68, with a suggested role as a biomarker for poor prognosis67. Compared to other CREB family translocation tumors, AFH is a soft tissue tumor of borderline malignant potential with a relatively good prognosis; the rate of metastasis is usually <2%. In fact, all the patients whose AFH were sequenced in our cohort remained alive at the time of reporting. Our finding of CDKN2A/CDKN2B deletions in the two AFH cases with biopsy-proven metastasis, but not in the non-metastatic AFH cases, raises the possibility of CDKN2A/CDKN2B deletion testing as a biomarker for metastatic potential. Although not a recurrent abnormality in this cohort, one of the metastatic AFH cases showed a co-existing BRAF V600E mutation, which was confirmed by immunohistochemistry showing diffuse and strong VE1 expression. This was the only case in our cohort in which a BRAF mutation was detected.

Gene expression profiling revealed differential gene expression in AFH vs CCS, which clustered in distinct genomic groups by unsupervised analysis. A number of genes involved in melanocyte regulation and cellular membrane/migration were upregulated in CCS compared to AFH, including PMP22, MITF, SLC7A5, CDH19, WIPI1, FYN, PARVB, and PFKP, while upregulation of SGK1, S100A4, XAF1 and LY96 mRNA expression was detected in AFH but not CCS. An expression profiling analysis of CCS cell lines revealed upregulation of S100A11 (encoding S100 protein), MITF (microphthalmia-associated transcription factor), and Pmel17 (SILV) (silver mouse homologue-like melanosomal protein detected by the IHC marker HMB45)69. Moreover, in an in vitro CCS induced pluripotent stem cell model, Komura et al.70 reported expression of several Schwann cell marker genes, such as P75NTR, S100b, Mbp, Plp1, and Pmp22. They proposed that S100-expressing peripheral nerve cells could be a cell of origin for EWS/ATF1-induced CCS. In a recent study using human embryonic stem (hES) cell models, hES cells driven by EWSR1::CREB1 and EWSR1::ATF1 fusions recapitulated the core gene signatures, respectively, of AFH (SGK1 and MXRA5 upregulation) and GICCS (SGK1, MXRA5, SOX10, and DUSP4 upregulation)71. Our gene expression profiling of patient samples validates a subset of the findings of these preclinical studies.

Our methylation profiling and dimensionality reduction clustering analysis revealed that CREB family translocation-associated tumors (AFH, CCS, GICCS, Meso) form neighboring but tight clusters that were distinct from other soft tissue tumor types and normal tissue. When matching the differentially expressed genes to the corresponding methylation probes/CpG sites, we found genes whose upregulation correlated significantly with hypomethylation in CCS but not AFH: MITF, CDH19, PARVB, and PFKP. MITF is involved in melanogenesis and was found to be overexpressed in CCS as part of its core gene signature, but not in AFH or GICCS5,7. More recently, a Cre-loxP-induced Ewsr1::Atf1-driven CCS model demonstrated that Mitf and Myc can contribute to sarcomagenesis72. Both CDH19 and PARVB are involved in cell adhesion and were highly expressed in primary melanoma, in association with worse survival8. Notably, both showed increased expression and hypomethylation in CCS in our study. On the other hand, we identified upregulation and hypomethylation of S100A4 and XAF1 in AFH but not CCS. S100A4 protein, also known as metastasin, is a member of the S100 calcium binding protein family and has been implicated in cell migration and cancer metastases9. XAF1 is a proapoptotic, interferon-stimulated tumor suppressor gene that suppresses tumorigenesis10. While XAF1 is usually hypermethylated and downregulated in most cancers, it was found to be paradoxically hypomethylated in glioblastoma with adaptive temozolomide resistance73. These findings serve as a proof-of-concept example of how integrative gene expression and methylation profiling may provide biological insights into the different pathogenesis underlying tumors sharing the same driver gene fusions.
Integrated DNA methylation and gene expression studies have been performed in Ewing sarcoma74, pediatric rhabdomyosarcomas75, myxoid, dedifferentiated, and pleomorphic liposarcomas76, which identified sets of genes with inverse methylation and gene expression relationship. In a comprehensive molecular and genomic study of undifferentiated sarcomas (USARC), DNA methylation profiling failed to identify distinct USARC subgroups and did not correlate with gene expression, but showed MSH2 and TERT promoter hypermethylation77. On the other hand, DNA methylation profiling also revealed epigenetic heterogeneity within the same tumor type, e.g., Ewing sarcoma78. Unfortunately, the sample size of individual tumor types used for methylation profiling in the current study is insufficient to perform differential methylation analysis within the same tumor type, which could be explored in future studies.

Genome-wide DNA methylation profiling has largely been performed for tumor classification purposes in a wide range of mesenchymal tumors, with varying degrees of success. These include benign and malignant nerve sheath tumors79, osteosarcomas80, undifferentiated small round blue cell tumors81, and CIC-rearranged undifferentiated sarcomas82. Most recently, a random forest machine learning sarcoma classifier from the German Cancer Research Center (DKFZ) in Heidelberg was developed to classify a wide spectrum of 66 soft tissue and bone tumors using a large reference and validation cohort4. The limitations of using methylation profiling alone to differentiate soft tissue tumors were illustrated by the inability of the Heidelberg methylation classifier to accurately classify tumor entities in our cohort, with the exception of CCS. There are several major shortcomings to the applicability of this methylation classifier for soft tissue tumors with EWSR1/FUS fusions with CREB family transcription factors. First, several tumor types, including GICCS and Meso, were not included in the reference cohort that was used to develop the classifier. Second, although the reference cohort included 8 cases of AFH, only 1 case was used in the validation cohort, which was misclassified as desmoplastic small round cell tumor4. In our study, the classifier correctly classified only one-third of the AFH cases. On the other hand, the methylation classifier performed very well, both in the Koelsche et al. study and in our experience, in the classification of CCS, classifying 100% of the cases correctly. It is also interesting that when we applied the classifier to GICCS, 2 cases were classified as CCS and 1 case as AFH, illustrating their overlapping methylation profiles as described above. All 3 of these GICCS cases were located in the gastrointestinal tract (1 stomach, 2 small bowel), and were diffusely and strongly positive for S100 and negative for HMB45. The combined clinical and immunohistochemical profile essentially excludes CCS and AFH. These findings highlight the existing limitation of methylation profiling in soft tissue tumor classification, which may require further algorithm refinement as well as larger reference and validation cohorts83.

In addition to these molecular mechanisms, the nature of the initial stem cell host in which the fusion arose, and its degree of commitment/plasticity, may also play a significant role in determining the ultimate tumor type (i.e., depending on location/extent of totipotency). These are interesting questions that are beyond the scope of the current study. Recent studies using Cre-inducible mouse and human embryonic stem cell models have begun to address these questions71,72.

The lack of consistency in the sample sizes of the cases with each technique is a major drawback of our paper. Further studies focusing on specific molecular profiling techniques with deeper genomic characterization utilizing a larger sample size of some of the rarer histotypes would be beneficial to validate or expand on our findings.

In conclusion, our comprehensive genomic profiling of EWSR1/FUS::CREB translocation-associated tumors uncovers fusion transcript variant heterogeneity, prognostically significant secondary recurrent genetic alterations, and differentially hypomethylated and upregulated genes. These findings underscore the utility of integrative genomic approaches in the study of translocation-associated tumors with diverse clinicopathologic features, and raise the question of whether some of the entities in this family could be unified under the same morphologic/molecular spectrum (e.g., CCS and GICCS, AFH and PPMS).