Methylation profiling reveals novel molecular classes of rhabdomyosarcoma

Rhabdomyosarcomas (RMS) represent a family of aggressive soft tissue sarcomas that present in both children and adults. Pathologic risk stratification for RMS has been based on histologic subtype, with poor outcomes observed in alveolar rhabdomyosarcoma (ARMS) and the adult-type pleomorphic rhabdomyosarcoma (PRMS) compared to embryonal rhabdomyosarcoma (ERMS). Genomic sequencing studies have expanded the spectrum of RMS, with several new molecularly defined entities, including fusion-driven spindle cell/sclerosing rhabdomyosarcoma (SC/SRMS) and MYOD1-mutant SC/SRMS. Comprehensive genomic analysis has previously defined the mutational and copy number spectrum for the more common ERMS and ARMS and revealed corresponding methylation signatures. Comparatively, less is known about epigenetic correlates for the rare SC/SRMS or PRMS histologic subtypes. Herein, we present exome and RNA sequencing, copy number analysis, and methylation profiling of the largest cohort of molecularly characterized RMS samples to date. In addition to ARMS and ERMS, we identify two novel methylation subtypes, one having SC/SRMS histology and defined by MYOD1 p. L122R mutations and the other matching adult-type PRMS. Selected tumors from adolescent patients grouped with the PRMS methylation class, expanding the age range of these rare tumors. Limited follow-up data suggest that pediatric tumors with MYOD1-mutations are associated with an aggressive clinical course.

www.nature.com/scientificreports/ patients presenting in late adolescence or early adulthood, the distinction between pediatric ERMS with anaplasia and adult-type PRMS can be difficult, given the paucity of defining diagnostic molecular abnormalities in each group. Several investigators have reported that RMS can be separated into at least two primary epigenetic groups based on differences in their genome-wide methylation profiles. These groups correspond closely to the embryonal and alveolar subtypes, with the latter being highly enriched for PAX3-FOXO1 and PAX7-FOXO1 fusions [10][11][12] . Despite these observations, complete representation of all subtypes of rhabdomyosarcoma has not been included in previous cohorts, leaving the question of how histologically-defined SC/SRMS and PRMS are epigenetically related to ERMS and ARMS.
We hypothesized that with sufficient representation, additional molecular subtypes could be identified in RMS. To test this hypothesis, we performed genome-wide methylation profiling on 154 rhabdomyosarcomas and unrelated pediatric skeletal muscle controls representing all primary histologic types of both adult and pediatric disease and correlated the findings with copy number profiling, next-generation sequencing, and clinical outcome analysis.

Materials and methods
All methods were carried out in accordance with relevant guidelines and regulations. All experimental protocols were approved by the St. Jude Children's Research Hospital Institutional Review Board (#XPD17-163). Informed consent was not required under the Office for Human Research Protections (OHRP) guidelines regarding the disposition of deidentified human tissues for human subjects research, and was waived by the St. Jude Children's Research Hospital Institutional Review Board.
Data generation and methylation array processing. One hundred and fifty-eight samples from St.
Jude Children's Research Hospital (SJCRH, pediatric) and Stanford Hospital and Clinics (adult) were analyzed using Illumina Methylation BeadChip (EPIC) arrays according to the manufacturer's instructions. Data were generated from formalin-fixed paraffin-embedded (FFPE) tissue samples. DNA of 158 patients was extracted and hybridized to Illumina Infinium HumanMethylationEPIC BeadChip (850 K) arrays. All methylation data were analyzed in R (http:// www.r-proje ct. org, version 3.5.3), using multiple of packages from Bioconductor and other repositories. Specifically, array data were preprocessed using the minfi package (v. 1.28.4) 13 . Background correction with dye-bias normalization was performed for all samples using noob (normal-exponential out-ofband) with the "single" dye method 14 . Filtering was performed to remove probes located on sex chromosomes, probes containing nucleotide polymorphisms (dbSNP132 Common) within five base pairs of the targeted CpGsite, or probes mapping to multiple sites on hg19 (allowing for one mismatch), and cross-reactive probes.
Survival analysis was performed after a manual chart review to extract survival status For deceased patients, the time from the initial diagnosis to death was calculated, and for live patients, the time from the initial diagnosis to the last clinic visit was calculated. Survival curves were generated using Graphpad Prism 9ing.
Unsupervised clustering and copy number variation analysis. Principal component analysis was performed using the top 10,000 variably methylated probes. The number of statistically significant principal components was determined by agDimension function in the PCDimension package (v.1.1.11) 15 . Dimensionality reduction using 5 statistically significant principal components (k = 5) was used for Uniform Manifold Approximation and Projection (UMAP v.0.2.6) 16 with non-default parameters: theta = 0, pca = F, perplexity = 4. To identify distinct clusters in the methylation data, the density-based spatial clustering of applications with noise (DBSCAN) algorithm 17 was applied to the UMAP coordinates with minPts = 4 and eps = 0.65. To evaluate the separation of DBSCAN clusters in the UMAP coordinates, silhouette analysis was performed on the Euclidian distances among the samples using the R package cluster (v.2.1.0). For hierarchical clustering, Kendall correlation was calculated as a distance measure between samples using the top 10,000 most variable probes among the SJ samples, and the unsupervised hierarchical clustering was performed on the computed distance by complete linkage agglomeration method.
Copy number variation (CNV) analysis from methylation array data was performed using the conumee package (version 1.16.0) 18 . Chromosomal gain or loss was determined using a 0.18 threshold. Statistically significant frequent copy number variations (CNVs) were determined using GISTIC version 2.0.23 29 . Copy number profiles output as segments obtained from conumee R package were used as inputs for GISTIC2. Gain and loss were categorized with CNV values greater than 0.18 or smaller than -0.18, respectively. CNVs were also divided into those that are broad (defined as exceeding half of the length of a chromosome arm) and focal (shorter than this). We considered events with False Discovery Rate q-values < 0.25 as significant at a 90% confidence level. An "arm-level peel-off " correction was enabled to assign all CNVs in the same chromosome arm of the same sample to be part of a single event when determining whether multiple significantly recurrent events exist on that chromosome arm. Mutation detection of tumor-only FFPE RNA samples. The adapters in sequencing reads were trimmed with "trim_galore" (v0.4.4, https:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ trim_ galore/, -q 20 -phred 33 -paired) and were mapped using STAR 27 . The GATK SplitNCigarReads tool was used for adjusting the cigar string of RNAseq BAMs, and the resulting BAMs were run through BQSR to calibrate base quality. The variant was called by GATK HaplotypeCaller and filtered by the VariantFiltration tool (-window 35 -cluster 3 -filter-name FS -filter "FS > 30.0" -filter-name QD -filter "QD < 2.0"). Further filtering of the variants was performed using the same criteria as described in the tumor-only FFPE calling. The variant annotation was performed using Annovar.

Results
Description of Cohort. Patients were selected from the anatomic pathology archives based on sample availability for histopathologic review and methylome analysis. Histopathologic and clinical data were tabulated from the electronic medical record (Supplemental Table 1). The remaining "pediatric" cohort included samples from patients treated in a pediatric hospital setting, with a few patients representing young adults. This cohort included samples from 70 female and 76 male patients. Age ranged from 3 months to 27 years (mean 7.8 years, median 7 years). Initial histologic diagnosis included: ARMS (n = 53), ERMS (n = 46), SC/SRMS (n = 3), RMS not otherwise specified (n = 44). Cases designated as NOS were enriched by our consultation practice, and included cases for which a definitive diagnosis was not reached in the initial clinical workup. This occurred secondary to poor sample quality, inability to perform ancillary studies, and in the case of inconclusive histologic findings. The adult pleomorphic rhabdomyosarcoma subset (n = 8) included 4 female and 4 male patients. Age ranged from 61 to 79 years (mean 69.8 years, median 68.5 years).

DNA methylation profiling of rhabdomyosarcoma.
To determine if their genome-wide methylation signature could separate the types of rhabdomyosarcoma, we performed Infinium EPIC 850 K methylation array testing on a cohort of 154 rhabdomyosarcomas and four controls (skeletal muscle samples), followed by cluster analysis. Four main clusters emerged, which we designated ARMS, ERMS, SC/SRMS, and PRMS based on the most dominant histopathologic diagnosis represented in each cluster (Fig. 1). Normal control tissue grouped closely with ERMS, a finding which has been observed previously 12,28 . Copy number analysis in subtypes of rhabdomyosarcoma. We evaluated for recurrent copy number abnormalities, focal or chromosome level, in the RMS molecular groups using the methylation array data. While few recurrent broad copy number abnormalities were detected in alveolar rhabdomyosarcoma, the ERMS group demonstrated typical changes previously reported to be associated with that subtype, including enrichment for gains of chromosomes 2, 8, 11, 12, 13, and 20 ( Fig. 2A) and most of these gains were found to be statistically significant by GISTIC2.0 (Supplemental Table 2) 29 . Focal changes in ARMS included a gain of 12q13 and 13q14, containing MYO1A, STAT6, and FOXO1 genes, while the ERMS group had a gain of 12q15, containing both the FRS2 and MDM2 genes. The molecularly defined PRMS subtype demonstrated complex copy number changes with a frequent gain of 1p (Supplemental Table 2). Other abnormalities included loss of 13, including focal changes at the RB1 locus. Among focal changes, the PRMS group was also characterized by frequent gain of 6q24.3, a region containing FBXO30, an F-box gene known to show myocyte-specific expression (Fig. 2). The SC/SRMS group was characterized by frequent broad gain of chromosomes 11 and 22q, and loss of 10p, 13q, and 16q (Supplemental Table 2). Focal deletion in 9p21.3 encompassing the CDKN2A locus was also identified in the SC/SRMS group (Fig. 2B).
Next-generation sequencing and fluorescence in situ hybridization of rhabdomyosarcomas. We performed next-generation sequencing of 109 RMS (57 ERMS, 39 ARMS, 9 PRMS, 4 SC/SRMS) tumors with sufficient material that had undergone methylation analysis (Fig. 3) Break-apart FOXO1 fluorescence in situ hybridization (FISH) was attempted in all cases with sufficient materials (n = 88) and was successful in 85 cases (97%). A positive rearrangement was identified in 49 of 52 ARMS and zero cases of ERMS (0/32).

Histologic correlates of PRMS and SC/SRMS. Clinicopathologic evaluation of the RMS samples was
performed both by chart review and microscopic evaluation, emphasizing the histology of the novel molecular groups. The methylation-defined ARMS and ERMS were concordant with the corresponding histopathologic designation in most instances (51 of 53 ARMS; 96%) and (45 of 46 ERMS; 98%). The SC/SRMS with MYOD1 abnormality were designated as spindle cell RMS in a subset of initial cases (2 of 3 initially diagnosed as SC/ SRMS were truly MYOD1 mutant; 67%), though most were designated as RMS not otherwise specified (NOS) clinically (n = 6). The tumors, in most instances, were dominated by a spindle-cell morphology (Fig. 4A,B), though in some examples, the characteristic hyalinized stroma was the dominant phenotype (Fig. 4C). For those SC/SRMS-MYOD1 mutant tumors that were available for immunostaining, we observed strong and diffuse immunoreactivity for MYOD1 (Fig. 4D).
While the PRMS methylation group predominately consisted of cases corresponding to the adult-type PRMS histologic class, a small proportion of pediatric patients clustered in that molecular group (n = 3). These pediatric tumors in this methylation subtype demonstrated severe morphologic anaplasia, analogous to the histologic appearance of the adult PRMS tumors (see Fig. 5).
The tumors included in this study displayed enrichment for those with diagnostic ambiguity (clinically classified as RMS not otherwise specified, NOS). A molecular class was assignable in all RMS, NOS cases (n = 43, Fig. 6A) with the majority of cases assigned to the ERMS group (n = 28), and a few cases falling into ARMS (n = 6), SC/SRMS (n = 6), and PRMS groups (n = 3).

Clinical Correlation of RMS methylation groups.
The consultative nature of the cohort limited the amount of follow-up data available (Fig. 6B-C). Of the three pediatric and adolescent cases clustered with adult type-PRMS, follow-up was available for one patient who died of disease (Table 1). Similarly, clinical follow-up was available for three patients with MYOD1-mutant SCRMS, all of whom succumbed to their disease ( Table 2). The adults with PRMS patients had a high mortality rate, with five of the six patients with clinical follow-up dying of disease (overall 83%).   Fusion  PAX7  FOXO1  MET  EWSR1  MYOD1  BCOR  BRAF  FGFR1  RARA  CACNA1H  MAP3K4  IGF2  ATRX  DCC  RB1  IGF1R  LRP1B  A2ML1  CDKN2A MDM2      Figure 6. Suggested molecular re-classification of RMS and Clinical Outcome. The histologic type designated at diagnosis was compared to the molecular classification by methylation profiling and displayed as a Sankey diagram (A). Methylation profiling was able to classify a significant number of tumors that could not be classified using traditional histology. Clinical outcome data in the Kaplan-Meier plot (B) highlights poor outcome in the pleomorphic, alveolar, and spindle/sclerosing groups. Numerical data (C) displaying number at risk over a ten year period.

Discussion
Rhabdomyosarcomas represent a heterogeneous group of soft tissue sarcomas associated with poor clinical outcomes. Historically, risk stratification within RMS has primarily been based on histopathologic subtype, with ARMS tumors getting more intensive therapy 30,31 . Because FOXO1 fusions characterize most ARMS, these can be identified molecularly. Notably, only 94% of our ARMS molecular group were found to harbor a FOXO1 gene rearrangement by FISH. This observation correlates with the historically recognized fusion-negative ARMS group and demonstrates the limitations of relying on FISH alone for molecular stratification 32 . Reliable diagnostic or prognostic biomarkers for all remaining RMS groups have not been established. ERMS are characterized by heterogeneous mutations that lead to activation of the RAS or AKT pathways but have no single recurrent mutation or fusion that can easily be tracked 32 . The most recent edition of the WHO Classification of Tumours, Soft Tissue and Bone Tumours now includes a histologic type encompassing the spindle cell/ sclerosing subtypes of RMS, a subset of which harbor MYOD1 L122R point mutations and have particularly aggressive clinical behavior and inferior outcomes 7 . Other tumors in the SC/SCRMS group are reported to contain heterogeneous fusions, including those involving VGLL2 or NCOA2, and have a less aggressive clinical course.
DNA methylation profiling has emerged as a promising method for discovering molecular heterogeneity in solid tumors, including soft tissue sarcomas [33][34][35][36] . Additionally, by combining the technology with supervised machine learning methods, neoplastic and nonneoplastic tissues can be classified using their genome-wide DNA methylation signatures in a single clinical assay. Methylation signatures correlate with cellular developmental pathways and closely recapitulate existing histologic and molecular classification schemas. Previous studies utilizing methylation profiling to interrogate RMS reported two methylation groups corresponding to the dominant ARMS and ERMS subtypes 11 . A methylation class corresponding to MYDO1-mutant SC/SRMS was recently reported as part of a comprehensive sarcoma classifier 37 . Our cohort provides independent validation of the SC/ SCRMS group with MYOD1 mutation and extends those findings to include a methylation class corresponding to the adult-type pleomorphic RMS histologic groups.
Despite the reported molecular heterogeneity of SC/SRMS, we only detect a methylation group for the SC/ SRMS tumors with MYOD1 L122R missense mutations. SC/SRMS with MYOD1 mutations are well-established and associated with a dismal prognosis 38 . Despite enriching for RNA sequencing in tumors histologically diagnosed as SC/SRMS, we did not find a corresponding methylation group of SC/SRMS containing gene fusions. This could suggest that SC/SRMS with fusions form a subset of ERMS or that fusion-positive SC/SRMS were represented infrequently in our cohort, with insufficient representation to form a distinct cluster. One histologically-defined SC/SRMS grouped with ERMS by methylation profiling. Unfortunately, sufficient material could not be obtained from the case to evaluate for fusions by RNA sequencing. Conceptually, this classification process is also further hindered by the knowledge that subsets of ERMS will display spindled morphology, a finding that will continue to hamper morphologic classification systems moving forward.
We also identified a second novel methylation group corresponding to the histologic adult-type PRMS group. Our PRMS methylation group demonstrated no recurrent driver mutations but was characterized by genomic instability and a high proportion of mutations in tumor suppressor genes. Although histologically-defined PRMS has primarily been considered specific to the adult population, with a peak age in the 60 s and 70 s 1 , we identified three pediatric patients in the molecularly-defined PRMS group (aged 9, 14, and 16 years). This finding challenges the notion of adult-and pediatric RMS being completely distinct diseases and suggests that a subset of pediatric ERMS with anaplasia may be molecularly indistinguishable from adult-type PRMS. In this cohort, only 3 of 13 cases classified as ERMS with anaplasia clustered with the adult-type PRMS, suggesting morphologic features alone are insufficient to identify such cases. SC/SRMS has also been found across the entire age spectrum, from infancy to adulthood, suggesting that the rare molecular groups may have a specific predilection to cross age boundaries.
Sample size limitations are often encountered in studies exploring rare malignancies. Despite the relatively low case numbers, our unsupervised analysis supports that these represents bona fide molecular groups. The number of cases for each of the rare groups (11 for PRMS and 8 for SC/SRMS, respectively) is in line with the size of molecular groups that were used to train supervised classifiers in clinical laboratories for brain tumors (19 of 91 subclasses have between 8 and 11 examples) 39   www.nature.com/scientificreports/ Clinical outcome data were limited in our cohort, and specifically for the new novel methylation groups. The single pediatric case in the PRMS methylation subtype with follow-up experienced rapid clinical decline and died of disease 43 days following diagnosis. The tumors in the SC/RMS methylation class also were associated with poor outcomes in our cohort. Additional characterization of the clinical correlations and outcome data will be required to characterize these novel molecular groups further.

Data availability
The methylation datasets generated during the current study are available in the GEO datasets repository. Accession number GSE167059.