Integrated exome and RNA sequencing of dedifferentiated liposarcoma

The genomic characteristics of dedifferentiated liposarcoma (DDLPS) that are associated with clinical features remain to be identified. Here, we conduct integrated whole-exome and RNA sequencing analysis of 115 DDLPS tumors and perform comparative genomic analysis of the well-differentiated and dedifferentiated components of eight DDLPS samples. Several somatic copy-number alterations (SCNAs), including the gain of 12q15, are identified as frequent genomic alterations. CTDSP1/2-DNM3OS fusion genes are identified in a subset of DDLPS tumors. Based on the association of SCNAs with clinical features, the DDLPS tumors are clustered into three groups, and this clustering independently predicts clinical outcome. The comparative analysis of well-differentiated and dedifferentiated components identifies two categories of genomic alterations: shared alterations, associated with tumorigenesis, and dedifferentiated-specific alterations, associated with malignant transformation. This large-scale genomic analysis reveals mechanisms underlying the development and progression of DDLPS and provides insights that could contribute to the refinement of DDLPS management.

Dedifferentiated liposarcoma (DDLPS) is a rare malignant tumor with an incidence of <0.1 per million per year 1,2 ; it arises in ~10% of cases of intermediate (locally aggressive) well-differentiated liposarcoma (WDLPS) 3 . Surgical excision is the primary treatment for DDLPS, as DDLPS responds poorly to conventional chemotherapeutic agents 4 . Several analyses have identified genomic characteristics of DDLPS, including the amplification of 12q13-15, that are also frequently found in WDLPS [5][6][7][8][9][10] . These studies have also identified a number of genes within 12q13-15, including HOXC13, MDM2, HMGA2, CDK4, and CPM, as key to the development of DDLPS and WDLPS, and have defined additional genomic events, such as the loss of 11q23 and the gain of 6q23 and 1p32, as abnormalities specific to DDLPS 6,8,[11][12][13][14][15][16][17] . More recently, The Cancer Genome Atlas (TCGA) Research Network characterized several types of soft-tissue sarcoma, including DDLPS with amplification of 12q13-15, through comprehensive genomic analysis and showed that classifying DDLPS tumors by their somatic copy-number alteration (SCNA) and DNA methylation status could predict clinical prognosis 18 . These results, based on 50 DDLPS cases, still require validation. In addition, the genomic events associated with the malignant transformation of DDLPS, and those underlying histologically diagnosed DDLPS tumors that lack 12q13-15 amplification, remain to be identified.
We established the Japan Sarcoma Genome Consortium (JSGC) in 2014 with the aim of generating a comprehensive map of the genomic alterations present in bone and soft-tissue tumors, in order to facilitate the implementation of precision medicine. Here, we collect tumor and normal tissue samples from 65 patients with DDLPS and perform whole-exome and RNA sequencing at two facilities: the Institute of Medical Science at the University of Tokyo (hereinafter, JSGC-IMSUT) and the National Cancer Center Research Institute, Japan (JSGC-NCC). We also obtain FASTQ data from the whole-exome and RNA sequencing of 50 DDLPS tumors from TCGA, in order to conduct a genomic meta-analysis of a total of 115 patients with DDLPS. Finally, eight pairs of well-differentiated (WD) and dedifferentiated (DD) components from DDLPS tumors are obtained for the comparison of their genomic alterations.

Results
Clinical characteristics of the subjects. A total of 115 patients were enrolled in the current study (28 from JSGC-IMSUT, 37 from JSGC-NCC, and 50 from TCGA), and clinical information was collected from 108 of the 115 patients. Of these 108 patients, 75 (69.4%) were male, and the mean age at diagnosis was 62.7 (±12.7) years (Table 1). Overall, 73.1% of the DDLPS tumors arose from the retroperitoneum or abdomen, although the distribution of primary tumor sites varied among the three cohorts: tumors were most frequently located in an extremity in JSGC-IMSUT patients (66.7%) and in the retroperitoneum or abdomen in JSGC-NCC (78.4%) and TCGA (86.0%) patients. A total of 76.4% of the patients had tumors 10 cm or more in diameter, and 98.1% of the patients underwent surgery (Table 1; Supplementary Table 1). These results indicate that the patients enrolled in this study may be representative of the general population of patients with DDLPS.
Somatic mutations and copy-number alterations. Based on exome sequencing of 115 pairs of DDLPS and normal tissue samples, we identified 2639 somatic mutations, including nonsynonymous single nucleotide variants (SNVs) and short insertions/deletions (INDELs), with a mean of 24.2 (0.274 per coding megabase) and a range of 0 to 70 mutations per sample (Fig. 1a). The frequency distribution of somatic mutations was comparable among the three cohorts, JSGC-IMSUT, JSGC-NCC, and TCGA (Supplementary Fig. 1a). The mutation frequency on each chromosome ranged from 0.114 (chromosome 21) to 0.482 (chromosome 12) per coding megabase (Fig. 1b; Supplementary Fig. 1b). To better understand the mutational processes involved, we analyzed the base substitutions of synonymous and nonsynonymous SNVs together with their adjacent 5′ and 3′ flanking nucleotides; C>T transitions at CpG sites (anyCG to anyTG) were the most frequently detected alterations in DDLPS (Fig. 1c), and the three cohorts showed similar trends (Supplementary Fig. 1c). Mutation signature analysis using the COSMIC database showed that signature 3 contributed the most to these base substitutions, followed by signature 1 (Fig. 1d), suggesting that both the failure of DNA double-strand break repair (signature 3) and aging (signature 1) may contribute to the development of DDLPS. The recurrently mutated genes (mutated in more than five samples) were MUC16, TTN, ATRX, TRHDE, PCLO, ZNF717, TP53, FLG, and NAV3; however, the mutated loci within each gene were not recurrent (Supplementary Fig. 2). GISTIC analysis of SCNAs identified 28 gained regions (357 genes) and 55 lost regions (455 genes) (Fig. 1e; Supplementary Data 1 and 2). As expected, the gain of 12q15 had the lowest FDR q-value in the GISTIC analysis.
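The base-substitution analysis above classifies each SNV together with its 5′ and 3′ neighbors. As a minimal sketch, assuming the pyrimidine-centred 96-channel convention used for COSMIC single-base-substitution signatures (the function names are hypothetical and are not part of the pipeline used in this study), the classification and the CpG C>T test can be written as:

```python
"""Sketch: map SNVs to pyrimidine-centred trinucleotide (SBS-96) channels.

Assumption: every substitution is reported from the pyrimidine (C or T)
strand, as in COSMIC SBS signatures. All names here are hypothetical.
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# All 96 channels: 6 substitution classes x 4 five-prime x 4 three-prime bases.
SBS96 = [
    f"{five}[{ref}>{alt}]{three}"
    for ref in "CT"
    for alt in "ACGT"
    if alt != ref
    for five in "ACGT"
    for three in "ACGT"
]


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Label an SNV from its reference trinucleotide (5' base, mutated
    base, 3' base) and alternate base, e.g. ("ACG", "T") -> "A[C>T]G"."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or alt not in "ACGT" or context[1] == alt:
        raise ValueError("need a 3-base context and a different alternate base")
    if context[1] in "AG":
        # Purine reference base: describe the same event on the opposite strand.
        context, alt = revcomp(context), revcomp(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def is_cpg_c_to_t(context: str, alt: str) -> bool:
    """True for a C>T transition at a CpG site (the 'anyCG -> anyTG' class)."""
    label = sbs96_channel(context, alt)
    return label[2:5] == "C>T" and label[-1] == "G"
```

Counting the labels per sample (for example with `collections.Counter`) gives the 96-element profile that signature-fitting tools decompose; the fitting step itself is not shown.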
The genome-wide analysis of SCNAs and their corresponding GISTIC values showed that the copy numbers of genes on chromosomes 11 and 13 and on the short arm of chromosome 9 were generally decreased, whereas those on the long arms of chromosomes 9 and 20 and the short arms of chromosomes 4, 5, 7, 19, and X were increased. Notably, the copy numbers of genes located at 12q14.1-15 were markedly increased (Supplementary Fig. 3). Among these genes, SLC35E3, MDM2, and CPM exhibited the highest mean GISTIC values with the lowest standard deviations (Supplementary Fig. 3, boxed area), indicating that these three genes are strongly and consistently amplified in DDLPS.
Recurrent chromosomal rearrangements and fusion genes. Genomon-Fusion analysis of RNA sequencing data from 101 DDLPS samples revealed that the long arm of chromosome 12 was the most frequent site of intra- and interchromosomal rearrangements, followed by chromosome 1 (Fig. 2a; Supplementary Fig. 4a).
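The observation that SLC35E3, MDM2, and CPM combine the highest mean GISTIC value with the lowest standard deviation can be made operational in several ways. One hypothetical way, sketched below, keeps genes above a minimum mean and ranks them by mean divided by standard deviation; the threshold, the ratio criterion, and the toy gene names are assumptions for illustration, not the procedure used for Supplementary Fig. 3.

```python
from statistics import mean, stdev


def rank_consistent_gains(scores_by_gene, min_mean=1.0):
    """Rank genes whose per-sample copy-number scores are high and consistent.

    scores_by_gene: dict mapping gene -> list of per-sample scores
    (e.g. GISTIC-style values); each list needs at least two samples.
    Genes with a mean below `min_mean` are dropped; the rest are ordered
    by mean / standard deviation (a signal-to-noise style ratio).
    """
    ranked = []
    for gene, values in scores_by_gene.items():
        m = mean(values)
        if m < min_mean:
            continue  # not gained strongly enough on average
        sd = stdev(values)
        ratio = m / sd if sd > 0 else float("inf")
        ranked.append((ratio, gene))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return [gene for _, gene in ranked]


# Toy, made-up values for hypothetical genes (not data from this study).
toy_scores = {
    "GENE_A": [2.0, 1.9, 2.1, 2.0],  # high and tight
    "GENE_B": [1.9, 1.8, 2.0, 1.9],  # slightly lower, equally tight
    "GENE_C": [2.5, 0.3, 1.8, 0.4],  # high mean but inconsistent
    "GENE_D": [0.1, 0.0, 0.2, 0.1],  # not gained
}
```

With these toy values, `rank_consistent_gains(toy_scores)` orders GENE_A before GENE_B and places the erratic GENE_C last, while GENE_D is excluded by the minimum-mean filter.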
Mutational landscape of dedifferentiated liposarcoma. The mutational profiles and genomic alterations associated with DDLPS are summarized in Fig. 3a. Because the TCGA study defined DDLPS cases by 12q13-15 amplification 18 , DDLPS tumors without the high-level gain of 12q15 were first classified as Cluster 3, and cluster analysis was then conducted on the DDLPS tumors with the high-level gain of 12q15. One SCNA, the gain of 1p32.1, was independently associated with disease-specific survival and formed the basis for dividing these tumors into two major clusters: Cluster 1 harbored the high-level gain of 12q15 together with the gain of 1p32.1, whereas Cluster 2 showed only the gain of 12q15 (Fig. 3a). Histological examinations, including immunohistochemistry and FISH, confirmed that the Cluster 3 samples were consistent with DDLPS. Among the recurrently mutated genes, TP53 mutations or copy-number loss were enriched in Cluster 3, occurring in three of the five DDLPS samples without the high-level gain of 12q15 (Fig. 3a), suggesting that disruption of the MDM2/TP53 axis is a decisive genomic event in DDLPS development. Copy-number analysis also identified common genomic features of Cluster 3, including the consistent gain or loss of 157 genes in nine regions (Supplementary Data 3), some of which were associated with PI3K-AKT signaling according to the KEGG pathway database.
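The order of the classification matters: tumors lacking the high-level 12q15 gain are set aside as Cluster 3 before the 1p32.1 split is applied. A minimal sketch of that decision rule, assuming each tumor has already been reduced to two boolean SCNA calls (how those calls are thresholded is not specified here, and the sample IDs are hypothetical):

```python
def assign_cluster(high_level_gain_12q15: bool, gain_1p32_1: bool) -> int:
    """Return 1, 2, or 3 following the decision order described in the text."""
    if not high_level_gain_12q15:
        return 3  # no high-level 12q15 gain: classified first, as Cluster 3
    # Tumors with the high-level 12q15 gain are split on the 1p32.1 gain.
    return 1 if gain_1p32_1 else 2


def assign_all(samples):
    """samples: dict of sample ID -> (high_level_gain_12q15, gain_1p32_1)."""
    return {sample_id: assign_cluster(*calls) for sample_id, calls in samples.items()}
```

Note that a 1p32.1 gain in a tumor without the high-level 12q15 gain still yields Cluster 3, because the 12q15 check is applied first.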
Association of genomic alterations with clinical prognosis. After genomic clustering, Kaplan-Meier and univariate Cox-regression analyses showed more favorable progression-free survival in patients with Cluster 2 DDLPS than in patients with Cluster 1 DDLPS (Fig. 3b and Table 2a). Disease-specific survival was also more favorable in patients with Cluster 2 than in those with Cluster 1 (Fig. 3c and Table 2b). Multivariate Cox-regression analyses showed that Cluster 1 classification (vs Cluster 2) was a significant predictor of poor progression-free and disease-specific survival, independently of surgical margin and primary tumor site (Table 2a, b). Further multivariate analysis including the SCNA regions independently associated with progression-free survival (i.e., the high-level gain of 4p16.3 and 6p21.1, the gain of Xq21.1, and the loss of 9q34.11) and significant clinical parameters also demonstrated that these four SCNA regions are independent predictors of poor progression-free survival (Supplementary Table 5). As GISTIC analysis identified 83 significant SCNA regions (Fig. 1e) … Table 7). To further explore SCNA-dependent alterations of gene expression associated with clinical prognosis, we examined the transcript levels of genes in the SCNA regions. Six genes, JUN, DNM3, DNM3OS, TAF9B, DGKQ, and STX18, located at 1p32.1, 1q24.3, Xq21.1, and 4p16.3, showed altered expression that correlated with the SCNAs in all three cohorts (Supplementary Table 8), and some of these genes showed a significant association between high expression and poor clinical prognosis (Supplementary Fig. 9), suggesting that these SCNA-affected genes may be initial drivers of tumor progression in DDLPS.
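The survival comparisons above rest on Kaplan-Meier estimates. For reference, a minimal product-limit estimator is sketched below; the input format (follow-up times plus 1/0 event indicators) is an assumption, and a real analysis would normally rely on an established survival package rather than this sketch.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times: follow-up times; events: 1 if the event (e.g. progression) was
    observed at that time, 0 if censored. Returns [(time, S(time)), ...]
    at each distinct event time. Subjects censored at an event time are
    counted as still at risk at that time (the usual convention).
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every subject (event or censored) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

For example, follow-up times `[1, 2, 2, 3, 4]` with events `[1, 1, 0, 1, 0]` give survival steps of 0.8, 0.6, and 0.3 at times 1, 2, and 3.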
Comparative analysis between WD and DD components. As we obtained both intermediate well-differentiated (WD) and high-grade malignant dedifferentiated (DD) components from eight DDLPS samples, we compared their genomic profiles to determine the mechanisms underlying the malignant transformation to DD. As expected, DD harbored more somatic mutations than matched WD in all cases but shared few somatic mutations with WD (Fig. 4a; Supplementary Fig. 10a). In contrast, WD and DD shared more SCNA regions, while DD harbored more prominent SCNAs as well as additional SCNA regions compared with WD (Fig. 4b; Supplementary Fig. 10b). GISTIC analysis confirmed the results of the comparative SCNA analysis and identified the gain of 1q24.3 and 12q14.3-15 and the loss of 1p36.33, 15q11.2, and 16p13.3 as shared between DD and WD (Supplementary Fig. 10c, d). Circos plots showed common recurrent intra- and interchromosomal rearrangements on chromosomes 1 and 12 in the DD and WD pairs (Fig. 4c; Supplementary Fig. 11a), while the heatmap of chromosomal rearrangements revealed that the frequency of rearrangements was increased in DD compared with WD (Supplementary Fig. 11b, c). These results indicate that SCNAs and chromosomal rearrangements on chromosomes 1 and 12, but not somatic mutations, were common initial genomic events in both DD and WD, and that additional copy-number alterations or chromosomal rearrangements were associated with the development of DD.
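The WD/DD comparison above has two parts: splitting alterations into shared and component-specific sets, and flagging shared SCNAs that are more pronounced in DD. A minimal sketch of both, assuming mutations are keyed as (chromosome, position, ref, alt) tuples and SCNAs as region -> segment log2 ratio; the `margin` value and region names are hypothetical illustrations, not thresholds from this study.

```python
def partition_alterations(wd_calls, dd_calls):
    """Split two collections of alteration keys into shared, WD-only and DD-only sets."""
    wd, dd = set(wd_calls), set(dd_calls)
    return wd & dd, wd - dd, dd - wd


def augmented_regions(wd_log2, dd_log2, margin=0.3):
    """Shared SCNA regions whose magnitude is larger in DD than in WD.

    wd_log2 / dd_log2: dict region -> segment log2 ratio. A region counts as
    augmented if it has the same sign in both components and
    |DD| >= |WD| + margin. The margin value is an illustrative assumption.
    """
    hits = []
    for region in sorted(wd_log2.keys() & dd_log2.keys()):
        wd_value, dd_value = wd_log2[region], dd_log2[region]
        same_sign = wd_value * dd_value > 0
        if same_sign and abs(dd_value) >= abs(wd_value) + margin:
            hits.append(region)
    return hits
```

Regions present only in `dd_log2` fall outside `augmented_regions`; they correspond to the "additional SCNA regions" and can be obtained from `partition_alterations` applied to the region names.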
The multidimensional scaling of RNA expression from six matched pairs revealed a clustered expression profile for WD but a relatively scattered expression profile for DD (Fig. 4d). GSEA revealed that gene sets related to cell-cycle progression, including G2M checkpoint and E2F targets, were significantly enriched in DD, whereas those related to adipocyte differentiation or lipid metabolism, including adipogenesis and fatty acid metabolism, were enriched in WD (Fig. 4e, f; Supplementary Table 9a, b). Finally, we performed a genome-wide screening analysis to identify genes involved in the malignant transformation to DD. In the first screening step, we searched for genes with recurrent SCNAs specifically found in DD and identified 133 gained genes in 20 regions and 305 lost genes in 37 regions (Supplementary Fig. 12). In the second step, we examined the expression levels of these 438 genes in DD and WD and identified 27 genes that showed differential expression in accordance with their SCNAs (Fig. 4g, h; Supplementary Table 10a, b).
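The second screening step keeps DD-specific SCNA genes whose expression changes in the same direction as their copy number. The Fig. 4g, h legend gives the cut-offs (fold-change >2 or <½, and −log10 P > 1.301, i.e. P < 0.05, by a one-sided paired t-test). The sketch below applies those cut-offs; defining fold-change as the ratio of mean expression with a pseudocount, and passing in precomputed P-values, are assumptions, and the gene names are toy placeholders.

```python
import math
from statistics import mean


def screen_dd_specific_genes(candidates, wd_expr, dd_expr, p_values,
                             fold_cut=2.0, p_cut=0.05, pseudocount=1.0):
    """Keep DD-specific SCNA genes whose expression follows the copy-number change.

    candidates: dict gene -> "gain" or "loss" (first-step, DD-specific SCNAs)
    wd_expr / dd_expr: dict gene -> list of expression values (e.g. FPKM)
    p_values: dict gene -> precomputed, strictly positive P-value
              (e.g. from a one-sided paired t-test)
    """
    min_log_p = -math.log10(p_cut)  # about 1.301 when p_cut = 0.05
    kept = []
    for gene, direction in candidates.items():
        fold = (mean(dd_expr[gene]) + pseudocount) / (mean(wd_expr[gene]) + pseudocount)
        if -math.log10(p_values[gene]) <= min_log_p:
            continue  # not significant
        if direction == "gain" and fold > fold_cut:
            kept.append(gene)  # gained and upregulated in DD
        elif direction == "loss" and fold < 1.0 / fold_cut:
            kept.append(gene)  # lost and downregulated in DD
    return kept
```

The direction check is what ties expression to copy number: a gained gene that is downregulated in DD, for instance, is not retained.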

Discussion
This study examined the genomic alterations associated with DDLPS by conducting whole-exome and RNA sequencing of more than 100 tumor samples. Through a series of analyses, we confirmed that the gain of the chromosomal region 12q15, which is already well known to be associated with DDLPS, is the most frequent alteration in DDLPS, and we identified recurrent DNM3OS-fusion genes. Based on the status of their genomic alterations, DDLPS tumors could be classified into three groups, and this genomic classification could predict clinical outcomes. In addition, the comparative analysis of WD and DD revealed SCNAs and chromosomal rearrangements on chromosomes 1 and 12 to be common initial genomic events, and showed that the augmentation of the initially gained SCNA regions, the occurrence of additional SCNA regions, and/or further chromosomal rearrangements are specifically associated with DD. Previous studies have repeatedly reported the copy-number gain of 12q13-15 in DDLPS [5][6][7][8][9][10] . Most of these studies focused on the copy-number gain of specific genes, including MDM2, HMGA2, and CDK4, as driver genomic alterations 7 . The current study also identified the gain of 12q15 as the most frequent event in DDLPS (Fig. 1e) and notably distinguished MDM2, CPM, and SLC35E3 as the most consistently and strongly amplified genes in this region (Supplementary Fig. 3), suggesting that the simultaneous gain of MDM2, CPM, and SLC35E3 is a crucial step in the development of DDLPS. This study found that five of the 115 DDLPS tumors harbored no or only low-level amplification of 12q15. Admittedly, it is difficult to diagnose malignant soft-tissue tumors with little or no gain of 12q15 as DDLPS, but the histological diagnosis of the JSGC patients in Cluster 3 was verified by musculoskeletal pathologists both before and after the genomic analysis.
The finding that somatic mutations or copy-number loss of TP53 were enriched in Cluster 3 (Fig. 3a) could support the histological diagnosis and suggests that disruption of the MDM2-TP53 axis is necessary for the development of DDLPS.
Some histological types of sarcoma are characterized by specific fusion genes, such as EWSR1-FLI1 in Ewing's sarcoma 23 , EWSR1-ATF1 or EWSR1-CREB1 in clear cell sarcoma 24 , and SS18-SSX1/2 in synovial sarcoma 25 , all of which function as drivers of tumor development. FUS-DDIT3 is frequently associated with myxoid liposarcoma 26 , whereas no recurrent fusions have been reported in DDLPS. The current study identified CTDSP1-DNM3OS and CTDSP2-DNM3OS as recurrent fusion genes. Because CTDSP1 and CTDSP2 encode the C-terminal domain small phosphatases 1 and 2, and knockdown of CTDSP2 in DDLPS cell lines inhibited cell proliferation 27 , further analysis is needed to characterize the expression and function of the fusion protein. More importantly, DDLPS tumors containing DNM3OS-fusion genes showed significant upregulation of DNM3OS (Fig. 2d; Supplementary Fig. 5a) and enrichment of cell-cycle pathways compared with those without fusion genes (Supplementary Fig. 5b). In addition, the gain of 1q24.3, accompanied by the upregulation of DNM3 and DNM3OS (Supplementary Table 8), was significantly associated with poor progression-free survival in Cluster 2 DDLPS (Supplementary Table 7). Because DNM3OS encodes the MIR199A2-MIR214 cluster 20,21 , of which MIR214 was retained in the DNM3OS-fusion genes (Fig. 2b, c), the fusion genes may induce MIR214 expression. Indeed, MIR214 and DNM3OS are consistently expressed during embryonic development 22 , and their expression was highly correlated in DDLPS (Fig. 2e, f). Several tumor-suppressor genes, including PTEN, ATM, and TP53, as well as the adipogenic transcription factor PPARD, have been validated as targets of MIR214 20,28 . Taken together, these lines of evidence suggest that the upregulation of DNM3OS mediated by chromosomal rearrangement could promote tumor cell proliferation and contribute to DDLPS progression.
Based on a series of genomic analyses, the DDLPS tumors could be classified into three groups (Fig. 3a). This classification showed that Cluster 1 DDLPS tumors were associated with poorer clinical outcomes than Cluster 2 DDLPS tumors (Fig. 3b, c and Table 2). Cluster 1 was characterized by the gain of 1p32.1, which contains JUN and other genes, and is comparable with K1 and a portion of the K2 clusters from the TCGA study 18 . The current and previous studies also showed that the upregulation of JUN correlated with the gain in its copy number (Supplementary Table 8) 18 . Because JUN amplification blocks adipogenesis 11 and is oncogenic in liposarcomas 13 , the upregulation of JUN resulting from 1p32.1 gain may play a pivotal role in DDLPS progression 18,29 . Prognostic nomograms that provide survival predictions for sarcoma patients have been established 30,31 . Genomic clustering was shown to be a prognostic factor independent of the clinical parameters primary tumor site and surgical margin (Table 2), and a further multivariate Cox-regression analysis including most of the MSKCC-nomogram clinical parameters (age, gender, surgical margin, primary site, and tumor size) 30 showed a significant association of genomic clustering with progression-free and disease-specific survival (Supplementary Table 11). Clinical models that incorporate genomic clustering information in addition to clinical parameters may therefore predict prognosis more precisely. A previous copy-number analysis using comparative genomic hybridization (CGH) arrays on 52 DDLPS samples showed that the loss of 11q23-24 was the most common event in DDLPS and that the loss of 19q13 was associated with poor prognosis 6 . The current study also identified 11q24.2 and 19q13.43 as recurrent SCNAs (frequencies of 40.3% and 26.1%, respectively) but did not find an association of 11q24.2 or 19q13.43 with clinical outcomes. We also identified genes whose expression levels depended on the prognostic SCNAs (Supplementary Table 8). These recurrent SCNAs and genes are potential prognostic markers, although further validation using independent cohorts is required before clinical application.
Table 2 Cox-regression analysis of progression-free (a) and disease-specific (b) survival with genomic clustering. … Fig. 3. *P < 0.05, **P < 0.01. a Trunk includes abdomen, retroperitoneum, chest wall, and back; extremity includes extremity, shoulder, and girdle.
Fig. 4 … and SVs (c) in WD and DD components from the same patients. In a, the circle size and numbers indicate the number of somatic mutations in WD or DD. A boxed gene, OTP1, indicates a common somatic mutation in the sample. In b, copy numbers are plotted according to the order of the chromosomal regions, from chromosome 1 (top) to 22 (bottom) and chromosome X. Red lines indicate segmented exome circular binary segmentation calls. The segmentation size is based on the exome capture kit BED file. Solid arrows indicate 12q15; empty arrows indicate 1p32.1. In c, two representative cases are presented, with others presented in Supplementary Fig. 9. d Multidimensional scaling analysis of expression profiles of paired WD and DD; six pairs of WD and DD were analyzed. e, f GSEA comparing the expression profiles of DD and WD; the gene sets most enriched in DD (e) and WD (f) are shown. g, h Volcano plots of the DD-specific gain (g) and loss (h) of genes. Red and blue dots denote large-magnitude fold-changes (>2 or <½; horizontal axis) and high statistical significance (−log10 P > 1.301, i.e., P < 0.05, one-sided paired t-test; vertical axis), respectively.
The comparative analysis of genomic alterations in WD and DD from the same tumor tissue identified frequent shared SCNAs and chromosomal rearrangements, especially on chromosomes 12 and 1, but few common somatic mutations (Fig. 4a-c; Supplementary Figs. 10, 11). These findings point to two factors in the mechanisms underlying the development of DDLPS. First, SCNAs, especially at 1q24.3 and 12q14.3-15, and concurrent inter- and intrachromosomal rearrangements on chromosomes 1 and 12, but not somatic mutations, were initial events shared during the development of both components. Second, the augmentation of the common SCNAs and the emergence of additional SCNAs and chromosomal rearrangements on chromosome 12, neither of which occurred in WD, may drive the malignant transformation to DD. Previous comparative analyses between DDLPS and WDLPS identified the loss of 11q23 and the amplification of 6q23 and 1p32 as genomic abnormalities specific to DDLPS 6,11 . In contrast, this study identified these SCNAs as recurrently affected regions in both DD and WD. This discrepancy might be caused by differences between the WD components of DDLPS and WDLPS. Indeed, the current fusion analysis identified DNM3OS fusions only in DDLPS and not in WDLPS, and a previous microarray-based transcriptome analysis showed distinct gene expression patterns in WD from DDLPS versus WDLPS 32 . Another previous CGH array study comparing pairs of WD and DD components from DDLPS tumors failed to identify any SCNAs that significantly distinguished the two types of components 33 . Further comparative analysis using next-generation sequencing may discriminate the genomic profiles of WD in DDLPS from those of WDLPS and provide important information for establishing treatment strategies for these tumors.
This study identified 27 genes that were specifically gained or lost in DD, but not in WD, and were differentially expressed in accordance with their copy-number alterations (Fig. 4g, h; Supplementary Table 10a, b). Of the 27 genes, G0S2 (G0/G1 switch 2) and DGAT2 (diacylglycerol O-acyltransferase 2) were highly expressed in WD, with mean FPKM values of ~1200 and ~70, respectively, but were suppressed by approximately 95% in DD (Supplementary Table 10b). As G0S2 regulates lipid metabolism and promotes apoptosis by binding to BCL2 [34][35][36] , and DGAT2 plays an important role in triacylglycerol biosynthesis and fat digestion and absorption 37 , the copy-number loss and concomitant downregulation of G0S2 and DGAT2 likely contribute to the dedifferentiation and malignant transformation of adipogenic cells.
Overall, the genomic alterations that occur during the progression of DDLPS can be summarized as follows (Fig. 5). First, the genomic alterations common to the DD and WD components, including the gain of 12q15 (containing MDM2, CPM, and SLC35E3), arise during the initiation of DDLPS and lead to impairment of p53 and chromosomal instability. Second, during malignant transformation, some tumor clones undergo further augmentation of the initial SCNAs and acquire additional SCNAs and chromosomal rearrangements, including the DNM3OS-fusion genes and the loss of G0S2 and DGAT2, which contribute to cell-cycle progression and impaired adipogenesis. Finally, further SCNAs occur, including the gain of 1p32.1, 1q24.3, 4p16.3, and Xq21.1 and the loss of 9q34.11, 12q24.33, 13q32.3, and Xp22.33; these were found to be involved in tumor progression and are associated with poor clinical outcomes.
In conclusion, this comprehensive genomic analysis of more than 100 tumor samples revealed the genomic characteristics of DDLPS and a genomic clustering of DDLPS tumors that can be used to predict the prognosis of patients with DDLPS. These findings shed light on the mechanisms underlying DDLPS development and progression and provide insights that can contribute to the refinement of DDLPS therapy.

Methods
Patients and tumor samples. We collected matched pairs of frozen normal and tumor samples from 65 patients (28 in JSGC-IMSUT and 37 in JSGC-NCC) with dedifferentiated liposarcoma (DDLPS). We also obtained WD components from 8 of the 65 DDLPS samples. Blood, skin, or adipose tissue was used as the germline control. All samples collected by JSGC-IMSUT were transferred to a core analytic facility after anonymization at each hospital, whereas samples collected by JSGC-NCC were prepared for next-generation sequencing at the National Cancer Center Research Institute. The frozen tumor samples from JSGC-IMSUT were sectioned for histological evaluation (Supplementary Fig. 13) and for extraction of DNA and RNA. Histological data from the frozen and formalin-fixed paraffin-embedded tumor samples, which had been prepared for clinical diagnosis, were evaluated by musculoskeletal pathologists to confirm the diagnosis and the validity of the tumor samples and to examine tumor cell content. The present protocols were reviewed and approved by the Ethics Committees of all participating institutions, including the Institute of Medical Science, the University of Tokyo, the National Cancer Center, Japan, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Kyushu University, Osaka International Cancer Institute, Chiba Cancer Center, Nagoya University Graduate School of Medicine, Kanagawa Cancer Center, National Hospital Organization Hokkaido Cancer Center, and RIKEN Center for Integrative Medical Sciences. All participants were enrolled and anonymised after approval by the institutional review board. We obtained written informed consent from all participants, except for those we could not contact because of loss to follow-up or death at registration. In these cases, the Institutional Review Boards at each participating institution granted permission for existing tissue samples to be used for research purposes.
In addition, the Institutional Review Board of the Institute of Medical Science, University of Tokyo provided permission for the fully anonymised genetic data to be shared (protocol numbers 26-22-0630 and 30-78-B0305). None of the samples used in this study came from patients who had opted out of participation.
Whole-exome sequencing. Whole-exome sequencing of the 65 DD and 8 WD components and of 65 matched germline samples was performed using target capture with Agilent SureSelect XT Human All Exon V5 + lncRNA (Agilent, 5190-6448) in JSGC-IMSUT and Agilent SureSelect XT Human All Exon V5 (Agilent, 5190-6210) in JSGC-NCC. The raw sequence data generated by Illumina HiSeq2000 or HiSeq2500 sequencers were processed through an in-house pipeline for the whole-exome analysis of paired cancer genomes at the Human Genome Center, Institute of Medical Science, University of Tokyo.
We also obtained FASTQ sequence data for 50 DDLPS cases from TCGA, which were merged with our sequence data and subjected to the following analyses.
Analysis of somatic mutations. For our sequencing data, FASTQ files were generated using CASAVA 2.0. Candidate somatic mutations were identified using the Genomon pipeline [https://github.com/Genomon-Project/genomon-docs/tree/v2.0], with GRCh37/hg19 as the human reference genome. Candidate mutations in a tumor sample were retained if they met the following criteria: (i) Fisher's exact P ≤ 0.01; (ii) ≥5 variant reads in the tumor sample; (iii) variant allele frequency (VAF) in the tumor sample ≥0.08; and (iv) VAF in the matched normal sample <0.07. Synonymous SNVs and known variants listed in NCBI dbSNP build 131 were excluded.
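The four filtering criteria can be expressed as a single predicate. The sketch below is not the Genomon implementation: it assumes that the Fisher's exact test is two-sided and compares variant and reference read counts between the tumor and its matched normal, which the text does not specify, and it takes the synonymous and dbSNP annotations as precomputed flags.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def passes_somatic_filter(t_alt, t_ref, n_alt, n_ref,
                          synonymous=False, in_dbsnp=False):
    """Apply criteria (i)-(iv) plus the synonymous/dbSNP exclusions.

    t_alt/t_ref: variant/reference reads in the tumor;
    n_alt/n_ref: variant/reference reads in the matched normal.
    """
    if synonymous or in_dbsnp:
        return False
    if t_alt < 5:                                   # (ii) >= 5 variant reads
        return False
    tumor_vaf = t_alt / (t_alt + t_ref)
    normal_depth = n_alt + n_ref
    normal_vaf = n_alt / normal_depth if normal_depth else 0.0
    if tumor_vaf < 0.08 or normal_vaf >= 0.07:      # (iii) and (iv)
        return False
    return fisher_exact_two_sided(t_alt, t_ref, n_alt, n_ref) <= 0.01  # (i)
```

For example, 20 variant reads out of 100 in the tumor and none out of 100 in the normal pass, whereas a site with 8 variant reads out of 100 in the normal fails criterion (iv) regardless of the tumor evidence.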
Analysis of somatic copy-number alterations. Copy-number aberrations were quantified and reported for each interval of the capture-kit BED file as the segmented, normalized, log2-transformed exon coverage ratio between each tumor sample and its matched normal sample. Significant focal copy-number alterations were identified using GISTIC2 (v2.0.22) 38 .
RNA sequencing. In JSGC-IMSUT, total RNA was extracted from frozen tumor tissues using Qiazol reagent (Qiagen) and purified using an RNeasy Plus Universal Mini kit (Qiagen) with DNase I digestion, according to the manufacturer's instructions. RNA integrity was verified using an Agilent 2100 Bioanalyzer with RNA Nano reagents (Agilent Technologies). High-quality RNA from 15 DD and 6 WD samples was subjected to polyA+ selection and chemical fragmentation, and the 100-200-bp RNA fraction was used to construct cDNA libraries using the TruSeq Stranded mRNA Prep kit (Illumina) according to the manufacturer's protocol. For RNA-seq of low-quality RNA from 4 DD samples and 1 WD sample, libraries were constructed from total RNA using the TruSeq RNA Access Library Prep kit (Illumina), which captures the coding regions of the transcriptome. In JSGC-NCC, total RNA was extracted from 32 DDLPS and 17 WDLPS samples using ISOGEN reagent (Nippon Gene) and purified using an RNeasy MinElute Cleanup kit (Qiagen). Libraries were constructed from total RNA using the TruSeq Stranded Total RNA with Ribo-Zero Gold kit (Illumina). These paired-end libraries were sequenced on the Illumina HiSeq2000 or HiSeq2500 platform.
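The copy-number quantification described in this section compares normalized exon coverage between each tumor and its matched normal sample. A minimal sketch of that ratio, assuming per-interval read counts as input and total-count normalization with a small pseudocount (both assumptions; segmentation, such as circular binary segmentation, and GISTIC2 scoring are not shown):

```python
import math


def log2_coverage_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Per-interval log2(tumor / normal) coverage ratio.

    Each sample is scaled by its total read count so that sequencing depth
    does not shift the baseline. Inputs are parallel lists of read counts,
    one entry per capture-target interval.
    """
    if len(tumor_counts) != len(normal_counts):
        raise ValueError("count vectors must be the same length")
    tumor_total = sum(tumor_counts) + pseudocount * len(tumor_counts)
    normal_total = sum(normal_counts) + pseudocount * len(normal_counts)
    ratios = []
    for t, n in zip(tumor_counts, normal_counts):
        t_frac = (t + pseudocount) / tumor_total
        n_frac = (n + pseudocount) / normal_total
        ratios.append(math.log2(t_frac / n_frac))
    return ratios
```

Because of the total-count scaling, the values are relative: an interval with twice the normalized tumor coverage of another ends up one log2 unit higher, whatever the absolute depths of the two libraries.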
Analysis of gene expression. Gene expression values were estimated from the tumor RNA-seq data using TopHat2 41 (TopHat v2.1.0 [http://ccb.jhu.edu/software/tophat/downloads/tophat-2.1.0.tar.gz]) and Cufflinks 42 (Cufflinks v2.2.1 [http://cole-trapnell-lab.github.io/cufflinks/releases/v2.2.1/]). The paired-end transcriptome sequencing reads were aligned to the human reference genome (GRCh37/hg19) with TopHat2. The BAM files (accepted_hits.bam) generated by the TopHat mapping module were used to quantify expression with Cufflinks. Because the RNA library kits each produced distinct expression-profile clusters, expression datasets derived from different library kits were analyzed separately. GSEA to identify gene sets enriched in DNM3OS-fusion-positive samples or DD components was performed using the Java GSEA v3.0 program 43 . Quantitative RT-PCR was performed twice to examine MIR214 expression levels in 29 JSGC-NCC samples. Each experiment was conducted in triplicate, and MIR214 expression levels were normalized to RNU48. The mean of the two experiments was used for further correlation analysis.
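The text states only that MIR214 levels were normalized to RNU48. A common way to do this is the comparative Ct (2^−ΔCt) method; the sketch below assumes that method, averages the technical triplicates before normalization, and then averages the two experiments. It should not be read as the exact calculation used.

```python
from statistics import mean


def relative_expression(target_cts, reference_cts):
    """2^-dCt relative expression from technical replicates.

    target_cts: Ct values for the target (e.g. MIR214);
    reference_cts: Ct values for the normalizer (e.g. RNU48).
    Replicates are averaged before taking the difference.
    """
    delta_ct = mean(target_cts) - mean(reference_cts)
    return 2.0 ** (-delta_ct)


def combine_experiments(per_experiment_values):
    """Average the normalized values from independent experiments."""
    return mean(per_experiment_values)
```

For instance, a target Ct three cycles above the reference gives 2^−3 = 0.125, and averaging that with a second experiment at 0.25 gives 0.1875.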
Statistical analysis of clinical variables. We obtained clinical information, including sex, age at diagnosis, primary tumor site, tumor size, modality of local treatment, and surgical margin, from 112 of 119 participants (Table 1). The mean follow-up duration for the 112 patients with DDLPS was 3.61 years, for a total of 401 person-years. Clinical factors, including age at initial presentation, sex, tumor size (10 cm or more vs less than 10 cm), primary site (retroperitoneum, abdomen, or chest wall vs extremity), surgical margin status, metastasis status at presentation, and genomic status, were analyzed for their association with progression-free and overall survival using the Cox proportional hazards regression model and Kaplan-Meier estimates. Log-rank tests were used to assess the univariate significance of each factor. Factors found to be significant in univariate analysis were included in a multivariate Cox proportional hazards regression model. Hazard ratios (HRs) and 95% confidence intervals (CIs) were used to report the magnitude of the differences and the strength of the association.
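As a reference for the univariate step, the log-rank statistic compares observed and expected events in one group across all event times. A minimal two-group sketch is given below (pure Python; the input format is an assumption, and tied event times use the usual hypergeometric variance). The Cox models themselves are not reproduced here.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events_*: 1 for an observed event, 0 for censoring.
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times_a, events_a) if e} |
                         {t for t, e in zip(times_b, events_b) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

Two identical groups give a statistic of 0 and P = 1, while groups whose events are completely separated in time give a small P-value.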
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Sequencing FASTQ data files from exome and RNA sequencing have been deposited in the Japanese Genotype-phenotype Archive (JGA), which is hosted by the DDBJ, under accession numbers JGAS00000000177 and JGAS00000000182. Other datasets referenced during the study are available from the Genomic Data Commons [https://gdc.cancer.gov/]. All other data supporting the findings of this study are available within the article and its Supplementary Information files and from the corresponding author upon reasonable request. A reporting summary for this article is available as a Supplementary Information file.