Multi-dimensional genomic analysis of myoepithelial carcinoma identifies prevalent oncogenic gene fusions

Myoepithelial carcinoma (MECA) is an aggressive salivary gland cancer with largely unknown genetic features. Here we comprehensively analyze molecular alterations in 40 MECAs using integrated genomic analyses. We identify a low mutational load, and high prevalence (70%) of oncogenic gene fusions. Most fusions involve the PLAG1 oncogene, which is associated with PLAG1 overexpression. We find FGFR1-PLAG1 in seven (18%) cases, and the novel TGFBR3-PLAG1 fusion in six (15%) cases. TGFBR3-PLAG1 promotes a tumorigenic phenotype in vitro, and is absent in 723 other salivary gland tumors. Other novel PLAG1 fusions include ND4-PLAG1; a fusion between mitochondrial and nuclear DNA. We also identify higher number of copy number alterations as a risk factor for recurrence, independent of tumor stage at diagnosis. Our findings indicate that MECA is a fusion-driven disease, nominate TGFBR3-PLAG1 as a hallmark of MECA, and provide a framework for future diagnostic and therapeutic research in this lethal cancer.


Introduction
Myoepithelial carcinoma (MECA) is an understudied type of salivary carcinoma, characterized by aggressive clinical behavior and a high rate of distant metastases. There are no active systemic therapies for this cancer, rendering recurrent or metastatic disease generally incurable. The molecular alterations that define MECA have not been well characterized (1,2). In earlier studies, MECA constituted less than 2% of all salivary gland cancers, but its incidence is now believed to be higher due to increased recognition by pathologists (3). In fact, MECA is the second most common type of carcinoma arising from transformed benign salivary adenomas (4). MECAs most commonly arise in the parotid gland, followed by the submandibular and minor salivary glands (1). While most cases present as localized disease, advanced locoregional or distant recurrence is relatively common, leading to a 5-year overall survival rate of 64% (2). Rearrangements affecting the pleomorphic adenoma gene 1 (PLAG1) or high-mobility group AThook 2 gene (HMGA2) have been observed in PAs and MECA ex-PAs (6)(7)(8)(9)(10)(11)(12). However, PLAG1 rearrangements have not been detected in MECA de novo, and the repertoire of PLAG1 fusion partners in MECA ex-PA remains poorly defined. Apart from rearrangement of PLAG1 and HMGA2, the spectrum of genetic alterations in MECA is unknown.
In order to guide clinical investigation for understudied, lethal cancers such as MECA, understanding the genetic alterations that underlie this disease is critical. Identifying molecular alterations may nominate targetable signaling pathways, and guide further research with targeted or immunotherapeutic approaches. In this study, we performed multi-dimensional analyses on 40 MECAs, including whole-exome and RNA sequencing, array comparative genomic hybridization, fluorescence in situ hybridization, and protein expression analyses. Taken together, our results provide the first molecular characterization of MECA, providing novel information on the molecular underpinnings of this cancer, which may guide future clinical investigation in this aggressive disease.

Patients
After obtaining written informed consent and approval from the Institutional Review Board, 40 patients diagnosed with myoepithelial carcinoma (MECA) of the salivary gland who were treated at Memorial Sloan Kettering Cancer Center (MSKCC) during the years 1965 to 2014 were included in the study. Clinical data from 34 patients were previously reported (5). Eight patients were included in previous studies examining PLAG1 and HMGA2 rearrangement status (6,13). marker or S100. In cases that initially did not meet these criteria due to lacking IHC results, we performed additional IHC to confirm the diagnosis; see Supplementary Fig. 2 for IHC results.

DNA and RNA extraction
For cohort 1 (cases 01-12), tissue samples were snap frozen in liquid nitrogen at the time of surgery and stored at -80 °C. Tumor and matched normal (from peripheral blood or non-neoplastic tissue) DNA and tumor RNA were extracted using the DNeasy Blood and Tissue Kit (Qiagen, Venlo, Netherlands) and the RNeasy Plus Mini Kit (Qiagen) respectively. For cohort 2 (cases 13-40), formalin-fixed paraffin-embedded (FFPE) tumor tissue was sectioned and stored at room temperature. DNA and RNA were extracted using the QIAamp DNA FFPE Tissue Kit (Qiagen) and the RNeasy FFPE kit (Qiagen) respectively. DNA was quantified with Nanodrop 8000 and amplified by PCR. Exome libraries were captured with the SureSelect XT v4 51 Mb capture probe set (Agilent Technologies), enriched by PCR, and quantified using the KAPA Library Quantification Kit (KAPA Biosystems, Wilmington, MA, USA), 2100 BioAnalyzer (Agilent Technologies), and Qubit Fluorometer (Thermo Fisher Scientific). Sequencing was done on a HiSeq2500 (Illumina) using 2 x 125 bp cycles.

RNA sequencing
In cohort 1, RNA sequencing libraries were generated from ~100 ng total RNA extracted from fresh frozen tumor tissue, using the KAPA Stranded RNA-Seq with RiboErase sample preparation kit (KAPA Biosystems). RNA was subjected to ribosomal depletion, first and second strand cDNA synthesis, adapter ligation, and PCR amplification using 11 cycles. Libraries were quantified using the KAPA Library Quantification Kit (KAPA Biosystems), 2100 BioAnalyzer (Agilent Technologies), and Qubit Fluorometer (Thermo Fisher Scientific), and were sequenced on a HiSeq2500 (Illumina) using 2 x 125 bp cycles. In cohort 2, libraries were generated from ~1 µg RNA extracted from FFPE, using the TruSeq® Stranded Total RNA LT kit (Illumina). PCR was amplified using 6 cycles, and sequencing was done using 2 x 50 bp cycles on a HiSeq2500.

Mutation analysis
Raw whole exome seq data were aligned to the hg19 reference sequence using the Burrows-Wheeler Aligner (BWA) version 0.7.10 (14). Realignment of insertions and deletions (indels), base quality score recalibration and duplicate reads removal were done with the Genome Analysis Toolkit (GATK) version 3.2.2 (broadinstitute.org/gatk), according to the raw read alignment guidelines (15). Single nucleotide variants (SNVs) were independently detected by MuTect (16), Somatic Sniper v1.0.4.2 (17), Strelka v1 (18), and Varscan v2.3.8 (19), and significant SNVs were identified using a previously described pipeline (20). In brief, SNVs that were identified by ≥2 callers, with ≥8x coverage and ≥10% variant allelic fraction in tumor DNA, as well as ≥15x coverage and >97% normal allelic fraction in normal DNA, were considered high-confidence variants. Variants that met the criteria for tumor coverage and variant allelic fraction but were detected by 1 caller only and/or had a normal DNA coverage of 4-14x, and passed manual review using the Integrative Genomics Viewer (IGV) v2.3 (broadinstitute.org/igv), were considered low-confidence SNVs. Indels were detected by Strelka and VarScan. Variants with ≥4x tumor allelic coverage, >10% tumor allelic fraction, ≥4x normal certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint DNA coverage, and >97% normal allelic fraction that passed manual review in IGV, were considered potential indels. In cohort 2, mutations were called from RNA seq data using HaplotypeCaller. Since no normal tissue DNA or exome seq data was available in this cohort, only mutations affecting genes listed in COSMIC cancer Gene Census (21), and present in the COSMIC database (22), were evaluated.
Variants with ≥15x coverage and ≥20% tumor allelic fraction that passed manual review using IGV were considered as true mutations.

Validation of mutations
The match between tumor and normal samples of each patient was confirmed with VerifyBamID (23), and with fingerprinting analysis using an in-house panel of 118 single nucleotidepolymorphisms. In cohort 1, a random selection (15%) of the high-confidence SNVs, as well as all potential indels and low-confidence SNVs, were subjected to orthogonal validation using NimbleGen SeqCap EZ target enrichment (Roche, Basel, Switzerland), with a 500x and 250x sequencing depth for tumor and normal DNA, respectively. Variants achieving >100x coverage of both tumor and normal DNA, >15% variant allelic fraction in tumor, and <3% variant allelic fraction in normal DNA, were considered validated. The overall validation rate in high-confidence SNVs was 98%. All high-confidence SNVs, as well as validated indels (80% of all potential indels) and low confidence SNVs (20% of all), were considered true mutations.

Array comparative genomic hybridization (arrayCGH)
ArrayCGH analysis was performed using the human genome CGH 4X180K microarray (G4449A; Agilent Technologies). Approximately 500 ng of tumor and reference DNA were fragmented, labeled, hybridized, and washed according to manufacturers instructions (Agilent Technologies).
The arrays were subsequently scanned on a High-Resolution C Microarray Scanner and the images processed using Feature Extraction v.10.7.1 (Agilent Technologies) with linear normalization (protocol CGH_107_Sep09). Data were analyzed using Nexus Copy Number Software ® certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint Discovery Edition v. 8.0 (BioDiscovery, El Segundo, CA, USA). Copy number alterations (CNAs) were defined using the FASST2 segmentation algorithm with a significance threshold of P=1.0E-8. The log 2 ratio thresholds for aberrations calls were set to 1.5 for amplification, 0.3 for gain, -0.3 for loss and -1.5 for homozygous deletion. Each aberration was manually validated to confirm the accuracy of the call and excluded when occurring in sex chromosomes or in regions reported to have a high prevalence of copy number variation (Database of Genomic Variants; http://dgvbeta.tcag.ca/dgv/app/news?ref=NCBI37/hg19).

Gene expression analysis
RNA seq FASTQ files were aligned to the hg19 genome using STAR aligner with default parameters (25), counted using Rsamtools v3.2 (26), and annotated using the

Tumor clonality analysis
The degree of intratumoral genetic heterogeneity in Cohort 1 samples with at least 10 mutations (n=9) was analyzed by integrating mutation allelic fractions, tumor purity, and allele-specific copy number, using methods previously described (28). Mutations were clustered using PyClone (29), with allele specific copy number information determined by FACETS (30).

Prediction of neoantigen formation
For fusion genes, RNA seq reads were aligned to the UCSC human genome assembly hg19 by TopHat2 (31). Neoepitopes resulting from gene fusions were integrated with predicted HLA alleles from HLAminer (32) by INTEGRATE-Neo (33). For both fusion genes and mutations, NetMHC 4.0 (34) was used for neoantigen binding prediction.

Immunohistochemistry
Sections (5µm) of formalin-fixed, paraffin embedded tumor tissue were stained with antibodies specific for TGFBR3 (HPA008257, polyclonal rabbit, dilution 1:130; Sigma-Aldrich, St.Louis, certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017.   Table 1.

Fluorescence in-situ hybridization (FISH)
FISH was performed on paraffin section using dual-color, dual-fusion translocation probes (see Supplementary Table 2 for clone list). Probe labeling, tissue processing, hybridization, posthybridization washing, and fluorescence detection were performed according to standard procedures. Slides were scanned using an Axioplan 2i epifluorescence microscope (Zeiss, certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint Oberkochen, Germany) equipped with a 6 megapixel CCD camera (CV-M4+CL, JAI) controlled by Isis 5.5.9 imaging software (MetaSystems Group Inc, Waltham, MA). Marked region(s) within the section were scanned through 63X or 100X and at least five images per representative region captured (each image was a compressed stack of 12 or 15 z-section images taken at 0.5 micron intervals). Signal counts were performed on the captured images and a minimum of 30 discrete nuclei scored. Within each section, normal regions/stromal elements served as the internal control to assess quality of hybridization. A case was considered positive for translocation if >15% cells showed at least one fusion signal with or without a separate red and green signal representing the normal copy/allele. Amplification was defined as >10 copies or at least one small cluster of gene/locus (≥4 copies; resulting from tandem duplication/repeats). In cells with high-level amplification (HSR-type/tandem repeats or DM), signals beyond 20 could not be accurately counted and were therefore given a score of 20. Cells with 3-5 copies and 6-10 copies were considered to be polysomic and high-polysomic respectively.

Cell culturing
The HSG and TCG580 cell lines were obtained from O. Baker and G. Stenman, respectively. Short

Proliferation experiments
For growth curve assays, 20,000 cells were seeded in triplicate in a 12-well plate and counted at different time points using the Vi-Cell XR Cell Viability Analyzer (Beckman Coulter, Brea, CA, USA).

Migration experiments
For migration analysis, 30,000 cells were seeded in each well on an xCELLigence CIM-plate 16.
Migration index was analyzed hourly for 24 h using an xCELLigence RTCA DP instrument (ACEA Biosciences, San Diego, CA, USA).

Soft-agar colony formation assay
A bottom layer of agar was created by adding 1.5 ml of a 1:1 mix of melted noble agar (1%) and 2x concentrated DMEM with 10% FBS in 6-well plates. After 30 minutes incubation at room temperature, 5,000 cells were seeded in triplicates in a 1:1 mix of melted noble agar (0.6%) and 2x concentrated DMEM with 10% FBS. Plates were incubated at 37 °C for 3 weeks, and colonies certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint were quantified with a GelCount TM colony counter (Oxford Optronix, Abingdon, UK). For ALK inhibitor experiments, crizotinib or lorlatinib was added to the 2x concentrated media at double the final concentration before mixing it 1:1 with 0.6% noble agar.

Drug sensitivity assay
For ALK inhibitor sensitivity experiments, 20,000 cells were seeded in triplicates on a 12-well plate. After 24 hours, cells were treated with indicated doses of crizotinib (Sigma-Aldrich), lorlatinib (MedChem Express, Monmouth Junction, NJ, USA), or DMSO (Sigma-Aldrich) for 72 hours and then quantified using the Vi-Cell XR Cell Viability Analyzer (Beckman Coulter).

Mutation spectrum of MECA
To investigate the mutational landscape of MECA, we initially performed whole-exome sequencing of DNA from snap-frozen tumor and matched normal tissue of 12 patients (cohort 1).
The mean coverage was 140x for tumor DNA and 78x for normal DNA, and 97% of the target sequence was covered to at least 20x depth ( Supplementary Fig. 3). Targeted resequencing showed a high validation rate (98%) of the detected mutations (see Methods). RNA sequencing in the 12 cases resulted in an average of 132M reads per sample, of which 90% were aligned to the reference sequence with high quality (Supplementary Fig. 4). Twenty-eight additional formalin-fixed paraffin-embedded (FFPE) tumors (cohort 2) were analyzed with RNA sequencing with an average 138M reads per sample, of which 74% were aligned to the reference sequence with high quality ( Supplementary Fig. 5). See Supplementary Table 3 for clinical information.
We detected a median of 16 non-silent somatic mutations per tumor, corresponding to 0.5 mutations per megabase (Supplementary Fig. 6 and Supplementary Table 4). This mutational load is similar to that of salivary gland adenoid cystic carcinoma (0.3/MB) (35), and is lower than that of salivary duct carcinoma (1.7/MB) and most other adult solid cancers (20,36). Despite a low mutation count, MECA tumors showed evidence of intratumoral genetic heterogeneity, with the majority of sequenced tumors harboring at least 1 subclonal population ( Supplementary Fig. 7).
One tumor had a hypermutated genotype, with 1,116 somatic mutations and a transition/transversion ratio of 8.1. This was in contrast to the other cases, which had a mean certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint transition/transversion ratio of 1.3. Three patients had hotspot mutations in genes involved in RAS signaling: PIK3CA H1047L/R in two cases, and a combination of HRAS Q61R and PIK3CA K111E in one case (Fig. 1, and Supplementary Tables 4 and 5). These mutations appeared to be clonal events with high cancer cell fraction, consistent with early events in tumor development ( Supplementary   Fig. 7). Forty-eight percent of the somatic point mutations found in DNA were also detected by RNA sequencing (Supplementary Fig. 8A), indicating that they are expressed. The percentage of mutations detected by RNA sequencing was higher in cancer genes, and was independent of mutational load ( Supplementary Fig. 8A-B). This is in line with previous findings in head and neck squamous cell carcinoma and salivary duct carcinoma (20,37). certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Fusion genes
We detected fusion genes in 28 of 40 (70%) tumors ( Fig. 1 and Supplementary Table 6); a majority of these were confirmed by RT-PCR (Supplementary Fig. 9) and/or FISH analysis ( Supplementary   Fig. 10 and 11). Fusion-positive cases had a lower mutation rate than fusion negative ones (median: 12 versus 40 mutations/tumor; P = 0.0020; Mann-Whitney test; Supp Fig 6), and no mutations in cancer driver genes were found in fusion-positive tumors. The identified fusion genes were mutually exclusive with one another, consistent with a role as potential drivers of oncogenesis.
These findings add MECA to the list of salivary malignancies that are known to harbor fusions, including adenoid cystic, mucoepidermoid, secretory, and hyalinizing clear cell carcinomas (38).

PLAG1 translocations
Half of MECA cases (21/40; 53%) harbored rearrangements involving the PLAG1 oncogene, which was found in both MECA de novo and MECA ex-PA ( Fig. 1 and Supplementary Table 6). The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint Table 7). The striking, specific enrichment of FGFR1-PLAG1 in MECA ex-PA raises the possibility that this fusion in PA may predispose to malignant transformation, particularly along the MECA lineage.
The genomic breakpoints were generally located between the promoter and the transcription start site of both FGFR1 and PLAG1, potentially leading to expression of a full-length PLAG1 protein regulated by the FGFR1 promoter ( Fig. 2A). Indeed, the FGFR1-PLAG1 positive tumors showed high expression of PLAG1 but low levels of FGFR1 (Fig. 2B). IHC analysis showed positive nuclear PLAG1 staining in tumors positive for FGFR1-PLAG1 ( Supplementary Fig. 12).
In PA, the FGFR1-PLAG1 fusions are generated by a ring chromosome, consisting of a small centromeric portion of chromosome 8 (42). We performed array CGH analysis and found a distinct amplification of chromosome 8 between the FGFR1 and PLAG1 loci, consistent with ring chromosome formation, r(8)(p12q12.1), in all 7 MECA ex-PAs positive for FGFR1-PLAG1 ( Supplementary Fig. 13). To date, the FGFR1-PLAG1 fusion has not been detected without ring formation.

Detection of a novel, recurrent TGFBR3-PLAG1 fusion
In 6 of 40 (15%) tumors, we detected a TGFBR3-PLAG1 fusion gene (Fig. 1), which has not been previously reported in cancer. We found no evidence of TGFBR3-PLAG1 or t(1;8) translocations in 442 PAs investigated by karyotyping, or in 281 published and unpublished cases of salivary gland tumors other than MECAs (Supplementary Table 8). Thus, available data indicate that the TGFBR3-PLAG1 fusion appears to be specific for MECA. Similar to what was noted for FGFR1-PLAG1, the genomic breakpoints in the TGFBR3-PLAG1 fusion were located before or just after the transcriptional start site of both TGFBR3 and PLAG1 (Fig. 2C). This could potentially lead to promoter swapping, and expression of full-length (or near full-length) PLAG1 and TGFBR3 proteins. As with FGFR1-PLAG1, PLAG1 was significantly overexpressed in TGFBR3-PLAG1 positive tumors (Fig. 2D), and IHC analysis showed strongly positive nuclear PLAG1 staining ( Supplementary Fig. 12). Notably, unlike other PLAG1 rearrangements, tumors with TGFBR3-PLAG1 showed overexpression also of the 5' fusion partner, TGFBR3 (Fig. 2D). IHC analysis confirmed high levels of TGFBR3 in TGFBR3-PLAG1 positive tumors (Fig. 2E). certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint Four of the 6 TGFBR3-PLAG1 positive cases were MECA de novo tumors (Fig. 1), and showed uniform and strong TGFBR3 staining. In one TGFBR3-PLAG1 positive MECA ex PA, IHC for TGFBR3 was uniformly positive in the carcinoma component, but negative in the PA component ( Supplementary Fig. 14A). In the other MECA ex-PA case, TGFBR3-PLAG1 was detected only in limited focal areas of the carcinoma component. The fusion-positive areas were the only parts of the tumor that were positive for TGFBR3 (Supplementary Fig. 14B). Together, these results suggest that the TGFBR3-PLAG1 fusion gene causes overexpression of a functional TGFBR3 protein, likely contributing to the development of MECA, arising either in normal salivary gland tissue or a pre-existing PA.
We then asked whether high TGFBR3 expression is found in other types of salivary tumors, given that there was no evidence for the fusion in other salivary tumors. We examined 11 PAs, finding no cases with high TGFBR3 staining. In 23 non-MECA salivary cancers, however, we did identify high TGFBR3 staining in a subset of adenoid cystic carcinoma (ACC), epithelial myoepithelial carcinoma (EMC), and polymorphic low-grade adenocarcinoma (PLGA) samples ( Supplementary   Fig. 15). These data indicate that high TGFBR3 expression is found across multiple salivary cancer histologies, but not in benign adenomas, and that MECAs appear to achieve this overexpression via TGFBR3-PLAG1 fusion.
We performed unsupervised Poisson clustering on MECA transcriptome data within each cohort, and observed that the highly TGFBR3 positive cases clustered together in each cohort ( Fig. 2F and Supplementary Fig. 16). This indicates that TGFBR3 overexpression is associated with a distinct transcriptional program in MECAs. In contrast, the FGFR1-PLAG1 fusion did not cluster together in either cohort.
TGFBR3 can act as a tumor suppressor which acts negatively on TGF-ß signaling by binding to TGFBR1 and TGFBR2 separately, thereby inhibiting the TGFBR1/2 complex formation (43). On the other hand, overexpression of TGFBR3 has oncogenic potential in breast and colorectal cancer (44,45). We detected no significant difference in TGF-ß signaling activity between tumors with high and low TGFBR3 levels, as judged by expression of 84 key genes responsive to TGF-ß signal transduction ( Supplementary Fig. 17) (46). IHC analysis showed similar levels of the TGF-ß signaling effectors phosphorylated SMAD 2 and 3 in tumors with or without TGFBR3 overexpression (data not shown). In line with this, the TGF-ß signaling pathway activity was certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint similar between the groups based on Ingenuity pathway analysis (27), which instead revealed altered activity of several cell cycle regulation pathways in TGFBR3 overexpressing tumors ( Supplementary Fig. 18). These data suggest that the oncogenic effect of TGFBR3 overexpression in MECA is mediated by mechanisms independent of TGF-ß signaling, which is in line with previous findings in breast cancer (43).

Overexpression of TGFBR3 and PLAG1 causes transformation of human salivary gland cells
Overexpression of PLAG1 has been shown to have oncogenic potential, both in vitro and in vivo (47,48). However, the role of high TGFBR3 levels in salivary gland cancer has not been investigated. TGFBR3 has been described as having both oncogenic and tumor suppressive phenotypes, depending on context (44,49,50). To determine the effects of TGFBR3 and PLAG1 upregulation in salivary cells, we infected human salivary gland (HSG) cells with constructs leading to overexpression of PLAG1 and/or TGFBR3 at levels similar to those detected in TGFBR3-PLAG1 positive tumors (Fig. 3A). Overexpression of PLAG1 or TGFBR3 caused a similar increase in proliferation, whereas co-expression of TGFBR3 and PLAG1 led to a further increase suggesting an additive effect (Fig. 3B). Furthermore, overexpression of TGFBR3, but not of PLAG1, led to increased migration and soft agar colony formation in these cells (Fig. 3C-D).
We then examined the effect of TGFBR3 expression in non-malignant salivary cells. We  Fig. 20). Taken together, these data from malignant and non-malignant salivary cell lines, suggest that the TGFBR3-PLAG1 fusion, by promoting overexpression of each gene, is tumorigenic.

ND4-PLAG1, a result of fusion between mitochondrial and nuclear DNA
One case of MECA ex-PA harbored the novel fusion gene ND4-PLAG1 (Fig. 4A). ND4 is a mitochondrial gene that encodes the protein NADH-ubiquinone oxidoreductase chain 4. In this fusion, the 3' part of PLAG1, including the full coding region of the gene, was fused with the certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint inverted 3' end of ND4. The rearrangement was detected by RNA sequencing (Supplementary Fig.   21), and was confirmed by PCR and Sanger sequencing of purified PCR products from genomic DNA and cDNA ( Fig. 4B-C, and Supplementary Fig. 22A). By using PCR primers with increasing distance to the breakpoint, we determined that the size of the mitochondrial insertion was at least 591 base pairs, including not only ND4 but also 3 downstream genes encoding transfer RNA ( Supplementary Fig. 22B). Larger PCR products were not detected, which may reflect limited size of the mitochondrial DNA insertion, or may be due to degradation of the DNA as it was extracted
We also detected the novel fusion genes ACTA2-PLAG1, GEM-PLAG1, and NCALD-PLAG1 ( Supplementary Fig. 23B-D). In all these cases, the breakpoints were located between the promoter and the transcriptional start site of both PLAG1 and the 5' fusion partner, potentially leading to ectopic expression of full-length PLAG1 under control of the GEM, ACTA2, or NCALD promoter, respectively.
One tumor harbored NKTR-PLAG1, where PLAG1 was fused with intron 10 of the inverted NKTR gene (Supplementary Fig. 23E). In another case, PLAG1 was fused with an intergenic region located only 600 kb away from PLAG1, likely resulting from a deletion in the chromosome region 8q12. Similar to the ND4-PLAG1 fusion gene, these rearrangements are not likely to cause promoter swapping. Instead, they might mediate overexpression of PLAG1 by locating regulatory certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint elements such as enhancer regions upstream of the gene. In 2 tumors, RNA sequencing showed evidence of a PLAG1 fusion gene, but we were not able to determine the fusion partner.

Activation of effector pathways in tumors with PLAG1 rearrangements
All tumors with a PLAG1 rearrangement showed overexpression of PLAG1, and FGFR1-PLAG1 positive cases tended to have the highest PLAG1 levels (Supplementary Fig. 24A). The other PLAG1 family members, PLAGL1 and PLAGL2, were equally expressed in PLAG1 fusion positive and -negative cases (Supplementary Fig. 24B). The oncogenic effect of PLAG1 overexpression has been hypothesized to be mediated by different mechanisms, such as activation of insulin-like growth factor 2 (IGF2) and possibly also Wnt signaling (47,53). We found that PLAG1 rearrangements were associated with increased expression of IGF2, and of the other putative PLAG1 effector genes CRABBP2 and CRLF1 (Supplementary Fig. 24C). On the other hand, Wnt signaling, estimated by analyzing the expression of 5 known effector genes (54), did not appear to differ between PLAG1 fusion positive and -negative cases (Supplementary Fig. 24D). This could potentially suggest that IGF2, but not Wnt, may be involved in the oncogenic effects of PLAG1 fusion genes in MECA.

HMGA2 rearrangements
HMGA2 encodes a small chromatin-binding protein that mediates transcriptional activation of several oncogenic functions, including inhibition of DNA repair and p53-induced apoptosis (55).
We detected HMGA2 rearrangements in 1 MECA de novo and 3 MECA ex-PA tumors, which all showed overexpression of HMGA2. The genomic breakpoints were located within the 3' untranslated region (UTR) of HMGA2, which was fused with intergenic regions from chromosomes 3, 6, or 12 ( Supplementary Fig. 24A-C). The finding that HMGA2 fuses with apparently random genomic locations suggests that the fusion partner of HMGA2 may be irrelevant in MECA. This is in line with the hypothesis that loss of target sites for the negatively regulating miRNA let-7 leads to overexpression of full-length HMGA2 (59,60). Of note, 2 of the 4 tumors certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint with HMGA2 rearrangements also harbored an FGFR2 M186T mutation, raising the possibility that these events may cooperate in tumor development.

EWSR1-ATF1 in MECA de novo tumors
Fusions of the 5' part of EWS RNA binding protein 1 (EWSR1) and the 3' part of activating transcription factor 1 (ATF1) are found in a majority of salivary gland hyalinizing clear-cell carcinomas (61), and were also reported in soft tissue myoepithelial tumors (62). We detected EWSR1-ATF1 in 2 de novo tumors ( Supplementary Fig. 26A), and re-review by a salivary gland pathologist (N.K) revealed typical MECA morphology as well as clear cell features in both of these cases (Supplementary Fig. 26B). They were therefore considered clear cell MECA, a variant that has been described previously (63).  5B). These findings were confirmed using HEK cells ( Supplementary Fig. 28A-C). Unlike overexpression of MSN, MSN-ALK also stimulated proliferation and soft-agar colony formation in HSG cells (Fig. 5C-D). Treatment with the ALK inhibitors crizotinib (100nM) or lorlatinib (100nM) reduced the colony formation of cells expressing MSN-ALK to baseline levels (Fig. 5D).

Characterization of MSN-ALK
Together, these results show that MSN-ALK is an activating and oncogenic fusion gene, and that MSN-ALK positive tumors may potentially respond to ALK inhibitor treatment. certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Genetic alterations leading to potential neoantigen formation
Neoantigens are novel protein sequences resulting from genetic alterations in cancer cells which, if able to bind to MHC molecules on the surface of antigen presenting cells, may then activate cytotoxic T lymphocytes. Tumors with a high mutational burden tend to be more likely to respond to checkpoint blockade immunotherapy (68,69), an effect probably mediated by a higher number of neoantigens in these tumors (70). MECAs have low mutational load. As expected, we identified a low number of mutation-associated neoantigens predicted to bind to MHC with high affinity (median 2 per tumor; Supplementary Fig. 29). We then investigated the potential of fusion genes certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint to generate neoepitopes. Whereas no potential neoantigens were found in the FGFR1-PLAG1 positive tumors, the TGFBR3-PLAG1 fusions resulted in several neoepitopes with predicted high affinity binding to MHC molecules (Supplementary Table 9). These findings are based on in silico predictions only and should be interpreted with caution, but indicate that, even in the presence of low mutation count, recurrent gene fusions generate neoepitopes that may have relevance for neoantigen-targeted immunotherapy.

Copy number alterations
Array CGH analysis revealed a higher number of CNAs in MECA ex-PAs compared to MECA de novo tumors (p = 0.0036; Fig. 6A Table 11).
Some of the tumors showed a recurrent pattern of CNAs. For example, 4 tumors positive for FGFR1-PLAG1 all had chromosome 1p deletion, 1q gain, 6q deletion, 8q amplification, and 11p deletion, which may suggest combinatorial oncogenic effects of these alterations (Fig. 6A).
Chromosome 6q deletions have been reported in several types of salivary cancer (71), and were found in 8 (20%) of the MECAs. The detection of 6q deletions was associated with a higher total number of CNAs (mean 11.5 vs 6.1 CNAs per tumor, P = 0.0029, Mann Whitney test), which is in line with previous findings in ACC (72). The PLAG Like 1 (PLAGL1) transcription factor has been suggested as a putative tumor suppressor gene that may mediate the oncogenic effect of chromosome 6q deletions (73), but previous studies did not show reduced PLAGL1 expression in ACCs with 6q translocations (74,75). We detected low PLAGL1 expression in 4 of 8 tumors with 6q deletion (Supplementary Fig. 30A), but the significance of PLAGL1 in MECA remains unknown.
Amplification of 8q has been previously reported in MECA but is not prevalent in benign myoepitheliomas (76). We detected 8q amplifications/gains in all FGFR1-PLAG1 positive tumors and in a majority of MECA ex-PA tumors with other PLAG1 fusion genes (Fig. 6A). As r (8) amplification in FGFR1-PLAG1 positive tumors only include a small portion of the q arm, we hypothesize that gains of 8q may mediate additional oncogenic effects in tumors with r (8) formation. certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint Homozygous or heterozygous loss of the 9p21-22 region, including the CDKN2A tumor suppressor gene, were reported in soft tissue myoepithelial tumors (77). We found localized homozygous loss of the CDKN2A region in 2 cases, with only 13 and 37 genes included in the deletions, respectively (Supplementary Table 11). These tumors also showed low expression of CDKN2A, unlike 4 other cases that harbored heterozygous loss of the entire chromosome 9 (Supplementary Fig. 30B). Thus, certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Interestingly, no CNAs were found in TGFBR3-PLAG1 positive tumors. Since this fusion gene was detected only in MECA de novo cases or in the malignant component of MECA ex-PA tumors, this mutual exclusivity suggests that TGFBR3-PLAG1 is a potent oncogenic event that leads to carcinoma development even in the absence of CNAs. Conversely, other PLAG1 fusion genes were more commonly found in MECA ex-PA tumors and were associated with a high number of CNAs. We hypothesize that these PLAG1 fusions cause PA, and that the transformation into MECA ex-PA may be mediated by CNAs in these cases.

Clinical correlates
We investigated the prognostic role of different genetic alterations in MECA. Most notably, increasing tumor aneuploidy was associated with poorer prognosis. Patients with 10 or more CNAs had a median recurrence-free time of 17 months, and all these tumors resulted in recurrence within 6 years. On the other hand, none of the tumors without significant CNAs had recurred after a median follow-up time of 5 (range 1-15) years (hazard ratio >9 CNAs versus no CNAs=20.6, Log-rank test), and experienced recurrence in around 50% of the cases (Fig. 6C). This indicates that genomic instability, as measured by a high number of CNAs, confers a poor prognosis in MECA.

Conclusions
We performed a comprehensive analysis of the genome and transcriptome of MECA, an understudied, lethal salivary cancer. We found that MECA exhibits a molecular profile similar to certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint certain soft tissue tumors and sarcomas (78), in that they are defined by a low mutational load and frequent oncogenic gene fusions, and/or widespread arm-level CNAs acting as the main driving events (e.g., similar to pleomorphic sarcomas). Despite these similarities with soft tissue tumors, MECA appears molecularly distinct from myoepithelial tumors of soft tissue, which are morphologically similar tumors. Most notably, whereas EWSR1 translocations are frequently detected in soft tissue myoepithelial tumors (79,80), PLAG1 translocations are more frequent in MECA.
Cancer driver gene mutations (e.g. PIK3CA, HRAS) were identified in only a minority of tumors, and the average mutational load was low at 0.5/MB. The majority (70%) of MECAs harbored fusion genes, many of which appear to be tumorigenic, either through upregulation of PLAG1 and On the other hand, FGFR1-PLAG1 fusions were only found in MECA ex-PAs. This fusion exists in a small proportion of benign PAs, which may then undergo malignant transformation in the context of the fusion combined with genomic instability manifested as complex copy number alteration (Fig 7). These tumors are associated with a poor prognosis, which may be attributed to the higher degree of genomic instability.
The findings of this study may have an impact on the clinical management of MECA. Diagnosing MECA remains challenging because of the morphological similarities with other types of SGCs.
As TGFBR3-PLAG1 has not been reported in other tumor types, this fusion may be a potential molecular marker of MECA. In addition, several of the genetic events are potentially targetable, for example by inhibiting downstream effectors of PLAG1 or targeting ALK. While low tumor mutational load is anticipated to be associated with lower degrees of response to immune checkpoint inhibitors (68,69,81), recurrent fusion genes that result in novel transcripts may result in immunogenic neoepitopes that would represent attractive targets for neoantigen-based vaccines.
Finally, the correlation between CNAs and poorer prognosis may be of value for future clinical decisions regarding treatment and monitoring of MECA patients. certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted June 9, 2017. ; https://doi.org/10.1101/148072 doi: bioRxiv preprint  certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.