Comparative transcriptional profiling of canine acanthomatous ameloblastoma and homology with human ameloblastoma

Ameloblastomas are odontogenic tumors that are rare in people but have a relatively high prevalence in dogs. Because canine acanthomatous ameloblastomas (CAA) have clinicopathologic and molecular features in common with human ameloblastomas (AM), spontaneous CAA can serve as a useful translational model of disease. However, the molecular basis of CAA and how it compares to AM are incompletely understood. In this study, we compared the global genomic expression profile of CAA with AM and evaluated its dental origin by using a bulk RNA-seq approach. For these studies, healthy gingiva and canine oral squamous cell carcinoma served as controls. We found that aberrant RAS signaling, and activation of the epithelial-to-mesenchymal transition cellular program are involved in the pathogenesis of CAA, and that CAA is enriched with genes known to be upregulated in AM including those expressed during the early stages of tooth development, suggesting a high level of molecular homology. These results support the model that domestic dogs with spontaneous CAA have potential for pre-clinical assessment of targeted therapeutic modalities against AM.

. Case description and assays performed. *Results are shown as a percentage of the area of each tumor exhibiting a staining pattern consistent with EMT, as described in the text. **Symbols denote the intensity of the signal observed as follows: weak (+), intermediate (++), or strong (+++). Scores of IHC results were not compared due to limited number of cases available for staining. Abbreviations: CAA canine acanthomatous ameloblastoma, HGIN healthy gingiva, OSCC canine oral squamous cell carcinoma, EMT epithelial-tomesenchymal transition, WT wild type, ND not determined. www.nature.com/scientificreports/ with RNA-seq. An additional 12 samples were used to complement RNA-seq data validation assays including reverse transcriptase quantitative PCR (qPCR) and immunohistochemistry (IHC) assays. The mutational profile of 28 of the 37 samples had previously been reported using a candidate gene approach 16 Tables S2-4). Principal component analysis and unsupervised hierarchical cluster analysis were performed using independent algorithms with all samples clustering according to tumor/tissue type (Fig. 2). To validate the RNA-seq profiles, 11 genes differentially expressed in at least one of the three pairwise comparisons (i.e., CAA vs HGIN, COSCC vs HGIN, and CAA vs COSCC) were selected for qPCR. Overall, excellent agreement between RNA-seq and qPCR results was found (Fig. 3A, Supplemental Figure S5), including extending the observation of altered gene expression for samples outside the RNA-seq cohort. For biological validation of ODAM, one of the differentially overexpressed genes in CAA, expression of the ODAM protein was assessed by IHC in two CAA samples. We found cytoplasmic and nuclear expression of ODAM in CAA tumor cells (Fig. 3B, Supplemental Figure S6). Moreover, ODAM expression was multifocal and without segregation to basal or suprabasilar neoplastic epithelial cells. Although protein expression was not confirmed with IHC in HGIN or COSCC samples, RNA-seq and qPCR results also showed ODAM to be relatively highly expressed in 1 of the 5 HGIN samples, and in 2 of the 8 COSCC samples (Supplemental Figure S5), demonstrating that its expression is not exclusive to CAA, as would be expected based on previous studies [26][27][28] .

Functional enrichment analyses.
To gain mechanistic insights, Gene Set Enrichment Analysis (GSEA) 29 and Ingenuity Pathway Analysis (IPA, QIAGEN Inc.) software were applied to all three pairwise comparisons. The MSigDB Hallmark, C2 Canonical Pathways and C6 Oncogenic Signature gene set collections were used for GSEA 30 , while IPA was used to identify biological pathways and molecules predicted to be activated (Fig. 4 www.nature.com/scientificreports/ strate downstream MAPK pathway activation and EMT reprogramming of tumor cells, respectively. In the test samples, immunoreactivity for pERK1/2 was multifocal to diffuse with variable nuclear and cytoplasmic expression in neoplastic cells and to a lesser degree, in stromal cells, in some cases (Fig. 5A, B). Loss of CK expression (Fig. 5C) and gain of VIM expression (Fig. 5D) in tumor cells localized along the basal cell layers at the periphery of the islands and cords of neoplastic epithelium, consistent with EMT at the invasive fronts of the neoplasm.
Molecular homology with AM and dental identity of CAA . To investigate the extent of molecular homology between CAA and AM, GSEA was performed with custom gene sets derived from previously published sets of genes altered in AM when compared to healthy human gingiva 31 . CAA had significant enrichment of corresponding genes known to be upregulated in AM, with most qualifying as leading-edge including PTHLH, FGFR1, MMP1, MMP2, MMP13, and COL8A1 (Fig. 6A, Supplemental Table S11). Similarly, a negative enrichment score was observed for genes known to be downregulated in AM.
To determine whether CAA tumors overexpress genes known to be active during specific stages of tooth development, GSEA was performed with custom gene sets associated with the initiation, bud, cap, bell, late bell, secretory, and root development stages, as well as a pool of all 303 known dental genes (http:// bite-it. helsi nki. fi; Fig. 6B, Supplemental Table S12) 32 . When compared to HGIN and COSCC, CAA exhibited a strong dental identity with significant enrichment of genes associated with the cap and bud stages of tooth development. Some of the related genes noted to be overexpressed in CAA compared to HGIN were PITX2, MSX2, and RUNX1.

Discussion
To further understand the molecular pathogenesis of CAA and characterize the extent of homology with AM, we explored the expression profile of CAA using a bulk RNA-seq approach and validated selected proteins by IHC. We found that the expression profile of CAA is dominated by signatures of aberrant RAS signaling and involves activation of the EMT cellular program. Moreover, CAA exhibits a strong dental identity and is enriched with genes known to be upregulated in AM [31][32][33] . The data suggest a high level of molecular oncogenic program homology of CAA and AM. The prominent role of RAS signaling in CAA was characterized by enrichment of genes known to be upregulated by oncogenic KRAS alleles, and by the predicted activation of canonical and non-canonical MAPK pathways including ERK1/2, p38 MAPK, and JNK 34 . The predicted activation of the PI3K-AKT pathway also coincides www.nature.com/scientificreports/ with this trend given that aberrant RAS signaling frequently underlies dysregulation of this pathway 35,36 . The finding that RAS signaling contributes to CAA is consistent with its mutational landscape, which includes highly recurrent HRAS p.Q61R somatic mutations 5,16 . Given that HRAS p.Q61R is one of several mutations that can constitutively activate RAS 36,37 , the results of the present and previous studies 13,16 suggest that this hotspot mutation acts as a driver in CAA. Interestingly, although not found among the most highly enriched pathways in this study, canonical WNT signaling has been previously documented in CAA using IHC 38 . Such findings  www.nature.com/scientificreports/ would be expected in the context of our results given the close interactions that occur between the WNT and RAS-RAF-MAPK pathways 39 . Activation of the EMT program in CAA correlates with overexpression of ZEB1 and ZEB2, which are well known transcription factors that regulate this cellular process 40 . Similarly, the pattern of upregulation of mesenchymal markers (e.g., VIM), extracellular matrix proteins (e.g., COL8A1, FN1), and metalloproteinases (e.g., MMP1, MMP2, MMP13), and downregulation of cytokeratins (e.g., KRT3, KRT10) are typical features of EMT 41 . It is well established that EMT is a dynamic and reversible process that confers neoplastic cells the phenotypic plasticity required for invasion and migration 40,41 . Therefore, these findings were not surprising considering the locally invasive nature of CAA. Additionally, results align with previous studies of AM [42][43][44] , further suggesting similarities in molecular oncogenesis with CAA. Likewise, given that ERK1/2, TGF beta and TNF are among the pathways that can activate EMT 41,45 , these results underscore the impact and complexity of the cellular reprograming events that take place when aberrant or dysregulated signaling ensues.
The remarkable similar behavior of CAA genes that are known to be up-and down-regulated in AM 31 demonstrate that canine and human ameloblastomas exhibit a high degree of transcriptional homology. Unsurprisingly, the functional enrichment patterns observed in CAA were comparable to those reported in AM, including notable enrichment of some of the same oncogenic gene sets 31 . Similarly, the strong dental identity observed in CAA is consistent with the reported expression profile of AM 32 , including upregulation of PITX2 and other transcription factors known to regulate the early stages of tooth development 46,47 . Together with the known clinical, radiographic, histological, and mutational similarities 5,16,22 , these findings support leveraging companion dogs with natural disease as a pre-clinical model of AM. Dogs with naturally-occurring CAA would allow rapid investigations into alternative standalone or combined therapeutic modalities that could complement or substitute traditional surgical approaches. Indeed, aside from RAS-RAF-MAPK signaling, several of the other pathways and molecules identified may represent potential therapeutic targets, including PI3K-AKT, metalloproteinases, and VEGF [48][49][50] . Additionally, the current dataset could be used to investigate other therapeutic strategies, www.nature.com/scientificreports/ including those related to tumor metabolism. For example, the glutaminase regulators JUN and NFKB 51,52 were both predicted to be active; given that glutaminase is a key metabolic enzyme in many types of tumors [51][52][53] , and has been tied to EMT in several contexts [54][55][56][57] , this could justify exploring glutaminase inhibitors as a potential targeted approach for the treatment of CAA.
In this study, we elected to also compare the expression profiles of CAA and COSCC given that they represent the two most common oral tumors of epithelial cell origin in dogs, and because RAS-RAF-MAPK activating mutations occur at detectable frequencies in both tumor types making any transcriptional comparisons potentially insightful [3][4][5]16,58 . As would be expected based on fundamental differences in biologic behavior 5 , sample clustering analyses showed that CAA and COSCC represent molecularly distinct oncogenic processes. Of note, the expression trends observed in COSCC appeared comparable to those previously reported 59,60 , further confirming the validity and reproducibility of the RNA-seq dataset. Functional analyses showed widespread similarity between CAA and COSCC compared to HGIN, with COSCC typically showing stronger pathway activation. The differences may point to the important clinical and morphological distinctions between the tumor types including benign versus malignant behavior. Some representative differences between the gene expression profiles of the two tumor types included relative enrichment of genes associated with hypoxia in COSCC. This finding might be explained by the lower proliferation activity, and thus, slower tumor growth found in CAA when compared to COSCC 5 , which likely account for the relatively noticeable infrequent ischemic necrosis in CAA. Another notable functional difference was a more profound predicted loss of PTEN signaling in COSCC compared to CAA. Since PTEN signaling regulates the PI3K pathway 61 , this finding coincides with the higher predicted activation score of PI3K-AKT activity in COSCC. Also, considering the high frequency of PTEN and PIK3CA mutations in human OSCC 62,63 , such findings justify investigating the mutational status of these pathways in COSCC and CAA.
Apart from functional insights, our RNA-seq data also revealed interesting patterns of expression of a few individual genes. For example, coinciding with human tumors, SPP1 was significantly overexpressed in CAA when compared to HGIN. Osteopontin (OPN), encoded by SPP1, is a multifunctional sialoprotein involved with many important biological processes including cell survival via integrin signaling and PI3K-AKT activation, and EMT regulation [64][65][66] . Not surprisingly, SPP1 was differentially overexpressed in COSCC compared to CAA, underscoring known differences in biologic behavior 5 . In line with several studies that have investigated Figure 6. Homology of CAA with AM and dental identity. Panel (A) shows a heat map depicting the relative expression of the top 50% leading-edge genes revealed by custom GSEA testing for enrichment of a set of genes known to be upregulated in AM 31 . The heat map shows clear differences between CAA and control samples, and to a much lesser extent, variation among CAA samples. Panel (B) shows a scatterplot depicting normalized enrichment scores of custom gene sets of genes known to be involved during different stages of tooth development analyzed with GSEA for CAA relative to HGIN and COSCC relative to HGIN. Red points correspond to gene sets that were significantly more enriched in a separate GSEA analysis of CAA compared to COSCC, and blue points to gene sets significantly more enriched in COSCC compared to CAA. Abbreviations: CAA canine acanthomatous ameloblastoma, COSCC canine oral squamous cell carcinoma.  65,67,68 , these findings might support the use of SPP1 or OPN expression as prognostic biomarkers in CAA and COSCC. Another interesting differentially expressed gene was ODAM, which encodes a secreted protein normally produced by mature ameloblasts during the secretory stage of tooth development, as well as by junctional epithelium associated with erupted teeth [26][27][28] . In this context, the overexpression of ODAM in all but one of the CAA tumors is consistent with its dental origin. However, ODAM has also been shown to be expressed in several types of cancer in people including gastric, lung and breast tumors 28 , so its expression in CAA could also be secondary to other cellular reprogramming events unrelated to the enrichment of genes associated with tooth development. Regardless, the differential expression pattern of ODAM in CAA compared to COSCC might justify investigating it as a biomarker.
It should be noted that, despite clustering together, variation was observed in the expression profiles of samples within each group. This was expected given that samples were obtained from different anatomical sites from individuals of different demographic characteristics, and during potentially different stages of disease. Additionally, given that bulk RNA-seq lacks single cell resolution 69 , and that tissue and tumor samples are an heterogeneous collection of cell types whose exact relative content cannot be controlled, variation in the expression profile is inevitable. This is further compounded by the molecular complexity and dynamism, and genetic diversity of neoplasms in general.
We conclude that the molecular pathogenesis of CAA is largely dominated by aberrant RAS signaling, and involves activation of pathways and cellular programs known to play a role in tumorigenesis including canonical and non-canonical MAPK, PI3K-AKT, EMT, TGFB, and VEGF signaling, and altered ECM production; and that it exhibits a very high degree of homology with AM at the transcriptional level including a strong dental identity.

Methods
Clinical samples. The FFPE tissues used in this study were archived by the Anatomic Pathology Section at Cornell University's College of Veterinary Medicine. Histological assessment of tissues was done using routine H&E-stained samples by a board-certified veterinary pathologist (GED); tumors were diagnosed following previously described criteria while blinded to molecular assays 70,71 . Cryopreserved samples were collected during standard-of-care surgical procedures and stored by the Cornell Veterinary Biobank until retrieved for analysis. Sample collection and experimental procedures were performed in accordance with a protocol (#2005-0151) approved by Cornell University's Institutional Animal Care and Use Committee. Accordingly, informed consent to authorize the use of tissue samples and clinical data for research purpose was obtained from dog owners prior to sample collection, and undue harm was never inflicted to client-owned dogs for the purposes of this study; all methods were performed in accordance with the relevant guidelines and regulations.
RNA isolation, library preparation and sequencing. RNA was isolated and poly-A RNA-seq libraries were generated as previously described 16 . Briefly, cryopreserved tissues were homogenized in Trizol (Thermo Fisher) using a bead mill without thawing. RNA was isolated following the manufacturer's protocol with the following modifications: an extra chloroform extraction was added prior to precipitation, 1 μL glycoblue (Thermo Fisher) was added immediately prior to precipitation, and the RNA pellet was washed twice with 75% ethanol. RNA concentration was measured with a Nanodrop (Thermo Fisher) and integrity determined with a Fragment Analyzer (Advanced Analytical). PolyA + -enriched RNA-seq libraries were generated with the NEBNext Ultra II Directional library prep kit (New England Biolabs) using 250 ng input total RNA. Single-end 85 nt reads were generated on a NextSeq500 instrument (Illumina).
RNA-seq analysis. Raw reads were trimmed for low quality and adaptor sequences and filtered for minimum length with cutadapt software (parameters: -m 20 -q 20 -a AGA TCG GAA GAG CAC ACG TCT GAA CTC CAG TC-match-read-wildcards) 72 . Trimmed reads were mapped to the reference genome/transcriptome (Ensembl CanFam3) using tophat v2.0 (parameters-no-novel-juncs-library-type fr-firststrand-G < Ensembl_CanFam3_genes.gtf >) 73 . Differential gene expression was analyzed with cufflinks v2.2 74 ; additional filters included defining 'expressed' genes (minimum avg(FPKM) > 10 in at least one group) and 'stringent differentially expressed genes' ('expressed' genes with FDR < 0.05 and minimum twofold change between groups). Clustering analyses were generated in R v3.6.1 using variance-stabilized counts from DEseq2 v1.26.0 and the functions prcomp (principal components analysis) and hclust (hierarchical clustering with Euclidean distance and ward.D criteria). The canine gene symbols were converted to human gene symbols using Biomart (Ensembl) one-to-one orthology assignments for protein-coding genes to enable analysis of MSigDB gene sets with GSEA. GSEA pre-ranked analysis included the Hallmark, C2: Canonical Pathways, and C6: Oncogenic Signature gene sets using log2-fold change values for 'expressed' genes for each pairwise comparison with the classic enrichment statistic 75 . The heatmap of leading-edge genes was generated in R from row-normalized FPKM values with d3heatmap v0.6.25.Cuffdiff output files were converted to human gene identifiers and filtered for one-to-one orthologs (Ensembl Biomart) and min FPKM = 1, and uploaded to Ingenuity Pathway Analysis (Qiagen) for pathway analysis. qPCR validation. The levels of expression of a subset of differentially expressed genes were validated using real-time reverse transcription polymerase chain reaction (qPCR). cDNA was synthesized as previously described 16 . All cDNA reactions were diluted 20-fold with water prior to qPCR reaction setup. Primer pairs were designed with Primer-BLAST (NCBI), separated by an intron to minimize amplification of residual contaminating genomic DNA and allow identification of alternate amplicons with melt curve analysis. RPL13A was selected as the endogenous control gene, as this gene showed minimal variation across samples in the RNA-seq data 76  Immunohistochemistry (IHC). Selected FFPE tissue blocks from each case were processed for antigen retrieval and detection by using an automated IHC processor (Leica Bond-Max, Leica Biosystems, Buffalo Grove, Illinois, USA), as previously described 5 . Briefly, sections were dewaxed (cat# AR9222, Bond Dewax Solution, Leica) and processed for epitope retrieval (cat# AR9961 or AR9640, Bond Epitope Retrieval solution, Leica) followed by incubation with the primary antibody. For VIM, the mouse monoclonal IgG2a, kappa antivimentin, clone Vim 3B4 (cat# M7020, Agilent Dako, Santa Clara, California, USA) was used. For pERK1/2, a rabbit monoclonal (EPR19401) to ERK1 (phospho T202) and ERK2 (phospho T185) (cat# ab201015, Abcam, Cambridge, United Kingdom) was used. For CK, a mouse monoclonal anti-human clone AE1/AE3 antibody (cat# M3515, DakoCytomation, Carpinteria, California, USA) was used. Next, polymeric alkaline phosphatase conjugated anti-mouse IgG (cat# PV6110, PowervisionTM Poly-AP Anti-Mouse IgG, Leica) was applied followed by Red DetectionTM (cat# DS9390, Bond Refine Red Detection Kit, Leica), and hematoxylin counterstain. For ODAM, FFPE tissue sections were manually deparaffinized in xylene, rehydrated in graded ethanol, and subjected to antigen retrieval by steaming in citrate buffer (10 mM, pH6.0) for 20 min. Then, endogenous peroxidase activity was quenched with 0.3% hydrogen peroxide followed by incubation with rabbit polyclonal anti-ODAM IgG (cat# orb317695, Biorbyt Ltd., Cambridge, UK) diluted 1:300 overnight at 4 °C and 5 h at room temperature followed by incubation with ImmPRESS HRP Anti-Rabbit Ig (Peroxidase) Polymer Detection Kit (Vector Laboratories), Nova Red chromogen (Vector Laboratories), and hematoxylin counterstain. In each assay, a positive control reference tissue was included on the same slide as the sample. Negative controls consisted of a duplicate tissue section of each case incubated with an isotype-matched irrelevant primary antibody. Scoring of pERK1/2 was performed by a board-certified veterinary pathologist (ADM) by defining the following variables: distribution (absent, multifocal, or diffuse), pattern (cytoplasmic or nuclear), labeled cells (neoplastic or stromal), and intensity (none, weak, intermediate, or strong). Scoring of VIM and CK were performed by a board-certified veterinary pathologist (GED) using the same criteria.

Data availability
The gene expression data is available at the NCBI Gene Expression Omnibus (GEO) with accession number GSE175876. All other relevant data is included in the paper.   TTT GGT TCC AGG TAT CCC ATGAC 134   IL13RA2  TGG GTC ATC AGA ATC CCA GC  AGG TAG TCT GGT GGC ATA GG  87   KRT3  CAA GTG AAG ACC CAG GAG CG  TGT GAT GGA ATT TGT GCC TTGC  150   MSX2  CAG GAG CCC GGC AGA TAC TC  TCT CGG CTT CCG ATT GGT C  90   ODAM  AAG GCC AAG ACT GAT TAC TTA AAG G TGT TAG CAT CGA GGA ATC AAATG 124   PNOC  ATA TGC TGG TGT GGC TGG TACG  GAG CAG CAG GAG GTC ACA AAG  136   SFRP2  AGG ATG ACA ACG ACA TAA TGG AAA C GTC TTG CTC TTG GTC TCC AGG  121   SPP1  TGA CCC ATC TCA GAA GCA GAC  TCG TCA TGG CTT TCA TTG GAC  114   SULF1  TGC AAC CCA AGA CCT AAG GG  ACC TTC CCA TCC ATC CCA TAAC  93   TNFRSF11B  GCG ACA CAG CTC ACA AGA AC  ACG CTG TTC TCA CAC AGG TC  117   WIF1  GCC AAT GTC AAG AAG GCT GG  TTA AGT GAA GGC GTG TGT TGC  115   RPL13A* 76  GCC GGA AGG TTG TAG TCG T  GGA GGA AGG CCA GGT AAT TC  87