Angiogenesis and evading immune destruction are the main related transcriptomic characteristics to the invasive process of oral tongue cancer

Metastasis of head and neck tumors is responsible for a high mortality rate. Understanding its biochemistry may allow insights into tumorigenesis. To that end we carried out RNA-Seq analyses of 5 SCC9 derived oral cancer cell lines displaying increased invasive potential. Differentially expressed genes (DEGs) were annotated based on p-values and false discovery rate (q-values). All 292 KEGG pathways related to the human genome were compared in order to pinpoint the absolute and relative contributions to the invasive process considering the 8 hallmarks of cancer plus 2 new defined categories, as well as we made with our transcriptomic data. In terms of absolute contribution, the highest correlations were associated to the categories of evading immune destruction and energy metabolism and for relative contributions, angiogenesis and evading immune destruction. DEGs were distributed into each one of all possible modes of regulation, regarding up, down and continuum expression, along the 3 stages of metastatic progression. For p-values twenty-six genes were consistently present along the tumoral progression and 4 for q-values. Among the DEGs, we found 2 novel potentially informative metastatic markers: PIGG and SLC8B1. Furthermore, interactome analysis showed that MYH14, ANGPTL4, PPARD and ENPP1 are amenable to pharmacological interventions.

Tongue cancer differentially expressed genes represent almost all of the reported human KEGG pathways. In order to understand main changes on gene expression along with the metastatic OTSCC progression we used the software Cufflinks to map the reads of the known human genes. The relative abundance metric parameter FPKM (Fragments Per Kilobase of exon per Million reads sequenced) was used to represent the value of gene expression on each dataset analyzed. To detect DEGs, we applied the Student's t-test (p-values), and then, the FDR correction (q-values). The quality of the results depends on the amount of genes analyzed, which in turn is based on cut-offs and statistical significance 9 . This justified the use of both data, p-and q-values. We compared the expression of genes between the parental versus its derived cell line. Regarding the p-value, 9169 DEGs were found between SCC9 vs. ZsG; 11597 between ZsG vs. LN1, 5011 between LN1 vs. LN2, and 8572 between LN2 vs. LN3. Regarding the q-values, 6728 DEGs were found between SCC9 vs. ZsG; 9874 between ZsG vs. LN1, 284 between LN1 vs. LN2 and 5579 between LN2 vs. LN3 (Supplementary Table 2). Hereafter, only the identified DEGs will be discussed.
Expression ratios were obtained between parental and derived cell lines. Based on that, a ranking of all DEGs was produced (Supplementary Table 3). SCC9 cell line was used as the reference for ZsG transformed cells, however the following comparisons of transformed cell lines (TCLs) (ZsG vs. LN1, LN1 vs. LN2 and LN2 vs. LN3) will be discussed. Down and up regulated genes were identified for each comparison (Fig. 1A), and by matching those differentially expressed genes, we could classify them into 11-different clusters of gene expression (CoGE). These strategy allowed the identification of regulatory patterns among parental-derived changes on gene expression: (i) exclusive parental genes (FPKM > 0 in parental cell line and FPKM = 0 in derived cell line); (ii) exclusive derived genes (FPKM > 0 in derived cell line and FPKM = 0 in parental cell line); (iii) continuum (referring to those genes whose expression did not change significantly between parental and derived cell line displaying FPKM ratio 0.8 ≥ x ≥ 1.2. Differences of 0.1 are common even in technical replicates, but differences of 0.3 are considered as significant variations. With that in mind, the value of 1 ± 0.2 was set as the limit for the continuum cluster. A similar approach was used for human neoplasms 11 ); (iv) exclusive parental down regulated (FPKM ratio < 0.8); (v) exclusive parental up regulated (FPKM ratio > 1,2); (vi) exclusive derived down regulated (FPKM ratio < 0.8); (vii) exclusive derived up regulated (FPKM ratio > 1.2); (viii) common down regulated; (ix) common up regulated; (x) common parental down regulated and derived up regulated; and (xi) common parental up regulated and derived down regulated (Table 1 and Fig. 1B).
SCIeNtIFIC REPORTS | (2018) 8:2007 | DOI: 10.1038/s41598-017-19010-5 Using STRING 12 , we further categorized the DEGs for both p-values and q-values into protein coding genes (PCG) (Supplementary Tables 4A and 4B, p-and q-values, respectively) and non-coding genes (NCG), (Supplementary Tables 5A and 5B, p-and q-values, respectively). Also, STRING allowed the enrichment of the interactome of all differentially expressed PCG with KEGG pathways by assigning the molecular or biochemical pathways related to them (Table 1 and Supplementary Table 6). Supplementary Tables 6A (p-values) and 6B (qvalues) show the KEGG pathways and genes related to each hallmark of cancer, related to each CoGE. Table 1 reveals that there was no correlation between the number of PCG and KEGG pathways, for both p-and q-values. Concerning the KEGG pathways, the highest was 284 (corresponding to LN1 vs. LN2, exclusive parental down regulated, for both p-and q-values). This observation raised the question of how many KEGG pathways are related to the cDNA of the human genome. Using all the 35238 annotated genes, we followed the same procedures, obtaining 292 KEGG pathways related to 7858 PCG (Supplementary Table 6C). With this approach almost all the pathways related to the human genome were included in our analysis. However, two of these were not detected: D-arginine and D-ornithine metabolism, and fatty acid elongation in mitochondria. The same strategy was applied to all 3717 DEGs found in our analyses of q-values (Supplementary Table 6D).
KEGG pathways and protein-coding genes related to 'Energetic metabolism' and 'Invasion and metastasis' hallmarks were the most represented. All 292 KEGG pathways of the human genome were distributed according to the 8 hallmarks of cancer: (i) auto-sustained proliferative signaling, (ii) ability to evade growth suppressors, (iii) mechanisms to resist cell death, (iv) enable replicative immortality, (v) angiogenesis induction, (vi) invasion and metastasis capacity, (vii) shift of energy metabolism and (viii) evasion of immune destruction 10 , using the PubMed database by manual curation. Because (ix) other cancer types and (x) chronic diseases we related to 71 KEGG pathways, these were added to the 8 hallmarks of cancer. Energy metabolism was the hallmark with the highest number of KEGG pathways (123), followed by invasion and metastasis (77), chronic diseases (52), proliferative signaling (39), resisting cell death and evading immune destruction (37), angiogenesis (24), evading growth suppressors (22), other cancer types (21) and replicative immortality (18).
The characteristic chosen to compare all other hallmarks was invasion and metastasis. Of the 77 KEGG pathways related to invasion and metastasis, 25 were shared with energy metabolism, 22 with immune destruction evasion, 18 with proliferative signaling, 13 with cell death resistance and angiogenesis, 11 with growth suppression evasion and 3 with replicative immortality. Also, other cancer types and chronic diseases pathways were compared to those belonging to the invasion and metastasis hallmark. We found 1 and 0 shared pathways respectively, as depicted in Fig. 2A, upper left panel.
When the same comparison was carried out for p-values, a similar distribution was found except for an exclusive pathway of energy metabolism: 97 out of 98, when LN2 and LN3 were compared ( Fig. 2A, upper panel, in red). For q-values, 23 of 24 exclusive pathways were assigned to the category of resistance to cell death when LN2 and LN3 were compared. There was a decrease in energy metabolism exclusive pathways, for the comparisons LN1 vs. LN2 (97 out of 98) and LN2 vs. LN3 (95 out of 98); and 51 out of 52 chronic diseases pathways when LN2 vs. LN3 were compared (Fig. 2B, upper panel, in red).
In order to evaluate the absolute contributions of the genes associated to invasion and metastasis, the same approach was followed, using the related DEGs of each KEGG pathway identified by STRING. First, we compared all 35238 human genes against invasion and metastasis. The highest number of PCG was related to invasion and metastasis (3712) in which 1542 were shared with evasion of immune destruction followed by 1536 with energy metabolism, 1310 with proliferative signaling, 1126 with resistance to cell death, 1081 with angiogenesis, 1028 with chronic diseases, 854 with evading growth suppressors, 590 with replicative immortality and 460 with other cancer types, with which it shared only one KEGG pathway ( Fig. 2A, upper right panel).
We used the same approach for p-values, finding for ZsG vs. LN1, 2206 DEGs related to invasion and metastasis. For LN1 vs. LN2, there were 2026 DEGs and for LN2 vs. LN3, 1745 DEGs ( Fig. 2A, bottom panels right, middle and left, respectively). All 3 TCLs comparisons displayed the same pattern of contributions to invasion and metastasis. The same analyses were carried out for q-values. For that we used all 3717 DEGs found in our outcomes. The results are shown in Fig. 2B 'Angiogenesis' and 'Evading immune destruction' are the main hallmarks related to metastasis. Next we enquired the relative DEGs contributions of the hallmarks to invasion and metastasis.
LN1 vs. LN2. The CoGE displaying the highest number of DEGs was exclusive in LN1 up regulated concerning the invasion and metastasis hallmark for both p-and q-values, represented by 658 and 850 genes, respectively. The lowest numbers of DEGs were related to other cancer types and replicative immortality, displaying some CoGE without DEGs. The highest hallmarks contributions were evading immune destruction, angiogenesis, proliferative signaling and resisting cell death, while the lowest corresponded to energy metabolism and other cancer types. We concluded based on the KEGG pathways of each CoGE, that both LN1 and LN2 cell lines had features associated with the invasive process although resorting to different mechanisms, in which LN1 cells used inflammatory and proliferative processes to become invasive, whereas LN2 cells relied on strategies to become refractory to the immune system (Supplementary Tables 7A and 7B and Table 2).   (645) and q-values (638). The lowest numbers of DEGs were related to other cancer types and replicative immortality, with some CoGE without DEGs. Following invasion and metastasis, the hallmarks most represented were evading immune destruction, angiogenesis, evading growth suppressors and proliferative signaling in that order. Those with the least contributions were energy metabolism and other cancer types. Based on the KEGG pathways for each CoGE, we suggest that LN2 cells, rather than LN3 cells, showed a remarkable similarity to other cancer types, while the LN3 cell line was closer to high invasive capacity through angiogenesis stimulation and ability to avoid the immune system (Supplementary Tables 7A and 7B and Table 2). In order to identify the global contributions of each hallmark of cancer to invasion and metastasis, we obtained the mean of percentages of 11-CoGE and ranked them accordingly (Fig. 3). We found that the processes most closely related to invasion and metastasis were angiogenesis, evading immune destruction, evading growth suppressors and proliferative signaling, with genetic contributions from more than 68% for p-values and almost 48% for q-values. Energy metabolism and other cancer types were less relevant, with contributions of almost 50% and 37% of their DEGs (p-and q-values comparisons, respectively) ( Table 2). MYH14, ANGPTL4, ENPP1 and PPARD are possible OTSCC biomarkers and potential targets for interference studies. Next we looked for specific genes in our model trying to pinpoint novel OTSCC or metastasis biomarkers. Accordingly, we checked 11-CoGE of each cell line comparison, finding 26 common DEGs displaying the same type of regulation (Table 3) along with the OTSCC model of increasing metastatic potential. They were sorted as 15-downregulated, 10-continuum and 1 up regulated. These genes were classified using Panther ® software to analyze whether they were eligible as biomarkers for OTSCC and/or for metastasis.

Clusters of gene expression
First, we concentrated on the 15-downregulated genes, and found that only angiopoitein related protein 4 (ANGPTL4) was associated to biological adhesion acting as a signaling molecule. Amiloride-sensitive sodium channel subunit alpha (SCNN1A) and Ras-related protein Rab-17 (RAB17) participate in biological regulation. Myosin 14 (MYH14), which acts as G-protein modulator by way of the actin binding motor protein and as a cell junction protein, as well as RAB17, are related to cellular component organization and biogenesis. Regarding cellular processes, six genes, solute carrier family 28 member 3 (SLC28A3), gap-junction alpha-5 protein (GJA5), netrin receptor UNC5B (UNC5B), ANGPTL4, MYH14, and RAB17 were detected. Five genes were associated to developmental processes, namely ANGPTL4, MYH14, UNC5B, transcription cofactor vestigial-like protein 1 (VGLL1) and FYVE, RhoGEF and PH domain-containing protein 3 (FGD3), which acts as guanyl-nucleotide exchange factor. Four genes were related to cellular localization, SCNN1A, SLC28A3, MYH14 and RAB17. Concerning metabolic processes, we detected lipase member H (LIPH), which acts as esterase, phospholipase and storage protein. Two genes were related to multicellular organismal process, SCNN1A and MYH14. LIPH was associated to cell proliferation and motility. Within the group of 15-downregulated genes, ANGPTL4 is a classical biomarker for metastasis. Five genes were not found in the Panther database. Therefore, we used Gene Ontology to find out their biological functions. Two of them had receptor functions, plexin domain containing 2 (PLXDC2) and neuromedin U (NMU), that acts as neuromedin U receptor binding. Other two genes with binding properties, radical S-adenosyl methionine domain containing 2 (RSAD2) acting as a self-association protein and as iron-sulfur cluster binding, and coxsackie virus and adenovirus receptor (CXADR) identical protein binding and integrin binding. Finally, sciellin (SCEL) takes part in the assembly or regulation of proteins in the cornified envelope.
The single up regulated gene, ENPP1, was classified as part of the metabolic process, being a nucleotide phosphatase and pyrophosphatase enzyme.
Inspection of the ten continuum genes revealed one gene related to cellular component organization or biogenesis, syntaxin-6 (STX6), which acts as a SNARE protein. Four genes were related to cellular processes, signal recognition particle receptor subunit beta (SRPRB), peroxisome proliferator-activated receptor delta (PPARD), GPI ethanolamine phosphate transferase 2 (PIGG) and STX6. Four genes were classified within the localization group: charged multivesicular body protein 6 (CHMP6) that acts as transfer/carrier protein; transmembrane emp24 domain containing protein 2 (TMED2), which acts as transfer/carrier protein and as vesicle coat protein; SRPRB and STX6. Also, 2 genes related to metabolic process, PPARD and PIGG were found. Only one gene was related to multicellular organismal process, PPARD, a member of the proliferator-activated receptor family PPAR involved in the development of several chronic diseases. Four genes were not found in the Panther database, so we used Gene Ontology to search for their biological functions. Two were related to cell death, death associated protein kinase 3 (DAPK3), which displays protein homodimerization activity and transferase activity of phosphorus-containing groups; and second mitochondria-derived activator of caspase (SMAC/DIABLO), which activates caspases by binding to inhibitor of apoptosis proteins. Two genes were related to transport functions, sodium/potassium/calcium exchanger 6 mitochondrial (SLC8B1) acting as a cation transporter, and transient receptor potential cation channel subfamily C member 4 associated protein (TRPC4AP), related to phosphatase binding.
To better understand the relationship between the above 26 highlighted genes, we investigated whether they interacted with each other using STRING. Four genes RAB17, FGD3 (down regulated), STX6 and CHMP6 (continuum) (Fig. 4) were found to be linked.
Based on the most common biomarkers of squamous cell carcinoma reviewed by Scanlon et al. 13 , we selected all the expressed genes related to epithelial-mesenchymal transition (EMT) in our OTSCC model (Supplementary  Table S7). In order to understand the relationship between those genes, we used STRING, which displayed 5 sub-clusters, grouping as (i) cadherins, (ii) laminins, collagen and integrins, (iii) integrins and laminins, and two clusters of collagen (iv, v) (Fig. 4). Interestingly, when we analyzed those genes grouping with our list of 26 highlighted genes, we found that the up regulated gene ENPP1 clustered with a collagen sub-cluster; ANGPTL4, a down regulated gene grouped with the integrins of a sub-cluster of laminins and integrins. MYH14, another down regulated gene, grouped with the sub-cluster of cadherins. Finally, PPARD, a gene of the continuum group, clustered with fibronectin and the sub-cluster of laminins, also exhibited higher affinity for the transcription factors SNAIL1 and SNAIL2, TWIST and LEF1, the most important transcription factors of the EMT of HNSCC 13 . Therefore, those 4 genes stand out as potential targets for oral cancer therapy.
To validate some of the common DEGs displaying altered expression, we selected 4-down regulated, 1-continuum and 1-up regulated genes ( Table 3, highlighted genes). We carried out RT-PCR and plotted the results relative to Ct (threshold cycle), as well as to the transcriptomic data (FPKM) (Fig. 5). The results showed that down regulated genes enhance their Ct values along with the metastatic progression. This means that less aggressive cells display higher expression, whereas the most aggressive stages exhibited a lower degree of expression; in contrast, up regulated gene ENPP1 displayed a continuum pattern, as well as PPARD, a continuum gene. Clinical data is consistent with our 4 potential therapeutic targets expression. In order to highlight the expression of these 4 potential therapeutic genes in patients of head and neck cancer, data from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/) of 248 clinical tumors were used to support our observations. In this regard, we analyzed the 4 genes proposed as possible molecular targets (ANGPTL4, MYH14, PPARD and ENPP1, Fig. 6, red lines) in terms of levels of gene expression: low, continuum and high. Considering that metastasis is the major factor in cancer lethality, we found a remarkable correlation between up and down regulation of selected genes and the survival rate of cancer patients. Low expression of genes ANGPTL4 and MYH14 correlates with high lethality (Fig. 6A and B, in red). Nevertheless, the up regulation of gene ENPP1 correlates with high lethality as shown in Fig. 6C, in red. In addition, gene PPARD whose expression did not change (continuum) appears to have no correlation with the survival rate plots (Fig. 6D, in red).

Discussion
Here, we sequenced and analyzed the transcriptomic data of 5 cell lines of OTSCC, characterized by their progressively increasing invasive capacity. The sequencing was made for 3 independent experiments for each cell line. Similar studies 14,15 have reported 21000 expressed genes. In our screening we found 28000 genes. The significant difference may be ascribed to the approach employed in the present work, namely pooling together seven data sets obtained from the biological and technical replicates (Supplementary Table 1). One way to validate the consistency of our datasets was to rank the genes according to the level of expression (FPKM) and to compare whether the top 10 most expressed genes were comparable taking into account both for the independent experiments and the experimental replicates (7 datasets). We found a high correlation that showed 6 highly expressed genes in LN1 and LN2 cell lines, 7 in LN3 cells and 8 in SCC9 and ZSG cell lines, respectively (data not shown). For all cell lines and their replicates microRNA6723 was the most expressed gene in each one of the 35 datasets. Also, we found that among the top 10 expressed genes the calcium binding proteins S100A6 and S100A9 proteins were included. S100A9 protein were associated to chronic inflammation in hypoxia response 16 , a possible mechanism that OTSCC induce to develop more aggressive stages.
In order to detect the differentially expressed genes (DEGs), we compared the parental cells with its derived cell line applying the student's t-test (p-values to the expression values). We found more than 5.000 DEGs. Then, we corrected those data by applying the FDR correction (q-values) to minimize the type I error. We found values   Table 2). Usually transcriptomic analyses are based on p-values. However, many reports are based on q-values to reduce the number of genes to be analyzed functionally. The problems involved in this kind of analyses are the cut-offs based on fold change and statistical significance 9 . Thus, depending on the aim of the work, one has to establish a compromise between the significance of the functional attributes of the gene expression and the number of genes analyzed. On the other hand, many works in the literature have "cleaned" their data by eliminating many non-coding genes (NCG) that have not part of specific biochemical pathways 17 . In the same way, some parameters that take into account the most differentially expressed genes, assume that the complexity of the cells could be downscaled to a small number of genes. In this work, we used all DEGs for p-and q-values and compare them in order to show the relative importance of this correction, holding a comprehensive and more robust analysis of the complex cell biological systems. By comparing DEGs between the parental and its derived cell line, we found 11-CoGE (Supplementary  Tables 3-6). They display all the possible types of expression regulation for each gene and for each comparison of the transformed cell lines. Establishing CoGE is an interesting tool to find the common DEGs and their expression properties between the comparisons. With this approach, new OTSCC biomarkers were identified (Table 3). In addition, this strategy evidenced expression features that may have been acquired or lost between the compared cells. In other words, CoGE can reveal in greater detail those phenotypic traits of cell lines that display individual or common characteristics that fall within the general bracket of malignancy. Usually, the analysis tools displays patterns of gene expression, which include some of the 11-CoGE described here but not all of them 11,[18][19][20] . In this report we have provided a comprehensive view of all gene regulatory possibilities when considering transcriptomics, as related to tumor progression. DEGs for p-and q-values were enriched for the PCG with their related KEGG pathways ( Table 1). The analysis allowed the observation that the number of genes considered did not necessarily produce a proportional number of pathways. This could be interpreted as meaning that genes may be endowed with multiple functions promoting in tumor cells highly plastic networks. An excess of PCGs resulting in fewer pathways could indicate a high functional redundancy. Accordingly, we found 284 KEGG pathways related to LN1 vs. LN2, exclusive DEGs up regulated in LN1 cells, for both p-and q-values. The groups comprising the commonly expressed genes (down or up regulated) had lower number of PCG and KEGG pathways. However, when analyzing the number of KEGG  pathways related to the human genome, we used all-35238 annotated cDNAs following the same procedures, which yielded 292 KEGG pathways (Fig. 2, Supplementary Table 6C).

GENE ID
In order to find out the KEGG pathways related to the 8 hallmarks of cancer described by Hanahan and Weinberg in 2011 10 , we classified each KEGG pathway into each characteristic proposed for cancer by performing manual curation of the data in the literature. Additionally, we found also 71 KEGG pathways related to other cancer types and chronic diseases. For this reason we added these two categories to our analyses. We observed that energy metabolism was the hallmark that included the highest number of related KEGG pathways. Biological processes related to metabolism are frequently found as most altered pathways in large scale analyses [21][22][23][24] . After metabolism, the other hallmarks categories followed: invasion and metastasis, chronic diseases, proliferative signaling, resisting cell death and evading immune destruction, angiogenesis, evading growth suppressors, other cancer types and replicative immortality (Supplementary Table 6). As our OTSCC model was developed by selecting a gradient of increasing metastatic potential, we used invasion and metastasis as a gold reference to weight the contributions of other hallmarks. Consequently, we compared the KEGG pathways falling into this category to KEGG pathways in the other hallmarks ( Fig. 2 and Supplementary Table 7). Among these, 25 were shared with energy metabolism, 22 with immune destruction evasion, 18 with proliferative signaling; 13 with resistance to cell death and angiogenesis, 11 with growth suppression evasion; 3 with replicative immortality, 1 with other cancer types and 0 with chronic diseases.
Consistently, we used the related DEGs to each KEGG pathway, searching for the genetic contributions of each hallmark to invasion and metastasis. We found that the principal inputs of the hallmarks corresponded to those with highest number of pathways and DEGs, namely evading immune destruction (22 pathways, 1542 genes) and energy metabolism (25 pathways, 1536 genes) (Fig. 2). In the same way, the less represented KEGG pathways and DEGs were found to display the lowest contributions, relative to replicative immortality and other cancer types. Interestingly, chronic diseases, a hallmark with no shared pathways with invasion and metastasis, had the fifth higher contribution of DEGs to the invasive process (0 pathways and 1028 shared genes). These data show that considering exclusively the number of pathways could be misleading.
All comparisons of DEGs related to KEGG pathways were carried out for each CoGE. Analyzing the same parameters (pathways and genes) we found clues about the specific biological features of each cell line. It must be mentioned, however, that we have no information as to whether those PCG are actually translated. Notwithstanding, preliminary proteomic data, have confirmed that the transcriptomic data parallel the implied mechanisms as show by the pathways analyses (Cesari IM et al., in preparation). It can be consider the possibility that the genes could also be post-transcriptionally and post-translationally regulated. Taken together the most important findings were that ZsG elements were closest to neurodegenerative diseases and also displayed more features of proliferative cells than LN1 cells. Conversely, LN1 elements were more similar to other cancer types, besides having different cytoskeleton regulation and interactions with the extracellular matrix (ECM) ( Supplementary Tables 7A and 7B and Table 2). This follows the observation that up regulated transcripts in cancer are down regulated in central nervous system (CNS) diseases and vice versa 20 . Indeed, another report revealed that up regulated genes in CNS disorders coded for low abundant proteins, and that the opposite occurred in cancer 25 . These observations may support the idea that the highest similarity with neurodegenerative illnesses occurs in the less aggressive cell lines. Furthermore, it is known that common features of metastasis involve MMPs genes, transcription factors, cyclooxygenases, chemokines, etc. 6,26 , especially those related to ECM interactions.
Based on those comparisons we can infer that LN1 and LN2 use different mechanisms to become metastatic; LN1 resorts to inflammation and proliferation and LN2 to immune system evasion. Indeed, inflammation was associated with amplification of the signaling loops that favor the metastatic cascade [27][28][29] . This is in agreement with previous reports showing that gene silencing was associated with tumor progression and metastasis. A point in case is MTA2 (metastasis tumor-associated protein 2) in glioma, in which it has been shown that proliferation and metastasis were inhibited 30 , while this gene was found to be upregulated in nasopharyngeal cancer 31 . In addition, the immune system can promote either activation or suppression of tumor growth, in a process known as "immunoediting" 32 . Some cancer cells present tumor antigens that lead to their elimination by the immune system. Alternatively during the process of immunoediting, they can lose those antigens due to either random genetic instability or in response to immune-induced inflammation 32,33 . Finally, LN2 cells were more similar the tumor cells classified as "other cancer types" than LN3 cells, whereas LN3 could induce angiogenesis and were capable to evade the immune system (Supplementary Table 7). Both mechanisms were already discussed for ZsG and LN2 cells.
The next step consisted in looking for the relative contribution of DEGs for each hallmark of cancer to invasion and metastasis. Table 2 showed that angiogenesis and evading immune destruction DEGs were the most representative. Regarding the number of genes, angiogenesis and evasion of immune destruction were the first and the fifth hallmarks with highest contributions, respectively. Conversely energy metabolism and other cancer types were those with lowest contributions of DEGs, being the second and the last hallmarks with higher number of genes, respectively. This means that evading immune destruction is a hallmark highly associated to invasion and metastasis in OTSCC due to the number of genes and its contribution to that process. In contrast, energy metabolism, the hallmark that displayed the highest number of KEGG pathways and the second highest number of gene contributions to invasion and metastasis, was less related to the invasive process (Fig. 3). These observations suggest that cancer therapies should target those genes involved in immune system evasion, angiogenesis and/or growth suppressors avoidance. It follows that although metabolism contains a high number of gene contributions, they may not be the most susceptible targets for an efficient therapy.
After evaluating the global behavior of the gene expression and its contributions to invasion, we identified common DEGs displaying altered-expression that could become biomarkers of OTSCC or metastasis. To accomplish that, we compared the 11 CoGE, and found 26 genes: 15 were down regulated, 10 were continuum and one up regulated (Table 3). Of the subgroup of down regulated genes, we found 3 genes described as biomarkers of HNSCC (RAB17 34 , NMU 35 and ANGPTL4 36 ), and one of EMT (CXADR 37 ). Of these, CXADR was reported to be down regulated 38 , agreeing with our results; no expression data for RAB17 was found although in our results we found to be down regulated; ANGPTL4 was reported to be overexpressed 39 in contrast to our result showing it was down regulated; NMU protein was proposed as biomarker, although in our data measuring RNA/cDNA levels, this gene was down regulated. Other 4 genes have expression data in HNSCC, namely VGGL1 40 , SCEL 41 , SCNN1A 42 and UNC5B 43 .
In previous works all of them were found to be down regulated, in agreement with our analysis. Finally, 7 common DEGs which had not been reported before in association with HNSCC expression were found to be down regulated. These were MYH14, RSAD2, SLC28A3, LIPH, GJA5, PLXDC2, and FGD3. Interestingly, mutations for MYH14 have been described in HNSCC 44 which were correlated to a negative regulatory activity of metastasis 45 . Incidentally, all 7 genes have been associated to cancer or metastasis and had been noted for their high level of expression 46-53 , even though PLXDC2 was down regulated in vulvar squamous cell carcinoma (VSCC). This was associated with unfavorable prognosis 54 .
In agreement with data obtained from renal carcinoma we found that DIABLO belonged to the continuum group 55 . Curiously this gene was found to be up regulated in cervical cancer 56 , in tumors of colorectal carcinoma patients 57 , and in gastrointestinal cancer 58 . In addition, TRPC4AP, another member of the continuum CoGE group did not display any type of regulation in mouse fibroblasts NIH-3T3 when induced by adenovirus early region 1 A protein (E1A) oncogene 11 , whereas it was found down regulated in a murine model of aggressive OSCC 59 . In the same way, PPARD, CHMP6 and TMED2 were found to be either down regulated [59][60][61] , or up regulated 61-63 depending on the treatment and the cell line types. Only DAPK3 and STX6 had been reported to be expressed in HNSCC, being down 64 and up regulated 65 . SRPRB was described as down regulated in peripheral blood cells of a melanoma patient when compared with healthy primary melanocyte cells 66 , as well as in breast cancer patients after 4 cycles of chemotherapy 67 . There are no data in cancer studies concerning SLC8B1 (encoding NCLX protein) and PIGG. However, SLC8B1 is known to play a key role in cellular and mitochondrial Ca 2+ homeostasis and thereby, it is implicated in cell Ca 2+ regulation, oxidative phosphorylation, hormonal secretion, synaptic transmission and apoptosis [68][69][70] . Moreover, PIGG is involved in ethanolamine phosphate transference and its mutations and deletions were reported to be associated with intellectual disorders, hypotonia and early-onset seizures 71 . Other members of PIGG's family, such as classes U (PIGU), T (PIGT) and X (PIGX) are oncogenic, being overexpressed in bladder cancer 72 and breast cancer cell lines 73,74 , suggesting a possible role in cancer development related to PIGG.
With regards to the up regulated genes the only one found was ENPP1 whose expression is stimulated by estrogen in stromal cells from normal human endometrium 75 . ENPP1 loss has been found in ovary cell lines occurring even without genomic deletion. The silencing of this gene can be attributed to hyper methylation of the connective tissue growth factor (CTGF/CCN2) promoter, that inhibits the expression of several genes 76 . Also, it was shown that ENPP1 is a potential facilitator of breast cancer bone metastasis, with high levels of both mRNA and protein synthesis 77 , occurring in a chromosomal region reported to be amplified in breast cancer 78 . The opposite was found in ovarian cell lines. Likewise it was shown that loss of microRNA-27b contributed to breast cancer stem cell generation by activating ENPP1. Clinical data suggest ENPP1 expression in primary breast cancer tissues is associated with malignant potential and response to chemotherapy 79 . In HNSCC, it was reported that ENPP1 gene was activated by anti-inflammatory stimuli 80 .
Our observations on common DEGs displaying altered expression corroborate the findings regarding classical biomarkers of invasion, such as RAB17, NMU, ANGPTL4, CXADR and ENPP1, and also allowed the proposal for novel biomarkers of OTSCC metastasis, such as PIGG and SCL8B1 (Table 3). Furthermore, we found 24 of 26-common DEGs displaying altered expression related to different processes in many types of cancer, strengthening our analysis strategy. For instance, NMU gene that encodes a HNSCC biomarker was found to be down regulated at the RNA/cDNA level in our work, leading us to consider the occurrence of post-transcriptional and/or post-translational modifications. Quite probably, a certain number of genes that we have found to be differentially expressed may not synthesize proteins. Similarly the proteome profile may not be deduced from the transcriptomic data, due to many factors pertaining to the transcription process 81 . Our approach suggests a set of pathways, out of many possible ones, that OTSCC could possibly undertake to become more aggressive.
When we compared those 26 sequentially altered genes with traditional biomarkers for OSCC, we found that ANGPTL4, MYH14, ENPP1 and PPARD interact with important subsets of genes involved in EMT (Fig. 4). Collagen and integrins are important components of the ECM, and actively participate in the invasive process 82 . MYH14, ANGPTL4 and ENPP1 clustered with those genes, as observed in the interactome. The transcription factors SNAIL1, SNAIL2, TWIST and LEF-1 promote EMT in HNSSC 13 and PPARD interacted with them. PPARD is activated by LEF-1 83 . This represents an interesting approach to define ANGPTL4, MYH14, ENPP1 and PPARD as novel HNSCC biomarkers. Moreover, we analyzed 248 clinical data of HNSCC from the TCGA and we found that expression levels of these genes in live patients correlate with our findings (Fig. 6).
Validation of the transcriptome was attempted by carrying out RT-PCR for 6 of the common DEGs displaying altered expression. Among them, 4 were down regulated in our model, and displayed a pattern that corroborated the transcriptomic data. Therefore, RSAD2, VGLL1, SCEL and ANGPTL4 could represent biomarkers of oral metastasis. On the other hand, ENPP1, an up regulated gene, did not display the same pattern as that of the RNA-Seq data. PPARD, a gene belonging the continuum CoGE, consistently did not change its expression amongst the metastatic progression (Table 3 and Fig. 5).
Finally, the reasoning used here for large-scale RNA-Seq analyses using all the DEGs with or without corrections (p-or q-values), in order to have a comprehensive and robust view of the complex cell system biology. No single analysis pipeline can be used in all cases 84 . The pipeline took into account PCG and KEGG pathways related to them. This allowed us to classify the DEGs related pathways into the hallmarks of cancer and to establish their contributions to any specific process or characteristic. In our case, we found that evading immune destruction and angiogenesis were the most related to invasion and metastasis. We propose that cancer treatments should be directed against those genes rather than metabolic or generic ones. Our approach can be used for other cellular and cancer related processes and diseases, being an interesting tool to highlight nodal genes or set of proteins on which to base new therapies. Lastly, genes such as ANGPTL4, MYH14, PPARD and ENPP1 might constitute interesting molecular targets for OTSCC treatment trials.

Material and Methods
Cell lines. Cell  Cell culture. Cells were cultured in Ham's F12 medium (DMEM/F12; Invitrogen, USA) supplemented with 10% fetal bovine serum (FBS) and hydrocortisone 400 ng/ml (Sigma-Aldrich, USA). 1.1 × 10 6 cells of SCC9, ZsG and LN1; 1.5 × 10 6 cells of LN2 and 2 × 10 6 cells of LN3 were transferred to 60.1 cm 2 Petri dishes, for 48 hours, using an incubator series 8000 water-jacketed CO 2 (Thermo Scientific), in humidity atmosphere of 5% of CO 2 . Three independent biological replicates of each cell line were used for the transcriptomic analysis. The cell lines were genotyped and tested free for Mycoplasma sp. infection.

RNA extraction.
Total RNA of ~6 × 10 6 cells in 3 independent biological experiments for all 5 cell lines was extracted using RNeasy kit (Quiagen ® ), according to the manufacturer instructions. Quality and purity of the samples were quantified using Nanodrop ND1000 (Thermofisher Scientific).
Library preparation and sequencing. Libraries were prepared using 4 μg of total RNA from each sample, strictly following the instructions of the TruSeq RNA Sample kit v2 (Illumina ® ). Seven technical replicates were obtained for all cell lines, from three biological independent experiments. Each library was uniquely identified using specific barcodes. The quality of library preparations was assessed using DNA 1000 kit for Bioanalyzer (Agilent ® ). Libraries were subsequently quantified by qPCR using Library quantification kit for Illumina (Kapa Biosystems ® ). 10 pM of sample libraries were distributed in 5 lanes of a flow cell, using TruSeq PE Cluster kit v3 -cBot -HS (Illumina ® ). A 100 × 100 Paired End run was carried out in an Illumina HiSeq2500 ® platform, using TruSeq ™ SBS Kit v3 -HS -200 cycles. Samples were multiplexed using Nextera kit, on which sequence adaptors were added into samples to differentiate and demultiplexed after the sequencing process. Data analysis. CASAVA ® tool was used to make the base calling, obtaining the FASTQ sequences for experimental and biological replicates. Using the FASTQ files, gMAP ® was employed to align our reads against the human genome v.38 (Ensembl), producing a GFF file and the Cufflinks tools align the coordinates of each read in GFF format to produce the frequency of each gene, expressed in Fragments Per Kilobase of exon per Million reads sequenced (FPKM). A FPKM ranking was obtained and these data were compared using Excel (Microsoft Corporation ® ). The first step consisted of the comparison between parental vs. its derived cell line, gene by gene, using Student's t-test, to obtain the differentially expressed genes (DEGs). False Discovery Rate (FDR) correction was made, generating a q-value, reducing type I error. A ratio between the derived and parental cell lines was obtained, to determine the type of regulation for each gene, and to establish the clusters of gene expression (CoGE). STRING ® free software (http://string-db.org) 12 was used to classify between protein coding genes (PCG) and non-coding genes, and to obtain the KEGG pathways related to each PCG. Panther ® (http://pantherdb.org) was used to determine the biological process of specific genes, based in gene ontology.

RNA extraction and cDNA synthesis for validation experiments.
Total RNA was isolated from oral cancer cells using TRIzol reagent (Invitrogen) according to the manufacturer's instructions. Total RNA was quantified spectrophotometrically using Nanodrop ND1000 (Thermofisher Scientific) and 1 μg was treated with 1 unit of RNase-free DNase for 30 min at 37 °C. Reactions were stopped by adding 1 μl of 20 mM EDTA and heating for 10 min at 65 °C. Synthesis of cDNA was performed using the DNase-treated RNA according to a High Capacity cDNA Reverse Transcription Kit (Applied Biosystems).
Real Time-PCR. Gene expression analysis was performed using a 7500 Real-Time PCR (Applied Biosystems) and power SYBR-Green PCR master mix (Applied Biosystems). The sequences of the primers used were: RSAD2 forward GCGTTGCGGGGAAACGAA reverse AGCGCCGGCCGTTTATC; VGLL1 forward GGACA TCAGCAGCGTAGTGG reverse CTCTGACTCGAGGGGGTCAA; SCEL forward TTGCAACCTGGCGGTTCATT reverse ACACCTGGTTCCCTCTTCTTCT; ANGPTL4 forward CTCTCTGGAGGCTGGTGGTT reverse TGTG GGATGGAGCGGAAGTA; ENPP1 forward CTATGGACGTGGGGGAGGAG reverse TAGGTGTTGGGGTCCTT GGC; and PPARD forward GTGGCTTCTGCTCACCAACA reverse CATCGTCTGGGTCTGAACGC. The comparative Ct method was used to contrast changes in gene expression levels. β-actin was used as an endogenous control.

Statistical analyses.
For transcriptomic data, statistical analyses were performed using Excel (Microsoft Corporation ® ), statistical significance was determined by student's t-test and false discovery rate (FDR) correction, both of them with α = 0.05. The results were expressed as means ± S.E.M for n independent experiments. Statistical significance was determined by two-way ANOVA, with α = 0,05. For transcriptomic data validation, Prism 7 (GraphPad Software) for Mac was used. The results are expressed as means ± S.D. of 3 independent experiments. Two way ANOVA and Tukey's multiple comparisons test were done 7,26 .