Deregulation of extracellular matrix modeling with molecular prognostic markers revealed by transcriptome sequencing and validations in Oral Tongue squamous cell carcinoma

Oral Tongue Squamous Cell Carcinoma (OTSCC), a distinct sub-group of head and neck cancers, is characteristically aggressive in nature with a higher incidence of recurrence and metastasis. Recent advances in therapeutics have not improved patient survival. The phenomenon of occult node metastasis, even among the purportedly good prognosis group of early-stage and node-negative tongue tumors, leads to a high incidence of locoregional failure in OTSCC which needs to be addressed. In the current study, transcriptome analysis of OTSCC patients identified the key genes and deregulated pathways. A panel of 26 marker genes was shortlisted and validated using real-time PCR in a prospective cohort of 100 patients. The gene expression was correlated with clinicopathological features including occult node metastasis, survival, and therapeutic outcome. The up-regulation of a panel of 6 genes namely, matrix metalloproteinase 9 (MMP9), Laminin subunit Gamma 2 (LAMC2), Desmoglein 2 (DSG2), Plasminogen Activator Urokinase (PLAU), Forkhead Box M1 (FOXM1), and Myosin 1B (MYO1B) was associated with failure of treatment in the early stage (T1, T2). Up-regulation of Tenacin C (TNC) and Podoplanin (PDPN) was significantly correlated with occult node positivity. Immunohistochemical analysis of LAMC2, MMP9, and E-Cadherin (ECAD) confirmed these markers to be indicators of poor prognosis. We propose this panel of valuable prognostic markers can be clinically useful to identify poor prognosis and occult node metastasis in OTSCC patients.


Gene set enrichment analysis on DEG obtained by transcriptome sequencing reveals deregulated extracellular matrix (ECM) in OTSCC.
To understand the biological roles of the DEG from OTSCC, we performed a gene set enrichment analysis on the DEG derived by transcriptome sequencing. We found GO terms for biological process enriched for Inflammatory response, Extracellular Matrix organization, Cell adhesion, Collagen Catabolic Process, Immune response and Angiogenesis. The top most significant enriched GO term for Cellular components were related to extracellular matrix remodeling. The top most significant enriched GO term for Molecular Function were Metalloendopeptidase activity, Heparin binding, Extracellular matrix structural constituent, Collagen binding and Receptor activity. The chief deregulated KEGG pathways included Cytokine-cytokine receptor interaction, Focal adhesion, ECM-receptor interaction and PI3K-Akt Protein-protein interaction (PPI) network shows ECM and focal adhesion interaction pathways deregulated in OTSCC. The PPI network generated in STRING using the 1698 DEGs included 1392 nodes and 2516 edges with a PPI enrichment P value < 1.0e−16. The top 10 nodes are presented in Supplementary Table S3. The module analysis tool extracted 8 most important modules with MSCORE > 6 as sub-networks from the overall PPI network (Supplementary Table S4). The top hub genes, Amyloid Beta Precursor Protein (APP), G protein subunit beta 5 (GNB5) and G protein subunit Gamma Transducin 2 (GNGT2) were part of Module 1. Functional enrichment analysis of the top 2 modules identified the GO terms and KEGG pathways listed in Supplementary Table S5. Most significant module 1 was enriched for GO terms chemokine activity and G-protein coupled receptor binding pathway and for the KEGG pathways, Chemokine receptor signaling pathway and cytokine-cytokine interaction pathways. Module 2 showed enrichment of GO biological process terms Collagen catabolic process and Extracellular matrix organization while the KEGG pathways enriched consisted of ECM receptor interaction and Focal adhesion. Integrated sub network of the top most significant modules of OTSCC-PPI network is shown in Supplementary Figure S3.

Molecular markers useful in differentiating the clinical stages in OTSCC.
The relative expression of LAMC2 (P value = 0.02), VIM (P value = 0.02), HIF1A (P value = 0.01), TWIST2 (P value = 0.03) and SOX2 (P value = 0.02) were found to be significantly differentially expressed among the early and advanced stage tumors.  www.nature.com/scientificreports/ The differential expression of potential markers between early versus advanced staged OTSCC has been shown (Supplementary Figure S4).

Validation of protein expression by immunohistochemistry.
Here, we present the results of protein expression of 3 biomarkers namely LAMC2 (Fig. 3), ECAD (Fig. 4) and MMP9 (Fig. 5) among the others which have been evaluated. The correlation of the IHC markers with various clinic-pathological parameters like age, sex, tobacco habits, alcohol habit, pathological stage, tumor grade, cervical node status, occult node status, perineural invasion status and type of growth was investigated.

Clinico-pathological factors and molecular markers influencing tumor recurrence and as predictors of death analysed by univariate and multivariate analyses. The DFS median was
24 months with a range of 0 to 54 months while the median for OS was 30 months with a range of 4 to 74 months. Univariate and multivariate analyses were performed to identify the risk factors for recurrence of disease and death due to cancer using the Cox proportional hazard regression model (Table 5). Kaplan Meier survival analyses were also done to identify the significant predictors of DFS and OS. Among all the patients, the classical prognostic factors like habits-tobacco chewing (P = 0.01), alcohol (P = 0.04), clinical factors like node status (P = 0.003), stage (P = 0.01) and occult node status (P = 0.005) were found to be significant predictors of disease recurrence in OTSCC. Tobacco chewing, positive nodal status and advanced stage of tumor increased the hazard of tumor recurrence by more than two-fold while a positive occult node increased the risk of recurrence by 4.27-fold. The same clinico-pathological parameters were also significant predictors of death. Tobacco chewing (P = 0.02) and alcohol (P = 0.01) were associated with an increased   www.nature.com/scientificreports/ hazard of death as seen by the HR of more than 2. Similarly, positive cervical node status (P = 0.008), advanced clinical stage (P = 0.01) and grade (P = 0.04) of the tumor were also significant predictors of death. Occult node positivity increased the risk of death by 5.19-fold and was the most significant clinic-pathological parameter among all factors. Kaplan-Meier survival analysis also confirmed the parameters clinical stage, node, presence of perineural invasion and occult node status to be predictors of disease recurrence and death (Fig. 6) with the presence of tobacco habits identified as an additional predictor of death. Among the molecular markers, high LAMC2 expression (Fig. 7a,b) was associated with an increased risk of recurrence (P = 0.004) and an increased risk of death (P = 0.006). Loss of membrane positivity of E-cadherin was associated with an increased hazard of disease-recurrence as seen by the increased HR of 3.19 for DFS (P = 0.001), when compared to no E-cadherin expression. It was also a significant predictor of death (P = 0.003) with a HR of 3.11. High MMP9 expression was associated with an increased risk of both disease recurrence (P = 0.002) and death (P = 0.004). These results were also mirrored by the Kaplan-Meier survival analysis of factors predicting DFS and OS (Fig. 7c,d) (Supplementary Table S9). Among the molecular markers, ECAD expression pattern www.nature.com/scientificreports/ at the ITF was the most significant predictor of both disease recurrence (P = 0.002) and death (P = 0.03) with an increased HR of 25.39 and 10.81 respectively. These results were confirmed by the Kaplan-Meier survival analysis (Fig. 7e,f).

Discussion
OTSCC is an aggressive cancer with high invasion and metastatic properties. Lymph node metastasis is the single most important prognostic indicator, however, in early stages, the lymph node metastasis is occult without clinical signs. Though, elective neck dissection is recommended for early stages, majority of the patients may not benefit substantially due to the surgery related morbidities. Biomarkers are therefore important in OTSCC and more importantly in early stages to prevent mortality and in identifying the high-risk patients. There is a need for reliable tests to detect aggressive early stage OTSCC to provide adequate treatment for these patients.
The findings of our current transcriptomic sequencing study were in complete agreement to our published meta-analysis study on OTSCC 22 . This study shows that role of tumor microenvironment in OTSCC with a number of extracellular matrix (ECM) components playing a crucial role in patient prognosis. Based on the data obtained from RNA-seq, the top up regulated genes included members of Matrix metalloproteinases and Melanoma associated gene families. The Gene Ontology terms enriched among the DEGs included inflammatory response, ECM organization, cell adhesion, collagen catabolic process and metalloendopeptidase activity. Studies have shown that biomarkers of stromal microenvironment might have a greater impact on prognosis compared to biomarkers related to tumor cells 23 . We found ECM receptor interaction and Focal adhesion as the most significant deregulated pathways in OTSCC. However, biomarkers related to tumor microenvironment have not been widely studied except activin A 24 . Degradation of the ECM that expediates the movement of tumor cells to surrounding tissue, is one of the important hallmarks of cancer progression and is facilitated by collagenases. Focal adhesion kinase is a tyrosine kinase that mediates signaling by the integrin family of cell surface receptors for ECM, leading to tumor cell migration and progression 25 . The module analysis additionally identified Neuroactive ligand-receptor interaction pathway, which is implicated in OTSCC 26 and other types of cancers, like breast 27 and bladder 28 to be dysregulated in OTSCC. Collectively, these results complement each other and reinforce the pivotal role played by ECM remodeling and stromal microenvironment in OTSCC which in turn clarifies the aggressive nature of the disease and its propensity to metastasize.
The hub genes in a network are expected to be key drivers influencing the normal functioning of a cell, owing to the central role played by them in the network with multiple interactions 29 . APP was identified as the most significant hub gene in both transcriptomic and meta-analysis studies 23 . APP, a type-I integral membrane protein, has shown higher expression levels in OSCC and is known to be associated with poor prognosis 30 . Secretory proteins assume a lot of importance in the context of a high-effect local disease like oral cancer owing to its potency to be detected in saliva. With more than 500 of the DEGs in the current study being secreted proteins, patient-based saliva proteomics could be an engaging approach to biomarker discovery in oral cancer 31 . SPOCK1 was found to be up-regulated in OTSCC but its mode of action and its role in OSCC has not been explored and merits an exhaustive analysis. SPOCK1 has the potential to act as a prognostic marker and the added advantage of it being a secretory protein, offers the possibility of non-invasive detection.
While surgery is the accepted first line of treatment in OTSCC, there exists several views over the clinical management of node negative patients with suggestions ranging from observation to elective neck dissection [32][33][34][35] . However there has been no single, decisive prognostic/predictive factor for a positive occult node 32 and clinical staging is insufficient and often underestimates the extent of occult node positivity 36 . These shortcomings have called for an exigent need to identify molecular markers to predict occult tumor burden and our study has attempted to answer it.
Our cohort comprised of stage I to stage IV cancers and was representative of the entire spectrum of OTSCC. Out of the 26 markers evaluated, 19 of the tested biomarkers were able to significantly differentiate tumor from normal samples. We report 12 genes namely, LAMC2, VIM, HIF1A, TWIST2, SOX2, TNC, PDPN, MMP9, DSG2, PLAU, FOXM1 and MYO1B found to be relevant clinical markers that can be easily tested in OTSCC tumors to indicate prognosis, cervical and occult node positivity and outcome. We have derived a panel of molecular markers significantly predicting recurrence of the early stage OTSCC based on the gene expression levels of 6 genes namely, MMP9, LAMC2, DSG2, PLAU, FOXM1 and MYO1B.
Several studies including our previous study have shown that members of MMP family are involved in OTSCC with very significant upregulation 23,[37][38][39] . Additional studies by Zhou et al., showed up-regulation of the MMP9 gene to be associated with advanced OTSCC and having a predictive value for the identification of lymph node metastasis 40 . These evidences along with our findings emphasize the importance of evaluating MMP9 in OTSCC.
LAMC2 was identified as a relevant gene and an important indicator of poor prognosis and cancer progression in OTSCC 41,42 with its overexpression indicating an advanced stage and positive cervical node status. Our results are in confirmation of earlier studies identifying LAMC2 as a prognostic factor in early stage and advanced stage OTSCC tumors 43 . Interestingly, in our cohort, the tumor area facing the stroma had a higher expression of LAMC2. Most of the extracellular spaces of the tumor areas showed accumulation of LAMC2 and expression of LAMC2 at tumor stroma interfaces indicates invasion and cancer progression 44 .
Other relevant genes were DSG2, and PLAU. DSG2 is an adhesion molecule and a known epithelial EMT marker overexpressed and is also known for bad prognosis marker in hepatocellular carcinoma 45 and gastric carcinoma 46 . Our observation of PLAU as a marker of relapse in OTSCC is in line with prior study in OSCC demonstrating the role of PLAU as a strong independent prognostic factor of DFS 47 . Baker et al., have previously reported the over-expression of PLAU and its correlation to aggression of tumors in OSCC 48 .
Several studies have implicated FOXM1 to play a major role in chemo-resistance in gastric cancer 49,50 , glioblastoma 51 , non-small-cell lung cancer 52 and most recently, also in oral cancer where it is proven to predict www.nature.com/scientificreports/ poor patient survival 53 . This is similar to our findings where FOXM1 overexpression is associated with relapse in OTSCC patients. MYO1B, a motor protein involve in cellular motility 54 , is known to be overexpressed in HNSCC 55 and is found to play a pivotal role in lymph node metastasis by way of augmenting cancer cell motility 56 . SOX2 is a transcription factor and stem cell marker, known to be expressed in several cancers, playing a major role in cell proliferation, metastasis 57 , tumor drug resistance 58 and is hence a valuable therapeutic target 59 . SOX2 maintains the stemness of CSC, ultimately resulting in cancer relapse and resistance to treatment 60 . Liu et al. demonstrated the involvement of SOX2 in promoting tumor aggressiveness and EMT in OTSCC 61 . In this scenario, our finding of SOX2 to be a marker of failure in OTSCC cases treated by radiotherapy is significant. We identified TNC and PDPN to be significant markers of occult node metastasis. TNC, a protein of the extracellular matrix with important functions in angiogenesis and tumor cell proliferation 62 , has been established as a diagnostic marker and potential cancer-associated fibroblast marker in breast ductal cancer 63 and cervical cancer 64 and a prognostic marker in early stage OTSCC 65 . The functions of TNC and its role as a cancer stem cell (CSC) marker 66 makes it an ideal marker of occult node positivity as found in our study. Another important marker of occult node positivity and poor outcome identified in radiation-treated locally advanced OTSCC was PDPN. PDPN is a trans-membrane receptor glycoprotein that increases clonal cell capacity, EMT, invasion and metastasis 67 , playing a pivotal role in several cancers. Preclinical studies have identified PDPN as a therapeutic target to combat several cancers including oral cancer [68][69][70][71] and nasopharyngeal cancer 72 . These findings corroborate our identification of PDPN as an occult node and poor prognosis predictor.
Our cohort had about 26.9% of occult positive cases that falls within the range of previous studies reported 34 . In another study on OTSCCs from our centre, the incidence of occult metastasis was 32.69% 73 . Currently, sentinel Lymph node biopsy has been suggested as a minimally invasive technique for nodal staging to reduce the needless morbidity of nearly 70-75% of pathologically node negative patients undergoing elective neck dissection. Since, sentinel node is generally believed to be the first lymph node or lymph nodes group receiving lymphatic drainage from the primary tumor, if this node is metastasis negative, the non-sentinel nodes in the neighboring regional basins are also deemed to be negative of metastases. Currently, there are no molecular markers that can identify metastasis in lymphatic basins, and we suggest that TNC and PDPN expression can be evaluated in a larger series of cases to identify patients at risk of occult micrometastasis.
Survival analyses of the clinicopathological factors predicting recurrence and death identified clinical stage, cervical node, occult node and tobacco habits as the classic prognostic factors in tongue cancer 74,75 . Our validations based on protein expression predicting risk of disease relapse and death in OTSCC patients confirmed the expression of LAMC2, MMP9 and ECAD at ITF as the important prognostic indicators. Thus, among our panel of markers, MMP9, LAMC2, VIM, DSG2 and TNC are well-known EMT markers, emphasizing the importance of EMT in OTSCC.
In the exclusive cohort of early stage patients treated by surgery, cervical node and peri-neural invasion significantly predicted failure in early stage OTSCC as reported earlier 76,77 . Loss of membrane ECAD at ITF predicted both risk of recurrence and death. Among the locally advanced cases treated by radiation, LAMC2, MMP9 and POSTN were the predictors of disease relapse while node, LAMC2 and POSTN were the prognostic factors predicting death.
Murthy et al. reported node to be an important prognostic factor in oral cancer patients treated by radiotherapy and went on to state that among the different oral subsites, OTSCC patients treated by radiotherapy perform poorly 78 . Our locally advanced tumors were treated by radiation and POSTN turned out to be an important prognostic factor in these tumors. High levels of POSTN are found to be associated with aggressive tumor behaviour, tumor progression, resistance to treatment and hence, bad prognosis [79][80][81] . Although Choi et al. have identified POSTN to be a diagnostic marker in OSCC by tissue microarray 82 and its role in OSCC tumorigenesis is well established 83,84 , its prognostic role in OTSCC has not been reported earlier.
Although many earlier studies have reported diagnostic biomarkers based on mRNA expression in OSCC 85 , very few have derived meaningful mRNA-based prognostic markers 86 , more so in OTSCC. Using a panel of markers has superior discriminatory power over using individual candidate genes and hence, our panel of biomarkers with further validations in a larger study, may have better utility in the clinical setting to indicate the risk of recurrencee in OTSCC.

Conclusion
OTSCC transcriptome sequencing upholds prior findings on the pivotal role of ECM degradation in OTSCC tumorigenesis and progression. Our cohort study addressed the most pressing issue of OTSCC management, namely, occult node metastasis prediction with 2 molecular markers and identified a panel of 6 molecular markers that can be used to differentiate the early stage tumors at higher risk of recurrence. Identification of novel biomarkers for early detection, prediction of response, and for use as treatment targets is of utmost importance to increase survival in OTSCC and the current study has fulfilled that. With further validation in a larger cohort, preferably, a multi-centric one, the panel of prognostic markers ascertained in our study can be used routinely to make important clinical decisions with respect to treatment modalities. Functional analyses. Functional interpretation of the DEGs was performed with Database for Annotation, Visualization and Integrated Discovery (DAVID, Version 6.8) 88 , a web-based tool for Gene Ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment analyses 89 . A Benjamini-corrected P value less than 0.05 was used to identify a statistically significant analysis. The hub proteins were identified using the Cytoscape v3.6.0. plugin called 'Network Analyzer' 90 . The hub proteins were evaluated by analyzing the highest closeness centrality (CC), betweenness centrality (BC), and the node degree. Genes with degree ≥ 10 were defined as hub genes in the present study. In addition, the Molecular Complex Detection (MCODE) plugin of Cytoscape software was also employed to identify functionally related and highly interconnected clusters from the PPI network with a degree cutoff of 2, node score cutoff of 0.3, k-core of 4, and maximum depth of 100 (http://bader lab.org/Softw are/MCODE ) 91 . Significant modules were identified with MCODE score ≥ 4 and nodes ≥ 6. Subsequently, based on modules selected from the PPI network, functional enrichment analysis was performed using DAVID with the criterion of P < 0.05. The protein-protein interactions PPI was visualised using Search Tool for the Retrieval of Interacting Genes (STRING) online database (v10.5) (www.strin g-db.org) for network construction 92 . qPCR. Total RNA was extracted from the tissue samples, homogenising them in liquid nitrogen, using RNeasy mini kit (Qiagen, Hilden, Germany) using standard manufacturer protocol. Total RNA (2 µg) was converted to cDNA using High capacity cDNA reverse transcription kit (Applied Biosystems, California, U.S.A.) as per instructions of the manufacturer and stored at − 40 °C.Primers used for the study are listed in Supplementary  Table S7. The quantitative real-time RT-PCR was performed using FastStart Universal SYBR Green Master (Rox) (Roche, Basel, Switzerland) in a 20 µl reaction mix according to the manufacturer's instructions on a 7500 Real Time PCR System (Applied Biosystems, California, U.S.A.). The thermal cycling conditions used were as mentioned before 23 . Triplicates were performed for each gene and average expression value was computed for subsequent analysis. The relative expression level of the genes was calculated using the 2 −∆∆Ct method. We defined the cut-off value for over-expression of genes based on the median log RQ values for each of the marker. Patients with pathological stages I and II disease were grouped as 'early stage' tumors and patients with pathological stages III and IV were grouped as 'advanced stage' tumors. We used t test to identify the markers whose expression varied most among the staging, node positive and outcome subgroups.

Methods
Immunohistochemistry. The IHC methodology is mentioned previously 23 . The IHC detection of MMP9, LAMC2, ECAD, was performed on five-micron sections of FFPE tissues. The sections were deparaffinised in xylene, rehydrated in absolute ethanol and endogenous peroxidase activity was blocked by incubation in 0.3% hydrogen peroxide in Phosphate-buffered Saline (PBS) for 30 min. The sections were then subjected to heatinduced epitope retrieval according to the conditions listed in Supplementary Table S8. Briefly, the sections were blocked in 2% Bovine Serum Albumin (BSA) for 30 min and then incubated with primary monoclonal antibody overnight at 4 °C. The marker expression was observed using the SuperSensitive Polymer-HRP IHC Detection System (BioGenex Laboratories, San Ramon, CA) as per the instructions of the manufacturer. Primary antibody was replaced with 2% BSA in negative control and suitable positive control specific to each of the markers was used. Immunostaining of the sections was reviewed along with corresponding Haematoxylin and Eosin stained sections.

Statistical analysis.
Overall Survival (OS) was calculated as time in months from the date of primary treatment to the date of death due to the disease. Disease free survival (DFS) was calculated as time in months from the date of primary treatment to the date of objective tumor recurrence (local, nodal, locoregional, and distant metastases). The samples were blinded with respect to clinical data during the molecular testing. All statistical analysis was done using SPSS, version 16 (IBM).