An integrated approach for identification of a panel of candidate genes arbitrated for invasion and metastasis in oral squamous cell carcinoma

Oral squamous cell carcinoma (OSCC) is known for its aggressiveness and associated poor prognosis. The molecular mechanisms underlying its invasion and metastasis are still poorly understood. An improved understanding of these mechanisms must precede the development of new diagnostic tools and targeted therapies. We report an integrated approach using bioinformatics to predict candidate genes, coupled with proteomics and immunohistochemistry to validate their presence and involvement in OSCC pathways leading to invasion and metastasis. Four genes, POSTN, TNC, CAV1 and FSCN1, were identified. A protein–protein interaction network analysis combined with pathway analysis led us to propose a role for the identified genes in invasion and metastasis in OSCC. Further analysis of archived FFPE blocks of various grades of oral cancer was carried out using TMT-based mass spectrometry and immunohistochemistry. The results indicated strong crosstalk and interrelationship between these candidate genes. This study emphasizes the significance of a molecular biomarker panel as a diagnostic tool and its correlation with the invasion and metastatic pathway of OSCC. An insight into the probable association of cancer-associated fibroblasts (CAFs) with these biomarkers in the evolution and malignant transformation of OSCC further broadens the molecular-biological spectrum of the OSCC tumour microenvironment.


Material and methods
In this study, we aimed to identify a panel of candidate genes involved in OSCC pathogenesis using a multi-integrated approach. The study protocol was approved by the Institutional Ethics Committee, IMS & SUM Hospital, Siksha 'O' Anusandhan University, Bhubaneswar, and was in accordance with the Declaration of Helsinki. Written informed consent was obtained from all participants, as required.
Proteomic analysis. Protein from five samples each of well, moderately and poorly differentiated OSCC was pooled for proteomic analysis. We employed pressure cycling technology (PCT), which has been demonstrated to be efficient for proteomic analysis of archived formalin-fixed paraffin-embedded (FFPE) biopsy samples [12]. The clinicopathological features of the samples used for the proteomic study are provided in Table 1.
Protein extraction and digestion using pressure cycling technology. Protein extraction from FFPE sections and protein digestion were performed using a Barocycler NEP2320 (Pressure BioSciences, Inc, South Easton, MA). Deparaffinised FFPE tissue sections were gently scraped off the glass slide and transferred to PCT microtubes. Around 150 µl of tissue lysis buffer (4% SDS, 100 mM DTT and 50 mM TEABC) was added and incubated at 95 °C for 10 min. Protein extraction was done at 95 °C with 60 cycles of alternating pressure, each consisting of 50 s at 40,000 psi and 10 s at 5000 psi. The protein lysate was clarified at 12,000 rpm for 20 min and the supernatant was transferred to a separate tube. An equal amount of protein from each sample of well, moderately and poorly differentiated OSCC was pooled before protein digestion. Protein was reduced using 10 mM dithiothreitol (DTT) at 60 °C for 30 min, followed by alkylation using 20 mM iodoacetamide in the dark for 10 min. PCT-based protein digestion was done using Lys-C and trypsin. Briefly, Lys-C was added to the protein lysate at a 1:100 enzyme-to-substrate ratio and the mixture was transferred to PCT microtubes. Protein digestion was carried out at 32 °C for 45 cycles with alternating pressure of 20,000 psi for 50 s and 5000 psi for 10 s. Following Lys-C digestion, trypsin was added at a 1:50 enzyme-to-substrate ratio and the digestion step was repeated using the Barocycler [13,14].
TMT labelling and mass spectrometry data acquisition. TMT labelling was carried out as per the manufacturer's protocol. Briefly, the lyophilized TMT labels were reconstituted in 41 µl of anhydrous acetonitrile and added to the peptide samples. The reaction mixture was incubated for one hour and the reaction was quenched with 5% hydroxylamine. The samples were pooled, dried, desalted using STAGE tips, and dried again. The dried samples were reconstituted in 0.1% formic acid and analysed on an Orbitrap Fusion Tribrid mass spectrometer (ThermoFisher Scientific, Bremen, Germany). The samples were first loaded onto a trap column (75 µm × 2 cm, nanoViper, 3 µm, 100 Å) by an Easy-nLC-1200 at a flow rate of 4 µl/min and then resolved on an analytical column (15 cm × 50 µm, nanoViper, 2 µm). The mass spectrometer was operated in positive mode and data-dependent acquisition was carried out with Synchronous Precursor Selection (SPS-MS3) for TMT reporter ions. The maximum injection time was set to 200 ms and the automatic gain control value was 500,000. Higher-energy collisional dissociation of the ten most intense precursor ions was achieved with a normalized collision energy of 33%.
LC-MS/MS data analysis. The mass spectrometry data were searched against the Human RefSeq 81 protein database using two search engines, SequestHT and Mascot, through the Proteome Discoverer software suite (version 2.1, Thermo Scientific, Bremen, Germany). Precursor mass tolerance was set to 10 ppm and fragment mass tolerance to 0.05 Da. Search parameters consisted of trypsin as the proteolytic enzyme with a maximum of two missed cleavages allowed. Fixed modifications included carbamidomethylation of cysteine and TMT at the N-termini of peptides and at lysine. Dynamic modifications included acetylation of protein N-termini and oxidation of methionine. Protein quantitation was carried out using the reporter ion quantifier node. The false discovery rate (FDR) was calculated using a target-decoy strategy and only those PSMs that cleared the FDR threshold of 1% were retained.
Immunohistochemical analysis. Patient-related data extracted from archival records and haematoxylin and eosin (H&E) sections were reassessed for the OSCC cases taken from the archival FFPE blocks. Fifty samples (40 OSCC and 10 normal tissue) were used in each study group: group 1, with the same clinical and demographic characteristics, was used for POSTN, TNC and CAV1, whereas group 2, with varying clinical and demographic characteristics, was used for FSCN1.
The OSCC samples for group 1 consisted of 9, 15, 9 and 7 samples from Stage 1, Stage 2, Stage 3 and Stage 4 respectively. Grade wise, this group included 18, 12 and 10 samples of Grade 1 (Well differentiated), Grade 2 (Moderately differentiated) and Grade 3 (Poorly differentiated) respectively. The clinicopathological features of the samples used for this group are provided in Table 2.
The OSCC samples for group 2 consisted of 5, 9 and 26 samples from Stage 2, Stage 3 and Stage 4 respectively. Grade wise, this group included 18, 20 and 2 samples of Grade 1 (Well differentiated), Grade 2 (Moderately differentiated) and Grade 3 (Poorly differentiated) respectively. The clinicopathological features of the samples used for this group are provided in Table 3.
For each OSCC case, an FFPE block with ample tumour tissue area was selected for analysis. Five-micron-thick unstained sections were cut from the FFPE blocks and mounted on charged glass slides. A standard immunohistochemistry (IHC) staining procedure was performed to check the expression of the selected genes. A polyclonal antibody against human periostin (OSF-2/periostin) was procured from BioVendor Laboratory Medicine, Modrice, Czech Republic. A polyclonal anti-CAV1 antibody and mouse monoclonal anti-TNC-c and anti-FSCN1 antibodies were obtained from Biogenex, Fremont, CA. Normal mucosal epithelium obtained during impacted third molar surgery was used as a positive control. The manufacturers' recommendations were followed for optimum antibody dilution. A three-step indirect process based on the streptavidin-biotin complex was followed, using peroxidase-conjugated streptavidin; the 3,3′-diaminobenzidine substrate chromogen formed a brown reaction product at the histological site of the target antigen. Haematoxylin was used as a counterstain. A semi-quantitative method was used to score the sections of OSCC cases, as described in earlier studies [15,16]. The individual sections were scored by two observers (oral histopathologists SR and NM) to avoid bias. Sections were scored considering both the intensity and the percentage of staining in the cells. For each antibody, five-tier scoring was done on a scale of 0 to 4, where 0 was no staining, 1 was < 25% (mild staining), 2 was 25-50% (medium staining), 3 was 50-75% (moderate staining) and 4 was ≥ 75% (strong staining) [17]. Further scoring was based on staining colour (reaction product) intensity as no staining, mild, medium, moderate and strong, with scores of 0, 1, 2, 3 and 4 respectively. The corresponding score for each sample in each slide was calculated separately for the four genes.
Statistical analysis of the IHC data was performed using GraphPad Prism 5 software. Categorical data were compared using the χ² test. Fisher's exact test was used to detect the association of a gene with OSCC. A nonparametric unpaired Mann-Whitney test was used to compare two independent groups, e.g. healthy tissue versus a cancer stage or grade [18]. A p-value < 0.05 was considered significant for each statistical test [10].
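The five-tier scoring described above can be written down as a small lookup, which also makes the tier boundaries explicit. The following Python sketch is illustrative only and is not the authors' scoring procedure: the function names are hypothetical, and the handling of values that fall exactly on 25% or 50% (assigned here to the higher tier) is an assumption, since the published ranges overlap at those points.

```python
# Illustrative sketch of the five-tier IHC scoring (not the authors' code).
# ASSUMPTION: a value exactly on a boundary (25, 50) goes to the higher tier;
# the text only fixes 0 -> score 0 and >= 75% -> score 4.

def percentage_score(percent_stained: float) -> int:
    """Map the percentage of stained cells (0-100) to a score of 0-4."""
    if not 0 <= percent_stained <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_stained == 0:
        return 0  # no staining
    if percent_stained < 25:
        return 1  # mild
    if percent_stained < 50:
        return 2  # medium
    if percent_stained < 75:
        return 3  # moderate
    return 4      # strong (>= 75%)


# Intensity of the reaction product, scored on the same 0-4 scale.
INTENSITY_SCORES = {"none": 0, "mild": 1, "medium": 2, "moderate": 3, "strong": 4}


def intensity_score(label: str) -> int:
    """Map an intensity label to its score of 0-4."""
    return INTENSITY_SCORES[label.strip().lower()]


if __name__ == "__main__":
    # Hypothetical section: 60% of cells stained with moderate intensity.
    print(percentage_score(60), intensity_score("moderate"))
```

The two scores are kept separate here because the text does not state how, or whether, they were combined into a single value.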

Results
Bioinformatics. Prediction of OSCC candidate genes. Genes common to HCMDB (1938 metastatic genes), OrCaDB (374 genes) and the HNdb database (1370 genes) were identified using jVenn (Fig. 1A), with the aim of identifying known metastatic genes that are also reported to have an established role in oral cancer aetiology. A network was generated from the identified common genes using STRINGdb with medium confidence level (0.7) for interactions. A cluster analysis was performed for metastatic genes around the transforming growth factor-β (TGF-β) and epidermal growth factor receptor (EGFR) nodes, using the K-means clustering method in STRINGdb. The strategy yielded a total of 54 genes within three clusters (referred to as clusters A, B and C) (Fig. 1B). Cluster A had 13 members, cluster B was the largest with 36 genes, and cluster C was very small with only 5 genes. The interaction via the FN1, SRC and STAT3 nodes predicted four closely associated genes: POSTN, TNC, CAV1 and FSCN1. A detailed literature survey carried out among the genes from the identified clusters revealed an interesting link between these genes in particular. They were found to interact directly with TGF-β and had literature linking them to CAFs and the OSCC TME. None of the databases reported these genes together in relation to the progression of OSCC. When initially reviewed for this study, it was interesting to see that each of these genes had an individual role to play in OSCC pathogenesis. When linked together through PPI analysis, they showed evidence of a connected pathway for OSCC progression and thus can act as a panel of potential predictors (PPI enrichment p-value 0.027) (Fig. 1C). This provided a basis for a hypothesis on the identification and assessment of candidate biomarkers in invasion and metastasis of OSCC.
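The first step above, finding genes shared by the three databases, is a plain set intersection. The sketch below shows the idea in Python under stated assumptions: the study used jVenn for this step, the function name is hypothetical, and the three toy lists are invented placeholders rather than the actual contents of HCMDB, OrCaDB or HNdb.

```python
# Illustrative sketch of the Venn-style overlap (the study itself used jVenn).
# ASSUMPTION: the gene lists below are toy placeholders, not database exports.

def common_genes(*gene_lists):
    """Return the gene symbols present in every list (case-insensitive)."""
    normalised = [{symbol.strip().upper() for symbol in genes} for genes in gene_lists]
    if not normalised:
        return set()
    return set.intersection(*normalised)


if __name__ == "__main__":
    hcmdb = ["POSTN", "TNC", "CAV1", "FSCN1", "GENE_A", "GENE_B"]   # hypothetical
    orcadb = ["POSTN", "TNC", "CAV1", "FSCN1", "GENE_B", "GENE_C"]  # hypothetical
    hndb = ["postn", "TNC", "CAV1", "FSCN1", "GENE_A", "GENE_C"]    # hypothetical

    shared = common_genes(hcmdb, orcadb, hndb)
    print(sorted(shared))  # ['CAV1', 'FSCN1', 'POSTN', 'TNC']
```

Normalising case and whitespace before intersecting is a precaution against formatting differences between database exports; whether the original lists needed it is not stated.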
Evidence in literature/databases for predicted candidate genes. The predicted genes were checked for differential mRNA expression in oral cancer samples (OSCC/TSCC) in the Oncomine database (https://www.oncomine.org) and in the Expression Atlas (https://www.ebi.ac.uk/gxa/). The analysis showed that POSTN, TNC and FSCN1 are highly upregulated in OSCC, with fold changes of 5.31, 6.55 and 5.05 respectively, whereas CAV1 showed a mild upregulation (fold change = 2.97) in the Oncomine database (Tables 4, 5). The analysis of mRNA sequencing data from the Cancer RNA-Seq Nexus (CRN) database revealed that these genes showed a fold change higher than 2 in all stages, except for CAV1 in Stage III and Stage IV A (Table 6). A comparative heat map generated using Oncomine and CRN data showed that TNC, POSTN and FSCN1 are significantly upregulated, whereas CAV1 is moderately upregulated. TNC and CAV1 had the highest and lowest upregulation respectively in the Oncomine data. The CRN does not have data for TNC. The POSTN and FSCN1 data showed increasing expression levels as the clinical stage progressed, whereas CAV1 showed constant upregulation through the clinical stages (Fig. 1D).
Proteomics. Proteomic profiling using PCT and TMT-based mass spectrometry analysis was performed.
Eleven of the 15 participants in the cohort had a history of chewing tobacco, whereas only one patient had a history of both smoking and chewing tobacco along with alcohol use. The majority of cases involved the gingivobuccal complex (GBC) site and were Stage IV on clinical staging. The cohort comprised five cases from each grade of histopathological differentiation. In our spectrum, the base peak of the four candidate genes showed the greatest relative abundance (100%), along with other peaks corresponding to ion fragments. The peptide sequences identified were IIDGVPVEITEK (POSTN), SQTVSAIATTAMGSPK (TNC), YLAPSGPSGTLKS (CAV1) and YLAPSGPSGTLKS (FSCN1) (Fig. 2). Analysis of the fold-change pattern revealed increased expression of POSTN, TNC, CAV1 and FSCN1 in moderately differentiated OSCC, indicative of their role in invasion and metastasis (Table 7). In the graphical representation of the differential expression of the genes, TNC showed the highest fold-change difference when moderately differentiated OSCC samples were compared with poorly differentiated cases. POSTN also showed an appreciable fold-change difference when different grades were compared (Fig. 3). When p-values were calculated, TNC had a highly significant value in well versus poorly differentiated cases (p < 0.0001). Further, POSTN also showed a significant difference when well-differentiated cases were compared with poorly differentiated cases (Table 8).
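As an illustration of how a fold change between grades is derived from TMT reporter-ion intensities, the following Python sketch divides each channel by a reference channel and reports the log2 ratio. It is a minimal example under stated assumptions, not the Proteome Discoverer workflow used in the study: the function names, the choice of the well-differentiated pool as reference, and the intensity values are all hypothetical.

```python
# Illustrative sketch of fold-change calculation from reporter-ion intensities.
# ASSUMPTION: intensities and the choice of reference channel are invented.
import math


def fold_changes(intensities, reference):
    """Divide every channel intensity by the reference channel intensity."""
    ref_value = intensities[reference]
    if ref_value <= 0:
        raise ValueError("reference intensity must be positive")
    return {channel: value / ref_value for channel, value in intensities.items()}


def log2_fold_changes(intensities, reference):
    """Return log2 ratios, so that up- and downregulation are symmetric around 0."""
    ratios = fold_changes(intensities, reference)
    if any(ratio <= 0 for ratio in ratios.values()):
        raise ValueError("all intensities must be positive to take a logarithm")
    return {channel: math.log2(ratio) for channel, ratio in ratios.items()}


if __name__ == "__main__":
    # Hypothetical reporter intensities for one protein in the three pooled grades.
    protein = {"well": 1000.0, "moderate": 2500.0, "poor": 1500.0}
    print(fold_changes(protein, reference="well"))
    print(log2_fold_changes(protein, reference="well"))
```

With these invented numbers the moderately differentiated pool has a fold change of 2.5 relative to the well-differentiated pool; the values carry no relation to Table 7.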
Immunohistochemistry. The expression of POSTN, TNC, CAV1 and FSCN1 was localized predominantly in the cytoplasm of the tumour cells. For TNC and CAV1, however, immunoreactivity was observed in both the membrane and the cytoplasm in some sections. The grade wise staining pattern is described in Fig. 4A (Fig. 4C) whereas the results were insignificant when compared between all the stages. In the case of FSCN1, a significant difference was found between normal tissue and stage 2, stage 3 and stage 4. In both grade-wise and stage-wise comparisons, all four genes showed significant differences in expression between normal and OSCC samples.
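For readers unfamiliar with the association test named in the methods, a two-sided Fisher's exact test on a 2 × 2 table can be computed directly from the hypergeometric distribution. The Python sketch below is illustrative only: the study's statistics were run in GraphPad Prism 5, the function name is hypothetical, and the counts in the usage example are invented rather than taken from the study data.

```python
# Illustrative two-sided Fisher's exact test for a 2x2 table (standard library only).
# ASSUMPTION: the example counts are invented; they are not the study's data.
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """P-value for the table [[a, b], [c, d]] with all margins held fixed.

    Sums the hypergeometric probabilities of every table that is no more
    likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = table_probability(x)
        if p <= observed * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(p_value, 1.0)


if __name__ == "__main__":
    # Hypothetical counts: rows are OSCC and normal, columns are positive and negative staining.
    p = fisher_exact_two_sided(30, 10, 2, 8)
    print(f"p = {p:.4g}; significant at 0.05: {p < 0.05}")
```

The tolerance on the comparison guards against tables with equal probability being excluded by rounding error, a common convention in implementations of this test.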
Hypothesis. TGF-β1 signalling is known to induce myofibroblastic differentiation depending on the expression of ECM proteins [19,20]. Under a hypoxic environment, CAFs and TGF-β reciprocally induce invasion [21]. TGF-β3, in particular, has been found to trigger the induction of POSTN in CAFs and the production of stromal POSTN [22]; in turn, CAFs signalled by POSTN secrete ECM proteins [23]. Further, POSTN acts as a ligand for integrins αvβ3 and αvβ5, promoting activation of multiple pathways such as the PI3-kinase-Akt pathway and leading to increased tumour cell invasion [24,25]. While regulating EMT, POSTN plays a role in cancer stemness by interacting with protein tyrosine kinase 7 (PTK7) and propagates the cancer stem cell (CSC)-like phenotype via PTK7-Wnt/β-catenin signalling [26]. CAFs are able enhancers of tumour invasion through their secretion of TNC-c, TNC-w, HGF and MMPs along with TGF-β [27]. POSTN was also found to incorporate TNC into the TME and promote stiffening of the ECM in the cancer niche, thereby releasing the active form of TGF-β1, which regulates cancer cell proliferation via the ERK signalling pathway [28]. Further, downregulation of CAV1 in the TME promotes conversion of adjacent normal fibroblasts to a CAF phenotype and stimulates Rho- and force-dependent contraction and matrix alignment [21]. This TME stiffening through regulation of p190RhoGAP favours directional migration and invasiveness of carcinoma cells [10]. This illuminates a link between CAV1 and FSCN1 in facilitating cell migration, invasion and metastasis via the Src/FAK pathway and filopodia formation [29]. CAV1 delivered by cancer cell-derived exosomes and its accumulation in the OSCC TME promoted EMT and trans-differentiation of fibroblasts into CAFs [23].
Taking the above literature survey into account, there is an evident link between the pathways of these genes related to invasion, progression and metastasis, yet they have never been studied as a panel of candidate biomarkers. We propose a mechanism/pathway (Fig. 5) to describe the involvement of the four genes POSTN, TNC, CAV1 and FSCN1 in the OSCC TME via the role of CAFs and TGF-β, as a panel of candidate biomarkers for OSCC.

Discussion
India, a country with a population of about 1.3 billion, stands out on the global map for the prevalence of lip and oral cavity cancer among men. According to the World Health Organization (WHO), in 2015, cancer was the first or second leading cause of death before age 70 years in as many as 91 countries. The GLOBOCAN 2018 database for 185 countries and 36 cancers states that lip and oral cavity cancers cluster in certain high-risk regions such as India and are more frequent in males [30]. According to the Global Cancer Observatory (https://gco.iarc.fr/tomorrow, accessed 20-01-2021), incidence and mortality in the world population for both sexes are predicted to be 553 K and 263 K (47.55%) respectively by 2040, while incidence and mortality in males alone are predicted to be 382 K and 182 K (46.64%) respectively. According to these data, males are at higher risk of incidence (70.34%) and mortality (69.2%) than females (incidence 30.92%, mortality 30.87%). It has already been reported that, although smokeless tobacco (ST) use is prevalent in 127 countries worldwide, the highest rates of consumption and risk estimates for oral cancer are in South and Southeast Asia [31]. The Global Adult Tobacco Survey India 2016-2017 (GATS 2, https://www.who.int/tobacco/surveillance/survey/gats/GATS_India_2016-17_FactSheet.pdf?ua=1) attributed ST, alcohol use and smoking as leading risk factors for lip and oral cavity cancer in India.
Over the years, the need to find a tumour biomarker for prognosis and a better treatment protocol for OSCC patients has been persistent. A literature survey related to OSCC yielded a few relevant studies that have attempted to find a candidate biomarker for OSCC, but the hunt is still on [7,32]. The hypothesis proposed in our study establishes an interface between pathways related to invasion and metastasis in the OSCC TME via the role of CAFs and TGF-β and the four genes POSTN and TNC (both extracellular matrix proteins), CAV1 (a scaffolding protein) and FSCN1 (organization of actin filament bundles). An integrated bioinformatics analysis approach similar to that of this study has already been reported to predict candidate genes related to OSCC [10]. It is pertinent to note that the lack of targeted therapies is an important reason for the low survival rate in HNSCC and in OSCC in particular. Our study attempts to provide a comprehensive perspective on the underlying mechanisms in OSCC and the pivotal pathways that may be exploited to develop targeted therapeutics. Based on our hypothesis, we correlated the expression of and interface between these genes with evidence collected from the literature review. Kikuchi et al. first reported an association between CAFs and POSTN in 2008 using in situ hybridisation (ISH) [33]. That POSTN secreted by CAFs promotes activation of the PI3-kinase-Akt pathway was established in 2015 [24]. CAF-derived POSTN plays a role in cancer stemness by interacting with protein tyrosine kinase 7 (PTK7) in HNSCC [26]. TNC (Tn-C) in OSCC also occurs as different isoforms generated by alternative splicing and de novo glycosylation [34]. Regarding the association of CAFs and CAV1, a study in 2010 reported that downregulation of CAV1 in the TME promoted conversion of adjacent normal fibroblasts to a CAF phenotype [5].
Further, CAV1 stimulates Rho- and force-dependent contraction, matrix alignment, and TME stiffening through regulation of p190RhoGAP, favouring directional migration and invasiveness of carcinoma cells in vitro. Examining the link between CAV1 and FSCN1 revealed facilitation of cell migration, invasion and metastasis [29]. Another study found that TGF-β1, epidermal growth factor (EGF) and interleukin 1β (IL-1β) significantly stimulated FSCN1 expression and suggested that RhoA and nuclear factor kappa B (NF-κB) signals were involved. After considering all this evidence, we initiated proteomic profiling of the four genes POSTN, TNC, CAV1 and FSCN1, which have been shown to be part of the modulation of the TME via CAFs and TGF-β pathways. A SELDI-TOF protein chip system used to screen proteins in saliva from pre- and post-treatment OSCC samples also displayed an altered pattern of proteins [35]. Using proteomics data and bioinformatics results, the prioritization index of biomarker candidates for IHC on tissue revealed POSTN as a top candidate [36]. Various studies found upregulation of TNC and validated it with IHC [37-39]. Using proteomics and bioinformatics techniques, CAV1 has been identified as a major network-centric protein between gastric cancer-associated fibroblasts (GCAFs) and their corresponding inflammation-associated fibroblasts (GIAFs), which was later validated using IHC [40]. Very few studies were found that assessed the role of FSCN1 in malignancy using proteomic profiling and validation by immunohistochemistry, further revealing its role as a specific target [41]. In a literature search for proteomic profiling of POSTN, TNC, CAV1 and FSCN1 in OSCC or oral cancer, we could not find any related studies reported to date.
We followed the protocol suggested in a previous study that used FFPE tissue samples, PCT and TMT-based mass spectrometry analysis, and bioinformatics results for the isolation of candidate biomarkers [36]. Although our data were consistent with several other types of cancer, we could not find any comparable data in OSCC for our panel of candidate biomarkers. We therefore report what are, to our knowledge, among the first proteomic profiling data for our candidate biomarkers POSTN, TNC, CAV1 and FSCN1.
Multiple IHC studies report that POSTN is upregulated at the invasive front in both tumour epithelia and the surrounding matricellular space [42-44]. In OSCC/HNSCC, the expression of POSTN in the epithelium is associated with a more aggressive tumour phenotype, as determined by mRNA and IHC expression [45]. Upregulation of POSTN gene expression and its established role in tumour lymphangiogenesis also suggest that it can be used as a potential biomarker for OSCC [46,47]. In a study in 2006, POSTN expression was well correlated with the invasion pattern and metastasis. TNC upregulation at the invasive tumour front has already been associated with poor clinical outcome, suggesting its role in metastatic progression [48]. Across various cancers, CAV1 was found to be downregulated in some and to show increased expression in a few others, indicative of a biphasic nature of CAV1 [49]. A meta-analysis of 26 IHC studies of five prevalent human carcinomas found that FSCN1 was associated with increased risk of mortality for breast and oesophageal carcinomas. A number of studies have demonstrated that FSCN1 might play a role in malignancy and have suggested a significant correlation between the expression of FSCN1 and local lymph node metastasis [15]. The role of FSCN1 as a prognostic biomarker for OSCC cases was first concluded in 2007 [50]. To better explain the role of this protein, several studies have investigated its function, including a study on two OSCC cell lines which concluded that FSCN1 expression might have an essential role in the regulation and development of OSCC, acting through epithelial-mesenchymal transition (EMT) and changes in E-cadherin and β-catenin expression [51].
Actin components such as microspikes were found to be thicker and longer and showed the formation of more filopodia and lamellipodia, depicting the role of FSCN1 in disrupting cell-cell contact, which was instrumental in the progression of primary OSCC tumours [52]. FSCN1 over-expression in lymph nodes was significantly associated with clinico-histopathological parameters [15]. Considering IHC a robust validation methodology, we used it to validate the results of our bioinformatics analysis and proteomic profiling of our panel of candidate genes. In our study, the expression of POSTN, TNC, CAV1 and FSCN1 was found to be localized predominantly in the cytoplasm of the tumour cells. It is worth reiterating that these genes showed significant expression differences between normal and OSCC samples (all stages and grades). The upregulation from normal tissue to the different grades and stages therefore indicates expression of these proteins throughout invasion and progression, in due course preceding metastasis.
For IHC validation, we included samples of all grades and stages in group 1, maintaining heterogeneity. In group 2, no Stage 1 samples and only 2 Grade III samples were included, because in this group lymph node assessment and survival analysis were included for the few patients with follow-up data. We consider this a limitation of our study.
Regarding the role of CAV1 in OSCC, its overexpression in the cytoplasm of OSCC cells was in step with tumour progression. Increased CAV1 expression is seen in the stepwise carcinogenesis from normal tissue to primary OSCC. In contrast, a decrease in expression is seen between the grades and stages, or from primary OSCC to metastatic OSCC, pointing to its biphasic function [49]. This may be explained by the theory of reverse Warburg metabolism in OSCC, in which reduced CAV1 can lead to an increase in oxidative stress, promoting metastasis [53]. The data in our study were consistent with this reasoning, as CAV1 showed upregulated expression in OSCC of all grades and stages compared with normal tissue.

Conclusion
The current study used an integrated approach utilising bioinformatics, proteomics and immunohistochemistry for the identification of candidate biomarkers and their validation. Four potential candidate genes were identified and a hypothesis was generated for their plausible role in invasion and metastasis. The experimental validation of the predicted genes through proteomic profiling using PCT and TMT-based mass spectrometry and through immunohistochemical analysis indicated the robustness of the current approach. We anticipate that the identified genes will aid our understanding of the molecular mechanisms underlying invasion and metastasis and be of assistance in identifying novel targeted therapeutics (Supplementary Information).