Epithelial-mesenchymal transition gene signature is associated with prognosis and tumor microenvironment in head and neck squamous cell carcinoma

In this study we assessed the clinical significance of an epithelial-mesenchymal transition (EMT) gene signature and explored its association with the tumor microenvironment related to immunotherapy in patients with head and neck squamous cell carcinoma (HNSCC). Genes were selected when mRNA levels were positively or negatively correlated with at least one well-known EMT marker. We developed an EMT gene signature consisting of 82 genes. The patients were classified into epithelial or mesenchymal subgroups according to EMT signature. The clinical significance of the EMT signature was validated in three independent cohorts and its association with several immunotherapy-related signatures was investigated. The mesenchymal subgroup showed worse prognosis than the epithelial subgroup, and significantly elevated PD-1, PD-L1, and CTLA-4 levels, and increased interferon-gamma, cytolytic, T cell infiltration, overall immune infiltration, and immune signature scores. The relationship between PD-L1 expression and EMT status in HNSCC after treatment with TGF-β was validated in vitro. In conclusion, the EMT gene signature was associated with prognosis in HNSCC. Additionally, our results suggest that EMT is related to immune activity of the tumor microenvironment with elevated immune checkpoint molecules.

EMT gene signature was associated with the prognosis of patients with HNSCC. Three independent cohorts (Leipzig, FHCRC, and MDACC cohorts) were used to validate the robustness of the EMT gene signature. Validation was performed as outlined in the flow chart in Fig. 2A.
Consistent with the results from the training cohort, patients were classified patients into Epi and Mes subgroups based on the EMT signature. When patients in the Leipzig, FHCRC, and MDACC cohorts were stratified according to EMT gene signatures, significant prognostic differences between subgroups were identified in all independent validation data sets.  cohort were clustered by two-way hierarchical clustering using the Pearson correlation distance between genes (rows), Euclidean distance between EMT signature genes (columns), and Ward's linkage rule. The TCGA cohort was divided into two distinct subgroups of HNSCC: the Mes group (red) and Epi group (blue). The 82 EMT signature genes were separated into distinct epithelial-related genes (green bar) and mesenchymal-related genes (red bar). (B) Kaplan-Meier plots of DFS of patients with HNSCC in the TCGA cohort.

HPV-positive and HPV-negative patients show distinct EMT gene signatures. The patients
included in the TCGA and Leipzig cohorts were divided into HPV-positive and HPV-negative subtypes according to their HPV status. The survival analysis did not reveal statistically significant differences in 5-year OS between the Epi and Mes subgroups in HPV-positive patients (Log rank test p = 0.342; Fig. 3A). However, in HPV-negative patients, Mes subgroups had significantly worse 5-year OS than the Epi subgroup (p = 6.8e-06; Fig. 3B). These results suggest good predictability of the gene signature for HPV-negative patients.
Furthermore, CTLA-4 (CTLA4) expression was higher in the Mes subgroup than that in the Epi subgroup ( Fig. 4B; p = 7.8e-8 and p = 8.2e-7, respectively). Notably, the level of immune checkpoint molecules, including CTLA-4 was significantly elevated in the TCGA and Leipzig cohorts. INFG, CYT, TIS, IIS, and IS scores were elevated in the Mes subgroup. To assess the association of EMT signature with immune activity in the tumor microenvironment we analyzed the gene expression of several immunotherapy-related scores. Interferon gamma (INFG) and cytolytic activity (CYT) scores were significantly elevated in the Mes subgroup in the TCGA and Leipzig cohorts (p = 0.021, p = 0.0019, p = 0.0079, and p = 0.0091, respectively, Fig. 4C). The T cell infiltration score (TIS) and overall immune infiltration score (IIS) were also significantly elevated in the Mes subgroup in the TCGA cohort (p = 1.57e-12 and p = 2.2e-16, respectively). Furthermore, the immune signature (IS) score was higher in the Mes subgroup in the both cohorts (p = 0.046 and p = 3.1e-6, respectively; Fig. 4E).
tGf-β-induced EMT enhances PD-L1 expression in HNSCC cell lines. The role of TGF-β as the primary inducer of EMT has been reported in several cancers [11][12][13] . Therefore, to investigate whether EMT status is associated with PD-L1 expression, we induced EMT status in YD-10B and HSC-4 HNSCC cells by treatment with TGF-β1 (0.1-1 ng/mL). To determine EMT status in YD-10B and HSC-4 HNSCC cells, expression levels of the epithelial cell marker (E-cadherin) and mesenchymal cell marker (vimentin) were analyzed using real-time PCR and western blotting. mRNA and protein levels of E-cadherin were decreased following treatment with TGF-β1 in a dose-dependent manner in YD-10B and HSC-4 cells, whereas mRNA and protein levels of vimentin were increased. Interesting, mRNA and protein levels of PD-L1 were increased (Fig. 5A). To determine whether PD-L1 expression was controlled by TGF-β-induced EMT, mRNA and protein levels of PD-L1 were analyzed in head and neck cancer cells under EMT and MET status using a reversion assay. mRNA and protein levels of PD-L1 were significantly altered under treatment with TGF-β1, as compared with control, and reverted to control levels after switching to culture medium without TGF-β1 (Fig. 5B). Furthermore, the TGF-β1-induced increase of PD-L1 expression was abolished by treatment with a TGF-β1 receptor kinase inhibitor (SB 431542) (Fig. 5C). These results suggest that PD-L1 expression is regulated by TGF-β-induced EMT status through the TGF-β signaling pathway.  www.nature.com/scientificreports www.nature.com/scientificreports/ Association between EMT signature and clinicopathologic characteristics of HNSCC. We sought to identify the association between EMT signature and clinical or pathological characteristics of HNSCC, and hence we performed subset analyses in the TCGA and Leipzig cohorts (Supplementary Table S2). In particular, the association between EMT subgroups and tumor sites, T stage, regional lymph node metastasis, TNM stage, HPV status, smoking status, and alcohol consumption was evaluated. We did not find any significantly different clinical or pathological characteristics of HNSCC with which to identify the Mes and Epi subgroups.

Relationship between EMT signature and somatic mutation.
To identify somatic mutations that coincide with the activation of EMT signature in HNSCC, the somatic mutation data in the TCGA cohort (n = 493) were analyzed. Based on previous studies and results from the TCGA analysis, we selected six genes that are most likely to be associated with HNSCC [14][15][16][17][18] .
The somatic mutation frequencies of the six genes were compared between the Mes and Epi subgroups. The somatic mutation frequencies of EGFR and TP53 were significantly different between the two groups (the Mes subgroup presented more frequent mutations) (Supplementary Table S3). There was no difference in mutation rates between the Mes and Epi subgroups for the other four genes.

Discussion
In the present study, a robust EMT gene signature, clinically significant to patients with HNSCC, was developed and tested. The results revealed that Mes subgroup had poor OS in the training cohort. In the test cohorts, Mes subgroup showed worse OS and DFS. EMT signature showed independent prognostic effects in Cox proportional hazards analysis. Therefore, EMT gene signature may serve as a prognostic biomarker in HNSCC.
HPV is known to be involved in tumor formation through specific mechanisms, but it is unclear how this involvement affects EMT activation 19 . Jung et al. reported that HPV16 positivity correlates with the EMT transcription factor ZEB1 and that both E6 and E7 increase Slug, Twist, ZEB1, and ZEB2 expression 20 . Other studies have demonstrated that E6 or E7 inhibited E-cadherin at the cell surface 21 . In the present study, there was a significant difference in the OS of HPV-negative patients between the Mes and Epi subgroups, but not in the OS of HPV-positive patients. This suggests that EMT signature is another factor for predicting the prognosis of patients with HNSCC with respect to HPV status, which is line with a previous report showing that HPV positivity induces the expression of EMT-related genes 20,21 .
A number of authors reported that immune checkpoint inhibitors are effective in patients with elevated immune checkpoints, but biomarkers representing immune responses and immune checkpoint status are yet to be identified [22][23][24] . EMT is a biologic process related to decreased cell adhesion and increased invasiveness, and is known as a key program for metastasis and drug resistance in many cancers 2,3,25,26 . Studies have reported that EMT transcription factors (Snail or CCL2) can induce EMT process and are related to immunosuppressive cytokines activation and T-lymphocyte resistance in several cancers [27][28][29][30] . It has been reported that EMT can induce PD-L1 expression in non-small cell lung cancer 31 . However, the relationship between EMT and PD-L1 expression in HNSCC is not yet well known. We found that the levels of immune checkpoint genes, such as PD-1, PD-L1, and CTLA-4 were significantly elevated in HNSCC patients with "mesenchymal" phenotype. We also investigated whether EMT signature is closely associated with immune checkpoint genes in HNSCC cell lines. To study this, we induced EMT in HNSCC cell lines by treatment with TGF-β, and analyzed PD-L1 www.nature.com/scientificreports www.nature.com/scientificreports/ expression compared to cells under EMT and MET status. In this result, PD-L1 expression was enhanced by TGF-β-induced EMT, thereby indicating the relationship between EMT and immune checkpoint genes. Next, we sought to investigate whether EMT signature is associated with tumor microenvironment change using known immunotherapy-related gene signature. In the KEYNOTE-001 study, six gene signatures of INFG-related genes (IDO1, HLA-DRA, STAT1, CXCL9, IFNG and CXCL10) were previously developed in melanoma patients 32 . These six genes were significantly associated with the response to anti-PD-1 inhibitor in patients with melanoma. CYT was assessed by granzyme A (GZMA) and perforin (PRF1), and was dramatically associated with CD8 + T cell activation. Therefore, CYT could be used to predict clinical response to treatment with immune checkpoint inhibitors 33 . mRNA-based TIS was used for characterizing the computational immune cell decomposition for solid tumors. T cell infiltration in the tumor microenvironment is a key process in the balance of tumor and anti-tumor immune responses 33,34 . The IS score is known to be a transcriptional predictor of anti-CTLA-4 immune checkpoint inhibitor and was assessed in the genomic data from ~10,000 human tissues across 30 different cancer types to estimate the potential response to immunotherapy 35 . INFG, CYT, TIS, IIS, and IS scores were elevated in patients with "mesenchymal" phenotype as compared with those in patients with "epithelial" phenotype. Our findings demonstrated that EMT is associated with immune activation of the tumor microenvironment and elevation of multiple targetable immune checkpoint molecules, and may be used as a prognostic indicator in patients with HNSCC. EMT could induce cancer growth and metastasis by reprogramming immune activity in the tumor microenvironment. This suggests that patients in the Mes subgroup might have better response to immune checkpoint inhibitors, and targeting immune checkpoints may affect cancer metastasis and drug resistance associated with EMT. Therefore, our data suggest that EMT signature-based biomarkers may be valuable for identifying patients who can benefit from immune checkpoint blockade agents.
In conclusion, our results showed that the EMT signature was associated with the prognosis of patients with HNSCC, and in particular, patients in the Mes subgroup showed poor prognosis. In addition, there was an association between EMT and immune activity of the tumor microenvironment with elevated immune checkpoint molecules.

Methods
Patients and cohorts. For this study, clinical and gene expression data were collected from public databases representing four independent cohorts. Gene expression data of The Cancer Genome Atlas (TCGA cohort, n = 513) were downloaded from the UCSC Xena Browser (https://xena.ucsc.edu/) 18 . The data from the Institute for Medical Informatics, Statistics and Epidemiology (Leipzig cohort, GSE65858, n = 270) 36 , Fred Hutchinson Cancer Research Center (FHCRC cohort, GSE41613, n = 97) 37 , and MD Anderson Cancer Center (MDACC cohort, GSE42743, n = 74) 37 were obtained from the National Center for Biotechnology Information Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo) and used as the test sets. Gene expression data of TCGA and Leipzig were generated by Illumina HiSeq. 2000 and Illumina HumanHT-12 v4.0 Expression Beadchip, respectively. FHCRC and MDACC cohort data were generated by Affymetrix Human Genome U133 Plus 2.0 Array. Table 2 provides the details of the pathological and clinical characteristics of the patients in all cohorts.

Development of EMT gene signature in HNSCC.
The BRB-ArrayTools software program (http://brb. nci.nih.gov/BRB-ArrayTools/) was used for analysis of gene expression data 38 . The raw data were preprocessed using a robust multiarray averaging method for normalization 39 .
To find EMT-specific genes in HNSCC, gene expression data were analyzed from the TCGA cohort. Genes were selected when the mRNA expression levels were either positively or negatively correlated with at least one of the well-known EMT markers: E-cadherin (CDH1), vimentin (VIM), N-cadherin (CDH2), and/or fibronectin 1 (FN1). These markers were selected on the basis of their previously established roles as markers of EMT in lung cancer as well as other epithelial tumors 40 .
Using a gene feature and its correlated genes, hierarchical clustering analysis was performed with the centered correlation coefficient as the measure of similarity and a complete linkage clustering method. According to the patient clustering result, the patients were divided into two subgroups (mesenchymal and epithelial). Cluster analysis was performed with Cluster 3.0 41 .

Construction of prediction models and validation in test cohorts.
To test the ability of the gene expression signatures to predict the class of patients in an independent cohort, a previously developed model based on the Bayesian compound covariate predictor (BCCP) was adopted 42 . Gene expression data in the training set (TCGA cohort) were combined to form a series of classifiers according to the BCCP algorithm and the robustness of the classifier was estimated according to the misclassification rate determined during leave-one-out cross-validation of the training set. Validation was conducted in three independent patient groups (Leipzig, FHCRC, and MDACC cohorts).
Immunotherapy-related gene score. To determine the association of EMT signature with immunotherapy, we used a previously described and validated immunotherapy-related gene signature. Interferon gamma (INFG) score was calculated as the average of the normalized values of the INFG-related six genes (CXCL9, CXCL10, IDO1, IFNG, HLA-DRA, and STAT1) 32 . The cytolytic activity (CYT) score was calculated as the average of the expression level of two key cytolytic effectors (GZMA and PRF1) 43 . The T cell infiltration score (TIS) was defined as the mean of the standardized values for all T cell subsets, except for T gamma delta and T follicular helper cells: CD8 T, T helper, T, T central and effector memory, Th1, Th2, Th17, and Treg cells 33 . The overall immune infiltration score (IIS) was defined as the mean of the standardized values for macrophages, dendritic cell subsets (total, plasmacytoid, immature, and activated), B cells, cytotoxic cells, eosinophils, mast cells, neutrophils, Scientific RepoRtS | (2020) 10:3652 | https://doi.org/10.1038/s41598-020-60707-x www.nature.com/scientificreports www.nature.com/scientificreports/ natural killer (NK) cell subsets [total, CD56(bright), and CD56(dim)], and all T cell subsets, excluding T gamma delta and T follicular helper cells 33 . The immune signature (IS) score was obtained using 105 immune signature genes 35 . The IS score of TCGA HNSCC cohort was obtained from the data of a previous study 35 . Pathway analysis. The list of EMT signature genes was submitted to the Database for Annotation, Visualization, and Integrated Discovery (DAVID) bioinformatics resources 6.7 to discover the gene ontology categories with significantly enriched gene numbers 44 . The default setting from the software was used to map the EMT signature genes to the reference set of direct and indirect relationships. Next, relevant inputs to the gene list, such as the molecular networks and biological functions were generated by the software's algorithm. The significance of the gene annotation with a p-value less than 0.05 was determined with two-tailed Fisher's exact test.
Antibodies, recombinant protein, and inhibitor. The following antibodies were used in this study: anti-E-cadherin, anti-N-cadherin, anti-vimentin, and anti-PD-L1 from Cell Signaling Technology (Danvers, MA, USA); and anti-β-actin from Santa Cruz Biotechnology (Santa Cruz, CA, USA). Recombinant human transforming growth factor (TGF)-β1 was purchased from PeproTech (Rocky Hill, NJ, USA). SB 431542, a TGF-β inhibitor, was purchased from Tocris Bioscience (St. Louis, MO, USA).  Real-time quantitative reverse transcriptase PCR. Total RNA was extracted from the indicated cell lines using an RNeasy mini kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. RNA was reverse-transcribed to cDNA with PrimeScript ™ RT Master Mix (Takara, Shiga, Japan). Quantitative real-time PCR was performed using the TB Green ™ Premix Ex Taq ™ II (Takara). Primer sequences are shown in Table 3. Relative amounts of mRNA were calculated from the threshold cycle number using glyceraldehyde 3-phosphate dehydrogenase (GAPDH) as an endogenous control. All experiments were performed in triplicate and the values were averaged.

Statistical analysis.
To test the prognostic significance, only gene expression data with available survival data were used. Prognostic significance was estimated by the Kaplan-Meier method and log-rank test between two predicted subgroups. Univariate and multivariate Cox proportional hazards regression analysis was performed to evaluate independent prognostic factors associated with survival. Pearson's correlation was used for correlation analysis and Fisher's exact test was used to assess the frequency difference of somatic mutation. All statistical analyses were performed using the R language program (http://www.r-project.org).  Table 3. Primer sequences used for qPCR experiments.