A panel of emerging EMT genes identified in malignant mesothelioma

Malignant mesothelioma (MESO) is a highly aggressive cancer with poor prognosis. Epithelial–mesenchymal transition (EMT) is a critical process in malignancies involved in tumor angiogenesis, progression, invasion and metastasis, immunosuppressive microenvironment and therapy resistance. However, there is a lack of specific biomarkers to identify EMT in MESO. Biphasic MESO with dual phenotypes could be an optimal model to study EMT process. Using a powerful EMTome to investigate EMT gene signature, we identified a panel of EMT genes COL5A2, ITGAV, SPARC and ACTA2 in MESO. In combination with TCGA database, Timer2.0 and other resources, we observed that overexpression of these emerging genes is positively correlated with immunosuppressive infiltration, and an unfavorable factor to patient survival in MESO. The expression of these genes was confirmed in our patients and human cell lines. Our findings suggest that these genes may be novel targets for therapeutics and prognosis in MESO and other types of cancers.


Results
Overlaps of up-regulated genes in mesothelioma microenvironment from microarray and scRNA-Seq data sets. First, we screened all up-regulated genes in the mesothelioma microenvironment using peritoneal lavage. The up-or down-regulated genes with significant changes over time were analyzed using microarray from single cells collected from peritoneal lavage of RN5-bearing mice. The total number of genes changed over time after tumor challenge (> twofold change, p < 0.05) ( Fig. 1A-C). The overlaps of all up-regulated genes at different time points were determined by GeneVenn. In total, 429 genes were found to be overexpressed at all time points compared with naïve controls (Fig. 1D). Peritoneal lavage from naïve mice (N) was used as controls.
Cultured mesothelioma RN5 cells were then used to differentiate genes originating from tumor cells (defined as "tumor genes") from those originating from the tumor microenvironment itself (defined as "non-tumor genes") ( Fig. 1E). A total of 3637 genes were up-regulated in RN5 cells compared to the peritoneal lavage cells of naïve mice.
Single cell RNA-sequencing from the peritoneal lavage of tumor bearing mice was performed at 4 weeks. The 4-week time point was selected as the most representative time point to perform scRNA-Seq in comparison with scRNA-Seq of peritoneal lavage from naïve control since the tumor was always well established and the mice were still healthy. ScRNA-Seq demonstrated that 255 genes were up-regulated in intraperitoneal tumor microenvironment at 4 weeks of tumor-bearing mice compared to naïve mice (Fig. 1F).
The overlaps between the 429 up-regulated genes derived from the microarray analysis of peritoneal lavage, the 3637 genes from the microarray analysis of cultured RN5 cells, and the 255 genes from scRNA-Seq from peritoneal lavage single cells of tumor-bearing mice identified 55 non-tumor cell genes and 72 tumor cell genes by GeneVenn from the three data sets (Fig. 1G).
The selected lists of 55 non-tumor cell genes and 72 tumor cell genes are presented in "Supplementary data" ("Supplementary data", Table 1S). The differentiation of tumor genes from non-tumor genes was confirmed in the scRNA-Seq of tumor bearing mice. The non-tumor cell genes were found to be expressed by tumor cells as well in TME ("Supplementary data",

Epithelial-mesenchymal transition (EMT) is the major hallmark gene set in both tumor and non-tumor genes determined by GSEA.
We then explored what hallmark pathways these selected genes are involved. Among the list of 72 tumor cell genes and 55 non-tumor cell genes, 12/72 tumor cell genes (FDR q-value 7.9E − 14, p = 1.58E − 15) and 11/55 non-tumor cell genes (FDR q-value 8.59E − 14, p = 1.72E − 15) are involved in the EMT pathway, identified by MSigDB ("Supplementary data", Table 2S). Top hallmark gene sets are presented in bar graphs ( Fig. 2A,B). Gene expression of selected lists is shown in heatmap: 12 tumor cell genes and 11 non-tumor cell genes (Fig. 2C,D). Hallmark EMT was the top pathway involved in both gene sets.
EMTome: both gene sets were correlated with epithelial and mesenchymal phenotypes. Top genes were identified to be highly correlated with EMT signature. Since EMT is the most out- Figure 1. Overlaps of up-regulated genes in mesothelioma microenvironment by microarray and single cell RNA sequencing. (A) The schema of experimental design; (B) up-or down-regulated genes with significant changes over time were analyzed using microarray in scatter plots. Single cells were collected from peritoneal lavage of RN5-bearing mice at1, 2, 3, 4, 5, and 8 weeks after tumor cell ip injection. (C) Total number of genes changed over time after tumor challenge compared with naïve controls (> twofold change, p < 0.05). (D) The overlaps of all up-regulated genes at different time points were determined by GeneVenn and 429 common genes were found to be overexpressed in all timepoints. (E) Cultured mesothelioma cells RN5 were used to determine tumor genes or non-tumor genes. (F) Most representative time point at 4-week was selected to perform scRNA-Seq in comparison with naïve control (N) to determine up-regulated genes in tumor microenvironment. (G) Final overlaps of up-regulated genes derived from microarray analysis at all time points of peritoneal lavage in (D) (red), microarray analysis of RN5 cells with 3637 up-regulated genes in (E) (green) and scRNA-Seq of single cells in peritoneal lavage from tumor-bearing mice at 4 weeks with 255 up-regulated genes in (F) (yellow). Finally, 55 non-tumor genes and 72 tumor genes were identified by GeneVenn from the above triple data sets. Single cells or RNA from peritoneal lavage of naive mice were used as controls.  www.nature.com/scientificreports/ standing hallmark pathway in both gene lists, we were interested to know which genes played the most predominant roles in this process. EMT score and EMT interactome were thus performed by importing each individual gene of interest. Each gene was correlated with Epithelial top 50 and Mesenchymal top 50 genes in pan-cancer cohorts. R scores from all genes of interest were plotted in bar graphs of both tumor and non-tumor genes, p values for the mesenchymal phenotypes were also indicated (Fig. 3A). The correlation plot demonstrated positive correlation with the mesenchymal markers and negative correlation with the epithelial markers, thus characterizing the mesenchymal phenotype (Fig. 3B). Summary of top four selected genes (COL5A2, ITGAV, SPARC and ACTA2) which were correlated with EMT signature in 32 pan cancer cohorts was presented in radar graphs, and the wellknown CDH1 (Epithelial) and VIM (Mesenchymal) genes were included as controls (Fig. 3C).
Network of OMICS profile of interest for the most significantly correlated genes (COL5A2, ITGAV, SPARC and ACTA2) were presented ("Supplementary data", Fig. 3S). OMICS includes mRNA, miRNA, copy number alteration (CNA), methylation, and reverse phase protein analysis (RPPA) correlation were also analyzed. Significant correlation was considered between genes (mRNA expression) in TCGA cohort MESO with expression if FDR < 0.05.
The mRNA expression of COL5A2, ITGAV, SPARC and ACTA2 genes correlated with mesenchymal phenotype in epithelioid vs non-epithelioid (biphasic and sarcomatoid) subtypes in MESO cohort. Patients with epithelioid subtype (n = 62) and biphasic and sarcomatoid subtypes (n = 23 and n = 2, respectively) were included. Gene expression of COL5A2, ITGAV, SPARC and ACTA2 (RNA expression in Transcripts Per Million-TPM) was significantly different between epithelioid and non-epithelioid subtypes in TCGA Cohort MESO (p < 0.05 for all comparisons). All four genes had significantly higher mRNA expression in non-epithelioid subtypes compared to epithelioid subtypes as shown in violin graphs (Fig. 4A).
Malignant mesothelioma therefore appears to be an optimal model to study the EMT process. The TCGA database contains only two patients with sarcomatoid MESO, thus limiting our ability to look at significant differences between biphasic and sarcomatoid MESO. Despite this limitation, the trend in mRNA expression of these four genes of interest clearly increased from epithelioid to biphasic and then sarcomatoid MESO (Fig. 4B).
In TCGA MESO cohort, gene expression of COL5A2, ITGAV, SPARC and ACTA2 correlated with the hallmark EMT geneset as shown in scatter plots (Fig. 4C). The expression of each gene was positively correlated with hallmark EMT regardless of tumor stages, all p values for each comparison were less than 0.05, suggesting that these genes are EMT markers in MESO independent of tumor stages. It appears that along with tumor progression, q values tended to be bigger while p values tended to be smaller and far less than 0.05.

Confirmation of EMT gene expression in human mesothelioma tissues and cell lines. In our
scRNA-Seq data from the biopsy specimens of naïve MESO patients, the patient (SMTR02T0) with biphasic subtype had the highest expression of COL5A2, ITGAV, SPARC and ACTA2 genes compared with other cases who were confirmed epithelioid subtypes (Fig. 4D). Dotplot was shown to compare the five cases of these four genes, and epithelial marker CDH1 and mesenchymal marker VIM were included as controls. Histology of H&E staining of biphasic (SMTR02T0) and epithelioid (SMTR03T0) subtypes were confirmed ("Supplementary data", Fig. 4S).
Fluorescent immunostaining of human mesothelioma cell lines showed that sarcomatoid CRL-5946 cells expressed COL5A2 and SPARC dramatically higher compared to epithelioid CRL-5915 cells, while CRL-5915 cells expressed epithelial marker EpCAM specifically (Fig. 4E).
The mRNA expression of top genes COL5A2, ITGAV, SPARC and ACTA2 is associated with overall survival in epithelioid subtype and all MPM patients in TCGA database. High expressions of tumor cell genes COL5A2 and ITGAV and non-tumor cell genes SPARC and ACTA2 are both positively correlated with poor overall survival and disease-free survival in MESO cohort (Fig. 5). The results indicate that overexpression of these genes of interest is unfavourable prognostic factors in mesothelioma patients.
Tumor genes COL5A2 and ITGAV expression also correlated with overall survival in the epithelioid subtype, demonstrating that epithelioid MESO can acquire EMT characteristics affecting survival before being categorized as biphasic. Median survival is 15.22mon in COL5A2 hi and 28.37mon in COL5A2 lo , respectively, hazard ratio: 2.823, 95% CI of ratio 1.566-5.088; Median survival is 19.43mon in ITGAV hi and 25.94mon in ITGAV lo , respectively, hazard ratio: 1.946, 95% CI of ratio 1.107-3.422 (Fig. 5C).

Discussion
Our previous studies and that of others have indicated that non-epithelioid mesothelioma cells are more aggressive and have higher capacity to form mesospheres in vitro and in vivo 24,25 . Compared with epithelioid subtype, non-epithelioid mesothelioma cells were more resistant to chemoradiation, and the surviving cells had enhanced stemness 26 . Our findings in this study imply that EMT may be a critical process to predict cancer cell invasiveness and stemness. Nowadays the linkage between the EMT and cancer cell stemness, metastasis and resistance to therapy has been driving efforts to develop novel therapeutic targets 27 .
EMT occurs through distinct intermediate states. Biphasic mesothelioma characterized by dual histology could thus best mimic the intermediate hybrid state of EMT stage while transitioning from epithelial to partially and completely mesenchymal states. A recent study revealed that in mouse and human squamous cell carcinoma, loss of function of FAT1 could promote tumor initiation, progression, invasion, stemness and metastasis through inducing a hybrid EMT state 28 . EMT genes can be novel potential targets to interrupt cancer invasion and metastasis 29 . EMT is driven by the SNAIL and ZEB transcriptional repressors of epithelial genes. TGF-β is a potent inducer of EMT specifically working with RAS-MAPK signaling pathway 30 .
Our findings identify previously less well recognized genes that promote EMT in MESO. Reversal of EMT into MET process may reduce the capacity of invasiveness, cancer cell stemness and metastasis of mesothelioma and augment their sensitivity to therapy. The genes COL5A2, ITGAV, SPARC and ACTA2 play critical roles resulting in EMT in MESO. Therefore, these genes may be potential therapeutic targets as well as prognostic indicators in mesothelioma.
A previous study investigating the relationship between collagen type V alpha 2 chain (COL5A2) expression and clinical outcome in bladder cancer patients observed that patients with lower expression of COL5A2 had better survival than those with higher expression 31 . Recently, a study evaluated the mutational profile of a panel of 34 genes including COL5A2 gene in MESO. They found that BAP1 mutation was related to a prolonged survival of patients treated with platinum/pemetrexed regimens 32 . Mutations in COL5A2 was observed in 6% of the patients. However, the expression level of COL5A2 was not analyzed and therefore its impact on outcome could not be directly addressed.
It is well known that integrin signaling drives multiple cancer cell functions, including tumor initiation, epithelial plasticity, metastasis, and resistance to targeted therapies 33 . Integrin alpha V (ITGAV), a transmembrane glycoprotein, has been found to enhance tumor progression. ITGAV expression is associated with shortened overall survival in esophageal adenocarcinoma 34 . Genes ITGAV, FN1, and ITGB1 were shown to be the targets of miR-9-3p, which could inhibit proliferation and metastases of nasopharyngeal carcinoma by downregulating FN1, ITGB1, and ITGAV, thus inhibiting the EMT process 35 .
SPARC (Secreted protein acidic and rich in cysteine) gene was up-regulated specifically at the early stage of lung adenocarcinoma consistently with TCGA transcriptome database when EMT markers were screened in a cellular model and validated in lung adenocarcinoma 36 . There was a proteomics-based study to identify SPARC   www.nature.com/scientificreports/ as a prognostic biomarker in MESO. They found that high level of circulating SPARC was associated with poor prognosis 37 . However, no study could be found in mesothelioma. ACTA2 gene encodes α-smooth muscle actin (αSMA) protein, which has been shown to be a hallmark of EMT in lens epithelial cells. The increased level of histone H4 acetylation at the ACTA2 promoter region was associated with EMT of lens epithelial cells 38 .
Epithelioid mesothelioma is known to have a better prognosis in comparison with sarcomatoid and biphasic types. However, within patients with epithelioid subtype, there is a lack of good prognostic markers even though some tumors are more aggressive than others. Nuclear atypia could have an impact on survival in epithelioid mesothelioma, and nuclear grade was confirmed to correlate with survival in a multi-institutional study 39,40 , but further studies will be required to demonstrate the validity of this marker in clinical practice.
According to a recent study, the prognosis of epithelioid mesothelioma patients is impacted by a protein called connective tissue growth factor (CTGF). CTGF, a secreted protein produced by both mesothelioma cells and cancer-associated fibroblasts, can promote mesothelioma cell invasion. Patients with lower levels of this protein had a better prognosis and longer survival. Using surgical specimens of epithelioid MPM, the authors evaluated the clinicopathological significance of αSMA expression, the most widely used marker of CAFs, the expression of CTGF, and the extent of fibrosis by immunohistochemistry. They showed that the expression of αSMA and CTGF in CAFs correlated with poor prognosis in epithelioid subtype MESO patients 41 .
Overexpression of the emerging EMT genes had positive correlation with the activated CD4 T cells, macrophages, monocytes, mast cells, Th2 and Treg cells, suggesting that these genes may play critical roles in modulating the immunosuppressive microenvironment. In contrast, negative correlation was observed with NKT cells, supporting a notion that EMT process may abrogate the immune response against tumor. Previous evidence showed that EMT transcriptional factors lead to immunosuppressive cell infiltration, resulting in a tumor immunosuppressive microenvironment. The immunosuppressive cells in turn promote EMT in tumor cells. The interaction between EMT and immunosuppression promotes tumor progression 42 . Tumor-infiltrating immune cells release a wide variety of inflammatory mediators and growth factors to facilitate immunosuppressive microenvironment and EMT plasticity. The immunosuppressive TME is a critical hurdle to the efficacy of cancer immunotherapy, therefore, targeting EMT could improve the outcome of immunotherapy by restricting its immunosuppressive impact [43][44][45] .
Our study may be the first time to demonstrate that strong positive correlation of panel EMT genes overexpression with poor clinical outcome, and immunosuppressive enrichment in MESO, indicating that these newly identified genes are most likely powerful drivers of EMT process in MESO. Therefore, regulation of each individual gene expression and its signalling pathways might potentially control EMT process thus leading to development of novel strategies for MESO therapeutics. SPARC protein has been observed upregulated in MESO cell lines 46 . Transcriptomic studies have identified that EMT-linked genes may contribute towards the resistance of chemotherapy and immunotherapy, suggesting that targeting EMT genes may be potentially applied to treat MESO patients 47 . This work has important implications for targeting nonimmune components in TME to improve the efficacy of chemotherapy and boost the responses to immunotherapy of cancer patients.
In conclusion, utilising transcriptomic and EMTome analysis, we identified these EMT genes that are overexpressed in tumor microenvironment of mouse and human mesothelioma. These genes strongly correlate with an EMT signature and prognosis in MESO. Besides a few sporadic studies, this may be the first study to identify COL5A2, ITGAV SPARC and ACTA2 as a panel of emerging EMT genes in mesothelioma. Furthermore, these genes may be universal EMT markers for most cancer types, as well as prognostic indicators for overall survival across cancer types. Due to a lack of biomarkers to predict survival in epithelioid subtype, it is difficult to further stratify this subgroup of MESO. Encouragingly, this study demonstrated that overexpression of these four genes  www.nature.com/scientificreports/ in epithelioid mesothelioma was associated with poor prognosis, indicating that these genes could potentially serve as independent prognostic indicators in this subtype of mesothelioma. More importantly, overexpression of these genes is associated with an immunosuppressive microenvironment promoted by the EMT process. Therefore, these genes could be potential biomarkers and targets to select appropriate immunotherapy.

Materials and methods
The schema of experimental design was depicted in Fig. 1A. Briefly, murine mesothelioma cells were injected intraperitoneally (ip) into wild-type mice, and peritoneal lavage was collected at different time points to investigate gene expression profile in tumor microenvironment (TME). Naïve mice, and cultured tumor cells were included as controls. Microarray and single cell RNA sequencing (scRNA-Seq) were performed at indicated times.

Murine mesothelioma cells and mice.
The murine mesothelioma cell line RN5 derived from C57BL/6 mice was established recently by our team and presents characteristics of biphasic mesothelioma 48,49 . Cells were maintained in RPMI1640 medium supplemented with 10% fetal bovine serum and 1% penicillin and streptomycin at 37 °C in an atmosphere containing 5% CO 2 . Cells were treated with prophylactic 5 µg/ml Plasmocin™ (Invivogen) for at least 2 weeks and were confirmed as mycoplasma-free. Exponentially growing cells (approximately 90% confluence) were prepared and injected into the 6-8 week-old C57BL/6 mice purchased from the Jackson Laboratories.
RNA extraction for microarray. Peritoneal lavage was collected by rinsing with 5 ml of PBSF and single cells were prepared by removing tumor spheroids with a cell strainer (Ø40µm). The procedure of RNA extraction was performed according to the manufacturer's instruction (QIAGEN, RNeasy Microarray Tissue Mini Kit, CA). Total RNA was treated with a Purelink DNase set (Thermo Fisher Sci., CA). Gene expression profile was evaluated by Affymetrix microarray assay by the Genomic Centre of the Hospital for Sick Children, Toronto, Canada. Array Type: MoGene-2_0-st; Analysis Type: Expression (Gene); Analysis Version: version 1; Genome Version: mm10 (Mus musculus); Annotation: MoGene-2_0-st-v1.na36.mm10.transcript.csv.

Patients with mesothelioma.
Five naïve patients were confirmed with a diagnosis of malignant pleural mesothelioma. Tumor biopsy tissues were processed freshly for scRNA-seq analysis. This study was approved by our institution (REB#19-5858) and all patients signed the consent forms.
Single cell RNA sequencing (scRNA-Seq). Fresh single cells obtained from peritoneal lavage were processed by Princess Margaret Genomic Centre, University Health Network (UHN), following the standard protocol (www. pmgen omics. ca). Loupe Cell Browser v5.0.0 provided by 10× Genomics was used to analyze single cell gene expression in clusters. A reference mouse genome (mm10) was selected to set threshold of mitochondrial UMIs as over-expression of mitochondrial genes could signify poor quality or dying cells. Globally distinguishing genes were moved for further analysis.
Patient tumor samples were collected from the outpatient biopsy. This study was approved by UHN Research Ethics Board (REB#19-5858) and all patients signed the consent form.

Fluorescent immunostaining in human mesothelioma cell lines. Human mesothelioma cell lines
CRL-5915 (characterized with more epithelioid) and CRL-5946 (more sarcomatoid), which were provided by ATCC, were used to stain COL5A2 and SPARC proteins in cultured cells. Cells were fixed with 1% PFA for 15 min, and 1% BSA was to block endogenous enzymes. Triton X-100 (0.25%, 5 min) was added for permeabilization. Primary antibodies rabbit polyclonal to COL5A2 (ab134800) and SPARC (ab14174) were used 1:100 for 1 h at RT, and the secondary antibody anti-rabbit IgG1 conjugated with Alexa-555 was 1:1000 for 1 h at RT. Anti-human EpCAM-FITC monoclonal antibody was applied 1:100. Mount medium containing DAPI was applied to display nuclei. Statistical analysis. Transcriptomic analysis was made according to the filter criteria: Fold Change > 2 or < − 2, P < 0.05. The numbers of genes are differentially expressed genes that pass through the filter criteria. Gene expression (mRNA in TPM) in the violin graph was plotted using Prism 8.0. Unpaired two-tailed t test, with a p-value less than 0.05 was considered a significant difference.

Analysis of immune infiltrates, gene expression and clinical outcome in MESO by
Significant correlation was considered between genes (mRNA expression) in TCGA cohort MESO with expression if false discovery rate (FDR) < 0.05.
Log-rank (Montel-Cox) test was used to compare survival curves. A p-value less than 0.05 was considered a significant difference. Group cutoff median was selected to determine the cutoff-high (50%) and cutoff-low (50%) of gene expression, approximately 50%. Hazard ratio was calculated based on Cox PH Model with 95% Confidence Interval (95% CI).
Ethics approval and informed consent of participants. At University Health Network (UHN), research involving humans conducted within the jurisdiction or under the auspices of UHN must be reviewed and granted written approval by the UHN Research Ethics Board (REB) prior to commencement of research (https:// www. uhnre search. ca/). This applies to research involving humans that is conducted by a UHN Principal www.nature.com/scientificreports/ Investigator (PI). This study was approved by the Research Ethics Board (REB Approval No. 19-5858) of UHN and performed in accordance with the Declaration of Helsinki, and informed consent from the patients was signed prior to the study.

Data availability
The raw data of microarray and scRNA-Seq are available and ready to be uploaded to the public structured data depository. All data are available upon request to Dr. Marc de Perrot.