Prognostic and therapeutic implications of extracellular matrix associated gene signature in renal clear cell carcinoma

Complex interactions in tumor microenvironment between ECM (extra-cellular matrix) and cancer cell plays a central role in the generation of tumor supportive microenvironment. In this study, the expression of ECM-related genes was explored for prognostic and immunological implication in clear cell renal clear cell carcinoma (ccRCC). Out of 964 ECM genes, higher expression (z-score > 2) of 35 genes showed significant association with overall survival (OS), progression-free survival (PFS) and disease-specific survival (DSS). On comparison to normal tissue, 12 genes (NUDT1, SIGLEC1, LRP1, LOXL2, SERPINE1, PLOD3, ZP3, RARRES2, TGM2, COL3A1, ANXA4, and POSTN) showed elevated expression in kidney tumor (n = 523) compared to normal (n = 100). Further, Cox proportional hazard model was utilized to develop 12 genes ECM signature that showed significant association with overall survival in TCGA dataset (HR = 2.45; 95% CI [1.78–3.38]; p < 0.01). This gene signature was further validated in 3 independent datasets from GEO database. Kaplan–Meier log-rank test significantly associated patients with elevated expression of this gene signature with a higher risk of mortality. Further, differential gene expression analysis using DESeq2 and principal component analysis (PCA) identified genes with the highest fold change forming distinct clusters between ECM-rich high-risk and ECM-poor low-risk patients. Geneset enrichment analysis (GSEA) identified significant perturbations in homeostatic kidney functions in the high-risk group. Further, higher infiltration of immunosuppressive T-reg and M2 macrophages was observed in high-risk group patients. The present study has identified a prognostic signature with associated tumor-promoting immune niche with clinical utility in ccRCC. Further exploration of ECM dynamics and validation of this gene signature can assist in design and application of novel therapeutic approaches.


Results
Identification of survival-associated ECM genes. The survival analysis using cBioportal identified 106 ECM genes with higher prevalence in TCGA-KIRC patients (z-score > 2, prevalence > 5% of KIRC dataset (n = 510) (supp. Table 2). Among these 35 genes showed significant association with PFS, OS, and DSS (Fig. 1a,b and supp. Table 3). Out of these, 12 genes showed higher expression in ccRCC (Clear cell renal cell carcinoma) tumor compared to normal tissues ( Fig. 2a-l). The gene products of these genes are found majorly in extra-cellular compartment and play an oncogenic role in several tumors (Table 1). Univariate and KM analysis identified patients with z > 2 score showed a significant association with a higher risk of mortality ( Fig. 3a-l). Among these genes, NUDT1 gene showed the highest risk with (H.R 2.56, 95% CI 1.60 -4.10, p < 0.01*).
For external validation, this 12 gene signature was validated in 3 independent GEO datasets. GSE3538 (177 patients) HR = 2.2; 95% CI [1.42-3.42; p < 0.01*, GSE33371 (23 patients) HR = 17.44; 95% CI [3.61-84.32; p < 0.01*, GSE22541 (24 patients) HR = 4.54; 95% CI [1.53-13.46]; p < 0.05* (Fig. 5b-d). χ2 test between high and low risk categorical variables found significant association with ethnicity, stage, metastasis, aneuploidy score, fraction genome altered, OS status, PFS status and DSS status ( Table 2).  www.nature.com/scientificreports/ Differentially expressed genes (DEGs) in high and low-risk patients. DEseq2 analysis was performed to screen differentially expressed genes between high and low-risk patients' group (supp Table 4). Among 20,501 protein-coding genes, 1818 genes were upregulated at > twofold in high-risk patients compared to lower risk group. In the low-risk group, a total of 159 genes were upregulated at > twofold compared to the high-risk group (supp Table 4). Principal component analysis (PCA) showed a clear separation between the high-risk group and low-risk group based on ECM signature (Fig. 6a). Volcano plot of differentially expressed genes between high-risk and low-risk patients is depicted in Fig. 6b. The enriched gene ontology terms in high ECM group were extracellular matrix structural constituent and acute phase response pathways (Fig. 6c). The enriched gene ontology terms in low ECM group were transmembrane transporter activity and transport (Fig. 6d). Fur-

Correlation analysis of ECM gene signature and infiltration of immune cells.
To analyze the distribution of immune cells in these two risk groups, CIBERSORT-ABS algorithm was utilized (suppl Fig. 1). It identified patients in high risk with higher abundance of T regulatory cells, M0 and M2 macrophages (Fig. 7c).
Although tumor-promoting immune features were found to be correlated with the high-risk group there was no association of inflammatory immune cells with any group (suppl Fig. 2). The constellation plot showed 2 distinct clusters depicting high-risk ECM rich patients and low-risk patients (Fig. 7d).

Discussion
Extracellular matrix (ECM) has been commonly viewed as inert scaffolding for growth and development of tissues 30 . This view is rapidly being revised due to its emerging active roles in health and disease 31 . ECM plays an important role in maintaining structural integrity and amplification of growth signals by providing binding sites for various growth factors. For instance, FGF (fibroblast growth factors) and VEGF (vascular endothelial growth factors) have dual binding sites for cell adhesion and growth factors 32 . This localization of growth factors plays a critical role in the maintenance of gradients, which is essential in cell signalling. Further, the extracellular matrix and its perturbation plays a significant role in tumor invasion and metastasis especially in solid cancer 31 . In cancer, epithelial cells can remodel ECM to pro-tumorigenic tumor microenvironment through activation of stromal cells 33 . This transition of normal ECM due to altered expression of ECM genes can assist Table 1. Biological role of 12 genes. The ECM confidence value ranged from 0 (absence of any extracellular evidence), 1 (low confidence of extracellular evidence) to 5 (highest confidence of extracellular evidence). Cellular localization of these genes was incorporated from Genecards (https:// www. genec ards. org/).

NUDT1
(Nudix Hydrolase 1) 3 NUDT1 enzyme removes oxidized nucleotide pools and prevents its subsequent misincorporation into DNA. Its higher expression has been associated with several cancers including renal cancer, brain, lung, and liver cancer 13 www.nature.com/scientificreports/ in prognostication of renal cancer. In the present study, the expression of ECM-related genes was explored to develop a prognostic signature using TCGA and other independent gene expression datasets. Apart from the critical role of ECM plasticity in the spread of cells and its malignant transformation, it has important immunological implications as well 34,35 . It has been shown that ECM can negatively affect the infiltration and distribution of immune cells such as CD8 + T cells and NK cells in solid tumors 36,37 . Abnormal ECM components can alter mechanical and biochemical properties of TME which can affect immune cells function 38 . Additionally, immunosuppressive molecules such as IL-10 and TGF-β accumulate in ECM rich tumor partly due to low diffusion and buildup of hypoxic and metabolic stress 37,39 . The deposition of ECM components can lead to accumulation of immunosuppressive T-regulatory cells and M2 polarized macrophages 8 . Thus, variations in ECM components can affect distribution, activation, and polarization of immune cells.
In the current study, we explored the prognostic potential of genes related to ECM remodelling and its correlation to immune cell infiltration in ccRCC. We have identified 12 ECM gene signatures in ccRCC with , and (l) POSTN. The patients with higher expression (cutoff: z-score > 2) were grouped in high risk and other patients were in the low-risk group. www.nature.com/scientificreports/ prognostic significance in multiple datasets. Before this study, a comprehensive analysis of ECM genes to identify prognostic gene signature and its immune characterization was lacking. In this study, we have addressed the issue by exploring 964 ECM related genes and developed a 12 gene prognostic signature which was derived from TCGA and validated using independent GEO datasets. Further, these 12 genes were found to be tumorigenic as their expression levels were found to be significantly higher levels in renal cancer compared to normal tissue. Secretion of ECM modifying enzyme families such as LOX, PLOD, and collagen among others can lead to higher invasiveness, migration, proliferation, and survival of cancer cells 35 . The protein products of this gene signature are pre-dominantly secreted in extra-cellular spaces and play tumorigenic role. Further, network analysis of this prognostic disease module identified higher co-expression of these genes. The protein co-expression networks are generally used to identify genes which are functionally relevant and are under similar transcriptional program 40 . The expression level of these genes has been found to be similar across several conditions which is indicated by its higher co-expression (> 66.37%). In a separate study, high ECM gene expression was found to be prognostic for breast cancer 41 and gastric cancer 42 . Interestingly one of the studies has identified correlation of aggressive ECM characteristics with immunosuppressive features in glioblastoma 43 . In renal cancer, gene like collagen 1 has been explored in enhanced metastasis and invasion in RCC 44 . Another study has identified a higher expression of 10 genes associated with cell adhesion and ECM regulation in renal cancer 45 . In GSEA analysis, the patients in the high-risk group showed significant perturbation in normal homeostasis kidney function such as active ion transportation and fatty acid metabolism. These functions are known to be  www.nature.com/scientificreports/ www.nature.com/scientificreports/ associated with kidney functions and renal epithelial cells utilize fatty acids are the source of energy 46 . One of the striking features in low-risk group was the preservation of kidney function genes. Chronic kidney disease (CKD) and lower estimated glomerular filtration rate (eGFR) is associated with increased risk of renal cancer 47 . Further, the clinical management of ccRCC patients includes maximization of kidney function preservation and management of long-term CKD 48 . This gene signature identified patients with high ECM at a higher risk of renal complications compared to the low-risk patient group. Additional analysis of immune abundance using CIBERSORT-ABS revealed that T-regulatory, M2 macrophages and M0 macrophages were found to be higher in high-risk patients. In tumor microenvironment, T-reg and suppression of the immune system plays a significant role in progression of cancer 49 . Recently, high infiltration of M2 macrophages and FOXP3 + T-regulatory cells was associated with adverse clinical outcome www.nature.com/scientificreports/ in renal carcinoma 50 . The combination of anti-CTLA-4 and anti-PD-1 in a CheckMate 214 trial, was found to achieve objective response rate in only 40% of patients 51 . In our study, the association of M2 macrophages and T regulatory cells in ECM high-risk patients represent a risk population which might not benefit from these immunotherapies. These patients can be targeted for emerging novel therapeutic approaches currently under investigation such as T-reg depletion strategies or anti-FoxP3 Treg vaccines 52 . Further, expanded knowledge of variables in ECM distribution can illuminate 2 critical aspects of novel immunotherapeutic approaches: its design and application. First, a recent study has identified that targeted blocking of ECM molecules can lead to a response in immune-refractive patients. In this study, integrin αvβ6/8specific monoclonal antibody (mAb) led to higher survival in TNBC metastatic mouse models which responded poorly to PD-1 blockade 53 . So, direct inhibition of specific ECM components can lead to higher survival in cancer models. Secondly, understanding of physical and biochemical constraints posed by ECM can assist in generating better ways to deliver novel drugs to the tumors. Abnormal ECM matrix leads to poor diffusion of drugs, nutritional supply, and immune mediators 37 . Active ECM modulation has shown promising response to the efficacy of chemotherapeutic regimes in various models. Inhibition of LOX gene led to an improved response to chemotherapy in tumor models 54 . Comparatively, for an effective response to an immunotherapy regime, there is an addition of another variable i.e., immune cells. Immune modulators (mAbs) and immune cells are required to be in proximity with tumor for an effective response, but the ECM is known to interfere with both. Physically, ECM prevents interaction between immune cells such as T cells and tumor cells 55 . Further, ECM also negatively affects the diffusion of immunotherapy molecules in tissues. Dense, highly cross-linked, and stiff ECM matrix interferes with the diffusion of immunomodulatory drugs like ipilimumab (anti-CTLA-4) due to their larger hydrodynamic diameter 37 . Further, novel ECM-targeting drugs such as collagenase-infused nanogels can penetrate ECM-rich tumors and could be explored for such therapies 56 . Thus, further understanding of survival-related ECM genes in ccRCC and its effect on immune cells can assist in the design and application of novel therapeutics.

Conclusion
In the current study, we developed a 12-ECM gene prognostic signature which was able to discriminate high-risk renal cell carcinoma patients from low-risk ones. Besides, this signature was validated in 3 independent datasets. Moreover, the tumorigenic role of the genes included in this signature showed elevated expression in tumor tissues compared to normal. PCA and differential gene expression analysis identified significant perturbation of normal homeostatic functions of high-risk patients. Further, immune-suppressive cells such as T-reg, M2 and M0 macrophages showed higher infiltration in high-risk patients. The prognostic signature and immune features can identify important sub-section of ccRCC patients which can benefit from emerging immunotherapies.

Methods
Data acquisition and pre-processing. A total of 964 extracellular matrix and remodeling genes were downloaded from Gene Ontology (http:// geneo ntolo gy. org) 57 (suppl. Table 1). The gene list was curated based on range of evidence for their involvement in the ECM pathway. The evidence annotation of these genes included were: 'Inferred from Experiment (EXP)' , 'Inferred from Direct Assay (IDA)' , 'Inferred from Physical Interaction (IPI)' , 'Inferred from Mutant Phenotype (IMP)' , 'Inferred from Genetic Interaction (IGI)' and 'Inferred from Expression Pattern (IEP)' (http:// geneo ntolo gy. org/ docs/ guide-go-evide nce-codes/). This approach led to inclusion of only specific genes with evidence of their involvement in ECM interaction pathways. Transcriptome RNA-sequencing and clinical data of kidney renal cell carcinoma cancer (KIRC) patients (n = 510) was extracted from the TCGA data portal (https:// portal. gdc. cancer. gov/). This dataset was used for GSEA analysis, immune cell infiltration analysis and clustering analysis. The clinical information of patients and z-scores of individual genes was downloaded from cBioportal (https:// www. cbiop ortal. org/) 58 . The z-scores were divided into categorical variables (z-score > 2 and z-score < 2) for survival analysis. A total of 523 ccRCC cancer patients and 100 normal kidney tissues from GTEx portal was compared using GEPIA portal (http:// gepia2. cancer-pku. cn/) 59 . GEO (Gene expression Omnibus) datasets (GSE3538, n = 177, GSE33371, n = 23 and GSE22541, n = 24) were explored for validation of prognostic signature 60 .
Identification of survival-associated ECM genes. 964 extracellular matrix and remodeling genes were analyzed in cBioportal to quantify perturbations in TCGA-renal cell carcinoma (TCGA-KIRC RNA Seq V2, n = 510 patients) based on z score more than 2 as has been used previously to screen prognostic genes 61 .
Comparison of prognostic-related genes between tumor and normal tissue. The expression of the genes associated with overall survival (OS) of ECM genes in ccRCC were compared to normal tissue using Gene Expression Profiling Interactive Analysis. For this study, the Kidney tumor (n = 523) dataset was compared against combined gene expression data of normal tissues from TCGA and Genotype-Tissue Expression (GTEx) data (n = 100), respectively. In GEPIA, parameters with Log2FC (Fold change) cutoff were set at 1 with a p-value cut-off at 0.01. Development of 12 gene ECM signature. Independent prognostic potential of 12 genes was analyzed in renal cancer (TCGA-KIRC-cBioportal) based on log-rank p-values. A total of 510 patients with RNA-Seq data were included in this analysis. KM curves with overall survival (OS) and Progression-free survival (DFS) were generated. OS shows total duration for which patients were alive from date of diagnosis. To develop a 12 gene signature, Cox regression model was developed that divided the cohort into low-and high-risk groups ECM risk score based on median as previously published [62][63][64] . Briefly, the prognostic score was calculated by www.nature.com/scientificreports/ multiplying the expression value of a gene with its corresponding Cox proportion regression coefficient (Prognostic score = Σ Cox regression coefficient of Gene i * expression value of gene Gene i ). For validation of gene signature, GEO (Gene expression Omnibus) datasets were explored GSE3538 (n = 177), GSE33371 (n = 23) and GSE22541 (n = 24) using SurvExpress tool (http:// bioin forma tica. mty. itesm. mx: 8080/ Bioma tecSu rvivaX. jsp). The microarray expression data used in these studies were acquired from Affymetrix Human Genome array [HG-U133_Plus_2, GPL platform].

Biological role of 12 genes.
To further evaluate the biological roles of 12 genes and analyze predominant cellular compartment of their expressed proteins, information was extracted from literature and genecards web portal (https:// www. genec ards. org/ Guide/ GeneC ard# compa rtmen ts). The confidence score at genecards webportal was incorporated from literature, high throughput screen and similar text mining analyses. The 'unified confidence score' ranged from absence of localization evidence (value = 0) and range from low confidence (value = 1) to high confidence (value = 5).
Identification of differentially expressed mRNAs. The differentially expressed genes (DEGs) were analyzed using DEseq2 package in R (http:// bioco nduct or. org/ packa ges/ release/bioc/html/DESeq2.html). The results were analyzed and interpreted in R using 'plotPCA' function for Principal component analysis and volcano plot was generated using 'enhancedvolcano' package (http:// bioco nduct or. org/ packages/ EnhancedVolcano.html). Gene interaction network analysis. Network analysis was performed to identify the interactions between the three genes with highest log-fold change compared to tumor using g:profiler 65 . GeneMANIA was used to analyze the string network analysis (http:// genem ania. org/) 66 .

GSEA (Gene Set Enrichment Analysis) and network analysis.
Immune cell infiltration analysis. In this study, the immune cell infiltration landscape of high-and low-risk patients were obtained via CIBERSORT-ABS algorithm which quantifies absolute proportion of cells. Briefly, it is determined by calculating the median expression of all the genes in signature and is divided by gene expression level in whole sample 67 . The absolute scores calculated using this method can be used for both interand intra-sample comparisons.
Clustering analysis. Hierarchal clustering of correlation between immune cells and risk groups was generated using ward's methods. Hierarchal clusters and constellation plots were prepared using JMP-Pro (version 14.0.0, SAS Institute, Cary, USA). (Progression free-survival) and DSS (Disease-specific survival) was computed using Log-rank t-test through cBioportal. For Pearson's chi-square (χ2) test, the numeric values of the risk score were split at the median and divided in 2 groups for comparison of patients in high and low-risk groups. GEPIA web-portal utilizes one-way ANOVA for calculation of differential expression between tumor and normal. GeneMANIA utilizes Geno-Ontology based weighing method that assigns score based on biological processes behind the interacting genes. For Gene Set enrichment analysis, the normalized gene enrichment score was kept at greater than > 2 was considered strong. p values of < 0.05 were considered statistically significant.

Data availability
The study utilized The Cancer Genome Atlas (TCGA) Program and Gene Expression Omnibus (GEO) repositories, which are freely available to the public. www.nature.com/scientificreports/