C1QA, C1QB, and GZMB are novel prognostic biomarkers of skin cutaneous melanoma relating tumor microenvironment

Skin cutaneous melanoma (SKCM) is the most lethal form of skin cancers owing to high invasiveness and high metastatic potential. Tumor microenvironment (TME) provides powerful evidences for discerning SKCM, raising the prospect to identify biomarkers of SKCM. Based on the transcriptome profiles of patients with SKCM and the corresponding clinical information from The Cancer Genome Atlas (TCGA), we used ESTIMATE algorithm to calculate ImmuneScore and StromalScore and identified the TME-Related differentially expressed genes (DEGs), than the intersected TME-Related DEGs were used for subsequent functional enrichment analysis. Protein–protein interaction (PPI) analysis was used to identify the functionality-related DEGs and univariate Cox regression analysis was used to identify the survival-related DEGs. Furthermore, SKCM-related DEGs were identified based on two Gene Expression Omnibus (GEO) datasets. Finally, we intersected functionality-related DEGs, survival-related DEGs, and SKCM-related DEGs, ascertaining that six DEGs (CCL4, CXCL10, CCL5, GZMB, C1QA, and C1QB) function as core TME-related genes (CTRGs). Significant differences of GZMB, C1QA, and C1QB expressions were found in gender and clinicopathologic staging of SKCM. High levels of GZMB, C1QA, and C1QB expressions were associated with favorable prognosis. Gene set enrichment analysis (GSEA) showed that cell–cell interaction, cell behavior, and intracellular signaling transduction may be mainly involved in both C1QA, C1QB and GZMB expressions and metabolism of phospholipid and amino acid, transcription, and translation may be implicated in low GZMB expressions. C1QA, C1QB, and GZMB are novel SKCM-relating CTRGs, providing promising immune-related prognostic biomarkers for SKCM.

To minimize potential batch effects, same library preparation and sequencing platform was used to analyze the gene expressions data from TCGA and GTEx (https:// toil-xena-hub. s3. us-east-1. amazo naws. com/ downl oad/ gtex_ RSEM_ gene_ fpkm. gz). After merging these two datasets, the data of 471 SKCM samples from TCGA and 813 non-SKCM samples of skin from TCGA and GTEx were finally selected. GSE46517 (104 SKCM samples and 17 non-SKCM samples of skin) and GSE15605 (58 SKCM samples and 16 non-SKCM samples of skin) from GEO were used to identify differentially expressed genes (DEGs) respectively. In addition, GSE65904 and GSE54467 were used to verify the predictive effect of DEGs on survival of patients with SKCM.

Identification of DEGs between SKCM and non-SKCM.
Integration of multiple arrays is considered a better approach of enhancing the reliability of results than individual array analysis 12 . We used GEO2R (http:// www. ncbi. nlm. nih. gov/ geo/ geo2r) to identify DEGs between SKCM samples and non-SKCM skin samples from the two GEO Datasets (GSE46517 and GSE15605) on the basis of the threshold of |log2(fold change)| > 1 and FDR-adjusted P < 0.01.
Associations of ImmuneScores, StromalScores, and ESTIMATEScores with survival and with features of patients with SKCM. ImmuneScore, StromalScore, and ESTIMATEScore have been widely used: the ImmuneScore, an index reflecting proportions of immune components, is defined to describe the infiltration of immune cells in tumour tissue; the StromalScore, an index reflecting proportions of stromal components, is defined to represent the presence of stroma in tumour tissue; and the ESTIMATEScore, an index reflecting TME, is defined as the sum of the ImmuneScore and the StromalScore [7][8][9]13 . The ImmuneScore, StromalScore, and ESTIMATEScore were calculated using the ESTIMATE algorithm through "ESTIMATE" R package 13 . We divided the 471 SKCM samples from TCGA into two corresponding groups according to the medians of the ImmuneScores, StromalScores, and ESTIMATEScores, respectively. Kaplan-Meier survival curves were performed through "Survival" and "survminer" R packages to evaluate overall survival (OS). OS was compared using the log-rank test, with P < 0.05 being considered significant. Additionally, Kruskal-Wallis rank sum test was performed using "ggpubr" R package assess associations of the above scores with age, gender, T stages (T0, T1, T2, T3, T4, and Tis), N stages (N0, N1, N2, and N3), M stages (M0 and M1), overall stage, or tumor metastasis, respectively.
Identification of TME-related DEGs. Patients with SKCM on the basis of the medians of the ImmuneScores and StromalScores, were divided into two immune groups (high-immune-score group and low-immunescore group) and two stromal groups (high-stromal-score group and low-stromal-score group), respectively. For the two immune groups and the two stromal groups, DEGs were identified using the "limma" R package. False discovery rate (FDR)-adjusted P < 0.001 and |log2(fold change)| > 1.5 were set as the criteria to screen for DEGs, and results were plotted in volcano plots using the "ggplot2" R package.

Functional enrichment analysis.
To understand the potential biological significance of TME-related DEGs, the Gene Ontology (GO) functional enrichment and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the "clusterProfiler", "enrichplot", and "ggplot2" R packages (p < 0.05, q < 0.05) 14,15 . The functional enrichment was visualized using bubble diagram.
Consensus TME-related DEGs confer the contribution to functional enrichment pathways, highlighting the interactions among proteins encoded by TME-related DEGs provide more informative understanding of this contribution 16 . Thus, we analyzed consensus genes and constructed protein-protein interaction (PPI) network using the search tool for the retrieval of interaction genes (STRING) (https:// string-db. org/): DEGs with interaction scores > 0.99 were selected for PPI network construction, and PPI network was visualized using Cytoscape v3.8.2. According to the studies of Le et al. 17 and Chen et al. 8 , we also selected the top 30 significant TME-related genes using the multi-network clustering (MNC) method through the cytoHubba app in Cytoscape.
Identification of core TME-related genes. Univariate Cox regression analysis was performed to identify DEGs that associated with OS using the "survival" R package. P < 0.05 indicated statistical significance. In this paper, we selected P < 0.001 as a screened criterion to obtain DEGs with much more significance. The association between DEGs and OS was visualized using forest plot. We, further, intersected genes among top 30 significant TME-related genes from PPI, UCRA-derived DEGs with much more significance, DEGs identified from GSE15605, and DEGs identified from GSE46517, focusing on the core TME-related genes (CTRGs) that were finally utilized in differential expressions analysis on combining data between TCGA and GTEx, association analyses of survival and clinicopathological characteristics, and gene set enrichment analysis (GSEA). www.nature.com/scientificreports/ For differential expressions analysis on combined data between TCGA and GTEx, CTRGs expressions in SKCM samples were compared with those in non-SKCM samples from TCGA and GTEx using the Wilcoxon rank sum test through the "beeswarm" and "ggpubr" R packages.
For association analyses of survival and clinicopathological characteristics, Kaplan-Meier survival curves were performed through "Survival" and "survminer" R packages to evaluate OS. OS was compared using the logrank test, with P < 0.05 being considered significant. Additionally, Kruskal-Wallis rank sum test was performed using "ggpubr" R package assess associations of CTRGs expressions with age, gender, TNM staging, overall stage, or tumor metastasis, respectively.

Results
Analysis workflow of this study. In this study, we investigated the potential roles and prognostic value of CTRGs in SKCM, as shown in Fig. 1.
Associations of ImmuneScore, StromalScore, and ESTIMATEScore with survival and clinicopathologic features in SKCM. We found that the patients with high ImmuneScore and high ESTI-MATEScore exhibited better OS than those with low corresponding scores ( Fig. 2A,C), although StromalScore was not significantly associated with OS ( Fig. 2B), documenting that the high ImmuneScore are potentially positive prognostic indicators for the patients with SKCM. We, further, analyzed associations of ImmuneScore, StromalScore, and ESTIMATEScore with clinicopathologic features in the patients with SKCM from the TCGA, including age, gender, primary tumor size (T stage), regional lymph node status (N stage), distant metastasis (M stage), overall stage, and tumor metastatic. Patients aged ≤ 65 years had significantly higher scores than patients over the age of 65 (for ImmuneScore, P = 0.011; for StromalScore, P = 0.0016; for ESTIMATEScore, P = 0.0026) (Fig. 3A,H,O). No significant differences of the three scores was found between male and female patients with SKCM ( Fig. 3B,I,P), between M stages (M0 and www.nature.com/scientificreports/ M1) (Fig. 3C,J,Q), and N0-N1 vs. N2-N3 (Fig. 3D,K,R). Notably, the three scores in T4 were significantly lower than those in those in T0, T1, T2, and T3 (all P < 0.05) (Fig. 3F,M,T). Overall stage was usually investigated www.nature.com/scientificreports/ by comparing stage 0-II and stage III-IV 19,20 ; thus, we compared the three scores on the basis of stage 0-II and stage III-IV. Interestingly, the StromalScore and ESTIMATEScore in stage III-IV were higher in those in stage 0-II significantly (P = 0.0055 and P = 0.013, respectively) ( Fig. 3E,L,S). Additionally, the ImmuneScore and ESTIMATEScore were significantly higher in metastatic tumors than those in primary tumors (P = 0.0039 and P = 0.006, respectively) ( Fig. 3G,N,U), signifying that the TME may be involved in biological behaviour of SKCM, such as metastasis and infiltration.
Identification of DEGs reflecting both the high vs. low ImmuneScore and the high vs. low StromalScore. We found 1236 DEGs (1224 upregulated and 12 downregulated) specific to the high vs.
low ImmuneScore and 1200 DEGs (1192 upregulated and 8 downregulated) specific to the high vs. low Stro-malScore groups (Fig. 4A,B). After intersecting these DEGs, we identified 960 upregulated and zero downregulated genes (Fig. 4C,D). We, further, performed GO analysis, unveiling that the 960 DEGs were clustered in the immune-related GO terms, such as immune response-activating cell surface receptor signaling pathway, immune response-activating signal transduction, and B cell mediated immunity (Fig. 4E). Moreover, we uncovered that the DEGs enriched in cytokine-cytokine receptor interaction, chemokine signaling pathway, and hematopoietic cell lineage on the basis of KEGG analysis (Fig. 4F). Thus, both GO and KEGG analyses documented that the 960 DEGs were mainly implicated in immune activity, suggesting that these DEGs may be necessary for the TME of SKCM.

Confirmation of CTRGs.
Firstly, because PPI network analysis frequently used to predict the functionality of interacting genes or proteins reveals the interactions among the genes and proteins 16 , we constructed PPI network according to STRING PPI confidence scores > 0.99, obtaining 90 nodes and 79 edges (Fig. 5A). Moreover, we chose the top 30 hub genes through the MNC algorithm to identify functionality-related DEGs (Fig. 5B). Secondly, we established the association between the 960 DEGs and survival of patients with SKCM using univariate Cox regression analysis, further screening 273 survival-related DEGs (P < 0.001) (Fig. S1). Thirdly, to improve detection power, we integrated multiple individual data (2761 DEGs in GSE15605 dataset and 208 DEGs in GSE46517 dataset), confirming SKCM-related DEGs. Finally, we intersected functionality-related DEGs, survival-related DEGs, and SKCM-related DEGs, ascertaining that six DEGs (CCL4, CXCL10, CCL5, GZMB, C1QA, and C1QB) function as CTRGs (Fig. 5C). www.nature.com/scientificreports/ Additionally, significant differences of GZMB expressions were found in gender, and melanoma Clark level (P < 0.05) ( Table 1). And significant differences of C1QA and C1QB expressions were found in T stage, pathologic stage, melanoma ulceration, melanoma Clark level, and Breslow depth (P < 0.05) (Tables 2 and 3).
Specific pathways of C1QA, C1QB, and GZMB. Because KEGG analysis reveals functional enrichment pathways on the basis of gene category, rather than expression levels of genes, we next used GSEA to identify specific pathways of C1QA, C1QB, and GZMB on the basis of expression levels of the three genes. The expressions of C1QA correlated with that of C1QB positively (Fig. 5A); thus, we intersected the specific pathways between C1QA and C1QB. We intersected between the top five specific pathways correlating with high C1QA expressions and the top five specific pathways correlating with high C1QB expressions, obtaining four intersected specific pathways (chemokine_signaling_pathway, cytokine_cytokine_receptor_interaction, natural_killer_cell_medi-ated_cytotoxicity, and toll_like_receptor_signaling_pathway) (Fig. 6B,C). However, we did not identify specific pathways shared by both low C1QA and C1QB expressions. Additionally, the top five specific pathways correlating with high GZMB expressions were kegg_chemokine_signaling_pathway, kegg_natural_killer_cell_medi-ated_cytotoxicity, kegg_viral_myocarditis, kegg_cytokine_cytokine_receptor_interaction, and kegg_cell_adhe-sion_molecules_cams (Fig. 6A). The top five specific pathways correlating with low GZMB expressions were kegg_glycosylphosphatidylinositol_gpi_anchor_biosynthesis, kegg_one_carbon_pool_by_folate, kegg_lysine_ degradation, kegg_rna_polymerase, and kegg_aminoacyl_trna_biosynthesis (Fig. 6D). Thus, cell-cell interaction, cell behavior, and intracellular signaling transduction may be mainly involved in both C1QA and C1QB expressions, as well as in high GZMB expressions. Interestingly, metabolism of phospholipid and amino acid, transcription, and translation may be implicated in low GZMB expressions.
The associations of C1QA, C1QB and GZMB expressions with TICs. The analyses of associations of C1QA, C1QB and GZMB expressions with TICs of SKCM samples from TCGA dataset were performed using CIBERSORT algorithm to further investigate the interaction between these genes expression and the TME. The intersections of variance analyses and correlation analyses documented that 10 of 22 TICs were significant related to C1QA expression as well as C1QB expression, and 13 of 22 TICs were significant related to GZMB expression. Interestingly, most of these TICs such as CD 4 and CD 8 T cells, M1 and M2 macrophages were positive correlated to the expression of these three genes. thus conferring a significant survival advantage (Fig. 6E,F)  (Fig. S2). These results suggested that C1QA, C1QB and GZMB play vital roles in immune infiltration processes in SKCM patients and represented potential therapeutic targets.
The correlations between C1QA, C1QB and GZMB. We investigated the correlations between expressions of C1QA, expressions of C1QB and expressions of GZMB. Interestingly, we found that the expressions of analyses (p < 0.05 and q < 0.05). This figure was generated using "limma", "ggplot2", "clusterProfiler", and "enrichplot" R packages. We have got permission to use the KEGG software from the Kanehisa laboratory (http:// www. kegg. jp/ kegg/ kegg1. html) 15  www.nature.com/scientificreports/ www.nature.com/scientificreports/ C1QA positively correlated with those of C1QB and GZMB, and expressions of C1QB positively correlated with those of GZMB, which indicated that C1QA, C1QB and GZMB function as synergistic roles in the developments of SKCM (Fig. 7).
Single-cell analysis of C1QA, C1QB and GZMB in SKCM. We further understand the main cell types in SKCM microenvironments that express the C1QA, C1QB and GZMB. The heatmaps showed that GZMB was www.nature.com/scientificreports/ mainly enriched in the immune cells (especially T cells), while C1QA, C1QB were mainly expressed in monocyte/macrophage (Fig. 8). www.nature.com/scientificreports/

Discussion
In this study, we investigated TME-related genes involved in SKCM, indicating that the six CTRGs (CCL4, CXCL10, CCL5, GZMB, C1QA, and C1QB) affect OS in patients with SKCM. Our results also support the link of melanoma with CCL4, CXCL10, and CCL5 10 . We verified GZMB, C1QA, and C1QB as novel SKCM-relating CTRGs. CCL4, CCL5, and CXCL10 have been reported as potential biomarkers for SKCM on May 25th, 2020 10 . As expected, the expressions of CXCL10, CCL4 and CCL5 in SKCM are significantly higher than those in normal www.nature.com/scientificreports/ tissues. Moreover, CCL5 modulates tumor immune responses via a local renin-angiotensin system in malignant melanoma 21 . In addition, the low transcription levels of CXCL10 correlate with better prognosis in patients with SKCM 22 . www.nature.com/scientificreports/ We identified three novel SKCM-relating CTRGs, namely C1QA, C1QB, and GZMB. Complement 1 is composed of C1q, C1r, and C1s. C1q recognizes and binds to immunoglobulin complexed to antigen and initiates the complement cascade. C1QA encodes the A-chain polypeptide of serum complement subcomponent C1q (C1QA), and C1QB encodes the B-chain polypeptide of serum complement subcomponent C1q (C1QB). Both C1QA and C1QB are implicated in complement pathway and innate immune system. Additionally, C1QA and C1QB have been found as indices of TME remodeling in osteosarcoma 23 . Granzyme B encoded by GZMB is secreted by natural killer cells and cytotoxic T lymphocytes, proteolytically processing cytokines and degrading extracellular matrix proteins. Granzyme B is involved in apoptosis by cleaving caspase-3, -7, -9 and -10 24 . Moreover, granzyme B triggers caspase-independent pyroptosis after delivered into the target cell via the immunological synapse [25][26][27] . Indeed, pyroptosis-related gene signatures have been showed in robustly predicting the prognosis of SKCM 28 . Of note, CD8+ T cells propagates autoimmunity via granzyme-B-generated unique autoantigen fragments; whereas, C1q limits tissue damage and autoimmunity by mediating effector CD8+ T cells, providing biological insight into an interconnectivity between C1q and granzyme B 29 . We found that expression levels of GZMB, C1QA, and C1QB in normal samples of TCGA dataset were significantly reduced, compared with those in SKCM samples. Thus, the interconnectivity between C1q and granzyme B may recapitulate the three novel SKCM-relating CTRGs (C1QA, C1QB, and GZMB) as biomarkers for the occurrence and development of SKCM.
There are limitations in this study. Firstly, C1QC encodes the C-chain polypeptide of serum complement subcomponent C1q; however, we did not know the reason why there is no correlation of C1QC with SKCM. Secondly, we used sequencing data sets from multiple databases to investigate SKCM-relating CTRGs. Multiomic data are needed to be further validate our results.

Data availability
The datasets used during the current study are available from the corresponding author on reasonable request.