Network models of primary melanoma microenvironments identify key melanoma regulators underlying prognosis

Melanoma is the most lethal skin malignancy, driven by genetic and epigenetic alterations in the complex tumour microenvironment. While large-scale molecular profiling of melanoma has identified molecular signatures associated with melanoma progression, comprehensive systems-level modeling remains elusive. This study builds up predictive gene network models of molecular alterations in primary melanoma by integrating large-scale bulk-based multi-omic and single-cell transcriptomic data. Incorporating clinical, epigenetic, and proteomic data into these networks reveals key subnetworks, cell types, and regulators underlying melanoma progression. Tumors with high immune infiltrates are found to be associated with good prognosis, presumably due to induced CD8+ T-cell cytotoxicity, via MYO1F-mediated M1-polarization of macrophages. Seventeen key drivers of the gene subnetworks associated with poor prognosis, including the transcription factor ZNF180, are tested for their pro-tumorigenic effects in vitro. The anti-tumor effect of silencing ZNF180 is further validated using in vivo xenografts. Experimentally validated targets of ZNF180 are enriched in the ZNF180 centered network and the known pathways such as melanoma cell maintenance and immune cell infiltration. The transcriptional networks and their critical regulators provide insights into the molecular mechanisms of melanomagenesis and pave the way for developing therapeutic strategies for melanoma.


PROGNOSTIC GENE SIGNATURES OF PRIMARY AND METASTATIC TUMORS
We derived survival gene signatures from Primary Skin Cutaneous Melanoma from TCGA (pSKCM) to identify functions and pathways associated to prognosis. pSKCM included overall survival follow-up data for 470 unique patients with median follow-up of 890 days, mean period of 1,596 days and 156 recorded deaths. Prognostic significance of each gene was evaluated by comparing overall survival curves between patients with low-expression below the median, and patients with high-expression above the median expression. Within each group of primary and metastatic samples, good/poor survival gene signatures were then defined by genes showing significantly higher/lower rate of observed deaths in the highexpression groups than the low-expression groups with logrank p-value < 0.05 [3].
This analysis yielded 1,068 and 1,015 genes whose up-regulation is associated with poor or good outcomes respectively from primary tumors, and 1,013 and 1,361 genes whose up-regulation is associated with poor or good outcomes respectively from metastatic tumors by logrank p-value < 0.05 and Cox p-value < 0.05 (these signatures are provided in Supplementary Material 2). Among these gene signatures, 44 and 197 genes are commonly associated with poor or good outcomes in both of primary and metastatic tumors, respectively. These signatures were significantly enriched for important signaling pathways and functions including cell cycle (corrected Fisher's Exact Test p-value (cFET p) = 2.23E-7, 16-fold enrichment (FE)) for the common poor prognosis genes, and immune system (cFET p = 3.13E- 12,, defense response (cFET p = 3.15E-12, 7.5-FE) and T-cell activation (cFET p = 4.74E-10, 20-FE) for the shared good prognosis genes.

EPIGENETIC SILENCING OF T-CELL ACTIVATION LEADS TO POOR PROGNOSIS
To identify potential epigenetic regulation of the modules predictive of melanoma prognosis, we tested enrichments of cis-/trans-methylation correlated genes (MCGs) in the modules. Putative causal regulators mediating epigenetic regulation in key pathways were further determined by the Causal Inference Test (CIT) framework [4] (Supplementary Material 7), leading to an integrative causal network regulated by methylation (mCIT).
We leveraged Causality Inference Test (CIT) framework [5] to identify methylation changes impacting gene expression changes in the downstream. Briefly, CIT evaluates the statistical significance of a causal trio chain, L→G→T, where L is causal genomic variants/alterations, G is intermediate causal regulator, and T is a trait whose change is induced by the upstream regulators. For performing CIT on methylations, we considered two types of trios as follows: i) CpG→G cis i →G trans j , and ii) G trans j →CpG→G cis i , accounting for cis-/trans-regulations by methylations in gene expressions [4]. We curated these trios by collecting significant methylation-gene expression pairs by Bonferroni corrected p-value < 0.05 by Spearman's correlation within cis-/trans-pairs, defined by 1.5kbps window around Transcription Starting Site (TSS). Significant causal trio chains were identified by CIT p-value < 0.05.
Motivated by the prognostic significance of PTPN6 and PTPRCAP in pSKCM, we further classified the pSKCM patients by methylation in their cis-CpG sites into low-and highly-methylated groups in T-cell Activation CpGs (TA-CpG Low/High) via unsupervised clustering (the bottom heatmap, Supplementary Figure 2C). TA-CpG clusters were correlated significantly with patient-wise enrichments of immune response module, M7 (Wilcox p-value between TA-CpG Low and TA-CpG High: 2.56E-6). This shows higher methylations at TA-CpGs corresponds to suppressed immune response overall.
TA-CpG clusters were also associated to altered tumor micro-environment in pSKCM. We leveraged the Cell Population Mapping (CPM) algorithm in 'scBio' R package [8] to deconvolute cell compositions in pSKCM by melanoma derived cell populations, and tested for differentially abundant cells between TA-CpG Low and TA-CpG High samples by Wilcoxon rank sum test. M1-macrophages and CD8+ T-cells were the most decreased cell types in TA-CpG High, while RM CD4+ T-cells and melanoma cells showed marked increases (Supplementary Figure 8B). In comparison to other T-cell subpopulations, Resting Memory (RM) CD4+ T-cells from scRNA-seq showed substantially lower expressions of key checkpoint genes such as PD-1 (FDR=1.17E-2, 0.90-fold decrease) and CTLA4 (FDR=4.98E-73, 0.12fold decrease), and lower expressions in overall T-cell receptor signaling pathway (cFET = 1.62E-13, 3.01 FE).
Interestingly, a key T lymphocyte co-inhibitory checkpoint, PD1/PD-L1 axis, showed markedly increased interaction with a significant correlation of PD1 and PD-L1 expressions in TA-CpG High, compared to no correlation within TA-CpG Low (Supplementary Figure 4D). PD1 and PD-L1 expressions were also significantly higher in TA-CpG Low, coupled with an increased M1-macrophage population compared to TA-CpG High (Supplementary Figure 4D).
Overall, these results suggest epigenetic suppression of T-cell activation pathway may be mediated by PTPN6 and PTPRCAP, and is coupled with increased resting memory CD4+ T-cells with low checkpoint receptor expressions such as PD-1 and CTLA4.
We performed differential gene expression analysis by contrasting siRNA transfected cells against NTC per targeted gene. We first investigated transfection efficiency and its effects on the final differential expression signature by examining relative suppression of respective target gene expressions in siRNAtransfected cells (Supplementary Figure 11A-C). We confirmed that gene expressions of the siRNA target genes were significantly suppressed in comparison to NTC samples (Supplementary Figure 11A), and identified dependency of differentially expressed gene (DEG) signatures on the siRNA knock-down (KD) efficiency (Supplementary Figure 11C). The dependency was particularly severe for siZNF180 samples, where siZNF180 and siZNF347 had mild to negligible dependency. In order to systematically mitigate the dependency on KD efficiency, we discared samples with relatively low KD efficiency, and utilized the remaining samples to identify final DEG signatures by contrasting to NTC (highlighted in red in Supplementary Figure 11B: 2, 2 and 1 samples remained for siPPP1R2, siZNF347 and siZNF180 respectively). Given the low number of samples impeding the statistical power for DEG analysis, we applied a liberal significance threshold of nominal p-value < 0.1 with fold change > 1.2 to extract the DEG signatures per target gene.
We systematically tested if differential expressions from siRNA-DEG signatures tend to closely interact with the respective target genes by a non-parametric and threshold-free approach, Rank-Rank Hypergeometric Overlap (RRHO) test [12]. Briefly, RRHO evaluates statistical significance of overlaps between two ranked gene lists by, for instance, fold changes in differential expression from an experiment, and successively measuring the statistical significance of the number of overlapping genes [12]. Using RRHO, we tested enrichments of closely interacting genes with target genes in pSKCM network with respective siRNA-DEGs. Proximity between the target gene and another gene in the network was defined by the shortest path distance [13,14] where a link (i.e. l ij ) distance, d(l ij ), is defined by 1-|ρ ij |. The differential expressions were ranked by -log10(p-value) x sign(logFC), leading to positive/negative ranking score for up-/down-regulated genes in respective siRNA-DEG signatures. As shown in Supplementary Figure 3E, down-regulated genes in siZNF347 and siZNF180 are significantly enriched in proximity of ZNF347 and ZNF180 in pSKCM. On the contrary, up-regulated genes in siPPP1R2 are enriched in proximity of PPP1R2 in pSKCM network.

EVALUATION OF TARGET NOMINATION STRATEGY BY NETWORK CONNECTIVITY
Gene networks provide important functional contexts (e.g., modules) in which member genes operate. A network regulator can potentially regulate many other genes in the network. While the traditional differential expression and prognosis analyses offer differentially expressed and prognostic gene signatures, neither of them is capable of capturing gene-gene interactions or high-order organizational structures (e.g., coexpressed gene modules), let alone key regulators.
To understand the relationship between the differential expression and prognostic analyses and the network approach (e.g., MEGENA), we set out to test how gene network connectivity is correlated with gene differential expression and gene prognostic significance. From the pSKCM cohort, we calculated network connectivity (i.e., the number of direct links) for each gene from the MEGENA based coexpression network and prognostic significance of each gene through survival analysis. From the single cell transcriptome (GSE72056), we determined gene differential expression between each cell cluster and the rest clusters. We then used the following regression model to evaluate their relationship: where, k is the network connectivity for a gene x, HR is the hazard ratio from the univariate Cox proportional hazard model for the gene x, and FC i (Fold Change) is the raito of the average expression level of the gene x in a cell cluster i to that of x in the rest clusters.
From this model, we calculated the proportions of the variance in the network connectivity explained by prognostic significance and differential expression. Prognostic significance accounts for a very small proportion (0.022%) of the variance in the gene network connectivity. While differential expression in only 4 out of the 15 cell clusters can explain more than 20% variance in the network connectivity, differential expression in the M1-macrophage cluster (CLS3) can only explain 2.84% of the variance in the gene network connectivity. Therefore, network connectivity is distinct from the differential expression and prognostic significance.
Furthermore, we sought to evaluate if the combination of survival analysis of bulk tissue RNA-seq data and differential expression analysis of single cell transcriptome can identify key targets such as MYO1F and ZNF180. Marker genes of a cell cluster were identified by differential expression analysis with FDR < 0.05 and fold change > 2. We then intersected each cell cluster marker gene signature with the prognostic gene signature from the pSKCM cohort. The good prognosis gene signature (GOSG) share 150 genes with the markers of the CD8+ T-cell cluster (CLS11) (FET p=1.09E-57) and 84 genes (including MYO1F) with the markers of the M1-macrophage cluster (CLS3) (FET p=4. 49E-38). This is expected as the significant ovelap reflects the extent of immune infiltration in melanoma tumor, a known predictor for patient survival. But such candidate lists are still long and it is difficult to further prioritize these candidates in an unbiased manner without knowledge about their disease-specific regulatory relationship. On the other hand, this combined analysis failed to identify key protumorigenic genes such as ZNF180, ZNF347 and PPP1R2 identified by MEGENA. These findings further demonstrate that the gene network analysis complements survival analysis and differential expression analysis, thus supporting the utility of network connectivity.
We have also systematically evaluated our target prioritization strategy by intersecting the poor prognosis gene signature (POSG) and the network hubs. The resulting regulators are more likely to be essential for melanoma cell viability, based on the CRISPRi screening data in the Achilles database [15], than the poor prognosis signature alone (Wilcox p-value=3.03E-4; Supplementary Figure 10).
In conclusion, these results strongly demonstrate that network connectivity provides critical information that complements differential expression and prognostic significance, and integration of the network analysis and the survival analysis leads to the identification of biologically more important regulators. More importantly, the network analysis provides functionally relevant network contexts (e.g., modules) in which member genes operate, and such functional contexts cannot be developed through the traditional differential expression and survival analysis.  Figure 4H. Module names are color-coded in accordance to enrichments of MSH2, PAI-1 correlation signatures, and siZNF180-DEG signatures depicted in Figure  4H. The following font colors capture respective enriched signatures: Blue fonts = siZNF180-DN and POSG, Red fonts = siZNF180-UP and GOSG, Cyan fonts = siZNF180-DN and PAI-1(-), Black fonts = siZNF180-DN, PAI-1(-) and MSH2(+) signatures.  in Stat1 knock-out murine bone marrow-derived macrophages are enriched in the 3-layer neighborhood of STAT1 in pSKCM network. D. Differential correlation between PD1 and PD-L1 expressions within TA-CpG High or TA-CpG Low cluster. Differential expressions of individual genes are depicted in density plots on the borders, and the significant differences are shown by two-sided Wilcoxon test p-value. Point size describes CIBERSORT inferred M1-macrophages in pSKCM. E. Correlation between lognormalized STAT1 expression and inferred M1-macrophage abundance in pSKCM cohort. The Spearman correlation coefficients and p-value from two-tailed test are shown in the plot.

Module Top 10 hubs Associated Pathways
left). Cells are further classified by unsupervised clustering approach (bottom left), annotating each cell cluster with varying compositions of different cell types (right). B. Cell-type specific interactions: gene interactions from TCGA-pSKCM (right) are compared to cell-cluster specific interactions (left). C. Cell types enriched for individual gene expression in scRNA-seq. Each dot is a gene belonging to the specified modules in each box. One-tailed FET with odds ratio > 1 were performed to test enrichment of cells expressing a gene in cells from each cell type. The x-axis is inferred cell types in scRNA-seq data. the yaxis is -log10(FDR corrected FET p-value) for the enrichment of cells expressing each gene in the cell types. D. Correlation analysis between CIBERSORT inferred cell abundance (x-axis) and CPM inferred cell abundance. Each dot represents a cell from the scRNA-seq data, whose relative abundance in the pSKCM was inferred by the CPM algorithm. Y-axis, -log10(Spearman correlation FDR) x sign(correlation coefficient), denotes the correlation between individual cell abundance score by CPM