Regulation rewiring analysis reveals mutual regulation between STAT1 and miR-155-5p in tumor immunosurveillance in seven major cancers

Transcription factors (TFs) and microRNAs (miRNAs) form a gene regulatory network (GRN) at the transcriptional and post-transcriptional level in living cells. However, this network has not been well characterized, especially in regards to the mutual regulations between TFs and miRNAs in cancers. In this study, we collected those regulations inferred by ChIP-Seq or CLIP-Seq to construct the GRN formed by TFs, miRNAs, and target genes. To increase the reliability of the proposed network and examine the regulation activity of TFs and miRNAs, we further incorporated the mRNA and miRNA expression profiles in seven cancer types using The Cancer Genome Atlas data. We observed that regulation rewiring was prevalent during tumorigenesis and found that the rewired regulatory feedback loops formed by TFs and miRNAs were highly associated with cancer. Interestingly, we identified one regulatory feedback loop between STAT1 and miR-155-5p that is consistently activated in all seven cancer types with its function to regulate tumor-related biological processes. Our results provide insights on the losing equilibrium of the regulatory feedback loop between STAT1 and miR-155-5p influencing tumorigenesis.


S1 Descriptive statistics of the gene regulatory network structure
In this study, we collected transcription factor (TF) and microRNA (miRNA) regulations to construct global human gene regulatory networks (GRN) from predicted and experimentally validated data, respectively. This predicted GRN consists of 107 TFs, 1,851 mature miRNAs, 18,705 target genes, and 825,659 regulations among these molecules. The experimentally validated network consists of 10,046 regulations among 597 TFs, 497 miRNAs, and 2581 target genes. Detailed information regarding network structure for these two GRNs is depicted in Table   S1.

S2 Reliability of the gene regulatory network
In the predicted GRN, we observed that in-degree, i.e. regulations to targets, showed scale-free distribution, but out-degree, i.e. regulations from regulators, did not ( Figure S1A). To further confirm this, we investigated the degree distribution of the experimentally validated GRN.
Interestingly, both the in-degree and out-degree of the experimentally validated GRN showed scale-free distribution ( Figure S1B) Accordingly, we considered the expression correlation between regulators and target genes to filter out potential false positive regulations and publication bias. We incorporated the mRNA and miRNA expression profiles of seven cancer types from The Cancer Genome Atlas (TCGA).
We then mapped the expression correlations of each regulation to the predicted GRN to contrast the correlated GRN for each cancer type. We observed that the highest out-degree of the correlated GRN can be controlled by around 1,000 when only the top 1%, 5%, or 10% highly correlated regulations are used for each cancer type ( Figure S2). To note, the highest out-degree in the experimentally validated GRN is 648. Moreover, the averaged R 2 of out-degree distribution for the top 1%, 5%, and 10% highly correlated GRN was increased to around 0.5 (Normal: 1%: 0.48, 5%: 0.5, 10%: 0.45; Tumor: 1%: 0.51, 5%: 0.48, 10%: 0.49) ( Figure S1).
Notably, the R 2 of out-degree distribution for the predicted GRN is 0.17. Interestingly, the publication bias could also be reduced by incorporating expression correlations between regulators and targets the predicted GRN (Table S2). The above results suggested that the application of the expression correlations between regulators and targets may be able to reduce the false positive rate of the predicted GRN, control the out-degree distribution as scale-free, and reduce publication bias.  The out-degree distribution profile of the top 1%, 5%, and 10% highly correlated GRN for each cancer type. The log10 frequency of regulators is shown as a function of the log10 out-degree of regulators. The R 2 of the out-degree distribution is labeled on the top of each sub-chart.

S3 Differentially correlated regulations in cancers
To probe the importance of differentially correlated (DC) regulations in cancers, we collected cancer-associated genes and miRNAs from public databases and literature (see Methods in the main text). The DC regulations were defined by the distance of Fisher transformed Spearman′s ρ between normal and tumor (see Methods in the main text). The cancer-associated regulations were those regulations formed by cancer-associated genes or cancer-associated miRNAs. We  Figure S3A). Notably, the cancer-associated genes used in this study are required to be associated with cancer through mutation 1-3 . Therefore, this result proposes that these noncancer-associated TFs with differential regulatory activity might be involved in cancer development through the regulation of cancer-associated targets rather than mutations.
Interestingly, BiTT regulations significantly overrepresented cancer-associated regulations across seven cancer types, especially for both of the TFs that are cancer-associated ( Figure S3A). This observation might highlight the magnitude of regulatory FFL between two cancer-associated TFs across cancers. On the other hand, DC miRout regulations significantly overrepresented cancerassociated regulations across seven cancer types ( Figure S3A). Additionally, the regulations formed by cancer-associated miRNAs (CN and CC) are significantly enriched in DC miRout regulations across seven cancer types. This result could be due to the significant enrichment of cancer-associated miRNAs involved in miRNA regulations ( Figure S3B, right panel). This investigation also implies that miRNAs with distinct regulation activity might be involved in tumorigenesis even through targeting cancer-associated genes/TFs.

S4 Identification of STAT1-regulated functional modules
To discover the STAT1-regulated downstream functional modules, we collected STAT1 target genes that are significantly 1) positively co-expressed with STAT1 and 2) up-regulated in tumor samples for each cancer type. Through these two conditions, we obtained those functional modules potentially activated by STAT1 in a tumor. The significantly positive co-expression was defined as the absolute standard score ≥ 2.5 (the corresponding significance is P < 0.01). The significant up-regulation in tumor is defined as edgeR P < 0.05, adjusted by the Benjamini and Hochberg multiple testing procedures 4 .
Next, we performed functional enrichment analysis using Gene Ontology (GO) 5 annotations to determine the enriched functions in which these selected target genes are involved. Of note, we conducted the functional enrichment analysis in two ways, conventional and network-wise 6,7 .
With the conventional way, the overrepresentation of selected STAT1 target genes defines the significance of STAT1-regulated functions. On the other hand, the network-wise enrichment analysis evaluates the significance of STAT1-regulated functions through the overrepresentation of functional protein-protein interactions (PPIs) among selected STAT1 targets. The PPIs were obtained from the Protein Interaction Network Analysis (PINA) v2 8,9 . Notably, the functional PPIs are PPIs formed by the two proteins involved in the same functions. Moreover, because we applied network-wise enrichment analysis, we further stipulated that the selected STAT1 target genes must be collected in the PPI network in the following analyses. The significance of each function is determined by the P-value produced from Hypergeometric test. For the conventional way, the hypergeometric distribution is: where X denotes the evaluated function. N represents the number of GO annotated genes in the used expression profiles, as well as in PINA PPI network, while m indicates that in the selected STAT1 target genes. n represents the number of genes with the evaluated GO annotations in the used expression profiles as well as in PINA PPI network, while k indicates that in the selected STAT1 target genes. Thus, this formula calculated the probability of the evaluated GO annotations that contains k selected STAT1 target genes. For the network-wise way, we applied a modified hypergeometric distribution as below: e is the abbreviation of the functional PPIs. Each symbol represents the same meaning as the previous one in the conventional hypergeometric distribution, but the counting objects are changed from genes to functional PPIs. All the P-values are adjusted by the Benjamini and Hochberg multiple testing procedures to control the false discovery rate (FDR).
For each GO annotations, the two P-values produced by the conventional and network-wise method are further combined as a summarized P-value by Fisher′s method 10 . For each cancer type, a GO annotation is provided with a summarized P-value. We further combined these summarized P-values of a GO annotation as a combined P-value across the seven studied cancer types by Fisher′s method again. That is, we used the combined P-value to assess the enrichment consistency of the GO annotation across the studied seven cancer types. Furthermore, we utilized these combined P-values to rank the GO annotations in which the selected STAT1 target genes are involved. Finally, we considered the top 20 significant enriched GO annotations as potential STAT1-regulated downstream functions.  Table S3.

S6 List of the drugs differentially regulated STAT1 expression
Supplementary file S1