Controllability in an islet specific regulatory network identifies the transcriptional factor NFATC4, which regulates Type 2 Diabetes associated genes

Probing the dynamic control features of biological networks represents a new frontier in capturing the dysregulated pathways in complex diseases. Here, using patient samples obtained from a pancreatic islet transplantation program, we constructed a tissue-specific gene regulatory network and used the control centrality (Cc) concept to identify the high control centrality (HiCc) pathways, which might serve as key pathobiological pathways for Type 2 Diabetes (T2D). We found that HiCc pathway genes were significantly enriched with modest GWAS p-values in the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) study. We identified variants regulating gene expression (expression quantitative loci, eQTL) of HiCc pathway genes in islet samples. These eQTL genes showed higher levels of differential expression compared to non-eQTL genes in low, medium, and high glucose concentrations in rat islets. Among genes with highly significant eQTL evidence, NFATC4 belonged to four HiCc pathways. We asked if the expressions of T2D-associated candidate genes from GWAS and literature are regulated by Nfatc4 in rat islets. Extensive in vitro silencing of Nfatc4 in rat islet cells displayed reduced expression of 16, and increased expression of four putative downstream T2D genes. Overall, our approach uncovers the mechanistic connection of NFATC4 with downstream targets including a previously unknown one, TCF7L2, and establishes the HiCc pathways’ relationship to T2D.

We repeated the same procedure with HotNet2 3 . We compared the gene set enrichment with respect to our T2D-related pathway set of genes. We found that 26 out of 1,012 genes related to the 66 HiCc pathways are captured as T2D-related pathways, giving a p-value for enrichment of 7.35e-6 by Fisher's Exact test. This pvalue is lower than the enrichment for the largest HotNet2 clusters for all delta values ( Supplementary Figure 3c), showing that the control centrality measure is able to capture more relevant disease-related genes and pathways compared to HotNet2 clusters.
In order to determine the network specificity of this analysis, i.e., the uniqueness of the extended regulatory network (EGRN) in producing biologically meaningful results when used in conjunction with the HiCc method, we compared with two additional methods which take generic networks as input, namely minimum dominating sets (MDSets) 4 and NetWAS 5 . These prioritization methods were applied on a predefined protein-protein network and a pancreatic islet specific protein interaction network (NetWAS). In both cases, HiCc performed better in terms of capturing T2D related pathways. In proteins belonging to the MDset, 111 pathways were enriched, 26 of which are T2D related. This resulted in p-value=0.0058 and -log P= 5.14. Similarly for NetWAS, out of its 60 significant pathways, 17 are T2D related (-log P=4.92). Comparing these values with 21/66 T2D related pathways of HiCc, applied on the EGRN, (-log P = 8.62), we found that applying HiCc that uses EGRN as the underlying network is significantly better at capturing T2D related pathways than applying methods that use generic networks.

HiCc pathways of Asthma and Chronic Obstructive Pulmonary Disease (COPD)
To extend the applicability of the control centrality approach to identifying key driver pathways in other complex diseases, we chose two complex chronic conditions, asthma and COPD. To create the asthma and COPD GRNs, we followed a very similar feature selection procedure as in the case of diabetes. We first selected the top 2000 genes with the highest variance across all samples. We then added in a manually curated list of genes known to be associated with each disease through GWAS and other genetic and experimental studies: For the inference of the asthma GRN, we used gene expression data from airway epithelial cells of asthmatic and control subjects 6 along with GWAS-implicated genes from expert-curated lists and the GRASP database 7 . For the COPD GRN, we used gene expression data from severe COPD lung tissue from a previous study 8 , along with COPD GWAS-implicated genes from two recent studies 9,10 . Finally, we used LIMMA to compare gene expression profiles of patients with asthma (or COPD) against matched healthy controls and adjusted the p-values using the Benjamini-Hochberg method. We then extracted the differentially expressed genes with Padj < 0.05 and combined them with the GWAS-implicated genes and high-variance genes to produce the final list of input genes for network inference. The predictionet package was used to infer the final networks using the same exact priors and weighting scheme as described above in the diabetes analysis.
The resulting GRNs consisted of 846 and 774 genes for asthma and COPD, respectively. We then built the EGRNs by incorporating signaling 11 and kinasesubstrate interactions 12 . Overall, the resulting EGRNs consisted of 2,004 and 1,335 edges for asthma and COPD, respectively. We then calculated the control centrality (Cc) values of every gene in each EGRN and determined the HiCc pathways by comparing their Cc distributions with the background Cc distribution using a twosided Mann-Whitney U test, as detailed in the Methods section. This resulted in 32 HiCc pathways for asthma and 42 HiCc pathways for COPD (Mann-Whitney U test pvalue < 0.05). To test the hypothesis that HiCc pathways offer an advantage in capturing disease-related signatures, we divided the 186 KEGG pathways in the Molecular Signatures Database (MSigDB) 13 into HiCc and non-HiCc pathways. The degree distributions of the genes within HiCc pathways and non-HiCc pathways showed no distinction, suggesting once again that control centrality is not biased towards highly connected nodes, or hubs, for asthma and COPD alike ( Supplementary Figures 4a and 5a).
To validate the biological significance of the HiCc pathways, we looked for their enrichment in asthma-and COPD-specific gene sets derived from literature and genome-wide association studies (GWAS). In particular, the literature-curated ("GOLD") disease genes were taken from the DisGeNet 14 database with a score cutoff of 0.15. The GWAS-implicated disease genes were obtained from metaanalyses of asthma 15 and COPD 16 GWA studies. To avoid circularity, the set of GWAS-implicated genes that were used in the construction of the asthma and COPD GRNs were excluded from these validation datasets.
For both asthma and COPD, we observed a significant enrichment of HiCc pathways in the two datasets compared to non-HiCc pathways. For asthma, the fraction of enriched pathways in GWAS data (53 pathways overall) was significantly higher for  Supplementary Figures 4b and 5b).
Inspecting the HiCc pathways from the asthma EGRN, we found that they recapitulate the GWAS and literature-based knowledge on asthma-related pathways. Pathways that have been linked to asthma previously, such as JAK-STAT Furthermore, the asthma pathways that were only captured by HiCc showed many interesting and recently explored connections such as long-term potentiation 26 and long-term depression 27,28 , olfactory transduction 29 , glioma 30 , Notch 31 signaling, and regulation of actin cytoskeleton 32 ( Supplementary Figures 4c and 5c).
The COPD HiCc pathways were also confirmed by literature meanwhile providing possibly novel insights. Pathways previously associated with COPD, such as MAPK 33

Benchmarking with other subnetwork identification methods
JActiveModules: We retrieved the connected subnetworks of the EGRN that show significant changes in expression using the JActiveModules plugin of Cytoscape. We   The T2D-related pathway enrichment of high control centrality pathways, compared to other high-centrality pathways.
Arranged based on the significance in T2D specific -omics data.

Functional-network and 51 eQTLs in 33 pathways
The pairs of eQTLs genes with shortest path of one and two was significantly greater (17.3%) compared to all gene (4.8%) pairs in the network (p-0.01).