Modeling regulatory network topology improves genome-wide analyses of complex human traits

Genome-wide association studies (GWAS) have cataloged many significant associations between genetic variants and complex traits. However, most of these findings have unclear biological significance, because they often have small effects and occur in non-coding regions. Integration of GWAS with gene regulatory networks addresses both issues by aggregating weak genetic signals within regulatory programs. Here we develop a Bayesian framework that integrates GWAS summary statistics with regulatory networks to infer genetic enrichments and associations simultaneously. Our method improves upon existing approaches by explicitly modeling network topology to assess enrichments, and by automatically leveraging enrichments to identify associations. Applying this method to 18 human traits and 38 regulatory networks shows that genetic signals of complex traits are often enriched in interconnections specific to trait-relevant cell types or tissues. Prioritizing variants within enriched networks identifies known and previously undescribed trait-associated genes revealing biological and therapeutic insights.

G enome-wide association studies (GWAS) have cataloged many significant and reproducible associations between common genetic variants, notably single-nucleotide polymorphisms (SNPs), and diverse human complex traits 1 . However, it remains challenging 2 to translate these findings into biological mechanisms and clinical applications, because most trait-associated variants have individually small effects and map to non-coding sequences.
One hypothesis is that non-coding variants cumulatively affect complex traits through cell type-or tissue-specific 3 gene regulation 4 . To test this hypothesis, large-scale epigenomic 5,6 and transcriptomic 7-10 data have been made available spanning diverse human cell types and tissues. Exploiting these data many studies have shown enrichments of trait-associated SNPs in chromatin regions [11][12][13] and genes [14][15][16] that are active in traitrelevant cell types or tissues. These studies overlap regulatory maps with GWAS data and often ignore functional interactions among loci within regulatory programs.
Gene regulatory networks [17][18][19][20] have proven useful in mining functional interactions of genes from genomic data. Transcriptional regulatory interactions, rather than gene expression alone, drive tissue specificity 19 . Further, context-specific regulatory networks have emerged as promising tools to dissect the genetics of complex traits [21][22][23] . Network-connectivity analyses in GWAS have shown that trait-associated genes are more interconnected than expected 18 and highly interconnected genes are enriched for trait heritability 24 . However, these analyses do not leverage observed enrichments to further enhance trait-associated gene discovery.
To unleash the potential of regulatory networks in GWAS, we develop a Bayesian framework for simultaneous genome-wide network enrichment and gene prioritization analysis. Through extensive simulations we show several advantages of the method such as flexibility in various genetic architectures, robustness to a wide range of model mis-specification and improved performance over existing methods. Applying the method to 18 human traits and 38 regulatory networks, we identify strong enrichments of genetic associations in network topology specific to trait-relevant cell types or tissues. By prioritizing variants within enriched networks we identify trait-associated genes that were not implicated by the same GWAS. Many of these previously undescribed genes have strong support from multiple lines of external evidence; some are further validated by follow-up GWAS of the same traits with increased sample sizes. Together, these results demonstrate the potential for our method to yield additional biological and therapeutic insights from existing data.

Results
Method overview. Figure 1 shows the method schematic. In brief, we develop a model dissecting the total effect of a single SNP on a trait into effects of multiple (nearby and distal) genes through a regulatory network, and we combine it with a multiple-SNP regression likelihood 25 based on GWAS summary statistics to perform Bayesian inference.
Conceptually, we decompose the total effect of a common SNP on a complex trait into three components: a cis-regulatory component through nearby genes, a trans-regulatory component through distal genes that are regulated by genes near this SNP, and a remaining component due to other factors (Fig. 1a). Since common genetic variation contributes to complex traits primarily via gene regulation 22 , we find this decomposition a sensible approximation to the genetic basis of complex traits.
Despite various ways to model the regulatory components, here we use cell type-or tissue-specific regulatory networks 18,20 linking transcription factors (TFs) to target genes (TGs). Specifically, we define a regulatory network as a directed bipartite graph with weighted edges from TFs to TGs (Fig. 1b). Given a TF-TG network, we use its topology to decompose the total effect of each SNP into effects of multiple interconnected genes. As shown in Fig. 1c, we approximate the effect of SNP j using a weighted sum of cis effects of three nearby genes (outsidenetwork gene k, TG u and TF g) and trans effects of three TGs (u and t on the same chromosome, and n on a different chromosome) that are directly regulated by TF g near SNP j. For identifiability we assume the SNP-gene (c jg ) and TF-TG (v gt ) weights in the decomposition are known, specified by existing omics data (Methods).
To implement this regulatory decomposition in GWAS, we formulate a network-induced prior for SNP-level effects (β), and combine it with a regression likelihood 25 of β based on single-SNP association statistics from a GWAS (Fig. 1d) and linkage disequilibrium (LD) estimates from a reference panel with ancestry matching the GWAS (Fig. 1e). We refer to the resulting Bayesian framework (Fig. 1f) as Regression with Summary Statistics exploiting NEtwork Topology (RSS-NET).
RSS-NET accomplishes two tasks simultaneously: (1) testing if a network is enriched for genetic associations (Fig. 1g); (2) identifying which genes within this network drive the enrichment (Fig. 1h). Specifically, RSS-NET estimates two independent enrichment parameters (θ and σ 2 ) that measure the extent to which, SNPs near network genes and regulatory elements (REs) have higher chances to be associated with the trait, and, SNPs near network edges have larger effect sizes, respectively. To assess network enrichment, RSS-NET computes a Bayes factor (BF) comparing the "enrichment model" (M 1 : θ > 0 or σ 2 > 0) against the "baseline model" (M 0 : θ = 0 and σ 2 = 0). To prioritize genes within enriched networks, RSS-NET contrasts posterior distributions of β estimated under M 0 and M 1 .
RSS-NET improves upon its predecessor RSS-E 16 . Specifically, RSS-NET exploits the full network topology, whereas RSS-E ignores the edge information. By explicitly modeling regulatory interconnections, RSS-NET outperforms RSS-E on both simulated and real data. Despite different treatments of network information, RSS-NET and RSS-E share computation schemes (Box 1; Supplementary Notes 1-3), allowing us to reuse the efficient algorithm of RSS-E. Software is available at https:// github.com/suwonglab/rss-net. Q p j¼1 ½α j Á N ðβ j ; ν j ; τ 2 j Þ þ ð1 À α j Þ Á δ 0 ðβ j Þ is the closest mean-field approximation in Kullback-Leibler divergence to the exact conditional posterior of β given the hyper-parameters {θ 0 , θ, σ 0 , σ}. 1. Initialize: Set the initial values of {ν, α} randomly.
2c. Iterate through all SNPs to update {ν, α} as follows: Method comparison through simulations. The key contribution of RSS-NET is a unified framework that leverages network topology to infer enrichments from whole-genome association statistics and prioritizes loci in light of inferred enrichments automatically. We are not aware of any published method with the same features. However, one could ignore topology and simply annotate SNPs based on their proximity to network genes and REs (Methods). For these SNP-level annotations there are methods to assess global enrichments or local associations on GWAS summary data. Here we use Pascal 26 , LDSC 13,27 , and RSS-E 16 to benchmark RSS-NET.
Given a network, we first simulated SNP effects (β) from either RSS-NET or mis-specified models, and then combined them with real genotypes to simulate phenotypes from a genome-wide multiple-SNP model. We computed the single-SNP association statistics, on which we compared RSS-NET with other methods (Figs. [2][3][4]. Since RSS-NET is modelbased, we designed a large array of simulation scenarios for both correctly-and mis-specified β. To reduce computation of this large-scale design, we mainly used genotypes 28 of 348,965 genome-wide common SNPs and a whole-genome regulatory network inferred for human B cells (436 TFs, 3,018 TGs) 20,29 . We Fig. 1 Schematic of RSS-NET. a Decomposition of the total effect of a common SNP on a complex trait through multiple nearby and distal genes. b Gene regulatory network defined as a weighted and directed bipartite graph linking TFs to TGs. c RSS-NET exploits the topology of a TF-TG network to decompose the total genetic effect into cis and trans-regulatory components. Both the SNP-gene (c jg ) and TF-TG (v gt ) weights in this decomposition are assumed known and are specified by existing omics data (Methods). In addition to TF-TG networks, RSS-NET also requires d GWAS summary statistics and e ancestry-matching LD estimates as input. f Bayesian hierarchical model underlying RSS-NET. An in-depth description is provided in Methods. g Given a network, RSS-NET produces a Bayes factor comparing the baseline (M 0 ) and enrichment (M 1 ) models to summarize the evidence for network enrichment. h RSS-NET prioritizes loci within an enriched network by computing P 1 , the posterior probability that at least one SNP j in a locus is traitassociated (β j ≠ 0). Differences between P 1 under M 0 and M 1 reflect the influence of a regulatory network on genetic associations, highlighting previously undescribed trait-associated genes. We started with simulations where RSS-NET modeling assumptions were satisfied. We considered two genetic architectures: a sparse scenario with most SNPs being null and a polygenic scenario with most SNPs being trait-associated. For each architecture, we created negative datasets by simulating SNP effects (β) from M 0 and positive datasets by simulating β from three M 1 patterns (only θ > 0; only σ 2 > 0; both θ > 0 and σ 2 > 0) of the target network, and applied the methods to detect M 1 from all datasets ( Fig. 2; Supplementary Figs. 1, 2). Existing methods tend to perform well in select settings. For example, Pascal and LDSC perform poorly when genetic signals are very sparse (Fig. 2b); RSS-E performs poorly when enrichment patterns are inconsistent with its modeling assumptions (Fig. 2c). Except for datasets with weak genetic signals on the network (Fig. 2d), RSS-NET performs consistently well in all scenarios. This is expected because the flexible model underlying RSS-NET can capture various genetic architectures and enrichment patterns. In practice, one rarely knows beforehand the correct architecture, which makes the flexibility of RSS-NET appealing.
Genetic associations of complex traits are enriched in regulatory regions 5,6 . Since a regulatory network is a set of genes linked by REs, it is important to confirm that network enrichments identified by RSS-NET are not driven by general regulatory enrichments. To this end, we simulated negative datasets with enriched associations in random SNPs that are near genes ( Fig. 3a; Supplementary Fig. 3) or REs ( Fig. 3b; Supplementary Fig. 4). The results show that RSS-NET is unlikely to yield false discoveries due to arbitrary enrichments in regulatory regions, and it is yet more powerful than other methods.
Minor allele frequency (MAF)-and LD-dependent genetic architectures are identified in complex traits 27 . To assess the impact of MAF-and LD-dependence on RSS-NET results, we simulated MAF-and LD-dependent SNP effects (β) from an additive model of 10 MAF bins and 6 LD-related annotations 27 , which were then used to create negative datasets ( Fig. 3c; Supplementary Fig. 5). Similarly, enrichments identified by RSS-NET are unlikely to be false positives induced by MAF-and LDdependence.
Interconnections within regulatory programs play key roles in driving context specificity 19 and propagating disease risk 22 , but existing methods often ignore the edge information. In contrast, RSS-NET leverages the full topology of a given network. The topology-aware feature increases the potential of RSS-NET to identify the most relevant network for a trait among candidates that share many nodes but differ in edges. To illustrate this feature, we designed a scenario where a real target network and random candidates had the same nodes and edge counts, but different edges. We simulated positive and negative datasets where genetic associations were enriched in the target network Weak network signals (polygenic) d Fig. 2 Flexibility of RSS-NET to identify network-level enrichments from GWAS summary statistics. We used a B cell-specific regulatory network and real genotypes of 348,965 genome-wide SNPs to simulate negative and positive individual-level data under two genetic architectures ("sparse" and "polygenic"). We simulated SNP effects (β) for negative datasets from the baseline model (M 0 : θ = 0 and σ 2 = 0). We simulated β for positive datasets from the enrichment model (M 1 : θ > 0 or σ 2 > 0) for the target network under three scenarios: a θ > 0, σ 2 = 0; b θ = 0, σ 2 > 0; c θ > 0, σ 2 > 0. Using the simulated individual-level data we computed single-SNP association statistics, on which we compared RSS-NET with RSS-E 16 , LDSC-baseline 13 , LDSC-baselineLD 27 , and Pascal 26 using their default setups (Methods). Pascal includes two gene ("max": maximum-of-χ 2 ; "sum": sum-of-χ 2 ) and two pathway ("chi": χ 2 approximation; "emp": empirical sampling) scoring options. For each dataset, Pascal and LDSC methods produced P-values, whereas RSS-E and RSS-NET produced BFs; these statistics were used to rank the significance of enrichments. and random candidates respectively, and then tested enrichment of the target network on all datasets. As expected, only RSS-NET can reliably distinguish true enrichments of the target network from enrichments of its edge-altered counterparts ( Fig. 3d; Supplementary Fig. 6).
To benchmark its prioritization component, we compared RSS-NET with gene-based association modules in RSS-E 16 and Pascal 26 ( Fig. 4; Supplementary Figs. 7-9). Consistent with previous work 16 , RSS methods outperform Pascal methods even without network enrichment (Fig. 4a). This is because RSS-NET and RSS-E exploit a multiple regression framework 25 to learn the genetic architecture from data of all genes and assess their effects jointly, whereas Pascal only uses data of a single gene to estimate its effect. Similar to enrichment simulations (Fig. 2), RSS-NET outperforms RSS-E in prioritizing genes across different architectures ( Fig. 4b-d). This again highlights the flexibility of RSS-NET.
Finally, since RSS-NET uses network as is and most networks to date are algorithmically inferred, we performed simulations to assess the robustness of RSS-NET under noisy networks. Specifically, we simulated datasets from a real target network, created noisy networks by randomly removing edges from this real target, and then fed the noisy networks (rather than the real one) to RSS-NET. By exploiting retained true nodes and edges, RSS-NET produces reliable results in identifying both network enrichments and genetic associations, and unsurprisingly, its performance drops as the noise level increases ( Supplementary  Fig. 10).
In conclusion, RSS-NET is adaptive to various genetic architectures and enrichment patterns, it is robust to a wide range of model mis-specification, and it outperforms existing related methods. To further investigate its real-world utility, we applied RSS-NET to analyze 18 complex traits and 38 regulatory networks.
Enrichment analyses of 38 networks across 18 traits. We first inferred 20 whole-genome regulatory networks for 38 human cell types and tissues (Methods; Supplementary Data 1) from public data 29 of paired expression and chromatin accessibility (PECA). On average each network has 431 TFs, 3,298 TGs, and 93,764 weighted TF-TG edges. Clustering showed that networks recapitulated context similarity, with immune cells and brain regions grouping together as two units ( Fig. 5a; Supplementary Fig. 11).
As a validation, we assessed the pairwise similarity between the 38 PECA-based networks and 394 human cell type-and tissuespecific regulatory networks 18 reconstructed from independent cap analysis of gene expression (CAGE) data 7,8 . Reassuringly, PECA-and CAGE-based networks often reached maximum overlap when they were derived from biosamples of matched cell types or tissues ( Fig. 5b; Supplementary Fig. 12), showing that the context specificity of regulatory networks is replicable.
On the 38 networks, we applied RSS-NET to analyze 1.1 million common SNPs 31  To check whether observed enrichments could be driven by general regulatory enrichments, we created a "near-gene" control network with 18,334 protein-coding autosomal genes as nodes and no edges, and analyzed this control with RSS-NET on the same GWAS data. For most traits, the near-gene control has substantially weaker enrichment than the actual networks. In particular, 512 out of 684 trait-network pairs (one-sided binomial P = 2.2 × 10 −40 ) showed stronger enrichments than their neargene counterparts (average log10 BF increase: 13.94; one-sided t P = 5.1 × 10 −15 ), and, 16 out of 18 traits had multiple networks more enriched than the near-gene control (minimum: 5;  Table 2). Consistent with simulations ( Fig. 3a, b), these results indicate that network enrichments identified by RSS-NET are unlikely driven by arbitrary enrichments harbored in the vicinity of genes.
Among 512 trait-network pairs passing the near-gene enrichment control, we further examined whether the observed enrichments could be confounded by network properties or genomic annotations. We did not observe any correlation between BFs and three network features (proportion of SNPs in a network: Pearson R = −3.0 × 10 −2 , two-sided P = 0.49; node counts: R = −5.4 × 10 −2 , P = 0.23; edge counts: R = −9.2 × 10 −3 , P = 0.84). To check confounding effects of genomic annotations, we computed the correlation between BFs and proportions of SNPs falling into both a network and each of 73 functional categories 27 , and we did not find any significant correlation (−0.13 < R < −0.01, P > 0.05/73). Similar patterns hold for all 684 trait-network pairs (Supplementary Table 3 and Data 2). Together, the results suggest that observed enrichments are unlikely driven by generic network or genome features.
Enrichment patterns varied considerably among traits ( Fig. 5c; Supplementary Table 4). For type 2 diabetes (T2D), two of five networks passing the near-gene enrichment control showed the strongest support for M 11 . Many networks showed the strongest support for M 12 in breast cancer (10), body mass index (BMI, 14), waist-hip ratio (37), and schizophrenia (38). Since one rarely knows the true enrichment patterns a priori, and M 1 includes {M 11 , M 12 , M 13 } as special cases, we used M 1 -based BFs throughout this study. Collectively, these results highlight the heterogeneity of network enrichments across traits, which can be potentially learned from data by flexible approaches like RSS-NET.
Top-ranked enrichments recapitulated many trait-context links reported in previous GWAS. Genetic associations with BMI were enriched in the networks of pancreas (BF = 2.07 × 10 13 ), bowel (BF = 8.02 × 10 12 ), and adipose (BF = 4.73 × 10 12 ), consistent with the roles of obesity-related genes in insulin biology and energy metabolism. Networks of immune cells showed Using the simulated individual-level data we computed single-SNP association statistics, on which we compared RSS-NET with genelevel association components of RSS-E 16 and Pascal 26 . RSS-E is a special case of RSS-NET assuming σ 2 = 0, and RSS-E-baseline is a special case of RSS-E assuming θ = 0. Pascal includes two gene scoring options: maximum-of-χ 2 ("max") and sum-of-χ 2 ("sum"). Given a network, Pascal and RSS-Ebaseline do not leverage any network information, RSS-E ignores the edge information, and RSS-NET exploits the full topology. Each scenario contains 200 datasets and each dataset contains 16,954 autosomal protein-coding genes for testing. We defined a gene as "trait-associated'' if at least one SNP j within 100 kb of the transcribed region of this gene had non-zero effect (β j ≠ 0). For each gene in each dataset, RSS methods produced posterior probabilities that the gene was trait-associated (P 1 ), whereas Pascal methods produced association P-values; these statistics were used to rank the significance of gene-level associations. The first row of each panel displays ROC curves and AUROCs for all methods, with dashed diagonal lines indicating random performance (AUROC = 0.5). The second row of each panel displays precision-recall (PRC) curves and areas under PRC curves (AUPRCs) for all methods, with dashed horizontal lines indicating random performance. For both AUROC and AUPRC, higher value indicates better performance. Simulation details and additional results are provided in Supplementary Figs    To show that RSS-NET can be applied more generally, we analyzed the CAGE-based networks 18 of 20 cell types and tissues that were present in 38 PECA-based networks ( Fig. 5d; Supplementary Fig. 14). PECA-based networks often produced larger BFs than their CAGE-based counterparts on the same GWAS data (average log10 BF increase: 17.36; one-sided t P = 1.4 × 10 −11 ), suggesting that PECA-based networks are more enriched in genetic signals. Reassuringly, PECA-and CAGEbased networks consistently highlighted known trait-context links (e.g., immune cells and autoimmune diseases, muscle tissues and heart diseases). For some traits PECA-based networks produced more informative results. For example, CAGE-based analysis of HDL showed a broad enrichment pattern across cell types and tissues (which is consistent with previous connectivity analysis 18 of the same data), whereas PECA-based analysis identified liver as the top-enriched context by a wide margin. Although not our main focus, these results highlight the potential for RSS-NET to systematically evaluate different network inferences in GWAS.

Enrichment-informed prioritization of network genes.
A key feature of RSS-NET is that inferred network enrichments automatically contribute to prioritization of network genes (Method).
Specifically, for each locus RSS-NET produces P base 1 , P near 1 and P net 1 , the posterior probabilities that at least one SNP in the locus is associated with the trait, assuming M 0 , M 1 for the near-gene control network and M 1 for a given network, respectively. When multiple networks are enriched, RSS-NET produces P bma 1 by averaging P net 1 over all networks passing the near-gene control, weighted by their BFs. This allows us to assess genetic associations in light of enrichment without having to select a single enriched network. Differences between enrichment estimates (P net 1 or P bma 1 ) and reference estimates (P base 1 or P near 1 ) reflect the impact of network on a locus.
RSS-NET enhances genetic association detection by leveraging inferred enrichments. To quantify this improvement, for each trait we calculated the proportion of genes with higher P bma 1 than reference estimates (P base 1 or P near 1 ), among genes with reference P 1 passing a given cutoff (Fig. 5e). When using P base 1 as reference, we observed high proportions of genes with P bma 1 > P base 1 (median: 82-98%) across a wide range of P base 1 -cutoffs (0−0.9), and as expected, the improvement decreased as the reference cutoff increased. When using P near 1 as reference, we observed less genes with improved P bma 1 than using P base 1 (one-sided Wilcoxon P = 9.8 × 10 −4 ), suggesting the observed improvement might be partially due to general near-gene enrichments, but proportions of genes with P bma 1 > P near 1 remained high (median: 74-94%) nonetheless. Similar patterns occurred when we repeated the analysis with P net 1 across 512 trait-network pairs (Supplementary Table 5). Together the results demonstrate the strong influence of network enrichments on nominating additional traitassociated genes.
RSS-NET tends to promote more genes in networks with stronger enrichments. For each trait, the proportion of genes with P net 1 > P near 1 in a network is often positively correlated with the network enrichment BF (R: 0.28−0.91; Supplementary Table 6). When a gene belongs to multiple networks, the highest P net 1 often occurs in the top-enriched networks (Fig. 6). We illustrate this coherent pattern with MT1G, a liver-active 9 gene prioritized for HDL by RSS-NET and also implicated in a recent multi-ancestry genome-wide interaction analysis of HDL 38 . Although MT1G belongs to regulatory networks of 18 contexts, only the top enrichment in liver informs a strong association between MT1G and HDL (P net 1 ¼ 0:98), and remaining networks with weaker enrichments yield minimal improvement (P base 1 ¼ 0:10, P net 1 : 0:14À0:19). RSS-NET recapitulates many genes implicated in the same GWAS. For each analyzed dataset we downloaded the GWASimplicated genes from the GWAS Catalog 1 and computed the proportion of these genes with high P bma 1 . With a stringent cutoff P bma 1 ≥ 0:9, we observed a significant overlap (median across traits: 69%; median two-sided Fisher exact P = 1.2 × 10 −26 ; Supplementary Table 7). Reassuringly, many recapitulated genes are well-established for the traits (Supplementary Table 8), such as CACNA1C for schizophrenia, TCF7L2 for T2D, APOB for lipids, and STAT4 for autoimmune diseases.
RSS-NET also uncovers putative associations that were not reported in the same GWAS. To demonstrate that many of these previously undescribed associations are potentially real, we exploited 15 analyzed traits that each had a updated GWAS with larger sample size. In each case, we obtained newly implicated genes from the GWAS Catalog 1 and computed the proportion of these genes that were identified by RSS-NET (P bma 1 ≥ 0:9). The overlap proportions remained significant (median: 12%; median two-sided Fisher exact P = 1.9 × 10 −5 ; Supplementary  NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-22588-0 ARTICLE genes that can be validated by later GWAS with additional samples. Among these validated genes, many are strongly supported by multiple lines of external evidence (Table 1). A particular example is NR0B2, a liver-active 9 gene prioritized for HDL by RSS-NET (P base 1 ¼ 0:84, P net 1 ¼ 0:98) but not identified by standard GWAS 39 of the same data (minimum single-SNP association P = 1.4 × 10 −7 within 100 kb, n = 99, 900). NR0B2 was associated with mouse lipid traits [40][41][42] and human obesity 43 , and identified in a later GWAS of HDL 44 with doubled sample size (P = 9.7 × 10 −16 , n = 187, 056).
Biological and clinical relevance of prioritized genes. Besides looking up overlaps with GWAS publications, we crossreferenced RSS-NET prioritized genes (P bma 1 ≥ 0:9) with multiple orthogonal databases to systematically assess their biological and therapeutic themes.
Mouse phenomics provides important resources to study genetics of human traits 45 . Here we evaluated overlap between RSS-NET prioritized genes and genes implicated in 27 categories of knockout mouse phenotypes 46 . Network-informed genes (P bma 1 ≥ 0:9) were significantly enriched in 128 mouse-human trait pairs (FDR ≤ 0.1; Supplementary Data 4). Fewer significant pairs were identified without network information (119 for P near 1 ≥ 0:9; 80 for P base 1 ≥ 0:9). For many human traits, top enrichments of network-prioritized genes occurred in closely related mouse phenotypes (Fig. 5f). Genes prioritized for schizophrenia were strongly enriched in nervous, neurological and growth phenotypes (OR: 1.77-2.04). Genes prioritized for autoimmune diseases were strongly enriched in immune and hematopoietic phenotypes (OR: 2.05-2.35). The cardiovascular system showed strong enrichments of genes prioritized for heart conditions (OR: 2.45-2.92). The biliary system showed strong enrichments of genes prioritized for lipids, BMI, CAD, and T2D (OR: 2.16-10.78). The phenotypically matched cross-species enrichments strengthen the biological relevance of RSS-NET results.
Genes causing Mendelian diseases often contribute to complex traits 47 . Here we quantified overlap between RSS-NET prioritized genes and genes causing 19 categories 48 of Mendelian disorders 49 . Leveraging regulatory networks (P bma 1 ≥ 0:9), we observed 47 significantly enriched Mendelian-complex trait pairs (FDR ≤ 0.1; 44 for P near 1 ≥ 0:9; 31 for P base 1 ≥ 0:9; Supplementary Data 5), among which the top-ranked ones were often phenotypically matched (Fig. 5g). Genes prioritized for schizophrenia were strongly enriched in Mendelian development and psychiatric disorders (OR: 2.22-2.23). Genes prioritized for AF and heart rate were strongly enriched in arrhythmia (OR: 7.16-8.28). Genes prioritized for autoimmune diseases were strongly enriched in Table 1 Examples of RSS-NET highlighted genes that were not reported in GWAS of the same data but were implicated in later GWAS with increased sample sizes (genome-wide significance threshold: single-SNP association P < 5 × 10 −8 ).  51 . To evaluate their potential in drug discovery, we examined whether RSS-NET prioritized genes are pharmacologically active and clinically relevant 52 . We identified genes with drug indications matching GWAS traits. One identical match is EDNRA, a gene that is prioritized for CAD (P base 1 ¼ 0:57, P net 1 ¼ 0:82 in aorta) and also a successful target of approved drugs for cardiovascular diseases (Table 1). We identified genes with drug indications closely related to GWAS traits. For example, TTR is prioritized for Alzheimer (P base 1 ¼ 0:64, P bma 1 ¼ 0:94) and also a successful target of approved drugs for amyloidosis ( Table 2). For early-stage development, overlaps between drug indications and GWAS traits may provide additional genetic confidence. For example, HCAR3 is prioritized for HDL (P base 1 ¼ 0:85, P bma 1 ¼ 0:92) and also a clinical trial target for lipid metabolism disorders ( Table 2). Other examples include CASP8 with cancer, NFKB2 with IBD, and DLG4 with stroke (Tables 1, 2). For some genes we found mismatches between drug indications and GWAS traits, which could suggest drug repurposing opportunities 53 . For example, CSF3 is prioritized for AF (P base and also a successful target of an approved drug for aplastic anemia (AA). Since CSF3 is associated with various blood cell traits in mouse 54 and human 55 , and inflammation plays a role in both AA and AF etiology 36,37,56 , it is tempting to assess effects of the approved AA drug on AF. Mechanistic evaluations are required to understand the prioritized therapeutic genes, but they could form a useful basis for future studies.

Discussion
We present RSS-NET, a topology-aware method for integrative analysis of regulatory networks and GWAS summary data. We demonstrate the improvement of RSS-NET over existing methods through extensive simulations, and illustrate its potential to yield biological and therapeutic insights via analyses of 38 networks and 18 traits. With multi-omics integration becoming a routine in GWAS, we expect that researchers will find RSS-NET useful.
Compared with existing integrative approaches, RSS-NET has several key strengths. First, unlike many methods that require loci passing a significance threshold 11,12,17 , RSS-NET uses data from genome-wide common variants. This potentially allows RSS-NET to identify subtle enrichments even in studies with few significant hits. Second, RSS-NET models enrichments directly as increased rates (θ) and sizes (σ 2 ) of SNP-level associations, and thus bypasses the issue of converting SNP-level summary data to genelevel statistics 17,18,26 . Third, RSS-NET inherits from RSS-E 16 an important feature that inferred enrichments automatically highlight which network genes are most likely to be trait-associated. This prioritization component, though useful, is missing in current polygenic analyses 13,15,24,27 . Fourth, by making flexible modeling assumptions, RSS-NET is adaptive to unknown genetic architectures.
RSS-NET allows us to study complex trait genetics through the lens of regulatory topology. Complementing previous connectivity analyses [17][18][19]24 , RSS-NET highlights a consistent pattern that genetic signals of complex traits often distribute across genome via regulatory topology. RSS-NET further leverages topology enrichments to enhance trait-associated gene discovery.
The topology awareness of RSS-NET in both enrichment and prioritization analyses is enabled by a model that decomposes the effect of a single SNP into effects of multiple (cis or trans) genes through a regulatory network.
RSS-NET depends critically on the quality of input networks. The more accurate networks are, the better performance RSS-NET achieves. Currently, our understanding of regulatory networks remains incomplete, and most of the available networks are algorithmically inferred [17][18][19][20] . Artifacts in inferred networks can bias RSS-NET results; however, our simulations confirm the robustness of RSS-NET when input networks are not severely deviated from ground truth. The modular design of RSS-NET enables systematic assessment of various networks in the same GWAS and provides interpretable performance metrics, as illustrated in our comparison of PECA-and CAGE-based networks. As more accurate networks become available in diverse cellular contexts, the performance of RSS-NET will be markedly enhanced.
Like any method, RSS-NET has several limitations in its current form. First, despite its prioritization feature, RSS-NET does not attempt to pinpoint associations to causal SNPs within prioritized loci. For this task, we recommend off-the-shelf finemapping methods 57 . Second, the computation time of RSS-NET increases as the total number of analyzed SNPs increases, and thus our simulations and analyses focused on 0.35-1.19 million genome-wide common SNPs 28,31 . Relaxing the complexity will allow RSS-NET to analyze more SNPs jointly. Third, RSS-NET uses a simple method to derive SNP-gene relevance (c jg ) from expression quantitative trait loci (eQTL). A more principled approach would be applying the RSS likelihood 25 to eQTL summary data (as we did in GWAS) and using the estimated SNP effects to specify c jg . However, our initial assessments indicated that the model-based approach was limited by the small sample sizes of current eQTL studies 9,10 . With eQTL studies reaching large sample sizes 58 comparable to current GWAS 1 , this approach may improve c jg specification in RSS-NET. Fourth, RSS-NET analyzes one network at a time. Since a complex disease typically manifests in various sites, multiple cellular networks are likely to mediate disease risk jointly. To extend RSS-NET to incorporate multiple networks, an intuitive idea would be representing the total effect of a SNP as an average of its effect in each network, weighted by network relevance for a disease. Fifth, RSS-NET does not include known SNP-level 13,24,27 or gene-level 14-16 annotations. Although our mis-specification simulations and near-gene control analyses confirm that RSS-NET is robust to generic enrichments of known features, accounting for known annotations can help interpret observed network enrichments 24 . Our preliminary experiments showed that incorporating additional networks or annotations in RSS-NET increased computation costs. Hence, we view developing computationally efficient multinetwork, multi-annotation methods as an important area for future work.
In summary, improved understanding of complex trait genetics requires biologically informed models beyond the standard one employed in GWAS. By modeling context-specific regulatory topology, RSS-NET is a step forward in this direction.
Gene regulatory networks. In this study a regulatory network is a directed bipartite graph {V TF , V TG , E TF→TG }, where V TF and V TG denote the node sets of TFs and TGs respectively, and E TF→TG denotes the set of TF-to-TG edges, summarizing how TFs regulate TGs through REs (Fig. 1b; Supplementary Note 4). Each edge has a weight between 0 and 1, measuring the relative regulation strength of a TF on a TG.
We inferred 38 regulatory networks from context-matched sequencing data of gene expression (e.g., RNA-seq) and chromatin accessibility (e.g., DNAse-seq or ATAC-seq). We obtained these PECA data from ENCODE 29 (https://www. encodeproject.org, accessed December 14, 2018) and GTEx 9 (https://gtexportal. org, accessed July 13, 2019); see Supplementary Data 1. The networkconstruction software and TF-motif information are available at https://github. com/suwonglab/PECA. The 38 networks are available at https://github.com/ suwonglab/rss-net, with descriptive statistics provided in Supplementary  Tables 9-11. We first constructed an "omnibus" network from PECA data of 201 biosamples across 80 cell types and tissues, using a regression-based method 20 . In brief, by modeling the distribution of TG expression levels conditional on RE accessibility levels and TF expression levels, we estimated a regression coefficient for each TF-TG pair. We selected a TF-TG pair as the network edge if this estimated coefficient was significantly non-zero, and divided the estimate by the maximum of estimates for all TF-TG pairs to set a (0, 1)-scale edge weight. We also estimated a regression coefficient for each RE-TG pair, which reflected the regulating strengths of REs on TGs and was later used to construct context-specific networks, i.e., {I it } in Eq. (1). Here we defined REs as open chromatin peaks called from accessibility sequencing data by MACS2 59 (https://github.com/macs3-project/MACS, accessed July 12, 2018).
With the omnibus network in place, we then constructed context-specific networks for 5 immune cell types, 5 brain regions and 27 non-brain tissues. For each context (tissue or cell type), we computed a trans-regulation score (TRS) between TF g and TG t: where R gt is the correlation of TF g and TG t expression levels across all contexts; f f TF g ; f TG t ; f RE i g are normalized context-specific expression (TF g, TG t) and accessibility (RE i) levels (ỹ ¼ y 2 =y med , where y denotes the actual accessibility or expression level in a given context, and y med denotes median level across all contexts); B gi reflects the motif binding strength of TF g on RE i, defined as the sum of motif position weight matrix-based log-odds probabilities of all binding sites on RE i and calculated by HOMER 60 (http://homer.ucsd.edu/homer/, accessed July 12, 2018); and I it reflects the overall regulating strength of RE i on TG t, provided by the omnibus network. TRS naturally ranks and selects context-specific TF-TG edges because a larger value of TRS gt indicates a stronger regulating strength of TF g on TG t in the given context. We set (0, 1)-scale TF-TG edge weights by computing log 2 ð1 þ TRS gt Þ=max ði;jÞ flog 2 ð1 þ TRS ij Þg.
To validate PECA-based networks and illustrate RSS-NET as a generally applicable tool, we also analyzed 394 cell type-and tissue-specific TF-TG circuits 18 inferred from independent CAGE data 7,8 (http://regulatorycircuits.org/, accessed May 8, 2019). When evaluating the similarity between PECA-and CAGE-based networks ( Fig. 5b; Supplementary Fig. 12), we used their full node and edge sets to compute Jaccard indices. When running RSS-NET on context-matched PECA-and CAGE-based networks ( Fig. 5d; Supplementary Fig. 14), we selected top-ranked CAGE-based edges to match PECA-based edge counts (Supplementary Table 10 When quantifying overlaps between RSS-NET prioritized genes and mouse or Mendelian genes, we used all genes for each GWAS trait. We repeated the overlap analysis under the same significance cutoff (FDR ≤ 0.1) after excluding genes implicated in the same or later GWAS (Supplementary Table 7). Since GWASimplicated genes overlap significantly with phenotypically-matched mouse and Mendelian genes (median two-sided Fisher exact P = 7.1 × 10 −7 ), we identified fewer discoveries as expected (mouse-human pairs: 26, Mendelian-complex pairs: 4; Supplementary Data 4-5), but we obtained consistent effect sizes nonetheless (mouse R = 0.78, two-sided P = 8.6 × 10 −73 ; Mendelian R = 0.89, P = 9.0 × 10 −74 ; Supplementary Fig. 15).
Network-induced effect size distribution. We model the total effect of SNP j on a given trait β j as where π j denotes the probability that SNP j is associated with the trait (β j ≠ 0), N ðμ j ; σ 2 0 Þ denotes a normal distribution with mean μ j and variance σ 2 0 specifying the effect size of a trait-associated SNP j, and δ 0 denotes point mass at zero (β j = 0).
We model the trait-association probability π j as log 10 π j where θ 0 < 0 captures the genome-wide background proportion of trait-associated SNPs, θ > 0 reflects the increase in probability, on the log10-odds scale, that a SNP near network genes and REs is trait-associated, and a j reflects the proximity of SNP j to a network. Following previous analyses 15,16,24 , we let a j = 1 if SNP j is within 100 kb of any member gene (TF, TG) or RE for a given network. Equation (3) suggests that if a cell type or tissue plays an important role in a trait then genetic associations may occur more often in SNPs involved in the corresponding network genes and REs than expected by chance. We model the mean effect size μ j as where O j is the set of all nearby or distal genes contributing to the total effect of SNP j, w jg measures the relevance between SNP j and gene g, and γ jg denotes the effect of SNP j on a trait due to gene g. Equation (4) provides a general decomposition of total SNP effect into gene effects through {O j , w jg }.
Here we use a TF-TG network to specify {O j , w jg } in Eq. (4): where G j is the set of all genes within 1 Mb window of SNP j (a standard window size used in cis-eQTL studies 9,10,58 ), c jg measures the relative impact of a SNP j on gene g, T g is the set of all genes directly regulated by TF g in a given network (T g is empty if gene g is not a TF), and v gt measures the relative impact of a TF g on its TG t. Since a genome-wide analysis typically involves many SNPs and genes, we fix {T g , v gt , c jg } to ensure the identifiability of Eq. (5). We use inferred edges and weights of a contextspecific TF-TG network 20,29 to specify T g and v gt respectively. We use contextmatched cis-eQTL 9,10,58 to specify c jg (Supplementary Note 5 and Tables 12, 13). Equation (5) suggests that the total effect of a SNP may fan out through some regulatory network of multiple (nearby or distal) genes to affect the trait 22 . We model the random effect γ jg of SNP j due to gene g as where the SNP-level subscript j in γ jg ensures the exchangeability of β j in Eq. (2); see Supplementary Note 6. Equation (6) uses a constant σ 2 for computational convenience. Equation (6) could be modified by letting σ 2 depend on functional annotations 13,27 of SNP j and context-specific expression [14][15][16] of gene g, though possibly at higher computational cost. Equations (2), (4), and (6) implies a variance decomposition for SNP effect: We hypothesize that Eq. (7) may provide an alternative approach to heritability analyses 13,24,27 and we plan to investigate it elsewhere.
Bayesian hierarchical modeling. Consider a GWAS with n unrelated individuals measured on p SNPs. In practice we do not know the true SNP-level effects β :¼ ðβ 1 ; ; β p Þ 0 in Eq. (2), but we can infer them from GWAS summary statistics and LD estimates. Specifically, we perform Bayesian inference for β by combining the network-based prior defined by Eqs. (2)-(6) with the RSS likelihood 25 where b β :¼ ðβ 1 ; ;β p Þ 0 , b S :¼ diagðb sÞ is a p × p diagonal matrix with b s :¼ ðŝ 1 ; ;ŝ p Þ 0 , fβ j ;ŝ j g are estimated single-SNP effect size of each SNP j and its standard error from the GWAS, and b R :¼ ½r ij is the p × p LD matrix estimated from a reference panel with ancestry matching the GWAS.
RSS-NET, defined by Eqs.
(2)-(6), and (8), consists of four unknown hyperparameters fθ 0 ; θ; σ 2 0 ; σ 2 g. To specify hyper-priors, we first introduce two free parameters {η, ρ} to re-parameterize fσ 2 0 ; σ 2 g: where, roughly, η represents the proportion of the total phenotypic variation explained by p SNPs, and ρ represents the proportion of total genetic variation explained by network annotations {O j , w jg }. Because nŝ 2 j approximates the ratio of phenotype variance to genotype variance, Eq. (9) ensures that SNP effects (β) do not rely on sample size n and have the same measurement unit as the trait. See Supplementary Note 7 for derivation of Eq. (9).
We then place independent uniform grid priors on {θ 0 , θ, η, ρ} (Supplementary  Table 14). These simple hyper-priors produce accurate posterior estimates for hyper-parameters in simulations ( Supplementary Fig. 16). RSS-NET results are robust to grid choice on both simulated and real data .
(If one had specific information about {θ 0 , θ, η, ρ} in a given setting then this could be incorporated in the hyper-priors).
Network enrichment. To assess whether a regulatory network is enriched for genetic associations with a trait, we evaluate a Bayes factor (BF): where f( ⋅ ) denotes probability densities, a is defined in Eq.  Table 15). As the genetic basis of most complex traits remains unknown, we find it impractical to fix some significance threshold. Instead we recommend an adaptive approach. Specifically, for a given GWAS we run RSS-NET on a near-gene control network containing all genes as nodes and no edges (i.e., a j = 1 for all SNPs within 100 kb of any gene and v gt = 0 for all TF-TG pairs), and we use the resulting BF as the enrichment threshold in this GWAS. Our analyses show three advantages of this approach. First, it is adaptive to study heterogeneity such as trait differences and sample sizes (Supplementary Table 1). Second, it accounts for generic enrichments of genetic signals residing near genes. Third, it facilitates comparisons with non-Bayesian methods based on P-values (Supplementary Table 2).
Locus association. To identify the association between a locus and a trait, we compute P 1 , the posterior probability that at least one SNP in the locus is associated with the trait: where D is a shorthand for the input data of RSS-NET including GWAS summary statistics f b β; b Sg, LD estimates b R and network annotations {a, O, W}. See Supplementary Note 3 for computation details. For a locus, P base 1 , P near 1 , and P net 1 correspond to P 1 evaluated under the baseline model M 0 , the enrichment model M 1 for the near-gene control network, and M 1 for a given TF-TG network. In this study, we defined a locus as the transcribed region of a gene plus 100 kb up and downstream, and we used "locus" and "gene" interchangeably.
For K networks with enrichments stronger than the near-gene control, we use Bayesian model averaging (BMA) to compute P bma 1 for each locus: where P net 1 ðkÞ and BF(k) are enrichment P 1 and BF for network k. The ability to average across networks in Eq. (12) is an advantage of our Bayesian framework, because it allows us to assess associations in light of network enrichment without having to select a single enriched network.
In this study we used P 1 ≥ 0.9 as the significance cutoff, yielding a median false positive rate 1.24 × 10 −4 and a median false discovery rate 6.43 × 10 −2 in simulations (Supplementary Tables 16, 17). We also highlighted genes with P net 1 > P near 1 ( Fig. 6 and Tables 1, 2), because they showcase the influence of context-specific regulatory topology on prioritizing genetic associations.
Computation time. The total computation time of RSS-NET to analyze a pair of trait and network is determined by the number of genome-wide SNPs analyzed, the size of hyper-parameter grid, and the number of variational iterations till convergence, all of which can vary considerably among studies. It is thus hard to make general statements about computation time. However, to give a specific example, we finished the analysis of 1,032,214 HapMap3 SNPs and liver network for HDL within 12 hours in a standard computer cluster (60 nodes, 8 CPUs, and 32 Gb memory per node).
The number of genome-wide SNPs analyzed (p) affects the computation time of RSS-NET in two distinct ways. First, the per-iteration complexity of RSS-NET is linear with p (Box 1; Supplementary Note 1). Second, a large p defines a large optimization problem, often requiring many iterations to converge. To quantify the impact of p on computation time, we simulated datasets from different sets of genome-wide SNPs, analyzed them with RSS-NET on identical computers, and compared the computation time ( Supplementary Fig. 9). When p increased from 348,965 to 1,030,397, on average the total computation time was four times longer (one-sided Wilcoxon P = 8.0 × 10 −132 ).
Simulation overview. To assess the network-induced model for SNP effects (β) in RSS-NET, we simulated a large array of correctly-and mis-specified β for a given target network. Specifically, we generated "positive" datasets where the underlying β was simulated from M 1 for the target network, and "negative" datasets where β was simulated from either M 0 or the following scenarios: (1) random enrichments of near-gene SNPs; (2) random enrichments of near-RE SNPs; (3) MAF-and LDdependent effect sizes; (4) M 1 for edge-altered copies of the target network. For a fair comparison in each scenario, we matched positive and negative datasets by both the number of trait-associated SNPs and the proportion of phenotypic variation explained by all SNPs. See Supplementary Figs. 1-9 for details.
We combined the simulated β with genotypes of 348,965 genome-wide SNPs from 1,458 individuals 28 to simulate phenotypes using an additive multiple-SNP model with Gaussian noise. We performed the standard single-SNP analysis of simulated individual-level datasets to generate GWAS summary statistics, on which we compared RSS-NET with external methods.
Given a context-specific TF-TG network, RSS-E and LDSC methods use the same binary SNP-level annotations {a j } defined in Eq. (3). The interface design of Pascal does not allow direct usage of {a j }. Here we supplied Pascal program with a GMT file containing all member genes of a network and set SNP-to-gene window sizes as 100 kb ("-up = 100000 -down = 100000"). In this study all external methods were used with their default setups, which did not include the edge information of a network.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The 38 network files are available at https://github.com/suwonglab/rss-net (https://doi. org/10.5281/zenodo.4553387). Analysis results of 38 networks and 18 traits are available at https://suwonglab.github.io/rss-net/results. Links and identifiers of other data are specified in Methods, Supplementary Notes 5 and 8. Source data are provided with this paper.