Introduction

Genome-wide association studies (GWAS) have discovered several important disease associations.1 Assigning signals to causal genes is difficult because these signals fall principally within non-coding regions and do not necessarily implicate the nearest gene.2 For example, a signal found in an FTO intron has been shown to physically interact with and lead to differential expression of other genes, but not FTO itself.3 Moreover, evidence suggests that a type 2 diabetes (T2D) GWAS signal at TCF7L2 also influences ACSL5.4

Chromatin interaction studies have discovered genome organization principles including topologically associating domains (TADs).5 TADs are genomic regions defined by increased contact frequency, consistency across cell types and enrichment of insulator element flanks.6 Therefore, TADs can be used as boundaries of where non-coding causal variants will most likely impact tissue-independent function.

Here, we present TAD_Pathways, a novel computational method that uses TADs to determine candidate genes. We then apply the method to bone mineral density (BMD) GWAS findings and test the importance of two candidates in osteoblast function. Our pipeline identified ACP2 as a novel regulator of osteoblast metabolism. A full description of the method and validation is available in the Supplementary Video.

Methods

Computational procedures to identify candidate genes

TAD_Pathways is a computational method using publicly available TAD boundaries to prioritize candidate genes from GWAS SNPs (Figure 1a). Alternative approaches assign SNPs to genes based on nearest gene or by an arbitrary or a linkage disequilibrium (LD)-based window of several kilobases (Figure 1b). For full computational methods, refer to the Supplementary Information.
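The three assignment strategies contrasted here can be sketched in a few lines. This is a minimal illustration, not the TAD_Pathways implementation: the `Gene` class, the function names, the coordinates and the gene labels are all hypothetical, everything sits on a single chromosome, and a gene is counted as belonging to a TAD if it overlaps the TAD, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int
    end: int


def distance(gene, pos):
    """Base-pair distance from a SNP position to a gene (0 if inside it)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(gene.start - pos), abs(gene.end - pos))


def nearest_gene(pos, genes):
    """Nearest-gene approach: keep only the single closest gene."""
    return {min(genes, key=lambda g: distance(g, pos)).name}


def window_genes(pos, genes, half_width):
    """Window approach: keep genes overlapping pos +/- half_width."""
    lo, hi = pos - half_width, pos + half_width
    return {g.name for g in genes if g.start <= hi and g.end >= lo}


def tad_genes(pos, genes, tads):
    """TAD approach: keep genes overlapping the TAD that contains the SNP."""
    for tad_start, tad_end in tads:
        if tad_start <= pos < tad_end:  # half-open interval [start, end)
            return {g.name for g in genes
                    if g.start < tad_end and g.end >= tad_start}
    return set()


# Hypothetical toy locus, not real annotation.
genes = [
    Gene("GENE_A", 100_000, 120_000),
    Gene("GENE_B", 300_000, 340_000),
    Gene("GENE_C", 700_000, 750_000),
    Gene("GENE_D", 1_200_000, 1_250_000),
]
tads = [(0, 1_000_000), (1_000_000, 2_000_000)]
snp = 130_000

by_nearest = nearest_gene(snp, genes)
by_window = window_genes(snp, genes, half_width=100_000)
by_tad = tad_genes(snp, genes, tads)
```

In this toy locus the nearest-gene and window approaches both return only GENE_A, whereas the TAD approach also returns GENE_B and GENE_C, so a causal gene several hundred kilobases away would still reach downstream analysis.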

Figure 1

Concepts motivating our approach. TADs are shown as orange triangles, genes are shown as black lines and a genome-wide significant GWAS signal is shown as a dotted red line. (a) The TAD_Pathways method. An example using BMD GWAS signals is shown. (b) Three hypothetical examples are illustrated by a cartoon. The ground truth causal gene is shaded in red. The method-specific selected genes are shaded in blue. The top panel describes a nearest-gene approach. The nearest gene in this scenario is not the gene actually impacted by the GWAS SNP. The middle panel describes a window approach. Based either on linkage disequilibrium or an arbitrarily sized window, the scenario does not capture the true gene. The bottom panel describes the TAD_Pathways approach. In this scenario, the causal gene is selected for downstream assessment.

Here, we use human embryonic stem cell TAD boundaries as reported by Dixon et al.6 and converted to hg19 by Ho et al.7 to build TAD-based gene sets that consist of all genes falling inside TADs that harbor BMD-associated SNPs. We perform a pathway overrepresentation test8 for the input TAD genes against GO terms.9 This test determines whether the gene set is associated with any term more often than expected by chance. We included both experimentally confirmed and computationally inferred gene annotations, which permits the inclusion of putative genes that do not necessarily have literature support. For validation, we consider only the most significantly enriched term, but a user can also select multiple terms. Our method also supports custom input SNPs. TAD_Pathways software is available at https://github.com/greenelab/tad_pathways_pipeline.
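A pathway overrepresentation test of this kind is commonly formulated as a one-sided hypergeometric test. The sketch below shows that formulation under stated assumptions: the function name and the example counts are hypothetical, and the source does not confirm that the tool cited above computes its P-values in exactly this way.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """One-sided P-value: probability of seeing at least n_overlap term
    genes when n_selected genes are drawn without replacement from a
    background of n_background genes, n_in_term of which carry the term."""
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 4 annotated to the term,
# 3 genes in the TAD-based set, 2 of which are annotated to the term.
p_small = hypergeom_enrichment_p(10, 4, 3, 2)
```

In practice the test is repeated for every GO term, and the resulting P-values are then corrected for multiple testing.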

Experimental knockdown of candidate genes

We investigated two candidate genes predicted by TAD_Pathways: ACP2 and DEAF1. We selected these genes because they are not the nearest gene and are not in the same LD block as the GWAS SNP (Supplementary Figures S1 and S2). Additionally, the genes were not previously known to impact human bone and thus represented potential novel research/treatment avenues. The corresponding BMD GWAS loci rs7932354 (11p11.2) and rs11602954 (11p15.5) were previously assigned to ARHGAP1 and BET1L, respectively.10

These genes were experimentally knocked down in a human fetal osteoblast (hFOB) cell line using a commercial siRNA reagent system in three temporally separated independent technical replicates. The influence of knockdown on gene expression (qPCR), cellular metabolism/proliferation (MTT), and early osteoblast differentiation (ALP) was evaluated within the first 4 days following siRNA transfection. All values are reported as mean±SD with statistical significance determined via two-tailed homoscedastic Student’s t-tests (*P≤0.05, #P≤0.10, NS='not significant'). Complete experimental methods are included in the Supplementary Information.
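For readers unfamiliar with the homoscedastic (pooled-variance) Student's t-test, the following standard-library sketch shows the calculation for two groups of three replicates. The measurements are hypothetical, the function names are illustrative, and the P-value is obtained by numerically integrating the t density rather than by any routine the authors used. In practice, `scipy.stats.ttest_ind` with `equal_var=True` performs the same test.

```python
import math
from statistics import mean, variance


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability of falling in [-|t|, |t|]
    return max(0.0, 1.0 - central_mass)


def student_t_test(a, b):
    """Pooled-variance t-test; returns (t statistic, df, two-tailed P)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / se
    return t, df, two_tailed_p(t, df)


# Hypothetical normalized assay readouts, three replicates per group.
scrambled = [1.0, 1.1, 0.9]
knockdown = [0.5, 0.6, 0.4]
t_stat, dof, p_value = student_t_test(scrambled, knockdown)
```

With three replicates per group the test has only four degrees of freedom, which is why fairly large effects can still fall in the 0.05<P≤0.10 band marked with #.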

Results

TAD_Pathways reveals candidate genes within phenotype-associated TADs

We applied TAD_Pathways to BMD GWAS results derived from replication-requiring journals (see Supplementary Information publications). GWAS curation resulted in the aggregation of 70 unique BMD SNPs. TAD_Pathways implicated ‘Skeletal System Development’ as the top-ranked pathway (Benjamini–Hochberg adjusted P=1.02 × 10⁻⁵). For full BMD TAD_Pathways results refer to Supplementary Table S1. Many candidates were not the nearest gene to the GWAS signal and several had independent eQTL support (Supplementary Table S2).
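The Benjamini–Hochberg adjustment reported above controls the false discovery rate across all tested terms. A minimal sketch of the standard step-up procedure follows; the function name and the input P-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four pathway terms.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
```

Each raw P-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.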

We compared TAD boundary gene aggregation to nearest-gene and LD window (r²>0.4) approaches. The three approaches produced different gene lists, with TAD boundaries aggregating the most genes (Supplementary Figure S3A). We also applied a pathway analysis to each gene set, and the top pathway for all methods was ‘Skeletal System Development’. TAD_Pathways identified 38 total candidate genes and 17 unique genes not discovered by either nearest-gene or LD approaches (Supplementary Figure S3B).
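The count of method-specific genes is a set difference between candidate lists. The sketch below illustrates the comparison with placeholder gene labels. The function name is hypothetical and the sets do not reproduce the study's gene lists.

```python
def unique_to_first(primary, *others):
    """Genes found by the primary method but by none of the others."""
    return set(primary) - set().union(*others)


# Placeholder candidate sets for illustration only.
tad_candidates = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
nearest_candidates = {"GENE_A"}
ld_candidates = {"GENE_A", "GENE_B"}

tad_only = unique_to_first(tad_candidates, nearest_candidates, ld_candidates)
```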

siRNA knockdown of candidate genes in osteoblasts

We targeted the expression of four genes in vitro using siRNA and assessed transcriptional knockdown efficiency (Figure 2). We noted variation across the three controls, with the scrambled siRNA control altering expression of OCN (osteocalcin), IBSP (bone sialoprotein), TNAP and BET1L (P<0.05). Relative to the scrambled siRNA control, OCN was downregulated in all siRNA groups (P<0.05), except for BET1L siRNA (P=0.122). OSX, IBSP and TNAP were not significantly altered by any siRNA treatment (Figure 2).

Figure 2

Real-time PCR of osteoblast differentiation genes and GWAS/TAD hits in hFOB cells. siRNA was used to knock down expression of TNAP (positive control), ARHGAP1, ACP2, BET1L and DEAF1. Relative expression of the osteoblast marker genes OSX, OCN and IBSP suggests that GWAS/TAD hits are not major regulators of bone differentiation in this model. Red bars highlight specificity of each siRNA knockdown. Values represent mean±SD. Statistical significance relative to the scrambled siRNA control is annotated as: *P≤0.05 and #P≤0.10 using a two-tailed Student’s t-test.

Metabolic and osteoblastic activity of TAD_Pathways gene predictions

Treatment with ACP2 siRNA led to a 66.0% reduction in MTT metabolic activity versus the scrambled siRNA control (P=0.012). ARHGAP1 siRNA caused a marginal 38.8% reduction that did not reach the P≤0.05 threshold (P=0.088). siRNA targeted against TNAP, BET1L or DEAF1 did not alter MTT metabolic activity (Figure 3a).

Figure 3

Validating two TAD_Pathways predictions for BMD GWAS hits on hFOB cells. siRNA was used to knock down expression of TNAP, ARHGAP1, ACP2, BET1L and DEAF1. (a) Knockdown of ACP2 decreases cellular metabolic activity, demonstrated using an MTT assay. (b) ALP staining and quantitation indicate that knockdown of TNAP or ACP2 inhibits performance in an osteoblast differentiation assay. Values represent mean±SD. Statistical significance relative to the scrambled siRNA control is annotated as: *P≤0.05 and #P≤0.10 using a two-tailed Student’s t-test.

ALP is highly expressed in osteoblasts: disruption of proliferation or osteoblast differentiation results in ALP downregulation. TNAP siRNA significantly reduced ALP intensity by 5.98±1.77 units versus the scrambled siRNA control (P=0.006). ACP2 siRNA also significantly reduced ALP intensity by 8.74±2.11 units (P=0.003). The scrambled siRNA control stained less intensely than untreated or transfection reagent controls, but this did not reach statistical significance (0.05<P<0.10) (Figure 3b).

Discussion

We show that TAD_Pathways can reveal functional gene to intermediate phenotype relationships using BMD. Several of the TAD_Pathways genes, such as LRP5, are bona fide BMD genes already identified by several methods, thus providing positive controls. However, several BMD GWAS signals do not have obvious nearest-gene associations with bone. Our results suggest that a nearby gene, ACP2, and not the nearest gene, ARHGAP1, regulates osteoblast proliferation/viability. There is modest previous evidence that ACP2 impacts bone in mouse models11 and is thus a promising candidate for follow-up studies.

There are several limitations to our approach. Publication biases from pathway curation present challenges.12 To lessen this bias, we include computationally predicted GO annotations. We used TAD boundaries defined by Dixon et al.,13 whereas increased Hi-C resolution has since reduced TAD sizes. Despite our method using larger TADs, we still identify relevant pathways. However, the method may fail in diseases instigated by aberrant looping. We were also concerned that TAD_Pathways might work only with BMD. To address this, we applied TAD_Pathways to T2D and identified several candidate genes that are also not the nearest gene (see Supplementary Table S3). Moreover, the experimental validation was performed in a tetraploid in vitro cell culture system, which may compensate for gene knockdown. While TAD_Pathways identified several candidate genes, we examined only two, and our validation approach does not directly interrogate each SNP.

One of the investigated GWAS SNPs, rs7932354, located in the ARHGAP1 promoter, is an eQTL for ARHGAP1 in several GTEx tissues14 and is associated with epigenetic marks and alternative genes in HaploReg.15 However, none of these tissues are bone related and our screen implicates ACP2 and not ARHGAP1 in osteoblast processes. Furthermore, LRP4 and PACSIN3 also fall within the rs7932354 TAD and LD block (Supplementary Figure S1). Both genes are associated with bone.16, 17 By contrast, ACP2 is neither the nearest gene nor in the same LD block as the SNP, so it would have been overlooked by nearest-gene and LD-based methods.

In conclusion, TAD_Pathways can be used as a candidate gene discovery tool that leverages features of chromatin looping. TAD_Pathways is different from previous approaches, such as DEPICT18 and MAGENTA,19 because it only requires the trait as user input and can be performed rapidly. Our method builds solely from publicly available GWAS and TAD boundaries. TAD_Pathways also reduces SNP abundance-related gene selection biases present in previous methods by aggregating SNPs directly to TADs instead of genes.20 We believe TAD_Pathways and algorithms that leverage 3D genomic structure will aid in the discovery of novel disease features. A Supplementary Video is available at the European Journal of Human Genetics website.