Integrating Epigenomic Elements and GWASs Identifies BDNF Gene Affecting Bone Mineral Density and Osteoporotic Fracture Risk

To identify susceptibility genes for osteoporosis, we conducted an integrative analysis that combined epigenomic elements and previous genome-wide association studies (GWASs) data, followed by validation at population and functional levels, which could identify common regulatory elements and predict new susceptibility genes that are biologically meaningful to osteoporosis. By this approach, we found a set of distinct epigenomic elements significantly enriched or depleted in the promoters of osteoporosis-associated genes, including 4 transcription factor binding sites, 27 histone marks, and 21 chromatin states segmentation types. Using these epigenomic marks, we performed reverse prediction analysis to prioritize the discovery of new candidate genes. Functional enrichment analysis of all the prioritized genes revealed several key osteoporosis related pathways, including Wnt signaling. Genes with high priority were further subjected to validation using available GWASs datasets. Three genes were significantly associated with spine bone mineral density, including BDNF, PDE4D, and SATB2, which all closely related to bone metabolism. The most significant gene BDNF was also associated with osteoporotic fractures. RNA interference revealed that BDNF knockdown can suppress osteoblast differentiation. Our results demonstrated that epigenomic data could be used to indicate common epigenomic marks to discover additional loci with biological functions for osteoporosis.

Osteoporosis is a major public health problem due to the aging population globally. This common skeletal disease is characterized by low bone mass, poor bone quality, and an increased predisposition to fractures 1 . Osteoporosis is diagnosed clinically through the measurement of bone mineral density (BMD), which is the most widely used predictor of fractures 2,3 . Therapeutic decisions that are aimed at preventing fractures are often based on BMD measurement.
BMD is known to be highly heritable, with heritability estimates between 0.6-0.8 4 . Osteoporotic fracture, an endpoint clinical outcome of osteoporosis, also has moderate heritability of 0.5-0.7 5,6 . During the past few years, genome-wide association studies (GWASs) have been demonstrated to be an effective strategy for genetic dissection of human complex diseases/traits 7 . Through this strategy, over 100 novel genetic loci have been successfully identified for osteoporosis 8 . However, the genetic variants identified so far together explain only a small proportion of the heritability for osteoporosis 9 . Due to the modest genetic effect size and inadequate statistical power, true association signals may not be discovered with the use of a stringent genome-wide significance threshold alone 10 . It is likely that a sizable proportion of those rejected associations are false negative, and methods of interpretation are needed to recognize such associations.
In addition, since most associated SNPs reported by GWASs reside within intronic or intergenic regions 11 , and the majority of the GWASs have not provided much information beyond statistical signals, it is difficult to elucidate the functional mechanisms for those novel susceptibility loci in determining the phenotype. Strikingly, these GWASs SNPs are usually involved in regulating gene expression via some common regulatory or functional Identification of epigenomic elements enriched/depleted in osteoporosis-associated genes. We examined whether or not any of the epigenomic elements were enriched or depleted in the osteoporosis-associated genes. Three groups of epigenomic elements were used in the analyses, including 161 TFBSs, 135 chromatin states, and 273 histone marks, which are summarized in more detail in Supplemental Table S3. A total of 52 epigenomic elements were identified to be significantly enriched or depleted in the osteoporosis-associated genes (Fig. 1).
We then analyzed cell type-specific epigenomic elements which regulate the accessibility of chromatin, such as chromatin states and histone marks. Multiple cell types from ENCODE were used to collect these epigenomic data (Supplemental Table S4). For the chromatin states, 21 out of the 135 chromatin states segmentation types were identified to be enriched or depleted in the promoters of osteoporosis-associated genes ( Fig. 1, Supplemental  Table S3). Specifically, both "poised promoter" and "repressed" chromatin regions were significantly enriched in 3 cell lines, including Gm12878 (Epstein-Barr Virus transformed B-lymphoblastoid), H1hesc (Human Embryonic Stem Cells), and Hmec (Human Mammary Epithelial Cells). On the other hand, depletions of "transcriptional elongation", "weak transcribed", and "transcriptional transition" regions were found in several cell lines, including Gm12878, H1hesc, Hmec, Hepg2 (Hepatocellular Carcinoma), Nhek (Normal Human Epidermal Keratinocytes), Hsmm (Human Skeletal Muscle Myoblasts), and Nhlf (lung fibroblasts). "Strong enhancer" region was depleted in Nhlf and K562 chronic myelogenous leukemia cell lines.

Reverse prediction suggests new susceptibility genes for osteoporosis.
Given that the identified epigenomic elements enriched/depleted in osteoporosis-associated genes could reflect osteoporosis-relevant regulatory factors, we performed reverse prediction analysis to prioritize all genes by these epigenomic features, which would suggest additional and/or novel susceptibility genes with high tendency to influence osteoporosis. We calculated a total score for each gene to make a ranking list. The top 20 genes ranked by the total scores are presented in Table 1. Functional annotation and pathway analysis. To explore whether genes identified by the reverse epigenomic analysis are relevant to osteoporosis and may provide novel targets to osteoporosis, we conducted GSEA on all genes prioritized by the total scores. The results were similar as the original osteoporosis-associated genes. As shown in Table 2, we identified 36 significant pathways (FDR P < 0.05). The enriched pathways related to  osteoporosis are particularly interesting, including Wnt signaling, calcium signaling, Hedgehog signaling, MAPK signaling, and TGF-β signaling pathways. We further applied GRAIL analysis to investigate potential connections between the top 20 genes from reverse prediction analysis and the 259 known osteoporosis-associated genes. As shown in Fig. 2, 10 of the 20 predicted genes were connected with 69 known osteoporosis-associated genes.
Validation at the population level. To confirm the relationship between the predicted genes with osteoporosis, we examined associations between the top 20 genes and BMD in four available GWAS BMD datasets, of which one was from the GEFOS dataset and the other three were from our own group (KCOS, OOS and COS). The basic characteristics of our samples are summarized in Table 3. Among the top 20 genes, 3 genes, including brain-derived neurotrophic factor (BDNF), phosphodiesterase 4D (PDE4D), and SATB homeobox 2 (SATB2), were successfully validated for associations with spine BMD. The detailed association results are summarized in Table 4. According to meta-analysis over the 4 GWAS datasets, two SNPs of BDNF, rs7124442 (3′UTR variant) and rs11030119 (intron variant), achieved P values of 2.47 × 10 −5 and 7.65 × 10 −5 , respectively. Two SNPs of PDE4D, rs2938780 and rs2963826 (both intron variants), achieved P values of 1.15 × 10 −4 and 1.27 × 10 −4 , respectively. And two SNPs of SATB2, rs895526 and rs6704641 (both intron variants), achieved P values of 1.25 × 10 −4 and 1.28 × 10 −4 , respectively. After multiple testing corrections, all of these 6 SNPs from 3 genes remained significant (adjusted P < 0.05), and the directions of their effects on spine BMD were totally consistent  across different studies. The heterogeneity test between studies showed that there was no heterogeneity for all of the SNPs we identified (all Q het P > 0.05, I 2 = 0) ( Table 4). For the above 6 significant SNPs, we also tested for associations with osteoporotic fractures in the CFS sample (Table 5). Both SNPs of BDNF were found to be significantly associated with osteoporotic fractures (rs11030119: P = 0.024; and rs7124442: P = 0.042). These two SNPs were in high LD with each other (pairwise LD r 2 = 0.9). The minor alleles of both SNPs have protective effects on fractures, with the odds ratio (OR) estimated to be 0.58 (95% Figure 2. Connections between the top ranking predicted genes and the known osteoporosis-associated genes. Ten out of the 20 predicted genes, which are marked by black triangle, are connected with 69 known osteoporosis-associated genes. The lines between genes represent individually significant connections that contribute to the positive signal, with the thickness of the lines being inversely proportional to the probability that a literature-based connection would be seen by chance.  To explore the functional relevance of the identified 6 significant SNPs, we performed cis-eQTLs analysis in 462 unrelated human LCLs samples from 1000 Genome Project. Although all of these 6 SNPs were not associated with expression levels of their transcript, we found several surrogate SNPs, which were in high LD with the two significant SNPs (rs7124442 and rs11030119) in BDNF, significantly associated with BDNF mRNA expression levels ( Table 6).
Functional Assays. Based on the significant effect we identified for BDNF, we further investigated the function of BDNF in bone. BDNF is reported to be involved in chondrocyte differentiation, cartilage development and osteogenesis [21][22][23] . Therefore, we tested the role of BDNF in osteoblast biology to assess the function of the identified loci. Real-time PCR revealed that differentiated osteoblasts had higher BDNF expression level than pre-osteoblasts (p < 0.05) (Fig. 3A). After treatment with siRNA against BDNF, we examined the mRNA expression levels of osteoblast differentiation markers, including alkaline phosphatase (ALP), osteocalcin (OCN), collagen type-I (COL1), and runt-related transcription factor 2 (RUNX2) 24 . As shown in Fig. 3B, knockdown of BDNF significantly suppressed the expression of marker genes COL1, RUNX2, and OCN, compared with the control siRNA treated group in differentiated osteoblast cells (p < 0.05). Western blot analysis gave similar results (Fig. 3C), suggesting that BDNF may stimulate osteoblast differentiation, resulting in increased bone formation.

Discussion
With GWAS becoming a convenient and powerful tool for genetic decipherment of complex diseases, an arising new challenge is how to utilize the genomic data efficiently to interpret the GWASs results and better understand the disease mechanisms. The ENCODE project 17 has provided a wealth of various functional elements data,

SNP
Chr Position a A1/A2 Genic position Gene   which can be used to the design, analysis and interpretation of GWASs, and therefore improve our knowledge of human diseases processes 25 . In this study, through integrating epigenomic elements and GWASs data, we identified a set of distinct epigenomic elements associated with osteoporosis. The further reverse analysis based on these epigenomic features predicted a ranking list of candidate genes for osteoporosis, and we successfully identified BDNF as a susceptibility gene for BMD and osteoporotic fractures, which highlights the efficiency of finding missing heritability of osteoporosis by reasonably prioritizing genes using epigenomic data. For the epigenomic elements analysis, we didn't restrict our analysis to any given cell types. It is because that the osteoporosis-associated genes identified by GWASs are disease-specific, not cell type-specific. Moreover, osteoporosis is a systematic metabolic disease, which could be caused by a number of diseases, including diabetes, hyperthyroidism, gastrointestinal disorders, kidney disease, rheumatoid arthritis, and systemic lupus erythematosus. It might lose some information by focusing on any given cell types. For the identified significant epigenomic marks, EZH2 (enhancer of zeste homolog 2) has been reported as a transcription repressor through H3K27me3 26 . Previous studies have revealed that EZH2 could interact with Wnt signaling [27][28][29] , which is a crucial pathway to bone biology and development. Interestingly, we found that most of the enriched epigenomic elements for osteoporosis are repressed or inactive marks, such as "poised promoter", "repressed" chromatin regions, EZH2, and repressive H3K27me3 mark, which suggests that osteoporosis-associated genes tend to be affected by repressive or inactive epigenomic marks, and disruption of these transcriptionally inactive or repressive state might be a factor in the disease.
We suggest a list of novel candidate genes for osteoporosis. Among which, three genes (BDNF, PDE4D, and SATB2) were confirmed for association with spine BMD in four GWAS datasets. Especially, BDNF was also associated with osteoporotic fractures. BDNF encodes a member of the neurotrophin family of growth factors, which are related to the canonical nerve growth factor. BDNF is required for differentiation and survival of specific neuronal subpopulations in both central and peripheral nervous systems 30 . Moreover, BDNF mRNA was previously reported to be expressed in murine osteoblasts (MC3T3-E1 cells) 31 . Several studies have shown that BDNF plays a role in chondrocyte differentiation, cartilage development and osteogenesis [21][22][23] . A phosphorylation-related SNP rs6265 in BDNF has been identified to be associated with BMD in humans 32 . We explored the role of BDNF in osteoblast biology and found that siRNA mediated BDNF knockdown can suppress the expression of osteoblast differentiation markers, suggesting that BDNF may stimulate osteoblast differentiation, resulting in increased bone formation. Pathway analysis revealed that BDNF is involved in the MAPK signaling pathway, which plays an essential role in osteoblast differentiation and skeletal development [33][34][35] . Modulation of bone formation by BDNF may influence fracture risk by affecting both bone mass and bone quality. The other two genes also have potential connection with bone or osteoporosis. PDE4D encodes cyclic AMP-dependent phosphodiesterase 4D. PDE4D selective inhibitors can promote osteoblast differentiation in progenitor cells 36 and increase bone mass by promoting bone formation in normal mice 37 . A previous genetic association study identified a variant in PDE4D associated with lumbar spine BMD in females 38 . SATB2 encodes a nuclear matrix-associated transcription factor and epigenetic regulator that plays a critical role in osteoblast lineage commitment 39 . Targeted knockout of SATB2 in mice could result in impaired osteoblast differentiation and craniofacial skeletal defects 40 . Recent studies suggested SATB2 as a novel sensitive marker of osteoblastic differentiation 41,42 . Together, taking into account of the above lines of biological evidence, our findings further highlights the importance of these genes to the pathogenesis of osteoporosis, and also support our hypotheses that epigenomic data could be used to predict susceptibility genes with functional information for diseases.
Our study has several implications compared with conventional GWAS strategies. First, candidate gene prioritization strategies by epigenomic data could increase the prior probability of an association test, and therefore increase the power of detecting bona fide associations in a study of a given size, which may discover genes that would be missed by traditional association studies relying on strictly P value driven approaches. Second, incorporating epigenomic regulatory information may provide more insight into disease biology and offer strategies for therapeutic development. The susceptibility genes we identified have close relationship with bone metabolism. Functional enrichment analysis of the genes prioritized by the epigenomic elements revealed osteoporosis related pathways, including Wnt signaling, Hedgehog signaling, and MAPK signaling pathways, which are consistent with our prior understanding of the pathophysiology of osteoporosis, and meanwhile implicate the feasibility of our study. Third, we confirmed the significant associations of 3 predicted genes in our 3 GWAS datasets and GEFOS dataset. Specifically, GEFOS is the largest dataset for osteoporosis GWAS meta-analysis 9 . Thus, our association results are robust and may provide convergent validity for our findings.
Our current study also has some limitations. For example, in the gene validation stage, we arbitrarily selected the top 20 genes and further remained significant after multiple testing corrections in four GWAS datasets. Thus, it is likely that some genes that contribute to osteoporosis susceptibility but did not meet our selection criteria could have been missed. Second, for the epigenomic elements analyses, we focused on the promoter regions of genes, since promoter is a critical regulatory region that can work in concert with many other regulatory elements to direct the level of transcription of a given gene. It is easy to find regulatory commonalities, but might neglect some potential meaningful epigenomic elements located on other regions of genome.
In conclusion, through the integrated analysis of GWASs and epigenomic data, we identified a set of significant regulatory elements enriched in osteoporosis-associated genes. We also discovered BDNF as a susceptibility gene implicated in osteoporosis that is biologically meaningful. Beyond generating a list of associated SNPs by statistical signals, our findings demonstrate that an integrative approach combining GWASs and epigenomic profiling could lead to the identification of additional loci with functional information underlying osteoporosis. The genes may provide future targets for research into the etiology and treatment of osteoporosis.

Materials and Methods
Acquisition of osteoporosis-associated genes. We used the National Human Genome Research Institute (NHGRI) GWAS Catalog 43 (www.genome.gov/gwastudies downloaded on Apr 20, 2015) and Phenotype-Genotype Integrator (PheGenI) database 44 (http://www.ncbi.nlm.nih.gov/gap/phegeni) to obtain a list of osteoporosis-associated genes, using osteoporosis related phenotypes including BMD, fractures, femoral neck bone geometry, hip bone size, and spine bone size.
Functional annotation. We used ENCODE data drawn from the UCSC genome browser to conduct functional annotation for the genomic regions of interest 45 . In this study, we focused on promoter regions of osteoporosis-associated genes. An in-house Perl script was used to extract the promoter regions, which were defined as 2,000 nucleotides upstream of a gene's transcription start site. For the genes with more than one transcript, the pipeline extracted the promoters for each transcript, and merged overlaps into a single promoter. The annotated genomic features can be classified into three groups of epigenomic elements, including TFs obtained experimentally by ChIP-seq, histone modifications by ChIP-seq, and chromatin state segmentation by hidden Markov model (HMM) from ENCODE. A total of 569 epigenomic elements were used in the analysis. The data from multiple cell lines were used.

Enrichment analysis of epigenomic elements.
We first calculated the total number of promoters of osteoporosis-associated genes annotated by the 569 epigenomic elements obtained above. The annotation was defined that if the promoter overlaps with each epigenomic element for at least 1 nucleotide, it means that the promoter is annotated by this element 46 . If a given promoter overlaps with the same epigenomic element for more than 1 time, it is only counted once to reflect the fact of overlap. Then, using the promoters of all genes on genome as a background, we randomly selected the same number of promoters as those in the osteoporosis-associated genes set to perform random sampling, which could distinguish regulation of one set of genes from another. Such random sampling was repeated 1,000 times to estimate the average number and variance of random annotation. Compared with the random sampling results, we implemented Fisher's exact test to identify epigenomic elements that was significantly over-represented or under-represented in the osteoporosis-associated genes. For easier comparison and visualization, P values were transformed into logarithm (log 10 [P] for under-represented; − log 10 [P] for over-represented). As a control group, we also randomly selected a genes set of the same size as the osteoporosis-associated genes to conduct the above process.
Reverse prediction. To predict new candidate genes for osteoporosis, we analyzed the promoters of other genes to evaluate whether they shared similar set of epigenomic elements as those promoters of osteoporosis-associated genes. The promoters of all genes were annotated for the presence of the significant epigenomic elements obtained by the above enrichment analysis. For each gene, we first counted the times of its promoter annotated by each of the significant epigenomic elements. Then we weighted the counts of each element by the corresponding transformed P values. By summing up the weighted counts on each promoter, we acquired the total score denoting each gene to prioritize the importance of genes. Functional annotation and pathway analysis. We ranked all genes based on the scores obtained from the reverse epigenomic analysis. The ranked gene list was supplied to gene set enrichment analysis (GSEA) 47 pre-ranked analysis with default parameters. c2 KEGG (curated gene sets from KEGG pathway databases) were used for the analysis. We used the Gene Relationships Across Implicated Loci (GRAIL) text-mining algorithm 48 to investigate connections between the predicted new genes and the known osteoporosis-associated genes identified by GWASs.
Validation in GWAS datasets. To validate the predicted candidate genes at the population level, we took advantage of the available five GWAS datasets, of which one was from the published GEFOS dataset and the other four were from our own group. All the studies related to the datasets were approved by the respective institutional ethics review boards and all participants provided signed informed-consent documents. The related information is described in detail as follows.
GWAS datasets. GEFOS dataset. We checked the SNPs of the interested genes for their association signals with femoral neck and lumbar spine BMD in the GEFOS (Genetic Factors for Osteoporosis Consortium) dataset (http://www.gefos.org). GEFOS is the largest GWAS meta-analysis to date in the bone field, including 17 GWASs and 32,961 individuals of European and East Asian ancestry 9 . Phenotype measurements. For our own three GWAS BMD samples, BMD (g/cm 2 ) at spine and femoral neck (FN) was measured with dual energy x-ray absorptiometry (DXA) using Hologic 4500 W machines (Hologic Inc., Bedford, MA, USA) that were calibrated daily. For the GEFOS samples, BMD was measured with DXA scanners using Hologic (Hologic Inc., Bedford, MA, USA) or Lunar scanners (Lunar Corp., Madison, WI, USA). Raw BMD values were adjusted by significant covariates including sex, age, weight and height. To adjust for potential population stratification, the first ten principal components emerging from the EIGENSTRAT analyses were also included as covariates 52 .
Genotyping and quality control. Samples from KCOS and COS were genotyped using Genome-Wide Human SNP Array 6.0 (Affymetrix Inc., Santa Clara, CA, USA), and sample from OOS and CFS were genotyped using the Affymetrix Human Mapping 500 K array set, according to the manufacturer's protocols. The details of genotyping for each sample have been described in our previous studies [49][50][51] . Quality control of genotype data were implemented with PLINK 53 , with the following criteria applied: individual missingness < 5%, SNP call rate > 95%, and Hardy-Weinberg equilibrium (HWE) P-value < 0.0001.

Association analyses.
For the KCOS and COS samples, a linear regression implemented in PLINK 53 was fitted to test for association assuming an additive inheritance model. For the OOS and CFS samples, IMPUTE program 54 was utilized to impute the genotypes of SNPs detected on Array 6.0 but not on 500 K array set based on HapMap data (release 22). To ensure the reliability of imputation, all imputed SNPs reached a calling threshold of 0.90, i.e., a 90% probability that an imputed genotype is true. SNPTEST 54 was used to examine associations in these samples. Summary statistics of associations from each GWAS BMD sample were combined to conduct meta-analysis using the METAL software package 55 , taking into account the square-root of each sample size. The between-study heterogeneity was assessed using both the Cochran's Q statistic and the I 2 metric. The Benjamini and Hochberg (BH) procedure was used for multiple-testing adjustment.
Expression quantitative trait locus (eQTL) analysis. We conducted eQTL analysis to evaluate whether the predicted significant SNPs for each locus also influence expression level of their nearest transcript. Gene expression information was obtained from human lymphoblastoid cell lines (LCLs) of 462 unrelated individuals from 1000 Genomes Project 56 . The linear regression model implemented in PLINK 53 was used to determine associations between expression levels and SNPs. We also included surrogate SNPs with linkage disequilibrium (LD) r 2 > 0.7 with target SNPs in the test.
Transfection with small interfering RNA (siRNA). When cells confluence reached at 30%, cells were transfected with siRNA against murine brain-derived neurotrophic factor (BDNF) or with non-silencing siRNA (Shanghai GenePharma Co.,Ltd) using X-treme GENE siRNA Transfection Reagent (Roche, NJ, USA) according to the manufacturer's instruction. Briefly, after incubating MC3T3-E1 cells with siRNA-reagent mixtures in α -MEM containing for 6 hours, 200 ng/ml rhBMP2 were added and cells were incubated for an additional 48 hours. The sequences of siRNA were described in Supplemental Table S5. The efficiency of knockdown was at least 70% as confirmed by analyses of mRNA levels.

Semi-quantitative RT-PCR and Real-time PCR.
Total RNA was isolated using the TRIzol reagent (Invitrogen, CA, USA), and complementary DNA (cDNA) was synthesized using the Super Scripts II First-Strand cDNA synthesis kit (Invitrogen, CA, USA) according to the manufacturer's instructions. Semi-quantitative RT-PCR experiments were optimized for the number of cycles to ensure that the amplifications were analyzed within an exponential phase. We analyzed the expression levels of BDNF, as well as osteoblast differentiation markers, including ALP, OCN, COL1, and RUNX2 24 , by real-time PCR in an Eppendorf Real-time PCR System, using the comparative threshold cycle (Ct) method for relative quantification. The glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene was used as an endogenous control. The specific primers for indicated genes are presented in Supplemental Table S6.
Western blot. Total protein was extracted using RIPA buffer (Beyotime Biotechnology, China). Samples were separated by 14% SDS-PAGE, and then transferred onto PVDF membranes (Roche, Germany). After blocking in TBST (Tris buffered saline with 0.1% Tween-20) and 5% non-fat milk and incubated with primary antibodies for BDNF, ALP, OCN, COL1, Runx2, or β -actin (Cell Signaling Technology Inc., USA). Then the membrane was incubated with horseradish peroxidase (HRP)-conjugated goat anti-rabbit secondary antibody (Abcam, MA, USA). Immunoreactivity was detected by enhanced chemiluminescence reaction using Luminata TM Western HRP substrate (Millipore, USA). ECL images were acquired and analysed with the Chemiluminescence Imaging System (CLINX, Shanghai, China). Statistical Analysis. Each experiment was repeated independently at least three times, unless indicated otherwise. The results were expressed as the mean ± standard deviation of triplicate independent samples. Student's t-test was used to examine the differences between the two groups of data. Differences with p < 0.05 were considered as statistically significant.