Introduction

Complex diseases such as osteoporosis are influenced by multiple genes with small individual effects, by environmental factors, and by their interactions. Identifying the genetic components of complex diseases is one of the greatest challenges for human geneticists. For the past two decades, the dominant study design has been linkage analysis in families, which identifies broad intervals of several megabases of DNA (quantitative trait loci) that correlate with disease status in pedigrees. Such linked intervals can encompass dozens to hundreds of candidate genes that might be involved in, or causal for, the disease. Moving from a quantitative trait locus to the responsible gene has proven difficult, because complete functional information is lacking for most genes in a susceptibility locus and because knowledge of the link between gene function and disease is limited.

Osteoporosis is a complex disease with a strong genetic component. To date, more than 20 genome-wide linkage scans across multiple populations have been performed to search for osteoporosis susceptibility genes (Huang and Kung 2006; Ralston et al. 2005; Streeten et al. 2006; Xiao et al. 2006; Hsu et al. 2007; Lee et al. 2006; Ioannidis et al. 2007). Several chromosomal regions showing significant or suggestive linkage to bone mineral density (BMD) have been identified and replicated in these genome-wide linkage screens. The next daunting task is to identify the key candidate genes within these confirmed regions. Exhaustive surveys of all variation in the intervals are needed to determine which genes account for the observed linkage signals. Despite recent drops in genotyping costs, such studies remain expensive and are often not feasible. Several promising bioinformatics tools for disease-gene identification are now available (Lopez-Bigas and Ouzounis 2004; van Driel et al. 2005; Franke et al. 2006; Adie et al. 2006; Aerts et al. 2006). These tools use information extracted from public online databases, such as sequence data, the medical literature, Gene Ontology, and functional annotation, as well as information on biology, function, and gene expression. These methods have been used successfully to prioritize candidate disease genes for type 2 diabetes and obesity (Tiffin et al. 2006; Elbers et al. 2007). Recently, text-mining methods have been applied to the discovery of novel genes related to bone biology (Gajendran et al. 2007). In this study, we used five freely available bioinformatics tools to analyze the 13 most promising osteoporosis susceptibility loci, which encompass 5,492 positional candidate genes, and prioritized a subset of the most likely candidate osteoporosis susceptibility genes for further empirical testing.

Materials and methods

Susceptibility loci selection

To date, more than 20 genome-wide linkage scans across multiple populations have been performed to identify quantitative trait loci underlying BMD variation (Huang and Kung 2006; Ralston et al. 2005; Streeten et al. 2006; Xiao et al. 2006; Hsu et al. 2007; Lee et al. 2006; Ioannidis et al. 2007). Only loci that reached significant or suggestive linkage (logarithm of the odds, LOD > 2.2) at least once and were replicated (LOD > 1) in at least one independent study were included (Lander and Kruglyak 1995). The regions 1p36, 1q21–25, 2p22–24, 3p14–25, 4q25–34, 6p21, 7p14–21, 11q14–25, 12q23–24, 13q14–34, and 20p12 met these criteria. In addition, two significant regions (2q24–32 and 5q12–21) recently identified by a genome-wide linkage scan in a Chinese population (Hsu et al. 2007) were also included. In total, 13 osteoporosis susceptibility loci were selected for analysis (Table 1).
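The inclusion rule above amounts to a simple filter over reported linkage peaks. The sketch below is illustrative only: the scan identifiers, locus names, and LOD values are hypothetical placeholders rather than data from the cited scans, and it assumes that each scan reports a single peak LOD score per region.

```python
from dataclasses import dataclass

# Thresholds from the selection criteria described in the text.
SUGGESTIVE_LOD = 2.2   # significant or suggestive linkage in at least one scan
REPLICATION_LOD = 1.0  # replication in at least one independent scan


@dataclass(frozen=True)
class LinkageResult:
    study: str   # identifier of a genome-wide scan (hypothetical)
    locus: str   # cytogenetic region
    lod: float   # peak LOD score reported for the region


def select_loci(results: list[LinkageResult]) -> list[str]:
    """Return loci with LOD > 2.2 in one study and LOD > 1 in a different study."""
    by_locus: dict[str, list[LinkageResult]] = {}
    for r in results:
        by_locus.setdefault(r.locus, []).append(r)

    selected = []
    for locus, rs in by_locus.items():
        primaries = [r for r in rs if r.lod > SUGGESTIVE_LOD]
        # The replication must come from a study other than the primary one.
        replicated = any(
            other.study != primary.study and other.lod > REPLICATION_LOD
            for primary in primaries
            for other in rs
        )
        if replicated:
            selected.append(locus)
    return sorted(selected)


results = [
    LinkageResult("scan_A", "locus_X", 2.5),
    LinkageResult("scan_B", "locus_X", 1.3),
    LinkageResult("scan_A", "locus_Y", 3.0),  # never replicated
    LinkageResult("scan_A", "locus_Z", 1.5),
    LinkageResult("scan_B", "locus_Z", 1.8),  # never reaches LOD 2.2
]
print(select_loci(results))  # ['locus_X']
```

Requiring the replicating result to come from a different study keeps a single scan from both nominating and confirming a locus.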

Table 1 List of 91 genes in 13 susceptibility loci for osteoporosis pinpointed by the five disease-gene identification tools

Gene identification methods

Five bioinformatics tools are freely available for disease-gene identification: Disease Gene Prediction (DGP) (Lopez-Bigas and Ouzounis 2004), GeneSeeker (van Driel et al. 2005), Prioritizer (Franke et al. 2006), PROSPECTR and SUSPECTS (PandS) (Adie et al. 2006), and Endeavor (Aerts et al. 2006). Prioritizer ranks genes by their functional interactions with genes in other susceptibility loci, on the assumption that the genes underlying a given disorder are usually functionally related. GeneSeeker points to genes that are expressed in disease-related tissues. PROSPECTR distinguishes genes that are likely to be involved in disease from those that are not, using sequence-based features such as gene length, protein length, and percentage identity with homologues in other species. SUSPECTS scores candidate genes using PROSPECTR and additionally assesses the similarity between their annotation and that of known disease genes. DGP assigns each gene a probability of involvement in hereditary disease, using parameters based on conservation, phylogenetic extent, protein length, and paralogy. Endeavor is a software application for computational prioritization of test genes based on a training set of genes already known to be involved in the disease of interest; each test gene is ranked by its similarity to the training genes. These five tools were combined to analyze the 13 promising osteoporosis susceptibility loci, which encompass 5,492 positional candidate genes. The 1p36, 1q21–25, 2p22–24, 3p14–25, 4q25–34, 6p21, 7p14–21, 11q14–25, 12q23–24, 13q14–34, 20p12, 2q24–32, and 5q12–21 loci were used as inputs to GeneSeeker, DGP, Endeavor, and PandS, whereas the chromosomal start and end positions (in base pairs) of the corresponding loci were entered into Prioritizer. Endeavor and PandS must be trained with a set of known disease genes. VDR, ESR1, ESR2, IL6, COL1A1, LRP5, IGF1, BMP2, SOST, CLCN7, TGFB1, and CYP19A1 are already known to be involved in BMD and/or osteoporosis and were therefore used as training genes. If a training gene was located in the susceptibility locus being analyzed, it was removed from the training set for that analysis. GeneSeeker requires disease-related tissues as input; we used bone(s), skeletal system, osteoblast, osteoclast, spine, cartilage, marrow, parathyroids, ovary, and testis.
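The per-locus training procedure can be sketched as follows: training genes lying inside the locus under analysis are dropped, and the locus genes are then ranked by their similarity to the remaining training genes. This is a deliberately simplified stand-in, not Endeavor's actual algorithm, which integrates several data sources. Here a single Jaccard overlap of annotation terms serves as the similarity measure. The gene names, coordinates, and annotation terms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    start: int  # base pairs
    end: int    # base pairs


def training_set_for_locus(training, positions, locus):
    """Drop training genes located inside the locus being analyzed."""
    kept = []
    for gene in training:
        chrom, pos = positions[gene]
        inside = chrom == locus.chrom and locus.start <= pos <= locus.end
        if not inside:
            kept.append(gene)
    return kept


def jaccard(a, b):
    """Overlap of two annotation-term sets (a toy similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(candidates, training, annotations):
    """Rank candidates by mean similarity to the training genes, best first."""
    def score(gene):
        sims = [jaccard(annotations[gene], annotations[t]) for t in training]
        return sum(sims) / len(sims) if sims else 0.0
    # Ties are broken alphabetically so that the ordering is deterministic.
    return sorted(candidates, key=lambda g: (-score(g), g))


# Hypothetical genes, coordinates (chromosome, bp), and annotation terms.
positions = {"TRAIN_A": ("1", 1_500_000), "TRAIN_B": ("2", 500),
             "TRAIN_C": ("1", 3_000_000)}
annotations = {
    "TRAIN_B": {"ossification", "collagen"},
    "TRAIN_C": {"ossification", "wnt"},
    "CAND_1": {"ossification", "collagen", "wnt"},
    "CAND_2": {"neuron"},
}
locus = Locus("1", 1_000_000, 2_000_000)
training = training_set_for_locus(["TRAIN_A", "TRAIN_B", "TRAIN_C"], positions, locus)
print(training)  # ['TRAIN_B', 'TRAIN_C']
print(rank_candidates(["CAND_2", "CAND_1"], training, annotations))  # ['CAND_1', 'CAND_2']
```

Removing in-locus training genes prevents a known gene from trivially ranking itself, or its close neighbors, at the top of the locus it sits in.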

Identification of candidate genes

GeneSeeker does not rank genes but pinpoints those expressed in disease-related tissues; therefore, all genes it pinpointed were considered. The other tools produce rankings, so the top 20 genes from each were included for comparison. A gene was considered an interesting candidate if it was indicated by three or more of the tools. Because Endeavor, DGP, and PandS partly use the same input information and produce similar outputs, genes identified solely by these three methods were excluded.
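The voting rule above can be written out as a short function. This is a minimal sketch, assuming that each ranking tool's output is available as an ordered gene list and GeneSeeker's output as an unordered set; the gene names in the usage example are hypothetical, and PandS is counted as a single tool, as in the text.

```python
# Tools whose input data partly overlap; support from only these is discounted.
OVERLAPPING_TOOLS = frozenset({"Endeavor", "DGP", "PandS"})


def select_candidates(rankings, geneseeker_hits, top_n=20, min_tools=3):
    """Apply the candidate-selection rule described in the text.

    rankings: dict mapping a ranking tool's name to its ranked gene list.
    geneseeker_hits: iterable of genes reported by GeneSeeker (unranked).
    """
    votes: dict[str, set[str]] = {}
    for tool, ranked in rankings.items():
        for gene in ranked[:top_n]:        # only each tool's top N genes count
            votes.setdefault(gene, set()).add(tool)
    for gene in geneseeker_hits:           # every GeneSeeker hit counts
        votes.setdefault(gene, set()).add("GeneSeeker")

    selected = []
    for gene, tools in votes.items():
        if len(tools) < min_tools:
            continue
        if tools <= OVERLAPPING_TOOLS:     # supported only by Endeavor/DGP/PandS
            continue
        selected.append(gene)
    return sorted(selected)


rankings = {
    "DGP": ["G1", "G2", "G3"],
    "Endeavor": ["G1", "G2", "G4"],
    "PandS": ["G1", "G2"],
    "Prioritizer": ["G3", "G1"],
}
print(select_candidates(rankings, {"G3", "G5"}))  # ['G1', 'G3']
```

In this toy example, G2 is ranked by three tools but is dropped because all three are the overlapping ones, whereas G3 survives with support from DGP, Prioritizer, and GeneSeeker.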

Pathways and network analyses

The identified candidate genes were imported into Ingenuity Pathways Analysis (IPA) 5.0 (Ingenuity Systems, Mountain View, CA, USA) to generate putative signaling networks from a manually curated knowledge database of pathway interactions extracted from the literature. Networks were generated from the input genes using both direct and indirect relationships and were ranked by a score reflecting the probability that the genes were included in a network by chance alone. Networks with scores > 3 have a 99.9% confidence of not being generated by random chance (Raponi et al. 2004). Overlapping networks were merged to produce the largest possible network, so that the number of biological relationships to be examined was maximized. The correctness of each relationship was checked manually against the categorized literature findings provided by the application. Canonical pathways associated with the input candidate genes were identified, each with a statistical significance value.
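The link between a score of 3 and 99.9% confidence is easiest to see under an assumption about how the score is defined. Assume it is the negative base-10 logarithm of a right-tailed Fisher's exact (hypergeometric) p-value for observing at least k focus genes in a network. Then a score of 3 means p = 10^-3, that is, 1 - p = 0.999. The sketch below implements that assumed definition. IPA's exact implementation is not described here, and the function names are hypothetical.

```python
import math


def hypergeom_tail(k, n_focus, network_size, universe):
    """Right-tailed hypergeometric probability P(X >= k).

    universe:     genes in the reference knowledge base (N)
    n_focus:      focus (input) genes in that knowledge base (K)
    network_size: molecules in the network (n)
    k:            focus genes observed in the network
    """
    upper = min(n_focus, network_size)
    tail = sum(
        math.comb(n_focus, i) * math.comb(universe - n_focus, network_size - i)
        for i in range(k, upper + 1)
    )
    return tail / math.comb(universe, network_size)


def network_score(p_value):
    """Assumed score definition: -log10(p); a score of 3 corresponds to p = 0.001."""
    return -math.log10(p_value)


# Toy numbers: all 3 focus genes fall in a 3-molecule network drawn from 10 genes.
p = hypergeom_tail(k=3, n_focus=3, network_size=3, universe=10)
print(p, network_score(p))
```

With these toy numbers, p = 1/120, so the score is about 2.08, below the threshold of 3 used in the text.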

Results

Ninety-one genes were selected as potential osteoporosis susceptibility genes using the five freely available bioinformatics tools for disease-gene identification (Table 1). Several of the genes indicated by these methods are already known to be associated with BMD/osteoporosis. These include the genes encoding procollagen lysyl hydroxylase (PLOD), tumor necrosis factor receptor-2 (TNFR2), cytochrome P450 1B1 (CYP1B1), proopiomelanocortin (POMC), interleukin (IL)-6, insulin-like growth factor (IGF)-1, and bone morphogenetic protein (BMP)-2. Of particular interest is the MATN3 gene. The matrilins are a four-member family of noncollagenous extracellular matrix proteins. Matrilin-3 is specifically expressed in cartilaginous tissue and has a role in the development and homeostasis of cartilage and bone. Mutations in MATN3 have been reported in a variety of skeletal diseases, including multiple epiphyseal dysplasia (Chapman et al. 2001), and a functional knockout of Matn3 increases BMD in mice (van der Weyden et al. 2006). However, MATN3 has not previously been implicated in osteoporosis (Huang and Kung 2006). It will therefore be of great interest to see whether polymorphisms in MATN3 are associated with BMD/osteoporosis in humans.

Another interesting gene is RUNX2 (runt-related transcription factor 2, also known as CBFA1). RUNX2 is an osteoblast-specific transcription factor that controls osteoblast differentiation and transactivates genes involved in the deposition of bone matrix, such as osteocalcin, osteopontin, and type I collagen. Runx2 knockout mice have no osteoblasts and an unmineralized skeleton. Overexpression of a dominant-negative form of Runx2 in mature osteoblasts of transgenic mice leads to less active osteoblasts, reduced bone formation, and short stature. A single nucleotide polymorphism (SNP) in exon 2 of RUNX2 was associated with higher BMD in an Australian population (Vaughan et al. 2002), and this association was replicated in a Scottish population (Vaughan et al. 2004). More recently, Doecke et al. (2006) showed that only SNPs in the RUNX2 P2 promoter were significantly associated with BMD and that greater RUNX2 P2 promoter activity was associated with higher BMD.

The identified candidate genes pinpoint five canonical pathways (Table 2):

  1. TGF-β signaling pathway: BMP-2 belongs to the TGF-β superfamily. There are three type II receptors for BMPs: the type II BMP receptor and the type II and IIB activin receptors (ActR-II and ACVR2B). BMP-2, like other cytokines of the TGF-β superfamily, binds to a type II receptor, which phosphorylates and activates type I receptors known as activin-receptor-like kinases (ALKs). Ectopic expression of ACVR1 (ALK2) resulted in osteoblast differentiation of chondrocytes (Valcourt et al. 2002). Tissue-specific deletion of ALK2 in neural crest cells resulted in craniofacial defects such as hypoplastic mandibles, suggesting that ALK2 is involved in regulating normal bone growth (Dudas et al. 2004). BMP signaling is crucial for osteoblastogenesis through a Runx2-dependent process (Phimphilai et al. 2006). Osteoblasts regulate mineralization through the coordinated secretion of activin A and follistatin: activin A, another member of the TGF-β superfamily, inhibits mineralization in vitro, whereas follistatin enhances it (Eijken et al. 2007). Coordinated roles of activins and inhibins in controlling osteoblastogenesis and osteoclastogenesis have also been demonstrated (Gaddy-Kurten et al. 2002).

  2. Granulocyte-macrophage colony-stimulating factor (GM-CSF) signaling pathway: Glycoprotein (gp)130-dependent cytokines, which mediate their biological actions via either the signal transducer and activator of transcription (STAT) 1/3 pathway or the protein tyrosine phosphatase non-receptor type 11 (PTPN11)/Ras/mitogen-activated protein kinase (MAPK) pathway, regulate osteoblast and osteoclast formation (Sims et al. 2004). The cytoplasmic Src homology 2 domain-containing phosphotyrosine phosphatase SHP2 is encoded by PTPN11. In bone formation, SHP2 participates in a growth factor receptor-bound protein 2 (Grb2)/fibroblast growth factor receptor substrate 2 (FRS2)/SHP2 complex, which is essential for fibroblast growth factor (FGF) signaling (Mansukhani et al. 2000). FGF induces proliferation of preosteoblasts but inhibits the differentiation of maturing osteoblasts, as shown by its inhibitory effects on alkaline phosphatase activity and mineralization. One important target of FGF is v-ets erythroblastosis virus E26 oncogene homolog 1 (ETS1), a member of a family of transcription factors defined by a conserved ETS DNA-binding domain. Treatment with retinoic acid (RA) increases ETS1 expression in the preosteoblast cell line MC3T3-E1 (Raouf et al. 2000). A conserved ETS1 binding sequence is located in the AJ18 promoter, suggesting that ETS1 may regulate osteoblast differentiation through transactivation of AJ18 (Jheon et al. 2003).

  3. Axonal guidance signaling pathway: Both immature and mature osteoblasts express and secrete IGF-I. Endogenous IGF-I transcripts have been detected in osteoblasts and osteocytes of the rat tibia (Reijnders et al. 2007). Mechanical loading increases IGF-I expression, which has been proposed as a means of translating mechanical loading into osteoblast formation. Epidermal growth factor (EGF) and other ligands of the EGF receptor might regulate bone growth and resorption by modulating the activities of IGF-I and/or IGF-I-binding proteins (Xian 2007); in this setting, osteoblast proliferation is stimulated, whereas osteoblast differentiation is inhibited.

  4. Peroxisome proliferator-activated receptor (PPAR) signaling pathway: NR2F1 (nuclear receptor subfamily 2, group F, member 1) encodes the nuclear receptor chicken ovalbumin upstream promoter-transcription factor I (COUP-TFI), which binds to a regulatory response element in the BMP-4 promoter and acts as a silencer of BMP-4 transcription in osteoblasts (Feng et al. 1995). Tumor necrosis factor (TNF)-α inhibits osteoblast differentiation while upregulating bone resorption (Nanes 2003). TNF-α acts through either of two cell-surface receptors, TNFR1 or TNFR2, encoded by TNFRSF1A and TNFRSF1B, respectively (Chen and Goeddel 2002). The nuclear receptor PPARgamma is a key regulator of adipogenesis. Overexpression of PPARgamma2 and its subsequent activation inhibit osteoblast differentiation of mesenchymal stem cells and preosteoblasts (Jeon et al. 2003; Kim et al. 2005). Conversely, activation of Wnt/β-catenin signaling enhances osteoblast differentiation of mesenchymal stem cells (MSCs) by simultaneously suppressing PPARgamma and CCAAT enhancer binding protein alpha (C/EBPalpha) (Kang et al. 2007). It has also recently been hypothesized that activation of PPARdelta enhances osteoblast differentiation (Dang and Lowik 2004).

  5. Wnt/β-catenin signaling pathway: WNT proteins interact with the frizzled receptor and the coreceptors low-density lipoprotein receptor-related protein (LRP)5/6 to stabilize β-catenin. Stabilized cytoplasmic β-catenin translocates to the nucleus and promotes the expression of osteoblast-specific genes. Secreted frizzled-related protein 4 (SFRP4) negatively modulates the WNT signaling pathway (Mayr et al. 1997). SFRP4 attenuates bone formation and suppresses the proliferation of osteoblasts, possibly through its antagonistic effect on WNT signaling (Nakanishi et al. 2006).

In addition, six nonoverlapping networks were generated by IPA 5.0 from 88 of the 91 candidate genes (Fig. 1).

Table 2 Canonical pathways pinpointed by the identified candidate genes
Fig. 1

Six nonoverlapping networks were generated by Ingenuity Pathways Analysis (IPA) 5.0 from 88 of the 91 candidate genes. Shaded genes are the potential candidate genes identified by the bioinformatics approach; the other genes are associated with the selected candidate genes based on pathway analysis. a Legend for the edges and nodes in the IPA networks. b Network with a score of 47 and 23 focus genes. c Network with a score of 41 and 21 focus genes. d Network with a score of 29 and 16 focus genes. e Network with a score of 20 and 12 focus genes. f Network with a score of 18 and 11 focus genes. g Network with a score of 6 and 5 focus genes.

Discussion

A number of genomic regions have been identified in previous genome-wide linkage scans for osteoporosis. The challenge now is to find the osteoporosis genes within these chromosomal regions. This study generated a list of potential candidate genes in 13 susceptibility loci for osteoporosis using a combination of five computational disease-gene identification methods. These genes are largely involved in the TGF-β, GM-CSF, axonal guidance, PPAR, and Wnt/β-catenin signaling pathways. Some of them have already been genetically or functionally associated with osteoporosis. For example, BMP2 was the first osteoporosis gene identified by linkage and positional cloning in humans, and in this study BMP2 was indicated by all five gene identification methods. This suggests that at least some of the candidate genes presented here may be true susceptibility genes. On the other hand, for some candidate genes there is as yet no evidence of genetic or functional involvement in osteoporosis. An appropriate follow-up to our study would be to determine the roles in osteoporosis of the novel genes identified through this computational strategy. This two-step process, identifying candidate genes with bioinformatics tools and then testing them by conventional experimentation (e.g., gene-wide and tag-SNP-based association analyses in large population and/or family samples), should greatly expedite gene discovery in complex diseases such as osteoporosis.

These bioinformatics approaches use information extracted from public online databases and assume that novel disease genes share some characteristics with known disease genes. This can introduce bias into disease-gene selection. First, the extent of annotation and available data varies greatly among genes, and the more annotated data a gene has, the more likely it is to be selected. A gene that has been studied extensively for a long time will have a large associated literature and therefore a better chance of being selected. Moreover, these methods tend to favor functions and features already associated with the disease. Second, some methods overlap substantially with others in their input data. For example, Endeavor, DGP, and PandS partly use the same input information and produce similar outputs. Selection of a candidate gene by several methods that use the same input data may be less informative than selection by several methods that use disparate data sources. Some of these limitations can be overcome by using multiple methods and data sources in a complementary fashion; accordingly, in this study we excluded candidate genes that were identified solely by Endeavor, DGP, and PandS.

Although these methods yielded a list of likely candidates rather than a single gene, the computational approaches reduced the 5,492 positional candidate genes in 13 susceptibility loci linked with osteoporosis/BMD in multiple genome-wide studies to just 91 most likely candidates. Our results suggest that computational approaches are helpful in the hunt for complex disease genes and in identifying the pathways and regulatory networks involved in complex disorders. We believe that the list of most likely candidate genes and the associated pathways identified here will help researchers prioritize candidate disease genes for further empirical analysis and better understand the pathogenesis of osteoporosis.