Introduction

Behçet’s disease (BD) is a multisystem inflammatory disorder characterized by recurrent exacerbations affecting several organs, including the orogenital mucosa, eyes, and skin. Although BD exists worldwide, it is more prevalent in countries along the ancient Silk Road, spanning from the Far East to the Middle East and the Mediterranean basin, with the highest prevalence of 4.2/1000 in Turkey.1 Its etiology remains poorly characterized, but a genetic tendency to uncontrolled inflammatory reactions induced by various environmental triggers is considered to play a critical role in its development. Recently, two genome-wide association studies (GWASs) in BD were conducted on Turkish1 and Japanese2 populations. In these studies, HLA-B51 and other variations around the HLA-B gene were found to be the genetic factors most strongly associated with BD. In addition, two other non-HLA associations with the IL10 and IL23R/IL12RB2 genes were confirmed using these sets of samples. A family-based linkage study suggested that HLA-B51 accounts for <20% of the genetic risk,3 indicating that other genetic factors, including rare variants and copy number variations, are waiting to be discovered.

Although GWASs have been advocated as the most powerful approach to explore the contribution of common polymorphisms to polygenic traits in many complex diseases,4 numerous challenges persist in the identification of weaker associations with qualitative and quantitative phenotypes at the genome-wide level.5 Standard GWAS approaches focus on the analysis of single SNPs; however, for multifactorial diseases, no particular variant on a particular gene may have a strong effect, but the combination of multiple variants with small effects explains the overall susceptibility to the disease.6, 7 The disruption of different biological pathways is thought to determine the intrinsic biological processes of multifactorial diseases. In this regard, pathway-based approaches to GWAS may be more helpful in a search for multiple genes involved in the same biological pathway, where the common variations in each of these genes have little correlation with disease risk.8, 9, 10, 11, 12, 13, 14

It has been observed that genes that have aberrations associated with a given complex disease tend to be part of the same subnetwork of the overall protein–protein interaction (PPI) network.15, 16 Hence, to be able to explain the connections between genotypic and phenotypic data, perturbed network modules need to be detected. Genotype data (eg, SNP, copy number variation) have been used to aid the identification of these modules.15, 16 In this respect, the use of PPI networks has demonstrated great success in enhancing the outcome of GWASs.8, 9, 10, 13, 17, 18, 19, 20

Another important piece of information that could improve the analysis of GWAS data sets is the functional effect of a SNP.21, 22 Although DNA variations that alter protein function can have significant effects, such as NOD2 mutations in inflammatory bowel disease23 and FLG mutations in eczema,24 other types of SNPs do not have such serious consequences in disease development mechanisms. In addition, to uncover the links between the genetics and pathogenesis of human complex diseases, the potential of conducting research on disparate populations has been discussed in terms of GWAS.25 As summarized here, many different kinds of knowledge must be combined in order to mine GWAS results further. Yet, to the best of our knowledge, none of the existing platforms can successfully integrate functional information of typed SNPs in a GWAS with PPI networks to identify SNP-targeted pathways and make a comparative evaluation between different populations.

In this study, we hypothesized that the few SNPs that are identified in GWASs and their associated genes may be targeting the same combinations of pathways, and that these biological pathways show higher conservation across populations, making them potential markers for BD. In this respect, we analyzed the GWAS data from two different populations using the methodology that we had developed to identify disease-associated pathways by combining nominally significant evidence of genetic association with the known biochemical pathways, PPI networks, and the functional information of selected SNPs.8, 26, 27, 28 In the following sections, we discuss our findings.

Materials and methods

Turkish population data set

Remmers et al1 conducted a GWAS on 1215 Turkish BD cases and 1278 unaffected controls. The samples were typed using Human CNV370-Duo v1.0 and Human CNV370-Quad v3.0 chips (Illumina, San Diego, CA, USA). Initially, there were 335 887 autosomal SNPs on the Duo and 334 556 SNPs on the Quad chips. The SNP genotypes from these 2493 individuals were filtered using the following criteria: >95% call rate, >1% minor allele frequency (MAF), and Hardy–Weinberg equilibrium P>0.00001, resulting in 311 459 SNPs in the final analysis.1 The details of whole genome SNP typing, quality control, and association analysis can be found in Remmers et al1 and in Supplementary Table S1.

Japanese population data set

Mizuki et al2 tested 500 568 SNPs on 612 Japanese individuals with BD and 740 healthy controls. DNA samples were genotyped using the Affymetrix GeneChip Human Mapping 500K Array Set (Affymetrix, Santa Clara, CA, USA). Quality control was conducted on 500 568 SNPs, and they removed 28 702 SNPs with a call rate <95%, 14 044 SNPs with deviation from Hardy–Weinberg equilibrium in controls (P<0.001), and 137 384 SNPs with a MAF <5% overall. Finally, 320 438 SNPs were left for further analyses.2 The details of whole genome SNP typing, quality control, and association analysis can be found in Mizuki et al2 and in Supplementary Table S1.
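The two quality-control filters described above differ only in their thresholds, which can be sketched as one parameterized check. This is a minimal illustration, not the code used in either GWAS: the `SnpStats` record, the `passes_qc` function, and the treatment of values exactly at a threshold (kept here) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary; field names are illustrative."""
    snp_id: str
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency
    hwe_p: float      # P-value of the Hardy-Weinberg equilibrium test


# Thresholds quoted in the text for each data set.
TURKISH_QC = {"min_call_rate": 0.95, "min_maf": 0.01, "min_hwe_p": 1e-5}
JAPANESE_QC = {"min_call_rate": 0.95, "min_maf": 0.05, "min_hwe_p": 1e-3}


def passes_qc(snp, min_call_rate, min_maf, min_hwe_p):
    """Return True if the SNP survives all three filters.

    Assumption: a value exactly equal to a threshold is kept.
    """
    return (
        snp.call_rate >= min_call_rate
        and snp.maf >= min_maf
        and snp.hwe_p >= min_hwe_p
    )


def filter_snps(snps, thresholds):
    """Keep only the SNPs that pass the given threshold set."""
    return [s for s in snps if passes_qc(s, **thresholds)]
```

A SNP with a MAF of 3%, for instance, would be retained under the Turkish thresholds but removed under the Japanese ones.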

For both BD data sets, the SNP data and the genotypic P-values of association for each tested SNP, which were calculated via an allelic χ² test, were obtained from the original data by EFR and AM.

Clinical features of the patients

In the Turkish GWAS set, BD patients fulfilled the International Study Group (ISG) criteria for classification; the Japanese GWAS set comprised those fulfilling the Japanese diagnostic criteria that overlap to a great extent with the ISG criteria.1, 2 Turkish patients were recruited through a multi-disciplinary BD clinic, and Japanese patients were mainly recruited from ophthalmology clinics. Therefore, some differences were noted between the frequencies of the clinical manifestations in both groups, such as a higher frequency of uveitis among Japanese patients (83.3% vs 35.4%) and a more frequent vascular involvement in Turkish patients (25.2% vs 4.4%).1, 2

PPI data set

A human PPI data set was obtained from Protein Interaction Network Analysis platform (PINA) that integrated PPI data from six manually curated databases (MINT, IntAct, DIP, BioGRID, HPRD, and MIPS/MPact).29 The integrated set of PPIs contains 68 420 interactions between 11 411 genes.

Network and pathway oriented GWAS analysis

We had previously developed a methodology, Pathway and Network Oriented GWAS Analysis (PANOGA), to detect disease-related KEGG pathways through the identification of SNP-targeted genes within these pathways.8, 19, 20, 26, 27, 28 PANOGA is publicly available at http://panoga.sabanciuniv.edu. In this study, we further improved PANOGA to identify SNP-targeted pathways with the genes responsible for BD susceptibility. A diagram of our multistep strategy is given in Figure 1.

Figure 1

Overview of methodological pipeline.

PANOGA starts by considering the functional effects of SNPs, predicted at the splicing, transcriptional, translational, and post-translational levels. Next, following the scoring scheme of Saccone et al,30 PANOGA combines SNP functional scores (FS) with GWAS P-values (P) and calculates a weighted P-value (Pw = P/10^FS) for each SNP.
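The weighting step can be illustrated with a few lines of code. This is only a sketch of the formula Pw = P/10^FS as stated in the text; the function name `weighted_p` and the example scores are hypothetical, and the way functional scores are derived is not shown.

```python
def weighted_p(p_value: float, functional_score: float) -> float:
    """Weighted P-value Pw = P / 10**FS for a single SNP.

    A functional score of 0 leaves the GWAS P-value unchanged; each
    additional unit of FS lowers Pw by one order of magnitude.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    return p_value / 10 ** functional_score


# Illustrative values only: two SNPs with the same GWAS P-value.
neutral = weighted_p(0.04, 0)     # no predicted functional effect
functional = weighted_p(0.04, 2)  # hypothetical functional score of 2
```

Under this scheme, a nominally associated SNP with a predicted functional effect is prioritized over an equally associated SNP without one.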

To move from SNPs to pathways, PANOGA maps the SNPs to genes. However, when assigning SNPs to genes, the SPOT program30 checks only the 2000 bp upstream region of genes, following the dbSNP strategy. In this approach, if a SNP is located further than 2000 bp away, it cannot be assigned to any gene, and the information from this SNP is lost. The HLA region in particular, which is known to be important for BD, is rich in such SNPs. To incorporate such SNPs into our system, in this study we improved PANOGA so that if no gene is assigned to a SNP by the SPOT program, we assign the closest gene to this SNP. Next, using these SNP-to-gene mapping results, we transfer the SNP weighted P-values to genes.
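The mapping rule with its closest-gene fallback can be sketched as follows. This is a simplified illustration, not the SPOT or PANOGA implementation: the `Gene` record and function names are hypothetical, strand is ignored, and the 2000 bp window is applied symmetrically on both sides of the gene as a simplifying assumption.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record with genomic coordinates."""
    gene_id: str
    chrom: str
    start: int
    end: int


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Distance in bp from a position to a gene (0 if inside it)."""
    if pos < gene.start:
        return gene.start - pos
    if pos > gene.end:
        return pos - gene.end
    return 0


def map_snp_to_genes(chrom: str, pos: int, genes: List[Gene],
                     flank: int = 2000) -> List[str]:
    """Assign a SNP to genes within `flank` bp, else to the closest gene."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    if not same_chrom:
        return []
    within = [g.gene_id for g in same_chrom
              if distance_to_gene(pos, g) <= flank]
    if within:
        return within
    # Fallback described in the text: no gene in range, take the nearest.
    closest = min(same_chrom, key=lambda g: distance_to_gene(pos, g))
    return [closest.gene_id]
```

With the fallback in place, an intergenic SNP on a chromosome that carries at least one annotated gene is never discarded.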

An active subnetwork is a connected subgraph of the interactome that has high total significance of genotypic P-values of the disease-predisposing SNPs with respect to the controls.31 PANOGA searches for active subnetworks by using the above-mentioned Pw values of the genes. Unlike the original version of PANOGA, in this study we adapted PANOGA to use the PPI network from PINA, which combines information from six manually curated databases. In addition, we used Entrez GeneIDs to prevent issues with gene aliases (nomenclature problems).
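The text does not give the scoring formula used to rank candidate subnetworks. As an illustration only, the sketch below uses a scheme common in the active-subnetwork literature: each gene P-value is converted to a z-score, and the z-scores of a gene set are summed and divided by the square root of the set size. The function names, the clipping bounds, and the choice of this particular score are assumptions, and the search over candidate subgraphs is not shown.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def z_score(p_value: float) -> float:
    """Convert a P-value to a z-score; small P gives large positive z."""
    # Clip to keep the inverse CDF defined at the extremes (assumption).
    clipped = min(max(p_value, 1e-300), 1.0 - 1e-12)
    return -_STD_NORMAL.inv_cdf(clipped)


def aggregate_score(gene_p: dict, members) -> float:
    """Aggregate z-score of a gene set: sum(z) / sqrt(k)."""
    zs = [z_score(gene_p[g]) for g in members]
    return sum(zs) / sqrt(len(zs))


def is_connected(members, edges) -> bool:
    """Check that the genes form one connected subgraph of the network."""
    members = set(members)
    if not members:
        return False
    adjacency = {g: set() for g in members}
    for a, b in edges:
        if a in members and b in members:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return seen == members
```

A candidate gene set would qualify as an active subnetwork in this sketch only if `is_connected` holds and its aggregate score is high relative to other candidates.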

Following the identification of subnetworks, PANOGA evaluates whether these subnetworks are biologically meaningful by comparing their constituent genes with known KEGG pathways. The details of these steps can be found elsewhere.8, 19, 20, 26, 27

Results

To identify the biological pathways containing the genes responsible for BD susceptibility, two GWAS data sets from BD case–control groups of the Turkish (TR) and Japanese (JP) populations were studied. Following the study of Baranzini et al,9 the SNPs showing nominal evidence of association (P<0.05) in the GWASs were investigated. Then, PANOGA was applied separately to each data set, as shown in Figure 1. A total of 18 479 SNPs and 20 594 SNPs identified in the GWASs were mapped to 3869 and 4076 genes for the TR and JP populations, respectively. Next, the network-oriented steps of our methodology were conducted. The genes in a subnetwork are likely to be functionally associated and might reflect a genetic interaction that underlies the manifestation of the disease. To identify possible pathogenic pathways of BD, for each identified subnetwork, we computed the proportion of its genes that were also found in a specific human biochemical pathway, compared with the overall proportion of genes described for that pathway. At this step, our subnetworks were tested against 246 available human KEGG pathways. If a KEGG pathway is found to be statistically significant for at least one of the identified subnetworks, PANOGA adds this pathway to the final list of significant KEGG pathways associated with the disease. Here, PANOGA calculates the significance of a pathway in relation to the disease as the minimum P-value given to that pathway among all subnetworks that are found to be associated with that KEGG pathway. The pathways are ranked according to their significance scores and are referred to as SNP-targeted pathways.
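The pathway-scoring step described above (an overrepresentation test per subnetwork, then the minimum P-value per pathway) might be sketched as follows. The text does not name the statistical test, so the one-sided hypergeometric test used here is an assumption, as are the function names; multiple-testing correction is omitted.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K are in the pathway."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total


def rank_pathways(subnetworks, pathways, background_size):
    """Score each pathway by its minimum P-value over all subnetworks.

    subnetworks: list of sets of gene IDs
    pathways: dict mapping pathway name to a set of gene IDs
    Returns (pathway, P-value) pairs sorted from most to least significant.
    """
    best = {}
    for genes in subnetworks:
        for name, members in pathways.items():
            overlap = len(genes & members)
            if overlap == 0:
                continue  # pathway not represented in this subnetwork
            p = hypergeom_upper_tail(overlap, len(genes),
                                     len(members), background_size)
            best[name] = min(p, best.get(name, 1.0))
    return sorted(best.items(), key=lambda item: item[1])
```

Taking the minimum over subnetworks means a pathway needs to be enriched in only one subnetwork to enter the ranked list, which mirrors the rule stated in the text.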

In all, 102 pathways were detected in the TR population (Supplementary Table S2) and 96 pathways were detected in the JP population (Supplementary Table S3) with corrected P-values of <1E−4. Of these pathways, 85 were found in both populations (Supplementary Table S4). We found that the correlation between the two studies was significant (Pearson’s r2=0.72, P<5 × 10^−15). A pairwise correlation of pathway statistics between two studies (which were carried out on independent populations with different ethnicities) should indicate common genetic variation associated with BD. As shown in Table 1, six pathways were shared in the top 10 of both the TR and JP populations. The probability of random occurrence of such an event was 2.24E−31. These pathways are: (1) focal adhesion, (2) mitogen-activated protein kinase (MAPK) signaling, (3) transforming growth factor-β (TGF-β) signaling, (4) extracellular matrix (ECM)–receptor interaction, (5) complement and coagulation cascades, and (6) proteasome pathways. As shown in Figure 2a, among the 265 SNPs targeting these 6 pathways, only 2 SNPs were found in both TR and JP populations. In addition, as shown in Figure 2b and Table 2, in these 6 pathways, 37 genes out of 137 identified genes were commonly targeted by disease-predisposing SNPs in both populations. However, in the different populations, different SNP-targeted genes were found to be affected in these commonly found pathways. In other words, the same pathways can be targeted via independent genes in different populations. Even though the two populations share few disease-predisposing SNPs and only a minority of the targeted genes, the identification of 6 common pathways in the top 10 pathways shows the relevance of our approach.

Table 1 The top 10 KEGG pathways identified in Turkish (TR) and Japanese (JP) Behçet’s disease GWAS sets
Figure 2

(a) SNP counts within the commonly identified top 10 pathways are shown as unique to TR population, unique to JP population, and common between these two populations. (b) SNP-targeted gene counts within the commonly identified top 10 pathways are shown as unique to Turkish (TR) population, unique to Japanese (JP) population, and common between these two populations.

Table 2 Commonly identified SNP-targeted genes within the top 10 overrepresented KEGG pathways of each population

As a control for our interpretation of these pathways, we checked the nonrandomness of our results. We started with a list of 22 836 protein-coding genes available in Ensembl BioMart. We randomly picked a number between 1 and 22 836 and obtained the corresponding gene from the gene list. We repeated this procedure 3869 times for the Turkish population and 4076 times for the Japanese population. Hence, we randomly picked 3869 and 4076 genes (the number of SNP-targeted genes in the TR and JP GWAS data sets, respectively). We created 10 such random sets of genes for the Turkish population and 10 for the Japanese population. We followed the same PANOGA procedure for these random gene sets. In other words, we identified active subnetworks and then searched for enriched pathways within these subnetworks. The list of pathways identified for the random gene sets was quite different from the pathways identified for the BD data sets. There was no correlation (Pearson’s r2=0.2757, P=0.0200) between the rankings of the identified pathways in the TR BD study and the random gene set. In addition, there was no correlation (Pearson’s r2=−0.0312, P=0.7894) between the rankings of the identified pathways in the JP BD study and the random gene set. For the random gene sets, the complete list of the identified pathways is shown in Supplementary Tables S5 and S6. On the other hand, when we repeated this experiment independently with the two real data sets, we found that 6 of the top 10 identified pathways in the TR and JP populations overlapped. The dysfunction of these pathways may cause common functional problems and, hence, may contribute to BD.
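The construction of the random control sets can be sketched as below. This follows the description literally (one random index per draw, so a gene may be picked more than once); the function name, the fixed seed, and the placeholder gene identifiers are assumptions made for the example.

```python
import random


def random_gene_sets(gene_list, set_size, n_sets=10, seed=0):
    """Draw `n_sets` random gene sets of `set_size` genes each.

    Each draw picks a number between 1 and len(gene_list) and takes the
    corresponding gene, so sampling is with replacement. Using
    rng.sample(gene_list, set_size) instead would forbid duplicates.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    sets = []
    for _ in range(n_sets):
        picks = [gene_list[rng.randint(1, len(gene_list)) - 1]
                 for _ in range(set_size)]
        sets.append(picks)
    return sets


# Placeholder identifiers standing in for the 22 836 protein-coding genes.
all_genes = [f"GENE{i}" for i in range(1, 22837)]
tr_controls = random_gene_sets(all_genes, 3869)
jp_controls = random_gene_sets(all_genes, 4076, seed=1)
```

Each control set would then be passed through the same subnetwork and pathway steps as the real SNP-targeted gene lists.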

We also searched for known BD-related pathways in the KEGG Disease Database using ‘Behçet’ as a keyword. This search returned one disease term (H00106) and one pathway (complement and coagulation cascades) associated with this disease term. This pathway was ranked 7th and 10th, with P=1.00E−18 and P=2.35E−16, in the TR and JP populations, respectively.

Discussion

Analysis of the Turkish and Japanese GWAS data using the PANOGA methodology identified 6 overlapping pathways out of the top 10 identified pathways in the individual data sets as possible mechanisms involved in the pathogenesis of BD. The shared pathways included those involved in antigen processing (proteasome), inflammation (focal adhesion, MAPK signaling pathway, TGF-β signaling pathway), coagulation (complement and coagulation cascade), and interactions with extracellular environment (ECM–receptor interaction, focal adhesion).

Genetic tendencies play a critical role in complex diseases including BD, and the GWAS approach has enabled us to define the roles of common functional genetic variants in the disease pathogenesis. Analysis of a large series of patients with BD with GWAS revealed the contributions of IL10 and IL23R/IL12RB2 polymorphisms in addition to the already known HLA-B51 and other HLA Class I associations, and further analysis of the same data set by imputation provided evidence for ERAP1, CCR1, and STAT4 gene associations.32, 33 However, these findings can explain only part of the overall genetic tendency to BD, and different approaches and analysis methods and/or the collection of a much larger series of patients are necessary for the identification of the remaining genetic susceptibility factors contributing to the disease pathogenesis.34 BD is a rare disease in most populations, and collection of larger sample sets of well-defined patients has become a major challenge for genetic studies. Therefore, alternative analysis methods that exploit available data obtained from different ethnic groups are expected to help identify novel genetic associations or interaction pathways involved in its pathogenesis.

The pathways found with the PANOGA method support the available GWAS findings. Identification of the proteasome pathway complements the known HLA class I and ERAP1 associations and supports the importance of antigen processing and peptide loading to HLA-B antigens in the disease pathogenesis. The identification of inflammatory pathways (focal adhesion, MAPK signaling pathway, TGF-β signaling pathway) extends the spectrum of IL10 and IL23R/IL12RB2 associations to other pro-inflammatory and regulatory cytokines that may be critical in the deregulated immune response observed in BD. Recently, to elucidate the mechanisms of innate immune responses, Ture-Ozdemir et al35 studied inflammasome activation in dendritic cells and neutrophils following stimulation with two different pattern recognition receptors (RIG-I-like and NOD-like) in BD patients. In this study, the dendritic cells from patients with BD were found to be activated as a response to NOD2 stimulus that was shown by the slightly defective (P<0.05) expression levels of RIP2 and p38 as well as IL18.35 Activation of RIP2 results in the activation of nuclear factor-κB (NF-κB) and the MAPKs. This cascade is known to trigger the production of pro-inflammatory cytokines, including IL1B and IL18.35 As shown in Table 1, the MAPK signaling pathway is ranked second and sixth, with P=2.05E−23 and P=2.14E−17, in the TR and JP populations, respectively.

The TGF-β/Smad signaling pathway has also been shown to be overactive in BD patients.36 This pathway is ranked fourth (P=4.05E−21) and third (P=1.87E−21) in the TR and JP data sets, as shown in Table 1. Shimizu et al36 observed that after stimulation, the expression of TGFBR1, IL12 receptor β2, and SOCS1 on peripheral blood mononuclear cells was significantly enhanced in BD patients as compared with the normal controls (P<0.05). They also reported that CD4+ T cells infiltrating into BD skin lesions expressed TGFB1 much more than those infiltrating into non-BD erythema nodosum.36 Similarly, in our study, the TGFB1 and SOCS1 genes are found to be targeted by a SNP in the TR population (shown in Table 3). Th17 cells (a novel subset of T cells) are suspected to play a fundamental role in the pathogenesis of BD.37, 38 Shahneh et al39 stated that Th17 cells predominantly produce IL17A-F, IL21, IL22, and TNF-α. In our study, the tumor necrosis factor-α receptor (TNFRSF1A) gene is found to be targeted by two SNPs in the TR population (shown in Table 3). As this gene has a role in innate immunity, nonsynonymous variants identified by deep exonic resequencing of the TNFRSF1A gene have recently been evaluated for BD association.40 The association is shown to be significant40 in the C-α test, the adaptive sum test, and the step-up test. Shahneh et al39 also emphasized possible roles of IL6 and TGF-β during the differentiation of Th17 cells from naive T cells.39 In another study by Shimizu et al,41 it is shown that IL6, IL21, and TGF-β play a role in the differentiation of Th17 cells that proliferated in the presence of IL23. In our study, the IL21 gene is found to be targeted by two SNPs in the JP population (shown in Table 3). Our results indicate that the propagation of a signal from the TGF-β to the MAPK signaling pathway has an important role in BD development and progression mechanisms. In both of these pathways, few genes are commonly targeted between the two populations (shown in red in Figure 3), but several genes are affected in individual populations (shown in yellow and blue in Figure 3). This finding supports our hypothesis that although there might be individual disease development mechanisms among populations, the affected pathways show a higher level of conservation. In the literature, there is also evidence that TGF-β activates p38 MAPK, which regulates Th17 cell differentiation.42, 43 These improvements in the knowledge of BD pathogenesis pave the way for innovative therapy. While presenting the new approaches in immunotherapy of BD, Shahneh et al39 suggested that if subsequent studies focus on these, it might eventually lead to the development of better treatment modalities for BD patients.39

Table 3 The unique SNP-targeted genes of each population within the 10 overrepresented KEGG pathways for BD
Figure 3

KEGG pathway map for TGF-β and MAPK signaling pathways. The set of genes shown in blue includes genes that are found for Turkish (TR) data set; yellow includes genes that are found for Japanese (JP) data set; red includes genes that are found both by TR and JP GWASs of Behçet’s disease. A full color version of this figure is available at the European Journal of Human Genetics journal online.

The detection of the complement and coagulation cascade as one of the shared pathways provides further insights into the genetic background of vasculopathy and thrombophilia associated with BD. As shown in Table 1, this pathway is ranked 7th and 10th, with P=1.00E−18 and P=2.35E−16, in the TR and JP populations, respectively. As shown in Figure 4 in red and in Table 2, in this pathway, the PLAT (plasminogen activator, tissue), F5 (coagulation factor V), and F13A1 (coagulation factor XIII) genes are identified by our method in both the TR and JP GWASs. Previously, abnormalities in the coagulation cascade, such as significantly higher levels of tissue plasminogen activator in patients with BD compared with healthy age-matched controls, have been reported.44, 45, 46 Moreover, high serum concentrations of von Willebrand factor (vWF, an endothelial product that is shown in yellow in Figure 4), plasminogen activator inhibitor-1, and thrombomodulin were found in patients with BD.47, 48 Along this line, Demirer et al,46 Ozoran et al,47 and Beyan et al49 showed that the level of vWF was significantly higher in patients with BD, supporting endothelial destruction because of vasculitis related to BD.50 Factor V Leiden (F5) mutation was also reported as associated with thrombosis in BD.51 Similarly, the study of Gul et al52 concluded that factor V gene mutation may play a major role in the development of venous thrombosis in BD. In parallel, Batioğlu et al34 indicated that the prevalence of factor V Leiden mutation was significantly higher in ocular Behçet patients as compared with healthy control subjects. Hence, they argued that factor V Leiden may be an additional risk factor in ocular BD.34

Figure 4

KEGG pathway map for complement and coagulation pathway. The set of genes shown in blue includes genes that are found for Turkish (TR) data set; yellow includes genes that are found for Japanese (JP) data set; red includes genes that are found both by TR and JP GWASs of Behçet’s disease. A full color version of this figure is available at the European Journal of Human Genetics journal online.

In this study, the Turkish and Japanese sets of patients fulfilled different but established criteria for the diagnosis of BD that overlap to a great extent and define a similar phenotype.1, 2 However, depending on the features of the referral centers, some differences were observed in the distribution of BD manifestations between the two sets, such as more frequent ocular inflammation in the Japanese and more frequent vascular involvement in the Turkish patient groups.1, 2 These differences may have played a role in the rankings of the identified pathways in the Turkish and Japanese sets; for example, the proteasome pathway was ranked first in the Japanese and ninth in the Turkish set, and the complement and coagulation cascade was ranked seventh in the Turkish and tenth in the Japanese set.

The selection of the top 10 pathways for comparison was an arbitrary approach, and it may have resulted in overlooking some potentially important pathways, such as the Janus kinase (JAK)-signal transducer and activator of transcription (STAT) signaling pathway. In our study, this pathway is ranked 3rd and 17th, with P=3.68E−21 and P=6.36E−14, in the TR and JP populations, respectively. The JAK2 gene is targeted by 14 genotyped SNPs in the Japanese population, and none of these SNPs target the same gene in the Turkish population. Hence, if one searches only for SNPs conserved between populations, such important clues, illuminating an aspect of disease etiology, might be missed. Genes such as JAK2 and STAT3 in the IL23 signaling pathway contribute to a surprising range of autoimmune diseases, including BD and inflammatory bowel disease.53, 54, 55 As part of this pathway, low-frequency and rare variants in the FN3, fibronectin tenth type III domain of the IL23R gene were detected by targeted resequencing of TR and JP samples.40 As shown in Table 2, the IL2RB gene, which has the same FN3 domain, is targeted by SNPs in both the TR and JP populations. Considering that the elucidation of the JAK/STAT pathway has provided many insights into disease mechanisms and has become the basis for new pharmacologic agents, we would like to emphasize once again the importance of the pathway-oriented approach. Had we applied the strict cutoff of traditional GWAS (P-value<10^−8) without integrating network and pathway-oriented analysis, we would not have detected any affected genes (shown in red, yellow, and blue in Figures 3 and 4) within the identified pathways. As illustrated in Figures 3 and 4, in the TGF-β signaling and MAPK signaling pathways and the complement and coagulation cascades, only a few genes are commonly targeted between the two populations. This suggests that each population has its own disease mechanisms. At the same time, the affected pathways show higher conservation among different populations. That is why we propose that GWAS data should be analyzed in this manner to extract intrinsic information hidden in the data.

This method of analysis provides opportunities for identification of new pathways critical for the disease development and as targets for better treatments. Despite the identification of the same pathways in two different populations, findings of this method need to be confirmed by other approaches such as gene expression analyses or resequencing of the genes involved in the pathways for rare variants. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.

To conclude, with the fast technological developments and continuous data production in the field of GWASs, more and more data sets are expected to be available in the near future. In this respect, we propose a novel method for the identification of disease-related pathways using the GWAS data from different populations. By applying our method to the BD data sets, we have shown that although the shared pathways between the TR and the JP populations explain the general mechanisms of BD development, the pathways that are identified by population-specific GWASs also need to be examined to gain a more comprehensive understanding of BD pathogenesis. In each population, one may search for disease-causing factors targeting the genes within these affected pathways. Beyond populations, the same method can be extended to individuals to identify modifications occurring in the genes within these pathways. Thus, we can determine individual reasons for disease development that can be exploited for drug development and personalized therapeutic applications. To understand individual disease development mechanisms, the identified disease-related pathways can be scanned in an individual for alterations in the functions of the genes contained within. Thus, determining the disease-causing factors will provide valuable insight into individualized therapy targets that would rectify the impact of these function-altering factors.