Introduction

Behçet’s disease (BD) is a systemic inflammatory disease characterized by recurrent oral aphthous ulcers, genital ulcers, skin lesions and ocular involvement.1, 2 Although uncommon, BD may cause serious manifestations, including systemic vasculitis, central nervous system involvement and sight-threatening uveitis.1, 2

The etiology of BD is not fully understood, but environmental factors are thought to trigger the immunopathogenesis of BD in people with a genetic predisposition. Innate immunity is involved in the inflammation along with the adaptive immune system, most importantly T cells.1, 2 HLA-B51 is the most extensively investigated genetic risk factor for BD.1, 2, 3, 4 A strong association of HLA-B51 with BD and similarities in the geographic distribution of BD and the susceptibility alleles have been reported.5 In addition to HLA-B51, many genetic variations have been reported to be associated with BD, including HLA-A26, MICA, TNF, TLR4 and others.6, 7, 8 Recent genome-wide association studies have revealed an association at the IL10 and IL23R–IL12RB2 loci.9, 10

Although many genome-wide association studies based on single-nucleotide polymorphism (SNP) genotyping have suggested the existence of several significant loci in complex diseases including BD, only a small fraction of the heritable variations have been identified. One possible explanation is that many different rare or pathogenic variants, which are difficult to find using SNP microarray experiments, contribute substantially to the genetic susceptibilities of these diseases. Therefore, it has been suggested that sequencing of candidate genes would be an efficient method to investigate the contribution of previously unrecognized rare or pathogenic variants to the target disorder.

Whole-exome sequencing or targeted sequencing with massively parallel or next-generation sequencing (NGS) technology has shown promising results in several diseases, mostly those with Mendelian inheritance such as Duchenne and Becker muscular dystrophy.11, 12, 13, 14 Exome or candidate gene sequencing studies of complex diseases have also been undertaken.15, 16 Herein, to investigate the genetic predisposition for BD with severe uveitis, we used a targeted and massively parallel sequencing method to explore the genetic diversity of exonic regions in 132 candidate genes. We also performed a replication experiment using pathogenic candidates selected from the targeted sequencing.

Materials and methods

Study subjects

The Institutional Review Board of the Seoul National University Hospital approved the study protocol. The enrolment of participants and blood collection were performed in accordance with the Declaration of Helsinki. Informed consent was obtained from all enrolled patients and healthy subjects. A total of 61 unrelated Korean individuals with BD were enrolled in this study. The diagnosis of BD was made by the criteria of the Behçet’s Research Committee of Japan, and patients with either complete or incomplete types of BD were included.17 Complete ophthalmological examinations including slit-lamp biomicroscopic examination, dilated fundus examination and fluorescein angiography were conducted for all patients. All patients showed severe chronic recurrent vision-threatening uveitis in both eyes and were followed up for more than 1 year at the uveitis clinic, Department of Ophthalmology, Seoul National University Hospital. For the replication study, samples were obtained from 320 healthy Korean subjects (controls) aged ≥50 years with no evidence of rheumatological disease or of eye disease other than senile cataract. All controls were unrelated to each other and to the patients.

Genomic DNA extraction

After the subjects provided informed consent, genomic DNA was extracted from peripheral blood leucocytes using a FlexiGene DNA kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions. Sample concentrations were measured using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA).

Targeted exonic sequencing of candidate genes

We selected 132 candidate genes that had been reported or suspected to play a role in the pathogenesis of Behçet’s uveitis (Supplementary Table S1). In brief, the 132 genes included cytokines such as IL6, IL10, IFNG and TNF; cytokine receptors such as IFNGR1; chemokines such as CCL2; other immune-related genes such as TLR4 and NOD2; and previously reported BD-associated genes such as F5.

For targeted sequencing, 32 patient samples were used. Based on the RefSeq gene set, exonic regions of these genes were captured using a SureSelect Target Enrichment System kit (Agilent Technologies, Palo Alto, CA, USA) and sequenced on a Genome Analyzer IIx (Illumina, San Diego, CA, USA). The detailed procedures were described in our previous report;14 the only difference was that eight samples were applied to each capture kit.

The generated reads were aligned to National Center for Biotechnology Information (NCBI) human reference build 37 using the GSNAP (Genomic Short-read Nucleotide Alignment Program) alignment tool.18 For further analyses, we used only the bases that were covered by ≥8 uniquely aligned reads with mean Q-scores ≥20. Single-nucleotide variants (SNVs) and short insertions/deletions (indels) were called according to the algorithm in our previous study.19
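The base-level filter described above can be sketched as follows. This is an illustration only, assuming the thresholds are inclusive (at least 8 uniquely aligned reads and a mean Q-score of at least 20); the `BaseCall` record, its field names and the example values are hypothetical and are not taken from the study pipeline.

```python
from dataclasses import dataclass

MIN_DEPTH = 8       # assumed inclusive: >= 8 uniquely aligned reads
MIN_MEAN_Q = 20.0   # assumed inclusive: mean Q-score >= 20


@dataclass
class BaseCall:
    """Hypothetical per-sample summary of one genomic position."""
    sample: str
    position: int
    unique_depth: int   # number of uniquely aligned reads covering the base
    mean_q: float       # mean Phred quality of those reads at the base


def passes_filter(call: BaseCall) -> bool:
    """Return True if the base meets both the depth and the quality criteria."""
    return call.unique_depth >= MIN_DEPTH and call.mean_q >= MIN_MEAN_Q


# Made-up example values for four samples at the same position.
calls = [
    BaseCall("S01", 1000, 12, 31.5),
    BaseCall("S02", 1000, 7, 35.0),    # fails: depth below 8
    BaseCall("S03", 1000, 40, 18.2),   # fails: mean Q below 20
    BaseCall("S04", 1000, 8, 20.0),    # passes: both thresholds met exactly
]
usable_samples = [c.sample for c in calls if passes_filter(c)]
```

Only the samples in `usable_samples` would contribute a genotype at this position to the later frequency comparisons.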

Selection of candidate variants

We compared the frequency of each variant with that of the control samples. The controls for this discovery step were the samples from 59 normal Koreans comprising eight whole-genome sequencing and 51 whole-exome sequencing samples.16, 20 For a relevant comparison, we used only variant positions that were well covered in ≥20 controls. In addition, we selected pathogenic candidates among the variants based on the gene annotation information: nonsynonymous SNVs, coding sequence indels and splicing site variants. The Reference Sequence (RefSeq) gene database was used for the gene annotation.

The comparison was conducted in two different ways. First, we focused on novel rare variants with allele frequencies <0.01 in the controls that were detected in ≥2 case subjects. Novelty was determined by comparison with dbSNP release 132. Second, to screen for common candidates, the genotype frequency of each variant was compared between the case and control groups under both dominant and recessive models, and the allele frequencies of the two groups were also compared. Fisher’s exact test was used for all comparisons. A P-value of <0.05 was used as the threshold for candidate selection.
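As an illustration of the three comparisons, the sketch below builds the dominant, recessive and allelic 2×2 tables from genotype counts and evaluates each with a two-sided Fisher’s exact test. It is a minimal standard-library implementation written for this example, not the software used in the study; the function names and the genotype counts are hypothetical, and the two-sided P-value is taken as the sum of the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables with the same margins that are as or less likely.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def model_tables(case, control):
    """Build 2x2 tables from genotype counts (ref/ref, ref/alt, alt/alt).

    Each table is (case carriers, case non-carriers,
                   control carriers, control non-carriers).
    """
    (c0, c1, c2), (k0, k1, k2) = case, control
    return {
        "dominant": (c1 + c2, c0, k1 + k2, k0),        # >=1 variant allele
        "recessive": (c2, c0 + c1, k2, k0 + k1),       # 2 variant alleles
        "allelic": (c1 + 2 * c2, 2 * c0 + c1,          # allele counts
                    k1 + 2 * k2, 2 * k0 + k1),
    }


# Made-up genotype counts for one variant (not data from this study).
case_counts = (20, 10, 2)
control_counts = (40, 8, 1)
p_values = {model: fisher_exact_two_sided(*table)
            for model, table in model_tables(case_counts, control_counts).items()}
candidate = any(p < 0.05 for p in p_values.values())
```

Whether a variant needs to reach P<0.05 in one model or in several is not stated in the text; the `any(...)` rule above is an assumption made for the example.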

The selected rare and common candidates were checked manually in terms of base coverage depth. We excluded from further experiments those with >2 alleles or with a coverage depth ratio between the reference and variant alleles that deviated far from 0.5 throughout the samples. Finally, linkage disequilibrium (LD) was estimated among the final candidates; if strong LD was found (r²>0.8), we selected one of the correlated candidates for the replication study.
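The LD-based pruning step can be sketched as below. The text does not state how r² was estimated, so this example assumes a common approximation for unphased data, the squared Pearson correlation between genotype dosages (0, 1 or 2 variant alleles), and a simple greedy rule that keeps the first variant of each correlated group. All names and genotype vectors are hypothetical.

```python
def dosage_r2(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors.

    Genotypes are coded 0/1/2 (number of variant alleles); None marks a
    missing call, and samples missing at either variant are skipped.
    """
    pairs = [(x, y) for x, y in zip(g1, g2) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return sxy * sxy / (sxx * syy)


def prune_by_ld(genotypes, threshold=0.8):
    """Greedily keep variants whose r2 with every kept variant is <= threshold."""
    kept = []
    for name, dosages in genotypes.items():
        if all(dosage_r2(dosages, genotypes[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up dosages for three variants in six samples.
genotypes = {
    "v1": [0, 1, 2, 0, 1, 0],
    "v2": [0, 1, 2, 0, 1, 0],   # identical to v1, so r2 = 1 and it is dropped
    "v3": [1, 0, 0, 2, 0, 1],
}
selected = prune_by_ld(genotypes)
```

Which member of a correlated group is retained depends on input order here; the study does not describe its selection rule, so that choice is arbitrary in this sketch.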

Replication study on candidate variants using additional case and control subjects

We genotyped the 13 candidate variants in all 61 case samples, including 31 samples that were sequenced in the first step, and 320 control samples from healthy Korean individuals using the TaqMan SNP genotyping assay (Applied Biosystems, Foster City, CA, USA) (Supplementary Table S2). Genotypes missing from this experiment were determined by PCR followed by Sanger sequencing (Supplementary Table S3).

As with the discovery set, we compared genotype frequencies for each candidate under both the dominant and recessive models, along with the allele frequencies. The only difference was that the χ² test was used for the analyses of the common candidates. Odds ratios were calculated, and a P-value of <0.05 was considered significant.
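For a single 2×2 table, the odds ratio and χ² test can be sketched as follows. The example assumes a Pearson χ² statistic without continuity correction and one degree of freedom, which the text does not specify; the function names and counts are hypothetical. For one degree of freedom, the upper-tail probability of the χ² distribution equals erfc(√(χ²/2)), which avoids any external dependency.

```python
from math import erfc, sqrt


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the table [[a, b], [c, d]].

    Rows are cases and controls; columns are carriers and non-carriers.
    Returns infinity when b or c is zero (no correction is applied here).
    """
    if b == 0 or c == 0:
        return float("inf")
    return (a * d) / (b * c)


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square statistic and P-value (1 d.f., no correction)."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / margins
    # Survival function of chi-square with 1 d.f.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Made-up counts: 20 of 30 cases and 10 of 30 controls carry the variant.
table = (20, 10, 10, 20)
or_value = odds_ratio(*table)
chi2_stat, chi2_p = chi_square_2x2(*table)
significant = chi2_p < 0.05
```

With small expected cell counts, as for the rare candidates, an exact test is the safer choice, which matches the use of the χ² test only for the common candidates.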

Two computational algorithms, PolyPhen-2 and SIFT (Sorting Intolerant From Tolerant), were used to predict the functional impacts of nonsynonymous SNVs in this study.21, 22 PolyPhen-2 distinguishes variants with drastic effects from other neutral variants, and the results for each variant were classified into two subcategories: damaging and benign. The output of SIFT is a normalized probability score. Positions with normalized probabilities of <0.05 were predicted to be damaging, and those with normalized probabilities of ≥0.05 were predicted to be tolerated.
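The SIFT cut-off and the combination of the two predictions can be written out as a short sketch. The functions below only post-process scores and labels that would be obtained from the tools themselves; they do not call PolyPhen-2 or SIFT, and the names and example values are hypothetical.

```python
SIFT_CUTOFF = 0.05  # scores below this value are called damaging


def sift_call(score: float) -> str:
    """Classify a SIFT normalized probability as damaging or tolerated."""
    return "damaging" if score < SIFT_CUTOFF else "tolerated"


def damaging_by_both(polyphen_label: str, sift_score: float) -> bool:
    """True if the PolyPhen-2 label and the SIFT call are both 'damaging'."""
    return polyphen_label == "damaging" and sift_call(sift_score) == "damaging"


# Made-up predictions for three variants (not results from this study).
predictions = {
    "variant_a": ("damaging", 0.01),
    "variant_b": ("damaging", 0.05),   # exactly at the cut-off: tolerated
    "variant_c": ("benign", 0.00),
}
concordant_damaging = [name for name, (label, score) in predictions.items()
                       if damaging_by_both(label, score)]
```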

Results

Characteristics of patients

The demographic and clinical characteristics of the patients with BD are shown in Table 1. All patients had a complete or incomplete type of BD with severe uveitis.

Table 1 Demographic and clinical characteristics of the study subjects

Summary of targeted sequencing in patient samples

The sequencing summary of 32 patient samples is shown in Supplementary Table S4. The mean coverage depth was 73.68× (median 52.03×), and the mean percentage of bases in the bait region covered at ≥8× was 71.77% (median 88.09%). Supplementary Table S5 summarizes the SNV and indel counts of each patient sample. As shown in both tables, five samples were very poorly covered by sequencing (<10%), and the numbers of variants called in these samples were markedly lower than in the other samples.

Pathogenic candidate variants

Figure 1 shows the overall flow of this study, including the processes for candidate selection and the replication experiment. In total, we called 3139 SNVs and 269 indels from the targeted sequencing. Among them, 1489 SNVs and 79 indels were well covered in ≥20 control samples, and of these, 301 SNVs and 5 indels were identified as possibly pathogenic by the criteria described in the Materials and methods. We narrowed the pathogenic candidates further to 5 rare SNVs and 12 common SNVs by comparing the genotype frequencies between the case and control groups. Among these, 13 SNVs were finally chosen for the replication study with additional samples; their details are shown in Supplementary Table S6. Their allele frequencies were similar to those from the 1000 Genomes project, except for three rare and two common variants for which no information was available.

Figure 1 Overall scheme of the experiment. This flow chart briefly shows all steps of the study, including the processes of candidate selection and the additional replication experiment. indel, insertion/deletion; LD, linkage disequilibrium; SNV, single-nucleotide variant.

Replication study with additional study subjects

We genotyped the candidate variants in 61 cases and 320 controls using the TaqMan SNP genotyping assay. The case subjects included 31 patient samples that had been sequenced at the discovery step; in these samples, 99.0% of genotypes were concordant between the two genotyping methods.
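A concordance rate of this kind can be computed as sketched below, assuming it is defined as the fraction of identical genotypes among sample-variant pairs called by both methods; the exact definition used in the study is not given. The dictionary layout, keys and genotype strings are hypothetical.

```python
def concordance(calls_a: dict, calls_b: dict) -> float:
    """Fraction of identical genotypes among pairs called by both methods.

    Keys are (sample, variant) tuples; values are genotype strings, with
    None marking a missing call. Pairs missing in either method are skipped.
    """
    shared = [key for key in calls_a
              if calls_a[key] is not None and calls_b.get(key) is not None]
    if not shared:
        raise ValueError("no genotype was called by both methods")
    matches = sum(calls_a[key] == calls_b[key] for key in shared)
    return matches / len(shared)


# Made-up calls from sequencing and from a genotyping assay.
sequencing = {
    ("S01", "var1"): "C/T",
    ("S01", "var2"): "G/G",
    ("S02", "var1"): "C/C",
    ("S02", "var2"): None,      # not covered by sequencing
    ("S03", "var1"): "T/T",
}
assay = {
    ("S01", "var1"): "C/T",
    ("S01", "var2"): "G/G",
    ("S02", "var1"): "C/T",     # discordant call
    ("S02", "var2"): "G/A",
    ("S03", "var1"): "T/T",
}
rate = concordance(sequencing, assay)
```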

As a result, two rare SNVs and seven common SNVs showed significant associations with BD (Table 2). Their possible effects were consistent with those in the discovery step, where all candidates were predicted to be damaging. Among the rare candidates, one nonsynonymous SNV in KIR3DL3 (rs199955684) and one in IFNAR1 had significantly higher frequencies in the BD group than in the controls. The threonine-to-isoleucine substitution in KIR3DL3 (T395I) was predicted to affect protein function by both PolyPhen-2 and SIFT. The other three rare SNVs showed P-values near 0.05. Among the common candidates, seven SNVs in MTHFR (rs1801133), FCGR3A (rs396991), MICA (rs1051790, rs61736348, rs41546114), ICAM1 (rs5498) and KIR2DL4 (rs1051456) had significantly higher allele frequencies in the BD group than in the controls. For three SNVs (rs1051790, rs41546114 and rs1051456), the recessive model showed more significant results than the dominant model, and the highest odds ratio (21.42) was found for rs41546114. The alanine-to-valine substitution in MTHFR (A222V) and the leucine-to-valine substitution in MICA (L145V) were also predicted to be damaging by both PolyPhen-2 and SIFT, and the P209A change in KIR2DL4 was predicted to be damaging by PolyPhen-2. The variants in FCGR3A and ICAM1 have been previously reported as susceptibility loci for BD and its ocular involvement.23, 24, 25, 26

Table 2 List of 13 nonsynonymous SNVs genotyped in 61 patients with Behçet’s uveitis and 320 normal controls

Figure 2 shows the distribution of the 13 pathogenic candidates in patient samples. Rare SNVs seem to be widely distributed throughout the samples with few overlaps. Among the common candidates, rs1051790 and rs41546114 in MICA were in moderate LD with each other (r²=0.56), whereas the others seem to be randomly distributed among the case samples.

Figure 2 Distribution of the 13 pathogenic candidates throughout the replication cases. Upper and lower plots show which rare and common pathogenic candidates, respectively, were found in each sample. Grey squares indicate heterozygous variants and black squares indicate homozygous variants. The numbers assigned to each variant correspond to the order of variants listed in Table 2. Common variants 3 (rs1051790) and 5 (rs41546114) are in moderate linkage disequilibrium with each other (r²>0.5).

Discussion

To identify the genetic predisposition for BD, we used target enrichment and NGS methods to investigate the genetic diversity of exonic regions in 132 candidate genes, which was followed by a replication study. Among the 1489 SNVs and 79 indels identified in target regions, 2 rare and 7 common SNVs showed significant associations with BD. In addition, we found that three pathogenic variants of KIR3DL3 (T395I), MTHFR (A222V, rs1801133) and MICA (L145V, rs1051790) were predicted to be damaging by the two computational algorithms, PolyPhen-2 and SIFT.

We sought functional candidates for BD pathogenesis. The strategies for selecting functional candidates in this study were as follows. First, we focused only on the whole exonic regions of 132 genes, most of which had been reported as susceptibility genes for BD. Second, we selected possibly pathogenic variants that change the amino acid sequences of the corresponding proteins. For this strategy, we chose to apply the NGS technology because it could fully cover the exonic regions of several genes at the base level. However, successful analysis with the NGS method requires even and sufficient sequencing coverage throughout the study samples. Unfortunately, five samples were scarcely covered, and some other samples also showed fluctuating coverage patterns (Supplementary Tables S4 and S5). Although this could be explained by many factors such as poor sample quality, technical problems and experimental errors, the most plausible explanation is the application of too many samples to one capture kit. Our previous study, which showed uniform coverage patterns throughout the study samples, used the same experimental procedures as this study; the only difference was that we applied four samples to one capture kit in our previous study and eight in this study.14 In addition, if more sequencing throughput had been acquired, we would have expected more adequate coverage even if the coverage patterns had fluctuated.

To overcome this uneven coverage in the discovery set, we first selected, for each variant position, the samples that satisfied our base quality criteria (bases covered by ≥8 uniquely aligned reads with mean Q-scores ≥20). The statistical comparison was then conducted using only the samples selected for that position. In this way, variants that had too many missing genotypes, and therefore high P-values, would be excluded, and only variants with significant differences between the two groups would remain as candidates. For the replication study, we genotyped all available cases again using TaqMan probes to check the accuracy of the sequencing variant calls and to fill in missing genotypes for more powerful analyses.

One rare SNV in KIR3DL3 (rs199955684) and one common SNV in KIR2DL4 (rs1051456) showed significant associations with BD in our study. Killer cell immunoglobulin-like receptors (KIRs) are members of the immunoglobulin superfamily that are expressed by natural killer (NK) cells and some T-cell subsets.27 KIR3DL3 expression is limited to CD56bright NK cells, which have a functional role in the innate immune response as the primary source of NK cell-derived immunoregulatory cytokines.27, 28 In BD, the number of NK cells increases in peripheral blood and aqueous humor.29, 30 NK cells may play an important role in controlling inflammation in BD.31, 32 The KIR3DL3 and KIR2DL4 genes have rarely been suggested as susceptibility genes of BD in previous studies. The role of KIRs in BD has not been established, but the variations in these genes may affect the functions of NK cells and the pathogenesis of BD.

The MTHFR gene encodes methylenetetrahydrofolate reductase, which uses folate to metabolize and thereby remove homocysteine.33 The A222V polymorphism (rs1801133) in this gene is associated with reduced enzyme activity and therefore causes impaired remethylation of homocysteine to methionine along with subsequent hyperhomocysteinemia.33 Homocysteine is suggested to be a risk factor for the hypercoagulability and thrombotic complications observed in BD patients.34, 35 The significant association of MTHFR with BD in our study is consistent with the association between this variation (rs1801133) and an increased risk of ocular involvement in BD reported by Ozkul et al.36

In this study, the frequency of the MICA*A5.1 allele (rs1051790, MICA*00801) was significantly higher in patients with BD. MICA is located within the human major histocompatibility complex, 46 kb centromeric to the human leukocyte antigen B (HLA-B) gene.37 A strong LD was observed between several alleles of MICA including MICA*A6 and HLA-B51 in both the patients with BD and normal controls.38, 39 Thus, the significant association of MICA*A5.1 allele with BD might be caused by LD with HLA-B51, which is known to show a strong association with BD. However, because the LD between MICA*A5.1 and HLA-B51 has not been investigated, and HLA typing was not done in our study, the exact role of MICA*A5.1 in BD remains unclear.

In addition to the variants predicted to be damaging by the prediction tools, several other SNVs showed possible associations with BD in this study. One rare and novel variant in IFNAR1 (N465Y) showed the lowest P-value in the dominant model among the candidates. A role of interferon-α (IFN-α) in BD pathogenesis was suggested in a previous study, and IFN-α signals through the receptor dimer of IFNAR1 and IFNAR2.40 Among the common SNVs, rs396991 in FCGR3A and rs5498 in ICAM1 also reached the significance level in this study, and both of these variants have been previously reported to be associated with BD in several studies.23, 24, 25, 26

In this study, we successfully replicated the findings of previous studies and identified several novel susceptibility variants in the target genes using the NGS technology. However, our study has some limitations, such as insufficient sequencing coverage, a small number of cases and the inclusion of only one ethnicity; thus, our findings should be interpreted with caution. In addition, BD is a multifactorial disease caused by several genetic and environmental factors. Even though several reports have shown the successful application of NGS methods to the discovery of genetic determinants of complex traits, further studies with larger sample sizes and the inclusion of different ethnicities are needed to confirm the results of this study.15, 16

In conclusion, target enrichment and massively parallel sequencing technologies have provided valuable information on the genetic predisposition for BD; in particular, by focusing on pathogenic candidates that alter the corresponding transcripts. In addition to the genes or variants that replicated previous studies, newly detected genetic variants, such as rs199955684 in KIR3DL3, should be investigated further with a larger sample size from diverse populations.