Introduction

Rheumatoid arthritis (RA) is a multifactorial and systemic autoimmune disease that can lead to progressive joint destruction and disability. About 60% of the RA risk is genetic1 and one-third of the genetic risk of RA is attributed to the major histocompatibility complex (MHC; human leukocyte antigen, HLA, in humans) region,2 where HLA-DRB1 is strongly associated. Several genome-wide association studies (GWAS) have been conducted based on the ‘common-disease common-variant’ hypothesis to identify genetic factors that contribute to RA,3, 4 and more than 30 non-HLA loci have been reported to be associated with RA.5 However, the non-HLA loci identified so far explain only about 5% of the genetic variance.5, 6 It is possible that common variants explain only a modest fraction of the heritability of common diseases, and that rare and low-frequency variants may explain the remaining genetic variance, the so-called ‘missing heritability’.7, 8 The exploration of genetic interactions may also help to overcome some of the problems associated with ‘missing heritability’.9

Deep resequencing of target regions, which were identified by GWAS, has been performed to identify rare and low-frequency variants. In fact, using PCR products of disease-susceptible loci, many rare variants were identified as disease-susceptible single-nucleotide polymorphisms (SNPs) of type 1 diabetes10 and inflammatory bowel disease.11 In addition, exome sequencing brought successful results for identification of causative variants of Mendelian diseases.12, 13 Exomic sequencing with filtering methodology appears to be an efficient strategy for identifying genes involved in Mendelian and/or non-Mendelian complex diseases using a small number of cases.

In the present study, we applied exome sequencing to 19 cases with RA to search for gene-coding variants associated with RA. We identified SNPs within the butyrophilin-like protein 2 gene (BTNL2) in the MHC region that are likely RA-susceptible gene variants. A functional SNP in BTNL2, rs2076530, was previously associated with RA, but was found to have only a secondary effect after taking into account its linkage disequilibrium (LD) with the other candidate RA susceptibility genes within the class II region of the MHC, HLA-DQB1 and HLA-DRB1.14 However, the association between RA and other BTNL2 SNPs has not previously been investigated. Moreover, BTNL2 is a member of the immunoglobulin gene superfamily that is highly expressed in lymphoid tissues and has been associated, either as a primary effect or as a secondary effect of HLA-DRB1, with many autoimmune diseases, including sarcoidosis,15, 16, 17 ulcerative colitis,18 Kawasaki disease,19 osteoarthritis,20, 21 systemic lupus erythematosus,14 multiple sclerosis,22 Graves’ disease23 and Crohn’s disease.24

In this paper, we (1) report on finding an association between RA and the BTNL2 gene polymorphisms using exomic and Sanger sequencing in different case and control comparisons after applying different filtering methods, and (2) demonstrate by allele and haplotype frequency analysis in cases and controls that BTNL2 confers a statistically significant RA risk independently from HLA-DRB1 and NOTCH4.

Materials and methods

Subjects

A total of 432 Japanese RA patients were enrolled from outpatients of the Division of Rheumatology, Tokai University Hospital, and Division of Rheumatology, Keio University Hospital. All RA patients fulfilled the 1987 revised criteria of the American College of Rheumatology.25 Two rheumatologists independently evaluated all cases as a blind analysis of clinical information. A total of 432 unrelated healthy Japanese control subjects were recruited from among visitors to the Health Evaluation and Promotion Center of Tokai University Hospital.

All subjects gave written informed consent for genetic screening. Ethical approvals for this study were obtained from the ethics committee of Tokai University School of Medicine and from the ethics committee of Keio University School of Medicine.

DNA samples

DNA was extracted from peripheral blood using the Genomix DNA extraction kit (Biologica, Nagoya, Japan) according to the manufacturer’s instructions.

Next-generation DNA sequencing

Exome sequencing analysis was performed for 19 RA samples, where entire exon sequences were enriched by using a SureSelect Human All Exon kit (Agilent Technologies, Santa Clara, CA, USA), and the libraries were sequenced on the Illumina HiSeq 2000 (Illumina, San Diego, CA, USA) using the paired-end module of 101-bp read. Average read depth was 146, and 92.0, 96.2 and 99.4% of exons were covered on average by at least 20 reads, 10 reads and 1 read, respectively. Sequence reads were aligned to a reference genome (UCSC hg19, NCBI GRCh37) using BWA (Version 0.5.17; http://bio-bwa.sourceforge.net/).26 SAMtools (Version 0.1.18; http://samtools.sourceforge.net/),27 Picard (http://picard.sourceforge.net/) and GATK (http://www.broadinstitute.org/gsa/wiki/index.php/Home_Page)28, 29 were used for removing duplicated reads, realignment, recalibration and variant identification. Segmental duplication regions were annotated by using ANNOVAR.30

Genotyping of HLA and single-nucleotide variants (SNVs)

We analyzed HLA-DRB1 using the Luminex assay system and HLA typing kits (WAKFlow HLA Typing kits, Wakunaga, Osaka, Japan or LABType SSO, One Lambda, Canoga Park, CA, USA). SNP markers were genotyped by Sanger sequencing using PCR products. The PCR products were purified by ExoSAP-IT treatment (USB, Cleveland, OH, USA) and sequenced using Big Dye Termination chemistry and ABI 3100 genetic analyzer (Applied Biosystems, Foster City, CA, USA). The PCR primers used for sequencing and for analyzing SNPs in BTNL2 and NOTCH4 are shown in Supplementary Table 1. We followed NC_000006 (NCBI, GRCh37) for gene structure of BTNL2 (Figure 1).

Figure 1

Schematic illustration of the location of the BTNL2 and the non-synonymous SNPs in BTNL2 observed in this study.

Statistical analyses

R-software and PLINK (http://pngu.mgh.harvard.edu/purcell/plink/)31 were used for statistical analyses: Hardy–Weinberg equilibrium test, Fisher’s exact test and logistic regression (additive model). Haplotypes of HLA and SNPs were estimated using PHASE version 2.1.1.32 The D′ and the squared Pearson correlation coefficient (r2) were calculated using the reconstructed haplotypes based on the PHASE analysis. LD blocks were created by the GAB algorithm and the LD patterns were displayed using Haploview software.33
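
The two LD measures named above can be illustrated with a minimal sketch. It assumes two biallelic loci and phased haplotypes coded as 0/1 pairs; the function name and data layout are hypothetical and are not taken from PHASE or Haploview. D′ is returned as an absolute value.

```python
from collections import Counter


def ld_from_haplotypes(haplotypes):
    """Return (D, D_prime, r2) for two biallelic loci.

    `haplotypes` holds one (allele_at_locus_1, allele_at_locus_2) pair per
    phased chromosome, with alleles coded 0 or 1 (illustrative coding).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Frequencies of allele 1 at each locus and of the 1-1 haplotype
    p_a = sum(c for (a, _), c in counts.items() if a == 1) / n
    p_b = sum(c for (_, b), c in counts.items() if b == 1) / n
    p_ab = counts[(1, 1)] / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # Largest |D| attainable for these allele frequencies and this sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Made-up example: ten phased chromosomes
example = [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 5
print(ld_from_haplotypes(example))
```

Because D′ is normalised by the largest D that the allele frequencies allow, it can equal 1 even when r2 is well below 1.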

Results

Exomic sequencing analysis

We conducted whole-exome sequencing using 19 RA patients. We excluded synonymous SNVs and SNVs located on the segmental duplication regions. For the quality control of the SNVs, we used only SNVs that fulfilled the following two filtering criteria: (i) the average supporting reads were 8; and (ii) the average SNP quality values generated by the calling algorithm GATK SNV were 100. From 57 225 SNV loci, 18 660 SNVs passed the two filtering steps. The summary of the results of the filtered candidate SNVs is shown in Supplementary Table 2 and the flowchart of this study is shown in Supplementary Figure 1.
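
The exclusion and quality-control steps above can be sketched as a simple predicate. This is only an illustration: the record fields are hypothetical, and treating the values 8 and 100 as minimum thresholds is an assumption, since the text gives the numbers without a comparison operator.

```python
from dataclasses import dataclass

# Values quoted in the text; using them as lower bounds is an assumption.
MIN_MEAN_SUPPORTING_READS = 8
MIN_MEAN_QUALITY = 100


@dataclass
class Snv:
    """Hypothetical per-locus summary across the exome-sequenced cases."""
    snv_id: str
    synonymous: bool
    in_segmental_duplication: bool
    mean_supporting_reads: float
    mean_quality: float


def passes_filters(snv):
    """Apply the two exclusions and the two quality criteria."""
    if snv.synonymous or snv.in_segmental_duplication:
        return False
    return (snv.mean_supporting_reads >= MIN_MEAN_SUPPORTING_READS
            and snv.mean_quality >= MIN_MEAN_QUALITY)


def filter_snvs(snvs):
    """Keep only the SNVs that pass every filter."""
    return [s for s in snvs if passes_filters(s)]


# Made-up records for illustration
records = [
    Snv("snv1", False, False, 12.0, 150.0),
    Snv("snv2", True, False, 30.0, 200.0),   # synonymous, excluded
    Snv("snv3", False, True, 30.0, 200.0),   # segmental duplication, excluded
    Snv("snv4", False, False, 5.0, 150.0),   # too few supporting reads
]
print([s.snv_id for s in filter_snvs(records)])
```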

We postulated that the number of rare and low-frequency variants in an RA-susceptible gene would be higher in cases than in controls.34 Therefore, our first step was to compare the average number of SNVs in each gene between RA patients and 44 disease controls (non-autoimmune diseases). On this basis, we selected the top 30 genes with the highest number of SNV differences between the cases and controls (Supplementary Table 3). Next, we divided the SNVs into two groups: the low variants with an allele frequency of <0.106 (4 of 38 chromosomes in 19 RA cases) and the common variants with an allele frequency of >0.106. As a second step, we compared the allele frequency of each SNV in the 19 RA cases with those of 36 Japanese genomes obtained from the data of the 1000 Genomes Project. In this step, we compared the allele frequencies for each SNP in a set of controls different from those used in the first step in order to reduce further the chances of obtaining false-positive associations. This second filtering step narrowed the 30 RA-susceptible candidate genes down to 15 genes with an allele frequency difference of >0.1 for the common variants and/or higher allele frequency differences for the low variants (Table 1).
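
The two-step gene filtering described above can be sketched as follows. The helper names and gene labels are hypothetical. Ranking genes by the excess of the case mean over the control mean, and using the absolute frequency difference in the second step, are assumptions made for illustration. Only the common-variant criterion of the second step is shown, because the text does not give a numerical threshold for the low variants.

```python
N_CASES = 19
N_CHROMOSOMES = 2 * N_CASES      # 38 chromosomes in 19 diploid cases
LOW_FREQUENCY_CUTOFF = 0.106     # cutoff quoted in the text (4/38 is about 0.105)


def frequency_class(alt_allele_count, n_chromosomes=N_CHROMOSOMES):
    """Label an SNV 'low' or 'common' from its allele count in the cases."""
    frequency = alt_allele_count / n_chromosomes
    return "low" if frequency < LOW_FREQUENCY_CUTOFF else "common"


def top_genes_by_excess(case_mean, control_mean, top_n=30):
    """First step: rank genes by (mean SNVs in cases - mean SNVs in controls)."""
    excess = {gene: case_mean[gene] - control_mean.get(gene, 0.0)
              for gene in case_mean}
    return sorted(excess, key=excess.get, reverse=True)[:top_n]


def common_variant_passes(case_frequency, reference_frequency,
                          min_difference=0.1):
    """Second step, common variants only: frequency difference above 0.1."""
    return abs(case_frequency - reference_frequency) > min_difference


# Up to 4 alternative alleles among 38 chromosomes count as a low variant
print(frequency_class(4), frequency_class(5))

# Made-up per-gene averages
cases = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.5}
controls = {"GENE_A": 1.0, "GENE_B": 0.9, "GENE_C": 2.0}
print(top_genes_by_excess(cases, controls, top_n=2))
```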

Table 1 Top 15 gene SNVs and their overall significance after Sanger sequencing and association analysis between 48 RA cases and 48 controls

The average numbers of SNVs in 99 genes that were reported to be RA-susceptible, or that were located in regions mapped as RA-susceptible, in ‘A Catalog of Published Genome-Wide Association Studies’ (http://www.genome.gov/gwastudies/) are shown in Supplementary Table 4. Only one of these 99 genes, C6orf10, appeared in Supplementary Table 3, and it was filtered out at the comparison step with the 36 Japanese genomes obtained from the 1000 Genomes Project (Table 1).

Validation of RA-susceptible gene SNVs by Sanger sequencing and association analysis

The SNVs identified initially by exomic sequencing in the 15 genes listed in Table 1 were validated by Sanger sequencing using 48 cases (excluding the 19 exome-sequenced samples) and 48 controls (Table 1). Then the differences in the allele frequencies between the cases and controls were determined by the Fisher’s exact test. Only the SNVs located in BTNL2, NOTCH4 and MYPN showed P-value <0.05.
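
The allelic comparison used in this validation step can be illustrated with a minimal two-sided Fisher's exact test on a 2×2 table of allele counts. This is a self-contained sketch rather than the R or PLINK routine used in the study. The two-sided P-value is taken as the sum of the probabilities of all tables that are no more likely than the observed one, and the counts in the usage example are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]].

    Rows: cases / controls. Columns: minor / major allele counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x minor alleles in the first row
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    # Sum tables at most as probable as the observed one (small tolerance
    # guards against floating-point ties)
    p_value = sum(prob(x) for x in range(low, high + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Invented counts: 48 cases and 48 controls give 96 chromosomes per group
print(fisher_exact_two_sided(40, 56, 25, 71))
```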

We further confirmed the association of these SNVs with RA by using 432 cases, including the 19 exome-sequenced samples, and 432 controls. Among these samples, 411 RA patients and 420 control samples with complete genotypes for these SNPs were used for the analyses. Table 2 shows the BTNL2 and NOTCH4 SNPs that were significantly associated with RA, although rs2071282 in NOTCH4 was already reported to be RA-susceptible.35 The rs10997975 in MYPN showed weak association: P=0.0047, odds ratio=1.33, 95% confidence interval=1.09–1.62 by the Fisher’s exact test. As the association of rs10997975 in MYPN with RA was weak and BTNL2 and NOTCH4 are located in close proximity to HLA-DRB1, we then focused mainly on BTNL2 as well as analyzing the relationships between BTNL2, NOTCH4 and HLA-DRB1.
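
For readers unfamiliar with how an allelic odds ratio and its confidence interval are obtained from a 2×2 table, the sketch below uses the log (Woolf) approximation. This is an illustrative method and is not necessarily the one behind the interval reported above. The function name and the counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(case_minor, case_major, control_minor, control_major,
                       z=1.96):
    """Allelic odds ratio with an approximate 95% CI (Woolf's log method)."""
    odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    # Standard error of log(OR) from the four cell counts
    se_log_or = sqrt(1 / case_minor + 1 / case_major
                     + 1 / control_minor + 1 / control_major)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented allele counts, for illustration only
print(odds_ratio_with_ci(20, 80, 10, 90))
```

An interval that excludes 1 corresponds to a significant association at roughly the 5% level under this approximation.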

Table 2 Association analysis with rheumatoid arthritis using logistic regression

Twelve non-synonymous SNPs in BTNL2 were significantly associated with RA (Table 2) by logistic regression. The three SNPs, rs28362678, rs28362677 and rs41521946, in exon 6 were in absolute LD (r2=1, see below) and showed the lowest P-value among the 12 SNPs.

As BTNL2 rs2076530 in exon 5 was previously reported to be an RA-susceptible SNP that was insignificant when tested conditionally on HLA DQB1–DRB1 haplotypes,14 we conducted logistic regression with adjustment for the RA-susceptible DRB1 alleles in Japanese (DRB1*04:05, DRB1*04:01 and DRB1*10:01)36 (Table 3) or for the RA-susceptible NOTCH4 allele (Table 4). The three SNPs in exon 6, rs28362678, rs28362677 and rs41521946, retained significant P-values (P<0.05) in both adjusted logistic regressions, whereas the other SNPs, including rs2076530 in exon 5, were no longer significantly associated with RA (P>0.05).
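
The adjusted analysis can be written as an additive logistic model. The notation is introduced here for illustration only. In particular, the text does not state how the covariate was coded, so the coding of $c_i$ is an assumption.

```latex
\operatorname{logit} \Pr(\mathrm{RA}_i = 1)
  = \beta_0 + \beta_{\mathrm{SNP}}\, g_i + \beta_{\mathrm{cov}}\, c_i
```

Here $g_i \in \{0, 1, 2\}$ is the number of minor alleles that subject $i$ carries at the tested BTNL2 SNP (additive model), and $c_i$ is the adjustment covariate, for example the number of RA-risk HLA-DRB1 alleles or the allele count at the NOTCH4 SNP. A SNP is considered independent of the covariate when $\beta_{\mathrm{SNP}}$ remains significantly different from zero after $c_i$ is included.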

Table 3 Logistic regression with the adjustment for RA-risk HLA-DRB1 alleles
Table 4 Logistic regression with the adjustment for rs2071281 in exon 4 of NOTCH4

The amino acids substituted by the three non-synonymous SNPs were shared by other species, including orangutan (Supplementary Figure 2); that is, the substituted amino acids found in humans were conserved in other mammalian species. This implies that the function of the BTNL2 protein was not changed or affected drastically by evolution and that the selection pressure was perhaps weak.

LD and haplotype analysis between BTNL2, NOTCH4 and HLA-DRB1

As only the three SNPs in exon 6 of BTNL2 retained significant association with RA in logistic regression with the adjustment for RA-susceptible DRB1 alleles or NOTCH4, we compared LD values among the BTNL2 and NOTCH4 SNPs and DRB1*04:05 using phased haplotypes (Table 5 and Supplementary Figure 3). Both D′ and r2 between rs2076530 and DRB1*04:05 were lower than those between the three BTNL2 SNPs and DRB1*04:05: D′=0.718, r2=0.124 for rs2076530 with DRB1*04:05, and D′=0.807, r2=0.268 for rs28362678/rs28362677/rs41521946 with DRB1*04:05. On the other hand, the minor allele frequency (MAF) was 0.500 for rs2076530 and 0.369 for the three BTNL2 SNPs, and the allele frequency of DRB1*04:05 was 0.194. Therefore, the differences in MAF may explain the lower D′ and r2 obtained between rs2076530 and DRB1*04:05 compared with those between the three BTNL2 SNPs and DRB1*04:05. That is, only a minority of the haplotypes carrying rs2076530-G have DRB1*04:05.
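
The dependence of r2 on allele frequency can be made concrete with a short calculation of the largest r2 that two biallelic markers can reach for given allele frequencies. The sketch assumes that the frequencies quoted above refer to the positively associated alleles; the function name is hypothetical.

```python
def max_r2(p_a, p_b):
    """Upper bound on r2 for two positively associated alleles.

    p_a and p_b are the frequencies of the two associated alleles.
    """
    # Largest positive D compatible with the two allele frequencies
    d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    return d_max ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Frequencies quoted in the text
bound_rs2076530 = max_r2(0.500, 0.194)    # about 0.241
bound_exon6_snps = max_r2(0.369, 0.194)   # about 0.412
print(round(bound_rs2076530, 3), round(bound_exon6_snps, 3))

# r2 equals D'^2 times this bound, so the reported D' values give
# approximately the reported r2 values (0.124 and 0.268)
print(round(0.718 ** 2 * bound_rs2076530, 3),
      round(0.807 ** 2 * bound_exon6_snps, 3))
```

Under this assumption, a marker with an allele frequency of 0.500 cannot exceed an r2 of about 0.24 with an allele of frequency 0.194, whereas a marker with a frequency of 0.369 can reach about 0.41. The bound applies to r2 only, because D′ is already normalised by the largest attainable D.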

Table 5 LD values between SNPs in the NOTCH—HLA-DRB1*04:05 region

The comparison of the number of BTNL2 SNVs in the patients and controls with respect to HLA-DRB1 genotypes is also shown in Supplementary Table 5.

Discussion

We showed here that the three non-synonymous SNPs in exon 6 of BTNL2, rs28362678, rs28362677 and rs41521946, which were in absolute LD with each other, were associated with RA independently from HLA-DRB1 and NOTCH4. Although the other nine SNPs in BTNL2, including rs2076530, also showed significant association with RA, they were no longer significant in the logistic regression with the adjustment for RA-susceptible HLA-DRB1 alleles in Japanese: DRB1*04:05, DRB1*04:01 and DRB1*10:01. These results agree with the report that the association between rs2076530 and RA was attributed to LD with DR-DQ haplotypes.14 However, this is the first report on the association of non-synonymous SNPs in BTNL2 other than rs2076530 with RA. Although the association of rs2076530 with RA was attributed to the LD with DRB1, the LD values, D′ and r2, of rs2076530 were lower than those of the three BTNL2 SNPs in exon 6. The differences in MAF appear to explain this observation.

In this study, the association of NOTCH4 with RA was a secondary effect to that of HLA-DRB1 based on the logistic regression with the adjustment for DRB1*04:05, although it was reported previously that the association of NOTCH4 with RA was independent from HLA-DRB1 using a partially recessive model focusing on a shared epitope of HLA-DRB1.35 We also showed here a weak but novel association between MYPN and RA. The MYPN gene, located on chromosome 10, encodes the myopalladin protein that interacts with Actinin, alpha 2, ANKRD23 and ANKRD1 and is a component of a family of titin filament-based stress/strain response molecules in myofibrils.37, 38 It is difficult to envisage what role the myopalladin protein, as a component of the sarcomere, might have in RA. Therefore, the association of MYPN with RA should be retested and confirmed in a replication study using many more samples.

To date, a number of GWAS have been conducted on RA. However, there have been few reports on the association between BTNL2 and RA. The r2 between SNPs of BTNL2 ranged between 0.084 and 1, and the lowest r2 (0.084) was between rs2076530 and rs28362680 (Table 5). It is likely that SNPs around BTNL2 on the GWAS platforms do not fully represent the variants that can be found within the BTNL2-LD block and that were detected in this study. For example, the two probes corresponding to rs3817969 and rs1980493, which are 1.1 kb upstream of BTNL2 and in intron 5 of BTNL2, respectively, are used on the OmniExpress (Illumina) across exon 6 of BTNL2. The SNP rs28362675, which showed no significant association with RA in the logistic regression with the adjustment for DRB1*04:05, is positioned more closely to rs3817969 than to the three BTNL2 SNPs of exon 6 that were in absolute LD. In addition, the MAF of rs1980493 is 0.085 in Japanese (HapMap data) and is considerably lower than that of the three BTNL2 SNPs (MAF: 0.369). Therefore, the tagging of only two SNPs for BTNL2 in the OmniExpress arrays may not work well for detecting the three BTNL2 SNPs in exon 6. In this regard, exome sequencing will complement the tagging SNPs in the OmniExpress arrays that might not be representative of the BTNL2 LD blocks shown here.

BTNL2 is a member of the immunoglobulin gene superfamily with homology to the B7 costimulatory molecule and consists of two sets of two Ig domains, IgV and IgC.39 BTNL2 mRNA is highly expressed in lymphoid tissues as well as in the intestine, and the protein is involved in the regulation of T-cell activation as a negative costimulatory molecule.39, 40 Interestingly, the CTLA4 gene, which encodes one of the ligands of B7 that delivers an inhibitory signal to activated T cells, is also associated with RA.41, 42 Therefore, it is likely that the variants in BTNL2 confer risk to autoimmune disease, although the amino acids substituted by the three SNPs would not be expected to result in drastic changes because the changed amino acids are shared by other species, including orangutan (Supplementary Figure 2). In fact, there are many reports on the associations of BTNL2 SNPs with autoimmune diseases such as sarcoidosis,15, 16, 17 ulcerative colitis,18 Kawasaki disease19 and osteoarthritis.20, 21 The two intergenic SNPs located between BTNL2 and DRA were reported to be associated with anti-cyclic citrullinated peptide antibody titer in adults with RA.43 In these diseases, the independence from HLA-DRB1 is often controversial. Moreover, the majority of these associations were described for the SNP rs2076530, intronic SNPs and intergenic SNPs. Nevertheless, other BTNL2 non-synonymous SNPs were associated with sarcoidosis15 and ulcerative colitis.18 In sarcoidosis, the associations of the three non-synonymous SNPs in exon 6 were secondary effects based on LD with rs2076530. In ulcerative colitis, the association was with rs2076523 in exon 3. In our study on RA, however, the P-values of the three SNPs were lower than those of both rs2076530 and rs2076523 (Table 2). In addition, the associations of rs2076530 and rs2076523 were owing to LD with DRB1 (Table 3).
Association studies of the three non-synonymous SNPs in exon 6 of BTNL2 in autoimmune diseases, as well as replication studies for RA, may clarify the association of BTNL2 with autoimmune diseases, including systemic lupus erythematosus,14 multiple sclerosis,22 Graves’ disease23 and Crohn’s disease,24 for which the association with rs2076530 was reported to be a secondary effect to DRB1.

In this study on the identification of RA-susceptible genes, we performed exome sequencing followed by the validation of the candidate disease SNVs in case–control association analysis. In order to identify the candidate genes, we compared the average number of SNVs per gene based on the hypothesis that the number of rare and low-frequency variants in a disease-susceptible gene would be higher in cases than in controls. More than half of the genes listed in Supplementary Table 3 and Table 1 were located on 6p21-22, the region in which the strongest RA-susceptible gene, HLA-DRB1, is also located. We could not exclude the effects of LD with HLA-DRB1 on the filtering process. Therefore, testing of independence from HLA-DRB1 and adjacent RA-susceptible genes was required, and we carried out such testing. Ultimately, we could not identify RA-susceptible rare and low-frequency variants in this study. It is likely that the presence of common variants in the same gene interfered with the detection of rare and low-frequency variants, and that the limited number of exome-sequenced cases reduced the efficiency of detecting rare and low-frequency variants. We expect that a greater accumulation of exome-sequenced cases will resolve these limitations.