Introduction

Rheumatoid arthritis (RA) is a multifactorial, systemic autoimmune disease characterized by synovial inflammation and hyperplasia, autoantibody production, cartilage and bone destruction, and systemic features including cardiovascular, pulmonary, psychological and skeletal disorders.1 Although the etiology of RA is not completely understood, genetic factors contribute to its onset. About 60% of the RA risk is genetic,2 and one-third of the genetic risk of RA is attributed to the major histocompatibility complex (human leukocyte antigen in humans) region, in which HLA-DRB1 is strongly associated with the disease. Several genome-wide association studies based on the common disease–common variant hypothesis, as well as meta-analyses, have been conducted to elucidate the genetic risk of RA. In a recent meta-analysis of more than 100 000 subjects, the 100 RA risk loci outside the human leukocyte antigen region explained only 5.5% and 4.7% of heritability in Europeans and Asians, respectively.3 The common disease–multiple rare variants hypothesis has been proposed to explain the remaining genetic risk: a significant portion of the inherited susceptibility to common diseases may be due to the summed effects of a series of low-frequency, dominantly and independently acting variants in a variety of different genes.4, 5 These variants cannot be detected by conventional genome-wide association studies, so exome sequencing is an alternative tool.

Deep exon sequencing has been used to detect rare/low-frequency variants related to common diseases. Sequencing of the exons and exon/intron boundaries of LPL in 313 patients with hypertriglyceridemia detected 20 rare variants, including seven that had been reported previously.6 Regarding RA, deep exon sequencing of 25 biological candidate genes identified by genome-wide association studies revealed an aggregated contribution of low-frequency and common coding variants to the risk of RA in 10 609 RA cases and 35 605 controls.7 Aggregation of rare coding variants in PLD3 in patients with Alzheimer’s disease has also been reported.8

Exome sequencing has been introduced to identify causative or susceptibility rare/low-frequency variants in Mendelian diseases and common diseases, and numerous successful results have been reported.9, 10 We also performed exome sequencing to identify genetic risk factors of RA and reported BTNL2 as an RA-susceptible gene.11 Here, we focused on rare/low-frequency variants that are aggregated in a gene and related to RA. We identified an aggregation of rare/low-frequency variants in mitochondrial respiratory chain-related genes. Oxidative stress has been considered to be involved in the onset of RA.12 Indeed, levels of H2O2 and O2 in RA patients’ plasma were significantly higher than those in controls.13 Elevated levels of urinary 8-hydroxy-2′-deoxyguanosine (8-OHdG), a standard biomarker for oxidative stress, have also been reported in RA patients.14, 15 It is likely that the single-nucleotide variants (SNVs) associated with RA contribute to increased electron leakage in the mitochondrial respiratory chain and to the production of reactive oxygen species (ROS) through the reduction of oxygen molecules to the superoxide anion by the leaked electrons. The aggregation of rare/low-frequency variants in the mitochondrial respiratory chain suggests a genetic basis for oxidative stress in RA patients.

Materials and methods

Subjects

A total of 432 Japanese RA patients (male: n=72; average age=62.1, female: n=360; average age=58.8) were enrolled from outpatients of the Division of Rheumatology, Tokai University Hospital and Division of Rheumatology, Keio University Hospital. All RA patients fulfilled the 1987 revised criteria of the American College of Rheumatology.16 A total of 432 unrelated healthy Japanese controls without a history of autoimmune diseases (male: n=202; average age=58.7, female: n=230; average age=45.6) were recruited at the Health Evaluation and Promotion Center of Tokai University Hospital.

The RA patients were classified into three groups based on the number of joints with erosion and the time course of erosion:17 a subset with mutilating disease (MUD: 17 cases), a subset with more erosive disease (MES: 276 cases) and a subset with least erosive disease (LES: 139 cases). The order of Sharp score is expected to be MUD>MES>LES. The MUD group was combined with the MES group as severe erosive cases for statistical analyses because of the small number of MUD cases.

All subjects gave written informed consent for genetic analyses. Ethical approvals for this study were obtained from the ethics committee of Tokai University School of Medicine and from the ethics committee of Keio University School of Medicine.

Next-generation DNA sequencing

DNA was extracted from peripheral blood using a DNA extraction kit, Genomix (Biologica, Nagoya, Japan). Exome sequencing was performed for 59 patients with severe erosive RA, comprising 7 MUD cases and 52 MES cases, together with 93 controls without a history of autoimmune diseases, using the SureSelect Human All Exon kit (Agilent Technologies, Santa Clara, CA, USA). The enriched libraries were sequenced on an Illumina HiSeq 2000 sequencer (Illumina, San Diego, CA, USA) with the 2 × 100 bp paired-end module. The reads were mapped to the reference genome (UCSC hg19, NCBI GRCh37) using BWA (v.0.7.5).18 The BWA-generated SAM files were sorted and indexed using Samtools (v.0.1.18).19 Duplicated reads were then marked with Picard v.1.102 (https://github.com/broadinstitute/picard). The resulting BAM files were analyzed using Genome Analysis Toolkit v.2.7 (GATK)20, 21 following its best practices guidelines. In brief, the BAM files were first subjected to indel realignment and base quality score recalibration, and SNVs and small indels were then called with the UnifiedGenotyper walker to obtain the potential variants in a VCF file. Variants in the VCF files were annotated using a script in ANNOVAR (version 2013jul21).22

SNV filtering, narrowing down and statistical analyses of exome-derived SNVs

SNVs detected in cases and controls by exome sequencing were filtered out according to the following criteria: SNVs with a phred-scaled variant quality provided by the UnifiedGenotyper of GATK20, 21, 23 <100, SNVs with <8 supporting reads, synonymous SNVs, SNVs on segmental duplications and SNVs with no reads in more than 5% of samples. To focus on rare/low-frequency variants, genes having SNV(s) with a minor allele frequency (MAF)>0.05 in controls were excluded. In this step, genes harboring both common SNV(s) and rare/low-frequency variant(s), as well as genes harboring only common SNV(s) with an MAF of more than 5%, were excluded, both because RA-susceptible genes harboring common SNV(s) have probably already been identified by genome-wide association studies and in order to narrow down the candidate genes. A total of 6012 SNVs (3806 genes) were obtained after filtering. In the next step, a gene-burden test24, 25 was performed using the filtered SNVs from the 59 RA samples and 93 controls to prioritize pathways in which RA-susceptible genes are involved. Fisher’s exact test was used for the gene-burden test, based on the number of cases or controls with or without at least one SNV of interest in each gene. Among the genes that showed significance (P<0.05) in the gene-burden test, we focused on a gene in the mitochondrial respiratory chain on the basis of previous reports and the observation that SNVs in the gene were present in a relatively high proportion of RA patients. A case–control study followed by a pathway-burden test was then conducted. The steps of SNV filtering and narrowing down are shown in a flow chart (Supplementary Figure 1).
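The filtering and gene-exclusion steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the `Snv` record, its field names and the toy values are hypothetical, and the sketch assumes that the MAF-based gene exclusion is applied to SNVs that have already passed the quality filters.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snv:
    """Hypothetical per-SNV record; field names are illustrative only."""
    gene: str
    qual: float                     # phred-scaled variant quality
    supporting_reads: int           # reads supporting the variant
    synonymous: bool
    in_segmental_duplication: bool
    missing_fraction: float         # fraction of samples with no reads
    control_maf: float              # minor allele frequency in controls


def passes_quality_filters(snv: Snv) -> bool:
    """Keep an SNV only if none of the exclusion criteria in the text applies."""
    return (
        snv.qual >= 100
        and snv.supporting_reads >= 8
        and not snv.synonymous
        and not snv.in_segmental_duplication
        and snv.missing_fraction <= 0.05
    )


def rare_variant_candidates(snvs, maf_cutoff=0.05):
    """Drop every gene that carries any SNV with control MAF above the cutoff."""
    kept = [s for s in snvs if passes_quality_filters(s)]
    genes_with_common_snv = {s.gene for s in kept if s.control_maf > maf_cutoff}
    by_gene = defaultdict(list)
    for s in kept:
        if s.gene not in genes_with_common_snv:
            by_gene[s.gene].append(s)
    return dict(by_gene)


# Toy input: GENE_A keeps one rare SNV, GENE_B is dropped because it also
# carries a common SNV, and GENE_C is dropped because its SNV is synonymous.
toy_snvs = [
    Snv("GENE_A", 250.0, 20, False, False, 0.00, 0.01),
    Snv("GENE_A", 80.0, 20, False, False, 0.00, 0.01),   # low quality
    Snv("GENE_B", 300.0, 15, False, False, 0.00, 0.02),
    Snv("GENE_B", 300.0, 15, False, False, 0.00, 0.20),  # common in controls
    Snv("GENE_C", 300.0, 15, True, False, 0.00, 0.01),   # synonymous
]
kept_by_gene = rare_variant_candidates(toy_snvs)
```

In this toy input only `GENE_A` remains, with a single SNV, which mirrors the rule that a gene is removed as a whole once any of its SNVs is common in controls.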

Case–control studies were conducted with the SNVs contained in the genes of the pathway (the mitochondrial respiratory chain) using Sanger sequencing of the SNVs in the 432 cases and 432 controls. One SNV in SCO1 at position 10600796 on chromosome 17, which was newly detected in the amplicon for Sanger sequencing of the SNV at position 10600794 on chromosome 17 in the case–control study, was included in the following statistical analysis. The gene-burden test was performed on each gene. The pathway-burden test was performed by integrating the results for selected genes in the mitochondrial respiratory chain. Namely, a 2 × 2 contingency table based on the number of cases or controls with or without at least one SNV in the genes was used for Fisher’s exact test.
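The burden test described above reduces each gene or pathway to a 2 × 2 table of carriers and non-carriers, which is then tested with Fisher’s exact test. The sketch below shows one way to do this with the Python standard library only. The function names and the toy per-individual SNV counts are hypothetical, and the two-sided P-value is taken as the sum of the probabilities of all tables that are no more likely than the observed one, which is a common convention but is assumed here rather than stated in the text.

```python
from math import comb


def burden_table(case_snv_counts, control_snv_counts):
    """Build the 2 x 2 table from per-individual counts of qualifying SNVs.

    An individual is a carrier if he or she has at least one qualifying SNV
    in the gene (gene-burden) or in any gene of the pathway (pathway-burden).
    """
    a = sum(1 for n in case_snv_counts if n >= 1)      # carrier cases
    b = len(case_snv_counts) - a                       # non-carrier cases
    c = sum(1 for n in control_snv_counts if n >= 1)   # carrier controls
    d = len(control_snv_counts) - c                    # non-carrier controls
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c

    def weight(k):
        # Unnormalised hypergeometric probability of a table with top-left k.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Integer arithmetic keeps the "no more likely than observed" test exact.
    extreme = sum(
        weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed
    )
    return extreme / comb(row1 + row2, col1)


# Toy data: number of qualifying SNVs carried by each individual.
toy_cases = [1, 2, 1, 0]
toy_controls = [0, 0, 1, 0]
table = burden_table(toy_cases, toy_controls)
p_value = fisher_exact_two_sided(*table)
```

Note that an individual carrying two SNVs is counted once, as a carrier, so the test measures how many individuals are affected rather than how many variant alleles are present.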

Results

In the exome sequencing, the mapping rate of the alignment was 90.9%, the duplication rate of mapped reads was 17.3% and the average coverage depth was 143.1 × in the 59 patients, with 96.2% of the target bases covered by at least eight reads. In each sample, 18 171 exonic SNVs and 404 exonic indels were identified on average. The SNV transition/transversion ratio was 2.92. After filtering the data, we performed the gene-burden test to obtain candidate RA-susceptible genes. The gene-burden tests indicated that 107 genes showed differences between cases and controls (P<0.05). The top 20 of the 107 genes are shown in Table 1. Rare/low-frequency variants in NDUFA7 were carried by 27.1% of the RA patients, with a P-value of 6.68E-03, and we focused on this gene in the following study on the basis of several lines of evidence: (i) NDUFA7 is a subunit of mitochondrial respiratory chain complex I, (ii) ~90% of ROS is produced in the mitochondria24 and (iii) oxidative stress has been considered to be involved in the onset of RA.12 On the basis of the gene-burden test, we performed a case–control study of 432 Japanese RA patients and 432 Japanese healthy controls to examine the association with RA of three SNVs in NDUFA7 (positions 8381416, 8381435 and 8385744 on chromosome 19, Supplementary Figure 2) that were observed in the exome sequencing of RA samples. A nominally significant association (P<0.05) was observed between SNVs of NDUFA7 and severe erosive RA (MUD plus MES): P=0.011, odds ratio (OR)=1.81, 95% confidence interval (CI)=1.14–2.90, although significance was not reached for overall RA: P=0.134, OR=1.41, 95% CI=0.90–2.21.

Table 1 Rheumatoid arthritis-susceptible candidate genes obtained from exome sequencing followed by filtering

If SNVs of NDUFA7 increase electron leakage, followed by ROS production through the reduction of oxygen molecules to the superoxide anion, and if ROS from mitochondria is involved in RA onset, other SNVs related to the mitochondrial respiratory chain might also be associated with RA. We therefore re-explored the following genes: subunits of mitochondrial respiratory chain complexes I–V (http://www.genenames.org/genefamilies/mitocomplex) and mitochondrial respiratory chain complex assembly factors (http://www.genenames.org/genefamilies/MITOAF). A total of 105 SNVs in 59 genes were observed in the exome sequencing results of the 59 RA patients: 17 SNVs in 13 genes among 38 complex I genes, no SNVs in the 4 genes of complex II, 3 SNVs in 3 genes among 9 complex III genes, 2 SNVs in 2 genes among 16 complex IV genes, 11 SNVs in 8 genes among 17 complex V genes and 27 SNVs in 14 genes among 35 mitochondrial respiratory chain complex assembly factors (Supplementary Tables 1 and 2). We selected the genes with an OR greater than 1 for further analysis. Three genes other than NDUFA7 were selected: SDHAF2, SCO1 and ATP5O (Table 2). The results of the gene-burden test using 432 cases and 432 controls are shown in Table 3. On the basis of the results of the gene-burden tests, the genes were subjected to the pathway-burden test. Sixty-eight of 292 RA patients with severe erosion and 53 of 431 controls had at least one SNV in the mitochondrial respiratory chain-related genes (pathway) comprising NDUFA7, SDHAF2, SCO1 and ATP5O. Four RA patients with severe erosion and no controls had two SNVs in those genes. An association was observed with severe erosive RA: P=1.56E-04, OR=2.16, 95% CI=1.43–3.28 (Table 3). The allele frequencies of the respective SNVs in 432 RA cases and 432 controls are summarized in Supplementary Table 3.
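As a worked check of the pathway-burden result, the carrier counts given above (68 of 292 severe erosive cases and 53 of 431 controls) can be turned into a sample odds ratio. The sketch below is illustrative only: the function name is hypothetical, and the Woolf (log-normal) approximation is assumed for the interval. The method used for the reported 95% CI is not stated in the text, so the approximate interval is not expected to reproduce 1.43–3.28 exactly, and the sample odds ratio is only expected to be close to the reported value.

```python
from math import exp, log, sqrt


def odds_ratio_with_woolf_ci(a, b, c, d, z=1.96):
    """Sample odds ratio and approximate 95% CI for the table [[a, b], [c, d]].

    a, b: carrier and non-carrier cases; c, d: carrier and non-carrier controls.
    The interval uses the Woolf approximation on the log odds ratio.
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * standard_error)
    upper = exp(log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Counts taken from the text.
carrier_cases, total_cases = 68, 292
carrier_controls, total_controls = 53, 431

odds_ratio, lower, upper = odds_ratio_with_woolf_ci(
    carrier_cases,
    total_cases - carrier_cases,
    carrier_controls,
    total_controls - carrier_controls,
)
print(f"OR = {odds_ratio:.3f}, approximate 95% CI = {lower:.2f} to {upper:.2f}")
```

The sample odds ratio is (68 × 378)/(224 × 53), which is close to the reported OR, and the lower bound of the approximate interval stays above 1, in line with the reported association.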

Table 2 Rheumatoid arthritis-susceptible candidate genes and SNVs selected
Table 3 Association analysis of rheumatoid arthritis with selected genes

Discussion

Rare variants in common diseases have attracted attention owing to the availability of massive sequencing data. However, single-variant tests of rare variants have less power than tests of common variants. Accordingly, aggregation tests,24, 25 which evaluate the cumulative effects of multiple rare variants in a gene or pathway associated with a disease, have increased power to detect genetic causality.26

After filtering and narrowing down the SNVs for quality control and to focus on rare/low-frequency variants, 3806 genes were obtained as RA-susceptible candidate genes (Supplementary Figure 1). Accordingly, the significance level for the gene-burden test after Bonferroni correction was P=1.32E-05, and the P-values of the gene-burden tests using the exome-sequencing data did not reach this level. However, we hypothesized that a significant association at the pathway level would be detected by integrating multiple rare/low-frequency variants that were observed in the exome sequencing but did not reach the significance level after Bonferroni correction because of the small number of rare SNVs in each gene. In this context, the gene-burden test was useful for prioritizing pathways in which RA-susceptible genes are involved. Consequently, a significant association was detected between severe erosive RA and mitochondrial respiratory complex-related genes, which are a major site of ROS generation. Owing to the low frequency of each SNV, the variance explained by each of the SNVs in NDUFA7, ATP5O, SCO1 and SDHAF2 was at most 0.13% under a liability threshold model27 in which the prevalence of RA was assumed to be 1.0%. A larger sample size and functional analyses of the SNVs in the mitochondrial respiratory chain are subjects for future investigation.

Mitochondrial respiratory complexes I and III have been considered to be the major sites of ROS generation because large changes in the potential energy of the electrons occur at these sites.28 However, other complexes in the mitochondrial respiratory chain have also been shown, using inhibitors and siRNA, to be related to ROS production.29, 30, 31 Therefore, it is likely that SNVs in the genes related to those complexes elicit ROS production. The five complexes of the mitochondrial respiratory chain are composed of proteins encoded by 97 genes, and 35 assembly factors are involved in the formation of the complexes. Mitochondrial flavoenzymes such as α-ketoglutarate dehydrogenase (OGDH) and glycerol phosphate dehydrogenase (GPD2) are also important sources of ROS.32 In this study, a limited number of rare/low-frequency variants were observed because a relatively small number of cases were exome-sequenced. Some rare/low-frequency variants in other genes, which were not detected in this study, may be involved in ROS production in both severe erosive RA and least erosive RA.

Oxidative stress has been considered to be involved in the onset of RA.12 RA is characterized by synovial inflammation and hyperplasia, autoantibody production, cartilage and bone destruction and systemic features.2 Previous reports suggest that ROS underlies some of these characteristics.

Anti-cyclic citrullinated peptide antibody (Ab) has been utilized as a biomarker for RA diagnosis because of its high specificity and sensitivity for RA.33 PADI4 has been reported to be an RA-susceptible gene.34 PADI4, a member of the peptidylarginine deiminase (PADI) family that catalyzes the conversion of arginine residues into citrulline, was expressed in hematological and RA synovial tissues.35 It was reported that transfection of a p53-expression vector increased the mRNA level of PADI4 in U373MG cells and also increased citrullinated proteins in U-2 OS cells.36 ROS activates p53 through the activation of p38MAPK37 or by damaging DNA. Overexpression of p53 in synovial tissue from patients with RA has been reported.38 In addition, somatic mutations in p53 have been found within the RA synovium, and more than 80% of the mutations are G/A and T/C transitions, which are characteristic of oxidative deamination by NO or oxygen radicals.39 Both wild-type p53 and mutated p53 effectively induce NF-κB.40 Therefore, it is likely that the SNVs aggregated in the mitochondrial respiratory chain-related proteins increase the leakage of ROS, followed by activation of p53 and overexpression of PADI4. Aberrantly produced citrullinated proteins would then activate the immune system and cause overproduction of pro-inflammatory cytokines such as TNF-α and IL-6 (Figure 1).

Figure 1

Involvement of ROS in the onset of rheumatoid arthritis: bibliographic consideration. ROS activates p53 through activation of p38MAPK, by damaging DNA and by mutating p53. Activated p53 promotes aberrant citrullination of proteins through activation of PADI4, which results in the production of anti-cyclic citrullinated peptide antibody. Activated p53 also promotes inflammation through activation of NF-κB, as do the immunoreactions caused by anti-cyclic citrullinated peptide antibody. On the other hand, ROS causes articular destruction through activation of mTOR followed by activation of osteoclasts and fibroblast-like synoviocytes.

ROS activates mTOR,41 which activates osteoclasts and fibroblast-like synoviocytes. Inhibition of mTOR by sirolimus or everolimus reduced synovial osteoclast formation and protected against local bone erosions and cartilage loss in human tumor necrosis factor transgenic mice.42 Rapamycin treatment reduced the phosphorylation of mTOR and decreased fibroblast-like synoviocyte invasion in RA tissues, which correlates with radiographic and histological damage in RA.43 Thus, ROS is also involved in cartilage and bone destruction through activation of osteoclasts and fibroblast-like synoviocytes (Figure 1).

Recently, resveratrol, a natural anti-inflammatory antioxidant, has been reported to be a promising strategy for controlling the synovial inflammatory response.44 This report supports our findings in this study.