Genome-wide association analysis identifies three new risk loci for gout arthritis in Han Chinese

Gout is one of the most common types of inflammatory arthritis, caused by the deposition of monosodium urate crystals in and around the joints. Previous genome-wide association studies (GWASs) have identified many genetic loci associated with raised serum urate concentrations. However, hyperuricemia alone is not sufficient for the development of gout arthritis. Here we conduct a multistage GWAS in Han Chinese using 4,275 male gout patients and 6,272 normal male controls (1,255 cases and 1,848 controls were genome-wide genotyped), with an additional 1,644 hyperuricemic controls. We discover three new risk loci, 17q23.2 (rs11653176, P=1.36 × 10−13, BCAS3), 9p24.2 (rs12236871, P=1.48 × 10−10, RFX3) and 11p15.5 (rs179785, P=1.28 × 10−8, KCNQ1), which contain inflammatory candidate genes. Our results suggest that these loci are most likely related to the progression from hyperuricemia to inflammatory gout, which will provide new insights into the pathogenesis of gout arthritis.

Gout, which is one of the most common types of inflammatory arthritis in men and affects 1-2% of adults in developed countries, results from the deposition of monosodium urate (MSU) crystals in and around the joints 1-3. Elevated serum urate levels are a key risk factor for gout onset 4,5. However, only ~10% of people with hyperuricemia develop clinical gout, suggesting that hyperuricemia alone is not sufficient for the development of gout arthritis 6. Previous genome-wide association studies (GWASs) have identified dozens of loci associated with elevated serum urate levels 7-10, whereas little is known about the genetic etiology of the inflammatory response to the MSU crystals 6. Large, well-defined cohorts of gout and hyperuricemia without gout are required for a GWAS to properly identify genetic loci that control the progression from hyperuricemia to inflammatory gout.
To extend the knowledge of the genetic basis of gout, we conducted GWAS and replication studies in the Han Chinese population using 4,275 clinically ascertained male gout patients and 6,272 healthy male controls in addition to 215 female cases and 541 healthy female controls. Furthermore, 1,644 long-term hyperuricemia patients who had never developed gout were recruited and used to examine whether the newly identified genetic loci are associated with elevated serum urate levels or only with inflammatory gout (Supplementary Fig. 1). We identified three new susceptibility loci that are significantly associated with gout arthritis at 17q23.2 (rs11653176, P = 1.36 × 10−13), 9p24.2 (rs12236871, P = 1.48 × 10−10) and 11p15.5 (rs179785, P = 1.28 × 10−8).

Results
Association analyses. During the discovery phase (GWAS stage), we genotyped 1,398 male cases and 1,962 male controls (Supplementary Table 1) using the Affymetrix Axiom Genome-Wide CHB Array. After quality control (QC), a total of 603,505 single-nucleotide polymorphisms (SNPs) in 1,255 cases and 1,848 controls were retained for further analysis (see Methods). A principal component analysis (PCA) was performed to correct for any potential population stratification (Supplementary Fig. 2 and Supplementary Fig. 3, see Methods). Association analyses were performed with logistic regression. The quantile-quantile (Q-Q) and Manhattan plots are shown in Supplementary Fig. 4 and Supplementary Fig. 5a. As expected, the SNP rs2231142 (P = 4.66 × 10−10, Supplementary Table 2), which was previously reported in several GWASs of serum urate levels, hyperuricemia and gout 7-10 (Supplementary Data 1), showed genome-wide significance (P < 5 × 10−8). Because we intended to search for new risk loci, each SNP with P ≤ 5 × 10−5 in the discovery stage and with adjacent genes that were not previously reported was selected as a candidate for the follow-up phase I (REP1 stage) study.
The genome-wide significant SNPs and kidney function. Both the 17q23.2 and 11p15.5 loci have been reported to be associated with measures of renal function and kidney disease in previous GWASs 11,12. As impaired kidney function is one of the main risk factors for gout, we compared the effect estimates for the analyses with or without eGFR (estimated glomerular filtration rate, a measure of kidney function) as a covariate. As we did not collect serum creatinine data for all individuals throughout the study, we used the subset of samples for whom serum creatinine data were available. In an analysis of 2,473 gout cases and 2,448 controls, we did not see a substantial difference in the effect estimates with and without adjustment for eGFR (Supplementary Table 5). This suggests that the identified genome-wide significant SNPs for gout are likely to be independent of impaired kidney function.
Table footnote. GWAS, genome-wide association study; SNP, single-nucleotide polymorphism; OR, odds ratio. Shown are the discovery (GWAS), replication (REP1, REP2 and REP3) and meta-analysis results for gout cases versus controls. Position, based on hg18; A1, minor allele for the whole sample (the minor allele was the coded allele); Freq., the frequency of A1 in the controls; P, P-value; Q, P-value for Cochran's Q statistic; I², I² heterogeneity index (0-100). *The OR and P values are based on the logistic regression. †The OR and P values are based on the meta-analysis under a fixed-effects model.
The genome-wide significant SNPs and hyperuricemia. To further investigate whether the four genome-wide significant SNPs were associated with or independent of hyperuricemia, we additionally analysed a cohort of 1,644 long-term hyperuricemia patients (Supplementary Table 1) whose serum urate levels were higher than 420 μmol l−1 and who had experienced hyperuricemia for >10 years without any urate-lowering treatment but had not developed gout when the samples were collected. The analysis of these 1,644 hyperuricemia patients and the healthy controls showed that the four SNPs located in the three loci are not associated with hyperuricemia (P > 0.05, Table 2), but their allele frequencies were significantly different between the gout and the long-term hyperuricemia groups (P < 0.05, Table 2). Therefore, our results suggest that these genetic associations are linked to mechanisms other than hyperuricemia in gout.
The genome-wide significant SNPs and female gout. Because the molecular mechanism associated with female gout is considered to be different from that of male gout, we also analysed the four SNPs in a small female cohort of 215 cases and 541 controls (Supplementary Table 6), and one SNP (rs12236871, …
Assessment of the regulatory potential for the novel loci. To explore the potential implications and epigenetic profile of the association signals, we queried the index SNPs and their proxies (r² > 0.8) based on LD in the 1,000 Genomes Project ASI data set using HaploReg v2 13. Some signs of regulatory activity (promoter, enhancer and DNase hypersensitivity sites, bound proteins and altered motifs) were observed for the associated SNPs and their surrogates, indicating a possible effect on transcription for these loci (Supplementary Table 7).
Previously reported risk loci for hyperuricemia or gout. We also examined the loci previously identified for hyperuricemia or gout (Supplementary Data 1; 64 SNPs at 35 loci with P < 5 × 10−8) in the NHGRI GWAS catalog (as of 03/25/14). Ten of the 64 SNPs had a low frequency (<1%) in our samples; the remaining 54 SNPs at 28 loci were kept for the association analyses of gout and serum urate levels (the gout samples) in our discovery data set (Supplementary Data 5). For gout, 18 SNPs at 8 loci showed significance (P < 0.05), and all had directions of effect consistent with the previous reports, except for rs4698014, for which the direction is uncertain because the OR was not given in the previous report. For serum urate levels, six SNPs at four loci showed significance, and all (except for rs729761) had directions of effect consistent with the previous reports.

Discussion
In this multistage GWAS of gout arthritis, we identified four SNPs that were significantly associated (P < 5 × 10−8) with gout risk in the Chinese population. The top two signal SNPs (rs11653176 and rs9905274) are located in an intron of BCAS3 (breast carcinoma amplified sequence 3). BCAS3 is an estrogen-induced transcriptional co-activator that is overexpressed in breast cancer 14,15, and it is associated with tumour grade and proliferation 16.
[Figure legend, partial] … Genome CHB and JPT data (June 2010). The Pgwas was obtained from the GWAS stage (analysed with logistic regression) and is shown for the genotyped (circle) and imputed (cross) SNPs. The Pmeta was obtained from a meta-analysis combining all of the data sets, including the discovery and replication stages. The genes within the relevant regions are annotated and indicated with arrows.
NATURE COMMUNICATIONS | DOI: 10.1038/ncomms8041 ARTICLE
… from data of non-European ancestry populations, but not to be significant (Indian, n = 8,340; African-American, n = 5,820; and Japanese, n = 15,286) 9. In our data, the effect size of rs2079742 on serum urate was comparable to previous estimates from Asian populations, but due to the small sample size, we did not observe a significant association. The SNPs rs11653176 and rs9905274 also did not show a significant association with hyperuricemia, or with serum urate levels in the combined hyperuricemia and control samples. The fact that many of the gout patients were treated may also have influenced the results of the association analyses for urate levels. Another nearby gene, TBX2, which is located ~7 kb downstream of BCAS3, could be a candidate in this region as well. TBX2 has been reported to modulate the expression of IFN-γ 17. A previous study indicated that MSU crystals alone did not induce NO production in murine macrophages, while a synergistic effect on the level of iNOS expression and NO generation was observed in cells exposed to MSU crystals in combination with IFN-γ 18. Therefore, it is possible that TBX2 is involved in gout development by regulating IFN-γ. Thus, BCAS3 and TBX2 are the most likely candidate genes for the association observed at this locus.
The third identified SNP, rs12236871, maps 53 kb upstream of RFX3 (regulatory factor X, 3). This association signal was also observed in the female cohort. RFX3 is a transcription factor involved in the control of ciliogenesis. It is expressed in the ciliated ependymal cells of the subcommissural organ, choroid plexuses and ventricular walls. RFX3 has also been found to be necessary for the differentiation and function of mature beta-cells, and it regulates GCK expression by binding to its promoter 19. Interestingly, GCKR, another regulatory protein of GCK, was reported to be associated with serum uric acid levels in individuals of European descent 8 and with gout in the Han Chinese 20.
The fourth SNP, rs179785, resides within an intron of KCNQ1, the potassium voltage-gated channel (KQT-like subfamily) member 1 gene. KCNQ1 is expressed in the mid- to late-proximal tubule of the kidney and along the entire gastrointestinal tract. A recent study indicated that KCNQ1 is involved in mouse and human gastrointestinal cancer development, and the loss of KCNQ1 in mice leads to alterations in the genes involved in innate immune responses 21.
Notably, KCNQ1 has shown a strong association with type 2 diabetes (T2D) in several GWASs 22-24. Several studies found that common variants of KCNQ1 may also confer susceptibility to diabetic nephropathy, especially in East Asian populations 25-27. To avoid the influence of T2D, all of the cases and controls in the follow-up stages of our study were screened to exclude diabetes. In addition, the SNP rs179785 is in low to moderate linkage disequilibrium with the reported T2D-associated SNPs in the 1,000 Genomes Asian samples (r² = 0.001 to 0.044 and D′ = 0.038 to 0.421, Supplementary Table 8). Therefore, the association with gout identified here is likely to be independent of T2D.
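For readers less familiar with the two LD measures quoted above, the following is a minimal sketch of how r² and D′ can be computed for a pair of biallelic SNPs from one haplotype frequency and two allele frequencies. The function name `ld_stats` and the example frequencies are illustrative assumptions and are not taken from this study or from Supplementary Table 8.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, d_prime) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    return r2, d_prime


# Hypothetical frequencies: a pair with low r2 but non-trivial D'.
r2, d_prime = ld_stats(p_ab=0.12, p_a=0.30, p_b=0.30)
print(f"r2 = {r2:.3f}, D' = {d_prime:.3f}")
```

The example illustrates why both statistics are reported: D′ can be moderate while r² stays close to zero, and a low r² is what indicates that one SNP is a poor proxy for the other.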
In this study, we recruited 4,275 male gout patients and 215 female cases, but this sample size was still limited, especially for the separate analysis of each stage. No more than 1,500 cases were included in any stage. The discovery sample size used in this study can detect moderate effect sizes (ORs from 1.35 to 1.50) for common alleles (frequencies from 0.10 to 0.75) at the significance threshold used for follow-up (P < 5 × 10−5) with power greater than 80% (Supplementary Table 9). However, our sample size for each stage may still be underpowered for detecting small risks.
In conclusion, the combined multiple-stage analysis identified three new loci located in 17q23.2, 9p24.2 and 11p15.5 that are associated with gout but not with hyperuricemia without gout in the Han Chinese population. The candidate genes located in these regions likely play important roles in the etiology of gout arthritis but not hyperuricemia. However, further validations, especially functional experiments, are suggested. Our identification of these new common genetic risk variants of gout arthritis provides new insights into the pathogenesis of this disease.

Methods
Ethics. Sample collection and the gathering of clinical information on the subjects were undertaken following informed consent and approval by the relevant ethics review board at the Affiliated Hospital of Qingdao University, in accordance with the tenets of the Declaration of Helsinki.
Subjects. All of the gout patients analysed in the study were interviewed by endocrinologists and diagnosed according to the American College of Rheumatology criteria for gout 28. All of the cases in the discovery and validation stages were recruited using the same diagnostic criteria. The healthy controls were obtained via site survey. Practice lists of healthy controls were screened for potentially suitable subjects by excluding those with hyperuricemia, diabetes, cancer and other arthritis-related illnesses.
The discovery phase included 1,255 male cases recruited from Shandong Province and 1,848 normal male controls recruited from the northern China area, including Shandong, Shanxi, Hebei, and Beijing.
Follow-up stages I and II (REP1 and REP2) included 814 male cases and 1,414 healthy male controls and 882 male cases and 1,895 healthy male controls, respectively. All of the samples were recruited from Shandong Province. Follow-up stage III (REP3) included two data sets: the Northern China sample set, which consisted of 996 male cases and 786 healthy male controls, and the Sichuan Province sample set, which consisted of 328 male cases and 329 healthy male controls.
The female cohort comprised 215 cases and 541 healthy controls. We recruited 1,644 hyperuricemia patients who had experienced hyperuricemia for more than 10 years without urate-lowering therapy but who had never developed gout. The serum urate level of the males and postmenopausal females was greater than 420 μmol l−1, while it was greater than 360 μmol l−1 in the premenopausal females.
The sample descriptions can be found in Supplementary Table 1. All of the subjects in the replication stage were unrelated.
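The serum urate cut-offs used to define the hyperuricemia cohort can be written as a small rule. This is only a sketch of the thresholds stated above (420 μmol l−1 for males and postmenopausal females, 360 μmol l−1 for premenopausal females); the function name and argument names are hypothetical, and the additional recruitment criteria (more than 10 years of hyperuricemia, no urate-lowering therapy, no gout) are not modelled.

```python
def is_hyperuricemic(urate_umol_per_l, sex, postmenopausal=False):
    """Apply the serum urate cut-offs described in the Subjects section.

    urate_umol_per_l: serum urate concentration in micromol per litre
    sex: "male" or "female"
    postmenopausal: only meaningful when sex is "female"
    """
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    if sex == "male" or postmenopausal:
        return urate_umol_per_l > 420
    return urate_umol_per_l > 360


print(is_hyperuricemic(400, "female"))                       # premenopausal cut-off
print(is_hyperuricemic(400, "female", postmenopausal=True))  # postmenopausal cut-off
```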
Quality control (QC) of the GWAS data set. A total of 3,360 arrays were used in the GWAS, including 25 arrays for the designed duplication of randomly selected samples. A dish QC (DQC) value greater than 0.82 was required as the primary quality-control step. A total of 94 samples were excluded from further data analyses due to DQC failure. A total of 6 samples were excluded because the self-reported genders did not match the genotyped genders. For each duplicate pair, which was genotyped as an internal control for the experimental QC, the member with the lower experimental quality was excluded from the analysis (n = 25). The genotype data were generated using Axiom Genotyping Algorithm v1 (Axiom GT1). For the sample filtering, arrays with generated genotypes for <95% of the loci were excluded (n = 40). The heterozygosity rates were calculated, and samples deviating by more than 6 s.d. from the mean were to be excluded (no samples were excluded). PLINK's identity by descent analysis was used to detect hidden relatedness. When pairs of individuals had a PI_HAT > 0.25, the member of the pair with the lower call rate was excluded from the analysis (n = 92); 1,255 cases and 1,848 controls were retained for further analyses. For the SNP filtering (after sample filtering), SNPs with call rates <95% in the samples were removed (n = 36,349). SNPs with a minor allele frequency <3% (n = 12,267) or SNPs that deviated significantly (P ≤ 1 × 10−5) from Hardy-Weinberg equilibrium in the controls (n = 15,707) were also excluded. A total of 603,697 SNPs passed the quality criteria and were used in the subsequent analyses.
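The sample-level exclusions described above can be tallied step by step as a consistency check. The sketch below uses only the counts given in this paragraph; the step labels are paraphrases, and the helper name `remaining_after` is illustrative.

```python
# Sample-level exclusions reported in the QC paragraph, in the order described.
TOTAL_ARRAYS = 3360
SAMPLE_EXCLUSIONS = {
    "DQC failure": 94,
    "self-reported vs genotyped gender mismatch": 6,
    "lower-quality member of duplicate pair": 25,
    "genotype call rate < 95%": 40,
    "heterozygosity > 6 s.d. from mean": 0,
    "hidden relatedness (PI_HAT > 0.25)": 92,
}
RETAINED_CASES = 1255
RETAINED_CONTROLS = 1848


def remaining_after(total, exclusions):
    """Return (reason, samples left) after each exclusion step, in order."""
    counts = []
    left = total
    for reason, n_excluded in exclusions.items():
        left -= n_excluded
        counts.append((reason, left))
    return counts


steps = remaining_after(TOTAL_ARRAYS, SAMPLE_EXCLUSIONS)
for reason, left in steps:
    print(f"{left:5d} samples remain after: {reason}")

final_count = steps[-1][1]
print("matches retained cases + controls:",
      final_count == RETAINED_CASES + RETAINED_CONTROLS)
```

Under these counts the running total ends at 3,103 samples, which equals the 1,255 cases plus 1,848 controls retained for analysis.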
Population stratification analysis. The population stratification was assessed using a PCA-based method implemented in the software package EIGENSTRAT 29. First, we performed a PCA of a combination of our samples and 270 reference HapMap samples to evaluate the population structure of the samples. The first two eigenvectors are plotted in Supplementary Fig. 2. We performed a second PCA for the discovery set for the population stratification correction (Supplementary Fig. 3). A total of 20 principal components were generated for the correction.
SNP genotyping in the replication phases. The genotyping for replication I was performed using the iPLEX platform (Sequenom, San Diego, CA), and replications II and III were performed using the ligation detection reaction method 30,31, with technical support from the Shanghai Biowing Applied Biotechnology Company.
Statistical analysis. For gout, logistic regression was used to test the association of a single SNP using PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/) 32, and the 20 principal components were used as covariates in the association analysis to correct for the population stratification. After adjustment, little stratification was observed (λ = 1.058, λ1,000 = 1.039, standardized to a sample size of 1,000). A fixed-effects model with inverse variance weighting was used in the meta-analysis. Heterogeneity across the data sets was evaluated using Cochran's Q test, and the I² index was used to quantify the degree of heterogeneity. For serum urate levels, the phenotypes were normalized to a standard normal distribution for further association analysis; the association analyses were conducted using SNPTEST 33. A Manhattan plot of the −log10(P values) was generated using Haploview 34. Ungenotyped SNPs of the autosomes were imputed in the GWAS discovery samples using SHAPEIT 2.0 (http://www.shapeit.fr/) 35 (phasing step), IMPUTE2 (http://mathgen.stats.ox.ac.uk/impute/impute_v2.html) 36 (imputation step) and the haplotype information from the 1,000 Genomes Project (Phase I integrated variant set across all 1,092 individuals, v2, March 2012; http://www.1000genomes.org/; Supplementary Fig. 5b). The online tool HaploReg (http://hapmap.ncbi.nlm.nih.gov/) was used to explore chromatin states, conservation and regulatory motif alterations of the associated loci 13. The input for HaploReg consisted of the six index SNPs, and the r² threshold was set at 0.8 (based on the 1,000G Phase 1 ASI population for the LD calculation). Regional plots were generated using the online tool LocusZoom 1.2 37 (http://csg.sph.umich.edu/locuszoom/). Power analysis was conducted using the genetic power calculator at risk allele frequencies ranging from 0.05 to 0.85 and ORs ranging from 1.10 to 1.50 38.
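Two of the calculations named above can be illustrated compactly: the inverse-variance-weighted fixed-effects meta-analysis with Cochran's Q and I², and the rescaling of the genomic inflation factor to 1,000 cases and 1,000 controls. The following is a minimal sketch and not the authors' pipeline. The function names and the per-stage effect sizes in the usage example are hypothetical, and the λ1,000 formula is a commonly used rescaling that the text does not spell out; it is assumed here because it reproduces the reported value from the discovery sample sizes.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns the pooled OR, its log-scale SE, a two-sided P value,
    Cochran's Q and the I^2 heterogeneity index (0-100).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"or": math.exp(beta), "se": se, "p": p, "q": q, "i2": i2}


def lambda_1000(lam, n_cases, n_controls):
    """Rescale a genomic inflation factor to 1,000 cases and 1,000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (2.0 / 1000.0)
    return 1.0 + (lam - 1.0) * scale


# Hypothetical per-stage log ORs and standard errors (not values from this study).
result = fixed_effects_meta(betas=[0.25, 0.18, 0.22], ses=[0.06, 0.07, 0.05])
print(f"pooled OR = {result['or']:.3f}, P = {result['p']:.2e}, I2 = {result['i2']:.1f}")

# Discovery-stage sample sizes and lambda as reported in the text.
print(f"lambda_1000 = {lambda_1000(1.058, 1255, 1848):.3f}")
```

With the reported λ = 1.058 and 1,255 cases and 1,848 controls, this rescaling gives a value that rounds to 1.039, in line with the figure quoted above.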