Association analysis in a Latin American population revealed ethnic differences in rheumatoid arthritis-associated SNPs in Caucasian and Asian populations.

Large genome-wide association studies (GWAS) have increased our knowledge of the genetic risk factors of rheumatoid arthritis (RA). However, little is known about genetic susceptibility in populations with a large admixture of Amerindian ancestry. The aim of the present study was to test the generalizability of previously reported RA loci in a Latin American (LA) population with admixed ancestry. We selected 128 single nucleotide polymorphisms (SNPs) in linkage equilibrium, with high association to RA in multiple populations of non-Amerindian origin. Genotyping of 118 SNPs was performed in 313 RA patients/487 healthy control subjects by mid-density arrays of polymerase chain reaction (PCR). Some of the identified associations were validated in an additional cohort (250 cases/290 controls). One marker, the SNP rs2451258, located upstream of T Cell Activation RhoGTPase Activating Protein (TAGAP) gene, showed significant association with RA (p = 5 × 10⁻³), whereas 18 markers exhibited suggestive associations (p < 0.05). Haplotype testing showed association of some groups of adjacent SNPs around the signal transducer and activator of transcription 4 (STAT4) gene (p = 9.82 × 10⁻³ to 2.04 × 10⁻³) with RA. Our major finding was little replication of previously reported genetic associations with RA. These results suggest that performing GWAS and admixture mapping in LA populations has the potential to reveal novel loci associated with RA. This in turn might help to gain insight into the 'pathogenomics' of this disease and to explore trans-population differences for RA in general.

condition can develop at any age, RA affects women more frequently than men and is mainly diagnosed between the ages of 40 and 60 years. In Latin America (LA), the female predominance seems to be more pronounced, and prevalence has been estimated at 0.2-0.5% 2,3. In Chile, data based on clinical examination put the overall prevalence of RA at 0.46% 4.
The etiology of RA is multifactorial and only partially understood because of the complex interactions between genetic and environmental factors. Approximately 50% of RA risk is thought to be genetic, and one-third of this risk is associated with the human leukocyte antigen (HLA) locus 5, specifically the HLA-DRB1 shared epitope (SE) alleles, which encode a common amino acid sequence 6. Since 2007, about 101 RA risk loci have emerged from genome-wide association studies (GWAS) and subsequent GWAS meta-analyses 7,8, mostly in individuals from European and/or Asian populations (Supplementary Table 1). In fact, none of the GWAS pertaining to RA has been performed in LA populations (Supplementary Table 1).
It is generally accepted that many common risk variants are shared across populations of different ethnicity, but allele frequencies of disease-associated single nucleotide polymorphisms (SNPs) vary significantly among ethnic groups due to genetic drift or selection 9. Linkage between causal variants and the tag SNPs included in genotyping microarrays may vary with population-specific patterns of recombination, which in turn are largely affected by population size, founder effects and admixture processes. In addition, populations with different histories may carry distinct causal mutations even at similar loci. All of these factors can preclude generalization of genetic associations from one population to another, and suggest testing for locus- or haplotype-wise rather than SNP-wise generalization 10.
López Herráez et al. 11 examined susceptibility loci for RA in LA populations. In that study, a strong association with the HLA region was observed, with three independent effects, probably due to the diverse origin of the patients (Argentina, Mexico, Chile, and Peru). Some of the RA associations previously reported in GWAS were also replicated in the study by López Herráez and coworkers, but with only moderate significance (including the protein tyrosine phosphatase, non-receptor type 22 (lymphoid) [PTPN22] and signal transducer and activator of transcription 4 [STAT4] genes). However, in general, genetic association studies on RA have not been robustly replicated in LA populations. Therefore, the aim of the present study was to carry out high-density SNP genotyping in candidate genes to test their association with susceptibility to RA in the Chilean population, in order to provide insight into the cross-ethnic generalizability of known European and Asian RA risk loci to LA populations.

Results
In the present study, 563 (42.0%) of the 1,340 included individuals had RA. Supplementary Table 2 shows the characteristics of the RA patients included in the analysis. The mean age was 48 and 58 years for cohorts 1 and 2, respectively, and 84.7% and 81.0% of the patients were women. The mean duration of the disease was 8 years. Anti-cyclic citrullinated peptide (CCP) antibodies were measured in 218 patients and were positive in 164 of them (75.23%), whereas rheumatoid factor (RF) was measured in 300 patients and was positive in 264 (88.0%). The RA group did not differ from the control group with regard to any of the clinical parameters included in the study (data not shown).
The present findings do not show replicable association of individual SNPs with RA. Among the 128 SNPs genotyped, 118 passed all the quality filters, after excluding SNPs with a minor allele frequency < 0.01 or missingness > 0.1 and those that were not in Hardy-Weinberg equilibrium (HWE) (p < 0.001) (Supplementary Table 3). Only two markers (2%) showed significant associations (p ≤ 0.01): rs1635567 and rs2469434 (Table 1), neither of which was confirmed in Cohort 2. When data from both cohorts were combined, rs2469434 was still significant, whereas rs1635567 could not be tested because the assay failed in Cohort 2. However, the combined analysis revealed a new significant association for rs2451258 (combined p = 5 × 10⁻³; p = 0.09 after Bonferroni correction for multiple testing) (Table 1). Eighteen markers exhibited suggestive associations (p < 0.05), whereas the associations of the remaining SNPs included in the study were not significant. The significantly associated SNPs in the peptidyl arginine deiminase, type IV (PADI4), protein tyrosine phosphatase, non-receptor type 22 (PTPN22), signal transducer and activator of transcription 4 (STAT4), cytotoxic T-lymphocyte-associated protein 4 (CTLA4), tumor necrosis factor, alpha-induced protein 3 (TNFAIP3), and chemokine receptor 6 (CCR6) genes, identified in Caucasian and Asian populations, were not replicated in the Chilean population (Supplementary Fig. 1).
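The quality filters and the Bonferroni adjustment described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the text does not state which software was used, whether HWE was assessed in controls only, or how many tests entered the correction. The function names, the chi-square form of the HWE test, and the `n_tests` argument are therefore assumptions.

```python
import math


def hwe_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg proportions.

    Assumption: a plain chi-square test on genotype counts; the study may
    have used an exact test instead.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              maf_min=0.01, missing_max=0.1, hwe_alpha=0.001):
    """Apply the three thresholds quoted in the text to one SNP."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    missingness = n_missing / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    if maf < maf_min or missingness > missing_max:
        return False
    return hwe_pvalue(n_aa, n_ab, n_bb) >= hwe_alpha


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1 (n_tests is hypothetical)."""
    return min(1.0, p_value * n_tests)
```

A SNP is kept only if it clears all three thresholds, so `passes_qc` returns `False` as soon as any one of them fails.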
We next determined the correlation between the odds ratios (ORs) derived from our study and the ORs previously reported in GWAS of Caucasian and Asian populations 12 (Fig. 1). There was no correlation between the data from Caucasian populations and our data (r = −0.041, p = 0.768), or between the data from Asian populations and our data (r = 0.152, p = 0.302). In addition, the allele frequencies of RA-associated SNPs varied significantly among ethnic groups (Fig. 2, Supplementary Fig. 2). Allele frequencies were concordant within our study (healthy controls vs. RA cases, r = 0.98, p < 10⁻¹⁵) and between our study and the ChileGenomico dataset (healthy controls vs. ChileGenomico, r = 0.96, p < 10⁻¹⁵). However, the allele frequencies in European, East Asian, Aymara and Mapuche samples diverged from those of our cohort (r ≤ 0.70).
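The comparison of effect sizes across populations can be illustrated with a short sketch. It assumes an allelic odds ratio computed from a 2 × 2 table of allele counts and a Pearson correlation coefficient. The text does not specify whether Pearson or rank correlation was used, nor whether ORs were log-transformed, so both choices and all function names here are assumptions.

```python
import math


def allelic_odds_ratio(case_minor, case_major, control_minor, control_major):
    """Odds ratio for the minor allele from a 2 x 2 table of allele counts."""
    return (case_minor * control_major) / (case_major * control_minor)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlate_log_ors(ors_a, ors_b):
    """Correlate two sets of ORs on the log scale (an assumed convention)."""
    return pearson_r([math.log(v) for v in ors_a],
                     [math.log(v) for v in ors_b])
```

Working on the log scale treats risk and protective effects symmetrically (an OR of 2 and an OR of 0.5 are equally far from 1), which is why it is a common choice for this kind of comparison.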
The sliding window test revealed several SNP blocks that were associated with RA (Table 2). The strongest sliding-window signals (p = 9.82 × 10⁻³ to 2.04 × 10⁻³) were located in regions around the STAT4 gene. In addition to the sliding window test, we also performed case-control analyses based on linkage disequilibrium (LD) haplotype block reconstruction, which did not reveal associations with RA. Detailed haplotype block information and the LD plot around the STAT4 gene are shown in Supplementary Fig. 3.
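The idea behind a sliding window haplotype scan can be sketched as below. The sketch assumes that phased chromosomes are already available as strings with one allele character per ordered SNP. In practice haplotypes are inferred statistically from unphased genotypes, and the text does not describe the software or window sizes used, so the data layout and function names are hypothetical.

```python
from collections import Counter


def sliding_windows(snp_ids, size):
    """All runs of `size` adjacent SNPs, in map order."""
    return [tuple(snp_ids[i:i + size])
            for i in range(len(snp_ids) - size + 1)]


def haplotype_chi2(case_haps, control_haps):
    """Pearson chi-square on a 2 x k table of haplotype counts.

    Returns (statistic, degrees_of_freedom).
    """
    if not case_haps or not control_haps:
        raise ValueError("both groups must contain haplotypes")
    case_counts = Counter(case_haps)
    control_counts = Counter(control_haps)
    haplotypes = sorted(set(case_counts) | set(control_counts))
    n_case, n_control = len(case_haps), len(control_haps)
    total = n_case + n_control
    stat = 0.0
    for hap in haplotypes:
        column = case_counts[hap] + control_counts[hap]
        for observed, row in ((case_counts[hap], n_case),
                              (control_counts[hap], n_control)):
            expected = row * column / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(haplotypes) - 1


def scan(case_chroms, control_chroms, snp_ids, size):
    """Test every window of `size` adjacent SNPs for case-control differences."""
    results = []
    for start, window in enumerate(sliding_windows(snp_ids, size)):
        case_haps = [c[start:start + size] for c in case_chroms]
        control_haps = [c[start:start + size] for c in control_chroms]
        stat, df = haplotype_chi2(case_haps, control_haps)
        results.append((window, stat, df))
    return results
```

Because each window is tested as a unit, a signal can appear for a group of adjacent SNPs even when none of them is individually associated, which is the pattern reported here for STAT4.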

Discussion
The present study aimed to investigate the association between SNP markers in candidate genes and RA in the Chilean population. Our main finding was little replication of previously reported genetic associations with RA. Indeed, only 2% of known RA loci from GWAS in populations of European or Asian origin were significantly associated in our LA population, and just 11% showed a suggestive association. This was unexpected because SNPs in well-known RA loci were tested, such as PADI4, PTPN22, STAT4, CTLA4, TNFAIP3, and CCR6, none of which replicated. There are a number of reasons why findings that were previously significant in GWAS might not replicate in independent cohorts, as reviewed by Kraft et al. 13. The small sample size of our study may be responsible for the modest number of SNPs that showed associations validated in our participants. Sample sizes larger than the one used here are needed to reach high confidence levels and strong statistical power. In this regard, the low prevalence of the disease restricted the number of patients that we were able to recruit for our study. A long-term effort to progressively collect large numbers of patient samples in biobanks might allow better-powered genetic studies and tests of the generalizability of genetic associations. Similarly, we believe that the small sample size is a main reason for the lack of differences we found between endophenotypes. Our study did not reach adequate statistical power for one-third of the SNPs analyzed, which might explain, at least in part, the lack of replication of the results in the Chilean population. However, if lack of power were the only explanation, the OR values would be expected to follow, overall, the same trend in Chilean patients as in other populations. Instead, the ORs in Chile show no correlation with estimates from studies of Europeans and only a very weak positive correlation with those from Asians (Fig. 1). This suggests that genetic divergence between populations at these loci may be one of the reasons for the lack of generalization of SNP associations.
Differences in LD patterns between populations, which can arise from multiple factors such as different demographic histories including population-specific bottlenecks, genetic drift, selection, and recent admixture, among others, may preclude replication of associations 14. Large diversity in LD among populations from different continents, including the Americas, is well documented 15. Furthermore, RA is a trait associated with loci responsible for the immune response, which in turn is highly associated with local adaptations and disease resistance. In support of the above interpretation of our results, although we did not find any significant SNP-wise association of STAT4 with RA, we did find association for this locus when testing haplotypes instead of genotypes. The sliding window test revealed several haplotype associations with RA, suggesting the possible existence of untested (potentially functional) genetic variation within STAT4 in the Chilean population, a result that studies of other populations might have failed to detect or in which it might not have shown the strongest signal. Further investigations are required to confirm these findings. The strongest association was observed for the SNP rs2451258, located upstream of the T-cell activation RhoGTPase activating protein (TAGAP) gene, although the p-value was > 0.05 after Bonferroni correction for multiple testing. This variant does not lie within any protein-coding sequence, nor does it disrupt a non-coding functional motif, but TAGAP would be a promising biological candidate gene 12. The TAGAP gene encodes a member of the Rho GTPase-activator protein superfamily, but little is known about its role in the immune system. Additional investigations, with a higher density of variants in the region, are required to confirm this hypothesis.
Polygenic risk scores could be the next great stride in genomic medicine, and their use in complex phenotypes is generating considerable debate 16. Recently, Khera et al. proposed that it is time to contemplate the incorporation of polygenic risk prediction in clinical care 17, projecting these scores across a wide variety of diseases. The risk scores have been generated and tested mainly in individuals of primarily European ancestry. In the present study, the significance of the previously detected SNP-wise associations was moderate, and better generalizability was found when testing association between phenotype and haplotypes rather than SNPs. Moreover, allele frequencies vary between populations of different ancestries. These results suggest the existence of genomic patterns in Chilean, and probably other LA, populations that differentiate them from Europeans with regard to loci that are relevant for RA. This can be caused by different demographic histories (e.g., past population bottlenecks and migration events, or ancestries) [18][19][20]. Haplotype-based associations may capture the interacting effects among two or more potential causal variants within a given genomic region, which single-variant approaches cannot detect. Therefore, haplotype-based approaches show greater power to map susceptibility genes in complex traits than single-marker methods 21,22. These results support the need for GWAS in LA populations, including Chileans, to discover potentially novel loci accounting for genetic risk for RA, to investigate the contribution of genetic ancestry, and to improve the performance of polygenic prediction models in these populations.

Study participants.
A total of 1,340 individuals were studied as two distinct cohorts. Cohort 1 comprised 313 patients with RA and 487 healthy control subjects; cohort 2 included 250 RA patients and 290 healthy controls. The patients with RA were diagnosed following the 2010 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) classification criteria 23. The study was approved by the Ethical Committee of the "Servicio de Salud del Maule" (registration number 04/2014), Chile, and all individuals gave their written informed consent prior to enrolling in the study. All methods were performed in accordance with the relevant guidelines and regulations.
SNP selection and genotyping. A total of 128 SNPs from 73 genes were chosen for genotyping from previous GWAS in populations of diverse ethnic background 7,11. Supplementary Table 3 shows the SNPs selected for our analysis. Some of them were selected as haplotype-tag SNPs (ht-SNPs) based on LD patterns within our candidate genes (PADI4, PTPN22, STAT4, CTLA4, TNFAIP3 and CCR6) in the HapMap dataset 24. The ht-SNPs were selected using the Tagger tool of Haploview 25, under the following criteria: minor allele frequency ≥ 0.01 and r² > 0.8, based on the HapMap populations (CEU, CEU + TSI and MEX). Some of the identified associations were validated by genotyping 23 SNPs in Cohort 2. The SNPs were genotyped using the OpenArray® TaqMan platform (Applied Biosystems Inc.) in the test (Cohort 1) and replication (Cohort 2) samples. The genotyping assays were performed at the Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research (GENYO), Granada, Spain (Cohort 1), and at the Centro Nacional de Genotipado, Santiago de Compostela node, Spain (Cohort 2).
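The r² criterion used for tag SNP selection can be illustrated with a short calculation. This sketch computes pairwise r² from counts of the four two-SNP haplotypes and applies the 0.8 threshold quoted above. It assumes phased haplotype counts are given and does not reproduce the Tagger algorithm itself; the function names and argument layout are hypothetical.

```python
def r_squared(n11, n10, n01, n00):
    """Pairwise LD (r squared) between two biallelic SNPs.

    n11, n10, n01, n00 are counts of the four haplotypes, where the first
    digit is the allele at SNP 1 and the second the allele at SNP 2.
    """
    total = n11 + n10 + n01 + n00
    p1 = (n11 + n10) / total  # frequency of allele 1 at SNP 1
    q1 = (n11 + n01) / total  # frequency of allele 1 at SNP 2
    d = n11 / total - p1 * q1  # linkage disequilibrium coefficient D
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    if denom == 0:
        raise ValueError("r squared is undefined for a monomorphic SNP")
    return d * d / denom


def is_tagged(n11, n10, n01, n00, threshold=0.8):
    """True if one SNP captures the other at the given r squared threshold."""
    return r_squared(n11, n10, n01, n00) > threshold
```

With 45, 5, 5 and 45 copies of the four haplotypes, for example, r² is 0.64, so neither SNP would count as a tag for the other under the 0.8 threshold.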
Genotyping data from reference populations. In order to assess ethnic differences in allelic frequencies for the SNPs evaluated in this work, we obtained genotypes for 108 AFR, 99 EUR, and 103 EAS unrelated individuals from the 1000 Genomes Project Phase 3 dataset (http://www.1000genomes.org). For Amerindian ancestry, we obtained genotypes for 85 individuals of Aymara ancestry (AYM), 54 individuals of Mapuche ancestry (MAP), and 348 of Chilean ancestry (CLG) from the ChileGenomico Project (http://chilegenomico.med.uchile.cl). AYM, MAP, and CLG individuals were genotyped using the Axiom LAT1 Array (Affymetrix, Inc., Santa Clara, California, U.S.) and imputed using the 1000 Genomes Project Phase 3 data 26.