Large genome-wide association studies (GWAS) have increased our knowledge of the genetic risk factors for rheumatoid arthritis (RA). However, little is known about genetic susceptibility in populations with a large admixture of Amerindian ancestry. The aim of the present study was to test the generalizability of previously reported RA loci in a Latin American (LA) population with admixed ancestry. We selected 128 single nucleotide polymorphisms (SNPs) in linkage equilibrium with high association to RA in multiple populations of non-Amerindian origin. Genotyping of 118 SNPs was performed in 313 RA patients and 487 healthy control subjects using mid-density polymerase chain reaction (PCR)-based arrays. Some of the identified associations were validated in an additional cohort (250 cases/290 controls). One marker, SNP rs2451258, located upstream of the T Cell Activation RhoGTPase Activating Protein (TAGAP) gene, showed significant association with RA (p = 5 × 10⁻³), whereas 18 markers exhibited suggestive associations (p < 0.05). Haplotype testing showed association with RA for some groups of adjacent SNPs around the signal transducer and activator of transcription 4 (STAT4) gene (p = 9.82 × 10⁻³ to 2.04 × 10⁻³). Our major finding was the limited replication of previously reported genetic associations with RA. These results suggest that performing GWAS and admixture mapping in LA populations has the potential to reveal novel loci associated with RA. This in turn might help to gain insight into the ‘pathogenomics’ of this disease and to explore trans-population differences for RA in general.
Rheumatoid arthritis (RA) is an autoimmune inflammatory rheumatic disease that mainly affects the synovial joints, although many other tissues and organs can be involved. It affects approximately 1% of the population worldwide1 and, although it can develop at any age, RA affects women more frequently than men and is mainly diagnosed between 40 and 60 years of age. In Latin America (LA), the female predominance appears to be more pronounced, whereas prevalence has been estimated at 0.2–0.5%2,3. In Chile, the overall prevalence of RA based on clinical examination has been estimated at 0.46%4.
The etiology of RA is multifactorial and partially unknown because of the complex interactions between genetic and environmental factors. Approximately 50% of RA risk is thought to be genetic, and one-third of this genetic risk is associated with the human leukocyte antigen (HLA) locus5, specifically with HLA-DRB1 alleles that encode a common amino acid sequence known as the shared epitope (SE)6. Since 2007, about 101 RA risk loci have emerged from genome-wide association studies (GWAS) and subsequent GWAS meta-analyses7,8, mostly in individuals from European and/or Asian populations (Supplementary Table 1). Notably, none of the GWAS of RA has been performed in LA populations (Supplementary Table 1).
It is generally accepted that many common risk variants are shared across populations of different ethnic backgrounds, but the allele frequencies of disease-associated single nucleotide polymorphisms (SNPs) vary significantly among ethnic groups owing to genetic drift or selection9. Linkage disequilibrium between causal variants and the tag SNPs included in genotyping microarrays might vary depending on population-specific patterns of recombination, which in turn are largely affected by population size, founder effects and admixture processes. In addition, populations with different histories may carry distinct causal mutations even at the same loci. All of these factors can preclude the generalization of genetic associations from one population to another, and suggest testing for locus- or haplotype-wise rather than SNP-wise generalization10.
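Why a tag SNP can capture a causal variant in one population but not in another can be made concrete with the pairwise statistic r² = D²/[p_A(1 − p_A)p_B(1 − p_B)], where D = p_AB − p_A·p_B. The following minimal Python sketch assumes phased two-locus haplotypes are available; the function name and the toy haplotype samples are hypothetical and are not data from this study.

```python
from collections import Counter


def ld_r2(haplotypes):
    """Pairwise LD r^2 between two biallelic SNPs from phased two-locus haplotypes.

    Each haplotype is a two-character string: the first character is the allele
    at SNP 1 ("A" or "a"), the second the allele at SNP 2 ("B" or "b").
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_ab = counts["AB"] / n                                      # A-B haplotype frequency
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n   # allele A frequency
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n   # allele B frequency
    d = p_ab - p_a * p_b                                         # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either SNP is monomorphic; report 0 in that case.
    return d * d / denom if denom > 0 else 0.0


# Hypothetical haplotype samples for the same SNP pair in two populations.
pop_1 = ["AB"] * 50 + ["ab"] * 50
pop_2 = ["AB"] * 40 + ["Ab"] * 10 + ["aB"] * 10 + ["ab"] * 40
print(ld_r2(pop_1), ld_r2(pop_2))
```

In this toy example the same SNP pair is in complete LD in the first population (r² = 1) but only partial LD in the second (r² = 0.36), so a marker that tags a causal variant well in one population carries much less of its signal in the other.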
López Herráez et al.11 examined susceptibility loci for RA in LA populations. They observed a strong association with the HLA region, with three independent effects, probably owing to the diverse origin of the patients (Argentina, Mexico, Chile and Peru). Some of the RA associations previously reported in GWAS were also replicated in that study, but with only moderate significance (including the protein tyrosine phosphatase, non-receptor type 22 (lymphoid) [PTPN22] and signal transducer and activator of transcription 4 [STAT4] genes). In general, however, genetic associations with RA have not been robustly replicated in LA populations. Therefore, the aim of the present study was to carry out high-density SNP genotyping in candidate genes to test their association with susceptibility to RA in the Chilean population, in order to provide insight into the cross-ethnic generalizability of known European and Asian RA risk loci to LA populations.
Of the 1,340 individuals included in the present study, 563 (42.0%) had RA. Supplementary Table 2 shows the characteristics of the RA patients included in the analysis. The mean age was 48 years in Cohort 1 and 58 years in Cohort 2, and 84.7% and 81.0% of the patients, respectively, were women. The mean disease duration was 8 years. Anti-cyclic citrullinated peptide (CCP) antibodies were determined in 218 patients and were positive in 164 of them (75.23%), whereas rheumatoid factor (RF) was determined in 300 patients and was positive in 264 (88.0%). The RA group did not differ from the control group with regard to any of the clinical parameters included in the study (data not shown).
The present findings do not show replicable association of individual SNPs with RA. Of the 128 SNPs genotyped, 118 passed all quality filters after excluding SNPs with a minor allele frequency <0.01, missingness >0.1 or deviation from Hardy-Weinberg equilibrium (HWE) (p < 0.001) (Supplementary Table 3). Only two markers (2%), rs1635567 and rs2469434, showed significant associations (p ≤ 0.01) (Table 1), and neither was confirmed in Cohort 2. When data from both cohorts were combined, rs2469434 remained significant, whereas rs1635567 could not be tested because its assay failed in Cohort 2. However, the combined analysis revealed a new significant association for rs2451258 (combined p = 5 × 10⁻³; p = 0.09 after Bonferroni correction for multiple testing) (Table 1). Eighteen markers exhibited suggestive associations (p < 0.05), whereas the remaining SNPs included in the study showed no significant association. The SNPs in the peptidyl arginine deiminase type IV (PADI4), protein tyrosine phosphatase non-receptor type 22 (PTPN22), STAT4, cytotoxic T-lymphocyte-associated protein 4 (CTLA4), tumor necrosis factor alpha-induced protein 3 (TNFAIP3) and chemokine receptor 6 (CCR6) genes that were significantly associated in Caucasian and Asian populations were not replicated in the Chilean population (Supplementary Fig. 1).
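The per-SNP quality filters and the Bonferroni adjustment described above can be sketched as follows. This is a minimal Python illustration, not the PLINK procedure used in the study: it applies the stated thresholds (MAF ≥ 0.01, missingness ≤ 0.1, HWE p ≥ 0.001), but, as a simplifying assumption, tests HWE with a 1-degree-of-freedom chi-square goodness-of-fit test rather than an exact test. Function names and genotype counts are hypothetical.

```python
import math


def hwe_p_value(n_hom_ref, n_het, n_hom_alt):
    """HWE goodness-of-fit p-value (1-df chi-square approximation, not an exact test)."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1 - p
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_hom_ref, n_het, n_hom_alt, n_missing,
              maf_min=0.01, missing_max=0.1, hwe_min=0.001):
    """Keep a SNP only if MAF >= maf_min, missingness <= missing_max and HWE p >= hwe_min."""
    called = n_hom_ref + n_het + n_hom_alt
    if called == 0:
        return False
    freq = (2 * n_hom_ref + n_het) / (2 * called)
    maf = min(freq, 1 - freq)
    missingness = n_missing / (called + n_missing)
    return (maf >= maf_min
            and missingness <= missing_max
            and hwe_p_value(n_hom_ref, n_het, n_hom_alt) >= hwe_min)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical genotype counts (hom. ref, het, hom. alt, missing) for three SNPs.
snps = {"snp_ok": (25, 50, 25, 5),
        "snp_hwe_fail": (50, 0, 50, 0),
        "snp_missing": (25, 50, 25, 20)}
print({name: passes_qc(*counts) for name, counts in snps.items()})
```

The Bonferroni helper multiplies a nominal p-value by the number of tests and caps the result at 1; the appropriate number of tests depends on which family of comparisons is being corrected.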
We next assessed the correlation between the odds ratios (ORs) estimated in our study and those previously reported in GWAS of Caucasian and Asian populations12 (Fig. 1). There was no correlation between the Caucasian data and our data (r = −0.041, p = 0.768), or between the Asian data and our data (r = 0.152, p = 0.302). In addition, the allele frequencies of RA-associated SNPs varied significantly among ethnic groups (Fig. 2, Supplementary Fig. 2). Allele frequencies were concordant between healthy controls and RA cases in our study (r = 0.98, p < 10⁻¹⁵) and between our healthy controls and the ChileGenomico dataset (r = 0.96, p < 10⁻¹⁵). In contrast, allele frequencies in the European, East Asian, Aymara and Mapuche samples were less concordant with those of our cohort (r ≤ 0.70).
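The OR comparison above is a Pearson correlation computed across SNPs. A minimal Python sketch follows. Whether raw or log-transformed ORs were correlated is not stated in the text, so the log transform in `log_or_correlation` is an assumption; both function names and the OR values are hypothetical. The ORs for each SNP must refer to the same risk allele in both studies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length, non-constant sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_or_correlation(or_study, or_reference):
    """Correlate per-SNP ORs on the log scale (assumed transform)."""
    return pearson_r([math.log(o) for o in or_study],
                     [math.log(o) for o in or_reference])


# Hypothetical per-SNP ORs (same SNP order and same risk allele in both lists).
or_here = [1.10, 0.95, 1.30, 1.02, 0.88]
or_gwas = [1.20, 1.15, 1.12, 1.25, 1.18]
print(round(log_or_correlation(or_here, or_gwas), 3))
```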
The sliding window test revealed several SNP blocks associated with RA (Table 2). The strongest sliding-window associations (p = 9.82 × 10⁻³ to 2.04 × 10⁻³) were located in regions around the STAT4 gene. In addition to the sliding window test, we performed case-control analyses based on linkage disequilibrium (LD) haplotype block reconstruction, which revealed no association with RA. Detailed haplotype block information and the LD plot around the STAT4 gene are shown in Supplementary Fig. 3.
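The sliding-window haplotype test can be sketched as below, assuming phased haplotypes are already available for cases and controls. In each window of consecutive SNPs, every observed haplotype is compared against all others in a 2 × 2 case-control table. This is a simplified Python illustration of the general approach, not PLINK's implementation; window sizes of 2 to 21 SNPs shifted one SNP at a time follow the Methods, and all function names and haplotypes are hypothetical.

```python
import math
from collections import Counter


def sliding_windows(n_snps, min_size=2, max_size=21):
    """Yield (start, end) SNP index ranges for windows shifted one SNP at a time."""
    for size in range(min_size, max_size + 1):
        for start in range(n_snps - size + 1):
            yield start, start + size


def chi2_2x2_p(a, b, c, d):
    """P-value of the Pearson chi-square test (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))


def window_haplotype_tests(case_haps, control_haps, min_size=2, max_size=21):
    """Haplotype-specific tests (one haplotype vs. all others) in every sliding window.

    case_haps / control_haps: phased haplotypes as strings with one allele per SNP,
    SNPs in chromosomal order. Returns (start, end, haplotype, p_value) tuples.
    """
    n_snps = len(case_haps[0])
    n_case, n_ctrl = len(case_haps), len(control_haps)
    results = []
    for start, end in sliding_windows(n_snps, min_size, max_size):
        case_counts = Counter(h[start:end] for h in case_haps)
        ctrl_counts = Counter(h[start:end] for h in control_haps)
        for hap in sorted(set(case_counts) | set(ctrl_counts)):
            a, c = case_counts[hap], ctrl_counts[hap]
            results.append((start, end, hap, chi2_2x2_p(a, n_case - a, c, n_ctrl - c)))
    return results


# Hypothetical phased haplotypes over four SNPs.
cases = ["ACGT", "ACGT", "ACTT", "GCGT"]
controls = ["GTGA", "ACGA", "GTTA", "ACGT"]
for start, end, hap, p in sorted(window_haplotype_tests(cases, controls), key=lambda r: r[3])[:3]:
    print(start, end, hap, round(p, 4))
```

Every window and haplotype adds another test (21 SNPs with window sizes of 2 to 21 already give 210 windows), so results from this kind of scan need to be read with multiple testing in mind.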
The present study aimed to investigate the association of SNP markers in candidate genes with RA in the Chilean population. Our main finding was the limited replication of previously reported genetic associations with RA. Indeed, only 2% of known RA loci from GWAS in populations of European or Asian origin were significantly associated in our LA population, and just 11% showed a suggestive association. This was unexpected because SNPs in well-known RA loci were tested, such as PADI4, PTPN22, STAT4, CTLA4, TNFAIP3 and CCR6, none of which replicated. There are a number of reasons why findings that were significant in previous GWAS might not replicate in independent cohorts, as reviewed by Kraft et al.13. The small sample size of our study may explain, at least in part, the modest number of associations validated in our participants. Sample sizes larger than the one used here are needed to reach high confidence levels and adequate statistical power. In this regard, the low prevalence of the disease restricted the number of patients that we were able to recruit. A long-term effort to progressively collect numerous patient samples from biobanks might enable better-powered genetic studies and tests of the generalizability of genetic associations. Similarly, we believe that the small sample size is a main reason for the lack of differences we found between endophenotypes. Our study did not reach adequate statistical power for one-third of the SNPs analyzed, which might partly explain the lack of replication in the Chilean population. However, if lack of power were the only explanation, the OR values would be expected, overall, to follow the same trend in Chilean patients as in other populations. Instead, ORs in Chile show no correlation with estimates from studies of Europeans and only a very weak, non-significant positive correlation with those from Asians (Fig. 1).
This suggests that genetic divergence between populations at these loci may be one of the reasons for the lack of generalization of SNP associations.
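To illustrate how power depends on sample size and allele frequency, the Python sketch below computes the approximate power of a two-sided allelic test under a multiplicative model. It is a normal-approximation sketch under stated assumptions, not the GAS Power Calculator algorithm used in the study: it treats the per-allele OR as the genotype relative risk (a reasonable approximation for a disease with a prevalence of about 0.5%), assumes Hardy-Weinberg equilibrium, and uses hypothetical function names.

```python
import math
from statistics import NormalDist


def allele_freqs_multiplicative(p, grr, prevalence):
    """Risk-allele frequency in cases and controls under a multiplicative model.

    p: population risk-allele frequency; grr: per-allele genotype relative risk;
    prevalence: disease prevalence. Genotypes are assumed to be in HWE.
    """
    q = 1 - p
    geno = (q * q, 2 * p * q, p * p)        # frequencies of 0, 1, 2 risk alleles
    rel_risk = (1.0, grr, grr * grr)        # multiplicative model
    f0 = prevalence / sum(g * r for g, r in zip(geno, rel_risk))
    penetrance = [f0 * r for r in rel_risk]
    case = [g * f for g, f in zip(geno, penetrance)]
    ctrl = [g * (1 - f) for g, f in zip(geno, penetrance)]
    p_case = (case[1] / 2 + case[2]) / sum(case)
    p_ctrl = (ctrl[1] / 2 + ctrl[2]) / sum(ctrl)
    return p_case, p_ctrl


def allelic_test_power(p, grr, prevalence, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test (2N alleles per group), normal approximation."""
    p1, p0 = allele_freqs_multiplicative(p, grr, prevalence)
    se = math.sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z = abs(p1 - p0) / se
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(z - z_crit) + std_normal.cdf(-z - z_crit)


# Settings loosely following the Methods (OR 1.5, prevalence 0.5%, alpha 0.05)
# with the Cohort 1 sample size; the risk-allele frequencies are hypothetical.
for raf in (0.05, 0.1, 0.2, 0.3):
    print(raf, round(allelic_test_power(raf, 1.5, 0.005, 313, 487), 2))
```

Running the loop shows power falling considerably for less common risk alleles at this sample size; the figures from this sketch need not match those produced by the GAS calculator.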
Differences in LD patterns between populations, which can arise from multiple factors such as differing demographic histories including population-specific bottlenecks, genetic drift, selection and recent admixture, among others14, may preclude the replication of associations. Large differences in LD among populations from different continents, including the Americas, are well documented15. Furthermore, RA is associated with loci involved in the immune response, which in turn is strongly shaped by local adaptation and disease resistance. In support of this interpretation, although we did not find any significant SNP-wise association of STAT4 with RA, we did find an association for this locus when testing haplotypes instead of genotypes. The sliding window test revealed several haplotype associations with RA, suggesting the possible existence of untested (potentially functional) genetic variation within STAT4 in the Chilean population, a signal that studies in other populations might have failed to detect or in which it might not have been the strongest. Further investigations are required to confirm these findings. The strongest single-SNP association was observed for rs2451258, located upstream of the T-cell activation RhoGTPase activating protein (TAGAP) gene, although its p-value was >0.05 after Bonferroni correction for multiple testing. This variant neither lies within a protein-coding sequence nor disrupts a non-coding functional motif, but TAGAP is a promising biological candidate gene12. TAGAP encodes a member of the Rho GTPase-activating protein superfamily, but little is known about its role in the immune system. Additional investigations with denser coverage of variants in the region are required to confirm this hypothesis.
Polygenic risk scores could be the next great stride in genomic medicine, although their use for complex phenotypes is generating considerable debate16. Recently, Khera et al. proposed that it is time to contemplate incorporating polygenic risk prediction into clinical care17, applying such scores to several common diseases. These risk scores have been generated and tested mainly in individuals of European ancestry. In the present study, the previously detected SNP-wise associations reached only moderate significance, and better generalizability was found when testing association between phenotype and haplotypes rather than SNPs. Moreover, allele frequencies vary between populations of different ancestries. These results suggest the existence of genomic patterns in Chileans, and probably in other LA populations, that differentiate them from Europeans at loci relevant to RA. This can be caused by different demographic histories (e.g., past population bottlenecks and migration events) or ancestries18,19,20. Haplotype-based associations may capture the joint effects of two or more potential causal variants within a genomic region, which single-variant approaches cannot detect. For this reason, haplotype-based approaches can have greater power than single-marker methods to map susceptibility genes for complex traits21,22. These results support the need for GWAS in LA populations, including Chileans, to discover potentially novel loci accounting for genetic risk for RA, to investigate the contribution of genetic ancestry, and to improve the performance of polygenic prediction models in these populations.
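A polygenic risk score of the kind discussed above is commonly an additive sum of risk-allele dosages weighted by per-allele effect sizes, such as log ORs estimated in a discovery GWAS. The minimal Python sketch below shows this calculation; the function name, weights and dosages are hypothetical. If the weights come from one population and the ORs do not transfer to another, as the lack of OR correlation reported here suggests for these loci, the resulting score may be miscalibrated in the target population.

```python
import math


def polygenic_score(dosages, odds_ratios):
    """Additive score: sum over SNPs of risk-allele dosage (0, 1 or 2) times log(OR)."""
    if len(dosages) != len(odds_ratios):
        raise ValueError("need one OR per SNP")
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


# Hypothetical example: the same individual scored with two sets of weights.
individual = [2, 1, 0]
weights_discovery = [1.5, 1.2, 1.3]   # hypothetical ORs from a discovery population
weights_target = [1.0, 1.4, 0.9]      # hypothetical ORs in a target population
print(polygenic_score(individual, weights_discovery))
print(polygenic_score(individual, weights_target))
```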
A total of 1,340 individuals were studied in two distinct cohorts. Cohort 1 comprised 313 patients with RA and 487 healthy control subjects; Cohort 2 included 250 RA patients and 290 healthy controls. RA was diagnosed according to the 2010 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) classification criteria23. The study was approved by the Ethical Committee of the “Servicio de Salud del Maule”, Chile (registration number 04/2014), and all individuals gave written informed consent before enrolling in the study. All methods were performed in accordance with the relevant guidelines and regulations.
SNP selection and genotyping
A total of 128 SNPs from 73 genes were chosen for genotyping based on previous GWAS in populations of diverse ethnic background7,11. Supplementary Table 3 lists the SNPs selected for our analysis. Some of them were selected as haplotype-tagging SNPs (ht-SNPs) based on LD patterns within our candidate genes (PADI4, PTPN22, STAT4, CTLA4, TNFAIP3 and CCR6) in the HapMap dataset24. The ht-SNPs were selected using the Tagger tool of Haploview25 with the following criteria: minor allele frequency ≥0.01 and r² > 0.8, based on the HapMap populations CEU, CEU + TSI and MEX. Some of the identified associations were validated by genotyping 23 SNPs in Cohort 2. SNPs were genotyped on the OpenArray® TaqMan platform (Applied Biosystems Inc.) in both the test (Cohort 1) and replication (Cohort 2) samples. Genotyping assays were performed at the Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research (GENYO), Granada, Spain (Cohort 1), and at the Santiago de Compostela node of the Centro Nacional de Genotipado, Spain (Cohort 2).
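Pairwise tag-SNP selection of the kind described above can be approximated by a greedy algorithm: repeatedly choose the SNP that captures (r² > 0.8) the largest number of not-yet-captured SNPs. The Python sketch below is a simplified illustration under that assumption, not Haploview's Tagger, which also supports multi-marker tags and forced inclusion of SNPs. SNP identifiers and r² values are hypothetical, and the SNPs are assumed to have already passed the minor allele frequency filter.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedy pairwise tag-SNP selection.

    snps: SNP identifiers in a fixed order (used to break ties).
    r2: dict mapping frozenset({snp_i, snp_j}) -> pairwise r^2; missing pairs count as 0.
    A tag captures itself and every SNP with r^2 > threshold.
    """
    def captures(tag, snp):
        return tag == snp or r2.get(frozenset((tag, snp)), 0.0) > threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP capturing the most untagged SNPs; the earliest SNP wins ties.
        best = max(
            snps,
            key=lambda t: (sum(captures(t, s) for s in untagged), -snps.index(t)),
        )
        tags.append(best)
        untagged -= {s for s in untagged if captures(best, s)}
    return tags


# Hypothetical r^2 values among four SNPs in one candidate gene.
snps = ["snpA", "snpB", "snpC", "snpD"]
r2 = {
    frozenset(("snpA", "snpB")): 0.90,
    frozenset(("snpB", "snpC")): 0.95,
    frozenset(("snpA", "snpC")): 0.50,
}
print(greedy_tag_snps(snps, r2))
```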
Genotyping data from reference populations
To assess ethnic differences in allele frequencies for the SNPs evaluated in this work, we obtained genotypes for 108 African (AFR), 99 European (EUR) and 103 East Asian (EAS) unrelated individuals from the 1000 Genomes Project Phase 3 dataset (http://www.1000genomes.org). For Amerindian and Chilean reference samples, we obtained genotypes for 85 individuals of Aymara ancestry (AYM), 54 individuals of Mapuche ancestry (MAP) and 348 Chilean individuals (CLG) from the ChileGenomico Project (http://chilegenomico.med.uchile.cl). AYM, MAP and CLG individuals were genotyped using the Axiom LAT1 Array (Affymetrix, Inc., Santa Clara, California, U.S.) and imputed using the 1000 Genomes Project Phase 3 reference panel26.
Power calculations were performed with the GAS Power Calculator tool (http://csg.sph.umich.edu) assuming a multiplicative model, an OR of 1.5, a significance level of 0.05 and an RA prevalence of 0.5%. Only SNPs that met the quality criteria of minor allele frequency (MAF) > 0.01, missingness < 0.1 and HWE p > 0.001 were included in the association analyses (Supplementary Table 3). Allele frequencies were compared between RA patients and controls using the chi-square test, and ORs with 95% confidence intervals (95% CI) were calculated using PLINK software (v1.07)27. Haplotype analysis was performed using Haploview software (v4.2)25. In addition, haplotypes were constructed in sliding windows of 2 to 21 SNPs, shifted by one SNP at a time. Association analyses were performed with the chi-square test using PLINK. Pearson’s correlations and linear regression were used to evaluate differences between genetic backgrounds. The LocusZoom web-based resource was used to generate regional plots of association results28.
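The allelic case-control comparison can be sketched from a 2 × 2 table of allele counts. The minimal Python example below computes the Pearson chi-square statistic (1 degree of freedom, no continuity correction), its p-value, the OR and a 95% CI using Woolf's log method. It is an illustration rather than PLINK's code, assumes no zero cells, and uses a hypothetical function name and hypothetical counts.

```python
import math
from statistics import NormalDist


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref, level=0.95):
    """Allelic 2x2 test on allele counts (two alleles per individual).

    Returns (chi2, p_value, odds_ratio, (ci_low, ci_high)) for the alt allele.
    Assumes all four cells are non-zero.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))              # chi-square survival function, 1 df
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's standard error of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
          math.exp(math.log(odds_ratio) + z * se_log_or))
    return chi2, p_value, odds_ratio, ci


# Hypothetical allele counts for one SNP (cases: 2 x 313 alleles, controls: 2 x 487 alleles).
print(allelic_association(240, 386, 300, 674))
```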
Tobón, G. J., Youinou, P. & Saraux, A. The environment, geo-epidemiology, and autoimmune disease: Rheumatoid arthritis. Journal of Autoimmunity 35, 10–14 (2010).
Spindler, A. et al. Prevalence of rheumatoid arthritis in Tucuman, Argentina. J. Rheumatol. 29, 1166–1170 (2002).
Rodrigues Senna, É. et al. Prevalence of Rheumatic Diseases in Brazil: A Study Using the COPCORD Approach. Journal of Rheumatology 31, 594–597 (2004).
Bennett, K. et al. Community screening for rheumatic disorder: Cross cultural adaptation and screening characteristics of the COPCORD core questionnaire in Brazil, Chile, and Mexico. Journal of Rheumatology 24, 160–168 (1997).
MacGregor, A. J. et al. Characterizing the quantitative genetic contribution to rheumatoid arthritis using data from twins. Arthritis & Rheumatism 43, 30–37 (2000).
Gregersen, P. K., Silver, J. & Winchester, R. J. The shared epitope hypothesis. An approach to understanding the molecular genetics of susceptibility to rheumatoid arthritis. Arthritis & Rheumatism 30, 1205–1213 (1987).
Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2013).
Terao, C., Raychaudhuri, S. & Gregersen, P. K. Recent Advances in Defining the Genetic Basis of Rheumatoid Arthritis. Annual Review of Genomics and Human Genetics 17, 273–301 (2016).
Yamamoto, K., Okada, Y., Suzuki, A. & Kochi, Y. Genetics of rheumatoid arthritis in Asia—present and future. Nature Reviews Rheumatology 11, 375–379 (2015).
Novembre, J. & Ramachandran, S. Perspectives on human population structure at the cusp of the sequencing era. Annu. Rev. Genomics Hum. Genet. 12, 245–274 (2011).
López Herráez, D. et al. Rheumatoid arthritis in Latin Americans enriched for Amerindian ancestry is associated with loci in chromosomes 1, 12, and 13, and the HLA class II region. Arthritis and Rheumatism 65, 1457–1467 (2013).
Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2013).
Kraft, P., Zeggini, E. & Ioannidis, J. Replication in genome-wide association studies. Stat. Sci. 24, 561–573 (2010).
Slatkin, M. Linkage disequilibrium in growing and stable populations. Genetics 137, 331–336 (1994).
Conrad, D. F. et al. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genetics 38, 1251–1260 (2006).
GWAS to the people. Nature Medicine 1483 (2018).
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nature Genetics 50, 1219–1224 (2018).
Wang, S. et al. Genetic variation and population structure in Native Americans. PLoS Genetics 3, 2049–2067 (2007).
Moreno-Estrada, A. et al. The genetics of Mexico recapitulates Native American substructure and affects biomedical traits. Science 344, 1280–1285 (2014).
Homburger, J. R. et al. Genomic Insights into the Ancestry and Demographic History of South America. PLoS Genetics 11 (2015).
Liu, N., Zhang, K. & Zhao, H. Haplotype-Association Analysis. Advances in Genetics, 335–405 (2008).
Hsieh, A. R., Hsiao, C. L., Chang, S. W., Wang, H. M. & Fann, C. S. J. On the use of multifactor dimensionality reduction (MDR) and classification and regression tree (CART) to identify haplotype-haplotype interactions in genetic studies. Genomics 97, 77–85 (2011).
Aletaha, D. et al. 2010 Rheumatoid arthritis classification criteria: An American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis and Rheumatism 62, 2569–2581 (2010).
Frazer, K. A. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: Analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Purcell, S. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics 81, 559–575 (2007).
Pruim, R. J. et al. LocusZoom: Regional visualization of genome-wide association scan results. Bioinformatics 27, 2336–2337 (2011).
This work was supported by Fondecyt grants n° 11130198 and n° 1151048. Research by A.L. is funded by the Spanish Ministry of Economy and Competitiveness (Fondo de Investigaciones Sanitarias and Fondos FEDER [grant number PI18/00139]).
The authors declare no competing interests.
Cite this article
Castro-Santos, P., Verdugo, R.A., Alonso-Arias, R. et al. Association analysis in a Latin American population revealed ethnic differences in rheumatoid arthritis-associated SNPs in Caucasian and Asian populations. Sci Rep 10, 7879 (2020). https://doi.org/10.1038/s41598-020-64659-0