Introduction

Hyperuricemia or elevated concentration of urate in serum (SU) is a risk factor for gout, hypertension, chronic kidney disease (CKD) and cardiovascular disease (CVD)1,2,3,4. Uric acid is the final product of purine metabolism in humans, and urate homeostasis involves balancing its production with secretion and reabsorption in the proximal convoluted tubule of kidneys3,4. The variation in SU concentration is under significant genetic influence and its pattern of inheritance suggests that many genes may influence it1. Correspondingly, the renal transport of urate involves several genes including solute carrier family 2, member 9 (SLC2A9), ATP-binding cassette ABC, subfamily G, member 2 (ABCG2), solute carrier family 22, members 11 and 12 (SLC22A11 and SLC22A12), solute carrier family 17, members 1, 3 and 4 (SLC17A1, SLC17A3 and SLC17A4), and solute carrier family 16, member 9 (SLC16A9). Most of these genes have been associated with hyperuricemia1,2,3,4,5,6.

Both hyper- and hypouricemia have been linked to increased risk for metabolic diseases. While hypouricemia has been linked to neurological disorders such as multiple sclerosis and Parkinson’s disease7,8, hyperuricemia is causal for gout and nephrolithiasis and seems to increase the risk for CKD and CVD1,2,3,4. Originally thought to be just a marker, SU is increasingly recognized as having a role in the development and progression of these diseases3. While a recent review by Li et al.9 found no clear role for uric acid in metabolic diseases other than gout and nephrolithiasis, many studies, including ours, have shown that gout patients and asymptomatic hyperuricemic individuals tend to be at high risk for CVD and CKD (Table 1)10,11. Therefore, it is essential to understand the genetic and environmental factors that affect the variation in SU. Even though genome-wide association studies (GWAS) have identified many SU-related loci, the majority of these studies have been conducted in European, African American and Asian populations12,13,14,15,16,17,18,19,20,21,22. To better understand genetic variation, its biological significance and its translation to human health, it is important to study ethnically diverse populations23. Further, differences in linkage disequilibrium (LD) patterns among ethnically diverse populations may offer a unique perspective on fine mapping of genetic loci.

Table 1 Hyperuricemia and cardiovascular disease risk factors in SHFS.

American Indians are one such population, understudied and underrepresented in genetic databases. The prevalence of CVD and CKD is high in American Indians, with heart disease being the leading cause of death [https://www.cdc.gov/dhdsp/data_statistics/fact_sheets/fs_aian.htm]. The Strong Heart Family Study (SHFS) is a multigenerational family-based study of CVD in American Indians. This cohort has high rates of obesity, diabetes, CKD and CVD24,25,26. In addition, about 25% of individuals have hyperuricemia (SU > 6 mg/dl)27. Thus, our aim in this study was to identify the genetic loci that regulate SU concentrations in American Indians. The GWAS was first conducted in each of the three centers of the SHFS (Arizona, Oklahoma and Dakotas (North and South)), followed by a meta-analysis of all three centers. As a secondary aim, we sought to identify American Indian-specific SNPs in SLC2A9, the gene most strongly associated with SU in this study.

Results

The current study included 3000 SHFS participants (1282 men and 1718 women) from three study centers, Arizona, North and South Dakota (Dakotas) and Oklahoma. The mean SU concentrations were 5.14 ± 1.5 mg/dl (4.6 ± 1.3 mg/dl in women and 6.0 ± 1.4 mg/dl in men); 4.9 ± 1.5 mg/dl, 5.2 ± 1.5 mg/dl, 5.3 ± 1.5 mg/dl in Arizona, Dakotas and Oklahoma respectively. Genetic analysis was conducted using rank-inverse-normal transformed SU concentrations, which were regressed on covariates such as age, sex, and their interactions, diabetes status, and medications27,28.
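The rank-based inverse-normal transformation mentioned above can be illustrated with a minimal sketch. This is not the SOLAR routine used in the study; the offset `c = 0.5`, the tie handling by average rank and the example SU values are all assumptions made for illustration.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=0.5):
    """Map values to standard-normal scores through their ranks.

    Ties receive the average rank of their group. The offset c is an
    assumption of this sketch (c = 0.5 gives (rank - 0.5) / n).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # The argument stays strictly inside (0, 1), so inv_cdf is defined.
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Hypothetical SU values in mg/dl, not study data.
su = [4.6, 6.0, 5.2, 7.8, 5.2]
scores = rank_inverse_normal(su)
```

The transformed scores keep the ordering of the raw values but follow an approximately normal distribution, which is what the variance-components model assumes.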

MetaboChip data analysis

MetaboChip genotyping was conducted in a subset of 2000 SHFS participants (Arizona = 300, Dakotas = 850, Oklahoma = 850) who were free of diabetes at visit 1. The final data set included 162,718 autosomal SNPs. MetaboChip data analysis, conducted in each of the three SHFS centers, revealed significant associations of SU with SLC2A9 SNPs (P < 4 × 10−7); rs13145758, rs9998811, rs7862063 in Arizona and rs4481233 in Dakotas and Oklahoma. The minor allele frequencies (MAFs) ranged between 25 and 44%, and the effect sizes (proportion of the residual phenotypic variance that is explained by the minor allele of the SNP) ranged between 4 and 6% (Table 2). The most significant SNP in Arizona was rs13145758, whereas it was rs4481233 in Oklahoma and the Dakotas. Several other SNPs showed associations at P < 1 × 10−5, including rs4148155, rs1481012 and rs2231142 of ABCG2.

Table 2 Genome-wide association analysis of serum urate stratified by center.

Meta-analysis

As a follow-up, we conducted a meta-analysis of SNPs associated with SU concentrations in each of the three centers. The order of the MetaboChip-wide significantly associated SNPs remained similar after meta-analysis, but with increased statistical significance (most significant SNP: rs4481233 (A/G); P = 9 × 10−20) (Table 3). Individuals with G alleles of rs9998811, rs4148155 and rs1481012 and A alleles of rs7696092, rs4481233, rs13145758 and rs2231142 had lower SU concentrations than carriers of the other alleles. Table 3 shows these top seven SNPs that exhibited significant associations. Figure 1 shows the Manhattan plot of the meta-analysis results, with strong association of SLC2A9 and ABCG2 variants on chromosome 4 with SU.

Table 3 Variants associated with serum urate in meta-analysis of the three centers.
Figure 1 Genome-wide association analysis shows strong association of serum urate with SLC2A9 and ABCG2 SNPs.

SLC2A9 sequence variants

To identify American Indian-specific SNPs in our top hit, we conducted targeted sequencing of key regions of SLC2A9, including all exons, 74 kb of introns and 10 kb of upstream and downstream regions of the gene, in 902 SHFS founders (individuals with offspring in the SHFS but no parents). A total of 427 autosomal, polymorphic variants were identified in the 96 kb region of SLC2A9 that was sequenced. These included 233 single nucleotide polymorphisms (SNPs; MAF ≥ 1%); 125 single nucleotide variants (SNVs; minor allele count ≥ 2); 26 singletons (variants found only in one of our samples); and 43 indels/triallelics (insertions or deletions or more than two alleles). Of these, 117 variants were novel based on comparison with the dbSNP database (Supplementary Table 1). The 233 SNPs were then genotyped in all 3000 SHFS participants (Arizona = 586, Dakotas = 1208, Oklahoma = 1206). A total of 89 SNPs were associated at the significance level of P < 2 × 10−4 after adjustment for multiple tests. Table 4 lists the top 10 SNPs and their associations with SU concentrations. The rest of the SNPs and their associations with SU are shown in Supplementary Table 2. Figure 2 shows LD patterns of these 10 SNPs in the SHFS and other ethnicities from the 1000 Genomes project29.
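The variant categories above can be expressed as a small classification rule. The sketch below is illustrative only: the function name, the treatment of a singleton as a minor allele count of exactly one, and the assumption of complete genotypes in all 902 founders (1804 chromosomes) are ours, not the authors' pipeline.

```python
def classify_variant(minor_allele_count, n_chromosomes,
                     single_base=True, n_alleles=2):
    """Assign a variant to one of the categories used in the text.

    Assumption of this sketch: a singleton is a variant whose minor
    allele was observed exactly once.
    """
    if minor_allele_count == 0:
        return "monomorphic"
    if not single_base or n_alleles > 2:
        return "indel/triallelic"
    if minor_allele_count == 1:
        return "singleton"
    maf = minor_allele_count / n_chromosomes
    if maf >= 0.01:
        return "SNP"
    return "SNV"  # minor allele count >= 2 but MAF < 1%


# 902 founders with complete data would give 1804 chromosomes (assumed).
N_CHROM = 2 * 902
examples = {
    "common": classify_variant(400, N_CHROM),
    "just_above_1pct": classify_variant(19, N_CHROM),
    "just_below_1pct": classify_variant(18, N_CHROM),
    "seen_once": classify_variant(1, N_CHROM),
    "indel": classify_variant(400, N_CHROM, single_base=False),
}

# Category counts reported in the text add up to the total of 427.
reported = {"SNP": 233, "SNV": 125, "singleton": 26, "indel/triallelic": 43}
total_variants = sum(reported.values())
```

With 1804 chromosomes, the 1% MAF boundary falls between 18 and 19 copies of the minor allele, which is why those two examples land in different categories.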

Table 4 SLC2A9 sequence variants associated with serum urate (Top 10 significant associations reported here).
Figure 2 Comparison of LD patterns between ethnicities for top 10 SU-associated SLC2A9 sequence variants.

Genetic analysis conditional on significant SLC2A9 and ABCG2 variants

To identify secondary signals for loci associated with SU, we repeated the GWAS of SU conditional on the significant SLC2A9 and ABCG2 variants in each center30. We found suggestive evidence of association of SU concentration with SNPs in nucleobindin (NUCB1-AS1) (rs746075, P = 2.0 × 10−6) in Oklahoma and neuronal PAS-domain containing protein 4 (NPAS4) (rs7947391, P = 3.2 × 10−6) in the Dakotas. No significant or suggestive associations were found in the Arizona center.

Replication of SU associations in independent cohorts

To ascertain whether the SU-associated SNPs identified in American Indians, particularly the novel ones, generalize to other ethnicities, we conducted replication analyses in four different cohorts: Europeans (publicly available data)15, Mexican-mestizos31, Indians32 and East Asians (publicly available data)33. Table 5 shows the SNPs identified in the SHFS along with the association results in the other four ethnicities. SNPs rs62293300 and rs4385059 of SLC2A9 were associated with SU, with the direction of the effect in American Indians being consistent with that in Mexican-mestizos. rs4385059, rs28592748 and rs7696092 of SLC2A9 were associated with SU in all three populations. However, the direction of the effect was the same in American Indians, Mexican-mestizos and Indians but different in Europeans and East Asians. The SU association with NPAS4 SNP rs7947391 was significantly replicated in the cohort of East Asians, with the direction of the effect being the same as in American Indians and Europeans, in contrast to Mexican-mestizos and Indians. The NUCB1 SNP rs746075 was not associated with SU in any of the replication cohorts.

Table 5 Replication of the associations observed in SHFS.

Discussion

Our extensive association analysis using genome-wide as well as candidate gene SNPs in American Indians of the SHFS showed that the uric acid transporters SLC2A9 and ABCG2 are key genes regulating SU concentrations. Previously, significant heritability was obtained, and linkages were localized, for SU concentrations in American Indian participants of the SHFS27. Also, our previous candidate gene study replicated 7 SLC2A9 gene polymorphisms in these participants in all centers combined and when stratified by recruitment center28. However, so far no genome-wide analyses for SU have been reported in American Indians. In this regard, our MetaboChip analysis represents the first detailed genome-wide investigation to identify genetic factors affecting the variation in SU in this population.

The top SNPs from our SU association analysis belonged to the SLC2A9 gene located on chromosome 4, confirming results from previous studies from our and other groups14,15,16,17,18,19,20,21,22. The SNPs rs4481233, rs9998811, rs7696092 and rs13145758 were consistently associated with SU across centers. Our meta-analysis reproduced and strengthened the genome-wide MetaboChip results found for the three study centers separately. In addition to SLC2A9, ABCG2 variants also significantly affected SU concentrations, which further implicates uric acid transporters in the regulation of SU concentrations.

Identification of genetic variants underlying complex traits in minority populations in the US, particularly Native Americans, is challenging because these populations are underrepresented in genetic association studies and databases23. Previous studies from our group involving minority populations such as Mexican Americans19, Zuni Indians20 and Hispanic children21 have shown SLC2A9 to be the key gene affecting SU concentrations. The same has been shown by others in Europeans14,15, Asians16,17 and African Americans12,22. However, the associated variants seem to differ by population. Three recent studies have found significant association of SU with rs2231142 of ABCG2 and rs7678287 of SLC2A9 in Mexican-mestizos31, rs2231142 of ABCG2 and rs3775948 of SLC2A9 in Indians32, and rs7679724 of SLC2A9 and rs4148155 of ABCG2 in Japanese individuals33. While rs2231142 and rs4148155 of ABCG2 and rs3775948 of SLC2A9 were strongly associated with SU in our study, rs7678287 and rs7679724 were not associated with SU in either the individual centers or the meta-analysis.

Our MetaboChip (center-specific and meta-analysis) association analysis has consistently shown rs4481233 of SLC2A9 to be strongly associated with SU concentration, which was further confirmed by our sequence variant analysis. The intronic variant rs4481233 has been shown to be strongly associated with urate and gout34,35,36. The minor allele (A) of rs4481233 has been shown to be associated with lower concentrations of SU34, a finding replicated by our study, where SU decreased by 0.32 mg/dl with each copy of the minor allele. Similar results have also been reported in a GWAS of untargeted serum metabolomics in about 3000 individuals from two large population-based European cohorts, where rs4481233 was found to be strongly associated with urate35,36. Although intronic, rs4481233 is in high LD with the missense variant rs16890979 and is part of an LD block that contains polymorphic Alu elements37 and regulatory motifs for the Ets and TCF12 families of transcription factors38, with potential for affecting splicing and gene expression.

We also found significant associations of SU with SNPs rs2231142, rs4148155, and rs1481012 (all three in LD) belonging to ABCG2, yet another uric acid transporter gene. Notably, rs2231142 is a missense variant of ABCG2 and is likely functional. This variant has been extensively reported to be associated with SU in several populations, including Mexican-mestizos31, Mexican Americans13,39, European Americans, African Americans, and Asian populations12,13,14,15,16,17, although SLC22A12 has been shown to be the gene most strongly associated with SU in East Asians18. ABCG2 variants have been associated not only with SU concentrations but also with fasting glucose40 and different forms of cancer41,42; in particular, rs1481012 has been associated with decreased risk of colorectal cancer43. This gene, however, was not associated with SU in a cohort of Zuni Indians, another American Indian group20.

Other SNPs that were linked to variation in SU concentrations in the Arizona center were rs7862063, located on chromosome 9, and rs6688009 of the phosphatidic acid phosphatase type 2 (PPAP2B or PLPP3) gene. Studies have indicated a role for PPAP2B in adipogenesis44 and vascular inflammation45. It seems to have a protective role in endothelial dysfunction by negatively regulating inflammatory cytokines45. However, this association was not observed in our other two centers or in the replication cohorts. In the Dakotas, we observed marginal association of SU with rs179409 of the potassium channel, voltage gated KQT-like subfamily Q, member 1 (KCNQ1) gene, which has been shown to associate with gout and hyperuricemia46,47. The potential mechanism seems to be through alterations in innate immunity47,48, though the previously reported variants and our SNP are about 3 Mb apart and are not in LD with each other. Therefore, these associations need further investigation. KCNQ1 has been widely reported to be associated with body mass index and other anthropometric measures, and also with type 2 diabetes mellitus in several populations49,50,51.

Furthermore, our conditional analysis identified two novel genes, NUCB1 and NPAS4, associated with SU. Conditional analysis is used to identify secondary signals that may otherwise be masked by the strong effects of lead SNPs30. The NUCB1 and NPAS4 SNPs were associated with lower SU concentrations and were detected only in the Oklahoma and Dakotas centers, respectively. Center-specific statistics for these two SNPs show considerable differences in minor allele frequencies across Arizona, the Dakotas and Oklahoma: 0.05, 0.22 and 0.13 for rs7947391 of NPAS4, and 0.31, 0.49 and 0.36 for rs746075 of NUCB1, respectively. NUCB1 is a Golgi protein with a potential role in calcium homeostasis and immunity52. It is believed to control protein unfolding in Alzheimer’s disease53 and to stimulate insulin secretion54. NPAS4 is a neuronal transcription factor involved in the regulation of cognitive functions in the brain55. Its association with SU in our study was replicated in Japanese individuals of East Asian ancestry33. In addition, this study found a significant expression quantitative trait locus (eQTL) for NPAS4 in monocytes (p = 6 × 10−10). The common link between NPAS4 and SU seems to be oxidative stress and inflammation-associated ischemia in the brain56. This assumes significance considering the increasing importance of uric acid in cognitive and neuronal function and its recognition as an important biomarker for Parkinson’s disease57,58.

Sequence analysis of key regions of SLC2A9 identified 384 SNPs/SNVs/singleton variants; 117 of the identified variants were novel based on comparison with the dbSNP database. Several of the SNPs, including rs3775946, rs6826764, rs6823877, rs56239136, rs4697693, rs2240721, rs1107710, and rs7698858, have not been previously reported as affecting SU concentrations. One SNP (chr_pos: 4_10027969) was found to be novel. Although most studies have reported SNPs in the SLC2A9 locus to be significantly associated with SU in different populations, SLC2A9 variants such as rs3733585, rs6855911, rs1014290 and rs12499857, which were associated with SU in our study, have also been linked to Parkinson’s disease59,60, type 2 diabetes61, anxiety disorders62 and nonsyndromic cleft palate63. Our results also showed that the minor alleles of most of these SNPs were associated with lower SU concentrations.

There are some limitations of the study. First, the MetaboChip may not be the best option for identifying SNP associations with SU in this population. Secondly, the sample size for GWAS may be only moderate. However, family-based studies, a key strength of this study, are favorable for detection of significant associations and/or gene discoveries as they are homogenous, are robust to the effects of population stratification and have increased power to detect novel associations due to reduced residual variance64. Another strength of this study is inclusion of an understudied population with unique genetic and environmental background and availability of extensive covariate information.

In summary, our results replicated known associations of uric acid transporters with SU in a minority population of American Indians and demonstrated potential novel associations of SU with neuronal-related genes which need further investigation.

Methods

Study population: Strong Heart Family Study (SHFS)

The Strong Heart Family Study (SHFS) is a genetic study of CVD risk in American Indians. Description of the phenotypes, SU measurement techniques, and other related analytical approaches have been detailed elsewhere27,28. In short, the SHFS is a multi-center family-based genetic study in American Indian communities from Arizona, the Dakotas, and Oklahoma, which are experiencing extraordinarily high rates of progressive chronic kidney disease (CKD), obesity, diabetes, CVD, and diabetic nephropathy. Approximately 3000 members (including 902 founders) belonging to multigenerational families of Arizona, North and South Dakota (Dakotas), and Oklahoma participated in the study. These individuals aged 14 to 93 years were recruited without regard to disease status in 199865,66. The Indian Health Service Institutional Review Board and the Institutional Review Boards from Texas Biomedical Research Institute and the University of North Carolina at Chapel Hill approved the SHFS protocol and all subjects gave informed consent. For participants under the age of 18 years, informed consent was obtained from their parent or legal guardian. Study design and methods of the SHFS are in accordance with institutional guidelines and have been described previously65,66.

Phenotyping

Blood was collected after an overnight fast. Uric acid concentrations in serum were assayed in the SHFS central laboratory by the uricase and peroxidase method67.

Genotyping

MetaboChip data

Blood collected from individuals who were free of diabetes at the baseline visit (n = 2000) was used for this study. The Cardio-Metabo DNA Analysis BeadChip (Illumina catalog# WG-310-1001 or WG-310-1002) was used for genotyping. The MetaboChip contained 196,725 markers. The original annotation file for the Cardio-Metabo BeadChip is Metabochip_Gene_Annotation. Simwalk2 was used to remove genotyping inconsistencies68. Participants were excluded if the genotyping call rate was <95% (n = 3). SNPs were excluded if the call rate was <98% (n = 0), if they were not autosomal (n = 250), if they had no data after imputation (n = 33,599) or if the Hardy-Weinberg equilibrium P was < 1 × 10−5 (n = 20,067). Pairwise correlations (r2) between markers were calculated to estimate linkage disequilibrium (LD). The final cleaned, imputed data set includes 162,718 autosomal markers for 2000 American Indian participants.
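As an illustration of the pairwise r2 statistic used to estimate LD, the sketch below computes the squared Pearson correlation between two vectors of minor-allele doses (0, 1 or 2 copies). The function name and the genotype vectors are hypothetical, and the sketch ignores missing genotypes and family structure.

```python
def ld_r2(dose_a, dose_b):
    """Squared Pearson correlation between two genotype dose vectors."""
    if len(dose_a) != len(dose_b):
        raise ValueError("dose vectors must have the same length")
    n = len(dose_a)
    mean_a = sum(dose_a) / n
    mean_b = sum(dose_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dose_a, dose_b))
    var_a = sum((a - mean_a) ** 2 for a in dose_a)
    var_b = sum((b - mean_b) ** 2 for b in dose_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic marker")
    return cov * cov / (var_a * var_b)


# Hypothetical doses for six individuals at three markers.
snp1 = [0, 1, 2, 1, 0, 2]
snp2 = [2, 1, 0, 1, 2, 0]   # same information as snp1, alleles swapped
snp3 = [0, 1, 0, 1, 0, 1]

r2_perfect = ld_r2(snp1, snp2)
r2_weak = ld_r2(snp1, snp3)
```

Because r2 is a squared correlation, it does not depend on which allele is counted, so `snp1` and `snp2` are in complete LD under this measure.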

Sequencing of SLC2A9 gene

We sequenced 96 kb of the SLC2A9 gene, using Illumina’s TruSeq Custom Amplicon kit and MiSeq Sequencer, in 902 founders of multigenerational families. The target regions contained all exons (2.2 kb), 74 kb of introns, and 10 kb of upstream and downstream regions of the gene. Illumina-generated sequence data (BAM files) were aligned to the Human Genome Reference Sequence version 37.1 (hg19). Variants were called, recalibrated and QC’d using the Genome Analysis Toolkit (GATK v.3.3) HaplotypeCaller69. Pairwise correlations (r2) between markers were calculated to estimate linkage disequilibrium (LD). We identified 427 autosomal, non-monomorphic variants, 384 of which affected a single base (233 single nucleotide polymorphisms (SNPs; MAF ≥ 1%); 125 single nucleotide variants (SNVs; minor allele count ≥ 2); 26 singletons (variants found only in one of our samples)); and 43 were indels/triallelic (insertions or deletions or more than two alleles). Of the 43 indel/triallelic variants, 14 were listed in dbSNP (rs140391260, rs5856025, rs34839464, rs137899691, rs35950306, rs139025036, rs35614040, rs58702202, rs112058434, rs66622652, rs60841869, rs142713311, rs3834235, and rs66943961). The MAFs of all variants, except for indels/triallelic variants, ranged between 0.1 and 49%.

Statistical analysis

Genotype cleaning and population stratification assessment

Genotype frequencies for each SNP were estimated and tested for departures from Hardy-Weinberg equilibrium in the software package Sequential Oligogenic Linkage Analysis Routines (SOLAR)70. We also used principal component (PC) scores to model differences in ancestral contributions among study participants for the MetaboChip data. PCs were calculated using the unrelated SHFS founders and a subset of 15,158 selected SNPs (r2 < 0.1; MAF > 0.05). PCA was performed on a matrix of “doses” (copies of the minor allele) for the selected SNPs, using “prcomp” in R. The PC scores were then predicted for all genotyped individuals using the PCA model fit to the founder data71,72. While no PC accounted for a large percentage of the total variance in genotype scores, the first four PCs accounted for substantially more than the rest and were, therefore, included as additional covariates in association analyses.
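The Hardy-Weinberg step can be illustrated with a textbook one-degree-of-freedom chi-square test on genotype counts. This is a sketch under the assumption of unrelated individuals and is not the SOLAR procedure used in the study; the function name and the example counts are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Returns the chi-square statistic and its p-value. Assumes unrelated
    individuals, which family data only approximately satisfy.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("test is undefined for a monomorphic marker")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts: one marker in exact equilibrium, and one
# with a strong heterozygote deficit.
chi2_ok, p_ok = hwe_chi_square(250, 500, 250)
chi2_bad, p_bad = hwe_chi_square(400, 200, 400)
```

Under the exclusion rule stated in the Genotyping section (P < 1 × 10−5), the second marker would be dropped and the first retained.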

Measured genotype analysis (MGA)

The association of SNPs with SU was estimated using a measured genotype analysis (MGA)73 executed in SOLAR after accounting for family relationships based on a variance components approach. This approach allows us to account for the non-independence among family members. To minimize the problem of non-normality, the SU data were inverse-normal-transformed using SOLAR. All analyses involved adjustment for the covariate effects (see Results). The appropriate significance level was determined to be P < 4 × 10−7 for MetaboChip data, and P < 2 × 10−4 for sequence data after correcting for multiple tests.
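The text does not state how the multiple-testing thresholds were derived. As a worked illustration only, a Bonferroni-style correction with a family-wise error rate of 0.05 (an assumption) over the 233 genotyped sequence SNPs gives a value close to the reported sequence-data threshold:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed inputs: alpha = 0.05 and one test per genotyped sequence SNP.
seq_threshold = bonferroni_threshold(0.05, 233)
print(f"{seq_threshold:.1e}")
```

This sketch is not claimed to reproduce the authors' exact procedure, which may have used a different effective number of tests.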

METAL

METAL74 software was used to perform meta-analysis of GWAS results taken from the three study centers, each study containing individual genome-wide MetaboChip association results for multiple markers.

Conditional analysis and functional annotation of significant variants

To identify additional independent loci that are associated with SU concentrations, we performed association analysis conditioned on significant SLC2A9 and ABCG2 SNPs30.

Replication studies

Mexican-mestizos studies

Seven of the SNPs that reached genome-wide suggestive significance in the discovery phase were tested for replication in an independent cohort of Mexican-mestizo individuals (1,061 children and 1,101 adults). Population characteristics, biochemical measurements and genotyping have been previously described. Briefly, genotypes of six SNPs were obtained from a Multi-Ethnic Genotyping Array (MEGA, Illumina, San Diego, CA, USA), while rs7947391 genotypes were imputed using the 1000 Genomes Project and Native Mexican individuals as reference31. Associations with SU were tested separately in children and adults in a linear mixed model that considered the genetic relatedness matrix as a random effect, while genotype, age, sex and BMI percentile or body mass index (kg/m2) were included as fixed effects. Results were meta-analyzed with the inverse variance method75.
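The inverse variance method combines per-stratum effect estimates by weighting each by the reciprocal of its squared standard error. The following is a minimal fixed-effects sketch; the function name and the effect sizes for the child and adult strata are hypothetical and are not taken from the replication data.

```python
from math import erfc, sqrt


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effects inverse-variance-weighted meta-analysis.

    Returns the pooled effect, its standard error, the z statistic and
    a two-sided p-value from the normal distribution.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal p-value
    return pooled_beta, pooled_se, z, p_value


# Hypothetical per-allele effects on SU (mg/dl) in children and adults.
beta, se, z, p = inverse_variance_meta([-0.30, -0.20], [0.10, 0.10])
```

With equal standard errors the pooled estimate is the simple average of the two effects, and its standard error shrinks by a factor of the square root of two.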

Indian diabetes consortium

The study participants included the members of the INdian DIabetes Consortium (INDICO)76. Details of the study recruitment and phenotype measurements are given in Giri et al.32. In short, participants were enrolled in the study through diabetes awareness camps organized in various parts of North India. Prior informed written consent was obtained from the study participants. The study was approved by the Human Ethics Committee of the CSIR-Institute of Genomics and Integrative Biology and the All India Institute of Medical Sciences Research Ethics Committee. The study was conducted in accordance with the principles of the Helsinki Declaration. Genotyping was conducted using the Illumina Human 610-quad bead chip array. Association with SU concentrations was tested using linear regression models in PLINK77. Sex, age, BMI and the first three principal components of genotypes were used as covariates in the model.

European studies

Replication analysis in Europeans was conducted with publicly available data from Kottgen et al.15. We used the summary statistics from the meta-analysis of serum urate reported in that article. The meta-analyses comprised 14 studies totaling 2,115 cases and 67,259 controls [http://metabolomics.helmholtz-muenchen.de/gugc/, http://useast.ensembl.org/Homo_sapiens/Variation/Explore?r=4:9950721-9951721;source=dbSNP;v=rs1079128;vdb=variation;vf=250145324].

East Asian studies

Replication analysis in East Asians was conducted with publicly available data from Kanai et al.33. We used the summary statistics from a genome-wide association analysis of serum uric acid in Japanese individuals from Kanai et al. The GWAS was conducted for 58 quantitative traits, including serum uric acid, in 162,255 individuals [https://www.ncbi.nlm.nih.gov/pubmed/?term=Kanai+M%2C+2018%2C Nature + Genetics]78. This research was supported by the Tailor-Made Medical Treatment Program (the BioBank Japan Project) of the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) and the Japan Agency for Medical Research and Development (AMED)78.