Polygenic analysis of the effect of common and low-frequency genetic variants on serum uric acid levels in Korean individuals

Increased serum uric acid (SUA) levels cause gout and are associated with multiple diseases, including chronic kidney disease. Previous genome-wide association studies (GWAS) have identified more than 180 loci that contribute to SUA levels. Here, we investigated genetic determinants of SUA level in the Korean population. We conducted a GWAS for SUA in 6,881 Korean individuals, calculated polygenic risk scores (PRSs) for common variants, and validated the association of low-frequency variants and PRS with SUA levels in 3,194 individuals. We identified two low-frequency and six common independent variants associated with SUA. Despite the overall similar effect sizes of variants in Korean and European populations, the proportion of variance for SUA levels explained by the variants was greater in the Korean population. A rare, nonsense variant SLC22A12 p.W258X showed the most significant association with reduced SUA levels, and PRSs of common variants associated with SUA levels were significant in multiple Korean cohorts. Interestingly, an East Asian-specific missense variant (rs671) in ALDH2 displayed a significant association on chromosome 12 with the SUA level. Further genetic epidemiological studies on SUA are needed in ethnically diverse cohorts to investigate rare or low-frequency variants and determine the influence of genetic and environmental factors on SUA.

Uric acid is the final product of purine metabolism in humans 1 . Uric acid is produced primarily in the liver, and 70% of daily uric acid excretion occurs via the kidney 1 . The uric acid transport system in the kidney plays an important role in its excretion 2 , since only 10% of the initially filtered uric acid is excreted through urine 3 . The serum uric acid (SUA) level is determined by the balance of renal clearance, extrarenal clearance in the gut, and purine metabolism in the liver. The absence of the enzyme uricase and the presence of the renal uptake system contribute to higher SUA levels in humans, compared to that in other mammals 1,4 . SUA levels in the general population follow a normal distribution 2 . The physiological normal range of uric acid is 2.5-7.5 and 2.0-6.5 mg/dL for men and women, respectively 5 . The female hormones that influence the renal uptake of uric acid 6,7 and the effects of genetic variations in SLC2A9 and ABCG2 8,9 are reported as sex-related factors.
Hyperuricemia is the result of overactive hepatic metabolism and high cell turnover or renal under-excretion/ extra-renal under-excretion or a combination of both 10 . Generally, renal and gut under-excretion account for two-thirds 11 and one-third 12 of hyperuricemia caused by under-excretion. In addition to environmental factors, genetic factors identified by previous genome-wide association studies (GWASs) have been implicated in the critical process of uric acid excretion and its role in hyperuricemia [13][14][15][16][17][18] . SLC2A9 and ABCG2 were found to be the Meta-analysis of SUA levels. We analysed a total of 6,129,701 SNPs in the discovery phase to identify the association with SUA levels in individuals using linear regression and discovered eight independent SNPs (two low-frequency and six common) that reached the threshold of statistical significance (Table 2). In the discovery phase, the inflation factor for the meta-analysis of SUA was 0.998 after performing a genomic control   Fig. S1). A Manhattan plot of the meta-analysis on SUA levels is shown in Fig. 1. A total of 1,149 SNPs that reached a level of genome-wide significance (P-value <5.0 × 10 −8 ) for association with SUA levels are listed in Supplementary Table S1, and the 10 independent nonsynonymous SNPs that passed Bonferroni's correction for 11,600 nonsynonymous SNPs (P-value <4.31 × 10 −6 ) are listed in Supplementary Table S3. The regional plots for genome-wide significant association are shown in Supplementary  Fig. S3. Conditional analyses identified multiple variants, including rs184521656, a low-frequency (minor allele frequency (MAF) = 0.01) intronic variant in FRMD8, that reached a level of genome-wide significance for association with SUA levels after conditioning with pre-selected lead SNPs in the flanking region ( Supplementary  Fig. S4a). Other regions did not show multiple independent signals at the level of genome-wide significance ( Supplementary Fig. S4). The results of the main meta-analysis were similar to the simple model that used only sex, age, and the first 10 principal components of genetic variants as covariates, except for the chromosome 17 locus near the BCAS3 gene, which is a previously reported locus ( Supplementary Fig. S5) 9 . Of note, a rare nonsense variant in the SLC22A12 locus showed the strongest association with SUA levels (rs121907892, P-value = 7.4 × 10 −54 ; β = −1.15 mg/dl; standard error (SE) = 0.07 mg/dl) in the discovery phase ( Supplementary Fig. S2). The locus was monomorphic in most populations, but the variant was found to be present at a very low frequency in only East Asian populations. This variant has been reported in a previous exome-wide association study in the Japanese population 28 . It is also present in very low linkage disequilibrium with most of the other neighbouring variants. Another low-frequency missense variant in SLC2A9, previously identified in GWAS studies 9,29 , also showed significant associations with SUA levels (rs16890979, P-value = 5.86 × 10 −7 ; β = −0.47 mg/dl; SE = 0.09 mg/dl) in the discovery phase. This variant is a common polymorphism observed in most tested populations. Other significant common variants were found to be located near the   Table S5). Although most of these loci were previously reported in GWASs, the lead SNPs identified in the present study are different from those reported in European populations ( Supplementary Fig. S3) 9 .
Alcohol intake-adjusted associations in the region on chromosome 12. We performed meta-analysis of SUA levels using the alcohol intake-adjusted association results to check for the influence of the identified loci on uric acid levels via alcohol-related pathways. The variants in the chromosome 12 region did not show significant associations after adjusting alcohol intake, whereas other loci remained significant after this adjustment ( Supplementary Fig. S6a,b). In particular, the chromosome 12 region was significant in the genome-wide meta-analysis of alcohol intake ( Supplementary Fig. S6c). A common, missense variant in ALDH2 in this region, known to be associated with alcohol metabolism and drinking behaviour 30,31 , showed the strongest association with alcohol intake (rs671, P-value = 9.4 × 10 −101 ; odds ratio = 0.049; 95% confidence interval = 0.037 to 0.065) and was in high linkage disequilibrium with rs116873087, the lead SNP in NAA25 (r 2 = 0.985). Variants in the ATXN2 and CUX2 genes associated with SUA levels also showed association with alcohol intake ( Supplementary Fig. S6d).

Comparison of significant variants identified in our study and the European study.
Allele frequencies, haplotype structure, and effect sizes of the SNPs identified in the Korean cohorts may be different from those reported in European populations, which may lead to differences in the proportion of variance explained by the variants identified in the Korean and European populations. We compared the effect sizes and the proportion of variance for phenotype explained by a given SNP (PVE) for the significant variant sets from our study and from the Global Urate Genetics Consortium (GUGC) for the European population. The effect sizes of the lead SNPs in our study were approximately linear to those in the GUGC study and showed the same direction of effects as the GUGC study. Of the 27 significant lead variants from the GUGC study, 22 showed significant associations (P-value <0.05) in our study and the same direction of effects between the two studies.
Spearman's correlation coefficient of effect sizes for the GUGC GWAS hit SNPs was 0.743, and it was calculated to be 0.878 for the lead SNPs in our study (Fig. 2a,c). The Cohen's kappa (κ) coefficients of effect sizes were also high (Supplementary Table S2). In contrast, the sum of the PVE values was different between the two cohorts. The sum of PVE for the lead SNPs in our study was larger in Koreans (0.135) than in Europeans (0.071), whereas that for GUGC GWAS hit SNPs was slightly larger in Europeans (0.055) than in Koreans (0.048) (Fig. 2b,d).
Validation of low-frequency variants and PRSs of common SNPs. The linear regression model that comprised PRSs of common variants associated with SUA levels, the two low-frequency variants, and covariates was fitted to the discovery phase cohorts and validated in the two independent Ansan-Ansung and KBSMC cohorts. The PRSs were normally distributed and the overall distribution correlated with SUA levels (Fig. 3).
The highest and lowest bins of the PRSs corresponded to the highest and lowest levels of SUA, respectively. The PRS was significant in both validation cohorts (P-value = 7.06 × 10 −13 , β = 0.24 mg/dL, SE = 0.03 mg/dL in the Ansan-Ansung cohort and P-value = 7.93 × 10 −21 , β = 0.21 mg/dL, SE = 0.02 mg/dL in the KBSMC cohort) ( Table 3). Among the two low-frequency variants, the p.W258X nonsense mutation of SLC22A12 showed nominal association (P-value = 0.047). There was insufficient data to verify the significance of these low-frequency variants because of the small sample size in the cohort study.

Discussion
The main findings of our study include the identification of genome-wide significant association of common and low-frequency variants with SUA levels in Koreans, and the validation of the PRS model of SUA in different cohort studies. The strength of our study is that it includes both urban and rural dwelling participants across Korea and represents an investigation on alcohol consumption. In addition, we have assessed our PRS model in independent cohorts and used available data of genotype-phenotype associations from a large European GWAS for comparison. Our study evaluates the contribution of both low-frequency and common variants to SUA levels by using an HRC-based imputation.
Our results demonstrate the benefits of using a large imputation panel such as the HRC reference panel for discovery, and characterisation of common and low-frequency variants contributing to SUA levels in the general population. We replicated several previously known genetic loci associated with SUA levels (ABCG2, SLC2A9, NRXN2, NAA25, SLC17A3-SLC17A2, SLC22A12, and BCAS3) reported for the European population and identified independent lead SNPs in two loci (SLC22A12 and NAA25) between the European GWAS and the present study. Comparison with results of the GUGC European study showed that effect sizes of common variants were comparable between East Asian and European populations, whereas allele frequencies and sum of PVE of variants were higher in the population in which the corresponding variants were identified. These results suggest that SUA levels may be affected by population-specific variants as well as shared variants at the same locus, and in particular by population-specific low-frequency variants.
The population-specific SLC22A12 p.W258X variant and the SLC2A9 p.W282I variant were accurately imputed (R 2 ≥ 0.8) using the HRC reference panel. We identified an association between a low-frequency, Scientific RepoRtS | (2020) 10:9179 | https://doi.org/10.1038/s41598-020-66064-z www.nature.com/scientificreports www.nature.com/scientificreports/ SLC22A12 loss-of-function variant (rs121907892) and SUA levels in the general population at a genome-wide significance level (P-value <5 × 10 −8 ) 9,15,32 . SLC22A12 p.W258X is a well-known variant for hypouricemia in Koreans 33 . This variant has been earlier identified only by sequencing analysis 20,34 because of its rarer frequency and weaker linkage disequilibrium with other neighbouring common variants analysed in GWASs. Other common variants in this region were identified in the European population study (rs505802) 15 and AGEN study (rs504915) including a combined total of ~110,347 Asian individuals 21 . The imputation in AGEN was performed on the basis of the HapMap Phase 2 reference panel. In a more recent analysis by Lee et al., who used 1000 Genomes phase 3 haplotypes as a reference panel for imputation, the founder mutation was still not identified 35 . The observed significant association near the NAA25 gene on chromosome 12 was previously reported, although different genes (ALDH2, ATXN2, and CUX) were reported for this region. The lead SNP was in high linkage disequilibrium with rs671 in ALDH2, a common missense variant in East Asians, but mostly absent in other populations. Our results of alcohol intake-adjusted association analysis and comparison with GWAS for alcohol intake suggests that the association of this region with SUA levels is mediated by alcohol consumption or metabolism through ALDH2.
The sex-specific CDH13 variant (rs8063966) was only detected in females. This may be associated with lower levels of SUA in women since oestradiol and progesterone have been shown to be associated with CDH13 expression 36 . Estrogen is known to exert a protective effect on SUA levels, with SUA starting to rise in the late-menopausal transition stage 37 . Further research is needed to elucidate the mechanism underlying the interplay of CDH13 and SUA levels in women. in both ethnic cohorts (blue), genome-wide significance in only one cohort but nominal significance in the other (P-value <0.05) (sky-blue), insignificance (P-value ≥ 0.05) (black) or missing (grey) significance in the other cohort. The regression line (pink) shows the similarity between the two cohorts. There is greater dissimilarity between the two cohorts when the regression line is far from the y = x line (dotted line). Variants missing in any cohort were excluded from the regression analysis. Abbreviations: ρ, Spearman's correlation coefficient; κ, Cohen's kappa coefficient; β, coefficient of each SNP obtained by linear regression; PVE, proportion of variance in phenotype explained by a given SNP.

Scientific RepoRtS |
(2020) 10:9179 | https://doi.org/10.1038/s41598-020-66064-z www.nature.com/scientificreports www.nature.com/scientificreports/ The missense variant SLC2A9 p.W282I (rs16890979) has also been previously reported in European and African populations 29 . Renal hypouricemia is believed to arise owing to genetic defects in SLC22A12 and SLC2A9; this is based on the "rare disease, rare variant hypothesis" 38 . In contrast, we tested a regression model of low-frequency variants and PRSs of common alleles associated with SUA levels. Our model showed their independent effects despite predicting a relatively small proportion of variance in SUA levels of the general population, reflecting the "combinational effects of common and rare variants". Our linear regression model also showed that the SUA levels increased by 0.21-0.28 per 1 standard deviation (SD) increase in PRS and increased by 0.77-0.90 (rs121907892) and 0.44-0.45 (rs16890979) per risk allele of the low-frequent missense variant in the general population.   www.nature.com/scientificreports www.nature.com/scientificreports/ Our study has several limitations. First, since we enrolled a cohort of ethnically homogeneous Koreans, the generalisability of our results to other populations may be limited. For example, the rare SLC22A12 p.W258X variant has been found only in the East Asian population, whereas different missense SLC22A12 variants (c.1245-1253del and p.T467M) have been found in the Roma population with renal hypouricemia 39 . Second, we were not able to analyse all the low-frequency variants due to the lack of sequencing data and population-matched reference panels for imputation. The power to detect additional rare and less frequent disease-causing variants is expected be improve with the availability of reference panels that include sequencing data of more diverse racial/ethnic (e.g., East Asians) samples. Third, we conducted the present analysis in the general population, and therefore, we could not assess the role of genetic variants in people with impaired renal functions. Our results may be different from those reported for gout patients or individuals with impaired renal function, who may exhibit distinct pathways for SUA regulation by other transporters such as ABCG2 10,40 . Fourth, potentially confounding environmental factors were not considered in our analysis. A potential contribution of genetic interaction with other factors such as food and drug intake, and chronic disease conditions in the pathophysiology of SUA should be examined. The combination of all the significant genetic loci identified in this study explained approximately 9.55% of the variance in SUA levels of our study, suggesting the need for discovery of additional epigenetic and genetic factors in larger, more comprehensive cohorts.
In summary, we replicated the previously known common genetic variants associated with SUA in the Korean population and identified low-frequency variants that exerted substantial impact on reduced SUA levels. Further studies are needed to identify additional variants associated with SUA and to evaluate our model under different disease conditions (e.g., gout and chronic kidney disease).

Ethics, consent, and permissions. This study was approved by the institutional review board of
Sungkyunkwan University (IRB# SKKU 2017-12-007) and Kangbuk Samsung Hospital (KBSMC 2013-01-245-008), and written informed consent was obtained from all participants. All experiments were performed in accordance with all applicable institutional and governmental regulations concerning the ethical use of human participants.

Study participants. The Korean Genome and Epidemiology Study (KoGES) is a national biobank for
genomic and epidemiological studies 41 . The discovery-phase samples comprised two cohorts from KoGES, namely a nation-wide cohort from urban (3,585 individuals) and another from rural (3,296 individuals) areas. In total, 6,881 individuals with no missing covariates were included in the present study. Genotyping was conducted on the Affymetrix 6.0 and the Illumina Omni1 arrays for the urban and rural cohorts, respectively. We evaluated two low-frequency variants and PRS for validation in 1,167 individuals of a community-based KoGES cohort from Ansan and Ansung areas, in which SUA levels were measured, and in 2,027 individuals from the Health Screening and Examination cohort of the Kangbuk Samsung Hospital (KBSMC). Genotyping was conducted using the Affymetrix 5.0 array and Illumina HumanCore BeadChips for the Ansan-Ansung and KBSMC cohorts, respectively.
Quality control, imputation, and annotation. Sample-level and SNP-level quality controls were performed on the genotyped data using PLINK 1.9 42 . SNPs with MAF < 1%, call rate <98%, and deviation from Hardy-Weinberg equilibrium (P-value <1.0 × 10 −6 ) were excluded. Samples were excluded based on criteria including call rate <95%, heterozygosity rate (samples with observed heterozygosity rate 3 SD away from the mean were removed), and principal components derived from the genome-wide genotype data (samples with the first and second principal components 3 SD away from the mean were removed). We excluded one of related pairs of individuals with second-degree or closer relationships using KING 2.1 43 . After quality control, the genotype data was phased using Eagle 2.3 44 and imputed on the reference panel of the Haplotype Reference Consortium (HRC r1.1 2016) using Minimac3 45 . SNPs with low imputation quality (R 2 < 0.8) and MAF < 0.5% were excluded from the main analysis. We annotated gene symbols closest to each SNP using ANNOVAR 46 ; the two nearest genes were annotated for intergenic SNPs.
Association analysis. The association of SNPs with SUA was tested using a linear regression model adjusted for sex, age, body mass index (BMI), estimated glomerular filtration rate (eGFR), diabetes, hypertension, hyperlipidaemia, systolic blood pressure, total cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides. Blood pressure medication was measured in the rural cohort and adjusted in linear regression. The equation from the Chronic Kidney Disease Epidemiology collaboration (CKD-EPI) was used for the estimation of eGFR 47 . We performed an inverse variance-weighted fixed-effects meta-analysis of SUA levels to combine the summary statistics of the association test results from the two discovery cohorts using METAL 48 . The meta-analysis results were double-genomic-controlled for population structure. In the meta-analysis, SNPs that reached a level of genome-wide significance (P-value <5.0 × 10 −8 ) for association with the SUA levels were considered significant. For nonsynonymous variants, Bonferroni's correction threshold (P-value <4.31 × 10 −6 , 0.05 for a total of 11,600 nonsynonymous tested variants) was used.

Conditional analysis.
We performed conditional association analysis for each significant locus using GCTA-COJO 49 to identify statistically independent SNPs, which have been listed in Table 2. The peak SNPs in each significant locus were selected as the primary associated lead SNP and the association analysis conditioning on the primary lead SNP was conducted for variants within the surrounding 2-megabase pairs region. If there were significant SNPs with a conditional P-value that reached a level of significance described in the association www.nature.com/scientificreports www.nature.com/scientificreports/ analysis, the peak SNP of conditional analysis was selected as the secondary associated lead SNP. The association analysis conditioning on the primary and secondary lead SNPs was conducted for variants in the surrounding region. This procedure was repeated until there were no more SNPs that reached the significant level in each locus.
Alcohol intake. We observed a high linkage disequilibrium between a lead SNP on chromosome 12 and a known functional missense variant (rs671) in the aldehyde dehydrogenase 2 (ALDH2) gene. Therefore, alcohol intake-adjusted association of SNPs with SUA was tested using a linear regression model adjusted for sex, age, the first 10 principal components derived from the genome-wide genotype data, and dummy variables for the alcohol intake groups. Several alcohol-related variables were investigated in the urban and rural cohorts, including the prevalence of a drinking habit, average intake per drink by type of alcoholic beverage, and average number of intakes per year. We calculated the daily alcohol intake based on these variables using the following equation: Alcohol intake g day Alcohol content ratio Average intake per drink Average number of intake per year The study participants in each cohort were then divided into four groups based on the calculated daily alcohol intake: non-drinker, light drinker (<20 g/day), moderate-heavy drinker (≥ 20 g/day and <50 g/day), and heavy drinker (≥50 g/day). To compare the GWAS results of SUA, we also performed the association analysis of alcohol intake (non-drinkers vs. moderate-heavy or heavy drinkers) using a logistic regression model adjusted for sex, age, and the first 10 principal components.
Comparison with results from the European cohort study. Summary statistics results from the GUGC were used for comparison of significant associations identified in our study with those reported for European populations. We extracted lead SNPs from the present study and GWAS hit SNPs from GUGC. Three of the GUGC GWAS hit SNPs were not found in our datasets and nine of the lead SNPs in our study were absent from the GUGC data. We set the β coefficients of these SNPs to zero. To compare the effect sizes from our meta-analysis with those of GUGC summary statistics, we used Cohen's kappa (κ) coefficient to determine the direction of effect size and Spearman's correlation coefficient (ρ). In addition, we used proportion of variance in phenotype explained by a given SNP (PVE) for comparison. The GUGC summary statistics provides sample size, MAF, effect size (β coefficient), and SE of effect size for each SNP above the P-value <5.0 × 10 −8 threshold 9 . We estimated PVE values using the equation below 50  we used values of the MAF of Europeans from the Genome Aggregation Database (gnomAD) because information on allele frequency was not included in the GUGC summary statistics 51 . Comparative measurements were calculated after excluding the variants that were missing in our meta-analysis result or from the GUGC summary statistics.
Polygenic risk scoring and model construction. We calculated PRS values as a weighted sum of the effects multiplied by the number of alleles based on the β estimates for SUA-raising alleles for 14 common (MAF ≥ 5%) SNPs that reached a level of genome-wide significance (P < 5.0 × 10 −8 ) for association with SUA levels and were not in a high linkage disequilibrium (r 2 ≤ 0.2) with the other variants. The PRS of β estimates for SUA-raising alleles was standardised to zero mean and unit variance. For nonsynonymous variants, Bonferroni's correction threshold (P < 4.31 × 10 −6 , 0.05 for a total of 11,600 nonsynonymous variants) was applied to correct for the multiple testing problem. Two low-frequency SNPs (MAF ~1%), rs121907892 and rs16890979, which passed the threshold were also included in the linear regression model-based analysis of SUA levels. We constructed a linear regression model on SUA levels that comprised these PRSs and two low-frequency nonsynonymous variants. The linear regression model was adjusted for the following covariates available in all cohorts for validation: sex, age, BMI, eGFR, diabetes, hypertension, hyperlipidaemia, systolic blood pressure, total cholesterol, HDL cholesterol, and triglycerides.