## Introduction

Hyperuricemia (serum urate level ≥ 6.8 mg/dL) and gout are associated with chronic kidney disease and components of metabolic syndrome, such as obesity, hypertension, and type 2 diabetes [1,2,3,4,5,6]. In the United States, the National Health and Nutrition Examination Survey (NHANES) estimate of the conditional probability of obesity, given that an individual has hyperuricemia, is between 0.4 and 0.5 [1]. The prevalences of hypertension and chronic kidney disease given hyperuricemia are similarly high [1].

Because hyperuricemia and comorbidities such as obesity, hypertension, and chronic kidney disease are integral to the clinical presentation of cardiovascular disease, metabolic diseases, and gout, an important open question is whether the clustering of these comorbidities/diseases has a genetic basis. Genetic epidemiology research has established that hyperuricemia, gout, and their comorbidities (and traits representing these comorbidities, e.g., BMI for obesity) are moderately heritable. Many loci are associated with multiple traits, generating genetic covariance, which could be due to pleiotropy, if the same loci affect more than one trait, or to linkage disequilibrium, if separate but linked loci influence the traits. For example, a proportion of validated loci from genome-wide association studies (GWAS) may overlap among comorbidities, which suggests the existence of genetic covariance for pairs of traits [7,8,9,10,11]. Previous results from GWAS of estimated glomerular filtration rate (eGFR) and serum urate have implicated numerous overlapping loci [10, 12]. It is also possible to use GWAS summary statistics to directly estimate the genetic correlation of pairs of traits at single loci using the LD score regression approach, or in chromosomal segments [13, 14]. Importantly, many loci with small effects, when integrated across the entire genome, may generate genetic correlation between pairs of traits. In the following we take the latter approach of estimating the marker-based genome-wide genetic correlation of pairs of traits.

Elucidating the presence of genetic correlation may explain the shared biology of these traits/diseases. Quantitative genetic approaches can be used to deconstruct the bivariate association patterns observed at the phenotypic level into genetic and environmental components [15, 16]. For many years the estimation of genetic and environmental correlations was only possible with family data. It is now possible, using high-density panels of common SNPs and whole-genome regression (WGR) methods, to estimate quantitative genetic parameters in either family-based or unrelated datasets [17]. These methods have typically been used to analyze single traits to understand the biology of complex traits, and for prediction of phenotypes [18,19,20,21,22].

In this study we partition the phenotypic correlation between serum urate and traits representing comorbidities of clinical importance (serum creatinine, blood pressure, serum glucose, and body mass index (BMI)) into its genetic and environmental components. We used multivariate representations of WGR to estimate heritabilities and genetic correlations with genetic markers. Our study is based on the Framingham Heart Study (FHS) and Hypertension Genetic Epidemiology Network (HyperGEN) data. Both datasets have family structure and are well-known, intensively studied resources among genetic epidemiologists.

## Materials and methods

### Participants

In this study we used multivariate Bayesian WGR to estimate genetic correlations between serum urate, systolic blood pressure (SBP), BMI, serum creatinine, and blood glucose in two family-based datasets: FHS and HyperGEN. FHS is a longitudinal study with data from three related cohorts. The structure of FHS, its history, recruitment strategy and overall purpose are well described elsewhere [23]. Data used in this study consist of SNP genotypes and clinical data measured on a total of 8200 combined records from the three cohorts. This study included data from cohort 0 (original cohort) exam 13 (n = 1396, pht000015.v3), cohort 1 (offspring cohort) exam 6 (n = 3237, pht000035.v6), and cohort 3 (third generation) exam 1 (n = 3567, pht000074.v12). The exams were chosen primarily because they represented maximum concurrent measurements on individual FHS participants for the five cardio-metabolic traits.

HyperGEN is part of the Family Blood Pressure Program funded by the National Heart, Lung, and Blood Institute and was designed to study genetic contributors to hypertension and related conditions. Participants were recruited from several affected (≥2) hypertensive sibships ascertained through population-based cohorts or from the community-at-large. Recruitment was later extended to include siblings and offspring of the original sib-pair. The structure of HyperGEN, its history, recruitment strategy, and overall purpose are well described elsewhere [24]. In this study only Caucasians were included. Participants with type 1 diabetes or advanced renal disease (defined as serum creatinine level >2 mg/dL) were excluded from the original study, since these two conditions can cause secondary hypertension and the goal of HyperGEN was to identify novel essential hypertension loci. The same variables were collected in both the FHS and HyperGEN studies. FHS does not exclude subjects with particular conditions; we therefore removed observations more than 3 SD above or below the mean of each variable. Accordingly, 106 subjects (1.3%) for BMI, 57 subjects (0.7%) for SBP, 10 subjects (0.1%) for urate, 170 subjects (0.2%) for glucose, and 14 subjects (0.2%) for creatinine were removed from FHS prior to analysis. A sensitivity analysis that did not exclude outliers from FHS was also carried out.
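The 3 SD exclusion rule described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `exclude_outliers` and the toy urate values are hypothetical, and the use of the sample SD is an assumption, since the text does not say which estimator was used.

```python
import statistics

def exclude_outliers(values, n_sd=3.0):
    """Split values into (kept, removed); removed lie more than n_sd SDs from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (assumption, not stated in the text)
    kept = [v for v in values if abs(v - mean) <= n_sd * sd]
    removed = [v for v in values if abs(v - mean) > n_sd * sd]
    return kept, removed

# Toy serum urate values in mg/dL (hypothetical, not study data):
# 200 values between 4.5 and 5.5 plus one extreme observation.
urate = [5.0 + 0.1 * (i % 11 - 5) for i in range(200)] + [14.0]
kept, removed = exclude_outliers(urate)
print(len(kept), removed)
```

In the study the rule was applied separately to each of the five traits in FHS, so a participant could be excluded for one trait and retained for another.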

Both datasets were genotyped on the Affymetrix 500K array (with 500,568 SNPs) and only genotyped autosomal markers were used. We excluded SNPs from the analysis if they had a Hardy-Weinberg equilibrium (HWE) p < 0.001, a minor allele frequency (MAF) <0.05, or a genotype call rate <0.98. After quality control (QC), the number of autosomal markers used to generate the genomic relationship matrix was 333,915 in HyperGEN and 388,172 in FHS. It has been argued that WGR requires very large sample sizes to obtain reliable estimates of genetic (measured by SNPs) and environmental (measured by clinical traits) variances [25]. However, in family-based data the necessary sample size is much smaller [26, 27], and heritability estimates approach those from pedigree-based analyses [28, 29]. Phenotypic values of creatinine, BMI, and SBP were log transformed to improve normality.
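The marker QC and the construction of a genomic relationship matrix can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the function names and simulated genotypes are hypothetical, the HWE test is omitted for brevity, and the relationship matrix uses one common definition (the average cross-product of centered and scaled allele counts), which may differ in detail from the definition used in the study.

```python
import numpy as np

def qc_filter(genotypes, maf_min=0.05, call_rate_min=0.98):
    """Boolean mask of SNPs passing the call-rate and MAF filters.

    genotypes: (n individuals x p SNPs) array of 0/1/2 allele counts,
    with np.nan marking a missing call. The HWE test is not implemented here.
    """
    call_rate = 1.0 - np.isnan(genotypes).mean(axis=0)
    allele_freq = np.nanmean(genotypes, axis=0) / 2.0
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    return (call_rate >= call_rate_min) & (maf >= maf_min)

def genomic_relationship_matrix(genotypes):
    """Average over SNPs of products of centered, scaled allele counts."""
    freq = np.nanmean(genotypes, axis=0) / 2.0
    centered = genotypes - 2.0 * freq
    centered = np.where(np.isnan(centered), 0.0, centered)  # mean-impute missing calls
    scaled = centered / np.sqrt(2.0 * freq * (1.0 - freq))
    return scaled @ scaled.T / genotypes.shape[1]

# Simulated genotypes (hypothetical): 50 individuals, 200 SNPs.
rng = np.random.default_rng(0)
n, p = 50, 200
true_freq = rng.uniform(0.01, 0.5, size=p)
genotypes = rng.binomial(2, true_freq, size=(n, p)).astype(float)
genotypes[:5, 0] = np.nan  # SNP 0 has a 90% call rate and should fail QC

keep = qc_filter(genotypes)
K = genomic_relationship_matrix(genotypes[:, keep])
```

Filtering on MAF before building the matrix also guarantees that the scaling term 2p(1 − p) is never zero.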

### Statistical methods for estimation of heritabilities and genetic and environmental correlations

We used a multivariate Bayesian WGR to estimate heritabilities and genetic correlations between traits. The phenotypic variance of y, $$\sigma _y^2$$, can be decomposed into a genetic component captured by the SNP markers, $$\sigma _g^2$$, and an environmental component, $$\sigma _\varepsilon ^2$$, which captures residual error and any variation not specified by the model (e.g., diet) as well as genetic variation not captured by the SNP array, such as non-additive genetic factors. The variance partition assumes independence between genetic and environmental effects. Genomic heritability is then defined as the ratio $$h_g^2 = \frac{{\sigma _g^2}}{{\sigma _g^2 + \sigma _\varepsilon ^2}}$$. A system of five traits (t) is regressed on genotype information using a multivariate mixed effects model of the following form,

$$y_{it} = \mu _t + \sum \limits_j z_{ij}\gamma _{jt} + g_{it} + \varepsilon _{it} \quad \left( {i = 1, \ldots ,n;\; t = 1, \ldots ,5} \right),$$
(1)

where $$y_{it}$$ represents the phenotype of the ith individual for the tth trait, t = {1, …, 5}; $$\mu _t$$ is a trait-specific intercept (e.g., $$\mu _{t = 1}$$ and $$\mu _{t = 2}$$ for two traits such as BMI and serum urate); $$\sum \limits_j z_{ij}\gamma _{jt}$$ is a regression on covariates other than markers (e.g., age, sex), indexed by j; $$g_{it}$$ is a genomic effect representing the regression of the tth phenotype on p common SNPs; and $$\varepsilon _{it}$$ is an error term assumed to be independent of $$g_{it}$$. The separation of genetic signal ($$g_{it}$$) from noise ($$\varepsilon _{it}$$) exploits genetic resemblance, quantified as the proportion of SNP allele sharing between two individuals i and i′ at observed markers, $$K_{ii\prime }$$. Specifically, in a multi-trait genomic best linear unbiased prediction (MT-GBLUP) model, the joint distributions of the genetic values and of the model residuals are assumed to be as follows:

$$\left[ {\begin{array}{*{20}{c}} {{\boldsymbol{g}}_{t = 1}} \\ \vdots \\ {{\boldsymbol{g}}_{t = 5}} \end{array}} \right]\sim MVN\left[ {0,{\boldsymbol{G}}_0 \otimes {\boldsymbol{K}}} \right], \quad \left[ {\begin{array}{*{20}{c}} {{\boldsymbol{\varepsilon }}_{t = 1}} \\ \vdots \\ {{\boldsymbol{\varepsilon }}_{t = 5}} \end{array}} \right]\sim MVN\left[ {0,{\boldsymbol{R}}_0 \otimes {\boldsymbol{I}}} \right]$$
(2)
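The Kronecker structure in Eq. (2) can be made concrete by simulating from the model. The sketch below is illustrative only: it uses two traits instead of five, omits covariates, and all numerical values (the sib-pair relationship matrix, the two covariance matrices, and the intercepts) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

n_pairs, n_traits = 50, 2          # toy sizes (the study uses five traits)
n = 2 * n_pairs

# Hypothetical relationship matrix K: unrelated sib pairs sharing half their alleles.
K = np.kron(np.eye(n_pairs), np.array([[1.0, 0.5], [0.5, 1.0]]))

# Hypothetical genetic (G0) and environmental (R0) covariance matrices.
G0 = np.array([[0.40, 0.10], [0.10, 0.25]])
R0 = np.array([[0.60, 0.15], [0.15, 0.75]])
mu = np.array([3.3, 1.7])          # trait-specific intercepts

# Genetic values stacked trait by trait: (g_1', g_2')' ~ MVN(0, G0 kron K).
cov_g = np.kron(G0, K)
stacked = rng.multivariate_normal(np.zeros(n * n_traits), cov_g)
g = stacked.reshape(n_traits, n).T  # n x n_traits

# R0 kron I: residuals are independent across individuals, each row with covariance R0.
eps = rng.multivariate_normal(np.zeros(n_traits), R0, size=n)

y = mu + g + eps                    # n x n_traits phenotype matrix
```

The off-diagonal block of `cov_g` equals the genetic covariance between the two traits multiplied by K, so relatives resemble each other across traits as well as within a trait. This cross-trait resemblance among relatives is what allows the genetic covariance to be separated from the environmental covariance.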

Above, $${\boldsymbol{K}} = \left\{ {K_{ii\prime }} \right\}$$ is an n × n genomic relationship matrix describing genetic similarity between individuals at the variants from the Affymetrix array (MAF > 0.05) [30], and $${\boldsymbol{I}}$$ is an n × n identity matrix. The vectors $${\boldsymbol{g}}_t = \left( {g_{1t}, \ldots ,g_{nt}} \right)\prime$$ and $${\boldsymbol{\varepsilon }}_t = \left( {\varepsilon _{1t}, \ldots ,\varepsilon _{nt}} \right)\prime$$ contain the genomic values and error terms for trait t, and $${\boldsymbol{G}}_0$$ and $${\boldsymbol{R}}_0$$ are within-subject five by five genetic and environmental covariance matrices. That is, $${\boldsymbol{G}}_0 = \left[ {\begin{array}{*{20}{c}} {\sigma _{g_1}^2} & \ldots & {\sigma _{g_1g_5}} \\ \vdots & \ddots & \vdots \\ {\sigma _{g_5g_1}} & \ldots & {\sigma _{g_5}^2} \end{array}} \right]$$ and $${\boldsymbol{R}}_0 = \left[ {\begin{array}{*{20}{c}} {\sigma _{\varepsilon _1}^2} & \ldots & {\sigma _{\varepsilon _1\varepsilon _5}} \\ \vdots & \ddots & \vdots \\ {\sigma _{\varepsilon _5\varepsilon _1}} & \ldots & {\sigma _{\varepsilon _5}^2} \end{array}} \right]$$, where $$\sigma _{g_t}^2$$ ($$\sigma _{\varepsilon _t}^2$$) represents the SNP-based genetic (environmental) variance of trait t, and $$\sigma _{g_tg_{t\prime }}$$ ($$\sigma _{\varepsilon _t\varepsilon _{t\prime }}$$) represents the SNP-based genetic (environmental) covariance between traits t and t′. Genetic and environmental correlations between traits can be derived from the elements of $${\boldsymbol{G}}_0$$ and $${\boldsymbol{R}}_0$$. Specifically, the genetic and environmental correlations between traits t and t′ are $$\rho _{g_tg_{t\prime }} = \frac{{\sigma _{g_tg_{t\prime }}}}{{\sqrt {\sigma _{g_t}^2\sigma _{g_{t\prime }}^2} }}$$ and $$\rho _{\varepsilon _t\varepsilon _{t\prime }} = \frac{{\sigma _{\varepsilon _t\varepsilon _{t\prime }}}}{{\sqrt {\sigma _{\varepsilon _t}^2\sigma _{\varepsilon _{t\prime }}^2} }}$$, respectively. Models were fit using the package MTM in R [31].
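As a worked example of these definitions, the following sketch derives heritabilities and correlations from a pair of covariance matrices. The 2 × 2 matrices are hypothetical values chosen for easy arithmetic, not estimates from the study, and the helper `cov_to_corr` is an illustrative name.

```python
import numpy as np

def cov_to_corr(cov):
    """Convert a covariance matrix into a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Hypothetical genetic and environmental covariance matrices for two traits.
G0 = np.array([[0.40, 0.10], [0.10, 0.25]])
R0 = np.array([[0.60, 0.15], [0.15, 0.75]])

h2 = np.diag(G0) / (np.diag(G0) + np.diag(R0))  # genomic heritability per trait
rho_g = cov_to_corr(G0)                          # genetic correlations
rho_e = cov_to_corr(R0)                          # environmental correlations

# Because g and epsilon are assumed independent, the phenotypic
# covariance matrix is the sum G0 + R0.
rho_p = cov_to_corr(G0 + R0)
```

With these toy values the heritabilities are 0.40 and 0.25, the genetic correlation is 0.10/√0.10 ≈ 0.32, the environmental correlation is 0.15/√0.45 ≈ 0.22, and the phenotypic correlation is 0.25, of which 0.10 comes from the genetic covariance and 0.15 from the environmental covariance.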

## Results

The mean and SD for urate, BMI, creatinine, SBP, and glucose are presented in Table 1 for both HyperGEN and FHS. Pearson correlation coefficients for all pairs of trait phenotypes were mostly positive, ranging from −0.04 to 0.49 (Table 2). The lowest-magnitude phenotypic correlations were between creatinine and glucose, and between SBP and BMI (ranging from −0.04 to 0.15). By contrast, creatinine and urate were the trait pair with the maximum phenotypic correlation in both FHS (0.49) and HyperGEN (0.48). The phenotypic correlations between traits excluding creatinine ranged between 0.12 and 0.49 in both datasets.

Figure 1 and Table 3 show the genomic marker-based heritability estimates for BMI, creatinine, urate, SBP, and glucose in the FHS and HyperGEN datasets. Heritability estimates (95% credibility regions) were consistent between the studies except for SBP, for which the FHS estimate, 0.27 (0.23; 0.31), was lower than the HyperGEN estimate, 0.50 (0.43, 0.56). For FHS, the minimum heritability estimate was for glucose, 0.31 (0.26; 0.36), and the maximum was for serum creatinine, 0.49 (0.42; 0.57). For HyperGEN, the minimum heritability estimate was also for glucose, 0.31 (0.17, 0.44), and the maximum was for BMI, 0.56 (0.48, 0.64).
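Point estimates and 95% credibility regions of this kind are summaries of posterior draws. The sketch below shows one way such a summary could be computed for a genetic correlation. It rests on assumptions: the draws are simulated stand-ins rather than output from the fitted model, and an equal-tailed interval with the posterior median is used because the text does not state which type of region or point estimate was reported.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for MCMC output: 1000 hypothetical posterior draws of a 2 x 2
# genetic covariance matrix (a real analysis would use the sampler's saved draws).
true_G0 = np.array([[0.40, 0.10], [0.10, 0.25]])
draws = np.array([
    np.cov(rng.multivariate_normal(np.zeros(2), true_G0, size=200), rowvar=False)
    for _ in range(1000)
])

# Compute the genetic correlation draw by draw, then summarize the draws.
rho = draws[:, 0, 1] / np.sqrt(draws[:, 0, 0] * draws[:, 1, 1])
lower, median, upper = np.percentile(rho, [2.5, 50.0, 97.5])

# A region "not overlapping zero" lies entirely on one side of zero.
excludes_zero = lower > 0.0 or upper < 0.0
```

Computing the correlation within each draw, rather than from the posterior means of the variances and covariance, propagates the uncertainty in all three quantities into the interval.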

The magnitude and direction of the genetic and environmental covariance estimates were generally concordant between the independent datasets (Table 3 and Fig. 2). Most genetic correlations were positive with credibility regions not overlapping zero (upper right quadrant in Fig. 2). For FHS, the largest genetic correlation estimates (95% credibility regions) were between glucose and BMI, 0.31 (0.21; 0.41), and between glucose and SBP, 0.31 (0.20; 0.43). HyperGEN's maximum genetic correlation was between glucose and urate, 0.30 (0.007, 0.59). For both FHS and HyperGEN, three of four trait pairs involving creatinine had negative genetic correlations, but their magnitudes were low and their credibility regions overlapped zero. The exception was the genetic correlation between urate and creatinine (FHS: 0.20 (0.07; 0.33); HyperGEN: 0.25 (0.07, 0.41)), which in both studies was positive with a credibility region excluding zero.

For both FHS and HyperGEN, all environmental correlations were positive except for those of creatinine with SBP, BMI, and blood glucose, whose credibility regions overlapped zero (Table 3). The largest environmental correlation involved urate and BMI in both FHS, 0.41 (0.35; 0.47), and HyperGEN, 0.33 (0.21, 0.43). For both studies, the only environmental correlation involving creatinine whose credibility region did not overlap zero was with urate (FHS: 0.11 (0.03; 0.19); HyperGEN: 0.25 (0.11, 0.36)).

Finally, we conducted sensitivity analyses to assess the impact of removing outliers from the FHS data. We observed high concordance between the genetic and environmental correlation estimates obtained with and without outliers (Table S1). One genetic (urate × BMI) and one environmental (urate × blood glucose) correlation estimate appeared to change substantially when the model was fit with outliers included, and in both cases the estimates shifted toward zero. Therefore, for 28/30 cases (10 covariance estimates plus 5 variance estimates, times 2 types, genetic and environmental) the results are robust to the inclusion of outlier observations.

## Discussion

The prevalences of hypertension, obesity, type 2 diabetes, and chronic kidney disease are disproportionately high among those with hyperuricemia and gout [1,2,3]. Our research indicates that the association of comorbidities with hyperuricemia does have a genetic basis. We used a multivariate Bayesian WGR analysis to estimate, with genome-wide SNP markers, the genetic correlation of pairs of traits representing the comorbidities of hyperuricemia using two independent and well-studied datasets, HyperGEN and FHS. This allowed us to gauge the strength of the evidence for an overlapping genetic basis of the traits. We found that genetic factors account for a significant component of the phenotypic correlation in many pairs of traits, suggesting that either linkage disequilibrium or pleiotropy accounts for the genetic clustering of comorbidities with hyperuricemia. Notably, our data suggest that separate sets of genes are responsible for two modules of genetic regulation: urate with metabolic traits, and urate with creatinine. The genetics of these five traits carry risk for other diseases, e.g., glucose for type 2 diabetes [32], creatinine for chronic kidney disease (CKD) [12], and urate for gout [10]. Studies have demonstrated that comorbid conditions and gout typically cluster together in particular patterns, i.e., (1) gout only, (2) gout and metabolic disease, and (3) gout and a high prevalence of chronic kidney disease [4, 33]. There are exceptions to these broad observed patterns of clustering, which are attributable to particular study designs (NHANES vs primary care vs rheumatology clinics), but generally the analyses performed to date appear to generate roughly discrete clusters [34]. Our results, which indicate independent genetic overlap between urate and creatinine and between urate and metabolic syndrome traits, support the notion that genetics may in part explain the clustering of gout, CKD, and other metabolic comorbidities.

Previous studies have shown that the traits analyzed here are moderately heritable [35, 36]. Our analysis of FHS and HyperGEN confirms these prior estimates, in that the genomic marker-based heritability estimates were similar to those from prior pedigree-based studies [35, 36]. A family-based study by Tang et al., using the National Heart, Lung, and Blood Institute (NHLBI) Family Heart Study, which included some participants from FHS, reported heritabilities highly concordant with the FHS marker-based estimates presented here [36]. Tang et al. reported a urate heritability of 0.36, and the Bayesian WGR estimate was 0.38 [36]. The glucose and SBP point estimates presented here were slightly lower than the Tang et al. estimates but fell within the reported credibility regions. Tang et al. also reported genetic correlation estimates, which were not as concordant with the Bayesian WGR estimates. Of the six bivariate correlations among the four traits overlapping with our study, only one was statistically significant [36]. One explanation for the discrepancy is that we had higher power to detect genetic correlations, which comes from two sources: (1) larger sample size and (2) cross-generational genetic relationships, including a large number of first-degree relatives. Their one significant genetic correlation estimate, between urate and BMI, was 0.39 ± 0.09 SE, while the estimates using the Bayesian WGR presented here were 0.2 (0.09, 0.3) 95% CR and 0.28 (0.12, 0.43) 95% CR for FHS and HyperGEN, respectively. Notably, a recent analysis of ~6000 distantly related participants of a study from Sweden, using a GCTA-based analysis (a frequentist version of Bayesian WGR), estimated creatinine heritability at 0.19 [35]. In both FHS and HyperGEN, our heritability estimate is much higher and closer to the pedigree-based estimates. One explanation for the discrepancy, which corresponds to the concept of missing heritability, is that in unrelated samples the genetic markers may be in weaker linkage disequilibrium with causal variants for the traits. This phenomenon lowers the genetic signal contributing to the estimated bivariate genomic correlation [26]. Given the similarity of the estimates between the FHS and HyperGEN studies and with the published literature, we are highly confident that our WGR approach yields precise results.

Our results demonstrate a widespread genetic basis for the co-association of pairs of traits related to hyperuricemia, gout, and their comorbidities. There were ten possible genetic correlations corresponding to pairwise combinations of five traits. At least half of the estimates in each dataset had 95% credibility regions not including zero: 6/10 in FHS and 5/10 in HyperGEN. Creatinine and urate are genetically correlated, with significant estimates in both datasets, but creatinine did not show genetic correlation estimates different from zero with the other traits. Interestingly, urate was genetically and environmentally related to BMI, SBP, and blood glucose. Therefore, an important finding of our research is that there are two independent axes of genetic overlap between traits related to urate. The first axis involves urate, SBP, glucose, and BMI, and is indicative of metabolic syndrome. The second axis involves urate and creatinine and is indicative of the close inter-dependency between urate and creatinine levels via kidney function.

The presence of genetic correlation suggests simultaneous genetic effects on the phenotypic values of both traits. Of the many overlapping loci from the previous eGFR and serum urate GWAS, six have fairly large effects (VEGFA, GCKR, INHBC, UBE2Q2, AP5B1, A1CF) [10, 12]. These six genome-wide significant loci are potential candidate genes generating pleiotropic effects between urate and creatinine.

In FHS 7/10 and in HyperGEN 5/10 pairwise comparisons for environmental covariance had estimates with credibility regions that did not overlap zero. Similar to the genetic correlations between traits, the environmental correlations suggest environmentally induced constraints in the expression of pairs of traits. In both FHS and HyperGEN, the largest environmental correlation was between BMI and urate. Other studies have suggested that BMI is causal for urate (and not the reverse), although the effect size is very small; that is, their association represents a pathway from BMI to urate [37]. Thus, the large environmental covariance observed may represent an environmentally based constraint between BMI and serum urate that is relevant to the case of an intervention. Zhu et al. demonstrated, in a secondary analysis of a clinical trial involving men with a high cardiovascular risk profile, that a 10 kg weight loss reduced serum urate by 0.62 mg/dL on average [38].

A limitation of our study is that it does not provide inference on the directionality of causal relationships. Such analyses would require techniques such as Mendelian randomization, which has previously indicated no causal effect of serum urate on chronic kidney disease [39]. Epidemiological studies show an association between hypertension and hyperuricemia, but the causal relationship is unresolved, although hyperuricemia has been observed to precede the incidence of hypertension [40, 41]. In addition, animal studies indicate that uric acid causes endothelial dysfunction and impacts the renin-angiotensin system [42, 43]. Further, results from clinical trials of adolescents demonstrate that urate-lowering agents such as allopurinol lower blood pressure [44]. Allopurinol specifically inhibits xanthine oxidase, so one explanation for the blood pressure outcome is that the effects are off-target, via inhibition of reactive oxygen species (ROS) production. It has also been suggested that the association between hyperuricemia and chronic kidney disease (stage 3 or greater) is causal [45], although this suggestion is also controversial, as results from Mendelian randomization do not show evidence of serum urate modifying serum creatinine [39]. A second limitation is that the validity of genetic correlations estimated from genomic data has been questioned. Missing heritability translates to missing correlation, and this effect is platform dependent [46]. However, in family-based studies the problem is minimal. Finally, our study focused on genetic correlation estimates for the continuous traits used to define dichotomous traits, such as serum urate (rather than hyperuricemia) and glucose (rather than diabetes). Because the continuous traits are not equivalent to the diseases themselves, the analysis of their genetic overlap is far from complete. Nevertheless, the estimates of genetic overlap presented here for clinically important phenotypes may provide a window onto the potential genetic significance for the diseases the traits represent.

This study's strengths include the family-based designs and very large sample size of the well-studied FHS and HyperGEN datasets, and the innovative and powerful Bayesian WGR approach used to estimate the marker-based genetic overlap of the traits representing hyperuricemia and its comorbidities. The WGR marker-based, genome-wide estimates of heritability and genetic correlations were highly concordant between the independent studies. Our research motivates future quantification of genetic correlations at individual loci, which will increase our knowledge of the genetic etiology of hyperuricemia, gout, and their comorbidities.