Common and rare variants associated with kidney stones and biochemical traits

Kidney stone disease is a complex disorder with a strong genetic component. We conducted a genome-wide association study of 28.3 million sequence variants detected through whole-genome sequencing of 2,636 Icelanders that were imputed into 5,419 kidney stone cases, including 2,172 cases with a history of recurrent kidney stones, and 279,870 controls. We identify sequence variants associating with kidney stones at ALPL (rs1256328[T], odds ratio (OR)=1.21, P=5.8 × 10−10) and a suggestive association at CASR (rs7627468[A], OR=1.16, P=2.0 × 10−8). Focusing our analysis on coding sequence variants in 63 genes with preferential kidney expression we identify two rare missense variants SLC34A1 p.Tyr489Cys (OR=2.38, P=2.8 × 10−5) and TRPV5 p.Leu530Arg (OR=3.62, P=4.1 × 10−5) associating with recurrent kidney stones. We also observe associations of the identified kidney stone variants with biochemical traits in a large population set, indicating potential biological mechanism.

T he lifetime risk of kidney stones is 8.8% in the United States 1 with an estimated recurrence rate of 14% after 1 year and 35% after 5 years 2 , placing a significant burden on the health care system 1 . In Iceland, the prevalence in individuals older than 70 years is 10.1% for men and 4.2% for women 3 . Kidney stones form when urine becomes supersaturated with salts such as calcium oxalate or calcium phosphate and when urine concentrations of natural inhibitors of stone formation such as citrate, magnesium, pyrophosphate, uromodulin and osteopontin are low 4,5 . Calcium oxalate and calcium phosphate are the most common (B80%) constituents of kidney stones 5 . The mechanism of stone formation involves both environmental factors such as diet 6 and genetic traits. Individuals with a family history of kidney stones are at a greater risk than others of developing the condition. Recent studies estimate that up to 65% of kidney stone formers have a family history of kidney stones [7][8][9] and both twin 10 and genealogy 11 studies have reported strong heritability of kidney stone disease. Candidate gene association studies have attempted to assess the role of several genes involved in calcium homeostasis on kidney stone formation 12 . These studies have been limited by sample size, the results are conflicting and they do not consider a broad spectrum of sequence variants.
In the current study, we have substantially increased the sample size from our previous study 13 (N cases ¼ 5,419; N controls ¼ 279,870). We were also able to increase the number of sequence variants tested by performing GWAS on imputed genotypes of sequence variants identified through whole-genome sequencing of 2,636 Icelanders, yielding a more extensive coverage of the genome. In addition to discovering associations with kidney stones, we assessed the association of these variants to 13 biochemical traits involved in calcium-phosphate metabolism, purine metabolism, kidney function, acid-base and ion homeostasis in a large population set.
In this study, we found variants at three loci associated with kidney stones at a genome-wide significance level (0.05/28.3 million ¼ 1.8 Â 10 À 9 ) and one additional locus with suggestive association ( Fig. 1; Table 1). At these loci, we also observed genome-wide significant associations with serum calcium, phosphate, PTH and ALP that do not in all cases correlate with the corresponding kidney stone association signal. Focusing on coding variants in genes with preferential renal expression, we also found two rare coding variants associating with kidney stones and recurrent stone formation.
Variants at ALPL. We observe a genome-wide significant signal with two fully correlated markers associating with kidney stones in ALPL at 1p36. 12 (Table 2). To attempt replication, we directly genotyped the missense variant ALPL p.Val522Ala in a Danish population set (N ¼ 5,822, MAF Denmark ¼ 17.53%). We replicated the association (P ¼ 3.0 Â 10 À 4 ) with an effect ¼ 10.73 s.d.%. Combined analysis of discovery and replication sets yielded a P ¼ 3.5 Â 10 À 35 and an effect ¼ 8.00 s.d.% with no heterogeneity between the populations (P ¼ 0.38). This is consistent with a previous functional study where a mutated version of ALPL carrying p.Val522Ala showed increased enzyme activity in a cell-based assay 20 . Within ALPL, we also detect a low-frequency missense variant (rs149344982[A], MAF ¼ 1.42%, NP_000469.3: p.Arg152His), which associates strongly with reduced serum ALP levels (effect ¼ À 45.9 s.d.%, P ¼ 9.84 Â 10 À 105 ) and suggestively with reduced risk of kidney stones (OR ¼ 0.742, P ¼ 8.1 Â 10 À 3 ) (Supplementary Table 2). ALPL p.Arg152His has little correlation with the more common ALPL variants rs1256328 and p.Val522Ala (r 2 o0.0040) (Supplementary Table 3). This finding is in line with the effect of several rare loss-of-function variants in ALPL that have been reported in patients with hypophosphatasia (OMIM:146300), a syndrome characterized by decreased levels of ALP and elevated urine pyrophosphate 21 , a known inhibitor of kidney stone formation 4 . ALPL is expressed in the proximal tubules of the kidney 21 and hydrolyses pyrophosphate to free phosphate, where it may promote the formation of kidney stones 4 . Our results, for variants in ALPL that either increase or decrease ALP levels, suggest that the effect of ALPL on extracellular pyrophosphate metabolism can influence kidney stone formation.
In summary, we report three uncorrelated (r 2 o0.0062 for all pairs) genome-wide significant signals at the ALPL locus associating with ALP and kidney stones (rs1256328 ), ALP and serum phosphate (rs12132412 and rs1976403) and ALP (rs149344982).
Variants at SLC34A1. On 5q35.3 at the SLC34A1 locus, encoding the Na/Pi co-transporter SLC34A1, we replicate with an effect size similar to the replicated one a common kidney stone association signal previously reported in an Asian population 14 (Supplementary Table 5). The strongest marker in our data rs12654812[A] (MAF ¼ 41.84%) associates with kidney stones with an OR ¼ 1.18 and P ¼ 5.7 Â 10 À 11 ( Fig. 1; Table 1) and correlates with the previously reported marker (r 2 ¼ 0.68) (Supplementary Table 5). Nine variants located in or near SLC34A1 show genome-wide significant association with kidney   Table 6). In addition, we demonstrate that rs12654812[A] associates significantly with decreased serum PTH levels (effect ¼ À 5.9 s.d.%, P ¼ 2.3 Â 10 À 9 ) and decreased serum phosphate (effect ¼ À 4.2 s.d.%, P ¼ 1.1 Â 10 À 14 ) ( Table 2). The sequence variant rs12654812 has also been reported by us and others to associate with kidney functionrelated traits 24,25 . To search for additional signals at this locus after adjusting for the index variant rs12654812, we performed conditional analysis including all variants located within the LD block containing rs12654812. We identified three strongly correlated (r 2  . SLC34A1 p.Tyr489Cys shows the best association of all coding variants (N ¼ 24) with kidney stone disease (adjusted OR ¼ 1.93, adjusted P ¼ 1.4 Â 10 À 5 ) and recurrent stone formation (adjusted OR ¼ 2.58, adjusted P ¼ 4.3 Â 10 À 6 ) ( Table 3) in SLC34A1 and the nine adjacent genes that reside within the same LD block (Fig. 2). The p.Tyr489Cys variant is located in the Na/Pi co-transporter domain 27 (amino acids 368-504, see Supplementary Fig. 1) at a highly conserved position (GERP 28 ¼ 5.03) and is predicted to be pathogenic (PolyPhen 29 ¼ probably damaging and SIFT 30 ¼ deleterious). Interestingly, p.Tyr489Cys has been reported to associate with increased serum creatinine such as the common kidney stone risk variant rs12654812 [A] 25 . In our data, p.Tyr489Cys also associates with decreased PTH levels, similar to rs12654812[A] ( Table 2; Supplementary Table 7).
The association data presented here point to SLC34A1 as the kidney stone target at this locus. In support of this hypothesis, several studies demonstrate that rare variants in SLC34A1 are linked to hypophosphatemic nephrolithiasis/ osteoporosis (OMIM:612286) and that SLC34A1 is the only gene at the locus that shows tissue-specific gene expression in kidney 31 . Furthermore, the observed changes in biochemical traits reflect the function of SLC34A1 as a phosphate transporter. The kidney is the main regulatory organ of phosphate homeostasis and serum phosphate levels are reflected by a change in phosphate reabsorption. SLC34A1 is expressed in the brush border membrane of proximal tubular cells, where the bulk of phosphate reabsorption takes place, and appears to be responsible for B70% of total phosphate transport based on mouse models in which Slc34a1 has been knocked out [32][33][34] . The reduction in serum PTH levels associated with the kidney stone variants likely results from a decrease in serum phosphate levels caused by diminished renal reabsorption as a consequence of the negative feedback loop that maintains serum phosphate levels.
Variants at CLDN14. We previously reported a genome-wide significant association at the CLDN14 locus on 21q22.13 with kidney stones represented by rs219780 (ref. 13). In the current 1p36 -ALPL analysis, a total of 31 correlated variants at the locus reach genome-wide significance, including six variants for recurrent kidney stones ( Fig. 1; Fig. 2; Supplementary Table 8).
The strongest signal associating with kidney stones is a two base pair deletion in intron Table 1) correlating with the previously reported sequence variant rs219780[T] (r 2 ¼ 0.82, OR ¼ 0.81, P ¼ 2.6 Â 10 À 11 ). We do not observe a significant residual signal when adjusting the association of rs199565725 with kidney stones for rs219780 (adjusted P ¼ 0.086). When considering the number of variants in the LD block containing rs199565725 no variants remained significant after adjusting for this variant (P40.05/ 1,993 ¼ 2.5 Â 10 À 5 ).
Variants at CASR. After the three genome-wide significant loci for kidney stones, we observe a suggestive association of rs7627468[A] in the first intron of CASR at 3q21.1 encoding the calcium-sensing receptor (MAF ¼ 26.80%; OR ¼ 1.16, P ¼ 2.0 Â 10 À 8 ) (Table 1; Fig. 1). CASR is a G-protein-coupled receptor expressed on the apical surface of the proximal tubule 36 that plays a key role in renal control of calcium homeostasis 37 and has long been considered a candidate gene for kidney stones 38,39 . A total of 58 variants, strongly correlated with rs7627468 (r 2 40.95) and located within intron 1 of CASR, associate with kidney stones (Fig. 2; Supplementary Table 10). A non-significant   Table 11). In summary, we observe two uncorrelated signals (r 2 ¼ 0.0073) at the CASR locus, one for serum calcium only (rs73186030) and one mainly for kidney stones (rs7627468). Taken together, this suggests that the effect of CASR on kidney stones is complex. The sequence variant rs73186030, which has a strong effect on serum calcium, does not associate with kidney stones. The kidney stone variant rs7627468 and other linked variants associated with kidney stones are located within intron 1 of CASR that entails a regulatory region 42 .
Rare coding variant in TRPV5. We used a recent source of data on tissue-enriched gene expression in an attempt to analyse coding variants in genes with preferential kidney expression 31 . We tested a total of 220 coding variants in 63 genes, showing tissue-specific or enriched expression in the kidney, for association with kidney stones and recurrent kidney stones. In addition to SLC34A1 p.Tyr489Cys, we found a rare missense variant in TRPV5 (NP_062815.2:p.Leu530Arg (MAF ¼ 0.13 %) associating significantly with recurrent kidney stones (OR ¼ 3.62, P ¼ 4.1 Â 10 À 5 o0.05/220 ¼ 2.3 Â 10 À 4 ) ( Table 3). The TRPV5 p.Leu530Arg variant was observed only once in the ExAC database (samples N ¼ 61,486) 26 . The variant is at a highly conserved position (GERP 28 ¼ 5.8) and is predicted to be damaging by two different algorithms (PolyPhen 29 ¼ probably damaging and SIFT 30 ¼ deleterious). TRPV5 is a highly selective epithelial calcium channel and the mutation lies within the pore-forming region of the protein (amino acids 527-538) 43 (Fig. 3; Supplementary Fig. 1). The change in coding sequence results in substitution of the hydrophobic amino acid leucine with a positively charged arginine at position 530 (Fig. 3). The introduction of a positive charge into the hydrophobic pore-forming region of TRPV5 is expected to interfere with the diffusion of positively charged calcium ions across the channel. TRPV5 is expressed at the apical membrane of distal renal tubule epithelial cells that mediates calcium transport in the kidney and constitutes the rate-limiting step of active calcium reabsorption 44 . Trpv5 knockout mice and mice carrying a point mutation in Trpv5 exhibit renal calcium wasting resulting in severe hypercalciuria 45,46 . TRPV5 has been suggested to play a role in hypercalciuric disorders but candidate gene studies have so far failed to demonstrate an association 44,47,48 .
Variant in APRT and recessive mode of inheritance. We note that we observe a strong association of a missense variant in the APRT gene encoding adenine phosphoribosyltransferase

Replication of Asian variants.
A GWAS in Asians reported three loci associating with kidney stones 14 . In addition to the variants at the SLC34A1 locus mentioned above, we were able to replicate variants at the INMT-FAM188B-AQP1 locus (Po3.3 Â 10 À 3 , OR ¼ 1.16-1.21) (Supplementary Table 6).

Discussion
We performed a GWAS of 28.3 million variants discovered through sequencing and observed four common sequence variants associating with the risk of kidney stones at ALPL, SLC34A1, CLDN14 and CASR. We replicate the association of the variant at SLC34A1 reported in Asians 14 and we previously reported 13 the signal corresponding to the association of the variant at CLDN14. In addition to the classic GWAS approach, we used a recent resource 31 on tissue-specific expression together with the ability to detect and impute rare variants to analyse coding changes in genes with enriched expression in the kidney. Among those, we found rare missense variants at highly conserved sites in SLC34A1 and TRPV5 associating strongly with risk of kidney stones and recurrent kidney stones. The identification of a rare missense variant in SLC34A1, independent of common variants in the region, points to SLC34A1 as the causative gene for both signals. Interestingly, the calcium channel TRPV5 has been the focus of several studies suggesting a role in kidney stone formation 12 . Sequencing a large number of individuals (N ¼ 2,636) in a founder population allowed us to detect the TRPV5 p.Leu539Arg variant (MAF ¼ 0.13%). This variant is essentially absent from other sequenced populations and is located at an extremely conserved site in the pore-forming region of the protein, making it likely to be the causal variant explaining the association with recurrent kidney stones. The total proportion of sibling recurrence risk for kidney stones explained by the identified sequence variants is 4.81% (Supplementary Table 12). Two of the identified genes are involved in phosphate homeostasis (ALPL 4 and SLC34A1 (refs 12)) and the other three genes play a key role in renal handling of calcium (CLDN14 and CASR, TRPV5) 12 . We screened the kidney stones associated variants for their association to serum level of biochemical traits and detected association of variants at ALPL with ALP, SLC34A1 with PTH, and creatinine, phosphate and CLDN14 with PTH, magnesium and potassium. We also observed uncorrelated genome-wide significant association of variants at these loci that influence serum levels of biochemical traits but do not associate with the disease in all cases. This implies that the risk is not mediated solely through the serum levels of the biochemical traits.
The observation of three uncorrelated signals at ALPL associating with ALP levels which do not predict their associations with serum phosphate and kidney stones is noteworthy. This is an example of allelic heterogeneity that is particularly interesting in the context of the relationship between metabolic bone disease and kidney stones.
A similar pattern is observed at CASR where we observe two uncorrelated signals, one located in intron 1 of the CASR gene associating with kidney stones and the other at the 3 0 -end of the gene associating strongly with serum calcium but not with kidney stones. This observed allelic heterogeneity might reflect differences in the function of CASR in the kidney on one hand and the parathyroid gland on the other. Variations in intron 1 might specifically influence gene expression in the kidney influencing the ability of CASR to respond to extracellular calcium and in this way increase the risk of kidney stone formation 51 .
In summary, the genetic associations presented emphasize the role of sequence variants in genes involved in calcium-phosphate homeostasis in kidney stone disease. The pathophysiology underlying these associations requires further study.

Methods
The Icelandic study population. This study is based on whole-genome sequence data from the whole blood of 2,636 Icelanders participating in various disease projects at deCODE genetics 15,16 (Supplementary Tables 13 and 14) (European Variant Archive: PRJEB8636). In addition, a total of 104,220 Icelanders have been genotyped using Illumina SNP chips 15,16 (Supplementary Table 15) and genotype probabilites for untyped relatives has been calculated based on Icelandic genealogy 15,16 .
All participating individuals, or their guardians, gave their informed consent before blood samples were drawn. The family history of participants donating blood was incorporated into the study by including the phenotypes of first-and second-degree relatives and integrating over their possible genotypes.
All sample identifiers were encrypted in accordance with the regulations of the Icelandic Data Protection Authority. Approval for these studies was provided by the National Bioethics Committee and the Icelandic Data Protection Authority.
Kidney stone study population. To identify kidney stone cases, we searched for patients with International Classification of Diseases (ICD) codes, radiology diagnosis codes and surgical procedure codes indicative of kidney stones 3 at Landspitali-The National University Hospital of Iceland in Reykjavik (LUH), a community hospital for half of Iceland's population and a tertiary care centre for the whole nation; Akureyri Hospital in North Iceland, the largest hospital outside the Reykjavik area; and Domus Radiology in Reykjavik, the largest privately operated medical imaging clinic in the country. A thorough medical record review was conducted for all patients identified to confirm the diagnosis of kidney stones. Patients with calcifications other than kidney stones and asymptomatic kidney stones were excluded from the study. A total of 5,419 kidney stone cases were included in the association analysis; 2,979 of these were genotyped using various Illumina chips and imputed using long-range phased haplotypes, and genotype probabilities for 2,440 were imputed on the basis of information from genotyped close relatives 15,16 . Among the kidney stone cases were 2,172 recurrent kidney stone formers. A recurrent episode was defined as the development of a new stone occurring at least 6 months following the first stone event. Controls comprised individuals recruited through different genetic research projects at deCODE. Individuals in the kidney stone cohort were excluded from the control group. Of the controls, 88,266 were genotyped by chip, and 191,604 were imputed on the basis of the genotypes of close relatives 15,16 . The total number of controls was 279,870.
Quantitative trait measurements. We studied a sample of Icelanders with biological markers measured at three different laboratories: the laboratory of the LUH, the Icelandic Medical Center (Laeknasetrid) Laboratory in Mjodd (RAM) and Akureyri Hospital (FSA). Here we use measurements of serum calcium (N ¼ 114, A Danish sample set was included in the study involving measurement of ALP of healthy individuals from the Inter99 study 52 . The study was approved by the Regional Scientific Ethical Committees for Southern Denmark and the Capital Region of Denmark. Informed consent was obtained from all study participants. Gene and variant annotation. Variants were annotated with information from Ensembl release 70 using Variant Effect Predictor (VEP) version 2.8 (refs 52,53).
Association testing. To test for association between sequence variants and disease, we applied logistic regression using disease status as the response and genotype counts as covariates as we described previously 16 . The following individual characteristics that correlate with disease status were also included in the model as nuisance variables: sex, county of birth, current age or age at death (first-and second-order terms included), blood sample availability for the individual and an indicator function for the overlap of the lifetime of the individual with the timespan of phenotype collection.
As described previously 16 , given genotype counts for n individuals, g 1 ; g 2 ; . . . ; g n 2 0; 1; 2 f g , their phenotypes y 1 ; y 2 ; . . . ; y n 2 0; 1 f g and a list of vectors of nuisance parameters x 1 ; x 2 ; . . . ; x n , the logistic regression model states that where a, b and g are the regression coefficients and L i is the contribution of the ith indivual to the likelihood function; L a; b; g ð Þ¼ Q n i¼1 L i a; b; g ð Þ . It is then possible to test for association based on the asymptotic assumption that the likelihood ratio statistic follows a w 2 distribution with one degree of freedom: Given the computation cost of maximizing over the nuisance parameters for every marker in the genome, the likelihood was maximized under the null hypothesis of b ¼ 0, which is the same for all markers, and use the maximizer of g,g, under the alternative. This methods leads to a smaller likelihood ratio than maximizing over g for every marker because max a;b;g L a; b; g ð Þ!max a;b L a; b; g ¼g ð Þ . Our analysis is based on imputed genotype values where the values of g i are not known. Instead, we use P(g i ¼ j|I i ) for jA{0,1,2}, where I i stands for the information about g i . Given the logistic regression model above, this allows us to calculate ; for all i 2 1; 2; . . . ; n f g : Testing quantitative traits. To test for association between quantitative traits and sequence variants a generalized form of linear regression was used as described previously 16 . Let y be a vector of quantitative measurements, and let g be a vector of expected allele counts for the tested variant. Assuming the quantitative measurements follow a normal distribution with a mean that depends linearly on the expected allele at the variant and a variance covariance matrix proportional to the kinship matrix: y $ N a þ bg; 2s 2 F À Á ; where is the kinship matrix as estimated from the Icelandic genealogical database. We split individuals with trait values into smaller clusters for the calculation as it is not NATURE COMMUNICATIONS | DOI: 10.1038/ncomms8975 ARTICLE computationally feasible to use the full model. The maximum likelihood estimates for the parameters a, b and s (ref. 2) involve inverting the kinship matrix. If there are n individuals in the cluster, then this inversion requires O(n 3 ) calculations. However, because these calculations only need to be performed once, the computational cost of doing a genome-wide association scan will only be O(n 2 ) calculations per variant, which is the cost of calculating the maximum likelihood estimates if the kinship matrix has already been inverted. The ALP measurements from the Danish Inter99 (ref. 54) study were tested using a generalized log-linear regression, assuming an additive genetic model, for association with allele count of rs34605986 using the R software package.
Adjusting for relatedness. Relatedness and stratification within the sample sets were accounted for as we previously described 16 using the genomic control method 55 . P values were adjusted by dividing the corresponding w 2 value with an inflation factor l g estimated based on a set of about 300,000 common variants distributed across the genome Inheritance models in association testing. The sum of the two imputed haplotype probabilities was used as a covariate for both the logistic regression and the generalized linear regression when testing for association under the additive model 16 . Where an individual was imputed to have the minor allele of a sequence variant with probability a f on his paternal chromosome and a m on his maternal chromosome, then a f þ a m was used as a covariate.
Conditional analysis. When testing a variant conditioning on the expected genotypes of another variant the analysis was carried out using the same software used in the GWAS analysis 19 .
Gene expression specificity classification. Genes were defined to have tissuespecific or tissue-enriched gene expression as described by Fagerberg et al. 31 (ArrayExpress: E-MTABB-1773). Briefly, tissue-specific expression is defined as at least 50-fold higher FPKM (fragments per kilobase of exon per million fragments mapped) than all other tissues (N ¼ 27) and tissue-enriched expression is defined as at least fivefold higher FPKM than all other tissues.