1000 Genomes-based meta-analysis identifies 10 novel loci for kidney function

HapMap-imputed genome-wide association studies (GWAS) have revealed >50 loci at which common variants with minor allele frequency >5% are associated with kidney function. GWAS using more complete reference sets for imputation, such as those from the 1000 Genomes Project, promise to identify novel loci that have been missed by previous efforts. To investigate the value of such a more complete variant catalog, we conducted a GWAS meta-analysis of kidney function based on the estimated glomerular filtration rate (eGFR) in 110,517 European-ancestry participants using 1000 Genomes-imputed data. We identified 10 novel loci with p-value < 5 × 10⁻⁸ previously missed by HapMap-based GWAS. Six of these loci (HOXD8, ARL15, PIK3R1, EYA4, ASTN2, and EPB41L3) are tagged by common SNPs unique to the 1000 Genomes reference panel. Using pathway analysis, we identified 39 significant (FDR < 0.05) genes and 127 significantly (FDR < 0.05) enriched gene sets, which were missed by our previous analyses. Among those, the 10 identified novel genes are part of pathways of kidney development, carbohydrate metabolism, cardiac septum development and glucose metabolism. These results highlight the utility of re-imputing from denser reference panels until whole-genome sequencing becomes feasible in large samples.


Supplementary Figure 2. Novel overlapping meta gene sets.
Shown are the 20 novel meta gene sets, based on the DEPICT analysis of 9,270 variants from the HapMap and 1000G meta-analysis. The coloring of the meta gene sets represents the smallest p-value of all comprised gene sets and is coded on a continuous scale. The overlap between meta gene sets was estimated by computing the pairwise Pearson correlation coefficient ρ between each pair of gene sets followed by a ranking: 0.3 ≤ ρ ≤ 0.5, low overlap; 0.5 < ρ < 0.7, medium overlap; ρ ≥ 0.7, high overlap. Overlap is shown by edges between gene set nodes; edges representing overlap corresponding to ρ ≤ 0.3 are not shown. The network was drawn with Cytoscape (http://cytoscape.org/).

Supplementary Figure 3. Regional association plots of the 8 loci with potentially independent association signal.
Shown are the p-values (on a -log10 scale) versus genomic position (on GRCh build 37) in the 1000 Genomes meta-analysis before ("Unconditional", left panels: A1-H1) and after conditioning on the reported variant using the GCTA approach ("Conditional", right panels: A2-H2). The reported variant is highlighted in blue and the potentially independent association signal is highlighted in red. The red horizontal line indicates the genome-wide significance threshold of 5 × 10⁻⁸.
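The overlap ranking used for the meta gene set network can be written down as a small classifier. The following is a minimal sketch, not the code used for the figure: the function names and the example correlations are hypothetical, and a correlation of exactly 0.3 is assigned to "low" here as an assumption, since the legend lists that boundary value under both "low overlap" and "not shown".

```python
def classify_overlap(rho: float) -> str:
    """Map a pairwise Pearson correlation to the overlap category of the legend."""
    if rho >= 0.7:
        return "high"
    if rho > 0.5:
        return "medium"
    if rho >= 0.3:  # assumption: the boundary value 0.3 counts as low overlap
        return "low"
    return "not shown"


def overlap_edges(correlations):
    """Return (set_a, set_b, category) for every pair that would be drawn as an edge."""
    edges = []
    for (set_a, set_b), rho in correlations.items():
        category = classify_overlap(rho)
        if category != "not shown":
            edges.append((set_a, set_b, category))
    return edges


# Hypothetical correlations between three illustrative meta gene sets.
example = {
    ("set_1", "set_2"): 0.82,
    ("set_1", "set_3"): 0.41,
    ("set_2", "set_3"): 0.12,
}
edges = overlap_edges(example)
```

In this example only two of the three pairs produce an edge, because the third correlation falls below the display threshold.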

ASTN2
This gene encodes a protein that is expressed in the brain and may function in neuronal migration, based on functional studies of the related astrotactin 1 gene in human and mouse. A deletion at this locus has been associated with schizophrenia.
Medium staining in both glomeruli and tubules ---

SLC7A6
Involved in uptake of dibasic amino acids and some neutral amino acids. Requires coexpression with SLC3A2/4F2hc for uptake of arginine, leucine and glutamine.
Not detected -- Locus identified previously by the CKDGen Consortium in a pathway-based approach [61].

SLC7A6OS
Encodes Solute Carrier Family 7 Member 6 Opposite Strand. Directs RNA polymerase II nuclear import.
Medium staining in both glomeruli and tubules = = =

PRMT7
Encodes protein arginine methyltransferase 7, which catalyzes protein arginine methylation, an irreversible protein modification. Synthesizes SDMA. Arginine methylation is implicated in signal transduction, RNA transport, and RNA splicing.
Medium staining in glomeruli, high in tubules -- In mice, susceptibility alleles for doxorubicin nephropathy are associated with reduced Prmt7 expression [62].

PLA2G15
Lysosomal enzyme that has both calcium-independent phospholipase A2 and transacylase activities.
Medium to high staining in glomeruli, high in tubules ---

SMPD3
Catalyzes the hydrolysis of sphingomyelin to form ceramide and phosphocholine. Probably participates in bone and dentin mineralization.
Low staining in glomeruli, high in tubules

EPB41L3
Tumor suppressor that inhibits cell proliferation and promotes apoptosis. Modulates the activity of protein arginine N-methyltransferases, including PRMT3 and PRMT5.
Medium staining in tubules, not detected in glomeruli -- Differential splicing connected to diverse roles in kidney and brain physiology, and potentially unique functions in cell proliferation and tumor suppression [63].

rs12458009 CDH20
Encodes a calcium-dependent cell-cell adhesion glycoprotein.
Medium staining in tubules, not detected in glomeruli ---

Positions are given on GRCh build 37. The gene closest to the variant is listed (index gene). IQ is the imputation quality metric computed as median of info score (ImputeV2) or RSQ (minimac) across studies. The effects are given on ln eGFRcrea.

The correlation r² was computed using the SNAP software available at http://www.broadinstitute.org/mpg/snap/ldsearch.php or as Spearman correlation coefficient in the KORA-F4 study for variants not present in the SNAP database. The effects are given in terms of log(eGFRcrea). * The SNP rs7422339 has merged into rs1047891.

Power computations to be compared between the two meta-analyses were based on a true effect size assumed to be the average between the two effect estimates (-0.0056, -0.0049, 0.0055, -0.0057, respectively), a true EAF assumed to be the average between the two EAF estimates (0.27, 0.67, 0.71, or 0.78, respectively), a 5 × 10⁻⁸ significance level, and the sample size of 110,517 or 133,806, respectively. Effective power was computed based on the effective sample size (sample size multiplied by the imputation quality). IQ is the imputation quality metric computed as median of info score (ImputeV2) or RSQ (minimac) across studies. EAF is the effect allele frequency. The effects are given on ln eGFRcrea. The 98.5% confidence interval (CI) corresponds to the 95% CI accounting for four independent tests and it is computed as effect ± 2.5 * SE.

The gene closest to the variant is listed (index gene). Imputation quality is computed as median of info score (ImputeV2) or RSQ (minimac) across studies. I² is the heterogeneity statistic as reported by the meta-analysis software METAL [68].

Supplementary Table 11. Summary of the 8 independent association signals suggested by joint conditional analysis on reported variants with GCTA.
Joint conditional analysis was performed on the 53 previously reported and the 10 novel loci associated with eGFRcrea in the 1000 Genomes meta-analysis on up to 110,517 subjects. Shown are the previously reported variants used for the conditional analysis and the association results of the independent association signals with genome-wide significant variants in the 8 genetic regions after adjustment for the previously reported variants.
Position is reported on GRCh build 37. Chr is chromosome. EAF is the effect allele frequency. The effects are given on ln eGFRcrea.
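The power and confidence-interval footnotes above can be illustrated with a short sketch. It is not the authors' calculation: it uses a normal approximation for a two-sided additive test on a quantitative trait, it assumes Hardy-Weinberg equilibrium, and the residual trait variance `trait_var` is a hypothetical input that the footnotes do not report. The function names are likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gwas_power(beta, eaf, n, trait_var, alpha=5e-8, imputation_quality=1.0):
    """Approximate power of a two-sided single-variant test (normal approximation)."""
    n_eff = n * imputation_quality          # effective sample size = N x IQ
    genotype_var = 2.0 * eaf * (1.0 - eaf)  # allele-dosage variance under HWE (assumed)
    expected_z = abs(beta) * sqrt(n_eff * genotype_var / trait_var)
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)
    return _STD_NORMAL.cdf(expected_z - z_crit) + _STD_NORMAL.cdf(-expected_z - z_crit)


def bonferroni_ci(effect, se, n_tests=4, level=0.95):
    """CI whose error rate is split over n_tests; also returns the SE multiplier."""
    z = _STD_NORMAL.inv_cdf(1.0 - (1.0 - level) / (2.0 * n_tests))
    return effect - z * se, effect + z * se, z


# Effect, EAF and sample size are taken from the footnote; trait_var, the
# imputation quality of 0.8 and the standard error are HYPOTHETICAL placeholders.
power_full = gwas_power(-0.0056, 0.27, 110517, trait_var=0.05)
power_effective = gwas_power(-0.0056, 0.27, 110517, trait_var=0.05,
                             imputation_quality=0.8)
lower, upper, multiplier = bonferroni_ci(-0.0056, 0.001)
```

With four tests the multiplier is the normal quantile at 1 - 0.05/8, which comes out close to the value of 2.5 used in the footnote. Lower imputation quality shrinks the effective sample size and therefore the power.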

Supplementary Table 12. Characterization of variants with smallest p-value in independent association signals identified by the joint conditional analysis.
For each locus with an independent signal near a previously reported variant (indicated by index gene), we compared the primary p-value (from the meta-analysis) with the p-value from the joint conditional analysis (with GCTA) and examined the change in the variant effect, the allelic correlation, and the co-inheritance of the previously reported variant with the variant with the smallest p-value in the potentially independent association signal.

DDX1
In DDX1, the previously reported lead variant rs6431731 [66] had a median IQ of 0.60 and was not genome-wide significant in the HapMap meta-analysis (primary p-value=3.00 × 10⁻⁷). It achieved a higher IQ in the 1000 Genomes imputed data (IQ=0.82) but a higher p-value for association (primary p-value=1.73 × 10⁻⁵). Nevertheless, 69,988 base pairs upstream, the variant rs807601 is genome-wide significant and well imputed in both the HapMap analysis (p-value=6.60 × 10⁻¹², IQ=0.98) and the 1000 Genomes meta-analysis (p-value=3.84 × 10⁻¹¹, IQ=0.97), and it remains genome-wide significant after adjusting for the previously reported variant (p-value=2.39 × 10⁻⁸). Its effect size does not change (effect=0.007 in both analyses). The potentially independent association signal and the previously reported variant are essentially uncorrelated (r²=0.04), but their risk alleles tend to be inherited together, as the co-inheritance indicator D' is high (D'=0.79) (Supplementary Figure 2 A1-A2). Thus, compared to the previous analysis, the current 1000 Genomes meta-analysis identifies a better index SNP for this locus.
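A low r² together with a high D' is possible because the two measures scale the same disequilibrium coefficient differently. The sketch below computes both from allele and haplotype frequencies using the standard definitions. The function name and the example frequencies are hypothetical and are not the DDX1 data.

```python
def ld_stats(p_a: float, p_b: float, p_ab: float):
    """Return (r2, d_prime) for two biallelic variants.

    p_a, p_b: frequencies of allele A at variant 1 and allele B at variant 2.
    p_ab:     frequency of the haplotype carrying both A and B.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r2, d_prime


# Hypothetical example: a common allele (50%) and a rare allele (5%) that
# sits mostly on haplotypes carrying the common allele.
r2, d_prime = ld_stats(p_a=0.5, p_b=0.05, p_ab=0.045)
```

Here the rare allele is almost always found with the common allele, so D' is high, but the large difference in allele frequencies keeps r² small. In general r² cannot exceed D' squared.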

UMOD
In the UMOD locus, the highly significant lead variant rs77924615 (p-value=4.57 × 10⁻⁴⁰) was newly introduced with the 1000 Genomes reference data. The joint conditional analysis suggests that it is independent (conditional p-value=8.45 × 10⁻¹⁴) from the reported variant rs12917707 [64] (p-value=2.01 × 10⁻³⁴). Its effect is diminished in the joint conditional analysis (from 0.018 to 0.009). The two variants are only moderately correlated (r²=0.34), but their alleles are likely inherited on the same haplotype given the presence of moderate LD (D'=0.64, Supplementary Figure 2 B1-B2).
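The way an effect estimate shrinks under conditioning can be sketched for the simplest case of two variants. This is not GCTA's implementation, which handles many variants and uses a reference sample for LD. It is the two-variant least-squares identity, assuming both marginal effects come from the same sample, Hardy-Weinberg equilibrium, and a signed genotype correlation `r` defined with respect to the two effect alleles. The function name and all input values are hypothetical.

```python
from math import sqrt


def conditional_effect(b1, b2, eaf1, eaf2, r):
    """Effect of variant 1 adjusted for variant 2, from marginal effects only."""
    if abs(r) >= 1.0:
        raise ValueError("variants in perfect correlation cannot be separated")
    v1 = 2.0 * eaf1 * (1.0 - eaf1)  # dosage variance of variant 1 under HWE
    v2 = 2.0 * eaf2 * (1.0 - eaf2)  # dosage variance of variant 2 under HWE
    return (b1 - r * b2 * sqrt(v2 / v1)) / (1.0 - r * r)


# Hypothetical marginal effects and allele frequencies for two correlated variants.
adjusted = conditional_effect(b1=0.018, b2=0.016, eaf1=0.2, eaf2=0.2, r=0.5)
unchanged = conditional_effect(b1=0.018, b2=0.016, eaf1=0.2, eaf2=0.2, r=0.0)
```

When the variants are uncorrelated the adjusted effect equals the marginal one. With positive correlation and effects in the same direction, part of the marginal effect is attributed to the other variant and the adjusted effect is smaller.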

SLC22A2
In SLC22A2, rs316020 (primary p-value=6.39 × 10⁻¹⁵, conditional p-value=1.18 × 10⁻¹¹) is suggested to be independent from the previously reported variant rs2279463 [67] (primary p-value=1.07 × 10⁻¹⁷). The effect of rs316020 barely changes when adjusting for the reported variant (from 0.012 to 0.011). The variants have limited correlation but are in complete linkage disequilibrium (r²=0.45, D'=1.00), suggesting that the risk alleles at the two variants are inherited on the same haplotype (Supplementary Figure 2 C1-C2).

GATM
The joint conditional analysis shows that rs146625690 (primary p-value=2.27 × 10⁻¹²) is independent (conditional p-value=1.92 × 10⁻⁸) from the previously reported variant rs2453533 [64] (primary p-value=2.65 × 10⁻⁴³) in the GATM locus. The effect of rs146625690 is slightly diminished when adjusting for the previously reported variant (changes from -0.032 to -0.026). This suggests that rs146625690 is the SNP with the lowest p-value in an independent signal. The variants are highly correlated and their risk alleles are inherited via the same haplotype (r²=0.98, D'=0.85, Supplementary Figure 2 D1-D2).

MPPED2
The genome-wide significant, previously reported variant in the MPPED2 locus is rs3925584 [66] (primary p-value=2.09 × 10⁻¹⁶). The variant rs294345 (primary p-value=4.15 × 10⁻⁸) is independent from rs3925584 (conditional p-value=3.09 × 10⁻¹⁰), and its effect is comparable in the joint conditional analysis when adjusting for the previously reported variant (changes from -0.012 to -0.014). The variant rs294345 is closer to the gene promoter and in the same recombination segment as the previously reported SNP rs3925584. The r² and D' between the two variants are low (r²=0.01, D'=0.42), suggesting independence and that the risk alleles are inherited on separate haplotypes (Supplementary Figure 2 F1-F2).

BCAS3
The previously reported variant in BCAS3 is rs9895661 [67].

No entry for this gene -- The locus has been identified in a GWAS of serum magnesium levels [75].

rs8080123 TBX2
Encodes a T-box binding transcription factor with a role in developmental processes.
High staining in both glomeruli and tubules -- One paper describing a role of TBX2 in defining the territory of the pronephric nephron using the Xenopus model organism [76].

BCAS3
Plays a role in angiogenesis. Participates in the regulation of cell polarity and migration.

Chr is chromosome. Position is given on GRCh build 37. P-value is the result of the cis eQTL association whereas the corresponding Effect Direction is based on the Effect Allele.

Study-specific acknowledgements and funding sources for participating studies, in alphabetical order.

3C.
Three-City Study. The work was made possible by the participation of the control subjects, the patients, and their families. We thank Dr. Anne Boland (CNG) for her technical help in preparing the DNA samples for analyses. This work was supported by the National Foundation for Alzheimer's disease and related disorders, the Institut Pasteur de Lille and the Centre National de Génotypage. The 3C Study was performed as part of a collaboration between the Institut National de la Santé et de la Recherche Médicale (Inserm), the Victor Segalen Bordeaux II University and Sanofi-Synthélabo. The Fondation pour la Recherche Médicale funded the preparation and initiation of the study. The 3C Study was also funded by the Caisse Nationale Maladie des Travailleurs Salariés, Direction Générale de la Santé, MGEN, Institut de la Longévité, Agence Française de Sécurité Sanitaire des Produits de Santé, the Aquitaine and Bourgogne Regional Councils, Fondation de France and the joint French Ministry of Research/INSERM "Cohortes et collections de données biologiques" programme. Lille Génopôle received an unconditional grant from Eisai.

AGES.
Age, Gene/Environment Susceptibility-Reykjavik Study. This study has been funded by NIH contract N01-AG-1-2100, the NIA Intramural Research Program, Hjartavernd (the Icelandic Heart Association), and the Althingi (the Icelandic Parliament). The study is approved by the Icelandic National Bioethics Committee, VSN: 00-063. The researchers are indebted to the participants for their willingness to participate in the study.

ARIC. Atherosclerosis Risk in Communities study.
The ARIC study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C), R01HL087641, R01HL59367 and R01HL086694; National Human Genome Research Institute contract U01HG004402; and National Institutes of Health contract HHSN268200625226C. The authors thank the staff and participants of the ARIC study for their important contributions. Infrastructure was partly supported by Grant Number UL1RR025005, a component of the National Institutes of Health and NIH Roadmap for Medical Research. This work as well as YL and AK were supported by the German Research Foundation (KO 3598/2-1, KO 3598/3-1 and CRC1140 A05 to AK).

ASPS. Austrian Stroke Prevention Study.
The research reported in this article was funded by the Austrian Science Fund (FWF), grant numbers P20545-P05 and P13180. The Medical University of Graz supports the databank of the ASPS. The authors thank the staff and the participants of the ASPS for their valuable contributions. We thank Birgit Reinhart for her long-term administrative commitment and Ing. Johann Semmler for his technical assistance in creating the DNA bank.

Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Genotyping was performed at the Broad Institute of MIT and Harvard, with funding support from the NIH GEI (U01HG04424), and Johns Hopkins University Center for Inherited Disease Research, with support from the NIH GEI (U01HG004438) and the NIH contract "High throughput genotyping for studying the genetic contributions to human disease" (HHSN268200782096C). The NHS renal function and albuminuria work was supported by DK66574. Additional funding for the current research was provided by the National Cancer Institute (P01CA087969, P01CA055075), and the National Institute of Diabetes and Digestive and Kidney Diseases (R01DK058845). We thank the staff and participants of the NHS and HPFS for their dedication and commitment.