Genome-wide associations for benign prostatic hyperplasia reveal a genetic correlation with serum levels of PSA

Benign prostatic hyperplasia and associated lower urinary tract symptoms (BPH/LUTS) are common conditions affecting the majority of elderly males. Here we report the results of a genome-wide association study of symptomatic BPH/LUTS in 20,621 patients and 280,541 controls of European ancestry from Iceland and the UK. We discovered 23 genome-wide significant variants located at 14 loci. There is little or no overlap between the BPH/LUTS variants and published prostate cancer risk variants. However, 15 of the variants reported here also associate with serum levels of prostate-specific antigen (PSA) (Bonferroni-corrected P < 0.0022). Furthermore, there is a strong genetic correlation between PSA and BPH/LUTS (rg = 0.77, P = 2.6 × 10⁻¹¹), and a one-standard-deviation increase in a polygenic risk score (PRS) for BPH/LUTS increases PSA levels by 12.9% (P = 1.6 × 10⁻⁵⁵). These results shed light on the genetic background of BPH/LUTS and its substantial influence on PSA levels.
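The Bonferroni threshold quoted above matches a family-wise α of 0.05 divided across the 23 BPH/LUTS variants (0.05/23 ≈ 0.00217, which rounds to 0.0022). That α and test count are our assumption, as neither is stated explicitly here; the sketch below only makes the arithmetic concrete, using made-up P values and a hypothetical helper name.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumed: alpha = 0.05 shared across the 23 BPH/LUTS variants tested for PSA association
threshold = bonferroni_threshold(0.05, 23)
print(f"{threshold:.5f}")  # prints 0.00217, i.e. P < 0.0022 after rounding

# Hypothetical PSA-association P values (illustrative only, not study results)
p_values = {"variant_1": 1.0e-6, "variant_2": 0.004, "variant_3": 0.0015}
significant = [name for name, p in p_values.items() if p < threshold]
print(significant)  # ['variant_1', 'variant_3']
```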

Our bioinformatics and eQTL analyses yielded several interesting findings for the newly discovered BPH/LUTS variants (see Supplementary Data Files 1 and 2). In our eQTL search in the GTEx Browser (version 7), we did not identify any significant expression associations in either prostate or bladder tissue. The relevant eQTL results are discussed below and listed in Supplementary Table 4. Below we summarize the findings from our bioinformatics analyses for the 23 newly discovered BPH/LUTS variants.

2p16.1
At 2p16.1, two variants independently associated with BPH/LUTS were discovered in our study. rs2556378 is located in an intron of BCL11A, which functions as a B-cell proto-oncogene and may play a role in leukemogenesis and hematopoiesis. rs10180282 is located in AC007381.3, a long intergenic noncoding RNA (lincRNA) for which the highest expression is reported in skin and EBV-transformed lymphocytes according to the GTEx Portal.

5p15.33
Among the variants most strongly associated with BPH/LUTS in our study is rs381949. This variant is located in an intron of CLPTM1L on 5p15.33, a region previously reported to contain multiple variants associated with cancer risk in several different organs. Variants strongly correlated with rs381949 (r² > 0.85; see Supplementary Table 8) have been reported to associate with serum PSA levels [1] and with cancer at multiple different sites [2–5]. The conditional analysis revealed a second genome-wide significant (GWS) association signal at 5p15.33, rs2853677, located in an intron of TERT. This variant has previously been reported to associate with multiple cancer types, conferring protection against some and increased risk of others [6,7]. Interestingly, this variant has no strongly correlated variant (r² > 0.75, according to deCODE's dataset of 8,700 whole-genome sequenced individuals) despite having a minor allele frequency (MAF) of 42% (averaged over the Icelandic and UK controls), and it can therefore be considered a credible causal variant for symptomatic BPH/LUTS.
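The argument that rs2853677 is a credible causal variant rests on the absence of any other variant with r² > 0.75 despite its high MAF. As a rough sketch of such a proxy check (not deCODE's actual pipeline), the code below assumes genotypes coded as 0/1/2 allele dosages and approximates r² by the squared Pearson correlation of dosages; the function names and toy genotypes are hypothetical.

```python
def dosage_r2(x, y):
    """Squared Pearson correlation between two allele-dosage vectors (0/1/2).

    This is a genotype-based approximation of LD r2; haplotype-based estimates
    from phased data can differ somewhat.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic variant: r2 undefined, treat as 0
        return 0.0
    return sxy * sxy / (sxx * syy)


def strong_proxies(lead, candidates, threshold=0.75):
    """Names of candidate variants whose r2 with the lead variant exceeds threshold."""
    return [name for name, dosages in candidates.items()
            if dosage_r2(lead, dosages) > threshold]


# Toy dosages for eight hypothetical individuals (not real data)
lead = [0, 1, 2, 1, 0, 2, 1, 1]
candidates = {
    "proxy_A": [0, 1, 2, 1, 0, 2, 1, 0],  # differs from the lead at one individual
    "other_B": [2, 0, 1, 0, 1, 1, 2, 0],  # largely unrelated pattern
}
hits = strong_proxies(lead, candidates)
print(hits)  # ['proxy_A'] (r2 of about 0.82)
```

An empty result for a common variant, as reported for rs2853677, is what supports reading the lead variant itself as the likely causal one.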

5q22.1
The 5q22.1 variant (rs10054105) is located intergenic between STARD4 and NREP, but within the antisense RNA gene STARD4-AS1. According to UniProt, NREP may have roles in neural function and also promotes axonal regeneration (by similarity). It may also function in cellular differentiation (by similarity) and induce differentiation of fibroblasts into myofibroblasts, as well as myofibroblast ameboid migration. It increases retinoic acid regulation of lipid-droplet biogenesis (by similarity) and down-regulates the expression of TGFB1 and TGFB2, but not TGFB3 (by similarity). STARD4, in turn, may be involved in the intracellular transport of sterols or other lipids and may bind cholesterol or other sterols (by similarity).

5q31.1
rs677394 is located in an intron of C5orf66 and downstream of H2AFY, which encodes a replication-independent histone that is a member of the histone H2A family. It replaces conventional H2A histones in a subset of nucleosomes, where it represses transcription and participates in stable X chromosome inactivation. H2AFY is widely expressed, but no expression association is reported for rs677394 in the GTEx Portal.

6p22.1
rs200476 is located intergenic on 6p22.1, in the vicinity of a histone gene cluster.

10p12.31
At 10p12.31, we discovered two independent BPH/LUTS signals: rs148678804 and rs7906649. A SNP correlated with rs148678804 (rs116940348; r² = 0.71) has been reported to associate with serum PSA levels [8]. Both variants are located upstream of the DNAJC1 gene, and our bioinformatics analysis suggests that both affect a promoter/enhancer region for DNAJC1. This gene encodes a membrane-bound heat shock protein that binds the molecular chaperone BiP, located in the lumen of the endoplasmic reticulum. Relatively strong expression of DNAJC1 is reported in both bladder and prostate tissue according to the GTEx Portal.

10q26.12
The 10q26.12 locus contains three independent BPH/LUTS association signals: rs2981575, located in an intron of FGFR2; rs4548546, located in an intron of WDR11; and rs11199879, located intergenic between WDR11 and FGFR2 (see Table 1). Members of the WD repeat protein family, to which WDR11 belongs, are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis and gene regulation. FGFR2 is a member of the fibroblast growth factor receptor family and plays an essential role in the regulation of cell proliferation, differentiation, migration and apoptosis. The FGFR2 intron variant rs2981575 has previously been shown to associate with BRCA2-associated breast cancer [9]. For the intergenic variant rs11199879, a strongly correlated variant, rs10886902 (r² = 0.99 with rs11199879), has been shown to associate with serum PSA levels [1] and with prostate cancer aggressiveness [10]. We checked whether any of these three variants was significantly associated with gene expression, based on results in the GTEx Portal. No significant expression associations are reported for any of them after conditioning on the SNP with the most significant eQTL association for the most relevant gene (see Supplementary Table 4).

11p15.5
At 11p15.5, a missense variant in ODF3, rs72878024, associates with BPH/LUTS. According to the GTEx Portal, rs72878024 associates with expression levels of BET1L in esophagus muscularis tissue after conditioning on the strongest eQTL marker at this locus (β = −0.33, P = 0.0015; see Supplementary Table 4). ODF3 is a component of the outer dense fibers of sperm flagella, which are important for sperm movement, whereas BET1L participates in vesicle transport within the Golgi complex. In our view, this illustrates the diversity of potential biological functions that can relate to a single association signal.

12q24.21
The 12q24.21 locus has two independently associated BPH/LUTS variants. rs2555019 is located intergenic, downstream of TBX5, a member of a gene family encoding transcription factors involved in the regulation of embryonic developmental processes. The other variant, rs8853, is correlated (r² = 0.64) with rs11067228, which has been reported to associate with serum PSA levels [8], and it is located in the 3′-UTR of TBX3, which belongs to the same gene family as TBX5. Germline mutations in TBX3 underlie ulnar-mammary syndrome, a rare pleiotropic disorder characterized by altered development of the upper limbs, the apocrine and mammary glands, and the genitalia [11]. In the GTEx tissue library, expression levels of TBX3 are reported to be the second and third highest in bladder and prostate tissue, respectively. Based on our focused analysis of promoter/enhancer regions in prostate epithelial cells, we found the 12q24.21 locus (with rs8853 as the lead variant) to intersect with a super-enhancer and to show clear tissue specificity with respect to the H3K27ac mark in prostate-derived cells (Fig. 2a). Furthermore, based on a recently developed enhancer-gene target resource, the Joint Effect of Multiple Enhancers (JEME), TBX3 is the only candidate target gene linked to this enhancer element in primary prostate tissue samples.

13q14.3
rs1638703 and rs6561599 on 13q14.3 are both independently associated with BPH/LUTS according to our results. rs1638703 is fully correlated (r² = 1) with rs202346, which has been reported to associate with serum PSA levels [8], and is located in an intron of the non-protein-coding gene DLEU1, whereas rs6561599 is located some 5 kb upstream of RNASEH2B. The protein encoded by RNASEH2B is the non-catalytic B subunit of the RNase H2 endonuclease complex, which is thought to play a role in nucleic acid metabolism, preserving genome stability and preventing immune activation [12]. Our focused analysis of promoters/enhancers (with rs6561599 as the lead variant) revealed a tissue-specific promoter region for RNASEH2B, in which the H3K27ac mark was particularly prevalent in prostate-derived cells (see Fig. 2b).

17q12
At 17q12, rs11651052 is located in an intron of HNF1B and has previously been reported to associate with increased risk of prostate cancer and with serum PSA levels (rs11651052 has r² = 0.91 with rs4430796 [13]; see Supplementary Table 8).

20q13.33
The 20q13.33 locus contains two independent variants associated with symptomatic BPH/LUTS. One of these, rs200383755_C, is a missense variant (p.Ser19Trp) in the GATA5 gene. In our combined study group, this variant has an average minor allele frequency in controls of 0.9% and a strong protective effect against symptomatic BPH/LUTS (conditioned OR = 0.67, conditioned P = 3.2 × 10⁻⁹; Table 1). The protein encoded by GATA5 is a transcription factor that contains two GATA-type zinc fingers and is required during cardiovascular development [14]. According to the GTEx Portal, GATA5 has its highest expression in bladder, but its expression is also relatively high in prostate tissue, which ranks seventh among tissues. The other independently associated variant at 20q13.33 is rs6061244_C (conditioned OR = 0.94, conditioned P = 5.7 × 10⁻⁸; see Table 1), located in an intron of GATA5; it has no strongly correlated variants (all r² < 0.75) and can therefore be considered a credible risk variant.

Supplementary Figure 3. Comparison of the effects of variants on PSA levels vs. on benign prostatic hyperplasia/lower urinary tract symptoms (BPH/LUTS). Results are shown for variants reported in the main text that are genome-wide significantly associated with BPH/LUTS. The y-axis shows their effect estimates (β) from the GWAS of PSA levels in Iceland, and the x-axis shows the natural logarithm of the odds ratio (log(OR)) from the meta-analysis of the GWASs of BPH/LUTS in Iceland and the UK. Blue dots denote BPH/LUTS variants significantly associated with PSA levels after Bonferroni correction (P < 0.0022), and red dots denote variants not surpassing the Bonferroni significance threshold. Error bars denote standard errors.
The depicted data are also reported in Supplementary Table 7.

Shown are the effect allele (EA), the other allele (OA), the simple average effect allele population frequency (EAF), the allelic odds ratio (OR) for the effect allele with upper and lower 95% confidence intervals (c.i.) and the two-sided P value for association testing between variants and disease, which was performed using the likelihood ratio statistic. Results from the two study groups were combined using a Mantel-Haenszel model (see Methods). Annotation is according to Variant Effect Predictor (VEP). Also shown are the P value for heterogeneity (Phet) between the two study groups and the heterogeneity statistic (I²), representing the fraction of variability due to heterogeneity between study groups. rs200383755 had an imputation information score of 0.99 and 0.88 in the Icelandic and UK datasets, respectively. All other markers listed had imputation information scores > 0.95. Results for markers at loci with more than one association signal are shown after conditioning on a relevant covariate. Markers at loci with no additional association signal have no applicable covariate (na), and their results are the unconditioned association results from the GWAS of symptomatic BPH/LUTS. *Markers discovered in the conditional analysis. Positions are according to Build 38 (hg38) of the reference genome.

Shown are the effect allele (EA), the other allele (OA), the simple average effect allele population frequency (EAF), the allelic odds ratio (OR) for the effect allele with upper and lower 95% confidence intervals (c.i.) and the P value for association testing between variants and disease, which was performed using logistic regression and the likelihood ratio statistic. Results from the two study groups were combined using a Mantel-Haenszel model (see Methods). Annotation is according to Variant Effect Predictor (VEP).
Also shown are the P value for heterogeneity (Phet) between the two study groups and the heterogeneity statistic (I²), representing the fraction of variability due to heterogeneity between study groups. rs200383755 had an imputation information score of 0.99 and 0.88 in the Icelandic and UK datasets, respectively. All other markers listed had imputation information scores > 0.95.
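The table captions state that the Icelandic and UK results were combined with a Mantel-Haenszel model and that heterogeneity is summarized by Phet and I². As a minimal sketch only, the Python below swaps in standard inverse-variance fixed-effect weighting of log odds ratios (not necessarily identical to the authors' Mantel-Haenszel implementation), computes Cochran's Q with one degree of freedom for two study groups, and derives I² = max(0, (Q − df)/Q). It also shows how an OR with a 95% CI maps to the log(OR) and standard error plotted in Supplementary Figure 3. The helper names and all input numbers are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_or_and_se(odds_ratio, lower, upper):
    """Log odds ratio and its standard error, recovered from a Wald 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(odds_ratio), se


def fixed_effect_two_studies(betas, ses):
    """Inverse-variance fixed-effect combination of two log(OR)s.

    Returns the combined log(OR), its SE, the two-sided association P value,
    the heterogeneity P value (Cochran's Q, 1 df) and I^2.
    """
    if len(betas) != 2 or len(ses) != 2:
        raise ValueError("this sketch assumes exactly two study groups")
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    std_normal = NormalDist()
    # 1 - cdf loses precision for extreme z; fine for this illustration only
    p_assoc = 2.0 * (1.0 - std_normal.cdf(abs(beta / se)))
    # Cochran's Q with two groups has 1 df, so P(chi2_1 > Q) = 2 * (1 - Phi(sqrt(Q)))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    p_het = 2.0 * (1.0 - std_normal.cdf(math.sqrt(q)))
    df = 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, p_assoc, p_het, i2


# Hypothetical per-study estimates (not values from Table 1)
iceland = log_or_and_se(0.70, 0.60, 0.82)
uk = log_or_and_se(0.65, 0.57, 0.74)
beta, se, p_assoc, p_het, i2 = fixed_effect_two_studies(
    [iceland[0], uk[0]], [iceland[1], uk[1]]
)
print(f"combined OR = {math.exp(beta):.3f}, SE(logOR) = {se:.4f}")
print(f"P = {p_assoc:.2e}, Phet = {p_het:.2f}, I2 = {i2:.2f}")
```

With these made-up inputs Q falls below its degrees of freedom, so I² is truncated to 0. For P values as small as those reported in Table 1, a survival function with better tail precision (for example SciPy's `scipy.stats.norm.sf`) would be preferable to `1 - cdf`.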