Benign prostatic hyperplasia (BPH), characterized by an enlarged prostate, affects a moderate proportion of middle-aged men and a large proportion of elderly men, and can cause significant discomfort as well as reproductive and urinary tract dysfunction1,2,3,4. Lower urinary tract symptoms (LUTS) are commonly attributed to BPH in the absence of other causes5,6,7. Very severe cases can result in urinary tract infections, bleeding, bladder stones, and kidney damage from failure to void7,8,9,10. Pharmaceutical treatments for BPH include alpha blockers, which relax smooth muscle and relieve some LUTS, and 5-alpha reductase inhibitors, which can shrink the prostate in some patients but may increase the risk of prostate cancer11,12,13,14,15,16. The available surgical remedies carry further risks, with considerable potential consequences for reproductive and urinary tract health17,18,19,20,21.

Heritability of LUTS scores in twins has been estimated at 20–40%22, with some estimates as high as 83%23, while heritability of benign prostate disease has been estimated at 49% from twin studies24. Racial disparities in BPH also support a genetic contribution to risk25,26. However, no estimate of the SNP-based additive genetic heritability of BPH has yet been published.

The genetic factors underlying BPH risk remain unclear. Many candidate gene studies of BPH have been conducted to date, often evaluating the effect of prostate cancer susceptibility variants, with mixed success27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47. Three larger-scale studies have been performed: a MetaboChip analysis of prostate volume48 and two recent genome-wide association studies (GWAS) of BPH49,50. In the present study, we estimated the SNP-based heritability of clinically reported BPH, conducted a GWAS using cases and controls identified from the Electronic Medical Records and Genomics (eMERGE) network, and evaluated the contribution of the genetic associations to gene expression in prostate tissue.



SNP-based additive heritability among common variants was assessed in the 5 sites of the eMERGE-1 network and in one of the Geisinger datasets (CoreExome), as these had the largest numbers of cases genotyped on a common array (Table 1). After stringent filters to remove residual population stratification, 755 cases and 899 controls were included from eMERGE-1, and 423 cases and 1278 controls from Geisinger CoreExome. Heritability estimates were consistent between the two groups: 0.65 (±0.30) in eMERGE-1 (p-value = 0.011) and 0.56 (±0.38) in Geisinger (p-value = 0.070) (Table 2). Results were largely stable as increasing numbers of principal components (PCs) were included (Supplementary Table 1). Results across chromosomes varied substantially (Supplementary Fig. 1); however, chromosomes 6 and 7 were among the top 5 results in both eMERGE-1 and Geisinger, suggesting that one or more BPH-susceptibility loci may be located on these chromosomes.

Table 1 Study Characteristics.
Table 2 SNP-based heritability of BPH in two cohorts.

Genome-wide association

In total, 2,656 cases and 7,763 controls from eight eMERGE sites (including the data used in the heritability analysis; Table 1) were included in the association analysis of common (minor allele frequency [MAF] > 0.05) variants, covering 10,973,920 SNPs. Overall, most participants were identified as white (85%), and cases were older than controls (average age 68.88 years in cases vs 61.45 years in controls). The most statistically significant result from the single-SNP GWAS analyses was on chromosome 22 in synapsin 3 (SYN3) at rs2710383 (allele frequency = 0.12, p-value = 4.56 × 10−7; Odds Ratio [OR] = 0.69, 95% confidence interval [CI] = 0.55–0.83; Table 3; Fig. 1). Other suggestive signals were within the genes glutamate-cysteine ligase catalytic subunit (GCLC; chromosome 6), unc-13 homolog A (UNC13A; chromosome 19), and ELOVL [elongation of very long chain fatty acids] fatty acid elongase 6 (ELOVL6; chromosome 4), near the long intergenic non-protein coding RNA 1919 (LINC01919; chromosome 18), and in an intergenic region on chromosome 20 between BTB domain containing 3 (BTBD3) and serine palmitoyltransferase long chain base subunit 3 (SPTLC3). A secondary analysis restricted to whites yielded consistent results for the top SNPs (Table 3), although its top variant was rs10786938 in SORCS1 (p-value = 3.84 × 10−7, OR = 1.23; Supplementary Table 2).
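Because the associations above are reported as odds ratios with 95% confidence intervals from logistic regression, a minimal sketch of the standard conversion from a log-odds coefficient (beta) and its standard error (SE) may help in reading them. It assumes Wald-type intervals, which the text does not specify, and the inputs in the usage line are hypothetical, not values from Table 3.

```python
import math
from typing import Tuple


def odds_ratio_ci(beta: float, se: float, z: float = 1.959964) -> Tuple[float, float, float]:
    """Convert a logistic-regression log-odds coefficient and its SE into
    (OR, lower 95% bound, upper 95% bound) using a Wald interval."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)   # interval is built on the log-odds scale...
    upper = math.exp(beta + z * se)   # ...and then exponentiated
    return odds_ratio, lower, upper


# Hypothetical coefficient and SE, not taken from this study.
or_, lo, hi = odds_ratio_ci(beta=0.20, se=0.05)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

Because the interval is symmetric on the log-odds scale, it is generally asymmetric around the OR itself.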

Table 3 Top GWAS Results.
Figure 1

Genome-wide genetically-predicted gene expression and SNP association meta-analysis results with BPH.

We also evaluated variants reported in two recent GWAS of BPH to determine whether they replicated in our data (Table 4)49,50. None of these variants was significantly or suggestively associated with BPH in this analysis, although effect estimates were largely consistent in both direction and magnitude. Restricting the analysis to whites, to more closely match the populations of those studies49,50, also did not yield significant results.

Table 4 Replication of suggestive index SNPs reported in recent GWAS of BPH.

Gene expression

We also evaluated genetically predicted gene expression (GPGE) in prostate tissue using S-PrediXcan51 and models constructed in GTEx samples52 (Table 5; Fig. 1). The top result, which did not reach statistical significance (Bonferroni threshold for the number of genes, p-value < 1.93 × 10−5), was increased predicted expression of ETS variant 4 (ETV4; chromosome 17; p-value = 0.0015). Other nominally significant genes were identified on chromosomes 6, 20, 3, 14, 1, and 7. Notably, neither the gene on chromosome 6 (histone cluster 1 H3 family member e [HIST1H3E]) nor the gene on chromosome 20 (gonadotropin-releasing hormone 2 [GNRH2]) was near the top GWAS signals on those chromosomes (GCLC and the intergenic region between BTBD3 and SPTLC3); instead, these suggestive GPGE results arose from secondary signals in other regions.
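For orientation, a Bonferroni threshold divides the family-wise error rate by the number of tests. Assuming α = 0.05 (not stated in the text), the reported threshold of 1.93 × 10−5 implies that roughly 2,590 gene models were tested in prostate tissue; the exact count is not given. The helper function below is a hypothetical illustration, not part of S-PrediXcan.

```python
from typing import List

ALPHA = 0.05              # assumed family-wise error rate (not stated in the text)
REPORTED_THRESHOLD = 1.93e-5

# Bonferroni: threshold = alpha / n_tests, so n_tests = alpha / threshold.
implied_n_genes = ALPHA / REPORTED_THRESHOLD
print(f"implied number of gene models tested: about {implied_n_genes:.0f}")


def bonferroni_hits(p_values: List[float], alpha: float = ALPHA) -> List[float]:
    """Hypothetical helper: return the p-values that pass a Bonferroni
    correction over len(p_values) tests."""
    cutoff = alpha / len(p_values)
    return [p for p in p_values if p < cutoff]


# The top GPGE result (ETV4, p = 0.0015) does not pass the reported threshold.
print(0.0015 < REPORTED_THRESHOLD)
```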

Table 5 Suggestive results from predicted gene expression in prostate.


We have performed the first SNP-based heritability assessment of BPH, followed by a trans-ethnic GWAS and an evaluation of genetically predicted gene expression in prostate tissue. Our results indicate that BPH is likely to be substantially heritable, with consistent point estimates near 60% across two comparable EMR-based cohorts, somewhat higher than the 49% previously reported from twin studies24. The heritability of LUTS scores, however, has been reported to be variable, with estimates ranging from 20 to 83%. The cases in this study likely had overt symptoms of BPH that led to their clinical diagnosis and treatment, and may represent a more severe phenotype than that captured in some cohort studies.

In this first GWAS of EHR-assessed BPH, we identified previously unreported suggestive SNPs. SYN3, the gene containing the top GWAS SNP, encodes a neuronal protein53,54 and has been implicated in GWAS of many diverse phenotypes, including age-related macular degeneration55,56,57, height58,59,60, and uric acid levels61. In GTEx, SYN3 expression is highest in testis, followed by several brain regions, but is low in prostate, and no predictive model was constructed for SYN3 expression in that tissue52,62. UNC13A (unc-13 homolog A), which also encodes a neuronal protein, was also implicated by these GWAS results; variants near this gene have been consistently associated with amyotrophic lateral sclerosis in several genome-wide studies63,64,65.

The second suggestive GWAS signal, in GCLC, is also of interest, in part because chromosome 6 was among the top chromosomes in the SNP-based heritability analysis. Also relevant, the lead variant at this locus in our study (rs534957) showed a weak association with prostatitis in the UK Biobank data (p-value = 6 × 10−3; as viewed in the Global Biobank Engine66). This suggests a consistent finding in another EHR-defined data set despite differences in case/control classification. Additionally, the suggestive GPGE result on chromosome 6 apart from the GCLC locus provides modest support for the heritability analysis and suggests that the relevant SNPs have yet to be detected, perhaps because of limited power in the present study.

One of the more biologically interesting candidates identified in this study is GNRH2 (gonadotropin-releasing hormone 2). GPGE analysis indicated that reduced expression of this gene in the prostate was associated with increased risk of BPH (p-value = 0.021). GNRH2 is expressed in the prostate67,68,69,70, and its expression is regulated by several reproductive hormones71. Both gonadotropin-releasing hormone (GnRH) antagonists and agonists have been investigated as treatments for BPH and prostate cancer12,16,72,73,74,75,76,77; however, their side effects have made many of them impractical as therapeutic options. A phase 3 trial is currently underway to evaluate whether a GnRH antagonist in combination with radiation can slow the progression of prostate cancer. This therapeutic was previously evaluated for efficacy in BPH in a phase 2 trial, which was stopped early after failing to meet its primary efficacy endpoints. This is potentially consistent with the results observed here, in which reduced expression of GNRH2 is associated with increased risk of BPH. A genetic variant in GNRH2 (rs6051545) has been observed to affect testosterone levels during androgen deprivation therapy for metastatic prostate cancer78, and it has been suggested that this may have a negative effect on prognosis under that therapy78.

More than half of the 11 genes in Table 5 have previously been reported to show expression changes in prostate tissue, often in association with various stages of prostate cancer. The top result from the GPGE analysis, ETV4 (ETS variant 4), has been found in studies of prostate cancer to have significantly higher relative expression in tumor tissue than in benign samples79, as well as an association with poor prognosis80,81. In this study, we found that increased predicted expression of ETV4 was associated with increased risk of BPH (p-value = 0.0015). Laminin subunit beta 2 (LAMB2) was identified by differential expression analysis as being downregulated in the transition from prostate intraepithelial neoplasia to invasive prostate cancer82; our results suggest that increased LAMB2 expression is associated with increased risk of BPH. SCAP, which encodes SREBP cleavage-activating protein, has also been shown to have altered expression in prostate cancer83,84, and has specifically been noted to be regulated by androgens83,85. Recently, TIGIT expression has been implicated in failures of checkpoint inhibition in prostate cancer86,87. Together, these findings suggest that, although none of the GPGE results reached statistical significance, germline genetic variants associated with BPH may alter gene expression in prostate tissue, and that genes without a currently documented role may yet prove important in studies of disease-related prostate gene expression.

Two recent GWAS of BPH in whites have identified many significant and suggestive associations, though none was identified by both studies. Evaluation of these reported signals in the eMERGE data revealed modest associations at only five loci: BCL11A, TERT, CLPTM1L, GATA6, and FGFR2 (Table 4). Notably, although none of the reported variants was significant in this study, effect estimates were largely consistent in both direction and magnitude. The lack of replication may be due in part to differences across studies in disease definition (varied use of International Prostate Symptom Score [IPSS], prostate volume, history of transurethral resection of the prostate, etc.), in participant recruitment (clinical trials, community cohorts, and hospital-based populations), or in age. Evaluation of associated variants reported in candidate gene analyses of BPH28,30,31,32,34,37,38,88 and in an analysis of prostate volume48 did not yield any suggestive results in the present study (Supplementary Table 3).

Previous studies have shown adequate positive and negative predictive values for BPH based on electronic diagnoses (International Classification of Diseases, Ninth Revision [ICD9] codes and problem lists)89. However, BPH phenotyping in the medical record likely reflects the presence of symptoms. Studies of care-seeking behavior for BPH and LUTS have consistently shown that those who seek medical care tend to have higher symptom scores or more severe symptoms, and that reasons for not seeking treatment include diverse social and treatment concerns, even among those experiencing symptoms90,91,92,93,94. Some portion of the controls in our study may therefore have experienced (or will experience in the future) symptoms of BPH without (yet) seeking treatment for the condition, which is a limitation of the present study.

Based on these results, in which BPH was shown to be heritable but no significant susceptibility loci were detected, BPH appears to be a complex disease encompassing many physiological symptoms, whose genetic underpinnings likely consist of a multitude of variants of small effect. Large sample sizes are therefore crucial for detecting genetic loci associated with BPH, as has been demonstrated50. In conclusion, we have shown that BPH is heritable, identified suggestive association signals, and performed the first evaluation of the association between BPH and genetically predicted gene expression in prostate.


Study Populations

The eMERGE Network is a consortium of several EHR-linked biorepositories formed with the goal of developing approaches for the use of the EHR in genomic research95,96. Consortium membership has evolved over eMERGE’s 11-year history, with many sites contributing data, including Group Health/University of Washington, Marshfield Clinic, Mayo Clinic, Northwestern University, Vanderbilt University (Phase 1 sites), Children’s Hospital of Philadelphia (CHOP), Boston Children’s Hospital (BCH), Cincinnati Children’s Hospital Medical Center (CCHMC), Geisinger Health System, Mount Sinai School of Medicine (sites added in Phase 2), Harvard University and Columbia University (sites added in Phase 3). The eMERGE study was approved by the Ethical Committee/Institutional Review Board at each site (Vanderbilt University Medical Center, Group Health/University of Washington, Marshfield Clinic, Mayo Clinic, Northwestern University, Children’s Hospital of Philadelphia, Boston Children’s Hospital, Cincinnati Children’s Hospital Medical Center, Geisinger Health System, Mount Sinai School of Medicine, Harvard University and Columbia University), and all methods were performed in accordance with the relevant guidelines and regulations. In this study of BPH, data from the eMERGE pediatric study sites (CHOP, BCH, CCHMC) were not included. Participants at all study sites provided written informed consent, and for participants under the age of 18 years (who were not included in the analyses presented herein), informed consent was obtained from a parent and/or legal guardian.


Among men aged 40 years or older without prostate or bladder cancer (defined via ICD9 codes [233.4, 233.7 or 233.9], tumor registries [Primary site = C619], and problem lists [containing keywords, e.g., “prostate cancer”, “malignant tumor of the prostate”, “bladder cancer”, “bladder CA”]), we included as cases all men with at least two ICD9 codes indicating a BPH diagnosis (600, 600.0, 600.0*, 600.2, 600.2*, 600.9, 600.9*) who also either received medications for the treatment of BPH or had one or more Current Procedural Terminology (CPT) codes for BPH-related surgeries (52450, 52601, 52630, 52648, 53850, 53852). Controls were men aged 40 years or older with at least 3 outpatient visits within any 2-year period after the age of 40, and without prostate or bladder cancer or any BPH ICD9 codes, BPH medications, or BPH-related surgical CPT codes. The algorithm is available on (
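The case and control rules above can be sketched as a small classification function. This is a simplified, hypothetical illustration, not the published eMERGE implementation: the record fields (for example bph_icd9_count and max_visits_any_2y_after_40) are assumptions, and cancer exclusion and BPH medication use are collapsed into precomputed flags rather than derived from ICD9 codes, tumor registries, and problem-list keywords.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# CPT codes for BPH-related surgeries, as listed in the text.
BPH_SURGERY_CPT = {"52450", "52601", "52630", "52648", "53850", "53852"}


@dataclass
class Patient:
    """Hypothetical, pre-summarized EHR record for one patient."""
    age: int
    sex: str                                  # "M" or "F"
    bph_icd9_count: int                       # instances of BPH ICD9 codes (600.x)
    on_bph_medication: bool
    cpt_codes: Set[str] = field(default_factory=set)
    prostate_or_bladder_cancer: bool = False  # from ICD9, registry, or problem list
    max_visits_any_2y_after_40: int = 0       # most outpatient visits in any 2-year window


def classify(p: Patient) -> Optional[str]:
    """Return "case", "control", or None (excluded)."""
    if p.sex != "M" or p.age < 40 or p.prostate_or_bladder_cancer:
        return None
    had_bph_surgery = bool(p.cpt_codes & BPH_SURGERY_CPT)
    # Cases: >= 2 BPH ICD9 codes plus BPH medication or a BPH-related surgery.
    if p.bph_icd9_count >= 2 and (p.on_bph_medication or had_bph_surgery):
        return "case"
    # Controls: no BPH evidence at all and >= 3 outpatient visits in some 2-year window.
    no_bph_evidence = (p.bph_icd9_count == 0 and not p.on_bph_medication
                       and not had_bph_surgery)
    if no_bph_evidence and p.max_visits_any_2y_after_40 >= 3:
        return "control"
    return None  # e.g. a single BPH code: neither case nor control


print(classify(Patient(age=66, sex="M", bph_icd9_count=3, on_bph_medication=True)))
```

Men with some but insufficient evidence of BPH (for example, a single BPH code) fall into neither group, which keeps ambiguous records out of both cases and controls.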

Genotyping and Quality Control

Genotyping for the eMERGE-1 study sites was performed on one of two Illumina arrays. Individuals of self-identified or administratively assigned European descent were genotyped on the Illumina 660W-Quad, while individuals of self-identified or administratively assigned African descent were genotyped on the Illumina 1M. For the majority of patients, genotyping was performed at one of two centers, the Center for Inherited Disease Research (CIDR) at Johns Hopkins University or the Center for Genotyping and Analysis at the Broad Institute, as previously described95,96. Existing genotype data available for eMERGE-2 and -3 study sites included data from the Illumina 550, Illumina 610, Illumina HumanOmni Express, Illumina MultiEthnic Genotyping Array, Illumina CoreExome, and Affymetrix 6.0 arrays.

Genotype quality control (QC) was performed within each study population using a uniform protocol. QC for all studies was performed using PLINK97 and included a 95% call rate threshold for both single nucleotide polymorphisms (SNPs) and individuals, removal of first-degree related individuals, sex checks, and alignment of alleles to the genomic ‘+’ strand. Ancestry was visualized by principal components analysis within each study using either Eigenstrat98 or flashPCA99.
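The 95% call-rate filter can be illustrated with a short sketch in plain Python; PLINK's --geno and --mind options apply comparable variant-level and sample-level missingness filters. The two-pass order used here (SNPs first, then samples) and the list-of-lists genotype layout are assumptions for illustration, not a description of the exact PLINK commands used in this study.

```python
from typing import List, Optional, Tuple

# Rows are samples, columns are SNPs; None marks a missing genotype call.
Genotypes = List[List[Optional[int]]]


def call_rate_filter(g: Genotypes, threshold: float = 0.95) -> Tuple[List[int], List[int]]:
    """Return (kept_sample_indices, kept_snp_indices) after a two-pass filter:
    drop SNPs called in fewer than `threshold` of samples, then drop samples
    called at fewer than `threshold` of the remaining SNPs."""
    n_samples, n_snps = len(g), len(g[0])
    kept_snps = [
        j for j in range(n_snps)
        if sum(g[i][j] is not None for i in range(n_samples)) / n_samples >= threshold
    ]
    kept_samples = [
        i for i in range(n_samples)
        if kept_snps
        and sum(g[i][j] is not None for j in kept_snps) / len(kept_snps) >= threshold
    ]
    return kept_samples, kept_snps
```

Filtering SNPs first means that a sample missing calls only at poorly genotyped SNPs is not penalized for them; the reverse order would behave differently at the margins.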

Statistical Analysis

Restricted maximum likelihood estimation as implemented in GCTA100 was used to estimate the proportion of phenotypic variance explained by common additive genetic variants in two cohorts, eMERGE-1 and Geisinger CoreExome. Data were filtered to include only common variants (MAF > 0.05) and samples with IBD probabilities < 0.025, and were restricted using principal components to retain only European-ancestry (EA) samples. Disease prevalence, which GCTA uses to transform case-control estimates to the liability scale, was set based on the average age within each cohort: 0.67 for eMERGE-1 and 0.63 for Geisinger CoreExome.
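The prevalence values above enter the standard transformation of an observed-scale (0/1) heritability estimate to the liability scale for case-control data: h2_L = h2_O × K²(1 − K)² / [z² × P(1 − P)], where K is the population prevalence, P is the proportion of cases in the sample, and z is the standard normal density at the liability threshold Φ−1(1 − K). The sketch below implements that formula with the Python standard library; it illustrates the transformation and is not GCTA's source code. Because observed-scale estimates are not reported here, the usage line only computes the scaling factor for the eMERGE-1 sample (755 cases, 899 controls).

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, prevalence: float, case_fraction: float) -> float:
    """Transform an observed-scale (0/1) heritability to the liability scale.

    prevalence (K): assumed population prevalence of the disease
    case_fraction (P): proportion of cases in the analyzed sample
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)   # liability threshold t
    z = std_normal.pdf(threshold)                    # normal density at t
    k, p = prevalence, case_fraction
    return h2_obs * (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))


# Scaling factor for the eMERGE-1 heritability sample (K = 0.67; 755 cases, 899 controls).
factor = observed_to_liability(1.0, prevalence=0.67, case_fraction=755 / (755 + 899))
print(f"liability-scale h2 = {factor:.2f} x observed-scale h2")
```

The transformation is linear in h2_O, so the same factor also scales the standard error.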

Genotype data were imputed using 1000 Genomes Project haplotypes with SHAPEIT2101 and IMPUTE2102 by site or study, and each dataset was analyzed for association separately. We used logistic regression, as implemented in the SNPTEST software package, to model BPH risk as a function of genotype, age, and principal components of ancestry, and then meta-analyzed the results using METAL103. No substantial genomic inflation was observed (meta-analysis lambda = 1.017; Supplementary Fig. 2).
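The per-site results were combined with METAL, which offers both sample-size-weighted and inverse-variance-weighted schemes; the text does not state which was used. The sketch below assumes a fixed-effect inverse-variance weighted meta-analysis and also shows the usual genomic control lambda, the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). Function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist, median
from typing import List, Tuple


def ivw_meta(betas: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance weighted meta-analysis of per-site log-odds.

    Returns (pooled beta, pooled SE, two-sided p-value)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided normal p-value
    return pooled_beta, pooled_se, p_value


def genomic_inflation(chi2_stats: List[float]) -> float:
    """Genomic control lambda: median observed 1-df chi-square statistic divided by
    its expected median under the null, (Phi^-1(0.75))^2, about 0.455."""
    expected_median = NormalDist().inv_cdf(0.75) ** 2
    return median(chi2_stats) / expected_median


# Hypothetical per-site estimates for one SNP at three sites.
print(ivw_meta([0.12, 0.08, 0.15], [0.06, 0.05, 0.09]))
```

A lambda close to 1, as reported here, indicates that the median test statistic matches its null expectation.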

To further evaluate the genetic association results in the context of gene expression, we employed S-PrediXcan51, an extension of the PrediXcan method62. PrediXcan tests for association between phenotypes and gene expression levels predicted from genetic variants, using models built in a library of tissues from the Genotype-Tissue Expression (GTEx) project52,104. S-PrediXcan is a meta-analysis approach that performs the PrediXcan test using genotype association summary statistics rather than individual-level data. We used covariance matrices built for prostate tissue from GTEx to annotate SNP association signals and to provide information about likely tissue expression patterns and relevant biology.
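The core S-PrediXcan statistic combines a gene's prediction-model weights with GWAS z-scores and a reference SNP covariance matrix: Z_g ≈ Σ_l w_lg (σ_l / σ_g) (β_l / se(β_l)), where σ_l² is the variance of SNP l and σ_g² = W′ΓW is the variance of predicted expression given the covariance matrix Γ. A minimal plain-Python sketch of that formula follows. The inputs are hypothetical, alleles are assumed to be aligned between the model and the GWAS, and reading GTEx model databases and covariance files, which the actual tool does, is not attempted.

```python
import math
from typing import List, Sequence


def s_predixcan_z(weights: Sequence[float], snp_z: Sequence[float],
                  cov: List[List[float]]) -> float:
    """Approximate gene-level association z-score from GWAS summary statistics.

    weights: prediction-model weights w_l for the gene's SNPs
    snp_z:   GWAS z-scores (beta_l / se_l) for the same SNPs, same effect allele
    cov:     reference-panel genotype covariance matrix (Gamma) for those SNPs
    """
    n = len(weights)
    # Variance of predicted expression: sigma_g^2 = w' Gamma w
    var_g = sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))
    sigma_g = math.sqrt(var_g)
    # Z_g = sum_k w_k * (sigma_k / sigma_g) * z_k, with sigma_k = sqrt(Gamma_kk)
    return sum(weights[k] * math.sqrt(cov[k][k]) / sigma_g * snp_z[k] for k in range(n))


# Hypothetical two-SNP model with correlated SNPs.
print(s_predixcan_z([0.3, -0.1], [2.5, -1.0], [[0.40, 0.10], [0.10, 0.30]]))
```

With a single SNP the gene-level z-score reduces to the SNP's own z-score, signed by the direction of its weight, which is a quick sanity check on the formula.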