Heritability and genome-wide association study of benign prostatic hyperplasia (BPH) in the eMERGE network

Benign prostatic hyperplasia (BPH) results in a significant public health burden due to the morbidity caused by the disease and many of the available remedies. As much as 70% of men over 70 will develop BPH. Few studies have been conducted to discover the genetic determinants of BPH risk. Understanding the biological basis for this condition may provide necessary insight for development of novel pharmaceutical therapies or risk prediction. We have evaluated SNP-based heritability of BPH in two cohorts and conducted a genome-wide association study (GWAS) of BPH risk using 2,656 cases and 7,763 controls identified from the Electronic Medical Records and Genomics (eMERGE) network. SNP-based heritability estimates suggest that roughly 60% of the phenotypic variation in BPH is accounted for by genetic factors. We used logistic regression to model BPH risk as a function of principal components of ancestry, age, and imputed genotype data, with meta-analysis performed using METAL. The top result was on chromosome 22 in SYN3 at rs2710383 (p-value = 4.6 × 10−7; Odds Ratio = 0.69, 95% confidence interval = 0.55–0.83). Other suggestive signals were near genes GLGC, UNCA13, SORCS1 and between BTBD3 and SPTLC3. We also evaluated genetically-predicted gene expression in prostate tissue. The most significant result was with increasing predicted expression of ETV4 (chr17; p-value = 0.0015). Overexpression of this gene has been associated with poor prognosis in prostate cancer. In conclusion, although there were no genome-wide significant variants identified for BPH susceptibility, we present evidence supporting the heritability of this phenotype, have identified suggestive signals, and evaluated the association between BPH and genetically-predicted gene expression in prostate.


Results
Heritability. SNP-based additive heritability among common variants was assessed in the 5 sites of the eMERGE-1 network and one of the Geisinger datasets (CoreExome) as they had the largest number of cases assessed on a common genotyping array (Table 1). After stringent filters to remove residual population stratification, there were 755 cases and 899 controls included from eMERGE-1 and 423 cases and 1278 controls included from Geisinger CoreExome. Heritability results were consistent between the two groups, with an estimated heritability of 0.65 (±0.30) in eMERGE-1 (p-value = 0.011) and 0.56 (±0. 38) in Geisinger (p-value = 0.070) ( Table 2). Results were largely consistent across inclusion of increasing PCs (Supplementary Table 1). Results across chromosomes varied substantially ( Supplementary Fig. 1), however, chromosomes 6 and 7 were present among the top 5 results for both eMERGE-1 and Geisinger, suggesting the likelihood of one or more BPH-susceptibility loci being located on these chromosomes.
We also evaluated the previously identified variants from recent GWAS to determine whether variants replicated across studies (Table 4) 49,50 . None of the variants reported in those studies were significant or suggestively associated with BPH in this analysis, although effect estimates were largely consistent in both direction and magnitude. Restricting to whites only to more closely match the papers 49,50 did not yield significant results. Gene expression. We also evaluated genetically predicted gene expression (GPGE) in prostate tissue using S-PrediXcan 51 and models constructed in GTEx samples 52 (Table 5; Fig. 1). The top result did not reach statistical significance (Bonferroni threshold for number of genes, p-value < 1.93 × 10 −5 ) was with increased predicted expression of ETS variant 4 (ETV4; chromosome 17; p-value = 0.0015). Other nominally significant genes were identified on chromosomes 6, 20, 3, 14, 1, and 7. It is noteworthy that neither of the genes on chromosome 6 (histone cluster 1 H3 family member e [HIST1H3E]) and 20 (gonadotropin-releasing hormone 2 [GNRH2], were near the top signals implicated in the GWAS results (GLGC and the intergenic region between BTBD3 and SPTLC3), instead these suggestive GPGE results arose from secondary signals in other regions.

Discussion
We have performed the first SNP-based heritability assessments of BPH followed by a trans-ethnic GWAS and evaluation of genetically predicted gene expression in prostate tissue. Our results indicate that BPH is likely to be substantially heritable, with consistent point estimates near 60% across two comparable EMR-based cohorts, which is somewhat higher than the 49% reported previously from twin studies 24 . The LUTS symptom score heritability however has been reported to be variable, with estimates ranging from 20 to 83%. The cases in this study likely have overt symptoms of BPH that lead to their clinical diagnoses and treatments, and may represent a more severe phenotype than from some cohort studies.
In this first GWAS of EHR-assessed BPH, we identified previously unreported suggestive SNPs. The gene containing the top SNP from the GWAS, SYN3 is a neuronal protein 53,54 which has been implicated in GWAS of many diverse phenotypes including age-related macular degeneration [55][56][57] , height [58][59][60] , and uric acid levels 61 . Expression of SYN3 in GTEx is highest in testis, followed by several brain regions, but is low in prostate and no predictive model was constructed for SYN3 expression in that tissue 52,62 . Another neuronal protein UNC13A (unc-13 homolog A) was also implicated from these GWAS results. Variants near this gene have been consistently associated with amyotrophic lateral sclerosis in several genome-wide studies [63][64][65] .
The second suggestive signal from GWAS, in the gene GCLC is also interesting, due in part to the localization of the SNP-based heritability on chromosome 6. Also relevant is the finding of modest association with the lead variant (rs534957) from our study which also demonstrated a weak association with prostatitis in the UK Biobank data (p-value = 6 × 10 −3 ; as viewed in the Global Biobank Engine 66 [https://biobankengine.stanford.edu/]). This suggests a consistent finding with another EHR-defined data set despite differences in case/control classification. Additionally, the identification of suggestive GPGE on chromosome 6 apart from the GCLC locus provides modest support for the heritability analysis, suggesting that the relevant SNPs have yet to be detected, perhaps due to a lack of power in the present studies.
One of the more biologically interesting candidates identified in this study is GNRH2 (gonadotropin-releasing hormone 2). GPGE analysis indicated that reduced expression of this gene in the prostate was associated with increased risk of BPH (p-value = 0.021). GNRH2 is expressed in the prostate 67-70 and its expression is regulated by several reproductive hormones 71 . Both gonadotropin releasing hormone (GnRH) antagonists and agonists have been investigated as treatments for BPH and prostate cancer 12,16,[72][73][74][75][76][77] , however, the side effects have made many of these impractical as therapeutic options. There is currently a Phase 3 trial underway to evaluate whether a GnRH antagonist in combination with radiation can improve progression of prostate cancer. This therapeutic was previously part of a phase 2 trial for efficacy in BPH, however the trial was stopped early due to not meeting primary efficacy endpoints. This is potentially consistent with the results observed here in which reduced expression levels of GNRH2 are associated with increased risk of BPH. A genetic variant in GNRH2 (rs6051545) was observed to impact testosterone levels during androgen deprivation therapy to treat metastatic prostate cancer 78 . It has been suggested that this may lead to a negative effect of the therapy on prognosis 78 .
Of the 11 genes included in Table 5, more than half of them have been previously reported such that expression changes have been associated in prostate tissue, often with various stages of prostate cancer. The top result from the GPGE analysis, ETV4 (ETS variant 4) has been previously found in studies of prostate cancer to have significantly higher relative expression in the tumor tissues than in benign samples 79 , as well as an association with poor prognosis 80,81 . We found that increased predicted expression of ETV4 is associated with increased risk of BPH in this study (p-value = 0.0015). Laminin subunit beta 2 (LAMB2) has been identified as being downregulated in the transition from prostate intraepithelial neoplasia to invasive prostate cancer from differential expression analysis 82 . Our results suggest that increasing LAMB2 expression is associated with increased risk of BPH. www.nature.com/scientificreports www.nature.com/scientificreports/ SCAP, which encodes SREBP cleavage-activating protein, has also been identified to show expression changes in prostate cancer 83,84 , and has been specifically noted to be regulated by androgens 83,85 . Recently, TIGIT expression has been implicated in failures of prostate cancer checkpoint inhibition 86,87 . Together, these results suggest that though these results did not achieve statistical significance, germline genetic associations with BPH may alter gene expression in prostate tissue, and that those genes without a presently documented role may yet be identified as important in studies of prostate gene expression implicated in disease.
There have been two recent GWAS of BPH in whites which have identified many significant and suggestive associations, though none were identified by both studies. Evaluation of these reported signals in the eMERGE data revealed modest associations at only five loci, including BCL11A, TERT, CLPTM1L, GATA6, and FGFR2 (Table 4). It is notable, that although none of the variants reported were significant in this study, effect estimates were largely consistent in both direction and magnitude. The lack of replication may be due in part to differences across studies in disease definition (varied use of IPSS scores, prostate volume, history of transurethral resection of the prostate, etc), participant recruitment from clinical trials, community cohorts, and hospital-based populations, or differences in age. Evaluation of associated variation reported in candidate gene analyses of BPH 28,[30][31][32]34,37,38,88 and an evaluation of prostate volume 48 did not yield any suggestive results in the present study (Supplementary Table 3).
Previous studies have shown adequate positive and negative predictive values based on electronic diagnoses (International Classification of Diseases, Ninth Revision (ICD9) codes and problem list) for BPH 89 , however, the phenotyping of BPH in the medical record likely reflects the presence of symptoms. Studies of care-seeking behavior with respect to BPH and LUTS have consistently shown that those seeking medical care tend to have higher symptom scores/more severe symptoms, but that reasons for not seeking treatment include diverse social and treatment concerns, even among those experiencing symptoms [90][91][92][93][94] . This suggests the possibility that some   www.nature.com/scientificreports www.nature.com/scientificreports/ portion of the controls in our study may have experienced (or will experience in the future) symptoms of BPH but have not (yet) sought treatment for the condition. This is a limitation of the present study.
Based on these results, wherein BPH was shown to be heritable but no significant susceptibility loci were detected, it seems that BPH is a complex disease made up of many physiological symptoms and the genetic underpinnings of this trait are likely to consist of a multitude of variants of small effect. This makes large sample sizes crucial for detecting genetic loci associated with BPH as has been demonstrated 50 . In conclusion, we have shown that BPH is heritable, identified suggestive association signals, and are the first to evaluate the association between BPH and genetically-predicted gene expression in prostate.

Methods study populations. The eMERGE Network is a consortium of several EHR-linked biorepositories formed
with the goal of developing approaches for the use of the EHR in genomic research 95  and problem lists [containing keywords e.g. "prostate cancer", "malignant tumor of the prostate", "bladder cancer", "bladder CA"), we included all cases of BPH with at least two ICD9 codes indicating a BPH diagnosis (600, 600.0, 600.0*, 600.2, 600.2*, 600.9, 600.9*), in addition to either receiving medications for the treatment of BPH or 1 or more procedure (Current Procedural Terminology [CPT]) codes for BPH-related surgeries (52450, 52601, 52630, 52648, 53850, 53852). Controls were males of at least age 40, with at least 3 outpatient visits within any 2-year period after the age of 40, without prostate or bladder cancer or instances of BPH ICD9 codes, medications or BPH-related surgical CPT codes. The algorithm is available on PheKB.org (https://phekb.org/phenotype/216).
Genotyping and Quality Control. Genotyping was performed for eMERGE-1 study sites using one of two Illumina arrays across two genotyping centers. Individuals of self-identified or administratively-assigned European-descent were genotyped on the Illumina 660W-Quad, while individuals of self-identified or administratively-assigned African-descent were genotyped on the Illumina 1 M. For the majority of patients, genotyping was performed at one of two centers: the Center for Inherited Disease Research (CIDR) at Johns Hopkins University and the Center for Genotyping and Analysis at the Broad Institute as previously described 95,96 . Existing genotype data available for eMERGE-2 and -3 study sites included data from the Illumina 550, Illumina 610, Illumina HumanOmni Express, Illumina MultiEthnic Genotyping Array, Illumina CoreExome, and Affymetrix 6.0 arrays.
Genotype quality control (QC) was performed within each study population, and a uniform protocol was implemented. QC for all studies was performed using PLINK 97 , including a 95% single nucleotide polymorphism (SNP) and individual call rate threshold, removal of first-degree related individuals, sex checks, alignment of alleles to the genomic '+' strand. Visualization of ancestry by principal components analysis was performed by study using either Eigenstrat 98 or flashPCA 99 .
Statistical Analysis. Restricted maximum likelihood estimation as implemented in GCTA 100 was used to determine the proportion of phenotypic variance explained by common additive genetic variants in two cohorts, eMERGE-1 and Geisinger CoreExome. Data was filtered to include only common variants (MAF > 0.05), and samples with IBD probabilities <0.025, as well as restricted using principal components to retain only EA samples. Disease prevalence was based on average age within cohort and set at 0.67 for eMERGE-1 and 0.63 for Geisinger CoreExome.
Genotype data was imputed from the 1000 Genomes Project haplotypes using SHAPEIT2 101 and IMPUTE2 102 by site or study and analyzed for associations separately. We used logistic regression to model BPH risk as a function of genotype, age, and principal components of ancestry with the SNPTEST software package, with subsequent meta-analysis performed using METAL 103 . There was no substantial genomic inflation observed, with a meta-analysis lambda of 1.017 ( Supplementary Fig. 2).
To further evaluate the genetic association results in the context of gene expression, we employed the novel method S-PrediXcan 51 , an extension of the PrediXcan method 62 . PrediXcan conducts a test of association between phenotypes and gene expression levels predicted by genetic variants in a library of tissues from the Genotype-Tissue Expression (GTEx) project 52,104 . S-PrediXcan is a meta-analysis approach that conducts the  www.nature.com/scientificreports www.nature.com/scientificreports/ PrediXcan test using genotype association summary statistics, rather than performing the tests in individual-level data. We utilized covariance matrices built for prostate tissue from GTEx to annotate SNP association signals as well as to provide information about likely tissue expression patterns and relevant biological information.