Large-scale association analysis in Asians identifies new susceptibility loci for prostate cancer

Genome-wide association studies (GWAS) have identified ∼100 genetic loci associated with prostate cancer risk. Less than a dozen of these loci were initially identified from GWAS in two Asian populations, likely because of smaller sample sizes of these individual GWAS in Asians. Here, we conduct a large-scale meta-analysis of two GWAS from the Japanese population (1,583 cases and 3,386 controls) and the Chinese population (1,417 cases and 1,008 controls), followed by replication in three independent sample sets. We identify two independent susceptibility loci for prostate cancer at 11p15.4 (rs12791447, P=3.59 × 10−8; PPFIBP2) and 14q23.2 (rs58262369, P=6.05 × 10−10; ESR2). The mRNA levels of PPFIBP2 and ESR2 are differentially expressed in prostate tumours and paired normal tissues. Our study adds two new loci to the limited number of prostate cancer risk-associated variants in Asians and provides important insight into potential biological mechanisms of prostate cancer.

W ith B900,000 new cases diagnosed each year, prostate cancer represents the most frequently diagnosed cancer worldwide in men 1 . The incidence of prostate cancer varies widely between countries and regions, with the highest prevalence rate observed in Western developed countries and the lowest incidence rate observed in Asian countries. In addition to age and race, positive family history is also an established risk factor for prostate cancer, suggesting that inherited genetics may influence the risk for this disease 2 .
To date, genome-wide association studies (GWAS) for prostate cancer have identified a more than 100 susceptibility loci, explaining B30% of the familial risk for this disease 3 . However, most of these GWAS were conducted in men of European descent. Only 10 of these loci were initially identified from GWAS in two Asian populations (Japanese and Chinese), likely because of smaller sample sizes of these individual GWAS in Asians [4][5][6] . It has been demonstrated that combining multiple existing GWAS data using a meta-analysis can increase the statistical power to uncover additional risk loci 7 .
In this study, we performed a large-scale meta-analysis using two existing Asian GWAS data sets to identify novel loci for prostate cancer susceptibility in Asians. In addition, promising single-nucleotide polymorphisms (SNPs) were further evaluated in three independent study populations of Japanese and Chinese descents. We report strong evidence for two prostate cancer susceptibility loci at 11p15.4 and 14q23.2. These results provide additional insight into the inherited genetic basis of prostate cancer and support the presence of considerable genetic heterogeneity between different ethnic groups.

Results
Study overview. The overall study design of the meta-analysis GWAS of prostate cancer is depicted in Fig. 1. After qualitycontrol analysis in the two original GWAS, 1,583 cases and 3,386 controls from the Japanese GWAS and 1,417 cases and 1,008 controls from the Chinese GWAS were retained for further metaanalysis. The characteristics of the subjects are summarized in Supplementary Table 1. After exclusion of rare SNPs that have a minor allele frequency (MAF) less than 5%, a total of 4,550,396 autosomal SNPs were available in both GWAS data sets. A quantile-quantile (Q-Q) plot revealed a modest inflation of the test statistics (l ¼ 1.07; Supplementary Fig. 1). The meta-analysis identified multiple previously known genomic regions, with the strongest signal at 8q24 (Fig. 2). Among the 100 known prostate cancer risk-associated SNPs, 33 were significant at P additive o0.05 (Supplementary Table 2).
Newly identified susceptibility loci for prostate cancer. To identify novel prostate cancer susceptibility loci, we first filtered out SNPs that had the P41 Â 10 À 4 from the meta-analysis or had an opposite direction of association in the two GWAS studies. A total of 2,045 SNPs remained in the analysis (Supplementary Table 3). We then applied the following criteria to select promising novel SNPs: (a) had low linkage disequilibrium (LD; r 2 o0.1) with the 100 known prostate cancer risk-associated SNPs; (b) only one SNP with the strongest evidence for association within each LD block was selected; and (c) had not been previously evaluated in follow-up stages of these two individual GWAS. After these steps, 50 SNPs were identified for further replication analysis (Supplementary Table 4).
For replication, we first genotyped these 50 selected SNPs in an independent Chinese population of 1,664 affected subjects and 1,523 control subjects from Shanghai, China (Replication 1). One SNP failed in the assay design and four SNPs failed in the genotyping stage because of poor clusters. Of the remaining 45 SNPs, four SNPs (rs4749884 at 10p14, rs12791447 at 11p15.4, rs75718479 at 12q23.1 and rs58262369 at 14q23.2) showed Suggestive loci rs4749884 at 10p14 and rs75718479 at 12q23.1 nominally significant associations at P additive o0.05 and had the same directions of association as in the GWAS meta-analysis (Supplementary Table 5).
We next examined the four significant SNPs in two independent Asian prostate cancer case-control populations, including 1,941 prostate cancer cases and 2,396 controls (Replication 2a, 908 cases and 1,354 controls of Chinese from Nanjing, China, and Replication 2b, 1,033 cases and 1,042 controls of Japanese descent from Multiethnic Cohort (MEC)). Three of these four SNPs were confirmed in the Nanjing population (P additive o0.05) and with an effect in the same direction. However, none of the SNPs was confirmed in the MEC population (Table 1).
On the basis of our meta-analysis of all independent populations in this study (GWAS stage and confirmation stages), we found that two SNPs were associated with prostate cancer risk at the genome-wide significance level; P additive ¼ 3.59 Â 10 À 8 for rs12791447 at 11p15.4 and P additive ¼ 6.05 Â 10 À 10 for rs58262369 at 14q23.2. The odds ratios for prostate cancer were 1.23 and 0.78 for the effect allele C of rs12791447 and rs58262369, respectively (Fig. 3). We also identified two loci with suggestive evidence for prostate cancer association, rs4749884 at 10p14 (P additive ¼ 6.68 Â 10 À 6 ) and rs75718479 at 12q23.1 (P additive ¼ 1.12 Â 10 À 6 ). No significant heterogeneity for these four SNPs was observed among different sets of study populations (P heterogeneity 40.05; Table 1).

ARTICLE
We also investigated the associations of these four SNPs with disease aggressiveness among prostate cancer patients. However, we did not find significant associations with aggressiveness of prostate cancer for these four SNPs (P additive 40.05; Supplementary Table 6).

Examination of identified susceptibility loci in Europeans.
To investigate the association of prostate cancer risk with these identified genetic loci in other ethnicities, we tested the associations in individuals of European descent using data from Cancer Genetic Markers of Susceptibility. No significant association with prostate cancer risk was found for these four SNPs and other SNPs that are in strong LD with these four SNPs (r 2 40.70; Supplementary Table 7).

Functional annotation and expression analysis of novel loci.
In the meta-analysis of GWAS, we identified two novel loci associated with prostate cancer, rs58262369 at 14q23.2 (ESR2; Fig. 4a) and rs12791447 at 11p15.4 (PPFIBP2; Fig. 4b). The mRNA expression levels of genes at these two loci were analysed in prostate tumour tissues and normal tissues (Supplementary Table 8). Examination of rs12791447 using data from Encyclopedia of DNA Elements (ENCODE) revealed significant modification of H3K4me3 (Supplementary Table 9). We also found that rs12791447 was an expression quantitative trait locus (eQTL) for PPFIBP2. However, rs58262369 was not an eQTL for ESR2 or other genes at 14q23.2 (Supplementary  Table 10).

Discussion
In this large-scale association analysis in Asians, we identified two novel prostate cancer susceptibility loci at 11p15.4 and 14q23.2 that reached genome-wide significance level, as well as two additional loci at 10p14 and 12q23.1 with suggestive evidence for association with prostate cancer risk.
It is noted that for two of these four SNPs, rs58262369 at 14q23.2 and rs75718479 at 12q23.1, the genotypes in the two original GWAS were imputed. However, the quality of imputation for these two SNPs was high (see Methods). Furthermore, we genotyped these two SNPs by the TaqMan assay in 1,500 subjects randomly selected from the Chinese GWAS. The concordant rate between imputed and genotyped data was 99.3% for rs58262369 and 99.5% for rs75718479.
The SNP rs58262369 at 14q23.2 is located in the 3 0 -untranslated region (3 0 -UTR) of the ESR2 gene (also known ESR beta), which encodes a member of the family of oestrogen receptors (Fig. 4a). Oestrogen plays a vital role in mammary gland development and is also involved in prostate cancer progression, exerting its biological actions through the oestrogen receptors ESR1 and ESR2 (refs 8,9). ESR2, originally discovered in a rat prostate cDNA library, has highest expression in normal prostate epithelial cells 10 . ESR2 knockout mice developed prostatic hyperplasia during the aging process 11 . It has been shown that ESR2 has antiproliferative and pro-differentiation roles in the prostatic epithelium, reducing prostate cancer risk 9,12 . We also found that ESR2 was upregulated in prostate tumour tissues compared with normal prostate tissues in The Cancer Genome Atlas (TCGA) data (P t-test ¼ 0.028; Supplementary  Table 8). However, the magnitude of ESR2 mRNA expression was not high, which should be confirmed via analysis of the expression levels of ESR2 protein in prostate tumour tissues. Several genetic variants in ESR2, including rs1256049, have been demonstrated to contribute to prostate cancer risk 13,14 . However, this SNP (rs1256049), which is in weak LD with rs58262369 (r 2 ¼ 0.05), showed no association with prostate cancer in the current study (P additive ¼ 0.149). HaploReg prediction revealed that rs58262369 disrupted the motifs of EWSR1-FLI1 and STAT, indicating that it would have a regulatory impact on ESR2 (Supplementary Table 9). The SNP rs58262369 also maps to a highly conserved sequence in the 3 0 -UTR of ESR2, which may disturb the regulation of ESR2 by the binding of microRNA-942 predicted by MirSNP 15 . Our eQTL analysis using TCGA data showed no significant association of rs58262369 with ESR2 gene expression (P additive ¼ 0.482) in prostate cancer tissues, nor with any other genes within 500 kb of rs58262369 (Supplementary  Table 10). It is possible that this variant in the 3 0 -UTR of ESR2 is related with efficiency of translation of the ESR2 protein. Further fine-mapping studies are warranted to elucidate the biological role of this association signal at 14q23.2.
The SNP rs12791447 at 11p15.4 is located in an intron of the PPFIBP2 gene (Fig. 4b). This gene encodes the PTPRF-interacting protein, binding protein 2 (liprin beta 2), which was differentially expressed in endometrial cancer 16 . We also found that the expression level of PPFIBP2 was significantly lower in tumour tissues than in normal tissues in TCGA data set (P t-test ¼ 1.89 Â 10 À 10 ; Supplementary Table 8). Several other genes at this locus, including CYB5R2, EIF3F, NLRP10, OLFML1 and OVCH2, were also found to be significantly expressed in prostate tumours in comparison with normal tissues. In TCGA prostate cancer data, rs12791447 acted as an eQTL for PPFIBP2 and NLRP14, which was consistent with the expression levels of blood-related eQTL results for PPFIBP2 (Supplementary Table 10). The GM12878 data from ENCODE suggest that rs12791447 maps to a strong enhancer region marked by the histone modification of H3K4me3 (Supplementary Table 9). On the basis of above data, we propose that PPFIBP2 is the candidate susceptibility gene involved in the association observed at this locus. However, other causal or functional SNPs operating the regulatory mechanisms at this locus cannot be ruled out. It is noted that rs12791447 is B530 kb away from two SNPs (rs7126629 and rs7127900) at 11p15.4, which were previously reported to be associated with prostate cancer susceptibility in European and Chinese populations, respectively 17,18 . The association of rs12791447 with prostate cancer was not changed after adjusting for either rs7126629 or rs7127900, indicating that it is independent from these two reported SNPs at 11p15.4.
The SNP rs4749884 at 10p14 is in an intergenic region. ENCODE data indicate that alleles at rs4749884 can disrupt the binding of the transcription factor MAFK, which suppressed the expression of the hemeoxygenase-1 (HO-1) gene induced by transforming growth factor-b (ref. 19). The SNP rs4749884 is 118 kb away from rs7918885, a SNP that was reported to be associated with prostate cancer in a previous GWAS in West African men 20 . The SNP rs4749884 is not in LD with rs7918885 (r 2 ¼ 0.002). Meanwhile, rs7918885 and other SNPs in LD with it were not associated with prostate cancer risk in our metaanalysis, suggesting that multiple causal variants may exist at 10p14 in different ethnicities.
The SNP rs75718479 lies at 12q23.1, a region where DNA gain was commonly found in localized prostate tumours of Japanese patients 21 . This region contains a gene of rhabdomyosarcoma 2-associated transcript (RMST). RMST is a newly identified long  noncoding RNA, which acts as a transcriptional co-regulator of SOX2 and is implicated in neurogenesis 22 . Recent studies in mouse models have revealed that Rmst was differentially expressed in midbrain dopaminergic neurons in the developing mouse brain 23 . However, the role of RMST in prostate cancer is currently unknown. Further studies are warranted to the correlation between rs75718479 and RMST in the development of prostate cancer. It should be noted that four SNPs identified in the metaanalysis GWAS were not replicated in the Japanese descent of the MEC study. The inconsistent results might be because of the differences of lifestyle factors and environment between Japanese individuals in Japan and in the United States, as well as different genetic and environmental risk factors between Japanese and Chinese. Lack of confirmation in the Japanese descent from the MEC population was also reported for SNPs identified in GWAS of Japanese individuals in Japan 4 . Moreover, we did not observe any significant relationship between these four SNPs and prostate cancer risk in the European men. The frequencies of these four SNPs varied among different populations (Supplementary Table 11). For example, rs58262369 at 14q23.2 was not polymorphic in the European population. Further evaluation in other ethnic groups may be needed.
In summary, this study identified two novel susceptibility loci at 11p15.4 and 14q23.2, and two novel loci with suggestive evidence at 10p14 and 12q23.1, which is the largest GWAS to our knowledge on prostate cancer among Asian individuals. Further studies of fine-mapping and functional analyses are warranted to identify and characterize the causative variants for prostate tumorigenesis.

Methods
Study populations. The meta-analysis of GWAS in this study was based on two GWAS on prostate cancer in Japan and China 5,6 . The Japanese prostate cancer GWAS included 1,583 cases and 3,386 controls from the BioBank Japan, which was established in the Institute of Medical Science at the University of Tokyo 24 . From the BioBank Japan, prostate cancer cases were selected according to the pathological evaluation of biopsy. Among the selected subjects, 229 individuals had a positive family history, and 952 cases had high Gleason grade score. Controls in the Japanese GWAS were also selected from BioBank Japan for other diseases as well as healthy subjects from the Osaka-Midosuji Rotary Club (Osaka, Japan). The Chinese prostate cancer GWAS comprised 1,417 prostate cancer cases and 1,008 controls who are part of the ChinaPCa (ref. 25). All recruited subjects self-reported being Han Chinese. Cases were collected from local hospitals and were newly diagnosed with pathologically confirmed prostate cancer. Control subjects were healthy individuals recruited from community-based studies or while receiving a physical examination in the hospital. Control subjects were excluded if they had an abnormal prostate-specific antigen level higher than 4.0 ng ml À 1 or positive digital rectal examination.
Three additional study populations (Replications 1, 2a and 2b) were used to validate the suggestive associations from the meta-analysis of GWAS. Individuals in Replications 1 and 2a were Chinese and consisted of 1,664 cases and 1,523 controls from Shanghai and 908 cases and 1,354 controls from Nanjing. Most subjects have been used in previously published association studies [25][26][27] . Cases were histopathologically confirmed as having prostate cancer, and the controls were all cancer-free males from Shanghai, Nanjing and surrounding areas. Replication 2b was men of Japanese descent from the MEC, a large populationbased prospective study including more than 215,000 adults in Hawaii and California 28 . The GWAS of prostate cancer in the MEC study included 1,033 cases and 1,042 controls. The Cancer Genetic Markers of Susceptibility prostate cancer GWAS was a nested case-control study, including 1,172 cases and 1,157 controls, which was part of the Prostate, Lung, Colon and Ovarian Cancer Screening Trial 29 . All participants provided informed consent. This study was approved by the institutional review boards of the Fudan University, Nanjing Medical University and the University of Tokyo and RIKEN Yokohama Institute.
Genotyping and quality control. The Japanese GWAS was conducted using the Illumina Human610-Quad BeadChip for cases and HumanHap550v3 BeadChip for controls, and the Chinese GWAS was genotyped using the Illumina Human OmniExpress BeadArray. In addition to quality-control procedures performed in previous studies, SNPs with genotyping rate ofo95%, MAFo0.05 or Po0.001 in a Hardy-Weinberg Equilibrium test were further removed before imputation analysis. After exclusion of imputed SNPs with MAFo0.05 or information score (INFO)o0.5, 6,268,427 and 4,575,935 SNPs remained for Japanese GWAS and Chinese GWAS, respectively. A total of 4,550,396 SNPs were available in both studies and used for further meta-analysis. We selected all SNPs with Po1.0 Â 10 À 4 for the Replication 1 study. Genotyping for Replication 1 was conducted using the Sequenom MassARRAY platform, and genotyping for Replication 2a was conducted using the TaqMan assays. The primer and probe sequences are shown in Supplementary Table 12. On each 96-well plate, two negative controls (water) and two blinded duplicates were included as qualitycontrol samples. Genotyping was performed independently by two researchers in a blinded manner. For in silico replication of the MEC study, genotyping of the subjects was conducted using the Illumina Human660W bead array 30 .
Functional annotation. Genomic annotations on the index SNPs and surrogates were identified using the ENCODE data. The threshold of LD was r 2 40.80 on the basis of Asians from the 1000 Genomes Project. HaploReg 2.0 (http:// www.broadinstitute.org/mammals/haploreg/haploreg.php/) was used to query the histone marks, DNase I hypersensitivity, the binding of transcription factor and motif instances 31 . RegulomeDB (http://regulome.stanford.edu/) was then used to explore potential functional annotations for the genomic regions surrounding the index SNPs 32 . The high score of RegulomeDB indicated a relatively high degree of evidence for potential regulatory function. Furthermore, SNPs in 3 0 -UTR of gene were predicted using MirSNP (http://cmbi.bjmu.edu.cn/mirsnp) to test the changes in microRNA-binding sites 15 .
Expression analysis. We downloaded the mRNA expression profiles from TCGA project (http://cancergenome.nih.gov/) by RNA-Seq (level 3). A total of 497 prostate tumour tissues and 115 normal prostate tissues were available on 10 January 2015. The reads per kilobase per million was used to quantify of gene expression levels. The expression levels of mRNA were normalized by using log2 transformation, and genes with zero value or missing data in any sample were removed. Differential expression of genes for each gene was measured between tumour and normal tissues.
For each individual in the TCGA data, we also obtained level 2 SNP data from prostate tumour tissues, normal tissues and blood, which were genotyped using Affymetrix SNP Array 6.0 chips. SNPs within 500 kb of index SNP were extracted from genotyped or imputed genotypes. The eQTL analysis was measured only in tumour tissues with mRNA expression. We also queried the correlation between variants and candidate gene expression from the Blood eQTL Brower 33 .
Statistical analysis. SNPs that passed quality control were imputed on the basis of data from the 1000 Genomes Project (version 3, March 2012 release) using IMPUTE2 (ref. 34). Only imputed genotypes of high quality (INFO40.50) were included in further association analysis. The associations in Japanese GWAS and Chinese GWAS were first analysed under an additive model using PLINK1.07. The odds ratio, 95% CI, the corresponding s.e. and P value for each SNP were obtained from the logistic regression model. The meta-analysis of two GWAS data was performed using a fixed-effect model and weighted by the estimated SE using GWAMA (ref. 35). The heterogeneity of studies was tested by the Cochran's Q statistic and I 2 . If the test of heterogeneity existed (Po0.01), the random-effect model was provided. The LD structure (r 2 ) was calculated using Haploview 4.2 (ref. 36). Expressed differences in each gene between tumours and normal samples were measured by the Student's t-test. The relationships of SNP genotypes with gene expression levels were evaluated using a linear regression model.