Prostate cancer risk prediction using a polygenic risk score

Hereditary factors have a strong influence on prostate cancer (PC) risk and poorer outcomes; stratification by genetic factors therefore addresses a critical need for targeted PC screening and risk-adapted follow-up. This Finnish population-based retrospective study involved 2283 clinically diagnosed patients, 455 screen-detected patients from the Finnish Randomised Study of Screening for Prostate Cancer (FinRSPC) and 2400 healthy individuals. Individual genetic risk was assessed by establishing a polygenic risk score (PRS) based on 55 PC risk SNPs identified through the Finnish subset of the Collaborative Oncological Gene-Environment Study. Men with PC had a significantly higher median polygenic risk score than the controls (6.59 vs. 3.83, P < 0.0001). A polygenic risk score above the control median was a significant predictor of PC (OR 2.13, 95% CI 1.90–2.39). The polygenic risk score predicted the risk of PC with an AUC of 0.618 (95% CI 0.60–0.63). Men in the highest polygenic risk score quartile were 2.8-fold (95% CI 2.4–3.30) more likely to develop PC compared with men in the lowest quartile. In the FinRSPC cohort, a significantly higher percentage of men had a PSA level of ≥ 4 ng/mL in polygenic risk score quartile four compared to quartile one (18.7% vs 8.3%, P < 0.00001). Adding the PRS to a PSA-only model contributed additional information in predicting PC in the FinRSPC model. The results strongly suggest that use of the polygenic risk score would facilitate the identification of men at increased risk for PC.

When divided into PRS quartiles, PC cases were distributed 18%, 25%, 27% and 30% from the lowest to the highest quartile. For the controls, the proportions showed the opposite pattern (33%, 26%, 22% and 19%, respectively). Thus, nearly a third of the PC cases belonged to the highest PRS quartile, while one-third of the controls belonged to the lowest PRS quartile. Men in the highest PRS quartile had a 2.8-fold (95% CI 2.40-3.30) higher risk of PC compared with men in the lowest quartile.
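As a rough arithmetic check, a crude odds ratio for the highest versus the lowest quartile can be computed directly from the rounded quartile shares given above. This is only a sketch: the function name is hypothetical, and because it works from rounded percentages rather than the underlying data, it can only approximate the reported estimate.

```python
# Rounded quartile shares reported in the text (lowest to highest quartile).
case_share = [0.18, 0.25, 0.27, 0.30]
control_share = [0.33, 0.26, 0.22, 0.19]


def crude_odds_ratio(case_hi, case_lo, control_hi, control_lo):
    """Crude odds ratio of being a case in the 'hi' group versus the 'lo' group.

    Works with shares or counts, since the group totals cancel out.
    """
    return (case_hi / case_lo) / (control_hi / control_lo)


# Highest quartile (index 3) versus lowest quartile (index 0).
or_q4_vs_q1 = crude_odds_ratio(
    case_share[3], case_share[0], control_share[3], control_share[0]
)
print(f"Crude OR, Q4 vs Q1: {or_q4_vs_q1:.2f}")
```

The crude value comes out at about 2.9, which lies inside the reported confidence interval (2.40-3.30) and close to the reported 2.8; a small difference is to be expected when starting from rounded shares.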
The overall receiver operating characteristic AUC of the PRS to predict PC was 0.618 (95% CI 0.60-0.63). [...] (Table 2) with an AUC of 0.549 (95% CI 0.51-0.59). Although 70.0% of the men with high PSA at diagnosis (PSA > 20 ng/mL) had a PRS above the median, no association between the PRS and high PSA at diagnosis could be identified. Similarly, there was no significant association between PRS and high Gleason score, advanced stage, tumour and nodal stage or lethal PC, possibly due to the nature of the SNPs included and the low number of cases (Table 2).
Association with PSA in the subset of FinRSPC cohort: the FinRSPC model.
When the FinRSPC cohort was divided into negative and positive PSA (PSA < 4 ng/mL vs PSA ≥ 4 ng/mL), the number of men with elevated PSA increased in each PRS quartile (Table 3, A). The association between PSA and PRS is illustrated by the fact that 8.3% of men in the lowest PRS quartile had PSA ≥ 4 ng/mL compared to 18.7% in the highest quartile (χ² = 32.95; P < 0.00001).

Discussion
We constructed a population-specific PRS for PC and evaluated its application in genetic risk stratification. The PRS was higher among men with PC than among the controls, as indicated by the median and the proportion above the control median, with an odds ratio of 2.13. The AUC was 0.62, with a sensitivity of 0.68. The PC risk also increased with PRS when it was divided into quartiles. The PRS was associated with metastatic disease; however, it was not associated with other indicators of poor prognosis such as high Gleason score or advanced disease. Furthermore, within the screening trial, the PRS was associated with the proportion of men with positive PSA 10 and contributed to detecting PC.
Our finding (AUC 0.62) was comparable with previous studies, despite our use of a relatively small number of SNPs (n = 55). Previous studies using risk allele based polygenic scores have shown AUC values of 0.54-0.68 for PC [11][12][13]. However, they have provided only limited evidence that a PRS using common variants improves risk prediction 8,12,13.
Genome-wide association studies (GWASs) for metastatic PC are lacking 14, and few studies have investigated the association between known germline PC risk variants and metastatic disease at diagnosis or development of metastasis after initial treatment 14,15. Owing to the retrospective nature of this study, all of the metastatic cases in this cohort had been identified at diagnosis. The identified association of the PRS with metastatic disease at diagnosis is likely due to the inclusion of PC risk SNPs that have earlier been found to be associated with metastatic PC risk [14][15][16]. The lack of association with other clinical variables is in line with earlier findings. Since the performance of our PRS was poorer for metastatic disease than for PC overall, it offers only limited use for individual prediction of the risk of metastasis.
PSA has long been used as the primary biomarker for PC diagnosis; however, PSA screening frequently leads to false-positive results and overdiagnosis 17,18. Therefore, population-based screening is not recommended 19,20. In this study, we show that in the FinRSPC screening cohort the PRS quartiles are associated with elevated PSA of ≥ 4 ng/mL at diagnosis and that the PRS contributed additional information beyond PSA and age in predicting PC among the screening trial men. The performance of the PRS in screening needs to be assessed in an additional prospective cohort to test its applicability as a supplement to PSA-based population screening for PC. The main strength of the study is that it is population-based, which reduces selection bias and improves generalizability. Previous studies have included mainly risk variants (OR > 1) for the construction of the PRS 13. We used both risk (per allele OR > 1) and protective (per allele OR < 1) SNPs to capture genetic variation in risk more widely. Furthermore, we designed a population-specific risk score, as PC risk variants and their frequencies differ between populations 21.
Naturally, the study has some limitations. The study population included only a few PC deaths and aggressive cases. Since the study is based on retrospective data, validation in a prospective setting would strengthen the findings. In particular, application of the PRS in population screening needs to be evaluated in a prospective trial to test its clinical implications and potential benefit. Since this is a Finnish population-based PRS study, validation in other, less homogeneous populations is needed.
Our findings show that a subgroup of men at an increased risk of PC (OR > 2) can be identified based on a PRS. However, the predictive accuracy was limited (AUC 0.62). The fact that the PRS contributed additional information beyond PSA and age suggests that it may be useful in screening.

Materials and methods
All methods were carried out in accordance with relevant guidelines and regulations.
The flow diagram shown in Fig. 1 presents the steps of participant enrolment to the study (A) and selection of SNPs for PRS calculation (B).

Study participants.
All genotyped PC patients and controls without PC were of Finnish origin. The study protocol was reviewed and approved by the Research Ethics Committee at Pirkanmaa Hospital District (tracking numbers R10167, 90,577, R03203). Permission for the use of samples was given by the National Supervisory Authority for Welfare and Health (VALVIRA). Informed consent was obtained from the participants involved in the study. Altogether, 2738 non-familial PC cases were included in the study. Of them, 2283 were clinical cases from the Pirkanmaa Hospital District, and 455 were from the Finnish Randomized Study of Screening for Prostate Cancer (FinRSPC) 22, which is the largest component of the European Randomized Study of Screening for Prostate Cancer (ERSPC) 23. Cancer-free control subjects (n = 2400) were identified through the FinRSPC trial 22. The FinRSPC trial population and the protocol population have been described in detail elsewhere 24. Briefly, 80,458 men aged 55-67 years were enrolled during 1996-1999, with 32,000 randomised to the screening arm and invited to PSA-based screenings at four-year intervals. Clinical characteristics of the genotyped PC patients, separately for clinically detected and for screening trial cases, are summarized in Table 4. PSA at diagnosis was classified as ≤ 20 versus > 20 ng/mL. Gleason score was divided into ≤ 6, 7 and ≥ 8. Stage was divided into organ-confined (T1-2, N0/x, M0/x) versus advanced disease (T3-4, or N1 or M1). PC death was defined based on the underlying cause recorded as the official cause of death by Statistics Finland.
Genotyping and quality control. The original genotyping was carried out by the PRACTICAL (Prostate Cancer Association group to Investigate Cancer Associated Alterations in the Genome) consortium. Genotypes were obtained using a custom Illumina Infinium array (iCOGS), as described previously 8.
Single nucleotide polymorphism selection and statistical analyses. Hardy-Weinberg equilibrium was verified by checking that the observed proportion of each genotype agreed with the expectation calculated from the allele frequencies. Statistical analyses were performed with IBM SPSS version 25 (SPSS Inc., Chicago, USA) unless otherwise specified. For each SNP, allelic ORs for PC with 95% confidence intervals were computed using logistic regression. A total of 55 variants shown to be associated with PC in the Finnish subset from iCOGS (Supplementary Table 2) were chosen for the calculation of the PRS based on the selection criteria described in Fig. 1. In short, selected SNPs were associated with PC at a genome-wide significance level (p < 5 × 10⁻⁸) and had an effect size of OR > 1.1 for risk SNPs and OR < 0.9 for protective SNPs.
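The Hardy-Weinberg check described above can be illustrated with a short Python sketch. The text does not name the test that was used, so the chi-square goodness-of-fit test with one degree of freedom shown here is an assumption, as are the function name and the genotype counts in the example.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes of a biallelic SNP and
    returns (chi-square statistic, p-value). Assumes both alleles are present,
    so that every expected count is greater than zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
chi2_ok, p_ok = hwe_chi_square(100, 200, 100)    # matches HWE expectations exactly
chi2_bad, p_bad = hwe_chi_square(150, 100, 150)  # strong heterozygote deficit
print(chi2_ok, p_ok)
print(chi2_bad, p_bad)
```

A large statistic (small p-value) indicates that the observed genotype proportions depart from those expected from the allele frequencies, which in practice often flags a genotyping problem.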
We assessed the PRS of men with and without PC, and also separately for clinically diagnosed and screening trial cases. Sensitivity and specificity of the PRS were calculated. The use of the control median as the cut-off point showed a near-optimal sensitivity and specificity. Therefore, the study participants were divided into those with a polygenic risk below and above the control median, which represents men free of PC. The odds ratio for PC associated with a PRS above the median was evaluated by logistic regression with PC as the outcome. We evaluated the predictive performance of the PRS by calculating the area under the curve (AUC) of the receiver operating characteristic (ROC). Evaluation of the discriminative potential of the PRS for subsets of cases with high PSA at diagnosis, high Gleason score, advanced stage, local and distant progression, and PC death was performed with the same methodology.
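The evaluation described above can be illustrated with a minimal Python sketch. It is not the SPSS workflow used in the study: the function names and toy scores are hypothetical, and the AUC is computed here through its rank interpretation (the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half).

```python
from statistics import median


def sens_spec_at_control_median(case_scores, control_scores):
    """Sensitivity and specificity when the control median is the cut-off."""
    cutoff = median(control_scores)
    sensitivity = sum(s > cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s <= cutoff for s in control_scores) / len(control_scores)
    return cutoff, sensitivity, specificity


def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise case-control comparisons."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up scores for illustration only; these are not study data.
toy_cases = [2.0, 4.0, 6.0, 8.0]
toy_controls = [1.0, 3.0, 5.0, 7.0]

cutoff, sens, spec = sens_spec_at_control_median(toy_cases, toy_controls)
toy_auc = auc(toy_cases, toy_controls)
print(cutoff, sens, spec, toy_auc)
```

The pairwise loop is quadratic in the number of participants, which is still manageable for a study of this size; a rank-based implementation would be preferable for much larger cohorts.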
To evaluate the possible implications of the PRS in the screening trial, we assessed the contribution of PRS quartiles, in addition to PSA and age, to predicting PC in the FinRSPC cohort. Logistic regression models including PSA, age and the PRS were applied to assess PC prediction, and the AUC was calculated. All reported p values are two-sided.
Polygenic risk score calculation. A PRS for each individual was calculated by summing the number of risk alleles 25 at each of the 55 SNPs multiplied by the logarithm of the SNP's OR as follows:

\[ \mathrm{PRS}_j = \sum_{i=1}^{n} \beta_i x_{ij} \]

where β_i is the per-allele log-odds ratio for locus i, x_ij represents the number of risk alleles (i.e., 0, 1 or 2) carried by an individual j at locus i, and n is the number of loci. The risk conferred by each of the variants is assumed to be allele dose-dependent with a multiplicative (log-additive) effect on a relative risk scale 6. Under the multiplicative model, the distribution of polygenic risk in the population follows the normal distribution, when relative risk is plotted on a logarithmic scale, with mean μ and variance σ². We set the mean, μ = −σ²/2, so that the mean relative risk in the population is equal to unity (the mean of a log-normal variable is exp(μ + σ²/2)). Log-transformation of non-normally distributed PRS data was applied.
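The weighted sum described above can be sketched in a few lines of Python. The function name, the three example odds ratios and the allele counts are hypothetical and serve only to show the arithmetic; the natural logarithm is assumed, since the text does not state the base.

```python
import math


def polygenic_risk_score(allele_counts, odds_ratios):
    """Sum over loci of (per-allele log odds ratio) x (allele count of 0, 1 or 2).

    Protective SNPs (OR < 1) have a negative log odds ratio and therefore
    lower the score.
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("one allele count is needed per SNP")
    if any(count not in (0, 1, 2) for count in allele_counts):
        raise ValueError("allele counts must be 0, 1 or 2")
    return sum(
        math.log(odds_ratio) * count
        for odds_ratio, count in zip(odds_ratios, allele_counts)
    )


# Hypothetical per-allele ORs for three SNPs (two risk, one protective).
example_ors = [1.20, 0.85, 1.50]
# Hypothetical allele counts for one individual.
example_counts = [2, 1, 0]

example_prs = polygenic_risk_score(example_counts, example_ors)
print(f"PRS = {example_prs:.4f}")
```

In this toy case the score equals 2·ln(1.20) + 1·ln(0.85), so the two copies of the first risk allele are partly offset by the single protective allele.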