Introduction

Prostate cancer is the second most common cancer diagnosed in men worldwide, causing substantial morbidity and mortality1. Prostate cancer screening may reduce morbidity and mortality2,3,4,5, but to avoid overdiagnosis and overtreatment of indolent disease6,7,8,9, it should be targeted and personalized. Age at prostate cancer diagnosis is important for clinical decisions about whether and when an individual should begin screening10,11. Survival is another key cancer endpoint recommended for risk models12.

Genetic risk stratification is promising for identifying individuals with a greater predisposition for developing cancer13,14,15,16, including prostate cancer17. Polygenic models use common variants, identified in genome-wide association studies, whose combined effects can be used to assess the overall risk of disease development18,19. Recently, a polygenic hazard score (PHS) was developed as a weighted sum of 54 single-nucleotide polymorphisms (SNPs) that models a man’s genetic predisposition for developing prostate cancer13. Validation testing using ProtecT trial data2 demonstrated that the PHS is associated with age at prostate cancer diagnosis, including aggressive prostate cancer13. However, the development and validation datasets were limited to men of European ancestry. While genetic risk models might be important clinical tools for prognostication and risk stratification, using them may worsen health disparities, because most models are constructed using data from individuals of European ancestry and may underrepresent genetic variants important in persons of non-European ancestry20,21,22,23,24. This is particularly concerning in prostate cancer, as race/ethnicity is an important prostate cancer risk factor and disparities in diagnosis, treatment, and outcomes persist between races/ethnicities25,26.

Here, we assessed PHS performance in a multi-ethnic dataset that includes individuals of European, African, and Asian genetic ancestry. This dataset also includes long-term follow-up information, affording an opportunity to evaluate PHS for association with fatal prostate cancer.

Results

Adaptation of PHS for OncoArray

Of the 30 PHS1 SNPs not directly genotyped on OncoArray, proxy SNPs were identified for 22 (linkage disequilibrium ≥ 0.94). PHS2 therefore included 46 SNPs in total: 24 genotyped directly and 22 proxies (Supplementary Information). PHS2 association with age at aggressive prostate cancer diagnosis in ProtecT was similar to that previously reported for PHS1 (z = 21.7, p = 3.6 × 10−104 for PHS1; z = 21.4, p = 1.3 × 10−101 for PHS2). HR98/50 was 4.68 [95% CI: 3.62–6.15] for PHS2, compared to 4.61 [3.52–5.99] for PHS1.
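The z-scores and p-values quoted in these results are linked through the standard normal distribution. A minimal Python sketch, assuming the reported p-values are two-sided Wald p-values for the PHS coefficient (the text does not state this explicitly), approximately reproduces them from the z-scores; for example, z = 21.4 gives p ≈ 1.3 × 10−101. Small discrepancies for other values are expected because z is rounded to one decimal place. The function name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value P(|Z| >= |z|) for a standard normal statistic z.

    erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) but keeps full relative
    precision far in the tail, where 1 - Phi(z) would round to 0.0.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))

# z-scores quoted in the Results (PHS2 in ProtecT, fatal disease, family history)
for z in (21.4, 15.9, 32.4, 39.7):
    print(f"z = {z:5.1f} -> p = {two_sided_p(z):.1e}")
```

Using `erfc` rather than `1 - cdf` matters at these z-scores. Even so, for z = 39.7 the tail probability is smaller than the smallest positive double and underflows to 0.0, which is consistent with such results being reported only as p < 10−300.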

PHS association with any prostate cancer in OncoArray

PHS2 was associated with age at prostate cancer diagnosis in all three OncoArray-defined genetic ancestry groups (Table 1). Comparing the 80th and 20th percentiles of genetic risk, men with high PHS had an HR of 5.32 [4.99–5.70] for any prostate cancer. Within each genetic ancestry group, men with high PHS had HRs of 5.54 [5.18–5.93], 4.49 [3.23–6.33], and 2.54 [2.08–3.10] for men of European, Asian, and African ancestry, respectively.

Table 1 Association of PHS with prostate cancer.

PHS association with aggressive prostate cancer in OncoArray

PHS2 was associated with age at aggressive prostate cancer diagnosis in all three OncoArray-defined genetic ancestry groups (Table 2). Comparing the 80th and 20th percentiles of genetic risk, men with high PHS had an HR of 5.88 [5.46–6.33] for aggressive prostate cancer. Within each genetic ancestry group, men with high PHS had HRs of 5.62 [5.23–6.05], 5.16 [4.79–5.55], and 2.43 [2.26–2.61] for men of European, Asian, and African ancestry, respectively.

Table 2 Association of PHS with aggressive prostate cancer.

PHS association with fatal prostate cancer in OncoArray

PHS2 was associated with age at prostate cancer death for all men in the multi-ethnic dataset (z = 15.9, p = 6.3 × 10−57). Table 3 shows z-scores and corresponding HRs for fatal prostate cancer. Comparing the 80th and 20th percentiles of genetic risk, men with high PHS had an HR of 5.68 [5.07–6.46] for prostate cancer death.

Table 3 Association of PHS with death from prostate cancer.

Sensitivity analyses

Sensitivity analyses demonstrated that large changes in assumed population incidence had minimal effect on the calculated HRs for any, aggressive, or fatal prostate cancer (Supplementary Information).

PHS and family history

Family history was also associated with any prostate cancer (z = 39.7, p < 10−300; Table 4), aggressive prostate cancer (z = 32.4, p = 2.7 × 10−230), and fatal prostate cancer (z = 8.76, p = 1.4 × 10−18) in the multi-ethnic dataset. Among men with known family history, the combination of family history and PHS performed better than family history alone (log-likelihood test, p < 10−300). This pattern held when the analyses were repeated within each genetic ancestry group. Additional family history analyses are reported in the Supplementary Information.

Table 4 Multivariable models with both PHS and family history of prostate cancer (≥1 first-degree relative affected) for association with any prostate cancer in the multi-ethnic dataset, and by genetic ancestry.

PHS associations with aggressive prostate cancer using alternative ancestry groupings

Agnostic genetic ancestry groupings with fastSTRUCTURE

With fastSTRUCTURE, the optimal model had K = 2 clusters. Cluster 1 consisted mainly of men of European OncoArray-defined genetic ancestry and European self-reported race/ethnicity; cluster 2 consisted only of men of African OncoArray-defined genetic ancestry, mostly of Black/African American self-reported race/ethnicity; and the admixed cluster included men of all three OncoArray-defined genetic ancestries. Table 5 shows the HR80/20 for aggressive prostate cancer for these K = 2 fastSTRUCTURE-defined groups. Comparing the 80th and 20th percentiles of genetic risk, men with high PHS had HRs for aggressive prostate cancer of 5.60 [5.55–5.64], 2.06 [2.03–2.09], and 5.05 [4.89–5.21] for cluster 1, cluster 2, and the admixed cluster, respectively. Corresponding results for the K = 3–6 clustering approaches are shown in the Supplementary Information.

Table 5 Association of PHS with aggressive prostate cancer, by two clusters using fastSTRUCTURE.

Self-reported race/ethnicity

HRs for aggressive prostate cancer comparing the 80th and 20th percentiles of genetic risk when participants are stratified by their self-reported race/ethnicity are shown in the Supplementary Information.

Discussion

These results confirm the previously reported association of PHS with age at prostate cancer diagnosis in Europeans and show that this finding generalizes to a multi-ethnic dataset, including men of European, Asian, and African ancestry. PHS is also associated with age at aggressive prostate cancer diagnosis and at prostate cancer death. Comparing the highest and lowest quintiles of genetic risk, men with high PHS had HRs of 5.32, 5.88, and 5.68 for any prostate cancer, aggressive prostate cancer, and prostate cancer death, respectively.

We found that PHS is associated with prostate cancer in men of European, Asian, and African genetic ancestry (and a wider range of self-reported race/ethnicities). Current prostate cancer screening guidelines suggest possible initiation at earlier ages for men of African ancestry, given higher incidence rates and worse survival when compared to men of European ancestry26. Using the PHS to risk-stratify men might help with decisions regarding when to initiate prostate cancer screening: perhaps a man with African genetic ancestry in the lowest percentiles of genetic risk by PHS could safely delay or forgo screening to decrease the possible harms associated with overdetection and overtreatment9, while a man in the highest risk percentiles might consider screening at an earlier age. Similar reasoning applies to men of all genetic ancestries. Risk-stratified screening should be prospectively evaluated.

PHS performance was better in men with OncoArray-defined European and Asian genetic ancestry than in those with African ancestry. For example, comparing the highest and lowest quintiles of genetic risk, men with OncoArray-defined European and Asian genetic ancestry with high PHS had HRs for any prostate cancer of 5.54 and 4.49, respectively, while the analogous HR for men of African genetic ancestry was 2.54. This trend was also observed for aggressive prostate cancer. Moreover, the optimal fastSTRUCTURE clustering of our dataset (K = 2) yielded one cluster consisting almost exclusively of men of African ancestry (by both self-report and OncoArray-defined genetic ancestry), in which risk stratification with PHS2 was weaker (HR 2.06) than in the other cluster (nearly all European) and the admixed cluster (HRs 5.60 and 5.05, respectively). Overall, these results suggest that PHS can differentiate men at higher and lower risk in each ancestral group, but that the range of risk levels may be narrower in men of African ancestry. Possible reasons for the relatively diminished performance include greater genetic diversity, with less linkage disequilibrium, in those of African genetic ancestry27,28,29. Known health disparities may also contribute25, as the availability and timing of PSA results may depend on healthcare access. Alarmingly, African populations have historically been poorly represented in clinical and genomic research studies20,21. This pattern is reflected in the present study, where most men of African genetic ancestry were missing the clinical diagnostic information used to determine disease aggressiveness. That such clinical information is less available for men of African ancestry also leaves open the possibility of systematic differences in the diagnostic workup, and therefore in the age at diagnosis, across ancestry populations.
These are critical health disparities that will need to be addressed (and ultimately eliminated) to ensure equitable and accurate genomic prostate cancer stratification for all men. Notwithstanding these caveats, the present PHS is associated with age at prostate cancer diagnosis in men of African ancestry, possibly paving the way for more personalized screening decisions for men of African descent. Promising efforts are also underway to further improve PHS performance in men of African ancestry30.

The first PHS validation study used data from ProtecT, a large prostate cancer trial2,13. ProtecT’s screening design yielded biopsy results from both controls and cases with PSA ≥ 3 ng/mL, making it possible to demonstrate improved accuracy and efficiency of prostate cancer screening with PSA testing. Limitations of the ProtecT analysis, though, include few recorded prostate cancer deaths in the available data, and the exclusion of advanced cancer from that trial2. The present study includes long-term observation, with both early and advanced disease18, allowing for evaluation of PHS association with any, aggressive, and fatal prostate cancer; we found PHS to be associated with all outcomes.

Age is critical in clinical decisions about whether men should be offered prostate cancer screening31,32,33,34 and how to treat men diagnosed with prostate cancer31,32. Age may also inform prognosis32,35. Age at diagnosis or death is therefore of clinical interest in inferring how likely a man is to develop cancer at an age when he may benefit from treatment. One important advantage of the survival analysis used here is that it permits men without cancer at last follow-up to be censored while allowing for the possibility that they develop prostate cancer (including aggressive or fatal prostate cancer) later on. Prostate cancer death is a hard endpoint with less uncertainty than clinical diagnosis (which may vary with screening practices and delayed medical attention). PHS may help identify men with a high (or low) genetic predisposition to develop lethal prostate cancer and could assist physicians in deciding when to initiate screening.

Current guidelines suggest considering a man’s individual cancer risk factors, overall life expectancy, and medical comorbidities when deciding whether to screen6. The most prominent clinical risk factors used in practice are family history and race/ethnicity6,36,37. Combined PHS and family history performed better than either alone in this multi-ethnic dataset. This finding is consistent with a prior report that PHS adds considerable information over family history alone. That prior study did not find an association of family history with age at prostate cancer diagnosis, perhaps because the universal screening approach of the ProtecT trial diluted the influence that family history has on who is screened in typical practice13. In the present study, family history and PHS appear complementary in assessing prostate cancer genetic risk. Moreover, the HRs for PHS suggest clinical relevance similar to or greater than that of predictive tools routinely used for cancer screening (e.g., breast cancer) and for other diseases (e.g., diabetes and cardiovascular disease): HRs reported for those tools are around 1–3 for disease development or other adverse outcomes38,39,40,41,42, while the HRs reported here for PHS (for any, aggressive, or fatal prostate cancer) are similar or greater.

This work has several limitations. First, the dataset comes from multiple heterogeneous studies of various populations with variable screening rates. This allowed for a large, multi-ethnic dataset that includes clinical and survival data, but it comes with uncertainties avoided in the ProtecT dataset used for the original validation. Such heterogeneity would, however, be expected to reduce PHS performance rather than systematically inflate the results. Second, no germline SNP tool, including this PHS, has been shown to discriminate men at risk of aggressive prostate cancer from those at risk of only indolent prostate cancer. Third, while the OncoArray-defined and fastSTRUCTURE genetic ancestry classifications used here may be more accurate than self-reported race/ethnicity alone43 and allowed for evaluation of admixed genetic ancestry, local ancestry was not analyzed in detail. As noted above, clinical data availability was not uniform across contributing studies and was lower in men of OncoArray-defined African genetic ancestry. Efforts to improve genetic risk prediction should focus on consistent data collection and the elimination of data disparities so that models are widely applicable to all men. We also found that, while the optimal fastSTRUCTURE model for risk-stratifying men for aggressive prostate cancer had K = 2 clusters, models with more clusters produced comparable (or wider) ranges of hazard ratios. That these higher-K models risk-stratify men well (while possibly being less representative of the available data) emphasizes the pressing need for more complex and in-depth studies of the intersection of genetics, the granularity of ancestry, and prostate cancer risk. In addition, the PHS may not include all SNPs associated with prostate cancer; in fact, over 60 additional SNPs have been reported since the development of the original PHS18.
Some of these SNPs are ethnicity-specific, including within non-European populations44,45,46, and will be included in further model optimization to improve prostate cancer risk stratification. Future work could also evaluate PHS performance in relation to epidemiological risk factors for prostate cancer beyond those currently used in clinical practice (i.e., family history and race/ethnicity). Finally, various circumstances and disease-modifying treatments may have influenced post-diagnosis survival to an unknown degree. Despite this possible source of variability in survival among men with fatal prostate cancer, PHS was still associated with age at death, an objective and meaningful endpoint. Future development and optimization hold promise for improving upon the encouraging risk stratification achieved here in men of different genetic ancestries, particularly African ancestry.

In summary, PHS was associated with age at any and aggressive prostate cancer, and at death from prostate cancer in a multi-ethnic dataset. PHS performance was relatively diminished in men of African genetic ancestry, compared to performance in men of European or Asian genetic ancestry. PHS risk-stratifies men of various genetic ancestries for prostate cancer and should be prospectively studied as a means to individualize screening strategies seeking to reduce prostate cancer morbidity and mortality.

Methods

Participants

We obtained data from the OncoArray project47 that had undergone quality control steps18. This dataset includes 91,480 men with genotype and phenotype data from 64 studies (Supplementary Information). Individuals whose data were used in the prior development or validation of the original PHS model (PHS1) were excluded (n = 10,989)13, leaving 80,491 in the independent dataset used here. Table 6 describes available data. Individuals not meeting the endpoint for each analysis were censored at age of last follow-up.

Table 6 Participant characteristics, n = 80,491.

All contributing studies were approved by the relevant ethics committees; written informed consent was acquired from the study participants48. The present analyses used de-identified data from the PRACTICAL consortium.

Polygenic hazard score

The original PHS1 was validated for association with age at prostate cancer diagnosis in men of European ancestry using a survival analysis13. To ensure the score was not simply identifying men at risk of indolent disease, PHS1 was also validated for association with age at aggressive prostate cancer diagnosis13, with aggressive disease defined as intermediate-risk disease or higher6. PHS1 was calculated as the dot (inner) product of a patient’s genotype (Xi) for the n selected SNPs and the corresponding parameter estimates (βi) from a Cox proportional hazards regression:

$$\mathrm{PHS} = \sum_{i=1}^{n} X_i \beta_i$$
(1)

The 54 SNPs in PHS1 were selected using PRACTICAL consortium data (n = 31,747 men) genotyped with a custom array (iCOGS, Illumina, San Diego, CA)13.
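Equation (1) is a plain weighted sum. The sketch below assumes each Xi is coded as an allele count (0, 1, or 2 copies of the effect allele), which the text does not specify. The SNP identifiers and weights are hypothetical placeholders, not the published PHS parameters, and the function name is likewise hypothetical.

```python
def polygenic_hazard_score(genotypes, weights):
    """PHS = sum_i X_i * beta_i, as in Eq. (1).

    genotypes: dict SNP ID -> X_i (assumed allele count 0, 1 or 2)
    weights:   dict SNP ID -> beta_i (per-SNP Cox coefficient)
    """
    missing = set(weights) - set(genotypes)
    if missing:
        # Every SNP in the score needs a genotype (or a proxy) to compute PHS.
        raise ValueError(f"genotype missing for SNP(s): {sorted(missing)}")
    return sum(genotypes[snp] * beta for snp, beta in weights.items())

# Hypothetical three-SNP example; IDs and weights are placeholders only.
weights = {"snp_a": 0.20, "snp_b": -0.10, "snp_c": 0.35}
genotypes = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(polygenic_hazard_score(genotypes, weights))
```

Raising an error for a missing genotype mirrors why proxy SNPs had to be found when moving the score to a different array.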

Adapting the PHS to OncoArray

Genotyping for the present study was performed using a commercially available, cancer-specific array (OncoArray, Illumina, San Diego, CA)18. Twenty-four of the 54 SNPs in PHS1 were directly genotyped on OncoArray. We identified proxy SNPs for those not directly genotyped and re-calculated the SNP weights in the same dataset13 used for the original development of PHS1 (Supplementary Methods).

The performance of the adapted PHS (PHS2) was compared with that of PHS1 in the ProtecT dataset originally used to validate PHS1 (n = 6411). PHS2 was calculated for all patients in the ProtecT validation set and tested as the sole predictive variable in a Cox proportional hazards regression model (R v.3.5.1, “survival” package49) for age at aggressive prostate cancer diagnosis, the primary endpoint of that study. Performance was assessed with the metrics reported during PHS1 development13: the z-score and the hazard ratio (HR98/50) for aggressive prostate cancer between men in the highest 2% of genetic risk (≥98th percentile) and those with average risk (30th–70th percentile). The 95% confidence intervals (CIs) for the HRs were determined by bootstrapping 1000 random samples from the ProtecT dataset50,51 while maintaining the same numbers of cases and controls. PHS2 percentile thresholds are shown in the Supplementary Information.
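The case/control-preserving bootstrap described above can be sketched as resampling with replacement separately within cases and within controls, then taking percentiles of the resulting estimates. This is a minimal percentile-bootstrap sketch, not the authors’ code. The statistic shown (a difference in mean PHS) and all data are hypothetical stand-ins for the HR computation, and all function names are hypothetical.

```python
import random
import statistics

def stratified_bootstrap_ci(cases, controls, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI that keeps the numbers of cases and controls fixed.

    cases, controls: lists of per-participant values or records
    statistic:       function(case_sample, control_sample) -> float
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample with replacement *within* each stratum.
        boot_cases = rng.choices(cases, k=len(cases))
        boot_controls = rng.choices(controls, k=len(controls))
        estimates.append(statistic(boot_cases, boot_controls))
    estimates.sort()
    lo_idx = round(n_boot * alpha / 2)
    hi_idx = min(n_boot - 1, round(n_boot * (1 - alpha / 2)) - 1)
    return estimates[lo_idx], estimates[hi_idx]

def mean_difference(case_scores, control_scores):
    """Hypothetical statistic: mean PHS in cases minus mean PHS in controls."""
    return statistics.fmean(case_scores) - statistics.fmean(control_scores)

# Hypothetical simulated PHS values, for illustration only.
data_rng = random.Random(42)
cases = [data_rng.gauss(0.6, 0.3) for _ in range(200)]
controls = [data_rng.gauss(0.4, 0.3) for _ in range(800)]
print(stratified_bootstrap_ci(cases, controls, mean_difference))
```

Resampling each stratum separately ensures that every bootstrap replicate has the original case:control ratio, so case enrichment does not vary from replicate to replicate.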

OncoArray-defined genetic ancestry

Self-reported race/ethnicity categories47,52 included European; Black or African American (including Black African and Black Caribbean); East Asian; South Asian; Hawaiian; Hispanic American; and Other/Unknown.

Genetic ancestry for each individual from the OncoArray project47 was provided with the PRACTICAL consortium data. Briefly, genotypes from 2318 ancestry informative markers were mapped into a two-dimensional space representing the first two principal components, which has been shown to yield results very similar to those obtained with the STRUCTURE approach52. The distance from the individual’s mapping to the three reference clusters (European, African, and Asian) was then used to estimate the individual’s genetic ancestry47,52. Individuals were classified into one of three OncoArray-defined labels: European (greater than 80% European ancestry), Asian (greater than 40% Asian ancestry), and African (greater than 20% African ancestry). Individuals meeting none of these three criteria were classified as “other,” but all individuals in the present prostate cancer dataset met the criteria for one of the three OncoArray-defined genetic ancestries.
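The labeling rule can be sketched as a small function. The thresholds come from the text. The function name, the fractional inputs (which would come from the principal-component distances described above), and the order in which overlapping criteria are checked are assumptions: the text does not say how a man exceeding both the Asian and the African thresholds would be labeled.

```python
def oncoarray_label(european, asian, african):
    """Assign an OncoArray-style ancestry label from estimated ancestry fractions (0-1).

    Thresholds follow the text. ASSUMPTION: rules are checked in the order
    European, Asian, African; the source does not specify precedence when a
    person exceeds more than one threshold.
    """
    if european > 0.80:
        return "European"
    if asian > 0.40:
        return "Asian"
    if african > 0.20:
        return "African"
    return "other"

# Hypothetical individuals
print(oncoarray_label(0.90, 0.05, 0.05))
print(oncoarray_label(0.70, 0.05, 0.25))
```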

Any prostate cancer

We tested PHS2 for association with age at diagnosis of any prostate cancer in the multi-ethnic dataset (n = 80,491, Table 6).

PHS2 was calculated for all patients in the multi-ethnic dataset and used as the sole independent variable in Cox proportional hazards regressions for the endpoint of age at prostate cancer diagnosis. Because our dataset contains a higher proportion of cases than the general population, which could bias Cox proportional hazards estimates, sample-weight corrections based on population data from Sweden13,53 were applied to all Cox models (additional details are in the Supplementary Information). Significance was set at α = 0.01, following the original PHS study13.

These Cox proportional hazards regressions (with PHS2 as the sole independent variable and age at prostate cancer diagnosis as the outcome) were then repeated separately for subsets of data stratified by OncoArray-defined genetic ancestry: European, Asian, and African. Percentiles of genetic risk were calculated using data from the 9,728 men in the original (iCOGS) development set who were younger than 70 years and without prostate cancer13,54. For each genetic ancestry group, HRs and 95% CIs were calculated for the following comparisons: HR98/50, men in the highest 2% of genetic risk vs. those with average risk (30th–70th percentile); HR80/50, men in the highest 20% vs. those with average risk; HR20/50, men in the lowest 20% vs. those with average risk; and HR80/20, men in the highest 20% vs. those in the lowest 20%. CIs were determined by bootstrapping 1000 random samples from each genetic ancestry group50,51 while maintaining the same numbers of cases and controls.
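One way to turn a fitted Cox coefficient into the percentile-group HRs listed above is sketched below. Percentile cut points come from a reference distribution (here standing in for the iCOGS men younger than 70 without prostate cancer), and each HR is exp(β × difference in mean PHS between the two groups). The mean-contrast convention is an assumption made for illustration. The paper defers the exact construction to its cited methods and supplement. All function names and the example data are hypothetical.

```python
import math
import statistics

def percentile_cuts(reference_phs):
    """k -> k-th percentile (k = 1..99) of a reference PHS distribution."""
    cuts = statistics.quantiles(reference_phs, n=100)  # 99 cut points
    return {k: cuts[k - 1] for k in range(1, 100)}

def group_mean(phs, cuts, lo_pct, hi_pct):
    """Mean PHS of men whose score lies in [lo_pct, hi_pct) reference percentiles."""
    lo = -math.inf if lo_pct == 0 else cuts[lo_pct]
    hi = math.inf if hi_pct == 100 else cuts[hi_pct]
    members = [x for x in phs if lo <= x < hi]
    if not members:
        raise ValueError(f"no participants between percentiles {lo_pct} and {hi_pct}")
    return statistics.fmean(members)

# Group contrasts named in the text: (group A percentiles, group B percentiles).
GROUPS = {
    "HR98/50": ((98, 100), (30, 70)),
    "HR80/50": ((80, 100), (30, 70)),
    "HR20/50": ((0, 20), (30, 70)),
    "HR80/20": ((80, 100), (0, 20)),
}

def percentile_hazard_ratios(beta, phs, reference_phs):
    """HR = exp(beta * (mean PHS in group A - mean PHS in group B)).

    ASSUMPTION: contrasting group means is an illustrative convention,
    not necessarily the exact construction used in the paper.
    """
    cuts = percentile_cuts(reference_phs)
    return {
        name: math.exp(beta * (group_mean(phs, cuts, *a) - group_mean(phs, cuts, *b)))
        for name, (a, b) in GROUPS.items()
    }

# Hypothetical usage: a uniform grid stands in for the reference distribution.
reference = [i / 100 for i in range(1000)]
print(percentile_hazard_ratios(0.5, reference, reference))
```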

Given that the overall incidence of prostate cancer in different populations varies, we performed a sensitivity analysis of the population case/control numbers, allowing the population incidence to vary from 25 to 400% of that reported in Sweden (chosen as an example population; Supplementary Information).
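The idea behind the sample-weight correction and this sensitivity analysis can be illustrated with a deliberately simplified scheme: cases keep weight 1, and every control is up-weighted so that the weighted case:control ratio matches an assumed population case fraction p. This is not age-specific and is not necessarily the authors’ scheme, which is described in the Supplementary Information. The function name, the 5% baseline, and the sample sizes are hypothetical. The sensitivity analysis then corresponds to scaling p from 25% to 400% of its baseline.

```python
def control_weight(n_cases, n_controls, population_case_fraction):
    """Weight applied to every control (cases keep weight 1) so that the weighted
    case:control ratio equals p / (1 - p) for an assumed population fraction p.

    ASSUMPTION: a single, non-age-specific fraction is used for illustration.
    """
    p = population_case_fraction
    if not 0.0 < p < 1.0:
        raise ValueError("population case fraction must lie strictly between 0 and 1")
    return (n_cases / n_controls) * (1.0 - p) / p

# Hypothetical numbers only: a case-enriched sample and a 5% baseline fraction.
n_cases, n_controls, baseline_p = 40_000, 40_000, 0.05
for scale in (0.25, 0.5, 1.0, 2.0, 4.0):  # 25% to 400% of baseline incidence
    w = control_weight(n_cases, n_controls, baseline_p * scale)
    print(f"{scale:5.0%} of baseline -> control weight {w:6.2f}")
```

Higher assumed incidence means fewer population controls per case, so the control weight falls as the scale factor rises.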

Aggressive prostate cancer

Recognizing that not all prostate cancer is clinically significant, we also tested PHS2 for association with age at aggressive prostate cancer diagnosis in the multi-ethnic dataset. For these analyses, we included cases that had known tumor stage, Gleason score, and PSA at diagnosis (n = 60,617 cases, Table 6). Aggressive prostate cancer cases were those that met any of the following criteria6,13: Gleason score ≥7, PSA ≥ 10 ng/mL, T3–T4 stage, nodal metastases, or distant metastases. As before, Cox proportional hazards models and sensitivity analysis were used to assess the association.
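The aggressive-disease definition above is a simple “any of” rule. The sketch below applies it and returns `None` for cases missing stage, Gleason score, or PSA, which were excluded from this analysis. Encoding T stage as an integer 1–4 and metastases as booleans is an assumption about data layout, and the function name is hypothetical.

```python
def is_aggressive(gleason, psa_ng_ml, t_stage, nodal_mets, distant_mets):
    """True/False for aggressive prostate cancer, or None if unclassifiable.

    ASSUMPTION: t_stage is an integer 1-4 (T1-T4); nodal_mets and distant_mets
    are booleans. Cases lacking Gleason score, PSA, or stage return None.
    """
    if gleason is None or psa_ng_ml is None or t_stage is None:
        return None
    return (
        gleason >= 7
        or psa_ng_ml >= 10
        or t_stage >= 3
        or bool(nodal_mets)
        or bool(distant_mets)
    )

# Hypothetical cases: Gleason 6, PSA 5 ng/mL, T2, no metastases -> not aggressive
print(is_aggressive(6, 5.0, 2, False, False))
print(is_aggressive(6, 12.0, 1, False, False))
```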

Fatal prostate cancer

Using an even stricter definition of clinical significance, we evaluated the association of PHS2 with age at prostate cancer death in the multi-ethnic dataset. All cases (regardless of staging completeness) and controls were included, and the endpoint was the age at death due to prostate cancer. This analysis was not stratified by genetic ancestry due to low numbers of recorded prostate cancer deaths in the non-European datasets. The cause of death was determined by the investigators of each contributing study using cancer registries and/or medical records (Supplementary Information). At last follow-up, 3983 men had died from prostate cancer, 5806 had died from non-prostate cancer causes, and 70,702 were still alive. The median age at the last follow-up was 70 years (IQR: 63–76). As before, Cox proportional hazards models and sensitivity analysis were used to assess the association.
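For the fatal-disease endpoint, each man contributes an (age, event) pair. Only death from prostate cancer counts as an event. Men still alive, or who died of other causes, are censored at their age at last follow-up, consistent with the censoring rule stated under Participants. The record layout, status codes, and function name below are hypothetical.

```python
from typing import NamedTuple

class Participant(NamedTuple):
    age_last_followup: float  # age at death, or at last contact if alive
    vital_status: str         # hypothetical coding: "alive", "dead_pca", "dead_other"

def fatal_pca_outcome(p):
    """(time, event) pair for the age-at-prostate-cancer-death analysis."""
    if p.vital_status not in ("alive", "dead_pca", "dead_other"):
        raise ValueError(f"unknown vital status: {p.vital_status!r}")
    # Event only for prostate cancer death; all others are censored.
    event = 1 if p.vital_status == "dead_pca" else 0
    return p.age_last_followup, event

cohort = [
    Participant(71.0, "dead_pca"),
    Participant(66.5, "dead_other"),
    Participant(74.2, "alive"),
]
print([fatal_pca_outcome(p) for p in cohort])  # [(71.0, 1), (66.5, 0), (74.2, 0)]
```

Treating other-cause deaths as censored yields a cause-specific hazard for prostate cancer death, which is what a standard Cox model fitted to these pairs estimates.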

PHS and family history

Prostate cancer family history was also tested for association with any, aggressive, or fatal prostate cancer. Information on family history was standardized across studies included in PRACTICAL consortium data. A family history of prostate cancer was defined as the presence or absence of a first-degree relative with a prostate cancer diagnosis. There were 46,030 men with available prostate cancer family history data.

Cox proportional hazards models were used to assess family history for association with any, aggressive, or fatal prostate cancer. To evaluate the relative importance of each, a multivariable model using both family history and PHS was compared to using family history alone (log-likelihood test; α = 0.01). HRs were calculated for each variable.
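The comparison of the multivariable model against family history alone is a test of nested models that differ by one parameter (the PHS coefficient). A minimal sketch, assuming a standard likelihood-ratio test with 1 degree of freedom, is shown below. The log-likelihood values and function name are hypothetical. For 1 df, the chi-square tail probability equals erfc(√(LR/2)).

```python
import math

def lr_test_one_df(loglik_reduced, loglik_full):
    """Likelihood-ratio test for nested models differing by one parameter.

    LR = 2 * (logL_full - logL_reduced) ~ chi-square(1) under the null;
    for 1 df, P(chi2 > LR) = erfc(sqrt(LR / 2)).
    """
    lr = 2.0 * (loglik_full - loglik_reduced)
    if lr < 0:
        raise ValueError("full model should not fit worse than the nested model")
    return lr, math.erfc(math.sqrt(lr / 2.0))

# Hypothetical partial log-likelihoods: family history alone vs. family history + PHS
print(lr_test_one_df(-1000.0, -998.08))
```

With an LR statistic of 3.84, the result sits at the familiar p ≈ 0.05 boundary. At the α = 0.01 used here, the statistic would need to exceed about 6.63.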

Explorations of alternative ancestry groupings

Agnostic genetic ancestry groupings with fastSTRUCTURE

The primary analyses, above, used OncoArray-defined genetic ancestries, as prior reports have shown genetic ancestry may be more informative than self-reported race/ethnicities43. However, for the purpose of this study, the OncoArray-defined categories may underestimate the impact of the inherent complexity of human genetic ancestry. Therefore, we further explored the impact of an array of alternative genetic ancestry subgroup definitions on PHS2 performance using fastSTRUCTURE55, which infers global admixture/ancestry via a Bayesian approach. We ran fastSTRUCTURE v1.0 on all individuals in the multi-ethnic dataset using approximately 2300 ancestry informative markers and multiple (K) levels of population complexity to agnostically cluster the data into K = 2–6 populations. For each iteration of K populations, participants were placed into the cluster for which their maximum admixture proportion was ≥0.8. Those participants without a cluster for which their maximum admixture proportion was ≥0.8 were placed into a separate group termed “admixed.” The optimal number of clusters (K) for fastSTRUCTURE was chosen as that which maximized the marginal likelihood of the data55. PHS2 was evaluated for association with aggressive prostate cancer (HR80/20) after stratification by each K population subgroup.
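The cluster-assignment rule above can be sketched for one participant’s vector of K admixture proportions (one row of the estimated admixture matrix). The 0.8 threshold comes from the text. The function name and the 1-based cluster numbering are hypothetical choices for illustration.

```python
def assign_cluster(admixture, threshold=0.8):
    """Map one participant's K admixture proportions to a cluster label.

    Returns the 1-based index of the dominant cluster if its proportion is
    >= threshold; otherwise returns "admixed".
    """
    best = max(range(len(admixture)), key=admixture.__getitem__)
    return best + 1 if admixture[best] >= threshold else "admixed"

# Hypothetical K = 2 and K = 3 examples
print(assign_cluster([0.92, 0.08]))      # dominant cluster 1
print(assign_cluster([0.60, 0.40]))      # no cluster reaches 0.8
print(assign_cluster([0.3, 0.3, 0.4]))   # admixed at K = 3 as well
```

Because the admixed group collects everyone below the threshold, its size tends to grow as K increases and the proportions are spread across more clusters.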

A comparison of fastSTRUCTURE clustering, OncoArray-defined genetic ancestry, and self-reported race/ethnicity was compiled. OncoArray-defined genetic ancestry was mostly concordant with self-reported race/ethnicity. Participants with other/unknown self-reported race/ethnicity were mostly grouped into OncoArray’s European genetic ancestry. Additional details are shown in the Supplementary Information.

Self-reported race/ethnicity

Finally, we also evaluated PHS performance for association with aggressive prostate cancer using participants’ self-reported race/ethnicity.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.