Polygenic hazard score is associated with prostate cancer in multi-ethnic populations

Genetic models for cancer have been evaluated using almost exclusively European data, which could exacerbate health disparities. A polygenic hazard score (PHS1) is associated with age at prostate cancer diagnosis and improves screening accuracy in Europeans. Here, we evaluate performance of PHS2 (PHS1, adapted for OncoArray) in a multi-ethnic dataset of 80,491 men (49,916 cases, 30,575 controls). PHS2 is associated with age at diagnosis of any and aggressive (Gleason score ≥ 7, stage T3-T4, PSA ≥ 10 ng/mL, or nodal/distant metastasis) cancer and prostate-cancer-specific death. Associations with cancer are significant within European (n = 71,856), Asian (n = 2,382), and African (n = 6,253) genetic ancestries (p < 10^-180). Comparing the 80th/20th PHS2 percentiles, hazard ratios for prostate cancer, aggressive cancer, and prostate-cancer-specific death are 5.32, 5.88, and 5.68, respectively. Within European, Asian, and African ancestries, hazard ratios for prostate cancer are: 5.54, 4.49, and 2.54, respectively. PHS2 risk-stratifies men for any, aggressive, and fatal prostate cancer in a multi-ethnic dataset.

Prostate cancer is the second most common cancer diagnosed in men worldwide, causing substantial morbidity and mortality [1]. Prostate cancer screening may reduce morbidity and mortality [2][3][4][5], but to avoid overdiagnosis and overtreatment of indolent disease [6][7][8][9], it should be targeted and personalized. Prostate cancer age at diagnosis is important for clinical decisions regarding if/when to initiate screening for an individual [10][11]. Survival is another key cancer endpoint recommended for risk models [12].
Genetic risk stratification is promising for identifying individuals with a greater predisposition for developing cancer [13][14][15][16], including prostate cancer [17]. Polygenic models use common variants, identified in genome-wide association studies, whose combined effects can assess the overall risk of disease development [18][19]. Recently, a polygenic hazard score (PHS) was developed as a weighted sum of 54 single-nucleotide polymorphisms (SNPs) that models a man's genetic predisposition for developing prostate cancer [13]. Validation testing was done using ProtecT trial data [2] and demonstrated the PHS to be associated with age at prostate cancer diagnosis, including aggressive prostate cancer [13]. However, the development and validation datasets were limited to men of European ancestry. While genetic risk models might be important clinical tools for prognostication and risk stratification, using them may worsen health disparities [20][21][22][23][24] because most models are constructed using European data and may underrepresent genetic variants important in persons of non-European ancestry [20][21][22][23][24]. This is particularly concerning in prostate cancer, as race/ethnicity is an important prostate cancer risk factor; diagnostic, treatment, and outcomes disparities continue to exist between different races/ethnicities [25][26].
Here, we assessed PHS performance in a multi-ethnic dataset that includes individuals of European, African, and Asian genetic ancestry. This dataset also includes long-term follow-up information, affording an opportunity to evaluate PHS for association with fatal prostate cancer.

Results
PHS association with any prostate cancer in OncoArray. PHS2 was associated with age at prostate cancer diagnosis in all three OncoArray-defined genetic ancestry groups (Table 1). Comparing the 80th and 20th percentiles of genetic risk, men with high PHS had an HR of 5.32 [4.99-5.70] for any prostate cancer. Within each genetic ancestry group, men with high PHS had HRs of 5.54 [5.18-5.93], 4.49 [3.23-6.33], and 2.54 [2.08-3.10] for men of European, Asian, and African ancestry, respectively.
PHS association with aggressive prostate cancer in OncoArray. PHS2 was associated with age at aggressive prostate cancer diagnosis in all three OncoArray-defined genetic ancestry groups.
PHS association with fatal prostate cancer in OncoArray. PHS2 was associated with age at prostate cancer death for all men in the multi-ethnic dataset (z = 15.9, p = 6.3 × 10^-57). Table 3 shows z-scores and corresponding HRs for fatal prostate cancer. Comparing the 80th and 20th percentiles of genetic risk, men with high PHS had an HR of 5.68 [5.07-6.46] for prostate cancer death.
Sensitivity analyses. Sensitivity analyses demonstrated that large changes in assumed population incidence had minimal effect on the calculated HRs for any, aggressive, or fatal prostate cancer (Supplementary Information).
PHS and family history. Family history was also associated with any prostate cancer (z = 39.7, p < 10^-300; Table 4), aggressive prostate cancer (z = 32.4, p = 2.7 × 10^-230), and fatal prostate cancer (z = 8.76, p = 1.4 × 10^-18) in the multi-ethnic dataset. Among those with known family history, the combination of family history and PHS performed better than family history alone (log-likelihood p < 10^-300). This pattern held true when analyses were repeated on each genetic ancestry. Additional family history analyses are reported in the Supplementary Information.
PHS associations with aggressive prostate cancer using alternative ancestry groupings
Agnostic genetic ancestry groupings with fastSTRUCTURE. With fastSTRUCTURE, the optimal model was the one with K = 2 clusters: cluster 1 had mainly men of European OncoArray-defined genetic ancestry and self-reported race/ethnicity, cluster 2 had only men of African OncoArray-defined genetic ancestry and mostly Black/African American self-reported race/ethnicity, while the admixed cluster included men of all OncoArray-defined genetic ancestries. Comparing the 80th and 20th percentiles of genetic risk, HRs for aggressive prostate cancer were 5.60, 2.06, and 5.05 for cluster 1, cluster 2, and the admixed cluster, respectively. Corresponding results for the K = 3-6 clustering approaches are shown in the Supplementary Information.
Self-reported race/ethnicity. HRs for aggressive prostate cancer comparing the 80th and 20th percentiles of genetic risk when participants are stratified by their self-reported race/ethnicity are shown in the Supplementary Information.

Discussion
These results confirm the previously reported association of PHS with age at prostate cancer diagnosis in Europeans and show that this finding generalizes to a multi-ethnic dataset, including men of European, Asian, and African ancestry. PHS is also associated with age at aggressive prostate cancer diagnosis and at prostate cancer death. Comparing the highest and lowest quintiles of genetic risk, men with high PHS had HRs of 5.32, 5.88, and 5.68 for any prostate cancer, aggressive prostate cancer, and prostate cancer death, respectively.
We found that PHS is associated with prostate cancer in men of European, Asian, and African genetic ancestry (and a wider range of self-reported race/ethnicities). Current prostate cancer screening guidelines suggest possible initiation at earlier ages for men of African ancestry, given higher incidence rates and worse survival when compared to men of European ancestry [26]. Using the PHS to risk-stratify men might help with decisions regarding when to initiate prostate cancer screening: perhaps a man with African genetic ancestry in the lowest percentiles of genetic risk by PHS could safely delay or forgo screening to decrease the possible harms associated with overdetection and overtreatment [9], while a man in the highest risk percentiles might consider screening at an earlier age. Similar reasoning applies to men of all genetic ancestries. Risk-stratified screening should be prospectively evaluated.
[Table footnote: This analysis is limited to individuals with known family history. Both family history and PHS were significantly associated with any prostate cancer in the combined models. Hazard ratios (HRs) for family history were calculated as the exponent of the beta from the multivariable Cox proportional hazards regression [56]. The HR for PHS in the multivariable models was estimated as the HR80/20 (men in the highest 20% vs. those in the lowest 20% of genetic risk by PHS2) in each cohort. p Values reported are two-tailed from the Cox models. The model with PHS performed better than family history alone (log-likelihood p < 10^-300).]
PHS performance was better in those with OncoArray-defined European and Asian genetic ancestry than in those with African ancestry. For example, comparing the highest and lowest quintiles of genetic risk, men with OncoArray-defined European and Asian genetic ancestry with high PHS had HRs for any prostate cancer of 5.54 and 4.49, respectively, while the analogous HR for men of African genetic ancestry was 2.54.
This trend was also observed for aggressive prostate cancer. Moreover, the optimal fastSTRUCTURE clustering of our dataset (K = 2) yielded one cluster that consisted almost exclusively of men of African ancestry (by both self-report and OncoArray-defined genetic ancestry) and had inferior risk stratification with PHS2 (HR 2.06), compared to the performance observed in the other cluster (nearly all European) and an admixed cluster (HRs 5.60 and 5.05, respectively). Overall, these results suggest PHS can differentiate men of higher and lower risk in each ancestral group, but the range of risk levels may be narrower in those of African ancestry. Possible reasons for relatively diminished performance include increased genetic diversity with less linkage disequilibrium in those of African genetic ancestry [27][28][29]. Known health disparities may also contribute [25], as the availability (and timing) of PSA results may depend on healthcare access. Alarmingly, there has historically been a poor representation of African populations in clinical or genomic research studies [20][21]. This pattern is reflected in the present study, where most men of African genetic ancestry were missing clinical diagnosis information used to determine disease aggressiveness. That such clinical information is less available for men of African ancestry also leaves open the possibility of systematic differences in the diagnostic workup (and therefore the age of diagnosis) across different ancestry populations. These are critical health disparities that will need to be addressed (and ultimately eliminated) to ensure equitable and accurate genomic prostate cancer stratification for all men. Notwithstanding these caveats, the present PHS is associated with age at prostate cancer diagnosis in men of African ancestry, possibly paving the way for more personalized screening decisions for men of African descent. Promising efforts are also underway to further improve PHS performance in men of African ancestry [30].
The first PHS validation study used data from ProtecT, a large prostate cancer trial [2][13]. ProtecT's screening design yielded biopsy results from both controls and cases with PSA ≥ 3 ng/mL, making it possible to demonstrate improved accuracy and efficiency of prostate cancer screening with PSA testing. Limitations of the ProtecT analysis, though, include few recorded prostate cancer deaths in the available data, and the exclusion of advanced cancer from that trial [2]. The present study includes long-term observation, with both early and advanced disease [18], allowing for evaluation of PHS association with any, aggressive, and fatal prostate cancer; we found PHS to be associated with all outcomes.
Age is critical in clinical decisions of whether men should be offered prostate cancer screening [31][32][33][34] and in how to treat men diagnosed with prostate cancer [31][32]. Age may also inform prognosis [32][35]. Age at diagnosis or death is therefore of clinical interest in inferring how likely a man is to develop cancer at an age when he may benefit from treatment. One important advantage of the survival analysis used here is that it permits men without cancer at the time of the last follow-up to be censored while allowing for the possibility of them developing prostate cancer (including aggressive or fatal prostate cancer) later on. Prostate cancer death is a hard endpoint with less uncertainty than clinical diagnosis (which may vary with screening practices and delayed medical attention). PHS may help identify men with a high (or low) genetic predisposition to develop lethal prostate cancer and could assist physicians in deciding when to initiate screening.
Current guidelines suggest considering a man's individual cancer risk factors, overall life expectancy, and medical comorbidities when deciding whether to screen [6]. The most prominent clinical risk factors used in practice are family history and race/ethnicity [6][36][37]. Combined PHS and family history performed better than either alone in this multi-ethnic dataset. This finding is consistent with a prior report that PHS adds considerable information over family history alone. The prior study did not find an association of family history with age at prostate cancer diagnosis, perhaps because the universal screening approach of the ProtecT trial diluted the influence of family history on who is screened in typical practice [13]. In the present study, family history and PHS appear complementary in assessing prostate cancer genetic risk.
Moreover, the HRs for PHS suggest clinical relevance similar to or greater than that of predictive tools routinely used for cancer screening (e.g., breast cancer) and for other diseases (e.g., diabetes and cardiovascular disease). HRs reported for those tools are around 1-3 for disease development or other adverse outcome [38][39][40][41][42]; HRs reported here for PHS (for any, aggressive, or fatal prostate cancer) are similar or greater.
Limitations to this work include that the dataset comes from multiple, heterogeneous studies, from various populations with variable screening rates. This allowed for a large, multi-ethnic dataset that includes clinical and survival data, but comes with uncertainties avoided in the ProtecT dataset used for original validation. However, the heterogeneity would likely reduce the PHS performance, not systematically inflate the results. Second, we note that no germline SNP tool, including this PHS, has been shown to discriminate men at risk of aggressive prostate cancer from those at risk of only indolent prostate cancer. Third, while the OncoArray-defined and fastSTRUCTURE genetic ancestry classifications used here may be more accurate than self-reported race/ethnicity alone [43] and allowed for evaluation of admixed genetic ancestry, detailed analysis of local ancestry was not performed. As noted above, clinical data availability was not uniform across contributing studies and was lower in men of OncoArray-defined African genetic ancestry. Efforts to improve genetic risk prediction should focus on consistent data collection patterns and elimination of data disparities so that models are widely applicable for all men. We also found that while the optimal fastSTRUCTURE model for risk stratification of men for aggressive prostate cancer had K = 2 clusters, models with more clusters (larger K) also produced comparable (or larger) ranges of hazard ratios for risk stratification. The ability of these models with more clusters to risk-stratify men well (while possibly being less representative of the available data) emphasizes the dire need for more complex and deeper studies evaluating the intersection of genetics, the granularity of ancestry, and prostate cancer risk. In addition, the PHS may not include all SNPs associated with prostate cancer; in fact, over 60 additional SNPs have been reported since the development of the original PHS [18].
Some of these SNPs are ethnicity-specific, including within non-European populations [44][45][46], and will be included in further model optimization to improve prostate cancer risk stratification. Future work could also evaluate the PHS performance in relation to epidemiological risk factors associated with prostate cancer risk beyond those currently used in clinical practice (i.e., family history and race/ethnicity). Finally, various circumstances and disease-modifying treatments may have influenced post-diagnosis survival to an unknown degree. Despite this possible source of variability in survival among men with fatal prostate cancer, PHS was still associated with age at death, an objective and meaningful endpoint. Future development and optimization hold promise for improving upon the encouraging risk stratification achieved here in men of different genetic ancestries, particularly African.
In summary, PHS was associated with age at any and aggressive prostate cancer, and at death from prostate cancer in a multi-ethnic dataset. PHS performance was relatively diminished in men of African genetic ancestry, compared to performance in men of European or Asian genetic ancestry. PHS risk-stratifies men of various genetic ancestries for prostate cancer and should be prospectively studied as a means to individualize screening strategies seeking to reduce prostate cancer morbidity and mortality.

Methods
Participants. We obtained data from the OncoArray project [47] that had undergone quality control steps [18]. This dataset includes 91,480 men with genotype and phenotype data from 64 studies (Supplementary Information). Individuals whose data were used in the prior development or validation of the original PHS model (PHS1) were excluded (n = 10,989) [13], leaving 80,491 in the independent dataset used here. Table 6 describes available data. Individuals not meeting the endpoint for each analysis were censored at age of last follow-up.
All contributing studies were approved by the relevant ethics committees; written informed consent was acquired from the study participants [48]. The present analyses used de-identified data from the PRACTICAL consortium.
Polygenic hazard score. The original PHS1 was validated for association with age at prostate cancer diagnosis in men of European ancestry using a survival analysis [13]. To ensure the score was not simply identifying men at risk of indolent disease, PHS1 was also validated for association with age at aggressive prostate cancer (defined as intermediate-risk disease or above [6]) diagnosis [13]. PHS1 was calculated as the vector product of a patient's genotype (X_i) for n selected SNPs and the corresponding parameter estimates (β_i) from a Cox proportional hazards regression:

$$\mathrm{PHS} = \sum_{i=1}^{n} X_i \beta_i$$

The 54 SNPs in PHS1 were selected using PRACTICAL consortium data (n = 31,747 men) genotyped with a custom array (iCOGS, Illumina, San Diego, CA) [13].
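The weighted-sum definition above can be illustrated with a minimal Python sketch. This is not the authors' released code (see Supplementary Software 1); the function name, the three-SNP genotype, and the weights are hypothetical values chosen only to show the arithmetic, whereas the actual PHS1 uses 54 SNPs with weights estimated by Cox regression.

```python
from typing import Sequence


def polygenic_hazard_score(genotype: Sequence[float], betas: Sequence[float]) -> float:
    """Return sum_i X_i * beta_i for one individual (hypothetical helper)."""
    if len(genotype) != len(betas):
        raise ValueError("genotype and betas must have the same length")
    return sum(x * b for x, b in zip(genotype, betas))


# Hypothetical 3-SNP illustration; values are made up, not taken from the paper.
example_genotype = [0, 1, 2]          # allele counts per SNP
example_betas = [0.10, -0.05, 0.20]   # illustrative log-hazard weights
example_score = polygenic_hazard_score(example_genotype, example_betas)
```

Because the score is linear in the genotype, a proxy SNP or re-estimated weight changes only the corresponding term of the sum.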
Adapting the PHS to OncoArray. Genotyping for the present study was performed using a commercially available, cancer-specific array (OncoArray, Illumina, San Diego, CA) [18]. Twenty-four of the 54 SNPs in PHS1 were directly genotyped on OncoArray. We identified proxy SNPs for those not directly genotyped and recalculated the SNP weights in the same dataset used for the original development of PHS1 [13] (Supplementary Methods).
The performance of the adapted PHS (PHS2) was compared to that of PHS1 in the ProtecT dataset originally used to validate PHS1 (n = 6411). PHS2 was calculated for all patients in the ProtecT validation set and was tested as the sole predictive variable in a Cox proportional hazards regression model (R v.3.5.1, "survival" package [49]) for age at aggressive prostate cancer diagnosis, the primary endpoint of that study. The performance was assessed by the metrics reported during the PHS1 development [13]: z-score and hazard ratio (HR98/50) for aggressive prostate cancer between men in the highest 2% of genetic risk (≥98th percentile) vs. those with average risk (30-70th percentile). HR 95% confidence intervals (CIs) were determined by bootstrapping 1000 random samples from the ProtecT dataset [50][51] while maintaining the same number of cases and controls. PHS2 percentile thresholds are shown in the Supplementary Information.
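A bootstrap that keeps the number of cases and controls fixed can be sketched by resampling each group separately. The Python below is only an illustration under stated assumptions: the function name and the percentile-interval construction are ours, and `statistic` stands in for whatever quantity is re-estimated on each resample (in the paper, an HR from a refitted Cox model).

```python
import random
from typing import Callable, List, Sequence, Tuple


def stratified_bootstrap_ci(
    cases: Sequence[float],
    controls: Sequence[float],
    statistic: Callable[[List[float], List[float]], float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling cases and controls separately
    so that every resample has the original number of each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        boot_cases = rng.choices(cases, k=len(cases))          # with replacement
        boot_controls = rng.choices(controls, k=len(controls))  # with replacement
        estimates.append(statistic(boot_cases, boot_controls))
    estimates.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return estimates[lo_idx], estimates[hi_idx]


def mean_difference(cases: List[float], controls: List[float]) -> float:
    """Toy statistic (difference in mean score), used only for illustration."""
    return sum(cases) / len(cases) - sum(controls) / len(controls)
```

For example, `stratified_bootstrap_ci(case_scores, control_scores, mean_difference)` returns a lower and upper bound; swapping in a function that refits a survival model would mirror the procedure described above more closely.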
OncoArray-defined genetic ancestry. Self-reported race/ethnicities [47][52] included European, Black or African American (includes Black African, Black Caribbean), East Asian, South Asian, Hawaiian, Hispanic American, and Other/Unknown. Genetic ancestry for each individual from the OncoArray project [47] was provided with the PRACTICAL consortium data. Briefly, genotypes from 2318 ancestry informative markers were mapped into a two-dimensional space representing the first two principal components, which has been shown to yield results very similar to those obtained with the STRUCTURE approach [52]. The distance from the individual's mapping to the three reference clusters (European, African, and Asian) was then used to estimate the individual's genetic ancestry [47][52]. Individuals were classified into one of three OncoArray-defined labels; European: greater than 80% European ancestry, Asian: greater than 40% Asian ancestry, and African: greater than 20% African ancestry. Individuals not meeting any of the aforementioned three labels were classified as "other," but all of the individuals in the present prostate cancer dataset met the criteria for one of the three OncoArray-defined genetic ancestries.
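The three thresholds above amount to a simple rule on the estimated ancestry fractions. The sketch below is illustrative only: the function name is hypothetical, and the order in which the thresholds are checked is an assumption, since the text does not say how an individual exceeding both the Asian and African thresholds would be labeled.

```python
def oncoarray_ancestry_label(european: float, asian: float, african: float) -> str:
    """Map estimated ancestry fractions (0-1) to an OncoArray-style label.

    Assumption: thresholds are checked in the order listed in the text.
    """
    if european > 0.80:
        return "European"
    if asian > 0.40:
        return "Asian"
    if african > 0.20:
        return "African"
    return "other"
```

For instance, an individual estimated at 70% European and 25% African ancestry would be labeled "African" under this rule, which shows how asymmetric the three thresholds are.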
Any prostate cancer. We tested PHS2 for association with age at diagnosis of any prostate cancer in the multi-ethnic dataset (n = 80,491, Table 6). PHS2 was calculated for all patients in the multi-ethnic dataset and used as the sole independent variable in Cox proportional hazards regressions for the endpoint of age at prostate cancer diagnosis. Due to the potential for Cox proportional hazards results to be biased by a higher number of cases in our dataset than in the general population, sample-weight corrections were applied to all Cox models using population data from Sweden [13][53] (additional details are in Supplementary Information). Significance was set at α = 0.01 [13].
These Cox proportional hazards regressions (with PHS2 as the sole independent variable and age at prostate cancer diagnosis as the outcome) were then repeated for subsets of data, stratified by OncoArray-defined genetic ancestry: European, Asian, and African. Percentiles of genetic risk were calculated using data from the 9,728 men in the original (iCOGS) development set who were less than 70 years old and without prostate cancer [13][54]. HRs and 95% CIs for each genetic ancestry group were calculated to make the following comparisons: HR98/50, men in the highest 2% of genetic risk vs. those with average risk (30-70th percentile); HR80/50, men in the highest 20% vs. those with average risk; HR20/50, men in the lowest 20% vs. those with average risk; and HR80/20, men in the highest 20% vs. lowest 20%. CIs were determined by bootstrapping 1000 random samples from each genetic ancestry group [50][51] while maintaining the same number of cases and controls. HRs and CIs were calculated for age at prostate cancer diagnosis separately for each genetic ancestry group.
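Under a Cox model with PHS as the sole covariate, the hazard ratio between two score values x_a and x_b is exp(β(x_a − x_b)). One way to turn that into a between-group comparison such as HR80/20 is sketched below. Treating each group's mean score as its representative value, and the function and parameter names, are assumptions made for illustration; the exact procedure is described in the Supplementary Information and prior work.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def hazard_ratio_between_groups(
    beta: float,
    scores: Sequence[float],
    reference_scores: Sequence[float],
    high: Tuple[float, float] = (80, 100),
    low: Tuple[float, float] = (0, 20),
) -> float:
    """exp(beta * (mean score in high group - mean score in low group)).

    Percentile cut-offs come from a reference population, as in the text;
    using group means is an illustrative assumption.
    """
    ref = sorted(reference_scores)

    def group_mean(bounds: Tuple[float, float]) -> float:
        lo_t, hi_t = percentile(ref, bounds[0]), percentile(ref, bounds[1])
        members = [s for s in scores if lo_t <= s <= hi_t]
        if not members:
            raise ValueError("no scores fall in the requested percentile range")
        return mean(members)

    return math.exp(beta * (group_mean(high) - group_mean(low)))
```

Passing `high=(98, 100)` and `low=(30, 70)` gives the analogue of HR98/50 under the same assumptions.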
Given that the overall incidence of prostate cancer in different populations varies, we performed a sensitivity analysis of the population case/control numbers, allowing the population incidence to vary from 25 to 400% of that reported in Sweden (chosen as an example population; Supplementary Information).
Aggressive prostate cancer. Recognizing that not all prostate cancer is clinically significant, we also tested PHS2 for association with age at aggressive prostate cancer diagnosis in the multi-ethnic dataset. For these analyses, we included cases that had known tumor stage, Gleason score, and PSA at diagnosis (n = 60,617 cases, Table 6). Aggressive prostate cancer cases were those that met any of the following criteria [6][13]: Gleason score ≥7, PSA ≥ 10 ng/mL, T3-T4 stage, nodal metastases, or distant metastases. As before, Cox proportional hazards models and sensitivity analysis were used to assess the association.
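The aggressive-disease definition is a logical OR over five criteria, which the following Python sketch makes explicit. The function name and the encoding of inputs (for example, T stage as an integer from 1 to 4) are hypothetical choices for illustration, not the coding used in the PRACTICAL data.

```python
def is_aggressive(
    gleason: int,
    psa_ng_ml: float,
    t_stage: int,
    nodal_metastasis: bool,
    distant_metastasis: bool,
) -> bool:
    """True if a case meets any of the aggressive-disease criteria in the text.

    Assumes complete data: the analysis included only cases with known
    stage, Gleason score, and PSA at diagnosis.
    """
    return (
        gleason >= 7
        or psa_ng_ml >= 10.0
        or t_stage >= 3          # T3-T4 (hypothetical integer encoding)
        or nodal_metastasis
        or distant_metastasis
    )
```

A case with Gleason 6, PSA 4.5 ng/mL, stage T2, and no metastases would therefore not be counted as aggressive, while changing any single criterion past its threshold would flip the result.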
Fatal prostate cancer. Using an even stricter definition of clinical significance, we evaluated the association of PHS2 with age at prostate cancer death in the multi-ethnic dataset. All cases (regardless of staging completeness) and controls were included, and the endpoint was the age at death due to prostate cancer. This analysis was not stratified by genetic ancestry due to low numbers of recorded prostate cancer deaths in the non-European datasets. The cause of death was [...]
PHS and family history. Prostate cancer family history was also tested for association with any, aggressive, or fatal prostate cancer. Information on family history was standardized across studies included in PRACTICAL consortium data. A family history of prostate cancer was defined as the presence or absence of a first-degree relative with a prostate cancer diagnosis. There were 46,030 men with available prostate cancer family history data. Cox proportional hazards models were used to assess family history for association with any, aggressive, or fatal prostate cancer. To evaluate the relative importance of each, a multivariable model using both family history and PHS was compared to using family history alone (log-likelihood test; α = 0.01). HRs were calculated for each variable.
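Comparing a model with family history and PHS against family history alone is a comparison of nested models. A common way to do this is a likelihood-ratio test, sketched below in Python. The assumption that the two models differ by exactly one parameter (the PHS coefficient), and therefore that the statistic is referred to a chi-square distribution with 1 degree of freedom, is ours; the log-likelihood inputs are placeholders to be taken from fitted models.

```python
import math
from typing import Tuple


def likelihood_ratio_test_1df(loglik_reduced: float, loglik_full: float) -> Tuple[float, float]:
    """Likelihood-ratio statistic and p-value for nested models differing by
    one parameter (assumed here: the PHS coefficient).

    For a chi-square variable with 1 degree of freedom,
    P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("full model should not fit worse than the reduced model")
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

With very large statistics the p-value underflows to 0.0 in floating point, which is consistent with reporting such results as an upper bound (for example, p < 10^-300) rather than an exact value.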
Explorations of alternative ancestry groupings
Agnostic genetic ancestry groupings with fastSTRUCTURE. The primary analyses, above, used OncoArray-defined genetic ancestries, as prior reports have shown genetic ancestry may be more informative than self-reported race/ethnicities [43]. However, for the purpose of this study, the OncoArray-defined categories may underestimate the impact of the inherent complexity of human genetic ancestry. Therefore, we further explored the impact of an array of alternative genetic ancestry subgroup definitions on PHS2 performance using fastSTRUCTURE [55], which infers global admixture/ancestry via a Bayesian approach. We ran fastSTRUCTURE v1.0 on all individuals in the multi-ethnic dataset using approximately 2300 ancestry informative markers and multiple (K) levels of population complexity to agnostically cluster the data into K = 2-6 populations. For each iteration of K populations, participants were placed into the cluster for which their maximum admixture proportion was ≥0.8. Those participants without a cluster for which their maximum admixture proportion was ≥0.8 were placed into a separate group termed "admixed." The optimal number of clusters (K) for fastSTRUCTURE was chosen as that which maximized the marginal likelihood of the data [55]. PHS2 was evaluated for association with aggressive prostate cancer (HR80/20) after stratification by each K population subgroup.
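The assignment rule described above (largest admixture proportion ≥0.8, otherwise "admixed") can be written in a few lines. This Python sketch assumes the per-individual admixture proportions have already been estimated; the function name and the cluster labels are hypothetical and it does not call fastSTRUCTURE itself.

```python
from typing import Sequence


def assign_cluster(admixture_proportions: Sequence[float], threshold: float = 0.8) -> str:
    """Assign an individual to the cluster with the largest admixture
    proportion if that proportion meets the threshold, else to 'admixed'."""
    if not admixture_proportions:
        raise ValueError("need at least one admixture proportion")
    best = max(range(len(admixture_proportions)), key=lambda k: admixture_proportions[k])
    if admixture_proportions[best] >= threshold:
        return f"cluster {best + 1}"
    return "admixed"
```

For K = 2, proportions of (0.85, 0.15) give "cluster 1", whereas (0.6, 0.4) give "admixed"; as K grows, fewer individuals tend to reach 0.8 in any single cluster.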
A comparison of fastSTRUCTURE clustering, OncoArray-determined genetic ancestry, and self-reported race/ethnicity was compiled. OncoArray-defined genetic ancestry was mostly concordant with self-reported race/ethnicity. Participants with other/unknown self-reported race/ethnicity were mostly grouped into OncoArray's European genetic ancestry. Additional details are shown in the Supplementary Information.
Self-reported race/ethnicity. Finally, we also evaluated PHS performance for association with aggressive prostate cancer using participants' self-reported race/ethnicity.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
PRACTICAL consortium data are available upon request to the Data Access Committee (http://practical.icr.ac.uk/blog/?page_id=135). Questions and requests for further information may be directed to PRACTICAL@icr.ac.uk. All other data are available within the Article, Supplementary information, or upon request to the authors.

Code availability
Code used for this work has been made available along with this paper (Supplementary Software 1).