Applying a genetic risk score for prostate cancer to men with lower urinary tract symptoms in primary care to predict prostate cancer diagnosis: a cohort study in the UK Biobank

Background: Prostate cancer is highly heritable, with >250 common variants associated with it in genome-wide association studies. It commonly presents with non-specific lower urinary tract symptoms that are frequently associated with benign conditions. Methods: Cohort study using UK Biobank data linked to primary care records. Participants were men with a record of a general practice consultation for a lower urinary tract symptom. The outcome measure was prostate cancer diagnosis within 2 years of consultation. The predictor was a genetic risk score of 269 genetic variants for prostate cancer. Results: A genetic risk score (GRS) was associated with prostate cancer in symptomatic men (OR per SD increase = 2.12 [1.86–2.41], P = 3.5e-30). An integrated risk model including age and GRS applied to symptomatic men predicted prostate cancer (AUC 0.768 [0.739–0.796]). Prostate cancer incidence was 8.1% (6.7–9.7) in the highest risk quintile and <1% in the lowest quintile. Conclusions: This study is the first to apply a GRS in primary care to improve the triage of symptomatic patients. Men with the lowest genetic risk of developing prostate cancer could safely avoid invasive investigation, whilst those identified with the greatest risk could be fast-tracked for further investigation. These results show that a GRS has potential application to improve the diagnostic pathway of symptomatic patients in primary care.


INTRODUCTION
Prostate cancer accounts for around a quarter of new cancer cases in men, approximately 52,000 per year in the UK, and is increasing by around 4% annually [1]. An estimated 14% of prostate cancer deaths in the UK could be avoided with earlier detection [2]; advanced stage at diagnosis is associated with poorer survival [3]. Most men with prostate cancer are diagnosed after attending primary care with symptoms [4]. Evidence on the benefit of prostate cancer screening programmes (targeting asymptomatic men) is mixed: a large European screening trial identified significant reductions in prostate cancer mortality [5], while others have found that increases in prostate cancer incidence associated with screening trials were not accompanied by significant decreases in mortality [6,7], suggesting possible overdiagnosis [8].
Lower urinary tract symptoms (LUTS), such as nocturia, urinary frequency or poor stream, are common in men aged 50 and above, and are often present at the time of a prostate cancer diagnosis. The incidence of LUTS, benign prostate enlargement, and prostate cancer all rise with increasing age, complicating attempts to accurately diagnose tumours. The evidence for an association between LUTS and the risk of prostate cancer is equivocal [9], and very few studies have assessed this association in a primary care population [10].
The UK's National Institute for Health and Care Excellence (NICE) recommends a prostate-specific antigen (PSA) test for men in primary care with LUTS or new-onset erectile dysfunction [11]. PSA is the only test currently available for detecting prostate cancer in primary care, yet the diagnostic accuracy of PSA in symptomatic men is unclear [9]. The most recent systematic review of the diagnostic accuracy of PSA for prostate cancer in patients with LUTS found that a PSA threshold of 4 ng/mL had a sensitivity of 0.93 (95% CI 0.88, 0.96) and specificity of 0.20 (95% CI 0.12, 0.33), and the area under the curve (AUC) was 0.72 (95% CI 0.68, 0.76) [12]. All studies included in the review were conducted in secondary care patient cohorts, limiting the applicability of the findings to the primary care setting, where cancer incidence is lower and the AUC is therefore likely to be lower owing to spectrum bias [13]. As the studies in that review were based on observational data, ascertainment bias and lack of follow-up in PSA-negative men may mean that the true AUC of PSA in symptomatic men in primary care is lower still.
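Pooled sensitivity and specificity do not by themselves show what a PSA result means for an individual man; predictive values also depend on how common cancer is among those tested. A minimal sketch of Bayes' theorem, assuming the pooled estimates from [12] transfer unchanged to primary care and using the 3.7% case proportion observed in this cohort purely as an illustrative prevalence (the function name is hypothetical):

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence               # true positives per person tested
    fn = (1 - sensitivity) * prevalence         # false negatives
    tn = specificity * (1 - prevalence)         # true negatives
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Pooled PSA estimates at 4 ng/mL [12]; the 3.7% prevalence is illustrative only.
ppv, npv = predictive_values(sensitivity=0.93, specificity=0.20, prevalence=0.037)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With these inputs the positive predictive value is only about 4% and the negative predictive value about 99%, which illustrates why a test with low specificity generates many false positives when cancer is uncommon.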
Over the past 15 years, genome-wide association studies (GWAS) have identified over 250 individual genetic variants that contribute to the development of prostate cancer; these have been combined into a clinically useful measure of an individual's risk of developing prostate cancer, the genetic risk score (GRS) [14]. GRS improve on risk predictions based on family history alone [15–17], but despite promising evidence of predictive ability, there has been limited integration of GRS into clinical practice [18]. There are no studies of the application of a prostate cancer GRS in the targeted investigation of men with LUTS. It is not known whether the genetic risk of developing prostate cancer affects the chance of it being present in symptomatic men, or whether a GRS could help in selecting men for further investigation once they present with LUTS.
The objective of this study was to assess whether a prostate cancer GRS predicted a new diagnosis of prostate cancer in men in the UK Biobank who consulted their general practitioner (GP) with LUTS.

Public and patient involvement
An existing patient and public involvement and engagement (PPI&E) group of six men with personal experience of prostate cancer investigation, who inform ongoing prostate cancer research at the University of Exeter, was involved in developing the research question for this study. Their views were specifically sought on the acceptability of developing an integrated risk model that required the incorporation of genetic information, and on which additional risk factors to consider. These men felt that the potential benefits of improving early detection of prostate cancer and avoiding unnecessary, invasive diagnostic tests outweighed concerns about using genetic data. They also highlighted the importance of a patient's age and family history in assessing prostate cancer risk.

Participants
Unrelated UK Biobank participants of white European ancestry were included in this study. Principal component analysis was performed using individuals from the 1000 Genomes Project, and UK Biobank individuals were then projected into the principal component space. K-means clustering was subsequently applied to classify individuals as European, with cluster centres initialised at the mean principal component values of each 1000 Genomes sub-population. The first four principal components were used in this analysis. Relatedness was estimated using KING kinship coefficients [19], and relatives of third degree or closer were excluded. An optimal list of unrelated individuals was generated by preferentially removing the individuals with the most relatives, so that as many individuals as possible could be retained; e.g. if A was related to B and C, but B and C were not related to each other, A was removed. For a simple pair, one individual was removed at random.
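The greedy relatedness filter described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the input is a list of pairs already flagged by KING as third-degree relatives or closer, the function name `prune_related` is hypothetical, and breaking ties other than simple pairs at random is our assumption.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterable, Optional, Set, Tuple


def prune_related(pairs: Iterable[Tuple[Hashable, Hashable]],
                  seed: Optional[int] = None) -> Set[Hashable]:
    """Greedily remove individuals until no related pair remains.

    At each step the individual with the most remaining relatives is removed;
    ties (including a simple pair, where both have one relative) are broken at
    random. Returns the set of removed individuals.
    """
    rng = random.Random(seed)
    relatives = defaultdict(set)
    for a, b in pairs:
        relatives[a].add(b)
        relatives[b].add(a)

    removed = set()
    while True:
        # Number of relatives still present for everyone who has any left
        degree = {ind: len(rel) for ind, rel in relatives.items() if rel}
        if not degree:
            return removed
        top = max(degree.values())
        tied = sorted((ind for ind, d in degree.items() if d == top), key=str)
        drop = rng.choice(tied)
        # Detach the removed individual from all of their relatives
        for other in relatives[drop]:
            relatives[other].discard(drop)
        relatives[drop].clear()
        removed.add(drop)
```

For the example in the text, `prune_related([("A", "B"), ("A", "C")])` removes only A. Removing the most-connected individual first is a heuristic: it usually retains more people than dropping one member of each pair arbitrarily, but it is not guaranteed to find the largest possible unrelated set.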
Participants were included in the analysis if they had any of these recorded in the UKBB GP records: incontinence, nocturia, hesitancy, frequency, urgency, retention, poor stream, double voiding, or a general code of lower urinary tract symptoms (LUTS). Read codes for each condition can be found in Supplementary

Variable definition
Prostate cancer was defined using the earliest date of either the Read code 'B46..' in GP records or the linked cancer registry data. As this study aimed to test the ability of a prostate cancer GRS to identify new prostate cancer in men with symptoms, patients with prostate cancer recorded before the index date (the first recorded symptom) were excluded. Patients in the symptomatic cohort who were diagnosed with prostate cancer within 2 years of the index date were treated as cases; those with no record of a prostate cancer diagnosis within 2 years of the index date were considered controls. Controls may have been diagnosed with prostate cancer more than 2 years after the index date. This follow-up period was chosen so that only prostate cancers that could plausibly have been causing symptoms, and so could have been diagnosed while the patient was symptomatic, were counted as cases. While there is no perfect cut-off, 2 years is a commonly accepted limit in previous research on cancer diagnosis [10,20–26].
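The case and control definitions above, together with the exclusion of controls who died during follow-up described in Statistical methods, can be expressed as a small labelling function. This is a hypothetical sketch: the function name and labels are illustrative, "2 years" is approximated as 730 days, and a diagnosis on the index date itself is assumed to count as a case.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "2 years" is approximated as 730 days from the index date.
FOLLOW_UP = timedelta(days=730)


def classify(index_date: date, cancer_date: Optional[date], death_date: Optional[date]) -> str:
    """Label one symptomatic man as a case, a control, or excluded."""
    window_end = index_date + FOLLOW_UP
    if cancer_date is not None and cancer_date < index_date:
        return "excluded_prior"  # prostate cancer recorded before the index date
    if cancer_date is not None and cancer_date <= window_end:
        return "case"  # cases who died within the window are retained
    if death_date is not None and death_date <= window_end:
        return "excluded_died"  # cancer-free status over 2 years cannot be ascertained
    return "control"  # includes men diagnosed more than 2 years after the index date
```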
A genetic risk score for prostate cancer was derived using the 269 known risk variants reported in a recent trans-ancestry genome-wide meta-analysis; the included variants are described in Conti et al. [14]. The weight for each single nucleotide polymorphism (SNP) was the log of its European odds ratio from Supplementary Table 4 of Conti et al. These external weights were used in preference to weights estimated in UK Biobank to avoid overfitting. The GRS for each UK Biobank participant was calculated as the sum, across all variants, of each weight multiplied by the participant's genotype.
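The weighted score described above is GRS = sum over variants i of log(OR_i) × g_i, where g_i is the participant's genotype at variant i. A minimal sketch of that calculation, assuming natural logarithms and genotypes coded as 0, 1 or 2 copies of the allele to which each odds ratio refers; the function name and the three toy odds ratios are hypothetical and are not taken from Conti et al.:

```python
import math
from typing import Sequence


def genetic_risk_score(odds_ratios: Sequence[float], dosages: Sequence[float]) -> float:
    """Weighted GRS: sum over variants of log(OR) times the risk-allele count (0-2)."""
    if len(odds_ratios) != len(dosages):
        raise ValueError("need exactly one genotype per variant")
    return sum(math.log(odds_ratio) * dose for odds_ratio, dose in zip(odds_ratios, dosages))


# Toy example with three hypothetical variants (not values from Conti et al.)
score = genetic_risk_score([1.2, 0.9, 1.5], [2, 1, 0])
print(round(score, 4))
```

Variants with OR < 1 contribute negative weights, so a participant's score can fall as well as rise with each additional copy of the reference allele.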
Body mass index (BMI) was defined using UK Biobank's Data-Field 21001 and is reported as mean ± standard deviation in kg/m². Smoking status (ever or never) was defined using Data-Field 1239. Family history of prostate cancer was defined using self-report data (Data-Field 20111). All three were measured at the UK Biobank baseline assessment.
Only a small proportion of the cohort had a PSA test result on record, and these results were abnormal; the AUC for PSA alone was >0.9, which is unrealistic compared with the literature and likely to be the result of ascertainment bias [12]. As PSA is part of the current diagnostic pathway that determines whether a patient is investigated for prostate cancer, it has a direct causal effect on whether an individual is diagnosed with prostate cancer, independently of the test's ability to predict that outcome. Any model of PSA and GRS in an observational study such as UK Biobank will therefore be strongly biased towards PSA: patients with a negative PSA test are not followed up and are therefore unlikely to be diagnosed with prostate cancer, even if it exists. For this reason, this study compared the performance of a prostate cancer GRS with published reports of PSA diagnostic accuracy.

Statistical methods
All analysis was conducted using R 4.0.3 "Bunny-Wunnies Freak Out". The cohort characteristics were described, and associations were tested for the baseline variables: index age, family history, smoking status and BMI. The association between the GRS and a prostate cancer diagnosis within 2 years of symptoms was evaluated in a simple logistic regression model, and the odds ratio was reported per standard deviation increase in GRS. We also estimated the hazard ratio using a Cox proportional hazards model. Controls who died within the 2-year study period were excluded from the logistic regression model, as it cannot be ascertained whether they would have remained cancer-free for 2 years. An integrated risk model was developed by fitting all combinations of the predictor variables that reached nominal significance (P < 0.05), plus symptoms, in addition to the GRS, to test whether any combination enhanced predictive power. As some participants had multiple symptoms recorded at the index date, the symptom profile could not be treated as a single categorical variable and was instead modelled by treating each symptom as its own binary variable. The receiver operating characteristic (ROC) area under the curve (AUC) was estimated with 95% confidence intervals (CIs) for each candidate integrated risk model to measure overall diagnostic performance. Diagnostic performance was estimated at predicted-risk thresholds of 1, 2, 3, 4 and 5%; 3% is the current NICE threshold for investigation in guidance NG12 [11], although a reduction to 2% is under consideration [27]. Patients have reported that they would prefer to be investigated at risk thresholds as low as 1% [28]. The study was reported in line with STROBE guidelines [29].
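One way to organise the candidate models is to encode each symptom as its own 0/1 indicator and then enumerate every subset of the optional predictors alongside the GRS. This sketch is illustrative only: the symptom variable names are hypothetical, treating the symptom indicators as a single block is our reading of the text, and the optional terms are taken as age and symptoms because age was the only baseline covariate associated at P < 0.05.

```python
from itertools import combinations
from typing import Dict, List, Sequence

# Hypothetical indicator names for the symptoms used to define the cohort
SYMPTOMS = ["incontinence", "nocturia", "hesitancy", "frequency", "urgency",
            "retention", "poor_stream", "double_voiding", "luts_general"]


def encode_symptoms(recorded: Sequence[str]) -> Dict[str, int]:
    """One binary indicator per symptom, so men with several symptoms are represented."""
    present = set(recorded)
    return {s: int(s in present) for s in SYMPTOMS}


def candidate_models(optional_terms: Sequence[str]) -> List[List[str]]:
    """Every subset of the optional terms, each added to the GRS (always included)."""
    models = []
    for k in range(len(optional_terms) + 1):
        for subset in combinations(optional_terms, k):
            terms = ["grs"]
            for term in subset:
                # "symptoms" expands into its full set of binary indicators
                terms.extend(SYMPTOMS if term == "symptoms" else [term])
            models.append(terms)
    return models


for terms in candidate_models(["age", "symptoms"]):
    print(" + ".join(terms))
```

Each printed term list would then correspond to one logistic regression fit (for example with `glm(..., family = binomial)` in R), whose AUC can be compared across the candidate models.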

Preprint
A previous version of this manuscript was published as a preprint [30].

Cohort description
Of the 179,308 unrelated white European men in UKBB, 82,604 had linked GP records, of whom 6930 had a record of relevant symptoms. Of these, 153 had evidence of prostate cancer before the first symptom record and were excluded. Of the 6777 without pre-existing prostate cancer, 247 had a record of prostate cancer within 2 years (3.5%) and were included as cases, of whom 5 (2%) died during the 2-year period. Of the remaining 6530, 62 (0.9%) died during the 2-year follow-up and were excluded from case-control analyses, leaving 6468 controls. Cases made up 3.7% of those included in the model. Figure 1 shows how the case and control numbers were obtained. Over 75% of the cohort were included following reports of LUTS, nocturia or frequency (Supplementary Table 2).
Those who went on to develop prostate cancer tended to be older, but no other covariates were significantly associated at P < 0.05 (Table 1).

A GRS predicts prostate cancer in men with symptoms
In men with symptoms, the prostate cancer genetic risk score was associated with a prostate cancer diagnosis within the next 2 years. In the 247 men with a prostate cancer diagnosis within 2 years of symptoms, the mean GRS was 23.52 (SD 0.81) vs 22.92 (SD 0.79) in the 6468 men who were not diagnosed with prostate cancer (OR per SD increase in GRS = 2.12 [1.86-2.41], P = 3.5e-30). Supplementary Fig. 1 shows the distributions of the genetic risk score in men who were diagnosed with cancer within 2 years of symptom onset vs those who were not.
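As a rough consistency check on the per-SD odds ratio, suppose the GRS is normally distributed with a common SD in cases and controls. The logistic slope per unit of GRS is then (mean difference)/SD², so the odds ratio per SD is exp(mean difference / SD). This sketch uses the group means reported above and, as an assumption, the control-group SD of 0.79 in place of the SD actually used for standardisation (the function name is hypothetical):

```python
import math


def per_sd_odds_ratio(mean_cases: float, mean_controls: float, sd: float) -> float:
    """Per-SD odds ratio implied by a binormal, equal-variance model.

    If the score is normal with common SD in cases and controls, the logistic
    slope per unit is (mean_cases - mean_controls) / sd**2, so the log-odds
    change per SD is (mean_cases - mean_controls) / sd.
    """
    return math.exp((mean_cases - mean_controls) / sd)


# Group means and the control-group SD reported above
approx_or = per_sd_odds_ratio(23.52, 22.92, 0.79)
print(f"approximate OR per SD: {approx_or:.2f}")
```

This gives roughly 2.1, close to the fitted estimate of 2.12, so the reported effect size is in line with the observed separation of the two GRS distributions.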
Prostate cancer incidence over time, stratified by GRS quintile, is shown in Fig. 2. Individuals with relevant symptoms in the top quintile of the GRS had an 8.8% (7.3-10%) chance of being diagnosed with prostate cancer by the end of the 2-year period, while individuals in the lowest quintile had a 1% (0.59-1.8%) chance. In a Cox proportional hazards model, the GRS had a hazard ratio of 2.06 (1.82-2.33) per SD increase in GRS (P = 1.5e-31).

An integrated risk model of GRS and age has predictive power over and above GRS alone
An integrated risk model including GRS and age returned a ROC AUC of 0.772 (95% CI 0.744-0.8; ROC curve shown in Supplementary Fig. 2). Predicted probabilities of 2-year prostate cancer incidence and diagnostic accuracy statistics are reported in Table 2 at thresholds of 1, 2, 3, 4 and 5%, in addition to the probability threshold that maximises Youden's J statistic (3.7%). The integrated risk model had a negative predictive value of greater than 99% at thresholds of 2% or less.
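The diagnostic accuracy statistics reported in Table 2 can be sketched from predicted probabilities and observed outcomes. In this illustration (toy data, hypothetical function names) the AUC is computed as the Mann-Whitney probability that a randomly chosen case has a higher predicted risk than a randomly chosen control, a predicted risk at or above the threshold is assumed to mean "investigate", and Youden's J is maximised over the observed predicted risks; confidence intervals are not shown.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """ROC AUC as P(random case scores higher than random control); ties count 1/2."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def metrics_at(threshold: float, probs: Sequence[float], outcomes: Sequence[int]) -> Dict[str, float]:
    """Treat predicted risk >= threshold as 'investigate' and tabulate accuracy."""
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, outcomes) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, outcomes) if p < threshold and y == 0)
    nan = float("nan")  # returned when a denominator is empty
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
    }


def youden_threshold(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Observed predicted risk maximising J = sensitivity + specificity - 1."""
    def youden_j(t: float) -> float:
        m = metrics_at(t, probs, outcomes)
        return m["sensitivity"] + m["specificity"] - 1
    return max(sorted(set(probs)), key=youden_j)


# Toy predicted risks and outcomes (illustrative, not study data)
probs = [0.01, 0.02, 0.03, 0.04, 0.06, 0.08, 0.10]
outcomes = [0, 0, 0, 1, 0, 1, 1]
print(auc(probs, outcomes))
for t in (0.01, 0.02, 0.03, 0.04, 0.05):
    print(t, metrics_at(t, probs, outcomes))
print(youden_threshold(probs, outcomes))
```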
In Table 3, 2-year incidence rates of prostate cancer are stratified by age decade and GRS quintile. An incidence of <1% was observed in men aged 40 years and under in the bottom four GRS quintiles, and in men aged 40-50 years in the bottom two GRS quintiles. Men aged 70 years and over had an incidence of >1% in every GRS quintile, while men over 60 in the top GRS quintile had an incidence of >10%.
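The stratified incidence in Table 3 amounts to a cross-tabulation of cases over men within each age decade and GRS quintile cell. A minimal sketch, assuming quintiles are defined within the symptomatic cohort, ages are binned by decade (the study's exact band boundaries may differ) and crude proportions are acceptable; it ignores censoring, which a time-to-event estimate such as that in Fig. 2 would handle. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def quintiles(values: Sequence[float]) -> List[int]:
    """Quintile 1 (lowest) to 5 (highest) by rank; tied values split by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank * 5 // len(values) + 1
    return result


def stratified_incidence(ages: Sequence[float], grs: Sequence[float],
                         outcomes: Sequence[int]) -> Dict[Tuple[int, int], float]:
    """Crude 2-year incidence (cases / men) per (age decade, GRS quintile) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, men]
    for age, quint, y in zip(ages, quintiles(grs), outcomes):
        cell = (int(age // 10) * 10, quint)  # e.g. age 64 -> decade starting at 60
        counts[cell][0] += y
        counts[cell][1] += 1
    return {cell: cases / men for cell, (cases, men) in sorted(counts.items())}


# Toy cohort of five men (illustrative, not study data)
table = stratified_incidence([45, 45, 65, 65, 65], [1.0, 2.0, 3.0, 4.0, 5.0], [0, 0, 0, 1, 1])
print(table)
```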

DISCUSSION
This study is the first to demonstrate that genetic risk scores can improve the selection of men for suspected prostate cancer investigation in primary care, over and above presenting clinical features. NICE guidance NG12 proposes that any combination of clinical features representing a ≥3% chance of cancer should be investigated [11], although a reduction to 2% is under consideration to improve cancer outcomes [27]. The integrated risk model presented in this study could be used to stratify men with LUTS above and below these thresholds. All individuals in the lower 3 GRS quintiles (60% of men in UKBB with symptoms) could potentially be managed in primary care, avoiding referral. Individuals in the lower 2 quintiles of GRS (40%) could avoid referral under the proposed 2% threshold. Using the proposed 2% threshold, the integrated risk model suggests that GRS quintiles 1-4 in men aged under 60 years, and quintile 1 in men aged 60-70 years, could be excluded from referral.

Limitations
This analysis was limited to men of white European ancestry due to the lack of ethnic diversity in UKBB; this is a substantial limitation, as Black men are twice as likely to be diagnosed with, and suffer worse outcomes from, prostate cancer [31].    [14]. However, this could also result in an underestimate of the true predictive value of GRS in symptomatic men. This study examines men in UKBB with a code for LUTS, who may not represent all men seeing their GP with such symptoms. There is also a lack of standardised follow-up across the cohort. The UK Biobank's cancer registry data contain only diagnosis data from Hospital Episode Statistics (HES) and GP records, precluding us from studying tumour aggressiveness. A complete model of genetic susceptibility to prostate cancer would also include high-penetrance rare variants, which are not included in the selected GRS.

Comparison to the existing literature
The performance of the integrated risk model is similar to the diagnostic accuracy of PSA as reported in the literature: AUC 0.72 (95% CI 0.68-0.76) [12]. We hypothesise that the optimal predictive model would incorporate PSA, GRS, and other clinical features. Oto [14].

Clinical implications
This work has significant implications for the suspected prostate cancer investigation pathway in UK primary care. With the integration of GRS into routine clinical care, men identified as being at the greatest risk of prostate cancer could be prioritised for investigation, resulting in expedited diagnosis. The best available evidence supports the position that cancer diagnosis at an earlier disease stage is beneficial for survival [34]. Conversely, those identified as being at a very low risk of cancer by the integrated risk model could be managed in primary care and avoid invasive investigations, reducing patient harm and reducing demand on secondary care services. The ideal place for an integrated risk model in primary care would be as a stratification tool to support GP decision-making for patients with LUTS, perhaps in deciding when to offer a PSA test. We have shown that, under the proposed 2% threshold, 40% of men with LUTS could avoid investigation for suspected prostate cancer. Genetic sequencing is not currently available in UK primary care, but current trends suggest that it will become part of routine practice in the future. The NHS will be the first national health care system to offer whole genome sequencing as part of routine care [

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
All data in this project were part of the UK Biobank resource and were accessed under application number 74981. Information on how to access the UK Biobank can be found at https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access.

CODE AVAILABILITY
All code used to generate results for this study can be found on the author's GitHub page: https://github.com/hdg204/ProstateCancer.