Prospective investigation of risk factors for prostate cancer in the UK Biobank cohort study

Background: Prostate cancer is the most common cancer in British men but its aetiology is not well understood. We aimed to identify risk factors for prostate cancer in British males. Methods: We studied 219 335 men from the UK Biobank study who were free from cancer at baseline. Exposure data were collected at recruitment. Prostate cancer risk by the different exposures was estimated using multivariable-adjusted Cox proportional hazards models. Results: In all, 4575 incident cases of prostate cancer occurred during 5.6 years of follow-up. Prostate cancer risk was positively associated with the following: black ethnicity (hazard ratio black vs white=2.61, 95% confidence interval=2.10–3.24); having ever had a prostate-specific antigen test (1.31, 1.23–1.40); being diagnosed with an enlarged prostate (1.54, 1.38–1.71); and having a family history of prostate cancer (1.94, 1.77–2.13). Conversely, Asian ethnicity (Asian vs white hazard ratio=0.62, 0.47–0.83), excess adiposity (body mass index (⩾35 vs <25 kg m−2=0.75, 0.64–0.88) and body fat (⩾30.1 vs <20.5%=0.81, 0.73–0.89)), cigarette smoking (current vs never smokers=0.85, 0.77–0.95), having diabetes (0.70, 0.62–0.80), and never having had children (0.89, 0.81–0.97) or sexual intercourse (0.53, 0.33–0.84) were related to a lower risk. Conclusions: In this new large British prospective study, we identified associations with already-established, putative and possible novel risk factors for being diagnosed with prostate cancer. Future research will examine associations by tumour characteristics.

The UK Biobank cohort is an important new resource for the study of cancer aetiology. We report here the first results from this cohort on the association between prostate cancer incidence and potential risk factors, including socio-demographic, anthropometric and lifestyle factors, health status, prostate-specific factors prior to the recruitment, sexual history, early life characteristics, hair colour, and balding pattern. We also tested whether these associations vary by time to diagnosis.

MATERIALS AND METHODS
Study design. The UK Biobank is a prospective study designed to be a resource for research into the causes of disease in middle and old age. The study protocol and information about data access are available online (http://www.ukbiobank.ac.uk/wp-content/uploads/2011/11/UK-Biobank-Protocol.pdf) and more details of the recruitment and study design have been published elsewhere (Sudlow et al, 2015). In brief, all participants were registered with the UK National Health Service (NHS) and lived within ~25 miles (40 km) of one of the assessment centres. The UK Biobank invited ~9.2 million people to participate through postal invitation with a telephone follow-up, with a response rate of 5.7%. A total of 503 317 men and women aged 40-69 years were recruited in 22 assessment centres across England, Wales and Scotland, between 2006 and. In all, 608 participants have subsequently withdrawn from the study and their data were not available for analysis. The UK Biobank study was approved by the North West Multi-Centre Research Ethics Committee (reference number 06/MRE08/65), and at recruitment all participants gave informed consent to participate in UK Biobank and be followed-up, using a signature capture device.
After excluding 9835 men with prevalent cancer (except C44: non-melanoma skin cancer), and 2 men censored on entry day, these analyses included a total of 219 335 men (Supplementary Figure 1).
Exposure assessment. Participants provided detailed self-reported data via a touch screen questionnaire and a verbal interview with a trained nurse at the assessment centres at baseline (Sudlow et al, 2015), and a wide range of physical measurements (e.g., body mass index (BMI) and bioimpedance) and biological samples were collected (Sudlow et al, 2015). Information about the assessment procedure is available at http://www.ukbiobank.ac.uk/.
Exposure data included information on socio-demographic factors (region, Townsend deprivation index, education level, ethnicity, employment, and living with a wife or partner), anthropometric measurements (standing height, weight, BMI, percentage body fat, waist and hip circumferences, waist to hip ratio (WHR) (UK-Biobank, 2014)), lifestyle characteristics (smoking status, alcohol consumption, and physical activity), health-related factors (vasectomy, hypertension, and diabetes), prostate-specific factors prior to recruitment (PSA test, enlarged prostate, and family history of prostate cancer), sexual history (number of children, age at first sexual intercourse, lifetime heterosexual partners, same-sex intercourse, and lifetime number of same-sex partners), early life factors (puberty as defined by age of first facial hair, relative age voice broke, and comparative body size and height at age 10 years), and hair colour and balding pattern. Detailed information regarding how these variables were collected is given in the Supplementary Methods.
Outcome assessment. Men were followed-up until the censoring date (30 September 2014 in England and Wales, and 31 December 2014 in Scotland) via record linkage to the NHS Central Register, which provides information on cancer registrations and deaths.
The end point included in these analyses is first diagnosis of prostate cancer (International Classification of Diseases Tenth revision codes: C61; (WHO, 2010)) or death from prostate cancer, whichever was first. Person-years were calculated from the date of recruitment to the date of cancer registration (first malignant neoplasm, except non-melanoma skin cancer (ICD-10 C44)), death, or the censoring date, whichever occurred first.
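The person-year rule described above can be illustrated with a minimal sketch. It assumes one record per man with a recruitment date and optional cancer-registration and death dates; the function name, the region labels and the 365.25-day year are illustrative choices, not details taken from the study.

```python
from datetime import date

# Censoring dates as stated in the text; the dictionary keys are illustrative labels.
CENSOR_DATES = {
    "england_wales": date(2014, 9, 30),
    "scotland": date(2014, 12, 31),
}

def follow_up(recruited, region, cancer_date=None, death_date=None):
    """Return (exit_date, person_years) for one participant.

    Follow-up ends at the earliest of first cancer registration,
    death, or the regional censoring date.
    """
    candidates = [CENSOR_DATES[region]]
    if cancer_date is not None:
        candidates.append(cancer_date)
    if death_date is not None:
        candidates.append(death_date)
    exit_date = min(candidates)
    # Assumption: years are approximated as days / 365.25.
    person_years = (exit_date - recruited).days / 365.25
    return exit_date, person_years

# A man recruited in 2008 and diagnosed in 2010 contributes about 2 person-years.
exit_date, years = follow_up(date(2008, 9, 30), "england_wales",
                             cancer_date=date(2010, 9, 30))
```

An event dated after the censoring date does not extend follow-up, because the minimum of the candidate dates is taken.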
Statistical analysis. Cox proportional hazards models were used to calculate hazard ratios (HRs) and 95% confidence intervals (CIs) for prostate cancer risk, using age as the underlying time variable. All analyses were stratified by geographical region of recruitment (10 UK regions, except for when region was the main exposure of interest) and age (<45, 45-49, 50-54, 55-59, 60-64, ⩾65 years) at recruitment. Exposure variables (socio-demographic factors, anthropometric measurements, lifestyle characteristics, health status, prostate-specific factors prior to recruitment, sexual history, early life factors, and hair colour and balding pattern) were divided into categories based on their distribution at baseline in the whole cohort (categories for each exposure are explained in detail in Supplementary Methods). Missing and/or unknown values of the exposure of interest were not included in the analyses, but missing and/or unknown values were assigned to a separate category when the variable was included as a covariate. As these are the first analyses on the association between potential risk factors and incidence of total prostate cancer in UK Biobank, potential confounders were first identified a priori based on possible risk factors for prostate cancer that have some support in the literature (Sutcliffe & Colditz, 2013; Cuzick et al, 2014; WCRF/AICR, 2014; Rider et al, 2016).
Minimally adjusted Cox regression models were performed to identify statistically important covariates from the a priori potential confounders. On the basis of results from the minimally adjusted Cox regression analyses, the multivariable-adjusted model was additionally adjusted for Townsend deprivation score (fifths, unknown (0.1%)), ethnicity (white, mixed background, Asian, black, other, and unknown (0.7%)), lives with a wife or partner (no, yes), BMI (<25, ⩾25-<30, ⩾30-<35, ⩾35 kg m−2, and unknown (0.6%)), cigarette smoking (never, former, current, and unknown (0.7%)), physical activity (low (0-<10 metabolic equivalents (METs) per week), moderate (⩾10-<50 METs per week), high (⩾50 METs per week), and unknown (3.7%)), diabetes (no, yes, and unknown (0.6%)), enlarged prostate (no or unknown, and yes), and family history of prostate cancer (no, yes (brother or father), and unknown (45.1%)). For each adjustment variable, missing values were assigned to a separate category. Body mass index was not included in the multivariable-adjusted model when fat mass, waist circumference, and WHR were the main exposure variables.
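As an illustration of how an adjustment variable with a separate category for missing values might be coded, the sketch below bins BMI using the cut-points listed above. The function name and the string labels are hypothetical and are not taken from the study's analysis code.

```python
def bmi_category(bmi):
    """Assign a BMI value (kg/m^2) to an adjustment category.

    Missing values (None) get their own "unknown" category rather
    than being dropped, mirroring the covariate handling described
    in the text.
    """
    if bmi is None:
        return "unknown"
    if bmi < 25:
        return "<25"
    if bmi < 30:
        return "25-<30"
    if bmi < 35:
        return "30-<35"
    return ">=35"

# Example: categorise a small list that includes one missing value.
categories = [bmi_category(b) for b in [22.4, 27.8, None, 36.1]]
```

Keeping "unknown" as its own level retains men with missing covariate data in the model, at the cost of treating missingness as if it were an informative category.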
P-values from the multivariable-adjusted model for the association of each exposure with prostate cancer risk were calculated as follows: P-values for likelihood ratio tests for variables with more than two categories (categorical variables) were obtained comparing the model with and without the variable of interest; P-values for dichotomous variables were obtained comparing the reference category to the other category in the model; and P-values for trend were obtained using a pseudo-continuous variable equal to the median value in each category for continuous variables. The proportional hazards assumption was tested using time-varying covariates and Schoenfeld residuals and revealed no evidence of deviation from the proportional hazards assumption.
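The pseudo-continuous variable used for the trend tests can be sketched as follows: each participant's value is replaced by the median of the category it falls in, and that score would then be entered in the model as a single continuous term. The helper name and the example cut-points are illustrative assumptions; the model fitting itself is not shown.

```python
from statistics import median

def trend_scores(values, cut_points):
    """Replace each value by the median of its category.

    cut_points are ascending lower bounds of the second and later
    categories, e.g. [25, 30, 35] gives <25, 25-<30, 30-<35, >=35.
    """
    def category(v):
        # Number of cut-points at or below v = index of the category.
        return sum(v >= c for c in cut_points)

    groups = {}
    for v in values:
        groups.setdefault(category(v), []).append(v)
    medians = {k: median(g) for k, g in groups.items()}
    return [medians[category(v)] for v in values]

# Toy BMI values; real medians would come from the cohort distribution.
scores = trend_scores([22, 24, 26, 28, 29, 31, 36], [25, 30, 35])
```

Using category medians rather than ordinal codes (1, 2, 3, ...) lets the spacing between categories reflect the actual exposure distribution.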
Sensitivity analyses were performed to test for heterogeneity in the associations with risk by time between recruitment and diagnosis (<2 and ⩾2 years) to examine whether there were associations for cancers diagnosed shortly after recruitment, which could indicate reverse causality. For this purpose, we fitted stratified Cox models based on competing risks and compared the risk coefficients and s.e.'s in the subgroups of interest (<2 or ⩾2 years between recruitment and diagnosis). All analyses were performed using Stata version 14.1 (Stata Corporation, College Station, TX, USA), all tests of significance were two-sided, and P-values <0.05 were considered statistically significant.

RESULTS
Participants' characteristics.
A total of 4575 men were diagnosed with prostate cancer after a mean 5.6 years (s.d., 1.0 years) follow-up. Table 1 shows the characteristics of the study population at baseline. The mean age at recruitment was 56.5 years (s.d., 8.2 years) and the mean BMI was 27.8 kg m−2 (s.d., 4.2). Among all participants, 12.4% reported that they were current cigarette smokers, and 43.2% reported drinking at least 20 g of alcohol per day. Physical inactivity was reported by 27.6% of men. Diabetes was reported by 6.9% of men. Regarding prostate-specific factors prior to recruitment, 27.6% of men reported having had at least one PSA test and 7.5% of men had a family history of prostate cancer.
Table 2 shows the HRs of prostate cancer in relation to socio-demographic factors, anthropometric factors, and lifestyle factors before and after adjusting for multiple factors. There were no marked differences between the minimally and multivariable-adjusted models. After multivariable adjustment, men living in North-West England, North-East England, Yorkshire & the Humber, and in South-West England were significantly less likely to be diagnosed with prostate cancer than men living in London. There was no evidence for an association between living in a deprived area, having higher education, being unemployed, or living with a partner and prostate cancer risk. Compared to men of white ethnicity, Asians had a lower risk (HR=0.62, 95% CI 0.47-0.83) and black men had a higher risk (2.61, 2.10-3.24) of prostate cancer.
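To give a sense of scale, the case count and mean follow-up reported above imply a crude incidence rate that can be approximated as below. This is a back-of-envelope sketch, not a figure from the paper: it assumes total person-years are roughly the number of men multiplied by the mean follow-up, and the function name is illustrative.

```python
def crude_rate_per_100k(cases, participants, mean_follow_up_years):
    """Approximate crude incidence rate per 100 000 person-years."""
    # Assumption: person-years ~= participants x mean follow-up.
    person_years = participants * mean_follow_up_years
    return cases / person_years * 100_000

# Figures reported in the text: 4575 cases, 219 335 men, mean 5.6 years.
rate = crude_rate_per_100k(4575, 219_335, 5.6)
```

With these inputs the rate comes out at roughly 370 per 100 000 person-years; the exact value would depend on the true person-year total, which is not given in the text.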
Height was not associated with prostate cancer risk in the multivariable-adjusted model (Table 2). Obesity (BMI ⩾30-<35 vs <25 kg m−2: HR=0.88, 95% CI 0.81-0.97) and morbid obesity (BMI ⩾35 vs <25 kg m−2=0.75, 0.64-0.88), high body fat percentage (HR for the highest vs the lowest fifth=0.81, 0.73-0.89), high waist circumference (HR for the highest vs the lowest fifth=0.90, 0.82-0.99), and high WHR (HR for the highest vs the lowest fifth=0.87, 0.79-0.96) were all significantly inversely associated with prostate cancer risk.
Among the lifestyle characteristics analysed in this study, compared with never smokers, current cigarette smokers (0.85, 0.77-0.95) and former cigarette smokers (0.93, 0.88-0.99) had a significantly lower risk of prostate cancer, while no association with risk was observed for alcohol intake or physical activity.
Table 3 shows the HRs of prostate cancer in relation to health status and prostate-specific factors prior to recruitment, sexual history, early life factors, and hair colour and balding pattern. Men with a self-reported diagnosis of diabetes had a lower risk of incident prostate cancer (HR=0.70, 0.62-0.80). In contrast, men who had had a PSA test prior to recruitment (1.31, 1.23-1.40), had any first-degree family history of prostate cancer (1.94, 1.77-2.13), and who reported that they had been diagnosed with an enlarged prostate (1.54, 1.38-1.71) had an elevated risk of prostate cancer. Moreover, compared to men with no family history of prostate cancer, men with both their father and brother diagnosed with prostate cancer had an even higher risk of prostate cancer (3.35, 2.33-4.81).
For sexual history factors, men who had no children had a reduced prostate cancer risk (never vs ever, HR=0.89, 95% CI 0.81-0.97), as did men who reported they had never had sex (never vs ever, 0.53, 0.33-0.84). Other sexual history characteristics were not related to prostate cancer risk (Table 3).
Early life factors (relative age of first facial hair, relative age voice broke, and comparative body size and height at age 10) and hair colour and pattern were not associated with prostate cancer risk (Table 3).
For most factors, there was no significant heterogeneity in the association of prostate cancer risk according to time between recruitment and diagnosis (<2 and ⩾2 years) (Supplementary Table 1). However, there was evidence of heterogeneity by time to diagnosis for the association of prostate cancer risk with ethnicity (P heterogeneity=0.001; for black vs white, HR=3.94, 2.85-5.44 for men diagnosed within 2 years and HR=2.01, 1.49-2.70 for those diagnosed after 2 years), unemployment (P heterogeneity<0.001; HR=1.34, 1.18-1.51 in the first 2 years and HR=0.89, 0.82-0.97 after 2 years), hypertension (P heterogeneity=0.018; for yes vs no, HR=1.12, 1.00-1.24 in the first 2 years and HR=0.95, 0.89-1.02 after 2 years), having had a PSA test prior to recruitment (P heterogeneity=0.003; HR=1.51, 1.35-1.70 in the first 2 years and 1.23, 1.14-1.33 after 2 years), and having had enlarged prostate (P heterogeneity<0.001; HR=2.16, 1.82-2.56 in the first 2 years and 1.27, 1.10-1.46 after 2 years; Supplementary Table 1).
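One simple way to see how two subgroup estimates can be compared is a Wald-type test on the difference in log hazard ratios, with standard errors recovered from the reported 95% CIs. The sketch below applies this to the black vs white HRs quoted above. It is an approximation for illustration only: it is not the competing-risks stratified Cox approach used in the paper, it assumes the two estimates are independent, and the function names are hypothetical.

```python
import math

def log_hr_and_se(hr, lower, upper, z_crit=1.96):
    """Recover log(HR) and its standard error from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return math.log(hr), se

def heterogeneity_z_test(est_a, est_b):
    """Two-sided Wald test for a difference between two log HRs.

    Assumes the two estimates are independent (an approximation).
    """
    (b_a, se_a), (b_b, se_b) = est_a, est_b
    z = (b_a - b_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided normal P-value via the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Black vs white ethnicity: <2 years vs >=2 years between recruitment and diagnosis.
early = log_hr_and_se(3.94, 2.85, 5.44)
late = log_hr_and_se(2.01, 1.49, 2.70)
z, p = heterogeneity_z_test(early, late)
```

Because the published P-value for ethnicity covers all ethnic categories jointly and comes from a different model, this pairwise approximation is not expected to reproduce it exactly.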

DISCUSSION
Here we report results on associations with established and putative risk factors for prostate cancer in a large prospective study of British men. We found that black ethnicity and having previously had a PSA test, an enlarged prostate, or a family history of prostate cancer were positively associated with prostate cancer risk. The risk of being diagnosed with prostate cancer was lower in those who were of Asian ethnic origin, and in men who had obesity, smoked cigarettes, had diabetes, and had never had sex. Time to diagnosis was not a strong modifier of these associations, although men who had had a PSA test prior to recruitment were [...]
[...] A cross-sectional study within the Hospital Episodes Statistics database for England also showed a higher risk of prostate cancer in men with black ethnicity (Maruthappu et al, 2015).
Geographic differences in prostate cancer incidence rates have been observed in the United States (Cook et al, 2015), indicating that risk factors for prostate cancer occurrence and for diagnosis may vary geographically. In the current British study, there were differences in risk between certain geographical regions. It is possible that some of the regional differences might be due to differences in detection rates of asymptomatic prostate cancer (Littlejohns et al, 2016). Socioeconomic status, education level, employment status, and marital status (living with a wife or partner) were not associated with prostate cancer risk.
While we did not observe a significant association of prostate cancer risk with height, as seen in previous prospective studies (Pischon et al, 2008; WCRF/AICR, 2014), our results did show that men with higher BMI and fat mass percentage had a lower risk of prostate cancer. While two previous prospective studies have used fat mass measurement from bioimpedance measurements (477 (MacInnis et al, 2003) and 817 incident cases (Wallstrom et al, 2009), respectively), this is to our knowledge the first large prospective study using bioimpedance to estimate body composition. Previous prospective investigations have also found a link between excess adiposity, typically as estimated by BMI or waist circumference, and a lower risk of overall prostate cancer (Perez-Cornago et al, 2017; WCRF/AICR, 2014). It is possible that this association might be due to detection bias, as in this cohort men with obesity are less likely to have had a PSA test (Littlejohns et al, 2016). Moreover, previous studies have reported slightly lower PSA concentrations in men with high BMI (Bonn et al, 2016). A positive association between adiposity and risk for aggressive prostate cancer has also been observed in previous studies (WCRF/AICR, 2014), but data on stage and grade are not yet available in the UK Biobank cohort.
Findings from the present study of a nearly 15% reduced risk of prostate cancer in cigarette smokers compared to never smokers are consistent with results from a 2010 meta-analysis of 24 observational studies (Huncharek et al, 2010). However, men who were current smokers were markedly less likely to have had a PSA test than never smokers in UK Biobank (Littlejohns et al, 2016), and this association might therefore be due to detection bias.
In agreement with findings from a recent meta-analysis (WCRF/AICR, 2014), alcohol consumption was not related to prostate cancer risk in the current study. Total physical activity was also not associated with prostate cancer risk in the current study, whereas a recent meta-analysis of 46 890 incident cases showed that greater leisure-time physical activity was associated with a higher risk of prostate cancer (Moore et al, 2016), although the impact of detection bias on those results was not clear (i.e., the extent to which PSA testing is associated with health-conscious behaviour).
Health status, prostate-specific factors prior to recruitment, sexual history, early life factors, and hair colour and pattern. In agreement with previous studies (Byrne et al, 2017; Nayan et al, 2016), we found no association between vasectomy status and prostate cancer risk. In the current study, hypertension was not linked to prostate cancer risk, although there was some evidence that it was associated with an increased risk in the first 2 years of follow-up. A recent meta-analysis has suggested that hypertension may be related to prostate cancer incidence, but high heterogeneity among studies was noted (Liang et al, 2016), and more prospective data are needed. Our finding that diabetes was associated with a reduced risk of prostate cancer has been consistently reported in other cohort studies (Rodriguez et al, 2005; Tsilidis et al, 2015). It has been suggested that the inverse association between diabetes and prostate cancer risk might be due to lower circulating concentrations of IGF-I (Teppala & Shankar, 2010) and/or testosterone (Grossmann, 2011) or to potential anti-carcinogenic properties of diabetes medication (Wright and Stanford, 2009). Information was not available on diabetes type for the current analyses, but the majority of cases in this age group will be of type II diabetes (Kirkman et al, 2012).
Table footnotes:
a [...] 45-49, 50-54, 55-59, 60-64, and ⩾65 years) and adjusted for age (underlying time variable), use as appropriate.
b Multivariable-adjusted model: HRs are stratified by region and age at recruitment and adjusted for age (underlying time variable), Townsend deprivation score (fifths, unknown), ethnicity (white, mixed background, Asian, black, other, and unknown), lives with a wife or partner (no and yes), BMI (<25, ⩾25-<30, ⩾30-<35, ⩾35 kg m−2, and unknown), smoking (never, former, current, and unknown), physical activity (low (0-<10 METs per week), moderate (⩾10-<50 METs per week), high (⩾50 METs per week), and unknown), diabetes (no, yes, and unknown), enlarged prostate (no or unknown, and yes), and family history of prostate cancer (no, yes, and unknown), use as appropriate.
c P-values from the multivariable-adjusted model were calculated as follows: P-values for likelihood ratio test for variables with more than two categories were obtained comparing the model with and without the variable of interest; P-values for dichotomous variables were obtained comparing the reference category to the other category in the model; P-values for trend were obtained using a pseudo-continuous variable equal to the median value in each category for continuous variables.
Increases in the proportion of men undergoing PSA testing in the UK have led to a large increase in prostate cancer diagnoses over recent decades (Lilja et al, 2008), and as expected, history of having had a PSA test was positively associated with prostate cancer risk in our study. As expected, previous PSA testing was also more strongly associated with risk in the first 2 years of follow-up, owing to it being a first-line test in the diagnostic pathway for men with prostatic symptoms. Similarly, men with an enlarged prostate were also more likely to be diagnosed with prostate cancer, particularly within the first 2 years, suggesting increased likelihood of cancer detection following urological investigations.
This study found that having a first-degree relative with prostate cancer doubled the risk of prostate cancer, which is well-established, although to date only approximately one-third of this risk is explained by known genetic variants (Benafif & Eeles, 2016). The risk was even higher in men where both the father and the brother had prostate cancer, although the number of cases in this category was small (n=30). The remainder of the excess familial risk may be due to a combination of currently unidentified genetic variation, shared environmental factors, and differential detection in family members (e.g., health-seeking behaviours).
There are few prospective data on sexual history and prostate cancer risk (Rosenblatt et al, 2001). Our results show that men who reported never having had sex have a lower prostate cancer risk than men who had ever had sex. Moreover, men who had not had children had a lower risk compared to those who had had children. Men who have not had sex might have erectile dysfunction or low sexual interest, which are both conditions that have been linked to reduced circulating androgen levels (Isidori et al, 2005). Observational epidemiological studies to date have not shown an association between circulating testosterone and prostate cancer risk (Roddam et al, 2008), but more data are required to examine risk in men with very low circulating testosterone levels.
To date it is unclear whether early-life exposures are involved in prostate cancer aetiology (Sutcliffe & Colditz, 2013; Moller et al, 2015; Sarre et al, 2016), although it has been speculated that childhood body size or timing of puberty may be related to changes in prostate tissue in early adulthood (Sutcliffe & Colditz, 2013; Sarre et al, 2016). Mendelian randomisation studies have shown an association between genetically determined age at puberty and higher risk of aggressive prostate cancer (Bonilla et al, 2016). In the current study, however, we found no evidence for an association between a number of early life factors (relative age of first facial hair, relative age voice broke, and comparative height and body size at age 10) and the future risk of prostate cancer.
It has been hypothesised that pigmentation-related traits may influence prostate cancer risk, possibly through altered vitamin D synthesis, owing to a finding in the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study that men with naturally red hair (which is determined by polymorphisms in the melanocortin-1-receptor (MC1R) gene) had a lower risk of prostate cancer compared to men with light brown hair (Weinstein et al, 2013); however, no significant association between naturally red hair and prostate cancer risk was observed in the current study. Because of the influence of the active androgen dihydrotestosterone on both the growth of prostate cells and on androgenic alopecia, male-pattern balding has also been suggested as a possible risk factor for prostate cancer, although the findings are inconclusive (Muller et al, 2013; Zhou et al, 2015a, b; Sarre et al, 2016) and the current study found no association between balding pattern and prostate cancer risk.
Study strengths and limitations. This is, to our knowledge, the largest single prospective study of risk factors for prostate cancer in British men. The UK Biobank has collected detailed information on numerous possible prostate cancer risk factors, including risk factors that few previous studies have examined in detail, such as body fat mass, sexual history, early life factors, and hair colour and pattern. In particular, fat mass estimated using bioimpedance is a better marker of overall adiposity than BMI or waist circumference, which do not differentiate between muscle and fat mass. Moreover, many risk factors, such as adiposity measurements or blood pressure, were assessed by trained research clinic staff instead of being self-reported.
Despite the breadth of the exposure information collected at recruitment, we cannot exclude the possibility of residual confounding by unknown or unmeasured factors. In addition, because of the number of tests performed, some of the associations observed might be due to chance. For some of the rare exposures (e.g., red hair colour or never having had sexual intercourse), there are too few exposed cases for robust analysis. This cohort includes participants from multiple regions, including deprived areas, but it is not a representative sample of the whole UK population (Fry et al, 2017). However, it does include participants with a wide range of exposures for a comprehensive set of characteristics, allowing internally valid and informative comparisons of risk by factors of interest. Although the number of missing values in our cohort is low (<1%), there are some variables, such as family history of prostate cancer, that have a higher proportion of missing values. These values may not be missing at random; for example, participants with a family history of prostate cancer may have replied to this question, while participants with no family history may have left this question blank. Finally, risk factors for prostate cancer may differ by tumour characteristics, but data on tumour stage and Gleason grade were not available for the current analysis. We will perform analyses by subgroups of disease aggressiveness when these data become available.
We have reported a range of established and novel risk factors that are associated with subsequent prostate cancer risk in this large UK prospective study. In particular, black ethnicity, having had a PSA test, an enlarged prostate, and a family history of prostate cancer were positively associated with prostate cancer risk, while Asian ethnicity, obesity, smoking status, diabetes, and never having had children or sexual intercourse were related to a lower prostate cancer risk. Future research in UK Biobank will include analyses by disease aggressiveness to explore whether these associations are due to differences in the likelihood of being diagnosed and/or differences in the risk of developing clinically important prostate cancer.
Resource under Application Number 3282 and we express our gratitude to the participants and those involved in building the resource.