Introduction

Until recently, epidemiological studies have largely focused on the associations of physical activity behaviours with cancer risk [1]. Cardiorespiratory fitness (referred to here as ‘fitness’) is distinct from physical activity, as it describes the capacity of the circulatory and respiratory systems to supply oxygen to skeletal muscle during prolonged physical activity [2, 3]. Fitness is generally objectively measured and has a stronger genetic component than habitual physical activity [2,3,4].

Higher fitness is associated with good cardiometabolic health, including lower visceral adipose tissue and inflammation and higher insulin sensitivity, and may, therefore, reduce the risk of cancer [5,6,7,8]. Previous studies report that people with higher fitness have lower risks of all-cause mortality, cancer mortality and cardiovascular disease [5, 9,10,11], but the relationship between fitness and incident cancers is less clear. Some studies have reported inverse associations between fitness and lung and colorectal cancers [12,13,14,15,16,17], while for prostate cancer associations have been reported to be null or positive [13,14,15, 17,18,19,20]. Only one prior study has investigated associations between fitness and female-specific incident cancers, and it did not find evidence of a relationship [14].

A limitation of observational epidemiological studies is the possibility of residual confounding and reverse causation. Mendelian randomisation (MR) uses germline genetic variants as proxies for biological traits to generate instrumental variables and estimate their associations with disease risk. Because germline genetic variants are fixed and randomly allocated at conception, this technique may be less likely to be affected by biases and confounding factors (such as preclinical disease and smoking history). This is the first study to use MR to investigate fitness and cancer risk.

We aimed to assess the associations between measured fitness and risk of common cancers (lung, colon, rectal, endometrial, female breast, and prostate cancer) using observational methods in the UK Biobank. In secondary analyses, we used a two-sample MR framework, with genetically predicted fitness as an instrumental variable derived from UK Biobank [21] and genetic case control data from consortia for those same sites, plus pancreatic cancer and renal cell carcinoma, for which observational analyses in the UK Biobank are underpowered. By integrating evidence from both observational epidemiology and MR approaches, we aim to strengthen the basis for causal inference [22].

Methods

UK Biobank study population

The UK Biobank study is a population-based prospective cohort study of 502,625 adults aged 40 to 69 years. A description of the study protocol is available online [23]. Participants were registered with the UK National Health Service and lived within 40 km of a UK Biobank assessment centre in England, Wales, and Scotland. Baseline data were collected between 2006 and 2010. A repeat-measures substudy was conducted between 2012 and 2013.

UK Biobank cardiorespiratory fitness assessment

An individualised submaximal cycle ergometer test was implemented in 2009 and offered to 75,087 participants during baseline data collection, 17,109 participants during the repeat assessment study, and 2877 participants at both timepoints; 97,950 tests were offered in total. For those participants who were offered a test at both timepoints, the earliest fitness test completed by the participant was used to maximise follow-up duration. Participant baseline data were collected on the same day as their exercise test. The test was individualised to each participant’s exercise capacity and risk level for engaging in exercise. Participants with lower exercise capacity or higher risk for exercise-related complications were offered a test with lower work rates, while those with higher exercise capacity or lower risk were offered a test with higher work rates. A description of the exercise test individualisation process and maximal oxygen consumption (VO2 max; ml O2min−1kg−1) estimation process is provided in Supplementary Methods; the test protocol is available online [24]. VO2 max was estimated in two ways: scaled by total-body mass (VO2maxtbm [3.5 ml O2min−1kg−1 total-body mass=1 MET]) and scaled by fat-free mass (VO2maxffm) [25, 26]. VO2maxffm represents the ability of skeletal muscle to use oxygen during maximal exercise, whereas VO2maxtbm is more representative of aerobic performance capacity [27].
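As an illustration of the two scalings described above, the following minimal Python sketch converts an absolute oxygen uptake into VO2maxtbm, VO2maxffm and METs. The function names and the example participant values are hypothetical; only the conversion of 3.5 ml O2min−1kg−1 total-body mass to 1 MET is taken from the text, and the sketch does not reproduce the estimation framework in Supplementary Methods.

```python
# 3.5 ml O2 per min per kg total-body mass = 1 MET (stated in the text).
ML_O2_PER_MIN_PER_KG_PER_MET = 3.5


def vo2max_per_kg(absolute_vo2_ml_min: float, mass_kg: float) -> float:
    """Scale absolute VO2max (ml O2/min) by a mass term (kg)."""
    if mass_kg <= 0:
        raise ValueError("mass_kg must be positive")
    return absolute_vo2_ml_min / mass_kg


def to_mets(vo2max_tbm: float) -> float:
    """Convert VO2max per kg total-body mass into METs."""
    return vo2max_tbm / ML_O2_PER_MIN_PER_KG_PER_MET


# Hypothetical participant (illustrative values only, not study data).
absolute_vo2 = 2450.0    # ml O2/min
total_body_mass = 70.0   # kg
fat_free_mass = 50.0     # kg

vo2max_tbm = vo2max_per_kg(absolute_vo2, total_body_mass)  # per kg total-body mass
vo2max_ffm = vo2max_per_kg(absolute_vo2, fat_free_mass)    # per kg fat-free mass
mets = to_mets(vo2max_tbm)

print(f"VO2maxtbm = {vo2max_tbm:.1f}, VO2maxffm = {vo2max_ffm:.1f}, METs = {mets:.1f}")
```

Because fat-free mass is always smaller than total-body mass, VO2maxffm is numerically larger than VO2maxtbm for the same participant, which is why the two scalings use different increments in the risk models.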

Genetic instrument for cardiorespiratory fitness

Full details of the fitness genome-wide association study (GWAS) are available elsewhere [21]. In brief, single nucleotide polymorphisms (SNPs) associated with fitness were identified from a GWAS based on UK Biobank participants of European ancestry who participated in the fitness test (N included = 69,416). Fitness was estimated using the same framework method described above, scaled by fat-free mass and using resting heart rate data from the full cohort, excluding those taking beta-blockers (N included = 452,941) (P < 5 × 10−8 significance threshold).

The Radial plot method was used to select eligible resting heart rate-associated genetic variants for fitness by removing heterogeneous outliers, of which 149 were also nominally significant in the fitness GWAS (p < 0.05) [28]. The genetic instrument for fitness included 14 fitness and 149 fitness and resting heart rate variants with prioritisation given to the variants identified in the fitness GWAS. In total, 160 independent (r² < 0.01) genetic variants were included in our instrument for fitness [21].

Cancer ascertainment

Observational analysis

Cancer registration data were provided via record linkage to national cancer and death registries, until the following censoring dates: 31 July 2019 in England and Wales and 31 October 2015 in Scotland. Cancers occurring after the registry censoring dates were identified using Hospital Episode Statistics (HES), until the following censoring dates: 30 September 2021 in England, 31 July 2021 in Scotland and 28 February 2018 in Wales (see Supplementary Methods for cancer site definitions).

Of the 84,792 fitness tests analysed after preliminary exclusions (i.e., participant withdrawal of data, ‘high risk’ for exercise; see Supplementary Fig. 1), we retained a preliminary analytic sample of 79,303 participants after additionally excluding 3209 participants for missing data, 1017 due to test data quality, 1219 with missing weight, fat-free mass, or heart rate, and 44 for whom fitness estimation could not be applied. We then excluded 5180 participants with prevalent cancer at baseline and 1551 participants diagnosed with cancer within two years of follow-up. The final analytic sample was 72,572 participants. Health and sociodemographic characteristics were described across age-adjusted and sex-specific fitness tertiles [29].

Genetic cancer data

Risk estimates may be biased when instrumental variables and outcomes are identified from the same sample [30]. We, therefore, used independent GWAS data from international consortia. These included breast (including estrogen receptor (ER)+ and ER− subtypes) [31, 32], prostate (including aggressive disease) [33], endometrial [34], ovarian [35], lung (including for never smokers) [36], and colorectal cancer (including colon, rectal, male colorectal and female colorectal, distal colon and proximal colon) [37, 38]. We also included pancreatic cancer and renal cell carcinoma [39,40,41,42]. Included sites and subtypes were chosen based on data availability. Further information for the genetic case control studies is available in Supplementary Table 1.

Statistical analysis

Observational analysis

Cox regression models with age as the underlying timescale were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) per 3.5 ml O2min−1kg−1 total-body mass and 5.0 ml O2min−1kg−1 fat-free mass for risk of cancer diagnosis. Models were adjusted for possible confounding factors, and for female reproductive cancers (breast, endometrial and ovarian cancers) we additionally adjusted for reproductive factors (see Supplementary Methods). Multivariate imputation by chained equations was used to impute missing covariate values.
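To make the reporting units concrete, the sketch below shows how a log hazard ratio estimated per 1 ml O2min−1kg−1 can be rescaled to the 3.5 or 5.0 unit increments used here. It is a minimal illustration under stated assumptions: the coefficient and standard error are hypothetical, the function name is not from the study, and a normal approximation on the log scale is assumed for the confidence interval.

```python
import math


def hr_per_increment(beta_per_unit: float, se_per_unit: float,
                     increment: float, z: float = 1.96):
    """Rescale a Cox log hazard ratio (per 1 unit of exposure) to a
    chosen exposure increment; returns (HR, lower 95% CI, upper 95% CI)."""
    log_hr = beta_per_unit * increment
    se = se_per_unit * increment  # the SE scales linearly with the increment
    return (math.exp(log_hr),
            math.exp(log_hr - z * se),
            math.exp(log_hr + z * se))


# Hypothetical coefficient and standard error (not study results).
beta, se = -0.02, 0.005

hr_tbm = hr_per_increment(beta, se, increment=3.5)  # per 1 MET, total-body mass scaling
hr_ffm = hr_per_increment(beta, se, increment=5.0)  # per 5.0 units, fat-free mass scaling

print("per 3.5 units: HR = %.3f (%.3f to %.3f)" % hr_tbm)
print("per 5.0 units: HR = %.3f (%.3f to %.3f)" % hr_ffm)
```

Rescaling changes only the units in which the association is expressed; the underlying model fit and its statistical significance are unchanged.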

Adiposity may partially mediate and confound the relationship between fitness and cancer risk (Supplementary Fig. 2). Therefore, we evaluated the role of adiposity in fitness-to-cancer associations both with and without adjustment for either BMI (for models with VO2max scaled by total-body mass) or fat mass (for models with VO2max scaled by fat-free mass).

We have shown previously that repeat assessments of the UK Biobank fitness test elicit moderately stable fitness estimates (regression dilution ratio = 0.79, standard error = 0.01) [43]. This source of measurement error will influence the strength of observed health associations. Therefore, in a sensitivity analysis, we also provide regression dilution calibrated estimates of fitness-to-cancer associations using established statistical techniques [44]. The shape of dose-response relationships between fitness and risk of cancer diagnosis was investigated using cubic spline regression models. Each model used two knots placed at the 33rd and 67th percentiles of the fitness distribution. Reference values were set to the mean fitness value for each specific analysis (see Supplementary Figs. 3 and 4).
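The regression dilution calibration mentioned above can be sketched with the simple ratio method, in which the log hazard ratio is divided by the regression dilution ratio. This is an assumption made for illustration: the exact technique in [44] may differ, the observed HR and CI below are hypothetical, and the sketch ignores the uncertainty in the ratio itself (standard error = 0.01).

```python
import math

REGRESSION_DILUTION_RATIO = 0.79  # reported in the text [43]


def calibrate_hr(hr: float, ci_lower: float, ci_upper: float,
                 rdr: float = REGRESSION_DILUTION_RATIO):
    """Correct an HR and its CI for regression dilution by dividing the
    log-scale estimates by the regression dilution ratio (ratio method)."""
    if not 0 < rdr <= 1:
        raise ValueError("rdr must lie in (0, 1]")

    def correct(value: float) -> float:
        return math.exp(math.log(value) / rdr)

    return correct(hr), correct(ci_lower), correct(ci_upper)


# Hypothetical observed association (not a study result).
observed = (0.94, 0.90, 0.98)
calibrated = calibrate_hr(*observed)

print("observed:   HR = %.3f (%.3f to %.3f)" % observed)
print("calibrated: HR = %.3f (%.3f to %.3f)" % calibrated)
```

Because the ratio is below 1, calibration moves estimates further from the null and widens the confidence interval on the log scale, while a null estimate (HR = 1) is left unchanged.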

Sensitivity analysis

Associations for colorectal cancer were examined by sex, and associations for fitness and lung cancer were re-examined after restricting the analysis to never-smokers. Subgroups were chosen a priori on the basis of data availability and previous evidence for heterogeneity in the associations [1]. We also included a minimally adjusted model to investigate the influence of mediators and/or confounders.

Mendelian randomisation

The MR estimation for fitness and cancer was conducted using the inverse-variance weighted (IVW) method [45]. We additionally calculated the I2GX statistic to assess measurement error in SNP-exposure associations [33], the F-statistic to examine the strength of the genetic instrument [46], and Cochran’s Q statistic for heterogeneity between the MR estimates for each SNP [47]; PhenoScanner was used to assess pleiotropy of the genetic instruments [48]. As sensitivity analyses, we used the MR pleiotropy residual sum and outlier (MR-PRESSO) method to investigate the role of SNP outliers [49]. To assess pleiotropy, we used the weighted median and contamination mixture methods [50].
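The quantities named above can be illustrated with a short, dependency-free Python sketch. It assumes a fixed-effect IVW estimator built from per-SNP Wald ratios with first-order standard errors; the analysis itself used R packages whose defaults may differ (for example, random-effects weighting), and the SNP summary statistics and the names `ivw_estimate` and `IVWResult` are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence


class IVWResult(NamedTuple):
    beta: float                # pooled causal estimate (log odds ratio)
    se: float                  # standard error of the pooled estimate
    cochran_q: float           # heterogeneity between per-SNP estimates
    q_df: int                  # degrees of freedom for Q
    f_statistics: List[float]  # per-SNP instrument strength


def ivw_estimate(bx: Sequence[float], se_bx: Sequence[float],
                 by: Sequence[float], se_by: Sequence[float]) -> IVWResult:
    """Fixed-effect IVW estimate from SNP-exposure (bx) and
    SNP-outcome (by) associations."""
    if not (len(bx) == len(se_bx) == len(by) == len(se_by)) or len(bx) < 2:
        raise ValueError("need at least two SNPs and equal-length inputs")
    ratios = [y / x for x, y in zip(bx, by)]             # Wald ratio per SNP
    ratio_ses = [s / abs(x) for x, s in zip(bx, se_by)]  # first-order SE
    weights = [1.0 / s ** 2 for s in ratio_ses]          # inverse-variance weights
    total_weight = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    f_stats = [(x / s) ** 2 for x, s in zip(bx, se_bx)]
    return IVWResult(beta, se, q, len(bx) - 1, f_stats)


# Hypothetical summary statistics for four SNPs (not study data).
result = ivw_estimate(
    bx=[0.10, 0.20, 0.15, 0.12], se_bx=[0.010, 0.020, 0.020, 0.015],
    by=[-0.010, -0.018, -0.012, -0.011], se_by=[0.004, 0.006, 0.005, 0.005],
)
print(f"OR = {math.exp(result.beta):.3f}, Q = {result.cochran_q:.2f} "
      f"on {result.q_df} df, min F = {min(result.f_statistics):.1f}")
```

Cochran’s Q would be compared against a chi-squared distribution with `q_df` degrees of freedom, and an F-statistic above 10 for each SNP is the conventional threshold for low risk of weak-instrument bias.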

To explore relationships between body fat and fitness, we conducted a bi-directional MR of genetically predicted fitness on fat mass and vice versa using our genetic instrument for fitness and an instrument for total fat mass based on a GWAS of UK Biobank participants (N = 330,762 participants of European ancestry), derived from bioelectrical impedance measurements at study baseline [51]. We also conducted multivariable MR (MVMR) analyses to assess the effect of fitness on cancer risk, after accounting for genetically predicted fat mass and height [45].

Statistical software

Observational analyses were performed using Stata version 16.1 (Stata Corporation, College Station, TX, USA). MR analyses were performed using the TwoSampleMR and MendelianRandomization R packages [52, 53], and figures were plotted in R version 3.6.3. All tests of significance were two-sided, and P values < 0.05 were considered statistically significant. Results are presented in accordance with the STROBE checklist [54].

Results

Observational analysis

After a median of 11 years of follow-up, 1586 prostate cancers, 1093 breast cancers, 811 colorectal cancers, 480 lung cancers, 184 endometrial cancers, and 136 ovarian cancers were diagnosed. Participant characteristics by age-adjusted and sex-specific fitness tertiles are provided in Table 1 for fitness scaled by total-body mass and Supplementary Table 2 for fitness scaled by fat-free mass. Fitness was higher in men compared to women, and those in the middle and higher fitness tertiles had better measures of adiposity, socioeconomic status, and cardiometabolic health than those in the lower fitness tertile.

Table 1 Participant characteristics by age-adjusted and sex-specific cardiorespiratory fitness (VO2max per kg total-body mass) tertiles.

Observational analysis results are summarised in Fig. 1. In analyses without BMI adjustment, each 3.5 ml O2min−1kg−1 total-body mass increase (equivalent to 1 metabolic equivalent of task [MET]) in fitness was associated with a 19% reduction in endometrial cancer, 6% reduction in colorectal cancer, and 4% reduction in breast cancer. After BMI adjustment, associations were attenuated but remained directionally consistent. Where associations were detected, relationships generally appeared to be linear but with uncertainty for some cancers at the tails of the fitness distribution (Supplementary Figs. 3 and 4). When fitness was expressed per kg fat-free mass, associations with cancers were not significant. Results adjusted for regression dilution are shown in Supplementary Fig. 5.

Fig. 1: Associations of cardiorespiratory fitness and incident cancer risk without and with body fat adjustment.

HRs and 95% CIs were estimated using Cox regression models adjusted for age, sex, self-reported racial/ethnic group, Townsend index of deprivation, education, employment status, smoking status, alcohol consumption, red and processed meat consumption, fish consumption, fruit and vegetable consumption, salt consumption, diabetes status, hypertension, medication use (beta blockers, calcium channel blockers, ACE inhibitors, diuretics, bronchodilators, lipid-lowering agents, iron deficiency agents, non-steroidal anti-inflammatory drugs, metformin). Female reproductive cancers (breast, endometrial, and ovarian) were additionally adjusted for age at menarche, age at menopause, parity, hormone replacement therapy usage, and oral contraceptives. Associations are shown with and without adjustment for either continuous BMI (for models with VO2max scaled by total-body mass) or fat mass (for models with VO2max scaled by fat-free mass). ACE Angiotensin-converting enzyme, BMI body mass index, CI confidence interval, HR hazard ratio.

There was evidence of heterogeneity in the associations of fitness and colorectal cancers by sex; the relationship was inverse for men and null for women (Fig. 2 and Supplementary Fig. 4). Results from minimally adjusted models are provided in Supplementary Table 3.

Fig. 2: Sex-stratified associations of cardiorespiratory fitness and incident cancer risk without and with body fat adjustment.

HRs and 95% CIs were estimated using Cox regression models adjusted for age, sex, self-reported racial/ethnic group, Townsend index of deprivation, education, employment status, smoking status, alcohol consumption, red and processed meat consumption, fish consumption, fruit and vegetable consumption, salt consumption, diabetes status, hypertension, medication use (beta blockers, calcium channel blockers, ACE inhibitors, diuretics, bronchodilators, lipid-lowering agents, iron deficiency agents, non-steroidal anti-inflammatory drugs, metformin). Associations are shown with and without adjustment for either continuous BMI (for models with VO2max scaled by total-body mass) or fat mass (for models with VO2max scaled by fat-free mass). ACE Angiotensin-converting enzyme, BMI body mass index, CI confidence interval, HR hazard ratio.

Mendelian randomisation analyses

Higher levels of genetically predicted fitness were associated with a lower risk of breast cancer (OR per 5.0 ml O2min−1kg−1 fat-free mass = 0.92, 95% CI: 0.86–0.98; P = 0.02), including ER+ (0.91, 0.84–0.99; P = 0.02) and ER− (0.88, 0.80–0.97; P = 0.01) subtypes, but were not significantly associated with any other cancer site (Fig. 3). There was also no evidence of an association with colorectal cancer after stratification by sex and site (Supplementary Tables 4 and 5). There was significant heterogeneity in the MR estimates for the SNPs for each cancer site (Cochran’s Q P < 0.05), except for associations with lung cancer for never smokers (P = 0.13), aggressive prostate cancer (P = 0.17) and renal cancer (P = 0.09).

Fig. 3: Associations of genetically predicted cardiorespiratory fitness and cancer risk.

Associations were estimated using the inverse variance weighted method. CI confidence interval, ER estrogen receptor, OR odds ratio.

In MR sensitivity analyses, the relationships between fitness and breast cancer were directionally consistent with the primary MR analysis (Supplementary Table 6). There was evidence of an inverse association between fitness and lung cancer using the weighted median method (0.85, 0.74–0.98; P = 0.02) and a positive association with pancreatic cancer using the contamination mixture method (1.09, 1.03–1.14; P = 0.03) (Supplementary Table 6). Radial plots did not indicate any strong influence of outliers on the MR results (Supplementary Fig. 6). The likelihood of bias due to weak instruments was low (F-statistic > 10 for all SNPs). There was evidence of moderate levels of measurement error (I2GX = 0.52–0.65), indicating reduced reliability of MR-Egger results; we therefore do not report MR-Egger estimates [55]. Using PhenoScanner, 742 traits were linked to SNPs for fitness (P < 5 × 10−8), particularly pulse rate (Supplementary Fig. 7).

The bi-directional MR analysis indicated that genetically instrumented fat mass had a strong inverse association with fitness (OR per 0.5 SD increase = 0.61, 0.52–0.71; P < 0.001), whereas the inverse relationship of fitness with fat mass was weaker (OR per 5 ml O2min−1kg−1 fat-free mass = 0.96, 0.92–1.01; P = 0.08). In MVMR analyses, associations with breast cancer were attenuated after adjustment for fat mass and height, while associations with lung cancer became statistically significant (0.90, 0.84–0.96; P = 0.002) but remained null for never smokers (Table 2).

Table 2 Genetic associations of cardiorespiratory fitness and cancer risk after accounting for fat mass and height.

Discussion

This study used both observational and MR methods to examine the relationship between cardiorespiratory fitness and incident cancer risk, providing the first evidence that higher fitness levels may reduce risks of breast cancer. In observational analysis only, we report additional inverse associations between VO2max scaled to total body mass and risks of colorectal and endometrial cancer. However, associations with all three cancer sites were attenuated after accounting for adiposity. Observational associations between cancer and VO2max scaled to fat-free mass were not statistically significant.

Previous observational analyses have reported inverse associations between fitness and colorectal and lung cancer. We did not observe an association with lung cancer, and the inverse association between VO2max scaled to total body mass and colorectal cancer was attenuated after accounting for BMI [12,13,14,15]. Our results may differ from these previous studies due to differences in population sampling, fitness assessment, and fitness estimation approaches. For example, cycle ergometer-based fitness estimates may differ from treadmill-based estimates due to differences in load bearing and motion artefact [15, 18, 20]. The UK Biobank fitness test was also of relatively light intensity, which enabled more participants to be assessed. Thus, our analysis likely characterises a wider variety of lower-fitness individuals than previous studies that used more strenuous tests. Previous estimates using UK Biobank data had a shorter duration of follow-up (median 5 years) and used fewer exercise test data, which will reduce the precision of risk estimates.

Previous MR studies based on up to five SNPs have reported inverse associations between genetically predicted physical activity levels and risks of breast, colorectal and aggressive prostate cancer [56, 57]. However, current estimates suggest that GWAS significant polymorphisms explain a very limited proportion of phenotypic physical activity (e.g., 0.06% for overall physical activity) [58]. The small number of SNPs increases the influence of possibly invalid variants within the instrument, and the instrument has a bidirectional association with BMI [58]. Fitness is a trait that reflects both input from genetics and physical activity behaviours. The genetic instrument for fitness used in the present study likely encompasses both past and current levels as well as the capacity to participate in physical activity [2, 3]. This instrument explains 1.2% of the variation in observed fitness levels, increasing the reliability of risk estimates. Future work examining the relative importance of the different constituents of genetic fitness may help to clarify whether the null relationships that we report for fitness on colorectal and aggressive prostate cancer risk are indicative of the greater relative importance of physical activity behaviours or are partially reflective of the methodological limitations discussed above.

The role of adiposity in fitness is complex and not fully understood. Higher adiposity is associated with impaired physical performance, relating to reduced muscle oxygen uptake, lower cardiac efficiency, neuromuscular dysfunction, and increased cancer risk [59,60,61,62,63]. Higher levels of physical activity are important for weight maintenance and increasing fitness, and higher fitness may reduce some of the harmful cardiometabolic effects of obesity [64]. Differences between the associations of fitness and cancer by scaling are likely driven by the different components of fitness, as VO2maxtbm has a strong inverse correlation with body size and adiposity [27]. However, the complex interplay of adiposity, fitness and cancer might mean that accounting for adiposity in models of cardiorespiratory fitness could lead to an over-adjustment of risk estimates, and these relationships are difficult to disentangle. Relationships between fitness and all-cause, cancer, and cardiovascular mortality outcomes have stronger evidence for independence of associations with adiposity [10, 64,65,66]. Future work with longer durations of follow-up will improve power to investigate whether there are differential risk associations by BMI classification.

These analyses have several strengths. This study is the first to use genetically instrumented fitness to evaluate possible causal relationships between fitness and cancer risk. The UK Biobank is the largest sample currently available with measured cardiorespiratory fitness, maximising power to assess associations across a broad range of cancer sites, the majority of which have not been previously investigated. Our independently validated novel framework to estimate fitness harmonised the UK Biobank test protocols and calibrated these data to a maximal exercise test to estimate VO2max. This estimation framework also incorporated multiple heart rate measurements to reduce measurement error, with high temporal agreement (regression dilution ratio=0.79) over approximately a 2.8 year period for greater precision in risk estimates [43]. Further, the baseline assessment collected data across a wide range of lifestyle, medical and anthropometric factors, enabling thorough adjustment for possible confounders.

Our study has limitations. This analysis is not a randomised controlled trial and therefore we are not able to fully assess causality. In MR analysis we cannot exclude the possibility of genetic confounding or horizontal pleiotropy [67]. The genetic instrument for VO2maxtbm was not available for comparison with our observational analysis. The genetic instrument also included resting heart rate information; therefore, our results may be partially driven by genetic associations with resting heart rate. Given the strong a priori evidence and mechanistic plausibility of associations between fitness and cancer risk, we have not included correction for multiple testing [18,19,20]; however, we cannot exclude the possibility of chance findings. The UK Biobank participants are predominantly of White European ancestry and are healthier than the underlying sampling population; therefore, risk estimates may not be generalisable to some other populations, including “high-risk” participants who did not undergo the fitness assessment. The fitness test was also submaximal, which may increase measurement error, and previous studies have noted larger magnitudes of associations with health outcomes using maximal fitness tests [11, 15].

In summary, we provide evidence that higher fitness levels may reduce risks of endometrial, colorectal, and breast cancer. The role of adiposity in mediating the relationship between fitness and cancer risk is not fully understood, and further research is needed to explore this complex relationship. Aiming to increase fitness, including via changes in body composition, may be an effective strategy to reduce risk of some cancer sites.