## Introduction

Breast and colorectal cancer are two of the most common cancers globally; in 2018 they accounted for a combined estimated 4 million new cases and 1.5 million deaths1. Physical activity is widely promoted, along with good nutrition, maintaining a healthy weight, and refraining from smoking, as a key component of a healthy lifestyle that contributes to lower risks of several non-communicable diseases such as cardiovascular disease, diabetes, and cancer2.

Epidemiological studies have consistently observed inverse relationships between physical activity and risks of breast and colorectal cancer3,4,5. The World Cancer Research Fund/American Institute for Cancer Research (WCRF/AICR) Continuous Update Project classified the evidence linking physical activity to lower risks of breast (postmenopausal) and colorectal cancer as ‘strong’6. However, previous epidemiological studies have generally relied on self-report measures of physical activity, which are prone to recall and response biases and may attenuate ‘true’ associations with disease risk7. More objective methods to measure physical activity, such as accelerometry, have seldom been used in large-scale epidemiological studies, with the UK Biobank being a recent exception in which ~100,000 participants wore a wrist accelerometer for 7 days to measure total activity levels8. Epidemiological analyses of these data will provide important new evidence on the link between physical activity and cancer, but these analyses remain vulnerable to other biases of observational epidemiology such as residual confounding (e.g. low physical activity levels may be correlated with other unfavourable health behaviours) and reverse causality (e.g. preclinical cancer symptoms may have resulted in low physical activity levels).

Mendelian randomisation (MR) is an increasingly used tool that uses germline genetic variants as proxies (or instrumental variables) for exposures of interest to enable causal inferences to be made between a potentially modifiable exposure and an outcome9. Unlike traditional observational epidemiology, MR analyses should be largely free of conventional confounding owing to the random independent assignment of alleles during meiosis10. In addition, there should be no reverse causation, as germline genetic variants are fixed at conception and are consequently unaffected by the disease process10.

We used a two-sample MR framework to examine potential causal associations between objective accelerometer-measured physical activity and risks of breast and colorectal cancer using genetic variants associated with accelerometer-measured physical activity identified from two recent genome-wide association studies (GWAS)11,12. We examined the associations of these genetic variants with risks of breast cancer13 and colorectal cancer14.

## Results

### MR estimates for breast cancer

We estimated that a 1 standard deviation (SD) (8.14 milli-gravities) increment in the genetically predicted levels of accelerometer-measured physical activity was associated with a 49% lower risk of breast cancer for the 5 genome-wide-significant SNP instrument (odds ratio [OR]: 0.51, 95% confidence interval [CI]: 0.27 to 0.98, P-value = 0.04, Q-value = 0.062) (Table 1), and a 41% lower risk for the extended 10 SNP instrument (OR: 0.59, 95% CI: 0.42 to 0.84, P-value = 0.003, Q-value = 0.012). An inverse association was only found for estrogen receptor positive (ER+ve) breast cancer (5 SNP instrument, OR: 0.45, 95% CI: 0.20 to 1.01, P-value = 0.054, Q-value = 0.077; extended 10 SNP instrument, OR: 0.53, 95% CI: 0.35 to 0.82, P-value = 0.004, Q-value = 0.004), and not for estrogen receptor negative (ER-ve) breast cancer (Table 1), although this heterogeneity by subtype was not statistically significant (I² = 16%; P-heterogeneity by subtype = 0.27). There was some evidence of heterogeneity based on Cochran’s Q (P-value < 0.05) for the breast cancer analyses; consequently, for these models random effects MR estimates were used (Table 1). MR estimates for each of the SNPs associated with accelerometer-measured physical activity in relation to breast cancer risk are presented in Fig. 1 and Supplementary Fig. 1. Scatter plots (with coloured lines representing the slopes of the different regression analyses) and funnel plots of the accelerometer-measured physical activity and breast cancer risk association for the extended 10 SNP instrument are presented in Supplementary Figs. 2 and 3.

### Mendelian randomisation estimates for colorectal cancer

For colorectal cancer, a 1 SD increment in accelerometer-measured physical activity level was associated with a 34% lower risk (OR: 0.66, 95% CI: 0.48 to 0.90, P-value = 0.01, Q-value = 0.022) for the 5 SNP instrument, and a 40% lower risk for the extended 10 SNP instrument (OR: 0.60, 95% CI: 0.47 to 0.76, P-value = 2.4 × 10⁻⁵, Q-value = 0.0002) (Table 1). The inverse effect estimate was stronger for women (OR: 0.57, 95% CI: 0.36 to 0.90, P-value = 0.02, Q-value = 0.036), while there was weak evidence for an inverse association for men (OR: 0.79, 95% CI: 0.50 to 1.23, P-value = 0.29, Q-value = 0.31); this heterogeneity did not meet the threshold of significance (I² = 0%; P-heterogeneity by sex = 0.34). For colorectal subsite analyses, accelerometer-measured physical activity levels were inversely associated with risks of colon cancer (OR per 1 SD increment: 0.64, 95% CI: 0.44 to 0.94, P-value = 0.02, Q-value = 0.036), while there was weak evidence for an inverse association between accelerometer-measured physical activity levels and rectal cancer (OR: 0.70, 95% CI: 0.43 to 1.14, P-value = 0.15, Q-value = 0.18). Similar results by sex and subsite for colorectal cancer were found for the extended 10 SNP instrument (Table 1). MR estimates for each individual SNP associated with accelerometer-measured physical activity in relation to colorectal cancer risk are presented in Fig. 2 and Supplementary Figs. 4–6. Scatter plots (with coloured lines representing the slopes of the different regression analyses) and funnel plots of the accelerometer-measured physical activity and colorectal cancer risk association for the extended 10 SNP instrument are presented in Supplementary Figs. 7 and 8.
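As a reading aid, the reported odds ratios can be related to their confidence intervals and P-values on the log scale. The sketch below is illustrative only: it assumes Wald-type 95% intervals that are symmetric on the log-odds scale, and the helper name `or_summary` is hypothetical rather than part of the study's analysis code.

```python
import math

def or_summary(odds_ratio, ci_low, ci_high):
    """Recover log-scale SE, z, two-sided P and % lower risk from an OR and 95% CI.

    Assumes a Wald-type interval: log(OR) +/- 1.96 * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = log_or / se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    pct_lower_risk = (1 - odds_ratio) * 100
    return {"log_or": log_or, "se": se, "z": z, "p": p,
            "pct_lower_risk": pct_lower_risk}

# Values reported above for the 5 SNP instrument.
breast_5snp = or_summary(0.51, 0.27, 0.98)
colorectal_5snp = or_summary(0.66, 0.48, 0.90)

for label, res in [("breast", breast_5snp), ("colorectal", colorectal_5snp)]:
    print(f"{label}: {res['pct_lower_risk']:.0f}% lower risk, P = {res['p']:.3f}")
```

Under these assumptions the recovered P-values round to the reported 0.04 and 0.01, which is a quick consistency check on the tabulated estimates.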

### Evaluation of assumptions and sensitivity analyses

The strength of the genetic instruments denoted by the F-statistic was ≥10 for all the accelerometer-measured physical activity variants and ranged between 27 and 56 (Table 2). Little evidence of directional pleiotropy was found for all models that used the extended 10 SNP instrument (MR-Egger intercept P-values > 0.06) (Table 1). The estimates from the weighted-median approach for the extended 10 SNP instrument were consistent with those of inverse-variance weighted (IVW) models (Table 1). The MR pleiotropy residual sum and outlier test (MR-PRESSO) method identified the SNPs rs11012732 and rs55657917 contained within the extended 10 SNP instrument as pleiotropic for breast cancer, but similar magnitude associations were observed when these variants were excluded from the analyses (Supplementary Table 10). After examining Phenoscanner and the GWAS catalog, we found that several of the accelerometer-measured physical activity genetic variants were also associated with adiposity-related phenotypes (Supplementary Tables 11, 12). However, the results from the leave-one-SNP out analysis did not reveal any influential SNPs driving the associations (Supplementary Tables 13–18). Additionally, similar results were found when the 5 adiposity-related SNPs were excluded from the extended 10 SNP genetic instrument (Supplementary Table 19). Further, the results from the multivariable MR analyses adjusting for BMI using the extended 10 SNP instrument were largely unchanged from the main IVW results (Supplementary Table 20). Finally, a similar pattern of results was found when GWAS effect estimates adjusted for BMI were used for the 5 SNP genetic instrument11 (Supplementary Table 21).

## Discussion

In this MR analysis, higher levels of genetically predicted accelerometer-measured physical activity were associated with lower risks of breast cancer and colorectal cancer, with similar magnitude inverse associations found for ER+ve breast cancer and for colon cancer. These findings indicate that population-level increases in physical activity may lower the incidence of these two commonly diagnosed cancers, and support the promotion of physical activity for cancer prevention.

A large body of observational studies has investigated how physical activity relates to risk of breast and colorectal cancer15,16. In a participant-level pooled analysis of 12 prospective studies, when the 90th and 10th percentile of leisure-time physical activity were compared, lower risks of breast cancer (hazard ratio [HR]: 0.90, 95% CI: 0.87 to 0.93), colon cancer (HR: 0.84, 95% CI: 0.77 to 0.91), and rectal cancer (HR: 0.87, 95% CI: 0.80 to 0.95) were found3. Similarly, inverse associations between total physical activity and risks of postmenopausal breast and colorectal cancer were recently reported in meta-analyses of all published prospective cohort data by the WCRF/AICR Continuous Update Project15,16.

These observational studies relied on self-report physical activity assessment methods that are prone to measurement error, which may attenuate associations towards the null. In addition, causality cannot be ascertained from such observational analyses as they are vulnerable to residual confounding and reverse causality. Further, logistical and financial challenges prohibit randomised controlled trials of physical activity and cancer development. For example, it has been estimated that in order to detect a 20% breast cancer risk reduction, between 26,000 and 36,000 healthy middle-aged women would need to be randomised to a 5-year exercise intervention17. Several trials on cancer survivors are registered and underway, and these may provide evidence of potential causal associations between physical activity and disease-free survival and cancer recurrence18; however, these interventions will not inform causal inference of the relationship between physical activity and cancer development. We therefore conducted MR analyses to allow causal inference between accelerometer-measured physical activity and risks of developing breast and colorectal cancer. The inverse associations we found were stronger for ER+ve breast cancer and colon cancer, and are highly concordant with prior observational epidemiological evidence.

There is currently no standard method for translating accelerometer data into energy expenditure values, such as metabolic equivalent of tasks (METs). However, using an accepted threshold for moderate activity (e.g. fast walking) of 100 milli-gravity19,20, a 1-SD higher mean acceleration (~8 milli-gravity) equates to approximately 50 min of extra moderate activity per week. Similarly, using an accepted threshold of 425 milli-gravity for vigorous activity (e.g. running)19,20, a 1-SD higher mean acceleration equates to approximately 8 min of extra vigorous activity per week. In our study, we found that such an increase in weekly activity translates to 49% and 34% lower risks of developing breast and colorectal cancer, respectively.

Being physically active is associated with less weight gain and body fatness, and lower adiposity is associated with lower risks of breast and colorectal cancer15,16. Since body size/adiposity is likely on the causal pathway linking physical activity and breast and colorectal cancer, it is challenging to disentangle independent effects of physical activity on cancer development. The close inter-relation between adiposity and physical activity is evident from 5 of the 10 SNPs in the extended genetic instrument for accelerometer-measured physical activity being previously associated with adiposity/body size traits. However, it is noteworthy that our results were unchanged when we excluded adiposity-related SNPs from this genetic instrument, and when we conducted multivariable MR analyses adjusting for body mass index (BMI). These results would therefore suggest that physical activity is also associated with breast and colorectal cancer independently of adiposity.

Multiple biological mechanisms are hypothesised to mediate the potential beneficial role of physical activity on cancer development21,22. Greater physical activity has been associated with lower circulating levels of insulin and insulin-like growth factors, which promote cellular proliferation in breast and colorectal tissue and have also been linked to development of cancers at these sites21,23,24,25,26,27. Higher levels of physical activity have also been associated with lower circulating concentrations of estradiol, estrone, and higher levels of sex hormone binding globulin28,29,30 which are themselves risk factors for breast cancer development31,32. Physical activity has also been associated with improvements in the immune response with increased surveillance and elimination of cancerous cells33,34. Higher levels of physical activity may also reduce systemic inflammation by lowering the levels of pro-inflammatory factors, such as C-reactive protein (CRP), interleukin-6 (IL-6) and tumour necrosis factor-alpha (TNF-a)33,35,36. Finally, emerging evidence suggests that the gut microbiome may play an important role in the physical activity and cancer relationship. Dysbiosis of the gut microbiome has been associated with increased risks of several malignancies, including breast and colorectal cancer37. Changes in gut microbiome composition and derived metabolic products have been found following endurance exercise training with short-chain fatty acid concentrations increased in lean, but not obese, subjects38,39.

A fundamental assumption of MR is that the genetic variants do not influence the outcome via a different biological pathway from the exposure of interest (horizontal pleiotropy). We conducted multiple sensitivity analyses using an extended 10 SNP genetic instrument for accelerometer-measured physical activity to test for the influence of pleiotropy on our causal estimates, and our results were robust according to these various tests. A potential limitation of our analysis is that the genetic variants explained a small fraction of the variability of accelerometer-measured physical activity, which may have resulted in some of the breast cancer subtype and colorectal subsite analyses being underpowered. In addition, our use of summary-level data precluded subgroup analyses by other cancer risk factors (e.g. BMI, exogenous hormone use). We were also unable to stratify breast cancer analyses by menopausal status; however, the majority of women in the source GWAS had postmenopausal breast cancer13. Finally, 7-day accelerometer-measured physical activity levels of UK Biobank participants may not have been representative of usual behavioural patterns.

In conclusion, we found that genetically elevated levels of accelerometer-measured physical activity were associated with lower risks of breast and colorectal cancer. These findings strongly support the promotion of physical activity as an effective strategy in the primary prevention of these commonly diagnosed cancers.

## Methods

### Data on physical activity

Summary-level data were obtained from two recently published GWAS on accelerometer-measured physical activity conducted in ~91,000 participants from the UK Biobank11,12. In the GWAS by Doherty et al.11, BOLT-LMM was used to perform linear mixed models analyses that were adjusted for assessment centre, genotyping array, age, age², and season. This GWAS identified 5 genome-wide-significant SNPs (P-value < 5 × 10⁻⁸) associated with accelerometer-measured physical activity. The estimated SNP-based heritability for accelerometer-measured physical activity in the UK Biobank is 14%12, suggesting that additional SNPs contributed to its variation. Consequently, we also used an accelerometer-measured physical activity instrument with an expanded number of SNPs (n = 10; associated with accelerometer-measured physical activity at P-value < 1 × 10⁻⁷) identified by another UK Biobank GWAS by Klimentidis et al.12. The extended number of SNPs in the accelerometer-measured physical activity instrument allowed us to conduct more robust sensitivity analyses to check for the influence of horizontal pleiotropy on the results. Data for the associations between the 10 SNPs and physical activity were obtained from a recent MR study on physical activity and depression that used the data from the same UK Biobank GWAS40. Detailed information on the genetic variants used in the 5 genome-wide significant SNP instrument and the extended 10 SNP instrument is provided in Table 2.

### Data on breast cancer and colorectal cancer

Summary data for the associations of the accelerometer-measured genetic variants with breast cancer (overall and by estrogen receptor status: ER positive [ER+ve] and ER negative [ER-ve]) were obtained from a GWAS of 228,951 women (122,977 breast cancer [69,501 ER positive, 21,468 ER negative] cases and 105,974 controls) of European ancestry from the Breast Cancer Association Consortium (BCAC)13. Genotyping data were imputed using the program IMPUTE2 (ref. 14) with the 1000 Genomes Project Phase III integrated variant set as the reference panel. Single nucleotide polymorphisms (SNPs) with low imputation quality (imputation r² < 0.5) were excluded. Top principal components (PCs) were included as covariates in regression analysis to address potential population substructure (iCOGS: top eight PCs; OncoArray: top 15 PCs) (Supplementary Tables 1, 2)13,41. For colorectal cancer, summary data from 98,715 participants (52,775 colorectal cancer cases and 45,940 controls) were drawn from a meta-analysis within the ColoRectal Transdisciplinary Study (CORECT), the Colon Cancer Family Registry (CCFR), and the Genetics and Epidemiology of Colorectal Cancer (GECCO) consortia14. Imputation was performed using the Haplotype Reference Consortium (HRC) r1.0 reference panel and the regression models were further adjusted for age, sex, genotyping platform (whenever appropriate), and genomic PCs (from 3 to 13, whenever appropriate) (Supplementary Tables 3–6).

### Statistical power

The a priori statistical power was calculated using an online tool42 (http://cnsgenomics.com/shiny/mRnd/). The 5 and 10 SNP accelerometer-measured physical activity instruments explained an estimated 0.2% and 0.4% of phenotypic variability, respectively. Given a type 1 error of 5%, for the 5 SNP instrument identified from the GWAS by Doherty et al.11 we had sufficient power (> 80%) when the expected OR per 1 SD was ≤ 0.77 and ≤ 0.67 for overall breast cancer (122,977 cases and 105,974 controls) and colorectal cancer (52,775 colorectal cancer cases and 45,940 controls), respectively. Power estimates for the 5 genome-wide significant SNP and the extended 10 SNP instruments by subtypes of breast cancer and subsites of colorectal cancer are presented in Supplementary Tables 7 and 8.

### Statistical analysis

A two-sample MR approach using summary data and the fixed-effect IVW method was implemented. All accelerometer-measured physical activity and cancer results correspond to an OR per 1 SD increment (8.14 milli-gravities) in the genetically predicted overall average acceleration. The heterogeneity of causal effects by cancer subtype and sex was investigated by estimating the I² statistic assuming a fixed-effects model43.
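The fixed-effect IVW estimator can be sketched from summary statistics alone. The following is a minimal illustration, assuming first-order weights (the SNP-exposure estimates are treated as measured without error) and using made-up effect sizes; it is not the implementation in the R packages used for the analyses, and the function name `ivw_fixed` is hypothetical.

```python
import math

def ivw_fixed(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    Each SNP contributes a Wald ratio (beta_outcome / beta_exposure),
    weighted by the inverse of its first-order variance.
    Returns the pooled log-odds estimate and its standard error.
    """
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    std_error = math.sqrt(1.0 / sum(weights))
    return estimate, std_error

# Hypothetical summary statistics (SD units for exposure, log-odds for outcome).
bx_demo = [0.1, 0.2]
by_demo = [-0.05, -0.1]
se_demo = [0.02, 0.02]

ivw_est, ivw_se = ivw_fixed(bx_demo, by_demo, se_demo)
ivw_or = math.exp(ivw_est)  # OR per 1 SD increment in the exposure
ci = (math.exp(ivw_est - 1.96 * ivw_se), math.exp(ivw_est + 1.96 * ivw_se))
print(f"OR = {ivw_or:.2f}, 95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

Exponentiating the pooled log-odds estimate gives the OR per 1 SD increment in genetically predicted activity, which is the scale on which the results are reported.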

For causal estimates from MR studies to be valid, three main assumptions must be met: 1) the genetic instrument is strongly associated with the level of accelerometer-measured physical activity; 2) the genetic instrument is not associated with any potential confounder of the physical activity–cancer association; and 3) the genetic instrument does not affect cancer independently of physical activity (i.e. horizontal pleiotropy should not be present)44. The strength of each instrument was measured by calculating the F-statistic using the following formula: $$F = \frac{R^2 \left( N - 2 \right)}{1 - R^2}$$, where R² is the proportion of the variability of the physical activity explained by each instrument and N the sample size of the GWAS for the SNP-physical activity association45. To calculate R² for the 5 genome-wide significant SNP instrument we used the following formula: $$R^2 = 2 \times \mathrm{EAF} \times \left( 1 - \mathrm{EAF} \right) \times \mathrm{beta}^2$$; whereas for the extended 10 SNP instrument we used: $$R^2 = \frac{2 \times \mathrm{EAF} \times \left( 1 - \mathrm{EAF} \right) \times \mathrm{beta}^2}{2 \times \mathrm{EAF} \times \left( 1 - \mathrm{EAF} \right) \times \mathrm{beta}^2 + 2 \times \mathrm{EAF} \times \left( 1 - \mathrm{EAF} \right) \times N \times \mathrm{SE}(\mathrm{beta})^2}$$, where EAF is the effect allele frequency, beta is the estimated genetic effect on physical activity, N is the sample size of the GWAS for the SNP-physical activity association and SE(beta) is the standard error of the genetic effect46. FDR correction (Q-value) was performed using the Benjamini–Hochberg method47.
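The instrument-strength formulas above translate directly into a few lines of code. This sketch uses hypothetical per-SNP inputs (EAF, beta, SE and N are illustrative values, not taken from Table 2), and the function names are assumptions made for this example.

```python
def r2_from_beta(eaf, beta):
    """R^2 = 2 * EAF * (1 - EAF) * beta^2 (beta in SD units of the exposure)."""
    return 2 * eaf * (1 - eaf) * beta ** 2

def r2_from_beta_se(eaf, beta, se, n):
    """R^2 using beta, its standard error and the GWAS sample size."""
    genetic = 2 * eaf * (1 - eaf) * beta ** 2
    residual = 2 * eaf * (1 - eaf) * n * se ** 2
    return genetic / (genetic + residual)

def f_statistic(r2, n):
    """F = R^2 * (N - 2) / (1 - R^2)."""
    return r2 * (n - 2) / (1 - r2)

# Hypothetical SNP: EAF 0.30, beta 0.03 SD per allele, SE 0.005, N = 91,000.
eaf, beta, se, n = 0.30, 0.03, 0.005, 91_000

r2_a = r2_from_beta(eaf, beta)
r2_b = r2_from_beta_se(eaf, beta, se, n)
f_a = f_statistic(r2_a, n)

print(f"R2 (beta only) = {r2_a:.6f}, F = {f_a:.1f}")
print(f"R2 (beta and SE) = {r2_b:.6f}")
```

Note that the common factor 2 × EAF × (1 − EAF) cancels in the second formula, so it reduces algebraically to beta² / (beta² + N × SE(beta)²). An F-statistic of at least 10 is the conventional threshold for a sufficiently strong instrument.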

### Sensitivity analyses

Several sensitivity analyses were used to check and correct for the presence of pleiotropy in the causal estimates. Cochran’s Q was computed to quantify heterogeneity across the individual causal effects, with a P-value ≤ 0.05 indicating the presence of pleiotropy, and that consequently, a random effects IVW MR analysis should be used43,48. We also assessed the potential presence of horizontal pleiotropy using MR-Egger regression based on its intercept term, where deviation from zero denotes the presence of directional pleiotropy. Additionally, the slope of the MR-Egger regression provides valid MR estimates in the presence of horizontal pleiotropy when the pleiotropic effects of the genetic variants are independent from the genetic associations with the exposure49,50. We also computed OR estimates using the complementary weighted-median method, which can give valid MR estimates in the presence of horizontal pleiotropy when up to 50% of the included instruments are invalid44. The presence of pleiotropy was also assessed using MR-PRESSO, in which outlying SNPs are excluded from the accelerometer-measured physical activity instrument and the effect estimates are reassessed51. For all of the aforementioned sensitivity analyses to identify possible pleiotropy, we considered the estimates from the extended 10 SNP instrument as the primary results due to unstable estimates from the 5 SNP instrument. A leave-one-SNP out analysis was also conducted to assess the influence of individual variants on the observed associations. We also examined the selected genetic instruments and their proxies (r² > 0.8) and their associations with secondary phenotypes (P-value < 5 × 10⁻⁸) in Phenoscanner (http://www.phenoscanner.medschl.cam.ac.uk/) and GWAS catalog (date checked April 2019).
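Three of the sensitivity analyses described above (Cochran's Q, MR-Egger regression and the leave-one-SNP out analysis) can be sketched with summary statistics. The code below is a simplified illustration on made-up data, assuming first-order inverse-variance weights; it omits P-values and standard errors, and it is not the implementation in the R packages used for the analyses. All function names are hypothetical.

```python
def ivw_estimate(bx, by, se):
    """Fixed-effect IVW estimate from per-SNP Wald ratios."""
    weights = [(x / s) ** 2 for x, s in zip(bx, se)]
    ratios = [y / x for x, y in zip(bx, by)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

def cochran_q(bx, by, se):
    """Weighted sum of squared deviations of each Wald ratio from the IVW
    estimate; compared against chi-square with (number of SNPs - 1) df."""
    est = ivw_estimate(bx, by, se)
    return sum((x / s) ** 2 * (y / x - est) ** 2 for x, y, s in zip(bx, by, se))

def mr_egger(bx, by, se):
    """Weighted regression of SNP-outcome on SNP-exposure effects WITH an
    intercept. A non-zero intercept suggests directional pleiotropy."""
    # Orient every SNP so that its exposure effect is positive.
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    w = [1.0 / s ** 2 for s in se]
    w_sum = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, xs)) / w_sum
    y_bar = sum(wi * yi for wi, yi in zip(w, ys)) / w_sum
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, xs, ys))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def leave_one_out(bx, by, se):
    """IVW estimate recomputed with each SNP removed in turn."""
    return [ivw_estimate(bx[:i] + bx[i + 1:], by[:i] + by[i + 1:],
                         se[:i] + se[i + 1:]) for i in range(len(bx))]

# Hypothetical data built with a pleiotropic offset of 0.01 and slope of -0.5.
bx = [0.1, 0.2, 0.3]
by = [-0.04, -0.09, -0.14]
se = [0.02, 0.02, 0.02]

q_stat = cochran_q(bx, by, se)
egger_intercept, egger_slope = mr_egger(bx, by, se)
loo = leave_one_out(bx, by, se)
print(q_stat, egger_intercept, egger_slope, loo)
```

In this toy example the MR-Egger regression recovers the built-in intercept and slope, whereas the IVW estimate is pulled away from the slope because it forces the regression line through the origin.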

For the extended 10 SNP instrument, we also conducted multivariable MR analyses to adjust for potential pleiotropy due to BMI because the initial GWAS on physical activity reported several strong associations (P-value < 10⁻⁵) between the identified SNPs and BMI52. The new estimates correspond to the direct causal effect of physical activity with BMI held fixed. The genetic data on BMI were obtained from a GWAS published by The Genetic Investigation of ANthropometric Traits (GIANT) consortium53 (Supplementary Table 9). Additionally, for the extended 10 SNP instrument, we also conducted analyses with adiposity-related SNPs (i.e. those previously associated with BMI, waist circumference, weight, or body/trunk fat percentage in GWAS at P-value < 10⁻⁸) excluded (n = 5; rs34517439, rs6775319, rs11012732, rs1550435, rs59499656). Finally, we conducted two-sample MR analyses using BMI adjusted GWAS estimates for the 5 SNP accelerometer-measured physical activity instrument11. However, the MR results using the BMI adjusted GWAS estimates should be interpreted cautiously due to the potential for collider bias11.
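One common way to obtain such a direct effect is a regression-based multivariable IVW estimator: the SNP-outcome effects are regressed on the SNP-physical activity and SNP-BMI effects jointly, without an intercept and with inverse-variance weights. The sketch below assumes that formulation and uses made-up numbers; whether it matches the exact routine used in the analyses is not stated in the text, and the function name `mv_ivw` is hypothetical.

```python
def mv_ivw(bx_pa, bx_bmi, by, se):
    """Multivariable IVW with two exposures.

    Solves the weighted least-squares normal equations (no intercept)
    for by ~ theta_pa * bx_pa + theta_bmi * bx_bmi, weights 1 / se^2.
    Returns (theta_pa, theta_bmi), the direct effects of each exposure.
    """
    w = [1.0 / s ** 2 for s in se]
    a11 = sum(wi * x1 * x1 for wi, x1 in zip(w, bx_pa))
    a12 = sum(wi * x1 * x2 for wi, x1, x2 in zip(w, bx_pa, bx_bmi))
    a22 = sum(wi * x2 * x2 for wi, x2 in zip(w, bx_bmi))
    b1 = sum(wi * x1 * y for wi, x1, y in zip(w, bx_pa, by))
    b2 = sum(wi * x2 * y for wi, x2, y in zip(w, bx_bmi, by))
    det = a11 * a22 - a12 ** 2
    if det == 0:
        raise ValueError("SNP-exposure effects are collinear")
    theta_pa = (b1 * a22 - b2 * a12) / det
    theta_bmi = (a11 * b2 - a12 * b1) / det
    return theta_pa, theta_bmi

# Hypothetical SNP effects on physical activity and BMI.
pa_effects = [0.1, 0.2, 0.3, 0.15]
bmi_effects = [0.05, -0.02, 0.1, 0.0]
# Outcome effects constructed from known direct effects (-0.5 and 0.3).
outcome_effects = [-0.5 * a + 0.3 * b for a, b in zip(pa_effects, bmi_effects)]
outcome_se = [0.02, 0.02, 0.02, 0.02]

direct_pa, direct_bmi = mv_ivw(pa_effects, bmi_effects, outcome_effects, outcome_se)
print(direct_pa, direct_bmi)
```

Because the toy outcome effects are constructed from known direct effects, the estimator recovers them, which illustrates how the physical activity coefficient is interpreted with BMI held fixed.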

All the analyses were conducted using the MendelianRandomization54 and TwoSampleMR55 packages in the R programming language.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.