Physical activity and risks of breast and colorectal cancer: A Mendelian randomization analysis

Physical activity has been associated with lower risks of breast and colorectal cancer in epidemiological studies; however, it is unknown if these associations are causal or confounded. In two-sample Mendelian randomization analyses, using summary genetic data from the UK Biobank and GWA consortia, we found that a one standard deviation increment in average acceleration was associated with lower risks of breast cancer (odds ratio [OR]: 0.59, 95% confidence interval [CI]: 0.42 to 0.84, P-value=0.003) and colorectal cancer (OR: 0.66, 95% CI: 0.53 to 0.82, P-value=2*E-4). We found similar magnitude inverse associations by breast cancer subtype and by colorectal cancer anatomical site. Our results support a potentially causal relationship between higher physical activity levels and lower risks of breast cancer and colorectal cancer. Based on these data, the promotion of physical activity is probably an effective strategy in the primary prevention of these commonly diagnosed cancers. Disclaimer Where authors are identified as personnel of the International Agency for Research on Cancer / World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer / World Health Organization.


Abstract
Physical activity has been associated with lower risks of breast and colorectal cancer in epidemiological studies; however, it is unknown if these associations are causal or confounded. In two-sample Mendelian randomization analyses, using summary genetic data from the UK Biobank and GWA consortia, we found that a one standard deviation increment in average acceleration was associated with lower risks of breast cancer (odds ratio [OR]: 0.59, 95% confidence interval [CI]: 0.42 to 0.84, P-value=0.003) and colorectal cancer (OR: 0.66, 95% CI: 0.53 to 0.82, P-value=2*E -4 ). We found similar magnitude inverse associations by breast cancer subtype and by colorectal cancer anatomical site. Our results support a potentially causal relationship between higher physical activity levels and lower risks of breast cancer and colorectal cancer. Based on these data, the promotion of physical activity is probably an effective strategy in the primary prevention of these commonly diagnosed cancers.

Introduction
Breast and colorectal cancer are two of the most common cancers globally with a combined estimated number of 4 million new cases and 1.5 million deaths in 2018 1 . Physical activity is widely promoted along with good nutrition, maintaining a healthy weight, and refraining from smoking, as key components of a healthy lifestyle that contribute to lower risks of several non-communicable diseases such as cardiovascular disease, diabetes, and Epidemiological studies have consistently observed inverse relationships between physical activity and risks of breast and colorectal cancer [2][3][4][5] , but have generally relied on selfreport measures of physical activity, which are prone to recall and response biases, and may attenuate 'true' associations with disease risk 6 . More objective methods to measure physical activity, such as accelerometry, have seldom been used in large-scale epidemiological studies, with the UK Biobank being a recent exception, in which ~100,000 participants wore a wrist accelerometer for 7-days to measure total activity levels 7 . Epidemiological analyses of these data will provide important new evidence on the link between physical activity and cancer, but these analyses remain vulnerable to other biases of observational epidemiology, such as residual confounding (e.g. low physical activity levels may be correlated with other unfavourable health behaviours) and reverse causality (e.g. preclinical cancer symptoms may have resulted in low physical activity levels).
Mendelian randomization (MR) is an increasingly used tool that uses germline genetic variants as proxies (or instrumental variables) for exposures of interest to enable causal inferences to be made between a potentially modifiable exposure and an outcome 8 . Unlike traditional observational epidemiology, MR analyses, should be largely free of conventional confounding owing to the random independent assignment of alleles during meiosis 9 . In addition, there should be no reverse causation, as germline genetic variants are fixed at conception and are consequently unaffected by the disease process 9 .
We used a two-sample MR framework to examine potential causal associations between objective accelerometer-measured physical activity and risks of breast and colorectal cancer using genetic variants associated with accelerometer-measured physical activity identified from a recent genome-wide association study (GWAS) 10 . We examined the associations of these genetic variants with risks of breast cancer 11 and colorectal cancer 12 .

Mendelian randomization estimates for breast and colorectal cancer
We estimated that a 1 standard deviation (SD) (8.14 milli-gravities) increment in the genetically predicted levels of accelerometer-measured physical activity was associated with a 41% (Odds ratio [OR]: 0.59, 95% confidence interval [CI]: 0.42 to 0.84, P-value=0.003) lower risk of overall breast cancer ( Table 2). Similar magnitude inverse associations were found for estrogen receptor positive (ER +ve ) (OR: 0.53, 95% CI: 0.35 to 0.82, P-value=0.004) and estrogen receptor negative (ER -ve ) (OR: 0.78, 95% CI: 0.51 to 1.22, P-value=0.27) breast cancer (I 2 =35%; P-heterogeneity by subtype=0.21). There was some evidence of heterogeneity based on Cochran's Q (P-value<0.05) for the breast cancer analyses; consequently, for these models random effects MR estimates were used (Table 2). MR estimates for each individual single nucleotide polymorphism (SNP) associated with accelerometer-measured physical activity in relation to breast cancer risk are presented in  ) than men (OR: 0.82, 95% CI: 0.61 to 1.11, P-value=0.21), although this heterogeneity did not meet the threshold of significance (I 2 =72%; P-heterogeneity by sex=0.06). For colorectal subsite analyses, accelerometer-measured physical activity levels were inversely associated with risks of colon cancer (OR per 1 SD increment OR: 0.61, 95% CI: 0.47 to 0.79, P-value=2*10 -4 ) and rectal cancer (OR: 0.76, 95% CI: 0.55 to 1.07, P-value=0.12). MR estimates for each individual SNP associated with accelerometer-measured physical activity in relation to breast cancer risk are presented in Figure 2  Similar results were generally observed for all breast cancer and colorectal cancer endpoints when MR analyses were conducted with the two genome-wide significant accelerometer-measured physical activity SNPs only ( Table 2).

Evaluation of assumptions and sensitivity analyses
The strength of the genetic instruments denoted by the F-statistic was ≥ 10 for all the accelerometer-measured physical activity variants and ranged between 29 and 56 ( Table 1).
The intercept test from the MR-Egger regression was statistically significant in the analysis of colorectal cancer in women denoting potential pleiotropy; however, the corrected estimate from MR-Egger replicated the initial finding ( Table 2). The estimates from the weighted median approach were consistent with those of inverse variance weighted (IVW) models. The MR pleiotropy residual sum and outlier test (MR-PRESSO) method identified the SNPs rs11012732 and rs55657917 as pleiotropic for breast cancer, but similar magnitude inverse relationships were observed when these variants were excluded from the analyses (Supplementary Table 6).
After examining Phenoscanner and GWAS catalog, we found that several of the accelerometer-measured physical activity genetic variants were also associated with adiposity related phenotypes (Supplementary Table 7). However, the results from the leave-one-SNP out analysis did not reveal any influential SNPs driving the associations (Supplementary   Tables 8 -10). Additionally, similar results were found when the five adiposity-related SNPs were excluded from the genetic instrument (Supplementary Table 11). Further, the results from the multivariable MR analyses adjusting for BMI were largely unchanged from the main IVW results (Supplementary Tables 12, 13).
The association of genetically predicted physical activity and colorectal cancer was similar (OR: 0.60, 95% CI: 0.47 to 0.76, P-value=2.5*10 -5 ) after excluding UK Biobank participants from the GWAS for colorectal cancer.

Discussion
In this MR analysis, higher levels of genetically-predicted accelerometer-measured physical activity were associated with lower risks of breast cancer and colorectal cancer, with similar magnitude inverse associations found for breast cancer subtypes and by colorectal anatomical subsite. These findings indicate that population-level increases in physical activity may lower the incidence of these two commonly diagnosed cancers, and support the promotion of physical activity for cancer prevention.  3 . These observational studies relied on self-report physical activity assessment methods that are prone to measurement error, which may attenuate associations towards the null. In addition, causality cannot be ascertained from such observational analyses as they are vulnerable to residual confounding and reverse causality. Further, logistical and financial challenges prohibit randomized controlled trials of physical activity and cancer development.
For example, it has been estimated that in order to detect a 20% breast cancer risk reduction, between 26,000 to 36,000 healthy middle-aged women would need to be randomized to a 5 year exercise intervention 14 . Several trials on cancer survivors are registered and underway, and these may provide evidence of potential causal associations between physical activity and disease free survival and cancer recurrence 15 ; however, these interventions will not inform causal inference of the relationship between physical activity and cancer development. We conducted MR analyses to allow causal inference between accelerometer-measured physical activity and risks of developing breast and colorectal cancer. The inverse associations we found were consistent for breast cancer subtypes and across colorectal cancer subsites, and are strongly concordant with prior observational epidemiological evidence.
Being physically active is associated with less weight gain and body fatness, and lower adiposity is associated with lower risks of breast and colorectal cancer 16,17 . Since body size/adiposity is likely on the causal pathway linking physical activity and breast and colorectal cancer, it is challenging to disentangle independent effects of physical activity on cancer development. This overlap between adiposity and physical activity is evident from 5 of the 10 SNPs in the genetic instrument for accelerometer-measured physical activity previously being associated with adiposity/body size traits. However, it is noteworthy that our results were unchanged when we excluded adiposity-related SNPs from the genetic instrument, and when we conducted multivariable MR analyses adjusting for body mass index (BMI). These results would therefore suggest that physical activity is also associated with breast and colorectal cancer independently of adiposity.
Multiple biological mechanisms are hypothesized to mediate the potential beneficial role of physical activity on cancer development. Greater physical activity has been associated with lower circulating levels of insulin and insulin-like growth factors, which promote cellular proliferation in breast and colorectal tissue and have also been linked to development of cancers at these sites [18][19][20][21][22][23] . Higher levels of physical activity have also been associated with lower circulating levels of estradiol, estrone, and higher levels of sex hormone binding globulin [24][25][26] which are strong risk factors for breast cancer development 27,28 . Physical activity has also been associated with improvements in immune response, with increased surveillance and elimination of cancerous cells 29,30 . Higher levels of physical activity may also reduce systemic inflammation by lowering the levels of pro-inflammatory factors, such as C-reactive protein (CRP), interleukin-6 (IL-6) and tumour necrosis factor-alpha (TNF-a) 29,31,32 . Finally, emerging evidence suggests that the gut microbiome may play an important role in the physical activity and cancer relationship. Dysbiosis of the gut microbiome has been associated with increased risks of several malignancies, including breast and colorectal cancer 33 . Changes in gut microbiome composition and derived metabolic products have been found following endurance exercise training, with short-chain fatty acid concentrations increased in lean, but not obese subjects 34,35 .
A fundamental assumption of MR is that the genetic variants do not influence the outcome via a different biological pathway from the exposure of interest (horizontal pleiotropy). We conducted multiple sensitivity analyses to test for the influence of pleiotropy on our causal estimates, and our results were robust according to these various tests. A potential limitation of our analysis is that the genetic variants explained a small fraction of the variability of accelerometer-measured physical activity, which may have resulted in some of the breast cancer subtype and colorectal subsite analyses being underpowered. In addition, our use of summary-level data precluded subgroup analyses by other cancer risk factors (e.g. BMI, exogenous hormone use). We were also unable to stratify breast cancer analyses by menopausal status; however, the majority of women in the source GWAS had postmenopausal breast cancer 11 . Finally, 7-day accelerometer-measured physical activity levels of UK Biobank participants may not have been representative of usual behavioral patterns.
In conclusion, we found that genetically elevated levels of accelerometer-measured physical activity were associated with lower risks of breast and colorectal cancer. These findings strongly support the promotion of physical activity as an effective strategy in the primary prevention of these commonly diagnosed cancers.

Data on physical activity
Summary-level data were obtained from a recently published GWAS on accelerometer-measured physical activity conducted within UK Biobank 10 . In this GWAS, the regression models were adjusted for age, sex, the first ten genomic principal components, center, season (month), and genotyping chip. This GWAS identified 2 genome-widesignificant polymorphisms (P-value<5x10 -8 ) associated with accelerometer-measured physical activity. The estimated SNP-based heritability was 14% suggesting that additional SNPs contributed to its variation. Consequently, for our primary analyses, we used a larger number of 10 independent (linkage disequilibrium [LD] r 2 ≤ 0.001) genetic variants by relaxing the significance threshold to P-value<1x10 -7 . The expanded number of genetic variants in the accelerometer-measured physical activity instrument also allowed sensitivity analyses to be conducted to check for the influence of horizontal pleiotropy on the results. Data for the associations between the 8 additional SNPs and physical activity were obtained from a recent MR study on physical activity and depression that used the data from the same UK Biobank GWAS 36 . In secondary analyses, we used the two genome-wide significant SNPs only.

Statistical analysis
A two-sample MR approach using summary data and the fixed-effect IVW method was implemented. All accelerometer-measured physical activity and cancer results correspond to an OR per 1 SD increment (8.14 milli-gravities) in the genetically predicted overall average acceleration. The heterogeneity of causal effects by cancer subtype and sex was investigated by estimating the I 2 statistic assuming a fixed-effects model 38 .
For causal estimates from MR studies to be valid, three main assumptions must be met: 1) the genetic instrument is strongly associated with the level of accelerometer-measured physical activity; 2) the genetic instrument is not associated with any potential confounder of the physical activity -cancer association; and 3) the genetic instrument does not affect cancer independently of physical activity (i.e. horizontal pleiotropy should not be present) 39 . The strength of each instrument was measured by calculating the F statistic using the following formula: where R 2 is the proportion of the variability of the physical activity explained by each instrument and N the sample size of the GWAS for the SNP-physical activity association 40 . To calculate R 2 we used the following formula: where EAF is the effect allele frequency, beta is the estimated genetic effect on physical activity, Ν is the sample size of the GWAS for the SNP-physical activity association and SE (beta) is the standard error of the genetic effect 41 .

Sensitivity analyses
Several sensitivity analyses were used to check and correct for the presence of pleiotropy in the causal estimates. Cochran's Q was computed to quantify heterogeneity across the individual causal effects, with a P-value≤0.05 indicating the presence of pleiotropy, and that consequently, a random effects IVW MR analysis should be used 38,42 . We also assessed the potential presence of horizontal pleiotropy using MR-Egger regression based on its intercept term, where deviation from zero denotes the presence of pleiotropy. Additionally, the slope of the MR-Egger regression provides valid MR estimates in the presence of horizontal pleiotropy when the pleiotropic effects of the genetic variants are independent from the genetic associations with the exposure 43,44 . We also computed OR estimates using the complementary weighted-median method that can give valid MR estimates under the presence of horizontal pleiotropy when up to 50% of the included instruments are invalid 39 .
The presence of pleiotropy was also assessed using the MR-PRESSO. In this, outlying SNPs are excluded from the accelerometer-measured physical activity instrument and the effect estimates are reassessed 45 . A leave-one-SNP out analysis was also conducted to assess the influence of individual variants on the observed associations. We also examined the selected genetic instruments and their proxies (r 2 >0.8) and their associations with secondary phenotypes (P-value<5x10 -8 ) in Phenoscanner (http://www.phenoscanner.medschl.cam.ac.uk/) and GWAS catalog (date checked April 2019).
We also conducted multivariable MR analyses to adjust for potential pleiotropy due to BMI because the initial GWAS on physical activity reported several strong associations (P-value<10 -5 ) between the identified SNPs and BMI 46 . The new estimates correspond to the direct causal effect of physical activity with the BMI being fixed. The genetic data on BMI were obtained from a GWAS study published by The Genetic Investigation of ANthropometric Traits (GIANT) consortium 47 (Supplementary Table 5). Additionally, we also conducted analyses with adiposity related SNPs (i.e. those previously associated with BMI, waist circumference, weight, or body/trunk fat percentage in GWAS studies at P-value<10 -8 ) excluded (n=5; rs34517439, rs6775319, rs11012732, rs1550435, rs59499656).
Finally, as the GECCO consortium includes 26,763 participants from the UK Biobank, we re-ran the colorectal cancer analyses using GWAS summary estimates with UK Biobank participants excluded in order to correct for any bias and inflated Type 1 errors this may have introduced into our results 48 .
All the analyses were conducted using the MendelianRandomization package and R programming language 49 .

Data availability
Data supporting the findings of this study are available within the paper and its supplementary information files.       Single nucleotide polymorphisms * The estimates correspond to a standard deviation increase in physical activity † P-value or pleiotropy based on MR-Egger intercept ‡ P-value for heterogeneity based on Q statistic § The estimates were derived from a random-effects model due to the presence of heterogeneity based on Cochran's Q statistic