Mendelian randomization analysis rules out disylipidaemia as colorectal cancer cause

Dyslipidemia and statin use have been associated with colorectal cancer (CRC), but prospective studies have shown mixed results. We aimed to determine whether dyslipidemia is causally linked to CRC risk using a Mendelian randomization approach and to explore the association of statins with CRC. A case-control study was performed including 1336 CRC cases and 2744 controls (MCC-Spain). Subjects were administered an epidemiological questionnaire and were genotyped with an array which included polymorphisms associated with blood lipids levels, selected to avoid pleiotropy. Four genetic lipid scores specific for triglycerides (TG), high density lipoprotein cholesterol (HDL), low density lipoprotein cholesterol (LDL), or total cholesterol (TC) were created as the count of risk alleles. The genetic lipid scores were not associated with CRC. The ORs per 10 risk alleles, were for TG 0.91 (95%CI: 0.72–1.16, p = 0.44), for HDL 1.14 (95%CI: 0.95–1.37, p = 0.16), for LDL 0.97 (95%CI: 0.81–1.16, p = 0.73), and for TC 0.98 (95%CI: 0.84–1.17, p = 0.88). The LDL and TC genetic risk scores were associated with statin use, but not the HDL or TG. Statin use, overall, was a non-significant protective factor for CRC (OR 0.84; 95%CI: 0.70–1.01, p = 0.060), but lipophilic statins were associated with a CRC risk reduction (OR 0.78; 95%CI 0.66–0.96, p = 0.018). Using the Mendelian randomization approach, our study does not support the hypothesis that lipid levels are associated with the risk of CRC. This study does not rule out, however, a possible protective effect of statins in CRC by a mechanism unrelated to lipid levels.

0.78; 95%CI 0.66-0.96, p = 0.018). Using the Mendelian randomization approach, our study does not support the hypothesis that lipid levels are associated with the risk of CRC. This study does not rule out, however, a possible protective effect of statins in CRC by a mechanism unrelated to lipid levels.
Colorectal cancer (CRC) has been associated with both genetic and environmental risk factors such as diet 1,2 , alcohol 3 , smoking 4 , physical activity 5 and metabolic syndrome 6,7 . Metabolic syndrome is a cluster of important cardiovascular risk factors: high fasting glucose, abdominal obesity, high triglycerides (TG), reduced high density lipoprotein cholesterol (HDL) and high blood pressure. As one of the components, dyslipidemia has been thought to have an important role in inflammatory pathways, oxidative stress and insulin resistance, which could contribute to the pathogenesis of cancer. However, findings from prospective studies that have examined the association between serum dyslipidemia (TG, HDL, low density lipoprotein cholesterol (LDL) or total cholesterol (TC)) and colorectal neoplasia have been inconsistent 6,[8][9][10][11] . It is unknown whether lipids and lipoproteins cause cancer or are intermediate or correlated factors within carcinogenic pathways. The Mendelian randomization approach can be used to establish a causal relationship between dyslipidemia and CRC. Mendelian randomization studies use the distribution of alleles in the population to simulate randomized assignment to lower or higher lipids. A previous Mendelian randomization analysis assessing the causality of dyslipidemia and CRC has been published before 11 . It reported an association between a genetic score for TC and the risk of CRC (OR per unit SD increase = 1.46; 95% CI: 1.20-1.79) but no significant association was found for LDL, HDL or TG. While the association of TC was strong, the interpretation of the results is not straightforward, since TC is the sum of LDL and HDL cholesterol, fractions that have been reported to have opposite effects regarding CRC and may explain the controversial findings of studies that have analyzed the association between high levels of serum TC and CRC risk 8,9,12 . A causal effect of lipids regarding CRC risk should have a clear mechanistic interpretation. Being TC essentially the sum of HLD and LDL, genetic instruments for TC are correlated either with LDL, HDL or both, which violates the pleiotropy assumption of Mendelian randomization studies.
Epidemiological studies on dyslipidemia and CRC risk could be confounded by 3-Hydroxy-3-methylglutaryl-coenzyme A reductase inhibitors (statins) use, which might also have a protective effect on CRC. It is unclear whether it is statin use or dyslipidemia that prompted statin use, which may be associated with CRC. Indeed, a large number of epidemiological studies have examined the effect of statins on CRC risk, with often inconsistent results 13,14 . It has been suggested that pharmacogenetic variation or specific type of statin, which influence the effect on lipid levels, might also modify the statin-CRC risk association [13][14][15] .
In this study, we conducted a Mendelian randomization study to evaluate the relationship between CRC and a genetic risk score, derived from 119 genetic variants associated with blood concentrations of TG, HDL and LDL in GWAS studies. Moreover, we wanted to explore the effect of statins use on CRC.

Materials and Methods
Study population. A detailed description of the MCC-Spain case-control study has been provided elsewhere 16 . Briefly, 10,183 subjects aged 20-85 years were enrolled in Spanish hospitals and primary care centers between 2008 and 2013. Eligible subjects included histological confirmed incident cases of CRC (n = 2,171). Both cases and controls were free of personal CRC history. For the present study we only have included a subset of 1,336 CRC cases and 2,744 controls that had genotype data. These subjects were selected from the complete study using a stratified random sampling strategy to maintain the distribution of cases and controls among participating centers. Data collection. A structured computerized epidemiological questionnaire was administered by trained personnel in a face-to-face interview. Subjects also filled in a semi-quantitative food frequency questionnaire, and blood samples and anthropometric data were obtained following the study protocol. This study did not collect or measured blood lipid levels for cases and controls.
The variables that were analyzed were those that could be related with CRC. We also wanted to explore variables included in the definition of metabolic syndrome. So the variables considered for analyses were: level of education, family history of CRC (none versus first or second or third-degree); cigarette smoking (ever, never); average alcohol consumption between ages 30 and 40 (in standard units of alcohol, SUA, categorized into low-risk and high-risk consumption: >4 SUA/day in men and >2 SUA/day in women) 17 ; diabetes (with only diet or using anti-diabetic drugs; hypertension (with only diet or antihypertensive treatment); body mass index (BMI), calculated with the weight reported at age 45, which was categorized according to World Health Organization criteria: underweight, normal weight, and overweight (<30 kg/m 2 ) versus obese (≥30 kg/m 2 ); abdominal obesity (calculated at the inclusion date) was defined according to World Health Organization criteria as a waist-hip ratio ≥ 0.90 cm for males and ≥0.85 cm for females; average physical exercise, determined using self-reported leisure-time activity performed in the past 10 years and used to estimate the Metabolic Equivalent of Task (MET) per hour per week, calculated using the Ainsworth's compendium of physical activities 18 , and categorized as no physical activity in their leisure time (0 MET), and any physical activity in their spare time (>0 MET); vegetables, classified as low or high intake using 200 g/day as cut-off; red meat consumption including cured meat, and processed meat. High intake of red meat was considered eating ≥65 g/day.
The location of the CRC was defined according to its anatomic distribution: proximal colon (colon above the level of the splenic flexure including it), distal (descending colon, sigmoid colon), and rectum.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration. The protocol of MCC-Spain was approved by each of the Ethics Committees of the participating institutions. The specific Drug exposure. Use of prescription drugs was obtained through face-to-face interviews administered by trained personnel, mainly by indication. Information was coded following the Anatomical Therapeutic Chemical-ATC code to assess individual exposure to different drugs. All drugs prescribed to the patients were recorded, but only statins, aspirin (acetyl salicylic acid, ASA) or other nonsteroidal anti-inflammatory drugs (NSAIDs) were considered for this study. Statins were classified as lipophilic (atorvastatin, fluvastatin, lovastatin, simvastatin) or hydrophilic (pravastatin and rosuvastatin) and, by effectiveness in lowering LDL cholesterol levels, as low-potency (fluvastatin, lovastatin, pravastatin, simvastatin) and high-potency (atorvastatin and rosuvastatin) 19 . Moreover, statins users were considered regular if they started consuming these drugs at least one year before. Given that ASA or NSAIDs can be used sporadically, regular users were defined as consuming ≥1 time/ day for at least one year.
Genotyping and elaboration of genetic lipid scores. The Infinium Human Exome BeadChip (Illumina, San Diego, USA) array was used to genotype >200,000 coding markers plus 5000 additional custom single nucleotide polymorphisms (SNP) selected from previous genome-wide association studies (GWAS) or in genes of interest for cancer. SNPs associated with specific blood lipid levels (LDL, HDL, TG) in GWAS were identified through the GWAS catalogue 20 and downloaded with the MRInstruments R package (https://github.com/ MRCIEU/MRInstruments). Retrieved SNPs were filtered according to these criteria: associated at genome-wide significance (p ≤ 5.0 × 10 −8 ); restricted to Caucasian population; the allele associated with increased lipid level should be identified; only one SNP per chromosomal region (the most statistically significant SNP was selected when linkage disequilibrium was r 2 > 0.2). To minimize pleiotropic effects, SNPs associated with more than one lipid trait (except for TC, which could share SNPs of HDL and LDL) or to other traits potentially related to CRC (alcohol, BMI, diabetes, inflammation) were excluded. SNP rs174546, which encodes for a protein that is a member of the fatty acid desaturase (FADS), was the only polymorphism that was associated with the three lipid pathways (an increase of LDL and HDL levels, but a decrease of TG), but many other SNPs were associated with more than one lipid trait. So, only SNPs exclusively associated at genome-wide significance with one lipid trait (either TG, HDL or LDL, but not more than 1) were chosen for each genetic risk score to make them as specific to one trait as possible (as in Holmes et al. 21 ). To study total cholesterol (TC), since it was the sum of LDL and HDL, the SNPs selected were not restricted regarding pleiotropy 22 between LDL and HDL (some selected SNPs may be related to both LDL and HDL). We consider the results for TC difficult to interpret, since LDL and HDL may have opposite effects. However, we provide the analysis for completeness and to compare the results with other studies.
Since the effects of each individual locus identified through GWAS are small, lipid genetic scores (LGS) were created as the joint additive effect of the selected SNPs risk alleles to increase the power of the Mendelian randomization analysis 23 . We created four LGS: one for each lipid exposure of interest. Our exome array included 22 SNPs associated with levels of TG, 36 with HDL, 39 with LDL and 49 with levels of TC, which are shown in Supplementary Table 1.
In all LGS, risk alleles were those associated with increase of TG, LDL, HDL or TC levels. Each SNP was coded according to the number of copies of the allele associated with dyslipidemia (0, 1, and 2) and each LGS was created as the total sum of alleles across the specific SNPs for each lipid. To simplify the analyses, and because the effect size was not always reported or was in different units, an equal weight was assigned to all SNPs, though it is known that some SNPs show stronger associations with lipid levels than others.

Statistical analysis.
Hardy-Weinberg equilibrium was tested among controls, and no SNP showed deviation with p < 0.001. The sum of allele counts for each LGS was compared between cases and controls with linear models. Odds ratios (OR) and 95% confidence intervals (95% CI) were also calculated from logistic regression models. When referring to genetic scores, all ORs were scaled to express the risk per 10 alleles (OR 10 ). A study design adjustment score (SDAS) was created to reduce bias related to differences in case and control selection frequencies. All the analyses were adjusted for this SDAS that included age, sex, recruiting center, level of education and the three first principal components of genetic ancestry obtained from ancestry informative markers included in the genotyping array. The SDAS also included the interactions between age and sex, and region and sex. In addition to the SDAS, which was always considered, multivariate-adjusted models to assess the net effect of statins also included as potential confounders variables that were associated both with CRC (Table 1) Table 2 shows the association of statin use with other variables among controls). To account for multiple comparisons when individual SNPs were analyzed, Bonferroni significance threshold were calculated (Supplementary Table 1). The SNPassoc 24 package from R statistical software (R Foundation for Statistical Computing, Vienna, Austria) and PLINK 25 were used for the analyses. Statistical power. We performed a power calculation regarding the association of the LGS with CRC. We used the web application mRNd 26 (https://cnsgenomics.shinyapps.io/mRnd), and assumed that the proportion of lipid levels variance explained by the LGS was 10%. Our study had 80% power to detect an OR of at least 1.33.

Results
In this population-based case-control study, as expected, CRC was associated with family history, high alcohol consumption, high waist-hip ratio, physical inactivity, low intake of vegetables, and high intake of red meat (Table 1). Regular AAS or NSAID use was associated with a reduced risk of CRC. Other covariates related with the metabolic syndrome such as diabetes mellitus and arterial hypertension were not associated with CRC.
LDL genetic score and statins use. The LDL and TC LGS were associated with statin consumption among controls (OR 10 = 2.12, 95% CI: 1.64-2.73, p < 0.0001 for LDL; OR 10 = 1.67, 95% CI: 1.32-2.11, p < 0.0001 for TG). Figure 1 shows, the increase in odds ratio of being a regular statin user in relation to the number of risk alleles www.nature.com/scientificreports www.nature.com/scientificreports/ of the LDL LGS. The figure shows that this increase was linear, indicating an independent additive contribution of each SNP to the LDL score. Each risk allele increased a 7.8% the likelihood of being a regular statin user. This association was similar in cases. Though blood lipid levels for the subjects were not collected in the MCC study, this observation supports the validity of the LDL LGS used as an instrumental variable for lifetime exposure to higher LDL levels. HDL and TG LGS were not associated with statin use (OR 10 = 1.10, 95%CI: 0.86-1.42, p = 0.49 for HDL and OR 10 = 1.33, 95%CI: 0.96-1.84, p = 0.092 for TG).

Association of lipid genetic scores with CRC.
None of the SNPs contributing to each of the LGS was associated with CRC when Bonferroni correction was applied (see Supplementary Tables 1-4 for details). Moreover, the distribution of estimated risk effects showed a normal distribution around an average of zero, with some SNPs showing a direct association with CRC while others had an inverse direction, suggesting that the observed effects were likely related to random variation. The most significant SNPs in relation with CRC in our population were rs1748195 for TG, rs7941030 for HDL, rs12916, rs10102164 and rs10102164 for LDL. For TC, the most significant SNPs were rs12916, rs10102164 and rs7941030 which also were related to LDL and HDL, as expected.
None of the Mendelian randomization estimates for the associations between LGS and CRC, based on unweighted allele scores, show any risk effect (Table 3). Cases had the same average risk alleles than controls in all the lipids traits (Table 3 and Fig. 2). Therefore, individuals with larger numbers of TG, HDL, LDL, or TC risk alleles were not at higher risk for CRC. Also, since the LDL and TC LGS were associated with statin use, we performed the analysis stratified by statin use. As Table 3 shows, the instruments were not associated with CRC among non-users of statins and there was not an effect interaction. Furthermore, to discard the possibility of confounding or pleiotropy, we confirmed that the LGS were independent of known CRC risk factors (Supplementary  Table 5), and specifically discarded effect modification by BMI, diabetes or hypertension.

Discussion
Mendelian randomization is a technique used to determine the causal impact of a risk factor on an outcome from observational data using genetic variants. It has already been used to investigate associations between blood lipids and colorectal polyps 27 , coronary heart disease 28 , and prostate cancer 29 . In our study, using the Mendelian randomization approach, none of the lipid genetic scores for dyslipidemia analyzed was associated with CRC risk. This indicates that lifetime dyslipidemia most probably is unrelated to the development of colorectal neoplasms, and that the associations previously found in observational studies between dyslipidemia and CRC could be result of uncontrolled confounding factors or reverse causation.
This analysis was based on the usual instrumental variable assumptions of Mendelian randomization studies that use genetic scores as lifetime exposure surrogates. The first assumption was that the LGS was associated with the exposure of interest. Our study could not test this condition directly since lipid levels were not measured, but it was probably fulfilled, since we selected SNPs that had been linked to plasma lipid levels in GWAS 20 . As an indirect check of this assumption, we confirmed the association between the TC, LDL and TG allele score and statin consumption, a drug that is essentially prescribed for high LDL but also for hypertriglyceridemia 30,31 . The second assumption was that the LGS was not related to other factors that confound the exposure-outcome relationship. Potential confounding factors (diabetes mellitus, arterial hypertension, BMI, and waist-hip ratio) that could influence the relationship between dyslipidemia and CRC were explored, and none were associated with www.nature.com/scientificreports www.nature.com/scientificreports/ the LGS. We acknowledge, however, that the existence of residual confounding cannot completely be ruled out in a case-control study. Since the genetic scores were related to statin use, we also performed an analysis stratified by use of these drugs and observed no effect modification. Mendelian randomization studies are usually robust to confounding, as long as the third assumption is met: the effect of the LGS on the outcome should be mediated only through the association with the modifiable risk factor. We cannot rule out pleiotropic effects of the SNPs included in the LGS that may confound the association, though we carefully selected the SNPs to avoid that. We excluded SNPs associated with other traits than the lipid of interest. In fact, these strong assumptions that Mendelian randomization studies require are suggested to be the reason why this type of study may find more negative results 22,23 .
It is uncertain whether dyslipidemia may contribute to the pathogenesis of colorectal neoplasia. In fact, three recent meta-analysis/systematic reviews show controversial results. Yao et al. 8 , concluded that especially high levels of TG and TC were associated with an increased risk of CRC, whereas HDL might be associated with a decreased risk of CRC. In contrast, Tian et al. 9 reported that high levels of TC, TG and LDL were associated with colorectal adenoma but not with CRC, and no association was observed between levels HDL and colorectal neoplasia. Finally, Passarelli et al. 12 found a 13% higher prevalence of colorectal adenomas per increase in TG levels at colonoscopy, a 4% lower per increase in HDL and similar blood concentrations of LDL and TC. There are two studies based on Mendelian randomization to assess the causality of dyslipidemia and colorectal neoplasia 11,27 . Passarelli et al. 27 investigated the relationship between dyslipidemia and polyps and they reported that genotypepolyp odds ratio using weighted allele scores were not associated On the other hand, Rodriguez-Broadbent et al. 11 supported a causal relationship between higher levels of TC with CRC risk. In our study, we did not find any association between dyslipidemia, including TC, and CRC. We have explored the relationship between TC and CRC and found it unrelated, though we believe its interpretation can be difficult as TC is essentially the sum of LDL and HDL and HDL may have an inverse effect on risk as it has been suggested for cardiovascular diseases 32 .
Regarding statins, these drugs have been suggested to play a role in cancer chemoprevention 33,34 . However, epidemiological evidence remains inconclusive. Recently, two meta-analyses including 40 and 42 individual studies, respectively 13,14 , reported a modest reduction in risk of CRC among statin users, though this was only among cohort studies and case control studies, and not among randomized controlled trials. It is unclear whether statin use may be associated with CRC due to their lipid lowering activity or to other mechanisms. Also, the association of statins with CRC could be just a reflection of the possible risk associated with high lipid levels. Confounding by indication is a common bias in observational studies and occurs when the indication (high cholesterol) for the medication under study (statin) is also associated with the outcome of interest (CRC). We have found that statin use was associated with CRC, with a borderline non-significant risk reduction of 16% in the multivariate adjusted model. However, we admit with a point estimate of 0.84 and a confidence interval which goes from 0.70 to 1.01, data must be interpreted cautiously. The failure to obtain a significant association could be related to statistical power, because our study is large, but not huge. We found, however, a stronger risk reduction for lipophilic statins. It is known that lipophilic statins show an efficient activity both at hepatic and extrahepatic sites, which could explain a higher effect in colonic tissue 35 . Since statins show a probable risk reduction of CRC but our Mendelian randomization study rules out the role of lipids, we should consider that the protective effect of statins is related to other mechanisms of action independent of lipid levels, such as anti-inflammatory effects 36 or microbiome interactions 37 .
Although our study does not support a causal effect of dyslipidemia in CRC risk, we must consider that our study has several limitations. The strong assumptions required for Mendelian randomization analysis, as  www.nature.com/scientificreports www.nature.com/scientificreports/ explained above, may not be fully met. Also, the modest sample size of our study results in a limited statistical power for a Mendelian randomization study unless the magnitude of the association is large 26,38 . Our calculations indicate that we have good power to detect OR > 1.33, assuming that the proportion of explained variance in lipid levels by the genetic score is 10%, which is debatable, but seems reasonable according to previous literature 39 . We assigned equal weight to each SNP, which may be suboptimal, but was necessary to include in the LGS some SNPs for which the effect size was unreported or were expressed in different units. Another important limitation is that we do not have serum lipid levels. However, the fact that the instrumental variable estimation for the LDL levels was associated with statins use supports the validity of the score used.
In conclusion, the null association observed in this Mendelian Randomization study does not support the hypothesis that dyslipidemia is involved in the pathogenesis of CRC. Our study found that lipophilic statins could have a possible protective effect on CRC by a mechanism unrelated to lipid levels.

Data Availability
The datasets generated and/or analyzed during the current study are not publicly available as they contain information that could compromise the privacy of research participants. The data are available from the corresponding author on request, but access to the data is dependent on consideration of the request by the Study Direction Committee, which serves as the data access committee for this study. The committee is affiliated to CIBER of Epidemiology and Public Health (CIBERESP), Instituto de Salud Carlos III, Madrid.

Figure 2.
Distribution of the genetic risk score of every lipid trait in cases and controls. The x-axis indicates the number of risk alleles of the lipid genetic risk score indicated in each figure title. The y-axis corresponds to the proportion of subjects observed for each risk allele count. The proportion of controls are shown in gray and, superimposed, the proportion of cases in red. Deviations from the overlap between cases and controls are shown in light color. The distributions do not differ in shape or location as indicated in the t-tests performed to compare the number of risk alleles between cases and controls.