The effect of education and general cognitive ability on smoking: A Mendelian randomisation study

Objectives To determine whether the effect of educational attainment on health outcomes is due in part to an effect on smoking behaviour, and whether this is due to educational attainment or general cognitive ability. Design Multivariable Mendelian randomization using genetic variants associated with educational attainment and general cognitive ability. Setting Individuals (N = 84,907) in the UK Biobank study. Main outcome measures Smoking status, including current smoking, ever smoking and smoking cessation. Results One additional year of education leads to a 4.4% decrease in the probability of being a current smoker (95% CI 6.8% decrease to 1.9% decrease), but there is no clear evidence of an effect of general cognitive ability (coefficient 5.5% increase, 95% CI 0.4% decrease to 11.3% increase). Similar results were obtained for smoking initiation and cessation. Conclusions More years of education lead to a reduced likelihood of smoking and, among those who do smoke, a greater likelihood of cessation. Given the considerable physical harms associated with smoking, this is likely to account for a substantial proportion of health inequalities associated with differences in educational attainment.

effects of educational attainment versus general cognitive ability is critical to understanding the nature of health inequalities associated with educational status.
In this study we used genetic variants associated with educational attainment, together with genetic variants associated with general cognitive ability, in a multivariable MR framework using individual-level data to determine the unique effects of each on smoking behaviour. This method, recently developed by Sanderson and colleagues (5), allows multiple genetic instruments, capturing distinct exposures, to be investigated simultaneously (even when these exposures are highly correlated, as in the case of educational attainment and general cognitive ability).

Data
Between 2006 and 2010, UK Biobank recruited 502,641 individuals aged between 37 and 73 years old from across the UK. All participants attended a clinic where they completed a questionnaire and interview about their characteristics and a range of health topics. They additionally provided a range of data including anthropometric measurements and gave samples of blood, urine and saliva. This study is described in more detail elsewhere (6), and data access procedures are described on the UK Biobank website (http://www.ukbiobank.ac.uk/principles-of-access/). We excluded from our analysis all individuals with a high level of relatedness to other individuals in the study (81,890 exclusions) or who had non-European ancestry (78,647 exclusions). We excluded a further 849 individuals because of a mismatch between their sex chromosomes and their reported sex, because sex-chromosome aneuploidy was detected, or because they were outliers in heterozygosity or in the rate of missing genetic data.
The SNPs that we used to create our instrument for general cognitive ability were discovered using the interim release of UK Biobank, and so we additionally excluded all individuals who were included in the interim release. This gives a resulting potential sample size of 256,142 individuals. A total of 91,016 of the individuals eligible to be included in our sample completed the general cognitive ability questions. However, 6,109 of these individuals were missing data on one of the other exposures or controls included in the analysis, giving a final sample size of 84,907 individuals.

Measures
Education. Individuals in UK Biobank were asked to report their highest educational qualification. For each of the levels of education a corresponding age at which the individual would have completed their education has been assigned. These are described in Table 1.
Insert Table 1 about here.
General cognitive ability. General cognitive ability was measured as the number of correct answers recorded in a series of 13 questions designed to measure cognitive ability.
This was completed by participants during a computer questionnaire as part of their initial clinic visit or through an online questionnaire completed after the visit. The score is the number of questions correctly completed by the individuals in the time allowed and ranges between 0 and 13 with a mean of 6.0 and a standard deviation of 2.1. For the analyses reported here, the score has been standardised to have a mean of 0 and a standard deviation of 1.
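As a small worked illustration of the standardisation, the sketch below converts a raw score to a z-score using the rounded sample mean (6.0) and standard deviation (2.1) reported above. The function name is hypothetical, and the analysis itself presumably used the unrounded sample moments.

```python
# Rounded sample moments as reported in the text (assumption: the actual
# analysis used the exact, unrounded values).
REPORTED_MEAN = 6.0
REPORTED_SD = 2.1

def standardise_score(raw_score, mean=REPORTED_MEAN, sd=REPORTED_SD):
    """Convert a raw 0-13 score to a z-score (mean 0, standard deviation 1)."""
    return (raw_score - mean) / sd

# A participant answering 8 of 13 questions correctly sits about
# 0.95 standard deviations above the sample mean.
z_for_8 = standardise_score(8)
```

On this scale, a coefficient for general cognitive ability is read as the effect of a one standard deviation difference in the score, roughly two additional correct answers.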
Smoking. All participants were asked if they currently smoked and if they had ever smoked. From this information we constructed three variables: 1) a binary variable for current smokers/current non-smokers; 2) a binary smoking initiation variable for ever/never smokers; and 3) among individuals who have ever smoked, a smoking cessation variable for former vs current smokers.
Genetic Instruments. A recent GWAS of educational attainment by Okbay and colleagues identified 74 SNPs at genome-wide significance that were associated with years of education completed (7). A polygenic risk score for education was created for each of the individuals in our sample using the genome-wide significant SNPs from this GWAS. Each individual's risk score was calculated as the weighted total number of education-increasing alleles they had from these SNPs, weighted by the size of the effect of the SNP in the original GWAS. Three of the SNPs from this GWAS were not available in the UK Biobank data and were substituted; all of the substitutes used were in perfect LD with the original SNP.
For general cognitive ability we created a polygenic risk score for each individual using the results from a recent GWAS of general cognitive ability by Sniekers and colleagues, which identified 18 SNPs at genome-wide significance (8). Again, the risk score was calculated as the weighted number of general cognitive ability increasing alleles each individual had from these SNPs, weighted by the effect of the SNP in the GWAS.
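The weighted score described above can be sketched as a matrix product of allele counts and GWAS effect sizes. The function name, the toy genotypes and the effect sizes below are illustrative assumptions, not values from either GWAS.

```python
import numpy as np

def weighted_allele_score(allele_counts, gwas_effects):
    """Weighted allele score: sum over SNPs of (allele count x GWAS effect).

    allele_counts: (n_individuals, n_snps) counts (0, 1 or 2) of the
                   trait-increasing allele at each SNP.
    gwas_effects:  (n_snps,) per-allele effect sizes from the discovery GWAS.
    """
    allele_counts = np.asarray(allele_counts, dtype=float)
    gwas_effects = np.asarray(gwas_effects, dtype=float)
    if allele_counts.shape[1] != gwas_effects.shape[0]:
        raise ValueError("one effect size is required per SNP")
    return allele_counts @ gwas_effects

# Toy data: three individuals, three SNPs (hypothetical values).
counts = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [1, 0, 1]])
effects = np.array([0.02, 0.01, 0.03])
scores = weighted_allele_score(counts, effects)  # one score per individual
```

Weighting by effect size means that SNPs with larger estimated effects in the discovery GWAS contribute more to the score than an unweighted allele count would allow.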

Statistical analysis
We used multivariable MR analysis of individual level data to estimate the direct effects of education and general cognitive ability on our three smoking phenotypes.
Multivariable MR includes both education and general cognitive ability as exposure variables in an MR regression with both of these exposures being predicted by a set of SNPs. This analysis controls for any correlation between education and general cognitive ability, and for any pleiotropic effect of the education SNPs on general cognitive ability and the general cognitive ability SNPs on education (i.e., through the education SNPs having a direct effect on general cognitive ability, or the cognitive ability SNPs having a direct effect on education). For comparison we also conducted standard MR analysis for each of the exposures on each of our three outcomes using the SNP score for education as the only instrument for educational attainment and the SNP score for general cognitive ability as the only instrument for general cognitive ability. These results give the total causal effect of education and general cognitive ability on each of the smoking phenotypes, including any effect of education on general cognitive ability and any effect of general cognitive ability on education. However, these results are more susceptible to pleiotropy as any direct pleiotropic effect of the education score on general cognitive ability or the general cognitive ability score on education will affect the results obtained from this analysis.
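To make the contrast with a confounded observational regression concrete, the following is a minimal sketch of multivariable MR as two-stage least squares on simulated data. Everything in it is an assumption for illustration: the variable names, the simulated effect sizes, and the use of a continuous outcome (the smoking outcomes in this study are binary). It returns point estimates only; valid standard errors require the usual 2SLS correction, which is omitted here.

```python
import numpy as np

def lstsq_coef(design, target):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def multivariable_2sls(outcome, exposures, instruments):
    """Point estimates for several exposures instrumented jointly."""
    const = np.ones((len(outcome), 1))
    # Stage 1: predict every exposure from all instruments together.
    first_design = np.column_stack([const, instruments])
    fitted = first_design @ lstsq_coef(first_design, exposures)
    # Stage 2: regress the outcome on the predicted exposures.
    second_design = np.column_stack([const, fitted])
    return lstsq_coef(second_design, outcome)[1:]

# Simulated data (hypothetical): two allele scores, each predicting both
# exposures, plus an unmeasured confounder of exposures and outcome.
rng = np.random.default_rng(0)
n = 20_000
scores = rng.normal(size=(n, 2))
confounder = rng.normal(size=n)
education = 1.0 * scores[:, 0] + 0.3 * scores[:, 1] + confounder + rng.normal(size=n)
cognition = 0.3 * scores[:, 0] + 1.0 * scores[:, 1] + confounder + rng.normal(size=n)
exposures = np.column_stack([education, cognition])
true_effects = np.array([-0.5, 0.2])
outcome = exposures @ true_effects + confounder + rng.normal(size=n)

mvmr_estimates = multivariable_2sls(outcome, exposures, scores)
ols_estimates = lstsq_coef(np.column_stack([np.ones(n), exposures]), outcome)[1:]
```

In this simulation the OLS estimates are pulled away from the true effects by the confounder, whereas the two-stage estimates recover them, because the predicted exposures vary only through the allele scores.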
Throughout the analysis we adjusted for age at the time of the clinic, sex, year of birth, and a year of birth × sex interaction term, in order to adjust for any changes in patterns of smoking behaviour over time. We also controlled for the top 10 genetic principal components to account for any residual population stratification. We additionally included an 80% upweighting for individuals who left school with no qualifications.
To test for potential pleiotropy in the effect of the SNPs used as instruments we calculated the multivariable MR Egger effect estimates (9). Multivariable MR Egger analysis extends MR Egger to a two-sample multivariable setting (10).
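A minimal sketch of the idea behind multivariable MR Egger: the SNP-outcome associations are regressed on the SNP-exposure associations with an unconstrained intercept, weighting by the inverse variance of the SNP-outcome associations. An intercept near zero is consistent with no directional pleiotropy. The function name and the simulated summary statistics are assumptions for illustration, and SNP orientation is not handled here.

```python
import numpy as np

def mvmr_egger(beta_outcome, se_outcome, beta_exposures):
    """Inverse-variance weighted regression with an intercept.

    Returns (intercept, slopes); the slopes are the causal effect estimates
    and the intercept captures average directional pleiotropy.
    """
    beta_outcome = np.asarray(beta_outcome, dtype=float)
    root_w = 1.0 / np.asarray(se_outcome, dtype=float)  # sqrt of 1 / se^2
    design = np.column_stack([np.ones(len(beta_outcome)), beta_exposures])
    coef, *_ = np.linalg.lstsq(design * root_w[:, None],
                               beta_outcome * root_w, rcond=None)
    return coef[0], coef[1:]

# Simulated SNP-level summary statistics with no pleiotropy (hypothetical).
rng = np.random.default_rng(1)
n_snps = 89
beta_exposures = rng.normal(0.03, 0.02, size=(n_snps, 2))
se_outcome = np.full(n_snps, 0.001)
true_effects = np.array([-0.4, 0.2])
beta_outcome = beta_exposures @ true_effects + rng.normal(0.0, se_outcome)

intercept, slopes = mvmr_egger(beta_outcome, se_outcome, beta_exposures)
```

Because the simulated SNPs affect the outcome only through the two exposures, the fitted intercept is close to zero and the slopes are close to the simulated effects.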

Observational analysis
Observational analyses, controlling for the full set of controls using ordinary least squares (OLS) regression, indicated small negative associations of both educational attainment and general cognitive ability with current smoking. These results also indicated a negative association between increased education and smoking initiation, and a positive association between general cognitive ability and smoking initiation. Increased education and higher general cognitive ability were both positively associated with smoking cessation.
These results are shown in Tables 2-4.
Insert Tables 2 to 4 about here.

Causal analysis
In all of these analyses we used two instruments in the multivariable MR analysis and one instrument in each of the single variable MR analyses. The F-statistics show that the instruments used strongly predict education and general cognitive ability. In the multivariable MR analysis an additional assumption required for the instruments to be considered strong is that the instruments used can predict both of the exposure variables at the same time. This is measured using the conditional F-statistic (11). In all of our multivariable MR estimations the conditional F-statistic is larger than the conventional value of 10 used to indicate a strong instrument, and so the instruments used strongly predict the exposure variables in the multivariable MR analyses.
The introduction of a second exposure in the multivariable MR analysis does not introduce collider bias because each exposure is predicted by a set of SNPs that are not associated with smoking. The multivariable MR analysis uses values of education and general cognitive ability that have been predicted from a set of SNPs that are not associated with smoking behaviour; these predicted values will therefore not be affected by changes in smoking and so cannot introduce collider bias to the analysis. This is explained further elsewhere (5).
Smoking cessation. The results for being a former smoker showed that higher education leads to a higher probability of having quit smoking, but did not indicate a causal effect of general cognitive ability on quitting smoking. These results reflect the pattern seen in the results for currently or ever smoking, showing that increased education makes individuals more likely to quit smoking once they have started as well as being less likely to start smoking in the first place.
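The instrument-strength checks described in this section can be illustrated with a short simulation. The sketch below computes a conventional first-stage F-statistic and a conditional F-statistic built in the general Sanderson-Windmeijer manner for two exposures: the target exposure is regressed on the other exposure by two-stage least squares, and the residual is then tested against the instruments with one degree of freedom removed. This construction, the function names and the simulated data are assumptions for illustration; it is not necessarily the exact computation used for the reported statistics.

```python
import numpy as np

def _fitted(design, target):
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef

def _f_stat(target, instruments, df_num):
    """F-type statistic for the instruments in a regression of target on them."""
    n, k = instruments.shape
    design = np.column_stack([np.ones(n), instruments])
    rss_full = np.sum((target - _fitted(design, target)) ** 2)
    rss_null = np.sum((target - target.mean()) ** 2)
    return ((rss_null - rss_full) / df_num) / (rss_full / (n - k - 1))

def first_stage_f(exposure, instruments):
    return _f_stat(exposure, instruments, instruments.shape[1])

def conditional_f(target, other, instruments):
    n, k = instruments.shape
    z = np.column_stack([np.ones(n), instruments])
    # 2SLS of the target exposure on the other exposure.
    stage2 = np.column_stack([np.ones(n), _fitted(z, other)])
    coef, *_ = np.linalg.lstsq(stage2, target, rcond=None)
    resid = target - np.column_stack([np.ones(n), other]) @ coef
    # Instruments must still explain what the other exposure does not.
    return _f_stat(resid, instruments, k - 1)

rng = np.random.default_rng(3)
n = 20_000
z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
# Case A: the scores predict the two exposures in different proportions.
x1 = 1.0 * z[:, 0] + 0.3 * z[:, 1] + u + rng.normal(size=n)
x2 = 0.3 * z[:, 0] + 1.0 * z[:, 1] + u + rng.normal(size=n)
# Case B: the scores predict both exposures in the same proportion.
w1 = z[:, 0] + z[:, 1] + u + rng.normal(size=n)
w2 = 0.5 * (z[:, 0] + z[:, 1]) + u + rng.normal(size=n)

f_a, cond_a = first_stage_f(x1, z), conditional_f(x1, x2, z)
f_b, cond_b = first_stage_f(w1, z), conditional_f(w1, w2, z)
```

In case B the conventional F-statistic is large even though the two exposures cannot be separated by the instruments, which is why the conditional statistic is the relevant check in a multivariable analysis.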

Sensitivity analyses
As a sensitivity analysis we repeated the analyses separately with no weighting applied, and with the self-reported age at which the individual completed education (replaced with 21 for individuals with a degree, who were not asked their age when they completed education). Both of these analyses give results that are not substantially different from those presented here (data not shown).
The results for the multivariable MR Egger estimation are given in Table 5. In these analyses, we dropped education SNPs directly associated with cognitive ability since we excluded SNPs in high LD, so that 89 SNPs were included in total. In all of the analyses the constant is estimated to be close to zero, with narrow 95% confidence intervals around zero. These results suggest that the SNPs used in the main analysis do not have a pleiotropic effect on the outcome, and therefore that the results given above are valid.
For MR Egger analysis it is important to align all of the SNPs so they have a positive effect on the exposure. It is not clear how to do this in multivariable MR Egger, and we therefore repeated the analysis twice: first with all SNPs aligned to have a positive effect on education, and then with all SNPs aligned to have a positive effect on cognitive ability. The results from this analysis did not change the conclusion drawn from the results (data not shown).
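The alignment step can be sketched as follows: for each SNP whose association with the chosen exposure is negative, the signs of all of that SNP's associations (with both exposures and with the outcome) are flipped, which corresponds to switching the reference allele. The function name and the toy values are hypothetical.

```python
import numpy as np

def orient_to_exposure(beta_exposures, beta_outcome, exposure_index):
    """Flip SNP effects so the chosen exposure's associations are all non-negative."""
    beta_exposures = np.asarray(beta_exposures, dtype=float)
    beta_outcome = np.asarray(beta_outcome, dtype=float)
    sign = np.where(beta_exposures[:, exposure_index] < 0, -1.0, 1.0)
    # The same sign change is applied to every association of a given SNP.
    return beta_exposures * sign[:, None], beta_outcome * sign

# Toy SNP associations: columns are (education, cognitive ability).
bx = np.array([[0.02, -0.01],
               [-0.03, 0.04],
               [0.01, 0.02]])
by = np.array([-0.010, 0.015, -0.002])

bx_edu, by_edu = orient_to_exposure(bx, by, 0)  # aligned to education
bx_cog, by_cog = orient_to_exposure(bx, by, 1)  # aligned to cognitive ability
```

The two orientations generally differ whenever a SNP has effects of opposite sign on the two exposures, which is why both were examined.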

Discussion
Our multivariable MR results allow us to estimate the direct causal effects of each of education and general cognitive ability on smoking behaviour conditional on general cognitive ability and education respectively. The single variable MR estimates give the total effect of a change in education and general cognitive ability on smoking, including any effect of general cognitive ability on education and vice versa. Our results suggest that more years of education lead to a reduced likelihood of smoking and, among those who do smoke, a greater likelihood of cessation. Given the considerable physical harms associated with smoking, this is likely to account for a substantial proportion of health inequalities associated with differences in educational attainment. The large difference between the conventional and multivariable MR estimates for the effect of general cognitive ability on smoking suggests that a large part of the total effect of general cognitive ability on smoking behaviour observed in the conventional MR analysis is due to an effect of general cognitive ability on educational attainment. The estimates for the effect of education on smoking are far more similar across the analyses, suggesting that changes in education will have an effect on smoking behaviour. These results are consistent with recent findings reported by Davies and colleagues, which used the natural experiment of an increase in school leaving age to show that remaining in school causally reduces the risk of adverse health outcomes, including likelihood of smoking (12).
Inferring causality using MR methods does not necessarily clarify the mechanistic pathways that link exposures and outcomes. Our results support a causal pathway from educational attainment to smoking behaviour that appears to be independent of general cognitive ability, but the causal pathway is likely to be complex. More years in education might influence smoking via greater awareness of the harms of smoking, increased self-efficacy, exposure to social groups where smoking is less common, and so on. While years in education might be considered a target for intervention, there is only limited scope to extend this. Identifying other links in the causal pathway may reveal other putative intervention targets that are potentially more tractable. This should be the subject of future research, and in part will depend on the identification of genetic variants associated with, for example, self-efficacy beliefs if MR methods are to be applied.
There are a number of important limitations to this study that should be considered when interpreting these results. First, Mendelian randomization is not without limitations and, critically, relies on key assumptions, in particular regarding the validity of the genetic variants used as instruments. We have shown that the genetic variants we used are strongly associated with both education and general cognitive ability. We also assessed the robustness of our analyses to potential pleiotropy using methods that rely on different assumptions (e.g., MR Egger), and these produced similar results. Second, we cannot rule out the possibility of dynastic effects: offspring will share on average 50% of the variegated genotype of their parents, so that the effects we observed may in fact be due to effects of parental education on the offspring environment that have an impact on smoking behaviour.
While we cannot rule out this possibility, converging evidence from instrumental variables not subject to potential dynastic effects that educational attainment is causally related to smoking-related health outcomes argues against it. Third, the UK Biobank is highly unrepresentative of the general UK population, and this selection bias may introduce biases from which MR methods are not immune (13). For example, if educational attainment and smoking behaviour are both associated with participation in UK Biobank, then restricting our analyses to this sample may introduce collider bias, and bias associations between these variables (or even introduce spurious associations between them). It will therefore be important to replicate these findings in other samples that are either more representative of the general population, or are subject to different selection biases. Fourth, the interpretation of multivariable MR results is nuanced and potentially complex. Specifically, the estimate for educational attainment is adjusted for general cognitive ability and vice versa. This means that the estimated effect of education is the effect given a constant level of general cognitive ability, and the effect for general cognitive ability is the effect given a constant level of education. While the interpretation of our educational attainment results is relatively straightforward, the interpretation of our conditional general cognitive ability results (which imply increased likelihood of smoking with increasing general cognitive ability) is not. Because general cognitive ability affects education, this (weak) effect may reflect the fact that individuals with high levels of general cognitive ability who do not achieve the educational attainment that might be expected are likely to have fallen short for specific reasons that are themselves likely to influence smoking.
This result indicates that the positive effect of general cognitive ability on smoking behaviour observed in the conventional MR analysis is acting via the effect of general cognitive ability on education, and the consequent effect of education on smoking behaviour. This suggests that an increase in general cognitive ability that was not associated with a corresponding increase in educational attainment would have a limited effect on smoking behaviour.
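The selection (collider) bias raised as the third limitation above can be illustrated with a short simulation. In the sketch below, two traits are generated independently in a hypothetical population, and participation depends on both; restricting to participants then induces an association that does not exist in the population. The variable names, the participation rule and all numerical values are assumptions for illustration and are not estimates of selection into UK Biobank.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Two traits that are independent in the full (hypothetical) population.
education = rng.normal(size=n)
smoking = rng.normal(size=n)

# Participation is more likely with more education and less smoking.
propensity = education - smoking + rng.normal(size=n)
participates = propensity > 1.0

corr_population = np.corrcoef(education, smoking)[0, 1]
corr_sample = np.corrcoef(education[participates],
                          smoking[participates])[0, 1]
```

Here the correlation is close to zero in the full population but clearly positive among participants, purely as a consequence of conditioning on participation.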
In summary, our results indicate that the observed association between educational attainment and smoking is causal, and unlikely to be due to effects of general cognitive ability on educational attainment. This suggests that a part of the health inequalities associated with differences in educational outcomes may be due to smoking, and that education represents a potential target for intervention to reduce health inequalities. Future research may identify other putative targets on the causal pathway from educational attainment to smoking behaviour, and should also explore whether there are similar relationships to other health behaviours such as alcohol consumption and diet.
Estimates of the effect of education and general cognitive ability on current smoking from OLS, multivariable MR and single variable MR regressions. All regressions also include a full set of adjustments: age, sex, year of birth, gender interacted with year of birth and 10 genetic principal components. All non-European and related individuals have been excluded from the analysis. Total sample size 52,419.