Introduction

Recent genome-wide association studies (GWAS) have described loci implicated in obesity, body mass index (BMI) and central adiposity. Yet most studies have ignored environmental exposures with possibly large impacts on the trait variance1,2. Variants that exert genetic effects on obesity through interactions with environmental exposures often remain undiscovered due to heterogeneous main effects and stringent significance thresholds. Thus, studies may miss genetic variants that have effects in subgroups of the population, such as smokers3.

It is often noted that current smokers display lower weight/BMI and higher waist circumference (WC) than nonsmokers4,5,6. Smokers also have the smallest fluctuations in weight over 20 years compared to those who have never smoked or have stopped smoking7,8. In addition, heavy smokers (>20 cigarettes per day [CPD]) and those who have smoked for more than 20 years are at greater risk for obesity than nonsmokers or light to moderate smokers (<20 CPD)9,10. Men and women gain weight rapidly after smoking cessation and many people intentionally smoke for weight management11. It remains unclear why smoking cessation leads to weight gain or why long-term smokers maintain weight throughout adulthood, although studies suggest that tobacco use suppresses appetite12,13 or, alternatively, that smoking may increase metabolic rate12,13. Identifying genes that influence adiposity and interact with smoking may help clarify the pathways through which smoking influences weight and central adiposity13.

A comprehensive study that evaluates smoking in conjunction with genetic contributions is warranted. Using GWAS data from the Genetic Investigation of Anthropometric Traits (GIANT) Consortium, we identified 23 novel genetic loci, and 9 loci with convincing evidence of gene-smoking interaction (GxSMK) on obesity, assessed by BMI and central obesity independent of overall body size, assessed by WC adjusted for BMI (WCadjBMI) and waist-to-hip ratio adjusted for BMI (WHRadjBMI). By accounting for smoking status, we focus both on genetic variants observed through their main effects and GxSMK effects to increase our understanding of their action on adiposity-related traits. These loci highlight novel biological functions, including response to oxidative stress, addictive behaviour and regulatory functions emphasizing the importance of accounting for environment in genetic analyses. Our results suggest that smoking may alter the genetic susceptibility to overall adiposity and body fat distribution.

Results

GWAS discovery overview

We meta-analysed study-specific association results from 57 HapMap-imputed GWAS and 22 studies with Metabochip, including up to 241,258 (87% European descent) individuals (51,080 current smokers and 190,178 nonsmokers) while accounting for current smoking (SMK) (Methods section, Supplementary Fig. 1, Supplementary Tables 1–4). For primary analyses, we conducted meta-analyses across ancestries and sexes. For secondary analyses, we conducted meta-analyses in European-descent studies alone and sex-specific meta-analyses (Tables 1, 2, 3, 4, Supplementary Data 1–6). We considered four analytical approaches to evaluate the effects of smoking on genetic associations with adiposity traits (Fig. 1, Methods section). Approach 1 (SNPadjSMK) examined genetic associations after adjusting for SMK. Approach 2 (SNPjoint) considered the joint impact of main effects adjusted for SMK+interaction effects14. Approach 3 focused on interaction effects (SNPint); Approach 4 followed up loci from Approach 1 for interaction effects (SNPscreen). Results from Approaches 1–3 were considered genome-wide significant (GWS) with a P-value<5 × 10−8 while Approach 4 used Bonferroni adjustment after screening. Lead variants >500 kb from previous associations with BMI, WCadjBMI, and WHRadjBMI were considered novel. All association results are reported with effect estimates oriented on the trait increasing allele in the current smoking stratum.
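As a rough illustration of how tests in the style of Approaches 2 and 3 can be computed from smoking-stratified summary statistics, the following Python sketch implements a two-degree-of-freedom joint test and a difference (interaction) test. It is not the study's implementation: it assumes independent strata and large-sample normal approximations, the function names are hypothetical, and the standard errors in the usage lines are made up for illustration (only the two effect estimates come from the CHRNB4 result reported below).

```python
import math

GWS = 5e-8  # genome-wide significance threshold stated for Approaches 1-3


def interaction_test(b_smk, se_smk, b_non, se_non):
    """Difference test between strata; returns (z, two-sided P).

    Assumes the smoker and nonsmoker estimates are independent.
    """
    z = (b_smk - b_non) / math.sqrt(se_smk ** 2 + se_non ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p


def joint_test(b_smk, se_smk, b_non, se_non):
    """Joint test of both stratum effects; returns (chi2, P) on 2 df."""
    chi2 = (b_smk / se_smk) ** 2 + (b_non / se_non) ** 2
    p = math.exp(-chi2 / 2)  # survival function of chi-square with 2 df
    return chi2, p


# Effect estimates as reported for the CHRNB4 locus; the standard errors
# (0.008 and 0.004) are hypothetical placeholders, not study values.
z_int, p_int = interaction_test(-0.047, 0.008, 0.002, 0.004)
chi2_joint, p_joint = joint_test(-0.047, 0.008, 0.002, 0.004)
print(z_int, p_int, p_int < GWS)
print(chi2_joint, p_joint, p_joint < GWS)
```

The joint test gains power when either stratum carries a signal, whereas the difference test responds only to a contrast between strata, which is why the two are reported separately.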

Table 1 Summary of association results for novel loci reaching genome-wide significance in Approach (App) 1 (PSNPadjSMK <5E−8) or Approach 2 (PSNPjoint <5E−8) for our primary meta-analysis in combined ancestries and combined sexes.
Table 2 Novel loci showing significant association in Approaches 1 (SNPadjSMK) and/or 2 (SNPjoint) identified in secondary meta-analyses and not significant in primary meta-analyses.
Table 3 Summary of association results for loci showing significance for interaction with smoking in Approach (App) 3 (SNPint) and/or Approach 4 (SNPscreen) in our primary meta-analyses of combined ancestries and combined sexes.
Table 4 Summary of association results for loci showing significance for interaction with smoking in Approach 3 (SNPint) and/or Approach 4 (SNPscreen) in our secondary meta-analyses not identified in primary meta-analyses.
Figure 1: Summary of study design and results.

Approach 1 uses both SNP and SMK in the association model. Approaches 2 and 3 use the SMK-stratified meta-analyses. Approach 4 screens loci based on Approach 1, then uses SMK-stratified results to identify loci with significant interaction effects (Methods section).

Figure 2: Forest plot for novel and GxSMK loci stratified by smoking status.

Estimated effects (β±95% CI) for smokers (N up to 51,080) and nonsmokers (N up to 190,178) per risk allele for (a) BMI, (b) WCadjBMI and (c) WHRadjBMI for novel loci from Approaches 1 and 2 (SNPadjSMK and SNPjoint, respectively) and all loci from Approaches 3 and 4 (SNPint and SNPscreen) identified in the primary meta-analyses. Loci are ordered by greater magnitude of effect in smokers compared to nonsmokers and labelled with the nearest gene. For the locus near TMEM38B, rs9409082 was used for effect estimates in this plot. (¥loci identified for Approach 4, *loci identified for Approach 3).

Across the three adiposity traits, we identified 23 novel associated genetic loci (6 for BMI, 11 for WCadjBMI, 6 for WHRadjBMI) and nine having significant GxSMK interaction effects (2 for BMI, 2 for WCadjBMI, 5 for WHRadjBMI; Fig. 1, Tables 1, 2, 3, 4, Supplementary Data 1–6). We provide a comprehensive comparison with previously identified loci1,2 by trait in supplementary material (Supplementary Data 7, Supplementary Note 1).

Accounting for smoking status

For primary meta-analyses of BMI (combined ancestries and sexes), 58 loci reached GWS in Approach 1 (SNPadjSMK; Supplementary Data 1, Supplementary Figs 2 and 3), including two novel loci near SOX11, and SRRM1P2 (Table 1). Three more BMI loci were identified using Approach 2 (SNPjoint), including a novel locus near CCDC93 (Supplementary Figs 4 and 5). For WCadjBMI, 62 loci reached GWS for Approach 1 (SNPadjSMK) and two more for Approach 2 (SNPjoint), including eight novel loci near KIF1B, HDLBP, DOCK3, ADAMTS3, CDK6, GSDMC, TMEM38B and ARFGEF2 (Table 1, Supplementary Data 2, Supplementary Figs 2–5). Lead variants near PSMB10 from Approaches 1 and 2 (rs14178 and rs113090, respectively) are >500 kb from a previously-identified WCadjBMI-associated variant (rs16957304); however, after conditioning on the known variant, our signal is attenuated (PConditional=3.02 × 10−2 and PConditional=5.22 × 10−3), indicating that this finding is not novel. For WHRadjBMI, 32 loci were identified in Approach 1 (SNPadjSMK), including one novel locus near HLA-C, with no additional loci in Approach 2 (SNPjoint; Table 1, Supplementary Data 3, Supplementary Figs 2–5).

We used GCTA15 to identify loci from our primary meta-analyses that harbour multiple independent SNPs (Methods section, Supplementary Tables 5–7). Conditional analyses revealed no secondary signals within 500 kb of our novel lead SNPs. Additionally, we performed conditional association analyses to determine whether our novel variants were independent of previous GWAS loci within 500 kb that are associated with related traits of interest. All BMI-associated SNPs were independent of previously identified GWS associations with anthropometric and obesity-related traits. Seven novel loci for WCadjBMI were near previous associations with related anthropometric traits. Of these, association signals for rs6743226 near HDLBP, rs10269774 near CDK6, and rs6012558 near ARFGEF2 were attenuated (PConditional>1E−5 and β decreased by half) after conditioning on at least one nearby height and hip circumference adjusted for BMI (HIPadjBMI) SNP, but association signals remained independent of other related SNP-trait associations. For WHRadjBMI, our GWAS signal was attenuated by conditioning on two known height variants (rs6457374 and rs2247056), but remained significant in other conditional analyses. Given high correlations among waist, hip and height, these results are not surprising.

Several additional loci were identified for Approaches 1 and 2 in secondary meta-analysis (Table 2, Supplementary Data 1–6, Supplementary Fig. 6). For BMI, 2 novel loci were identified by Approach 1, including 1 near EPHA3 and 1 near INADL. For WCadjBMI, 2 novel loci were identified near RAI14 and PRNP. For WHRadjBMI, five novel loci were identified in secondary meta-analyses near BBX, TRIB1, EHMT2, SMIM2 and EYA4. A comprehensive summary of nearby genes for all novel loci and their potential biological relevance is available in Supplementary Note 2.

Figure 3 presents analytical power for Approaches 1 and 2 while Supplementary Table 8 and Supplementary Fig. 7 present simulation results to evaluate type 1 error (Methods section). A heat map cross-tabulates P-values for Approaches 1 and 2 along with Approach 3 examining interaction only (Supplementary Fig. 8). We demonstrate that the two approaches yield valid type 1 error rates and that Approach 1 can be more powerful to find associations given zero or negligible quantitative interactions, whereas Approach 2 is more efficient in finding associations when interaction exists.

Figure 3: Power comparison across Approaches.

Shown is the power to identify adjusted (Approach 1, dashed black lines), joint (Approach 2, dotted green lines) and interaction (Approach 3 and 4, solid magenta and orange lines) effects for various combinations of SMK- and NonSMK-specific effects and assuming 50,000 smokers and 180,000 nonsmokers. For (a,c,e), the effect in smokers was fixed at a small (R2=0.01%, similar to the realistic NUDT3 effect on BMI), medium (R2=0.07%, similar to the realistic BDNF effect on BMI) or large (R2=0.34%, similar to the realistic FTO effect on BMI) genetic effect, respectively, and varied in nonsmokers. For (b,d,f), the effect in nonsmokers was fixed to the small, medium and large BMI effects, respectively, and varied in smokers.

Modification of genetic predisposition by smoking

Approach 3 directly evaluated GxSMK interaction (SNPint; Table 3, Supplementary Data 1–6, Fig. 2, Supplementary Figs 9 and 10). For primary meta-analysis of BMI, two loci reached GWS including a previously identified GxSMK interaction locus near CHRNB4 (ref. 3), and a novel locus near INPP4B. Both loci exhibit GWS effects on BMI in smokers and no effects in nonsmokers. For CHRNB4 (cholinergic nicotine receptor B4), the variant minor allele (G) exhibits a decreasing effect on BMI in current smokers (βsmk=−0.047) but no effect in nonsmokers (βnonsmk=0.002). Previous studies identified nearby SNPs in high LD associated with smoking (nonsynonymous, rs16969968 in CHRNA5)3 and arterial calcification (rs3825807, a missense variant in ADAMTS7)16. Conditioning on these variants attenuated our interaction effect but did not eliminate it (Supplementary Table 7), suggesting a complex relationship between smoking, obesity, heart disease, and genetic variants in this region. Importantly, the CHRNA5-CHRNA3-CHRNB4 gene cluster has been associated with lower BMI in current smokers3, but with higher BMI in never smokers3, evidence supporting the lack of association in nonsmokers as well as a lack of previous GWAS findings on 15q25 (Supplementary Data 8)1. The CHRNA5-CHRNA3-CHRNB4 genes encode the nicotinic acetylcholine receptor (nAChR) subunits α3, α5 and β4, which are expressed in the central nervous system17. Nicotine has differing effects on the body and brain, causing changes in metabolism and feeding behaviours18. These findings suggest smoking exposure may modify genetic effects on 15q24-25 to influence smoking-related diseases, such as obesity, through distinct pathways.

In primary meta-analyses of WCadjBMI, one novel GWS locus (near GRIN2A) with opposite effect directions by smoking status was identified for Approach 3 (SNPint; Table 3, Supplementary Data 2, Fig. 2, Supplementary Figs 9 and 10). The T allele of rs4141488 increases WCadjBMI in current smokers and decreases it in nonsmokers (βsmk=0.037, βnonsmk=−0.015). In secondary meta-analysis of European women-only, we identified an interaction between rs6076699, near PRNP, and SMK on WCadjBMI (Table 4, Supplementary Data 5, Supplementary Fig. 6), a locus also identified in Approach 2 (SNPjoint) for European women. The major allele, A, has a positive effect on current smokers as compared to a weaker and negative effect on WC in nonsmokers (βsmk=0.169, βnonsmk=−0.070), suggesting why this variant remained undetected in previous GWAS of WCadjBMI (Supplementary Data 8).

Approach 4 (SNPscreen; Fig. 1, Methods section) evaluated GxSMK interactions after screening SNPadjSMK results (from Approach 1) using Bonferroni correction (Methods section, Tables 3, 4, Supplementary Data 1–6). We identified two SNPs, near LYPLAL1 and RSPO3, with significant interaction; both have previously published main effects on anthropometric traits. These loci exhibit effects on WHRadjBMI in nonsmokers, but not in smokers (Fig. 2). In secondary meta-analyses, we identified three known loci with significant GxSMK interaction effects on WHRadjBMI near MAP3K1, HOXC4-HOXC6 and JUND (Table 4, Supplementary Data 3 and 6). We identified rs1809420, near CHRNA5-CHRNA3-CHRNB4, for BMI in the men-only, combined-ancestries meta-analysis (Supplementary Data 1).

Power calculations demonstrate that Approach 4 has increased power to identify SNPs that show (i) an effect in one stratum (smokers or nonsmokers) and a less pronounced but concordant effect in the other stratum, or (ii) an effect in the larger nonsmoker stratum and no effect in smokers (Fig. 3). In contrast, Approach 3 has increased power for SNPs that show (i) an effect in the smaller smoker stratum and no effect in nonsmokers, or (ii) an opposite effect between smokers and nonsmokers (Fig. 3). Our findings for both approaches agree with these power predictions, supporting using both analytical approaches to identify GxSMK interactions.
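The behaviour of the interaction test under different effect patterns can be explored with a small normal-approximation sketch. This is not the power calculation used for Fig. 3; it assumes a standardized trait and genotype (so each stratum's effect is the signed square root of its explained variance, with standard error 1/sqrt(n)), independent strata, the stratum sizes quoted for Fig. 3, and a two-sided difference test at 5 × 10−8. The function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist

N_SMK, N_NON = 50_000, 180_000  # stratum sizes assumed for Fig. 3
ALPHA = 5e-8


def interaction_power(r2_smk, r2_non, sign_smk=1, sign_non=1, alpha=ALPHA):
    """Approximate power of a two-sided difference test between strata.

    r2_* are explained-variance fractions (0.0007 means 0.07%).
    """
    b_smk = sign_smk * math.sqrt(r2_smk)
    b_non = sign_non * math.sqrt(r2_non)
    # Expected z-score of the difference under the alternative.
    mu = (b_smk - b_non) / math.sqrt(1 / N_SMK + 1 / N_NON)
    nd = NormalDist()
    z_crit = -nd.inv_cdf(alpha / 2)
    # Probability that a N(mu, 1) statistic falls beyond +/- z_crit.
    return nd.cdf(mu - z_crit) + nd.cdf(-mu - z_crit)


same_effect = interaction_power(0.0007, 0.0007)             # no interaction
smokers_only = interaction_power(0.0007, 0.0)               # effect in smokers only
opposite = interaction_power(0.0007, 0.0007, sign_non=-1)   # opposite directions
print(same_effect, smokers_only, opposite)
```

Under these assumptions, a medium-sized effect confined to smokers yields only moderate power, while opposite effects of the same size are detected almost always, in line with the qualitative pattern described for Approach 3.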

Enrichment of genetic effects by smoking status

When examining the smoking-specific effects for BMI and WCadjBMI loci in our meta-analyses, no significant enrichment of genetic effects by smoking status was noted (Fig. 2, Supplementary Figs 11 and 12). However, our results for WHRadjBMI were enriched for loci with a stronger effect in nonsmokers as compared to smokers, with 35 of 45 loci displaying numerically larger effects in nonsmokers (Pbinomial=1.2 × 10−4).
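The enrichment P-value can be checked with an exact binomial (sign) test. The sketch below assumes a one-sided test against a null probability of 0.5 that a locus shows the larger effect in nonsmokers; the sidedness is an assumption, chosen because it reproduces the reported value of about 1.2 × 10−4 when rounded. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 35 of 45 WHRadjBMI loci had numerically larger effects in nonsmokers.
p_enrich = binomial_upper_tail(35, 45)
print(f"{p_enrich:.2e}")
```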

We calculated the variance explained by subsets of SNPs selected on 15 significance thresholds for Approach 1 from PSNPadjSMK=1 × 10−8 to PSNPadjSMK=0.1 (Supplementary Table 9, Fig. 4). Differences in variance explained between smokers and nonsmokers were significant (PRsqDiff<0.003=0.05/15, Bonferroni-corrected for 15 thresholds) for BMI at each threshold, with more variance explained in smokers. For WCadjBMI, the difference was significant for SNP sets beginning with PSNPadjSMK≥3.16 × 10−4, and for WHRadjBMI at PSNPadjSMK≥1 × 10−6. In contrast to BMI, SNPs from Approach 1 explained a greater proportion of the variance in nonsmokers for WHRadjBMI. Differences in variance explained were greatest for BMI (differences ranged from 1.8 to 21% for smokers) and lowest for WHRadjBMI (ranging from 0.3 to 8.8% for nonsmokers).
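A short sketch of the threshold grid and the Bonferroni cutoff may make this procedure easier to follow. The half-log10 spacing is an inference from the text (15 thresholds spanning 1 × 10−8 to 0.1, with 3.16 × 10−4 among them) rather than a stated design, and the variable and function names are hypothetical.

```python
N_THRESHOLDS = 15

# Assumed grid: half-log10 steps from 1e-8 up to 1e-1 (15 values).
thresholds = [10 ** (-8 + 0.5 * i) for i in range(N_THRESHOLDS)]

# Bonferroni-corrected cutoff for the difference in variance explained.
bonferroni_cutoff = 0.05 / N_THRESHOLDS


def significant_difference(p_rsq_diff):
    """True if a smoker-vs-nonsmoker difference passes the corrected cutoff."""
    return p_rsq_diff < bonferroni_cutoff


for t in thresholds:
    print(f"{t:.2e}")
print(f"cutoff = {bonferroni_cutoff:.4f}")
```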

Figure 4: Stratum specific estimates of variance explained.

Total smoking status-specific explained variance (±s.e.) by SNPs meeting varying thresholds of overall association in Approach 1 (SNPadjSMK) and the difference between the proportion of variance explained between smokers and nonsmokers for these same sets of SNPs in BMI (a,b), WCadjBMI (c,d), and for WHRadjBMI (e,f).

These results suggest that smoking may increase genetic susceptibility to overall adiposity, but attenuate genetic effects on body fat distribution. This contrast is concordant with phenotypic observations of higher overall adiposity and lower central adiposity in smokers4,6,7. Additionally, smoking increases oxidative stress and general inflammation in the body19 and may exacerbate weight gain20. Many genes implicated in BMI are involved in appetite regulation and feeding behaviour1. For waist traits, our results adjusted for BMI likely highlight distinct pathways through which smoking alters genetic susceptibility to body fat distribution. Overall, our results indicate that more loci remain to be discovered, as more trait variance is explained when the significance threshold is relaxed.

Functional or biological role of novel loci

We conducted thorough searches of the literature and publicly available bioinformatics databases to understand the functional role of all genes within 500 kb of our lead SNPs. We systematically explored the potential role of our novel loci in affecting gene expression both with and without accounting for the influence of smoking behaviour (Methods section, Supplementary Note 3, Supplementary Tables 10–12).

We found that the majority of novel loci are near strong candidate genes with biological functions similar to previously identified adiposity-related loci, including regulation of body fat/weight, angiogenesis/adipogenesis, glucose and lipid homeostasis, and general growth and development (Supplementary Notes 1 and 3).

We identified rs17396340 for WCadjBMI (Approaches 1 and 2), an intronic variant in the KIF1B gene. This variant is associated with expression of KIF1B in whole blood with and without accounting for SMK (GTEx and Supplementary Tables 10 and 12), and KIF1B is highly expressed in the brain21. Knockout and mutant forms of KIF1B in mice resulted in multiple brain abnormalities, including altered hippocampus morphology22, a region involved in (food) memory and cognition23. Variant rs17396340 is associated with expression levels of ARSA in LCL tissue. Human adipocytes express functional ARSA, which turns dopamine sulfate into active dopamine. Dopamine regulates appetite through leptin and adiponectin levels, suggesting a role for ARSA in regulating appetite24.

Expression of CD47 (CD47 molecule), near rs670752 for WHRadjBMI (Approach 1, women-only), is significantly decreased in obese individuals and negatively correlated with BMI, WC and Hip circumference25. Conversely, in mouse models, CD47-deficient mice show decreased weight gain on high-fat diets, increased energy expenditure, improved glucose profile and decreased inflammation26.

Several novel loci harbour genes involved in unique biological functions and pathways including addictive behaviours and response to oxidative stress. These potential candidate genes near our association signals are highly expressed in relevant tissues for regulation of adiposity and smoking behaviour (for example, brain, adipose tissue, liver, lung and muscle; Supplementary Note 2, Supplementary Table 10).

The CHRNA5-CHRNA3-CHRNB4 cluster is involved in the eNOS signalling pathway (Ingenuity KnowledgeBase, http://www.ingenuity.com) that is key for neutralizing reactive oxygen species introduced by tobacco smoke and obesity27. Disruption of this pathway has been associated with dysregulation of adiponectin in adipocytes of obese mice, implicating this pathway in downstream effects on weight regulation27,28. This finding is especially important due to the compounded stress adiposity places on the body as it increases chronic oxidative stress itself28. INPP4B has been implicated in the regulation of the PI3K/Akt signalling pathway29 that is important for cellular growth and proliferation, but also eNOS signalling, carbohydrate metabolism, and angiogenesis30.

GRIN2A, near rs4141488, controls long-term memory and learning through regulation and efficiency of synaptic transmission31 and has been associated with heroin addiction32. Nicotine increases the expression of GRIN2A in the prefrontal cortex in murine models33. There are no established relationships between GRIN2A and obesity-related phenotypes in the literature, yet memantine and ketamine, pharmacological antagonists of GRIN2A activity34,35, are implicated in treatment for obesity-associated disorders, including binge-eating disorders and morbid obesity (ClinicalTrials.gov identifiers: NCT00330655, NCT02334059, NCT01997515, NCT01724983). Memantine is under clinical investigation for treatment of nicotine dependence (ClinicalTrials.gov identifiers: NCT01535040, NCT00136786 and NCT00136747). While our lead SNP is not within a characterized gene, rs4141488 and variants in high LD (r2>0.7) are within active enhancer regions for several tissues, including liver, fetal leg muscle, stomach and intestinal smooth muscle, cortex and several embryonic and pluripotent cell types (Supplementary Note 2), and therefore may represent an important regulatory region for nearby genes like GRIN2A.

In secondary meta-analysis of European women-only, we identified a significant GxSMK interaction for rs6076699 on WCadjBMI (Table 4, Supplementary Data 4, Supplementary Fig. 6). This SNP is 100 kb upstream of PRNP (prion protein), a signalling transducer involved in multiple biological processes related to the nervous system, immune system, and other cellular functions (Supplementary Note 2)36. Alternate forms of the oligomers may form in response to oxidative stress caused by copper exposure37. Copper is present in cigarette smoke and elevated in the serum of smokers, but is within safe ranges38,39. Another gene near rs6076699, SLC23A2 (Solute Carrier Family 23 (Ascorbic Acid Transporter), Member 2), is essential for the uptake and transport of Vitamin C, an important nutrient for DNA and cellular repair in response to oxidative stress both directly and through supporting the repair of Vitamin E after exposure to oxidative agents40,41. SLC23A2 is present in the adrenal glands and murine models indicate that it plays an important role in regulating dopamine levels42. This region is associated with success in smoking cessation and is implicated in addictive behaviours in general43,44. Our tag SNP is located within an active enhancer region (marked by open chromatin marks, DNase hypersensitivity, and transcription factor binding motifs); this regulatory activity appears tissue specific (sex-specific tissues and lungs; HaploReg and UCSC Genome Browser).

Nicotinamide mononucleotide adenylyltransferase (NMNAT1), upstream of WCadjBMI variant rs17396340, is responsible for the synthesis of NAD from ATP and NMN45. NAD is necessary for cellular repair following oxidative stress. Upregulation of NMNAT protects against damage caused by reactive oxygen species in the brain, specifically the hippocampus46. Also for WCadjBMI, both CDK6, near SNP rs10269774, and FAM49B, near SNP rs6470765, are targets of the BACH1 transcription factor, involved in cellular response to oxidative stress and management of the cell cycle47.

Influence of novel loci on related traits

In a look-up in existing GWAS of smoking behaviours (Ever/Never, Current/Not-Current, Smoking Quantity (SQ))48 (Supplementary Data 8), eight of our 26 SNPs were nominally associated with at least one smoking trait. After multiple test correction (PRegression<0.05/26=0.0019), only one SNP remains significant: rs12902602, identified for Approaches 2 (SNPjoint) and 3 (SNPint) for BMI, showed association with SQ (P=1.45 × 10−9).

We conducted a search in the NHGRI-EBI GWAS Catalog49,50 to determine if any of our newly identified loci are in high LD with variants associated with related cardiometabolic and behavioural traits or diseases. Of the seven novel BMI SNPs, only rs12902602 was in high LD (r2>0.7) with SNPs previously associated with smoking-related traits (for example, nicotine dependence), lung cancer, and cardiovascular diseases (for example, coronary heart disease; Supplementary Table 13). Of the 12 novel WCadjBMI SNPs, 5 were in high LD with previously reported GWAS variants for mean platelet volume, height, infant length, and melanoma. Of the six novel WHRadjBMI SNPs, three were near several previously associated variants, including cardiometabolic traits (for example, LDL cholesterol, triglycerides and measures of renal function).

Given high phenotypic correlation between WC and WHR with height, and established shared genetic associations that overlap our adiposity traits and height1,2,51, we expect cross-trait associations between our novel loci and height. Therefore, we conducted a look-up of all of our novel SNPs to identify overlapping association signals (Supplementary Data 8). No novel BMI loci were significantly associated with height (PRegression<0.002 (0.05/24 SNPs)). However, there are additional variants that may be associated with height, but not previously reported in GWAS examining height, including two for WHRadjBMI near EYA4 and TRIB1, and two for WCadjBMI near KIF1B and HDLBP (PRegression<0.002).

Finally, as smoking has a negative (weight decreasing) effect on BMI, it is likely that smoking-associated genetic variants have an effect on BMI in current smokers. Therefore, we expected that smoking-associated SNPs exhibit some interaction with smoking on BMI. We looked up published smoking behaviour SNPs49,50, 10 variants in 6 loci, in our own results. Two variants reached nominal significance (PSNPint<0.05) for GxSMK interaction on BMI (Supplementary Table 14), but neither reached Bonferroni-corrected significance (P<0.005). Thus, no smoking-associated SNPs exhibited significant GxSMK interaction, and we did not see a strong enrichment for low interaction P values among previously identified smoking loci.

Validation of novel loci

We pursued validation of our novel and interaction SNPs in an independent study sample of up to 119,644 European adults from the UK Biobank study (Tables 1, 2, 3, 4, Supplementary Table 15, Supplementary Fig. 9). We found consistent directions of effects in smoking strata (for Approaches 2 and 3) and in SNPadjSMK results (Approach 1) for each locus examined (Supplementary Fig. 13). For BMI, three SNPs were not GWS (PSNPadjSMK, PSNPjoint, PSNPInt>5E−8) following meta-analysis with our GIANT results: rs12629427 near EPHA3 (Approach 1); rs1809420 within a known locus near ADAMTS7 (Approach 4), which remained significant for interaction, but not for SNPadjSMK; and rs336396 near INPP4B (Approach 3). For WCadjBMI, 3 SNPs were not GWS (PSNPadjSMK, PSNPjoint, PSNPInt>5E−8) following meta-analysis with our results: rs1545348 near RAI14 (Approach 1); rs4141488 near GRIN2A (Approach 3); and rs6076699 near PRNP (Approach 3). For WHRadjBMI, only 1 SNP from Approach 4 was not significant following meta-analysis with our results: rs12608504 near JUND remained GWS for SNPadjSMK, but was only nominally significant for interaction (PSNPint=0.013).

Challenges in accounting for environmental exposures in GWAS

A possible limitation of our study may be the definition and harmonization of smoking status. We chose to stratify on current smoking status without consideration of type of smoking (for example, cigarette, pipe) for two reasons. First, focusing on weight alone, former smokers tend to return to their expected weight quickly following smoking cessation7,13,52. Second, this definition allowed us to maximize sample size, as many participating studies only had current smoking status available. However, WC and WHR may not behave in the same manner as weight and BMI with former smokers retaining excess fat around their waist. Thus, results may differ with alternative harmonization of smoking exposure.

Another limitation may be potential bias in our effect estimates when adjusting for a correlated covariate (for example, collider bias)53. This phenomenon is of particular concern when the correlation between the outcome and the covariate is high and when significant genetic associations occur with both traits in opposite directions. Our analyses adjusted both WC and WHR for BMI. WHR has a correlation of 0.49 with BMI, while WC has a correlation of 0.85 (ref. 53). Using previously published results for BMI, WCadjBMI and WHRadjBMI, we find three novel loci for WCadjBMI (near DOCK3, ARFGEF2 and TMEM38B) and two for WHRadjBMI (near EHMT2 and HLA-C; Supplementary Data 8) with nominally significant associations with BMI and opposite directions of effect. At these loci, the genetic effect estimates should be interpreted with caution. Additionally, we adjusted for SMK in Approach 1 (SNPadjSMK). However, binary smoking status, as we used, has a low correlation with BMI, WC, and WHR, as estimated in the ARIC study’s European descent participants (−0.13, 0.08 and 0.12, respectively) and in the Framingham Heart Study (−0.05, 0.08 and 0.16). Additionally, there are no loci identified in Approach 1 (SNPadjSMK) that are associated with any smoking behaviour trait and that exhibit an opposite direction of effect from that identified in our adiposity traits (Supplementary Data 8). We therefore consider collider bias from SMK adjustment unlikely and postulate a true gain in power through SMK-adjustment at these loci.

To assess how much additional information is provided by accounting for SMK and GxSMK in GWAS for obesity traits, we compared genetic risk scores (GRSs) based on various subsets of lead SNP genotypes in various regression models (Methods section). While every GRS was associated with its obesity trait (PGRS<1.6 × 10−7, Supplementary Table 16), adding SMK and GxSMK terms to the regression model along with novel variants to the GRSs substantially increased variance explained. For example, variance explained increased by 38% for BMI (from 1.53 to 2.11%, PGRSDiff=4.3 × 10−5), by 27% for WCadjBMI (from 2.59 to 3.29%, PGRSDiff=3.9 × 10−6) and by 168% for WHRadjBMI (from 0.82 to 2.20%, PGRSDiff=3.2 × 10−11). Therefore, despite potential limitations, much is gained by accounting for environmental exposures in GWAS.
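The percentage increases quoted above are relative changes in variance explained, not differences in percentage points. A minimal arithmetic check, using only the before and after values given in the text (the helper name is hypothetical):

```python
def relative_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100 * (after - before) / before


# Variance explained (%) without and with SMK, GxSMK and novel variants.
variance_explained = {
    "BMI": (1.53, 2.11),
    "WCadjBMI": (2.59, 3.29),
    "WHRadjBMI": (0.82, 2.20),
}

increase = {
    trait: round(relative_increase(before, after))
    for trait, (before, after) in variance_explained.items()
}
print(increase)
```

For BMI, for instance, the gain of 0.58 percentage points on a base of 1.53% corresponds to the reported 38% relative increase.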

Discussion

To better understand the effects of smoking on genetic susceptibility to obesity, we conducted meta-analyses to uncover genetic variants that may be masked when the environmental influence of smoking is not considered, and to discover genetic loci that interact with smoking on adiposity-related traits. We identified 161 loci in total, including 23 novel loci (6 for BMI, 11 for WCadjBMI, and 6 for WHRadjBMI). While many of our newly identified loci support the hypothesis that smoking may influence weight fluctuations through appetite regulation, these novel loci also have highlighted new biological processes and pathways implicated in the pathogenesis of obesity.

Importantly, we identified nine loci with convincing evidence of GxSMK interaction on obesity-related traits. We were able to replicate the previous GxSMK interaction with BMI within the CHRNA5-CHRNA3-CHRNB4 gene cluster. One novel BMI-associated locus near INPP4B and two novel WCadjBMI-associated loci near GRIN2A and PRNP displayed significant GxSMK interaction. We were also able to identify significant GxSMK interaction for one known BMI-associated locus near ADAMTS7 and for five known WHRadjBMI-associated loci near LYPLAL1, RSPO3, MAP3K1, HOXC4-HOXC6 and JUND. The majority of these loci harbour strong candidate genes for adiposity with a possible role for the modulation of effects through tobacco use.

We identified 18 new loci in Approach 1 (PSNPadjSMK) by adjusting for current smoking status. Our analyses did not allow us to determine whether these discoveries are due to different subsets of subjects included in the analyses compared to previous studies1,2 or due only to adjusting for current smoking. Adjustment for current smoking in our analyses, however, did reveal novel associations. Specifically after accounting for smoking in our analyses, all novel BMI loci exhibit P-values that are at least one order of magnitude lower than in previous GIANT investigations, despite smaller samples in the current analysis2. While sample sizes for both WCadjBMI and WHRadjBMI are comparable with previous GIANT investigations, our P values for variants identified in Approach 1 are at least two orders of magnitude lower than previous findings. Thus, adjustment for smoking may have indeed revealed new loci. Further, loci identified in Approach 2, including nine novel loci, suggest that accounting for interaction improves our ability to detect these loci even in the presence of only modest evidence of GxSMK interaction.

There are several challenges in validating genetic associations that account for environmental exposure. In addition to exposure harmonization and potential bias due to adjustment for smoking exposure, differences in trait distribution, environmental exposure frequency, ancestry-specific LD patterns and allele frequency across studies may lead to difficulties in replication, especially for gene-by-environment studies54. Furthermore, the ‘winner’s curse’ (inflated discovery effect estimates) requires larger sample sizes for adequate power in replication studies55. Despite these challenges, we were able to detect consistent direction of effect in an independent sample for all novel loci. Some associations that did not remain GWS in the GIANT+UKBB meta-analysis narrowly missed the significance threshold, suggesting that a larger sample may be needed to confirm these results, and thus the associations near INPP4B, GRIN2A, RAI14, PRNP and JUND should be interpreted with caution.

While we found that effects were not significantly enriched in smokers for BMI, there is a greater proportion of variance in BMI explained by variants that are significant for Approach 1 (SNPadjSMK), which may be expected given that there are a greater number of variants with higher effect estimates in smokers. For WCadjBMI, there was no enrichment for stronger effects in one stratum compared to the other for our significant loci; however, there was a greater proportion of explained variance in WCadjBMI for loci identified in Approach 1 (SNPadjSMK) in nonsmokers. For WHRadjBMI, there were significantly more loci that exhibit greater effects in nonsmokers, and this pattern was mirrored in the variance explained analysis. The large difference between effects in smokers and nonsmokers likely explains the sub-GWS levels of our loci in previous GIANT investigations2. For example, the T allele of rs7697556, 81kb from the ADAMTS3 gene, was associated with increased WCadjBMI and exhibits a sixfold greater effect in nonsmokers compared to smokers, although the interaction effect was only nominal; in previous GWAS this variant was nearly GWS. These differences in effect estimates between smokers and nonsmokers may help explain inconsistent findings in previous analyses that show central adiposity increases with increased smoking, but is associated with decreased weight and BMI5,9,10.

Our results support previous findings that implicate genes involved in transcription and gene expression, appetite regulation, macronutrient metabolism, and glucose homeostasis. Several of our novel loci have candidate genes within 500 kb of our tag variants that are highly expressed and/or active in brain tissue (BBX, KIF1B, SOX11 and EPHA3) and, like other obesity-associated genes, may be involved in previously-identified pathways linked to neuronal regulation of appetite (KIF1B, GRIN2A and SLC23A2), adipo/angiogenesis (ANGPTL3 and TNF) and glucose, lipid and energy homeostasis (CD47, STK25, STK19, RAGE, AIF1, LYPLAL1, HDLBP, ANGPTL3, DOCK7, KIF1B, PREX1 and RPS12).

Many of our newly identified loci highlight novel biological functions and pathways where dysregulation may lead to increased susceptibility to obesity, including response to oxidative stress, addictive behaviour, and newly identified regulatory functions. There is a growing body of evidence that supports the notion that exposure to oxidative stress leads to increased adiposity, risk of obesity, and poor cardiometabolic outcomes27,56. Our results for BMI and WCadjBMI, specifically associations identified near CHRNA5-CHRNA3-CHRNB4, PRNP, SLC23A2, BACH1 and NMNAT1, highlight new biological pathways and processes for future examination and may lead to a greater understanding of how oxidative stress leads to changes in obesity phenotypes and downstream cardiometabolic risk.

By considering current smoking, we were able to identify 6 novel loci for BMI, 11 for WCadjBMI and 6 for WHRadjBMI, and highlight novel biological processes and regulatory functions for genes implicated in increased obesity risk. Eighteen of these remained significant in our validation with the UK Biobank sample. We confirmed most established loci in our analyses after adjustment for smoking status in smaller samples than were needed in previous discovery analyses. A typical approach in large-scale GWAS meta-analyses is not to adjust for covariates such as current smoking; our findings highlight the importance of accounting for environmental exposures in genetic analyses.

Methods

Study design overview

We applied four approaches to identify genetic loci that influence adiposity traits by accounting for current tobacco smoking status (Fig. 1). We defined smokers as those who responded that they were currently smoking; not current smokers were those that responded ‘no’ to currently smoking. We evaluated three traits: body mass index (BMI), waist circumference adjusted for BMI (WCadjBMI), and waist-to-hip ratio adjusted for BMI (WHRadjBMI). Our first two meta-analytical approaches were aimed at determining whether there are novel genetic variants that affect adiposity traits by adjusting for SMK (SNPadjSMK), or by jointly accounting for SMK and for interaction with SMK (SNPjoint); while Approaches 3 and 4 aimed to determine whether there are genetic variants that affect adiposity traits through interaction with SMK (SNPint and SNPscreen) (Fig. 1). Our primary meta-analyses focused on results from all ancestries, sexes combined. Secondary meta-analyses were performed using the European-descent populations only, as well as stratified by sex (men-only and women-only) in all ancestries and in European-descent study populations.

Cohort descriptions and sample sizes

The GIANT consortium was formed by an international group of researchers interested in understanding the genetic architecture of anthropometric traits (Supplemental Tables 1–4 for study sample sizes and descriptive statistics). In total, we included up to 79 studies comprising up to 241,258 individuals for BMI (51,080 smokers, 190,178 non-smokers), 208,176 for WCadjBMI (43,226 smokers, 164,950 non-smokers), and 189,180 for WHRadjBMI (40,543 smokers, 148,637 non-smokers) with HapMap II imputed genome-wide chip data (up to 2.8M SNPs in association analyses), and/or with genotyped MetaboChip data (195 K SNPs in association analyses). In instances where studies submitted both Metabochip and GWAS data, these were for non-overlapping individuals. Each study’s Institutional Review Board has approved this research and all study participants have provided written informed consent.

Phenotype descriptions

Our study highlights three traits of interest: BMI, WCadjBMI and WHRadjBMI. Height and weight, used to calculate BMI (kg m−2), were measured in all studies; waist and hip circumferences were measured in the vast majority. For each sex, traits were adjusted using linear regression for age and age² (as well as for BMI for WCadjBMI and WHRadjBMI), and (when appropriate) for study site and principal components to account for ancestry. Family studies used linear mixed effects models to account for familial relationships and also conducted analyses for men and women combined including sex in the model. Phenotype residuals were obtained from the adjustment models and were subsequently inverse normal transformed to facilitate comparability across studies and with previously published analyses. The trait transformation was conducted separately for smokers and nonsmokers for the SMK-stratified model and using all individuals for the SMK-adjusted model.
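The transformation of phenotype residuals described above can be sketched as follows. This is a minimal illustration, not the consortium's code: the function name is hypothetical, and the rank offset c=3/8 (the Blom variant) is an assumption, since the text does not state which variant of the inverse normal transformation was applied.

```python
from statistics import NormalDist

def inverse_normal_transform(residuals, c=3.0 / 8.0):
    """Rank-based inverse normal transform; the offset c is an assumed choice."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and residuals[order[j + 1]] == residuals[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ties share the average of their ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Map each rank to a quantile of the standard normal distribution.
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]
```

For the SMK-stratified model the function would be called once on the residuals of smokers and once on those of nonsmokers; for the SMK-adjusted model it would be called once on all individuals.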

Defining smokers

The participating studies have varying levels of information on smoking, some with a simple binary variable and others with repeated, precise data. Since the effects of smoking cessation on adiposity appear to be immediate7,8,52, a binary smoking trait (current smoker versus not current smoker) is used for the analyses as most studies can readily derive this variable. We did not use a variable of ‘ever smoker vs. never’ as it increases heterogeneity across studies, thus adding noise; also this definition would make harmonization across studies difficult.

Genotype identification and imputation

Studies with GWAS array data or Metabochip array data contributed to the results. Each study applied study-specific standard exclusions for sample call rate, gender checks, sample heterogeneity and ethnic group outliers (Supplementary Table 2). For each study (except those that employed directly typed MetaboChip genotypes), genome-wide chip data were imputed to the HapMap II reference data set.

Study level analyses

To obtain study-specific summary statistics used in subsequent meta-analyses, the following linear models (or linear mixed effects models for studies with families/related individuals) were run separately for men and women and separately for cases and controls for case-control studies using phenotype residuals from the models described above. Studies with family data also conducted analyses with these models for men and women combined after accounting for dependency among family members as a function of their kinship correlations. We assumed an additive genetic model. The analyses were run using various GWAS software packages (Supplementary Table 2).

Quality control of study-specific summary statistics

The aggregated summary statistics were quality-controlled according to a standardized protocol57. These included checks for issues with trait transformations, allele frequencies and strand. Low quality SNPs in each study were excluded for the following criteria: (i) SNPs with low minor allele count (MAC≤5, MAC=MAF × N) and monomorphic SNPs, (ii) genotyped SNPs with low SNP call-rate (<95%) or low Hardy-Weinberg equilibrium test P value (<10−6), (iii) imputed SNPs with low imputation quality (MACH-Rsq or OEVAR <0.3, or information score <0.4 for SNPTEST/IMPUTE/IMPUTE2, or <0.8 for PLINK). To test for issues with relatedness or overlapping samples and to correct for potential population stratification, the study-specific standard errors and association P values were genomic control (GC) corrected using lambda factors (Supplementary Fig. 1). GC correction for GWAS data used all SNPs, but GC correction for MetaboChip data was restricted to chip QT interval SNPs only as the chip was enriched for associations with obesity-related traits. Any study-level GWAS file with a lambda >1.5 was removed from further analyses. While we established this criterion, no study results were removed for this reason.
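The exclusion criteria and the GC correction can be illustrated with a short sketch. The record keys (`maf`, `n`, `imputed`, `imp_software`, `imp_quality`, `call_rate`, `hwe_p`) and function names are hypothetical, and applying the correction only when lambda exceeds 1 is an assumed convention that the text does not state.

```python
from statistics import NormalDist, median

def passes_qc(snp):
    """Apply the per-SNP exclusion criteria to one record (hypothetical keys)."""
    mac = snp["maf"] * snp["n"]  # MAC = MAF x N; monomorphic SNPs have MAC = 0
    if mac <= 5:
        return False
    if snp["imputed"]:
        # Minimum imputation quality by metric family, as listed in the text.
        thresholds = {"mach": 0.3, "impute": 0.4, "plink": 0.8}
        return snp["imp_quality"] >= thresholds[snp["imp_software"]]
    return snp["call_rate"] >= 0.95 and snp["hwe_p"] >= 1e-6

def genomic_control(betas, ses):
    """Return lambda and GC-corrected standard errors."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    expected_median = NormalDist().inv_cdf(0.75) ** 2  # median of chi-square, 1 df
    lam = median(chi2) / expected_median
    inflation = max(lam, 1.0)  # assumption: never deflate the standard errors
    return lam, [s * inflation ** 0.5 for s in ses]
```

A study-level file whose returned lambda exceeded 1.5 would then be flagged for removal under the criterion above.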

Meta-analyses

Meta-analyses used study-specific summary statistics for the phenotype associations for each of the above models. We used a fixed-effects inverse variance weighted method for the SNP main effect analyses. All meta-analyses were run in METAL58. As study results came in two separate batches (Stage 1 and Stage 2), meta-analyses from the two stages were further meta-analysed (Stage 1+Stage 2). A second GC correction was applied to all SNPs when combining Stage 1 and Stage 2 meta-analyses in the final meta-analysis. First, Hapmap-imputed GWAS data were meta-analysed together, as were Metabochip studies. This step was followed by a combined GWAS+Metabochip meta-analysis. For primary analyses, we conducted meta-analyses across ancestries and sexes. For secondary meta-analyses, we conducted meta-analyses in European-descent studies alone, and sex-specific meta-analyses. There were two reasons for conducting secondary meta-analyses. First, both WCadjBMI and WHRadjBMI have been shown to display sex-specific genetic effects2,59,60. Second, by including populations from multiple ancestries in our primary meta-analyses, we may be introducing heterogeneity due to differences in effect sizes, allele frequencies, and patterns of linkage disequilibrium across ancestries, potentially decreasing power to detect genetic effects. See Supplementary Fig. 1 for a summary of the primary meta-analysis study design. The obtained SMK-stratified summary statistics were later used to calculate summary SNPjoint and SNPint statistics using EasyStrata61. Briefly, this software implements a two-sample, large sample test of equal regression parameters between smokers and nonsmokers59 for SNPint and the two degree of freedom test of main and interaction effects for SNPjoint14.
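To make the three statistics in this section concrete, the sketch below shows a fixed-effects inverse variance weighted estimate, a difference test of stratum-specific effects (SNPint) and a two degree of freedom joint test (SNPjoint). It is an illustration rather than the METAL or EasyStrata implementation: function names are hypothetical, smokers and nonsmokers are assumed to be independent samples, and the joint statistic is assumed to be the sum of the two stratum-specific chi-square statistics.

```python
import math
from statistics import NormalDist

def ivw_meta(betas, ses):
    """Fixed-effects inverse variance weighted pooled effect and standard error."""
    weights = [1.0 / s ** 2 for s in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))

def snp_int_p(beta_smk, se_smk, beta_non, se_non):
    """Two-sided P value for equal effects in smokers and nonsmokers."""
    z = (beta_smk - beta_non) / math.sqrt(se_smk ** 2 + se_non ** 2)
    return 2.0 * NormalDist().cdf(-abs(z))

def snp_joint_p(beta_smk, se_smk, beta_non, se_non):
    """P value for the joint null of no main and no interaction effect (2 df)."""
    chi2 = (beta_smk / se_smk) ** 2 + (beta_non / se_non) ** 2
    return math.exp(-chi2 / 2.0)  # chi-square survival function with 2 df
```

In this formulation, the stratum-specific inputs to `snp_int_p` and `snp_joint_p` would themselves be outputs of `ivw_meta` run separately in smokers and nonsmokers.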

Lead SNP selection

Before selecting a lead SNP for each locus, SNPs with high heterogeneity I²≥0.75 or a minimum sample size below 50% of the maximum N for each stratum (for example, N> max(N women smokers)/2) were excluded. Lead SNPs that met significance criteria were selected based on distance (±500 kb), and we defined the SNP with the lowest P value as the top SNP for a locus. SNPs that reached genome-wide significance (GWS), but had no other SNPs within 500 kb with a P<1E-5 (lonely SNPs), were excluded from the SNP selection process. Two variants were excluded from Approach 2 based on this criterion, rs2149656 for WCadjBMI and rs2362267 for WHRadjBMI.
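The distance-based selection and the exclusion of lonely SNPs can be sketched as a greedy procedure. The record keys (`id`, `chr`, `pos`, `p`) and the function name are hypothetical, and the greedy ordering (take the lowest P value, then drop everything within ±500 kb) is an assumed reading of the description above.

```python
def select_lead_snps(snps, window=500_000, gws=5e-8, support_p=1e-5):
    """Return IDs of lead SNPs, one per locus, skipping 'lonely' SNPs."""
    def near(a, b):
        return a["chr"] == b["chr"] and abs(a["pos"] - b["pos"]) <= window

    leads = []
    remaining = sorted(snps, key=lambda s: s["p"])
    while remaining and remaining[0]["p"] < gws:
        top = remaining[0]
        # A lonely SNP has no other SNP within the window with P < support_p.
        supported = any(
            s is not top and near(s, top) and s["p"] < support_p for s in snps
        )
        if supported:
            leads.append(top["id"])
        # Remove the whole locus (including the top SNP) before the next pass.
        remaining = [s for s in remaining if not near(s, top)]
    return leads
```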

Approaches

Figure 1 outlines the four approaches that we used to identify novel SNPs. The left side of Fig. 1 focuses on the first hypothesis that examines the effect of SNPs on adiposity traits. Approach 1 considered a linear regression model that includes the SNP and SMK, thus adjusting for SMK (SNPadjSMK). Summary SNPadjSMK results were obtained from the SMK-adjusted meta-analysis. Approach 2 used summary SMK-stratified meta-analysis results14 to consider the joint hypothesis that a genetic variant has main and/or interaction effects on outcomes as a 2 degree of freedom test (SNPjoint). For this approach, the null hypothesis was that there is no main and no interaction effect on the outcome. Thus, rejection of this hypothesis could be due to either a main effect or an interaction effect or to both.

The right side of Fig. 1 focuses on our second hypothesis, testing for interaction of a variant with SMK on adiposity traits as outcomes. Approach 3 used the SMK-stratified results to directly contrast the regression coefficients for a test of interaction (SNPint)59. Approach 4 used a screening strategy to evaluate interaction, whereby the SMK-adjusted main effect results (Approach 1) were screened for variants significant at the P<5 × 10−8 level. These variants were then carried forward for a test of interaction, comparing the SMK-stratified specific regression coefficients in the second step (SNPscreen).

In Approaches 1–3, variants significant at P<5 × 10−8 were considered GWS. In Approach 4 (SNPscreen), variants for which the P value of the test of interaction is less than 0.05 divided by the number of variants carried forward were considered significant for interaction. We performed analytical power computations to demonstrate the usefulness and characteristics of the two interaction approaches (Approaches 3 and 4).
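The two-step logic of Approach 4 can be written out in a few lines. The record keys (`id`, `p_adj` for the SMK-adjusted main effect P value, `p_int` for the interaction P value) and the function name are hypothetical; the thresholds are those stated above.

```python
def snp_screen(results, gws=5e-8, alpha=0.05):
    """Screen on the SMK-adjusted main effect, then test interaction."""
    # Step 1: carry forward variants that are GWS for the adjusted main effect.
    carried = [r for r in results if r["p_adj"] < gws]
    if not carried:
        return []
    # Step 2: Bonferroni threshold based on the number of variants carried forward.
    threshold = alpha / len(carried)
    return [r["id"] for r in carried if r["p_int"] < threshold]
```

Because the correction depends only on the number of variants that pass the first step, a variant with a strong main effect needs much weaker interaction evidence here than under the genome-wide threshold used in Approach 3.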

Locuszoom plots

Regional association plots were generated for novel loci using the program Locuszoom (http://locuszoom.sph.umich.edu/). For each plot, LD was calculated using a multiethnic sample of the 1000 Genomes Phase I reference panels62, including EUR, AFR, EAS and AMR. Previous SNP-trait associations highlighted within the plots include traits of interest (for example, cardiometabolic, addiction, behaviour and anthropometrics) found in the NHGRI-EBI GWAS Catalog and supplemented with recent GWAS studies from the literature1,2,51,60.

Conditional analyses

To determine if multiple association signals were present within a single locus, we used GCTA15 to perform approximate joint conditional analyses on the SNPadjSMK and SMK-stratified data. The following criteria were used to select candidate loci for conditional analyses: nearby SNP (±500 kb) with an R²>0.4 and an association P<1E−5 for any of our primary analyses. GCTA uses associations from our meta-analyses and LD estimates from reference data sets containing individual-level genotypic data to perform the conditional analyses. To calculate the LD structure, we used two U.S. cohorts, the Atherosclerosis Risk in Communities (ARIC) study consisting of 9,713 individuals of European descent and 580 individuals of African American descent, and the Framingham Heart Study (FramHS) consisting of 8,481 individuals of European ancestry, both studies imputed to HapMap r22. However, because our primary analyses were conducted in multiple ancestries, each study supplemented the genetic data using HapMap reference populations so that the final reference panel was composed of about 1–3% Asians (CHB+JPT) and 4–6% Africans (YRI for the FramHS) for the entire reference sample. We extracted each 1 Mb region surrounding our candidate SNPs, performed joint approximate conditional analyses, and then repeated the steps for the appropriate Approach to identify additional association signals.

Many of the SNPs identified in the current analyses were nearby SNPs previously associated with related anthropometric and obesity traits (for example, height, visceral adipose tissue). For all lead SNPs near a SNP previously associated with these traits, GCTA was also used to perform approximate conditional analyses on the SNPadjSMK and SMK-stratified data in order to determine if the loci identified here are independent of the previously identified SNP-trait associations.

Power and type I error

In order to illustrate the validity of the approaches with regard to type 1 error, we conducted simulations. For two MAF, we assumed standardized stratum-specific outcomes for 50,000 smokers and 180,000 nonsmokers and generated 10,000 simulated stratum-specific effect sizes under the stratum-specific null hypotheses of ‘no stratum-specific effects’. We applied the four approaches to the simulated stratum-specific association results and inferred type 1 error of each approach by visually examining QQ plots and by calculating type 1 error rates. The type 1 error rates shown reflect the proportion of nominally significant simulation results for the respective approach. Analytical power calculations to identify effects for various combinations of SMK- and NonSMK-specific effects by the Approaches 1–4 again assumed 50,000 smokers and 180,000 nonsmokers. We first assumed three different fixed effect estimates in smokers that were small (R²=0.01%, similar to the realistic NUDT3 effect on BMI), medium (R²=0.07%, similar to the realistic BDNF effect on BMI) or large (R²=0.34%, similar to the realistic FTO effect on BMI) genetic effects, and varied the effect in nonsmokers. Second, we assumed fixed (small, medium and large) effects in nonsmokers and varied the effect in smokers.
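A simplified version of the type 1 error simulation, restricted to the interaction and joint tests, might look as follows. Several elements are assumptions made for illustration: the function name, the MAF value passed in, the random seed, the approximation SE ≈ 1/sqrt(2·MAF·(1−MAF)·N) for a standardized outcome, and the form of the two test statistics (a difference test and a sum of stratum-specific chi-squares).

```python
import math
import random
from statistics import NormalDist

def simulate_type1(maf, n_smk=50_000, n_non=180_000, n_sim=10_000,
                   alpha=0.05, seed=1):
    """Return nominal type 1 error rates for the (SNPint, SNPjoint) tests."""
    rng = random.Random(seed)
    norm = NormalDist()
    var_g = 2.0 * maf * (1.0 - maf)          # genotype variance under HWE
    se_smk = 1.0 / math.sqrt(var_g * n_smk)  # assumed SE, standardized outcome
    se_non = 1.0 / math.sqrt(var_g * n_non)
    hits_int = hits_joint = 0
    for _ in range(n_sim):
        # Null hypothesis: no effect in either stratum.
        b_smk = rng.gauss(0.0, se_smk)
        b_non = rng.gauss(0.0, se_non)
        z_int = (b_smk - b_non) / math.sqrt(se_smk ** 2 + se_non ** 2)
        p_int = 2.0 * norm.cdf(-abs(z_int))
        chi2_joint = (b_smk / se_smk) ** 2 + (b_non / se_non) ** 2
        p_joint = math.exp(-chi2_joint / 2.0)  # chi-square survival, 2 df
        hits_int += p_int < alpha
        hits_joint += p_joint < alpha
    return hits_int / n_sim, hits_joint / n_sim
```

If the tests are well calibrated, both returned proportions should be close to the nominal alpha.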

Biological summaries

To identify genes that may be implicated in the association between our lead SNPs (Tables 1, 2, 3) and BMI, WHRadjBMI and WCadjBMI, and to shed light on the complex relationship between genetic variants, SMK and adiposity, we performed in-depth literature searches on nearby candidate genes. Snipper v1.2 (http://csg.sph.umich.edu/boehnke/snipper/) was used to identify any genes and cis- or trans-eQTLs within 500 kb of our lead SNPs. All genes identified by Snipper were manually curated and examined for evidence of relationship with smoking and/or adiposity. To explore any potential regulatory or functional role of the association regions, loci were also examined using several online bioinformatic tools/databases, including HaploReg v4.1 (ref. 63), UCSC Genome Browser (http://genome.ucsc.edu/), GTEx Portal (http://www.gtexportal.org), and RegulomeDB64.

eQTL analyses

We used two approaches to systematically explore the role of novel loci in regulating gene expression. First, to gain a general overview of the regulatory role of newly identified GWAS regions, we conducted an eQTL lookup using >50 eQTL studies65, with specific citations for >100 data sets included in the current query for blood cell related eQTL studies and relevant non-blood cell tissue eQTLs (for example, adipose and brain tissues). Additional eQTL data were integrated from online sources including ScanDB, the Broad Institute GTEx Portal, and the Pritchard Lab (eqtl.uchicago.edu). Additional details on the methods, including study references, can be found in Supplementary Note 3. Only significant cis-eQTLs in high LD with our novel lead SNPs (r²>0.9, calculated in the CEU+YRI+CHB+JPT 1000 Genomes reference panel), or proxy SNPs, were retained for consideration.

Second, since public databases with eQTL data do not have information available on current smoking status, we also conducted a cis-eQTL association analysis using expression results derived from fasting peripheral whole blood using the Human Exon 1.0 ST Array (Affymetrix, Inc., Santa Clara, CA). The raw expression data were quantile-normalized, log2 transformed, followed by summarization using Robust Multi-array Average66 and further adjusted for technical covariates, including the first principal component of the expression data, batch effect, the all-probeset-mean residual, blood cell counts, and cohort membership. We evaluated all transcripts ±1 Mb around each novel variant in the Framingham Heart Study while accounting for current smoking status, using the following four approaches similar to those used in our primary analyses of our traits: (1) eQTL adjusted for SMK, (2) eQTL stratified by SMK, (3) eQTL × SMK interaction and (4) joint main+eQTLxSMK interaction. Significance level was evaluated by FDR<5% per eQTL analysis and across all loci identified for that model in the primary meta-analysis. Additional details can be found in Supplementary Note 3.

Variance-explained estimates

We estimated the phenotypic variance in smokers and nonsmokers explained by the association signals. For each associated region, we selected subsets of SNPs within 500 kb of our lead SNPs and based on varying P value thresholds (ranging from 1 × 10−8 to 0.1) from Approach 1 (SNPadjSMK model). First, each subset of SNPs was clumped into independent regions to identify the lead SNP for each region. The variance explained by each subset of SNPs in the SMK and nonSMK strata was estimated by summing the variance explained by the individual lead SNPs. Then, we tested for the significance of the differences across the two strata assuming that the weighted sum of chi-squared distributed variables tends to a Gaussian distribution, as ensured by Lyapunov’s central limit theorem67,68.
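The summation step can be illustrated with a short sketch. The per-SNP formula 2·MAF·(1−MAF)·β² is a common approximation for a standardized trait under an additive model and is an assumption here, since the text does not give the formula used; the function name and the input layout are likewise hypothetical.

```python
def explained_variance(lead_snps):
    """Sum the variance explained by independent lead SNPs in one stratum.

    lead_snps: iterable of (maf, beta) pairs, where beta is the
    stratum-specific effect on a trait with unit variance (assumed).
    """
    return sum(2.0 * maf * (1.0 - maf) * beta ** 2 for maf, beta in lead_snps)
```

The function would be evaluated once with the smoker-specific effect estimates and once with the nonsmoker-specific estimates for each P value threshold, and the two sums compared.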

Smoking behaviour lookups

In order to determine if any of the loci identified in the current study are associated with smoking behaviour, we conducted a look-up of all lead SNPs from novel loci and Approach 3 in existing GWAS of smoking behaviour3. The analysis consists of phasing study-specific GWAS samples contributing to the smoking behaviour meta-analysis, imputation, association testing and meta-analysis. To ensure that all SNPs of interest were available in the smoking GWAS, the program SHAPEIT2 (ref. 69) was used to phase a region 500 kb on either side of each lead SNP, and imputation was carried out using IMPUTE2 (ref. 70) with the 1000 Genomes Phase 3 data set as a reference panel.

Each region was analysed for three smoking related phenotypes: (i) Ever vs Never smokers, (ii) Current vs Non-current smokers and (iii) a categorical measure of smoking quantity48. The smoking quantity levels were 0 (defined as 1-10 cigarettes per day [CPD]), 1 (11-20 CPD), 2 (21-30 CPD) and 3 (31 or more CPD). Each increment represents an increase in smoking quantity of 10 cigarettes per day. There were 10,058 Never smokers, 13,418 Ever smokers, 11,796 Non-current smokers, 6,966 Current smokers and 11,436 samples with the SQ phenotypes. SNPMETA48 was used to perform an inverse-variance weighted fixed effects meta-analysis across cohorts at all SNPs in each region, and included a single GC correction. At each SNP, only those cohorts that had an imputation info score >0.5 were included in the meta-analysis.
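For clarity, the smoking quantity coding described above corresponds to the following mapping from cigarettes per day to level. The function name is hypothetical, and treating values below 1 CPD as invalid input is an assumption.

```python
def smoking_quantity_level(cpd):
    """Map cigarettes per day (CPD) to the categorical smoking quantity level."""
    if cpd < 1:
        raise ValueError("smoking quantity is defined for 1 or more CPD")
    if cpd <= 10:
        return 0  # 1-10 CPD
    if cpd <= 20:
        return 1  # 11-20 CPD
    if cpd <= 30:
        return 2  # 21-30 CPD
    return 3      # 31 or more CPD
```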

Main effects lookup in previous GIANT investigations

To better understand why our novel variants remained undiscovered in previous investigations that did not take SMK into account, we also conducted a lookup of our novel variants in published GWAS results examining genetic main effects on BMI, WC, WCadjBMI, WHR, WHRadjBMI, and height1,2,51.

GWAS catalog lookups

To further investigate the identified genetic variants in this study and to gain additional insight into their functionality and possible effects on related cardiometabolic traits, we searched for previous SNP-trait associations near our lead SNPs. PLINK was used to find all SNPs within 500 kb of any of our lead SNPs and calculate r² values using a combined ancestry (AMR, AFR, EUR, ASN) 1000 Genomes Phase 1 reference panel62 to allow for LD calculation for SNPs on the Illumina Metabochip and to best estimate LD in our multiethnic GWAS. All SNPs within the specified regions were compared with the NHGRI-EBI (National Human Genome Research Institute, European Bioinformatics Institute) GWAS Catalog, version 1.0 (www.ebi.ac.uk/gwas)49,50 for overlap, and distances between the two SNPs were calculated using STATA v14, for the chromosome and base pair positions based on human genome reference build 19. All previous associations within 500 kb and with an r²>0.5 with our lead SNP were retained for further interrogation.

Genetic risk score calculation

We calculated several unweighted genetic risk scores (GRSs) for each individual in the population-based KORA-S3 and KORA-S4 studies (total N=3,457). We compared GRSs limited to previously known lead SNPs (see Supplementary Data 7 for lists of previously known lead SNPs) with GRSs based on previously known and novel lead SNPs from the current study (see Supplementary Tables 1–4 for lists of novel lead SNPs). Risk scores were tested for association with the obesity trait using the following linear regression models: the unadjusted GRS model (TRAIT=β0+β1GRS), the adjusted GRS model (TRAIT=β0+β1GRS+β2SMK) and the GRSxSMK interaction model (TRAIT=β0+β1GRS+β2SMK+β3GRSxSMK). Additionally, we used an F statistic to test whether the residual sum of squares (RSS) for the full model including GRSxSMK interaction was significantly different from the reduced model.
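The comparison of the full and reduced models can be sketched with ordinary least squares in plain Python. This is an illustration only: the function names are hypothetical, the reduced model is assumed to be the adjusted GRS model (GRS and SMK without the interaction term), and the sketch returns the F statistic without converting it to a P value.

```python
def unweighted_grs(risk_allele_counts):
    """Unweighted GRS: the sum of risk allele counts (0, 1 or 2) across SNPs."""
    return sum(risk_allele_counts)

def ols_rss(rows, y):
    """Residual sum of squares of a least squares fit; rows include the intercept."""
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [x - f * z for x, z in zip(a[k], a[col])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def interaction_f_stat(grs, smk, trait):
    """F statistic (1 numerator df) for adding GRSxSMK to the adjusted model."""
    reduced = [[1.0, g, s] for g, s in zip(grs, smk)]
    full = [[1.0, g, s, g * s] for g, s in zip(grs, smk)]
    rss_reduced = ols_rss(reduced, trait)
    rss_full = ols_rss(full, trait)
    df_full = len(trait) - 4  # residual degrees of freedom of the full model
    return (rss_reduced - rss_full) / (rss_full / df_full)
```

The statistic would be compared against an F distribution with 1 and N−4 degrees of freedom.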

Data availability

Summary statistics of all analyses are available at https://www.broadinstitute.org/collaboration/giant/.

Additional information

How to cite this article: Justice, A. E. et al. Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits. Nat. Commun. 8, 14977 doi: 10.1038/ncomms14977 (2017).
