Introduction

Smoking is influenced by genetic and environmental factors.1, 2 Genome-wide association studies (GWAS) in populations of European ancestry have identified genetic variation associated with smoking behaviors, including smoking initiation (SI), smoking quantity and smoking cessation (SC). An initial, large (n=10 995) GWAS of smoking quantity identified associations with genetic variants in the nicotinic acetylcholine receptor α5, α3 and β4 subunit cluster on chromosome 15q25.1.3 Genome-wide meta-analyses in three large consortia (n=74 053, 31 226 and 41 150) of smoking behaviors confirmed the finding at 15q25.1 and refined the association signal within the locus.4, 5, 6 Additional studies in diverse populations have also revealed independent signals in this region, suggesting multiple biologically functional variants.7, 8 This locus has also been reported as a susceptibility locus for lung cancer; however, whether this effect is independent of smoking behavior is unclear.9, 10 Additional regions have been identified for smoking quantity (CHRNB3/CHRNA6 on 8p11,4 CYP2A6 on 19q134, 6 and LOC100188947 on 10q256), SI (BDNF on 11p13)6 and SC (DBH on 9q34).6

To date, all published GWAS for smoking behaviors have been conducted in populations of European descent.11 Conducting GWAS in non-European populations, such as African ancestry populations, is important because of their greater genetic diversity and population differences in disease allele frequency, linkage disequilibrium patterns and phenotype prevalence.12 For smoking behaviors, the need for GWAS in African American populations is particularly clear; African Americans, on average, initiate smoking later and smoke fewer cigarettes per day, yet are less likely to successfully quit smoking. Further, they have a higher risk of smoking-related lung cancer than many other populations.13 Ethnic differences in the clearance of nicotine, cotinine and other metabolites have been shown to contribute to the observed differences in cigarette consumption across populations, mediated in part by genetic variants in the cytochrome p450 2A6 gene.14, 15, 16

The genetic architecture of smoking-related traits is not well described in non-European ancestral groups, but there is evidence that genetic determinants have important implications for multiple addictive behaviors in populations globally.17 We established the Study of Tobacco in Minority Populations (STOMP) Genetics Consortium, which represents 13 GWAS studies of men and women of African ancestry, to search for risk loci for smoking behaviors in this population.

Materials and methods

Study description

The STOMP Genetics Consortium is comprised of the following studies: the Women's Health Initiative SNP Health Association Resource (n=8208), the African American GWAS consortia of Breast Cancer (n=5061) and Prostate Cancer (n=5556), the Candidate Gene Association Resource Consortium (including the Atherosclerosis Risk in Communities (n=2916) study, the Cleveland Family Study (n=632), the Coronary Artery Risk Development in Young Adults (n=953) study, the Jackson Heart Study (n=2145) and the Multi-Ethnic Study of Atherosclerosis (n=1646)), the Cardiovascular Health Study (n=801), the Healthy Aging in Neighborhoods across the Life Span Study (n=918), the Health ABC Study (n=1137), the Genetic Study of Atherosclerosis Risk (n=1175) and the Hypertension Genetic Epidemiology Network (n=1241). A description of each participating study as well as details regarding the measurement and collection of smoking data for each study are provided in Supplementary Materials. All studies had local Institutional Review Board approval for the present study and all participants provided written informed consent.

Smoking phenotypes

We examined four smoking phenotypes previously shown to be heritable in the African and European ancestry samples18, 19, 20, 21 and used in prior GWAS of smoking behavior.4, 5, 6 SI contrasted individuals who reported having smoked at least 100 cigarettes during their lifetime (ever smokers) with those who reported having smoked between 0 and 99 cigarettes during their lifetime (never smokers), consistent with the Centers for Disease Control classification.22 Among smokers, the age of SI (AOI) represented the age at which individuals began smoking. Some studies captured the age at which participants first tried smoking, whereas others collected the age at which they began smoking regularly. As prior research suggests similar heritabilities and a high genetic correlation between these phenotypes, we considered either value acceptable for a general assessment of AOI. Similarly, for cigarettes smoked per day (CPD), some studies collected maximum CPD, whereas others collected average CPD. Longitudinal twin data suggest a high correlation between these variables over time, which supported using either value in our analyses. For studies that collected CPD as ranges, the mid-point of the interval was used as the data point; for example, individuals who reported the CPD category 0–4 were assigned a CPD value of 2. SC contrasted individuals who had quit smoking at interview (former smokers) with those who were current smokers. As relapse to smoking is highest within the first year after quitting,23 we tried to reduce misclassification by excluding smokers who had quit within 1 year of interview in studies with available data. Table 1 presents distributions of smoking phenotypes across participating studies.
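The phenotype coding rules described above can be illustrated with a minimal Python sketch. The function names and the range labels other than 0–4 are hypothetical and are not taken from any participating study; the sketch only assumes that CPD categories are written as "low-high" strings.

```python
def smoking_initiation(lifetime_cigarettes: int) -> str:
    """Classify ever vs never smokers using the 100-cigarette lifetime cut-off."""
    return "ever" if lifetime_cigarettes >= 100 else "never"


def cpd_midpoint(category: str) -> float:
    """Return the midpoint of a CPD range written as 'low-high' (e.g. '0-4')."""
    # Accept either a hyphen or an en dash between the two bounds.
    low, high = category.replace("–", "-").split("-")
    return (float(low) + float(high)) / 2.0


# Only the 0-4 category appears in the text; the others are hypothetical.
for label in ["0-4", "5-14", "15-24"]:
    print(label, cpd_midpoint(label))
```

Under this coding, the 0–4 category maps to a CPD value of 2, matching the example given in the text.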

Table 1 Descriptive characteristics of the 13 studies participating in the STOMP Consortium

Genotyping and quality control

Each study performed its own genotyping using Illumina (San Diego, CA, USA) or Affymetrix GWAS arrays (Santa Clara, CA, USA). Supplementary Tables 1 and 2 present the details of the arrays, genotyping quality control procedures and sample exclusions (i.e., sex mismatch, call rate failure, relatedness, missing smoking and ancestry outliers) for each study. The quality control filters applied by each study were comparable; single-nucleotide polymorphisms (SNPs) with call rates <95% (except the Genetic Study of Atherosclerosis Risk, <90%), <1% minor allele frequency or significant (P<10−6) departure from Hardy–Weinberg equilibrium were excluded, as were individuals with excess autosomal heterozygosity, mismatch between reported and genetically determined sex, or first- or second-degree relatedness. Genome-wide imputation24 was carried out in each study using the software MACH, IMPUTE, BEAGLE or BIMBAM v0.99,25, 26, 27, 28, 29, 30, 31, 32 to infer genotypes for SNPs that were not genotyped directly on the platforms, but were genotyped on the HapMap phase 2 CEU and YRI samples.33 SNPs with imputation quality scores <0.5 were excluded.
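As an illustration only, the SNP-level filters listed above could be expressed as follows. This is a sketch under stated assumptions, not the pipeline used by any study: the `SnpQc` record and `passes_qc` function are hypothetical names, the summary values (call rate, minor allele frequency, Hardy–Weinberg P-value, imputation quality) are assumed to have been computed upstream, and applying the genotype filters to genotyped SNPs and the quality filter to imputed SNPs is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality-control summary (hypothetical record layout)."""
    call_rate: float = 1.0            # fraction of samples with a genotype call
    maf: float = 0.5                  # minor allele frequency
    hwe_p: float = 1.0                # P-value of the Hardy-Weinberg test
    imputed: bool = False             # True if the SNP was imputed
    imputation_quality: float = 1.0   # software-specific quality score


def passes_qc(snp: SnpQc,
              min_call_rate: float = 0.95,
              min_maf: float = 0.01,
              hwe_p_threshold: float = 1e-6,
              min_imputation_quality: float = 0.5) -> bool:
    """Return True if the SNP survives the thresholds stated in the text."""
    if snp.imputed:
        # Assumption: imputed SNPs are judged on imputation quality alone.
        return snp.imputation_quality >= min_imputation_quality
    return (snp.call_rate >= min_call_rate
            and snp.maf >= min_maf
            and snp.hwe_p >= hwe_p_threshold)


# A study with a 90% call-rate threshold can relax the default.
print(passes_qc(SnpQc(call_rate=0.92, maf=0.10, hwe_p=0.3), min_call_rate=0.90))
```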

Data analyses

Study-specific GWAS analysis. Each study conducted uniform cross-sectional analyses for each smoking phenotype using an additive genetic model. Logistic regression was used for discrete traits (SI and SC) and linear regression was used for quantitative traits (CPD and AOI). Owing to heavy tails and non-normality, quantitative traits were normalized by transformation to Z scores. Outliers, defined as |Z|>2, were removed within each study, and the Z scores were then fit using ordinary least squares regression. To investigate potential sources of heterogeneity across studies, we examined the distribution of African ancestry in each cohort (Supplementary Figure 1). To account for population stratification and admixture, all studies adjusted for an appropriate number of eigenvectors3, 4, 5, 6, 7, 8, 9, 10 from a study-specific principal components analysis.34 In addition, study-specific analyses included adjustment for age and case status or study site, when appropriate. Genomic control inflation factors were computed using standard methods.35, 36
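Two of the steps above, the Z-score transformation with outlier removal and the genomic control inflation factor, can be sketched in Python. This is a hedged illustration rather than the studies' code: the function names are hypothetical, the sample standard deviation is an assumed choice, and λ is computed with the common median-based estimator (median of the observed 1-df chi-square statistics divided by the expected median, about 0.455), which is assumed to correspond to the "standard methods" cited.

```python
import statistics

# Expected median of a 1-df chi-square variable: (75th normal percentile)^2.
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2  # about 0.455


def standardize_and_trim(values, max_abs_z=2.0):
    """Convert a quantitative trait to Z scores and drop |Z| > max_abs_z."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    z_scores = [(v - mean) / sd for v in values]
    return [z for z in z_scores if abs(z) <= max_abs_z]


def genomic_control_lambda(test_z_scores):
    """Median-based inflation factor from per-SNP association Z statistics."""
    chi2 = [z * z for z in test_z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def gc_corrected_se(se, lam):
    """Inflate a standard error by sqrt(lambda); no change if lambda <= 1."""
    return se * max(lam, 1.0) ** 0.5
```

A λ close to 1 indicates little inflation of the test statistics; multiplying each standard error by the square root of λ is equivalent to dividing each chi-square statistic by λ.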

Meta-analyses of GWAS results. We performed fixed-effect meta-analysis for each smoking phenotype by computing pooled inverse-variance-weighted β-coefficients, s.e. and Z scores for each SNP.37 All GWAS results were corrected via genomic control before the meta-analysis. The study-specific lambda values utilized in this step ranged from 1.01 to 1.08 for SI (Supplementary Table 1). Heterogeneity across studies was investigated using the I2 statistic.38 The results presented herein are corrected by a second GC correction based on λ of the meta-analyses (λ<1.02). A significance threshold of P<5 × 10−8 was considered to indicate genome-wide significance. Linkage disequilibrium statistics for the largest of the STOMP cohorts (Women's Health Initiative, n=8208) were calculated using DPRIME (http://www.phs.wfubmc.edu/public/bios/gene/downloads.cfm). Linkage disequilibrium statistics for CEU and YRI were obtained from HapMap phase 2.33 Statistical power analysis was performed using QUANTO.39
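For readers unfamiliar with the method, a minimal sketch of a fixed-effect inverse-variance-weighted meta-analysis for one SNP, together with Cochran's Q and the I2 statistic, is given below. The function name and the per-study estimates in the usage example are hypothetical; the sketch assumes that the study-level β and s.e. values have already been genomic-control corrected and uses a two-sided normal P-value.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-study effect estimates for one SNP by inverse-variance weighting."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q and I2 (percentage of variation due to heterogeneity).
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 and df > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2}


# Hypothetical per-study estimates, for illustration only.
result = fixed_effect_meta(betas=[0.05, 0.03, 0.06], ses=[0.02, 0.015, 0.03])
print(result)
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is dominated by the largest cohorts.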

Results

The meta-analysis included 32 389 genotyped men and women of African ancestry from 13 studies with sample sizes ranging from n=632 to n=8208 (Table 1). Our meta-analysis sample was 66.1% female, the mean age when smoking information was collected ranged from 35.5 to 73.4 years, and 52.7% were ever smokers. Among smokers, mean CPD ranged from 11.5 to 15.7, the mean AOI ranged from 17.3 to 23.3 years, and 44.8% were former smokers.

Sample sizes for the four smoking phenotype analyses (i.e., with complete genotype and phenotype data) were n=32 389 for SI, n=16 877 for AOI, n=15 547 for CPD and n=16 215 for SC. Manhattan plots for the four smoking phenotypes after double-GC scaling are shown in Figure 1. In the entire analysis, only one SNP, rs2036527, achieved genome-wide significance for one trait, CPD (β=0.04, s.e.=0.007, P=1.84 × 10−8, I2=41.6%, Table 2; study-specific results are shown in Supplementary Table 3). This variant is located 6246 bp 5′ of the CHRNA5 gene on chromosome 15q25.1. We observed multiple SNPs with P-values on the order of 10−7 associated with CPD: rs3101457, located in intron 2 (IVS2) of C1orf100 on 1q44, and rs547843, located 63 kb 5′ of a non-coding RNA sequence (LOC503519) on 15q12. Three highly correlated SNPs (r2>0.95, YRI) in the SPOCK2 gene on 10q22.1 exhibited P-values on the order of 10−7 with AOI (Table 2). The most significant associations for SI and SC were observed at rs566973 (20 kb 3′ of CRCT1 on 1q21.3) and rs3813637 (in the 3′-untranslated region of C1orf49 on 1q25.2), respectively (data not shown).

Figure 1

Double genomic control (GC)-corrected Manhattan plots showing significance of association of all single-nucleotide polymorphisms (SNPs) for four smoking phenotypes. (a–d) SNPs are plotted on the x axis according to their position on each chromosome; the y axis shows the association (as −log10 P-value) with (a) smoking initiation (SI, ever vs never smokers), (b) age of SI, (c) cigarettes smoked per day and (d) smoking cessation (former vs current smokers). The dotted red line indicates the genome-wide significance threshold of P<5 × 10−8.

Table 2 SNPs with meta-analytic P-values of <1 × 10−6 for CPD and AOI

Four top SNPs associated with CPD span approximately 100 kb (76.6–76.7 Mb) at 15q25.1, from rs3813570, located in the 5′-untranslated region (c.-72T>C) of PSMA4, to rs938682, located in IVS4 (c.378-1941C>T) of CHRNA3 (Table 2 and Figure 2). The most significant SNP, rs2036527, is located between PSMA4 and CHRNA5, and is correlated with the index signals (rs1051730, rs16969968) for CPD reported in previous European ancestry studies. In CEU, the r2 is 0.84 between rs2036527 and rs1051730, and 0.93 between rs2036527 and rs16969968. The r2 between rs2036527 and rs1051730 is 0.44 in YRI and 0.502 in STOMP, whereas rs16969968 is non-polymorphic. In European Americans, rs2036527 is also correlated with SNPs that tag a haplotype associated with increased expression of CHRNA5 in prefrontal cortex brain samples from European Americans and African Americans,40 but it is not correlated with this haplotype in African ancestry samples (r2 between rs2036527 and rs1979905=0.443 in CEU, 0.045 in YRI and 0.064 in STOMP). The additional signals at 15q25.1 with near genome-wide significance in our study are represented by rs667282, rs938682 and rs3813570, which are weakly correlated with rs2036527 (r2≤0.2 in CEU, 0.12 in YRI and 0.084 in STOMP). These three SNPs are correlated with each other (r2≥0.60 in CEU and 0.32 in YRI) as well as with rs578776 and other SNPs at 15q25.1 that define a signal for smoking intensity in the European ancestry populations that is independent of rs2036527.8 However, when conditioning on rs2036527 in the four largest study populations in our sample (the African American GWAS consortia of Prostate Cancer, African American GWAS consortia of Breast Cancer, Candidate Gene Association Resource and Women's Health Initiative; n=13 113), the association between these three SNPs and CPD diminished (P-values on the order of 10−3 after conditioning on rs2036527; Supplementary Figure 2).
Assuming the GWAS arrays utilized in this study provide adequate coverage of common alleles at 15q25.1, this suggests either that there are not multiple independent signals for CPD in this region in African Americans, or that any additional functional alleles have much lower frequencies and/or much smaller effect sizes than the signal defined by rs2036527.
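The r2 values quoted in this section are pairwise linkage disequilibrium measures. As a reminder of the standard definition, r2 = D2/[pA(1−pA)pB(1−pB)] with D = pAB − pApB, a minimal Python sketch follows. The function name and the haplotype frequencies in the example are hypothetical; this is not the DPRIME implementation, which also has to estimate haplotype frequencies from unphased genotypes.

```python
from typing import Optional


def ld_r2(p_ab: float, p_a: float, p_b: float) -> Optional[float]:
    """Pairwise r2 from the AB haplotype frequency and the two allele frequencies.

    Returns None when either SNP is monomorphic, because r2 is then undefined
    (as for a SNP that is non-polymorphic in a given population).
    """
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        return None
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denominator


# Hypothetical frequencies, for illustration only.
print(ld_r2(p_ab=0.5, p_a=0.5, p_b=0.5))   # alleles always travel together
print(ld_r2(p_ab=0.25, p_a=0.5, p_b=0.5))  # alleles are independent
print(ld_r2(p_ab=0.0, p_a=0.0, p_b=0.3))   # first SNP is monomorphic
```

Because r2 depends on allele frequencies as well as on D, the same pair of SNPs can show very different r2 values in CEU, YRI and an admixed sample.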

Figure 2

Forest and regional plot of rs2036527 with cigarettes smoked per day (CPD) from meta-analyses of the Study of Tobacco in Minority Populations (STOMP) consortia. The forest plot shows effect sizes across studies; I2=41.6%. The regional association plot shows single-nucleotide polymorphisms (SNPs) plotted by position on the chromosome against −log10 P-value. Estimated recombination rates (from HapMap-CEU) are plotted in light blue on a secondary y axis to reflect the local linkage disequilibrium (LD) structure. The SNPs surrounding the most significant SNP (purple) are color-coded to reflect their LD with this SNP (using pairwise r2 values from HapMap-CEU): red, r2≥0.8; orange, 0.6–0.8; green, 0.4–0.6; light blue, 0.2–0.4; dark blue, <0.2. The blue bars at the bottom of the plot represent the relative size and location of genes in the region. AABC, African American GWAS consortia of Breast Cancer; AAPC, African American GWAS consortia of Prostate Cancer; ARIC, Atherosclerosis Risk in Communities; CARDIA, Coronary Artery Risk Development in Young Adults; CFS, Cleveland Family Study; JHS, Jackson Heart Study; MESA, Multi-Ethnic Study of Atherosclerosis; HANDLS, Healthy Aging in Neighborhoods across the Life Span Study; HYPGEN, Hypertension Genetic Epidemiology Network; WHI, Women's Health Initiative.

Supplementary Table 4 presents how the variants associated with smoking behaviors in European ancestry populations performed in STOMP (rs1051730 in CHRNA3; rs16969968 in CHRNA5; rs1329650 and rs1028936 in LOC100188947; rs3733829 in EGLN2, near CYP2A6; rs6265, rs1013443, rs4923457, rs4923460, rs4074134, rs1304100, rs6484320 and rs879048 in BDNF; and rs3025343, near DBH). We observed modest nominally statistically significant associations for CPD with rs1051730 (P=0.0079) and rs16969968 (P=0.027), and for SC with rs3025343 (P=0.03).

Discussion

Investigating whether there are genetic variants associated with smoking behavior among African Americans is important, given that smoking prevalence and smoking-attributable mortality differ by race/ethnicity. Smoking prevalence and smoking intensity are lower for African Americans than European Americans, yet African Americans are less likely to successfully quit smoking.41

To our knowledge, this is the first meta-analysis of GWAS data for smoking behaviors in African Americans. The single genome-wide significant association we observed between rs2036527 and CPD is the same signal that was reported previously at 15q25.1 for nicotine dependence, smoking intensity and lung cancer in European ancestry samples.4, 5, 6, 42, 43 The strong association that we found for this SNP supports studies suggesting that it is highly correlated with the functional allele(s) in populations of African ancestry. The fact that we did not observe a strong second association signal in this region after conditioning on rs2036527 suggests that rs2036527 and correlated SNPs in the African ancestry populations may define a single common haplotype at chr15q25.1 with sufficient effect size to be detected in our sample. After back transformation of the beta estimate, mean CPD values for each rs2036527 genotype were 14.6 for AA, 13.5 for AG and 12.8 for GG, suggesting that there is an increase of less than one cigarette smoked per day for each copy of the A allele. This SNP accounted for approximately 0.20% of the phenotypic variance of CPD in our sample. This effect is similar to that reported for rs1051730, which is correlated with rs2036527, where each copy of the rs1051730 A allele corresponds to an increase of approximately one CPD and accounts for 0.5% of the phenotypic variance in smoking quantity in populations of European ancestry.
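The per-allele difference implied by the genotype means reported above can be checked with simple arithmetic. The short Python sketch below uses only the three back-transformed means given in the text; the function name is hypothetical.

```python
# Back-transformed mean CPD by rs2036527 genotype, as reported in the text.
mean_cpd = {"AA": 14.6, "AG": 13.5, "GG": 12.8}


def per_allele_differences(means):
    """Return the CPD difference for the first A allele, the second, and their average."""
    first_copy = means["AG"] - means["GG"]
    second_copy = means["AA"] - means["AG"]
    average = (means["AA"] - means["GG"]) / 2.0
    return first_copy, second_copy, average


first_copy, second_copy, average = per_allele_differences(mean_cpd)
print(f"GG to AG: {first_copy:.1f} CPD")
print(f"AG to AA: {second_copy:.1f} CPD")
print(f"average per A allele: {average:.1f} CPD")
```

The average difference across the two copies is about 0.9 cigarettes per day, in line with the statement that each A allele corresponds to somewhat less than one additional cigarette.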

A study of CHRNA5 knock-out mice showed that re-expressing this gene in the medial habenula, which extends projections to a brain region shown to mediate nicotine withdrawal,44 abolished the inhibitory effects of nicotine while maintaining the reinforcing effects of nicotine.45 In a functional magnetic resonance study of smokers, genetic variation in CHRNA5 appeared to also affect reactivity to smoking cues in the insula, hippocampus and dorsal striatum, regions implicated in addictive behavior and memory.46 Thus, it is biologically plausible that rs2036527, as a correlate of increased expression of the CHRNA5 gene, could be associated with smoking quantity as a consequence of neuro-adaptations resulting from complex interactions between genes and environment that alter positive and negative reinforcement.47

To our knowledge, no SNPs in the SPOCK2 gene, which encodes a protein that forms part of the extracellular matrix, have been reported previously in association with smoking behaviors or smoking-related cancer phenotypes. Variants at the SPOCK2 locus have been linked to bronchopulmonary dysplasia, a respiratory condition observed in premature infants48 that has been linked to intrauterine smoke exposure.49 These variants are weakly correlated with the SNPs identified at this locus for AOI in Europeans (r2<0.25 in CEU), but are not correlated in the African ancestry populations (r2=0). The top SNP associated with SC (rs3813637) is located at 1q25 in the C1orf49 gene. This locus has been linked to late-onset Alzheimer's disease, but genetic variation at this locus has not been reported in association with smoking behavior.50 We are not aware of any smoking-related, other behavioral or pathological phenotypes associated with the variants we detected at 1q44 (C1orf100) and 15q12 (LOC503519) or CRCT1 for CPD.

Although this is the largest GWAS meta-analysis of smoking phenotypes conducted to date in men and women of African ancestry, statistical power was a significant limitation. We had 80% power (for a mean allele frequency of 0.15 and α of 5 × 10−8) to detect effect sizes of 1.25 for SI, AOI and SC, and a β of 0.15 for CPD. Notably, the effect sizes reported for variants associated with many of these smoking phenotypes in the larger GWAS of European ancestry populations were much smaller. For example, the TAG, ENGAGE and Ox-GSK consortia reported β for SI of 0.015 for SNPs in BDNF and 0.026 for rs3025343 in DBH. Thus, we cannot rule out the possibility that additional loci influencing smoking behavior among African Americans may be detected with larger sample sizes.

This analysis was limited by the fact that we were not able to adjust for local admixture, and the chip coverage of common variants (>5%) is less complete compared with the European populations,51 which applies to most GWAS of African American populations. However, the use of a global adjustment for population genetic variation in the regression analysis using the principal components approach provided some measure of control for potential confounding because of population admixture.34, 52 Additionally, we acknowledge the limited precision of the smoking phenotypes. Smoking quantity is a highly heritable trait: estimates for CPD, heavy versus light smoking and/or pack-years range from 40 to 70% heritability in the European, African and Asian ancestry twin and family studies. Other studies have estimated that shared environmental factors account for 50% or more of the observed variation in SI, AOI and SC.1, 18, 20, 53, 54, 55, 56, 57

We were unable to directly assess more refined phenotypes and highly heritable traits such as nicotine metabolism,58 given our reliance on existing data originally collected for other purposes. Moreover, we were unable to examine gene × environment interactions using a meta-GWAS analytic approach. Our analyses did not incorporate environmental covariates, such as type of cigarettes smoked (mentholated or non-mentholated), dietary factors, socioeconomic status and other factors that might influence one or more of the phenotypes analyzed; these data were not uniformly available and were beyond the scope of the analyses planned for this discovery investigation. Future prospective studies with more detailed characterizations of smoking phenotypes and relevant environmental covariates are needed to identify additional variants that may be associated with smoking behaviors.

In summary, collective findings from GWAS among the African and European ancestry populations implicate the chromosome 15q25 region as the most significant for smoking quantity. However, for both populations, SNPs in this region are associated with very small changes in smoking quantity and explain a small proportion of the variance, which suggests that conventional GWAS approaches may not be adequate to discover the likely hundreds of variants contributing small increments in risks of the additive genetic effects for heritable traits or so-called ‘missing heritability’ of complex diseases.59 More refined, specific and harmonized phenotypes capturing the complex behavior of SI, trajectories of progression and cessation, and environmental effect-modifiers are also needed to characterize the genetic architecture of smoking behavior in different ancestral populations. Larger studies utilizing next-generation SNP arrays, whole-exome or whole-genome sequencing will be required to investigate lower-frequency variation, which may contribute to unexplained heritability for common traits.60