Genome-wide association study of smoking trajectory and meta-analysis of smoking status in 842,000 individuals

Xu, Ke; Li, Boyang; McGinnis, Kathleen A.; Vickers-Smith, Rachel; Dao, Cecilia; Sun, Ning; Kember, Rachel L.; Zhou, Hang; Becker, William C.; Gelernter, Joel; Kranzler, Henry R.; Zhao, Hongyu; Justice, Amy C.

doi:10.1038/s41467-020-18489-3

Download PDF

Article
Open access
Published: 20 October 2020

Genome-wide association study of smoking trajectory and meta-analysis of smoking status in 842,000 individuals

Nature Communications volume 11, Article number: 5302 (2020) Cite this article

8573 Accesses
47 Citations
19 Altmetric
Metrics details

Subjects

Abstract

Here we report a large genome-wide association study (GWAS) for longitudinal smoking phenotypes in 286,118 individuals from the Million Veteran Program (MVP) where we identified 18 loci for smoking trajectory of current versus never in European Americans, one locus in African Americans, and one in Hispanic Americans. Functional annotations prioritized several dozen genes where significant loci co-localized with either expression quantitative trait loci or chromatin interactions. The smoking trajectories were genetically correlated with 209 complex traits, for 33 of which smoking was either a causal or a consequential factor. We also performed European-ancestry meta-analyses for smoking status in the MVP and GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN) (N_total = 842,717) and identified 99 loci for smoking initiation and 13 loci for smoking cessation. Overall, this large GWAS of longitudinal smoking phenotype in multiple populations, combined with a meta-GWAS for smoking status, adds new insights into the genetic vulnerability for smoking behavior.

Genetic diversity fuels gene discovery for tobacco and alcohol use

Article Open access 07 December 2022

Gretchen R. B. Saunders, Xingyan Wang, … Scott Vrieze

Multi-trait genome-wide association analyses leveraging alcohol use disorder findings identify novel loci for smoking behaviors in the Million Veteran Program

Article Open access 05 May 2023

Youshu Cheng, Cecilia Dao, … Ke Xu

Expanding the genetic architecture of nicotine dependence and its shared genetics with multiple traits

Article Open access 03 November 2020

Bryan C. Quach, Michael J. Bray, … Dana B. Hancock

Introduction

Cigarette smoking has an estimated heritability of 40–70%^1,2, and is a leading cause of morbidity and mortality worldwide. In the past two decades, over 25 genome-wide association studies (GWAS) of smoking and smoking-related phenotypes have been reported^{3,4,5,6,7,8,9}. Among several hundred reported loci, the established loci are predominantly in a few genomic regions, such as 15q25 (CHRNA5-A3-B4) and 8p11 (CHRNB3-CHRNA6)^3,4,10,11. Large meta-GWASs have been reported, including a study from the UK Biobank (UKBB) and the Tobacco and Genetics Consortium (TAG) with a total of 518,633 individuals that identified 223 loci for ever versus (vs.) never smokers¹⁰. These genetic variants accounted for 10.9% of the phenotypic variation for this single trait¹⁰. The largest meta-GWAS to date by the GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN) included up to a total of 1.2 million individuals (depending on traits) from 26 cohorts and identified 406 loci associated with multiple stages of cigarette use (initiation, cessation, and heaviness)⁵. In that study, genetic heritability was 4–8% for smoking phenotypes⁵. A meta-analysis GWAS of up to 61 studies identified 40 new rare or low-frequency variants associated with smoking behavior⁶. However, none of the large GWAS studies identified genetic variants for smoking trajectories.

Smoking is a complex trait typically ascertained by self-reported responses to questionnaires. Phenotypes previously used for GWAS include self-reported ever vs. never smoked^3,12,13,14, number of cigarettes smoked per day (CPD)^3,9,14, age of initiation, smoking cessation, and nicotine dependence defined by the Fagerström Test for Nicotine Dependence (FTND)^7,15. However, single measures of self-reported health behaviors, especially those that are stigmatized like smoking, are subject to social desirability bias and substantially underestimate smoking¹⁶, and they often measure state rather than trait. Biomarkers for nicotine exposure offer less biased metrics and have demonstrated strong associations with genetic markers^17,18. For example, the nicotine metabolite ratio, which can be estimated in blood, provided an estimated heritability as high as 0.81 and was linked to 19q13^19,20. However, the feasibility of using biomarkers for large-scale gene discovery is limited. In contrast, electronic medical records (EMRs) provide an opportunity to obtain large-scale longitudinal data that can be linked to genetic data. The large scope of EMR-derived data increases power, overcoming some of the limitations of state phenotypic measurement. Few studies have applied a longitudinal smoking phenotype for gene discovery²¹.

We leverage the EMR-derived longitudinal data to perform the largest longitudinal smoking GWAS in a multi-ethnic cohort and identify 16 genetic loci associated with smoking trajectory contrasts for European American (EA), one locus for African American (AA), and one locus for Hispanic American (HA) individuals from the Million Veteran Program (MVP, N = 286,118). We also meta-analyze smoking status GWASs from EAs in the MVP and European subjects in GSCAN (excluding data from 23andMe which summary statistics are not released for public), yielding a total sample of 842,717 individuals and leading to the identification of 99 genetic loci associated with smoking initiation and 13 loci associated with smoking cessation. We further characterize the genetic risk for smoking trajectory by estimating SNP-based heritability, prioritizing causal genes and biological pathway interpretation, determining genetic correlation of smoking trajectory with psychiatric and nonpsychiatric traits, and exploring the causality of smoking trajectory with other genetically correlated traits. The phenotypic characterization of the studied populations is presented in Table 1 and analytical approach is presented in Fig. 1. Overall, we identify multiple genetic loci associated with smoking trajectory contrasts in diverse populations. Our findings highlight the significance of incorporating longitudinal data in the EMR-derived phenotypes in improving the identification and biological interpretation of genetic associations characterizing smoking behaviors for diverse populations.

Table 1 Phenotype distribution of smoking trajectories and demographic characteristics in the Million Veteran Program (N = 286,118).

Full size table

**Fig. 1: An overview of analyses performed on the smoking trajectory and status phenotypes.**

Results

GWAS for EMR-based smoking trajectories in the MVP

Using a previously validated approach²² and data from over 2.2 million clinical encounters (per person mean = 10.5, median = 8) where smoking status was recorded in the EMR for a total of 286,118 veterans (209,915 EAs, 54,867 AAs, and 21,336 HAs), we identified three distinct longitudinal trajectory groups for smoking in each population group. The trajectory groups were defined using the highest assignment probabilities and were identified as individuals who mostly never smoked, those with mixed smoking and nonsmoking and those who, over time, reported mostly current smoking. As a binary comparison phenotype, we also defined smoking status as smoking initiation (ever vs. never) and smoking cessation (current vs. past) using a classification method based on the modal value, i.e., the most common value of all data available in the same samples. Although the two phenotypes were highly correlated, by comparing their performance we evaluated their relative utility for detecting genetic variants contributing to smoking behavior.

We first conducted a GWAS for smoking trajectories separately in EA, AA, and HA populations in the MVP, then conducted a trans-ethnic meta-GWAS for smoking trajectories. We used a multinomial regression model to analyze the unordered multinomial smoking trajectory groups and adjusted for age, sex, and ten principal components (PCs) as covariates. The pairwise smoking trajectory contrasts are current vs. never (contrast I), which corresponds to smoking initiation, and current vs. mixed (contrast II), which corresponds to smoking cessation. The two trajectory contrasts were modeled simultaneously, resulting in smaller standard error estimates of genetic effects and thus greater statistical power. The overall fit of the model was evaluated using the likelihood ratio test.

In the EA samples, we identified 16 independent genome-wide significant (GWS) loci (pairwise r² < 0.1) from the likelihood ratio test on the multinomial smoking trajectories (p < 5 × 10⁻⁸) (Supplementary Table 1) from the overall trajectory model. Seven of the 16 loci replicated previously reported associations with smoking or related phenotypes^10,11,23,24. The most significant single nucleotide polymorphism (SNP) was rs7515828 (likelihood ratio test p = 6.29 × 10⁻¹⁶) near LNC01360 on chromosome 1, which was recently reported to be significantly associated with smoking in a large GWAS from the UKBB and TAG¹⁰. Analysis for trajectory contrast I (current vs. never) showed minimal inflation (before correction λ_LDSC = 1.0281, se = 0.0096; after LDSC correction λ_LDSC = 0.9997, se = 0.0094; and after genomic control correction λ_LDSC = 0.8247, se = 0.0078). We identified 18 significant loci for trajectories of current vs. never with odds ratios (ORs) in the range of 0.90–1.09 (Table 2 and Supplementary Data 1) (Fig. 2 and Supplementary Fig. 1). The analysis for contrast II identified five significant loci for current vs. mixed with ORs ranged from 0.92 to 1.08 (before correction λ_LDSC = 1.0236, se = 0.0091; after LDSC correction λ_LDSC = 1.0000, se = 0.0089; and after genomic control correction λ_LDSC = 0.8868, se = 0.0081). Three genes–CHRNA2, DBH, and DRD2–were GWS in both contrast analyses (Fig. 2). Based on these findings, we believe that there is an association of DRD2 with smoking trajectory, although previous findings on this relationship have been controversial^{23,25,26,27,28}. These loci coincide with the most recent findings from large meta-GWAS^5,6.

Table 2 Genome-wide significant associations for smoking trajectories in the Million Veteran Program (N = 286,118).

Full size table

**Fig. 2: Genome-wide association study results for smoking trajectory contrasts.**

In the AA samples, we identified one locus for trajectory contrast I (current vs. never) with the lead SNP rs4478781 near to LINC01346 (OR = 1.106, Wald test p = 1.06 × 10⁻⁸) (Table 2 and Supplementary Data 1) (Supplementary Fig. 2.1). A previous study reported that DNA hydroxymethylation of LINC01346 was associated with a quantitative measure of neuritic plaque in Alzheimer disease in human postmortem brain²⁹, suggesting a potential functional impact of this gene on brain pathology. No GWS variant was found for trajectory contrast II (current vs. mixed) in AAs.

In the HA samples, we identified one variant, rs1334346 near to Regulating Synaptic Membrane Exocytosis 1 (RIMS1) that was significantly associated with smoking trajectory contrast I (OR = 0.831, Wald test p = 8.22 × 10⁻⁹) (Table 2 and Supplementary Data 1) (Supplementary Fig. 2.2). The protein encoded by RIMS1 plays a role in the regulation of voltage-gated calcium channels during neurotransmitter release, and the variants in this gene were associated with autism³⁰ and schizophrenia³¹. The analysis of contrast II (current vs. mixed) for smoking trajectory revealed no significant variant in HAs.

The trans-ethnic meta-GWAS combining EAs, AAs, and HAs in the MVP identified 14 GWS loci for contrast I (current vs. never) and 4 GWS loci for contrast II (current vs. mixed) (Supplementary Data 2) (Supplementary Fig. 3). The significant loci largely overlapped with the significant loci of the smoking trajectory analysis in EAs. For contrast I (current vs. never), 9 of 14 loci overlapped with the significant loci in EAs, and 5 loci were revealed only by smoking trajectory meta-analysis. For contrast II (current vs. mixed), the four loci identified by the trans-ethnic meta-GWAS were identical with the significant loci in EAs. Given that the sample sizes for AA and HA were relatively limited, subsequent analyses of smoking status and all downstream analyses focused on European-ancestry individuals.

GWAS for smoking status

We first identified loci for smoking status in the EAs from the MVP sample using the phenotypes of smoking initiation (ever vs. never) and smoking cessation (current vs. past). Using a logistic regression model, a GWAS for smoking status showed very little inflation (λ_LDSC = 1.01 for initiation and λ_LDSC = 1.06 for cessation). Twelve loci were GWS for smoking initiation (Supplementary Data 1) (Supplementary Fig. 4), including three that were not seen in either the UKBB or GSCAN studies: GRIK4, SPATS2, and FMNL3. The GWAS for smoking cessation identified eight GWS loci (Supplementary Data 1) (Supplementary Fig. 4). Two of eight loci, rs34735365 near to NR5A2 and rs77648866 near to NICN1, were associated only with smoking cessation and not with smoking initiation.

Of note, previous studies have shown significant associations in EAs of 15q25 (CHRNA5-A3-B4) and 8p11 (CHRNB3-CHRNA6) with CPD or FTND^32,33. To validate these findings in the MVP sample, we conducted two GWASs for current CPD (N = 17,014) and past (N = 77,515) CPD phenotypes in EAs using MVP survey data, which yielded three GWS for each (Supplementary Table 2). The most significant signal for current CPD was a well-established SNP, rs12914385, located in an intron of CHRNA3 on 15q25 (β = 0.014; Wald test p = 4.30 × 10⁻¹⁰). Another significant SNP, rs3025386, was near DBH (β = −0.017; Wald test p = 1.25 × 10⁻⁸), a gene previously associated with smoking cessation^3,24,34. The third significant SNP—rs11697662 on chromosome 20—is a novel locus for current CPD (β = 0.016; p = 6.93 × 10⁻⁹). For past CPD, the most significant SNP, rs8040863, is a well-established smoking-associated exonic variant on CHRNA3 (β = 0.044; Wald test p = 1.80 × 10⁻²⁰). Rs12459249, proximal to CYP2A6, previously associated with the nicotine metabolite ratio²⁰, was also GWS for past CPD (β = −0.033; Wald test p = 1.33 × 10⁻¹¹). A novel locus for past CPD was rs7571606 on TEX41 (β = 0.027; Wald test p = 4.62 × 10⁻⁹). Thus, our GWAS of CPD both replicated previously identified loci and yielded two novel loci for a quantitative smoking trait.

We performed meta-analyses on a combined sample size of 842,717 individuals from the MVP and GSCAN cohorts (excluding 23andMe) for two phenotypes separately: smoking initiation and smoking cessation. We identified 99 independent loci for smoking initiation and 13 loci for smoking cessation (Supplementary Data 3). For smoking initiation, our meta-analysis of MVP and GSCAN yielded 16 new loci, including rs10446671 near to LOC101927285 and rs10211770 near to LINC01441. For smoking cessation, two of 13 loci were not reported in GSCAN: rs329120 near JADE2 and rs12891477 near LINC00637.

Downstream analyses and biological interpretation of GWAS results

As mentioned earlier, we applied downstream analyses to the association results of the pairwise trajectory contrasts and smoking status phenotypes obtained in EA samples in the MVP. Here we studied SNP-based heritability and heritability enrichments, gene prioritization and pathway enrichments for top signals identified in the GWASs, and summary statistics-based genetic correlation and potential causal–consequential relationships with other psychiatric and nonpsychiatric phenotypes.

Using LDSC³⁵, the SNP heritability of smoking trajectory was 18.7% (SE = 0.010) for contrast I (current vs. never) and 5.8% (SE = 0.005) for contrast II (current vs. mixed). The heritability of smoking status in EAs was 6.9%, SE = 0.004 for smoking initiation and 6.1%, SE = 0.005 for smoking cessation (Supplementary Table 3). Thus, smoking trajectory I (current vs. never) accounted for more SNP heritability than smoking initiation while smoking trajectory II (current vs. mixed) has SNP heritability comparable to smoking cessation.

To estimate tissue and cell type-specific heritability, we conducted partitioned LD score regression using GenoSkyline-Plus functional annotations only in EAs^36,37. LDSC showed significant heritability enrichments in multiple tissue and cell lines (Supplementary Data 4). In contrast I (current vs. never), the most significant heritability enrichment was for the anterior caudate in brain (enrichment score = 5.4, Wald test p = 4.7 × 10⁻⁷). Other significantly enriched cell types included the cingulate gyrus (enrichment score = 5.8, p = 1.5 × 10⁻⁴), normal human astrocytes (enrichment = 5.1, Wald test p = 1.1 × 10⁻⁰⁴), and G-CSF-mobilized hematopoietic CD34+ stem cells (enrichment = 3.7, Wald test p = 9.0 × 10⁻⁵) (Supplementary Data 4). No significant heritability enrichment was identified for contrast II (current vs. mixed). The heritability of smoking status in the MVP-only sample showed significant enrichment in the anterior caudate for smoking initiation (enrichment = 5.7, Wald test p = 2.9 × 10⁻⁸) and in G-CSF-mobilized hematopoietic CD34+ stem cells for smoking cessation (enrichment = 5.3, Wald test p = 9.9 × 10⁻⁵) (Supplementary Data 4).

To prioritize causal genes, we performed functional annotation of the GWAS-identified genetic variants for smoking trajectory in EA samples in the MVP by using co-localization analysis that focused on two molecular traits, expression quantitative trait loci (eQTLs) and histone interactions in brain tissues. We used the false discovery rate (FDR) < 0.05 for the eQTL mapping and FDR < 1 × 10⁻⁶ for the chromatin interaction mapping. Co-localization analysis was separately performed for smoking trajectories, contrast I (current vs. never) and contrast II (current vs. never) using FUMA GWAS. For contrast I (current vs. never), we identified 40 significant genomic regions co-localized among 18 significant loci and molecular traits (Supplementary Data 5). Positional mapping identified 16 significant genes, eQTLs were mapped to 28 genes in multiple brain regions including basal ganglia and hippocampus, and chromatin interactions were mapped to 21 genes in adult cortex. Importantly, co-localization analysis showed that several well-known genes for smoking, e.g., DRD2, CHRNA2, TTC12, were causal genes (Fig. 3). For contrast II (current vs. mixed), we identified five significant genomic regions. Positional mapping revealed three significant genes (TTC12, CHRNA2, and CYP2A7); eQTL mapping identified an additional significant gene, EPHX2, in the brain cerebellar hemisphere; and chromatin interaction mapping identified two significant genes, TTC12 and TMPRSS5 in adult cortex (Supplementary Data 5) (Fig. 3). The mapped brain cortex, hippocampus, and basal ganglia regions are functionally involved in reward reinforcement, cognitive processing, and emotion regulation and thus have implications for the etiology of addictive behaviors and their association with other neuropsychiatric disorders including depression^38,39,40. These findings provide further support that GWAS for smoking trajectory identified biologically meaningful signals.

**Fig. 3: Gene prioritization for smoking trajectory contrasts.**

Using the Database for Annotation, Visualization and Integrated Discovery (DAVID)⁴¹ to conduct pathway analysis for smoking trajectories, we identified 23 significant pathways for smoking trajectory contrast I (current vs. never) and 4 for trajectory contrast II (current vs. mixed) (Supplementary Data 6). The top enriched pathways for trajectory contrast I (current vs. never) were identified for the histone H1 gene cluster and were involved in DNA packaging complex, nucleosome assembly and organization, and chromatin assembly/disassembly activities. The top enriched pathways for trajectory contrast II (current vs. mixed) were identified for the CYP2 gene cluster and were related to epoxygenase, oxidoreductase, and monooxygenase activities.

We used GeNetic cOVariance Analyzer (GNOVA) to estimate genetic correlations between smoking trajectory contrasts in EA samples from the MVP and 524 psychiatric and nonpsychiatric traits for which large-scale summary statistics are available^42,43. We found 209 significant genetic correlations (Bonferroni-corrected p < 0.05/524/2 = 4.8 × 10⁻⁵) between at least one of the trajectory contrasts and other traits (Supplementary Data 7). As shown in Fig. 4, smoking trajectory contrasts were significantly genetically correlated with other smoking-related phenotypes, alcohol phenotypes, and psychiatric disorders, including positive correlations with depressive symptoms and schizophrenia. There were also significant positive genetic correlations between smoking trajectory contrasts and lung disease, coronary artery disease, diabetes, obesity, and negative correlations between smoking and overall health rating. All of these findings are expected and consistent with previous reports.

**Fig. 4: Genetic correlations between smoking trajectory contrasts and multiple complex traits.**

To understand causal–consequential relationships between smoking trajectory and significantly genetically correlated traits, we performed a bi-directional two-sample MR analysis for smoking trajectory contrast I (current vs. never) and contrast II (current vs. mixed) separately. Here, significant SNPs from the trajectory GWAS served as an instrumental variable, smoking trajectory as an exposure, and a psychiatric or nonpsychiatric trait as an outcome variable; MR analyses were also performed with smoking trajectory as an outcome and the various complex traits as exposures. Trait pairs with three or fewer instrumental variables were excluded from the analysis. We used four MR methods: inverse-variance weighted (IVW), weighted median (WM), MR-Egger regression (MR-Egger), and MR-PRESSO to test the causal relationships. We applied a Bonferroni correction to correct for different MR methods and all combinations of causal paths. We present causal paths for statistically significant traits identified by one of the four MR methods.

We identified five significant causal paths using trajectory contrast I (current vs. never) as an exposure and 23 significant causal paths using it as an outcome. While no significant results identified using trajectory contrast II (current vs. mixed) as an exposure, 12 were identified as being significant using it as an outcome (Supplementary Data 8). MR-Egger seemed to be the most conservative and identified no significant causal relationship or pleiotropic effect. While the significance varied across methods, the magnitude and direction of estimated causal effects were highly consistent across IVW, median weighted, and MR-PRESSO. Our analyses suggested multiple interesting causal–consequential relationships. For example, in trajectory contrast I (current vs. never), smoking was a cause of wheeze or whistling in the chest in last year, while depression (fed-up feelings) likely led to smoking. Smoking was an outcome of multiple traits related to body weight including body mass index and whole-body fat mass. Interestingly, we found that higher education likely had a causal effect on smoking behavior. We found no trait pair with both significant forward and reverse causal relationships. It is noted that the statistical significance is inadequate to claim a causal relationship between trait pairs. The identified significant MR results serve as suggestive evidence for further investigation and the causal relationships should be interpreted with caution.

Discussion

We present data from a GWAS of smoking phenotypes in the MVP, the largest relatively homogeneous clinical sample studied to date, and combine these results with a published study to yield a large meta-analysis for smoking status. We identified more than a dozen significant genetic risk loci for smoking trajectories in the MVP. A meta-GWAS of MVP and GSCAN in European-ancestry individuals identified more than 100 loci for smoking status, including loci that were not seen in either MVP or GSCAN alone. The loci identified in the MVP cohort account for more genetic heritability in smoking behavior than previous studies^5,10. The heritability was enriched in brain regions related to smoking behavior. We prioritized 40 genes strongly associated with biological functions relevant to smoking trajectory of current vs. never, including neural and synaptic function. Smoking-associated loci were highly correlated with 209 smoking-related psychiatric and nonpsychiatric phenotypes. We found 33 significant causal relationships between smoking trajectory and other smoking phenotypes, alcohol consumption, and psychiatric traits. Together, our results provide new insights into genetic risks for smoking behavior and implicate loci to be validated biologically in future studies.

We leveraged robust EMR data to yield a longitudinal phenotype for the trait of smoking trajectories. This novel phenotype identified genomic loci for smoking trajectories overlapping with those from a meta-analysis of smoking status in a much larger sample^5,10. Thus, longitudinal smoking trajectories may offer greater power to detect novel genetic variants than the widely used binary phenotype of ever vs. never smoked. We identified several interesting loci for smoking trajectory. A locus containing neuronal growth regulator 1 (NEGR1) was associated with the smoking trajectory of current vs. never. NEGR1 encodes a protein involved in neuronal development and maturation^44,45; the gene has been associated with major depression⁴⁶ and showed altered gene expression in the prefrontal cortex of schizophrenic patients⁴⁷. A variant in NEGR1 was also previously linked to obesity⁴⁸, which, like smoking, involves a dysregulation of appetitive behavior. Several loci associated to smoking trajectory were also associated with other relevant phenotypes. A locus containing CNNM2, which was significantly associated with smoking trajectories in this study, was recently linked to smoking behavior in a large meta-analysis⁶. A variant in CAMKMT for smoking trajectories in the MVP was previously associated with anxiety disorder⁴⁹.

Some of the genes identified in the meta-analysis for smoking status are particularly noteworthy. For example, rs6265 on BDNF (Wald test p = 1.3 × 10⁻¹⁴) is a well-established exonic functional SNP for psychiatric disorders^50,51, including smoking behavior. One of the loci located in NRXN1 (Neurexin 1, index SNP rs11125335, Wald test p = 1.1 × 10⁻¹¹) was previously associated to nicotine dependence⁵². Neurexins are cell-surface receptors that bind neuroligins to form Ca(2+)-dependent neurexin/neuroligin complexes at synapses in the central nervous system⁵³. This locus was previously identified as a candidate gene but was not GWS in the TAG samples¹². Genetic variation in NRXN1 has also been linked to autism spectrum disorder⁵⁴, attention deficit hyperactivity disorder⁵⁵, and schizophrenia⁵⁶. One of the significant signals for smoking status was from NCAM1, index SNP rs7110863 (Wald test p = 1.6 × 10⁻³⁵), which was highly significant for smoking trajectory in the MVP (index SNP rs7126748; Wald test p = 4.2 × 10⁻⁸). NCAM1 is located in a well-known genomic region on chromosome 11, NCAM1-TTC12-ANKK1-DRD2, previously associated with nicotine dependence²⁸ and smoking motives⁵⁷.

Of note, despite several recently reported large GWASs for smoking traits^5,10 that revealed more loci than the present study, the estimated heritability of smoking for trajectory contrast I (current vs. never) in our study is substantially higher than that in previous studies (>15% for contrast I vs. <10% for other smoking traits). These results suggest that smoking trajectories are a more powerful trait for detecting genome-wide signals with small effects on smoking behavior. A higher SNP heritability level in our sample also suggests that meta-GWAS from the multiple cohorts may be missing vulnerability for smoking traits, possibly due to heterogeneity in the meta-GWAS. Note that our results estimated with LDSC were based on a heritability model that assumes that all genetic variations contribute equally to the SNP-based heritability and the summary statistics were derived after adjusting for age, sex, and top ten PCs as covariates. The heritability estimation is subject to inadequate covariate adjustment and novel methods developed to provide a more flexible specification of heritability model are worth future investigation^58,59,60.

We were unable to replicate the identified loci associated with smoking trajectories in an independent sample due to a lack of a replication cohort with comparable longitudinal measurements of smoking behavior. We identified only one locus for smoking trajectory in the AA and HA populations, likely due to the modest sample sizes in these populations. The findings from trans-ancestry meta-analysis mostly reflect the findings from the EA samples. In addition, the majority of our samples were men, so that it was not feasible to examine sex-specific genetic risks.

In conclusion, compared to more widely used, simple smoking phenotypes, we demonstrated that (1) longitudinal EMR data permit the estimation of smoking trajectories, and (2) these phenotypes yield greater statistical power to detect small-effect variants, enabling the detection of novel genetic risk variants for smoking behavior.

Methods

Study cohort

The MVP recruited veteran volunteers and collects data from questionnaires, EMRs, and blood samples for genomic analysis. The Central Veterans Affairs Institutional Review Board (IRB) and site-specific IRBs approved the MVP study. All relevant ethical regulations for work with human subjects were followed in the conduct of the study and informed consent was obtained from all participants.

We selected 26,497 LD pruned SNPs and applied flashpca version 1.2.5 to perform PC analysis on 343,268 unrelated MVP samples and 2504 1000 Genomes Project (1KG) samples to identify population structure. The annotated Euclidean center of European, African, admixed American, East Asian, and South Asian 1KG samples plus the admixture analysis results with a probability cutoff of 0.8 were used to define populations. We removed samples with a high genotype missing rate (>10%), discordant sex, excessive heterozygosity (>3 sd), and up to second-degree relatives. A total of 209,915 EAs, 54,867 AAs, and 21,336 HAs passed quality control filters. The mean age of the study cohort ranged from 58 to 64 years across three ancestral groups (EA: mean = 64, standard deviation (SD) = 13, AA: mean = 58, SD = 12, and HA: mean = 56, SD = 15) and majority of the MVP samples were male (EA: 93%, AA: 87%, and HA: 92%).

Smoking phenotypes

Smoking data from 2000 to 2015 were obtained from the Veteran Healthcare Administration Corporate Data Warehouse. Details on the data extraction methods are provided elsewhere⁶¹. In brief, EMR smoking data are collected nationally from patients approximately annually using the clinical reminder process, which prompts providers to ask patients questions related to health. EMR smoking data consist of text values that represent responses to the specific smoking-related queries to patients, which can vary by site and over time. Mapping strategies were created to classify these responses into never, past, and current smoking status and can be found on www.vacohort.org⁶¹.

Smoking trajectory phenotype accounts for variation in smoking status over time²². We used joint trajectory modeling to sort each participant’s smoking values (current, past, never) into clusters and estimated distinct trajectories^62,63,64. We used age as the time scale to account for possible decreases in smoking with age. The procedure calculated each individual’s probability of belonging to each trajectory and assigned the individual to the trajectory with the highest probability of membership (mostly current smoking, mixed smoking and nonsmoking, mostly never smoking). Applying the phenotype definition, we identified 40,456 mostly current smokers, 110,403 mixed smokers, and 59,056 mostly never smokers in EAs, 13,511 mostly current smokers, 23,605 mixed smokers, and 17,751 mostly never smokers in AAs, and 2920 mostly current smokers, 11,221 mixed smokers, and 7195 mostly never smokers in HAs.

Using all available EMR smoking observations, we used the most common (modal) value for smoking status assessment. We identified 72,729 never smokers, 71,002 past smokers, and 66,184 current smokers in the EA sample. We further contrasted ever smoked (past or current smokers) with never smoked (nonsmokers) to study smoking initiation behavior. Similarly, we used the modal value of smoking status classification and contrasted current with past to reflect smoking cessation phenotype.

Cigarettes per day was based on responses to the MVP baseline survey questionnaire. The subjects were asked, “Do you currently smoke cigarettes?”, If yes, then “How many cigarettes do you smoke per day now?”. If no, then “Over the entire time you smoked, on average, how many cigarettes did you smoke per day?”. The responses were on a scale from 1 to 5, corresponding to (1) less than a half pack, (2) a half pack, (3) 1 pack, (4) 2 packs, and (5) more than 2 packs. We identified 17,014 and 77,515 individuals for CPD current and CPD past with nonmissing responses, respectively.

Genotyping, imputation, and quality control

MVP used an Affymetrix Axiom Biobank Array that genotyped ~723,000 markers, which were enriched for exonic variants. SNPs were validated for common diseases and phenotypes of specific interest to the VA population (e.g., psychiatric traits)⁶⁵. Minimac3 and the 1000 Genomes Project 3 reference panel were used to conduct genotype imputation, which resulted in ~79 million variants in this study⁶⁶. We filtered out rare variants (minor allele frequency < 0.01), variants with a missing rate > 5%, variants with imputation r² < 0.8, and those that deviated significantly from Hardy–Weinberg equilibrium (p < 1 × 10⁻⁶).

Single nucleotide polymorphism (SNP) association analysis

For the trajectory GWAS, we applied a multinomial regression analysis using SNPTEST (v2.5.4-beta2) to test for an association of SNPs with trajectories. The overall fit of the model was evaluated using the likelihood ratio test, which assessed the strength of the relationship between each genetic variation and the multinomial smoking trajectory groups by comparing the fit of two models with and without the genetic variation. The two trajectory contrasts were modeled simultaneously, which resulted in smaller standard error estimates of genetic effects and thus greater statistical power. We estimated genetic effects for two trajectory contrasts: contrast I (current vs. never) and contrast II (current vs. mixed). For the smoking status phenotype modal value that defined smoking initiation and smoking cessation, we applied logistic regression to estimate marginal effects of each single genetic variant on smoking. For the CPD phenotype, we used linear regression analysis. We used PLINK (v1.9) to performed logistic and linear regression association analyses. In each model, we included age and sex as covariates and the ten top PCs calculated with flashpca (v1.2.5) to adjust for population stratification. Quantile–quantile plots were generate to evaluate the extent to which the observed GWAS p values deviated from the null hypothesis (Supplementary Fig. 5). The LD score regression intercept was used to quantify inflation resulting from confounding bias³⁵. We identified proxy independent SNPs (LD, r² < 0.1) and selected the ones with the most significant p values as index SNPs. We then defined a risk locus as a physical region containing all GWS variants (p < 5 × 10⁻⁸) that were in LD (r² > 0.6) with the index SNP. Loci within 250 kilobases were merged^67,68. We used ANNOVAR (v2016Feb01) to annotate index SNPs with their nearest genes⁶⁹. Loci mapped to the same nearest gene were merged to yield one risk locus. We summarized whether the identified locus in the MVP was mapped to a gene previously reported to be associated with smoking phenotypes in other studies. If it was, we also listed the associated smoking phenotypes. The results are based on a literature review and query search in the GWAS Catalog⁷⁰. LocusZoom (v1.3) was used to visualize regional associations and LD patterns⁷¹.

Meta-analysis of GWAS results

We meta-analyzed GWAS summary statistics of smoking status phenotypes (smoking initiation meta N = 842,717, smoking cessation meta N = 450,129) in the MVP EA samples (N = 209,915), and GSCAN excluding 23andMe samples (N = 632,802). The summary statistics from 23andMe are not publicly available. We also performed trans-ethnic meta-analyses of the trajectory phenotypes (contrast I meta N = 140,889, contrast II meta N = 202,116) in EAs, AAs, and HAs in the MVP. Meta-analyses were performed using inverse-variance mixed effects model implemented in METAL (v2011-03-25) with the test statistics and standard errors scheme⁷². We enabled the genomic control correction option on all GWAS results in the meta-analysis.

Downstream analysis for smoking GWASs in European Americans

Heritability estimation and enrichment analyses were performed for smoking initiation, smoking cessation, smoking trajectory contrast I (current vs. never), and smoking trajectory contrast II (current vs. mixed) in the MVP EA samples. We applied LD score regression (v1.0.0) to estimate the narrow-sense heritability due to additive genetic effects³⁵. We also estimated heritability enrichments to identify tissue and cell types that were the most relevant to smoking-related traits³⁶. We used 66 functional annotations from GenoSkyline-Plus (v1.0.0) including tissues and cell lines from blood, brain, lung, vascular, heart, thymus, spleen, muscle, gastrointestinal, pancreas, liver, fat, bone/connective, skin, breast, and ovary for the heritability enrichment analysis³⁷. Bonferroni correction was applied to the 66 enrichment tests for two smoking trajectory contrasts and two smoking status phenotypes, resulting in a significance cutoff of 0.05/66/4 = 1.9 × 10⁻⁴.

We performed functional gene mapping for two smoking trajectory contrasts in the MVP EA samples. Positional, eQTL, and chromatin interaction mapping was performed using the functional mapping and annotation tool FUMA (v1.3.6)⁷³. We used the default parameter settings for the identification of lead and candidate SNPs except for changing the maximum distance to 250 kilobases to merge LD blocks into a locus. The change was consistent with the distance parameter we used in previous genetic loci definition. Positional mapping was performed with the default parameter setting. eQTL mapping was restricted to 13 GTEx v8 brain tissues. We used the built-in adult cortex Hi–C data and enhancer/promoter annotations in 12 brain tissues from Roadmap epigenomes to perform chromatin interaction mapping. By default, we used FDR < 0.05 for the eQTL mapping and FDR < 1 × 10⁻⁶ for the chromatin interaction mapping. The functional annotation and gene mapping parameter setting and results are available on the FUMA public results section.

We used DAVID (v6.8) to perform pathway-enrichment analysis on GWAS variants with p values less than 1 × 10⁻⁵⁴¹. ANNOVAR was used to map variants to their nearest genes⁶⁹. Smoking trajectory contrast I (current vs. never) had 344 genes mapped from 6296 variants and smoking trajectory contrast II (current vs. mixed) had 130 genes mapped from 877 variants. Gene Ontology (GO) terms in three categories (GO: biological process, GO: cellular component, and GO: molecular function) were used for pathway-enrichment analyses and biological interpretation⁶⁹. From 550 pathway-enrichment tests, 321 and 229 GO terms contained genes that overlapped with smoking trajectory contrast I (current vs. never) and smoking trajectory contrast II (current vs. mixed), respectively. Bonferroni correction was applied to correct for 550 tests, yielding a significance cutoff of 0.05/550 = 9.1 × 10⁻⁵.

We used GNOVA to estimate genetic correlations between smoking trajectory phenotypes and other traits and diseases with publicly available summary statistics⁴². GWAS summary statistics for 524 complex traits and diseases were downloaded from LD Hub and recent publications on addictive behaviors⁴³. For the UKBB phenotypes integrated in LD Hub, we manually filtered out lifestyle and environmental traits with a keyword search for sport, activity, exercise, driving, transportation, travel, job, and employment. Bonferroni correction was applied to the 524 correlations and two smoking trajectory contrasts, yielding a significance cutoff of 0.05/524/2 = 4.8 × 10⁻⁵.

Bi-directional Mendelian randomization (MR) was performed on trait pairs identified with significant genetic correlations. Instrumental variables were selected as GWS SNPs for the exposure excluding those GWS for the outcome to satisfy the MR assumptions. Trait pairs with 3 or fewer instrumental variables were excluded from the analysis, leaving 638 trait pairs to be tested for causal relationships. We used four MR methods: IVW, WM, MR-Egger, and MR-PRESSO implemented in the MendelianRandomization (v0.4.2) and MR-PRESSO (v1.0) R packages^74,75,76,77. We tested the intercept term from the MR-Egger to detect potential horizontal pleiotropic effects. We also used MR-PRESSO to identify outliers of instrumental variables with pleiotropic effects. The significance cutoff for outlier identification was calculated as 0.05/number of instrumental variables. If outliers were identified, we reported causal estimates obtained by MR-PRESSO after removing the outliers. In the absence of outliers, the MR-PRESSO provided results identical to those from IVW. We used t-test statistics to derive p values from all four methods for comparability. We applied a Bonferroni correction on bi-directional MR tests on trait pairs with more than three instrumental variables and each was tested with four methods, resulting in a significance cutoff of 0.05/638/4 = 1.96 × 10⁻⁵.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The full summary-level association data from the meta-analysis for each of the smoking-related traits from this report are available through dbGaP accession number phs001672.v4.p1. 1000 Genomes Project reference panel can be downloaded from [ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/]. GWAS summary statistics used in the genetic correlation analysis were made publicly available by GSCAN, Social Science Genetic Association Consortium (SSGAC) [https://www.thessgac.org/data], and LD Hub.

Code availability

Code to perform analyses in this paper is available from the authors upon request (B.L.), or from the URLs associated with each software in the online methods section.

References

Li, M. D., Cheng, R., Ma, J. Z. & Swan, G. E. A meta-analysis of estimated genetic and environmental effects on smoking behavior in male and female adult twins. Addiction 98, 23–31 (2003).
Article PubMed Google Scholar
Maes, H. H. et al. A twin study of genetic and environmental influences on tobacco initiation, regular tobacco use and nicotine dependence. Psychol. Med. 34, 1251–1261 (2004).
Article PubMed Google Scholar
Tobacco, GeneticsC. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat. Genet. 42, 441–447 (2010).
Article CAS Google Scholar
Hancock, D. B., Markunas, C. A., Bierut, L. J. & Johnson, E. O. Human genetics of addiction: new insights and future directions. Curr. Psychiatry Rep. 20, 8 (2018).
Article PubMed PubMed Central Google Scholar
Liu, M. et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat. Genet. 51, 237–244 (2019).
Article CAS PubMed PubMed Central Google Scholar
Erzurumluoglu, A. M. et al. Meta-analysis of up to 622,409 individuals identifies 40 novel smoking behaviour associated genetic loci. Mol. Psychiatry 25, 2392–2409 (2020).
Article CAS PubMed Google Scholar
Hancock, D. B. et al. Genome-wide association study across European and African American ancestries identifies a SNP in DNMT3B contributing to nicotine dependence. Mol. Psychiatry 23, 1–9 (2018).
Article CAS Google Scholar
Siedlinski, M. et al. Genome-wide association study of smoking behaviours in patients with COPD. Thorax 66, 894–902 (2011).
Article PubMed Google Scholar
Thorgeirsson, T. E. et al. Sequence variants at CHRNB3-CHRNA6 and CYP2A6 affect smoking behavior. Nat. Genet. 42, 448–453 (2010).
Article CAS PubMed PubMed Central Google Scholar
Karlsson Linnér, R. et al. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat. Genet. 51, 245–257 (2019).
Article PubMed CAS Google Scholar
Yang, J. & Li, M. D. Converging findings from linkage and association analyses on susceptibility genes for smoking and other addictions. Mol. Psychiatry 21, 992–1008 (2016).
Article CAS PubMed Google Scholar
Bierut, L. J. et al. Novel genes identified in a high-density genome wide association study for nicotine dependence. Hum. Mol. Genet. 16, 24–35 (2007).
Article CAS PubMed Google Scholar
Wain, L. V. et al. Novel insights into the genetics of smoking behaviour, lung function, and chronic obstructive pulmonary disease (UK BiLEVE): a genetic association study in UK Biobank. Lancet Respir. Med. 3, 769–781 (2015).
Article PubMed PubMed Central Google Scholar
Liu, J. Z. et al. Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nat. Genet. 42, 436–440 (2010).
Article CAS PubMed PubMed Central Google Scholar
Rice, J. P. et al. CHRNB3 is more strongly associated with Fagerstrom test for cigarette dependence-based nicotine dependence than cigarettes per day: phenotype definition changes genome-wide association studies results. Addiction 107, 2019–2028 (2012).
Article PubMed PubMed Central Google Scholar
Connor Gorber, S., Schofield-Hurwitz, S., Hardt, J., Levasseur, G. & Tremblay, M. The accuracy of self-reported smoking: a systematic review of the relationship between self-reported and cotinine-assessed smoking status. Nicotine Tob. Res. 11, 12–24 (2009).
Article PubMed Google Scholar
Ware, J. J. et al. Genome-wide meta-analysis of cotinine levels in cigarette smokers identifies locus at 4q13.2. Sci. Rep. 6, 20092 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Munafo, M. R. et al. Association between genetic variants on chromosome 15q25 locus and objective measures of tobacco exposure. J. Natl. Cancer Inst. 104, 740–748 (2012).
Article CAS PubMed PubMed Central Google Scholar
Loukola, A. et al. A genome-wide association study of a biomarker of nicotine metabolism. PLoS Genet. 11, e1005498 (2015).
Article PubMed PubMed Central CAS Google Scholar
Baurley, J. W. et al. Genome-wide association of the laboratory-based nicotine metabolite ratio in three ancestries. Nicotine Tob. Res. 18, 1837–1844 (2016).
Article CAS PubMed PubMed Central Google Scholar
Lessov-Schlaggar, C. N., Kristjansson, S. D., Bucholz, K. K., Heath, A. C. & Madden, P. A. Genetic influences on developmental smoking trajectories. Addiction 107, 1696–1704 (2012).
Article PubMed PubMed Central Google Scholar
McGinnis, K. A. et al. Using DNA methylation to validate an electronic medical record phenotype for smoking. Addict. Biol. 24, 1056–1065 (2018).
Article PubMed PubMed Central CAS Google Scholar
Munafo, M. R., Timpson, N. J., David, S. P., Ebrahim, S. & Lawlor, D. A. Association of the DRD2 gene Taq1A polymorphism and smoking behavior: a meta-analysis and new data. Nicotine Tob. Res. 11, 64–76 (2009).
Article CAS PubMed PubMed Central Google Scholar
Hirvonen, K., Korhonen, T., Salomaa, V., Mannisto, S. & Kaprio, J. Association of the DBH polymorphism rs3025343 with smoking cessation in a large population-based sample. Nicotine Tob. Res. 19, 1112–1115 (2017).
Article CAS PubMed Google Scholar
McGuire, V. et al. Association of DRD2 and DRD3 polymorphisms with Parkinson’s disease in a multiethnic consortium. J. Neurol. Sci. 307, 22–29 (2011).
Article CAS PubMed PubMed Central Google Scholar
Ma, Y., Wang, M., Yuan, W., Su, K. & Li, M. D. The significant association of Taq1A genotypes in DRD2/ANKK1 with smoking cessation in a large-scale meta-analysis of Caucasian populations. Transl. Psychiatry 5, e686 (2015).
Article CAS PubMed PubMed Central Google Scholar
Yang, J. et al. The contribution of rare and common variants in 30 genes to risk nicotine dependence. Mol. Psychiatry 20, 1467–1478 (2015).
Article CAS PubMed Google Scholar
Gelernter, J. et al. Haplotype spanning TTC12 and ANKK1, flanked by the DRD2 and NCAM1 loci, is strongly associated to nicotine dependence in two distinct American populations. Hum. Mol. Genet. 15, 3498–3507 (2006).
Article CAS PubMed Google Scholar
Zhao, J. et al. A genome-wide profiling of brain DNA hydroxymethylation in Alzheimer’s disease. Alzheimers Dement. 13, 674–688 (2017).
Article PubMed PubMed Central Google Scholar
Krumm, N. et al. Excess of rare, inherited truncating mutations in autism. Nat. Genet. 47, 582–588 (2015).
Article CAS PubMed PubMed Central Google Scholar
Bustillo, J. R. et al. Risk-conferring glutamatergic genes and brain glutamate plus glutamine in schizophrenia. Front. Psychiatry 8, 79 (2017).
Article PubMed PubMed Central Google Scholar
Berrettini, W. H. & Doyle, G. A. The CHRNA5-A3-B4 gene cluster in nicotine addiction. Mol. Psychiatry 17, 856–866 (2012).
Article CAS PubMed Google Scholar
Weiss, R. B. et al. A candidate gene approach identifies the CHRNA5-A3-B4 region as a risk factor for age-dependent nicotine addiction. PLoS Genet. 4, e1000125 (2008).
Article PubMed PubMed Central CAS Google Scholar
Salloum, N. C. et al. From genes to treatments: a systematic review of the pharmacogenetics in smoking cessation. Pharmacogenomics 19, 861–871 (2018).
Article CAS PubMed PubMed Central Google Scholar
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Article CAS PubMed PubMed Central Google Scholar
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Article CAS PubMed PubMed Central Google Scholar
Lu, Q. et al. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer’s disease. PLoS Genet. 13, e1006933 (2017).
Article PubMed PubMed Central CAS Google Scholar
Volkow, N. D., Michaelides, M. & Baler, R. The neuroscience of drug reward and addiction. Physiological Rev. 99, 2115–2140 (2019).
Article CAS Google Scholar
Schultz, W. Reward functions of the basal ganglia. J. Neural Transm. 123, 679–693 (2016).
Article PubMed Google Scholar
Ekstrom, A. D. & Ranganath, C. Space, time, and episodic memory: The hippocampus is all over the cognitive map. Hippocampus 28, 680–687 (2018).
Article PubMed Google Scholar
Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
Article PubMed CAS Google Scholar
Lu, Q. et al. A powerful approach to estimating annotation-stratified genetic covariance via GWAS summary statistics. Am. J. Hum. Genet. 101, 939–964 (2017).
Article CAS PubMed PubMed Central Google Scholar
Zheng, J. et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279 (2017).
Article CAS PubMed Google Scholar
Marg, A. et al. Neurotractin, a novel neurite outgrowth-promoting Ig-like protein that interacts with CEPU-1 and LAMP. J. Cell Biol. 145, 865–876 (1999).
Article CAS PubMed PubMed Central Google Scholar
Noh, K. et al. Negr1 controls adult hippocampal neurogenesis and affective behaviors. Mol. Psychiatry 24, 1189–1205 (2019).
Article CAS PubMed Google Scholar
Hyde, C. L. et al. Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nat. Genet. 48, 1031–1036 (2016).
Article CAS PubMed PubMed Central Google Scholar
Karis, K. et al. Altered expression profile of IgLON family of neural cell adhesion molecules in the dorsolateral prefrontal cortex of schizophrenic patients. Front. Mol. Neurosci. 11, 8 (2018).
Article PubMed PubMed Central CAS Google Scholar
Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
Article CAS PubMed PubMed Central Google Scholar
Otowa, T. et al. Meta-analysis of genome-wide association studies of anxiety disorders. Mol. Psychiatry 21, 1391–1399 (2016).
Article CAS PubMed PubMed Central Google Scholar
Neves-Pereira, M. et al. The brain-derived neurotrophic factor gene confers susceptibility to bipolar disorder: evidence from a family-based association study. Am. J. Hum. Genet. 71, 651–655 (2002).
Article CAS PubMed PubMed Central Google Scholar
Haerian, B. S. BDNF rs6265 polymorphism and drug addiction: a systematic review and meta-analysis. Pharmacogenomics 14, 2055–2065 (2013).
Article CAS PubMed Google Scholar
Nussbaum, J. et al. Significant association of the neurexin-1 gene (NRXN1) with nicotine dependence in European- and African-American smokers. Hum. Mol. Genet. 17, 1569–1577 (2008).
Article CAS PubMed PubMed Central Google Scholar
Tong, X. J. et al. Retrograde synaptic inhibition is mediated by alpha-Neurexin binding to the alpha2delta subunits of N-Type calcium channels. Neuron 95, 326–340.e5 (2017).
Article CAS PubMed PubMed Central Google Scholar
Marshall, C. R. et al. Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321 subjects. Nat. Genet. 49, 27–35 (2017).
Article CAS PubMed Google Scholar
Al Shehhi, M. et al. NRXN1 deletion syndrome; phenotypic and penetrance data from 34 families. Eur. J. Med. Genet. 62, 204–209 (2019).
Article PubMed Google Scholar
Levinson, D. F. et al. Genome-wide association study of multiplex schizophrenia pedigrees. Am. J. Psychiatry 169, 963–973 (2012).
Article PubMed PubMed Central Google Scholar
Bidwell, L. C. et al. An initial investigation of associations between dopamine-linked genetic variation and smoking motives in African Americans. Pharmacol. Biochem. Behav. 138, 104–110 (2015).
Article CAS PubMed PubMed Central Google Scholar
Holmes, J. B., Speed, D. & Balding, D. J. Summary statistic analyses can mistake confounding bias for heritability. Genet. Epidemiol. 43, 930–940 (2019).
Article PubMed Google Scholar
Speed, D. & Balding, D. J. SumHer better estimates the SNP heritability of complex traits from summary statistics. Nat. Genet. 51, 277–284 (2019).
Article CAS PubMed Google Scholar
Speed D., Holmes J. & Balding D. Evaluating and improving heritability models using summary statistics. Nat. Genet. 52, 458–462 (2020).
Article CAS PubMed Google Scholar
McGinnis, K. A. et al. Validating smoking data from the Veteran’s Affairs Health Factors dataset, an electronic data source. Nicotine Tob. Res. 13, 1233–1239 (2011).
Article PubMed PubMed Central Google Scholar
Jones, A. S. et al. Multi-dimensional risk factor patterns associated with non-use of highly active antiretroviral therapy among human immunodeficiency virus-infected women. Women’s Health Issues 20, 335–342 (2010).
Article PubMed PubMed Central Google Scholar
Marshall, B. D. et al. Drinking trajectories among HIV-infected men who have sex with men: a cohort study of United States veterans. Drug Alcohol Depend. 148, 69–76 (2015).
Article PubMed Google Scholar
Nagin, D. S., Jones, B. L., Passos, V. L. & Tremblay, R. E. Group-based multi-trajectory modeling. Stat. Methods Med. Res. 27, 2015–2023 (2018).
Article MathSciNet PubMed Google Scholar
Gaziano, J. M. et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 70, 214–223 (2016).
Article PubMed Google Scholar
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Article CAS PubMed PubMed Central Google Scholar
Schizophrenia Working Group of the Psychiatric Genomics Consortium Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Article ADS PubMed Central CAS Google Scholar
Pasman, J. A. et al. GWAS of lifetime cannabis use reveals new risk loci, genetic overlap with psychiatric traits, and a causal influence of schizophrenia. Nat. Neurosci. 21, 1161–1170 (2018).
Article CAS PubMed PubMed Central Google Scholar
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Article PubMed PubMed Central CAS Google Scholar
Welter, D. et al. The NHGRI GWAS catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
Article CAS PubMed Google Scholar
Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
Article CAS PubMed PubMed Central Google Scholar
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Article CAS PubMed PubMed Central Google Scholar
Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
Article ADS PubMed PubMed Central CAS Google Scholar
Bowden, J. et al. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat. Med. 36, 1783–1802 (2017).
Article ADS MathSciNet PubMed PubMed Central Google Scholar
Bowden, J., Davey Smith, G., Haycock, P. C. & Burgess, S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40, 304–314 (2016).
Article PubMed PubMed Central Google Scholar
Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).
Article PubMed PubMed Central Google Scholar
Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This research is based on data from the Million Veteran Program, Office of Research and Development, Veterans Health Administration, and was supported by award #MVP004. This publication does not represent the views of the Department of Veterans Affairs or the United States Government. Our study was supported by National Institute on Drug Abuse grants R01DA038632, R01DA047063, and R01DA047820. We thank all participants in the MVP who allowed access to their electronic health records and provided blood samples for genomic analyses. We thank all authors and consortia for making their summary statistics publicly available to the research community.

Author information

These authors contributed equally: Ke Xu, Boyang Li.

Authors and Affiliations

Yale School of Medicine, New Haven, CT, 06511, USA
Ke Xu, Hang Zhou, William C. Becker, Joel Gelernter, Hongyu Zhao & Amy C. Justice
VA Connecticut Healthcare System, West Haven, CT, 06516, USA
Ke Xu, Boyang Li, Kathleen A. McGinnis, Cecilia Dao, Hang Zhou, William C. Becker, Joel Gelernter & Amy C. Justice
Yale School of Public Health, New Haven, CT, 06511, USA
Boyang Li, Cecilia Dao, Ning Sun & Hongyu Zhao
University of Kentucky College of Public Health, Lexington, KY, 40536, USA
Rachel Vickers-Smith
University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
Rachel L. Kember & Henry R. Kranzler
Crescenz Veterans Affairs Medical Center, Philadelphia, PA, 19104, USA
Rachel L. Kember & Henry R. Kranzler

Authors

Ke Xu
View author publications
You can also search for this author in PubMed Google Scholar
Boyang Li
View author publications
You can also search for this author in PubMed Google Scholar
Kathleen A. McGinnis
View author publications
You can also search for this author in PubMed Google Scholar
Rachel Vickers-Smith
View author publications
You can also search for this author in PubMed Google Scholar
Cecilia Dao
View author publications
You can also search for this author in PubMed Google Scholar
Ning Sun
View author publications
You can also search for this author in PubMed Google Scholar
Rachel L. Kember
View author publications
You can also search for this author in PubMed Google Scholar
Hang Zhou
View author publications
You can also search for this author in PubMed Google Scholar
William C. Becker
View author publications
You can also search for this author in PubMed Google Scholar
Joel Gelernter
View author publications
You can also search for this author in PubMed Google Scholar
Henry R. Kranzler
View author publications
You can also search for this author in PubMed Google Scholar
Hongyu Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Amy C. Justice
View author publications
You can also search for this author in PubMed Google Scholar

Consortia

VA Million Veteran Program

Ke Xu
, Boyang Li
, Kathleen A. McGinnis
, Rachel Vickers-Smith
, Cecilia Dao
, Ning Sun
, Rachel L. Kember
, Hang Zhou
, William C. Becker
, Joel Gelernter
, Henry R. Kranzler
, Hongyu Zhao
& Amy C. Justice

Contributions

K.X., B.L., A.C.J., and H. Zhao designed the study. B.L. performed the association and downstream analyses. K.X. advised on the biological interpretation of the association results. K.A.M, R.V.S., and C.D. identified smoking trajectory phenotypes. N.S. performed the population clustering based on genotype information. The Million Veterans Program provided data and computational platform to perform the analyses. K.X. and B.L. prepared the first version of the paper. K.X., B.L., K.A.M, R.V.S., C.D., N.S., R.L.K., H. Zhou., W.C.B., J.G., H.R.K, H. Zhao, and A.C.J contributed to paper editing and all authors approved the paper.

Corresponding author

Correspondence to Amy C. Justice.

Ethics declarations

Competing interests

H.R.K. is a member of the American Society of Clinical Psychopharmacology’s Alcohol Clinical Trials Initiative, which was supported in the last 3 years by AbbVie, Alkermes, Ethypharm, Indivior, Lilly, Lundbeck, Otsuka, Pfizer, Arbor, and Amygdala Neurosciences. H.R.K. and J.G. are named as inventors on PCT patent application #15/878,640 entitled: “Genotype-guided dosing of opioid agonists,” filed January 24, 2018. The other authors have no competing interests to declare.

Additional information

Peer review information Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Descriptions of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Xu, K., Li, B., McGinnis, K.A. et al. Genome-wide association study of smoking trajectory and meta-analysis of smoking status in 842,000 individuals. Nat Commun 11, 5302 (2020). https://doi.org/10.1038/s41467-020-18489-3

Download citation

Received: 05 April 2019
Accepted: 20 August 2020
Published: 20 October 2020
DOI: https://doi.org/10.1038/s41467-020-18489-3

This article is cited by

Exploring the causal association between rheumatoid arthritis and the risk of cervical cancer: a two-sample Mendelian randomization study
- Minxian Xu
- Huan Chen
- Qing Li
Arthritis Research & Therapy (2024)
Deep sequencing of candidate genes identified 14 variants associated with smoking abstinence in an ethnically diverse sample
- Paul M. Cinciripini
- David W. Wetter
- Sanjay Shete
Scientific Reports (2024)
Metabolic profiling of smoking, associations with type 2 diabetes and interaction with genetic susceptibility
- Yuxia Wei
- Sara Hägg
- Sofia Carlsson
European Journal of Epidemiology (2024)
Multi-ancestry meta-analysis of tobacco use disorder identifies 461 potential risk genes and reveals associations with multiple health outcomes
- Sylvanus Toikumo
- Mariela V. Jennings
- Sandra Sanchez-Roige
Nature Human Behaviour (2024)
Multi-trait genome-wide association analyses leveraging alcohol use disorder findings identify novel loci for smoking behaviors in the Million Veteran Program
- Youshu Cheng
- Cecilia Dao
- Ke Xu
Translational Psychiatry (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.