GWAS of smoking behaviour in 165,436 Japanese people reveals seven new loci and shared genetic architecture


Cigarette smoking is a risk factor for a wide range of human diseases¹. To investigate the genetic components associated with smoking behaviours in the Japanese population, we conducted a genome-wide association study of four smoking-related traits using up to 165,436 individuals. In total, we identified seven new loci, including three loci associated with the number of cigarettes per day (EPHX2–CLU, RET and CUX2–ALDH2), three loci associated with smoking initiation (DLC1, CXCL12–TMEM72-AS1 and GALR1–SALL3) and LINC01793–MIR4432HG, associated with the age of smoking initiation. Of these, three loci (LINC01793–MIR4432HG, CXCL12–TMEM72-AS1 and GALR1–SALL3) were found by conducting an additional sex-stratified genome-wide association study. This additional analysis showed heterogeneity of effects between sexes. The cross-sex linkage disequilibrium score regression²,³ analysis also indicated that the genetic component of smoking initiation was significantly different between the sexes. Cross-trait linkage disequilibrium score regression analysis and trait-relevant tissue analysis showed that the number of cigarettes per day has a specific genetic background distinct from those of the other three smoking behaviours. We also report 11 diseases that share a genetic basis with smoking behaviours. Although these results should be interpreted with caution owing to the lack of replication samples, our findings characterize the genetic architecture of smoking behaviours. Further studies in East Asian populations are warranted to confirm our findings.


Figure 1: GWAS results for smoking behaviours.
Figure 2: Genetic correlation of smoking behaviours to various diseases.
Figure 3: Trait-relevant cell types of smoking behaviours.

Data availability

GWAS summary statistics of the smoking behaviours are publicly available at our website (JENGER) and the National Bioscience Database Center (NBDC) Human Database (research ID: hum0014) as open data without any access restrictions. The accession numbers for each phenotype in the NBDC Human Database are as follows: age of smoking initiation: hum0014.v14.asi.v1; CPD: hum0014.v14.cpd.v1; smoking initiation: hum0014.v14.ens.v1; smoking cessation: hum0014.v14.fcs.v1. GWAS genotype data from the participants were deposited at the NBDC Human Database (research ID: hum0014).


References

  1. US Department of Health and Human Services. The Health Consequences of Smoking — 50 Years of Progress: A Report of the Surgeon General (CDC, 2014).
  2. Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
  3. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
  4. Global Health Estimates 2015: Deaths by Cause, Age, Sex, by Country and by Region, 2000–2015 (WHO, 2016).
  5. WHO Report on the Global Tobacco Epidemic, 2017: Monitoring Tobacco Use and Prevention Policies (WHO, 2017).
  6. Summary Results of the National Health and Nutrition Survey Japan (National Institute of Health and Nutrition, 2016).
  7. Jamal, A. et al. Current cigarette smoking among adults — United States, 2005–2015. MMWR Morb. Mortal. Wkly Rep. 65, 1205–1211 (2016).
  8. Boardman, J. D., Blalock, C. L. & Pampel, F. C. Trends in the genetic influences on smoking. J. Health Soc. Behav. 51, 108–123 (2010).
  9. Yang, J. & Li, M. D. Converging findings from linkage and association analyses on susceptibility genes for smoking and other addictions. Mol. Psychiatry 21, 992–1008 (2016).
  10. The Tobacco and Genetics Consortium. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat. Genet. 42, 441–447 (2010).
  11. Berrettini, W. et al. Alpha-5/alpha-3 nicotinic receptor subunit alleles increase risk for heavy smoking. Mol. Psychiatry 13, 368–373 (2008).
  12. Kumasaka, N. et al. Haplotypes with copy number and single nucleotide polymorphisms in CYP2A6 locus are associated with smoking quantity in a Japanese population. PLoS One 7, e44507 (2012).
  13. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
  14. Zheng, J. et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279 (2017).
  15. Thorgeirsson, T. E. et al. Sequence variants at CHRNB3–CHRNA6 and CYP2A6 affect smoking behavior. Nat. Genet. 42, 448–453 (2010).
  16. Liu, J. Z. et al. Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nat. Genet. 42, 436–440 (2010).
  17. Nagai, A. et al. Overview of the BioBank Japan Project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).
  18. Hirata, M. et al. Cross-sectional analysis of BioBank Japan clinical data: a large cohort of 200,000 patients with 47 common diseases. J. Epidemiol. 27, S9–S21 (2017).
  19. Kanai, M., Tanaka, T. & Okada, Y. Empirical estimation of genome-wide significance thresholds based on the 1000 Genomes Project data set. J. Hum. Genet. 61, 861–866 (2016).
  20. Masaoka, H. et al. Aldehyde dehydrogenase 2 polymorphism is a predictor of smoking cessation. Nicotine Tob. Res. 19, 1087–1094 (2017).
  21. Masaoka, H. et al. Combination of ALDH2 and ADH1B polymorphisms is associated with smoking initiation: a large-scale cross-sectional study in a Japanese population. Drug Alcohol Depend. 173, 85–91 (2017).
  22. McKay, J. D. et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat. Genet. 49, 1126–1132 (2017).
  23. McVean, G. Linkage disequilibrium. in Handbook of Statistical Genetics. 3rd ed. (eds. Balding, D. J., Bishop, M. & Cannings, C.) 914–918 (Wiley, 2007).
  24. Patel, Y. M. et al. Novel association of genetic markers affecting CYP2A6 activity and lung cancer risk. Cancer Res. 76, 5768–5776 (2016).
  25. Gelernter, J. et al. Genome-wide association study of nicotine dependence in American populations: identification of novel risk loci in both African-Americans and European-Americans. Biol. Psychiatry 77, 493–503 (2015).
  26. Zhang, J. et al. Genetic pleiotropy between nicotine dependence and respiratory outcomes. Sci. Rep. 7, 16907 (2017).
  27. Wang, T. et al. Pleiotropy of genetic variants on obesity and smoking phenotypes: results from the Oncoarray Project of The International Lung Cancer Consortium. PLoS One 12, e0185660 (2017).
  28. Akiyama, M. et al. Genome-wide association study identifies 112 new loci for body mass index in the Japanese population. Nat. Genet. 49, 1458–1467 (2017).
  29. Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390–400 (2018).
  30. Yang, J. et al. Genome-wide genetic homogeneity between sexes and populations for human height and body mass index. Hum. Mol. Genet. 24, 7445–7449 (2015).
  31. Li, M. D., Cheng, R., Ma, J. Z. & Swan, G. E. A meta-analysis of estimated genetic and environmental effects on smoking behavior in male and female adult twins. Addiction 98, 23–31 (2003).
  32. Lindström, S. et al. Quantifying the genetic correlation between multiple cancer types. Cancer Epidemiol. Biomarkers Prev. 26, 1427–1435 (2017).
  33. Nakajima, M. et al. A genome-wide association study identifies susceptibility loci for ossification of the posterior longitudinal ligament of the spine. Nat. Genet. 46, 1012–1016 (2014).
  34. How Tobacco Smoke Causes Disease: The Biology and Behavioral Basis for Smoking-Attributable Disease: A Report of the Surgeon General (CDC, 2010).
  35. Weiss, N. S. Cigarette smoking and arteriosclerosis obliterans: an epidemiologic approach. Am. J. Epidemiol. 95, 17–25 (1972).
  36. Powell, J. T. et al. Risk factors associated with the development of peripheral arterial disease in smokers: a case–control study. Atherosclerosis 129, 41–48 (1997).
  37. Price, J. et al. Relationship between smoking and cardiovascular risk factors in the development of peripheral arterial disease and coronary artery disease: Edinburgh Artery Study. Eur. Heart J. 20, 344–353 (1999).
  38. Rohrmann, S. et al. Smoking and the risk of prostate cancer in the European Prospective Investigation into Cancer and Nutrition. Br. J. Cancer 108, 708–714 (2013).
  39. Wang, S. et al. Significant associations of CHRNA2 and CHRNA6 with nicotine dependence in European American and African American populations. Hum. Genet. 133, 575–586 (2014).
  40. Tsai, Y.-W., Tsai, T.-I., Yang, C.-L. & Kuo, K. N. Gender differences in smoking behaviors in an Asian population. J. Womens Health (Larchmt) 17, 971–978 (2008).
  41. Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
  42. Hoffmann, T. J. et al. A large multiethnic genome-wide association study of adult body mass index identifies novel loci. Genetics 210, 499–515 (2018).
  43. Berndt, S. I. et al. Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture. Nat. Genet. 45, 501–512 (2013).
  44. Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 518, 187–196 (2015).
  45. The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
  46. Li, Y., Willer, C. J., Ding, J., Scheet, P. & Abecasis, G. R. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol. 34, 816–834 (2010).
  47. Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
  48. Abecasis, G. R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
  49. Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
  50. Aulchenko, Y. S., Struchalin, M. V. & van Duijn, C. M. ProbABEL package for genome-wide association analysis of imputed data. BMC Bioinformatics 11, 134 (2010).
  51. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
  52. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
  53. Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) Project. Nat. Genet. 45, 580–585 (2013).
  54. Lamparter, D. et al. Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLoS Comput. Biol. 12, e1004714 (2016).
  55. Ogata, H. et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 27, 29–34 (1999).
  56. Kanehisa, M. et al. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 42, D199–D205 (2014).
  57. Croft, D. et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 39, D691–D697 (2011).
  58. Nishimura, D. BioCarta. Biotech Softw. Internet Rep. 2, 117–120 (2001).



We acknowledge all patients who participated in the study. We thank the staff of the BBJ for collecting and managing clinical information and samples. We also acknowledge the contributions of the Tohoku Medical Megabank Project, the Japan Public Health Center-based Prospective (JPHC) Study, the Japan Multi-Institutional Collaborative Cohort (J-MICC) Study, and the Genetic Study Group of Investigation Committee on Ossification of the Spinal Ligaments for the case–control studies used in this study. This research was supported by the Tailor-Made Medical Treatment Program (the BioBank Japan Project) of the Ministry of Education, Culture, Sports, Science, and Technology (MEXT), and the Japan Agency for Medical Research and Development (AMED) (grant ID: JP17km0305002). N.I. and M.I. were funded by the AMED under JP18dm0107097, JP18km0405201 and JP18km0405028. S.I. was funded by the AMED under JP18ek0109223 and JP18bm0804006. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information




N.M., M.A. and Y.K. contributed to the study concept and design. K.M., M.H. and M.Kubo collected and managed the BBJ samples. Y.M. and M.Kubo performed the genotyping. N.M., M.A., K.I., M.Kanai and A.T. performed the statistical analysis. S.I., M.I. and N.I. contributed to data acquisition. N.M. drafted the primary manuscript along with M.A., Y.O. and Y.K. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Yoichiro Kamatani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figures 1–7, Supplementary Tables 1–6 and 9–13, and Supplementary References.

Reporting Summary

Supplementary Tables

Supplementary Tables 7, 8 and 14–16

About this article

Cite this article

Matoba, N., Akiyama, M., Ishigaki, K. et al. GWAS of smoking behaviour in 165,436 Japanese people reveals seven new loci and shared genetic architecture. Nat Hum Behav 3, 471–477 (2019).
