Large-scale genome-wide association studies (GWASs) have identified risk genes for many complex human diseases and traits, including psychiatric disorders such as schizophrenia and nicotine dependence (ND)1,2,3. These GWASs also show that many human diseases and traits are polygenic in nature and that the contribution of individual genes is limited4,5,6. Many of these studies have been deposited in the database of Genotypes and Phenotypes (dbGaP) and are available for secondary analyses. These datasets provide an opportunity to examine the genetic relationship between correlated traits, and to discover and identify risk genes shared across these traits.

Pleiotropy is a phenomenon in which a single locus affects multiple traits7,8. It accounts for at least a part of the genetic mechanism of many correlated human behaviors and diseases. Pleiotropy can take two forms: either a single process leading to a cascade of downstream effects (sometimes described as “mediated pleiotropy”), or a single locus influencing multiple traits (sometimes described as “biological pleiotropy”)9. Schizophrenia is highly comorbid with cigarette smoking10. However, the underlying biology of this comorbidity is not well understood11. Several hypotheses have been proposed. The self-medication hypothesis postulates that schizophrenia patients smoke to reduce symptoms and antipsychotic-induced side effects and to improve their attention and working memory12. Alternatively, schizophrenia and ND could share some genetic liability (i.e., biological pleiotropy)13, which is supported by recent studies of individual genes14,15,16,17,18,19,20,21. A third possibility is that smoking may be causal to schizophrenia (i.e., mediated pleiotropy)22. To explore the genetic relationship between schizophrenia and ND, we obtained the GWAS summary statistics from the Psychiatric Genomics Consortium (PGC) schizophrenia analyses and ND-related traits from our unpublished studies, and conducted polygenic analyses. Under the hypothesis of biological pleiotropy, we expect that genetic risk scores of schizophrenia and ND-related traits predict each other, whereas the self-medication hypothesis predicts unidirectional (schizophrenia to ND traits) prediction. In this article, we report the findings from these analyses.


Nicotine dependence and schizophrenia share genetic liability

In these analyses, we calculated genetic risk scores for schizophrenia (supplementary Figure 1A) and tested whether the risk scores predicted FTND and CPD. The results are summarized in Table 1. Schizophrenia risk scores predicted FTND score and CPD at the thresholds of P ≤ 5 × 10−3, 5 × 10−2 and 5 × 10−1. The correlation coefficients at these thresholds were all positive, suggesting that genetic liability to schizophrenia was positively associated with cigarette smoking, consistent with the well-known comorbidity between schizophrenia and ND. However, schizophrenia risk scores explained only a very small fraction of the variance in the FTND and CPD traits.

Table 1 Schizophrenia risk score prediction of FTND and CPD.

FTND and COT risk scores (supplementary Figure 1B,C) were calculated for the subjects of the phase I PGC schizophrenia GWAS samples23 using the summary statistics from the FTND (n = 17,781) and COT (n = 4,548) GWAS meta-analyses. We then evaluated whether the genetic risk scores of COT and FTND could predict the schizophrenia diagnosis using logistic regression. The results are presented in Table 2. The COT risk scores calculated at the P-value thresholds of 5 × 10−5, 5 × 10−4 and 5 × 10−3 predicted schizophrenia diagnosis, but FTND risk scores failed to do so. For the P-value thresholds at which the COT risk scores predicted schizophrenia diagnosis, the beta coefficients were also positive, again confirming the positive phenotypic correlation between ND and schizophrenia.

Table 2 Genetic risk score to nicotine dependence prediction of schizophrenia diagnosis*.

Identification of shared variants between ND and schizophrenia

Our reciprocal polygenic analyses suggested that there was some shared genetic liability between schizophrenia and ND as defined by the FTND and COT traits. We then proceeded to identify the variants associated with both schizophrenia and ND traits. We computed joint P-values for each marker using the summary statistics from the schizophrenia and COT/FTND/TFC meta-analyses, and assigned a q-value to each of the joint P-values using an FDR method24,25. Table 3 lists the loci identified by the joint analyses with q-values ≤ 0.05. From the joint analyses between schizophrenia and COT, 11 loci reached genome-wide significance for association with both COT and schizophrenia, of which 2 loci had no known genes nearby and 6 were spliced ESTs or long non-coding RNAs. In the analyses between schizophrenia and FTND, 10 loci were identified, and 3 of them were ESTs or non-coding RNAs. The joint analyses between schizophrenia and TFC yielded 15 significant loci. The CHRNA5-CHRNA3-CHRNB4 locus was the only one identified by all three smoking traits. In addition to some genes known to be associated with schizophrenia (HLA-B and MAD1L1), we also identified novel non-coding RNAs and RNA binding protein genes (DA376252, BX089737, LOC101927273, LINC01029, LOC101928622, HY157071, DA902558, RBFOX1 and TINCR), post-translational modification genes (MANBA, UBE2D3 and RANGAP1) and energy production genes (XYLB, MTRF1 and ENOX1).

Table 3 Joint testing of association with schizophrenia and smoking traits.

Pathway enrichment and network interaction analyses

We further explored the pathways shared by schizophrenia and ND by selecting all markers with q-values less than 0.16 from the joint analyses between schizophrenia and smoking traits. After mapping the markers to genes, the genes showing potential association with schizophrenia and COT/FTND/TFC were pooled to search for pathways enriched in both conditions. In these analyses, we selected only the genes identified by at least 2 of the 3 smoking traits, yielding a total of 146 genes. After filtering out the human leukocyte antigen genes (HLA-B, HLA-C, HLA-DOA, HLA-DQA1, HLA-DQB1, HLA-DRB1, and HLA-G) due to their strong linkage disequilibrium26, we used the remaining 139 genes in pathway analyses.
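The gene selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: the function name `select_shared_genes` and the toy gene lists are hypothetical, and only the "at least 2 of 3 traits" rule and the list of excluded HLA genes are taken from the text.

```python
from collections import Counter

# HLA genes excluded in the text because of their strong linkage disequilibrium.
HLA_GENES = {"HLA-B", "HLA-C", "HLA-DOA", "HLA-DQA1", "HLA-DQB1", "HLA-DRB1", "HLA-G"}

def select_shared_genes(genes_by_trait, min_traits=2, excluded=HLA_GENES):
    """Keep genes found for at least `min_traits` traits, then drop excluded genes."""
    # Count each gene once per trait, even if several markers map to it.
    counts = Counter(g for genes in genes_by_trait.values() for g in set(genes))
    kept = {g for g, n in counts.items() if n >= min_traits}
    return sorted(kept - set(excluded))

# Toy input (hypothetical gene-to-trait assignments, for illustration only).
genes_by_trait = {
    "COT": ["GENE_A", "HLA-B", "GENE_B"],
    "FTND": ["GENE_A", "HLA-B", "GENE_C"],
    "TFC": ["GENE_A", "GENE_B", "GENE_D"],
}
shared = select_shared_genes(genes_by_trait)
print(shared)
```

In this toy input, GENE_A and GENE_B pass the 2-of-3 rule, HLA-B passes it but is then filtered out, and GENE_C and GENE_D are dropped because each is supported by a single trait.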

Our analyses identified 16 unique pathways that were shared between schizophrenia and ND (Table 4). The most notable pathways were Calcium Signaling, Long-Term Potentiation, Neuroactive Ligand-Receptor Interaction, Phosphatidylinositol Signaling, Cell Adhesion Molecules, and Regulation of Actin Cytoskeleton. Some of these pathways (Calcium Signaling, Long-Term Potentiation, Cell Adhesion Molecules, and Regulation of Actin Cytoskeleton) had been reported to be involved in schizophrenia27,28,29,30,31,32; others (Cell Adhesion Molecules and Neuroactive Ligand-Receptor Interaction) had been implicated in ND33,34. We found that these pathways were enriched in the genes associated with both ND and schizophrenia. Additionally, pathways involved in cardiomyopathy, GnRH signaling, gastric acid secretion and Alzheimer’s disease were also found to be shared between schizophrenia and ND. In the pathway network interaction analyses, we found a network of crosstalk between pathways (Fig. 1), with the Long-Term Potentiation pathway located at the center of these interactions.

Table 4 Pathways enriched in schizophrenia and smoking traits.
Figure 1: Pathway crosstalk network.

The size of each node is proportional to the P-value of the pathway enrichment test. The thickness of each edge is proportional to the P-value of the pathway crosstalk.


It is well known in psychiatric clinics that a large proportion of schizophrenia patients smoke cigarettes and smoke heavily13. The dominant hypothesis to explain the comorbidity is self-medication12, i.e., that schizophrenia patients smoke to ameliorate impairments in cognitive function and suppress psychotic symptoms. Another hypothesis contends that schizophrenia and ND share some genetic liability, and the high prevalence rate of cigarette smoking in schizophrenia patients is a manifestation that is partially due to the common liability13. A third possibility is that smoking may be a risk factor for the development of schizophrenia, given that smoking initiation typically predates the onset of schizophrenia22. These three hypotheses are not mutually exclusive, and all three may contribute to the observed co-occurrence of schizophrenia and smoking.

Previous studies examining this issue have largely focused on individual functions/symptoms or genes using relatively small sample sizes. Here we took a systematic approach, and examined the entire genome using large GWAS datasets and multiple traits. We observed different patterns between the reciprocal polygenic analyses (comparing Tables 1 and 2). When we used the genetic risk scores of schizophrenia to predict ND traits, the association was evident at P-value thresholds ≥ 5 × 10−3, and the association strength increased as the P-value threshold became larger (Table 1). Given that the PGC schizophrenia GWAS did not control for smoking status and quantity, and there was a large difference in smoking prevalence between schizophrenia patients and controls (on average, 65% or more of schizophrenia patients smoke, and about 20% of people smoke in the general population), we would expect the PGC schizophrenia GWAS to identify top candidates for ND-related traits. But this was not the case. The top-ranked candidates (i.e., those with P-values ≤ 5 × 10−5) from the PGC schizophrenia meta-analysis1 did not predict ND-related traits. A likely explanation for these results is that genes most strongly associated with schizophrenia do not directly contribute to the smoking behaviors in schizophrenia patients. In other words, schizophrenia patients smoke because they want to improve their cognitive functions and to suppress psychotic symptoms, not because they are addicted to nicotine as regular smokers in the general population are. These results are consistent with the self-medication hypothesis.

In contrast, when we used COT risk scores to predict schizophrenia diagnosis, we found that smaller P-value thresholds produced stronger signals (Table 2), indicating that genes most strongly associated with ND were associated with schizophrenia. The results imply that either ND and schizophrenia share some genetic liability, or ND is a risk factor for schizophrenia. These findings fit the predictions of both the shared liability hypothesis and the hypothesis that smoking is a causal risk factor for schizophrenia. Of note, these two explanations are not mutually exclusive. However, without smoking data for the patients, we are unable to test the latter possibility (e.g., by stratifying our sample on smoking status).

Assuming biological pleiotropy to be the underlying mechanism, we devised a test to discover the variants shared between ND and schizophrenia. Using this approach, we identified multiple genes associated with both conditions (Table 3). Of these genes, the CHRNA5-CHRNA3-CHRNB4 cluster had been found to be associated with CPD2,3 and other ND-related traits, and it was reported to be associated with schizophrenia in the latest schizophrenia GWAS meta-analysis from PGC1. Several of the genes had been reported to be associated with schizophrenia (HLA-B and MAD1L1)28 and epilepsy (KCNT1, PRICKLE2 and RBFOX1)35,36,37, suggesting that they might play a role in smoking behaviors as well. Our analyses also identified some novel genes shared between schizophrenia and ND, including a group of long non-coding RNAs and RNA binding protein genes (DA376252, BX089737, LOC101927273, LINC01029, LOC101928622, HY157071, DA902558, RBFOX1 and TINCR), a group of post-translational modification genes (MANBA, UBE2D3, and RANGAP1) and a group of energy production genes (XYLB, MTRF1 and ENOX1). Because long non-coding RNAs have been suggested to play a role in schizophrenia38,39, the identification of multiple long non-coding RNAs was intriguing.

Phenotype comorbidity is common in complex diseases and traits7,8. Pleiotropy, or shared genetic liability, may be an underlying mechanism of these comorbidities. Under this condition, different approaches have been developed to identify genes shared by the comorbid conditions40,41, and these approaches seem more powerful than standard GWAS8,42. Another advantage of these methods is that they can use the large number of GWAS datasets produced by single-phenotype/trait analyses. The approach we used to identify these shared loci is conservative. In our analyses, we excluded all markers reaching genome-wide significance from both schizophrenia and smoking traits and required a balanced contribution from both traits. Under this condition, if a marker reached genome-wide significance for schizophrenia but had a modest association with ND traits (say, P-values between 10−4 and 5 × 10−6), it was excluded from our joint testing. Similarly, some markers would be excluded if they reached genome-wide significance in ND traits. Because the GWASs used have different sample sizes, and therefore vary in their statistical power, it is inevitable that we would miss some markers from the more powerful GWAS when we required balanced summary statistics in the joint testing.

Our pathway analyses identified multiple pathways shared by schizophrenia and ND. The most significant pathways were Calcium Signaling, Long-Term Potentiation and Neuroactive Ligand-Receptor Interaction. These pathways are involved in neurotransmitter transduction and communication between neurons, and they are essential for cognitive functions. These pathways have been shown to be involved in schizophrenia28,43,44 and ND45,46. The Cell Adhesion Molecules and Regulation of Actin Cytoskeleton pathways have also been reported in schizophrenia31,47,48,49 and ND45,50,51. Thus, our results are consistent with these studies. It is worth noting that the cardiomyopathy pathways were identified in our analyses and that, in a previous study, we found that CMYA5 was associated with schizophrenia52. Another gene, NDUFV2, which is causative for hypertrophic cardiomyopathy53,54, the genetic form of cardiomyopathy, was also found to be associated with schizophrenia55,56,57. Pathway crosstalk analyses showed that many of these pathways interact with each other and together form an interlinked network with the Long-Term Potentiation pathway at the center of these interactions. In animal studies, nicotine alters long-term potentiation58,59,60 and learning and memory61. In humans, smoking may alleviate cognitive impairment62, and both nicotine withdrawal and schizophrenia are associated with cognitive impairments63,64. Thus, compensating for cognitive impairments may be a common motivational factor between regular smokers and schizophrenia patients.

In summary, our results supported the self-medication hypothesis. We also found evidence that schizophrenia and ND share some genetic liability and these results did not contradict the hypothesis that smoking was a causal risk factor for schizophrenia. Assuming shared liability and a balanced contribution, we identified novel candidate genes associated with both schizophrenia and ND. Analyses of the shared genes revealed multiple pathways and an interacting network centered on long-term potentiation. These results provided some new insights for our understanding of smoking behaviors in both schizophrenia patients and the general population.


Phenotypes and GWAS datasets

For schizophrenia, we obtained the summary statistics from the PGC GWAS of schizophrenia1. This study used 52 independent samples, of which 46 were case-control samples of European ancestry, 3 were Asian case-control samples and 3 were European family samples. Since the samples were collected from different countries, the criteria of both the Diagnostic and Statistical Manual of Mental Disorders (DSM) and the International Classification of Diseases (ICD) were used in the diagnosis of the patients. Please see the original paper1 for details. We chose to use the summary statistics of the 46 European case-control samples (32,405 cases and 46,839 controls). For ND-related traits, we used the summary statistics of our cotinine study65 and 2 unpublished datasets (manuscripts in preparation). The first dataset used the sum scores of the Fagerström Test for Nicotine Dependence (FTND)66 as a trait, which is a commonly used phenotype for ND based on self-reported smoking behaviors. The second dataset used a single item of the FTND questionnaire, “How soon after you wake up do you smoke your first cigarette”, or time to smoke the first cigarette (TFC), as a trait. This question can be seen as a measure of nicotine withdrawal since the half-life of nicotine in the human body is about 2 hours67. Smokers often experience nicotine withdrawal in the morning after not smoking overnight. The third dataset65 used the plasma cotinine concentration (COT) as a trait. Cotinine is the major metabolite of nicotine, and its half-life is much longer than that of nicotine. Therefore, its concentration in plasma can be considered an index of nicotine intake in recent days68,69. Because the quantity of nicotine intake is one of the most important measures of ND, COT may be considered a measure of ND as well. In these studies, FTND, TFC and COT were treated as quantitative traits. The sample size for FTND was 16,237, excluding the Netherlands Twin Registry sample because some of its subjects were also used in the COT GWAS. The sample sizes for TFC and COT were 15,705 and 4,575, respectively. The FTND and TFC measures were derived from the same subjects; therefore, only FTND was used in polygenic analyses. TFC was used only for the identification of shared genes between schizophrenia and ND-related phenotypes. The samples used in these 3 ND-related GWASs are listed in Supplementary Table S1. All subjects used in this study were of European ancestry.

Polygenic analyses

Schizophrenia risk scores were calculated for 9 independent smoking-related studies (Table S1, n = 10,794) with FTND and CPD measures using the summary statistics from the PGC schizophrenia meta-analysis. The control subjects from the Molecular Genetics of Schizophrenia (MGS) study were included in the GWASs of both FTND and PGC schizophrenia; therefore, they were excluded from this analysis. Risk scores for COT and FTND were calculated for 13,326 individuals from the NIMH genetics consortium repository. We estimated the risk scores for each trait using the algorithms implemented in the PLINK software70. Specifically, the risk score for an individual was the sum of the number of risk alleles multiplied by the logarithm of the odds ratio (OR, for schizophrenia) or the beta coefficient (for FTND and COT), which was then normalized by the product of the maximal number of risk alleles and the log(OR)s/beta coefficients. For each trait, we calculated risk scores at 5 P-value thresholds: 5 × 10−5, 5 × 10−4, 5 × 10−3, 5 × 10−2 and 5 × 10−1. The numbers of markers used to calculate schizophrenia risk scores at these thresholds were 6,014, 94,804, 268,070, 1,021,476 and 5,370,899. The numbers of markers used for FTND and COT were 731, 6,312, 55,378, 500,542 and 4,752,196; and 1,621, 6,357, 48,575, 473,100, and 4,737,313, respectively. We then tested whether schizophrenia risk scores predicted FTND scores and vice versa using logistic (schizophrenia) and linear regression (FTND scores). Since the number of cigarettes smoked per day (CPD) was available from the FTND datasets, we also tested whether the genetic risk scores for schizophrenia predicted the CPD phenotype. Because we did not have individual genotypes for all datasets used in the COT meta-analyses, we used only the COT risk score to predict schizophrenia diagnosis. Sex, age and study were included as covariates in regression analyses.
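The risk score calculation described above can be illustrated with a short Python sketch. It is not the PLINK implementation used in the study. It assumes that effect sizes (log(OR) or beta) are oriented to the risk allele and are therefore non-negative, and it reads the normalization as division by the largest attainable weighted sum (2 risk alleles per marker). The function names, marker identifiers and numbers are hypothetical.

```python
import math

def select_markers(summary_stats, p_threshold):
    """Keep markers whose GWAS P-value is at or below the threshold.

    `summary_stats` maps marker id -> (effect, p_value); the effect is log(OR) or beta.
    """
    return {snp: eff for snp, (eff, p) in summary_stats.items() if p <= p_threshold}

def risk_score(risk_allele_counts, weights):
    """Weighted sum of risk-allele counts, scaled by the maximum attainable sum."""
    used = [snp for snp in weights if snp in risk_allele_counts]
    if not used:
        return 0.0
    raw = sum(risk_allele_counts[snp] * weights[snp] for snp in used)
    # Assumption: a diploid genome carries at most 2 risk alleles per marker.
    max_raw = sum(2 * weights[snp] for snp in used)
    return raw / max_raw

# Toy summary statistics and one toy genotype (hypothetical values).
summary_stats = {
    "snp1": (math.log(1.10), 1e-6),
    "snp2": (math.log(1.05), 3e-3),
    "snp3": (math.log(1.02), 0.4),
}
person = {"snp1": 2, "snp2": 1, "snp3": 0}

# The five P-value thresholds used in the text.
for threshold in (5e-5, 5e-4, 5e-3, 5e-2, 5e-1):
    weights = select_markers(summary_stats, threshold)
    print(threshold, len(weights), round(risk_score(person, weights), 4))
```

Scores computed this way fall between 0 and 1 for each threshold and can then be entered as a predictor in a logistic or linear regression with sex, age and study as covariates.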

Identification of shared risk genes

While there are papers looking at pleiotropy from a conditional FDR point of view71, we arrive at qualitatively similar conclusions using a somewhat simpler approach based on the family-wise error rate. Our test attempts to discover shared risk genes between schizophrenia and ND using summary statistics from their respective GWASs. To ensure that such a test is not overly influenced by a strong signal in just one trait, we implemented a “weakest link” approach (i.e., choosing the larger P-value of the pair of trait tests at the SNP under investigation)72. In more detail, let $X_j$ and $P_j$ be the χ2 distributed statistics and their associated (background enrichment adjusted) P-values, $j = 1, \ldots, m$, for association tests between the m traits and a SNP. As the overlap statistic of all traits we use $R = \max_j P_j$ (or, alternatively, $\min_j X_j$). Under the assumption that the trait tests are independent, the P-value (also denoted as overlap P-value) for a given overlap statistic, r, at a SNP is $P(R \le r) = \prod_{j=1}^{m} P(P_j \le r)$. If we further assume that (under the null hypothesis, H0) none of the traits is associated with the genetic variant, the overlap P-value simplifies to $P(R \le r) = r^m$ (1). Otherwise, $P(P_j \le r)$ can be computed based on the distribution of the j-th trait P-values. For instance, for a two-phenotype configuration and a putative threshold of 5 × 10−8, the parametric version of our method requires that, for a significant pleiotropic signal, the P-values for both phenotypes be < 2.2 × 10−4 ($\sqrt{5 \times 10^{-8}} \approx 2.2 \times 10^{-4}$). This per-trait threshold, which is substantially less stringent than 5 × 10−8, is similar in spirit to the one from Andreassen et al.71 While the overlap P-value (1) does eliminate most of the influence of an extreme signal for one phenotype, it does not eliminate it completely. However, for a putative threshold of 5 × 10−8 in (1), under the worst-case scenario of an extreme signal in one phenotype, the false positive rate per SNP is still rather small, i.e., 2.2 × 10−4. Furthermore, as seen in Andreassen et al., the false positive rate is likely to be substantially lower. Moreover, a worst-case-scenario false positive rate of 2.2 × 10−4 is adequate for the pathway analyses73. We used FDR24 to evaluate the approximate significance of the genetic overlap (described by relation (1)) between schizophrenia and smoking phenotypes. To select promising markers for pathway and network analyses we applied a threshold of q-value ≤ 0.16, corresponding to a factor of 2 in the Akaike Information Criterion penalty in a likelihood ratio χ2 test with 1 degree of freedom.
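The arithmetic behind the weakest-link test can be checked with a few lines of Python. The sketch below assumes the global-null form of the overlap P-value, i.e., the largest per-trait P-value raised to the power m (the number of traits), and it computes the χ2 tail probability for 1 degree of freedom with the complementary error function. The function names and example P-values are hypothetical and are not taken from the study.

```python
import math

def overlap_p_value(trait_p_values):
    """Weakest-link overlap P-value under the global null: max(P_j) ** m."""
    r = max(trait_p_values)
    return r ** len(trait_p_values)

def per_trait_threshold(alpha, m):
    """Largest per-trait P-value for which the overlap P-value still equals alpha."""
    return alpha ** (1.0 / m)

def chi2_1df_upper_tail(x):
    """P(chi-square with 1 degree of freedom > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))

alpha = 5e-8
# Two traits: each P-value must be below roughly 2.2e-4.
print(per_trait_threshold(alpha, 2))
# Balanced modest signals in both traits pass the joint threshold.
print(overlap_p_value([1e-5, 2e-4]) <= alpha)
# An extreme signal in one trait cannot rescue a weaker signal in the other.
print(overlap_p_value([1e-20, 3e-4]) <= alpha)
# A chi-square statistic of 2 (the AIC penalty per parameter) gives a tail probability near 0.16.
print(chi2_1df_upper_tail(2.0))
```

The last line shows where the q-value cutoff of 0.16 comes from: a 1-degree-of-freedom likelihood ratio statistic equal to 2 has an upper-tail probability of about 0.157.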

Pathway and network analyses

We conducted pathway enrichment analysis of genes with at least one marker with q-values lower than 0.16 from the joint testing of schizophrenia and COT/FTND/TFC traits. If a marker was within a gene region, it was assigned to the gene; otherwise, it was mapped to its most proximate gene using the 50-kb flanking regions (both 5′ and 3′ sides). Genes identified using SNPs associated with COT, FTND, or TFC were merged for the pathway enrichment analysis, for which we used the hypergeometric test implemented in the tool WebGestalt (2013 update)74 and the canonical pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We required each pathway to have at least three genes from our gene list and no more than 300 genes from the reference genome. The P-values from hypergeometric tests were further adjusted by the Benjamini-Hochberg method23. Only pathways with adjusted P-values < 0.05 were considered statistically significantly enriched.
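The marker-to-gene mapping and the enrichment test described above can be sketched as follows. This is a stand-in written with the Python standard library, not the WebGestalt implementation: the gene coordinates, pathway sizes and reference genome size are hypothetical, and the sketch assumes that a marker outside every gene is assigned to the nearest gene only when it lies within 50 kb of that gene.

```python
from math import comb

FLANK = 50_000  # 50-kb flanking region on each side, as stated in the text

def map_snp_to_gene(pos, genes):
    """Return the gene containing `pos`, else the nearest gene within FLANK bp, else None.

    `genes` is a list of (name, start, end) tuples on the same chromosome.
    """
    best, best_dist = None, None
    for name, start, end in genes:
        if start <= pos <= end:
            return name
        dist = start - pos if pos < start else pos - end
        if dist <= FLANK and (best_dist is None or dist < best_dist):
            best, best_dist = name, dist
    return best

def hypergeom_upper_tail(k, n_reference, pathway_size, n_list):
    """P(overlap >= k) when n_list genes are drawn from n_reference genes."""
    upper = min(pathway_size, n_list)
    hits = sum(comb(pathway_size, i) * comb(n_reference - pathway_size, n_list - i)
               for i in range(k, upper + 1))
    return hits / comb(n_reference, n_list)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running = min(running, p_values[i] * m / rank)
        adjusted[i] = running
    return adjusted

def enrich(pathways, n_list, n_reference, min_overlap=3, max_size=300):
    """Test pathways given as name -> (pathway_size, overlap) and adjust the P-values."""
    tested = {name: v for name, v in pathways.items()
              if v[1] >= min_overlap and v[0] <= max_size}
    names = list(tested)
    raw = [hypergeom_upper_tail(tested[nm][1], n_reference, tested[nm][0], n_list)
           for nm in names]
    return dict(zip(names, benjamini_hochberg(raw)))

# Toy gene coordinates and pathways (hypothetical values).
genes = [("GENE_A", 100_000, 150_000), ("GENE_B", 400_000, 420_000)]
print(map_snp_to_gene(120_000, genes), map_snp_to_gene(180_000, genes),
      map_snp_to_gene(300_000, genes))
pathways = {"PATHWAY_1": (150, 8), "PATHWAY_2": (250, 3), "PATHWAY_3": (900, 12)}
print(enrich(pathways, n_list=139, n_reference=20_000))
```

In the toy call, PATHWAY_3 is skipped because it exceeds the 300-gene limit; pathways whose adjusted P-value falls below 0.05 would be reported as enriched.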

We further examined pathway interactions using the Characteristic Sub-Pathway Network (CSPN) algorithm31,75 and the human protein-protein interaction (PPI) network76. We restricted the analysis specifically to the aforementioned merged gene set and their enriched pathways. In the final step, we selected the significant pathway interaction pairs based on permutation P-values less than 0.05.

Additional Information

How to cite this article: Chen, J. et al. Genetic relationship between schizophrenia and nicotine dependence. Sci. Rep. 6, 25671; doi: 10.1038/srep25671 (2016).