Genome-wide association study of lifetime cannabis use based on a large meta-analytic sample of 32 330 subjects from the International Cannabis Consortium

Cannabis is the most widely produced and consumed illicit psychoactive substance worldwide. Occasional cannabis use can progress to frequent use, abuse and dependence with all known adverse physical, psychological and social consequences. Individual differences in cannabis initiation are heritable (40–48%). The International Cannabis Consortium was established with the aim to identify genetic risk variants of cannabis use. We conducted a meta-analysis of genome-wide association data of 13 cohorts (N=32 330) and four replication samples (N=5627). In addition, we performed a gene-based test of association, estimated single-nucleotide polymorphism (SNP)-based heritability and explored the genetic correlation between lifetime cannabis use and cigarette use using LD score regression. No individual SNPs reached genome-wide significance. Nonetheless, gene-based tests identified four genes significantly associated with lifetime cannabis use: NCAM1, CADM2, SCOC and KCNT2. Previous studies reported associations of NCAM1 with cigarette smoking and other substance use, and those of CADM2 with body mass index, processing speed and autism disorders, which are phenotypes previously reported to be associated with cannabis use. Furthermore, we showed that, combined across the genome, all common SNPs explained 13–20% (P<0.001) of the liability of lifetime cannabis use. Finally, there was a strong genetic correlation (rg=0.83; P=1.85 × 10−8) between lifetime cannabis use and lifetime cigarette smoking implying that the SNP effect sizes of the two traits are highly correlated. This is the largest meta-analysis of cannabis GWA studies to date, revealing important new insights into the genetic pathways of lifetime cannabis use. Future functional studies should explore the impact of the identified genes on the biological mechanisms of cannabis use.


INTRODUCTION
Cannabis is the most widely produced and consumed illicit psychoactive substance worldwide. 1 Following initiation, occasional cannabis use can progress to frequent use, abuse and dependence. About 1 in 10 occasional users becomes dependent, which is associated with physical, psychological, social and occupational consequences. 2,3 Despite the increasing use of cannabis for medicinal purposes, 4 associations with adverse health effects have been reported. 5,6 These include increased risk for psychiatric outcomes, including psychosis, schizophrenia, schizotypal personality disorder and mania. 7,8 Early cannabis use appears to moderate the relationship between polygenic risk scores for schizophrenia and brain maturation. 9 In view of expanding medicalization and decriminalization, the potential consequences, and the debate surrounding the benefits versus adverse consequences associated with cannabis use, 10 understanding the genetics of cannabis use should be a public health priority. 11 The risk of lifetime cannabis use, defined as any use of cannabis during the lifetime, varies between individuals. Previous studies have shown that individual differences in lifetime cannabis use can be partly explained by genetic differences between individuals; a meta-analysis of twin studies reported significant heritability estimates of lifetime cannabis use of 48% for males and 40% for females. 12 Shared environmental factors, such as cannabis availability and parental monitoring, 13,14 also have a role, accounting for 25% and 39% of the risk for males and females, respectively. 12 Also, there is substantial overlap in the genetic risks underlying lifetime cannabis use and cannabis use disorder. 15 Several studies have sought to identify specific genetic risk factors associated with cannabis use phenotypes. Genome-wide linkage studies have revealed suggestive evidence for linkage across many chromosomes. [16][17][18][19][20] With very little consistency across studies, nearly all findings failed to meet genome-wide significance. The one study examining lifetime cannabis use 16 reported a nonsignificant linkage locus on chromosome 18 (LOD score = 1.97).
Candidate gene studies, including reports examining the CNR1, GABRA2, FAAH and ABCB1 genes, have detected some significant associations with cannabis use but, again, replication has been inconsistent. [21][22][23] On the basis of a sample of 7452 Caucasian individuals, Verweij et al. 21 found no gene-based associations between the frequency of cannabis use and 10 candidate genes identified by Agrawal and Lynskey. 24 Overall, the results of candidate-gene studies are inconclusive; some associations have been replicated a few times, but failed to replicate in other studies. Moreover, findings may be further distorted by publication bias favouring significant results.
As an alternative to the candidate-gene approach, the genome-wide association study (GWAS) is a hypothesis-free method that aims to detect novel genetic variants involved in complex traits. To date, three GWASs of cannabis use phenotypes have been published: one GWAS of cannabis dependence in 708 cannabis-dependent individuals and 2346 controls; 25 a GWAS meta-analysis of lifetime cannabis use based on two studies with a combined sample size of 10 091 individuals (40.7% users); 26 and a recent GWAS of lifetime cannabis use and age of cannabis use onset based on a sample of 6744 individuals (of whom 20% were users). 27 None of the studies identified any genome-wide significant associations. This was likely due to the small effect sizes typical of common variants underpinning highly polygenic traits, 28 thereby indicating a need for larger sample sizes. In this context, the success of larger GWASs and international consortia examining a variety of complex traits is encouraging. 29 For example, multiple large meta-analyses of GWA results for number of cigarettes smoked per day have independently identified associations on chromosome 15q25 spanning the α5, α3 and β4 nicotinic receptor subunit gene clusters (CHRNA5, CHRNA3, CHRNB4). [30][31][32] These and other recent GWA findings 29 clearly illustrate the need for larger sample sizes. In response to this need, the International Cannabis Consortium was established to combine the results of multiple GWASs to identify the genetic variants underlying individual differences in cannabis use phenotypes. We focus on lifetime cannabis use (yes/no) because this phenotype is heritable and shares significant genetic risk with cannabis abuse or dependence. 14,15,33 In contrast to frequency of use or abuse and dependence, which are not commonly assessed in large-scale genetic studies, most general population studies have assessed lifetime cannabis use, thereby increasing our sample size and power to detect associations. Currently, the combined International Cannabis Consortium sample size for lifetime cannabis use is 32 330 individuals from 13 cohorts from Europe, the United States and Australia, along with four independent replication samples comprising 5627 individuals. This sample size is considerably larger than the sample size of the previous GWAS investigating lifetime cannabis use in two samples from Australia and the UK, thereby providing substantially greater power to detect genetic variants of small effect size.
The aim of the present study is to identify genetic variants associated with lifetime cannabis use by meta-analysis of the GWAS results from all contributing International Cannabis Consortium samples. The tests of association for individual genetic variants will be complemented with gene-based tests of association. In addition, we will investigate what proportion of the heritability inferred by twin studies is explained by common SNPs captured on GWAS arrays. Finally, we will estimate the genetic correlation between lifetime cannabis use and smoking initiation based on the analysis of our summary statistics and those from the publicly available Tobacco Alcohol and Genetics consortium.

MATERIALS AND METHODS
Cohorts
We performed a meta-analysis of GWA results from 13 discovery samples from Europe, USA and Australia including a total of 32 330 individuals of European ancestry. The size of the samples ranged from 721 to 6778 individuals. The age of the participants ranged from 16 to 87 years with an average of 34 years. The percentage of females ranged from 30 to 66% with an average of 53%. Owing to differences in recruitment strategies and cultural and temporal differences, combined with likely variation in drug availability between countries, there was a wide range in the prevalence of lifetime use (that is, never/ever used cannabis), which varied from 1 to 92% with an average of 44.5%.
Four additional independent samples with a total of 5627 subjects were used for replication. One sample (n = 2660) consisted of African American subjects. The other three included subjects of European ancestry. See Table 1 for individual sample characteristics. The procedures for data collection per sample are described in Supplementary Information 1.

Phenotype and covariates
For all individuals, the data were available on whether or not the subject reported having ever used cannabis during their lifetime: yes (1) versus no (0). Although phrasing of the question slightly differed between samples (see Supplementary Information 1), our unit of analysis reflected lifetime cannabis use in all the samples.
Covariates included age at the time of phenotypic assessment, sex, birth cohort and principal components (obtained from the genome-wide genotype data). Spanning 20-year intervals, birth cohort was dummy coded, with the lowest birth cohort (that is, oldest age group) used as the reference group. The details about phenotypic assessment and individual sample characteristics for the discovery and replication samples are located in Supplementary Information 1.
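The dummy coding of birth cohort described above can be sketched as follows. This is a minimal illustration rather than the consortium's own code: the function name `birth_cohort_dummies`, the choice to start the first 20-year interval at the earliest birth year in the sample, and the example birth years are all assumptions.

```python
def birth_cohort_dummies(birth_years, width=20):
    """Dummy-code birth cohort in `width`-year intervals.

    Assumption: the first interval starts at the earliest birth year observed.
    The earliest (oldest) cohort is the reference group and gets all zeros.
    """
    start = min(birth_years)
    # Interval index per person: 0 = oldest cohort, 1 = next cohort, ...
    idx = [(year - start) // width for year in birth_years]
    n_dummies = max(idx)  # one indicator column per non-reference cohort
    return [[1 if i == k else 0 for k in range(1, n_dummies + 1)] for i in idx]


# Hypothetical birth years falling into three consecutive 20-year cohorts
rows = birth_cohort_dummies([1930, 1955, 1975])
```

Each row holds the indicator columns that would enter the regression alongside age, sex and the principal components; the oldest cohort is represented by a row of zeros.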

Genotyping and imputation
Genotype imputation was based on the 1000 Genomes phase 1 reference panel. 34 Allelic dosage data were used to account for genotype uncertainties. See Supplementary Table 2 for the genotyping platform, imputation program and quality control thresholds used.

Statistical analyses
GWA analysis in each discovery cohort. The GWA analyses were performed by each group separately. Associations between the binary phenotype and the genotypes were tested genome-wide using a logistic regression model including covariates (see above). For family-based samples, familial relatedness was taken into account by using a sandwich correction as implemented in PLINK. 35 The analysis plan that was sent to the participating groups is included in Supplementary Information 3. It should be noted that some groups performed the analyses in a slightly different manner based on the characteristics of their sample. Supplementary Table 2 lists the program used by each group.
Meta-analysis of GWAS results. Before performing the meta-analysis, we applied a set of filters to each GWA results set independently. First, we removed insertions and deletions, ensuring that all base pair positions were unique and referred to the same genetic variant (that is, SNP). Second, we removed genotyped SNPs that were not in Hardy-Weinberg equilibrium (P ⩽ 10−5). Third, we removed SNPs with minor allele frequency (MAF) < √(5/N), which under the assumption of Hardy-Weinberg equilibrium corresponded to fewer than five expected individuals in the least frequent genotype group. In the EGCUT1 sample, due to the very low prevalence of lifetime cannabis use (1.3%), we excluded SNPs with MAF < 0.2. Fourth, regardless of the quality score type used, we excluded SNPs with imputation quality scores below 0.6. Finally, SNPs present in only one sample and SNPs with alleles or allele frequencies inconsistent with the 1000 Genomes phase I European reference panel (absolute MAF difference > 0.15) were removed.
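The per-cohort filters above can be sketched in a few lines. This is an illustrative outline only: the record layout (a dict with the keys `is_indel`, `genotyped`, `hwe_p`, `maf`, `info` and `ref_maf`) and the function names are assumptions, and the cross-cohort rule (SNPs present in only one sample) and the EGCUT1-specific threshold are left out.

```python
import math


def maf_threshold(n):
    """MAF below which fewer than five minor-allele homozygotes are expected.

    Under Hardy-Weinberg equilibrium the expected count in the rarest
    genotype group is n * maf**2, so n * maf**2 < 5 gives maf < sqrt(5 / n).
    """
    return math.sqrt(5.0 / n)


def passes_qc(snp, n):
    """Apply the per-cohort filters to one SNP record (hypothetical layout)."""
    if snp["is_indel"]:
        return False
    # The Hardy-Weinberg filter applies to genotyped SNPs only
    if snp["genotyped"] and snp["hwe_p"] <= 1e-5:
        return False
    if snp["maf"] < maf_threshold(n):
        return False
    if snp["info"] < 0.6:
        return False
    # Allele frequency must agree with the reference panel
    if abs(snp["maf"] - snp["ref_maf"]) > 0.15:
        return False
    return True


# Hypothetical SNP record in a cohort of 721 individuals (the smallest cohort)
example = {"is_indel": False, "genotyped": False, "hwe_p": 1.0,
           "maf": 0.21, "info": 0.93, "ref_maf": 0.19}
kept = passes_qc(example, n=721)
```

Because the threshold depends on N, the smallest cohorts drop considerably more low-frequency SNPs than the largest ones.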
We performed a fixed-effects meta-analysis based on the cohorts' effect sizes and standard errors using METAL. 36 Our meta-analysis combined association summary statistics for 6 444 471 SNPs that passed all the filters. We applied the conventional threshold of 5 × 10−8 as an indication of genome-wide significance (see ref. 37). Although the combined sample size of the meta-analysis based on the discovery samples is 32 330, the sample size per SNP varies due to missingness across subsamples.
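For readers unfamiliar with the approach, a fixed-effects meta-analysis of effect sizes and standard errors is conventionally an inverse-variance weighted average. The sketch below shows that calculation for a single SNP; it is not METAL's code, and the function name and the example numbers are assumptions made for illustration.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate for one SNP.

    betas: per-cohort log odds ratios; ses: their standard errors.
    Returns (pooled beta, pooled standard error, z score, two-sided P).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # erfc keeps precision in the far tail, where genome-wide P-values live
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical per-cohort estimates for one SNP
beta, se, z, p = fixed_effects_meta([0.10, 0.20], [0.10, 0.10])
genome_wide_significant = p < 5e-8
```

Cohorts in which a SNP is missing simply drop out of the sums, which is why the effective sample size varies per SNP.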
Gene-based test. Results of the GWAS were then used as part of gene-based tests of association in the Knowledge-based mining system for Genome-wide Genetic studies (KGG) software package Version 3.5. 38,39 This approach uses an extended Simes test that integrates prior functional information and the meta-analysis association results when combining the SNP P-values within a gene to obtain an overall association P-value for each entire gene. We conducted 24 576 gene-based tests of association. The genome-wide significance level according to the KGG default setting of a Benjamini and Hochberg false discovery rate threshold of 0.05 (ref. 40) was 9.38 × 10−6.
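The two building blocks named above can be illustrated with their textbook forms. The sketch below implements the plain Simes combination of SNP P-values within a gene and the Benjamini and Hochberg step-up procedure across genes. It is a simplification under stated assumptions: the extended Simes test in KGG additionally accounts for LD between SNPs and for prior functional weights, which this sketch omits, and the function names and example P-values are hypothetical.

```python
def simes_p(pvalues):
    """Plain Simes combined P-value: min over ranks of m * p_(i) / i."""
    m = len(pvalues)
    ranked = sorted(pvalues)
    return min(m * p / rank for rank, p in enumerate(ranked, start=1))


def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans marking which tests pass FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose ordered P-value satisfies p_(k) <= k * q / m
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


# Hypothetical SNP P-values within one gene
gene_p = simes_p([0.01, 0.04, 0.5])

# Hypothetical gene-level P-values across eight genes
gene_ps = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
significant = benjamini_hochberg(gene_ps, q=0.05)
```

Under the step-up rule the effective P-value cut-off depends on the observed distribution of P-values, so it is a property of the data set rather than a fixed constant.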
Estimation of SNP-based heritability and genetic overlap with lifetime cigarette smoking. The proportion of phenotypic variance that could be explained by the SNPs was estimated using the density estimation method developed by So et al. 41 The density estimation method estimates the genome-wide distribution of effect sizes based on the difference between the observed distribution of test statistics in the meta-analysis and the corresponding null distribution. Before estimation, the SNPs present in at least 25% of the meta-analysis samples were pruned for LD. We used the r² = 0.15 pruning level as the primary result for consistency with other applications of this method. Additional details are located in Supplementary Information 2. LD score regression 42,43 was used as an alternative method to estimate the SNP-based heritability, as well as to estimate the degree of genetic covariance between lifetime cannabis use (present study) and lifetime cigarette smoking. 31

RESULTS
Table 2 lists the top 10 independent (R² < 0.1) SNPs associated with lifetime cannabis use. None of these 10 SNPs were replicated in the four independent replication samples (Supplementary Table 3). In a combined meta-analysis of the 10 top SNPs (that is, discovery plus replication samples), none of the SNPs reached genome-wide significance. Local plots of the most strongly associated regions, including neighboring genes, are provided in Supplementary Figures 4a-j. The most statistically significant marker (P-value = 4.6 × 10−7) was rs4984460 located on chromosome 15 (see Supplementary Figure 5 for the forest plot). The SNP is located in an intergenic region between LOC400456/LOC145820 and NR2F2 and MIR1469 genes.
Supplementary Table 3, four genes and one intergenic noncoding RNA region were significantly (false discovery rate-corrected P < 0.05) associated with lifetime cannabis use: (i) neural cell adhesion molecule 1 (NCAM1, on 11q23); (ii) cell adhesion molecule 2 (CADM2, on 3p12); (iii) short coiled-coil protein (SCOC) and (iv) SCOC antisense RNA1 (SCOC-AS1, both located on 4q31); and (v) potassium channel, subfamily T, member 2 (KCNT2, on 1q31). Regional plots 44 of these top genes are located in Supplementary Figure 6. The smallest gene-based P-value was found for the NCAM1 gene. Within this gene, rs4471463 had the lowest SNP P-value, and was also among the top 10 associated SNPs. The forest plot in Figure 2 illustrates the effect of this SNP in each sample. In most samples, the effect is in the same direction, such that the major (T) allele is associated with a decreased risk of lifetime cannabis use. The forest plot for two SNPs with lowest P-values in the other significant gene regions can be found in Supplementary Figure 5.
Of the five genes included in our replication analyses, none were replicated in two of the independent replication samples (see Table 3). In the African American replication sample, suggestive associations with SCOC-AS1 (P = 0.044) and SCOC (P = 0.027) were found.
SNP-based heritability and genetic overlap with lifetime cigarette smoking
Using the density estimation method (see 'Materials and Methods' section for a description), all the SNPs available in at least 25% of the samples when combined explained 20% of the total variance in lifetime cannabis use (P < 0.001). Alternative estimation with LD score regression also yielded a significant heritable component of 13% (h² LD = 0.13, s.e. = 0.02, P = 1.4 × 10−7). These variance estimates were robust across pruned sets with similar r² thresholds (see Supplementary Table 6). Stricter LD pruning (that is, r² = 0.05), or restricting analyses to SNPs present in all studies, substantially decreased the estimate of variance explained. Both SNP heritability estimates confirmed that lifetime cannabis use has a significant heritable component (13-20%), indicating that GWAS should be able to identify these common SNPs (but effect sizes are small and large sample sizes are thus required). However, because these estimates are only based on common SNPs, the total heritability of lifetime cannabis use is likely to be higher.
The LD score regression analyses revealed a strong and highly significant genetic correlation (rg = 0.83, s.e. = 0.15, P = 1.85 × 10−8) between lifetime cannabis use and lifetime cigarette smoking (based on the Tobacco Alcohol and Genetics consortium 31 summary results), implying that the SNP effect sizes for lifetime cannabis use and lifetime cigarette smoking are highly correlated.

DISCUSSION
To date, this is the largest GWA study of lifetime cannabis use. We performed meta-analysis of the GWA results based on a discovery sample comprising 32 330 individuals from 13 cohorts, and a replication sample comprising 5627 subjects from four cohorts (including one African American cohort). There were no genome-wide significant SNP associations. However, heritability analyses revealed that between 13 and 20% of the variation in lifetime cannabis use could be explained by common SNPs. Moreover, gene-based tests of association identified four protein-coding genes and one intergenic region significantly associated with lifetime cannabis use including NCAM1, which has previously been linked to substance use. [45][46][47][48] Finally, we revealed that the genetic liability to lifetime cannabis use correlated to a large extent (r = 0.83) with the genetic liability to lifetime cigarette smoking.
Our results are consistent with the hypothesis that lifetime cannabis use is a highly polygenic trait, with many SNPs of small effect each contributing to lifetime risk. Moreover, portions of the genetic risk for lifetime cannabis use likely correlate with the genetic risk for use of other substances, including cigarette smoking.
Our top gene associated with lifetime cannabis use was NCAM1, a known candidate for nicotine dependence. 45 The role of NCAM1 is to regulate pituitary growth hormone secretion as a membrane-bound glycoprotein that mediates cell-cell contact by homophilic interactions. 46 NCAM1 is part of the NCAM1-TTC12-ANKK1-DRD2 (NTAD) gene cluster, which is related to neurogenesis and dopaminergic neurotransmission. Importantly, the NTAD cluster   55 as well as other substance use disorders. 54 Although it is plausible that NCAM1 is capturing pleiotropic risks underpinning the liability to licit and illicit substance use in general, we note that NCAM1 was not identified either by the Tobacco Alcohol and Genetics consortium or other consortia for cigarette smoking. [30][31][32] The functions of the putative variants responsible for the associations in the candidate-gene studies remain to be determined.
The second gene, CADM2, is a synaptic cell adhesion molecule (SynCAM family) belonging to the immunoglobulin (Ig) superfamily. Variants in the CADM2 gene have been previously associated with body mass index, 56 processing speed 57 and autism disorders. 58 Interestingly, these phenotypes were associated with cannabis use in previous studies, [59][60][61] which together suggest that CADM2 can be considered an important gene related to a variety of complex traits. It is possible that the association with lifetime cannabis use may be driven, for example, by differences in personality rather than as a direct relationship with lifetime use.
The third gene, SCOC, encodes a short coiled-coil domain-containing protein that localizes to the Golgi apparatus. Many coiled-coil-type proteins are involved in important biological functions such as the regulation of gene expression through the regulation of transcription factor binding. 62 The function of SCOC is largely unknown and no previous association studies have linked SCOC to cannabis or other substance use phenotypes. The SCOC antisense RNA1 gene is located in the same chromosomal region.
Finally, KCNT2 encodes a potassium channel (subfamily T, member 2). The sodium-activated potassium channels Slack and Slick are encoded by KCNT1 (potassium channel, subfamily T, member 1) and KCNT2, respectively, which are found in neurons throughout the brain. Suggestive associations for SNPs near KCNT2 have previously been found for cocaine dependence and for early-onset, highly comorbid, heavy opioid use. 63,64 This suggests that potassium signaling may have a role in addiction.
The lack of genome-wide significant associations for individual SNPs is consistent with previous GWA studies of lifetime cannabis use 26,27 and cannabis dependence. 25 The difficulty of identifying specific SNPs for lifetime cannabis use may be attributable to several reasons. First, complex traits are known to be influenced by many variants, each with very small effect sizes. Although power calculations reveal suitable power (96%) to detect odds ratios of 1.15 based on common SNPs (MAF = 0.2), the power to detect smaller effect sizes remains lower. For example, there is only 28% power to detect an odds ratio of 1.1 at MAF = 0.2. Therefore, our data suggest that the odds ratios of single variants contributing to lifetime cannabis use are likely to be smaller than 1.15. Combining variants within larger units (that is, genes) did, however, reveal four significant genes associated with lifetime cannabis use, implying that these genes are appropriate targets for future functional studies of cannabis use. Unfortunately, our gene-based results were not replicated in the replication samples, probably due to low sample sizes and therefore low power. In the African American replication sample, we did find suggestive association with SCOC-AS1 and SCOC.
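The power figures quoted above can be approximated with a standard normal-theory calculation. The sketch below is not the authors' power analysis, whose method is not described here: it assumes a 1-df additive test, the full discovery sample for every SNP, a case fraction equal to the average cohort prevalence of 44.5%, and a common textbook approximation for the non-centrality parameter. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def gwas_power(n, case_fraction, maf, odds_ratio, alpha=5e-8):
    """Approximate power of a 1-df additive association test.

    Non-centrality parameter (approximation):
        n * phi * (1 - phi) * 2 * maf * (1 - maf) * ln(OR)**2
    where phi is the fraction of cases in the sample.
    """
    ncp = (n * case_fraction * (1.0 - case_fraction)
           * 2.0 * maf * (1.0 - maf) * math.log(odds_ratio) ** 2)
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = math.sqrt(ncp)
    # Probability that the test statistic exceeds the threshold in either tail
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Assumed inputs: full discovery N and average prevalence from the cohorts
power_or_115 = gwas_power(32330, 0.445, 0.2, 1.15)
power_or_110 = gwas_power(32330, 0.445, 0.2, 1.10)
```

Under these assumptions the approximation gives high power at an odds ratio of 1.15 and low power at 1.1, the same pattern as the reported figures; exact agreement is not expected because per-SNP sample sizes and cohort prevalences differ.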
On the basis of twin studies, the heritability of lifetime cannabis use is estimated at 40-50%. 12 In our study, all common SNPs combined explained 13-20% of the variance in the liability to use cannabis depending on the method used. Stricter LD pruning (that is, r² = 0.05) or restricting to SNPs observed (genotyped or imputed) in all the analyses substantially reduces the estimate of variance explained. Speculatively, this may indicate that much of the variance explained comes from SNPs located in regions of weak LD. Such effects are likely to be poorly tagged for the estimation of variance explained after strict LD pruning, and are likely to be more difficult to impute owing to a lack of strongly correlated genotyped SNPs (and thus missing from some studies). Our SNP-based heritability estimates lie in between two previous heritability estimates for lifetime cannabis use based on the Genome-wide Complex Trait Analysis 65 software package. Verweij et al. 26 estimated that 6% of the variance in lifetime cannabis use is explained by aggregated common SNPs (MAF > 0.05). Minică et al. 27 found an estimate of 25%. Given that the current sample is much larger than the samples used in the previous studies, we conclude that approximately one-third to half of the heritability is explained by common SNPs captured on a GWAS array. Other sources of variation may explain the discrepancy between SNP- and twin-based heritability estimates. For example, age-related genetic differences, non-additive genetic variance, interactions between genetic variants and environmental risk factors, epistasis and/or rare mutations may also have a role.
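The share of twin-based heritability captured by common SNPs follows from simple ratios of the estimates quoted in this article. The short sketch below works through that arithmetic; pairing every SNP-based estimate with every twin-based estimate is an assumption made here for illustration, not a calculation reported by the authors.

```python
def fraction_explained(snp_h2, twin_h2):
    """Share of the twin-based heritability captured by common SNPs."""
    return snp_h2 / twin_h2


snp_estimates = (0.13, 0.20)   # LD score regression, density estimation
twin_estimates = (0.40, 0.48)  # females, males (twin meta-analysis, ref. 12)

ratios = [fraction_explained(s, t)
          for s in snp_estimates for t in twin_estimates]
low, high = min(ratios), max(ratios)
```

The resulting range runs from roughly a quarter to a half, which brackets the approximate one-third to one-half stated in the text.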
Our results indicate a very high genetic overlap (r = 0.83) between our measure of lifetime cannabis use and lifetime cigarette use when based on the SNP panel. Twin studies have shown moderate to high genetic correlations of 0.59-0.74 between lifetime cannabis and nicotine use. 66 Kendler et al. 67 also reported significant biometrical genetic correlations between the levels of cannabis, nicotine and alcohol use, which were increasingly influenced by common genetic risks detectable in early adulthood.
Our findings should be interpreted in the context of at least four potential limitations. First, our study was underpowered to detect very small effects of individual variants. Power analyses revealed that a twofold increase in sample size is required to detect SNP effect sizes with odds ratios of 1.1. Second, lifetime cannabis use is a dichotomous measure combining single lifetime, regular and chronic users. Consequently, our sample may comprise heterogeneous patterns of use, which has the potential to reduce the power to detect genetic association. 68 Third, the prevalence of lifetime cannabis use varied between 1% (EGCUT1) and 92% (Yale Penn EA). This was likely due to differences in the sample characteristics, recruitment strategies and the political differences between countries. Despite these differences, the forest plots of the key SNPs (see Figure 2; see also Supplementary Figure 5) revealed that the 95% confidence intervals surrounding the effect estimates typically included the estimated meta-analytic effect, which tends to overlap across studies. This indicates that the input samples were representative of the same population of users. Finally, the average age of participants varied between 18 (ALSPAC) and 45 (QIMR) years. Consequently, some younger participants might have initiated cannabis use at a later age, but have been classified as 'never users' in the current study. This can decrease power, but does not invalidate our results. In addition, we note that the average age of each sample did not correlate with sample prevalences (r = −0.04, P = 0.91).
On the basis of our observations, the following recommendations for future studies can be made. We have identified four genes significantly associated with cannabis use, which are candidates for follow-up functional studies. In particular, NCAM1 can be examined to determine its functional role, possibly in combination with other genes in the same gene cluster (NCAM1-TTC12-ANKK1-DRD2).
The next goal of the International Cannabis Consortium is to perform a meta-analysis on GWA studies investigating the age at first cannabis use. Our rationale is based on the observation that early initiation of cannabis use is associated with rapid progression towards cannabis abuse and dependence, polysubstance use and other substance use disorders. [69][70][71] Methods other than GWASs may also be used to reveal the biological pathways of cannabis use, such as rare variant association analyses. The environmental risk factors may be incorporated to investigate gene × environment interactions. Hopefully, the combination of advanced technologies and novel statistical approaches with larger samples will further contribute to our understanding of the genetic architecture of cannabis use.

CONCLUSION
We have performed the largest meta-analysis to date of GWASs investigating cannabis use phenotypes. With a sample of over 32 000 individuals, our results implicate four genes as involved in lifetime cannabis use: NCAM1, CADM2, SCOC and KCNT2. Our results illustrated that lifetime cannabis use is under the influence of many common genetic variants. The combined SNPs explained 13-20% of the phenotypic variation, and revealed a high degree of genetic sharing (r = 0.83) with lifetime cigarette smoking. Future studies should investigate the impact of these genes on the biological mechanisms leading to lifetime cannabis use.

London. SNP Genotyping was performed by The Wellcome Trust Sanger Institute and National Eye Institute via NIH/C. Utrecht-We are grateful to Chris Schubart and Willemijn van Gastel and numerous students for their work in the study. Foremost, we thank our study participants. This study was financially supported by a grant of the NWO (Netherlands Organization for Scientific Research), grant no. 91207039. The study was performed at the University Medical Centre Utrecht, The Netherlands. Yale Penn-Genotyping services for a part of our GWAS study were provided by the Center for Inherited Disease Research (CIDR) and Yale University (Center for Genome Analysis). CIDR is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University (contract number N01-HG-65403). This study was supported by the National Institutes of Health grants RC2 DA028909, R01 DA12690, R01 DA12849, R01 DA18432, R01 AA11330, R01 AA017535, and the VA Connecticut and Philadelphia VA MIRECCs.