Multi-ancestry genome-wide association study of cannabis use disorder yields insight into disease biology and public health implications

Levey, Daniel F.; Galimberti, Marco; Deak, Joseph D.; Wendt, Frank R.; Bhattacharya, Arjun; Koller, Dora; Harrington, Kelly M.; Quaden, Rachel; Johnson, Emma C.; Gupta, Priya; Biradar, Mahantesh; Lam, Max; Cooke, Megan; Rajagopal, Veera M.; Empke, Stefany L. L.; Zhou, Hang; Nunez, Yaira Z.; Kranzler, Henry R.; Edenberg, Howard J.; Agrawal, Arpana; Smoller, Jordan W.; Lencz, Todd; Hougaard, David M.; Børglum, Anders D.; Demontis, Ditte; Gaziano, J. Michael; Gandal, Michael J.; Polimanti, Renato; Stein, Murray B.; Gelernter, Joel

doi:10.1038/s41588-023-01563-z

Article
Open access
Published: 20 November 2023

Nature Genetics volume 55, pages 2094–2103 (2023)

Abstract

As recreational use of cannabis is being decriminalized in many places and medical use widely sanctioned, there are growing concerns about increases in cannabis use disorder (CanUD), which is associated with numerous medical comorbidities. Here we performed a genome-wide association study of CanUD in the Million Veteran Program (MVP), followed by meta-analysis in 1,054,365 individuals (n_cases = 64,314) from four broad ancestries designated by the reference panel used for assignment (European n = 886,025, African n = 123,208, admixed American n = 38,289 and East Asian n = 6,843). Population-specific methods were applied to calculate single nucleotide polymorphism-based heritability within each ancestry. Statistically significant single nucleotide polymorphism-based heritability for CanUD was observed in all but the smallest population (East Asian). We discovered genome-wide significant loci unique to each ancestry: 22 in European, 2 each in African and East Asian, and 1 in admixed American ancestries. A genetically informed causal relationship analysis indicated a possible effect of genetic liability for CanUD on lung cancer risk, suggesting potential unanticipated future medical and psychiatric public health consequences that require further study to disentangle from other known risk factors such as cigarette smoking.

Main

Cannabis is a psychoactive substance with a long history of use and dependence. Recently within the United States, 37 states have approved what is termed medical cannabis use, and 19 states, 2 territories and the District of Columbia allow possession of cannabis for recreational purposes. In Europe, only Malta has fully legalized recreational cannabis, although many other countries have decriminalized possession of small amounts of cannabis and have enabled medical allowances. It was recently legalized in Thailand but remains prohibited in many parts of Asia, the Middle East and South America. The status in many of these places may be subject to change in the near future. More than a third of individuals who use cannabis develop cannabis use disorders (CanUD), and evidence regarding the impact of legalization on escalating use and use disorders is mixed^1,2. Substantial negative health outcomes associated with chronic cannabis use include various cancers associated with inhaling combustion products³, declines in cognitive capacity and motivation and increased schizophrenia (SCZ) risk^4,5. Individual and societal complications that result from CanUD include decreased productivity and accidents related to intoxication⁶. The full range of risks and negative outcomes associated with cannabis use and CanUD may not be appreciated widely. Considering the gradually increasing permissiveness surrounding its use, understanding various sources of risk that influence CanUD is both necessary and timely.

In this Article, we combined genome-wide genotype data from the Million Veteran Program (MVP) with expanded samples from iPSYCH2^7,8 and Mass General Brigham (MGB) BioBank⁹ and meta-analyzed these with the Psychiatric Genomics Consortium (PGC)/deCODE/iPSYCH1 study^7,10. MVP, one of the largest biobanks in the world¹¹, has enabled a substantial increase in power for genomic discovery by doubling the number of cases of European (EUR) ancestry available. By increasing sample numbers, we substantially increased the number of discovered loci and confirmed previous findings^7,10. We also leveraged the ancestral diversity of the MVP to expand analyses of African ancestry individuals (AFR) and conducted genome-wide association studies (GWAS) analyses in Admixed American (AMR) and East Asian (EAS) ancestries. Linkage disequilibrium (LD) score regression (LDSC) can quantify variance explained by genetics and identify overlap between traits. This method is sufficient for EUR ancestries but not appropriate for some non-European and admixed ancestries. To solve this problem, we used cohort-derived covariate LDSC¹² to calculate single nucleotide polymorphism (SNP)-based heritability in these populations, finding similar results among all ancestries. We conducted a transcriptome-wide association study (TWAS), which leverages annotations based on variant associations to changes in gene expression, in adult and fetal brain tissue to identify significant expression quantitative trait loci (eQTLs), using stratified LDSC to show enriched SNP-based heritability in fetal but not adult cortex. We also conducted Mendelian randomization (MR) analyses—an approach that uses genetic variations identified by GWAS as instruments to obtain an unbiased estimate of the effect of a trait of interest (here, CanUD) on outcomes—to examine causal relationships with chronic pain, lung cancer, physical activity and SCZ. 
Finally, we performed genomic structural equation modeling (gSEM), a multivariate method for analyzing GWAS summary statistics to examine joint genetic architecture of traits, to understand the genomic relationships between cannabis use traits and other psychiatric and substance use disorder (SUD) traits. This work builds upon a decade of progress in the field^7,10,13,14,15,16,17,18.

Results

GWAS

We assembled a total sample of 886,025 EUR participants across five datasets (Table 1; 42,281 cases and 843,744 controls) for GWAS meta-analysis of CanUD and identified 22 independent genome-wide significant (GWS) loci in this population. In the AFR meta-analysis of 123,208 participants across three cohorts (19,065 cases and 104,143 controls), we identified two GWS loci. In a cohort of 38,289 participants assigned using the broad AMR ancestry references (which include individuals recruited from several Latin American populations) in the MVP cohort (2,774 cases and 35,515 controls) we found one GWS locus, and in EAS ancestry references we identified two GWS loci. The lead signal for EUR was near CHRNA2 (rs56372821, P = 7.3 × 10⁻¹⁴), which encodes cholinergic receptor nicotinic alpha 2 subunit, consistent with prior GWAS^7,10; the lead SNP was identical to one prior study⁷. Findings for AFR include a SNP in an intron of SLC36A2 (rs573117193, P = 4.9 × 10⁻⁸), which encodes a pH-dependent proton-coupled amino acid transporter for glycine, alanine and proline. The lead SNP in AMR was rs9815757 (P = 4.4 × 10⁻⁸). The lead SNP in EAS (rs78561048, P = 6.7 × 10⁻⁹) is intronic to SEMA6D, which encodes semaphorin 6D (Fig. 1 and Table 2). Several variants showed concordant direction of effect across all four stratified ancestral groups. Five additional loci were discovered in the multi-ancestry analysis: rs7003100 (intergenic), rs7029483 (130 kb upstream of MTND2P8), rs2627197 (intronic to ENO4), rs34438449 (40 kb downstream of MIR5007) and rs147144681 (intronic to CHRNA3).
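The sample counts quoted in the Abstract and in the paragraph above can be cross-checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: the variable names are illustrative, and the East Asian case count is not stated in the text, so the value computed here is merely what the reported totals imply by subtraction.

```python
# Counts as quoted in the Abstract and the GWAS paragraph.
totals = {"EUR": 886_025, "AFR": 123_208, "AMR": 38_289, "EAS": 6_843}
cases = {"EUR": 42_281, "AFR": 19_065, "AMR": 2_774}
controls = {"EUR": 843_744, "AFR": 104_143, "AMR": 35_515}

META_TOTAL = 1_054_365  # total individuals in the meta-analysis
META_CASES = 64_314     # total CanUD cases in the meta-analysis


def check_counts():
    """Verify that cases + controls match each ancestry total and that the
    ancestry totals add up to the meta-analysis total. Returns the EAS case
    count implied by subtraction (an inference, not a reported figure)."""
    for ancestry in cases:
        assert cases[ancestry] + controls[ancestry] == totals[ancestry], ancestry
    assert sum(totals.values()) == META_TOTAL
    return META_CASES - sum(cases.values())


implied_eas_cases = check_counts()
print(f"Implied EAS cases: {implied_eas_cases}")
```

All stated totals are internally consistent under this check.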

Table 1 Demographics

**Fig. 1: Stacked Manhattan plots depicting CanUD GWAS results from four ancestries tested.**

Table 2 Lead SNP for each ancestral group

LDSC

Intergroup comparisons between EUR CanUD cohorts (MVP, PGC/deCODE, iPSYCH2) included in the meta-analysis yielded high genetic correlation, with r_G ranging between 0.71 and 0.87. Comparative analysis of CanUD and cannabis use traits with a range of psychiatric and nonpsychiatric traits revealed striking differences, with CanUD showing far stronger overlap with pathological and negative traits (Fig. 2). The largest magnitude difference was in educational attainment, which showed a positive correlation with cannabis use but a negative correlation with CanUD. Covariate LDSC was used to calculate SNP-based heritability within each ancestral group. Significant SNP-based heritability was identified for the three larger ancestries: EUR h² = 6.7% (standard error (s.e.) = 0.017), AFR h² = 8.1% (s.e. = 0.013), and AMR h² = 18.0% (s.e. = 0.042). There was high variance and a high point estimate in AMR. LDSC was used to calculate genetic correlation between cannabis use dependence cohorts included in this meta-analysis and also within MVP phenotype definitions (Supplementary Table 1). Genetic correlations were calculated for 1,335 traits (Fig. 2 and Supplementary Fig. 2). The strongest observed positive correlations were related to smoking initiation and alcohol dependence, while the strongest negative correlations were with ages of first intercourse and smoking cessation.
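As a rough illustration of what "significant SNP-based heritability" means for the estimates above, a point estimate can be divided by its standard error to give a z-score and a two-sided normal P value. This is a minimal sketch, not the authors' procedure: it assumes the standard errors are reported on the proportion scale (so 6.7% is treated as 0.067) and that a simple Wald test is an adequate approximation.

```python
import math

# (h2 estimate, standard error), both treated as proportions.
# Assumption: the s.e. values in the text are on the proportion scale.
h2_estimates = {
    "EUR": (0.067, 0.017),
    "AFR": (0.081, 0.013),
    "AMR": (0.180, 0.042),
}


def wald_test(estimate, se):
    """Return (z, two-sided P) for H0: estimate == 0 under a normal approximation."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p


results = {anc: wald_test(est, se) for anc, (est, se) in h2_estimates.items()}
for anc, (z, p) in results.items():
    print(f"{anc}: z = {z:.2f}, P = {p:.2e}")
```

Under these assumptions all three estimates lie several standard errors above zero, with the AMR estimate carrying the widest interval.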

Cross-ancestry genetic correlation

Genetic correlations were calculated against available traits using POPCORN¹⁹ for CanUD in African ancestry and a selection of traits represented in Fig. 2. When compared to the same traits in EUR, there is no significant difference across ancestries (Supplementary Fig. 3).

Mendelian randomization

Multi-site chronic pain had a unidirectional causal effect on CanUD (inverse variance-weighted (IVW) β = 0.46, P = 2.90 × 10⁻⁵). There was a bidirectional causal effect of CanUD and SCZ (SCZ→CanUD IVW β = 0.17, P = 2.07 × 10⁻⁵, CanUD→SCZ IVW β = 0.17, P = 0.01). CanUD showed a unidirectional effect on lung cancer (IVW β = 0.18, P = 0.006) (Supplementary Fig. 1 and Supplementary Tables 5–13).
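For readers unfamiliar with the inverse variance-weighted (IVW) estimator reported above, the following sketch shows the standard fixed-effect form: a weighted regression through the origin of SNP-outcome effects on SNP-exposure effects. The function name and the toy instrument values are hypothetical and are not taken from the study; real analyses also involve instrument selection, harmonization and sensitivity tests that are omitted here.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Each instrument is weighted by 1 / se_outcome**2; the estimate is the
    slope of a weighted regression through the origin.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Toy instruments (made up): outcome effects are exactly 0.2 x exposure effects.
toy_bx = [0.10, 0.08, 0.12, 0.05]
toy_by = [0.2 * b for b in toy_bx]
toy_se = [0.010, 0.012, 0.015, 0.009]

toy_beta, toy_beta_se = ivw_estimate(toy_bx, toy_by, toy_se)
print(f"IVW beta = {toy_beta:.3f} (s.e. = {toy_beta_se:.3f})")
```

Because the toy outcome effects are an exact multiple of the exposure effects, the estimator recovers that multiple; with real data the estimate is a precision-weighted average of the per-SNP Wald ratios.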

Conditional analysis

For EUR, we performed a multi-trait conditional and joint analysis (mtCOJO) of CanUD conditioned on two smoking traits from the GWAS and Sequencing Consortium of Alcohol and Nicotine use study to investigate potential confounding effects²⁰. Two different datasets were used: smoking initiation and cigarettes per day. Individual runs were performed for the two cigarette smoking traits. Of the 22 original lead SNPs, 18 remained in the dataset following conditioning on smoking initiation (meaning they matched with variants in the conditioning data). For two of the four remaining SNPs, there were proxy SNPs in LD with each lead SNP showing GWS P values. Only rs545943750 and rs184064410 were excluded after conditioning due to missingness in the smoking data, leaving 20 of 22 lead loci from the CanUD GWAS available in the conditional analysis. All 20 remained GWS following conditioning. The results were similar with CanUD conditioning on cigarettes per day, with the same 20 lead loci remaining GWS after conditioning. Conditional analysis with smoking initiation or cigarettes per day did not substantially alter the magnitude of the lead CHRNA2 association (P_cond = 2.14 × 10⁻¹⁴). We used these summary statistics conditioned on cigarette smoking initiation to re-test the causal relationship between CanUD and lung cancer, and while the signal attenuated, it was still significant (IVW β = 0.2, P = 0.0025). The conditional analysis with cigarettes per day, however, removed the effect of CanUD on lung cancer (P = 0.79).

Multi-trait analysis of GWAS

Considering the high genetic correlation of CanUD with alcohol use disorder (AUD) and the Fagerström Test for Nicotine Dependence (FTND), we conducted a multi-trait analysis of GWAS (MTAG) that identified 34 lead SNPs at 26 genomic risk loci, including four novel loci compared to the EUR meta-analysis, at P < 5 × 10⁻⁸ for CanUD (Supplementary Fig. 5 and Supplementary Table 14) when combined with AUD and FTND. The GWAS-equivalent sample size for CanUD was 200,762, augmenting the meta-analysis effective sample size of 161,053 by 20%. Ten genomic risk loci were significant (or in LD with significant variants) in both the GWAS and MTAG analyses. The remaining 16 significant variants were LD independent. The effect size of eight of the 26 significant SNPs in the MTAG analysis was significantly smaller than those obtained from the original GWAS (Supplementary Table 15), suggesting specificity to CanUD.
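The effective sample size quoted above can be approximated from the EUR case and control counts using a common convention for case-control GWAS, N_eff = 4 / (1/N_cases + 1/N_controls). Whether the authors computed it this way (for example, pooled across cohorts or summed per cohort) is an assumption; the sketch simply shows that the pooled formula lands on the reported figure.

```python
def effective_n(n_cases, n_controls):
    """Effective sample size of a case-control design relative to a
    balanced study: 4 / (1/n_cases + 1/n_controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# EUR counts from the GWAS section: 42,281 cases and 843,744 controls.
eur_neff = effective_n(42_281, 843_744)
print(f"EUR effective N ~ {eur_neff:,.0f}")
```

The formula makes explicit why adding controls has diminishing returns: with cases held fixed, N_eff can never exceed four times the case count.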

Transcriptome-wide association study

In TWAS analyses, 59 and 25 genes were detected (P < 2.5 × 10⁻⁶) using adult and fetal brain frontal cortex expression, respectively, with six genes in common (Fig. 3a). We tested these genes by permutation test, leaving 44 and 17 genes using adult and fetal models, with two genes in common (Fig. 3a). For the remaining genes within 1 Mb of one another, we applied gene-level probabilistic fine-mapping. In the end, we detected 36 and 15 genes using the adult and fetal models, which form 90% credible sets (with 90% estimated probability of containing the causal variant) that explain the corresponding genetic associations (Fig. 3a, b). These sets contained only one gene in common: DALR Anticodon Binding Domain Containing 3 (DALRD3) (Fig. 3a, b). The observed gene associations included four distinct GWAS loci: 3p21.31 (gene detected in adult and fetal brain cortex: DALRD3), 5q12.1 (fetal: ERCC8), 11q23.2 (adult: RP11-629G13.1) and 16q22.2 (adult: PHLPP2). Protein functions of these genes are described in the Discussion below. The remaining set of genes identifies 38 candidate novel genetic loci associated with CanUD, with potential underlying transcriptomic mechanisms in either adult or fetal brain cortex (Supplementary Table 3).

**Fig. 3: TWAS and tissue enrichment of the EUR CanUD GWAS variants.**

Partitioned SNP-based heritability

Standardized TWAS effect sizes estimated using adult and fetal brain frontal cortex expression models showed moderate correlation (Spearman’s ρ = 0.54, P < 2.2 × 10⁻¹⁶; Fig. 3c). Accordingly, we next estimated the SNP-based heritability enrichment in adult and fetal brain cortex eQTL. Using LDSC, we estimated enrichment ratios for SNP-based heritability using different windows around expression SNPs for expression genes. We detected significant enrichments only for fetal brain frontal cortex expression SNPs at windows of 0 bp, 50 bp and 100 bp. In general, fetal brain frontal cortex eQTLs were far more enriched for CanUD trait heritability than adult brain cortex eQTLs (Fig. 3d).

gSEM

Using exploratory factor analysis (EFA), a four-factor model fit the data best, with the cumulative variance explained being 0.789, distributed relatively evenly across the four factors, with each accounting for between 22.7% and 29.2% of the overall variance explained (factor 1 of 0.23, factor 2 of 0.19, factor 3 of 0.18 and factor 4 of 0.18). Each of the four factors had high sums of square (SS) loadings (factor 1 SS of 3.5, factor 2 SS of 2.9, factor 3 SS of 2.8 and factor 4 SS of 2.7).
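The relationship between the per-factor proportions and the percentage range quoted above can be reproduced approximately by dividing each factor's proportion of variance by the cumulative variance explained. This is a sketch using the rounded values printed in the text, so the results differ slightly from the reported 22.7% and 29.2%; the dictionary keys are illustrative labels only.

```python
CUMULATIVE_VARIANCE = 0.789  # cumulative variance explained by the four factors

# Proportion of total variance per factor, as rounded in the text.
factor_variance = {
    "factor 1": 0.23,
    "factor 2": 0.19,
    "factor 3": 0.18,
    "factor 4": 0.18,
}

# Share of the explained variance attributable to each factor.
shares = {name: value / CUMULATIVE_VARIANCE for name, value in factor_variance.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of explained variance")
```

The largest and smallest shares fall within rounding distance of the range given in the text.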

Confirmatory factor analysis (CFA) of the four-factor model, which allowed all factors to intercorrelate, yielded a comparative fit index of 0.913, a standardized root mean square residual of 0.068, a chi-squared value of 1397.5 and an Akaike information criterion of 1483.5. Traits loading most strongly on factor 1 included ‘Unable to work’ (loading of 1.06 ± 0.04), Townsend deprivation index (loading of 0.56 ± 0.03), chronic pain (loading of 0.50 ± 0.04) and FTND (loading of 0.45 ± 0.07). Traits loading most strongly on factor 2 included number of sex partners (loading of 0.91 ± 0.02), cannabis use (loading of 0.70 ± 0.03) and initiation of regular smoking (loading of 0.58 ± 0.03). Psychiatric traits loaded most strongly on factor 3 and included major depressive disorder (MDD) (loading of 0.95 ± 0.02), post-traumatic stress disorder (PTSD) checklist score (PCL) total (loading of 0.88 ± 0.04), generalized anxiety disorder symptoms (loading of 0.86 ± 0.03), suicide attempt (loading of 0.59 ± 0.05) and SCZ (loading of 0.29 ± 0.02). SUD traits loaded most strongly on factor 4 and included CanUD (loading of 0.96 ± 0.03), opioid use disorder (loading of 0.85 ± 0.05) and AUD (loading of 0.81 ± 0.03). There were moderate correlations between factors 2 and 4 (r = 0.65), factors 1 and 3 (r = 0.64), factors 1 and 4 (r = 0.52) and factors 3 and 4 (r = 0.53). All correlations and loadings are summarized in Fig. 4.

Discussion

Recently, cannabis use has been legalized in various US states and elsewhere without fully examining the health consequences of individual or societal risks. An epidemiologic survey conducted by the National Survey on Drug Use and Health in the United States identified a past-year cannabis use prevalence of 17.5%, an increase from 11.0% in 2002, and 1.8% with CanUD, the same percentage recorded in 2002. Usage varies worldwide, with many regions of high prevalence²¹.

The findings we report here add to our understanding of CanUD biology on many levels. First, we greatly increased the available sample size for genomic analysis, mostly by incorporating MVP data, and identified multiple novel risk loci in four populations, improving on previous results in EUR by more than an order of magnitude and presenting the first genetic discoveries in the other populations studied. Using the GWAS data, we then showed overlapping genetic liability to other traits. Next, we investigated how genetic variation underlying CanUD influences brain gene expression; the fetal brain in particular showed significant enrichment for SNP-based heritability. Essentially, SNPs that influence fetal brain gene expression explain a greater proportion of CanUD phenotypic variance than the overall GWAS association of all SNPs. We investigated the overlapping and shared underlying genetic architectures of several different traits and employed MR to demonstrate putative causal relationships between outcomes with substantial impact on human health, including an association with lung cancer risk. Cannabis is frequently consumed using methods involving inhaling combustion products, potentially exposing users to risks similar to those found in smoking other substances such as tobacco. Indeed, some of the shared genetic risk between CanUD and tobacco smoking may relate to propensity to smoke per se, independent of substance, a hypothesis that we currently lack the power to evaluate.

We identified 22 significant loci, most of them novel, for CanUD in EUR. We also replicated findings in CHRNA2 (meta P = 7.3 × 10⁻¹⁴, MVP only P = 1.1 × 10⁻⁵) and FOXP2 (meta P = 1.7 × 10⁻⁸, MVP only P = 2.0 × 10⁻³), with triple the effective sample size of the largest of those studies¹⁰, demonstrating once again the stability of GWAS findings as sufficient sample size and power to discover new loci are reached^22,23. We discovered GWS loci in four ancestral groups: EUR, AFR, AMR and EAS. In AFR, two independent SNPs were associated on chromosome 5. The first (rs574008891) was within an intron of the gene that encodes methylcrotonyl-CoA carboxylase subunit 2 (MCCC2). The other significant locus (rs573117193) mapped to an intron in the solute carrier family 36 member 2 (SLC36A2) gene. These specific variants are absent in the other ancestries studied. For AMR, the one risk locus was rare (rs9815757, minor allele frequency (MAF) 0.1%) and mapped in an intergenic region downstream of leucine rich repeat containing 3B (LRRC3B). Finally, for EAS, one locus was associated with CanUD: rs78561048, near semaphorin 6D (SEMA6D). Follow-up analysis in larger samples is needed to assess the robustness of findings, particularly in AMR and EAS. Several variants showed concordant direction of effect across all four stratified ancestral groups (Table 1). For instance, rs10986600, significantly associated in EUR on chromosome 9, was nominally significant (P < 0.05) with the same effect direction in AFR (P = 0.04) and AMR (P = 0.03) and significant in the multi-ancestry meta-analysis. This intronic variant of the protein phosphatase 6 catalytic subunit (PPP6C) is an eQTL for PPP6C, a gene linked to various cancers, including skin melanoma and lung squamous cell carcinoma. Multi-ancestry meta-analysis revealed an additional five loci not identified in the stratified analyses. 
Among them, the lead SNP on chromosome 15, rs147144681, which maps to an intron of the cholinergic receptor nicotinic alpha 3 subunit (CHRNA3) gene, is particularly noteworthy; as reported above, variation in CHRNA2 was among the first variants associated with CanUD and was replicated here. This suggests potential convergence involving the cholinergic system broadly and nicotinic receptors, specifically in the underlying etiology of CanUD. While nicotinic receptors are also associated with tobacco smoking-related traits²⁴, the relative pattern of association for those traits is different from the observations for CanUD—for many smoking-related traits, a chromosome 15 nicotinic receptor cluster is associated with orders of magnitude greater support than other variants, including other nicotinic receptors; for CanUD, CHRNA2 is consistently the strongest association, also by orders of magnitude. We conducted conditional analysis for CHRNA2 and found the conditional P value remained robust following conditioning on smoking initiation²⁰ (P_cond = 4.6 × 10⁻¹⁴). This replicates similar analyses performed by Demontis et al.⁷ and Johnson et al.¹⁰, which showed conditioning on smoking did not affect the CanUD association at this variant. Several other loci near cholinergic receptor subunit genes previously identified for smoking are not significant in our analysis of CanUD (CHRNA4, rs13036436, smoking P = 1.1 × 10⁻²⁹, CanUD P = 0.97; CHRNA5, rs667282, smoking P = 9.9 × 10⁻²⁵, CanUD P = 0.043). Conversely, the CHRNA3 variant we find associated with CanUD is not significant for smoking (rs147144681, smoking P = 0.0033, CanUD P = 3.3 × 10⁻⁸) (ref. ²⁰).

Genetic correlations were calculated for 1,335 traits to identify genetic overlap with CanUD. Some traits with significant r_G were tested for causal inference based on a combination of significant genetic correlation and a priori interest in the phenotype (physical activity, multi-site chronic pain, Alzheimer’s disease and SCZ). We identified a bidirectional causal relationship between CanUD and SCZ. At the same time, the MR Egger analysis indicated this was not due to horizontal pleiotropy. This supports similar findings reported previously, confirming previous genetic–epidemiologic studies²⁵ and verifying an important public health risk associated with CanUD. To highlight differences between cannabis use and CanUD, we compared the pattern of genetic correlations across 18 traits, which showed striking differences. CanUD was much more closely associated with psychopathology, recapitulating a general pattern seen with other comparisons of SUD and use traits²⁶. For example, while we observed a substantial negative correlation between CanUD and educational attainment, cannabis use was associated with greater educational attainment. POPCORN was used to generate a cross-covariance score to allow for comparison of traits across ancestries using genetic correlations for EUR and AFR groups (Supplementary Fig. 3). We found a striking similarity for cross-trait comparisons for both groups, indicating a similar underlying genomic architecture. This finding supports the possibility that some findings uncovered so far for EUR individuals, recruited in vastly greater numbers for genetic study, will provide some degree of generalizability across human populations.

Chronic pain may be a factor driving CanUD in some individuals, with significant unidirectional evidence for a causal effect of chronic pain²⁷ on CanUD in the MR analysis. Cannabis use has been proposed as a treatment for chronic pain, and there are several clinical trials in progress²⁸. This MR observation suggests that there may be merit in cannabis as a treatment for at least some kinds of pain. The small overall beneficial effect observed requires so many individuals to be treated that harmful effects (such as increased CanUD) also become a significant factor²⁹. Our MR results suggesting that chronic pain has a causal influence on CanUD emphasize the need for follow-up investigations that address whether greater consideration should be given to the adverse effects, rather than just the therapeutic effects among individuals receiving cannabis-based medicines. A similar question arises with opioids, which although often prescribed for pain, can also cause great harm³⁰: namely, what level of risk of CanUD is acceptable given cannabis’ potential to improve quality of life and reduce opioid exposure in chronic pain patients? Our results suggest that harms such as dependence and consequences, reflected in underlying genetics of the trait, may need to be weighed against the potential benefits of cannabis treatment for chronic pain. Future studies should consider this novel relationship to pain³¹ and clinical efficacy trials are underway.

Cigarette smoking substantially increases the risk of many forms of cancer, including lung cancer, through numerous well-studied mechanisms with established literature dating back more than 60 years³². The influence of cannabis on cancer risk is less well understood; it should be anticipated that these combustion products could have harmful pulmonary impacts—indeed, it would be surprising if smoking tobacco, but not smoking cannabis, increased cancer risk. MR yielded evidence for a unidirectional causal effect of CanUD on lung cancer. This result was robust to conditioning on data from the largest available smoking initiation GWAS but not conditioning on cigarettes per day, both traits that also have causal relationships with lung cancer but far more robust genetic instruments to evaluate this relationship. We do not currently have a way to assess genetic variation associated with the route of cannabis administration, but combustion is by far the most common method in the MVP and other cohorts studied. Given the trend toward increased legalization and usage, this apparent causal association needs to be monitored as it may have profound and underappreciated public health consequences. As the causal relationship with CanUD was not robust to conditioning on cigarettes per day, one probable explanation may be that there is horizontal pleiotropy between these traits in their influence on lung cancer.

Four GWS loci overlapped with TWAS prioritization from the EUR meta-analysis, using eQTL integration from samples of adult³³ and fetal³⁴ cortical tissue. These were DALRD3 (both fetal and adult), ERCC8 (fetal), RP11-629G13.1 (adult) and PHLPP2 (adult). The DALRD3 protein product, a DALR anticodon binding domain, forms a complex with the product of METTL2B. Nonsense mutations in DALRD3 are associated with developmental delay and early-onset epileptic encephalopathy³⁵. ERCC8 encodes the excision repair 8, CSA ubiquitin complex subunit, which plays a role in DNA repair and is associated with the developmental disorder Cockayne syndrome³⁶, as well as breast, esophageal and other cancers^37,38. RP11-629G13.1 is a long noncoding RNA associated with downregulation of NCAM1 gene expression in multiple myeloma patients³⁹. Significant partitioned SNP-based heritability was observed in fetal but not in the adult cortex, with 4.36% of trait SNP-based heritability explained by 0.12% of the total SNPs near fetal frontal cortex eQTLs. Only 1.77% of CanUD SNP-based heritability was explained using 0.13% of the total SNPs near adult cortex eQTLs. Fetal development may play a role in SUD susceptibility⁴⁰, and substance use can influence fetal development during pregnancy and health outcomes during childhood⁴¹. Although exogenous exposure to cannabis may not occur until years or decades after birth, enriched fetal SNP-based heritability in this study argues a possible role for genetic effects on CanUD in the developing brain independent of exposure. SCZ risk is also modulated by risk factors during fetal development⁴² and genetic⁴³ and environmental effects (including maternal food deprivation in the first trimester of pregnancy⁴⁴). 
Temporal convergence of the initiation of genetic risk effects for both SCZ and CanUD, if validated experimentally, would provide insight into the genetic relationship between these disorders and could relate to a mechanism for the bidirectional risk relationship between cannabis use and SCZ.
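The partitioned heritability figures quoted above can be expressed as enrichment ratios, using the usual stratified LDSC definition of enrichment as the proportion of SNP-based heritability in an annotation divided by the proportion of SNPs in that annotation. The sketch below only rearranges the percentages given in the text; it does not reproduce the significance tests, and the function name is illustrative.

```python
def enrichment(prop_h2_percent, prop_snps_percent):
    """Enrichment = (% of SNP-based heritability) / (% of SNPs) for an annotation."""
    return prop_h2_percent / prop_snps_percent


# Percentages as reported for SNPs near frontal cortex eQTLs.
fetal_enrichment = enrichment(4.36, 0.12)  # fetal cortex
adult_enrichment = enrichment(1.77, 0.13)  # adult cortex

print(f"Fetal cortex eQTL enrichment: {fetal_enrichment:.1f}x")
print(f"Adult cortex eQTL enrichment: {adult_enrichment:.1f}x")
```

On these numbers the fetal annotation carries roughly two to three times the per-SNP heritability of the adult annotation, which matches the qualitative contrast drawn in the text.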

gSEM was used to contextualize summary statistics from this project with those from other published GWAS studies. Exploratory and confirmatory factor analyses showed that four factors provide the best fit for the 14 correlated traits included in the analysis. Factors fit mostly into categories that relate to functional impairment (factor 1), impulsivity and risk taking (factor 2), psychopathology (factor 3) and substance dependence (factor 4). CanUD fit best (and strongly) in the substance dependence cluster (factor 4). FTND fit into factors for functional impairment and substance dependence. Suicide attempts fit into functional impairment, impulsivity/risk taking and psychopathology. This is consistent with research showing overlapping pathologies within addiction and shared genetic risk factors between them⁴⁵.

This study has limitations. The use of electronic health records allows for a large sample of CanUD cases but limits the assessment of subdiagnostic cannabis use in controls. Although we accounted for subdiagnostic cannabis users by excluding them from controls when information was available, these are probably underreported. Future studies of individuals with ascertained cannabis use who do not meet criteria for CanUD would provide more insight into the specific genetic liability to dependence. As the traits of interest were gathered from previously published reports or queries of electronic health records (EHRs) for diagnostic codes, we did not have information regarding tetrahydrocannabinol (THC) blood levels or information on the potency of cannabis at each exposure. If these data were available, study of the effects of cannabis potency on dependence and comorbidities would be of great interest. We identified a causal relationship between multi-site chronic pain and CanUD. As pain is a complex trait and different types of pain may interact differently with CanUD, our finding for multi-site chronic pain is not sufficient to draw conclusions about the interaction between CanUD and specific kinds of pain or pain syndromes. Our definition of CanUD was based on any report of abuse or dependence either as an inpatient or outpatient. Participants in this study span a period of changing legal status and increasing use of marijuana, a major secular trend. Given the age of the participants (Supplementary Table 16) and expected time from initial exposure to the development of a use disorder, nearly all participants would have been exposed to cannabis before legalization. The TWAS study did not include ascertainment for CanUD in the individuals who donated brain tissue used for analysis. We discovered GWS loci in ancestral groups, but AFR, AMR and EAS sample sizes were small compared to EUR. 
We did not perform MR or TWAS analyses in non-European samples because available GWAS and eQTL datasets are still limited in non-European ancestry populations, and cross-ancestry analyses carry risk of biases due to differences in the underlying LD structure between ancestries. More studies are needed of individuals of diverse ancestries to replicate these findings, estimate their robustness and ensure that the benefits provided by these studies are available to all people.

This is the largest genetic study of CanUD so far, including data from multiple international cohorts in more than one million participants and comprising four ancestral groups. We replicate two prior GWS findings while identifying 25 novel loci, and we leverage these novel data to investigate genetic overlap with other traits. We identify a clear difference between cannabis use and CanUD, with genetic liability to CanUD being much more closely associated with psychopathology and disability. We found greater heritability enrichment in fetal than adult brain tissue, supporting an important role of development in laying the biological basis for CanUD. We used MR to assess causal relationships and found evidence of bidirectional causal effects between CanUD and SCZ and unidirectional effects of multi-site chronic pain on CanUD, and of CanUD on lung cancer. Finally, using gSEM, we found that CanUD loads on a latent factor with other substance dependence traits, consistent with clinical observation, genetic epidemiology and prior genetic studies of other SUD traits. In particular, we highlight the possible relationship revealed herein between CanUD and lung cancer risk. This study yields new insights into the genetic architecture of CanUD and how this risk interacts with traits crucial to public health and raises important concerns regarding the potential adverse consequences of the secular trend toward increased cannabis use consequent to legalization.

Methods

Inclusion and ethics statement

We included researchers from the iPSYCH biobank and the PGC, who played a role in study design. This research was not restricted or prohibited in the setting of any of the included researchers. All studies were approved by local institutional research boards and ethics review committees. MVP was approved by the Veterans Affairs central institutional research board. We do not believe our results will lead to stigmatization, incrimination, discrimination or personal risk to participants.

Cohorts

We used data release version 4 of the MVP. Linked and de-identified EHRs were queried using the Veterans Affairs Informatics and Computing Infrastructure to identify individuals with International Classification of Diseases (ICD) codes for cannabis dependence or cannabis abuse (together, CanUD) (Supplementary Tables 2 and 3). The range of diagnosis dates was between May 1992 and December 2019. Two classifications were investigated: (1) cases identified by at least two separate outpatient visits or any number of inpatient visits to a US Veterans Affairs (VA) medical center for CanUD and (2) cases identified by at least one inpatient or outpatient visit for CanUD. Genetic correlation analysis indicated that these traits were almost identical from a genetic perspective (r_G = 0.99) and SNP-based heritability (h²) was not statistically different (definition 1, h² = 0.075, s.e. 0.0053, z = 14.1; definition 2, h² = 0.087, s.e. 0.0062, z = 14.0; P_diff = 0.14), so case definition per the second classification was retained for further analysis (that is, at least one inpatient or outpatient visit). All individuals diagnosed under the first disease definition were also diagnosed under the second more inclusive definition. Controls were defined as individuals without any VA EHR ICD codes for cannabis dependence, cannabis abuse or cannabis use (cannabis use codes included in ICD-9: 305.29 and included in ICD-10: F12.90, F12.920, F12.921, F12.922, F12.929, F12.93, F12.950, F12.951, F12.959, F12.980, F12.988 and F12.99). The PGC cohort was as previously described and was made up of 16 cohorts with varying phenotype definitions and ascertainments¹⁰. A leave-one-out analysis was performed to remove the iPSYCH1 sample, leaving 18,370 cases and 304,838 controls for European and African ancestries in the remaining PGC/deCODE sumstats. An updated expanded iPSYCH2 cohort was then added via meta-analysis (4,733 cases and 95,657 controls, all EUR).
We also included samples from MGB Biobank (456 cases and 24,088 controls, all EUR) and new data from the Yale–Penn cohort⁴⁶ beyond the individuals already included in the PGC study (an additional 310 cases and 1,471 controls for EUR, and 271 cases and 666 controls for AFR). Table 1 gives numbers for each cohort.
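The comparison of the two case definitions can be illustrated with a simple two-sided z-test on the difference between the two h² estimates. This is a minimal sketch, not necessarily the procedure used here: it assumes the two estimates can be treated as independent (the two case sets in fact overlap), and the function name `h2_difference_p` is hypothetical.

```python
import math

def h2_difference_p(h2_a, se_a, h2_b, se_b):
    """Two-sided z-test for the difference between two estimates.

    Assumption: the estimates are treated as independent, so the variance
    of the difference is the sum of the two squared standard errors.
    """
    z = (h2_b - h2_a) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values reported in the text for case definitions 1 and 2
z, p = h2_difference_p(0.075, 0.0053, 0.087, 0.0062)
print(f"z = {z:.2f}, P_diff = {p:.2f}")  # z = 1.47, P_diff = 0.14
```

Under these assumptions the sketch gives the same P_diff of 0.14 as reported above.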

MVP genotyping, imputation, quality control, and GWAS and meta-analysis

Genotyping and imputation of MVP participants have been described previously¹¹. Briefly, a customized Affymetrix Axiom Array was used for genotyping. MVP genotype data for biallelic SNPs were imputed using Minimac4 and a reference panel from the African Genome Resources panel by the Sanger Institute. Indels and complex variants were imputed independently using the 1000 Genomes (1KG) phase 3 panel and merged in an approach similar to that employed by the UK Biobank. Designation of broad ancestries was based on genetic assignment with comparison to 1KG reference panels⁴⁷.

MVP GWAS was conducted using logistic regression in PLINK 2.0 using the first ten principal components, sex and age as covariates. Variants were excluded if call missingness in the best-guess genotype exceeded 20%. Alleles with MAF <0.1% were excluded in EUR, AFR and AMR. Alleles with MAF <1% were removed from EAS due to smaller sample size. The MVP data represented the largest and most diverse cohort with 22,260 cases and 423,587 controls (EUR), 14,946 cases and 97,580 controls (AFR), 2,774 cases and 35,515 controls (AMR) and 194 cases and 6,649 controls (EAS) (Table 1). GWAS meta-analyses in the PGC datasets of the deCODE and PGC samples were conducted as previously described, although a leave-one-out analysis was conducted to remove data from iPSYCH1 so that a larger cohort could be independently analyzed¹⁰. This leave-one-out PGC meta-analysis contained 14,522 EUR cases and 298,941 controls and 3,848 AFR cases with 5,897 controls. This study includes new genotypes from iPSYCH (referred to as iPSYCH2), and all iPSYCH data (iPSYCH1 + 2) has been reprocessed. Pre-imputation quality control and imputation were performed on genotypes from the full set of genotyped individuals for iPSYCH1 and iPSYCH2 separately, using standard procedures for GWAS data. The iPSYCH1 samples were genotyped in 23 genotyping waves and thus additional steps were taken to eliminate potential batch effects. Only variants present in more than 20 waves and with no significant association with wave status were retained. Imputation was done using the pre-phasing/imputation stepwise approach implemented in EAGLE v2.3.5⁴⁸ and Minimac⁴⁹, using the Haplotype Reference Consortium⁵⁰ panel v1.0.
GWAS of 4,733 EUR cases and 95,657 controls was done on a merged set of best-guess genotypes with MAF >0.01 and imputation info score >0.8 (in both iPSYCH1 and iPSYCH2) using logistic regression with appropriate covariates (age, sex, psychiatric diagnoses (attention deficit hyperactivity disorder, autism spectrum disorder, SCZ, bipolar disorder and MDD), first ten principal components and iPSYCH cohort of origin). A new Yale–Penn tranche was analyzed using PLINK 1.9 in unrelated individuals not previously included in any other GWAS or meta-analysis. This contributed 310 cases and 1,471 controls (EUR) and 271 cases and 666 controls (AFR). Finally, MGH Partners BioBank⁵¹ contributed 456 cases and 24,088 controls (EUR).
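As a consistency check, the per-cohort case and control counts quoted in this section can be tallied by ancestry. The sketch below uses only numbers stated in the text; the dictionary layout and the `tally` helper are illustrative assumptions rather than part of the study pipeline.

```python
# (cases, controls) per contributing cohort, as reported in the text
cohorts = {
    "EUR": {
        "MVP": (22_260, 423_587),
        "PGC/deCODE (leave-one-out)": (14_522, 298_941),
        "iPSYCH2": (4_733, 95_657),
        "MGB Biobank": (456, 24_088),
        "Yale-Penn (new tranche)": (310, 1_471),
    },
    "AFR": {
        "MVP": (14_946, 97_580),
        "PGC (leave-one-out)": (3_848, 5_897),
        "Yale-Penn (new tranche)": (271, 666),
    },
    "AMR": {"MVP": (2_774, 35_515)},
    "EAS": {"MVP": (194, 6_649)},
}

def tally(ancestry):
    """Return (total cases, total controls) for one ancestry group."""
    counts = cohorts[ancestry].values()
    return sum(c for c, _ in counts), sum(n for _, n in counts)

for ancestry in cohorts:
    cases, controls = tally(ancestry)
    print(f"{ancestry}: {cases:,} cases, {controls:,} controls")
# EUR: 42,281 cases, 843,744 controls
# AFR: 19,065 cases, 104,143 controls
# AMR: 2,774 cases, 35,515 controls
# EAS: 194 cases, 6,649 controls
```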

EUR cohorts were combined in a GWAS meta-analysis (Table 1). For AFR, we performed meta-analysis between the MVP, PGC and Yale–Penn cohorts. For AMR and EAS, only MVP included data so no meta-analysis was possible within these ancestries. GWAS meta-analyses were conducted using inverse variance weighting in METAL⁵² for both EUR and AFR. For within-ancestry meta-analyses, there were 42,281 EUR cases with 843,744 controls, and 19,065 AFR cases with 104,143 controls. The multi-ancestry meta-analysis⁵³ included 1,044,620 total participants of EUR, AFR, AMR and EAS ancestries. Sex-stratified analysis was conducted in the MVP, the only cohort for which individual-level GWAS data were available for this analysis (Supplementary Fig. 7).
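The idea behind inverse-variance weighting can be sketched for a single variant as follows. This is a minimal fixed-effect illustration, not METAL's implementation; the function name `ivw_meta` and the per-cohort effect sizes and standard errors are hypothetical values chosen only to show the arithmetic.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    Each cohort's effect estimate (log odds ratio) is weighted by 1 / se^2,
    so more precise cohorts contribute more to the pooled estimate.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p

# Hypothetical per-cohort log odds ratios and standard errors for one SNP
betas = [0.040, 0.055, 0.030]
ses = [0.010, 0.020, 0.025]

beta, se, z, p = ivw_meta(betas, ses)
print(f"pooled beta = {beta:.4f}, se = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

The pooled standard error is smaller than that of any single cohort, which is why combining cohorts increases power to detect GWS loci.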

LDSC and SNP-based heritability

LDSC was used to calculate SNP-based heritability on the liability scale, using a lifetime population prevalence⁵⁴ of 2% and a sample prevalence of 5% for EUR, 13.2% for AFR, and 7.2% for AMR within the MVP⁵⁵. We used the lifetime population prevalence reported in the PGC/deCODE/iPSYCH1 cannabis paper¹⁰ for comparability. Typically, calculating SNP-based heritability depends on reliable reference ancestry to account for nonindependence of some variance due to LD. This is easily done for EUR, but admixed non-European ancestries pose a statistical challenge. Covariate LDSC¹² uses sample covariates such as those derived from principal components analysis (a dimension reduction technique that produces eigenvalues for each variant) carried out in the study sample to adjust LD scores to enable calculation of SNP heritability in each ancestry using sample-specific LD scores. LDSC as implemented by the Complex Traits Genomics Virtual Lab⁵⁶ was used to estimate genetic correlations⁵⁷ to identify common genetic architecture across all 1,335 traits available for comparison. Additionally, LDSC was used to compare genetic correlations between CanUD and cannabis use (from a previously published study¹⁸).
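The conversion of an observed-scale h² to the liability scale can be sketched with the standard liability-threshold transformation, which rescales by the population prevalence K and the sample prevalence P. The text does not spell out the formula, so this is an assumption about the usual approach rather than a description of the exact computation performed; the function name and the observed-scale value below are hypothetical.

```python
from statistics import NormalDist

def liability_scale_factor(pop_prev, sample_prev):
    """Factor converting observed-scale h2 (case-control) to liability scale.

    factor = K^2 (1 - K)^2 / (P (1 - P) z^2), where K is the population
    prevalence, P the sample prevalence and z the standard normal density
    at the liability threshold for prevalence K.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - pop_prev)
    z = std_normal.pdf(threshold)
    k, p = pop_prev, sample_prev
    return (k ** 2 * (1 - k) ** 2) / (p * (1 - p) * z ** 2)

K = 0.02  # lifetime population prevalence used in the text
sample_prevalence = {"EUR": 0.05, "AFR": 0.132, "AMR": 0.072}  # MVP, from the text

h2_observed = 0.02  # hypothetical observed-scale estimate, for illustration only
for ancestry, P in sample_prevalence.items():
    factor = liability_scale_factor(K, P)
    print(f"{ancestry}: factor = {factor:.2f}, liability h2 = {h2_observed * factor:.3f}")
```

Because the factor depends on P, the same observed-scale estimate maps to different liability-scale values in cohorts with different case fractions, which is why a sample prevalence is specified per ancestry.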