Introduction

Cannabis is the most popular illicit drug worldwide, with approximately 183 million individuals reporting use in the past year [1]. In 2016, 35.6% of US youth in 12th grade reported cannabis use, with 6% reporting daily or almost daily use [2]. Concurrent with these trends, the perceived harms of cannabis have decreased while its societal acceptance has increased [3]. Thus, associations between cannabis use and health outcomes are of great public health importance.

One proposed risk of chronic cannabis use is alterations in brain structure, especially in adolescents and young adults. Substantial neurodevelopment occurs from adolescence to the mid-20s, including increased myelination, accelerated synaptic pruning, decreased gray matter volume, increased white matter volume, and maturation of prefrontal regions and associated neural circuitry [4, 5]. The endocannabinoid system, including CB1 receptors involved in the psychoactive response to cannabis, has been implicated in adolescent neurodevelopmental changes, such as regulating dendrite arborization and refining excitatory synaptic connections [6]. Thus, there are increasing concerns about whether cannabis use during adolescence may disrupt normative trajectories of brain development.

Recent systematic reviews have concluded that frequent cannabis use in adolescence and early adulthood is associated with abnormalities in brain structure [7, 8]. However, despite accumulating research, consistent findings in this area have remained elusive. Generally, studies have focused on brain regions with high densities of CB1 receptors [9], including subcortical structures such as the basal ganglia, hippocampus, and amygdala, as well as the cerebellum, cingulate cortex, and prefrontal cortex. Several studies report associations between frequent cannabis use in adolescents and young adults and reductions in hippocampal volumes [10,11,12]. However, other studies do not replicate these reductions [13,14,15,16,17], including one longitudinal study [18]. Similarly, orbitofrontal cortex volumes have been examined, with mixed results [19, 20]. While three studies found larger volumes of cerebellar structures in adolescent frequent cannabis users [13, 21, 22], three report equivocal findings or decreases in cerebellar volumes [17, 23, 24]. Frequent cannabis users may also have thinner prefrontal cortex [15, 25], although several studies have not replicated these findings [14, 26, 27]. Finally, inconsistent results are apparent in amygdala, striatum, and cingulate cortex [10, 13, 15], despite their high density of CB1 receptors.

These inconsistencies may be due to differences in sampling, design, and analytic techniques. For example, small sample sizes have plagued neuroscience until recently, resulting in suboptimal statistical power and reducing the precision and reliability of findings [28]. Brain imaging studies of cannabis users are not exempt from this critique. In addition, there is heterogeneity in how studies define “problematic” cannabis use, with definitions ranging from a diagnosis of cannabis use disorder to criteria based on number of lifetime uses or recent frequency of use. Another factor contributing to inconsistency may be the application of numerous neuroimaging data processing pipelines and analytic techniques across the literature, combined with some opacity about how such differences may affect results [29, 30]. For example, many studies apply a region of interest (ROI) approach, which may bias results (and systematic reviews) without correction for multiple comparisons, especially if ROIs were not selected a priori.

In the current study, we leverage a large sample of adolescents and young adults ascertained in the Philadelphia Neurodevelopmental Cohort [PNC; 31] to examine structural brain differences related to levels of cannabis use. Replicating prior work with larger samples is increasingly recognized as essential, especially with rapid policy shifts regarding cannabis. However, we also advance prior research in several ways. First, we examine both frequent and occasional cannabis users, with use criteria informed by prior research. Since the majority of cannabis users are not daily or even almost daily users, this group of occasional users is of scientific interest both as an understudied but germane sample and to allow for examination of dose–response effects. Second, to reduce variability from analytic techniques, we apply robust quality control procedures [32] and implement two well-validated automated image analysis tools. Third, we add parameters informative for understanding brain development, such as gray matter density [GMD; 33]. Finally, we use a community-based sample not enriched for substance use or psychopathology, enhancing generalizability.

Materials and methods

Participants

The PNC is a single-site, community-based study of 9498 youths aged 8–22. Data from the initial cross-sectional sample, reported here, were collected from November 2009 to December 2011. For extensive details on the PNC, see refs [31, 34]. Importantly, selection was not based on psychiatric or substance use symptoms. PNC inclusion criteria were living in the tristate area of Pennsylvania, New Jersey, and Delaware; proficiency in English; and ability to provide informed consent/assent. Exclusion criteria were significant developmental delays or physical conditions that would interfere with participation in assessments. Criteria were intentionally broad in order to enhance generalizability. Participants who received neuroimaging were excluded for standard MRI contraindications. Participants and their guardians (for participants under 18) provided written informed consent/assent. Institutional Review Boards at the University of Pennsylvania and the Children’s Hospital of Philadelphia approved the protocol. See Figure S1 for a flowchart of sample construction.

Substance use and psychopathology assessment

PNC substance use assessments were previously reported in detail [35]. Briefly, most participants received an abbreviated version of the Minnesota Center for Twin and Family Research self-report substance use assessment [36], which was privately self-administered on a laptop computer. The measure assessed lifetime use of several substances; for cannabis and alcohol, additional questions queried age at first use, frequency of past year use, and methods of access. In the initial phase of the project, prior to implementing the self-report measure, the Kiddie-Schedule for Affective Disorders and Schizophrenia (K-SADS) substance screening interview was administered (6% of participants). This screener was subsequently replaced by the self-report measure to accommodate high participant volume and reduce social desirability biases. Since the screener did not query frequency of use, here we only include the subset of participants who denied cannabis use from those administered the K-SADS screener. To further reduce social desirability biases, participants were assessed separately from collaterals (e.g., parents) and informed that information reported would be kept confidential except as legally required. Participants endorsing use of fake drugs (e.g., “cadrines”) were excluded from analyses (n = 8), similar to previous work [35].

Analyses were conducted in participants between 14 and 22 years old (n = 781) given limited cannabis use under age 14 [see ref. 35]. Informed by prior work from our group [35] and others [37, 38], we divided cannabis users into frequent users (≥ “3–4 times per week”; n = 38) and occasional users (≤ “1–2 times per week”; n = 109). Information regarding abstinence and urine toxicology was not acquired. To examine associations with cannabis from cumulative recent use, as opposed to remote use, we only examined cannabis users who endorsed use over the past year, removing 62 participants from analysis.

As described previously [31], a computerized K-SADS collected information on symptoms, duration, distress, and impairment for lifetime mental health symptoms. Empirically-derived psychopathology factor scores were generated from this measure to parse psychopathology into symptom dimensions (see Supplementary Methods and ref. [39]).

Neuroimaging acquisition

A subset of the PNC (n = 1601) received structural MRI, as previously reported [34]. Imaging data were acquired on a single MRI scanner (Siemens 3T TIM Trio, Erlangen, Germany; 32-channel head coil) using the same sequences for all participants. A magnetization‐prepared, rapid acquisition gradient‐echo (MPRAGE) T1‐weighted image was acquired with the following parameters: TR 1810 ms; TE 3.51 ms; FOV 180 × 240 mm; matrix 192 × 256; 160 slices; slice thickness/gap 1/0 mm; TI 1100 ms; flip angle 9°; effective voxel resolution of 0.93 × 0.93 × 1.00 mm; total acquisition time 3:28 min.

Three highly-trained image analysts independently assessed structural image quality, as previously described in detail [32]. Briefly, prior to rating images, three raters were trained to >85% concordance with faculty consensus ratings on an independent training sample of 100 images. T1 images were rated on a 0–2 Likert scale (0 = unusable images [3.1%]; 1 = usable images with some artifact [16.9%], and 2 = images with none or almost no artifact [80.0%]). Images with ratings of 0 were excluded. Average quality rating across the three raters was included as a covariate in all models to control for the confounding influence of subtle variation in image quality.

Structural image processing

We evaluated multiple measures of brain structure including cortical thickness (CT), volume, and GMD. Image processing and analysis were primarily conducted with Advanced Normalization Tools (ANTs). We also performed supplementary processing of images with FreeSurfer to evaluate the robustness of results.

ANTs image processing

CT was quantified using tools in ANTs [40]. To avoid registration bias and maximize sensitivity to detect regional effects that can be impacted by registration error, a custom adolescent template and tissue priors were created using data from 140 PNC participants, balanced for age and sex. Structural images were processed and registered to this custom template using the ANTs cortical thickness pipeline [41], which includes brain extraction, N4 bias field correction [40], Atropos tissue segmentation [42], SyN diffeomorphic registration [43], and diffeomorphic registration-based CT (DiReCT) estimation in volumetric space [44]. Regional CT was computed by averaging CT estimates over anatomically-defined regions, as defined below.

To parcellate the brain into anatomically-defined regions, we used an advanced multi-atlas labeling approach. Specifically, 24 young adult T1-weighted volumes from the OASIS data set [45], manually labeled by Neuromorphometrics, Inc., were registered to each subject’s T1-weighted volume using SyN diffeomorphic registration [43]. Using multiple manually-labeled atlases yields more accurate estimates of anatomical regions compared to manually-labeled images [46]. Label sets were synthesized into a final parcellation via joint label fusion [46]. To increase tissue specificity, volume was determined for each parcel using the intersection between the parcel created and a prior-driven gray matter cortical segmentation from the ANTs cortical thickness pipeline.

Finally, GMD was calculated via Atropos [42], with an iterative segmentation procedure initialized using 3-class K-means segmentation. This procedure produces both a discrete 3-class hard segmentation and a probabilistic GMD map (soft segmentation) for each subject. GMD was calculated within the intersection of this 3-class segmentation and the subject’s volumetric parcellation [33]. Importantly, this method is distinct from methods in most prior studies that use GMD interchangeably with voxel-based morphometry analyses [e.g., 47], which display relationships closer to volumetric analyses than to density parameters [33].

Supplementary FreeSurfer image processing

Cortical reconstruction of T1 images was performed using FreeSurfer version 5.3 [48]. See Supplementary Methods for detailed procedures.

Group-level analyses

Group differences in structural imaging variables were explored using ANOVAs and Kruskal–Wallis tests. In all models, we controlled for potentially confounding variables including race, sex, overall psychopathology, average quality rating, and nonlinear effects of age. Nonlinearities were modeled using generalized additive models (GAMs) with penalized splines as implemented in the ‘mgcv’ R package. Interactions between group and age, including quadratic and cubic expansions of age, were explored in a similar framework. We examined group differences for global (mean GMD, total brain volume [TBV], cortical thickness), lobar (frontal, temporal, occipital, parietal, insular, limbic), and regional measurements. To follow up significant omnibus ANOVAs, pairwise relationships were examined using t-tests. False discovery rate (FDR) correction was used to account for multiple comparisons throughout. Analyses were conducted in R version 3.3.
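
As an illustration of how FDR correction operates on a family of tests, the following minimal Python sketch implements the Benjamini-Hochberg step-up adjustment. It is not the analysis code used in this study (which was written in R), and the text does not state which FDR procedure was applied, so the choice of Benjamini-Hochberg is an assumption. The function name, the family size of 12, and all p-values other than 0.01 and 0.04 are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical family of 12 tests. Only 0.01 and 0.04 echo nominal values
# of the kind reported in this paper; every other p-value is invented.
p_values = [0.01, 0.04, 0.20, 0.35, 0.41, 0.48,
            0.55, 0.62, 0.70, 0.81, 0.90, 0.97]
q_values = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in q_values]
```

In this invented family, the two nominally significant p-values are adjusted to 0.12 and 0.24, so neither survives at Q = 0.05. The outcome depends on the number of tests in the family and on the remaining p-values, which are assumptions here.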

Follow-up nonparametric, data-driven analyses were conducted to probe for pairwise group differences while limiting the potential influence of non-normal distributions or outliers. Mean differences were estimated across 10,000 bootstrap folds as implemented by the ‘boot’ package in R. The ‘boot’ package allows for resampling while preserving group proportions, yielding 10,000 samples with roughly equivalent group distributions. Studentized 95% confidence intervals were then obtained. To account for multiple comparisons, p-values were obtained for every confidence interval, as previously detailed [49], with FDR correction.
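
The resampling scheme can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the R ‘boot’ code used in the study: it resamples each group separately (which keeps group sizes fixed) and reports a percentile interval rather than the Studentized interval described above. All function names and the simulated data are hypothetical, and fewer resamples are drawn than the 10,000 used in the study to keep the example fast.

```python
import random
import statistics


def stratified_bootstrap_diff(group_a, group_b, n_boot=10_000, seed=0):
    """Bootstrap the difference in means, resampling within each group."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Sampling with replacement inside each group preserves group sizes.
        sample_a = rng.choices(group_a, k=len(group_a))
        sample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    return diffs


def percentile_interval(values, level=0.95):
    """Percentile interval (a simpler stand-in for a Studentized interval)."""
    ordered = sorted(values)
    lower_idx = int((1 - level) / 2 * (len(ordered) - 1))
    upper_idx = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lower_idx], ordered[upper_idx]


# Simulated standardized values; group sizes and data are invented.
data_rng = random.Random(1)
non_users = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
frequent_users = [data_rng.gauss(0.0, 1.0) for _ in range(38)]

observed = statistics.fmean(non_users) - statistics.fmean(frequent_users)
diffs = stratified_bootstrap_diff(non_users, frequent_users, n_boot=2000)
lower, upper = percentile_interval(diffs)
```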

As described below, these analyses revealed predominantly non-significant effects. Since non-significant tests do not necessarily support null results, we performed follow-up equivalence tests to examine whether the presence of effects of a particular magnitude could be statistically rejected, allowing for greater specificity in defining the magnitude of potential group differences. Two one-sided t-tests (TOSTs) evaluated equivalence between each pairwise comparison [50] as implemented in the ‘equivalence’ package in R. TOSTs require an upper and lower bound effect size. Because TOSTs require larger sample sizes [50], effect sizes conventionally considered to be medium magnitude were first examined, setting our lower and upper equivalence bounds at d = −0.5 and 0.5, respectively. Follow-up analyses used an effect size boundary from d = −0.3 to 0.3 to compare occasional and non-users (frequent user comparisons were not conducted at this boundary due to limited power). For each comparison, two one-sided t-tests were run, one testing whether the effect was larger than the lower bound and the other whether it was smaller than the upper bound. In these tests, the null hypothesis is non-equivalence, or the presence of an effect.
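
The logic of the TOST procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the R ‘equivalence’ package used in the study: it uses a large-sample normal approximation in place of the t distribution, converts the bounds from Cohen's d to raw units with the pooled standard deviation, and runs on toy data. The function name and the data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def tost_equivalence(group_a, group_b, d_bound=0.5, alpha=0.05):
    """Two one-sided tests for equivalence within +/- d_bound (Cohen's d).

    Uses a normal approximation, which is an assumption of this sketch.
    """
    n_a, n_b = len(group_a), len(group_b)
    diff = fmean(group_a) - fmean(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    pooled_sd = math.sqrt(pooled_var)
    se = pooled_sd * math.sqrt(1 / n_a + 1 / n_b)
    bound = d_bound * pooled_sd  # equivalence bound in raw units
    std_normal = NormalDist()
    # Test 1: null is diff <= -bound; a small p means diff exceeds the lower bound.
    p_lower = 1 - std_normal.cdf((diff + bound) / se)
    # Test 2: null is diff >= +bound; a small p means diff is below the upper bound.
    p_upper = std_normal.cdf((diff - bound) / se)
    # Equivalence is declared only if both one-sided nulls are rejected.
    p_tost = max(p_lower, p_upper)
    return {"d": diff / pooled_sd, "p_tost": p_tost, "equivalent": p_tost < alpha}


# Toy data with identical means in both groups (values are invented).
group_a = [i % 10 for i in range(400)]
group_b = [i % 10 for i in range(100)]

medium_bounds = tost_equivalence(group_a, group_b, d_bound=0.5)
narrow_bounds = tost_equivalence(group_a, group_b, d_bound=0.05)
```

With these toy groups, equivalence is supported at d = ±0.5 but not at the much narrower d = ±0.05, which illustrates why tighter bounds demand larger samples.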

Supplementary analyses

We conducted several supplementary analyses to examine whether potentially confounding variables may have influenced observed results. First, we conducted analyses of global, lobar, and regional measurements with additional covariates. In the first, item-wise alcohol variables were included as covariates (see Supplementary Methods). Alcohol data were unavailable for n = 35 participants. In the second, estimated IQ, as assessed by the Reading Subtest of the Wide Range Achievement Test-4 [51], was included as a covariate. Finally, we used propensity score matching [52] to match all cannabis users to a subsample of non-users on age and sex and re-ran all analyses (see Supplementary Methods).

Results

Characteristics

Demographic, substance use, and psychopathology characteristics are summarized in Table 1. Cannabis groups were older and included higher proportions of males than non-users, and frequent users had a higher proportion of males than occasional users. Groups were similar in race. Compared to non-users, frequent users evidenced lower estimated IQs, although all groups had estimated IQs within the average range. Both user groups reported more frequent alcohol use and higher overall psychopathology than non-users.

Table 1 Demographic and substance use characteristics of the sample, by cannabis use group

Structural imaging analyses

Group differences were non-significant across ranges of anatomical specificity. There were no significant group differences in any global metric (TBV, total white matter volume, total gray matter volume, mean CT, mean GMD) from the ANTs processing pipeline (see Figs. 1 & 2) or the FreeSurfer pipeline (see Figures S2 & S3 and Supplementary Results). Consistent, non-significant results were found using Kruskal–Wallis tests. At the lobar level, there were no significant group differences in volume or GMD. For cortical thickness (using ANTs), two lobar values were nominally significant: the left frontal lobe (F = 4.60, p = 0.01) and the left parietal lobe (F = 3.29, p = 0.04) (Fig. 1). After FDR correction (Q = 0.05) was applied, these omnibus tests were no longer significant.

Fig. 1 Density plots across the three groups of interest displaying standardized cortical thickness values using the Advanced Normalization Tools (ANTs) cortical thickness pipeline. Prior to plotting, nonlinear effects of age, sex, mean manual rating quality, psychopathology, and race effects were removed from the values. On the left, mean cortical thickness is plotted; on the right, lobar-specific values are plotted

Fig. 2 Density plots across the three groups of interest displaying standardized volume values using Advanced Normalization Tools (ANTs). Prior to plotting, nonlinear effects of age, sex, mean manual rating quality, psychopathology, and race effects were removed from the values. On the left, global metrics are plotted, while subcortical regions are plotted on the right

For regional differences, four regions of interest yielded uncorrected significant results in volume, including the left posterior cingulate gyrus, right superior temporal gyrus, and bilateral cerebellar white matter; none remained significant after FDR correction (see Table S1). Cortical thickness analyses found 11 uncorrected significant regions of interest, while none remained significant after FDR correction (see Table S1). There were no significant group differences in GMD. Effect sizes across regions in cortical thickness are presented in Figures S4–S6 to discern overall trends in the data, and brain slices mapping regional F and t values and analytic code are available online (https://adrose.github.io/nullef/index.html).

Analyses of age by group interactions displayed non-significant effects. Across all 396 comparisons, no test yielded a significant interaction.

Supplementary analyses

Supplementary analyses that included alcohol use variables or estimated IQ as covariates revealed convergent findings with prior results, such that there were non-significant differences among groups across all levels of anatomical specificity. Furthermore, supplementary analyses of subsamples matched on age and sex did not demonstrate significant group differences at the global, lobar, or regional level (see Supplementary Results).

Bootstrap analyses

We followed these results with nonparametric, data-driven bootstrap analyses. Largely, results remained consistent (Fig. 3 shows global results), suggesting minimal group differences in cortical thickness, volume, or gray matter density at the global, lobar, or regional levels. However, one pairwise difference between non-users and frequent users in left frontal lobe CT remained significant after FDR correction (d = 0.45, 95% CI = [0.14,0.75], z = 2.9, pfdr = 0.041), indicating greater CT in non-users.
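
To make the link between a confidence interval and its z-score concrete, the short Python sketch below back-calculates a standard error, z-score, and two-sided p-value from an estimate and its 95% confidence interval. It assumes an approximately symmetric, normal-theory interval, which a Studentized bootstrap interval need not satisfy, and the exact procedure in ref. [49] may differ. The function name is hypothetical.

```python
from statistics import NormalDist


def p_value_from_ci(estimate, lower, upper, level=0.95):
    """Approximate z-score and two-sided p-value from a confidence interval."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    standard_error = (upper - lower) / (2 * z_crit)
    z = estimate / standard_error
    p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p


# Reported left frontal lobe CT contrast: d = 0.45, 95% CI = [0.14, 0.75].
z, p_uncorrected = p_value_from_ci(0.45, 0.14, 0.75)
```

Applied to the interval reported above, this yields z of about 2.9, in line with the reported value, and an uncorrected p-value of roughly 0.004. The FDR-corrected value is larger because it accounts for the full family of comparisons.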

Fig. 3 Histograms of mean differences across 10,000 bootstrap folds are displayed above for total brain volume, total gray matter volume, and total white matter volume. Differences were acquired by resampling data with replacement 10,000 times preserving group proportions across each fold. Differences of means were then calculated across each fold. P values were estimated by acquiring the 95% confidence interval across the entire population of mean differences (n = 10,000), and deriving a z-score from the confidence interval. The 95% confidence intervals are displayed on each histogram

Equivalence testing

Finally, we implemented equivalence testing to examine the inconsistent, relatively small effects reported above. At all levels of anatomical specificity, TOSTs remained significant at an FDR threshold of Q = 0.05 for all contrasts (e.g., frequent vs. non-users), indicating that any differences across groups in brain structural measures were between d = −0.5 and 0.5. Follow-up TOSTs were run limiting the magnitude of the effects to d = ±0.3 to compare the non-users and occasional users, and all TOSTs remained significant, suggesting equivalence.

Discussion

In the current study, we used a systematic approach to examine whether occasional or frequent cannabis use was associated with alterations in measures of brain structure in a large, community-based, single-scanner sample of adolescents and young adults. Using rigorous quality control procedures and two well-validated analysis programs, we did not replicate most previously reported structural differences in cannabis users, as we found few differences in brain structural measures associated with either occasional or frequent cannabis use in adolescence. However, the distribution of effect sizes and nonparametric bootstrap analyses suggested the possibility of lower cortical thickness of a small magnitude in the left prefrontal lobe in frequent users compared to non-users. We also statistically evaluated non-significant results and provided support for the absence of medium or greater magnitude effects across groups. In addition, we did not find significant interactions between cannabis use and age, which would have suggested increased vulnerability to cannabis use at younger ages.

Several studies have reported significant associations between frequent cannabis use and alterations in subcortical and cerebellar volumes, cortical thickness, and surface-based morphometry in adolescents and young adults [7]. However, results have been notably variable and predominantly from studies with modest sample sizes. In the current study, we found limited evidence for structural brain differences, especially between occasional users and non-users of cannabis. Moreover, few structural neuroimaging metrics showed a coherent pattern of dose–response relationships by level of cannabis use. We also found minimal evidence of brain structural differences of a medium magnitude between frequent cannabis users and the other two groups, although the presence of smaller magnitude differences (e.g., left prefrontal cortex), cannot be ruled out. Previous studies with small samples were likely underpowered to detect small magnitude effects, which could partially explain variability in the literature. Our results converge with data from larger samples of cannabis-using youth, which have found more limited brain structural differences associated with cannabis than smaller studies. For example, Weiland and colleagues [17] compared 50 adolescent daily users of cannabis to 50 demographically matched non-users, replicating methods from an earlier study [47], and found non-significant differences in volume, surface-based morphometry, and shape. Similarly, in a sample of 439 adolescents, Thayer and colleagues [53] found no significant associations of past month cannabis use with brain volume or measures of diffusivity after covarying for alcohol use disorder symptoms. Our results extend these studies by probing effects across levels of cannabis use and examining measures of cortical thickness and gray matter density.

Although non-significant results from null hypothesis significance testing should not be interpreted as supporting “null” findings, we performed follow-up equivalence testing to provide context for these results. These analyses suggested that differences between groups across structural imaging metrics were likely less than d = 0.5, conventionally considered to be a medium magnitude effect size, and any differences between occasional users and non-users of cannabis were likely less than d = 0.3. As an example to aid in interpreting a d = 0.5 difference, there is a 64% chance that a person picked randomly from the frequent users would have a lower value in a measure such as cortical thickness than a person picked at random from the non-users (see http://rpsychologist.com/d3/cohend/). For effect sizes smaller than d = 0.5, there would be even lower probabilities of smaller values for a person picked randomly from the frequent cannabis group.
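
The 64% figure can be reproduced with a short calculation. Assuming two normal distributions with equal variances, the probability that a randomly chosen member of one group scores below a randomly chosen member of the other is Φ(d/√2), where Φ is the standard normal cumulative distribution function. The Python sketch below is illustrative only, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def probability_of_superiority(d):
    """Chance that a random draw from the higher-mean group exceeds a random
    draw from the other group, assuming equal-variance normal distributions."""
    return NormalDist().cdf(d / math.sqrt(2))


prob_medium = probability_of_superiority(0.5)  # about 0.64
prob_small = probability_of_superiority(0.3)   # about 0.58
prob_null = probability_of_superiority(0.0)    # exactly 0.5, i.e., chance
```

Under these assumptions, an effect of d = 0.3 corresponds to a probability of about 58%, only modestly above the 50% expected when the groups do not differ.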

An alternative explanation of findings is that neuroanatomical alterations may only be present in youth with (a) heavier or greater duration of use than observed here or (b) symptoms of abuse or dependence. While we cannot rule these out given our limited information about cannabis use disorders, the criteria and use patterns of our sample appear similar to those from prior adolescent studies invoked to support the presence of structural brain differences [e.g., 16, 20]. In addition, our user groups had higher levels of alcohol use and more psychopathology, reflecting expected sample characteristics. Moreover, frequent users had higher levels of these symptoms than less frequent users, although dose–response relationships in structural brain metrics were not apparent. It is also possible that structural brain alterations require more of a cumulative dose than observed here or may take longer to emerge with continued neurodevelopment. Longitudinal research is needed to address such questions.

Considerations for interpretation

It is challenging to integrate our findings with the overall literature regarding cannabis and brain structure because of methodological heterogeneity. For example, some studies follow group-level analyses (sometimes without significant differences) with correlational analyses between structural metrics and variables such as age at first cannabis use or cannabis quantity; yet, whether these represent post hoc analyses, increasing Type I error risk, is rarely discussed. Heterogeneity in findings may also reflect potential moderators of risk/vulnerability that are either unidentified to date or untested because of limited statistical power. For example, in a large sample of adolescents between the ages of 12 and 21, French and colleagues [54] found that cannabis use before age 16 was associated with slightly reduced cortical thickness, but only in males with a high polygenic risk score for schizophrenia.

Additionally, causal inferences from observational data should be undertaken cautiously. Reflecting the complexity of interpreting brain structural differences, Cheetham and colleagues [55] showed that smaller orbitofrontal cortex volumes at age 12 were predictive of cannabis initiation at age 16, suggesting that some structural brain differences could reflect pre-use risks as opposed to consequences of use. Such pre-use differences could also be present in our sample. It is also challenging to determine whether group differences in adolescent brain structure reflect “disruptions,” as neurodevelopment is dynamic and dependent on complex trajectories of gray and white matter development [56]. To this end, longitudinal studies have reported mixed findings regarding altered trajectories of brain structural measures in cannabis users [27, 57].

Although we did not find strong support for brain structure alterations in cannabis users, our study cannot answer whether cannabis affects brain functioning in adolescent-onset users. Long-term studies of brain changes are scarce, and small structural effects could nonetheless have clinical significance for brain development, cognitive functioning, or mental health. It is also important to place results in the context of the overall cannabis literature, as they do not speak to other risks of use, such as propensity for addiction, motivational difficulties, or mental health disorders, including psychosis.

Limitations

Limitations of this study include the lack of detailed history of substance use and objective indicators of cannabis use. Additional biological measurements were considered but lacked feasibility with the large-scale data collection required. However, our substance use assessment was selected to minimize reporting bias and has shown good reliability and greater disclosure of substance use compared to diagnostic interviews [58]. Moreover, although our ascertainment methods limited selection biases, it is unknown whether our data generalize to individuals with cannabis use disorders. We also did not examine measures of white matter organization from diffusion MRI or perform shape analysis, although there are significant complexities in interpreting typically used measures (e.g., fractional anisotropy) from these techniques [59, 60]. Our data were cross-sectional, which presents challenges in youth samples with protracted, complex trajectories of brain development. In the absence of randomized controlled trials of cannabis, longitudinal data will provide the best test of whether cannabis causes brain structural alterations.

Conclusions

In a large sample of adolescents and young adults, the present study found predominantly non-significant differences in brain structural measures among cannabis non-users, occasional users of cannabis, and frequent users of cannabis. Follow-up analyses indicated that results were less likely due to reduced statistical power and more likely due to limited or smaller magnitude effects. Results diverge from some prior studies that have reported structural differences associated with cannabis in brain volumes and cortical thickness; such differences could reflect our use of larger samples and a community ascertainment approach. Longitudinal studies are needed to determine whether adolescent cannabis use is associated with longer-term changes in brain structure.

Funding and disclosure

This work was supported by the NIH (RC2 MH089983 and MH089924; NIDA supplement to MH089983; R01MH107703; K08MH079364), the Lifespan Brain Institute (LiBI), and the Dowshen Program for Neuroscience. Dr. Scott’s participation was supported by a Department of Veterans Affairs Career Development Award (IK2CX000772). The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs. The authors declare no competing interests.