Introduction

Cannabis use has a prominent role in the development of psychosis1,2, and exacerbates the course of the full-blown psychotic disorder3,4. Indeed, patients with psychosis who are habitual cannabis users have a distinctly worse long-term outcome than those without concurrent cannabis use in terms of re-hospitalization, severity of psychotic symptoms and general functioning5. In first-episode psychosis, cannabis use is reportedly among the most powerful predictors of psychotic relapse6. Likewise, in patients at clinical high-risk for psychosis (CHR) reporting lifetime cannabis use, continued cannabis use (CCu) after experiencing attenuated psychotic symptoms increased the risk of transition7. Despite these risks, the number of individuals with cannabis use disorder has increased worldwide in recent years8, with correspondingly high rates of cannabis use disorder reported in early psychosis patients9. The vulnerability to CCu remains even after treatment to encourage cannabis abstinence10, and the response to abstinence interventions varies greatly between individuals11.

This body of evidence suggests that, from a preventive perspective, detecting individuals at risk for CCu and understanding the factors and mechanisms associated with CCu are important for improving the prospects of a good long-term outcome in CHR and recent-onset psychosis (ROP) patients12. In cannabis users drawn from community-based samples, sociodemographic factors such as young age, male sex, low income, higher body mass index (BMI) and substance use patterns each predicted relapse of cannabis use13,14. Further, the transition from irregular cannabis use to cannabis use disorder (a form of CCu that persists despite distress or impairment caused by the substance15) can be predicted by the pattern of substance use, as well as by mental health problems, history of traumatic events, schizotypal personality and living in an urban area16,17,18. Moreover, among clinically dependent cannabis users, poor current functioning predicts relapse of cannabis use19,20. Only one study20 has hitherto investigated predictors of cannabis relapse in psychotic patients (N = 66), wherein psychotic symptoms proved to be most predictive of relapse of cannabis use. However, a review of self-reported reasons for cannabis consumption by patients with psychosis21 concluded that present psychotic symptoms and self-medication are rarely reported as reasons for cannabis consumption. Instead, groups of psychotic patients21 and CHR patients22 both reported mood enhancement and social motives as their primary motivations for use. Cognitive deficits have been linked with relapse for several substances23, although the link with cognition is less consistently reported for cannabis than for other substances24. Meta-analytic evidence of cognitive deficits attributable to cannabis use is complex, showing negative effects of cannabis on cognition in non-psychotic individuals, but also better preserved cognitive functions in psychotic patients with concurrent cannabis use25. Notably, the environmental risk factors for CCu and the presence of cognitive deficits have also been individually associated with cannabis use in general, and with an increased risk for psychosis26,27.

Whether addiction, that is to say a substance use disorder, should properly be called a “brain disease” remains a matter of debate28. Nonetheless, drug-seeking and relapse in the use of diverse substances, such as alcohol29,30 and cocaine31, have consistently been associated with underlying neurobiological alterations32,33. Interestingly, there is a substantial overlap between brain regions that are associated with drug-seeking in general, cannabis use disorder and psychosis32,33,34,35. Decreased grey matter volume (GMV) in the frontal cortex, hippocampus, insula and temporal lobe, together with increased volume in the cerebellar cortex, is common to all three conditions32,34,35,36,37. Further, effects of cannabis use on brain structure were more pronounced in psychotic individuals and individuals at clinical high-risk for developing psychosis than in healthy individuals, potentially indicating a particular sensitivity to cannabis exposure38.

Several studies have investigated the association between these risk factors and cannabis relapse13,20 or the development of a cannabis use disorder16,17,18. Nevertheless, their power for predicting CCu in psychotic patients and their generalizability to other clinical cases—a precondition for model implementation into clinical practice39—have not yet been tested. Moreover, most studies have analyzed risk factors in isolation, without considering their potentially interconnected nature40. Progress in the field of predictive medicine using multivariable approaches has demonstrated that models enabling the simultaneous investigation of several risk factors and multiple data modalities can often outperform unimodal predictors for conversion to psychosis41,42, diagnostic approaches43 and functional outcome37.

In the current study, we (1) investigated multiple data modalities using machine learning39 to assess their power to predict CCu in patients with ROP. More specifically, we generated three predictive models of CCu based on single data modalities (unimodal); namely (i) clinical, (ii) cognitive and (iii) structural magnetic resonance imaging (sMRI)-based predictors. Next, we combined these models for super-ordinate prediction, to test whether combinations of unimodal predictors would improve the predictive performance of the algorithm. Then, (2) we applied the predictors to CHR individuals, aiming to assess the predictors’ generalizability to patients less severely affected in terms of psychotic symptoms and cannabis use. Finally, (3) we assessed how CCu is associated with several aspects of long-term clinical outcome to confirm previously published clinical relevance of CCu in ROP and CHR4,5,7. We hypothesized that there should emerge a pattern of interview-based variables at baseline that would predict CCu in our ROP sample above chance level and that, due to overlapping reasons for cannabis use between ROP and CHR patients22, this model would generalize well to a separate CHR population. Further, we hypothesized that including cognition and sMRI results would improve the algorithm’s predictive performance. In line with previous publications4,5 we expected that CCu would be associated with a worse long-term clinical outcome in ROP and CHR patients, thus highlighting the clinical relevance of the prediction.

Results

Sample characteristics

Overall, we included 182 patients (mean [SD] age, 23.8 [4.7] years; female, 68 [36.8%]) (Table 1), all of whom reported lifetime cannabis use at baseline. Eighty-seven patients (47%) showed CCu within the nine-month follow-up period, i.e. at least one instance of cannabis consumption between baseline and follow-up. All other patients remained abstinent until at least nine months after baseline assessment and were labelled as discontinued cannabis use (DCu). Follow-up data for DCu patients were available for a mean (SD) of 597 (254) days from the baseline assessment. In this time period, only N = 8 (8.7%) subjects labelled as DCu had a relapse in cannabis use after the nine-month follow-up. Patients with CCu resumed cannabis consumption after a mean (SD) of 94 (100) days from the baseline assessment. The time between baseline and renewed cannabis use did not significantly differ between CHR and ROP groups (mean [SD], 87 [100] days for CHR and 97 [102] days for ROP; t(32) = −0.33, p = 0.744). We trained and tested our model in repeated nested cross-validation, strictly separating training and testing folds, on N = 109 patients of age 15–40 years with ROP, and then applied it to a separate group of N = 73 CHR patients.

Table 1 Demographic information of patients with recent-onset psychosis and patients with clinical high-risk for psychosis.

In the ROP and CHR groups, CCu was significantly associated with more recent cannabis consumption at baseline. CHR patients with CCu were more likely to be male than those with DCu (χ²(1) = −6.11, p = 0.013). ROP patients with CCu had significantly lower lifetime highest role functioning (t(105) = −2.67, p = 0.009), more severe Positive and Negative Syndrome Scale (PANSS)44 general scores (t(103) = 2.66, p = 0.009), as well as a higher number of SCID-IV diagnoses for cannabis use disorder compared with ROP patients with DCu (χ²(2) = −9.61, p = 0.010; Table 1). Due to missing information or inadequate MR image quality, our samples differed slightly for the predictors based on cognition (ROP: N = 105; CHR: N = 73) and sMRI (ROP: N = 101; CHR: N = 61) (Supplementary Fig. 7).

Prediction of continued cannabis use

Only the unimodal predictor based exclusively on clinical variables yielded significant prediction of CCu in ROP patients (balanced accuracy (BAC) = 73.3%, p = 0.001). Further, this model had an acceptable area under the curve (AUC = 0.75) as defined previously (AUC ≥ 0.7; ref. 45). Applied to the CHR group, the BAC dropped significantly by 14.2 percentage points (p < 0.001, Supplementary Fig. 8) but still provided a correct prediction in 58.7% of the CHR patients. The sMRI predictor performed with a BAC of 55.7% (p = 0.093) in the ROP and of 54.6% in the CHR group. The cognitive predictor performed below chance level in both groups (ROP: BAC = 45.6%; CHR: BAC = 49.7%). The clinical prediction accuracies could not be better explained by confounding effects (Supplementary 4), but sensitivity and specificity differed significantly depending on whether the criterion for a lifetime cannabis use disorder was fulfilled (Supplementary 6). Stacking our significant clinical predictor with the sMRI predictor, the cognitive predictor or both did not improve performance in the ROP group (BAC = 66.0–67.8%). The stacked predictors including sMRI yielded similar results to the clinical predictor when applied to the CHR group (BAC = 58.7%) (Table 2). Likewise, combining the clinical with the cognitive predictor did not significantly improve the prediction when applied to CHR compared with the unimodal clinical predictor (BAC = 60.0%, p = 0.065, Supplementary Fig. 8).
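For readers less familiar with the metric, balanced accuracy is the mean of sensitivity and specificity, which keeps a classifier from scoring well simply by favouring the larger class. The following minimal sketch illustrates the computation; the function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return balanced accuracy in percent: mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)  # proportion of CCu cases correctly identified
    specificity = tn / (tn + fp)  # proportion of DCu cases correctly identified
    return 100.0 * (sensitivity + specificity) / 2.0


# Hypothetical counts for illustration only (not the study's confusion matrix).
example_bac = balanced_accuracy(tp=40, fn=10, tn=30, fp=20)

# A classifier that always predicts the majority class stays at chance level,
# even though its plain accuracy would be 90% in this made-up sample.
majority_bac = balanced_accuracy(tp=90, fn=0, tn=0, fp=10)
```

In this made-up example, `example_bac` combines a sensitivity of 0.8 with a specificity of 0.6, and `majority_bac` shows why a BAC near 50% is read as chance-level performance.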

Table 2 Prediction results of unimodal and multimodal predictors.

Predictive patterns of the clinical classifier

Features from different categories contributed reliably to the clinical classifier (Fig. 1). The significant and most reliable features predicting CCu were a higher number of substances from other substance classes tried in a lifetime and a lower lifetime highest role functioning. Further reliable predictors of CCu were a higher number of lifetime diagnoses of cannabis dependence and a lower number of units of alcohol consumption at drinking occasions, as well as lower functional disability scores of the split version of the Global Assessment of Functioning (GAF-F)46,47 score in the past month. A higher population density of place of living, higher physical anhedonia, less frequent use of favourite food as a coping strategy and more severe mannerisms and posturing were also reliable predictors of CCu. Further, an increased likelihood of being currently unable to work because of long-term physical illness was one of the top ten most predictive features of DCu. However, this variable might be spurious, as only one ROP patient with DCu replied to this query with “yes”, while all other CCu and DCu patients replied with “no” or did not respond to this question (16.5% missing answer, Supplementary Table 2 for %-missing of features and Supplementary Table 8 for univariate comparisons between CCu and DCu for all clinical variables included in the prediction).

Fig. 1: Feature importance.

Top ten most predictive clinical variables differentiating between continued and discontinued cannabis use until nine-month follow-up in terms of cross-validation ratio (left-side) and significant predictive features measured in terms of sign-based consistency (right-side). GAF Global Assessment of Functioning, FDR false discovery rate, PANSS G Positive and Negative Syndrome Scale—General symptoms, SCID Structured Clinical Interview for DSM Disorders.

Exploration: Continued cannabis use and long-term clinical outcome

Linear mixed-effects models of the long-term effects of CCu (Fig. 2; Supplementary Table 10 for further details on all models calculated) showed that, on average, clinical measures improved in ROP patients over the 18-month follow-up period (all pFDR < 0.007). In the ROP group, CCu was significantly associated with lower GAF-F scores (t(136) = −3.15, pFDR = 0.006), lower scores on the current GAF Symptoms scale (GAF-S) (t(167) = −2.46, pFDR = 0.030), higher PANSS-general scores (t(168) = 3.75, pFDR = 0.001) and higher PANSS-positive scores (t(205) = 2.22, pFDR = 0.042), while CCu did not significantly predict the sum score of the Beck Depression Inventory-II (BDI-II)48 (t(122) = 1.15, pFDR > 0.303). There were no significant interaction effects between time and CCu in ROP patients (all pFDR > 0.060). In the CHR patients, all clinical measures besides BDI-II improved over the 18-month follow-up period (all pFDR < 0.001). There was a significant time-by-group interaction for BDI-II (linear: t(195) = −4.46, pFDR < 0.001; quadratic: t(199) = 3.89, pFDR < 0.001; cubic: t(199) = −3.35, pFDR < 0.003), but no significant main effect of CCu on any of the clinical outcomes (all pFDR > 0.220).
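As an orientation for the model family reported in this section, one possible specification with a quadratic time trend can be sketched as follows. The symbols are illustrative assumptions, the random-intercept structure is assumed from the statement that subject entered as a random effect, and the polynomial order actually retained varied by outcome (a cubic term is reported for BDI-II in CHR patients).

\[
y_{ij} = \beta_0 + \beta_1 g_i + \beta_2 t_{ij} + \beta_3 t_{ij}^{2} + \beta_4\, g_i t_{ij} + \beta_5\, g_i t_{ij}^{2} + u_i + \varepsilon_{ij}, \qquad u_i \sim \mathcal{N}(0, \sigma_u^{2}), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
\]

Here \(y_{ij}\) denotes the clinical outcome of subject \(i\) at visit \(j\), \(g_i \in \{0, 1\}\) codes DCu versus CCu, \(t_{ij}\) is the time since baseline, and \(u_i\) is a subject-specific random intercept. In this notation, \(\beta_1\) corresponds to the main effect of CCu, and \(\beta_4\) and \(\beta_5\) to the linear and quadratic time-by-group interactions.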

Fig. 2: Association of continued cannabis use and long-term clinical outcomes.

Association of continued cannabis use with the long-term course of several clinical outcomes from baseline until the 18-month follow-up. Linear mixed models were calculated with the clinical outcome as the dependent variable and group (continued cannabis use/discontinued cannabis use), time since baseline, linear trends, quadratic trends and trend interactions as independent variables. Subject was entered as a random effect. Significant group effects are marked in black above the graphs, and significant interaction effects are marked in black within the graphs. False-discovery rate correction was performed to control for the number of comparisons for each fixed effect across the clinical outcome variables. Of note, time from baseline is presented as an ordinal variable for graphical depiction, whereas it entered the model calculation as a continuous variable. Further, as the optimal model complexity varied by outcome, the regression line in the plot is modelled with the nonparametric ‘LOESS’ function. PANSS Positive and Negative Syndrome Scale, GAF Global Assessment of Functioning, BDI-II Beck Depression Inventory-II, ROP recent-onset psychosis, CHR clinical high-risk for psychosis.

Discussion

This is the first multivariable study examining the predictability of CCu in individuals with ROP and CHR based on unimodal and multimodal data domains. Our study adds to previous investigations by indicating (1) a potentially generalizable predictor for risk of CCu in a sample of patients who are particularly vulnerable to the harmful effects of cannabis consumption4, and (2) by revealing a pattern of factors that might be further investigated to ultimately inform the design of tailored preventive strategies.

We found evidence supporting the feasibility of generalizable and significant prediction, correctly predicting CCu and DCu within nine months after baseline in 73.3% of ROP patients, based solely on their baseline clinical data. This model generalized to CHR patients only slightly above chance (BAC = 58.7%). The most important predictors of CCu were lower lifetime best role functioning and the lifetime number of illicit substances consumed other than cannabis. Predictive performance was not improved by augmenting the model with cognitive or GMV data. Further, we found that CCu was significantly associated with worse clinical outcomes in psychotic patients, and interacted with longitudinal depressive symptoms in at-risk individuals, thus confirming the importance of timely efforts to discourage CCu in these clinical groups.

Baseline clinical predictors of continued cannabis use

Our finding that the predictive power of interview-based variables outperforms other data modalities is in line with earlier results in CHR and ROP samples presenting predictive models of other clinical outcomes, such as treatment outcome after a first episode40, transition to psychosis41,49,50,51 or global functioning37.

We confirmed global functioning as an important predictor of CCu52,53 and extended previous literature in two ways. First, we assessed the model’s subject-specific predictive power and generalizability to at-risk individuals and investigated its effect by considering diverse factors simultaneously. Furthermore, our results reemphasize the importance of investigating broad aspects of global functioning in patients with psychosis37,49,54. Interestingly, CCu was mainly associated with lower levels of highest functioning. Assuming that the suboptimal functioning was also in part subjectively experienced, the lack of subjective well-functioning in several domains over a longer time period might lead to lower self-expectations, which are known to undermine abstinence55. The predictive power of lifetime diagnosis of cannabis dependence was expected because the diagnostic criteria of cannabis use disorder inherently entail an elevated likelihood of CCu8,56. The importance of the number of lifetime illicit substances is also in line with the literature13,17,18. Conversely, lower average alcohol consumption at drinking occasions predicted CCu. This is an interesting novel finding, as the literature has so far been inconclusive as to whether alcohol is typically used as a substitute for or a complement to cannabis use57. Our finding would rather support the substitution hypothesis for alcohol use, which is in line with a previous study58 reporting changes in alcohol consumption patterns during cannabis abstinence. In line with that evidence, we found that patients with CCu were less likely to use food or snacks as a coping strategy in stressful life situations. Additionally, the CCu patients presented with higher physical anhedonia, a decreased ability to experience pleasure, which might reflect a general lack of coping strategies against relapse to cannabis use. Importantly, this conjecture is supported by studies showing that mood enhancement and social factors are the primary motivations for cannabis consumption in patients with psychosis21 and CHR patients22. Further, we replicated earlier findings on the importance of higher population density of place of living18 as a predictive risk factor of CCu. Population density was previously shown to be predictive not only of cannabis relapse, but also of lifetime cannabis use and psychosis26,59, suggesting that urbanicity and cannabis use may interact to increase the risk for psychosis60,61. Future studies should disentangle the specific impact of these two factors on psychosis.

Validation of the clinical predictor in clinical high-risk patients

Our clinical predictor performed only slightly above the chance level when applied to CHR patients. Indeed, univariate statistics (Supplementary Table 8) show that several of the most important clinical predictors did not significantly differ between CCu and DCu among CHR patients. Importantly, the CHR group had a lower proportion of subjects with cannabis use disorder compared with the ROP group, which might indicate that even the CCu individuals among CHR patients are less heavy cannabis users. As our predictor seems to be more sensitive to patients with cannabis use disorder (Supplementary 6), further investigations and testing in more diverse clinical populations are warranted. Applying our predictor to CHR patients with concurrent cannabis use disorder, to ROP patients with and without cannabis use disorder, as well as non-psychotic individuals with cannabis use disorder might disentangle the coupling between psychotic symptoms and cannabis use.

sMRI predictor of continued cannabis use

Contrary to expectation, our sMRI predictor did not perform significantly better than chance. This might be related to the study-specific outcome: CCu was defined as any cannabis consumption between baseline and nine-month follow-up. Most previous studies have instead investigated associations between more severe forms of cannabis use and GMV32. One study62 attempted to predict future cannabis use in 14-year-old abstinent adolescents, defined as at least ten instances of cannabis use during two years follow-up, with the finding that GMV differences did not precede cannabis use. Although general use in predictive models of additional and costly sMRI would not be justified, it still merits testing in future studies including larger samples to see if sMRI might help to predict more severe forms of CCu. Notably, although we carefully corrected for site-specific MR variation (Supplementary), the unbalanced sample sizes across sites might nonetheless have impacted the predictive accuracy of sMRI.

Cognitive predictor of continued cannabis use

Cognition did not predict CCu above chance level, which was surprising since schizophrenia is characterized by severe impairments in cognition43,63, as is likewise heavy cannabis use64. On the other hand, a previous meta-analysis has shown that the evidence is inconclusive for an association between cannabis dependence and cognitive impairments24. This inconsistency might be explained by differences in the cognitive tests analyzed, as some performance deficits have been shown to be task-specific65. Moreover, other evidence shows that cognition is better preserved in cannabis-using psychotic individuals than in patients without concurrent cannabis use25,66, and hence cognition might be a less important factor for predicting CCu in this particular patient group. Furthermore, a recent review on acute and residual effects of cannabis on cognition67 concluded that the association between cognition and cannabis is likely explained by genetic and environmental factors that predispose certain individuals both to cannabis use and cognitive deficits, and to a lesser degree by actual neurotoxic effects. Future studies are warranted to disentangle whether these negative results reflect our use of tests that are insensitive to particular cognitive changes predicting CCu, or whether cognitive disturbances are indeed not predictive of CCu in psychotic and at-risk patients.

Effect of continued cannabis use on long-term clinical outcome

Our longitudinal analyses partially support the notion that CCu increases the risk for a poor long-term outcome in ROP and CHR individuals. Even though we found significant differences between CCu and DCu for almost all clinical outcome measures in the ROP group, we found a significant interaction between time and CCu only with depression in the CHR group. Depressive symptoms are a common comorbidity in patients with recent-onset psychosis68. However, our finding was unexpected since other studies investigating the impact of altered cannabis use on depressive symptoms have so far been inconclusive52,69. In the ROP group, we found a trend-level interaction effect of CCu with general symptoms over time, which would be in line with the previous literature52. There are several possible explanations for these non-significant interaction effects. First, patients with DCu had been abstinent for longer than CCu patients, and thus they might already have recovered from the detrimental effects of cannabis consumption. Second, our analyses might have been less sensitive to time-dependent effects due to the attrition rate in our study, leading to missing data and a relatively small sample size. Third, some of the patients with CCu reported only a single instance of cannabis use at follow-up. Previous studies have shown that even a decreased CCu might improve the long-term clinical outcome4. Future studies might investigate further baseline measures to disentangle the main effects of CCu versus general cannabis use.

Limitations

Among the important limitations of our study, we note that missing assessments in several subjects for some timepoints hindered analysis of time-to-event data, which might otherwise have improved accuracy by further disentangling subjects’ risk42. Additionally, the patient population of our study is difficult to contact and typically presents a high attrition rate4. Thus, the follow-up period was only nine months, and our final sample was relatively small and unbalanced across sites, which might well have influenced results, especially in the imaging domain. Even though we carefully corrected for site effects, future studies are needed to investigate thoroughly and replicate our findings in larger samples and across sites. This would also be important to validate the speculation, as might arise from our findings, that MRI and cognitive measures are not of pivotal importance for predicting continued cannabis use. Even though most individuals who remained abstinent during the nine-month follow-up remained abstinent thereafter, further studies are warranted specifically to investigate the long-term prediction of continued cannabis use. Furthermore, our relatively small sample size hindered a further stratification of the critical outcome “continued cannabis use”. Future studies might also assess the predictability of different severities of CCu. Indeed, any reduction in cannabis use improves psychosis outcome5, and may be a more realistic harm reduction aim in therapy than complete abstinence52. As such, it would be useful to predict the relevant amount of cannabis use as distinct from complete abstinence. Most critically, our study lacks an external validation of the prediction of CCu in ROP. Thus, it cannot be inferred whether the drop in the accuracy of our predictor is better explained by low generalizability or by the differences between our samples in terms of severity of clinical symptoms and substance use. Hence, future tests of generalizability in ROP samples with similar substance use profiles are called for. Moreover, our study lacks some variables with known associations with cannabis use disorder, such as the individual’s motivation to quit cannabis use53 or specific substance-related cognitive tests65, the inclusion of which might improve accuracy in future studies. Importantly, cannabis use was assessed via self-report, which might suffer from recall and social desirability bias. Ideally, future studies should confirm cannabis use and ascertain cannabis abstinence by biological measurements, preferably via hair toxicology, given its long detection window70.

Conclusion

This is the first multimodal examination of prognostication of CCu in ROP patients, along with generalizability testing in CHR patients. We found that the best predictor was based solely on clinical variables, reliably showing a contribution of global functioning, especially lower highest lifetime functioning, specific patterns of substance use, urbanicity and a lack of coping strategies. This predictor might be improved in future studies by adding specific cannabis-related questionnaires or additional data modalities such as cortical thickness, genetics or functional MRI, aiming to improve its clinical utility. Importantly, the ultimate aim of better identifying those patients with ROP or CHR who are most likely to continue cannabis use, thereby enabling tailored interventions and improving their clinical outcome, calls for testing and improvement of the model in larger and more diverse clinical samples.

Methods

Study design and population

As part of the multisite ‘Personalized Prognostic Tools for Early Psychosis Management’ study (PRONIA; www.pronia.eu; German Clinical Trials Register identifier DRKS00005042; ref. 37), N = 80 patients of age 15–40 years with ROP and N = 73 CHR patients were included. A further N = 29 patients of age 18–40 years with ROP were recruited within the monocentric, longitudinal cannabis-induced psychosis study (CIP)71. The ROP group included via PRONIA had experienced an affective or non-affective psychotic episode within the past 24 months that was present within the past three months prior to study entry. The ROP group included in CIP had a psychosis diagnosis originally associated with cannabis use that preceded the onset of psychotic symptoms by no more than two weeks in the last 24 months, as defined in the International Classification of Diseases, 10th Revision, criteria for substance-induced psychosis72. CHR individuals needed to fulfil (1) the basic symptom criterion “Cognitive Disturbances” assessed by the Schizophrenia Proneness Instrument73; and/or (2) a slightly adapted version of the ultra-high-risk criteria according to the Structured Interview for Psychosis-Risk Syndromes74.

ROP patients included in CIP were recruited at the Department of Psychiatry of Ludwig-Maximilian-University in Munich, while both PRONIA samples were recruited at ten different European sites (see ref. 41). Diagnoses were based on internationally established criteria and given by trained clinical raters37,71. Current or past alcohol dependence and polysubstance dependence within the past six months were exclusion criteria (Supplementary 1 for general exclusion criteria). Further, ROP and CHR patients included via PRONIA had to be abstinent from cannabis in the four weeks prior to inclusion. We imposed an additional inclusion criterion, only admitting patients with lifetime cannabis use prior to baseline.

All patients from PRONIA underwent baseline assessment between 2014 and 2019 and were followed for up to 36 months. The CIP recruitment took place from December 2016 until May 2019, and the follow-up period was nine months. The study protocols were largely harmonized (detailed assessments are listed in Supplementary Table 1).

Prior to inclusion, all patients provided written, informed consent (either personally or through a legal guardian if below the age of 18). Studies were approved at their respective sites by the local research ethics committees.

Outcome target

Substance use was assessed in a semi-structured interview at each visit71 (Supplementary Fig. 1). At the baseline interview, clinical raters asked patients about their history of cannabis use and, at each subsequent visit, whether they had used cannabis since the previous examination. We defined CCu as any cannabis consumption between baseline and the nine-month follow-up. Conversely, we labelled each patient who remained abstinent until at least nine months after baseline assessment as discontinued cannabis use (DCu).
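The outcome definition above can be expressed as a simple labelling rule. The sketch below is illustrative only: the function name, the day-based encoding of visits and the 274-day approximation of nine months are assumptions made for this example and are not taken from the study protocol.

```python
from typing import Optional

# Assumption for illustration: nine months approximated as 274 days.
NINE_MONTHS_DAYS = 274


def label_outcome(first_use_day: Optional[int], last_followup_day: int) -> Optional[str]:
    """Label a patient as 'CCu', 'DCu' or None (outcome cannot be determined).

    first_use_day: days from baseline to the first reported cannabis use,
                   or None if no use was reported during follow-up.
    last_followup_day: days from baseline to the last completed assessment.
    """
    # Any cannabis consumption between baseline and nine months counts as CCu.
    if first_use_day is not None and first_use_day <= NINE_MONTHS_DAYS:
        return "CCu"
    # Documented abstinence until at least nine months counts as DCu,
    # even if cannabis use was resumed later.
    if last_followup_day >= NINE_MONTHS_DAYS:
        return "DCu"
    # Follow-up too short to confirm nine months of abstinence.
    return None
```

Under this rule, a patient who resumes cannabis use only after the nine-month window is still labelled DCu, in line with the definition given above.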

Definition of the predictors

We trained three unimodal classifiers: (i) clinical, (ii) cognitive and (iii) sMRI (Supplementary Table 2 for the full list of variables). Predictors for the clinical domain were selected based on their prior association with cannabis use, consisting of: (1) substance use-related items56,71, (2) environmental risk-factors16, (3) clinical symptoms8,19,20, (4) global functioning75, (5) stress and coping strategies information76, (6) demographic data and (7) the BMI14. The cognitive predictor variables were selected from subscores of the cognitive domains of the MATRICS Consensus Cognitive Battery77, following a previous approach41. The sMRI classifier was based on whole-brain GMV. A harmonized protocol for the acquisition of sMRI data was used at all sites37. For pre-processing, we used the open-source CAT12 toolbox (version r1155; http://dbm.neuro.uni-jena.de/cat12/), which is an extension of SPM12 running in MATLAB 2018a (Supplementary 2 and Supplementary Table 3 for details of sMRI acquisition and pre-processing). We employed group information guided–independent component analysis (GIG-ICA)78, which simultaneously takes into account the covariance between brain voxels and their similarity to reference components (RCs) of interest71,79. We chose nine RCs previously shown to be linked with schizophrenia34, which included several regions that have also been associated with cannabis use disorder, namely the prefrontal cortex, insula and cerebellum32 (Supplementary Fig. 2 for RCs).

Machine learning strategy

We generated and tested our predictors on the total sample of ROP patients (N = 109). Next, we tested whether our predictors would generalize to CHR patients (N = 73). Our machine learning pipeline was implemented in NeuroMiner version 1.1 (www.pronia.eu/neurominer) running in MATLAB R2019. To build the set of predictors, we strictly separated the training and test phases in repeated nested cross-validation (CV) with ten folds and five permutations at both the outer (CV2) and inner (CV1) cycles. All features of the (i) clinical and (ii) cognitive predictors were standardized based on the median, with imputation of missing values by Seven-Nearest Neighbour imputation and pruning of non-informative features (zero variance or infinite values). Subsequently, all features were scaled from zero to one. To find a set of optimally predicting features, we employed wrapper-based feature selection using linear support vector machines (SVM; LIBSVM 3.1280; http://www.csie.ntu.edu.tw/~cjlin/libsvm). Following a previous approach41, we trained the models on the CV1 training data and picked the best-performing models based on the average balanced accuracy (BAC) of the SVMs on the CV1 training and test data. More specifically, we performed a greedy sequential forward search81 across the range of the SVM C regularization parameter (\(2^{k}\), \(k \in \mathbb{Z}\), \(-4 \le k \le +4\))41, adding one feature at a time until the top ten percent most predictive features were selected.
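The repeated nested CV structure and the C grid described above can be illustrated with a short, library-free Python sketch. It is not the NeuroMiner implementation: the function names are hypothetical, the folds are not stratified by class (NeuroMiner may handle this differently), and only the index bookkeeping is shown, not the SVM training itself.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, test indices)

# C regularization grid: 2^k for integer k from -4 to +4 (nine values).
C_GRID = [2.0 ** k for k in range(-4, 5)]

def repeated_kfold(indices: Sequence[int], n_folds: int = 10,
                   n_repeats: int = 5, seed: int = 0) -> Iterator[Split]:
    """Yield train/test index splits for repeated k-fold CV (unstratified)."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        for fold in range(n_folds):
            test = shuffled[fold::n_folds]
            held_out = set(test)
            train = [i for i in shuffled if i not in held_out]
            yield train, test

def nested_cv(n_subjects: int, seed: int = 0
              ) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Outer (CV2) splits, each paired with inner (CV1) splits of its training set."""
    for cv2_train, cv2_test in repeated_kfold(range(n_subjects), seed=seed):
        # CV1 splits are drawn only from CV2 training subjects, so CV2 test
        # subjects never influence preprocessing, feature selection or C.
        cv1_splits = list(repeated_kfold(cv2_train, seed=seed + 1))
        yield cv2_train, cv2_test, cv1_splits
```

With ten folds and five repetitions at each level, this yields 50 CV2 partitions, each containing 50 CV1 partitions in which the models and the values in `C_GRID` would be evaluated.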

For the (iii) sMRI-based predictor, we accounted for site-specific heterogeneity in two steps. First, we used the so-called g-theory mask37,41 to exclude all voxels showing only between-site but no inter-subject variation71,82. Second, we adjusted the remaining voxels for site effects using ComBat83,84, a harmonization method based on an empirical Bayesian approach that is frequently used to remove non-biological variation related to differences between MRI scanners. To preserve the biological variation of interest (CCu), we estimated the ComBat model in a subsample of healthy individuals from PRONIA that was matched for age and sex between sites (see Supplementary Fig. 3 and Supplementary Table 4 for the age and sex distribution of the matched healthy control sample; Supplementary Fig. 4 and Supplementary Table 5 for pre/post comparisons). This model was then applied independently to our discovery (ROP) and validation (CHR) samples (Supplementary Fig. 5, Supplementary Table 6). Finally, the thresholded and site-corrected sMRI images entered our machine learning pipeline. Strictly separating CV1 from CV2, we first scaled each voxel proportionally to total intracranial volume. We then corrected for sex and age effects based on betas computed in our healthy control subsample and employed GIG-ICA to reduce feature dimensionality. Next, the components were scaled between zero and one. Again, we employed an SVM80 with optimization of the C parameter over the range \(2^{k}\), \(k \in \mathbb{Z}\), \(-4 \le k \le +4\)41. See Supplementary 3 for a detailed description of sMRI processing and Supplementary Fig. 2 for an overview of all steps.
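The idea of correcting for age and sex with betas estimated in healthy controls can be sketched as follows for a single voxel or feature. This is a minimal illustration under stated assumptions, not the study pipeline: the function names are hypothetical, the regression is ordinary least squares solved by the normal equations, and only the covariate effects (not the intercept) are subtracted.

```python
from typing import List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_betas(covariates: Sequence[Sequence[float]],
              values: Sequence[float]) -> List[float]:
    """OLS betas (intercept first) for one feature, estimated in controls."""
    design = [[1.0, *row] for row in covariates]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(design, values)) for i in range(p)]
    return solve(xtx, xty)

def remove_covariate_effects(covariates: Sequence[Sequence[float]],
                             values: Sequence[float],
                             betas: Sequence[float]) -> List[float]:
    """Subtract covariate effects estimated elsewhere (e.g. in controls)."""
    return [v - sum(b * c for b, c in zip(betas[1:], row))
            for row, v in zip(covariates, values)]
```

Because the betas come from healthy controls only, variation in the patient data that is related to CCu is not absorbed by the correction model.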

Multimodal prediction models

To combine our best-performing unimodal (i) clinical predictor with the other unimodal predictors we used a stacked generalization procedure37. Here, the CV1-test decision scores from unimodal predictors served as features within the same CV structure and were scaled from zero to one, with the imputation of any missing sMRI and cognitive data using Seven-Nearest Neighbour imputation. Again, we optimized the C-parameter within a range of \(2^{k}\), \(k \in \mathbb{Z}\), \(-4 \le k \le +4\).
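As an illustration of the imputation step in the stacking procedure, the following Python sketch fills in missing unimodal decision scores with the mean score of the nearest subjects. It is a simplified stand-in rather than the NeuroMiner routine: the function names and the distance measure (root mean squared difference over jointly observed scores) are assumptions, and in a nested CV setting the donors would be restricted to training subjects.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one subject's decision scores; None = missing

def distance(a: Row, b: Row) -> float:
    """Root mean squared difference over the scores both subjects have."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(rows: List[Row], k: int = 7) -> List[Row]:
    """Replace each missing score by the mean of its k nearest donors."""
    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other subjects with an observed score in column j.
            donors = sorted(
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if math.isfinite(d)]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed
```

The completed score matrix would then serve as the feature set for the second-level SVM.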

We assessed the significance of all classifiers via permutation testing85,86 with 1000 permutations and α = 0.05. Further, we compared the performance of all predictors in ROP using the nonparametric Quade test87 at the omnibus level, followed by post-hoc pairwise comparisons using the t-distribution88. Between the ROP and CHR groups, we compared the performance of our best predictor (clinical) using the nonparametric, unpaired Wilcoxon rank-sum test, whereas in CHR we compared the best unimodal predictor (clinical) with the best multimodal (clinical-cognitive) predictor. Additionally, we assessed whether our clinical and sMRI-based unimodal predictions were biased by confounding effects such as age, site, sex or level of functioning (Supplementary 4). To assess whether the imbalanced group assignment of the clinical predictor in CHR patients was associated with differences in substance use severity between ROP and CHR groups, we compared the sensitivity and specificity of these models separately for subjects with and without cannabis use disorder (Supplementary 6).
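The logic of a label-permutation test for classifier performance can be sketched as follows. This is a generic illustration, not the NeuroMiner procedure: `score_fn` is a hypothetical callable standing in for retraining the full pipeline on the supplied labels and returning its cross-validated BAC, and the add-one correction in the p-value is one common convention that may differ from the formula used in the study.

```python
import random
from typing import Callable, List, Sequence, Tuple

def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for labels coded 1 (CCu), 0 (DCu)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    sensitivity = sum(p == 1 for p in pos) / len(pos)
    specificity = sum(p == 0 for p in neg) / len(neg)
    return (sensitivity + specificity) / 2

def permutation_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Share of permuted scores at least as large as the observed score."""
    exceed = sum(s >= observed for s in null_scores)
    return (exceed + 1) / (len(null_scores) + 1)

def permutation_test(labels: Sequence[int],
                     score_fn: Callable[[List[int]], float],
                     n_permutations: int = 1000,
                     seed: int = 0) -> Tuple[float, float]:
    """Compare the observed score with scores obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = score_fn(list(labels))
    null_scores = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # breaks the link between features and labels
        null_scores.append(score_fn(shuffled))
    return observed, permutation_p_value(observed, null_scores)
```

A classifier would be considered significant at α = 0.05 if the returned p-value falls below that threshold.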

Feature importance

To understand which features contributed most reliably to the prediction of CCu, we computed the CV ratio37,85. For predictors that included wrapper-based feature selection (clinical and cognitive), the significance of features was calculated by sign-based consistency, following a previous approach41 (Supplementary 5).
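Both reliability measures can be illustrated for a single feature whose SVM weights have been collected across CV models. The formulas below are simplified, commonly used formulations and are assumptions for this sketch: the CV ratio is taken as the mean weight divided by its standard error, and sign-based consistency as the absolute difference between the shares of positive and negative weights. The exact definitions used in the study are given in Supplementary 5 and may differ.

```python
import math
import statistics
from typing import Sequence

def cv_ratio(weights: Sequence[float]) -> float:
    """Mean weight across CV models divided by its standard error."""
    standard_error = statistics.stdev(weights) / math.sqrt(len(weights))
    return statistics.mean(weights) / standard_error

def sign_consistency(weights: Sequence[float]) -> float:
    """Absolute difference between shares of positive and negative weights."""
    n_pos = sum(w > 0 for w in weights)
    n_neg = sum(w < 0 for w in weights)
    return abs(n_pos - n_neg) / len(weights)
```

Under these formulations, a feature with a large absolute CV ratio or a sign consistency close to one receives weights of stable direction across models; a feature with identical weights in all models has zero standard error and would need separate handling.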

Exploration: effect of continued cannabis use on long-term clinical outcome

To explore the clinical relevance of CCu prediction, we examined the impact of CCu on long-term clinical outcome using linear mixed-effects models with the package ‘lmerTest’89 in the R language for statistical computing, version 3.6.390, separately in the ROP and CHR groups. Clinical outcomes until the 18-month follow-up entered the models as dependent variables, specifically the sum scores of positive, negative and general symptoms from the PANSS91, the sum score of the BDI-II48, current symptoms (GAF-S46,47) and current functional disability (GAF-F). Following the approach of a previous study92, we tested the main fixed effects “group” (CCu vs. DCu) and time since baseline, linear, quadratic and cubic trends, and trend interactions with the outcome. Patients were modelled as a random effect. We assessed model complexity for both groups (ROP and CHR) and each outcome individually, employing the parametric bootstrap method for the likelihood ratio test (function PBmodcomp of the R package pbkrtest93) with 200 iterations. Missing data were deleted per case and visit.