Pattern of predictive features of continued cannabis use in patients with recent-onset psychosis and clinical high-risk for psychosis

Continued cannabis use (CCu) is an important predictor for poor long-term outcomes in psychosis and clinically high-risk patients, but no generalizable model has hitherto been tested for its ability to predict CCu in these vulnerable patient groups. In the current study, we investigated how structured clinical and cognitive assessments and structural magnetic resonance imaging (sMRI) contributed to the prediction of CCu in a group of 109 patients with recent-onset psychosis (ROP). We tested the generalizability of our predictors in 73 patients at clinical high-risk for psychosis (CHR). Here, CCu was defined as any cannabis consumption between baseline and 9-month follow-up, as assessed in structured interviews. All patients reported lifetime cannabis use at baseline. Data from clinical assessment alone correctly classified 73% (p < 0.001) of ROP and 59% of CHR patients. The classifications of CCu based on sMRI and cognition were non-significant (ps > 0.093), and their addition to the interview-based predictor via stacking did not improve prediction significantly, either in the ROP or CHR groups (ps > 0.065). Lower functioning, specific substance use patterns, urbanicity and a lack of other coping strategies contributed reliably to the prediction of CCu and might thus represent important factors for guiding preventative efforts. Our results suggest that it may be possible to identify by clinical measures those psychosis-spectrum patients at high risk for CCu, potentially allowing clinical care to be improved through targeted interventions. However, our model needs further testing in larger samples including more diverse clinical populations before being transferred into clinical practice.


INTRODUCTION
Cannabis use has a prominent role in the development of psychosis 1,2 , and exacerbates the course of the full-blown psychotic disorder 3,4 . Indeed, patients with psychosis who are habitual cannabis users have distinctly worse long-term outcome compared to those without concurrent cannabis use in terms of re-hospitalization, the severity of psychotic symptoms and general functioning 5 . In first episodes of psychosis, cannabis use is reportedly among the most powerful predictors of relapse to psychosis 6 . Likewise, in patients at clinical high-risk for psychosis (CHR) reporting lifetime cannabis use, continued cannabis use (CCu) after experiencing attenuated psychotic symptoms increased the risk of transition 7 . However, the number of individuals with cannabis use disorder has increased worldwide in recent years 8 , with correspondingly high rates of cannabis use disorder reported in early psychosis patients 9 . The vulnerability to CCu remains even after treatment to encourage cannabis abstinence 10 , and the response to abstinence interventions varies greatly between individuals 11 . This body of evidence suggests that detecting individuals at risk for CCu, as well as investigating and understanding the factors and mechanisms associated with CCu, is important from a preventive perspective to improve the prospects for a good long-term outcome in CHR and recent-onset psychosis (ROP) patients 12 . In cannabis users drawn from community-based samples, sociodemographic factors such as young age, male sex, low income, higher body mass index (BMI) and substance use patterns each predicted relapse of cannabis use 13,14 . Further, the transition from irregular cannabis use to cannabis use disorder (a form of CCu that persists despite distress or impairment caused by the substance 15 ) can be predicted by the pattern of substance use, as well as by mental health problems, history of traumatic events, schizotypal personality and living in an urban area 16–18 .
Moreover, among clinically dependent cannabis users, poor current functioning predicts relapse of cannabis use 19,20 . Only one study 20 has hitherto investigated predictors of cannabis relapse in psychotic patients (N = 66), wherein psychotic symptoms proved to be most predictive of relapse of cannabis use. However, a review of self-reported reasons for cannabis consumption by patients with psychosis 21 concluded that present psychotic symptoms and self-medication are rarely reported as reasons for cannabis consumption. Instead, groups of psychotic patients 21 and CHR patients 22 both reported mood enhancement and social motives as their primary motivations for use. Cognitive deficits have been linked with relapse for several substances 23 , although the link with cognition is less consistently reported for cannabis compared to other substances 24 . Meta-analytic evidence of cognitive deficits attributable to cannabis use is complex, showing negative effects of cannabis on cognition in non-psychotic individuals, but also better preserved cognitive functions in psychotic patients with concurrent cannabis use 25 . Notably, the environmental risk factors for CCu and the presence of cognitive deficits have also been individually associated with cannabis use in general, and with an increased risk for psychosis 26,27 .
Whether addiction (that is to say, a substance use disorder) should properly be called a "brain disease" remains a matter of debate 28 . Nonetheless, drug-seeking and relapse in the use of diverse substances, such as alcohol 29,30 and cocaine 31 , have consistently been associated with underlying neurobiological alterations 32,33 . Interestingly, there is a substantial overlap between brain regions that are associated with drug-seeking in general, cannabis use disorder and psychosis 32–35 . Decreased grey matter volume (GMV) in the frontal cortex, hippocampus, insula and temporal lobe and increased volume in the cerebellar cortex are common to all three conditions 32,34–37 . Further, effects of cannabis use on brain structure were more pronounced in psychotic individuals and individuals at clinical high-risk for developing psychosis compared to the effects in healthy individuals, potentially indicating a particular sensitivity to cannabis exposure 38 .
Several studies have investigated the association between these risk factors and cannabis relapse 13,20 or the development of a cannabis use disorder 16–18 . Nevertheless, their power for predicting CCu in psychotic patients and their generalizability to other clinical cases (a precondition for model implementation into clinical practice 39 ) have not yet been tested. Moreover, most studies have analyzed risk factors in isolation, without considering their potentially interconnected nature 40 . Progress in the field of predictive medicine using multivariable approaches has demonstrated that models enabling the simultaneous investigation of several risk factors and multiple data modalities can often outperform unimodal predictors for conversion to psychosis 41,42 , diagnostic approaches 43 and functional outcome 37 .
In the current study, we (1) investigated multiple data modalities using machine learning 39 to assess their power to predict CCu in patients with ROP. More specifically, we generated three predictive models of CCu based on single data modalities (unimodal); namely (i) clinical, (ii) cognitive and (iii) structural magnetic resonance imaging (sMRI)-based predictors. Next, we combined these models for super-ordinate prediction, to test whether combinations of unimodal predictors would improve the predictive performance of the algorithm. Then, (2) we applied the predictors to CHR individuals, aiming to assess the predictors' generalizability to patients less severely affected in terms of psychotic symptoms and cannabis use. Finally, (3) we assessed how CCu is associated with several aspects of long-term clinical outcome to confirm previously published clinical relevance of CCu in ROP and CHR 4,5,7 . We hypothesized that there should emerge a pattern of interview-based variables at baseline that would predict CCu in our ROP sample above chance level and that, due to overlapping reasons for cannabis use between ROP and CHR patients 22 , this model would generalize well to a separate CHR population. Further, we hypothesized that including cognition and sMRI results would improve the algorithm's predictive performance. In line with previous publications 4,5 we expected that CCu would be associated with a worse long-term clinical outcome in ROP and CHR patients, thus highlighting the clinical relevance of the prediction.

Sample characteristics
Overall, we included 182 patients (mean [SD] age, 23.8 [4.7] years and female = 68 [36.8%]) (Table 1) who all reported lifetime cannabis use at baseline. Eighty-seven patients (47%) had CCu within a nine-month follow-up period, i.e., at least one instance of cannabis consumption between baseline and follow-up. All other patients remained abstinent until at least nine months after baseline assessment and were labelled discontinued cannabis use (DCu). Follow-up data for DCu patients were available for a mean (SD) of 597 (254) days from the baseline assessment. In this time period, only N = 8 (8.7%) subjects labelled as DCu had a relapse in cannabis use after the nine-month follow-up. Patients with CCu resumed cannabis consumption after a mean (SD) of 94 (100) days from the baseline assessment. The time between baseline and renewed cannabis use did not significantly differ between CHR and ROP groups (mean [SD], 87 [100] days for CHR and 97 [102] days for ROP; t(32) = −0.33, p = 0.744). We trained and tested our model in repeated nested cross-validation, strictly separating training and testing folds, on N = 109 patients of age 15-40 years with ROP, and tested our model in a separate group of N = 73 CHR patients.
In the ROP and CHR groups, CCu was significantly associated with more recent cannabis consumption at baseline. CHR patients with CCu were more likely to be male than those with DCu (χ²(1) = −6.11, p = 0.013). ROP patients with CCu had significantly lower lifetime highest role functioning (t(105) = −2.67, p = 0.009), more severe Positive and Negative Syndrome Scale (PANSS) 44 general scores (t(103) = 2.66, p = 0.009), as well as a higher number of SCID-IV diagnoses for cannabis use disorder compared with ROP patients with DCu (χ²(2) = −9.61, p = 0.010; Table 1). Due to missing information or inadequate MR image quality, our samples differed slightly for the predictors based on cognition (ROP, N = 105; CHR, N = 73) and sMRI (ROP, N = 101; CHR, N = 61) (Supplementary Fig. 7).

Prediction of continued cannabis use
Only the unimodal predictor based exclusively on clinical predictors yielded significant prediction of CCu in ROP patients (balanced accuracy (BAC) = 73.3%, p = 0.001). Further, this model had an acceptable Area Under the Curve (AUC = 0.75) as defined previously (AUC ≥ 0.7 45 ). Applied to the CHR group, the BAC dropped significantly by 14.2 percentage points (p < 0.001, Supplementary Fig. 8) but still provided a correct prediction in 58.7% of the CHR [...] Table 2). Likewise, combining the clinical with the cognitive predictor did not significantly improve the prediction when applied to CHR compared with the unimodal clinical predictor (BAC = 60.0%, p = 0.065, Supplementary Fig. 8).
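For readers less familiar with the metric, the following minimal sketch shows how balanced accuracy is commonly computed as the mean of sensitivity and specificity. The function name and the toy labels are illustrative assumptions and are not taken from the study data or from NeuroMiner.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = CCu, 0 = DCu)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of CCu patients classified as CCu
    specificity = tn / (tn + fp)  # share of DCu patients classified as DCu
    return (sensitivity + specificity) / 2


# Toy example (made-up labels): 4 CCu and 2 DCu patients.
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 1]
toy_bac = balanced_accuracy(toy_true, toy_pred)  # sensitivity 0.75, specificity 0.5
```

Because both classes contribute equally, the measure is not inflated when one outcome group is larger than the other, which is why a value near 50% corresponds to chance in a two-class problem.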

Predictive patterns of the clinical classifier
Features from different categories contributed reliably to the clinical classifier (Fig. 1). The significant and most reliable features predicting CCu were a higher number of substances from other substance classes tried in a lifetime and a lower lifetime highest role functioning. Further reliable predictors of CCu were a higher number of lifetime diagnoses of cannabis dependence and a lower number of units of alcohol consumption at drinking occasions, as well as lower functional disability scores of the split version of the Global Assessment of Functioning (GAF-F) 46,47 score in the past month. A higher population density of place of living, higher physical anhedonia, less frequent use of favourite food as a coping strategy and more severe mannerisms and posturing were also reliable predictors of CCu. Further, an increased likelihood of being currently unable to work because of long-term physical illness was one of the top ten most predictive features of DCu. However, this variable might be spurious, as only one ROP patient with DCu replied to this query with "yes", while all other CCu and DCu patients replied with "no" or did not respond to this question (16.5% missing answer, Supplementary

DISCUSSION
This is the first multivariable study examining the predictability of CCu in individuals with ROP and CHR based on unimodal and multimodal data domains. Our study adds to previous investigations by indicating (1) a potentially generalizable predictor for risk of CCu in a sample of patients who are particularly vulnerable to the harmful effects of cannabis consumption 4 , and (2) by revealing a pattern of factors that might be further investigated to ultimately inform the design of tailored preventive strategies. We found evidence supporting the feasibility of generalizable and significant prediction, correctly predicting CCu and DCu within nine months after baseline in 73.3% of ROP patients, based solely on their baseline clinical data. This model generalized to CHR patients only slightly above chance (BAC = 58.7%). The most important predictors of CCu were lower lifetime best role functioning and the lifetime number of illicit substances consumed other than cannabis. Predictive performance was not improved by augmenting the model with cognitive or GMV data. Further, we found that CCu was significantly associated with worse clinical outcomes in psychotic patients, and interacted with longitudinal depressive symptoms in at-risk individuals, thus confirming the importance of timely efforts to discourage CCu in these clinical groups.

Baseline clinical predictors of continued cannabis use
Our finding that the predictive power of interview-based variables outperforms other data modalities is in line with earlier results in CHR and ROP samples presenting predictive models of other clinical outcomes, such as treatment outcome after a first episode 40 , transition to psychosis 41,49–51 or global functioning 37 .
We confirmed the importance of global functioning as an important predictor of CCu 52,53 and extended previous literature in two ways. First, we assessed the model's subject-specific predictive power and generalizability to at-risk individuals and investigated its effect by considering diverse factors simultaneously. Furthermore, our results reemphasize the importance of investigating broad aspects of global functioning in patients with psychosis 37,49,54 . Interestingly, CCu was mainly associated with lower levels of highest functioning. Assuming that the suboptimal functioning was also in part subjectively experienced, the lack of subjective well-functioning in several domains over a longer time period might lead to lower self-expectations, which are known to undermine abstinence 55 . The predictive power of lifetime diagnosis of cannabis dependence was expected because the diagnostic criteria of cannabis use disorder inherently entail an elevated likelihood to CCu 8,56 . The importance of the number of lifetime illicit substances is also in line with the literature 13,17,18 .
Conversely, lower average alcohol consumption at drinking occasions predicted CCu. This is an interesting novel finding, as the literature has so far been inconclusive whether alcohol is typically used as a substitution or complementary to cannabis use 57 . Our finding would rather support the substitution hypothesis for alcohol use, which is in line with a previous study 58 reporting changes in alcohol consumption patterns during cannabis abstinence. In line with that evidence, we found that patients with CCu were less likely to use food or snacks as a coping strategy in stressful life situations. Additionally, the CCu patients presented with higher physical anhedonia, a decreased ability to experience pleasure, which might reflect a general lack of coping strategies against relapse to cannabis use. Importantly, this conjecture is supported by studies showing that mood enhancement and social factors are the primary motivations for cannabis consumption in patients with psychosis 21 and CHR patients 22 . Further, we replicated earlier findings on the importance of higher population density of place of living 18 as a predictive risk factor of CCu. The population density was previously shown not only to be predictive of cannabis relapse, but also of lifetime cannabis use and psychosis 26,59 , suggesting that urbanicity and cannabis use may interact to increase the risk for psychosis 60,61 . Future studies should disentangle the specific impact of these two factors on psychosis.

Validation of the clinical predictor in clinical high-risk patients
Our clinical predictor performed only slightly above the chance level when applied to CHR patients. Indeed, univariate statistics (Supplementary Table 8) show that several of the most important clinical predictors did not significantly differ between CCu and DCu among CHR patients. Importantly, the CHR group had a lower proportion of subjects with cannabis use disorder compared with the ROP group, which might indicate that even the CCu individuals among CHR patients are less heavy cannabis users. As our predictor seems to be more sensitive to patients with cannabis use disorder (Supplementary 6), further investigations and testing in more diverse clinical populations are warranted. Applying our predictor to CHR patients with concurrent cannabis use disorder, to ROP patients with and without cannabis use disorder, as well as non-psychotic individuals with cannabis use disorder might disentangle the coupling between psychotic symptoms and cannabis use.

sMRI predictor of continued cannabis use
Contrary to expectation, our sMRI predictor did not perform significantly better than chance. This might be related to the study-specific outcome: CCu was defined as any cannabis consumption between baseline and nine-month follow-up. Most previous studies have instead investigated associations between more severe forms of cannabis use and GMV 32 . One study 62 attempted to predict future cannabis use in 14-year-old abstinent adolescents, defined as at least ten instances of cannabis use during two years follow-up, with the finding that GMV differences did not precede cannabis use. Although general use in predictive models of additional and costly sMRI would not be justified, it still merits testing in future studies including larger samples to see if sMRI might help to predict more severe forms of CCu. Notably, although we carefully corrected for site-specific MR variation (Supplementary), the unbalanced sample sizes across sites might nonetheless have impacted the predictive accuracy of sMRI.

Fig. 2 Association of continued cannabis use and long-term clinical outcomes. Association of continued cannabis use with the long-term course of several clinical outcomes from baseline till 18 months follow-up. Linear-mixed models were calculated modelling the clinical outcome as dependent variable and group (continued cannabis use/discontinued cannabis use), time since baseline, linear trends, quadratic trends and trend interactions as independent variables. Subject entered as random effect. Significant group effects are marked in black above and significant interaction effects are marked in black within the graphs. False-discovery rate correction was performed to control for the number of comparisons for each fixed effect across the clinical outcome variables. Of note: For graphical depiction, time from baseline is presented as ordinal variable; however, in the model calculation the time from baseline entered as a continuous variable. Further, as the model fit for the optimal complexity varied by outcome, the regression line in the plot is modelled with the 'LOESS' nonparametric function. PANSS Positive and Negative Syndrome Scale, GAF Global Assessment of Functioning, BDI-II Beck's Depression Inventory-II, ROP recent-onset psychosis, CHR clinical high-risk for psychosis.
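The Fig. 2 caption mentions false-discovery rate correction across outcome variables without naming the procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the function name and the example p values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p values for one fixed effect tested across four outcome measures.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

An effect would then be reported as significant when its adjusted p value falls below the chosen threshold (for example 0.05), applied separately for each fixed effect as described in the caption.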

Cognitive predictor of continued cannabis use
Cognition did not predict CCu above chance level, which was surprising since schizophrenia is characterized by severe impairments in cognition 43,63 , as is likewise heavy cannabis use 64 . On the other hand, a previous meta-analysis has shown that the evidence is inconclusive for an association between cannabis dependence and cognitive impairments 24 . This inconsistency might be explained by differences in the cognitive tests analyzed, as some performance deficits have been shown to be task-specific 65 . Moreover, other evidence shows that cognition is better preserved in cannabis-using psychotic individuals than in patients without concurrent cannabis use 25,66 , and hence cognition might be a less important factor for predicting CCu in this particular patient group. Furthermore, a recent review on acute and residual effects of cannabis on cognition 67 concluded that the association between cognition and cannabis is likely explained by genetic and environmental factors that predispose certain individuals both to cannabis use and cognitive deficits, and to a lesser degree by actual neurotoxic effects. Future studies are warranted to disentangle whether these negative results reflect our use of tests that are insensitive to particular cognitive changes predicting CCu, or whether cognitive disturbances are indeed not predictive of CCu in psychotic and at-risk patients.

Effect of continued cannabis use on long-term clinical outcome
Our longitudinal analyses partially support the notion that CCu increases the risk for a poor long-term outcome in ROP and CHR individuals. Even though we found significant differences between CCu and DCu for almost all clinical outcome measures in the ROP group, we found a significant interaction between time and CCu only with depression in the CHR group. Depressive symptoms are a common comorbidity in patients with recent-onset psychosis 68 . However, our finding was unexpected since other studies investigating the impact of altered cannabis use on depressive symptoms have been so far inconclusive 52,69 . In the ROP group, we found a trending interaction effect of CCu with general symptoms over time, which would be in line with the previous literature 52 . There are several possible explanations for these nonsignificant interaction effects: First, patients with DCu have been longer abstinent than CCu patients, and thus they might already have recovered from the detrimental effects of the cannabis consumption. Second, our analyses might have been less sensitive to time-dependent effects due to the attrition rate in our study, leading to missing data and a relatively small sample size. Third, some of the patients with CCu have reported only one cannabis use at follow-up. Previous studies have shown that even a decreased CCu might improve the long-term clinical outcome 4 . Future studies might investigate further baseline measures to disentangle the main effects of CCu versus general cannabis use.

Limitations
Among the important limitations of our study, we note that missing assessments in several subjects for some timepoints hindered analysis of time-to-event data, which might otherwise have improved accuracy by disentangling further subjects' risk 42 . Additionally, the patient population of our study is difficult to contact and typically presents a high attrition rate 4 . Thus, the follow-up period was only nine months, and our final sample was relatively small and unbalanced across sites, which might well have influenced results, especially in the imaging domain. Even though we carefully corrected for site effects, future studies are needed to investigate thoroughly and replicate our findings in larger samples and across sites. This would also be important to validate the speculation, as might arise from our findings, that MRI and cognitive measures are not of pivotal importance for predicting continued cannabis use. Even though most individuals who remained abstinent during the nine-month follow-up remained abstinent thereafter, further studies are warranted specifically to investigate the long-term prediction of continued cannabis use. Furthermore, our relatively small sample size hindered a further stratification of the critical outcome "continued cannabis use". Future studies might also assess the predictability of different severities of CCu. Indeed, any reduction in cannabis use improves psychosis outcome 5 , and may be a more realistic harm reduction aim in therapy than complete abstinence 52 . As such, it would be useful to predict the relevant amount of cannabis use as distinct from complete abstinence. Most critically, our study lacks an external validation of the prediction of CCu in ROP. Thus, it cannot be inferred whether the drop in the accuracy of our predictor is better explained by low generalizability or by the differences of our samples in terms of severity of clinical symptoms and substance use. Hence, future tests of generalizability in ROP samples with similar substance use profiles are called for. Moreover, our study lacks some variables with known associations with cannabis use disorder, such as the individual's motivation to quit cannabis use 53 or specific substance-related cognitive tests 65 , the inclusion of which might improve accuracy in future studies. Importantly, cannabis use was assessed via self-report, which might suffer from recall and social desirability bias. Ideally, future studies should confirm cannabis use and ascertain cannabis abstinence by biological measurements, preferably via hair toxicology, given its long detection window 70 .

CONCLUSION
This is the first multimodal examination of prognostication of CCu in ROP patients, along with generalizability testing in CHR patients. We found that the best predictor was based solely on clinical variables, reliably showing a contribution of global functioning, especially lower highest lifetime functioning, specific patterns of substance use, urbanicity and a lack of coping strategies. This predictor might be improved in future studies by adding specific cannabis-related questionnaires or additional data modalities such as cortical thickness, genetics or functional MRI, aiming to improve its clinical utility. Importantly, the ultimate aim to identify better those patients with ROP or CHR who are most likely to continue cannabis use, enabling tailored interventions and thus improving their clinical outcome, calls for testing and improvement of the model in larger and more diverse clinical samples.

[...] longitudinal cannabis-induced psychosis study (CIP) 71 . The ROP group included via PRONIA had experienced an affective or non-affective psychotic episode within the past 24 months that was present within the past three months prior to study entry. The ROP group included in CIP had a psychosis diagnosis originally associated with cannabis use that preceded the onset of psychotic symptoms by no more than two weeks in the last 24 months, as defined in the International Classification of Diseases, 10th Revision, criteria for substance-induced psychosis 72 . CHR individuals needed to fulfil (1) the basic symptom criterion "Cognitive Disturbances" assessed by the Schizophrenia Proneness Instrument 73 ; and/or (2) a slightly adapted version of the ultra-high-risk criteria according to the Structured Interview for Psychosis-Risk Syndromes 74 .

Study design and population
ROP patients included in CIP were recruited at the Department of Psychiatry of Ludwig-Maximilian-University in Munich, while both PRONIA samples were recruited at ten different European sites (see ref. 41 ). Diagnoses were based on internationally established criteria and given by trained clinical raters 37,71 . Current or past alcohol dependence and polysubstance dependence within the past six months were exclusion criteria (Supplementary 1 for general exclusion criteria). Further, ROP and CHR patients included via PRONIA had to be abstinent from cannabis in the four weeks prior to inclusion. We imposed an additional inclusion criterion, only admitting patients with lifetime cannabis use prior to baseline.
All patients from PRONIA underwent baseline assessment between 2014 and 2019 and were followed for up to 36 months. The CIP recruitment took place from December 2016 until May 2019, and the follow-up period was nine months. The study protocols were largely harmonized (detailed assessments are listed in Supplementary Table 1).
Prior to inclusion, all patients provided written, informed consent (either personally or through a legal guardian if below the age of 18). Studies were approved at their respective sites by the local research ethics committees.

Outcome target
Substance use was assessed in a semi-structured interview at each visit 71 (Supplementary Fig. 1). At the baseline interview, clinical raters asked the patient about his/her history of cannabis use and, at subsequent visits, whether he/she had used cannabis since the previous examination. We defined CCu as any cannabis consumption between baseline and nine-month follow-up. Conversely, we labelled each patient who remained abstinent until at least nine months after baseline assessment as discontinued cannabis use (DCu).
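The outcome definition above can be expressed as a small labelling rule. The sketch below is illustrative only: the function name, the input representation (day of first reported use, days of available follow-up) and the conversion of nine months to 270 days are assumptions, not details given in the text.

```python
NINE_MONTHS_DAYS = 270  # assumption: nine months approximated as 9 x 30 days


def label_outcome(first_use_day, followup_days, window_days=NINE_MONTHS_DAYS):
    """Label a patient as 'CCu', 'DCu', or None when follow-up is too short.

    first_use_day: days from baseline to first reported cannabis use, or None.
    followup_days: days from baseline covered by follow-up interviews.
    """
    if first_use_day is not None and first_use_day <= window_days:
        return "CCu"  # any consumption within the nine-month window
    if followup_days >= window_days:
        return "DCu"  # abstinence documented for at least nine months
    return None       # abstinent so far, but the window is not yet covered


# Hypothetical patients: early resumption, long abstinence, late relapse.
labels = [label_outcome(94, 300), label_outcome(None, 597), label_outcome(400, 597)]
```

Under this rule a relapse that occurs only after the nine-month window still yields a DCu label, which matches the handling of late relapses described in the sample characteristics.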

Definition of the predictors
We trained three unimodal classifiers: (i) clinical, (ii) cognitive and (iii) sMRI (Supplementary Table 2 for the full list of variables). Predictors for the clinical domain were selected based on their prior association with cannabis use, consisting of: (1) substance use-related items 56,71 , (2) environmental risk-factors 16 , (3) clinical symptoms 8,19,20 , (4) global functioning 75 , (5) stress and coping strategies information 76 , (6) demographic data and (7) the BMI 14 . The cognitive predictor variables were selected from subscores of the cognitive domains of the MATRICS Consensus Cognitive Battery 77 , following the previous approaches 41 . The sMRI classifier was based on whole-brain GMV. A harmonized protocol for the acquisition of sMRI data was used at all sites 37 . For pre-processing, we used the open-source CAT12 toolbox (version r1155; http://dbm.neuro.uni-jena.de/cat12/), which is an extension of SPM12 running in MATLAB 2018a (Supplementary 2 and Supplementary Table 3 for details of sMRI acquisition and pre-processing). We employed group information guided independent component analysis (GIG-ICA) 78 , which simultaneously takes into account the covariance between brain voxels and their similarity to reference components (RCs) of interest 71,79 . We chose nine RCs previously shown to be linked with schizophrenia 34 , which included several regions that have also been associated with cannabis use disorder, namely the prefrontal cortex, insula and cerebellum 32 (Supplementary Fig. 2 for RCs).

Machine learning strategy
We generated and tested our predictors on the total sample of ROP patients (N = 109). Next, we tested if our predictors would generalize to CHR patients (N = 73). Our machine learning pipeline was implemented in NeuroMiner version 1.1 (www.pronia.eu/neurominer) running in MATLAB R2019. To build the set of predictors, we strictly separated the training and test phases in repeated nested cross-validation (CV) with ten folds and five permutations both at the outer (CV2) and inner cycles (CV1). All features of the (i) clinical and (ii) cognitive predictors were standardized based on the median, with imputation of missing values by Seven-Nearest Neighbour imputation, and pruning of non-informative features (zero-variance, infinity). Subsequently, all features were scaled from zero to one. To find a set of optimally predicting features, we employed a wrapper-based feature selection using linear support vector machines (SVM; LIBSVM 3.12 80 ; http://www.csie.ntu.edu.tw/~cjlin/libsvm). Following a previous approach 41 , we trained the models on the CV1 training data and picked the best-performing models based on the average BAC of the SVMs on the CV1 training and test data. More specifically, we performed a greedy sequential forward search 81 across the range of the SVM C regularization parameters ($2^{[-4 \in \mathbb{Z} \rightarrow +4]}$ 41 ), adding one feature at a time until the top ten percent most predictive features were selected.
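The structure of the repeated nested cross-validation can be sketched as follows. This is a simplified illustration under stated assumptions, not the NeuroMiner implementation: the helper names are hypothetical, the folds are unstratified, "five permutations" is read as five random repetitions of the fold assignment, and the C grid is read as the integer powers of two from -4 to +4.

```python
import random

# Assumed reading of the C range: 2^-4, 2^-3, ..., 2^+4 (nine values).
C_GRID = [2.0 ** exponent for exponent in range(-4, 5)]


def repeated_kfold(indices, n_folds=10, n_repeats=5, seed=0):
    """Yield (train, test) index lists for repeated, unstratified k-fold CV."""
    rng = random.Random(seed)
    indices = list(indices)
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[k::n_folds] for k in range(n_folds)]
        for k, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test


def nested_cv_plan(n_subjects, n_folds=10, n_repeats=5):
    """Count inner (CV1) splits and check that outer (CV2) test data never leak."""
    n_inner = 0
    outer = repeated_kfold(range(n_subjects), n_folds, n_repeats)
    for outer_id, (outer_train, outer_test) in enumerate(outer):
        held_out = set(outer_test)
        inner = repeated_kfold(outer_train, n_folds, n_repeats, seed=outer_id + 1)
        for inner_train, inner_test in inner:
            # Preprocessing, feature selection and the search over C_GRID would
            # use only inner_train / inner_test, never the outer test fold.
            assert held_out.isdisjoint(inner_train)
            assert held_out.isdisjoint(inner_test)
            n_inner += 1
    return n_inner


n_inner_splits = nested_cv_plan(109)  # 50 outer splits x 50 inner splits each
```

Keeping all data-dependent steps (imputation, scaling, feature selection, hyperparameter choice) inside the inner loop is what allows the outer test folds to give an unbiased estimate of out-of-sample performance.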
For the (iii) sMRI-based predictor, we accounted for site-specific heterogeneity in two steps. First, we used the so-called g-theory mask 37,41 to exclude all voxels showing only between-site but no inter-subject variation 71,82 . Second, we adjusted the remaining voxels for site effects using ComBat 83,84 , a harmonization method based on an empirical Bayesian approach that is frequently used to remove non-biological variation related to differences between MRI scanners. To preserve the biological variation of interest (CCu), we estimated the ComBat model on a subsample of healthy individuals from PRONIA that was matched for age and sex between sites (see Supplementary Fig. 3 and Supplementary Table 4 for the age and sex distribution of the matched healthy control sample, and Supplementary Fig. 4 and Supplementary Table 5 for pre/post comparisons). This model was then applied independently to our discovery (ROP) and validation (CHR) samples (Supplementary Fig. 5, Supplementary Table 6). Finally, the thresholded and site-corrected sMRI images entered our machine learning pipeline. Strictly separating CV 1 and CV 2, we first proportionally scaled each voxel by total intracranial volume. We then corrected for sex and age effects based on betas computed in our healthy control subsample and employed GIG-ICA to reduce feature dimensionality. Next, the components were scaled between zero and one. Again, we employed an SVM 80 with optimization of the C parameter within a range of $2^{[-4 \in \mathbb{Z} \rightarrow +4]}$ 41 . See Supplementary 3 for a detailed description of sMRI processing and Supplementary Fig. 2 for an overview of all steps.
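The per-voxel steps in this paragraph (proportional scaling, covariate correction with externally estimated betas, and scaling to the unit interval) can be illustrated as follows. This sketch rests on assumptions: proportional scaling is interpreted as division by total intracranial volume, the correction is a simple linear model, and all function names are hypothetical. ComBat and GIG-ICA are not reproduced here.

```python
def proportional_scaling(voxel_values, tiv):
    """Divide every voxel value by the subject's total intracranial volume."""
    return [v / tiv for v in voxel_values]


def adjust_for_age_sex(value, age, sex, beta_age, beta_sex):
    """Remove linear age and sex effects from one voxel value.

    The betas are assumed to have been estimated elsewhere (here: in the
    healthy control subsample), so patient outcomes never enter the fit.
    """
    return value - beta_age * age - beta_sex * sex


def fit_minmax(train_values):
    """Learn scaling parameters from the training partition only."""
    return min(train_values), max(train_values)


def apply_minmax(value, lo, hi):
    """Scale with training-partition parameters.

    Values from held-out subjects may fall outside [0, 1]; they are not clipped.
    """
    return (value - lo) / (hi - lo) if hi > lo else 0.0
```

Fitting the scaling parameters on the training partition and merely applying them to held-out subjects is what keeps the training and test phases separate.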

Multimodal prediction models
To combine our best-performing unimodal (i) clinical predictor with the other unimodal predictors, we used a stacked generalization procedure 37 . Here, the CV 1 test decision scores from the unimodal predictors served as features within the same CV structure and were scaled from zero to one, with the imputation of any missing sMRI and cognitive data using seven-nearest-neighbour imputation. Again, we optimized the C parameter within a range of $2^{[-4 \in \mathbb{Z} \rightarrow +4]}$ .
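As an illustration of the imputation step in stacking, the sketch below fills missing decision scores from the nearest complete neighbours. It is a simplified stand-in under stated assumptions, not the NeuroMiner routine: rows are subjects, columns are the unimodal decision scores, missing entries are coded as None, distance is the root-mean-square difference over the columns observed in both subjects, and the fill value is the neighbour mean.

```python
import math


def knn_impute(rows, k=7):
    """Fill None entries with the mean of that column among the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different subject with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled
```

For example, with columns ordered as clinical, cognitive and sMRI scores, a subject without an sMRI score receives the average sMRI score of the subjects whose clinical and cognitive scores are closest to their own.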
We assessed the significance of all classifiers via permutation testing 85,86 with 1000 permutations and α = 0.05. Further, we compared differences between all predictors' performances in ROP using the nonparametric Quade-test 87 at the omnibus level followed by post-hoc pairwise comparisons using the t-distribution 88 . Between the ROP and CHR groups we compared the performance of our best predictor (clinical) using the nonparametric and unpaired Wilcoxon rank-sum test, whereas in CHR we compared the best unimodal predictor (clinical) with the best multimodal (clinical-cognitive) predictor. Additionally, we assessed whether our clinical and sMRI-based unimodal predictions were biased by confounding effects such as age, site, sex or level of functioning (Supplementary 4). To assess whether the imbalanced group assignment of the clinical predictor in CHR patients was associated with differences in substance use severity between ROP and CHR groups, we compared the sensitivity and specificity of these models separately for subjects with and without cannabis use disorder (Supplementary 6).
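The logic of the permutation test can be sketched as follows. The sketch assumes that a callable (here named fit_and_score, a hypothetical name) retrains the full pipeline on a given label vector and returns its BAC. The p-value formula with the added one in numerator and denominator is a common convention and an assumption here, not necessarily the one used in NeuroMiner. The toy scorer merely stands in for the real pipeline.

```python
import random


def permutation_p_value(fit_and_score, labels, n_permutations=1000, seed=0):
    """Return (observed score, p-value) under random label permutation."""
    rng = random.Random(seed)
    observed = fit_and_score(labels)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        # Count permutations that perform at least as well as the true labels.
        if fit_and_score(shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)


def toy_score(labels):
    """Hypothetical stand-in for 'retrain the pipeline and return its BAC'.

    One feature (the subject index) is thresholded at the sample midpoint.
    """
    cut = len(labels) / 2
    preds = [1 if v >= cut else 0 for v in range(len(labels))]
    sens = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 1) / labels.count(1)
    spec = sum(1 for t, p in zip(labels, preds) if t == 0 and p == 0) / labels.count(0)
    return 0.5 * (sens + spec)
```

A classifier is then declared significant when the returned p-value falls below α = 0.05.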

Feature importance
To understand which features were most reliably contributing to the prediction of CCu, we computed the CV ratio 37,85 . The significance of features for predictors that included wrapper-based feature selection (clinical and cognition) was calculated by sign-based consistency following previous approaches 41 (Supplementary 5).
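Both reliability measures can be illustrated for a single feature, given its weights across the cross-validation models. The definitions below are assumptions made for illustration: the CV ratio is taken as the mean weight divided by its standard error, and sign consistency as twice the absolute deviation of the proportion of positive weights from 0.5. The exact formulas and the associated significance test are those of the cited references and Supplementary 5, not this sketch.

```python
import math


def cv_ratio(weights):
    """Mean weight across CV models divided by its standard error.

    Assumes at least two models and non-identical weights (non-zero SE).
    """
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / (n - 1)
    return mean / math.sqrt(var / n)


def sign_consistency(weights):
    """0 if the weight sign is random across models, 1 if always the same.

    Models in which the feature was not selected (weight 0) are ignored.
    """
    nonzero = [w for w in weights if w != 0]
    p_pos = sum(1 for w in nonzero if w > 0) / len(nonzero)
    return 2 * abs(p_pos - 0.5)
```

Large absolute CV ratios and sign consistency values close to one both indicate a feature whose contribution is stable across cross-validation partitions.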
Exploration: effect of continued cannabis use on long-term clinical outcome
To explore the clinical relevance of CCu prediction, we examined the impact of CCu on long-term clinical outcome using linear mixed-effects models, fitted separately in the ROP and CHR groups with the package 'lmerTest' 89 in R, version 3.6.3 90 . The clinical outcomes entered as dependent variables were the sum scores of positive, negative and general symptoms from the PANSS 91 , the sum score of the BDI-II 48 , current symptoms from the GAF-S 46,47 and current functional disability from the GAF-F, each assessed until the 18-month follow-up. Following the approach of a previous study 92 , we tested the main fixed effects of "group" (CCu vs. DCu) and time since baseline, the linear, quadratic and cubic trends, and the trend interactions with the outcome. Patients were modelled as a random effect. We assessed model complexity for both groups (ROP and CHR) and each outcome individually, employing the parametric bootstrap method for the likelihood ratio test (PBmodcomp, R package pbkrtest 93 ) with 200 iterations. Missing data were deleted for each case per visit.

DATA AVAILABILITY
The data are not publicly available due to Institutional Review Board restrictions, since the participants did not consent to their data being publicly available.