
Predicting alcohol use disorder remission: a longitudinal multimodal multi-featured machine learning approach


Abstract

Predictive models for recovering from alcohol use disorder (AUD) and identifying related predisposition biomarkers can have a tremendous impact on addiction treatment outcomes and cost reduction. Our sample (N = 1376) included individuals of European (EA) and African (AA) ancestry from the Collaborative Study on the Genetics of Alcoholism (COGA) who were initially assessed as having AUD (DSM-5) and reassessed years later as either having AUD or being in remission. To predict this difference in AUD recovery status, we analyzed the initial data using multimodal, multi-feature machine learning applications including EEG source-level functional brain connectivity, Polygenic Risk Scores (PRS), medications, and demographic information. Age-matched analyses stratified by sex and ancestry were performed with a supervised linear Support Vector Machine and were run twice, once with ancestry defined by self-report and once with ancestry defined by genetic data. Multi-feature prediction models achieved higher accuracy scores than models based on a single domain, and male models achieved higher scores when ancestry was based on genetic data. The AA male group model with PRS, EEG functional connectivity, marital status, and employment status features achieved the highest accuracy of 86.04%. Several discriminative features were identified, including collections of PRS related to neuroticism, depression, aggression, years of education, and alcohol consumption phenotypes. Other discriminative features included being married, being employed, medication, lower default mode network and fusiform connectivity, and higher insula connectivity. Results highlight the importance of increasing the genetic homogeneity of analyzed groups and of identifying sex- and ancestry-specific features to increase prediction scores and reveal biomarkers related to AUD remission.


Introduction

National surveys on alcohol use and studies of Alcohol Use Disorder (AUD) show that only one-third of individuals with AUD attempt to quit drinking each year1. Of them, only 25% succeed in reducing alcohol consumption for more than a year1. At the same time, there are ongoing debates over whether courts and correctional programs should admit individuals into rehabilitation programs, given questions about the programs’ efficacy and outcomes2,3. It is therefore of great importance to identify features of AUD resilience and readiness to recover, including predisposition characteristics, that can predict a change in drinking behavior. Such features could inform therapeutic approaches to AUD, help individuals overcome addiction, and reduce the associated state and federal financial burden. Few studies have investigated the characteristics of those with AUD who reduce alcohol consumption, and those have often focused on psychosocial aspects3 and initial consumption measurements4,5, leaving physiological and genetic variables unexplored.

Recently, multimodal, multi-feature machine learning (ML) applications have started to revolutionize biomedical research, making it possible to classify and predict diseases and to better understand disease development and treatment outcomes, outperforming more classical analyses such as regression6,7,8. Significant ML Support Vector Machine (SVM) classifier models have been reported for complex disorders, including schizophrenia, bipolar disorder, and depression, identifying discriminating features9. Our own study8 and others7 have shown that the accuracy of ML models increases with multimodal, multi-feature approaches to describing complex disorders, which allow a variety of measurement domains to be brought to bear on different aspects of disease pathology7. Indeed, ML studies calculating AUD classifiers/predictive models have employed genetic loci8, psychosocial measures7, family history8, and electrophysiological (EEG) measurements8 as features in a multimodal analysis. In the current ML study, we used EEG, genetics, medication intake, and demographics as predisposition characteristics to predict AUD remission. EEG measurements, especially resting-state functional connectivity (EEG-FC), have been shown to be a reliable diagnostic tool and classifier in AUD and other brain disorders such as post-traumatic stress disorder and bipolar disorder10. Polygenic risk scores (PRS), which summarize the effects of genome-wide association study (GWAS) markers to measure the genetic liability to a trait or a disorder, have shown promise in predicting human complex traits and diseases11,12. Several GWAS studies tested alcohol-related PRS for association with AUD phenotypes, using PRS related to risky behaviors, alcohol-use problems, and alcohol consumption, with encouraging results13,14.
We also tested demographic features, including marital and employment status, which have been found to be associated with a reduction in alcohol consumption and with remission1 from AUD. Current medication intake was added as a potential feature to the calculated AUD remission predictive model. Alcohol misuse targets areas of the brain, altering mental states such as emotion15 and cognition16, thus affecting an individual’s capability to cope with the challenges involved in the relapse/recovery processes17. Medication can restore brain regulation abilities, potentially strengthening and stabilizing individual mental abilities, thus supporting AUD remission. The substantial impact that marital and occupational status have on those with AUD18,19 indicates stabilizing and supportive environmental effects.

The present study, therefore, aims to create an ML model predicting future AUD remission among individuals who met criteria for active DSM-5 AUD at their first interview but no longer met criteria for current DSM-5 AUD at their next interview. It should be noted that remission is a complex, multidimensional process; this study focuses on the reduction of alcohol use and of AUD symptoms to subclinical levels, which is one component of remission20. We used longitudinal multidimensional data from COGA (e.g., clinical, electrophysiological, GWAS, demographics), including individuals of European Ancestry (EA) and African Ancestry (AA). COGA collects data and follows individuals with AUD, providing a unique opportunity to compare an individual’s AUD status over the development of their addiction and during their remission. Most importantly, the diverse COGA data enabled stratified analyses, which increased group homogeneity and yielded an individualized model and discriminative key features for every group. To further increase the genetic homogeneity of the ancestry groups, we calculated the models once with ancestry based on self-report and again with ancestry based on genetics, calculated with ancestral principal components (PCA)21. Our central hypothesis was that a model based on multidimensional features would result in better prediction than a model based on a single modality, and that being married, being employed, and taking medication would predict remission. Using stratification to control for the confounding variables sex and ancestry, we expected to find differences in the prediction models between the groups, with higher accuracy scores when ancestry was defined using genetic data. We also examined the most discriminative features in the predictive models, enhancing our understanding of the neurophysiological, genetic, and socio-demographic characteristics underlying AUD resilience and recovery.

Materials and methods


The data consisted of 1376 participants (826 males and 550 females) from COGA, including EA and AA individuals. Data from seven collection sites were included in this study. The experimental protocols were approved by each site’s institutional review board, and informed consent (for those over 18 years of age) or assent (for those under 18 years of age) was obtained from all participants. Ascertainment and assessment procedures of COGA recruits have been described elsewhere22 and in Supplementary Materials. Only individuals who met criteria for lifetime DSM-5 AUD and who participated in at least two interviews were included in the sample for these analyses. We examined only participants who were diagnosed as DSM-5 AUD at their initial laboratory visit and reassessed years later, at a follow-up visit, when they were divided into two groups: (1) continued AUD: met criteria for active DSM-5 AUD at time 1 and time 2, and (2) remitted AUD: no longer met criteria for active DSM-5 AUD at time 2. The AUD and remission phenotypes were based on information from two consecutive interviews. Both continued AUD and remitted AUD met criteria for current AUD at the first interview, defined as the presence of two or more AUD criteria within the previous 12 months. Remission at the second interview was defined as the absence of all AUD criteria other than craving for at least 12 months and either low-risk drinking or abstinence (n = 688, 413 males, 275 females, mean age at initial visit: 30.62 ± 9.41, mean number of years between visits = 4.6 ± 1.7). The continued AUD group met criteria for current AUD at both interviews (n = 688, 413 males, 275 females, mean age: 30.79 ± 9.36, mean number of years between visits = 4.8 ± 1.6). The analysis was done on the data collected during the first visit to predict remission status at the second visit. In a series of analyses, the groups were further divided according to ancestry (EA, AA) and sex (male, female). 
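The group definitions above can be written as a small labeling rule. The following is a minimal sketch, not the COGA assessment code: the `Interview` record, its field names, and the `"unclassified"` fallback for participants who fit neither definition are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Interview:
    """One assessment visit (hypothetical record layout)."""
    # DSM-5 AUD criteria endorsed within the previous 12 months
    criteria_past_12m: set = field(default_factory=set)
    # True if the participant reports low-risk drinking or abstinence
    low_risk_or_abstinent: bool = False


def has_current_aud(visit):
    # Current AUD: two or more AUD criteria within the previous 12 months.
    return len(visit.criteria_past_12m) >= 2


def is_remitted(visit):
    # Remission: no criteria other than craving for 12 months,
    # together with low-risk drinking or abstinence.
    only_craving = not (visit.criteria_past_12m - {"craving"})
    return only_craving and visit.low_risk_or_abstinent


def label_participant(time1, time2):
    """Assign a group from two consecutive interviews."""
    if not has_current_aud(time1):
        return "excluded"          # sample requires current AUD at time 1
    if has_current_aud(time2):
        return "continued AUD"
    if is_remitted(time2):
        return "remitted AUD"
    return "unclassified"          # assumption: neither definition applies
```

A participant with a single non-craving criterion at time 2 meets neither definition in this sketch, which is why the fallback label exists.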
Stratified analysis by ancestry was done twice: once with ancestry identified by self-report and once with ancestry identified using SNPRelate23 to estimate principal components from GWAS data, which were subsequently used to determine EA and AA. Sex, ancestry, and missing feature values dictated a series of analyses that included different subsets of subjects. All groups were matched on age. A full description of each of the groups can be found in Supplementary Tables S1–S4.
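To illustrate the idea of deriving ancestry from genotype principal components, here is a minimal NumPy sketch. It is not the SNPRelate workflow used in the study: the per-SNP standardization is a common convention assumed here, and the `split_on_pc1` rule is a hypothetical stand-in, since the text does not state how the principal components were mapped to EA and AA.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=2):
    """Top principal components of an (individuals x SNPs) 0/1/2 dosage matrix."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0                 # estimated allele frequency per SNP
    keep = (freq > 0) & (freq < 1)              # drop monomorphic SNPs
    g, freq = g[:, keep], freq[keep]
    # Center by the expected dosage and scale by the binomial standard deviation
    # (assumed normalization, common in population-genetics PCA).
    z = (g - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def split_on_pc1(pcs):
    """Hypothetical two-group rule: the sign of PC1.

    Real pipelines usually compare samples against reference populations.
    """
    return pcs[:, 0] > 0
```

With two well-differentiated populations of similar size, the first component tends to separate them, which is the property the genetic ancestry definition relies on.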


EEG data acquisition and preprocessing

EEG was recorded for 4 min while participants sat in a comfortable chair and were instructed to stay awake with their eyes closed and not to move. Participants sat in a dimly lit, sound-attenuated, RF-shielded booth (Industrial Acoustics, Inc., Bronx, NY, USA) and wore a 64-channel electrode cap (Electro-Cap International, Inc., Eaton, OH, USA) based on the extended 10–20 System. EEG recording and preprocessing procedures are described in Supplementary Materials.

Feature extraction

EEG extracted features

A full description of the EEG functional connectivity calculation (using the MNE package24) can be found in Supplementary Materials and Supplementary Table S5. Briefly, the FreeSurfer parcellation scheme (aparc.lh/rh), based on the Desikan–Killiany Atlas25, was used to define 68 cortical regions from both hemispheres (list of ROIs in Supplementary Table S5). We computed spectral coherence26 to measure functional connectivity (FC) between EEG signals of the 68 regions of interest (ROIs) at specific frequency bands: theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz), and gamma (30–60 Hz), with no overlap between frequencies. The following electrophysiological features were extracted: for each of the frequency bands (theta, alpha, beta, and gamma), a 68 × 68 ROI coherence matrix was created for each participant, resulting in 9221 features. Each of these features represents EEG coherence functional connectivity (EEG-FC) between two ROIs.
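The band-wise coherence features described above can be sketched as follows. This is an illustration only: it uses `scipy.signal.coherence` (Welch-based magnitude-squared coherence) on generic ROI time series rather than the MNE source-level pipeline of the study, and the 2-second segment length, the half-open band edges, and the feature naming are assumptions.

```python
import numpy as np
from scipy.signal import coherence

# Frequency bands from the text; edges are treated as half-open [lo, hi)
# so that no frequency bin is counted in two bands.
BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta": (12, 30), "gamma": (30, 60)}


def band_coherence_features(roi_signals, sfreq, nperseg=None):
    """Mean coherence per unique ROI pair and band.

    roi_signals: array of shape (n_rois, n_samples).
    Returns a dict mapping "band:i-j" to a value in [0, 1].
    """
    x = np.asarray(roi_signals, dtype=float)
    n_rois = x.shape[0]
    nperseg = nperseg or int(2 * sfreq)       # assumed 2-second Welch segments
    features = {}
    for i in range(n_rois):
        for j in range(i + 1, n_rois):        # upper triangle: each pair once
            freqs, cxy = coherence(x[i], x[j], fs=sfreq, nperseg=nperseg)
            for band, (lo, hi) in BANDS.items():
                mask = (freqs >= lo) & (freqs < hi)
                features[f"{band}:{i}-{j}"] = float(cxy[mask].mean())
    return features
```

Because coherence is symmetric, only the upper triangle of each ROI-by-ROI matrix is kept, giving n(n - 1)/2 pair features per band for n ROIs.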

PRS features

PRS based on GWAS weights from 47 phenotypes were derived from 12 publicly available large-scale GWAS of alcohol-related traits conducted in EA and AA males and females, including GWAS of alcohol consumption27,28,29, DSM-IV alcohol dependence28,30,31, and maximum number of alcoholic drinks within 24 h. Additional PRS were derived from GWAS of other traits known to correlate with alcohol use and problems, including educational attainment32,33, anxiety disorders34, personality traits (e.g., aggression35, neuroticism33,36), depression33, subjective wellbeing32, brain structure37, and environmental sensitivity38 (overall number of PRS features = 1162). Details regarding the discovery GWAS, including the number of individuals who participated in each GWAS and the phenotypes, can be found in Table S6. Information on genotyping and quality control is available in the Supplementary Materials. Briefly, the well-established process of clumping and thresholding was used39: single nucleotide polymorphisms (SNPs) from the discovery GWAS were clumped based on linkage disequilibrium (LD) in the 1000 Genomes EUR panel using PLINK 1.940, with R² = 0.25 and a 500 kb window. SNPs were weighted using the negative log of the association p values. Scores were calculated at a range of GWAS p value thresholds (p < 0.0001, p < 0.001, p < 0.01, p < 0.05, p < 0.10, p < 0.20, p < 0.30, p < 0.40, p < 0.50). PRSs were converted to Z-scores for interpretation.
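The thresholding and weighting steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's scoring code: the input is taken to be already clumped, the log base (base 10) is assumed because the text does not specify it, and the optional `effect_signs` argument is a hypothetical addition for carrying the direction of effect.

```python
import numpy as np

# GWAS p value thresholds listed in the text.
P_THRESHOLDS = (0.0001, 0.001, 0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50)


def polygenic_scores(dosages, p_values, effect_signs=None, thresholds=P_THRESHOLDS):
    """Z-scored PRS at each p value threshold.

    dosages: (individuals x clumped SNPs) allele counts.
    p_values: discovery-GWAS association p value per SNP.
    Returns {threshold: array of z-scores, one per individual}.
    """
    d = np.asarray(dosages, dtype=float)
    p = np.asarray(p_values, dtype=float)
    weights = -np.log10(p)                      # assumption: base-10 logarithm
    if effect_signs is not None:                # assumption: optional effect direction
        weights = weights * np.asarray(effect_signs, dtype=float)
    scores = {}
    for t in thresholds:
        keep = p < t                            # SNPs passing this threshold
        raw = d[:, keep] @ weights[keep]
        sd = raw.std()
        # Convert to Z-scores; a constant score (e.g. no SNPs kept) maps to zeros.
        scores[t] = (raw - raw.mean()) / sd if sd > 0 else np.zeros_like(raw)
    return scores
```

Each phenotype contributes one score per threshold, which is how a few dozen phenotypes can expand into a much larger set of PRS features.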

Marital, employment status, and medication intake

This information was gathered as part of COGA’s assessment procedure (see22 and Supplementary Materials for further description). The assessment of prescribed medication intake during the last 30 days covers medication for sleep, anxiety, headaches, birth control, depression, and energy, medication containing steroids, and a further category listed as ‘other medication’.

Machine learning analysis

Z-score normalization was applied to all the features to maintain a common scale without distorting differences in the ranges of values. Regularization was used to control overfitting, enhancing the interpretability and prediction accuracy of the calculated models. We used the least absolute shrinkage and selection operator (LASSO) penalty approach introduced by Tibshirani41 for feature selection. The sparsity property of LASSO, which generates coefficient estimates of exactly zero, shrinks the estimation variance, resulting in a more interpretable model42. Previous use of this approach for genomic data43 has shown that a selected subset of discriminating features can reach satisfactory classification. Regularization parameters were determined using a tenfold cross-validation (CV) procedure, with the label (continued AUD vs. remitted AUD) as the response variable. The reduced set of the most discriminant features, those with non-zero coefficients, was fed into the model to predict each participant’s status as either continued AUD or remitted AUD. A supervised linear-kernel SVM that included parameter optimization was trained with a tenfold CV procedure to classify participants into the two groups. The tenfold CV procedure involved randomly dividing the participants into ten equal groups, training the classifier on nine of them, and testing the trained model on the left-out one. To ensure randomization of the participants in the calculated model, the dataset was shuffled before every fold. To take advantage of the randomization procedure, we repeated this process ten times, averaging the output results. CV was applied to all models, with an additional training/testing (70:30) validation analysis to confirm results in the larger samples (EA males and females).
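The sequence of steps above (Z-scoring, LASSO selection, linear SVM, repeated tenfold CV) can be sketched with scikit-learn. This is an illustrative reconstruction, not the authors' code: the grid of `C` values, the inner fivefold search, and the use of stratified folds are assumptions, and the selection step is placed inside the pipeline so that it is refit on each training fold, which may differ from how the study ordered these steps.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model():
    """Z-score -> LASSO feature selection -> linear-kernel SVM."""
    return Pipeline([
        ("zscore", StandardScaler()),
        # LassoCV tunes the penalty with tenfold CV; SelectFromModel keeps
        # the features whose coefficients are (numerically) non-zero.
        ("lasso_select", SelectFromModel(LassoCV(cv=10))),
        # "Parameter optimization" is sketched as a small grid over C (assumed values).
        ("svm", GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10]}, cv=5)),
    ])


def repeated_cv_accuracy(X, y, n_splits=10, n_repeats=10, seed=0):
    """Mean and standard deviation of accuracy over repeated, reshuffled k-fold CV."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    scores = cross_val_score(build_model(), X, y, cv=cv, scoring="accuracy")
    return scores.mean(), scores.std()
```

Keeping the selection step inside the pipeline means the held-out fold never influences which features are retained, a conservative design choice when the number of features far exceeds the number of participants.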
Model performance was evaluated by counting true positives (TP, the number of correctly classified remitted AUD participants) and true negatives (TN, the number of correctly classified continued AUD participants). We computed the classification accuracy as the sum of TP and TN divided by the total number of classified subjects. Area under the curve (AUC)7 was also used to evaluate the classification models. A fuller description of the AUC calculation and comparison can be found in Supplementary Materials.
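The accuracy definition above, together with the sensitivity and specificity reported alongside it, can be written out directly. The sketch below uses the standard definitions with remitted AUD as the positive class; the label strings are placeholders, and `rank_auc` is the generic rank-based AUC, which is an assumption since the study's own AUC procedure is described only in its Supplementary Materials.

```python
def classification_metrics(y_true, y_pred, positive="remitted"):
    """Accuracy, sensitivity, and specificity with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)   # remitted, predicted remitted
    tn = sum(t != positive and p != positive for t, p in pairs)   # continued, predicted continued
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def rank_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for ps in pos_scores:
        for ns in neg_scores:
            wins += 1.0 if ps > ns else 0.5 if ps == ns else 0.0
    return wins / (len(pos_scores) * len(neg_scores))
```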


Results

Significant ML SVM models were calculated predicting remission from AUD for individuals who were previously diagnosed with DSM-5 AUD. Sex- and ancestry-stratified analysis created an individualized model for each of the groups: EA males, EA females, AA males, and AA females (full details of the number of participants and age matching for each of the models are in Supplementary Tables S1–S4). Table 1 summarizes the scores of the significant predictive models across ancestry and sex (see Tables S7–S10 for full results), confirming the previous finding that the combined feature model (e.g., AA male and female models with EEG, PRS, medication, and demographic features) was more accurate than models based on a single domain (Fig. 1). We found higher model accuracy when a group’s ancestry was defined by genetics than by self-report in EA males (p < 0.001) and AA males (p < 0.001) in models with only PRS as features (Supplementary Fig. S1). No difference was found in the female groups between the two types of ancestry definition. The AA male group combined feature model of PRS, EEG-FC, marital status, and employment status achieved the highest accuracy of 86.04% (specificity = 85.83%, sensitivity = 86.257%, AUC = 0.97). The AA female group combined feature model of PRS, EEG-FC, and depression medication also achieved a high accuracy of 85.43% (specificity = 80.66%, sensitivity = 86.19%, AUC = 0.9). The models of the EA male and female groups achieved AUC > 0.74 for the models with combined features of PRS, EEG-FC, and medication (accuracy 64.96%, EA males) and PRS, EEG-FC, and marital status (accuracy 63.60%, EA females) (Table 1). Adding discriminatory features to the models increased accuracy; specifically, EEG-FC was the most discriminative feature category for all groups (p < 0.001).

Table 1 Selected models predicting AUD remission stratified by ancestry and sex.
Fig. 1: Model accuracy stratified by sex and ancestry.

Predictions obtained from PRS only, from combined EEG and PRS, and from the features of the highest-accuracy model for every group (Table 1). Results indicate higher accuracy for the combined feature models, suggesting the advantage of adding phenotypes to genetic prediction models. The error bars are standard deviations. *p < 0.05, **p < 0.01.

Discriminative features

Tables 2 and 3 present a summary of selected shared and group-specific features stratified by ancestry and sex for the model with the highest accuracy in each group. Supplementary Tables S11–S14 present the significant features separately for every model.

Table 2 Selected discriminative PRS and demographic features predicting AUD remission stratified by ancestry and sex.
Table 3 Selected discriminative EEG-FC features predicting AUD remission stratified by ancestry and sex.


Default Mode Network (DMN) FC, as well as connectivity levels in other brain networks, was found to discriminate between the groups and to predict AUD remission (selected significant FC features by group are in Table 3 and Fig. 2; the full list of FC features, including right and left hemispheres, can be found in Supplementary Tables S11–S15). Lower connectivity of DMN ROIs was found in the continued AUD group, especially in the theta and gamma bands. Known DMN hubs including the precuneus (AA female), the posterior cingulate (AA female), and the middle frontal (EA male, AA male) showed lower gamma connectivity. Lower theta connectivity with anterior and posterior brain areas was found in other, more temporal DMN hubs such as the temporal cortex (AA female) and the parahippocampal formation (EA female). Lower theta connectivity in the continued AUD versus the remitted group was also found between the fusiform and posterior and anterior brain areas. Higher connectivity in the continued AUD group was found in selected temporal and parietal areas, including insula connectivity with the superior parietal (beta, EA female) and with the inferior parietal (theta, AA male) (Fig. 2).

Fig. 2: EEG functional connectivity AUD remission biomarkers.

AUD remission prediction models reveal ancestry/sex group-specific brain connectivity biomarkers discriminating between those who recovered from AUD and those who did not. Results highlight lower connectivity in theta (blue) and gamma (red) in areas related to the DMN (in bold: IT, PCC, PR, raCC, PH) and higher connectivity in theta (blue) and beta (orange) between the insula and the inferior and superior parietal regions, respectively (in bold: INS, IP, IT), specific to every sex and ancestry and predicting the maintenance of AUD. Colors: blue, theta; green, alpha; orange, beta; red, gamma. Thinner lines indicate lower connectivity; thicker lines indicate higher connectivity. CMF Caudal middle frontal, FF fusiform, FP frontal pole, INS insula, IC Isthmus cingulate, IP inferior parietal, IT inferior temporal, LI lingual, LO Lateral Occipital, MOF medial orbito frontal, PC paracentral, PCC posterior cingulate cortex, PH parahippocampus, PO parsorbitalis, PR precuneus, raCC rostral anterior cingulate cortex, SM supramarginal, rmF Rostral middle frontal, TP temporal pole, TT Transverse temporal, SP Superior parietal, ST Superior temporal.


Significant discriminative shared and group-specific PRS were found between the groups (selected significant PRS features by group are in Table 2; the full list, including weight ranking, can be found in Supplementary Tables S11–S14). The discriminative features include PRS related to personality traits (neuroticism (EA males, EA females) and aggression (EA males)), depression (EA females), socio-demographic traits (years of education (EA males, EA females)), and alcohol-related traits (maximum number of alcoholic drinks within 24 h (AA males)).


Adding medications to the PRS models significantly increased model accuracy for EA males (sleep medication) and AA males (other medication) (p(PRS vs PRS & sleep medication) < 0.001 and p(PRS vs PRS & other medication) < 0.001, respectively). Depression medication increased the accuracy score of the PRS & EEG-FC model, but this increase did not reach significance (p(PRS & EEG vs PRS & EEG & depression medication) = 0.5). In those cases, taking medication predicted maintenance of the AUD state (see Table 2 for weight ranking).

Marital status

The marital status feature discriminated between the groups, revealing that more members of the remitted AUD EA female and AA male groups were not married compared to their continued AUD counterparts (see Table 2 for weight ranking). EA female: p(PRS & EEG vs. PRS & EEG & marital status) < 0.004; AA male: p(PRS & EEG vs. PRS & EEG & marital status) < 0.001.

Employment status

The employment status feature discriminated between the groups, revealing that more members of the remitted AUD AA male group were not employed compared to their continued AUD counterparts (see Table 2 for weight ranking). p(PRS & EEG vs. PRS & EEG & marital & employment status) < 0.001.


Discussion

Of the one-third of individuals with AUD who attempt to quit drinking every year, only 25% are successful in reducing their consumption a year later. Therefore, understanding the parameters that can set an optimal initial state (including biomarkers, demographics, and medications) can greatly affect the success of remission from AUD. Using multimodal, multi-featured machine learning applications with the COGA longitudinal dataset, we uncovered these parameters, uniquely characterized per sex and ancestry. This is the first study to formulate a multimodal prediction model to determine which individuals with AUD will go on to remission. Results confirmed previous findings showing that the combined feature model (e.g., EEG, PRS, medication, and demographic information) achieved a higher prediction score than models based on a single domain, suggesting that genetic prediction models benefit from the addition of phenotypes to the calculation. Intriguingly, results indicate higher accuracy scores for EA and AA males when ancestry was defined by genetics than by self-report for models with only PRS features. Several discriminative features were identified for each of the models, revealing novel sex- and ancestry-specific predisposition biomarkers of AUD remission. EEG-FC in all groups was found to distinguish between the continued AUD and the remitted AUD groups, revealing lower DMN and fusiform functional connectivity and higher insula functional connectivity in the continued AUD group. Several discriminative PRS were shared (neuroticism PRS and years of education PRS in EA groups), while other PRS were group-specific: PRS associated with aggression were important for EA males, and depression PRS were important for EA females. Being married, being employed, and taking medication predicted the maintenance of the AUD state. Overall, our findings suggest that a wide range of multidimensional features, combined with groups of high internal homogeneity, will yield better predictive models.

Our results underscore previous findings showing the high predictive value of neurophysiological brain function for predicting/classifying neurological disorders8. The EEG-FC discriminative features highlight the difference in the neural connectivity underlying resting-state spontaneous processes (mind wandering, self-reference, and other introspective processes) between those with continued AUD and those with remitted AUD. The continued AUD group showed a lower level of DMN connectivity, confirming previous findings of aberrant DMN function in AUD and across psychiatric conditions such as depression, schizophrenia, and autism44,45. DMN activity during resting state is implicated in memory consolidation because the two processes share neural systems, and may therefore be related to aberrant mental processes in alcoholics such as working memory deficits46, inferior cognitive performance, inferior memory formation, and poor learning of cognitive skills47. Indeed, the continued AUD EA female group showed lower connectivity of the parahippocampal region with the anterior and posterior areas of the brain, previously implicated in memory48 and cognitive49 functions, and the AA males group showed lower connectivity in the precuneus/posterior cingulate, two of the main DMN hubs, which are suggested to play a pivotal role in how intrinsic activity is mediated throughout the DMN50. Both female groups showed lower connectivity involving the fusiform, supporting previous fMRI resting-state findings linking cognitive impairment and lower fusiform connectivity51. The insula showed higher connectivity with the superior parietal (EA females) and inferior parietal (AA males) regions in the group that maintained the AUD diagnosis. A previous study showed that greater functional coupling between the anterior insula and the left frontoparietal network is linked to smoking and impulsivity52.
Given the insula’s role in interoceptive awareness and homeostatic processing, this altered connectivity may relate to a bias towards immediate rewards52 and increased tension53 associated with addiction52. Interestingly, the structural integrity of the salience network (insula) was previously linked to lower DMN activity through the salience network’s role in regulating dynamic changes in other networks54. This theory suggests that if the salience network in the continued AUD group is damaged, this might relate to the lower functioning of the DMN55. Overall, the connectivity of these brain networks showed aberrant functions related to AUD. Observed EEG differences between the sexes and between the ancestries support the importance of identifying group-specific prediction models. EEG is a highly heritable phenotype, and the differences revealed are first steps in distinguishing between sexes and ancestries for the purpose of deepening our knowledge about disease recovery.

The present study reinforces recent discoveries that show the inherent power of adding phenotypes to the genetic prediction model in order to increase accuracy8. Several group-specific PRS were identified as distinguishing between the continued AUD and the remitted AUD groups. For example, while the EA male model includes PRS related to aggression, the EA female model includes PRS related to depression. Interestingly, both EA group models for the prediction of AUD remission include neuroticism PRS and years of education PRS. Our findings are in line with recently published studies showing PRS association with disorder outcomes in depression56, schizophrenia57, and alcohol-related phenotypes58. These findings highlight the potential of PRS collections to predict the course of development of and recovery from diseases. PRS collections, representing a genetic fingerprint of various phenotypes, make it possible to represent complex diseases that span multiple domains. Notably, our results demonstrate the significance of using genetic data rather than self-report to define ancestry, which increases the genetic homogeneity of the groups, leading to higher prediction scores.

Contrary to our hypothesis, taking medication (EA male: medication for sleep and other, AA female: medication for depression) predicted maintenance of the AUD state. Evidence indicates that individuals with comorbid disorders are disadvantaged in dealing with the physical and psychological processes that accompany withdrawal from addiction59. Specifically, alcohol and sleep disturbances have a complex mutual relationship, as alcohol is used by more than one in ten individuals as a hypnotic agent to self-medicate sleep problems60, thus increasing the likelihood of developing alcohol problems61. Moreover, studies show that sleep disturbances are extremely common during withdrawal from alcohol dependence and may persist for several months despite continued abstinence62, and hence may interfere with remission and contribute to relapse63. Also contrary to our hypothesis, results showed that marriage and employment status predicted maintenance of the AUD state. Marriage and employment may add stressors to the alcoholic’s state of mind that preclude seeking help. Co-workers, spouses, and other family members often experience many tensions and heightened emotional distress caused by the negative consequences of living and working with a person with AUD64, leading to a challenging, complex environment for the individual with AUD. As many studies have noted, therapeutic programs treating married individuals with AUD should involve the family/spouse64 and take into account the difficulty married and employed individuals face in leaving for rehabilitation for long periods of time, which has led to the development of programs such as Family Systems Therapy (FST)65 and Community Reinforcement and Family Training (CRAFT)66.

Results indicated higher accuracy for the AA group models than for the EA group models. Research on biomarkers, prediction models, and machine learning algorithms relies on group homogeneity and relevant features. Therefore, the higher AA accuracy could be due to a better fit of the features to the target, and to variation within the genetically defined EA ancestry group that reduces EA group homogeneity. Studies have shown that EA forms a structured population due to historical immigration of diverse source populations67. Future studies might consider dividing EA groups into subgroups according to genetic variation or finding new approaches to define ancestry68,69.

Identifying individuals who are ready for the challenge of renouncing addiction (and those who are not ready) holds enormous possibilities, including for intervention and therapy programs. Further, strengthening the individual with AUD by altering those biomarkers and psychosocial or demographic “protective” characteristics can elevate motivation for the initiation of successful remission. Overall, our findings demonstrate the importance of incorporating ancestry and sex into the analysis when formulating personalized prediction models. Interestingly, we found that identifying ancestry by genetic data might increase group homogeneity, leading to higher accuracy of the prediction model. We further show that models based on various features from different areas of health (genetics, electrophysiology, medication, and demographic data) outperform prediction models based on features derived from a single domain. We identified specific robust features of PRS and EEG functional connectivity for each sex/ancestry group, further expanding our knowledge of the predisposition biomarkers, including genetics and brain mechanisms, underlying the process of remission from AUD.


Given the uniqueness of the COGA dataset (with genetics, EEG measurements, and AUD remitter status), analysis on an independent dataset was not available. Recent prediction modeling is moving towards precision medicine, in which analyses stratified by sex and ancestry produce more group-specific, tailored results. This approach led to different sample sizes, with the AA groups forming smaller cohorts. For a consistent analysis across groups of different sizes, we applied CV analysis to all models, while additional training/testing validation was applied to the larger samples and confirmed the CV results (EA males and females, p > 0.1 for all models). Future studies with larger cohorts are required to further validate these results. Another limitation is related to the scope of features. Various symptomatic and psychosocial features were implicated in previous studies, including our own work8, as associated with AUD development. These features were not included in the current analysis to enable a focus on biomarkers (genetics, brain function) for prediction. Future studies with a wider selection of features are required to further investigate the variables that best predict remission from AUD.
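The validation scheme described above (cross-validation applied to every group model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the data are synthetic, the function names (`stratified_folds`, `fit_centroids`, `cross_validated_accuracy`) are hypothetical, stratifying the folds by class and using five folds are assumptions, and a nearest-centroid rule stands in for the linear Support Vector Machine so that the example runs with the Python standard library only.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def fit_centroids(X, y):
    """Stand-in model (NOT the study's linear SVM): one mean vector per class."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the accuracies."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        centroids = fit_centroids(train_X, train_y)
        correct = sum(predict(centroids, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Synthetic, hypothetical data: 30 "remitted" (1) and 30 "persistent AUD" (0) samples,
# with one informative feature and one noise feature.
rng = random.Random(42)
y = [0] * 30 + [1] * 30
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]

accuracy = cross_validated_accuracy(X, y, k=5)
print(f"mean cross-validated accuracy: {accuracy:.2f}")
```

In small groups, keeping the class proportions equal across folds avoids test folds that contain too few members of one outcome class. The averaged fold accuracy is the kind of summary score that can then be compared across group models of different sizes.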

Code availability

All computer code used to generate the results reported here is freely available and can be accessed by contacting the corresponding author of this article.


  1. Dawson, D. A. et al. Recovery from DSM-IV alcohol dependence: United States, 2001-2002. Addiction 100, 281–292 (2005).

  2. Werb, D. et al. The effectiveness of compulsory drug treatment: a systematic review. Int. J. Drug Policy (2016).

  3. Moos, R. H. & Moos, B. S. Participation in treatment and Alcoholics Anonymous: a 16-year follow-up of initially untreated individuals. J. Clin. Psychol. (2006).

  4. Trim, R. S., Schuckit, M. A. & Smith, T. L. Predictors of initial and sustained remission from alcohol use disorders: findings from the 30-year follow-up of the San Diego Prospective Study. Alcohol. Clin. Exp. Res. (2013).

  5. Moos, R. H. & Moos, B. S. Rates and predictors of relapse after natural and treated remission from alcohol use disorders. Addiction 101, 212–222 (2006).

  6. Bi, J., Sun, J., Wu, Y., Tennen, H. & Armeli, S. A machine learning approach to college drinking prediction and risk factor identification. ACM Trans. Intell. Syst. Technol. (2013).

  7. Whelan, R. et al. Neuropsychosocial profiles of current and future adolescent alcohol misusers. Nature (2014).

  8. Kinreich, S. et al. Predicting risk for Alcohol Use Disorder using longitudinal data with multimodal biomarkers and family history: a machine learning study. Mol. Psychiatry (2019).

  9. Cho, G., Yim, J., Choi, Y., Ko, J. & Lee, S. H. Review of machine learning algorithms for diagnosing mental illness. Psychiatry Investig. (2019).

  10. Greicius, M. Resting-state functional connectivity in neuropsychiatric disorders. Curr. Opin. Neurol. (2008).

  11. Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. (2016).

  12. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. (2018).

  13. Li, J. J. et al. Polygenic risk, personality dimensions, and adolescent alcohol use problems: a longitudinal study. J. Stud. Alcohol Drugs (2017).

  14. Chang, L. H. et al. Association between polygenic risk for tobacco or alcohol consumption and liability to licit and illicit substance use in young Australian adults. Drug Alcohol Depend. (2019).

  15. Dvorak, R. D. et al. Alcohol use and alcohol-related consequences: associations with emotion regulation difficulties. Am. J. Drug Alcohol Abuse 40, 125–130 (2014).

  16. Stavro, K., Pelletier, J. & Potvin, S. Widespread and sustained cognitive deficits in alcoholism: a meta-analysis. Addict. Biol. (2013).

  17. Gonzales, R., Douglas Anglin, M., Beattie, R., Ong, C. A. & Glik, D. C. Understanding recovery barriers: youth perceptions about substance use relapse. Am. J. Health Behav. (2012).

  18. Harford, T. C., Hanna, E. Z. & Faden, V. B. The long- and short-term effects of marriage on drinking. J. Subst. Abuse (1994).

  19. Conway, T. L., Vickers, R. R., Ward, H. W. & Rahe, R. H. Occupational stress and variation in cigarette, coffee, and alcohol consumption. J. Health Soc. Behav. (1981).

  20. Dodge, K., Krantz, B. & Kenny, P. J. How can we begin to measure recovery? Subst. Abuse Treat. Prev. Policy (2010).

  21. Yaeger, R. et al. Comparing genetic ancestry and self-described race in African Americans born in the United States and in Africa. Cancer Epidemiol. Biomarkers Prev. (2008).

  22. Edenberg, H. J. et al. Description of the data from the Collaborative Study on the Genetics of Alcoholism (COGA) and single-nucleotide polymorphism genotyping for Genetic Analysis Workshop 14. BMC Genet. 6(Suppl. 1), S2 (2005).

  23. Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics (2012).

  24. Gramfort, A. et al. MNE software for processing MEG and EEG data. Neuroimage (2014).

  25. Desikan, R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage (2006).

  26. Srinivasan, R., Winter, W. R., Ding, J. & Nunez, P. L. EEG and MEG coherence: measures of functional connectivity at distinct spatial scales of neocortical dynamics. J. Neurosci. Methods (2007).

  27. Clarke, T.-K. et al. Genome-wide association study of alcohol consumption and genetic overlap with other health-related traits in UK Biobank (N = 112 117). Mol. Psychiatry (2017).

  28. Kranzler, H. R. et al. Genome-wide association study of alcohol consumption and use disorder in 274,424 individuals from multiple populations. Nat. Commun. (2019).

  29. Schumann, G. et al. KLB is associated with alcohol drinking, and its gene product β-Klotho is necessary for FGF21 regulation of alcohol preference. Proc. Natl. Acad. Sci. USA (2016).

  30. Gelernter, J. et al. Genome-wide association study of alcohol dependence: significant findings in African- and European-Americans including novel risk loci. Mol. Psychiatry (2014).

  31. Mahedy, L. et al. Parental alcohol use and risk of behavioral and emotional problems in offspring. PLoS ONE (2017).

  32. Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. (2018).

  33. Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature (2016).

  34. Otowa, T. et al. Meta-analysis of genome-wide association studies of anxiety disorders. Mol. Psychiatry (2016).

  35. Pappa, I. et al. A genome-wide approach to children’s aggressive behavior: The EAGLE consortium. Am. J. Med. Genet. B Neuropsychiatr. Genet. (2016).

  36. Walters, R. K. et al. Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. Nat. Neurosci. (2018).

  37. Hibar, D. P. et al. Novel genetic loci associated with hippocampal volume. Nat. Commun. (2017).

  38. Keers, R. et al. A genome-wide test of the differential susceptibility hypothesis reveals a genetic predictor of differential response to psychological treatments for child anxiety disorders. Psychother. Psychosom. (2016).

  39. Privé, F., Vilhjálmsson, B. J., Aschard, H. & Blum, M. G. B. Making the most of Clumping and Thresholding for polygenic scores. Am. J. Hum. Genet. (2019).

  40. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience (2015).

  41. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B (2018).

  42. Knight, K. & Fu, W. Asymptotics for lasso-type estimators. Ann. Stat. 28, 1356–1378 (2000).

  43. Ghosh, D. & Chinnaiyan, A. M. Classification and selection of biomarkers in genomic data using LASSO. J. Biomed. Biotechnol. 2005, 147–154 (2005).

  44. Woodward, N. D. & Cascio, C. J. Resting-state functional connectivity in psychiatric disorders. JAMA Psychiatry (2015).

  45. Yan, C. G. et al. Reduced default mode network functional connectivity in patients with recurrent major depressive disorder. Proc. Natl. Acad. Sci. USA (2019).

  46. Chanraud, S., Pitel, A. L., Müller-Oehring, E. M., Pfefferbaum, A. & Sullivan, E. V. Remapping the brain to compensate for impairment in recovering alcoholics. Cereb. Cortex (2013).

  47. Loeber, S. et al. Impairment of cognitive abilities and decision making after chronic use of alcohol: the impact of multiple detoxifications. Alcohol Alcohol. (2009).

  48. Zola-Morgan, S., Squire, L. R., Amarel, D. G. & Suzuki, W. A. Lesions of perirhinal and parahippocampal cortex that spare the amygdala and hippocampal formation produce severe memory impairment. J. Neurosci. (1989).

  49. Aminoff, E. M., Kveraga, K. & Bar, M. The role of the parahippocampal cortex in cognition. Trends Cogn. Sci. (2013).

  50. Fransson, P. & Marrelec, G. The precuneus/posterior cingulate cortex plays a pivotal role in the default mode network: Evidence from a partial correlation network analysis. Neuroimage (2008).

  51. Cai, S. et al. Altered functional connectivity of fusiform gyrus in subjects with amnestic mild cognitive impairment: a resting-state fMRI study. Front. Hum. Neurosci. (2015).

  52. Clewett, D. et al. Increased functional coupling between the left fronto-parietal network and anterior insula predicts steeper delay discounting in smokers. Hum. Brain Mapp. (2014).

  53. Kinreich, S., Podlipsky, I., Jamshy, S., Intrator, N. & Hendler, T. Neural dynamics necessary and sufficient for transition into pre-sleep induced by EEG neurofeedback. Neuroimage 97, 19–28 (2014).

  54. Bonnelle, V. et al. Salience network integrity predicts default mode network function after traumatic brain injury. Proc. Natl. Acad. Sci. USA (2012).

  55. Jilka, S. R. et al. Damage to the salience network and interactions with the default mode network. J. Neurosci. (2014).

  56. Halldorsdottir, T. et al. Polygenic risk: predicting depression outcomes in clinical and epidemiological cohorts of youths. Am. J. Psychiatry (2019).

  57. Jonas, K. G. et al. Schizophrenia polygenic risk score and 20-year course of illness in psychotic disorders. Transl. Psychiatry (2019).

  58. Mies, G. W. et al. Polygenic risk for alcohol consumption and its association with alcohol-related phenotypes: Do stress and life satisfaction moderate these relationships? Drug Alcohol Depend. (2018).

  59. Quello, S. B., Brady, K. T. & Sonne, S. C. Mood disorders and substance use disorder: a complex comorbidity. Sci. Pract. Perspect. (2005).

  60. Johnson, E. O., Roehrs, T., Roth, T. & Breslau, N. Epidemiology of alcohol and medication as aids to sleep in early adulthood. Sleep (1998).

  61. Weissman, M. M., Greenwald, S., Niño-Murcia, G. & Dement, W. C. The morbidity of insomnia uncomplicated by psychiatric disorders. Gen. Hosp. Psychiatry (1997).

  62. Cohn, T. J., Foster, J. H. & Peters, T. J. Sequential studies of sleep disturbance and quality of life in abstaining alcoholics. Addict. Biol. (2003).

  63. Brower, K. J., Aldrich, M. S., Robinson, E. A. R., Zucker, R. A. & Greden, J. F. Insomnia, self-medication, and relapse to alcoholism. Am. J. Psychiatry (2001).

  64. O’Farrell, T. J. & Clements, K. Review of outcome research on marital and family therapy in treatment for alcoholism. J. Marital Fam. Ther. (2012).

  65. Rohrbaugh, M., Shoham, V., Spungen, C. & Steinglass, P. in Comprehensive Textbook of Psychotherapy: Theory and Practice 228–253 (Oxford University Press, 1995).

  66. Manuel, J. K. et al. Community reinforcement and family training: a pilot comparison of group and self-directed delivery. J. Subst. Abuse Treat. (2012).

  67. Price, A. L. et al. Discerning the ancestry of European Americans in genetic association studies. PLoS Genet. (2008).

  68. Peterson, R. E. et al. The utility of empirically assigning ancestry groups in cross-population genetic studies of addiction. Am. J. Addict. (2017).

  69. Royal, C. D. et al. Inferring genetic ancestry: opportunities, challenges, and implications. Am. J. Hum. Genet. (2010).



The Collaborative Study on the Genetics of Alcoholism (COGA), Principal Investigators B. Porjesz, V. Hesselbrock, T. Foroud; Scientific Director, A. Agrawal; Translational Director, D. Dick, includes eleven different centers: University of Connecticut (V. Hesselbrock); Indiana University (H.J. Edenberg, T. Foroud, J. Nurnberger Jr., Y. Liu); University of Iowa (S. Kuperman, J. Kramer); SUNY Downstate (B. Porjesz, J. Meyers, C. Kamarajan, A. Pandey); Washington University in St. Louis (L. Bierut, J. Rice, K. Bucholz, A. Agrawal); University of California at San Diego (M. Schuckit); Rutgers University (J. Tischfield, A. Brooks, R. Hart); The Children’s Hospital of Philadelphia, University of Pennsylvania (L. Almasy); Virginia Commonwealth University (D. Dick, J. Salvatore); Icahn School of Medicine at Mount Sinai (A. Goate, M. Kapoor, P. Slesinger); and Howard University (D. Scott). Other COGA collaborators include: L. Bauer (University of Connecticut); L. Wetherill, X. Xuei, D. Lai, S. O’Connor, M. Plawecki, S. Lourens (Indiana University); L. Acion (University of Iowa); G. Chan (University of Iowa; University of Connecticut); D.B. Chorlian, J. Zhang, S. Kinreich, G. Pandey (SUNY Downstate); M. Chao (Icahn School of Medicine at Mount Sinai); A. Anokhin, V. McCutcheon, S. Saccone (Washington University); F. Aliev, P. Barr (Virginia Commonwealth University); H. Chin and A. Parsian are the NIAAA Staff Collaborators.

Author information

Corresponding author

Correspondence to Sivan Kinreich.

Ethics declarations

Conflict of interest

Dr. Nurnberger is an investigator for Janssen on a separate study. Other investigators declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

About this article

Cite this article

Kinreich, S., McCutcheon, V.V., Aliev, F. et al. Predicting alcohol use disorder remission: a longitudinal multimodal multi-featured machine learning approach. Transl Psychiatry 11, 166 (2021).
