Behavioral and neurocognitive factors distinguishing post-traumatic stress comorbidity in substance use disorders

Significant trauma histories and post-traumatic stress disorder (PTSD) are common in persons with substance use disorders (SUD) and often associate with increased SUD severity and poorer response to SUD treatment. As such, this sub-population has been associated with unique risk factors and treatment needs. Understanding the distinct etiological profile of persons with co-occurring SUD and PTSD is therefore crucial for advancing our knowledge of underlying mechanisms and the development of precision treatments. To this end, we employed supervised machine learning algorithms to interrogate the responses of 160 participants with SUD on the multidimensional NIDA Phenotyping Assessment Battery. Significant PTSD symptomatology was correctly predicted in 75% of participants (sensitivity: 80%; specificity: 72.22%) using a classification-based model based on anxiety and depressive symptoms, perseverative thinking styles, and interoceptive awareness. A regression-based machine learning model also utilized similar predictors, but failed to accurately predict severity of PTSD symptoms. These data indicate that even in a population already characterized by elevated negative affect (individuals with SUD), especially severe negative affect was predictive of PTSD symptomatology. In a follow-up analysis of a subset of 102 participants who also completed neurocognitive tasks, comorbidity status was correctly predicted in 86.67% of participants (sensitivity: 91.67%; specificity: 66.67%) based on depressive symptoms and fear-related attentional bias. However, a regression-based analysis did not identify fear-related attentional bias as a splitting factor, but instead split and categorized the sample based on indices of aggression, metacognition, distress tolerance, and interoceptive awareness. 
These data indicate that within a population of individuals with SUD, aberrations in tolerating and regulating aversive internal experiences may also characterize those with significant trauma histories, akin to findings in persons with anxiety without SUD. The results also highlight the need for further research on PTSD-SUD comorbidity that includes additional comparison groups (i.e., persons with only PTSD), captures additional comorbid diagnoses that may influence the PTSD-SUD relationship, examines additional types of SUDs (e.g., alcohol use disorder), and differentiates between subtypes of PTSD.

A clear understanding of the hallmark risk factors and distinct endophenotypic profiles is needed to advance treatment strategies for trauma-exposed SUD populations. Various neurobehavioral risk factors have been linked with comorbid SUD and PTSD, including increased anxiety sensitivity [21], anhedonia [22,23], impulsivity [24][25][26][27][28], more severe executive functioning deficits [29][30][31], and increased sensitivity to reward and punishment [32]. Yet, while such evidence supports potential etiological pathways toward shared PTSD-SUD vulnerability, the neurobehavioral traits that consistently distinguish comorbid from non-comorbid patients have not yet been identified. Machine learning (ML) is an analytic approach well suited to this task [33][34][35][36][37] and has been utilized by our research group to delineate a multivariate neurobehavioral profile of cocaine dependence [38]. Indeed, ML techniques have considerable advantages over traditional inferential case-control comparisons, including superior accuracy, the ability to handle numerous (and intercorrelating) predictors simultaneously, and the capacity to quantify the relative importance of individual predictors [39].
In the present study, we employed supervised ML to determine neurobehavioral phenotypic features that were associated with post-traumatic stress symptoms within a SUD population. Candidate predictors were gathered using the standardized Phenotyping Assessment Battery (PhAB) constructed in collaboration with the National Institute on Drug Abuse (NIDA) [40] based on Research Domain Criteria principles [41,42] for deployment in clinical trials of SUD pharmacotherapies [43]. The PhAB is a modular compendium of neurofunctional domain-based assessments [40]. Our study sought to identify comorbidity predictors from each of: the negative emotionality domain (e.g., low distress tolerance, anhedonia, anxiety, depression), metacognitive domain (e.g., distorted thought patterns and beliefs), reward domain (e.g., impulsivity, sensitivity to threat), interoceptive domain (e.g., willingness to tolerate aversive body sensations), and sleep domain (e.g., poor sleep quality). In addition, a subset of participants was administered a battery of neurocognitive measures to assess the predictive value of cognitive functioning, including working memory, attention, delay discounting, motoric impulsivity (i.e., response inhibition), and attentional bias towards threat. Consistent with our goal of identifying a neurobehavioral profile of comorbid post-traumatic stress symptoms in SUD, we constructed a series of classification and regression trees predicting post-traumatic stress symptom severity (i.e., score on the Post-traumatic Stress Disorder Checklist for DSM-5 (PCL-5)) as well as binary group identification (i.e., significant comorbid PTSD symptoms vs. no significant comorbid PTSD symptoms).

MATERIALS AND METHODS
The Virginia Commonwealth University Institutional Review Board approved the study and written informed consent was obtained from all participants. Data for the present investigation were collected in the course of a feasibility study for the NIDA PhAB [40].

Subjects
Participants were recruited for the parent study from an established participant registry and through local advertising [40]. Eligibility criteria were intentionally broad to recruit a "real-world" sample of individuals with SUD. To be eligible for participation, individuals were required to be: (a) between 18 and 70 years of age, (b) able to complete forms and interviews in English, and (c) meet DSM-5 diagnostic criteria for a current, primary SUD with the substance of choice being either cocaine, marijuana, or opioids. The MINI International Neuropsychiatric Interview Version 7.0.2 [44] was used to determine DSM-5 SUD diagnoses. The presence of any other condition or illness that, in the opinion of the Principal Investigator or study physician, would preclude safe and/or successful completion of the study was also a cause for exclusion. Participants could not meet DSM-5 criteria for a severe SUD related to other substances. However, due to the ubiquity of polysubstance use in the local population, individuals with mild

Assessments
Assessment-derived predictor variables. The NIDA PhAB "core" SUD-relevant domains evaluate cognition, reward, interoception, negative emotionality, metacognition, and sleep [40] (see Supplemental Information for fuller details on questionnaires and cognitive tasks). Nine self-report questionnaires and five neurocognitive tasks were employed here (Table 2). Each measure within the PhAB produced between one and eight scale/subscale scores. The battery of questionnaires/tasks was administered electronically via REDCap and Inquisit Lab software (Millisecond Software LLC, Seattle, WA), and the average completion time for the battery was 180 min.
Assessment-derived outcome definitions. Because trauma exposure itself or even subthreshold PTSD symptoms may complicate SUD presentation [45] or treatment [46], the source PhAB protocol administered a dimensional probe of PTSD symptomatology using the PCL-5 [47]. The PCL-5 is a 20-item self-report questionnaire that assesses key DSM-5 criteria for PTSD. Items are rated on a 5-point Likert scale ranging from 0 to 4, and item ratings are summed to produce total scores ranging from 0 to 80, with higher scores reflecting greater symptom severity. Because most previous reports on PTSD-SUD comorbidity utilized clinical diagnostic criteria to obtain a binary (present/absent) diagnosis of PTSD, for harmonization with these studies, our analyses (described below) also featured a proxy dichotomization of subclinical vs. clinically significant PTSD in addition to a continuous PTSD dimensional score. Based on previous findings [48], persons with scores ≥33 tend to have moderate to severe PTSD symptoms and probably meet DSM-5 criteria for PTSD, whereas persons with scores <33 tend to exhibit either a low level of PTSD symptoms or no PTSD symptoms. Thus, to demarcate participants with clinically significant comorbid PTSD, we employed this empirically derived cutoff score of 33. Below a score of 33, the participant was classified with "absent to mild" PTSD symptoms. At or above 33, the participant was classified with "clinically significant" PTSD symptoms. This binary outcome was employed for supervised ML. The effect sizes of association between each predictor variable and the target variable (no significant PTSD symptoms vs. significant PTSD symptoms) are shown in Supplementary Figs. 1 and 2.
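The scoring and dichotomization rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical and are not taken from the PhAB software or the study code; only the item count, rating range, and cutoff come from the text.

```python
PCL5_ITEMS = 20    # number of PCL-5 items, as stated in the text
PCL5_CUTOFF = 33   # cutoff score used in the text


def pcl5_total(item_ratings):
    """Sum 20 item ratings (each 0 to 4) into a total score from 0 to 80."""
    if len(item_ratings) != PCL5_ITEMS:
        raise ValueError("expected 20 item ratings")
    if any(r not in range(5) for r in item_ratings):
        raise ValueError("each rating must be an integer from 0 to 4")
    return sum(item_ratings)


def pcl5_group(total):
    """Apply the binary outcome rule: at or above 33 vs. below 33."""
    if total >= PCL5_CUTOFF:
        return "clinically significant"
    return "absent to mild"


# Hypothetical respondent who rates every item as 2.
example_total = pcl5_total([2] * 20)
print(example_total, pcl5_group(example_total))
```

Both the continuous total and the binary label are kept, mirroring the use of a dimensional score for the regression models and a dichotomized score for the classification models.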

Statistical analysis
We used R version 4.0.3 and Salford Predictive Modeler (SPM) version 8 to perform all statistical analyses, and 0.05 was set as the level of significance for all relevant hypothesis tests. The scores from the PhAB battery described above comprised an initial 72 explanatory (predictor) variables. Due to technical errors and/or participant failures to comply with task instructions, complete and valid neurocognitive task data were available from 102 participants. Our primary analysis centered on the full sample (n = 160) for whom complete questionnaire data were available. Subsequently, we performed follow-up analyses using data from the 102-participant subset (Table 1) wherein the neurocognitive performance variables (Table 2) were added to the model with the questionnaire-based metrics (Table 2). We were, therefore, able to infer whether neurocognitive metrics emerged as appreciable (i.e., top 20) predictors of PTSD symptomatology and whether their inclusion improved the prediction of PTSD symptom status.
The ML algorithms employed in the study were TreeNet and Classification and Regression Trees (CART). TreeNet [49] is an empirical variable selection procedure that can be used to efficiently reduce the number of explanatory variables in a predictive model. TreeNet algorithms [50,51], also known as stochastic gradient boosting, offer several unique and useful features. These include (a) built-in estimation of prediction accuracy, (b) measures of feature importance, and (c) a measure of similarity between sample inputs. TreeNet improves upon classical decision trees such as CART [52] with retention of the most appealing properties of tree methods. The product of TreeNet is a ranked list of variable importance which is based on predictive models from an ensemble of weak learners in the form of classification trees. This methodology frequently outperforms Random Forest methods in prediction and variable selection [53]. "Boosting" is a method that seeks to convert weak learners into stronger ones. Here, a learner is an algorithm, and in the case of stochastic gradient boosting, that algorithm is CART; thus, boosting aims to improve on CART methods by creating multiple CART trees based on subsets of the data. The result is that TreeNet creates a final model in a gradual, additive, and sequential manner. All default options for SPM were used, and the algorithm was run with 80% of the data randomly selected for model training, and 20% of the data randomly selected for model testing. Based on the TreeNet results, the top 20 most important variables were used as predictor variables for the CART algorithm [54], which was used to predict the target outcome.
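To make the "gradual, additive, and sequential" idea concrete, the following is a minimal sketch of stochastic gradient boosting with squared-error loss and one-split trees, written in plain Python. It is not TreeNet or SPM: the function names, the hyperparameters (50 trees, learning rate 0.1, 50% subsampling), the gain-based importance measure, and the synthetic two-feature data are all assumptions made for illustration. Only the 80/20 train/test split and the idea of ranking variables by importance come from the text.

```python
import random


def fit_stump(X, residuals):
    """Return (gain, feature, threshold, left_mean, right_mean) for the
    one-split regression tree that best fits the residuals, or None."""
    mean = sum(residuals) / len(residuals)
    total_sse = sum((r - mean) ** 2 for r in residuals)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            gain = total_sse - sse  # reduction in squared error from this split
            if best is None or gain > best[0]:
                best = (gain, j, t, lm, rm)
    return best


def boost(X, y, n_trees=50, learning_rate=0.1, subsample=0.5, seed=0):
    """Fit stumps sequentially to residuals, each on a random subsample."""
    rng = random.Random(seed)
    n = len(y)
    base = sum(y) / n
    pred = [base] * n
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_trees):
        idx = rng.sample(range(n), max(2, int(subsample * n)))
        stump = fit_stump([X[i] for i in idx], [y[i] - pred[i] for i in idx])
        if stump is None:
            continue
        gain, j, t, lm, rm = stump
        importance[j] += gain  # accumulate split gain per feature
        stumps.append((j, t, lm, rm))
        for i in range(n):
            pred[i] += learning_rate * (lm if X[i][j] <= t else rm)
    return (base, stumps, learning_rate), importance


def predict(model, row):
    base, stumps, lr = model
    return base + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)


def rank_features(importance, k=20):
    """Indices of the k most important features, most important first."""
    order = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return order[:k]


# Synthetic data: the outcome depends only on feature 0; feature 1 is noise.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(100)]
y = [10.0 if row[0] > 0.5 else 0.0 for row in X]
order = list(range(100))
rng.shuffle(order)
train, test = order[:80], order[80:]  # 80% training, 20% testing

model, importance = boost([X[i] for i in train], [y[i] for i in train])
ranking = rank_features(importance)
test_mse = sum((y[i] - predict(model, X[i])) ** 2 for i in test) / len(test)
print("feature ranking:", ranking, "test MSE:", round(test_mse, 3))
```

In this toy setting the informative feature should rank first; in the study, the analogous ranking was used to pass the top 20 variables to CART.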
CART is a highly useful nonparametric method for building decision trees and predicting novel relationships between phenotypic predictors and biomedical outcomes because there is no requirement to select input variables based on theoretical importance. CART handles structurally complex datasets comprised of both categorical and continuous data with extreme robustness and limited vulnerability to outliers [49]. The three main components of CART involve creating a set of rules for splitting each node in a tree, deciding when a tree is fully grown, and assigning an outcome prediction to each terminal node of the tree [55]. Decision trees produce a clearly interpretable split at each node which is a binary response of some feature in the data set. The basic algorithm for building the decision tree seeks a data feature which maximizes the split between the classes contained in the parent node. CART is a recursive algorithm such that, once an appropriate split resulting in two child nodes is determined, the child nodes then become the new parent nodes, and the process is carried on down the branches of the tree. The CART tree is considered fully grown once a split cannot be identified that reduces model impurity. CART uses cross-validation techniques to determine the accuracy of the decision trees.
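The recursive splitting procedure described above can be illustrated with a short sketch. It assumes Gini impurity as the impurity measure, which is a common choice for classification trees but is not stated in the text, and it omits pruning and cross-validation. The function names, the depth limit, and the two toy features (a depression-like score and a fear-bias-like score) are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y):
    """Return (impurity_reduction, feature, threshold) for the split that
    most reduces impurity, or None if no split reduces it."""
    parent = gini(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            reduction = parent - child
            if reduction > 0 and (best is None or reduction > best[0]):
                best = (reduction, j, t)
    return best


def grow(X, y, depth=0, max_depth=3):
    """Recursively split until no split reduces impurity (or max_depth)."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        # Terminal node: predict the majority class.
        return {"predict": int(2 * sum(y) >= len(y)), "n": len(y)}
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": grow([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


# Toy data (hypothetical): column 0 resembles a depression score,
# column 1 a fear-bias score; label 1 marks significant symptoms.
X = [[45, 0.1], [50, 0.2], [62, 0.1], [65, 0.4], [70, 0.5], [48, 0.6]]
y = [0, 0, 1, 1, 1, 0]
tree = grow(X, y)
print(tree)
```

Each child node is treated as a new parent in the recursive call, and growth stops when `best_split` finds no impurity-reducing split, which corresponds to the "fully grown" criterion in the text.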
We first derived CART models based on a regression approach in which the target variable was the continuous score on the PCL-5. In a regression-based decision tree, the continuous target variable is split using recursive partitioning into a series of bins containing subsets of scores along the continuum. The algorithm does not attempt to predict the exact score on the target variable for each individual, but rather the bin into which that individual's target score belongs. The average target score within each bin is therefore used as the predicted target score for all individuals who are presumed to belong in the respective bins. The number of bins and boundaries between bins are determined based on quantitative cross-validation aimed at reducing error. For instance, a small number of large bins typically diminishes precision, whereas a large number of small bins increases precision, but can sometimes result in overfitting and insufficient generalizability. We evaluated the accuracy of regression-based models using goodness-of-fit, or R².
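As a small worked example of the bin-mean prediction and the R² criterion described above, the sketch below assigns each individual the mean score of their bin and compares a coarse and a fine binning. The scores and bin assignments are invented for illustration, the function names are hypothetical, and the R² here is computed in-sample only (no cross-validation).

```python
def bin_mean_predictions(scores, bin_ids):
    """Predict each score as the mean of the scores in the same bin."""
    sums, counts = {}, {}
    for s, b in zip(scores, bin_ids):
        sums[b] = sums.get(b, 0.0) + s
        counts[b] = counts.get(b, 0) + 1
    means = {b: sums[b] / counts[b] for b in sums}
    return [means[b] for b in bin_ids]


def r_squared(observed, predicted):
    """Goodness-of-fit: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot


# Hypothetical PCL-5 totals for six individuals.
scores = [10, 14, 30, 34, 60, 64]

coarse_bins = [0, 0, 0, 1, 1, 1]  # two large terminal nodes
fine_bins = [0, 0, 1, 1, 2, 2]    # three smaller terminal nodes

r2_coarse = r_squared(scores, bin_mean_predictions(scores, coarse_bins))
r2_fine = r_squared(scores, bin_mean_predictions(scores, fine_bins))
print(round(r2_coarse, 3), round(r2_fine, 3))
```

The finer binning fits these six scores more closely, which is the precision gain noted in the text; whether that gain generalizes is what the held-out testing data are meant to check.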
We subsequently derived classification-based CART models, which attempted to classify participants with subclinical levels of PTSD symptoms vs. those with clinically significant levels of PTSD symptoms. As such, the outcome categories were set using the PCL-5 trauma score split into a binary variable based on the empirically derived cutoff at 33. Model performance was examined using confusion matrices, sensitivity and specificity, the F1-score (harmonic mean of precision and recall), as well as area under the receiver operating characteristic (ROC) curve. Based on established benchmarks for the predictive accuracy of psychological tests within the literature [56], we deemed 70% accuracy to reflect acceptable performance, 80% accuracy to reflect good performance, and 90% accuracy to reflect excellent performance.
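The performance measures named above can be derived from the four cells of a confusion matrix, as in the following sketch. The function names are hypothetical, and the counts are illustrative only: they were chosen so that the resulting rates are consistent with the full-sample classification rates quoted in the abstract, but they are not taken from the study's supplementary confusion matrices. The sketch assumes every denominator is nonzero.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity, precision, and F1
    from confusion-matrix counts (assumes nonzero denominators)."""
    sensitivity = tp / (tp + fn)  # recall: true positives among actual positives
    specificity = tn / (tn + fp)  # true negatives among actual negatives
    precision = tp / (tp + fp)    # true positives among predicted positives
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def benchmark(accuracy):
    """Label accuracy using the 70% / 80% / 90% benchmarks from the text."""
    if accuracy >= 0.90:
        return "excellent"
    if accuracy >= 0.80:
        return "good"
    if accuracy >= 0.70:
        return "acceptable"
    return "below acceptable"


# Illustrative counts (not from the study's tables).
metrics = classification_metrics(tp=8, fn=2, tn=13, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
print("benchmark:", benchmark(metrics["accuracy"]))
```

Reporting F1 alongside sensitivity and specificity is useful here because the two outcome groups need not be equally sized, in which case accuracy alone can be misleading.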

Full sample analysis
Regression. The goodness-of-fit values for the TreeNet training and testing datasets were 0.88 and 0.77, respectively, thus indicating an acceptable degree of explained variance. The 20 most important predictor variables identified by the TreeNet algorithm are shown in Table 3, and the resulting CART algorithm produced a tree containing five primary parent (i.e., splitting) nodes and six child (i.e., terminal) nodes (Fig. 1).

[...] 3 and 4). Areas under the ROC curves for both training and testing data (Supplementary Fig. 3 [...] 5 and 6).

Predictive performance, as evidenced by areas under the receiver operating characteristic curves, was 0.98 for the training data and 0.78 for the testing data (Supplementary Fig. 5). Although results indicated model overfitting for the training data, model accuracy was high enough to warrant further analysis. The 20 most important variables identified by the TreeNet algorithm, which included some neurocognitive assessments, are shown in Table 6. With the top 20 variables (self-report plus cognitive performance) entered as predictors, accuracies for the CART training and testing algorithms were 89.66% (Sensitivity = 100%; Specificity = 79.31%; F1 = 82.86%) and 86.67% (Sensitivity = 66.67%; Specificity = 75.00%; F1 = 41.39%) (confusion matrices shown in Supplementary Tables 7 and 8). Predictive accuracy, as reflected in the area under the curve, was 0.93 for the training data and 0.81 for the testing data (Supplementary Fig. 6). These results indicate good to excellent predictive accuracy. The decision tree derived in this CART analysis is displayed in Fig. 4. The only variables that emerged as splitting nodes in this CART algorithm were PROMIS Depression and the Fear Effect from the Emotional Go-NoGo Task (in order of importance).

DISCUSSION
The multi-dimensional assessment battery and ML analytics reveal significant insights into neurobehavioral factors associated with comorbid PTSD symptoms among patients with SUD. For participants with only self-report assessment data, the prediction algorithm failed to accurately predict a dimensional PCL-5 score, but did accurately classify individuals as having significant comorbid PTSD symptoms (or not), based on symptoms of anxiety and depression as well as perseverative/intrusive thought patterns, low tolerance of aversive body sensations, and reduced ability to focus on body sensations. In participants who also completed neurocognitive assessments, continuous PTSD symptom severity and significant comorbid PTSD symptoms were predicted with impressive accuracy. Many of the same predictors that contributed to regression algorithms were also included in the classification algorithms, but the neurocognitive subsample classification algorithm uniquely included increased response bias toward fear-related social information evaluated with the Emotional Go-NoGo Task.

Our results support previous research linking comorbid SUD and PTSD to increased negative affect and cognitive biases. Elevated negative affect is a key feature of PTSD and is strongly associated with SUD; thus, our observation that comorbid presentations are associated with increased anxiety and depression is logical. The current study also suggests that PTSD symptomatology in SUD patients is associated with a general deficit in regulating aversive internal states, whether cognitive (e.g., worrisome thoughts), affective (e.g., distress), or interoceptive (e.g., aversive body sensations) in nature. Indeed, this notion is consistent with observations that anxiety sensitivity and sensitivity to punishment are shared vulnerability factors [21,23,32]. Anxiety sensitivity reflects the propensity to react negatively or fearfully to anxiety-related sensations, thoughts, emotions, or environmental stimuli and has been linked to PTSD [57], SUD [58], comorbid SUD-PTSD [59], and increased responsivity to the fear-dampening effects of alcohol [60,61]. Further, there is indeed robust evidence of increased attentional bias toward social threat in anxiety-related disorders [62][63][64] and PTSD [65]. Exaggerated attentional bias toward negative information in anxiety disorders is thought to stem from an imbalance between an exaggerated bottom-up processing of threat or insufficient top-down regulation of threat response by executive control neurocircuitry [66]; this bias is thought to alter subsequent steps of cognitive processing, such as promoting rumination [62]. Consequently, several clinical trials have attempted to reduce attentional bias to threat directly, wherein the patient is trained over multiple trials to disengage from threat stimuli such as by directing gaze to alternative stimuli [67]. Similar bias modification techniques have also targeted attentional capture by substances of abuse [68], including more "gamified" approaches to reduce monotony [69]. These findings collectively lead to the hypothesis that top-down regulation of attention may be bolstered by repeated bias training for one modality (e.g., threat) which may aid in reducing attentional capture by another trigger (e.g., drug-related stimuli).
This study supports existing findings and offers new insights into the relationship between SUD and PTSD. We anticipated that the addition of neurocognitive task scores to predict PTSD symptomatology would improve model performance and that scores from neurocognitive tasks would be utilized as splitting nodes in decision trees. This expectation stems from the fact that the constructs measured by neurocognitive assessments, such as response inhibition, tend to be strongly associated with many mental illnesses [70,71]. However, neurocognitive tasks may also be affected by transient states of fatigue or mental distraction, or conversely prone to uncharacteristically high vigilance summoned under artificial conditions. Self-report questionnaires, with the generally longer time span and real-life related contexts specified in question items, are thought to be more stable than performance on "one-shot" computerized cognitive tasks [72]. As such, it is possible that the presence of PTSD symptoms among SUD patients does not impede cognitive abilities in all contexts, but rather is associated with a subtle shift in cognition that can be functionally important and noticeable over time but is difficult to detect via computerized tests. Another noteworthy feature of our regression-based models was the indication that some constructs may be associated specifically with the extremes of post-traumatic stress symptom severity. For example, aggression-related constructs (from the Buss-Perry Aggression Scale) were only featured in regression algorithms as differentiating very high scores from moderately high scores on the PCL-5. These data may indicate that aggression is only a notable feature in SUD patients with severe PTSD symptoms. Accordingly, perhaps intervention strategies aimed at persons with SUD and mild-to-moderate comorbid PTSD should not be aimed at anger management or social functioning, whereas these may be important components of an intervention for severe comorbid PTSD.
In spite of these insights, the results should be considered in light of some limitations. First, because the study did not include a third group of participants with heightened PTSD symptoms alone (no SUD), we cannot conclude that the traits/risk factors identified truly differentiate comorbid SUD-PTSD from each disorder alone. The predictors highlighted in our results may be reflective of the increased overall disorder burden of comorbid conditions rather than SUD or PTSD alone [73]. Therefore, future studies will benefit from approaches that identify which psychopathological mechanisms differentiate comorbid SUD-PTSD from each disorder on its own. We also did not collect data on comorbid psychiatric diagnoses, which could have influenced study results and could be highly relevant for personalized treatment approaches. Additionally, while the inclusion of persons across multiple types of SUD can enable our predictive variables to serve as potential risk markers that cut across different types of SUDs, our results could presumably vary depending upon the primary substance of abuse. Indeed, individuals with more significant PTSD symptoms related to the hyperarousal symptom cluster were more likely to abuse alcohol, while those with more avoidance and re-experiencing symptoms were more likely to abuse cocaine [74].
Future studies are therefore needed to compare data across multiple samples of patients who abuse different substances, including substance use disorders not examined in the current study (e.g., alcohol, methamphetamine) and polysubstance use disorders. Relatedly, heterogeneity among symptoms of PTSD may have impacted our findings, such that important subtypes of post-traumatic stress may be associated with unique phenotypic signatures. Multiple subtypes of PTSD have been proposed, including externalizing vs. internalizing subtypes as well as a dissociating subtype [75]. It is therefore possible that participants with comorbid PTSD symptoms were affected by distinct clusters of symptoms, which do not translate well into linear models of disorder severity, and thus are not easily predicted by our ML approaches. While our predictive algorithms achieved relatively high accuracy, there is a clear need for future studies to deliver greater predictive accuracy by either increasing sample size or including additional biological predictors, particularly neurophysiological variables such as peripheral biomarkers, neuroimaging metrics, and genetic information. Nevertheless, the performance of our models supports the involvement of a combination of self-report traits and neurocognitive abilities in the underlying etiology of comorbid SUD and PTSD. Our results from a machine-learning-based approach largely align with previous work, yet also provide an efficient integration of works investigating the role of a single risk factor or etiological feature. Future studies are planned to replicate these results and determine whether targeted treatments aimed at the identified psychopathological processes are associated with improved care for this high-need population.

Table 1. Demographic characteristics of participants.

Table 2 (continued).
Mean interval in ms between "go" and subsequent "stop" signal that fostered a 50% successful rate of stopping
Stop signal reaction time (SSRT): Time required to stop an initiated go process (smaller numbers reflect better inhibition)
Mean reaction time in stop trials (SR_RT): Response times of commission errors in stop trials
Mean reaction time in go trials (NS_RT): Response times during correct "go" responses
Hit percentage in go trials (NS_HIT): Percent correct indication of direction (L/R) of target arrow
Miss percentage (NS_MISS): Percentage of "go" trials to which there was no response

Table 3. Top 20 predictors of significant PTSD symptoms in TreeNet regression analysis (full sample).
In the full-sample regression CART model, variables used as splitting nodes, in order of importance, were the PROMIS Anxiety, Metacognitions Questionnaire-Uncontrollability/Danger, Buss-Perry Total Score, Buss-Perry Anger, and SUPPS Positive Urgency. Unfortunately, this model failed to demonstrate adequate goodness-of-fit, with R² values in the training and testing datasets at 0.59 and 0.42, respectively.

Table 4. Top 20 predictors of significant PTSD symptoms in TreeNet classification analysis (full sample).

Neurocognitive sub-sample analysis
Regression. The goodness-of-fit for the TreeNet training and testing datasets were 0.69 and 0.64, respectively, thus indicating slightly sub-optimal model performance. The twenty most important predictor variables identified by the TreeNet algorithm are shown in Table 5. Results demonstrated that some neurocognitive task metrics supplanted self-report scales as predictors of trauma symptomatology, namely orienting and conflict effects in the Attention Network Task and happy and fear effects in the Emotional Go-NoGo Task (see Supplemental Material for detailed descriptions of neurocognitive task metrics). The subsequent CART algorithm produced a tree containing six primary splitting nodes and seven terminal nodes (Fig. 3). Variables used as splitting nodes, in order of importance, included PROMIS Anxiety, Metacognitions Questionnaire-Uncontrollability/Danger, Buss-Perry Hostility, Distress Tolerance Scale-Absorption, and Multidimensional Assessment of Interoceptive Awareness-Not Distracting. No neurocognitive assessments emerged as splitting nodes. This model demonstrated potentially adequate goodness-of-fit, with R² values in the training and testing datasets at 0.72 and 0.64, respectively.

Table 5. Top 20 predictors of significant PTSD symptoms in TreeNet regression analysis (neurocognitive sub-sample).

Table 6. Top 20 predictors of significant PTSD symptoms in TreeNet classification analysis (neurocognitive sub-sample).