Multi-Domain Potential Biomarkers for Post Traumatic Stress Disorder (PTSD) Severity in Recent Trauma Survivors

Importance. Uncovering objective correlates of PTSD severity may improve early case detection and treatment decisions.
Objective. To test the ability of an innovative analytic approach to select a set of multimodal biomarkers that efficiently differentiates PTSD subtypes shortly after a traumatic event.
Design. Observational cohort study of general hospital emergency department (ED) admissions for traumatic events seen between 2014 and 2018. A three-staged semi-unsupervised computational method (alias “3C”) was used to categorize trauma survivors based on current PTSD diagnostics, to derive clusters of self-reported depression and anxiety symptoms related to these categories, and to predict and classify participants’ cluster membership using concurrently collected neurocognitive and neuroimaging data. 256 features were extracted from psychometric, neurocognitive and neuroimaging (structural and functional) data obtained within a month of trauma exposure.
Setting. Information on consecutive ED trauma admissions was used to initiate telephone screening interviews, followed, in eligible survivors, by clinical, neurocognitive and brain imaging assessments.
Participants. 101 adult survivors of traumatic events (78% motor-vehicle accidents, 51 females, average age = 34.80; age range = 18-65).
Main Outcomes and Measures. Objective features (alias “potential biomarkers”) that best differentiate cluster membership were derived from this set. Importance analysis, classification trees, and one-way ANOVA were used to test the potential biomarkers’ contributions.
Results. Entorhinal and rostral anterior cingulate cortical volumes; executive function and cognitive flexibility; and the amygdala’s functional connectivity with the insula and thalamus best differentiated between the two participant severity clusters. Cross-validation established the results’ robustness and consistency within this sample.
Conclusions and Relevance. This work demonstrates the ability of multi-domain analytic assessment using standardized and objectively measured neuro-behavioral features to differentiate PTSD subgroups in the early aftermath of traumatic events. Differentiating features revealed by this work are consistent with previously reported neurobehavioral PTSD attributes.
Trial Registration. Neurobehavioral Moderators of Post-traumatic Disease Trajectories. ClinicalTrials.gov registration number: NCT03756545. https://clinicaltrials.gov/ct2/show/NCT03756545
Key Points
Question. Can we computationally derive objective classifiers of post-traumatic stress disorder (PTSD) shortly after a traumatic event?
Findings. 256 features were extracted from psychometric, neurocognitive and neuroimaging data obtained from 101 recent trauma survivors. A semi-unsupervised computational method (3C) successfully categorized cases based on PTSD diagnostics, derived clusters of severe and mild PTSD, and revealed neurocognitive and neuroimaging features that efficiently classified patients’ status.
Meaning. Biomarkers revealed by the 3C offer objective classifiers of post-traumatic psychiatric morbidity shortly after a traumatic event. They also map onto previously documented neurobehavioral PTSD features, and thus support the future use of objective measurements to more precisely identify post-traumatic psychopathology.


Introduction
Post-traumatic stress disorder (PTSD) symptoms are commonly observed shortly after exposure to trauma, and their initial extent has been associated with a high risk of poor recovery [1-5]. PTSD categorization is currently based on reported symptoms, as captured by structured interviews such as the Clinician-Administered PTSD Scale (CAPS).
While showing good reliability and good construct and predictive validity, the CAPS also has notable limitations. First, it does not capture symptoms often co-expressed with PTSD, such as depression and general anxiety [6]. Second, it is only weakly linked with neurocognitive measures and other objective measurements of putative biological mechanisms [7-9]. Third, by solely accounting for reported or observed symptoms, the CAPS relies entirely on subjective, interpersonal reporting. These limitations may be responsible for the instability of CAPS diagnoses across time [10] and for their suboptimal guidance of individualized clinical management [11].
Cognitive functioning is one of many dimensions often overlooked in this context. Studies to date have reported associations between PTSD and cognitive deficits (for recent meta-analyses see [12,13]), including deficits in working memory, information processing speed, verbal learning, short-term and declarative memory [14,15], attention and executive functioning [16,17], and response inhibition and attentional switching [18-23]. Conversely, adaptive cognitive functioning has been linked with resilience and with a reduced likelihood of PTSD symptom development and maintenance [13,24]. Other relevant domains underlying PTSD clinical manifestations are brain structure and function. Accumulating neuroimaging evidence points to the presence of structural and functional brain abnormalities in PTSD patients [7,25-28], such as lower hippocampal volume [29-32] and altered activity and connectivity of the amygdala, insula, ACC, mPFC and dlPFC, structures involved in threat detection, executive function, emotion regulation, and contextual processing [7,25,33-37]. Importantly, some structural and functional neuroimaging studies also point to early predisposing factors for the development of PTSD [25,34,38-40]. Despite promising findings from neurocognitive and neuroimaging studies, these objective measures have not been integrated into the routine assessment or management of PTSD. One obstacle to such translation is the lack of clear guidance as to how these objective indicators actually cluster, inform PTSD subtypes, and better guide eventual interventions.
One way to overcome this translational gap is to use advanced data-driven computational and statistical methods that can evaluate a wider array of potential biomarkers and disorder indicators and quantify their relationship with clinical manifestations. Machine learning methods are particularly well suited to these computational challenges because they can take the complex interrelations of many relevant factors into account [41]. Indeed, in the last decade there has been an exponential increase in machine learning approaches in the field of posttraumatic stress, including both supervised and unsupervised approaches. Supervised analytic approaches make assumptions and are therefore limited by how accurately those assumptions are informed by prior knowledge [42]. Unsupervised approaches, on the other hand, make few or no assumptions, but are limited in that the subpopulations revealed by the analysis are not tied to specific questions of interest [42].
A possible solution is to use hybrid analytic methods, which combine supervised and unsupervised approaches and may help advance more precise diagnosis [8] while identifying novel combinations of potential biomarkers of specific disorders [42]. This study applied the recently developed three-staged hybrid analytic methodology termed 3C (Categorize, Cluster, and Classify), which is semi-unsupervised in that it combines theory- and data-driven approaches [43]. The analysis included both clinical assumptions and state-of-the-art data-driven analysis of objective measures obtained from neuroimaging and cognitive testing of recent trauma survivors. We assumed that such a hybrid analytic approach could unveil a unique set of mechanism-related potential biomarkers for PTSD subtypes that would nonetheless be closely tied to the existing diagnosis. The term "subtypes" can account for different demographics, clinical sub-scales or symptom severity of the disorder.

Methods
We used the 3C procedure on a multi-domain dataset composed of clinical interviews and questionnaires, computerized cognitive testing, and neuroimaging of brain structure and function, obtained from recent trauma survivors who were discharged from a general hospital's emergency room (ER) after experiencing a traumatic event and examined in our lab within one month of the incident.
Participants. Participants were 101 trauma survivors (age=34.80±11.95, range 18-65, 51 females) admitted to the Tel Aviv Sourasky Medical Center's ER following a traumatic experience. The most common trauma type among these individuals was motor-vehicle accidents (n=79, 78%). Those who met a full PTSD symptom diagnosis at the face-to-face clinical interview were defined as "High-Symptomatic PTSD" (HiPTSD) (n=58), and those who did not meet this criterion, but still suffered from a variety of PTSD symptoms, were defined as "Low-Symptomatic PTSD" (LoPTSD) (n=43). The participants were part of a larger prospective longitudinal study. For more details see eMethods.
Clinical Assessment. PTSD symptom severity was quantified using the Clinician-

Procedure. See eMethods for details.
Statistical Approach. We used the 3C approach [43]. The 3C method assumes that existing medical knowledge of a given disorder is critical, yet not sufficient for an accurate diagnosis. It builds upon the current diagnostic approach and expands it with an unsupervised data-driven approach, dividing patients into homogeneous clusters based on common characteristics. Importantly, those clusters make it possible to explore the relevance of objective multi-domain measures that are related to the specific disorder subtype. For example, a recent study using the 3C method [59] discovered new sub-phenotypic groups in a large Alzheimer's disease dataset ("ADNI" [60]).
The Clustering stage consists of unsupervised clustering of the selected clinical measurements using k-medoids with the Manhattan distance metric. This allowed us to discover data-driven homogeneous clusters that are based on the existing diagnostics (i.e., dividing participants into subtypes based on common characteristics). Prior to clustering, we determined the optimal number of clusters based on two metrics: the Gap statistic [61] and the Silhouette [62]. The Classification stage includes characterization of the clusters by the objective potential biomarkers, which is performed using three different approaches: importance analysis (mean decrease Gini [63]); classification trees; and one-way analysis of variance (ANOVA) between the clusters.
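The clustering step described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes an exhaustive search over medoid sets (feasible only for toy data, where an iterative algorithm such as PAM would normally be used), it selects the number of clusters with the silhouette only (the Gap statistic is omitted), and the data points and function names are hypothetical.

```python
from itertools import combinations

def manhattan(a, b):
    """L1 (city-block) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def k_medoids(points, k):
    """Exhaustive k-medoids: try every set of k points as medoids and keep
    the set with the smallest total distance. Toy-sized data only."""
    best_cost, best_medoids = float("inf"), None
    for medoids in combinations(range(len(points)), k):
        cost = sum(min(manhattan(p, points[m]) for m in medoids) for p in points)
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    # Assign every point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda c: manhattan(p, points[best_medoids[c]]))
              for p in points]
    return best_medoids, labels

def mean_silhouette(points, labels):
    """Average silhouette width; higher values indicate better separation."""
    scores = []
    for i, p in enumerate(points):
        same = [manhattan(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(manhattan(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical standardized scores on four clinical measures (toy values only).
toy = [(1.2, 0.9, 1.1, 1.0), (0.8, 1.1, 0.9, 1.3), (1.0, 1.2, 1.4, 0.7),
       (-1.1, -0.9, -1.0, -1.2), (-0.7, -1.3, -0.8, -1.0), (-1.0, -1.0, -1.2, -0.9)]

for k in (2, 3):
    medoids, labels = k_medoids(toy, k)
    print(k, labels, round(mean_silhouette(toy, labels), 3))
```

On these toy values the two-cluster solution has the higher mean silhouette, which is the kind of comparison used to choose the number of clusters.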

Results
In the Categorization stage, the features were divided into the three distinct categories required by the 3C methodology: the Assigned Diagnosis was based on the CAPS-4 total scores; the Clinical Measurements included the total scores of the four self-report questionnaires (PCL, BDI, BAI, and CGI); and the Potential Biomarkers included 11 standardized total scores obtained from computerized cognitive testing, 192 features from structural imaging, including volumes and thickness of subcortical and cortical areas, and 48 features extracted from fMRI during the emotional faces matching task, including whole-brain activations and functional connectivity from limbic areas.
In the Clustering stage, unsupervised clustering was applied to the Clinical Measurements that were correlated with PTSD symptom severity as indicated by CAPS-4 total scores: PCL, BDI, BAI, and CGI (based on the ad-hoc threshold of FDR-adjusted p<0.2). Participants were divided into two clusters, the optimal number according to the Gap statistic and the Silhouette, which had the best separation on all four clinical measurements (PCL, BDI, BAI, and CGI). Participants belonging to the first cluster showed, on average, higher severity on all four clinical measurements than those belonging to the second cluster (see Fig. 1). To support the link between these clusters (representing high and low "disease load") and PTSD diagnosis (PTSD or not according to CAPS-4), a two-sample test for equality of PTSD proportions between the two clusters was conducted. Results revealed a significant link between the clusters and the dichotomous PTSD diagnosis (F(1,99)=35.47, p<0.001; χ²(1)=20.911, p<0.001; CI of the difference of proportions = (0.28, 0.63)) (see Table 1). Furthermore, a one-way ANOVA was performed on the CAPS-4 total scores (a continuous PTSD symptom severity measure) with the clusters as the grouping variable. Results indicated that individuals in cluster 2 had significantly higher CAPS-4 total scores (M=61.91±18.16) than those in cluster 1 (M=37.45±23.12) (p<0.001) (see eFig. 1). It is important to note that these clusters do not represent a direct division into high and low CAPS severity scores, but rather a data-driven division of participants into severity subtypes, based on self-reported symptoms of depression (BDI), anxiety (BAI), and post-trauma (PCL), and on patients' subjective global impression (CGI). As mentioned above, those subtypes (clusters) were found to correlate with the PTSD clinical diagnosis and severity, but were not identical to it, and rather represent "disease load" clusters.
Accordingly, from now on cluster 1 will be referred to as the "Low-Symptomatic" Cluster (LoClus, low PTSD severity), whereas cluster 2 will be referred to as the "High-Symptomatic" Cluster (HiClus, high PTSD severity).
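As an illustration of the two-sample test for equality of proportions used above, the following sketch computes a chi-square statistic with one degree of freedom, its p-value, and a confidence interval for the difference of proportions. It assumes the uncorrected (no continuity correction) normal-approximation form of the test, which may differ from the exact variant used in the study, and the counts are hypothetical rather than those of Table 1.

```python
import math

def two_proportion_test(hits_a, n_a, hits_b, n_b, z_crit=1.96):
    """Uncorrected two-sample test for equality of proportions.

    Returns the chi-square statistic (1 df), its p-value, and a Wald
    confidence interval for the difference p_a - p_b (95% by default).
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se_pooled
    chi2 = z * z  # a squared standard normal is chi-square with 1 df
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return chi2, p_value, (diff - z_crit * se_diff, diff + z_crit * se_diff)

# Hypothetical counts: diagnosed participants out of each cluster's size.
chi2, p_value, ci = two_proportion_test(30, 40, 15, 50)
print(round(chi2, 3), p_value < 0.001, tuple(round(x, 2) for x in ci))
```

A confidence interval that excludes zero, as in this toy case, indicates that the proportion of diagnosed participants differs between the two clusters.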

In the Classification stage, we classified the clusters based on objective variables that could serve as potential biomarkers, using the mean decrease Gini measure (i.e., an importance index) for each biomarker. The most important potential biomarkers for the clustering division included left entorhinal cortex (EC) volume (Importance=0.884), cognitive flexibility (Importance=0.487), rostral anterior cingulate cortex (rACC) volume (Importance=0.429), and average amygdala functional connectivity with the thalamus while watching angry faces (Importance=0.419). The top ten most important potential biomarkers for the clustering division are presented in both Figure 2 and eTable 1.
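The quantity underlying the mean decrease Gini index can be sketched as follows. This is a simplified illustration, not the study's analysis: it computes the Gini impurity decrease of a single best threshold split for one feature, whereas the importance index reported above aggregates such decreases over the splits and trees of an ensemble. The feature values and helper names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting at `values <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

def best_split_decrease(values, labels):
    """Largest impurity decrease over all candidate thresholds."""
    cuts = sorted(set(values))[:-1]  # splitting at the maximum leaves one side empty
    return max(gini_decrease(values, labels, t) for t in cuts)

# Hypothetical cluster labels and two toy features.
clusters = ["Hi", "Hi", "Hi", "Lo", "Lo", "Lo"]
separating = [1.0, 1.2, 1.1, 2.0, 2.2, 2.1]   # separates the clusters cleanly
overlapping = [1.0, 2.0, 1.5, 1.2, 2.1, 1.4]  # overlaps between the clusters

print(round(best_split_decrease(separating, clusters), 3))
print(round(best_split_decrease(overlapping, clusters), 3))
```

A feature that separates the clusters cleanly yields a larger impurity decrease than one whose values overlap, which is why it would rank higher in an importance analysis.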
To further characterize the patients included in each cluster according to the identified potential biomarkers, we built classification trees (see Fig. 3). Results determined that the left hemisphere EC volume had the greatest importance for the clustering, indicating that subjects with lower EC volume were more likely to belong to the HiClus. The next two splits of the classification tree divided individuals based on their executive function score and left hemisphere caudal middle frontal gyrus volume. For a further description of the classification tree, see eResults. Finally, a one-way ANOVA was conducted on the potential biomarkers with HiClus/LoClus as the grouping variable. After Benjamini-Hochberg (BH) adjustment, there was no significant difference in any potential biomarker between HiClus and LoClus at the 0.05 level (partly because hundreds of p-values were adjusted).
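To see why adjusting hundreds of p-values can remove significance, consider a minimal sketch of the Benjamini-Hochberg step-up adjustment. The p-values below are hypothetical and chosen only to show the effect of the number of tests; the function name is an assumption, not part of the study's code.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order.

    Each sorted p-value is scaled by m / rank, and monotonicity is enforced
    by taking a running minimum from the largest p-value downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical case: one small p-value among 200 otherwise unremarkable tests.
raw = [0.001] + [0.5] * 199
adjusted = benjamini_hochberg(raw)
print(round(adjusted[0], 3))             # the small p-value after adjustment
print(benjamini_hochberg([0.001])[0])    # the same p-value tested alone
```

In this toy case a raw p-value of 0.001 no longer falls below 0.05 once it is adjusted alongside 199 other tests, whereas it is unchanged when tested alone.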
Lastly, to assess the generalizability of our results, a cross-validation test was conducted. In each iteration, the 3C method was performed on a randomly chosen P percent of the total subjects. A random forest method was then used to classify the remaining (100-P) percent of subjects into the two clusters. This procedure was repeated 1000 times for each P.
The results are presented in Table 2, which depicts the mean and SD of the percentage of subjects that were correctly classified (according to the results of the 3C methodology based on all of the subjects). For example, when the 3C algorithm was based on only 20% of the subjects, on average 83% of the remaining 80% of subjects were correctly classified. This analysis supports the robustness and consistency of the 3C results within this sample.

Discussion
This work illustrates the usefulness of a novel 3-staged (alias 3C) data analytic approach in classifying recent trauma survivors using a set of potential biomarkers. The 3C 'hybrid' data- and theory-driven analytic approach combined current diagnostic-based categorization, symptom severity-based clustering and data-driven retrieval of classifying potential biomarkers. Unlike extant supervised machine learning models, which 'flatten' information sources and dimensions into one data matrix, the 3C method uniquely and hierarchically combines current PTSD diagnostics with two layers of data-driven exploration: clinical symptom severity and concurrently recorded objective measurement of neurocognitive functioning, brain structure and function. The differentiating biomarkers revealed by the 3C approach are in line with previously documented neural and cognitive correlates of PTSD. They therefore provide a template for early, objective, mechanism-based categorization of trauma survivors' psychopathology.

Potential biomarkers for PTSD
The potential biomarkers for PTSD severity included features obtained from both neuroimaging and cognitive testing. Within the structural brain domain, the feature that had the most influence on the PTSD class division was the volume of the entorhinal cortex (according to both importance analysis and classification trees, see Fig. 2 and 3), such that lower entorhinal cortex volume was associated with higher PTSD severity. The entorhinal cortex plays an important role in memory, a key feature of post-traumatic psychopathology, as the uncontrolled memory recall of the traumatic event determines symptom severity [64-66]. Another structural feature found to be of importance to the classification was the volume of the rostral anterior cingulate cortex (rACC), such that lower rACC volume was associated with higher PTSD severity. Indeed, rACC volume was previously associated with PTSD [34,67,68]. Importantly, it was also shown to predict response to cognitive behavioral treatment in individuals with PTSD [69], suggesting its potential for guiding mechanism-based early intervention shortly after trauma.
Notably, our classifier did not identify several structural abnormalities found in previous cross-sectional studies of PTSD [7,25,34,38]. These include the most replicated finding of a small hippocampus [70], but also abnormal volume of the amygdala, insular cortex, and medial and dorsal prefrontal cortices (mPFC and dlPFC, respectively) [28,31,33,71-73]. This could stem from our classifier identifying early-stage severity-related (or "disease load") biomarkers, rather than DSM-based categories of PTSD diagnosis. Our approach identified structural abnormalities within one month after trauma as related to symptom severity; since major changes in gray matter volume are less likely to occur within this time frame [74], these abnormalities might be early predisposing risk factors for chronic PTSD development. Future studies with longitudinal measurements in populations prone to trauma exposure could shed more light on the causal interpretation of our findings. It could also be that the other structural abnormalities mark the long-term trajectory of the disorder rather than its early-stage status [75-77].
Within the functional neuroimaging domain, amygdala functional connectivity with both the insula and the thalamus was found to be particularly important for the classification (according to both importance analysis and classification trees, see Fig. 2 and 3). Hyper-connectivity of the amygdala with other structures is consistent with previous studies [78-80], and abnormal amygdala activation has been hypothesized to contribute to PTSD pathophysiology (refs). Moreover, thalamic dysfunction has been found in patients with PTSD, suggesting its role in the psychopathology of the disorder [81,82]. In sum, although neuroimaging studies have pointed to several functional brain abnormalities as potentially playing a role in the pathophysiology of PTSD, our computational analysis showed that some of these abnormalities are involved in PTSD severity subtypes already in the early aftermath of trauma.
Within the cognitive domain, the most important differentiating factors were cognitive flexibility (according to importance analysis, see Fig. 2) and executive function (according to classification trees, see Fig. 3). Indeed, meta-analyses regarding the role of cognitive functions in PTSD have consistently shown impaired executive functioning, including cognitive flexibility, among PTSD patients [16,17]. Cognitive flexibility, the ability to switch between two different tasks or strategies [16], is of particular interest, as it has recently been found to correlate with PTSD trajectory in two independent longitudinal cohorts [24]. Specifically, lower cognitive flexibility scores correlated with more PTSD symptoms at one month after trauma and predicted persistent PTSD a year later. Moreover, the disturbance in cognitive flexibility was ameliorated by a cognitive intervention and associated with better treatment outcomes [24], implying its role as an early recovery-related process. One possible mechanism is that intact cognitive flexibility enables the individual to better differentiate between threat-related and neutral situations, and thus assists in the extinction of fear-motivated learning, a core element in PTSD recovery [83].

Methodological considerations
Our analytic 3C approach revealed two PTSD subtypes (classes) correlated with high and low clinical severity, according to both total CAPS score and all CAPS subscales (re-experiencing, avoidance, negative alterations in cognitions and mood, and alterations in arousal and reactivity). Nevertheless, our analysis did not find classes representing different clinical subtypes (e.g., dissociative subtype, greater avoidance, etc.). This could be accounted for by the fact that the 3C clustering stage is based on a given set of clinical measurements, and the classification stage is based on predetermined measures of potential biomarkers. It is possible that a larger dataset (i.e., more measures and more participants) would allow the identification of unique clinical classes that correspond to different symptomatic subtypes (possibly also more than two classes).
Indeed, in the effort to account for the heterogeneity in the expression of PTSD, several works have tried to characterize different subtypes of the disorder. One example is an externalizing type characterized by low constraint and high negative emotionality, compared to an internalizing cluster with high negative emotionality and low positive emotionality [84-86]. Another example is a dissociative subtype for patients with PTSD and de-personalization and/or de-realization symptoms, which was introduced by the fifth edition of the DSM [87-89]. It is worth noting that there is no definitive ad-hoc threshold for the selection of clinical measurements, and that the choice of measures strongly affects the clusters created and the derived classification of potential biomarkers. Lastly, adding longitudinal measurements from different time-points after trauma may assist in revealing classes corresponding to PTSD clinical trajectories. This may be crucial for identifying individuals who are at risk of developing PTSD, as well as for providing them with appropriate early-stage treatment.
In conclusion, our study implemented an innovative computational approach that unveiled novel variables that were correlated with the morbidity classification of recent trauma survivors and linked to mechanisms that generate PTSD symptoms. The method utilizes the current DSM-based PTSD diagnostic categories and other clinical severity measures of depression and anxiety, as well as a data-driven classification of multi-domain potential biomarkers. Our results point to an alternative approach for identifying objective variables linked to PTSD severity subtypes (high and low), based on testing within a single session shortly after the exposure to trauma. This computational objective classification of potential biomarkers, if successful, may further guide mechanism-driven interventions for PTSD (e.g., cognitive remediation or neuromodulation treatments). This computational approach, performed on more participants and with more clinical measures and potential biomarkers, may help to further refine PTSD diagnostic subtypes and play an important role in the development of more precise management of recent trauma survivors.

total score (colored in red) for each cluster, together with the top ten most important potential biomarkers (Y-axis): neurocognitive domains (colored in yellow), structural brain measurements (colored in blue), and functional brain measurements (colored in green). The first cluster (HiClus) is colored turquoise, while the second cluster (LoClus) is colored red. The bold line of each cluster represents the median of each variable for this cluster, whereas the scattered "cloud" around it represents the 95% confidence interval (CI).

Tables Titles and Legends