Introduction

Posttraumatic stress disorder (PTSD) is a psychiatric disorder that can develop after experiencing a traumatic event. It is characterized by states of re-experiencing of the traumatic event, avoidance of trauma-reminders, emotional numbing, and hyperarousal1. PTSD lifetime prevalence rates in the general population are estimated to be below 10% (varying between 1.3 to 8.8% depending on the country)2,3 but can vary heavily in veterans (between 1.4 to 31%)4,5. Treatment of PTSD typically involves trauma-focused psychotherapy with or without the administration of medication such as selective serotonin reuptake inhibitors (SSRIs). Trauma-focused therapies, such as trauma-focused cognitive behavior therapy (TF-CBT) or eye movement desensitization and reprocessing (EMDR) have been suggested as first-line treatments for treating PTSD6,7. However, 30–50% of patients do not benefit sufficiently8. To improve treatment response rates it is important to better understand differences between responders and non-responders, and identify reliable predictors for treatment outcome.

PTSD is characterized as a brain disorder showing alterations in activity and connectivity of cortical and sub-cortical brain regions. The neurocircuitry model of PTSD suggests that PTSD pathology is characterized by hyperactivity and increased connectivity of the amygdala, the anterior insula and the anterior cingulate cortex, decreased activity of the ventromedial prefrontal cortex (vmPFC) and hypoconnectivity between vmPFC, hippocampus and amygdala9,10,11,12. Functional connectivity of these regions can be recorded using neuroimaging techniques such as resting-state functional magnetic resonance imaging (rs-fMRI). Therefore, it is important to investigate if those alterations in rs-fMRI connectivity could be used to predict treatment-outcome and reveal biomarkers to increase the treatment-response rate. Indeed, pre-treatment group differences in fMRI activity and connectivity were observed between responders and non-responders in PTSD in several studies13,14,15,16,17. However, these group-level univariate analyses focus on average differences between responders and non-responders. This does not allow inference at the individual patient level, which can be achieved using multivariate supervised machine learning analyses18,19. Most importantly, performance is evaluated on new data to estimate the generalizability of the trained models, and thereby enabling the prediction of treatment outcome for new patients. Machine-learning analyses have been performed in the context of PTSD using different modalities of MRI data to distinguish between patients and controls20,21,22,23. However, only two studies to date have used machine learning analyses to predict future outcome at an individual level. One study aimed to predict clinical status two years after treatment with 12 weeks of paroxetine in a sample of 20 civilian PTSD patients24. This study used pre- and post-treatment rs-fMRI derived measures, namely amplitude of low-frequency fluctuations and whole-brain degree centrality maps, and the results showed that pre- but not post-treatment measures were able to predict remission status after two years with an accuracy of 72.5%. But as all but one patient had been in remission shortly after treatment, these results reflect relapse rather than treatment outcome. In addition, one recent study used a combination of resting-state connectivity within the ventral attention network and delayed recall performance in a verbal memory task to predict the response to prolonged exposure therapy in ~19 civilian patients with PTSD25. Although the proportion of treatment non-responders was low, the classifier still managed to distinguish the groups with ≥80% sensitivity and specificity.

To determine whether neuroimaging data could also predict treatment outcome in a larger sample of combat veterans with PTSD, we analyzed pre-treatment structural MRI and rs-fMRI data of 44 patients who received treatment-as-usual. This consisted of trauma-focused psychotherapy such as TF-CBT and EMDR, and clinical outcome was determined 6–8 month following the baseline fMRI scan. We previously reported pre-treatment group differences in structural26, white-matter27 and task-based (f)MRI13,28 between responders and non-responders, as well as rs-fMRI differences between patients and controls29. In another study, we explored time-by-group interactions between remitted and persistent PTSD patients based on network measures derived from rs-fMRI30. In the current study, we focus on pre-treatment MRI measures exclusively within a machine-learning paradigm. We extracted functional connectivity (FC) within resting-state networks (RSNs) using independent component analysis (ICA). ICA was chosen as the method for analyzing rs-fMRI because it provides a multivariate and data-driven way to identify multiple RSN’s present in the data. It does not require the definition of seed regions and is more robust to noise31. To ensure independence between RSN identification and estimation of RSN expression for each individual patient, the ICA was performed on rs-fMRI data of sex and age matched combat controls (n = 28). In addition, we derived maps of regional gray matter volume using voxel-based morphometry (VBM). Abnormalities in structural MRI data have been repeatedly linked to treatment-response in PTSD26,32,33,34,35 and we therefore used it as a standardized and easy to acquire imaging baseline. For both the rs-fMRI and VBM data, we performed univariate inference on the group level as well as multivariate prediction on the single-subject level using Gaussian process classification (GPC) with 10 × 10 cross-validation. The current study is exploratory and investigated the general feasibility of structural and rs-fMRI data to predict treatment-outcome in PTSD on the single-subject level. Because of its exploratory nature, there were no a priori hypotheses formulated.

Materials and methods

Participants

In total 57 veterans with PTSD and 29 combat controls (CC) were included in the study. Patients were recruited from one of four outpatient clinics of the Military Mental Healthcare Organization in Utrecht, The Netherlands. PTSD diagnosis was established by a licensed psychologist or psychiatrist. The Clinician Administered PTSD scale (CAPS)36 for DSM-IV1 was administered by trained research staff to quantify the total symptom severity and had to be ≥45. Combat controls had to have no current psychiatric disorders and a total CAPS score < 15. Further inclusion criteria for all subjects were deployment to a war zone and 18–60 years of age. Comorbid disorders were examined using the structured clinical interview for DSM-IV (SCID-I)37. Subjects with a history of neurological disorders, current substance dependence and contraindications for MRI scanning were excluded. From the initial 57 PTSD patients, seven were lost to follow-up, three were excluded based on excessive motion during scanning (see Supplementary information), one due to an artifact in the MRI scan, and one due to refusal of scanning. One additional participant was excluded as she was the only female in the sample. This lead to the final sample of 44 PTSD patients. From the CC only one subject had to be excluded based on excessive motion (n = 28).

After a period of six to eight months in which patients underwent treatment-as-usual consisting of trauma-focused therapy (e.g., TF-CBT, EMDR) a second CAPS assessment was performed. Treatment response was defined as a ≥30% decrease of total CAPS score at follow-up with respect to the baseline assessment38,39. According to this criterion 24 PTSD veterans were defined as responder and 20 as non-responder. All participants gave written informed consent. The study was approved by the University Medical Center Utrecht ethics committee, in accordance with the declaration of Helsinki40.

Clinical data analysis

To estimate whether the CC, responders and non-responders differed across any demographic or clinical variables at baseline or follow-up ANOVA, Kruskal–Wallis, χ2, or t-tests were applied as appropriate. All tests were performed using the R software (version 3.5.1).

Data acquisition

All scans were obtained on a 3T MRI scanner (Philips Medical System, Best, the Netherlands). The T1-weighted high resolution MRI scan was acquired before the rs-fMRI scan with the following parameters: repetition time (TR) = 10 ms, echo time (TE) = 4.6 ms, flip angle = 8°, 200 sagittal slices, field of view (FOV) = 240 × 240 × 160, matrix size = 304 × 299 and voxel size = 0.8 × 0.8 × 0.8 mm. The rs-fMRI scan consisted of 320 T2*-weighted echo planar interleaved slices with TR = 1600 ms, TE = 23 ms, flip angle = 72.5°, FOV = 256 × 208 × 120, 30 transverse slices, matrix size = 64 × 51, total scan time 8 min and 44.8 s, 0.4 mm gap, acquired voxel size = 4 × 4 × 3.60 mm). Participants were asked to focus on a fixation cross, while letting their mind wander and relax.

MRI data preprocessing

To estimate whether structural images carry information to distinguish between responders or non-responders a VBM analysis was performed. Gray matter (GM) voxel-wise volume maps were computed using the SPM12 toolbox (v7219, https://www.fil.ion.ucl.ac.uk/spm/software/spm12/).

Resting-state fMRI images were preprocessed using the advanced normalization tools (ANTs, 2.1.0, http://stnava.github.io/ANTs/) and FMRIB Software Library (FSL, 5.0.10, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/. To control for the influence of motion on the rs-fMRI data ICA-AROMA was applied41. Details on the preprocessing pipelines can be found in the Supplementary information.

Resting-state network identification

Preprocessed rs-fMRI data were analyzed to determine group-level resting-state networks (RSNs). Group components with a fixed number of 70 components were estimated using a meta-ICA approach utilizing FSL’s MELODIC software42 applied to the rs-fMRI data of the CC. The identification of RSNs only on the data of the CC was done to reduce the potential of overfitting during the machine learning analysis since the training and test data sets would have to be taken together when defining the RSNs if data of the PTSD patients would be included. However, a potential drawback is that the identified RSNs might not optimally represent the rs-fMRI of the PTSD patients. The number of components was fixed to 70 because it was shown to provide good insight into clinical differences of patient groups43. However, it should be noted that such a higher-order ICA can split up canonical networks into multiple sub-networks which might reduce the information present in each sub-network. The meta-ICA approach allows for the identification of reproducible and reliable group components44. After meta-ICA, 48 RSN’s were identified using a semi-automatic approach. Thereafter, FSL’s dual regression approach was used to estimate single-subject spatial representations of the corresponding group networks for all patients. Details on the implementation and rationale of the procedure can be found in the Supplementary information, and signal and noise components are illustrated in Figs. S1 and S2.

Univariate analysis

The preprocessed GM volume maps from the VBM analysis and the identified RSNs were used to investigate group differences between responders and non-responders. Age and total intracranial volume were entered as covariates for the VBM data, while only age was used as covariate for the RSN data. The significance level was set to P < 0.05 family-wise error (FWE) corrected and estimated using the threshold-free-cluster-enhancement statistic (TFCE)45 with permutation testing (10,000 permutations) using the TFCE toolbox (r167, http://dbm.neuro.uni-jena.de/tfce/) for the VBM data. For the resting-state data, the PALM toolbox (a112, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/PALM) was used since it allowed for permutation-based FWE correction across the whole-brain and all 48 RSNs at the same time. Both analyses accounted for two-tailed tests.

Multivariate analysis

For the multivariate single-subject classification of responders and non-responders, we used the GM volume-maps from the VBM analysis and each RSN separately. The classification was performed using a Gaussian process classifier (GPC)46. Briefly, GPCs are multivariate Bayesian classifiers which allow to obtain valid probabilistic predictions by estimating the posterior distribution, given a pre-defined prior distribution. GPCs are a standard classifier used in machine learning for neuroimaging and has been shown to perform comparable to support vector machines47. We utilized the ability of the GPC’s to provide valid probabilistic predictions to investigate posthoc the performance of the classifier when a ‘reject’ option is implemented (see below). Univariate feature selection was performed on the training set to reduce the initial data dimension using nested 5-fold cross-validation (see Supplementary information). The performance was estimated by calculating sensitivity, specificity, balanced accuracy, area under the receiver-operator curve (AUC) and positive/negative predictive value (PPV/NPV) using ten times repeated 10-fold cross-validation to avoid overfitting bias. To estimate whether our classifier performed better than chance, label permutation tests with 1000 iterations were performed. The final P-values were Bonferroni corrected for 49 tests.

We also investigated the performance of the GPC when an uncertainty option was allowed: utilizing the probabilistic output of the classifier, we established regions of uncertainty for which the classifier would not make a prediction. For example, with an uncertainty region of 10% any probabilistic GPC output for a new patient which lies within 45–55%, would not be assigned a classification label (because the classification into responders and non-responders would be uncertain). Only patients with a higher (or lower) probability would be assigned to a class and considered for calculation of balanced accuracy. This allowed us to investigate how well our GPC would perform if classification has only to be made if a specific level of certainty is reached and how many patients would need to be excluded to reach that level.

The code used in the analysis of the data can be made available upon request.

Results

Clinical data

Demographic information, clinical variables and outcomes of statistical tests can be found in Table 1. There was no difference in demographics between the CC, responders or non-responders, nor any clinical difference between responders and non-responders at baseline. At follow-up non-responders showed a higher total CAPS score (t(42) = 7.89, P < 0.001) and higher use of serotonin reuptake inhibitors (χ2(1) = 5.77, P = 0.02).

Table 1 Demographics and clinical data.

Univariate analysis

After correction for multiple comparisons across all RSNs, the rs-fMRI analysis showed one network with significantly increased connectivity in non-responders as compared to responders (Fig. 1). The network was centered on the bilateral lateral frontal polar area and the difference was observed in the right superior frontal gyrus (PFWE = 0.04). In Fig. S4, we show all univariate group-differences when no FWE-correction across networks was applied, performed for illustrative purposes only. No significant group differences in GM were observed.

Fig. 1: Results of the group-level univariate RSN analysis.
figure 1

Higher resting-state connectivity was observed in non-responders than responders in the frontopolar network. Two-tailed P-value was corrected for whole-brain comparisons and 48 networks.

Multivariate analysis

GPC’s trained on a network centered around the pre-supplementary motor area (pre-SMA) could classify non-responders and responders with an average cross-validated balanced accuracy of 81.4% (SD: 17.2, PBonferroni < 0.05) (Fig. 2a). The network showed excellent AUC (0.929, SD: 0.149) with high sensitivity (84.8%, SD: 25.1), moderately high specificity (78% SD: 28.6), and high PPV/NPV (0.840/0.835, SD: 0.214/0.262). No other network showed significant classification performance after Bonferroni correction was applied, including the network that showed a significant difference on the group-level in the univariate analysis. However, if no Bonferroni correction is applied this network becomes significant, as well as three additional networks. Uncorrected networks and consistently selected features are shown for illustrative purposes in Fig. S5.

Fig. 2: Results of the single-subject multivariate prediction analysis of treatment outcome.
figure 2

a The classification metrics of the pre-SMA network shown as box-and-whisker plots. Outliers plotted as circles were determined as values which lay outside 1.5 times the interquartile range. Please note that the box for the AUC metric collapsed because the first quartile and the median were the same value. b Posthoc evaluation of accuracy of the GPC classifier for various cut-off levels of probabilistic certainty. Calculations were performed for and averaged across the ten repetitions of the 10-fold cross-validation with SD plotted as error bars. For example, once 12 patients (27%) with low prediction certainty of 0.41–0.59 —where 0.5 is equal probability of prediction— would be excluded, accuracy would increase to over 90%.

To investigate which regions of the pre-SMA network were most important for the classification process we examined consistently selected voxels during the feature selection process. We tracked the selection frequency of voxels across cross-validation runs, looking at voxels which were selected in >50% of the runs (Table 2 and Fig. 3). Regions in both hemispheres located outside the group-network were contributing to the classification performance. The largest clusters were located in the left inferior temporal gyrus (nvoxel = 14), left superior frontal gyrus (nvoxel = 10), and right precentral gyrus (nvoxel = 9). For illustrative purposes we also computed mean correlations for responder and non-responder groups separately between average time-courses of the largest clusters(nvoxel > 5, Table 2) and the subject-specific time-courses of the pre-SMA network identified by dual regression (Fig. S3). Patterns of positive, negative and no significant connectivity with the network can be observed. Note that null-connectivity voxels might still contribute to the classification by removing common noise sources from the overall pattern48.

Table 2 Most frequently selected features during the nested-cross-validation procedure of the pre-SMA network.
Fig. 3
figure 3

Best performing network in the multivariate classification (pre-SMA) in hot colors and the most often selected voxels during the classification in cold colors.

Additionally, we provided a posthoc evaluation of what would happen if prediction would only be made for patients for which a high degree of certainty of the classifier is established. As illustrated in Fig. 2b, this ability to ‘reject’ patients from the classification with increasing classification certainty leads to increasing accuracy while at the same time reducing the number of patients for which the GPC can make a classification. For example, once 12 patients (27%) with low prediction certainty of 0.41–0.59—where 0.5 is equal probability of prediction—would be excluded, accuracy would increase to over 90%.

Discussion

The present study investigated the possibility of using pre-treatment structural MRI and rs-fMRI data to predict the response to trauma-focused psychotherapy in male combat veterans with PTSD. The results showed that rs-fMRI data successfully distinguished between responders and non-responders in univariate and multivariate analyses. The univariate analysis detected group differences in a network centered on the frontal pole, and the multivariate analysis predicted treatment response on an individual level using pre-SMA connectivity with an accuracy of 81.4%. Whereas previous studies have focused on MRI-based treatment outcome predictors at the group level, our results suggest that single-subject prediction is also feasible. This result provides a proof-of-concept for the feasibility of developing predictive biomarkers, which could enable personalized treatment for patients with PTSD.

Our multivariate analysis revealed the predictive importance of the pre-SMA. This brain area is closely linked to the SMA, and is involved in motor preparation, response inhibition, and imagination49,50,51. Intriguingly, resting-state connectivity within this network is also predictive for the response to electroconvulsive therapy in depression52. The main difference in results is that the network in the current study is more confined to the pre-SMA due to the use of ICA with 70 components instead of 32 components, which was associated with a larger network that consisted of a large part of the dorsomedial prefrontal cortex. Together, this suggests that pre-SMA connectivity may determine responsiveness to treatment, regardless of intervention and disorder.

The discovered network is different from the ventral attention network (VAN, consisting of the insula, dorsal anterior cingulate, anterior middle frontal gyrus, and supramarginal gyrus) that was recently reported. The VAN in combination with delayed recall performance in a verbal memory task could predict prolonged exposure therapy outcome in a sample of ~19 civilians with PTSD with sensitivity and specificity ≥80%25. But even though both studies used rs-fMRI, the underlying biomarkers cannot readily be compared. First, the variables tested in25 were discovered by performing comparisons between healthy controls and PTSD patients, whereas we discovered the pre-SMA network from comparisons between responders and non-responders directly. Second, the authors did not investigate any other networks beyond the VAN for treatment outcome prediction. And third, the brain regions that are part of the VAN were part of distinct RSNs in our ICA analysis, whereas the VAN was considered one network in the previous study. Therefore, it remains to be tested whether VAN or pre-SMA connectivity is also predictive in other samples. Regardless, both studies demonstrate that rs-fMRI contains information that is informative for predicting psychotherapy outcome on an individual level.

The univariate group analysis showed increased connectivity in non-responders in the frontal pole. The frontal pole region (BA 10) has been implicated in a multitude of cognitive tasks, including attention, perception, language, and memory tasks53,54. Specifically, the lateral parts of the frontal pole are more associated with working memory and episodic memory retrieval while medial parts of the frontal pole were mostly involved in mentalizing, which is the reflection of your own emotions and mental states53,54. This division of the frontal pole was recently confirmed by a cytoarchitectonic parcellation indicating two distinct areas: a more lateral frontal pole area 1 (FP 1) and a more medial frontal pole area 2 (FP 2). Our frontal polar network was mostly located in FP 1 and may, therefore, be primarily associated with memory-related processes.

The difference between the identified networks in the univariate and multivariate analyses might seem counterintuitive at first but can be explained by the differences in objective and methodology of both analyses. This discrepancy is in line with the observation that significant group-level differences do not necessarily translate to high classification accuracies because of strongly overlapping distributions and different goals of the analysis18,19. A significant P-value in a group-level analysis does not have to correspond to the ability of distinguishing between individual patients because the statistically significant difference in average values might show low effect sizes. In these case classification performance will be low. In addition, the goal of statistical inference is the identification of localized differences between groups while the goal of classification is to find the best multivariate combination of data, which would allow to generalize the effect to new subjects. These are two inherently different goals which therefore can lead to different outcomes.

In contrast to our results, previous studies that have used univariate analysis of structural MRI and task-based fMRI data have primarily pointed to pre-treatment differences in the anterior cingulate cortex, amygdala, hippocampus, and insula13,14,15,16,17,32,33,34,35. However, direct comparison with our study is difficult since there are numerous differences between our study and those previous studies. For example, most studies that investigated structural MRI data used a predefined region of interest approach26,32,33,35 instead of a whole-brain approach. Most fMRI studies have investigated task-induced changes instead of investigating resting-state fMRI recordings13,14,15,16,17. And finally, different types of psychotherapies (such as prolonged exposure therapy), different PTSD populations and experienced trauma, and different treatment-criteria make a direct comparison challenging55.

This can be exemplified with the absence of results for the structural MRI analysis which is in contrast to our previous finding of differences in hippocampal volume between patients with remitted vs. persistent PTSD26. This difference could be due to the calculation of the volumes: in the present study, a VBM analysis was employed to provide a highly multivariate data set that could be optimally used during the classification procedure, whereas we previously estimated hippocampal volume using segmentation in Freesurfer. In addition, in this study we chose to focus on treatment response while previously we investigated the more stringent criterion of treatment remission to focus on PTSD persistence. Finally, the previous study was employing a repeated-measures design combining pre- and post-treatment data while the current study only focused on the pre-treatment data.

The current study has several limitations. The sample size in the current study is small for a machine learning application. This could result in high variance of the estimated accuracy and the results, therefore, require further validation and replication in independent samples56. Another limitation of this study is the use of an all-male veteran sample. This limits the generalization of the results to other patients with PTSD. Therefore, a replication of the proposed approach in a more diverse sample would be desirable. Finally, the treatments received by the patients represent a heterogeneous mix of different trauma-focused psychotherapies. While they are considered as first-line treatments and the fact that in realistic settings multiple treatments might be employed by therapists, the results are not specific to one particular treatment. Therefore, the current approach might obscure specific individual patient-by-treatment interactions. Future studies should aim to determine the most optimal treatment for each patient.

In conclusion, the current study shows that treatment response to trauma-focused psychotherapy can be predicted for individual patients with PTSD using machine learning analysis of rs-fMRI data. This proof-of-concept study demonstrates the feasibility to develop neuroimaging biomarkers for treatment response, which will enhance the personalized treatment of patients with PTSD.