Introduction

Stress-related neuropsychiatric disorders such as major depressive disorder (MDD), anxiety disorders, and post-traumatic stress disorder (PTSD) are common and associated with high levels of comorbidity and within-disorder heterogeneity [1,2,3]. This comorbidity and symptom heterogeneity suggest that approaches focusing on DSM-5 diagnostic categories or on a circumscribed biomarker set may limit identification of the likely complex relationships between clinically heterogeneous neuropsychiatric symptoms and their underlying biological signatures [4]. This is consistent with the concept that neuropsychiatric disorders are not discrete entities but instead comprise sets of neurobiological mechanisms spanning several units of analysis [5,6,7]. Because no single biological mechanism operating in isolation is likely to explain the full range of symptoms of a given disorder, alternative analytic approaches are needed that address both the dimensional nature of psychiatric symptoms and the array of neurobiological mechanisms likely to contribute to them [8]. Such approaches can help identify sources of heterogeneity within a disorder or reveal comprehensive phenotype profiles that explain transdiagnostic symptom patterns. Unbiased, data-driven approaches have begun to yield biological signatures of discrete profiles of stress-related neuropsychiatric symptoms [9,10,11].

Multivariate analytic approaches are increasingly used to identify interrelationships among multiple units of analysis, including relationships between psychiatric symptoms and biological markers that univariate approaches cannot capture [12,13,14]. One technique that has received renewed interest for addressing psychiatric and neurobiological heterogeneity in neuropsychiatric disorders is canonical correlation analysis (CCA) [15, 16]. CCA is a multivariate method that extracts pairs of latent features, called canonical variates (CVs), whose correlation represents the maximal linear relationship between two sets of variables. It has recently been applied to neuroimaging measures to identify brain-based dimensions of neuropsychiatric symptoms [17,18,19,20]. CCA can be extended to more than two feature sets, termed multi-set CCA (mCCA), to identify multivariate patterns linking multiple neurobiological modalities and psychiatric disorders that a traditional two-way CCA would miss [21,22,23]. mCCA resembles other multivariate fusion and latent variable approaches such as joint independent component analysis (jICA): both are data-driven, require no specific hypotheses, and seek latent patterns across two or more feature sets [24]. Whereas jICA assumes a linked relationship across all modalities, mCCA is more flexible, allowing both linked and distinct interrelationships among two or more feature sets (see Ref. [23] for a review). However, mCCA has rarely been applied to stress-related symptoms, partly because it requires extensive, deeply phenotyped datasets with adequate power.

The aim of the current investigation was to use mCCA to derive linked relationships among dimensional stress-related psychiatric symptoms, psychophysiological measures, and a set of biochemical assays in a large cohort of active-duty participants from the Marine Resiliency Study (MRS; N = 2592) [25]. MRS is a large prospective, longitudinal study of active-duty service members designed to identify predictors of risk and resilience to combat stress. Here we focused on the pre-deployment time point. This time point had the largest available sample with three feature sets: biochemical (e.g., blood, saliva, and urine bioassays), psychophysiological (e.g., acoustic startle reflex, fear-potentiated startle, blood pressure, heart rate variability), and questionnaire- or interview-derived psychiatric symptoms (e.g., depression, anxiety, and trauma-related reexperiencing, avoidance, and hypervigilance symptoms). Using these feature sets, we performed a three-way mCCA to define dimensional transdiagnostic psychiatric symptom components (see Fig. 1) associated with specific sets of biological markers.

Fig. 1: Multi-set canonical correlation analysis workflow conducted on the three data features: Biochemical measures (7 variables), psychophysiology (20 variables), and psychiatric symptoms (57 variables).

After preprocessing, a data-reduction step was applied to the psychophysiological and psychiatric symptom measures via principal component analysis (PCA). Next, the three feature sets were entered into the mCCA to derive five canonical variates (CVs) for each feature set and each subject. The first CVs of the three feature sets are correlated to form a multi-set canonical correlation (mCC), which represents the maximized linear relationship among the three feature sets and can be displayed as a correlation table. Each subsequent mCC is calculated from the residuals of the prior mCC. Portions of this figure were created using BioRender.com (San Francisco, CA).

Materials and methods

Participants

Participants were recruited from infantry battalions deploying to either Iraq (2008) or Afghanistan (2009–2010). All active-duty members of these operational units were eligible, and there were no exclusion criteria. Women were not included because female Marines were not part of infantry battalions at the time of testing (see Table 1 for sample details). A total of 2592 active-duty Marines and accompanying Navy personnel were enrolled in the pre-deployment assessment. Participants missing multiple data points were removed (n = 88), leaving 2504 participants for the primary analyses before preprocessing. Study procedures were approved by the institutional review boards of the Veteran's Administration San Diego Healthcare System; the University of California, San Diego; and the Naval Health Research Center. All participants provided voluntary written informed consent. Complete MRS methods and demographic information are described elsewhere [25]. The measures relevant to the present study are presented here.

Table 1 Demographic Information and Psychiatric Symptom Measures.

Psychiatric measures

PTSD symptoms were assessed with the Clinician-Administered PTSD Scale for DSM-IV (CAPS-IV) [26], the gold-standard clinical interview for assessing the diagnostic criteria and severity of PTSD. All interviews were conducted by study personnel trained, certified, and supervised by a licensed psychiatrist (D.G.B.). Overall, 13.18% of the sample met partial DSM-IV PTSD criteria [27]. These criteria required at least one Cluster B symptom, at least two Cluster C symptoms, and at least two Cluster D symptoms. A symptom counted when its frequency rating was at least 1 and its intensity rating at least 2. Depression symptoms were measured with the Beck Depression Inventory, second edition (BDI-II) [28], which assesses depressive symptoms over the past 2 weeks; 7.91% of the sample met criteria for moderate to severe depression (BDI-II score > 19). Anxiety symptoms were assessed with the Beck Anxiety Inventory (BAI) [29]. The BAI is a reliable measure of general anxiety symptoms over the past week and discriminates fairly well between anxiety and depressive symptoms [30]. In this sample, 13.74% endorsed moderate to severe anxiety (BAI score > 15).
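The partial-PTSD rule and the questionnaire cutoffs above reduce to simple counting logic. The sketch below illustrates that logic and is not the study's scoring code. It assumes the standard DSM-IV item layout for the 17 CAPS-IV symptom items (items 1–5 = Cluster B, 6–12 = Cluster C, 13–17 = Cluster D) and the frequency ≥ 1 / intensity ≥ 2 rule stated in the text. The function names and input format (parallel lists of 17 frequency and 17 intensity ratings) are hypothetical.

```python
from typing import Sequence

# Assumed DSM-IV layout of the 17 CAPS-IV symptom items (0-based index ranges).
CLUSTERS = {"B": range(0, 5), "C": range(5, 12), "D": range(12, 17)}
# Partial-PTSD minimum symptom counts: >0 Cluster B, >1 Cluster C, >1 Cluster D.
MIN_COUNT = {"B": 1, "C": 2, "D": 2}


def symptom_present(freq: int, intensity: int) -> bool:
    """A symptom counts when frequency >= 1 and intensity >= 2."""
    return freq >= 1 and intensity >= 2


def partial_ptsd(freq: Sequence[int], intensity: Sequence[int]) -> bool:
    """Hypothetical helper: apply the partial DSM-IV PTSD rule to one subject."""
    if len(freq) != 17 or len(intensity) != 17:
        raise ValueError("expected ratings for 17 CAPS-IV items")
    for cluster, items in CLUSTERS.items():
        count = sum(symptom_present(freq[i], intensity[i]) for i in items)
        if count < MIN_COUNT[cluster]:
            return False
    return True


def moderate_severe_depression(bdi_total: int) -> bool:
    return bdi_total > 19  # BDI-II cutoff used in the text


def moderate_severe_anxiety(bai_total: int) -> bool:
    return bai_total > 15  # BAI cutoff used in the text
```

For example, a subject with one qualifying Cluster B item, two Cluster C items, and two Cluster D items meets the rule. Dropping the intensity of one Cluster D item below 2 means the subject no longer meets it.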

Psychophysiological measures

Modulation of acoustic startle reactivity was measured with three separate tasks. Before testing, each participant was screened for hearing impairment and fitted with headphones while seated in a comfortable chair facing a computer monitor. After electrode placement and verification, the participant completed three startle tasks: (1) assessment of startle threshold using acoustic tones; (2) a test of modulation of the acoustic startle response while viewing emotional images or anticipating image presentation; and (3) a test of prepulse inhibition and startle habituation [31,32,33,34]. Details of each task and of preprocessing are reported in the Supplementary Materials. Cardiovascular measures included systolic and diastolic blood pressure, heart rate, and heart rate variability (HRV) [35, 36].
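Prepulse inhibition is commonly quantified as the percentage reduction in startle magnitude on prepulse-plus-pulse trials relative to pulse-alone trials. The MRS task-specific scoring is described only in the Supplementary Materials. The sketch below therefore assumes this common percent-PPI definition and hypothetical inputs (per-trial blink magnitudes for each trial type).

```python
import numpy as np


def percent_ppi(pulse_alone, prepulse_pulse):
    """Percent prepulse inhibition: 100 * (P - PP) / P.

    P and PP are mean startle magnitudes on pulse-alone and on
    prepulse+pulse trials (assumed common definition, not the MRS scoring).
    """
    p = float(np.mean(pulse_alone))
    pp = float(np.mean(prepulse_pulse))
    if p <= 0:
        # No startle on pulse-alone trials: PPI is undefined (non-responder).
        raise ValueError("no startle response on pulse-alone trials")
    return 100.0 * (p - pp) / p
```

A subject with no measurable pulse-alone response has no defined PPI. This is one reason non-responders have to be handled separately before analysis.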

Biochemical measures

Peripheral blood, urine, and saliva samples were collected from all participants. Blood-based assays were C-reactive protein (CRP) and neuropeptide-Y. Saliva-based assays were cortisol, cotinine, and α-amylase. Spot-urine assays were epinephrine and norepinephrine [25, 37,38,39]. See Supplementary Materials for data collection and processing details.

Analysis

Preprocessing and data reduction

Preprocessing of the psychophysiological and biochemical markers followed standard procedures (see Supplementary Materials). For the remaining participants (n = 2504), missing data were imputed by predictive mean matching with multiple imputation by chained equations, using the mice package in R [40]. Missing data points for each variable were imputed within that variable's own feature set (e.g., a missing cortisol value was imputed from the other biochemical variables, not from psychophysiology or psychiatric symptoms). Missing data points were infrequent: 0–80 per variable for the biochemical variables, 0–30 for psychophysiology, and 6–12 for psychiatric symptoms (see Supplementary Table 1 for details). Next, psychophysiological non-responders (e.g., no startle response) were removed (n = 422). Multivariate outliers were also removed (n = 58), leaving a final sample of n = 2024. Variables were normalized using a best-model approach via the bestNormalize package in R [41]. Variance associated with age, battalion cohort, time of day of data collection, and ethnicity was removed via multiple regression prior to computing the mCCA. All preprocessing and analyses were conducted in R.
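Two of the steps above, predictive mean matching and covariate removal by regression, can be sketched in a few lines. This is a simplified illustration, not the mice implementation. mice's pmm method also draws regression coefficients from their approximate posterior and iterates over chained equations to produce multiple imputed datasets. The version below is a single imputation from complete predictors in the same feature set, with k = 5 donors (an assumed value). `pmm_impute` and `residualize` are hypothetical names. Categorical covariates such as battalion cohort or ethnicity would need to be dummy-coded before being passed to `residualize`.

```python
import numpy as np


def pmm_impute(y, X, k=5, seed=0):
    """Simplified single-imputation predictive mean matching for one variable.

    y: 1-D array with np.nan marking missing values.
    X: complete predictors from the same feature set (n x p).
    Each missing value is replaced by the observed y of a donor drawn at
    random from the k observed cases with the closest predicted values.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()
    X1 = np.column_stack([np.ones(len(y)), X])  # add intercept
    observed = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(X1[observed], y[observed], rcond=None)
    y_hat = X1 @ beta
    obs_idx = np.flatnonzero(observed)
    for i in np.flatnonzero(~observed):
        distance = np.abs(y_hat[obs_idx] - y_hat[i])
        donors = obs_idx[np.argsort(distance)[:k]]
        y[i] = y[rng.choice(donors)]  # impute an actually observed value
    return y


def residualize(Y, covariates):
    """Remove variance in each column of Y linearly explained by covariates."""
    Y = np.asarray(Y, dtype=float)
    C = np.column_stack([np.ones(len(Y)), covariates])  # intercept + covariates
    beta, *_ = np.linalg.lstsq(C, Y, rcond=None)
    return Y - C @ beta
```

Because predictive mean matching imputes observed donor values rather than model predictions, imputed values stay within the empirical range of the variable. This is useful for skewed biochemical assays.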

Data-reduction

Data reduction was performed to reduce dimensionality and to orthogonalize variable sets with elevated collinearity prior to computing the mCCA [15, 18]. The psychophysiological variables (nvar = 20; variance inflation factor [VIF] range: 1.06–10.62) and psychiatric symptoms (nvar = 57; VIF range: 1.07–2.69) were entered into separate principal component analyses (PCAs). Each PCA yielded a reduced set of principal components (PCs) for its feature set. The number of PCs retained was determined by Horn's parallel analysis [42] using the paran package in R [43]. This technique compares the observed PCA eigenvalues with eigenvalues obtained from randomly generated datasets of the same size, adjusting for sampling error-induced inflation. PCs with adjusted eigenvalues greater than one were retained. For the psychophysiological feature set, five PCs were retained, explaining 72.4% of the variance (Supplementary Table 2 shows the PCA loadings and the name given to each PC). For the psychiatric symptom feature set, 10 PCs were retained, explaining 52.3% of the variance (Supplementary Table 3 shows the loadings and names for each component). The biochemical feature set consisted of only seven variables with low VIFs (range: 1.00–1.51), so data reduction was not necessary. The five psychophysiological PCs, the 10 psychiatric symptom PCs, and the seven biochemical variables were then submitted to the mCCA.
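The collinearity check and Horn's retention rule can both be written compactly. The sketch below uses two facts. First, each variable's VIF equals the corresponding diagonal element of the inverse correlation matrix, which is equivalent to 1/(1 − R²_j). Second, parallel analysis adjusts each observed eigenvalue by subtracting the inflation (mean random eigenvalue − 1) seen in random data of the same size. It assumes paran's default of comparing against the mean random eigenvalue, standard-normal random data, and a retention count of components with adjusted eigenvalue > 1. The function names are hypothetical and this is not the paran code.

```python
import numpy as np


def vif(X):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))


def parallel_analysis(X, n_iter=1000, seed=0):
    """Horn's parallel analysis on the correlation matrix.

    Returns the number of components with adjusted eigenvalue > 1 and the
    adjusted eigenvalues (observed minus the random-data inflation).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    random_ev = np.empty((n_iter, p))
    for b in range(n_iter):
        Z = rng.standard_normal((n, p))  # uncorrelated data of the same shape
        random_ev[b] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    # Subtract the sampling-error inflation estimated from the random data.
    adjusted = observed - (random_ev.mean(axis=0) - 1.0)
    n_keep = int(np.sum(adjusted > 1.0))
    return n_keep, adjusted
```

With uncorrelated data all VIFs sit near 1. A VIF near 10, like the upper end of the psychophysiological range, means about 90% of that variable's variance is shared with the others. Orthogonal PCs remove this shared variance before the mCCA.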

Multi-set CCA

To examine the interrelationships among the psychophysiological and biochemical feature sets and psychiatric clinical symptoms, we computed a multi-set canonical correlation analysis (mCCA). We used the mcancor function of the nscancor R package [44]. mCCA extends two-way CCA to multiple data domains [22, 45]. Like traditional CCA [46], mCCA identifies linked canonical variates (CVs) such that the correlations among the CVs of the domains (here, three) are maximized over linear combinations of each domain's variables. Each subsequent set of canonical variates is found by again maximizing their correlation, this time on the residuals left by the prior multi-set canonical correlation. These later variates are additionally constrained to be uncorrelated with all previous ones, and the procedure continues until the final canonical set (mCC = 5). These constraints are implemented through iterative regression functions for each domain. To derive an overall metric for each of the five multi-set canonical correlations, we computed the sum of the three pairwise squared canonical correlations (R2) for each mCC. This metric indicates the shared variance explained by each canonical triplet and has an interpretation similar to R2 in multiple regression [47].
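The alternating-regression idea described above can be illustrated without the R package. This sketch is not the nscancor implementation. It assumes a SUMCOR-type criterion, in which the sum of pairwise correlations among the domains' CVs is maximized; consult the nscancor documentation for mcancor's exact objective. It fits each domain's weights by ordinary least squares rather than the sparse or non-negative regressions nscancor supports. Later components are obtained by deflating each domain on its own previous variate, which enforces the "uncorrelated with all previous ones" constraint. `mcca_sumcor` and `mcc_summary` are hypothetical names.

```python
import numpy as np
from itertools import combinations


def _standardize(X):
    """Center and scale each column to unit variance (rows = subjects)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def _unit_variance(z):
    z = z - z.mean()
    return z / z.std()


def mcca_sumcor(blocks, n_components=5, max_iter=500, tol=1e-10, seed=0):
    """Multi-set CCA by alternating regressions (assumed SUMCOR criterion).

    blocks: list of (n_subjects x p_k) arrays, one per feature set.
    Returns one (n_subjects x n_components) array of canonical variates per block.
    """
    rng = np.random.default_rng(seed)
    Xs = [_standardize(b) for b in blocks]
    n = Xs[0].shape[0]
    scores = [np.empty((n, n_components)) for _ in Xs]
    for c in range(n_components):
        # Random starting variates with unit variance.
        Z = [_unit_variance(X @ rng.standard_normal(X.shape[1])) for X in Xs]
        previous = -np.inf
        for _ in range(max_iter):
            for k, X in enumerate(Xs):
                # Regress the sum of the other blocks' variates on block k;
                # the rescaled fitted values maximize block k's summed correlation.
                target = sum(z for l, z in enumerate(Z) if l != k)
                w, *_ = np.linalg.lstsq(X, target, rcond=None)
                Z[k] = _unit_variance(X @ w)
            objective = sum(np.mean(Z[a] * Z[b])
                            for a, b in combinations(range(len(Z)), 2))
            if objective - previous < tol:
                break
            previous = objective
        for k, z in enumerate(Z):
            scores[k][:, c] = z
            # Deflate block k so its later variates are uncorrelated with this one.
            Xs[k] = Xs[k] - np.outer(z, z @ Xs[k]) / (z @ z)
    return scores


def mcc_summary(scores, c):
    """Pairwise CCs of component c and their sum of squares (the R2 metric)."""
    ccs = [float(np.corrcoef(a[:, c], b[:, c])[0, 1])
           for a, b in combinations(scores, 2)]
    return ccs, float(np.sum(np.square(ccs)))
```

In this study's setting, `blocks` would hold the 10 symptom PCs, 5 psychophysiology PCs, and 7 biochemical variables. `mcc_summary(scores, 0)` would then return the three pairwise CCs and R2 for mCC 1. Each block update can only increase the summed correlation, so the inner loop converges monotonically, although possibly to a local optimum.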

Permutation and bootstrapping

To determine the significance of the mCCA correlations, we performed restricted (multi-level block) permutation using the permute package in R [48]. We randomly shuffled subjects' psychophysiological and biochemical data to break the association between each subject's psychophysiological/biochemical measures and clinical psychiatric symptom measures. Dependence in the data can inflate significance tests [49]. To reduce this risk, we restricted permutations to within battalion cohorts, given the shared military experience within each battalion. We re-ran the mCCA for 5000 permutations to create a null distribution of mCCA values and compared the observed mCCA values with these permuted distributions. A significant mCCA value therefore had to exceed the correlations obtained in the permuted datasets. Each permuted P value was the number of permuted canonical correlation values (mCCperm) greater than or equal to the observed canonical correlation (mCCobs), divided by the number of permutations: Pperm = (# mCCperm ≥ mCCobs) / 5000. We conducted permutations for each mCC triplet and each pairwise CC. To reduce Type I error, permuted P values were corrected using the Benjamini–Hochberg false discovery rate (FDR < 0.05). We additionally performed bootstrap resampling, estimating the mean, standard error, and 95% confidence interval of each mCCA value from 5000 random resamples (see Supplementary Table 4 for the bootstrapped mean mCCs and standard errors).
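The resampling steps above can be sketched generically, with the statistic passed in as a function. In the real analysis, that function would re-run the mCCA and return an mCC R2 or one pairwise CC. Several choices here are assumptions, because the text does not fix them. Each non-symptom block gets its own within-cohort shuffle; the text does not say whether psychophysiology and biochemistry were shuffled jointly. The confidence interval uses the percentile method. The P value follows the formula given above, without a +1 correction. All function names are hypothetical.

```python
import numpy as np


def restricted_permutation(groups, rng):
    """Shuffle subject indices only within each group (e.g., battalion cohort)."""
    groups = np.asarray(groups)
    perm = np.arange(len(groups))
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        perm[idx] = rng.permutation(idx)
    return perm


def permutation_pvalue(stat_fn, blocks, groups, n_perm=5000, seed=0):
    """P_perm = (# permuted statistics >= observed statistic) / n_perm.

    blocks[0] (symptoms) stays fixed; each other block is shuffled across
    subjects with its own within-group permutation. stat_fn(*blocks) -> float.
    """
    rng = np.random.default_rng(seed)
    observed = stat_fn(*blocks)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = [blocks[0]] + [b[restricted_permutation(groups, rng)] for b in blocks[1:]]
        null[i] = stat_fn(*shuffled)
    return observed, float(np.mean(null >= observed))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted P values, returned in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def bootstrap_summary(stat_fn, blocks, n_boot=5000, alpha=0.05, seed=0):
    """Mean, standard error, and percentile CI of stat_fn over subject resamples."""
    rng = np.random.default_rng(seed)
    n = len(blocks[0])
    stats = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        stats[i] = stat_fn(*[b[idx] for b in blocks])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return stats.mean(), stats.std(ddof=1), (lo, hi)
```

With 5000 permutations, the smallest nonzero P value this formula can produce is 1/5000 = 0.0002. This matches the floor reported for the strongest effects in the Results.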

Results

Multi-set canonical correlation analysis

A multi-set canonical correlation analysis (mCCA) of psychiatric symptoms and biological measures revealed four significant multi-set canonical correlations (Table 2). In the remaining sections, we focus on the first two mCCs, which had moderately robust pairwise CCs. The remaining mCCs were considerably weaker and potentially not scientifically meaningful [15]. Results for the remaining mCCs are presented in the Supplementary Results and Supplementary Figs. 1–3.

Table 2 Multi-set canonical correlation matrices.

The first mCC revealed a significant multi-set relationship between psychiatric symptoms and physiological/biological measures (mCC 1 R2 = 0.22, Ppermuted < 0.0002, FDRadjusted < 0.0005). Examination of the pairwise relationships between the canonical variates (CVs) indicated that mCC 1 was driven predominantly by the relationship between psychiatric symptoms and psychophysiology (CC = 0.46, Ppermuted < 0.0002, FDRadjusted < 0.0002, CIboot = 0.33–0.46; Fig. 2A). The relationship between the psychiatric symptom and biochemical CVs was not significant (CC = 0.09, Ppermuted = 0.99, PFDR = 0.99, CIboot = 0.05–0.15). Nor was the relationship between the psychophysiology and biochemical CVs (CC = 0.05, Ppermuted = 0.99, PFDR = 0.99, CIboot = 0.003–0.09). The canonical loadings for the first mCC indicate that psychiatric symptom CV 1 (Fig. 2C) was characterized by a cluster of dysphoric and anxious arousal symptoms, as described by the five-factor model of the CAPS [50]. This cluster comprised elevated dysphoric arousal, anxiety, and anhedonia (canonical loadings = 0.66, 0.41, and 0.39, respectively). Psychophysiology CV 1 (Fig. 2C) was associated with lower blood pressure, general startle reactivity, and prepulse inhibition (canonical loadings = −0.52, −0.40, and −0.32, respectively). These findings suggest that individuals who reported higher levels of dysphoric and anxious arousal exhibited both lower blood pressure and lower startle reactivity.
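As a worked check of the R2 metric defined in the Methods (the sum of the three squared pairwise CCs), the CCs reported for mCC 1 reproduce the reported R2 = 0.22. The same arithmetic shows that the symptom–psychophysiology pair contributes about 95% of it, which quantifies "driven predominantly". The values are taken directly from the paragraph above.

```python
# Pairwise CCs reported for mCC 1: symptoms-psychophysiology,
# symptoms-biochemistry, psychophysiology-biochemistry.
ccs_mcc1 = [0.46, 0.09, 0.05]

r2_mcc1 = sum(cc ** 2 for cc in ccs_mcc1)   # sum of squared pairwise CCs
share_physio = ccs_mcc1[0] ** 2 / r2_mcc1    # contribution of the first pair

print(f"R2 = {r2_mcc1:.4f}")                              # R2 = 0.2222
print(f"symptom-physiology share = {share_physio:.2f}")  # symptom-physiology share = 0.95
```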

Fig. 2: Results of mCC Set 1 reveal a dysphoric arousal phenotype.

A Scatterplot depicting the significant pairwise canonical correlation (CC) between the psychophysiology canonical variate (CV) and the psychiatric symptom CV. Each dot represents an individual subject's scores on the CV pair. The inset shows the permutation test (5000 permutations), and the vertical dotted line indicates the observed pairwise canonical correlation value. P values were calculated using restricted permutation testing (Pperm). B For visualization purposes, univariate Pearson correlations of two examples of top-weighted psychophysiological and psychiatric symptom variables from mCC 1 are depicted. C Canonical loadings for the psychiatric symptom CV 1; the top three PCs are dysphoric arousal, anxiety, and anhedonia. C Canonical loadings for the psychophysiology CV 1; the top two PCs are general startle reactivity and blood pressure.

Fig. 3: Results of mCC Set 2 reveal an anxious fatigue phenotype.

A Scatterplots depicting the two significant pairwise canonical correlations (CCs): between the biochemical canonical variate (CV) and the psychiatric symptom CV, and between the psychophysiological CV and the psychiatric symptom CV. The insets show the permutation tests (5000 permutations), and the vertical dotted lines indicate the observed pairwise canonical correlation values. P values were calculated using restricted permutation testing (Pperm). B For visualization purposes, residualized univariate Pearson correlations of the top-weighted biochemical, psychophysiological, and psychiatric symptom variables from mCC 2 are depicted. C Canonical loadings for the psychiatric symptom CV 2; the top three PCs are fatigue, anxiety, and reexperiencing/avoidance symptoms. C Canonical loadings for the biochemical CV 2; the top two variables are norepinephrine and C-reactive protein (CRP). C Canonical loadings for the psychophysiology CV 2; the top two PCs are blood pressure and low startle threshold (Low thresh.).

The second mCC revealed a significant multi-set relationship between psychiatric symptoms and biological measures (mCC 2 R2 = 0.05, Ppermuted < 0.0004, FDRadjusted < 0.0007; Table 2). This relationship was driven predominantly by two pairwise associations. The first was between psychiatric symptoms and biochemical assays (CC = 0.19, Ppermuted = 0.0002, FDRadjusted = 0.0004, CIboot = 0.10–0.26; Fig. 3A). The second was between psychiatric symptoms and psychophysiology measures (CC = 0.11, Ppermuted = 0.004, FDRadjusted = 0.006, CIboot = 0.01–0.12; Fig. 3A). The canonical loadings indicate that psychiatric symptom CV 2 (Fig. 3C, left panel) was characterized by elevated fatigue, anxiety, and reexperiencing/avoidance symptoms (canonical loadings = 0.71, 0.30, and 0.29, respectively). Biochemical CV 2 (Fig. 3C, middle panel) was characterized by elevated norepinephrine and CRP (canonical loadings = 0.75 and 0.51). Psychophysiology CV 2 (Fig. 3C, right panel) was positively associated with blood pressure and negatively associated with the low startle threshold PC (canonical loadings = 0.87 and −0.48). A negative loading on this PC means startle was less likely with weak startling stimuli, indicating a high startle threshold. These findings indicate that higher self-reported fatigue, anxiety, and reexperiencing/avoidance symptoms (intrusive trauma imagery and avoidance) were jointly associated with higher norepinephrine, CRP, and blood pressure and with a higher startle threshold. These findings remained robust when age, race/ethnicity, time of day, and battalion cohort were not controlled for (see Supplementary Table S5).

Pearson correlations

Details for the correlational analysis between the PCs and the individual variables are reported in the Supplementary Materials.

PC correlations

The largest pairwise correlation between the psychophysiology PCs and the psychiatric symptom PCs was between the blood pressure PC and the hypervigilance PC (r = −0.19, p = 7.6 × 10⁻¹⁸, PFDR = 2.0 × 10⁻¹⁶; see Supplementary Fig. 3). This effect size is smaller than the pairwise CC observed for mCC 1 (CC = 0.46). The largest pairwise correlation between the biochemical and psychiatric symptom measures was between norepinephrine and the fatigue PC (R2 = 0.03, r = 0.18, p = 2.02 × 10⁻¹⁵). This is comparable to the biochemical–psychiatric symptom CC of mCC 2 (CC = 0.19) (see Supplementary Fig. 4).
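The univariate comparison amounts to scanning the full matrix of correlations between two sets of measures for the entry with the largest magnitude. With 5 psychophysiology PCs and 10 symptom PCs, that is 5 × 10 = 50 tests, and 7 biochemical variables against 10 symptom PCs adds another 70. Each of these tests needs multiple-comparison correction, whereas the mCCA tests a handful of canonical correlations. The numpy sketch below is illustrative; the function names and synthetic inputs are hypothetical.

```python
import numpy as np


def cross_correlations(A, B):
    """Pearson r between every column of A (n x p) and every column of B (n x q)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    B = (B - B.mean(axis=0)) / B.std(axis=0)
    return A.T @ B / A.shape[0]


def largest_pair(A, B):
    """Column indices and value of the correlation with the largest magnitude."""
    R = cross_correlations(A, B)
    i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
    return int(i), int(j), float(R[i, j])
```

For example, `largest_pair(physio_pcs, symptom_pcs)` would return the indices of the strongest single PC pair and its signed r.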

Item-level correlations

As with the PC-level correlations, for mCC 1 the individual psychiatric symptom items (22 items) were only weakly correlated with the individual psychophysiology measures (12 items), rs < 0.12 (range: −0.03 to 0.12), ps > 0.001 (uncorrected; see Supplementary Fig. S5). For mCC 2, the individual psychiatric symptom items (15 items) were similarly weakly correlated with the psychophysiology (6 items) and biochemical (2 items) measures that most defined the PCs underlying mCC 2, rs < 0.14 (range: −0.06 to 0.14), ps > 0.001 (uncorrected; see Supplementary Fig. S6).

Discussion

The primary aim of the current investigation was to derive multivariate biomarkers linked to dimensional symptom measures of dysphoria, anxiety, fatigue, and trauma-related reexperiencing and avoidance. We identified two linked patterns between these dimensional symptoms and sets of biological signatures. The first was a symptom cluster of dysphoric arousal and anhedonia associated with blunted startle reactivity and low blood pressure. The second was a symptom cluster of fatigue and reexperiencing/avoidance symptoms associated with elevated norepinephrine and CRP, high blood pressure, and high startle threshold. We also identified two weaker but statistically significant relationships between psychiatric symptoms and psychophysiological and biochemical measures (see Supplementary Materials). Permutation testing and bootstrap resampling indicated that these results were statistically reliable and stable within this sample. More broadly, CCA can identify robust relationships with larger effect sizes than traditional univariate analyses [16, 51]. Because it involves fewer multiple comparisons, it also offers stronger protection against Type I errors. In the current investigation, we observed a robust relationship between the psychiatric symptom and psychophysiology CVs (CC 1 = 0.46), whereas the largest univariate correlation was substantially weaker (r = −0.19). These results support the utility of the mCCA approach for identifying novel and robust findings in complex multimodal datasets.

Using the mCCA approach, we derived psychiatric symptom phenotypes based on covarying individual differences in psychophysiology and psychiatric symptoms. In other words, the psychiatric dimension of dysphoric arousal observed in mCC 1 (characterized by anhedonia, dysphoric arousal, and anxiety) is represented as a mixture of these symptoms and psychophysiology measures. The psychophysiology dimension comprised blunted measures of arousal, including low general startle reactivity and low blood pressure. The negative relationship of self-reported arousal with general startle response and blood pressure may seem counterintuitive [52,53,54]. However, both startle hyporeactivity [55,56,57,58,59] and low blood pressure [60,61,62] have been documented in individuals with a history of anxiety, dysphoria, and stress-related neuropsychiatric symptoms (see Lang et al., 2014 for a review [63]). Epidemiological studies support a link between low blood pressure and depression, particularly anhedonia symptoms [64]. In a large cohort (n = 60,799), individuals with comorbid anxiety and depression were more likely to have low blood pressure than individuals without these symptoms [65]. In addition, high baseline levels of anxiety and depression predicted low blood pressure 22 years later [62]. Furthermore, low blood pressure is associated with suicidal ideation [66] and is a risk factor for late-life depression [61]. The CCA also showed blunted startle associated with dysphoric arousal and anhedonia. Blunted startle responding can occur after chronic or long-standing stress [63, 67,68,69] and in adults endorsing early-life adversity [59, 70, 71]. A recent review examined psychophysiological phenotypes across anxiety and mood disorders [72]. It found blunted heart rate and startle responses, either at baseline or in response to threat, in populations with high chronic distress and depression symptoms.
Blunted psychophysiological reactivity may also reflect elevated dissociative symptoms in individuals experiencing chronic traumatic stress symptoms [73, 74]. Pre-deployment dissociative symptoms were not assessed in this investigation. Future studies should therefore include a measure of dissociation to determine how it relates to multimodal physiological markers. The previous findings cited above focused on traditional diagnostic categories (i.e., major depressive disorder, PTSD), whereas our findings concern transdiagnostic phenotypes, so direct comparisons must acknowledge this distinction. Nevertheless, these prior findings are closely in line with the canonical pattern identified in the current study. That pattern emerged from an unbiased, data-driven approach across a wide range of symptom measures, physiological measures, and peripheral signaling markers. Taken together, these findings suggest that a subpopulation of individuals shows blunted physiological responses associated with combined dysphoric arousal and anhedonia, which may point to specific mechanisms underlying this symptom pattern. The biological mechanism(s) for this association remain unclear. Some data suggest that an inflexible corticolimbic response contributes to blunted physiological reactivity [75, 76]. This response may involve blunted amygdala reactivity or poor amygdala synchrony to emotional stimuli. Dysregulation of the hypothalamic-pituitary-adrenal axis and of the renin-angiotensin system, which influences blood pressure, may also contribute [65].

We also identified a biomarker profile associated with anxious fatigue, high anxiety, and reexperiencing/avoidance symptoms. This symptom profile covaried with elevated norepinephrine and CRP, elevated blood pressure, and high startle threshold. Physiologically, autonomic imbalance, such as excess sympathetic drive, is known to contribute to increased blood pressure and inflammation, particularly during periods of stress [77,78,79,80,81,82]. Thus, the observed relationship among CRP, norepinephrine, and blood pressure is not unexpected. The physiological profile of high inflammation/sympathetic drive, associated with reexperiencing/avoidance symptoms and overall high anxiety/fear symptoms, is consistent with previous work in humans. This profile may be exacerbated in those with more severe illness [79, 83,84,85]. These findings are also in line with reports of elevated CRP and peripheral norepinephrine across anxiety and trauma-related disorders, during chronic stress (e.g., [86,87,88,89,90]), and in fatigue [91]. A benefit of CCA and mCCA is that they can extract multiple components of latent variable patterns, that is, relationships in a high-dimensional dataset that would not otherwise be apparent with a non-latent variable approach [47]. The high sympathetic drive phenotype is a unique linear combination of psychiatric, biochemical, and psychophysiological variables that is orthogonal to the phenotype observed in mCC 1.

There are limitations to this study that should be considered. First, although the CCA approach may allow detection of complex relationships, it requires a number of a priori choices in model development and interpretation. These include definition of the phenotype and delimitation of the variety and array of biological and behavioral markers included in the analytic model. This is, however, a feature of most unbiased analytic approaches, not just CCA [13]. Relatedly, robust data-driven and latent psychiatric phenotypes help address clinical heterogeneity and aid in linking complex constructs across multiple domains. However, such approaches also raise important questions about interpretability and application to new datasets [92]. Enhancing the generalizability and clinical utility of latent variable psychiatric phenotypes will be a critical step for future investigations. Second, the sample consisted exclusively of active-duty male Marine and Navy service members and thus may not generalize to civilians and women [93, 94]. Third, although this population showed good variance in symptoms and biomarker phenotypes, it was predominantly a relatively healthy population. Other or additional symptom-marker relationships may be detectable in a more clinically impaired sample. Fourth, the sample was predominantly White, and the results may not generalize to individuals from other racial groups. Future work will be needed to examine racial and ethnic differences in multimodal phenotypes, since recent work has found racial differences in phenotypes associated with neuropsychiatric symptoms [95]. Fifth, CCA assumes that the relationships between features are linear and therefore does not capture higher-order relationships [15].
Applying kernel CCA, other multivariate/machine learning approaches (e.g., independent component analysis [ICA], neural networks), or their combination (mCCA + ICA, deep CCA) may better identify nonlinear relationships between variable sets than CCA alone [9, 21, 96, 97]. However, these limitations are balanced by notable strengths: the large sample (N = 2024), the deeply phenotyped dataset, and the relative physical health of the population. This relative health reduces confounding by comorbid physical illness and other extraneous variables. Note also that all peripheral biomarkers were adjusted for the effects of age, time of assessment, cohort, and ethnicity.

Conclusion

High psychiatric disorder comorbidity and symptom heterogeneity suggest that the current diagnostic system does not capture the range of patients' symptom experiences. This may hinder the identification of clinically useful biomarkers to guide treatment development [2, 98]. Work linking a single neurobiological measure to a single diagnostic disorder has had limited clinical utility [99]. This has motivated a shift toward a framework focused on dimensional psychiatric symptoms rather than diagnostic categories, and on multiple rather than single biological markers [5, 11, 100,101,102]. New data-driven analytic strategies are needed to address the complex multivariate relationships between psychiatric symptoms and multiple biological markers [8, 103], and mCCA is well suited to this challenge [47]. The current findings support efforts toward a dimensional model of neuropsychiatric symptoms grounded in neurobiology. They also highlight the potential of multivariate statistics to reveal psychiatric symptom biomarkers derived from multiple psychophysiological and biochemical measures. Future work should apply this approach to other high-dimensional datasets that are an inherent part of biological assays (e.g., neuroimaging, genome, epigenome). It should also test how multimodal biomarkers relate to other measures (e.g., trauma history, psychosocial functioning) and determine their predictive utility.