Introduction

Schizophrenia (SZ) is a major cause of years lived with disability1, yet the pathological mechanisms underlying the diverse manifestations of this disorder remain unclear2. In line with the notion that the symptoms partly arise from abnormal brain connectivity and functional integration of brain processes3,4,5, histopathological and genetic investigations have implicated lipid homeostasis, neuroinflammation, and myelin and oligodendrocyte abnormalities6,7,8,9,10,11. Furthermore, brain imaging studies using diffusion tensor imaging (DTI) have consistently indicated anatomically widespread white matter (WM) microstructural abnormalities12,13,14,15. A recent meta-analysis across 29 cohorts of the ENIGMA consortium, including the current sample, documented significantly lower fractional anisotropy (FA) in patients with SZ (n = 1963) compared to healthy controls (HC; n = 2359) in 20 of 25 regions of interest, and found no significant associations with age of onset or medication status16.

WM abnormalities have been reported across a wide range of clinical traits and disorders17,18,19,20,21,22, and their diagnostic specificity remains unknown. Current diagnostic nosology treats SZ and bipolar spectrum disorder (BD) as independent categories, but genetic23, clinical and neuropsychological24,25 evidence suggests partly overlapping pathophysiology and clinical manifestations, with higher symptom burden, poorer function and worse outcome in SZ26. Including patients with SZ and patients with BD in the same analysis is vital for probing common and distinct etiological mechanisms across the psychosis spectrum. While DTI aberrations have consistently been documented in BD27,28,29,30,31, the existing studies that have included both groups have not provided conclusive evidence of marked differences between BD and SZ32,33,34,35,36,37,38,39.

SZ has been conceptualized as a neurodevelopmental disorder40,41, and deficient myelination during adolescence has been included among the core features of the prodromal phase42, with WM aberrations present before disease onset43,44. Along with evidence of accelerated brain changes in adult SZ45,46,47, neurodevelopmental theories strongly support the need for a dynamic lifespan perspective. The magnitude and modulators of group differences vary across life, manifesting, for example, as delayed developmental trajectories48 and progressive aging-related changes in adulthood49. Moreover, whereas the largest meta-analysis to date revealed no significant group by sex interactions, effect sizes for global FA were significantly larger in female than in male patients16 (but see another recent review and meta-analysis, which found no significant sex-related differences in effect sizes when comparing patients and controls within males and females, respectively50). Although evidence of strong modulating effects of sex on WM abnormalities in severe mental disorders is lacking, sexual dimorphisms in brain biology and clinical expression warrant further studies of possible sex by diagnosis interactions on the human brain.

To address these unresolved issues, our main aim was to compare several DTI indices across the brain between patients diagnosed with a SZ spectrum disorder (n = 128), patients with BD (n = 61), and HC (n = 293), both within and across males and females, using tract-based spatial statistics (TBSS)51. Based on the literature, we expected widespread WM microstructural alterations in SZ compared to HC, in particular lower FA. Additionally, given the generally lower clinical severity in BD, we anticipated more moderate differences between BD and HC, with less pronounced and less widespread abnormalities in BD. To account for a dynamic lifespan perspective, we tested for group by age interactions and compared effect sizes within age cohorts using a sliding window technique52; we also tested for group by sex interactions and compared effect sizes within females and males, respectively. DTI-based indices of WM microstructure are highly sensitive to differences in data quality, e.g. due to subject motion, which may bias the results53,54,55. Since previous studies have often failed to report quality control (QC) measures or omitted systematic QC altogether, it is unknown whether reported group effects are biased by differences in data quality. Hence, we employed a stringent multi-step exclusion protocol based on quantitative quality assessment, and compared groups at different levels of QC stringency to assess the impact of data quality on the group differences.

Results

Demographics and clinical characteristics

Table 1 summarizes demographics and clinical characteristics. There was a significant main effect of group on age (F = 4.2, p = 0.016), education (F = 34.1, p < 0.001) and IQ (F = 37.3, p < 0.001), with higher age in HC than in SZ (Supplementary Fig. S1), and longer education and higher IQ in HC than in BD, and in BD than in SZ (HC > BD > SZ). Compared to BD, SZ had higher symptom severity as measured by the Positive and Negative Syndrome Scale (PANSS) total (t = 8.2, p < 0.001), positive (t = 7.6, p < 0.001), negative (t = 6.4, p < 0.001), and disorganized (t = 4.2, p < 0.001) sub-scales, and lower scores on the split version of the Global Assessment of Functioning Scale (GAF)56, for both GAF function (t = −5.5, p < 0.001) and GAF symptom (t = −7.4, p < 0.001).

Table 1 Demographic and clinical data.

Main effects of diagnostic groups on DTI

Figure 1 and summarize results from voxelwise analyses testing for main effects of group on the DTI indices. We found significant and widespread main effects of group on FA and radial diffusivity (RD), including the corpus callosum, superior longitudinal fasciculus, fornix, cingulum, forceps major and inferior fronto-occipital fasciculus. Pairwise comparisons revealed widespread FA reductions and RD increases in SZ compared to HC, and FA reductions in SZ compared to BD. No other group comparisons yielded significant effects.

Figure 1

Colored voxels show significantly decreased (blue) and increased (red) DTI indices in SZ patients relative to HC and BD. Group differences are thresholded at p < 0.05 (two-tailed) after permutation testing using threshold-free cluster enhancement (TFCE). Note that the white matter skeleton has been slightly thickened to aid visualisation.

Global DTI measures and sliding window approach

Figure 2 shows mean skeleton DTI values plotted as a function of age and group. Table 2 summarizes the results from linear models accounting for age, age², sex and diagnosis. We found significant main effects of group, sex and age on FA. There was no significant group by age or group by age² interaction; therefore, the model was run without the interaction terms. Pairwise comparisons revealed lower FA in SZ compared to both HC and BD. Females showed significantly lower FA than males. Figure 2, Supplementary Fig. S2 and Table 3 summarize results from the bootstrapped age fitting procedure, yielding estimates of the mean and standard deviation of the age at maximum FA and minimum RD, mean diffusivity (MD) and axial diffusivity (AD) within groups. The overlap in confidence intervals (Table 3) and the comparison against empirical null distributions generated using permutation testing (see Methods and Supplementary Fig. S2) revealed no significant between-group differences in age at maximum (FA) or minimum (RD, MD and AD). Supplementary Figs S3–S6 summarize the bootstrapped age fitting procedure for regions of interest (ROIs). Briefly, the majority of ROIs show a consistent pattern, with an earlier FA peak in SZ than in BD and HC. However, the left and right cingulum show a more intricate pattern, with BD reaching peak FA at an older age. For RD, AD and MD the pattern is less consistent and more complex.

Figure 2

Plots (a–d): Mean skeleton DTI values plotted as a function of age and group (HC = healthy controls, BD = bipolar disorders, SZ = schizophrenia spectrum disorders). Plots (e–h): Violin plots depicting the fitted values for each group. Plots (i–l): Uncertainty estimates, from a bootstrap procedure with 10 000 resamples, of the age within each group at which maximum FA and minimum RD, MD and AD are reached.

Table 2 Mean skeleton DTI metrics within groups.
Table 3 Age where maximum FA or minimum RD, MD or AD were reached.

Effect sizes within different age bins from the sliding-window technique are presented in Supplementary Fig. S7. For FA, MD and RD, effect sizes for HC vs. SZ increased until the late 20s. Effect sizes for SZ vs. BD showed a similar pattern for FA, and more complex non-linear associations for MD, RD and AD. Effect sizes for HC vs. BD for FA hovered around 0 throughout the sampled age range, whereas for RD, MD and AD they showed more complex non-linear associations.

Sex related differences

Mean skeleton and ROI analyses revealed no significant sex-by-diagnosis interaction effects on DTI WM metrics. Supplementary Table S1 shows results from group comparisons within females and males, respectively. Briefly, the analysis revealed main effects of group on FA both in males (F = 3.36, p = 0.036) and females (F = 3.99, p = 0.02). Pairwise comparisons revealed lower FA in SZ compared to HC in both sexes and lower FA in SZ compared to BD in females.

Associations with symptom domains

Mean skeleton and ROI analyses revealed no significant associations with GAF and PANSS domain scores across patient groups. Global and ROI-based t- and p-statistics are summarized in Supplementary Table S2 and Supplementary Fig. S8, respectively.

Effects of quality control

Visual inspection of the datasets with a QC summary z-score below −2.5 (n = 35; see Methods) indicated no clear reason for exclusion. Therefore, the main analyses were run on the entire dataset, but we also present results using varying QC levels. Figure 3 summarizes the effects of QC on the mean skeleton data. Effect sizes for HC vs. SZ and BD vs. SZ increased with QC stringency for all metrics except AD, which showed a more complex pattern. The effect size for HC vs. BD remained relatively unchanged as a function of QC. Voxelwise analysis revealed patterns highly similar to those obtained using the full sample (Fig. 1 and Supplementary Fig. S9).

Figure 3

(A) Mean skeleton DTI metrics across quality control subgroup analyses. (B) Cohen's d for pairwise comparisons across quality control subgroup analyses. The labels on the x-axes reflect the number of participants in each analysis. The error bars in (A) represent the standard error of the mean.

Subgroup and age restricted analyses

Density plots showing the distribution of the four DTI metrics for each subgroup within each diagnostic group are presented in Supplementary Fig. S10. The results from the subgroup analyses are presented in Supplementary Table S3, and those from the age-restricted analyses in Supplementary Table S4. Among other findings, for the diagnostic subgroups there were significant differences between psychosis not otherwise specified (PNOS) and a strict SZ diagnosis for global MD (p = 0.011, uncorrected) and global AD (p = 0.002, uncorrected). A similar pattern was observed for the BD subgroups, with significant differences between BDI and BDII for MD (p = 0.035, uncorrected) and AD (p = 0.035, uncorrected). For the age-restricted analyses (55 years and younger), we observed a main effect of group for FA only, with pairwise comparisons indicating lower global FA in SZ compared to HC.

ROI analyses

Figure 4 and Table 4 summarize the ROI results. Most ROIs showed main effects of group for FA (partial η²: 0.001–0.029), with the strongest effects in the body (BCC) and splenium (SCC) of the corpus callosum and the forceps major. We found main effects of group in 7 of the 23 ROIs for RD and in 1 of the 23 ROIs for AD. We found a nominally significant (p < 0.05, uncorrected) age by group interaction for MD in the BCC (partial η² < 0.013, p = 0.044), indicating larger group differences with increasing age. Since no age by group interactions remained after correction for multiple comparisons, all main effects and results from pairwise comparisons were computed without the interaction term in the models.

Figure 4

Results from region of interest (ROI) analyses with mean difference and variance from pairwise comparisons plotted for each DTI metric. The error bars represent 95% confidence intervals. List of abbreviations: Genu of corpus callosum (GCC), Body of corpus callosum (BCC), Splenium of corpus callosum (SCC), Anterior thalamic radiation L (ATR L), Anterior thalamic radiation R (ATR R), Corticospinal tract L (CST L), Corticospinal tract R (CST R), Cingulum (cingulate gyrus) L (CG L), Cingulum (cingulate gyrus) R (CG R), Cingulum (hippocampus) L (CGH L), Cingulum (hippocampus) R (CGH R), Forceps major (FMJ), Forceps minor (FMI), Inferior fronto-occipital fasciculus L (IFO L), Inferior fronto-occipital fasciculus R (IFO R), Inferior longitudinal fasciculus L (ILF L), Inferior longitudinal fasciculus R (ILF R), Superior longitudinal fasciculus L (SLF L), Superior longitudinal fasciculus R (SLF R), Uncinate fasciculus L (UNC L), Uncinate fasciculus R (UNC R), Superior longitudinal fasciculus (temporal part) L (Temporal SLF L), Superior longitudinal fasciculus (temporal part) R (Temporal SLF R).

Table 4 Anatomical regions of interest (ROI) analyses.

Discussion

One of the major implications of the brain dysconnectivity hypothesis of psychotic disorders is that WM microstructural layout and integrity modulate risk and give rise to a range of symptoms. To test this hypothesis, we compared DTI metrics between patients with SZ and HC across the brain. Including a group of patients with BD allowed us to test for diagnostic specificity or, conversely, cross-diagnostic convergence. In line with our primary hypothesis and converging evidence16, the results revealed robust differences between patients with SZ and HC on several metrics, in particular lower FA across the brain in patients, even after careful quality assessment. In general, the results were not strongly dependent on age or sex, and we found no significant associations with symptoms across groups. Adding to the accumulated evidence of brain gray matter abnormalities in patients with severe mental illness52,57, these results support converging evidence implicating WM abnormalities in SZ and suggest that these abnormalities are more pronounced in SZ than in BD.

Clinical overlaps have motivated a dimensional approach to reveal common and distinct disease mechanisms in SZ and BD. In line with cognitive and genetic studies23,24 suggesting several commonalities, brain imaging has not revealed structural or functional brain characteristics unambiguously distinguishing the two disorders, and previous DTI studies comparing SZ and BD have been largely inconclusive32,34,35,58.

The neurobiological underpinnings of DTI metrics are complex and multidimensional, and our findings do not allow for interpretation regarding specific cellular processes. Previous studies have shown associations between RD and myelin-related processes59,60, and higher RD may suggest reduced myelin integrity in SZ, in particular when considered in light of genetic studies reporting altered expression of genes involved in lipid homeostasis and myelination9,61. Complementary models implicate microglial inflammatory processes and oxidative stress in WM pathology62,63, and inflammation-related cytokines and growth factors have been associated with reduced FA and increased RD and MD in BD64. Further studies are needed to delineate the roles of myelination and inflammation for WM integrity and mental health across the lifespan65,66.

Considering the strong impact of age on brain WM microstructure67,68, characterizing age trajectories within groups may provide indirect information about the temporal evolution of aberrations from an ontogenetic perspective. Both neurodevelopmental and neurodegenerative models of the development and maintenance of psychosis have been formulated46,47,69, and both genetic liability and neurodevelopmental perturbations play critical roles in the modulation of risk43,44. Indeed, the emergence of psychotic symptoms in late adolescence and early adulthood may reflect late stages of the disorder42. Patients with early onset SZ show microstructural aberrations70, which could manifest as delayed WM development during adolescence48. The current lack of group-by-age interactions may suggest that the observed group differences are explained by events prior to the sampled age range, which would imply parallel age trajectories in adulthood47.

For HC, we observed an FA peak at approximately 32 years, followed by decreases until the maximal sampled age, which corresponds closely with previous cross-sectional studies67. The FA trajectories for SZ and BD showed earlier peaks (approximately 27 years in both groups), followed by a linear decrease until the maximum age. Although permutation testing revealed no significant differences in age at peak FA, the sliding window approach, which provides further insight beyond a standard age-by-diagnosis interaction test, suggested that the magnitude of the group differences is not completely invariant to age, with indications of increasing group differences in FA between SZ and HC until the late 20s. In addition to suggesting early and possibly accelerating age-related differences in SZ, moderate age-related differences in effect sizes may also reflect a combination of clinical heterogeneity, sampling issues, and limited statistical power. The lack of significant age by group interactions may also reflect the shortcomings of simple models for delineating and comparing complex trajectories71. Future studies utilizing a longitudinal design, covering a wide age range and including participants at genetic or clinical risk who have not yet developed psychosis, are needed to characterize the trajectories of dynamic WM aberrations during brain maturation and disease development22.

Although detrimental effects of subject motion and other sources of noise on DTI metrics have been documented53,55,72, most previous DTI studies on psychotic disorders have neither provided sufficient details regarding the QC procedures employed nor included any quantitative QC measures as covariates. It is therefore largely unknown to what degree various sources of noise have contributed to the reported group differences. A major strength of this study is the use of an automated approach for identifying and replacing slices with signal loss due to bulk motion, which considerably increased the temporal signal-to-noise ratio (tSNR21), and a comprehensive QC protocol including both manual and automated quantitative measures. Comparisons of summary statistics and group differences at different steps of the QC and exclusion procedure revealed a tendency towards increasing group differences in FA between SZ and HC with the exclusion of noisy data. These results indicate that stringent QC may increase sensitivity to WM aberrations in SZ, and suggest that future studies should carefully address different sources of noise in their datasets before interpreting their findings as reflecting relevant pathophysiology.

Some limitations should be considered while interpreting our findings. The influence of medication on WM is debated73,74,75. As most patients were medicated, with the majority of patients taking antipsychotics, confounding effects of medication cannot be ruled out. Future studies with an appropriate design for assessing medication effects are needed. Despite our current lack of significant sex by diagnosis interaction, there is a growing appreciation of sexual dimorphisms in brain and behavior both in health and disease, which warrants further investigations. Further, our cross-sectional design is not suitable for delineating dynamic individual changes in WM microstructure, and further studies utilizing a prospective design in younger children and adolescents are needed to map microstructural changes to risk and development of psychosis. Integrating a wider range of MRI modalities with clinical, cognitive and genetic features21,76, and including microstructural indices based on multi-compartment diffusion models (e.g., Neurite Orientation Dispersion and Density Imaging77,78, free water imaging79, or restriction spectrum imaging80), cortical and subcortical morphometry and functional measures, may prove helpful for increasing diagnostic sensitivity and specificity.

In conclusion, we report widespread WM microstructural aberrations in patients with SZ compared to BD and HC. We found no significant differences between patients with BD and HC, suggesting that the biophysical processes underlying DTI-based WM abnormalities in severe mental disorders are more prominent in SZ. These results are in line with converging genetic and pathological evidence implicating neuroinflammatory, lipid and myelin-related processes in SZ pathophysiology.

Methods

Sample

Adult patients were recruited from psychiatric units in four major hospitals in Oslo. Patients had to fulfill DSM-IV criteria, assessed with the Structured Clinical Interview for DSM-IV (SCID)81, for either a schizophrenia spectrum disorder, collectively referred to as SZ (n = 128, including schizophrenia (n = 70), schizoaffective disorder (n = 18), schizophreniform disorder (n = 7) and psychosis not otherwise specified (n = 33)), or a bipolar spectrum disorder, collectively referred to as BD (n = 61, including BDI (n = 39), BDII (n = 17) and BD NOS (n = 5)). The patient sample comprised medicated (n = 133) and unmedicated (n = 7) patients, as well as patients with missing information regarding medication status (n = 49).

A total of 293 healthy controls from the same catchment area were invited through stratified random selection from the national records. Exclusion criteria for both patients and HC included hospitalized head trauma, neurological disorder or IQ below 70. In addition, HC were screened with a questionnaire about severe mental illness and the Primary Care Evaluation of Mental Disorders (PRIME-MD)82. Further exclusion criteria for HC included somatic disease, substance abuse or dependency during the last 12 months, or a first-degree relative with a lifetime history of severe psychiatric disorder (SZ, BD, or major depressive disorder). The Tematisk Område Psykoser (TOP) Study is approved by the Regional Ethics Committee (REK Sør-Øst C, 2009/2485) and the Norwegian Data Inspectorate (2003/2052). The study protocol and procedures adhered to the ethics approval and to the Declaration of Helsinki. All participants provided written informed consent; see SI for more information regarding neuropsychological and clinical assessment. Due to the confidentiality and privacy of participant information, data cannot readily be shared online, but can be requested by contacting the authors.

MRI acquisition

Imaging was performed on a General Electric (Signa HDxt) 3 T scanner using an 8-channel head coil at Oslo University Hospital. For DTI, a 2D spin-echo whole-brain echo planar imaging (EPI) sequence was used with the following parameters: repetition time: 15 s; echo time: 85 ms; flip angle: 90°; slice thickness: 2.5 mm; in-plane resolution: 1.875 × 1.875 mm. Thirty volumes with different gradient directions (b = 1000 s/mm²) were acquired, in addition to two b = 0 volumes with reversed phase-encoding (blip up/down).

DTI processing

Image analyses and tensor calculations were performed using FSL83,84,85. Pre-processing steps included topup (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/topup)86 and eddy (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/eddy)87,88 to correct for geometrical distortions and eddy currents. Topup uses the reversed phase-encode blips, which yield pairs of images with distortions going in opposite directions. From these image pairs, the susceptibility-induced off-resonance field was estimated and the two images were combined into a single corrected one. Eddy detects and replaces slices affected by signal loss due to bulk motion during diffusion encoding, within an integrated framework that also corrects for susceptibility-induced distortions, eddy currents and motion88. To assess the effect of replacing dropout slices on tSNR, we also processed the data using eddy without slice replacement (Supplementary Fig. S11). Briefly, mean tSNR was significantly (t = 25.76, p < 0.001) lower when running eddy without slice replacement (mean: 7.77, SD: 0.52) than with slice replacement (mean: 8.79, SD: 0.70). There were no significant group differences in the number of slices replaced (F = 1.046, p = 0.352; mean number of replaced slices: HC: 10.92 (±7.36), BD: 10.52 (±7.40), SZ: 12.46 (±9.42)).

Diffusion tensor fitting was done using dtifit in FSL. FA is a scalar measure (ranging from 0 to 1) of the directionality of diffusion, MD was computed as the mean of all three eigenvalues, RD as the mean of the second and third eigenvalues89, and AD as the principal eigenvalue.
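The scalar indices defined above follow directly from the three eigenvalues of the fitted tensor. The sketch below uses the standard textbook definitions (FA as the normalized dispersion of the eigenvalues); it is an illustration, not dtifit's implementation, and the function name and example eigenvalues are hypothetical.

```python
import math


def dti_scalars(l1: float, l2: float, l3: float) -> dict:
    """Scalar DTI indices from tensor eigenvalues sorted so that l1 >= l2 >= l3."""
    md = (l1 + l2 + l3) / 3.0   # mean diffusivity: mean of all three eigenvalues
    rd = (l2 + l3) / 2.0        # radial diffusivity: mean of second and third
    ad = l1                     # axial diffusivity: principal eigenvalue
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = math.sqrt(0.5 * num / den) if den > 0 else 0.0  # 0 = isotropic, 1 = one direction
    return {"FA": fa, "MD": md, "RD": rd, "AD": ad}


# Hypothetical eigenvalues (mm^2/s) for a coherent white matter voxel
print(dti_scalars(1.7e-3, 0.3e-3, 0.2e-3))
```

Equal eigenvalues give FA = 0 (isotropic diffusion), while diffusion confined to a single axis gives FA = 1.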

Prior to statistical analyses, we employed a stepwise QC procedure, including the maximum voxel intensity outlier count (MAXVOX)90 and tSNR90. Since reduced data quality due to subject motion and other factors may bias the results in clinical studies, we defined various quantitative QC metrics and tested for group differences within different QC strata. Specifically, we devised a semi-qualitative QC protocol including methods provided in DTIPrep91 and tSNR90. Supplementary Fig. S12 shows a flowchart of the QC protocol. At each step, the distributions of the quality metrics were visually inspected. In our stepwise exclusion protocol, datasets were excluded based on a summary score utilizing (1) the maximum MAXVOX90 and (2) tSNR90. The summary score was formed by first inverting the MAXVOX score, z-normalizing both scores independently, adding 10 to each of the z-scores (to avoid negative values), and then computing the product of the two. This product was then z-normalized, with low scores indicating worse quality. In an iterative fashion, subjects with a QC summary z-score below −2.5 were excluded and the group statistics were recomputed, until no datasets had a z-score below −2.5. Briefly, the slice-wise check and MAXVOX screen the DWI data for intensity-related artifacts, while tSNR is a global summary measure. See SI for further details regarding the QC, such as summary statistics for each step of the QC procedure (Supplementary Table S5), a demographic overview of excluded participants (Supplementary Table S6), density plots of DTI metrics before and after exclusion (Supplementary Fig. S13) and voxelwise analyses after QC (Supplementary Fig. S9). In short, a thorough inspection of the excluded and included participants after QC suggested that the general quality of the data is good. Thus, we present results on the full dataset, with supplementary and complementary results from a stringent QC.
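The summary score and the iterative exclusion can be sketched as follows. This is a minimal sketch under stated assumptions: "inverting" MAXVOX is interpreted as a sign flip (so that higher values mean better quality), population standard deviations are used for the z-scores, and per-subject MAXVOX and tSNR values are assumed to be precomputed. Function names and example data are hypothetical.

```python
import statistics


def zscores(values):
    """Population z-scores; all zeros if the values have no spread."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [0.0] * len(values) if sd == 0 else [(v - mu) / sd for v in values]


def qc_summary_z(maxvox, tsnr):
    """Summary QC z-score per subject; low values indicate worse data quality."""
    z_inv_maxvox = zscores([-m for m in maxvox])  # ASSUMPTION: "inverting" = sign flip
    z_tsnr = zscores(tsnr)
    product = [(a + 10) * (b + 10) for a, b in zip(z_inv_maxvox, z_tsnr)]
    return zscores(product)


def iterative_qc_exclusion(maxvox, tsnr, threshold=-2.5):
    """Drop subjects below the threshold, recompute on the rest, repeat until none remain."""
    kept = list(range(len(tsnr)))
    while len(kept) > 2:
        z = qc_summary_z([maxvox[i] for i in kept], [tsnr[i] for i in kept])
        flagged = {i for i, zi in zip(kept, z) if zi < threshold}
        if not flagged:
            break
        kept = [i for i in kept if i not in flagged]
    return kept


# Hypothetical data: 19 ordinary scans and one noisy scan (low tSNR, many outlier voxels)
tsnr = [8.8 + 0.1 * ((i % 5) - 2) for i in range(19)] + [3.0]
maxvox = [10 + (i % 3) for i in range(19)] + [60]
print(iterative_qc_exclusion(maxvox, tsnr))
```

Recomputing the z-scores after each round matters because a single extreme dataset inflates the standard deviations and can mask milder outliers in the first pass.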

Voxelwise analyses of FA, MD, AD and RD were carried out using TBSS51. FA volumes were skull-stripped and aligned to the FMRIB58_FA template supplied by FSL using nonlinear registration (FNIRT)92. Next, the mean FA image was derived and thinned to create a mean FA skeleton, representing the centers of all tracts common across subjects. The same warping and skeletonization were applied to MD, AD and RD. We thresholded and binarized the mean FA skeleton at FA > 0.2 before feeding the data into voxelwise statistics.

Statistical analyses

Voxelwise statistical analyses were performed using permutation testing, implemented in FSL's randomise93. Main effects of diagnosis on FA, RD, MD and AD were tested using general linear models (GLM) by forming pairwise group contrasts and corresponding F-tests. Since previous studies have documented strong curvilinear relationships between DTI features and age throughout the adult lifespan67, we included age, age² and sex as covariates. The data were tested against an empirical null distribution generated by 5000 permutations, and threshold-free cluster enhancement (TFCE)94 was used to avoid arbitrarily defining the cluster-forming threshold. Voxelwise maps were thresholded at p < 0.05, corrected for multiple comparisons across space. Mean FA, MD, RD and AD across the brain and within significant clusters were submitted to R95 for peak estimation, effect size computation and visualization. In a resampling with replacement (bootstrapping) procedure, we fitted the DTI data to age using local polynomial regression (LOESS). LOESS has previously been used in lifespan studies67 and avoids some of the shortcomings of polynomial models for age fitting71. Using the boot package96,97 in R, we repeated the age fitting procedure for each of 10,000 bootstrapped samples for each group to estimate the mean age at the maximum (FA) or minimum (MD, RD, AD) value across iterations, and its uncertainty, with confidence intervals calculated using the adjusted bootstrap percentile method. Additionally, the group differences in age at peak FA were tested against an empirical null distribution generated by 10,000 permutations, in which group labels were randomly shuffled and the pairwise group differences computed at each iteration. All pairwise differences were combined into one null distribution, and the differences in the true data were compared to this common null, enabling correction for multiple comparisons across all pairwise comparisons.
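The bootstrapped peak-age estimation and the pooled permutation null can be sketched in plain Python. This is a simplified illustration, not the authors' R code: it uses a degree-1 (local linear) LOESS with tricube weights and an arbitrary span of 0.5 (R's loess defaults to degree 2 and span 0.75), evaluates the curve on a hypothetical 1-year age grid, and reports a simple percentile interval instead of the adjusted (BCa) percentile interval used in the study. All function names are illustrative.

```python
import math
import random
import statistics


def loess_fit(x, y, grid, span=0.5):
    """Degree-1 LOESS: tricube-weighted local linear fit evaluated at each grid point."""
    k = max(3, math.ceil(span * len(x)))             # nearest neighbours per local fit
    fitted = []
    for g in grid:
        nbrs = sorted((abs(xi - g), xi, yi) for xi, yi in zip(x, y))[:k]
        dmax = nbrs[-1][0] or 1e-12                   # guard against a zero bandwidth
        w = [(1 - (d / dmax) ** 3) ** 3 for d, _, _ in nbrs]
        u = [xi - g for _, xi, _ in nbrs]             # ages centred on the grid point
        yy = [yi for _, _, yi in nbrs]
        s0, s1 = sum(w), sum(wi * ui for wi, ui in zip(w, u))
        s2 = sum(wi * ui * ui for wi, ui in zip(w, u))
        t0 = sum(wi * yi for wi, yi in zip(w, yy))
        t1 = sum(wi * ui * yi for wi, ui, yi in zip(w, u, yy))
        det = s0 * s2 - s1 * s1
        if det > 1e-12:
            fitted.append((s2 * t0 - s1 * t1) / det)  # weighted LS intercept = fit at g
        else:                                         # degenerate window: local mean
            fitted.append(statistics.fmean(yy))
    return fitted


def age_at_peak(age, value, maximum=True, step=1.0):
    """Grid age where the LOESS curve is highest (FA) or lowest (MD, RD, AD)."""
    lo, hi = min(age), max(age)
    grid = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    fit = loess_fit(age, value, grid)
    return grid[fit.index(max(fit) if maximum else min(fit))]


def bootstrap_peak(age, value, n_boot=1000, maximum=True, seed=1):
    """Mean, SD and simple 95% percentile interval of the peak age across resamples."""
    rng = random.Random(seed)
    n = len(age)
    peaks = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]    # resample subjects with replacement
        peaks.append(age_at_peak([age[i] for i in idx], [value[i] for i in idx], maximum))
    peaks.sort()
    ci = (peaks[int(0.025 * (n_boot - 1))], peaks[int(0.975 * (n_boot - 1))])
    return statistics.fmean(peaks), statistics.stdev(peaks), ci


def permutation_peak_test(groups, n_perm=1000, seed=2):
    """groups: {label: (ages, values)}. Each pairwise peak-age difference is compared
    with ONE null distribution pooled over all pairs (shuffled group labels)."""
    labels = sorted(groups)
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]

    def pairwise_diffs(data):
        peaks = {g: age_at_peak(*data[g]) for g in labels}
        return {(a, b): peaks[a] - peaks[b] for a, b in pairs}

    observed = pairwise_diffs(groups)
    ages = [a for g in labels for a in groups[g][0]]
    vals = [v for g in labels for v in groups[g][1]]
    sizes = [len(groups[g][0]) for g in labels]
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        order = list(range(len(ages)))
        rng.shuffle(order)                            # relabel subjects, keep group sizes
        perm, start = {}, 0
        for g, size in zip(labels, sizes):
            sel = order[start:start + size]
            perm[g] = ([ages[i] for i in sel], [vals[i] for i in sel])
            start += size
        null.extend(abs(d) for d in pairwise_diffs(perm).values())
    return {p: sum(nd >= abs(obs) for nd in null) / len(null) for p, obs in observed.items()}
```

Pooling the absolute differences from all pairs into one null makes each p-value reflect the largest differences that arise by chance across every comparison, which is how the text describes the correction for multiple pairwise tests. For example, bootstrap_peak(ages, fa, n_boot=1000) returns the mean peak age, its SD and the interval for one group.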

We tested for associations between GAF/PANSS domains and FA, MD, RD and AD across both patient groups (SZ and BD combined) in the whole brain and within specific regions, covarying for age, age² and sex. False discovery rate (FDR)98 and Bonferroni corrections were used to correct for multiple testing.
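For reference, both corrections fit in a few lines. The sketch below mirrors what R's p.adjust computes with method = "bonferroni" and method = "BH" (Benjamini-Hochberg FDR), but it is an illustrative re-implementation with hypothetical function names and example p-values, not the code used in the study.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                      # from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four association tests
print(bonferroni([0.01, 0.04, 0.03, 0.5]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; BH controls the expected proportion of false discoveries among the rejected tests.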

We also tested for differences between and within subgroups (see SI for more information), and for group by age and group by age² interactions on the mean skeleton DTI metrics. To account for heterogeneity in the diagnostic groups, we ran subgroup analyses on the mean skeleton metrics for the largest subgroups (strict SZ, PNOS, BDI and BDII). Additionally, in a control analysis to confirm that possible differences in age distribution did not influence the main results, we excluded participants over 55 years of age and ran mean skeleton analyses across groups.

We used a sliding window technique to obtain effect sizes for each of the pairwise group comparisons within different age strata. Utilizing the zoo R package99, we slid a window of 150 participants in steps of 5 participants along the sorted age span. At each step, we computed a linear model investigating effects of diagnosis, accounting for sex. We plotted the resulting t-values and effect sizes (Cohen's d) representing pairwise group differences against the mean age of each window, and fit a LOESS function using ggplot2 in R100. To test whether group differences varied between females and males, we reran the mean skeleton and ROI analyses including a sex-by-diagnosis interaction term.
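The sliding-window procedure can be sketched in plain Python as a rough analogue of the zoo-based R workflow. To keep the example short, it computes a pooled-SD Cohen's d directly within each window and does not adjust for sex, unlike the linear models described above; the window width and step default to the values given in the text, and the (age, group, value) record layout and function names are assumptions.

```python
import statistics


def cohens_d(a, b):
    """Cohen's d (mean of a minus mean of b) using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_var ** 0.5


def sliding_window_d(records, g1, g2, width=150, step=5):
    """records: (age, group, value) tuples. Returns (mean age, d) for each window."""
    recs = sorted(records)                       # sort participants by age
    out = []
    for start in range(0, len(recs) - width + 1, step):
        win = recs[start:start + width]          # 'width' consecutive participants
        a = [v for _, g, v in win if g == g1]
        b = [v for _, g, v in win if g == g2]
        if len(a) < 2 or len(b) < 2:
            continue                             # too few of either group in this window
        mean_age = statistics.fmean(age for age, _, _ in win)
        out.append((mean_age, cohens_d(a, b)))
    return out
```

The window contains all participants regardless of group, so, as in the text, effect sizes are plotted against the mean age of each window; a smoother such as LOESS can then be fitted to the resulting (mean age, d) pairs.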

To facilitate future meta-analyses, we calculated raw mean DTI values across the skeleton and within various anatomical regions of interest (ROIs) based on the intersection between the TBSS skeleton and probabilistic atlases101,102. R was used for further analysis, including linear models with each ROI DTI value as the dependent variable, diagnostic group and sex as fixed factors, and age and age² as covariates.
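Computing a raw ROI mean amounts to averaging a DTI map over voxels that lie both on the skeleton and inside the atlas region. The sketch below assumes the maps have already been flattened into aligned voxel lists, and that a probabilistic atlas region is binarized with a hypothetical probability threshold of 0.25; the source does not state how the probabilistic atlases were thresholded, and the function name is illustrative.

```python
import math


def roi_mean(values, skeleton, atlas_prob, threshold=0.25):
    """Mean DTI value over voxels on the skeleton AND in the (thresholded) atlas region.

    values, skeleton and atlas_prob are aligned per-voxel lists; threshold is a
    HYPOTHETICAL cut-off on the atlas probability (0-1), not taken from the study.
    """
    selected = [v for v, s, p in zip(values, skeleton, atlas_prob) if s and p >= threshold]
    return sum(selected) / len(selected) if selected else math.nan
```

Returning NaN for an empty intersection keeps missing ROIs visible downstream instead of silently reporting zero.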