The progression of disorder-specific brain pattern expression in schizophrenia over 9 years

Age plays a crucial role in the performance of schizophrenia vs. controls (SZ-HC) neuroimaging-based machine learning (ML) models, as the accuracy of distinguishing first-episode psychosis patients from controls is poor compared with that for chronic patients. Resolving whether this finding reflects longitudinal progression in a disorder-specific brain pattern or a systematic but non-disorder-specific deviation from a normal brain aging (BA) trajectory in schizophrenia would help the clinical translation of diagnostic ML models. We trained two ML models on structural MRI data: an SZ-HC model (based on 70 schizophrenia patients and 74 controls) and a BA model (based on 561 healthy individuals, age range = 66 years). We then investigated the two models’ predictions in the naturalistic longitudinal Northern Finland Birth Cohort 1966 (NFBC1966), following 29 schizophrenia patients and 61 controls for nine years. The SZ-HC model’s schizophrenia-specificity was further assessed using independent validation samples (62 schizophrenia patients, 95 controls) and depression samples (203 depression patients, 203 controls). We found better performance at the NFBC1966 follow-up (sensitivity = 75.9%, specificity = 83.6%) compared to the baseline (sensitivity = 58.6%, specificity = 86.9%). This finding resulted from progression in disorder-specific pattern expression in schizophrenia and was not explained by concomitant acceleration of brain aging. The disorder-specific pattern’s progression reflected longitudinal changes in cognition, outcomes, and local brain changes, while BA captured treatment-related and global brain alterations. The SZ-HC model also generalized to independent schizophrenia validation samples but classified depression patients as control subjects. Our research underlines the importance of taking account of longitudinal progression in a disorder-specific pattern in schizophrenia when developing ML classifiers for different age groups.


Description of the COBRE
The dataset includes 71 schizophrenia patients and 74 controls. The exclusion criteria of the dataset were a history of neurological disorder or mental retardation, past severe head trauma with more than 5 minutes' loss of consciousness, and a history of substance abuse or dependence within the last 12 months.
Schizophrenia diagnoses were evaluated using the Structured Clinical Interview for DSM-IV. The dataset was downloaded from the COINS database (https://coins.trendscenter.org).
Structural MRI was conducted using 3T SIEMENS MAGNETOM TrioTim syngo MR B17, and a multi-

Description of the NFBC1966
The NFBC1966 is an unselected population birth cohort based on 12 058 deliveries in the two northernmost provinces in Finland (Oulu and Lapland). We used the nationwide Finnish Hospital Discharge Register (FHDR, currently known as the Care Register for Health Care, https://thl.fi/en/web/thlfi-en/statistics/information-on-statistics/register-descriptions/careregister-for-health-care) to identify the NFBC1966 members with a psychotic disorder. This register has data on patients discharged from hospitals since 1969. Thus, we did not draw schizophrenia cases from psychiatric services. At the baseline, we validated all schizophrenia diagnoses according to the Diagnostic and Statistical Manual of Mental Disorders Third Edition Revised (DSM-III-R) criteria. At the follow-up, we used SCID-IV as a diagnostic instrument, supplemented by anamnestic information (including hospital medical records until the year 2009). In addition, we randomly selected control participants from the same birth cohort. The only inclusion criterion was that these participants had no history of a psychotic episode. The flow chart in Supplementary Figure 1 presents the selection of the NFBC1966 participants for the present study. All the study participants gave written informed consent. We excluded individuals with severe head trauma (based on register data and interviews), psychotic syndromes other than schizophrenia, a neurological disorder with a potential effect on brain structure, or poor-quality sMRI at the baseline or the follow-up.
We used the same 1.5 T GE Signa (General Electric, Milwaukee, Wisconsin) at both the baseline (34y) and the follow-up (43y). Note, however, that there was an update in the scanner electronics and the sequence between the timepoints, as described below. At the baseline, we obtained T1-weighted high-resolution three-dimensional spoiled gradient echo (3D SPGR) images using the following parameters: coronal plane covering the whole brain (slice thickness = 1.5 mm), in-plane resolution matrix size 256 x 256, voxel size 1.5 mm x 1 mm x 1 mm, TR = 35 ms, TE = 5 ms, flip angle = 35°. Between the timepoints, the scanner was upgraded to HDxt with a new gradient system and parallel image data acquisition with an eight-channel receiving coil. At the follow-up, we obtained the T1-weighted images using a 3D fast spoiled gradient echo (FSPGR) sequence with the following parameters: slice thickness = 1 mm, in-plane resolution matrix size 256 x 256, voxel size 1 mm x 1 mm x 1 mm, TR = 12.576 ms, TE = 5.3 ms, flip angle = 20°.
Schizophrenia and control participants completed the California Verbal Learning Test (CVLT) 1, the Abstraction, Inhibition and Working Memory task (AIM) 2, and the Visual Object Learning Test (VOLT) 3 at both timepoints. In the CVLT, we used the total score of the immediate free recall of trials 1-5, since this score has been demonstrated to have the largest effect size of the CVLT variables in detecting verbal learning deficits in schizophrenia 4. AIM results in two outcome measures: total score of the abstraction trials (AIM-) and total score of trials with abstraction and memory (AIM+). VOLT measures visual-spatial learning and memory analogous. The Positive and Negative Syndrome Scale (PANSS) 7 was used to measure symptom dimensions over the week preceding the baseline and the follow-up. At the baseline, PANSS was acquired from the SCID I diagnostic interview. At the follow-up, PANSS was acquired from a PANSS-specific interview. The Social and Occupational Functioning Assessment Scale (SOFAS) and the Clinical Global Impression scale (CGI) were assessed via interviews. The duration of the disorder was acquired from the medical records and registers. The number of hospitalizations was acquired from the nationwide registers and was used as a proxy of relapse.

Description of the validation datasets
We used two open datasets, namely the Consortium for Neuropsychiatric Phenomics (CNP) 8 and the Neuromorphometry by Computer Algorithm Chicago (NMorphCH) 9. The CNP was downloaded from OpenfMRI (https://www.openfmri.org) and the NMorphCH from Schizconnect (http://schizconnect.org).
The NMorphCH (44 schizophrenia patients and 43 controls) is a longitudinal study examining the clinical, cognitive, and neuroimaging (MRI) data from schizophrenia and control subjects at baseline and after two years. Schizophrenia diagnoses were established using DSM-IV. The mean age was 32.5 (SD = 6.9) in schizophrenia patients and 31.5 (SD = 8.4) in controls. Longitudinal data were available for 29 schizophrenia patients and 24 controls in the NMorphCH. In the NMorphCH, sMRI was conducted at 3 T using the following parameters: TR = 3.15 ms, TE = 1.37 ms, flip angle = 8°, 160 x 160 matrix, 128 slices, slice thickness = 1.6 mm.
The CNP participants were aged 21-50 and were recruited by community advertisements from the Los Angeles area. Schizophrenia diagnosis was verified with the SCID-IV. Due to different scanner types, we utilized only those subjects in the CNP that were imaged using Siemens version Syngo MR B15, leaving 52 controls without any disorder (mean age = 30.7, SD = 9.13) and 18 schizophrenia patients (mean age = 36.8, SD = 8.7). The MPRAGE images in the CNP were acquired at 3 T using the following parameters: TR = 1.9 s, TE = 2.26 ms, FOV = 250 mm, matrix = 256 × 256, sagittal plane, slice thickness = 1 mm, 176 slices.

Description of the MDD data
In the Munich sample (MUC) 10, patients with major depression (N = 103) were examined at the Department of Psychiatry and Psychotherapy, Ludwig-Maximilian-University Munich (LMU) using the Structured Clinical Interview for DSM-IV and the Hamilton Depression Rating Scale. The exclusion criteria were insufficient knowledge of German or a history of neurological disorders (e.g., dementia), somatic disorders affecting the central nervous system, personality disorders, substance abuse or dependence, anorexia nervosa, or mental disability (IQ < 70). The mean age in the MDD group was 42.1 (SD = 11.9). We also utilized 103 sex- and age-matched control participants. The control participants' exclusion criteria included a history of head trauma, cortisol treatments, somatic conditions affecting the central nervous system, present or past alcohol abuse, and a personal or familial history of psychiatric disorders in first-degree relatives. Structural MRI scanning was conducted using the Siemens MAGNETOM Vision 1.5 T. The 3D-MPRAGE sequence was conducted using the following parameters: TE, 4.9 ms; TR, 11.6 ms; field of view, 230 mm; matrix, 512 × 512 × 126 contiguous axial slices; voxel dimensions, 0.45 × 0.45 × 1.5 mm.
We also utilized the Münster site in the Marburg-Münster Affective Disorders Cohort Study (referred to hereinafter as the Münster dataset) 11,12. For the present study, we utilized 100 MDD and 100 sex- and age-matched control participants (mean age = 30.5, SD = 8.0). MDD diagnosis was established with the SCID interview according to the DSM-IV criteria. The study's exclusion criteria were verbal IQ < 80, substance-related disorders, history of severe neurological or medical disorders, and current benzodiazepine use. Structural MRI data were acquired at a 3T MRI scanner

The effect of attrition rate in the NFBC1966 on the SVM decision scores
Although the non-participating schizophrenia patients had higher SVM decision scores (mean=0.91;

The effect of image quality on the prediction performance
Image quality did not differ between participants who were correctly classified and those who were incorrectly classified, either among schizophrenia patients or among controls (t-tests, all P-values>0.1).
Even when we combined the COBRE and the NFBC1966 baseline, there were no differences in classification error with respect to image quality (Cohen's d=0.16, t(106)=-1.09, P-value=0.28). The same was true when the COBRE sample was combined with the NFBC1966 follow-up (Cohen's d=0.16, t(95)=-1.03, P-value=0.31). These comparisons are shown in Supplementary Figure 3.

The longitudinal changes in SVM decision scores in the NMorphCH validation sample
Using those participants of the NMorphCH with 2-year follow-up data (29 schizophrenia patients and 24 controls), we found that schizophrenia patients' SVM decision scores shifted toward greater schizophrenia-likeness, but this change was not significant (Cohen's d=0.11, t(28)=0.61, P-value=0.55). Furthermore, we found no timepoint by group interaction on the SVM decision scores (F(1,51)=0.32, P-value=0.57). This is shown in Supplementary Figure 16.
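As a rough consistency check on the within-group effect size above, the following sketch converts the reported t statistic into a paired-samples Cohen's d. It assumes the reported d is the paired effect size d_z = t / sqrt(n), which the text does not state; the function name `paired_cohens_d` is hypothetical.

```python
import math


def paired_cohens_d(t_statistic: float, n_pairs: int) -> float:
    """Paired-samples effect size d_z = t / sqrt(n).

    Assumption: the effect size is the standardized mean of the
    within-subject differences (d_z); other definitions exist.
    """
    return t_statistic / math.sqrt(n_pairs)


# Values reported for the NMorphCH schizophrenia group: t(28) = 0.61,
# so n = 28 + 1 = 29 participants with both timepoints.
d_z = paired_cohens_d(0.61, 29)
print(f"d_z = {d_z:.2f}")
```

Under this assumption the result rounds to the reported value of 0.11.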

Classification of schizophrenia vs. controls using the NFBC1966
We used the same nested cross-validation design at the baseline and the follow-up as in the main analyses in the COBRE, but now all the analyses were conducted in the NFBC1966. At the baseline, our classification of schizophrenia from controls resulted in a BAC of 66.7% (sensitivity=44.7%, specificity=77.0%). This model's prediction performance (AUC=0.678) did not differ from the performance obtained using the COBRE-trained models with OOCV at the baseline (AUC=0.76), DeLong's test for two correlated ROC curves: Z = 1.0639, P-value = 0.287. At the follow-up, our classification of schizophrenia from controls resulted in a BAC of 78.0% (sensitivity=72.4%, specificity=83.6%). The prediction performance (AUC=0.87) did not differ from the results obtained using the COBRE-trained models at the follow-up (AUC=0.87), DeLong's test for two correlated ROC curves: Z = -0.087, P-value = 0.93. The corresponding ROC curves for the schizophrenia vs. controls classification using the NFBC1966 are provided in Supplementary Figure 17.
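For readers less familiar with balanced accuracy (BAC), the sketch below shows the standard definition as the unweighted mean of sensitivity and specificity, applied to the follow-up values reported above. The function name `balanced_accuracy` is hypothetical and is not taken from the authors' analysis code.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy: unweighted mean of sensitivity and specificity.

    Inputs and output share the same unit (here, percent).
    """
    return (sensitivity + specificity) / 2.0


# Follow-up values reported above, in percent.
follow_up_bac = balanced_accuracy(72.4, 83.6)
print(f"Follow-up BAC = {follow_up_bac:.1f}%")
```

Because both classes contribute equally, BAC is not inflated by the larger number of controls (61) relative to schizophrenia patients (29) in the NFBC1966.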

"Long" vs. "short" disorder duration models of the COBRE
The mean age of the schizophrenia patients in the "short" disorder duration subsample of the COBRE (i.e., disorder duration below the median) was 26.7 (SD=8.7), and in the "long" disorder duration subsample of the COBRE (i.e., disorder duration above the median) it was 48.3 (SD=8.7), T-test t(66.9)=10.26, Cohen's d=2.47, P-value<0.0001. Using the "long" disorder duration models, our classification of schizophrenia from controls resulted in a BAC of 64.3% (sensitivity=60%, specificity=68.6%). Using the "short" disorder duration models, our classification of schizophrenia from controls resulted in a BAC of 54.4% (sensitivity=41.2%, specificity=67.6%). The ROC curves are provided in Supplementary Figure 18. There was a trend towards greater AUC in the "long" disorder duration model (AUC=0.7) compared to the "short" disorder duration model (AUC=0.57), DeLong's test D = -1.3705, one-tailed P-value = 0.086. Further, SVM decision scores were higher in the COBRE schizophrenia patients who were used to train the "long" disorder duration model than in those who were used to train the "short" disorder duration model (Cohen's d=0.50, t(45)=2.08, one-tailed P-value=0.043).

The effect of disorder duration of the COBRE sample on the prediction performance in the NFBC1966
We applied the "long" and "short" disorder duration models to the NFBC1966 using OOCV. Both of these disorder duration models were based on half the COBRE sample and were therefore trained on fewer than the recommended 130 subjects required to train a generalizable model 21. Thus, these explorative analyses were considered a proof of concept to investigate whether the divergence in the change in SVM decision scores between schizophrenia patients and controls is observed regardless of the training sample's disorder duration.
Using DeLong's test for paired ROC curves, there was no difference in performance when comparing the "long" vs. "short" models (Z = -0.24, P-value=0.81) at the NFBC1966 follow-up. At the baseline, there was also no difference in performance when comparing the "long" vs. "short" models (Z=0.534, P-value=0.59). The ROC curves are provided in Supplementary Figure 19.
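The P-values above follow from the Z statistics under a standard normal reference distribution. The sketch below reproduces that conversion using only the Python standard library; it assumes two-sided tests and does not implement DeLong's variance estimate itself. The function name `two_sided_p_from_z` is hypothetical.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided P-value for a standard-normal test statistic.

    Uses the identity P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z statistics reported above for the "long" vs. "short" comparison.
for label, z in [("follow-up", -0.24), ("baseline", 0.534)]:
    p = two_sided_p_from_z(z)
    print(f"{label}: Z = {z:+.3f}, two-sided P = {p:.2f}")
```

Both conversions round to the reported values (0.81 and 0.59), which is consistent with the tests having been two-sided.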

SVM decision score difference*timepoint interaction on the white matter density
Because the SVM decision score difference*timepoint interaction was unexpectedly related to increases in the periventricular white matter of the grey matter maps, we explored whether this finding stems from a partial volume effect. We suspected that we might detect a false increase in grey matter signal in the white matter if the grey matter boundary moves into white matter due to white matter atrophy over time. This was confirmed, as we found that the SVM decision score difference*timepoint interaction was related to decreases of white matter in the same regions that showed increases in the grey matter density contrast (Supplementary Figure 12).

The relationship between BMI and CVLT with the SVM decision scores in the controls of the NFBC1966
Given that both CVLT and BMI are related to SVM decision scores in schizophrenia, we also tested whether similar relationships are found in the control subjects of the NFBC1966. Across the timepoints, we found no relationship between CVLT and SVM decision scores (F(1,104)=0.51, P-