## Introduction

Bipolar II disorder (BPII) is the commonest subtype of bipolar disorders1. Depression dominates the course of BPII and is the commonest mode of clinical presentation of the disorder. This phenomenological overlap with unipolar depression (UD) belies important clinical differences—poorer response to antidepressants, younger onset, higher recurrence, atypical features, cognitive impairment and increased suicide rates compared with UD2,3,4,5. Understanding the neurobiological differences between BPII depression and UD should yield important insights into the core etiological mechanisms differentiating between bipolar and unipolar illnesses and guide therapeutic development.

Diffusion tensor imaging (DTI) studies on bipolar disorders have found white matter (WM) connectivity disruption, consistent with a model of emotional dysregulation, in the fronto-limbic circuitry (cingulum, uncinate fasciculus), inter-hemispheric circuitry (corpus callosum), the fronto-parieto-temporal long associative fibers, and frontal and temporal regions, compared to healthy subjects6,7,8,9, with more widespread disruptions compared to UD7,10,11,12. There has been a paucity of direct comparisons between UD and BPII, and the findings above were mostly based on bipolar I (BPI) samples. Findings from the few studies pertaining to BPII were also less consistent. One study reported increased fiber alterations in temporal and inferior prefrontal regions in BPII13, while others found sparing of WM disruption in the uncinate fasciculus14 and the corpus callosum9. It is also uncertain whether the contradictory findings were attributable to heterogeneity in illness chronicity (7.3–18.8 years)7,8,9,10,13. In fact, while WM disruptions have been proposed as an endophenotypic/trait marker of bipolarity15, oxidative stress from longer illnesses with more illness episodes may result in increased myelin disruption in bipolar disorders16. Heterogeneity in illness chronicity in the samples may therefore contribute to the inconsistent findings. In addition, exposure to medications such as lithium17, antipsychotics17, anticonvulsants18 and antidepressants19 in bipolar disorder (BD) is known to result in changes in WM integrity that may add further heterogeneity to the results. This may be especially relevant to UD vs BPII comparisons in view of the considerable difference in medications prescribed for these two conditions. We therefore set out to examine WM abnormalities in treatment-naïve BPII subjects relatively close to onset, in comparison with UD and healthy controls (HC).
DTI derived indices of WM integrity (fractional anisotropy [FA], mean diffusivity [MD], radial diffusivity [RD], and axial diffusivity [AD]) are obtained for 15 a priori, well-characterized white matter tracts implicated in BD18,20. The between-group differences are examined with a multi-variate statistical model to differentiate BPII from UD.

## Results

### Demographics

Twenty-seven subjects in each of the BPII, UD and HC groups were included in the analyses. There were no significant differences in age (F[2,78] = 0.79, p = 0.46), gender (χ²[2, N = 81] = 1.5, p = 0.47), or years of education (χ²[4, N = 81] = 5.12, p = 0.28) among the three groups (Supplementary Table S1).

### Clinical characteristics in BPII and UD patients

All patients were currently in a major depressive episode (MDE), as defined in the text revision of the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR), of at least moderate severity. UD and BPII patients had no significant between-group difference in Montgomery–Åsberg Depression Rating Scale (MADRS) scores (t[52] = 0.29, p = 0.86) or functional impairment (Short-form-36 health survey physical component summary [SF-36 PCS] t[52] = −0.2, p = 0.89; mental component summary [MCS] t[52] = 0.68, p = 0.73), but the mean current Young Mania Rating Scale (YMRS) score was significantly higher in BPII (t[52] = 3.2, p = 0.02, Cohen’s d = 0.87). The BPII patients had, on average, 5 years since first depressive onset, numerically but not significantly longer than UD (t[52] = 1.97, p = 0.15). Compared with UD, BPII patients had a larger lifetime number of depressive episodes (t[52] = 2.41, p = 0.02, Cohen’s d = 0.91), number of episodes in the past year (t[52] = 2.81, p = 0.04, Cohen’s d = 0.77), and lifetime number of episodes (t[52] = 2.64, p = 0.04, Cohen’s d = 0.73), a more frequent family history of hypomania (HM) (χ²[1, N = 81] = 6.48, p = 0.01, φc = 0.28), and a higher Bipolarity Index (t[52] = 6.54, p < 0.001, Cohen’s d = 1.78). HC had a significantly higher estimated IQ score than BPII (p = 0.03) and UD (p = 0.01) patients (F[2,77] = 5.07, p = 0.009, ηp² = 0.12) (Supplementary Table S1).

### Group differences of white matter measures

#### TRACULA

Bootstrapped ANOVA was performed on each DTI measure at each tract with 1000 iterations, and the bootstrapped $F_B$ values were extracted. A significant difference was found only for AD at the right superior longitudinal fasciculus (SLF) (temporal) (1.14 (UD), 1.17 (BPII), 1.16 (HC); F[2,78] = 6.93, 95% confidence interval (CI) of $F_B$: 0.00073, 5.22; ηp² = 0.15). No significant group differences in DTI measures were found in any of the remaining tracts (ps > 0.05; observed F fell within the 95% CIs of $F_B$). Post-hoc t-tests indicated significantly lower AD at the right SLF (temporal) in UD compared to BPII and HC (t statistic not within the 95% CIs of the bootstrapped t statistic; Cohen’s d vs BPII = 1; Cohen’s d vs HC = 0.67), but no significant difference between BPII and HC (Table 1 and Fig. 1).

To explore the effect of age on the DTI measures in each diagnostic group, a group-by-age ANOVA with bootstrapping was conducted (Supplementary Table S2). Results revealed a significant interaction between group and age on MD at the right inferior longitudinal fasciculus (ILF) (7.73 × 10⁻⁴ (UD), 7.82 × 10⁻⁴ (BPII), 7.74 × 10⁻⁴ (HC); F = 5.28, 95% CI of $F_B$: 0.001, 5.02). Post-hoc one-way ANOVA showed a significant effect of age in BPII only (F = 5.28, 95% CI of $F_B$: 0.001, 5.02), but not in UD or HC (observed F fell within the 95% CI of $F_B$).

#### TBSS

Permutation tests with threshold-free cluster enhancement (TFCE) revealed no significant between-group voxel-wise difference of FA, AD and RD (ps > 0.05), adjusted for age, sex and education level.

### Correlations between TRACULA and clinical data

No significant correlations between clinical and DTI variables were found when data from both patient groups were combined (observed r fell within the 95% CIs of $r_B$). Correlations were also examined for each patient group separately. In the UD group, AD at the right superior longitudinal fasciculus (SLF; temporal) correlated negatively with family history of major depressive disorder (MDD; r = −0.307, 95% CI of $r_B$: −0.196, 0.204). In BPII, AD of the right SLF (temporal) correlated positively with family history of MDD (r = 0.203, 95% CI of $r_B$: −0.169, 0.136), and negatively with Bipolarity Index (r = −0.121, 95% CI of $r_B$: −0.109, 0.585). Correlations between AD at the right SLF and other clinical variables were not significant (ps > 0.05; observed r fell within the 95% CIs of $r_B$) (Table 2).

### Predictive models

Principal component analysis (PCA) results are illustrated in Fig. 2a,b. To understand the relationship between imaging and clinical variables, we first included five clinical variables and one DTI variable with significant differences between UD and BPII in the PCA: (1) total number of Major Depressive Episodes, (2) number of episodes in the past year, (3) total YMRS score, (4) lifetime total number of episodes, (5) family history of HM, and (6) AD at right SLF (temporal). Bipolarity Index was not included in the analysis as its inherent multi-dimensional nature would make it difficult to interpret in the multivariate model.

According to the scree plot, two components were found to account for 61.9% of variance in our data (Fig. 2a).

#### Component 1 (PC1): Episodic recurrence

Component 1 included three variables of medium-to-high negative loadings, namely lifetime total number of MDE, lifetime total number of episodes and the number of episodes in the past year.

#### Component 2 (PC2): WM integrity and family history/severity of hypomania

Component 2 included three medium-to-high negative loadings for right SLF (temporal) axial diffusivity, current YMRS score and family history of bipolar disorder.

These two components were then entered into linear discriminant analysis (LDA), which resulted in a linear discriminant function: 0.771 × PC1 + 1.066 × PC2, with 81.5% sensitivity and 85.2% specificity (Fig. 2c,d). Values above the cut-off of 0.706 indicated UD, whereas values below the cut-off indicated BPII.
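As a minimal sketch of how the reported discriminant function could be applied to a pair of component scores: the coefficients and cut-off below are taken from the text, while the function names, the example inputs, the handling of a score exactly at the cut-off, and the choice of BPII as the "positive" class for sensitivity are assumptions made purely for illustration.

```python
# Coefficients and cut-off as reported in the text.
W_PC1, W_PC2, CUTOFF = 0.771, 1.066, 0.706


def discriminant_score(pc1, pc2):
    """Linear discriminant score from the two principal component scores."""
    return W_PC1 * pc1 + W_PC2 * pc2


def classify(pc1, pc2):
    """Scores above the cut-off indicate UD, scores below indicate BPII.

    The text does not say how a score exactly at the cut-off is handled;
    assigning it to BPII here is an arbitrary assumption.
    """
    return "UD" if discriminant_score(pc1, pc2) > CUTOFF else "BPII"


def sensitivity_specificity(true_labels, predicted_labels, positive="BPII"):
    """Sensitivity and specificity, treating `positive` as the target class.

    Which diagnosis the paper treats as 'positive' is an assumption here.
    Both classes must be present in `true_labels`.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)
```

The component scores passed to `classify` would come from the fitted PCA; any values used with this sketch are hypothetical.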

A second PCA was then run with only clinical variables ((1) to (5)) to clarify the effect of inclusion of right SLF (temporal) AD in the model (Supplementary Figure S1). Results revealed a grossly similar component structure, where two components of episodic recurrence and family history/hypomania severity explained 67.6% of variance in the data, and the LDA classified patient groups with 69.2% sensitivity and 92.6% specificity.

## Discussion

We did not find evidence of increased white matter disruption in our sample of young and treatment-naïve depressed bipolar II patients, in comparison with age/sex/education matched unipolar depressed and healthy subjects. To our knowledge, this is the only neuroimaging study comparing treatment-naïve patients with bipolar II depression and unipolar depression.

Our findings stand in contrast to existing reports of widespread WM loss that were summarised in a recent ENIGMA meta-analysis of DTI studies on 1482 BD patients, which reported widespread WM abnormalities in BD, with no significant difference between the two subtypes9,18. This discrepancy could ensue from a few salient features in our sample.

Firstly, before disregarding the potential impact of WM in the bipolar subtypes, we need to consider that the lack of a BPI group precluded direct comparison with BPII subjects in our study. Two previous studies did report relative sparing of WM changes in the uncinate fasciculus and corpus callosum in BPII compared to BPI and unaffected siblings9,14, which is consistent with the absence of WM loss in this BPII sample and with our earlier report of relatively preserved cognitive functioning in young treatment-naïve BPII patients21. These three studies all suffered from relatively small sample sizes (32, 58 and 20 BD, respectively). Further studies including probands and unaffected siblings of both BPI and BPII for comparison would be needed.

Secondly, our sample was medication-naïve. It is premature to conclude whether the absence of WM disruption in our BPII sample was attributable to medication naivety. To begin with, structural and functional changes have been shown to commence in BD before any medication is received22,23, and to persist after medications have been stopped for > 2 months24,25. On the other hand, studies on the effect of medications on brain structure in BD are conflicting. An earlier review suggested that medications generally did not contribute to structural differences between BD and HC, and where an effect was present, it was usually normalizing26. However, the authors also noted that many of the comparisons between medicated and unmedicated subjects were underpowered (sample sizes as small as n = 2), and a recent meta-analysis identified reduced FA in patients receiving antipsychotics and anticonvulsants in multiple regions of interest18. Multiple lines of evidence support the neuroprotective effects of mood stabilizers and lithium18,26,27,28, although one study did observe reduced FA only in lithium-treated but not lithium-free BD versus HC29. Clarifying the effects of different drugs in medicated samples requires overcoming numerous challenges, including sample size limitations, variations in medication load, combinations and adherence, the common occurrence of polypharmacy in managing bipolarity, and complex symptomatic differences that influence the choice of medication.

Unfortunately, examination of unmedicated samples has been rare. One study of unmedicated (eight of the 18 patients were treatment-naïve) pediatric BP-I patients did not find any WM disruptions30, but reduced FA in superior frontal regions was reported in a small sample of medication-naïve adolescent BP-I patients in their first manic episode (n = 11)31. While medications are not expected to entirely explain the heterogeneity of the WM findings as noted above, examination of treatment-naïve samples should allow better comparability and removal of their confounding effects, especially in the comparison of structural connectivity measures between bipolar and unipolar samples.

Alternatively, the paucity of white matter disruption in these young (average age 23), treatment-naïve bipolar II patients with an average of 5 years of illness may suggest that the widespread WM disruptions summarised in the ENIGMA meta-analysis, with an average age of 39.6 and on average 15.47 years since illness onset9,18, could be attributable to neuroprogression. Neuroprogression in bipolar disorder refers to the accumulation of biological disruptions, such as an increase in pro-inflammatory cytokines and a reduction in neurotrophins, following acute mood instability, which progressively increases patients’ vulnerability to subsequent affective episodes32. This is consistent with previously reported correlations of illness chronicity with impaired WM integrity6,33, although WM disruptions have also been found in samples with shorter illness durations (0.2–5.6 years)9 and in individuals at high familial risk of BD who have not experienced any affective episodes6, and changes in brain volume and functional connectivity have been shown in first-episode BD patients22,34. Late onset and short disease duration were also found in the ENIGMA meta-analysis to be correlated with higher FA in multiple ROIs18. In fact, our observation of a significant group-by-age interaction in MD, with age affecting MD in the right ILF specifically in BPII (but not UD and HC), appears consistent with existing evidence of neuroprogression leading to WM damage as a specific phenomenon in bipolar disorder.

The significant reduction in UD patients, compared to BPII and HC, of AD in the right SLF may reflect increased WM disruption35 in medication-naïve UD but not in individuals with BPII. Decreased AD has previously been reported in first-episode treatment-naïve UD patients, but in the left rather than the right SLF36. The lack of between-group difference in RD with reduced axial diffusivity suggested that impaired axonal integrity, instead of myelin loss, explained the reduction of MD, which is a scalar estimate of mean water diffusivity perpendicular and parallel to the tract37. The difference was unlikely to be related to clinical severity or chronicity effects, as the UD and BPII groups had similar depressive and anxiety severity scores and numbers of comorbid disorders, and the UD group had significantly fewer affective episodes. The two groups’ similar percentages of patients with a family history of depression also did not suggest stronger biological loading in the UD group.

The negative correlation between family history of MDD and AD in the right SLF in the UD group may suggest a link between heritable susceptibility to depressive disorder and impaired axonal integrity in UD. Indeed, altered WM integrity in multiple ROIs, including bilateral SLF, has been shown in healthy adolescents at familial risk for affective disorders38. Interestingly, in the BPII group, AD in the right SLF correlated negatively with Bipolarity Index, suggesting a potential relationship between inherent and manifest bipolarity and axonal damage, but given the multidimensional nature of this index39, larger samples would be required to elucidate whether it was the family bipolarity, course or symptom characteristics that explained the correlation. The positive correlation of family history of MDD with AD in the right SLF in the BPII group is in conflict with the negative correlation observed in the UD sample. Although consistent with our ANOVA result that the BPII group has increased AD in the right SLF, this finding suggests that familial risk of depressive disorder may relate to increased axonal integrity in patients with bipolar disorder. This finding needs to be further examined in larger samples, which could benefit from combined analysis of comparable samples in multi-centre studies.

WM disruption in the SLF has been reported in medication-free patients with UD40 to be associated with depressive symptoms and to be a possible trait marker for late-life depression. In the present study, right SLF (temporal) integrity (AD) co-varied only with trait (family bipolarity) and symptom (hypomanic) markers of bipolarity in the PCA and, together with a separate component indicating episodic recurrence (a known marker for bipolarity), discriminated UD from BPII with good sensitivity and specificity in the LDA among the depressed patients. This suggests that right SLF (temporal) integrity may be a potential discriminant marker for bipolar II vs unipolar depression that is independent of course characteristics. However, the sensitivity and specificity in this exploratory LDA were likely inflated by applying the model to the same samples from which the measures were chosen, based on one-way ANOVAs, to differentiate the two groups. To compensate for this shortcoming, we performed another LDA with leave-one-out cross-validation and found similar prediction accuracies (see Supplementary Table S3), which indicated that the accuracy obtained from the training sample used to derive the discriminant scores was not overly optimistic. Nonetheless, these factors should be further tested on samples recruited de novo, which would greatly benefit from multi-centre collaboration.

There are other limitations that need to be considered in interpreting the findings here. Firstly, WM integrity is only one of many factors affecting DTI indices41. Even though the 15 a priori WM tracts are well-characterised42, crossing fibres are still likely to be present in some of the tracts and to affect water diffusion. Any such group differences in tissue fibre architecture, as well as axon diameter and packing density, are all potential confounds that could have obscured WM disruptions in BD. Secondly, our sample may have excluded more severely depressed or suicidal patients, who would have required immediate inpatient treatment, which may render brain scanning in a treatment-naïve state risky and ethically questionable. To the extent that such patients may be expected to have greater biological disturbances, our findings may have underestimated WM disruptions in BD and UD. Thirdly, a larger sample size, or meta-analyses, may enhance the power needed to detect smaller white matter changes and to examine the effects of clinical bipolar correlates of possibly smaller effect size, such as family loading and age of onset. Fourth, the lack of a BPI comparison group precludes direct comparison with our BPII patients. As mentioned, it is therefore uncertain if the observed differences (or lack thereof) between UD and BPII were specific to BPII or applicable to BPI as well. Fifth, the exclusion of psychosis or substance use has allowed us to conduct a more homogeneous investigation, but may have also limited us from accessing a fuller spectrum of bipolarity which would be associated with these two conditions. Lastly, we do not have data on the subjects’ body mass index, which has been reported to be associated with WM abnormalities43.
Although none of our young and medication-naïve patients reported metabolic syndromes, general medical illnesses frequently comorbid with BD such as obesity, hypertension and diabetes44, may confound the WM disruptions observed in DTI studies and should be included in further research.

The imaging and DTI processing approaches adopted in our study may have shortcomings compared to those used in the Human Connectome Project (HCP). The global probabilistic tractography used in this study (TRACULA) enabled fast and automated quantification of WM tracts in each individual subject’s space, eliminating the inaccuracies caused by inter-subject registration. However, with this approach, a single FA value (and likewise for the other DTI indices) was used to quantify the diffusion anisotropy of the entire tract, which might have diluted any WM abnormalities that were not present universally along the tract. Also, tractography was performed only for the WM tracts pre-defined by the toolbox, which may overlook other tracts and brain areas that relate to the psychopathological conditions of interest. Alternatively, TBSS enabled voxel-wise comparison of WM tracts between different groups of participants, but its reliability has been shown to depend heavily on the specific DTI-derived measure and the pre-processing steps (e.g. warping subjects to a common individual template, smoothing DTI data, etc.)45. On the one hand, these methods allow automated and straightforward processing of DTI data, which is best suited for exploratory analyses. On the other hand, more advanced imaging and processing approaches are useful for characterizing the structural, as well as functional, connectivity in bipolar II disorder holistically. Indeed, the state-of-the-art HCP has enabled advances in imaging and processing pipelines that improve the efficiency and accuracy of MRI data analysis46. For example, the ‘multi-band’ pulse sequences47 and customized scanners with increased maximum gradient strength benefit diffusion MRI by enhancing spatial resolution (1.25 mm)48, which facilitates identification of crossing fibers. Scanners with increased diffusion gradient directions have also been developed to improve sensitivity of tractography and support more sophisticated models49,50.
An increasing body of research has used probabilistic tractography to build connection matrices that summarize the macroscopic connectivity of every brain area. In addition, whole brain connectomics analysis of this dataset is under way, which is hoped to shed light on differences in the network characteristics of these BPII and UD patients. Further efforts will also benefit from alignment with the state-of-the-art HCP pipelines in image acquisition and pre-processing.

In summary, in this first DTI study comparing young and treatment-naïve BPII and UD patients, we could not find evidence of increased WM disruption in BPII disorder. Whether this represents a contemporaneous subtype difference between BPI and BPII merits direct, treatment-naïve, comparison. The discrepancy of our findings from previous BPII studies also underscores the methodological importance of examining patients stratified by medication status and illness chronicity. Nonetheless, that we could not find evidence of WM disruption in medication-naïve young adults with on average 5 years of illness history suggests that the WM disruptions otherwise found in more chronically ill bipolar patients may be substantially attributable to effects of illness chronicity. The findings also encourage longitudinal examination of the specific effects of accumulated illness and recurrence in BPII disorder, which would help establish the biological underpinning of bipolarity to the extent that it is observable versus BPI and UD.

## Methods

### Participants and recruitment

Treatment-naïve, currently depressed subjects were recruited from individuals presenting to any of three specialist psychiatric clinics during the years 2014–2018 to schedule a new appointment. Inclusion criteria were (1) aged 18–30, (2) currently satisfying the criteria for DSM-IV-TR Major Depressive Episode, (3) either meeting DSM-IV-TR criteria for Major Depressive Disorder, with no history of hypomanic episodes (for UD), or meeting research diagnostic criteria (RDC) for Bipolar II Disorder (BPII) (DSM-IV-TR MDE with history of hypomanic episodes of at least 2-day duration)51, and (4) no prior exposure to any psychotropic drug treatment in their lifetime. Exclusion criteria included: (1) current and lifetime histories of psychoses, (2) substance misuse, (3) organic brain syndromes, and/or (4) evidence of intellectual disability. Healthy volunteers without personal or family history of any mental disorders were recruited from online advertisements and a public health centre.

Written informed consent was obtained from all subjects/patients. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. All procedures involving human subjects/patients were approved by the New Territories East Cluster—Chinese University of Hong Kong Clinical Research Ethics Committee (CREC Ref. No.: 2014.168).

### Clinical assessments

Diagnostic assessments were conducted by trained interviewers using the Chinese bilingual version of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I), adapted to facilitate diagnosis of current and lifetime hypomanic episodes under the supervision of an experienced clinician academic psychiatrist51,52,53. All lifetime affective episodes were enquired year by year from the first onset of depression using a modified life-chart method based on SCID-I. Repeated interviews were conducted to enhance detection of past hypomanic episodes.

Current-week affective symptoms were rated by trained clinician interviewers using the interviewer-administered MADRS54, YMRS55, and HAM-A56. Current health-related quality of life was evaluated with the SF-3657, validated for Chinese settings58. Medication history was directly enquired with participants and caregivers, and checked against the territory-wide public hospital computer registry. We also assessed for the Bipolarity Index, a clinician-rated scale across five domains: signs and symptoms, age of onset, course of illness, responses to treatment and family history denoting bipolarity as a dimensional construct39. General intelligence was measured with the three-subtest short form of Wechsler Adult Intelligence Scale-III (WAIS-III)59,60.

### MRI acquisition

MR images were acquired using a 3-T MRI scanner (Achieva TX series, Philips Healthcare, Best, Netherlands) with a standard 8-channel head coil for signal reception. High resolution structural images of the whole brain were acquired using a 3D T1-weighted sequence (TR: 7.4 ms, TE: 3.4 ms, field of view: 250 × 250 mm², 285 contiguous slices, sagittal plane, 0.6 mm R-L thickness, reconstruction matrix: 240 × 240, flip angle: 8°). To minimize the effects of patient motion on the quality of the raw data, padding was applied around the participant's head during signal acquisition. At the end of each imaging sequence, image quality checks were made to ensure no gross motion artifacts were visible on the images.

Diffusion-weighted imaging was performed using a single-shot echo planar imaging sequence (TR: 8912 ms, TE: 60 ms, field of view: 224 × 224 mm², 70 contiguous axial slices, 2 mm slice thickness, no gap, acquisition matrix: 112 × 112, flip angle: 90°). Diffusion sensitizing gradients were applied along 32 non-collinear directions with b = 1000 s/mm², together with an acquisition without diffusion weighting (b = 0 s/mm²). A parallel imaging acceleration factor of 2.5 was used to reduce scan time.
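As a quick consistency check on the diffusion protocol (derived here from the reported field of view, acquisition matrix and slice parameters, and not stated explicitly in the protocol description), the nominal acquired voxel size and slice coverage work out as:

$$\Delta x = \Delta y = \frac{224\ \text{mm}}{112} = 2\ \text{mm}, \qquad \Delta z = 2\ \text{mm}, \qquad 70 \times 2\ \text{mm} = 140\ \text{mm}$$

That is, the acquisition is nominally 2 mm isotropic, covering 140 mm in the slice direction.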

### Image pre-processing

Image registration was performed to account for artifacts arising from patient motion and to realign image datasets before further post-processing. The anatomical images and diffusion weighted images were pre-processed using FSL61 and FreeSurfer 6.062. The anatomical images were corrected for intensity non-uniformity63 in preparation for segmentation. After removing the skull and neck, the brain images were registered to the MNI-305 template, with a series of linear and non-linear transformations, for tract fitting. The brain surfaces were also constructed from brain images in individual native space.

The diffusion-weighted images (DWI) were pre-processed using FreeSurfer, with the trac-all command. The DWI were first registered to the b0 images of the volume via affine transformation to correct for eddy-current and echo-planar imaging distortion and head motion, and the b-vectors were rotated to account for the motion corrections. Individual DWI were then rigidly registered to individual anatomical images with the aid of the brain-surface-reconstruction cost function from anatomical image pre-processing. Cortical and white-matter masks were created, followed by tensor fitting for extraction of tensor-based measures (FA, MD, RD and AD). Lastly, the anatomical priors for 15 white-matter pathways (see below) were computed for TRACULA.
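For reference, the four tensor-based measures follow from the three eigenvalues of the fitted diffusion tensor. The sketch below uses the standard textbook definitions; the function name is illustrative and the code is not taken from the study's pipeline, which relied on the toolbox's own tensor fitting.

```python
import math


def dti_indices(l1, l2, l3):
    """Return FA, MD, RD and AD from the three diffusion tensor eigenvalues.

    The eigenvalues are sorted so that the largest one is treated as the
    principal (axial) direction, which is the usual convention.
    """
    l1, l2, l3 = sorted((l1, l2, l3), reverse=True)
    md = (l1 + l2 + l3) / 3.0   # mean diffusivity: average of all eigenvalues
    ad = l1                     # axial diffusivity: along the principal axis
    rd = (l2 + l3) / 2.0        # radial diffusivity: perpendicular to it
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # Fractional anisotropy: 0 for isotropic diffusion, 1 for a single direction.
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"FA": fa, "MD": md, "RD": rd, "AD": ad}
```

Because MD averages AD and the two radial eigenvalues, a change in AD alone shifts MD while leaving RD unchanged, which is the pattern the Discussion reasons about.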

### TRACULA

TRACULA (TRActs Constrained by UnderLying Anatomy)42, an automated toolbox within FreeSurfer, was applied to reconstruct the global probabilistic distribution of 15 a priori, well-characterized white matter tracts implicated in bipolar disorder18,20, including the anterior thalamic radiations (ATR), the cingulum cingulate gyrus (CING), the cingulum angular bundle (CAB), the inferior longitudinal fasciculus (ILF), the parietal part of the SLF, the temporal part of the SLF, the uncinate fasciculus (UNC), all bilaterally; and the forceps minor of the corpus callosum (CC). To assure validity of findings, we also included three tracts with no reported bipolar association as controls, namely the forceps major and bilateral corticospinal tracts. Specifically, diffusion distributions were estimated by applying the ball-and-stick model, using “bedpostX” of FSL, followed by fitting the shape of each white-matter pathway to the results of the ball-and-stick model of diffusion and the anatomical priors for white-matter pathways computed during the pre-processing stage. Subsequently, we examined DTI-derived connectivity indices, including FA, MD, RD and AD (see below).

### TBSS

Tract-based spatial statistics (TBSS) was conducted using FSL for voxel-wise analysis of white matter tracts. FA, AD and RD images created using FreeSurfer’s trac-all (see above) were projected onto the mean FA skeleton that represents the centre of white-matter tracts, thresholded at FA = 0.2.

### Statistical analysis

Statistical tests for demographic, clinical and imaging (TRACULA) data were performed using R Statistical Software64. Group differences in demographic and cognitive variables were analysed using one-way ANOVA for continuous variables and the chi-square test for categorical variables. Post-hoc group comparisons were performed with false discovery rate (FDR) correction. Differences in clinical measures between UD and BPII patients were analysed by means of independent t-tests for continuous variables, and the chi-square test for categorical variables. Where unequal variances existed between groups, Welch’s tests were used.
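The text does not state which FDR procedure was used. Assuming the widely used Benjamini-Hochberg step-up procedure, adjusted p-values for a set of post-hoc comparisons could be computed as in this sketch; the function name is hypothetical and the study itself used R rather than Python.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A comparison would then be declared significant when its adjusted p-value falls below the chosen FDR level.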

Group differences in FA, MD, RD and AD of unilateral tracts were analysed using one-way ANOVA, with WAIS-III IQ scores included in the model as a covariate. In view of the small sample size and unequal variances of DTI data across groups, we performed data resampling by bootstrapping with 1000 iterations. With each bootstrap sample, we conducted a one-way ANOVA and extracted the bootstrapped $F_B$. For each DTI measure at each tract, we identified the 2.5th and 97.5th percentiles (i.e. the 95% confidence interval) of all $F_B$ and compared them with the observed F. Group differences were considered statistically significant where the observed F was not within the 95% CI of $F_B$. Post-hoc t-tests were conducted with FDR correction. Using the same bootstrap samples (with 1000 iterations), the bootstrapped $t_B$ were extracted and the 2.5th and 97.5th percentiles (i.e. the 95% confidence interval) of all $t_B$ were obtained. Significant group differences were identified where the observed t fell outside the 95% CI of $t_B$. In addition to the main ANOVA, we conducted a supplementary group-by-age analysis to examine the effect of age on the DTI measures using the same ANOVA and bootstrapping methods. Results are reported in Supplementary Table S2. For TBSS, voxel-wise permutation analysis of the skeletonized data (FA, AD and RD) was performed using FSL’s PALM with 5000 permutations and threshold-free cluster enhancement (TFCE), adjusted for age, sex and education level.
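The decision rule described above can be sketched as follows. The resampling scheme is not specified in the text; this sketch assumes that values are resampled with replacement from the pooled data, ignoring group labels, so that the $F_B$ distribution approximates the no-difference case. It also omits the IQ covariate for brevity, and all function names are hypothetical.

```python
import random


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(values) - len(groups))
    # Guard against a degenerate resample with no within-group variation.
    return float("inf") if ms_within == 0 else ms_between / ms_within


def bootstrap_f_interval(groups, n_boot=1000, seed=0):
    """2.5th and 97.5th percentiles of F over pooled resamples (assumed scheme)."""
    rng = random.Random(seed)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    f_boot = []
    for _iteration in range(n_boot):
        resampled = [[rng.choice(pooled) for _ in range(n)] for n in sizes]
        f_boot.append(one_way_f(resampled))
    f_boot.sort()
    lo = f_boot[round(0.025 * (n_boot - 1))]
    hi = f_boot[round(0.975 * (n_boot - 1))]
    return lo, hi


def anova_is_significant(groups, n_boot=1000, seed=0):
    """Observed F, the bootstrap interval, and whether F falls outside it."""
    observed = one_way_f(groups)
    lo, hi = bootstrap_f_interval(groups, n_boot, seed)
    return observed, (lo, hi), not (lo <= observed <= hi)
```

The same pattern would apply to the post-hoc comparisons by swapping the F statistic for a two-sample t statistic.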

We examined the correlations between the DTI and clinical data in our patient samples by calculating Pearson’s r. Correlations were tested for each patient group separately and also with both groups combined. For all correlation tests, bootstrapping was conducted with 1000 iterations. Specifically, with each bootstrap sample, we extracted the $r_B$ and identified the 95% confidence interval as the 2.5th and 97.5th percentiles of all $r_B$. A correlation was considered significant where the observed r was not within the 95% CI of $r_B$.
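A corresponding sketch for the correlation test is given below. As with the ANOVA, the exact resampling scheme is not described; here the two variables are resampled independently with replacement, which breaks their pairing and yields an $r_B$ distribution centred near zero. That choice, and the function names, are assumptions for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns None when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_r_interval(x, y, n_boot=1000, seed=0):
    """2.5th and 97.5th percentiles of r over resamples that break the pairing."""
    rng = random.Random(seed)
    r_boot = []
    for _iteration in range(n_boot):
        xs = [rng.choice(x) for _ in x]
        ys = [rng.choice(y) for _ in y]
        r = pearson_r(xs, ys)
        if r is not None:  # skip degenerate resamples (e.g. all-equal values)
            r_boot.append(r)
    r_boot.sort()
    lo = r_boot[round(0.025 * (len(r_boot) - 1))]
    hi = r_boot[round(0.975 * (len(r_boot) - 1))]
    return lo, hi


def correlation_is_significant(x, y, n_boot=1000, seed=0):
    """Observed r, the bootstrap interval, and whether r falls outside it."""
    r = pearson_r(x, y)
    lo, hi = bootstrap_r_interval(x, y, n_boot, seed)
    return r, (lo, hi), not (lo <= r <= hi)
```

Skipping degenerate resamples matters for binary variables such as family history, where a resample can easily contain only one category.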

In view of multicollinearity in the clinical and imaging measures, principal components analysis (PCA) was performed, instead of linear regression, to assess the multivariate linear relationships among the clinical and DTI measures with significant group differences in the UD and BPII groups. Varimax rotation was applied to the component loadings. We examined the scree plot and extracted components with an eigenvalue greater than 1 to identify the number of components sufficient to explain the variance in the data. The components obtained from the PCA were then entered into a linear discriminant analysis (LDA) to examine the classification between UD and BPII groups based on the DTI and clinical variables. A maximum likelihood method was used, assuming the variables were continuous and normally distributed. To compensate for the relatively small sample size, we performed an LDA with leave-one-out cross-validation and found similar prediction accuracies (see Supplementary Table S3), which indicated that the accuracy obtained from the training sample used to derive the discriminant scores was not overly optimistic. Since only DTI and clinical measures differentiating our UD and BPII samples were included in the PCA-LDA, the specificity and sensitivity in differentiating UD from BPII are likely stronger here than would be found in the population. Thus, this analysis is exploratory in nature. In addition, PCA and LDA were performed again to examine the model without the DTI variable, i.e. AD in right SLF (temporal), to explore the effect of the imaging variable on the model.
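To illustrate the leave-one-out step, the sketch below fits a two-class LDA on two-dimensional inputs (standing in for the PC1 and PC2 scores) and re-fits it once per held-out subject. It assumes equal class priors and a pooled within-class covariance, leaves out the PCA stage entirely, and uses hypothetical function names; it is not the authors' R implementation.

```python
def fit_lda(points, labels):
    """Fit a two-class LDA on 2-D points with labels 0/1.

    Returns (w, threshold); a point x is assigned to class 1 when
    w[0] * x[0] + w[1] * x[1] > threshold.
    """
    groups = {0: [], 1: []}
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    means = {
        lab: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for lab, pts in groups.items()
    }
    # Pooled within-class covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for lab, pts in groups.items():
        mx, my = means[lab]
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = len(points) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy ** 2
    dx = means[1][0] - means[0][0]
    dy = means[1][1] - means[0][1]
    # w = S^{-1} (mu1 - mu0), using the closed-form inverse of a 2 x 2 matrix.
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    # With equal priors the boundary passes through the midpoint of the means.
    mid = ((means[0][0] + means[1][0]) / 2, (means[0][1] + means[1][1]) / 2)
    return w, w[0] * mid[0] + w[1] * mid[1]


def loocv_accuracy(points, labels):
    """Leave-one-out accuracy: each point is predicted by a model fit without it."""
    correct = 0
    for i in range(len(points)):
        train_points = points[:i] + points[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        w, threshold = fit_lda(train_points, train_labels)
        score = w[0] * points[i][0] + w[1] * points[i][1]
        correct += int((1 if score > threshold else 0) == labels[i])
    return correct / len(points)
```

Comparing this held-out accuracy with the accuracy on the full training sample is what indicates how much the in-sample figure is inflated.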