## Main

Normative modeling is an emerging statistical framework that aims to capture variability by comparing individuals with a normative population1. The benefits of normative modeling in neuroimaging in general are well documented, with applications in atypical brain development and psychiatry (see ref. 2 for a review). Applications of deep normative modeling have mostly been demonstrated at the voxel level using functional MRI data3 and volumetric data derived from standard structural magnetic resonance imaging (MRI)4,5.

Diffusion MRI (dMRI) is another MRI modality that allows non-invasive characterization of tissue microstructure. In the brain, for example, information about the structural architecture of the white matter can be obtained by probing the random motion of water molecules6 and acquiring multiple magnetic resonance images with different diffusion-senitization properties. The ability to derive semiquantitative features such as fractional anisotropy7 or mean diffusivity8, or to virtually reconstruct white-matter pathways with tractography9 has had a huge impact on the ability to distinguish between typical and atypical brain structures in vivo in health and disease10; however, prediction modeling (or case-control differentiation), in which a group of N patients with the same disease is compared with a group of N-matched controls, is not well suited to clinically heterogeneous groups (for example, neurological disorders11, psychiatric disorders such as schizophrenia or autism spectrum disorder, and rare cases). Despite decades of progress in the research domain, the primary clinical use of dMRI has been for largely limited to diagnosing acute ischemic stroke or grading and monitoring of tumor invasion. Several studies have shown success in identifying subtle but important microstructural changes at the individual patient level12, showcasing the potential of dMRI to be applied more broadly across applications. Yet there is a scarcity of dMRI frameworks for single-patient analysis (that is, one patient versus N controls). There is an urgent yet unmet need for a paradigm shift from group-wise comparisons to individualized diagnosis, that is, detecting whether (and where) the tissue microstructure of a single participant is abnormal13; not only would this greatly facilitate the study of clinically heterogeneous groups1,4, it would also facilitate the study of rare diseases and true clinical adoption (that is, making a diagnosis/prognosis in an individual patient).

As mentioned above, current efforts to apply deep normative modeling to neuroimaging have so far relied on voxel-based methods. For the assessment of white matter—which comprises continuous pathways—under conditions in which an entire pathway may be affected (for example, in developmental disorders such as schizophrenia14), fragmenting the analysis in this way is suboptimal, whereas operating on the manifold of reconstructed tracts offers a more intuitive approach. Indeed, in comparing microstructural properties between groups, many computational pipelines adopt a tractometry approach15,16, that is, mapping measures along pathways reconstructed via tractography, either by averaging along the whole tract17 or a segment thereof18,19,20,21. Along-tract profiling has been applied previously to investigate various brain conditions15,19,22,23,24. The main advantage here is that image registration can be avoided as tractometry is performed in each patient’s native space, resulting in a set of individual tract profiles. There are, however, some limitations. First, most analyses treat tractometry measures from specific pathways as independent measures. This univariate approach has the potential to obscure key relationships between different tracts. Focusing on any particular anatomical location therefore increases the risk of losing the full picture. Although individual pathways can appear normal in isolation, by considering them as an ensemble (for example, as part of a network), any such intertract relationships could collectively help to identify outliers. Second, when analyzing multiple measures (even when derived within the same tract), statistical analysis is hampered by: (1) the multiple comparisons problem; and (2) any covariance between measurements17,25. Here multidimensional approaches can increase statistical power by combining the sensitivity profiles of independent modalities25,26,27.

Here we present an anomaly detection framework (Detect) that uses a data-driven, unsupervised normative modeling approach based on deep autoencoders28. Autoencoders are a type of artificial neural network traditionally used for dimensionality reduction28. They can capture non-linear interactions between the input features by learning a self-representation of their inputs through a low-dimensional layer (Fig. 1); the goal is to generate an output ($$\hat{\textbf{x}}$$) similar to the input (x) by minimizing the reconstruction error. This same representation can be exploited for anomaly detection by analyzing deviations in the reconstruction. Detect moves substantially beyond dMRI group-level analysis techniques by identifying and localizing anomalies in multiple tract profiles at once at the individual level. The proposed framework was trained to learn normative sets of features derived from healthy brain tract profiles in three independent datasets and one reproducibility dataset29. Once trained, the network is then presented with unseen healthy tract profiles (for testing) and subsequently exposed to tract profiles from individuals diagnosed with neurological/psychiatric conditions. We emphasize that the model has no access to the diagnostic labels during the training phase and thus our anomaly detection approach is fully unsupervised. To assess generalization, the framework was then applied to single participants with a range of neurological and psychiatric disorders, including children and adolescents with copy number variants (CNVs) at high risk of neurodevelopmental and psychiatric disorders; patients with drug refractory epilepsy; and patients with schizophrenia (SCHZ). We compared the performance of our approach with (1) a conventional z-score-distribution approach and (2) principal component analysis (PCA) combined with the Mahalanobis distance (a widely used approach in cluster analysis and classification techniques)21,24,27. Importantly, the analyses of the three pathological cases were performed in the browser, from the point-of-view of future Detect users.

## Results

### Framework overview

Detect is an open-source, user-facing tool built on the interactive Streamlit framework, which promotes data exploration through interactive visualizations in the browser. The framework offers three main scripts: Detect, Inspect and Relate. Detect facilitates patient comparisons using cross-validated area under the curve (AUC) computed over a set of user-defined iterations, by means of z-score, PCA or autoencoder (see Supplementary Fig. 4). The output is a single, bootstrapped anomaly score for each patient. Inspect allows the user to select a single subject and to compare it with the rest of the population. Here, feature anomalies are highlighted using a leave-one-out cross-validation approach, and only segments along the tract where two consecutive outliers occurred are reported. Results are displayed in real-time during each iteration for the user to evaluate. Furthermore, both scripts allow the visualization of tract profiles. Finally, Relate is a simple visual interface for correlating the anomaly scores obtained by the previous commands with clinical scores. In all scenarios, microstructural features included fractional anisotropy, mean diffusivity and rotationally invariant spherical harmonic features of zeroth and second orders30 (RISH0 and RISH2, respectively) derived from the largest b-value data in each dataset to maximize sensitivity to the intra-axonal signal31.

### White-matter anomaly detection in CNV participants

#### Discriminating power

First we investigate individual differences in white-matter microstructure in children with CNVs at high genetic risk of neurodevelopmental and psychiatric disorders32, which are relatively rare and therefore challenging to recruit for research imaging studies33. Note that the framework was trained using data from typically developing children only. In general, the autoencoder approach showed higher AUC scores across the microstructural metrics and was better at identifying CNV patients as outliers, providing substantially higher sensitivity–specificity tradeoffs than the z-score and Mahalanobis-based approaches (Supplementary Fig. 1 and Supplementary Table 2). For example, the RISH0 feature showed higher discriminating power (AUC = 0.83 ± 0.08) compared with the mean univariate z-score (AUC = 0.80 ± 0.09) and multivariate Mahalanobis distance (AUC = 0.61 ± 0.09). In comparing the RISH0 group distributions (Fig. 2), anomaly scores derived via the autoencoder were significantly different (Kolmogorov–Smirnov test = 0.62, P < 0.003; Cohen’s d effect size = 1.39) between the CNV individuals and the typically developing patients. In particular, all CNV patients had an anomaly score larger than the typically developing patient mean and 50% of them were larger than the 95th percentile of the typically developing patient population. In comparison, the difference between the anomaly scores was less pronounced with the z-score (Kolmogorov–Smirnov = 0.34, P = 0.3, Cohen’s d = 0.38) approaches, whereas similar results were obtained by using the Mahalanobis distance (Kolmogorov–Smirnov = 0.56, P = 0.01, Cohen’s d = 1.20).

#### Tract-specific deviations

A key advantage of using deep autoencoders for anomaly detection over traditional PCA-derived approach is its unique ability to relate the anomaly back to the individual elements of the input data. More specifically, the predicted data retains the same dimensionality as the input data and thus it is possible to see which feature cannot be accurately recovered by the autoencoder. For example, if a feature has a positive reconstruction error, then one can infer that the network learned a smaller value for that feature than what was provided as input. In the context of the CNV participants, multiple regions were highlighted (by positive reconstruction errors) as deviants from the typically developing patient population. Figure 3 reveals a high anomaly rate for various association bundles such as the bilateral inferior longitudinal fasciculus (ILF), optic radiations and the left superior longitudinal fasciculus (SLF_II). This is in line with current literature where microstructural differences are expected along association pathways, in agreement with psychotic symptoms34.

### White-matter anomaly detection in epilepsy

Focal cortical dysplasia (FCD)—a malformation of cortical development—is the most common etiology in drug-resistant neocortical partial epilepsies35. Although complete resection is the main predictor of seizure freedom following surgery, a substantial proportion of FCDs may be missed with standard clinical imaging protocols35 and the seizure-generating network may extend far beyond the visible dysplasia. Diffusion MRI contrast enhances the sensitivity of MRI to differences in the brain, but has only been reported at the group level36. Here we demonstrate two key advantages of the deep autoencoder approach in a clinical context. First, it succeeded in detecting white-matter anomalies that a conventional z-score-based approach has missed, potentially due to hidden interactions between the features. Second, detection of abnormal microstructural features away from putative seizure onset zone, as demonstrated in the first example, may contribute to the mapping of epileptogenic networks in individuals; thus, although the examples shown here had radiological changes detectable with T2-weighted sequences, the method could potentially be extended to cases of MRI-negative partial epilepsy increasing the diagnostic yield.

Patient 1 is a young adult female with seizures described as a fuzzy painful sensation in the torso rising up to the head, associated with mumbling sounds, occurring 2–5 times per day. A scalp video electroencephalogram (EEG) showed left temporal interictal epileptiform discharges and left temporal EEG onset. Clinical imaging demonstrated a small area of cortical–white-matter junction blurring in the laterobasal left temporal lobe associated with a transmantle area of T2 hyperintensity, suggestive of FCD type II37. Neuropsychological assessment was concordant with a left temporal deficit, also revealing preserved mesial structures manifesting in relatively preserved verbal memory performance. Subsequent stereo-EEG (SEEG) implantation confirmed ictal onset and prominent interictal discharges from neocortical contacts immediately behind the MRI lesion; furthermore, neocortical discharges were seen in SEEG contacts close to the temporal pole. Based on those clinical findings (neuropsychological assessment and SEEG), the patient proceeded to have resection with histology consistent with FCD. Five tracts of possible relevance were interrogated (Fig. 4). Microstructural anomalies were identified along the left ILF and optic radiations in the immediate proximity of the T2-weighted changes corresponding to SEEG contacts with maximal ictal EEG changes. Anomalies in the temporal portions of the left inferior fronto-occipital fasciculus and uncinate fasciculus pointed towards the temporal pole corroborating the SEEG findings that, despite normal clinical MRI, this area was part of the seizure network.

Patient 2 is an adult female with focal onset seizures since childhood occurring daily with episodes of loss of contact, grimacing and limb stiffening hypermotor movements, including clutching at nearby object on the left side. Scalp video EEG findings were consistent with frontal onset seizure semiology. Clinical MRI showed blurring of the cortical–white-matter junction between the right posterior superior frontal gyrus and the adjacent precentral gyrus, and a transmantle sign on T2/FLAIR from the cortex reaching all the way to the lateral ventricle, consistent with FCD type II. Subsequent SEEG recordings demonstrated spatial overlap between primary motor areas and early ictal onset and hence the patient did not proceed to surgery. Five tracts of possible relevance were interrogated with our framework (Fig. 5). Anomalies were detected corresponding to radiological and electrophysiological findings along the right corticospinal tract (CST), primary motor (CC4) and superior longitudinal fasciculus (SLF-I) beyond the visible lesion. No anomalies were found along the right cingulum and primary sensorimotor (CC5) regions.

The results are promising, with the tool identifying anomalies in concordance with clinical hypothesis in a single-patient analysis paradigm, testifying to its utility for clinical evaluation. Its extra value is highlighted by its sensitivity to outlying tract segments not detected with the conventional z-score approach. The N = 1 approach to detect deep white-matter anomalies illustrated here will facilitate the identification of individualized therapy most appropriate to that patient, suggesting additional targets for diagnostic evaluation and possible surgical treatment.

### Linking brain heterogeneity with epidemiological findings in schizophrenia

The extent to which individual clinical variability in schizophrenia relates to microstructural variability remains a key challenge in neuropsychiatry38, with most findings being at the group or voxel level. For the RISH0 feature, the autoencoder approach was better at identifying single SCHZ patients as outliers (AUC = 0.64 ± 0.06) when compared with PCA (AUC = 0.59 ± 0.07) or the z-score (AUC = 0.39 ± 0.06). In comparing these group distributions, anomaly scores derived from the autoencoder were found to be significantly different (t = –2.60, P = 0.01, Cohen’s d = 0.47) between the SCHZ individuals and the healthy participants (Fig. 6). In particular, 31 of the 43 SCHZ patients had an anomaly score larger than the healthy participants’s mean and nine of them were larger than the 95th percentile of the healthy participants’s population. In comparison, the difference between the anomaly scores was less pronounced for the PCA (t = –1.75, P = 0.08, Cohen’s d = 0.32) and z-score (t = 1.85, P = 0.07, Cohen’s d = 0.33) approaches (see Supplementary Fig. 1 and Supplementary Table 2). Furthermore, the above chance-level detection rates of the proposed deep autoencoder in SCHZ suggest a successful application of the tractometry-based framework in unsupervised anomaly detection. The significance of these results are even more pronounced considering the challenging task at hand; that is, where even a supervised support vector machine classifier provides similar accuracy (AUC = 0.65 ± 0.13). Finally, to provide an illustrative way in which the Detect tool can provide an important avenue for future studies, anomaly scores were correlated with clinical scores. The Hopkins anxiety index (Hopkins symptom checklist), a widely used screening instrument to study mental illness, was used as a proof of concept. For the autoencoder, PCA and z-score, the Spearman’s rank correlation coefficients were ρ = 0.38 (P = 0.012), ρ = 0.3 (P = 0.055) and ρ = –0.12 (P = 0.448), respectively (Fig. 6).

### Repeatability of anomaly scores and tract profiles

Using a test-retest dataset (six patients, five time points29), we assessed the repeatability of (1) the input RISH0 tract profiles and (2) the generated anomaly scores by calculating the intraclass correlation coefficient (ICC, with two-way mixed, absolute agreement for average measurements as in ref. 29) and the coefficient of variation (CoV). Supplementary Fig. 2 shows the repeatability of the tract profiles, with the optic radiations being the most reproducible bundles (mean ICC = 0.95, CoV = 0.03) and the left cingulum being the least reproducible (ICC = 0.66, CoV = 0.06). In terms of anomaly scores, the proposed anomaly detection framework shows reconstruction errors that are reproducible across sessions (Supplementary Fig. 3) with an ICC of 0.96 (95% CI: 0.88, 0.99) and a CoV of 0.06. Furthermore, reliability of anomaly scores in the pediatric dataset can be estimated from the test-retest dataset. Using the two-way mixed ICC for consistency (for single measurements) of 0.85 from the test-retest study, assuming a similar measurement-related variance and accounting for the standard deviations in both cohorts, the ICC for anomaly scores in the pediatric cohort is 0.86. Even if the measurement-related variance was to increase by 30%, the ICC would still indicate good reliability (0.76). This confirms that the demonstrated group differences in anomaly scores are unlikely to have occurred due to measurement-related variance.

## Discussion

Further exploration of input features and hyperparameters of the model remains to assess the generalizability of the framework and its application to other pathologies. From a generalization point of view, the key problem in biomarker research is the need for individual prediction/diagnosis. Advancing knowledge of brain pathology and related cognitive impairment at the individual level is essential for early detection and intervention. Normative models show that, if groups are too heterogeneous, it can be a challenging task to learn characteristics from a given population using supervised approaches, hence the need for unsupervised learning1. Although the amount of data we can employ in imaging studies is relatively small in comparison with population-based studies, the framework provides a principled method to detect individual differences in tissue microstructure. With the ever-growing amount of dMRI data being acquired, the framework will make less conservative inferences as the number of data points increases. In principle, our framework can be trivially extended to tract-based assessment of non-diffusion-based microstructural metrics (for example, such as magnetization transfer and myelin water imaging16,17), as long as the associated quantitative maps are co-registered with the diffusion space. They could be derived from any manually defined or atlas-derived regions of interest, tract-based regions of interest as done here, or even at the voxel-based level39. Moreover, our framework could be applied to any set of neuroimaging features. For example, these inputs could include time series from functional MRI and cortical thickness derived from structural T1 imaging. Finally, other applications of the framework include the characterization of microstructural changes in neurological disorders without gross pathology. Furthermore, Detect could also help predict worse phenotypic outcome in those with a rare genetic disorder (for example, CNV carriers) and potentially allow for early therapeutic planning in the future.

One of the limitations of single-patient analysis is that it requires substantial amounts of healthy participant data to define a normative brain1. Combining multisite or multiscanner dMRI datasets can greatly increase the statistical power of neuroimaging studies40, but cross-scanner and cross-protocol variability challenges joint analysis, hence the need for data harmonization30,41. However, dMRI harmonization is a young field, there is not yet an established method that can bring cross-scanner variability back to the level of within-patient variability. Moreover, tractography still faces considerable challenges in the field42,43 and most commonly available tools can only track reliably within normal-appearing white matter. Although recent machine learning approaches have shown promise in reproducible tract segmentation across patients44, therefore strengthening hope of analyzing dMRI data for group studies, there is a potential challenge: along-tract profiling approaches should only be used when a complete tract has been reconstructed. Future work will establish the utility of this approach in conditions where pathology leads to incomplete tract reconstruction. Another limitation is that autoencoders require more computation than PCA; however, PCA will also have limitations for large datasets where memory storage is an issue.

The choice of dMRI measure will also influence the capacity of the tool to detect anomalies. For example, the use of fiber-specific measurements45,46 could help better disentangle anomalies in fiber-crossing regions. This would of course demand a model-based approach as opposed to the RISH0 features employed in this study. Furthermore, it was recently demonstrated that there is more sensitivity to individual differences at high b-values31,47. Nevertheless, we are encouraged to see that even with more commonly used b-values (for example, b = 1,000 s mm2), we are still able to uncover patient/control differences in the SCHZ cohort, highlighting the potential for widespread clinical adoption of dMRI.

In the context of microstructural MRI, the demonstration of Detect through the three different scenarios goes beyond the utility of other techniques and provides compelling motivation for future application. The unsupervised multivariate framework proposed here uses state-of-the-art machine learning to approach high-dimensional data non-linearly and improve accuracy and precision over traditional anomaly detection. Our deep learning approach also provides advantages over other statistical approaches for outlier detection as it was recently shown (using functional MRI data3) that deep unsupervised approaches improve identification of psychiatric patients compared to mass-univariate normative modeling. It is also generally accepted that autoencoders tend to perform better when the middle layer (that is, bottleneck layer) is small when compared with PCA. This can potentially mean that the same accuracy can be achieved with less components and hence may be beneficial for smaller datasets.

Browser-based applications are becoming increasingly popular among the computational neuroscience community due to their ease of use and accessibility across devices21. Detect enables the detection of abnormalities in clinically heterogeneous groups or rare cases and ultimately improve diagnosis of neurological and neuropsychiatric disorders. Our aim was to develop and distribute an open-source framework to characterize microstructural white-matter changes at the individual level. This enables the detection of abnormalities To the best of our knowledge, other tools such as AFQ Browser21 compare individuals using a linear approach (z-score) that considers each tract-segment independently and ignores potential complex interactions between the features. Those tract segments are then statistically tested in an univariate manner, and as such the correction for multiple comparisons—required by the typical high dimensionality of dMRI data—will hamper the discriminating power of the analysis25. Recently, PCA was employed to acknowledge the multivariate nature of dMRI data25,26,27, but this approach still relies on linear assumptions thereby ignoring possible complex interactions between the features. We believe strongly that the proposed deep autoencoder approach goes hand-in-hand with existing browser-based dMRI analysis frameworks21 in encouraging reproducible research and data-driven discoveries. Finally, we encourage future users of Detect to apply the tool to their own datasets and we welcome contribution to the tool in the form of added functionalities via GitHub.

In summary, diffusion MRI offers great promise to detect subtle differences in tissue microstructure when applied at the group level; however, the goal of clinical neuroimaging is to be applicable at the individual level. The single case approach proposed here will facilitate the identification of individualized therapy most appropriate to that patient, forming a baseline biomarker for subsequent monitoring through a therapeutic process. We believe that our tractometry-based anomaly detection framework paves the way to progress from the traditional paradigm of group-based comparison of patients against healthy participants, to a personalized medicine approach, and takes us a step closer in transitioning microstructural MRI from the bench to the beside.

## Methods

### Detect interactive interface

Users of Detect will input demographic data that consist of comma-separated values (.csv), where each row represents a patient (ID). Example demographics columns include: group, age, gender or clinical scores. The user is given the option to correct for confounding factors (that is, by treating those attributes as covariates across the entire brain2), resulting in age-independent microstructural features, for example. The microstructural tractometry data format consists of an .xlsx spreadsheet, where each sheet represents a dMRI metric (for example, fractional anisotropy, mean diffusivity and so on). As per the demographic data, patients are stacked individually on each row. The first column denotes the ID of each patient. The remaining columns follow the following convention: bundle_hemi_section where bundle is the white-matter bundle of interest, hemi is the hemisphere (that is, left or right and void for commissural tracts), and section is the along-tract portion (for example, from 1 to 20).

### Data acquisition and preprocessing

#### CNV dataset

Diffusion MRI data were acquired from 90 typically developing children (age 8–18 years) and eight children with CNVs at high risk of neurodevelopmental and psychiatric disorders (CNVs, 2× 15q13.3 deletion, 2× 16p11.2 deletion, 3× 22q.11.2 deletion and 1× Prader–Willi syndrome) and no apparent white-matter lesions (age 8–15 years). Data collection procedures for the typically developing and CNV groups were approved by the Cardiff University School of Psychology and School of Medicine Ethics Committees, respectively. Children under 16 and those over 16 who lacked capacity to consent given written/verbal assent and their parents or legal guardians gave written consent on their behalf. Images were acquired using a Siemens 3 T Connectom MRI scanner (32-channel radiofrequency coil, Nova Medical) with 14 b0 images, 30 directions at b = 500, 1,200 s mm2; 60 directions at b = 2,400, 4,000, 6,000 s mm2; and 2 × 2 × 2 mm3 voxels (TE (echo time)/TR (repetition time) = 59/3,000 ms; Δ/δ = 24/7 ms). The total scan time for the multishell protocol was 16 min and 14 s. Each dataset was denoised48 and corrected for signal drift49, motion and distortion in FSL50, gradient non-linearities51 and Gibbs ringing52. Next, RISH features30 were derived for each patient using the b = 6,000 s mm2 shell to maximize sensitivity to the intra-axonal signal31 (zeroth and second orders only, RISH0 and RISH2, respectively). Furthermore, diffusion tensors were generated using an in-house non-linear least squares fitting routine using only the b ≤ 1,200 s mm2 data, followed by the derivation of fractional anisotropy and mean diffusivity maps.

#### Epilepsy dataset

Diffusion MRI data from two epilepsy patients with FCD were acquired on a Siemens 3 T Connectom MRI scanner with 60 directions at b = 1,200, 3,000 and 5,000 s mm2 and 1.2 × 1.2 × 1.2 mm3 voxels (TE/TR = 68/5,400 ms; Δ/δ = 31.1/8.5 ms; total scan time = 30 min and 25 s). Furthermore, data on 15 healthy participants (aged 21–41 years) from the computational diffusion MRI harmonization database were used40. Data collection procedures for the healthy participant and FCD groups were approved by the Cardiff University School of Psychology and School of Medicine Ethics Committees, respectively. Written informed consent was obtained from all patients. Each dataset was corrected for Gibbs ringing52, signal drift49, motion and distortion in FSL50, and gradient non-linearities51. Next, RISH0 features30 were derived for each patient using the b = 5,000 s mm2 shell. A fourfold data augmentation was applied to the healthy participant tract profiles using the synthetic minority oversampling technique53 resulting in 75 healthy participants.

#### SCHZ dataset

Diffusion MRI data from the UCLA Consortium for Neuropsychiatric Phenomics54 was downloaded from the OpenNeuro platform (openneuro.org/datasets/ds000030/versions/00016), which also contains demographic, behavioral and clinical data. Although more focused on functional MRI, the dataset contains dMRI data from 123 healthy participants and 49 individuals with SCHZ amongst other psychiatric disorders. Data were acquired on a Siemens 3 T Tim Trio MRI scanner with one b0 image, 64 directions at b = 1,000 s mm2 and 2 × 2 × 2 mm3 voxels. Data quality assessment was first performed, resulting in the exclusion of datasets with reduced field-of-view (preventing the reconstruction of white-matter bundles in the inferior temporal lobes) and those with substantial slice dropout (impacting estimation of diffusion metrics). A total number of 109 healthy participants (aged 21–50 years) and 43 SCHZ patients (aged 22–49 years) were used for further analysis. Diffusion data were denoised48, corrected for patient motion in FSL50 and distortion using the anatomical T1-weighted image as reference. Next, RISH features (RISH0, RISH2) were derived for each patient. Finally, diffusion tensors were generated using iteratively weighted least squares in MRtrix55 followed by the derivation of fractional anisotropy and mean diffusivity maps.

#### Repeatability dataset

To assess repeatability, we employed the microstructural image compilation with repeated acquisitions dataset29, which comprises five repeated sets of microstructural imaging in six healthy human participants (three female, aged 24–30 years). Each participant was scanned five times in the span of two weeks on a 3 T Siemens Connectom system with ultra-strong (300 mT m–1) gradients. Multishell dMRI data were collected (TE/TR = 59/3,000 ms; voxel size = 2 × 2 × 2 mm3; b-values = 0 (14 volumes), 200 and 500 (20 directions), 1,200 (30 directions), 2,400, 4,000, and 6,000 (60 directions) s mm2) resulting in a total scan time of 16 min and 37 s. The same preprocessing as the CNV dataset was applied. Data collection was approved by the Cardiff University School of Psychology Ethic Committee and written informed consent was obtained from all patients.

### Tractometry

For each dataset, automated white-matter tract segmentation was performed using TractSeg44 (see Supplementary Table 1) using multishell constrained spherical deconvolution47. For each bundle, 2,000 streamlines were generated. Tractometry16 was performed (sampling fractional anisotropy, mean diffusivity, and RISH0 and RISH2 at 20 locations along the tracts19,23,25) using a Nextflow architecture provided by SCILPY (github.com/scilus/scilpy). Specifically, individual streamlines were reordered for all patients to ensure consistency using the following order: left-to-right for commissural tracts, anterior-to-posterior for association pathways and top-to-bottom for projection pathways. Next, a core streamline was generated and microstructural metrics at each vertex of the bundle were projected to the closest point along the core. The resulting tract profiles were concatenated to form a feature vector (x).

### Artificial neural network

Our autoencoder implementation consists of a symmetric design of five fully connected layers (ln). The input and output layers (l1 and l5) have exactly the same number of nodes as the number of input tracts features. The inner layers (l2 and l4) consecutively apply a compression ratio of two by reducing the number of nodes by half, up to the bottleneck hidden layer (l3). For example, if using an input vector made of 100 features, l1 and l5 will consist of 100 nodes; l2 and l4 50 nodes; and 25 nodes for the bottleneck l3. Rectified linear units activation was used between the layers to promote sparse activation and tanh for the last layer (epochs = 25, batch size = 24, learning rate = 1 × 10–3; optimizer, Adam; loss, mean squared error; validation split = 0.1). Using different activation functions in different layers aims at balancing the advantages and disadvantages of the two activation functions. To promote sparsity and reduce overfitting, an activation penalty was imposed to the bottleneck layer using 1-regularization (×10–5). This is especially best suited for models that explicitly seek an efficient learned representation. The goal is to generate an output ($$\hat{x}$$) similar to the input (x) by minimizing the reconstruction error. Here the MAE was used as anomaly score and is defined as:

$${\mathrm{MAE}}=\frac{1}{n}\mathop{\sum }\limits_{j=1}^{n}| {x}_{i}-{\hat{x}}_{i}| ,$$
(1)

The MAE measures the average magnitude of the errors and is derived during testing by computing the absolute differences between the reconstructed microstructural features ($${\hat{x}}_{i}$$) and the raw input features (xi). Due to the heavily imbalanced group ratio between healthy participants and patients (that is, CNV and epilepsy), a bootstrap was implemented to draw random samples of equal sizes from each group. Specifically, the autoencoder is trained using healthy participants data only. The entire dataset is therefore first split into a training set (80%, made of healthy controls only) and a validation set (20%, by combining the patients with a matching number of healthy participants) to create the outer fold; 10% of the training set is held out for testing during the training phase (inner fold) to evaluate the loss. Age and sex regression56 and feature normalization (min–max) were performed on the normative training set and subsequently applied to the held-out validation set to prevent information leakage. To derive conservative estimates and assess variations within the model, we repeated this process 100 times and report the mean MAE for each patient. Finally, we compared the sensitivity versus specificity of the anomaly scores using the mean receiver operating characteristic AUC across iterations, with standard deviations used as uncertainties. Comparisons of the AUC scores between patients and controls were performed using two-tailed t-tests assuming equal variances in the case of balanced groups and Kolmogorov–Smirnov test for the unbalanced groups. The correlation between anomaly scores and clinical scores was computed using Spearman’s ρ.

### Univariate approach

One of the most commonly used tools in determining outliers is the z-score. The z-score (or standard score z = x − μ)/σ) is a way of describing a data point as deviance from distribution, in terms of standard deviations from the mean of the normal distribution. Here, z-scores were computed for each tract-segment, relative to the mean of the healthy group (μ) and averaged to derive a patient-specific anomaly score at each iteration (described above). The mean over all iterations described above was retained as anomaly scores for all patients.

### Multivariate linear approach

Principal component analysis was applied to the set of features by restricting the dimensionality to preserve features accounting for 85% of the variance in the data. In Detect, this number can be manually defined as the percentage of explained variance. Next, the Mahalanobis Distance (M, a multidimensional generalization of the z-score that accounts for the relationships between the white-matter bundles) was used to derive an anomaly score defined as:

$$M(x)=\sqrt{{(x-\mu )}^{^{\prime} \cdot }{C{}^{-1}}^{\cdot }(x-\mu )},$$
(2)

where x represents the feature vector of a given patient, μ is the vector of mean microstructural metrics for each tract location s, and C−1 is the inverse covariance matrix of the input features. The problem of anomaly detection can be seen as a one-class classification problem and therefore, our training data only contains healthy participants to calculate C. an M score was then derived for all unseen patients in relation to the healthy participant distribution. The same dataset split as aforementioned was used to derive a bootstrapped estimate of M for each patient, which was subsequently analyzed.

### Support vector machine comparison

A supervised support vector machine classifier was used for comparisons on the SCHZ dataset. Class weights were set to account for the class imbalance between healthy participants and patients. The classifier was validated using a repeated (ten times) stratified, fivefold cross-validation approach in scikit-learn (scikit-learn.org). Optimized parameters were derived for each cross-validation fold using a grid-search approach. Those included the choice of kernel ([radial basis function, linear]), regularization ([1, 10, 100, 1000]) and gamma parameters ([10−3, 10−2, 10−1, 1, 101, 102, 103]).

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.