Main

Over the past 30 years, neuroimaging has matured to adopt the FAIR (findable, accessible, interoperable and reusable) principles1,2, develop reporting best practices3 and data standards4. While making research more rigorous and transparent, this maturation has inevitably increased compliance requirements. Indeed, just a few years ago it was possible to publish studies with a few hours of data collected and analyzed in a single laboratory. Today, studies require combining hundreds of hours of measurement, across multiple participants, laboratories and data modalities (for example, magnetic resonance imaging (MRI), positron emission tomography, functional near-infrared spectroscopy, electro-encephalography (EEG) and magnetoencephalography (MEG)). To support the needs of a mature neuroimaging field, several data collection efforts have been started; relevant examples are the Human Connectome Project (HCP)5, Cambridge Centre for Ageing and Neuroscience study (Cam-CAN)6, Adolescent Brain Cognitive Development (ABCD) study7, the UK-Biobank8, Healthy Brain Network (HBN)9, Pediatric Imaging Neurocognition and Genetics (PING) study10 and the Natural Scene Dataset11. At the same time, the complexity of the data pipeline has also increased with multiple, distinct, software libraries and analysis toolboxes developed12,13.

As compliance requirements grow, so do barriers to entry (Fig. 1a). The mature, neuroimaging field requires increased resources and technical training to piece together and track multiple processes such as data ingestion, standardization, storage, management, preprocessing and feature extraction (Fig. 1a). Currently, no single and low-barrier technology exists to integrate and manage the ever-changing software and data components of a full study. The growing compliance requirements affect the research community inequitably; smaller institutions and lower-income countries are more likely to lack resources and training. As such, this maturation process may risk favoring higher-resourced teams: an outcome that would not only decrease diversity and inclusion, but also slow-down scientific progress.

Fig. 1: The burdens of neuroscience and the promise of integrative infrastructure.
figure 1

a, A figurative representation of the current major burdens of performing neuroimaging investigations. b, Our proposal for integrative infrastructure that coordinates services required to perform FAIR, reproducible, rigorous and transparent neuroimaging research thereby lifting the burden from the researcher. c, brainlife.io rests on the foundational pillars of the open science community such as data archives, standards, software libraries and compute resources. d, brainlife.io’s Map step takes MRI, MEG and EEG data and processes them to extract statistical features of interest. brainlife.io’s reduce step takes the extracted features and serves them to Jupyter Notebooks for statistical analysis. PS, parc-stats datatype; TM, tractmeasures datatype; NET, network datatype and CLI, common line interface. e, The brainlife.io technology automates capture of data provenance. All data objects on brainlife.io are stored with a record of the apps, app versions and parameters used to process the data. f, The primary services are provided to the user by brainlife.io. Panels a and b adapted from ref. 22 under a Creative Commons license CC BY 4.0.

In support of simplicity, efficiency, transparency and equity in big data neuroscience research, our team has developed a community resource, brainlife.io (Fig. 1b). The brainlife.io platform stands on the pillars of open science (Fig. 1c), to provide free, secure and reproducible neuroscientific data analysis. Because of its web-based availability, brainlife.io should expand opportunities for researchers from nations and institutions with limited research budgets and resources. brainlife.io should then serve as an enabler for researchers and students from all sorts of institutions of higher education and all sorts of backgrounds to access cutting-edge neuroscience analytic tools.

brainlife.io is a ready-to-use and ready-to-expand platform. As a ready-to-use system, it allows researchers to upload and analyze data from MRI, MEG and EEG systems. Data are managed using a secure warehousing system with a proper access-control model. Data can be preprocessed and visualized using version-controlled applications (hereafter referred to as apps; https://brainlife.io/apps; Supplementary Fig. 1), compliant with major data standards (for example, the Brain Imaging Data Structure4). As a ready-to-expand system, software developers may submit apps guided by standardization and documentation (https://github.com/brainlife/abcd-spec and https://brainlife.io/docs). The platform uses opportunistic computing to serve commercial and academic clouds to researchers. Computing resources can be registered on brainlife.io for individual users and projects, or the larger community (Extended Data Fig. 1a,b). Supplementary Results 1 describe the technology.

The architecture of brainlife.io is based on a microservice approach for automated and decentralized data management and processing. Microservices are handled by the orchestration system Amaretti (Extended Data Fig. 1c,d and Extended Data Table 1) which deploys computational jobs on high-performance clusters and clouds (for example, Google Cloud, AWS or Microsoft Azure). Data management on brainlife.io is centered around projects, the ‘one-stop-shop’ for data management, processing, analysis and visualization (Supplementary Results 2 and Supplementary Fig. 2). Data archives can be docked by brainlife.io (Extended Data Fig. 1d) and data imported via the portal https://brainlife.io/datasets (Supplementary Table 1). Data from measurement instruments are imported using https://brainlife.io/ezbid (Extended Data Table 1)14. Data processing on brainlife.io uses an object-oriented and micro-workflow service model. Data objects are stored using predefined standardized formats, datatypes, that allow automated app pipelining (Extended Data Fig. 1e; https://brainlife.io/datatypes) and provenance tracking for millions of data objects. Data processing Apps are containerized, composable processing units, can be written in any language using containerization technology and are smart, meaning that they automatically identify, accept or reject data objects before processing (Supplementary Results 3 and Supplementary Fig. 3). brainlife.io apps and datatypes are Brain Imaging Data Structure4 compatible.

Complex neuroimaging processing pipelines are simplified into two main steps, akin to Google’s MapReduce algorithm. An initial Map step preprocesses data objects asynchronously, in parallel, to extract features of interest (that is, functional activations, white matter maps, brain networks or time series data; Fig. 1d). A ‘reduce’ step follows where features extracted using apps are made available to preconfigured Jupyter Notebooks to perform analysis and generate figures. Indeed, all analyses and figures in this paper are available in brainlife.io notebooks (Supplementary Table 2). brainlife.io’s data workflow makes it possible to integrate large volumes of data into small sets of features saved into ‘tidy data’ structures (Fig. 1d). For more documentation regarding usage of the platform, see Extended Data Fig. 2 and Supplementary Table 1. Datatypes inform apps allowing automated processing and provenance tracking for millions of data objects. brainlife.io tracks data object IDs, app versions and parameter sets across data processing steps. brainlife.io data provenance graphs visualize (Fig. 1e and Supplementary Table 1) and reproduce (Supplementary Table 1) the data generation steps. brainlife.io lowers the barriers of entry to FAIR neuroimaging by supporting an end-to-end data analysis workflow within a unified ecosystem (Fig. 1f).

We performed validation experiments to demonstrate cases where brainlife.io’s technology produces results consistent with best practices in the field. We used over 1,800 participants from three datasets: PING, HCPs1200 and Cam-CAN (Extended Data Figs. 38, Supplementary Results 4, Supplementary Fig. 4 and Supplementary Tables 3 and 4). Participants across all datasets spanned seven decades (that is, PING, 3–20 years; HCPs1200, 20–37 years and Cam-CAN, 18–88 years). Lifespan trajectories were plotted for multiple brain features (for example, brain region volume, white matter tract FAs, connectivity networks and MEG peak frequency; Fig. 2a and Extended Data Fig. 7) using brainlife.io’s Jupyter Notebooks. Inverted U-shaped lifespan trajectories were estimated, consistent with previous studies15,16,17 (Fig. 2a and Extended Data Fig. 7). The results generated using brainlife.io demonstrate that substantially different datasets can be collated to identify established brain’s lifespan trajectories (Supplementary Results 4).

Fig. 2: brainlife.io supports scientific discovery and replication.
figure 2

a–d, Identifying unique relationships with brain features over the lifespan. a, Relationship between participant age and right hippocampal volume, right inferior longitudinal fasciculus (FA, within-network average functional connectivity (FC) derived using the Yeo17 atlas and peak frequency in the alpha band derived from magnetometer (squares) and gradiometers (circles) from MEG data. These analyses include participants from the PING (purple), HCP1200 (green) and Cam-CAN (yellow) datasets. Linear regressions were fitted to each dataset, and a quadratic regression was fitted to the entire dataset (blue). b,c, Replication and generalization of previously reported scientific findings. b, Average cortical hcp-mmp parcel thickness (Nstruc = 322) compared to parcel the ODI from the NODDI model mapped to the cortical surface (inset) of the HCPS1200 dataset (Nsub = 1,043) and Cam-CAN (Nsub = 492) dataset compared to the parcel-average cortical thickness. c, Stressful life events were obtained from the NLES survey from HBN participants (Nsub = 42) compared to uncinate-average normalized quantitative anisotropy (QA). Mean linear regression (blue line) fits and standard deviation (shaded blue). Early life stress was obtained from multiple surveys collected from ABCD participants (Nsub = 1,107) compared to uncinate-average FA. Linear regression (green line) fits the data with standard deviation (shaded green). d, Identification of clinical biomarkers. d, Retinal optical coherence tomography images from healthy controls (top row), patients with Stargardt’s disease (middle row) and patients with Choroideremia (bottom row). From these images, photoreceptor complex thickness was measured for each group (controls, gray; Choroideremia, green; Stargardt’s, blue) in two distinct areas of the retina: the fovea (eccentricities 0–1°) and periphery (eccentricities 7–8°). In addition, optic radiations carrying information for each retinal area were segmented and FA profiles were mapped. Average profiles with standard error (shaded regions) were computed. One participant with Stargardt and one with Choroideremia were identified each having FA profiles that deviated from healthy controls.

We further evaluated the ability to replicate results and generalize findings. Apps were created to estimate cortical thickness and tissue orientation dispersion, orientation dispersion index (ODI) and analyze the HCPs1200 dataset. A negative relationship between cortical thickness and ODI was estimated (Fig. 2b and Extended Data Fig. 8; rHCP-brainlife = −0.43 versus roriginal), replicating the original study (ODI; roriginal = −0.46)18. The result was also generalized to the Cam-CAN dataset (Fig. 2b and Extended Data Fig. 8; rCam-CAN-brainlife = −0.28 versus roriginal). The association between life stressors and white matter organization of the uncinate fasciculus (r = −0.057) was a replication of Hason et al.19 using two independent datasets. The Negative Life Events Schedule (NLES) was correlated with quantitative anisotropy in the right- and left-hemisphere uncinate fasciculus (Fig. 2c and Extended Data Fig. 8; rHBN_LEFT = −0.35, two-tailed t-test, P = 0.018; rHBN_RIGHT = −0.39, two-tailed t-test P < 0.0156). Early Life Stress (a composite score of traumatic life events, environmental and neighborhood safety, and the family conflict subscale) was associated with the uncinate fasciculus FA (Fig. 2c and Extended Data Fig. 8; rABCD_LEFT = −0.12, P = 9.41 × 10−5; rABCD_RIGHT = −0.09, P = 0.0035). The results demonstrate the ability of brainlife.io services to detect meaningful associations in large, heterogeneous datasets (Supplementary Results 4).

Finally, we tested the ability of brainlife.io’s services to detect optic radiation white matter changes as a result of eye disease20. Individuals with Stargardt’s disease (deterioration initiated in the central retina) and choroideremia (deterioration initiated in peripheral retina) were compared to healthy controls. Stargardt’s FA was reduced in optic radiation fibers projecting to central V1 (not peripheral; Fig. 2d). Choroideremia’s FA was reduced in optic radiation fibers projecting to peripheral V1 (not central; Fig. 2d and Supplementary Results 4).

Our vision for brainlife.io is that of a trusted, interoperable and integrative platform connecting data archives and global communities of software developers, hardware providers and domain scientists (Supplementary Results 5 and Supplementary Table 5). The goal of brainlife.io is to facilitate research and education, accelerate brain understanding and lead to cures for brain diseases. To support this vision, brainlife.io connects trainees, researchers, developers and computing resource managers in high-, medium- and low-income countries via technology. The platform is registered on fairsharing.org, datacite.org and nitric.org, it is recommended by the International Neuroinformatics Coordinating Facility (https://incf.org/infrastructure/brainlife) and it can serve the US National Institutes of Health in the United States data deposition and sharing mandate21,22. A comprehensive overview of the platform and tutorials can be found at https://brainlife.io/docs. Videos provide tutorials and demonstrations at youtube.com/@brainlifeio. A slack channel supports communication and operations: https://brainlife.slack.com. Questions can be posted using the topic ‘brainlife’ on https://neurostars.org or GitHub issues (https://github.com/brainlife/brainlife/issues) can be added directly to the code repositories. A quarterly outreach newsletter is sent out to all users, and an X account (@brainlifeio) informs the wider community about critical events. The platform has already collected a growing community (Supplementary Results 3).

Methods

Data sources

Multiple openly available data sources were used for examining the validity, reliability and reproducibility of brainlife.io apps and for examining population distributions. All information regarding the specific image acquisitions, participant demographics and study-wide preprocessing can be found in the publications in refs. 5,6,7,9,10,23,24,25. Some data sources are currently unpublished. For these, the appropriate information is provided. Experiments were approved by the local institutional review boards (IRB) and only the personnel approved for a specific study accessed the data in private projects on brainlife.io.

Validity, reliability, reproducibility, replicability, developmental trends and reference datasets

HCP (test–retest, s1200-release)

Data from these projects were used to assess the validity, reliability and reproducibility of the platform. They were used to assess the abilities of the platform to identify developmental trends in structural and functional measures, and they were used to generate reference datasets. For structural MRI (sMRI) data, the minimally preprocessed structural T1w and T2w images from the HCP from 1,066 participants from the s1200 and 44 participants from the test–retest releases were used5. Specifically, the 1.25 mm ‘acpc_dc_restored’ images generated from the Siemens 3 T MRI scanner were used for all analyses involving the HCP. For most examinations, the already-processed Freesurfer output from HCP was used. For diffusion MRI (dMRI) data, to assess the validity of preprocessing on brainlife.io, the unprocessed dMRI data from 44 participants from the HCP test dataset was used. For reliability and all remaining analyses, the minimally preprocessed dMRI images from 1,066 participants from the s1200 and 44 participants from the test–retest releases from the 3 T Siemens scanner were used. All processes incorporated the multi-shell acquisition data. For functional data (functional MRI (fMRI)), regarding validation, the unprocessed resting-state fMRI data from 44 participants from the HCP test dataset were compared to the minimally preprocessed blood oxygenation level dependent data provided by HCP. For reliability and all other analyses, the minimally preprocessed blood oxygenation level dependent data from 1,066 participants from the s1200 and 44 participants from the test–retest releases from the 3 T Siemens scanner were used.

The Cam-CAN

The data from this project were used to assess the validity, reliability and reproducibility of the platform, and to assess the abilities of the platform to identify developmental trends of structural and functional measures, and to generate reference datasets. For sMRI data, the unprocessed 1 mm isotropic structural T1w and T2w images from 652 participants from the Cam-CAN6 study were used. For dMRI data, the unprocessed 2 mm isotropic diffusion (dMRI) images from 652 participants from the Cam-CAN study were used. For fMRI data, the 3 × 3 × 4 mm3 unprocessed resting-state fMRI images from 652 participants from the Cam-CAN study were used. For electromagnetic data (MEG), the 1,000 Hz resting-state filtered and unfiltered datasets from 652 participants from the Cam-CAN study were used.

Developmental trends and reference datasets

PING

The data from this project were used to assess the abilities of the platform to identify developmental trends of structural measures and to generate reference datasets. For sMRI data, the unprocessed 1.2 × 1.0 × 1.0 mm3 structural T1w and the 1.0 mm isotropic T2w images from 110 participants from the PING10 study were used. For dMRI data, the unprocessed 2 mm isotropic diffusion (dMRI) images from 110 participants from the PING study were used.

Replicability datasets

ABCD

For sMRI data, the unprocessed 1 mm isotropic structural T1w and T2w images from a subset of 1,877 participants from the ABCD (release-2.0.0) study were used. For dMRI data, the unprocessed 1.77 mm isotropic diffusion (dMRI) images from a subset of 1,877 participants from the ABCD (release-2.0.0) study were used7,26. A single diffusion gradient shell was used for these experiments (b = 3,000 s ms2). Research was approved by the University of Arkansas IRB (no. 2209425822).

HBN

The data from this project were used to assess the abilities of the platform to replicate previously published findings via the assessment of the relationship between microstructural measures mapped to segmented uncinate fasciculi and self-reported early life stressors. Research was approved by the University of Pittsburgh IRB (no. PRO17060350). For sMRI data, the 0.8 mm isotropic structural T1w images from 42 participants from the HBN study9 were used. For dMRI data, the unprocessed 1.8 mm isotropic diffusion (dMRI) images from 42 participants from the CitiGroup Cornell Brain Imaging Center site of the HBN study were used. Research was approved by the University of Pittsburgh IRB (no. PRO17060350).

UPENN-PMC

The University of Pennsylvania, Penn Memory Center (UPENN-PMC) data from this project were used to assess the abilities of the platform to replicate previously published findings via the assessment of the performance of an automated hippocampal segmentation algorithm. Secondary data analyses were conducted under IRB exemption at Indiana University. For sMRI data, the T1w and T2w data were provided within the Automated Segmentation of Hippocampal Subfields Automated Segmentation of Hippocampal Subfields atlas27.

Clinical-identification datasets

Indiana University Acute Concussion dataset

The data from this project were used to assess the abilities of the platform to identify clinical populations via the mapping of microstructural measures to the cortical surface. Neuroimaging was performed at the Indiana University Imaging Research Facility, housed within the Department of Psychological and Brain Sciences with a 3 T Siemens Prisma whole-body MRI using a 64-channel head coil. Within this study, nine concussed athletes and 20 healthy athletes were included. Research approved by Indiana University (IRB 906000405). For sMRI data, high-resolution T1-weighted structural volumes were acquired using an MPRAGE sequence: TI = 900 ms, TE = 2.7 ms, TR = 1,800 ms, flip angle 9°, with 192 sagittal slices of 1.0 mm thickness, a field of view of 256 × 256 mm2 and an isometric voxel size of 1.0 mm3 (where TI, TE and TR refer to inversion time, echo time and repetition time, respectively). The total acquisition time was 4 min and 34 s. High-resolution T2-weighted structural volumes were also acquired: TE = 564 ms, TR = 3,200 ms, flip angle 120°, with 192 sagittal slices, a field of view of 240 × 256 mm2 and an isometric voxel size of 1.0 mm3. Total acquisition time was 4 min and 30 s. Diffusion data (dMRI) were collected using single-shot spin-echo simultaneous multi-slice (SMS) echo-planar imaging (transverse orientation, TE = 92.00 ms, TR = 3,820 ms, flip angle 78°, isotropic 1.5 mm3 resolution; FOV = LR 228 × 228 × 144 mm3; acquisition matrix MxP 138 × 138. SMS acceleration factor 4). This sequence was collected twice, one in the anterior-posterior fold-over direction and the other in the posterior-anterior (PA) fold-over direction, with the same diffusion gradient strengths and the number of diffusion directions: 30 diffusion directions at b = 1,000 s mm2, 60 diffusion directions at b = 1,750 s mm2, 90 diffusion directions at b = 2,500 s mm2 and 19 b = 0 s mm2 volumes. The total acquisition time for both sets of dMRI sequences was 25 min and 58 s.

Oxford University Choroideremia & Stargardt’s Disease Dataset

The data from this project were used to assess the abilities of the platform to identify clinical populations via mapping retinal-layer thickness via optical coherence tomography and mapping of microstructural measures along optic radiation bundles segmented using visual field information (eccentricity). Neuroimaging was performed at the Wellcome Centre for Integrative Neuroimaging, Oxford with the Siemens 3 T scanner. Research was approved by the UK Health Regulatory Authority reference 17/LO/1540. For sMRI data, high-resolution T1-weighted anatomical volumes were acquired using an MPRAGE sequence: TI = 904 ms, TE = 3.97 ms, TR = 1,900 ms, flip angle 8°, with 192 sagittal slices of 1.0 mm thickness, a field of view of 174 × 192 × 192 mm3 and an isometric voxel size of 1.0 mm3. The total acquisition time was 5 min and 31 s. Diffusion data (dMRI) were collected using echo-planar imaging (transverse orientation, TE = 92.00 ms, TR = 3,600 ms, flip angle 78°, 2.019 × 2.019 × 2.0 mm3 resolution; FOV = 210 × 220 × 158 mm3; acquisition matrix MxP = 210 × 210, SMS acceleration factor 3). This sequence was collected twice, one in the anterior-posterior fold-over direction and the other in the PA fold-over direction. The PA fold-over scan contained six diffusion directions, three at b = 0 s mm2 and three at b = 2,000 s mm2, and was used primarily for susceptibility-weighted corrections. The anterior-posterior fold-over scan contained 105 diffusion directions, five at b = 0 mm s2, 51 at b = 1,000 mm s2 and 49 at b = 2,000 mm s2. The total acquisition time for both sets of dMRI sequences was 7 min and 8 s.

General processing pipelines

Structural processing

For the ABCD, Cam-CAN, Oxford University Choroideremia & Stargardt’s Disease Dataset, and the Indiana University Acute Concussion datasets, the structural T1w and T2w (sMRI) images (if available) were preprocessed, including bias correction and alignment to the anterior commissure-posterior commissure plane, using the brainlife.io apps A273 (https://doi.org/10.25663/brainlife.app.273) and A350 (https://doi.org/10.25663/brainlife.app.350), respectively. For PING data, no bias correction was performed but alignment to the anterior commissure-posterior commissure plane was performed using A99 (https://doi.org/10.25663/brainlife.app.99) and A116 (https://doi.org/10.25663/brainlife.app.116) for T1w and T2w data, respectively. For HCP data, this data was already provided. The structural T1-weighted images for each participant and dataset were then segmented into different tissue types using functionality provided by MRTrix3 (ref. 28) implemented as A239 (https://doi.org/10.25663/brainlife.app.239). For a subset of datasets, this was performed within the diffusion tractography generation step using A319 (https://doi.org/10.25663/brainlife.app.319). The gray- and white-matter interface mask was subsequently used as a seed mask for white matter tractography. The processed structural T1w and T2w images were then used for segmentation and surface generation using the recon-all function from Freesurfer29 (A0; https://doi.org/10.25663/brainlife.app.0). Following Freesurfer, representations of the cortical ‘midthickness’ surface were computed by spatially averaging the coordinates of the pial and white matter surfaces generated by Freesurfer using the wb_command -surface-cortex-layer function provided by Workbench command for the HCPTR, HCPs1200, ABCD, Cam-CAN, PING and Indiana University Acute Concussion datasets. These surfaces were used for cortical tissue mapping analyses. Following Freesurfer and midthickness-surface generation, the 180 multimodal cortical nodes (hcp-mmp) atlas and the Yeo 17 (yeo17) atlas were mapped to the Freesurfer segmentation of each participant implemented as brainlife.io app A23 (https://doi.org/10.25663/brainlife.app.23). These parcellations were used for subsequent cortical, subcortical and network analyses. In addition, measures for cortical thickness, surface area, volume and summaries of diffusion models of microstructure were estimated using A383 (https://doi.org/10.25663/brainlife.app.383) and A389 (https://doi.org/10.25663/brainlife.app.389). To estimate population receptive fields and visual field eccentricity properties in the cortical surface in the Oxford University Choroideremia & Stargardt’s Disease Dataset, the automated mapping algorithm developed by refs. 30,31 was implemented using A187 (https://doi.org/10.25663/brainlife.app.187). To segment thalamic nuclei for optic radiation tracking, the automated thalamic nuclei segmentation algorithm provided by Freesurfer28 was implemented as A222 (https://doi.org/10.25663/brainlife.app.222). Finally, visual regions of interest (ROI) binned by eccentricity were then generated using AFNI software32 functions implemented in A414 (https://doi.org/10.25663/brainlife.app.414). To assess the replicability capabilities of the platform, an automated hippocampal nuclei segmentation app (A262; https://doi.org/10.25663/brainlife.app.262) was used to segment hippocampal subfields from participants within the UPENN-PMC dataset provided within the Automated Segmentation of Hippocampal Subfields atlas.

dMRI processing

Preprocessing and model fitting

For most of the analyses involving the HCP dataset, the minimally preprocessed dMRI images were used and thus no further preprocessing was performed. However, to assess the validity of the preprocessing pipeline, the unprocessed dMRI data from the HCP test dataset and dMRI images were preprocessed following the protocol outlined in ref. 33 using A68 (https://doi.org/10.25663/brainlife.app.68). The same app was also used for preprocessing the dMRI images for the ABCD, Cam-CAN, PING, Oxford University Choroideremia & Stargardt’s Disease Dataset, the Indiana University Acute Concussion and HBN datasets. Specifically, dMRI images were denoised and cleaned from Gibbs ringing using functionality provided by MRTrix3 before being corrected for susceptibility, motion and eddy distortions and artifacts using FSL’s topup and eddy functions34,35. Eddy-current and motion correction was applied via the eddy_cuda8.0 with the replacement of outlier slices (that is, repol) command provided by FSL36,37,38,39. Following these corrections, MRTrix3’s dwigradcheck functionality was used to check and correct for potential misaligned gradient vectors following topup and eddy40. Next, dMRI images were debiased using ANT’s n4 functionality41 and the background noise was cleaned using MrTrix3.0’s dwidenoise functionality42. Finally, the preprocessed dMRI images were registered to the structural (T1w) image using FSL’s epi_reg functionality43,44,45. Following preprocessing, brain masks for dMRI data using bet from FSL were implemented as A163 (https://doi.org/10.25663/brainlife.app.163).

DTI, NODDI and q-sampling model fitting

Following preprocessing, the diffusion tensor imaging (DTI) model46 and the neurite orientation dispersion and density imaging (NODDI)47,48 models were subsequently fit to the preprocessed dMRI images for each participant using either A319 (https://doi.org/10.25663/brainlife.app.319) or A292 (https://doi.org/10.25663/brainlife.app.292) for DTI model fitting and A365 (https://doi.org/10.25663/brainlife.app.365) for NODDI fitting. Note, the NODDI model was only fit on the HCP, Cam-CAN, Oxford University Choroideremia & Stargardt’s Disease Dataset and the Indiana University Acute Concussion datasets. For those datasets, the NODDI model was fit using an intrinsic free diffusivity parameter (d) of 1.7 × 10−3 mm2 s−1 for white matter tract and network analyses, and a d of 1.1 × 10−3 mm2 s−1 for cortical tissue mapping analyses, using AMICO’s implementation48 as A365 (https://doi.org/10.25663/brainlife.app.365). The constrained spherical deconvolution49 model was then fit to the preprocessed dMRI data for each run across four spherical harmonic orders (that is, Lmax) parameters (2, 4, 6, 8) using functionality provided by MRTrix3 implemented as brainlife.io app A238 (https://doi.org/10.25663/brainlife.app.238). For the PING datasets, the constrained spherical deconvolution model was fit using the same code found in A238 (https://doi.org/10.25663/brainlife.app.238), but performed using the tractography app A319 (https://doi.org/10.25663/brainlife.app.319). For the HBN dataset, the isotropic spin distribution function was obtained by reconstructing the diffusion MRI data with the generalized q-sampling imaging method50 using functionality provided by DSI-Studio51 (A423; https://doi.org/10.25663/brainlife.app.423). Quantitative anisotropy was then estimated from the isotropic spin distribution function.

Tractography

Following model fitting, the fiber orientation distribution functions for Lmax = 6 and Lmax = 8 were subsequently used to guide anatomically constrained probabilistic tractography52 using functions provided by MRTrix3 implemented as brainlife.io app A297 (https://doi.org/10.25663/brainlife.app.297) or A319 (https://doi.org/10.25663/brainlife.app.319). For the HCPTR, HCPs1200 and Oxford University Choroideremia & Stargardt’s Disease datasets, Lmax = 8 was used. For the ABCD and Cam-CAN datasets, Lmax = 6 was used. For the HCP, ABCD and Cam-CAN, datasets, a total of 3 million streamlines were generated. For all datasets, a step size of 0.2 mm was implemented. For the HCPTR, HCPs1200, ABCD and Cam-CAN datasets, minimum and maximum lengths of streamlines were set at 25 and 250 mm, respectively, and a maximum angle of curvature of 35° was used. For the PING dataset, minimum and maximum lengths of streamlines were set at 20 and 220 mm, respectively, and a maximum angle of curvature of 35° was used.

Whiter matter segmentation and cleaning

Following tractography, 61 major white matter tracts were segmented for each run using a customized version of the white matter query language53 implemented as brainlife.io app A188 (https://doi.org/10.25663/brainlife.app.188). Outlier streamlines were subsequently removed using functionality provided by Vistasoft and implemented as brainlife.io app A195 (https://doi.org/10.25663/brainlife.app.195). Following cleaning, tract profiles with 200 nodes were generated for all DTI and NODDI measures across the 61 tracts for each participant and test–retest condition using functionality provided by Vistasoft and implemented as A361 (https://doi.org/10.25663/brainlife.app.361). Macrostructural statistics, including average tract length, tract volume and streamline count were computed using functionality provided by Vistasoft implemented as A189 (https://doi.org/10.25663/brainlife.app.189). Microstructural and macrostructural statistics were then compiled into a single data frame using A397 (https://doi.org/10.25663/brainlife.app.397).

Segmentation of the optic radiation

To generate optic radiations segmented by estimates of visual field eccentricity in the Oxford University Choroideremia & Stargardt’s Disease Dataset, ConTrack54 tracking was implemented as A252 (https://doi.org/10.25663/brainlife.app.252). Then, 500,000 sample streamlines were generated using a step size of 1 mm. Samples were then pruned using inclusion and exclusion waypoint ROI following methodologies outlined in refs. 19,55.

Segmentation of uncinate fasciculus

To assess the relationship between uncinate tract-average quantitative anisotropy, fractional anisotropy (FA) and early life stressors within two independent datasets (HBN, ABCD), the tract-average quantitative anisotropy for the left and right uncinate were computed from 42 participants from the HBN and the tract-average FA were computed from 1,107 participants from the ABCD dataset. For the HBN dataset, a full tractography segmentation pipeline was used to preprocess the dMRI data and segment the uncinate fasciculus using A423 (https://doi.org/10.25663/brainlife.app.423). Automatic fiber tracking was then performed to segment the uncinate fasciculus using default parameters and templates from a population tractography atlas from the HCP56. A threshold of 16 mm as the maximum allowed threshold for the shortest streamline distance was then applied to remove spurious streamlines. The whole tract-average quantitative anisotropy was then estimated. To probe stress exposure within the HBN dataset, we used the NLES, a 22-item questionnaire in which participants were asked about the occurrence of different stressful life events. The tractography pipeline for the ABCD dataset has been described previously. The average FA for the left and right uncinate were estimated using procedures described previously, and then compared to the participant’s life stressors behavioral measures by fitting a linear regression to the data.

Structural networks

Following tract segmentation, structural networks were generated using the multimodal 180 cortical node atlas and the tractograms for each participant using MRTrix3’s tck2connectome (ref. 57) functionality implemented as A395 (https://doi.org/10.25663/brainlife.app.395). Connectomes were generated by computing the number of streamlines intersecting each ROI pairing in the 180 cortical node parcellation. Multiple adjacency matrices were generated, including count, density (that is, the count divided by the node volume of the ROI pairs), length, length density (that is length divided by the volume of the ROI pairs) and average and average density axial diffusivity, fractional anisotropy, mean diffusivity, radial diffusivity, neurite density index, orientation dispersion index and isotropic volume fraction. Density matrices were generated using the -invnodevol option58. For non-count measures (length, axial diffusivity, fractional anisotropy, mean diffusivity, radial diffusivity, neurite density index, orientation dispersion index, isotropic volume fraction), the average measure across all streamlines connecting and ROI pair was computed using MRTrix3’s tck2scale functionality using the -precise option59 and the -scale_file option in tck2connectome. These matrices can be thought of as the ‘average measure’ adjacency matrices. These files were output as the ‘raw’ datatype and were converted to a conmat datatype using A393 (https://doi.org/10.25663/brainlife.app.393). Connectivity matrices were then converted into the ‘network’ datatype using functionality from Python functionality implemented as A335 (https://doi.org/10.25663/brainlife.app.335).

Cortical and subcortical diffusion and morphometry mapping

For the PING, HCPTR, HCPs1200, Cam-CAN and Indiana University Acute Concussion datasets, DTI and NODDI (if available) measures were mapped to each participant’s cortical white matter parcels following methods found in Fukutomi and colleagues18 using functions provided by Connectome Workbench60 implemented as brainlife.io app A379 (https://doi.org/10.25663/brainlife.app.379). A Gaussian smoothing kernel (full-width at half-maximum ~4 mm, σ = 5/3 mm) was applied along the axis normal to the midthickness surface, and DTI and NODDI measures were mapped using the wb_command -volume-to-surface-mapping function. Freesurfer was used to map the average DTI and NODDI measures within each parcel using functionality from Connectome Workbench using A389 (https://doi.org/10.25663/brainlife.app.389) and A483 (https://doi.org/10.25663/brainlife.app.483). Measures of volume, surface area and cortical thickness for each cortical parcel were computed using Freesurfer and A464 (https://doi.org/10.25663/brainlife.app.464). Freesurfer was also used to generate parcel-average DTI and NODDI measures for the subcortical segmentation (aseg) from Freesurfer using A383 (https://doi.org/10.25663/brainlife.app.383). Measures of volume for each subcortical parcel were computed using Freesurfer and A272 (https://doi.org/10.25663/brainlife.app.272).

rs-fMRI preprocessing and functional connectivity matrix generation

For the HCPTR and Cam-CAN datasets, unprocessed resting-state functional MRI (rs-fMRI) datasets were preprocessed using fMRIPrep implemented as A160 (https://doi.org/10.25663/brainlife.app.160). Briefly, fMRIPrep does the following preprocessing steps. First, individual images are aligned to a reference image for motion estimation and correction using mcflirt from FSL. Next, slice timing correction is performed in which all slices are realigned in time to the middle of each relaxation time using 3dTShift from AFNI. Spatial distortions are then corrected using field map estimations. Finally, the fMRI data is aligned to the structural T1w image for each participant. Default parameters provided by fMRIPrep were used. For a subset of analyses involving the HCP test and retest datasets, the preprocessed rs-fMRI datasets provided by the HCP consortium were used. Following preprocessing via fMRIPrep for the volume data, connectivity matrices were generated using the Yeo17 parcellation and A369 (https://doi.org/10.25663/brainlife.app.369) and A532 (https://doi.org/10.25663/brainlife.app.532). Within-network functional connectivity for the 17 canonical resting-state networks was computed by computing the average functional connectivity values within all of the nodes belonging to a single network. These estimates were used for subsequent analyses.

rs-fMRI gradient processing

For the HCPTR and Cam-CAN datasets, unprocessed rs-fMRI data from the HCP Test and Cam-CAN datasets were preprocessed using fMRIPrep implemented as A160 (https://doi.org/10.25663/brainlife.app.160). Within this app, the same preprocessing steps are undertaken as in A160 (https://doi.org/10.25663/brainlife.app.160), except for an additional volume-to-surface mapping using mri_vol2surf from Freesurfer. The surface-based outputs were then used to compute gradients following methodologies outlined in ref. 61 for each participant in the HCPs1200, HCPTR and Cam-CAN datasets using A574 (https://doi.org/10.25663/brainlife.app.574) using diffusion embedding62 and functions provided by BrainSpace63. More specifically, connectivity matrices were computed from surface vertex values within each node of the Schaffer 1,000 parcellation64. Cosine similarity was then computed to create an affinity matrix to capture inter-area similarity. Dimensionality reduction is then used to identify the primary gradients. A normalized-angle kernel was used to create the affinity matrix, from which two primary components were identified. Gradients were then aligned across all participants using a Procrustes alignment and joined embedding procedure61. Values from the primary gradient and the cosine distance used to generate the affinity matrices were used for subsequent analyses.

MEG processing

For some analyses, raw resting-state-MEG time series data from the Cam-CAN dataset was filtered using a Maxwell filter implemented as A476 (https://doi.org/10.25663/brainlife.app.476) and median split using A529 (https://doi.org/10.25663/brainlife.app.529). For the remainder of the analyses, filtered data provided by the Cam-CAN dataset was used. For all MEG data, power-spectrum density profiles (PSD) were estimated using functionality provided by MNE-Python28,65 implemented as A530 (https://doi.org/10.25663/brainlife.app.530). Following PSD estimation, peak alpha frequency was estimated using A531 (https://doi.org/10.25663/brainlife.app.531). Finally, PSD profiles were averaged across all nodes within each of the canonical lobes (frontal, parietal, occipital, temporal) using A599 (https://doi.org/10.25663/brainlife.app.599). Measures of PSD and peak alpha frequency were used for all subsequent analyses.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.