CHIASM, the human brain albinism and achiasma MRI dataset

We describe a collection of T1-, diffusion- and functional T2*-weighted magnetic resonance imaging data from human individuals with albinism and achiasma. This repository can be used as a test-bed to develop and validate tractography methods like diffusion-signal modeling and fiber tracking as well as to investigate the properties of the human visual system in individuals with congenital abnormalities. The MRI data is provided together with tools and files allowing for its preprocessing and analysis, along with the data derivatives such as manually curated masks and regions of interest for performing tractography.


Background & Summary
We present CHIASM, the human brain albinism and achiasma dataset, a unique collection of magnetic resonance imaging (MRI) data of brains with congenital abnormalities in the visual system. The distinguishing feature of these participants is the varied amount of fiber crossing found in a specific structure, the optic chiasm, across participants with albinism. More specifically, it is well established 1 that the number of fibers crossing at the human optic chiasm to reach the contralateral brain hemisphere varies between certain groups. The percentage of fibers crossing at the chiasm has been reported to be about 53% 2 for normal-sighted (control) participants. In brains affected by albinism, by contrast, the proportion of crossing fibers at the optic chiasm exceeds 53% 3 , whereas in individuals affected by chiasm hypoplasia it is lower than 53% 4 . Finally, individuals with achiasma have been shown to completely lack neuronal fiber crossing at the optic chiasm 1,5,6 .
The data we present here can be of value to the scientific community for multiple reasons. First, it can serve as a reference dataset to support basic research for clarifying the neuroscientific underpinnings of the different conditions. Currently, there are no reference datasets available covering similar conditions measured with high-resolution DWI data. Second, this dataset can be used by investigators to validate independent results and advance studies on the disease and neuroplasticity mechanisms. This is possible as chiasmal malformations induce abnormal representations within the visual pathways, which are expected to trigger neuroplastic mechanisms e.g. to resolve potential sensory conflicts 1,12-19 . Finally, the dataset presented here can be used to advance tractography methods development. The field of brain tractography has faced a long-lasting challenge commonly referred to as the "crossing fibers problem" or simply CFP 18,20-35 . CFP can lead to poor estimates of the number of crossing fibers through brain regions containing multiple fiber populations 36,37 . It has been established that up to 90% of total brain white matter volume might have crossing fibers 38 . Advancing methods for accurate tracking in regions with crossing fibers is fundamental in clarifying the role of white matter in human health and disease 18,39,40 . As of today, several important approaches to tractography evaluation and validation have been proposed. These approaches can be classified into four primary categories: synthetic phantoms 41,42 , physical phantoms 43 , biological phantoms 44,45 , and statistical 9,32,46-49 . Most of these approaches have helped advance tractography methods, but major challenges remain 30,31,42 .
The data made available here opens the possibility to assess crossing strength at the optic chiasm by first using anatomical data (T1w, DWI) to model the crossing at the optic chiasm and cross-validating the proposed findings with functional estimates (fMRI) of misrouting based on the BOLD signal 4,50,51 (see Suppl. Table 1). This provides a unique opportunity for testing novel tractography methods that assess crossing strength by providing an independent modality for their evaluation.
Methods
MRI Data sources. The described MRI data was analyzed in previously published studies 4,50,51 , where acquisition protocols and data properties are detailed.

Participants. A single participant with achiasma, a single participant with chiasm hypoplasia, 9 participants with diagnosed albinism, and 8 control participants [no neurological or ophthalmological history; normal visual acuity (≥1.0 with Freiburg Visual Acuity Test 52 ) and normal stereo vision 53,54 ] were recruited for the MRI measurements. Each participant was instructed about the purpose of the study and the methods involved and gave written informed consent to study participation and data sharing. The study was approved by the Ethics Committee of the Otto-von-Guericke University Magdeburg, Magdeburg, Germany. The patients and control participants underwent ophthalmological examination (Suppl. Table 1), which incorporated methods described in 55,56 .
[Figure caption fragment] The fMRI data is not provided due to severe nystagmus and motion compromising the quality of data. Top, middle and bottom rows display respectively T1w, DW, and fMRI data. Images show pseudo-axial views of a T1w image cropped to the brain mask.
MRI Data acquisition. MRI data was acquired with a Siemens MAGNETOM Prisma 3 Tesla scanner with the Syngo MR D13D software and a 64-channel head coil. The acquisition protocol for T1w and DW data was initiated by a localizer scan, followed by a whole-brain T1w 3D-MPRAGE scan and two DW scans, with anterior-posterior (A-P) and posterior-anterior (P-A) phase-encoding direction, respectively. T1w and DW images were collected during a single continuous scanning session; fMRI data was acquired in separate sessions (patient data was acquired on two consecutive days). T1w images were obtained in sagittal orientation using a 3D-MPRAGE sequence (TE/TR = 4.46/2600 ms, TI = 1100 ms, flip angle = 7°, resolution = 0.9 × 0.9 × 0.9 mm³, FoV: 230 × 230 mm²; image matrix: 256 × 256 × 176, acquisition time = 11 min:06 s 57 ) and were corrected during acquisition for gradient nonlinearity distortions.
Each individual's T1w data was screened by a radiologist for unexpected abnormalities present in the data. Apart from the abnormalities given in Methods (Participants), no clinically relevant abnormalities were detected.
DW images were acquired with an echo-planar imaging (EPI) sequence (TE/TR = 64.0/9400 ms, b-value 1600 s/mm², resolution 1.5 × 1.5 × 1.5 mm³, FoV 220 × 220 mm², anterior to posterior (A-P) phase-encoding direction, acquisition time = 22 min:24 s, no multi-band). The b-value was chosen with regard to reported optimal values for resolving two-way crossing 58 (1500-2500 s/mm²). Scans were performed with 128 gradient directions, so the obtained DWI data can be described as High Angular Resolution Diffusion Imaging 7 (HARDI) data. The redundantly high number of gradient directions for the maximal angular contrast provided by a b-value of 1600 s/mm² supported residual bootstrapping. This enhanced the effective signal-to-noise ratio (SNR), which is an important feature considering the reduced SNR of the DWI of the optic chiasm. The gradient scheme, initially generated using E. Caruyer's tool for q-space sampling 59 for a 3-shell acquisition, was narrowed to a single shell in order to meet the acquisition time constraints. The DW volumes were evenly interspersed with 10 non-diffusion-weighted (b-value = 0, hereafter referred to as b0) volumes for the purpose of motion correction. The second DW series was acquired with reversed phase-encoding direction relative to the first scan, specifically posterior to anterior (P-A). Apart from that, all scan parameters were identical to those of the preceding acquisition. Acquisition of two DW series with opposite phase-encoding directions enhanced the correction of geometrically induced distortions 60 . Furthermore, the additional scans improve the signal-to-noise ratio (SNR) of the total DWI data.
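As an illustration of the interleaved scheme described above, the following Python sketch builds a b-value list in which 10 b0 volumes are spread evenly among 128 diffusion-weighted volumes. The function name `interleave_b0`, the choice to start the series with a b0 volume, and the rounding rule are assumptions made for this example only; the actual volume order is defined by the b-value files that accompany the DWI data.

```python
import numpy as np


def interleave_b0(n_dw: int, n_b0: int) -> np.ndarray:
    """Return one label per volume (0 = b0, 1 = diffusion-weighted),
    with the b0 volumes spread evenly through the series."""
    total = n_dw + n_b0
    labels = np.ones(total, dtype=int)
    # Evenly spaced b0 positions; starting at index 0 is an assumption.
    b0_idx = np.floor(np.arange(n_b0) * total / n_b0).astype(int)
    labels[b0_idx] = 0
    return labels


labels = interleave_b0(n_dw=128, n_b0=10)
bvals = labels * 1600  # single shell at b = 1600 s/mm^2, as reported in the text
print(len(bvals), int((bvals == 0).sum()), int((bvals == 1600).sum()))
```

A regular b0 spacing of this kind gives the motion-correction step a low-distortion reference roughly every 14 volumes.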
fMRI data was acquired from 4 controls and 6 participants with albinism (see Suppl. Tables 1 and 2) with a T2*-weighted EPI sequence (TE/TR = 30.0/1500 ms, flip angle = 70°, resolution 2.5 × 2.5 × 2.5 mm³, FoV 210 × 210 mm², acquired with multi-band and in-plane acceleration factor = 2) during visual stimulation. Visual stimulation was performed in either the left, right or both visual hemifields in separate runs. A single repetition comprised 168 volumes acquired within 252 seconds. Each of these three stimulation conditions was repeated three times, resulting in a total of nine functional runs acquired within a single session. The visual stimulation is detailed in 51 . Briefly, it employed a moving high-contrast checkerboard pattern 61 presented within the aperture of a drifting bar (width: 2.5°) within a circular aperture (radius: 10°). The bar aperture moved in four directions (upwards, downwards, left, and right) across the stimulus window in 20 evenly spaced steps within 30 s. The sequence of the visual stimulation runs was interspersed with equally long (30 s) mean luminance blocks with zero contrast. The stimuli, generated with Psychtoolbox 62,63 in MATLAB (Mathworks, Natick, MA, USA), were projected onto a screen (resolution 1140 × 780 pixels) placed at the magnet bore. The participants viewed the stimuli monocularly with their dominant eye (see Suppl. Tables 1 and 2) via an angled mirror at a distance of 35 centimeters, and were instructed to fixate on a central dot and respond with a button press to dot color changes.
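The timing figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the values stated in the text (TR, number of volumes, bar steps, and sweep duration); the variable names are ours, and the conclusion that the bar advanced by one step per acquired volume is an inference from these numbers rather than a statement from the stimulus documentation.

```python
TR_S = 1.5          # repetition time in seconds
N_VOLUMES = 168     # volumes per run
BAR_STEPS = 20      # bar positions per sweep
BAR_SWEEP_S = 30.0  # duration of one sweep in seconds

run_duration_s = N_VOLUMES * TR_S         # total run length
step_duration_s = BAR_SWEEP_S / BAR_STEPS  # time spent at each bar position
volumes_per_sweep = BAR_SWEEP_S / TR_S     # volumes acquired during one sweep

print(run_duration_s, step_duration_s, volumes_per_sweep)
```

The run length agrees with the reported 252 s, and each bar step lasts exactly one TR.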
Data preprocessing. Data preprocessing was mainly performed online, using web services available on the brainlife.io platform (https://brainlife.io), with a few exceptional steps done offline. The source code for the Apps used for online preprocessing is available at https://github.com/brainlife. The offline preprocessing involved conversion of DICOM data to NIfTI format, data anonymization, and, in the case of DW data, correction of gradient nonlinearity distortions and alignment to the T1w image. The scripts for all of the offline preprocessing steps are available at https://github.com/rjpuzniak/CHIASM. Data preprocessing was meant to provide minimally processed data and standardized T1w, DWI, and fMRI data files.
The following software packages were used for data preprocessing: MRtrix 28
Preprocessing of the T1w data. In the offline preprocessing steps, T1w images were converted into the NIfTI format using dcm2niix 71 and subsequently anonymized through the removal of facial features using the mri_deface algorithm 78 from FreeSurfer 6.0.0. Anonymized T1w images were aligned to the Anterior Commissure - Posterior Commissure (ACPC) plane using the mrAnatAverageAcPcNifti.m command from the mrDiffusion package (https://github.com/vistalab/vistasoft/wiki/ACPC-alignment). The resulting T1w images were used as the reference image for the coregistration of DWI. Further, T1w images were automatically segmented into five-tissue-type (cerebrospinal fluid, white, grey, and subcortical grey matter, and possible pathological tissue; 5TT) segmented images 79 through the use of commands from FSL 6.0.3 65,80-82 . Finally, the T1w data was uploaded to brainlife.io, where it was segmented once again using FreeSurfer 7.1.1 (Fig. 2a, top row). Detailed information about the preprocessing code is provided in the Code Availability section (Table 1).
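For orientation, the offline T1w steps can be written down as a short list of command lines. The Python sketch below only assembles and prints the commands; it does not run them. All file names are hypothetical, the FreeSurfer template names passed to mri_deface are the ones commonly distributed with that tool, and the use of MRtrix's `5ttgen fsl` wrapper for the 5TT segmentation is our assumption (the text states only that FSL commands were used). The MATLAB-based ACPC alignment is omitted. The authors' actual scripts are in the CHIASM GitHub repository.

```python
import shlex
from typing import List


def t1w_offline_commands(dicom_dir: str, out_dir: str) -> List[List[str]]:
    """Build (but do not execute) command lines for the offline T1w steps."""
    t1w = f"{out_dir}/T1w.nii.gz"
    defaced = f"{out_dir}/T1w_defaced.nii.gz"
    # Hypothetical name for the output of the MATLAB AC-PC alignment step.
    acpc = f"{out_dir}/T1w_acpc.nii.gz"
    return [
        # DICOM -> compressed NIfTI
        ["dcm2niix", "-z", "y", "-f", "T1w", "-o", out_dir, dicom_dir],
        # Remove facial features (template file names are assumptions)
        ["mri_deface", t1w, "talairach_mixed_with_skull.gca", "face.gca", defaced],
        # Five-tissue-type segmentation (assumed wrapper around FSL tools)
        ["5ttgen", "fsl", acpc, f"{out_dir}/5tt.nii.gz"],
    ]


commands = t1w_offline_commands("dicom/T1w", "nifti")
for cmd in commands:
    print(shlex.join(cmd))
```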
www.nature.com/scientificdata
[Figure caption fragment] The red contour marks the brain mask estimated from the BOLD signal, the magenta contour marks the combined CSF and WM masks, where voxels with partial GM volume were removed, and the blue contour marks the top 2% most variable voxels within the brain mask.
Preprocessing of the DWI data. The DICOM DWI data preprocessing followed the well-established outline proposed by the Human Connectome Project (HCP) consortium 83 . Initially, DW files were converted offline into NIfTI format using dcm2niix and uploaded to brainlife.io. Next, the DWI data was corrected online for Rician noise using dwidenoise 84,85 and for Gibbs ringing using mrdegibbs 86 , both commands from MRtrix 3.0. The following preprocessing step involved estimation of the susceptibility-induced off-resonance field in the DW data with FSL's topup command 60,65 using the two DW series with opposite phase-encoding directions. The output of topup was subsequently fed to the eddy command 87 in order to correct for the susceptibility- and eddy current-induced off-resonance fields, as well as for motion. The topup and eddy commands were run through the dwifslpreproc command from MRtrix 3.0, whose final output was a single file containing the corrected DW series. In the final online preprocessing step, the data was corrected for field biases using dwibiascorrect from MRtrix 3.0, which, in turn, used the N4 algorithm from ANTS 88 in order to estimate the MR field inhomogeneity. At this stage, the DWI data were downloaded and corrected offline for gradient nonlinearity distortions. This step involved using the gradunwarp package and information about the Legendre coefficients in spherical harmonics for the scanner's gradient coil, provided by the vendor (stored in the https://github.com/rjpuzniak/CHIASM repository). As the final step of preprocessing, the DWI data was coregistered to the T1w data using Boundary-Based Registration (Fig. 2a, middle row). At first, the transformation matrix from DWI to T1w image space was estimated with the epi_reg command from FLIRT 89-91 , a part of the FSL 6.0.3 package. The transformation matrix was subsequently applied to the DWI data by the flirt command from the same package, and to the corresponding b-vectors by a shell script from the HCP repository (https://github.com/Washington-University/HCPpipelines/blob/master/global/scripts/Rotate_bvecs.sh). The resulting data, in NIfTI format, have been uploaded to brainlife.io and were published as a preprocessed DW data set. Detailed information about the preprocessing code is provided in the Code Availability section (Table 2).
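When a DWI series is moved into T1w space, the b-vectors have to follow the rotational part of the transformation; translations and scaling must not affect them. The NumPy sketch below illustrates this idea by extracting the closest rotation from a 4 × 4 affine with an SVD-based polar decomposition. It is an illustrative stand-in and not the HCP `Rotate_bvecs.sh` script; the function names are ours, and image-handedness conventions that matter for FSL-style b-vectors are deliberately ignored.

```python
import numpy as np


def rotation_from_affine(affine: np.ndarray) -> np.ndarray:
    """Closest proper rotation to the 3x3 part of a 4x4 affine (polar decomposition)."""
    u, _, vt = np.linalg.svd(affine[:3, :3])
    rot = u @ vt
    if np.linalg.det(rot) < 0:  # avoid returning a reflection
        u[:, -1] *= -1
        rot = u @ vt
    return rot


def rotate_bvecs(bvecs: np.ndarray, affine: np.ndarray) -> np.ndarray:
    """Rotate a 3xN array of b-vectors; zero vectors (b0 volumes) stay zero."""
    return rotation_from_affine(affine) @ bvecs


# Toy example: 90 degree rotation about z plus a translation.
affine = np.array([[0.0, -1.0, 0.0, 5.0],
                   [1.0, 0.0, 0.0, -3.0],
                   [0.0, 0.0, 1.0, 2.0],
                   [0.0, 0.0, 0.0, 1.0]])
bvecs = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])  # columns: x direction, b0, y direction
rotated = rotate_bvecs(bvecs, affine)
print(np.round(rotated, 6))
```

The translation in the toy affine has no effect on the result, which is the property that distinguishes b-vector rotation from resampling the image itself.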
Preprocessing of the fMRI data. The fMRI data was converted into NIfTI format using dcm2niix. Subsequent preprocessing was performed online using two Apps wrapping the fMRIPrep tool 74 : fMRIPrep - Surface output 92 (which outputs data as surface vertices) and fMRIPrep - Volume output 93 (which outputs data in volumetric format). The preprocessing, in both cases, involved correction for susceptibility distortions using antsRegistration from ANTs 2.3.3, registration to the T1w image using the bbregister command from FreeSurfer 6.0.1 (Fig. 2a, bottom row), slice-time correction using 3dTshift from AFNI, and correction for head motion. Additionally, the BOLD (blood-oxygen-level-dependent) data was subject to Component-Based Noise Correction (CompCor) 94 , which uses principal components from noise-driven regions (defined as the top 2% most variable voxels in the BOLD image; Fig. 2b, blue contour) in order to reduce the standard deviation of resting-state BOLD data. The selection of noise-driven regions was limited to voxels not affected by gray matter partial volume (Fig. 2b, pink contour). The output files created during preprocessing with the fMRIPrep Apps are described in detail in the Data Records section. Detailed information about the preprocessing pipeline is provided in the HTML report files generated by the fMRIPrep application in the Data Records, while the code is provided in the Code Availability section (Table 3).
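The principle behind the CompCor step can be sketched in a few lines of NumPy: select the most variable voxels within a candidate mask, take their leading principal components, and regress these out of every voxel's time series. This is a simplified illustration on synthetic data, not fMRIPrep's implementation; the function names, the number of components (5), and the toy data are assumptions.

```python
import numpy as np


def tcompcor_regressors(bold, mask, top_fraction=0.02, n_components=5):
    """bold: (time, voxels) array; mask: boolean (voxels,) array of candidate voxels.
    Returns the leading principal-component time courses of the most variable voxels."""
    ts = bold[:, mask]
    n_keep = max(1, int(np.ceil(top_fraction * ts.shape[1])))
    noisy = np.argsort(ts.var(axis=0))[-n_keep:]  # top-variance voxels
    noise_ts = ts[:, noisy] - ts[:, noisy].mean(axis=0)
    u, _, _ = np.linalg.svd(noise_ts, full_matrices=False)
    return u[:, :n_components]


def regress_out(bold, regressors):
    """Remove the fitted contribution of the regressors, keeping each voxel's mean."""
    design = np.column_stack([np.ones(regressors.shape[0]), regressors])
    beta, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return bold - regressors @ beta[1:]


# Synthetic data: a shared fluctuation that is strongest in ten voxels.
rng = np.random.default_rng(0)
n_timepoints, n_voxels = 100, 500
bold = rng.normal(size=(n_timepoints, n_voxels))
drift = np.sin(np.linspace(0, 6 * np.pi, n_timepoints))
bold[:, :10] += 5.0 * drift[:, None]
bold[:, 10:] += 0.5 * drift[:, None]

mask = np.ones(n_voxels, dtype=bool)
regressors = tcompcor_regressors(bold, mask)
cleaned = regress_out(bold, regressors)
print(regressors.shape, float(bold.std()), float(cleaned.std()))
```

In practice the confound time courses written by fMRIPrep (Box 6f) would be used as regressors instead of recomputing them.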
Drawing of the optic chiasm mask. Due to the limited accuracy of the automatically generated optic chiasm mask (Fig. 3a), manual segmentation was necessary to ensure the proper anatomical definition of the structures in each participant. The procedure comprised the following steps: 1) Initial segmentation of voxels unambiguously belonging to the optic chiasm (i.e. outer voxels affected by partial volume effects were excluded). This segmentation was performed only in an axial view and was done in multiple slices covering the optic nerves, optic chiasm, and optic tract. 2) Inclusion of the voxels affected by partial volume effects that were previously omitted. The two main criteria for the inclusion of candidate voxels were (a) relative intensity (compared to neighboring voxels identified in the previous step) and (b) the coherence/continuity of the optic chiasm structure (already defined by voxels selected in the previous step). 3) Final corrections performed in axial, coronal, and sagittal views at the same time. The main criterion here was to ensure continuous borders.
The resulting masks covered the posterior optic nerves, the whole optic chiasm, and the anterior optic tracts (Fig. 3b) and were used for correction of the white matter definition in the previously generated 5TT masks. The corrected white matter masks were extracted from 5TT images using the mrconvert command, transformed to the space of the T1w image (in order to ensure matching of QForm and SForm transformation matrices) with the flirt command from FSL and mrconvert commands from MRtrix, and uploaded to brainlife.io. Detailed information about the code for preprocessing can be found in the Code Availability section (Table 4).
Drawing regions of interest in the optic chiasm. Four ROIs were manually drawn and curated in each participant (Fig. 3c) on the T1w images. These ROIs identified the anterior and posterior aspects of the optic chiasm in each individual. The two anterior ROIs identified the location of the left and right optic nerve (Fig. 3c,e, yellow and magenta). The posterior ROIs identified the left and right optic tract (Fig. 3c,f, cyan and red). Once created, the ROIs were transformed to the space of the T1w image and thresholded with the mrconvert, mrtransform, and mrthreshold commands from MRtrix (the thresholding served to remove interpolation artifacts) and uploaded to brainlife.io. Detailed information about the code for preprocessing can be found in the Code Availability section (Table 4).
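Resampling a binary ROI to a new grid produces fractional values along its border, which is why a thresholding step follows the transformation. The NumPy sketch below shows the idea on a handful of values; the 0.5 cutoff and the function name are assumptions for illustration, and the published pipeline used MRtrix's mrthreshold rather than this code.

```python
import numpy as np


def binarize_resampled_roi(roi: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn an interpolated (fractional) ROI back into a binary mask."""
    return (roi > threshold).astype(np.uint8)


# Values as they might look along the border of a resampled ROI.
resampled = np.array([0.0, 0.12, 0.49, 0.51, 0.93, 1.0])
binary = binarize_resampled_roi(resampled)
print(binary)
```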
Diffusion signal reconstruction, tractography, and statistical evaluation. The whole-brain tractography was performed using MRtrix 0.2.12 64 . The tractography was based on diffusion tensor (DT) and constrained spherical deconvolution (CSD) models and was performed using both deterministic and probabilistic methods 25,28,95-99 . The DT model was used for deterministic tracking; in the case of CSD, both deterministic and probabilistic tracking were applied. The tractography utilized Anatomically-Constrained Tractography 79 , where the tracking was restricted to the gray matter-white matter boundary from the newly created FreeSurfer 7.1.1 segmentation of the provided T1w image to ensure the agreement of the obtained tracks with the underlying anatomical [...]
Table 3. List of preprocessing steps applied to the fMRI images, together with web links to relevant software and, if available, brainlife.io Apps. Web services used to process that are available for reuse on https://brainlife.io/apps.
The whole-brain tractogram was evaluated and optimized using the Linear Fascicle Evaluation method 9,49 (LiFE). Over the course of the evaluation process, every streamline is assigned a weight indicating its unique contribution in explaining the measured diffusion signal based on a tensor fit of the preprocessed diffusion data. Streamlines with non-zero weights are deemed significant, while the others are discarded. The brainlife.io application implementing LiFE evaluation 9 can be found at 110 .
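The weighting step described above amounts to a non-negative least-squares problem: find weights w ≥ 0 such that the streamline-wise signal predictions, stacked as the columns of a matrix M, best reproduce the measured signal y. The sketch below solves a toy instance with projected gradient descent in NumPy. It is a didactic stand-in and not the LiFE implementation; the matrix, the signal values, and the solver are assumptions, and the numbers carry no physical meaning.

```python
import numpy as np


def nnls_projected_gradient(M: np.ndarray, y: np.ndarray, n_iter: int = 500) -> np.ndarray:
    """Minimise ||M w - y||^2 subject to w >= 0 by projected gradient descent."""
    step = 1.0 / np.linalg.norm(M, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(M.shape[1])
    for _ in range(n_iter):
        grad = M.T @ (M @ w - y)
        w = np.maximum(0.0, w - step * grad)  # project onto the non-negative orthant
    return w


# Toy model: three "streamlines" (columns) predicting four signal samples.
M = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
y = np.array([1.0, 1.0, -0.5, 4.0])

weights = nnls_projected_gradient(M, y)
kept = np.flatnonzero(weights > 0)  # streamlines that survive the filtering
rmse = float(np.sqrt(np.mean((M @ weights - y) ** 2)))
print(weights, kept, rmse)
```

In this toy case the second streamline receives a zero weight and would be discarded, while the root-mean-square error summarises how well the surviving streamlines explain the signal.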

Data Records
The data includes T1w, DW, and (if available) fMRI images of a single participant with achiasma (ACH1), a single participant with chiasm hypoplasia (CHP1), 9 participants with albinism (ALB1-ALB9), and 8 control participants (CON1-CON8). The data from control participants are provided under an open license. To ensure the anonymity of the participants with clinical conditions, their data are made available upon direct request (as regulated by the Data Use Agreement, Suppl. Box 1).
The data is publicly accessible via the brainlife.io platform 11 at https://doi.org/10.25663/brainlife.pub.9 111 . When downloaded, the files are organized as defined by brainlife.io DataTypes (https://brainlife.io/docs/user/datatypes/ and https://brainlife.io/datatypes) and, if applicable, according to the most recent version of the Brain Imaging Data Structure specification 112 (BIDS). Because the BIDS format is still under development, it does not currently support all the data derivative types presented here; the data records detailed below are therefore presented according to brainlife.io Data Types. The data files stored for each subject on brainlife.io can be divided into three general categories: (A) source data, which consist of the anonymized T1w image aligned to the anterior commissure - posterior commissure (AC-PC) space and the raw DW and fMRI data in NIfTI format, (B) preprocessed data, which consist of preprocessed DW and fMRI data, as described in the Data Preprocessing section, and (C) data derivatives, as described in the Data Derivatives section. Additionally, the fMRI NIfTI data stored on brainlife.io are provided together with (D) MrVista .mat files (further referred to as "fMRI meta-files"), which are necessary for the analysis of the former. Those files are stored in a separate Open Science Framework (OSF) repository, https://doi.org/10.17605/osf.io/XZ29Q 113 , and are described in detail in the 'fMRI meta-files' section.
Source data. Source data (raw) files consist of two DWI datasets and one T1w set per participant and, in the case of 6 participants from the albinism group and 4 controls, fMRI T2*-weighted images.
DW source data. Source DWI data covers two DW series acquired with opposite phase encoding directions (PEDs), Anterior-Posterior (AP, Box 1a) and Posterior-Anterior (PA, Box 1b), as indicated by the tags.

fMRI source data. The source fMRI data is available for 6 participants with albinism (ALB1, ALB5, ALB6, ALB7, ALB8 and ALB9) and 4 controls (CON1, CON2, CON3 and CON8) and incorporates BOLD series acquired in 6 runs (3 runs corresponding to monocular stimulation of the right visual hemifield, and 3 runs for the left), except for ALB5 (3 runs for the right and 2 for the left hemifield) and CON1 (2 runs for the right and 3 for the left hemifield). Importantly, the fMRI files for each participant are stored in separate sessions (e.g. CON1/run4, Box 2).
Preprocessed data. Preprocessed files are divided into 2 main categories: DW and fMRI files. The former are stored in each participant's main folder, whereas fMRI files, if provided, are stored in folders corresponding to separate sessions.
DW preprocessed data. Preprocessed DW data consists of two files, tagged as "preprocessed" and "clean". The "preprocessed" tag marks the data (Box 4) that has been processed online but lacks correction for gradient nonlinearity distortions and was not aligned to the T1w image (these last two steps were performed offline); the details are described in the "Data preprocessing" section. The "clean" tag marks the data that has been completely preprocessed and aligned to the T1w image (Box 5). Consequently, the files tagged as "clean" are recommended for further analyses.
fMRI preprocessed data. fMRI data processing was performed both for surface and volume representations of the data, and in both cases several output files were created.
In the case of the surface output 92 , the output files consist of surface vertices (3D mesh) for the pial and white matter surfaces, as well as an inflated representation, defined for both hemispheres (Box 6a), surface data in NIfTI format containing measures at each vertex (Box 6b), surface time series data in CIFTI format (Box 6c), an HTML preprocessing report (Box 6d), a volumetric brain mask (Box 6e), and confounds (nuisance regressors) representing fluctuations with a potential non-neuronal origin, identified using CompCor (Box 6f). The volume output 93 shares a majority of files with the surface preprocessing (Box 6c-f), except for files containing data in surface representation (Box 6a,b). Furthermore, two additional files are included in the volumetric output: a brain mask based on the BOLD image (Box 7a) and the volumetric BOLD image (Box 7b).

Data derivatives.
Provided data derivatives consist of manually curated and automatically generated white matter masks, custom ROIs, T1w image segmentation, tractograms, and filtered tractograms.

Box 6
Organization of the fMRI preprocessed data files (surface output) according to the brainlife.io Data Types. (a) surface vertices (3D mesh) for the pial and white matter surfaces, as well as an inflated representation, defined for both hemispheres, (b) surface data in NIfTI format containing measures at each vertex, (c) surface time series data in CIFTI format, (d) HTML preprocessing report, (e) volumetric brain mask and (f) confounds (nuisance regressors) representing fluctuations with a potential non-neuronal origin, identified using CompCor.
(a) proj-5ddfa986936ca339b1c5f455/sub-{}.ses-run{}/dt-neuro-surface-vertices.
Manually curated and automatically generated masks. White matter masks manually curated in the optic chiasm region (Fig. 3b; creation described in "Data preprocessing" section) sampled to match the original T1w image resolution (Box 8):
Additional white matter mask (created from FreeSurfer segmentation of white matter), generated by brainlife.io App performing tractography 109
T1w image segmentation. A FreeSurfer (v7.1.1) segmentation of the T1w image, which was generated as a part of data preprocessing (see section "Data preprocessing") and was used in tractography (Box 11a) and fMRI data preprocessing (Box 11b), is provided exclusively in brainlife.io Data Types format.

Box 9
Organization of the data files of automatically generated optic chiasm masks according to the brainlife.io Data Types.
Tractograms and filtering results. The results of tractography performed for the purpose of technical validation of the DW data (Box 12a) and the results of its filtering with LiFE (Box 12b) are provided as part of the repository:
fMRI meta-files. fMRI meta-files (for the subset of 6 albinism and 4 control participants for which fMRI source data are provided) are available on the Open Science Framework (OSF) platform: https://doi.org/10.17605/osf.io/XZ29Q 113 . The files are in MATLAB .mat format and provide all the necessary information for performing the retinotopy data analysis using the MrVista package (https://github.com/vistalab/vistasoft). For each participant there is a total of 6 files: a description of the visual stimulus presented during left and right visual hemifield stimulation (Box 13a,b, respectively), full information about the acquisition parameters, participant's response and stimulus for left and right visual hemifield stimulation (Box 13c,d, respectively; this also includes the contents of the visual stimulus corresponding to the given hemifield stimulation as in Box 13a,b, respectively), a file containing all parameters necessary for initialization of a session in MrVista (such as paths to files required in the analysis; Box 13e), and a mrSession file storing all information about the analysis (Box 13f):

Technical Validation
This section provides a quality assessment of the published DW and fMRI data and is based on a previously published approach 11 comprising qualitative and quantitative measures.
Qualitative assessment. The qualitative assessment involves (A) demonstration of the quality of alignment between anatomical, DWI, and fMRI images, and (B) demonstration of reconstruction of diffusion signal and tractography in the optic chiasm.
Registration of anatomical, DW, and fMRI data. A critical step in data preprocessing is to obtain a precise alignment between images of various modalities (T1w, DWI, and fMRI images; for a detailed description see Methods). The quality of registration is demonstrated by overlaying the FreeSurfer 7.1.1 segmentation contours of the white matter and pial surfaces (Fig. 2a; blue and red colors, respectively) on top of the T1w image (from which they were derived; Fig. 2a, top row), the DWI (Fig. 2a, middle row) and the BOLD image (Fig. 2a, bottom row) for a representative participant (CON1).
Diffusion signal reconstruction and tractography in the optic chiasm. Considering the role of optic chiasm malformations as a major factor driving group differences, the quality assessment included diffusion signal modeling and tractography in this structure. Figure 4a displays representative optic chiasms in the T1w images, next to aligned dMRI b0 images (Fig. 4b). The DWI signal in each voxel was modeled using a CSD 106,114 model, in which an estimated single fiber response (SFR; Lmax = 6) was used as a deconvolution kernel for calculating the fiber orientation distribution function 20 (FOD; Lmax = 12) from the acquired DWI. Figure 4c demonstrates the fit of the calculated FOD in the optic chiasm region. Figure 4d demonstrates tracking results in the region of the optic chiasm. The presented results are limited to probabilistic CSD-based tractography (iFOD2 algorithm, step size = 0.75 mm, FOD cutoff amplitude = 0.06, maximum angle between successive steps = 45°) based on the already calculated FODs (Lmax = 12; Fig. 4c), which was done between pairs of ROIs and within the manually corrected white matter mask defined in the Data derivatives paragraph of the Methods section. For the purpose of clarity, only 0.25% of the total number of generated streamlines is displayed.
Participant's motion during DW data acquisition. The participants' motion in DWI has been calculated for the concatenated AP and PA series (acquired subsequently during a single scanning session, see Methods) by calculating the RMS of each voxel's displacement using the eddy command from FSL. The displacement calculation used the first acquired volume as a reference for all volumes and included only voxels within the brain mask. The slow, yet steadily increasing drift visible for all participants (Fig. 5a) can be well tracked with the b0 images interspersed in the DW series, which benefited motion correction.
While the lowest RMS was observed for the control group (which can be explained by the inclusion of trained control participants, well accustomed to MRI scanning), no evidence for a difference in mean displacement RMS between participants with albinism and controls was found (Student's t-test p-value = 0.11). It should be noted that the increased motion shown by achiasmatic participant ACH1 matches the observations from the fMRI scanning session, where the data had to be discarded due to extensive motion.
The output data of eddy describing motion for each participant is available online in the GitHub repository (https://github.com/rjpuzniak/CHIASM/tree/main/Plots/Fig.5_Motion), where it is provided together with the MATLAB code of Fig. 5a.
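The motion measure used here, the root-mean-square displacement of brain voxels relative to a reference volume, can be illustrated with a short NumPy sketch. It applies a rigid 4 × 4 transform to a set of voxel coordinates and reports the RMS of the resulting displacements. The function name and the toy coordinates are assumptions; in the published analysis these values were produced by FSL's eddy, not by this code.

```python
import numpy as np


def rms_displacement(transform: np.ndarray, coords_mm: np.ndarray) -> float:
    """RMS displacement (mm) of N voxel positions (N x 3) under a 4x4 transform."""
    homogeneous = np.c_[coords_mm, np.ones(len(coords_mm))]
    moved = (homogeneous @ transform.T)[:, :3]
    return float(np.sqrt(np.mean(np.sum((moved - coords_mm) ** 2, axis=1))))


# Toy example: a pure translation of 3 mm in x and 4 mm in y.
transform = np.eye(4)
transform[:3, 3] = [3.0, 4.0, 0.0]
brain_voxels_mm = np.array([[0.0, 0.0, 0.0],
                            [10.0, 0.0, 0.0],
                            [0.0, 20.0, 5.0]])
rms = rms_displacement(transform, brain_voxels_mm)
print(rms)
```

For a pure translation every voxel moves by the same amount, so the RMS equals the translation length; with rotations, voxels far from the rotation axis contribute more, which is why the calculation is restricted to the brain mask.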
SNR in DW data. In order to evaluate the quality of the DWI data, the signal-to-noise ratio (SNR) of raw and preprocessed images was measured. The computations were performed for b0 (Fig. 5b) and diffusion-weighted images (along the X-, Y- and Z-axes; Suppl. Fig. 1) for corpus callosum and optic chiasm voxels. Specifically, the SNR was defined as the mean ratio of the signal in voxels (from the selected structure) to the standard deviation of noise (measured in voxels outside the brain), as described in 37,115 . In the case of the corpus callosum, the SNR was calculated separately for the b0 images of the two raw DW series (one with AP PED and one with the opposite PA PED) and for the fully preprocessed DW image ("corrected"). As expected, the SNR increases after preprocessing (unwarping) (Fig. 5b).
A comparable analysis of SNR in the optic chiasm was obstructed by the severe geometry-induced distortions present in this region, which spatially warp the chiasm. Although in theory this problem can be addressed by using new sets of optic chiasm masks (created separately for images warped in the AP and PA directions), drawing new masks on top of DW images in heavily distorted regions is extremely challenging in practice and highly likely to introduce inaccuracies. Instead, the SNR was calculated for the fully preprocessed (corrected) DWI images, where the optic chiasm mask was cropped from a manually curated white matter mask (Fig. 5b). The observed higher SNR in the optic chiasm (compared to the corpus callosum) could be due to multiple reasons that were not tested by the authors. We speculate that it might result from using a 64-channel radio-frequency coil, which measures stronger signals from peripheral brain regions than from deeper regions such as the corpus callosum. The brainlife.io application implementing the SNR computation in the corpus callosum can be found at 116 ; it follows the SNR calculation strategy outlined in 37,115 .
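The SNR definition used above (signal in the voxels of a structure divided by the standard deviation of the background noise, then averaged) can be illustrated with a minimal sketch. The function name is hypothetical, the use of the population standard deviation is an assumption, and the voxel intensities are placeholders rather than measured values.

```python
import statistics


def roi_snr(roi_signal, background_noise):
    """Mean over ROI voxels of (voxel signal / SD of background noise)."""
    # Assumption: population SD of intensities sampled outside the brain.
    noise_sd = statistics.pstdev(background_noise)
    return statistics.fmean([s / noise_sd for s in roi_signal])


# Placeholder intensities (NOT real data).
roi_signal = [100.0, 120.0, 110.0]
background_noise = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
example_snr = roi_snr(roi_signal, background_noise)
```

In practice the two voxel lists would be extracted from the image with a structure mask and a background mask, respectively.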
Coefficients of the single fiber response (SFR) function. The quality of the DWI modeling was assessed by plotting the zero-phase coefficients of the SFR function (Fig. 5c), calculated for L max = 6 with the dwi2response command from MRtrix, which used an iterative algorithm for SFR voxel selection and response function estimation 107 . The observed plot agrees with theoretical expectations: successive non-zero (even) terms are of opposite signs and of decreasing absolute value.
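The expected pattern of the SFR coefficients can be checked programmatically. The sketch below is only an illustration: the function name is hypothetical and the coefficient values are placeholders, not the values plotted in Fig. 5c. For L max = 6 there are four even-order terms (l = 0, 2, 4, 6).

```python
def follows_expected_pattern(coeffs):
    """True if successive coefficients alternate in sign and shrink in magnitude.

    coeffs: zero-phase (m = 0) coefficients for l = 0, 2, 4, ...
    """
    pairs = list(zip(coeffs, coeffs[1:]))
    signs_alternate = all(a * b < 0 for a, b in pairs)
    magnitudes_decrease = all(abs(a) > abs(b) for a, b in pairs)
    return signs_alternate and magnitudes_decrease


# Placeholder coefficients for l = 0, 2, 4, 6 (NOT real data).
plausible = [600.0, -250.0, 80.0, -15.0]
implausible = [600.0, 250.0, 80.0, 15.0]

plausible_ok = follows_expected_pattern(plausible)
implausible_ok = follows_expected_pattern(implausible)
```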

Quality of tractography. Finally, the quality of the created tractogram was assessed with the LiFE algorithm (see Diffusion signal reconstruction, tractography and statistical evaluation in Methods). Figure 5d shows the relationship between the number of fascicles with non-zero weights (y-axis) and the connectome root mean square error (RMSE; x-axis) 9,103 . The data points for all CHIASM participants combine a high number of weighted fascicles (i.e. fascicles contributing positively to the measured signal; y-axis in Fig. 5d) with a low connectome RMSE (measuring the discrepancy between the signal predicted from the weighted fascicles and the measured signal; x-axis in Fig. 5d). These measures replicate previous findings for high-quality diffusion scans (HCP/O3D 11,117 ).
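The two summary quantities in Fig. 5d can be illustrated with a toy example. This is not the LiFE implementation: it assumes the fascicle weights have already been fitted and only shows how a predicted signal, its RMSE against the measured signal, and the count of non-zero weights relate. All names and numbers are hypothetical placeholders.

```python
import math


def predict_signal(fascicle_signals, weights):
    """Weighted sum of per-fascicle signal contributions at each measurement."""
    n_measurements = len(fascicle_signals[0])
    return [sum(w * f[i] for f, w in zip(fascicle_signals, weights))
            for i in range(n_measurements)]


def rmse(measured, predicted):
    """Root mean square error between measured and predicted signal."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted))
                     / len(measured))


# Placeholder contributions of three fascicles to three measurements (NOT real data).
fascicle_signals = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
weights = [2.0, 0.0, 1.0]      # assumed to come from a prior non-negative fit
measured = [3.0, 2.0, 4.0]

predicted = predict_signal(fascicle_signals, weights)
connectome_rmse = rmse(measured, predicted)
n_nonzero = sum(1 for w in weights if w > 0)
```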
tSNR of fMRI data. The quality of the fMRI images was assessed using the temporal SNR (tSNR) measure (Fig. 6). Specifically, the tSNR in the BOLD images was calculated for two areas: the whole-brain volume (derived from the BOLD images with the fMRIPrep - Volume Output 93 App) and the primary visual cortex (V1) mask derived from the T1w image using Benson's atlas [118][119][120] . The tSNR was calculated using the code provided in 121 (https://github.com/psychoinformatics-de/studyforrest-data-aligned). The mean tSNR values calculated for participants with albinism and controls (51.0 and 57.7, respectively) correspond well to the tSNR previously reported for voxel volumes of 16.625 mm³ 122 .
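For illustration, the sketch below uses the common definition of tSNR as the temporal mean of a voxel's time series divided by its temporal standard deviation, averaged over the voxels of a mask. It is not the referenced code, which may apply additional steps such as detrending; the function names are hypothetical and the time series are placeholders.

```python
import statistics


def voxel_tsnr(time_series):
    """Temporal mean divided by temporal (population) standard deviation."""
    return statistics.fmean(time_series) / statistics.pstdev(time_series)


def region_tsnr(voxel_time_series):
    """Average tSNR across all voxels of a region (e.g. a V1 mask)."""
    return statistics.fmean([voxel_tsnr(ts) for ts in voxel_time_series])


# Placeholder BOLD time series for two voxels (NOT real data).
voxels = [
    [48.0, 52.0, 48.0, 52.0],
    [99.0, 101.0, 99.0, 101.0],
]
example_tsnr = region_tsnr(voxels)
```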
Population Receptive Fields (pRF) Mapping. The pRF-mapping data derivatives and methods were described in a previous publication 51 . Briefly, the pRF sizes and positions can be estimated from the fMRI data and the time course of the visual stimulus position. The BOLD response of each voxel can be predicted using a circular 2D-Gaussian model of the neuronal population's receptive field, defined by three stimulus-referred parameters, i.e. x0, y0 and σ, where x0 and y0 are the coordinates of the receptive field center and σ is its spread 61 . The predicted BOLD signal is calculated by convolving the stimulus sequence, as seen through the respective pRF model and its three parameters, with the canonical hemodynamic response function 123 . Based on this, the optimal pRF parameters are found by minimizing the residual sum of squared errors (RSS) between the predicted and observed BOLD time courses. Only voxels whose explained variance exceeded a threshold of 15% were retained.
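The pRF fitting procedure summarized above can be sketched on a toy problem. The example below is a simplified illustration under several assumptions: a coarse grid search stands in for the actual optimizer, the prediction is scaled by least squares without an offset term, and the short kernel `toy_hrf` is a placeholder rather than the canonical hemodynamic response function. All names, stimulus frames and parameter values are hypothetical.

```python
import math


def gaussian_prf(x, y, x0, y0, sigma):
    """Circular 2D Gaussian receptive field evaluated at (x, y)."""
    return math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))


def predict_bold(frames, grid, x0, y0, sigma, hrf):
    """Overlap of each binary stimulus frame with the pRF, convolved with the HRF."""
    weights = [gaussian_prf(x, y, x0, y0, sigma) for x, y in grid]
    neural = [sum(s * w for s, w in zip(frame, weights)) for frame in frames]
    # Causal discrete convolution, truncated to the length of the time course.
    return [sum(hrf[k] * neural[t - k] for k in range(len(hrf)) if t - k >= 0)
            for t in range(len(neural))]


def explained_variance(observed, predicted):
    """1 - RSS/TSS after least-squares scaling of the prediction."""
    scale = (sum(o * p for o, p in zip(observed, predicted))
             / sum(p * p for p in predicted))
    rss = sum((o - scale * p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = sum(observed) / len(observed)
    tss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - rss / tss


def fit_prf(observed, frames, grid, candidates, hrf):
    """Grid search; maximizing explained variance is equivalent to minimizing RSS."""
    best_ev, best_params = None, None
    for x0, y0, sigma in candidates:
        ev = explained_variance(observed, predict_bold(frames, grid, x0, y0, sigma, hrf))
        if best_ev is None or ev > best_ev:
            best_ev, best_params = ev, (x0, y0, sigma)
    return best_ev, best_params


# Toy stimulus: a single pixel stepping across five positions, then three blanks.
grid = [(x, 0.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
frames = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
frames += [[0.0] * 5 for _ in range(3)]
toy_hrf = [0.0, 0.6, 0.3, 0.1]  # placeholder kernel, NOT the canonical HRF

# Simulated "observed" time course from known parameters (x0=1, y0=0, sigma=1).
observed = predict_bold(frames, grid, 1.0, 0.0, 1.0, toy_hrf)
candidates = [(x0, 0.0, s) for x0 in (-1.0, 0.0, 1.0) for s in (0.5, 1.0, 2.0)]

best_ev, best_params = fit_prf(observed, frames, grid, candidates, toy_hrf)
keep_voxel = best_ev > 0.15  # retention threshold of 15% explained variance
```

Because the simulated time course is noise-free, the grid search recovers the generating parameters exactly; with measured data the explained variance would be lower and the 15% threshold would exclude poorly fitted voxels.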

Usage Notes
The data are organized according to the BIDS standard 112 whenever applicable (additionally, in all cases the data are organized according to brainlife.io Data Types) and are stored in the documented standard NIfTI format. The data can be accessed on the brainlife.io computing platform through (A) the brainlife.io web interface and/or (B) a command-line interface (CLI; https://github.com/brainlife/cli). The CLI offers means to query and download partial or full data. The web interface extends this functionality: in addition to selecting and downloading data, it allows online processing with the provided brainlife.io Apps (Table 5).