A Robust Image Registration Interface for Large Volume Brain Atlas

Accurately mapping brain structures in three dimensions is critical for an in-depth understanding of brain functions. Using the brain atlas as a hub, mapping acquired datasets into a standard brain space enables efficient use of various datasets. However, because of the heterogeneous and nonuniform brain structure characteristics at the cellular level revealed by recently developed high-resolution whole-brain microscopy techniques, it is difficult to apply a single standard to achieve robust registration of various large-volume datasets. In this study, we propose a robust Brain Spatial Mapping Interface (BrainsMapi) to address the registration of large-volume datasets by introducing extracted anatomically invariant regional features and a large-volume data transformation method. By performing validation on model data and biological images, BrainsMapi achieves accurate registration of intramodal, individual, and multimodality datasets and can also complete the registration of large-volume datasets (approximately 20 TB) within 1 day. In addition, it can register and integrate unregistered vectorized datasets into a common brain space. BrainsMapi will facilitate the comparison, reuse and integration of a variety of brain datasets.

Mapping brain structures in three dimensions is necessary to thoroughly understand brain functions 1 . Creating a comprehensive space in which various brains are mapped encompasses complex spatiotemporal information that can greatly facilitate the comparison 2 , reuse 3 and integration 4 of brain datasets. Drawing a stereotaxic brain atlas 5,6 provides a unified spatial reference for addressing this issue. However, with the rapid development of high-resolution whole-brain microscopic imaging [7][8][9] , the obvious heterogeneous and nonuniform 10 characteristics of brain structures at the cellular level make it difficult to map the variety of experimental datasets from different individuals and modalities to a standard brain space using the uniform registration methods 11 that have been used effectively in previous macroscopic datasets, such as magnetic resonance imaging (MRI) datasets. In addition, given the large-volume datasets produced during imaging, we urgently need a robust nonlinear registration pipeline that can register massive spatial information datasets with cellular resolution.
Previous studies [11][12][13][14][15] have sought to solve the registration of whole-brain three-dimensional datasets, especially within the MRI field. Among these, gray-level-based registration algorithms 11 can effectively achieve nonlinear registration of MRI datasets with uniform signals. These methods can also be applied to optical microscopic images; for example, Leonard et al. 16 processed serial two-photon (STP) datasets to obtain an average brain. However, the gray-level-based methods are highly dependent on the quality of the original image signals, and optical microscopy images are susceptible to variations in sample preparation and imaging processes. Hence, for the more complex optical microscopy images, feature-based registration algorithms [17][18][19] are the better choice because the factors that affect registration accuracy shift from the grayscale level to the extracted-feature level; the bottleneck then becomes accurately and objectively extracting sufficient features. Ohnishi et al. 14 completed the registration of two-dimensional microscopy images to MRI data by manually extracting feature points, but the number of points was limited and their selection subjective. Wang et al. 20 used automatic methods to extract feature points and achieved the registration of histopathological data, but the accuracy of the automatic recognition methods was greatly affected by image quality. Fürth et al. 17

Materials and Methods
The complete BrainsMapi pipeline is shown in Fig. 1. We extract anatomically invariant regional features to map brain structures and achieve better registration. Compared with locating anatomical points in three-dimensional brain space, extracting the boundaries of specific anatomical regions is more accurate and simpler and captures information such as shape and size. Building on this, we use a strategy of obtaining transformation parameters at low resolution and applying them at high resolution to achieve whole-brain registration of TB-scale datasets at cellular resolution. We describe these two aspects in the following experiments and results.
Model data. We designed five simple cartoon models with smiling and crying faces (size: 400 × 400 pixels) to demonstrate the effectiveness and robustness of BrainsMapi. The models are as follows.
(1) Fixed model: A round smiling face in which the eyebrows, eyes, and mouth are marked with different gray level values.
(2) Model 1: We manually designed a deformation, deformed the fixed model, and made it a crying face.
(3) Model 2: To simulate a weak SNR in a real biological dataset, we used the histograms of the corpus callosum (cc), hippocampal region (HIP) and cerebellum (CB) from the Allen CCFv3 Nissl stained dataset and assigned the gray value randomly and automatically to the eyebrows, eyes, and mouth regions of Model 1.
(4) Model 3: We added streak noise to Model 2 using the sine function y(x) = A(sin((x/w)π) + 1) − A, where w = 20 and A = 7.
(5) Model 4: Based on Model 3, we set a gray level of zero in a triangular area of the mouth to simulate a torn sample condition; the triangular area was also created manually.
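Model 3's streak noise can be sketched as follows, assuming the sine function takes the form y(x) = A(sin((x/w)π) + 1) − A with w = 20 and A = 7 (our reading of the formula); helper names are illustrative.

```python
import numpy as np

def streak_noise(width, w=20.0, A=7.0):
    """Return a 1-D additive offset for each column x: A(sin((x/w)*pi)+1) - A."""
    x = np.arange(width, dtype=float)
    return A * (np.sin((x / w) * np.pi) + 1.0) - A

def add_streaks(image):
    """Add column-wise streak noise to a 2-D grayscale image."""
    noise = streak_noise(image.shape[1])
    return image + noise[np.newaxis, :]  # broadcast the 1-D offset over rows

model2 = np.full((400, 400), 128.0)      # stand-in for the Model 2 image
model3 = add_streaks(model2)
```

Because the offset varies only along x, the result is a pattern of vertical streaks with period 2w and amplitude A superimposed on the image.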
Biological datasets. We employed 6 whole-brain datasets and 6 sets of metadata to validate BrainsMapi.
All the animal experiments followed procedures approved by the Institutional Animal Ethics Committee of Huazhong University of Science and Technology, and this study was specifically approved by this ethics committee. All the methods were carried out in accordance with relevant guidelines and legislation. Dataset 1 16 is from the CCFv3 of the Allen Institute, which is an average brain obtained by averaging 1,675 STP brains. The average brain, annotation file, and individual Nissl datasets at four resolution levels (100 μm (132 × 80 × 114 voxels), 50 μm (264 × 160 × 228 voxels), 25 μm (528 × 320 × 456 voxels) and 10 μm (1320 × 800 × 1140 voxels)) are openly available at http://brain-map.org. We adopted the average brain dataset at the highest (10 μm) resolution.
Dataset 2 24 is a volume of images representing the canonical Waxholm Space (WHS) of the adult C57BL/6J mouse brain. Five datasets are provided, including T1, T2, and T2* datasets, a Nissl-stained dataset, and the manually labeled atlas dataset. All these datasets are available from https://www.nitrc.org/. Here, we chose the T2*-weighted MRI image dataset. Dataset 3 7 is a whole-brain image dataset of a Nissl-stained C57BL/6 adult mouse imaged by MOST. The three-dimensional dataset, with 1 μm axial resolution and 0.35 μm horizontal resolution, was acquired in 7 days using the MOST automatic slicing imaging system. The coronal slice number and the original data size are approximately 11,000 and 4 TB, respectively. Dataset 4 8 is from a whole-brain dual-color labeled dataset of the Thy1-GFP M-line of transgenic mice imaged by BPS. We adopted the colocalized fluorescently labeled neurons and counterstained cell bodies dataset, which is brain-wide with a 2 μm axial resolution and a 0.32 μm horizontal resolution. The coronal slice number and original data size of one channel of the dataset are approximately 4,800 and 3 TB, respectively. Here, we adopted the cytoarchitectonic channel dataset.
Dataset 5 8 is a specifically selected problematic dataset that contains streaks, caused by uneven staining and illumination, and sample-tearing phenomena. This dataset is used for comparison with later results. The imaging system and staining method are the same as those used to create Dataset 4. Dataset 6 21 is from a published study by the Division of Neurophysiology, MRC National Institute for Medical Research, London. Briefly, it is a dataset of autofluorescent C57BL/6 mice imaged by STP microscopy. The original dataset has an axial resolution of 5 μm and a coronal resolution of 0.32 μm. The dataset is available from http://www.swc.ucl.ac.uk/aMAP. Please refer to Table S1 and Fig. S1 for the main characteristics of all the image datasets and metadata.
Computing environments. In this study, we used two computing devices: a graphical workstation equipped with 20 cores (Intel Xeon E5-2687w × 2) and 128 GB of RAM, and an HPC Cluster with 20 nodes, each equipped with 20 cores (Intel Xeon E5-2660 V3 × 2) and 128 GB of RAM and connected to the Lustre file system via a 10 Gb Ethernet network.
Image preprocessing. Image preprocessing aims to obtain high-quality images for good registration results (Fig. 1: Preprocessing). First, brightness and light-field corrections are used to reduce the uneven staining and illumination of optical microscopy images 25 . Then, the corrected dataset is sampled isotropically. Finally, we use the adaptive threshold method 26 and morphological operations (hole filling, opening and closing operators) to extract the brain outline and manually fix it.
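The outline-extraction step can be illustrated with a numpy-only sketch; the local-mean adaptive threshold and the 3 × 3 morphological operators below are simplified stand-ins for the cited methods (all names illustrative), not the production pipeline.

```python
import numpy as np

def local_mean(img, k):
    """Box-filter mean over a (2k+1) x (2k+1) window via padded cumulative sums."""
    p = np.pad(img, k, mode='edge').astype(float)
    c = p.cumsum(0).cumsum(1)
    c = np.pad(c, ((1, 0), (1, 0)))           # prepend a zero row and column
    n = 2 * k + 1
    h, w = img.shape
    s = (c[n:n + h, n:n + w] - c[:h, n:n + w]
         - c[n:n + h, :w] + c[:h, :w])
    return s / (n * n)

def adaptive_threshold(img, k=7, offset=0.0):
    """Foreground where a pixel exceeds its local mean minus an offset."""
    return img > (local_mean(img, k) - offset)

def dilate(mask):
    """3x3 cross-shaped binary dilation."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]; out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]; out[:, :-1] |= mask[:, 1:]
    return out

def erode(mask):
    """Binary erosion as the dual of dilation."""
    return ~dilate(~mask)

def closing(mask):
    """Dilation followed by erosion: fills small holes in the outline mask."""
    return erode(dilate(mask))
```

In practice one would reach for OpenCV or scikit-image for these primitives; the point here is only the order of operations: threshold first, then morphology to clean the binary outline.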
Regional feature extractions. The accuracy of the selected features is directly related to the registration quality. Here, taking advantage of brain anatomy information, we extracted anatomically invariant regional features to register two brains. The segmented regional features are those with anatomical meaning, that is, brain regions or nuclei. These regional features are conserved; therefore, we can delineate the boundaries of these feature regions in each brain sample.

Figure 1. Registration pipeline for large-volume brain datasets. Input dataset: mammal, three-dimensional whole brain, micron resolution, TB scale. The crucial steps that ensure registration robustness, including preprocessing, regional feature extraction, and obtaining transformations, are shown on the left. The obtained warping parameters are input to the right; the registration process for large-volume brain datasets includes image blocking, block transformation and blocked-data reslicing. The registered results are obtained in three anatomical sections. The region lines in 'Output dataset' are from the Allen CCF v3, © 2004 Allen Institute for Brain Science. Allen Mouse Brain Atlas. Available from: atlas.brain-map.org.
Here, we selected an interactive segmentation tool, Amira (version 6.1.1; FEI, Mérignac Cedex, France) to perform the feature extraction procedure ( Fig. 1: Extraction of regional features). Briefly, the selection of extracted brain regions and nuclei needs to follow these three criteria.
(1) Distributed throughout the brain to ensure the accuracy of brain-wide registration.
(2) Easily identifiable anatomical regions to ensure accuracy and objectiveness for feature extraction.
(3) Conserved brain regions or nuclei that are guaranteed to occur in every brain, to ensure the correctness of the registration.
In addition, the segmentation procedure must satisfy three requirements.
(1) Three-dimensional consistency: It is necessary to modify the segmentation results repeatedly in the coronal, sagittal and horizontal planes to ensure that the results have continuity in three dimensions.
(2) Accuracy: Segmentation should be conducted under the guidance of anatomical experts and verified back-to-back by at least three people.
(3) Rapidity: We used Amira's interpolation function to automatically evolve the anatomical regions between two known region images.
We show these regions in Fig. S2. To ensure registration accuracy, the entire regional feature extraction and subsequent transformation obtaining steps were performed at a 10 μm resolution.
Obtaining accurate transformations. After accurately extracting the features, we need to map them to obtain accurate transformation parameters (Fig. 1: Obtaining transformations). Obtaining the transformation is an optimization problem: the moving image is warped to the fixed image by the initial transformation parameters, and the similarity metric between the moving and fixed images is used as an energy function. The transformation parameters are updated iteratively to reach an optimal solution and obtain the corresponding transformation. Here, we chose the Symmetric Diffeomorphic Normalization (SyN, ANTs tools, version 2.x) 22 nonlinear registration method. SyN customizes a symmetric deformation based on the standard Large Deformation Diffeomorphic Metric Matching (LDDMM) method proposed by Beg 27 . SyN flexibly records the displacement of each pixel under a large deformation and produces a diffeomorphic transformation with symmetric and invertible displacements (Eq. (1)).

E_sym(I, J) = inf_{ϕ1} inf_{ϕ2} ∫_0^0.5 { ‖v_1(t)‖_L^2 + ‖v_2(t)‖_L^2 } dt + ∫_Ω |I(ϕ1(x, 0.5)) − J(ϕ2(x, 0.5))|^2 dΩ, (1)

where ϕ1 and ϕ2 are the diffeomorphism fields in the opposite directions of domain Ω indexed by time t, t ∈ [0,1], and v1 and v2 are the opposite-direction velocity fields. Physically, the distance drives the motion of each pixel and is determined based on the image's potential energy. When the original images are replaced by our extracted regional features, the movement of each pixel is determined by the potential energy of the features, which are not disturbed by the gray signals in the original image. In short, when we replace I and J, which represent the original images, with the feature images I0 and J0, we can obtain accurate transformations. We used a multiresolution strategy in both the linear and nonlinear registrations for acceleration and adopted mutual information as the similarity measure 28 . The entire transformation-obtaining step requires approximately 3 hours. We obtain the linear matrix M, the nonlinear direct displacement field ϕ1 and the inverse field ϕ2.
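The mutual-information similarity measure mentioned above can be illustrated with a small joint-histogram estimator; ANTs itself uses a more elaborate (Mattes) implementation, so treat this as a sketch only, with illustrative names.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """MI (in nats) between two equally shaped grayscale images.

    Estimated from the joint intensity histogram:
    MI = sum p(x,y) * log(p(x,y) / (p(x) * p(y))).
    """
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of image a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of image b
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

MI is maximal when one image's intensities predict the other's (e.g., an image against itself) and near zero for unrelated images, which is why it tolerates the differing gray-level distributions of multimodal data.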
The displacement is presented in a grid form to intuitively illustrate the nonlinear deformation effect of the diffeomorphism method based on regional features (Fig. 2). Three-dimensional nonlinear displacement (Fig. 2B) and nonlinear registration results (Fig. 2C) are obtained by the nonlinear registration from the linear results ( Fig. 2A). The global and local nonlinear deformation effects can be observed from the coronal, sagittal and horizontal sections, respectively (Fig. 2D). For comparison, we also show the registration effects with the atlas line superimposed on the nonlinear results (Fig. 2E).
Nonlinear transformation for a large volume dataset. In this process, we block large-volume datasets and efficiently transform each block in parallel by performing real-time interpolations of the deformation field instead of upsampling the entire dataset directly, which would lead to TB-scale displacement field generation.
We obtain the transformation parameters after registration at low resolution (10 μm isotropic) and use them for the high-resolution transformation as follows. (1) For linear parameters, the low-resolution linear matrix M is multiplied by scale matrices to produce the high-resolution linear matrix M′ (Eq. (2)): M′ = scaleM1 · M · scaleM2, where scaleM1 and scaleM2 are the amplification and shrinking matrices, respectively.
(2) The nonlinear parameters are represented by deformation fields that express the displacement of each voxel in three-dimensional space in a three-channel form (ϕx, ϕy, ϕz). Instead of upsampling these fields, we calculate each voxel displacement (ϕx(x, y, z), ϕy(x, y, z), ϕz(x, y, z)) by trilinear interpolation of the low-resolution displacement fields during the transformation process (Eq. (3)). P(x, y, z) denotes the spatial coordinate of voxel P before registration at high resolution, and the coordinates after linear and nonlinear registration are P′(x′, y′, z′) and P″(x″, y″, z″), respectively. Here, s indicates a scaling factor, which is the quotient of the low-resolution and high-resolution ratios. The full mapping relationship is given in Eq. (4).

Figure 2. (A) Three-dimensional rendering of the brain outline and three anatomical sections (coronal, horizontal and sagittal) before nonlinear registration; (B) three-dimensional displacement; (C) three-dimensional deformation field applied to the original three-dimensional dataset; (D) two-dimensional grids showing the application of the deformation fields in the coronal, sagittal and horizontal planes; (E) the registration of the coronal, sagittal and horizontal sections corresponding to (D). Scale bars: 1 mm. The region lines in (E) are from the Allen CCF v3, © 2004 Allen Institute for Brain Science. Allen Mouse Brain Atlas. Available from: atlas.brain-map.org.

Scientific Reports (2020) 10:2139 | https://doi.org/10.1038/s41598-020-59042-y

After obtaining the mapping relationship, we use the TDat tools 29 to partition the raw image sequences into cubic blocks (Fig. 1: Image blocking). The transformed space is precalculated based on the size of the fixed image and the scaling factor s. Next, we apply the transformation parameters to each block in the transformed space separately. The details are as follows (Fig. 1: Block transformation).
(1) For each block in the transformed space, we calculate the region involved in the original space by transforming all the points on the six surfaces of the block using the mapping relationship (Eq. (4)). Then, the corresponding blocks (ROI blocks) in the original space are loaded into memory. Generally, a transformed block only needs to load about 4-20 corresponding blocks from the original space.
(2) For every voxel in a block, we calculate the spatial coordinate of the corresponding voxel in the ROI blocks using the mapping relationship (Eq. (4)) and assign the grayscale value at that coordinate to this voxel.
(3) Each block is written to disk in a three-dimensional image format. After completing the above process, we obtain the registration of the three-dimensional image dataset at the original resolution. Finally, the registered two-dimensional image sequences of the three anatomical sections (coronal, horizontal, sagittal) are generated by reslicing (Fig. 1: Blocked data reslicing).
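The steps above, together with the Eq. (2)-(4) machinery, can be sketched end to end: a low-resolution affine is promoted by scale matrices, the low-resolution displacement field is sampled by trilinear interpolation on the fly, and each output block is filled voxel by voxel. The composition order of the scale matrices, the use of the inverse mapping, nearest-neighbor sampling, and all names are illustrative simplifications, not the production implementation.

```python
import numpy as np

def scale_matrix(s):
    """Homogeneous 4x4 isotropic scaling by factor s."""
    return np.diag([s, s, s, 1.0])

def promote_affine(M, s):
    """Eq. (2) sketch: M' = scaleM1 @ M @ scaleM2 (amplify, then shrink)."""
    return scale_matrix(s) @ M @ scale_matrix(1.0 / s)

def trilinear(field, z, y, x):
    """Eq. (3) sketch: sample a low-resolution 3-D field at a fractional point."""
    z0, y0, x0 = int(np.floor(z)), int(np.floor(y)), int(np.floor(x))
    z1 = min(z0 + 1, field.shape[0] - 1)
    y1 = min(y0 + 1, field.shape[1] - 1)
    x1 = min(x0 + 1, field.shape[2] - 1)
    dz, dy, dx = z - z0, y - y0, x - x0
    c = 0.0
    for zi, wz in ((z0, 1 - dz), (z1, dz)):
        for yi, wy in ((y0, 1 - dy), (y1, dy)):
            for xi, wx in ((x0, 1 - dx), (x1, dx)):
                c += wz * wy * wx * field[zi, yi, xi]
    return c

def make_mapping(M_high, fields, s):
    """Eq. (4) sketch: output voxel -> original-space coordinate.

    `fields` is a (phi_z, phi_y, phi_x) triple of low-resolution displacement
    volumes; each high-resolution point is displaced by s times the field
    value interpolated at its low-resolution position, then mapped linearly.
    """
    M_inv = np.linalg.inv(M_high)
    def mapping(p):
        low = tuple(c / s for c in p)                      # low-res position
        disp = tuple(s * trilinear(f, *low) for f in fields)
        q = np.array([p[0] + disp[0], p[1] + disp[1], p[2] + disp[2], 1.0])
        return tuple((M_inv @ q)[:3])
    return mapping

def transform_block(volume, mapping, origin, shape):
    """Fill one output block (corner `origin`, size `shape`), nearest neighbor."""
    block = np.zeros(shape, dtype=volume.dtype)
    for z in range(shape[0]):
        for y in range(shape[1]):
            for x in range(shape[2]):
                src = mapping((origin[0] + z, origin[1] + y, origin[2] + x))
                zi, yi, xi = (int(round(c)) for c in src)
                if (0 <= zi < volume.shape[0] and 0 <= yi < volume.shape[1]
                        and 0 <= xi < volume.shape[2]):
                    block[z, y, x] = volume[zi, yi, xi]
    return block

# Identity affine and zero displacement reproduce the original block:
vol = np.arange(27).reshape(3, 3, 3)
zero = (np.zeros((2, 2, 2)),) * 3
ident = make_mapping(np.eye(4), zero, s=2.0)
out = transform_block(vol, ident, origin=(0, 0, 0), shape=(3, 3, 3))
```

The key property is that only the small low-resolution fields ever sit in memory: each high-resolution voxel queries them through `trilinear`, so the TB-scale dense displacement field is never materialized.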
We used the following ideas to optimize the calculation efficiency. First, parallel implementations with the process-level message passing interface (MPI) and thread-level OpenMP are applied to the data reading, writing and calculating operations in a multicomputer environment. Second, we use video coding 30 to greatly compress the image datasets, reducing the burden of the large number of I/O operations during transformation. Third, image data outside the brain are treated as redundant and skipped to reduce the number of calculations.
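A minimal sketch of the parallel block processing: the paper combines MPI across nodes with OpenMP threads, and here a Python thread pool stands in, since each block transforms independently of the others; `transform_one` is an illustrative placeholder for the actual warp.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def transform_one(block):
    """Stand-in for the per-block warp; here it just inverts intensities."""
    return 255 - block

def transform_all(blocks, workers=4):
    """Map the per-block transform over all blocks with a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform_one, blocks))

blocks = [np.full((2, 2, 2), v, dtype=np.uint8) for v in (0, 100, 200)]
results = transform_all(blocks)
```

Because blocks share nothing, the same structure scales out trivially: in the real pipeline the pool is replaced by MPI ranks, each owning a subset of blocks and its own I/O.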

Vectorized dataset registration. Integrating datasets such as cells 31,32 , neurons 8,33 , and vessels 34 requires mapping them into a standard space. These vectorized datasets consist of point sets with spatial coordinates. Using neurons as an example, traced neurons are saved as SWC files in the form of sequences of points. As an incidental outcome of the large-volume transformation, we can easily register these vectorized datasets.
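Registering a traced neuron reduces to pushing each SWC sample point through the same coordinate mapping used for the images. A minimal sketch, using a toy affine in place of the full transform; the SWC column layout (id, type, x, y, z, radius, parent) is the standard one, and all other names are illustrative.

```python
def parse_swc(text):
    """Parse SWC lines into [id, type, x, y, z, radius, parent] records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):   # skip blanks and comments
            continue
        f = line.split()
        records.append([int(f[0]), int(f[1]), float(f[2]), float(f[3]),
                        float(f[4]), float(f[5]), int(f[6])])
    return records

def register_swc(records, mapping):
    """Apply a coordinate mapping to every point; topology is untouched."""
    out = []
    for r in records:
        x, y, z = mapping(r[2], r[3], r[4])
        out.append([r[0], r[1], x, y, z, r[5], r[6]])
    return out

swc = "# demo neuron\n1 1 10.0 20.0 30.0 1.0 -1\n2 3 11.0 20.0 30.0 0.5 1\n"
moved = register_swc(parse_swc(swc), lambda x, y, z: (2 * x, 2 * y, 2 * z))
```

Since only coordinates change while ids and parent links are preserved, morphological statistics and connectivity survive the registration unchanged.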
Multilevel quantitative evaluation. Here, with reference to the aMAP 21 evaluation method, we designed a set of multilevel quantitative assessment methods, from coarse to fine, and objectively assessed the registration results from the brain-region level down to the identifiable-nuclei level at a 10 μm resolution.
We first assess the accuracy at the brain-region level, a coarse assessment. A total of ten brain regions are chosen throughout the brain: outline, CB, CP, hindbrain (HB), HIP, hypothalamus (HY), isocortex (ISO), midbrain (MB), pons (P) and thalamus (TH). We manually segment the moving image after registration but regard the segmentation results as a silver standard 11 rather than a golden standard. To reduce the workload, we do not segment each complete brain region slice by slice; instead, we selected 50 images at equal intervals in each region and segmented those slice by slice. All the manual segmentation results are referenced to the Allen CCFv3. The Dice score 35 was selected as the evaluation measure:

Dice(I, J) = 2|I ∩ J| / (|I| + |J|), (5)

where I is the moving image after registration, J is the fixed image, and ∩ denotes the intersection of the two images. The Dice score is calculated in two dimensions, and the number of Dice scores for each brain region is 50.

For the identifiable-nuclei-level evaluation at a 10 μm resolution, we selected nine nuclei brain-wide (anterior cingulate area (ACA), primary visual area (VISp), primary somatosensory area (SSp), reticular nucleus of the thalamus (RT), ventromedial hypothalamic nucleus (VMH), periaqueductal gray (PAG), subiculum (SUB), entorhinal area, lateral part (ENTl), medial vestibular nucleus (MV)). Twenty-two trained technicians segmented these regions in five registered datasets (four types). First, we select a representative coronal section in the CCFv3 for each nucleus; a stack (40 sections) of corresponding positions in the registered datasets is also provided. Then, these 22 individuals were tasked with identifying the single coronal section from the stack that they considered most similar to the reference section and segmenting it. Based on this procedure, the STAPLE algorithm 36 was used to fuse the 22 human segmentation results and obtain the STAPLE result.
The Dice scores of each segmentation result against the STAPLE result are considered the human performance (HP), while the Dice scores of the CCFv3 against the STAPLE result are the registration performance (RP). Then, a correlation analysis was performed.
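The Dice evaluation of Eq. (5) on a pair of 2-D binary masks can be sketched directly; the masks below are toy examples.

```python
import numpy as np

def dice(i_mask, j_mask):
    """Dice score 2|I ∩ J| / (|I| + |J|) for two binary masks."""
    i_mask = i_mask.astype(bool)
    j_mask = j_mask.astype(bool)
    inter = np.logical_and(i_mask, j_mask).sum()
    total = i_mask.sum() + j_mask.sum()
    return 2.0 * inter / total if total else 1.0   # two empty masks agree

a = np.zeros((4, 4), dtype=bool); a[:2, :] = True    # 8 pixels
b = np.zeros((4, 4), dtype=bool); b[1:3, :] = True   # 8 pixels, 4 overlapping
score = dice(a, b)                                   # 2*4 / (8+8) = 0.5
```

A score of 1 means perfect overlap and 0 means none; the 0.8 threshold cited in the Results sits well inside the "good registration" regime on this scale.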

Experiments and Results
A model experiment for robust demonstration. To demonstrate the effectiveness of our designed regional features, we customized a model experiment to compare BrainsMapi with other registration methods under different situations (see Materials and Methods).
A crying face turning into a smiling face demonstrates that good registration effects have been achieved. We manually segmented several regions (e.g., mouth, eyes) in the synthetic images and registered them to the fixed model. The last column in Fig. 3 shows that regardless of the deformation, weak SNR or streak noise, BrainsMapi can obtain good registration results. Although the tearing effects in images are not fixed after registration, BrainsMapi prevents incorrect registration results.
In contrast, we demonstrate the effects of other registration strategies on the model data, focusing mainly on point- and gray-level-based methods. The results of manually selecting feature points (Symmetric Diffeomorphic Normalization (SyN) from 40-50 manually selected points) involve some randomness due to the limited number and inconsistent locations of the selected feature points (Fig. 3, second column). It is difficult to extract accurate feature points automatically (SyN with Harris) from images with weak SNR and strong noise, which leads to inaccurate registration (Fig. 3, third column). The gray-level-based registration method (SyN with mutual information (MI)) also struggles under weak SNR, strong noise interference and sample tearing (Fig. 3, fourth column). Additional three-dimensional model demonstrations are provided in the Supplementary Materials (Fig. S3).
We also compared and quantified the registration results on the real biological dataset (Fig. S4). We selected 50 pairs of feature points from the whole brain (Fig. S4D), but the results are still poor (Fig. S4A). Moreover, we found it difficult to select points accurately (especially in the z direction) and determine which slice corresponds to which point. We emphasize that anatomical regional features can better ensure the accuracy and robustness of registration.
Registering sample tearing and streak images for intramodal registration. The complications simulated in the model data also exist in brain images. There are differences in image quality even with images in the same modality. These images are easily disturbed during sample preparation and imaging processes and can include obvious streaks caused by uneven illumination (Fig. S5). Nevertheless, the image signals of the intramodal datasets are similar after the preprocessing step; in particular, they have small individual differences, and hence, their registration is relatively easier.
Using Dataset 4 as a reference brain, we present an intramodal registration by aligning Dataset 5 to the reference brain with BrainsMapi. We visualize the registration results in the form of checkerboards in Fig. 4. The global orientation of the mouse brain is corrected after the linear registration (Fig. 4A), while the local brain regions and nuclei are adjusted after the nonlinear registration (Fig. 4B). Compared to the linear results (Fig. 4C), the nonlinear registration produces good local correction of regions such as the olfactory areas (OLF), CB, HIP, and paraflocculus (PFL) (Fig. 4D). The small purple arrows in Fig. 4C,D indicate misalignment positions that are corrected after the nonlinear registration.
In addition, we evaluated the registration accuracy at the brain-region level (see the Materials and Methods section). In the box plot (Fig. 4E), the median Dice score for all brain regions is between 0.9 and 0.99. Generally, a Dice score above 0.8 indicates that a good registration effect has been achieved 11,22 .

Register individuals with large differences and deformations for intermodal registration. In addition to image-quality variation, there are general differences among individuals, as well as deformations, especially those incurred during the brain sample preparation stage of mesoscopic optical imaging. The registration of these large-deformation datasets is more difficult. Here, we show registration results for individuals with large differences and deformations. By applying a method for measuring the distance between anatomical landmarks 37 , we found that, among the four datasets (Dataset 2 (MRI), Dataset 3 (MOST), Dataset 4 (BPS) and Dataset 6 (STP)), the pair with the greatest individual difference and deformation was Datasets 2 and 4 (Fig. S6). We also show the distance between these two datasets in Fig. 5A (median 596.9 μm).
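The landmark-distance measure used above can be sketched simply: Euclidean distances between corresponding landmark pairs in two brains, summarized by the median (as in Fig. 5A). The coordinates below are toy values; the real landmark sets come from expert annotation.

```python
import numpy as np

def landmark_distances(points_a, points_b):
    """Per-landmark Euclidean distance between two (N, 3) coordinate sets."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    return np.sqrt(((a - b) ** 2).sum(axis=1))

a = [[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]]
b = [[3.0, 4.0, 0.0], [100.0, 0.0, 12.0]]
d = landmark_distances(a, b)             # per-pair distances 5.0 and 12.0
median_offset = float(np.median(d))      # 8.5
```

The median is preferred over the mean here because a few poorly placed landmarks would otherwise dominate the difference estimate.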
Dataset 4 is registered using Dataset 2 as a reference brain. By merging Dataset 4 (green channel) with Dataset 2 (purple channel), we present the intermodal registration results for horizontal (Fig. 5B), sagittal (Fig. 5C) and several coronal sections (Fig. 5D). The brain outline and large brain regions are well aligned. Moreover, enlarged views are provided in Fig. 5E. Structures such as the CB, HIP, aco, cc, CP, and VIIn can also be well mapped. We also conducted a quantitative assessment at the brain-region level. In the box plot (Fig. 5F), the median Dice score for all the brain regions is between 0.9 and 0.99.
Register to a reference atlas for multimodal registration. In this section, we present a special registration type: registration to a reference atlas. Using direct or inverse registration to the reference brain space, we are able to acquire the spatial information of the original dataset, which is the key to integrating multimodal image datasets into a common brain space.
We register four types of datasets (Datasets 2-4 and 6) to the Allen CCFv3 (Dataset 1) and present the registration results. From top to bottom (Fig. 6), the images correspond to the registration results of Datasets 2-4 and 6. We selected three coronal sections in the form of the Allen CCFv3 on the left; the yellow dotted lines show the Allen CCFv3 superimposed on the original images in the results. The brain outline and large brain regions, such as the HIP, TH, HY, MB and CB, align well with the reference atlas, as judged from their spatial orientations and anatomical boundaries. Moreover, we present enlarged views of local nuclei regions for each data type. The nuclei with distinct boundaries, such as the act, the dentate gyrus granule cell layer (DG-sg), and the CP, are well aligned. For the ACA, the upper and lower boundaries can be judged from the original image. For the MV and the spinal nucleus of the trigeminal, interpolar part (SPVI), whose boundaries are inconspicuous, the accuracy can be judged by the brain spatial orientation. All these areas are well aligned.

Based on the above registration results, we first assess the accuracy at the brain-region level to form a rough assessment (see Materials and Methods). We evaluated ten brain regions in the brain-wide data of the five datasets. The results are shown in box plots (Fig. 7A). The median Dice scores of all the brain regions and datasets are above 0.9.
In addition, for the assessment at the identifiable-nuclei level at a 10 μm resolution, nine brain anatomical structures were selected (see the Materials and Methods section). When we grouped the results by HP or RP, the medians acquired by HP (Fig. 7B, black) were not significantly different from those acquired by RP (Fig. 7B, orange) (Mann-Whitney U-test, score of 0.90 versus 0.92, P = 0.054; n = 4 brains, 9 structures, 22 human raters). When we grouped these scores by nucleus, there were no significant differences between HP and RP in eight structures; the exception was the PAG, where HP was significantly better than RP (Fig. 7B) (Mann-Whitney U-test, score of 0.94 versus 0.90, P = 0.017; n = 4 brains, 9 structures, 22 human raters). When we grouped the results by dataset, there was no significant difference between HP and RP for the four individual brains (Fig. 7C) (Mann-Whitney U-test, P > 0.06).

Figure 7. Multilevel quantitative assessment results. (A) Assessment at the brain-region level: boxplots of Dice scores for ten brain regions of five brain datasets (indicated with different colors) (n = 5). (B) Assessment at the nuclei level: Dice scores of registration performance (RP, orange) and human performance (HP, black) grouped by structure (n = 4). The vertical lines indicate the median scores. (C) Assessment at the nuclei level: Dice scores of RP (orange) and HP (black) grouped by dataset (n = 5). The brains used in (B) are marked with an asterisk.
All these results demonstrate that the proposed method has extremely high accuracy at the brain-region level and that its results do not differ significantly from manual segmentation at the nuclei level at a resolution of 10 μm.
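The group comparisons above rely on the Mann-Whitney U-test. A minimal pure-Python sketch of the rank-sum computation (ties averaged); the paper's P-values come from a full statistical package, and this illustrates only the U statistic.

```python
def ranks(values):
    """Average ranks (1-based), with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y: U1 = R1 - n1(n1+1)/2."""
    r = ranks(list(x) + list(y))
    r1 = sum(r[:len(x)])
    return r1 - len(x) * (len(x) + 1) / 2.0

u = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])   # 0.0: x entirely below y
```

U ranges from 0 (complete separation) to n1·n2 (separation in the other direction), with n1·n2/2 expected under the null of identical distributions.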
Whole-brain registration for a TB-scale dataset at single-cell resolution. We used the low-resolution (10 μm) registration results to demonstrate the robustness and accuracy of BrainsMapi in the previous sections. Here, we additionally achieve registration of a TB-scale dataset at single-cell resolution.
Nonlinear registration for a large-volume dataset can correct nonuniform deformations in brains and integrate datasets at the cellular level. The original images can be registered to the standard brain space without any loss of resolution; previously, such data had to be analyzed in their original, unregistered form, whereas we can now obtain analysis results directly from the registered high-resolution images.

Figure 8 (partial caption). The small arrows indicate the corresponding fine structures before and after registration. Allen CCF v3, © 2004 Allen Institute for Brain Science. Allen Mouse Brain Atlas. Available from: atlas.brain-map.org.

Dataset 5, which has a 0.32 μm × 0.32 μm × 1 μm original resolution, was resampled to 0.32 μm isotropic; the dataset is approximately 20 TB in size. The processed images are aligned to the Allen CCFv3 in three dimensions (Fig. 8A-C), and the brain regions are matched accurately (Fig. 8G). We also present the difference before and after registration of coronal images (Fig. 8D,G). In the finer-scale enlarged views, the neuron fibers are simultaneously corrected and maintain continuous structure (Fig. 8E,H). The fine and weak signals also remain consistent after registration. We also demonstrate the high-resolution registration results on the Nissl-stained dataset (Fig. S7).
Finally, we evaluated the registration performance on large-volume datasets (Fig. 9 and Table S2). As the data size increases, the memory consumption of BrainsMapi remains stable at approximately 14 GB, while that of ITK and ANTs increases sharply (Fig. 9A). Furthermore, BrainsMapi can process a 1 TB dataset within several hours and requires only 24 hours for a 20 TB dataset (Fig. 9B).
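The near-constant memory footprint follows from the block-wise strategy: deformation parameters are estimated at low resolution, and each high-resolution block is warped independently. A minimal sketch of this idea in Python, assuming a dense displacement field stored on a coarse grid (the function names and field layout are our own assumptions, not BrainsMapi's API):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_block(src, disp_low, factor, z0, z1):
    """Warp output slices z0:z1 of `src` using a displacement field
    `disp_low` (shape (3, Zl, Yl, Xl), in low-res voxel units)
    estimated at `factor`-times-lower resolution. Only one block of
    coordinates is materialized at a time, so peak memory scales
    with the block size, not the whole volume."""
    Y, X = src.shape[1], src.shape[2]
    zz, yy, xx = np.meshgrid(np.arange(z0, z1), np.arange(Y),
                             np.arange(X), indexing="ij")
    hi = np.stack([zz, yy, xx]).astype(float)
    # Sample the low-res field at the block's positions (linear
    # interpolation, clamped at the grid edge), then rescale the
    # displacements to high-res voxel units.
    shift = np.stack([map_coordinates(disp_low[c], hi / factor,
                                      order=1, mode="nearest")
                      for c in range(3)]) * factor
    # Backward mapping: each output voxel reads from src here.
    return map_coordinates(src, hi + shift, order=1)

def warp_volume_blockwise(src, disp_low, factor, block=8):
    """Apply the low-res field block by block over the z axis."""
    out = np.empty(src.shape)
    for z0 in range(0, src.shape[0], block):
        z1 = min(z0 + block, src.shape[0])
        out[z0:z1] = warp_block(src, disp_low, factor, z0, z1)
    return out
```

In a production pipeline, each block would be streamed from disk and the per-block warps run in parallel; because a block's warp depends only on the small shared field, the reported stable ~14 GB footprint is consistent with such a scheme.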
Registering existing metadata to a reference atlas. Metadata here refers to digitized, vectorized information acquired from original image datasets. These metadata are scattered across different brain spaces, laboratories and projects. As part of BrainsMapi's features, a vectorized dataset can be registered into a standard brain space, enabling integration and localization.
By aligning metadata to the Allen CCFv3, we present the registration results of vectorized neurons. The pyramidal neurons were manually traced from the unregistered Brain 4 (Fig. 10A). We deformed these metadata and automatically obtained the projection patterns from the cortex (CTX) through the thalamus (TH) and on to the midbrain (MB) and medulla (MY) (Fig. 10B). For comparison, we also present the results before and after registration in the sagittal and coronal planes (Fig. 10C). The projection pattern of each neuron can likewise be calculated automatically from the localization results (Fig. 10D).
Additionally, we present the registration results of other metadata, including segmented blood vessels (Fig. S8A), traced neurons (Fig. S8E), and counted cell bodies (Fig. S8I). BrainsMapi can automatically localize and integrate these scattered metadata from different sources.
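Registering vectorized metadata differs from warping images: a point set needs the forward mapping that carries subject-space coordinates into atlas space, rather than the backward mapping used to resample images. A small sketch of this operation, assuming a dense forward displacement field (the field representation and function name are our assumptions):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_points(points, disp):
    """Map vectorized metadata (an N x 3 array of z, y, x voxel
    coordinates) through a dense forward displacement field `disp`
    of shape (3, Z, Y, X): each point is moved by the field value
    interpolated at its own location."""
    pts = np.asarray(points, dtype=float).T          # shape (3, N)
    shift = np.stack([map_coordinates(disp[c], pts, order=1,
                                      mode="nearest")
                      for c in range(3)])
    return (pts + shift).T

# Example: two nodes of a traced neuron under a constant field.
disp = np.zeros((3, 5, 5, 5))
disp[0] = 1.0                        # +1 voxel along z everywhere
nodes = np.array([[1.0, 2.0, 3.0], [0.0, 4.0, 4.0]])
print(warp_points(nodes, disp))
```

Once the points live in atlas coordinates, per-region projection statistics reduce to a lookup of each node's atlas label, which is how projection patterns can be computed automatically after registration.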

Discussion
In this paper, we propose a robust registration interface for large-volume brain datasets, named BrainsMapi. We use anatomically invariant regions during registration to ensure the accurate extraction of a large number of features, enabling BrainsMapi to register a variety of datasets accurately. Furthermore, by estimating registration parameters at low resolution and applying them with a high-resolution block-warping transformation strategy, we realize registration of TB-scale whole-brain datasets. We demonstrate the robustness of our registration method on both model data and real brain images and present nonlinear registration results for a three-dimensional whole-brain fine image dataset at single-neuron resolution. Additionally, labeled and existing vectorized datasets are registered to a standard brain space. Finally, we design an objective multilevel evaluation method to prove the accuracy of our approach.
The regional features of our method are not limited to cytoarchitectural image data, reflecting its wide applicability: as long as an anatomical region can be identified in the image, the method can be used for registration. Moreover, the selection of regional features is not fixed; it can be adapted to the image characteristics or experimental requirements until the registration results meet expectations. The ideal approach would be to select all brain regions for registration, which yields the best results but at high cost. We therefore analyzed the relationship between the selected features and BrainsMapi's performance (Fig. S9) and found that, as features are added, the accuracy of the local regions related to those features improves. Because registration tasks and user skills differ, it is difficult to quantify the time cost of feature extraction: for simple, small features such as the MH, fr and mtt, extraction takes only a few minutes; for larger, more complex features such as the outline and CB, it takes approximately half an hour to an hour. In this paper, we recommend 14 regions, chosen from experience, that yield sufficiently accurate results at lower cost. A skilled person may require only about 4 to 6 hours to perform the feature extraction steps needed to obtain accurate registration results.
Another important aspect of this paper is the registration method for large-volume datasets. Hardware limitations and the complexity of registration algorithms make whole-brain registration with accurate positioning at the cellular level difficult, and the highest-resolution 3D mouse brain atlas available, the Allen CCFv3, has only 10 μm resolution. Therefore, BrainsMapi's high-resolution registration strategy is currently the most appropriate. Certainly, once a 3D brain atlas at the cellular level is available, accurate registration at the mesoscopic level will be our ultimate goal. In addition, our method is highly scalable and is very promising for future application to petavoxel datasets, such as marmoset and human brain datasets.
Neuroscientific analysis with brain spatial orientation requires matching a dataset to a standard brain space coordinate system to obtain anatomical boundaries. A more general and common analysis approach combines multimodality and multiscale datasets, such as MRI, optical imaging, or even electron microscopy datasets, to reveal structural and functional relationships in the brain. The integration of mesoscopic and macroscopic datasets is especially meaningful, valuable and efficient. Many projects [38][39][40] have used registration techniques to integrate various datasets; even the recent international brain projects 1 seek to develop a powerful, standardized and industrialized framework to integrate multiscale, multimodal and massive datasets for studies of brain function mapping, disease models, and behavioral cognition 41 . BrainsMapi is highly compatible with these data integration requirements: it can accurately register various image datasets and existing vectorized metadata and can handle high-throughput, TB-scale, large-volume whole-brain datasets, providing a complete and effective pipeline for brain data integration.
The datasets and codes of this study are available from the corresponding author on reasonable request.