Background & Summary

Whether focusing on a large swath of cortex or a single subcortical nucleus, consistent and reliable visualization of brain microarchitecture is critical for the creation of reference points that demarcate the brain’s landscape1. This is true not only for the identification of landmarks (or regions of interest), but also for the study of local circuits therein. Thus, detailed views into the brain’s microarchitecture can be used to study disease, to experimentally target circuits, and to advance the field’s understanding and integration of each of these overarching neural systems.

With advances in the reconstruction and analysis of significantly larger brain volumes, neuroscientists are now able to visualize patterns of microarchitecture that arise at a scale previously inaccessible using traditional methods2,3,4. Techniques such as CLARITY5, expansion microscopy6, serial two photon tomography7,8,9, multi-beam scanning electron microscopy10, and X-ray microtomography11,12,13,14,15 now provide access to several regions of interest within a volume of tissue simultaneously, providing rich context to study both local circuitry and long-range projections. With many of these new techniques, it is possible to image and analyze large intact anatomical samples that preserve the connectivity between multiple regions of interest16,17, thus providing a lens into the heterogeneity of neural structure within and across different brain areas.

Here, we introduce a three-dimensional neuroanatomical dataset extracted from a validated, in-vitro mouse thalamocortical sample spanning six anatomically distinct regions of interest (somatosensory cortex, two thalamic nuclei, zona incerta, striatum and hypothalamus)16. This dataset was reconstructed using X-ray microtomography to reveal a diverse composition of microstructures (e.g., myelinated axons, cell bodies, and vasculature) within each region at isotropic, micron-scale resolution. For a selected number of regions, we also provide validated annotations performed at the area level (which identifies regions of interest several hundreds of microns in size) and at the pixel level (which identifies microstructures at a micron scale). To technically validate the dataset, human annotators assessed two series of extracted images. We found that both annotators were able to classify images from these datasets reliably and accurately, likely in part due to the heterogeneity of the microstructures throughout each region of interest. This dataset and multi-scale annotations are available for visualization and download from an online, interactive atlas (

Ultimately we envision this heterogeneous, 3D brain volume dataset as a resource not only for neuroscientists interested in exploring structures within this thalamocortical pathway, but also machine learning scientists seeking data diverse enough to test computer vision methods for brain area prediction and segmentation. We further believe the provision of this dataset will prompt collaborative opportunities for both experimentalists and theorists interested in exploring neural circuitry at the micron-level and beyond.


Sample preparation

All animal experiments were approved by the Institutional Animal Care and Use Committee (IACUC) at the University of Chicago. The thalamocortical sample for this dataset was obtained from an 8-week-old female C57BL/6J mouse. The animal was deeply anesthetized using Euthasol (60 mg/kg), then transcardially perfused. Vasculature was first flushed with 0.1 M cacodylate buffer, followed by primary fixatives paraformaldehyde (2%) and glutaraldehyde (2.5%) in 0.1 M cacodylate buffer. The brain was dissected from the skull and then post-fixed for 48 hours. Following multiple rinses in 0.1 M cacodylate buffer, the brain was sliced on a vibratome at a thickness of 450 μm until the thalamocortical slice was obtained16. At this point, a 1.7 by 6.5 mm strip of tissue that preserves the pathway from somatosensory cortex to the ventral posterior thalamic nucleus was dissected (see Fig. 1a–c)16. Prior to imaging, the total estimated volume of this sample was 5 mm³. The sample was further post-fixed in paraformaldehyde (2%) and glutaraldehyde (2.5%) for 2 hours at room temperature, rinsed three times with 0.1 M cacodylate buffer, and stored overnight in 0.1 M cacodylate buffer at 4 °C. The tissue was then embedded with heavy metals as described previously18. The sample was initially stained with 2% buffered osmium tetroxide for 1.5 hours at room temperature, followed by 2.5% potassium ferrocyanide for 1.5 hours at room temperature. After rinsing with water, the tissue was incubated in 1% filtered thiocarbohydrazide at 40 °C for 45 minutes. Following rinsing with water, the tissue was stained with another round of 2% unbuffered osmium tetroxide for 1.5 hours at room temperature. After another thorough rinse with water, the sample was stained with 1% aqueous uranyl acetate overnight at 4 °C and at 50 °C for 2 hours. Following the final water rinse, the tissue was stained with lead aspartate for 2 hours at 50 °C. This was followed by dehydration through a series of graded ethanols and propylene oxide, and a gradual infiltration of the tissue with epon resin. The infiltrated sample was incubated in 100% epon resin overnight, before being placed in 1.5 mm cylindrical tubing with fresh 100% resin. The preparation was then cured in an oven at 60 °C for 48–72 hours.

Fig. 1

Overview of the thalamocortical sample microarchitecture and 3D reconstruction. Data from the Allen Reference Atlas (ARA) ( provide a schematic overview of the dataset regions of interest (a), along with cytoarchitectural differences as identified by NeuN staining (b)8. (c) The photomicrograph to the right shows the thalamocortical slice prior to dissection, with the final sample volume outlined in red. CTX = somatosensory cortex; VP = ventral posterior nucleus. (d) Visualization of the synchrotron X-ray microtomographic data acquisition process. X-ray projections were acquired and reconstructed into a 3D image volume with micron-scale isotropic resolution (1.17 μm pixel size).

Imaging and reconstruction

Synchrotron X-ray tomography was performed on the embedded sample on the 32-ID beamline at the Advanced Photon Source at Argonne National Laboratory, as described by Vescovi and colleagues19. X-ray radiographs were recorded with a detection system consisting of a LuAG:Ce scintillator that converts X-rays into visible light, which was magnified with a 5X objective lens onto a CCD detector with 1920 × 1200 pixels (5.86 μm pixel size; Flir Grasshopper 3, Model #GS3-U3-23S6M-C). To improve the spatial resolution, the detector was built with a large NA (0.21), long working distance Mitutoyo 5X objective lens with a resolving power of 1.3 μm and a 14 μm depth of focus. To maintain its resolving power, the lens is coupled with a 13 μm thick, thin film scintillator matching its depth of focus20. With a 5X magnification, the pixel size of each projection image was 1.17 μm. Exposure for a single projection image took approximately 30 ms; thus, the total acquisition time for the collection of 3000 projections was approximately 1.5 min (see Fig. 1d for an illustration of the imaging process).
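The pixel size and acquisition time quoted above follow from simple arithmetic on the stated detector parameters. The short sketch below reproduces that arithmetic; the variable names are illustrative, and the timing counts exposure only (readout and stage motion are assumed to be negligible here, which the text does not state).

```python
# Values quoted in the text above; names are illustrative.
detector_pixel_um = 5.86   # physical pixel size of the detector
magnification = 5.0        # objective lens magnification
exposure_s = 0.030         # approximate exposure per projection
n_projections = 3000

# Effective pixel size in the sample plane.
effective_pixel_um = detector_pixel_um / magnification

# Exposure-only scan time (assumption: no readout or motion overhead).
scan_time_min = exposure_s * n_projections / 60.0

print(f"effective pixel size: {effective_pixel_um:.3f} um")
print(f"exposure-only scan time: {scan_time_min:.1f} min")
```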

Each single reconstructed dataset corresponded to a region of 1920 × 1920 × 1200 voxels. These volumes were then stitched together using Tomosaic software19. The entire volume was trimmed down to 720 × 1420 × 5805 voxels, which corresponded to 0.842 × 1.661 × 6.792 mm³. Acquired data were stored in HDF5 files in the DXchange format (see ref. 21). To make subsequent analysis and labeling more aligned with an anatomical frame of reference, we virtually resliced the data to produce 720 images (each 1420 × 5805 pixels), where each image spans all cortical layers and traverses all sub-cortical regions of interest as well. This resulted in an image volume of 5.9 Gigavoxels in total, including pixels outside of the sample. These images were chunked into 14 independent stacks, each comprising 50 .tiff files, spanning the entire length of the sample. These datasets were converted from 32-bit to 8-bit precision after histogram normalization.
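As a rough check on the numbers above, the sketch below converts the trimmed volume shape to physical dimensions and a voxel count, and illustrates one possible way to reduce 32-bit data to 8 bits. The text does not specify the histogram normalization that was used, so the percentile clipping in `to_uint8` (and its default percentiles) is an assumption made for illustration only, demonstrated on synthetic data.

```python
import numpy as np

VOXEL_UM = 1.17                 # isotropic voxel size from the text
SHAPE_VOX = (720, 1420, 5805)   # trimmed volume, ordered as in the text

# Physical extent along each axis, in millimetres.
size_mm = tuple(n * VOXEL_UM / 1000.0 for n in SHAPE_VOX)
n_voxels = int(np.prod(SHAPE_VOX, dtype=np.int64))

def to_uint8(volume, low_pct=0.5, high_pct=99.5):
    """Clip to a percentile range and rescale linearly to 0..255.

    Assumed scheme: the original normalization is not described in detail.
    """
    lo, hi = np.percentile(volume, [low_pct, high_pct])
    if hi <= lo:
        return np.zeros(volume.shape, dtype=np.uint8)
    scaled = (np.clip(volume, lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)

# Synthetic 32-bit data standing in for a reconstructed stack.
rng = np.random.default_rng(0)
demo = rng.normal(size=(4, 16, 16)).astype(np.float32)
demo8 = to_uint8(demo)

print("size (mm):", [round(s, 3) for s in size_mm])
print("gigavoxels:", round(n_voxels / 1e9, 1))
print("8-bit range:", demo8.min(), demo8.max())
```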

Ground truthing and annotation for brain areas (regions of interest)

Six regions of interest were annotated by an experienced neuroanatomist using a macroscale view of the entire thalamocortical sample (from somatosensory cortex to hypothalamus). Nine of the above 14 stacks were used for annotations, which were performed using the 3D segmentation software ITK-SNAP22. Nine images (slices in Z) distributed uniformly throughout the volume (z = z1, z2, ...), approximately 50 images apart, were annotated at the pixel level for each region of interest. The regions annotated included: somatosensory cortex (CTX), striatum (STR), the thalamic reticular nucleus (TRN), the ventral posterior (VP) nucleus of thalamus, zona incerta (ZI), the hypothalamus (HYP), and white matter (WM), which includes the internal capsule and corpus callosum. Each region’s annotation was therefore spaced 50 slices (58.5 μm) apart throughout most of the volume (z = 109, 159, 209, 259, 309, 359, 409, 459), with the exception of the HYP, which was only present in the first four stacks within the dataset. The span of the volume considered was bounded by slices 109 to 459, as regions beyond these slices were insufficiently visualized to provide complete annotations. It took approximately 2 hours to complete annotations for all regions of interest across the 14 stacks.
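The slice spacing quoted above can be reproduced from the voxel size. The sketch below regenerates the slice indices enumerated in the text and converts the spacing and overall span to microns; the variable names are illustrative.

```python
VOXEL_UM = 1.17  # isotropic voxel size from the text

# Slice indices enumerated in the text: 109 to 459 in steps of 50.
annotated_z = list(range(109, 460, 50))

spacing_um = 50 * VOXEL_UM
span_um = (annotated_z[-1] - annotated_z[0]) * VOXEL_UM

print(annotated_z)
print(f"spacing: {spacing_um:.1f} um, annotated span: {span_um:.1f} um")
```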

Pixel-level annotations of selected regions of interest

We manually created ground truth pixel-level annotations of four major brain areas within the dataset: CTX, ZI, STR, and TRN. These regions were selected as they spanned the full extent of the sample, beyond simply cortical and thalamic regions of interest (i.e., through the inclusion of STR and ZI), making them well suited to studying architectural uniqueness within and across brain regions. To standardize the ground truth segmentations, we extracted four volumes of size (x:y:z) = (257:257:361), one for each of the four brain areas. The coordinates of each of these volumes are as follows, formatted as (xstart_xend__ystart_yend__zstart_zend): CTX (4600_4857__900_1157__110_471), ZI (1543_1800__650_907__110_471), STR (3700_3957__500_757__110_471), TRN (3063_3320__850_1107__110_471). Within each of these (257:257:361) brain area volumes, we densely annotated (starting at index z = 0) slices z = 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330. This results in 11 densely annotated images per region and 44 images in total across the four areas, which took a total of 44 hours to complete.
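The coordinates above translate directly into array slices. The following is a minimal sketch, assuming the image volume is held as an array indexed in (z, y, x) order and that the end coordinates are exclusive (which yields the stated 257 × 257 × 361 size); both are assumptions, and `roi_slices` and `global_annotated_z` are hypothetical helper names. A zero-memory placeholder array stands in for the real volume.

```python
import numpy as np

# (x_start, x_end, y_start, y_end, z_start, z_end), copied from the text.
ROI_BOUNDS = {
    "CTX": (4600, 4857, 900, 1157, 110, 471),
    "ZI":  (1543, 1800, 650, 907, 110, 471),
    "STR": (3700, 3957, 500, 757, 110, 471),
    "TRN": (3063, 3320, 850, 1107, 110, 471),
}

# Densely annotated slices, indexed from z = 0 within each subvolume.
ANNOTATED_LOCAL_Z = list(range(30, 331, 30))

def roi_slices(name):
    """Return (z, y, x) slices for an array stored in z, y, x order."""
    x0, x1, y0, y1, z0, z1 = ROI_BOUNDS[name]
    return (slice(z0, z1), slice(y0, y1), slice(x0, x1))

def global_annotated_z(name):
    """Map the local annotated slice indices back to full-volume z indices."""
    z0 = ROI_BOUNDS[name][4]
    return [z0 + z for z in ANNOTATED_LOCAL_Z]

# Placeholder with the full-volume shape that allocates no real memory.
FULL_SHAPE_ZYX = (720, 1420, 5805)
placeholder = np.broadcast_to(np.zeros((), dtype=np.uint8), FULL_SHAPE_ZYX)

ctx = placeholder[roi_slices("CTX")]
print(ctx.shape)
print(global_annotated_z("CTX"))
```

With real data, `placeholder` would be replaced by the downloaded image volume; the slicing logic is unchanged.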

Data Records

The X-ray imaged dataset and associated annotations were deposited in figshare23,24,25 and in bossDB ( bossDB is a spatial database that is optimized for access and visualization of three-dimensional neuroimaging data, allowing users to efficiently and dynamically access different sub-volumes of the data26. This database connects seamlessly to Neuroglancer27 which enables interactive visualization of large-scale, volumetric images, annotations, and analytics results from a web browser. Within this web portal, users can view and navigate the data in three dimensions using public access credentials (no account creation required). In addition to these web-based tools, we provide example Jupyter notebooks to demonstrate approaches for downloading raw data and annotations, and applying simple analysis algorithms to subvolumes of data ( These tools directly address the needs of both novice and expert users and are easily adapted to additional use-cases.

The stored data and annotations consist of the following three sets of images and annotations:

  • Original Image Data23: The raw images are rescaled from 32-bit and stored in 8-bit format. We computed the average number of bits of information in each pixel of the original image, confirming that an 8-bit depth was sufficient to capture the information in the CT stack. These images were then stacked into an image volume of size 5805 × 1420 × 720 (x,y,z). The pixel size is 1.17 μm isotropic. Valid data span the entire X, Y, and Z axes of the volume. Slices spanning from z = 15 to z = 670 contain brain specimen; outside of this range, the epon block in which the sample is embedded is visible. In bossDB, these images are stored in a channel called images.

  • Region-of-interest (ROI) Annotations24: Manually labeled pixel-level annotation of brain regions of interest. Valid labels span the entire X and Y axes of the volume. Valid z slices are 109, 159, 209, 259, 309, 359, 409, 459. The labels are 0 -> no label; 1 -> cortex; 2 -> striatum; 3 -> trn; 4 -> vp; 5 -> zona incerta; 6 -> internal capsule; 7 -> hypothalamus; 8 -> corpus callosum. In bossDB, these annotations are stored in a channel called region_of_interest.

  • Pixel-level Microstructure Annotations25: Manually labeled pixel-level annotation of neural microstructure. Four volumes of size 361 × 257 × 257 (z, y, x) are annotated. The first volume, from cortex, spans z from 110 to 471, y from 900 to 1157, and x from 4600 to 4857. The second volume, from striatum, spans z from 110 to 471, y from 500 to 757, and x from 3700 to 3957. The third volume, from vp, spans z from 110 to 471, y from 850 to 1107, and x from 3063 to 3320. The fourth volume, from zona incerta, spans z from 110 to 471, y from 650 to 907, and x from 1543 to 1800. The labels are 0 -> no label (background); 1 -> vasculature; 2 -> cell body; 3 -> myelinated axon. In bossDB, these annotations are stored in a channel called pixel_annotation.
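The label codes listed above can be captured as lookup tables for use in analysis scripts. The sketch below does so and adds two small helpers: one that computes the fraction of pixels per class in a label array, and one that estimates the Shannon entropy (bits per pixel) of an 8-bit image. The entropy estimate is one plausible reading of "average number of bits of information in each pixel"; the exact calculation used for the dataset is not described, so treat it as an assumption. Function names are hypothetical and the arrays are toy data.

```python
import numpy as np

# Label codes copied from the data records above.
ROI_LABELS = {
    0: "no label", 1: "cortex", 2: "striatum", 3: "trn", 4: "vp",
    5: "zona incerta", 6: "internal capsule", 7: "hypothalamus",
    8: "corpus callosum",
}
MICRO_LABELS = {
    0: "background", 1: "vasculature", 2: "cell body", 3: "myelinated axon",
}

def class_fractions(label_array, label_names):
    """Fraction of pixels assigned to each label code."""
    flat = np.asarray(label_array).ravel()
    counts = np.bincount(flat, minlength=len(label_names))
    total = counts.sum()
    return {name: counts[code] / total for code, name in label_names.items()}

def entropy_bits(image_u8):
    """Shannon entropy of the intensity histogram, in bits per pixel."""
    flat = np.asarray(image_u8, dtype=np.uint8).ravel()
    counts = np.bincount(flat, minlength=256)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Toy data for illustration only.
toy_labels = np.array([[0, 1, 1, 2], [3, 3, 3, 3]], dtype=np.uint8)
toy_image = np.array([0, 64, 128, 255] * 4, dtype=np.uint8)

fractions = class_fractions(toy_labels, MICRO_LABELS)
bits = entropy_bits(toy_image)
print(fractions)
print(bits)
```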

Technical Validation

Examination of pixel-level features across regions of interest

As one of the defining features of this thalamocortical dataset is its diverse collection of brain regions, we sought to assess the extent of tissue heterogeneity by examining four major brain areas at a microstructural level (CTX, ZI, STR, TRN). Specifically, the intensities of pixels within each region and the anatomical composition (cells, blood vessels, and axons) were measured (see Fig. 2c,d). We determined the composition of features for each region (assessing the percentage of blood vessels, cells and myelinated axons within each), based on manual annotations over 128 images (32/class, 150 μm by 150 μm in size). While relatively similar numbers of cells and vasculature were identified within each region of interest, the fraction of axons annotated within each region varied dramatically, with far fewer axons identified in the CTX and ZI relative to a high proportion of axons in the STR and TRN (Fig. 2d). The Kullback-Leibler (KL) divergence between pixel intensity distributions (Fig. 2e) revealed that CTX is most dissimilar from TRN, which is expected given the vast differences in microstructure between these two regions of interest. Manual annotations further confirmed that TRN and CTX were highly dissimilar in their microstructural composition, whereas the TRN and STR were highly similar, as measured by the KL divergence (Fig. 2f). One unexpected finding was that, despite the difference in their microstructural composition, STR and CTX were highly similar in their pixel intensity distributions. The 3D reconstruction of our thalamocortical dataset with microCT thus provided richer details that are unobservable upon examining stained, thinly sliced tissue using traditional light microscopy methods.
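For readers who wish to run a comparable comparison, the following is a minimal sketch of a KL divergence between two pixel intensity histograms, D(P‖Q) = Σ p_i log(p_i / q_i). The binning (256 bins over 8-bit intensities), the small smoothing constant `eps`, and the use of the natural logarithm are assumptions; the text does not state the exact settings used. The two patches are synthetic, not drawn from the dataset.

```python
import numpy as np

def intensity_histogram(pixels, bins=256, value_range=(0, 256)):
    """Histogram counts of pixel intensities (one bin per 8-bit value)."""
    counts, _ = np.histogram(np.asarray(pixels).ravel(),
                             bins=bins, range=value_range)
    return counts.astype(np.float64)

def kl_divergence(p_counts, q_counts, eps=1e-10):
    """KL divergence D(P || Q) in nats between two histograms.

    eps avoids division by zero in empty bins (assumed smoothing choice).
    """
    p = np.asarray(p_counts, dtype=np.float64) + eps
    q = np.asarray(q_counts, dtype=np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic patches standing in for two regions of interest.
rng = np.random.default_rng(0)
patch_a = np.clip(rng.normal(100, 10, size=10_000), 0, 255)
patch_b = np.clip(rng.normal(140, 20, size=10_000), 0, 255)

hist_a = intensity_histogram(patch_a)
hist_b = intensity_histogram(patch_b)

print(kl_divergence(hist_a, hist_b))
print(kl_divergence(hist_b, hist_a))
```

Note that the KL divergence is not symmetric, so the two printed values generally differ; a pairwise comparison across regions therefore depends on which region is treated as the reference.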

Fig. 2

Validation of neuroanatomical heterogeneity within the dataset. In (a), an example of a reconstructed image from the dataset following X-ray acquisition, highlighting the regions of interest in the sample. From top to bottom: somatosensory cortex (CTX); striatum (STR); the thalamic reticular nucleus (TRN) and the ventral posterior nucleus (VP) of thalamus; zona incerta (ZI); and hypothalamus (HYP). (b) Examples of the microstructures identified manually within the different ROIs, including cells, axons and blood vessels. These examples each span a roughly 300 × 300 micron field-of-view and highlight the architectural diversity within and across regions of interest. (c) The distribution of pixel intensities across four selected regions of interest within the dataset (CTX, STR, TRN, ZI). (d) The distribution of pixels divided by underlying microstructure class (cell, blood vessel, axon) within each region of interest. In (e,f), we show the KL divergence between: the pixel intensity distributions across the selected regions (e), and the microstructural composition of selected regions as measured with dense manual annotations (f).

Region of interest prediction from local views of the microarchitecture

This dataset comprises a range of brain regions that can be visualized and annotated by any user interested in exploring macro-level or local characteristics. To validate the use of this dataset with annotators, we assessed whether humans can accurately predict regions of interest within the sample using only a small field-of-view (150 × 150 microns); see Fig. 3a. For simplicity, TRN and VP were combined into a single region of interest (VP). The internal capsule and corpus callosum fiber tracts were included in this study and categorized as WM. We then provided a training set of 48 images (8 per region of interest) for 2 annotators to examine; the annotators took less than 10 minutes to get acquainted with these examples. After studying them, and without any further extensive training, each annotator was provided with a test set of 180 novel images (30 per region of interest) to sort into one of six region-of-interest categories. This sorting procedure took both annotators less than 2 hours to complete. Interestingly, both annotators classified images within the CTX, STR and WM to a high degree of accuracy (>80%), whereas images from HYP, VP and ZI proved more challenging to classify (see Fig. 3b). Generally, CTX images were classified to the same degree of accuracy as WM images relative to other regions of interest (see Fig. 3c). We also noted that the annotators themselves varied in their classification performance; while Annotator 2 was slightly more accurate at classifying images from CTX relative to Annotator 1, they both had significantly more difficulty with identifying images from the HYP (p < 0.05). It was also evident that the images from HYP were the most challenging to classify, regardless of annotator and relative to those from CTX and STR (which had highly distinguishing microstructures and were therefore more likely to be correctly classified). Collectively, these findings support this heterogeneous 3D imaging dataset serving as a generalizable and useful resource for the field, given that human annotators can use it to a high degree of success for image classification.
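To make the evaluation concrete, the sketch below computes a per-class F1 score from a confusion matrix and an agreement statistic between two raters. The text reports F1 and inter-rater reliability but does not name the reliability statistic; Cohen's kappa is used here as one common choice and is an assumption. The labels are synthetic (30 images per class, with a few deliberate errors on the HYP class) and do not reproduce the study's results.

```python
import numpy as np

REGIONS = ["CTX", "STR", "VP", "ZI", "HYP", "WM"]

def confusion_matrix(true, pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (true, pred), 1)
    return cm

def per_class_f1(cm):
    """F1 = 2TP / (2TP + FP + FN), computed for each class."""
    tp = np.diag(cm).astype(np.float64)
    denom = cm.sum(axis=0) + cm.sum(axis=1)  # (TP + FP) + (TP + FN)
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)

def cohen_kappa(a, b, n_classes):
    """Chance-corrected agreement between two raters' labels."""
    po = float(np.mean(a == b))
    pa = np.bincount(a, minlength=n_classes) / len(a)
    pb = np.bincount(b, minlength=n_classes) / len(b)
    pe = float(np.sum(pa * pb))
    return (po - pe) / (1.0 - pe)

# Synthetic test set: 30 images per region, 180 in total.
true = np.repeat(np.arange(len(REGIONS)), 30)
hyp_idx = np.flatnonzero(true == REGIONS.index("HYP"))

rater1 = true.copy()
rater1[hyp_idx[:10]] = REGIONS.index("ZI")    # 10 HYP images called ZI
rater2 = true.copy()
rater2[hyp_idx[5:15]] = REGIONS.index("VP")   # 10 HYP images called VP

f1_rater1 = per_class_f1(confusion_matrix(true, rater1, len(REGIONS)))
kappa = cohen_kappa(rater1, rater2, len(REGIONS))

for name, score in zip(REGIONS, f1_rater1):
    print(f"{name}: F1 = {score:.3f}")
print(f"Cohen's kappa between raters: {kappa:.3f}")
```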

Fig. 3

Brain area prediction performance. (a) An annotated image from the dataset with each region of interest overlaid as a distinct color (left) and 150 × 150 micron snapshots from within (right), highlighting microstructural heterogeneity within each region. (b) The performance (F1-score) and inter-rater reliability of two annotators classifying image patches similar to those visualized in (a). Both annotators classified images into one of six different brain areas; each test set consisted of 180 images (30/class) for a total of 360 images classified. (c) Summary of significance in annotators’ ability to accurately predict a region of interest relative to others. Asterisks denote prediction measures that are significantly different, where * denotes p < 0.05, ** denotes p < 0.01, and *** denotes p < 0.001.