Background & Summary

Understanding how genes, development, and the environment produce variation in complex morphological traits is a core challenge in biology with evolutionary and clinical implications. Explanations for the generation of variation tend to cohere around the genotype-phenotype map concept. Genetic variation and genetic effects, like epistasis and pleiotropy, drive variation in developmental processes that act at different times and scales in anatomical context1,2,3. Specific developmental and genetic mechanisms then operate alongside embedded mechanisms, such as nonlinearities4,5 and gene redundancy6, to modulate these effects to express a phenotype7,8,9. Despite recent insights into these phenomena, the developmental-genetic basis for morphological variation remains largely unknown, as there are likely many overlapping and coordinated mechanisms involved, each with relative contributions10. To help disentangle these mechanisms, it is important to build and integrate large phenotypic databases for model organisms11,12,13,14. In this work, we present MusMorph, a database of standardized mouse morphology data for meta-analyses of morphological variability and variation, particularly in the craniofacial complex and brain.

The laboratory mouse is a useful model organism for studying the mechanisms of morphological variation because of the high genetic homology with humans, short gestation, and rich set of tools for manipulating the genome in a controlled environment. Unfortunately, phenotypic data are often biased by laboratory-specific data collection protocols. The International Mouse Phenotyping Consortium (IMPC, was born out of a need to determine the relationship between genotype and phenotype with standardized phenotypic data. Using micro-computed tomography (µCT) and optical projection tomography, the consortium has studied the anatomy of mouse lines heterozygous or homozygous for a single gene mutation, particularly at embryonic day E9.5, E14.5-15.5, and E18.515,16,17,18,19,20. Less emphasis has been placed on µCT imaging and analysis of adults and mid-gestation (E10 to E11) mutants, where critical developmental events, like fusion of the craniofacial prominences, occur. Mouse lines with normal (non-pathological) levels of variation, such as recombinant inbred strains and outbred strains with high heterozygosity21,22,23, have also been poorly characterized. Quantifying such variation is important, because it drives disease susceptibility and course of disease in humans.

Recently, model organism phenotyping has transitioned from manual linear measurements to fully automated computational pipelines. One common approach is voxel-based morphometry24,25. Voxel-based morphometry is based on the analysis of deformation fields obtained via image registration. After spatially aligning images to an average atlas, the deformation fields can be quantitatively compared between groups on a voxel-wise basis to identify differences in morphology. Voxel-based morphometry remains a pillar of shape analysis, because it can localize small regions of shape change without any a priori knowledge of the anatomy, but it is prone to the multiple testing problem26,27. Another approach is atlas-based geometric morphometrics, which instead uses registration fields to automatically derive landmarks, or Cartesian coordinate points that are homologous across samples. Geometric morphometrics is central to evolutionary biology and developmental biology, among other fields, because landmarks allow for statistically tractable quantifications of morphological variation, as well as intuitive visualizations28. These advantages continue to fuel development of novel geometric morphometric pipelines and extensions29,30,31,32,33. Yet large-scale morphometric analyses remain rare due to the sparsity of standardized landmark data.

Here, we introduce MusMorph, a database of standardized mouse morphology data generated with an open-source, atlas-based phenotyping pipeline that integrates techniques from image registration, deep learning, and morphometrics. We compiled the database (N = 10,056) using µCT scans of mice from a variety of strain/genotype combinations and developmental stages, including E10.5, E11.5, E14.5, E15.5, E18.5, and adulthood. Most of MusMorph is composed of head morphology data, but there are also whole-body embryo data for different integrative analyses. We provide (1) a developmental atlas for each timepoint; (2) a rigidly aligned and preprocessed µCT scan, dense anatomical landmarks, and segmentations (if available) for each specimen; (3) a set of scripts for transforming and comparing an input scan to an atlas; (4) an approach to validate the transformed landmark data and optimize it, if needed. To ensure reproducibility and data sharing, we make the data freely accessible from FaceBase34 (, and our code from GitHub ( By incorporating substantial developmental and genetic variation alongside a rich set of metadata, MusMorph will enable standardized morphometric analyses of genotype-phenotypes to better understand the mechanistic basis for morphological variation.



We compiled mouse embryos and adults from numerous sources. The mouse lines for the E15.5 and E18.5 datasets were generated by the IMPC. These mice were produced and maintained on a C57BL/6N genetic background, with support from C57BL/6NJ, C57BL/6NTac or C57BL/6NCrl. More details about husbandry practices can be found at The mouse lines for the E10.5, E11.5, E14.5, and adult datasets were produced on a variety of genetic backgrounds at different institutions for studies of craniofacial variation. We hereafter refer to these lines as the Calgary mice, because they were ultimately imaged at the University of Calgary. Specific information about study protocols, such as husbandry practices and genotyping, should be gleaned from the MusMorph dataset summaries on FaceBase or the original studies themselves. Each dataset within the MusMorph project35 on FaceBase represents a study or set of studies defined by a common study design that yielded similar mouse lines. Details about the experimental design were obtained from the original studies listed in the “Publication(s)” section of each dataset. In addition, we provide a supplementary comma-separated values (CSV) file (Study_Metadata.csv) in the project-wide metadata dataset36 on FaceBase that lists the associated studies.

Micro-computed tomography

Sample preparation

Each IMPC embryo underwent a hydrogel stabilization protocol37 to prepare for diffusible iodine-based contrast-enhanced µCT (diceCT)38. This involved incubating the embryo in a hydrogel solution composed of 4% (wt) paraformaldehyde, 4% (wt/vol) acrylamide (Bio-Rad, USA), 0.05% (wt/vol) bis-acrylamide, 0.25% VA044 Initiator (Wako Chemicals, USA), 0.05% (wt/vol) saponin (Sigma-Aldrich, Germany), and phosphate-buffered saline at 4 °C for 3 days. Following incubation, the air in the specimen tube was replaced with nitrogen gas and the tube was immersed in a 37 °C water bath for 3 h. The whole embryo was then stained with a 0.025 N to 0.1 N Lugol’s iodine (I2KI) solution (Sigma-Aldrich, Germany) for 24 h and mounted in agarose for diceCT. This approach has become a popular alternative to magnetic resonance imaging because it is faster, cheaper, and still offers remarkable contrast, allowing for high-throughput phenotyping of soft and hard tissue38.

The Calgary embryos were subjected to different fixation and staining protocols. Each embryo acquired prior to 2017 was fixed in a solution of 4% (wt) paraformaldehyde and 5% (wt) glutaraldehyde. The specimen was next submerged in the CystoCon Ray II (iothalamate meglumine) contrast agent for one hour to stain external morphology. Embryos obtained after 2017 were put through a nucleic acid stabilization protocol that allows for examination of RNA in embryos scanned via µCT39. Each embryo was fixed with the PAXgene Tissue FIX solution (Qiagen, PreAnalytics, cat #765312), incubated overnight (17 h + /- 1 h) at room temperature, then transferred to a solution of PAXgene Tissue STABILIZER prepared to manufacturer specification (Qiagen, PreAnalytics, cat #765512). For diceCT, each specimen was placed in a solution of PAXgene Tissue STABILIZER and 1% to 3.75% (wt/vol) Lugol’s iodine for 24 h. The head of every embryo was dissected before being mounted in either agarose or soft wax, which was covered by a microcentrifuge tube and infused with 50-100 µl of tissue stabilizer.

Each Calgary adult was set up with a standardized storage and mounting protocol. The mouse carcass was stored at −20 °C after euthanasia. Prior to the day of scanning, the mouse was retrieved and thawed overnight at 4 °C. The carcasses were then wrapped in foam and placed into a 37 mm diameter sample holder for µCT.


The IMPC embryos were imaged at six centers, including the Baylor College of Medicine, Czech Center for Phenogenomics, MRC Harwell, Toronto Centre for Phenogenomics, The Jackson Laboratory, and University of California, Davis. A 3-D image of each iodine-stained whole embryo was acquired with a Skyscan 1172 µCT scanner (Bruker, Kontich, Belgium) at 100 kVp and 100 µA. The raw images were initially obtained with isotropic voxels but variable spatial dimensions and resolutions, ranging between 0.002 mm to 0.04 mm. Image projections were reconstructed into a digital stack using the Feldkamp algorithm40.

The Calgary mice were imaged in the 3-D Morphometrics Center at the University of Calgary. A 3-D image of each stained embryo head was obtained with either (a) a Scanco µCT 35 scanner (Scanco Medical, Brütisellen, Switzerland) at 45 kV and 177 µA or (b) a ZEISS Xradia Versa 520 X-ray microscope (Carl Zeiss AG, Oberkochen, Germany) at 40–50 kV, 4-5 W, and 2 s exposure time. A 3-D image of each adult skull was acquired with either (a) a Scanco vivaCT 40 µCT scanner (Scanco Medical, Brütisellen, Switzerland), (b) a Scanco vivaCT 80 µCT scanner (Scanco Medical, Brütisellen, Switzerland), or (c) a Skyscan 1173 v1.6 µCT scanner (Bruker, Kontich, Belgium) at 55–80 kV and 60–145 µA. Like the IMPC data, these original images were obtained with isotropic voxels but variable spatial dimensions and resolutions. Embryo image resolutions ranged between 0.007 mm and 0.027 mm, whereas adult resolutions ranged between 0.035 mm and 0.044 mm. Image projections were reconstructed with the integrated Scanco software, the ZEISS XMReconstructor software, or the Skyscan NRecon v1.7.4.2 software.

Image preprocessing

We preprocessed each image to account for differences in image acquisition that would interfere with the atlas-based registration workflow described below (Fig. 1). The preprocessing scripts are provided in the MusMorph GitHub repository ( In this preprocessing step, we first converted the reconstructed imaging data (.nrrd, .aim, .tiff) to the Montreal Neurological Institute (MNI) .mnc format using file conversion scripts written in Bash and Python (see,,,, and As part of the open-source MINC library (, the .mnc format is implemented using HDF5 (Hierarchical Data Format, version 5), which supports hierarchical data structure, internal compression, 64-bit file sizes, and other modern features41.

Fig. 1
figure 1

Schematic overview of the phenotyping pipeline. Specimens were staged, prepared (fixed/stored), stained, and imaged with different but standardized lab-specific protocols. While the E10.5, E11.5, E14.5, and adult specimens were obtained in Calgary, the E15.5 and E18.5 specimens were acquired from the IMPC. To account for differences in image acquisition (e.g., intensity artifacts, image resolution and dimensions, and position), each image was subjected to a series of preprocessing steps. Next, each preprocessed image was non-linearly registered to a stage-specific reference atlas with a detailed set of landmarks and segmentations. We recovered deformation fields, landmarks, and segmentations (if available) for each specimen. To optimize the landmark predictions of poorly registered specimens, as measured by cross-correlation similarity, a downstream neural network was used.

Staining artifacts, such as extreme intensity gradients and variable penetrance, can bias the image registration process. To minimize intensity inhomogeneities, we applied the N3 method42. Since many of the E15.5 images had background noise, where the stained scanning medium was indistinguishable from the anatomy, we employed a thresholding script in Bash (see This script computes a lower anatomical density threshold, masks the voxels above this bound and those in proximity via dilation, and equates all voxels outside the mask to 0. To ensure the image resolutions and dimensions were consistent with the atlas, we implemented an image resampling script in Bash (see We also used this script to control for differences in bit depth among scanners by including a min-max normalization, which scaled the embryo intensities between 0 and 1. Table 1 outlines the source of the image data, developmental stage, voxel dimensions, image resolutions, stage-specific sample sizes, and the presence or absence of atlas anatomical labels. Note that the E14.5 images were solely used to create another stage-specific atlas, as they are from a smaller, unpublished dataset.

Table 1 Summary of imaging data.

Another essential step to all image registration workflows is the initialization, or a rigid alignment between an image pair. Using initialization scripts written in Bash (see and R (Tag_Combine.R), we rigidly transformed each image to a stage-specific atlas or, if an atlas did not exist, an arbitrary but stage-specific reference image. To determine the rigid transformation matrices, we utilized a semi-automated or strictly automated approach, depending on anatomical orientation. If the mouse was scanned in a random orientation, we rendered a minimum threshold surface in MINC, then manually placed five homologous three-dimensional (3-D) landmarks at anatomical extrema (e.g., ears, nose, top of the head, and back of the head), resulting in an MNI tag point file (.tag) with landmark coordinates. Next, we concatenated the reference and arbitrary landmark matrices, and minimized their 3-D Euclidean distances via least squares. If the specimen was already roughly aligned to the reference image, we performed an automated, intensity-based rigid alignment using the full registration process outlined below (see the “Image Registration and Label Propagation” section). This intensity-based rigid alignment was also repeated for the manually aligned volumes to ensure consistency. With the rigid transformation matrices, we resampled each image into their stage-specific reference coordinate space using tri-linear interpolation.

Reference atlases

We generated a population average atlas for each stage, excluding E15.5 and adulthood, by spatially normalizing 25 µCT images of wildtype mice with a group-wise registration workflow43,44 (Figs. 2 and 3). A nearly identical workflow was used to create the existing E15.5 and adult atlases. The atlas construction script is available in the MusMorph GitHub ( and is written in Python (see or This script produces Bash scripts that can be executed automatically and in parallel on a compute cluster to maximize computational efficiency. Without massively parallel computing, the volumetric registrations would need to be performed sequentially, each requiring hours of computation and a large amount of memory. Before executing the workflow, the user must upload the initialized images and registration scripts to a compute cluster. In addition, the user needs to install a MINC Toolkit module onto the cluster via Docker ( or GitHub (, or define a pre-existing module, because the scripts utilize the open-source MINC software. An atlas can also be generated locally, but it will be significantly slower without massively parallel computing.

Fig. 2
figure 2

Embryo reference atlases. Sagittal cross-sections of the E10.5 (, E11.5 (, E14.5 (, E15.5 (, and E18.5 ( atlas volumes are shown to display the stained internal anatomy. Each head surface was labelled with a dense landmark configuration to capture global and local aspects of morphology. Lateral, superior, and anterior views of each head isosurface are shown. The equivalent semi-landmark patches (small, color-coded points) were interpolated between a set of sparse homologous landmarks (large, red points). They can be slid and resampled for morphometric analyses.

Fig. 3
figure 3

Adult reference atlas. Cranium (top), mandible (middle), and endocast (bottom) surfaces were segmented from the skull atlas (, then labelled with a dense landmark configuration to capture global and local aspects of morphology. Lateral, superior, and anterior views of each segmentation isosurface are shown. There are sparse landmarks (red) as well as surface (blue) and curve (green) semi-landmarks that can be slid and resampled for morphometric analyses.

Spatial normalization involves an initial affine transformation for global alignment, followed by a deformable transformation for non-linear alignment. To account for global variation in location, orientation, and scale, we computed a series of multi-resolution (coarse to fine) affine transformations among the images by optimizing a cross-correlation objective function45. Given that sample-wide pairwise registrations yield an improved affine template46, or intensity average, we completed all possible (N = 25*24) pairwise affine registrations, then averaged the resulting transformation for each specimen. Using the averaged transformations, we resampled each initialized image into the affine coordinate space with tri-linear interpolation and averaged the resulting images to produce an affine template. To correct for local variation in shape, we computed a series of multi-resolution non-linear transformations with the ANIMAL (Automatic Nonlinear Image Matching and Anatomical Labelling) algorithm47, again optimizing for cross-correlation. This iterative, four-step process involves non-linearly deforming each mouse to an evolving template at increasingly higher resolutions, with the first template being the affine average and the next three being improved versions of the non-linear average48. The final product is a stage-specific average with excellent contrast and a high signal-to-noise ratio.

Since the goal of MusMorph was to aggregate landmark data for morphometrics, and our primary imaging data are head scans, we focused on labelling each atlas head surface with a standardized landmark configuration (Figs. 2 and 3). Specific information about the number of landmarks and their anatomical definitions can be found below in the “Data Records: Landmarks” section. To generate the landmarks, we first rendered a minimum density isosurface in MINC, which uses ITK’s marching cubes algorithm, and saved the 3-D rendering as a Stanford PLY (.ply) file. We then used 3D Slicer49 or the MINC Toolkit to acquire a landmark configuration on each surface that provided a comprehensive representation of shape50. For the embryos, we used 3D Slicer and the SlicerMorph extension32 to identify sparse landmarks and interpolate landmark patches of variable density in between, depending on the size of the area, resulting in dense coverage of the head. This also ensured that the semilandmark patches were equivalent, allowing for a morphospace into which all specimens may be superimposed. Note that we privilege the term equivalent over homologous here, because the developmental origin of dense tissue regions between fixed landmarks is more ambiguous. Because semilandmark patch equivalence was not a consideration for the adults, we landmarked the adult atlas in MINC using built-in display tools, again ensuring sparse and dense landmark coverage.

Shared developmental pathways lead to correlated morphological variation, or morphological integration51,52,53,54,55,56,57. To enable analyses of integration, we added landmark configurations and segmentations to different regions of the adult skull atlas. Specifically, we manually segmented the adult cranium, mandible, and cranial endocast (i.e., a proxy for the brain) in MINC, then rendered these segmentations as isosurfaces before landmarking them with a dense configuration. Once again, the landmark details are described below in the “Data Records: Landmarks” section. The segmentations may be used for surface-based analyses58, measures of size (e.g., volume or surface), or as masks to reduce the shape dimensionality of a voxel-based morphometry analysis. Unlike the adult atlas, the embryo atlases do not come with segmentations due to the scope of this work, apart from the pre-existing E15.5 atlas, which has 48 manually segmented structures (

Image registration and label propagation

We pairwise registered each image to their stage-specific atlas to obtain a composite (affine and non-linear) transformation for label propagation (Fig. 1). Like the atlas workflow described above, the registration scripts are available in the MusMorph GitHub ( and are written in Python (see or The purpose once more is to produce Bash scripts en masse for massively parallel computing on a compute cluster due to the computational requirements of volumetric deformable registration and anatomical labelling. Only the initialized images and registration scripts need to be uploaded to the cluster to execute the workflow. While the pairwise registrations involved the same multi-resolution affine alignment described above, the non-linear alignment differed. Here, we implemented the geodesic SyN (Symmetric Normalization) algorithm59, because it was previously validated for atlas-based landmarking and morphometrics of mouse models44. The SyN registrations were optimized using cross-correlation. After registration, we used labelling scripts written in Bash and produced via Python (see to recover the non-linear transformations, concatenate them with the affine transformations, invert them, and propagate the atlas labels to the rigid space of each image.

Neural network shape optimization

Although top-performing registration algorithms provide an effective and generalizable way to automatically label anatomy, there are instances where outliers and problematic landmarks can alter shape representations. This is particularly true for model organisms, where mutant phenotypes may show little to no resemblance with an atlas. To demonstrate how biological signal can be restored, we implemented a supervised deep learning workflow available in the MusMorph GitHub (, which employs scripts written in R and Julia (see GPA_and_Projection.R and Landmark_Optimization.jl)60. Using a subset of 68 sparse adult craniofacial landmarks (N = 2,000) described in previous work61,62,63,64,65, we trained a deep feedforward neural network to learn a domain-specific loss function that minimizes automated and manual shape differences. The sparse landmark numbers amenable to optimization (see Optimization_Order.csv)36 are available on FaceBase. We focused on the adults because that was the only stage with a large existing set of homologous manual landmarks for training.

We tested the network predictions on a random subset (N = 500) of adult skulls described further in the “Technical Validation” section. To help others initialize the network without having to retrain it, we provide the adult network model (Calgary_Adult_Cranium_Model.bson) and weights (Calgary_Adult_Cranium_Weights.bson) in the Binary JSON (.bson) file format on GitHub. We also make available the optimized sparse shape predictions for the entire adult crania dataset (Adult_Cranium_Sparse_Landmarks.csv)36. Although we focused on adults, this optimization strategy is generalizable, so other research groups with manual landmark data on any structure of the atlases may use the network architecture to improve outlier predictions.

Data Records

Specimen metadata

Each specimen in the MusMorph database35 is associated with a rich set of identifiers to accommodate morphometric analyses using multiple factors and/or covariates. Alongside detailed metadata descriptions in FaceBase, we provide the specimen metadata as a supplementary CSV file (MusMorph_Metadata.csv)36 for convenience and to include auxiliary fields. Table 2 enumerates the metadata and Table S1 summarizes the metadata distributions for each dataset on FaceBase.

Table 2 Summary of metadata identifiers.

Figure 4a,b illustrates the distributions of sex, strain type, and genotype across the embryo and adult datasets. Sex is well-annotated for the E15.5, E18.5, and adult datasets, but is missing (“NA”) for many of the E10.5 and E11.5 specimens. While most of the embryo mouse models were produced on an isogenic inbred background, particularly C57BL/6N, strain diversity is a focal point of the adult datasets. Among the nine adult strain types provided, there are 98 unique background strains. The majority are recombinant inbred lines (e.g., the Collaborative Cross dataset66), wild-derived crosses (e.g., the Hybrid dataset67), and outbred lines (e.g., the Diversity Outbred dataset68). We have included 459 unique genotypes for the embryo datasets, most of which derive from the IMPC dataset69, as well as 179 genotypes for the adult datasets. A minority of specimens, including several embryos in the Ap270, B9d71, and Bulgy72 datasets as well as a few adults in the Brain-Face73 dataset, have unknown genotypes (e.g., “-/-;NA” and “ + /-;NA” in double knockout designs or “NA” and “ + / + or + /-” in single knockouts) due to genotyping complications in the past. Specimens homozygous for a single gene mutation predominate the embryo datasets, whereas normal wildtype variants comprise the bulk of the adult datasets. Figure 4c shows the developmental stages represented in MusMorph. Of the 10,056 specimens processed, 40% are embryos and 60% are adults, many of which have just finished maturing around postnatal day 90. All specimens without a recorded stage (“NA”) are mature adults.

Fig. 4
figure 4

Summary of metadata. (a) Distribution of sex, strain type, and genotype for the embryo datasets. (b) Distribution of sex, strain type, and genotype for the adult dataset. (c) Sample sizes of each developmental stage included in the database. All “NA” specimens are mature or middle-aged adults. (d) Left: Example landmarks and segmentations of the adult skull and endocast (brain). Middle/Right: Morphological analyses, such as PCA and allometry regressions, that one might perform with a dense landmark dataset. Each color in the plot represents a different mouse genotype. (e) Left: Slice visualization of a non-linear deformation grid. Middle/Right: Morphological analyses, such as statistical parametric mapping, that one might perform with a deformation field. The t values show significant (p < 0.05) voxel-wise differences in form (i.e., volume shrinkage) in Ghrhr homozygous mutants relative to wild type, whereas the variance heatmap shows voxel-wise variances in Ghrhr mutants.

It is often desirable to compare mutants to their wildtype counterparts from the same sample because background strains vary. To preserve sample provenance where possible, specimens that are wildtype for a given mutation will have the same gene symbol as their heterozygote and homozygote littermates. For wildtype specimens without litter information, like the IMPC dataset, their genotypes are equated to background strain. Mouse strain nomenclature follows the MGI guidelines, except when the strain design is unknown and has no MGI ID (e.g., novel hybrid backcrosses). We also abbreviate genotypes for complex strain designs using MGI synonyms if available. Furthermore, while most wildtype specimens fall within the control experimental group, there are cases where they can exhibit mutant-like phenotypes and be categorized as such. One example in MusMorph is the artificial selection Longshanks dataset74, which through many generations of artificial selection produced wildtype specimens with extreme tibia and craniofacial phenotypes75,76.

We selected the above identifiers, because they tend to explain a significant amount of morphological variation in morphometric analyses. For instance, many structures in the mouse are sexually dimorphic, including the shape of the brain77 and craniofacial complex78, cortical bone size and strength79, adipose tissue distribution80, and feto-placental growth81,82, to name a few. It is also known that classical laboratory strains, such as those in the Strain Comparison dataset83, exhibit naturally occurring craniofacial phenotypes84. Moreover, gene mutations can interact with a background strain via epistasis to produce different phenotypes85,86,87, like those in the Spry dataset88. Another key driver of variation is developmental stage, as differences in age often define a principal axis of allometric variation via correlations with size and/or shape89,90,91,92,93. Given the ubiquity of allometry, these correlations can be found across most MusMorph datasets (Fig. 4d). Finally, numerous studies have reported the phenotypic outcomes of single gene mutations, environmental perturbations, and how zygosity modulates these effects94,95,96. These identifiers have corresponding images, landmarks, segmentations, and deformation fields for morphological analyses (Fig. 4d,e).


We provide the atlases and initialized images for each specimen in the MNI .mnc format. The naming convention for the atlas volumes is < Source > _ < Stage > _ < Anatomy > _Atlas .mnc. They are categorized as “Imaging Data” in the project-wide dataset36 on FaceBase. The naming convention for the initialized volumes is < Biosample > .mnc, where Biosample is the name of the specimen in the metadata (see the “Specimen metadata” section). One exception is the naming convention for the subset of thresholded E15.5 images, which is < Biosample > _Thresh.mnc. These volumes are also categorized as “Imaging Data” across the MusMorph datasets on FaceBase. Each .mnc file has four key attributes: 1) a named dimension (xspace, yspace, zspace), 2) length (number of voxels on each dimension), step (resolution), and start (origin). MINC defines a voxel and world coordinate system, so one can move between them with the simple “voxeltoworld” and “worldtovoxel” MINC commands. If users want to convert between .mnc and different file formats (e.g., raw data, DICOM, NIfTI, Analyze, ECAT, TIFF, Concorde, VFF), there are a variety of other Bash commands available ( While the raw IMPC images are freely accessible in the NRRD (.nrrd) format at, the raw Calgary images are available upon request in the AIM (.aim) or TIFF (.tiff) formats.


For each pairwise registration, we recovered an inverted non-linear and composite (affine and non-linear) transformation. Given the file sizes of the non-linear deformation fields (~3 GB on average × 10,000 = 30 TB), we make the transformations available upon request. The deformation fields and composite transformations are in the MNI .mnc and .xfm formats. Each .mnc file shares the same image attributes described above with an additional named dimension called vector_dimension which describes the non-linear displacement vectors. Each .xfm file contains a header and affine transformation matrix. The naming convention for the deformation fields is < Biosample > _ANTS_nl_inverted_grid_0.mnc and < Biosample > _ANTS_nl_inverted.xfm, whereas the composite transformations are called < Biosample > _origtoANTS_nl_inverted_grid_0.mnc and < Biosample > _origtoANTS_nl_inverted.xfm. “ANTS” denotes the algorithm and “nl” stands for “non-linear”. Much like the images, the transformations for the subset of thresholded E15.5 volumes have “Thresh” appended to the < Biosample > name.

Non-linear deformation fields describe the displacements of each target image voxel to each reference image voxel97. By calculating the Jacobian determinant J for every point \(p(x,y,z)\) in the deformation field,

$$\det (J(x,y,z))=\left|\begin{array}{ccc}\frac{\partial x}{\partial x} & \frac{\partial y}{\partial x} & \frac{\partial z}{\partial x}\\ \frac{\partial x}{\partial y} & \frac{\partial y}{\partial y} & \frac{\partial z}{\partial y}\\ \frac{\partial x}{\partial z} & \frac{\partial y}{\partial z} & \frac{\partial z}{\partial z}\end{array}\right|$$

one can quantify the magnitude of morphological change at each voxel (Fig. 4e). A Jacobian determinant of 1 indicates no volume change, whereas determinants greater than 1 indicate volume expansion and determinants between 0 and 1 indicate volume shrinkage. These determinants can also be scaled and sheared with a composite transformation to examine voxel-wise differences in form. Jacobian determinants can be analyzed with voxel-wise tests, such as an ANOVA with a false-discovery rate correction, to map statistics onto the anatomy, a technique otherwise known as statistical parametric mapping (see VBM_Example.R). For example, in Fig. 4e, we use the RMINC R package ( to show significant voxel-wise changes (shrinkages) in form between Ghrhr mutants98 and wildtype specimens, as well as voxel-wise variances in form associated with this mutation.


We labelled each atlas, and thus every registered mouse embryo and adult, with a standardized landmark configuration (Figs. 2 and 3). The atlas landmark files are named < Source > _ < Stage > _ < Anatomy > _Atlas_Landmarks.tag. They are stored as “Imaging Data” alongside the atlas volumes on FaceBase36. The individual specimen landmark files are named < Biosample > _ < Anatomy > _Landmarks.tag and are similarly categorized as “Imaging Data” across FaceBase. The MNI.tag file format is an ASCII file which stores the coordinates of each landmark in the millimetric world space of the volume. Each .tag file has a header above an array of p landmarks (rows) in k dimensions (columns). These files can be imported into R individually or collectively as a 3-D array using the tag2array function in the custom package99. Alternatively, the user can employ the read.csv function in R to import a vectorized .csv file. For every developmental stage and anatomical region, we provide a landmark .csv file in the “Supplementary Files” section of the project-wide dataset on FaceBase36, each of which contains a matrix of n specimens (rows) and \(p\times k\) landmark coordinate dimensions (columns). Importantly, there are dense semi-landmarks and sparse fixed landmarks for local and global geometric morphometric analyses of craniofacial, endocast (brain), and mandible morphology. In Fig. 4d, for instance, we show craniofacial shape morphs along the first principal component (PC) in an adult subsample, as well as allometry regressions which relate craniofacial shape to size.

The embryo landmarks are equivalent across stages. Table S2 describes the sparse embryo landmarks and their biological definitions. Table S3 lists the embryo semi-landmark patches and their density, both of which are based on the sparse landmarks. The stage-specific semi-landmark patch files can also be found as tab-separated value (TSV) files on GitHub ( Each embryo has 22 sparse homologous landmarks within their larger dense configuration. To perform a sparse landmark shape analysis, users may subset the first 22 rows of each 3-D array. Since there are three additional sparse landmarks for the E15.5 and E18.5 specimens, rows 23 to 25 may be included for stage-specific analyses or excluded for ontogenetic analyses.

The adult landmarks are simply equivalent within stage (i.e., all postnatal ages). Tables S4, S5, and S6 describe the sparse adult craniofacial, endocast, and mandible landmarks, respectively, as well as their biological definitions. While the adult curve semi-landmarks and surface semi-landmarks are not patch based, they can be slid and resampled using the R scripts on GitHub (see Calgary_Adult_Cranium_Sliding_Semis.R, Calgary_Adult_Mandible_Sliding_Semis.R, and Calgary_Adult_Endocast_Sliding_Semis.R) to mimic patches or any other structure. Much like the embryos, the sparse landmarks are the first 93, 12, and 19 rows of the cranium, endocast, and mandible 3-D arrays, respectively, and can be partitioned for a sparse shape analysis. If users want to generate new landmarks, such as internal landmarks or whole-body landmarks, they can use a script (see, the inverted composite transformations (see the “Transformations” section), and a local or remote compute cluster to propagate the landmarks to an initialized image. To promote standardization, we encourage users to add new landmark subsets to the pre-existing configurations.


We provide segmentation labels for the E15.5 and adult atlases and specimens to support alternative morphological analyses, such as 3-D visualizations, voxel-based morphometry, volumetric size comparisons, and surface-based image processing pipelines. Other stages do not have segmentation labels due to the scope of this work. The segmentations follow the same naming conventions described above: < Source > _ < Stage > _Atlas_Segs .mnc and < Biosample > _Segs.mnc. The atlas segmentations are available as “Imaging Data” on FaceBase36, as are the individual segmentation files across various MusMorph datasets. The published E15.5 atlas contains 48 whole body segmentations (, while the adult atlas comes with cranium, endocast, and mandible segmentations. Each label file is a .mnc volume of integers that matches the dimensionality of the image. To visualize the adult segmentations, for example, the user may load the atlas and label files together and input an integer of 1 to render the endocast, 2 for the cranium, and 3 for the mandible. As with new landmarks, there is the potential to resample new atlas segmentation labels into the initialized space of any image using the composite transformations (see the “Transformations” section) and a local or remote compute cluster (see

Technical Validation

Cross-correlation and root mean squared error

We computed intensity-based, pairwise registrations between each target image (I) and a reference atlas (J) by optimizing a normalized cross-correlation (NCC) similarity metric:

$$NCC\left(I,J\right)=\frac{{\sum }_{p\in \Omega }{I}_{(p)}{J}_{(p)}}{\sqrt{{\sum }_{p\in \Omega }{I}_{(p)}^{2}{\sum }_{p\in \Omega }{J}_{(p)}^{2}}}.$$

Normalized cross-correlation is calculated for all voxel positions p over a discrete domain (\(p\in \Omega \)). If the domain is the entire 3-D volume and \(NCC\left(I,J\right)=1\), the deformed target image and reference image are perfectly aligned. To assess the quality of each registration, we recorded the normalized cross-correlation between each deformed target image and the atlas using code in the labelling scripts (see Unfortunately, it is difficult to know whether the final registration correlations are “good” or “bad” without relating them to the quality of the labels collected. We investigated the relationship between landmark root mean squared error and cross-correlation in the adult crania training set above to build a quality assessment model. Letting \({{\boldsymbol{x}}}_{{\boldsymbol{\ell }}}^{(I)}\) and \({\widehat{{{\boldsymbol{x}}}_{{\boldsymbol{\ell }}}}}^{(I)}\) denote the observed (manual) and predicted (automated) Euclidean vectors at landmark \(\ell \) for a target image I, the root mean squared error for p landmarks is defined as

$$RMSE=\sqrt{\mathop{\sum }\limits_{\ell =1}^{p}\frac{1}{n}\mathop{\sum }\limits_{I=1}^{n}{\left({{\boldsymbol{x}}}_{{\boldsymbol{\ell }}}^{\left(I\right)}-{\widehat{{{\boldsymbol{x}}}_{{\boldsymbol{\ell }}}}}^{\left(I\right)}\right)}^{2}}.$$

After computing the root mean squared error for each specimen, we regressed these values on their corresponding cross-correlation values with linear, squared, and cubic cross-correlation terms (Fig. 5a). We found a statistically significant non-linear relationship (R2 = 0.3, p < 0.001), such that cross-correlation values below 0.90 resulted in exponentially higher landmark errors. The average root mean squared error was 0.23 mm (95% CI ± 0.002 mm). This mean error is comparable to manual landmark intra-observer detection errors across the skull, which tend to be 0.25 mm or less44,50. To verify registration quality across the rest of the database, we calculated cross-correlations for all specimens and stages. The mean cross-correlation values and their standard deviations for E10.5, E11.5, E15.5, E18.5, and adulthood were 0.94 ± 0.07, 0.96 ± 0.04, 0.93 ± 0.02, 0.93 ± 0.12, and 0.95 ± 0.02, respectively (Fig. 5a). These values are on par or higher than those reported in previous mouse registration studies100 and speak to the reproducibility of this approach for analyzing variable morphology.

Fig. 5
figure 5

Validation of adult crania test set. (a) Left: Regression of automated-manual Euclidean distances (error) on cross-correlation, a measure of the final target-reference image similarity. Right: Boxplots showing the distribution of cross-correlation values within each developmental stage. (b) Correlation of automated and manual PC scores. Left: Baseline automated PC correlations. Right: Optimized automated PC correlations. (c) Mean shape deviations between the automated and manual datasets. Red arrows indicate error prone areas.

Covariance patterns and the mean shape

We quantified differences in covariance structure and the sample mean shape between our baseline automated landmarks, the optimized neural network landmarks, and the manual landmarks. To analyze covariance similarity, we projected the automated configurations into the manual PC space and correlated the uncentered PC scores. Figure 5b shows automated and manual correlations for the first 10 PCs (65.1% of the total variance). The average correlation within PCs for the baseline automated configurations was r = 0.6. This measure is biased downwards by lower order automated PCs, which tend to capture residual covariance of the first manual PC. The average correlation within PCs for the optimized automated configurations was r = 0.8, suggesting a restoration of signal among the major PCs.

To analyze mean shape deviations, we computed the grand mean shape for the manual landmarks and deformed it to the automated mean shapes via thin-plate spline. We then used the Morpho package101 in R to generate a deformation heatmap of Procrustes distances at every vertex of the deformed mesh (Fig. 5c). Procrustes distance is equivalent to the root mean squared error between two configurations in shape space. The total distance between the baseline automated mean and manual mean was 0.05, whereas the distance between the optimized automated mean and manual mean was 0.01. Visually, the baseline automated mean shape is largely indistinguishable from the manual mean shape, apart from several known problematic areas44. First, the anterior extent of the frontonasal prominence is underestimated. Second, the shape of the foramen magnum is altered. Third, the lateral extent of the frontal bone is underestimated, likely because there are no sparse landmarks to interpolate there; however, this area is well-covered by the dense landmark configurations. Optimization successfully corrected errors at these problematic locations.

Outliers and stage-specific shape distributions

For each stage, we calculated the Procrustes distance between the mean shape and every configuration to obtain shape distributions and identify outliers (Fig. S1). We defined outlier shapes as those with a Procrustes distance above \({Q}_{3}+1.5\times IQR\), where \({Q}_{3}\) is the third quartile and \(IQR\) is the interquartile range. Next, we displayed a minimum threshold isosurface of each outlier image alongside its landmarks to assess the errors. Landmark (.tag) files with clear head registration errors were removed. We observed most errant outlier landmark configurations in the E15.5 and E18.5 embryos, which underwent whole-body registrations. Since the orientation of the head relative to the body cannot be standardized in embryos, the whole-body registrations and inherent constraints of spatial normalization resulted in local registrations errors if their orientation was markedly different from the atlas.

Eliminating problematic outliers with distance distributions is a global solution but not always a local one. For example, if a landmark configuration hardly deviates from the mean on average, yet still has several landmarks with high detection errors, its distance to the mean could be small but its shape distinct. We performed a Principal Component Analysis on each stage-specific landmark dataset (Figs. S2 and S3) to identify such localized errors, assuming the first PC would capture distinctly problematic shapes. Figure. 6 shows the resulting shape distributions along PC1 for each stage. Here, we morphed a surface of the mean shape to each extreme via thin-plate spline and visualized the outputs. If the deformed surface was unusual, we displayed the image and landmarks as above, removed the errant landmark (.tag) file if necessary, and repeated this process until the prediction was correct.

Fig. 6
figure 6

Principal Component Analysis of stage-specific shape data. The mean shape (center) was deformed to the minimum (left) and maximum (right) extremes of PC1. Every morph is shown with anterior and lateral views. Each row represents a different developmental stage, ranging from E10.5 to adulthood.

Usage Notes

Why MusMorph?

The goal of MusMorph was to create a database of standardized mouse morphology data using an automated, high-throughput, and open-source phenotyping pipeline. By combining developmental atlases with a registration and deep learning framework, we constructed common coordinate systems into which various phenotypic data can be integrated. We primarily focused on acquiring morphological data, including anatomical landmarks, segmentations, and deformation fields, for the craniofacial complex and brain. However, we also generated whole body data for other integrative analyses of late-gestation embryos. To enable novel morphometric analyses of genotype-phenotype maps, we utilized mouse models with substantial developmental and genetic variation. Paired alongside other key metadata, such as strain and sex, MusMorph provides the community with a unique opportunity to disentangle the mechanistic basis for morphological variation.

While sparse landmarks are invaluable for geometric morphometrics, there are scenarios where local shape change can be poorly represented. More ambiguous anatomy, such as curves and surfaces, cannot be sufficiently captured with fixed anatomical landmarks, and semi-landmarking each specimen can be tedious and error-prone. Our standardized sparse and dense landmark datasets can enable global and local shape analyses102,103, an area in geometric morphometrics historically overlooked. Equivalent dense landmark patches across the embryo datasets will also permit joint superimposition of multiple stages into a common shape space for increased statistical power as well as analyses of ontogeny (Fig. S4). In addition to landmarks, we make the corresponding deformation fields available on an ad hoc basis to support voxel-based meta-analyses of morphology. Despite its ubiquitous application in neuroimaging, voxel-based morphometry is rarely seen in fields that study hard tissue, such as evolutionary developmental biology, anthropology, and paleontology. These deformation fields will let one examine internal and external tissue interactions within anatomical context. Finally, we include anatomical segmentations for several stages, which can be used to restrict the dimensionality of a voxel-wise analysis, calculate the size (e.g., volume or surface area) of a structure, or perform a surface-based morphometry analysis. If users are dissatisfied with the coverage of existing landmarks and segmentations, they can modify the atlases and use the image transformations to generate new labels.

We have made the data and scripts freely available at FaceBase (, and GitHub ( to promote transparency, reproducibility, and future data aggregation. Completely open-source efforts like MusMorph are critical for standardizing phenotypic datasets. Unlike the field of genomics, which has been revolutionized through standardized sequencing and data crowdsourcing, phenomics continues to be limited by one-off, self-contained studies that cannot be related to one another. Standardized morphological datasets will allow research groups to, for instance, investigate the effects of a gene mutation alongside other mutants or wildtype strains in a common morphospace. The same can be said for other significant morphological factors and covariates, such as sex and age. Common morphospaces will further encourage multimodal data integration across the phenomic hierarchy, ranging from cellular and developmental phenotyping with light sheet microscopy104 to tissue phenotyping with magnetic resonance imaging and contrast-enhanced computed tomography38. Large phenotypic datasets will ultimately give us the statistical power needed to interrogate mechanisms that bias and generate morphological variation.

Sources of error and potential limitations

Staining artifacts are a drawback of contrast-enhanced computed tomography. Among the largest sources of registration error were poor contrast and background noise, particularly in the E15.5 dataset. Variable stain penetrance and inadequate contrast can underrepresent anatomy, whereas background noise can masquerade as anatomy and deceive the registration, even if the alignment is constrained with a mask. We mitigated labelling errors by registering thresholded images and by employing other preprocessing techniques, such as intensity bias correction and normalization. However, in some cases, the intensities of the scanning tube could not be distinguished from the specimen, leading to surface landmark errors. Another spatial alignment problem that was difficult to reconcile was variation in articulated anatomical positions. For example, head orientation relative to the body varied in the E15.5 and E18.5 datasets, and mandible orientation relative to the skull sometimes differed in the adult dataset. We chose to register the entire scan instead of separate segmentations, masks or cropped volumes, because a) we observed no significant differences in average registration quality, b) a single registration field is computationally more feasible to generate, store, and use downstream and c) a single atlas with a detailed set of labels is better for data standardization.

Non-linear alignment and labelling errors may occur around extreme anatomical points with high variability. To demonstrate how automated landmark error can be reduced, we implemented a neural network that minimized automated and manual craniofacial shape differences. Since the endocast, mandible, and embryo datasets do not have manual landmark training data, they cannot be optimized. However, if other investigators have training data, a network could be built to correct sparse phenotyping errors in areas of high morphological variability. Lastly, it is important to consider the computational time and memory needed for volumetric registration. To integrate new data, we strongly encourage users to parallelize their work on compute clusters.

Future development

The majority of MusMorph is composed of head data, because we had reservations about registering whole body data. Now that we have observed no significant differences in registration quality among datasets, on average, we plan to experiment with more whole-body data for embryos and adults. Another area we intend to improve is our developmental coverage. Despite sampling across most of development, we recognize that additional embryo timepoints (e.g., E9.5 and E12.5-14.5) are needed, as are higher sample sizes throughout mid-gestation and early adulthood. The developing mouse craniofacial complex, for example, undergoes immense growth during the first 30 days after birth105. Early postnatal datasets will be critical for asking questions about size and ontogenetic allometry. Finally, to complement our large sample of homozygous embryo mutants, we hope to introduce more wildtype and heterozygous embryos for analyses of normal variation. Heterozygotes have not been a focus of the IMPC, so there is ample opportunity to reveal previously unrecognized embryo phenotypes with standardized MusMorph comparisons. The adult dataset, by contrast, needs to be balanced with more homozygous mutants to better understand how mutations of large effect influence morphological variance and other related phenomena, such as integration and modularity.

Data access

MusMorph is categorized as a “Project” on FaceBase. Projects can be found in the “Data Browser: Projects” tab at the top of the home page. Project data are organized hierarchically. The levels of the hierarchy in ascending order of data specificity are “Project”, “Dataset”, “Experiment”, and “Biosample”. A project contains datasets, which are sets of similar studies. Each dataset is annotated with study abstracts, experimental designs, and metadata identifiers. Datasets are composed of experiments. An experiment represents a set of similar specimens, so mice with the same genetic background, age, treatment, and mutation would constitute one experiment. Experiments contain biosamples. A biosample is an individual specimen.

After creating a free account and logging in the MusMorph data and metadata can be downloaded at any level in the project hierarchy using the “Export: BDBag” tool at the top-right of the browser. This export function uses DERIVA106, the software platform that powers FaceBase, to generate a BDBag (Big Data Bag)107 ZIP file. Users then need to download the file and process it via BDBag client tools, either via the command line or GUI application. Specific details about the DERIVA Client installation and the step-by-step export instructions are available here: