Mouse MRI shows brain areas relatively larger in males emerge before those larger in females

Sex differences exist in behaviors, disease and neuropsychiatric disorders. Sexual dimorphisms however, have yet to be studied across the whole brain and across a comprehensive time course of postnatal development. Here, we use manganese-enhanced MRI (MEMRI) to longitudinally image male and female C57BL/6J mice across 9 time points, beginning at postnatal day 3. We recapitulate findings on canonically dimorphic areas, demonstrating MEMRI’s ability to study neuroanatomical sex differences. We discover, upon whole-brain volume correction, that neuroanatomical regions larger in males develop earlier than those larger in females. Groups of areas with shared sexually dimorphic developmental trajectories reflect behavioral and functional networks, and expression of genes involved with sex processes. Also, post-pubertal neuroanatomy is highly individualized, and individualization occurs earlier in males. Our results demonstrate the ability of MEMRI to reveal comprehensive developmental differences between male and female brains, which will improve our understanding of sex-specific predispositions to various neuropsychiatric disorders.

were compared with a likelihood ratio test to assess whether group affected righting reflex time or eye opening score. A third model was run with fixed effects of group, postnatal day, group-postnatal day interaction, sex and a sex-postnatal day interaction, with a random effect of individual mouse.
This third model was compared to the first model with a likelihood ratio test to assess whether there was a group by sex interaction. For open field, time spent in the centre of the open field and total ambulatory distance were analysed using linear models.

Affine and Non-affine Registrations
We illustrated the registration procedure in Supplementary Figure 11 (top row), which shows the registration of the p3 average (source image) to the p5 average (target image). The registration procedure was composed of two stages: affine registration performed using the mni autoreg tools [1] and non-affine registration performed using the ANTs toolkit [2]. The two images were overlaid (middle image, first row), demonstrating that the source image needed to be distorted to fit the target image. After the affine registration was performed, the resultant transformation was used to transform the source image and overlay it on the target image (middle image, second row). It was clear that the alignment has improved but was still unsatisfactory in some brain regions (third row). After the non-affine registration was performed, the source image was transformed using both the affine transformation and the non-affine transformation. This procedure resulted in satisfactory alignment between the two images (fourth and fifth row).

Jacobian Determinants
Jacobian determinants were used to quantify the volumetric changes caused by deformations. In Supplementary Figure 12, we illustrated the concept of absolute Jacobian determinants. Gridlines in the target image cerebellum were warped upon transformation to the source image. The determinants of this transformation are called the absolute Jacobian determinants. It was computed for every voxel and the resulting voxel map was overlaid on the target image. This illustrated how absolute Jacobian determinants capture the extent to which regions in the source image were smaller or larger than the corresponding regions in the target image. A similar procedure was applied to find relative Jacobian determinants. Gridlines in the target image cerebellum were warped upon transformation to the affine-transformed source image. The determinants of this transformation are called relative Jacobian determinants and its voxel map was overlaid on the target image. Relative Jacobian determinants captured volumetric changes after scaling brains to the same size.

Generating Atlas Labels
To test for biases, we compared three different sets of atlases (illustrated by Supplementary Figure   13). The first is called a consensus atlas and was used throughout the main study (detailed in Methods). In the second atlas set (called the resampled atlases), we resampled the consensus atlas (atlas on the p65 average) to each of the other 8 time points using the Level 2 transformations and nearest-neighbour interpolation. This atlas specifically removes biases in structure volumes associated with resampling done in the Level 2 registration. However, it does not remove biases associated with using a single atlas as the starting point for structure analysis.
The bias associated with using a single atlas can only be removed by manual segmentation of age-consensus averages. Since this process is quite intensive, we instead chose to use the MAGeT [3] pipeline to generate atlases for each age from multiple intermediate atlases. First, the p65 atlas and its associated MRI average were transformed to each individual p65 image using the transforms obtained from the p65 Level 1 registration. Each of the individual p65 resampled atlases were then used as starting atlases in the MAGeT pipeline to segment the average from the earlier adjacent time point: p36 average. In this pipeline, each atlas was registered to the p36 average, followed by a voxel-voting step to determine the label given to each voxel. At the end of these steps, the p36 average had an atlas overlaid on it, and this atlas never used the information that came from Level 2 of our registration. The exact procedure was then applied to the p36 atlas to generate the p29 atlas, and then the p29 atlas was used to generate the p23 atlas, and so on until all the time points had atlases associated with them. While a p65 atlas began this procedure, Level 2 registration information was never used, multiple atlases were generated in intermediate steps, and each time point's atlas only depended on the atlas of the time point immediately older. While these reasons do not eliminate longitudinal registration bias in this third atlas set (called the voted atlases), they greatly reduce it.

Gene Expression
The underlying gene expression changes associated with sexually dimorphic neuroanatomy ( Figure   6) remains unknown. We wanted to identify candidate genes that might be associated with these dimorphisms. To do so, we used the genome-wide spatial gene expression data available from the Allen Brain Institute [4] and compared them to our neuroanatomical results. There are, however, four caveats associated with using gene expression data for our purpose.
The first caveat is that genome-wide gene expression data was only collected in males at p56.
While some genes have developmental gene expression at p4, p14, and p28 [5], most genes do not and there is no gene expression data for females. The second caveat is that most genes had their expression data come from only one mouse. The third caveat is that the majority of gene expression data was collected using ISH (In situ hybridization) on sagittal slices spanning only one hemisphere.
Only a small subset of genes had ISH conducted on coronal slices spanning the whole brain. Lastly, despite extensive quality-control steps taken by the Allen Brain Institute, several regions in the brain were missing gene expression data. To compensate for this, whenever any gene had multiple replicates, we chose the replicate with the least amount of missing data. Furthermore, we excluded experiments where expression data spanned less than 20% of the brain. While these caveats do limit the conclusions made from this analysis, spatial gene expression data is still useful in identifying candidate genes associated with sexual dimorphisms for further exploration.

Neuroanatomy Prediction
We wanted to predict the absolute volumes of structures. To model this growth, we used natural spline functions. N th-order natural splines are characterized by N basis functions of age t, where the kth basis function is represented by f k (t). For each structure, the structure volume y ij in the training data was fit with the following model: Variables in this model are described in the main text. While increasing the order N of the natural splines allows one to model more complex growth curves, it can also overfit data leading to inaccurate predictions for data outside the training set. We computed Bayes Factors, using the BayesFactor package [6], to decide which order N of natural splines to use. Bayes Factors compute the evidence any model (12) has versus an intercept only model (y ij = α 1 + ε ij ) and assumes a reasonable set of priors on the predictors. We used the BayesFactor package's default Jeffreys prior for our analysis. By finding the Bayes Factor associated with Model (12) for values of spline order N ranging from one (linear growth) to the number of time points in the data minus one. The spline order N chosen for the structure's model is the one with the highest Bayes Factor associated with it.
Once we determined the order N of the natural splines modelling fixed growth effects, we wanted to similarly model random growth effects with M th-order natural splines and optimize M . The training data was fit with the model below: After fitting multiple models with different values for M , we chose the model with the lowest Bayesian information criterion (BIC) [7] to predict structure volumes.
Lastly, we also placed a gaussian weighting on data in the training set depending on the age. This step was motivated by the fact that when predicting volumes at a particular age, the time point closest to that age is the most informative. However, only considering the closest time point alone may be less informative than taking some information from the other time points. To balance the two extremes, we placed a gaussian weighting on the data. The gaussian weighting was centered on the time t which we want to predict and its spread parameter (σ 2 ) can be optimized. A high σ 2 implies data over all time is weighted equally by the model, and a low σ 2 implies data closest to the prediction time t is weighted higher than data further away. This spread parameter was optimized using leave-one-out cross validation.
The model described above was used primarily in our study. However, we also explored two improvements to our model to check for consistency. The first was adding a covariate for total brain volume V i,j at the time point prior to the one being predicted (j → j − 1), which was done to control for whole-brain volume effects in subjects. The model with this covariate is given below, and fixed N splines and random M splines are optimized as detailed above: The second improvement was to use a random forest. Thus far, structures were modelled independently of each other, i.e. a structure's volume at a certain time t was predicted from the same structure's volume at earlier times. Using the random forest machine learning method, we can predict a structure's volume from other structures at an earlier time. To do so, we first fit the primary model described above to the training data and obtained the residuals of this model at the age t we wanted to predict. We then identified the volume of all 182 structures in the brain at the immediate earlier time and used them to model these residuals using a random forest from the randomForest package [8]. The random forest contained 500 trees and randomly sampled 5 structures at each tree split. The final predicted value was the sum of predicted values from both the initial model and the random forest.

Optimizing Growth Models for Absolute Determinants
Absolute volumes required more complex growth curves than relative volumes. To model this curve, we used a similar procedure as in the previous section. We found the Bayes Factor associated with Model (12) for values of spline order N (fixed effect of growth) ranging from 1 to 8. For 80% of structures, Bayes Factor was maximized by splines of order N ≤ 6. The data was then fit with the model defined by Model (13) with N = 6 and the optimized M (spline order associated with random effect growth) is determined by finding the model with the minimum BIC. We found that 95% of structures were best fit by M ≤ 2. Thus, we chose order-6 natural splines for fixed effects of age and order-2 natural splines for random effects of age to fit absolute Jacobian determinants. We computed the likelihood-ratio statistic comparing this optimized model to a similar one without sex and sex-age interactions to ascertain significance of sex on absolute determinants.

Registration Bias
The first level in our longitudinal registration consists of nine independent registrations, each of which registers all the scans from one time point to an age-consensus average. As such, nine ageconsensus averages, one for each age, are created from the first level registration. In the second level, each time point's age-consensus average is registered to the age-consensus average of the immediate next time point, with the exception for p65-the final time point. We chose to make the p65 ageconsensus average the registration consensus average, i.e. the common space to which all subjects and time points are compared to for statistical analysis. While any time point can be chosen for analysis, we picked this age as it is close to the MRI atlas (p60) and the Allen Brain Gene Expression atlas (p56). While it is a necessary part of our analysis, it is important to note that the practice of picking a consensus time point can lead to biases. For example, interpolation bias may be a factor as, one time point (p65 in our case) receives less interpolation than the other time points. Furthermore, these biases may not affect all groups equally [9]. Below, we show that our registration is not biased across sex or individuals, and that the statistics maps generated in our study are similar regardless of the time point chosen as the consensus time point.

Interpolation bias has little effect on voxelwise statistics
We regenerated our statistics map ( Figure 4) after picking p17 as the registration consensus average (Supplementary Figure 9). This time point was chosen as it was the median time point in our study, which follows more closely to literature recommendations [9]. This map was then resampled to p65 space to facilitate comparison of the two statistics maps. The high similarity of Figure 4 and Supplementary Figure 9 show that the effect of choosing p65 or the p17 median time point on sexual dimorphisms detected is small.
Next, we tested whether detection of sexual dimorphisms at any age would be influenced by picking the p65 age-average as the registration consensus average. The two-level registration generates two sets of determinants: one set from Level 1 and one from Level 2. In our main study, we use the determinants from Level 2 as these are all transformed to a consensus p65 space.  Figure 10). Furthermore, instead of computing effect sizes associated with sex, we repeated the above procedure with arbitrary spatial statistics patterns, which we computed by calculating effect sizes after random permutations of the sex label. The correlations between Level 1 and transformed Level 2 spatial patterns were also quite high and reported as a density plot (Supplementary Figure 10). Taken together, this indicated that the biases in the statistics maps generated from our longitudinal registration (whether they are associated with sex or not) are minimal.

Atlas-based bias does not discriminate individuals or sex
We compared structure volumes for every subject and at every time point using each of the three sets of atlases: the atlas placed on the p65 brain (called the consensus atlas), the resampled atlases from each age, and the voted atlases from each age. Supplementary Figure 14A shows the volume of the Lobule 1-2 white matter located in the cerebellum. While the consensus and resampled atlases were slightly different from each other, the voted atlas was very different from them both. In fact, according to the voted atlas, the structure does not exist before 5 days of age. This demonstrates that the particular atlas chosen does play a role in the volumes calculated. However, we sought to test specifically if this effect would apply differently across individuals or sex. To do so, we computed the z-score of the structure-i.e. subtracted the mean volume at every age and scaled by the volume standard deviation at every age (Supplementary Figure 14B). We observed that the atlas method did not play a significant role in affecting volumes of individuals or sex. This was tested using 3 linear mixed-effects models. The first model had z-score volumes as a response variable; fixed effects   Fig. 1: Time to right, eye opening, as well as time spent in centre and total ambulatory distance travelled in the open field was assessed neonatally across scanned mice (S), their non-scanned littermates (L) and non-scanned controls (C). A) There was no effect of group (χ 2 8 = 14.54,P= 0.07), nor was there a group-sex interaction (χ 2 4 = 6.59,P= 0.16) on time it took for pups to right themselves across postnatal days 4, 5 and 6. B) There was also no effect of group (χ 2 32 = 20.61,P= 0.94) or a group-sex interaction effect (χ 2 16 = 9.30,P= 0.90) on when eyes opened across postnatal days 10 to 17. For both A and B, linear mixed-effects models were used to create trendlines and bars representing standard error. C) There was no effect of group (F 2,29 = 0.47,P= 0.63) or a group-sex interaction (F 2,29 = 0.64,P= 0.54) on time spent in the centre of the open field at postnatal day 16, nor was there an effect of group on total ambulatory distance travelled (F 2,29 = 1.16,P= 0.33) or a group-sex interaction (F 2,29 = 0.17,P= 0.85). Thus, no neonatal behavioural metrics collected were significantly impacted by scanning, and the results were the same across both males and females. These mice were kept for further testing in the open field as adults (postnatal day 65). Although there was no group-sex interaction on centre time (F 2,26 = 0.91,P= 0.41), there was a significant effect of group (F 2,26 = 6.81,P= 0.004) as scanned mice spent less time in the centre of the open field compared to nonscanned controls (post hoc Tukey test P adj = 0.003). Total ambulatory distance, however, did not show any significant differences across group (F 2,26 = 1.37,P= 0.27), or by sex within each group (F 2,26 = 1.34,P= 0.28). For both C and D, mean and standard error (bars) were calculated using linear models. Supplementary Fig. 2: Top 5% of largest voxels (relative to whole brain) in males and females clustered by their effect sizes over time. Cluster 1 and 2 correspond to regions larger in males in adulthood and these sexual dimorphisms emerge early. Cluster 3 corresponds to regions larger in females and emerges around puberty. Supplementary Fig. 3: Top 5% of largest voxels in males and females clustered by their effect sizes over time. Similar to the case with relative volumes, Clusters 1 and 2 correspond to regions larger in males in adulthood and these sexual dimorphisms emerge early. Cluster 3 corresponds to regions larger in females and emerges around puberty. Supplementary Fig. 4: Top 5% of largest vertices in males and females clustered by their effect sizes over time. Similar to relative and absolute volumes, Clusters 1 corresponds to regions larger in adult males and these dimorphisms occur early in development, while Cluster 3 corresponds to regions larger in females and these dimorphisms occur around puberty. Similarly, Cluster 2 in both the thickness analysis and volume analysis corresponds to regions larger in males whose dimorphisms emerge after the regions in Cluster 1. However, while volume analysis had both Cluster 1 and Cluster 2 dimorphisms emerging in the first 10 days of life, cortical thickness analysis shows dimorphisms in Cluster 2 emerging around male puberty. Supplementary Fig. 5: Preferential spatial expression of genes involved with sex processes in sexually dimorphic regions. A) Estrogen Receptor 1 (Esr 1) has biased expression in BNST, MPON, and MeA; and its expression is 2.1 times higher in sexually dimorphic regions than its average gene expression in the brain. B) GABA A receptor, subunit theta (Gabrq) has biased expression in BNST, MPON, and MeA (regions larger in males), as well as the midbrain and hindbrain (regions larger in females) with a fold change 1.94 relative to mean. C) Solute Carrier Family 6 (Neurotransmitter Transporter, Serotonin), Member 4 (Slc6a4 ) had the highest preferential expression of any gene measured (fold change 3.7) and D) Tryptophan hydroxylase 2 (Tph2 ) had the second highest preferential expression (fold change 3.4).
Supplementary Fig. 6: Using RMSPD to measure accuracy shows a similar pattern to RMSD as seen in Figure 8. A) Matrices show RMSPD values between each set of predicted structure volumes (columns, 1 per subject) and observed structure volumes (rows, 1 per subject), shifted such that the diagonal (prediction and observation RMSPD for the same subject) is 0. Red off-diagonals indicate that predictions for Subject X match observations for another subject better than observations for Subject X. The more red off-diagonal terms, the less specific the predictions are. Blue off-diagonals indicate specific predictions for Subject X as it matches observations of Subject X better than other subjects. Off-diagonal RMSPD are shown in a density plot (grey) and the diagonal RMSPD are given by points on the same plot (median is the vertical line). As data from further in development is included, model predictions become more specific. For example, model prediction specificity for p3 (which only uses data from Subject X at p3 to make predictions for Subject X at p36) is poor (58% chance of off-diagonal terms being blue) and prediction specificity for p10 (which uses data from Subject X at p3, p5, p7, and p10 to make predictions for Subject X at p36) is much better (75% chance of blue off-diagonal terms). B) RMSPD decreases (accuracy increases) as subject data from later in development are used in prediction. Male accuracy improves earlier than female accuracy. Supplementary Fig. 7: Predicting p65 structure volumes showed similar patterns of individualisation as predicting p36 volumes (Figure 8). A) Observed and predicted structure volumes for three representative structures with averages for each sex given by horizontal lines. Prediction for subject X at p65 was trained on all data excluding Subject X at p65. Model does not memorize the average, but instead fits individualised patterns in neuroanatomy B) Matrix shows RMSD values between each set of predicted structure volumes (columns, 1 per subject) and observed structure volumes (rows, 1 per subject), shifted such that the diagonal (prediction and observation RMSD for the same subject) is 0. Red off-diagonals indicate predictions for Subject X match observations for another subject better than observations for Subject X. The more red off-diagonal terms, the less specific the predictions are. Blue off-diagonals indicate specific predictions for Subject X as it matches observations of Subject X better than other subjects. Off-diagonal RMSD are shown in a density plot (grey) and the diagonal RMSD are given by points on the same plot (median is the vertical line). C) As data from further in development is included, model predictions become more specific. For example, model prediction specificity for p3 (which only uses data from Subject X at p3 to make predictions for Subject X at p65) is poor (53% chance of off-diagonal terms being blue) and prediction specificity for p17 (which uses data from Subject X at p3, p5, p7, p10, and p17 to make predictions for Subject X at p65) is much better (76% chance of blue off-diagonal terms). D) RMSD decreases (accuracy increases) as subject data from later in development is used in prediction. Male accuracy improves earlier than female accuracy. Supplementary Fig. 8: Plot of within-group sum of squares versus the number of k-means clusters. Increasing the number of clusters decreases the within-group sum of squares indicating that cluster members are more similar to each other. However, beyond 4 clusters there is diminishing returns on the within-group sum of squares. This elbow at 4 implies that 4 clusters are appropriate for k-means analysis [10] on our data. Supplementary Fig. 9: Sexually dimorphic areas in the mouse brain calculated after choosing the p17 age-consensus average as the registration consensus average. Voxelwise statistics were computed similar to those in Figure 4 except that p17 age-consensus average was chosen as the registration consensus instead of p65. The resultant statistics map was in p17 age-consensus space and was transformed to p65 age-consensus space for comparison to Figure 4. A high correlation was observed between these two maps (r = 0.992) indicating that choosing p65 as the consensus versus p17 (the median time point) incurs little bias in identifying sexually dimorphic regions.
Supplementary Fig. 10: Statistics maps generated without longitudinal registration are similar to those generated with longitudinal registration. Random statistical maps were generated for each time point by permuting sex labels and computing effect size comparing males and females. For every permutation, the effect sizes were calculated for Level 1 determinants (agnostic to longitudinal data and in age-consensus space) and Level 2 determinants (dependent on longitudinal registration and in p65 age-consensus space). The effect sizes from Level 2 determinants were transformed to the age-consensus space corresponding the effect sizes from Level 1 determinants. Correlations are computed between the transformed Level 2 effect size map and Level 1 effect size map and correlation was observed to be high for all time points. The dotted lines indicate correlations when the sex labels are not permuted and correspond to true volumetric effect sizes between males and females.
Supplementary Fig. 11: Registration of a source image (p3 average) to a target (p5 average). The native images (after rigid alignment) are on the top row and their overlay is in the middle column. Poor alignment can be found in structures like the cerebellum where there is rapid neonatal growth. Affine registration scales and shears the source image to better align with target image. The affine transformation (generated from the affine registration) is applied to the source image and is shown in the second row. The overlay shows a good match between the affine-transformed source and the target images. However, zooming into the cerebellum of the affine-transformed source and target image (third row) showed that affine registration does not produce proper alignment of the cerebellum. This is illustrated by applying a red contour to the cerebellum of the target image and overlaying this contour to the source image. The non-affine registration corrects this discrepancy (fourth row) and produces the best alignment between source and target images (fifth row).
Supplementary Fig. 12: Visualizing deformations caused by transformation of target image using grids and determinants. Illustrated in the left figure, upon transformation of the target image to the source image (this transformation is the inverse of the transformation in Supplementary Figure 11), gridlines in the target image become warped. In the top row, the gridlines warp from transformation to the source image; and in the bottom row, the gridlines warp from transformation to the affinetransformed source. Volumetric changes caused by the transformation can be qualitatively assessed by observing how the volume of a square region (region defined by the open space between gridlines) changes after transformation. It is clear from the convergence of gridlines in the cerebellum that much of the cerebellum decreases in size after transformation. This implies that the cerebellum is smaller in the source image than the target image. Volumetric changes can also be quantified by calculating the determinants (right figure). If a region in the source image is smaller than the corresponding regions in the target image (i.e. gridlines converge), the region has determinants between 0 and 1. Conversely, regions larger in the source image (i.e. gridlines diverge) have determinants larger than 1. Absolute determinants (top row) characterize volumetric changes upon transformation from target to source images and measure the true volumetric differences between target and source images. Relative determinants (bottom row) characterize volumetric changes upon transformation from target to affine-transformed source images and measure the volumetric differences between target and source images upon removal of a scaling factor (this scaling factor makes source and target images the same size as seen in Supplementary Figure 11: second row). The advantage of absolute determinants is that they can be used to calculate the volumes of regions in canonical units like mm 3 . Relative determinants, on the other hand, calculate volumes relative to total brain volume instead. However, relative determinants remove whole-brain size variability (which is the largest source of variability among mice [11]) to expose more subtle variations in neuroanatomy.
Supplementary Fig. 13: Three different sets of atlases used to check for registration bias in structures. Consensus atlas is the MRI atlas registered to the registration consensus average (which is also the p65 age-consensus average). Since all images are registered to this consensus average, we use this atlas alone to quantify structure volumes in our main study. The two additional sets of atlases created test for different types of biases. The resampled atlases are created by transforming the consensus atlas to every single age in a single interpolation step with transformations obtained from Level 2 of our registrations. This atlas allows us to check for resampling bias as, with this atlas, volumetric information need not be transformed to p65 space prior to quantification of structure volumes. For a given time point, its voted atlas is created by aligning its age-consensus average to the atlas overlaid on every subject of the next immediate-older time point. A voxel voting procedure is taken across all subject atlases to create the voted atlas on the time point's age-consensus average. This method greatly reduces the bias in choosing a single starting adult atlas and transforming it to younger time points. There are multiple intermediate atlases and each time point is only responsible for creating an atlas for the immediate-younger time point. Supplementary Fig. 14: Registration bias exists with atlases but does not discriminate individuals or sex. A) The volume of the Lobule 1-2 white matter estimated using the three sets of atlases: consensus, resampled, and voted (see Supplementary Figure 13). Trendline and standard error (shaded region) were obtained by fitting a linear mixed-effects model. We see that registration bias does play a role in volume estimation of small structures like the Lobule 1-2 white matter as, before p7, voted atlases say this structure does not exist and the other atlases do not agree. B) We computed z-score, removing the overall mean and standardizing variability across the three sets of atlases for each age. We test whether registration bias of structure volumes applies equally across all individuals using two linear mixed-effects models: the first model had z-score volumes as a response variable; fixed effect of time point, sex, and atlas (Consensus, Resampled, or Voted), as well as all interactions; and random effect of individual and individual-atlas interaction. The second model was the same but lacked the random effect of individual-atlas interaction. A log likelihood test showed that registration bias does not discriminate individuals (χ 2 5 = 0.33,P> 0.99). To test if registration bias was the same between the sexes, we performed a similar analysis, except the second model lacked all interactions between sex and atlas. We found that registration bias does not discriminate sex (χ 2 12 = 0.08,P> 0.99). All other structures showed little effects of registration bias affecting each individual differently (uncorrected all P>0.93) or affecting each sex differently (uncorrected all P>0.999). This supports our conclusion that while registration bias may exist in structure volume measurements, this bias applies equally to all individuals and sexes. Supplementary Fig. 15: Sexually dimorphic voxels when using time point instead of age as a predictor. Our time points are not evenly spaced in development with more time points concentrated in early life. We assessed whether this has an effect on the sexually dimorphic regions identified by using time point as a categorical predictor. We found more sexually dimorphic voxels compared to Figure 4, but many of the same regions are implicated in both figures. Supplementary Fig. 16: Sexually dimorphic voxels in the mouse brain after removal of all data corresponding to p65. The figure was generated in a similar manner to Figure 4, except all data from p65 was removed prior to statistical analysis. This was done as p65 is almost a month after the previous p36 time point and we wanted to assess whether this type of sampling would have any effect. Since the results are similar to Figure 4, we conclude that this does not for our study.
Supplementary Fig. 17: Voxels with sexually dimorphic absolute Jacobian determinants. The analysis was similar to Figure 4 except absolute Jacobian determinants were used as the dependent variable instead of relative determinants. In addition, 6th-order natural splines were used to model fixed effect of age and 2nd-order natural splines were used to model random effect of age. We observed the absolute volumes capture sexual dimorphisms in similar regions of the brain as relative volumes; however, relative volumes capture dimorphisms larger in females in regions like the somatosensory cortex, midbrain, and pons.