Introduction

Many brain structures appear similar in men and women, especially when total brain size is properly accounted for. It has therefore been proposed that both brain structure and behavior largely overlap between the sexes1. However, one cannot assume a total absence of sex differences in brain features, as evidenced by both single studies and meta-analyses. In a Gaussian-process regression coordinate-based meta-analysis including 16 voxel-based morphometry (VBM) studies, altogether comprising 2,186 brain scans, Ruigrok and colleagues2 reported larger gray matter volumes (GMV) in women within the frontal pole, inferior/middle frontal gyrus, planum temporale/parietal operculum, anterior cingulate gyrus, right insular cortex, Heschl’s gyrus, thalamus, precuneus, parahippocampal gyrus, and lateral occipital cortex. In men, GMV was larger for the amygdala, hippocampus, parahippocampal gyrus, precuneus, putamen and temporal poles, the cingulate gyrus, as well as cerebellum2.

While meta-analyses have an enormous advantage over single studies in terms of statistical power, they are not immune to other pitfalls3,4, such as those related to data acquisition, image analysis, and the (often manual) transfer of peak coordinates. Therefore, a study comparable in scale to meta-analyses, but not weakened by the typical methodological confounds, would be desirable. On a similar note, studies investigating the reproducibility of sex effects in two independent cohorts using identical measurements and evaluation methods seem imperative.

Recently, in a UK Biobank study of 5,216 participants, Ritchie and colleagues5 presented data on sex differences in total brain volume (preselected subcortical structures), cortical thickness (cortical structures), white matter, resting state connectivity and cognitive testing. Although this is the first study on such a large dataset, including different characteristic measurements of brain structure, connectivity and cognition, the investigated sample was not representative (mean age 62 years, range 44–77 years; higher education over-represented). Roughly, Ritchie and colleagues described larger brain volume in all preselected subcortical areas (except n. accumbens) of both hemispheres in men and increased cortical thickness in women. Men showed larger variance of brain volume measures than women.

However, an investigation of sex differences in the brain in a large representative cohort with (1) a broad age range, (2) correction for total brain volume (TBV), (3) inclusion of a number of confounds known to affect brain volume, and (4) use of voxel-based morphometry to differentiate subcortical subregions is still lacking.

For the general differences between sexes, such as larger TBV, GMV and WMV in men6, and for local cortical differences between men and women in the fronto-parietal (women > men) and occipito-temporal cortex (men > women), there seems to be increasing support1,2,7. In contrast, for subcortical structures, such as the hippocampus, inconsistent results have been reported. This inconsistency might result from differences in methodological approaches (lack of correction for TBV, global structure volume assessment in comparison to regional volume changes in VBM), but also from differences in cohort selection (age, sample size). For instance, Neufang et al.8 found that testosterone levels predicted hippocampal size, with younger females having larger hippocampi. Whereas sex differences in puberty and early adulthood may be particularly modulated by hormonal factors, in older adulthood environmental factors may have a greater impact. Therefore, differences in younger samples might well be absent in older cohorts, and vice versa. Large cohorts with a broad age range (21–90 years) like the current one may have the power to detect sex differences in small structures, such as the hippocampus, as a function of age.

Here, we compared male and female brains with respect to local (voxel-wise) gray matter volume in a large representative sample. First, we tested for reproducibility of effects in two independent cohorts (n1 = 967; n2 = 1,871). Since both showed highly reproducible results, the methods of collecting data were identical and the cohorts were not overlapping, we were able to combine both to perform one unified analysis. For this purpose, we applied a state-of-the-art brain mapping approach9 and analyzed 2,838 T1-weighted scans obtained from these two general population cohorts10. In addition, we investigated the interaction of sex and age for the GMV in the hippocampus, since previous findings with regard to this structure were highly controversial and an impact of age on the hippocampal GMV can be assumed.

Results

Cohort 1 (n = 967) and Cohort 2 (n = 1,871)

Analyzing Cohort 1, in women (women > men), on average larger GMV was prominent in bilateral prefrontal areas, such as the ventrolateral prefrontal cortex (vlPFC, BA 47), the medial and lateral orbitofrontal cortex (OFC), the anterior cingulate cortex, the frontal pole, and the dorsolateral prefrontal cortex (dlPFC, namely BA 45, 46). In addition, women on average showed larger GMV in the right gyrus of Heschl, the bilateral lateral occipital lobe, posterior insula, the right superior parietal lobe (SPL), the bilateral superior temporal sulcus (STS), and the left posterior cerebellar hemisphere. Effect sizes (Cohen’s d) for women > men ranged from 0.30 (frontal pole) to 0.45 (lateral OFC). Findings are further detailed in Supplementary Table 1A. In men (men > women), on average larger GMV was evident in bilateral temporal areas, such as the parahippocampal gyrus, the hippocampus (Hi), the amygdala (Am), the temporal pole (TP), and the fusiform gyrus (FG), as well as the bilateral putamen (Pu), anterior cerebellar hemisphere (aCBH, Larsell’s lobule IV-VII), and left primary visual cortex (BA 17, 18). Effect sizes (Cohen’s d) for men > women ranged from 0.27 (vlPFC) to 0.49 (parahippocampal gyrus). Findings are further detailed in Supplementary Table 1B.

The analysis of Cohort 2 reproduced all of the aforementioned effects. That is, all areas reported for Cohort 1 were also evident for Cohort 2 (see Supplemental Tables 1 and 2). We therefore repeated the analyses pooling Cohort 1 and Cohort 2 (see next section).

Combined Cohort (n = 2,838)

Age did not differ between men and women. Men reported longer education (12.97 ± 2.50 years) than women (12.44 ± 2.31 years; p < 0.001). As shown in Fig. 1, on average, women had larger GMV (women > men) in bilateral vlPFC (BA 47), medial and lateral OFC, ACC, frontal pole (BA 10), lateral occipital lobe (BA 19), right Heschl gyrus, bilateral dlPFC (BA 45, 46), posterior insula, precuneus, STS, left thalamus and SPL and right posterior cerebellar hemisphere and IPL. On average, men had larger GMV (men > women) in bilateral parahippocampal gyrus and hippocampus, amygdala, temporal pole, putamen, fusiform gyrus, anterior cerebellar hemisphere, primary visual cortex (BA 17), and premotor cortex (BA 6).

Figure 1

Significant sex differences for the combined cohort (n = 2,838). Glass brain projections with labels (top) and MNI-standard brain projections (bottom). Orange clusters display regions with larger gray matter volume in women (women > men): pCBH = posterior cerebellar hemisphere; IPL = inferior parietal lobe; SPL = superior parietal lobe; STS = superior temporal sulcus; ACC = anterior cingulate cortex; BA = Brodmann areas 45, 46, 47, 10; OFC = orbitofrontal cortex; pIns = posterior insula. Blue clusters display regions with significantly larger gray matter volume in men (men > women): BA = Brodmann areas 6, 17; aCBH = anterior cerebellar hemisphere; Hi = hippocampus; Th = thalamus; Pu = putamen; TP = temporal pole; FG = fusiform gyrus; Am = amygdala. All findings are significant at p ≤ 0.05, FWE corrected for multiple comparisons.

When testing the interaction of age (median split of the sample at 53 years) and sex, we found a significant effect for the hippocampus. Post hoc t-tests demonstrated that older women (≥53 years) had larger posterior-superior hippocampal GMV (t = 5.52; Cohen’s d = 0.21; 141 voxels in ROI; MNI-coordinates: −36, −36, −9) than older men. In contrast, younger men had larger GMV in the anterior-inferior hippocampus than younger women (t = 8.21; Cohen’s d = 0.31; 603 voxels in ROI; MNI-coordinates: −21, −2, −22). Both effects were only observed for the left hemisphere. The comparisons older men > older women and younger women > younger men revealed no significant effects.
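To illustrate how a t-value relates to an effect size of this kind, the following sketch converts a t-statistic to Cohen's d using the common approximation d ≈ 2t/√df. The text does not state which conversion was used, so the formula is an assumption, as are the degrees of freedom (here the combined sample size minus a hypothetical 11 model parameters).

```python
import math

def cohens_d_from_t(t: float, df: int) -> float:
    """Approximate Cohen's d from a t-statistic (assumed formula: 2t / sqrt(df))."""
    return 2.0 * t / math.sqrt(df)

N_SUBJECTS = 2838   # combined cohort size reported in the text
N_PARAMS = 11       # hypothetical: 4 sex-by-age cells plus 7 covariates
df = N_SUBJECTS - N_PARAMS

# t-values of the two post hoc hippocampal contrasts reported above
d_older_women = cohens_d_from_t(5.52, df)   # about 0.21
d_younger_men = cohens_d_from_t(8.21, df)   # about 0.31
print(round(d_older_women, 2), round(d_younger_men, 2))
```

Under these assumptions both values round to the effect sizes given in the text. At this sample size the result is largely insensitive to the exact number of model parameters.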

Effect sizes for women > men were as large as d = 0.38 (prefrontal cortex) and for men > women as large as d = 0.53 (parahippocampus). A detailed list of different regions between men and women is provided in Tables 1 and 2.

Table 1 Whole Sample (n = 2,838): Women > Men.
Table 2 Whole Sample (n = 2,838): Men > Women.

Discussion

The current study compared sex differences in the brain by examining gray matter volume in two independent cohorts. We found a high reproducibility of effects between cohorts and therefore pooled the data for a unified analysis, which resulted in a well-powered sample (n = 2,838). Since this study did not directly measure associations between brain structure and behavior, interpretations linking brain structure to behavioral implications are speculative.

Correspondence with previous findings

In our study, the most compelling differences between the cortical GMV of men and women lay in the larger prefrontal GMV in women and the larger anterior-medial temporal GMV in men. This confirms the results of Chen and colleagues7, who described regional GMV differences in a cohort of 411 middle-aged healthy participants (44–48 years), with men > women in the midbrain, left inferior temporal gyrus, right occipital lingual gyrus, right middle temporal gyrus, and both cerebellar hemispheres, and women > men in dorsal anterior, posterior and ventral cingulate cortices, and right inferior parietal lobule. In addition, the present study largely confirmed the meta-analytic findings by Ruigrok and colleagues2. That is, we detected larger GMV in women in the inferior and middle frontal gyrus, the ACC, the right OFC, the right insula, the lateral occipital cortex, the Heschl gyrus, the thalamus, the precuneus, but not in the planum temporale/Wernicke’s area.

GMV-differences in subcortical structures (parahippocampus, hippocampus, thalamus)

For the parahippocampus, Ruigrok and colleagues2 reported larger GMV posteriorly in women, and larger GMV anteriorly in men. Interestingly, the parahippocampus showed the strongest sex effect (men > women) in the present study and we did not observe any effect for women > men in this area. For the parahippocampal gyrus, Ritchie and colleagues5 reported that females showed relatively higher thickness but males showed relatively higher volume and surface area.

In the current study, the GMV in the anterior-inferior hippocampus was larger in men than in women. However, when testing the interaction of age and sex, this held true only for the younger part of the sample (median split; <53 years), but not for the older part (≥53 years). In contrast, older women showed increased left posterior-superior hippocampal GMV compared to older men. It might well be the case that for women hormonal changes after menopause modulate these specific hippocampal GMV differences in comparison to men11. Additional information on this effect is provided in the Supplement. In accordance with our study (but measured for the complete structure volume), Ritchie and colleagues5 (mean age 62 years) reported no sex differences in hippocampal volume after correction for total brain volume. Our results are also corroborated by the meta-analysis of Ruigrok et al.2, showing increased hippocampal volume bilaterally for men.

We found larger GMV of the thalamus in men compared to women in contrast to Ruigrok and colleagues2 (increased thalamic GMV in females), except for the left thalamus, where we found a larger GMV for the posterior part in women. This demonstrates the strength of a voxelwise analysis enabling a more detailed analysis of subregions.

Larger GMV in men in motor areas

For men compared to women, we observed larger GMV in the putamen, the premotor cortex (BA6), and the anterior cerebellum (i.e., structures involved in motor function). Ruigrok et al.2 likewise found larger GMV in men in the bilateral putamen, bilateral cerebellum and the left precentral gyrus. Larger GMV in motor areas in men may arise during the phases when testosterone in boys and estradiol in girls are causing the greatest modulation of the brain8.

Larger GMV in women in prefrontal areas

Increased GMV in women’s prefrontal areas has been reported in a number of smaller studies and was therefore the most prominent result in the large meta-analysis by Ruigrok et al.2. The present study confirms these results, with women demonstrating larger GMV in bilateral dorso- and ventrolateral prefrontal cortices, the frontal pole, and the medial orbitofrontal cortex. In contrast to Ritchie et al.5, who speculated about the functional meaning of higher prefrontal GMV in men, noting that “regions that showed the largest effects were broadly areas involved in the hypothesized intelligence-related circuit in the “P-FIT” model”, we demonstrated the contrary, with females showing larger GMV in the same areas. Although our study did not measure cognitive or behavioral data, and is thus not able to draw conclusions about cognitive functioning and brain structure, we would like to point out that increased GMV is usually associated with better functioning in the cognitive domain12. Prefrontal areas with larger GMV in women are functionally important for executive functioning13, such as planning, working memory, inhibition, mental flexibility as well as the initiation and monitoring of action, but also for emotional control, moral considerations14 and processing of language15.

Do differences between men and women allow for individual assignment?

Although these sex differences have been robustly observed in different cohorts, their relevance for an individual is rather small: Joel and colleagues demonstrated that there is a considerable overlap between the features of brain form between males and females and that these features are internally inconsistent1, even when considering only those showing the largest sex differences. In response to the Joel et al.1 study, Chekroud et al.16 used a multivoxel pattern analysis to distinguish male and female brains by structural differences. They found a classification accuracy of 93–95% and concluded that sex can be reliably predicted by brain structure when considering the brain mosaic as a whole.

Limitations

Brain structural differences between men and women are the result of complex biological and environmental influences, and the underlying neural mechanisms are a matter of ongoing discussion. Additionally, it is not fully understood whether more GMV is associated with improved function, even if most studies comparing experts and non-experts, or longitudinal studies applying training paradigms, demonstrated specifically increased GMV in those areas functionally representing improved performance17,18,19. However, these associations are poorly understood and a matter of ongoing discussions20.

Furthermore, while cognitive function is associated with GMV, it has also been linked to white matter and structural connectivity between different brain regions21. Thus, gray matter may explain some, but not all of the differences. In addition, sex-specific incidence of pathologies may have an impact on differences in GMV between men and women. In the current study, all pathologic brain scans were excluded from the sample, as described in the Methods.

Finally, different techniques for measuring GMV provide only partially comparable results. A major drawback of voxel-based measurements is that they combine cortical thickness and surface area into one single measurement. It has been demonstrated that vertex-based measures (cortical thickness, surface area) are more or less independent of each other22. A global or local change of these measures in different directions (e.g. increase of cortical thickness, decrease of surface area) would not necessarily be visible in voxel-based morphometry, and this may be one principal explanation for the differences between vertex- and voxel-based measures.

Conclusion

The outcomes of this large-scale study offer an excellent starting point for follow-up research elucidating the role of a sex-specific brain anatomy for cognitive, emotional, and behavioral differences between men and women. In particular, the combination of brain morphology and behavioral testing of cohorts is a challenge for the future. Moreover, these outcomes may help to explain sex differences in the prevalence and progression of a number of disorders, diseases, and disabilities.

Methods and Materials

Sample and Imaging

The Study of Health in Pomerania (SHIP) comprises two independent general population cohorts, SHIP and SHIP-TREND. The primary objectives of SHIP were (i) to assess prevalence and incidence of common risk factors, subclinical disorders and clinical diseases; and (ii) to investigate the complex associations between the aforementioned issues.

Participants were selected from West Pomerania in Northeastern Germany. Inclusion criteria were primary place of residence in the target area and age 20–79 at sampling. No other criteria were employed for exclusion or inclusion to obtain a general population sample as representative as possible. Invitations comprised three written invitations, phone calls, and personal contacts.

In total, out of 6,265 eligible individuals, 4,308 participated (response 68.8%) in the SHIP-0 baseline examinations (1997–2001). Follow-ups took place from 2002–2006 (SHIP-1, N = 3,300) and from 2008–2012 (SHIP-2, N = 2,333). SHIP-Trend was a new cohort established in 2008. Out of 8,826 eligible subjects, 4,420 (2,275 women) participated (response 50.1%). The two cohorts did not overlap, since one selection criterion for SHIP-Trend was no participation in SHIP-0, the baseline examination of the cohort followed up in SHIP-2. In total, 3,371 out of 6,753 SHIP-2 and SHIP-Trend participants took part in the MRI examination. High-resolution magnetic resonance imaging (MRI) data for this project were available from n = 1,182 SHIP-2 and from n = 2,186 SHIP-Trend-0 participants. For further details of the procedures involved in the selection of participants and the amount of data gathered, please refer to10,23. Table 3 provides the descriptive data for the entire sample. The age range was 21–90 years.

Table 3 Demographic data for the two cohorts.

The study protocol was approved by the Ethics Committee of the University Medicine of Greifswald and written informed consent was obtained from each subject. In addition, all methods were performed in accordance with the relevant guidelines and regulations. All brain images were obtained on the same 1.5 Tesla Siemens MRI scanner (Magnetom Avanto, Siemens Medical Systems, Erlangen, Germany) without software updates during the evaluation period. More specifically, a T1-weighted magnetization prepared rapid acquisition gradient echo (MPRAGE) sequence was used with the following parameters: 176 slices, matrix = 256 × 176 pixels, voxel size = 1.0 mm isotropic, slice thickness = 1.0 mm, repetition time = 1900 ms, echo time = 3.37 ms, flip angle 15°.

Quality control and exclusion of pathologies

All MRI head scans were visually inspected with regard to image artifacts and clinical abnormalities. Any brain images indicating stroke, multiple sclerosis, epilepsy, Parkinson’s disease, dementia, cerebral tumor, intracranial cyst or hydrocephalus were excluded, leaving 1,081 (SHIP-2) and 2,046 (SHIP-Trend-0) images. Furthermore, subjects with recorded intake of anxiolytics or opioids, as well as those with PHQ-9 (nine-item Patient Health Questionnaire) depression scores24 greater than 14, were excluded, leaving 1,037 (SHIP-2) and 1,984 (SHIP-Trend-0) images. Finally, all subjects with incomplete datasets for possible confounds (i.e., age, years of education, nicotine intake, alcohol consumption, body mass index) were excluded. The final sample contained 2,838 subjects, with 967 subjects from SHIP-2 and 1,871 subjects from SHIP-Trend-0. We defined “sex” as the item “man” or “woman” as reported by the participant in a verbal questionnaire.
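The exclusion cascade described above can be sketched as a simple per-subject filter. This is only an illustration: the record layout, the field names (`findings`, `anxiolytics_or_opioids`, `phq9`, `confounds`) and the confound keys are hypothetical and do not reflect the actual SHIP data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional

# Findings that lead to exclusion, as listed in the text
EXCLUDED_FINDINGS = frozenset({
    "stroke", "multiple sclerosis", "epilepsy", "Parkinson's disease",
    "dementia", "cerebral tumor", "intracranial cyst", "hydrocephalus",
})
# Hypothetical keys for the confounds that must be complete
REQUIRED_CONFOUNDS = ("age", "education_years", "nicotine", "alcohol", "bmi")
PHQ9_MAX = 14  # scores greater than 14 are excluded

@dataclass
class Subject:
    findings: FrozenSet[str]          # findings from visual inspection
    anxiolytics_or_opioids: bool      # recorded medication intake
    phq9: int                         # PHQ-9 depression score
    confounds: Dict[str, Optional[float]] = field(default_factory=dict)

def is_included(s: Subject) -> bool:
    """Apply the three exclusion steps in the order given in the text."""
    if s.findings & EXCLUDED_FINDINGS:
        return False
    if s.anxiolytics_or_opioids or s.phq9 > PHQ9_MAX:
        return False
    # Every confound must be present and non-missing
    return all(s.confounds.get(name) is not None for name in REQUIRED_CONFOUNDS)

complete = {name: 1.0 for name in REQUIRED_CONFOUNDS}
subjects = [
    Subject(frozenset(), False, 3, dict(complete)),            # kept
    Subject(frozenset({"stroke"}), False, 3, dict(complete)),  # pathology
    Subject(frozenset(), False, 15, dict(complete)),           # PHQ-9 > 14
    Subject(frozenset(), False, 3, {"age": 50.0}),             # missing confounds
]
included = [s for s in subjects if is_included(s)]
print(len(included))
```

Applying the steps in sequence, as here, also makes it straightforward to count how many images remain after each stage.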

Data preprocessing

T1-weighted images were preprocessed in MATLAB (The MathWorks, Natick, MA) using Statistical Parametric Mapping, version 12 (SPM12; Wellcome Department of Cognitive Neurology, University of London) and the Computational Anatomy Toolbox (CAT) for SPM (CAT12; Christian Gaser; Department of Psychiatry, University of Jena) with CAT12 default parameters, as described elsewhere25. Briefly, images were corrected for magnetic field inhomogeneities, spatially normalized using the DARTEL algorithm26, and segmented into GM, white matter (WM), and cerebrospinal fluid (CSF). The segmentation process was further enhanced by accounting for partial volume effects27 and by using a hidden Markov Random Field (MRF) model28. Finally, the resulting GM segments were smoothed using a Gaussian kernel of 8 mm full width at half maximum (FWHM). In addition, all scans underwent an automated quality check, yielding an index of quality rating (IQR), which was later used as an additional covariate in the statistical model. Total brain volume (TBV) was calculated as the sum of GM, WM, and CSF (also to be used later as a statistical covariate).
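Two of the quantities in this paragraph are simple to make concrete. The sketch below converts the 8 mm FWHM of the smoothing kernel into the standard deviation of the corresponding Gaussian, using the standard relation σ = FWHM / (2√(2 ln 2)), and sums tissue volumes into TBV as defined above. The function names and the example tissue volumes are hypothetical; the actual computations were carried out inside SPM12/CAT12.

```python
import math

def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Standard deviation (mm) of a Gaussian with the given FWHM (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def total_brain_volume(gm_ml: float, wm_ml: float, csf_ml: float) -> float:
    """TBV as defined in the text: the sum of GM, WM and CSF volumes."""
    return gm_ml + wm_ml + csf_ml

sigma_mm = fwhm_to_sigma(8.0)   # roughly 3.4 mm for the 8 mm kernel

# Hypothetical tissue volumes in millilitres, for illustration only
tbv_ml = total_brain_volume(gm_ml=600.0, wm_ml=500.0, csf_ml=300.0)
print(round(sigma_mm, 2), tbv_ml)
```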

Statistical analyses

We first investigated whether there were significant GMV differences between men and women in cohort 1 (SHIP-2; n = 967) and cohort 2 (SHIP-Trend-0; n = 1,871), separately. Then, we tested for significant differences between those cohort-specific effects (SPM, two-sample t-test). Since the absence of a significant difference between two cohorts does not in itself show that the two groups are equivalent, we used a modified strategy, as suggested by Lakens29, for our voxel-based statistical approach. The strongest trend for a difference between cohorts was observed for the contrast SHIP-2 minus SHIP-Trend-0 in the right BA 47, with a t-value of 3.80. When calculating an effect size for this t-value (considering group sizes; two-sample t-test for independent means of the two groups), we found an effect size (G*Power, version 3.1) of Cohen’s d = 0.29, which is not relevant according to Lakens29. After ensuring that there were no relevant differences in GMV sex effects between the cohorts, we finally evaluated both cohorts together (combined; n = 2,838). For this purpose, a full factorial model as implemented in SPM12 was applied, while removing the variance associated with the following variables: TBV, IQR, age, years of education, nicotine intake, alcohol consumption, and body mass index (BMI). Alpha was set at p < 0.05, and corrections for multiple comparisons were applied using the family-wise error (FWE) rate. Clusters smaller than 10 voxels were not considered.
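The final step, discarding clusters smaller than 10 voxels, can be illustrated with a minimal sketch that groups suprathreshold voxels into connected clusters and keeps only those meeting the extent threshold. The 18-connectivity criterion (voxels sharing a face or an edge) is an assumption made for this example, and the function names are hypothetical; the analysis itself relied on the SPM12 implementation.

```python
from itertools import product
from typing import Iterable, Iterator, List, Set, Tuple

Voxel = Tuple[int, int, int]

def neighbors_18(v: Voxel) -> Iterator[Voxel]:
    """Yield the 18 neighbors of a voxel that share a face or an edge."""
    x, y, z = v
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        # 1 nonzero offset = face neighbor, 2 = edge neighbor; corners are skipped
        if 1 <= abs(dx) + abs(dy) + abs(dz) <= 2:
            yield (x + dx, y + dy, z + dz)

def find_clusters(voxels: Iterable[Voxel]) -> List[Set[Voxel]]:
    """Group suprathreshold voxel indices into connected clusters."""
    remaining = set(voxels)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, stack = {seed}, [seed]
        while stack:  # depth-first flood fill from the seed
            for nb in neighbors_18(stack.pop()):
                if nb in remaining:
                    remaining.remove(nb)
                    cluster.add(nb)
                    stack.append(nb)
        clusters.append(cluster)
    return clusters

def apply_extent_threshold(voxels: Iterable[Voxel], k_min: int = 10) -> List[Set[Voxel]]:
    """Keep only clusters containing at least k_min voxels."""
    return [c for c in find_clusters(voxels) if len(c) >= k_min]

# Toy example: one 12-voxel line and one isolated 3-voxel group
big = [(x, 0, 0) for x in range(12)]
small = [(20, 20, 20), (21, 20, 20), (21, 21, 20)]
kept = apply_extent_threshold(big + small)
print([len(c) for c in kept])
```

In the toy example only the 12-voxel cluster survives, since the 3-voxel group falls below the extent threshold.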

Anatomical labeling

The anatomical differentiation of significant effects was predominantly performed with ANATOMY, version 2.2b30. For regions that have not yet been classified cytoarchitecturally using ANATOMY, the most appropriate differentiations suggested by other atlases were applied. That is, for BA 46 we used Sallet et al.31, for the insula we used Neuromorphometrics (Neuromorphometrics, Inc.) as provided with the SPM12 package, and for the cerebellum, putamen, temporal pole and fusiform gyrus, we used the AAL atlas32.