Main

The simple framework of growth charts to quantify age-related change was first published in the late eighteenth century1 and remains a cornerstone of paediatric healthcare—an enduring example of the utility of standardized norms to benchmark individual trajectories of development. However, growth charts are currently available only for a small set of anthropometric variables, such as height, weight and head circumference, and only for the first decade of life. There are no analogous charts available for quantification of age-related changes in the human brain, although it is known to go through a prolonged and complex maturational program from pregnancy to the third decade4, followed by progressive senescence from approximately the sixth decade5. The lack of tools for standardized assessment of brain development and ageing is particularly relevant to research studies of psychiatric disorders, which are increasingly recognized as a consequence of atypical brain development6, and neurodegenerative diseases that cause pathological brain changes in the context of normative senescence7. Preterm birth and neurogenetic disorders are also associated with marked abnormalities of brain structure8,9 that persist into adult life9,10 and are associated with learning disabilities and mental health disorders. Mental illness and dementia collectively represent the single biggest global health burden11, highlighting the urgent need for normative brain charts as an anchor point for standardized quantification of brain structure over the lifespan12.

Such standards for human brain measurement have not yet materialized from decades of neuroimaging research, probably owing to the challenges of integrating MRI data across multiple, methodologically diverse studies targeting distinct developmental epochs and clinical conditions13. For example, the perinatal period is rarely incorporated in analysis of age-related brain changes, despite evidence that early biophysical and molecular processes powerfully influence life-long neurodevelopmental trajectories14,15 and vulnerability to psychiatric disorders3. Primary case–control studies are usually focused on a single disorder despite evidence of trans-diagnostically shared risk factors and pathogenic mechanisms, especially in psychiatry16,17. Harmonization of MRI data across primary studies to address these and other deficiencies in the extant literature is challenged by methodological and technical heterogeneity. Compared with relatively simple anthropometric measurements such as height or weight, brain morphometrics are known to be highly sensitive to variation in scanner platforms and sequences, data quality control, pre-processing and statistical analysis18, thus severely limiting the generalizability of trajectories estimated from any individual study19. Collaborative initiatives spurring collection of large-scale datasets20,21, recent advances in neuroimaging data processing22,23 and proven statistical frameworks for modelling biological growth curves2,24,25 provide the building blocks for a more comprehensive and generalizable approach to age-normed quantification of MRI phenotypes over the entire lifespan (see Supplementary Information 1 for details and consideration of previous work focused on the related but distinct objective of inferring brain age from MRI data). 
Here, we demonstrate that these convergent advances now enable the generation of brain charts that (1) robustly define normative processes of sex-stratified, age-related change in multiple MRI-derived phenotypes; (2) identify previously unreported brain growth milestones; (3) increase sensitivity to detect genetic and early life environmental effects on brain structure; and (4) provide standardized effect sizes to quantify neuroanatomical atypicality of brain scans collected across multiple clinical disorders. We do not claim to have yet reached the ultimate goal of quantitatively precise diagnosis of MRI scans from individual patients in clinical practice. However, the present work proves the principle that building normative charts to benchmark individual differences in brain structure is already achievable at global scale and over the entire life-course; and provides a suite of open science resources for the neuroimaging research community to accelerate further progress in the direction of standardized quantitative assessment of MRI data.

Mapping normative brain growth

We created brain charts for the human lifespan using generalized additive models for location, scale and shape2,24 (GAMLSS), a robust and flexible framework for modelling non-linear growth trajectories recommended by the World Health Organization24. GAMLSS and related statistical frameworks have previously been applied to developmental modelling of brain structural and functional MRI phenotypes in open datasets19,26,27,28,29,30,31. Our approach to GAMLSS modelling leveraged the greater scale of data available to optimize model selection empirically, to estimate non-linear age-related trends (in median and variance) stratified by sex over the entire lifespan, and to account for site- or study-specific ‘batch effects’ on MRI phenotypes in terms of multiple random effect parameters. Specifically, GAMLSS models were fitted to structural MRI data from control subjects for the four main tissue volumes of the cerebrum (total cortical grey matter volume (GMV), total white matter volume (WMV), total subcortical grey matter volume (sGMV) and total ventricular cerebrospinal fluid volume (ventricles or CSF)). Supplementary Tables 1.1–1.8 present details on acquisition, processing and demographics of the dataset; see Methods, ‘Model generation and specification’ and Supplementary Information 1 for further details regarding GAMLSS model specification and estimation; image quality control, which used a combination of expert visual curation and automated metrics of image quality (Supplementary Information 2); model stability and robustness (Supplementary Information 3, 4); phenotypic validation against non-imaging metrics (Supplementary Information 3 and 5.2); inter-study harmonization (Supplementary Information 5); and assessment of cohort effects (Supplementary Information 6). See Supplementary Information 19 for details on all primary studies contributing to the reference dataset, including multiple publicly available open MRI datasets32,33,34,35,36,37,38,39,40,41,42.

Lifespan curves (Fig. 1, Supplementary Table 2.1) showed an initial strong increase in GMV from mid-gestation onwards, peaking at 5.9 years (95% bootstrap confidence interval (CI) 5.8–6.1), followed by a near-linear decrease. This peak was observed 2 to 3 years later than previous reports relying on smaller, more age-restricted samples43,44. WMV also increased rapidly from mid-gestation to early childhood, peaking at 28.7 years (95% bootstrap CI 28.1–29.2), with subsequent accelerated decline in WMV after 50 years. Subcortical GMV showed an intermediate growth pattern compared with GMV and WMV, peaking in adolescence at 14.4 years (95% bootstrap CI 14.0–14.7). Both the WMV and sGMV peaks are consistent with previous neuroimaging and postmortem reports45,46. By contrast, CSF showed an increase until age 2, followed by a plateau until age 30, and then a slow linear increase that became exponential in the sixth decade of life. Age-related variance (Fig. 1d), explicitly estimated by GAMLSS, formally quantifies developmental changes in between-subject variability. There was an early developmental increase in GMV variability that peaked at 4 years, whereas subcortical volume variability peaked in late adolescence. WMV variability peaked during the fourth decade of life, and CSF was maximally variable at the end of the human lifespan.

Fig. 1: Human brain charts.

a, MRI data were aggregated from over 100 primary studies comprising 123,984 scans that collectively spanned the age range from mid-gestation to 100 postnatal years. Box–violin plots show the age distribution for each study coloured by its relative sample size (log-scaled using the natural logarithm for visualization purposes). b, Non-centiled, ‘raw’ bilateral cerebrum tissue volumes for grey matter, white matter, subcortical grey matter and ventricles are plotted for each cross-sectional control scan as a function of age (log-scaled); points are coloured by sex. c, Normative brain-volume trajectories were estimated using GAMLSS, accounting for site- and study-specific batch effects, and stratified by sex (female, red; male, blue). All four cerebrum tissue volumes demonstrated distinct, non-linear trajectories of their medians (with 2.5% and 97.5% centiles denoted as dotted lines) as a function of age over the lifespan. Demographics for each cross-sectional sample of healthy controls included in the reference dataset for normative GAMLSS modelling of each MRI phenotype are detailed in Supplementary Tables 1.2–1.8. d, Trajectories of median between-subject variability and 95% confidence intervals for four cerebrum tissue volumes were estimated by sex-stratified bootstrapping (see Supplementary Information 3 for details). e, Rates of volumetric change across the lifespan for each tissue volume, stratified by sex, were estimated by the first derivatives of the median volumetric trajectories. For solid (parenchymal) tissue volumes, the horizontal line (y = 0) indicates the point at which each tissue stops growing and starts shrinking and the solid vertical line indicates the age of maximum growth of each tissue. See Supplementary Table 2.1 for all neurodevelopmental milestones and their confidence intervals. Note that y axes in b–e are scaled in units of 10,000 mm³ (10 ml).

Extended neuroimaging phenotypes

To extend the scope of brain charts beyond the four cerebrum tissue volumes, we generalized the same GAMLSS modelling approach to estimate normative trajectories for additional MRI phenotypes including other morphometric properties at a global scale (mean cortical thickness and total surface area) and regional volume at each of 34 cortical areas47 (Fig. 2, Supplementary Information 7–9, Supplementary Tables 1, 2). We found, as expected, that total surface area closely tracked the development of total cerebrum volume (TCV) across the lifespan (Fig. 2a), with both metrics peaking at approximately 11–12 years of age (surface area peak at 10.97 years (95% bootstrap CI 10.42–11.51); TCV peak at 12.5 years (95% bootstrap CI 12.14–12.89)). By contrast, cortical thickness peaked distinctively early at 1.7 years (95% bootstrap CI 1.3–2.1), which reconciles previous observations that cortical thickness increases during the perinatal period48 and declines during later development49 (Supplementary Information 7).

Fig. 2: Extended global and regional cortical morphometric phenotypes.

a, Trajectories for total cerebrum volume (TCV), total surface area and mean cortical thickness. For each global cortical MRI phenotype, the following sex-stratified results are shown as a function of age over the lifespan. From top to bottom: raw, non-centiled data; population trajectories of the median (with 2.5% and 97.5% centiles (dotted lines)); between-subject variance (with 95% confidence intervals); and rate of growth (the first derivatives of the median trajectory and 95% confidence intervals). All trajectories are plotted as a function of log-scaled age (x axis) and y axes are scaled in units of the corresponding MRI metrics (10,000 mm³ for TCV, 10,000 mm² for surface area and mm for cortical thickness). b, Regional variability of cortical volume trajectories for 34 bilateral brain regions, as defined by the Desikan–Killiany parcellation47, averaged across sex (see Supplementary Information 7, 8 for details). Since models were generated from bilateral averages of each cortical region, the cortical maps are plotted on the left hemisphere purely for visualization purposes. Top, a cortical map of age at peak regional volume (range 2–10 years). Middle, a cortical map of age at peak regional volume relative to age at peak GMV (5.9 years), highlighting regions that peak earlier (blue) or later (red) than GMV. Bottom, illustrative trajectories for the earliest peaking region (superior parietal lobe, blue line) and the latest peaking region (insula, red line), showing the range of regional variability relative to the GMV trajectory (grey line). Regional volume peaks are denoted as dotted vertical lines on either side of the global peak, denoted as a dashed vertical line, in the bottom panel. The left y axis on the bottom panel refers to the earliest peak (blue line); the right y axis refers to the latest peak (red line).

We also found evidence for regional variability in volumetric neurodevelopmental trajectories. Compared with peak GMV at 5.9 years, the age of peak regional grey matter volume varied considerably—from approximately 2 to 10 years—across 34 cortical areas. Primary sensory regions reached peak volume earliest and showed faster post-peak declines, whereas fronto-temporal association cortical areas peaked later and showed slower post-peak declines (Fig. 2b, Supplementary Information 8.2). Notably, this spatial pattern recapitulated a gradient from sensory-to-association cortex that has been previously associated with multiple aspects of brain structure and function50.

Developmental milestones

Neuroimaging milestones are defined by inflection points of the tissue-specific volumetric trajectories (Fig. 3, Methods, ‘Defining developmental milestones’). Among the total tissue volumes, only GMV peaked before the typical age at onset of puberty51, with sGMV peaking mid-puberty and WMV peaking in young adulthood (Fig. 3). The rate of growth (velocity) peaked in infancy and early childhood for GMV (5.08 months (95% bootstrap CI 4.85–5.22)), sGMV (5.65 months (95% bootstrap CI 5.75–5.83)) and WMV (2.4 years (95% bootstrap CI 2.2–2.6)). TCV velocity peaked between the maximum velocity for GMV and WMV at approximately 7 months. Two major milestones of TCV and sGMV (peak velocity and size) (Fig. 3) coincided with the early neonatal and adolescent peaks of height and weight velocity52,53. The velocity of mean cortical thickness peaked even earlier, in the prenatal period at −0.38 years (95% bootstrap CI −0.4 to −0.34) (relative to birth), corresponding approximately to mid-gestation. This early peak in cortical thickness velocity has not been reported previously—to our knowledge—in part owing to challenges in acquiring adequate and consistent signal from typical MRI sequences in the perinatal period54. Similarly, normative trajectories revealed an early period of GMV:WMV differentiation, beginning in the first month after birth with the switch from WMV to GMV as the proportionally dominant tissue compartment, and ending when the absolute difference of GMV and WMV peaked around 3 years (Supplementary Information 9). This epoch of GMV:WMV differentiation, which may reflect underlying changes in myelination and synaptic proliferation4,55,56,57,58, has not been demarcated in previous studies45,59. 
It was probably identified in this study owing to the substantial amount of early developmental MRI data available for analysis in the aggregated dataset (in total across all primary studies, N = 2,571 and N = 1,484 participants aged less than 2 years were available for analysis of cerebrum tissue volumes and extended global MRI phenotypes, respectively). The period of GMV:WMV differentiation encompasses dynamic changes in brain metabolites60 (0–3 months), resting metabolic rate61 (RMR) (minimum = 7 months, maximum = 4.2 years), the typical period of acquisition of motor capabilities and other early paediatric milestones62, and the most rapid change in TCV (Fig. 3).
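The milestone definitions used above (peak size as the maximum of the median trajectory, peak velocity as the maximum of its first derivative) can be illustrated with a short numerical sketch. This is not the authors' implementation: the function name `milestones` and the toy curve are hypothetical, and the trajectory is assumed to be available as a densely sampled array of median values rather than as a fitted GAMLSS object.

```python
import numpy as np

def milestones(age_years, median_volume):
    """Return (age at peak size, age at peak growth rate) for a sampled median trajectory."""
    age = np.asarray(age_years, dtype=float)
    vol = np.asarray(median_volume, dtype=float)
    # Numerical first derivative of the median trajectory with respect to age.
    velocity = np.gradient(vol, age)
    return float(age[np.argmax(vol)]), float(age[np.argmax(velocity)])

# Hypothetical toy trajectory (NOT a fitted brain chart): rises, peaks, then declines.
age = np.linspace(0.0, 20.0, 2001)
volume = age**2 * np.exp(-age / 3.0)

peak_age, peak_velocity_age = milestones(age, volume)
print(f"peak size at {peak_age:.2f} years, peak velocity at {peak_velocity_age:.2f} years")
```

In practice the resolution of the age grid limits the precision of both milestones, and confidence intervals such as those reported above would require repeating the calculation over bootstrap replicates of the fitted trajectory.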

Fig. 3: Neurodevelopmental milestones.

Top, a graphical summary of the normative trajectories of the median (50th centile) for each global MRI phenotype, and key developmental milestones, as a function of age (log-scaled). Circles depict the peak rate of growth milestones for each phenotype (defined by the maxima of the first derivatives of the median trajectories (Fig. 1e)). Triangles depict the peak volume of each phenotype (defined by the maxima of the median trajectories); the definition of GMV:WMV differentiation is detailed in Supplementary Information 9.1. Bottom, a graphical summary of additional MRI and non-MRI developmental stages and milestones. From top to bottom: blue shaded boxes denote the age range of incidence for each of the major clinical disorders represented in the MRI dataset; black boxes denote the age at which these conditions are generally diagnosed as derived from literature73 (Methods); brown lines represent the normative intervals for developmental milestones derived from non-MRI data, based on previous literature and averaged across males and females (Methods); grey bars depict age ranges for existing (World Health Organization (WHO) and Centers for Disease Control and Prevention (CDC)) growth charts of anthropometric and ultrasonographic variables24. Across both panels, light grey vertical lines delimit lifespan epochs (labelled above the top panel) previously defined by neurobiological criteria63. Tanner refers to the Tanner scale of physical development. AD, Alzheimer’s disease; ADHD, attention deficit hyperactivity disorder; ASD, autism spectrum disorder (including high-risk individuals with confirmed diagnosis at a later age); ANX, anxiety or phobic disorders; BD, bipolar disorder; MDD, major depressive disorder; RMR, resting metabolic rate; SCZ, schizophrenia.

Individualized centile scores

We computed individualized centile scores that benchmarked each individual scan in the context of normative age-related trends (Methods, ‘Centile scores and case–control differences’ and Supplementary Information 16 for further details). This approach is conceptually similar to quantile rank mapping, as previously reported26,28,29, where the typicality or atypicality of each phenotype in each scan is quantified by its score on the distribution of phenotypic parameters in the normative or reference sample of scans, with more atypical phenotypes having more extreme centile (or quantile) scores. The clinical diversity of the aggregated dataset enabled us to comprehensively investigate case–control differences in individually specific centile scores across a range of conditions. Relative to the control group (CN), there were highly significant differences in centile scores across large (N > 500) groups of cases diagnosed with multiple disorders (Fig. 4a, Supplementary Information  10), with effect sizes ranging from medium (0.2 < Cohen’s d < 0.8) to large (Cohen’s d > 0.8) (see Supplementary Tables 3, 4 for all false discovery rate (FDR)-corrected P values and effect sizes). Clinical case–control differences in cortical thickness and surface area generally followed the same trend as volume differences (Supplementary Information 10). Alzheimer’s disease showed the greatest overall difference, with a maximum difference localized to grey matter volume in biologically female patients (median centile score = 14%, 36 percentage points difference from CN median, corresponding to Cohen’s d = 0.88; Fig. 4a). In addition, we generated a cumulative deviation metric, the centile Mahalanobis distance (CMD), to summarize a comparative assessment of brain morphology across all global MRI phenotypes relative to the CN group (Fig. 4b, Supplementary Information 1.6). 
Notably, schizophrenia ranked third overall behind Alzheimer’s disease and mild cognitive impairment (MCI) on the basis of CMD (Fig. 4c). Assessment across diagnostic groups, based on profiles of the multiple centile scores for each MRI phenotype and for CMD, highlighted shared and distinct patterns across clinical conditions (Supplementary Information 10, 11). However, when examining cross-disorder similarity of multivariate centile scores, hierarchical clustering yielded three clusters broadly comprising neurodegenerative, mood and anxiety, and neurodevelopmental disorders (Supplementary Information 11).
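As a rough illustration of centile scoring and of the standardized case–control effect sizes mentioned above, the following sketch scores a phenotype against an age- and sex-specific reference distribution and computes Cohen's d with a pooled standard deviation. It assumes a normal reference distribution purely for simplicity, whereas the charts described here are built with GAMLSS and model higher moments; the function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def centile_score(value, ref_median, ref_sd):
    """Centile (0 to 1) of `value` under an ASSUMED normal reference distribution."""
    return NormalDist(mu=ref_median, sigma=ref_sd).cdf(value)

def cohens_d(cases, controls):
    """Cohen's d using the pooled sample standard deviation of the two groups."""
    n1, n2 = len(cases), len(controls)
    pooled_var = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    return (mean(cases) - mean(controls)) / sqrt(pooled_var)

# Hypothetical example: a scan one reference SD below the age- and sex-matched median.
score = centile_score(value=90.0, ref_median=100.0, ref_sd=10.0)
print(f"centile score: {score:.3f}")

# Hypothetical centile scores for a small case group and control group.
d = cohens_d([0.20, 0.30, 0.40], [0.45, 0.55, 0.65])
print(f"Cohen's d: {d:.2f}")
```

A scan at the reference median scores 0.5 under this scheme, and more atypical scans move towards 0 or 1.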

Fig. 4: Case–control differences and heritability of centile scores.

a, Centile score distributions for each diagnostic category of clinical cases relative to the control group median (depicted as a horizontal black line). The median deviation of centile scores in each diagnostic category is overlaid as a lollipop plot (white lines with circles corresponding to the median centile score for each group of cases). Pairwise tests for significance were based on Monte Carlo resampling (10,000 permutations) and P values were adjusted for multiple comparisons using the Benjamini–Hochberg false discovery rate (FDR) correction across all possible case–control differences. Only significant differences from the control group (CN) median (with corrected P < 0.001) are highlighted with an asterisk. For a complete overview of all pairwise comparisons, see Supplementary Information 10, Supplementary Table 3. Groups are ordered by their multivariate distance from the CN group (see c and Supplementary Information 10.3). b, The CMD is a summary metric that quantifies the aggregate atypicality of an individual scan in terms of all global MRI phenotypes. The schematic shows segmentation of four cerebrum tissue volumes, followed by estimation of univariate centile scores, leading to the orthogonal projection of a single participant’s scan (Subx) onto the four respective principal components of the CN (coloured axes and arrows). The CMD for Subx is then the sum of its distances from the CN group mean on all four dimensions of the multivariate space. c, Probability density plots of CMD across disorders. Vertical black line depicts the median CMD of the control group. Asterisks indicate an FDR-corrected significant difference from the CN group (P < 0.001). 
d, Heritability of raw volumetric phenotypes and their centile scores across two twin studies (Adolescent Brain Cognitive Development (ABCD) and Human Connectome Project (HCP); Supplementary Information 19); see Supplementary Information 13 for a full overview of statistics for each individual feature in each dataset. Data are mean ± s.e.m. (although some confidence intervals are too narrow to be seen). MCI, mild cognitive impairment. See Fig. 3 for other diagnostic abbreviations. FDR-corrected significance: *P < 0.05, **P < 0.01, ***P < 0.001.
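The idea behind the CMD in panel b can be sketched as a Mahalanobis distance between one scan's vector of centile scores and the control-group distribution of such vectors. The code below uses the textbook Mahalanobis formula; the exact construction used for the published CMD is described in Supplementary Information 1.6 and may differ in detail. The function name and the simulated control data are hypothetical.

```python
import numpy as np

def centile_mahalanobis(centiles, control_centiles):
    """Mahalanobis distance of centile-score vectors from the control-group mean.

    centiles: shape (n_phenotypes,) or (n_scans, n_phenotypes)
    control_centiles: shape (n_controls, n_phenotypes)
    """
    controls = np.asarray(control_centiles, dtype=float)
    x = np.atleast_2d(np.asarray(centiles, dtype=float))
    mean = controls.mean(axis=0)
    cov = np.atleast_2d(np.cov(controls, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    diff = x - mean
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(d2, 0.0))

# Hypothetical control centiles for four global phenotypes (e.g. GMV, WMV, sGMV, CSF).
rng = np.random.default_rng(0)
controls = rng.uniform(0.0, 1.0, size=(500, 4))

typical_scan = [0.50, 0.50, 0.50, 0.50]
atypical_scan = [0.05, 0.10, 0.08, 0.95]
print(centile_mahalanobis([typical_scan, atypical_scan], controls))
```

Because the distance accounts for covariance between phenotypes, a scan that is moderately atypical on several correlated measures is not counted several times over.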

Across all major epochs of the lifespan63, the CMD was consistently greater in cases relative to controls, irrespective of diagnostic category. The largest case–control differences across epochs occurred in late adulthood when risk for dementia increases and in adolescence, which is well-recognized as a period of increased incidence of mental health disorders (Supplementary Information 10.3). In five primary studies covering the lifespan, average centile scores across global tissues were related to two metrics of premature birth (gestational age at birth: t = 13.164, P < 2 × 10⁻¹⁶; birth weight: t = 36.395, P < 2 × 10⁻¹⁶; Supplementary Information 12), such that greater gestational age and birth weight were associated with higher average centile scores. Centile scores also showed increased twin-based heritability in two independent studies (total N = 913 twin pairs) compared with non-centiled phenotypes (average increase of 11.8 percentage points in narrow sense heritability (h²) across phenotypes; Fig. 4d, Supplementary Information 13). In summary, centile normalization of brain metrics reproducibly detected case–control differences and genetic effects on brain structure, as well as long-term sequelae of adverse birth outcomes even in the adult brain10.

Longitudinal centile changes

Owing to the relative paucity of longitudinal imaging data (about 10% of the reference dataset), normative models were estimated from cross-sectional data collected at a single time point. However, the generalizability of cross-sectional models to longitudinal assessment is important for future research. Within-subject variability of centile scores derived from longitudinally repeated scans, measured with the interquartile range (IQR) (Methods, ‘Longitudinal stability’, Supplementary Information 1.7), was low across both clinical and CN groups (all median IQR < 0.05 centile points), indicating that centile scoring of brain structure was generally stable over time, although there was also some evidence of between-study and cross-disorder differences in within-subject variability (Supplementary Information 14). Notably, individuals who changed diagnostic categories—for example, those who progressed from mild cognitive impairment to Alzheimer’s disease over the course of repeated scanning—showed small but significant increases in within-subject variability of centile scores (Supplementary Information 14, Supplementary Tables 5, 6). Within-subject variability was also slightly higher in samples from younger individuals (Supplementary Information 14), which could reflect increased noise due to the technical or data quality challenges associated with scanning younger individuals, but is also consistent with the evidence of increased variability in earlier development observed across other anthropometric traits64.
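The within-subject variability measure described above can be sketched as the interquartile range of each participant's centile scores across repeated scans, summarized by the median across participants. The data layout (a mapping from participant identifier to a list of centile scores) and the function name are assumptions made for illustration only.

```python
import numpy as np

def within_subject_iqr(centiles_by_subject):
    """Map each participant ID to the IQR of their centile scores across repeated scans."""
    result = {}
    for subject_id, scores in centiles_by_subject.items():
        q75, q25 = np.percentile(np.asarray(scores, dtype=float), [75, 25])
        result[subject_id] = float(q75 - q25)
    return result

# Hypothetical longitudinal centile scores (0 to 1 scale) for three participants.
longitudinal = {
    "sub-01": [0.40, 0.42, 0.44, 0.46],
    "sub-02": [0.71, 0.70, 0.73],
    "sub-03": [0.15, 0.22, 0.35, 0.41],  # larger drift across visits
}

iqr_by_subject = within_subject_iqr(longitudinal)
median_iqr = float(np.median(list(iqr_by_subject.values())))
print(iqr_by_subject, median_iqr)
```

A participant whose centile drifts across visits, as in the third hypothetical record, produces a larger IQR than one whose scores stay close to a single centile.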

Centile scoring of new MRI data

A key challenge for brain charts is the accurate centile scoring of out-of-sample MRI data, not represented in the reference dataset used to estimate normative trajectories. We therefore carefully evaluated the reliability and validity of brain charts for centile scoring of such ‘new’ scans. For each new MRI study, we used maximum likelihood to estimate study-specific statistical offsets from the age-appropriate epoch of the normative trajectory; we then estimated centile scores for each individual in the new study benchmarked against the offset trajectory (Fig. 5, Methods, ‘Data-sharing and out-of-sample estimation’, Supplementary Information 1.8). Extensive jack-knife and leave-one-study-out analyses indicated that a study size of N > 100 scans was sufficient for stable and unbiased estimation of out-of-sample centile scores (Supplementary Information 4). This study size limit is in line with the size of many contemporary brain MRI research studies. However, these results do not immediately support the use of brain charts to generate centile scores from smaller-scale research studies, or from an individual patient’s scan in clinical practice—this remains a goal for future work. Out-of-sample centile scores proved highly reliable in multiple test–retest datasets and were robust to variations in image processing pipelines (Supplementary Information 4).
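The out-of-sample procedure can be illustrated with a deliberately simplified model. Assume that each new scan i has a reference median mu_i and reference spread sigma_i read off the normative trajectory at that scan's age and sex, and that the new study differs from the reference only by an additive offset delta on the median and a multiplicative factor s on the spread, with normally distributed residuals. Under those assumptions the maximum likelihood estimates have a closed form. The published approach instead estimates study-specific random effects within GAMLSS, including a skewness parameter, so the sketch below is a stand-in under stated assumptions, with hypothetical function names.

```python
import numpy as np
from statistics import NormalDist

def fit_study_offsets(values, ref_mu, ref_sigma):
    """Closed-form MLE of (delta, s) for y_i ~ Normal(mu_i + delta, (s * sigma_i)^2)."""
    y = np.asarray(values, dtype=float)
    mu = np.asarray(ref_mu, dtype=float)
    sigma = np.asarray(ref_sigma, dtype=float)
    resid = y - mu
    weights = 1.0 / sigma**2
    delta = np.sum(weights * resid) / np.sum(weights)    # precision-weighted mean offset
    s = np.sqrt(np.mean(((resid - delta) / sigma) ** 2))  # spread relative to reference
    return float(delta), float(s)

def out_of_sample_centiles(values, ref_mu, ref_sigma):
    """Centile scores for a new study, benchmarked against the offset trajectory."""
    delta, s = fit_study_offsets(values, ref_mu, ref_sigma)
    y = np.asarray(values, dtype=float)
    z = (y - np.asarray(ref_mu, dtype=float) - delta) / (s * np.asarray(ref_sigma, dtype=float))
    standard_normal = NormalDist()
    return np.array([standard_normal.cdf(float(v)) for v in z])

# Hypothetical new study: values sit about 5 units above the reference medians.
ref_mu = [100.0, 110.0, 120.0, 130.0]
ref_sigma = [2.0, 2.0, 2.0, 2.0]
new_values = [103.0, 117.0, 123.0, 137.0]

print(fit_study_offsets(new_values, ref_mu, ref_sigma))
print(out_of_sample_centiles(new_values, ref_mu, ref_sigma))
```

The four-scan example is only for readability; as stated above, the reported analyses indicate that more than 100 scans per study were needed for stable and unbiased estimates.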

Fig. 5: Schematic overview of brain charts, highlighting methods for out-of-sample centile scoring.

Top, brain phenotypes were measured in a reference dataset of MRI scans. GAMLSS modelling was used to estimate the relationship between (global) MRI phenotypes and age, stratified by sex, and controlling for technical and other sources of variation between scanning sites and primary studies. Bottom, the normative trajectory of the median and confidence interval for each phenotype was plotted as a population reference curve. Out-of-sample data from a new MRI study were aligned to the corresponding epoch of the normative trajectory, using maximum likelihood to estimate the study-specific offsets (random effects) for three moments of the underlying statistical distributions: mean (μ), variance (σ) and skewness (ν) in an age- and sex-specific manner. Centile scores of each phenotype could then be estimated for each scan in the new study, on the same scale as the reference population curve, while accounting for study-specific ‘batch effects’ on technical or other sources of variation (see Supplementary Information 1.8 for details). MLE, maximum likelihood estimation.

Discussion

We have aggregated the largest neuroimaging dataset to date to modernize the concept of growth charts for mapping typical and atypical human brain development and ageing. The approximately 100-year age range enabled the delineation of milestones and critical periods in maturation of the human brain, revealing an early growth epoch across its constituent tissue classes—beginning before 17 post-conception weeks, when the brain is at approximately 10% of its maximum size, and ending by age 3, when the brain is at approximately 80% of the maximum size. Individual centile scores benchmarked by normative neurodevelopmental trajectories were significantly associated with neuropsychiatric disorders as well as with dimensional phenotypes (Supplementary Information 5.2, 12). Furthermore, imaging–genetics studies65 may benefit from the increased heritability of centile scores compared with raw volumetric data (Supplementary Information 13). Perhaps most importantly, GAMLSS modelling enabled harmonization across technically diverse studies (Supplementary Information 5), and thus unlocked the potential value of combining primary MRI studies at scale to generate normative, sex-stratified brain growth charts, and individual centile scores of typicality and atypicality.

The analogy to paediatric growth charts is not meant to imply that brain charts are immediately suitable for benchmarking or quantitative diagnosis of individual patients in clinical practice. Even for traditional anthropometric growth charts (height, weight and BMI), there are still important caveats and nuances concerning their diagnostic interpretation in individual children66; similarly, it is expected that considerable further research will be required to validate the clinical diagnostic utility of brain charts. However, the current results bode well for future progress towards digital diagnosis of atypical brain structure and development67. By providing an age- and sex-normalized metric, centile scores enable trans-diagnostic comparisons between disorders that emerge at different stages of the lifespan (Supplementary Information 10, 11). The generally high stability of centile scores across longitudinal measurements also enabled assessment of brain changes related to diagnostic transition from mild cognitive impairment to Alzheimer’s disease (Supplementary Information 14), which provides one example of how centile scoring could be clinically useful in quantitatively predicting or diagnosing progressive neurodegenerative disorders in the future. Our provision of appropriate normative growth charts and online tools also creates an immediate opportunity to quantify atypical brain structure in clinical research samples, to leverage available legacy neuroimaging datasets, and to enhance ongoing studies.

Several important caveats are worth highlighting. Even this large MRI dataset was biased towards European and North American populations and European ancestry groups within those populations. This bias is unfortunately common in many clinical and scientific references, including anthropometric growth charts and benchmark genetic datasets, representing an inequity that must be addressed by the global scientific community68. In the particular case of brain charts, further increasing ethnic, socioeconomic and demographic diversity in MRI research will enable more population-representative normative trajectories69,70 that can be expected to improve the accuracy and strengthen the interpretation of centile scores in relation to appropriate norms26. The available reference data were also not equally distributed across all ages; for example, foetal, neonatal and mid-adulthood (30–40 years of age) epochs were under-represented (Supplementary Information 17–19). Furthermore, although our statistical modelling approach was designed to mitigate study- or site-specific effects on centile scores, it cannot entirely correct for limitations of primary study design, such as ascertainment bias or variability in diagnostic criteria. Our decision to stratify the lifespan models by sex followed the analogous logic of sex-stratified anthropometric growth charts. Males have larger brain-tissue volumes than females in absolute terms (Supplementary Information 16), but this is not indicative of any difference in clinical or cognitive outcomes. Future work would benefit from more detailed and dimensional self-report variables relating to sex and gender71. The use of brain charts also does not circumvent the fundamental requirement for quality control of MRI data. 
We have shown that GAMLSS modelling of global structural MRI phenotypes is in fact remarkably robust to inclusion of poor-quality scans (Supplementary Information 2), but it should not be assumed that this level of robustness will apply to future brain charts of regional MRI or functional MRI phenotypes; therefore, the importance of quality control remains paramount.

We have focused primarily on global brain phenotypes, which were measurable in the largest achievable sample, aggregated over the widest age range, with the fewest methodological, theoretical and data-sharing constraints. However, we have also provided proof-of-concept brain charts for regional grey matter volumetrics, demonstrating plausible heterochronicity of cortical patterning, and illustrating the potential generalizability of this approach to a diverse range of fine-grained MRI phenotypes (Fig. 2, Supplementary Information 8). As ongoing and future efforts provide increasing amounts of high-quality MRI data, we predict an iterative process of improved brain charts for an increasing number of multimodal72 neuroimaging phenotypes. Such diversification will require the development, implementation and standardization of additional data quality control procedures27 to underpin robust brain chart modelling. To facilitate further research using our reference charts, we have provided interactive tools to explore these statistical models and to derive normalized centile scores for new datasets across the lifespan at www.brainchart.io.

Methods

Ethics

The research was reviewed by the Cambridge Psychology Research Ethics Committee (PRE.2020.104) and The Children’s Hospital of Philadelphia’s Institutional Review Board (IRB 20-017874) and deemed not to require PRE or IRB oversight as it consists of secondary analysis of de-identified primary datasets. Informed consent of participants (or their guardians) in primary studies is referenced in Supplementary Information 19 and Supplementary Table 1.

Model generation and specification

To accurately and comprehensively establish standardized brain reference charts across the lifespan, it is crucial to leverage multiple independent and diverse datasets, especially those spanning prenatal and early postnatal life. Here we sought to chart normative brain development and ageing across the largest age-span and largest aggregated neuroimaging dataset to date using a robust and scalable methodological framework2,24. We used GAMLSS2 to estimate cross-sectional normative age-related trends from 100 studies, comprising a reference dataset of more than 100,000 scans (see Supplementary Tables 1.1–1.7 for full demographic information and Supplementary Information 19 for dataset descriptions). We optimized GAMLSS model specification and parameterization to estimate non-linear normative growth curves, their confidence intervals and first derivatives, separately for males and females, allowing for random effects on the mean and higher order moments of the outcome distributions.

The reliability of the models was assessed and endorsed by cross-validation and bootstrap resampling procedures (Supplementary Information 3). We leveraged these normative trajectories to benchmark individual scans by centile scores, which were then investigated as age-normed and sex-stratified measures of diagnostic and longitudinal atypicalities of brain structure across the lifespan.

The GAMLSS approach allowed not only modelling of age-related changes in brain phenotypes but also age-related changes in the variability of phenotypes, and in the form of both linear and nonlinear changes over time, thereby overcoming potential limitations of conventional additive models that only allow additive means to be modelled2. In addition, study-specific offsets (mean and variance) for each brain phenotype were also modelled as random effects. These modelling criteria are particularly important in the context of establishing growth reference charts as recommended by the World Health Organization24, as it is reasonable to assume the distribution of higher order moments (for example, variance) changes with age, sex, site/study and pre-processing pipeline, and it is impossible to circumvent some of these issues by collecting standardized data longitudinally for individuals spanning the approximately 100-year age range. Furthermore, recent studies suggest that changes in between-subject variability might intersect with vulnerability for developing a mental health condition74. The use of data spanning the entire age range is also critical, as data from partial age-windows can bias estimation of growth charts when extrapolated to the whole lifespan. In short, using a sex-stratified approach24, age, preprocessing pipeline and study were each included in the GAMLSS model estimation of first order (μ) and second order (σ) distribution parameters of a generalized gamma distribution using fractional polynomials to model nonlinear trends. See Supplementary Information for more details regarding GAMLSS model specification and estimation (Supplementary Information 1), image quality control (Supplementary Information 2), model stability and robustness (Supplementary Information 3, 4), phenotypic validation against non-imaging metrics (Supplementary Information 3, 5.2), inter-study harmonization (Supplementary Information 5) and assessment of cohort effects (Supplementary Information 6).

More formally, the GAMLSS framework can be specified in the following way:

$$Y\sim F\left(\mu ,\sigma ,\nu ,\tau \right)$$
(1)
$${g}_{\mu }(\mu )={X}_{\mu }{\beta }_{\mu }+{Z}_{\mu }{\gamma }_{\mu }+\sum _{i}{s}_{\mu ,i}({x}_{i})$$
$${g}_{\sigma }(\sigma )={X}_{\sigma }{\beta }_{\sigma }+{Z}_{\sigma }{\gamma }_{\sigma }+\sum _{i}{s}_{\sigma ,i}({x}_{i})$$
$${g}_{\nu }(\nu )={X}_{\nu }{\beta }_{\nu }+{Z}_{\nu }{\gamma }_{\nu }+\sum _{i}{s}_{\nu ,i}({x}_{i})$$
$${g}_{\tau }(\tau )={X}_{\tau }{\beta }_{\tau }+{Z}_{\tau }{\gamma }_{\tau }+\sum _{i}{s}_{\tau ,i}({x}_{i})$$

Here, the outcome vector, \(Y\), follows a probability distribution \(F\) parameterized by up to four parameters, \((\mu ,\sigma ,\nu ,\tau )\). The four parameters, depending on the parameterization of the probability density function, may correspond to the mean, variance, skewness, and kurtosis (that is, the first four moments). However, for many distributions there is not a direct one-to-one correspondence. Each component is linked to a linear equation through a link-function, \({g}_{\bullet }()\), and each component equation may include three types of terms: fixed effects, \(\beta \) (with design matrix \(X\)); random effects, \(\gamma \) (with design matrix \(Z\)); and non-parametric smoothing functions, \({s}_{\bullet ,i}\), applied to the \(i\)th covariate for each parameter. The nature of the outcome distribution determines the appropriate link functions and which components are used. In principle any outcome distribution can be used, from well-behaved continuous and discrete outcomes, through to mixtures and truncations.

Here we have used fractional polynomials as a flexible, but not unduly complex, approach to modelling age-related changes in MRI phenotypes. Although non-parametric smoothers are more flexible, they can become unstable and infeasible, especially in the presence of random effects. Hence, the fractional polynomials enter the model within the \(X\) terms, with associated coefficients in \(\beta \). The GAMLSS framework includes the ability to estimate the most appropriate powers of fractional polynomial expansion within the iterative fitting algorithm, searching across the standard set of powers, \(p\in \{-2,-1,-0.5,0,0.5,1,2,3\}\), where the design matrix includes the covariate (in this case, age) raised to the power, namely, \({x}^{p}\). Fractional polynomials naturally extend to higher orders, for example a second-order fractional polynomial of the form \({x}^{{p}_{1}}+{x}^{{p}_{2}}\) (see Supplementary Information 1.3 for further details).
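As a purely illustrative sketch (not the authors' implementation, which relies on the GAMLSS fitting algorithm), the fractional polynomial terms for a single covariate value could be constructed as follows. The sketch assumes the usual conventions that a power of 0 denotes log(x) and that each repetition of a power multiplies the term by a further factor of log(x); the names `fp_term` and `fp_basis` are hypothetical.

```python
import math
from typing import List, Sequence

# Standard set of candidate powers searched during fitting (from the text).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x: float, power: float) -> float:
    """Single fractional polynomial term; power 0 is taken to mean log(x)."""
    return math.log(x) if power == 0 else x ** power


def fp_basis(x: float, powers: Sequence[float]) -> List[float]:
    """Fractional polynomial terms for one positive covariate value.

    Assumed convention: when a power is repeated, each repeat is multiplied
    by an additional factor of log(x).
    """
    if x <= 0:
        raise ValueError("fractional polynomials require a positive covariate")
    basis: List[float] = []
    previous = None
    repeats = 0
    for power in sorted(powers):  # sorting keeps repeated powers adjacent
        repeats = repeats + 1 if power == previous else 0
        basis.append(fp_term(x, power) * math.log(x) ** repeats)
        previous = power
    return basis
```

For example, `fp_basis(age, (-2, -2, -2))` returns the three terms age^-2, age^-2 log(age) and age^-2 log(age)^2 for a given positive `age`.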

There are several options for including random effects within the GAMLSS framework depending on the desired covariance structures. We consider the simplest case, including a factor-level (or group-level) random intercept, where the observations are grouped by the study covariate. The random effects are drawn from a normal distribution with zero mean and variance to be estimated, \(\gamma \sim N(0,{\delta }^{2})\). The ability to include random effects is fundamental to accounting for co-dependence between observations. It is therefore possible to take advantage of the flexibility of ‘standard’ GAMLSS, as typically used to develop growth charts24,62,75, while accounting for co-dependence between observations using random effects. The typical applications of GAMLSS assume independent and identically distributed outcomes; however, in this context it is essential to account for within-study covariance implying the observations are no longer independent.

The resulting models were evaluated using several sensitivity analyses and validation approaches. These models of whole-brain and regional morphometric development were robust to variations in image quality, and cross-validated by non-imaging metrics. However, we expect that several sources of variance, including but not limited to MRI data quality and variability of acquisition protocols, may become increasingly important as brain charting methods are applied to more innovative and/or anatomically fine-grained MRI phenotypes. It will be important for future work to remain vigilant about the potential impact of data quality and other sources of noise on robustness and generalizability of both normative trajectories and the centile scores derived from them.

Based on the model selection criteria, detailed in Supplementary Information 1, the final models for normative trajectories of all MRI phenotypes were specified as illustrated below for GMV:

$$\begin{array}{c}\mathrm{GMV}\sim \mathrm{Generalized}\,\mathrm{Gamma}(\mu ,\sigma ,\nu )\,\mathrm{with}\\ \log (\mu )={\alpha }_{\mu }+{\alpha }_{\mu ,\mathrm{sex}}(\mathrm{sex})+{\alpha }_{\mu ,\mathrm{ver}}(\mathrm{ver})+{\beta }_{\mu ,1}{(\mathrm{age})}^{-2}+{\beta }_{\mu ,2}{(\mathrm{age})}^{-2}\,\log (\mathrm{age})+{\beta }_{\mu ,3}{(\mathrm{age})}^{-2}\,\log {(\mathrm{age})}^{2}+{\gamma }_{\mu ,\mathrm{study}}\\ \log (\sigma )={\alpha }_{\sigma }+{\alpha }_{\sigma ,\mathrm{sex}}(\mathrm{sex})+{\beta }_{\sigma ,1}{(\mathrm{age})}^{-2}+{\beta }_{\sigma ,2}{(\mathrm{age})}^{3}+{\gamma }_{\sigma ,\mathrm{study}}\\ \nu ={\alpha }_{\nu }\end{array}$$
(2)

For each component of the generalized gamma distribution, \(\alpha \) terms correspond to fixed effects of the intercept, sex (female or male), and software version used for pre-processing (five categories); \(\beta \) terms correspond to the fixed effects of age, modelled as fractional polynomial functions with the number of terms reflecting the order of the fractional polynomials; and \(\gamma \) terms correspond to the study-level random effects. Note that we have explicitly included the link-functions for each component of the generalized gamma, namely the natural logarithm for \(\mu \) and \(\sigma \) (since these parameters must be positive) and the identity for \(\nu \).
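To make the role of the log link concrete, the following minimal sketch evaluates a predictor of the form given for \(\sigma \) in equation (2). All coefficient values passed to it would be placeholders, the 0/1 coding of sex is an assumption, and the age scale is left to the caller because it is not specified in this section; `sigma_from_predictor` is a hypothetical helper rather than part of any GAMLSS software.

```python
import math
from typing import Dict


def sigma_from_predictor(
    age: float,
    is_male: bool,
    coef: Dict[str, float],
    study_effect: float = 0.0,
) -> float:
    """Evaluate log(sigma) = a + a_sex*sex + b1*age^-2 + b2*age^3 + gamma_study.

    The exponential inverts the log link, so the result is always positive.
    """
    sex = 1.0 if is_male else 0.0  # assumed dummy coding, for illustration only
    eta = (
        coef["intercept"]
        + coef["sex"] * sex
        + coef["age_pow_m2"] * age ** -2
        + coef["age_pow_3"] * age ** 3
        + study_effect  # study-level random intercept on the log scale
    )
    return math.exp(eta)
```

Because the random intercept is added on the log scale, a study effect of log(2) doubles \(\sigma \) for every scan from that study, whatever the age or sex.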

Similarly for the other global MRI phenotypes:

$$\begin{array}{c}\mathrm{WMV}\sim \mathrm{Generalized}\,\mathrm{Gamma}(\mu ,\sigma ,\nu )\,\mathrm{with}\\ \log (\mu )={\alpha }_{\mu }+{\alpha }_{\mu ,\mathrm{sex}}(\mathrm{sex})+{\alpha }_{\mu ,\mathrm{ver}}(\mathrm{ver})+{\beta }_{\mu ,1}{(\mathrm{age})}^{-2}+{\beta }_{\mu ,2}{(\mathrm{age})}^{3}+{\beta }_{\mu ,3}{(\mathrm{age})}^{3}\,\log (\mathrm{age})+{\gamma }_{\mu ,\mathrm{study}}\\ \log (\sigma )={\alpha }_{\sigma }+{\alpha }_{\sigma ,\mathrm{sex}}(\mathrm{sex})+{\beta }_{\sigma ,1}{(\mathrm{age})}^{-2}+{\beta }_{\sigma ,2}{(\mathrm{age})}^{3}+{\gamma }_{\sigma ,\mathrm{study}}\\ \nu ={\alpha }_{\nu },\end{array}$$
(3)
$$\begin{array}{c}\mathrm{sGMV}\sim \mathrm{Generalized}\,\mathrm{Gamma}(\mu ,\sigma ,\nu )\,\mathrm{with}\\ \log (\mu )={\alpha }_{\mu }+{\alpha }_{\mu ,\mathrm{sex}}(\mathrm{sex})+{\alpha }_{\mu ,\mathrm{ver}}(\mathrm{ver})+{\beta }_{\mu ,1}{(\mathrm{age})}^{-2}+{\beta }_{\mu ,2}{(\mathrm{age})}^{-2}\,\log (\mathrm{age})+{\beta }_{\mu ,3}{(\mathrm{age})}^{3}+{\gamma }_{\mu ,\mathrm{study}}\\ \log (\sigma )={\alpha }_{\sigma }+{\alpha }_{\sigma ,\mathrm{sex}}(\mathrm{sex})+{\beta }_{\sigma ,1}{(\mathrm{age})}^{-2}+{\beta }_{\sigma ,2}{(\mathrm{age})}^{-2}\,\log (\mathrm{age})+{\gamma }_{\sigma ,\mathrm{study}}\\ \nu ={\alpha }_{\nu },\end{array}$$
(4)
$$\begin{array}{c}\mathrm{Ventricles}\sim \mathrm{Generalized}\,\mathrm{Gamma}(\mu ,\sigma ,\nu )\,\mathrm{with}\\ \log (\mu )={\alpha }_{\mu }+{\alpha }_{\mu ,\mathrm{sex}}(\mathrm{sex})+{\alpha }_{\mu ,\mathrm{ver}}(\mathrm{ver})+{\beta }_{\mu ,1}{(\mathrm{age})}^{3}+{\beta }_{\mu ,2}{(\mathrm{age})}^{3}\,\log (\mathrm{age})+{\beta }_{\mu ,3}{(\mathrm{age})}^{3}\,\log {(\mathrm{age})}^{2}+{\gamma }_{\mu ,\mathrm{study}}\\ \log (\sigma )={\alpha }_{\sigma }+{\alpha }_{\sigma ,\mathrm{sex}}(\mathrm{sex})+{\beta }_{\sigma ,1}{(\mathrm{age})}^{-2}+{\beta }_{\sigma ,2}{(\mathrm{age})}^{-2}\,\log (\mathrm{age})+{\beta }_{\sigma ,3}{(\mathrm{age})}^{-2}\,\log {(\mathrm{age})}^{2}\\ \nu ={\alpha }_{\nu },\end{array}$$
(5)
$$\begin{array}{c}\mathrm{TCV}\sim \mathrm{Generalized}\,\mathrm{Gamma}(\mu ,\sigma ,\nu )\,\mathrm{with}\\ \log (\mu )={\alpha }_{\mu }+{\alpha }_{\mu ,\mathrm{sex}}(\mathrm{sex})+{\alpha }_{\mu ,\mathrm{ver}}(\mathrm{ver})+{\beta }_{\mu ,1}{(\mathrm{age})}^{-2}+{\beta }_{\mu ,2}{(\mathrm{age})}^{-2}\,\log (\mathrm{age})+{\beta }_{\mu ,3}{(\mathrm{age})}^{3}+{\gamma }_{\mu ,\mathrm{study}}\\ \log (\sigma )={\alpha }_{\sigma }+{\alpha }_{\sigma ,\mathrm{sex}}(\mathrm{sex})+{\beta }_{\sigma ,1}{(\mathrm{age})}^{-2}+{\beta }_{\sigma ,2}{(\mathrm{age})}^{-2}\,\log (\mathrm{age})+{\beta }_{\sigma ,3}{(\mathrm{age})}^{-2}\,\log {(\mathrm{age})}^{2}+{\gamma }_{\sigma ,\mathrm{study}}\\ \nu ={\alpha }_{\nu }\end{array}$$
(6)
$$\begin{array}{l}\mathrm{SA}\sim \mathrm{Generalized}\,\mathrm{Gamma}(\mu ,\sigma ,\nu )\,\mathrm{with}\\ \log (\mu )={\alpha }_{\mu }+{\alpha }_{\mu ,\mathrm{sex}}(\mathrm{sex})+{\alpha }_{\mu ,\mathrm{ver}}(\mathrm{ver})+{\beta }_{\mu ,1}{(\mathrm{age})}^{-2}\\ \,+{\beta }_{\mu ,2}{(\mathrm{age})}^{-2}\,\log (\mathrm{age})+{\beta }_{\mu ,3}{(\mathrm{age})}^{-2}\,\log {(\mathrm{age})}^{2}+{\gamma }_{\mu ,\mathrm{study}}\\ \log (\sigma )={\alpha }_{\sigma }+{\alpha }_{\sigma ,\mathrm{sex}}(\mathrm{sex})+{\beta }_{\sigma ,1}{(\mathrm{age})}^{-2}+{\beta }_{\sigma ,2}{(\mathrm{age})}^{-2}\,\log (\mathrm{age})+{\beta }_{\sigma ,3}{(\mathrm{age})}^{-2}\,\log {(\mathrm{age})}^{2}+{\gamma }_{\sigma ,\mathrm{study}}\\ \nu ={\alpha }_{\nu },\end{array}$$
(7)
$$\begin{array}{l}\mathrm{CT}\sim \mathrm{Generalized}\,\mathrm{Gamma}(\mu ,\sigma ,\nu )\,\mathrm{with}\\ \log (\mu )={\alpha }_{\mu }+{\alpha }_{\mu ,\mathrm{sex}}(\mathrm{sex})+{\alpha }_{\mu ,\mathrm{ver}}(\mathrm{ver})+{\beta }_{\mu ,1}{(\mathrm{age})}^{-2}\\ \,+{\beta }_{\mu ,2}{(\mathrm{age})}^{-2}\,\log (\mathrm{age})+{\gamma }_{\mu ,\mathrm{study}}\\ \log (\sigma )={\alpha }_{\sigma }+{\alpha }_{\sigma ,\mathrm{sex}}(\mathrm{sex})+{\beta }_{\sigma ,1}{(\mathrm{age})}^{-1}+{\beta }_{\sigma ,2}{(\mathrm{age})}^{0.5}+{\gamma }_{\sigma ,\mathrm{study}}\\ \nu ={\alpha }_{\nu }.\end{array}$$
(8)

No smoothing terms were used in any GAMLSS models implemented in this study, although the fractional polynomials can be regarded as effectively a parametric form of smoothing. Reliably estimating higher order moments requires increasing amounts of data, hence none of our models specified any age-related fixed-effects or random effects in the \(\nu \) term. However, \({\alpha }_{\nu }\) was found to be important in terms of model fit and hence we have used a generalized gamma distribution (Supplementary Information 1).
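Given values of \(\mu \), \(\sigma \) and \(\nu \) for a particular age, sex and study, a centile is simply the cumulative probability of the observed phenotype under the generalized gamma distribution. The sketch below illustrates this under an explicit assumption: it uses the parameterization documented for the GG family of the R package gamlss.dist, in which \(z={(y/\mu )}^{\nu }\) follows a gamma distribution with shape and rate \(\theta =1/({\sigma }^{2}{\nu }^{2})\), with a log-normal limit as \(\nu \) approaches 0. Whether this matches the fitted models exactly should be checked against the authors' released code. The function names are hypothetical, and the series used for the incomplete gamma function is a simple choice suited to moderate arguments only.

```python
import math


def regularized_lower_gamma(a: float, x: float, tol: float = 1e-14) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    log_prefactor = a * math.log(x) - x - math.lgamma(a + 1.0)
    term = 1.0
    total = 1.0
    for n in range(1, 100_000):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, math.exp(log_prefactor + math.log(total)))


def gg_cdf(y: float, mu: float, sigma: float, nu: float) -> float:
    """CDF of the generalized gamma under the assumed (mu, sigma, nu) form."""
    if y <= 0.0:
        return 0.0
    if abs(nu) < 1e-6:
        # Assumed limiting case: log(Y) is normal with mean log(mu), sd sigma.
        z_score = (math.log(y) - math.log(mu)) / sigma
        return 0.5 * (1.0 + math.erf(z_score / math.sqrt(2.0)))
    theta = 1.0 / (sigma ** 2 * nu ** 2)
    z = (y / mu) ** nu
    p = regularized_lower_gamma(theta, theta * z)
    # For negative nu, z decreases as y increases, so the tail is flipped.
    return p if nu > 0 else 1.0 - p


def centile_score(y: float, mu: float, sigma: float, nu: float) -> float:
    """Centile on a 0-100 scale (the scale is a presentational choice)."""
    return 100.0 * gg_cdf(y, mu, sigma, nu)
```

With \(\sigma =1\) and \(\nu =1\) this reduces to an exponential distribution with mean \(\mu \), which provides a simple sanity check of the implementation.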

Defining developmental milestones

GAMLSS modelling also allowed us to leverage the aggregated life-spanning neuroimaging dataset to derive developmental milestones (that is, peaks of trajectories) and compare them to existing literature. The cerebrum tissue classes from 100 studies (Fig. 1, Supplementary Tables 1.1–1.7, Supplementary Information 18) showed clear, predominantly age-related trends, even prior to any modelling. Comparing these models with multiple non-MRI metrics of brain size demonstrated high correspondence across the lifespan (Supplementary Information 3). Peaks were determined based on the GAMLSS model output (50th centile) for each of the tissue classes and TCV, for both total tissue volumes and rates of change or growth (velocity). A similar series of methodological steps was performed for the set of extended global and regional cortical morphometric phenotypes (Fig. 2, Supplementary Information 7, 8). To further contextualize the neuroimaging trajectories, diagnostic age ranges from previous literature73,76 (blue boxes in Fig. 3) were compared with empirical age ranges of patients with a given diagnosis across the aggregated neuroimaging dataset (black boxes in Fig. 3). Note that age of diagnosis is significantly later than age of symptom onset for many disorders73. Developmental milestones were also compared to published work for brain resting metabolic rate61, from its minimum in infancy to its maximum in early childhood; anthropometric variables (height and weight), which reach a first peak in velocity during infancy and a second peak in velocity in adolescence52; typical acquisition of the six gross motor capabilities62; and pubertal age ranges as defined based on previous reports51,53.
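The idea of reading milestones off a median trajectory can be sketched as follows. This is an illustration only: it assumes the 50th centile has already been evaluated on a fine age grid and approximates velocity by central finite differences, whereas the first derivatives reported here were obtained from the fitted models. The toy curve and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def central_velocity(ages: Sequence[float], values: Sequence[float]) -> List[float]:
    """Central finite-difference slope at each interior grid point."""
    return [
        (values[i + 1] - values[i - 1]) / (ages[i + 1] - ages[i - 1])
        for i in range(1, len(ages) - 1)
    ]


def milestones(
    ages: Sequence[float], median_curve: Sequence[float]
) -> Tuple[float, float]:
    """Return (age at peak value, age at peak velocity) of a median curve."""
    peak_index = max(range(len(ages)), key=lambda i: median_curve[i])
    velocity = central_velocity(ages, median_curve)
    # velocity[k] belongs to grid point k + 1 because the endpoints are skipped
    velocity_index = max(range(len(velocity)), key=lambda k: velocity[k]) + 1
    return ages[peak_index], ages[velocity_index]


# Toy curve standing in for a fitted 50th centile (not real data).
toy_ages = [0.1 * i for i in range(1, 301)]
toy_median = [t ** 2 * math.exp(-t / 5.0) for t in toy_ages]
peak_age, peak_velocity_age = milestones(toy_ages, toy_median)
```

The precision of both milestones is limited by the spacing of the age grid, so a finer grid (or an analytic derivative of the fitted curve) would be preferable in practice.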

Centile scores and case–control differences

These normative trajectories of brain development and ageing also enabled each individual scan to be quantified in terms of its relative distance from the median of the age-normed and sex-stratified distributions provided by the reference model67,77 (Fig. 4, Supplementary Information 10, 11). Individual centile scores were estimated relative to the reference curves, in a way that is conceptually similar to traditional anthropometric growth charts (Supplementary Information 1). These centiles represent a novel set of population- and age-standardized clinical phenotypes, providing the capacity for cross-phenotype, cross-study and cross-disorder comparison. A single multivariate metric (CMD, Supplementary Information 1.6) was estimated by combining centile scores on multiple MRI phenotypes for each individual (Fig. 4c). Case–control differences in centile scores were analysed with a bootstrapped (500 bootstraps) non-parametric generalization of Welch’s one-way ANOVA. Pairwise, sex-stratified, post-hoc comparisons were conducted using non-parametric Monte Carlo permutation tests (10,000 permutations) and thresholded at a Benjamini–Hochberg FDR of q < 0.05.
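The post-hoc procedure can be illustrated with a minimal sketch of a two-group Monte Carlo permutation test followed by Benjamini–Hochberg thresholding. The test statistic (absolute difference in group medians) is an assumption made for illustration, since the statistic used is not specified in this section, and the function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def permutation_p_value(
    group_a: Sequence[float],
    group_b: Sequence[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided Monte Carlo permutation p-value for a difference in medians."""
    rng = random.Random(seed)
    observed = abs(statistics.median(group_a) - statistics.median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of cases and controls
        stat = abs(statistics.median(pooled[:n_a]) - statistics.median(pooled[n_a:]))
        if stat >= observed:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Step-up Benjamini-Hochberg procedure; True marks a rejected null."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= q * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected
```

In use, one p-value would be computed per pairwise comparison within each sex, and the resulting list passed to `benjamini_hochberg` to control the false discovery rate across comparisons.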

Longitudinal stability

To use centile scores in a diagnostically meaningful or predictive way, they need to be stable across multiple measuring points. To assess this intra-individual stability, we calculated the subject-specific IQR of centiles across timepoints for the datasets that included longitudinal scans (N = 9,306, 41 unique studies). Exploratory longitudinal clinical analyses were restricted to clinical groups that had at least 50 subjects with longitudinal data to allow for robust group-wise estimates of longitudinal variability. In addition, there was a subset of individuals with documented clinical progression over the course of longitudinal scans, for instance from mild cognitive impairment to Alzheimer’s disease, where we expected an associated change in centile-scored brain structure. To test this hypothesis, we assessed whether these individuals showed longitudinal variation of centile scores (as assessed with IQR) with a direction of change consistent with their clinical progression. See Supplementary Information 14 for further details about the longitudinal stability of centile scores.
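A minimal sketch of the subject-specific IQR calculation is given below. It assumes centiles are stored on a 0–1 scale as (subject, centile) pairs and uses linearly interpolated quartiles (the "inclusive" method of Python's `statistics.quantiles`); the quantile definition actually used is not stated in this section, and `subject_iqr` is a hypothetical name.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def subject_iqr(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Interquartile range of centile scores across timepoints, per subject.

    Subjects with a single scan are skipped because an IQR needs at least
    two observations.
    """
    by_subject: Dict[str, List[float]] = defaultdict(list)
    for subject_id, centile in records:
        by_subject[subject_id].append(centile)

    result: Dict[str, float] = {}
    for subject_id, centiles in by_subject.items():
        if len(centiles) < 2:
            continue
        q1, _, q3 = statistics.quantiles(centiles, n=4, method="inclusive")
        result[subject_id] = q3 - q1
    return result
```

A small IQR indicates that a subject's centile stays close to the same value across scans, whereas a large IQR flags within-subject change that may warrant closer inspection.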

Data sharing and out-of-sample estimation

We have provided an interactive tool (www.brainchart.io) and made our code and models openly available (https://github.com/brainchart/Lifespan). The tool allows the user to visualize the underlying demographics of the primary studies and to explore the normative brain charts in a much more detailed fashion than static images allow. It also provides the opportunity for interactive exploration of case–control differences in centile scores across many diagnostic categories that is beyond the scope of this paper. Perhaps most significantly, the brain chart interactive tool includes an out-of-sample estimator of model parameters for new MRI data that enables the user to compute centile scores for their own datasets without the computational or data-sharing hurdles involved in adding that data to the reference dataset used to estimate normative charts (Fig. 5). Bias and reliability of out-of-sample centile scoring were extensively assessed and endorsed by resampling and cross-validation studies for ‘new’ studies comprising at least 100 scans. Although already based on the largest and most comprehensive neuroimaging dataset to date, and supporting analyses of out-of-sample data, these normative brain charts will continue to be updated as additional data are made available for aggregation with the reference dataset. See Supplementary Information 1.8, 4 for further details about out-of-sample estimation.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.