Brain charts for the human lifespan

Over the past few decades, neuroimaging has become a ubiquitous tool in basic research and clinical studies of the human brain. However, no reference standards currently exist to quantify individual differences in neuroimaging metrics over time, in contrast to growth charts for anthropometric traits such as height and weight 1 . Here we assemble an interactive open resource to benchmark brain morphology derived from any current or future sample of MRI data (http://www.brainchart.io/). With the goal of basing these reference charts on the largest and most inclusive dataset available, acknowledging limitations due to known biases of MRI studies relative to the diversity of the global population, we aggregated 123,984 MRI scans, across more than 100 primary studies, from 101,457 human participants between 115 days post-conception to 100 years of age. MRI metrics were quantified by centile scores, relative to non-linear trajectories 2 of brain structural changes, and rates of change, over the lifespan. Brain charts identified previously unreported neurodevelopmental milestones 3 , showed high stability of individuals across longitudinal assessments, and demonstrated robustness to technical and methodological differences between primary studies. Centile scores showed increased heritability compared with non-centiled MRI phenotypes, and provided a standardized measure of atypical brain structure that revealed patterns of neuroanatomical variation across neurological and psychiatric disorders. In summary, brain charts are an essential step towards robust quantification of individual variation benchmarked to normative trajectories in multiple, commonly used neuroimaging phenotypes.

The simple framework of growth charts to quantify age-related change was first published in the late eighteenth century 1 and remains a cornerstone of paediatric healthcare, an enduring example of the utility of standardized norms to benchmark individual trajectories of development. However, growth charts are currently available only for a small set of anthropometric variables, such as height, weight and head circumference, and only for the first decade of life. There are no analogous charts available for quantification of age-related changes in the human brain, although it is known to go through a prolonged and complex maturational program from pregnancy to the third decade 4 , followed by progressive senescence from approximately the sixth decade 5 . The lack of tools for standardized assessment of brain development and ageing is particularly relevant to research studies of psychiatric disorders, which are increasingly recognized as a consequence of atypical brain development 6 , and neurodegenerative diseases that cause pathological brain changes in the context of normative senescence 7 . Preterm birth and neurogenetic disorders are also associated with marked abnormalities of brain structure 8,9 that persist into adult life 9,10 and are associated with learning disabilities and mental health disorders. Mental illness and dementia collectively represent the single biggest global health burden 11 , highlighting the urgent need for normative brain charts as an anchor point for standardized quantification of brain structure over the lifespan 12 .
Such standards for human brain measurement have not yet materialized from decades of neuroimaging research, probably owing to the challenges of integrating MRI data across multiple, methodologically diverse studies targeting distinct developmental epochs and clinical conditions 13 . For example, the perinatal period is rarely incorporated in analysis of age-related brain changes, despite evidence that early biophysical and molecular processes powerfully influence life-long neurodevelopmental trajectories 14,15 and vulnerability to psychiatric disorders 3 . Primary case-control studies are usually focused on a single disorder despite evidence of trans-diagnostically shared risk factors and pathogenic mechanisms, especially in psychiatry 16,17 . Harmonization of MRI data across primary studies to address these and other deficiencies in the extant literature is challenged by methodological and technical heterogeneity. Compared with relatively simple anthropometric measurements such as height or weight, brain morphometrics are known to be highly sensitive to variation in scanner platforms and sequences, data quality control, pre-processing and statistical analysis 18 , thus severely limiting the generalizability of trajectories estimated from any individual study 19 . Collaborative initiatives spurring collection of large-scale datasets 20,21 , recent advances in neuroimaging data processing 22,23 and proven statistical frameworks for modelling biological growth curves 2,24,25 provide the building blocks for a more comprehensive and generalizable approach to age-normed quantification of MRI phenotypes over the entire lifespan (see Supplementary Information 1 for details and consideration of previous work focused on the related but distinct objective of inferring brain age from MRI data). 
Here, we demonstrate that these convergent advances now enable the generation of brain charts that (1) robustly define normative processes of sex-stratified, age-related change in multiple MRI-derived phenotypes; (2) identify previously unreported brain growth milestones; (3) increase sensitivity to detect genetic and early life environmental effects on brain structure; and (4) provide standardized effect sizes to quantify neuroanatomical atypicality of brain scans collected across multiple clinical disorders. We do not claim to have yet reached the ultimate goal of quantitatively precise diagnosis of MRI scans from individual patients in clinical practice. However, the present work proves the principle that building normative charts to benchmark individual differences in brain structure is already achievable at global scale and over the entire life-course; and provides a suite of open science resources for the neuroimaging research community to accelerate further progress in the direction of standardized quantitative assessment of MRI data.

Mapping normative brain growth
We created brain charts for the human lifespan using generalized additive models for location, scale and shape 2,24 (GAMLSS), a robust and flexible framework for modelling non-linear growth trajectories recommended by the World Health Organization 24 . GAMLSS and related statistical frameworks have previously been applied to developmental modelling of brain structural and functional MRI phenotypes in open datasets 19,26-31 . Our approach to GAMLSS modelling leveraged the greater scale of data available to optimize model selection empirically, to estimate non-linear age-related trends (in median and variance) stratified by sex over the entire lifespan, and to account for site- or study-specific 'batch effects' on MRI phenotypes in terms of multiple random effect parameters.
Fig. 1 | Human brain charts. a, MRI data were aggregated from over 100 primary studies comprising 123,984 scans that collectively spanned the age range from mid-gestation to 100 postnatal years. Box-violin plots show the age distribution for each study coloured by its relative sample size (log-scaled using the natural logarithm for visualization purposes). b, Non-centiled, 'raw' bilateral cerebrum tissue volumes for grey matter, white matter, subcortical grey matter and ventricles are plotted for each cross-sectional control scan as a function of age (log-scaled); points are coloured by sex. c, Normative brain-volume trajectories were estimated using GAMLSS, accounting for site- and study-specific batch effects, and stratified by sex (female, red; male, blue). All four cerebrum tissue volumes demonstrated distinct, non-linear trajectories of their medians (with 2.5% and 97.5% centiles denoted as dotted lines) as a function of age over the lifespan. Demographics for each cross-sectional sample of healthy controls included in the reference dataset for normative GAMLSS modelling of each MRI phenotype are detailed in Supplementary Table 1.
Specifically, GAMLSS models were fitted to structural MRI data from control subjects for the four main tissue volumes of the cerebrum (total cortical grey matter volume (GMV), total white matter volume (WMV), total subcortical grey matter volume (sGMV) and total ventricular cerebrospinal fluid volume (ventricles or CSF)). Supplementary Tables 1. [...] Table 2.1) showed an initial strong increase in GMV from mid-gestation onwards, peaking at 5.9 years (95% bootstrap confidence interval (CI) 5.8-6.1), followed by a near-linear decrease. This peak was observed 2 to 3 years later than previous reports relying on smaller, more age-restricted samples 43,44 . WMV also increased rapidly from mid-gestation to early childhood, peaking at 28.7 years (95% bootstrap CI 28.1-29.2), with subsequent accelerated decline in WMV after 50 years. Subcortical GMV showed an intermediate growth pattern compared with GMV and WMV, peaking in adolescence at 14.4 years (95% bootstrap CI 14.0-14.7). Both the WMV and sGMV peaks are consistent with previous neuroimaging and postmortem reports 45,46 . By contrast, CSF showed an increase until age 2, followed by a plateau until age 30, and then a slow linear increase that became exponential in the sixth decade of life. Age-related variance (Fig. 1d), explicitly estimated by GAMLSS, formally quantifies developmental changes in between-subject variability. There was an early developmental increase in GMV variability that peaked at 4 years, whereas subcortical volume variability peaked in late adolescence. WMV variability peaked during the fourth decade of life, and CSF was maximally variable at the end of the human lifespan.
Fig. 2 | [...] global and regional cortical morphometric phenotypes. a, Trajectories for total cerebrum volume (TCV), total surface area and mean cortical thickness. For each global cortical MRI phenotype, the following sex-stratified results are shown as a function of age over the lifespan. From top to bottom: raw, non-centiled data; population trajectories of the median (with 2.5% and 97.5% centiles (dotted lines)); between-subject variance (with 95% confidence intervals); and rate of growth (the first derivatives of the median trajectory and 95% confidence intervals). All trajectories are plotted as a function of log-scaled age (x axis) and y axes are scaled in units of the corresponding MRI metrics (10,000 mm³ for TCV, 10,000 mm² for surface area and mm for cortical thickness). b, Regional variability of cortical volume trajectories for 34 bilateral brain regions, as defined by the Desikan-Killiany parcellation 47 , averaged across sex (see Supplementary Information 7,8 for details). Since models were generated from bilateral averages of each cortical region, the cortical maps are plotted on the left hemisphere purely for visualization purposes. Top, a cortical map of age at peak regional volume (range 2-10 years). Middle, a cortical map of age at peak regional volume relative to age at peak GMV (5.9 years), highlighting regions that peak earlier (blue) or later (red) than GMV. Bottom, illustrative trajectories for the earliest peaking region (superior parietal lobe, blue line) and the latest peaking region (insula, red line), showing the range of regional variability relative to the GMV trajectory (grey line). Regional volume peaks are denoted as dotted vertical lines either side of the global peak, denoted as a dashed vertical line, in the bottom panel. The left y axis on the bottom panel refers to the earliest peak (blue line); the right y axis refers to the latest peak (red line).
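The milestones reported in this section are maxima of a median trajectory (peak size) and of its first derivative (peak rate of growth). The following Python sketch illustrates how such milestones can be located numerically on a dense age grid. The curve `toy_median`, its functional form and the grid spacing are hypothetical choices made purely for illustration; they are not the fitted GAMLSS trajectories.

```python
import math


def toy_median(age: float) -> float:
    """Hypothetical median trajectory (arbitrary units), NOT a fitted brain chart."""
    return age ** 2 * math.exp(-age / 3.0)


def find_milestones(curve, max_age: float = 30.0, step: float = 0.01):
    """Return (age at peak size, age at peak growth rate) on a regular age grid."""
    n = int(round(max_age / step))
    ages = [i * step for i in range(n + 1)]
    values = [curve(a) for a in ages]
    # Central finite differences approximate the first derivative at interior points.
    rates = [(values[i + 1] - values[i - 1]) / (2 * step) for i in range(1, n)]
    peak_size_idx = max(range(len(values)), key=values.__getitem__)
    # rates[k] belongs to grid index k + 1, so shift back to the age grid.
    peak_rate_idx = max(range(len(rates)), key=rates.__getitem__) + 1
    return ages[peak_size_idx], ages[peak_rate_idx]


peak_size_age, peak_rate_age = find_milestones(toy_median)
print(f"peak size at {peak_size_age:.2f}, peak growth rate at {peak_rate_age:.2f}")
```

For this toy curve the analytic answers are 6 and 6 − √18 (about 1.76), which gives a simple check on the numerical result. Confidence intervals such as the bootstrap intervals quoted above would require refitting the curve on resampled data, which this sketch does not attempt.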

Extended neuroimaging phenotypes
To extend the scope of brain charts beyond the four cerebrum tissue volumes, we generalized the same GAMLSS modelling approach to estimate normative trajectories for additional MRI phenotypes including other morphometric properties at a global scale (mean cortical thickness and total surface area) and regional volume at each of 34 cortical areas 47 (Fig. 2, Supplementary Information 7-9, Supplementary Tables 1, 2). We found, as expected, that total surface area closely tracked the development of total cerebrum volume (TCV) across the lifespan (Fig. 2a) [...] We also found evidence for regional variability in volumetric neurodevelopmental trajectories. Compared with peak GMV at 5.9 years, the age of peak regional grey matter volume varied considerably, from approximately 2 to 10 years, across 34 cortical areas. Primary sensory regions reached peak volume earliest and showed faster post-peak declines, whereas fronto-temporal association cortical areas peaked later and showed slower post-peak declines (Fig. 2b, Supplementary Information 8.2). Notably, this spatial pattern recapitulated a gradient from sensory-to-association cortex that has been previously associated with multiple aspects of brain structure and function 50 .
Fig. 3 | [...] and key developmental milestones, as a function of age (log-scaled). Circles depict the peak rate of growth milestones for each phenotype (defined by the maxima of the first derivatives of the median trajectories (Fig. 1e)). Triangles depict the peak volume of each phenotype (defined by the maxima of the median trajectories); the definition of GMV:WMV differentiation is detailed in Supplementary Information [...]

[...] and sGMV (peak velocity and size) (Fig. 3) coincided with the early neonatal and adolescent peaks of height and weight velocity 52,53 . The velocity of mean cortical thickness peaked even earlier, in the prenatal period at −0.38 years (95% bootstrap CI −0.4 to −0.34) (relative to birth), corresponding approximately to mid-gestation. This early peak in cortical thickness velocity has not been reported previously, to our knowledge, in part owing to challenges in acquiring adequate and consistent signal from typical MRI sequences in the perinatal period 54 . Similarly, normative trajectories revealed an early period of GMV:WMV differentiation, beginning in the first month after birth with the switch from WMV to GMV as the proportionally dominant tissue compartment, and ending when the absolute difference of GMV and WMV peaked around 3 years (Supplementary Information 9). This epoch of GMV:WMV differentiation, which may reflect underlying changes in myelination and synaptic proliferation 4,55-58 , has not been demarcated in previous studies 45,59 . It was probably identified in this study owing to the substantial amount of early developmental MRI data available for analysis in the aggregated dataset (in total across all primary studies, N = 2,571 and N = 1,484 participants aged less than 2 years were available for analysis of cerebrum tissue volumes and extended global MRI phenotypes, respectively). The period of GMV:WMV differentiation encompasses dynamic changes in brain metabolites 60 (0-3 months), resting metabolic rate 61 (RMR) (minimum = 7 months, maximum = 4.2 years), the typical period of acquisition of motor capabilities and other early paediatric milestones 62 , and the most rapid change in TCV (Fig. 3).

Individualized centile scores
We computed individualized centile scores that benchmarked each individual scan in the context of normative age-related trends (Methods, 'Centile scores and case-control differences' and Supplementary Information 1-6 for further details). This approach is conceptually similar to quantile rank mapping, as previously reported 26,28,29 , where the typicality or atypicality of each phenotype in each scan is quantified by its score on the distribution of phenotypic parameters in the normative or reference sample of scans, with more atypical phenotypes having more extreme centile (or quantile) scores. The clinical diversity of the aggregated dataset enabled us to comprehensively investigate case-control differences in individually specific centile scores across a range of conditions. Relative to the control group (CN), there were highly significant differences in centile scores across large (N > 500) groups of cases diagnosed with multiple disorders (Fig. 4a, Supplementary Information 10), with effect sizes ranging from medium (0.2 < Cohen's d < 0.8) to large (Cohen's d > 0.8) (see Supplementary Tables 3, 4 for all false discovery rate (FDR)-corrected P values and effect sizes). Clinical case-control differences in cortical thickness and surface area generally followed the same trend as volume differences (Supplementary Information 10). Alzheimer's disease showed the greatest overall difference, with a maximum difference localized to grey matter volume in biologically female patients (median centile score = 14%, 36 percentage points difference from CN median, corresponding to Cohen's d = 0.88; Fig. 4a). In addition, we generated a cumulative deviation metric, the centile Mahalanobis distance (CMD), to summarize a comparative assessment of brain morphology across all global MRI phenotypes relative to the CN group (Fig. 4b, Supplementary Information 1.6).
Notably, schizophrenia ranked third overall behind Alzheimer's disease and mild cognitive impairment (MCI) on the basis of CMD (Fig. 4c). Assessment across diagnostic groups, based on profiles of the multiple centile scores for each MRI phenotype and for CMD, highlighted shared and distinct patterns across clinical conditions (Supplementary Information 10, 11). However, when examining cross-disorder similarity of multivariate centile scores, hierarchical clustering yielded three clusters broadly comprising neurodegenerative, mood and anxiety, and neurodevelopmental disorders (Supplementary Information 11).
Across all major epochs of the lifespan 63 , the CMD was consistently greater in cases relative to controls, irrespective of diagnostic category.
Fig. 5 | [...] Top, brain phenotypes were measured in a reference dataset of MRI scans. GAMLSS modelling was used to estimate the relationship between (global) MRI phenotypes and age, stratified by sex, and controlling for technical and other sources of variation between scanning sites and primary studies. Bottom, the normative trajectory of the median and confidence interval for each phenotype was plotted as a population reference curve. Out-of-sample data from a new MRI study were aligned to the corresponding epoch of the normative trajectory, using maximum likelihood to estimate the study-specific offsets (random effects) for three moments of the underlying statistical distributions: mean (μ), variance (σ), and skewness (ν) in an age- and sex-specific manner. Centile scores of each phenotype could then be estimated for each scan in the new study, on the same scale as the reference population curve, while accounting for study-specific 'batch effects' on technical or other sources of variation (see Supplementary Information 1.8 for details). MLE, maximum likelihood estimation.
The largest case-control differences across epochs occurred in late adulthood when risk for dementia increases and in adolescence, which is well-recognized as a period of increased incidence of mental health disorders (Supplementary Information 10.3). In five primary studies covering the lifespan, average centile scores across global tissues were related to two metrics of premature birth (gestational age at birth: t = 13.164, P < 2 × 10⁻¹⁶; birth weight: t = 36.395, P < 2 × 10⁻¹⁶; Supplementary Information 12), such that greater gestational age and birth weight were associated with higher average centile scores. Centile scores also showed increased twin-based heritability in two independent studies (total N = 913 twin pairs) compared with non-centiled phenotypes (average increase of 11.8 percentage points in narrow sense heritability (h²) across phenotypes; Fig. 4d, Supplementary Information 13). In summary, centile normalization of brain metrics reproducibly detected case-control differences and genetic effects on brain structure, as well as long-term sequelae of adverse birth outcomes even in the adult brain 10 .
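As a minimal illustration of the two quantities used throughout this section, the Python sketch below computes a centile score as the cumulative probability of an observed value under an age- and sex-specific reference distribution, and Cohen's d between control and case centile scores. It assumes a normal reference distribution for simplicity, whereas the GAMLSS models described here also estimate a skewness parameter; the function names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def centile_score(value, ref_median, ref_sd):
    """Centile (0 to 1) of `value` under an ASSUMED normal reference distribution."""
    return NormalDist(mu=ref_median, sigma=ref_sd).cdf(value)


def cohens_d(controls, cases):
    """Standardized mean difference (controls minus cases) using the pooled SD."""
    n1, n2 = len(controls), len(cases)
    pooled_var = ((n1 - 1) * variance(controls) + (n2 - 1) * variance(cases)) / (n1 + n2 - 2)
    return (mean(controls) - mean(cases)) / sqrt(pooled_var)


# Hypothetical example: a volume one reference SD below the age- and sex-specific median.
score = centile_score(value=540.0, ref_median=600.0, ref_sd=60.0)
print(f"centile score: {score:.3f}")

# Hypothetical centile scores for small control and case groups.
d = cohens_d(controls=[0.45, 0.50, 0.55], cases=[0.35, 0.40, 0.45])
print(f"Cohen's d: {d:.2f}")
```

Because the centile already conditions on age and sex through the reference median and SD, scores from participants of different ages can be pooled in a single case-control comparison, which is the property the trans-diagnostic analyses rely on.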

Longitudinal centile changes
Owing to the relative paucity of longitudinal imaging data (about 10% of the reference dataset), normative models were estimated from cross-sectional data collected at a single time point. However, the generalizability of cross-sectional models to longitudinal assessment is important for future research. Within-subject variability of centile scores derived from longitudinally repeated scans, measured with the interquartile range (IQR) (Methods, 'Longitudinal stability', Supplementary Information 1.7), was low across both clinical and CN groups (all median IQR < 0.05 centile points), indicating that centile scoring of brain structure was generally stable over time, although there was also some evidence of between-study and cross-disorder differences in within-subject variability (Supplementary Information 14). Notably, individuals who changed diagnostic categories (for example, those who progressed from mild cognitive impairment to Alzheimer's disease over the course of repeated scanning) showed small but significant increases in within-subject variability of centile scores (Supplementary Information 14, Supplementary Tables 5, 6). Within-subject variability was also slightly higher in samples from younger individuals (Supplementary Information 14), which could reflect increased noise due to the technical or data quality challenges associated with scanning younger individuals, but is also consistent with the evidence of increased variability in earlier development observed across other anthropometric traits 64 .
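The stability measure described above, the IQR of each participant's centile scores across repeated scans, can be sketched in Python as follows. The quartile interpolation rule, the minimum of two scans per participant and the example data are assumptions made for illustration; the exact procedure is specified in the Methods and Supplementary Information 1.7.

```python
from statistics import median, quantiles


def within_subject_iqr(centiles):
    """Interquartile range of one participant's centile scores across repeated scans."""
    # The 'inclusive' rule interpolates between observed values (an assumption here).
    q1, _, q3 = quantiles(centiles, n=4, method="inclusive")
    return q3 - q1


def median_iqr(scores_by_subject):
    """Median within-subject IQR over participants with at least two scans."""
    iqrs = [within_subject_iqr(s) for s in scores_by_subject.values() if len(s) >= 2]
    return median(iqrs)


# Hypothetical centile scores (0 to 1) from longitudinally repeated scans.
example = {
    "sub-01": [0.40, 0.41, 0.42, 0.43],
    "sub-02": [0.70, 0.74],
    "sub-03": [0.10, 0.10, 0.12],
}
print(f"median within-subject IQR: {median_iqr(example):.3f}")
```

A small IQR means that a participant stays close to the same centile from scan to scan, that is, they track along the normative trajectory rather than crossing centile lines.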

Centile scoring of new MRI data
A key challenge for brain charts is the accurate centile scoring of out-of-sample MRI data, not represented in the reference dataset used to estimate normative trajectories. We therefore carefully evaluated the reliability and validity of brain charts for centile scoring of such 'new' scans. For each new MRI study, we used maximum likelihood to estimate study-specific statistical offsets from the age-appropriate epoch of the normative trajectory; we then estimated centile scores for each individual in the new study benchmarked against the offset trajectory (Fig. 5, Methods, 'Data-sharing and out-of-sample estimation', Supplementary Information 1.8). Extensive jack-knife and leave-one-study-out analyses indicated that a study size of N > 100 scans was sufficient for stable and unbiased estimation of out-of-sample centile scores (Supplementary Information 4). This study size limit is in line with the size of many contemporary brain MRI research studies. However, these results do not immediately support the use of brain charts to generate centile scores from smaller-scale research studies, or from an individual patient's scan in clinical practice; this remains a goal for future work. Out-of-sample centile scores proved highly reliable in multiple test-retest datasets and were robust to variations in image processing pipelines (Supplementary Information 4).
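The two-step procedure described above (estimate a study-specific offset by maximum likelihood, then score each scan against the offset trajectory) can be illustrated with a deliberately simplified Python sketch. It assumes a normal reference distribution with a known age- and sex-specific SD and estimates only an additive offset in the mean, in which case the maximum likelihood estimate is the inverse-variance weighted mean of the residuals. The models in this work also estimate offsets for variance and skewness, which this sketch omits; all names and numbers are hypothetical.

```python
from statistics import NormalDist


def estimate_mean_offset(values, ref_medians, ref_sds):
    """MLE of an additive study offset under y_i ~ Normal(m_i + offset, sd_i**2)."""
    weights = [1.0 / sd ** 2 for sd in ref_sds]
    residuals = [v - m for v, m in zip(values, ref_medians)]
    return sum(w * r for w, r in zip(weights, residuals)) / sum(weights)


def out_of_sample_centiles(values, ref_medians, ref_sds):
    """Centile of each scan relative to the reference curve shifted by the study offset."""
    offset = estimate_mean_offset(values, ref_medians, ref_sds)
    return [
        NormalDist(mu=m + offset, sigma=sd).cdf(v)
        for v, m, sd in zip(values, ref_medians, ref_sds)
    ]


# Hypothetical new study: every scan reads 5 units above the reference median
# for that participant's age and sex, as a constant scanner 'batch effect' would produce.
new_values = [105.0, 115.0, 95.0]
reference_medians = [100.0, 110.0, 90.0]
reference_sds = [10.0, 10.0, 10.0]

offset = estimate_mean_offset(new_values, reference_medians, reference_sds)
centiles = out_of_sample_centiles(new_values, reference_medians, reference_sds)
print(f"estimated offset: {offset:.1f}")
print([round(c, 3) for c in centiles])
```

With only a handful of scans the offset estimate absorbs genuine between-participant differences as well as the batch effect, which gives some intuition for why a minimum study size was needed for stable out-of-sample centile scores.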

Discussion
We have aggregated the largest neuroimaging dataset to date to modernize the concept of growth charts for mapping typical and atypical human brain development and ageing. The approximately 100-year age range enabled the delineation of milestones and critical periods in maturation of the human brain, revealing an early growth epoch across its constituent tissue classes, beginning before 17 post-conception weeks, when the brain is at approximately 10% of its maximum size, and ending by age 3, when the brain is at approximately 80% of the maximum size. Individual centile scores benchmarked by normative neurodevelopmental trajectories were significantly associated with neuropsychiatric disorders as well as with dimensional phenotypes (Supplementary Information 5.2, 12). Furthermore, imaging-genetics studies 65 may benefit from the increased heritability of centile scores compared with raw volumetric data (Supplementary Information 13). Perhaps most importantly, GAMLSS modelling enabled harmonization across technically diverse studies (Supplementary Information 5), and thus unlocked the potential value of combining primary MRI studies at scale to generate normative, sex-stratified brain growth charts, and individual centile scores of typicality and atypicality.
The analogy to paediatric growth charts is not meant to imply that brain charts are immediately suitable for benchmarking or quantitative diagnosis of individual patients in clinical practice. Even for traditional anthropometric growth charts (height, weight and BMI), there are still important caveats and nuances concerning their diagnostic interpretation in individual children 66 ; similarly, it is expected that considerable further research will be required to validate the clinical diagnostic utility of brain charts. However, the current results bode well for future progress towards digital diagnosis of atypical brain structure and development 67 . By providing an age- and sex-normalized metric, centile scores enable trans-diagnostic comparisons between disorders that emerge at different stages of the lifespan (Supplementary Information 10, 11). The generally high stability of centile scores across longitudinal measurements also enabled assessment of brain changes related to diagnostic transition from mild cognitive impairment to Alzheimer's disease (Supplementary Information 14), which provides one example of how centile scoring could be clinically useful in quantitatively predicting or diagnosing progressive neurodegenerative disorders in the future. Our provision of appropriate normative growth charts and online tools also creates an immediate opportunity to quantify atypical brain structure in clinical research samples, to leverage available legacy neuroimaging datasets, and to enhance ongoing studies.
Several important caveats are worth highlighting. Even this large MRI dataset was biased towards European and North American populations and European ancestry groups within those populations. This bias is unfortunately common in many clinical and scientific references, including anthropometric growth charts and benchmark genetic datasets, representing an inequity that must be addressed by the global scientific community 68 . In the particular case of brain charts, further increasing ethnic, socioeconomic and demographic diversity in MRI research will enable more population-representative normative trajectories 69,70 that can be expected to improve the accuracy and strengthen the interpretation of centile scores in relation to appropriate norms 26 . The available reference data were also not equally distributed across all ages; for example, foetal, neonatal and mid-adulthood (30-40 years of age) epochs were under-represented (Supplementary Information 17-19). Furthermore, although our statistical modelling approach was designed to mitigate study- or site-specific effects on centile scores, it cannot entirely correct for limitations of primary study design, such as ascertainment bias or variability in diagnostic criteria. Our decision to stratify the lifespan models by sex followed the analogous logic of sex-stratified anthropometric growth charts. Males have larger brain-tissue volumes than females in absolute terms (Supplementary Information 16), but this is not indicative of any difference in clinical or cognitive outcomes. Future work would benefit from more detailed and dimensional self-report variables relating to sex and gender 71 . The use of brain charts also does not circumvent the fundamental requirement for quality control of MRI data.
We have shown that GAMLSS modelling of global structural MRI phenotypes is in fact remarkably robust to inclusion of poor-quality scans (Supplementary Information 2), but it should not be assumed that this level of robustness will apply to future brain charts of regional MRI or functional MRI phenotypes; therefore, the importance of quality control remains paramount.
We have focused primarily on global brain phenotypes, which were measurable in the largest achievable sample, aggregated over the widest age range, with the fewest methodological, theoretical and data-sharing constraints. However, we have also provided proof-of-concept brain charts for regional grey matter volumetrics, demonstrating plausible heterochronicity of cortical patterning, and illustrating the potential generalizability of this approach to a diverse range of fine-grained MRI phenotypes (Fig. 2, Supplementary Information 8). As ongoing and future efforts provide increasing amounts of high-quality MRI data, we predict an iterative process of improved brain charts for an increasing number of multimodal 72 neuroimaging phenotypes. Such diversification will require the development, implementation and standardization of additional data quality control procedures 27 to underpin robust brain chart modelling. To facilitate further research using our reference charts, we have provided interactive tools to explore these statistical models and to derive normalized centile scores for new datasets across the lifespan at www.brainchart.io.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-022-04554-y.