# Sample composition alters associations between age and brain structure

## Abstract

Despite calls to incorporate population science into neuroimaging research, most studies recruit small, non-representative samples. Here, we examine whether sample composition influences age-related variation in global measurements of gray matter volume, thickness, and surface area. We apply sample weights to structural brain imaging data from a community-based sample of children aged 3–18 (N = 1162) to create a “weighted sample” that approximates the distribution of socioeconomic status, race/ethnicity, and sex in the U.S. Census. We compare associations between age and brain structure in this weighted sample to estimates from the original sample with no sample weights applied (i.e., unweighted). Compared to the unweighted sample, we observe earlier maturation of cortical and sub-cortical structures, and patterns of brain maturation that better reflect known developmental trajectories in the weighted sample. Our empirical demonstration of bias introduced by non-representative sampling in this neuroimaging cohort suggests that sample composition may influence understanding of fundamental neural processes.

## Introduction

Most neuroimaging studies rely on relatively small samples that are not representative of a well-defined target population. This has resulted in multiple calls to incorporate population science approaches into neuroimaging research1, 2. To date, however, the impact of convenience sampling on neuroimaging findings has not been examined empirically. In the current study, we address this need by examining whether sample composition influences age-related variation in brain structure among children in the United States.

All participants in research studies are drawn from target populations, even if study investigators do not explicitly define or enumerate that population. Even when a target population is defined (e.g., adults between the ages of 25 and 40 in the United States), study participants are unlikely to represent that target population unless they are randomly selected. Decades of methodological work in epidemiology and population science have shed light on the conditions that limit generalizability of findings generated from such non-representative samples3,4,5. This work suggests that sample composition may influence a study’s conclusions when the association between the independent and dependent variable (e.g., age and brain structure) differs between those selected into the study and those who are eligible from the target population but not included6, 7. Such a scenario is likely to occur when study participants do not represent the target population in characteristics known to influence neural structure or function, for example, socioeconomic status (SES)8. Participants recruited into neuroimaging studies are not typically selected to be representative of a known target population, under the assumption, implicit or explicit, that basic neural functions (e.g., visual processing) in healthy individuals are not influenced by sample characteristics. Study findings are often assumed to reflect universal aspects of brain structure and function regardless of the sampling strategy. However, this assumption is largely untested and likely false.

There are exceptional examples of neuroimaging studies that have attempted to select representative samples9, 10; however, logistical challenges and study design decisions reduce the generalizability of findings from these studies to the broader U.S. population. In the foundational NIH MRI Study of Normal Brain Development, investigators selected a sample representative of the population in the study areas9; however, this study included numerous exclusion criteria (e.g., the presence of clinically significant mental health symptoms) that reduced the true representativeness of the sample2. The more recent NKI Rockland study was also designed to minimize sampling bias and maximize generalizability and included a representative sample of children and adults from Rockland County, NY10. Although this study represents a considerable advance toward representative sampling in cognitive neuroscience, participants were from a single geographic location and had higher levels of SES than in the U.S. population overall, indicating that this sample does not fully represent the U.S. While sample composition has become a growing area of focus in neuroimaging research1, 2, to date there are no neuroimaging studies based on a representative sample of the U.S. population.

Here, we test the hypothesis that the use of non-representative samples in neuroimaging studies may influence interpretation of the association between age and brain structure. Age-related variation in brain structure in childhood and adolescence has been examined frequently in cognitive neuroscience. Prior studies have demonstrated substantial heterogeneity in the pattern of developmental change across brain structures and in the age at which peak thickness and surface area are reached for different cortical regions11,12,13,14,15. In the current study, we use a large neuroimaging data set of typically developing children, the Pediatric Imaging, Neurocognition and Genetics (PING) study16, to examine whether sample composition influences age–brain structure associations. We use 2010 U.S. Census data to estimate the national distributions of basic socio-demographic characteristics (i.e., race/ethnicity, age, sex, parental educational attainment, and income) for children in the age range of the study sample (3–18 years). We apply sample weights based on these distributions to the PING sample using a procedure common in epidemiology and survey methodology, called raking, to create a weighted PING sample that approximates a representative sample of the U.S. To determine the impact of sample composition on age-related variation in brain structure, we compare associations of age with global and regional measures of gray matter structure in the original, unweighted PING sample (i.e., non-representative) to those from the weighted PING sample (i.e., more representative). We focus our analysis on global morphometric cortical gray matter measurements as well as measurements of each lobe of the brain. Specifically, we examine cortical volume, cortical surface area, and cortical thickness for the entire cortex and for the right and left hemispheres; we additionally examine cortical surface area and thickness of frontal, parietal, temporal, and occipital lobes.
These are robust metrics of brain structure that are measured with high reliability relative to specific cortical regions17. We also examine the volume of three widely studied subcortical structures—amygdala, hippocampus, and basal ganglia—as well as total subcortical volume to determine whether sample composition has a greater influence on cortical vs. subcortical regions and on global vs. specific measures.

Our results suggest that sample composition alters the interpretation of how cortical and subcortical areas vary with age. In the weighted sample, we frequently observe cubic (S-shaped) developmental patterns for cortical surface area and subcortical volume and younger ages of peak surface area and volume compared to the unweighted sample. In contrast, we primarily observe quadratic (inverted U-shaped) developmental trajectories and older ages at peak cortical surface area and subcortical volume in the unweighted sample. Our findings empirically demonstrate observable impacts of sample composition on cognitive neuroscience findings, even for questions about fundamental processes such as age-related change in neural structure.

## Results

### Image acquisition and processing

The MRI protocol and standardized image processing techniques used in the PING study were designed to extract high-quality multimodal imaging data in a multisite study of children40. For each participant, a single whole-brain, T1-weighted structural magnetic resonance image was acquired in the sagittal plane using interleaved slice acquisition. All images were acquired on a 3 T scanner at one of 10 study sites using Siemens, GE, or Philips scanners. Acquisition parameters were standardized across sites as follows: for Siemens, TE = 4.33 ms, TR = 2170 ms, flip angle = 7 degrees, 160 slices with 1 × 1 × 1.2 mm voxels, FoV = 256; for Philips, TE = 3.1 ms, TR = 1665.9 ms, flip angle = 8 degrees, 170 slices with 1 × 1 × 1.2 mm voxels, FoV = 256; for GE, TE = 4.0 ms, TR = 1500 ms, flip angle = 8 degrees, 170 slices with 1 × 1 × 1.2 mm voxels, FoV = 256. To reduce motion, prospective motion correction (PROMO) was applied during acquisition43. Because different scanners are likely to have different field inhomogeneities, resulting in differential sources of image distortion, a gradient field nonlinearity correction was applied prior to analysis40.

Cortical thickness and surface area estimates were calculated with the FreeSurfer image analysis suite, which is documented and freely available for download online (http://surfer.nmr.mgh.harvard.edu). FreeSurfer morphometric procedures are well established60,61,62, have demonstrated good test-retest reliability across scanner manufacturers and field strengths63, have been validated against manual measurement64, 65 and histological analysis66, and have been successfully used in studies of children as young as age 467.

FreeSurfer methods applied to the processing of PING structural data included removal of non-brain tissue using a hybrid watershed/surface deformation procedure68, automated Talairach transformation, previously validated in pediatric populations69, and segmentation of the subcortical white matter and deep gray matter volumetric structures, separately validated for use with pediatric populations67, 70. FreeSurfer provided thickness and surface area estimates for 68 cortical regions (34 for each hemisphere), according to the Desikan-Killiany atlas60, 71. Labels for cortical gray matter were assigned using surface-based nonlinear registration to a gyral and sulcal-based atlas62 and Bayesian classification rules61, 71. For subcortical structures, an automated, atlas-based, volumetric segmentation procedure was used to calculate volumes in mm3 for each structure, also executed in FreeSurfer40.

Prior to inclusion in the final data set, neuroimaging data were required to pass rigorous quality-control procedures. All images were reviewed by trained technicians for significant motion artifacts, operator error, and scanner dysfunction within 24 h of the scan to allow for the re-scanning of participants when possible40. T1-weighted images were examined slice by slice for any evidence of excessive motion and rated either as acceptable or as requiring an attempted rescan40. The subcortical segmentations, cortical parcellations, and white and pial surface reconstructions from the processed images were also reviewed by trained staff40.

The publicly available PING data set provides preprocessed, labeled, and quality-controlled structural data for cortical surface area and thickness, and subcortical volumes based on the high-resolution T1-weighted scan. We chose to examine global and lobe-specific measures of cortical structure as they show high test-retest reliability and are more precisely estimated than smaller, individual structures17. Cortical gray matter measurements included total cortical volume, left/right hemispheric cortical volume, total subcortical volume, overall mean cortical thickness, left/right hemispheric mean cortical thickness, total cortical surface area, and left/right total cortical surface area. We also generated measurements for surface area and thickness for each lobe of the brain (frontal, occipital, temporal, and parietal) by combining regions identified in the Desikan-Killiany atlas (see Supplementary Table 1 for a complete list of regions)60, 71. We examined three subcortical structures: amygdala, hippocampus, and basal ganglia.

### Creating sample weights

When a recruited sample does not adequately and proportionally cover segments of a target population, sample weights can be used so that the marginal totals of the weighted sample align with the target population on predefined characteristics (e.g., age, sex, race/ethnicity, SES). A classic way to create this alignment is through raking18, 19. In raking, individuals in the sample are iteratively multiplied by the inverse of the marginal distribution of each variable included in the weight, and each participant is ultimately assigned a weight that brings the sample’s marginal distributions into line with the population distributions for the set of raked variables. For illustrative purposes, consider two of the four variables we used for raking: sex and race. The raking procedure first multiplies each individual by the inverse probability of being their sex, based on the overall population distribution of sex; the resulting estimates then match the population distribution of sex, but not race. Next, each individual is multiplied by the inverse probability of being their race, given the overall population distribution of race. The resulting estimates now match the population distribution of race, but the sex estimates may no longer match the population distribution. We then again multiply each individual by the inverse probability of their sex, and iterate through this sequence until convergence, at which point all of the weighted estimates match the population distributions within a caliper of error18, 47. The generalized raking procedure we followed was similar but used four variables: sex, race/ethnicity, parental education, and income, such that at the end of the procedure the distributions of these demographic characteristics in the weighted sample were comparable to their distributions in the 2010 U.S. Census.
To improve the stability of estimates and ensure that results are not sensitive to a few individuals with extreme weights, it is traditional in raking procedures to “trim” the weights so that no extreme observation has undue influence72. We applied such trimming to our sample, using an initial weight to estimate the interquartile range (IQR) of the weights and adjusting the weights so that no observation fell outside 3 IQRs of the initial weight.
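The iterative scheme described above is known in the survey literature as iterative proportional fitting. As a rough sketch, not the authors’ SUDAAN implementation, the two-variable sex-and-race illustration can be written in Python with a 3-IQR trimming step at the end; the function names, toy data, and convergence tolerance below are our own assumptions:

```python
import numpy as np

def rake(weights, categories, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting (raking) over several margins.

    weights    : initial weight per respondent (e.g., all ones)
    categories : dict of variable name -> per-respondent category codes
    targets    : dict of variable name -> {category: population proportion}
    """
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(max_iter):
        max_shift = 0.0
        for var, codes in categories.items():
            total = w.sum()
            factors = np.ones_like(w)
            for level, share in targets[var].items():
                mask = codes == level
                # inverse-probability style adjustment for this margin
                factors[mask] = share * total / w[mask].sum()
            w *= factors
            max_shift = max(max_shift, float(np.abs(factors - 1).max()))
        if max_shift < tol:  # every margin within the error caliper
            break
    return w

def trim_weights(w, k=3.0):
    """Clip weights to within k IQRs of the weight distribution."""
    q1, q3 = np.percentile(w, [25, 75])
    iqr = q3 - q1
    return np.clip(w, q1 - k * iqr, q3 + k * iqr)

# Toy sample that over-represents females and race group "A"
sex = np.array(["F", "F", "F", "M", "M", "F"])
race = np.array(["A", "A", "B", "B", "A", "A"])
w = trim_weights(rake(
    np.ones(6),
    {"sex": sex, "race": race},
    {"sex": {"F": 0.5, "M": 0.5}, "race": {"A": 0.5, "B": 0.5}},
))
```

After raking, the weighted share of every sex and race category matches the 50/50 targets simultaneously, even though the unweighted toy sample was two-thirds female and two-thirds group “A”.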

We estimated population totals from the American Community Survey (ACS) Public Use Microdata Sample from 2009–2011. We then applied a raking procedure to the data using the “WTADJUST” procedure in SUDAAN, which employs a model-based approach and can be interpreted as a generalized raking procedure. The equations used to estimate the post-stratification weights are provided in the SUDAAN language manual73; we summarize the main equation here, and readers interested in the full details of the generalized raking procedure are encouraged to consult the manual for worked examples. We used the following equation for our post-stratification weight73:

$$\theta_k = \gamma_k \alpha_k = \gamma_k \left( \frac{l_k\left(u_k - c_k\right) + u_k\left(c_k - l_k\right)\exp\left(A_k x_k^{\prime}\beta\right)}{\left(u_k - c_k\right) + \left(c_k - l_k\right)\exp\left(A_k x_k^{\prime}\beta\right)} \right)$$

In this equation, as applied to our analysis, k indexes each respondent in the PING data for whom a final weight (θ_k) was estimated. This final weight is a function of γ_k, the weight-trimming factor used to stabilize the variance of the weighted estimates, and α_k, the post-stratification adjustment. The post-stratification adjustment (α_k) is described by a vector of the socio-demographic variables we included (x′_k, which in our model comprises sex, race/ethnicity, parental education, and income) and the model parameters (β) based on a logistic function. The remaining factors that determine the final weight are A_k, l_k, u_k, and c_k. These are all adjustments to improve weight stability, and include a lower bound (l_k), an upper bound (u_k), and a centering constant (c_k) for the weight of any individual in the data, which is required to fall between the lower and upper bounds; A_k is an additional constant that adjusts the final weight for stability. In summary, generalized raking procedures produce stable weight estimates based on a set of user-defined parameters that control the performance of the weight, as well as user-supplied variables that allow for the adjustment of each individual respondent so that the weighted sample as a whole is representative of the selected characteristics in the user-defined target population. We provide all of our statistical code, including our user-defined parameters and the assumptions we made regarding weight-trimming factors, as an online supplement (see Supplementary Data 1).
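The bounding behaviour of the equation above can be checked numerically: the adjustment is a logistic curve that runs from the lower bound l_k to the upper bound u_k and passes through the centering constant c_k when exp(A_k x′_k β) = 1. The helper below is our own illustration; the constants are arbitrary, not values from the PING weighting model:

```python
import math

def bounded_adjustment(xb, l, u, c, A=1.0):
    """Post-stratification adjustment alpha_k from the equation above:
    a logistic function of the linear predictor x'beta, constrained to
    stay between the lower bound l and the upper bound u."""
    e = math.exp(A * xb)
    return (l * (u - c) + u * (c - l) * e) / ((u - c) + (c - l) * e)

# With l=0.2, u=5, c=1: a zero linear predictor leaves the adjustment at
# the centering constant, while extreme predictors approach the bounds.
print(bounded_adjustment(0.0, 0.2, 5.0, 1.0))   # 1.0 (centering constant)
print(bounded_adjustment(50.0, 0.2, 5.0, 1.0))  # ~5.0 (upper bound)
print(bounded_adjustment(-50.0, 0.2, 5.0, 1.0)) # ~0.2 (lower bound)
```

This is what keeps any single respondent’s weight from becoming extreme: no matter how unusual the respondent’s covariate profile, the adjustment saturates at the user-chosen bounds.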

### Regression models

We next estimated separate models of the association of age with global and regional measures of gray matter structure to determine whether a linear, quadratic, or cubic term for age provided the best fit to the data. The best-fitting model for each measure was determined by comparing the AIC21. The more complex model (i.e., with quadratic or cubic terms) was selected when its AIC was at least 2.5 points lower than the AIC for the less complex model25. AIC is commonly used for model selection (i.e., selecting covariates that provide the best fit to the data and selecting the best functional form of a model)22,23,24. Model fit statistics determine how well a particular model aligns with the underlying data while taking into account the number of parameters in that model (rather than examining the statistical significance of each parameter individually). Model fit has long been accepted as the gold-standard approach for model selection across a wide range of scientific disciplines, including the behavioral sciences and epidemiology24, 25; this approach is particularly well suited for deciding among models with polynomial terms24.
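The selection rule can be made concrete with a short sketch. This is a simplified stand-in for the authors’ covariate-adjusted regressions: the Gaussian AIC expression, the toy data, and the way the 2.5-point threshold is applied are our own assumptions:

```python
import numpy as np

def aic_gaussian(y, yhat, n_params):
    """AIC for a least-squares fit assuming Gaussian errors (additive
    constants dropped); +1 counts the estimated error variance."""
    n = len(y)
    rss = float(np.sum((y - yhat) ** 2))
    return n * np.log(rss / n) + 2 * (n_params + 1)

def best_age_term(age, y, threshold=2.5):
    """Compare linear, quadratic, and cubic age terms; keep the more
    complex model only if its AIC is at least `threshold` points lower."""
    chosen_deg, chosen_aic = None, None
    for deg in (1, 2, 3):
        coefs = np.polyfit(age, y, deg)
        aic = aic_gaussian(y, np.polyval(coefs, age), deg + 1)
        if chosen_aic is None or aic < chosen_aic - threshold:
            chosen_deg, chosen_aic = deg, aic
    return chosen_deg

# A synthetic "S-shaped" outcome is correctly assigned a cubic age term.
age = np.linspace(3, 18, 200)
volume = 0.1 * age**3 - 3.0 * age**2 + 25.0 * age + 100.0
print(best_age_term(age, volume))  # 3
```

The threshold builds in a preference for parsimony: a quadratic or cubic term must buy a clear improvement in fit, not just absorb noise, before it is retained.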

All models included covariates for sex, race/ethnicity, parent educational attainment, family income, and scanner. Models for subcortical volume measurements also included intracranial volume (ICV). For both the unweighted and weighted samples, we used this same model building strategy to arrive at the best-fitting model to describe age-related variation in brain structure, so differences between the models can be attributed to the application of the sample weighting technique and underlying differences in the distribution of demographic characteristics in the unweighted and weighted samples.

To determine the extent to which differences in model parameterization led to meaningful differences in the interpretation of age-related variation between analytic approaches, we generated predicted values for each brain measure (area, thickness, and volume) at each age using the best-fitting models from the unweighted and weighted data and graphed these results. We also calculated the difference in age at peak surface area and volume between the unweighted and weighted data where applicable (i.e., in quadratic and cubic models) by calculating the first-order derivative of the fitted curves. For quadratic models, we used the following formula to estimate peak age:

$${\rm MeanAge} + \frac{-a_1\beta}{2 \times a_2\beta}$$

where MeanAge is the estimated sample mean, a1β is the beta estimate for the linear age term in the regression model, and a2β is the beta estimate for the age-squared term from the regression model. For cubic models, we used the following formula to estimate peak age:

$${\rm MeanAge} + \frac{-\left(2 \times a_2\beta\right) \pm \sqrt{\left(2 \times a_2\beta\right)^2 - 4 \times \left(3 \times a_3\beta\right) \times a_1\beta}}{2 \times \left(3 \times a_3\beta\right)}$$

where MeanAge is the estimated sample mean, a1β is the beta estimate for the linear age term in the regression model, a2β is the beta estimate for the age-squared term from the regression model, and a3β is the beta estimate for the age-cubed term from the regression model.
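As a numeric check of these two expressions (our own helpers with made-up coefficients, not the fitted PING estimates), the quadratic formula recovers the vertex of a parabola and the cubic formula returns the two critical points of the fitted curve’s first derivative:

```python
import math

def peak_age_quadratic(mean_age, a1, a2):
    """Peak of y = a1*x + a2*x^2 (x = age centered at the sample mean),
    i.e., where the first derivative a1 + 2*a2*x equals zero."""
    return mean_age + (-a1) / (2 * a2)

def peak_age_cubic(mean_age, a1, a2, a3):
    """Critical ages of y = a1*x + a2*x^2 + a3*x^3: roots of the first
    derivative 3*a3*x^2 + 2*a2*x + a1 via the quadratic formula."""
    disc = (2 * a2) ** 2 - 4 * (3 * a3) * a1
    root = math.sqrt(disc)
    return (mean_age + (-(2 * a2) + root) / (2 * (3 * a3)),
            mean_age + (-(2 * a2) - root) / (2 * (3 * a3)))

# A curve peaking at age 12 with a sample mean age of 10:
# y = -(age - 12)^2 has centered coefficients a1 = 4, a2 = -1.
print(peak_age_quadratic(10.0, 4.0, -1.0))  # 12.0
# Derivative 3x^2 - 12x + 9 = 3(x - 1)(x - 3): critical ages 3 and 1.
print(peak_age_cubic(0.0, 9.0, -6.0, 1.0))  # (3.0, 1.0)
```

For a cubic fit, only one of the two critical points is a maximum (the one where the second derivative is negative), which is why the ± appears in the formula above.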

The predicted value graphs are intended to help readers visualize differences between the best-fitting models from the unweighted and weighted data, as even models with quadratic or cubic terms can describe patterns of variation that are effectively linear. However, we are unable to compare aspects of these graphs (e.g., differences in slopes) with statistical tests because they are derived from different samples. For the same reason, calculations of age at peak surface area cannot be statistically compared between unweighted and weighted data and are included to provide a more tangible demonstration of how age-related trajectories in brain development may differ as a result of sample composition. For subcortical volume, final models also included ICV, and thus peak age was based on predicted values averaging the estimated volume within each 2-year age interval. To examine whether differences between unweighted and weighted models may be due to differences in head size, we examined ICV as an outcome. For ICV, the best-fitting models in the unweighted and weighted data were quadratic and indicated similar rates of change with age (see Supplementary Table 6 and Supplementary Fig. 1).

### Data availability

The PING Data Resource includes neurodevelopmental histories, information about developing mental and emotional functions, multimodal brain imaging data, and genotypes for over 1000 children and adolescents. The data are available to members of the research community after submission of data use requests, agreement to the data use policies, and registration. More information about the PING Data Resource is available at http://pingstudy.ucsd.edu/ and http://ping.chd.ucsd.edu/. Our statistical code is available in Supplementary Data 1 and it is also available on GitHub at the following link: https://github.com/kajalewinn/PING.git.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

1. 1.

Falk, E. B. et al. What is a representative brain? Neuroscience meets population science. Proc. Natl Acad. Sci. USA 110, 17615–17622 (2013).

2. 2.

Paus, T. Population neuroscience: why and how. Hum. Brain Mapp. 31, 891–903 (2010).

3. 3.

Shadish, W. R., Cook, T. D. & Campbell, D. T. Experimental and Quasi-Experimental Designs for Generalized Causal Inference (Houghton Mifflin Company, 2002).

4. 4.

Westreich, D. et al. Causal impact: epidemiological approaches for a public health of consequence. Am. J. Public Health 106, 1011–1012 (2016).

5. 5.

Hernán, M. A. & VanderWeele, T. J. Compound treatments and transportability of causal inference. Epidemiology 22, 368 (2011).

6. 6.

Hernán, M. A., Hernández-Díaz, S. & Robins, J. M. A structural approach to selection bias. Epidemiology 15, 615 (2004).

7. 7.

Stuart, E. A., Cole, S. R., Bradshaw, C. P. & Leaf, P. J. The use of propensity scores to assess the generalizability of results from randomized trials. J. R. Stat. Soc. 174, 369–386 (2011).

8. 8.

Noble, K. G. et al. Family income, parental education and brain structure in children and adolescents. Nat. Neurosci. 18, 773–778 (2015).

9. 9.

Evans, A. C., Brain Development Cooperative Group. The NIH MRI study of normal brain development. Neuroimage 30, 184–202 (2006).

10. 10.

Nooner, K. B. et al. The NKI-Rockland sample: a model for accelerating the pace of discovery science in psychiatry. Front. Neurosci. 6, 152 (2012).

11. 11.

Shaw, P. et al. Neurodevelopmental trajectories of the human cerebral cortex. J. Neurosci. 28, 3586–3594 (2008).

12. 12.

Giedd, J. N. & Rapoport, J. L. Structural MRI of pediatric brain development: what have we learned and where are we going? Neuron 67, 728–734 (2010).

13. 13.

Ostby, Y. et al. Heterogeneity in subcortical brain development: a structural magnetic resonance imaging study of brain maturation from 8 to 30 years. J. Neurosci. 29, 11772–11782 (2009).

14. 14.

Mills, K. L. et al. Structural brain development between childhood and adulthood: convergence across four longitudinal samples. Neuroimage 141, 273–281 (2016).

15. 15.

Tamnes, C. K. et al. Brain maturation in adolescence and young adulthood: regional age-related changes in cortical thickness and white matter volume and microstructure. Cereb. Cortex 20, 534–548 (2010).

16. 16.

Fjell, A. M. et al. Multimodal imaging of the self-regulating developing brain. Proc. Natl Acad. Sci. USA 109, 19620–19625 (2012).

17. 17.

Ge, T. et al. Massively expedited genome-wide heritability analysis (MEGHA). Proc. Natl Acad. Sci. USA 112, 2479–2484 (2015).

18. 18.

Kalton, G. & Flores-Cervantes, I. Weighting methods. J. Off. Stat. 19, 81–97 (2003).

19. 19.

Kishl, L. Methods for design effects. J. Off. Stat. 11, 55–77 (1995).

20. 20.

Lancaster, H. O. & Seneta, E. Chi-square distribution. Encycl. Biostat. doi:10.1002/0470011815.b2a15018 (2005).

21. 21.

Sawa, T. Information criteria for discriminating among alternative regression models. Econometrica 46, 1273 (1978).

22. 22.

Bozdogan, H. Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52, 345–370 (1987).

23. 23.

Buckland, S. T., Burnham, K. P. & Augustin, N. H. Model selection: an integral part of inference. Biometrics 53, 603–618 (1997).

24. 24.

Sclove, S. L. Application of model-selection criteria to some problems in multivariate analysis. Psychometrika 52, 333–343 (1987).

25. 25.

Burnham, K. P. & Anderson, D. R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (Springer, New York, 2003).

26. 26.

Harrell, F. Regression Modeling Strategies with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis (Springer-Verlag, New York, 2015).

27. 27.

Wierenga, L. M., Langen, M., Oranje, B. & Durston, S. Unique developmental trajectories of cortical thickness and surface area. Neuroimage 87, 120–126 (2014).

28. 28.

Thompson, R. A. & Nelson, C. A. Developmental science and the media: early brain development. Am. Psychol. 56, 5–15 (2001).

29. 29.

Luciana, M., Conklin, H. M., Hooper, C. J. & Yarger, R. S. The development of nonverbal working memory and executive control processes in adolescents. Child Dev. 76, 697–712 (2005).

30. 30.

Welsh, M. C., Pennington, B. F. & Groisser, D. B. A normative-developmental study of executive function: a window on prefrontal function in children. Dev. Neuropsychol. 7, 131–149 (1991).

31. 31.

Johnson, J. S. & Newport, E. L. Critical period effects in second language learning: the influence of maturational state on the acquisition of English as a second language. Cogn. Psychol. 21, 60–99 (1989).

32. 32.

Gogtay, N. & Giedd, J. N. Dynamic mapping of human cortical development during childhood through early adulthood. Proc. Natl Acad. Sci. USA 101, 1–6 (2004).

33. 33.

Johansen, J. P., Cain, C. K., Ostroff, L. E. & LeDoux, J. E. Molecular mechanisms of fear learning and memory. Cell 147, 509–524 (2011).

34. 34.

Packard, M. G. & Knowlton, B. J. Learning and memory functions of the basal ganglia. Annu. Rev. Neurosci. 25, 563–593 (2002).

35. 35.

O’Doherty, J. P. Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr. Opin. Neurobiol. 14, 769–776 (2004).

36. 36.

Hanson, J. L., Chandra, A., Wolfe, B. L. & Pollak, S. D. Association between income and the hippocampus. PLoS ONE 6, e18712 (2011).

37. 37.

Noble, K. G., Houston, S. M., Kan, E. & Sowell, E. R. Neural correlates of socioeconomic status in the developing human brain. Dev. Sci. 15, 516–527 (2012).

38. 38.

Belsky, J., Schlomer, G. L. & Ellis, B. J. Beyond cumulative risk: distinguishing harshness and unpredictability as determinants of parenting and early life history strategy. Dev. Psychol. 48, 662–673 (2012).

39. 39.

Ellis, B. J., Figueredo, A. J., Brumbach, B. H. & Schlomer, G. L. Fundamental dimensions of environmental risk: the impact of harsh versus unpredictable environments on the evolution and development of life history strategies. Hum. Nat. 20, 204–268 (2009).

40. 40.

Jernigan, T. L. et al. The Pediatric Imaging, Neurocognition, and Genetics (PING) data repository. Neuroimage 124, 1149–1154 (2016).

41. 41.

de Kieviet, J. F., Zoetebier, L., van Elburg, R. M., Vermeulen, R. J. & Oosterlaan, J. Brain development of very preterm and very low-birthweight children in childhood and adolescence: a meta-analysis. Dev. Med. Child Neurol. 54, 313–323 (2012).

42. 42.

McLaughlin, K. A., Peverill, M., Gold, A. L. & Alves, S. Child maltreatment and neural systems underlying emotion regulation. J. Am. Acad. Child Adolesc. Psychiatry 9, 753–762 (2015).

43. 43.

White, N. et al. PROMO: real-time prospective motion correction in MRI using image-based tracking. Magn. Reson. Med. 63, 91–105 (2010).

44. 44.

Kuperman, J. M. et al. Prospective motion correction improves diagnostic utility of pediatric MRI scans. Pediatr. Radiol. 41, 1578–1582 (2011).

45. 45.

Ducharme, S. et al. Trajectories of cortical thickness maturation in normal brain development—the importance of quality control procedures. Neuroimage 125, 267–279 (2016).

46. 46.

Reuter, M. et al. Head motion during MRI acquisition reduces gray matter volume and thickness estimates. Neuroimage 107, 107–115 (2015).

47. 47.

Valliant, R., Dever, J. A. & Kreuter, F. Practical Tools for Designing and Weighting Survey Samples (Springer, New York, 2013).

48. 48.

Schumann, G. et al. The IMAGEN study: reinforcement-related behaviour in normal brain function and psychopathology. Mol. Psychiatry 15, 1128–1139 (2010).

49. Vulser, H. et al. Subthreshold depression and regional brain volumes in young community adolescents. J. Am. Acad. Child Adolesc. Psychiatry 54, 832–840 (2015).

50. Adolescent Brain Cognitive Development. School Selection Procedures. https://abcdstudy.org/school-selection.html (2017).

51. Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376 (2013).

52. Boekel, W. et al. A purely confirmatory replication study of structural brain-behavior correlations. Cortex 66, 115–133 (2015).

53. Eklund, A., Nichols, T. E. & Knutsson, H. Cluster failure: why fMRI inferences for spatial extent have inflated false-positive rates. Proc. Natl Acad. Sci. USA 113, 7900–7905 (2016).

54. Evans, G. W. & Kim, P. Multiple risk exposure as a potential explanatory mechanism for the socioeconomic status-health gradient. Ann. N. Y. Acad. Sci. 1186, 174–189 (2010).

55. Mohai, P., Lantz, P. M., Morenoff, J., House, J. S. & Mero, R. P. Racial and socioeconomic disparities in residential proximity to polluting industrial facilities: evidence from the Americans’ changing lives study. Am. J. Public Health 99, S649–S656 (2009).

56. Bruine de Bruin, W., Parker, A. M. & Fischhoff, B. Individual differences in adult decision-making competence. J. Pers. Soc. Psychol. 92, 938–956 (2007).

57. Kraus, M. W., Cote, S. & Keltner, D. Social class, contextualism, and empathic accuracy. Psychol. Sci. 21, 1716–1723 (2010).

58. United States Census Bureau. Statistical Abstract of the United States (Claitors Publishing Division, Suitland, MD, 2006).

59. Lynch, J., Kaplan, G. & Shema, S. Cumulative impact of sustained economic hardship on physical, cognitive, psychological, and social functioning. Year Book Psychiatry Appl. Ment. Health 1999, 219–220 (1999).

60. Fischl, B. & Dale, A. M. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proc. Natl Acad. Sci. USA 97, 11050–11055 (2000).

61. Fischl, B. et al. Automatically parcellating the human cerebral cortex. Cereb. Cortex 14, 11–22 (2004).

62. Fischl, B., Sereno, M. I. & Dale, A. M. Cortical surface-based analysis. II: inflation, flattening, and a surface-based coordinate system. Neuroimage 9, 195–207 (1999).

63. Han, X. et al. Reliability of MRI-derived measurements of human cerebral cortical thickness: the effects of field strength, scanner upgrade and manufacturer. Neuroimage 32, 180–194 (2006).

64. Kuperberg, G. R. et al. Regionally localized thinning of the cerebral cortex in schizophrenia. Arch. Gen. Psychiatry 60, 878–888 (2003).

65. Salat, D. H. et al. Thinning of the cerebral cortex in aging. Cereb. Cortex 14, 721–730 (2004).

66. Rosas, H. D. et al. Regional and progressive thinning of the cortical ribbon in Huntington’s disease. Neurology 58, 695–701 (2002).

67. Ghosh, S. S., Kakunoori, S. & Augustinack, J. Evaluating the validity of volume-based and surface-based brain image registration for developmental cognitive neuroscience studies in children 4 to 11 years of age. Neuroimage 15, 85–93 (2010).

68. Ségonne, F. et al. A hybrid approach to the skull stripping problem in MRI. Neuroimage 22, 1060–1075 (2004).

69. Burgund, E. D. et al. The feasibility of a common stereotactic space for children and adults in fMRI studies of development. Neuroimage 17, 184–200 (2002).

70. Kharitonova, M., Martin, R. E., Gabrieli, J. D. E. & Sheridan, M. A. Cortical gray-matter thinning is associated with age-related improvements on executive function tasks. Dev. Cogn. Neurosci. 6, 61–71 (2013).

71. Desikan, R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31, 968–980 (2006).

72. Izrael, D., Battaglia, M. P. & Frankel, M. R. Extreme survey weight adjustment as a component of sample balancing (aka raking). SAS Global Forum 247 (2009).

73. Barnwell, B., Bieler, G. & Witt, M. SUDAAN Language Manual, Release 9.0 (Research Triangle Institute, Research Triangle Park, NC, 2004).

## Acknowledgements

Funding for this project was provided by the National Institutes of Mental Health (K01MH097978 to K.Z.L.; R01-MH103291 and R01-106482 to K.A.M.), the National Institute on Drug Abuse (R03DA037405 to M.A.S.), the National Institute on Alcohol Abuse and Alcoholism (K01AA021511 to K.M.K.), an Early Career Research Fellowship from the Jacobs Foundation to K.A.M., and a Rising Star Research Award grant from AIM for Mental Health, a program of One Mind Institute (IMHRO), to K.A.M. Data collection and sharing for this project were funded by the Pediatric Imaging, Neurocognition and Genetics Study (PING) (National Institutes of Health Grant RC2DA029475). PING is funded by the National Institute on Drug Abuse and the Eunice Kennedy Shriver National Institute of Child Health & Human Development. PING data are disseminated by the PING Coordinating Center at the Center for Human Development, University of California, San Diego. We thank Dr. Randy Buckner for his helpful comments on an earlier version of this manuscript.

## Author information

### Affiliations

1. #### Department of Psychiatry, University of California, San Francisco, 401 Parnassus Ave., San Francisco, CA, 94143, USA

• Kaja Z. LeWinn
2. #### Clinical Psychology Department, University of North Carolina at Chapel Hill, 235 E. Cameron Avenue, Chapel Hill, NC, 27599, USA

• Margaret A. Sheridan
3. #### Department of Epidemiology, Columbia University, 722 West 168th Street #724, New York, NY, 10032, USA

• Katherine M. Keyes
•  & Ava Hamilton
4. #### Department of Psychology, University of Washington, Box 351525, Seattle, WA, 98195, USA

• Katie A. McLaughlin

### Contributions

K.Z.L. conceptualized the study. K.Z.L., K.A.M., and M.A.S. designed the study. K.M.K. developed the statistical methods and supervised A.H. who analyzed data and produced tables and figures. K.Z.L. and K.A.M. wrote and revised the manuscript; K.M.K. and M.A.S. wrote sections of the manuscript, contributed to interpretation of findings, and reviewed the manuscript.

### Competing interests

The authors declare no competing financial interests.

### Corresponding author

Correspondence to Kaja Z. LeWinn.

## Electronic supplementary material

### DOI

https://doi.org/10.1038/s41467-017-00908-7
