## Abstract

In the dawning era of large-scale biomedical data, multidimensional phenotype vectors will play an increasing role in examining the genetic underpinnings of brain features, behaviour and disease. For example, shape measurements derived from brain MRI scans are multidimensional geometric descriptions of brain structure and provide an alternate class of phenotypes that remains largely unexplored in genetic studies. Here we extend the concept of heritability to multidimensional traits, and present the first comprehensive analysis of the heritability of neuroanatomical shape measurements across an ensemble of brain structures based on genome-wide SNP and MRI data from 1,320 unrelated, young and healthy individuals. We replicate our findings in an extended twin sample from the Human Connectome Project (HCP). Our results demonstrate that neuroanatomical shape can be significantly heritable, above and beyond volume, and can serve as a complementary phenotype to study the genetic determinants and clinical relevance of brain structure.

## Introduction

The exponential progress in genomic technologies has accelerated the examination of the genetic underpinnings of complex phenotypes, such as psychiatric and neurological disorders, many of which are highly heritable^{1,2}. For example, large-scale genome-wide association studies (GWASs) have provided insights about common genetic variants linked with a range of clinical conditions^{3,4,5,6}. Most prior genetic studies have focused on univariate (scalar) phenotypes, such as diagnosis or a quantitative measurement. However, with the emergence of large-scale data collection efforts, such as the Human Connectome Project (HCP; http://www.humanconnectome.org) and the UK Biobank (http://www.ukbiobank.ac.uk), each subject can be linked to a high-dimensional phenotype vector, which might include imaging measurements or electronic health record. Such phenotypically rich studies open up the opportunity to analyse collections of multidimensional phenotypes, which can be more informative than scalar traits.

Brain imaging is playing an increasing role in the study of the relationship between genetic variants, neuroanatomy, behaviour and disease susceptibility^{7,8,9,10}. To date, most structural neuroimaging genetics studies have utilized the size, average cortical thickness or surface area of a brain region to yield important discoveries about the genetic basis of brain morphology (see, for example, refs 11, 12, 13, 14). While these measurements capture a few basic dimensions of anatomical variability, they provide a limited description of the underlying geometry.

Neuroanatomical shape measurements—multidimensional geometric descriptions of brain structure—have attracted increasing attention in medical image analysis. Shape measurements characterize isometry-invariant (in particular, independent of location and orientation) geometric attributes of an object, which provide a rich description of an anatomical structure and can encompass volumetric variation. Such measurements may thus offer increased sensitivity and specificity in examining the clinical relevance and genetic underpinnings of brain structure. Recent studies have shown that the shape of subcortical brain regions and cortical folding patterns provide information that is not available in volumetric measurements and is predictive of disease status, onset and progression in schizophrenia^{15,16,17}, autism^{18,19}, bipolar disorder^{20,21}, Alzheimer’s disease^{22,23,24,25} and other mental disorders^{26,27}. There is also increasing evidence that genetic variants may have influences on brain morphology that can be captured by shape measurements^{28,29,30,31,32}.

This paper makes two major contributions to the investigation of the genetic basis of neuroanatomical shape. First, we extend the theoretical concept of heritability to multidimensional traits, such as the shape descriptor of an object, and propose a novel method to estimate the heritability of multidimensional traits based on genome-wide single-nucleotide polymorphism (SNP) data from unrelated individuals (known as SNP heritability). Our estimation method builds on genome-wide complex trait analysis^{33,34} and phenotype correlation–genetic correlation regression^{35}, and generalizes these techniques to the multivariate setting. Second, using structural magnetic resonance imaging (MRI) and SNP data from 1,320 unrelated individuals collected as part of the Harvard/Massachusetts General Hospital (MGH) Brain Genomics Superstruct Project (GSP)^{36,37}, we present the first comprehensive heritability analysis of the shape of an ensemble of brain structures, quantified by the truncated Laplace–Beltrami spectrum (LBS; also known as the ‘Shape-DNA’)^{38,39,40}, in this young (18–35 years) and healthy cohort, and devise a strategy to visualize primary modes of shape variation. We also replicate our findings in an extended twin sample (72 monozygotic (MZ) twin pairs, 69 dizygotic (DZ) twin pairs, 253 full siblings of twins and 55 singletons) from the HCP^{41}.

The truncated LBS is a multidimensional shape descriptor, which can be obtained by solving an eigenvalue problem on the two-dimensional (2D) boundary surface representation of an object. It is invariant to the representation of the object including parameterization, location and orientation, and thus does not require spatial alignment with a population template, making it computationally efficient and robust to registration errors. LBS also depends continuously on topology-preserving deformations, and is thus suitable to quantify differences between shapes. Recent empirical evidence suggests that the LBS-based shape descriptor provides a discriminative characterization of brain anatomy and offers state-of-the-art performance for a range of shape retrieval and segmentation applications^{42,43}. A collection of the descriptors of brain structures, known as the *BrainPrint*, can provide an accurate and holistic representation of brain morphology, and has been successfully applied to subject identification, sex and age prediction, brain asymmetry analysis, twin studies, and computer-aided diagnosis of dementia^{40,44}. Our LBS-based heritability analyses demonstrate that neuroanatomical shape can be significantly heritable, above and beyond volume, and yield a complementary phenotype that offers a unique perspective in studying the genetic determinants of brain structure.

## Results

### Heritability of the volume of neuroanatomical structures

To benchmark our shape results, we first computed SNP heritability estimates for the volumetric measurements of an array of brain regions using the GSP sample. Table 1 lists these heritability estimates after adjusting for intracranial volume (or head size) as a covariate. Point estimates of the heritability of volumetric measurements suggested that several neuroanatomical structures have moderately heritable volumes. In particular, the caudate, corpus callosum, lateral ventricle, third and fourth ventricles, pallidum, putamen, and thalamus all had volume heritability estimates >25%. Table 1 further includes *P* values for the statistical significance of the heritability estimates. The parametric (Wald) and nonparametric (permutation-based) *P* values were virtually identical, confirming the accuracy of the s.e. estimates we computed (see Methods). We observe that none of the volume heritability estimates were statistically significant after correcting for multiple comparisons (false discovery rate (FDR) at *q*=0.05), possibly due to sample size limitations. Only the volumes of the caudate, corpus callosum and third ventricle achieved a heritability that was nominally significant in our sample (uncorrected *P*<0.05). Table 1 also includes test–retest reliability estimates of volume after regressing out intracranial volume, computed as Lin’s concordance correlation coefficient^{45} using measurements from 42 subjects with repeated scans on separate days. Almost all the structures had a volume estimate reliability >0.75 except for the pallidum. There was no significant correlation between the reliability and heritability estimates of volume (*P*=0.828).

### Heritability of the shape of neuroanatomical structures

Neuroanatomical shape measurements provide a geometric characterization and a rich description of brain structure. We therefore hypothesize that analysing the shape variation of neuroanatomical structures can identify genetic influences beyond captured by volumetric measurements. Figure 1 and Table 2 show the SNP heritability estimates of the shape of an ensemble of brain structures using the GSP sample. These estimates were computed based on LBS descriptors normalized for size and explicitly including the volume of the corresponding structure as a covariate in the analysis to account for potential volume effects. A number of structures showed moderate-to-high SNP heritability. Specifically, the shape of the caudate, cerebellum, corpus callosum, hippocampus, third ventricle and putamen exhibited heritability estimates >25%. All these estimates were statistically significant after correcting for an FDR at *q*=0.05. We observe that this is in contrast with the case of volume, where despite a similar heritability range, no estimate reached FDR-corrected significance. The main reason for this discrepancy is the theoretically guaranteed reduced s.e.’s in SNP heritability estimates of multidimensional traits (see Methods for a theoretical treatment). The shape of the accumbens area was also marginally significantly heritable with an uncorrected *P* value <0.05. As in the case of volume, the parametric (Wald) *P* values were virtually identical to the permutation *P* values, suggesting that our s.e. estimates are accurate (see Methods).

Table 2 also lists test–retest reliability estimates for the shape of different structures. Analogous to the case of volume, we quantified reliability as the average Lin’s concordance correlation coefficient of individual components of the multidimensional shape descriptor from 42 subjects with repeat scans on separate days. These results suggest that the LBS-based shape descriptors were overall less reliable than volumetric measurements, with half of the structures exhibiting a shape reliability <0.75. This is likely due to the increased sensitivity of shape to segmentation differences relative to the volume. Furthermore, there was a marginally significant correlation between reliability and heritability of shape (Pearson’s *r*=0.562 and *P*=0.057). We conclude that close to 30% of the variation in shape heritability across structures can be attributed to the reliability of the shape descriptor. This suggests that for structures that exhibited low shape heritability (for example, amygdala), a more accurate image segmentation and shape analysis pipeline might yield an increased estimate of heritability. We further conducted a sensitivity analysis of shape heritability estimates, with respect to the two free parameters of the LBS-based shape descriptor: number of eigenvalues incorporated and amount of smoothing applied to the surface mesh representing the geometry of the object. Supplementary Figure 1 shows that the heritability estimates were largely robust to variations in these parameters.

We sought to replicate these findings in the HCP sample, in which subjects are healthy and have a similar age range as the GSP sample. We selected 590 non-Hispanic/Latino Europeans aged between 22 and 35 years, comprising 72 MZ twin pairs, 69 DZ twin pairs, 253 full siblings of twins and 55 singletons (single birth individuals without siblings). We estimated the shape heritability for brain structures that had significantly heritable shapes in the GSP sample using an ACE model (A: additive genetics; C: common environment; E: unique or subject-specific environment), where the additive genetic similarity was derived from pedigree information and the common environment term reflected household sharing between subjects. We also obtained the s.e. of the shape heritability estimates using a block bootstrapping procedure (see Methods). All the shapes we analysed were significantly heritable in the HCP sample: accumbens area 0.309±0.081; caudate 0.583±0.062; cerebellum 0.653±0.060; corpus callosum 0.558±0.068; hippocampus 0.363±0.095; third ventricle 0.536±0.067; and putamen 0.483±0.106. We also observe that the HCP shape heritability estimates were consistently larger than the GSP estimates, which is theoretically expected because the SNP heritability estimated from unrelated GSP subjects only captured the genetic variation tagged by common SNPs in the data set, and is thus a lower bound for the classical narrow-sense heritability estimated from familial data such as the HCP sample.

### Visualizing the principal mode of shape variation

The LBS-based shape descriptor is suitable to efficiently and accurately extract intrinsic properties of the shape of brain structures from a large number of individuals, but is not designed to visually inspect shape differences. Here we propose a strategy to visualize the principal mode of shape variation. Specifically, it can be shown that the first principal component (PC) of the multidimensional LBS-based shape descriptor captures the greatest shape variation and has the largest impact on the overall heritability estimate of the shape (see Methods). We thus visualized shape variation along the first PC of the shape descriptor for brain structures with significantly heritable shapes in the GSP sample: right caudate, cerebellum, corpus callosum, right hippocampus, third ventricle and left putamen. The illustrations of contralateral structures (that is, left caudate, left hippocampus and right putamen), which showed similar shape variation, are provided in Supplementary Fig. 2. In each panel of Fig. 2, the structure is represented with a sample-specific population average, on which average shapes at the two extremes (±2 s.d.) of the principal axis with identical volume (−2 s.d., blue; +2 s.d., red) are depicted. Blue regions indicate where shapes around the −2 s.d. are larger than shapes around the +2 s.d., and vice versa for the red regions.

The first PC of the right caudate captured 77% of the shape variation and had a SNP heritability estimate of 0.88. Moving along the principal mode of shape variation, the right caudate had a larger (smaller) head with a corresponding shorter (longer) tail. For the cerebellum, the first PC explained 69% of the shape variation and had a SNP heritability of 0.61. A clear expansion (contraction) of the anterior lobe and a corresponding contraction (expansion) of the posterior lobe can be observed along the principal axis. The first PC of the corpus callosum captured 41% of the shape variation and had a SNP heritability of 0.41. The principal mode captured an expansion (contraction) of the middle corpus callosum along the dorsoventral axis and a corresponding shrinking (enlargement) of the anterior and posterior part of the structure. For the right hippocampus, the first PC explained 69% of the shape variation, had a SNP heritability of 0.47 and exhibited dorsoventral widening (narrowing) of the body and corresponding lateral and anterior–posterior contraction (expansion). The first PC of the third ventricle captured 69% of the shape variation and had a SNP heritability estimate of 0.80. The principal mode captured an enlargement (shrinking) of the posterior protrusions and an expansion (contraction) of the lateral walls, coupled with a corresponding contraction (expansion) of the roof of the cleft. Finally, first PC of the left putamen explained 61% of the shape variation, had a SNP heritability of 0.70, and captured lateral widening (narrowing) and a corresponding contraction (expansion) along the dorsoventral and anterior–posterior axes.

## Discussion

This work makes two contributions to neuroscience and genetic research. First, we extend the concept of heritability to multidimensional traits and present an analytic strategy that generalizes SNP heritability analysis. The heritability estimator we propose for multidimensional traits has reduced uncertainty in its point estimate relative to univariate estimates, and thus offers more statistical power. Our empirical analyses confirmed this theoretical expectation. Moreover, and in the same line, we provide methods that can easily adjust for covariates in multivariate models, and also both parametric and nonparametric inferential tools that can assess the significance of a heritability estimate. Our approach opens the door to the genetic characterization of shape measurements and other multidimensional traits.

Second, we use the proposed approach to quantify the SNP heritability of the shape of an ensemble of anatomical structures spanning the human brain in a group of young and healthy subjects. The shape of caudate, cerebellum, corpus callosum, hippocampus, third ventricle and putamen exhibited moderate-to-high heritability (that is, >25%), after controlling for volume. All of these estimates achieved FDR-corrected significance at *q*=0.05. This is in contrast to the volume heritability estimates of the same set of brain structures on the same sample, none of which reached FDR-corrected significance. Although our heritability analysis of volume, which was used to benchmark the shape analysis, may be less informative compared with more powerful twin studies and large-scale meta-analyses in the literature, the increased statistical power and the additional information shape analysis can provide relative to volumetric analysis demonstrate the usefulness of our methods and underscore the potential of leveraging multidimensional traits when analysing data sets with moderate sample sizes.

Using the extended twin data from the HCP, we also replicated significant shape heritability estimates observed in the GSP sample. Our HCP estimates were consistently larger than the GSP estimates, which is theoretically expected because SNP heritability estimated from the unrelated sample in GSP does not capture genetic contributions (for example, from rare variants) that are not tagged by genotyped SNPs, and thus explains a smaller proportion of the phenotypic variation. However, additional factors may contribute to the difference between SNP and familial heritability estimates, which include improper modelling of shared environment, assortative mating, genetic interaction (epistasis), suboptimal statistical methods for heritability estimation and differences in sample characteristics such as age range, ethnic background and environmental exposures^{35,46,47,48,49,50,51}. Dissecting the discrepancy in heritability estimates from familial and unrelated data is an area under active investigation. More systematic future work is required to fully disentangle this problem.

A handful of prior neuroimaging studies have explored the shape of certain brain structures as potential phenotypes in examining genetic associations. For example, Qiu *et al.*^{28} and Shi *et al.*^{29} reported influences of the apolipoprotein E *ɛ*4 allele on hippocampal morphology in depressive and Alzheimer’s disease patients. Variants involved in the regulation of the *FKBP5* gene were recently associated with hippocampal shape^{30}. A meta-study^{32} identified a GWAS significant SNP that exerts its effect on the shape of putamen bilaterally. Prior studies have also estimated heritability of shape based on familial relatedness. In a recent study, the heritability of the shape of subcortical and limbic structures was estimated using data from multigenerational families with schizophrenia^{31}. In other related work, Mamah *et al.*^{52} and Harms *et al.*^{53} revealed shape abnormalities in basal ganglia structures (caudate, putamen and globus pallidus) and the thalamus in siblings of schizophrenia patients. An application of the LBS-based shape descriptor to twin data found increased shape similarity of brain structures in MZ twin pairs relative to DZ twins, indicating genetic influences on brain morphology^{40}, although heritability was not estimated.

However, to date, outside of these notable exceptions, most structural imaging genetics studies have utilized scalar measurements (for example, volume, thickness and area) as phenotypes. In the present study, we accounted for potential volume effects in our shape analyses by normalizing the LBS-based shape descriptor for size and additionally including the volumetric measurement of the corresponding structure as a covariate when estimating heritability. Our results show that shape measurements provide a rich and novel set of phenotypes for exploring the genetic basis of brain structure, and may identify novel genetic influences on the brain that are not detectable with conventional analyses based on the volume of structures.

There are several biological mechanisms that might lead to shape differences with minimal effect on the overall size of the structure. These include localized volumetric effects that are confined to subfields, sub-nuclei or other sub-regions that make up the structure. Shape analysis may provide significant information about neurodevelopmental abnormalities, such as those associated with neuronal migration, synaptogenesis, synaptic pruning and myelination. Shape measurements might for example shed light on morphogenetic mechanisms that involve mechanical tensions along axons, dendrites and glial processes^{54}. Thus, shape measurements are particularly promising phenotypes for studying neurodevelopmental disorders. Neurodegenerative processes and other pathologies, many of which are known to be genetically influenced, can also have an impact on neuroanatomical shape by exerting focal and/or selective insults. For example, in Alzheimer’s disease, morphological alterations in the hippocampus may only target certain subfields^{55}.

The shape analysis literature offers an expanding list of methods to quantify and characterize shape^{43}. A major advantage of the LBS-based shape descriptor^{38} employed in this study is that it is robust to intensity variation across scans and does not require the nonlinear spatial registration of the object with a population template, which can be computationally demanding and prone to error. In this paper, we also present a novel strategy to visualize the principal mode of shape variation across the population. For brain structures with significantly heritable shapes, we demonstrated that the principal mode explains a large portion of the overall shape variation and is often highly heritable. This approach can thus shed light on the global genetic influences on brain structures, and is complementary to studies that rely on nonlinear group-wise registration to characterize localized genetic influences on shape variation.

In the present study, in light of the similar shape variation of bilateral brain structures as observed in Fig. 2 and Supplementary Fig. 2, and to increase signal-to-noise ratio and statistical power, we combined the left and right structures in our shape heritability analysis. SNP heritability estimates of the shape of bilateral brain structures using the GSP sample are provided in Supplementary Table 1. The results indicate that the genetic influences on several anatomical structures (for example, caudate) may be lateralized, although with the current sample size we are not able to claim that the lateral difference is significant. It will be interesting to investigate the laterality of brain structures from the genetic perspective when we have a better understanding of the genetic basis of brain morphology and when a data set with larger sample size becomes available.

The heritability analysis of multidimensional traits developed here can be applied to phenotypes other than shape that are intrinsically multivariate. Another application might involve heritability or genetic association analyses combining related traits to obtain more stable effect estimates. For example, it can be used as an alternative to PC analysis and factor analysis when investigating the genetic basis of various psychometric or behavioural traits. Also, voxel- or vertex-level neuroimaging measurements are often noisy, and analysing these measurements in homogeneous brain regions in a multivariate manner may increase the reliability and reproducibility of the results. Finally, the empirical genetic similarity matrix can be computed with other SNP grouping strategies (for example, based on genes, pathways, functional annotations and previous GWAS findings) to model the genetic influences from a specific genomic region or partition the heritability of multidimensional traits, as in Yang *et al*^{56}.

## Methods

### Variance component models

We start with a brief review of variance component models (also known as random effects models) for the heritability analysis of univariate (scalar) traits, which provide a general statistical framework that can handle both familial designs and unrelated individuals randomly sampled from the population. Assuming, for the moment, no covariate needs to be adjusted, and a trait can be partitioned into the sum of additive genetic effect *g*, common (or shared) environment *c* and unique (subject-specific) environment *e*, the variance component model takes the following form:

where **y**=[*y*_{1}, …, *y*_{N}]^{⊤} is a vector comprising quantitative traits from *N* individuals, , and are the additive genetic variance, common environmental variance and unique environmental variance, respectively, *K* is a genetic similarity matrix, Λ quantifies shared environment between pairs of individuals and *I* is an identity matrix.

In familial studies, *K* is twice the kinship matrix, *K*=2Φ, and indicates expected additive genetic covariance among individuals. The *ij*th entry of the kinship matrix, *ϕ*_{ij}, known as the kinship coefficient, defines genetic relatedness for subjects *i* and *j*, and in general can be derived from pedigree information^{57,58}. Λ is a matrix that usually reflects household sharing between pairs of individuals. For example, in the present study, *ϕ*_{ij}=1/2 for MZ twins and *ϕ*_{ij}=1/4 for DZ twins and full siblings, and we assume that twin pairs and their non-twin siblings share the same environment and the corresponding elements in Λ are 1.

When modelling unrelated individuals randomly sampled from the population, *K* is the empirical genetic similarity matrix for each pair of individuals estimated from genome-wide SNP data, and the corresponding variance component parameter is the total additive genetic variance tagged by common SNPs spanning the genome. We note that in unrelated subject studies does not capture contributions (for example, from rare variants) that are not assayed by the genotyping microarray, and thus needs to be interpreted differently from in familial studies, although we use the same notation here for simplicity. In addition, the common environmental matrix Λ is often assumed to vanish when analysing unrelated individuals, in which case equation (1) becomes the classical model used in genome-wide complex trait analysis^{33,34}.

The heritability of a univariate (scalar) trait is defined as

where is the phenotypic variance. In familial studies, *h*^{2} measures the narrow-sense heritability of a trait, while in unrelated subject studies, *h*^{2} measures additive heritability attributable to common genetic variants (known as SNP heritability and often denoted as ), and provides a lower bound for the narrow-sense heritability estimated by familial studies.

### Heritability of multidimensional traits

We now consider an *M*-dimensional trait *Y*=[*y*_{1},⋯, *y*_{M}]=[*y*_{im}]_{N × M}. We model *Y* by a multivariate variance component model:

where *G*, *C* and *E* are *N* × *M* matrices, and represent additive genetic effects, common environmental factors and unique environmental factors, respectively. We have the following distributional assumptions:

where vec(·) is the matrix vectorization operator that converts a matrix into a vector by stacking its columns, is the Kronecker product of matrices, is the genetic covariance matrix, whose *rs*th element is the genetic covariance between *y*_{r} and *y*_{s}, is the common environmental covariance matrix, and is the unique environmental covariance matrix. The distributional assumptions in equation (4) indicate that both the genetic effects and environmental factors can be correlated across trait dimensions. When the trait is a scalar, the multivariate model (3) degenerates to the univariate model specified in equation (1). Analogous to the discussion above, *K* is derived from pedigree information in familial studies and empirically estimated from genome-wide SNP data in unrelated subject studies. As a result, Σ_{A} denotes the genetic covariance due to common SNPs when analysing unrelated subjects.

We define the heritability of a multidimensional trait as

where is the phenotypic covariance matrix and tr[·] is the trace operator of a matrix. This definition measures the proportion of the total phenotypic variance tr[Σ_{P}] that can be explained by the total additive genetic variance tr[Σ_{A}], and yields a heritability measure that is bounded between 0 and 1. When the trait is univariate, Σ_{A}, Σ_{C} and Σ_{E} become scalars, and equation (5) reduces to the classical definition of heritability in equation (2). We use in place of *h*^{2} in unrelated subject studies to emphasize that it only captures genetic influences due to common genetic variants and is a lower bound for the narrow-sense heritability estimated using familial designs.

### Properties of multidimensional heritability

Our definition of heritability is invariant to rotations of the data. For a linear transformation *T* applied to the trait dimensions in model (3), that is,

using the properties of vectorization and the Kronecker product, the covariance structure of the transformed trait *YT* can be computed as follows:

Therefore, the transformed heritability is

When *T* is an orthogonal matrix satisfying *TT*^{⊤}=*T*^{⊤}*T*=*I*, we have =*h*^{2}.

The definition of heritability in equation (5) can also been written as

where with ∑_{m}*γ*_{m}=1, and is the heritability of the *m*th component of the trait. Therefore, our definition of the heritability of a multidimensional trait is essentially a weighted average of the heritability of its individual components.

### A moment-matching estimator for unrelated subject studies

The model (3) can in principle be fitted using likelihood-based methods. However, this can be computationally expensive when the dimension of the trait is moderate. Here we derive an alternative moment-matching estimator for unrelated subject studies where the common environmental matrix Λ vanishes. Specifically, the multivariate model *Y*=*G*+*E* and its distributional assumptions and lead to the following relationship:

Therefore, an unbiased estimator of and can be obtained by regressing *y*_{r}, the empirical estimate of the phenotypic covariance matrix cov[*y*_{r}, *y*_{s}], onto the empirical genetic similarity matrix *K* and identity matrix *I*. In particular, we consider the following multiple regression problem:

where is the residual of this regression. This approach is essentially the Haseman–Elston regression for the classical heritability analysis^{59,60}, and has been extended recently to handle various study designs including case–control studies, and more generally termed as phenotype correlation–genetic correlation regression^{35}. The ordinary least squares estimator of the multiple regression problem (11) satisfies the linear system:

and can be explicitly written as

where we have defined =tr[*K*]/*N*, *κ*=tr[*K*^{2}]/*N* and *v*_{K}=tr[*K*^{2}]−tr^{2}[*K*]/*N*=*N*(*κ*−^{2}). Therefore, it can be seen that unbiased estimates of the genetic and environmental covariance matrices are as follows:

Let , the SNP heritability of a multidimensional trait can then be estimated as

For scalar traits, equation (15) degenerates to the classical Haseman–Elston regression estimator.

### Sampling variance of the point estimator

We now derive the variance of . For notational simplicity, we denote *Q*_{A}=(*K*−*I*)/*v*_{K}, *Q*_{E}=(*κ**I*−*K*)/*v*_{K}, and also , , *t*_{P}=*t*_{A}+*t*_{E}. Let *t*=(*t*_{A}, *t*_{E})^{⊤} and define *f*(*t*)=*t*_{A}/(*t*_{A}+*t*_{E})=*t*_{A}/*t*_{P}. Using a Taylor expansion, we can approximate the variance of the function *f* as follows:

where

To compute cov[*t*], we define *V*_{rs}=cov[*y*_{r}, *y*_{s}]=*K*+*I*, and notice that for any symmetric matrices *Q*_{α} and *Q*_{β},

Therefore,

Equation (19) can be computationally expensive for a moderate or large *M*. We approximate equation (19) using two assumptions that are often made when estimating the sampling variance of the heritability estimator in the study of unrelated individuals^{61}: (1) the off-diagonal elements in the empirical genetic similarity matrix *K* are small, such that *K*≈*I* and *V*_{rs}=*K*+*I*≈*I*+*I*=*I*; and (2) the phenotypic covariance Σ_{P} is known or can be estimated with very high precision. Using assumption (1), the covariance of *t* can be simplified as follows:

Therefore,

where in the last approximation we have used assumption (2) and replaced Σ_{P} with its empirical estimate . We note that given the empirical genetic similarity matrix *K*, the estimator (21) only depends on the sample size and the phenotypic covariance structure.

For scalar traits, , and the estimator (21) degenerates to , which coincides with existing results in the literature^{61}. In general, the covariance matrix is non-negative definite. Let denote its eigenvalues, we have

This inequality becomes an equality if and only if , that is, the *M* traits are perfectly correlated. Therefore, combining multiple traits reduces the variability of heritability estimates relative to analysing each trait individually.

### Statistic inference

To measure the significance of a heritability estimate, a *P* value can be computed by conducting a Wald test. Since the null hypothesis, , lies on the boundary of the parameter space, the Wald test statistic is distributed as

a half–half mixture of , a *χ*^{2} distribution with all probability mass at zero, and , a *χ*^{2} distribution with 1 degrees of freedom^{62}.

Alternatively, permutation inference can be used by shuffling the rows and columns of the empirical genetic similarity matrix *K*. For each permutation *r*=1, 2, ⋯, *N*_{perm}, we record the heritability estimate computed from the permuted data. Then, for an observed heritability estimate , the permutation *P* value can be computed as^{63}

### Heritability estimation in familial studies

A moment-matching estimator can be analogously derived for familial data analysis, but has low statistical efficiency due to the strong correlation between the genetic similarity matrix *K* and the common environmental matrix Λ. Therefore, when analysing the HCP data, we estimate the heritability of each individual component of a multidimensional trait using the restricted maximum likelihood algorithm and combine these estimates using a variance-weighted average as derived in equation (9).

To estimate the variance of the heritability estimate of a multidimensional trait, we employ a block bootstrapping procedure whereby families are randomly resampled with replacement to produce a bootstrap sample and the heritability is re-estimated. This procedure is repeated for *N*_{boots} times (*N*_{boots}=1,000 in the present study) to yield bootstrap heritability estimates , *b*=1, 2,⋯, *N*_{boots}. The variance of the heritability estimate is then estimated as^{64}

where .

### Modelling covariates

When covariates or nuisance variables need to be adjusted, model (3) becomes a multivariate linear mixed effects model:

with the distributional assumptions , and , where *X* is an *N* × *q* matrix of covariates, and *B* is a *q* × *M* matrix of fixed effects. We employ a strategy described in ref. 65 to removes the covariate matrix from the model. Specifically, there exists an *N* × (*N*−*q*) matrix *U*, satisfying *U*^{⊤}*U*=*I*_{(N−q) × (N−q)}, *UU*^{⊤}=*P*_{0}, and *U*^{⊤}*X*=0, where *P*_{0}=*I*−*X*(*X*^{⊤}*X*)^{−1}*X*^{⊤}. The matrix *U*^{⊤} projects the data from the *N* dimensional space onto an *N*−*q* dimensional subspace:

where , and . The transformed model is the same as model (3) and thus all estimation and inferential methods developed above can be applied.

### The Brain Genomics Superstruct Project

The Harvard/MGH Brain GSP is a neuroimaging and genetics study of brain and behavioural phenotypes. More than 3,500 native English-speaking adults with normal or corrected-to-normal vision were recruited from Harvard University, MGH and the surrounding Boston communities. To avoid spurious effects resulting from population stratification, we restricted our analyses to 1,320 young adults (18–35 years) of non-Hispanic European ancestry with no history of psychiatric illnesses or major health problems (age, 21.54±3.19 years; female, 53.18%; right-handedness, 91.74%). All participants provided written informed consent in accordance with guidelines set by the Partners Health Care Institutional Review Board or the Harvard University Committee on the Use of Human Subjects in Research. For further details about the recruitment process, participants and imaging data acquisition, we refer the reader to Holmes *et al.*^{36,37}.

### The Human Connectome Project

The HCP collects imaging, behavioural and demographic data from a large population of healthy adults and aims to shed light on anatomical and functional connectivity within the healthy human brain. We used preprocessed structural MRI data from the WU-Minn HCP consortium and selected subjects that have a similar age range (22–35 years) and ancestry (non-Hispanic/Latino European) as the GSP sample. The 590 subjects we analysed (age, 29.21±3.45 years; female 55.93%) come from 249 families and comprise 72 MZ twin pairs, 69 DZ twin pairs, 253 full siblings of twins and 55 singletons (single birth individuals without siblings). Further details about the recruitment process, imaging data acquisition and MRI data preprocessing can be found in^{41,66}.

### Genetic analysis

We used PLINK 1.90 (https://www.cog-genomics.org/plink2)^{67}, to preprocess the GSP genome-wide SNP data. Major procedures included sex discrepancy check, removing population outliers, spuriously related subjects and subjects with low genotype call rate (<97%). Individual markers that contained an ambiguous strand assignment and that did not satisfy the following quality control criteria were excluded from the analyses: genotype call rate , minor allele frequency and Hardy–Weinberg equilibrium . A total of 574,632 SNPs remained for analysis after quality control. We performed a multidimensional scaling analysis to ensure that no clear population stratification and outliers exist in the sample (Supplementary Fig. 3). The genetic similarity matrix was estimated from all genotyped autosomal SNPs.

### LBS-based shape descriptor

The intrinsic geometry of any 2D or three-dimensional manifold can be characterized by its LBS^{38,39}, which is obtained by solving the following Laplacian eigenvalue problem (or Helmoltz equation):

where Δ is the Laplace–Beltrami operator, a generalization of the Laplacian in the Euclidean space to manifolds, *f* is a real-valued eigenfunction defined on a Riemannian manifold and *λ* is the corresponding eigenvalue. Equation (28) can be solved by the finite element method, yielding a diverging sequence of eigenvalues . An implementation of the algorithm is freely available (http://reuter.mit.edu/software/shapedna). The first *M* eigenvalues of the LBS can be used to define a description of the object, which provides a numerical fingerprint or signature of the shape, and is thus known as (length-*M*) ‘Shape-DNA’.

### Shape analysis pipeline

We used FreeSurfer (http://freesurfer.net)^{68}, version 4.5.0, a freely available, widely used, and extensively validated brain MRI analysis software package, to process the GSP structural brain MRI scans and label subcortical brain structures. HCP MRI scans were preprocessed by the WU-Minn HCP consortium, and the label files of subcortical structures have been made available^{41,66}. Surface meshes of brain structures were obtained via marching cubes from subcortical segmentations. We created triangular meshes on the boundary surfaces for 20 structures. We then geometrically smoothed these meshes and solved the eigenvalue problem of the 2D Laplace–Beltrami operator on each of these representations, yielding the LBS-based shape descriptor^{40}. A python implementation of this pipeline is freely available (http://reuter.mit.edu/software/brainprint).

### Heritability analyses of neuroanatomical shape

We treated the length-*M* LBS-based shape descriptor of each structure as a multidimensional trait and quantified its heritability. In the case of a closed manifold without a boundary, the first eigenvalue is always zero and was thus removed from analysis. Theoretical and empirical evidence have confirmed that the eigenvalues grow linearly and their variance grows quadratically^{38,40}. To avoid that higher eigenvalues dominate the phenotypic covariance, we re-weighted the *m*th eigenvalue for the *i*th subject as^{38}:

This ensures a balanced contribution of lower and higher eigenvalues on the phenotypic covariance. The LBS also depends on the overall size of the structure. To measure the genetic influences on the shape that are complementary to volume, we further scaled the eigenvalues as:

where *V*_{i} is the volume of the structure for the *i*th subject. Since scaling the eigenvalues by a factor *η* results in scaling the underlying manifold by a factor *η*^{−1/2} (ref. 38) , the normalization (30) ensures that the volumes of the structure are identical across individuals.

We combined the left and right structures in our heritability analyses by averaging their volumetric measurements and concatenating their re-weighted and scaled shape descriptors into one multidimensional trait. We included age, gender, handedness, scanner group, console group and the top 10 PCs of the empirical genetic similarity matrix as covariates when analysing the GSP sample, and included age, gender and handedness as covariates when analysing the HCP sample. To remove potential size effect, we always explicitly included the volume of the corresponding structure as a covariate in our shape analyses.

The number of eigenvalues incorporated in the LBS-based shape descriptor and the amount of smoothing applied to the surface mesh are crucial study designs, which might have an impact on heritability estimates. In particular, incorporating a very small number of eigenvalues may be insufficient to characterize the shape of a structure, while very large eigenvalues typically capture fine-scale details, which can be noise and thus might reduce sensitivity. In this study, we reported results obtained by incorporating 50 eigenvalues in the shape descriptor and applying three iterations of geometric smoothing to the surface mesh. We conducted sensitivity analyses and confirmed that in the present shape analysis the results were largely robust to different parameter settings (Supplementary Fig. 1).

### Visualizing the principal mode of shape variation

We note that, as shown above, our definition of the heritability of a multidimensional trait is a variance-weighted average of individual components, and is invariant to the rotation of the trait vector. Therefore, an equivalent definition of the heritability of a length-*M* LBS-based shape descriptor is the variance-weighted average of the heritability of the first *M* PCs of the descriptor, because PC analysis is essentially a rotation of the data. The first PC thus explains the greatest shape variation and has the largest impact on the overall heritability estimate of the shape.

To visualize shape variation along the first PC of the shape descriptor for a given structure, we first aligned the structures from all subjects to a template, *fsaverage*, which is a population average distributed with FreeSurfer^{68}, using a seven-parameter (global scaling plus six-parameter rigid body transformation) registration with linear interpolation. Both individual structures and the template were represented with binary label maps, where voxels within the corresponding segmentation label had one and the remainder of the volume had zero values. The registration algorithm maximized the overlap, measured with the Dice score^{69}, between the corresponding label maps (the fixed template and moving subject that was interpolated and thresholded at 0.5). Note that LBS is invariant to the spatial position and orientation of an object, and we had normalized the shape descriptor for volume in all the analyses. Thus, this registration has no impact on the results of our heritability analyses. We then created a sample-specific population average of the structure by computing a weighted average of the interpolated subject images. In particular, each subject was associated with a weight equal to a Gaussian kernel centred around the mean of the first PC and evaluated at the subject’s first PC of the shape descriptor. The width of the kernel was selected such that 500 shapes received non-zero weights. The isosurface of the resulting probability map at 0.5 was used to represent the average shape of the structure, and all visualizations were presented on this surface.

The same Gaussian kernel was used to generate average probability images for shapes centred at the two extremes (±2 s.d.) of the principal axis. These average probability images were offset to achieve identical volumes when thresholded at 0.5. The difference of the two extreme shapes was depicted on the sample-specific population average, by visualizing the difference in the probability values. Blue indicated that the average shape at −2 s.d. achieved a higher probability value and thus was larger in those regions than the average shape at the +2 s.d. For red regions, the opposite was true.

### Data availability

The Brain GSP data analysed during the current study are publicly available at http://neuroinformatics.harvard.edu/gsp/. The HCP data analysed during the current study are publicly available at http://www.humanconnectome.org. Other data are available from the corresponding author on reasonable request.

## Additional information

**How to cite this article:** Ge, T. *et al.* Multidimensional heritability analysis of neuroanatomical shape. *Nat. Commun.* **7,** 13291 doi: 10.1038/ncomms13291 (2016).

**Publisher's note:** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

- 1.
Manolio, T. A.

*et al.*Finding the missing heritability of complex diseases.*Nature***461**, 747–753 (2009). - 2.
Sullivan, P. F., Daly, M. J. & O’Donovan, M. Genetic architectures of psychiatric disorders: the emerging picture and its implications.

*Nat. Rev. Genet.***13**, 537–551 (2012). - 3.
Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery.

*Am. J. Hum. Genet.***90**, 7–24 (2012). - 4.
Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci.

*Nature***511**, 421–427 (2014). - 5.
Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis.

*Lancet***381**, 1371–1379 (2013). - 6.
Gratten, J., Wray, N. R., Keller, M. C. & Visscher, P. M. Large-scale genomics unveils the genetic architecture of psychiatric disorders.

*Nat. Neurosci.***17**, 782–790 (2014). - 7.
Gottesman, I. I. & Shields, J.

*Schizophrenia Genetics: A Twin Study Vantage Point*Academic Press (1972). - 8.
Gottesman, I. I. & Gould., T. D. The endophenotype concept in psychiatry: etymology and strategic intentions.

*Am. J. Psychiatry***160**, 636 (2003). - 9.
Meyer-Lindenberg, A. & Weinberger, D. R. Intermediate phenotypes and genetic mechanisms of psychiatric disorders.

*Nat. Rev. Neurosci.***7**, 818–827 (2006). - 10.
Thompson, P. M., Ge, T., Glahn, D. C., Jahanshad, N. & Nichols, T. E. Genetics of the connectome.

*Neuroimage***80**, 475–488 (2013). - 11.
Potkin, S. G.

*et al.*Hippocampal atrophy as a quantitative trait in a genome-wide association study identifying novel susceptibility genes for Alzheimer’s disease.*PLoS ONE***4**, e6501 (2009). - 12.
Shen, L.

*et al.*Whole genome association study of brain-wide imaging phenotypes for identifying quantitative trait loci in MCI and AD: a study of the ADNI cohort.*Neuroimage***53**, 1051–1063 (2010). - 13.
Stein, J. L.

*et al.*Identification of common variants associated with human hippocampal and intracranial volumes.*Nat. Genet.***44**, 552–561 (2012). - 14.
Sabuncu, M. R.

*et al.*The association between a polygenic alzheimer score and cortical thickness in clinically normal subjects.*Cereb. Cortex***22**, 2653–2661 (2012). - 15.
Csernansky, J. G.

*et al.*Hippocampal morphometry in schizophrenia by high dimensional brain mapping.*Proc. Natl Acad. Sci. USA***95**, 11406–11411 (1998). - 16.
Csernansky, J. G.

*et al.*Abnormalities of thalamic volume and shape in schizophrenia.*The American Journal of Psychiatry***161**, 896–902 (2004). - 17.
Joshi, S. H.

*et al.*Statistical shape analysis of the corpus callosum in schizophrenia.*Neuroimage***64**, 547–559 (2013). - 18.
Vidal, C. N.

*et al.*Mapping corpus callosum deficits in autism: an index of aberrant cortical connectivity.*Biol. Psychiatry***60**, 218–225 (2006). - 19.
Dierker, D. L.

*et al.*Analysis of cortical shape in children with simplex autism.*Cereb. Cortex***25**, 1042–1051 (2015). - 20.
Hwang, J.

*et al.*Basal ganglia shape alterations in bipolar disorder.*Am. J. Psychiatry***163**, 276–285 (2006). - 21.
Bearden, C. E.

*et al.*Three-dimensional mapping of hippocampal anatomy in unmedicated and lithium-treated patients with bipolar disorder.*Neuropsychopharmacology***33**, 1229–1238 (2008). - 22.
Csernansky, J. G.

*et al.*Preclinical detection of Alzheimer’s disease: hippocampal shape and volume predict dementia onset in the elderly.*Neuroimage***25**, 783–792 (2005). - 23.
Gerard, E.

*et al.*Multidimensional classification of hippocampal shape features discriminates Alzheimer’s disease and mild cognitive impairment from normal aging.*Neuroimage***47**, 1476–1486 (2009). - 24.
Costarred, S. G.

*et al.*Automated hippocampal shape analysis predicts the onset of dementia in mild cognitive impairment.*Neuroimage***56**, 212–219 (2011). - 25.
Tang, X.

*et al.*Shape abnormalities of subcortical and ventricular structures in mild cognitive impairment and Alzheimer’s disease: detecting, quantifying, and predicting.*Hum. Brain Mapp.***35**, 3701–3725 (2014). - 26.
Posener, J. A.

*et al.*High-dimensional mapping of the hippocampus in depression.*Am. J. Psychiatry***160**, 83–89 (2003). - 27.
Qiu, A.

*et al.*Basal ganglia volume and shape in children with attention deficit hyperactivity disorder.*Am. J. Psychiatry***166**, 74–82 (2009). - 28.
Qiu, A.

*et al.*APOE related hippocampal shape alteration in geriatric depression.*Neuroimage***44**, 620–626 (2009). - 29.
Shi, J.

*et al.*Genetic influence of apolipoprotein E4 genotype on hippocampal morphometry: an*n*=725 surface-based Alzheimer’s disease neuroimag-ing initiative study.*Hum. Brain Mapp.***35**, 3903–3918 (2014). - 30.
Fani, N.

*et al.*FKBP5 and attention bias for threat associations with hippocampal function and shape.*JAMA Psychiatry***70**, 392–400 (2013). - 31.
Roalf, D. R.

*et al.*Heritability of subcortical and limbic brain volume and shape in multiplex-multigenerational families with schizophrenia.*Biol. Psychiatry***77**, 137–146 (2015). - 32.
Hibar, D. P.

*et al.*Common genetic variants influence human subcortical brain structures.*Nature***520**, 224–229 (2015). - 33.
Yang, J.

*et al.*Common SNPs explain a large proportion of the heritability for human height.*Nat. Genet.***42**, 565–569 (2010). - 34.
Yang, J., Lee, S. H., Goddard, M. E. & Visscher., P. M. GCTA: a tool for genome-wide complex trait analysis.

*Am. J. Hum. Genet.***88**, 76–82 (2011). - 35.
Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the contribution of common variants.

*Proc. Natl Acad. Sci. USA***111**, E5272–E5281 (2014). - 36.
Holmes, A. J.

*et al.*Individual differences in amygdala-medial prefrontal anatomy link negative affect, impaired social functioning, and polygenic depression risk.*J. Neurosci.***32**, 18087–18100 (2012). - 37.
Holmes, A. J.

*et al.*Brain Genomics Super-struct Project initial data release with structural, functional, and behavioral measures.*Sci. Data***2**, 150031 (2015). - 38.
Reuter, M., Wolter, F. E. & Peinecke, N. Laplace-Beltrami spectra as ‘Shape-DNA’ of surfaces and solids.

*Comput. Aided Des.***38**, 342–366 (2006). - 39.
Reuter, M., Wolter, F. E., Shenton, M. & Niethammer, M. Laplace-Beltrami eigenvalues and topological features of eigenfunctions for statistical shape analysis.

*Comput. Aided Des.***41**, 739–755 (2009). - 40.
Wachinger, C.

*et al.*BrainPrint: a discriminative characterization ofbrain morphology.*Neuroimage***109**, 232–248 (2015). - 41.
Van Essen, D. C.

*et al.*The WU-Minn human connectome project: an overview.*Neuroimage***80**, 62–79 (2013). - 42.
Reuter, M. Hierarchical shape segmentation and registration via topological features ofLaplace-Beltrami eigenfunctions.

*Int. J. Comput. Vision***89**, 287–308 (2009). - 43.
Lian, Z.

*et al.*A comparison of methods for non-rigid 3D shape retrieval.*Pattern Recogn.***46**, 449–461 (2013). - 44.
Bron, E. E.

*et al.*Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: the CADDementia challenge.*Neuroimage***111**, 562–579 (2015). - 45.
Lin., L. I. K. A concordance correlation coefficient to evaluate reproducibility.

*Biometrics***45**, 255–268 (1989). - 46.
Visscher, P. M., McEvoy, B. & Yang., J. From Galton to GWAS: quantitative genetics of human height.

*Genet. Res.***92**, 371–379 (2010). - 47.
Tenesa, A. & Haley., C. S. The heritability of human disease: estimation, uses and abuses.

*Nat. Rev. Genet.***14**, 139–149 (2013). - 48.
Mmioz, M.

*et al.*Evaluating the contribution of genetics and familial shared environment to common disease using the UK Biobank.*Nat. Genet.***48**, 980–983 (2016). - 49.
Zuk, O., Hechter, E., Sunyaev, S. R. & Lander., E. S. The mystery of missing heritability: Genetic interactions create phantom heritability.

*Proc. Natl Acad. Sci. USA***109**, 1193–1198 (2012). - 50.
Zaitlen, N.

*et al.*Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits.*PLoS Genet.***9**, e1003520 (2013). - 51.
Zaitlen, N.

*et al.*Leveraging population admixture to characterize the heritability of complex traits.*Nat. Genet.***46**, 1356–1362 (2014). - 52.
Mamah, D.

*et al.*Basal ganglia shape abnormalities in the unaffected siblings of schizophrenia patients.*Biol. Psychiatry***64**, 111–120 (2008). - 53.
Harms, M. P.

*et al.*Thalamic shape abnormalities in individuals with schizophrenia and their nonpsychotic siblings.*J. Neurosci.***27**, 13835–13842 (2007). - 54.
Van Essen., D. C. A tension-based theory of morphogenesis and compact wiring in the central nervous system.

*Nature***385**, 313–318 (1997). - 55.
Pievani, M.

*et al.*APOE4 is associated with greater atrophy of the hippocampal formation in Alzheimer’s disease.*Neuroimage***55**, 909–919 (2011). - 56.
Yang, J.

*et al.*Genome partitioning of genetic variation for complex traits using common SNPs.*Nat. Genet.***43**, 519–525 (2011). - 57.
Neale, M. & Cardon, L.

*Methodology for Genetic Studies of Twins and Families*Springer (1992). - 58.
Almasy, L. & Blangero, J. Multipoint quantitative-trait linkage analysis in general pedigrees.

*Am. J. Hum. Genet.***62**, 1198–1211 (1998). - 59.
Haseman, J. K. & Elston., R. C. The investigation of linkage between a quantitative trait and a marker locus.

*Behav. Genet.***2**, 3–19 (1972). - 60.
Elston, R. C., Buxbaum, S., Jacobs, K. B. & Olson., J. M. Haseman and Elston revisited.

*Genet. Epidemiol.***19**, 1–17 (2000). - 61.
Visscher, P. M.

*et al.*Statistical power to detect genetic (co)variance of complex traits using SNP data in unrelated samples.*PLoS Genet.***10**, e1004269 (2014). - 62.
Molenberghs, G. & Verbeke, G. Likelihood ratio, score, and Wald tests in a constrained parameter space.

*Am. Stat.***61**, 22–27 (2007). - 63.
Westfall, P. H. & Young, S. S.

*Resampling-Based Multiple Testing: examples and Methods for P-value Adjustment*Vol. 279 John Wiley & Sons (1993). - 64.
Efron, B. & Tibshirani., R. J.

*An introduction to the Bootstrap*CRC press (1994). - 65.
Ge, T.

*et al.*Massively expedited genome-wide heritability analysis (MEGHA).*Proc. Natl Acad. Sci. USA***112**, 2479–2484 (2015). - 66.
Glasser, M. F.

*et al.*The minimal preprocessing pipelines for the Human Connectome Project.*Neuroimage***80**, 105–124 (2013). - 67.
Chang, C. C.

*et al.*Second-generation PLINK: rising to the challenge of larger and richer datasets.*GigaScience***4**, doi:10.1186/s13742-015-0047-8 (2015). - 68.
Fischl., B. Freesurfer.

*Neuroimage***62**, 774–781 (2012). - 69.
Dice, L. R. Measures of the amount of ecologic association between species.

*Ecology***26**, 297–302 (1945).

## Acknowledgements

This research was carried out in whole at the Athinoula A. Martinos Center for Biomedical Imaging at the MGH, using resources provided by the Center for Functional Neuroimaging Technologies, P41EB015896, a P41 Biotechnology Resource Grant supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB), National Institutes of Health (NIH). This work also involved the use of instrumentation supported by the NIH Shared Instrumentation Grant Program; specifically, grant numbers S10RR023043 and S10RR023401. Data were provided in part by the Brain GSP of Harvard University and MGH, with support from the Center for Brain Science Neuroinformatics Research Group, Athinoula A. Martinos Center for Biomedical Imaging, Center for Human Genetic Research and Stanley Center for Psychiatric Research. Twenty individual investigators at Harvard and MGH generously contributed data to the overall project. Data were also provided in part by the HCP, WU-Minn Consortium (principal investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. This research was carried out in whole or in part at the Athinoula A. Martinos Center for Biomedical Imaging at the MGH, using resources provided by the Center for Functional Neuroimaging Technologies, P41EB015896, a P41 Biotechnology Resource Grant supported by the NIBIB, NIH. This research was also funded in part by NIH grants K25CA181632 (to M.R.); K01MH099232 (to A.J.H.); K99MH101367 (to P.H.L.); R21AG050122-01A1, R41AG052246-01 and 1K25EB013649-01 (to M.R.S.); K24MH094614 and R01MH101486 (to J.W.S.); an MGH ECOR Tosteson Postdoctoral Fellowship Award (to T.G.); the Brazilian National Research Council (CNPq), grant number 211534/2013-7 (to A.M.W.); and a BrightFocus Foundation grant AHAF-A2012333 (to M.R.S.). J.W.S. is a Tepper Family MGH Research Scholar.

## Author information

## Author notes

- Jordan W. Smoller
- & Mert R. Sabuncu

These authors contributed equally to this work

## Affiliations

### Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital/Harvard Medical School, Charlestown, Massachusetts 02129, USA

- Tian Ge
- , Martin Reuter
- , Avram J. Holmes
- , Lee S. Tirrell
- , Randy L. Buckner
- & Mert R. Sabuncu

### Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts 02114, USA

- Tian Ge
- , Phil H. Lee
- & Jordan W. Smoller

### Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02138, USA

- Tian Ge
- , Phil H. Lee
- & Jordan W. Smoller

### Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

- Martin Reuter
- & Mert R. Sabuncu

### Centre for Functional MRI of the Brain (FMRIB), University of Oxford, Oxford OX3 9DU, UK

- Anderson M. Winkler

### Department of Psychology, Yale University, New Haven, Connecticut 06520, USA

- Avram J. Holmes

### Department of Psychiatry, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts 02114, USA

- Avram J. Holmes
- , Joshua L. Roffman
- & Randy L. Buckner

### Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138, USA

- Randy L. Buckner

## Authors

### Search for Tian Ge in:

### Search for Martin Reuter in:

### Search for Anderson M. Winkler in:

### Search for Avram J. Holmes in:

### Search for Phil H. Lee in:

### Search for Lee S. Tirrell in:

### Search for Joshua L. Roffman in:

### Search for Randy L. Buckner in:

### Search for Jordan W. Smoller in:

### Search for Mert R. Sabuncu in:

### Contributions

T.G., J.L.R., R.L.B., J.W.S. and M.R.S. designed the research; T.G., M.R. and M.R.S. contributed new analytic tools; T.G., A.J.H., P.H.L., L.S.T. and M.R.S. analysed the data; T.G., A.M.W. and M.R.S. wrote the paper; all authors contributed to the preparation and revision of the manuscript.

### Competing interests

The authors declare no competing financial interests.

## Corresponding authors

Correspondence to Tian Ge or Mert R. Sabuncu.

## Supplementary information

## PDF files

- 1.
### Supplementary Information

Supplementary Figures 1-3 and Supplementary Table 1

## Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/