Introduction

Although developmental research from childhood to adolescence reveals species-general changes in brain structure and function,1, 2 much less is known about the development of individual differences within our species, which has been called ‘one of the preeminent challenges of neuroimaging’.3 It is important to understand the developmental etiology of individual differences, because societal problems often involve individual differences—for example, why some children are slow to speak, to learn or to read. The description and causes of species’ means are not necessarily related to the description and causes of variances within a species.4 Two well-replicated genetic findings from twin studies comparing monozygotic and dizygotic (DZ) twins suggest hypotheses at the level of individual differences in cognitive ability that may be relevant to neuroscience, to the extent that brain structure and function underlie cognitive outcomes. These twin-study findings involve general cognitive ability, which was labeled g by Spearman more than a century ago,5 but is commonly known as intelligence.6 g is the most researched cognitive trait in genetics7 and has important links with neuroscience.8, 9

First, the heritability of g increases during development, even from childhood to adolescence.10 This finding is counterintuitive to the extent that genetic effects are thought to be static, and environmental effects are expected to accumulate during development. The increasing heritability of g also seems at odds with the second genetic finding: The same genes largely affect g throughout development.11 For example, in a longitudinal twin analysis from childhood to adolescence, the genetic correlation was estimated as 0.96, although the 95% confidence interval for this estimate was 0.74–1.0 (ref. 12). The genetic correlation is literally the correlation between genetic effects on g at the two ages independent of heritability.11 The high genetic correlation implies that if a gene is found to be associated with g in childhood, the gene is also highly likely to be associated with g in adolescence. Later, we offer a hypothesis as to how heritability can increase when genetic effects are stable from age to age.

These two genetic findings have not found much traction in the neurodevelopmental literature. This neglect might be due in part to a lack of attention to individual differences, but it might also be due to skepticism about the twin method, which relies on some major assumptions, most notably, equal environmental treatment of monozygotic and DZ twins.11 Quantitative genetic designs such as the twin method would no longer be needed if it were possible to identify all of the genes responsible for heritability.13 However, it has proven more difficult than expected to identify genes for complex traits,14 including g,15 which has led to the refrain of ‘missing heritability’.16, 17 Nonetheless, it is now possible to use DNA itself to estimate genetic variance and covariance in any sample of unrelated individuals, not just samples consisting of special family members such as twins or adoptees. The method, called genome-wide complex trait analysis (GCTA),18 correlates genomic similarity across hundreds of thousands of single nucleotide polymorphisms (SNPs) with phenotypic similarity, pair by pair, in a large sample of unrelated individuals.19 This population-based DNA approach does not rely on the strong assumptions made in classical twin studies. Although conventionally unrelated individuals vary only slightly in their genetic similarity, GCTA accumulates all the genotype−phenotype association signals using the massive information available in a matrix of thousands of individuals, each compared pair by pair with every other individual in the sample.

GCTA has been used to estimate genetic influence for height,19 weight,20 psychiatric and medical disorders,21, 22, 23 personality24 and even economic and political preferences.25 GCTA has also been applied to g in adults26 and children.27 These GCTA estimates of genetic influence, although substantial, have been lower than heritability typically found in twin studies of these traits. Using the 12-year data from the sample in the present report, GCTA and twin estimates of heritability were compared explicitly for several cognitive measures; the GCTA estimate of g was 35% and the twin estimate was 46%.28 Precision in comparing GCTA and twin estimates is important because, as explained later, this comparison reveals important information about a trait’s genetic architecture.

This previous GCTA research involves univariate analysis in that it decomposes the phenotypic variance of a single trait into genetic and non-genetic components of variance. Recently, GCTA has been extended to bivariate analysis, which decomposes the phenotypic covariance between traits into components of covariance. The first preliminary attempt to extend GCTA to bivariate analysis reported a genetic correlation of 0.62 for g in childhood (age 11) and old age.27 Here, we use a new bivariate GCTA method18, 29 to test the hypotheses of strong stability and increasing heritability for g from age 7 to 12. We also compare GCTA estimates with those from a twin analysis based on the same sample at the same ages using the same measures.

Materials and methods

Sample

The sample was drawn from the Twins Early Development Study (TEDS), which is a multivariate longitudinal twin study that recruited over 11 000 twin pairs born in England and Wales in 1994, 1995 and 1996.30, 31 TEDS is representative of the UK population.32 The project received approval from the Institute of Psychiatry ethics committee (05/Q0706/228), and parental consent was obtained before data collection. Individuals were included if their first language was English and they had no major medical or psychiatric problems. GCTA was conducted on g at ages 7 and 12 for 2875 unrelated individuals in TEDS (only one member of a twin pair), of which 1334 had g data at both ages. Twin model-fitting analyses of g at ages 7 and 12 were conducted for 6702 TEDS twin pairs, of which 2269 pairs had g data at both ages. As expected for representative twin studies, the twins included similar numbers of monozygotic twins, same-sex DZ twins and opposite-sex DZ twins.

Genotyping

Although DNA is available for more than 12 000 TEDS participants, funds were available to genotype 3665 individuals (one member only per twin pair) on Affymetrix GeneChip 6.0 (Affymetrix Inc., Santa Clara, CA, USA) SNP genotyping arrays using standard experimental protocols as part of the WTCCC2 project. In addition to nearly 700 000 genotyped SNPs, more than one million other SNPs were imputed using IMPUTE v.2 software (https://mathgen.stats.ox.ac.uk/impute/impute.html).33 DNA for 3152 individuals (1446 males and 1706 females) survived quality control criteria. Of these 3152 individuals, 2875 had g scores at least at one age and 1344 had g scores at both ages. To control for ancestral stratification, we performed principal component analyses on a subset of 100 000 quality-controlled SNPs after removing SNPs in linkage disequilibrium (r2>0.2).34 Using the Tracy−Widom test,35 we identified 8 axes with P<0.05, which were used as covariates in GCTA analyses.
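
The principal component step can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 allele counts and uses a plain singular value decomposition on simulated data; the function name `ancestry_pcs`, the two simulated subpopulations and all parameter values are hypothetical. It does not implement linkage disequilibrium pruning or the Tracy−Widom test, and it is not the software used in the study.

```python
import numpy as np

def ancestry_pcs(genotypes, n_components=8):
    """Principal components of a standardized genotype matrix.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    Returns the component scores and the share of variance each explains.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # sample allele frequencies
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, explained

# Simulated example: two subpopulations with slightly different frequencies.
rng = np.random.default_rng(0)
n_per_group, n_snps = 100, 2000
base = rng.uniform(0.2, 0.8, n_snps)
drifted = np.clip(base + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, base, size=(n_per_group, n_snps)),
    rng.binomial(2, drifted, size=(n_per_group, n_snps)),
])
population = np.repeat([0, 1], n_per_group)

pcs, explained = ancestry_pcs(geno)
# The first axis should track the simulated population split.
print(abs(np.corrcoef(pcs[:, 0], population)[0, 1]))
```

In an analysis of this kind, the retained component scores would then enter the mixed model as fixed-effect covariates.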

Measures

The measures and testing procedures have been described in detail for ages 7 (ref. 36) and 12 (ref. 37). At each age, a composite index of g was derived from two verbal tests and two non-verbal tests. At age 7, the two verbal tests consisted of the Similarities subtest and the Vocabulary subtest from the WISC-III-UK, and the two non-verbal tests were the Picture Completion subtest from the WISC-III-UK and the Conceptual Grouping subtest from the McCarthy Scales of Children’s Abilities. At age 12, the verbal tests included the Information and Vocabulary subtests from the WISC-III-PI Multiple Choice test, and the two non-verbal reasoning tests were WISC-III-UK Picture Completion and Raven’s Standard and Advanced Progressive Matrices. At age 7, testing was conducted by telephone as described elsewhere;36 at age 12, testing was conducted online.37 For each cognitive measure at each age, scores were regressed on sex and age and standardized residuals were derived, ranked with random values given to tied data, and quantile normalized.38, 39 Finally, total composites for g were created as unit-weighted means requiring complete data for at least three of the four tests. All the procedures were executed using R (www.r-project.org).40
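
The scoring pipeline described above can be sketched as follows. The original analyses were run in R; this Python version is only an illustration on simulated data. The function names, the simulated scores and the rank-to-quantile rule (rank minus 0.5, divided by n) are assumptions, not details taken from the study.

```python
import numpy as np
from statistics import NormalDist

def residualize(scores, covariates):
    """Standardized OLS residuals of scores on covariates plus an intercept."""
    x = np.column_stack([np.ones(len(scores)), covariates])
    beta, *_ = np.linalg.lstsq(x, scores, rcond=None)
    resid = scores - x @ beta
    return (resid - resid.mean()) / resid.std(ddof=1)

def rank_normalize(values, rng):
    """Rank with random tie-breaking, then map ranks to normal quantiles."""
    n = len(values)
    jitter = rng.random(n)
    order = np.lexsort((jitter, values))   # primary key: values; ties: jitter
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    nd = NormalDist()
    # Assumed quantile rule: (rank - 0.5) / n
    return np.array([nd.inv_cdf((r - 0.5) / n) for r in ranks])

def composite(tests, min_tests=3):
    """Unit-weighted mean across tests; NaN if fewer than min_tests present."""
    tests = np.asarray(tests, dtype=float)
    n_present = (~np.isnan(tests)).sum(axis=1)
    out = np.full(tests.shape[0], np.nan)
    ok = n_present >= min_tests
    out[ok] = np.nansum(tests[ok], axis=1) / n_present[ok]
    return out

# Simulated data: four tests, with roughly 10% of scores missing.
rng = np.random.default_rng(1)
n = 500
sex = rng.integers(0, 2, n)
age = rng.normal(7.0, 0.3, n)
raw = rng.normal(size=(n, 4)) + 0.5 * age[:, None] + 0.2 * sex[:, None]
raw[rng.random((n, 4)) < 0.1] = np.nan

processed = np.full_like(raw, np.nan)
for j in range(4):
    ok = ~np.isnan(raw[:, j])
    z = residualize(raw[ok, j], np.column_stack([sex[ok], age[ok]]))
    processed[ok, j] = rank_normalize(z, rng)

g = composite(processed)   # NaN where fewer than three tests are available
```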

Statistical analyses

Genome-wide complex trait analysis

The first step in GCTA is to calculate pairwise genomic similarity between all pairs of individuals in the sample using all genetic markers genotyped on the SNP array. Because GCTA is designed to estimate genetic variance due to linkage disequilibrium between unknown causal variants and genotyped SNPs from a sample of unrelated individuals in the population, any close genetic relatedness is eliminated; for this reason, any individual whose genetic similarity to another individual is equal to or greater than that of fourth cousins is removed (estimate of pairwise relatedness >0.025). The essence of GCTA is to compare a matrix of pairwise genomic similarity to a matrix of pairwise phenotypic similarity using a random-effects mixed linear model.18 In univariate analysis, the variance of a trait can be partitioned using residual maximum likelihood into genetic and residual components. A detailed description of this method can be found in Yang, Lee et al.18 and Yang, Benyamin et al.19 The bivariate method extends the univariate model by relating the pairwise genetic similarity matrix to a phenotypic covariance matrix between traits 1 and 2, allowing for correlated residuals.29 The eight principal components described earlier were used as covariates in our GCTA analyses; as mentioned, all phenotypes were age- and sex-regressed before analysis.
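
The first step, building the pairwise genomic similarity matrix and applying the 0.025 relatedness cutoff, can be sketched on simulated genotypes. The similarity formula is the standard one in which each SNP is standardized by its allele frequency. The function names and the greedy pruning rule are illustrative assumptions and are not necessarily how the GCTA software selects individuals for removal. The residual maximum likelihood step is not shown.

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """Pairwise genomic similarity from 0/1/2 allele counts.

    A[j, k] = mean over SNPs of
              (x_ij - 2 p_i) * (x_ik - 2 p_i) / (2 p_i (1 - p_i))
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # sample allele frequencies
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / z.shape[1]

def prune_related(grm, cutoff=0.025):
    """Greedy pruning (an assumed heuristic): repeatedly drop the individual
    with the most relationships above the cutoff until none remain."""
    keep = np.arange(grm.shape[0])
    a = np.array(grm, dtype=float)
    while True:
        off_diag = a - np.diag(np.diag(a))
        counts = (off_diag > cutoff).sum(axis=1)
        if counts.max() == 0:
            return keep
        mask = np.ones(len(keep), dtype=bool)
        mask[counts.argmax()] = False
        keep = keep[mask]
        a = a[np.ix_(mask, mask)]

# Simulated, genuinely unrelated individuals.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.5, size=5000)
geno = rng.binomial(2, freqs, size=(100, 5000))

A = genomic_relationship_matrix(geno)
kept = prune_related(A)
print(len(kept), "of", A.shape[0], "individuals retained")
```

With only a few thousand simulated SNPs, sampling noise in the off-diagonal elements is comparable to the cutoff, so many unrelated individuals are pruned here by chance; with hundreds of thousands of SNPs the estimates are far more precise.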

Twin modeling

The classical twin design and model-fitting are discussed elsewhere.11 We fit a bivariate twin model using OpenMx,41 which provided a direct comparison with the bivariate GCTA. The correlated factor solution is the least restricted model, allowing variables to correlate with one another via genetic, shared environmental and non-shared environmental factors. Because previous analyses of these data indicated nonsignificant differences in model-fitting results between males and females,32, 42 we combined same-sex and opposite-sex DZ twin pairs in order to increase the power of the analyses. Twin analyses limited to same-sex twins yielded highly similar results (available from the first author).

Results and discussion

Genetic stability

As shown in Table 1, the GCTA genetic correlation between g at ages 7 and 12 was 0.73 (0.29 standard error, s.e.). Table 2 shows that the twin study yielded a highly similar genetic correlation of 0.75 (0.08 s.e.). The genetic correlation indexes the correlation between genetic effects on g at the two ages independent of heritability. That is, the genetic correlation can be high even if heritability is low. It is also possible to weight the genetic correlation by heritability in order to estimate the genetic contribution to the phenotypic correlation. The phenotypic correlation for g between ages 7 and 12 was 0.46 (0.02) for 2408 children (one member randomly chosen from each twin pair) with g data at both ages. For GCTA, the genetic contribution to the phenotypic correlation was 0.25 (0.11), which is the GCTA genetic correlation weighted by heritability (that is, multiplied by the product of the square roots of the GCTA heritabilities of g at the two ages). Another way of expressing this is as bivariate heritability, which is the proportion of the phenotypic correlation that can be attributed to genetic covariance. GCTA bivariate heritability was 0.60 (that is, 0.25÷0.42), indicating that 60% of the phenotypic correlation could be accounted for by genetic factors. The comparable twin-study estimate of the genetic contribution to the phenotypic correlation was 0.31 (0.03), yielding a bivariate heritability of 0.68.
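
The weighting of the genetic correlation by heritability can be checked with a few lines of arithmetic. This sketch uses the GCTA heritabilities reported in this paper (0.26 at age 7 and 0.45 at age 12), the GCTA genetic correlation of 0.73 and the denominator of 0.42 quoted above; the function names are hypothetical.

```python
from math import sqrt

def genetic_contribution(h2_first, h2_second, r_g):
    """Genetic contribution to the phenotypic correlation:
    sqrt(h2 at first age) * genetic correlation * sqrt(h2 at second age)."""
    return sqrt(h2_first) * r_g * sqrt(h2_second)

def bivariate_heritability(contribution, denominator):
    """Share of the phenotypic correlation attributable to genetic covariance."""
    return contribution / denominator

gcta_contribution = genetic_contribution(0.26, 0.45, 0.73)
print(round(gcta_contribution, 2))   # 0.25

# Uses the rounded contribution and the denominator quoted in the text.
gcta_bivariate = bivariate_heritability(0.25, 0.42)
print(round(gcta_bivariate, 2))
```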

Table 1 Bivariate GCTA results (with standard errors) for general cognitive ability (g) from age 7 to 12
Table 2 Bivariate twin model-fitting results (with standard errors) for general cognitive ability (g) from age 7 to 12

Increasing heritability

Despite the substantial genetic correlation of 0.73 from age 7 to 12, GCTA estimates of genetic influence on g increased from 0.26 (0.17 s.e.) at age 7 to 0.45 (0.14 s.e.) at age 12, although the large standard errors indicate that the increase did not reach statistical significance. Heritability increased significantly in the twin model-fitting analyses, from 0.35 (0.03) at age 7 to 0.48 (0.03) at age 12. Thus, GCTA estimates account for 74% of the twin-study heritability estimate of g at age 7 and 94% at age 12.
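
A rough sense of why the GCTA increase falls short of significance can be had from a simple z-statistic for the difference between the two estimates. This is only an approximation: it treats the two estimates as independent, which they are not, because many of the same children contributed data at both ages. It is not the test used in the study, and the function names are hypothetical.

```python
from math import erfc, sqrt

def z_for_difference(est_a, se_a, est_b, se_b):
    """z-statistic for est_b - est_a, assuming independent estimates."""
    return (est_b - est_a) / sqrt(se_a ** 2 + se_b ** 2)

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return erfc(abs(z) / sqrt(2))

# GCTA heritability estimates and standard errors at ages 7 and 12.
z = z_for_difference(0.26, 0.17, 0.45, 0.14)
p = two_sided_p(z)
print(f"z = {z:.2f}, p = {p:.2f}")
```

Under these assumptions the z-statistic is well below the conventional threshold of 1.96, in line with the statement that the standard errors are too large for the increase to be significant.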

Why genetic stability but increasing heritability?

In summary, GCTA confirms the twin-study hypotheses of strong genetic stability and increasing heritability. In other words, the same genes are largely (about 75%) responsible for genetic influence on g at age 7 and age 12, yet the effect of these genes (heritability) increases substantially from age 7 to 12. How is this possible? We hypothesize that the same genes affect g from age to age but heritability increases as children select their own environments that are correlated with their g-related genetic propensities,10 a process called genotype−environment correlation.11 This hypothesis makes three predictions. The first prediction is that g-related experiences will themselves show genetic influence, for which there is considerable evidence from twin studies.43, 44 Second, the links between these experiences and g are expected to be mediated genetically, evidence for which is beginning to emerge from twin studies.45 The third prediction is that genetic links between g and experience should strengthen during development, but this has not yet been investigated. These genetic links are expected especially for experiences in which children are able to select or modify their environments in line with their genetic propensities, in contrast to environments that are passively imposed on children. Supportive evidence to date for this genotype−environment hypothesis relies on twin data, but GCTA can also be used to address these issues with DNA alone.

Genetic architecture

Our GCTA results clarify the genetic architecture of g in ways that are relevant to solving the ‘missing heritability’ puzzle that has emerged from the limited success of genome-wide association studies in identifying the genes responsible for heritability.46 Two of the major hypotheses to account for missing heritability are epistatic (nonadditive) genetic effects and rare variants, because genome-wide association research is limited to detecting additive genetic effects and genetic effects that can be tagged by the common SNPs used to date on commercially available DNA arrays.19 Because GCTA is also limited in these same two ways, finding significant GCTA estimates of genetic influence provides strong evidence that current genome-wide association research strategies can detect the majority of the missing heritability if samples are sufficiently large to provide power to detect associations of small effect size. As noted above, our GCTA estimates of genetic influence account for 74–94% of our twin-study heritability estimates, which implies that most of the missing heritability can be found with additive effects of common SNPs. The heritability that remains missing might be due to epistatic effects and rare variants.

In our longitudinal genetic analyses from age 7 to 12, the GCTA estimate of genetic covariance is also somewhat lower than the twin-study estimate. As shown in Tables 1 and 2, the genetic covariance for g between ages 7 and 12, that is, the genetic contribution to the phenotypic covariance, is 20% lower for GCTA than for twins (0.25 for GCTA and 0.31 for twins). However, the GCTA genetic correlation of 0.73 is highly similar to the twin-study genetic correlation of 0.75. The likely reason is that GCTA genetic variance and covariance estimates are attenuated by imperfect linkage disequilibrium between causal variants and genotyped SNPs, but the GCTA estimate of the genetic correlation is unbiased, because the genetic correlation is the ratio of the genetic covariance to the square root of the product of the genetic variances at the two ages. Because GCTA genetic variance and covariance estimates are attenuated to the same extent by imperfect linkage disequilibrium, the attenuation cancels out in the calculation of the genetic correlation.
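
The cancellation argument can be written out as a short derivation. The symbols are introduced here for illustration and are not taken from the original analyses: V_G(7) and V_G(12) denote the genetic variances at the two ages, Cov_G(7,12) the genetic covariance, and c (between 0 and 1) a hypothetical attenuation factor that is assumed to apply equally to the variances and the covariance.

```latex
r_G = \frac{\mathrm{Cov}_G(7,12)}{\sqrt{V_G(7)\,V_G(12)}},
\qquad
\hat{r}_G^{\,\mathrm{GCTA}}
  = \frac{c\,\mathrm{Cov}_G(7,12)}{\sqrt{c\,V_G(7)\cdot c\,V_G(12)}}
  = \frac{c\,\mathrm{Cov}_G(7,12)}{c\,\sqrt{V_G(7)\,V_G(12)}}
  = r_G .
```

If the attenuation differed between the variances and the covariance, the factor would not cancel and the GCTA genetic correlation would no longer be expected to match the twin-study value.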

Implications for brain structure and function

To the extent that g indexes general brain function, the present results suggest hypotheses for the etiology of individual differences in brain development. The same genes can be expected to be responsible for individual differences throughout brain development despite the major mean changes that occur during development. The hypothesis of increasing heritability for individual differences in brain development points to genotype−environment correlation as the process by which genotypes become phenotypes. Importantly, the correspondence between GCTA and twin results indicates that special samples such as twins are no longer needed to test such genetic hypotheses in neurodevelopment—GCTA makes it possible to test them in any large sample of unrelated individuals.