Extracting stability increases the SNP heritability of emotional problems in young people

Twin studies have shown that emotional problems (anxiety and depression) in childhood and adolescence are moderately heritable (~20–50%). In contrast, DNA-based ‘SNP heritability’ estimates are generally <15% and non-significant. One notable feature of emotional problems is that they can be somewhat transient, but the moderate stability seen across time and across raters is predominantly influenced by stable genetic influences. This suggests that by capturing what is in common across time and across raters, we might be more likely to tap into any underlying genetic vulnerability. We therefore hypothesised that a phenotype capturing the pervasive stability of emotional problems would show higher heritability. We fitted single-factor latent trait models using 12 emotional problems measures across ages 7, 12 and 16, rated by parents, teachers and children themselves in the Twins Early Development Study sample. Twin and SNP heritability estimates for stable emotional problems (N = 6110 pairs and 6110 unrelated individuals, respectively) were compared to those for individual measures. Twin heritability increased from 45% on average for individual measures to 76% (se = 0.023) by focusing on stable trait variance. SNP heritability rose from 5% on average (n.s.) to 14% (se = 0.049; p = 0.002). Heritability was also higher for stable within-rater composites. Polygenic scores for both adult anxiety and depression significantly explained variance in stable emotional problems (0.4%; p = 0.0001). The variance explained was more than in most individual measures. Stable emotional problems also showed significant genetic correlation with adult depression and anxiety (average = 52%). These results demonstrate the value of examining stable emotional problems in gene-finding and prediction studies.


Three studies in samples other than TEDS have estimated SNP heritabilities for quantitative measures of childhood anxiety and depression, known as 'emotional problems' or 'internalising'. The average SNP heritability estimate from these studies is 10%. Although this is twice as high as the average in this study from the TEDS sample (5%), 10% SNP heritability is still well below half of the twin estimates of these phenotypes in the literature (40-60% in TEDS). Larger samples are needed to clarify this inconsistent and underpowered literature. For example, the study of pre-school internalising symptoms (Benke et al., 2014) had at most 35% power, with a sample of 2000 and assuming that true SNP heritability was half of the twin estimate they cite (50%), and as little as 10% power with a sample of 1000 and a heritability of 20%. Moreover, the large ranges of estimates within the studies make them difficult to interpret.
Note: 'gtsanxt1' = Teacher-rated SDQ at 7, 'gpsanxt1' = Parent-rated SDQ at 7, 'ltsanxt1' = Teacher-rated SDQ at 12, 'lpsanxt1' = Parent-rated SDQ at 12, 'lpmfqt1' = Parent-rated MFQ at 12, 'lcsanxt1' = Self-rated SDQ at 12, 'lcmfqt1' = Self-rated MFQ at 12, 'ppbhanxt1' = Parent-rated ARBQ at 16, 'ppbhmfqt1' = Parent-rated MFQ at 16, 'pcbhsdqanxt1' = Self-rated SDQ at 16, 'pcbhcasit1' = Self-rated CASI at 16, 'pcbhmfqt1' = Self-rated MFQ at 16.
[Heat map: correlations among the emotional problems measures, with axes labelled by the variable names listed in the note above.]
Note that the correlation between CFA and IRT factor scores is 0.92.
The single-factor model did not provide a good fit to the data. CFI and TLI were less than 0.9, indicating limited improvement in fit over a restricted baseline model, and RMSEA was significantly greater than 0.05, suggesting poor fit.
However, poor fit might be expected: it could reflect the lack of stability of anxiety across childhood (as indicated by the low correlations across our measures (heat map above), and the low stability in the literature), or it could reflect the higher than modelled associations within raters and within time-points.
After initial quality control and genotype calling, the same quality control was conducted on the samples genotyped on the Affymetrix and Illumina platforms separately using PLINK (Purcell et al. 2007; Chang et al. 2015), R (GBIF.ORG n.d.), and vcftools (Danecek et al. 2011). Samples were removed based on call rate (<0.99), suspected non-European ancestry, heterozygosity, array signal intensity, and relatedness. SNPs were excluded if minor allele frequency was <0.5%, if more than 1% of genotype data were missing, or if the Hardy-Weinberg p-value was lower than 10⁻⁵. Non-autosomal markers and indels were removed. Associations between the SNP and the platform, batch, or plate on which samples were genotyped were calculated; SNPs with p-values less than 10⁻³ were excluded. A total sample of 6710 individuals remained after quality control: 3093 individuals and 525 859 SNPs genotyped on the Affymetrix platform, and 3617 individuals and 600 034 SNPs genotyped on the Illumina platform.
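The three SNP-level exclusion rules above can be sketched in a few lines. This is a minimal illustration, not the PLINK pipeline used in the study. The function name `snp_qc_mask`, the 0/1/2 genotype coding with NaN for missing calls, and the 1-df chi-square approximation to the Hardy-Weinberg test (PLINK offers an exact test) are all assumptions made for the example.

```python
import math
import numpy as np


def snp_qc_mask(genotypes, maf_min=0.005, max_missing=0.01, hwe_p_min=1e-5):
    """Boolean mask of SNPs (columns) that pass MAF, missingness and HWE filters.

    genotypes: (n_samples, n_snps) array of allele counts 0/1/2, NaN = missing.
    Hypothetical helper; default thresholds are those quoted in the text.
    """
    g = np.asarray(genotypes, dtype=float)
    n_samples = g.shape[0]

    # Genotype counts per SNP (comparisons with NaN are False, so missing calls are skipped).
    n_ref = np.sum(g == 0, axis=0)
    n_het = np.sum(g == 1, axis=0)
    n_alt = np.sum(g == 2, axis=0)
    n_called = n_ref + n_het + n_alt

    missing_rate = 1.0 - n_called / n_samples

    # Allele frequency among called genotypes; guard against all-missing SNPs.
    p = (2 * n_alt + n_het) / (2 * np.maximum(n_called, 1))
    maf = np.minimum(p, 1.0 - p)

    # Chi-square goodness-of-fit test against Hardy-Weinberg proportions (1 df).
    observed = np.stack([n_ref, n_het, n_alt]).astype(float)
    expected = np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2]) * n_called
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(expected > 0, (observed - expected) ** 2 / expected, 0.0)
    chi2 = terms.sum(axis=0)
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    hwe_p = np.array([math.erfc(math.sqrt(x / 2.0)) for x in chi2])

    return (maf >= maf_min) & (missing_rate <= max_missing) & (hwe_p >= hwe_p_min)
```

A SNP is retained only if it passes all three filters, mirroring the "excluded if any criterion fails" wording of the text.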

CFA model fit and factor loadings
Genotypes from the two platforms were separately imputed using the Haplotype Reference Consortium 5 and Minimac3 1.0.13 (Howie et al. 2012; Fuchsberger et al. 2015) before merging genotype data obtained from both arrays.
We performed principal component analysis on a subset of 42 859 common (MAF > 5%) autosomal HapMap3 SNPs (International HapMap 3 Consortium et al. 2010), after stringent pruning to remove markers in linkage disequilibrium (r² > 0.1) and excluding high linkage disequilibrium genomic regions so as to ensure that only genome-wide effects were detected.
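To make the pruning and PCA step concrete, the sketch below greedily prunes SNPs on pairwise r² and then extracts principal components from a standardised genotype matrix. It is illustrative only: the software and window settings used for this step are not stated here, and `ld_prune` and `genotype_pcs` are hypothetical names. Pruning tools usually compare SNPs within sliding windows along the genome, whereas this toy version compares each SNP with every SNP already retained.

```python
import numpy as np


def ld_prune(genotypes, r2_max=0.1):
    """Greedy pruning: keep a SNP only if r^2 <= r2_max with every SNP kept so far."""
    g = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(g.shape[1]):
        if g[:, j].std() == 0:  # monomorphic SNPs carry no information
            continue
        r2 = [np.corrcoef(g[:, j], g[:, k])[0, 1] ** 2 for k in kept]
        if all(value <= r2_max for value in r2):
            kept.append(j)
    return kept


def genotype_pcs(genotypes, n_components=10):
    """Principal component scores from 0/1/2 genotypes (no missing data assumed)."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0  # sample allele frequencies
    poly = (p > 0) & (p < 1)  # drop monomorphic SNPs
    z = (g[:, poly] - 2 * p[poly]) / np.sqrt(2 * p[poly] * (1 - p[poly]))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    k = min(n_components, s.size)
    return u[:, :k] * s[:k]  # one row of scores per individual
```

In use, the column indices returned by `ld_prune` would select the SNPs passed to `genotype_pcs`, so that components reflect genome-wide structure rather than local blocks of correlated markers.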
SNP heritability: supplementary methods
SNP heritabilities were estimated using genomic relatedness matrix restricted maximum likelihood (GREML), implemented in the Genome-wide Complex Trait Analysis (GCTA) program (Yang et al. 2011). This estimates genetic influence directly using genome-wide genotypes in large samples of unrelated individuals. First, genetic similarity for each pair of unrelated individuals across all genotyped or imputed genetic markers is calculated. Each pair's genetic similarity is then used to predict their phenotypic similarity. In the present study, one individual from each pair with pairwise identity-by-descent (IBD) of >0.025 (third degree relatives) was removed, so that chance genetic similarity could be used as a random effect in a linear mixed model. By comparing a matrix of pairwise genomic similarity to a matrix of pairwise phenotypic similarity using a random-effects mixed linear model, the variance of a trait can be decomposed into genetic and residual components.
Note: 'h2' = heritability; 'SE' = standard error; '95% CI' = 95% confidence intervals; 'p' = the p-value associated with the SNP heritability estimate, and whether p<0.05 (indicated with '*'); 'FS' = stable emotional problems factor scores extracted from latent modelling; 'simultaneous' = simultaneous latent trait-twin modelling.
Supplementary Figure 4 shows that the simpler approach of creating a mean of the 12 anxiety and depression measures yields similar results to a latent modelling approach (comparing the first and eighth sections of the plot). As such, the finding of higher twin and SNP heritabilities holds even for a crude composite measure that sums common variance together with measure-specific error (rather than using latent modelling to remove error). This bolsters the argument that aggregating across time and across raters taps into a more heritable, core 'trait' aspect of emotional development.
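The supplementary methods above describe GREML as using pairwise genetic similarity to predict pairwise phenotypic similarity. As a rough sketch of that logic, the code below builds a genomic relatedness matrix, drops one individual from each pair above a relatedness cutoff, and regresses phenotypic cross-products on relatedness (Haseman-Elston regression). Haseman-Elston regression is a simpler moment-based relative of GREML, swapped in here for brevity; it is not the REML estimator that GCTA implements, and GCTA's own procedure for removing related individuals may differ from the greedy rule shown. All function names are hypothetical.

```python
import numpy as np


def genomic_relatedness_matrix(genotypes):
    """GRM from 0/1/2 genotypes: average product of standardised genotypes per pair."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    poly = (p > 0) & (p < 1)  # monomorphic SNPs cannot be standardised
    z = (g[:, poly] - 2 * p[poly]) / np.sqrt(2 * p[poly] * (1 - p[poly]))
    return z @ z.T / z.shape[1]


def drop_related(grm, cutoff=0.025):
    """Greedily drop the second member of each pair with relatedness above cutoff."""
    n = grm.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        for j in range(i + 1, n):
            if keep[j] and grm[i, j] > cutoff:
                keep[j] = False
    return np.flatnonzero(keep)


def haseman_elston_h2(grm, phenotype):
    """Slope of phenotypic cross-products on relatedness across all distinct pairs."""
    y = np.asarray(phenotype, dtype=float)
    y = (y - y.mean()) / y.std()  # standardise so the slope is on the h2 scale
    upper = np.triu_indices_from(grm, k=1)
    a = grm[upper]
    cross = np.outer(y, y)[upper]
    a_centred = a - a.mean()
    return float(a_centred @ (cross - cross.mean()) / (a_centred @ a_centred))
```

In a real analysis the indices returned by `drop_related` would be used to subset both the relatedness matrix and the phenotype before estimation. With small samples the estimate is very noisy, which is the same power problem discussed for the published SNP heritability studies.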
To explore whether extracting longitudinal stability still increased heritability when using single-rater composites, we created longitudinal means for child-, parent- and teacher-reports separately (sections 2-4 of the plot). Results suggest that this is the case. Twin estimates increased from ~45% to ~55%. SNP heritability point estimates increased for child- and parent-report longitudinal means (with longitudinal aggregation boosting SNP heritability of parent-report measures into statistical significance), but not for the teacher-report composite. Given the similar point estimates and large standard error intervals around SNP heritability estimates, it would be inappropriate to compare SNP heritability estimates.
To investigate the contribution of trans-situational variance (i.e. rater agreement on childhood emotional problems), we created means of anxiety and depression measures across reporters but at individual ages (sections 5-7 of the plot). Results suggest that combining cross-sectional measures across reporters increases twin and SNP heritability estimates at ages 7 and 12, but not at 16. Again, standard errors for SNP heritability estimates are large, such that we cannot compare estimates for different composite measures, and we cannot say that aggregation across raters significantly increases SNP heritability. The heritability increase resulting from aggregating measures using a simple mean approach is weaker for SNP heritability than for twin estimates, but this is partly because the sample sizes for SNP heritability estimation were ~4000 (rather than over 6000 for latent modelling with maximum likelihood). The restriction of the mean composite approach to individuals with complete data is a disadvantage in comparison to the latent modelling approach. The finding that latent modelling is more powerful than simple approaches replicates evidence from the Netherlands Twin Register: a latent anxious depression phenotype was substantially more heritable than the scores observed at any age, but a measure created by simply summing data from the different ages was considerably less heritable (Lubke et al. 2016).
Overall, results suggest that both the mean composite and the latent factor across time and across reporters show increased heritability as a result of tapping into longitudinally stable aspects of behaviour that are also agreed upon by different raters. Future research could more thoroughly and explicitly test the structure of age-, scale-, and rater-specific influences on emotional problems. For example, in the Trait-State-Occasion model, significant covariances between observed variable error terms indicate that trait stability is inflated by shared method variance across waves.
Investigation of the contribution of individuals with persistent emotional problems to the increased heritability of our stable phenotype.
In studies of adults, recurrent depression is more severe and heritable than single episode depression. It follows that the heritability for stable childhood emotional problems could be higher because we are capturing the higher heritability of severe recurrent symptoms, rather than stability at any level of emotional symptoms. This is especially worth investigating given the high genetic correlation of our stable childhood emotional problems phenotype with adult depression.
To test this idea, we calculated how many of the 12 measures each individual was an outlier for. Outliers were defined as individuals with z-scores >=1.96 (i.e. cutting off approximately the top 2.5% of scores under a normal distribution). We removed twin pairs for whom twin 1 or twin 2 was an outlier for >6 of the 12 scales (21 pairs were removed). Then, we re-calculated the twin heritability of a single common factor (from CFA).
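The exclusion rule can be sketched as follows. This is an illustration under assumptions the text does not spell out: scores are standardised within each scale across all twins pooled, missing scores are coded NaN and never count as outliers, and `persistent_outlier_pairs` is a hypothetical name.

```python
import numpy as np


def persistent_outlier_pairs(twin1, twin2, z_cutoff=1.96, max_outlier_scales=6):
    """Flag twin pairs in which either twin is an outlier on too many scales.

    twin1, twin2: (n_pairs, n_scales) score arrays, NaN = missing.
    Returns a boolean array; True means the pair would be excluded.
    """
    t1 = np.asarray(twin1, dtype=float)
    t2 = np.asarray(twin2, dtype=float)

    # Standardise each scale using all twins pooled (an assumption).
    pooled = np.vstack([t1, t2])
    mean = np.nanmean(pooled, axis=0)
    sd = np.nanstd(pooled, axis=0)

    def n_outlier_scales(scores):
        # NaN z-scores (missing data or zero variance) compare as False.
        with np.errstate(divide="ignore", invalid="ignore"):
            z = (scores - mean) / sd
            return np.sum(z >= z_cutoff, axis=1)

    return (n_outlier_scales(t1) > max_outlier_scales) | (
        n_outlier_scales(t2) > max_outlier_scales
    )
```

Calling the function with `z_cutoff=1.64` gives the more liberal outlier definition used in the sensitivity analysis, with the same "more than half of the measures" rule.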
We found a lower twin heritability for stable emotional problems when excluding individuals with persistent problems (0.50 (se=0.05) vs 0.70 (se=0.05) for the CFA factor scores using all 6110 pairs). This suggests that a small number of young people with enduring severe emotional symptoms could be disproportionately contributing to the high heritability of stable emotional problems. However, 95% CIs for the two estimates just overlap (.40-.60 and .60-.80), so the difference is not statistically significant.
Adjusting the definition of outliers to z>=1.64 (the top 5% of scores) led to the removal of 43 twin pairs for whom at least one twin was an outlier on more than half of the measures. The heritability estimate was almost unchanged (0.49 (se=0.05), compared with 0.50), but the upper bound of the 95% CI decreased to .57, such that the CI no longer overlapped with that of the full-sample estimate.
Overall, there is no strong evidence that individuals with severe persistent symptoms are inflating heritability estimates.

Figures 5 and 6: Polygenic score results
The PRSice-2 plots below demonstrate that the R² is not dramatically better at the best p-value threshold than at the others. This is indicative of reliable prediction for a polygenic trait.
Note: Pairs of traits are ordered according to the p-value of their genetic correlation. * = significant at p<0.05; ** = significant at p<0.00385 (i.e. 0.05/13, correcting for multiple tests); † = genetic correlation analysis performed separately (using LDSR) because the phenotype was not in LD Hub.
The genetic correlations between our CFA-derived stable scores and case-control depression, depressive symptoms, case-control generalised anxiety, and wellbeing are significant at p<0.05. The negative genetic correlation with wellbeing did not remain significant at the more stringent threshold (p<0.00385). Genetic correlations have many possible interpretations other than true biological pleiotropy (Martin et al. 2017). For example, the relationship between adult depression and childhood stable emotional problems could be mediated by shared genetic risk with low levels of well-being. It has been suggested that the genetic correlation between MDD and depressive symptoms in the population could be accounted for by shared genetic risk with low levels of subjective well-being (Direk et al. 2017). Another consideration is that nosological issues involving heterogeneity, comorbidity and misclassification might inflate phenotypic and genetic correlations.