Genetic association study of childhood aggression across raters, instruments, and age

Childhood aggressive behavior (AGG) has a substantial heritability of around 50%. Here we present a genome-wide association meta-analysis (GWAMA) of childhood AGG, in which all phenotype measures across childhood ages from multiple assessors were included. We analyzed phenotype assessments for a total of 328 935 observations from 87 485 children aged between 1.5 and 18 years, while accounting for sample overlap. We also meta-analyzed within subsets of the data, i.e., within rater, instrument and age. SNP-heritability for the overall meta-analysis (AGGoverall) was 3.31% (SE = 0.0038). We found no genome-wide significant SNPs for AGGoverall. The gene-based analysis returned three significant genes: ST3GAL3 (P = 1.6E–06), PCDH7 (P = 2.0E–06), and IPO13 (P = 2.5E–06). All three genes have previously been associated with educational traits. Polygenic scores based on our GWAMA significantly predicted aggression in a holdout sample of children (variance explained = 0.44%) and in retrospectively assessed childhood aggression (variance explained = 0.20%). Genetic correlations (rg) among rater-specific assessment of AGG ranged from rg = 0.46 between self- and teacher-assessment to rg = 0.81 between mother- and teacher-assessment. We obtained moderate-to-strong rgs with selected phenotypes from multiple domains, but hardly with any of the classical biomarkers thought to be associated with AGG. Significant genetic correlations were observed with most psychiatric and psychological traits (range |rg|: 0.19–1.00), except for obsessive-compulsive disorder. Aggression had a negative genetic correlation (rg = ~−0.5) with cognitive traits and age at first birth. 
Aggression was strongly genetically correlated with smoking phenotypes (range |rg|: 0.46–0.60). The genetic correlations between aggression and psychiatric disorders were weaker for teacher-reported AGG than for mother- and self-reported AGG. The current GWAMA of childhood aggression provides a powerful tool to interrogate the rater-specific genetic etiology of AGG.


INTRODUCTION
There is a variety of phenotypic definitions of aggressive behavior (AGG), from broadly defined externalizing problems to narrow definitions like chronic physical aggression [1]. Generally, any action performed with the intention to harm another organism can be viewed as AGG [2,3]. AGG is considered a common human behavior [4], with people varying in the degree of AGG they exhibit [5]. Children typically display AGG early in life, after which symptoms tend to diminish [6,7], although in some individuals AGG persists into adulthood [8]. AGG is also part of numerous childhood and adult disorders [9], including oppositional defiant disorder and conduct disorder (CD) [10]. In its extreme forms, AGG may be considered a disorder by itself, inflicting a huge personal and financial burden on the individual, their relatives, friends, and society as a whole [11]. In general population studies, AGG is commonly treated as a quantitative trait, and pathological AGG has been argued to be best seen as the extreme end of such a continuum [12][13][14]. Childhood AGG co-occurs with many other behavioral, emotional, and social problems [15,16] and is associated with increased risk of developing negative outcomes later in life, including cannabis abuse [17], criminal convictions [18], anxiety disorder [19], or antisocial personality disorder [20]. Not all associated outcomes are harmful [21]. For example, children who learn to control their impulses and apply aggressive acts as a well-timed coercion strategy are generally more liked by their peers and score higher on social dominance [22].
Despite a heritability of roughly 50% [5,23], genome-wide association studies (GWASs) on childhood AGG have not identified genome-wide significant loci that replicated [1]. Childhood cohorts often have rich longitudinal data and assessments from multiple informants, and we aimed to increase power to detect genomic loci via multivariate genome-wide association meta-analysis (GWAMA) across genetically correlated traits [24,25]. In AGG, twin studies have reported moderate to high genetic correlations among instruments, raters, and age [26][27][28][29]. Childhood behavior can be context dependent, with teachers, fathers, and mothers each observing and rating aggression against a different background. Teachers are typically unrelated to the child, see the child in the context of a structured classroom, and can judge the child's behavior against that of other pupils. Parents share part of their genome with their offspring and, most often, a household. Parental genomes also influence the home environment, and it is predominantly within this context that parents observe the child's behavior. Multiple assessments of aggression by teachers, fathers, and mothers, by different instruments and at different ages, provide information that may be unique to a specific context and therefore may capture context-dependent expression of AGG. These considerations support an approach in which all AGG data are simultaneously analyzed, while retaining the ability to analyze the data by rater. Our analyses include repeated observations on the same subject, which requires appropriate modeling of the clustered data, since the covariance between test statistics becomes a function of a true shared genetic signal and the phenotypic correlation among outcomes [29]. We developed an approach that allowed inclusion of all measures for a child (e.g., from multiple raters at multiple ages) and resolved issues of sample overlap at the level of the meta-analysis. 
By doing so we make full use of all data and maximize statistical power for gene discovery. At the same time, by aggregating data at the level of the meta-analysis, we retain the flexibility to estimate r g s between AGG at different ages, by different raters and instruments, and test how AGG assessed by multiple raters differs in its r g with other phenotypes.
Data on AGG from parent-, teacher-, and self-report in boys and girls were collected in 29 cohorts from Europe, USA, Australia, and New Zealand with 328 935 observations from 87 485 participants, aged 1.5-18 years. First, we combined all data to produce the largest GWAMA on childhood AGG to date. SNP-based association tests were followed up by gene-based analyses. We computed polygenic scores (PGSs) to test the out-of-sample prediction of AGG and to explore the usefulness of our GWAMA in future research [30]. To assess genetic pleiotropy between AGG and associated traits, we estimated r g s with a preselected set of external phenotypes from multiple domains, with a focus on psychiatric and psychological traits, cognition, anthropometric and reproductive traits, substance use, and classic biomarkers of AGG, including testosterone levels. Second, meta-analyses were done by rater, instrument, and age. We estimated r g s across these assessments of AGG. To identify context-specific genetic overlap with the external phenotypes, r g s were also estimated between rater-specific assessments of AGG and the external phenotypes.

METHODOLOGY
Data description
Extended description of the cohorts and phenotypes is supplied in the Supplementary text and Supplementary Tables 1-9. Cohorts with assessment of AGG in genotyped children and adolescents took part in the meta-analysis. AGG was assessed on continuous scales, with higher scores indicating higher levels of AGG. Within cohort, samples were stratified by (1) rater, (2) instrument, and (3) age, maintaining at least 450 observations in each stratum. We ran a univariate GWAS for each stratum within each cohort (Supplementary Table 8). GWASs were run by local analysts following a standard operation protocol (see URLs) after which the summary statistics were uploaded to a central location for the meta-analysis. To account for dependence within cohort in the meta-analysis (see Supplementary text), each cohort supplied the phenotypic covariance matrix between the AGG measures (Supplementary Table 10) and the degree of sample overlap (Supplementary Table  11) between the different strata. Supplementary Fig. 1 shows the distribution of phenotypic correlations across all AGG measures. We assumed no sample overlap across cohorts, and phenotypic correlations among cohorts were set to zero and omitted from Supplementary Fig. 1. Phenotypic correlations of zero also correspond to independent samples within a cohort. For GWASs with sample overlap, most phenotypic correlations ranged between 0.1 and 0.4, with a median value of 0.29. When stratified by rater, phenotypic correlations were more heavily centered around 0.4 (see Supplementary Fig. 1). The maximum number of correlations within cohort at a specific age is three based on four raters, with the largest number of observations within age-bin around age 12 years. Within this age group, phenotypic correlations among raters ranged between 0.22 and 0.65, with a median of 0.34. The lowest phenotypic correlations were seen between teachers and parents. 
Since limited data were available on individuals of non-European ancestry, we restricted analyses to individuals of European ancestry.
In total, 29 cohorts contributed 163 GWASs, based on 328 935 observations from 87 485 unique individuals (Supplementary Table 2). Children were 1.5-18 years old at assessment, or retrospectively assessed at these ages. Cohorts supplied between 1 and 26 univariate GWASs. Approximately 50% of the subjects were males. Most GWASs were based on maternal (52.4%) and self-assessment (25.1%), with the remainder based on teacher (12.4%) and paternal report (10.1%). After QC, applied to the univariate GWASs, between 3.47 M SNPs and 7.28 M SNPs were retained for meta-analysis (see Supplementary Fig. 2 and Supplementary Table 9). Note that the wide range of retained SNPs is a result of applying more stringent QC filters for GWASs with smaller sample sizes and that GWASs with comparable sample sizes returned roughly equal numbers of SNPs (see Supplementary text and Supplementary Fig. 2).

Meta-analysis
Within-cohort measures of AGG may be dependent because they include repeated measures of AGG over age and measures from multiple raters. To account for the effect of sample overlap, we applied a modified version of the multivariate meta-analysis approach developed by Baselmans et al. [25] (see Table 1). Instead of estimating the dependence among GWASs based on the cross-trait intercept (CTI) with linkage disequilibrium score regression (LDSC) [29,31], the expected pairwise CTI value was calculated (Table 1) using the observed sample overlap and phenotypic covariance, as sample sizes of the univariate GWASs were insufficient to run bivariate LDSC. The effective sample size (N eff ) was approximated by the third formula in Table 1. When there is no sample overlap (or a phenotypic correlation equal to zero) between all GWASs (i.e., CTI is an identity matrix), N eff is equal to the sum of sample sizes.
Table 1 Formulas for (a) multivariate test statistic, (b) cross-trait intercept, and (c) effective sample size for a GWAMA.
(a) Multivariate test statistic for the jth SNP: $Z_j = \frac{\sum_{i=1}^{P} w_{ji} Z_{ji}}{\sqrt{\sum_{i=1}^{P} w_{ji}^2 V_{ji} + \sum_{i=1}^{P} \sum_{k \neq i} w_{ji} w_{jk} \sqrt{V_{ji} V_{jk}} \, \mathrm{CTI}_{ik}}}$. P is the number of GWASs across which we run the meta-analysis; $w_{ji} = \sqrt{N_{ji} h^2_{SNP,i}}$ is the weight given to the jth SNP in GWAS i, with h 2 SNP,i being the SNP-heritability of the trait analyzed in GWAS i; and V ji = 1 represents the variance of the distribution of Z ji under the null hypothesis of no effect.
(b) Cross-trait intercept between GWAS i and k, for i ≠ k: $\mathrm{CTI}_{ik} = \frac{N_s r_p}{\sqrt{N_{ji} N_{jk}}}$. N s represents the sample overlap; r p indicates the phenotypic correlation; N ji and N jk are the sample sizes at SNP j for, respectively, GWASs i and k.
(c) Effective sample size for a GWAMA. N is a P-sized vector of sample sizes, and CTI is the P × P matrix of cross-trait intercepts.
H.F. Ip et al.
First, we meta-analyzed all available GWASs (AGG overall ). Second, we meta-analyzed all available data within rater (rater-specific GWAMAs). Third, rater-specific age-bins were created for mother- and self-reported AGG based on the mean ages of the subjects in each GWAS (age-specific GWAMA). To ensure that the age-specific GWAMAs would have sufficient power for subsequent analyses, age-bins were created such that the total univariate number of observations (N obs ) exceeded 15 000 (see Supplementary text and Supplementary Table 12). For father- and teacher-reported AGG, there were insufficient data to run age-specific GWAMAs. Fourth, we performed instrument-specific GWAMAs for (1) the ASEBA scales and (2) the SDQ, because for these two instruments the total univariate N obs was over 15 000. SNPs that had MAF < 0.01, N eff < 15 000, or were observed in fewer than two cohorts were removed from further analyses. SNP-heritabilities (h 2 SNP ) were estimated using LDSC [31]. r g s were calculated across stratified assessments of AGG using LDSC [29]. To ensure sufficient power for the genetic correlations, r g was calculated across stratified assessments of AGG if the Z-score of the h 2 SNP for the corresponding GWAMA was 4 or higher [29].
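The quantities described for Table 1 can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the test statistic follows the N-weighted form with V = 1 so that the denominator reduces to sqrt(w' CTI w), and the effective sample size uses an assumed form, sqrt(N)' CTI^-1 sqrt(N), chosen only because it equals the sum of sample sizes when CTI is an identity matrix, as stated in the text.

```python
import numpy as np

def cross_trait_intercept(n_overlap, r_pheno, n_i, n_k):
    """Expected cross-trait intercept from sample overlap and phenotypic correlation."""
    return n_overlap * r_pheno / np.sqrt(n_i * n_k)

def multivariate_z(z, n, h2, cti):
    """Multivariate test statistic for one SNP across P GWASs.

    z, n, h2 are length-P arrays (Z-scores, sample sizes, SNP-heritabilities);
    cti is the P x P cross-trait-intercept matrix with ones on the diagonal.
    With V = 1 under the null, the denominator is sqrt(w' CTI w).
    """
    w = np.sqrt(n * h2)
    return (w @ z) / np.sqrt(w @ cti @ w)

def effective_n(n, cti):
    """ASSUMED form of the effective sample size: sqrt(N)' CTI^-1 sqrt(N)."""
    s = np.sqrt(n)
    return s @ np.linalg.solve(cti, s)

# Toy example (made-up numbers): two strata from one cohort, 2000 shared children.
n = np.array([4000.0, 3000.0])
h2 = np.array([0.05, 0.05])
z = np.array([2.0, 2.0])
c12 = cross_trait_intercept(2000, 0.3, n[0], n[1])
cti = np.array([[1.0, c12], [c12, 1.0]])

print(multivariate_z(z, n, h2, np.eye(2)))  # treating the strata as independent
print(multivariate_z(z, n, h2, cti))        # smaller once the overlap is modeled
print(effective_n(n, np.eye(2)), effective_n(n, cti))
```

Ignoring the overlap would overstate both the combined test statistic and the effective sample size, which is why the dependence is modeled at the meta-analysis stage.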

Gene-based tests
For AGG overall , a gene-based analysis was done in MAGMA [32].
The gene-based test combines P values from multiple SNPs to obtain a test statistic for each gene, while accounting for LD between the SNPs. From the MAGMA website (see URLs) we obtained (1) a list of 18 087 genes and their start and end positions, and (2) pre-formatted European genotypes from 1000 Genomes phase 3 for the reference LD. We applied a Bonferroni correction for multiple testing at α = 0.05/18 087. A lookup for significant results was performed in GWAS Catalog and PhenoScanner (see URLs).
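As a worked example of the Bonferroni threshold described above, the short sketch below computes α = 0.05/18 087 and compares it with the three gene-level P values reported in this paper. The variable names are illustrative only.

```python
# Bonferroni-corrected significance threshold for the gene-based test.
n_genes = 18087
alpha = 0.05
threshold = alpha / n_genes  # roughly 2.76E-06

# Gene-level P values as reported in the text.
gene_p = {"ST3GAL3": 1.6e-06, "PCDH7": 2.0e-06, "IPO13": 2.5e-06}
significant = {gene: p < threshold for gene, p in gene_p.items()}

print(f"threshold = {threshold:.3g}")
print(significant)
```

All three reported genes fall below the threshold, with IPO13 passing by a narrow margin.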

Polygenic scores
All data were meta-analyzed twice more, once omitting all data from the Netherlands Twin Register (NTR) and once omitting the Australian data from the Queensland Institute for Medical Research (QIMR) and the Mater-University of Queensland Study of Pregnancy (MUSP). As the NTR target sample we considered mother-reported AGG at age 7 (N = 4491), which represents the largest NTR univariate stratum. In the QIMR participants, we tested whether our childhood AGG PGS predicted adult retrospective assessment of their own CD behavior during adolescence (N = 10 706). We allowed for cohort-specific best practice in the PGS analysis. In the NTR, we created 16 sets of PGSs in PLINK1.9 [33], with P value thresholds between 1 and 1.0E-05 (see Supplementary Table 13). The remaining SNPs were clumped in PLINK. We applied an r 2 threshold of 0.5 and minimum clumping distance of 250 000 base pair positions [33]. Age, age², sex, first five ancestry-based principal components, a SNP-array variable, and interaction terms between sex and age, and sex and age² were defined as fixed effects. To account for relatedness, prediction was performed using generalized estimating equations (GEE) as implemented in the "gee" package (version 4.13-19) in R (version 3.5.3). GEE applies a sandwich correction over the standard errors to account for clustering in the data [34]. To correct for multiple testing, we applied an FDR correction at α = 0.05 for 16 tests. QIMR excluded SNPs with low imputation quality (r 2 < 0.6) and MAF below 1% and selected the most significant independent SNPs using PLINK1.9 [35] (criteria: linkage disequilibrium r 2 = 0.1 within windows of 10 Mbp). We calculated different PGS for seven P value thresholds (P < 1e-5, P < 0.001, P < 0.01, P < 0.05, P < 0.1, P < 0.5, and P < 1.0) of the GWAS summary statistics. PGS were calculated from genotype dosages imputed to the 1000 Genomes (Phase 3 Release 5) reference panel. 
We fitted linear mixed models, which controlled for relatedness using a Genetic Relatedness Matrix (GRM) and covariates sex, age, two dummy variables for the GWAS array used, and the first five genetic principal components. The parameters of the model were estimated using GCTA 1.9 [36]. In the linear model, b and c represent the vectors of fixed effects, and $G \sim N(0, \mathrm{GRM} \times \sigma^2_G)$ represents the random effect that models the sample relatedness, with GRM being the N by N matrix of relatedness estimated from SNPs and N = 10 706 the number of individuals.
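A polygenic score at a given P value threshold is a weighted sum of allele dosages, with discovery-GWAS effect sizes as weights. The sketch below shows that idea only; it assumes the SNPs have already been clumped, uses made-up toy numbers, and the function name `polygenic_score` is hypothetical rather than part of PLINK or any cohort pipeline.

```python
import numpy as np

def polygenic_score(dosages, betas, pvals, p_threshold):
    """Weighted sum of dosages over SNPs with discovery P below the threshold.

    dosages: individuals x SNPs array of (imputed) allele dosages in [0, 2]
    betas, pvals: per-SNP effect sizes and P values from the discovery GWAMA
    """
    keep = pvals < p_threshold
    return dosages[:, keep] @ betas[keep]

# Toy data: 3 individuals, 4 (already clumped) SNPs.
dosages = np.array([[0.0, 1.0, 2.0, 0.5],
                    [1.0, 1.0, 0.0, 2.0],
                    [2.0, 0.0, 1.0, 1.0]])
betas = np.array([0.02, -0.01, 0.03, 0.01])
pvals = np.array([1e-6, 0.04, 0.2, 0.6])

for threshold in (0.05, 1.0):
    print(threshold, polygenic_score(dosages, betas, pvals, threshold))
```

Looser thresholds include more SNPs with small, noisy effects, which is why the scores were evaluated at several thresholds.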

Genetic correlations with external phenotypes
We computed r g s between AGG overall and a set of preselected outcomes (N = 46; collectively referred to as "external phenotypes"; Supplementary Table 14). Phenotypes were selected based on established hypotheses with AGG and the availability of sufficiently powered GWAS summary statistics. We restricted r g s to phenotypes for which the Z-scores of the LDSC-based h 2 SNP ≥ 4 [29]. Next, we estimated r g s for all rater-specific assessments of AGG (except for father-reported AGG). Genomic Structural Equation Modelling (Genomic SEM) [37] was applied to test if r g s were significantly different across raters. Specifically, for every phenotype, we tested whether (1) all three r g s between the external phenotype and rater-specific assessment of AGG, i.e., mother, teacher, or self-ratings, could be constrained at zero, and (2) whether r g s could be constrained to be equal across raters. A χ 2 difference test was applied to assess whether imposing the constraints resulted in a significantly worse model fit compared to a model where the r g s between the phenotype and three rater-specific assessments of AGG were allowed to differ. We applied an FDR correction at α = 0.05 over two models for 46 external phenotypes, for a total of 92 tests. An FDR correction for 4 × 46 = 184 tests was applied to correct for multiple testing of whether the genetic correlations were significantly different from zero.
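The text specifies an FDR correction at α = 0.05 but not the exact procedure. The sketch below assumes the Benjamini-Hochberg step-up procedure, which is a common choice; the function name and the example P values are illustrative and not taken from the study.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean array marking tests rejected at FDR level alpha.

    ASSUMPTION: Benjamini-Hochberg step-up. Reject the k smallest P values,
    where k is the largest rank with p_(k) <= alpha * k / m.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject

# Number of tests in the two families described in the text.
n_model_tests = 2 * 46   # 92 constraint tests
n_rg_tests = 4 * 46      # 184 tests of r g against zero

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))   # only the smallest passes
print(benjamini_hochberg([0.02, 0.02, 0.03, 0.04]))  # step-up rejects all four
```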

Overall GWAMA
We first meta-analyzed the effect of each SNP across all available univariate GWASs. Assuming an N eff of 151 741, the h 2 SNP of AGG overall was estimated at 3.31% (SE = 0.0038). The mean χ 2 statistic was 1.12 along with an LDSC-intercept of 1.02 (SE = 0.01). This indicated that a small, but significant, part of the inflation in test statistics might have been due to confounding biases, which can either reflect population stratification or subtle misspecification of sample overlap within cohorts. No genome-wide significant hits were found for AGG overall (Fig. 1). The list of suggestive associations (P < 1.0E-05) is provided in Supplementary Table 15. SNPs were annotated with SNPnexus (see URLs). The strongest association, in terms of significance, was located on chromosome 2 (rs2570485; P = 2.0E-07). The SNP is located inside a gene desert, without any gene within 400 kbp in either direction. The second strongest independent association was found with rs113599846 (P = 4.3E-07), which is located inside an intronic region of TNRC18 on chromosome 7. None of the suggestive associations have previously been reported for AGG or AGG-related traits [1].
Fig. 1 Manhattan plot of overall meta-analysis for childhood aggression (AGG overall ). Red triangles represent SNPs that were included in the significant genes from the gene-based analysis. SNPs for ST3GAL3 and IPO13 are included in the same locus on chromosome 1.
We tested previously reported genome-wide significant associations for AGG [1] and performed a lookup in AGG overall . We restricted lookup to associations with autosomal SNPs that were found in samples of European ancestry, resulting in three loci. One genome-wide significant hit was reported for adult antisocial personality disorder (rs4714329; OR = 0.63 [odds ratio was signed to the other allele in the original study]; P = 1.64E-09) [38]. The same SNP, however, had an opposite direction of effect in AGG overall (β = 0.0022; P = 0.41). Tielbeek et al. [39] reported two genome-wide significant hits for antisocial behavior, one on chromosome 1 (rs2764450) and one on chromosome 11 (rs11215217). While both SNPs have the same direction of effect, neither SNP is associated with AGG overall (both P > 0.5).

Gene-based analysis
After correction for multiple testing, the gene-based analysis returned three significant results (Supplementary Table 16): ST3GAL3 (ST3 beta-galactoside alpha-2,3-sialyltransferase 3; P = 1.6E-06), PCDH7 (protocadherin 7; P = 2.0E-06), and IPO13 (importin 13; P = 2.5E-06). ST3GAL3 codes for a type II membrane protein that is involved in catalyzing the transfer of sialic acid from CMP-sialic acid to galactose-containing substrates. ST3GAL3 has been implicated in 107 GWASs, most notably on intelligence and educational attainment. The top SNP in ST3GAL3 (rs2485997; P = 2.48E-06) is in strong LD (r 2 > 0.8) with several other SNPs inside the gene body of ST3GAL3 and in moderate LD (r 2 > 0.6) with SNPs in several neighboring genes (Supplementary Fig. 3). PCDH7 codes for a protein that is hypothesized to function in cell-cell recognition and adhesion. PCDH7 has been implicated in 196 previous GWASs, for example on educational attainment and adventurousness. The top SNP for PCDH7 (rs13138213; P = 1.44E-06) is in strong LD (r 2 > 0.8) with a small number of other closely located SNPs, and the signal for the gene-based test appears to be driven by two independent loci (Supplementary Fig. 4). IPO13 codes for a nuclear transport protein. IPO13 has been implicated in the UKB GWASs on intelligence and on whether a person holds a college or university degree. The top SNP (rs3791116; P = 1.19E-05) is in moderate-to-strong LD with multiple SNPs (Supplementary Fig. 5), including SNPs in the neighboring ST3GAL3 gene.

Polygenic prediction
In children, 11 out of 16 PGSs were significantly correlated with mother-reported AGG in 7-year olds (Fig. 2) after correction for multiple testing. The scores explained between 0.036 and 0.44% of the phenotypic variance. The significant correlations consistently emerged when scores including SNPs with P values above 0.002 in the discovery GWAS were considered. In the retrospective assessments of adolescent CD, the PGS calculated at various thresholds (Fig. 3) explained up to 0.2% of the variance in symptom sum scores. Generally, CD is significantly predicted at most thresholds, although, as we would expect based on the SNP-heritability of AGG overall , the proportion of explained variance is small.

Genetic correlation with external phenotypes
Genetic correlations between AGG overall and a set of preselected external phenotypes are shown in Fig. 4 and Supplementary Table 17. These phenotypes can broadly be grouped into psychiatric and psychological traits, substance use, cognitive ability, anthropometric traits, classic biomarkers of AGG, reproductive traits, and sleeping behavior. We included childhood phenotypes (e.g., birth weight and childhood IQ) and disorders (e.g., ADHD and autism spectrum disorder [ASD]), but the majority of phenotypes were adult characteristics or characteristics measured in adult samples. After correction for multiple testing, 36 phenotypes showed a significant r g with AGG overall (P < 0.02). In general, the highest positive correlations were seen with psychiatric traits, notably ADHD, ASD, and major depressive disorder (MDD). The largest negative genetic correlations were found for age at smoking initiation, childhood IQ, and age at first birth. Based on the biomarker-aggression literature, we tested for the presence of genetic correlations between AGG overall and lipids, heart rate, heart rate variability, and testosterone levels. Very low genetic correlations were observed between AGG overall and these biomarkers, and in many cases the sign of the genetic correlation was opposite to what was expected based on the literature on biomarkers of AGG.
Fig. 2 Proportion of explained variance (vertical axis) in childhood aggression at age 7 by polygenic scores from the overall GWAMA for multiple P value thresholds (horizontal axis). Numbers above the bars represent unadjusted P values for two-sided test of significance.

Stratified assessment of childhood aggressive behavior
Separate meta-analyses were carried out for raters, instruments, and age. None of these GWAMAs returned genome-wide significant hits (see Supplementary Fig. 6). Estimates of h 2 SNP for rater-specific assessment of AGG are shown in Supplementary Table 18. The lowest h 2 SNP was observed for father-reported AGG (h 2 SNP = 0.04; SE = 0.03) and the highest for teacher-reported AGG (h 2 SNP = 0.08; SE = 0.02). We estimated r g between rater-specific assessments of AGG, except for father-reported AGG, which returned a non-significant h 2 SNP . A substantial genetic correlation was observed between AGG Mother and AGG Teacher (r g = 0.81; SE = 0.11). Moderate genetic correlations were observed between AGG Self and AGG Mother (r g = 0.67; SE = 0.10), and between AGG Self and AGG Teacher (r g = 0.46; SE = 0.13). Both genetic correlations involving self-reported AGG were significantly lower than 1.
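The comparison of each rater-specific r g against 1 can be checked with a simple Wald-type calculation from the reported estimates and standard errors. This is an illustrative approximation that assumes normally distributed estimates; it is not necessarily the test used in the study, and the function name is hypothetical.

```python
from math import erfc, sqrt

def rg_vs_one(rg, se):
    """Wald-type comparison of a genetic correlation against 1.

    Returns the Z statistic for (1 - rg), its one-sided P value under a
    normal approximation, and the 95% confidence interval of rg.
    """
    z = (1.0 - rg) / se
    p_one_sided = 0.5 * erfc(z / sqrt(2))
    ci = (rg - 1.96 * se, rg + 1.96 * se)
    return z, p_one_sided, ci

# Estimates and standard errors as reported in the text.
reported = {
    "mother-teacher": (0.81, 0.11),
    "self-mother": (0.67, 0.10),
    "self-teacher": (0.46, 0.13),
}
for pair, (rg, se) in reported.items():
    z, p, (lo, hi) = rg_vs_one(rg, se)
    print(f"{pair}: z = {z:.2f}, P = {p:.2g}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Under this approximation the interval for the mother-teacher correlation includes 1, whereas both intervals involving self-report lie below 1, in line with the statements above.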
We performed a GWAMA across all GWASs where an ASEBA scale was used (AGG ASEBA ) and another GWAMA across all GWASs for the SDQ (AGG SDQ ). SNP-heritabilities for AGG ASEBA and AGG SDQ were 0.031 (SE = 0.0099) and 0.026 (SE = 0.0086), respectively. The GWAMAs were insufficiently powered to estimate r g across instrument-specific assessment of AGG.
Age-specific GWAMAs were performed for mother- and self-reported AGG, which made up 77.5% of the data. Mother-reported data were split into seven age-bins and self-reported data into three (Supplementary Table 12). Estimates of the h 2 SNP for each age-specific GWAMA can be found in Supplementary Table 19. For mother-reported AGG, h 2 SNP ranged between 0.012 and 0.078. For self-reported AGG, the highest h 2 SNP was seen for the retrospective data (h 2 SNP = 0.12; SE = 0.03), which also showed a significantly inflated intercept (1.05; SE = 0.01). r g could only be estimated between AGG M7 , AGG S13 , and AGG SR (Supplementary Table 20).

Genetic correlation between rater-specific assessment of AGG and external phenotypes
We estimated rater-specific r g s with the external phenotypes, except for father-reported AGG, and tested for each external phenotype whether these r g s could be constrained to be equal to zero. For 31 out of 46 external phenotypes, constraining the r g s to be equal to zero for all three raters resulted in a significant reduction in model fit (Supplementary Table 21), indicating that, for these external phenotypes, at least one rater has an r g that is significantly different from zero.
Next, we tested for each external phenotype whether the three rater-specific r g s with the external phenotypes could be constrained to be equal across mothers, teachers, and self-ratings. For ADHD, ASD, MDD, schizophrenia, well-being, and self-reported health, constraining the r g s to be equal across raters resulted in significantly worse model fit (Supplementary Table 21). For all these phenotypes, r g s with teacher-reported AGG were consistently lower compared to mother- and self-reported AGG (Supplementary Fig. 7 and Supplementary Table 17). For lifetime cannabis use, genetic correlations also could not be constrained to be equal across raters. Here, a relatively strong r g was found with self-reported AGG (r g = 0.36; SE = 0.08) compared to teacher- (r g = 0.13; SE = 0.07) and mother-reported AGG (r g = 0.08; SE = 0.08).

DISCUSSION
We present the largest GWAMA of childhood AGG to date. The gene-based analysis implicated three genes, PCDH7, ST3GAL3, and IPO13, based on the overall meta-analysis (AGG overall ), which did not return genome-wide significant SNPs. Lead SNPs in the implicated genes were related to educational outcomes, but did not reach genome-wide significance, and these loci require further evidence before being considered as AGG risk variants. PGS predicted childhood AGG and retrospectively assessed adolescent CD. Stratified analyses within AGG generally returned moderate-to-strong genetic correlations across raters. We found substantial genetic correlations between AGG overall and a list of preselected external phenotypes from various domains, including psychiatry and psychology, cognition, and anthropometric and reproductive traits. Most notable was the perfect r g between AGG overall and ADHD (r g = 1.00; SE = 0.07). This is in line with the moderate-to-strong phenotypic correlations that have consistently been found across sex-, rater-, age-, and instrument-specific assessment of AGG with attention problems and hyperactivity [15]. Significant genetic correlations were further observed with other psychiatric and psychological traits (range r g : 0.19-0.55). Negative genetic correlations (r g = ~−0.5) were found with all three traits from the cognitive domain. Genetic correlations were positive with smoking initiation (r g = 0.55; SE = 0.04) and smoking quantity (r g = 0.46; SE = 0.06), and negative with age at smoking initiation (r g = −0.60; SE = 0.09). We examined genetic correlations with classical biomarkers of AGG. Higher levels of aggression have been associated with lower levels of LDL [40] and lower resting heart rate [41,42]. We found a positive, albeit weak, r g between AGG overall and LDL (r g = 0.15; SE = 0.07), which has the opposite sign to what was expected based on the literature [39]. 
More broadly, except for HDL (r g = −0.13; SE = 0.07), all measures of lipid levels returned significant positive, albeit weak, r g s with AGG overall (r g < 0.2). No heart rate measure showed a significant genetic correlation with AGG overall . The relationship between testosterone levels and (childhood) AGG in the literature is, at best, unclear. A positive association between AGG and testosterone is often assumed, but the relation may be more complex [43]. Both positive and negative phenotypic correlations have been found and seem context-dependent [44]. We found significant negative r g s between AGG overall and testosterone levels in males and females (|r g | < 0.15). These should be interpreted with some caution because of the design of the GWA studies: AGG was measured in children and young adolescents whereas testosterone levels were measured in adults in the UK Biobank [45], and genetic stability of testosterone levels might be low, at least for males [46]. Genetic correlations with reproductive traits showed a positive relation with having more children (r g = 0.27; SE = 0.08) and having offspring earlier in life (r g = −0.60; SE = 0.06), tending to confirm that not all associated outcomes are harmful.
The stratified design of our study also allowed for examination of the genetic etiology of AGG in subsets of the data and examination of genetic correlations among raters. We found a high genetic correlation between AGG Mother and AGG Teacher (r g = 0.81; SE = 0.11). However, the 95% confidence interval covers 1, which makes these results hard to reconcile with previous findings of rater-specific additive genetic effects in childhood AGG [47]. Most external phenotypes showed comparable r g s with mother-, self-, and teacher-reported AGG. For ADHD, ASD, MDD, schizophrenia, well-being, and self-reported health, r g s differed significantly across raters. Weaker r g s were consistently found in teacher-reported AGG compared to mother- and self-reported AGG. These findings indicate the presence of rater-specific effects when considering the genetic correlation of AGG with other outcomes. r g s are generally stronger in the psychopathology and psychological domains. A lack of power, however, seems insufficient to explain why we found weaker r g s between AGG Teacher and phenotypes from these two domains. Other phenotypes, like smoking behavior, educational attainment, or age at first birth, are, like psychopathological phenotypes, highly genetically correlated with AGG overall , but, unlike psychopathologies, have near identical r g s across raters. The rater-specific effects on r g s between childhood AGG and external phenotypes might be limited to psychopathologies, and future research into the genetics of childhood psychopathology might consider these nuances when childhood AGG is assessed from various sources, be that multiple raters, instruments, or ages.
Despite the considerable sample sizes, we were still underpowered to compute genetic correlations with external phenotypes while stratifying AGG over age or instrument. Age-stratified GWASs in larger samples across development are a desirable target for future research. Because genetic correlations can be computed between phenotypes for which a well-powered GWAS is available, age-stratified GWASs of many developmental phenotypes (behavioral, cognitive, and neuroscientific) can be leveraged to better understand the development of childhood traits.
We note that multivariate results should be interpreted with some caution. While combining data from correlated traits can indeed improve power to identify genome-wide associations, interpreting the phenotype may not be straightforward. In the current GWAMA, we have referred to our phenotype as "aggressive behavior" and interpreted the results accordingly. AGG, however, is an umbrella term that has been used to identify a wide range of distinct-though correlated-traits and behaviors [1].
GWASs are increasingly successful in identifying genomic loci for complex human traits [48]. In psychiatry too, genetic biomarkers are increasingly thought of as promising for both research and treatment. Genetic risk prediction holds promise for adult psychiatric disorders [30] and it seems reasonable to expect the same for childhood disorders. Here we found that PGSs explain up to 0.44% of the phenotypic variance in AGG in 7-year-olds and 0.2% of the variance in retrospectively reported adolescent CD. Note that differences in ages, instrument, and local best practices have led to differences in explained variance. Future studies may explore the utility of these PGSs in illuminating pleiotropy between AGG overall and other traits. A limiting factor in this regard is the relatively low SNP-heritability, which puts an upper bound on the predictive accuracy of PGSs. Since measurement error suppresses SNP-heritability, better measurement may offer an avenue to higher powered GWAS, and subsequently to better PGS. Furthermore, sample sizes for developmental phenotypes, including AGG, may need to increase by one to two orders of magnitude before PGS become useful for individual patients.
Despite our extensive effort, the first genome-wide significant SNP for childhood AGG has yet to be found. Even in the absence of genome-wide significant loci, however, GWASs aid in clarifying the biology behind complex traits. Our results show that, even without genome-wide significant hits, a GWAS can be powerful enough to illuminate the genetic etiology of a trait in the form of r g s with other complex traits. Non-significant associations are expected to capture part of the polygenicity of a trait [31] and various follow-up analyses have been developed for GWASs that do not require, but are aided by, genome-wide significant hits [49]. PGSs aggregate SNP effects into a weighted sum that indicates a person's genetic liability to develop a disorder. While their clinical application is still limited in psychiatric disorders, they can already aid in understanding the pleiotropy among psychiatric and other traits [30]. Similarly, summary statistics-based genetic correlations (r g ) provide insight into the genetic overlap between complex traits [29,50].

CODE AVAILABILITY
Code for the meta-analyses and follow-up analyses is available from the corresponding author upon request.