Disentangling polygenic associations between attention-deficit/hyperactivity disorder, educational attainment, literacy and language

Interpreting polygenic overlap between ADHD and both literacy-related and language-related impairments is challenging as genetic associations might be influenced by indirectly shared genetic factors. Here, we investigate genetic overlap between polygenic ADHD risk and multiple literacy-related and/or language-related abilities (LRAs), as assessed in UK children (N ≤ 5919), accounting for genetically predictable educational attainment (EA). Genome-wide summary statistics on clinical ADHD and years of schooling were obtained from large consortia (N ≤ 326,041). Our findings show that ADHD polygenic scores (ADHD-PGS) were inversely associated with LRAs in ALSPAC, most consistently with reading-related abilities, and explained ≤1.6% of the phenotypic variance. These polygenic links were then dissected into ADHD effects shared with EA and ADHD effects independent of EA, using multivariable regressions (MVR). Conditional on EA, polygenic ADHD risk remained associated with multiple reading and/or spelling abilities, phonemic awareness and verbal intelligence, but not with listening comprehension and non-word repetition. Using conservative ADHD instruments (P-threshold < 5 × 10⁻⁸), this corresponded, for example, to a 0.35 SD decrease in pooled reading performance per log-odds increase in ADHD liability (P = 9.2 × 10⁻⁵). Using subthreshold ADHD instruments (P-threshold < 0.0015), these effects became smaller, with a 0.03 SD decrease per log-odds increase in ADHD risk (P = 1.4 × 10⁻⁶), although the predictive accuracy increased. However, polygenic ADHD effects shared with EA were of equal strength and at least equal magnitude compared to those independent of EA, for all LRAs studied, and were detectable using subthreshold instruments. Thus, ADHD-related polygenic links with LRAs are to a large extent due to shared genetic effects with EA, although there is evidence for an ADHD-specific association profile, independent of EA, that primarily involves literacy-related impairments.


Introduction
Children with Attention-Deficit/Hyperactivity Disorder (ADHD) often experience difficulties mastering literacy-related and/or language-related abilities (LRAs) [1–3]. It has been estimated that up to 40% of children diagnosed with clinical ADHD also suffer from reading disability (RD, also known as developmental dyslexia) and vice versa [4]. The spectrum of affected LRAs in ADHD may, however, also include writing [5,6], spelling [7,8], syntactic [9,10] and phonological [9,10] abilities. Both clinical ADHD and RD are complex childhood-onset neurodevelopmental conditions that affect about 5% and 7% of the general population, respectively [11,12]. ADHD is characterised by hyperactive, inattentive and impulsive symptoms [13], whereas decoding and/or reading comprehension deficits are prominent in individuals with RD [14].
To interpret the comorbidity of ADHD and RD, a multiple-deficit model including shared underlying aetiologies has been proposed, involving both genetic and environmental influences [15]. This model is supported by twin studies suggesting that the co-occurrence of ADHD symptoms and reading deficits is, to a large extent, attributable to shared genetic influences [16–18]. Further twin research suggests that the genetic covariance between reading difficulties and ADHD is largely independent of genetic factors shared with IQ [19], although it is not known whether these findings extend to a wider spectrum of LRAs, beyond reading abilities. Interpreting polygenic ADHD-LRA overlap captured by markers on genotyping arrays is, however, more challenging. There is strong evidence that genetically predicted educational attainment (EA) [20] shares genetic variability with both ADHD [21] and reading abilities [22,23]. Genetically predicted EA is a genetic proxy not only of cognitive abilities but also of socioeconomic status [20], showing associations with, for example, maternal smoking during pregnancy, parental smoking, household income or watching television [24]. Thus, observed genetic associations between ADHD and reading abilities may reflect solely shared genetic variation with EA rather than any more specific neurocognitive mechanism. In other words, polygenic associations might be inflated or even induced [25] by genetically predictable traits that are related to both ADHD and reading abilities (or other LRAs).
Here, we (a) study polygenic links between clinical ADHD and a wide range of population-ascertained literacy-related and language-related measures as captured by common variation, (b) evaluate to what extent such links reflect a shared genetic basis with EA and (c) assess whether there is support for shared genetic factors between clinical ADHD and LRAs conditional on genetically predicted EA.
The ADHD polygenic scores (ADHD-PGS) studied here are based on ADHD genome-wide association study (GWAS) summary statistics from two large independent ADHD samples, the Psychiatric Genomics Consortium (PGC) and the Danish Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH), and from their combination. Associations between ADHD-PGS and a wide spectrum of population-based literacy-related and language-related measures, covering reading, spelling, phonemic awareness, listening comprehension, non-word repetition and verbal intelligence skills, are examined in a sample of children from the UK Avon Longitudinal Study of Parents and Children (ALSPAC). Applying multivariable regression (MVR) techniques, analogous to Mendelian Randomisation (MR) approaches [26], we disentangle associations between polygenic ADHD risk and LRA measures and estimate ADHD effects that are independent of and shared with genetically predicted years of schooling, using summary statistics from the Social Science Genetic Association Consortium (SSGAC).

Methods and materials
Literacy-related and language-related abilities in the general population
LRAs were assessed in children and adolescents from ALSPAC, a UK population-based longitudinal pregnancy-ascertained birth cohort (estimated birth date: 1991–1992, Supplementary Information) [27,28]. Ethical approval was obtained from the ALSPAC Law-and-Ethics Committee (IRB00003312) and the Local Research-Ethics Committees. Written informed consent was obtained from a parent or individual with parental responsibility and assent (and for older children consent) was obtained from the child participants.

Phenotype information
Thirteen measures capturing LRAs related to reading, spelling, phonemic awareness, listening comprehension, non-word repetition and verbal intelligence were assessed in 7- to 13-year-old ALSPAC participants (N ≤ 5919, Table 1) using both standardised and ALSPAC-specific instruments. Detailed descriptions of all LRA measures are available in Table 1 and the Supplementary Information. All LRA scores were rank-transformed to allow comparisons of genetic effects across psychological instruments with different distributions (Supplementary Information). Phenotypic correlations, estimated as Pearson correlation coefficients, were comparable for untransformed and rank-transformed scores (Table S1). To account for multiple testing, we estimated the effective number of phenotypes studied using Matrix Spectral Decomposition (MatSpD) [29], which revealed seven independent measures (experiment-wide error rate of 0.007).
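The MatSpD step can be illustrated with a minimal Python sketch, assuming the phenotypic correlation matrix of the LRA measures is available as a NumPy array. The function names are hypothetical, and the sketch shows two published spectral-decomposition estimators of the effective number of tests (Nyholt's and Li and Ji's); which estimator produced the seven independent measures reported here is not stated in this section. The experiment-wide threshold then follows as 0.05 divided by the effective number of tests, e.g. 0.05 / 7 ≈ 0.007.

```python
import numpy as np


def effective_number_of_tests(corr):
    """Estimate the effective number of independent phenotypes (Meff) from the
    eigenvalues of a phenotypic correlation matrix (hypothetical helper)."""
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    # Eigenvalues of the symmetric correlation matrix; rounding removes
    # floating-point noise (e.g. 3.9999999999999996) that would distort the
    # floor() step of the Li & Ji estimator.
    lam = np.round(np.linalg.eigvalsh(corr), 10)

    # Nyholt (2004): Meff = 1 + (M - 1) * (1 - Var(lambda) / M).
    # Eigenvalues of a correlation matrix average 1, so the sample variance
    # equals sum((lambda - 1)^2) / (M - 1).
    meff_nyholt = 1 + (m - 1) * (1 - np.var(lam, ddof=1) / m)

    # Li & Ji (2005): Meff = sum_i [I(|lambda_i| >= 1) + (|lambda_i| - floor(|lambda_i|))]
    abs_lam = np.abs(lam)
    meff_li_ji = np.sum((abs_lam >= 1) + (abs_lam - np.floor(abs_lam)))

    return {"nyholt": float(meff_nyholt), "li_ji": float(meff_li_ji)}


def experiment_wide_alpha(meff, alpha=0.05):
    """Bonferroni-type threshold based on the effective number of tests."""
    return alpha / meff


# Usage: three illustrative measures, two of which are strongly correlated.
corr = np.array([[1.0, 0.8, 0.1],
                 [0.8, 1.0, 0.1],
                 [0.1, 0.1, 1.0]])
meff = effective_number_of_tests(corr)
print(meff, experiment_wide_alpha(meff["nyholt"]))
```

The two estimators can disagree on small matrices such as this one (Li and Ji's only drops below M when some eigenvalue reaches 2), which is why it matters to know which one a study reports.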
For sensitivity analysis, we excluded 188 children with an ADHD diagnosis at age 7, based on the Development and Wellbeing Assessment (DAWBA) [30] (Supplementary Information).

Genetic analyses
ALSPAC participants were genotyped using the Illumina HumanHap550 quad chip genotyping platforms, and genotypes were called using the Illumina GenomeStudio software. Genotyping, imputation and genome-wide association analysis details are described in the Supplementary Information and Table 2.

Clinical ADHD summary statistics
Psychiatric Genomics Consortium (PGC). GWAS summary statistics were obtained from a mega-analysis of clinical ADHD [31], conducted by the PGC (4163 cases and 12,040 controls/pseudo-controls).
Combined PGC and iPSYCH ADHD sample (PGC + iPSYCH). To maximise power, we also analysed meta-GWAS summary statistics from an ADHD sample containing both PGC and iPSYCH participants [21] (20,183 cases, 35,191 controls/pseudo-controls).
Educational attainment (SSGAC). GWAS summary statistics for EA [20] (discovery and replication sample combined, excluding ALSPAC and 23andMe samples, N = 326,041) were obtained from the SSGAC. EA was assessed as years of schooling [20]. A detailed sample description is available in Table 2 and the Supplementary Information.

Genome-wide complex trait analysis
SNP-h² and genetic correlations (rg) between LRAs were estimated using Restricted Maximum Likelihood (REML) analyses [34,35] as implemented in the Genome-wide Complex Trait Analysis (GCTA) software [36], including only individuals with a genetic relationship < 0.05 [34]. For this study, we selected only LRAs with evidence for SNP-h² and a sample size of N > 4000 (Table S2).
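The relatedness filter can be sketched in Python as follows. This is a simplified, hypothetical illustration rather than GCTA itself: it builds a genetic relationship matrix from standardised allele counts (the off-diagonal form used by GREML-type analyses) and then greedily retains individuals whose relationship with everyone already retained is below 0.05. GCTA's own pruning option (`--grm-cutoff`) may choose which member of a related pair to drop differently.

```python
import numpy as np


def genetic_relationship_matrix(genotypes):
    """GRM from an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Each SNP is standardised as (x - 2p) / sqrt(2p(1 - p)), with p the sample
    allele frequency, and relationships are averaged over SNPs.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0
    polymorphic = (p > 0) & (p < 1)  # monomorphic SNPs have zero variance
    x, p = x[:, polymorphic], p[polymorphic]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / z.shape[1]


def unrelated_subset(grm, cutoff=0.05):
    """Greedy pruning (hypothetical): keep individual i only if its relationship
    with every individual already kept is below the cutoff."""
    kept = []
    for i in range(grm.shape[0]):
        if all(grm[i, j] < cutoff for j in kept):
            kept.append(i)
    return kept
```

A greedy pass is order-dependent; dedicated tools try to retain as many individuals as possible, which the sketch does not attempt.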

Linkage disequilibrium score regression and correlation
Linkage Disequilibrium Score (LDSC) regression [37] was used to distinguish confounding biases from polygenic influences by examining the LDSC regression intercept. Unconstrained LD-score correlation analysis [38] was applied to estimate rg (Supplementary Information).
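The logic of the intercept check can be illustrated with an unweighted LD score regression, a deliberately simplified sketch. Under the LDSC model, the expected χ² statistic of SNP j is N·h²·ℓj / M + N·a + 1, where ℓj is its LD score, N the GWAS sample size, M the number of SNPs and a a confounding term; an intercept close to 1 is therefore consistent with polygenicity rather than confounding, whereas an intercept well above 1 points to biases such as population stratification. The LDSC software itself adds regression weights and block-jackknife standard errors, which are omitted here, and the function name is hypothetical.

```python
import numpy as np


def ld_score_regression(chi2, ld_scores, n, m):
    """Unweighted LD score regression (simplified sketch).

    Fits chi2_j = intercept + slope * l_j by ordinary least squares and
    converts the slope into SNP heritability via h2 = slope * M / N.
    """
    ld_scores = np.asarray(ld_scores, dtype=float)
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, np.asarray(chi2, dtype=float), rcond=None)
    return float(intercept), float(slope * m / n)


# Usage with noise-free simulated statistics: N = 50,000, M = 1,000,000,
# h2 = 0.2 and an intercept of 1.05 (mild inflation).
ld = np.linspace(1, 200, 1000)
chi2 = 1.05 + 50_000 * 0.2 * ld / 1_000_000
intercept, h2 = ld_score_regression(chi2, ld, n=50_000, m=1_000_000)
```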

Polygenic scoring analyses
ADHD-PGS [39,40] were created in ALSPAC using the independent PGC and iPSYCH GWAS summary statistics and, to maximise power, the GWAS summary statistics from the combined PGC + iPSYCH sample (Supplementary Information). ADHD-PGS have previously been linked to ADHD symptoms in ALSPAC participants [41]. Rank-transformed LRAs were regressed on Z-standardised ADHD-PGS (aligned to measure risk-increasing alleles) using ordinary least squares (OLS) regression (R stats package, R v3.2.0). The proportion of phenotypic variance explained is reported as the OLS regression R². Beta coefficients (β) for ADHD-PGS quantify the change in standard deviation (SD) units of LRA performance per one-SD increase in ADHD-PGS.
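The scoring and regression steps can be sketched in Python under stated assumptions: genotypes are LD-clumped allele counts already aligned to the ADHD risk-increasing allele; the PGS is a sum of risk-allele counts weighted by GWAS log odds ratios for SNPs below a P-value threshold; and "rank-transformed" is taken to mean a rank-based inverse normal transformation (Blom offsets), re-standardised so that β is in SD units. All function names are hypothetical, and the study itself used R rather than this code.

```python
from statistics import NormalDist

import numpy as np


def polygenic_score(genotypes, gwas_log_or, gwas_p, p_threshold):
    """Risk-allele counts weighted by GWAS log odds ratios, using only SNPs with
    P below the threshold (e.g. 5e-8 or 0.0015)."""
    selected = np.asarray(gwas_p) < p_threshold
    g = np.asarray(genotypes, dtype=float)[:, selected]
    return g @ np.asarray(gwas_log_or, dtype=float)[selected]


def rank_inverse_normal(values):
    """Rank-based inverse normal transformation with Blom offsets; ties get
    their average rank."""
    y = np.asarray(values, dtype=float)
    n = len(y)
    ranks = np.empty(n)
    ranks[np.argsort(y, kind="mergesort")] = np.arange(1, n + 1)
    # Replace ranks of tied values by their mean rank.
    _, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    ranks = (np.bincount(inverse, weights=ranks) / counts)[inverse]
    dist = NormalDist()
    return np.array([dist.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks])


def pgs_association(phenotype, pgs):
    """OLS of the transformed phenotype on the Z-standardised PGS.
    Returns beta (SD change in phenotype per SD of PGS) and R^2."""
    y = rank_inverse_normal(phenotype)
    y = (y - y.mean()) / y.std(ddof=1)
    pgs = np.asarray(pgs, dtype=float)
    x = (pgs - pgs.mean()) / pgs.std(ddof=1)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return float(coef[1]), float(r2)
```

Lowering `p_threshold` from 5e-8 to 0.0015 admits more SNPs into the score, which is the conservative-versus-subthreshold contrast used throughout the study.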

Multivariable regression analysis
To study the genetic association between ADHD and LRAs conditional on genetic influences shared with EA, we applied MVR. This technique is analogous to MR methodologies [26] and controls for collider bias [42] through the use of GWAS summary statistics. Technically, it involves regressing regression estimates from independent samples on each other [26] (Supplementary Information). In this study, we use MVR without inferring causality, because classical MR assumptions are violated [26] (see below).
Estimation of ADHD effects: We extracted regression estimates for selected ADHD instruments (conservative and subthreshold) from ADHD (PGC + iPSYCH), EA (SSGAC) and 13 LRA (ALSPAC) GWAS summary statistics. Analysing each set of variants independently, regression estimates for individual LRA measures (β) were regressed on both ADHD (β as lnOR) and EA regression estimates (β) using an OLS regression framework (R stats package, R v3.2.0). Outcomes were (1) an MVR regression estimate quantifying the change in SD units of LRA performance per log-odds increase in ADHD risk conditional on years of schooling (ADHD effect independent of EA), and (2) an MVR regression estimate quantifying the change in SD units of LRA performance per year of schooling as captured by ADHD instruments (ADHD effect shared with EA). The latter estimates capture shared genetic effects between ADHD, EA and LRAs, including (1) genetic confounding (i.e., genetically predictable EA influences both ADHD and LRAs), (2) mediation (i.e., genetically predictable ADHD influences LRAs indirectly through EA) and (3) biological pleiotropy (i.e., ADHD risk variants affect ADHD and EA through independent biological pathways). As ADHD risk and EA are inversely genetically related [21], ADHD effects shared with EA were reported as the change per missing year of schooling. To compare the magnitude of both MVR estimates, we also conducted analyses using fully standardised EA, ADHD and LRA regression estimates (Supplementary Information).
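The core MVR step, regressing per-variant LRA estimates on per-variant ADHD and EA estimates, can be sketched as follows. Assumptions: the three inputs are aligned per-variant effect estimates for the same instrument set; the fit is unweighted OLS through the origin, mirroring the constrained intercepts reported in the Results; and the EA coefficient is sign-flipped so that it reads per missing year of schooling. The function name and simulated inputs are hypothetical, and the standard errors are the usual OLS ones.

```python
import numpy as np


def mvr(beta_lra, beta_adhd, beta_ea):
    """Regress per-variant LRA effects on ADHD (log odds) and EA (years of
    schooling) effects without an intercept, i.e. with the intercept
    constrained to zero."""
    y = np.asarray(beta_lra, dtype=float)
    design = np.column_stack([beta_adhd, beta_ea]).astype(float)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return {
        # SD change in LRA per log-odds increase in ADHD risk, conditional on EA
        "beta_adhd": float(coef[0]), "se_adhd": float(se[0]),
        # SD change in LRA per *missing* year of schooling (sign flipped)
        "beta_ea": float(-coef[1]), "se_ea": float(se[1]),
    }


# Usage with simulated, noise-free instrument estimates.
rng = np.random.default_rng(42)
b_adhd = rng.normal(0, 0.05, size=200)
b_ea = -0.1 * b_adhd + rng.normal(0, 0.01, size=200)  # inverse ADHD-EA relationship
b_lra = -0.35 * b_adhd + 0.50 * b_ea                   # +0.50 SD per extra school year
result = mvr(b_lra, b_adhd, b_ea)
```

Because each instrument contributes one observation, the ADHD coefficient is estimated holding the EA effect of the same variants fixed, which is what separates ADHD effects independent of EA from those shared with it.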

Sensitivity analyses
As the directionality of the relationship between ADHD, EA and LRAs cannot be inferred in this study, we also examined the genetic association between EA and LRAs, conditional on ADHD, using MVR. Two sets of EA instruments (conservative and subthreshold, Table S8) were selected from EA (SSGAC) GWAS summary statistics, analogous to the selection of ADHD instruments, and MVR was conducted as described above. Note that we did not create LRA instrument sets, as GWAS summary statistics of LRAs were underpowered.

Attrition analysis
We carried out an attrition analysis in ALSPAC studying the genetic association between LRA missingness and polygenic ADHD risk, using both polygenic scoring analyses and MVR (Supplementary Information).
These findings indicate complex, potentially reciprocal cross-trait relationships (Fig. 2a) and violate MR causal modelling assumptions [26]. Consequently, ADHD instruments are not valid MR instruments, as they are not independent of EA.

Multivariable regression analyses
To disentangle the genetic overlap of polygenic ADHD risk with literacy-related and language-related measures into ADHD genetic effects independent of and shared with EA, we applied MVR [26] using ADHD instruments based on the most powerful ADHD discovery sample (PGC + iPSYCH) (Fig. 2b).
Using conservative ADHD instruments (Table S8), non-word reading accuracy at age 9 and pooled reading-related abilities were associated with polygenic ADHD risk, conditional on EA (Table 3). The latter translates, for example, into a decrease of 0.35 SD in pooled reading performance per log-odds increase in ADHD risk (β_ADHD = −0.35 (SE = 0.09), P = 9.2 × 10⁻⁵, P_het = 0.19), an effect that was considerably stronger than for other LRAs (P_mod = 0.011, Table S10).
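The caption of Fig. 2 states that pooled estimates were obtained through random-effects meta-regression. As a hedged illustration, the Python sketch below pools per-measure estimates with an intercept-only random-effects model using the DerSimonian-Laird estimator of the between-measure variance τ² and reports Cochran's Q, on which a heterogeneity P-value such as P_het is commonly based. The estimator choice is an assumption (R's metafor package, for instance, defaults to REML), the function name is hypothetical, and the sketch ignores the correlation between measures taken on the same children.

```python
import math

import numpy as np


def random_effects_pool(estimates, ses):
    """Intercept-only random-effects pooling with the DerSimonian-Laird tau^2."""
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(ses, dtype=float) ** 2
    w = 1.0 / v
    fixed = np.sum(w * y) / np.sum(w)               # inverse-variance (fixed) mean
    q = float(np.sum(w * (y - fixed) ** 2))         # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                   # between-measure variance
    w_re = 1.0 / (v + tau2)
    pooled = float(np.sum(w_re * y) / np.sum(w_re))
    se = math.sqrt(1.0 / np.sum(w_re))
    p = math.erfc(abs(pooled / se) / math.sqrt(2))  # two-sided Wald P
    # P_het would be the upper tail of a chi-square(df) distribution at q.
    return {"pooled": pooled, "se": se, "p": p, "tau2": float(tau2), "q": q, "df": df}
```

The chi-square tail probability for Q can be obtained, for example, with `scipy.stats.chi2.sf(q, df)`, which is left out to keep the sketch dependent on NumPy only.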
Using subthreshold ADHD instruments (Table S8), polygenic ADHD effects on LRA performance, conditional on EA, were detectable for all reading-related and spelling-related measures, phonemic awareness and verbal intelligence, but not for other LRAs such as listening comprehension and non-word repetition (Table 3). Evidence was strongest for pooled reading and spelling abilities (Table 3, minimum P = 1.1 × 10⁻⁸). However, the observed effects were smaller in magnitude than those captured by conservative ADHD instruments, with, for example, a 0.03 SD decrease in pooled reading performance per log-odds increase in ADHD risk (β_ADHD = −0.03 (SE = 0.01), P = 1.4 × 10⁻⁶, Table 3). Comparing ADHD-specific effects on reading and spelling with ADHD-specific effects on all other LRAs provided evidence for effect differences (P_mod = 0.016), with stronger ADHD effects on literacy-related abilities, in particular spelling (Table S10).
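The P_mod comparisons can be read as a meta-regression with a binary moderator (literacy-related versus other LRAs). The sketch below is a hypothetical illustration rather than the study's exact model: it fits an inverse-variance weighted regression of per-measure MVR estimates on the moderator, optionally adding a between-measure variance τ² to each sampling variance, and tests the moderator coefficient with a two-sided Wald z-test. The function name and example values are assumptions.

```python
import math

import numpy as np


def moderator_test(estimates, ses, moderator, tau2=0.0):
    """Weighted meta-regression: estimate_i = b0 + b1 * moderator_i.
    b1 is the difference between moderator groups; P_mod tests b1 = 0."""
    y = np.asarray(estimates, dtype=float)
    w = 1.0 / (np.asarray(ses, dtype=float) ** 2 + tau2)
    design = np.column_stack([np.ones_like(y), np.asarray(moderator, dtype=float)])
    xtwx = design.T @ (design * w[:, None])
    coef = np.linalg.solve(xtwx, design.T @ (w * y))
    se = math.sqrt(np.linalg.inv(xtwx)[1, 1])  # sampling variances treated as known
    z = coef[1] / se
    return {"difference": float(coef[1]), "se": se, "p_mod": math.erfc(abs(z) / math.sqrt(2))}


# Usage: two literacy measures (moderator = 1) versus two other LRAs (moderator = 0).
res = moderator_test([-0.4, -0.4, -0.1, -0.1], [0.05] * 4, [1, 1, 0, 0])
```

Here the literacy measures are 0.3 SD units more negative than the other LRAs, with a standard error of 0.05 for that difference.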
Polygenic ADHD effects shared with EA were identified for all LRAs studied using subthreshold, but not conservative, ADHD instruments (Table 3). This translates, for example, into a further 0.50 SD decrease in pooled reading performance per missing school year (β_EA = −0.50 (SE = 0.09), P = 4.9 × 10⁻⁸, Table 3). Thus, the observed association between polygenic ADHD risk and both listening comprehension and non-word repetition is fully attributable to genetic effects shared with EA (Table 3). In contrast to ADHD-specific effects, ADHD effects shared with EA showed no evidence for effect differences between literacy-related and other LRAs (P = 0.31).

Conducting MVR with fully standardised estimates
MVR analyses using fully standardised estimates showed that ADHD effects shared with EA were as large as or even larger than ADHD-specific effects (Fig. 2c, Table S9).
Using an analogous approach, we disentangled the genetic overlap between polygenic EA and LRAs into genetic EA effects independent of and shared with ADHD, based on EA instruments (Fig. S1). There was strong evidence for EA effects shared with ADHD using subthreshold, but not conservative, EA instruments (Table S11). In fully standardised analyses, the magnitude of ADHD genetic effects shared with EA (captured by ADHD instruments) was largely consistent with that of EA genetic effects shared with ADHD (captured by EA instruments) (Tables S9 and S11).
There was little evidence supporting the inclusion of regression intercepts in MVR, which would imply additional genetic effect variation in LRA estimates not captured by either ADHD or EA effect estimates, based on the selected instruments. Therefore, all MVRs were performed using constrained intercepts [26].
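Whether an intercept is needed can be checked by refitting the MVR with a free intercept and testing it, as in the minimal sketch below. The function name and simulated inputs are hypothetical, and the Wald test uses a normal approximation, which is reasonable only when many instruments are available; a small, non-significant intercept is consistent with constraining it to zero.

```python
import math

import numpy as np


def mvr_intercept_test(beta_lra, beta_adhd, beta_ea):
    """Fit beta_lra = b0 + b1 * beta_adhd + b2 * beta_ea by OLS and return the
    intercept b0, its standard error and a two-sided P-value."""
    y = np.asarray(beta_lra, dtype=float)
    design = np.column_stack([np.ones_like(y), beta_adhd, beta_ea]).astype(float)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(design.T @ design)[0, 0])
    return float(coef[0]), se, math.erfc(abs(coef[0] / se) / math.sqrt(2))


# Usage with simulated instruments: no true intercept versus a true intercept of 0.05.
rng = np.random.default_rng(7)
b_adhd = rng.normal(0, 0.05, size=500)
b_ea = rng.normal(0, 0.02, size=500)
noise = rng.normal(0, 0.01, size=500)
no_intercept = mvr_intercept_test(-0.3 * b_adhd + 0.4 * b_ea + noise, b_adhd, b_ea)
with_intercept = mvr_intercept_test(0.05 - 0.3 * b_adhd + 0.4 * b_ea + noise, b_adhd, b_ea)
```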

Attrition in ALSPAC
Analyses of sample drop-out in ALSPAC, exemplified by missing reading accuracy and comprehension scores at age 7 (WORD), revealed a positive genetic association between missingness and polygenic ADHD risk (min P = 1.4 × 10⁻⁸, Supplementary Information, Tables S12 and S13).
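A genetic association between drop-out and polygenic risk can be sketched as a logistic regression of a missingness indicator on the standardised ADHD-PGS. The logistic model, the Newton-Raphson fitting, the function name and the simulated data are assumptions made for illustration; the study's exact specification is described in its Supplementary Information.

```python
import math

import numpy as np


def missingness_association(missing, pgs, n_iter=25):
    """Logistic regression logit P(missing) = b0 + b1 * z(PGS), fitted by
    Newton-Raphson. Returns b1 (log odds per SD of PGS), its SE and a Wald P."""
    y = np.asarray(missing, dtype=float)
    pgs = np.asarray(pgs, dtype=float)
    design = np.column_stack([np.ones_like(pgs), (pgs - pgs.mean()) / pgs.std(ddof=1)])
    b = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-design @ b))
        info = design.T @ (design * (mu * (1 - mu))[:, None])  # Fisher information
        b = b + np.linalg.solve(info, design.T @ (y - mu))      # Newton step
    mu = 1.0 / (1.0 + np.exp(-design @ b))
    info = design.T @ (design * (mu * (1 - mu))[:, None])
    se = math.sqrt(np.linalg.inv(info)[1, 1])
    return float(b[1]), se, math.erfc(abs(b[1] / se) / math.sqrt(2))


# Usage with simulated data: a higher PGS raises the odds of drop-out.
rng = np.random.default_rng(3)
score = rng.normal(size=5000)
p_missing = 1.0 / (1.0 + np.exp(-(-0.5 + 0.3 * score)))
missing = rng.random(5000) < p_missing
b1, se, p = missingness_association(missing, score)
```

A positive b1, as simulated here, corresponds to the pattern reported above: children with higher polygenic ADHD risk are more likely to have missing LRA scores.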

Discussion
This study identified strong and replicated evidence for an inverse association between polygenic ADHD risk and multiple population-based LRAs using a polygenic scoring approach. However, these associations involve shared genetic variation with genetically predictable EA. Accurate modelling of polygenic links using MVR techniques, conditional on EA, revealed an ADHD-specific association profile that primarily involves literacy-related impairments. Once shared genetic effects with EA were accounted for, polygenic ADHD risk was most strongly inversely associated with reading and/or spelling abilities, in addition to phonemic awareness and verbal intelligence, but not with listening comprehension and non-word repetition abilities. Importantly, genetic overlap between polygenic ADHD risk and all of the LRAs studied was inflated by genetic effects shared with EA. Using independent ADHD discovery samples, these findings show that genetic overlap between ADHD and literacy-related impairments observed in twin and family studies [16–18] can be extended to genetic associations, as captured by common variation in general population samples. The identified association profile suggests that not only reading-related abilities (including both word and non-word reading skills), but also phonological and spelling-related abilities share genetic aetiology with ADHD. These interrelated LRAs may, as hypothesised for RD, arise from a phonological impairment [48,49], which affects decoding and reading skills [50], but also spelling abilities [51]. However, reading abilities can, once developed, also shape phonological skills [52].
Fig. 2 Genetic relationships between ADHD, educational attainment and literacy-related and language-related abilities. ADHD Attention-Deficit/Hyperactivity Disorder, EA educational attainment, LRAs literacy-related and language-related abilities, PGC Psychiatric Genomics Consortium, iPSYCH The Lundbeck Foundation Initiative for Integrative Psychiatric Research, SSGAC Social Science Genetic Association Consortium, ALSPAC Avon Longitudinal Study of Parents and Children, MVR multivariable regression. (a) Hypothesised biological model of genetic relationships between ADHD, EA and LRAs reflecting complex, pleiotropic and reciprocal genetic links that prevent causal inferences. (b) Schematic MVR model assessing polygenic ADHD-LRA overlap independent of and shared with genetic effects for EA. (c) MVR estimates of ADHD-specific effects independent of EA and ADHD effects shared with EA on LRAs using standardised ADHD instruments: sets of conservative (P < 5 × 10⁻⁸) and subthreshold (P < 0.0015) ADHD instruments were extracted from ADHD (PGC + iPSYCH), EA (SSGAC) and LRA (ALSPAC) GWAS summary statistics. ADHD-specific effects independent of EA (β_ADHD) and ADHD effects shared with EA (β_EA) on LRAs were estimated with MVRs. To compare the magnitude of β_ADHD and β_EA, MVR analyses were conducted using standardised regression estimates (Supplementary Methods). β_ADHD estimates measure the change in LRA Z-score per Z-score in ADHD liability. β_EA estimates measure the change in LRA Z-score per Z-score in missing school years. MVR estimates based on raw genetic effect estimates are provided in Table 3. Pooled estimates for reading, spelling and global LRA measures (Table 1) were obtained through random-effects meta-regression. Only effects passing the experiment-wide significance threshold (P < 0.007) are shown, with corresponding 95% confidence intervals. No causality is inferred.
Table 3 Multivariable regression analysis of polygenic associations between ADHD and literacy-related and language-related abilities (raw estimates)
Note: Sets of conservative (P < 5 × 10⁻⁸) and subthreshold (P < 0.0015) ADHD instruments were extracted from ADHD (PGC + iPSYCH), EA (SSGAC) and LRA (ALSPAC) GWAS summary statistics. ADHD-specific effects independent of EA (β_ADHD) and ADHD effects shared with EA (β_EA) on LRAs were estimated with MVRs (Fig. 2b). ADHD effects shared with EA were assessed through EA genetic effect estimates of ADHD-associated variants and are presented with respect to missing school years. β_ADHD quantifies the change in LRA Z-score per log-odds increase in ADHD liability.
In addition, this study suggests that genetic associations between polygenic ADHD and LRAs reflect, at least partially, shared genetic influences with genetically predictable EA and that, equally likely, genetic associations between polygenic EA and LRAs share genetic influences with ADHD. The magnitudes of these shared effects, modelled with different MVR approaches, were comparable. This is consistent with reciprocal genetic influences between EA and ADHD (Fig. 2a) and supports an intergenerational multiple-deficit model proposed for reading disability [15,53]. Children growing up in disadvantaged environments, genetically predictable through polygenic EA scores [54], might be more vulnerable to psychiatric illness including ADHD [55], which in turn affects their LRAs. In addition, adolescents with ADHD might be more likely to leave school at an earlier age, with lower LRA performance and EA, and pass on an increased genetic load to their own children [56].
Here, we demonstrate that disentangling multivariate genetic interrelationships between ADHD, EA and LRAs using MVR can aid the interpretation of genetic overlap, while controlling for collider bias [42]. However, the detection of these polygenic associations using MVR was strongly governed by the choice of genetic variants. Conservative ADHD instruments identified large ADHD-specific effects on reading as a domain and little evidence for genetic effects shared with EA, although they had limited power [57]. They comprised only 15 independent SNPs, including variation within FOXP2, a gene that has been implicated in childhood apraxia of speech and expressive and receptive language impairments (http://omim.org/entry/602081) [58]. Subthreshold instruments, on the other hand, including thousands of variants, tagged ADHD-specific polygenic links with LRAs (conditional on EA) with smaller effects but higher predictive accuracy. However, these instruments also captured shared genetic effects with EA, affecting polygenic links between ADHD and all of the LRAs studied. These shared genetic influences were of equal strength and at least equal magnitude compared to ADHD-LRA associations independent of EA. In contrast, a previous twin study showed that the genetic covariance between ADHD and reading difficulties was largely independent of genetic effects shared with IQ [19], suggesting that our findings may also reflect socio-economic influences. Thus, to improve reading and, more generally, literacy-related deficits in children with ADHD, there is potentially a need for further intervention programmes targeting EA-independent underlying neurocognitive deficits, beyond general training programmes aimed at schooling outcomes [59].
In general, our findings are consistent with an omnigenic model [60] of complex trait architectures, compatible with a general factor model of psychopathology [61], including ADHD [62]. Under the omnigenic model, only the largest-effect variants are expected to reflect ADHD specificity, and these may thus tag the most trait-specific associations between ADHD and reading, independent of EA. The majority of variants, however, will capture pleiotropic (omnigenic) influences pointing to highly interconnected neural networks [60] that give rise to genetic confounding. Consequently, the majority of subthreshold variants, in both the ADHD and the EA instrument sets, are likely to represent highly powerful cross-trait genetic predictors that may enhance or even induce genetic overlap.
Finally, the methodological framework used in this work is relevant not only to studies investigating polygenic links between ADHD and LRAs, but also to many studies examining multivariate trait interrelationships that involve shared genetic effects with a genetically predictable confounder. Specifically, our findings suggest that lower variant selection thresholds can introduce unspecific genetic variance sharing that needs to be accounted for before identified associations can be interpreted in terms of underlying mechanisms, including shared genetic aetiologies. This is especially important as current guidelines for studying polygenic links with allelic scores recommend aggregating genetic variants across less stringent significance thresholds to maximise the genetic association between discovery and target samples [63,64].
This study has several limitations. Firstly, ALSPAC, like other cohort studies, suffers from attrition [65,66]. Sensitivity analyses showed that this is unlikely to bias our findings based on conservative instruments. However, links identified using subthreshold ADHD variants might have been underestimated, given that individuals with a genetic predisposition to ADHD (but also to smoking initiation, higher body mass index, neuroticism, schizophrenia and depression) are more likely to drop out [66]. Secondly, the strength of the genetic overlap between polygenic ADHD risk and LRAs may vary according to ADHD symptom domain levels, implicating especially inattentiveness [67], as well as according to the nature of the literacy-related or language-related ability involved (as we observed evidence for effect heterogeneity when combining all LRAs). It is conceivable that other verbal abilities not investigated in this study, such as grammar, expressive vocabulary or pragmatic skills, may also genetically overlap with ADHD. Furthermore, we only studied the extent to which shared genetic variance with EA affects the genetic association between ADHD and LRAs. However, using these instruments, we found little evidence for additional unaccounted-for genetic influences, i.e., effects not captured by either genetically predicted ADHD or EA. Finally, the power of available LRA GWAS summary statistics is still too low to generate genetic instruments supporting reverse models. Larger and more detailed clinical and population-based samples, as well as extensive multivariate variance analyses of the spectrum of LRAs (which are currently computationally expensive [68]), will help to further characterise the overlap between ADHD and literacy-related and language-related cognitive processes.

Conclusion
Polygenic associations between clinical ADHD and a range of LRAs are to a large extent attributable to genetic effects that are also shared with EA, especially when investigated with genetic variants typically selected for polygenic scoring approaches. Adjusting for these unspecific genetic effects reveals an ADHD-specific association profile that primarily involves literacy-related impairments.