## Introduction

Attention deficit/hyperactivity disorder (ADHD) is a common neurodevelopmental disorder that severely impairs the daily functioning of patients due to age-inappropriate levels of impulsivity and hyperactivity, and/or difficulties in focusing attention [1]. ADHD has a prevalence of 5–6% in childhood, and impairing symptoms persist into adulthood in around two-thirds of children with an ADHD diagnosis, with an estimated adult prevalence of around 3.4% [1, 2].

ADHD is a multifactorial disorder with heritability averaging 76% throughout the lifespan [3,4,5]. There is consistent evidence that both common and rare variants make an important contribution to the risk for the disorder [6,7,8,9,10,11]. Several genome-wide association studies (GWAS) and meta-analyses of these studies have been conducted [7], but only the largest GWAS meta-analysis (GWAS-MA) performed to date reported genome-wide significant loci [6]. This study concluded that common genetic variants (minor allele frequency, MAF > 0.01) account for 22% of the heritability of the disorder [6] and supported substantial genetic overlap between ADHD and other brain disorders and behavioral/cognitive traits [6, 12].

The presentation of ADHD symptoms changes from childhood to adulthood, with lower levels of hyperactivity in adulthood but a high risk for ongoing attention problems, disorganization, and emotional dysregulation [13, 14]. As in the general population, the pattern of psychiatric and somatic comorbid conditions in ADHD also changes substantially over time, with learning disabilities, oppositional defiant disorder, and conduct disorder being more prevalent in children, and substance use disorders, social phobia, insomnia, obesity, and mood disorders becoming more pronounced in adulthood [1, 15,16,17,18]. In addition, persistent ADHD in adults is, compared with the general population (and to cases with remitting ADHD), associated with higher risk for a wide range of functional and social impairments, including unemployment, accidents, and criminal behavior [7, 19,20,21,22,23].

Several risk factors measured in childhood predict the persistence of ADHD symptoms into adulthood, such as the presence of comorbid disorders, the severity of ADHD symptoms, being exposed to psychosocial adversity, as well as having a high polygenic risk score (PRS) for childhood ADHD [24,25,26,27,28]. Twin studies suggest that both stable and dynamic genetic influences affect the persistence of ADHD symptoms [4, 5, 29, 30]. However, the specific genetic factors that differentiate childhood ADHD from ADHD persisting into adulthood are not well understood due to the lack of longitudinal studies. Molecular studies, including the most recent GWAS-MA of ADHD [6], have been performed in children and adults either separately or jointly [6, 31,32,33,34,35,36,37,38,39,40], but large-scale analyses comparing their genetic basis are yet to be conducted.

## Material and methods

### Sample description

A total of 19 GWAS of ADHD comprising 49,560 individuals (17,149 cases and 32,411 controls), provided by the Psychiatric Genomics Consortium (PGC), the Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH), and the International Multi-centre persistent ADHD CollaboraTion (IMpACT), were analyzed. All participants were of European ancestry, had provided informed consent, and all sites had documented permission from local ethics committees.

The meta-analysis on persistent ADHD was conducted in 22,406 individuals (6,532 ADHD adult cases and 15,874 controls) using six datasets from the IMpACT consortium, two datasets from the PGC, and the adult subset from the iPSYCH cohort included in Demontis and Walters et al. [6]. The meta-analysis on ADHD in childhood included 27,154 individuals (10,617 cases and 16,537 controls), comprising two Brazilian and Spanish cohorts, seven datasets from the PGC, and the children subset from the iPSYCH cohort included in Demontis and Walters et al. [6]. All patients met DSM-IV/ICD-10 diagnostic criteria. In total, 7,086 new samples not included in Demontis and Walters et al. [6] were considered in the present study. Detailed information on each dataset is provided in Table S1 and in Supplementary Methods.

### GWAS and meta-analyses

Genotyping platforms and quality control (QC) filters for each of the datasets are shown in Table S1. Pre-imputation QC at the individual and SNP levels was performed using the Rapid Imputation and COmputational PIpeLIne with the default settings (https://sites.google.com/a/broadinstitute.org/ricopili/). Non-European ancestry samples, related and duplicated individuals, and subjects with sex discrepancies were excluded. Phasing of genotype data was performed using the SHAPEIT2 algorithm, and imputation for unrelated samples and trios was performed with MaCH, IMPUTE2, or MINIMAC3 (http://genome.sph.umich.edu/wiki/Minimac3) depending on software availability at the time of imputation (Table S1) [41,42,43]. The European ancestry panel of the 1000 Genomes Project using genome build hg19 was considered as reference for genotype imputation (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/). After imputation, the association with ADHD of genotype dosages was tested using logistic regression in PLINK 1.9 [44], assuming an additive genetic model and including sex, the first ten principal components, and other relevant covariates for each case-control study (Table S1). GWAS summary statistics were filtered prior to meta-analysis, excluding variants with MAF < 0.01 and imputation quality scores (INFO) ≤ 0.8. Inverse-variance weighted fixed-effects meta-analyses were conducted using METAL [45] and results were filtered by effective sample size > 70% of the total, defined as $N_{\mathrm{eff}} = \frac{2}{\frac{1}{N_{\mathrm{ca}}} + \frac{1}{N_{\mathrm{co}}}}$, where $N_{\mathrm{ca}}$ and $N_{\mathrm{co}}$ are the numbers of cases and controls [46]. The genome-wide significance threshold was set at P < 5.00E−08 to correct for multiple testing. Independent loci for variants exceeding this threshold were defined based on clumping using PLINK 1.9. Variants that were within ±250 kb of the index variant (the variant with the smallest P value in the region), with P value < 0.001, and with an estimated linkage disequilibrium (LD) of r2 > 0.2 with the index variant were assigned to a clump (p1 = 5.00E−08, p2 = 0.001, r2 = 0.2, kb = 250). Manhattan and Forest plots were generated using the “qqman” and “forestplot” R packages (R version 3.4.4), respectively. The LocusZoom software [47] was used to generate regional association plots.
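The effective sample size formula and the inverse-variance weighted fixed-effects combination described above can be illustrated with a minimal Python sketch. This is not the METAL implementation; the function names, the per-study effect values, and the way the 70% filter is applied per variant are assumptions made for illustration.

```python
import math


def effective_sample_size(n_cases, n_controls):
    """Neff = 2 / (1/Nca + 1/Nco), as defined in the text."""
    return 2.0 / (1.0 / n_cases + 1.0 / n_controls)


def ivw_fixed_effects(betas, ses):
    """Combine per-study log odds ratios with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, p_value


def passes_neff_filter(neff_with_variant, neff_all, min_fraction=0.70):
    """Assumed reading of the filter: keep a variant only if the studies
    reporting it contribute more than 70% of the total effective sample size."""
    return sum(neff_with_variant) > min_fraction * sum(neff_all)


# Hypothetical per-study effects for a single variant (illustrative values only).
beta, se, p_value = ivw_fixed_effects([0.10, 0.14, 0.08], [0.05, 0.07, 0.06])
print(f"beta={beta:.4f} se={se:.4f} p={p_value:.3g}")
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is dominated by the largest cohorts.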

Details of downstream analyses for top signals identified are provided in the online supplement and include conditional analysis, Bayesian credible set analysis, and functional characterization of the significant variants.

### SNP-based heritability (SNP-h2)

The SNP-h2 was estimated by single-trait LD score regression using summary statistics and HapMap 3 LD-scores, considering default SNP QC filters (INFO > 0.9 and MAF > 0.01) and assuming population prevalences of 3.4%, 5.5%, and 5% for persistent ADHD, ADHD in childhood, and ADHD across the lifespan, respectively [48]. Data on 1,113,287, 1,072,558, and 1,092,418 SNPs from the GWAS-MAs of persistent ADHD, ADHD in childhood, and ADHD across the lifespan, respectively, were considered to estimate the liability-scale SNP-h2. Partitioning and enrichment of the heritability by functional categories were analyzed using the 24 main annotations (no window around the functional categories) described by Finucane et al. [49]. Statistical significance was set using Bonferroni correction (P < 2.08E−03).
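To make the role of the assumed population prevalence concrete, the following Python sketch applies the commonly used transformation from observed-scale to liability-scale heritability for a case-control sample. Whether this exact transformation matches every detail of the software run is an assumption, and the observed-scale value in the usage lines is hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, sample_prevalence, population_prevalence):
    """Rescale observed-scale h2 to the liability scale.

    factor = K^2 (1 - K)^2 / (P (1 - P) z^2), where K is the population
    prevalence, P the proportion of cases in the sample, and z the standard
    normal density at the liability threshold.
    """
    k = population_prevalence
    p = sample_prevalence
    threshold = NormalDist().inv_cdf(1.0 - k)
    density = NormalDist().pdf(threshold)
    factor = (k ** 2 * (1.0 - k) ** 2) / (p * (1.0 - p) * density ** 2)
    return h2_observed * factor


# Case fraction from the persistent ADHD sample reported in the text.
sample_prev = 6532 / (6532 + 15874)
# Hypothetical observed-scale estimate, for illustration only.
print(observed_to_liability(0.20, sample_prev, 0.034))
```

The conversion depends strongly on the assumed prevalence, which is why a separate value is stated for each of the three analyses.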

### Gene-based and gene-set analyses

MAGMA software was used for gene-based and gene-set association testing using summary data from our GWAS-MAs [50]. Variants were mapped to a gene if they were within 20 kb upstream or downstream from the gene according to dbSNP build 135 and NCBI 37.3 gene definitions. Genes in the MHC region (hg19:chr6:25-35M) were excluded from the analyses. LD patterns were estimated using the European ancestry reference panel of the 1000 Genomes Project. Gene sets denoting canonical pathways were downloaded from MSigDB (http://www.broadinstitute.org/gsea/msigdb), which integrates Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/), BioCarta (http://www.biocarta.com/), Reactome (https://reactome.org/), and Gene Ontology (GO) (http://www.geneontology.org/) resources. Bonferroni correction (P < 2.77E−06 for 18,038 genes in persistent ADHD; P < 2.75E−06 for 18,218 genes in childhood ADHD; P < 2.79E−06 for 17,948 genes in ADHD across the lifespan) and 10,000 permutations were used for multiple testing correction in the gene-based and gene-set analyses, respectively.
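The mapping window, the MHC exclusion, and the Bonferroni thresholds described above can be sketched in a few lines of Python. This is an illustration rather than MAGMA's own logic; the function names and gene coordinates are hypothetical, and the alpha of 0.05 is inferred from the reported thresholds.

```python
MHC_REGION = ("6", 25_000_000, 35_000_000)  # hg19:chr6:25-35M, as in the text


def in_mhc(chrom, position):
    """True if a position falls inside the excluded MHC interval."""
    mhc_chrom, start, end = MHC_REGION
    return chrom == mhc_chrom and start <= position <= end


def maps_to_gene(variant_pos, gene_start, gene_end, window=20_000):
    """True if the variant lies within the gene or 20 kb up/downstream."""
    return gene_start - window <= variant_pos <= gene_end + window


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold (alpha = 0.05 is an assumption)."""
    return alpha / n_tests


for n_genes in (18_038, 18_218, 17_948):
    print(n_genes, f"{bonferroni_threshold(n_genes):.2E}")
```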

### BUHMBOX analysis

The Breaking Up Heterogeneous Mixture Based On cross(X)-locus correlations (BUHMBOX) analysis [51] was used to test whether the genetic correlation between persistent ADHD and ADHD in childhood was driven by subgroup heterogeneity, which arises when a subset of children is enriched for persistent ADHD-associated alleles. Subgroup heterogeneity was tested in each childhood dataset considering independent SNPs (r2 = 0.1, kb = 10,000) with MAF > 0.05 from the GWAS-MA of persistent ADHD using two different P value thresholds of P < 5.00E−05 (62 SNPs) and P < 1.00E−03 (710 SNPs). Results were meta-analyzed using the standard weighted sum of z-score approach, where z-scores are weighted by the square root of the effective sample size. The statistical power was calculated using 1,000 simulations, considering the ADHD children meta-analysis sample size, the odds ratios and risk allele frequencies from the GWAS-MA of persistent ADHD, and assuming a heterogeneity proportion (π) of 65%.
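The weighted sum of z-scores used to combine the per-dataset BUHMBOX results can be written as Z = Σ wᵢzᵢ / √(Σ wᵢ²) with wᵢ = √Neff,ᵢ. The Python sketch below shows this combination only, not the BUHMBOX statistic itself; the input values are hypothetical, and the one-sided P value is an assumption about how the combined statistic is interpreted.

```python
import math


def weighted_z_meta(z_scores, effective_ns):
    """Combine z-scores with weights equal to sqrt(effective sample size)."""
    weights = [math.sqrt(n) for n in effective_ns]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w ** 2 for w in weights))
    return numerator / denominator


def upper_tail_p(z):
    """One-sided P value for a positive deviation (assumed test direction)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical per-dataset statistics, for illustration only.
z_combined = weighted_z_meta([0.4, -0.2, 0.9], [1200.0, 800.0, 3000.0])
print(z_combined, upper_tail_p(z_combined))
```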

### Sign test

The direction of the effect of variants associated with ADHD in childhood was tested in persistent ADHD and vice versa, using strict clumping (r2 = 0.05, kb = 500, p2 = 0.5) and different P value thresholds (1.00E−07, 5.00E−07, 1.00E−06, 5.00E−06, 1.00E−05, 5.00E−05, 1.00E−04, and 5.00E−04). The concordant direction of effect was evaluated using a one sample test of the proportion with Yates’ continuity correction against a null hypothesis of P = 0.50 with the “stats” R package.
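For readers without R at hand, the one-sample proportion test with Yates' continuity correction can be reproduced in plain Python. The sketch follows the usual chi-square form of the test and assumes a two-sided alternative, which the text does not state; the counts in the usage line are hypothetical.

```python
import math


def proportion_test_yates(n_concordant, n_total, p_null=0.5):
    """One-sample proportion test with Yates' continuity correction.

    Returns the chi-square statistic (1 df) and its two-sided P value.
    """
    expected = n_total * p_null
    deviation = abs(n_concordant - expected)
    correction = min(0.5, deviation)  # never correct past zero deviation
    statistic = (deviation - correction) ** 2 / (n_total * p_null * (1.0 - p_null))
    # Survival function of a chi-square with 1 df, via the normal tail.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical example: 60 of 100 clumped variants share the effect direction.
print(proportion_test_yates(60, 100))
```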

### Polygenic risk scoring

PRSs were constructed using different P value thresholds (P < 0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, and 1) to select independent variants (p1 = 1, p2 = 1, r2 = 0.1, kb = 250) from the childhood GWAS-MA of ADHD and were then tested for association with persistent ADHD in each of the nine datasets, adjusting for the covariates included in the GWAS and using PRSice-2 (https://choishingwan.github.io/PRSice/). Best guess genotypes for nonambiguous strand variants present in all the persistent ADHD studies (missing rate ≤ 0.02) were included (NSNPs = 32,584 for P = 1). Results from the nine PRS analyses at each P value threshold were combined using inverse-variance weighted meta-analysis.
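The thresholding-and-scoring idea behind a PRS can be sketched as follows. This is a simplified illustration, not PRSice-2: the data structures, variant identifiers, and effect sizes are hypothetical, and the score is a plain weighted sum of risk-allele counts, whereas the software may scale scores differently.

```python
def select_variants(summary_stats, p_threshold):
    """Keep variants whose discovery P value is at or below the threshold.

    summary_stats maps variant ID -> (log odds ratio, P value). The inclusive
    comparison is an assumption so that a threshold of 1 keeps every variant.
    """
    return {
        variant: beta
        for variant, (beta, p_value) in summary_stats.items()
        if p_value <= p_threshold
    }


def polygenic_score(allele_counts, weights):
    """Sum of risk-allele counts weighted by discovery effect sizes."""
    return sum(
        beta * allele_counts[variant]
        for variant, beta in weights.items()
        if variant in allele_counts
    )


# Hypothetical discovery statistics and one individual's best-guess genotypes.
discovery = {"rsA": (0.20, 0.0005), "rsB": (-0.10, 0.30), "rsC": (0.05, 0.80)}
person = {"rsA": 2, "rsB": 1, "rsC": 0}
for threshold in (0.001, 0.5, 1.0):
    weights = select_variants(discovery, threshold)
    print(threshold, len(weights), polygenic_score(person, weights))
```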

### Genetic correlation

Cross-trait LD score regression with unconstrained intercept was used to calculate genetic correlations (rg) between pairs of traits, considering HapMap3 LD-scores, markers with INFO ≥ 0.90, and excluding the MHC region (hg19:chr6:25-35M) [48]. Other ADHD datasets [6, 52] and phenotypes from the LD-hub centralized database [53] with heritability z-scores (observed heritability/observed standard error) >4 and with an observed heritability > 0.1 were considered (N = 139 out of 689 available traits). Statistical significance was set using Bonferroni correction (P < 3.60E−04). Pearson’s correlation coefficient (Pearson’s r) was calculated between the genetic correlations of persistent ADHD with the phenotypes from the LD-hub and the genetic correlations of ADHD in childhood with the phenotypes from the LD-hub.
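The trait filter and the comparison of genetic correlation profiles described above can be sketched in Python. The function names and example values are hypothetical; the filter thresholds are those stated in the text.

```python
import math


def passes_trait_qc(h2_observed, h2_se, min_z=4.0, min_h2=0.1):
    """Keep a trait if its heritability z-score is > 4 and observed h2 > 0.1."""
    return (h2_observed / h2_se) > min_z and h2_observed > min_h2


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of rg estimates."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rg values of the same traits with adult and childhood ADHD.
rg_adult = [0.45, -0.30, 0.10, 0.60]
rg_child = [0.40, -0.35, 0.05, 0.55]
print(pearson_r(rg_adult, rg_child))
```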

## Results

The GWAS-MA of persistent ADHD in adults included 6,532 adult ADHD cases and 15,874 controls. Minimal population stratification or other systematic biases were detected (LD score regression intercept = 1.01, Fig. S1a). The proportion of heritability of persistent ADHD attributable to common single-nucleotide polymorphisms on the liability scale (SNP-h2) was 0.19 (SE = 0.024), with a nominally significant enrichment in the heritability of variants located in conserved genomic regions (P = 5.18E−03) and in the cell-specific histone mark H3K4me1 (P = 3.17E−02) (Fig. S2a). The gene-based analysis revealed six genes in four loci (ST3GAL3, FRAT1/FRAT2, CGB1, and RNF225/ZNF584) significantly associated with persistent ADHD, with ST3GAL3 being the most significant one (P = 8.72E−07) (Table S2a). The single-marker analysis showed no variants exceeding genome-wide significance, with the most significant signal being rs3923931 (P = 1.69E−07) (Fig. 1a and Table S3a). Similarly, no significant gene sets were identified in the pathway analysis after correction for multiple comparisons (Table S4a [excel file]).

### GWAS-MA of ADHD in childhood

To compare the genetic background between persistent ADHD in adults and ADHD in childhood (that may include future remittent and persistent forms of the disorder), we conducted a GWAS-MA on children with ADHD in a total of 10,617 ADHD cases and 16,537 controls. We found no evidence of genomic inflation or population stratification (LD score regression intercept = 1.02, Fig. S1b). The liability-scale SNP-h2 for ADHD in childhood was 0.19 (SE = 0.021), with a significant enrichment in the heritability of variants located in conserved genomic regions after Bonferroni correction (P = 1.21E−06) (Fig. S2b). The gene-based analysis highlighted a significant association between FEZF1 and ADHD in childhood (P = 5.42E−07) (Table S2b). No single genetic variant exceeded genome-wide significance, with the top signal being in rs55686778 (P = 1.67E−07) (Fig. 1b and Table S3b), and no significant gene sets were identified in the pathway analysis after correction for multiple comparisons (Table S4b [excel file]).

We found a strong genetic correlation between persistent ADHD in adults and ADHD in childhood (rg = 0.81, 95% CI: 0.64–0.97), significantly different from 0 (P = 2.13E−21) and from 1 (P = 0.02). Sign test results provided evidence of a consistent direction of effect of genetic variants associated with ADHD in childhood in persistent ADHD and vice versa (P = 6.60E−04 and P = 4.47E−03, respectively, for variants with P < 5.00E−05 in each dataset) (Table S5). In addition, PRS analyses showed that childhood ADHD PRSs were associated with persistent ADHD at different predefined P value thresholds, with the P = 0.40 threshold (NSNPs = 20,398) explaining the most variance (r2 = 0.0041 and P = 1.20E−27) (Fig. 2a). The quintiles of the PRS built using this threshold showed the expected trend of higher ADHD risk for individuals in higher quintiles (Fig. 2b, Table S6).
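The tests of the genetic correlation against 0 and against 1 can be approximated from the reported estimate and confidence interval. The sketch below assumes a normal approximation and recovers the standard error from the 95% CI; because the inputs are rounded, it will not reproduce the reported P values exactly.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * z_crit)


def two_sided_p(estimate, null_value, se):
    """Two-sided normal P value for H0: estimate == null_value."""
    z = (estimate - null_value) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


se = se_from_ci(0.64, 0.97)          # CI reported in the text
print(two_sided_p(0.81, 0.0, se))    # test against no genetic correlation
print(two_sided_p(0.81, 1.0, se))    # test against complete genetic overlap
```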

We then tested whether the genetic correlation between persistent ADHD and ADHD in childhood was driven by a subset of children enriched for persistent ADHD-associated alleles using the Breaking Up Heterogeneous Mixture Based On Cross-locus correlations (BUHMBOX) analysis. We found no evidence of subgroup genetic heterogeneity in children, supporting that the sharing of persistent ADHD-associated alleles between children and adults was driven by the whole group of children, with a statistical power of 98.4 and 100% for thresholds of P < 5.00E−05 and P < 1.00E−03, respectively (Table S7).

### GWAS-MA of ADHD across the lifespan

Given the strong genetic correlation between persistent ADHD in adults and in childhood, we performed a GWAS-MA of ADHD across the lifespan considering all datasets included in the GWAS-MAs. In total, 17,149 ADHD cases and 32,411 controls were included, and no evidence of genomic inflation or population stratification was found (LD score regression intercept = 1.03, Fig. S1c). The liability-scale SNP-h2 for ADHD across the lifespan was 0.17 (SE = 0.013), and a significant enrichment in the heritability of variants located in conserved genomic regions was observed after Bonferroni correction (P = 1.53E−06) (Fig. S2c). We identified four genome-wide significant variants (Figs. 1c and 3, Table 1a, and Fig. S3) and nine genes in seven loci (FEZF1, DUSP6, ST3GAL3/KDM4A, SEMA6D, C2orf82/GIGYF2, AMN, and FBXL17) significantly associated with ADHD across the lifespan (Table 1b). The most significantly associated locus was on chromosome 6 (index variant rs183882582-T, OR = 1.43 (95% CI: 1.26–1.60), P = 1.57E−08), followed by loci on chromosome 7 (index variant rs3958046), chromosome 4 (index variant rs200721207), and chromosome 3 (index variant rs1920644) (Table 1a, Fig. 3). The gene-set analysis showed a significant association of the “ribonucleoprotein complex” GO term with ADHD across the lifespan (P.adj = 0.021) (Table S4c [excel file]).

One of the four loci identified in the single-variant analysis also reached genome-wide significance in the previous GWAS-MA on ADHD [6], and all of them showed consistent direction of the effect in that study (Table S8a). Significant loci reported by Demontis et al. [6] showed nominal association with ADHD across the lifespan in our study (Table S8b, c), with single variant hits showing the same direction of the effect (Table S8b).

Analyses conditioning on the index variant for the four ADHD-associated loci did not reveal new independent markers. These four significant loci were functionally characterized by obtaining Bayesian credible sets and searching for expression quantitative trait loci (eQTL) using available data in blood or brain [54, 55]. We found that credible sets for three of the four loci contained at least one eQTL within 1 Mb of the index variant. The credible set on chromosome 6 included the index variant (rs183882582) and rs12197454. This variant, in LD with the index variant (r2 = 0.56), was associated with the expression of RSPH3 in blood and brain (P.adj < 1.65E−05 and P.adj = 2.36E−07, respectively), and with the expression of VIL2 in blood (P.adj = 3.21E−03). The credible set for the second most associated locus on chromosome 7 included 24 variants. The index variant, rs3958046, and other variants in this set, were eQTLs for CADPS2 in brain (maximum P.adj = 2.91E−03). The credible set for the locus on chromosome 4 contained 50 variants, most of them located in or near PCDH7, but no eQTLs were identified. In the credible set for the locus on chromosome 3, which included 98 variants, the index variant, rs1920644, was associated with the expression of KPNA4, IFT80, and KRT8P12 in brain (P.adj = 1.16E−04, P.adj = 1.40E−03, and P.adj = 1.77E−03, respectively). Many other variants in this set were eQTLs for these genes and also for TRIM59, OTOL1, and/or C3orf80 in brain (P.adj < 0.05) (Table S9 [excel file]).

In a summary-data-based Mendelian randomization (SMR) analysis, we used summary data from the GWAS-MA of ADHD across the lifespan and the eQTL data in blood and brain from Westra et al. [54] and Qi et al. [55] to identify gene expression levels associated with ADHD. We found a significant association between ADHD across the lifespan and RMI1 expression in blood (PSMR = 5.36E−06) (Table S10 [excel file]), a finding unlikely to be an artifact of LD between the eQTL and other ADHD-associated variants, given that the HEIDI test P value (PHEIDI) was 0.47.

### Genetic correlation with other ADHD datasets and phenotypes

We found significant genetic correlations between ADHD in children and adults from the previous GWAS-MA [6] (N = 53,296) and persistent ADHD (rg = 0.85, SE = 0.04, P = 5.49E−99), ADHD in childhood (rg = 0.99, SE = 0.03, P = 5.02E−273), and ADHD across the lifespan (rg = 0.98, SE = 0.01, P < 2.23E−308) (Table S11). When removing sample overlap (LD score genetic covariance intercept = 0.75) and considering only the subset of new samples included in our GWAS-MA on ADHD across the lifespan (N = 7086), a significant genetic correlation was also obtained between their sample and ours (rg = 0.91, SE = 0.35, P = 8.70E−03).

We also observed significant genetic correlations between childhood ADHD symptom scores from a GWAS-MA in a population of children reported by the EAGLE consortium [52] (N = 17,666) and persistent ADHD (rg = 0.65, SE = 0.20, P = 1.10E−03), ADHD in childhood (rg = 0.98, SE = 0.21, P = 2.76E−06), and ADHD across the lifespan (rg = 0.87, SE = 0.19, P = 4.80E−06). Similarly, significant genetic correlations between GWAS of self-reported ADHD status from 23andMe (N = 952,652) and persistent ADHD (rg = 0.75, SE = 0.05, P = 2.49E−45), ADHD in childhood (rg = 0.63, SE = 0.05, P = 1.39E−42), and ADHD across the lifespan (rg = 0.72, SE = 0.04, P = 4.86E−88) were observed (Table S11).

We also estimated the genetic correlation of persistent ADHD in adults, ADHD in childhood, and ADHD across the lifespan with all available phenotypes in LD-hub. Results for 139 phenotypes passed the QC parameters and 41 genetic correlations were significant after Bonferroni correction in both children and adults with persistent ADHD (Table S12 [excel file]). Again, the genetic correlations with ADHD were consistent across the lifespan, with similar patterns found in adulthood and childhood (Pearson’s r = 0.89) (Fig. 4a, Table S12 [excel file]). The strongest genetic correlations with ADHD were found for traits related to academic performance, intelligence, and risk-taking behaviors, including smoking and early pregnancy (Fig. 4b).

## Discussion

In the current study, we set out to explore the contribution of common genetic variants to the risk of ADHD across the lifespan by conducting GWAS-MAs separately for children and adults with persistent ADHD that meet DSM-IV/ICD-10 criteria. Using the largest GWAS datasets available from the PGC, the iPSYCH, and IMpACT consortia we found evidence for a common genetic basis for ADHD in childhood and persistent ADHD in adults and identified nine new loci associated with the disorder.

Given that children with ADHD may be a mixed group of individuals whose ADHD symptoms will either persist or remit in adulthood, we ran a BUHMBOX analysis to determine whether the potential “persistent” individuals could already be distinguished in childhood. Our data supported genetic similarities in ADHD across the lifespan, with no evidence of a subset of patients enriched for persistent ADHD-associated alleles within the group of children.

We also found strong and significant positive genetic correlations of ADHD ascertained in clinical populations of adults, children, or both with other ADHD-related measures from general population samples, including the largest GWAS of self-reported ADHD status from 23andMe participants (N = 952,652) and the GWAS-MA of childhood rating scales of ADHD symptoms in the general population [52]. In agreement with previous reports, these data suggest that a clinical diagnosis of ADHD in adults is an extreme expression of continuous heritable traits [6] and that a single question about ever having received an ADHD diagnosis, as in the 23andMe sample, may be informative for molecular genetics studies.

Similar patterns of genetic correlation of ADHD with different somatic disorders and anthropometric, cognitive, and educational traits were identified for children and adults. These findings were highly similar to those observed in the recent GWAS-MA [6] and further extend the existing hypothesis of a shared genetic architecture underlying ADHD and these traits to a lifespan perspective.

We report 13 loci in gene- and SNP-based analyses for childhood ADHD, adult ADHD, and/or ADHD across the lifespan. Four ADHD-associated loci were previously identified by Demontis et al. [6], which was expected due to the sample overlap between the two datasets. The new loci identified in the present study mainly included genes involved in brain formation and function, such as FEZF1, a candidate for autism spectrum disorder implicated in the formation of the diencephalon [58, 59], RSPH3, which participates in neuronal migration in embryonic brain [60], CADPS2, which has been associated with psychiatric conditions due to its role in monoamine and neurotrophin neurotransmission [61,62,63,64], AMN, which is involved in the uptake of vitamin B12 [65, 66], essential for brain development, neural myelination, and cognitive function [67], and FBXL17, which has previously been related to intelligence [68].

The main limitation of this study is the sample overlap (85.7%) between the present GWAS-MAs and the previous one by Demontis et al. [6], which highlighted loci previously associated with ADHD. Although sample overlap may have inflated the genetic correlation found between these studies, the estimate remained strong and significant when the analysis was restricted to nonoverlapping datasets.

In summary, the present cross-sectional analyses identify new genetic loci associated with ADHD and, more importantly, support the hypothesis that persistent ADHD in adults is a neurodevelopmental disorder that shows a high and significant genetic overlap with ADHD in children. Future longitudinal studies will be required to disentangle the role of common genetic variants in ADHD remission and/or persistence.