Integrative genomic analysis of methylphenidate response in attention-deficit/hyperactivity disorder

Methylphenidate (MPH) is the most frequently used pharmacological treatment in children with attention-deficit/hyperactivity disorder (ADHD). However, a considerable interindividual variability exists in clinical outcome. Thus, we performed a genome-wide association study of MPH efficacy in 173 ADHD paediatric patients. Although no variant reached genome-wide significance, the set of genes containing single-nucleotide polymorphisms (SNPs) nominally associated with MPH response (P < 0.05) was significantly enriched for candidates previously studied in ADHD or treatment outcome. We prioritised the nominally significant SNPs by functional annotation and expression quantitative trait loci (eQTL) analysis in human brain, and we identified 33 SNPs tagging cis-eQTL in 32 different loci (referred to as eSNPs and eGenes, respectively). Pathway enrichment analyses revealed an over-representation of genes involved in nervous system development and function among the eGenes. Categories related to neurological diseases, psychological disorders and behaviour were also significantly enriched. We subsequently meta-analysed the association with clinical outcome for the 33 eSNPs across the discovery sample and an independent cohort of 189 ADHD adult patients (target sample) and we detected 15 suggestive signals. Following this comprehensive strategy, our results provide a better understanding of the molecular mechanisms implicated in MPH treatment effects and suggest promising candidates that may encourage future studies.

Genome-wide association study. Genomic DNA was isolated from peripheral blood leukocytes by a salting out procedure 17 . A total of 173 samples were genotyped on the Infinium PsychArray-24 BeadChip platform (Illumina, San Diego, CA, USA), which covers 588,628 markers, and processed at the Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard (Cambridge, MA, USA). Pre-imputation quality control and principal components analysis were implemented following the QC and PCA modules from the Ricopili with the default settings (https://sites.google.com/a/broadinstitute.org/ricopili/). Genotype imputation was performed with the pre-phasing and imputation strategy using the EUR population of the 1,000 Genomes Project Phase 1 dataset as the reference panel (http://www.1000genomes.org/). We assured the accuracy of the imputation SCIENTIFIC RePoRtS | (2018) 8:1881 | DOI: 10.1038/s41598-018-20194-7 data by filtering best-guess genotypes for an info score <0. 3. This resulted in a total of 11,051,824 markers eligible for association tests.
Before GWAS analysis, further quality control measures were applied using the PLINK software 18 . Individuals exhibiting high rates of genotype missingness (>98%) were removed, as well as SNPs with low call rate (<0.99), MAF < 0.01 or failing Hardy-Weinberg equilibrium test (P < 1e-06).
Finally, 173 subjects and 3,566,199 variants were tested for association with MPH response through logistic regression under an additive model, which included those clinical variables (i.e., CGI-S baseline scores) and principal components (i.e., PC6) significantly associated with clinical outcome (P ≤ 0.05) as covariates.
Identification of candidate causal SNPs. Among the SNPs showing nominal association with treatment outcome (P < 0.05), we used the genome pipeline of SNPinfo (http://snpinfo.niehs.nih.gov/) 19 to prioritise those that were more likely to affect protein sequence, transcriptional regulation, mRNA splicing or miRNA binding based on functional annotation. GenomePipe parameter values included: GWAS population = CEU; study population = CEU; flanking region = 200,000 bp; GWAS P-value < 0.05; LD threshold = 0.8; and MAF = 0.01 for all prediction methods. Additionally, we combined both the predicted conserved transcription factor-binding sites (TFBS) with the regulatory potential score (RP Score; available at http://genome.ucsc.edu) to improve predictions as suggested in several studies 20-22 . Cis-expression quantitative trait loci analysis. Cis-eQTL analysis was conducted on 193 neuropathologically normal cortical samples of adult humans from Myers et al. 23 . Expression-genotype pairs were extracted after extending the genotyped data by imputation as previously described, and considering a 10 kb window around the untranslated regions. Rank-invariant normalised expression levels were log 10 transformed and transcripts detected in less than 75% of the samples were discarded from the study. Association tests were performed under a linear model with the MatrixEQTL R Package 24 , including gender, age at death, cortical region, day of expression hybridisation, institute source of sample, post-mortem interval and transcripts detected rate in each sample as covariates.
Functional and pathway enrichment analysis. The biological functions and pathways related to genes containing at least one SNP nominally associated with both MPH response and human cortical expression levels (referred to as eSNPs) were assessed through the Ingenuity Pathway Analysis software (IPA) (Ingenuity Systems, Redwood City, CA, USA; www.ingenuity.com). IPA was also used to test for over-representation of genes previously studied in either ADHD or treatment outcome. Candidate genes for ADHD or MPH response were selected based on the gene list provided by the ADHDgene database (http://adhd.psych.ac.cn/index.do) 25 and a comprehensive search for published reviews of ADHD genetic and pharmacogenetic studies 11,12,[26][27][28][29][30][31] . Thus, a total of 436 genes were considered (Supplementary Table S1). Fisher's exact tests, with a Benjamini-Hochberg-adjusted P-value (P B-H ) < 0.05 as significance threshold, were applied in all analyses. To achieve meaningful statistics and interpretation of the results, we restricted the enrichment analysis to those annotation terms that included ≥2 genes of our dataset.
Polygenic risk score analysis. We generated polygenic risk scores (PRS) based on the results of the present GWAS using the Polygenic Risk Score software (PRSice) 32 . A logistic regression model was applied to test whether PRS at multiple stepwise P-value thresholds (i.e., P T < 1e-04, P T < 1e-03, P T < 0.05, P T < 0.1, P T < 0.2, P T < 0.3, P T < 0.4, and P T < 0.5) predicted treatment outcome in an independent sample of patients with ADHD (target population). The target population comprised 189 Brazilian adults from the Adult ADHD Outpatient Clinic of the Hospital de Clínicas de Porto Alegre, who underwent immediate-release MPH treatment. Diagnoses of ADHD and comorbidities, as well as inclusion/exclusion criteria, were achieved as previously described 33 . The outcome measures of MPH treatment were the CGI-S scale, applied before medication and at least four weeks after its beginning, and the CGI-I scale. Drug response was defined following the criteria used in the discovery sample. Similarly, samples were genotyped and imputed using the same platform, imputation protocol and reference panel. The resulting dataset consisted of 7,304,149 SNPs with an info score >0.6, a genotype call probability >0.8 and a missing rate <0.02.
Potential confounders were included as covariates in the PRS model if they were associated with MPH response (P ≤ 0.05) in the target sample (i.e., CGI-S baseline scores, use of concomitant medication and presence of phobia as comorbid condition), as well as the 10 first principal components to control for population stratification.
Meta-analysis. The eSNPs nominally associated with MPH response in the discovery sample were meta-analysed across the discovery and the target population used in the PRS analysis by the inverse-variance weighted method. The threshold for significance was set at P ≤ 1.52e-03 under the more conservative Bonferroni correction, taking into account 33 SNPs. Data availability. The datasets generated and/or analysed during the current study are not publicly available due to ethics constraints but are available from the corresponding author on reasonable request.

Results
Genome-wide association study in the discovery population. Subjects were predominantly male (84.4%), with an average age at assessment of 9.59 (SD = 2.91) years (range 5-17). One hundred and thirty-one participants (75.7%) met DSM-IV criteria for ADHD-combined subtype, 37 (21.4%) had ADHD-inattentive subtype and 5 (2.9%) were diagnosed with ADHD-hyperactive-impulsive subtype. Comorbid disorders were present in a modest number of patients (22.5%), the main ones being disabilities in reading and writing (12.7%), oppositional defiant disorder (5.8%) and tic disorders (1.7%). One hundred and forty-one subjects (81.5%) responded favourably to treatment according to the CGI-I scale, while 32 (18.5%) failed to show a clinical response to MPH. Responders and non-responders were comparable with regard to age, sex, ADHD subtype, comorbidity, use of concomitant medication, MPH dose and drug formulation (P > 0.05). There were significant differences, however, in the severity of symptoms as assessed by the CGI-S scale (P < 1e-03), with children resistant to MPH scoring higher at the baseline evaluation than children showing clinical improvement (Supplementary Table S2).
No variant reached genome-wide significance (P < 5e-08). However, the set of 4,709 genes containing SNPs nominally associated with MPH response (P < 0.05; Supplementary Table S3) was significantly enriched for candidates previously studied in ADHD or treatment outcome, with 199 out of 436 being present in this category (ratio = 0.46; P B-H = 1.56e-31).
Identification of candidate causal SNPs and cis-expression quantitative trait loci analysis. Considering these results, we prioritised the SNPs with P-values below 0.05 based on functional annotation and eQTL analysis rather than focusing on the top significant hits. Eight hundred and ninety-six independent markers were selected as candidate causal variants by functional annotation (Supplementary Table S4) and were subjected to further cis-eQTL analysis on a pre-existing dataset of 193 neuropathologically normal human cortical samples 23 . After imputation and quality control, a total of 284 variants and 300 genes with detectable expression levels in at least 75% of the samples were available for 146 individuals. Of these, we identified 33 SNPs tagging cis-eQTL in 32 different loci (referred to as eGenes), with eight SNP-gene pairs surpassing the 0.05 false discovery rate (FDR) threshold: rs12302749-SPSB2, P FDR = 1.13e-05; rs1061115-PYROXD2, P FDR = 2.17e-04; rs2071421-ARSA, P FDR = 7.26e-04; rs11553441-RRP7A, P FDR = 7.26e-04; rs4902333-CHURC1, P FDR = 7.26e-04; rs17279558-GGH, P FDR = 0.013; rs9901673-SENP3, P FDR = 0.023; and rs17685420-PEBP4, P FDR = 0.041 (Table 1) Polygenic risk score analysis and meta-analysis using the target population. Finally, in order to assess the predictive value of our findings we first computed PRS derived from the present GWAS in an independent sample of ADHD adult patients for whom data on response to MPH were available. The demographic and clinical characteristics of the target population according to the response status are presented in Supplementary  Table S5. Briefly, 85.2% of subjects (n = 161) were classified as responders, while 14.8% (n = 28) exhibited a reduced or lack of improvement. Responders and non-responders significantly differed with regard to CGI-S baseline scores, use of concomitant medication and presence of phobia as comorbid condition, and thus these additional risk factors were entered as covariates in the PRS model, as well as the 10 first principal components to control for population stratification. Since we did not detect significant results at any of the predefined P-value thresholds, we subsequently focused on the 33 eSNPs nominally associated with treatment outcome in the discovery sample and we increased statistical power by performing a meta-analysis across the discovery and the target population. Sixteen suggestive signals were identified (Table 3). Among them, 15 revealed the same direction of effect, with rs17685420 in the PEBP4 gene being significant after Bonferroni correction (OR = 3.07 (1.76-5.35), P = 7.90e-05), followed by additional compelling markers such as rs2071421 within ARSA (OR = 2.63 (1.29-5.37), P = 7.71e-03), rs2886059 in ALDH1L1 (OR = 2.30 (1.14-4.66), P = 0.020), and rs17712523 in CDH23 (OR = 2.13 (1.07-4.24), P = 0.031).

Discussion
To our knowledge, this is the first study investigating the genetic basis of MPH response from an integrative perspective that combines GWAS data, functional annotation, eQTL in relevant tissues to ADHD and pathway enrichment analyses. Our results highlight genes related to nervous system development and function, neurological diseases and psychological disorders. Thus, this comprehensive strategy provides a better understanding of the molecular mechanisms implicated in MPH treatment effects and suggests promising candidates that may contribute to clinical outcome.
In our attempt to improve earlier genetic studies by bridging the gap between genotype and phenotype, we prioritised the nominally significant SNPs based on functional annotation and cis-eQTL analysis in human brain, and we identified three candidates previously investigated in ADHD: ALDH1L1 34 , CDH23 35 and CMTM8 36 . Of these, CMTM8 showed overlapping association between adult ADHD and bipolar disorder 36 , and ALDH1L1, which yielded suggestive results in the present meta-analysis of MPH response, has been related to other neuropsychiatric conditions such as major depressive disorder or schizophrenia 37,38 . In addition, the ALDH1L1 locus was found among the top hits of a GWAS conducted on children and adolescents with ADHD 34 and contains non-synonymous rare variants identified through whole-exome sequencing in an ADHD nuclear family 39 . Similarly, CDH23 harbours one of the top SNPs from a pooling-based GWA of adult ADHD 35 and reached nominal significance in our meta-analysis. CDH23 is a member of the cadherin superfamily that mediates calcium-dependent cell-cell adhesion. The activity of cadherins depends on their anchorage to the neuronal cytoskeleton through proteins termed catenins (e.g., CTNNA2), which in turn activate KALRN, a key regulator of dendritic spine development and synaptic plasticity underlying learning and memory 40 . This is of particular interest since catenin-cadherin cell-adhesion complexes are important in cerebellar and hippocampal lamination 41 and both CTNNA2 and KALRN have shown nominal associations with clinical outcome in our GWAS. In this sense, Park et al. 41 demonstrated that mice lacking the actin-binding domain of Ctnna2 (cdf-mutant mice) exhibited abnormal morphology of cerebellum and hippocampus. Moreover, the cdf-mutant mice showed an impaired control of the startle response and deficits in startle modulation have also been found in ADHD patients 42,43 . Therefore, cell-adhesion molecules and regulators of synaptic plasticity may play a role in MPH treatment effects, which is in agreement with data from genome-wide linkage and association studies pointing to cadherin13 (CDH13) as one of the most consistent candidates implicated in ADHD pathophysiology. Specifically, CDH13 was detected in three independent GWAS 34,35,44 and lies within the 16q22-16q24 region identified in a meta-analysis of seven ADHD linkage scans 45 . Furthermore, SNPs in this gene have been linked to defects in verbal working memory and hyperactive/impulsive symptoms in subjects with ADHD 46,47 , addiction vulnerability and drug dependence (e.g., methamphetamine, alcohol, and nicotine) 48,49 .
Pathway enrichment analysis provided further evidence for neuroplastic changes in response to MPH treatment, considering the over-representation of genes involved in morphology of neurons, neuroglia, white matter and brain regions relevant to ADHD (e.g., cerebellum, cerebral cortex, and hippocampus) that we found among eGenes associated with drug response. Our results are in accordance to a wealth of data from neuroimaging studies showing that unmedicated ADHD patients present cortical thickness and reduced white matter volume in several areas of the brain compared to healthy subjects, while medicated children do not differ from control group [50][51][52][53] . In addition to structural alterations, ADHD patients exhibit deficits in neural functioning, which may  be normalised by MPH. In this sense, Rubia et al. [54][55][56] demonstrated that MPH restores the aberrant activation and functional connectivity in attention, motivation and interference inhibition networks, as well as during error processing, thus improving neuropsychological performance of subjects with ADHD. It should also be noted that 15 out of the 32 eGenes included in the pathway enrichment analysis harboured variants which provided preliminary evidence for an association with clinical outcome across the discovery and an independent sample. Our top hit from the meta-analysis, rs17685420, is located in the phosphatidylethanolamine binding protein 4 (PEBP4), a member of an evolutionary conserved family of proteins with pivotal biological functions such as cell proliferation and survival, stimulation of acetylcholine synthesis and inhibition of serine proteases 57 . Given that serine proteases are implicated in many processes during development and tissue homeostasis (e.g., neuronal outgrowth, cell migration, and cell death), disturbances in their activity on the nervous system have been proposed as a possible pathological mechanism for neurological disorders 58 . Indeed, Hohman et al. 59 identified a gene-gene interaction involving PEBP4 associated with late onset Alzheimer's disease (AD) across 13 independent datasets, and decreased expression levels have been found in hippocampus of both AD patients and mouse models for another phosphatidylethanolamine binding protein, the PEBP1 [60][61][62] , which has also shown alterations after methamphetamine and morphine administration 63,64 . Additional compelling results were detected for ARSA, SPSB2, CORO7 and PIGM. The ARSA gene encodes the arylsulfatase A, whose deficiency is characterised by decline in school performance, emergence of behavioural problems and neurologic symptoms, such as cerebellar ataxia, among others 65 . SPSB2 has been associated with borderline personality disorder in a genome-wide methylation analysis 66 and CORO7, which has shown to be important in brain development 67 , was identified as a novel candidate gene for emotionality by comparing the expression profile of two mouse lines with either high or low anxiety-related behaviour 68 . Finally, mutations in the PIGM gene, which encodes a protein involved in the synthesis of the glycosylphosphatidylinositol anchor, have been reported in individuals with severe neurological features, including seizures, muscular hypotonia and intellectual disability 69 . Another interesting finding arising from our research is the significant enrichment for candidates previously related to ADHD or MPH response detected among the set of genes nominally associated with treatment outcome. It is worth mentioning that four of these candidates, namely CTNNA2 (rs79067553, P = 3.51e-05), PARD3B (rs62172701, P = 3.28e-04), LRP1B (rs410870, P = 4.00e-04) and GRM7 (rs17047590, P = 6.36e-04), were significant at P < 1e-03 in the present GWAS analysis. In particular, the metabotropic glutamate receptor 7 (GRM7), which is widely expressed in brain regions relevant to ADHD such as the cerebral cortex, the hippocampus and the cerebellum 51,70 and has been associated with the disorder [71][72][73] , was also found among the top hits in a prior GWAS of MPH efficacy 13 , thus supporting the role of the glutaminergic system as a moderator of treatment outcome.
The main strengths of our design include the coverage of a considerably higher number of genetic variants in comparison with the study from Mick et al. 13 (319,722 vs 3,566,199 markers), the use of an integrative approach that combines GWAS data with bioinformatic methods, and the follow up of our top signals in an independent cohort, which did increase the association of a number of markers located in loci with biologically plausible functions (PEBP4, ARSA, and SPSB2). Nevertheless, some limitations should also be considered when interpreting these results. Given the limited sample size, the present study might not be sufficiently powered to detect individual variants of modest effects and we did not identify any loci reaching the genome-wide threshold. This constraint, however, is heavily conditioned on the difficulty to find the required phenotype as shown by the sample size of the studies included in the last meta-analysis of candidate gene-based investigations on MPH response 74 . The small dimension of our paediatric sample could also explain the lack of significance of the PRS derived from the GWAS results in an independent population of ADHD adult patients. Alternatively, this discrepancy may be attributed to differences in the genetic background and the clinical heterogeneity (e.g., comorbidities, frequency of clinical subtypes, and sex ratio) of ADHD among children and adults, as suggested by most of the pharmacogenetic studies conducted in adult samples, which failed to replicate variants previously identified in children and adolescents 75 . Additional methodological aspects or distinct environmental influences between the discovery and the target population may also be responsible for the absence of association.
In conclusion, despite not reaching any genome-wide significant association, our results are consistent with previous findings and highlight genes related to morphological abnormalities in brain regions important for motor control, attention and memory, thus supporting the use of bioinformatic and biological evidence as a complement to GWAS data to disentangle the genetic basis of MPH response.