Non-coding structural variation differentially impacts attention-deficit hyperactivity disorder (ADHD) gene networks in African American vs Caucasian children

Previous studies of attention-deficit hyperactivity disorder (ADHD) have suggested that structural variants (SVs) play an important role, but these were mainly studied in subjects of European ancestry and focused on coding regions. In this study, we sought to address the role of SVs in non-European populations and outside of coding regions. To that end, we generated whole genome sequence (WGS) data on 875 individuals, including 205 ADHD cases and 670 non-ADHD controls. The ADHD cases included 116 African Americans (AA) and 89 individuals of European ancestry (EA), whose SVs were compared with those of 408 AA and 262 EA controls, respectively. Multiple SVs and target genes associated with ADHD in previous studies were identified or replicated, and novel recurrent ADHD-associated SV loci were discovered. We identified clustering of non-coding SVs around neuroactive ligand-receptor interaction pathways, which are involved in neuronal brain function and are highly relevant to ADHD pathogenesis and to the regulation of gene expression related to specific ADHD phenotypes. There was little overlap (around 6%) in the genes impacted by SVs between AA and EA. These results suggest that SVs within non-coding regions may play an important role in ADHD development and that WGS could be a powerful discovery tool for studying the molecular mechanisms of ADHD.


Results
Exonic/splicing SVs impact structure of genes related to neurodevelopment procedures. Approximately 160,000 structural variations (SVs) were identified in ADHD patients (Fig. 1b); of those, 0.96% were classified as exonic, 0.59% as splicing, 42.3% as intronic and 56.13% as intergenic. Exonic/splicing SVs usually have more significant impacts since they alter the coding regions and splicing sites directly. As expected, they accounted for a small proportion (~1.5%) of the total, of which 37 were significantly associated with ADHD at a threshold of 0.05 and 9 at a threshold of 0.01 (Table 1). In addition to the 37 exonic/splicing SVs associated with ADHD, we identified 451 rare ADHD-associated SVs in AA (Supplementary Table 2a) and 382 in EA (Supplementary Table 2b); 41 SVs existed only in ADHD cases and were absent from controls (Table 2). A recurrent 320 bp deletion was identified in three AA ADHD patients at chr5:171723712-171724032, at the splicing site of the non-coding RNA LOC100288254.   Supplementary Tables 3, 4, 5 for AA, EA, and meta-analysis, respectively. The majority of selected ADHD-associated SV-genes were impacted by SVs within non-coding regions (Table 3). Furthermore, based on the ADHDgene database 14, no known exonic/splicing SV-genes from previous studies passed the statistical threshold. However, a novel exonic deletion in IQSEC3 passed the ethnicity meta-analysis (p value = 0.0083, Supplementary Table 6). IQSEC3 is a neuronal exchange gene related to speech, i.e. childhood apraxia of speech, and is down-regulated in autism and schizophrenia 15. An example network pathway of SVs within non-coding regions is neuroactive ligand-receptor interaction, a pathway critical in neuronal brain function, known to be highly relevant to ADHD development and regulation of differential gene expression in different ADHD-related brain regions 16,17.
Non-coding SVs such as intronic deletion of HTR1F, intronic translocation of CHRNA3, intergenic translocation of GRIN2A, and intronic insertions of GRM5 were found to be significantly enriched in ADHD patients (Table 4).  Fig. 1), however, the impacted ADHD-associated SV-genes that reach statistical significance differ between the two ethnicities (Fig. 2). There were 686 ADHD-associated SV-genes for AA based on statistical tests (Supplementary Table 3), and 439 ADHD-associated SV-genes for EA (Supplementary Table 4). Only 34 genes were shared between the two ethnicities (Supplementary Table 5), accounting for about 5% of the AA and 8% of the EA SV-gene sets, respectively. Meta-analysis identified 234 ADHD-associated SV-genes (Supplementary Table 6), and only four ADHD-associated SV-genes were found in the previous literature (Table 5). Genes in the meta-analysis results are still affected by ethnicity: for example, MYBPC1 has intergenic SVs with a p value of 0.017 in the meta-analysis, while the p value is 0.79 in AA and 0.0032 in EA. In other words, this SV-gene passed the meta-analysis because it is highly ADHD-associated in EA, although it is not significant at all in AA.
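The arithmetic behind the overlap percentages and the MYBPC1 example can be checked with a short sketch. The Methods only name a "combined Chi-square test" for the meta-analysis; Fisher's method for combining independent p values is used here as an assumption, and the function name fisher_combined_p is hypothetical. Agreement with the reported value would be suggestive only and does not confirm which procedure the authors applied.

```python
from math import exp, factorial, log


def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method (an assumed choice).

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom. For even degrees of freedom the upper tail has the
    closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!, so no SciPy is needed.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # this is X / 2
    return exp(-half_stat) * sum(half_stat ** i / factorial(i) for i in range(k))


# MYBPC1 p values quoted in the text: AA first, then EA.
p_meta = fisher_combined_p([0.79, 0.0032])

# Shared SV-genes as a share of each ethnicity's ADHD-associated SV-gene set.
shared, n_aa, n_ea = 34, 686, 439
pct_aa = 100 * shared / n_aa
pct_ea = 100 * shared / n_ea

print(f"combined p = {p_meta:.4f}")
print(f"shared genes: {pct_aa:.1f}% of AA set, {pct_ea:.1f}% of EA set")
```

Under this assumption the combined p value comes out at about 0.0176, close to the 0.017 reported for MYBPC1, and the shared genes make up about 5.0% of the AA set and 7.7% of the EA set.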

Discussion
Attention deficit hyperactivity disorder (ADHD) is the most common neurobiological disorder in children, with a prevalence of 6-8% 6. In this study, we identified 37 exonic/splicing SVs, several involving genes that have been previously reported in neurological and mental diseases, such as VPS53, which has been previously associated with neurological conditions and Parkinson disease 18. We also identified 40 novel recurrent SV genes associated with ADHD, where the SVs occurred exclusively in ADHD or had a frequency of less than 0.5% in non-ADHD controls. Those novel recurrent SVs could also be important in ADHD development. For example, we identified a novel 320 bp deletion at splicing sites of the non-coding RNA LOC100288254, which is a recurrent SV found in three independent ADHD patients and seen only in ADHD patients. Additional notable recurrent rare SVs included an exonic insertion of a non-coding RNA, MIR137, which has been shown to play a significant role in neural development and neoplastic transformation 19; a splicing inversion in DRD4, which has previously been implicated in ADHD 20; and an exonic translocation of BPTF, which causes expressive language delay and intellectual disability 21. BPTF, whose exonic translocation was identified in two independent individuals, was considered a candidate gene in neurodevelopmental disorder based on exome pool-seq 22 and was believed to be the cause of syndromic developmental and speech delay, postnatal microcephaly, and dysmorphic features in a recent study 21. We also observed that the non-coding RNA LINC00461, which was one of the 12 significant loci in the study by Demontis et al. 13, had an intronic insertion in six ADHD patients with chi-square p value = 0.02. In addition, this study reveals that SVs within non-coding regions may be more critical in ADHD biological networks than previously believed.
One typical example is the neuroactive ligand-receptor interaction pathway, a pathway critical in neuronal brain function, known to be highly relevant to ADHD pathogenesis. In this study, 17 SV genes were found to differ significantly between ADHD patients and controls, including four genes, CHRNA3, GRM5, HTR1F and GRIN2A, which were supported by previous literature 4,23-25. All the identified structural variations related to this pathway are either intronic or intergenic. Similar situations were found in other neurodevelopmental pathways, such as the MAPK signaling pathway and Axon guidance. The functional role of SVs in non-coding regions in ADHD therefore warrants further investigation. We also show that there is only a small overlap in SV-impacted genes between the two ethnicities, and the result was further replicated when we limited the SV-genes to known ADHD genes based on the ADHD gene database 14. Twenty-five ADHD-associated SV-genes have been previously studied and reported in the literature (Table 6), and only one gene, AGBL1, with intergenic SVs shows a statistically significant difference in both ethnicities. AGBL1 was the top locus in the largest ADHD genome-wide meta-analysis performed 26, and mutation in this gene showed significant association with learning performance 27. Taken together, the results suggest that impacted ADHD-associated genes differ between the two ethnicities, and that ADHD analysis without population information could miss potential disease genes.
While the majority of those ADHD-associated SVs are located in non-coding regions, the question is how these SVs impact ADHD pathways in the two ethnicity groups: are the pathways different, or do the SVs impact the same pathways but different gene modules? To explore the answers, we further mapped these SVs within non-coding regions onto the highly studied ADHD pathways, including the neuroactive ligand-receptor interaction pathway. Of the ADHD SV-genes that differ significantly between ADHD cases and non-ADHD controls (p value ≤ 0.05) or show a trend towards significance (p value ≤ 0.1), 10 belong to the neuroactive ligand-receptor interaction pathway; five are AA SV-genes and the rest are EA-specific SV-genes (Table 4). The results also suggest that SVs, especially SVs within non-coding regions, impact the same gene families but different gene members, such as intronic SVs of CHRNA3 in AA versus CHRNA4 in EA, and intronic SVs of HTR1F in AA versus HTR2C in EA. The enrichment studies for those ADHD-associated pathways suggest that the SVs within non-coding regions impact the same ADHD-associated pathways in both ethnicities, but different genes in the same gene families.
In summary, we have conducted a genomic-level study in ADHD patients using whole genome sequencing that takes non-coding genes/regions and ethnicity factors into consideration. The results show that non-coding region SVs and non-coding genes may play a role in the development and progression of ADHD, and WGS may be a powerful tool to explore ADHD molecular mechanisms. Additionally, our study highlights that genomic-level population differences exist between Caucasian and African American patients, especially for non-coding SVs in neuronal genes, and that these variants may influence response to specific medications. Given the potential evolutionary advantages of ADHD in human history 28, the same ADHD-associated pathways, though different genes, were involved in the adaptation to environmental selection for survival in the two major human populations. On the other hand, we acknowledge that the current study is limited by its sample size because of the significant expense of WGS. The SVs highlighted by our study warrant further study and confirmation.

Methods
Patient selection. The patients were randomly selected from the Philadelphia Neurodevelopmental Cohort (PNC), archived in the biobank of the Center for Applied Genomics (CAG) at the Children's Hospital of Philadelphia (CHOP), with full electronic medical record (EMR). Psychopathology of the cohort was assessed using a computerized, structured interview (GOASSESS) 29. The diagnosis of ADHD was based on the Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition (DSM-IV) originally, and later confirmed by DSM-V. There were 205 ADHD cases, including 116 African Americans (AA) and 89 European Americans (EA), and 670 controls, including 408 AA and 262 EA (Supplementary Table 1, Fig. 1a).  30 . For quality control, we only included SVs that passed MANTA's default filters, which required the length of corresponding SVs to be at least 50 bp and to be rated "PASS" based on the MANTA threshold. Passing SVs were stratified into different classes based on their sequence content. Using the hg19 GENCODE reference sequence, if the start and end points of an SV mapped within an exon, it was annotated as "exonic"; if the start and end points of an SV were located within an intronic region and the SV did not span an exon, it was annotated as "intronic"; if the SV was located across exon/intron border sites, it was annotated as "splicing"; and the remaining SVs were annotated as "intergenic". Exonic, intronic and splicing SVs were annotated with the impacted gene, and intergenic SVs were annotated with their closest gene based on genomic locus. The corresponding annotated gene, either the SV-impacted gene or the closest gene, was named the "SV-gene". Associations of SV-genes with ADHD were calculated for AA and EA independently using Chi-square tests and Fisher's exact tests. Bonferroni correction was used to correct for multiple testing by the number of tested variations or genes. Significance was set at 0.05 after Bonferroni correction. We only included risk variants in downstream analyses, i.e. SVs that had odds ratios greater than 1. Meta-analysis (combined Chi-square test) was applied when combining the two ethnicities. Pathway analysis was based on the DAVID bioinformatics platform.

Table 6. Significant ADHD-associated non-coding SV-genes with previous ADHD literature support in AA and EA, respectively. Small overlap between the two ethnicities.
Gene | AA intronic SV significantly associated with ADHD | EA intronic SV significantly associated with ADHD | AA intergenic SV significantly associated with ADHD | EA intergenic SV significantly associated with ADHD
AGBL1 | - | - | Yes | Yes
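The annotation rule and the per-variant association filter described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the gene names, coordinates and function names are hypothetical, "within an exon" is read as both breakpoints falling in the same exon, genes are assumed not to overlap, and the 0.5 continuity correction for zero cells in the odds ratio is an added convention that the paper does not mention. The 2x2 table uses the AA cohort sizes and the recurrent deletion seen in three AA cases and no controls; the resulting p value is a worked example, not a figure reported in the paper.

```python
from math import comb


def classify_sv(sv_start, sv_end, genes):
    """Return (class, SV-gene) for one SV.

    `genes` maps a gene name to its sorted, non-overlapping exons given as
    inclusive (start, end) pairs; a hypothetical stand-in for GENCODE.
    """
    for name, exons in genes.items():
        gene_start, gene_end = exons[0][0], exons[-1][1]
        # Both breakpoints inside one exon (assumption: the same exon).
        if any(s <= sv_start and sv_end <= e for s, e in exons):
            return "exonic", name
        # Overlaps an exon without being contained in it: crosses a border.
        if any(sv_start <= e and s <= sv_end for s, e in exons):
            return "splicing", name
        # Inside the gene span and clear of every exon.
        if gene_start <= sv_start and sv_end <= gene_end:
            return "intronic", name

    # Everything else is intergenic and is assigned the closest gene.
    def gap(name):
        exons = genes[name]
        return max(exons[0][0] - sv_end, sv_start - exons[-1][1])

    return "intergenic", min(genes, key=gap)


def fisher_exact_p(case_carriers, n_cases, control_carriers, n_controls):
    """Two-sided Fisher's exact p value for a carrier-by-status 2x2 table."""
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls

    def prob(x):  # hypergeometric probability of x carriers among the cases
        return comb(n_cases, x) * comb(n_controls, carriers - x) / comb(total, carriers)

    p_obs = prob(case_carriers)
    lo, hi = max(0, carriers - n_controls), min(n_cases, carriers)
    # Sum every table that is no more likely than the observed one.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, tail)


def odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Odds ratio; adds 0.5 to every cell when any cell is zero (assumption)."""
    a, b = case_carriers, n_cases - case_carriers
    c, d = control_carriers, n_controls - control_carriers
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical two-gene annotation and four SVs, one per class.
genes = {"GENE_A": [(100, 200), (300, 400)], "GENE_B": [(1000, 1100)]}
svs = [(120, 180), (210, 290), (190, 250), (900, 950)]
labels = [classify_sv(start, end, genes) for start, end in svs]

# Deletion carried by 3 of 116 AA cases and 0 of 408 AA controls.
p_value = fisher_exact_p(3, 116, 0, 408)
is_risk_variant = odds_ratio(3, 116, 0, 408) > 1
print(labels, p_value, is_risk_variant)
```

In a real analysis the Bonferroni step would multiply each p value by the number of tested variations or genes before comparing it with 0.05, and only variants with an odds ratio above 1 would be carried forward.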
Ethics statement. We confirm that all methods were carried out in accordance with relevant guidelines and regulations and all experimental protocols were approved by the Children's Hospital of Philadelphia (CHOP). Informed consent was obtained from all subjects or, if subjects were under 18, from a parent and/or legal guardian.