Whole exome sequencing in ADHD trios from single and multi-incident families implicates new candidate genes and highlights polygenic transmission

Several types of genetic alterations occurring at numerous loci have been described in attention deficit hyperactivity disorder (ADHD). However, the role of rare single nucleotide variants (SNVs) remains under investigated. Here, we sought to identify rare SNVs with predicted deleterious effect that may contribute to ADHD risk. We chose to study ADHD families (including multi-incident) from a population with a high rate of consanguinity in which genetic risk factors tend to accumulate and therefore increasing the chance of detecting risk alleles. We employed whole exome sequencing (WES) to interrogate the entire coding region of 16 trios with ADHD. We also performed enrichment analysis on our final list of genes to identify the overrepresented biological processes. A total of 32 rare variants with predicted damaging effect were identified in 31 genes. At least two variants were detected per proband, most of which were not exclusive to the affected individuals. In addition, the majority of our candidate genes have not been previously described in ADHD including five genes (NEK4, NLE1, PSRC1, PTP4A3, and TMEM183A) that were not previously described in any human condition. Moreover, enrichment analysis highlighted brain-relevant biological themes such as “Glutamatergic synapse”, “Cytoskeleton organization”, and “Ca2+ pathway”. In conclusion, our findings are in keeping with prior studies demonstrating the highly challenging genetic architecture of ADHD involving low penetrance, variable expressivity and locus heterogeneity.


Introduction
The World health organization (WHO) estimates (10-20%) of children worldwide to be affected with some form of childhood-onset neuropsychiatric disorder [https://www.who. int/mental_health/maternal-child/child_adolescent/en/]. These disorders, if left unmanaged, can impair normal functioning and development. Perhaps the most common example of such conditions is attention deficit hyperactivity disorder (ADHD), characterized by a persistent pattern of inattention and/or hyperactivity-impulsivity. In some cases, these symptoms continue into adulthood [1].
Like other neuropsychiatric disorders (e.g., autism, schizophrenia, and depression), the exact causes of ADHD are not yet clearly understood. The earliest evidence for genetic contribution to ADHD came from family, twin and adoption studies consistently demonstrating high heritability (up to 88%) [2]. While this disorder may aggregate in families, it tends not to segregate in a typical Mendelian manner. This is true not only for ADHD, but also for most of complex disorders, whereby risk to the disorder is several folds higher in the relatives of the proband compared with the general population. Also, disease risk was found to be proportional with the degree of relatedness [3,4]. The current consensus is that ADHD is a complex condition influenced by multiple genetic, social, and environmental factors [5].
Various genome-wide approaches have been employed to investigate the genetic basis of ADHD. For a comprehensive review see [2,6]. The first wave of studies applied genetic linkage methods on sibling pairs, small families, or multigenerational families in search for genetic variants with large effect. Because of the high prevalence of ADHD (possibly due to increased recognition of the disorder), later studies focused on finding common single nucleotide polymorphisms (SNPs) with modest effects that are significantly associated with the disorder in case-control samples. In these studies one of two main strategies were adopted: candidate gene association or genome-wide association. The former strategy is hypothesis-driven, whereby the selection of candidate genes is based on existing knowledge about; (1) their relevant biological roles, (2) their participation to a recognized ADHD drug target pathway, and (3) their previous association with ADHD. Results from this type of studies were often inconsistent for the same locus. However, only through genome-wide association studies (GWAS) meta-analysis that researchers were able to identify significant associations across studies [7].
Unlike candidate gene association studies, GWAS are hypothesis-free interrogation of the entire genome for significant risk loci (SNPs). Previous GWAS of ADHD failed to find variants with genome-wide significance even when they were combined in a meta-analysis. This is expected given the highly stringent P value threshold (5 × 10 −8 ) required in this approach, which can only be achieved using a very large sample size [8]. Thus far, only a single study was able to detect association signal with global significance [7]. In this well-powered study (20,183 cases and 35,191 controls), Demontis et al. identified 12 independent risk loci that were also found by another group to be a robust set from which polygenic risk scores could be derived for reliable prediction of some of ADHD concurrent traits [9].
Although up to one-third (estimates ranging from 0.1 to 0.28) of total ADHD heritability is thought to be based on inherited common SNPs, a large portion of the heritable risk remains unaccounted for [4,10]. This so-called "missing heritability" is likely to be, at least partially, explained by rare genetic variants. The first attempts to address the role of rare variants in ADHD susceptibility were focused on studying copy number variants (CNVs). While enrichment for large CNVs (≥100 kbp) has been observed in a couple of studies [11,12], the majority of the reported CNVs had reduced penetrance and limited replication across studies [2,13]. Only recently, have next generation sequencing (NGS) research begun to explore the role of another class of rare variants (single nucleotide variants (SNVs)) in ADHD. However, a very limited number of studies have been published so far [14][15][16][17][18][19]. In all of these studies exome sequencing was applied for identification of rare functional SNVs either in preselected set of genes [14][15][16] or across the genome in a small sample size [16][17][18][19].
Collectively around 359 genes (counts from ADHDgene database, http://adhd.psych.ac.cn/index.do [accessed on 17.12.19] [20]) have been published in ADHD genetic studies. Most of these studies were conducted in Asian or European populations. The increasing realization of the disadvantages of lack of diversity calls for new diversityincreasing strategies to minimize bias and ensure worldwide applicability of the genetic findings. Such strategies are the path for realizing the promise of personalized medicine for all. The present study contributes to the growing efforts to enrich diversity by including an underrepresented non-European population. We report here the first attempt to explore the role of inherited and de novo variants (SNVs and CNVs) in ADHD (parents-child) trios of Arab descent. Using whole exome sequencing (WES), we applied our inhouse analysis pipeline to identify potential ADHD risk variants.

Subjects and methods
Participants A total of 16 Saudi families with at least one child affected with ADHD were recruited in this study after obtaining signed informed consent in accordance with the Declaration of Helsinki and the institution's relevant committees; Institutional Review Board (IRB), Research Ethics committee and Basic Research Committee. All experimental protocols used in this study were authorized under an IRBapproved project (RAC#2120001). Recruitment was done through the psychiatry department at King Faisal Specialist Hospital and Research Centre (KFSHRC). All of the approached families agreed to participate (no withdrawals). Male to female ratio of the enrolled cases was 3:1 and the mean age was 12 years (Table 1). Probands were assessed by a trained medical team (child psychiatrist and neurologist) and diagnosed according to the diagnostic and statistical manual of mental disorders (DSM-IV) criteria. Our exclusion criteria included ADHD secondary to syndromes with known genetic causes such as Fragile X syndrome, Tuberous Sclerosis, Rett syndrome, Angelman, Prader-Willi syndrome, or Phenylketonuria. Therefore, only cases with non-syndromic ADHD were selected. Blood samples for DNA extraction, were collected from all available consenting family members (parents as well as affected and unaffected siblings).

Copy number variation analysis
All recruited samples were surveyed for CNVs in genes listed in (Table S1). The list contains genes from AutismKB core dataset (with a total score ≥20), International Multisite ADHD Genetics (IMAGE) candidate risk genes [21] and genes participating in neurotransmitter systems that were not included in the aforementioned sets. CytoScan TM HD Suite (Affymetrix, Santa Clara, CA, USA) was used for genome-wide detection of CNVs. Targeted analysis of CNVs in the selected genes was performed using Chromosome Analysis Suite 3.0 (ChAS 3.0). All genomic locations were based on GRCh37/hg19 human genome assembly. Gains were defined as (log2 ratio of copy) values greater than 0.58 and loss as values less than −1.

Whole exome sequencing and data analysis
The exomes of 16 (parents-child) trios were captured using whole exome AmpliSeq kit and sequenced on Ion Proton TM System platforms (up to 200 bp, single reads). NGS raw data are deposited at the Saudi Human Genome Program (SHGP) repository [https://genomics.saudigenomeprogram. org/en/]. All variants described in this study have been submitted to LOVD [https://databases.lovd.nl/shared/genes] public repository. Sample processing and all three stages of data analysis (primary, secondary, and tertiary) were performed as previously described [22]. The WES workflow applied in this study is summarized in Fig. 1.

Variants filtering, validation, and prioritization
Variants obtained by applying the four possible modes of inheritance (autosomal recessive, autosomal dominant, Xlinked, and de novo) were retained if present in genes with positive brain expression in three out of four databases; GTEx [23], BioGPS [24], CGAP [25], and human brain transcriptome (HBT) [26]. Next, only functional variants (missense, small insertions/deletions (indels) and loss of function (LoF)) with minor allele frequency (MAF) of <1%) in population databases (1000 Genomes project, ExAC and Kaviar) and in the local ethnically matching Saudi Human Genome Program normal controls database (SHGPdb >2379 exomes) were selected for Sanger validation. The resulting variants were prioritized based on their functional effect prediction whereby only those predicted to be "deleterious" by at least two out of three bioinformatics tools (CADD, PredictSNP2 and FATHMM-MKL) [27][28][29] were considered. At the gene level, intolerance to variation (missense and LoF) metrics (pLI and Z score) were extracted for each identified gene with validated variant(s). However, although these metrics can be helpful when combined with other prediction tools, they are inadequate to infer or exclude pathogenicity when used solely [22]. In addition, publicly available variants or gene databases such as dbSNP, SFARI Gene [https://gene.sfari.org/autdb/ Welcome.do], AutismKB [http://autismkb.cbi.pku.edu.cn/], and ADHDgene [http://adhd.psych.ac.cn/] were checked for previous reports of the variants/genes identified herein. When possible, segregation analysis was performed on affected as well as unaffected siblings.

Relatedness assessment
Relationships between samples were computed using two algorithms. First we utilized VCFtools program package (with option-relatedness) to calculate the "unadjusted" A jk relatedness statistic based on Yang et al. method. In the second method, shared homozygosity was calculated by comparing the homozygous variants (MAF > 1%) between each pair of individuals and counting the number of overlapping variants after normalization. For detailed description refer to the supplementary material.

Functional enrichment analysis
Two enrichment analysis tools were employed to discover biological processes overrepresented in our gene list.

Analysis of CNVs
None of the genes included in our list (Table S1) were harbored in the regions with copy number alterations except for a heterozygous copy loss in SHANK3 that was identified in probands from F5 and F15. The deleted region spans the entire gene in F5, whereas in F15 it was found to overlap with parts of SHANK3. Of note, our analysis revealed CNVs in two genes that were not included in our gene list but are worth reporting here given their reported link to autism spectrum disorder (ASD) [30,31]. The first was NOTCH1, in which copy gain was detected in probands of F10 and F17. The second was NRXN3 that was found to harbor heterozygous copy loss in F19. Locations and details of the detected CNVs are listed in Table 2 WES data, performance, and statistics Results of relatedness assessment and demographic information are summarized in Table S2. Performance metrics of the WES runs including, number of reads per base that map to the reference genome, the percentage of target region coverage and sequencing depth are summarized in Table S3. High performance metrics were achieved indicating high quality mapping and variant calling. Type and number of the detected variants before applying the filtering pipeline are listed in Table S4. All trios passed quality

ADHD families
Analysis of CNVs in preselected genes (      control checks except for F9 which was excluded from further analysis.

Rare SNVs detected by WES
By interrogating all protein coding regions covered by the employed kit, we have identified and validated a total of 32 unique rare variants in 31 different genes. Of which, 2 were de novo and 30 were inherited mainly being missense changes. Interestingly, 5 of the identified genes (NEK4, NLE1, PSRC1, PTP4A3, and TMEM183A) have not been previously described in any human disorder. The biological function of 10/31 genes is yet to be determined, while others participate in similar biological themes. For instance, signal transduction (THBS4, PLCB2, GNB4, PTCH1), chromatin modeling or DNA damage response (NEK4, ACIN1, PSRC1, TLK2, SPAG5), gene transcription regulation (KMT2A and ZFHX3), and vesicle transport (NBAS, KIF21A, PCLO) (Table S5).
Positive brain expression was reported for all the identified genes, also information on the developmental and regional differential expression was obtained from HBT (Table S6). Information was available for 26/32 identified genes, 12 of which were expressed throughout the entire life span, 13 were under temporal and/or spatial control, and 1 gene (ACCS) had a negative expression in the brain. NLE1 is the only gene that appears to be developmentally regulated across all included brain regions. Based on the HBT database, its expression is restricted to a limited developmental time frame (embryonic-late mid fetal development) [26].
The absence of a universal candidate gene/variant for neurodevelopmental disorders like ASD and ADHD is consistent with the increasingly accepted view that heterogeneous set of genetic variants, displaying incomplete penetrance, can shape risk to the disorder [32,33]. In light of this, we considered a model in which a combination of genetic variants forming a "constellation" may influence risk. Therefore, we have reported all SNVs that have survived our filtering criteria regardless of their co-segregation profile ( Table 3). As expected, most of the validated variants exhibited reduced penetrance being either present in normally developing siblings or absent in the affected ones. One exception was LRRK1 LoF variant (c.2687-2A>G) that was detected in a homozygous state in three out the four affected individuals in F1, while the remaining (screened) family members including the 4th affected sibling were all heterozygous carriers for this change. These variants are an example of variation in penetrance and phenotypic expressivity, a phenomenon that is not uncommon in neurodevelopmental disorders.
Almost half of the genes identified herein (15/31) have a mouse model with documented behavioral or neurological phenotype (Table S7). Among these genes SLC9A9 was the only one with established mouse models of a neurodevelopmental disorder (ASD). This gene encodes a membrane protein localized in the late recycling endosomes, which are involved in the trafficking of neurotransmitters receptors and transporters. It is worth mentioning that we have screened all the genes listed in Table S1 for variants fulfilling our selection criteria to ensure that our pipeline had not missed any known causal or candidate variants in these genes. No candidate variants were flagged except the one that we have previously detected in SLC9A9. This gene was the only one overlapping with (IMAGE) [21].

Biological pathways overrepresented in this study
In order to investigate which biological processes might be affected by the variants identified herein, we utilized two web-based bioinformatics tools (WebGestalt and Reactome). A list of diverse functional categories were flagged as statistically overrepresented within our final set of genes with validated variants (Table S8). By querying KEGG and GO databases, WebGestalt was able to identify the top ten functional categories under "pathway", "biological process", "cellular component" and "molecular function" that are significantly enriched in our gene list. Various biological themes were highlighted including "Insulin secretion", "Glutamatergic, cholinergic, serotonergic, dopaminergic synapses", "Circadian rhythm", "Cell division", "Pyruvate dehydrogenase complex", and "Cytoskeleton organization". In addition, Reactome identified the 25 most significant pathways, examples of which include; "Regulation of insulin secretion", "Presynaptic function of Kainate receptors", and " Ca 2+ pathway". Interestingly, among all the enriched functional categories "Insulin secretion" was identified as a recurrent theme.

Discussion
The role of rare SNVs has been much less investigated in ADHD, than in ASD, with no more than a handful of studies in which NGS was applied. These studies could be grouped into hypothesis-driven [14,15], hypothesis-free [17][18][19] or a mix of both [16]. In the hypothesis-driven approach, NGS analysis was restricted to either a predefined set of ADHD candidate risk genes (IMAGE) [14], or gene sets curated from previous association studies (candidate genes/GWAS) after the application of specific criteria [15], or to a limited number of genes linked to ASD and intellectual disability (26 genes) [16]. With respect to sample size, all hypothesis-free studies were performed in a small number of cases/discovery-cohort (comprising 2-30 affected individuals), whereas larger number of samples (117-152 affected individuals) were included in the Table 3 Rare variants detected and validated in this study. PredticSNP2 predictions were reported with the expected accuracy (%). Genomic, transcript, and protein reference sequence identifiers are from NCBI database (GRCh37/hg19).
All NGS files were deposited in the SHGP repository [https://genomics.saudigenomeprogram.org/en/]. hypothesis-driven studies. Moreover, variant filtering criteria that was applied in the hypothesis-free investigations was relatively relaxed, not including either variant validation by an orthogonal method (Sanger sequencing or Taq-Man assay) [18,19], or variant functional impact prediction analysis [16,17,19].
In this study, we employed discovery-based approach coupled with our stringent variant filtering criteria to identify de novo and inherited rare (disease-relevant) candidate variants (Fig. 1). We have detected and validated 32 unique rare SNVs in 13/15 (86.6%) trios. For the majority of cases, at least two variants were found per proband. The presence of multiple candidate variants per case, is something that we and others have encountered in similar complex disorders (ASD and Parkinson's disease) [22,34,35]. This reinforces the plausibility of a polygenic model whereby several genes/loci may harbor risk variants collectively contributing to the disease burden.
Assuming homozygous inheritance, we have previously analyzed all of the multi-incident families (Table S9) with apparently unaffected parents for regions of homozygosity shared only between affected individuals within or across families [36]. In our previous work the intention was to detect genomic regions of homozygosity that are likely to contain causal variants. However, in the current study our primary aim is to detect rare variants with predicted damaging effect regardless of the segregation pattern. It is perhaps unsurprising that none of the candidate genes revealed here in the multi-incident families existed in regions overlapping with what we have previously reported [36]. This is anticipated, given that genetic and allelic heterogeneity, reduced penetrance and variable expressivity are all factors known to influence the impact of genetic changes, an issue that has been increasingly recognized in complex disorders [33,37]. Thus far, only four of the genes reported in ADHD NGS-based studies were replicated across cohorts from different descent (Table S10). Each was found to harbor distinct variants. However, whether or not they contribute to the disorder is yet to be discovered.
Our findings from both studies (here and [36]) are in keeping with the notion that ADHD does not conform to a monogenic model. Moreover, while homozygosity analysis may be less successful compared with WES in detecting rare variants with large effect size, it can be useful in highlighting genomic regions containing other types of common genetic events (structural, noncoding, or regulatory) that may act " in aggregate" as modifiers [33].
LRRK1 has been pointed out by our analysis as an interesting candidate gene. The LoF variant (c.2687-2A>G) detected within this gene was present in three out of the four affected individuals in F1. Recessive transmission of the alternative allele was observed in the three affected members (homozygous) while the rest of the family members Variants LOVD IDs and other information are listed in Table S11. (4th affected sibling along with the parents and the healthy siblings) were all heterozygous carriers. Under the assumption of a compound heterozygous transmission, we revisited the data generated from the CytoScan TM HD platform for any CNVs affecting LRRK1 in the 4th affected individual (with a heterozygous genotype). However, no copy number changes were detected in this gene. In spite of the reduced penetrance of this variant, it is the only one in this study that was present in more than two affected individuals (same allele and genotype) which was not shared with the unaffected family members. This variant affects a splice-acceptor site located between intron 19 and exon 20 (exons are numbered from 1 to 34 based on NM_024652/ ENST00000388948.3, GRCh37 genome build) which maps onto the ROC-COR bidomain of the protein [38]. However, assessment of its consequence on gene splicing pattern was not possible here due to the unavailability of RNA samples. Thus far, osteosclerotic metaphyseal dysplasia, which is a rare form of skeletal dysplasia, is the only human condition thought to be caused by variants in LRRK1 [39,40]. Defective or depleted LRRK1 has been shown to disrupt osteoclast normal function [39,41]. Unfortunately, we could not determine whether the individuals homozygous for LRRK1 variant had any form of bone abnormality due to the unavailability of relevant clinical and radiographic data.
Besides its role in bone homeostasis [41], LRRK1 have been shown to play a role in Grb2-mediated EGFR endocytic trafficking [42]. EGFR signaling influences many cellular processes including proliferation and migration of neural progenitor cells [43,44] and more recently has been identified as an essential regulator of axon branching in the developing Drosophila brain [45]. Proper axonal branching is crucial for the formation of functional neural circuits; disruption of which may underlie brain circuitry aberrations that have been documented in individuals with ASD and ADHD [46,47]. However, whether EGFR signaling regulates axonal branch formation in the mammalian brain or not, is yet to be determined.
Of the biological themes revealed by our enrichment analysis, at least five categories overlapped with prior findings "Glutamatergic synapse" (40 genes reported in ADHDgene database), "Dopaminergic synapse" (20 genes reported in ADHDgene database), "Serotonergic synapse", "Ca 2+ signaling" (40 genes reported in ADHDgene database), and "Cytoskeleton organization". These biological processes are central to brain function therefore it is unsurprising to find that they have been repeatedly described in the literature [19,48,49]. Moreover, functional categories pertaining to the cytoskeleton comprised the largest number of genes (7/31) followed by "Regulation of insulin secretion" (14 genes reported in ADHDgene database) (3/31). Interestingly, insulin signaling may indirectly be involved in normal brain function through its known role in regulating the activity of both excitatory and inhibitory synapses in addition to influencing the brain's structural plasticity [50].
Small sample size is a limitation common to most of the published hypothesis-free ADHD genetic studies (including the present one) that can undermine the reliability of the produced findings. In fields such as human genetics of complex disorders, large-scale collaborative consortia are considered the gold standard approach for improving power and obtaining reliable and more generalizable results. However, extensive genetic and clinical heterogeneity not only across different cohorts/populations but also within families should be taken into consideration when this approach is employed. Another limitation of our study is that only the coding regions were interrogated for rare candidate variants, while variants existing outside our WES coverage area or those located in noncoding or regulatory regions were not possible to detect. In addition, the allele frequency of the CNVs identified here could not be determined due to the absence of data from matching normal controls.
In conclusion, our workflow revealed 32 unique rare variants in 31 different candidate genes most of which have not been previously described in ADHD. Even though our variant prioritization method is restricted to those with predicted deleterious effect, further functional and cellular analysis is essential to confirm their true biological consequences. The concept of a single causative variant being sufficient on its own to drive complex disorders like ADHD is no longer viable. Instead, an accumulating body of evidence indicates that multiple rare and common variants collectively contribute to the susceptibility to such disorders.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/.