Introduction

The heterogeneous manifestation of autism spectrum disorders (ASD) consists of several characteristic features including markedly abnormal social interaction, impaired communication abilities, and repetitive and restricted patterns of behavior and interests. Symptoms vary in severity per case and can occur with or without intellectual or language impairment.1 Heritability is high, with an estimate of 60%,2 which has led to extensive research into the genetic variants underlying ASD.

There have been several genome-wide studies, including genome-wide association (GWA) and linkage studies, that reported promising associations.3, 4, 5 However, these associations explained only a small fraction of the genetic risk for ASD, showed few replicable results, with P-values between 10^−4 and 10^−8, and illustrated that most GWA studies to date lack the power to reliably determine a role for common variants in ASD.4, 5, 6, 7, 8, 9, 10

The examination of copy number variants (CNVs) in ASD patients has been more successful by revealing enrichment of genes important for several, mostly synaptic, functions.11, 12, 13, 14 However, causal CNVs occur in 5–10% of ASD cases and although they have a large effect on the liability to ASD, they are very rare, generally not specific to ASD, and cover large genomic areas including multiple genes.15

The most exciting genetic discoveries for ASD have come from whole-exome sequencing (WES) data, in which several rare de novo variants in a diversity of genes have been linked to ASD.15, 16, 17, 18, 19 These WES findings implicated genes involved in chromatin remodeling,16, 17, 19, 20 synaptic formation16 and transcriptional regulation,16 as well as FMRP-associated genes16, 17, 20 (for reviews, see refs 10, 21).

The emerging picture is that ASD, like other psychiatric traits, is highly polygenic and likely influenced by a mix of rare and common variants, whose functional implications still have to be determined.10, 22 Pathways containing genes that are dysfunctional in ASD may be identified more readily by investigating the combined effect of multiple variants using gene-set analysis. Gene-set analysis evaluates the joint effect of multiple genetic variants grouped according to biological or cellular function, thereby reducing the multiple testing burden and increasing the effect size.23, 24, 25, 26

In the current study, our objective is to investigate the contribution of common genetic variation to biological pathways functionally involved in ASD. To this end, we investigated whether the joint effect of common genetic variants grouped into a priori selected gene-sets is associated with ASD. Because of our hypothesis-driven, top-down approach, we directly tested all single nucleotide polymorphisms (SNPs) in a particular gene-set. This contrasts with a functional enrichment analysis of top SNPs, which does not require a priori hypotheses and instead uses the top SNPs to define possibly associated pathways based on functional enrichment.

We selected five categories of gene-sets and limited ourselves to expert-curated gene-sets, resulting in 32 gene-sets to test. We included 19 curated synaptic gene-sets24 (category one) and three glial gene-sets (oligodendrocytes, astrocytes, and oligodendrocytes and astrocytes combined)27 (category two), which have cell type-specific functions. Synaptic genes have been implicated as being among the top genes harboring variants associated with ASD14, 16 and other psychiatric disorders.28 The glial sets represent an exploratory approach intended to provide general insights and starting points for more specific hypothesis formation, although previous research has pointed toward a role for astrocytes29 and oligodendrocytes30 in ASD, and for oligodendrocytes in schizophrenia (SCZ).31 The third category consists of genes whose transcripts are targeted by fragile X mental retardation protein (FMRP). FMRP is an RNA-binding protein expressed in the brain and encoded by the FMR1 gene, located at Xq27.3, which has been linked to fragile X syndrome (FXS).32 Evidence for an association with ASD is accumulating from WES16, 17, 20 and CNV studies,33 yet it has not been confirmed using common variants from genome-wide association studies (GWAS). Darnell et al.34 and Ascano et al.35 have provided insights into the biological underpinnings of FXS and ASD using human tissue and mouse models to study FMRP and the RNA it binds. Their research resulted in three gene-sets of FMRP target transcripts (Darnell gene-set, Ascano gene-set, and Darnell and Ascano overlap gene-set). We aimed to test whether these sets of FMRP-targeted genes are associated with ASD in a large human sample not enriched for the FMR1 variant.

Our fourth gene-set category is a glutamate pathway. Because glutamate is the most important excitatory neurotransmitter in the brain, glutamate and its receptors have been suggested to have a role in psychiatric diseases including ASD.27, 36 The fifth category (six gene-sets) is based on mitochondrial genes.27 Mitochondria provide energy for the cell, and because the brain is the organ that uses the most energy, even a small reduction in energy production can impair brain processes in the synapse.27 Mitochondrial dysfunction has cautiously been associated with psychiatric diseases, including ASD, based on abnormal mitochondrial biomarker values and a high prevalence of mitochondrial diseases in ASD patients compared to a healthy subpopulation.37, 38

In sum, our main goal is to directly test predefined sets of genes for their association with ASD. These gene-sets were selected because of previous associations with psychiatric diseases and test the involvement of (1) synaptic processes, (2) glial cells, (3) FMRP targets, (4) glutamate and (5) mitochondria. The underlying hypothesis is that the polygenic nature of ASD shows convergence of genetic effects in biologically meaningful sets of genes.

Materials and methods

Sample

We used the publicly available GWAS summary statistics (PGC.ASD.euro.all.25Mar2015.txt.gz) downloaded from http://www.med.unc.edu/pgc/results-and-downloads on 21 May 2015. More information on the sample can be found on the Psychiatric Genomics Consortium (PGC) website mentioned above. Briefly, in their original study the PGC used the following cohorts: the Geschwind Autism Center of Excellence (ACE), the Autism Genome Project (AGP), the Autism Genetic Resource Exchange (AGRE), the NIMH Repository, the Montreal/Boston Collection (MONBOS) and the Simons Simplex Collection (SSC) (see Table 1). The sample comprises 5305 ASD probands and 5303 pseudocontrols. In a pseudocontrol design, the non-transmitted parental alleles are used as the control instead of a regular control group. All participants were of European descent. For the current gene-set analyses, summary statistics (ie, P-values per SNP) of this PGC study were used.
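The pseudocontrol idea described above can be illustrated with a minimal sketch for a single biallelic SNP in one parent-child trio. This is an illustration only and not the PGC pipeline; the function name `pseudocontrol_genotype` and the tuple representation of genotypes are assumptions made for this example.

```python
def pseudocontrol_genotype(father, mother, child):
    """Return the genotype built from the two non-transmitted parental alleles.

    Each genotype is a pair of allele labels, e.g. ("A", "G").
    Hypothetical helper for illustration; real pipelines work on whole
    genotype files and handle missing data and phasing.
    """
    # Try every way the child could have received one allele per parent.
    for f in (0, 1):
        for m in (0, 1):
            if sorted((father[f], mother[m])) == sorted(child):
                # The alleles that were NOT passed on form the pseudocontrol.
                return tuple(sorted((father[1 - f], mother[1 - m])))
    raise ValueError("child genotype is not consistent with the parents")


# Father A/G, mother G/G, affected child A/G:
# the father transmitted A and the mother transmitted G.
example = pseudocontrol_genotype(("A", "G"), ("G", "G"), ("A", "G"))
print(example)  # ('G', 'G')
```

Each proband is thus matched to a control genotype drawn from the same parents, which is why the numbers of probands and pseudocontrols are nearly equal.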

Table 1 Overview of the cohorts that were included in the initial analysis by the PGC

Generation of gene-sets

We used 32 publicly available expert-curated gene-sets that we assigned to five distinct categories. The 19 synaptic gene-sets were published in previous studies,14, 24, 28 in which they were defined based on assignment of subcellular function as determined by synaptic protein purification experiments and by data mining for synaptic genes and gene ontology. Synaptic genes were subdivided into 19 functional groups (N genes 1047). The glial (146 genes), oligodendrocyte (52 genes), astrocyte (42 genes) and mitochondrial (six gene-sets, N genes 132) gene-sets were created and described by Duncan et al.,27 who conducted a search of the gene ontology database and REACTOME and supplemented the identified genes with genes found via an in-depth literature study. In addition, we included three gene-sets consisting of FMRP-targeted genes (N genes 1809) as defined by Darnell et al.34 and Ascano et al.35 using sequencing methods. All gene-sets used in the present study are shown in Supplementary Tables S1 and S2.

MAGMA and INRICH gene-set analyses

Gene-set analyses can consist of self-contained testing and competitive testing. In a self-contained test, the alternative hypothesis that a gene-set is associated with the trait is tested against the null hypothesis of no association, whereas in competitive testing the alternative hypothesis is that the genes in the gene-set are more strongly associated with the trait than genes not included in the gene-set.
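The difference between the two kinds of test can be made concrete with a toy sketch on gene-level Z-scores. This is a deliberate simplification under stated assumptions (independent genes, a Stouffer-type sum for the self-contained test, a Welch-type mean comparison for the competitive test) and is not the regression model implemented in MAGMA; the function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def self_contained_p(z_in):
    """One-sided test of 'genes in the set show no association at all'."""
    stat = sum(z_in) / sqrt(len(z_in))  # Stouffer-type combined Z
    return 1.0 - NormalDist().cdf(stat)


def competitive_p(z_in, z_out):
    """One-sided test of 'set genes are not more associated than other genes'."""
    diff = mean(z_in) - mean(z_out)
    se = sqrt(variance(z_in) / len(z_in) + variance(z_out) / len(z_out))
    return 1.0 - NormalDist().cdf(diff / se)


# Toy polygenic background: every gene, inside or outside the set,
# has Z-scores centred on 0.5 (illustrative numbers, not real data).
pattern = [-1.0, -0.5, 0.0, 0.5, 1.0]
z_in = [0.5 + d for d in pattern * 4]    # 20 genes in the gene-set
z_out = [0.5 + d for d in pattern * 40]  # 200 genes outside the gene-set

p_self = self_contained_p(z_in)
p_comp = competitive_p(z_in, z_out)
print(p_self < 0.05, p_comp)  # True 0.5
```

In this toy example the self-contained test rejects its null hypothesis because of the genome-wide polygenic signal alone, whereas the competitive test does not, since the gene-set is no more associated than the rest of the genome.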

We performed our analysis using two methods, MAGMA and INRICH, both providing competitive test results.39

MAGMA (v1.01, http://ctglab.nl/software/magma) is a tool for gene and gene-set analysis. It is distinguished from other methods such as INRICH, ALIGATOR and MAGENTA by having more statistical power, by being less affected by linkage disequilibrium (LD) and multi-marker associations owing to its multiple regression approach, and by being computationally less demanding, as it does not use a permutation-based approach.39, 40 A SNP is in LD with another SNP when their specific alleles occur together more often than expected by chance; this implies that the assumption of independent associations is violated and that one allele can be predicted with high certainty from the other, known allele. A significant hit in MAGMA indicates that multiple genes in the gene-set are associated. The SNPs included in these genes can each have relatively high P-values; only together are they responsible for a positive signal in a gene-set. The 1000 genomes European panel (reference file) and NCBI 37.3 (gene location) (downloaded from http://ctglab.nl/software/magma) were used for SNP annotation to genes.

INRICH41 is a permutation-based GWA analysis tool that tests whether functionally related genes compiled in gene-sets show a stronger association with a phenotype than expected by chance. A significant hit in INRICH can occur with only a few highly associated SNPs in a gene-set. INRICH can be downloaded from http://atgu.mgh.harvard.edu/inrich/downloads.html. Using Plink,42 we computed several sets of intervals for our analyses, using different parameters for the LD clumping procedure. We applied the default INRICH values of 5000 first-pass permutations and 1000 second-pass permutations.

Clumping is a method to reduce the amount of redundant signal in a data set due to LD. We assigned SNPs with a P-value below a certain significance threshold to the same clump if they had an r2 of at least 0.5 and were not yet assigned to another clump. For this clumping parameter we used several SNP P-value significance thresholds: 0.0001 and 0.01 (both fixed P-value thresholds), 0.00896288 (1% cut-off P-value, computed in R studio v3.0.2, Boston, MA, USA) and 0.0008106764 (0.1% cut-off P-value, computed in R studio v3.0.2). These thresholds influence which and how many SNPs are assigned to a clump. A stricter P-value cut-off results in clumps with fewer SNPs. Again, we used NCBI 37.1 for gene location.
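The clumping logic described above can be sketched as a greedy procedure. The sketch below is an assumption-laden illustration, not the Plink implementation: it ignores the physical distance window that Plink also applies, the function name `clump` and the SNP identifiers are hypothetical, and the pairwise r2 values are supplied by hand rather than estimated from a reference panel.

```python
def clump(pvalues, r2, p_threshold, r2_threshold=0.5):
    """Greedy LD clumping on SNPs with P <= p_threshold.

    pvalues: dict mapping SNP id -> P-value
    r2:      dict mapping frozenset({snp_a, snp_b}) -> r2 (missing pairs = 0)
    Returns a dict mapping each index SNP to the list of SNPs in its clump.
    """
    # Consider only SNPs passing the threshold, most significant first.
    candidates = sorted(
        (snp for snp, p in pvalues.items() if p <= p_threshold),
        key=lambda snp: pvalues[snp],
    )
    assigned = set()
    clumps = {}
    for index_snp in candidates:
        if index_snp in assigned:
            continue  # already part of an earlier, stronger clump
        members = [index_snp]
        assigned.add(index_snp)
        for other in candidates:
            if other in assigned:
                continue
            if r2.get(frozenset((index_snp, other)), 0.0) >= r2_threshold:
                members.append(other)
                assigned.add(other)
        clumps[index_snp] = members
    return clumps


# Toy input (illustrative values only).
pvalues = {"rs1": 1e-5, "rs2": 5e-5, "rs3": 8e-5, "rs4": 0.2}
r2 = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs2", "rs3")): 0.6,
    frozenset(("rs1", "rs3")): 0.1,
}
clumps = clump(pvalues, r2, p_threshold=0.0001)
print(clumps)  # {'rs1': ['rs1', 'rs2'], 'rs3': ['rs3']}
```

Here rs2 joins the clump of the more significant rs1, rs3 forms its own clump because its r2 with rs1 is below 0.5, and rs4 is dropped because its P-value does not pass the threshold.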

Statistical testing

We applied multiple testing correction following our hypothesis-driven approach: per hypothesis, we multiplied the P-value by the number of gene-sets tested for that particular hypothesis. Although this correction is not as strict as a Bonferroni correction over all tested gene-sets, it provides sufficient correction, as we constructed independent hypotheses generating independent results per hypothesis. This allows for multiple testing correction per hypothesis instead of a more stringent multiple testing correction over all hypotheses.
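As a minimal sketch of this per-hypothesis correction, the snippet below uses the number of gene-sets per category given in the Introduction (19 synaptic, three glial, three FMRP, one glutamate and six mitochondrial). The function names are hypothetical, capping corrected values at 1 is an assumption, and the raw P-value of 0.005 is an arbitrary illustrative number, not a result of this study.

```python
# Number of gene-sets tested under each hypothesis (from the text above).
SETS_PER_HYPOTHESIS = {
    "synaptic": 19,
    "glial": 3,
    "FMRP": 3,
    "glutamate": 1,
    "mitochondrial": 6,
}


def corrected_p(raw_p, hypothesis):
    """Multiply the raw P-value by the number of sets in its own hypothesis."""
    return min(1.0, raw_p * SETS_PER_HYPOTHESIS[hypothesis])


def bonferroni_all(raw_p):
    """Stricter alternative: correct for all 32 gene-sets at once."""
    return min(1.0, raw_p * sum(SETS_PER_HYPOTHESIS.values()))


raw_p = 0.005  # hypothetical raw P-value for one FMRP gene-set
per_hypothesis = corrected_p(raw_p, "FMRP")
overall = bonferroni_all(raw_p)
print(per_hypothesis < 0.05, overall < 0.05)  # True False
```

The same raw P-value can therefore pass the per-hypothesis threshold (multiplied by 3) while failing a correction over all 32 gene-sets, which is the trade-off the text describes.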

A consensus has not yet been reached so different studies parameterize and evaluate results differently.25

Results

MAGMA

Competitive gene-set analyses resulted in a statistically significant, multiple testing-corrected association with the FMRP target gene-set by Darnell et al.34 (P=0.014). None of the other gene-sets showed a statistically significant association after multiple testing correction (for all results see Supplementary Table S3). Also see Figure 1 (gene level) and Figure 2 (SNP level). These figures show the polygenic signal (Figures 1a and 2a) and the role of the FMRP gene-set in this signal (Figures 1b and 2b).

Figure 1

Visualisation of the polygenic pattern at gene level. (a) QQ plot of P-values of all genes that have SNPs from the PGC data set assigned to them, regardless of inclusion in a gene-set. This plot compares the observed P-values to the expected P-values at gene level. The non-linear pattern (the deviation from the diagonal) visualizes the polygenic signal in the genes. (b) QQ plot of the genes in the FMRP (Darnell) gene-set. The earlier lift-off and larger deviation from the diagonal compared to (a) illustrate that the signal in (a) is driven in part by genes in the FMRP gene-set by Darnell.

Figure 2

Visualisation of the polygenic pattern at SNP level. (a) QQ plot of all SNPs in the PGC data set. This plot compares the observed P-values to the expected P-values at SNP level. The non-linear pattern (the deviation from the diagonal) visualizes the polygenic signal in the SNPs. (b) QQ plot of the SNPs in the FMRP (Darnell) gene-set. The earlier lift-off and larger deviation from the diagonal compared to (a) illustrate that the signal in (a) is driven partly by SNPs in the FMRP gene-set by Darnell.

Gene-based tests in MAGMA for the genes included in the significant gene-set resulted in multiple significantly associated genes, indicating the results were not driven by a few highly associated genes and illustrating the value of testing groups of genes together (Supplementary Table S4).

INRICH

We ran INRICH with several parameter settings, as described in the Materials and methods section. Overall, different parameter settings regarding clumping thresholds did not meaningfully change the results. As in the MAGMA analysis, the FMRP targets again showed a significant association with ASD after multiple testing correction (P=0.031) at the clumping SNP P-value significance threshold of 0.0001. For all results see Supplementary Table S5.

Discussion

Previous research efforts have clearly shown that ASD is a highly polygenic disorder with a reported heritability of around 60%.2 Some genetic variants have consistently been identified, but the functional implications of current genetic findings are as yet modest.14, 22, 25 In the present study, we tested whether the genetic variants of small effect on ASD tend to cluster in selected functional sets of genes. We tested expert-curated functional gene-sets that have been constructed previously in the context of their putative role in psychiatric disorders. Our multiple testing correction was not as strict as a Bonferroni correction over all gene-sets. Had we applied that correction, no gene-set would have been significantly associated. Still, we believe a sufficient correction was applied, as we constructed, based on previous findings, independent hypotheses that as such generated independent results per hypothesis. Also, we applied competitive tests instead of the far less stringent self-contained tests that are commonly used in this type of analysis. Moreover, the gene-sets are not independent, and thus a Bonferroni correction based on all gene-sets for all tested hypotheses is likely overly conservative.

The FMRP gene-set by Darnell was found to be significantly associated with the risk for ASD, confirming previously reported associations in WES and CNV studies.17, 33, 34, 35 This gene-set of FMRP targets was constructed by Darnell et al.,34 who identified FMRP interactions with mRNA in the mouse brain by means of high-throughput sequencing of RNAs isolated by crosslinking immunoprecipitation. Their study showed a connection between loss of FMRP function and ASD-associated symptoms in FXS and ASD patients. FMRP is important for the translation of hundreds of neuronal mRNAs, and its loss results in morphological and physiological neuronal defects leading to FXS-like symptoms such as cognitive impairment, seizures, anxiety and hyperactivity.43, 44, 45 The link between ASD and FXS seems intuitive, as FXS is the leading form of monogenic inherited intellectual disability, with many cognitive and behavioral symptoms that are also manifest in ASD.43 In addition, FXS shows comorbidity with ASD: about 30% of FXS patients are diagnosed with ASD, whereas 1 to 2% of ASD patients show FXS comorbidity.43 As an FXS diagnosis was an exclusion criterion, our results are not likely due to inclusion of patients with this monogenic disorder. However, FMR1 pre-mutations also exist, with fewer CGG repeats than in FXS patients but more than in healthy individuals; these cause diseases like fragile X-associated tremor ataxia syndrome and increase the incidence of ASD and ADHD.46 Additional genetic phenotyping, including FMRP count, would be needed to ensure that samples are not enriched for FMR1 pre-mutations.

A study on ASD rare variants47 also reported associations with the Darnell FMRP gene-set, yet it found no evidence for overlap at the individual gene level. An association between this gene-set and SCZ has also been reported.47, 48, 49, 50 Two SCZ CNV studies48, 50 showed genetic overlap with ASD at the gene-set and individual gene level. Taken as a whole, these results might point toward a common biological basis for ASD and SCZ, making it of interest to look further into this possible overlap.

A general concern in psychiatric disorders is phenotypic heterogeneity, and in ASD heterogeneity in intellectual disability (ID)51 is one such concern. As ID was not an exclusion criterion in our study, we cannot rule out that our results are partly driven by ID. Attempts have been made52 to stratify samples into low IQ (IQ<60) and high IQ (IQ>60) groups, but a downside is that the resulting smaller sample sizes reduce statistical power. Unfortunately, because we only had access to GWAS summary statistics and not to IQ scores and raw genotypes of participants, we could not perform such analyses in our current study.

A final point to address regarding our FMRP hypothesis is that, out of three FMRP gene-sets, only the Darnell gene-set remained significant after multiple testing correction. The Ascano and Ascano autism overlap gene-sets did not generate significant results. A possible explanation for these findings lies in the different ways the gene-sets were constructed. Darnell et al.34 identified FMRP interactions within the mouse brain by means of high-throughput RNA sequencing and follow-up analysis, whereas Ascano et al.35 examined FMR1 family protein binding sites to identify and rank FMRP targets in human embryonic kidney cells. These methods may have resulted in two different subsets of FMRP targets showing little overlap and expressing different biological properties. As a final remark, both gene-sets are large compared to all other tested sets. As the Ascano gene-set is larger than the Darnell gene-set, and given that effects of larger gene-sets are generally easier to detect,40 it is unlikely that gene-set size explains the association of the Darnell set. In addition, both MAGMA and INRICH have a correct type I error rate that is independent of gene-set size.40

Our results do not support a role of the glutamate pathway, mitochondrial, synaptic or glial pathway in ASD, suggesting it is unlikely that there are large effects of these pathways on the risk of ASD. However, gene-set definitions are dynamic, and with increased precision in pathway annotation, these results may change.25

Taken together, the current results provide evidence for a role of FMRP-targeted transcripts in ASD. As FMRP is associated with several psychiatric conditions a more thorough exploration of genes in this gene-set and their association with different psychiatric disorders might provide useful information on an underlying shared genetic etiology between several disorders.

To conclude, we performed a gene-set analysis aiming to find common variation clustered in functional pathways associated with ASD. Our significant hit in common genetic variants is an FMRP target gene-set that has previously been associated with ASD in studies of rare variation and with other psychiatric illnesses. These findings may point toward a more general mechanism underlying psychiatric disorders, making cross-disorder research an important future component of the scientific repertoire.