Gene-ontology enrichment analysis in two independent family-based samples highlights biologically plausible processes for autism spectrum disorders



Recent genome-wide association studies (GWAS) have implicated a range of genes from discrete biological pathways in the aetiology of autism. However, despite the strong influence of genetic factors, association studies have yet to identify statistically robust, replicated major effect genes or SNPs. We apply the principle of the SNP ratio test methodology described by O’Dushlaine et al to over 2100 families from the Autism Genome Project (AGP). Using a two-stage design, we examine association enrichment in 5955 unique gene-ontology classifications across four groupings based on two phenotypic and two ancestral classifications. Based on estimates from simulation, we identify an excess of association enrichment across all analyses. We observe enrichment in association for sets of genes involved in diverse biological processes, including pyruvate metabolism, transcription factor activation, cell-signalling and cell-cycle regulation. Both genes and processes that show enrichment have previously been examined in autistic disorders and offer biological plausibility to these findings.


Autism is a complex neurodevelopmental disorder characterized by impairments of varying severity in the three core areas of communication, social interaction and repetitive behaviour. Population prevalence of autism is approximately 15–20 per 10 000 with all autism spectrum disorders (ASD) estimated at 60 in 10 000 children.1, 2 The role of genetic factors in the development of autism is undisputed. Heritability has been estimated as high as 91–93% using a multi-threshold liability model.3 However, despite the strong influence of genetic factors, autism linkage studies and association studies of common SNPs have not identified any genes of major effect. Recent genome-wide association studies (GWAS) have implicated a number of genes from discrete biological pathways in the aetiology of autism.4, 5, 6 In a recent study by the AGP using these data, we identified genome-wide significant association with MACROD2.7 However, we did not observe strong marker-wise associations within the cadherin gene region (CDH9, CDH10) or the TAS2R1, SEMA5A region that were highlighted in the work of Wang and co-workers,4 Ma and co-workers5 and Weiss and co-workers.6 In addition to identifying genome-wide significant association, it can be hypothesized that additional true vulnerability loci may exist within the nominal to modest range of statistical significance and confer risk to the disorder.8 A milieu of nominal to modestly associated risk variation fits with a polygenic model of disease and presents additional challenges for the identification of patterns of association within expected experimental noise.9

One promising approach is to examine association enrichment within ‘pathways’ or groups of genes. The underlying hypothesis of association enrichment analysis is that functional polymorphisms that exist within a group of biologically inter-related genes are in essence ‘disrupting’ the normal functioning of the biological process of the pathway. Consequently, one can consider the biological process, rather than the individual gene or SNP, in the development of the disease/disorder. By examining the ratio of association signals within a group of genes, we can determine whether there is enrichment of the signal above that expected by chance. This strategy decreases the multiple-testing burden that accompanies GWAS, and can have increased power to observe association.

A number of pathway-based methodologies have been developed to examine gene enrichment in association data (reviewed by Wang et al10). These include gene ranking algorithms,11 gene-enrichment algorithms, for example, ALIGATOR (Association LIst Go AnnoTatOR)9 and SNP enrichment approaches such as the SNP ratio test (SRT).12 The SRT provides a formal test of whether markers within pre-defined pathways show enrichment in association signal over that expected by chance alone. For case–control data, the basic algorithm underpinning the SRT is to first calculate the ratio of the number of nominally associated SNP markers within a pathway to the total number of markers within the pathway. Significance is assigned through a case-randomisation permutation routine, which takes account of the linkage disequilibrium between markers.
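The ratio at the core of the SRT can be sketched in a few lines. The following is a minimal illustration rather than the published implementation; the function name, the example P-values and the use of 0.05 as the nominal cut-off are assumptions made for the example.

```python
def snp_ratio(p_values, threshold=0.05):
    """Return the fraction of SNPs in a pathway with P <= threshold."""
    if not p_values:
        raise ValueError("pathway contains no SNPs")
    n_associated = sum(1 for p in p_values if p <= threshold)
    return n_associated / len(p_values)


# Hypothetical marker-wise P-values for the SNPs tagged to one pathway.
pathway_p = [0.001, 0.20, 0.04, 0.73, 0.05, 0.51, 0.33, 0.09]
ratio = snp_ratio(pathway_p)  # 3 of the 8 SNPs meet P <= 0.05
```

The ratio alone carries no significance; it is the comparison with ratios from permuted datasets that yields the test.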

To apply the SRT to family-based data, we are unable to perform standard case randomisation; therefore, a pseudo-sibling model is generated from the alleles that are not transmitted to the proband. A proband-randomisation procedure is performed within the family, whereby the affection status of the offspring (case and pseudo-sibling) is permuted. This method allows retention of the linkage disequilibrium structure within the families and retains the advantages of the transmission disequilibrium test (TDT) design for the family-based association. In this study, we chose the SRT over other approaches for a number of reasons. Firstly, as the SRT retains all of the markers from the association analysis, it is sensitive to more than one true association signal per gene, and therefore gains information in the presence of allelic heterogeneity. Secondly, the SRT's use of multiple association signals across a gene as opposed to a single maximum signal limits potential genotyping artefact effects. Genotyping error at a single point may highlight a gene erroneously in a maximum signal design where this becomes the only observation. However, taking the ratio of all signals across a gene restricts the impact of single points of error as they are more likely to be diluted across the gene. Thirdly, the SRT also controls for gene size and linkage disequilibrium effects by permuting case-ness independently of genotype, consequently maintaining the same recombination patterns. Approaches that do not apply a gene-wise correction to GWAS data can show inflated signals for pathways containing larger genes. This is often the case in brain expressed pathways that are enriched for larger genes such as cell-surface receptors and can lead to misinterpretation of any association enrichment. Finally, as the SRT uses an SNP-wise association statistic over a gene-wise association statistic, we have sufficient observations to examine pathways that may contain fewer genes. Thereby, we are able to examine discrete ‘niche’ pathways as well as larger, more diverse gene sets for enrichment in the GWAS.
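The construction of a pseudo-sibling at a single SNP can be illustrated as follows. This is a sketch under simplifying assumptions (a complete trio, unphased genotypes coded as allele pairs, no missing data); the function name and genotype coding are hypothetical and do not describe the software used in this study. Because the pseudo-sibling genotype is simply the four parental alleles minus the two carried by the proband, no phasing is needed.

```python
from collections import Counter


def pseudo_sibling(father, mother, proband):
    """Genotype formed by the two parental alleles not transmitted to the proband.

    Each genotype is a pair of allele codes at one SNP, e.g. ("A", "G").
    """
    a, b = proband
    consistent = (a in father and b in mother) or (b in father and a in mother)
    if not consistent:
        raise ValueError("proband genotype is not Mendelian-consistent")
    # Four parental alleles minus the two carried by the proband.
    remaining = (Counter(father) + Counter(mother)) - Counter(proband)
    return tuple(sorted(remaining.elements()))


# Father A/G, mother A/A, proband A/A: the untransmitted alleles are G and A.
sib = pseudo_sibling(("A", "G"), ("A", "A"), ("A", "A"))
```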

For this study, we use gene-set lists derived from the gene-ontology (GO) database to examine whether association enrichment is present in a cohort of individuals from the Autism Genome Project (AGP) with a diagnosis of autistic disorder.



The individuals examined in this study were collected as part of the AGP Consortium genome analysis project. The AGP represents more than 50 centres in North America and Europe. Subjects with known karyotypic abnormalities, fragile X mutations or other known genetic disorders were excluded. Diagnostic and ancestral definitions were as previously reported by this group.7 Briefly, families are grouped into two nested diagnostic classes (Strict and Spectrum) based on proband diagnostic measures. To qualify for the Strict class, affected individuals met the criteria for autism on both primary diagnostic instruments: the Autism Diagnostic Interview-Revised (ADI-R13) and Autism Diagnostic Observation Schedule (ADOS14). ADI-R-based diagnostic classification of subjects as ASD followed the criteria published by Risi and co-workers.15 Specifically, individuals who almost met the ADI criteria for autism were classified as ASD if: (1) they met the criteria on social and either communication or repetitive behaviour domains; or (2) met the criteria on the social domain and were within two points of criteria for communication, or met the criteria on the communication domain and were within two points of social criteria, or within one point on both social and communication domains. The Spectrum class included all individuals who met the Strict criteria and those individuals who were classified as ASD or autism on both the ADI-R and ADOS or who were not evaluated on one of the instruments, but were diagnosed with autism on the other instrument. A summary of the sample sizes for the discovery and replication datasets for each diagnostic/ancestral subset is shown in Table 1.

Table 1 Sample size for the discovery and replication samples for each diagnostic/ancestral subset

As described elsewhere,7 ancestry for these individuals was determined for the proband by using 5239 widely spaced, independent SNPs that had a genotype completion rate of ≥99.9%. The software used was Spectral-GEM,16 which estimated five significant dimensions of ancestry. Subsequent clustering on dimensions of ancestry identified nine clusters; five clusters were used to describe European ancestry and the remaining clusters best reflect Asian, African (East/West) and Latin American origins. The All ancestry class included all individuals including those who met the European ancestry criteria.

Genotyping and association analysis (TDT)

The discovery sample was genotyped using the Illumina Infinium 1M-single SNP microarray; the replication sample was genotyped on either the Illumina Infinium 1M-single SNP microarray or the Illumina 1M-duo microarray. All quality control (QC) procedures were maintained across datasets; in addition, QC marker sets from both the discovery and replication datasets were matched and only those markers meeting QC for both datasets were carried forward to analysis. Additional QC details are described elsewhere.7 A total of 856 932 SNPs passed QC in both the discovery and replication samples. TDT statistics were generated using PLINK v.1.07.17
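For orientation, the standard TDT statistic for a biallelic marker18 compares how often heterozygous parents transmit one allele rather than the other, and is referred to a chi-square distribution with one degree of freedom. The sketch below computes it from transmission counts; it is an illustration of the textbook statistic, not the PLINK implementation, and the function name and counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Return (chi-square, P) for the TDT at one biallelic SNP.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not pass the reference allele to an affected child.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt(60, 40)
```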

Pedigree SRT

The pedigree SRT (pedSRT) is a modification to the SRT described by O’Dushlaine and co-workers,12 which is applicable to family-based data. Briefly, the SRT tests the ratio of the number of associated SNPs to the total number of SNPs in a pre-defined set of genes. A marker is considered ‘associated’ if the association statistic is observed below a given threshold. The threshold used is arbitrary, but is set by default at an unadjusted P≤0.05. The significance of the ratio is determined through permutation using an empirical P-value derived from the proportion of the ratios for the permuted datasets that are greater than or equal to the observed ratio.12 We performed 10 000 permuted GWAS analyses for each of the diagnostic, ancestry strata for both the discovery and replication datasets. The pedSRT determines association using the TDT,18 as implemented in PLINK.17 In a case–control model, permutation is performed using case randomisation. In the TDT design, case randomisation is performed by creating a pseudo-sibling. The pseudo-sibling is created from the non-transmitted alleles from the parents. Within each permutation cycle either the proband or pseudo-sibling is considered the ‘case’. Alternate case randomisation for the TDT is implemented in PLINK, using the alternate phenotype routine.
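The two steps of the permutation routine described above can be sketched as follows: a per-family random choice of which offspring (proband or pseudo-sibling) is treated as the case, and an empirical P-value taken as the proportion of permuted ratios that are at least as large as the observed ratio. Function names and the example values are assumptions for illustration; the study itself used PLINK and custom scripts.

```python
import random


def permute_family_labels(n_families, rng):
    """For each family, choose at random whether the proband (True) or the
    pseudo-sibling (False) is treated as the case in this permutation."""
    return [rng.random() < 0.5 for _ in range(n_families)]


def empirical_p(observed_ratio, permuted_ratios):
    """Proportion of permuted SNP ratios >= the observed SNP ratio."""
    if not permuted_ratios:
        raise ValueError("no permuted ratios supplied")
    n_ge = sum(1 for r in permuted_ratios if r >= observed_ratio)
    return n_ge / len(permuted_ratios)


labels = permute_family_labels(5, random.Random(0))

# Hypothetical ratios from five permuted datasets for one GO term.
p_emp = empirical_p(0.30, [0.10, 0.30, 0.25, 0.40, 0.05])
```

With 10 000 permutations, the smallest non-zero empirical P-value obtainable in this way is 0.0001.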

It is important to note that to reduce type-I error in the SRT due to inflation of the original association signal, for each permutation ‘associated’ SNPs are assigned according to their rank in the dataset.12 In short, the numbers of SNPs (T) that meet the ‘associated’ threshold are calculated from the primary dataset. For each permuted dataset, the top T SNPs are termed ‘associated’.
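The rank-based assignment can be illustrated with a small example. The SNP identifiers and P-values below are hypothetical, and ties are broken by input order, which is an assumption of this sketch rather than a documented property of the method.

```python
def count_associated(p_values, threshold=0.05):
    """T: number of SNPs at or below the threshold in the primary dataset."""
    return sum(1 for p in p_values.values() if p <= threshold)


def top_t_snps(permuted_p, t):
    """Identifiers of the t SNPs with the smallest P-values in a permuted dataset."""
    ranked = sorted(permuted_p, key=permuted_p.get)
    return set(ranked[:t])


primary = {"rs1": 0.01, "rs2": 0.20, "rs3": 0.04, "rs4": 0.60}
permuted = {"rs1": 0.50, "rs2": 0.30, "rs3": 0.70, "rs4": 0.10}

t = count_associated(primary)          # two SNPs meet P <= 0.05
associated = top_t_snps(permuted, t)   # top two by rank, whatever their P-values
```

In this toy permuted dataset no SNP reaches P≤0.05, yet two SNPs are still termed ‘associated’, so the number of associated SNPs is held fixed across permutations.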

All SNP ratio statistics were calculated using custom scripts in STATA version 10 (Stata Corp., College Station, TX, USA).

Gene tagging

Individual SNP codes from the Illumina 1M Infinium SNP array platform were updated to reflect build 130 of dbSNP. SNPs were assigned to genes using the dbSNP/NCBI criteria: namely, a SNP is assigned to a gene if it resides within the locus containing the gene transcript, including 2 kb 5′ and 500 bp 3′ of the transcript. The gene assignment protocol was performed using the NCBI criteria and facilitated using the dbSNP file b130_SNPContigLocusId_36_3.bcp.
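The positional criterion can be expressed as a simple window test. In this study the assignment was read from the dbSNP locus file rather than computed; the function below is only a sketch of the stated rule, with hypothetical coordinates and the assumption that transcript start is always the lower genomic coordinate.

```python
def snp_in_gene_window(pos, tx_start, tx_end, strand, upstream=2000, downstream=500):
    """True if a SNP lies within the transcript or its 2 kb 5' / 500 bp 3' flanks.

    tx_start < tx_end are genomic coordinates; strand is "+" or "-".
    """
    if strand == "+":
        low, high = tx_start - upstream, tx_end + downstream
    elif strand == "-":
        # On the reverse strand the 5' end is at the higher coordinate.
        low, high = tx_start - downstream, tx_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return low <= pos <= high


# Hypothetical transcript spanning 10 000-20 000.
in_5prime_flank = snp_in_gene_window(8500, 10000, 20000, "+")
```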

Gene-set selection

Gene sets were described using the GO database. Gene lists were obtained from the OBO format 1.2 database release (build release date 15 December 2009). GO terms are structured in a semihierarchical relationship within the cellular component, molecular function and biological process nodes. Daughter ontology terms are more specialized and parent ontology terms are less specialized. However, unlike a hierarchy, a term may have more than one parent term.

Parent terms were populated by their daughter terms to describe a composite list of genes for each term. SNP ratios were calculated on GO terms with more than 20 but fewer than 2000 SNPs, and more than one gene but no more than 1000 genes. A total of 6853 GO terms met these criteria. To account for identity of terms, we merged those GO terms containing identical gene lists; in total the list of unique terms is 5955.
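The three steps of gene-set preparation (propagating genes from daughter to parent terms, filtering on SNP and gene counts, and merging terms with identical gene lists) can be sketched as below. The data structures, function names and toy terms are assumptions for illustration; the default limits follow the criteria stated above, and the toy example passes smaller SNP limits so that the filter has a visible effect.

```python
def propagate(direct_genes, parents):
    """Add each term's directly annotated genes to all of its ancestor terms."""
    full = {term: set(genes) for term, genes in direct_genes.items()}
    for term, genes in direct_genes.items():
        stack, seen = list(parents.get(term, [])), set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            full.setdefault(ancestor, set()).update(genes)
            stack.extend(parents.get(ancestor, []))
    return full


def filter_terms(term_genes, gene_snps, min_snps=20, max_snps=2000,
                 min_genes=1, max_genes=1000):
    """Keep terms with min_snps < SNPs < max_snps and min_genes < genes <= max_genes."""
    kept = {}
    for term, genes in term_genes.items():
        snps = set().union(*(gene_snps.get(g, set()) for g in genes))
        if min_snps < len(snps) < max_snps and min_genes < len(genes) <= max_genes:
            kept[term] = genes
    return kept


def merge_identical(term_genes):
    """Group terms that share exactly the same gene list."""
    merged = {}
    for term, genes in term_genes.items():
        merged.setdefault(frozenset(genes), []).append(term)
    return merged


# Toy ontology: GO:D is a daughter of GO:B; GO:B and GO:C are daughters of GO:A.
direct = {"GO:A": set(), "GO:B": set(), "GO:C": {"g3", "g4"}, "GO:D": {"g1", "g2"}}
parents = {"GO:B": ["GO:A"], "GO:C": ["GO:A"], "GO:D": ["GO:B"]}
gene_snps = {"g1": {"rs1", "rs2"}, "g2": {"rs3"}, "g3": {"rs4"}, "g4": {"rs5", "rs6"}}

full = propagate(direct, parents)
kept = filter_terms(full, gene_snps, min_snps=2, max_snps=6)
unique_sets = merge_identical(kept)
```

Here GO:A is dropped because it reaches the toy upper SNP limit, and GO:B and GO:D collapse into a single unique gene set.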

Simulation of GO terms

As mentioned above, the GO terms used in this study can show considerable overlap owing to term redundancy, biological overlap and the hierarchical nature of the database. Simulations were performed to calculate the null distribution and subsequent expectancy for the total number of associated GO terms at a given threshold in a single study given the GO terms used.

We performed 1000 pedSRT permutations on a case-randomized sample derived from 1248 families from the discovery dataset. A GWAS TDT was performed on each dataset followed by pedSRT, using 10 000 additional permutations on the 5995 GO terms. For each of the 1000 original permutations, the proportion of the 5955 GO terms that met a significance threshold of P≤0.05 in the subsequent 10 000 was calculated. The mean proportion across the 1000 permutations was used to predict the expected number of associated GO terms in a dataset.

Pathway enrichment map generation

Visual representation of overlap in enriched GO terms was generated using the EnrichmentMap plugin for Cytoscape 2.8.0. Consistent with the authors' recommendations for use with the GO database, nodes were joined if the overlap coefficient was ≥0.5.
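The overlap coefficient between two gene sets is the size of their intersection divided by the size of the smaller set. A minimal sketch follows; the function name and gene labels are hypothetical, and the calculation is shown independently of the EnrichmentMap software.

```python
def overlap_coefficient(set_a, set_b):
    """|A intersect B| / min(|A|, |B|); defined as 0.0 if either set is empty."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Two hypothetical terms of 10 and 9 genes that differ by a single gene.
term_1 = {f"gene{i}" for i in range(1, 11)}
term_2 = {f"gene{i}" for i in range(1, 10)}
coefficient = overlap_coefficient(term_1, term_2)
joined = coefficient >= 0.5
```

Because the denominator is the smaller set, a term that is wholly contained in a larger parent term always scores 1.0 and is joined to it.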


Across all analyses in the discovery dataset, 1035 unique GO terms show association enrichment at SRT-P-value≤0.05. Examination of those GO terms that show strong enrichment (SRT-P-value<0.001) highlights diverse processes such as the regulation of cell division (mitosis and meiosis), ribosome processing and apoptosis. A visual representation of enriched pathways is shown in Supplementary Figure 1. A summary of the total number of GO terms that show enrichment at SRT-P-value≤0.05 is given in Table 2. On the basis of simulated data, 4.46% (SD=0.8%) of the 5995 unique but non-independent pathways are expected to be associated at the SRT-P≤0.05 level. Given this level, we would expect 267 GO terms to be associated per experiment. To provide a greater distinction of potentially important GO terms, we examined the overlap of enriched GO terms in an independent replication dataset. On the basis of 4.46% of GO terms showing enrichment, we would expect to observe replication for 12 of the 5995 pathways. All individual discovery samples show more GO terms associated than would be expected by chance (see Expected 1; Table 2). Moreover, the overlap between the discovery and replication samples also shows enrichment over what would be expected by chance (see Expected replication 2; Table 2). When we use a more cautious interpretation, based on the total number of observed associated GO terms in the discovery data and the predicted replication rate of 4.46%, we would expect to replicate between 15 and 17 pathways (see Expected 3; Table 2). Under this model, we still show enriched replication for each ‘DiagnosisAncestry’ grouping. Overall, compared with simulated data, we observe between 1.5- and 3.2-fold enrichment in the overlap of pathways in the discovery and replication datasets above what can be expected by chance.
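The expected counts quoted above follow from simple arithmetic, sketched here using the simulated rate and the term count given in this paragraph. Treating discovery and replication as independent draws at the same rate is our reading of how the expected replication count of 12 arises; the discovery count of 350 in the final line is hypothetical and is not taken from Table 2.

```python
RATE = 0.0446    # simulated proportion of GO terms reaching SRT-P <= 0.05
N_TERMS = 5995   # number of GO terms quoted alongside that estimate

# Expected number of enriched terms in a single experiment.
expected_single = RATE * N_TERMS

# Expected number enriched in both datasets, assuming independence.
expected_both = RATE ** 2 * N_TERMS


def expected_replication(n_discovery_hits, rate=RATE):
    """More cautious expectation: observed discovery hits times the null rate."""
    return n_discovery_hits * rate


# Hypothetical stratum with 350 enriched terms in the discovery data.
example = expected_replication(350)
```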

Table 2 Summary of enriched GO terms and overlap in the discovery and replication sample
Table 3 GO terms showing replicated enrichment in two or more analytic groupings

A summary of the replicated pathways, summary statistics, gene number and genes tagged in this analysis is shown in Tables 4, 5, 6, 7 (full lists of replicated pathways can be found in Supplementary Tables 1A–C). A total of 88 unique GO terms were shown to be replicated within analytic groupings (see Supplementary Table 2), 22 GO terms were replicated within two of the analytic groupings and four GO terms were replicated within three of the analytic groupings (see Table 3). Replication was only considered within strata, such that, for example, GO terms identified in the discovery StrictEuropean analyses were examined in the StrictEuropean replication dataset. The four GO terms that show enrichment across three groupings are as follows: GO:0006090, GO:0032872, GO:0032874 and GO:0042156, involved in pyruvate metabolism, regulation of the mitogen-activated protein kinase (MAPK) cascade and zinc-mediated transcriptional activation. A visual representation of replicated enriched pathways is shown in Supplementary Figure 2.

Table 4 Top 10 association enrichments of pedSRT for overlapping GO terms for analyses of families of all ancestries with a proband with a Spectrum diagnosis
Table 5 Top 10 association enrichments of pedSRT for overlapping GO terms for analyses of families of European ancestries with a proband with a Spectrum diagnosis
Table 6 Top 10 association enrichments of pedSRT for overlapping GO terms for analyses of families of all ancestries with a proband with a Strict diagnosis
Table 7 Top 10 association enrichments of pedSRT for overlapping GO terms for analyses of families of European ancestries with a proband with a Strict diagnosis


The interpretation of GWAS data purely on the strength of association data is challenging where the distribution of association is close to or barely exceeding what is expected by the number of tests. In the absence of clear association enrichment across the entire dataset, interpretation has relied on rank order or on the application of suboptimal significance thresholds that juggle type-I and type-II error. The principle of association enrichment approaches is to discover whether within this milieu of data there are underlying patterns to the association. In these approaches, we ask whether SNPs that are linked to genes of common function show a greater proportion of nominal association than expected by chance. Although a modest association signal at an individual SNP within a gene may not warrant further investigation, the cumulative association of SNPs within a gene family may offer insight into the biology of the disorder.

Gene enrichment approaches have been primarily developed to aid interpretation of data from microarray expression studies. In this context, each gene is tagged by either one or a small number of probes regardless of gene size. However, when applying these technologies to SNP-based data, we do not measure gene-wise variation or gene-wise association; instead, we can potentially examine multiple points of association at any given gene using many tagging SNPs. This brings additional challenges and bias. When applying association enrichment, we must account and correct for these potential biases in the data. Firstly, when examining larger genes, we utilize more SNP markers to tag the variation than for smaller genes. If we choose a maximum association signal approach per gene, we observe by chance an inflated signal for the larger genes. By calculating the ratio of associated to not associated SNPs, we can adjust each GO term to the total number of SNPs examined per GO term. Secondly, where multiple markers tag a gene, one might observe multiple strong association signals due to strong linkage disequilibrium between the associated markers. To reduce this effect, we calculate significance of the data through permutation. Permutation is performed by case randomisation within families where a pseudo-control sibling is created from the alleles that are not transmitted to the proband. By using the non-transmitted alleles, we retain the linkage disequilibrium structure across the genome, so that the permuted datasets carry the same linkage-disequilibrium-related inflation as the original association signal.

We have applied the SRT to family-based data from the AGP to identify 88 gene sets from the GO database that show a replicated enrichment for association signal. Of the overlapping GO terms, we observe enrichment in sets involved in diverse biological processes, including pyruvate metabolism, transcription factor activation, cell-signalling and cell-cycle regulation.

One of the strongest findings across the discovery and replication analyses was observed in the ‘Strict diagnosisAll ancestries’ grouping for the GO term GO:0031146, SCF-dependent proteasomal ubiquitin-dependent protein catabolic process (Discovery SRT-P=0.0001; Replication SRT-P=0.0009). GO:0031146 is described by only two genes (FBXO31 and FBXO6). Both genes are members of the F-box protein family, which are involved in a variety of molecular and cellular functions, including protein degradation, synapse formation and circadian rhythm.22 FBXO6 has also been suggested as a putative biomarker for autism,23 as one of 13 genes highlighted in the work of Nishimura and Brown24 who show differential expression at this gene in the lymphoblastoid cell lines from individuals with both the FMR1 mutation and autism compared with typically developing controls.

Those GO terms that show replication across multiple diagnostic and ancestral groups are also worth noting as they are robust to differences in sampling used in our analyses. Four replicated GO terms were observed in three analytic groupings (see Table 3). These include GO:0006090, GO:0032872, GO:0032874 and GO:0042156. GO:0006090 (pyruvate metabolic process) describes a group of 39 tagged genes (see Supplementary Table 3) covered by 589 SNPs. These genes are involved in the biological processes connecting the chemical reactions and pathways involving pyruvate. Pyruvate metabolism is a component of the energy metabolism pathway, which has received considerable attention with respect to autism. The biological plausibility of the pyruvate metabolic process association enrichment is supported by numerous studies showing evidence of aberration in pyruvate levels in individuals with autism.25 The GO term GO:0042156 (zinc-mediated transcriptional activator activity) describes a group of three genes tagged by 37 SNPs (MTF1, RNF4 and ZNF384). One of the constituent genes, MTF1, human metal-regulatory transcription-factor-1, has previously warranted investigation as a putative candidate gene for autistic disorder under an environmental exposure model of autism.26 Finally, GO:0032872 (regulation of stress-activated MAPK cascade) and GO:0032874 (positive regulation of stress-activated MAPK cascade), which differ by a single gene (see Supplementary Table 3), describe 10 and 9 genes, and 122 and 116 SNPs, respectively. These pathways are involved in increasing the signalling of the stress-related MAPK signalling pathway. Stress-activated MAPKs are thought to have a critical role in modulating inflammation, DNA damage response, apoptosis in cancer27 and negative regulation of cell-cycle progression.28, 29 Cell-cycle progression and DNA damage response are also highlighted in enriched replicated GO terms in these analyses, for example, GO:0032404 (mismatch repair complex binding) and GO:0031571 (G1/S DNA damage checkpoint).

In a recent study by this group, we explored enrichment in GO terms for rare deleted CNVs.30 Using individuals from the Discovery Group, we identified 24 enriched GO terms that show enrichment in rare CNV at FDR q<0.05 that highlighted five biological domains: namely cell proliferation, cell projection and motility, MHC-I, GTPase/RAS signalling and kinase activation/regulation. We do not observe any overlap between the 88 gene sets showing replicated enrichment in the GWAS data with the 24 significant GO terms identified for rare structural variation. However, we do observe some overlap for GO terms enriched only in the discovery dataset. These include overlap in ‘cell migration’, ‘cell motility’, ‘cell morphogenesis’ and GO terms identified as having a role in protein kinase regulation.

We can take some encouragement that highlighted pathways are supported in the autism literature. We have emphasized biological plausibility of some of these pathways with autism and ASD. However, one major caveat when interpreting these data is whether this overlapping evidence reflects the considerable literature surrounding autism research and is therefore coincident, or is biologically meaningful concordance.

Pathway approaches, such as the SRT and pedSRT, can also be applied to research questions using candidate gene lists. Candidate gene approaches rely on the selection of genes and markers based on previous knowledge of biology, function and position of the gene or marker. The pathway approach in the form used in this manuscript applies a ‘hypothesis-free’ design, in which we examine all GO terms regardless of putative role. In a recent autism GWAS described by Wang et al,4 the authors applied a hypothesis-testing candidate gene approach using their own methodology11 to examine whether a group of cadherin and neurexin genes showed enrichment in their association data. The authors conclude that there was association enrichment for both a group of cadherin genes and a group of cadherin plus neurexin genes (P=0.02 and 0.004, respectively). We applied our approach to these gene lists in our data (data not shown). Using the pedSRT, which differs in statistical method and gene-to-SNP assignment from that of Wang and co-workers, we do not observe significant association signal enrichment in either the discovery or replication dataset for any of the analytic groupings.

To further explore potential overlap of our data and other GWAS, we examined whether previously implicated genes from recent autism GWAS were present in the GO terms identified in this study. None of the genes that overlap with the top-associated SNPs from previous GWAS described by Wang and co-workers4 (CDH22, CTNNA3, DMD, FEZF2, LOC100132914, LRRC1 and SYT17) and Weiss and co-workers6 (ACTN2, ADA, CENPC1, CRIM1, CTNNA3, CUGBP2, GAS2, IQGAP2, JARID2, SGCD and XG) appeared in the 88 unique GO terms showing overlap in these analyses. Moreover, we do not observe overlap with those genes highlighted by the authors as residing close to their maximal association peaks, namely SEMA5A, TAS2R1 and CDH9, CDH10.

The GO database is continuously updated as evidence is gathered on gene biology. The build of the database used in these analyses contains information on 17 703 genes, compared with less than 5000 for databases such as KEGG. However, not all genes are tagged to GO terms. This is exemplified by the MACROD2 gene, which contained SNPs showing the strongest association signal from our previous GWAS analyses.7 Over time more information will be gathered on the biological role and interactions between these genes to further annotate these terms.

In addition to single gene effects such as MACROD2, data presented in this analysis may offer some additional insight into biological processes, within which genetic risk for autism may lie. This can include hypothesis-free gene lists such as those in the GO dataset, or more hypothesis-driven candidate gene lists highlighting previous linkage, association or biology. The application of pedSRT to our GWAS data has highlighted biological processes previously implicated in autism and offers impetus to re-examine these processes based on evidence from genome-wide investigation. Association enrichment analysis provides additional evidence from GWAS data to identify genetic risk variants and genes and prioritize biological processes for further research into areas such as biomarker discovery, gene–gene interaction analyses and identification of putative drug targets.


  1. 1

    Fombonne E : Epidemiology of pervasive developmental disorders. Pediatr Res 2009; 65: 591–598.

  2. 2

    Fernell E, Gillberg C : Autism spectrum disorder diagnoses in Stockholm preschoolers. Res Dev Disabil 2010; 31: 680–685.

  3. 3

    Bailey A, Le Couteur A, Gottesman I et al: Autism as a strongly genetic disorder: evidence from a British twin study. Psychol Med 1995; 25: 63–77.

  4. 4

    Wang K, Zhang H, Ma D et al: Common genetic variants on 5p14.1 associate with autism spectrum disorders. Nature 2009; 459: 528–533.

  5. 5

    Ma D, Salyakina D, Jaworski JM et al: A genome-wide association study of autism reveals a common novel risk locus at 5p14.1. Ann Hum Genet 2009; 73: 263–273.

  6. 6

    Weiss LA, Arking DE, Daly MJ et al: A genome-wide linkage and association scan reveals novel loci for autism. Nature 2009; 461: 802–808.

  7. 7

    Anney R, Klei L, Pinto D et al: A genome-wide scan for common alleles affecting risk for autism. Hum Mol Genet 2010; 19: 4072–4082.

  8. 8

    Purcell SM, Wray NR, Stone JL et al: Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 2009; 460: 748–752.

  9. 9

    Holmans P, Green EK, Pahwa JS et al: Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am J Hum Genet 2009; 85: 13–24.

  10. 10

    Wang K, Li M, Hakonarson H : Analysing biological pathways in genome-wide association studies. Nat Rev Genet 2010; 11: 843–854.

  11. 11

    Wang K, Li M, Bucan M : Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet 2007; 81: 1278–1283.

  12. 12

    O’Dushlaine C, Kenny E, Heron EA et al: The SNP ratio test: pathway analysis of genome-wide association datasets. Bioinformatics 2009; 25: 2762–2763.

  13. 13

    Lord C, Rutter M, Le Couteur A : Autism Diagnostic Interview-Revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J Autism Dev Disord 1994; 24: 659–685.

  14. 14

    Lord C, Rutter M, Goode S et al: Autism diagnostic observation schedule: a standardized observation of communicative and social behavior. J Autism Dev Disord 1989; 19: 185–212.

  15. 15

    Risi S, Lord C, Gotham K et al: Combining information from multiple sources in the diagnosis of autism spectrum disorders. J Am Acad Child Adolesc Psychiatry 2006; 45: 1094–1103.

  16. 16

    Lee AB, Luca D, Klei L et al: Discovering genetic ancestry using spectral graph theory. Genet Epidemiol 2010; 34: 51–59.

  17. 17

    Purcell S, Neale B, Todd-Brown K et al: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559–575.

  18. 18

    Spielman RS, McGinnis RE, Ewens WJ : Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993; 52: 506–516.

  19. 19

    Ashburner M, Ball CA, Blake JA et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000; 25: 25–29.

  20. 20

    Merico D, Isserlin R, Stueker O et al: Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One 2010; 5: e13984.

  21. 21

    Shannon P, Markiel A, Ozier O et al: Cytoscape. A software environment for integrated models of biomolecular interaction networks. Genome Res 2003; 13: 2498–2504.

  22. 22

    Ho MS, Ou C, Chan YR et al: The utility F-box for protein destruction. Cell Mol Life Sci 2008; 65: 1977–2000.

  23.

    Nishimura Y, Martin CL, Vazquez-Lopez A et al: Genome-wide expression profiling of lymphoblastoid cell lines distinguishes different forms of autism and reveals shared pathways. Hum Mol Genet 2007; 16: 1682–1698.

  24.

    Brown V, Jin P, Ceman S et al: Microarray identification of FMRP-associated brain mRNAs and altered mRNA translational profiles in fragile X syndrome. Cell 2001; 107: 477–487.

  25.

    Haas RH : Autism and mitochondrial disease. Dev Disabil Res Rev 2010; 16: 144–153.

  26.

    Serajee FJ, Nabi R, Zhong H et al: Polymorphisms in xenobiotic metabolism genes and autism. J Child Neurol 2004; 19: 413–417.

  27.

    Dhillon AS, Hagan S, Rath O et al: MAP kinase signalling pathways in cancer. Oncogene 2007; 26: 3279–3290.

  28.

    Bulavin DV, Fornace Jr AJ : p38 MAP kinase's emerging role as a tumor suppressor. Adv Cancer Res 2004; 92: 95–118.

  29.

    Bradham C, McClay DR : p38 MAPK in development and cancer. Cell Cycle 2006; 5: 824–828.

  30.

    Pinto D, Pagnamenta AT, Klei L et al: Functional impact of global rare copy number variation in autism spectrum disorders. Nature 2010; 466: 368–372.

Acknowledgements


We gratefully acknowledge the families participating in the study and the main funders of the AGP: Autism Speaks (USA), the Health Research Board (HRB, Ireland; AUT/2006/1, AUT/2006/2, PD/2006/48), The Medical Research Council (MRC, UK), Genome Canada/Ontario Genomics Institute and the Hilibrand Foundation (USA). Additional support for individual groups was provided by the US National Institutes of Health (NIH Grants: HD055751, HD055782, HD055784, MH52708, MH55284, MH061009, MH06359, MH066673, MH080647, MH081754, MH66766, NS026630, NS042165, NS049261), the Canadian Institutes for Health Research (CIHR), Assistance Publique – Hôpitaux de Paris (France), Autism Speaks UK, Canada Foundation for Innovation/Ontario Innovation Trust, Deutsche Forschungsgemeinschaft (Grant: Po 255/17-4) (Germany), EC Sixth FP AUTISM MOLGEN, Fundação Calouste Gulbenkian (Portugal), Fondation de France, Fondation FondaMental (France), Fondation Orange (France), Fondation pour la Recherche Médicale (France), Fundação para a Ciência e Tecnologia (Portugal), the Hospital for Sick Children Foundation and University of Toronto (Canada), INSERM (France), Institut Pasteur (France), the Italian Ministry of Health (convention 181 of 19 October 2001), the John P Hussman Foundation (USA), McLaughlin Centre (Canada), Ontario Ministry of Research and Innovation (Canada), the Seaver Foundation (USA), the Swedish Science Council, The Centre for Applied Genomics (Canada), the Utah Autism Foundation (USA) and the Wellcome Trust core award 075491/Z/04 (UK). DP is supported by fellowships from the Royal Netherlands Academy of Arts and Sciences (TMF/DA/5801) and the Netherlands Organization for Scientific Research (Rubicon 825.06.031). SWS holds the GlaxoSmithKline-CIHR Pathfinder Chair in Genetics and Genomics at the University of Toronto and the Hospital for Sick Children (Canada).

Author contributions

EMK and COD developed the principle of the SRT experiments. RJLA designed the study, developed and implemented the pedSRT methodology and wrote the manuscript. EMK, COD, BLY, MG and LG aided in manuscript preparation. RJLA, EMK, COD, BY, EP, JDB and JSS discussed research strategies and data through the ‘pathway-based analysis working group’. Additional intellectual support and guidance was provided through the AGP including BD, ADP, EHC, PS, JTG, CK, KW, HH and EM.

Author information

Correspondence to Richard J L Anney.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

The AGP Members

§Member of Senior Investigator Committee for the AGP

Anthony J Bailey§1, Bridget A Fernandez2, Peter Szatmari§3, Stephen W Scherer§4,5, Andrew Patterson§4, Christian R Marshall4, Dalila Pinto4, John B Vincent6, Eric Fombonne7, Catalina Betancur§8, Richard Delorme9, Marion Leboyer10, Thomas Bourgeron11, Carine Mantoulan12, Bernadette Roge12, Maïté Tauber12, Christine M Freitag§13, Fritz Poustka13, Eftichia Duketis13, Sabine M Klauck§14, Annemarie Poustka14, Katerina Papanikolaou15, John Tsiantis15, Louise Gallagher§16, Michael Gill§16, Richard Anney16, Nadia Bolshakova16, Sean Brennan16, Gillian Hughes16, Jane McGrath16, Alison Merikangas16, Sean Ennis§17, Andrew Green17, Jillian P Casey17, Judith M Conroy17, Regina Regan17, Naisha Shah17, Elena Maestrini§18, Elena Bacchelli18, Fiorella Minopoli18, Vera Stoppioni19, Agatino Battaglia§20, Roberta Igliozzi20, Barbara Parrini20, Raffaella Tancredi20, Guiomar Oliveira§21, Joana Almeida21, Frederico Duque21, Astrid Vicente§22,23,24, Catarina Correia22,23,24, Tiago R Magalhaes23, Christopher Gillberg25, Gudrun Nygren25, Maretha de Jonge26, Herman Van Engeland26, Jacob AS Vorstman26, Kerstin Wittemeyer27, Gillian Baird28, Patrick F Bolton29, Michael L Rutter30, Jonathan Green31, Janine A Lamb32, Andrew Pickles33, Jeremy R Parr34, Ann Le Couteur34, Tom Berney34, Helen McConachie34, Simon Wallace35, Marc Coutanche35, Suzanne Foley35, Kathy White35, Anthony P Monaco§36, Richard Holt36, Penny Farrar36, Alistair T Pagnamenta36, Ghazala K Mirza36, Jiannis Ragoussis36, Inês Sousa36, Nuala Sykes36, Kirsty Wing36, Joachim Hallmayer§37, Rita M Cantor§38, Stanley F Nelson38, Daniel H Geschwind§39, Brett S Abrahams39, Fred Volkmar40, Margaret A Pericak-Vance§41, Michael L Cuccaro41, John Gilbert41, Edwin H Cook§42, Stephen J Guter42, Suma Jacob42, John I Nurnberger Jr§43, Christopher J McDougle43, David J Posey43, Catherine Lord44, Christina Corsello44, Vanessa Hus44, Joseph D Buxbaum§45,46, Alexander Kolevzon46, Latha Soorya46, Elena Parkhomenko46, Bennett L Leventhal47, Geraldine Dawson48, Veronica J Vieland§49, Hakon Hakonarson§50,51, Joseph T Glessner51, Cecilia Kim, Kai Wang51, Gerard D Schellenberg§52, Bernie Devlin§53, Lamburtus Klei53, Nancy Minshew54, James S Sutcliffe§55, Jonathan L Haines§55, Sabata C Lund55, Susanne Thomson55, Brian L Yaspan55, Hilary Coon§56, Judith Miller56, William M McMahon56, Jeff Munson57, Annette Estes58, Ellen M Wijsman§59

1Department of Psychiatry, University of British Columbia, British Columbia, Canada; 2Disciplines of Genetics and Medicine, Memorial University of Newfoundland, St John's, Canada; 3Department of Psychiatry and Behavioural Neurosciences, McMaster University, Hamilton, Canada; 4The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Canada; 5Department of Molecular Genetics, University of Toronto, Toronto, Canada; 6Centre for Addiction and Mental Health, Clarke Institute and Department of Psychiatry, University of Toronto, Toronto, Canada; 7Division of Psychiatry, McGill University, Montreal, Canada; 8INSERM U952 and CNRS UMR 7224 and UPMC Univ Paris 06, Paris, France; 9INSERM U955, Fondation FondaMental, APHP, Hôpital Robert Debré, Child and Adolescent Psychiatry, Paris, France; 10INSERM U995, Department of Psychiatry, Groupe Hospitalier Henri Mondor-Albert Chenevier, AP-HP; University Paris 12, Fondation FondaMental, Créteil, France; 11Human Genetics and Cognitive Functions, Institut Pasteur; University Paris Diderot-Paris 7, CNRS URA 2182, Fondation FondaMental, Paris, France; 12Octogone/CERPP (Centre d’Etudes et de Recherches en Psychopathologie), University de Toulouse Le Mirail, Toulouse, France; 13Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, JW Goethe University Frankfurt, Frankfurt, Germany; 14Division of Molecular Genome Analysis, German Cancer Research Center (DKFZ), Heidelberg, Germany; 15University Department of Child Psychiatry, Athens University, Medical School, Agia Sophia Children's Hospital, Athens, Greece; 16Autism Genetics Group, Department of Psychiatry, School of Medicine, Trinity College, Dublin, Ireland; 17School of Medicine and Medical Science, University College, Dublin, Ireland; 18Department of Biology, University of Bologna, Bologna, Italy; 19Neuropsichiatria Infantile, Ospedale Santa Croce, Fano, Italy; 20Stella Maris Institute for Child and Adolescent Neuropsychiatry, Calambrone (Pisa), Italy; 21Hospital Pediátrico de Coimbra, Coimbra, Portugal; 22Instituto Nacional de Saude Dr Ricardo Jorge, Lisbon, Portugal; 23BioFIG – Center for Biodiversity, Functional and Integrative Genomics, Lisboa, Portugal; 24Instituto Gulbenkian de Ciência, Oeiras, Portugal; 25Gillberg Neuropsychiatry Centre, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; 26Department of Child and Adolescent Psychiatry, University Medical Center, Utrecht, The Netherlands; 27Autism Centre for Education and Research, School of Education, University of Birmingham, Birmingham, UK; 28Newcomen Centre, Guy's Hospital, London, UK; 29Department of Child and Adolescent Psychiatry, Institute of Psychiatry, King's College London, London, UK; 30Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, London, UK; 31Academic Department of Child Psychiatry, Booth Hall of Children's Hospital, Blackley, Manchester, UK; 32Centre for Integrated Genomic Medical Research, University of Manchester, Manchester, UK; 33Department of Medicine, School of Epidemiology and Health Science, University of Manchester, Manchester, UK; 34Institute of Neuroscience, and Institute of Health and Society, Newcastle University, Newcastle Upon Tyne, UK; 35Department of Psychiatry, University of Oxford, Warneford Hospital, Headington, UK; 36Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; 37Department of Psychiatry, Division of Child and Adolescent Psychiatry and Child Development, Stanford University School of Medicine, Stanford, USA; 38Department of Human Genetics, University of California – Los Angeles School of Medicine, Los Angeles, USA; 39Program in Neurogenetics, Department of Neurology and Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine at UCLA, Los Angeles, USA; 40Child Study Centre, Yale University, New Haven, USA; 41The John P Hussman Institute for Human Genomics, University of Miami School of Medicine, Miami, USA; 42Department of Psychiatry, Institute for Juvenile Research, University of Illinois at Chicago, Chicago, USA; 43Department of Psychiatry, Indiana University School of Medicine, Indianapolis, USA; 44Autism and Communicative Disorders Centre, University of Michigan, Ann Arbor, USA; 45Department of Genetics and Genomic Sciences and Neuroscience, Mount Sinai School of Medicine, New York, USA; 46The Seaver Autism Center for Research and Treatment and Department of Psychiatry, Mount Sinai School of Medicine, New York, USA; 47Nathan Kline Institute for Psychiatric Research (NKI), Orangeburg; Department of Child and Adolescent Psychiatry, New York University and NYU Child Study Center, New York, USA; 48Autism Speaks, New York; Department of Psychiatry, University of North Carolina, Chapel Hill, USA; 49Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital and The Ohio State University, Columbus, USA; 50Department of Pediatrics, Children's Hospital of Philadelphia, University of Pennsylvania School of Medicine, Philadelphia, USA; 51Division of Human Genetics, The Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, USA; 52Pathology and Laboratory Medicine, University of Pennsylvania, USA; 53Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, USA; 54Departments of Psychiatry and Neurology, University of Pittsburgh School of Medicine, Pittsburgh, USA; 55Department of Molecular Physiology and Biophysics, and Center for Human Genetics Research, Vanderbilt University School of Medicine, Nashville, USA; 56Psychiatry Department, University of Utah Medical School, Salt Lake City, USA; 57Department of Psychiatry and Behavioural Sciences, University of Washington, Seattle, USA; 58Department of Speech and Hearing Sciences, University of Washington, Seattle, USA; 59Departments of Biostatistics and Medicine, University of Washington, Seattle, USA.

Supplementary Information accompanies the paper on the European Journal of Human Genetics website

Keywords

  • autism
  • genome-wide association analysis
  • pathway analysis
  • family-based association test
  • gene ontology

Further reading