Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder with a strong genetic basis. Yet, only a small fraction of potentially causal genes—about 65 genes out of an estimated several hundred—are known with strong genetic evidence from sequencing studies. We developed a complementary machine-learning approach based on a human brain-specific gene network to present a genome-wide prediction of autism risk genes, including hundreds of candidates for which there is minimal or no prior genetic evidence. Our approach was validated in a large independent case–control sequencing study. Leveraging these genome-wide predictions and the brain-specific network, we demonstrated that the large set of ASD genes converges on a smaller number of key pathways and developmental stages of the brain. Finally, we identified likely pathogenic genes within frequent autism-associated copy-number variants and proposed genes and pathways that are likely mediators of ASD across multiple copy-number variants. All predictions and functional insights are available at http://asd.princeton.edu.
Gene Expression Omnibus
Winter, E.E., Goodstadt, L. & Ponting, C.P. Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res. 14, 54–61 (2004).
Goh, K.-I. et al. The human disease network. Proc. Natl. Acad. Sci. USA 104, 8685–8690 (2007).
Sanders, S.J. et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237–241 (2012).
O'Roak, B.J. et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 246–250 (2012).
Neale, B.M. et al. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485, 242–245 (2012).
Iossifov, I. et al. De novo gene disruptions in children on the autistic spectrum. Neuron 74, 285–299 (2012).
Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014).
Sanders, S.J. et al. Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron 87, 1215–1233 (2015).
Ronemus, M., Iossifov, I., Levy, D. & Wigler, M. The role of de novo mutations in the genetics of autism spectrum disorders. Nat. Rev. Genet. 15, 133–141 (2014).
He, X. et al. Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. PLoS Genet. 9, e1003671 (2013).
Gilman, S.R. et al. Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses. Neuron 70, 898–907 (2011).
Lee, T.-L.L., Raygada, M.J. & Rennert, O.M. Integrative gene network analysis provides novel regulatory relationships, genetic contributions and susceptible targets in autism spectrum disorders. Gene 496, 88–96 (2012).
Kou, Y., Betancur, C., Xu, H., Buxbaum, J.D. & Ma'ayan, A. Network- and attribute-based classifiers can prioritize genes and pathways for autism spectrum disorders and intellectual disability. Am. J. Med. Genet. C. Semin. Med. Genet. 160C, 130–142 (2012).
Ben-David, E. & Shifman, S. Combined analysis of exome sequencing points toward a major role for transcription regulation during brain development in autism. Mol. Psychiatry 18, 1054–1056 (2013).
Parikshak, N.N. et al. Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell 155, 1008–1021 (2013).
Li, J. et al. Integrated systems analysis reveals a molecular network underlying autism spectrum disorders. Mol. Syst. Biol. 10, 774 (2014).
Chang, J., Gilman, S.R., Chiang, A.H., Sanders, S.J. & Vitkup, D. Genotype to phenotype relationships in autism spectrum disorders. Nat. Neurosci. 18, 191–198 (2015).
Hormozdiari, F., Penn, O., Borenstein, E. & Eichler, E.E. The discovery of integrated gene networks for autism and related disorders. Genome Res. 25, 142–154 (2015).
Liu, L., Lei, J. & Roeder, K. Network assisted analysis to reveal the genetic basis of autism. Ann. Appl. Stat. 9, 1571–1600 (2015).
Greene, C.S. et al. Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet. 47, 569–576 (2015).
Darnell, J.C. et al. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146, 247–261 (2011).
King, I.F. et al. Topoisomerases facilitate transcription of long genes linked to autism. Nature 501, 58–62 (2013).
Cotney, J. et al. The autism-associated chromatin modifier CHD8 regulates other autism risk genes during human neurodevelopment. Nat. Commun. 6, 6404 (2015).
Pinto, D. et al. Convergence of genes and cellular pathways dysregulated in autism spectrum disorders. Am. J. Hum. Genet. 94, 677–694 (2014).
Bayés, A. et al. Characterization of the proteome, diseases and evolution of the human postsynaptic density. Nat. Neurosci. 14, 19–21 (2011).
Corominas, R. et al. Protein interaction network of alternatively spliced isoforms from brain links genetic risk factors for autism. Nat. Commun. 5, 3650 (2014).
Iossifov, I. et al. Low load for disruptive mutations in autism genes and their biased transmission. Proc. Natl. Acad. Sci. USA 112, E5600–E5607 (2015).
Kang, H.J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
Willsey, A.J. et al. Coexpression networks implicate human midfetal deep cortical projection neurons in the pathogenesis of autism. Cell 155, 997–1007 (2013).
Uddin, M. et al. Brain-expressed exons under purifying selection are enriched for de novo mutations in autism spectrum disorder. Nat. Genet. 46, 742–747 (2014).
Stoner, R. et al. Patches of disorganization in the neocortex of children with autism. N. Engl. J. Med. 370, 1209–1219 (2014).
Haar, S., Berman, S., Behrmann, M. & Dinstein, I. Anatomical abnormalities in autism? Cereb. Cortex 26, 1440–1452 (2016).
Dinstein, I., Heeger, D.J. & Behrmann, M. Neural variability: friend or foe? Trends Cogn. Sci. 19, 322–328 (2015).
Wang, S.S.-H., Kloth, A.D. & Badura, A. The cerebellum, sensitive periods, and autism. Neuron 83, 518–532 (2014).
Peça, J. et al. Shank3 mutant mice display autistic-like behaviours and striatal dysfunction. Nature 472, 437–442 (2011).
Di Martino, A. et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol. Psychiatry 19, 659–667 (2014).
Goldberg, D.S. & Roth, F.P. Assessing experimentally derived interactions in a small world. Proc. Natl. Acad. Sci. USA 100, 4372–4376 (2003).
Voineagu, I. et al. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474, 380–384 (2011).
Masi, A. et al. Cytokine aberrations in autism spectrum disorder: a systematic review and meta-analysis. Mol. Psychiatry 20, 440–446 (2015).
Bresnahan, M. et al. Association of maternal report of infant and toddler gastrointestinal symptoms with autism: evidence from a prospective birth cohort. JAMA Psychiatry 72, 466–474 (2015).
Hazen, E.P., Stornelli, J.L., O'Rourke, J.A., Koesterer, K. & McDougle, C.J. Sensory symptoms in autism spectrum disorders. Harv. Rev. Psychiatry 22, 112–124 (2014).
Cohen, S., Conduit, R., Lockley, S.W., Rajaratnam, S.M. & Cornish, K.M. The relationship between sleep and behavior in autism spectrum disorder (ASD): a review. J. Neurodev. Disord. 6, 44 (2014).
Takahashi, T. et al. Rosbin: a novel homeobox-like protein gene expressed exclusively in round spermatids. Biol. Reprod. 70, 1485–1492 (2004).
Weiss, L.A. et al. Association between microdeletion and microduplication at 16p11.2 and autism. N. Engl. J. Med. 358, 667–675 (2008).
Lin, G.N. et al. Spatiotemporal 16p11.2 protein network implicates cortical late mid-fetal brain development and KCTD13-Cul3-RhoA pathway in psychiatric diseases. Neuron 85, 742–754 (2015).
Martin-Granados, C., Philp, A., Oxenham, S.K., Prescott, A.R. & Cohen, P.T.W. Depletion of protein phosphatase 4 in human cells reveals essential roles in centrosome maturation, cell migration and the regulation of Rho GTPases. Int. J. Biochem. Cell Biol. 40, 2315–2332 (2008).
Vanunu, O., Magger, O., Ruppin, E., Shlomi, T. & Sharan, R. Associating genes and protein complexes with disease via network propagation. PLoS Comput. Biol. 6, e1000641 (2010).
Hus, V., Gotham, K. & Lord, C. Standardizing ADOS domain scores: separating severity of social affect and restricted and repetitive behaviors. J. Autism Dev. Disord. 44, 2400–2412 (2014).
Moreno-De-Luca, D. et al. Using large clinical data sets to infer pathogenicity for rare copy number variants in autism cohorts. Mol. Psychiatry 18, 1090–1095 (2013).
Abrahams, B.S. et al. SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol. Autism 4, 36 (2013).
Hamosh, A., Scott, A.F., Amberger, J.S., Bocchini, C.A. & McKusick, V.A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–D517 (2005).
Yu, W., Gwinn, M., Clyne, M., Yesupriya, A. & Khoury, M.J. A navigator for human genome epidemiology. Nat. Genet. 40, 124–125 (2008).
Becker, K.G., Barnes, K.C., Bright, T.J. & Wang, S.A. The genetic association database. Nat. Genet. 36, 431–432 (2004).
Peng, K. et al. The Disease and Gene Annotations (DGA): an annotation resource for human disease. Nucleic Acids Res. 41, D553–D560 (2013).
Fan, R., Wang, X. & Lin, C. LIBLINEAR: a library for large linear classification. J. Machine Learning Res. 9, 1871–1874 (2008).
Fischbach, G.D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).
Petrovski, S., Wang, Q., Heinzen, E.L., Allen, A.S. & Goldstein, D.B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
Samocha, K.E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).
Barrett, T. et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 41, D991–D995 (2013).
Bolstad, B.M., Irizarry, R.A., Astrand, M. & Speed, T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003).
Dai, M. et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 33, e175 (2005).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Blondel, V.D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Gene Ontology Consortium. The Gene Ontology: enhancements for 2011. Nucleic Acids Res. 40, D559–D564 (2012).
Kent, W.J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
Grant, C.E., Bailey, T.L. & Noble, W.S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Kulakovskiy, I.V. et al. HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Res. 41, D195–D202 (2013).
Bostock, M., Ogievetsky, V. & Heer, J. D3: data-driven documents. IEEE Trans. Vis. Comput. Graph. 17, 2301–2309 (2011).
Niculescu-Mizil, A. & Caruana, R. Predicting good probabilities with supervised learning. in Proc. 22nd Internat. Conf. Machine Learning 625–632 (ACM Press, 2005).
We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the SSC principal investigators. We thank all members of the Troyanskaya lab for valuable discussions. We thank J. Spiro and other members of the Simons Foundation for constant feedback on the work and manuscript. This work was primarily supported by US National Institutes of Health (NIH) grants R01 GM071966 and R01 HG005998 to O.G.T. V.Y. was supported in part by US NIH grant T32 HG003284. This work was supported in part by US NIH grant P50 GM071508. O.G.T. is a senior fellow of the Genetic Networks program of the Canadian Institute for Advanced Research (CIFAR).
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Illustration of the working of network-based SVM to confidently predict a new (and now confirmed) ASD gene.
Gene CTNND2’s brain-network neighborhood that enabled its prediction by the SVM. E1-E4 denote genes with various levels of evidence (high to low) for a known association with ASD. CTNND2 ranks 16th in our prediction and was recently identified as a high-confidence ASD gene in female-enriched multiplex families (Turner et al. (2015) Nature 520(7545), 51–6), although our classifier did not see this gene during training (at any level of confidence). In our brain network, CTNND2 is tightly linked to the high-evidence (E1) genes SHANK2 and NRXN1 and to several lower-evidence genes such as ATP2B2, DPP6 and SNAP25. It also shares common neighbors with the E1 genes SHANK2, GRIN2B and NRXN1. Together, this local connectivity and global interaction pattern enable the SVM to accurately predict CTNND2 as an ASD-related gene. Two recent network-based methods focused on autism, DAWN (Liu et al. (2015) The Annals of Applied Statistics 9(3), 1571–1600; DAWN-2015 rank 6079/8488) and NETBAG+ (Chang et al. (2014) Nature Neuroscience 18(2), 191–198), fail to predict this gene’s relevance to autism, mainly because previous genetic studies have not strongly linked it to ASD. This is evident from CTNND2’s high P-value (0.90) under TADA-2014 (De Rubeis et al. (2014) Nature 515(7526), 209–15), a method that summarizes prior ASD genetic data (all types of mutations) into gene-level P-values for ASD risk. Our method, by contrast, ranks CTNND2 16th out of more than 25,000 genes in the genome without any prior genetic evidence about it (the gene is also absent from our gold standard).
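The caption credits the SVM's success to each gene's network connectivity. A minimal sketch of that idea, assuming each gene is represented by its vector of edge weights to every gene in a brain-specific network: a linear SVM is trained on labeled genes and then scores a held-out candidate that is wired to the positives, much as CTNND2 is wired to SHANK2 and NRXN1. The reference list cites LIBLINEAR; to stay dependency-free, this sketch swaps in a plain Pegasos subgradient solver instead. The six-gene network, the gene names `pos1` to `cand`, and the hyperparameters are all hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with the Pegasos subgradient method.

    X: feature vectors (here, a gene's edge weights to every gene in the network)
    y: labels, +1 for gold-standard positives and -1 for negatives
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 penalty
            if margin < 1.0:  # hinge loss is active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def svm_score(w, x):
    """Decision value w.x; higher means more ASD-like."""
    return sum(wj * xj for wj, xj in zip(w, x))

# Toy, made-up symmetric network; rows and columns follow this gene order.
genes = ["pos1", "pos2", "pos3", "neg1", "neg2", "cand"]
network = [
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.9],
    [0.9, 0.0, 0.8, 0.0, 0.1, 0.8],
    [0.8, 0.8, 0.0, 0.1, 0.1, 0.2],
    [0.1, 0.0, 0.1, 0.0, 0.9, 0.1],
    [0.0, 0.1, 0.1, 0.9, 0.0, 0.0],
    [0.9, 0.8, 0.2, 0.1, 0.0, 0.0],
]

# Train only on labeled genes; "cand" is held out, like CTNND2 in the figure.
labels = {"pos1": 1, "pos2": 1, "pos3": 1, "neg1": -1, "neg2": -1}
train_idx = [genes.index(g) for g in labels]
w = train_linear_svm([network[i] for i in train_idx],
                     [labels[genes[i]] for i in train_idx])

# Rank every gene, labeled or not, by its decision value.
ranking = sorted(genes, key=lambda g: svm_score(w, network[genes.index(g)]), reverse=True)
```

Because the features are whole rows of the network, a gene scores highly when its neighbors overlap with the neighbors of the positives, which is one way to read the combination of local and global connectivity described above.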
Supplementary Figure 2 Robustness of ASD-gene predictions to changes in our gold-standard gene sets.
To test the robustness of our predictions, we repeated them on subsampled gold standards, each time using 4/5 of the positives and negatives. The rank-based correlation coefficient between our original predictions and the 100 sets of predictions made with the resampled gold standards is 0.993, indicating that our genome-wide predictions are highly robust to noise in the gold standard.
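The caption does not name the rank-based correlation coefficient. A minimal sketch, assuming Spearman's coefficient (the Pearson correlation of tie-averaged ranks), together with a helper that keeps a random 4/5 of a gold-standard list; the function names are illustrative only.

```python
import random

def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def subsample(genes, fraction=0.8, seed=0):
    """Keep a random `fraction` (4/5 by default) of a gold-standard gene list."""
    rng = random.Random(seed)
    return rng.sample(list(genes), round(fraction * len(genes)))
```

In a robustness check of this kind, one would retrain on `subsample(positives)` and `subsample(negatives)` with a different seed each time, score every gene, and compare each run's scores with the original scores using `spearman`.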
In order to improve the interpretability of our genome-wide ranking, we calculated a permutation-based P-value and a corresponding Q-value for each gene (see Methods). The plot shows the distribution of these P-values, with the red dashed line indicating mean frequency.
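The Methods referenced here are not part of this section, so the exact procedure is not shown. As a hedged sketch: a common permutation P-value is the add-one estimate below, and because the reference list includes Benjamini and Hochberg, the Q-values are assumed here to be Benjamini-Hochberg adjusted P-values. Both function names are hypothetical.

```python
def permutation_p_value(observed, null_scores):
    """One-sided empirical P-value with the add-one correction, so P is never exactly 0."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)

def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted P-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```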
Supplementary Figure 4 Extended evaluation of autism-gene predictions on empirical data from an independent sequencing study.
All evaluations below were performed and are presented as in Figure 2b,c. The resulting trends (significant enrichment of proband LGD mutations and no enrichment of sibling LGD mutations) are consistent with those observed in Figure 2. (a) Rank-based enrichment test (without top-decile cutoffs) on data from all families and from unpublished families, for ‘All’ and ‘Novel’ ASD genes (Fig. 2 and Supplementary Fig. 4b–d), showing trends similar to those presented in Figure 2. The ‘All families – All genes’ result is shown in Figure 2b; the remaining combinations are shown here. All plots present the z-score quantifying the enrichment of the gene set of interest toward the top of our genome-wide ranking of genes (see Methods). The three mutation gene sets in each case are colored differently (labeled in the legend below), with the number of genes in parentheses below. The P-values recorded at the top of each bar were calculated using a permutation test described in detail in Methods. (b) Novel de novo LGDs from all families: mutations recorded from all families published in 2014, restricted to genes that were not part of our training gold standard (completely ‘novel’ ASD genes). The total number of genes in each gene set and the P-value from the binomial test are given in parentheses just below. (c) All de novo LGDs from unpublished families: mutations recorded only from SSC families that were unpublished in the 2012 studies and subsequently published in 2014 (all genes from completely unseen families). (d) Novel de novo LGDs from unpublished families: mutations recorded from families that were unpublished in the 2012 studies and subsequently published in 2014, further restricted to genes that were not part of our training gold standard (completely ‘novel’ ASD genes from unseen families).
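The enrichment z-score and permutation P-value in panel (a) are detailed in Methods, which this section omits. The sketch below assumes one plausible form: the statistic is the mean rank of the gene set, the null distribution comes from random gene sets of the same size, and a positive z means the set sits nearer the top of the ranking than chance. The function name and defaults are hypothetical.

```python
import random
import statistics

def rank_enrichment(ranking, gene_set, n_perm=1000, seed=0):
    """Return (z, p) for how close `gene_set` sits to the top of `ranking` (best first)."""
    position = {g: i + 1 for i, g in enumerate(ranking)}
    members = [g for g in gene_set if g in position]  # ignore genes absent from the ranking
    k = len(members)
    observed = statistics.mean(position[g] for g in members)
    rng = random.Random(seed)
    all_positions = list(range(1, len(ranking) + 1))
    # Null: mean rank of random gene sets of the same size.
    null = [statistics.mean(rng.sample(all_positions, k)) for _ in range(n_perm)]
    mu, sd = statistics.mean(null), statistics.stdev(null)
    z = (mu - observed) / sd  # positive z: set sits nearer the top than random
    p = (sum(1 for m in null if m <= observed) + 1) / (n_perm + 1)
    return z, p
```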
Supplementary Figure 5 Histograms of prior genetic evidence scores and the distributions of top genes as predicted by DAWN-2015 and our method.
(a) Distribution of 2014 TADA P-values of 8488 genes (white), which constitute the input genetic evidence for DAWN-2015. Overlaid (blue) is the distribution of TADA P-values of DAWN’s top 333 predicted genes. (b) As in (a), but overlaid (red) with the distribution of 2014 TADA P-values of a comparable set of our top 333 predicted genes.
We compare our method with DAWN-2015 by evaluating how well the ASD-gene rankings produced by the two methods prioritize (a) novel LGD targets, (b) novel protein-protein interaction (PPI) partners and (c) ASD-associated genes identified by genome-wide association studies (GWAS). Precision is presented as fold-over-random, i.e., the observed precision divided by the expected baseline precision (Precision/[P/(P+N)], where P is the number of positives and N is the number of negatives). For (a), gene targets of novel de novo LGDs observed in SSC probands and siblings were used as positive and negative examples, respectively. For (b), novel PPI partners of potential ASD genes identified in a genome-wide assay (Corominas et al. (2014) Nature Communications 5) were used as positives, and all other proteins tested in that assay were used as negatives. For (c), genes associated with autism in the GWAS catalog were used as positives, and all other genes in the catalog were used as negatives. P-values were calculated by the Wilcoxon rank-sum test.
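The fold-over-random formula above translates directly into code. The caption does not say at which rank cutoff precision is read off, so the sketch takes a hypothetical cutoff `k` over the labeled genes (positives plus negatives) in ranked order, and computes P and N from the labeled genes present in the ranking.

```python
def fold_over_random(ranking, positives, negatives, k):
    """Precision among the top-k labeled genes divided by the baseline P / (P + N)."""
    pos, neg = set(positives), set(negatives)
    labeled = [g for g in ranking if g in pos or g in neg]  # unlabeled genes are skipped
    top = labeled[:k]
    precision = sum(1 for g in top if g in pos) / len(top)
    n_pos = sum(1 for g in labeled if g in pos)
    baseline = n_pos / len(labeled)  # P / (P + N)
    return precision / baseline
```

A value of 1 means the ranking does no better than random at that cutoff; the caption's comparison asks which method's value is higher.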
(a) Enrichment of ASD-gene ranking on genes associated with various neurological/brain diseases. To test the specificity of our genome-wide ranking for ASD, we tested the ranking of genes associated with five neurological diseases annotated in the OMIM database. We found no significant enrichment for these unrelated diseases, indicating that our top-ranked genes are specific to ASD and distinct from genes associated with other neurological diseases. (b) Evaluation of ASD-gene ranking on genes linked to disorders closely related to ASD. We obtained genes associated with intellectual disability (ID; left; Parikshak et al. (2013) Cell 155(5), 1008–21), schizophrenia (middle; Fromer et al. (2014) Nature 506(7487), 179–84) and developmental disorders (DDD; right; The DDD Study (2015) Nature 519(7542), 223–8) identified in large sequencing studies, removed the genes in our ASD positive gold standard, and tested their distribution in our genome-wide ASD-gene ranking. We observe the expected significant overlap with ID and schizophrenia, while noting that our ranking prioritizes hundreds of additional genes not implicated in these disorders. The enrichment of genes in the DDD data set is not surprising because the underlying cohort includes cases with ASD as well as several other disorders that are comorbid with ASD (e.g., ID, heart development disorders).
(a) Boxplot of the distribution of RVIS scores (Petrovski et al. (2013) PLoS Genetics 9(8), e1003709) as a function of our autism-gene ranking. (b) Histogram of the fraction of constrained genes (out of 1,003; Samocha et al. (2014) Nature Genetics 46(9), 944–50) along the autism-gene ranking. Testing the two constraint measures against our predictions (see Methods) gave similar results both within the top decile (using Fisher’s exact test) and without any cutoff (using a rank-based permutation test): top-ranked genes tend to be more constrained than all other genes (RVIS: top-decile Wilcoxon test P < 2.2e-16, rank-based permutation test P = 1e-6; constrained set: Fisher’s exact test P = 1.36e-86, rank-based permutation test P = 1e-6).
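The top-decile comparison can be framed as a 2x2 table: top decile vs. rest, constrained vs. not. A standard-library sketch of a one-sided Fisher's exact test via the hypergeometric upper tail follows; the one-sided alternative (enrichment in the top decile) is our assumption, since the caption does not state sidedness, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table
         [[a, b],    a = top decile & constrained,  b = top decile & not constrained
          [c, d]]    c = rest & constrained,        d = rest & not constrained
    Returns P(X >= a) under the hypergeometric null with fixed margins."""
    n_top = a + b
    n_marked = a + c
    total = a + b + c + d
    upper = min(n_top, n_marked)
    tail = sum(comb(n_marked, x) * comb(total - n_marked, n_top - x)
               for x in range(a, upper + 1))
    # Exact integer arithmetic until this single division, so tiny P-values stay accurate.
    return tail / comb(total, n_top)
```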
The spatiotemporal signature for each window was derived by controlling for both brain region and developmental stage. The permutation test used to identify the association of each signature with ASD also controls for the number of genes in that signature. The plot shows the ASD association of all signatures (indicated as the negative logarithm of the enrichment Q-value) as a function of the number of genes in each signature, demonstrating that there is no correlation between signature size and ASD association.
Results of a permutation test to evaluate the clustering of ASD genes in a randomized brain network. The shared k-nearest-neighbor analysis was repeated 100 times, each time randomizing the k nearest neighbors of each node in the network of top ASD genes. Across a random set of nearly 30,000 gene pairs, the bulk of the clustering scores based on random k neighbors ranged between 0 and 0.3, significantly lower than the cutoff of 0.9 used in our analysis of the real network. Our ASD functional modules are thus significantly and substantially more cohesive than random.
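The caption does not define the clustering score. As an assumption for illustration, the sketch scores a gene pair by the fraction of their k nearest neighbors (strongest edges) that they share, and builds the randomized control by giving every node k random neighbors, as the caption describes. All function names are hypothetical.

```python
import random

def knn_sets(weights, k):
    """weights: node -> {neighbor: edge weight}. Returns node -> set of its k strongest neighbors."""
    return {node: set(sorted(nbrs, key=nbrs.get, reverse=True)[:k])
            for node, nbrs in weights.items()}

def shared_knn_score(knn, a, b):
    """Fraction of a's k nearest neighbors that are also among b's (assumed score)."""
    return len(knn[a] & knn[b]) / len(knn[a])

def randomized_knn(nodes, k, seed=0):
    """Null model: each node gets k neighbors drawn uniformly from the other nodes."""
    rng = random.Random(seed)
    return {n: set(rng.sample([m for m in nodes if m != n], k)) for n in nodes}
```

Repeating `randomized_knn` with 100 different seeds and scoring random gene pairs would give a null distribution of scores to set against the 0.9 cutoff used for the real network.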
Supplementary Figure 11 Comparison of predicted ASD ranks of genes within autism-associated CNVs that have prior genetic and functional evidence.
Boxplots show the distribution of predicted ASD ranks for genes within the eight ASD-associated CNVs, grouped by the type of prior evidence for each gene: genetic (strong; red; n = 13), functional (weak; blue; n = 8) or none (grey; n = 166). *P ≤ 0.05; ***P ≤ 0.001.
Top decile genes within ASD-associated CNVs (blue circles), and known ASD genes (red circles) are linked in the brain network via intermediate genes (green circles). Statistically significant intermediate genes are identified using a permutation test against random genomic intervals. The biological processes enriched among significant intermediate genes (green bubble) that are shared with multiple ASD-associated CNVs are shown in Figure 6.
The interactive ASD web-interface enables biologists to explore their genes of interest (left) in the human brain-specific network (right). Users can easily explore the contribution of each of the different data types to any predicted brain-specific interaction (window at bottom right), including a summary of which data types contributed the most – for example, co-expression and GSEA miRNA targets in this case – as well as evidence weights for each individual dataset. Users can click on any dataset to be redirected to the underlying data.
Supplementary Figure 14 New predictions based on training with updated gold standards correspond closely to the predictions used in this study.
As an example of how we can regularly update the results on our web server as new genes are identified, we have made available a new set of predictions at http://asd.princeton.edu/v2, obtained by training on an updated version of the SFARI Gene database that includes all the results from the 2014 study (Iossifov et al. (2014) Nature 515(7526), 216–21). The scatter plot shows the original (v1) and new (v2) rank of each gene in the genome. The dotted lines mark the top decile of the predicted genes. The new predictions are overall consistent with the original ones used throughout this study: the correlation coefficient between the two genome-wide rankings is 0.93, and 83.3% of top-decile genes are shared between v1 and v2. In addition, as our web server demonstrates, downstream analysis results (including GO enrichments of predictions and neurodevelopmental analyses) remain nearly identical.
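The 83.3% figure is an overlap between the top 10% of two rankings. A minimal sketch, assuming both rankings list the same genes best-first; the function name is hypothetical.

```python
def top_fraction_overlap(rank_v1, rank_v2, fraction=0.1):
    """Share of the top `fraction` of rank_v1 that also appears in the top `fraction` of rank_v2."""
    k = max(1, int(len(rank_v1) * fraction))  # top-decile size by default
    top1, top2 = set(rank_v1[:k]), set(rank_v2[:k])
    return len(top1 & top2) / k
```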
Supplementary Figure 15 A potential use case: estimating the relevance of a high-resolution spatiotemporal window in the brain to autism.
A biomedical researcher could use our predictions as a framework for analyzing data from high- or low-throughput assays, enabling high-resolution study of autism genetics in the functional and physiological contexts of interest. For example, a researcher who has generated gene expression or proteomics data from a new high-resolution spatiotemporal window in the brain (either human or in a model organism) can use our approach to assess the molecular relevance of that window to ASD. This approach (identifying a characteristic gene signature from the samples and estimating its enrichment in our genome-wide ASD-gene ranking, similar to the results presented in Figure 3) can yield increasingly specific results as the spatiotemporal resolution of the available data increases. The researcher could then focus on the highly ranked genes in the expression signature, together with the specific functional contexts identified for these genes (the ‘ASD-associated functional modules’ presented in Figure 4), to generate hypotheses and design experiments that further characterize the genes expressed in that window in relation to autism.
Supplementary Figures 1–15 (PDF 3820 kb)
Our training gold standard consisted of known ASD-associated genes (with varying levels of evidence E1-4) as positives and non-mental-health-related genes as negatives. The positives are listed along with their evidence level and source database. (XLSX 76 kb)
Supplementary Table 2: Top 20 biological processes enriched in our SVM model for predicting ASD-genes.
We analyzed our ASD-gene prediction model to identify which biological processes and pathways contribute the most in associating a gene with ASD in the brain-specific network. The table contains the top 20 statistically enriched Gene Ontology biological processes among genes that are most highly “weighted” by the model, i.e., associated with the highest feature weights in our SVM model. The most informative genes in our ASD network-based model are strongly enriched for neurological processes, providing insight into the general underlying processes that may be driving our predictions. (XLSX 9 kb)
The predicted ASD-association ranking of all genes in the genome is listed along with detailed information on their gold standard status, prediction score, prediction probability, prediction P and Q values, and membership in ASD-related gene sets. The file also contains the evaluation of the genome-wide ranking controlling for gene length and neuronal functional annotations, and literature support for select top-ranked genes not used in our positive training standard. (XLSX 2992 kb)
Supplementary Table 4: Targets of de novo mutations identified by exome sequencing of the Simons Simplex Collection.
Genes harboring de novo likely-gene-disrupting (LGD; also known as loss-of-function) or synonymous (SYN) mutations identified in autistic children (probands; prb) and unaffected siblings (sib) are listed separately. (XLSX 106 kb)
All signatures that are significantly enriched among the top-ranked ASD genes are listed here along with the number of genes in each signature and their enrichment scores. (XLSX 45 kb)
The nine modules of top-ranked ASD genes each tightly connected in the brain-specific network are presented here with information about their module/cluster membership, connectivity within each cluster, and enriched biological processes in each cluster. (XLSX 222 kb)
The table contains the complete ASD ranking of genes within each of eight autism-associated CNVs along with details on previous genetic or functional evidence for the connection of individual CNV-genes to ASD. (XLSX 59 kb)
Results from the functional analysis of top-ranked genes in the eight ASD-associated CNVs are presented here, with details on the specific ‘intermediate’ genes and processes that connect the CNV genes to the molecular phenotype of autism. The table also contains literature support for select intermediate genes. (XLSX 54 kb)
Supplementary Table 9: Detailed functional, developmental, and CNV information for our top-decile genes.
Top 2,500 ASD candidate genes along with their functional module memberships, spatiotemporal developmental gene-expression patterns, and CNV membership. (XLSX 317 kb)
About this article
Cite this article
Krishnan, A., Zhang, R., Yao, V. et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat Neurosci 19, 1454–1462 (2016). https://doi.org/10.1038/nn.4353