Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder

Abstract

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder with a strong genetic basis. Yet, only a small fraction of potentially causal genes—about 65 genes out of an estimated several hundred—are known with strong genetic evidence from sequencing studies. We developed a complementary machine-learning approach based on a human brain-specific gene network to present a genome-wide prediction of autism risk genes, including hundreds of candidates for which there is minimal or no prior genetic evidence. Our approach was validated in a large independent case–control sequencing study. Leveraging these genome-wide predictions and the brain-specific network, we demonstrated that the large set of ASD genes converges on a smaller number of key pathways and developmental stages of the brain. Finally, we identified likely pathogenic genes within frequent autism-associated copy-number variants and proposed genes and pathways that are likely mediators of ASD across multiple copy-number variants. All predictions and functional insights are available at http://asd.princeton.edu.

Figure 1: Genome-wide prediction of autism-associated genes.
Figure 2: Evaluation of autism-associated gene predictions.
Figure 3: ASD-associated genetic changes in the spatiotemporal development of the brain.
Figure 4: Autism-associated brain-specific functional modules.
Figure 5: Prioritization of genes within eight recurrent ASD-associated CNVs.
Figure 6: Convergence of cellular functions disrupted by multiple CNVs identified through key intermediate genes in the brain network.

Accession codes

Gene Expression Omnibus

References

  1. Winter, E.E., Goodstadt, L. & Ponting, C.P. Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res. 14, 54–61 (2004).

  2. Goh, K.-I. et al. The human disease network. Proc. Natl. Acad. Sci. USA 104, 8685–8690 (2007).

  3. Sanders, S.J. et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237–241 (2012).

  4. O'Roak, B.J. et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 246–250 (2012).

  5. Neale, B.M. et al. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485, 242–245 (2012).

  6. Iossifov, I. et al. De novo gene disruptions in children on the autistic spectrum. Neuron 74, 285–299 (2012).

  7. Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).

  8. De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014).

  9. Sanders, S.J. et al. Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron 87, 1215–1233 (2015).

  10. Ronemus, M., Iossifov, I., Levy, D. & Wigler, M. The role of de novo mutations in the genetics of autism spectrum disorders. Nat. Rev. Genet. 15, 133–141 (2014).

  11. He, X. et al. Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. PLoS Genet. 9, e1003671 (2013).

  12. Gilman, S.R. et al. Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses. Neuron 70, 898–907 (2011).

  13. Lee, T.-L.L., Raygada, M.J. & Rennert, O.M. Integrative gene network analysis provides novel regulatory relationships, genetic contributions and susceptible targets in autism spectrum disorders. Gene 496, 88–96 (2012).

  14. Kou, Y., Betancur, C., Xu, H., Buxbaum, J.D. & Ma'ayan, A. Network- and attribute-based classifiers can prioritize genes and pathways for autism spectrum disorders and intellectual disability. Am. J. Med. Genet. C. Semin. Med. Genet. 160C, 130–142 (2012).

  15. Ben-David, E. & Shifman, S. Combined analysis of exome sequencing points toward a major role for transcription regulation during brain development in autism. Mol. Psychiatry 18, 1054–1056 (2013).

  16. Parikshak, N.N. et al. Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell 155, 1008–1021 (2013).

  17. Li, J. et al. Integrated systems analysis reveals a molecular network underlying autism spectrum disorders. Mol. Syst. Biol. 10, 774 (2014).

  18. Chang, J., Gilman, S.R., Chiang, A.H., Sanders, S.J. & Vitkup, D. Genotype to phenotype relationships in autism spectrum disorders. Nat. Neurosci. 18, 191–198 (2015).

  19. Hormozdiari, F., Penn, O., Borenstein, E. & Eichler, E.E. The discovery of integrated gene networks for autism and related disorders. Genome Res. 25, 142–154 (2015).

  20. Liu, L., Lei, J. & Roeder, K. Network assisted analysis to reveal the genetic basis of autism. Ann. Appl. Stat. 9, 1571–1600 (2015).

  21. Greene, C.S. et al. Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet. 47, 569–576 (2015).

  22. Darnell, J.C. et al. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146, 247–261 (2011).

  23. King, I.F. et al. Topoisomerases facilitate transcription of long genes linked to autism. Nature 501, 58–62 (2013).

  24. Cotney, J. et al. The autism-associated chromatin modifier CHD8 regulates other autism risk genes during human neurodevelopment. Nat. Commun. 6, 6404 (2015).

  25. Pinto, D. et al. Convergence of genes and cellular pathways dysregulated in autism spectrum disorders. Am. J. Hum. Genet. 94, 677–694 (2014).

  26. Bayés, A. et al. Characterization of the proteome, diseases and evolution of the human postsynaptic density. Nat. Neurosci. 14, 19–21 (2011).

  27. Corominas, R. et al. Protein interaction network of alternatively spliced isoforms from brain links genetic risk factors for autism. Nat. Commun. 5, 3650 (2014).

  28. Iossifov, I. et al. Low load for disruptive mutations in autism genes and their biased transmission. Proc. Natl. Acad. Sci. USA 112, E5600–E5607 (2015).

  29. Kang, H.J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).

  30. Willsey, A.J. et al. Coexpression networks implicate human midfetal deep cortical projection neurons in the pathogenesis of autism. Cell 155, 997–1007 (2013).

  31. Uddin, M. et al. Brain-expressed exons under purifying selection are enriched for de novo mutations in autism spectrum disorder. Nat. Genet. 46, 742–747 (2014).

  32. Stoner, R. et al. Patches of disorganization in the neocortex of children with autism. N. Engl. J. Med. 370, 1209–1219 (2014).

  33. Haar, S., Berman, S., Behrmann, M. & Dinstein, I. Anatomical abnormalities in autism? Cereb. Cortex 26, 1440–1452 (2016).

  34. Dinstein, I., Heeger, D.J. & Behrmann, M. Neural variability: friend or foe? Trends Cogn. Sci. 19, 322–328 (2015).

  35. Wang, S.S.-H., Kloth, A.D. & Badura, A. The cerebellum, sensitive periods, and autism. Neuron 83, 518–532 (2014).

  36. Peça, J. et al. Shank3 mutant mice display autistic-like behaviours and striatal dysfunction. Nature 472, 437–442 (2011).

  37. Di Martino, A. et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol. Psychiatry 19, 659–667 (2014).

  38. Goldberg, D.S. & Roth, F.P. Assessing experimentally derived interactions in a small world. Proc. Natl. Acad. Sci. USA 100, 4372–4376 (2003).

  39. Voineagu, I. et al. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474, 380–384 (2011).

  40. Masi, A. et al. Cytokine aberrations in autism spectrum disorder: a systematic review and meta-analysis. Mol. Psychiatry 20, 440–446 (2015).

  41. Bresnahan, M. et al. Association of maternal report of infant and toddler gastrointestinal symptoms with autism: evidence from a prospective birth cohort. JAMA Psychiatry 72, 466–474 (2015).

  42. Hazen, E.P., Stornelli, J.L., O'Rourke, J.A., Koesterer, K. & McDougle, C.J. Sensory symptoms in autism spectrum disorders. Harv. Rev. Psychiatry 22, 112–124 (2014).

  43. Cohen, S., Conduit, R., Lockley, S.W., Rajaratnam, S.M. & Cornish, K.M. The relationship between sleep and behavior in autism spectrum disorder (ASD): a review. J. Neurodev. Disord. 6, 44 (2014).

  44. Takahashi, T. et al. Rosbin: a novel homeobox-like protein gene expressed exclusively in round spermatids. Biol. Reprod. 70, 1485–1492 (2004).

  45. Weiss, L.A. et al. Association between microdeletion and microduplication at 16p11.2 and autism. N. Engl. J. Med. 358, 667–675 (2008).

  46. Lin, G.N. et al. Spatiotemporal 16p11.2 protein network implicates cortical late mid-fetal brain development and KCTD13-Cul3-RhoA pathway in psychiatric diseases. Neuron 85, 742–754 (2015).

  47. Martin-Granados, C., Philp, A., Oxenham, S.K., Prescott, A.R. & Cohen, P.T.W. Depletion of protein phosphatase 4 in human cells reveals essential roles in centrosome maturation, cell migration and the regulation of Rho GTPases. Int. J. Biochem. Cell Biol. 40, 2315–2332 (2008).

  48. Vanunu, O., Magger, O., Ruppin, E., Shlomi, T. & Sharan, R. Associating genes and protein complexes with disease via network propagation. PLoS Comput. Biol. 6, e1000641 (2010).

  49. Hus, V., Gotham, K. & Lord, C. Standardizing ADOS domain scores: separating severity of social affect and restricted and repetitive behaviors. J. Autism Dev. Disord. 44, 2400–2412 (2014).

  50. Moreno-De-Luca, D. et al. Using large clinical data sets to infer pathogenicity for rare copy number variants in autism cohorts. Mol. Psychiatry 18, 1090–1095 (2013).

  51. Abrahams, B.S. et al. SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol. Autism 4, 36 (2013).

  52. Hamosh, A., Scott, A.F., Amberger, J.S., Bocchini, C.A. & McKusick, V.A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–D517 (2005).

  53. Yu, W., Gwinn, M., Clyne, M., Yesupriya, A. & Khoury, M.J. A navigator for human genome epidemiology. Nat. Genet. 40, 124–125 (2008).

  54. Becker, K.G., Barnes, K.C., Bright, T.J. & Wang, S.A. The genetic association database. Nat. Genet. 36, 431–432 (2004).

  55. Peng, K. et al. The Disease and Gene Annotations (DGA): an annotation resource for human disease. Nucleic Acids Res. 41, D553–D560 (2013).

  56. Fan, R., Wang, X. & Lin, C. LIBLINEAR: a library for large linear classification. J. Machine Learning Res. 9, 1871–1874 (2008).

  57. Fischbach, G.D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).

  58. Petrovski, S., Wang, Q., Heinzen, E.L., Allen, A.S. & Goldstein, D.B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).

  59. Samocha, K.E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).

  60. Barrett, T. et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 41, D991–D995 (2013).

  61. Bolstad, B.M., Irizarry, R.A., Astrand, M. & Speed, T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003).

  62. Dai, M. et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 33, e175 (2005).

  63. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

  64. Blondel, V.D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).

  65. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

  66. Gene Ontology Consortium. The Gene Ontology: enhancements for 2011. Nucleic Acids Res. 40, D559–D564 (2012).

  67. Kent, W.J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).

  68. Grant, C.E., Bailey, T.L. & Noble, W.S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).

  69. Kulakovskiy, I.V. et al. HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Res. 41, D195–D202 (2013).

  70. Bostock, M., Ogievetsky, V. & Heer, J. D3: data-driven documents. IEEE Trans. Vis. Comput. Graph. 17, 2301–2309 (2011).

  71. Niculescu-Mizil, A. & Caruana, R. Predicting good probabilities with supervised learning. in Proc. 22nd Internat. Conf. Machine Learning 625–632 (ACM Press, 2005).

Acknowledgements

We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the SSC principal investigators. We thank all members of the Troyanskaya lab for valuable discussions. We thank J. Spiro and other members of the Simons Foundation for constant feedback on the work and manuscript. This work was primarily supported by US National Institutes of Health (NIH) grants R01 GM071966 and R01 HG005998 to O.G.T. V.Y. was supported in part by US NIH grant T32 HG003284. This work was supported in part by US NIH grant P50 GM071508. O.G.T. is a senior fellow of the Genetic Networks program of the Canadian Institute for Advanced Research (CIFAR).

Author information

Contributions

A.K., R.Z., A.L. and O.G.T. conceived and designed the research. A.K. and R.Z. performed computational analyses with contributions from A.L., V.Y., A.K.W. and C.L.T. N.V. provided data. A.T. developed the web interface with contributions from A.K.W., A.K. and R.Z. A.K., R.Z., A.L., A.P. and O.G.T. wrote the manuscript with inputs from V.Y. and C.L.T., and all authors contributed to revisions. A.K. and R.Z. are co-first authors and are listed alphabetically.

Corresponding authors

Correspondence to Alex Lash or Olga G Troyanskaya.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Illustration of how the network-based SVM confidently predicts a new (and now confirmed) ASD gene.

Gene CTNND2’s brain network neighborhood that enabled its prediction by the SVM. E1–E4 denote genes with decreasing levels of evidence (high to low) for known association with ASD. CTNND2 ranks 16th in our prediction and was recently identified as a high-confidence ASD-associated gene in female-enriched multiplex families (Turner et al. (2015) Nature 520(7545), 51–6), although our classifier did not see this gene during training (at any level of confidence). In our brain network, CTNND2 is tightly linked to the high-evidence (E1) genes SHANK2 and NRXN1 and to several lower-evidence genes such as ATP2B2, DPP6 and SNAP25. It also shares common neighbors with the E1 genes SHANK2, GRIN2B and NRXN1. Together, this local connectivity and global interaction pattern enable the SVM to predict CTNND2 as an ASD-related gene. Both DAWN (Liu et al. (2015) The Annals of Applied Statistics 9(3), 1571–1600) (DAWN-2015 rank 6079/8488) and NETBAG+ (Chang et al. (2015) Nature Neuroscience 18(2), 191–198), recent network-based methods focused on autism, fail to predict this gene’s relevance for autism, mainly because previous genetic studies have not strongly linked this gene to ASD. This is evident from CTNND2’s high P-value (0.90) under TADA-2014 (De Rubeis et al. (2014) Nature 515(7526), 209–15), a method that summarizes prior ASD genetic data (all types of mutations) into gene-level P-values for ASD risk. Our method, by contrast, ranks CTNND2 16th out of more than 25,000 genes in the genome without any prior genetic evidence for it (the gene is also absent from our gold standard).

Supplementary Figure 2 Robustness of ASD-gene predictions to changes in our gold-standard gene sets.

To test the robustness of our predictions, we repeated the prediction on subsamples of the gold standard, each time using 4/5 of the positives and negatives. The rank-based correlation coefficient between our original prediction and the 100 sets of predictions made on the resampled gold standard is 0.993, indicating that our genome-wide predictions are highly robust to noise in the gold standard.
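The resampling check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: `subsample` and `spearman` are hypothetical helper names, the 4/5 fraction comes from the caption, and Spearman's coefficient is assumed as the rank-based correlation because the caption does not name one. The simplified formula used here also assumes there are no tied scores.

```python
import random


def subsample(items, fraction=0.8, rng=random):
    """Draw a random subset holding `fraction` of the items (4/5 by default)."""
    k = int(round(len(items) * fraction))
    return rng.sample(list(items), k)


def ranks(scores):
    """Return 1-based ranks; the highest score receives rank 1 (no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(scores_a, scores_b):
    """Spearman rank correlation of two equally long score lists without ties."""
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

In use, one would retrain on `subsample(positives)` and `subsample(negatives)`, score every gene, and compare the new scores with the original ones via `spearman`, repeating the procedure 100 times.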

Supplementary Figure 3 Permutation-based P-values for our genome-wide predictions.

In order to improve the interpretability of our genome-wide ranking, we calculated a permutation-based P-value and a corresponding Q-value for each gene (see Methods). The plot shows the distribution of these P-values, with the red dashed line indicating mean frequency.
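As a rough sketch of how a permutation-based P-value and its Q-value can be computed, the Python below uses an empirical P-value with a +1 pseudocount and the Benjamini–Hochberg adjustment. The function names are hypothetical, and how the null scores are generated is left open because the caption defers that detail to Methods. Benjamini–Hochberg is assumed here as the Q-value procedure.

```python
def empirical_p(observed, null_scores):
    """Share of null scores at least as large as the observed score (+1 pseudocount)."""
    at_least = sum(1 for s in null_scores if s >= observed)
    return (at_least + 1) / (len(null_scores) + 1)


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that Q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

The pseudocount keeps an empirical P-value from ever being exactly zero, which would overstate the evidence available from a finite number of permutations.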

Supplementary Figure 4 Extended evaluation of autism-gene predictions on empirical data from an independent sequencing study.

All evaluations below were performed and are presented as in Figure 2b and 2c. The resulting trends – significant enrichment of proband LGD mutations and non-enrichment of sibling LGD mutations – are consistent with those observed in Figure 2. (a) Rank-based enrichment test (without top-decile cutoffs) on data from all and unpublished families, for ‘All’ and ‘Novel’ ASD genes (Fig. 2, and Fig S4 b, c and d), showing trends similar to those presented in Figure 2. ‘All families – All genes’ is in Figure 2b, and the rest are here. All plots present the z-score quantifying the enrichment of the gene-set of interest towards the top of our genome-wide ranking of genes (see Methods). The three mutation gene-sets in each case are colored differently (labeled in the legend below), with the number of genes in parentheses below. The P-values shown at the top of each bar were calculated using a permutation test described in detail in Methods. (b) Novel de novo LGDs from all families: This data set is derived from mutations recorded from all families published in 2014, but restricted to genes that were not part of our training gold-standard (completely ‘novel’ ASD genes). The total number of genes in the gene-set and the P-value from the binomial test are given in parentheses just below. (c) All de novo LGDs from unpublished families: This data set is derived from mutations recorded only from SSC families that were unpublished in the 2012 studies and subsequently published in 2014 (all genes from completely unseen families). (d) Novel de novo LGDs from unpublished families: This data set is derived from mutations recorded from families that were unpublished in the 2012 studies and subsequently published in 2014, further restricted to genes that were not part of our training gold-standard (completely ‘novel’ ASD genes from unseen families).
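One way to turn "enrichment towards the top of the ranking" into a z-score and a permutation P-value is sketched below. The statistic is an assumption made for illustration: the mean rank of the gene set is compared against the mean ranks of random gene sets of the same size. The statistic actually used is described in Methods and may differ. `rank_enrichment` and its parameters are hypothetical names.

```python
import random
import statistics


def rank_enrichment(ranking, gene_set, n_perm=1000, seed=0):
    """Return (z, p) for how strongly gene_set sits near the top of ranking.

    ranking is a best-first list of gene IDs. z is positive when the set's mean
    rank is better (smaller) than that of random sets of equal size.
    """
    rng = random.Random(seed)
    rank_of = {gene: i + 1 for i, gene in enumerate(ranking)}
    members = [g for g in gene_set if g in rank_of]
    if not members:
        raise ValueError("no gene in gene_set appears in the ranking")
    observed = statistics.mean(rank_of[g] for g in members)
    all_ranks = range(1, len(ranking) + 1)
    null = [statistics.mean(rng.sample(all_ranks, len(members)))
            for _ in range(n_perm)]
    z = (statistics.mean(null) - observed) / statistics.stdev(null)
    # One-sided empirical P-value with a +1 pseudocount.
    p = (sum(1 for m in null if m <= observed) + 1) / (n_perm + 1)
    return z, p
```

Under this sketch a proband LGD gene set would be expected to yield a large positive z, while a sibling LGD gene set would stay near zero.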

Supplementary Figure 5 Histograms of prior genetic evidence scores and the distributions of top genes as predicted by DAWN-2015 and our method.

(a) Distribution of 2014 TADA P-values of 8488 genes (white), which is the input genetic evidence for DAWN-2015. Overlaid on top (blue) is the distribution of TADA P-values of DAWN’s top 333 predicted genes. (b) As in (a), but overlaid (red) with the distribution of 2014 TADA P-values of our top 333 predicted genes, a set of comparable size.

Supplementary Figure 6 Comparison of our method to a previous method (DAWN).

We compare our method to DAWN-2015 by evaluating the ability of the ASD gene rankings produced by the two methods to prioritize (a) novel LGD-targets, (b) novel protein-protein interaction (PPI) partners, and (c) ASD-associated genes identified by genome-wide association studies (GWAS). Precision is presented as fold-over-random, measured as observed precision over expected baseline precision (calculated as Precision/[P/(P+N)], where P is the number of positives and N is the number of negatives). For (a), gene targets of novel de novo LGDs observed in SSC probands and siblings were used as positive and negative examples, respectively. For (b), novel PPI partners of potential ASD genes identified in a genome-wide assay (Corominas et al. (2014) Nature Communications 5) were used as positives, and all other proteins tested in that assay were used as negatives. For (c), genes associated with autism as documented in the GWAS catalog were used as positives, while all other genes in the catalog were used as negatives. P-values were calculated by Wilcoxon rank-sum test.
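The fold-over-random measure defined above, Precision/[P/(P+N)], can be made concrete with a short Python sketch. The function name and the choice to evaluate precision at a cutoff `k` are illustrative assumptions. The input is assumed to contain only genes labeled as positive or negative, ordered from best to worst predicted rank.

```python
def fold_over_random(ranked_labels, k):
    """Precision among the top k labeled genes divided by the baseline P/(P+N).

    ranked_labels is a best-first list of booleans (True = positive example).
    """
    if not 0 < k <= len(ranked_labels):
        raise ValueError("k must lie between 1 and the number of labeled genes")
    positives = sum(ranked_labels)
    if positives == 0:
        raise ValueError("at least one positive example is required")
    precision = sum(ranked_labels[:k]) / k
    baseline = positives / len(ranked_labels)  # P / (P + N)
    return precision / baseline
```

For example, with 3 positives among 8 labeled genes the baseline is 0.375, so a top-2 list containing two positives scores 1.0/0.375, about 2.67-fold over random. A value of 1 means the ranking does no better than chance.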

Supplementary Figure 7 Specificity of our genome-wide ranking to ASD.

(a) Enrichment of ASD-gene ranking on genes associated with various neurological/brain diseases. To test the specificity of our genome-wide ranking to ASD, we tested the ranking of genes associated with five neurological diseases annotated in the OMIM database. We found no significant enrichment for unrelated diseases, indicating that our top-ranked genes are indeed specific for ASD and distinct from genes associated with other neuronal diseases. (b) Evaluation of ASD-gene ranking on genes linked to disorders closely related to ASD. We obtained genes associated with intellectual disability (ID; left; (Parikshak et al. (2013) Cell 155(5), 1008–21)), schizophrenia (middle; (Fromer et al. (2014) Nature 506(7487), 179–84)), and developmental disorders (DDD; right; (TDDD Study (2015) Nature 519(7542), 223–8)) identified in large sequencing studies, removed the genes in our ASD positive gold standards, and tested their distribution in our genome-wide ASD-gene ranking. We observed the expected significant overlap with ID and schizophrenia, while noting that our ranking prioritizes hundreds of additional genes not implicated in these disorders. The enrichment of genes in the DDD dataset is not surprising because the underlying cohort includes cases with ASD as well as several other disorders that have comorbidity with ASD (e.g. ID, heart development disorders).

Supplementary Figure 8 Top decile genes tend to be more constrained (have less common variation).

(a) Boxplot of the distribution of RVIS scores (Petrovski et al. (2013) PLoS Genetics 9(8), e1003709) as a function of our autism gene ranking. (b) Histogram of the fraction of constrained genes (out of 1,003) (Samocha et al. (2014) Nature Genetics 46(9), 944–50) along our autism gene ranking. Testing the two constrained sets against our predictions (see Methods) gave similar results both within the top decile (using Fisher’s exact test) and without any cutoff (using a rank-based permutation test): top-ranked genes tend to be more constrained than all other genes (RVIS, top-decile Wilcoxon test P < 2.2e-16, rank-based permutation test P = 1e-6; Constrained set, Fisher’s exact test P = 1.36e-86, rank-based permutation test P = 1e-6).

Supplementary Figure 9 Spatiotemporal analysis is not biased by windows with dramatic changes.

The spatiotemporal signature for each window was derived by controlling for both brain region and developmental stage. The permutation test used to identify association of each signature with ASD also controls for the number of genes in that signature. The plot shows the ASD-association of all signatures (indicated using the negative logarithm of the enrichment Q-value) as a function of the number of genes in each signature, demonstrating that there is no correlation between signature size and ASD-association.

Supplementary Figure 10 Statistical analysis of ASD-gene clustering in the brain network.

Results of a permutation test to evaluate the clustering of the ASD genes in a randomized brain network. Shared k-nearest-neighbor-based analysis was repeated 100 times, each time randomizing the k-nearest neighbors of each node in the network of top ASD genes. For a random set of nearly 30,000 gene pairs, the bulk of the clustering scores ranged between 0 and 0.3 based on random k neighbors, significantly lower than the cutoff of 0.9 used in our analysis of the real network. Our ASD functional modules are thus significantly and substantially more cohesive than random.
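The caption does not spell out the shared k-nearest-neighbor score, so the Python below is only a sketch of one common formulation: the Jaccard overlap between the k strongest neighbors of two genes, which lies between 0 and 1 like the scores reported above. All names are hypothetical, and the network is represented as a plain dictionary of edge weights.

```python
import random


def top_k_neighbors(weights, node, k):
    """Return the k neighbors of node with the highest edge weights."""
    neighbors = weights[node]
    return set(sorted(neighbors, key=lambda n: -neighbors[n])[:k])


def shared_knn_score(weights, a, b, k):
    """Jaccard overlap of the top-k neighbor sets of genes a and b."""
    na = top_k_neighbors(weights, a, k)
    nb = top_k_neighbors(weights, b, k)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def random_knn_score(nodes, a, b, k, rng):
    """Null version of the score: each gene's k neighbors are drawn at random."""
    na = set(rng.sample([n for n in nodes if n != a], k))
    nb = set(rng.sample([n for n in nodes if n != b], k))
    return len(na & nb) / len(na | nb)
```

Repeating `random_knn_score` over many gene pairs gives a null distribution against which a cutoff such as 0.9 on the real network can be judged.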

Supplementary Figure 11 Comparison of predicted ASD ranks of genes within autism-associated CNVs that have prior genetic and functional evidence.

Boxplots show the distribution of ASD ranks of genes within the 8 ASD-associated CNVs that have different types of prior evidence: genetic (strong; red; n = 13), functional (weak; blue; n = 8), or none (grey; n = 166). *: P ≤ 0.05; ***: P ≤ 0.001.

Supplementary Figure 12 Illustration of CNV diagram.

Top decile genes within ASD-associated CNVs (blue circles), and known ASD genes (red circles) are linked in the brain network via intermediate genes (green circles). Statistically significant intermediate genes are identified using a permutation test against random genomic intervals. The biological processes enriched among significant intermediate genes (green bubble) that are shared with multiple ASD-associated CNVs are shown in Figure 6.

Supplementary Figure 13 Illustration of network visualization in the ASD web server.

The interactive ASD web-interface enables biologists to explore their genes of interest (left) in the human brain-specific network (right). Users can easily explore the contribution of each of the different data types to any predicted brain-specific interaction (window at bottom right), including a summary of which data types contributed the most – for example, co-expression and GSEA miRNA targets in this case – as well as evidence weights for each individual dataset. Users can click on any dataset to be redirected to the underlying data.

Supplementary Figure 14 New predictions based on training with updated gold standards correspond closely to the predictions used in this study.

As an example of how we can regularly update the results in our web-server based on newly identified genes, we have made available a new set of predictions at http://asd.princeton.edu/v2 by training on an updated version of the SFARI gene database that includes all the results from the 2014 study (Iossifov et al. (2014) Nature 515(7526), 216–21). The scatter plot shows the original (v1) and new (v2) ranks of each gene in the genome. The dotted lines correspond to the top decile of the predicted genes. The new predictions are overall quite consistent with the original ones used throughout this study, with a correlation coefficient of 0.93 between the genome-wide rankings and 83.3% of top-decile genes consistent between v1 and v2. In addition, as our web server demonstrates, analysis results (including GO enrichments of predictions and neurodevelopmental analysis) also remain nearly identical.
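The top-decile consistency quoted above can be computed as in the following Python sketch. It assumes the figure is the fraction of v1 top-decile genes that also fall in the v2 top decile; the caption does not define it more precisely. The function names are hypothetical.

```python
def top_decile(ranking):
    """Return the top 10% of a best-first list of gene IDs as a set."""
    n = max(1, len(ranking) // 10)
    return set(ranking[:n])


def decile_consistency(ranking_v1, ranking_v2):
    """Fraction of v1 top-decile genes that are also in the v2 top decile."""
    first, second = top_decile(ranking_v1), top_decile(ranking_v2)
    return len(first & second) / len(first)
```

Because both rankings cover the same genes, the two top-decile sets have equal size and the measure is symmetric in v1 and v2.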

Supplementary Figure 15 A potential use case: estimating the relevance of a high-resolution spatiotemporal window in the brain to autism.

A biomedical researcher could use our predictions as a framework for analyzing their data from high- or low-throughput assays, allowing high-resolution study of autism genetics in the functional and physiological contexts of their interest. For example, a researcher who has generated gene expression or proteomics data from a new high-resolution spatiotemporal window in the brain (either human or in a model organism) can use our approach to assess the molecular relevance of that window to ASD. This approach – identifying a characteristic gene signature from the samples and estimating its enrichment in our genome-wide ASD-gene ranking (similar to results presented in Figure 3) – can provide results of increasing specificity with increasing spatiotemporal resolution in the available data. The researcher could then focus on the highly ranked genes in the expression signature, in combination with the specific functional contexts identified for these genes (‘ASD-associated functional modules’ as presented in Figure 4), to generate hypotheses and design experiments that further characterize the genes expressed in the window in relation to autism.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 (PDF 3820 kb)

Supplementary Methods Checklist (PDF 384 kb)

Supplementary Table 1: Training gold standard.

Our training gold standard consisted of known ASD-associated genes (with varying levels of evidence E1-4) as positives and non-mental-health-related genes as negatives. The positives are listed along with their evidence level and source database. (XLSX 76 kb)

Supplementary Table 2: Top 20 biological processes enriched in our SVM model for predicting ASD-genes.

We analyzed our ASD-gene prediction model to identify which biological processes and pathways contribute the most to associating a gene with ASD in the brain-specific network. The table contains the top 20 statistically enriched Gene Ontology biological processes among the genes that are most highly “weighted” by the model, i.e., associated with the highest feature weights in our SVM model. The most informative genes in our ASD network-based model are strongly enriched for neurological processes, providing insight into the general underlying processes that may be driving our predictions. (XLSX 9 kb)

Supplementary Table 3: Genome-wide prediction of ASD-associated genes.

The predicted ASD-association ranking of all genes in the genome is listed along with detailed information on their gold standard status, prediction score, prediction probability, prediction P and Q values, and membership in ASD-related gene sets. The file also contains the evaluation of the genome-wide ranking controlling for gene length and neuronal functional annotations, and literature support for select top-ranked genes not used in our positive training standard. (XLSX 2992 kb)

Supplementary Table 4: Targets of de novo mutations identified by exome sequencing of the Simons Simplex Collection.

Genes harboring de novo likely-gene-disrupting (LGD; also known as loss-of-function) or synonymous (SYN) mutations identified in autistic children (probands; prb) and unaffected siblings (sib) are listed separately. (XLSX 106 kb)

Supplementary Table 5: ASD-association of brain developmental gene-expression signatures.

All signatures that are significantly enriched among the top-ranked ASD genes are listed here along with the number of genes in each signature and their enrichment scores. (XLSX 45 kb)

Supplementary Table 6: ASD-associated functional modules in the brain-specific network.

The nine modules of top-ranked ASD genes, each tightly connected in the brain-specific network, are presented here with information about their module/cluster membership, connectivity within each cluster, and enriched biological processes in each cluster. (XLSX 222 kb)

Supplementary Table 7: Prioritization of genes within ASD-associated CNVs.

The table contains the complete ASD ranking of genes within each of eight autism-associated CNVs along with details on previous genetic or functional evidence for the connection of individual CNV-genes to ASD. (XLSX 59 kb)

Supplementary Table 8: Functional analysis of ASD-associated CNVs.

Results from the functional analysis of top-ranked genes in the eight ASD-associated CNVs are presented here, with details on the specific ‘intermediate’ genes and processes that connect the CNV genes to the molecular phenotype of autism. The table also contains literature support for select intermediate genes. (XLSX 54 kb)

Supplementary Table 9: Detailed functional, developmental, and CNV information for our top-decile genes.

The top 2,500 ASD candidate genes are listed along with their functional module memberships, spatiotemporal developmental gene-expression patterns, and CNV membership. (XLSX 317 kb)

About this article

Cite this article

Krishnan, A., Zhang, R., Yao, V. et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat Neurosci 19, 1454–1462 (2016). https://doi.org/10.1038/nn.4353

  • DOI: https://doi.org/10.1038/nn.4353
