Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder with a strong genetic basis. Yet, only a small fraction of potentially causal genes—about 65 genes out of an estimated several hundred—are known with strong genetic evidence from sequencing studies. We developed a complementary machine-learning approach based on a human brain-specific gene network to present a genome-wide prediction of autism risk genes, including hundreds of candidates for which there is minimal or no prior genetic evidence. Our approach was validated in a large independent case–control sequencing study. Leveraging these genome-wide predictions and the brain-specific network, we demonstrated that the large set of ASD genes converges on a smaller number of key pathways and developmental stages of the brain. Finally, we identified likely pathogenic genes within frequent autism-associated copy-number variants and proposed genes and pathways that are likely mediators of ASD across multiple copy-number variants. All predictions and functional insights are available at http://asd.princeton.edu.


We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the SSC principal investigators. We thank all members of the Troyanskaya lab for valuable discussions. We thank J. Spiro and other members of the Simons Foundation for constant feedback on the work and manuscript. This work was primarily supported by US National Institutes of Health (NIH) grants R01 GM071966 and R01 HG005998 to O.G.T. V.Y. was supported in part by US NIH grant T32 HG003284. This work was supported in part by US NIH grant P50 GM071508. O.G.T. is a senior fellow of the Genetic Networks program of the Canadian Institute for Advanced Research (CIFAR).

Author information

Author notes

    These authors contributed equally to this work: Arjun Krishnan & Ran Zhang.


  1. Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA.

    Arjun Krishnan, Chandra L Theesfeld, Alicja Tadych & Olga G Troyanskaya

  2. Department of Molecular Biology, Princeton University, Princeton, New Jersey, USA.

    Ran Zhang

  3. Department of Computer Science, Princeton University, Princeton, New Jersey, USA.

    Victoria Yao & Olga G Troyanskaya

  4. Simons Foundation, New York, New York, USA.

    Aaron K Wong, Natalia Volfovsky, Alan Packer & Alex Lash

  5. Flatiron Institute, Simons Foundation, New York, New York, USA.

    Olga G Troyanskaya



A.K., R.Z., A.L. and O.G.T. conceived and designed the research. A.K. and R.Z. performed computational analyses with contributions from A.L., V.Y., A.K.W. and C.L.T. N.V. provided data. A.T. developed the web interface with contributions from A.K.W., A.K. and R.Z. A.K., R.Z., A.L., A.P. and O.G.T. wrote the manuscript with inputs from V.Y. and C.L.T., and all authors contributed to revisions. A.K. and R.Z. are co-first authors and are listed alphabetically.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Alex Lash or Olga G Troyanskaya.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–15

  2. Supplementary Methods Checklist

Excel files

  1. Supplementary Table 1: Training gold standard.

    Our training gold standard consisted of known ASD-associated genes (with varying levels of evidence, E1–4) as positives and non-mental-health-related genes as negatives. The positives are listed along with their evidence level and source database.

  2. Supplementary Table 2: Top 20 biological processes enriched in our SVM model for predicting ASD genes.

    We analyzed our ASD-gene prediction model to identify which biological processes and pathways contribute the most to associating a gene with ASD in the brain-specific network. The table contains the top 20 statistically enriched Gene Ontology biological processes among genes that are most highly “weighted” by the model, i.e., associated with the highest feature weights in our SVM model. The most informative genes in our ASD network-based model are strongly enriched for neurological processes, providing insight into the general underlying processes that may be driving our predictions.

  3. Supplementary Table 3: Genome-wide prediction of ASD-associated genes.

    The predicted ASD-association ranking of all genes in the genome is listed along with detailed information on their gold standard status, prediction score, prediction probability, prediction P and Q values, and membership in ASD-related gene sets. The file also contains the evaluation of the genome-wide ranking controlling for gene length and neuronal functional annotations, and literature support for select top-ranked genes not used in our positive training standard.

  4. Supplementary Table 4: Targets of de novo mutations identified by exome sequencing of the Simons Simplex Collection.

    Genes harboring de novo likely-gene-disrupting (LGD; also known as loss-of-function) or synonymous (SYN) mutations identified in autistic children (probands; prb) and unaffected siblings (sib) are listed separately.

  5. Supplementary Table 5: ASD-association of brain developmental gene-expression signatures.

    All signatures that are significantly enriched among the top-ranked ASD genes are listed here along with the number of genes in each signature and their enrichment scores.

  6. Supplementary Table 6: ASD-associated functional modules in the brain-specific network.

    The nine modules of top-ranked ASD genes, each tightly connected in the brain-specific network, are presented here with information about module/cluster membership, connectivity within each cluster, and enriched biological processes in each cluster.

  7. Supplementary Table 7: Prioritization of genes within ASD-associated CNVs.

    The table contains the complete ASD ranking of genes within each of eight autism-associated CNVs along with details on previous genetic or functional evidence for the connection of individual CNV genes to ASD.

  8. Supplementary Table 8: Functional analysis of ASD-associated CNVs.

    Results from the functional analysis of top-ranked genes in the eight ASD-associated CNVs are presented here, with details on the specific ‘intermediate’ genes and processes that connect the CNV genes to the molecular phenotype of autism. The table also contains literature support for select intermediate genes.

  9. Supplementary Table 9: Detailed functional, developmental, and CNV information for our top-decile genes.

    The top 2,500 ASD candidate genes are listed along with their functional module memberships, spatiotemporal developmental gene-expression patterns, and CNV membership.
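
Supplementary Table 3 reports both P and Q values for each gene's prediction. The sketch below illustrates one common way to turn a list of P values into Q values, assuming the Q values are Benjamini-Hochberg adjusted P values; that is an assumption, since the table description does not name the procedure. The gene labels and P values are hypothetical placeholders, not data from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (Q values), in input order."""
    m = len(p_values)
    # Indices of the P values, sorted from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down to the smallest so that each Q value
    # is the minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical P values for four placeholder genes (not real results).
example_p = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.03, "GENE_D": 0.5}
genes = list(example_p)
example_q = benjamini_hochberg([example_p[g] for g in genes])

for gene, q_value in zip(genes, example_q):
    print(gene, round(q_value, 4))
```

Genes whose Q value falls below a chosen threshold (for example 0.05) would be called significant at that false discovery rate. In this toy input, GENE_B and GENE_C end up with the same Q value because the adjustment enforces monotonicity across ranks.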
