Autism is a highly heritable complex disorder in which de novo mutation (DNM) variation contributes significantly to risk. Using whole-genome sequencing data from 3,474 families, we investigate another source of large-effect risk variation, ultra-rare variants. We report and replicate a transmission disequilibrium of private, likely gene-disruptive (LGD) variants in probands but find that 95% of this burden resides outside of known DNM-enriched genes. This variant class more strongly affects multiplex family probands and supports a multi-hit model for autism. Candidate genes with private LGD variants preferentially transmitted to probands converge on the E3 ubiquitin–protein ligase complex, intracellular transport and Erb signaling protein networks. We estimate that these variants are approximately 2.5 generations old and significantly younger than other variants of similar type and frequency in siblings. Overall, private LGD variants are under strong purifying selection and appear to act on a distinct set of genes not yet associated with autism.
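As a rough illustration of what a transmission disequilibrium of private variants means, the sketch below tests whether variants carried by a single heterozygous parent are passed to a child more often than the 50% expected by chance. It is a minimal, generic transmission disequilibrium test and is not the authors' pipeline; the function name `tdt` and any counts passed to it are hypothetical and are not taken from the study.

```python
from math import comb


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Generic transmission disequilibrium test (illustrative sketch only).

    transmitted / untransmitted: hypothetical counts of parental private
    variants that were or were not passed on to the child.
    Returns (chi-square statistic with 1 d.f., exact two-sided binomial p-value).
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative transmissions")

    # McNemar-style statistic: (T - U)^2 / (T + U).
    chi2 = (transmitted - untransmitted) ** 2 / n

    # Exact two-sided p-value under the null that each variant is
    # transmitted with probability 0.5 (the null is symmetric, so the
    # two-sided value is twice the upper tail, capped at 1).
    k = max(transmitted, untransmitted)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_value = min(1.0, 2 * upper_tail)
    return chi2, p_value
```

Calling `tdt` separately on counts from probands and from unaffected siblings would show whether any excess transmission is specific to probands, which is the kind of comparison the abstract describes.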
The WGS data used in this study are available from the following resources. Data from the AGRE study are available at the Database of Genotypes and Phenotypes (dbGaP) under accession no. phs001766. Access to the AGRE WGS data is subject to approval by Autism Speaks and AGRE. All sequencing and phenotype data for the SSC are available through SFARI to approved researchers at SFARI Base (accession nos. SFARI_SSC_WGS_p, SFARI_SSC_WGS_1 and SFARI_SSC_WGS_2). The genomic and phenotypic data for the SPARK study are available by request from SFARI Base (accession no. SFARI_SPARK_WES_1). Data from the SAGE study are available at dbGaP under accession no. phs001740.v1.p1. Data from the TASC study are available at dbGaP under accession no. phs001741. Family-level FreeBayes and GATK VCF files for SAGE, SSC and TASC are available under dbGaP accession no. phs001874.v1.p1 and at SFARI Base under accession no. SFARI_SSC_WGS_2a.
All software used in this study is publicly available. The code for the ultra-rare transmitted variant pipeline can be found at https://github.com/EichlerLab/ultra_rare_transmitted.git. The code for the figures and analyses is available upon request.
We thank T. Brown for assistance in editing this manuscript and S. Stray, M. Eng, J. Moore, H. Kortbawi and A. Thornton from the laboratory of Mary-Claire King for the isolation of DNA from whole blood. We thank T. Maniatis and the New York Genome Center for conducting the sequencing and initial quality control. We thank all the families at the participating SSC sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). We thank all the families in SPARK, the SPARK clinical sites and SPARK staff. We appreciate obtaining access to the phenotypic and genetic data on SFARI Base. Approved researchers can obtain the SSC population dataset described in this study (https://www.sfari.org/resource/simons-simplex-collection/) and the SPARK population dataset described in this study (https://www.sfari.org/resource/spark/) by applying at https://base.sfari.org. We gratefully acknowledge the resources provided by the AGRE Consortium and the participating AGRE families. Genomic data for the AGRE cohort was provided by iHART, an initiative led by the Hartwell Foundation and directed by D. Wall and D. Geschwind. This work was supported, in part, by grants from the National Institutes of Health (no. R01 MH101221 to E.E.E.; no. R01 MH100047 to R.A.B.; no. K99 MH117165 to T.N.T.; no. K99 HG011041 to P.H.; and no. UM1 HG008901 to M.C.Z.) and the Simons Foundation (no. SFARI 608045 to E.E.E.). The CCDG is funded by the National Human Genome Research Institute and the National Heart, Lung, and Blood Institute. The Genome Sequencing Program Coordinating Center (no. U24 HG008956) contributed to cross-program scientific initiatives and provided logistical and general study coordination. 
AGRE is a program of Autism Speaks and is supported in part by grant no. 1U24MH081810 from the National Institute of Mental Health to C. M. Lajonchere. E.E.E. is an investigator of the Howard Hughes Medical Institute.
The authors declare no competing interests.
Peer review information Nature Genetics thanks Anders Børglum, Thomas Bourgeron and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Wilfert, A.B., Turner, T.N., Murali, S.C. et al. Recent ultra-rare inherited variants implicate new autism candidate risk genes. Nat Genet 53, 1125–1134 (2021). https://doi.org/10.1038/s41588-021-00899-8