Spontaneously arising (de novo) mutations have an important role in medical genetics. For diseases with extensive locus heterogeneity, such as autism spectrum disorders (ASDs), the signal from de novo mutations is distributed across many genes, making it difficult to distinguish disease-relevant mutations from background variation. Here we provide a statistical framework for the analysis of excesses in de novo mutation per gene and gene set by calibrating a model of de novo mutation. We applied this framework to de novo mutations collected from 1,078 ASD family trios, and, whereas we affirmed a significant role for loss-of-function mutations, we found no excess of de novo loss-of-function mutations in cases with IQ above 100, suggesting that the role of de novo mutations in ASDs might reside in fundamental neurodevelopmental processes. We also used our model to identify ∼1,000 genes that are significantly lacking in functional coding variation in non-ASD samples and are enriched for de novo loss-of-function mutations identified in ASD cases.
All data from published studies are available in the respective publications. All newly generated data and computational tools used in this paper will be available online as downloadable material. We have also constructed a website to query genes that provides information on constraint and the de novo mutations found in the specified gene across published studies of de novo mutation.
We would like to thank E. Daly and M. Chess for their contributions to data analysis and the construction of the website, respectively. We acknowledge the following resources and families who contributed to them: the National Institute of Mental Health (NIMH) repository (U24MH068457); the Autism Genetic Resource Exchange (AGRE) Consortium, a program of Autism Speaks (1U24MH081810 to C.M. Lajonchere); The Autism Simplex Collection (TASC) (grant from Autism Speaks); the Simons Foundation Autism Research Initiative (SFARI) Simplex Collection (grant from the Simons Foundation); and The Autism Consortium (grant from the Autism Consortium). This work was directly supported by US National Institutes of Health (NIH) grants R01MH089208 (M.J.D.), R01MH089025 (J.D.B.), R01MH089004 (G.D.S.), R01MH089175 (R.A.G.) and R01MH089482 (J.S.S.) and was supported in part by US NIH grants P50HD055751 (E.H.C.), R01MH057881 (B.D.) and R01MH061009 (J.S.S.). We acknowledge partial support from grants U54HG003273 (R.A.G.) and U54HG003067 (E. Lander). We thank T. Lehner (NIMH), A. Felsenfeld (National Human Genome Research Institute) and P. Bender (NIMH) for their support and contribution to the project. E.B., J.D.B., B.D., M.J.D., R.A.G., K. Roeder, A.S., G.D.S. and J.S.S. are lead investigators in the ARRA Autism Sequencing Collaboration (AASC).
We would also like to thank the NHLBI GO Exome Sequencing Project (ESP) and its ongoing studies that produced and provided exome variant calls on the web: the Lung GO Sequencing Project (HL-102923), the Women's Health Initiative (WHI) Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926) and the Heart GO Sequencing Project (HL-103010).
Integrated supplementary information
The per-gene probabilities of mutation are listed for each gene (transcript specified) by mutation type. Probabilities are given per chromosome and are log10-transformed. “NA” is listed when no probability of mutation is available, usually because of low coverage.
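As a rough illustration of how such a table might be used, the sketch below converts a log10 per-chromosome probability into an expected number of de novo mutations for a cohort and computes an upper-tail probability for an observed count. The function names are hypothetical, and the Poisson model for the count is an assumption of this sketch, not something stated in the table description. The factor of 2 reflects that each child carries two chromosomes and the probabilities are given per chromosome.

```python
import math

def parse_log10(field):
    """Parse one table field; 'NA' (no probability available) becomes None."""
    return None if field.strip() == "NA" else float(field)

def expected_de_novo(log10_prob, n_trios):
    """Expected de novo count in a cohort of n_trios children.

    log10_prob is the log10 per-chromosome probability from the table;
    each child contributes two chromosomes.
    """
    if log10_prob is None:
        return None
    return 2 * n_trios * 10 ** log10_prob

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected) (assumed count model)."""
    cdf = 0.0
    term = math.exp(-expected)      # P(X = 0)
    for k in range(observed):
        cdf += term                 # accumulate P(X = k) for k < observed
        term *= expected / (k + 1)  # step to P(X = k + 1)
    return max(0.0, 1.0 - cdf)

# Hypothetical gene with log10 probability -5.0, in a cohort of 1,078 trios.
lam = expected_de_novo(parse_log10("-5.0"), 1078)
p_two_or_more = poisson_upper_tail(2, lam)
```

Genes listed as “NA” return `None` and would need to be excluded before any per-gene test.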
The gene-specific information listed includes transcript and identifier, chromosome, transcription start position, number of coding bases, probabilities of a synonymous and missense mutation (given per chromosome), the number of observed and expected synonymous and missense variants, the signed Z scores for the deviation of observed from expected counts for both synonymous and missense variants, and the ratio of missing missense variation (“ratio_missing”).
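One plausible way to compute a signed Z score from the observed and expected counts is sketched below: the square root of the chi-squared deviation, given a sign. The function name is hypothetical, and the sign convention (positive when fewer variants are observed than expected) is an assumption of this sketch; the table itself may define it differently.

```python
import math

def signed_z(observed, expected):
    """Signed square root of the chi-squared deviation (obs - exp)^2 / exp.

    Assumed convention: positive when fewer variants are observed
    than expected.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)

# Hypothetical gene: 25 missense variants expected, 16 observed.
z_missense = signed_z(16, 25)
```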