The genomes of individuals with severe, undiagnosed developmental disorders are enriched in damaging de novo mutations (DNMs) in developmentally important genes. Here we have sequenced the exomes of 4,293 families containing individuals with developmental disorders, and meta-analysed these data with data from another 3,287 individuals with similar disorders. We show that the most important factors influencing the diagnostic yield of DNMs are the sex of the affected individual, the relatedness of their parents, whether close relatives are affected and parental age. We identified 94 genes enriched in damaging DNMs, including 14 that previously lacked compelling evidence of involvement in developmental disorders. We have also characterized the phenotypic diversity among these disorders. We estimate that 42% of our cohort carry pathogenic DNMs in coding sequences; approximately half of these DNMs disrupt gene function and the remainder result in altered protein function. We estimate that developmental disorders caused by DNMs have an average prevalence of 1 in 213 to 1 in 448 births, depending on parental age. Given current global demographics, this equates to almost 400,000 affected children born per year.
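The rates quoted in the abstract can be restated with simple arithmetic. This sketch converts the "1 in N births" prevalence bounds to percentages and splits the 42% pathogenic fraction into its two mechanistic classes, assuming for illustration that "approximately half" means exactly half. It does not reproduce the global estimate of almost 400,000 births per year, because the demographic inputs behind that figure are not given here.

```python
def one_in_n_to_percent(n: int) -> float:
    """Convert a prevalence stated as '1 in n births' to a percentage."""
    return 100.0 / n

# Prevalence bounds quoted in the abstract; the range reflects parental age.
high_prevalence = one_in_n_to_percent(213)
low_prevalence = one_in_n_to_percent(448)
print(f"prevalence: {low_prevalence:.2f}% to {high_prevalence:.2f}% of births")
# prints "prevalence: 0.22% to 0.47% of births"

# Fraction of the cohort estimated to carry pathogenic coding DNMs.
pathogenic_fraction = 0.42
# Assumption: "approximately half" is treated as exactly 50% for illustration.
loss_of_function_fraction = pathogenic_fraction * 0.5
altered_function_fraction = pathogenic_fraction - loss_of_function_fraction
print(f"loss of function: ~{loss_of_function_fraction:.0%}, "
      f"altered function: ~{altered_function_fraction:.0%}")
# prints "loss of function: ~21%, altered function: ~21%"
```

In other words, under this simplification roughly one in five individuals in the cohort would carry a pathogenic DNM that disrupts gene function, and a similar proportion one that alters protein function.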
We thank the families for their participation and patience. We are grateful to the Exome Aggregation Consortium for making their data available. The DDD study presents independent research commissioned by the Health Innovation Challenge Fund (grant HICF-1009-003), a parallel funding partnership between the Wellcome Trust and the UK Department of Health, and the Wellcome Trust Sanger Institute (grant WT098051). The views expressed in this publication are those of the author(s) and not necessarily those of the Wellcome Trust or the UK Department of Health. The study has UK Research Ethics Committee approval (10/H0305/83, granted by the Cambridge South Research Ethics Committee, and GEN/284/12, granted by the Republic of Ireland Research Ethics Committee). The research team acknowledges the support of the National Institute for Health Research, through the Comprehensive Clinical Research Network. We thank the Sanger Human Genome Informatics team, the Sample Management team, the Illumina High-Throughput team, the New Pipeline Group team, the DNA pipelines team and the Core Sequencing team for their support in generating and processing the data. D.R.F. is funded through an MRC Human Genetics Unit program grant to the University of Edinburgh. Finally, we acknowledge the contribution of two esteemed DDD clinical collaborators, J. Tolmie and L. Brueton, who died during the course of the study.
Extended data figures
Extended data tables
This file contains Supplementary Tables 1-4:
(1) De novo mutations (DNMs) in the 4,293 DDD individuals. The table includes sex, chromosome, position, reference and alternate alleles, HGNC symbol, VEP consequence, posterior probability of DNM and, where available, validation status. Individual IDs are available on request. The list excludes sites that failed validation but includes sites that passed validation (confirmed), sites with uncertain validation results (uncertain) and sites that were not tested by secondary validation (NA). Genome positions are given as GRCh37 coordinates.
(2) Details of the cohorts used in the meta-analyses, including numbers of individuals by sex and publication details.
(3) Genes with genome-wide significant statistical evidence of being developmental disorder genes. The numbers of unrelated individuals with independent DNMs are given separately for protein-truncating variants (PTVs) and missense variants; the number of additional individuals from other cohorts, if any, is given in brackets. The reported P-value is the smaller of the P-values from testing the DDD dataset alone and from testing the meta-analysis dataset, and the dataset that provided it is also listed. Mutations are considered clustered if the P-value for proximity clustering of DNMs is less than 0.01.
(4) Comparison of known haploinsufficient (HI) neurodevelopmental genes with HI and non-HI enrichment models. Genes are ranked by the difference in Akaike's Information Criterion (AIC) between two models: one in which the gene matches the PTV enrichment expected for non-HI genes (model 1), and one in which it matches the PTV enrichment expected for HI genes (model 2).
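The ranking described for Supplementary Table 4 can be illustrated with a small sketch. The table description does not state the likelihoods used, so this assumes each model treats a gene's observed PTV count as Poisson-distributed, with mean equal to a baseline expected count multiplied by a fixed, model-specific enrichment factor. The enrichment factors, gene names and counts below are hypothetical placeholders, not values from the study. Because neither model fits per-gene parameters here, the AIC penalty term cancels and the difference reduces to twice the log-likelihood ratio.

```python
import math
from dataclasses import dataclass


@dataclass
class Gene:
    symbol: str
    observed_ptv: int    # observed de novo PTVs in the cohort
    expected_ptv: float  # expected count under a baseline mutation rate (> 0)


def poisson_log_likelihood(k: int, mu: float) -> float:
    """Log of the Poisson probability of observing k events with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical enrichment factors, NOT values from the study.
NON_HI_ENRICHMENT = 2.0   # model 1
HI_ENRICHMENT = 20.0      # model 2


def delta_aic(gene: Gene, n_params: int = 0) -> float:
    """AIC(model 1, non-HI) minus AIC(model 2, HI).

    With equal parameter counts the penalty cancels; a positive value
    means the HI model fits the observed PTV count better.
    """
    mu_non_hi = gene.expected_ptv * NON_HI_ENRICHMENT
    mu_hi = gene.expected_ptv * HI_ENRICHMENT
    ll_non_hi = poisson_log_likelihood(gene.observed_ptv, mu_non_hi)
    ll_hi = poisson_log_likelihood(gene.observed_ptv, mu_hi)
    return aic(ll_non_hi, n_params) - aic(ll_hi, n_params)


# Hypothetical genes for illustration only.
genes = [
    Gene("GENE_A", observed_ptv=8, expected_ptv=0.4),
    Gene("GENE_B", observed_ptv=1, expected_ptv=0.4),
]

# Rank genes from most HI-like to least HI-like.
for g in sorted(genes, key=delta_aic, reverse=True):
    print(g.symbol, round(delta_aic(g), 2))
```

In this sketch GENE_A, with many more PTVs than its baseline expectation, gets a positive difference and ranks first, while GENE_B is better explained by the non-HI model. Real analyses would substitute the study's own enrichment estimates and expected counts.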