The genomes of individuals with severe, undiagnosed developmental disorders are enriched in damaging de novo mutations (DNMs) in developmentally important genes. Here we have sequenced the exomes of 4,293 families containing individuals with developmental disorders, and meta-analysed these data with data from another 3,287 individuals with similar disorders. We show that the most important factors influencing the diagnostic yield of DNMs are the sex of the affected individual, the relatedness of their parents, whether close relatives are affected and the parental ages. We identified 94 genes enriched in damaging DNMs, including 14 that previously lacked compelling evidence of involvement in developmental disorders. We have also characterized the phenotypic diversity among these disorders. We estimate that 42% of our cohort carry pathogenic DNMs in coding sequences; approximately half of these DNMs disrupt gene function and the remainder result in altered protein function. We estimate that developmental disorders caused by DNMs have an average prevalence of 1 in 213 to 1 in 448 births, depending on parental age. Given current global demographics, this equates to almost 400,000 children born per year.
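The prevalence range quoted above converts to expected case counts by simple division. The following minimal sketch assumes a hypothetical population of 1,000,000 births. That figure is purely illustrative and is not the global birth count the study used to reach its estimate of almost 400,000 children per year.

```python
def expected_cases(births: float, one_in: float) -> float:
    """Expected number of affected births for a prevalence of 1 in `one_in`."""
    return births / one_in

# Hypothetical number of births, chosen only for illustration.
BIRTHS = 1_000_000

# Prevalence bounds quoted in the abstract (1 in 213 to 1 in 448 births).
for one_in in (213, 448):
    print(f"1 in {one_in}: ~{expected_cases(BIRTHS, one_in):,.0f} affected births")
```

The same division scales to any birth count. The spread between the two bounds reflects the dependence on parental age stated in the abstract.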
References
Sheridan, E. et al. Risk factors for congenital anomaly in a multiethnic birth cohort: an analysis of the Born in Bradford study. Lancet 382, 1350–1359 (2013)
Ropers, H. H. Genetics of early onset cognitive impairment. Annu. Rev. Genomics Hum. Genet. 11, 161–187 (2010)
de Ligt, J. et al. Diagnostic exome sequencing in persons with severe intellectual disability. N. Engl. J. Med. 367, 1921–1929 (2012)
De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014)
Epi4K Consortium & Epilepsy Phenome/Genome Project. De novo mutations in epileptic encephalopathies. Nature 501, 217–221 (2013)
EuroEPINOMICS-RES Consortium, Epilepsy Phenome/Genome Project & Epi4K Consortium. De novo mutations in synaptic transmission genes including DNM1 cause epileptic encephalopathies. Am. J. Hum. Genet. 95, 360–370 (2014)
Fromer, M. et al. De novo mutations in schizophrenia implicate synaptic networks. Nature 506, 179–184 (2014)
Gilissen, C. et al. Genome sequencing identifies major causes of severe intellectual disability. Nature 511, 344–347 (2014)
Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014)
Iossifov, I. et al. De novo gene disruptions in children on the autistic spectrum. Neuron 74, 285–299 (2012)
O’Roak, B. J. et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 246–250 (2012)
Rauch, A. et al. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 380, 1674–1682 (2012)
Sanders, S. J. et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237–241 (2012)
Zaidi, S. et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 498, 220–223 (2013)
Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223–228 (2015)
de Ligt, J., Veltman, J. A. & Vissers, L. E. L. M. Point mutations as a source of de novo genetic disease. Curr. Opin. Genet. Dev. 23, 257–263 (2013)
Wilkie, A. O. The molecular basis of genetic dominance. J. Med. Genet. 31, 89–98 (1994)
Wright, C. F. et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet 385, 1305–1314 (2014)
Jacquemont, S. et al. A higher mutational burden in females supports a “female protective model” in neurodevelopmental disorders. Am. J. Hum. Genet. 94, 415–425 (2014)
Kong, A. et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475 (2012)
Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2016)
Wong, W. S. et al. New observations on maternal age effect on germline de novo mutations. Nat. Commun. 7, 10486 (2016)
Samocha, K. E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014)
Ferry, Q. et al. Diagnostically relevant facial gestalt information from ordinary photos. eLife 3, e02020 (2014)
Hirata, H. et al. ZC4H2 mutations are associated with arthrogryposis multiplex congenita and intellectual disability through impairment of central and peripheral synaptic plasticity. Am. J. Hum. Genet. 92, 681–695 (2013)
Homan, C. C. et al. Mutations in USP9X are associated with X-linked intellectual disability and disrupt neuronal cell migration and growth. Am. J. Hum. Genet. 94, 470–478 (2014)
Liu, J. et al. SMC1A expression and mechanism of pathogenicity in probands with X-linked Cornelia de Lange syndrome. Hum. Mutat. 30, 1535–1542 (2009)
Akawi, N. et al. Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families. Nat. Genet. 47, 1363–1369 (2015)
Meynert, A. M., Ansari, M., FitzPatrick, D. R. & Taylor, M. S. Variant detection sensitivity and biases in whole genome and exome sequencing. BMC Bioinformatics 15, 247 (2014)
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016)
Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013)
Boycott, K. M., Vanstone, M. R., Bulman, D. E. & MacKenzie, A. E. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat. Rev. Genet. 14, 681–691 (2013)
Springett, A. et al. Congenital Anomaly Statistics 2011: England and Wales. (2013)
Sifrim, A. et al. Distinct genetic architectures for syndromic and nonsyndromic congenital heart defects identified by exome sequencing. Nat. Genet. 48, 1060–1065 (2016)
Okur, V. et al. De novo mutations in CSNK2A1 are associated with neurodevelopmental abnormalities and dysmorphic features. Hum. Genet. 135, 699–705 (2016)
El Chehadeh, S. et al. Dominant variants in the splicing factor PUF60 cause a recognizable syndrome with intellectual disability, heart defects and short stature. Eur. J. Hum. Genet. 25, 43–51 (2016)
Lelieveld, S. H. et al. Meta-analysis of 2,104 trios provides support for 10 new genes for intellectual disability. Nat. Neurosci. 19, 1194–1196 (2016)
Cohen, J. S. et al. Further evidence that de novo missense and truncating variants in ZBTB18 cause intellectual disability with variable features. Clin. Genet. http://dx.doi.org/10.1111/cge.12861 (2016)
Bragin, E. et al. DECIPHER: database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation. Nucleic Acids Res. 42, D993–D1000 (2014)
Köhler, S. et al. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am. J. Hum. Genet. 85, 457–464 (2009)
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010)
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009)
Ramu, A. et al. DeNovoGear: de novo indel and point mutation discovery and phasing. Nat. Methods 10, 985–987 (2013)
Abecasis, G. R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070 (2010)
Felzenszwalb, P. F., Girshick, R. B., McAllester, D. & Ramanan, D. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1627–1645 (2010)
Xiong, X. & De la Torre, F. Supervised Descent method and its applications to face alignment. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 532–539 (Portland, 2013)
Cooper, G. M. et al. A copy number variation morbidity map of developmental delay. Nat. Genet. 43, 838–846 (2011)
Sagoo, G. S. et al. Array CGH in patients with learning disability (mental retardation) and congenital anomalies: updated systematic review and meta-analysis of 19 studies and 13,926 subjects. Genet. Med. 11, 139–146 (2009)
Central Intelligence Agency. The World Factbook. Vol. 2016 (2016)
The World Bank. Fertility rate, total (births per woman). In World Development Indicators (2016)
Copen, C. E., Thoma, M. E. & Kirmeyer, S. Interpregnancy Intervals in the United States: Data From the Birth Certificate and the National Survey of Family Growth. In National Vital Statistics Reports Vol. 64 (National Center for Health Statistics, 2015)
We thank the families for their participation and patience. We are grateful to the Exome Aggregation Consortium for making their data available. The DDD study presents independent research commissioned by the Health Innovation Challenge Fund (grant HICF-1009-003), a parallel funding partnership between the Wellcome Trust and the UK Department of Health, and the Wellcome Trust Sanger Institute (grant WT098051). The views expressed in this publication are those of the author(s) and not necessarily those of the Wellcome Trust or the UK Department of Health. The study has UK Research Ethics Committee approval (10/H0305/83, granted by the Cambridge South Research Ethics Committee, and GEN/284/12, granted by the Republic of Ireland Research Ethics Committee). The research team acknowledges the support of the National Institute for Health Research, through the Comprehensive Clinical Research Network. We thank the Sanger Human Genome Informatics team, the Sample Management team, the Illumina High-Throughput team, the New Pipeline Group team, the DNA pipelines team and the Core Sequencing team for their support in generating and processing the data. D.R.F. is funded through an MRC Human Genetics Unit program grant to the University of Edinburgh. Finally, we acknowledge the contribution of two esteemed DDD clinical collaborators, J. Tolmie and L. Brueton, who died during the course of the study.
M.E.H. is a co-founder of, consultant to, and holds shares in, Congenica Ltd, a genetics diagnostic company.
Reviewer Information Nature thanks D. Goldstein, B. Neale and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
Only individuals with protein-altering or protein-truncating DNMs in dominant or X-linked dominant DD-associated genes, or males with DNMs in hemizygous DD-associated genes, were included. The proportions given are for individuals with any DNMs rather than for the total number of individuals in each subset. Cohorts included in the DNM meta-analyses are shaded blue.
Phenotypes are grouped by type. The first group gives the numbers of individuals with DNMs per gene, divided by sex (m, male; f, female) and by functional consequence (NSV, nonsynonymous variant; PTV, protein-truncating variant). The second group gives mean values for growth parameters: birthweight (bw), height (ht), weight (wt) and occipitofrontal circumference (OFC). Values are given as standard deviations from the healthy population mean derived from ALSPAC (Avon Longitudinal Study of Parents and Children) data. The third group gives the mean age at which developmental milestones were achieved: first social smile, first sitting unassisted, first walking unassisted and first speaking. Values are given in months. The final group summarizes Human Phenotype Ontology (HPO)-coded phenotypes per gene as the number of HPO terms within different clinical categories.
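Growth parameters are reported as standard deviations from a reference population mean (that is, as z-scores) and then averaged per gene. The minimal sketch below reproduces that summary step. Its reference mean and SD are placeholders rather than ALSPAC values, and the gene names and measurements are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def z_score(value: float, ref_mean: float, ref_sd: float) -> float:
    """Number of standard deviations `value` lies from the reference mean."""
    return (value - ref_mean) / ref_sd

def mean_z_by_gene(records, ref_mean: float, ref_sd: float) -> dict:
    """Average z-score per gene from (gene, measurement) pairs."""
    by_gene = defaultdict(list)
    for gene, value in records:
        by_gene[gene].append(z_score(value, ref_mean, ref_sd))
    return {gene: mean(zs) for gene, zs in by_gene.items()}

# Hypothetical OFC measurements (cm); real analyses would use age- and
# sex-specific reference values rather than a single mean and SD.
records = [("GENE_A", 46.0), ("GENE_A", 47.0), ("GENE_B", 52.0)]
print(mean_z_by_gene(records, ref_mean=50.0, ref_sd=2.0))
```

A single reference mean and SD keeps the example short. In practice, z-scores for growth are computed against reference values matched for age and sex.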
Extended Data Figure 3 Phenotypic summary of individuals with DNMs in genes achieving genome-wide significance.
Phenotypes are grouped by type. The first group gives the numbers of individuals with DNMs per gene, divided by sex (m, male; f, female) and by functional consequence (NSV, nonsynonymous variant; PTV, protein-truncating variant). The second group gives mean values for growth parameters: birthweight (bw), height (ht), weight (wt) and occipitofrontal circumference (OFC). Values are given as standard deviations from the healthy population mean derived from ALSPAC data. The third group gives the mean age at which developmental milestones were achieved: first social smile, first sitting unassisted, first walking unassisted and first speaking. Values are given in months. The final group summarizes HPO-coded phenotypes per gene as the number of HPO terms within different clinical categories.
a, CDK13. b, CHD4. c, CNOT3. d, CSNK2A1. e, GNAI1. f, KCNQ3. g, MSL3. h, PPM1D. i, PUF60. j, QRICH1. k, SET. l, KMT5B. m, TCF20. n, ZBTB18.
Extended Data Figure 5 Effect of clustering by phenotype on the ability to identify genome-wide significant genes.
a, Comparison of P values derived from genotypic information alone versus P values that incorporate genotypic information and phenotypic similarity. b, Comparison of P values from tests in the complete DDD cohort versus tests in the subset with seizures. Genes that were previously linked to seizures are shaded blue. c, Proportion of cohort with a DNM in a seizure-associated gene, stratified by seizure-affected status. Error bars, 95% CI. d, Comparison of power to identify genome-wide significant genes in probands with seizures, versus the unstratified cohort, at matched sample sizes.
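Panel c reports a proportion with 95% confidence intervals. The caption does not say how these intervals were computed. The sketch below uses the Wilson score interval as one standard choice for a binomial proportion; it is not necessarily the method behind the figure. The counts are hypothetical.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical counts of probands with a DNM in a seizure-associated gene,
# stratified by seizure-affected status.
for label, k, n in [("seizures", 60, 1000), ("no seizures", 20, 1000)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k / n:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

The Wilson interval is preferred here over the simple normal approximation because it stays within [0, 1] and behaves sensibly for the small proportions typical of per-gene counts.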
Extended Data Figure 6 Power of genome versus exome sequencing to discover dominant genes associated with DDs.
a, The number of genes exceeding genome-wide significance was estimated at three fixed budgets (US$1, 2 or 3 million) across a range of relative sensitivities of genomes versus exomes for detecting DNMs. Numbers of genes identifiable by exome sequencing are shaded blue, and numbers identifiable by genome sequencing are shaded green. Regions in which exome sequencing costs 30–40% as much as genome sequencing, corresponding to the price differential in 2016, have a grey background. b, Simulated estimates of power to detect loss-of-function genes in the genome at different cohort sizes, given fixed budgets.
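The trade-off in panel a is that a fixed budget buys fewer genomes than exomes, while each genome may detect DNMs with different sensitivity. The sketch below models only this budget arithmetic. It rests on a simplifying assumption that is not the study's simulation: discovery power scales with the number of trios multiplied by the relative sensitivity, with exome sensitivity fixed at 1. The per-trio exome cost is hypothetical.

```python
def trios_within_budget(budget: float, cost_per_trio: float) -> float:
    """Number of trios that can be sequenced for a given budget."""
    return budget / cost_per_trio

def effective_trios(budget: float, exome_cost: float,
                    exome_to_genome_cost_ratio: float,
                    genome_rel_sensitivity: float):
    """Sensitivity-weighted trio counts for exome and genome designs.

    Simplifying assumption: power scales with trios x sensitivity, with the
    exome sensitivity taken as 1.0.
    """
    genome_cost = exome_cost / exome_to_genome_cost_ratio
    exome = trios_within_budget(budget, exome_cost) * 1.0
    genome = trios_within_budget(budget, genome_cost) * genome_rel_sensitivity
    return exome, genome

# Hypothetical: US$1 million budget, US$1,000 per exome trio, exomes costing
# 35% as much as genomes (inside the 30-40% band highlighted in the figure).
for rel_sens in (0.9, 1.0, 1.2):
    e, g = effective_trios(1_000_000, 1_000.0, 0.35, rel_sens)
    print(f"relative sensitivity {rel_sens}: exome {e:.0f}, genome {g:.0f}")
```

Under this simple model, genomes would need a relative sensitivity of about 1/0.35 (roughly 2.9) to match exomes at the same budget. Translating effective sample sizes into numbers of significant genes requires the simulations described in the caption.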
Extended Data Figure 7 Gene-wise significance of neurodevelopmental genes versus the expected number of mutations per gene.
Points are shaded by clinical recognizability classification (blue and brown points denote cryptic and distinctive disorders, respectively). Genes have been separated into two plots. Left, genes for cryptic disorders with low, mild or moderate clinical recognizability. Right, genes for distinctive disorders with high clinical recognizability.
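Gene-wise significance depends on how many DNMs are observed relative to the number expected from the gene's mutation rate and the cohort size. One common way to turn this comparison into a P value is a one-sided Poisson test, in the spirit of the mutation-rate framework of Samocha et al. cited above. The sketch below assumes that model and is not the study's exact test, which also considered proximity clustering. The expected and observed counts are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_upper_tail(observed: int, expected: float) -> float:
    """One-sided P value P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    if expected <= 0:
        return 0.0
    if observed <= expected:
        # The tail is large here, so 1 - CDF is numerically safe.
        return max(0.0, 1.0 - sum(poisson_pmf(i, expected) for i in range(observed)))
    # Beyond the mean the tail terms shrink monotonically; summing them directly
    # keeps precision for very small P values.
    total, term, i = 0.0, poisson_pmf(observed, expected), observed
    while term > total * 1e-17:
        total += term
        i += 1
        term *= expected / i  # P(X = i) from P(X = i - 1)
    return total

# Hypothetical gene: 0.1 DNMs expected in the cohort, 3 observed.
print(poisson_upper_tail(3, 0.1))
```

Summing the upper tail directly avoids the loss of precision in 1 - CDF when P values approach genome-wide significance levels.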
a, Sensitivity and specificity of DNM validations within sets filtered at varying thresholds of DNM quality (posterior probability of DNM). The analysed DNMs were restricted to sites identified within the earlier 1,133 trios (ref. 15), in which all candidate DNMs underwent validation experiments. The labelled value is the quality threshold at which the number of candidate synonymous DNMs equals the number of synonymous mutations expected under a null germline mutation rate. b, Excess of missense and loss-of-function DNMs at varying DNM quality thresholds, adjusted for the sensitivity and specificity at each threshold.
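The labelled threshold in panel a is the quality cut-off at which the count of candidate synonymous DNMs matches the count expected under the null mutation rate. The sketch below scans candidate thresholds and picks the one whose passing count is closest to the expected value. The posterior probabilities and the expected count are hypothetical, and the tie-breaking rule (lowest threshold wins) is an assumption.

```python
def matching_threshold(posteriors, expected_synonymous: float):
    """Posterior-probability threshold at which the number of candidate
    synonymous DNMs passing the filter (posterior >= threshold) is closest to
    the expected synonymous count. Ties resolve to the lowest threshold."""
    best_diff, best_t = None, None
    for t in sorted(set(posteriors)):
        n_pass = sum(p >= t for p in posteriors)
        diff = abs(n_pass - expected_synonymous)
        if best_diff is None or diff < best_diff:
            best_diff, best_t = diff, t
    return best_t

# Hypothetical posterior probabilities for candidate synonymous DNMs.
pp = [0.99, 0.95, 0.9, 0.5, 0.2, 0.05]
print(matching_threshold(pp, expected_synonymous=3))
```

Synonymous DNMs are a natural calibration set because, under the null, they should occur at the background germline rate. An excess of synonymous calls therefore points to false positives at that threshold.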
This file contains a Supplementary Note and the Phenicons for the 94 genes exceeding genome-wide significance (see page 2 for details). (PDF 4196 kb)
This file contains Supplementary Tables 1–4: (1) De novo mutations (DNMs) in the 4,293 DDD individuals, listing sex, chromosome, position, reference and alternate alleles, HGNC symbol, VEP consequence, posterior probability of DNM and, where available, validation status. Individual IDs are available on request. The list excludes sites that failed validation but includes sites that passed validation (confirmed), sites with uncertain validation results (uncertain) and sites not tested by secondary validation (NA). Genome positions are given as GRCh37 coordinates. (2) Details of the cohorts used in the meta-analyses, including numbers of individuals by sex and publication details. (3) Genes with genome-wide significant statistical evidence of being developmental disorder genes. The numbers of unrelated individuals with independent DNMs are given for protein-truncating variants (PTVs) and missense variants; numbers of additional individuals in other cohorts, if any, are given in brackets. The reported P value is the lower of the P values from testing the DDD dataset and the meta-analysis dataset, and the subset providing it is also listed. Mutations are considered clustered if the P value for proximity clustering of DNMs is less than 0.01. (4) Comparison of known haploinsufficient (HI) neurodevelopmental genes with HI and non-HI enrichment models. Genes are ranked by the difference in Akaike information criterion (AIC) between a model in which the gene matches the expected non-HI PTV enrichment (model 1) and one in which it matches the expected HI PTV enrichment (model 2). (XLSX 629 kb)
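Supplementary Table 4 ranks genes by the AIC difference between a non-HI and an HI enrichment model. The sketch below shows how such a per-gene difference could be computed. It assumes that each model predicts a Poisson-distributed PTV count with a model-specific expected mean. Both the Poisson form and the expected counts are illustrative assumptions, not values or models taken from the study.

```python
import math

def poisson_loglik(k: int, lam: float) -> float:
    """Log-likelihood of observing k events under Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def delta_aic(observed_ptvs: int, expected_non_hi: float,
              expected_hi: float, n_params: int = 1) -> float:
    """AIC(model 1, non-HI) - AIC(model 2, HI); positive values favour model 2."""
    aic1 = aic(poisson_loglik(observed_ptvs, expected_non_hi), n_params)
    aic2 = aic(poisson_loglik(observed_ptvs, expected_hi), n_params)
    return aic1 - aic2

# Hypothetical genes: (name, observed PTVs, expected under non-HI, expected under HI).
genes = [("GENE_A", 5, 1.0, 5.0), ("GENE_B", 1, 1.2, 4.0)]
ranked = sorted(genes, key=lambda g: delta_aic(*g[1:]), reverse=True)
print([name for name, *_ in ranked])
```

Both hypothetical models have the same number of parameters, so the AIC difference here reduces to twice the log-likelihood ratio. The parameter penalty only matters when the competing models differ in complexity.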
About this article
Cite this article
Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438 (2017). https://doi.org/10.1038/nature21062