De novo mutations in protein-coding genes are a well-established cause of developmental disorders1. However, genes known to be associated with developmental disorders account for only a minority of the observed excess of such de novo mutations1,2. Here, to identify previously undescribed genes associated with developmental disorders, we integrate healthcare and research exome-sequence data from 31,058 parent–offspring trios of individuals with developmental disorders, and develop a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. We identified 285 genes that were significantly associated with developmental disorders, including 28 that had not previously been robustly associated with developmental disorders. Although we detected more genes associated with developmental disorders, much of the excess of de novo mutations in protein-coding genes remains unaccounted for. Modelling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of genes associated with developmental disorders.
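The abstract names a simulation-based test for gene-specific enrichment of de novo mutations but does not give its details. The sketch below shows the general idea of such a test and is not the DeNovoWEST implementation. It assumes that each gene has an expected number of de novo mutations per consequence class under a null mutation-rate model, and that each class carries a severity weight. The class names, weights and expected counts are hypothetical. The observed weighted score is compared with scores from Poisson simulations of the null.

```python
import math
import random


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate using Knuth's multiplication method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def weighted_score(counts: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of per-class DNM counts, each multiplied by that class's (hypothetical) weight."""
    return sum(weights[cls] * counts.get(cls, 0) for cls in weights)


def enrichment_pvalue(
    observed: dict[str, int],
    expected: dict[str, float],
    weights: dict[str, float],
    n_sims: int = 20_000,
    seed: int = 0,
) -> float:
    """Monte Carlo p-value: share of null simulations scoring at least the observed weighted score."""
    rng = random.Random(seed)
    obs_score = weighted_score(observed, weights)
    hits = 0
    for _ in range(n_sims):
        # Under the null, each consequence class receives Poisson counts with its expected mean.
        simulated = {cls: poisson_sample(lam, rng) for cls, lam in expected.items()}
        if weighted_score(simulated, weights) >= obs_score:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_sims + 1)


if __name__ == "__main__":
    # Hypothetical null expectations and weights for a single gene.
    expected = {"synonymous": 0.8, "missense": 2.0, "ptv": 0.15}
    weights = {"synonymous": 0.0, "missense": 1.0, "ptv": 10.0}
    print(enrichment_pvalue({"missense": 3, "ptv": 4}, expected, weights))  # small p-value
    print(enrichment_pvalue({"missense": 2}, expected, weights))  # unremarkable
```

Weighting lets a few severe mutations, such as protein-truncating variants, count for more than many mild ones. Simulation avoids relying on a closed-form null distribution for a weighted sum of Poisson counts.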
Sequence and variant-level data and phenotypic data for the DDD study are available from the European Genome-phenome Archive (EGA; https://www.ebi.ac.uk/ega/) with study ID EGAS00001000775. The RadboudUMC sequence and variant-level data cannot be made available through the EGA owing to the nature of consent for clinical testing. To access the RadboudUMC data, please contact C.G. (firstname.lastname@example.org) with a request. Data sharing will depend on patient consent, the diagnostic status of the patient, the type of request and the potential benefit to the patient. GeneDx data likewise cannot be made available through the EGA owing to the nature of consent for clinical testing. GeneDx-referred patients are consented for aggregate, deidentified research and are subject to US HIPAA privacy protection. As such, without a HIPAA Business Associate Agreement we are not able to share patient-level BAM or VCF data, which are potentially identifiable. Access to the deidentified aggregate data used in this analysis is available upon request to GeneDx. GeneDx has contributed deidentified data to this study to improve clinical interpretation of genomic data, in accordance with patient consent and in conformance with the ACMG position statement on genomic data sharing (details are provided in the Supplementary Note). Clinically interpreted variants and associated phenotypes from the DDD study are available through DECIPHER (https://decipher.sanger.ac.uk). Clinically interpreted variants from RUMC are available from the Dutch national initiative for sharing variant classifications (https://www.vkgl.nl/nl/diagnostiek/vkgl-datashare-database) as well as LOVD (https://databases.lovd.nl/shared/variants), where they are listed with ‘VKGL-NL_Nijmegen’ as the owner. Clinically interpreted variants from GeneDx are deposited in ClinVar (https://www.ncbi.nlm.nih.gov/clinvar) under accession number 26957 (https://www.ncbi.nlm.nih.gov/clinvar/submitters/26957/).
Previously described datasets were from the Genome Aggregation Database (gnomAD v2.1.1; https://gnomad.broadinstitute.org/), The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov) and the Developmental Disorders Genotype-Phenotype Database (DDG2P; https://www.ebi.ac.uk/gene2phenotype/downloads).
Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438 (2017).
Martin, H. C. et al. Quantifying the contribution of recessive coding variation to developmental disorders. Science 362, 1161–1164 (2018).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness prediction. Preprint at https://doi.org/10.1101/148353 (2017).
Kosmicki, J. A. et al. Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat. Genet. 49, 504–510 (2017).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Cooper, G. M. et al. A copy number variation morbidity map of developmental delay. Nat. Genet. 43, 838–846 (2011).
Coe, B. P. et al. Refining analyses of copy number variation identifies specific genes associated with developmental delay. Nat. Genet. 46, 1063–1071 (2014).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Villegas, F. et al. Lysosomal signaling licenses embryonic stem cell differentiation via inactivation of Tfe3. Cell Stem Cell 24, 257–270 (2019).
Diaz, J., Berger, S. & Leon, E. TFE3-associated neurodevelopmental disorder: a distinct recognizable syndrome. Am. J. Med. Genet. A 182, 584–590 (2020).
Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019).
Yilmaz, R. et al. A recurrent synonymous KAT6B mutation causes Say-Barber-Biesecker/Young-Simpson syndrome by inducing aberrant splicing. Am. J. Med. Genet. A 167, 3006–3010 (2015).
Wu, X., Pang, E., Lin, K. & Pei, Z.-M. Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method. PLoS ONE 8, e66745 (2013).
Catterall, W. A., Dib-Hajj, S., Meisler, M. H. & Pietrobon, D. Inherited neuronal ion channelopathies: new windows on complex neurological diseases. J. Neurosci. 28, 11768–11777 (2008).
Lasser, M., Tiber, J. & Lowery, L. A. The role of the microtubule cytoskeleton in neurodevelopmental disorders. Front. Cell. Neurosci. 12, 165 (2018).
Hamilton, M. J. et al. Heterozygous mutations affecting the protein kinase domain of CDK13 cause a syndromic form of developmental delay and intellectual disability. J. Med. Genet. 55, 28–38 (2018).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
Qi, H., Dong, C., Chung, W. K., Wang, K. & Shen, Y. Deep genetic connection between cancer and developmental disorders. Hum. Mutat. 37, 1042–1050 (2016).
Ronan, J. L., Wu, W. & Crabtree, G. R. From neural development to cognition: unexpected roles for chromatin. Nat. Rev. Genet. 14, 347–359 (2013).
Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
Goriely, A. & Wilkie, A. O. M. Paternal age effect mutations and selfish spermatogonial selection: causes and consequences for human disease. Am. J. Hum. Genet. 90, 175–200 (2012).
Duncan, B. K. & Miller, J. H. Mutagenic deamination of cytosine residues in DNA. Nature 287, 560–561 (1980).
Maher, G. J. et al. Visualizing the origins of selfish de novo mutations in individual seminiferous tubules of human testes. Proc. Natl Acad. Sci. USA 113, 2454–2459 (2016).
Maher, G. J. et al. Selfish mutations dysregulating RAS-MAPK signaling are pervasive in aged human testes. Genome Res. 28, 1779–1790 (2018).
Young, L. C. et al. SHOC2–MRAS–PP1 complex positively regulates RAF activity and contributes to Noonan syndrome pathogenesis. Proc. Natl Acad. Sci. USA 115, E10576–E10585 (2018).
Coe, B. P. et al. Neurodevelopmental disease genes implicated by de novo mutation and copy number variation morbidity. Nat. Genet. 51, 106–116 (2019).
Lord, J. et al. Prenatal exome sequencing analysis in fetal structural anomalies detected by ultrasonography (PAGE): a cohort study. Lancet 393, 747–757 (2019).
Cassa, C. A. et al. Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat. Genet. 49, 806–810 (2017).
Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223–228 (2015).
Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584 (2020).
Deelen, P. et al. Improving the diagnostic yield of exome-sequencing by predicting gene–phenotype associations using large-scale gene expression analysis. Nat. Commun. 10, 2837 (2019).
He, X. et al. Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. PLoS Genet. 9, e1003671 (2013).
We thank the families and their clinicians for their participation and engagement, and our colleagues who assisted in the generation and processing of data. Inclusion of RadboudUMC data was in part supported by the Solve-RD project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 779257. This work was in part financially supported by grants from the Netherlands Organization for Scientific Research (917-17-353 to C.G.). The DDD study presents independent research commissioned by the Health Innovation Challenge Fund (grant number HICF-1009-003). This study makes use of DECIPHER, which is funded by the Wellcome Trust. The full acknowledgements can be found at www.ddduk.org/access.html. The DDD study authors acknowledge the work of R. Kelsell. Finally, we acknowledge the contribution of an esteemed DDD clinical collaborator, M. Bitner-Glindzicz, who died during the course of the study.
Z.Z., K.J.A., R.I.T., J.J. and K.R. are employees of GeneDx. J.J. and K.R. are shareholders of OPKO. M.E.H. is a co-founder of, consultant to and holds shares in Congenica, a genetics diagnostic company.
Peer review information Nature thanks Ipsita Agarwal, James Lupski, Shamil Sunyaev and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, Number of significant genes after downsampling the full cohort and rerunning the DeNovoWEST enrichment test. b, Likelihood of the observed distribution of de novo protein-truncating variant (PTV) mutations under a model that varies the number of remaining haploinsufficient (HI) developmental disorder (DD) genes and the PTV enrichment in those remaining genes. The 50% credible interval is shown in red and the 90% credible interval in orange. The arrow marks the median PTV enrichment (39.7) in significant genes that are known to act through a loss-of-function mechanism.
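The caption does not give the model's equations, so the following is a minimal sketch under stated assumptions and not the authors' model. Each not-yet-significant gene's de novo PTV count is treated as Poisson with a null expectation `mu`. An unknown subset of `n_hi` genes (the hypothetical remaining HI genes) has that rate multiplied by `enrichment`. Because we do not know which genes belong to that subset, each gene's likelihood is a two-component mixture. The code evaluates the log-likelihood on a grid of (`n_hi`, `enrichment`). With a flat prior over the grid, which is an assumption, exponentiating and normalizing this surface would give a grid posterior from which credible regions like those in panel b could be read. All function names and the toy data are hypothetical.

```python
import math


def log_poisson(k: int, lam: float) -> float:
    """Log of the Poisson probability mass function (requires lam > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_mixture(log_a: float, log_b: float, frac: float) -> float:
    """Stable log(frac * exp(log_a) + (1 - frac) * exp(log_b))."""
    if frac <= 0.0:
        return log_b
    if frac >= 1.0:
        return log_a
    x = math.log(frac) + log_a
    y = math.log1p(-frac) + log_b
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def log_likelihood(observed: list[int], expected: list[float], n_hi: int, enrichment: float) -> float:
    """Log-likelihood of per-gene PTV counts when n_hi of the genes are unlabelled enriched (HI) genes."""
    frac = n_hi / len(observed)
    total = 0.0
    for k, mu in zip(observed, expected):
        total += log_mixture(log_poisson(k, enrichment * mu), log_poisson(k, mu), frac)
    return total


def grid_search(observed, expected, n_hi_grid, enrichment_grid):
    """Evaluate the log-likelihood on a grid; return the surface and the best (n_hi, enrichment)."""
    surface = {
        (n, e): log_likelihood(observed, expected, n, e)
        for n in n_hi_grid
        for e in enrichment_grid
    }
    best = max(surface, key=surface.get)
    return surface, best


if __name__ == "__main__":
    # Toy data: 1,000 genes with null expectation 0.1 PTVs each; 50 of them carry 5 PTVs.
    observed = [5] * 50 + [0] * 950
    expected = [0.1] * 1000
    _, best = grid_search(observed, expected, [0, 10, 50, 100, 200], [1, 5, 10, 20, 50])
    print(best)
```

A finer grid, gene-specific `mu` values from a mutation-rate model and an explicit prior would all be needed before a sketch like this could say anything about real data.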
This file contains the Supplementary Methods, descriptions of Supplementary Analyses, Supplementary Figures 1–14, descriptions of Supplementary Tables 1–3, Supplementary Tables 4–9 and Supplementary References. It also contains Supplementary Notes listing the members of the DDD consortium.
Supplementary Table 1 De novo mutations from 31,058 individuals with developmental disorders. For every de novo mutation, we provide: proband ID (‘id’), chromosome (‘chrom’), position in GRCh37 (‘pos’), reference allele (‘ref’), alternative allele (‘alt’), VEP consequence of the mutation (‘consequence’), HGNC symbol (‘symbol’), the centre that sequenced the proband (‘study’), fraction of reads supporting the alternative allele (‘altprop_child’) and HGNC ID (‘hgnc_id’).
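As an illustration of working with a table in this layout, the sketch below tallies de novo mutations per gene and consequence and per sequencing centre. This is the kind of gene-level summary that Supplementary Table 2 reports. It assumes that the table is tab-separated, with a header row that uses the column names listed above. The distributed file format may differ. The rows in `EXAMPLE_TSV` are invented for demonstration, and `count_dnms` is a hypothetical helper.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented rows that follow the column layout described for Supplementary Table 1.
EXAMPLE_TSV = (
    "id\tchrom\tpos\tref\talt\tconsequence\tsymbol\tstudy\taltprop_child\thgnc_id\n"
    "proband_1\t1\t1000\tC\tT\tmissense_variant\tGENE_A\tDDD\t0.48\t1\n"
    "proband_2\t1\t1000\tC\tT\tmissense_variant\tGENE_A\tGeneDx\t0.51\t1\n"
    "proband_3\t1\t2050\tG\tA\tstop_gained\tGENE_A\tRUMC\t0.45\t1\n"
    "proband_4\t2\t500\tA\tG\tsynonymous_variant\tGENE_B\tDDD\t0.50\t2\n"
)


def count_dnms(handle):
    """Tally DNMs per gene and VEP consequence, and per sequencing centre ('study')."""
    per_gene = defaultdict(Counter)
    per_study = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        per_gene[row["symbol"]][row["consequence"]] += 1
        per_study[row["study"]] += 1
    return per_gene, per_study


if __name__ == "__main__":
    genes, studies = count_dnms(io.StringIO(EXAMPLE_TSV))
    for symbol, by_consequence in sorted(genes.items()):
        print(symbol, dict(by_consequence))
    print(dict(studies))
```

For the real file, `io.StringIO(EXAMPLE_TSV)` would be replaced by an open file handle.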
Supplementary Table 2 Results of DeNovoWEST. Results of the analyses of the full cohort and of the undiagnosed subset, along with gene-level de novo mutation (DNM) counts per consequence. Column headers are described within the file.
Supplementary Table 3 Novel genes. For each of the 28 novel genes in this analysis, we determined whether it had an associated phenotype in OMIM, whether any publications reported an association between mutations in that gene and developmental disorders, and whether it was significant in a study of inherited and de novo mutations in autism spectrum disorders21 (“sig_ASD”) or in a meta-analysis of de novo mutations in individuals with neurodevelopmental disorders22 (“sig_meta”).
About this article
Cite this article
Kaplanis, J., Samocha, K.E., Wiel, L. et al. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature 586, 757–762 (2020). https://doi.org/10.1038/s41586-020-2832-5