Evidence for 28 genetic disorders discovered by combining healthcare and research data


De novo mutations in protein-coding genes are a well-established cause of developmental disorders1. However, genes known to be associated with developmental disorders account for only a minority of the observed excess of such de novo mutations1,2. Here, to identify previously undescribed genes associated with developmental disorders, we integrate healthcare and research exome-sequence data from 31,058 parent–offspring trios of individuals with developmental disorders, and develop a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. We identified 285 genes that were significantly associated with developmental disorders, including 28 that had not previously been robustly associated with developmental disorders. Although we detected more genes associated with developmental disorders, much of the excess of de novo mutations in protein-coding genes remains unaccounted for. Modelling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of genes associated with developmental disorders.
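The idea of a simulation-based test for gene-specific enrichment can be illustrated with a minimal sketch. This is not the DeNovoWEST implementation: the function names, the per-class expected counts and the severity weights below are hypothetical placeholders, and the null model (independent Poisson counts per consequence class) is an assumption made only for illustration.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulated_enrichment_p(observed, expected, weights, n_sim=10000, seed=0):
    """Empirical p-value for a weighted de novo mutation score in one gene.

    observed: dict of consequence class -> observed DNM count in the cohort
    expected: dict of consequence class -> expected DNM count under the null
              (hypothetical values; e.g. number of trios times a mutation rate)
    weights:  dict of consequence class -> severity weight (hypothetical)
    """
    rng = random.Random(seed)
    obs_score = sum(weights[c] * observed.get(c, 0) for c in weights)
    hits = 0
    for _ in range(n_sim):
        # Simulate one cohort under the null and score it the same way.
        sim_score = sum(weights[c] * poisson_draw(expected[c], rng) for c in weights)
        if sim_score >= obs_score:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sim + 1)


# Illustrative numbers only, not taken from the study.
weights = {"ptv": 1.0, "missense": 0.3}
expected = {"ptv": 0.1, "missense": 0.5}
p_enriched = simulated_enrichment_p({"ptv": 10}, expected, weights, n_sim=2000)
p_null = simulated_enrichment_p({}, expected, weights, n_sim=2000)
```

In this sketch a gene with many more high-weight mutations than expected receives a small empirical p-value, whereas a gene with no observed mutations receives a p-value of 1. The resolution of the p-value is limited by the number of simulations.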

Fig. 1: Results of DeNovoWEST analysis.
Fig. 2: Properties of the novel genes.
Fig. 3: Factors that influence power to detect DD-associated genes.

Data availability

Sequence and variant-level data and phenotypic data for the DDD study are available from the European Genome-phenome Archive (EGA) with study ID EGAS00001000775. The RadboudUMC sequence and variant-level data cannot be made available through the EGA owing to the nature of consent for clinical testing. To access the data, please contact C.G. with a request. Data sharing will be dependent on patient consent, diagnostic status of the patient, the type of request and the potential benefit to the patient. GeneDx data cannot be made available through the EGA owing to the nature of consent for clinical testing. GeneDx-referred patients are consented for aggregate, deidentified research and subject to US HIPAA privacy protection. As such, we are not able to share patient-level BAM or VCF data, which are potentially identifiable without a HIPAA Business Associate Agreement. Access to the deidentified aggregate data used in this analysis is available upon request to GeneDx. GeneDx has contributed deidentified data to this study to improve clinical interpretation of genomic data, in accordance with patient consent and in conformance with the ACMG position statement on genomic data sharing (details are provided in the Supplementary Note). Clinically interpreted variants and associated phenotypes from the DDD study are available through DECIPHER. Clinically interpreted variants from RUMC are available from the Dutch national initiative for sharing variant classifications as well as LOVD, where they are listed with ‘VKGL-NL_Nijmegen’ as the owner. Clinically interpreted variants from GeneDx are deposited in ClinVar under accession number 26957. Previously described datasets were from the Genome Aggregation Database (gnomAD v2.1.1), The Cancer Genome Atlas (TCGA) and the Developmental Disorders Genotype-Phenotype Database (DDG2P).

Code availability

The DeNovoWEST method is available on GitHub, along with code to recreate all of the figures in the manuscript. Code to run the Phenopy method is also available on GitHub.


  1. Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438 (2017).

  2. Martin, H. C. et al. Quantifying the contribution of recessive coding variation to developmental disorders. Science 362, 1161–1164 (2018).

  3. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).

  4. Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness prediction. Preprint at (2017).

  5. Kosmicki, J. A. et al. Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat. Genet. 49, 504–510 (2017).

  6. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  7. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

  8. Cooper, G. M. et al. A copy number variation morbidity map of developmental delay. Nat. Genet. 43, 838–846 (2011).

  9. Coe, B. P. et al. Refining analyses of copy number variation identifies specific genes associated with developmental delay. Nat. Genet. 46, 1063–1071 (2014).

  10. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

  11. Villegas, F. et al. Lysosomal signaling licenses embryonic stem cell differentiation via inactivation of Tfe3. Cell Stem Cell 24, 257–270 (2019).

  12. Diaz, J., Berger, S. & Leon, E. TFE3-associated neurodevelopmental disorder: a distinct recognizable syndrome. Am. J. Med. Genet. A 182, 584–590 (2020).

  13. Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019).

  14. Yilmaz, R. et al. A recurrent synonymous KAT6B mutation causes Say-Barber-Biesecker/Young-Simpson syndrome by inducing aberrant splicing. Am. J. Med. Genet. A 167, 3006–3010 (2015).

  15. Wu, X., Pang, E., Lin, K. & Pei, Z.-M. Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method. PLoS ONE 8, e66745 (2013).

  16. Catterall, W. A., Dib-Hajj, S., Meisler, M. H. & Pietrobon, D. Inherited neuronal ion channelopathies: new windows on complex neurological diseases. J. Neurosci. 28, 11768–11777 (2008).

  17. Lasser, M., Tiber, J. & Lowery, L. A. The role of the microtubule cytoskeleton in neurodevelopmental disorders. Front. Cell. Neurosci. 12, 165 (2018).

  18. Hamilton, M. J. et al. Heterozygous mutations affecting the protein kinase domain of CDK13 cause a syndromic form of developmental delay and intellectual disability. J. Med. Genet. 55, 28–38 (2018).

  19. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).

  20. Qi, H., Dong, C., Chung, W. K., Wang, K. & Shen, Y. Deep genetic connection between cancer and developmental disorders. Hum. Mutat. 37, 1042–1050 (2016).

  21. Ronan, J. L., Wu, W. & Crabtree, G. R. From neural development to cognition: unexpected roles for chromatin. Nat. Rev. Genet. 14, 347–359 (2013).

  22. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

  23. Goriely, A. & Wilkie, A. O. M. Paternal age effect mutations and selfish spermatogonial selection: causes and consequences for human disease. Am. J. Hum. Genet. 90, 175–200 (2012).

  24. Duncan, B. K. & Miller, J. H. Mutagenic deamination of cytosine residues in DNA. Nature 287, 560–561 (1980).

  25. Maher, G. J. et al. Visualizing the origins of selfish de novo mutations in individual seminiferous tubules of human testes. Proc. Natl Acad. Sci. USA 113, 2454–2459 (2016).

  26. Maher, G. J. et al. Selfish mutations dysregulating RAS-MAPK signaling are pervasive in aged human testes. Genome Res. 28, 1779–1790 (2018).

  27. Young, L. C. et al. SHOC2–MRAS–PP1 complex positively regulates RAF activity and contributes to Noonan syndrome pathogenesis. Proc. Natl Acad. Sci. USA 115, E10576–E10585 (2018).

  28. Coe, B. P. et al. Neurodevelopmental disease genes implicated by de novo mutation and copy number variation morbidity. Nat. Genet. 51, 106–116 (2019).

  29. Lord, J. et al. Prenatal exome sequencing analysis in fetal structural anomalies detected by ultrasonography (PAGE): a cohort study. Lancet 393, 747–757 (2019).

  30. Cassa, C. A. et al. Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat. Genet. 49, 806–810 (2017).

  31. Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223–228 (2015).

  32. Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584 (2020).

  33. Deelen, P. et al. Improving the diagnostic yield of exome-sequencing by predicting gene–phenotype associations using large-scale gene expression analysis. Nat. Commun. 10, 2837 (2019).

  34. He, X. et al. Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. PLoS Genet. 9, e1003671 (2013).



Acknowledgements

We thank the families and their clinicians for their participation and engagement, and our colleagues who assisted in the generation and processing of data. Inclusion of RadboudUMC data was in part supported by the Solve-RD project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 779257. This work was in part financially supported by grants from the Netherlands Organization for Scientific Research (917-17-353 to C.G.). The DDD study presents independent research commissioned by the Health Innovation Challenge Fund (grant number HICF-1009-003). This study makes use of DECIPHER, which is funded by the Wellcome Trust. The full acknowledgements can be found online. The DDD study authors acknowledge the work of R. Kelsell. Finally, we acknowledge the contribution of an esteemed DDD clinical collaborator, M. Bitner-Glindicz, who died during the course of the study.

Author information

Contributions




J.K., K.E.S., L.W., K.J.A., M.E.H., C.G. and K.R. contributed to the generation of figures and writing of the manuscript. J.K., K.E.S., L.W., Z.Z., K.J.A., R.Y.E., G.G., S.H.L., H.C.M., J.F.M., E.d.B., R.P., M.R.F.R. and H.G.Y. contributed to the generation and quality control of data. J.K., K.E.S., L.W., Z.Z., K.J.A., R.I.T., J.F.M., P.J.S., P.D., E.J.G., N.H., J.L., I.M., A.Y. and K.R. developed methods, contributed data or performed analyses. H.C.M., L.E.L.M.V., J.J., C.F.W., H.G.B., H.V.F., D.R.F., J.C.B., M.E.H., C.G. and K.R. provided experimental and analytical supervision. M.E.H., C.G. and K.R. provided project supervision.

Corresponding author

Correspondence to Matthew E. Hurles.

Ethics declarations

Competing interests

Z.Z., K.J.A., R.I.T., J.J. and K.R. are employees of GeneDx. J.J. and K.R. are shareholders of OPKO. M.E.H. is a co-founder of, consultant to and holds shares in Congenica, a genetics diagnostic company.

Additional information

Peer review information Nature thanks Ipsita Agarwal, James Lupski, Shamil Sunyaev and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Exploring the remaining number of DD genes.

a, Number of significant genes after downsampling the full cohort and running the enrichment test of DeNovoWEST. b, The likelihood of the observed distribution of de novo PTV mutations was modelled. This model varies the numbers of remaining haploinsufficient (HI) DD genes and PTV enrichment in those remaining genes. The 50% credible interval is shown in red and the 90% credible interval is shown in orange. Note that the median PTV enrichment in genes that are significant and known to operate through a loss-of-function mechanism (as indicated by an arrow) is 39.7.

Extended Data Table 1 Recurrent mutations

Supplementary information

Supplementary Information

This file contains Supplementary Methods and descriptions of Supplementary Analyses, Supplementary Figures 1-14, descriptions of Supplementary Tables 1-3, Supplementary Tables 4-9 and Supplementary References. It also contains Supplementary notes detailing the DDD consortia members.

Reporting Summary


Supplementary Table 1 De novo mutations from 31,058 individuals with developmental disorders. For every de novo mutation, we provide: proband ID (‘id’), chromosome (‘chrom’), position in GRCh37 (‘pos’), the reference allele (‘ref’), the alternative allele (‘alt’), the VEP consequence of the mutation (‘consequence’), the HGNC symbol (‘symbol’), the centre that sequenced the proband (‘study’), the fraction of reads that are from the alternative allele (‘altprop_child’), and the HGNC ID (‘hgnc_id’).
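As a minimal sketch of how a table with these columns might be summarised, the following tallies de novo mutations per gene and consequence. The column names come from the description above; the tab-separated layout, the function name and the example rows are assumptions made for illustration, and the rows are invented placeholders rather than study data.

```python
import csv
import io
from collections import Counter

# Column names as listed in the Supplementary Table 1 description.
COLUMNS = ["id", "chrom", "pos", "ref", "alt", "consequence",
           "symbol", "study", "altprop_child", "hgnc_id"]


def count_dnms_per_gene(handle, delimiter="\t"):
    """Count de novo mutations per (HGNC symbol, VEP consequence) pair.

    Assumes a delimited text file with a header row using COLUMNS;
    the real file format may differ.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    counts = Counter()
    for row in reader:
        counts[(row["symbol"], row["consequence"])] += 1
    return counts


# Invented placeholder rows (not real probands, genes or variants).
rows = [
    ["proband_A", "1", "1000", "A", "G", "missense_variant", "GENE1", "centre_X", "0.48", "HGNC:0001"],
    ["proband_B", "1", "1050", "C", "T", "missense_variant", "GENE1", "centre_Y", "0.51", "HGNC:0001"],
    ["proband_C", "2", "2000", "G", "A", "stop_gained", "GENE2", "centre_X", "0.45", "HGNC:0002"],
]
text = "\n".join("\t".join(r) for r in [COLUMNS] + rows) + "\n"
counts = count_dnms_per_gene(io.StringIO(text))
```

Gene-level counts per consequence of this kind are what a gene-specific enrichment test takes as its observed input.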


Supplementary Table 2 Results of DeNovoWEST. Results from analysis on the full cohort and on the undiagnosed subset, along with gene-level DNM counts per consequence. Column headers are described within the file.


Supplementary Table 3 Novel genes. For each of the 28 novel genes in this analysis, we determined whether it had an associated phenotype in OMIM, whether there were any publications about an association between mutations in that gene and developmental disorders, and whether it was significant in a study of inherited and de novo mutations in autism spectrum disorders21 (“sig_ASD”) or a meta-analysis of de novo mutations in individuals with neurodevelopmental disorders22 (“sig_meta”).

Cite this article

Kaplanis, J., Samocha, K.E., Wiel, L. et al. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature 586, 757–762 (2020).
