Abstract
To identify candidate genes for intellectual disability, we performed a meta-analysis on 2,637 de novo mutations, identified from the exomes of 2,104 patient–parent trios. Statistical analyses identified 10 new candidate ID genes: DLG4, PPM1D, RAC1, SMAD6, SON, SOX5, SYNCRIP, TCF20, TLK2 and TRIP12. In addition, we show that these genes are intolerant to nonsynonymous variation and that mutations in these genes are associated with specific clinical ID phenotypes.
References
Gulsuner, S. et al. Cell 154, 518–529 (2013).
Iossifov, I. et al. Nature 515, 216–221 (2014).
Gilissen, C. et al. Nature 511, 344–347 (2014).
O'Roak, B.J. et al. Nat. Commun. 5, 5595 (2014).
Deciphering Developmental Disorders Study. Nature 519, 223–228 (2015).
Samocha, K.E. et al. Nat. Genet. 46, 944–950 (2014).
Luscan, A. et al. J. Med. Genet. 51, 512–517 (2014).
Lumish, H.S., Wynn, J., Devinsky, O. & Chung, W.K. J. Autism Dev. Disord. 45, 3764–3770 (2015).
Lek, M. et al. Preprint at bioRxiv http://dx.doi.org/10.1101/030338 (2016).
Krumm, N., O'Roak, B.J., Shendure, J. & Eichler, E.E. Trends Neurosci. 37, 95–105 (2014).
McRae, J.F. et al. Preprint at bioRxiv http://dx.doi.org/10.1101/049056 (2016).
Neveling, K. et al. Hum. Mutat. 34, 1721–1726 (2013).
de Ligt, J. et al. N. Engl. J. Med. 367, 1921–1929 (2012).
Strom, S.P. et al. Genet. Med. 16, 510–515 (2014).
Genome Diagnostics Nijmegen. Gene Panel: Intellectual Disability https://www.radboudumc.nl/Informatievoorverwijzers/Genoomdiagnostiek/en/Pages/Intellectualdisability.aspx (2015).
Kong, A. et al. Nature 488, 471–475 (2012).
Goeman, J.J. & Solari, A. Stat. Med. 33, 1946–1978 (2014).
MacArthur, D.G. et al. Science 335, 823–828 (2012).
Zhu, J., He, F., Song, S., Wang, J. & Yu, J. BMC Genomics 9, 172 (2008).
Petrovski, S., Wang, Q., Heinzen, E.L., Allen, A.S. & Goldstein, D.B. PLoS Genet. 9, e1003709 (2013).
Acknowledgements
The authors would like to thank the Exome Aggregation Consortium and the groups that provided exome variant data for comparison. A full list of contributing groups can be found at http://exac.broadinstitute.org/about. We thank all clinicians involved for referring individuals with ID for diagnostic exome sequencing. We thank J. Goeman for statistical advice and M. Hurles for discussions. We would also like to thank the participating individuals and their families. This work was in part financially supported by grants from the Netherlands Organization for Scientific Research (912-12-109 to J.A.V., A.S. and B.B.A.d.V.; 916-14-043 to C.G.; 907-00-365 to T.K. and SH-271-13 to C.G. and J.A.V.) and the European Research Council (ERC Starting Grant DENOVO 281964 to J.A.V.).
Author information
Contributions
C.G., L.E.L.M.V. and H.G.B. designed the study; S.H.L., M.R.F.R., C.G. and L.E.L.M.V. performed the analysis. R.P., H.G.Y., E.-J.K., T.R., S.J.C.S., A.P.A.S. and M.R.N. signed out initial diagnostic reports. P.d.V. performed Sanger validations. B.B.A.d.V., M.H.W., T.K., K.L., M.V., I.v.d.B., E.M.H.F.B., P.R. and M.R.F.R. collected patient phenotypes. S.H.L., M.R.F.R., J.A.V., H.G.B., L.E.L.M.V. and C.G. drafted the manuscript; all authors contributed to the final version of the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Overview of the percentage of on-target coverage by 10 or more reads.
Scatter plot of the percentage of on-target coverage by 10 or more reads for the 820 samples of the RUMC cohort. Samples are ordered by ascending percentage of on-target coverage. The thick black line marks the cohort median of 98.9% on-target coverage by 10 or more reads.
Supplementary Figure 2 Distribution of de novo mutations (DNM) over patients in the RUMC cohort.
In total, 619 of 820 patients had at least one de novo mutation.
Supplementary Figure 3 Distribution of de novo mutations per gene in the RUMC cohort.
In total, 619 of 820 patients had at least one de novo mutation.
Supplementary Figure 4 Schematic representation of the location of de novo mutations identified in the RUMC cohort and their presumed effect on protein function.
*Premature termination codon (PTC): the insertion or deletion does not introduce a frameshift but directly creates a PTC.
Supplementary Figure 5 Simulations of recurrently mutated genes in the RUMC cohort.
The two panels show the distribution of the number of recurrently mutated genes based on 100,000 simulations resampling the 211 LoF and 872 functional de novo mutations of the 820 ID patients in the RUMC cohort. Simulations are based on the gene-specific mutation rates of Samocha et al. (Samocha, K.E., et al. Nat. Genet. 46, 944–950 (2014)). The colored boxes indicate the interquartile range, the whiskers indicate the full interval and the orange diamond indicates the observed number of recurrently de novo mutated genes in the RUMC cohort. a. For the loss-of-function simulations, the observed number of recurrently mutated genes (N = 23, depicted by the diamond) differs significantly from the simulations (LoF simulations in orange boxplot: μ = 2.31, σ = 1.49; empirical P-value: <1.00×10−5; Z-value = 13.90). b. For the functional simulations, the observed number of recurrently mutated genes (N = 85, depicted by the diamond) differs significantly from the simulations (functional simulations in green boxplot: μ = 34.98, σ = 5.40; empirical P-value: <1.00×10−5; Z-value = 9.26).
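The resampling described in this legend can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy mutation rates and the choice of an upper-tail empirical P-value are assumptions made for illustration. Mutations are placed on genes in proportion to per-gene mutation rates, the number of genes hit at least twice is counted, and the observed count is compared with the simulated distribution.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def count_recurrent(genes, weights, n_mutations, rng):
    """Place n_mutations on genes in proportion to their weights and
    return the number of genes hit at least twice."""
    hits = Counter(rng.choices(genes, weights=weights, k=n_mutations))
    return sum(1 for c in hits.values() if c >= 2)


def simulate_recurrence(rates, n_mutations, observed, n_sim=1000, seed=0):
    """Compare an observed count of recurrently mutated genes with a
    simulated null distribution (hypothetical helper, upper tail only)."""
    rng = random.Random(seed)
    genes = list(rates)
    weights = [rates[g] for g in genes]
    sims = [count_recurrent(genes, weights, n_mutations, rng)
            for _ in range(n_sim)]
    mu, sigma = mean(sims), pstdev(sims)
    n_extreme = sum(1 for s in sims if s >= observed)
    return {
        "mean": mu,
        "sd": sigma,
        # If no simulation is as extreme, the P-value can only be
        # reported as smaller than 1 / n_sim.
        "p_empirical": n_extreme / n_sim,
        "p_resolution": 1 / n_sim,
        "z": (observed - mu) / sigma if sigma > 0 else float("inf"),
    }


# Toy input: 2,000 hypothetical genes with made-up relative mutation rates.
toy_rates = {f"GENE{i}": 1.0 + (i % 5) for i in range(2000)}
result = simulate_recurrence(toy_rates, n_mutations=50, observed=10,
                             n_sim=500, seed=1)
```

With 100,000 simulations the smallest resolvable empirical P-value is 1×10−5, which is consistent with the "<1.00×10−5" reported in the legend.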
Supplementary Figure 6 Flow chart of cohort construction for statistical analysis.
Based on the presence of de novo mutations (DNMs) in 1,537 known ID genes (Supplementary Table 4), the patients were divided into two groups for (a) the RUMC cohort (820 trios), (b) the combined ID cohort (2,104 trios) and (c) the neurodevelopmental cohort (6,206 trios). The group on the left, in red, comprises the patients with DNMs in known ID genes. The group on the right, in green, comprises the patients without DNMs in the 1,537 known ID genes. The statistical analysis was performed on the cohort of patients without DNMs in known genes. The number of trios and the number of DNMs present in genes are shown for each group.
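The split described in this legend amounts to a simple partition of trios. The sketch below is illustrative only: the function name, the input layout (trio identifier mapped to the genes carrying a DNM) and the placeholder gene symbols are assumptions, as is the choice to place trios without any DNM in the "without known ID gene" group.

```python
def split_by_known_genes(trio_dnms, known_genes):
    """Partition trios by whether any of their DNMs falls in a known gene.

    trio_dnms: dict mapping a trio identifier to a list of gene symbols
    known_genes: set of known ID gene symbols
    """
    with_known, without_known = {}, {}
    for trio, genes in trio_dnms.items():
        has_known = any(g in known_genes for g in genes)
        # Assumption: trios with no DNM at all go to the second group.
        target = with_known if has_known else without_known
        target[trio] = genes
    return with_known, without_known


# Placeholder identifiers, not real data.
known = {"KNOWN1", "KNOWN2"}
toy_trios = {
    "trio1": ["KNOWN1", "OTHER7"],
    "trio2": ["OTHER3"],
    "trio3": [],
}
with_known, without_known = split_by_known_genes(toy_trios, known)
```

Only the second group would then enter the search for new candidate genes.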
Supplementary Figure 7 Schematic overview of de novo mutations found in SETD2.
SETD2 (Q9BYW2) with de novo mutations in individuals with ID and ASD. Individuals show a similar overgrowth phenotype, including macrocephaly, tall stature and facial dysmorphisms (Supplementary Note).
Supplementary Figure 8 Intolerance to loss-of-function (LoF) variation for ID genes.
The box plots indicate the distribution of median pLI (probability LoF intolerant) based on 100,000 permutations of equal-sized gene sets (see Online Methods). Gene sets analyzed include genes with at least one functional de novo mutation in the healthy control set (pink box; N=1,300), dominant ID genes (blue box; N=423) and novel candidate ID genes (green box; N=10). The orange box shows the genes (N=21, all known dominant ID genes) in which we observed at least three missense DNMs and no loss-of-function mutations, suggestive of a gain-of-function and/or dominant-negative effect. The red diamonds show the observed median pLI per category. The closer a pLI is to 1, the more intolerant a gene is to LoF variants; genes with a pLI ≥ 0.9 are considered extremely LoF intolerant (Lek, M., et al. Preprint at bioRxiv http://dx.doi.org/10.1101/030338 (2016)). For the control genes, the observed median pLI matched the simulated distribution of median pLI (observed: 0.03; simulated distribution: μ=0.03, σ=0.01; empirical P-value: 0.31; Z-value = 0.39). For the set of dominant ID genes and the ten novel candidate ID genes, the observed median pLI is significantly higher than the simulated distribution of median pLI (dominant ID genes: observed: 0.87; simulated distribution: μ=0.03, σ=0.01; empirical P-value <1×10−5; Z-value = 61.54; novel candidate ID genes: observed: 0.99; simulated distribution: μ=0.14, σ=0.20; empirical P-value <1×10−5; Z-value = 4.28). For the dominant ‘missense only’ genes (with at least three missense mutations in the absence of LoF mutations), we observed the highest median pLI of all evaluated gene sets (observed: 0.9999; simulated distribution: μ=0.09, σ=0.14; empirical P-value <1×10−5; Z-value = 6.70).
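The permutation of equal-sized gene sets can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the Online Methods: the function name, the toy scores and the use of an upper-tail count are hypothetical, and a test for an unusually low median (as for tolerant gene sets) would count the lower tail instead.

```python
import random
from statistics import mean, median, pstdev


def median_score_permutation(scores, gene_set, n_perm=1000, seed=0):
    """Compare the median score of gene_set with the medians of random
    gene sets of the same size drawn from all scored genes."""
    rng = random.Random(seed)
    members = [g for g in gene_set if g in scores]
    observed = median(scores[g] for g in members)
    all_genes = list(scores)
    null = [
        median(scores[g] for g in rng.sample(all_genes, len(members)))
        for _ in range(n_perm)
    ]
    mu, sigma = mean(null), pstdev(null)
    n_ge = sum(1 for m in null if m >= observed)  # upper tail only
    return {
        "observed": observed,
        "mean": mu,
        "sd": sigma,
        "p_empirical": n_ge / n_perm,
        "z": (observed - mu) / sigma if sigma > 0 else float("inf"),
    }


# Toy scores: 20 hypothetical intolerant genes among 1,000 tolerant ones.
toy_scores = {f"G{i}": 0.99 if i < 20 else 0.01 * (i % 10)
              for i in range(1000)}
toy_set = [f"G{i}" for i in range(10)]
perm = median_score_permutation(toy_scores, toy_set, n_perm=500, seed=2)
```

Because the null sets have the same size as the tested set, small sets such as the ten candidate genes give a wide simulated distribution, which matches the larger σ reported for them in the legend.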
Supplementary Figure 9 Genes enriched for LoF and functional de novo mutations in the cohort of 6,206 individuals with neurodevelopmental disease.
The y-axis shows the −log10(P) value of the mutation enrichment. Corrected P-values based on LoF mutations are colored blue and corrected P-values based on functional mutations are colored green. Only genes with a corrected P-value (LoF, functional or both) below the significance threshold (0.05, red dotted line) are shown.
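The selection and transformation described in this legend can be written down in a few lines. The sketch below assumes that corrected P-values are already available per gene and are strictly positive; the function name, input layout and placeholder values are hypothetical.

```python
import math


def significant_genes(corrected_p, alpha=0.05):
    """Keep genes whose LoF or functional corrected P-value is below alpha
    and return -log10(P) for plotting.

    corrected_p: dict mapping gene -> {"lof": p, "functional": p}, p > 0
    """
    selected = {}
    for gene, pvals in corrected_p.items():
        if min(pvals.values()) < alpha:
            selected[gene] = {kind: -math.log10(p)
                              for kind, p in pvals.items()}
    return selected


# Placeholder values, not results from the study.
toy_p = {
    "GENE_A": {"lof": 0.001, "functional": 0.2},
    "GENE_B": {"lof": 0.5, "functional": 0.3},
}
plotted = significant_genes(toy_p)
threshold_line = -math.log10(0.05)  # height of the dotted line
```

On this scale the 0.05 threshold sits at roughly 1.3, so a gene is displayed when at least one of its two bars rises above that line.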
Supplementary Figure 10 Schematic representation of a synapse, with special focus on the postsynaptic density (PSD).
Proteins playing an essential role in the PSD for signaling cascades and/or receptor trafficking are schematically depicted, as well as the AMPA receptor (AMPAR), NMDA receptor (NMDAR), metabotropic glutamate receptor (mGluR), and calcium and potassium channels (Ca2+ and K+, respectively) (Iasevoli, F., Tomasetti, C. & de Bartolomeis, A. Neurochem Res 38, 1-22 (2013)). An overlay was made between all DNMs identified in the NDD meta-analysis and the genes playing essential roles in the PSD, as well as with the list of known ID genes. Known ID genes are indicated by an asterisk. For genes in blue, we identified at least one DNM in the ID cohort, whereas DNMs in the genes in orange were restricted to the EE, SCZ and/or ASD patients. For genes in green, we identified DNMs in both the ID cohort and at least one of the other NDD cohorts. Genes listed in black play a role in, for example, complex formation of the AMPAR or NMDAR, but have not been found to carry DNMs in our current ID/NDD cohort. Importantly, three of the ten genes that we identified as novel candidate ID genes play a role in the PSD and its downstream processes. DLG4, encoding postsynaptic density protein 95 (PSD95), is one of the core PSD proteins (Zalfa, F., et al. Nat Neurosci 10, 578-587 (2007)), whereas RAC1 and TCF7L2 are important in downstream signaling cascades, including Rho and Wnt signaling, respectively (novel candidate ID genes are underlined and highlighted in red).
Supplementary Figure 11 Simulations of recurrently mutated genes in the control cohort.
The two panels show the distribution of the number of recurrently mutated genes based on 100,000 simulations resampling the 196 LoF and 1,478 functional de novo mutations of the 2,299 controls. Simulations are based on the gene-specific mutation rates of Samocha et al. (Samocha, K.E., et al. Nat. Genet. 46, 944–950 (2014)). The colored boxes indicate the interquartile range, the whiskers indicate the full interval and the orange diamond indicates the observed number of recurrently de novo mutated genes in the control cohort. a. For the loss-of-function simulations, the observed number of recurrently mutated genes (N = 2, depicted by the diamond) does not differ significantly from the simulations (simulations in orange boxplot: μ = 2.00, σ = 1.39; empirical P-value: 0.60; Z-value = 2.69×10−3). b. For the functional simulations, the observed number of recurrently mutated genes (N = 103, depicted by the diamond) does not differ significantly from the simulations (simulations in green boxplot: μ = 93.17, σ = 8.33; empirical P-value: 0.13; Z-value = 1.39).
Supplementary Figure 12 Gene set based evaluation of pLI.
The box plots indicate the distribution of median pLI (probability loss-of-function intolerant) based on 100,000 permutations of equal-sized gene sets (see Online Methods). Gene sets analyzed include loss-of-function tolerant (LoFT) genes (light blue box; N=163), genes with at least one functional de novo mutation in the healthy control set (pink box; N=1,300), house-keeping (HK) genes (yellow box; N=398) and dominant ID genes (N=423). The red diamonds show the observed median pLI per category. The closer a pLI is to 1, the more intolerant a gene is to LoF variants; genes with a pLI ≥ 0.9 are considered extremely LoF intolerant (Lek, M., et al. Preprint at bioRxiv http://dx.doi.org/10.1101/030338 (2016)). The median pLI for the loss-of-function tolerant genes was significantly lower than the simulated distribution of median pLI (observed: 9.33×10−9; simulated distribution: μ=0.04, σ=0.03; empirical P-value <1×10−5; Z-value = 1.25). For the gene set with DNMs in healthy controls, the observed median pLI matched the simulated distribution of median pLI (observed: 0.03; simulated distribution: μ=0.03, σ=0.01; empirical P-value: 0.31; Z-value = 0.39). For the house-keeping and dominant ID gene sets, the observed median pLI is significantly higher than the simulated distribution of median pLI (HK genes: observed: 0.87; simulated distribution: μ=0.03, σ=0.02; empirical P-value <1×10−5; Z-value = 54.05; dominant ID genes: observed: 0.95; simulated distribution: μ=0.03, σ=0.01; empirical P-value <1×10−5; Z-value = 61.54).
Supplementary Figure 13 Gene set based evaluation of RVIS.
The box plots indicate the distribution of median RVIS (Residual Variation Intolerance Score) based on 100,000 permutations of equal-sized gene sets (see Online Methods). Gene sets analyzed include loss-of-function tolerant (LoFT) genes (light blue box; N=161), genes with at least one functional de novo mutation in the healthy control set (pink box; N=1,262), house-keeping (HK) genes (yellow box; N=397), dominant ID genes (blue box; N=412) and novel candidate ID genes (green box; N=9). The orange box shows the genes (N=21, all known dominant ID genes) in which we observed at least three missense DNMs and no loss-of-function mutations, suggestive of a gain-of-function and/or dominant-negative effect. The red diamonds show the observed median RVIS per category. Based on the simulations, depicted by the box plots, we identified a significantly higher median RVIS for the LoFT genes, which is in line with the tolerant nature of this gene set (observed: 85.04; simulated distribution: μ=50.01, σ=2.48; empirical P-value <1×10−5; Z-value = 14.14). For the healthy control set, the observed median RVIS was significantly lower than the expected median RVIS (observed: 37.05; simulated distribution: μ=50.01, σ=1.36; empirical P-value <1×10−5; Z-value = −9.56). For the house-keeping and dominant ID gene sets, the observed median RVIS is significantly lower than the simulated distribution of median RVIS (HK genes: observed: 32.80; simulated distribution: μ=50.01, σ=2.48; empirical P-value <1×10−5; Z-value = −6.95; dominant ID genes: observed: 18.92; simulated distribution: μ=50.02, σ=2.43; empirical P-value <1×10−5; Z-value = −12.82). The set of novel candidate ID genes has an observed median RVIS of 8.47 (simulated distribution: μ=50.05, σ=15.08; empirical P-value = 4.60×10−4; Z-value = −2.76).
For the 21 dominant ‘missense only’ genes (with at least 3 missense mutations in the absence of LoF mutations) we observe the lowest median RVIS of 3.56 (simulated distribution: μ=50.02, σ=10.42; empirical p-value <1x10−5; Z-value: -4.46) again illustrating that those known and novel candidate dominant ID genes that harbor only missense variants are among the most intolerant ID genes.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–13, Supplementary Tables 1, 7, 11–13, and Supplementary Note (PDF 2748 kb)
Supplementary Table 2
All identified de novo mutations in the RUMC cohort (XLSX 124 kb)
Supplementary Table 3
Genes significantly enriched for de novo mutations in the full RUMC cohort (XLS 337 kb)
Supplementary Table 4
List of known ID genes (XLSX 57 kb)
Supplementary Table 5
Gene statistics for the RUMC set (XLS 212 kb)
Supplementary Table 6
Gene statistics for the ID set (XLS 430 kb)
Supplementary Table 8
Gene sets and corresponding pLI and RVIS (XLS 239 kb)
Supplementary Table 9
Clustering of mutations in genes with only missense mutations (XLS 31 kb)
Supplementary Table 10
Gene statistics for the NDD set (XLSX 674 kb)
Supplementary Table 14
Gene statistics for the control set (XLS 621 kb)
About this article
Cite this article
Lelieveld, S., Reijnders, M., Pfundt, R. et al. Meta-analysis of 2,104 trios provides support for 10 new genes for intellectual disability. Nat Neurosci 19, 1194–1196 (2016). https://doi.org/10.1038/nn.4352
DOI: https://doi.org/10.1038/nn.4352