Genome sequencing identifies major causes of severe intellectual disability

Journal name: Nature
Volume: 511
Pages: 344–347
DOI: 10.1038/nature13394

Severe intellectual disability (ID) occurs in 0.5% of newborns and is thought to be largely genetic in origin (refs 1, 2). The extensive genetic heterogeneity of this disorder requires genome-wide detection of all types of genetic variation. Microarray studies and, more recently, exome sequencing have demonstrated the importance of de novo copy number variations (CNVs) and single-nucleotide variations (SNVs) in ID, but the majority of cases remain undiagnosed (refs 3–6). Here we applied whole-genome sequencing to 50 patients with severe ID and their unaffected parents. None of the included patients had received a molecular diagnosis after extensive genetic prescreening, including microarray-based CNV studies and exome sequencing. Despite this prescreening, we identified 84 de novo SNVs affecting the coding region, which showed a statistically significant enrichment of loss-of-function mutations as well as an enrichment of genes previously implicated in ID-related disorders. In addition, we identified eight de novo CNVs, including single-exon and intra-exonic deletions, as well as interchromosomal duplications. These CNVs affected known ID genes more frequently than expected. On the basis of diagnostic interpretation of all de novo variants, a conclusive genetic diagnosis was reached in 20 patients. Together with one compound heterozygous CNV causing disease in a recessive mode, this results in a diagnostic yield of 42% (21 of 50 patients) in this extensively studied cohort, and a cumulative estimate of 62% in an unselected cohort. These results suggest that de novo SNVs and CNVs affecting the coding region are a major cause of severe ID. Genome sequencing can be applied as a single genetic test to reliably identify and characterize the full spectrum of genetic variation, providing a genetic diagnosis in the majority of patients with severe ID.
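The 42% diagnostic yield quoted above is a simple proportion. The short sketch below reproduces it from the counts stated in the abstract. The helper name `diagnostic_yield` is ours, used only for illustration, and does not come from the authors' pipeline.

```python
from fractions import Fraction


def diagnostic_yield(n_diagnosed: int, n_total: int) -> Fraction:
    """Fraction of patients with a conclusive genetic diagnosis."""
    if n_total <= 0 or not 0 <= n_diagnosed <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_diagnosed <= n_total")
    return Fraction(n_diagnosed, n_total)


# Counts taken from the abstract: 20 diagnoses based on de novo variants
# plus 1 compound heterozygous CNV (recessive), in a cohort of 50 patients.
de_novo_diagnoses = 20
recessive_diagnoses = 1
cohort_size = 50

wgs_yield = diagnostic_yield(de_novo_diagnoses + recessive_diagnoses, cohort_size)
print(f"{float(wgs_yield):.0%}")  # prints 42%
```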

Figures

    Figure 1: Study design and diagnostic yield in patients with severe ID per technology.

    Diagnostic yield for patients with severe ID (IQ <50), specified by technology: genomic microarrays, WES and WGS. Percentages indicate the proportion of patients in whom a conclusive cause was identified using the specified technology. Brackets indicate the group of patients in whom no genetic cause was identified and whose DNA was subsequently analysed using the next technology. WES data are updated with permission from ref. 6 (see Supplementary Methods).
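Figure 1 describes a tiered design: each technology is applied only to patients who remained undiagnosed after the previous one. If each tier's percentage is read as a conditional yield among the remaining patients, one simple way to combine tiers into a cumulative yield is 1 - prod(1 - y_i). The sketch below implements that formula. It is not necessarily how the cumulative estimate in this study was derived (see Supplementary Methods). The tier values in the usage line are toy numbers, not the figure's percentages.

```python
from math import prod


def cumulative_yield(conditional_yields: list[float]) -> float:
    """Overall diagnostic yield of sequential testing tiers.

    Each entry is the fraction diagnosed by one tier among the patients
    left undiagnosed by all earlier tiers (assumed conditional yields).
    """
    for y in conditional_yields:
        if not 0.0 <= y <= 1.0:
            raise ValueError(f"yield out of range: {y}")
    still_undiagnosed = prod(1.0 - y for y in conditional_yields)
    return 1.0 - still_undiagnosed


# Toy tiers, each resolving half of the remaining patients.
print(cumulative_yield([0.5, 0.5]))  # 0.75
```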

    Figure 2: Detected duplication of a chromosome 4 region into the X-chromosomal IQSEC2 gene.

    a–d, Graphical representation of a de novo duplication–insertion event in patient 31. a, Circos plot with chromosome numbers and de novo mutations in the outer shell. Red bars represent potential de novo SNVs across the genome, whereas blue lines represent potential de novo CNVs/structural variants. The inner shell shows the location of known ID genes (red marks) with the respective gene names. The green line illustrates a duplication event on chromosome 4 that is inserted into chromosome X. b, Details of the inserted duplication on chromosome X. The last six exons of TENM3 are inserted in inverted orientation into intron 2 of IQSEC2, which is predicted to result in an in-frame IQSEC2–TENM3 fusion gene. ex., exon. c, d, PCR (c) and Sanger sequencing (d) of the complementary DNA (cDNA) junction fragment in patient 31. Lanes in c: M, 100-bp marker; 1, patient cDNA with cycloheximide treatment; 2, patient cDNA without cycloheximide treatment; 3, control cDNA with cycloheximide treatment; 4, control cDNA without cycloheximide treatment. Our data verify the presence of a fusion gene in patient 31 that appears to escape nonsense-mediated decay.
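Whether a fusion such as IQSEC2–TENM3 is in frame can be checked with codon-phase arithmetic. The junction preserves the 3' partner's reading frame when two counts leave the same remainder modulo 3: the coding bases contributed by the 5' partner, and the coding bases that precede the 3' partner's first retained exon. The sketch below shows only that check. The helper `fusion_in_frame` is hypothetical, and the lengths are toy values rather than real IQSEC2 or TENM3 annotation. The sketch ignores premature stop codons and alternative splicing.

```python
def fusion_in_frame(upstream_cds_bases: int, downstream_offset_bases: int) -> bool:
    """Check codon-phase compatibility at a splice junction between two genes.

    upstream_cds_bases: coding bases of the 5' partner, counted from its start
        codon through the last exon kept in the fusion.
    downstream_offset_bases: coding bases of the 3' partner that precede its
        first exon kept in the fusion, counted from its own start codon.
    """
    if upstream_cds_bases < 0 or downstream_offset_bases < 0:
        raise ValueError("base counts must be non-negative")
    # Same remainder modulo 3: the 3' partner's codons are read in their native frame.
    return upstream_cds_bases % 3 == downstream_offset_bases % 3


# Toy lengths only, not real IQSEC2 or TENM3 annotation.
print(fusion_in_frame(301, 4000))  # True  (301 % 3 == 1 == 4000 % 3)
print(fusion_in_frame(300, 4000))  # False (0 != 1)
```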

    Figure 3: Pie chart showing role of de novo mutations in severe ID.

    Contribution of genetic causes to severe ID, based on the cumulative estimates provided per technology. Our data indicate that de novo mutations are a major cause of severe ID. Note: small variants include SNVs and insertion/deletion events, whereas large variants include structural variants and CNVs (>500 bp).
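The 500-bp boundary in this legend amounts to a simple binning rule. The minimal sketch below assumes each call has already been reduced to an event size in base pairs, with 1 bp for an SNV. The function names are illustrative.

```python
from collections import Counter

LARGE_VARIANT_THRESHOLD_BP = 500  # Figure 3 legend: large variants are >500 bp


def variant_size_class(event_size_bp: int) -> str:
    """Return 'small' (SNVs, indels) or 'large' (CNVs/structural variants >500 bp)."""
    if event_size_bp < 1:
        raise ValueError("event size must be at least 1 bp")
    return "large" if event_size_bp > LARGE_VARIANT_THRESHOLD_BP else "small"


def tally_by_size(event_sizes: list[int]) -> Counter:
    """Count calls per size class."""
    return Counter(variant_size_class(size) for size in event_sizes)


# Hypothetical calls: two SNVs, a 3-bp deletion and a ~2-kb deletion.
print(tally_by_size([1, 1, 3, 2000]))  # Counter({'small': 3, 'large': 1})
```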

    Extended Data Fig. 1: Boxplots of rare missense burden in different gene sets.

    Boxplots showing differences in tolerance to rare missense variation in the general population. For each gene set, the vertical axis shows the distribution of the per-gene ratio of rare missense variants to rare synonymous variants, where rare means an allele frequency <1% in the NHLBI Exome Sequencing Project. From left to right, the following gene sets are depicted: all 18,424 RefSeq genes, 170 loss-of-function tolerant genes from ref. 30, all 528 known ID genes (Supplementary Table 10), all 628 candidate ID genes (Supplementary Table 11), 9 known ID genes in which de novo mutations were identified in this study (Supplementary Table 8), and 10 candidate ID genes in which de novo mutations were identified in this study (Supplementary Table 8).
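The per-gene statistic plotted here is a ratio of counts. The minimal sketch below computes it, along with the quartiles that form the box of a boxplot, using made-up gene names and counts. Genes without rare synonymous variants are skipped; this handling of a zero denominator is our assumption. `statistics.quantiles` uses its default 'exclusive' method, which may differ slightly from the quartile convention of a given plotting package.

```python
from statistics import quantiles


def missense_synonymous_ratios(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Per-gene ratio of rare missense to rare synonymous variant counts.

    counts maps gene -> (rare_missense, rare_synonymous). Genes with no rare
    synonymous variants are skipped to avoid dividing by zero (our assumption).
    """
    return {gene: mis / syn for gene, (mis, syn) in counts.items() if syn > 0}


def box_summary(values: list[float]) -> tuple[float, float, float]:
    """Lower quartile, median and upper quartile, i.e. the box of a boxplot."""
    q1, q2, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return q1, q2, q3


# Made-up counts for three hypothetical genes.
ratios = missense_synonymous_ratios({"GENE_A": (4, 2), "GENE_B": (3, 3), "GENE_C": (5, 0)})
print(ratios)  # {'GENE_A': 2.0, 'GENE_B': 1.0}
print(box_summary([1.0, 2.0, 3.0, 4.0, 5.0]))  # (1.5, 3.0, 4.5)
```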

    Extended Data Fig. 2: Structural variant involving STAG1 (patient 40).

    a–c, CNV identified using WGS in patient 40, including the STAG1 gene. a, Chromosome 3 profile (log2 test over reference (T/R) ratios) based on read-depth information for the patient, father and mother. The black arrow points to the de novo event in patient 40. b, Genic content of the deletion. Grey arrows show the primers used to amplify the junction fragment. c, Details of the proximal and distal breakpoints, showing the ‘fragmented’ sequence at both ends. Breakpoints are provided in Extended Data Table 1.
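Read-depth CNV profiles of this kind compare coverage in fixed genomic windows between a test sample and a reference. The minimal sketch below assumes per-window read counts are already available. It scales counts by library size and converts them to log2 T/R ratios. A window is flagged as a candidate de novo loss when the child's ratio drops below a threshold while both parents stay near zero. The pseudocount, the thresholds and the flat toy reference profile are all illustrative assumptions. Production callers typically add GC correction and segmentation, which are omitted here.

```python
import math


def log2_ratios(test_counts: list[int], ref_counts: list[int]) -> list[float]:
    """Per-window log2(test/reference) read-depth ratio after scaling by library size."""
    if len(test_counts) != len(ref_counts):
        raise ValueError("test and reference must cover the same windows")
    t_total, r_total = sum(test_counts), sum(ref_counts)
    # A 0.5 pseudocount keeps windows with zero reads finite (our assumption).
    return [
        math.log2(((t + 0.5) / t_total) / ((r + 0.5) / r_total))
        for t, r in zip(test_counts, ref_counts)
    ]


def de_novo_loss_windows(child: list[float], father: list[float], mother: list[float],
                         loss: float = -0.6, normal: float = 0.3) -> list[int]:
    """Window indices where the child shows a loss but neither parent deviates.

    loss and normal are illustrative log2 thresholds, not values from the paper;
    a heterozygous deletion is expected near log2(1/2) = -1.
    """
    return [
        i
        for i, (c, f, m) in enumerate(zip(child, father, mother))
        if c <= loss and abs(f) < normal and abs(m) < normal
    ]


reference = [100] * 10  # hypothetical flat reference profile
child = log2_ratios([100] * 4 + [50] * 2 + [100] * 4, reference)
father = log2_ratios([100] * 10, reference)
mother = log2_ratios([100] * 10, reference)
print(de_novo_loss_windows(child, father, mother))  # [4, 5]
```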

    Extended Data Fig. 3: Structural variant involving SHANK3 (patient 5).

    a–c, CNV identified using WGS in patient 5, including the SHANK3 gene. a, Detail of the chromosome 22 profile (log2 T/R ratios) based on read-depth information for the patient, father and mother. Red dots in the top panel show ratios indicating the de novo deletion in patient 5. b, Genic content of the deletion. c, Sanger validation of the junction fragment. The dotted vertical line marks the breakpoint. Sequence to its left originates from the region proximal to SHANK3, and sequence to its right originates from the region distal to ACR. Breakpoints are provided in Extended Data Table 1.
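Junction-fragment validation relies on the fact that a deletion joins sequence proximal to the breakpoint directly to sequence distal to it. The sketch below derives that expected junction from a reference string and checks whether a read spans it. The toy sequence and the 0-based, half-open coordinates are assumptions for illustration. At real breakpoints, microhomology can shift the apparent junction by a few bases.

```python
def junction_sequence(reference: str, del_start: int, del_end: int, flank: int = 10) -> str:
    """Sequence expected across a deletion breakpoint.

    The deletion is [del_start, del_end) in 0-based, half-open coordinates on
    `reference`; `flank` bases on each side are joined as they would be on the
    deleted allele.
    """
    if not 0 < del_start < del_end < len(reference):
        raise ValueError("deletion must lie strictly inside the reference")
    left = reference[max(0, del_start - flank):del_start]  # proximal side
    right = reference[del_end:del_end + flank]  # distal side
    return left + right


def read_spans_junction(read: str, reference: str, del_start: int, del_end: int,
                        flank: int = 10) -> bool:
    """True if the read contains the expected junction sequence."""
    return junction_sequence(reference, del_start, del_end, flank) in read


toy_reference = "AAAAACCCCCGGGGGTTTTT"  # 20 bp, illustrative only
print(junction_sequence(toy_reference, 5, 15, flank=3))  # AAATTT
print(read_spans_junction("AAAAATTTTT", toy_reference, 5, 15, flank=3))  # True
```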

    Extended Data Fig. 4: Single-exon deletion involving SMC1A (patient 48).

    a, Schematic depiction of the deletion identified in patient 48, involving a single exon of SMC1A. The pink horizontal bar highlights the exon deleted in the patient. b, Genomic details of the deletion encompassing exon 16, with Sanger sequence validation of the breakpoints. The junction is indicated by a black vertical dotted line. Breakpoints are provided in Extended Data Table 1.
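Deciding which exons a deletion removes is an interval-overlap test. The minimal sketch below uses hypothetical exon coordinates on a single chromosome, in half-open form. These are not the real SMC1A annotation.

```python
def deleted_exons(exons: dict[int, tuple[int, int]], del_start: int, del_end: int) -> list[int]:
    """Exon numbers wholly or partly removed by a deletion.

    Exons and the deletion use half-open [start, end) coordinates on one chromosome.
    """
    return sorted(
        number
        for number, (start, end) in exons.items()
        if start < del_end and del_start < end  # the two intervals overlap
    )


# Hypothetical exon coordinates, not real SMC1A annotation.
toy_exons = {15: (1000, 1100), 16: (1500, 1620), 17: (2000, 2150)}
print(deleted_exons(toy_exons, 1400, 1700))  # [16]
```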

    Extended Data Fig. 5: Intra-exonic deletion involving MECP2 (patient 18).

    a, Schematic depiction of the deletion identified in patient 18, which is located within exon 4 of MECP2. Initial Sanger sequencing in a diagnostic setting could not detect the deletion because the deletion removed the binding sites of primers used to amplify exon 4 (FW2 and RV1). Multiplex ligation-dependent probe amplification (MLPA) analysis for CNV detection gave normal results because the MLPA probe-binding sites were located just outside the deleted region. b, Combining primers FW1 and RV2 amplified the junction fragment, clearly showing the deletion within exon 4. Of note, the background underneath the Sanger sequence derives from the wild-type allele. Breakpoints are provided in Extended Data Table 1.
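The reasoning in this legend can be expressed as a small in-silico PCR. A primer whose binding site lies in the deleted segment gives no product from the deleted allele. A pair flanking the deletion gives a product shortened by the deletion length. The coordinates below form a hypothetical layout in which exon 4 is amplified as FW1–RV1 and FW2–RV2, with the FW2 and RV1 sites inside the deletion. They are not the actual MECP2 primer positions.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) reference coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def product_from_deleted_allele(fw: Interval, rv: Interval, deletion: Interval) -> Optional[int]:
    """PCR product size (bp) from the deletion-carrying allele, or None.

    None means a primer-binding site lies in the deleted segment, so that
    allele cannot be amplified by this primer pair.
    """
    if fw[0] >= rv[0]:
        raise ValueError("forward primer must lie upstream of the reverse primer")
    if overlaps(fw, deletion) or overlaps(rv, deletion):
        return None
    size = rv[1] - fw[0]  # product size on the wild-type allele
    if fw[1] <= deletion[0] and deletion[1] <= rv[0]:
        size -= deletion[1] - deletion[0]  # deletion lies between the primers
    return size


# Hypothetical layout, not real MECP2 coordinates.
DELETION = (300, 500)
FW1, FW2 = (100, 120), (320, 340)
RV1, RV2 = (450, 470), (700, 720)

print(product_from_deleted_allele(FW1, RV1, DELETION))  # None
print(product_from_deleted_allele(FW2, RV2, DELETION))  # None
print(product_from_deleted_allele(FW1, RV2, DELETION))  # 420
```

In this toy layout, the wild-type allele would give a 620-bp FW1–RV2 product. Both alleles therefore amplify with the flanking pair, whereas each original pair amplifies only the wild-type allele.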

    Extended Data Fig. 6: Confirmation of mosaic mutations in PIAS1, HIVEP2 and KANSL2.

    a–c, Approaches used to confirm the presence of mosaic mutations in PIAS1 (a), HIVEP2 (b) and KANSL2 (c). Images and read-depth information showing the base counts in the BAM files (left) indicated that the variant and wild-type alleles were not present in the 50%/50% ratio expected for a heterozygous variant. Sanger sequencing (middle) then confirmed that the variant was present in the patient and absent in the parents (parental data not shown), again indicating that the mutant allele is underrepresented. Guided by these two observations, amplicon-based deep sequencing using Ion Torrent confirmed the mosaic state of the mutations (right). On the basis of deep sequencing, the percentages of mosaicism for PIAS1, HIVEP2 and KANSL2 were estimated at 21%, 22% and 20%, respectively.
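The first clue to mosaicism here is an allele balance far from the 50% expected for a heterozygous variant. The sketch below quantifies that in two steps: it computes the variant allele fraction, then runs an exact two-sided binomial test against 0.5. The read counts in the usage lines are hypothetical, and the function names are ours. The sketch reports read fractions only; converting them into a fraction of cells depends on further assumptions.

```python
from math import exp, lgamma, log


def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads supporting the variant allele."""
    if total_reads <= 0 or not 0 <= alt_reads <= total_reads:
        raise ValueError("need total_reads > 0 and 0 <= alt_reads <= total_reads")
    return alt_reads / total_reads


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of k successes in n trials against rate p.

    Sums the probabilities of all outcomes no more likely than the observed one;
    terms are computed in log space to avoid overflow at deep-sequencing depths.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    probs = [
        exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))
        for i in range(n + 1)
    ]
    observed = probs[k]
    # Small relative tolerance so floating-point ties count as equally extreme.
    return min(1.0, sum(x for x in probs if x <= observed * (1 + 1e-7)))


# Hypothetical counts: 21 variant reads out of 100.
print(variant_allele_fraction(21, 100))  # 0.21
print(binomial_two_sided_p(21, 100) < 1e-6)  # True
```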

    Extended Data Fig. 7: Compound heterozygous structural variation affecting VPS13B (patient 12).

    a, b, CNVs of VPS13B identified using WGS in patient 12. a, Schematic representation of VPS13B, with vertical bars indicating coding exons. Two deletions were identified in patient 12. One, of ~122 kb, was inherited from his father. The other, of ~2 kb, was inherited from his mother and encompassed only a single exon. b, Both CNV junction fragments were subsequently validated by Sanger sequencing. Left, junction fragment from the paternally inherited deletion. Right, junction fragment from the maternally inherited deletion. Breakpoints are provided in Extended Data Table 1.
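The recessive diagnosis in patient 12 rests on both parental alleles of VPS13B being disrupted. The minimal sketch below checks that condition. It assumes variant calls have already been annotated with their parental origin and with whether they disrupt the gene. `GENE_X` is a placeholder, and the sketch does not attempt to phase de novo variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    gene: str
    origin: str  # "paternal", "maternal" or "de novo"
    disrupts_gene: bool  # e.g. a deletion removing coding exons


def biallelic_genes(calls: list[VariantCall]) -> list[str]:
    """Genes hit by a disrupting variant on both the paternal and the maternal allele."""
    origins_by_gene: dict[str, set[str]] = {}
    for call in calls:
        if call.disrupts_gene and call.origin in ("paternal", "maternal"):
            origins_by_gene.setdefault(call.gene, set()).add(call.origin)
    return sorted(
        gene for gene, origins in origins_by_gene.items()
        if origins == {"paternal", "maternal"}
    )


calls = [
    VariantCall("VPS13B", "paternal", True),  # ~122-kb deletion
    VariantCall("VPS13B", "maternal", True),  # ~2-kb single-exon deletion
    VariantCall("GENE_X", "paternal", True),  # hypothetical single hit
]
print(biallelic_genes(calls))  # ['VPS13B']
```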

Tables

    Extended Data Table 1: Large variants of potential clinical relevance identified using WGS and probability of exonic CNVs occurring in affected and control individuals for these loci
    Extended Data Table 2: De novo SNVs of potential clinical relevance identified using WGS

References

  1. Ropers, H. H. Genetics of early onset cognitive impairment. Annu. Rev. Genomics Hum. Genet. 11, 161–187 (2010)
  2. Mefford, H. C., Batshaw, M. L. & Hoffman, E. P. Genomics, intellectual disability, and autism. N. Engl. J. Med. 366, 733–743 (2012)
  3. de Vries, B. B. et al. Diagnostic genome profiling in mental retardation. Am. J. Hum. Genet. 77, 606–616 (2005)
  4. Vissers, L. E. et al. A de novo paradigm for mental retardation. Nature Genet. 42, 1109–1112 (2010)
  5. Rauch, A. et al. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 380, 1674–1682 (2012)
  6. de Ligt, J. et al. Diagnostic exome sequencing in persons with severe intellectual disability. N. Engl. J. Med. 367, 1921–1929 (2012)
  7. Lupski, J. R. et al. Whole-genome sequencing in a patient with Charcot–Marie–Tooth neuropathy. N. Engl. J. Med. 362, 1181–1191 (2010)
  8. Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010)
  9. Michaelson, J. J. et al. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151, 1431–1442 (2012)
  10. Kong, A. et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475 (2012)
  11. Jiang, Y. H. et al. Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Am. J. Hum. Genet. 93, 249–263 (2013)
  12. O’Roak, B. J. et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 246–250 (2012)
  13. Iossifov, I. et al. De novo gene disruptions in children on the autistic spectrum. Neuron 74, 285–299 (2012)
  14. Neale, B. M. et al. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485, 242–245 (2012)
  15. Sanders, S. J. et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237–241 (2012)
  16. Epi4K Consortium & Epilepsy Phenome/Genome Project. De novo mutations in epileptic encephalopathies. Nature 501, 217–221 (2013)
  17. Gulsuner, S. et al. Spatial and temporal mapping of de novo mutations in schizophrenia to a fetal prefrontal cortical network. Cell 154, 518–529 (2013)
  18. Xu, B. et al. Exome sequencing supports a de novo mutational paradigm for schizophrenia. Nature Genet. 43, 864–868 (2011)
  19. Girard, S. L. et al. Increased exonic de novo mutation rate in individuals with schizophrenia. Nature Genet. 43, 860–863 (2011)
  20. Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013)
  21. Rippey, C. et al. Formation of chimeric genes by copy-number variation as a mutational mechanism in schizophrenia. Am. J. Hum. Genet. 93, 697–710 (2013)
  22. Biesecker, L. G. & Spinner, N. B. A genomic view of mosaicism and human disease. Nature Rev. Genet. 14, 307–320 (2013)
  23. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
  24. Bell, J. B. D., Sistermans, E. & Ramsden, S. C. Practice guidelines for the Interpretation and Reporting of Unclassified Variants (UVs) in Clinical Molecular Genetics (The UK Clinical Molecular Genetics Society and the Dutch Society of Clinical Genetic Laboratory Specialists, 2007)
  25. Berg, J. S., Khoury, M. J. & Evans, J. P. Deploying whole genome sequencing in clinical practice and public health: meeting the challenge one bin at a time. Genet. Med. 13, 499–504 (2011)
  26. Vulto-van Silfhout, A. T. et al. Clinical significance of de novo and inherited copy-number variation. Hum. Mutat. 34, 1679–1687 (2013)
  27. Hehir-Kwa, J. Y., Pfundt, R., Veltman, J. A. & de Leeuw, N. Pathogenic or not? Assessing the clinical relevance of copy number variants. Clin. Genet. 84, 415–421 (2013)
  28. Kolehmainen, J. et al. Cohen syndrome is caused by mutations in a novel gene, COH1, encoding a transmembrane protein with a presumed role in vesicle-mediated sorting and intracellular protein transport. Am. J. Hum. Genet. 72, 1359–1369 (2003)
  29. Carnevali, P. et al. Computational techniques for human genome resequencing using mated gapped reads. J. Comput. Biol. 19, 279–292 (2012)
  30. MacArthur, D. G. et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823–828 (2012)

Author information

  1. These authors contributed equally to this work.

    • Christian Gilissen,
    • Jayne Y. Hehir-Kwa,
    • Han G. Brunner,
    • Lisenka E. L. M. Vissers &
    • Joris A. Veltman

Affiliations

  1. Department of Human Genetics, Radboud Institute for Molecular Life Sciences and Donders Centre for Neuroscience, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, the Netherlands

    • Christian Gilissen,
    • Jayne Y. Hehir-Kwa,
    • Djie Tjwan Thung,
    • Maartje van de Vorst,
    • Bregje W. M. van Bon,
    • Marjolein H. Willemsen,
    • Michael Kwint,
    • Irene M. Janssen,
    • Alexander Hoischen,
    • Annette Schenck,
    • Tan Bo,
    • Rolph Pfundt,
    • Helger G. Yntema,
    • Bert B. A. de Vries,
    • Tjitske Kleefstra,
    • Han G. Brunner,
    • Lisenka E. L. M. Vissers &
    • Joris A. Veltman
  2. Complete Genomics Inc., 2071 Stierlin Court, Mountain View, California 94043, USA

    • Richard Leach,
    • Robert Klein &
    • Rick Tearle
  3. State Key Laboratory of Medical Genetics, Central South University, 110 Xiangya Road, Changsha, Hunan 410078, China

    • Tan Bo
  4. Department of Clinical Genetics, Maastricht University Medical Centre, Universiteitssingel 50, 6229 ER Maastricht, the Netherlands

    • Han G. Brunner &
    • Joris A. Veltman

Contributions

Laboratory work: M.K., I.M.J., T.B., A.H., L.E.L.M.V. Clinical investigation: B.W.M.v.B., M.H.W., B.B.A.d.V., T.K., H.G.B. Data analysis: C.G., J.Y.H.-K., D.T.T., M.v.d.V., R.T. Generation of ID gene list: C.G., A.S., R.P., H.G.Y., T.K., L.E.L.M.V. Data interpretation: L.E.L.M.V., R.P., H.G.Y. Study design: J.A.V., H.G.B., R.L., R.K. Supervision of the study: H.G.B., L.E.L.M.V., J.A.V. Manuscript writing: C.G., J.Y.H.-K., H.G.B., L.E.L.M.V., J.A.V.

Competing financial interests

R.L., R.K. and R.T. are employees of Complete Genomics Inc.

Data included in this manuscript have been deposited at the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/home) under accession number EGAS00001000769.

Supplementary information

  1. Supplementary Information

    This file contains Supplementary Methods, Supplementary Tables 1–15 and Supplementary References.