Review Article

Exome sequencing as a tool for Mendelian disease gene discovery

Nature Reviews Genetics volume 12, pages 745–755 (2011)

Abstract

Exome sequencing — the targeted sequencing of the subset of the human genome that is protein coding — is a powerful and cost-effective new tool for dissecting the genetic basis of diseases and traits that have proved to be intractable to conventional gene-discovery strategies. Over the past 2 years, experimental and analytical approaches relating to exome sequencing have established a rich framework for discovering the genes underlying unsolved Mendelian disorders. Additionally, exome sequencing is being adapted to explore the extent to which rare alleles explain the heritability of complex diseases and health-related traits. These advances also set the stage for applying exome and whole-genome sequencing to facilitate clinical diagnosis and personalized disease-risk profiling.

Key points

  • The development of methods that couple targeted capture and massively parallel DNA sequencing, termed exome sequencing, has made it possible to determine nearly all of the coding variation in an individual human genome cost-effectively.

  • Exome sequencing is a powerful and cost-effective new tool for dissecting the genetic basis of Mendelian diseases or traits that have proven intractable to conventional gene-discovery strategies.

  • Most Mendelian disorders that have been solved to date by exome sequencing have relied on comparison of variants found in a small number of unrelated or closely related affected individuals to identify shared novel or rare alleles of the same gene. An alternative to this discrete-filtering approach is to apply tests of association.

  • Exome sequencing of parent–child trios is a highly effective approach for identifying de novo coding mutations, because the occurrence of multiple de novo events within a specific gene (or within a gene family or pathway) is extremely unlikely by chance.

  • Solving the remaining several thousand Mendelian disorders by exome or whole-genome sequencing is possible and should be an imperative for the human and medical genetics community.

  • Making exome sequencing, and eventually whole-genome sequencing, widely useful, convenient and cost-effective for clinical diagnosis and screening will require overcoming a number of major challenges that currently limit its broad applicability.
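The discrete-filtering and trio approaches named in the key points can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (`gene`, `id`, `freq`), the 1% frequency cut-off, the function names and the toy data are all assumptions made for illustration.

```python
def rare_variant_genes(variants, known_variants, max_freq=0.01):
    """Return the genes in which one individual carries a novel or rare variant.

    `variants` is a list of dicts with hypothetical keys 'gene', 'id' and
    'freq' (population allele frequency, or None if never observed).
    """
    genes = set()
    for v in variants:
        is_novel = v["id"] not in known_variants
        is_rare = v["freq"] is not None and v["freq"] <= max_freq
        if is_novel or is_rare:
            genes.add(v["gene"])
    return genes


def shared_candidate_genes(exomes, known_variants, max_freq=0.01):
    """Discrete filtering: keep genes with a novel/rare variant in every affected individual."""
    gene_sets = [rare_variant_genes(v, known_variants, max_freq)
                 for v in exomes.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def de_novo_variants(child, mother, father):
    """Trio filtering: variant ids present in the child but in neither parent."""
    return set(child) - set(mother) - set(father)


# Toy data (entirely made up) for two unrelated affected individuals.
known = {"rs1", "rs2", "rs3"}
exomes = {
    "case_A": [{"gene": "GENE1", "id": "novel_a", "freq": None},
               {"gene": "GENE2", "id": "rs1", "freq": 0.20}],
    "case_B": [{"gene": "GENE1", "id": "novel_b", "freq": None},
               {"gene": "GENE3", "id": "rs2", "freq": 0.005}],
}
candidates = shared_candidate_genes(exomes, known)
trio_hits = de_novo_variants({"v1", "v2", "v3"}, {"v1"}, {"v2"})
```

In this toy example only `GENE1` carries a novel or rare variant in both cases, so it is the sole shared candidate. In practice, apparent de novo calls would also need validation, because sequencing errors in the child or missed calls in a parent produce the same pattern.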

Acknowledgements

We thank the US National Institutes of Health (NIH)/National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (Lung Grand Opportunity (GO) Sequencing Project (HL-102923 to M.J.B.), the US Women's Health Initiative (WHI) GO Sequencing Project (HL-102924), the Heart GO Sequencing Project (HL-103010), the Broad GO Sequencing Project (HL-102925) and the Seattle GO Sequencing Project (HL-102926 to D.A.N. and J.S.)) for early data release that proved useful for demonstrating filtering strategies. Our work was supported in part by grants from the NIH/NHLBI (5R01HL094976 to D.A.N. and J.S.), the NIH/National Human Genome Research Institute (5R21HG004749 to J.S., 1RC2HG005608 to M.J.B., D.A.N. and J.S., and 5R01HG004316 to H.K.T.), NIH/National Institute of Environmental Health Sciences (HHSN273200800010C to D.N.), the Life Sciences Discovery Fund (2065508 and 0905001), the Washington Research Foundation and the NIH/National Institute of Child Health and Human Development (1R01HD048895 to M.J.B.). S.B.N. is supported by the Agency for Science, Technology and Research, Singapore. A.W.B. is supported by a training fellowship from the NIH/National Human Genome Research Institute (T32HG00035).

Author information

Affiliations

  1. Department of Pediatrics, University of Washington, Health Sciences Building RR349, 1959 N.E. Pacific Street, Seattle, Washington 98195-6320, USA.

    • Michael J. Bamshad
    • Abigail W. Bigham
    • Holly K. Tabor
  2. Department of Genome Sciences, University of Washington, Foege Building, S-210 3720 15th Avenue N.E., Seattle, Washington 98195-5065, USA.

    • Michael J. Bamshad
    • Sarah B. Ng
    • Deborah A. Nickerson
    • Jay Shendure
  3. Department of Anthropology, University of Michigan, 222C West Hall, 1085 S. University Avenue, Ann Arbor, Michigan 48014, USA.

    • Abigail W. Bigham
  4. Treuman Katz Center for Pediatric Bioethics, Seattle Children's Research Institute, M/S C9S-6, 1900 Ninth Avenue, Seattle, Washington 98101, USA.

    • Holly K. Tabor
  5. Department of Biostatistics, University of Washington, Health Sciences Building F-658, 1959 N.E. Pacific Street, Seattle, Washington 98195-6320, USA.

    • Mary J. Emond

Authors

  1. Michael J. Bamshad

  2. Sarah B. Ng

  3. Abigail W. Bigham

  4. Holly K. Tabor

  5. Mary J. Emond

  6. Deborah A. Nickerson

  7. Jay Shendure

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Michael J. Bamshad or Jay Shendure.

Supplementary information

PDF files

  1. Supplementary information S1 (table)

    Diseases identified to date via exome sequencing

Glossary

Mendelian disorders

Phenotypes caused by a mutation (or mutations) in a single gene and inherited in a dominant, recessive or X-linked pattern.

Penetrance

The proportion of individuals with a specific phenotype among carriers of a particular genotype.

Locus heterogeneity

The appearance of phenotypically similar characteristics resulting from mutations at different genetic loci. Differences in effect size or in replication between studies and samples are often ascribed to different loci leading to the same disease.

Genome-wide association studies

(GWASs). Studies that search for a population association between a phenotype and a particular allele by screening loci (most commonly by genotyping SNPs) across the entire genome.

Complex traits

Traits that are influenced by the environment and/or through a combination of variants in at least several genes, each of which has a small effect.

Heritability

The proportion of the total phenotypic variation in a given characteristic that can be attributed to additive genetic effects.
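Because this definition restricts the genetic contribution to additive effects, it corresponds to what is usually called narrow-sense heritability. As a sketch in standard quantitative-genetics notation (the symbols below are conventional and are not taken from the article):

```latex
h^2 = \frac{V_A}{V_P}, \qquad V_P = V_A + V_D + V_I + V_E
```

Here $V_P$ is the total phenotypic variance, $V_A$ the additive genetic variance, $V_D$ and $V_I$ the dominance and epistatic (interaction) variances, and $V_E$ the environmental variance, under the simplifying assumption that these components are independent.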

Next-generation DNA sequencing

Highly parallelized DNA-sequencing technologies that produce many hundreds of thousands or millions of short reads (25–500 bp) for a low cost and in a short time.

Exome

The subset of a genome that is protein coding. In addition to the exome, commercially available capture probes target non-coding exons, sequences flanking exons and microRNAs.

Homozygosity mapping

Narrowing down the location of a gene underlying a trait by searching for regions of the genome in which both chromosomal segments are inherited identically-by-descent.
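A rough way to picture this search is to scan ordered genotype calls for long runs of consecutive homozygous markers, which serve as a proxy for segments inherited identically-by-descent. The sketch below assumes a hypothetical input of `(position, allele1, allele2)` tuples for one chromosome and an arbitrary minimum run length. Real analyses typically also tolerate genotyping errors and weigh marker density or genetic distance.

```python
def homozygous_runs(genotypes, min_markers=3):
    """Return (start_pos, end_pos, n_markers) for runs of consecutive homozygous calls.

    `genotypes` is a list of (position, allele1, allele2) tuples, sorted by
    position along a single chromosome (hypothetical input format).
    """
    runs = []
    run = []  # positions in the run currently being extended
    for pos, a1, a2 in genotypes:
        if a1 == a2:
            run.append(pos)
        else:
            # A heterozygous call ends the current run.
            if len(run) >= min_markers:
                runs.append((run[0], run[-1], len(run)))
            run = []
    # Flush a run that reaches the end of the chromosome.
    if len(run) >= min_markers:
        runs.append((run[0], run[-1], len(run)))
    return runs


# Toy calls (made up): three homozygous markers, one heterozygous, two homozygous.
calls = [(100, "A", "A"), (200, "G", "G"), (300, "C", "C"),
         (400, "A", "T"), (500, "T", "T"), (600, "G", "G")]
roh = homozygous_runs(calls)
```

With the default threshold of three markers, only the first stretch qualifies. Intersecting such runs across affected members of a consanguineous pedigree is one way the candidate interval could then be narrowed.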

Sample indexing

Sequencing more than one sample in a single sequencing lane.

RefSeq

An open-access, annotated and curated collection of publicly available nucleotide sequences (DNA and RNA) and their protein translations.

Ultra-conserved elements

Subsequences of the genome that appear to be under extremely high levels of sequence constraint based on phylogenetic comparisons.

Purifying selection

Selection against a functionally deleterious allele.

Parametric tests

Statistical significance tests for which P values are based on models or assumed formulae for the distribution of the test statistic.

Permutation test

A statistical test in which the data are randomized many times to determine the statistical significance of the experimental outcome.
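As a minimal sketch of the idea, the Python example below shuffles case and control labels to judge whether an observed difference in mean rare-variant burden could arise by chance. The function name, the choice of test statistic (absolute difference in means) and the toy data are assumptions for illustration, not a method described in the article.

```python
import random


def permutation_p_value(case_values, control_values, n_permutations=2000, seed=0):
    """Two-sided permutation P value for a difference in group means."""
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    observed = abs(mean(case_values) - mean(control_values))
    pooled = list(case_values) + list(control_values)
    n_cases = len(case_values)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Toy data (made up): 1 = carries a rare variant in the gene, 0 = does not.
cases = [1] * 8 + [0] * 2
controls = [0] * 10
p_value = permutation_p_value(cases, controls)
```

Unlike a parametric test, the null distribution here is built from the data themselves, which is why permutation is often preferred when the distribution of the test statistic is not known.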

Multiplex families

Families in which two or more individuals are affected by the same disorder.

Identity-by-descent

Alleles on different chromosomes that are identical because they are inherited from a shared common ancestor.

Identity-by-state

Alleles on different chromosomes that are identical but do not share a common ancestor with respect to a pedigree or population of interest.

Haplotype

A combination of alleles on a single chromosome.

Processed pseudogenes

Copies of the coding sequences of genes that lack promoters and introns, contain poly(A) tails and are flanked by target-site duplications.

Posterior probability

The probability of an event after combining prior knowledge of the event with the likelihood of that event given by observed data.
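This combination of prior and likelihood is Bayes' theorem. Written for a hypothesis $H$ (for example, a particular genotype) and observed data $D$ (for example, sequence reads), with symbols chosen here for illustration:

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}, \qquad
P(D) = \sum_{i} P(D \mid H_i)\, P(H_i)
```

$P(H)$ is the prior, $P(D \mid H)$ the likelihood and $P(H \mid D)$ the posterior probability. The sum runs over all competing hypotheses $H_i$.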

Bootstrap

A type of statistical analysis that is generally used for measuring the reliability of a sample estimate. It proceeds by the repeated sampling, with replacement, of the original data set. In the application described here bootstrapping is used to assess the probability of identifying the causal variant for a genetic condition in a population.
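The resampling step can be sketched in a few lines of Python. This is a generic percentile bootstrap for the reliability of a sample estimate, not the specific causal-variant analysis referred to in the definition. The function names, the number of resamples and the toy data are assumptions.

```python
import random


def bootstrap_estimates(data, statistic, n_resamples=1000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(estimates, lower=2.5, upper=97.5):
    """Approximate percentile interval from the bootstrap estimates."""
    ordered = sorted(estimates)

    def pick(percent):
        index = int(round(percent / 100 * (len(ordered) - 1)))
        return ordered[index]

    return pick(lower), pick(upper)


def sample_mean(xs):
    return sum(xs) / len(xs)


# Toy data (made up): 1 = causal variant identified in a simulated study, 0 = not.
outcomes = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
estimates = bootstrap_estimates(outcomes, sample_mean)
low, high = percentile_interval(estimates)
```

The spread of `estimates` indicates how much the sample proportion would be expected to vary if the study were repeated on similar data.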

Incidental findings

Findings that are not explicitly related to the original research hypotheses (that is, primary findings).

About this article

DOI

https://doi.org/10.1038/nrg3031
