Exome sequencing — the targeted sequencing of the subset of the human genome that is protein coding — is a powerful and cost-effective new tool for dissecting the genetic basis of diseases and traits that have proved to be intractable to conventional gene-discovery strategies. Over the past 2 years, experimental and analytical approaches relating to exome sequencing have established a rich framework for discovering the genes underlying unsolved Mendelian disorders. Additionally, exome sequencing is being adapted to explore the extent to which rare alleles explain the heritability of complex diseases and health-related traits. These advances also set the stage for applying exome and whole-genome sequencing to facilitate clinical diagnosis and personalized disease-risk profiling.
The development of methods that couple targeted capture and massively parallel DNA sequencing (termed exome sequencing) has made it possible to determine cost-effectively nearly all of the coding variation in an individual human genome.
Exome sequencing is a powerful and cost-effective new tool for dissecting the genetic basis of Mendelian diseases or traits that have proven intractable to conventional gene-discovery strategies.
Most Mendelian disorders that have been solved to date by exome sequencing have relied on comparison of variants found in a small number of unrelated or closely related affected individuals to identify shared novel or rare alleles of the same gene. An alternative to this discrete-filtering approach is to apply tests of association.
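The discrete-filtering idea described above can be illustrated with a minimal Python sketch. It assumes each affected individual's exome has already been reduced to a list of (gene, variant ID) pairs, and that a set of previously observed variant IDs (for example, from public databases or control exomes) is available. The function name `candidate_genes`, the sample labels, the gene labels and the variant IDs are all hypothetical and serve only to show the logic; this is not the authors' pipeline.

```python
from collections import defaultdict


def candidate_genes(variants_by_individual, known_variants, min_affected=None):
    """Return genes in which enough affected individuals carry a novel variant.

    variants_by_individual: dict mapping sample ID -> list of (gene, variant_id)
    known_variants: set of variant IDs already seen in databases or controls
    min_affected: number of individuals that must carry a novel variant in the
                  gene; defaults to all individuals supplied
    """
    if min_affected is None:
        min_affected = len(variants_by_individual)

    carriers = defaultdict(set)  # gene -> set of samples with a novel variant
    for sample, variants in variants_by_individual.items():
        for gene, variant_id in variants:
            if variant_id not in known_variants:  # discard known variants
                carriers[gene].add(sample)

    return sorted(g for g, samples in carriers.items()
                  if len(samples) >= min_affected)


# Hypothetical toy data: three unrelated affected individuals.
affected = {
    "case1": [("GENE_A", "a1"), ("GENE_B", "b1"), ("GENE_C", "c1")],
    "case2": [("GENE_A", "a2"), ("GENE_B", "b1"), ("GENE_D", "d1")],
    "case3": [("GENE_A", "a1"), ("GENE_C", "c2")],
}
known = {"b1", "c1"}

print(candidate_genes(affected, known))  # ['GENE_A']
```

Note that the individuals need not share the same variant, only novel variants in the same gene. Lowering `min_affected` is one simple way to allow for locus heterogeneity, at the cost of a longer candidate list.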
Exome sequencing of parent–child trios is a highly effective approach for identifying de novo coding mutations, as the occurrence of multiple de novo events within a specific gene (or within a gene family or pathway) is extremely unlikely.
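As a rough illustration of the trio logic, the sketch below treats each exome as a set of variant IDs, calls a variant de novo if it is present in the child and absent from both parents, and then reports genes hit by de novo variants in more than one trio. The function names, variant IDs and gene labels are hypothetical. A real analysis would also need to account for genotype-calling errors and uneven coverage in the parents, which this sketch ignores.

```python
from collections import Counter


def de_novo_variants(child, mother, father):
    """Variants present in the child but in neither parent (sets of IDs)."""
    return child - mother - father


def recurrent_de_novo_genes(trios, gene_of, min_trios=2):
    """Return genes carrying a de novo variant in at least `min_trios` trios.

    trios: list of (child, mother, father) tuples, each a set of variant IDs
    gene_of: dict mapping variant ID -> gene label
    """
    counts = Counter()
    for child, mother, father in trios:
        # Count each gene at most once per trio.
        hit_genes = {gene_of[v] for v in de_novo_variants(child, mother, father)}
        counts.update(hit_genes)
    return sorted(g for g, n in counts.items() if n >= min_trios)


# Hypothetical toy data.
trios = [
    ({"v1", "v2", "v3"}, {"v2"}, {"v3"}),  # v1 is de novo
    ({"v4", "v5"}, {"v5"}, set()),         # v4 is de novo
    ({"v6", "v7"}, {"v6"}, {"v7"}),        # nothing de novo
]
gene_of = {
    "v1": "GENE_X", "v2": "GENE_Y", "v3": "GENE_Z", "v4": "GENE_X",
    "v5": "GENE_Y", "v6": "GENE_Z", "v7": "GENE_Y",
}

print(recurrent_de_novo_genes(trios, gene_of))  # ['GENE_X']
```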
Solving the remaining several thousand Mendelian disorders by exome or whole-genome sequencing is possible and should be an imperative for the human and medical genetics community.
Making exome sequencing, and eventually whole-genome sequencing, widely available, useful, convenient and cost-effective for clinical diagnosis and screening will require overcoming several major challenges that currently limit its broad applicability.
We thank the US National Institutes of Health (NIH)/National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (Lung Grand Opportunity (GO) Sequencing Project (HL-102923 to M.J.B.), the US Women's Health Initiative (WHI) GO Sequencing Project (HL-102924), the Heart GO Sequencing Project (HL-103010), the Broad GO Sequencing Project (HL-102925) and the Seattle GO Sequencing Project (HL-102926 to D.A.N. and J.S.)) for early data release that proved useful for demonstrating filtering strategies. Our work was supported in part by grants from the NIH/NHLBI (5R01HL094976 to D.A.N. and J.S.), the NIH/National Human Genome Research Institute (5R21HG004749 to J.S., 1RC2HG005608 to M.J.B., D.A.N. and J.S., and 5RO1HG004316 to H.K.T.), NIH/National Institute of Environmental Health Sciences (HHSN273200800010C to D.N.), the Life Sciences Discovery Fund (2065508 and 0905001), the Washington Research Foundation and the NIH/National Institute of Child Health and Human Development (1R01HD048895 to M.J.B.). S.B.N. is supported by the Agency for Science, Technology and Research, Singapore. A.W.B. is supported by a training fellowship from the NIH/National Human Genome Research Institute (T32HG00035).
Diseases identified to date via exome sequencing
- Mendelian disorders
Phenotypes caused by a mutation (or mutations) in a single gene and inherited in a dominant, recessive or X-linked pattern.
- Penetrance
The proportion of individuals with a specific phenotype among carriers of a particular genotype.
- Locus heterogeneity
The appearance of phenotypically similar characteristics resulting from mutations at different genetic loci. Differences in effect size or in replication between studies and samples are often ascribed to different loci leading to the same disease.
- Genome-wide association studies
(GWASs). Studies that search for a population association between a phenotype and a particular allele by screening loci (most commonly by genotyping SNPs) across the entire genome.
- Complex traits
Traits that are influenced by the environment and/or through a combination of variants in at least several genes, each of which has a small effect.
- Heritability
The proportion of the total phenotypic variation in a given characteristic that can be attributed to additive genetic effects.
- Next-generation DNA sequencing
Highly parallelized DNA-sequencing technologies that produce many hundreds of thousands or millions of short reads (25–500 bp) at low cost and in a short time.
- Exome
The subset of a genome that is protein coding. In addition to the exome, commercially available capture probes target non-coding exons, sequences flanking exons and microRNAs.
- Homozygosity mapping
Narrowing down the location of a gene underlying a trait by searching for regions of the genome in which both chromosomal segments are inherited identically-by-descent.
- Sample indexing
Sequencing more than one sample in a single sequencing lane.
- RefSeq
An open-access, annotated and curated collection of publicly available nucleotide sequences (DNA and RNA) and their protein translations.
- Ultra-conserved elements
Subsequences of the genome that appear to be under extremely high levels of sequence constraint based on phylogenetic comparisons.
- Purifying selection
Selection against a functionally deleterious allele.
- Parametric tests
Statistical significance tests for which P values are based on models or assumed formulae for the distribution of the test statistic.
- Permutation test
A statistical test in which the data are randomized many times to determine the statistical significance of the experimental outcome.
- Multiplex families
Families in which two or more individuals are affected by the same disorder.
- Identical-by-descent
Alleles on different chromosomes that are identical because they are inherited from a shared common ancestor.
- Identical-by-state
Alleles on different chromosomes that are identical but do not share a common ancestor with respect to a pedigree or population of interest.
- Haplotype
A combination of alleles on a single chromosome.
- Processed pseudogenes
Copies of the coding sequences of genes that lack promoters and introns, contain poly(A) tails and are flanked by target-site duplications.
- Posterior probability
The probability of an event after combining prior knowledge of the event with the likelihood of that event given by observed data.
- Bootstrapping
A type of statistical analysis that is generally used for measuring the reliability of a sample estimate. It proceeds by the repeated sampling, with replacement, of the original data set. In the application described here, bootstrapping is used to assess the probability of identifying the causal variant for a genetic condition in a population.
- Incidental findings
Findings that are not explicitly related to the original research hypotheses (that is, primary findings).
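The permutation test defined in the glossary above can be made concrete with a short Python sketch. It assumes a simple gene-level comparison in which each case and control is scored 1 if they carry a rare allele in the gene and 0 otherwise, and it uses the difference in carrier frequency as the test statistic. The function name, the one-sided statistic and the toy data are illustrative assumptions, not a method prescribed by the article.

```python
import random


def permutation_p_value(case_values, control_values,
                        n_permutations=10000, seed=0):
    """One-sided permutation P value for mean(cases) - mean(controls).

    Case/control labels are shuffled many times, and the P value is the
    fraction of shuffles whose statistic is at least as large as the
    observed one (with a +1 correction so that it is never exactly zero).
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = mean(case_values) - mean(control_values)
    pooled = list(case_values) + list(control_values)
    n_cases = len(case_values)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomize the case/control labels
        stat = mean(pooled[:n_cases]) - mean(pooled[n_cases:])
        if stat >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical toy data: 1 = carries a rare allele in the gene, 0 = does not.
cases = [1] * 8 + [0] * 2
controls = [1] * 1 + [0] * 9

p = permutation_p_value(cases, controls)
print(p)
```

Because significance is estimated from the shuffled data themselves, no parametric model for the distribution of the test statistic is required, which is the contrast drawn with parametric tests in the glossary.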