Genome-wide association studies suggest that common genetic variants explain only a modest fraction of heritable risk for common diseases, raising the question of whether rare variants account for a significant fraction of unexplained heritability1,2. Although DNA sequencing costs have fallen markedly3, they remain far from what is necessary for rare and novel variants to be routinely identified at a genome-wide scale in large cohorts. We have therefore sought to develop second-generation methods for targeted sequencing of all protein-coding regions (‘exomes’), to reduce costs while enriching for discovery of highly penetrant variants. Here we report on the targeted capture and massively parallel sequencing of the exomes of 12 humans. These include eight HapMap individuals representing three populations4, and four unrelated individuals with a rare dominantly inherited disorder, Freeman–Sheldon syndrome (FSS)5. We demonstrate the sensitive and specific identification of rare and common variants in over 300 megabases of coding sequence. Using FSS as a proof-of-concept, we show that candidate genes for Mendelian disorders can be identified by exome sequencing of a small number of unrelated, affected individuals. This strategy may be extendable to diseases with more complex genetics through larger sample sizes and appropriate weighting of non-synonymous variants by predicted functional impact.
Access optionsAccess options
Subscribe to Journal
Get full journal access for 1 year
only $3.90 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
For discussions or assistance with genotyping data, we thank P. Green, J. Akey, R. Patwardhan, G. Cooper, J. Kidd, D. Gordon, J. Smith, I. Stanaway and M. Rieder. For assistance with project management, computation, data management and submission, we thank E. Torskey, S. Thompson, T. Amburg, B. McNally, S. Hearsey, M. Shumway and L. Hillier. For Human1M-Duo genotype data on HapMap samples, we thank Illumina. Our work was supported in part by grants from the National Institutes of Health/National Heart Lung and Blood Institute, the National Institutes of Health/National Human Genome Research Institute, National Institutes of Health/National Institute of Child Health and Human Development, and the Washington Research Foundation. S.B.N. is supported by the Agency for Science, Technology and Research, Singapore. E.H.T. and A.W.B. are supported by a training fellowship from the National Institutes of Health/National Human Genome Research Institute. E.E.E. is an investigator of the Howard Hughes Medical Institute.
Author Contributions The project was conceived and experiments planned by S.B.N., E.H.T., A.B., E.E.E., M.B., D.A.N. and J.S. Experiments were performed by S.B.N., E.H.T., C.L. and M.W. Algorithm development and data analysis were performed by S.B.N., P.D.R., S.D.F., A.W.B., T.S., M.B., D.A.N. and J.S. The manuscript was written by S.B.N. and J.S. All aspects of the study were supervised by J.S.
This file lists intervals within the targeted exome that were excluded from consideration based on poor anticipated mappability with 76 bp single-end reads.
This file lists the fraction of targeted coding bases in each gene that were covered in each of 12 individuals (either with >=1x coverage or with sufficient coverage to variant call).