Abstract

Genome-wide association studies suggest that common genetic variants explain only a modest fraction of heritable risk for common diseases, raising the question of whether rare variants account for a significant fraction of unexplained heritability [1,2]. Although DNA sequencing costs have fallen markedly [3], they remain far from what is necessary for rare and novel variants to be routinely identified at a genome-wide scale in large cohorts. We have therefore sought to develop second-generation methods for targeted sequencing of all protein-coding regions (‘exomes’), to reduce costs while enriching for discovery of highly penetrant variants. Here we report on the targeted capture and massively parallel sequencing of the exomes of 12 humans. These include eight HapMap individuals representing three populations [4], and four unrelated individuals with a rare dominantly inherited disorder, Freeman–Sheldon syndrome (FSS) [5]. We demonstrate the sensitive and specific identification of rare and common variants in over 300 megabases of coding sequence. Using FSS as a proof-of-concept, we show that candidate genes for Mendelian disorders can be identified by exome sequencing of a small number of unrelated, affected individuals. This strategy may be extendable to diseases with more complex genetics through larger sample sizes and appropriate weighting of non-synonymous variants by predicted functional impact.
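The candidate-gene idea in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of the strategy: keep non-synonymous variants that are absent from a set of known variants, then look for genes hit in every affected individual. The function name `candidate_genes`, the tuple layout, and all gene and variant identifiers are hypothetical placeholders for illustration, not data or code from the study.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical variant record: (gene, variant_id, is_nonsynonymous).
Variant = Tuple[str, str, bool]


def candidate_genes(
    affected: Dict[str, Iterable[Variant]],
    known_variant_ids: Set[str],
    min_individuals: Optional[int] = None,
) -> List[str]:
    """Return genes carrying a novel non-synonymous variant in at least
    `min_individuals` affected individuals (default: all of them)."""
    if min_individuals is None:
        min_individuals = len(affected)
    counts: Counter = Counter()
    for variants in affected.values():
        # Count each gene at most once per individual.
        genes = {
            gene
            for gene, variant_id, nonsyn in variants
            if nonsyn and variant_id not in known_variant_ids
        }
        counts.update(genes)
    return sorted(g for g, n in counts.items() if n >= min_individuals)


# Toy placeholder data for four unrelated affected individuals.
toy_affected = {
    "case1": [("GENE_A", "v1", True), ("GENE_B", "v2", True), ("GENE_C", "v3", False)],
    "case2": [("GENE_A", "v4", True), ("GENE_B", "v2", True)],
    "case3": [("GENE_A", "v5", True), ("GENE_D", "v6", True)],
    "case4": [("GENE_A", "v1", True), ("GENE_C", "v7", True)],
}
# Assumed set of previously observed variants (for example, seen in controls).
toy_known = {"v2"}

print(candidate_genes(toy_affected, toy_known))
```

Lowering `min_individuals` relaxes the requirement that every affected individual carries a variant in the same gene, which is one simple way such a filter might be loosened for disorders with more than one causal gene.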

References

  1. et al. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science 305, 869–872 (2004)
  2. Human genetic variation and its contribution to complex traits. Nature Rev. Genet. 10, 241–251 (2009)
  3. Next-generation DNA sequencing. Nature Biotechnol. 26, 1135–1145 (2008)
  4. The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005)
  5. et al. Mutations in embryonic myosin heavy chain (MYH3) cause Freeman-Sheldon syndrome and Sheldon-Hall syndrome. Nature Genet. 38, 561–565 (2006)
  6. et al. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274 (2006)
  7. Enrichment of super-sized resequencing targets from the human genome. Nature Methods 4, 891–892 (2007)
  8. et al. Genome-wide in situ exon capture for selective resequencing. Nature Genet. 39, 1522–1527 (2007)
  9. National Center for Biotechnology Information. Consensus CDS protein set (2009)
  10. et al. Genetic variation in an individual human exome. PLoS Genet. 4, e1000160 (2008)
  11. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008)
  12. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008)
  13. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008)
  14. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nature Genet. 40, 722–729 (2008)
  15. Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 8, 186–194 (1998)
  16. Massively parallel exon capture and library-free resequencing across 16 individuals. Nature Methods 6, 315–316 (2009)
  17. et al. Haplotype sorting using human fosmid clone end-sequence pairs. Genome Res. 18, 2016–2023 (2008)
  18. et al. Direct selection of human genomic loci by microarray hybridization. Nature Methods 4, 903–905 (2007)
  19. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–876 (2008)
  20. et al. The diploid genome sequence of an Asian individual. Nature 456, 60–65 (2008)
  21. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007)
  22. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008)
  23. et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 4, e1000083 (2008)
  24. et al. Prediction of deleterious human alleles. Hum. Mol. Genet. 10, 591–597 (2001)
  25. et al. A genome-wide survey of the prevalence and evolutionary forces acting on human nonsense SNPs. Am. J. Hum. Genet. 84, 224–234 (2009)
  26. When less is more: gene loss as an engine of evolutionary change. Am. J. Hum. Genet. 64, 18–23 (1999)
  27. et al. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nature Genet. 37, 161–165 (2005)
  28. et al. Exomic sequencing identifies PALB2 as a pancreatic cancer susceptibility gene. Science 324, 217 (2009)
  29. 1000 Genomes project. Nature Biotechnol. 26, 256 (2008)
  30. Power of deep, all-exon resequencing for discovery of human trait genes. Proc. Natl Acad. Sci. USA 106, 3871–3876 (2009)

Acknowledgements

For discussions or assistance with genotyping data, we thank P. Green, J. Akey, R. Patwardhan, G. Cooper, J. Kidd, D. Gordon, J. Smith, I. Stanaway and M. Rieder. For assistance with project management, computation, data management and submission, we thank E. Torskey, S. Thompson, T. Amburg, B. McNally, S. Hearsey, M. Shumway and L. Hillier. For Human1M-Duo genotype data on HapMap samples, we thank Illumina. Our work was supported in part by grants from the National Institutes of Health/National Heart Lung and Blood Institute, the National Institutes of Health/National Human Genome Research Institute, National Institutes of Health/National Institute of Child Health and Human Development, and the Washington Research Foundation. S.B.N. is supported by the Agency for Science, Technology and Research, Singapore. E.H.T. and A.W.B. are supported by a training fellowship from the National Institutes of Health/National Human Genome Research Institute. E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author Contributions

The project was conceived and experiments planned by S.B.N., E.H.T., A.B., E.E.E., M.B., D.A.N. and J.S. Experiments were performed by S.B.N., E.H.T., C.L. and M.W. Algorithm development and data analysis were performed by S.B.N., P.D.R., S.D.F., A.W.B., T.S., M.B., D.A.N. and J.S. The manuscript was written by S.B.N. and J.S. All aspects of the study were supervised by J.S.

Author information

Affiliations

  1. Department of Genome Sciences,

    • Sarah B. Ng
    • Emily H. Turner
    • Peggy D. Robertson
    • Steven D. Flygare
    • Choli Lee
    • Tristan Shaffer
    • Michelle Wong
    • Evan E. Eichler
    • Deborah A. Nickerson
    • Jay Shendure
  2. Department of Pediatrics, University of Washington,

    • Abigail W. Bigham
    • Michael Bamshad
  3. Howard Hughes Medical Institute, Seattle, Washington 98195, USA

    • Evan E. Eichler
  4. Agilent Technologies, Santa Clara, California 95051, USA

    • Arindam Bhattacharjee

Competing interests

A.B. is an employee of Agilent Technologies. Agilent supplies arrays that can be used for exome capture as described.

Corresponding authors

Correspondence to Sarah B. Ng or Jay Shendure.

The authors declare competing financial interests: details accompany the full-text HTML version of the paper at www.nature.com/nature.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Figures 1-6 with Legends and Supplementary Tables 1-5.

Text files

  1. Supplementary Data 1

    This file lists intervals within the targeted exome that were excluded from consideration based on poor anticipated mappability with 76 bp single-end reads.

  2. Supplementary Data 2

    This file lists the fraction of targeted coding bases in each gene that were covered in each of 12 individuals (either with >=1x coverage or with sufficient coverage to variant call).

About this article

DOI

https://doi.org/10.1038/nature08250
