
  • Letter

Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants

Abstract

Targeted capture combined with massively parallel exome sequencing is a promising approach to identify genetic variants implicated in human traits. We report exome sequencing of 200 individuals from Denmark with targeted capture of 18,654 coding genes and sequence coverage of each individual exome at an average depth of 12-fold. On average, about 95% of the target regions were covered by at least one read. We identified 121,870 SNPs in the sample population, including 53,081 coding SNPs (cSNPs). Using a statistical method for SNP calling and for estimating allele frequencies from our population data, we derived the allele frequency spectrum of cSNPs with a minor allele frequency greater than 0.02. We identified a 1.8-fold excess of deleterious, non-synonymous cSNPs over synonymous cSNPs in the low-frequency range (minor allele frequencies between 2% and 5%). This excess was more pronounced for X-linked SNPs, suggesting that deleterious substitutions are primarily recessive.
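The abstract does not describe the statistical method used for SNP calling and allele-frequency estimation. As a hedged illustration of the general idea of estimating a population allele frequency from low-depth reads without first calling each individual's genotype, the sketch below applies a standard expectation-maximization (EM) estimator to per-individual genotype likelihoods. It is not the authors' method: the symmetric sequencing-error model, the default error rate of 0.01, the Hardy-Weinberg genotype prior and the function names `genotype_likelihoods` and `em_allele_frequency` are all assumptions made for this example.

```python
def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """P(reads | g) for g = 0, 1, 2 copies of the alternate allele.

    Assumed symmetric error model: a read shows the alternate base with
    probability error, 0.5 or 1 - error for g = 0, 1, 2. The binomial
    coefficient is dropped because it is the same for every genotype.
    """
    p_alt = (error, 0.5, 1.0 - error)
    return [p ** n_alt * (1.0 - p) ** n_ref for p in p_alt]


def em_allele_frequency(likelihoods, f0=0.1, tol=1e-10, max_iter=1000):
    """Maximum-likelihood alternate-allele frequency pooled across individuals.

    likelihoods: one [L(g=0), L(g=1), L(g=2)] list per individual.
    Genotype priors follow Hardy-Weinberg proportions for the current estimate
    (an assumption of this sketch).
    """
    f = f0
    for _ in range(max_iter):
        prior = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
        expected_alt = 0.0
        for lik in likelihoods:
            joint = [lk * p for lk, p in zip(lik, prior)]
            # E-step: posterior expected number of alternate alleles
            expected_alt += (joint[1] + 2 * joint[2]) / sum(joint)
        # M-step: expected alternate alleles over total alleles (2 per diploid)
        f_new = expected_alt / (2 * len(likelihoods))
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


# Toy data, not from the study: 96 individuals with 12 reference reads
# and 4 individuals with 6 reference and 6 alternate reads.
reads = [(12, 0)] * 96 + [(6, 6)] * 4
liks = [genotype_likelihoods(r, a) for r, a in reads]
print(round(em_allele_frequency(liks), 3))
```

At roughly 12-fold depth a heterozygote with 6 reference and 6 alternate reads is easy to separate from a homozygote; at lower per-individual depth the genotype likelihoods overlap, and pooling the likelihoods across individuals is intended to reduce the bias that calling each genotype separately would introduce. For deep coverage, the products in `genotype_likelihoods` should be computed in log space to avoid floating-point underflow.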


Figure 1: Comparison of site frequency spectra for SNPs in different functional annotation categories.
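The abstract reports a 1.8-fold excess of non-synonymous over synonymous cSNPs with minor allele frequencies between 2% and 5%, but does not state the exact statistic. One plausible reading, assumed here rather than taken from the paper, is a ratio of proportions: the fraction of non-synonymous SNPs that fall in the 2–5% bin divided by the same fraction for synonymous SNPs, both restricted to SNPs with minor allele frequency (MAF) above 0.02. The sketch below bins estimated allele frequencies into a folded site frequency spectrum and computes that ratio; the bin edges, the `DEFAULT_EDGES` constant and the function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical MAF bin edges; the first bin (0.02, 0.05] is the low-frequency range.
DEFAULT_EDGES = (0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)


def binned_sfs(freqs, edges=DEFAULT_EDGES):
    """Proportion of SNPs in each MAF bin (edges[i], edges[i+1]].

    freqs are estimated allele frequencies, folded to MAF = min(f, 1 - f).
    SNPs with MAF <= edges[0] are excluded.
    """
    mafs = [min(f, 1.0 - f) for f in freqs]
    kept = [m for m in mafs if edges[0] < m <= edges[-1]]
    if not kept:
        raise ValueError("no SNPs fall inside the MAF range")
    counts = [0] * (len(edges) - 1)
    for m in kept:
        # bisect_left returns i + 1 when edges[i] < m <= edges[i + 1]
        counts[bisect_left(edges, m) - 1] += 1
    return [c / len(kept) for c in counts]


def low_frequency_excess(nonsyn_freqs, syn_freqs, edges=DEFAULT_EDGES):
    """Assumed statistic: low-frequency-bin proportion, non-synonymous over synonymous."""
    syn_low = binned_sfs(syn_freqs, edges)[0]
    if syn_low == 0:
        raise ValueError("no synonymous SNPs in the low-frequency bin")
    return binned_sfs(nonsyn_freqs, edges)[0] / syn_low


# Toy data, not from the study.
nonsyn = [0.03] * 6 + [0.3] * 4
syn = [0.03] * 3 + [0.3] * 7
print(low_frequency_excess(nonsyn, syn))
```

Under this reading, a value above 1 means non-synonymous variants are shifted toward low frequencies relative to synonymous ones, as expected if many are mildly deleterious. The same comparison could be run separately on autosomal and X-linked SNPs.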


Accession codes

Accessions

GenBank/EMBL/DDBJ


Acknowledgements

This project was funded by the Lundbeck Foundation and produced by The Lundbeck Foundation Centre for Applied Medical Genomics in Personalised Disease Prediction, Prevention and Care (LuCAMP). The project was also supported by the National Basic Research Program of China (973 program nos. 2011CB809200; 2007CB815703; 2007CB815705), the Chinese 863 program (nos. 2006AA02Z177; 2006AA02Z334; 2006AA02A302; 2009AA022707), the Chinese Academy of Sciences (GJHZ0701-6), the National Natural Science Foundation of China (30725008; 30890032; 30811130531; 30221004), the Danish Platform for Integrative Biology and the Ole Rømer grant from the Danish Natural Science Research Council. The Shenzhen Municipal Government and the Yantian District Local Government of Shenzhen additionally funded the project (grants JC200903190767A; JC200903190772A; ZYC200903240076A; CXB200903110066A; ZYC200903240077A and ZYC200903240080A). N.V. and E.H.-S. were supported by fellowships from the Swiss and American National Science Foundations. We are indebted to T. Lauritzen and K. Borch-Johnsen for their contribution to LuCAMP.

Author information


Contributions

LuCAMP was founded and is managed by O.P., Jun Wang, R.N., T.H., G.A., L.B., O.S., T. Lauritzen, K.K., T. Jørgensen, A. Astrup, T.W.S. and A. Albrechtsen. Y.L., N.V., G.T., E.H.-S. and T. Jiang contributed equally to this work. H.Y., Jian Wang, O.P. and Jun Wang managed the present project. Jun Wang, R.N., O.P. and Y.L. designed the analyses. O.P., T.H. and T. Jørgensen recruited the volunteers and prepared the DNA samples. Jun Wang, R.N., Y.L., N.V., E.H.-S., T. Jiang, A. Albrechtsen, H.C., T.K., Y.G., X.J., Q.L., H.W., C.Y., H.Z. and O.P. performed the data analyses. G.T., H.J., J.L., X.L., M.T., R.W. and X.Z. performed sequencing and Sequenom genotyping. Jun Wang, R.N., O.P., N.V., E.H.-S. and Y.L. wrote the first manuscript. All authors contributed to the final manuscript.

Corresponding authors

Correspondence to Oluf Pedersen, Rasmus Nielsen or Jun Wang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 2, 4, 6 and 7, Supplementary Figures 1–3 and Supplementary Note (PDF 588 kb)

Supplementary Table 1

Detailed data production information for each sample. (XLS 17 kb)

Supplementary Table 3

Sequenom iPLEX genotyping results and sequencing results for each individual at the genotyped sites. (XLS 388 kb)

Supplementary Table 5

Extrapolated estimates of SNP counts in each individual. (XLS 31 kb)


About this article

Cite this article

Li, Y., Vinckenbosch, N., Tian, G. et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat Genet 42, 969–972 (2010). https://doi.org/10.1038/ng.680



  • DOI: https://doi.org/10.1038/ng.680
