Abstract
Targeted capture combined with massively parallel exome sequencing is a promising approach to identify genetic variants implicated in human traits. We report exome sequencing of 200 individuals from Denmark with targeted capture of 18,654 coding genes and sequence coverage of each individual exome at an average depth of 12-fold. On average, about 95% of the target regions were covered by at least one read. We identified 121,870 SNPs in the sample population, including 53,081 coding SNPs (cSNPs). Using a statistical method for SNP calling and an estimation of allelic frequencies based on our population data, we derived the allele frequency spectrum of cSNPs with a minor allele frequency greater than 0.02. We identified a 1.8-fold excess of deleterious, non-synonymous cSNPs over synonymous cSNPs in the low-frequency range (minor allele frequencies between 2% and 5%). This excess was more pronounced for X-linked SNPs, suggesting that deleterious substitutions are primarily recessive.
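To make the frequency-bin comparison in the abstract concrete, the sketch below folds allele counts into minor allele frequencies and compares the ratio of non-synonymous to synonymous cSNPs in the 2% to 5% bin with the ratio among more common cSNPs. It is only an illustration under stated assumptions: the function names, the toy data and the choice of common cSNPs as the baseline are hypothetical, and it uses plain counting rather than the statistical SNP-calling and frequency-estimation method the authors describe.

```python
from collections import Counter


def minor_allele_frequency(alt_count, total_alleles):
    """Fold an alternate-allele count into a minor allele frequency (MAF)."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    freq = alt_count / total_alleles
    return min(freq, 1.0 - freq)


def nonsyn_to_syn_ratio(snps, low, high):
    """Ratio of non-synonymous to synonymous cSNPs with low <= MAF < high."""
    counts = Counter()
    for alt_count, total_alleles, effect in snps:
        maf = minor_allele_frequency(alt_count, total_alleles)
        if low <= maf < high:
            counts[effect] += 1
    if counts["synonymous"] == 0:
        raise ValueError("no synonymous SNPs in this frequency bin")
    return counts["nonsynonymous"] / counts["synonymous"]


# Toy data, invented for illustration only:
# (alternate allele count, total alleles, effect).
# 200 diploid individuals give 400 autosomal alleles per site.
toy_snps = [
    (10, 400, "nonsynonymous"),   # MAF 0.025
    (12, 400, "nonsynonymous"),   # MAF 0.03
    (16, 400, "nonsynonymous"),   # MAF 0.04
    (14, 400, "synonymous"),      # MAF 0.035
    (18, 400, "synonymous"),      # MAF 0.045
    (100, 400, "nonsynonymous"),  # MAF 0.25
    (120, 400, "synonymous"),     # MAF 0.30
    (360, 400, "synonymous"),     # alternate allele is the major one; MAF about 0.10
]

# Low-frequency bin from the abstract (2% to 5%) versus an assumed
# baseline of common cSNPs (MAF of 5% and above).
low_ratio = nonsyn_to_syn_ratio(toy_snps, 0.02, 0.05)
common_ratio = nonsyn_to_syn_ratio(toy_snps, 0.05, 0.51)
fold_excess = low_ratio / common_ratio
print(f"low-frequency ratio: {low_ratio}, common ratio: {common_ratio}, "
      f"fold excess: {fold_excess}")
```

With real data, the counts would come from called genotypes or estimated allele frequencies, and X-linked sites would need a total allele count that reflects the number of males and females sampled.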
Acknowledgements
This project was funded by the Lundbeck Foundation and produced by The Lundbeck Foundation Centre for Applied Medical Genomics in Personalised Disease Prediction, Prevention and Care (LuCAMP). The project was also supported by the National Basic Research Program of China (973 program no. 2011CB809200; 2007CB815703; 2007CB815705), the Chinese 863 program (no. 2006AA02Z177; 2006AA02Z334; 2006AA02A302; 2009AA022707), the Chinese Academy of Science (GJHZ0701-6), the National Natural Science Foundation of China (30725008; 30890032; 30811130531; 30221004), the Danish Platform for Integrative Biology and the Ole Rømer grant from the Danish Natural Science Research Council. The Shenzhen Municipal Government and the Yantian District Local Government of Shenzhen additionally funded the project (grants JC200903190767A; JC200903190772A; ZYC200903240076A; CXB200903110066A; ZYC200903240077A and ZYC200903240080A). N.V. and E.H.-S. were supported by fellowships from the Swiss and American National Science Foundations. We are indebted to T. Lauritzen and K. Borch-Johnsen for their contribution to LuCAMP.
Author information
Contributions
LuCAMP was founded and is managed by O.P., Jun Wang, R.N., T.H., G.A., L.B., O.S., T. Lauritzen, K.K., T. Jørgensen, A. Astrup, T.W.S. and A. Albrechtsen. Y.L., N.V., G.T., E.H.-S. and T. Jiang contributed equally to this work. H.Y., Jian Wang, O.P. and Jun Wang managed the present project. Jun Wang, R.N., O.P. and Y.L. designed the analyses. O.P., T.H. and T. Jørgensen recruited the volunteers and prepared the DNA samples. Jun Wang, R.N., Y.L., N.V., E.H.-S., T. Jiang, A. Albrechtsen, H.C., T.K., Y.G., X.J., Q.L., H.W., C.Y., H.Z. and O.P. performed the data analyses. G.T., H.J., J.L., X.L., M.T., R.W. and X.Z. performed sequencing and Sequenom genotyping. Jun Wang, R.N., O.P., N.V., E.H.-S. and Y.L. wrote the first manuscript. All authors contributed to the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Tables 2, 4, 6 and 7, Supplementary Figures 1–3 and Supplementary Note (PDF 588 kb)
Supplementary Table 1
Detailed data production information for each sample. (XLS 17 kb)
Supplementary Table 3
Sequenom iPLEX genotyping results and sequencing results for each sampled individual at genotyped sites. (XLS 388 kb)
Supplementary Table 5
Putative extrapolation estimation of SNP counts in each individual. (XLS 31 kb)
About this article
Cite this article
Li, Y., Vinckenbosch, N., Tian, G. et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat Genet 42, 969–972 (2010). https://doi.org/10.1038/ng.680
DOI: https://doi.org/10.1038/ng.680