Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants

Fu, Wenqing; O’Connor, Timothy D.; Jun, Goo; Kang, Hyun Min; Abecasis, Goncalo; Leal, Suzanne M.; Gabriel, Stacey; Rieder, Mark J.; Altshuler, David; Shendure, Jay; Nickerson, Deborah A.; Bamshad, Michael J.; NHLBI Exome Sequencing Project; Akey, Joshua M.

doi:10.1038/nature11690

Letter
Published: 28 November 2012

Wenqing Fu¹,
Timothy D. O’Connor¹,
Goo Jun²,
Hyun Min Kang²,
Goncalo Abecasis²,
Suzanne M. Leal³,
Stacey Gabriel⁴,
Mark J. Rieder¹,
David Altshuler⁴,
Jay Shendure¹,
Deborah A. Nickerson¹,
Michael J. Bamshad^1,5,
NHLBI Exome Sequencing Project⁶ &
…
Joshua M. Akey¹

Nature volume 493, pages 216–220 (2013)

A Corrigendum to this article was published on 13 March 2013

Abstract

Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history^1,2 and will help to facilitate the development of new approaches for disease-gene discovery³. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth^4,5,6, notable for an excess of rare genetic variants, suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European American and African American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that approximately 73% of all protein-coding SNVs and approximately 86% of SNVs predicted to be deleterious arose in the past 5,000–10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs than other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the Out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, show the profound effect of recent human history on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease-gene discovery.

Figure 1: **The vast majority of protein-coding single-nucleotide variants arose recently.**

Figure 2: **Characteristics of allele age for deleterious single-nucleotide variants.**

Figure 3: **Distribution of deleterious single-nucleotide variants across the exome before and after recent accelerated population growth.**

Figure 4: **Heterogeneity of allele age across genes and pathways.**

Accession codes

Data deposits

Filtered sets of annotated variants and their allele frequencies are available at (http://evs.gs.washington.edu/EVS/) and genotypes and phenotypes from a large subset of individuals are also available through dbGaP (http://www.ncbi.nlm.nih.gov/gap) using the following accession information: NHLBI GO-ESP: Women’s Health Initiative Exome Sequencing Project (WHI) – WHISP, WHISP_Subject_Phenotypes, pht002246.v2.p2, phs000281.v2.p2; NHLBI GO-ESP: Heart Cohorts Exome Sequencing Project (JHS), ESP_HeartGO_JHS_LDLandEOMI_Subject_Phenotypes, pht002539.v1.p1, phs000402.v1.p1; NHLBI GO-ESP: Heart Cohorts Exome Sequencing Project (FHS), HeartGO_FHS_LDLandEOMI_PhenotypeDataFile, pht002476.v1.p1, phs000401.v1.p1; NHLBI GO-ESP: Heart Cohorts Exome Sequencing Project (CHS), HeartGO_CHS_LDL_PhenotypeDataFile, pht002536.v1.p1, phs000400.v1.p1; NHLBI GO-ESP: Heart Cohorts Exome Sequencing Project (ARIC), ESP_ARIC_LDLandEOMI_Sample, pht002466.v1.p1, phs000398.v1.p1; NHLBI GO-ESP: Lung Cohorts Exome Sequencing Project (Cystic Fibrosis), ESP_LungGO_CF_PA_Culture_Data, pht002227.v1.p1, phs000254.v1.p1; NHLBI GO-ESP: Early-Onset Myocardial Infarction (Broad EOMI), ESP_Broad_EOMI_Subject_Phenotypes, pht001437.v1.p1, phs000279.v1.p1; NHLBI GO-ESP: Lung Cohorts Exome Sequencing Project (Pulmonary Arterial Hypertension), PAH_Subject_Phenotypes_Baseline_Measures, pht002277.v1.p1, phs000290.v1.p1; NHLBI GO-ESP: Lung Cohorts Exome Sequencing Project (Lung Health Study of Chronic Obstructive Pulmonary Disease), LHS_COPD_Subject_Phenotypes_Baseline_Measures, pht002272.v1.p1, phs000291.v1.p1.

References

1. Kimura, M. & Ota, T. The age of a neutral mutant persisting in a finite population. Genetics 75, 199–212 (1973)
2. Tishkoff, S. A. & Verrelli, B. C. Patterns of human genetic diversity: implications for human evolutionary history and disease. Annu. Rev. Genomics Hum. Genet. 4, 293–340 (2003)
3. Slatkin, M. & Rannala, B. Estimating allele age. Annu. Rev. Genomics Hum. Genet. 1, 225–249 (2000)
4. Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012)
5. Nelson, M. R. et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337, 100–104 (2012)
6. Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012)
7. Griffiths, R. C. & Tavaré, S. The age of a mutation in a general coalescent tree. Commun. Stat. Stoch. Models 14, 273–295 (1998)
8. Coventry, A. et al. Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nature Commun. 1, 131 (2010)
9. Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl Acad. Sci. USA 108, 11983–11988 (2011)
10. Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. & Bustamante, C. D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009)
11. Schaffner, S. F. et al. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 15, 1576–1583 (2005)
12. Gibson, G. Rare and common variants: twenty arguments. Nature Rev. Genet. 13, 135–145 (2012)
13. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4, 1073–1081 (2009)
14. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nature Methods 7, 248–249 (2010)
15. Chun, S. & Fay, J. C. Identification of deleterious mutations within three human genomes. Genome Res. 19, 1553–1561 (2009)
16. Schwarz, J. M., Rodelsperger, C., Schuelke, M. & Seelow, D. MutationTaster evaluates disease-causing potential of sequence alterations. Nature Methods 7, 575–576 (2010)
17. Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLOS Comput. Biol. 6, e1001025 (2010)
18. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010)
19. Becker, K. G., Barnes, K. C., Bright, T. J. & Wang, S. A. The genetic association database. Nature Genet. 36, 431–432 (2004)
20. Pyun, J. A., Cha, D. H. & Kwack, K. LAMC1 gene is associated with premature ovarian failure. Maturitas 71, 402–406 (2012)
21. Liu, Q. et al. Amyloid precursor protein regulates brain apolipoprotein E and cholesterol metabolism through lipoprotein receptor LRP1. Neuron 56, 66–78 (2007)
22. Jia, E. Z. et al. Association of the mutation for the human carboxypeptidase E gene exon 4 with the severity of coronary artery atherosclerosis. Mol. Biol. Rep. 36, 245–254 (2009)
23. Valdmanis, P. N. et al. Mutations in the KIAA0196 gene at the SPG8 locus cause hereditary spastic paraplegia. Am. J. Hum. Genet. 80, 152–161 (2007)
24. Blekhman, R. et al. Natural selection on genes that underlie human disease susceptibility. Curr. Biol. 18, 883–889 (2008)
25. Liao, B. Y., Scott, N. M. & Zhang, J. Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins. Mol. Biol. Evol. 23, 2072–2080 (2006)
26. Lohmueller, K. E. et al. Proportionally more deleterious genetic variation in European than in African populations. Nature 451, 994–997 (2008)
27. Hawks, J., Wang, E. T., Cochran, G. M., Harpending, H. C. & Moyzis, R. K. Recent acceleration of human adaptive evolution. Proc. Natl Acad. Sci. USA 104, 20753–20758 (2007)

Acknowledgements

We acknowledge the support of the National Heart, Lung and Blood Institute (NHLBI), the contributions of the research institutions that participated in this study, the study investigators, field staff and study participants who created this resource for biomedical research, and the Population Genetics Project Team of the NHLBI. We thank J. Wilson and R. Do for critical feedback on the manuscript. Funding for the GO (Grand Opportunity) Exome Sequencing Project was provided by NHLBI grants RC2 HL-103010 (Heart GO), RC2 HL-102923 (Lung GO) and RC2 HL-102924 (WHISP). The exome sequencing was supported by NHLBI grants RC2 HL-102925 (Broad GO) and RC2 HL-102926 (Seattle GO).

Author information

Authors and Affiliations

Department of Genome Sciences, University of Washington, Seattle, 98195, Washington, USA
Wenqing Fu, Timothy D. O’Connor, Mark J. Rieder, Jay Shendure, Deborah A. Nickerson, Michael J. Bamshad & Joshua M. Akey
Department of Biostatistics, University of Michigan, Ann Arbor, 48109, Michigan, USA
Goo Jun, Hyun Min Kang & Goncalo Abecasis
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, 77030, Texas, USA
Suzanne M. Leal
Broad Institute of MIT and Harvard, Cambridge, 02142, Massachusetts, USA
Stacey Gabriel & David Altshuler
Department of Pediatrics, University of Washington, Seattle, 98195, Washington, USA
Michael J. Bamshad
*Lists of participants and affiliations appear in the Supplementary Information.
NHLBI Exome Sequencing Project

Contributions

W.F. and J.M.A. conceived the analyses. D.A.N., S.G., M.J.R. and D.A. oversaw data generation and quality control. G.J., H.M.K. and G.A. developed algorithms and identified SNVs from the sequencing data. W.F. carried out the majority of analyses with contributions from T.D.O. W.F., M.J.B., J.S. and J.M.A. analysed the data and wrote the manuscript with contributions from all authors. W.F., T.D.O., S.M.L., J.S., M.J.R., D.A.N., M.J.B. and J.M.A. are members of the Seattle Grand Opportunity (GO) group and G.J., H.M.K., G.A., S.G. and D.A. are members of the Broad GO group, which are both sub-groups of the NHLBI Exome Sequencing Project (ESP).

Corresponding authors

Correspondence to Wenqing Fu or Joshua M. Akey.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

This file contains Supplementary Text and Data, Supplementary References, Supplementary Tables 1-4 and Supplementary Figures 1-15 (see Table of Contents for more details). (PDF 3066 kb)

About this article

Cite this article

Fu, W., O’Connor, T., Jun, G. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013). https://doi.org/10.1038/nature11690

Received: 13 July 2012
Accepted: 19 October 2012
Published: 28 November 2012
Issue Date: 10 January 2013
DOI: https://doi.org/10.1038/nature11690
