Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called ‘HapMap 3’, includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.
We dedicate this work to Leena Peltonen for her vital leadership role in this study, and in memory of a valued friend and colleague. We thank E. Boerwinkle and R. Durbin for critical reading of the manuscript. We thank the USA National Institutes of Health, the National Human Genome Research Institute, the National Institute on Deafness and Other Communication Disorders and the Wellcome Trust for supporting the majority of this work. Funding was also provided by the Louis-Jeantet Foundation and the NCCR ‘Frontiers in Genetics’ (Swiss National Science Foundation). We thank the people from the following communities who were generous in donating their blood samples to be studied in this project: the Yoruba in Ibadan, Nigeria; the Maasai in Kinyawa, Kenya; the Luhya in Webuye, Kenya; the Han Chinese in Beijing, China; the Japanese in Tokyo, Japan; the Chinese in metropolitan Denver, Colorado; the Gujarati Indians in Houston, Texas; the Toscani in Italia; the community of African ancestry in the southwestern USA; and the community of Mexican ancestry in Los Angeles, California. We also thank the people in the Utah Centre d’Etude du Polymorphisme Humain community who allowed the samples they donated earlier to be used for the project. The authors acknowledge use of DNA from the 1958 British birth cohort collection, funded by the UK Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02. The Illumina 550K genotype data for the 1958 British birth cohort samples were made available by the Sanger Institute. For the 1958 British birth cohort Affymetrix 500K genotype data, we thank the Wellcome Trust Case Control Consortium (http://www.wtccc.org.uk), which was funded by Wellcome Trust award 076113.
The authors declare no competing financial interests.
The HapMap 3/ENCODE 3 data set has been deposited at http://www.hapmap.org. The sequence traces of ENCODE 3 can be accessed at http://www.ncbi.nlm.nih.gov/Traces/trace.cgi by submitting the query: species_code = “HOMO SAPIENS” and CENTER_NAME = “BCM” and CENTER_PROJECT = “RHIAY”.
A list of participants and their affiliations appears at the end of the paper
This file contains Supplementary Information comprising Introduction, Large Scale Genotyping, Rare Allele Calling Bias, Deep PCR Sequencing, Copy Number Polymorphism (CNP) Analysis, Population Analyses, Recurrent SNPs and Haplotype sharing (see Contents page for full details). It also includes Supplementary Tables 1-11, Supplementary Figures 1-9 with legends and additional references. (PDF 1111 kb)
Cite this article
Altshuler, D., Gibbs, R., Peltonen, L. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010) doi:10.1038/nature09298