A second generation human haplotype map of over 3.1 million SNPs

This article has been updated

Abstract

We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
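To make the "average maximum r²" coverage measure in the abstract concrete, the following is a minimal sketch rather than the Consortium's actual pipeline. It assumes phased haplotypes coded as 0/1 alleles and uses the standard pairwise definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)). The function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Pairwise r^2 between two biallelic SNPs.

    `a` and `b` hold one 0/1 allele per phased haplotype, in the same order.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equal in length")
    n = len(a)
    p_a = sum(a) / n                                # frequency of allele 1 at SNP A
    p_b = sum(b) / n                                # frequency of allele 1 at SNP B
    p_ab = sum(x & y for x, y in zip(a, b)) / n     # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                  # a monomorphic SNP carries no LD signal
    d = p_ab - p_a * p_b                            # linkage disequilibrium coefficient D
    return d * d / denom


def max_r_squared(target: Sequence[int], tags: Sequence[Sequence[int]]) -> float:
    """Best r^2 between one untyped SNP and any SNP in the typed (tag) set."""
    return max((r_squared(target, tag) for tag in tags), default=0.0)


def mean_max_r_squared(untyped: Sequence[Sequence[int]],
                       tags: Sequence[Sequence[int]]) -> float:
    """Average, over untyped SNPs, of the maximum r^2 to the tag set."""
    if not untyped:
        raise ValueError("need at least one untyped SNP")
    return sum(max_r_squared(snp, tags) for snp in untyped) / len(untyped)


# Toy data: four haplotypes, two typed tag SNPs, two untyped SNPs.
tags = [[0, 0, 1, 1], [0, 1, 0, 1]]
untyped = [[1, 1, 0, 0],   # perfectly (inversely) correlated with the first tag
           [0, 1, 1, 1]]   # only partially correlated with either tag
coverage = mean_max_r_squared(untyped, tags)
```

Under this measure, a SNP counts as well captured when at least one typed SNP has r² close to 1 with it. A SNP whose maximum r² stays low against every candidate tag corresponds to what the abstract calls "untaggable".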

Figure 1: SNP density in the Phase II HapMap.
Figure 2: Haplotype structure and recombination rate estimates from the Phase II HapMap.
Figure 3: The extent of recent co-ancestry among HapMap individuals.
Figure 4: Properties of untaggable SNPs.
Figure 5: Recombination rates around genes.
Figure 6: Properties of non-synonymous and synonymous SNPs.

Change history

  • 18 January 2008

    Co-author Todd A. Johnson's name was inadvertently omitted from the list of RIKEN authors in the HTML version of the paper only. This was corrected on 18 January 2008.

Acknowledgements

We thank many people who contributed to this project: all members of the genotyping laboratory and the sample, primer, bioinformatics, data quality and IT groups at Perlegen Sciences for technical and infrastructural support; J. Beck, C. Beiswanger, D. Coppock, A. Leach, J. Mintzer and L. Toji for transforming the Yoruba, Japanese and Han Chinese samples, distributing the DNA and cell lines, storing the samples for use in future research, and producing the community newsletters and reports; J. Greenberg and R. Anderson for providing funding and support for cell line transformation and storage in the NIGMS Human Genetic Cell Repository at the Coriell Institute; T. Dibling, T. Ishikura, S. Kanazawa, S. Mizusawa and S. Saito for help with genotyping; C. Hind and A. Moghadam for technical support in genotyping and all members of the subcloning and sequencing teams at the Wellcome Trust Sanger Institute; X. Ke for help with data analysis; Oxford E-Science Centre for provision of high-performance computing resources; H. Chen, W. Chen, L. Deng, Y. Dong, C. Fu, L. Gao, H. Geng, J. Geng, M. He, H. Li, H. Li, S. Li, X. Li, B. Liu, Z. Liu, F. Lu, F. Lu, G. Lu, C. Luo, X. Wang, Z. Wang, C. Ye and X. Yu for help with genotyping and sample collection; X. Feng, Y. Li, J. Ren and X. Zhou for help with sample collection; J. Fan, W. Gu, W. Guan, S. Hu, H. Jiang, R. Lei, Y. Lin, Z. Niu, B. Wang, L. Yang, W. Yang, Y. Wang, Z. Wang, S. Xu, W. Yan, H. Yang, W. Yuan, C. Zhang, J. Zhang, K. Zhang and G. Zhao for help with genotyping; P. Fong, C. Lai, C. Lau, T. Leung, L. Luk and W. Tong for help with genotyping; C. Pang for help with genotyping; K. Ding, B. Qiang, J. Zhang, X. Zhang and K. Zhou for help with genotyping; Q. Fu, S. Ghose, X. Lu, D. Nelson, A. Perez, S. Poole, R. Vega and H. Yonath for help with genotyping; C. Bruckner, T. Brundage, S. Chow, O. Iartchouk, M. Jain, M. Moorhead and K. Tran for help with genotyping; N. Addleman, J. Atilano, T. Chan, C. Chu, C. Ha, T. Nguyen, M. Minton and A. Phong for help with genotyping, and D. Lind for help with quality control and experimental design; R. Donaldson and S. Duan for help with genotyping, and J. Rice and N. Saccone for help with experimental design; J. Wigginton for help with implementing and testing QA/QC software; A. Clark, B. Keats, R. Myers, D. Nickerson and A. Williamson for providing advice to NIH; C. Juenger, C. Bennet, C. Bird, J. Melone, P. Nailer, M. Weiss, J. Witonsky and E. DeHaut-Combs for help with project management; M. Gray for organizing phone calls and meetings; D. Leja for help with figures; the Yoruba people of Ibadan, Nigeria, the people of Tokyo, Japan, and the community at Beijing Normal University, who participated in public consultations and community engagements; the people in these communities who donated their blood samples; and the people in the Utah CEPH community who allowed the samples they donated earlier to be used for the Project. This work was supported by the Japanese Ministry of Education, Culture, Sports, Science and Technology, the Wellcome Trust, Nuffield Trust, Wolfson Foundation, UK EPSRC, Genome Canada, Génome Québec, the Chinese Academy of Sciences, the Ministry of Science and Technology of the People’s Republic of China, the National Natural Science Foundation of China, the Hong Kong Innovation and Technology Commission, the University Grants Committee of Hong Kong, the SNP Consortium, the US National Institutes of Health (FIC, NCI, NCRR, NEI, NHGRI, NIA, NIAAA, NIAID, NIAMS, NIBIB, NIDA, NIDCD, NIDCR, NIDDK, NIEHS, NIGMS, NIMH, NINDS, NLM, OD), the W.M. Keck Foundation, and the Delores Dore Eccles Foundation. All SNPs genotyped within the HapMap Project are available from dbSNP (http://www.ncbi.nlm.nih.gov/SNP); all genotype information is available from dbSNP and the HapMap website (http://www.hapmap.org).

Author information

Corresponding authors

Correspondence to Mark J. Daly (Project Leader) or Gilean McVean (Project Leader).

Ethics declarations

Competing interests

Some authors declare employment and personal financial interests. These authors declare employment financial interests: authors who are current employees of genotyping companies or were employees of genotyping companies (Affymetrix, Illumina, ParAllele, Perlegen) during the project. These authors declare personal financial interests (defined as serving on the advisory board of a genotyping company, owning stock in a genotyping company, or receiving royalties from a patent licensed to a genotyping company): A.B., A.C., A.S., D.R.C., M.S.C., J.B.F., L.M.G., L.R.C., P.H., P.Y.K., S.S.M. and T.D.W.

Additional information

Lists of participants and affiliations appear at the end of the paper. (Participants are arranged by institution and then alphabetically within institutions except for Principal Investigators and Project Leaders, as indicated.)

Supplementary information

The file contains Supplementary Notes, Supplementary Tables 1–9, Supplementary Figures 1–7 with legends, and additional references. (PDF 4470 kb)

About this article

Cite this article

The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007). https://doi.org/10.1038/nature06258

  • DOI: https://doi.org/10.1038/nature06258
