The haplotype map of the human genome, affectionately known as the HapMap, has been in the spotlight since the project was launched in 2002. Although the data have been available on the web throughout, the results of Phase I of the project were published only in October. The report shows how the HapMap can guide the design and analysis of genetic association studies, which is its principal goal, but it also reveals important information about structural variation and recombination, and ultimately shows how natural selection shapes the human genome.

The aim was to create a public, genome-wide database of common human variation. In Phase I, one common SNP (with a minor allele frequency of at least 0.05) was genotyped per 5 kb across the genome in each of the project's 269 DNA samples. These come from the Yoruba in Nigeria, Japanese in Tokyo, Han Chinese in Beijing and the Centre d'Etude du Polymorphisme Humain collection from Utah.
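The 'common SNP' criterion can be made concrete with a small calculation. The following is a minimal sketch, not the project's own pipeline: the function names, the two-character genotype strings and the "NN" code for a missing call are all assumptions made for illustration.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency of one biallelic SNP.

    `genotypes` holds one two-character string per individual, such as
    "AG". "NN" marks a missing call (an assumed convention).
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype == "NN":
            continue  # skip individuals with no genotype call
        counts.update(genotype)  # count both alleles of the individual
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or only one allele observed
    return min(counts.values()) / total


def is_common(genotypes, threshold=0.05):
    """Apply the 0.05 minor-allele-frequency cut-off quoted in the text."""
    return minor_allele_frequency(genotypes) >= threshold


# Hypothetical sample: 6 AA, 3 AG and 1 GG individuals.
sample = ["AA"] * 6 + ["AG"] * 3 + ["GG"]
print(minor_allele_frequency(sample), is_common(sample))
```

In this made-up sample the G allele appears on 5 of 20 chromosomes, so the SNP would pass the 0.05 threshold.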

Using the HapMap data the authors confirmed that the human genome has a block structure; this is because recombination occurs mainly in short regions called recombination hotspots. They also constructed a fine-scale genetic map of the human genome, analysis of which led to an unexpected observation. Gene-rich regions that encode, for example, immune response or neurophysiological functions lie in regions of high recombination, whereas regions rich in genes that are associated with 'core cellular function' such as DNA and RNA metabolism lie in regions of low recombination. It seems therefore that the HapMap data contain information about natural selection.
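Block structure of this kind is usually described in terms of linkage disequilibrium, the non-random association between alleles at nearby SNPs, which is commonly summarized by the statistic r². As a minimal sketch of that calculation (the function name and the 0/1 coding of phased haplotypes are assumptions made here, not details taken from the report):

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic SNPs.

    `haplotypes` is a list of (a, b) pairs, one per phased chromosome,
    where a and b are 0 or 1 for the allele carried at each SNP
    (an assumed coding).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # r^2 is undefined for a monomorphic SNP; 0 by convention here
    return d * d / denom


# Two SNPs that always travel together, as within a block.
linked = [(0, 0)] * 5 + [(1, 1)] * 5
# Two SNPs whose alleles are shuffled, as across a hotspot.
shuffled = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(ld_r_squared(linked), ld_r_squared(shuffled))
```

Pairs of SNPs within a block would be expected to give values near 1, whereas pairs separated by a recombination hotspot would tend towards 0.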

The Phase I data contain more than 1 million SNPs, most of which are common and are either present in the dbSNP database or are tightly linked with those that are. The authors show that the HapMap data capture enough common variation to provide sufficient tag SNPs for genome-wide association studies. For ancestral populations, further SNPs might be required, and it is not yet clear to what extent tag SNPs are transferable across populations. Moreover, the HapMap data can be used to maximize the power of array-based association studies, in which researchers cannot choose the tag SNPs, and to evaluate the statistical significance and interpret the results of genome-wide association studies.

As the authors say, the HapMap is a natural extension of the Human Genome Project, focusing on inter-individual variation. The project has created an unprecedented resource that will facilitate comprehensive genome-wide association studies and ultimately lead to identification of genetic determinants of complex diseases. The next step will be to understand the environmental factors that affect them.