The human haplotype map (HapMap) has come a long way from its first public appearance in 2003. Phase 1 of the project saw a little over 1 million single-nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four populations. The recently published HapMap 3 maps 1.6 million SNPs and around 800 copy-number polymorphisms in over 1,000 individuals from 11 populations as diverse as Japanese in Tokyo, Maasai in Kenya and Tuscans in Italy.

The HapMap project is a multinational effort to catalog genetic differences in humans. Initially the consortium's strategy was to use DNA microarrays containing common SNPs (polymorphisms with high minor allele frequencies) to genotype individuals and determine haplotypes, the blocks of SNPs that are always seen together in a population. These data provided a dense map of markers across the genome that proved useful, for example, in genome-wide association studies (GWAS) to establish connections between certain haplotypes and a phenotype of interest.

From its inception it was clear that HapMap would expand into different populations in future, but what became evident only over time was the importance of also including SNPs of lower frequency that current microarrays do not capture. “The current generation of [microarrays] is really quite a skewed version of the human genetic architecture,” says Richard Gibbs of the Baylor College of Medicine, one of the HapMap project coordination leaders.

Hence, he and the other members of the HapMap 3 consortium decided on a two-pronged approach. They combined the use of microarrays to look at common SNPs in new populations with in-depth sequencing of ten 100-kilobase regions in 700 individuals to discover low-frequency, rare and private SNPs. These turned out to make up more than half of all variants in a population, with 99% of all new SNPs showing a minor allele frequency of less than 5%.

The researchers were particularly intrigued by recurrent variants—SNPs that are rare, with a minor allele frequency of less than 0.5%, but seen in geographically widely separated populations in the context of different haplotypes. These SNPs mark independently occurring events and will be of value for understanding the evolution of the human genome.
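
For readers less familiar with the term, the minor allele frequency thresholds quoted above (5% and 0.5%) can be made concrete with a short sketch. The code below is purely illustrative and is not the consortium's pipeline: the function names, the genotype encoding (two-letter strings such as "AG") and the "low-frequency" label for the band between 0.5% and 5% are assumptions made here, and the SNP is assumed to be biallelic.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency of one biallelic SNP.

    `genotypes` is a list of two-character strings such as "AG",
    one per diploid individual (a hypothetical encoding).
    """
    # Count every allele copy across all individuals.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        # Only one allele (or no data) observed: the site is not polymorphic.
        return 0.0
    return min(counts.values()) / sum(counts.values())


def frequency_class(maf, common=0.05, rare=0.005):
    """Bin a frequency using the 5% and 0.5% cut-offs mentioned in the text."""
    if maf == 0:
        return "monomorphic"
    if maf >= common:
        return "common"
    if maf >= rare:
        return "low-frequency"  # label assumed for the 0.5%-5% band
    return "rare"


# Made-up example: 100 individuals, two of them heterozygous.
sample = ["AA"] * 98 + ["AG"] * 2
maf = minor_allele_frequency(sample)
print(maf, frequency_class(maf))
```

In this invented sample the G allele accounts for 2 of 200 chromosomes, so it falls below the 5% cut-off typical of array content but above the 0.5% cut-off used for the recurrent rare variants.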

Surprisingly, the HapMap team also found that the rate of accumulation of rare variants in each individual did not flatten even after 700 people had been analyzed, supporting the idea that much larger sequencing efforts are called for to truly understand human genetic diversity.

Such sequencing efforts are underway in the 1000 Genomes Project, which aims to provide an extensive catalog of human genetic variation drawn from sequence data of thousands of people from 20 populations. Gibbs sees HapMap 3 as a forerunner of the 1000 Genomes Project, as it shows that deep sequencing uncovers variants that are hidden from array analysis; he quips that an appropriate summary of HapMap 3 could be, “Look what a good idea it is to sequence all these people.”

Although HapMap 3 offers important insights into the variation in the human genome, a true data explosion will arise out of the 1000 Genomes Project. These data will inform the design of new GWAS that may lead to a better understanding of the contribution of rare variants to complex traits.