One of the biggest challenges facing human genetics is to uncover the genetic basis of common disease. The idea that underpins this quest is relatively simple (study the genomes of many individuals and identify the genetic variants that distinguish those with disease from those without), but actually doing so is a very difficult and expensive undertaking. However, a team of scientists from Perlegen Sciences, Inc. has now taken a step towards this goal in a large pilot study that set out to identify all the single-nucleotide polymorphisms (SNPs) on human chromosome 21 using high-density oligonucleotide arrays. Importantly, they found that the SNPs can be grouped into haplotype blocks, in each of which just three common haplotypes account for 80% of all human chromosome 21s, indicating that much less haplotype diversity exists than previously thought. They also show that screening the human genome for genetic variation relating to disease is an achievable goal.

Patil et al. began this mammoth undertaking by isolating 20 different haploid copies of chromosome 21, derived from 24 ethnically diverse individuals, in rodent–human somatic-cell hybrids. They did this so that they could directly assess the haplotypes of the chromosomes. From each hybrid, they amplified 3,253 DNA fragments, which together span most of the non-repetitive chromosome 21 DNA and cover 32.4 Mb. These PCR products were then pooled and hybridized to high-density oligonucleotide arrays (Perlegen's so-called wafers) that carried the corresponding chromosome 21 sequence. By using pattern-recognition software to detect altered hybridization signals on the wafers, the team identified 36,000 SNPs from the 20 chromosomes sampled.

Next, the authors identified 24,000 SNPs with a minor allele present in more than one sample, 4,705 of which overlapped with SNPs identified by The SNP Consortium. They used these SNPs, together with a new algorithm, to find the smallest number of SNPs that could define each chromosome 21 haplotype. Surprisingly, they found that, on average, three common haplotypes can define a block of consecutive SNPs and that, within these blocks, all the common haplotype information can be captured by genotyping only 4,563 selected SNPs.
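
To make the two computational steps in this paragraph concrete, here is a minimal sketch in Python. It is not the algorithm used by Patil et al., whose details are not given here; it assumes a toy block of biallelic SNPs, treats a haplotype as "common" if it is seen at least twice (an arbitrary threshold), and uses a brute-force search that would not scale to real data. All function names and the example data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def informative_snps(haplotypes):
    """Return indices of SNPs whose minor allele occurs in more than one sample."""
    keep = []
    for j in range(len(haplotypes[0])):
        counts = Counter(h[j] for h in haplotypes)
        # Assumption: sites are biallelic, so the rarer of two alleles is the minor allele.
        if len(counts) == 2 and min(counts.values()) > 1:
            keep.append(j)
    return keep


def common_haplotypes(haplotypes, min_count=2):
    """Return the distinct haplotypes seen at least min_count times (illustrative threshold)."""
    return [h for h, n in Counter(haplotypes).items() if n >= min_count]


def minimal_tag_snps(common, candidates):
    """Brute-force search for the smallest SNP subset that tells all common haplotypes apart."""
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            # Each haplotype's "signature" is its alleles at the chosen SNPs only.
            signatures = {tuple(h[j] for j in subset) for h in common}
            if len(signatures) == len(common):
                return list(subset)
    return None  # the candidate SNPs cannot separate the common haplotypes


# Toy block: eight sampled chromosomes typed at four consecutive SNPs (made-up data).
block = ["ACGT"] * 3 + ["GCGT"] * 2 + ["GCAT"] * 2 + ["ACGC"]

candidates = informative_snps(block)
common = common_haplotypes(block)
tags = minimal_tag_snps(common, candidates)
print(candidates, common, tags)
```

In this toy block, the second SNP never varies and the fourth varies in only one chromosome, so both are filtered out; the remaining two SNPs are both needed to separate the three common haplotypes.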

This work has several important implications for future whole-genome studies that aim to map common disease genes. It shows that, although a few SNPs can define most haplotypes, a very dense SNP map is needed to capture all haplotype information because haplotype structure is unpredictable. It also shows that such large-scale studies are feasible, taking us one step closer to finding the genetic variants that predispose us to disease. In an accompanying Perspective, however, Pui-Yan Kwok sounds a note of caution about drawing broad conclusions on human haplotype structure from so few chromosomes.