Even before DNA sequencing was developed, we knew from enzyme polymorphisms that people differ genetically in ways that could be linked to physical traits. We now know that the most easily identified form of genetic variation is the single nucleotide polymorphism (SNP).

In the early 1990s, coincident with the development of microarrays (see Milestone 16) and discussions about sequencing the human genome, several groups demonstrated that PCR-based or ligation-based sample-preparation methods, combined with array-based technologies, could readily identify SNPs. At the same time, it became clear that linkage studies to find disease genes had limited power: they worked when rare variants had large effects on health, but complex diseases were thought to arise from combinations of common variants, each contributing a smaller effect. In 1996, Neil Risch and Kathleen Merikangas made the controversial proposal that statistical association studies were the method of choice for complex diseases. To scan for disease genes, however, markers first had to be discovered across the entire genome, an undertaking that required massive effort and resources from an international collaboration. The resulting race to develop large-scale SNP-detection technologies revolutionized (and monetized) human genetics.

By November 2000, the SNP database dbSNP contained over 1.42 million SNPs across the human genome. In 2001, Mark Daly and colleagues showed that SNPs in the human genome are organized in a block-like structure: within each block, which spans kilobases of sequence, the SNPs are inherited together in only a few combinations, or haplotypes. Genotyping just a few SNPs in each block was therefore enough to infer the status of all the others, greatly reducing the complexity of the process. In 2005, a large consortium published the first haplotype map (the genotypes of 1 million SNPs in 269 samples from essentially three population groups), setting off a frenzy of whole-genome association studies for common disease-susceptibility variants.
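To make the block idea concrete, here is a toy sketch of "tag" SNP selection. It assumes a single hypothetical block whose common haplotypes are already known and phased, with no recombination inside the block; the haplotypes and the helper names `choose_tag_snps` and `infer_haplotype` are invented for illustration. It finds the smallest set of SNP positions that tells every haplotype apart, then reads the full haplotype back from the alleles at those tags alone. Real tag-SNP selection works from linkage-disequilibrium statistics measured across many people; the brute-force search here is only for illustration.

```python
from itertools import combinations

# Hypothetical haplotype block: each string is one common haplotype,
# and each character is the allele ("0" or "1") at one of six SNPs.
BLOCK_HAPLOTYPES = [
    "000000",
    "111100",
    "110011",
    "001111",
]


def choose_tag_snps(haplotypes):
    """Return the smallest set of SNP positions whose alleles distinguish every haplotype.

    Brute force over position subsets, smallest first; fine for a toy block only.
    """
    n_snps = len(haplotypes[0])
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            # Allele pattern each haplotype shows at these candidate tag positions.
            patterns = {tuple(h[p] for p in positions) for h in haplotypes}
            if len(patterns) == len(haplotypes):
                return list(positions)
    raise ValueError("haplotypes cannot be told apart")


def infer_haplotype(tag_positions, tag_alleles, haplotypes):
    """Return the full haplotype matching the alleles observed at the tag SNPs, or None."""
    for h in haplotypes:
        if all(h[p] == a for p, a in zip(tag_positions, tag_alleles)):
            return h
    return None


if __name__ == "__main__":
    tags = choose_tag_snps(BLOCK_HAPLOTYPES)
    print("tag SNP positions:", tags)
    # Genotype only the tag SNPs, then recover the untyped SNPs in the block.
    print("inferred haplotype:", infer_haplotype(tags, ("1", "0"), BLOCK_HAPLOTYPES))
```

In this toy block, two of the six SNPs are enough to identify all four haplotypes, so typing a third of the markers recovers the rest.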