The beginning of the Human Genome Project, over a decade ago, was accompanied by a cantankerous debate over whose genome was to be sequenced. Would it be a single individual? A celebrity, perhaps (widely rumoured to be Jim Watson, co-discoverer of the structure of DNA)? Or would several genomes, from many individuals, be studied? The discussion struck at the very heart of genetics. As the study of inherited variation between individuals, genetics might not immediately benefit from the sequence of a single genome. But even one genome would be immensely revealing to the science of deciphering the molecular blueprint of a species. Fortunately, geneticists were not forced to make this choice. Papers in this issue describe not only a single, history-making human genome sequence, composed of little bits from many humans (ref. 1; page 860), but also some 1.4 million sites of variation mapped along that reference sequence (ref. 2; page 928).

But why this preoccupation with sequence variation, with the fact that no two humans (except identical twins) are genetically the same? The answer is that such variations, or 'polymorphisms', are markers of genes and genomes with which researchers perform genetic analysis in an outbred species where matings cannot be controlled. The fields of human and medical genetics simply cannot exist without understanding this variation.

It has become clear that the two 'genomes' that each of us carries, inherited from our parents, most often differ, from each other and from the genomes of other humans, in terms of single base changes (ref. 1; Fig. 1). The twentieth century saw the identification of only a few thousand of these so-called single nucleotide polymorphisms (SNPs, or 'snips' to the streetwise). In just the first year of the new century, this number has been increased one-thousand-fold (ref. 2). Beyond the numbers, the excitement today comes from precise knowledge of where these sites of variation are in the genome (ref. 2). The 1.42 million known SNPs are found at a density of one SNP per 1.91 kilobases. This means that more than 90% of stretches of sequence 20 kilobases long contain one or more SNPs. The density is even higher in regions containing genes. The International SNP Map Working Group (ref. 2) estimates that they have identified 60,000 SNPs within genes ('coding' SNPs), or one coding SNP per 1.08 kilobases of gene sequence. Moreover, 93% of genes contain a SNP, and 98% are within 5 kilobases of a SNP. For the first time, nearly every human gene and genomic region is marked by a sequence variation.
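
The two summary figures quoted above (mean spacing between SNPs, and the share of fixed-length stretches that contain at least one SNP) can be computed from a list of mapped positions. The following is a minimal sketch, not the Working Group's method: the function names and the toy positions are hypothetical, and the windows are assumed to be non-overlapping.

```python
def mean_spacing_kb(n_snps, surveyed_bases):
    """Average distance between SNPs, in kilobases."""
    if n_snps <= 0:
        raise ValueError("need at least one SNP")
    return surveyed_bases / n_snps / 1000


def fraction_of_windows_with_snp(snp_positions, sequence_length, window_size):
    """Fraction of non-overlapping windows holding at least one SNP.

    Positions are 0-based; any partial window at the end is ignored.
    """
    n_windows = sequence_length // window_size
    if n_windows == 0:
        raise ValueError("sequence is shorter than one window")
    covered_length = n_windows * window_size
    occupied = {
        pos // window_size
        for pos in snp_positions
        if 0 <= pos < covered_length
    }
    return len(occupied) / n_windows


# Toy data (invented for illustration): five SNPs on a 100-kilobase stretch.
toy_positions = [500, 1_500, 21_000, 45_000, 99_000]
spacing = mean_spacing_kb(len(toy_positions), 100_000)
coverage = fraction_of_windows_with_snp(toy_positions, 100_000, 20_000)
print(f"one SNP per {spacing:.2f} kb; {coverage:.0%} of 20-kb windows marked")
```

Because SNPs are not spread evenly, the window fraction has to be measured from the positions themselves; it cannot be read off the mean spacing alone.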

Figure 1: The most common sources of variation between humans are single nucleotide polymorphisms (SNPs), which are single base differences between genome sequences. Fragments of two sequences, with eight SNPs, are shown.

These data provide interesting first glimpses into the pattern of variation across the genome. Variation is commonly assessed by nucleotide diversity: the number of base differences between two genomes, divided by the number of base pairs compared. Nucleotide diversity is a sensitive indicator of biological and historical factors that have affected the human genome (ref. 3). The nucleotide diversity in gene-containing regions has been estimated to be 8 differences per 10 kilobases (refs 4, 5); we now know that the genome-wide average is similar, 7.51 differences per 10 kilobases (ref. 2). The variation between individual non-sex chromosomes is small: diversity lies in the range 5.19 (for chromosome 21) to 8.79 (for chromosome 15) differences per 10 kilobases (ref. 2).
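
The definition of nucleotide diversity given above translates directly into a few lines of code. This is an illustrative sketch only: it assumes two already-aligned sequences of equal length, skips any position where either base is not A, C, G or T, and uses hypothetical function names and invented toy sequences.

```python
def nucleotide_diversity(seq_a, seq_b):
    """Base differences between two aligned sequences, divided by the
    number of base pairs compared."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Only count positions where both bases are unambiguous.
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def per_10_kb(diversity):
    """Rescale a per-base diversity to differences per 10 kilobases."""
    return diversity * 10_000


# Toy fragments (invented): two differences in ten compared bases.
pi = nucleotide_diversity("ACGTACGTAC", "ACGTTCGTAA")
print(pi)  # 0.2
```

On this scale, the genome-wide average quoted in the text, 7.51 differences per 10 kilobases, corresponds to a per-base diversity of 0.000751.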

Strikingly, humans vary least in their sex chromosomes. The variation between different X chromosomes is about 4.69 differences per 10 kilobases, and it is very much lower for the Y chromosome (1.51 differences per 10 kilobases). This is because the sex chromosomes have patterns of mutation and recombination (the swapping of similar DNA segments during the generation of eggs and sperm) that differ both from each other and from the non-sex chromosomes. Moreover, fewer ancestors have contributed to the sex chromosomes, which are therefore less variable than the non-sex chromosomes.

Perhaps not surprisingly, some genomic regions have significantly lower or higher diversity than the average. For example, the HLA locus, which encodes proteins that present antigens to the immune system, shows the greatest diversity. Such comparisons within genomes will be essential to our understanding of how variation shapes biochemical and cellular functions, and to illuminating past human evolution, as discussed in ref. 3, and by Stoneking in the preceding article (page 821; ref. 6).

But the main use of the human SNP map will be in dissecting the contributions of individual genes to diseases that have a complex, multigene basis. Knowledge of genetic variation already affects patient care to some degree. For example, gene variants lead to tissue and organ incompatibility, affecting the success of transplants. And the mainstay of medical genetics has been the study of the rare gene variants that lie behind inherited diseases such as cystic fibrosis.

But variations in genome sequences underlie differences in our susceptibility to, or protection from, all kinds of diseases; in the age of onset and severity of illness; and in the way our bodies respond to treatment. For example, we already know that single base differences in the APOE gene are associated with Alzheimer's disease, and that a simple deletion within the chemokine-receptor gene CCR5 leads to resistance to HIV and AIDS. The benefit of the SNP map is that it covers the entire genome. So, by comparing patterns and frequencies of SNPs in patients and controls, researchers can identify which SNPs are associated with which diseases (refs 7, 8, 9). Such research will bring about 'genetic medicine', in which knowledge of our uniqueness will alter all aspects of medicine, perceptibly and forever.
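
One simple way to compare SNP frequencies in patients and controls is a chi-square test on a two-by-two table of allele counts. The sketch below is an assumption about how such a comparison might be set up, not the approach of the cited studies; the function name and the counts are invented, and a genome-wide scan would also need to account for the very large number of SNPs tested.

```python
import math


def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Returns (chi_square, p_value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")

    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Invented counts: the variant allele is seen on 60 of 100 patient
# chromosomes and on 40 of 100 control chromosomes.
chi_square, p_value = allele_association(60, 40, 40, 60)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```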

Studies of SNPs and diseases will become more efficient when a few more problems are solved (ref. 3). First, although 82% of SNP variants are found at a frequency of more than 10% in the global human population, the 'microdistribution' of SNPs in individual populations is not known. Second, not all SNPs are created equal, and it will be essential to know as much as possible about their effects from computational analyses before studying their involvement in disease. For example, each SNP can be classified by whether it is coding or not. Coding SNPs can be classified by whether they alter the sequence of the protein encoded by the altered gene. Changes that alter protein sequences can be classified by their effects on protein structure. And non-coding SNPs can be classified according to whether they are found in gene-regulating segments of the genome (ref. 10); many complex diseases may arise from quantitative, rather than qualitative, differences in gene products. Third, the technology for assaying thousands of SNPs, in thousands of patients and controls (ref. 7), is not yet fully developed, although there are some creative ideas around.
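
The classification scheme described above (coding or not, protein-altering or not, regulatory or not) can be sketched as a small decision function. This is a hypothetical illustration: the function name, its parameters and its output labels are assumptions, it relies on the standard genetic code, and it does not attempt the further step of predicting effects on protein structure.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(in_coding, codon=None, offset=None, alt_base=None,
                 in_regulatory=False):
    """Label a SNP by its likely type.

    For coding SNPs, `codon` is the reference codon, `offset` (0, 1 or 2)
    is the changed position within it, and `alt_base` is the variant base.
    """
    if not in_coding:
        return "non-coding, regulatory" if in_regulatory else "non-coding, other"

    ref_codon = codon.upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]  # '*' marks a stop codon
    if ref_aa == alt_aa:
        return "coding, synonymous"
    return f"coding, protein-altering ({ref_aa}>{alt_aa})"


print(classify_snp(True, "GAA", 2, "G"))        # coding, synonymous
print(classify_snp(True, "GAG", 1, "T"))        # coding, protein-altering (E>V)
print(classify_snp(False, in_regulatory=True))  # non-coding, regulatory
```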

In the twentieth century, humans were not the geneticists' species of choice. The emphasis then was on understanding gene structure and function. Now, geneticists will concentrate increasingly on understanding physical and behavioural characteristics. Here, our species, with its obsession with self-examination, will make a superior subject. We will also see more studies of how natural variation leads to each one of our qualities. To some, there is a danger of genomania, with all differences (or similarities, for that matter) being laid at the altar of genetics (ref. 11). But I hope this does not happen. Genes and genomes do not act in a vacuum, and the environment is equally important in human biology. By identifying variation across the whole genome, the SNP map (ref. 2) may be our best route yet to a better understanding of the roles of nature and (not versus) nurture.