The rapid progress in genetic screening assays and DNA sequencing techniques promises to increase our understanding of the complex relationship between the human genetic make-up (the genotype) and its associated traits (the phenotype). For example, using the composite human genome sequences1,2,3, genome-wide association studies have identified regions that control specific traits through single nucleotide polymorphisms (SNPs) — the most common form of genetic variation. In this issue, Bentley et al.4 (page 53) and Wang et al.5 (page 60) detail the development and application of a high-throughput technology for sequencing DNA to decipher the genomes of two people, one of West African descent and the other of Han Chinese descent. This advance provides a technology that might eventually relate specific sequences and regions of DNA directly to human phenotypes.

Although genome-wide association studies can establish a link between a genetic locus marked by adjacent SNPs and its associated phenotype, they do not automatically identify the implicated nucleotide's position, as they use only a fraction of human SNPs. Such studies have been used because they are relatively cheap, whereas sequencing genomes in large human populations has been both technologically challenging and costly. Sequencing the genomes of many individuals would overcome the problem of identifying which nucleotide(s) are implicated in a phenotype, as long as the procedure could be performed accurately and completely. From such data sets, DNA variants can be identified, and the frequency with which they occur in humans who carry a particular trait (such as a disease) can then be compared with their frequency in people who lack that trait. In principle, all genetic variants contributing to the trait can thus be identified, giving a more complete picture of the biology involved.
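The comparison of variant frequencies between people with and without a trait can be sketched as a simple test on a 2×2 table of allele counts. This is a minimal illustration, not the method of any study cited here: the function name `allele_association` and the counts in the usage note are hypothetical, and the sketch assumes that every count is non-zero.

```python
from math import erfc, sqrt


def allele_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Compare how often a variant allele occurs in cases and in controls.

    Arguments are allele counts (variant and reference) in each group.
    Returns (case_freq, ctrl_freq, odds_ratio, chi2, p_value), where chi2 is
    the Pearson chi-square statistic with one degree of freedom.
    Assumes all four counts are non-zero.
    """
    n_case = case_alt + case_ref
    n_ctrl = ctrl_alt + ctrl_ref
    n_alt = case_alt + ctrl_alt
    n_ref = case_ref + ctrl_ref
    total = n_case + n_ctrl

    observed = [case_alt, case_ref, ctrl_alt, ctrl_ref]
    # Counts expected if the allele were equally frequent in both groups.
    expected = [
        n_case * n_alt / total,
        n_case * n_ref / total,
        n_ctrl * n_alt / total,
        n_ctrl * n_ref / total,
    ]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # For one degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return case_alt / n_case, ctrl_alt / n_ctrl, odds_ratio, chi2, p_value
```

For example, `allele_association(60, 140, 30, 170)` compares a variant seen on 60 of 200 chromosomes in cases with 30 of 200 in controls. With complete genome sequences, a test of this kind could in principle be applied to every variant rather than to a pre-selected subset of SNPs, although correcting for the very large number of tests would then become essential.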

The genomes of the anonymous African and Asian individuals supplement the existing sequenced genomes of two people of European origin, Craig Venter6 and James Watson7. Both teams involved in the latest work4,5 used the Illumina GA sequencing instrument, in which sequencing is performed by synthesizing fluorescently detectable DNA molecules, using the DNA from the genome being sequenced as a template. In a single cycle, this platform can produce more than 40 million discrete 'reads' of 35 nucleotides from either end of a 200- or a 2,000-nucleotide DNA fragment. Compared with the instruments used to complete the initial human genome sequence1,2,3, the Illumina GA generates three to four orders of magnitude more sequence per operation cycle. This instrument therefore joins the 454 Life Sciences sequencer7 as yet another 'next generation' technology for sequencing individual human genomes.
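The throughput figures above translate into sequencing depth by simple arithmetic, sketched here. The read count and read length come from the text; the genome size of about 3.1 billion nucleotides and the function names are assumptions made for illustration, and the calculation ignores reads that cannot be mapped.

```python
from math import ceil

READS_PER_RUN = 40_000_000     # "more than 40 million" reads, taken from the text
READ_LENGTH = 35               # nucleotides per read, taken from the text
GENOME_SIZE = 3_100_000_000    # assumption: approximate haploid human genome size


def bases_per_run(reads=READS_PER_RUN, read_length=READ_LENGTH):
    """Raw nucleotides produced in one operation of the instrument."""
    return reads * read_length


def coverage_per_run(genome_size=GENOME_SIZE):
    """Average number of times each genome position is read in one operation."""
    return bases_per_run() / genome_size


def runs_needed(target_coverage, genome_size=GENOME_SIZE):
    """Operations needed to reach a target average depth (ignores unmappable reads)."""
    return ceil(target_coverage * genome_size / bases_per_run())
```

Under these assumptions, one operation yields 1.4 billion nucleotides, which is less than half of one pass over the genome, so sequencing an individual to substantial depth requires many operations.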

How do the two new genome sequences allow a better understanding of human genetics? Both studies4,5 confirm that it is possible to routinely sequence the genome of an individual to discover the wide spectrum of DNA variations that it harbours. Of course, this process is greatly facilitated by having a reference human genome against which to compare sequence data from the two individuals. This allows the identification of SNPs, as well as insertion/deletion polymorphisms and structural variations (Fig. 1). Extensive validation of the SNPs detected shows that sequencing accuracy is high. A strength of this latest approach is the extent of deep sequencing achieved, which aids SNP identification.
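Why depth aids SNP identification can be seen in a toy example that piles up aligned reads over a reference and reports positions where a non-reference base is well supported. This is a deliberately naive sketch and not the variant-calling procedure used in either study: the function `call_snps`, its thresholds and the assumption of gap-free alignments with known start positions are all illustrative.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=4, min_fraction=0.2):
    """Report candidate SNPs from gap-free read alignments.

    reference     -- reference sequence as a string
    aligned_reads -- list of (start, sequence) pairs, 0-based start positions
    min_depth     -- positions covered by fewer reads than this are skipped
    min_fraction  -- minimum share of reads that must carry the variant base

    Returns a list of (position, ref_base, alt_base, alt_count, depth).
    """
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to separate a variant from an error
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_fraction:
                snps.append((pos, reference[pos], base, n, depth))
    return snps
```

At low depth a single erroneous read can mimic a variant, which is why the sketch refuses to call positions below `min_depth`; deeper sequencing allows more positions to pass such a threshold with confidence.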

Figure 1: Genomic variations.

The latest whole-genome sequences of two humans confirm4,5 that individual genomes vary in several respects. The types of variation include: variations in single nucleotides (SNPs); insertion or deletion of several nucleotides; insertion or deletion of thousands of nucleotides (structural variation); and duplication or multiplication of DNA segments more than 1,000 nucleotides long (copy-number variation).

The advantages of obtaining these two genomes, such as the identification of DNA variations, indicate that their usefulness will ultimately be much broader than simply demonstrating the technological milestone of relatively low-cost sequencing. But some goals remain. As the genomes were reconstituted on the basis of alignments with existing reference genomes, the set of non-SNP variants that are absent in the reference genome will be incomplete. For example, in these studies, the detection of structural variants (insertions or deletions of thousands of nucleotides at any one position on a chromosome) is biased towards deletions. This is because reads derived from a large insertion have no counterpart in the existing reference genome and so cannot be aligned to it. There are two possible solutions to this detection bias. One would be to sequence larger DNA fragments whose ends overlap with sequences on the reference genome8. Alternatively, all sequenced reads could be assembled independently, before mapping them to a reference human genome6.
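The first of these solutions rests on a simple comparison: when both ends of a fragment of known size map to the reference, the distance between them reveals whether sequence has been lost or gained in between. The sketch below illustrates that reasoning under simplifying assumptions; the function names, the tolerance parameter and the rule that an insertion must fit between the two end reads are illustrative and are not taken from the studies discussed.

```python
def classify_pair(mapped_span, expected_size, tolerance):
    """Classify a read pair by the distance its ends span on the reference.

    mapped_span   -- distance on the reference between the outer ends of the pair
    expected_size -- size of the sequenced fragment in the individual's DNA
    tolerance     -- allowed deviation before a pair is considered discordant
    """
    if mapped_span > expected_size + tolerance:
        # The ends lie further apart on the reference than in the individual:
        # sequence present in the reference is missing from the individual.
        return "deletion"
    if mapped_span < expected_size - tolerance:
        # The ends lie closer together on the reference than in the individual:
        # the individual carries extra sequence between them.
        return "insertion"
    return "concordant"


def max_detectable_insertion(fragment_size, read_length):
    """Rough upper bound on the insertion size a fragment can reveal.

    Simplifying assumption: both end reads must consist of reference sequence,
    so the inserted sequence has to fit between them.
    """
    return max(fragment_size - 2 * read_length, 0)
```

Under this simplification, a 200-nucleotide fragment with 35-nucleotide end reads can reveal insertions of at most about 130 nucleotides, whereas a 2,000-nucleotide fragment raises the bound to about 1,930. Deletions face no comparable ceiling, because the ends of a fragment can map arbitrarily far apart on the reference.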

Another deficiency of the four genomes4,5,6,7 is that they do not accurately define copy-number variants at the nucleotide level. These forms of genetic variation arise from the insertion of multiple copies of DNA segments that may include whole genes, and they have been increasingly implicated in disease phenotypes, including neurological disorders9,10.

Our genomes are not just collections of DNA variation: parental inheritance also dictates specific associations between neighbouring variations. Knowledge of these associations will ultimately help us discover whether and how much of an aberrant protein is produced by each of our cells and how these events contribute to observed phenotypes. Determining the association between neighbouring variations across all 23 pairs of human chromosomes is referred to as haplotype assembly, and this has not yet been completely achieved in any of the individual genomes sequenced.
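The idea of associating neighbouring variations can be illustrated for the simplest case of two heterozygous sites. If individual reads or read pairs cover both sites, the combinations of alleles they carry indicate which alleles lie on the same parental chromosome. The following sketch is a toy illustration only: the function `phase_two_sites` and its majority-vote rule are assumptions, and real haplotype assembly must link many sites and tolerate sequencing errors.

```python
from collections import Counter


def phase_two_sites(site1_alleles, site2_alleles, observations):
    """Infer which alleles at two heterozygous sites travel together.

    site1_alleles, site2_alleles -- the two alleles seen at each site,
                                    for example ("A", "G") and ("C", "T")
    observations -- (allele_at_site1, allele_at_site2) pairs, one for each
                    read or read pair that covers both sites

    Returns the better-supported pair of haplotypes, or None on a tie.
    """
    a1, b1 = site1_alleles
    a2, b2 = site2_alleles
    counts = Counter(observations)

    # Two arrangements are possible: a1 with a2 (and b1 with b2),
    # or a1 with b2 (and b1 with a2).
    same = counts[(a1, a2)] + counts[(b1, b2)]
    cross = counts[(a1, b2)] + counts[(b1, a2)]

    if same == cross:
        return None  # the reads do not favour either arrangement
    if same > cross:
        return ((a1, a2), (b1, b2))
    return ((a1, b2), (b1, a2))
```

Sites that are further apart than the sequenced fragments are never observed together, which is one reason why short reads alone leave haplotypes incomplete.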

These limitations notwithstanding, the approach of Bentley4, Wang5 and their colleagues represents a substantial advance in the sequencing of individual human genomes. Together with the other two genomes sequenced6,7, they reinforce the catalogue of variants that exist in human genomes — SNPs in the millions, insertion/deletion polymorphisms in the hundreds of thousands and structural variants in the thousands. The numbers of these variants do not directly tell us how such polymorphisms contribute to the wide spectrum of human traits. But they do provide a necessary step towards accurately defining genomic loci that are likely to be implicated in those traits.

With such rapid advances in next-generation technologies, and with 'third generation' technologies emerging, this is just the beginning of the era of the individual genome. Soon, association studies using complete individual genomes will become the approach of choice for understanding the complexity of human biology and disease. The latest advances have broad implications for expediting that goal.