The article on page 695 of this issue (ref. 1) describes the draft sequencing and initial analysis of the genome of Gallus gallus, more commonly known as the red jungle fowl: the predecessor of the domestic chicken and a valuable experimental organism (Box 1). Describing the first avian genome to be sequenced, this paper and the two that accompany it (refs 2, 3) provide a valuable resource for a diverse set of scientists studying a wide range of problems. Those who will benefit include agricultural researchers attempting to breed the most productive strains by recognizing links between DNA sequences and attributes such as egg production; comparative genomicists who want to identify the functional elements of the human genome accurately; and genome-sequence producers, who continue to debate the most effective way of sequencing a large vertebrate genome.

The draft chicken genome sequence, as reported by the International Chicken Genome Sequencing Consortium (ref. 1), has several features that distinguish it from the sequenced mammalian genomes: those of humans, mice, rats and dogs. Weighing in at about 1 billion DNA base pairs, the chicken genome is divided among 1 pair of sex chromosomes (Z and W, with females being ZW and males ZZ) and 38 pairs of non-sex chromosomes (autosomes). The autosomes vary greatly in size and are classed as macrochromosomes (large) or microchromosomes (tiny). Microchromosomes, which range from 5 million to 20 million base pairs, are not common in mammals but are abundant in birds and some fish and reptile species. The consortium's analysis indicates that chicken microchromosomes are easily distinguished from macrochromosomes at the sequence level, because of their relatively high proportion of guanine–cytosine (GC) base pairs (compared with adenine–thymine pairs) and relative lack of repetitive sequences.
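The GC signal described above can be illustrated with a short calculation. The following Python sketch is illustrative only and is not the consortium's method: the function names `gc_fraction` and `windowed_gc`, the window size and the toy sequence are all assumptions introduced here.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Ambiguous bases such as N are ignored; an empty sequence scores 0.0.
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int = 10) -> list[float]:
    """GC fraction in consecutive, non-overlapping windows of a sequence."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: a GC-rich stretch followed by an AT-rich one.
    toy = "GGCGCCGCGG" + "ATTAATATTA"
    print(windowed_gc(toy, window=10))
```

On real data one would scan whole chromosomes with much larger windows; a consistently higher GC fraction would be one of the sequence-level hints that a region belongs to a microchromosome.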

Another notable difference between the chicken genome and the average mammalian genome is that the chicken sequence is about one-third the size. This is now explained in part by its markedly smaller amount of ‘common repeats’ (stretches of sequence that occur many times), including a reduction in the number of degraded copies of gene sequences, a simpler structure of large duplications, and fewer duplicated copies of genes overall.

So much for generalities; how does this draft sequence benefit the various ‘special-interest’ groups mentioned above? First, it provides an initial framework for chicken breeders who want to understand how genetic variation influences traits that are important in the production of domestic chickens, by allowing the traits to be mapped back to precise genomic locations and genes. These groups have traditionally used quantitative trait loci (genomic regions statistically associated with variation in a ‘continuous’ trait in a population) to link the genetics of a strain to that trait. Continuous traits show graded variation and are controlled by more than one gene; in humans, they include height.

With the draft sequence, however, together with the second paper in this issue (ref. 2), it will be easier to link specific genetic variations with variations in physical traits. In that second paper, the International Chicken Polymorphism Map Consortium describes 2.8 million single-base-pair differences between three lines of domestic chicken (broiler, layer and Silkie) and the red jungle fowl. The map they have developed should allow researchers to identify the genes, and the combinations of gene variations, that produce desirable traits in chicken breeding populations. It should also increase the odds of optimizing a particular trait in subsequent generations.

For some time now, researchers in comparative genomics who are studying the human genome have also been craving the genome sequence of a species in the chicken's rough evolutionary position. In general, researchers have a good grasp of how to identify those portions of a genome that are translated into proteins, by aligning sequences of messenger RNAs, the templates from which proteins are made, against genomic sequences of interest. One can also identify these ‘coding’ genomic sequences by comparing the DNA of organisms that are evolutionarily distant. For example, stretches of sequence that have been preserved in both humans and fruitflies are likely to be very important for the functioning of the organisms. These sequence stretches are called conserved elements.

However, now that the human genome sequence is essentially finished (ref. 4), researchers would like to do more than just identify the sequences that are translated into proteins. They also want to understand all of the regulatory structures present in a genome: structures that might, for instance, adjust the amount of protein manufactured from a particular gene. These structures are collectively known as functional elements, and the chicken, having diverged from humans more than 310 million years ago, is considered the best example so far of an ‘outgroup’ with which to identify them. Because enough differences between the human and chicken sequences have accumulated over this period, one can zero in on the precise base pairs that evolution has left alone for all these years, which are the base pairs most likely to be functional in the human genome. By comparison, the mouse, which split from humans only 75 million years ago, is too similar at the base-pair level, leading to difficulties in identifying functional elements (ref. 5).
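The logic of using a distant outgroup can be sketched in code. The Python example below is a toy illustration under stated assumptions, not the method used in the papers: it assumes two sequences that have already been aligned to equal length (with `-` marking gaps), and the function name `conserved_windows`, the window length and the identity threshold are hypothetical choices made here. Real analyses rely on genome-scale alignment tools and statistical models of neutral evolution.

```python
def conserved_windows(aln_a: str, aln_b: str,
                      window: int = 8, min_identity: float = 0.9) -> list[int]:
    """Start positions of alignment windows whose identity meets the threshold.

    aln_a and aln_b are rows of a pairwise alignment (equal length,
    '-' for gaps). Gap columns never count as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(aln_a) - window + 1):
        a = aln_a[start:start + window]
        b = aln_b[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        if matches / window >= min_identity:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Hypothetical aligned fragments: identical at first, then diverged.
    human = "ACGTACGTTTTT"
    chicken = "ACGTACGTAAAA"
    print(conserved_windows(human, chicken, window=8, min_identity=1.0))  # [0]
```

The same function shows why a close relative is less informative: if two species differ at very few positions, almost every window passes the threshold, and the windows that are truly under constraint no longer stand out.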

The consortium's initial analysis (ref. 1) describes 70 million base pairs of sequence that are highly conserved between chickens and humans. This includes base pairs within genes, but also base pairs that lie between genes and may therefore correspond to functional elements (interestingly, many of these seem to be at a considerable distance from genes). Questions about what these structures are and why evolution has constrained them over time will be answered only with targeted experiments, some of which are now getting under way (ref. 6).

Finally, for those who concentrate on generating large-scale genomic sequences and resources, the chicken genome represents another in a series of grand experiments to balance two different approaches. Traditional clone-by-clone approaches (see, for example, refs 4, 7) involve cloning a genome into bacterial artificial chromosomes (BACs), mapping the clones, then sequencing them and assembling the sequences by using the map; they are time-consuming but generally produce an accurate representation of all regions of the genome. Whole-genome shotgun (WGS; ref. 8) sequencing is quicker, because it involves shattering the whole genome into pieces, sequencing the fragments and assembling them by computer, but it often fails to represent all regions accurately.
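The computational step of shotgun assembly can be illustrated with a deliberately simple sketch. The Python code below uses a greedy overlap-merge strategy chosen here for clarity; it is not the assembler used for the chicken genome, and the function names, the minimum overlap length and the toy reads are all assumptions. It also ignores sequencing errors, reverse-complement reads and paired-end information, all of which real assemblers must handle.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the remaining contigs once no overlap of at least min_len is left.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


if __name__ == "__main__":
    # Hypothetical reads drawn from the toy sequence ATGGCGTACCTGA.
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
    print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Because the merge decision depends only on sequence overlap, reads from two copies of a repeated or duplicated region look interchangeable to this procedure. That is the basic reason why shotgun assemblies struggle with repeats, and why an independent physical map of clones helps to order and place the pieces.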

The chicken sequence presented here is a halfway house: it is not a straight WGS assembly, but has been revised according to a physical map of 180,000 BAC clones, detailed by Wallis et al. (ref. 3) on page 761. This map was crucial in ordering and localizing the sequence pieces generated by WGS. As a result, the assembly captures an impressive 98% of the sequence over most of the genome, with that number falling slightly in very GC-rich regions. The authors were also able to locate partial or complete sequences of at least 97% of coding genes that were previously known to exist.

However, the genome has received no directed ‘finishing’ work, and issues do still exist: there is a distinct lack of continuity in 10% of the gene-rich regions, and perhaps 1.4 million base pairs of sequence are in the wrong position. Recent studies (ref. 9) suggest that, even with algorithmic improvements, WGS assemblies fail to resolve large-scale duplications in vertebrate genomes; even with a BAC map, recently duplicated sequences in the chicken assembly are poorly resolved (ref. 1). And the authors suggest that one reason why they were able to resolve most of the WGS sequence was the minimal repetitive content of the chicken genome, so the experience will not necessarily translate to all vertebrate genomes. As we move forward in this post-genomic era, we must learn from all past experience, so that we can maintain the high quality we have come to expect from genome-sequencing projects.