Adding to a rapidly growing list of organisms that have had their genomes decoded, the genome sequence of zebra finch (Taeniopygia guttata) was recently reported (Warren et al., 2010). As this is the second bird to have its genome sequenced (chicken was the first; ICGSC (International Chicken Genome Sequencing Consortium), 2004), it opens up new possibilities for comparative genomic approaches to studies of evolutionary processes in birds. It also increases the power of using birds as an outgroup in analyses of mammalian molecular evolution.

The zebra finch is a major avian model organism for studies of behaviour, endocrinology and neurobiology. Neuroscientists have been attracted to zebra finches for many years, as songbirds (oscine birds of the order Passeriformes) represent perhaps the most vital model for understanding the cognitive processes behind the trait of vocal learning. This complex trait, which involves learning, memory and vocalization, is one that songbirds share only with humans and a few other avian lineages (parrots and hummingbirds). This made a strong case for zebra finch genome sequencing (Clayton et al., 2009), which was approved and funded by the National Human Genome Research Institute (NHGRI) in 2005. The sequencing work was completed at the Washington University Genome Sequencing Centre.

The assembled zebra finch genome is 1.2 Gb, of which 1.0 Gb has been assigned to chromosomes. Genome size in birds is typically 30–50% of that of mammals and, as was observed for the chicken genome, in zebra finch this can largely be explained by the low frequency of mobile repetitive elements (7.7% of the genome). In all, 17 475 genes were predicted, which is slightly fewer than the number often considered for mammalian genomes. A subset of these genes is differentially expressed in the auditory forebrain in song experiments (Dong et al., 2009), and these genes are thus candidates for involvement in vocalization and learning functions. Moreover, the differentially expressed transcripts include a large number of non-coding RNAs, indicating that sequences other than protein-coding genes, for example microRNAs, might mediate complex cognitive behaviours. The same was found in analyses of patterns of gene expression in song control loci of the zebra finch brain, which yielded a catalogue of about 800 genes that alter their expression in response to song. As vocal learning is a derived trait in songbirds, one would expect that some of these genes have been subject to positive selection in the evolutionary lineage leading to songbirds. An analysis of the evolutionary rates of genes shared between zebra finch, chicken and mammals revealed 49 that have both been subject to adaptive evolution in the zebra finch lineage and are suppressed in expression upon song exposure. Interestingly, genes with the functional annotation ‘ion channel activity’ are significantly overrepresented among these and, indeed, ion channels are critical components of the nervous system. In a more detailed study, Nam et al. (2010) demonstrated an overrepresentation of glutamate receptors among positively selected and differentially expressed genes in zebra finch song control loci. Incidentally, the gene ASPM, which is associated with microcephaly in humans and has been implicated in the emergence of modern human cognition (a conclusion that has been questioned), is positively selected both in the zebra finch lineage (Nam et al., 2010) and in the human lineage (Evans et al., 2004).
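
To make the notion of overrepresentation concrete, the sketch below shows one common way such a claim can be tested: a one-sided hypergeometric test on the number of genes in a candidate set that carry a given annotation. This is an illustration only and is not necessarily the test used in the studies cited above. The function name and all counts other than the 49 candidate genes are hypothetical placeholders, since the background size and annotation counts are not given here.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """One-sided hypergeometric P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the annotation of interest
    sample:     number of genes in the candidate set
    hits:       candidate genes carrying the annotation
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    # Sum the probability of seeing `hits` or more annotated genes by chance.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Only sample=49 comes from the text; the other counts are hypothetical.
    p = hypergeom_enrichment_p(population=10_000, annotated=200,
                               sample=49, hits=6)
    print(f"P(X >= 6) = {p:.3g}")
```

A small P value would indicate that the annotation occurs more often in the candidate set than expected by chance; in practice a correction for testing many annotation categories would also be needed.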

A relatively large number of papers covering particular aspects of zebra finch genomics accompanied the main genome paper. These include topics such as the peptidome (Xie et al., 2010), the degradome (Quesada et al., 2010) and immunogenetics (Balakrishnan et al., 2010), to name but a few. Genome sequencing was also accompanied by the development of a zebra finch linkage map (Stapley et al., 2009; Backström et al., 2010). By integrating the data from the assembly and the genetic map, it is possible to estimate the rate of recombination in different parts of the genome. This revealed a highly heterogeneous recombination landscape with one of the most pronounced ‘telomere effects’ seen in any species (Backström et al., 2010; Stapley et al., 2010). Up to 90% of recombination events apparently occur in the terminal 10% of the larger chromosomes; the central parts of chromosomes thereby essentially form recombination deserts. One consequence of this is that the degree of linkage disequilibrium is also highly heterogeneous in the zebra finch genome. This means that association mapping should be expected to find correlations between markers and trait loci more easily when these are located in the central parts of chromosomes. On the other hand, the low recombination rate in these regions will make it more difficult to fine-map such loci by pedigree approaches.
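
The integration of assembly and linkage map described above can be sketched in a few lines: for markers with both a physical position (from the assembly) and a genetic position (from the linkage map), the local recombination rate between adjacent markers is the difference in genetic distance divided by the difference in physical distance, usually expressed in cM/Mb. The code below is a minimal illustration of that calculation, not the procedure used in the cited studies; the function name and marker coordinates are invented, with the toy values chosen to mimic a strong telomere effect.

```python
from typing import List, Tuple

# (physical position in bp, genetic position in cM) for one chromosome
Marker = Tuple[int, float]


def recombination_rates(markers: List[Marker]) -> List[Tuple[int, int, float]]:
    """Return (start_bp, end_bp, rate in cM/Mb) for adjacent marker pairs."""
    ordered = sorted(markers)  # order markers by physical position
    rates = []
    for (bp_a, cm_a), (bp_b, cm_b) in zip(ordered, ordered[1:]):
        span_mb = (bp_b - bp_a) / 1e6
        if span_mb <= 0:
            continue  # skip markers that share a physical position
        rates.append((bp_a, bp_b, (cm_b - cm_a) / span_mb))
    return rates


if __name__ == "__main__":
    # Hypothetical markers on a 100-Mb chromosome with a 45-cM map.
    toy = [(0, 0.0), (5_000_000, 20.0), (50_000_000, 22.0),
           (95_000_000, 24.0), (100_000_000, 45.0)]
    for start, end, rate in recombination_rates(toy):
        print(f"{start / 1e6:6.1f}-{end / 1e6:6.1f} Mb: {rate:.2f} cM/Mb")
```

In this toy example the two terminal 5-Mb intervals carry most of the map length, while the central intervals have rates close to zero, which is the pattern referred to as a recombination desert.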

Recombination rate variation also has implications for genome evolution. Using a chicken tiling path microarray, Völker et al. (2010) identified 32 genomic regions that show copy number variation (CNV) distinguishing between chicken and zebra finch. As a clue to the mechanistic basis for structural variation in the genome, it was found that the incidence of both CNVs and chromosomal rearrangements was associated with regional variation in the recombination rate.

Zebra finch genome sequencing was done in the ‘standard way’, using Sanger technology and integrating sequence assembly data with information on the physical map via BAC clones. Genome sequencing is now gradually turning to assemblies based on next-generation sequencing (NGS) technology. For example, the 2.25-Gb giant panda genome was recently assembled de novo at 56× coverage using short Illumina reads (Li et al., 2010). It is interesting to note that the zebra finch and panda assemblies achieved approximately the same mean length of continuous stretches of assembled sequence (contigs) and of assembled contigs with gaps (scaffolds/supercontigs). This would suggest that about 10 times more sequence data are currently needed to obtain the same de novo assembly quality with short reads compared with Sanger sequencing coupled with physical mapping. Clearly, this will change with increasing NGS read lengths.