Assigning a biological function to a string of DNA bases is the hardest and arguably the most important part of any genome-sequencing effort: without knowing where genes are, we cannot say anything about their biological importance or evolutionary history. Tried and tested methods for genome annotation have their merits, but there is ample scope for developing new approaches. Dario Boffelli and colleagues have now successfully risen to this challenge. Using the sea squirt Ciona intestinalis, they show that intraspecies sequence comparisons can be effective for identifying functional sequences, and that other organisms, including humans, might also benefit from the same strategy.

The most effective means of annotating genomes has been to compare sequences among organisms at varying evolutionary distances, on the premise that less divergent sequences will be functionally important. However, using different species restricts the analysis to genes that are common to the species compared. Intraspecies comparisons have always offered a theoretical way out of this limitation, but until now the high cost of sequencing many individuals has prevented the method from being put into practice. With resequencing becoming ever cheaper, it remains only to decide where to start.

Boffelli and colleagues settled on C. intestinalis, an excellent experimental system that also benefits from a high level of allelic polymorphism and a sequenced genome. The authors collected 140 individuals from 4 locations around the world and used PCR to amplify 4 defined stretches of coding and 5′ upstream sequence (16 kb in total) from each organism. Although sequences were not obtained from all individuals, those that were available were used to infer the phylogenetic relationships, both among the four geographical populations and among the individuals collected at each location.

Crucially, the same intra-population sequence comparisons also told the authors which nucleotide sites were mutating more slowly than others, and these were then used to detect functional DNA regions. Several exons of collagen and patched were predicted in this way. 5′ gene-control elements — five for the forkhead and two for the snail developmental genes — were also predicted computationally, and then verified by using in vivo reporter studies.
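
The idea of flagging slowly mutating sites can be illustrated with a toy sketch. It is not the authors' statistical method: it simply counts how many different bases appear in each column of an already aligned set of sequences, then reports stretches in which few or no columns vary. The function names, the window size and the threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def site_variability(alignment: List[str]) -> List[int]:
    """For each alignment column, return the number of distinct bases minus one.

    A score of 0 means every individual carries the same base at that site.
    Gaps and ambiguous characters are ignored (an assumption of this sketch).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    scores = []
    for col in range(length):
        bases = {seq[col] for seq in alignment if seq[col] in "ACGT"}
        scores.append(max(len(bases) - 1, 0))
    return scores


def conserved_windows(
    scores: List[int], window: int, max_variable: int
) -> List[Tuple[int, int]]:
    """Return merged half-open intervals [start, end) covered by windows
    that contain at most `max_variable` variable sites."""
    hits: List[Tuple[int, int]] = []
    for start in range(len(scores) - window + 1):
        variable = sum(1 for s in scores[start:start + window] if s > 0)
        if variable <= max_variable:
            end = start + window
            if hits and start <= hits[-1][1]:
                # Overlaps or touches the previous hit, so extend it.
                hits[-1] = (hits[-1][0], end)
            else:
                hits.append((start, end))
    return hits


# Toy alignment of four hypothetical individuals (not real Ciona data).
toy_alignment = [
    "ACGTACGTAA",
    "ACGTACGTCC",
    "ACGTACGTGG",
    "ACGTACGTTA",
]
toy_scores = site_variability(toy_alignment)
toy_regions = conserved_windows(toy_scores, window=4, max_variable=0)
```

In this toy case the first eight columns are identical in all four sequences, so they form the single candidate conserved region. A real analysis would also need to account for the phylogenetic relationships among the sampled individuals, because closely related sequences share bases by descent as well as by constraint.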

Intraspecies comparisons are unlikely to supplant the more conventional interspecies approaches. However, they come into their own when investigating species-specific sequences (including those that occur in humans), new species or those that lack relatives at a convenient phylogenetic distance.