A recent study published in Science has established an exciting new direction in environmental genomics — comparative metagenomic analysis.

For many years, environmental genomic studies were hindered by the fact that most environmental microorganisms cannot be cultured in the laboratory. This hurdle began to be overcome with the development of cultivation-independent techniques to characterize these organisms and, in the past 2–3 years, the development of metagenomics — the analysis of microbial community genome sequence data recovered directly from the environment — has provided the field with a new impetus. In this latest study, Susannah Tringe, Christian von Mering and colleagues set out to characterize several disparate microbial communities present in three deep-sea whale carcasses and an agricultural soil sample.

One of the key issues in metagenomics is obtaining ecologically and metabolically relevant data from the complex environments analysed. For the two niches examined in this work, it was estimated that obtaining a draft sequence of the most abundant genome would require 2–5 × 10⁹ bp of sequence for the soil sample and 100–700 Mb for the whale carcasses. Given this obstacle to obtaining whole-genome sequences, Tringe et al. proceeded using a 'gene-centric', comparative approach. The aim is to identify a subset of genes that are present in a microbial habitat and use these to understand the genomic diversity between environmental samples, without the need to have a full genome sequence or even know which specific species are being analysed.
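To see why such large amounts of sequence are needed, a rough back-of-envelope calculation can help. The sketch below is not the estimation method used in the study. It assumes the simplest possible model, in which reads are drawn from community members in proportion to their abundance, so the total sequence required scales as genome size times target coverage divided by the relative abundance of the organism. The function name and all input values are hypothetical and chosen only for illustration.

```python
def required_sequence_bp(genome_size_bp, target_coverage, relative_abundance):
    """Estimate the total community sequence (bp) needed for one genome.

    Simplifying assumption (not from the study): reads are sampled from each
    organism in proportion to its share of the community DNA, so a genome
    making up a fraction `relative_abundance` of the sample receives that
    fraction of all sequenced bases.
    """
    if genome_size_bp <= 0 or target_coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in the interval (0, 1]")
    return genome_size_bp * target_coverage / relative_abundance


if __name__ == "__main__":
    # Hypothetical inputs: a 5 Mb genome, 8-fold draft coverage, and a
    # dominant organism that makes up 1% of the community DNA.
    total = required_sequence_bp(5_000_000, 8, 0.01)
    print(f"Sequence required: {total:.2e} bp")
```

Under this model, the rarer the most abundant organism is, the more sequence is needed, which is consistent with a highly diverse community such as soil demanding far more sequence than a less complex one.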

The authors generated shotgun libraries derived from each sample, then sequenced and annotated the genes present on the small DNA fragments. Comparing these so-called environmental genome tags (EGTs) with published metagenomic data obtained from an acid-mine drainage site and the Sargasso Sea demonstrated that the predicted protein complement in each niche differed in ways that would be expected given the different nutrients present. Gene-content analysis of pooled samples from the different environments provided evidence of environmental specialization — for example, genes that convert light into energy were found in the marine samples but not in the soil communities. The relative abundance of particular functional gene clusters (operons) also provided evidence for specialization: soil samples were enriched in genes required for potassium transport, which reflects the enrichment of this ion in this habitat.
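The kind of gene-content comparison described above can be sketched in a few lines. This is a minimal illustration, not the statistical procedure used by the authors: it takes counts of EGTs assigned to functional categories in two samples and reports a log2 ratio of their relative abundances, with a pseudocount to avoid division by zero. The category names and counts are invented for the example and do not come from the study.

```python
import math


def log2_enrichment(sample_a, sample_b, pseudocount=1.0):
    """Return log2(relative abundance in A / relative abundance in B).

    `sample_a` and `sample_b` map functional category -> number of gene
    fragments assigned to it. A pseudocount is added to every category so
    that categories absent from one sample still give a finite score.
    Positive scores mean the category is over-represented in sample A.
    """
    categories = set(sample_a) | set(sample_b)
    if not categories:
        raise ValueError("at least one category is required")
    total_a = sum(sample_a.values()) + pseudocount * len(categories)
    total_b = sum(sample_b.values()) + pseudocount * len(categories)
    scores = {}
    for cat in categories:
        freq_a = (sample_a.get(cat, 0) + pseudocount) / total_a
        freq_b = (sample_b.get(cat, 0) + pseudocount) / total_b
        scores[cat] = math.log2(freq_a / freq_b)
    return scores


# Invented counts for illustration only (not data from the study).
soil = {"potassium_transport": 40, "light_harvesting": 0, "other": 960}
marine = {"potassium_transport": 10, "light_harvesting": 30, "other": 960}

if __name__ == "__main__":
    ranked = sorted(log2_enrichment(soil, marine).items(), key=lambda kv: -kv[1])
    for category, score in ranked:
        print(f"{category:22s} {score:+.2f}")
```

With these toy counts, the potassium transport category scores positive (enriched in the soil sample) and the light-harvesting category scores negative (enriched in the marine sample), mirroring the pattern of specialization reported in the text. A real analysis would also need to account for sampling depth and test for statistical significance.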

The results of this study indicate that it might be possible to use sequence data in the form of EGTs to generate 'environmental fingerprints' that could be used to glean details of different environmental niches and provide a greater understanding of the interactions between microbial communities and their environments.