News & Views | Published:


Reconstructing the wild types

A challenging way to characterize the world's naturally occurring microbes is to piece together whole genomes from complex communities. An unusually acidic microbial habitat provides the setting for a ranging shot on that target.

Antonie van Leeuwenhoek discovered much of the microbial world by looking at it through a simple microscope. Later, while deciphering the roles of microbes in natural elemental cycles of sulphur, Sergei Winogradsky again showed the importance of direct observation of microbes in their natural habitats. These early lessons have not been lost on contemporary microbial biologists, either. For instance, over the past decade a variety of molecular methods have been developed that allow direct surveys and descriptions of naturally occurring microbial species in their native habitats1.

Using the tools of production-scale genome sequencing and bioinformatics analyses, Tyson et al.2 now take direct observation of native microbial assemblages to the next level. On page 37 of this issue, they describe the compilation of large portions of microbial genome sequences, retrieved directly from a natural community. The approach they used was whole-genome ‘shotgun’ sequencing. In this approach, total DNA from a complex microbial mixture is collected and smashed into tiny fragments, which are then pieced back together into their proper genomic arrangement with computer guidance. The results emphasize the potential of genomic strategies for studying complex microbial assemblages, as well as some of their inherent limitations.

To appreciate this feat, a little background on the environmental setting and the microbial players is useful. The microbial community studied by Tyson et al. is a relatively simple one, existing in an unusual, man-made habitat: the drainage conduits of an abandoned mine in northern California's Sierra Nevada mountains. This setting is classified as a ‘superfund’ clean-up site, as the resident microbes interact with pyrite (FeS2) mine tailings to produce large amounts of sulphuric acid3. Here there are thriving microbial assemblages that have adapted to the self-generated low pH (about 0.5), moderate temperature (40°C) and abundant source of geochemical energy (FeS2). The extremely acidic conditions and restricted energy source combine to select for a simple community, which Tyson et al. recognized as ideal for testing new genomic approaches in the environment. The main microbial players at the drainage site appear to make their living by oxidizing iron and sulphide, and represent two of the three principal domains of life4, Bacteria and Archaea.

The genomic approach used by Tyson et al. consisted entirely of the whole-genome shotgun method5, as practised at large-scale DNA-sequencing centres such as the US Department of Energy's Joint Genome Institute (JGI) in Walnut Creek, California, where the work was performed. After collection of an arbitrary chunk of microbial biofilm from the acid drainage site (Fig. 1), DNA was extracted, sheared to an average size of 3,000 bases, cloned and shotgun sequenced. Remarkably, the end-sequencing of a mere 50,000 or so clones (about 76 million base pairs) allowed the assembly of large, contiguous genomic regions from several dominant microbes in the mine-drainage community.

Figure 1: Microbial life in the pink.


These apparently inert threads (shown close-up in the inset) are an example of the biofilm community that thrives underground, on the surface of the highly acidic drainage in the mine sampled by Tyson et al.2 (the Richmond mine at Iron Mountain, California). In this instance of ‘environmental microbial genomics’, Tyson et al. recovered two near-complete genomes, one from a bacterium and one from an archaean, as well as partial genomes of other microorganisms.

Following the DNA sequencing phase, the JGI group assembled a large fraction of the short DNA sequence reads into larger ‘scaffolds’. The 1,183 scaffolds, totalling 10 million base pairs in length, could be sorted into distinct groups using two combined criteria: the read density (the number of individual DNA reads per unit DNA length in a scaffold) and the percentage content of guanine and cytosine (G + C) bases. Scaffolds with similar values on both measures were grouped together as related genome sequence assemblies from the mix. Additionally, the location of a taxonomic molecular marker, ribosomal RNA (rRNA), helped to link individual genomic scaffolds with specific microbial groups.

The high-G + C (55.8%) scaffolds having tenfold read coverage contained only one type of bacterial rRNA, which grouped with members of the genus Leptospirillum; this scaffold group encompassed a total of 2.3 million base pairs of DNA sequence. Similarly, the low-G + C scaffold group that had about tenfold read coverage contained one general archaeal rRNA type related to species of Ferroplasma, and totalled about 1.8 million base pairs. These novel approaches to community assembly allowed the JGI bioinformatics group to stitch together large genomic scaffolds from these two abundant microbes in the mine drainage community.
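The sorting idea described above can be illustrated with a small Python sketch. It is not the JGI pipeline: the `Scaffold` record, the 700-base mean read length and both cutoffs are hypothetical values chosen only to show how G + C content and read coverage together can separate scaffolds into groups.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    sequence: str
    read_count: int               # number of reads assembled into this scaffold
    mean_read_length: int = 700   # assumed value, not taken from the article


def gc_fraction(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def read_depth(scaffold: Scaffold) -> float:
    """Approximate fold coverage: total read bases divided by scaffold length."""
    length = len(scaffold.sequence)
    if length == 0:
        return 0.0
    return scaffold.read_count * scaffold.mean_read_length / length


def bin_label(scaffold: Scaffold, gc_cutoff: float = 0.5,
              depth_cutoff: float = 5.0) -> str:
    """Assign a scaffold to one of four bins; both cutoffs are hypothetical."""
    gc = "high-GC" if gc_fraction(scaffold.sequence) >= gc_cutoff else "low-GC"
    depth = ("high-coverage" if read_depth(scaffold) >= depth_cutoff
             else "low-coverage")
    return f"{gc}/{depth}"


def bin_scaffolds(scaffolds: list[Scaffold]) -> dict[str, list[str]]:
    """Group scaffold names by their bin label."""
    bins: dict[str, list[str]] = {}
    for scaffold in scaffolds:
        bins.setdefault(bin_label(scaffold), []).append(scaffold.name)
    return bins


if __name__ == "__main__":
    demo = [
        Scaffold("scaffold_a", "GGCC" * 250, read_count=15),
        Scaffold("scaffold_b", "ATGA" * 250, read_count=2),
    ]
    print(bin_scaffolds(demo))
```

In practice, fixed cutoffs like these would be replaced by inspection of how the scaffolds actually cluster on the two measures, and a marker such as rRNA would then be used to attach a taxonomic name to each group.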

These preliminary results are truly encouraging. But conclusions about the origins, absolute genetic structure and identity of the DNA sequences must be considered as hypotheses that remain to be tested. The kind of data reported by Tyson and colleagues represents a sort of average genome structure, a patchwork quilt combining pieces of DNA from many individual cells in a complex population. For instance, an estimated 100 million Leptospirillum-like cells contributed to a total of 29,000 assembled DNA sequence reads for this group. Each DNA sequence that was read probably originated from a different (and potentially non-identical) cell within the population. So the assembled scaffolds are very different beasts from the singular genome sequences (determined from clonal laboratory isolates) that currently populate genome-sequence databases. Such environmentally derived genome-sequence assemblies need to be very carefully flagged up in the databases, so that their distinction as composite population assemblies will be obvious.

In the case of the Leptospirillum species, the nucleotide polymorphism rate (average sequence differences between individual reads) was low, about 0.08%. For this Leptospirillum population, then, there appeared to be only moderate genetic heterogeneity. In contrast, the Ferroplasma-like scaffolds showed a much higher polymorphism rate (around 2.2%) between the individual reads that make up the scaffolds. Given the locations and frequencies of these polymorphisms, the authors propose that the Ferroplasma polymorphisms arise predominantly from homologous recombination, a process whereby microbial cells can assimilate large blocks of foreign DNA into their genome.
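One way to make the notion of a polymorphism rate concrete is the following Python sketch. It assumes the reads covering a region have already been aligned to equal length, with '-' marking positions a read does not cover, and it pools mismatches over all read pairs. The function names and the pooling choice are illustrative assumptions, not the calculation used by Tyson et al.

```python
from itertools import combinations


def pairwise_difference(read_a: str, read_b: str) -> tuple[int, int]:
    """Return (mismatches, compared positions) for two aligned reads.

    Positions where either read has a gap ('-') are skipped.
    """
    mismatches = 0
    compared = 0
    for base_a, base_b in zip(read_a, read_b):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            mismatches += 1
    return mismatches, compared


def polymorphism_rate(aligned_reads: list[str]) -> float:
    """Pooled fraction of differing positions over all pairs of reads."""
    total_mismatches = 0
    total_compared = 0
    for read_a, read_b in combinations(aligned_reads, 2):
        mismatches, compared = pairwise_difference(read_a, read_b)
        total_mismatches += mismatches
        total_compared += compared
    return total_mismatches / total_compared if total_compared else 0.0


if __name__ == "__main__":
    reads = [
        "ACGTACGTAC",
        "ACGTACGTAC",
        "ACGTTCGTAC",   # differs from the others at one position
        "--GTACGTAC",   # read that starts two bases later
    ]
    print(f"{polymorphism_rate(reads):.4f}")
```

A rate computed this way says how different the reads are on average, but not why: recombination and the survival of a few related lineages could both produce similar values.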

There are, however, other plausible explanations for the patterns observed in the data. For example, the population may have originally been much more complex, but recently have undergone a ‘selective sweep’, leaving behind highly related but non-identical polymorphic genotypes6. What Tyson et al. presume to be genomic types that have been variously blended by lateral gene transfer may, in contrast, simply represent the visible survivors (by linear descent) of complex lineages — the surviving nodes in a family tree that has been extensively pruned by natural selection. At present, the existing data cannot fully distinguish between these alternatives. Better quantitative data on the levels of heterogeneity within and between Ferroplasma populations in this habitat, and a more quantitative assessment of genotypic representation in the environment, may help to clarify the issue.

But the complexities of deciphering genomic data from complex microbial populations are more of an opportunity than a difficulty7. After all, the reality is that genomes are dynamic biological structures: any notion that they are singular, unchanging entities does not capture the process that shapes the diversity we see today. It is only by peering directly into naturally occurring genomic diversity, as Tyson et al.2 have done, that the tempo, mode and mechanism of genome evolution and diversification, and its relationship to higher-order biological and ecological processes, will become clear. Our currently static snapshot views of microbial genomic diversity have the potential to become motion pictures as this emerging field matures. As predicted8, new vistas on the natural microbial world are opening up dramatically, in part owing to the new capabilities afforded by modern genomic technologies.


  1. Pace, N. R. Science 276, 734–740 (1997).
  2. Tyson, G. W. et al. Nature 428, 37–43 (2004).
  3. Edwards, K. J., Bond, P. L., Gihring, T. M. & Banfield, J. F. Science 287, 1796–1799 (2000).
  4. Woese, C. R. Microbiol. Rev. 51, 221–271 (1987).
  5. Fleischmann, R. D. et al. Science 269, 496–512 (1995).
  6. Palys, T., Nakamura, L. K. & Cohan, F. M. Int. J. Syst. Bacteriol. 47, 1145–1156 (1997).
  7. DeLong, E. F. Curr. Opin. Microbiol. 5, 520–524 (2002).
  8. Woese, C. R. Microbiol. Rev. 58, 1–9 (1994).
