Main

When the first steps were made towards understanding the principles that govern biological systems, scientists made simplistic assumptions about biological concepts to reduce the complexity of their hypotheses and draw interpretable conclusions. In the past few years, the transition from single-isolate genomics to comparative genomics of entire microbial populations has introduced new parameters that question or even threaten to reject those initial assumptions. For example, the current recognition of increased microbial genome fluidity indicates that the fundamental definition of a biological species1 fails in some cases to provide a realistic description of the dynamic relationships that shape microbial evolution. These findings do not support the strictly bifurcating tree of life as a means of phylogenetic analysis and instead favour the more realistic model of a phylogenetic network2, which better represents the true relationships among species that are characterized by high rates of DNA exchange3,4,5,6.

The first data to support this model came from the genomic analysis of the obligate intracellular bacterium Wolbachia pipientis. Klasson et al.7 compared 450 genes shared by three W. pipientis strains (W. pipientis wRi, W. pipientis wMel and W. pipientis wUni) that infect Drosophila simulans, Drosophila melanogaster and Muscidifurax uniraptor, respectively. Approximately 30% of core genes indicated that W. pipientis wMel and W. pipientis wRi are sister lineages, a different 30% supported the W. pipientis wMel and W. pipientis wUni sister phylogeny and 20% showed that W. pipientis wRi and W. pipientis wUni are the more closely related pair. The authors concluded that the high rates of intra-species recombination in W. pipientis do not allow a one-to-one relationship between gene history, genome history and strain phenotype. This suggests that W. pipientis is a mixture of subpopulations, and strains in the same subpopulation recombine more frequently with each other than with strains outside of it.
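The kind of tally that produces such percentages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by Klasson et al.: it assumes that each gene has already been reduced to three pairwise distances between the strains, and it treats the pair with the smallest distance as the sister pair, which is only a rough proxy for a proper gene-tree analysis. The function names and the toy distances are hypothetical.

```python
from collections import Counter


def sister_pair_support(gene_distances):
    """Return the fraction of genes that support each sister pair.

    gene_distances maps a gene name to a dict that maps a frozenset of two
    strain names to their pairwise distance for that gene. A gene whose two
    smallest distances are tied is counted as "unresolved".
    """
    counts = Counter()
    for dists in gene_distances.values():
        ranked = sorted(dists.items(), key=lambda item: item[1])
        if not ranked:
            continue  # no distances recorded for this gene
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            counts["unresolved"] += 1
        else:
            # Closest pair = putative sister lineages (simplifying assumption).
            counts[tuple(sorted(ranked[0][0]))] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


def toy_gene(mel_ri, mel_uni, ri_uni):
    """Build the distance dict for one hypothetical gene."""
    return {
        frozenset(("wMel", "wRi")): mel_ri,
        frozenset(("wMel", "wUni")): mel_uni,
        frozenset(("wRi", "wUni")): ri_uni,
    }


# Invented distances, chosen only to exercise every outcome.
toy = {
    "gene1": toy_gene(0.01, 0.03, 0.03),  # wMel + wRi closest
    "gene2": toy_gene(0.03, 0.01, 0.03),  # wMel + wUni closest
    "gene3": toy_gene(0.03, 0.03, 0.01),  # wRi + wUni closest
    "gene4": toy_gene(0.02, 0.02, 0.02),  # tie, counted as unresolved
}
support = sister_pair_support(toy)
```

When different genes favour different sister pairs in comparable proportions, as in the W. pipientis data, no single bifurcating tree summarizes the strains well.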

In the second example, Didelot et al.8 compared the genomes of eight serovars of Salmonella enterica to identify blocks of high or low similarity. Their data showed that in all but one pairwise comparison the distribution of sequence divergence is unimodal. However, in the case of S. enterica subsp. enterica serovar Paratyphi A and S. enterica subsp. enterica serovar Typhi, the distribution showed two peaks corresponding to regions of high (1.2%) and low (0.18%) sequence divergence. Overall, in 75% of their DNA sequences the two serovars appear to be distantly related isolates of S. enterica and in 25% they resemble sister lineages. The authors suggest that this apparent relatedness is the result of more than 100 recombination events that took place over a recent, restricted time span.
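A distribution of this kind can be explored by measuring divergence window by window along a pairwise alignment. The sketch below is an assumption-laden illustration rather than the procedure of Didelot et al.: it uses non-overlapping windows, counts simple mismatches without any substitution model, and separates low-divergence from high-divergence windows with an arbitrary threshold. The function names, window size, threshold and sequences are all hypothetical.

```python
def window_divergence(seq_a, seq_b, window):
    """Fraction of mismatched sites in consecutive non-overlapping windows.

    The two sequences must already be aligned to the same length. A trailing
    stretch shorter than one window is ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    divergences = []
    for start in range(0, len(seq_a) - window + 1, window):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        mismatches = sum(x != y for x, y in zip(a, b))
        divergences.append(mismatches / window)
    return divergences


def fraction_below(divergences, threshold):
    """Fraction of windows whose divergence is below the threshold."""
    if not divergences:
        return 0.0
    return sum(d < threshold for d in divergences) / len(divergences)


# Toy alignment of 40 sites: one identical window and three divergent ones.
seq_a = "A" * 40
seq_b = "A" * 10 + "CCCAAAAAAA" + "AAACCCAAAA" + "CCCCAAAAAA"

divs = window_divergence(seq_a, seq_b, window=10)
low_fraction = fraction_below(divs, threshold=0.1)
```

In real data a histogram of the per-window values would show whether the distribution has one peak or two, and the fraction of windows in the low-divergence peak corresponds to the share of the genome in which the two lineages resemble sisters.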

A similar pattern of genome mosaicism is seen in Pseudomonas fluorescens. Silby et al.9 sequenced the genomes of two P. fluorescens strains (SBW25 and Pf0-1) and compared them with that of P. fluorescens Pf-5. The comparison yielded a shared core set of 3,600 protein-coding genes, which corresponds to only 60% of genes in each of the three genomes. By contrast, a similar analysis of five isolates of Pseudomonas aeruginosa gave a core set of almost 5,000 genes, with only 1–8% of protein-coding genes being strain specific. Despite this diversity, a comparison of the three P. fluorescens strains and P. aeruginosa PAO1 showed that almost 24% and 35% of the genes place P. fluorescens SBW25 closest to P. fluorescens Pf-5 and P. fluorescens Pf0-1, respectively, and 37% put P. fluorescens Pf0-1 in the same node as P. fluorescens Pf-5, suggesting that there has been extensive genetic recombination between these strains despite their extreme diversity.
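The notion of a shared core set can be made concrete with a small sketch. It assumes that the genes of each strain have already been assigned to orthologous families, which is the difficult step in practice and is not shown here; the core is then simply the set of families present in every strain. The function names and the family identifiers are hypothetical, and the strain names are reused only as labels for invented data.

```python
def core_genome(families_by_strain):
    """Return the gene families present in every strain."""
    sets = [set(families) for families in families_by_strain.values()]
    return set.intersection(*sets) if sets else set()


def core_fraction(families_by_strain):
    """Return, per strain, the share of its gene families that are core."""
    core = core_genome(families_by_strain)
    return {
        strain: len(core) / len(set(families))
        for strain, families in families_by_strain.items()
        if families  # skip strains with no annotated families
    }


# Invented family identifiers, for illustration only.
toy_strains = {
    "SBW25": ["f1", "f2", "f3", "f4", "f5"],
    "Pf0-1": ["f1", "f2", "f3", "f6", "f7"],
    "Pf-5": ["f1", "f2", "f3", "f8", "f9", "f10"],
}

core = core_genome(toy_strains)
fractions = core_fraction(toy_strains)
```

A low core fraction, as reported for the P. fluorescens strains, means that a large part of each genome is absent from at least one of the other strains.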

These three examples show that, in the case of highly mosaic genomes, traditional models for analysing the history of microorganisms are not applicable. Methodologies that tailor the model to the data, rather than the data to the model, offer a more realistic approximation of microbial diversity and complexity.