Main

Emerging and opportunistic diseases are caused by a microorganism invading a new habitat, either a new host species (as in the case of SARS and tuberculosis) or a new tissue compartment within the same host species (as in the case of Escherichia coli-mediated urinary tract infection and bacterial meningitis). Often, the pathogen can live for many generations in the new habitat and, as natural selection favours mutations that provide an immediate advantage (that is, evolution is short-sighted), mutations that are advantageous in the new habitat will be selected even if they are disadvantageous in the original habitat. This selection should leave a 'signature' or pattern of sequence variation in the genes targeted by natural selection, with which the genes and the type of selection can be recognized.

In bacteria, there are two sources of genetic variation that can facilitate adaptation to new habitats — horizontal acquisition of new genes and alteration of existing genes. Horizontal transfer between different bacterial clones, or even species, leads to the incorporation of new plasmids or multigene chromosomal insertions (islands) into the genome, and to gene replacement or modification by recombination1. The second type of variation includes point mutations in pre-existing genes or regulatory regions, and gene shuffling, duplication, inversion and deletion2. Traditionally, particularly for genomic islands (Box 1), the signature looked for is the significantly higher frequency of a specific island or gene in strains isolated from the alternative habitat. However, this procedure is difficult, particularly when the selected changes are unknown.

In this paper, we describe a 'source–sink' model for adaptive evolution, which provides us with another method to find genetic changes that have been selected for in the alternative habitat. The source–sink model is derived from population ecology, where the 'sink' population is only maintained by immigration from the 'source' population. The evolutionary source–sink model presented here allows the population in the sink to maintain itself, but only transiently.

We emphasize here that the source–sink model of evolution can be applied to most bacterial pathogens and suggest a phylogeny-based method to search for genes undergoing source–sink evolution. It has previously been proposed that a shift from a commensal to a pathogenic lifestyle is likely to result in bacterial pathogens evolving adaptively within the host, during the course of an infection3,4. It has also been postulated, however, that the emergence of a new professional pathogen (that is, an organism adapted for circulation within the host population as a pathogen) from a non-pathogen is expected to be relatively rare, because it is likely to require the selection of multiple adaptive changes5. The source–sink model is applied to pathogens at early or intermediate stages of adaptation (before the pathogenic lifestyle might become self-sustainable), which we believe is the most common outcome.

The source–sink model

Source–sink ecological models6 were developed, and are currently applied, in the population ecology of animals and plants. They relate to species that are distributed across source habitats, where populations are self-sustaining, and sink habitats, where populations can be maintained continuously only by immigration from established source habitats. In sinks, deaths eventually exceed births whereas the opposite is true in sources. For example, populations of an annual phlox plant in loose sandy loam are self-sustaining, and provide migrants into adjacent populations in denser soil that produce fewer seeds than are required for replacement7. Another example is the habitat dynamics of the three-spine stickleback fish, Gasterosteus aculeatus, which has a large ocean population that functions as a self-sustainable source habitat from which colonization of post-glacial freshwater lakes can occur8. These lakes, however, are relatively unstable sink habitats as they will disappear during the next ice age. Therefore, the sink population can maintain itself without constant immigration, but only transiently, as the population remains relatively small and has a high probability of extinction either through stochastic effects or habitat destruction.

Whether a habitat is a source or a sink depends on the organism, and the abiotic and biotic components of the environment9. Therefore, one organism's sink could be another organism's source. Also, environmental change could convert a sink into a source and vice versa. Finally, a sink habitat can change into a source habitat through adaptive evolution of the organism.

A body of theory has developed over the past decade that describes how species could adaptively evolve in sink habitats10. These mathematical models of source–sink evolutionary dynamics have considered a broad range of general population and habitat scenarios, including 'closed' sinks (Fig. 1a), in which a population that is isolated in a sink habitat with no access to or from other habitats must adapt to avoid extinction11,12,13. Closed-sink models also apply to an isolated population that occupies a habitat that is initially a source habitat but becomes a sink owing to a gradual or abrupt change for the worse in the habitat conditions. Another scenario, called the 'black-hole sink' model (Fig. 1b), involves the recurrent, one-way influx of migrants from the source habitat to the sink habitat14,15,16,17,18. Other models include the reciprocal sink, where there is a constant exchange of migrants between source and sink habitats (Fig. 1c); reciprocal migration can occur either between distinct sources and sinks19,20,21,22 or along continuous environments that gradually change between source and sink habitats9,23.

Figure 1: Different types of source–sink migration dynamics.

a | A closed-sink model, in which a population that is isolated in a sink habitat, with no access to or from other habitats, must adapt to avoid extinction11. b | A black-hole sink model, in which there is a recurrent, one-way influx of migrants from the source habitat to the sink habitat14,15,16,17,18. c | A reciprocal sink model, where there is a constant exchange of migrants between the source and sink habitats. Green box, source habitat; blue box, sink habitat; green dots, colonizing organisms; red arrows, inter-habitat migration.

A general conclusion of these theoretical considerations is that the evolution of sink populations is rare when the sink populations are small and short-lived owing to the harshness of the habitat and low immigration rates. This indicates that the sink population size will decrease rapidly, making it unlikely that the genetic changes required for adaptation will be fixed before extinction of the population. However, the theory does indicate that factors such as milder sinks, larger sink populations, intermediate migration rates and, in harsh sinks, environmental fluctuations, should boost the rate of adaptation to the sink habitat by increasing the number of generations that survive in the sink before the population is likely to go extinct. Once some adaptation takes place, the time to extinction is longer, allowing more time for further adaptation5.
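The qualitative argument above can be illustrated with a deliberately simple numerical sketch. It is not one of the published source–sink models cited here; it assumes a deterministic sink population following N[t+1] = growth × N[t] + immigration, with growth < 1, and treats the cumulative number of individual-generations spent in the sink as a rough proxy for the number of opportunities for an adaptive mutation to arise. The function names, parameter values and the per-individual mutation rate `mu` are all hypothetical and chosen only for illustration.

```python
def cumulative_exposure(n0, growth, immigration=0.0, generations=200):
    """Sum population sizes over generations for N[t+1] = growth * N[t] + immigration.

    The sum (individual-generations) is used as a rough proxy for the number of
    opportunities for an adaptive mutation to arise in the sink.
    """
    total, n = 0.0, float(n0)
    for _ in range(generations):
        total += n
        n = growth * n + immigration
    return total


def prob_adaptive_mutation(exposure, mu):
    """Probability that at least one adaptive mutation arises, assuming
    independent events at a per-individual, per-generation rate mu (assumed)."""
    return 1.0 - (1.0 - mu) ** exposure


# Illustrative (hypothetical) parameter values.
harsh = cumulative_exposure(n0=1000, growth=0.5)                 # harsh closed sink
mild = cumulative_exposure(n0=1000, growth=0.9)                  # milder closed sink
fed = cumulative_exposure(n0=1000, growth=0.5, immigration=50)   # harsh black-hole sink

for name, exposure in [("harsh", harsh), ("mild", mild), ("harsh + immigration", fed)]:
    p = prob_adaptive_mutation(exposure, mu=1e-5)
    print(f"{name}: exposure={exposure:.0f}, P(at least one mutation)={p:.3f}")
```

Under these assumptions a milder sink or recurrent immigration increases the cumulative exposure and therefore the chance that an adaptive mutation appears before the population disappears. The sketch only tracks the appearance of a mutation, not its fixation, and it does not capture the swamping effect of very high migration that makes intermediate migration rates most favourable in the theoretical models.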

Despite various theoretical studies, direct empirical evidence for genetic diversification that can be attributed to source–sink evolution is sparse. Examples of sink-driven genetic diversification can be obtained only in a few macroorganisms where the sink populations exist for a long enough time for evolution to take place. For example, the three-spine stickleback fish that transiently colonizes (from an evolutionary perspective) glacial freshwater lakes undergoes rapid evolution for loss of armour and for changes in shape, size and behaviour compared with the population in the ocean (source habitat)8,24,25. Many of these changes are caused by recessive mutations, which can be held in the source population in low frequency until they become advantageous in the sink populations (M. A. Bell, personal communication).

Although the opportunities to study the evolutionary dynamics of source–sink models are limited for macroorganisms, we believe that bacterial species are ideally suited for the study of source–sink dynamics as they have short generation times and are easily disseminated from one habitat to another. Therefore, the key point in understanding whether the population dynamics of a bacterial (or any other) species can be understood by using source–sink models is to use species that can be found naturally in alternative habitats, because source and sink habitats are definably distinct environments26. Alternative habitats are especially clear for bacterial pathogens that invade different hosts and/or different compartments in the same host3. We will focus here on bacterial pathogens of humans as relatively large amounts of epidemiological data are available. Although many evolutionary models have been developed that analyse virulence from the perspective of pathogen fitness in the infected host, inter-host transmissibility or pathogen-induced host mortality, the source–sink model of virulence evolution allows us to analyse multiple habitats of pathogens for which the infected host does not necessarily provide the main habitat.

The source–sink model for pathogens

Each pathogen has a reservoir habitat and a virulence habitat. The reservoir is defined as an environmental site, host organism or population or specific body compartment where the species can sustain itself continuously (by long-lasting colonization or inter-host circulation) and from where it can be transmitted to other habitats27. Because the reservoir is a continuous habitat for the pathogen, it can be defined as its source habitat28. The virulence habitat is a disease-susceptible host or specific compartment within the same host, in which growth of the pathogen causes clinical infection, that is, induces host damage either directly, by producing toxic compounds, or indirectly, by provoking self-damaging host responses29,30. Consequently, to understand whether the virulence habitat can be defined as a source or a sink habitat for a particular pathogen, it is crucial to determine to what extent the pathogen's reservoir habitat overlaps with, or depends on, its virulence habitat.

Regarding the relationship between virulence and reservoir habitats, human pathogens can be divided into three general categories. First, pathogens with reservoirs of non-human origin, where infection is acquired by human contact with vertebrate animals or their products (zoonotic pathogens such as Campylobacter jejuni and Salmonella enterica serovar Enteritidis), arthropods, plants or environmental sites (for example, Legionella pneumophila) and where there is no transmission between humans or transmission is short lived. For these pathogens, their virulence habitat in humans can be defined as a non-self-sustainable sink habitat.

Second, 'professional' pathogens, where continuous circulation among humans depends on the ability of the microorganism to cause disease. Such pathogens can infect humans asymptomatically, but benign colonization usually follows disease (for example, by Shigella spp. or Salmonella enterica serovar Typhi) or does not provide the means for inter-host transmission (Mycobacterium tuberculosis)31. Also, inflammatory host responses during the course of disease development have been shown to be advantageous for professional pathogens such as Shigella spp.32 Here, the virulence habitat is a source habitat that serves directly as the pathogen's reservoir or is required to establish reservoirs of the pathogen in a benign habitat. It should be noted, however, that in professional pathogens the source habitat could be a virulence niche in which a relatively mild infection is caused, and more severe or rare types of infection could represent sink habitats.

Lastly, opportunistic pathogens, where bacteria can circulate continuously in humans asymptomatically and cause clinical infections mainly when the host defences are at least partly compromised, for example by an underlying somatic illness or condition (such as Pseudomonas aeruginosa in cystic fibrosis patients, Serratia marcescens in mechanically ventilated patients, and E. coli and Streptococcus pneumoniae in newborns). Also, an increase in opportunistic infections could be attributed to factors such as exposure to cold temperatures, viral infection or reduced exposure to daylight33. Clinical infections are generally not directly communicable (although the hospital environment presents a risk) and the continuous circulation of these microorganisms in human populations seems not to depend on their ability to cause disease34,35. Therefore, for these pathogens the virulence habitat (the tissues in which they cause disease) could be viewed as a transient, marginal habitat that, at least in the long term, is non-self-sustainable, in other words, a sink habitat.

The virulence habitat is an unstable, or sink, habitat for most bacterial pathogens in humans. However, the source versus sink distinction is based primarily on epidemiological data and is not usually clear-cut. For example, many species of putative opportunistic pathogens that circulate as non-pathogens in a large proportion of the human population have clonal groups that cause clinical infections relatively often and in seemingly healthy individuals (for example, uropathogenic E. coli). In addition, host heterogeneity in infection susceptibility is a significant factor in the occurrence of infectious disease, but is not readily recognized for many pathogens. Therefore, separation of source and sink habitats can be complicated, and molecular evolution analysis must be used to determine when the population dynamics of bacterial pathogens conform to the source–sink model. We propose that the way to determine which alternative habitat of a bacterial species is a source (stable) habitat and which is a sink (transient) habitat, is to identify a bacterial trait that is under different types of selection in alternative habitats, and compare its evolutionary stability through sequence analysis of the genes that respond to the differential selective pressures on the trait.

Expected source–sink dynamics

To determine the different types of microevolutionary events in bacterial pathogens that could be produced by source–sink dynamics, one can build evolutionary scenarios around a bacterial clone that is adapted to circulate in a self-sustainable manner as an asymptomatic colonizer of humans (that is, the commensal habitat is its reservoir or source), but can occasionally spread into a protected host compartment to cause disease owing to a co-incidental pre-adaptation of its commensal traits to function as virulence factors. We will also assume that this clone cannot be transmitted directly from one virulence habitat to another in a self-sustainable manner, in other words, the virulence habitat is its transient sink habitat.

It is possible that sink-adaptive changes in the clone would not occur during the course of disease (Fig. 2a), either because the clone only replicates for a few generations before host recovery or death, or because of a small population size of the invading bacteria. In this case, no sink-adaptive variants of any trait will emerge and so there will be no source–sink selective footprint.

Figure 2: Different scenarios of sink-adaptive evolution of bacterial pathogens.

a | A scenario where there is no adaptive evolution during the course of infection; all new infections are caused by repeated invasion of the reservoir-adapted organisms and therefore there is no genetic signature. b | A scenario in which adaptive evolution occurs during the course of infection, but pathoadaptive mutants do not migrate back to the reservoir, or are eliminated very fast on migration, before reaching detectable numbers; virtually all new infections are caused by repeated invasion of the reservoir-adapted organisms. The genetic signature associated with this scenario is that the evolutionarily short-lived pathoadaptive mutations are found exclusively in the infecting organisms. c | A scenario in which pathoadaptive mutants migrate back to the reservoir and circulate there in detectable numbers, but are eventually eliminated from the reservoir and replenished with new migrants; new infections can be caused by re-invasion of the pathoadaptive organisms from the reservoir, but their frequency and migration back to the reservoir is not significant enough to continuously sustain a particular pathoadaptive population there. The genetic signature that is associated with this scenario is that the evolutionarily short-lived pathoadaptive mutations are found commonly but not exclusively in the infecting organisms. d | Shows the same scenario as (c) but the pathoadaptive organisms re-invade the virulence niche and migrate back to the reservoir frequently enough to continuously sustain the pathoadaptive population in the reservoir habitat. The genetic signature associated with this scenario is that the pathoadaptive mutations are recently emerged but show signs of evolutionary stability and are found in both infecting and reservoir organisms, with a predominance in infecting organisms. 
Green box, reservoir source habitat; blue box, virulence sink habitat; green dots, reservoir-adapted bacteria; red dots, patho-(sink-)adapted mutants; red arrows, inter-habitat migration.

It is also possible, however, that sink-adaptive mutations might be emerging, at low frequency, in the source habitat and be selected and increase (sometimes dramatically) in the sink habitat, even when the sink habitat population is small and/or short-lived. If the sink habitats allow for a relatively large or transiently self-sustainable invading population, adaptive changes could emerge during the course of infection. Under natural conditions in both cases, a mixed population of bacterial clones would be established in the sink habitats, some carrying original source-adapted genes and others the sink-adaptive variants. We will assume that the sink-adaptive variants will sweep through the bacterial population in the virulence habitat owing to, for example, improved ability to escape host defences or use available nutrients. On eventual host recovery or death, the fate of the sink-adaptive mutation depends on the type of sink habitat and the overall functional effect of the mutation.

If there were no migration from the virulence habitat to the reservoir habitat, or migration is insignificant (a closed or 'black-hole' sink, for example, cerebrospinal fluid and the bloodstream in meningitis and sepsis, respectively), the mutation would disappear when the virulence habitat collapses (Fig. 2b). In such situations, the sink-adaptive mutations would be found in isolates from the virulence habitat, and seldom or never in isolates from the reservoir habitat. These mutations would all have arisen independently in each individual affected by the disease and, unlike the source-adaptive non-mutated alleles, they will be of recent origin owing to the relatively transient existence of the virulence habitat from an evolutionary perspective.

When there is significant transmission from the virulence habitat back to the reservoir habitat (a reciprocal sink), the sink-adaptive mutations could spread in the source habitat. The ultimate fate of the mutation would depend on its effect on the fitness of the pathogen back in the reservoir habitat and its ability to re-invade the virulence habitat.

A gene mutation that changes the function or the expression level of an encoded protein usually results in a trade-off in the ability of the protein to perform the function for which it was originally adapted36,37,38. Therefore, although one cannot exclude the possibility that sink-adaptive mutations might be neutral or even beneficial in the reservoir habitat, it is most likely that they will be at least mildly deleterious and be selected against, unless revertant or compensatory mutations occur. Alternatively, a sink-adaptive mutation could lead to a greater level and/or more prolonged colonization of the virulence habitat and a higher rate of disease occurrence in the human population (although some of the sink-adaptive mutations could be adaptive at later stages of the infection but maladaptive for early invasion). In these cases, the higher number of organisms in the sink environment will lead to increased backflow of the sink-adaptive variants into the source habitat.

If the advantage of more frequent and/or increased spread from the virulence habitat does not compensate for selection against the mutation in the reservoir habitat, clones carrying the sink-adaptive mutation will be unstable and will eventually disappear from circulation (Fig. 2c). These sink-adaptive and source-deleterious alleles will be continuously emerging and continuously becoming extinct. There will not be enough time to accumulate selectively neutral variants associated with these sink-adaptive mutations before they become extinct and, therefore, sink-adaptive and source-deleterious alleles will have a short-lived footprint compared with the original source-adapted alleles, which circulate continuously.

The deleterious effects of sink-adaptive mutations in the source habitat could be reduced or avoided if the genetic change is easily reversible. Such changes can occur in so-called simple contingency loci that contain repeats of single or few nucleotide sequences39. The hypermutability and reversibility of genetic changes at these loci result from rec-independent expansion or contraction of the number of repeat units, which leads to a switch in the translation reading frame or changes in the level of promoter activity.

However, if the deleterious effect of sink-adaptive mutations in the reservoir habitat is significant and non-reversible, it could still be balanced by increased migration of the mutation-carrying clone from the virulence habitat (Fig. 2d). In this case, the virulence habitat would stop being the 'sink' habitat for the mutant clone, because its circulation in the reservoir habitat will become dependent on its success in the virulence habitat — the clone will become more like a professional pathogen (for example, it is possible that some diarrhoeagenic E. coli belong to this category). Such clones might become stable to the point that neutral changes would start to accumulate in the sink-adaptive gene variants, that is, they will become similar to the source-adaptive variants.

In both the unstable and stable scenarios, sink-adaptive variants will probably predominate in the virulence habitat, but be sub-dominant in the reservoir habitat. However, in the unstable scenario, the adaptive mutations will be of a recent origin, whereas in the stable scenario the genes with the adaptive mutations will show signs of 'ageing'. To distinguish whether a particular pathogen undergoes stable or unstable dynamics, analytical methods of molecular evolution can be used to determine which particular habitats are primary, long-term habitats (sources) or marginal, transient habitats (sinks), and what genes are targeted by virulence-specific adaptive evolution.

Evidence for source–sink dynamics

On spreading from the reservoir habitat (source) into the virulence habitat (sink), adaptation to the sink is likely to occur not by horizontal gene transfer (clinical infections are usually monoclonal in nature) but by modification of existing genes, that is, pathoadaptive loss-of-function or change-of-function mutations.

The loss-of-function mutations (gene knockouts or deletions) are expected to be selected when expression of the source-adapted gene is detrimental in the virulence niche, not simply by the 'use it or lose it' rule. For example, lysine decarboxylase interferes with the function of endotoxin in enterotoxigenic E. coli and Shigella spp. and the corresponding gene regions are deleted in these clones40,41. Also, a knockout of hemB (hemB encodes a protein involved in haemin biosynthesis) leads to the emergence of small-colony variants of Staphylococcus aureus that are better adapted for long-term persistence within host tissues than S. aureus with regular growth rates42,43. Sometimes, gene losses contribute to the emergence of a professional pathogen (for example, Shigella spp.). The trade-off is then less important because the clone becomes less dependent on the original source habitat. For non-professional pathogens, however, where the original reservoir remains the primary habitat, the trade-off imposed by the pathoadaptive gene loss indicates that the mutant clones are likely to become unstable.

A classical example of pathoadaptive loss-of-function mutations is found in the mucA gene of P. aeruginosa. The primary habitats of P. aeruginosa are soil and freshwater, but it is also found as a transient asymptomatic colonizer of healthy humans44,45. P. aeruginosa is also an opportunistic pathogen that causes infections in patients with extensive burns, cystic fibrosis (CF) or indwelling devices such as urinary bladder catheters. Clinical infection is not associated with specific clonal lineages; the pathogens are generally acquired independently from the environment and are non-communicable (although transmission in a nosocomial setting is possible)46. Therefore, the virulence habitat for P. aeruginosa is a distinctly marginal sink habitat that does not contribute significantly to the overall natural circulation of the species.

MucA is a negative regulator of capsule production in P. aeruginosa, and deletion of mucA leads to overproduction of alginate polysaccharide on the bacterial surface47. The resultant 'mucoid' phenotype enables bacteria to resist phagocytosis and is highly adaptive in P. aeruginosa that cause chronic endobronchial and urinary tract infection in individuals with CF and a catheterized urinary bladder, respectively48,49. However, mucoid strains are not found in P. aeruginosa isolates that colonize humans asymptomatically or in environmental isolates, where increased resistance to phagocytosis is not as important and cannot offset the energy cost and other limitations associated with capsule overproduction. Therefore, mucA knockouts occur and are selected for during the course of infections that are initiated by non-mucoid strains, but are eliminated quickly in the source population after shedding.

The distinctive source–sink dynamics of P. aeruginosa and the sink-adaptive, source-deleterious nature of the mucA mutations in P. aeruginosa allow us to examine the specific genetic footprint in this gene produced by source–sink dynamics. The loss-of-function mutations in mucA are primarily frame-shift indels (insertions/deletions) or nonsense mutations and, sometimes, non-synonymous changes48,49, and they can be located anywhere in the gene. A phylogenetic relationship between different MucA variants based on the mucA sequences from mucoid and non-mucoid isolates of P. aeruginosa from different CF patients and other sources48,50 is depicted in Fig. 3a. An important difference between the large nodes formed by the functional and non-functional MucA is the presence and absence of silent variation between genes encoding corresponding MucA variants, respectively. In other words, most functional MucA variants have identical amino acid sequences and form a single node on the tree, but are encoded by multiple alleles that differ from each other only by synonymous changes. This reflects their long-term stability within the species, which, over time, allows selectively neutral changes to accumulate. Same-node non-functional MucA variants are always encoded by one allele type, indicating their recent origin. This would be expected, considering that adaptation of the 'mucoid' variants in the sink habitat (lungs) is short-lived relative to the functional MucA that is found in the source habitat (the environment). Consequently, this type of phylogenetic analysis can be used to discover genes involved in the source–sink evolutionary dynamic.

Figure 3: Phylogenetic analysis of source–sink dynamics.

The phylogenetic relationships of structural variants of the MucA regulator (a), FimH adhesin (b) and adenylate cyclase (c) are shown. The protein tree is derived from the corresponding gene tree (built as a maximum likelihood unrooted phylogram) by collapsing branches with silent changes (for details see Ref. 59). The lengths of the sequences used are 623 bp, 900 bp and 546 bp, respectively. The size of the node corresponds to the number of strains carrying the specific protein variant. Green circles represent nodes composed of a protein variant carried by multiple strains and encoded by multiple alleles, that is, differing in silent changes only (the proportional occurrence of individual alleles of the corresponding variant is represented as pie pieces). Pink circles represent nodes composed of protein variants found in multiple strains but encoded by one allele only. Blue dots represent nodes composed of a protein variant found in a single strain. Most branches connecting the pink or blue nodes with the green nodes did not contain silent changes on the original gene tree. No silent changes were found on branches between pink and blue nodes. Dashed circles show the nodes resulting from functional mutations (that is, a knockout of MucA or an increase of monomannose-binding in FimH).

In summary, the pattern of mucA variability created by the source–sink dynamics of P. aeruginosa is characterized by the existence of a pool of stable haplotypes (with accumulated silent variation) that are adapted to the source habitat and a series of recently derived haplotypes (without silent variation) that are adapted to the sink habitat, with the adaptive mutations being of various types that commonly occur in hot-spot positions. Obviously, the number of source- versus sink-adapted haplotypes on the tree will depend on the relative prevalence of alleles derived from source or sink habitats in the sample.
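The distinction between stable and recently derived variants can be sketched computationally. The example below is a simplified illustration, not the phylogenetic procedure used for Fig. 3 (which collapses silent branches on a maximum likelihood gene tree): it simply groups coding sequences by the protein they encode and counts how many distinct alleles encode each protein variant. The function names, the category labels and the toy sequences are hypothetical, and the standard genetic code is assumed.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence, stopping at the first stop codon ('*')."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' marks an unrecognized codon
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def classify_variants(alleles):
    """Map each protein variant to a category based on its encoding alleles.

    alleles: dict of strain name -> coding DNA sequence.
    """
    by_protein = defaultdict(lambda: defaultdict(int))
    for strain, dna in alleles.items():
        by_protein[translate(dna)][dna.upper()] += 1

    categories = {}
    for protein, allele_counts in by_protein.items():
        n_strains = sum(allele_counts.values())
        if len(allele_counts) > 1:
            # Several alleles differing only by silent changes: long-term stability.
            categories[protein] = "multiple alleles"
        elif n_strains > 1:
            # Several strains, one allele: no accumulated silent variation.
            categories[protein] = "single allele, multiple strains"
        else:
            categories[protein] = "single strain"
    return categories


# Toy data, invented for illustration only.
toy = {
    "s1": "ATGGCTAAA", "s2": "ATGGCCAAA", "s3": "ATGGCTAAG",  # same protein, silent differences
    "s4": "ATGGTTAAA", "s5": "ATGGTTAAA",                     # replacement, one allele
    "s6": "ATGTAAAAA",                                        # premature stop (knockout-like)
}
for protein, category in classify_variants(toy).items():
    print(protein, "->", category)
```

The three categories loosely correspond to the multi-allele, single-allele and single-strain nodes described for Fig. 3. Grouping by identical protein sequence ignores the tree topology, so this sketch cannot show which recently derived variants descend from which stable variant; it only flags the presence or absence of silent variation.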

Using this methodology, we examined the pattern of allelic variability in another human pathogen, uropathogenic E. coli. E. coli primarily circulates in humans as a commensal organism of the large intestine (with faecal-oral transmission) but can cause intestinal and extraintestinal infections in seemingly healthy individuals51. Among the extraintestinal infections, urinary tract infections (UTIs) are most common, with about half of the women in the United States having at least one E. coli-related episode of cystitis in their lifetime52. These UTIs are mostly self-resolved within a few days, but up to 20% of affected women experience recurrent episodes of cystitis. The UTIs are caused by E. coli inhabiting the large intestine of the patient and are considered to be a non-communicable infection, although sexual activity is a predisposing factor for UTIs and sexual transmission has been recorded53. Because of the short duration of most UTIs and the lack of direct transmission, the urinary tract seems to be a marginal sink habitat for E. coli, with the intestinal tract being the source habitat. However, the source versus sink separation is not as clear here as it was for P. aeruginosa. Large numbers of bacteria are voided in the urine during the course of a UTI. Therefore, sink-adapted bacteria can return easily, and in large numbers, to faecal-oral circulation in the intestinal reservoir and are likely to be able to invade the urinary tract again as a pathogen. Also, certain clonal groups that include specific serotypes of E. coli seem to be genetically predisposed to cause UTIs54. Finally, E. coli can colonize the urinary tract asymptomatically for prolonged periods of time (months or even years) and at a relatively high level (>10³ ml⁻¹ in urine), indicating that urinary-tract-adapted strains exist.

Among the traits associated with the ability of E. coli to cause UTIs, the expression of type 1 fimbriae has been shown to have a direct role in urovirulence by mediating bacterial binding to uroepithelial cells55,56. Type 1 fimbriae are thought to be crucial in faecal-oral transmission, by allowing E. coli to bind to, and therefore colonize, the oropharyngeal epithelial surface57. This transient colonization in the throat might increase the probability of successful passage of the clone through the stomach acid to become resident in the large intestine. As type 1 fimbriae function in physiologically distinct habitats, the urinary tract and the oropharynx, one might expect selection to favour differences in the adhesive properties of the fimbriae under these different conditions. In fact, it has been shown that type 1 fimbriae from uropathogenic E. coli tend to bind monomannose-containing glycoproteins with higher affinity than fimbriae from intestinal isolates58. This increased monomannose-binding capability results in an increased tropism of bacteria for uroepithelial cells and increased bladder colonization (as shown in the murine model of UTI), indicating that this property is adaptive for E. coli urovirulence58,59.

Examining the gene that encodes the adhesive subunit of type 1 fimbriae, FimH, a 30 kDa lectin-like protein located on the fimbrial tip, revealed that the increased monomannose binding is caused by the presence of single amino-acid replacements in different regions of the FimH protein. Therefore, these point mutations in fimH are adaptive to the virulence habitat of uropathogenic E. coli, the urinary tract. At the same time, the increased affinity for monomannose residues is accompanied by an increased sensitivity of the mutant fimbriae to inhibition by soluble mannosylated glycoproteins, which are present in abundance in human saliva59,60,61. This inhibition presumably reduces the ability of type 1 fimbriae to mediate adhesion to the oropharyngeal epithelia and therefore decreases the chance of successful transmission of the clone through the stomach acid to the intestinal reservoir. Therefore, the monomannose-affinity-enhancing mutations in FimH are adaptive for urovirulence, but exhibit a functional trade-off with the original function of FimH in the reservoir habitat.

The phylogenetic relationship of fimH adhesin alleles (from a set of isolates split between intestinal and UTI origin) was analysed in the same way as the mucA regulator alleles62 (Fig. 3b). The phylogram structures are strikingly similar for the P. aeruginosa and E. coli proteins, with nodes containing silent variation at the centre of the phylogram and a significant number of recently evolved variants forming singleton nodes or nodes without silent variation located on the external branches. Similar to the non-functional pathoadaptive MucA variants, the E. coli UTI-adaptive FimH variants (those carrying mutations that enhance monomannose binding) are found on the recently derived nodes that are more external. By contrast, similar to the fully functional MucA from environmental P. aeruginosa, FimH variants from which the external nodes are derived exhibit a low monomannose-binding capability (FimH that is adaptive for circulation in the intestinal habitat of E. coli) and form nodes that contain silent variation. This supports the hypothesis that the urinary tract is a sink habitat for E. coli bacteria that cannot support long-term circulation of the uro-adapted clone.
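The distinction between nodes with and without silent variation can be illustrated with a minimal Python sketch. It is not the analysis pipeline used for Fig. 3, only one simple way to express the idea under stated assumptions: each node is taken to be a distinct protein variant, a node is said to contain silent variation when more than one DNA haplotype encodes that variant, and the sequences are in-frame coding sequences. The function names (translate, classify_nodes) and the three toy sequences are hypothetical and are not real mucA or fimH data.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame coding sequence; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_nodes(haplotypes):
    """Group DNA haplotypes by encoded protein and flag nodes with silent variation.

    haplotypes: dict mapping an isolate name to its coding sequence.
    Returns a dict mapping each protein variant to a descriptive label.
    """
    nodes = defaultdict(set)
    for dna in haplotypes.values():
        nodes[translate(dna)].add(dna.upper())
    return {
        protein: ("with silent variation" if len(dnas) > 1
                  else "without silent variation")
        for protein, dnas in nodes.items()
    }


# Toy data (made up for illustration): iso1 and iso2 differ by a synonymous
# change (GCT -> GCC, both alanine); iso3 carries a replacement (GCT -> GTT).
toy = {"iso1": "ATGGCTAAA", "iso2": "ATGGCCAAA", "iso3": "ATGGTTAAA"}
result = classify_nodes(toy)
print(result)
```

In this toy case the variant encoded by two synonymous haplotypes is labelled as containing silent variation, whereas the variant carrying the amino-acid replacement forms a singleton node. A real analysis would additionally need the phylogenetic position of each node (central versus external), which this sketch does not attempt to infer.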

Unlike the P. aeruginosa MucA mutations, sink-adaptive FimH variants are found in intestinal isolates, although in a much smaller proportion than in UTI isolates58,62. Obviously, once the uropathogenic clones return to faecal-oral circulation, the uro-adaptive fimH alleles are not as strongly selected against in the reservoir habitat as the CF-adaptive or urocatheter-adaptive mucA alleles are. This is not surprising, because mutations in fimH do not abolish protein function, but result in an adaptive 'tune-up'. Therefore, clones carrying the mutant fimH return to faecal-oral circulation for at least some time and are likely to invade the urinary tract again, possibly at a higher rate than clones carrying the non-adapted fimH. However, the functional trade-off of fimH mutations is obviously significant enough to make the carrying clone unstable in the long term, especially considering that uropathogenic clones are likely to accumulate multiple sink-adaptive, but source-detrimental, mutations.

Importantly, the phylogenetic pattern demonstrated by mucA and fimH is not shown by haplotypes of adk, the gene encoding the housekeeping protein adenylate kinase (Fig. 3c), even though the sequences are from the same E. coli strains as the fimH alleles. Almost all adk alleles are within stable, multi-haplotype nodes that contain strains isolated from both the urinary tract and the intestinal tract. Therefore, there is no evidence of strong selection for uro-adaptive mutations in the adk gene. Apart from the adhesins, another trait that could be under selection in uropathogenic E. coli is the ability to grow rapidly in urine63. Choosing the correct gene for the analysis is therefore crucially important when one wishes to determine whether or not the population dynamics of a bacterial pathogen, or any other species, conform to the source–sink model. Alternatively, when complete genome sequences for many isolates are available, one can use the molecular-phylogeny-based structure–function analysis described here to discover which genes are under the kind of diversifying selection expected in source and sink habitats.
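The contrast between a housekeeping gene and a gene under sink selection could, in principle, be summarized per gene when many genes are screened. The following Python sketch is one hypothetical way to do this, not a method taken from the analysis above: it assumes that the haplotypes of each gene have already been grouped into protein-variant nodes, and it reports the fraction of haplotypes that fall in multi-haplotype (stable) nodes. The function name, the gene labels and the haplotype counts are all invented for illustration.

```python
def fraction_in_stable_nodes(nodes):
    """Fraction of DNA haplotypes that sit in multi-haplotype nodes.

    nodes: dict mapping a protein variant to the list of distinct DNA
    haplotypes that encode it (an assumed, pre-computed grouping).
    """
    total = sum(len(haps) for haps in nodes.values())
    stable = sum(len(haps) for haps in nodes.values() if len(haps) > 1)
    return stable / total if total else 0.0


# Invented toy data: one gene dominated by stable nodes, one by singletons.
genes = {
    "housekeeping_like": {"P1": ["h1", "h2", "h3"], "P2": ["h4", "h5"], "P3": ["h6"]},
    "adhesin_like": {"P1": ["h1", "h2"], "P2": ["h3"], "P3": ["h4"],
                     "P4": ["h5"], "P5": ["h6"]},
}

# Genes with the lowest fraction come first, as candidates for closer inspection.
ranking = sorted(genes, key=lambda g: fraction_in_stable_nodes(genes[g]))
for gene in ranking:
    print(gene, round(fraction_in_stable_nodes(genes[gene]), 2))
```

A low fraction only flags a gene as a candidate. Whether its singleton variants are actually sink-adaptive would still have to be established by functional data of the kind described for MucA and FimH.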

Conclusion

The source–sink model of population dynamics presented here provides a conceptual framework for understanding both short-term and long-term patterns in evolution. The assumption that virulence habitats are marginal sink habitats for at least some pathogens and that virulence-enhancing genetic adaptation is mostly transient in nature is supported by the molecular evolutionary footprint of genes under selection in sinks. In this paper we have focused on opportunistic pathogens adapting to pathogenic habitats because the source and sink habitats are clearly defined and molecular data on the adaptation of particular genes are available. However, this analytical technique is not limited to analysis of the evolution of virulence. We could use the source–sink approach to determine the genes under source–sink adaptation for professional pathogens adapting to commensal habitats, or we could use it to determine whether a bacterial species spans different habitats or is adapted to a single habitat. The genetic footprint proposed for the source–sink model could help in understanding the adaptive dynamics of other prokaryotic or eukaryotic species known or suspected to exist in alternative habitats. Eventually, this could provide insights into the molecular genetic mechanisms of niche differentiation and the emergence of new species.