Main

Yersinia pestis is primarily a rodent pathogen, usually transmitted subcutaneously to humans by the bite of an infected flea, but also transmitted by air, especially during pandemics of disease. Notably, Y. pestis is very closely related to the gastrointestinal pathogen Yersinia pseudotuberculosis, and it has been proposed that Y. pestis is a clone that evolved from Y. pseudotuberculosis (probably serotype O:1b (ref. 3)) 1,500–20,000 years ago4. Thus Y. pestis seems to have rapidly adapted from being a mammalian enteropathogen widely found in the environment, to a blood-borne pathogen of mammals that is also able to parasitize insects and has limited capability for survival outside these hosts. Horizontally acquired DNA may be significant in having enabled Y. pestis to adapt to new hosts; conversely, the identification of gene remnants produced through genome decay may be associated with a redundant enteric lifestyle. Given the historical importance of plague and the need to understand the evolution and pathogenesis of such a potentially devastating pathogen, we undertook the genome sequencing of Y. pestis CO92 (biovar Orientalis), a strain recently isolated from a fatal human case of primary pneumonic plague contracted from an infected cat5.

The general features of the genome are shown in Fig. 1 and Table 1. The most striking large-scale features in the genome are anomalies in GC bias. All bacterial genomes sequenced to date have a small but detectable bias towards G on the leading strand of the bidirectional replication fork6. Anomalies in this plot can be caused by the very recent acquisition of DNA (such as prophages) or by the inversion or translocation of blocks of DNA. The three anomalies visible in the Y. pestis plot (see Supplementary Information; see also http://www.sanger.ac.uk/Projects/Y_pestis/) are each bounded by insertion sequence elements, suggesting that they could be the result of recent recombination between these perfect repeats. To investigate this, we designed polymerase chain reaction (PCR) primers to test for the presence and absence of the predicted translocation, and for the orientation of the two inversions (see Supplementary Information). PCR confirmed the position of the translocation, but, intriguingly, the results for the two inversions showed that both orientations were present in the same DNA preparation, with the inverse orientation predominating. This suggests genomic rearrangement during growth of the organism. The results were similar in DNA from three different subcultures of CO92 and investigation of other strains indicated that similar rearrangements may have occurred (see Supplementary Information). These results demonstrate that the Y. pestis genome is fluid, and capable of frequent intragenomic recombination in vitro; the rapid emergence of new ribotypes of Y. pestis biovar Orientalis in the environment following pandemic spread7 shows that chromosomal rearrangements are common in vivo. The effects of these rearrangements on the biology and pathogenicity of the organism are unknown.
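The GC bias described above can be illustrated with a short sliding-window calculation. This is a minimal sketch rather than the procedure used for Fig. 1: the function names, the window size and the toy sequence are all assumptions made for illustration, and the windows are treated as linear (no wrap-around at the end of a circular chromosome).

```python
def gc_skew(seq: str, window: int = 10_000, step: int = 10_000) -> list[tuple[int, float]]:
    """Return (window_start, (G - C) / (G + C)) for successive windows."""
    seq = seq.upper()
    results = []
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows with no G or C are given a skew of 0 to avoid division by zero.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results


def sign_changes(skews: list[tuple[int, float]]) -> list[int]:
    """Return the window starts at which the skew changes sign."""
    flips = []
    previous_sign = None
    for start, value in skews:
        sign = (value > 0) - (value < 0)
        if sign == 0:
            continue  # ignore windows with no bias either way
        if previous_sign is not None and sign != previous_sign:
            flips.append(start)
        previous_sign = sign
    return flips


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half.
toy = "GGGGGGGGAC" * 5 + "CCCCCCCCAG" * 5
toy_skews = gc_skew(toy, window=10, step=10)
toy_flips = sign_changes(toy_skews)
```

In a circular chromosome replicated bidirectionally from a single origin, one would generally expect two sign changes, near the origin and the terminus; additional sign changes are the kind of anomaly that an inversion or translocation could produce.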

Figure 1: Circular representation of the Y. pestis genome.

The outer scale is marked in megabases. Circles 1 and 2 (from the outside in), all genes colour coded by function, forward and reverse strand; circles 3 and 4, pseudogenes; circles 5 and 6, insertion sequence elements (blue, IS1661; black, IS285; red, IS1541; green, IS100); circle 7, G + C content (higher values outward); circle 8, GC bias ((G - C)/(G + C); khaki indicates values >1, purple <1). Colour coding for genes: dark blue, pathogenicity or adaptation; black, energy metabolism; red, information transfer; dark green, surface associated; cyan, degradation of large molecules; magenta, degradation of small molecules; yellow, central or intermediary metabolism; pale blue, regulators; orange, conserved hypothetical; brown, pseudogenes; pink, phage and insertion sequence elements; pale green, unknown; grey, miscellaneous.

Table 1: General features of the Yersinia pestis genome

Gene acquisition has been important in the evolution of Y. pestis. In addition to the 70-kb virulence plasmid (pYV/pCD1) found in all pathogenic Yersinia, Y. pestis has acquired two unique plasmids that encode a variety of virulence determinants. A 9.5-kb plasmid (pPst/pPCP1) encodes the plasminogen activator Pla (ref. 8), a putative invasin that is essential for virulence by the subcutaneous route. A 100–110-kb plasmid (pFra/pMT1) encodes murine toxin Ymt and the F1 capsular protein, which have been shown to have a role in the transmission of plague. No conjugation apparatus is encoded by any Yersinia plasmid, but horizontal mobility has apparently occurred: a plasmid closely related to pFra (but lacking ymt and the caf operon) exists in the exclusively human pathogen Salmonella enterica serovar Typhi9.

Many regions within the Y. pestis chromosome showed some of the characteristics of islands acquired through lateral transfer (see Supplementary Information). Among these were several genes that seem to have come from other insect pathogens. Only two features of Y. pestis have so far been shown to be essential for aspects of its life cycle in the flea: the plasmid-encoded murine toxin Ymt is essential for flea colonization10, and the chromosomal hms locus is required for blockage of the flea midgut by Y. pestis to maximize its transmission1. This second locus is also present in Y. pseudotuberculosis11.

Sequences related to the parasitism of insects include genes encoding homologues of the insecticidal toxin complexes (Tcs) from Photorhabdus luminescens, Serratia entomophila and Xenorhabdus nematophilus12. These toxins are complexes of the products of three different gene families: tcaA/tcaB/tcdA, tcaC/tcdB and tccC. In Y. pestis, three adjacent genes encoding homologues of P. luminescens TcaA, TcaB and TcaC were identified (YPO3681, 35% identity; YPO3679, 42% identity; and YPO3678, 49% identity) separated from nearby homologues of TccC (YPO3673, 52% identity; and YPO3674, 54% identity) by phage-related genes. The tcaA gene was intact, but tcaB contained a frameshift mutation and tcaC an internal deletion. The disruption of these genes might be necessary for the lifestyle of Y. pestis, which persists in the flea gut for relatively long periods. Two isolated genes encoding homologues of TccC were also present in the genome (YPO2312, 60% identity; and YPO2380, 74% identity). Strikingly, a protein showing weak, but significant, similarity only to the enhancins encoded uniquely by baculoviral pathogens of insects was identified (YPO0339, 25% identity) in a region of low G + C content, flanked by a transfer RNA gene and transposase fragments, suggesting horizontal acquisition. During baculoviral pathogenesis, the proteolytic activity of enhancin damages the peritrophic membrane, which normally provides a physical barrier against microbial pathogens in the insect midgut13, and the product of YPO0339 may be important in colonization of the flea by Y. pestis. Interestingly, PCR shows (R.W.T., data not shown) that the tca insecticidal toxin genes are also present in Y. pseudotuberculosis IP 32953 (virulent serotype I strain). This suggests an association between Y. pseudotuberculosis and insects, or insect pathogens in the environment, before the emergence of Y. pestis.

The type III secretion system located on the Yersinia virulence plasmid (pYV/pCD1) is present in all three human pathogenic species of Yersinia. This system allows translocation of a range of effector proteins (Yops) that downregulate the responses of host phagocytic cells to infection. Human pathogenic Yersinia possessing mutations that disrupt this system are severely attenuated14. A second chromosomally encoded type III secretion system, similar in gene content and order to the SPI-2 type III system of Salmonella typhimurium15, is also present in Y. pestis (Fig. 2). This secretion system is distinct from the chromosomally borne type III system in Y. enterocolitica16. We were unable to identify the effectors for this system, as these are generally divergent.

Figure 2: Protein secretion systems in Y. pestis.

a, Chaperone–usher systems. The plasmid pMT1-borne caf system and nine chromosomal systems are shown. Blue, chaperones; red, usher proteins; green, putative target proteins; black, regulatory proteins; pink, transposases and integrases; brown, other genes. IS, insertion of insertion sequence element; FS, frameshift. b, Type III secretion systems. The chromosomal and plasmid type III systems are shown, with S. typhimurium SPI-2 as a comparator. c, Flagella operons. The parts of the flagella clusters with similarity to type III systems are shown, with the E. coli flagella cluster as a comparator. The genes are arbitrarily colour coded in b and c to show related genes.

The genome sequence of Y. pestis CO92 also revealed the presence of a range of genes predicted to encode previously unknown surface antigens, which might have a role in the virulence of the bacterium. Two fimbrial-type surface structures were known in Y. pestis: the locus encoding pH6 antigen (psa) is chromosomally borne and is also found in Y. pseudotuberculosis, and the caf locus, encoding the F1 antigen, is found only in Y. pestis on plasmid pFra. We also identified eight systems with a similar organization to the psa and caf operons, each of which provides potential mechanisms for fimbrial or adhesin production (Fig. 2). In five cases the operons are flanked by genes encoding transposases or integrases, again implying horizontal acquisition. Such high redundancy of fimbria-related genes is also found in other bacterial pathogens, including Escherichia coli and Salmonella enterica serovar Typhi17. A large arsenal of independent gene clusters encoding different fimbriae and adhesins might be beneficial in evading host immune response, or may allow multiple interactions with several different hosts during its complex life cycle.

Although there is ample evidence of horizontal gene acquisition, many of these sequences will have been acquired before Y. pestis evolved from Y. pseudotuberculosis. Preliminary comparisons with the unfinished Y. pseudotuberculosis sequence from the Lawrence Livermore National Laboratory (http://bbrp.llnl.gov/bbrp/html/microbe.html) indicate that around 40% of the islands identified may be partially or completely absent from Y. pseudotuberculosis, although this must be treated with caution, given the incomplete nature of the data. Phylogenetic analysis to accurately date the acquisition of these genetic islands will have to await the completion of the genome sequences of Y. pseudotuberculosis and Y. enterocolitica.

Horizontal gene acquisition in Y. pestis has been balanced by gene loss. In total the genome sequence contains 149 pseudogenes, of which 51 are a consequence of disruption by insertion sequence elements. The total number of insertion sequences exceeds that described in most other bacterial genomes, comprising 3.7% of the genome. At least four different insertion sequences (IS) were found on the chromosome: 66 complete or partial copies of IS1541, 44 of IS100, 21 of IS285 and 9 of IS1661. Overall numbers of insertion sequence copies are around tenfold higher than in Y. pseudotuberculosis (IS1541, 7–13 copies18; IS100, 0–6 copies19). Fifty-eight pseudogenes were due to frameshift mutations (21 at homopolymeric tracts), 32 due to deletions, and the remainder due to in-frame stop codons. Mutations in genes associated with pathogenicity were over-represented (see Supplementary Information). Plasmid pCD1, which is shared with the enteropathogens Y. enterocolitica and Y. pseudotuberculosis, contained mutations in virulence-related genes ylpA and yadA, and in five other genes, but of the unique Y. pestis plasmids, pMT1 has only two pseudogenes and pPst has none.
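The pseudogene and insertion sequence counts given above can be tallied explicitly. The sketch below uses only the numbers stated in this paragraph; it assumes that the four causes of inactivation are mutually exclusive, which the text implies by referring to "the remainder" but does not state outright. The variable names are illustrative.

```python
# Counts taken from the paragraph above.
total_pseudogenes = 149
pseudogenes_by_cause = {
    "insertion_sequence": 51,
    "frameshift": 58,
    "deletion": 32,
}

# Assumption: categories do not overlap, so the remainder is a simple difference.
pseudogenes_by_cause["in_frame_stop"] = total_pseudogenes - sum(pseudogenes_by_cause.values())

# Share of frameshift pseudogenes that lie in homopolymeric tracts (21 of 58).
homopolymeric_share = 21 / pseudogenes_by_cause["frameshift"]

# Complete or partial insertion sequence copies on the chromosome.
is_copies = {"IS1541": 66, "IS100": 44, "IS285": 21, "IS1661": 9}
total_is_copies = sum(is_copies.values())

for cause, count in pseudogenes_by_cause.items():
    print(f"{cause}: {count} ({count / total_pseudogenes:.0%})")
print(f"total IS copies: {total_is_copies}")
```

Under that assumption, the remainder attributed to in-frame stop codons is 8 pseudogenes, and the four insertion sequence families together account for 140 complete or partial copies.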

The change in lifestyle of Y. pestis compared with the ancestral Y. pseudotuberculosis strain would be expected to result in the loss of genes required for enteropathogenicity. Enteropathogens specifically adhere to surfaces of the gut and may invade cells lining it. Proteins important for this process in Y. pseudotuberculosis include YadA and Invasin, both of which are represented by pseudogenes in Y. pestis (see Supplementary Information). Several of the other pseudogenes reported here could encode adhesin molecules, suggesting that some of these might also have been required for enteropathogenicity. The pseudogene YPO1562 shows 34% amino-acid identity with E. coli intimin, which is carried on a pathogenicity island termed the locus of enterocyte effacement. Yersinia pseudotuberculosis also contains an intact gene sharing 61% amino-acid identity with a further E. coli virulence factor, cytotoxic necrotizing factor 1 (CNF1); this is also represented by a pseudogene (YPO1449) in Y. pestis. CNF1 acts on Rho GTPases to affect cytoskeletal rearrangement20, which may be required for epithelial invasion.

Thirty-eight (26%) of the pseudogenes would have encoded or synthesized surface-expressed antigens or exported proteins. Several mucosal pathogens have been shown to switch surface-expressed antigens on or off in vitro and in vivo using slipped-strand mispairing of repeat sequences during replication21. A similar process has been demonstrated in Y. pestis. The organism is characteristically urease negative but activity can be restored in vitro by the spontaneous deletion of a single base pair in a homopolymeric tract in ureD (ref. 22). This type of reversible mutation would reduce the metabolic burden of producing proteins unnecessary to Y. pestis in its new flea/mammal life cycle yet still allow the potential to express these should a subsequent need arise.

Some typical virulence properties of enteropathogens are associated with systems that in Y. pestis are subject to multiple mutations which would make reversion unlikely. Motility is required for efficient invasion of host cells by Y. enterocolitica23, apparently by promoting bacterial contact with the host cell; strains of Y. pestis are uniformly nonmotile, and analysis of the flagellar and chemotaxis gene clusters reveals six mutations. However, there are two separate and distinct flagellar gene clusters (Fig. 2), one of which contains no obvious mutations in flagellar biosynthetic or structural genes, suggesting that some form of motility may be possible under as-yet-undiscovered conditions.

Five pseudogenes were lipopolysaccharide (LPS) biosynthesis genes. The O side chain of LPS is an important factor in the virulence of a range of pathogens, including Y. enterocolitica24. The O side chain mediates resistance to complement-mediated and phagocyte killing. It might also have a role in survival in the gut by protecting the bacterium from cationic peptides, such as those produced by Paneth cells in the human small intestine25. Yersinia pestis produces a rough LPS, lacking an O antigen as a consequence of these mutations within the biosynthesis cluster of the O antigen3, but the nature of the selective advantage this may confer to Y. pestis is unexplained. The surface adhesin Ail (YPO2905) is also involved in serum resistance in Y. enterocolitica26. An IS285 insertion was reported in the ail gene of a laboratory-adapted strain of Y. pestis used as a vaccine27, and ail is generally assumed to be inactivated in Y. pestis8. However, in strain CO92 it is intact. Three other intact genes encoding different Ail-like proteins were identified (YPO1850, YPO2190, YPO2506); Y. pestis expresses an unidentified plasmid-independent adhesin8 that could therefore be Ail or a paralogue. Rough mutants of Y. enterocolitica show an enhanced ability to invade cultured cells, a suggested consequence of the enhanced accessibility of surface Ail28. However, a recent mutagenesis study in Y. pseudotuberculosis found that O-antigen mutants were unable to invade epithelial cells29 and were attenuated when infecting mice both orally and systemically, suggesting that as Y. pestis is invasive8 and virulent, it may have alternative surface structures that complement the loss of the O antigen.

Mutations in energy metabolism and central and intermediary metabolism are comparatively rare (see Supplementary Information). A mutation in the methionine biosynthetic pathway, and in pheA, may account for known requirements for growth in vitro of one or more of the amino acids cysteine, methionine and phenylalanine1. In contrast, many of the mutations were in uptake and transport systems, indicating that, compared with life as a gastrointestinal pathogen, fewer or different types of nutrients are available to Y. pestis within the flea or mammal. Three isolated genes possibly involved in iron uptake and a gene essential for aerobactin production (iucA, YPO0989) are inactivated in Y. pestis CO92. However, adding to previously described iron-uptake systems, we identified three apparently functional siderophore biosynthesis operons, five siderophore uptake systems, a non-siderophore iron-uptake system and a second haem-acquisition system.

Several mechanisms account for the accumulation of pseudogenes in Y. pestis, including expansion of insertion sequence elements, deletion, point mutation and slippage in homopolymeric tracts. Together, these have resulted in the loss of genes that were either not required in its new niche, or may have decreased fitness for a new lifestyle as a systemic pathogen of mammals and an insect pathogen. Comparison of the pseudogenes (excluding those in insertion sequences and phages) with the unfinished Y. pseudotuberculosis sequence (see above) shows that, of the >60% for which information is available, >95% of the Y. pseudotuberculosis orthologues do not carry the inactivating mutation. This, combined with the published evidence of intact genes in Y. pseudotuberculosis detailed above, indicates that most of the mutations in Y. pestis are recent. They may have arisen in a gradual process since the colonization of this new niche, or Y. pestis could have acquired these pseudogenes as a consequence of an evolutionary bottleneck that may have accompanied the colonization process.

The genome sequence of Y. pestis reveals a pathogen that has undergone considerable genetic flux, with evidence of selective genome expansion by lateral gene transfer of plasmid and chromosomal genes, and subsequent initial stages of genome size reduction. We believe that these features correlate with a change in pathogenic niche, and therefore this genome sequence provides a unique insight into the genetic events that are associated with the emergence of a new pathogenic species. The newly emerged pathogen is highly virulent for humans, causing pandemics of systemic, and often fatal, disease, contrasting with the ancestral species that evolved to cause non-fatal enteritis in similar hosts.

Methods

A single colony of Y. pestis strain CO92 was picked from Congo Red agar and grown overnight in BAB broth with shaking at 37 °C. Cells were collected and total DNA (10 mg) was isolated using proteinase K treatment followed by phenol extraction. The DNA was fragmented by sonication, and several libraries were generated in pUC18 using size fractions ranging from 1.0 to 2.5 kb. The whole genome sequence was obtained from 94,881 end sequences (giving 9.6× coverage) derived from these libraries using dye terminator chemistry on ABI377 automated sequencers. End sequences from larger insert plasmid (pSP64; 3.1× clone coverage, 9–11 kb insert size) and lambda (lambda-FIX-II; 4.0× clone coverage, 20–22 kb insert size) libraries were used as a scaffold. The sequence was assembled, finished and annotated as described previously30, using the program Artemis (http://www.sanger.ac.uk/Software/Artemis) to collate data and facilitate annotation. The genome sequences of Y. pestis and E. coli were compared pairwise using the Artemis Comparison Tool (ACT) (http://www.sanger.ac.uk/Software/ACT). Pseudogenes had one or more mutations that would ablate expression; each of the inactivating mutations was subsequently checked against the original sequencing data.
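The relationship between the number of end sequences and the reported coverage can be checked with the usual estimate, coverage = (number of reads × mean read length) / genome size. The sketch below is a back-of-envelope illustration only. The genome size is not stated in this section, so the value used here (about 4.65 Mb) is an assumption, and the mean read length it yields is therefore an inferred figure rather than one reported in the text.

```python
def coverage(n_reads: int, mean_read_len: float, genome_bp: int) -> float:
    """Expected average depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len / genome_bp


def implied_read_length(n_reads: int, target_coverage: float, genome_bp: int) -> float:
    """Mean read length needed to reach target_coverage with n_reads reads."""
    return target_coverage * genome_bp / n_reads


N_READS = 94_881              # end sequences, from the Methods text
REPORTED_COVERAGE = 9.6       # from the Methods text
ASSUMED_GENOME_BP = 4_650_000  # assumption: approximate genome size, not given in this section

mean_len = implied_read_length(N_READS, REPORTED_COVERAGE, ASSUMED_GENOME_BP)
check = coverage(N_READS, mean_len, ASSUMED_GENOME_BP)

print(f"implied mean read length: {mean_len:.0f} bp")
print(f"coverage recomputed from that length: {check:.1f}x")
```

Under the assumed genome size, the implied mean read length is a few hundred bases, which is in the range expected for dye terminator reads of that period.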