Main

B. anthracis has become notorious as a bioweapon because of its tough, environmentally resistant endospore and its ability to cause lethal inhalational anthrax. During the course of the disease, endospores are taken up by alveolar macrophages, where they germinate in the phagolysosomal compartment1. Vegetative cells then escape from the macrophage and eventually reach the bloodstream. Expression of the major plasmid-encoded virulence determinants, the tripartite toxin and a poly-D-glutamic acid capsule, is essential for full pathogenicity1. The chromosome of B. anthracis was sequenced to help identify additional genes that might contribute to virulence, either by encoding functions needed for survival in, and escape from, the mammalian macrophage, or by helping the bacterium evade the immune system and increasing the damage it causes to its animal host.

The B. anthracis Ames chromosome sequenced in this work (5,227,293 base pairs, bp) derives from an isolate taken from a dead cow in Texas (Methods). This sequence differs from the 2001 Florida attack Ames isolate at only 11 confirmed single nucleotide polymorphisms5 (SNPs), indicating that the chromosome sequenced to completion is essentially identical to that of a virulent strain. The chromosome encodes 5,508 predicted protein-coding sequences (Table 1) with a pronounced bias for genes on the replication leading strand (Fig. 1), as has been seen in other low-G + C Gram-positive replicons6. A feature shared with the chromosomes of other endospore-forming Gram-positive species of the genera Bacillus and Clostridium7,8 is the clustering of ribosomal RNA, transfer RNA and ribosomal protein genes around the replication origin. This arrangement may maximize protein synthesis during early rounds of DNA replication after germination from the dormant endospore. The chromosome also contains at least four prophages (Supplementary Information) and two group I introns, one of which disrupts the recA gene9. Housekeeping functions such as DNA replication and fatty-acid metabolism are overwhelmingly encoded on the chromosome, whereas the pXO1 and pXO2 plasmids carry a greater proportion of transposons, genes involved in toxicity and genes with no assigned function (Table 1).
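The leading-strand bias can be made concrete with a small calculation. The sketch below assumes the replication origin sits at coordinate 1 and takes the terminus position as an input; the `terminus` value and the toy genes are hypothetical, not Ames coordinates. A gene counts as leading-strand when it points in the direction the replication fork travels through its replichore.

```python
def on_leading_strand(start: int, end: int, strand: str, terminus: int) -> bool:
    """True if a gene is co-oriented with the replication fork.

    Assumes the origin is at coordinate 1, so forks move rightwards
    (increasing coordinates) up to `terminus` and leftwards beyond it.
    Genes that wrap around the origin are not handled.
    """
    midpoint = (start + end) // 2
    in_right_replichore = midpoint <= terminus
    # Leading-strand genes are on '+' in the right replichore and on '-' in the left.
    return (strand == "+") == in_right_replichore


def leading_fraction(genes, terminus: int) -> float:
    """Fraction of (start, end, strand) tuples that lie on the leading strand."""
    hits = sum(on_leading_strand(s, e, st, terminus) for s, e, st in genes)
    return hits / len(genes)


# Hypothetical 1,000-bp chromosome with the terminus at position 500.
toy_genes = [
    (10, 100, "+"),   # right replichore, '+' strand: leading
    (200, 300, "+"),  # right replichore, '+' strand: leading
    (400, 450, "-"),  # right replichore, '-' strand: lagging
    (600, 700, "-"),  # left replichore, '-' strand: leading
]
print(leading_fraction(toy_genes, terminus=500))  # 0.75
```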

Table 1 Features of the B. anthracis Ames genome
Figure 1: Circular representation of the B. anthracis chromosome and comparative genome hybridizations of B. cereus group strains.

Outer circle, predicted coding regions on the plus strand colour-coded by role categories (see Supplementary Fig. 4). Circle 2, predicted coding regions on the minus strand colour-coded by role categories. Circle 3, atypical nucleotide composition curve. Salmon colour, phage regions; yellow, other unique regions located around positions 2.0 and 4.3 Mb (referred to as regions 5 and 6 in the text). Circle 4, genes not represented on the array. Circle 5, genes present on the array. Genes were classified into three groups: genes present in the query strain (shown yellow), genes absent in the query strain (red), and diverged genes (blue). Missing data are in grey. B. cereus group strains are displayed following the phylogeny of Fig. 2 (circle number, strain number): 6, B.c. 874; 7, B.c. 535; 8, B.c. 612; 9, B.w. 1143; 10, B.t. 248; 11, B.t. 442; 12, B.c. 14579; 13, B.t. 775; 14, B.c. 259; 15, B.t. 1031; 16, B.t. 251; 17, B.c. 607; 18, B.c. ATCC 10987; 19, B.c. 812; 20, B.c. 819; 21, B.c. 831; 22, B.t. 840; 23, B.c. 1123; 24, B.c. 816. Here we use B.c., B.t. and B.w. to indicate B. cereus, B. thuringiensis and Bacillus weihenstephanensis, respectively.

Most B. anthracis Ames chromosomal proteins have homologues among the proteins encoded by the draft genome sequence of B. cereus ATCC 10987 (T.D.R., unpublished results), a closely related strain (Figs 1 and 2). There are only 141 B. anthracis proteins for which a putative functional assignment could not be made and that have no match in the protein set of B. cereus ATCC 10987 (BLASTP (ref. 10), E < 10⁻⁵). For the most part, these are encoded by genes of unknown function, are transposases or are present in phage regions. Almost all potential chromosomal virulence-enhancing genes have homologues in B. cereus ATCC 10987, suggesting that they are not specifically associated with the unique pathogenicity of B. anthracis but are part of the common arsenal of the B. cereus group of bacteria11.
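A minimal sketch of the "no match" test, assuming BLASTP results are available in the 12-column tabular format of the current NCBI BLAST+ suite (`-outfmt 6`, E-value in column 11); the query identifiers below are hypothetical placeholders rather than real B. anthracis locus tags.

```python
import csv
import io

E_CUTOFF = 1e-5  # cut-off quoted in the text


def queries_with_hits(blast_tab: str, cutoff: float = E_CUTOFF) -> set:
    """Query IDs with at least one hit below `cutoff`.

    Assumes BLAST+ tabular output (-outfmt 6): column 1 is the query ID,
    column 11 the E-value.
    """
    hits = set()
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if row and float(row[10]) < cutoff:
            hits.add(row[0])
    return hits


def unmatched(all_queries, blast_tab: str) -> list:
    """Queries with no hit below the cut-off (including those with no hit at all)."""
    hit_set = queries_with_hits(blast_tab)
    return sorted(q for q in all_queries if q not in hit_set)


# Toy BLASTP output with hypothetical IDs: query_1 has a strong hit,
# query_2 only a weak one, query_3 none.
toy_tab = (
    "query_1\tsubject_1\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-30\t500\n"
    "query_2\tsubject_2\t30.0\t80\t50\t2\t5\t85\t10\t90\t0.01\t35\n"
)
print(unmatched(["query_1", "query_2", "query_3"], toy_tab))
# ['query_2', 'query_3']
```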

Figure 2: Phylogenetic relationships among 19 B. cereus/B. thuringiensis strains inferred from CGH results for 3,601 chromosomal B. anthracis genes.

The tree was built by applying the neighbour-joining algorithm to a pairwise distance matrix giving, for each pair of strains, the percentage of differences between their presence/absence patterns (diverged genes were not taken into account). Similar trees were obtained using the maximum-parsimony method. The scale bar represents 2% divergence. The arrow indicates the position where B. anthracis would emerge by extrapolation from multilocus enzyme electrophoresis analysis4.
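The distance calculation described in this legend can be sketched as follows. Each strain is encoded as a string of per-gene calls (P present, A absent, D diverged, ? missing), a hypothetical encoding; genes that are diverged or missing in either member of a pair are skipped, which is our reading of "diverged genes not taken into account". The resulting matrix would then be passed to a neighbour-joining program, which is not reproduced here.

```python
from itertools import combinations

COMPARABLE = {"P", "A"}  # present / absent; 'D' diverged and '?' missing are skipped


def percent_difference(calls_a: str, calls_b: str) -> float:
    """Among genes called present or absent in both strains, % whose calls differ."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a in COMPARABLE and b in COMPARABLE]
    if not pairs:
        raise ValueError("no genes comparable between these strains")
    return 100.0 * sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(profiles: dict) -> dict:
    """Symmetric matrix keyed by (strain, strain) pairs."""
    dist = {(name, name): 0.0 for name in profiles}
    for x, y in combinations(sorted(profiles), 2):
        dist[(x, y)] = dist[(y, x)] = percent_difference(profiles[x], profiles[y])
    return dist


# One character per gene, same gene order in every (hypothetical) strain.
profiles = {
    "strain_1": "PPAPD",
    "strain_2": "PAAP?",
    "strain_3": "PPPPP",
}
matrix = distance_matrix(profiles)
print(matrix[("strain_1", "strain_2")], matrix[("strain_2", "strain_3")])  # 25.0 50.0
```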

The chromosome of B. anthracis Ames contains several homologues of genes known to be involved in B. cereus and B. thuringiensis pathogenesis. These include two channel-forming type III haemolysins (BA5701, BA2241) and a complex of three non-haemolytic enterotoxins (BA1887–1889). Several B. anthracis Ames proteins have sequence homology to proteins that contribute to the virulence of the Gram-positive pathogen Listeria monocytogenes12. These include phosphatidyl-inositol-specific and phosphatidyl-choline-preferring phospholipase C (BA0677 and BA3891), internalin-like proteins (BA1346 and BA1406), listeriolysin O (BA3355), sigma factor B (BA0992) and p60 extracellular protease (BA1952 and BA5474). These homologies may reflect similarities between the intracellular survival and multiplication of L. monocytogenes and the germination and survival of B. anthracis within, and its escape from, macrophages.

B. anthracis contains a gene encoding a homologue of the enhancin protein (BA3443), first described in baculoviruses that infect gypsy moths. Enhancin is a metalloprotease that boosts viral infectivity by degrading the mucin layer lining the insect gut13. A homologue of B. anthracis enhancin is also found in the genome of Yersinia pestis, which survives in both mammals and insects14. B. anthracis also contains two homologues of the B. thuringiensis immune inhibitor A metalloprotease (BA0672 and BA1295), which enhances virulence in insects through cleavage of bactericidal lectins11. The presence of these genes may be evidence of an insect-infecting lifestyle in a recent ancestor.

Germination of the anthrax endospore is a key initial event in the B. anthracis infectious cycle. B. anthracis has seven (six chromosomal and one plasmid-borne) paralogues of the gerA family of tricistronic operons, which endospores use to detect the specific small molecules that initiate germination15. Protection of DNA during dormancy and efficient DNA repair during germination are also believed to be important for endospore viability. B. anthracis has several homologues of the Bacillus subtilis small, acid-soluble DNA-protection proteins, and the full complement of DNA repair proteins found in B. subtilis. B. anthracis also appears to have additional DNA repair capabilities focused on UV-induced damage, with a unique deoxyribodipyrimidine photolyase gene (BA3180) and two UV dimer endonucleases rather than one. The photolyase is more closely related to enzymes from proteobacteria than to those from other Gram-positive bacteria. The B. anthracis genome encodes several proteins that mitigate damage by oxygen free radicals, including five catalases and three Fe/Mn superoxide dismutases. Other detoxification functions for which no obvious homologues could be found in B. subtilis include bromoperoxidase, thiol peroxidase, multiple thioredoxin proteins and a cytoplasmic Cu/Zn superoxide dismutase (SodC; BA5139). SodC has been shown to have a key role in the virulence of certain other intracellular bacteria, counteracting nitric oxide-mediated killing in the macrophage16.

The B. anthracis chromosome encodes sporulation machinery broadly similar to that of B. subtilis7. The proteins with the greatest sequence divergence between the two species are endospore coat constituents and endospore polysaccharide biosynthesis components, suggesting an altered composition of the endospore's outer surface. B. subtilis alternative sigma factors, which govern a cascade of events associated with cell development, are also generally conserved. One sigma factor missing in B. anthracis is sigD, which is essential for expression of the flagellum operon17. However, L. monocytogenes, which is motile and carries a flagellum operon similar to that of B. anthracis, also lacks a sigD gene18.

Despite having numerous predicted secreted proteins encoded in its genome (Supplementary Information), B. anthracis is notable for its paucity of extracellular protease activity under standard laboratory conditions19. One reason for this low extracellular activity may be a mutation affecting the regulation of gene expression: a nonsense mutation in the plcR positive regulator gene20. In B. thuringiensis and B. cereus, the plcR gene product is known to upregulate the production of numerous extracellular enzymes by binding to an upstream motif (TATGNAN₄TNCATA). Although the B. anthracis plcR homologue is truncated, there are 56 putative PlcR-binding motifs in the chromosome and 2 on pXO2. The downstream genes encoding extracellular proteins include phospholipases, enterotoxins and haemolysins (Table 2), and the plcR mutation has been shown to account for a dramatic reduction in lecithinase, protease and haemolysin production by B. anthracis19. However, some PlcR-regulated gene products may still contribute to virulence under alternative regulatory control, as low-level expression of some genes in the PlcR regulon has been reported in B. anthracis19. Another PlcR-family protein encoded in the genome (BA0597) might complement this expression under certain conditions.
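The PlcR-box search can be sketched with a regular expression. This is only an illustration: the function name is hypothetical, and a scan of this kind is not necessarily how the 56 chromosomal sites were identified. Because TATGNAN₄TNCATA is its own reverse complement, scanning a single strand also reports sites on the opposite strand.

```python
import re

# PlcR box TATGNAN4TNCATA (N = any base), wrapped in a lookahead so that
# overlapping matches are reported. The motif equals its own reverse
# complement, so one strand suffices.
PLCR_BOX = re.compile(r"(?=TATG[ACGT]A[ACGT]{4}T[ACGT]CATA)")


def find_plcr_boxes(seq: str) -> list:
    """0-based start positions of PlcR-box matches in `seq`."""
    return [m.start() for m in PLCR_BOX.finditer(seq.upper())]


print(find_plcr_boxes("GGGTATGCAAGCTTACATAGGG"))  # [3]
```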

Table 2 Putative PlcR-regulated proteins

The chromosome of B. anthracis contains three homologues of the sortase transpeptidase, which attaches secreted proteins to peptidoglycan on the cell surface of Gram-positive bacteria21, and also contains the csaAB genes for binding proteins with S-layer homology (SLH) domains to polysaccharide. Searches against models for sortase attachment sites and SLH domains identified 34 candidate surface proteins (Supplementary Information). Two putative sortase-attached B. anthracis proteins have internalin-like repeats11. The roles of most proteins with SLH domains on the surface of B. anthracis are unknown at present. However, these surface proteins may mediate as-yet-unknown interactions between B. anthracis and its external environment, and could be targets for vaccine and drug design.
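The surface-protein search used models for sortase attachment sites and SLH domains; the sketch below swaps in a much cruder regular-expression screen for the canonical Gram-positive LPXTG sorting motif near the C terminus, followed by basic residues. The 50-residue window and the two-basic-residue rule are illustrative assumptions, not the criteria used here, and SLH domains are not addressed.

```python
import re

LPXTG = re.compile(r"LP.TG")  # canonical sortase sorting motif, X = any residue


def candidate_sortase_substrate(protein: str, tail_window: int = 50) -> bool:
    """Crude screen (hypothetical criteria): an LPXTG motif within the last
    `tail_window` residues, followed downstream by at least two K/R residues."""
    tail_start = max(0, len(protein) - tail_window)
    for m in LPXTG.finditer(protein, tail_start):
        downstream = protein[m.end():]
        if len(re.findall(r"[KR]", downstream)) >= 2:
            return True
    return False


# Toy sequences: motif near the C terminus vs. motif only at the N terminus.
print(candidate_sortase_substrate("M" + "A" * 100 + "LPETG" + "AAILVVLLAF" + "KRKE"))
print(candidate_sortase_substrate("LPETG" + "A" * 100 + "KK"))
```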

The broad similarity in metabolic and transport genes of B. anthracis and B. subtilis (the model aerobic Gram-positive organism)7 suggests many common capabilities, yet there are a number of idiosyncrasies that may shed light on the ecology of B. anthracis. Compared to B. subtilis, B. anthracis appears to have an expanded capacity for amino-acid and peptide utilization. For instance, there are 17 ABC-type peptide binding proteins in B. anthracis compared with four in B. subtilis, and nine homologues of the BrnQ branched-chain amino-acid transporter in B. anthracis compared with only two in B. subtilis. B. anthracis also has an expanded number of secreted proteases and peptidases relative to B. subtilis, and a number of amino-acid utilization genes not found to date in other Bacillus genomes, such as homogentisate dioxygenase (BA0242), which is involved in tyrosine degradation. Emphasizing the potential importance of peptides and amino acids for B. anthracis metabolism, there are six LysE/Rht amino-acid efflux systems compared with two in B. subtilis. These systems prevent accumulation of amino acids to bacteriostatic concentrations during growth on peptides22. B. anthracis may therefore be adapted for life in a protein-rich environment, such as decaying animal matter.

B. anthracis appears to have a reduced capacity for sugar utilization relative to B. subtilis. It lacks catabolic pathways for mannose, arabinose and rhamnose, and has reduced numbers of phosphotransferase systems and other types of sugar transporters. B. anthracis possesses genes for the cleavage of extracellular chitin and chitosan, and for the utilization of the N-acetylglucosamine constituents of these polymers. This may reflect some form of association with insects, analogous to that of B. thuringiensis, or with polymers derived from plant or fungal material. B. anthracis contains a complete operon for polyester biosynthesis, which may function as an alternative energy storage compound for the organism. B. anthracis also has a multisubunit NADH hydrogenase not described before in Gram-positive bacteria.

B. anthracis possesses an expanded array of iron-acquisition genes compared to B. subtilis that may be important for iron scavenging in a mammalian host. These include 15 ABC uptake systems for iron siderophores or chelates, as well as two clusters of genes for the biosynthesis of siderophores. Two genes involved in the synthesis of an aerobactin-like siderophore (BA1981, BA1982) are not found in B. subtilis or in the B. cereus ATCC 10987 sequence. Like B. subtilis and other soil bacteria, B. anthracis encodes a broad range of predicted drug efflux pumps, and a variety of other antibiotic-resistance genes are also present. However, it is unknown whether these contribute to resistance in a clinical setting23.

We designed a B. anthracis DNA microarray on the basis of the genes identifiable at the end of the random (shotgun) sequencing phase. The microarray was used to compare B. anthracis with 19 members of the B. cereus group by comparative genome hybridization (CGH) (Fig. 1). Strains examined by CGH had 66–92% of their chromosomal genes in common with B. anthracis. Genes unique to B. anthracis in particular, and to the B. cereus group in general, appear to be over-represented in the 2.0-Mb chromosomal region (coordinates 1,500,000 to 3,500,000 in Fig. 1) surrounding the presumed terminus of replication. Genome plasticity around the replication terminus has been seen in other comparisons of bacterial genomes24. Six smaller regions of the B. anthracis genome appeared to be absent from nearly all other B. cereus group strains tested (Fig. 1). Regions one to four correspond to the B. anthracis prophages (Supplementary Information), and region five is centred on an IS110 family insertion element. Only region six bears no obvious relationship to mobile elements. The genomic variability revealed by CGH of B. cereus group strains against the B. anthracis Ames microarray (Fig. 1) is 25–100 times greater than that found in similar experiments comparing B. anthracis Ames with other B. anthracis strains (T. Blank and S.N.P., unpublished results). This reflects the very limited molecular diversity of the B. anthracis species25.
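The two summaries in this paragraph, the share of chromosomal genes a strain has in common with B. anthracis and the over-representation of B. anthracis-specific genes in a chromosomal window, can be sketched from per-gene CGH calls. Whether diverged genes count as shared is an assumption exposed as a parameter; the gene positions and calls below are toy values.

```python
def shared_fraction(calls, count_diverged: bool = True) -> float:
    """Fraction of called genes shared with the reference; '?' (missing) is ignored."""
    called = [c for c in calls if c != "?"]
    shared = {"P", "D"} if count_diverged else {"P"}
    return sum(c in shared for c in called) / len(called)


def absent_enrichment(positions, calls, start: int, end: int) -> float:
    """Absent-call rate inside [start, end] divided by the genome-wide rate."""
    pairs = [(p, c) for p, c in zip(positions, calls) if c != "?"]
    inside = [c for p, c in pairs if start <= p <= end]
    overall = sum(c == "A" for _, c in pairs) / len(pairs)
    if not inside or overall == 0:
        raise ValueError("need called genes in the window and at least one absent call")
    local = sum(c == "A" for c in inside) / len(inside)
    return local / overall


# Toy chromosome positions and calls for one hypothetical query strain.
positions = [100_000, 1_600_000, 2_000_000, 3_000_000, 4_500_000, 5_000_000]
calls = ["P", "A", "A", "D", "P", "P"]
print(round(shared_fraction(calls), 3))                          # 0.667
print(absent_enrichment(positions, calls, 1_500_000, 3_500_000))  # 2.0
```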

Hybridization experiments indicate the presence of pXO1 homologues in about half of the 19 strains examined (Supplementary Information), consistent with other studies26. Few genes from the pXO1 pathogenicity island, pXO1-96 to pXO1-127 (ref. 2), appeared to be present in the 19 B. cereus group strains. The toxin genes, central to anthrax aetiology, are found only in B. anthracis and in none of the 19 B. cereus group strains sampled. In sharp contrast to pXO1, few pXO2 genes hybridized with genomic DNA from the 19 B. cereus group strains.

Ratios obtained by CGH for chromosomal genes were used to infer the phylogenetic relationships among the B. cereus group strains, producing three clusters (I, IIa and IIb; Fig. 2). The groupings were compatible with the phylogeny reconstructed using multilocus enzyme electrophoresis4. Extrapolating the results of that study to the microarray-based phylogeny, B. anthracis would emerge within cluster IIa (arrow in Fig. 2). The presence of genes with pXO1 sequence identity in branches spanning all three clusters of the B. cereus group tree (Fig. 2), and the distribution of pXO1-like genes independent of chromosomal relatedness among the strains (Supplementary Information), provide further evidence for the mobility of pXO1 genes within the B. cereus group. Plasmid transfer within the B. cereus group is well established27, and pXO1 carries numerous mobility genes2. Despite the evidence for genomic variability in the B. cereus group, the B. anthracis chromosome and virulence plasmids show little localized variation in G + C content or dinucleotide composition, the kind of compositional anomaly generally associated with genes acquired horizontally from distantly related donors, suggesting that most genes are native to the B. cereus group.
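Local compositional anomalies of this kind are often screened for with a sliding-window G + C scan. The sketch below flags windows whose G + C fraction lies more than `z` standard deviations from the mean across windows; the window size, step and threshold are illustrative assumptions, and it does not reproduce the dinucleotide analysis or the atypical-composition curve of Fig. 1.

```python
import statistics


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq`."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_outlier_windows(seq: str, window: int = 1000, step: int = 500, z: float = 2.0):
    """(start, G+C) for windows more than `z` SDs from the mean window G+C."""
    starts = range(0, len(seq) - window + 1, step)
    values = [gc_fraction(seq[s:s + window]) for s in starts]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [(s, v) for s, v in zip(starts, values) if abs(v - mean) > z * sd]


# Toy sequence: AT-rich background with a 1-kb G+C-rich insert at 10,000-10,999.
toy = "AT" * 5000 + "GC" * 500 + "AT" * 5000
print([start for start, _ in gc_outlier_windows(toy)])  # [9500, 10000, 10500]
```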

The B. anthracis chromosome sequence portrays a soil-dwelling organism that possesses numerous potential virulence genes and may prefer protein-rich environments. This is consistent with the evolution of B. anthracis from a B. cereus ancestor through acquisition of key plasmid-encoded toxin, capsule and regulatory loci. The CGH data presented here show greater variability in plasmid gene content than in chromosomal gene content across the group. Other major differences between B. anthracis and B. cereus may have arisen through altered gene expression rather than gene loss or gain. Although both species contain genes associated with secreted proteases, haemolysins, extracellular chitinases11, motility, tyrosine degradation and penicillin resistance23, B. anthracis and B. cereus differ phenotypically in these functions. These changes in expression may reflect recent adaptations following acquisition of the pXO1 pathogenicity island that contains the lethal toxin loci. The atxA regulatory gene in this region controls toxin gene expression but is incompatible with the chromosomal regulator plcR found in B. cereus19. The worldwide, near-clonal spread of the organism25 suggests that expression of the toxin and capsule genes confers an advantage on B. anthracis that outweighs the changes in chromosomal gene expression. Findings from this genome sequence analysis raise further questions about the biology of B. anthracis; for instance, what are the roles of putative ‘virulence’ genes in close relatives of B. anthracis that do not cause anthrax, and do these genes actually contribute to virulence in B. anthracis?

Methods

Genome sequencing and analysis of B. anthracis Ames (pXO1⁻ pXO2⁻)

B. anthracis Ames was cured of plasmid pXO1 by incubation at 43 °C and subsequently cured of pXO2 by novobiocin treatment (Supplementary Methods). As previously described, the chromosome was sequenced using two DNA preparations. For the first (Porton1)5, 2–3 kilobase (kb) and 4–7 kb random insert libraries in plasmid-derived vectors were constructed and end-sequenced following the standard strategy for TIGR microbial shotgun projects6, with success rates of 74% and 64% and average high-quality read lengths of 559 nucleotides (nt) and 586 nt, respectively. For the second (Porton2)5, libraries of 2–3 kb and 6–8 kb were constructed, with success rates of 89% and 85% and average high-quality read lengths of 609 nt and 645 nt. The completed chromosome sequence comprised 73,806 and 6,052 reads from the Porton1 small and large insert libraries, and 3,532 and 32,430 reads from the Porton2 small and large insert libraries, giving an average of 13-fold sequence coverage per base. After assembly, gaps between contigs were closed by editing, walking library clones, and linking assemblies by polymerase chain reaction (PCR). The Glimmer gene finder28 was modified by enhancing its model of noncoding sequences, which improved its ability to exclude short open reading frames (ORFs) and substantially reduced the number of predicted small hypothetical proteins. Annotation was as described for a previous project6. BLASTP (ref. 10) was used to compare the protein sets of B. anthracis, B. cereus ATCC 10987 (T.D.R., unpublished results; http://www.tigr.org/tdb/ufmg) and other complete bacterial genomes (http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl and http://genolist.pasteur.fr/SubtiList/index.html). An E-value below 10⁻⁵ was used as the standard cut-off to define a likely match.
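As a consistency check, the stated coverage can be approximately recomputed from the read counts and the average high-quality read lengths given above. This assumes that reads retained in the finished assembly have the same mean length as their source library, which the text does not state.

```python
GENOME_LENGTH = 5_227_293  # bp, B. anthracis Ames chromosome

# (reads in the completed sequence, average high-quality read length in nt);
# assumes assembled reads share their library's average length.
LIBRARIES = {
    "Porton1 small insert (2-3 kb)": (73_806, 559),
    "Porton1 large insert (4-7 kb)": (6_052, 586),
    "Porton2 small insert (2-3 kb)": (3_532, 609),
    "Porton2 large insert (6-8 kb)": (32_430, 645),
}

total_reads = sum(n for n, _ in LIBRARIES.values())
total_bases = sum(n * length for n, length in LIBRARIES.values())
coverage = total_bases / GENOME_LENGTH
print(f"{total_reads:,} reads, {total_bases:,} nt, {coverage:.1f}-fold")
# 115,820 reads, 67,872,364 nt, 13.0-fold
```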

DNA microarray preparation and analysis

Amplicons representing 79 of 217 pXO1 genes, 41 of 122 pXO2 genes and 3,601 of 5,753 chromosomal genes predicted by Glimmer28 (see Supplementary Methods) were arrayed onto glass microscope slides (Telechem Inc.). Redundant genes were generally represented once or a few times on the array. Genomic DNA was labelled with Cy3 and Cy5 according to J. DeRisi (http://www.microarrays.org/pdfs/GenomicDNALabel_B.pdf), except that it was not digested or sheared before labelling. Arrays were scanned with a GenePix 4000B scanner (Axon Inc.), and hybridization signals were quantified using TIGR SPOTFINDER (software available at http://www.tigr.org/softlab). Hybridizations were competitive, using probes derived from B. anthracis Ames (reference) and a B. cereus group strain (query). Normalized signal intensities were used to generate relative hybridization ratios (query/reference), and data representing weak signals were removed. The ratios from a maximum of six data points (duplicate spots, hybridizations performed in triplicate) were placed in three bins: <0.1, gene absent in the query strain; 0.1–0.3, gene present but diverged in the query strain; and >0.3, gene present in the query strain. A majority rule was applied for binning: more than 50% of the ratios had to agree on the assignment, and at least two data points were required (a minimum exceeded in 99% of cases). Genes with fewer than two data points were treated as missing data.
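The binning and majority rule described above can be sketched as follows. Two details are assumptions: ratios of exactly 0.1 or 0.3 are assigned to the diverged bin, and a gene with no majority bin is labelled "unresolved", since the text does not say how such cases were handled.

```python
from collections import Counter


def bin_ratio(ratio: float) -> str:
    """Bin one query/reference ratio (exact 0.1 and 0.3 treated as 'diverged')."""
    if ratio < 0.1:
        return "absent"
    if ratio <= 0.3:
        return "diverged"
    return "present"


def call_gene(ratios) -> str:
    """Majority-rule call from up to six ratios (duplicate spots x triplicate)."""
    if len(ratios) < 2:
        return "missing"
    label, count = Counter(bin_ratio(r) for r in ratios).most_common(1)[0]
    # Require strictly more than half of the data points to agree.
    return label if count > len(ratios) / 2 else "unresolved"


print(call_gene([0.05, 0.08, 0.50, 0.02]))  # absent
print(call_gene([0.90]))                    # missing
print(call_gene([0.05, 0.50]))              # unresolved
```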

The criteria for the numerical ranges of our bins were established in two ways. First, we used BLASTN (ref. 10) to determine the presence or absence of sequences homologous to the 3,601 arrayed B. anthracis chromosomal genes in the sequence of B. cereus ATCC 14579 (Integrated Genomics Inc.; http://www.integratedgenomics.com/), and compared these results with the assignments inferred from hybridization ratios. A threshold of 0.1 was suitable for classifying a gene as absent (sequence and CGH data agreed in 99% of cases), whereas a cut-off of 0.3 was conservative for gene presence (agreement in 92% of cases). Second, we used a set of 65 genes conserved in 26 bacterial genomes in the NCBI COG database (http://www.ncbi.nlm.nih.gov/COG/). Using our selected cut-offs, these genes were correctly called present in query strains in 1,225 of 1,235 total calls. CGH tended to underpredict plasmid homologues relative to sequence analysis. Two possible explanations are variability in plasmid copy number in B. cereus strains relative to B. anthracis1 and/or greater average divergence of plasmid genes than of chromosomal genes.
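The agreement figures in this paragraph are simple proportions. In the sketch below, `agreement_rate` is a hypothetical helper comparing sequence-based and CGH-based calls gene by gene; the control total of 1,235 calls is consistent with 65 conserved genes scored in each of the 19 query strains, which is our inference rather than a statement in the text.

```python
def agreement_rate(sequence_calls, cgh_calls) -> float:
    """Fraction of genes where sequence-based and CGH-based calls agree."""
    if len(sequence_calls) != len(cgh_calls):
        raise ValueError("call lists must be the same length")
    matches = sum(s == c for s, c in zip(sequence_calls, cgh_calls))
    return matches / len(sequence_calls)


# Conserved-gene control: 1,235 calls, consistent with 65 genes x 19 strains.
control_calls = 65 * 19
control_rate = 1_225 / control_calls
print(control_calls, f"{control_rate:.1%}")  # 1235 99.2%
```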

Other techniques and analysis

PCR amplification for microarray spotting, pulsed field gel electrophoresis and Southern blotting are described in Supplementary Information, as is the phylogenetic analysis of chromosomal data.