Main

Citrus variegated chlorosis (CVC), which was first recorded in Brazil in 1987, affects all commercial sweet orange varieties1. Symptoms include conspicuous variegations on older leaves, with chlorotic areas on the upper side and corresponding light brown lesions, with gum-like material on the lower side. Affected fruits are small, hardened and of no commercial value. A strain of Xylella fastidiosa was first identified as the causal bacterium in 1993 (ref. 2) and found to be transmitted by sharpshooter leafhoppers in 1996 (ref. 3). CVC control is at present limited to removing infected shoots by pruning, the application of insecticides and the use of healthy plants for new orchards. In addition to CVC, other strains of X. fastidiosa cause a range of economically important plant diseases including Pierce's disease of grapevine, alfalfa dwarf, phony peach disease, periwinkle wilt and leaf scorch of plum, and are also associated with diseases in mulberry, pear, almond, elm, sycamore, oak, maple, pecan and coffee4. The triply cloned X. fastidiosa 9a5c, sequenced here, was derived from the pathogenic culture 8.1b obtained in 1992 in Bordeaux (France) from CVC-affected Valencia sweet orange twigs collected in Macaubal (São Paulo, Brazil) on May 21, 1992 (ref. 2). Strain 9a5c produces typical CVC symptoms on inoculation into experimental citrus plants5, and into Nicotiana tabacum (S. A. Lopes, personal communication) and Catharanthus roseus (P. Brant-Monteiro, personal communication), both of which are novel experimental hosts.

General features of the genome

The basic features of the genome are listed in Table 1, and a detailed map is shown in Fig. 1. The conserved origin of replication of the large chromosome has been identified in a region between the putative 50S ribosomal protein L34 and gyrB genes containing dnaA, dnaN and recF6. The Escherichia coli DnaA box consensus sequence TTATCCACA is found on both DNA strands close to dnaA. In addition, there are typical 13-nucleotide (ACCACCACCACCA) and 9-nucleotide (two TTTCATTGG and two TTTTATATT) sequences in other intergenic sequences of this region. This region is coincident with the calculated GC-skew signal inversion7. We have designated base 1 of the X. fastidiosa genome as the first T of the only TTTTAT sequence found between the ribosomal protein L34 gene and dnaA.
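The two signals used above to locate the origin (DnaA boxes on both strands and the inversion of GC skew) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the window size and the choice of a windowed cumulative (G - C)/(G + C) skew are assumptions made for illustration, and the actual skew calculation follows ref. 7.

```python
# Illustrative sketch only; names and parameters are hypothetical.
DNAA_BOX = "TTATCCACA"  # E. coli DnaA box consensus quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif_both_strands(seq: str, motif: str = DNAA_BOX):
    """Return sorted (position, strand) pairs for exact motif matches.

    Positions are 0-based coordinates on the forward strand; a '-' hit is
    where the reverse complement of the motif occurs in the forward strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def cumulative_gc_skew(seq: str, window: int = 1000):
    """Cumulative sum of (G - C) / (G + C) over consecutive windows.

    For many bacterial chromosomes the extrema of this curve lie near the
    replication origin and terminus (assumed definition, see lead-in).
    """
    seq = seq.upper()
    total = 0.0
    curve = []
    for i in range(0, len(seq), window):
        chunk = seq[i:i + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        curve.append(total)
    return curve
```

On a real chromosome, one would look for DnaA-box hits clustered near dnaA and check that they fall close to an extremum of the cumulative skew curve.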

Table 1 General features of the Xylella fastidiosa 9a5c genome

The overall percentage of open reading frames (ORFs) for which a putative biological function could be assigned (47%) was slightly below that for other sequenced genomes such as Thermotoga maritima8 (54%), Deinococcus radiodurans9 (52.5%) and Neisseria meningitidis 10 (53.7%). This may reflect the lack of previous complete genome sequences from phytopathogenic bacteria. Plasmid pXF1.3 contains only two ORFs, one of which encodes a replication-associated protein. Plasmid pXF51 contains 64 ORFs, of which 5 encode proteins involved in replication or plasmid stability and 20 encode proteins potentially involved in conjugative transfer. One ORF encodes a protein similar to the virulence-associated protein D (VapD), found in many other bacterial pathogens11. Four regions of pXF51 present significant DNA similarity to parts of transposons found in plasmids from other bacteria, suggesting interspecific horizontal exchange of genetic material.

The principal paralogous families are summarized in Table 2. The complete list of ORFs with assigned function is shown in Table 3. Seventy-five proteins present in the 21 completely sequenced genomes in the COG database12 (as of 15th March 2000) were also found in X. fastidiosa. Each of these sequences was used to generate a phylogenetic tree of the 22 organisms. In 69% of such trees, X. fastidiosa was grouped with Haemophilus influenzae and E. coli, consistent with a phylogenetic analysis undertaken with the 16S rRNA gene13.

Table 2 Largest families of paralogous genes

One ORF, a cytosine methyltransferase (XF1774), is interrupted by a Group II intron. The intron was identified on the basis of the presence of a reverse transcriptase-like gene (as in other Group II introns), conserved splice sites, conserved sequence in structure V and conserved elements of secondary structure14. Group II introns are rare in prokaryotes, but have been found in different evolutionary lineages including E. coli, cyanobacteria and proteobacteria15.

Transcription, translation and repair

The basic transcriptional and translational machinery of X. fastidiosa is similar to that of E. coli16. Recombinational repair, nucleotide and base-excision repair, and transcription-coupled repair are present with some noteworthy features. For example, no photolyase was found, indicating exclusively dark repair. Although the main genes of the SOS pathway, recA and lexA, are present, ORFs corresponding to the three DNA polymerases induced by SOS in E. coli (DNA polymerases II, IV and V)17 are missing, indicating that the mutational pathway itself may be distinct.

Energy metabolism

Even though X. fastidiosa is, as its name suggests, a fastidious organism, energy production is apparently efficient. In addition to all the genes for the glycolytic pathway, all genes for the tricarboxylic acid cycle and oxidative and electron transport chains are present. ATP synthesis is driven by the resulting chemiosmotic proton gradient and occurs by an F-type ATP synthase. Fructose, mannose and glycerol can be utilized in addition to glucose in the glycolytic pathway. There is a complete pathway for hydrolysis of cellulose to glucose, consisting of 1,4-β-cellobiosidase, endo-1,4-β-glucanase and β-glucosidase, suggesting that cellulose breakdown may supplement the often low concentrations of monosaccharides in the xylem18. Two lipases are encoded in the genome, but there is no β-oxidation pathway for the hydrolysis of fatty acids, presumably precluding their utilization as an alternative carbon and energy source. Likewise, although enzymes required for the breakdown of threonine, serine, glycine, alanine, aspartate and glutamate are present, pathways for the catabolism of the other naturally occurring amino acids are incomplete or absent.

The gluconeogenesis pathway appears to be incomplete. Phosphoenolpyruvate carboxykinase and the gluconeogenic enzyme fructose-1,6-bisphosphatase, which are required to bypass the irreversible step in glycolysis, are not present. The absence of the first is compensated by the presence of phosphoenolpyruvate synthase and malate oxidoreductase, which together can generate phosphoenolpyruvate from malate. There appears, however, to be no known compensating pathway for the absence of fructose-1,6-bisphosphatase. It is possible that among the large number of unidentified X. fastidiosa genes there are non-homologous genes that compensate for steps in such critical pathways. Barring this possibility, however, the absence of a functional gluconeogenesis pathway implies a strict dependence on carbohydrates both as a source of energy and anabolic precursors. The glyoxylate cycle is absent and the pentose phosphate pathway is incomplete. In the latter pathway, genes for neither 6-phosphogluconic dehydrogenase nor transaldolase were identified.

Small molecule metabolism

X. fastidiosa exhibits extensive biosynthetic capabilities, presumably an absolute requirement for a xylem-dwelling bacterium. Most of the genes found in E. coli necessary for the synthesis of all amino acids from chorismate, pyruvate, 3-phosphoglycerate, glutamate and oxaloacetic acid16 were identified. However, some genes in X. fastidiosa are bi-functional, such as phosphoribosyl-AMP cyclohydrolase/phosphoribosyl-ATP pyrophosphatase (XF2213), aspartokinase/homoserine dehydrogenase I (XF2225), imidazoleglycerolphosphate dehydratase/histidinol-phosphate phosphatase (XF2217) and a new diaminopimelate decarboxylase/aspartate kinase (XF1116) that would catalyse the first and the last steps of lysine biosynthesis. In addition, the gene for acetylglutamate kinase (XF1001) has an acetyltransferase domain at its carboxy-terminal end that would compensate for the missing acetyltransferase in the arginine biosynthesis pathway. Other missing genes include phosphoserine phosphatase, cystathionine β-lyase, homoserine O-succinyltransferase and 2,4,5-methyltetrahydrofolate-homocysteine methyltransferase. The first two enzymes are also absent in the Bacillus subtilis genome, the third is absent in Haemophilus influenzae and the fourth is missing in both genomes12. We thus presume that alternative, unidentified enzymes complete the biosynthetic pathways in these organisms and in X. fastidiosa.

The pathways for the synthesis of purines, pyrimidines and nucleotides are all complete. X. fastidiosa is also apparently capable of both synthesizing and elongating fatty acids from acetate. Again, however, some E. coli enzymes were not found, such as holo acyl-carrier-protein synthase (also absent in Synechocystis sp., H. influenzae and Mycoplasma genitalium) and enoyl-ACP reductase (NADPH) (FabI) (also absent from M. genitalium, Borrelia burgdorferi and Treponema pallidum )12.

X. fastidiosa appears to be capable of synthesizing an extensive variety of enzyme cofactors and prosthetic groups, including biotin, folic acid, pantothenate and coenzyme A, ubiquinone, glutathione, thioredoxin, glutaredoxin, riboflavin, FMN, FAD, pyrimidine nucleotides, porphyrin, thiamin, pyridoxal 5′-phosphate and lipoate. In a number of the synthetic pathways, one or more of the enzymes present in E. coli are absent, but this is also true for at least one other sequenced Gram-negative bacterial genome in each case12. We therefore again infer that the missing enzymes are either not essential or replaced by unknown proteins with novel structures.

Transport-related proteins

A total of 140 genes encoding transport-related proteins were identified, representing 4.8% of all ORFs. For comparison, E. coli, B. subtilis and M. genitalium have around 10% of genes encoding transport proteins, whereas Helicobacter pylori, Synechocystis sp. and Methanococcus jannaschii have 3.5–5.4% (ref. 19). Transport systems are central components of the host–pathogen relationship (Fig. 2). There are a number of ion transporters and transporters for the uptake of carbohydrates, amino acids, peptides, nitrate/nitrite, sulphate, phosphate and vitamin B12. Many different transport families are represented and include both small and large mechanosensitive conductance ion channels, a monovalent cation:proton antiporter (CAP-2) and a glycerol facilitator belonging to the major intrinsic protein (MIP) family. In addition, 23 ABC transport systems comprising 41 genes can be identified. X. fastidiosa appears to possess a phosphotransferase system (PTS) that typically mediates small carbohydrate uptake. There are both the enzyme I and HPr components of this system, as well as a gene supposedly involved in its regulation (pstK or hprK); however, there is no PTS permease—an essential component of the phosphotransferase complex. The functionality of the system therefore remains in question.

Figure 2: A comprehensive view of the biochemical processes involved in Xylella fastidiosa pathogenicity and survival in the host xylem.

The principal functional categories are shown in bold, and the bacterial genes and gene products related to that function are arranged within the coloured section containing the bold heading. Transporters are indicated as follows: cylinders, channels; ovals, secondary carriers, including the MFS family; paired dumbbells, secondary carriers for drug extrusion; triple dumbbells, ABC transporters; bulb-like icon, F-type ATP synthase; squares, other transporters. Icons with two arrows represent symporters and antiporters (H+ or Na+ porters, unless noted otherwise). 2,5DDOL, 2,5-dichloro-2,5-cyclohexadiene-1,4-diol; EPS, exopolysaccharides; MATE, multi-antimicrobial extrusion family of transporters multidrug efflux gene (XF2686); MFS, major facilitator superfamily of transporters; Pbp, β-lactamase-like penicillin-binding protein (XF1621); RND, resistance-nodulation-cell division superfamily of transporters; ROS, reactive oxygen species.

There are five outer membrane receptors, including siderophores, ferrichrome-iron and haemin receptors, which are all associated with iron transport. The energizing complexes, TonB–ExbB–ExbD and the paralogous TolA–TolR–TolQ, essential for the functioning of the outer membrane receptors, are also present. In all, 67 genes encode proteins involved in iron metabolism. We propose that in X. fastidiosa the uptake of iron and possibly of other transition metal ions such as manganese causes a reduction in essential micronutrients in the plant xylem, contributing to the typical symptoms of leaf variegation.

The X. fastidiosa genome encodes a battery of proteins that mediate drug inactivation and detoxification, alteration of potential drug targets, prevention of drug entry and active extrusion of drugs and toxins. These include ABC transporters and transport processes driven by a proton gradient. Of the latter, eight belong to the hydrophobe/amphiphile efflux-1 (HAE1) family, which act as multidrug resistance factors.

Adhesion

X. fastidiosa is characteristically observed embedded in an extracellular translucent matrix in planta20. Clumps of bacteria form within the xylem vessels leading to their blockage and symptoms of the disease such as water-stress leaf curling. We deduce, from our analysis of the complete genome sequence, that the matrix is composed of extracellular polysaccharides (EPSs) synthesized by enzymes closely related to those of Xanthomonas campestris pv campestris (Xcc) that produce what is commercially known as xanthan gum. In comparison with Xcc, however, we did not find gumI (encoding glycosyltransferase V, which incorporates the terminal mannose), gumL (encoding ketalase which adds pyruvate to the polymer) or gumG (encoding acetyltransferase which adds acetate), suggesting that Xylella gum may be less viscous than its Xanthomonas counterpart.

Positive regulation of the synthesis of extracellular enzymes and EPS in Xanthomonas is effected by proteins encoded by the rpf (regulation of pathogenicity factors) gene cluster21. Mutations in any of these genes in Xanthomonas result in failure to synthesize the EPS. In consequence, the strain becomes non-pathogenic21. X. fastidiosa contains genes that encode RpfA, RpfB, RpfC and RpfF, suggesting that both bacteria may regulate the synthesis of pathogenic EPS factors through similar mechanisms.

Fimbria-like structures are readily apparent upon electron microscopical observation of X. fastidiosa within both its plant and insect hosts22. Because of the high velocity of xylem sap passing through narrow portions of the insect foregut, fimbria-mediated attachment may be essential for insect colonization. Indeed, in the insect mouthparts the bacteria are attached in ordered arrays, indicating specific and polarized adhesion23. In addition, fimbriae are thought to be involved in both plant–bacterium and bacterium–bacterium interactions during colonization of the xylem itself. We identified 26 genes encoding proteins responsible for the biogenesis and function of Type 4 fimbria filaments. This type of fimbria is found at the poles of a wide range of bacterial pathogens where they act to mediate adhesion and translocation along epithelial surfaces24. The genes include pilS and pilR homologues, which encode a two-component system controlling transcription of fimbrial subunits, presumably in response to host cues, and pilG, H, I, J and chpA, which encode a chemotactic system transducing environmental signals to the pilus machinery.

In addition to the EPS and fimbriae, which are likely to have central roles in the clumping of bacteria and in adhesion to the xylem walls, we also identified outer membrane protein homologues for afimbrial adhesins. Although fimbrial adhesins are well characterized as crucial virulence factors in both plant and human pathogens25, afimbrial adhesins, which are directly associated with the bacterial cell surface, have been hitherto associated only with human and animal pathogens, where they promote adherence to epithelial tissue. Of the three putative adhesins of this kind identified in X. fastidiosa, two exhibit significant similarity to each other (XF1981, XF1529) and to the hsf and hia gene products of H. influenzae26. The third (XF1516) is similar to the uspA1 gene product of Moraxella catarrhalis27. All these proteins share the common C-terminal domain of the autotransporter family28. Direct experimentation will be required to establish whether these adhesins promote binding to plant cell structures or components of the insect vector foregut, or both. Nevertheless, their presence in the X. fastidiosa genome adds to the increasing evidence for the generality of mechanisms of bacterial pathogenicity, irrespective of the host organism29.

We also identified three different haemagglutinin-like genes. Again, similar genes have not previously been identified in plant pathogens. These genes (XF2775, XF2196, XF0889) are the largest in the genome and exhibit highest similarity to a Neisseria meningitidis putative secreted protein10.

Intervessel migration

Movement between individual xylem vessels is crucial for effective colonization by X. fastidiosa. For this to occur, degradation of the pit membrane of the xylem vessel is required. Of the known pectolytic enzymes capable of this function, a polygalacturonase precursor and a cellulase were identified, although the former contains an authentic frameshift. These genes exhibited highest similarity to orthologues in Ralstonia solanacearum—which causes wilt disease in tomatoes—where the polygalacturonase genes are required for wild-type virulence.

Toxicity

We identified five haemolysin-like genes: haemolysin III (XF0175), which belongs to an uncharacterized protein family, and four others (XF0668, XF1011, XF2407, XF2759) which belong to the RTX toxin family that contains tandemly repeated glycine-rich nonapeptide motifs at the C-terminal domain. One of these ORFs is closely related to bacteriocin, an RTX toxin also found in the plant bacterium Rhizobium leguminosarum30. RTX or RTX-like proteins are important virulence factors widely distributed among Gram-negative pathogenic bacteria31.

There are two Colicin-V-like precursor proteins. Colicin V is an antibacterial polypeptide toxin produced by E. coli, which acts against closely related sensitive bacteria32. The precursors consist of 102-amino-acid peptides (XF0262, XF0263) that have the typical conserved leader 15-amino-acid motif, and have some similarity with Colicin V from E. coli at the remaining C-terminal portion. The necessary apparatus for Colicin biosynthesis and secretion is also present. Interestingly, in E. coli most of the genes necessary for biogenesis and export of Colicin V are in a gene cluster present in a plasmid, whereas in X. fastidiosa these genes are dispersed in the chromosome.

We found four genes that may function in polyketide biogenesis: polyketide synthase (PKS), pteridine-dependent deoxygenase, daunorubicin C-13 ketoreductase and a NonF-related protein. These genes belong to the synthesis pathways of frenolicin, rapamycin, daunorubicin and nonactin, respectively. These pathways include many more enzymes, which we did not find; however, some of the genes listed lie close to ORFs without significant database matches, suggesting that at least one (as yet undiscovered) polyketide pathway may be functional.

Prophages

Bacteriophages can mediate the evolution and transfer of virulence factors and occasional acquisition of new traits by the bacterial host. Because as much as 7% of the X. fastidiosa genome sequenced corresponds to double-stranded (ds) DNA phage sequences, mostly from the Lambda group, we suspect that this route may have been of particular importance for this bacterium. It is noteworthy that a very high percentage of phage-related sequences has also been detected in a second vascular-restricted plant pathogen, Spiroplasma citri33. We identified four regions, with a high density of ORFs homologous to phage sequences, that we considered to be prophages, in addition to isolated phage sequences dispersed throughout the genome. Two of these prophages (each 42 kbp, designated XfP1 and XfP2) are similar to each other, lie in opposite orientations in distinct regions and appear to belong to the dsDNA, tailed-phage group. Both appear to contain most of the genes responsible for particle assembly, although we know of no reports of phage particle release from X. fastidiosa cultures. In prophage XfP1, we found two ORFs between tail genes V and W that are similar to ORF118 and vapA from the virulence-associated region of the animal pathogen Dichelobacter nodosus, which by homology encode a killer and a suppressor protein34. Interestingly, in prophage XfP2, we found two other ORFs also between tail genes V and W that are similar to hypothetical ORFs of Ralstonia eutropha transposon Tn4371 (ref. 35). The other two identified prophages, XfP3 and XfP4, are also similar in sequence to each other and to the H. influenzae cryptic prophage φflu (ref. 36). They both contain a 14,317-bp exact repeat. Few particle-assembly genes were found in these regions, suggesting that these prophages are defective. An ORF similar to hicB from H. influenzae, a component of the major pilus gene cluster in some isolates, was found in XfP4 (ref. 37).

The presence of virulence-associated genes from other organisms within the prophage sequences is strong evidence for a direct role for bacteriophage-mediated horizontal gene transfer in the definition of the bacterial phenotype.

Absence of avirulence genes

Phytopathogenic bacteria generally have a limited host range, often confined to members of a single species or genus. This specificity is defined by the products of the so-called avirulence (avr) genes present in the pathogen, which are injected directly into host cells, on infection, through a type III secretory system38,39,40. BLAST41 searches with all known avr and type III secretory system sequences failed to identify genes encoding proteins with significant similarities in the genome of X. fastidiosa. Although the variability of avr genes amongst bacteria might account for this apparent lack, the high level of similarity of some components of the type III secretory system argues against this. We suspect that these genes are, in fact, not required because of the insect-mediated transmission and vascular restriction of the bacterium that obviates the necessity of host cell infection. Furthermore, if the differing host ranges of X. fastidiosa are molecularly defined, this may be by a quite different mechanism not involving avr proteins.

Conclusions

Before the elucidation of its complete genome sequence, very little was known of the molecular mechanisms of X. fastidiosa pathogenicity. Indeed, this bacterium was probably the least characterized of all organisms that have been fully sequenced. Our complete genetic analysis has determined not only the basic metabolic and replicative characteristics of the bacterium, but also a number of potential pathogenicity mechanisms. Some of these have not previously been postulated to occur in phytopathogens, providing new insights into the generality of these processes. Indeed, the availability of this first complete plant pathogen genome sequence will now allow the initiation of the detailed comparison of animal and plant pathogens at the whole-genome level. In addition, the information contained in the sequence should provide the basis for an accelerated and rational experimental dissection of the interactions between X. fastidiosa and its hosts that might lead to fresh insights into potential approaches to the control of CVC.

Methods

The sequencing and analysis in this project were carried out by a network of 34 biology laboratories and one bioinformatics centre. The network is called the Organization for Nucleotide Sequencing and Analysis (ONSA)42, and is entirely located in the state of São Paulo, Brazil.

Figure 1: Linear representation of the main chromosome and plasmids pXF51 and pXF1.3 of the Xylella fastidiosa genome.

Genes are coloured according to their biological role. Arrows indicate the direction of transcription. Genes with frameshift and point mutations are indicated with an X. Ribosomal RNA genes, the tmRNA, the principal repeats, prophages and the group II intron are indicated by coloured lines. Transfer RNAs are identified by a single letter identifying the amino acid. Pie chart represents the distribution of the number of genes according to biological role. The numbers below protein-producing genes correspond to gene IDs.

Sequencing and assembly

The sequence was generated using a combination of ordered cosmid and shotgun strategies43. A cosmid library was constructed, providing roughly 15-fold genome coverage, containing 1,056 clones with average insert size of 40 kilobases (kb). High-density colony filters of the library were made, and a physical map of the genome was constructed using a strategy of hybridization without replacement44. A total of 113 cosmid clones was selected for sequencing on the basis of the hybridization map and end-sequence analysis. The cosmid sequences were assembled into 15 contigs covering 90% of the genome. Additionally, shotgun libraries with different insert sizes (0.8–2.0 kb and 2.0–4.5 kb) were constructed from nebulized or restricted genomic DNA cloned into plasmids, and sequenced to achieve a 3.74-fold coverage of high-quality sequence (29,140 reads). Most of the sequencing was performed with BigDye terminators on ABI Prism 377 DNA sequencers.
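The coverage figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes the usual definition of fold coverage (total sequenced or cloned bases divided by genome size). The function names are hypothetical, and the genome size it derives is only the value implied by "roughly 15-fold" cosmid coverage, not the measured size reported in Table 1.

```python
# Back-of-the-envelope coverage arithmetic; illustrative only.

def implied_genome_size_kb(n_clones: int, mean_insert_kb: float,
                           fold_coverage: float) -> float:
    """Genome size implied by a clone library: clones * insert / coverage."""
    return n_clones * mean_insert_kb / fold_coverage


def mean_read_length_bp(fold_coverage: float, genome_size_bp: float,
                        n_reads: int) -> float:
    """Average high-quality read length implied by a shotgun coverage."""
    return fold_coverage * genome_size_bp / n_reads


# Cosmid library: 1,056 clones, 40-kb average insert, roughly 15-fold coverage.
cosmid_genome_kb = implied_genome_size_kb(1056, 40, 15)

# Shotgun data: 3.74-fold coverage from 29,140 reads, using the
# cosmid-implied genome size as a stand-in (an assumption).
shotgun_read_bp = mean_read_length_bp(3.74, cosmid_genome_kb * 1000, 29140)

print(f"implied genome size: {cosmid_genome_kb:.0f} kb")
print(f"implied mean high-quality read length: {shotgun_read_bp:.0f} bp")
```

Under these assumptions the library implies a genome of about 2.8 Mb and a mean high-quality read length of a few hundred bases, which suggests that the quoted numbers are mutually consistent.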

Cosmid and shotgun sequences were assembled into six contigs. We identified sequence gaps by linking information from forward and reverse reads, and closed them either by primer walking or insert subcloning. The remaining physical gaps were closed by combinatorial PCR and by lambda clones selected from a λDash library by end-sequencing. The collinearity between the genome and the obtained sequence was confirmed by digestion of genomic DNA with AscI, NotI, SfiI, SmiI and SrfI, followed by comparison of the digestion pattern with the electronic digestion of the generated sequence. In addition, sequences from both ends of most cosmid clones and 236 λ clones were used to confirm the orientation and integrity of the contigs. The sequence was assembled using phred+phrap+consed45. All consensus bases have quality with Phred value of at least 20. There are no unexplained high-quality discrepancies; each consensus base is confirmed by at least one read from each strand, and the overall error estimate is less than 1 in every 10,000 bases.
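As an illustration of the "electronic digestion" check and of the Phred threshold mentioned above, the following Python sketch predicts fragment sizes for a circular sequence and converts a Phred value to an error probability. It is a simplified stand-in, not the software used in the project. The recognition sequences are taken from standard enzyme references rather than from this article, the function names are hypothetical, and sites spanning the junction of the linearized sequence are ignored.

```python
import re

# Recognition sequences from standard enzyme references (not from the
# article); N stands for any base. All five sites are palindromic, so
# scanning one strand is sufficient.
RECOGNITION_SITES = {
    "AscI": "GGCGCGCC",
    "NotI": "GCGGCCGC",
    "SfiI": "GGCCNNNNNGGCC",
    "SmiI": "ATTTAAAT",
    "SrfI": "GCCCGGGC",
}


def site_positions(seq: str, site: str):
    """0-based start positions of a recognition site (overlaps allowed)."""
    pattern = "(?=" + site.replace("N", "[ACGT]") + ")"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def circular_fragment_sizes(seq: str, site: str):
    """Predicted fragment sizes after complete digestion of a circle.

    Site starts are used instead of exact cut positions; the offset is
    constant for a given enzyme, so fragment sizes are unaffected.
    """
    pos = site_positions(seq, site)
    if not pos:
        return [len(seq)]  # uncut circle
    sizes = [b - a for a, b in zip(pos, pos[1:])]
    sizes.append(len(seq) - pos[-1] + pos[0])  # fragment across the junction
    return sorted(sizes, reverse=True)


def phred_error_probability(q: float) -> float:
    """Standard Phred definition: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)
```

With the standard Phred definition, a value of 20 corresponds to an error probability of 1 in 100 for an individual base, and a value of 40 to 1 in 10,000. Comparing the predicted sizes from `circular_fragment_sizes` with an experimental digestion pattern is the kind of consistency check the paragraph above describes.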

ORF prediction and annotation

ORFs were determined using glimmer 2.0 (ref. 46) and the glimmer post-processor RBSfinder (S. L. Salzberg, personal communication). A few ORFs were identified manually, guided by BLAST41 results. Annotation was carried out cooperatively, mostly by comparison with sequences in public databases, using BLAST41 and tRNAscan-SE (ref. 47), and was based on the functional categories for E. coli48. Only one tmRNA was located (K. Williams, personal communication). To help annotate transport proteins, we built a custom BLAST41 database using sequences from http://www-biology.ucsd.edu/msaier/transport/toc.html and compared our ORFs with these sequences. Phylogenetic trees for conserved COGs12 were built using ClustalX49 for multiple alignment and Phylip50. Paralogous gene families (Table 2) were determined using BLASTX with an E-value cut-off of 1e-5, requiring that at least 60% of the query sequence and at least 30% of the subject sequence were aligned.
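The paralogue criteria stated above (E-value cut-off and minimum aligned fractions of query and subject) amount to a simple filter on pairwise hits. The sketch below shows one way such a filter could look. The `Hit` record and its field names are hypothetical, the thresholds are treated as inclusive, and the cut-off is read as 1e-5; none of these details is confirmed by the article beyond the three numbers themselves.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise similarity hit (hypothetical record layout)."""
    query_len: int        # length of the query sequence
    subject_len: int      # length of the subject sequence
    query_aligned: int    # query positions covered by the alignment
    subject_aligned: int  # subject positions covered by the alignment
    evalue: float


def is_paralogue_pair(hit: Hit,
                      max_evalue: float = 1e-5,
                      min_query_cov: float = 0.60,
                      min_subject_cov: float = 0.30) -> bool:
    """Apply the three criteria quoted in the text to a single hit.

    Lengths and aligned spans must be in the same units (for example,
    amino-acid residues) for the coverage fractions to be meaningful.
    """
    query_cov = hit.query_aligned / hit.query_len
    subject_cov = hit.subject_aligned / hit.subject_len
    return (hit.evalue <= max_evalue
            and query_cov >= min_query_cov
            and subject_cov >= min_subject_cov)
```

How accepted pairs were then grouped into the families listed in Table 2 is not described in this section, so no clustering step is sketched here.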