Introduction

Horizontal gene transfer, a fundamental mechanism of evolution in bacteria, occurs by conjugation, transduction, and transformation [1]. These processes lead to homologous recombination (HR) that mediates exchange of genetic material among related taxa, including acquisition of novel genetic elements, such as antibiotic resistance and virulence factors, that can result in emergence of new pathotypes [2]. In bacteria, HR is proposed to play a key role in repairing damaged DNA [3] to promote survival under stressful conditions. But HR may also give rise to new variants if the homologous DNA originates from a closely related but genetically distinct donor [4]. New genetic variants can result in profound phenotypic changes, such as evasion of host defenses through variation in surface structures [5, 6], and can facilitate disease emergence [7] or enhance adaptation to new environments [8]. The relative role of HR and point mutations in generating genetic variation differs among bacterial species [9], with HR expected to play a major role in naturally transformable bacteria.

The first observation of HR was achieved by detection of mosaic genes that contained fragments originating from different taxa of the same genus [10]. Use of multilocus sequence typing/analysis (MLST/MLSA), a technique to assess genetic diversity within a species by sequence analysis of 5–11 housekeeping genes scattered across the genome [11], has revealed HR in bacterial genomes. Until recently, MLST was a signature tool for bacterial phylogenetic studies and detection of HR [12,13,14,15]. However, examination of only a few genes with presumed neutral variation will overlook HR present in other genomic regions, including genes involved in host adaptation and/or virulence [16]. Whole-genome sequence (WGS) comparisons provide a superior approach for detecting HR in bacterial genomes [17,18,19].

Xylella fastidiosa is a plant pathogenic bacterium that causes massive economic losses and was believed to be limited to the Americas, but recently has emerged as an important pathogen in other regions of the world [20], including Europe [21, 22] and Asia [23]. X. fastidiosa exclusively colonizes two habitats: the plant xylem vessels and foregut of insect vectors (sharpshooter leafhoppers and spittlebugs) [24]. Taxonomically, strains within X. fastidiosa are generally categorized into three subspecies (i.e., fastidiosa, multiplex, and pauca) [25,26,27], although additional subspecies have been suggested (i.e., morus, sandyi) [28, 29]. While the species has a very broad plant host range [30], this differs among subspecies. However, mechanisms of host specificity are still unknown [31].

Extensive HR has been identified in X. fastidiosa based on genetic diversity and phylogenetic patterns revealed by MLST [29, 31,32,33,34]. These studies made notable speculations on the impacts of HR on host range expansion in X. fastidiosa strains, including those strains that infect citrus, mulberry, and blueberry [15, 28, 34]. However, these inferences were made based on the evidence of HR in housekeeping genes, rather than taking advantage of whole genomes.

Experimental studies have demonstrated natural competence and HR in X. fastidiosa both in vitro [35] and in artificial habitats mimicking the natural growth environment of the bacterium [36]. Moreover, intersubspecific HR (IHR) between X. fastidiosa subsp. fastidiosa and multiplex was recently demonstrated experimentally [37]. The objective of this study was to use WGS analysis to identify the extent of intra- and intersubspecific HR in X. fastidiosa. We did this by first focusing on X. fastidiosa recombinant strains experimentally generated in vitro [37], and later by studying multiple strains isolated from field infections around the world. Extensive inter- and intrasubspecific recombination of both ancient and recent origins was detected in the genomes of X. fastidiosa, indicating patterns of gene flow among different subspecies. The increased global movement of strains through trade of plant materials can lead to introduction of novel genotypes into an endemic pathogen population. HR can play a significant role in such cases, where novel genotypes might exchange adaptive alleles with the existing population, resulting in niche-adapted recombinant strains with greater fitness than the existing population.

Materials and methods

Bacterial strains, media, and culture conditions

Strains and mutants used in this study are listed in Table S1. For an explanation of the recombinant mutants generated in vitro, see Supplementary Methods and Fig. 1. A variant of strain Temecula1 that was found to differ from it by whole-genome sequence analysis in this study is named TemeculaL here [38]. Temecula1* is another variant of the Temecula1 strain, which showed reduced virulence in our previous study [39]. Strains PD7202 and PD7211 were isolated from coffee plants shipped from Costa Rica and intercepted in the Netherlands [40]. Strains were cultured from −80 °C stock on modified PW agar [41] without phenol red and with 1.8 g l−1 bovine serum albumin (BSA) (Gibco Life Sciences Technology) for a week at 28 °C, re-streaked, and cultured for another week before use. Kanamycin (Km) and chloramphenicol (Cm) were used at 30 and 10 µg ml−1, respectively.

Fig. 1

Diagram summarizing the lineage of the recombinant strains used in this study. Parental strains are highlighted in boxes. Recombinant strains were obtained either by mixing one live strain with DNA from another, dead strain (a, b), or by co-culturing two live strains (c). Designations in parentheses after the strain names correspond to the antibiotic-resistance markers used for selection: Km, kanamycin; Cm, chloramphenicol. Whenever available, GFP production was used for selection in addition to antibiotic resistance. Parental strains include Xylella fastidiosa subsp. fastidiosa wild-type TemeculaL, WM1-1, and mutants KLN59.3 (GFP) [a KmR and GFP-marked mutant of Temecula1 [75]], NS1-CmR [a CmR cassette inserted in the noncoding region NS1 of TemeculaL [76]], and pglA-KmR [a KmR pglA mutant of strain Fetzer [67]]. X. fastidiosa subsp. multiplex AlmaEM3 (marked with a KmR cassette in the NS1 site) was the only strain classified in a different subspecies. The genomes of all parental and recombinant strains shown here were sequenced for this study. For more details see Supplementary Methods

DNA extraction, library preparation, sequencing, and assembly

In this study, 17 novel X. fastidiosa genome sequences (including nine WT, five in vitro recombinant strains, and three parental strains) were generated (Table S1). Genomic DNA extraction, library preparation and MiSeq/Pacific Biosciences (PacBio) sequencing protocol, quality filtering, trimming, assembly, and annotation details are presented in Supplementary Methods. Briefly, paired-end MiSeq FASTQ files were quality assessed with FastQC (Babraham Bioinformatics) and trimmed with Trimmomatic-0.35 [42]. Trimmed reads (>36 bp) were de novo assembled using SPAdes 3.9.0 [43]. For WM1-1 and AlmaEM3, sequenced by both MiSeq and PacBio, a hybrid assembly of SPAdes 3.9.0 was used. Information for read and assembly statistics, breadth and depth of coverage, and annotation results is summarized in Tables S2 and S3.

Core-genome prediction

Complete or draft genomes of 55 strains of X. fastidiosa sequenced here or obtained from NCBI (Table S1) were aligned with progressiveMauve [44] using default settings. The core-genome alignment was obtained by removing the accessory regions from the aligned locally colinear blocks (LCBs) with the stripSubsetLCBs script [44] (https://github.com/xavierdidelot/ClonalOrigin/wiki/Usage), with core LCBs shorter than 500 nt discarded. The same steps were followed to obtain the core genome of 57 X. fastidiosa strains when two additional intercepted strains, PD7202 and PD7211 [40], were added to our database. Using the Mauve approach instead of an alignment of concatenated core genes allowed us to include intergenic regions and to preserve contiguous order in the recombination analysis. The model implemented in BratNextGen prefers a contiguously ordered core-genome alignment, which was obtained by concatenating the core LCBs.
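
The core-genome step described above (keep only blocks shared by all strains, drop short blocks, concatenate the rest in order) can be sketched as follows. This is a minimal illustration and not the stripSubsetLCBs implementation: the in-memory block representation (one dictionary per LCB mapping strain name to aligned sequence) and the function name core_alignment are assumptions made for this example. Only the 500-nt threshold comes from the text.

```python
from typing import Dict, List

MIN_LCB_LENGTH = 500  # nt; threshold stated in the text


def core_alignment(lcbs: List[Dict[str, str]], strains: List[str],
                   min_len: int = MIN_LCB_LENGTH) -> Dict[str, str]:
    """Concatenate, in input order, the LCBs present in every strain.

    Each LCB is a hypothetical dict {strain name: aligned sequence}.
    Blocks missing any strain (accessory regions) or shorter than
    min_len alignment columns are skipped.
    """
    pieces: Dict[str, List[str]] = {s: [] for s in strains}
    for block in lcbs:
        # A block is "core" only if every strain has an aligned row in it.
        if not all(s in block for s in strains):
            continue
        length = len(block[strains[0]])
        # All rows of an alignment block must have the same number of columns.
        if any(len(block[s]) != length for s in strains):
            raise ValueError("rows in an LCB must have equal length")
        if length < min_len:
            continue
        for s in strains:
            pieces[s].append(block[s])
    return {s: "".join(parts) for s, parts in pieces.items()}


# Toy example with a small threshold so the behaviour is visible.
toy_lcbs = [
    {"A": "ACGT", "B": "AC-T"},  # core, 4 columns: kept
    {"A": "GG"},                 # accessory (missing in B): dropped
    {"A": "TT", "B": "TA"},      # core but only 2 columns: dropped at min_len=3
]
toy_core = core_alignment(toy_lcbs, ["A", "B"], min_len=3)
```

Concatenating blocks in their input order is what keeps the alignment contiguously ordered, which is the property the BratNextGen model is said to prefer.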

Population structure and inference of recombination

Population structure was defined with BAPS 6.0 (hierBAPS module for hierarchical BAPS) using the default settings [45]. The core genome (i.e., LCBs) obtained from the Mauve alignment of 55 X. fastidiosa genomes was used as input in BAPS 6.0 to obtain the population structure. A total of 33,824 backbone entries were concatenated to obtain the core genome alignment of the 55 genomes. In addition, the core genome from the Mauve alignment of 13 genomes of in vitro recombinants and their parents was used as input for analyzing the population structure of the in vitro recombinants. These core genome alignments were used to infer maximum likelihood (ML) phylogenies with RAxML v 8.2.11 under the generalized time-reversible model (GTRGAMMA) with 1000 bootstrap replicates [46]. Mid-point rooted ML trees were visualized and annotated using FigTree (v.1.4.2, http://tree.bio.ed.ac.uk/software/figtree/).

For genealogy reconstruction, recombination was identified from the core genome alignments using the BratNextGen algorithm. The cutoff level for obtaining clusters from the proportion of shared ancestry tree (PSA tree) was guided by the number of clusters determined by hierBAPS. Recombination was estimated with 20 iterations of the HMM estimation algorithm and 100 permutations with a 5% significance threshold [18]. To analyze the origin of recombination, we used the fastGEAR algorithm, which employs hierBAPS to identify lineages as clusters of genomes that share a monophyletic signal in at least 50% of the sites. Recombination events between X. fastidiosa lineages that affected a subset of recipient lineages are considered “recent”, with the origin of the recombinant sequence assigned to the lineage with the highest probability at that position. Regions affected by recent recombination were removed prior to detection of “ancestral” recombination events, which are shared by all strains that comprise a lineage. The origin of these ancestral recombination events cannot be inferred due to their conserved nature [47]. In identifying recent and ancestral recombination events, a Bayes factor >1 or >10, respectively, was used for testing statistical significance. We conducted this fastGEAR analysis on the core genome alignments for both sets of 55 and 57 X. fastidiosa strains (including newly intercepted strains PD7202 and PD7211). Next, to obtain a recombination-filtered core-genome alignment, positions in the core genome alignment that corresponded to the recent recombination events predicted by fastGEAR were masked using the Perl script bng-mask_recombination.pl (https://github.com/tseemann/bng-tools/blob/master/bng-mask_recombination.pl). Gaps were then removed from the alignment using trimAl (v1.2) with the ‘-nogaps’ option [48]. This recombination-filtered core genome alignment was then used to construct an ML phylogeny as described above.
Further analysis of recently recombined regions was carried out to understand the ecological significance of intersubspecies recombination. The detailed methodology for extracting coordinates and annotations of recently recombined genes is described in Fig. S1.
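
The recombination-filtering step (mask predicted recent events, then drop every alignment column that contains a gap) can be illustrated with a short sketch. It does not reproduce bng-mask_recombination.pl or trimAl. The Event record, the 0-based half-open coordinates, and the use of "-" as the mask character are assumptions made for this example. The Bayes factor >1 filter for recent events follows the threshold stated above.

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    """Hypothetical record for one predicted recent recombination event."""
    strain: str
    start: int            # 0-based, inclusive (assumed convention)
    end: int              # 0-based, exclusive (assumed convention)
    bayes_factor: float


def mask_events(aln: Dict[str, str], events: List[Event],
                min_bf: float = 1.0, mask_char: str = "-") -> Dict[str, str]:
    """Replace recombinant positions with mask_char in the affected strain only."""
    rows = {name: list(seq) for name, seq in aln.items()}
    for ev in events:
        # Keep only events whose Bayes factor exceeds the threshold.
        if ev.bayes_factor <= min_bf:
            continue
        row = rows[ev.strain]
        for i in range(max(ev.start, 0), min(ev.end, len(row))):
            row[i] = mask_char
    return {name: "".join(chars) for name, chars in rows.items()}


def drop_gap_columns(aln: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Remove every column in which at least one sequence has a gap."""
    names = list(aln)
    columns = zip(*(aln[n] for n in names))
    keep = [i for i, col in enumerate(columns) if gap not in col]
    return {n: "".join(aln[n][i] for i in keep) for n in names}


# Toy alignment and events; not data from the study.
toy_aln = {"s1": "ACGTAC", "s2": "ACGTAC", "s3": "AC-TAC"}
toy_events = [Event("s1", 3, 5, 5.0), Event("s2", 0, 2, 0.5)]
masked = mask_events(toy_aln, toy_events)
filtered = drop_gap_columns(masked)
```

Because masking is per strain but gap removal is per column, a single recombinant segment in one genome removes those columns from the whole filtered alignment.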

Accession numbers

The genomes and the raw reads generated in this study have been deposited in NCBI GenBank under BioProject PRJNA433735 with BioSample numbers SAMN08537137–SAMN08537151, SAMN10499944, and SAMN10499946. GenBank accession numbers are included in Table S1.

Results and Discussion

Recombination by natural competence in vitro leads to multiple events of DNA exchange across the genome

Previously, we performed in vitro natural transformation experiments of X. fastidiosa by co-culturing combinations of either live or heat-killed donor and live recipient strains, generating recombinant strains that were selected based solely on the acquisition of antibiotic-resistance markers [36, 37] (Fig. 1). Complete genomes were analyzed to determine the overall distribution of recombination events that occurred, including the length of DNA flanking the selection marker. Initially, the presence of recombinogenic regions was identified by reference mapping of the reads and alignment of contigs to the parental strain genome (Fig. S2). To further analyze recombinant regions beyond those flanking selection markers, we classified the strains used in vitro with hierBAPS, which identified five clusters among parental and recombinant strains (Fig. 2a). One interesting observation of this analysis was that strains referred to as “Temecula1” in previous reports are actually different strains that were classified into two different clusters: cluster 5 [TemeculaL, NS1-CmR(TemeculaL)]; and cluster 4 [Temecula1, Temecula1*, KLN59.3(GFP)(Temecula1)]. The NCBI-deposited Temecula1 genome (Table S1) differs from the genome of other strains that have been referred to in the literature by the same name. For instance, clustering showed that NS1-CmR (TemeculaL) has TemeculaL as background and not Temecula1 as previously thought [35]. Recombinant regions shared among Temecula1 and Temecula1* represented different alleles compared with TemeculaL, further highlighting the differences among strains referred to by the same name.

Fig. 2

Maximum likelihood phylogeny of the core genome alignment of 13 strains of X. fastidiosa in vitro recombinants and their donor and recipient parental strains before (a) and after (b) recombination filtering. Population structure was inferred using nested clustering implemented in hierBAPS. For (b), recombinant regions detected by BratNextGen in the genomes were excluded to build a recombination-free phylogenetic tree. The overall topology remained similar, but the phylogenetic distance decreased among the genomes after recombination filtering. The phylogenetic tree was mid-point rooted, representing an increasing order of nodes. Bootstrap support was >98% for all nodes, except where otherwise indicated

To further investigate variable regions, we used more robust detection tools that detect recombination events at the intrasubspecific level (within subspecies fastidiosa; BratNextGen) and at the intersubspecific level (between subspecies fastidiosa and multiplex; fastGEAR). BratNextGen and fastGEAR identified recombination events in the core genome of X. fastidiosa subsp. fastidiosa WT and in vitro recombinant strains, with prediction of 72 recombination events (Fig. 3). Influence of recombination on phylogenetic relationships among in vitro recombinants and their parents was evident based on recombination-free tree topology that showed increased cohesiveness among clusters within X. fastidiosa subsp. fastidiosa and reduced branch length among subspecies fastidiosa and multiplex strains (Fig. 2b).

Fig. 3

Detection of recombinogenic regions in the core genomes of X. fastidiosa in vitro recombinants and their parental strains using BratNextGen. Core-genome alignment obtained with progressiveMauve was used for genealogy reconstruction, and recombination was identified using BratNextGen algorithm. The estimation of recombination was performed with 20 iterations of the HMM estimation algorithm and 100 permutations with 5% significance threshold. The left panel corresponds to the proportion of shared ancestry (PSA) tree computed by BratNextGen. In PSA tree, strains are clustered according to the proportion of sequence shared, including recombinant regions. In the right panel, recombinant segments identified by BratNextGen for each strain are shown. The colors are arbitrarily assigned. The fragments of the same color present in the same column share the same origin. BratNextGen identified intrasubspecies recombination events (within fastidiosa subspecies), also indicating differences among different “Temecula” strains (see text for more details)

Recombinants representing an intersubspecific strain combination, i.e., heat-killed X. fastidiosa subsp. multiplex strain AlmaEM3 as a donor and X. fastidiosa subsp. fastidiosa strain TemeculaL as a recipient (Fig. 1a), showed recombination only around the Km marker cassette, as identified by both fastGEAR and the more stringent method BratNextGen (data not shown). The lengths of the recombined regions differed between the two recombinants, with TemL(Alma) Rec1 showing a 10-kb region in total (spanning five recombination events surrounding the Km marker gene) and TemL(Alma) Rec2 showing a 3.5-kb region (spanning one recombination event) flanking the Km marker cassette, as was observed in reference mapping and alignment (Fig. S2). No additional recombinant regions were identified, indicating that recombination in this case was confined to the flanking regions of the Km selection marker insertion site.

On the other hand, in the case of in vitro intrasubspecific recombinants generated by mixing live cells of X. fastidiosa subsp. fastidiosa WM1-1 (hierBAPS cluster 3, Fig. 2a) with heat-killed cells of X. fastidiosa subsp. fastidiosa KLN59.3(GFP) (hierBAPS cluster 4) (Fig. 1b), recombination events were detected away from the GFP marker using BratNextGen and fastGEAR analysis in WM1-1(GFP) Rec1 and WM1-1(GFP) Rec2 (10.5-kb region identified in both recombinants and an additional 6.1 kb in WM1-1(GFP) Rec1). Such random recombination events away from the selection marker were also observed in in vitro recombinants of Helicobacter pylori [49]. We scanned these recombinant regions for polymorphisms in amino acid sequences, and identified polymorphisms in genes encoding lipase/alpha–beta hydrolase and putative Ctpa-like serine protease/peptidase S41 (Table 1). Previously, homologous serine proteases have been linked to virulence of X. fastidiosa [50, 51].

Table 1 List of recent recombination events in genes/categories identified in WT strains and their possible ecological role in host adaptation

Finally, we assessed the extent of recombination in another intrasubspecific in vitro recombinant. TemL(NS1pglA) Rec1 was obtained by co-culturing live donor and recipient parents NS1-CmR (TemeculaL) and pglA-KmR, belonging to clusters 5 and 2, respectively (Figs. 1c, 2a). Upon analyzing the flanking region of the Km marker cassette in the recombinant genome, a 6-kb region upstream of the cassette was identical to the region from pglA-KmR, indicating recombination at this region. A 5-kb region downstream of the Km cassette was identical in both pglA-KmR and NS1-CmR. In addition to recombination near the Km cassette, we identified three recombination events (320 bp, 1.5 kb, and 3 kb) away from the Km cassette, similar to what was described above. Scanning these regions for amino acid polymorphisms in protein-coding genes indicated recombinant genes encoding extracellular serine protease, Ctpa-like serine protease, hypothetical proteins, and translocation and assembly module TamA/surface antigen/autotransporter domain (Table 1). Except for the last two, all the other proteins were also identified as recombinants in WT strains (Table 1). Interestingly, a recombinant gene encoding Ctpa-like serine protease, identified in this in vitro recombinant, is present upstream of the recombinant gene (also encoding Ctpa-like serine protease) [52] in WM1-1(GFP) Rec1 (see above). The presence of recombination in similar regions of the genome in two independent recombinants arising from different parents is surprising. Ctpa-like serine protease was also identified as a recombinant locus in morus/sandyi, multiplex, and fastidiosa subspecies in WT strains (Table 1, see below), suggesting this region as a possible recombination hotspot. The potential role of these genes in the interaction of X. fastidiosa with its host plants is described in Table 1.

Our results indicate that in vitro recombination by natural competence is not limited to a small fraction of the genome, and that it is variable, even when considering the same pair of strains under controlled conditions in the same experiment [37], as seen in the case of the two TemL(Alma) recombinants that vary in the extent of integration of transformed DNA. While this study analyzed recombinants obtained after a single cycle of transformation, the frequency of recombination upon natural competence is expected to increase dramatically after several cycles [49]. Thus, co-infection of plants [53] or insects with different strains of a single or multiple subspecies is expected to result in unpredictable widespread exchanges of genetic material. Interestingly, we found that in the case of intersubspecific recombination (Fig. 1a), only the region of the selection marker was exchanged, while in intrasubspecific recombination (Fig. 1b, c), additional regions away from the selection marker were also exchanged. This may indicate that in nature, intrasubspecific HR is more widespread in the genome than intersubspecific recombination, which could be inhibited by sequence divergence [54], or possibly by different restriction-modification systems [49]. Testing this hypothesis would require extensive sampling of strains belonging to each subspecies from wild populations, followed by genome sequencing and analysis.

Ancient and recent recombinogenic regions are common among X. fastidiosa strains

Analysis of in vitro transformation detected multiple recombination events within and between subspecies multiplex and fastidiosa. To estimate the influence of recombination on the evolution of X. fastidiosa at the intersubspecific level, we analyzed 55 genomes belonging to all X. fastidiosa subspecies described/available to date [i.e., subsp. fastidiosa, multiplex, pauca, morus, and sandyi (15 sequenced in this study and 40 obtained from NCBI GenBank)] (Table S1). Population structure of all the genomes inferred using nested clustering implemented in hierBAPS identified five clusters: one cluster each representing subsp. fastidiosa and multiplex, two clusters within subsp. pauca, and a single cluster representing subsp. morus and sandyi (Fig. 4a). The secondary level clustering further divided the cluster containing subspecies morus and sandyi into individual subspecies clusters, and the cluster containing multiplex into two clusters (as seen by two subclades within subspecies multiplex). The core genome alignment of 1.79 Mb was used for the construction of a phylogenetic tree (Fig. 4a). Subspecies morus and sandyi were positioned in a “hybrid” subclade between fastidiosa and multiplex; this is in agreement with the phylogeny based on MLST [20].

Fig. 4

Population structure of X. fastidiosa. Maximum likelihood phylogeny of the 1.79-Mb core genome alignment of 55 X. fastidiosa strains analyzed in this study, before (a) and after (b) filtering for the recombinant regions. The phylogenetic tree was mid-point rooted, representing an increasing order of nodes. For (a), a total of 33,824 backbone entries were concatenated to obtain the core genome alignment of 1.79 Mb. Population structure was inferred using nested clustering implemented in hierBAPS. For (b), recombinant regions detected by fastGEAR in all subspecies of X. fastidiosa were excluded to build a recombination-free phylogenetic tree. Bootstrap support was >98% for all nodes, except where otherwise indicated

While BratNextGen allowed us to detect recombination within a single lineage (subsp. fastidiosa), it did not provide enough resolution when diverse lineages belonging to a subspecies were included, with no recombination identified in subsp. multiplex or pauca (Fig. S3). To address this limitation, we used fastGEAR software that inferred recombination between lineages as well as from external origins [47]. As evident from Fig. 5, subsp. morus and sandyi showed mosaic/chimeric genomes due to a relatively high degree of recent (Table S4) as well as ancient (Table S5) recombination events, thus, confirming previous observations [28].

Fig. 5

Interlineage recombination. Distribution, origin, and proportion of recent (a) and ancestral (b) recombination events across the X. fastidiosa core genome as predicted by fastGEAR. The lineage predictions are based on hierBAPS analysis and are color-coded, and clade designations are labeled accordingly. The presence of differently colored markings within a given lineage indicates a recombination event relative to the genomic position shown in the X axis. Predicted recent recombination events were removed prior to the analysis of ancestral recombination, which are represented by white gaps in (b). Subspecies morus and sandyi showed a high degree of both ancestral and recent recombination, indicating mosaicism in these two subspecies

Next, we assessed the influence of recent recombination events on different subspecies. Subspecies pauca has experienced recent recombination events originating from all three lineages, morus/sandyi (27% of the events), multiplex (16% of the events), and fastidiosa (16% of the events), as well as possibly from other bacteria (40% of the total recombination events) (Table S4). There are multiple reports of coexistence of subsp. pauca and other subspecies in the same geographic areas, which can explain the exchange of DNA leading to recombination [55, 56]. The total size of recombination events varied from 530 bp to ~250 kb, representing ~0.02–10% of the genome (Table S4). We identified CFBP8072 as highly recombinant among pauca strains, showing a total of 376 recombination events, with origins in all other subspecies of X. fastidiosa and possible outside sources. This strain, isolated in France from intercepted coffee plants imported from Ecuador, was previously identified as a recombinant strain based on MLSA [57]. Analysis of the length distribution of recombination events across genomes of subsp. pauca indicated an average size of 1 kb, ranging from 25 to 3791 bp, with the exception of 4423 bp for CFBP8072 (Fig. 6). A plot of the frequency of specific lengths of recombinant segments showed a normal distribution, with a small peak at 25 bp followed by a large peak at 300 bp (without the outlier CFBP8072), indicating two frequently recombined lengths across pauca subspecies. This bimodal distribution is similar to what has been observed in other bacteria [49, 58], but the mechanism(s) behind it is still unclear.
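
The two summaries used in this paragraph, the share of events per donor lineage and the frequency of event lengths on a logarithmic axis, can be computed along the following lines. This is only a sketch: the input format (a list of donor-lineage and length pairs) is hypothetical and does not correspond to the fastGEAR output format, and the toy values are not data from the study.

```python
import math
from collections import Counter
from typing import Dict, List


def origin_percentages(origins: List[str]) -> Dict[str, float]:
    """Percentage of events assigned to each donor lineage."""
    counts = Counter(origins)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lineage: 100.0 * n / total for lineage, n in counts.items()}


def log_binned_counts(lengths: List[int], bins_per_decade: int = 1) -> Dict[int, int]:
    """Count event lengths in bins of equal width on a log10 axis.

    Bin k covers lengths from 10**(k / bins_per_decade) up to, but not
    including, 10**((k + 1) / bins_per_decade).
    """
    counts: Counter = Counter()
    for length in lengths:
        if length <= 0:
            raise ValueError("event lengths must be positive")
        counts[math.floor(math.log10(length) * bins_per_decade)] += 1
    return dict(sorted(counts.items()))


# Hypothetical events (donor lineage, length in bp); not data from the study.
toy_events = [("multiplex", 25), ("multiplex", 300),
              ("fastidiosa", 320), ("unknown", 4423)]
percentages = origin_percentages([donor for donor, _ in toy_events])
histogram = log_binned_counts([length for _, length in toy_events])
```

With one bin per decade the histogram is coarse; raising bins_per_decade gives the finer resolution needed to separate nearby peaks such as those at 25 and 300 bp.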

Fig. 6

Distribution of the length of recombinant fragments and the number of recombination events for each subspecies of X. fastidiosa. The number of recent recombination events identified by fastGEAR were plotted on the Y axis, with the lengths of recombination events plotted on the X axis using a logarithmic scale. The average length of a recombined fragment was around 1000 bp for all subspecies, with 10,000 bp being a higher limit for the recombined fragment length (subsp. morus as an exception). Higher incidence of recombination as well as the greater lengths of recombinant fragments were observed in subsp. morus (31,554 bp as the longest recombinant fragment) as well as in strains isolated from intercepted coffee plants (XFCO33, CFBP8072, and CFBP8073, marked in red, 19,280 bp as the longest recombinant fragment)

Subspecies fastidiosa has experienced relatively low levels of recombination compared with other subspecies, except for an outlier strain, CFBP8073, which is highly recombinant (Fig. 6). The overall number of recombination events experienced by subsp. fastidiosa strains, excluding CFBP8073, was 235, of which 87% originated in subsp. multiplex, 11% originated in subsp. morus/sandyi, and 2% originated from outside species, with none from subsp. pauca. The highly recombinant strain CFBP8073 contains recombinant regions with 83% of the events originating in morus/sandyi subspecies. This is in contrast with other subsp. fastidiosa strains, whose recombination events originated mainly in subsp. multiplex. Interestingly, strain CFBP8073 was also intercepted in Europe in coffee plants shipped from Mexico [57]. Despite a comparatively low rate of recombination in subsp. fastidiosa, two frequently recombined segment lengths were 75 and 2325 bp, indicating a similar pattern as observed above, including a normal distribution. The maximum length of a recombinant segment was 10,909 bp (Fig. 6). These lengths are comparable with the recombinant segment lengths observed in experimentally generated recombinants.

Population structure analysis using hierBAPS indicated that subspecies multiplex is diverse (Fig. 4a), with at least five subclades. A sublineage within multiplex consisting of the Dixon and Griffin-1 strains was minimally affected by recombination (Fig. 5, Table S4). However, the subclades containing subsp. multiplex strains ATCC35871, SY-VA, and BB08-1 were significantly affected by recombination, with donor lineages being morus/sandyi (51% of recombination events) and fastidiosa (48% of recombination events) (Fig. 5, Table S4). The length distribution of recombinant events also followed a normal curve for subsp. multiplex, with the two most frequently recombined fragments being 649 and 2318 bp long.

In addition to ancient recombination, subsp. morus and sandyi have also undergone recent recombination. It is interesting to note that these two subspecies have different origins of recombination, with morus showing a higher proportion of recombination originating in multiplex (56% in multiplex compared with 41% originating in fastidiosa), and with sandyi showing a higher proportion of recombination originating in fastidiosa (64% of the events originating in fastidiosa and 20% originating in multiplex). Strain XFC033 belonging to subspecies sandyi can be referred to as a hybrid strain, with origins in pauca, fastidiosa, and multiplex. Interestingly, this strain was identified from symptomatic coffee leaves originating in Costa Rica and intercepted in northern Italy [59]. As in other subspecies, fragments of ~1 kb were frequently recombined in morus/sandyi strains; however, subsp. morus contains the largest recombined fragment, of 31.5 kb. The two subspecies also contained at least nine fragments >10 kb, reiterating their highly recombinant nature. The widespread recombination found here is in concordance with X. fastidiosa being a γ-proteobacterium, since this group is considered to have the “most chamaleonic-like evolutionary history” [60].

Phylogenetic relationships are influenced by recombination events between X. fastidiosa subspecies

To understand the influence of recombination on the overall topology of the X. fastidiosa phylogeny, we filtered core genome alignments to exclude the regions identified as recent recombinant events predicted by fastGEAR. The remaining nonrecombinant regions were then used to build a recombination-free phylogenetic tree. As shown in Fig. 4b, the overall topology of the tree appeared similar to that of the unfiltered tree, showing similar major clades and subclades. Recombinant filtering resulted in a strongly increased cohesiveness within subsp. fastidiosa, multiplex, and pauca. The most notable observation after recombinant filtering was the shift of subclades morus and sandyi with respect to their relatedness to fastidiosa and multiplex (Fig. 4b). When we compared branch-length distances from both trees, the average distance between morus and sandyi decreased (from 0.015 to 0.010), whereas that between morus and multiplex significantly increased (from 0.014 to 0.019) upon recombinant filtering. This led to a shift in the position of subsp. morus closer to sandyi in the recombinant-filtered phylogeny. These observations indicate that recombination plays an important role in X. fastidiosa evolution.
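
The text does not state how the average branch-length distance between two subspecies was computed. One plausible reading, shown here purely as an assumption, is the mean of all tip-to-tip (patristic) distances between members of the two groups, calculated separately for the unfiltered and the recombination-filtered tree. The sketch below takes the pairwise distances as given (in practice they would first be extracted from each tree with a phylogenetics tool); the tip names and toy values are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List


def mean_between_group_distance(dist: Dict[FrozenSet[str], float],
                                group_a: List[str],
                                group_b: List[str]) -> float:
    """Mean pairwise distance between tips of group_a and tips of group_b.

    dist maps an unordered pair of tip names to their distance on the tree.
    """
    pairs = [frozenset((a, b)) for a, b in product(group_a, group_b) if a != b]
    if not pairs:
        raise ValueError("no inter-group pairs to average")
    return sum(dist[pair] for pair in pairs) / len(pairs)


# Toy distances for one hypothetical morus tip and two sandyi tips.
unfiltered = {
    frozenset(("morus_1", "sandyi_1")): 0.2,
    frozenset(("morus_1", "sandyi_2")): 0.4,
}
recomb_filtered = {
    frozenset(("morus_1", "sandyi_1")): 0.1,
    frozenset(("morus_1", "sandyi_2")): 0.3,
}
before = mean_between_group_distance(unfiltered, ["morus_1"], ["sandyi_1", "sandyi_2"])
after = mean_between_group_distance(recomb_filtered, ["morus_1"], ["sandyi_1", "sandyi_2"])
```

Comparing the two means for each pair of subspecies gives the kind of before-and-after shift reported in this section.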

Majority of strains from intercepted plants are highly recombinant

The findings that (i) different degrees of ancestral and recent recombination have shaped the population structure of X. fastidiosa, and (ii) each lineage contains different degrees of recombination with origins in other subspecies, suggest that a shared ecological niche [55, 56] of these diverse lineages could have contributed to the emergence of the recombinant strains. Interestingly, the highly recombinant strains CFBP8072, CFBP8073, and XFC033 were isolated from intercepted infected coffee plants originating in American countries. We further included two additional strains belonging to subspecies pauca, PD7202 (CFBP8495, belonging to Sequence Type 53, ST53) and PD7211 (CFBP8498, belonging to ST73) (Table S1), that were isolated from asymptomatic coffee plants intercepted in the Netherlands and originating in Central America. Based on our findings, we hypothesized that these two strains are highly recombinant. Upon analyzing recombination using fastGEAR, we found that PD7211 contained a total of 99 recent recombination events, with recombinant regions shared with CFBP8072. In contrast, strain PD7202 contained 49 recent recombination events, similar to other pauca strains, indicating that not all intercepted strains were highly recombinant (Fig. S4). Maximum likelihood phylogeny of the core genome indicated that PD7202 belongs to the same clade as the subsp. pauca strains CoDiRo and DeDonno, while PD7211, with a higher degree of recombination, formed a distinct clade (Fig. S5). All subspecies have been identified in countries of the American continents in different hosts [55, 56]. It is likely that recombination events are rampant in plants co-infected by multiple strains [53]. The presence of multiple subspecies in certain hosts or insect vectors can foster recombination among these subspecies, resulting in emergence of recombinant strains. The importance of these recombinant regions in the epistatic contribution of each strain background [61], in terms of host range expansion or aggressiveness of strains, remains to be investigated. Movement of these highly recombinant strains, which have acquired genomic regions from diverse subspecies, across the globe via shipment of asymptomatic plant material is a major concern. Such introduction of novel variants to a new geographic area could further foster generation of new recombinants. Acquisition of novel traits by endemic pathogenic species could lead to increasing losses due to changes in host adaptation.

Recombinant regions shared among subspecies inform eco-evolutionary dynamics of X. fastidiosa–host interactions

We analyzed the functional significance of the 1026 genes (~40% of the X. fastidiosa genome) that were identified in recombinant regions. We focused on recently recombined annotated loci that were among the top 10% in terms of the frequency of recombination events (105 genes with 19 or more recombination events; see genes highlighted in yellow in Table S7). The predicted functions of genes encoded in these loci can be considered to play roles in adjustment to and/or transition between the two ecological niches experienced by the pathogen during its life cycle, i.e., plant xylem and insect foregut [62] (Table 1, S6, S7, and Fig. 7). Thus, they have a major impact on the ecology of X. fastidiosa. Cell attachment [63] is a fundamental feature needed to live in physically restricted environments with strong liquid flow, including xylem and insect foregut [64, 65], while movement against flow (twitching motility) [66] and secretion of degrading enzymes [67] are crucial for colonization of xylem vessels. Although it is not completely understood how the plant host recognizes X. fastidiosa, lipopolysaccharide (LPS) [68] has been suggested as a key target, and other cell envelope structures could be involved in evading recognition by the host innate immune system [68]. Recombinant alleles for genes encoding utilization of specific nutrients, such as mineral elements [39] or amino acids, indicate adaptation of the pathogen to the nutritionally constrained environment. A great number of genes involved in regulatory and signaling cascades were detected as recombinants. Interestingly, genes involved in production (rpfF) and sensing (rpfC and rpfG) of the quorum-sensing molecule diffusible signal factor (DSF), which is critical for acquisition of X. fastidiosa by insects from plants [62] and regulates plant host specificity [69], were detected in high frequencies. Other regulatory genes detected here have been shown to be important for virulence of X. fastidiosa; interestingly, phoP/phoQ [70] were shown to be needed for survival in plants.
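The selection of the most frequently recombined loci described above can be illustrated with a short sketch. It assumes a per-gene count of recent recombination events is available, ranks the genes, and keeps every gene at or above the count of the last gene in the top fraction, so that ties at the cutoff are retained. This tie handling is an assumption made for illustration, and the function name and toy gene labels are hypothetical; only btuB and its count of 165 events are taken from the text.

```python
import math


def top_fraction_by_events(event_counts, fraction=0.10):
    """Select genes in the top `fraction` by recombination event count.

    event_counts: dict mapping gene name -> number of events
    Returns (selected, cutoff); genes tied at the cutoff are kept.
    """
    ranked = sorted(event_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return {}, None
    n_top = max(1, math.ceil(len(ranked) * fraction))
    cutoff = ranked[n_top - 1][1]
    selected = {gene: n for gene, n in ranked if n >= cutoff}
    return selected, cutoff


# Toy counts: labels other than btuB are hypothetical placeholders.
toy_counts = {
    "btuB": 165, "geneA": 19, "geneB": 19, "geneC": 7,
    "geneD": 5, "geneE": 3, "geneF": 2, "geneG": 1,
}
toy_selected, toy_cutoff = top_fraction_by_events(toy_counts, fraction=0.25)
```

Keeping ties means the selected set can be slightly larger than the nominal fraction, which is one way a cutoff expressed as a minimum event count can yield a few more genes than exactly 10% of the total.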

Fig. 7

Diagram illustrating the ecological role of genes found to be under recent recombination in Xylella fastidiosa. X. fastidiosa is only found in two very specific ecological niches: xylem vessels of host plants and food canal of xylem-feeding insects. Genes that were identified through our analysis as having the highest rate of recent intersubspecific recombination are listed in Table 1, S6, and S7. Based on the functional classification of these genes (Table 1), we placed these functions in the context of the ecology of X. fastidiosa during its life cycle. Briefly, in plant hosts (left panel), X. fastidiosa needs twitching motility [1] and exoenzymes [3] to colonize the xylem system inside plants, where nutrients are very limited [5]. While colonizing xylem vessels, the bacteria need attachment [2] to withstand constant liquid flow in the vascular system and to form biofilms. Proteins in the cell envelope are fundamental for recognition by the plant [7]. When colonizing a new host, X. fastidiosa most likely needs to acquire new genetic information for adaptation to new conditions [6]. Several regulatory cascades detected to be under recombination have been implicated in regulating bacterial colonization and virulence [4]. The DSF quorum-sensing molecule increases X. fastidiosa attachment and biofilm formation, which increases cell stickiness and therefore insect acquisition [4]. In the food canal of the xylem-feeding insect (right panel), attachment by TIP and afimbrial adhesins [2] is important to sustain the population under high liquid flow. TIVP: type IV pili; TIP: type I pili; T2SS: type two secretion system; DSF: diffusible signal factor. Numbers of the highlighted X. fastidiosa traits from 1 to 7 correspond to the categories described in Table 1. For more information refer to the text and Table 1. Photo credits: plant host 1, 2: Harvey Hoch (Cornell University); insect vector 2: Phil Brannen (University of Georgia). All other pictures are credited to the authors.

Notably, the gene encoding the vitamin B12 transporter BtuB had the highest number of recombination events (165) across subspecies, and was among the most frequently recombined genes for each subspecies (Table S7). Interestingly, vitamin B12 regulates gene expression, enzyme activity, and abundance of microorganisms [71], and has been related to virulence [72] and biofilm formation [73] in bacterial human pathogens, but has not been studied in X. fastidiosa. The large number of recombination events (collectively 2631 events) in genes encoding “hypothetical proteins” brings attention to functionally uncharacterized proteins. Four genes, encoding lipase, Ctpa-like serine proteases, and extracellular serine protease, were common recombinant loci in intrasubspecific recombination (in vitro recombinants) as well as in intersubspecific recombination in WT strains (Table 1). Genes that are highly recombinant across different subspecies and encode functions not previously shown to be important in X. fastidiosa are of great interest for further understanding the ecological adaptation of X. fastidiosa to specific environmental niches. Recombination in tRNA modification genes and recombination machinery indicates adaptability to receive and successfully integrate foreign DNA (from different subspecies) in order to obtain adaptive alleles that provide overall fitness to this naturally competent pathogen [35, 74].

Conclusions

In this study, we assessed the effect of HR on the diversity of the bacterial plant pathogen X. fastidiosa. Through our comparison of both experimentally generated recombinants and WT strains, we conclude that recombination occurs randomly across the genome and involves DNA fragments of varying lengths. While the extent of recombination in different subspecies varies among WT strains, all subspecies were affected by recent intersubspecific recombination at common loci encoding important functions, such as host colonization, nutrient acquisition, and gene regulation/signaling. Four such recombinant loci were also identified under intrasubspecific recombination in experimentally generated recombinants, further supporting the importance of in vitro studies for understanding the recombination potential of these naturally competent strains. These findings underline the prominent ability of this pathogen to evolve and adapt to new conditions by acquiring new DNA, which is facilitated by the lifestyle of this bacterium, restricted to xylem vessels of plants or mouthparts of insect vectors [62], where it can easily encounter other bacteria in very confined spaces.