Introduction

Rhizobia are a group of Gram-negative bacteria that form symbiotic associations with leguminous plants. They convert atmospheric nitrogen (N2), which plants cannot use directly, into ammonia, which is used for the synthesis of amino acids. This fundamental process is essential for life on Earth. Rhizobia include α-proteobacteria of the genera Rhizobium, Sinorhizobium (Ensifer), Bradyrhizobium, Azorhizobium, Mesorhizobium, Devosia, Methylobacterium, Microvirga, Ochrobactrum, Phyllobacterium and Shinella, and β-proteobacteria of the genera Burkholderia and Cupriavidus (Weir, 2012). S. meliloti and S. medicae are closely related species that form symbioses with the same host legumes (for example, Medicago sativa, Medicago truncatula, Melilotus and Trigonella). The genomes of both S. meliloti and S. medicae consist of a single circular chromosome (3.65 Mb) plus two large symbiotic (sym) plasmids, a 1.3-Mb megaplasmid and a 1.6-Mb chromid (Barloy-Hubler et al., 2000; Barnett et al., 2001; Capela et al., 2001; Finan et al., 2001; Galibert et al., 2001; Reeve et al., 2010), together with additional smaller plasmids that facilitate adaptation to environmental changes.

RmInt1 is a mobile bacterial group II intron that is widespread in natural populations of S. meliloti (Muñoz et al., 2001). It was first described in strain GR4 (Martínez-Abarca et al., 1998), the complete genome sequence of which has recently been reported (Martinez-Abarca et al., 2013). Group II introns are self-splicing catalytic RNAs that act as mobile retroelements. They consist of a structured RNA that folds into a conserved three-dimensional structure organized into six double-helical domains, DI to DVI (Michel et al., 2009). Most bacterial group II introns contain, within DIV, an open reading frame (ORF) encoding an intron-encoded protein (IEP). The IEP consists of a reverse transcriptase followed by a putative RNA-binding domain with RNA splicing or maturase activity (the X domain) and, in some intron lineages, a C-terminal DNA-binding and endonuclease domain (Mohr et al., 1993; San Filippo and Lambowitz, 2002; Toro and Martínez-Abarca, 2013). Group II intron mobility is mediated by a ribonucleoprotein complex consisting of the IEP and the spliced intron lariat RNA, which remains associated with the IEP. This complex recognizes the intron target via both the IEP and the lariat RNA: the central part of the target, containing the intron insertion site, is recognized by base pairing between the exon-binding sites (EBSs) in the lariat RNA and the complementary region in the DNA target, the intron-binding sites (IBSs; Michel and Ferat, 1995). In S. meliloti, RmInt1 is mostly located within the insertion sequence ISRm2011-2 (Martínez-Abarca et al., 1998; Biondi et al., 2011). It propagates at high frequency in the S. meliloti genome, principally through the homing of an RNA intermediate to cognate homing sites (retrohoming), with a strand bias related to the replication of the chromosome and of the resident plasmids (Martínez-Abarca et al., 2004; Nisa-Martínez et al., 2007).
RmInt1-like elements have also been identified in other Sinorhizobium and Rhizobium species (Fernandez-Lopez et al., 2005). As in S. meliloti, the intron-homing sites in these species are IS elements of the ISRm2011-2 group. It has been suggested that these related bacteria acquired RmInt1-like elements both by vertical inheritance from a common ancestor and through independent horizontal transfer events. Ectopic transposition, that is, insertion at a lower frequency into target sites other than the usual homing site, has also been observed in natural populations, providing a possible means of transfer to new genomic locations (Muñoz et al., 2001; Fernandez-Lopez et al., 2005).

We report here that a fragmented group II intron derived from an element closely related to RmInt1 provides a genomic record of an ancient intron insertion, probably predating the divergence of S. meliloti and S. medicae. This ancient intron record gave us an opportunity to investigate the long-term evolutionary dynamics of group II introns and the associated microevolutionary processes. Our results suggest that the gradual eradication of group II introns by the host during evolution does not, in some cases, result in the complete elimination of intron sequences: some intron fragments remain and continue to evolve in the genome, contributing to the symbiotic capacity and environmental adaptation of these rhizobial species.

Materials and methods

Search for homologous sequences in databases

The nonredundant (nr) database maintained by the National Center for Biotechnology Information (GenBank+EMBL+DDBJ+PDB+RefSeq for nucleotides; GenBank+PDB+SwissProt+PIR+PRF for amino acids), in which entries with identical sequences are merged, was used as the target for BlastN (nucleotide) and tBlastN (amino-acid) searches for sequences homologous to RmInt1 (Y11597.2), to the fragmented RmInt1-like element (FRE652; see Results) and to the associated diguanylate cyclase (DGC). Homologous sequences were identified on the basis of e-value, size and percentage pairwise identity, and by careful examination of the retrieved sequences. Sequences were downloaded, analyzed and processed with Geneious Pro software (Biomatters Ltd, Auckland, New Zealand). FRE652 encompasses locus tag C770_GR4pB086, annotated as group II catalytic intron D1-D4-ncRNA. The DGC associated with GR4 FRE652 is annotated as C770_GR4pB085 (Gene ID: 14254915). A search with the amino-acid sequence of the FRE652-associated DGC of S. meliloti strain GR4 yielded 154 hits, four of which were additional homologous sequences from strain GR4, as described in Results. Searches of the NCBI and KEGG complete-organism databases for strain GR4 identified another 15 annotated DGC (GGDEF) genes.
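As a minimal illustration of the percentage pairwise identity criterion, the Python sketch below computes identity between two sequences that have already been aligned. It assumes one common convention (identical positions divided by the columns in which neither sequence has a gap); BLAST and Geneious each report their own identity statistics, which may treat gaps differently, so this is not the exact calculation performed by either tool. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumed convention: identical positions / positions where neither
    sequence has a gap, times 100.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Skip any column containing a gap in either sequence.
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))  # about 88.9 (8 of 9 ungapped columns)
```

Excluding gapped columns keeps short, unevenly trimmed fragments from being penalized for missing sequence; counting gaps as mismatches would give lower values for truncated elements such as intron fragments.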

Comparison of complete genome sequences

Complete genome sequences were aligned with the progressive Mauve algorithm (Darling et al., 2004) and locally collinear blocks were calculated and extracted with Geneious Pro software (Biomatters Ltd).

Phylogenetic analyses

Multiple sequence alignments were generated with MAFFT v7.017 (Geneious Pro software), using the FFT-NS-ix2 algorithm and the BLOSUM62 scoring matrix for protein alignments, and the automatic algorithm and the 200PAM/k=2 scoring matrix for nucleotide alignments. For nucleotide phylogenies, a consensus unrooted tree with 100 bootstrap replicates was generated with PhyML or by Bayesian methods, using the HKY85 nucleotide substitution model and a gamma model with four categories. We applied the GUIDANCE method (Penn et al., 2010) to the original alignment of 169 DGC GGDEF-domain sequences (confidence score 0.948650) to remove unreliably aligned positions. The final alignment used for the phylogenetic analysis retained 90.3% of the columns (cutoff: 0.582) and 196 informative positions. GUIDANCE is freely available from http://guidance.tau.ac.il. Unrooted trees with 100 bootstrap replicates were generated with PhyML (Guindon and Gascuel, 2003; Guindon et al., 2010), using the WAG amino-acid substitution model (similar tree topologies were inferred with the LG model) and a discrete gamma model with four categories. In some analyses, we also performed Bayesian inference with the parallel version of MrBayes 3.1 (Huelsenbeck and Ronquist, 2001). Two independent runs of four chains were completed for 1 100 000 Metropolis-coupled Markov chain Monte Carlo generations, using the default priors for model parameters, the WAG (amino acids) or HKY85 (nucleotides) model as the fixed rate matrix and a gamma model for among-site rate variation. Trees were sampled every 200 generations, and the first 110 000 generations were discarded as burn-in before a 50% majority-rule consensus tree was produced. The PhyML phylogenetic tree for rhizobial species was based on the concatenated alignment of DnaK and RpoB sequences (Tian et al., 2012) retrieved from the GenBank database.
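The column-removal step can be sketched as follows. This is a minimal Python illustration of cutoff-based column filtering, not GUIDANCE itself: GUIDANCE derives per-column confidence scores from alignments built on bootstrap guide trees, whereas here the scores are simply supplied as hypothetical input. The function names `filter_columns` and `retained_fraction`, the toy alignment and the scores are all invented for illustration, and treating a score exactly equal to the cutoff as retained is an assumption.

```python
def filter_columns(alignment: dict[str, str], scores: list[float],
                   cutoff: float) -> dict[str, str]:
    """Return a copy of `alignment` keeping only columns with score >= cutoff.

    `alignment` maps sequence names to aligned sequences of equal length;
    `scores` holds one (hypothetical) confidence value per column.
    """
    n_cols = len(scores)
    for name, seq in alignment.items():
        if len(seq) != n_cols:
            raise ValueError(f"{name}: expected {n_cols} columns, got {len(seq)}")
    keep = [i for i, score in enumerate(scores) if score >= cutoff]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def retained_fraction(scores: list[float], cutoff: float) -> float:
    """Fraction of alignment columns that survive the cutoff."""
    return sum(score >= cutoff for score in scores) / len(scores)


# Toy example: hypothetical sequences and column scores.
toy = {"dgc_a": "MKV-LA", "dgc_b": "MRIQLS"}
toy_scores = [0.99, 0.40, 0.95, 0.10, 0.90, 0.70]
print(filter_columns(toy, toy_scores, cutoff=0.582))  # {'dgc_a': 'MVLA', 'dgc_b': 'MILS'}
print(retained_fraction(toy_scores, 0.582))           # 4 of 6 columns kept
```

The retained fraction computed this way corresponds to the "percentage of columns retained" reported for the real alignment.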

Results

Presence of full-length and fragmented RmInt1-like elements in rhizobia

A BlastN search of the nr database revealed that the closest known full-length relatives of the S. meliloti RmInt1 intron (85–99% identity in pairwise comparisons) were present in the closely related species S. medicae, E. adhaerens and S. terangae, whereas the most closely related fragmented introns (90–99% identity) were present in the same bacterial species and in other Sinorhizobium (S. fredii) and Rhizobium (R. etli, R. tropici and R. leguminosarum bv. phaseoli) species.

In addition to the chromosome and the expected symbiotic plasmids pRmeGR4c (pSymA) and pRmeGR4d (pSymB), S. meliloti strain GR4 harbors two accessory plasmids, pRmeGR4a and pRmeGR4b (Martinez-Abarca et al., 2013). GR4 carries 10 copies of RmInt1 that are 99.9% identical (Table 1). In S. meliloti, the existing RmInt1 copies display considerable sequence conservation (>99% identity). Copy numbers differ between the S. meliloti strains for which complete genome sequences are available (Table 1), such as 1021 (Galibert et al., 2001), Rm41 (Weidner et al., 2013), BL225C (Galardini et al., 2011) and SM11 (Schneiker-Bekel et al., 2011); strain AK83 (Galardini et al., 2011) harbors a single copy, on plasmid pSINME01 (pSymA), with an internal deletion of nucleotides 1520–1844 (98% identity at the 5′ end of the fragment). In addition to its full-length copies of RmInt1, strain GR4 carries on its cryptic plasmid pRmeGR4b a fragmented RmInt1-like element of 652 nt (Table 1; hereafter referred to as FRE652) that is 88.9% identical to RmInt1 and 89.7% identical to its closest relative, the S. medicae intron Sr.md.I1. BlastN searches and sequence alignments revealed copies of a fragmented intron similar to FRE652 in other S. meliloti strains (Figure 1). Strain AK83 harbors two copies on the so-called ‘chromosome 3’ (pSymA), whereas strain SM11 has one copy on pSmeSM11c (pSymA) and strain C017 has one copy on the sequenced cryptic plasmid pHRC017 (Crook et al., 2012). In strain BL225C, a 145-nt copy of this intron fragment, trimmed at its 5′ end, was identified on pSINMEB01 (pSymA). FRE652 was absent from the genomes of strains 1021 and Rm41, but a copy 94% identical to FRE652 was present in the closely related species S. medicae, on plasmid pSMED02 (orthologous to pSymA) of strain WSM419, for which a complete genome sequence is available.

Table 1 Full-length and fragmented elements phylogenetically related to RmInt1 in rhizobia.
Figure 1

Alignment of FRE652 and RmInt1 sequences. The ribozyme domains DI to DIV are underlined. Exon-binding sequences (EBS1, EBS2 and EBS3) are boxed, as are the nucleotides corresponding to the first G residue of the RmInt1 intron and to the ATG start codon of the IEP, which are absent from FRE652 sequences. Residues differing between FRE652 and the consensus are highlighted. Genomic positions, from left to right, are: S. me GR4 pRmeGR4b (bases 73 323–73 974), S. md WSM419 pSMED02 (bases 1 062 090–1 061 452), S. me SM11 pSymA (bases 95 351–96 001), S. me C017 pHRC017 (bases 63 063–63 713), S. me AK83 pSymA copy 1 (bases 1 249 601–1 248 958), S. me AK83 pSymA copy 2 (bases 330 796–331 448) and S. me BL225C (bases 821 534–821 391).

FRE652 provides a genomic record of the history of an intron closely related to RmInt1

An analysis of the sequences of FRE652 copies in S. meliloti and S. medicae strains (Figure 1) revealed that this intron fragment spans ribozyme domains I to III and part of domain IV, which is truncated at a position corresponding to position 653 of RmInt1. In addition, the 5′ end of the ORF had undergone a frameshift mutation deleting the T residue of the ATG start codon. All copies of FRE652 have also lost the first G residue of the intron, but their exon-binding sequences (EBS1, EBS2 and EBS3) are identical to those of RmInt1, suggesting that the original full-length intron had the same potential targets. We generated a phylogenetic tree (Figure 2) by maximum-likelihood and Bayesian methods from an alignment (Supplementary Figure 1) of 24 sequences covering 664 informative nucleotide positions, comprising FRE652 copies from various hosts, RmInt1 and other fragmented RmInt1-like elements identified in Sinorhizobium/Ensifer and Rhizobium species. All FRE652 sequences branched from a common node with strong support (100% bootstrap support and a posterior probability of 99.98%), suggesting that they arose from a single ancestral intron. Furthermore, the group of FRE652 elements shared a statistically supported node with the RmInt1 group (71% bootstrap support and a posterior probability of 95.37%). The 487-nt intron fragment of S. fredii strain NGR234 (also truncated at its 3′ end) also clustered within the RmInt1 group, whereas the fragmented intron copies of R. etli/Ensifer species formed a distinct group with strong statistical support (93% bootstrap support and a posterior probability of 98.55%). FRE652 is therefore unlikely to have arisen by mutation of existing RmInt1 elements; the most plausible explanation is that it represents a genomic record of the history of an intron closely related to RmInt1.
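To make the effect of the start-codon deletion concrete, the Python sketch below removes the T of an ATG start codon from a short ORF and reads the result in the original frame. The ORF is a hypothetical toy sequence, not the RmInt1 IEP, and the helper names `codons` and `delete_base` are invented for illustration. After the deletion, the first codon is no longer ATG and every downstream codon is shifted by one nucleotide, so the original stop codon also falls out of frame.

```python
def codons(seq: str, start: int = 0) -> list[str]:
    """Split `seq` into complete codons, reading from 0-based index `start`."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def delete_base(seq: str, index: int) -> str:
    """Return `seq` with the single nucleotide at 0-based `index` removed."""
    return seq[:index] + seq[index + 1:]


# Hypothetical toy ORF: ATG start codon, three sense codons, TGA stop codon.
orf = "ATGGCTAAACGTTGA"
mutant = delete_base(orf, 1)  # delete the T of the ATG start codon

print(codons(orf))     # ['ATG', 'GCT', 'AAA', 'CGT', 'TGA']
print(codons(mutant))  # ['AGG', 'CTA', 'AAC', 'GTT']
```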

Figure 2

Phylogeny of FRE652, RmInt1 and other full-length and fragmented RmInt1-like elements. A consensus unrooted tree is shown and the corresponding sequences are labeled. The clusters are highlighted in color and posterior probabilities for Bayesian analyses are indicated at the nodes. The multiple sequence alignment used for the analysis is shown in Supplementary Figure 1. F, fragmented intron. Accession numbers for sequences are provided in Table 1. S. me, S. meliloti; S. md, S. medicae; E. ad, E. adhaerens; R. et, R. etli; S. te, S. terangae. S. me GR4 RmInt2 is a distant relative of RmInt1.

FRE652 represents an ancient intron insertion event

To determine whether FRE652 resulted from an ancient intron insertion event, we analyzed the sequences flanking FRE652 in S. meliloti and S. medicae, using Mauve to align the whole-genome sequences carrying copies of the ancient fragmented intron. The sequences flanking the copies of FRE652 on pSymA in S. meliloti strains AK83 and SM11, on the accessory plasmids pRmeGR4b and pHRC017 of strains GR4 and C017, respectively, and on pSMED02 of S. medicae strain WSM419 displayed synteny over a region of about 7 kb (Figure 3a). Synteny extended over a longer distance in pHRC017 and pRmeGR4b (not shown), suggesting that these plasmids have a common origin. In all cases, a predicted helix–turn–helix transcriptional regulator (hereafter referred to as TR), transcribed in the opposite orientation to the putative intron insertion, was found downstream from the FRE652 element. In S. medicae, however, this putative regulator gene is interrupted by a copy of the insertion sequence ISBm1. Upstream from the putative intron insertion, and in the opposite orientation, lies a diguanylate cyclase/phosphodiesterase (DGC/PDEA) gene, followed by genes encoding a pectate lyase and a carbonate dehydratase. In strain BL225C, only the putative TR remains; the upstream flanking sequences of the syntenic block are absent (not shown). The entire block is absent from the genomes of strains 1021 and Rm41.

Figure 3

Analysis of the neighborhood of FRE652 in the S. meliloti and S. medicae genomes. (a) Locally collinear block identified by Mauve, corresponding to a conserved segment containing FRE652 sequences. The complete sequences of megaplasmid pSymA from S. meliloti strains AK83 and SM11, the cryptic plasmids pHRC017 and pRmeGR4b from S. meliloti strains C017 and GR4, respectively, and megaplasmid pSMED02 from S. medicae strain WSM419 were aligned with the progressive Mauve algorithm, and locally collinear blocks were extracted with Geneious Pro software (Biomatters Ltd). The relevant ORFs (coding sequences, CDS) are shown in yellow. In the original database entries, carbonate dehydratase was annotated as carbonic anhydrase (GR4 and pHRC017); diguanylate cyclase/phosphodiesterase as GGDEF domain/EAL domain protein (GR4), conserved hypothetical protein (SM11) and PAS/PAC sensor-containing diguanylate cyclase (GGDEF)/phosphodiesterase (EAL) (pHRC017); and the putative transcriptional regulator as hypothetical protein (SM11), XRE family helix-turn-helix transcriptional regulator (pHRC017) and helix-turn-helix domain protein (AK83). Pink open arrows above some CDS indicate putative signal peptides, and an S within a CDS denotes an annotated short hypothetical protein. FRE652 annotations and annotations for a group II intron derivative were added manually. The consensus identity is shown above the block, and green indicates regions of higher identity. Genomic positions, from left to right, are: S. me GR4 pRmeGR4b (bases 69 345–74 866), S. md WSM419 pSMED02 (bases 1 066 557–1 060 009), S. me SM11 pSymA (bases 91 375–96 621), S. me C017 pHRC017 (bases 59 086–64 605), S. me AK83 pSymA copy 1 (bases 1 253 952–1 248 066) and S. me AK83 pSymA copy 2 (bases 326 826–332 352).
(b) DNA target sites recognized by the group II introns RmInt1 and Sr.md.I1, and the putative target sequence recognized by the intron from which FRE652 was derived. 5′ exon positions are given relative to the intron insertion site, and the EBS and IBS sequences are shown. Unpaired residues are highlighted in red. Note that for FRE652, the −1 residue of the 5′ exon is absent; it was probably deleted, like the first G residue of the intron (see Figure 1).

As the 3′ end of the original intron from which FRE652 was derived is missing, the 3′ intron boundary and the presence of a 3′ exon remain uncertain. We therefore examined the sequence of the putative 5′ exon. The 5′ exon of RmInt1 contains the intron-binding sequences IBS1 (7 nt) and IBS2 (5 nt), separated by a single C residue, and extends by seven further nucleotides into the distal 5′ exon region, a part of the DNA target probably recognized by the IEP that includes the critical T residue at position −15 (Jiménez-Zurdo et al., 2003). For FRE652, only limited pairing is now observed at the putative IBS1 (4 of 7 residues) and IBS2 (3 of 5 residues); moreover, position −15 carries a G rather than a T, and the A at position −1 of the target site may also have been lost, like the first nucleotide of the intron (Figure 3b). Overall, these results suggest that the intron may have inserted at this genomic location through an infrequent retrotransposition or retrohoming event, with the EBS/IBS pairing being altered subsequently.
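The pairing counts given above (for example, 4 of 7 residues at IBS1) amount to counting complementary positions between two antiparallel strands. The Python sketch below shows this calculation under simplifying assumptions: only Watson-Crick pairs are counted (G·U wobble pairs are ignored), U and T are treated as equivalent because the RNA EBS pairs with a DNA target, and the sequences are hypothetical toy sequences, not the actual RmInt1 or FRE652 EBS/IBS. The function name `count_pairs` is likewise invented for illustration.

```python
# Watson-Crick partners only; U is converted to T first, so an RNA EBS can be
# compared directly with a DNA IBS. G.U wobble pairs are NOT counted.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def _normalize(seq: str) -> str:
    """Upper-case the sequence and convert U to T."""
    return seq.upper().replace("U", "T")


def count_pairs(ebs: str, ibs: str) -> int:
    """Count Watson-Crick pairs between an EBS and an IBS, both given 5'->3'.

    The strands pair antiparallel, so the 5'-most EBS residue is matched with
    the 3'-most IBS residue, and so on.
    """
    if len(ebs) != len(ibs):
        raise ValueError("EBS and IBS must have the same length")
    ebs_n, ibs_n = _normalize(ebs), _normalize(ibs)
    return sum((e, i) in WATSON_CRICK for e, i in zip(ebs_n, reversed(ibs_n)))


# Hypothetical 7-nt EBS with a fully complementary IBS and a degenerate IBS.
ebs1 = "ACGUUCA"
print(count_pairs(ebs1, "TGAACGT"))  # 7 (all positions pair)
print(count_pairs(ebs1, "TCAAGGA"))  # 4 (4 of 7 positions pair)
```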

The DGC sequence associated with FRE652 is derived from a common ancestor in the Rhizobiaceae/Phyllobacteriaceae families

We tested our proposed evolutionary hypothesis further by analyzing the phylogenetic information available for the DGC ORF adjacent to FRE652. DGC and phosphodiesterase (PDE) enzymes control the intracellular concentration of c-di-GMP by synthesizing and degrading this molecule, respectively. This ubiquitous second messenger plays a key role in several bacterial cellular functions, including exopolysaccharide production, attachment, motility, adhesion and biofilm formation (for reviews, see Jenal and Malone, 2006; Hengge, 2009; Schirmer and Jenal, 2009; Römling et al., 2013). These functions are relevant to rhizosphere colonization and host plant nodulation by S. meliloti and S. medicae (Gage, 2004; Fujishige et al., 2006). The active site of DGCs lies in a conserved GGDEF domain, characterized by the GG(D/E)EF motif (A site), whereas PDE activity is associated with C-terminal EAL (PDEA) or HD-GYP domains. The DGCs linked to FRE652 are composite DGC-PDEA proteins with the A-site motif RxAGDEF, but they lack the I site (characterized by an RxxD motif), which is responsible for allosteric product inhibition (Supplementary Figure 2).
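The A-site motifs mentioned above can be located with simple pattern matching. The Python sketch below scans a protein sequence for the canonical GG(D/E)EF motif and for the RxAGDEF variant (x = any residue) using regular expressions; the function name `find_a_site` and the toy sequences are hypothetical, and real domain assignment typically relies on profile-based domain searches rather than exact motifs. A comparable pattern for the RxxD I site is deliberately omitted, because a four-residue pattern with two wildcards matches by chance in most proteins and is only informative in its structural context.

```python
import re

# Patterns taken from the text: the canonical A-site motif and the RxAGDEF
# variant of the FRE652-associated DGCs ("." matches any single residue).
A_SITE_PATTERNS = {
    "GG(D/E)EF": re.compile(r"GG[DE]EF"),
    "RxAGDEF": re.compile(r"R.AGDEF"),
}


def find_a_site(protein: str) -> list[tuple[str, int, str]]:
    """Return (motif name, 1-based start, matched residues) for each A-site hit."""
    seq = protein.upper()
    hits = []
    for name, pattern in A_SITE_PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.group()))
    return hits


print(find_a_site("MSLRQAGDEFAVL"))  # [('RxAGDEF', 4, 'RQAGDEF')]
print(find_a_site("MKGGEEFLL"))      # [('GG(D/E)EF', 3, 'GGEEF')]
```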

TBlastN searches with the amino-acid sequence of the DGC (543 aa) encoded by the gene adjacent to FRE652 in strain GR4 as the query identified a large number of homologous sequences in databases. The DGC sequences identified (154 sequences) were mostly from bacterial species of the order Rhizobiales (Figure 4), and all were DGC/PDEA domain proteins. These hits included four additional homologous sequences from strain GR4, displaying different degrees of similarity to the query sequence. Two of these sequences were located on pSymA, about 682 kb apart, and encoded proteins displaying 81.4% (551 aa) and 44.8% (1071 aa) identity. One was located on pSymB and was 62.5% identical (564 aa), and the remaining sequence (742 aa), located on pRmeGR4b, was 46.6% identical in pairwise comparisons. This last sequence was 28 kb away from the DGC associated with FRE652. The two larger proteins, encoded on pSymA (1071 aa) and pRmeGR4b (742 aa), also contain an N-terminal PAS sensory domain. In total, strain GR4 harbors 20 proteins annotated as DGCs and three annotated as EAL proteins, highlighting the potential importance of c-di-GMP in the lifestyle of these symbiotic bacteria.

Figure 4

Phylogeny of DGCs homologous to that associated with FRE652. The consensus unrooted tree estimated by maximum-likelihood (ML) methods is presented as a radial cladogram, and bootstrap values >75% are shown at the nodes. The tree is based on the alignment of the GGDEF domains (196 informative positions) of 169 sequences (see Supplementary Figure 2), including all 20 DGCs annotated in the genome of S. meliloti strain GR4, which are highlighted in red with an indication of their size and genomic location. The bacterial species harboring each DGC in the alignment is indicated at the external branches. Major clades and relevant clusters are indicated, together with the corresponding bacterial families. The branches of the Rhizobiaceae/Phyllobacteriaceae clade containing the most closely related orthologs and paralogs of the FRE652-associated DGC are highlighted in red. The DGCs associated with FRE652 are shown in dark blue and the possible paralog on strain GR4 pSymA in light blue. The four additional DGC homologs from strain GR4 found among the 154 BLAST hits are indicated by an asterisk, and the DGC associated with FRE652 by an arrow.

Phylogenetic analyses were performed on an alignment of the 154 amino-acid sequences (40% identity) together with the other 15 DGC sequences of strain GR4. Independent alignments of the GGDEF and EAL domains yielded similar tree topologies (not shown), but some of the DGCs in strain GR4 have no EAL domain. We therefore present here only the phylogenetic analysis based on the GGDEF domain alignment (196 informative positions; Supplementary Figure 2) of 169 sequences, after removal of unreliable positions with GUIDANCE. In the estimated tree (Figure 4), the DGCs associated with FRE652 in S. meliloti and S. medicae cluster together at a well-supported common node (88% bootstrap support), consistent with a monophyletic group and with an intron insertion event in this genomic region before speciation. In particular, the homologous DGC found on pSymA of strain GR4 (551 aa) branched from a common node (96% bootstrap support) shared with this cluster (hereafter referred to as pSymA-DGC1). The other sequenced strains of S. meliloti and S. medicae have no counterpart of this pSymA DGC gene, suggesting that it and the FRE652-associated DGC of strain GR4 may be paralogs.

Similarly, the DGCs/PDEAs found on S. meliloti pSymB, on the orthologous S. medicae plasmid pSMED01 and on the symbiotic plasmid of S. fredii (pNGR234b/pSfHH103e) are monophyletic (97% bootstrap support). This cluster (hereafter referred to as pSymB-DGC1) and pSymA-DGC1 share an internal node (99% bootstrap support) with DGCs from other members of the families Rhizobiaceae (R. leguminosarum bv. trifolii and bv. viciae, R. etli, R. tropici and A. radiobacter) and Phyllobacteriaceae (M. loti). The topology at this node resembles that of the species phylogeny (Figure 5), in which Mesorhizobium, Rhizobium and Sinorhizobium cluster together with high bootstrap support (100%). These results suggest that the DGCs of these species probably derive from a single ancestral gene that subsequently underwent rearrangements, duplications and losses after species and strain differentiation.

Figure 5

Species tree for rhizobia. Consensus unrooted phylogenetic tree estimated by ML methods from the concatenated sequences of the dnaK and rpoB housekeeping genes of the indicated rhizobial species, aligned with Geneious Pro software (Biomatters Ltd). Bootstrap values >75% are indicated at the nodes. α, branch corresponding to α-proteobacteria; β, branch corresponding to β-proteobacteria. The arrow indicates the predicted intron insertion into the ancestor of S. meliloti/S. medicae.

The additional DGCs of GR4, which are distributed across various replicons, including the chromosome, did not cluster with these nodes in the tree. Their relationships with the other sequences in the alignment remain uncertain, but some of them may have been acquired by horizontal gene transfer (see the pSymA-DGC2 clade in Figure 4).

Discussion

We identified a genomic record of an intron closely related to RmInt1 buried in the genomes of extant S. meliloti and S. medicae, and obtained evidence suggesting that this intron was probably inserted before the divergence of these rhizobial species. The intron subsequently underwent deletion events and accumulated diverse mutations. In S. meliloti strains, this genomic region, which carries genes that might be important for rhizosphere colonization and host plant nodulation, underwent further deletions and diverse genetic rearrangements, including losses and duplications, and in some strains a 7-kb block containing the intron fragment and neighboring genes was captured by smaller accessory replicons. This fragmented form of the intron has been maintained over extensive evolutionary time, which suggests that it may confer a selective advantage on the host.

Rhizobial genes involved in symbiosis are often clustered on large plasmids (pSym), a feature that distinguishes these nitrogen-fixing plant endosymbionts from other, nonsymbiotic saprophytes. Rhizobial genomes appear to be highly dynamic, probably owing to the presence of repeated DNA sequences, IS elements and transposons, together with multiple replicons (MacLean et al., 2007). S. meliloti has a large, typically multipartite genome consisting of a chromosome; a chromid (pSymB), that is, a large replicon with plasmid-type replication systems that also carries genes essential for growth and survival (Harrison et al., 2010); a megaplasmid (pSymA); and several smaller plasmids. The smaller plasmids and the megaplasmids are considered to be essentially strain-specific and of recent origin, whereas the chromid and the chromosome are thought to be less variable, more genus-specific and of ancient origin. It has been suggested that the pSymA megaplasmid is involved principally in structural fluidity and the emergence of new functions (Galardini et al., 2013). Consistent with this view, we found that the insertion of the ancient RmInt1-like intron into the ancestor of pSymA has, to some extent, contributed to pan-genome divergence after strain differentiation.

Like other bacterial group II introns (Dai and Zimmerly, 2002), RmInt1 tends to evolve toward an inactive form by fragmentation, with loss of the 3′ terminus, including the IEP ORF (Fernandez-Lopez et al., 2005). The significance of fragmented introns within a particular genome remains unclear. They generally have no full-length counterpart of the same intron in the same genome (Leclercq and Cordaux, 2012) and are therefore considered to have been inactivated before proliferating; otherwise, they are dismissed as dying copies of a currently active intron. Only 25% of the bacterial genomes sequenced to date harbor recognizable group II introns (Lambowitz and Zimmerly, 2011), which argues against group II introns acting as a broad and important force for evolutionary change, but these observations should be interpreted with caution. Group II intron primary sequences are poorly conserved, except in RNA domain V (DV), so 5′ intron fragments lacking the ORF are unlikely to have been detected in sequenced bacterial genomes.

Nuclear pre-messenger RNA introns (Michel and Ferat, 1995) and non-long terminal repeat retrotransposons are both thought to be descended from mobile group II introns (Eickbush, 1994), but the role of group II introns in generating genetic novelty and in bacterial evolution remains unclear. It has recently been suggested that, as for transposable elements, the spread of group II introns within a bacterial genome follows a selection-driven extinction model, which predicts the removal of heavily colonized genomes from the population by purifying selection (Leclercq and Cordaux, 2012). It has been reported that 10% of S. meliloti strains and isolates seem to lack RmInt1 (Muñoz et al., 2001), yet these strains do not appear to have any active mechanism for controlling intron invasion or proliferation (Martínez-Abarca et al., 2004; Nisa-Martinez et al., 2007). It is generally accepted that the ‘selfish’ features of mobile elements underlie their acquisition and maintenance in bacterial genomes, but these elements may also benefit their hosts. In bacteria, group II introns are thought to be tolerated to some extent because they self-splice and preferentially home to sites outside functional genes, generally within intergenic regions or in other mobile genetic elements (Simon et al., 2008); mechanisms supporting this include the divergence of DNA target specificity, which prevents target-site saturation (Mohr et al., 2010). Other studies have suggested that group II introns benefit their hosts by controlling other potentially harmful mobile genetic elements (Chillón et al., 2010) and by contributing to the generation of diversity and to genome remodeling in times of stress (Coros et al., 2009). These features may mitigate negative effects on the host organism, resulting in the maintenance of these retroelements in bacterial populations for longer periods.
Our results suggest that the gradual eradication of group II introns by the host during evolution does not necessarily result in the complete elimination of intron sequences, as some intron fragments remain and continue to evolve in the genome.

The divergence of the rapidly growing rhizobial genera (Sinorhizobium, Rhizobium and Mesorhizobium) has been dated to 203–324 million years ago (MYA), well before the emergence of legumes, estimated at 60 MYA (Turner and Young, 2000; Sprent and James, 2007). S. meliloti and S. medicae are taxonomically and symbiotically related species, but DNA–DNA hybridization revealed only 42–60% DNA relatedness between them (Rome et al., 1996), and they differ substantially in gene content (Sugawara et al., 2013). There are no reports dating the divergence of S. meliloti and S. medicae, but phylogenies based on a sample of housekeeping genes suggest that they are not sister species (Martens et al., 2007) and that S. medicae may be the earliest-diverging taxon within a clade comprising S. medicae, S. meliloti and S. arboris, which would imply a rather ancient speciation event separating the first two species (Bailly et al., 2007). The identification in this study of a genomic record of a group II intron in S. meliloti/S. medicae genomes reveals that fragmented introns from ancient insertions within intergenic regions can persist for long periods, probably because their removal would increase the likelihood of harmful effects on adjacent genes, as suggested for other fragmented transposable elements in eukaryotes (Werren, 2011). A search for conserved fragmented introns in S. meliloti (N. Toro, unpublished) revealed that FRE652 is not a unique case: similar 5′ ribozyme fragments of other group II introns remain buried in the genome of this species, close to conserved, actively transcribed regions. These group II intron remnants may simply reflect the stochastic persistence of some intron fragments undergoing very slow degradation, but it remains possible that, by persisting and continuing to evolve in the genome of some bacterial lineages, they provide sequence variation on which selection can act.
We hypothesize that, as for other fragmented transposable elements in eukaryotes (Werren, 2011), these fragmented intron sequences in bacteria may have evolved into functional cis-regulatory elements, contributing directly to bacterial speciation. The data presented here raise new questions about the significance of group II introns in bacterial evolution that warrant further investigation.

Data Archiving

There were no data to deposit.