Harnessing the intragenomic variability of rRNA operons to improve differentiation of Vibrio species

Although the 16S rRNA gene is frequently used as a phylogenetic marker in analysis of environmental DNA, this marker often fails to distinguish closely related species, including those in the genus Vibrio. Here, we investigate whether inclusion and analysis of 23S rRNA sequence can help overcome the intrinsic weaknesses of 16S rRNA analyses for the differentiation of Vibrio species. We construct a maximum likelihood 16S rRNA gene tree to assess the use of this gene to identify clades of Vibrio species. Within the 16S rRNA tree, we identify the putative informative bases responsible for polyphyly, and demonstrate the association of these positions with tree topology. We demonstrate that concatenation of 16S and 23S rRNA genes increases the number of informative nucleotide positions, thereby overcoming ambiguities in 16S rRNA-based phylogenetic reconstructions. Finally, we experimentally demonstrate that this approach considerably improves the differentiation and identification of Vibrio species in environmental samples.

Comparative analysis of 16S rRNA gene sequences from the ribosomal operon, along with the use of other molecular tools (e.g., DNA-DNA hybridization, DDH, and reverse transcription-polymerase chain reaction, RT-PCR) and phenotypic data, has promoted the accumulation of millions of ribosomal gene sequences in reference databases such as SILVA 20 and rrnDB 21. These archived sequences are currently used for species identification of vibrios and other bacteria 8,22. Alternatively, the high genetic diversity and number of bacterial 'housekeeping' genes 23 have made it possible to differentiate congeneric species, and to generate phylogenetic hypotheses based on comparative analysis, also known as Multilocus Sequence Analysis (MLSA) 24,25. For some time MLSA has been used to differentiate Vibrio species and conduct taxonomic assignment of Vibrio isolates 26-28. However, except for 16S rRNA, the use of housekeeping genes as phylogenetic markers is less common in analysis of more complex environmental DNA samples (eDNA), which contain the genetic material of multiple species. The complexity and multi-species composition of eDNA make it difficult to ensure that chimeric genomes are not formed during sequence assembly, producing artificial genotypes. Further, the use of sequence from 16S rRNA alone results in incomplete species coverage during sequencing, as well as taxonomic assignment with limited accuracy 26,27. Previous studies reveal that within-genome heterogeneity of ribosomal operons draws into question species discrimination based on the sequence of a single copy of the 16S rRNA gene 29,30. For instance, 16S rRNA sequences from Scytonema hyalinum strains are extremely heterogeneous, as previously shown 31, with an intragenomic sequence variability of 7.3-9.0%, and their use frequently leads to incorrect taxonomic assignment. Similarly, the inconsistent differentiation of the genera Butyrivibrio and Pseudobutyrivibrio 32 further demonstrates limitations of 16S rRNA analysis. These findings highlight some intrinsic weaknesses of 16S rRNA as a phylogenetic marker. To overcome them, Martijn et al. 33 employ the 16S and 23S rRNA genes to study bacterial and archaeal diversity in environmental samples and demonstrate higher statistical support and an increased number of monophyletic groups in comparison to those obtained in phylogenetic analysis of variation at single genes. Nonetheless, the efficiency of this approach for species-level differentiation in the Vibrio genus has not previously been studied. Likewise, an alternative approach using the ITS regions within 16S-23S rRNA has not provided sufficient resolution to unambiguously differentiate Vibrio species in earlier work 34,35.
Here, we (i) define the key sequence features that limit the potential of the 16S rRNA gene to serve as a phylogenetic marker in discrimination of Vibrio species and (ii) assess the capacity of 23S rRNA to improve species resolution in phylogenetic analysis. First, we reconstruct a phylogenetic tree by using 16S rRNA sequences retrieved from 40 completely sequenced Vibrio genomes. We evaluate whether nucleotide polymorphism within 16S rRNA loci of single Vibrio species genomes might cause polyphyly and taxonomic ambiguity. Further, we identify how variation at particular nucleotide positions in the 16S rRNA gene can drive the formation of polyphyletic clades during phylogeny reconstruction, and demonstrate the role of these positions in determining tree topology. We show that concatenation of 16S and 23S rRNA genes increases the number of informative nucleotide positions, thereby overcoming ambiguities in 16S rRNA-based phylogenetic reconstructions and improving the differentiation of Vibrio species. We use these results to design and test Vibrio-specific PCR primers that target the conserved terminal regions of 16S and 23S rRNA genes in order to amplify complete 16S-23S regions. Our results will help to improve the detection and identification of Vibrio species in eDNA samples, thus facilitating Vibrio monitoring in aquatic ecosystems.

Results
Through phylogenetic analysis of sequence variation among gene copies, we visualize the genetic variation within and among species that is represented in topological variation in phylogenetic trees. To minimize errors, our analysis was carried out with 40 Vibrio genomes selected based on (i) their completeness and the availability of high-quality sequencing data and annotation, along with (ii) preferentially unambiguous taxonomic assignment. The selected high-quality genomes fall into three groups that differ in the level of certainty of their taxonomic assignment. The first group includes genomes that have literature support and satisfy the NCBI taxonomic check. The second group is limited to those that satisfy only the NCBI taxonomic check. The third group contains those genomes that are not supported by either of the above criteria.

16S rRNA gene-based tree
The topological analysis of the 16S rRNA gene-based tree indicates that 16S rRNA gene copies of 26 species (e.g., V. cholerae, V. vulnificus, V. casei) form monophyletic clades (MCs), indicated on the tree by triangles (Fig. 1). Unlike a "cluster", which usually refers to a group of sequences that bear resemblance to each other regardless of their evolutionary relationship, the phylogenetic term "clade" denotes a group of sequences that belong to organisms sharing a common ancestor. Our results demonstrate that 19 MCs are highly supported by bootstrap values ≥ 95 (Fig. 1), whereas other monophyletic clades, such as V. parahaemolyticus (bootstrap equal to 54) and V. furnissii (bootstrap equal to 70), are not sufficiently supported for their monophyly to be relied upon based on variation at the 16S rRNA locus. The sequence variation present in the 16S rRNA copies of the remaining 14 species results in polyphyly, which is manifested by clustering of one or more 16S rRNA copies of one species with those of other species (e.g., V. chagasii in light blue and V. azureus in brown; Fig. 1). The species forming polyphyletic clades mostly coincide with the second and third level categories of certainty in genome taxonomic assignment that we defined (see Materials and Methods). Three distinct types of phylogenetic tree incongruence contribute to the observed polyphyly: (i) 'outlier', the failure of one 16S rRNA copy to cluster with the rest of the copies from the same genome (e.g., V. chagasii in light blue, V. campbellii in orange, Fig. 1); (ii) 'breaking', in which a cluster of 16S rRNA gene copies of one species is placed into what would otherwise be a MC of another species, resulting in paraphyly or polyphyly (e.g., V. cholerae-V. mimicus in light pink, V. coralliilyticus-V. tubiashii in purple, Fig. 1); (iii) 'distinct clusters', the appearance of two separate clusters of 16S rRNA gene sequences from the same genome (e.g., V. azureus in brown, Fig. 1).
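The distinction between a monophyletic clade and the three polyphyly patterns can be made concrete with a small check: the copies of one species are monophyletic when the smallest subtree containing all of them contains no copy of another species. The sketch below is only an illustration, not the pipeline used in this study. It assumes a rooted tree written as nested tuples and leaf labels of the hypothetical form `species|copy`; the result depends on how the tree is rooted (here, implicitly, by an outgroup).

```python
def leaves(node):
    """Return all leaf labels below a nested-tuple tree node."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def species_of(label):
    """Hypothetical label convention: 'species|copy'."""
    return label.split("|")[0]


def is_monophyletic(tree, species):
    """True if the smallest subtree holding every copy of `species` holds nothing else."""
    target = {leaf for leaf in leaves(tree) if species_of(leaf) == species}
    if not target:
        raise ValueError(f"no leaves found for {species}")
    node = tree
    while not isinstance(node, str):
        # Descend as long as a single child still contains all target leaves.
        containing = [c for c in node if target <= set(leaves(c))]
        if not containing:
            break
        node = containing[0]
    return set(leaves(node)) == target


# Toy tree: species A is monophyletic; a copy of C sits inside species B ("breaking").
toy_tree = ((("A|1", "A|2"), "A|3"), (("B|1", "C|1"), "B|2"))
print(is_monophyletic(toy_tree, "A"))  # True
print(is_monophyletic(toy_tree, "B"))  # False
```

In the toy tree, species B fails the check because a copy of species C falls inside the smallest subtree spanning the B copies, which mirrors the 'breaking' pattern described above.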

Disentangling polyphyletic patterns in the 16S rRNA gene-based tree
The phylogenetic ambiguity in the 16S rRNA tree appears largely associated with outlier gene copies and broken monophyly. The former is illustrated by patterns yielded by 16S rRNA copies in V. chagasii and V. campbellii, indicated by light blue and orange, respectively (Figs. 1 and 2). These gene copies present differing placements in the 16S rRNA tree, with one gene copy positioned separately from the rest of the clade, thus representing the case of an outlier gene copy (Fig. 1). The V. chagasii outlier gene copy M has three unique regions (i.e., V1, V2 and V3; Fig. 2a). Analysis of V. chagasii sequences from the SILVA SSU data repository (indicated by a single asterisk in Fig. 2) reveals that they mainly contain variable regions 2 and 3, while only V. chagasii AP619 has all three variable regions (Fig. 2a).
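One simple way to locate positions that set an outlier copy apart is to scan an alignment column by column and keep the columns where the outlier carries a residue found in none of the other copies. The following is a minimal sketch under stated assumptions: the sequences are already aligned to equal length, and the copy names and toy sequences are hypothetical rather than taken from the Vibrio data.

```python
def unique_positions(aligned, outlier_id):
    """Return 1-based alignment columns where the outlier's residue
    is absent from every other copy."""
    outlier = aligned[outlier_id]
    others = [seq for name, seq in aligned.items() if name != outlier_id]
    if any(len(seq) != len(outlier) for seq in others):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i + 1
        for i, base in enumerate(outlier)
        if all(seq[i] != base for seq in others)
    ]


# Toy alignment (hypothetical): copy_M differs from the other copies at two columns.
toy_copies = {
    "copy_A": "ACGTACGTAC",
    "copy_B": "ACGTACGTAC",
    "copy_M": "ACCTACGAAC",
}
print(unique_positions(toy_copies, "copy_M"))  # [3, 8]
```

Runs of adjacent unique columns found this way would correspond to candidate regions such as V1 to V3, although delimiting regions on real data also requires a decision about how close the columns must be.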

23S rRNA phylogenetic tree
Applying to 23S rRNA the same strategy used for the 16S rRNA tree reveals that the ribosomal sequences of 32 of the 40 representative Vibrio species form MCs. V. diabolicus (magenta) and V. owensii (yellow) are the only species for which monophyly is not highly supported (bootstrap values equal to 76 and 91, respectively; Fig. 4). Polyphyly of the remaining eight species is determined by one or several 23S rRNA gene copies clustering separately from the rest of the clade (e.g., V. splendidus, V. crassostreae, Fig. 4).

16S and 23S rRNA gene concatenation
The above results indicate that some sequence features, including the lack of sufficient informative bases, can limit the number of MCs, especially when phylogenetic analysis is conducted with only one of either the 16S or the 23S rRNA genes. In an attempt to overcome this limitation, we alternatively generate phylogenetic trees by using concatenated 16S and 23S rRNA gene sequences (Fig. 5). Both types of 16S and 23S rRNA gene sequence concatenation (i.e., 16S-23S and 23S-16S) produce 32 MCs, of which the same 29 species are highly supported (bootstrap ≥ 95). We also show that the use of 16S-23S concatemers increases bootstrap support for clades of two species (V. mimicus and V. chagasii) and reduces support for V. vulnificus (i.e., 91, 88 and 84, respectively; Fig. 5a) when compared to a 23S-16S tree (i.e., 71, 42 and 93, respectively; Fig. 5b). Moreover, although V. neocaledonicus and V. alginolyticus (Fig. 5b) form a monophyletic clade in the 23S-16S tree, this clade emerges as polyphyletic in the 16S-23S tree (Fig. 5a) due to insertion of V. diabolicus and V. natriegens sequences. In contrast to the MCs formed by V. campbellii (orange) and V. owensii (yellow) on the 23S rRNA tree (Fig. 4), these species do not form MCs in the 16S-23S and 23S-16S gene concatenation trees (Fig. 5a,b). However, the use of 16S-23S concatemers

Conserved and variable regions in Vibrio 16S and 23S rRNA genes
Since concatenation improves bootstrap support, we further analyze the 16S and 23S rRNA loci to identify variable and conserved regions, and subsequently design Vibrio-specific primers for amplifying the entire 16S-23S genomic region. First, we find that the 16S rRNA locus is highly conserved across the Vibrio genus (Fig. 6). Approximately 74.6% of positions (1210) are fully conserved, whereas 162 positions are variable (Supplementary Table S1). Nine variable regions of 16S rRNA (indicated by brackets, Fig. 6) universally present in bacteria 30,36 and variable regions identified in the alignment of Vibrio 16S rRNA sequences (indicated in red, Fig. 6) largely coincide. Regions 1, 3, 4, and 6 are the most variable in Vibrio, while regions 5 and 7 do not contain any variable positions except a 12-nucleotide-long insertion present in V. metschnikovii gene copy E, and a guanine (G) insertion in V. rumoiensis gene copy B, in variable regions 5 and 7, respectively (Fig. 6). Regions 2, 8, and 9 have some variable positions as well. Second, our approximate delimitation of conserved and variable regions of Vibrio 23S rRNA sequences retrieved from the SILVA LSU r138.1 RefNR data repository (Fig. 7a) reveals that 57.43% (1743) of the positions are fully conserved (Supplementary Table S1). By allowing a single mismatch in one of the aligned sequences per position, the conserved region can approximately cover 73.34% of the total alignment. In further interpretation of 303 variable positions (represented by red bars, Fig. 7a) we define variable regions as those with a high concentration of variable nucleotide positions, i.e., those at which the consensus nucleotide occurs at frequencies lower than 0.747 (see Materials and Methods section and Fig. 7a). We find ten variable regions in aligned Vibrio 23S rRNA gene sequences. Moreover, variable region 3 is split into two subregions (i.e., 3a and 3b; Fig. 7a). In contrast to the alignment of 23S rRNA SILVA sequences, the 23S rRNA sequence alignment of our data repository (Fig. 7b) has a higher percentage of fully conserved regions (77.47%), which leads to a higher cutoff frequency value of 0.893 (Supplementary Table S1). The variable regions coincide in both alignments (Fig. 7), except for the presence of additional variable positions at the 3' end (Fig. 7b). We include this additional region as V10a in Fig. 7b to differentiate shared variable positions in sequences retrieved from SILVA from those held in our repository.
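The classification of alignment positions by consensus frequency can be sketched in a few lines. This is a hedged illustration rather than the procedure from Materials and Methods: the function names and the toy alignment are hypothetical, the cutoff is passed in as a parameter (values such as 0.747 or 0.893 would come from the data), and gap characters are simply counted like any other symbol.

```python
from collections import Counter


def consensus_frequencies(alignment):
    """Frequency of the most common character in each alignment column."""
    n = len(alignment)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*alignment)]


def classify_positions(freqs, cutoff):
    """Label each column as fully conserved, conserved, or variable."""
    labels = []
    for f in freqs:
        if f == 1.0:
            labels.append("fully conserved")
        elif f < cutoff:
            labels.append("variable")
        else:
            labels.append("conserved")
    return labels


# Toy alignment of four sequences (hypothetical data).
toy_alignment = ["ACGTAC", "ACGTTC", "ACCTGC", "ATGTCC"]
freqs = consensus_frequencies(toy_alignment)
labels = classify_positions(freqs, cutoff=0.5)
print(freqs)   # [1.0, 0.75, 0.75, 1.0, 0.25, 1.0]
print(labels)
print(100 * labels.count("fully conserved") / len(labels))  # 50.0
```

Variable regions would then be delimited as stretches with a high density of columns labelled variable; how dense a stretch must be is a separate choice that this sketch does not make.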

Selection of primers for amplification of Vibrio 16S-23S region
The assessment of several universal PCR primers allows selection of candidates for forward and reverse primers for the entire 16S-23S region (including both ribosomal RNA genes and the intergenic spacer) in Vibrio genomes (Fig. 6, Supplementary Tables S2 and S3). A preliminary literature review reveals a main candidate location suitable for designing universal 16S rRNA gene-specific bacterial primers. This candidate location corresponds to primer variants similar to 27F, a universal forward primer widely used to amplify 16S rRNA bacterial gene sequences 37. Furthermore, comparison of these variants provides a consensus primer sequence (c27F) representing all available primer variants (Supplementary Table S2). Although the universal bacterial forward primer (i.e., S-DBact-0008-cS-20, Supplementary Table S2) containing the same number of degenerate bases as c27F allows better coverage among bacteria than a less degenerate primer, the Vibrio-specific consensus sequence eliminates the necessity of using such a highly degenerate forward primer for Vibrio species (Supplementary Table S2). We found that all Vibrio 16S rRNA gene copies in our custom repository can be amplified by the original 27F universal primer (5'-AGA GTT TGA TCM TGG CTC AG-3') introduced in 1991 (Supplementary Table S2).
After defining conserved regions in 23S rRNA genes (Fig. 7b), we consider two conserved regions as targets for a Vibrio-specific 23S rRNA reverse primer. Among them, the sense strand target of primer 23S_rev_V (positions 2864 to 2285) is closer to the 3' end of the 23S rRNA sequence than the region in the sense strand (positions 2227 to 2243) complementary to primer 2242R. The first region provides a longer conserved sequence without indels and should provide nearly full-length 23S rRNA amplicons, making 23S_rev_V the best candidate as a universal Vibrio-specific primer (Supplementary Table S3).
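The search for a long conserved stretch without indels can be illustrated as a scan for runs of gap-free, fully conserved alignment columns. The sketch below rests on assumptions: the function name, the toy alignment, and the `min_len` threshold are hypothetical, and actual primer selection would also weigh melting temperature, secondary structure, and position within the gene.

```python
def conserved_runs(alignment, min_len):
    """Return (start, end) spans, 1-based and inclusive, of consecutive columns
    that are identical in all sequences and contain no gap character."""
    runs, start = [], None
    ncol = len(alignment[0])
    for i, col in enumerate(zip(*alignment)):
        ok = len(set(col)) == 1 and col[0] != "-"
        if ok and start is None:
            start = i                        # a new run begins here
        if not ok and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i))  # run ended at the previous column
            start = None
    if start is not None and ncol - start >= min_len:
        runs.append((start + 1, ncol))       # run reaching the alignment end
    return runs


# Toy alignment (hypothetical): one substitution column and one indel column.
toy_23s = [
    "ACGTACGT-ACGGA",
    "ACGTACGTTACGGA",
    "ACGTTCGT-ACGGA",
]
print(conserved_runs(toy_23s, min_len=4))  # [(1, 4), (10, 14)]
```

Among the spans returned, a reverse-primer target would be chosen near the 3' end of the gene so that the amplicon covers as much of the 23S rRNA sequence as possible.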
An in silico specificity test employing the locus-specific 27F and 23S_rev_V primer pair for amplification of the 16S-23S region of complete Vibrio and non-Vibrio genomes does not yield products with non-Vibrionaceae genomes as templates (Fig. 8). In contrast, in silico PCR produces amplicons of the anticipated length for all Vibrio genomes as well as for five non-Vibrio Vibrionaceae genomes (Fig. 8, Supplementary Table S4). Furthermore, the same primers amplify in silico all 16S-ITS-23S copies from the 40 Vibrio genomes in our database (Fig. 8). All the amplicons obtained have the expected size, approximately 4700 bp.
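The principle of an in silico PCR with a degenerate primer can be sketched as follows. This is a simplified illustration and not the tool used in this study: it searches one strand only, requires exact matches (no mismatches tolerated), and uses a made-up reverse primer, because the 23S_rev_V sequence is given only in Supplementary Table S3. Only the 27F sequence is taken from the text; `TOY_REV` and the toy template are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def to_regex(primer):
    return "".join(IUPAC[base] for base in primer)


def in_silico_pcr(template, fwd, rev, max_len=6000):
    """Return amplicons bounded by the forward primer site and the reverse
    complement of the reverse primer on the given strand (exact matches only)."""
    rev_site = re.compile(to_regex(revcomp(rev)))
    products = []
    for hit in re.finditer(to_regex(fwd), template):
        end = rev_site.search(template, hit.end())
        if end and end.end() - hit.start() <= max_len:
            products.append(template[hit.start():end.end()])
    return products


FWD_27F = "AGAGTTTGATCMTGGCTCAG"  # 27F, as given in the text
TOY_REV = "CCTTAYGGAT"            # hypothetical stand-in for a reverse primer

toy_template = "TT" + "AGAGTTTGATCATGGCTCAG" + "ACGT" * 5 + "ATCCGTAAGG" + "AA"
amplicons = in_silico_pcr(toy_template, FWD_27F, TOY_REV)
print(len(amplicons), len(amplicons[0]))  # 1 50
```

Applied to every rRNA operon of a genome, a check of this kind would report one product per operon; a genome lacking either primer site yields an empty list, which corresponds to the non-amplified genomes in Fig. 8.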

Experimental validation of Vibrio-specific primers
To experimentally test the ability of the Vibrio-specific primers to generate 16S-ITS-23S amplicons, we extracted metagenomic DNA from an environmental water sample and used it as a template. Consistent with the in silico results (see previous section), PCR amplification of environmental DNA using 27F and 23S_rev_V primers yields amplicons of the expected size (Supplementary Fig. S1). Their sequencing with Oxford Nanopore technology and standard protocols produced 105,230 reads (see "Materials and methods"; Supplementary Table S5). Analysis of high-quality reads (83,207 in total) by WIMP (Oxford Nanopore) revealed that almost all reads belong to Proteobacteria (see Supplementary Fig. S2). Moreover, nearly half of these reads (48.9%) are of Gammaproteobacteria origin. We show that 2.17% of reads correspond to species that belong to the Vibrionaceae family (Supplementary Table S5). These reads (1,806 in total) represent a large variety of Vibrio species (see Supplementary Table S6). Interestingly, trimming the 16S-ITS-23S reads at their 3' end to obtain 1600 nt fragments (representing 16S rRNA gene sequences) and subsequent taxonomic annotation by WIMP (Oxford Nanopore) demonstrates that nearly one third of the truncated reads (i.e., 33.9%) become assigned to non-Vibrio species (Supplementary Fig. S2, panel c). Moreover, the trimming-dependent "loss" of some Vibrio reads also decreases the number of species compared to that initially discovered based on 16S-ITS-23S sequences (Supplementary Table S6).
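The trimming comparison can be outlined in code. The sketch below is an assumption-laden illustration: it presumes reads are already oriented 5' to 3' starting at the 16S rRNA gene, it does not call WIMP (the taxonomic labels are supplied as plain strings), and the function names and toy labels are hypothetical. Only the read counts in the last line come from the text.

```python
def trim_3prime(read, keep=1600):
    """Keep the first `keep` nucleotides of a read and discard the 3' remainder."""
    return read[:keep]


def percent_lost(full_calls, trimmed_calls, genus="Vibrio"):
    """Among reads assigned to `genus` at full length, the percentage that is
    no longer assigned to `genus` after trimming."""
    pairs = [(f, t) for f, t in zip(full_calls, trimmed_calls) if f.startswith(genus)]
    lost = sum(1 for _, t in pairs if not t.startswith(genus))
    return 100 * lost / len(pairs)


# Hypothetical per-read assignments before and after trimming.
full_calls = ["Vibrio cholerae", "Vibrio mimicus", "Vibrio harveyi", "Photobacterium damselae"]
trimmed_calls = ["Vibrio cholerae", "Vibrio cholerae", "Photobacterium sp.", "Photobacterium damselae"]

print(len(trim_3prime("A" * 4700)))                        # 1600
print(round(percent_lost(full_calls, trimmed_calls), 1))   # 33.3
print(round(100 * 1806 / 83207, 2))                        # 2.17, Vibrionaceae share of high-quality reads
```

The second toy read also shows a subtler effect: it stays within Vibrio after trimming but changes species, which is how trimming can reduce the number of species detected even when a read is not lost from the genus.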

Discussion
Improved differentiation of Vibrio species has been possible using the 23S rRNA gene as a phylogenetic marker instead of the 16S rRNA gene. Analysis of a maximum likelihood (ML) 16S rRNA gene tree identifies the informative bases associated with multiple polyphyletic patterns (Figs. 2 and 3), which are largely resolved in a 23S rRNA gene tree. The 23S rRNA gene tree presents 11 additional highly supported monophyletic clades compared to the 16S rRNA gene tree (Figs. 1 and 4). The capacity of a 23S rRNA gene tree to reveal a higher number of monophyletic clades than a 16S rRNA gene phylogeny is consistent with the results obtained for non-Vibrio taxa 38. The higher number of informative bases within the 23S rRNA gene when compared to 16S rRNA (i.e., 295 vs 162 variable positions; Supplementary Table S1) likely accounts for the observed increase in differentiation of Vibrio and non-Vibrio species.
We combine 16S rRNA and 23S rRNA gene sequences to increase the number of informative bases, in order to distinguish additional Vibrio species. The 16S-23S concatemer-based trees form a number of monophyletic clades similar to that provided by the 23S rRNA tree (Figs. 5 and 4, respectively), thus resembling the results for non-Vibrio species that are obtained with trees of single copies (per genome) of 23S rRNA and 16S-23S concatemers 38. Nonetheless, the trees based on the concatemer sequences (Fig. 5) enable taxonomic assignment of three additional Vibrio species (V. diabolicus, V. natriegens and V. vulnificus). These species are not resolved in the phylogenetic trees based on Vibrio 23S rRNA or 16S rRNA sequences individually.
A new combination of 16S forward and 23S reverse primers for in silico amplification of the Vibrio 16S-23S region suggests the feasibility of targeting Vibrio-specific sequences in environmental DNA. Amplification of this region in bacteria usually involves universal 16S rRNA 27F and 23S rRNA 2490R 39,40 or 2241R 41 primers. To increase the specificity of amplification, assure the broadest coverage of Vibrio species, and produce amplicons with an increased number of informative bases, we propose the combination of forward (16S rRNA 27F 42) and new reverse (23S rRNA 23S_rev_V) primers. These primers can amplify in silico the corresponding fragments of all Vibrio ribosomal operons in our custom database and provide products of 4.7 kbp or more. This size matches that of amplicons (i.e., 4.3-5.4 kbp) that encompass the entire 16S-ITS-23S regions in a large variety of bacterial species 41.
Intragenomic variability in the number of rRNA operons constrains the use of ribosomal genes for analysis of environmental samples (https://rrndb.umms.med.umich.edu/search/) 21. We observe that the number of ribosomal operons ranges from 5 to 16 for Vibrio genomes (Supplementary Fig. S3), and is on average higher than that reported in other studies 11,35. Ribosomal operon multiplicity allows bacteria to increase ribosomal content quickly, providing rapid adaptation to changing environmental conditions, such as an increase in nutrient availability or favorable temperature shifts 43. Variability in the number of operons among strains that belong to the same species could lead to over- or underestimation of species richness, as the real number of species in samples can be lower or higher than that estimated based on the number of detected unique gene copies 30. This circumstance restricts the application of ribosomal genes in quantitative analysis of eDNA 44. The general plasticity of bacterial genomes extends to the nucleotide composition of ribosomal operons, and therefore can influence outcomes of phylogenetic analysis. The presence of 16S rRNA and 23S rRNA in the same molecular machine (i.e., the ribosome) suggests their interdependent evolution to preserve ribosomal function. Compared to the sequence conservation in regions important for ribosomal function, variable regions show higher diversity and are the location of informative bases in phylogenetic analysis. The phylogenies of our 16S rRNA and 23S rRNA gene copies from the same operon reveal different evolutionary relationships for some Vibrio species (Fig. 1 vs Fig. 4). These observations suggest that the routine sequencing of 16S rRNA amplicons and subsequent homology search can produce false matches for "outlier gene copies", and consequently mislead species assignment. In this context, species assignment will benefit from fully annotated genomes based on all ribosomal operons 30,44,45. The divergence of ribosomal operons can be explained by horizontal gene transfer or effects of mutation 46,47. Even though gene transfer and mutations may increase intragenomic operon divergence in Vibrio, many studies report selection that favors homogeneous ribosomal structure and maintenance of function 47,48. Intragenomic operon divergence may be transitory, and may provide an opportunity to study processes of operon homogenization.
The internal transcribed spacer (ITS; Supplementary Fig. S4) represents an additional source of informative bases contributing to the variability of the Vibrio ribosomal operon. Previous studies support the potential of ITS as a phylogenetic marker for differentiation of bacterial taxa from distinct families 39,41. However, the effectiveness of ITS alone as a marker in a narrower range of taxa, such as species in the genus Vibrio, likely decreases. Consistent with this idea, the results of a previous study indicate that the use of ITS alone is insufficient for differentiating all Vibrio species 35. Furthermore, lower intergenomic than intragenomic ITS sequence variability 33 can further complicate species delineation based on ITS as a single marker. Despite some apparent limitations of ITS use, this region in combination with 16S and 23S rRNA genes could increase the total number of informative bases available for phylogeny construction and, therefore, might further improve the taxonomic assignment of Vibrio species.
We show through in silico analysis that, despite the key role of the 16S rRNA gene in establishing the taxonomy of bacterial species, this gene possesses a number of deficiencies that complicate its use for differentiating Vibrio species in multispecies assemblages, such as those in environmental samples. Moreover, we demonstrate that some limitations can be overcome by the joint use of 16S and 23S rRNA genes, and we propose a candidate universal primer pair for Vibrio-specific amplification of the rRNA genes and the ITS. Although the joint use of ribosomal genes per se does not allow delineating all Vibrio species, the additional incorporation of ITS sequences present in the amplified 16S-ITS-23S fragments may increase the number of informative bases, potentially providing further improvements in the differentiation of Vibrio species in environmental samples.
The experimental testing of the proposed primers reveals that they work well with environmental DNA and are capable of amplifying a wide range of Vibrio sequences. Moreover, the use of these primers makes it possible to greatly increase the discoverability of Vibrio species compared to the "classical" 16S rRNA-based approaches widely used to monitor microbial diversity. For instance, a previous study 49 using the 16S rRNA gene alone as a phylogenetic marker apparently failed to identify any member of the Vibrionaceae family in environmental samples obtained from the same area (Plentzia Bay). Finally, 35.4% of the Vibrio sequences that we discovered in the environmental sample originate from genomes that are not present among the genomes of the 40 Vibrio species we initially selected for analysis. This strongly suggests that our primers enable broad coverage of Vibrio species.

Creating a custom repository of 16S and 23S rRNA gene sequences
Vibrio is the most diverse genus in Vibrionaceae, currently including 151 described species and 5 subspecies (LPSN database, https://www.bacterio.net/, accessed June 2022) 50. To carry out in silico analysis, we created a data repository by retrieving all copies of ribosomal operon genes (i.e., 16S rRNA and 23S rRNA) from 40 representative, fully sequenced Vibrio genomes, one genome per species (Supplementary Table S7). Genome taxonomic assignment was further verified when Vibrio spp. did not form highly supported and unambiguously differentiated monophyletic clades. We classified levels of certainty of genome taxonomic assignment in the following way: first, literature support existed and the NCBI taxonomic check criteria were satisfied; second, only the NCBI taxonomic check criteria were satisfied; and third, none of these criteria were satisfied (Fig. 1, Supplementary Table S8). When multiple genomes were available, we preferentially selected published and annotated genomes of validated Vibrio species in the LPSN database that were assembled using both long- and short-read sequences (e.g., those obtained by both PacBio and Illumina sequencing). To choose representative genomes of V. diabolicus, V. natriegens, and V. scophthalmi from the IMG/M database (https://img.jgi.doe.gov/cgi-bin/m/main.cgi), we constructed a similarity matrix of gene copies from the same genome based on NCBI BLASTn results (https://blast.ncbi.nlm.nih.gov/) and analyzed the number of gaps and mismatches to find the genomes with the highest internal variability in 16S and 23S rRNA gene copies. Next, the ribosomal sequences that were downloaded from NCBI GenBank and IMG/M databases (Supplementary Fig. S5) were manually curated by adding missing conserved terminal nucleotides to obtain full-length copies. We assigned to each retrieved sequence a unique ID in which the last three digits referred to the operon carrying the corresponding

Figure 1. Maximum likelihood phylogenetic tree using all 16S rRNA sequences of 40 representative Vibrio genomes. Species are highlighted if they form clades in one of the phylogenetic trees (16S or 23S rRNA-based). Clades are collapsed and labelled with the corresponding species name. The number of gene copies in each clade is shown in brackets. The bootstrap values are calculated with 1000 replicates. The genome taxonomic assignment satisfies both reliability criteria (no asterisk), satisfies only the NCBI criterion (*) or does not satisfy any of the criteria (**). Two additional sequences corresponding to Salmonella bongori (16S422) and Escherichia coli (16S423) are indicated in red as an outgroup.

Figure 2. Maximum likelihood phylogenetic trees of V. chagasii (a, blue background) and V. campbellii (b, orange background) cases illustrating the outlier gene copies with a sister clade and the variable regions associated with outlier gene copy topology. A graphical representation of the 16S rRNA gene with characteristic variable regions (V1, V2 and V3) is shown in the bottom left corner of each tree. The variable regions in blue are unique to the sequences (tree tips) featured with blue text, and variable regions highlighted with magenta are shared among sequences, indicated with both blue and magenta text. The black nucleotides inside of the variable regions are not unique to the outlier gene copy. Additional sequences with the highest homologies to the two species are from the SILVA database (*). Bootstrap support is calculated with 1000 replicates. The HKY + F + R2 substitution model is used here. Two additional sequences corresponding to Salmonella bongori (16S422) and Escherichia coli (16S423) are included as an outgroup (in red).

Figure 3. Maximum likelihood phylogenetic trees of V. cholerae-V. mimicus (a) and V. jasicida-V. owensii (b), both cases forming a clade inside of a clade of another species. Informative nucleotides responsible for the species clustering patterns are specified: nucleotides (in orange) of collapsed clades (i.e., V. cholerae and V. jasicida) breaking another species clade; nucleotides (in blue) shared among gene copies of polyphyletic clades (V. mimicus and V. owensii); nucleotides (in yellow) belonging to sister species (V. navarrensis, V. campbellii and V. neocaledonicus); and nucleotides (in green) associated with an additional sequence sub-cluster. Sequences sharing the same nucleotide in the same position as one of the defined groups are indicated by the same color. Letter codes that represent degenerate nucleotides are in red (i.e., Y can be C or T; S can be G or C; W can be A or T; R can be A or G). The indicated nucleotides represent substitutions with one exception (i.e., deletion of one nucleotide in position 94 in gene copy "G" of V. owensii, indicated by a hyphen), whereas the blank positions correspond to conserved nucleotides omitted for simplicity. The labelling by asterisks is the same as in Fig. 1. Bootstrap support is calculated with 1000 replicates. Two additional sequences corresponding to Salmonella bongori (16S422) and Escherichia coli (16S423) are included as an outgroup in red.

Figure 4. Maximum likelihood phylogenetic tree using all 23S rRNA sequences of 40 representative Vibrio genomes. Species are highlighted if they form clades in exactly one of the phylogenetic trees (16S or 23S rRNA-based). Clades are collapsed and the corresponding species names are indicated. The number of gene copies in each clade is shown in brackets. The bootstrap values are calculated with 1000 replicates. The labelling by asterisks is the same as in Fig. 1. Two additional sequences corresponding to Salmonella bongori (23S422) and Escherichia coli (23S423) are indicated in red as an outgroup.

Figure 5. Maximum likelihood phylogenetic trees using 16S and 23S rRNA concatenated sequences in order 5′-16S-23S-3′ rRNA (panel a) or 5′-23S-16S-3′ rRNA (panel b) of 40 representative Vibrio genomes. To simplify the view, clades are collapsed and labelled with the corresponding species name. The number of gene copies in each clade is shown in brackets. Bootstrap support is calculated with 1000 replicates. V. neocaledonicus and V. alginolyticus are shown in blue. The labelling by asterisks is the same as in Fig. 1. Additional sequences corresponding to Salmonella bongori (16S-23S422 and 23S-16S422, respectively) and Escherichia coli (16S-23S423 and 23S-16S423, respectively) are indicated in red as an outgroup.

Figure 7. Deduced variable and conserved regions of 23S rRNA based on frequency values with which the consensus nucleotide at each aligned position occurs in Vibrio sequences retrieved from the SILVA database (a) and a local data repository of representative Vibrio genomes (b). Indicated are fully conserved positions (dark gray), variable positions (red), conserved positions (light gray), and gaps (white). The position of the white dots representing individual consensus nucleotides is adjusted according to the frequency occurring at each position. A blue dashed line represents the cutoff frequency separating ten percent of the most variable nucleotides.

Figure 8. The results of in silico PCR amplifications obtained by the combination of 16S rRNA 27F 42 and 23S rRNA 23S_rev_V primers. Whole genome sequences of Vibrio, non-Vibrio Vibrionaceae and non-Vibrionaceae species are used as templates. Genomes for which gene copies can be amplified are shaded, and those for which they cannot be amplified are shown in blue. The lack of amplification in the case of non-Vibrionaceae genomes suggests that the sequence of the Vibrio-specific primers is not conserved in these genomes. Thus, the primers demonstrate preferential specificity for the Vibrionaceae family.