Genomic adaptation of Pseudomonas strains to acidity and antibiotics in hydrothermal vents at Kolumbo submarine volcano, Greece

Although the rise of antibiotic and multidrug resistant bacteria is one of the biggest current threats to human health, our understanding of the mechanisms involved in antibiotic resistance selection remains scarce. We performed whole genome sequencing of 21 Pseudomonas strains, previously isolated from an active submarine volcano of Greece, the Kolumbo volcano. Our goal was to identify the genetic basis of the enhanced co-tolerance to antibiotics and acidity of these Pseudomonas strains. Pangenome analysis identified 10,908 Gene Clusters (GCs). It revealed that the numbers of phage-related GCs and sigma factors, which both provide the mechanisms of adaptation to environmental stressors, were much higher in the high tolerant Pseudomonas strains compared to the rest ones. All identified GCs of these strains were associated with antimicrobial and multidrug resistance. The present study provides strong evidence that the CO2-rich seawater of the volcano associated with low pH might be a reservoir of microorganisms carrying multidrug efflux-mediated systems and pumps. We, therefore, suggest further studies of other extreme environments (or ecosystems) and their associated physicochemical parameters (or factors) in the rise of antibiotic resistance.

During the last decade, there is a growing global concern about the rise of antibiotic and multidrug resistant bacteria resulting from the pressure of antibiotic usage 1,2 . Previous research on antibiotic resistance (AR) was limited in clinical environments, but due to an increase in life-threatening infections, research on AR in natural habitats emerged 3,4 . Recent studies on natural environments have revealed vast genetic reservoirs of AR genes 5,6 . These include soils 6,7 , glaciers 8 , seawater and penguin fecal samples 9 and animals 10 . In 2015, Hatosy and Martiny 11 uncovered a previously unknown diversity of AR genes among marine environments and suggested the ocean as a global reservoir of clinically relevant and potential novel AR genes.
Recent environmental studies have shown that the increased AR resistance of bacteria in natural habitats, can be associated with environmental stressors such as the pH reduction. The first observation was made in 2009 by Vega Thurber et al. 12 , who noticed that stressors, such as the pH decrease, can increase the abundance of AR genes in coral-associated bacterial communities 3,12 . In 2011, Meron et al. 13 recorded an increase in isolated bacteria producing antibacterial activity from corals maintained at pH 7.3 as compared with isolated strains obtained from corals maintained at pH 8.2. Such observations raise special concerns as there is an alarming decline in global surface ocean pH, (also known as ocean acidification 14 ), which can potentially trigger an increase in antimicrobial activity and the selection of antibiotic resistance genes.
A good candidate microorganism to study the bacterial evolution in a range of environmental stressors, such as the pH decline, and its role on the rise of antibiotic resistance, is the gram-negative cosmopolitan Pseudomonas, one of the most diverse bacterial genera within Gammaproteobacteria 15 . Pseudomonas members show an impressive metabolic and physiologic versatility [16][17][18] , and they have adopted mechanisms to promote their survival and persistence in various environments 19 .

Results
Genome analysis. A total of 21 strains were chosen based on previously published data to represent both surface (5-90 m) and deep (430 m and 495 m) seawater layers of the Kolumbo volcano and their susceptibility to various environmental stressors including acidity, six commonly used antibiotics and heavy metals 4 ( Fig. 1; Table 1; Supplementary Table S1). More specifically, 9 of the strains stemmed from the surface, and the remaining 12 strains originated from the deep seawater layers. All surface seawater strains and three deep seawater seawater samples that were used for Pseudomonas strains isolation. AUV data were collected during POS510 cruise, in 7 missions of AUV Abyss (GEOMAR), under the framework of the collaborative project "ANYDROS: Rifting and Hydrothermal Activity in the Cyclades Back-arc Basin" (modified from Nomikou et al. 70 ). For the swath data processing, visualization, and bathymetric maps we used the freely available software packages MBsystem v.5.7.6 (https ://www.mbari .org/produ cts/resea rch-softw are/mb-syste m/) and QGIS v. 3 (Table 1, Supplementary Table S1). Genome sequencing analysis of all strains with MiSeq technology and subsequent assembly with Spades 28 allowed to obtain draft assemblies for all isolates. These assemblies were inspected for contamination from other strains and corrected, further extended using complete genomes from related Pseudomonas strains, and gaps were filled by mapping the raw sequence data against these assemblies. The final assemblies contained between 4 and 52 scaffolds for each genome, and genome size varied from 4.3 Mb to 6.4 Mb (Fig. 2). Single Nucleotide Variants (SNVs) were found in few positions across the genomes (Fig. 3). CheckM analysis 29 showed completeness equal to or higher than 98.9% for all genomes (Fig. 2), whereas Busco analysis 30 showed completeness equal to or higher than 99.7% (Supplementary Table S2 Table S3). A total of 2059 GCs were common to all 21 strains and called "Core" (A in Fig. 5), 2775 GCs were found exclusively to the Aeruginosa group strains and called them unique ("Aeruginosa-unique"; B1 in Fig. 5), whereas 643 GCs were found exclusively to all the Stutzeri group strains ("Stutzeri-unique"; C in  Table S4). The unique GCs for Aeruginosa group were related to functions of biotechnological importance, such as geraniol degradation 37 , central carbon metabolism in cancer 38 , betalain biosynthesis 39 , ferroptosis 40 , bisphenol degradation 41 , phenazine biosynthesis 42 and sphingolipid metabolism 43 . According to Blast2GO functional associations 44 , the vast majority of the core functions were related to an integral component of membrane (i.e. 5.4%), while according to Phobius associations 45 , the majority of the core protein regions were predicted to be extracellular (26.2%), followed by those predicted to be transmembrane (i.e. 17.5%) and then by the cytoplasmic ones (i.e. 16.7%) (data not shown). Remarkable differences between Aeruginosa and Stutzeri group strains were noticed for GCs, which are related to phages and sigma (σ) factors. More specifically, in Aeruginosa group strains, the phage-related GCs were 34 including a hypothetical protein of bacteriophage Pf1, phage-related tail proteins, bacteriophage tail proteins, helix destabilizing proteins of bacteriophage Pf1, P2-like prophage tail proteins, phage tail sheath protein, phage tail assembly chaperone proteins etc. whereas for the Stutzeri group were only 4 i.e. the ORF phage PA0727, a phage/conjugal plasmid C-4 type zinc finger protein, a putative bacteriophage protein and a phage holing family protein (Supplementary  Table S5). Similarly, the sigma (σ) factors which are involved in the regulation of gene expression were 26 for the Aeruginosa group strains and only 7 for the Stutzeri ones. The Aeruginosa group GCs included the RNA polymerase sigma-70 factor and the sigma-24 subunit of the ECF subfamily, the sigma factor PvdS, the sigma-70 factors family signature 2, the extracytoplasmic function (ECF) sigma factor and the sigma-70 factor Fpvl (Supplementary Table S6). In addition, very few GCs related to transposable elements were found, and these included 4 for Aeruginosa group strains, which are linked to transposition, DNA-mediated functions, transposase and inactivated derivates of IS5 family and other transposase and only 2 GCs of transposition-related function for the Stutzeri group strains (Supplementary Table S7).
GCs of antibiotic and multidrug resistance. Mandalakis et al. 4 reported that the bacterial isolates from Kolumbo crater floor exhibited higher tolerance to antibiotics compared to isolates from surface seawaters. To further ascertain whether there are genetic features that shape this difference between deep and surface seawater microbes we attempted to mine genes from the 21 genomes related to antimicrobial or/and multidrug resistance, i.e. resistance to two or more classes of antibiotics 46 . Following the results of Mandalakis et al. 4 , the strains were grouped based on their phenotypic tolerance to antibiotics, i.e. -into "High Tolerance" and "Low Tolerance" (Table 1) and a functional enrichment comparison of the two groups was performed (Supplementary Table S8). The "High Tolerance" group included all members of the Aeruginosa group plus the Stutzeri group member Strain08, while the "Low Tolerance" group included the rest 12 strains (Table 1). In this analysis, we focused only on the GC-associated functions that were unique to each group. The "High Tolerance" group comprised 37 unique GCs associated with antimicrobial and multidrug resistance, such as antimicrobial resistance genes, multidrug efflux systems and pumps and transmembrane transporters 47 . In addition, a GC associated with a TetR/AcrR family transcriptional regulator that regulates antibiotic resistance was also identified for the "High   48 . On the contrary, only 3 GCs associated with antimicrobial and multidrug resistance were found for the "Low Tolerance" group (Supplementary Table S8).

Discussion
One of the biggest current threats to human health is the rise of antibiotic and multidrug resistant bacteria 1,2 .
Microorganisms produce many antimicrobials in nature, they become resistant to the antibiotics they produce, and then, the genes of resistance can be transferred to other non-resistant bacteria 49 . The presence and accumulation of antibiotics in the environment may provide an additional selective pressure for the transmission of AR genes to non-resistant bacteria 2,50 . The possible mechanisms of anthropogenic origin that can lead to the occurrence of AR in marine habitats include (a) coastal runoff of AR bacteria from terrestrial sources and (b) selection for AR due to anthropogenic antibiotic runoff, which challenges native microbes to become resistant 11 . Our understanding of the environmental factors that may control the AR selection in extreme environments is still limited. In our previous work 4 we noticed an enhanced co-tolerance to acidity and antibiotics of Pseudomonas strains isolated from the deep seawater (430 and 495 m) layers of the hydrothermally active submarine Kolumbo volcano compared to those isolated from the surface seawater (5-90 m). In the present study, we performed whole genome sequencing for 21 of these strains to gain insight into the genetic basis of the enhanced co-tolerance to antibiotics and acidity. A series of tools were used for complete genome assembly and annotation, whereas a total of 28 different databases were used in order to comprehensively assign functions to the predicted genes. The example case presented in Table 2 demonstrates the benefits gained from using multiple annotation sources. This advantage becomes more obvious when annotation for a gene is missing in some databases (grey gaps in Fig. 5).
In this example case given in, the annotated CDS most probably represents a 6-pyruvoyl-tetrahydropterin synthase (PTPS), since at least 10 of the annotations point to the same enzyme. PTPS is a well-studied enzyme, and in P. aeruginosa, is involved in the biosynthesis of the modified tRNA nucleosides queuosine and archaeosine 51 . This is also supported by the position of the CDS in the genome, i.e. next to the rRNA operon. The annotations from KEGG Function [34][35][36] and TIGRFAM agree on this function, while the more general annotations from COG Category and KEGG Pathways are not very informative. By contrast, Reactome and Superfamily seem to give erroneous annotations (biosynthesis of tetrahydrobiopterin). Reactome annotations, in particular, are based on datasets of Homo sapiens and are thus expected to not be pertinent for bacteria.
Phylogenetic analysis revealed two distinct and well-known Pseudomonas groups, the Aeruginosa and the Stutzeri. Pangenome analysis revealed that the numbers of phage-related GCs and sigma factors were much higher in Aeruginosa group compared to the Stutzeri one, whereas only few transposable elements were found in both groups. GCs related to both phages and sigma factors are of exceptional importance, as they provide the mechanisms of adaptation to challenging conditions such as extreme pH values or increased antibiotic concentrations. According to Colomer-Lluch et al. 2 , phages can carry antibiotic resistance genes able to confer resistance to bacterial strains and they may influence the generation of resistance in the environment. In addition, the sigma factors can provide the mechanisms for regulating the expression of large numbers of genes in response to changing environmental conditions, such as a temperature rise or a decline in pH (e.g. 52 ). The sigma factors found in Aeruginosa group of the present study included almost all sigma factors that have been identified so far in P. aeruginosa strains 52 . Their presence in sufficient numbers is a strong indication of the high adaptability of Aeruginosa group members to the extreme conditions of the active hydrothermal field of Kolumbo volcano. In addition to sigma factors, the high numbers of GCs related to phages in Aeruginosa group provided also strong evidence for the presence of genes associated with antibiotic resistance in the CO 2 -rich acidic water column above the active hydrothermal vents of the Kolumbo volcano.
We further examined all unique GCs in Pseudomonas strains of high co-tolerance to antibiotics and acidity. Interestingly, all of them were associated with antimicrobial and multidrug resistance. The identified GCs included antimicrobial resistance genes, multidrug efflux systems and pumps and transmembrane transporters. Interestingly, that was not the case for the low tolerant Pseudomonas strains implying that strains from the acidic seawater layers above the active area of Kolumbo volcano, carry antimicrobial resistance mechanisms. Bacterial multidrug efflux pumps are antibiotic resistance determinants present in all microorganisms. According to Blanco et al. 47 multidrug efflux pumps are ancient elements encoded in bacterial genomes long before the recent use of antibiotics for human and animal therapy. These elements can extrude, beside antibiotics, a wide range of non-antibiotic substrates, such as heavy metals (e.g. Cu, Hg, As), organic pollutants and others. The submarine Kolumbo volcano is characterized by a unique metal enrichment of hydrothermal spires and mounds 26 . Such metal enrichment (e.g. Sb, Tl, Ag, As, Hg, Pb, Cu) may have provided selective pressure for the maintenance of resistance mechanisms in the acidic water above the active hydrothermal vents of the Kolumbo crater. These mechanisms are mostly processed by membrane proteins which limits the intracellular access to antibiotics 53 and allows the increased tolerance of Pseudomonas strains from Kolumbo to a series of stressors such as acidity and antibiotics. The present study provided strong evidence that the CO 2 -rich acidic seawater of the Kolumbo volcano serve as a pool of microorganisms carrying GCs for multidrug efflux-mediated systems and pumps and is an ideal natural laboratory to understand the physicochemical factors selective pressure that shape the antibiotic resistome. Additional research is needed to further our insights into the extent to which extreme ecosystems are reservoirs of resistance mechanisms across the globe.

Materials and methods
Strains characteristics. A total of 21 bacterial strains previously isolated from surface and deep seawater samples of submarine Kolumbo volcano were used in the present study 4 . Nine of the strains were isolated from surface seawater (i.e. 1 from 5 m, 4 from 20 m, 1 from 45 m, 3 from 90 m) and the rest 12 strains were isolated from deeper seawater (i. e. 1 from 430 m and 11 from 495 m) Table S9).

Phylogenetic analysis.
We selected a series of reference strains for the phylogenetic analysis by using the highest ANI similarity results with the sequenced strains of the present study. All reference strains were retrieved from the Genome Taxonomy Database Release 03-RS86 54 . Fasta files of the selected genomes were downloaded either from Pseudomonas Genome Database or NCBI RefSeq. All downloaded fasta files where either complete genomes or had at most 10 contigs. Cellvibrio japonicus, a closely related strain to Pseudomonas genus, was used as an outgroup 55 . We used the program anvi-script-reformat-fasta to reformat fasta headers, and the output was used with program anvi-gen-contigs-database to create an Anvi' o contig database for each of the 93 strains selected for our analysis. Program anvi-run-hmms was used to store HMM hits of Busco orthologs from OrthoDB v9 Gamma-Proteobacteria database in the anvi' o contigs database. We used the program anviget-sequences-for-hmm-hits to select single copy genes which existed in all genomes and aligned the amino acid sequences separately with muscle v.3.8.31. For each of the amino-acid fasta multiple sequence alignments (MSAs) we removed all gaps and renamed fasta headers to make them shorter with an in-house script. Prank v.170427 56 was used to align each fasta file with the +F option. We inspected each MSA manually with Mega v.10.0.5 57 and selected 10 amino-acid single copy gene MSAs based on phylogenetic information for downstream analyses. Guidance2 58 was used with 200 bootstrap repeats to detect unreliable MSA regions and unreliable columns with confidence score below 0.93 were removed. Mega v.10.0.5 was used to inspect and manually trim each MSA if needed. MEGA-CC v.10.0.5 was used to create a maximum likelihood (ML) tree for each MSA. The program fasta-2relaxedPhylip.pl from the Phylogenomic suite 59 was used to turn each fasta MSA into phylip format. We ran CodeML from the PAML v.4.9 suite 60 with input each phylip MSA and starting tree the corresponding ML tree, Figure 3. Genome analysis of a representative strain. The same analysis was applied for all strains. (A) Anvi' o representation of the genome assembly of Strain03. The tree structure in the inner circle represents the scaffolds of the genome. The black first inner layer is the length of the split for which all above layers are calculated, the dark-greened second layer is the GC content, the third, fourth, fifth and sixth layers represent the variability (SNVs) (in black) found in the paired end reads, merged reads, single end from pair 1 reads and single end from pair 2 reads, respectively. The seventh layer represents the rRNAs found in the genome, and the last outer layer represents the gene-level taxonomy. Coloration of the outer layer is according to taxonomy assignment as indicated in the upper right insert. Anvi' o splits that were not taxonomically assigned under the Pseudomonaceae family are marked as A, B and C. The highlighted split D, is explored further in Fig. 5. (B) Detailed view of a part of the genome of Strain03. In the upper frame the coverage of the paired end reads is depicted in the gray graph. Vertical lines with colored nucleotide letters on top depict position of the SNVs. The lower frame shows the genes found in the particular split, their length and orientation. An rRNA operon (16S rRNA, 23S rRNA and 5S rRNA) is shown in this split. A partial 16S rRNA with only 55% of its sequence aligned to 16S rRNA is also shown to be found outside of the rRNA operon. Most CDS shown were predicted by Prodigal (green arrows), except for the two CDS predicted by GeneMarkS2 (purple arrows), as indicated in the figure. The annotation of the yellow framed CDS is given in Table 2.   . To the right is given an ANI percentage identity heatmap of the same genomes. The subsequent 6 layers correspond to various statistics related to the analysis i.e. the number of contributing genomes per GC, number of genes per GC, maximum number of paralogs per GC, Species Core Genome Clusters, Functional Homogeneity Index and Geometric Homogeneity Index. The last 38 layers represent the various annotation sources for each GC and the green coloration signifies the existence of annotation. "A" represents the common GCs to all 21 strains (Core genome). "B1" represents the GCs unique to the Aeruginosa group strains and "C" represents the GCs which are unique to the Stutzeri group strains. "B2" represents the GCs specific to the Aeruginosa group strains but without Strain05 which has no GCs in this bin. The bar chart at the top right represents the gene redundancy (blue bar color) and the singleton GC (orange bar color) per genome. Table 2. Functional annotation of a CDS (yellow border in Fig. 3Β).