Determining the ecological and evolutionary forces that shape the structure of marine sponge microbiomes is fundamental to current marine microbiology research due to the relevance of these symbiotic communities to ecosystem functioning1,2,3,4 and biotechnology5,6,7,8. Fifty-two bacterial phyla have been reported to inhabit sponges3, with Proteobacteria (mostly Alpha- and Gammaproteobacteria) being by far the most abundant, followed by Acidobacteria, Actinobacteria, Chloroflexi, Nitrospirae, Cyanobacteria and the candidate phylum Poribacteria3,9. Sponge-associated bacteria engage in nutritional exchange with their hosts and as such are considered to play an important role in benthic biogeochemical cycling4,10,11. Moreover, they are believed to produce most of the secondary metabolite repertoire of sponges6,12,13,14,15,16,17, and thus hold potential value for applications in medicine and pharmacy7,16.

Alphaproteobacteria display great versatility in their association with multicellular organisms, with interactions ranging from mutualistic over commensal to parasitic and pathogenic18. Microbial diversity surveys performed on different sponge species from various geographic locations have noted Alphaproteobacteria as regular sponge associates9,19,20,21. Particularly, a variety of currently uncultivatable lineages in the families Rhodobacteraceae and Rhodospirillaceae have been found as dominant members of the marine sponge microbiome22,23,24. Recent metagenomic “binning” studies, that sort metagenomic sequences into genomes that are assumed to constitute separate taxa, uncovered versatile metabolisms among diverse uncultivated Alphaproteobacteria symbionts of sponges, in which import and utilization of organic nitrogen and sulphur emerged as conspicuous features2,25,26. Among cultivated sponge-associated Alphaproteobacteria, the genera Pseudovibrio and Ruegeria likely rank as the best-described groups. They have been consistently isolated from various host species across the globe19,20,27,28,29, and based on recent genomic surveys, are considered to be well equipped for a symbiotic life-style30,31,32,33. In contrast, our understanding of the potential contribution of most cultivatable sponge-associated bacteria to holobiont functioning remains hindered by scarce knowledge of their genome content and architecture.

The known taxonomic diversity of sponge-derived culture collections is still limited, with 1% to 14% of the total sponge bacterial community estimated to be cultivatable using different methods19,27,34,35. Indeed, the most abundant bacterial symbionts of sponges, in particular, remain uncultivated36,37. Complicating factors for the cultivability of these bacteria are the initial sample processing method, the nature of the growth medium, and the incubation conditions38. The in-situ implantation of nutrient medium-containing diffusion growth chambers (DGCs)39 into sponge specimens and their subsequent incubation in the field40, or the concomitant use of several solid or liquid media (with and without antibiotics)27,41, for example, have been attempted to enlarge the phylogenetic breadth of marine sponge symbionts captured in the laboratory, and have shown promising results. However, continuous effort to cultivate hitherto “uncultivatable” symbionts or novel representative lineages within taxa less prone to cultivation is needed if we are to harness the metabolism of the marine sponge microbiome in a comprehensive fashion.

In this study, to attempt the isolation of “difficult-to-culture” bacterial symbionts of sponges - defined here as any organism detected in association with the sponge host regardless of whether the interaction is beneficial or obligatory36 - we used simple modifications to growth medium preparation and incubation conditions. First, we replaced the solidifying agent agar, which may inhibit the growth of certain bacterial taxa42,43, with the nontoxic agent gellan gum. In addition, to favour the cultivation of putatively slow-growing bacteria44, we prepared a low-carbon culture medium and utilized a lower incubation temperature (19 °C) with a prolonged incubation period (8 weeks). Our conditions favoured the cultivation of taxonomically diverse Alphaproteobacteria strains, especially of the genus Ruegeria, prompting us to (1) investigate the functional features of ten distinct genera spanning three Alphaproteobacteria orders (Rhodobacterales, Sphingomonadales and Rhizobiales) in detail, and (2) define the core functional attributes of Alphaproteobacteria species cultivated from the model sponge host Spongia officinalis. Cultivation-independent methods were employed to infer the relative abundance of the studied lineages in the S. officinalis microbiome, enabling us to critically contextualize the implications of genomic blueprints of symbiosis, identified across all these lineages, as possible factors enhancing host fitness.

Material and Methods

Sample collection, cultivation of bacteria, and phylogenetic analysis

Four Spongia officinalis specimens (Alg230-Alg233, for details see Karimi, et al.23) were collected in May 2014 by SCUBA diving at 20 m depth off the coast of Pedra da Greta (36° 58′ 47.2 N ;7° 59′ 20.8 W), southern Atlantic Ocean, Portugal, and transported to the laboratory within approximately 1 h in a cooling box. Specimens were processed immediately upon arrival: 2.5 g of the specimens’ inner body were cut and macerated with a sterile mortar and pestle in 22.5 mL of calcium and magnesium-free artificial seawater (CMFASW) (for details see Hardoim, et al.45; Esteves, et al.29). Suspensions were serially diluted in CMFASW after which 100 µL of 10−3 to 10−8 dilutions were spread on marine gellan gum medium (hereafter called ‘MG50’) plates in triplicates. ‘MG50’ was prepared by adding 0.802 g marine broth (MB; ROTH®) in 1 L artificial seawater (ASW) (MB 50 times diluted) and solidified with Phytagel™ (gellan gum; 5 gL−1). All plates were incubated for eight weeks at 19 °C. Bacterial growth was monitored weekly and colony forming units (CFUs) counted. Colonies were selected based on their variations in colour and shape with the aim of isolating as many different bacterial morphotypes as possible rather than randomly collecting a high number of strains. Nevertheless, highly abundant morphotypes were picked more often to enable the assessment of different bacterial lineages possibly sharing the same colony morphology (see Supplementary Table S1). Average CFU counts ranged from 3.0 × 106 (specimen Alg232) to 8.1 × 106 (specimen Alg230) CFUs g−1 sponge wet weight. Sponge specimen Alg231 had 6.9 × 106 ± 0.00015 × 106 CFUs g−1 (mean ± SE), and was chosen for colony isolation as it showed the greatest variety of morphologically distinct colonies. Here, we benefited from previous knowledge of the (equivalent) functional and taxonomic bacterial diversity present in each sponge specimen, acquired via shotgun metagenome sequencing23. This allowed us to calibrate our sampling effort to cover the total colony morphotype diversity within one specimen (higher morphotype sampling depth) rather than spreading the effort across several specimens, a strategy that would likely lead to the retrieval of the same and most abundant phylotypes from different specimens (lower morphotype sampling depth). In total, 48 colonies (many of which were morphologically unique) were picked and streaked to purity on MG50 plates. The purified isolates were then grown for 48 h in 1:2 diluted marine broth (MB2) and stocked in fresh MB2 supplemented with 20% glycerol at −80 °C. 16 S rRNA gene-based taxonomic affiliation of the isolates, from genomic DNA extraction to PCR amplification and classification using the RDP Classifier tool, were performed as established elsewhere29,46 and described in File S1 (Supplementary Information).

Genome sequencing of sponge-associated Alphaproteobacteria

Although not specifically designed for this purpose, most of the isolates retrieved in our cultivation attempt belonged to the class Alphaproteobacteria (see below), leading us to inspect their coding potential through genome sequencing. In-depth 16S rRNA gene phylogenetic inference of the Alphaproteobacteria isolates obtained in this study was performed as detailed in File S1 (Supplementary Information) to select representative strains for comparative genomics. Thereafter, genomic DNA samples of ten phylogenetically distinct Alphaproteobacteria strains (representing all obtained Alphaproteobacteria genera) were sent for genome sequencing on an Illumina MiSeq platform at Mr. DNA (Shallowater, TX, USA). Paired-end libraries (2 × 301 bp) were generated and the genomes were assembled de novo into contigs with the NGen DNA assembly software by DNAStar, Inc. as described previously47. All contigs of each genome were subjected to a BLAST (NCBI) search via the computational cluster facility of the Algarve Centre of Marine Sciences (CCMAR). The extracted BLAST files were then analysed in MEGAN548 to confirm whether the taxonomic affiliation of each contig matched that of its respective source strain. Contigs found not to fall within the expected taxonomic affiliation of its respective strain, and/or less than 1,000 bp in length, were discarded prior to annotation and downstream comparative analyses. Estimates of completeness and “contamination” of all genomes – as determined by the proportion of core single copy genes in each genome and their extent of duplication, respectively - were obtained using the CheckM tool, with lineage-specific marker sets selected at class, order, or family ranks49.

Annotation and comparative analysis of genomes

Open Reading Frame (ORF) prediction and annotation of the genome sequences were performed using the RAST (Rapid Annotation using Subsystem Technology) prokaryotic genome annotation server (version 2.0) with standard procedures50. In addition, all genomes were uploaded to the software platform EDGAR 2.051 where core and pan-genomes were defined and the number of singleton genes per genome was determined based on the coding sequences (CDSs) predicted using RAST. EDGAR was further employed to generate a phylogenomic tree and estimate average amino-acid and nucleotide sequence identities (AAI and ANI, respectively) for the ten Alphaproteobacteria genomes, following the approach of Karimi, et al.26. CDSs were also subjected to annotation based on Clusters of Orthologous Groups of Proteins (COGs) using the on-line server WebMGA (e-value = 0.001)52. Unless otherwise stated, quantitative functional comparisons between the genomes were performed using COG annotations after Hellinger transformation of COG profiles (i.e. square root calculation of the relative abundance of each COG entry in a given genome). To address the functional relatedness between the ten alphaproteobacterial genomes and determine whether statistically sound functional groups exist among strains, redundancy analysis (RDA) was performed with the software package Canoco 4.5 (Microcomputer power, Ithaca, USA). The “species fit range” function was applied using high stringent settings ( > 99%) to identify COGs exclusive to different functional groups. Where applicable (that is, when sample sizes – numbers of genomes - were not prohibitive) White’s non-parametric t-test was conducted within STAMP v2.0.953 to identify COG entries differently abundant (i.e. “enriched” or “depleted”) between groups of genomes.

To identify core alphaproteobacterial functions deemed ecologically and evolutionarily informative in the context of sponge-bacteria symbiotic relationships, the lists of CDSs and COGs common to all genomes were manually inspected. Particularly, we looked for the presence of genomic signatures found to be markedly enriched or depleted in the Spongia officinalis endosymbiotic consortium23,26 or in marine sponges in general12,54 in the core genome of the Alphaproteobacteria strains cultivated and fully sequenced in this study. To test whether the relative abundance of the examined signatures varied significantly among functional genome groups (as determined by RDA, see above), one-way ANOVA, followed by a Tukey post-hoc test if significant, was performed after verifying that all data passed equal variance tests.

To gain further insight into their secondary metabolite production capacities beyond COG-based annotations, all genomes were screened for the presence of secondary metabolite biosynthetic gene clusters (BGCs) using antiSMASH v.355.

Relative abundance of sponge-associated Alphaproteobacteria across marine biotopes

Variations in coverage of each Alphaproteobacteria genome were inspected by mapping the already available microbial metagenomes from S. officinalis (four specimens), surrounding seawater (three replicates) and sediments (three replicates)23 against the assembled genome of each bacterium. To this end, the sequencing reads from the replicate metagenome samples within each marine biotope mentioned above were pooled and thereafter aligned to each Alphaproteobacteria genome using bowtie2 v. 2.2.6 at default settings56. The alignment scores, displayed as proportions of reads in the metagenomes that could be aligned with each single genome, were used as comparative measures of relative abundance of the studied alphaproteobacterial strains across S. officinalis, sediments and seawater. Additionally, genus-level inference of relative abundance in S. officinalis, seawater and sediment metagenomes was performed by calculating the proportion of protein-encoding genes (CDSs) assigned to the Alphaproteobacteria genera targeted in this study in each replicate metagenome sample. Genus-level taxonomic assignment of CDSs and subsequent relative abundance inference was achieved, in this study, using MG-RAST (Meta-Genome Rapid Annotation using Subsystems Technology) v3.057 annotations made available previously for S. officinalis, seawater and sediment metagenomes (Karimi et al.23, MG-RAST study ID: 021215RCmetagenomes).

Ethics statement

This study relied on in situ sampling of microorganisms from marine invertebrates without a nervous system, and as such was exempt from ethical approval procedures according to the current Portuguese legislation (Decreto-Lei n° 113/2013). This study did not occur within privately owned or protected areas. This study did not involve endangered or protected species. The sampling methodology privileged minimally invasive handling procedures, following the guidelines of the European Directive 2010/63/EU.


Isolation and identification of S. officinalis-associated bacteria

Forty-eight aerobic, heterotrophic bacteria representing different colony morphologies were selected for further genotypic characterization (Supplementary Table S1), with 46 isolates belonging to the phylum Proteobacteria and only two isolates to the phylum Actinobacteria (Supplementary Table S2). Altogether, 12 formally recognized bacterial genera and two phylotypes unclassifiable at the genus level were identified, representing 24 unique 16S rRNA gene OTUs (Supplementary Table S2). Within the Proteobacteria isolates, the vast majority (41) affiliated with the Alphaproteobacteria class in the orders Rhizobiales (1 isolate), Sphingomonadales (2 isolates) and Rhodobacterales (38 isolates) (Fig. 1), whereas the remainder belonged to the Gammaproteobacteria class in the orders Vibrionales (4 isolates in the genus Vibrio) and Alteromonadales (1 isolate in the genus Shewanella) (Supplementary Tables S1 and S2). Among the Alphaproteobacteria isolates, two subgroups were represented within the Rhodobacterales order, namely the “Roseobacter clade”58 containing isolates classified as Ruegeria, Loktanella, Tateyamaria and two Rhodobacteraceae strains unclassifiable at the genus level (Alg231–04 and Alg231-30, see File S1 for details), and the “Stappia clade”58 containing isolates affiliated with the genera Pseudovibrio and Labrenzia (Fig. 1). Of the 19 unique OTUs assigned to the Alphaproteobacteria, ten belonged to the genus Ruegeria (see File S1 for details), while the remaining genera/phylotypes were represented by one single OTU each (Fig. 1). To determine the core genomic features of the diverse Alphaproteobacteria community cultivated from S. officinalis, we sequenced and compared the genomes of ten Alphaproteobacteria isolates representing eight formally accepted genera and two unclassifiable phylotypes (Alg231-04 and Alg231-30, Fig. 1, File S1) spanning the Rhodobacteraceae, Rhodobiaceae, Sphingomonadaceae and Erythrobacteraceae families (Table 1). The Ruegeria strain chosen for genome sequence represented the most abundant OTU within the genus (Fig. 1).

Figure 1
figure 1

16S rRNA gene Maximum Likelihood tree of Alphaproteobacteria species. Kimura 2-parameter evolutionary distances between sequences were calculated using MEGA792. Alphaproteobacteria strains isolated from S. officinalis are shown in bold, with each entry representing a unique OTU at 100% nucleotide homology cut-off. The number of isolates obtained from S. officinalis that belong to the same OTU are given in brackets. Closest NCBI BlastN hits and type strains (T) to each isolate are shown on the tree. Blue marks sponge-associated, orange marks invertebrate-associated and green marks marine algae-associated closest NCBI BlastN hits and type strains. Strains that had their genome sequenced are marked with an asterisk. Bootstrap values (500 repetitions) above 70% (0.7) are shown on the tree nodes. The tree contains 80 entries, and 682 nucleotide positions are included in the dataset.

Table 1 Basic genome features of sponge-associated Alphaproteobacteria cultivated in this study.

General features of the S. officinalis-associated Alphaproteobacteria genomes

The size of the assembled alphaproteobacterial genomes ranged between 3.13 Mb for Erythrobacter sp. Alg231-14 and 7.40 Mb for Labrenzia sp. Alg231-36. G + C contents varied from 51.3% in Pseudovibrio sp. Alg231-02 to 59.6% in the unclassified Rhodobacteraceae strain Alg231-04 (Table 1). The number of coding sequences ranged from 3,139 to 5,120 and the number of RNA genes from 41 to 77 including 3 to 12 ribosomal RNA (rRNA) genes (Table 1).

The core genome of the ten S. officinalis associated Alphaproteobacteria consisted of 587 genes (Supplementary Table S3, see below for details), while the pan-genome comprised 25,449 genes. The number of singleton genes unique to each genome ranged from 955 in Rhodobacteraceae bacterium Alg231-04 to 3,193 in Labrenzia sp. Alg231-36, and correlated to some extent with the phylogenetic position of the isolates: the five Roseobacter clade strains had the smallest numbers of singleton genes, followed by the Sphingomonadales, the Rhizobiales and then the two “Stappia clade” isolates Pseudovibrio Alg231-02 and Labrenzia Alg231-36, which as well possessed the largest genomes (Fig. 2a).

Figure 2
figure 2

Singleton (i.e., strain-specific) genes, core and pan-genomes (a), and frequency plot of COG classes (b) across the Alphaproteobacteria genomes analysed in this study. The doughnut in (A) represents the core/pan-genome ratio retrieved from the dataset.

At a coarse level of functional resolution (i.e., COG classes), we found that the inspected strains possessed similar functional genome organization as COG classes ‘amino acid transport and metabolism’ (E), ‘transcription’ (K), ‘carbohydrate transport and metabolism’ (G), and ‘energy production and conversion’ (C), together with ‘general function prediction’ (R) and ‘function unknown’ (S), were the most dominant of the entire dataset (Fig. 2b). Although the rank distribution of COG classes differed somewhat between the individual genomes, the above-mentioned classes always prevailed compared to other classes, in each genome (Fig. 2b).

Functional ordination of sponge-associated Alphaproteobacteria genomes

At the finest level of (COG-based) functional resolution, 2,804 individual COG entries were annotated in the ten genomes, with the number of COGs per genome ranging from 2,309 in Erythrobacter sp. Alg231-14 to 5,625 in Labrenzia sp. Alg231-36 (Supplementary Table S4) and 959 COG entries being shared by all genomes (Supplementary Table S5). Three functional groups were found to significantly contribute to variation in COG profiles after RDA: Group I, composed by the five Roseobacter clade genera/strains (Rhodobacterales); Group II, comprising Anderseniella sp. Alg231-50 (Rhizobiales) along with Labrenzia sp. Alg231-36 and Pseudovibrio sp. Alg231-02 (Rhodobacterales); and Group III, formed by Erythrobacter Alg231-14 and Sphingorhabdus Alg231-15 (Sphingomonadales) (Fig. 3). The clustering of Group II suggests that the genomes of Labrenzia and Pseudovibrio are functionally closer to that of Anderseniella than to other genera in the Rhodobacterales order/Rhodobacteraceae family (Group I) (Fig. 3), in line with genome-wide assessments of phylogeny26,59.

Figure 3
figure 3

Functional ordination of Alphaproteobacteria genomes via Redundancy Analysis (RDA). Hellinger-transformed COG profiles were used as genome descriptors. Blue stars represent the centroid positions of functional genome Groups I, II and III, found to contribute significantly to variations in COG profiles as determined by Monte-Carlo permutation tests. Values displayed on the diagram axes refer to the percentage variation in the total dataset explained by the respective axis. Samples (i.e., genomes) are plotted in the ordination diagram in accordance with Euclidean distances calculated for each pair of genomes based on their COG relative abundance profiles. Arrows represent COGs displaying positive correlation fit > 99% with their corresponding genome group(s), all of which are listed in Supplementary Table S6. COGs highlighted in red have been approached more thoroughly in this study. Note the closer functional similarity between members of the “Stappia group” (Pseudovibrio and Labrenzia, formally belonging to the family Rhodobacteraceae in the order Rhodobacterales) to the genus Anderseniella (order Rhizobiales) (Group III) than to other genera of the Rhodobacteraceae family (Group I).

Group-specific genome features

Figure 3 displays COG entries showing > 99% fit range with the above-mentioned functional genome groups, allowing us to quickly identify a suite of COGs (n = 73) (Supplementary Table S6), occurring in all genomes within one particular functional group and absent in the remainder. Within the 13 COGs found to be exclusive to Group I under this approach, seven could not be assigned a function and the remainders were involved with nutrient transport and metabolism, DNA replication and repair, and cell wall and ribosome biogenesis (Supplementary Table S6). In contrast, more ecologically informative COGs characterized functional groups II and III. Among these, we highlight COG1345 (Flagellar capping protein), COG1619 (Microcin C7 resistance protein MccF), COG3670 (lignostilbene dioxygenase), and COG4787 (Flagellar basal body rod protein), exclusive to Group III. Further, a protease II entry (COG1770, protein catabolism), two inorganic pyrophosphatases (COG0221; COG3808) important in lipid degradation and inorganic phosphate production, and a type IV pili component (COG5461) generally important for adherence, movement and host colonization - all traits usually regarded as host-associated adaptive features - were shared by Groups II and III while absent in Group I. To further determine characteristic genomic traits of the tightly clustering Group I (Roseobacter clade genomes), their COG profiles were, collectively, compared with those of the remaining five alphaproteobacterial genomes from Groups II and III using White’s non-parametric t-test (White et al., 2009). This quantitative comparison revealed 306 COGs differentially abundant between Group I and Groups II-III (Supplementary Table S6). COGs related to ABC transporters, sulphate/phosphate metabolism and secondary metabolite biosynthesis (COG class Q) were generally enriched in Group I, with N-acyl-L-homoserine lactone synthetases (COG3916) being typical of it. In contrast, predicted xylanase/chitin deacetylases (COG0726) and eukaryotic-like protein (ELP) COGs (COG0666, COG0790, and COG0457) were more abundant in functional genome Groups II-III (Supplementary Table S6, see also Table 2).

Table 2 COG entries involved in Restriction-Modification systems (V), Polyketide biosynthesis (Q), and Eukaryotic-like protein repeats (R) across the genomes of sponge-associated Alphaproteobacteria analyzed in this study.

Alphaproteobacteria cultivated from S. officinalis share a versatile metabolism

Nutrient metabolism and cycling

All Alphaproteobacteria strains had several ABC-type transporter-encoding genes in common for the transport of sugars, dipeptides and branched-chain amino acids. Likewise, they possessed several copies of nitroreductase-encoding genes (COG0778) involved in the reduction of nitrogen-containing aromatic compounds. The nitrogen regulatory protein PII (COG0347), which mediates cell response to N availability, was also present with at least two gene copies in each genome, along with ammonia permease encoding genes (COG0004). Many important sulphur metabolic functions were common to the Alphaproteobacteria genomes. All contained several gene copies encoding for arylsulfatase A (COG3119), which breaks down sulphatides thus liberating sulphate, sulphate permeases (COG0659), taurine deoxygenase TauD (COG2175, taurine catabolism), sulphur transferases (COG2897), 3′-phosphoadenosine 5′-phosphosulfate (PAPS) 3′-phosphatase (COG1218; enzyme involved in sulphur assimilation and/or sulphate reduction) and sulphite reductases (COG0155; enzymes catalysing the reduction of sulphite (SO32−) to hydrogen sulphide (H2S)). All strains also shared ABC-type components involved in phosphate transport (COG0226, COG1117, COG0573, COG0581), a phosphate uptake regulator (COG0704), and genes encoding for guanosine polyphosphate pyrophosphohydrolases/synthetases (COG0317).

We found that all genomes shared several genes encoding for proteins that contain or require B vitamins including thiamine (B1, COG0352), riboflavin (B2, COG0054; COG0307, COG1985), nicotinic acid (B3, COG1057), pyridoxamine phosphate oxidase (B6, COG0259), biotin (B7, COG0340), and cobalamin (B12, COG4547) (Supplementary Table S5). The presence of riboflavin synthase alpha and beta chains and of pyridoxal phosphate biosynthesis protein (PdxJ) encoding genes confirms the potential synthesis of vitamin B2 and B6 by all genomes.

Defence, detoxification and antibiotic resistance

All genomes possessed gene copies for cation and Na+ driven multidrug efflux pumps, ABC-type multidrug efflux systems and antimicrobial peptide transport systems (Supplementary Table S5). Hydrolases of the metallo-beta-lactamase superfamily and beta-lactamase class C were collective antibiotic resistance functions, whereby the Sphingomonadales strains Sphingorhabdus sp. Alg231-15 and Erythrobacter sp. Alg231-14 had the highest gene copy numbers with 16 and 11 genes, respectively. A gene encoding for an uncharacterized protein (COG1968) conveying resistance against the polypeptide antibiotic bacitracin was also detected (Supplementary Table S5). A shared catalase (peroxidase 1, COG0376) encoding gene could scavenge reactive oxygen species (ROS), while several glutathione S-transferase gene copies (COG0625) in each genome could aid in xenobiotic detoxification/oxidative stress. The genomes also harboured between one and three arsenate reductase-encoding genes for the reduction of arsenate to arsenite in arsenic detoxification processes. Moreover, all genomes were equipped with varied restriction-modification (R-M) systems (i.e. endonucleases) e.g. involved in anti-viral defence, but only one single R-M system (COG1403) was common to all of them (Table 2). Likewise, all genomes possessed genes involved in the biosynthesis of polyketides, but only one COG entry (COG5285 - mitomycin antibiotics/fumonisin) was shared by all genomes (Table 2).

Eukaryotic-like protein (ELP) encoding genes

Genes encoding for eukaryotic-like proteins (ELPs) usually thought to play a role in sponge-microbe interactions including ankyrin repeats (ANKs), tetratricopeptide repeats (TPRs), WD40 proteins, and pyrroloquinoline quinone (PQQ) were identified in all ten alphaproteobacterial genomes (Table 2). However, only Anderseniella sp. Alg231-50 possessed all the above-mentioned ELP types, and leucine-rich repeats (LRR) were detected only in Pseudovibrio sp. Alg231-02. Also, the Anderseniella strain, together with Labrenzia sp. Alg231-36, possessed the highest numbers of gene copies for the respective ELP motifs (Table 2).

Secondary metabolite gene clusters

Using antiSMASH, all strains except Rhodobacteraceae bacterium Alg231-04 and Erythrobacter sp. Alg231-14 were found to harbour polyketide synthase (PKS)/non-ribosomal peptide (NRPS) gene clusters but, congruent with COG-based annotation (Table 2), diverged in terms of the diversity and types of compounds predicted to be produced (Fig. 4, Supplementary Table S7). Likewise, terpene synthase clusters were detected in eight of the ten strains but not in Loktanella sp. Alg231-35 and Rhodobacteraceae bacterium Alg231-04 (Fig. 4). Bacteriocin (peptidic toxins) gene clusters were detected for all cultivated strains except Anderseniella sp. Alg231-50. In contrast, only the Anderseniella strain harboured a gene cluster encoding for the osmolyte ectoine. Corroborating COG annotations, genes encoding for homoserine lactone signalling molecules were identified via antiSMASH in all Roseobacter clade genomes (Group I, Fig. 3) and, in addition, in Labrenzia sp. Alg231-36 (Fig. 4, Supplementary Table S7).

Figure 4
figure 4

Phylogenomic tree and secondary metabolite biosynthesis potential of Alphaproteobacteria species cultivated from Spongia officinalis. The tree was generated using PHYLIP within the EDGAR environment. The neighbor joining method was applied on a matrix of Kimura distances between amino acid sequences predicted from all protein-encoding genes common to the ten genomes (core genes, n = 587). The scale bar represents the residue substitutions per site. Bootstrap values (300 repetitions) are shown on tree nodes. Sequence alignments were performed using MUSCLE. The heat-map shows the average nucleotide identity (ANI) calculated for each pair of genomes. Colored bars next to tree leaves represent gene clusters showing homology to known biosynthetic gene clusters (BGCs, see Supplementary Table S7 for details) after genome-wide screening with antiSMASH.

Cultivatable Alphaproteobacteria share genomic blueprints of symbiosis

We investigated the relative abundance of all COG entries (COG-derived core genome, Supplementary Table S5) or CDSs (RAST-derived core genome, Supplementary Table S3) ascribed to functions present in the core genome that represent so-called “symbiosis factors” or “sponge-associated adaptive features” (e.g., Arylsulfatase A, Carbon monoxide hydrogenases – Cox, Cytochrome P450, resistance to heavy metals and xenobiotics) (Fig. 5). Some of the functions further explored (e.g. taurine metabolism, glutathione metabolism, resistance to antibiotics and metabolism of aromatic compounds) have been previously identified as enriched features of one hitherto uncultivatable, sponge-specific alphaproteobacterial lineage in the order Rhodobacterales26. Often no significant difference in COGs/CDSs relative abundance per functional genome group was found for the traits inspected, except for the quantitative enrichment of glutathione metabolism, taurine dioxygenase and cytochrome P450 encoding genes in Group III and the lower abundance of genes involved in the metabolism of aromatic compounds in the same group. Importantly, while displaying typical genomic features of several thus far uncultivatable sponge symbiotic bacteria, the here cultivated Alphaproteobacteria also possessed traits found to be “de-selected” in the S. officinalis endosymbiotic community in comparison with the surrounding environment23,26. These included, for instance, regulators of c-di-GMP metabolism along with Tad (Tight adherence) pilus- and motility and chemotaxis-encoding genes (Fig. 5).

Figure 5
figure 5

Relative abundance of ecologically informative genomic signatures across functional genome Groups I, II and III (Fig. 3). Columns represent average proportions (%) of genomic features in each functional group ± standard errors. The different letters above error bars indicate significant differences (P < 0.05) between groups. Respective F and P values are presented in the graphs. All data were of equal variance and all data except c-di-GMP-metabolism (5) and widespread colonization island (7) were normally distributed. COG annotations were used to infer the relative abundance of genomic features (1) to (7), determined by the ratio “total number of CDSs in the respective COG entry(ies)/total number of CDSs assigned to COGs” in each genome group. RAST annotations were used to infer the relative abundance of genomic features (8) to (10) (RAST subsystems), determined by the ratio “total number of CDSs in subsystem/total number of CDSs” in each functional group. Functions enriched (1–4, 6, 8, and 10) and depleted (5, 7 and 9) in the S. officinalis endosymbiotic consortium (Karimi et al., 2017), or in sponge-specific and uncultured Alphaproteobacteria lineages (Karimi et al., 2018), could be found across the genomes analysed in this study.

MG50 favours the cultivation of low-abundant sponge-associated Alphaproteobacteria

The available shotgun sequenced metagenomes of S. officinalis, seawater and sediment samples23 were mapped against the ten genomes sequenced in this study. The proportions of metagenome reads aligned to the target genomes were very low (Table 3). Under this strain-level approach, Anderseniella sp. Alg231-50 was the most dominant strain in the sponge metagenome (0.0082%) followed by Labrenzia sp. Alg231-36 (0.0068%) and Ruegeria sp. Alg231-54 (0.0067%), while Rhodobacteraceae bacterium Alg231-30 (0.0007%) clearly was the least abundant. All Alphaproteobacteria genome reads were somewhat more abundant in the seawater metagenome, followed by sediments and then S. officinalis (Table 3). As a frame of comparison with Anderseniella sp. Alg231-50, in this study 7x as many metagenomic reads from S. officinalis were found to align with the genome of the more dominant, uncultivated Rhodospirillaceae symbiont So9, reconstructed previously from the host’s microbial metagenome via genomic binning procedures26. Genus-level relative abundances calculated with the ratio “CDS reads assigned to taxon/total CDS reads in metagenome” returned higher percent values per taxon, but corroborated the trends of higher taxon representativeness in seawater, followed by sediments and then sponge metagenomes (File S1).

Table 3 Percent alignment of total metagenomic reads from S. officinalis, seawater and sediments with the genomes assembled in this study.


To date most bacteria isolated from sponges have been affiliated with the phyla Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria29,35,60,61. Our cultivation method (see File S1 for a detailed discussion) promoted the controlled growth of diverse Alphaproteobacteria species from S. officinalis, in line with the observations of Sipkema et al. (2011) who retrieved a majority of Alphaproteobacteria strains from Haliclona sp. with diverse oligotrophic media. Our bacterial isolation procedure placed sharp focus on distinct colony morphologies, enabling us to cultivate 14 bacterial genera within 48 cultures and to foster unprecedented, deep genome mining of putatively novel genera (Alg231-04 and Alg231-35, File S1), of the first-described genome sequence in the Anderseniella genus - found to possess many adaptive signatures for a symbiotic life-style (File S1), and of several other understudied lineages within the Alphaproteobacteria class. Indeed, only few genome assemblies are currently available on public databases for Labrenzia (22 assemblies), Sphingorhabdus (9), Loktanella (17), Tateyamaria (3), Pseudovibrio (24), Ruegeria (33), and Erythrobacter (89) species (our own assemblies included), in comparison with the number of genome assemblies available for intensively studied marine bacteria such as Vibrio spp. (2,798 genome assemblies). Moreover, although the abovementioned genera have already been cultivated before, most of the strains sequenced in this study (7 in 10) share less than 99 or 98% 16S rRNA gene similarity with the type strain of their closest, described species (Table S1). Finally, the genome-centred strategy employed in this study can be useful in solidifying the phylogeny of unresolved groups, which is likely the case of Pseudovibrio and Labrenzia strains and their placement within the Alphaproteobacteria58,59. It can also aid in the proposal of novel taxa (the case of strains Alg231-04 and Alg231-30) as supported, for instance, by genome-wide ANI/AAI estimates62,63.

It is possible that technical limitations such as insufficient metagenome sequencing depth and/or the usage of only short read lengths which may not align properly with reference genomes64,65 contribute to an underestimation of relative abundances calculated with the metagenome-genome mapping approach used in this study. Nevertheless, for all strains the percentages of aligned metagenome-genome reads were highest in seawater, followed by sediments and only then by sponge microbial metagenomes, a pattern corroborated by genus-level assessment of relative abundances using MG-RAST functional annotation (see File S1). Moreover, we verified that metagenome-genome mapping estimates delivered higher relative abundances for uncultivatable alphaproteobacteria representing dominant sponge symbionts. Altogether, these results indicate that (1) marine sponges are not the primary habitat of the here cultivated Alphaproteobacteria species, (2) bacterial culturing methods tend to sample rare members of the marine sponge microbiome (as suggested by Montalvo, et al.66 and extensively discussed by Hardoim, et al.37, (3) low-abundant sponge symbionts usually captured in culture evolve adaptive features that support a biphasic particle- (“free-living”)/host-associated life-style. Indeed, numerous “symbiosis factors” have been identified in the genomes of Pseudovibrio and Ruegeria spp., prompting extensive discussion on their potential roles in promoting host fitness30,31,67. The debate has, however, often disregarded the in-situ densities of the studied organisms, raising concerns about the net effect of the presumed adaptive features on holobiont functioning68. Evidence exists for the presence of these symbiosis factors in the genomes of both free-living and host-associated representatives of cultivatable sponge symbionts68, reinforcing the biphasic mode of living hypothesis, and such factors have been proposed to underlie the evolution of canonical commensal bacteria such as Escherichia coli69. Here, we delineate the core functional traits of sponge-associated alphaproteobacterial cultures and address their relevance as genomic hallmarks of symbiosis and bimodal life strategies.

In nutritional terms, the presence of arylsulfatase-encoding genes in all genomes, an attribute enriched in the marine sponge microbiome23 and revealed to be common among several uncultivated lineages of sponge symbionts25,26, underlies one possible role of this pool of Alphaproteobacteria species in consuming sulphated polysaccharides. The potential ability to break-down taurine, identified earlier as one adaptive feature of a currently uncultivatable and sponge-enriched Rhodospirillales clade (“SERC”26), was identified, in this study, among many low-abundance and cultivatable Alphaproteobacteria spp. In addition, several genes encoding for vitamin B biosynthesis were shared among our strains. Biotin (B7), thiamine (B1) and cobalamin (B12) biosynthesis capacities are common, for instance, among members of the Roseobacter clade70 and evidence exists for the participation of symbiotic Alphaproteobacteria in nourishing vitamins B1 and B12 required by a marine dinoflagellate (Lingulodinium polyedrum) for growth71. In line with this view, Alpphaproteobacteria spp. could likewise play an important role in providing essential nutrients for sponge growth and functioning.

Each of the studied isolates possessed hundreds of genes conferring resistance to antibiotics and toxic compounds. Particularly intriguing in this regard was the ubiquitous presence of genes encoding arsenate reductase that mediates the reduction of arsenate (As(V)) to arsenite (As(III)) in arsenic detoxification processes72,73. This feature has been recently assigned for the sponge symbiont Entotheonella sp. which mineralizes arsenic and barium in intracellular vesicles74. Furthermore, genes involved in ABC-type multidrug efflux systems, hydrolases of the metallo-beta-lactamase superfamily, and remediation of ROS stress (e.g. gluthathione metabolism genes) underline how versatile the mechanisms of cell detoxification employed by these organisms can be75,76. Such capabilities may substantially increase bacterial fitness within dense and chemically-rich microbial communities, and have been generally reported as distinguishing features of the marine sponge microbiome in cultivation-independent studies12,23,26,54.

All of the analysed alphaproteobacterial genomes have ELPs which are known sponge symbiosis factors because of the role they play in the modulation of cellular protein-protein interactions and in the prevention of symbiont phagocytosis by host cells77,78. Functional genome Groups II and III had altogether higher proportions of ELPs than Group I, suggesting higher affinity of members of the former groups in establishing favourable or more stable interactions with marine sponges. This seems to be particularly true for the Anderseniella and Labrenzia strains, which possessed the higher ELP counts among the surveyed genomes and showed the highest relative abundance values, respectively, in the S. officinalis microbiome. It remains to be determined whether ELPs could likewise be involved in bacterial adaptation to other marine hosts, supporting an emerging, generalist pattern of occurrence of cultivatable Alphaproteobacteria across multiple sessile invertebrates such as ascidians79, corals80, and bryozoans81. Particularly intriguing was also the presence of genes required for the tight-adherence (Tad) pilus secretion machinery in all strains. The Tad locus underlies the assembly of Flp (fimbrial low-molecular-weight protein) pili fundamental for cell aggregation, biofilm formation, surface attachment, host colonization and pathogenesis82,83,84. Along with protein domains known to mediate biofilm formation (e.g. EAL and CGDEF domains involved c-di-GMP metabolism) and a multitude of other cell motility and chemotaxis factors, the Tad locus equips their host cells not only with host-colonization aptitude but also environmental hardiness. All these genomic features were de-selected in the S. officinalis endosymbiotic consortium while being more pronounced, for instance, in sediment metagenomes23, suggesting that they might be more required for persistence in other microniches. We therefore posit that such traits have been subjected to purifying selection to favour the maintenance of a dual life-style among the studied organisms.

Using antiSMASH, we could detect several antibiotic biosynthetic gene clusters across the studied genomes, in line with accumulating in vitro evidence for mild to high antimicrobial activities by sponge-associated Alphaproteobacteria such as Ruegeria, Pseudovibrio, and Labrenzia28,29,31,85,86,87. Particularly, both terpene-synthase and polyketide-synthase (PKS) biosynthetic gene clusters were common among the studied strains, each being present in eight out of ten genomes, while COG annotations predicted PKS-encoding genes for all genomes. The roles and activities of polyketides from sponge symbiotic bacteria have been largely explored in the last fifteen years6,13,14,15, however much less is known about the potential contribution of bacterial symbionts as producers of terpenoids in marine sponges23. Intriguingly, terpenoid biosynthesis has been regularly documented in keratose marine sponges61,88,89, including Spongia officinalis90. Yet the origin of the biosynthesis (host or symbionts) has, to our knowledge, not been specifically addressed by regular chemical screening studies. Sponge-derived diterpenoids have shown antimicrobial activity against pathogenic bacteria such as Pseudomonas aeruginosa88. Dihydrogracilin A, a terpene extracted from Dendrilla membranosa, has been shown to possess immune modulatory and anti-inflammatory action89. In addition, except for Anderseniella, all other Alphaproteobacteria strains possessed the potential to produce bacteriocins commonly regarded to inhibit growth of closely related strains and, as such, considered to be major molecules shaping the structure of microbial communities in situ91. Our results reveal that polyketide, terpene and bacteriocin biosynthesis capacities, recently documented in several Pseudovibrio genomes31,32, are widespread across diverse sponge-associated Alphaproteobacteria, suggesting a pivotal contribution of this clade to the chemical complexity, natural product biosynthesis repertoire and taxonomic composition of the marine sponge microbiome.

In conclusion, the use of simple modifications to regular culture conditions coupled to dedicated genome-wide analysis of marine sponge symbionts enabled unprecedented access to highly versatile metabolisms across diverse understudied Alphaproteobacteria. To improve our capacity to domesticate the so-far uncultivatable portion of the marine sponge microbiome, the design of future culture media should consider our improved understanding of the nutritional requirements of these symbionts acquired via recent metagenomic binning studies25,26, which allow strain-level, deep insights into the physiology of uncultivated bacteria. Here, we disclose manifold genomic blueprints of the marine sponge microbiome12,23,54 across the genomes of several low-abundance, cultivatable symbionts of Spongia officinalis, providing support for the convergent evolution of symbiosis traits above the genus level within a class known for its widespread occurrence in association with sponge hosts, encompassing hundreds of cultivatable and so far uncultivable sponge-associated lineages9,22,26. Certainly, the genomic attributes revealed here are to be found among closely-related, cultivatable Alphaproteobacteria - as emphasized above for Pseudovibrio and Ruegeria strains - retrieved not only from sponges but also from other particle- and host-associated microniches, suggesting that such traits are widespread across diverse lineages of generalist marine bacteria. Taken together, the outcomes compiled here contribute to novel insights into the potential roles of alphaproteobacterial communities in mediating molecular interactions and shaping the structure of the marine sponge microbiome. They further open new opportunities for study regarding the roles of low-abundace microorganisms as consistent reservoirs of functional redundancy within nature’s microbiomes, likely promoting the resilience of host-associated microbial assemblages in the marine realm.