The marine Roseobacter group is a subfamily-level lineage in Alphaproteobacteria and plays an important role in global carbon and sulfur cycling [1, 2]. It is highly abundant in coastal environments, accounting for up to 20% of all bacterial cells [3,4,5]. Over 300 species and 100 genera have been described [6], the vast majority of which harbor large and variable genomes and grow readily on nutrient-rich solid media, which are not representative of the niches found in the oligotrophic oceans. Early culture-independent 16S rRNA gene surveys showed that the oceanic roseobacters are represented by a few uncultivated lineages [1, 7]. Recently, novel cultivation techniques and single-cell genomics have made available (partial) genome sequences of several previously uncultivated lineages including NAC11-7 [8], DC5-80-3 (also called RCA) [9, 10], and CHAB-I-5 [11, 12]. Although these lineages are scattered across the Roseobacter phylogeny, together they form a pelagic Roseobacter cluster (PRC). The PRC members consistently harbor smaller genomes and show more similar genome content compared to other roseobacters [11]. Understanding their evolutionary histories is essential for explaining how the genetic and metabolic diversity of the pelagic Roseobacter lineages arose, which in turn helps clarify their roles in oceanic carbon and sulfur cycles. However, most PRC members form orphan branches and lack closely related reference genomes, which hampers further understanding of their evolutionary trajectories.

Here, we isolated eight closely related roseobacters from several ocean regions that consistently possess some of the smallest genomes (~2.6 Mb) among all known roseobacters. Together they form a novel Roseobacter lineage, which we named ‘CHUG’ (Clade Hidden and Underappreciated Globally), that is abundant and active in global oceans. Unlike other PRC lineages, CHUG members are uncorrelated with chlorophyll a (Chl-a) concentration in their global distribution and cannot synthesize vitamin B12 de novo, which is often the metabolite roseobacters supply to phytoplankton during their symbiosis [2, 13,14,15]. In contrast to the model roseobacter Ruegeria pomeroyi DSS-3, which is known to interact with phytoplankton species [16, 17], CHUG members cannot use many metabolites commonly released by marine phytoplankton groups. Therefore, the reductive evolution of CHUG may also indicate a dissociation from phytoplankton, a feature so far unique to CHUG among pelagic roseobacters.

Materials and methods

Detailed methods are described in Supplementary Text 1. Briefly, samples were collected from surface water of the South China Sea, the East China Sea and the northern Gulf of Mexico. Eight CHUG isolates were retrieved following different dilution cultivation procedures and sequenced with Illumina platforms. The raw reads were quality trimmed with Trimmomatic v0.36 [18] and assembled with SPAdes v3.10.1 [19]. Only contigs with length >2000 bp and sequencing depth >5x were retained. Among these, the isolate HKCCA1288 was further sequenced with PacBio Sequel platform and assembled using Unicycler v0.4.6 [20] to obtain a complete and closed genome. Protein-coding genes were predicted with Prokka v1.12 [21].

The average nucleotide identity (ANI) between genomes was calculated using fastANI v1.3 [22]. The assembled genome size, gene number, coding density, and GC content of each genome were predicted using CheckM v1.0.7 [23], whereas the estimated genome size was adjusted as (assembled genome size)/(completeness + contamination) [24]. Pseudogenes were predicted following our recent study [25], and other genomic features were summarized using custom scripts (see Code availability). The phylogenetic ANOVA analyses were performed to compare the analyzed genomic traits while controlling for the evolutionary history of those traits using the ‘phylANOVA’ function of the ‘phytools’ R package [26].
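The contig filter and the genome size adjustment described in the methods above can be sketched in a few lines of Python. This is only an illustration, not the custom scripts referenced under Code availability: the function names, the tuple layout, and the example values are hypothetical, and completeness and contamination are assumed to be given as fractions (CheckM reports percentages).

```python
def filter_contigs(contigs, min_length=2000, min_depth=5.0):
    """Keep contigs longer than min_length bp with depth above min_depth x.

    `contigs` is a list of (name, length_bp, depth) tuples (hypothetical layout).
    """
    return [c for c in contigs if c[1] > min_length and c[2] > min_depth]


def estimated_genome_size(assembled_bp, completeness, contamination):
    """Adjust assembled size as assembled / (completeness + contamination).

    Both completeness and contamination are assumed to be fractions (0..1).
    """
    denominator = completeness + contamination
    if denominator <= 0:
        raise ValueError("completeness + contamination must be positive")
    return assembled_bp / denominator


# Hypothetical example values, not taken from Table S1.
contigs = [("c1", 2500, 10.0), ("c2", 1500, 30.0), ("c3", 5000, 3.0)]
kept = filter_contigs(contigs)
size = estimated_genome_size(2_500_000, completeness=0.985, contamination=0.0)
```

With these made-up inputs only the first contig passes both thresholds, and a draft that is 98.5% complete with no contamination is scaled up slightly from its assembled size.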

To characterize the global occurrence and activity of CHUG, the TARA Ocean metagenomic and metatranscriptomic sequencing data with size fractions up to 3 µm (prokaryote-enriched) [27, 28] and additional metagenomic sequencing data with size fraction of 5–20 µm (nanoplankton-enriched) [29] were mapped to all 79 roseobacters studied here using bowtie v2.3.2 [30] and BLASTN v2.9.0+ [31]. Only reads sharing >95% similarity and >80% alignment to their best hit were kept for the calculation of relative abundance and activity, which is approximated by Reads Per Kilobase per Million mapped reads (RPKM). The relative abundances of CHUG and other PRC lineages across the global oceans were compared using the Wilcoxon test. The correlation analysis between their relative abundances and environmental factors was performed using the ‘rcorr’ function in the ‘Hmisc’ R package [32], and the significance level was adjusted using stringent Bonferroni correction for multiple comparisons. These analyses were performed for the CHUG lineage as a whole rather than for each individual genome because these isolates are very closely related and performed equally well in metagenomic read recruitment.
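As a rough illustration of the read filter and the RPKM normalization described above, the sketch below applies the two thresholds to a single alignment and then normalizes a read count. The function names and example numbers are hypothetical. Two interpretations are assumptions on our part: that ">80% alignment" means the aligned length exceeds 80% of the read length, and that the "million mapped reads" term is the total number of mapped reads in the sample.

```python
def passes_filter(identity_pct, aligned_len, read_len,
                  min_identity=95.0, min_aligned_frac=0.80):
    """Return True if a best hit exceeds both recruitment thresholds."""
    return (identity_pct > min_identity
            and aligned_len / read_len > min_aligned_frac)


def rpkm(recruited_reads, reference_length_bp, total_mapped_reads):
    """Reads Per Kilobase of reference per Million mapped reads."""
    kilobases = reference_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return recruited_reads / kilobases / millions


# Hypothetical example: 500 reads recruited to a 2.5 Mb genome
# in a sample with 10 million mapped reads.
value = rpkm(500, 2_500_000, 10_000_000)
```

Summing recruited reads over the eight CHUG genomes before normalizing would correspond to treating the lineage as a single unit.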

The Roseobacter phylogeny was constructed based on 120 bacterial marker genes [33], and the sampling of reference Roseobacter genomes included in the phylogeny followed a previous study [34]. Marker genes were each aligned at the amino acid sequence level using MAFFT v7.222 [35] and trimmed using trimAl v1.4.rev15 [36]. Next, a maximum likelihood (ML) phylogenomic tree was built using IQ-TREE v1.6.2 [37] based on the concatenated alignments. To characterize the similarity between roseobacters at the genome content level, a binary matrix showing the presence and absence pattern of orthologous gene families predicted by OrthoFinder v2.2.1 [38] was used to construct the genome content dendrogram with IQ-TREE v1.6.2 [37]. The gene copy number of each orthologous family was further used to estimate the ancestral genome content for CHUG, its sister group and the outgroup using BadiRate v1.35 [39] on top of the ML phylogenomic tree. A potential role of random genetic drift in driving CHUG genome reduction was assessed by comparing the ratio of the radical nonsynonymous nucleotide substitution rate (dR) to the conservative nonsynonymous nucleotide substitution rate (dC) on the ancestral branch leading to the CHUG cluster with the same ratio on the ancestral branch leading to its sister clade, following our previous protocol [40]. The phylogenetic signal of a specific gene (e.g., coxL, pdo, sox) or trait (e.g., light utilization) was estimated with the ‘phylosig’ function of the ‘phytools’ R package [26]. The association of a gene or trait with a particular category (e.g., PRC or non-PRC) was tested using the ‘binaryPGLMM’ analysis in the ‘ape’ R package [41] when the gene or trait showed a strong phylogenetic signal, and using the χ2 test otherwise.
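The step from per-family gene copy numbers to the binary presence/absence matrix can be sketched as follows. This is a minimal illustration with hypothetical genome and family names. It does not parse OrthoFinder output, and the nested-dictionary input layout is an assumption made for the example.

```python
def presence_absence(copy_numbers):
    """Convert {genome: {family: copies}} into a binary matrix.

    Returns the sorted family list and {genome: [0/1, ...]} in that order.
    """
    families = sorted({fam for counts in copy_numbers.values() for fam in counts})
    matrix = {
        genome: [1 if counts.get(fam, 0) > 0 else 0 for fam in families]
        for genome, counts in copy_numbers.items()
    }
    return families, matrix


def copies_per_family(counts):
    """Mean gene copy number over the families present in one genome."""
    present = [n for n in counts.values() if n > 0]
    return sum(present) / len(present)


# Hypothetical toy input.
example = {
    "genome_A": {"OG0001": 2, "OG0002": 1},
    "genome_B": {"OG0002": 1, "OG0003": 0},
}
families, matrix = presence_absence(example)
```

The same copy-number table yields both the binary matrix used for a gene content dendrogram and a copies-per-family summary for each genome.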

To validate the vitamin B12 auxotrophy in CHUG members, a growth assay was performed separately on HKCCA1288 as the experimental CHUG strain and on the model roseobacter strain Ruegeria pomeroyi DSS-3 [16] as the positive control. Strains were each cultivated in a defined medium for 96 h in the presence or absence of vitamin B12, and samples were collected every 12 h for cell counting using a flow cytometer (Guava EasyCyte Plus, MA, USA) equipped with a fluorescence detector. To test the differences in substrate (190 carbon sources) utilization, the two strains were assayed with the phenotype microarray (PM) technology from BiOLOG following our recent study [42]. All experiments were performed in triplicate.

Results and discussion

The CHUG diversity

The eight CHUG genomes share ≥99.7% 16S rRNA gene identity and ≥93% ANI. The CHUG cluster further exhibits ≥98.2% 16S rRNA gene identity when sequences of a few uncultivated members are included (Fig. S1), which is comparable to other pelagic Roseobacter lineages, such as 98% [10] for DC5-80-3 and 96% [43] or 98% [7] for CHAB-I-5. CHUG genomes are relatively distantly related to their sister group (Fig. 1A), showing ≤96.5% 16S rRNA gene identity and ≤71% ANI. Unlike their sister group and the outgroup members isolated from various habitats (e.g., saline lake, algal culture, and coastal sediment; Table S1) other than the pelagic environment, CHUG members were collected exclusively from seawater. There are, however, some important differences in the source ecosystems of the eight CHUG isolates. While five isolates were collected from regular coastal seawater, two (HKCCA1065 and HKCCA1288) and one (HKCCD6035) were isolated from the ambient seawater of the brown alga Sargassum hemiphyllum and of the coral Platygyra acuta, respectively (Table S1).

Fig. 1: Phylogenomic tree and gene content dendrogram of roseobacters.

A Maximum likelihood phylogenomic tree showing the position of CHUG in the Roseobacter group. The phylogeny was inferred using IQ-TREE v1.6.2 [37] based on a concatenation of 45,904 amino acid sites over 120 conserved bacterial proteins [33]. Solid circles in the phylogeny indicate nodes with bootstrap values >95%. The potential of aerobic (key gene cobG, red) and anaerobic (key gene cbiX, green) cobinamide synthesis (the first stage of Vitamin B12 synthesis) is labeled at the tips. B Dendrogram of the same Roseobacter genomes based on the presence/absence pattern of orthologous gene families.

Although not monophyletic in the phylogenomic tree (Fig. 1A), CHUG and seven other genomes from taxa previously sampled from pelagic environments form a coherent group called the ‘Pelagic Roseobacter Cluster’ (PRC; Fig. 1B) [11]. One previously identified PRC member, Roseobacter sp. R2A57 (4.13 Mb), was not affiliated with PRC in the present study. To facilitate our analysis, we divided the 79 roseobacters used here into five groups: CHUG, its sister group, the outgroup of CHUG and its sister group, other PRC members and other reference roseobacters, with eight, five, six, seven and 53 genomes, respectively.

Genomic features

Among the eight CHUG genomes, one (HKCCA1288) is closed with 2.66 Mb and the remaining draft genomes are nearly complete (≥98.5%) according to CheckM predictions (Table S1). Among other roseobacter genomes under comparison, at least 17 genomes are closed and the remaining are nearly complete (≥96.5%) (Table S1). Based on the assembled genome sizes, CHUG members possess much smaller genomes (2.52 ± 0.07 Mb, Fig. 2A) than an average roseobacter (4.16 ± 0.68 Mb). Further, their genome sizes are comparable to those of the NAC11-7 cluster represented by the strain HTCC2255 (estimated complete size of 2.34 Mb), which is a phylogenetically basal roseobacter with the smallest genome among all known roseobacters [44]. As in HTCC2255, no plasmids were found in CHUG. However, the coding density of CHUG (91.7 ± 0.5%, Fig. 2B) shows no significant difference from that of its sister group and the outgroup based on the phylogenetic ANOVA analysis (p > 0.05, ‘phylANOVA’; the same test used below unless stated otherwise). CHUG genomes have a lower genomic GC content (55.4 ± 0.8%, Fig. 2C) compared to their sister group (63.5 ± 1.6%, p < 0.05), although no significant difference was identified compared to the outgroup. In terms of pseudogenes, the number (99 ± 24, Fig. 2D) and ratio (3.9 ± 0.9%, Fig. 2E) in CHUG members are not significantly different from those of the sister group and outgroup. The seven other PRC members also have smaller genomes (3.26 ± 0.51 Mb, Fig. 2A) and a reduced GC content (49.6 ± 5.5%, Fig. 2C) compared to the 53 other reference roseobacters (genome size: 4.32 ± 0.64 Mb, GC content: 61.9 ± 4.1%; p < 0.01), but no significant differences were identified between the two groups in the coding density (Fig. 2B) or in the number (Fig. 2D) and ratio (Fig. 2E) of pseudogenes.

Fig. 2: Genomic feature comparisons between CHUG, their sister group, the outgroup, seven other PRC members, and other reference roseobacters.

The significance level of differences in genomic features between CHUG and the other four groups is shown in red, while that between the seven other PRC members and the remaining groups is shown in blue. Statistical tests were performed using phylANOVA analysis [26] for genome size (A), coding density (B), GC content (C), number of pseudogenes (D), ratio of pseudogenes (E), C-ARSC (F), N-ARSC (G), number of genes (H), number of orthologous families (I), and gene copy number per orthologous family (J). The markers * and ** denote p < 0.05 and p < 0.01, respectively. C-ARSC carbon atoms per amino-acid-residue side chain, N-ARSC nitrogen atoms per amino-acid-residue side chain.

CHUG genomes show increased use of carbon atoms per amino-acid-residue side chain (C-ARSC, 2.833 ± 0.005, Fig. 2F) compared to the sister group (2.799 ± 0.004, p < 0.05). However, CHUG members show no significant difference in C-ARSC compared to the outgroup, nor in the use of nitrogen atoms per amino-acid-residue side chain (N-ARSC, 0.345 ± 0.002, Fig. 2G) compared to either the sister group or the outgroup. Likewise, the seven other PRC genomes have significantly higher C-ARSC (2.879 ± 0.031, Fig. 2F) than the 53 other reference roseobacters (2.817 ± 0.026, p < 0.01), but no significant difference was found between their N-ARSC values (Fig. 2G).
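For readers unfamiliar with these metrics, the sketch below shows one way to compute C-ARSC and N-ARSC from protein sequences. The side-chain atom counts are standard amino acid chemistry. How residues are pooled is not stated in the text, so averaging over all residues of all proteins and skipping non-standard residue codes are assumptions of this example.

```python
# Carbon and nitrogen atoms in the side chain of each standard amino acid.
SIDE_CHAIN_C = {
    "G": 0, "A": 1, "S": 1, "C": 1, "T": 2, "D": 2, "N": 2, "P": 3,
    "V": 3, "E": 3, "Q": 3, "M": 3, "H": 4, "I": 4, "L": 4, "K": 4,
    "R": 4, "F": 7, "Y": 7, "W": 9,
}
SIDE_CHAIN_N = {aa: 0 for aa in SIDE_CHAIN_C}
SIDE_CHAIN_N.update({"N": 1, "Q": 1, "K": 1, "W": 1, "H": 2, "R": 3})


def arsc(proteins, atom_table):
    """Average atoms per amino-acid-residue side chain over all residues.

    Residues outside the 20 standard codes (e.g., X or *) are skipped.
    """
    total_atoms = 0
    residues = 0
    for seq in proteins:
        for aa in seq.upper():
            if aa in atom_table:
                total_atoms += atom_table[aa]
                residues += 1
    if residues == 0:
        raise ValueError("no standard residues found")
    return total_atoms / residues


# Hypothetical toy proteome of two short peptides.
toy = ["MKR", "GAW"]
c_arsc = arsc(toy, SIDE_CHAIN_C)  # (3 + 4 + 4 + 0 + 1 + 9) / 6
n_arsc = arsc(toy, SIDE_CHAIN_N)  # (0 + 1 + 3 + 0 + 0 + 1) / 6
```

Because arginine, histidine, lysine, asparagine, glutamine, and tryptophan are the only residues with side-chain nitrogen, N-ARSC is driven largely by the frequency of these six amino acids.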

Consistent with their genome size differences, CHUG genomes contain significantly fewer coding genes (2486 ± 78, Fig. 2H) than the outgroup (3939 ± 214, p < 0.01) and the seven other PRC genomes (3253 ± 545 genes, p < 0.05). The CHUG genomes contain 2215 ± 70 orthologous gene families (Fig. 2I) with 1.12 ± 0.01 gene copies per family (Fig. 2J). By comparison, the outgroup genomes contain 3259 ± 130 orthologous gene families (p < 0.01) with 1.20 ± 0.04 (p > 0.05) gene copies per family, while the seven other PRC genomes possess 2678 ± 398 orthologous gene families (p > 0.05) with 1.21 ± 0.05 (p < 0.01) gene copies per family. No significant difference was found between CHUG and the sister group in the number of genes, orthologous gene families or gene copies per family. Additionally, while the number of genes and gene copies per family of the seven other PRC genomes are not significantly different from those of the 53 other reference roseobacters (Fig. 2H, J), the seven other PRC genomes have fewer orthologous families than the 53 other reference roseobacters (3362 ± 362, p < 0.01, Fig. 2I).

Global distribution and ecological drivers

The eight CHUG members recruited 0.0005% and 0.0008% of all metagenomic (Fig. 3A) and metatranscriptomic (Fig. 3B) reads from the global TARA Ocean datasets with size fractions up to 3 µm (prokaryote-enriched) [27, 28], respectively. The CHUG members appear to be less abundant and less active than a few other PRC members such as the strain HTCC2255 (NAC11-7), the strain SB2 (CHAB-I-5) and Planktomarina temperata RCA23 (RCA or DC5-80-3) (Wilcoxon test, p < 0.01 for each). A similar pattern was also found using TARA Ocean metagenomic sequencing data with the size fraction of 5–20 µm (nanoplankton-enriched; Fig. 3C) [29]. From a gene-centric perspective, 58.6% ± 1.2% and 88.3% ± 12.7% of the genes from the eight CHUG genomes and the seven other PRC members recruited TARA metatranscriptomic reads, respectively (see Supplementary Text 2.1 for more details).

Fig. 3: The global distribution of CHUG and its ecological correlation with environmental factors.

A–C The relative abundance of CHUG and other PRC members in the bacterial communities based on recruitment analysis using the metagenomic TARA Ocean sequencing samples with size fractions up to 3 µm (A), metatranscriptomic sequencing samples with size fractions up to 3 µm (B), and metagenomic sequencing samples with size fraction of 5–20 µm (C). D, E Correlation analysis between the relative abundance of CHUG and other PRC members and environmental parameters measured in the TARA Ocean metagenomic (D) and metatranscriptomic (E) samples. The p value is adjusted using stringent Bonferroni correction. Nonsignificant correlations (p > 0.05 after adjustment) are indicated by crosses. AO Arctic Ocean, NAO North Atlantic Ocean, SAO South Atlantic Ocean, IO Indian Ocean, MS Mediterranean Sea, NPO North Pacific Ocean, SPO South Pacific Ocean, RS Red Sea, SO Southern Ocean, fCDOM fluorescence, colored dissolved organic matter.

The relative abundances of typical PRC lineages are positively correlated with each other, with Chl-a concentration (a proxy for phytoplankton biomass [45]), and with total carbon in both metagenomic and metatranscriptomic samples (Fig. 3D, E; Bonferroni corrected p < 0.05). This is interesting but not entirely new. DC5-80-3 and NAC11-7, for example, were previously shown to be positively correlated with phytoplankton blooms [1, 4, 46,47,48,49]. In the case of CHAB-I-5, while such a correlation was not evident in a previous study with a more limited sampling effort [11], a few uncultivated members carry signature genes mediating organismal interactions (e.g., type VI secretion system and quorum sensing) [12], potentially enabling them to explore microenvironments including phytoplankton. On the other hand, the relative abundance and activity of CHUG are not correlated with those of other PRC members, with Chl-a concentration, or with total carbon (Fig. 3D, E; Bonferroni corrected p > 0.05). The same result was also obtained for Rhodobacteraceae bacterium HIMB11 [50], another PRC member whose ecology and evolution have been studied. These results suggest that CHUG (and perhaps HIMB11) may take a free-living lifestyle decoupled from marine phytoplankton.
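The Bonferroni adjustment behind these significance calls is simple enough to sketch. The example below multiplies each raw p value by the number of tests and caps the result at 1. The factor names and p values are hypothetical, and treating all correlations in one panel as a single family of tests is an assumption.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a {label: raw_p} mapping (capped at 1.0)."""
    n_tests = len(p_values)
    return {label: min(1.0, p * n_tests) for label, p in p_values.items()}


def significant(adjusted, alpha=0.05):
    """Labels whose adjusted p value does not exceed alpha."""
    return [label for label, p in adjusted.items() if p <= alpha]


# Hypothetical raw p values for three environmental factors.
raw = {"factor_1": 0.001, "factor_2": 0.02, "factor_3": 0.6}
adjusted = bonferroni(raw)
kept = significant(adjusted)
```

In this toy case the second factor would pass an unadjusted 0.05 cutoff but not the adjusted one, which is why the correction is described as stringent.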

CHUG genome reduction and vitamin B12 auxotrophy

The last common ancestor (LCA) of the CHUG cluster was estimated to have 2320 genes, 2134 orthologous gene families (1.09 gene copies per family), and a genome size of 2.35 Mb. On the ancestral branch leading to the LCA of CHUG, 172 families (185 genes) were gained and 406 families (425 genes) were lost, while 28 and 52 families (30 and 79 genes) underwent copy number increase and decrease, respectively. Compared to its sister group and the outgroup, CHUG members lost 412 Kb (9.8%) on the ancestral branch leading to its LCA (filled triangle in Fig. 4A).
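The net change implied by these counts can be worked out directly. The short calculation below uses only the numbers quoted above. Treating the 30 and 79 genes as copies added to or removed from families that persisted on the branch is our reading of the text, not something stated explicitly.

```python
# Counts reported for the branch leading to the CHUG LCA.
families_gained, families_lost = 172, 406
genes_gained, genes_lost = 185, 425
copy_increase_genes, copy_decrease_genes = 30, 79
lca_genes, lca_families = 2320, 2134

# Net change in orthologous families on the branch.
net_families = families_gained - families_lost

# Net change in genes, counting copy number changes (assumed interpretation).
net_genes = (genes_gained + copy_increase_genes) - (genes_lost + copy_decrease_genes)

# Mean gene copies per family in the reconstructed LCA.
copies_per_family = round(lca_genes / lca_families, 2)

print(net_families, net_genes, copies_per_family)  # -234 -289 1.09
```

Under this reading, the branch shows a net loss of 234 families and 289 genes, and the ratio of genes to families reproduces the reported 1.09 copies per family.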

Fig. 4: The phyletic pattern of select genes.

The solid and open circles in the right panel represent the presence and absence of the genes, respectively. A The phyletic pattern in CHUG, its sister group and its outgroup. The phylogenomic tree shown in the left panel is pruned from the full phylogenomic tree shown in Fig. 1A, and branch length is ignored for better visualization. The ancestral genome reconstruction was performed with BadiRate v1.35 [39]. Each ancestral and leaf node is associated with three numbers, representing the total number of orthologous gene families at this node, and the numbers of orthologous gene families gained and lost on the branch leading to this node. The last common ancestor (LCA) of CHUG, the LCA shared by CHUG and its sister group, and the LCA shared by CHUG, its sister group and the outgroup are marked with a filled triangle, a filled circle, and a filled star, respectively. B The estimated phyletic pattern of the above-mentioned three LCAs. C The gene presence and absence pattern in CHUG and the seven other PRC genomes. The dendrogram in the left panel is pruned from that shown in Fig. 1B.
thiE thiamine-phosphate pyrophosphorylase, pdxH pyridoxamine 5′-phosphate oxidase, bioB biotin synthase, cobG precorrin-3B synthase, cbiX sirohydrochlorin cobaltochelatase, cobV adenosylcobinamide-GDP ribazoletransferase, btuB vitamin B12 transporter, amtB ammonium transport system, glnBD nitrogen regulatory protein P-II, ntrBC nitrogen regulation two-component system, ntrXY nitrogen regulation two-component system, ureABC urease, urtABCDE urea transport system, nrtABC nitrate/nitrite transport system, phoBR two-component phosphate regulatory system, pstABCS phosphate transport system (high affinity), phnGHIJKLM carbon-phosphorus (C-P) lyase, phoX alkaline phosphatase, plcP phospholipase C, cheAB chemotaxis family protein, fliC flagellin, luxR quorum-sensing system regulator, virB type IV secretion system protein, vasKF type VI secretion system protein, GTA gene transfer agent, fucA l-fuculose-phosphate aldolase.

Since the CHUG genomes experienced net DNA and gene losses, we explored whether metabolic auxotrophy (i.e., inability to synthesize a compound required for growth) arose as a result of these losses. The extant CHUG genomes harbor the complete pathways for the synthesis of all 20 amino acids, and many of the genes involved, such as those for the synthesis of lysine (dapD) and methionine (ahcY), were transcribed in the wild (Fig. S2). Specifically, genes encoding both the vitamin B12-dependent (metH) and -independent (metE) methionine synthases were expressed, and the expression level of the former was approximately twice that of the latter (Fig. S2). CHUG members further encode the key genes for thiamine (vitamin B1) synthesis (thiamine-phosphate pyrophosphorylase, thiE) and pyridoxine (vitamin B6) synthesis (pyridoxamine 5′-phosphate oxidase, pdxH). Nevertheless, the key gene for biotin (vitamin B7) synthesis (biotin synthase, bioB) was found neither in CHUG nor in the sister group and the outgroup, suggesting that the biotin auxotrophy in CHUG was not part of their net gene losses.

Intriguingly, CHUG is auxotrophic for cobalamin (vitamin B12), which most roseobacters can synthesize [2]. This was validated using a growth assay, in which the CHUG strain HKCCA1288 did not grow in the defined medium lacking vitamin B12 but grew well when vitamin B12 was supplemented (Fig. 5A). By comparison, the model roseobacter Ruegeria pomeroyi DSS-3 grew equally well in the presence or absence of vitamin B12 (Fig. 5A). Mapping of the vitamin B12 de novo synthesis pathway to the phylogeny (Fig. 1A) indicates that the loss of this capability was most likely associated with the genome reduction leading to the LCA of the CHUG cluster. On the other hand, no genome content changes related to vitamin B12 synthesis were inferred by the ancestral genome reconstruction (Fig. 4B). This discrepancy can be ascribed to the facts that the de novo synthesis of cobinamide, the key precursor of vitamin B12, has two non-homologous pathways (i.e., aerobic and anaerobic synthesis of cobinamide via key genes cobG and cbiX, respectively), and that distinct pathways are maintained in the CHUG sister lineages (Fig. 1A). The ancestral genome reconstruction further inferred that the loss of vitamin B12 de novo synthesis capability was compensated by the coincidental gain of a putative transporter (btuB) for vitamin B12 and its related compounds such as cobinamide [51] (Fig. 4B), which is absent from all other PRC members capable of de novo vitamin B12 synthesis (Fig. 4C). Taken together, the loss of de novo synthesis capability, the gain of a putative transporter, and the increased expression of the vitamin B12-dependent (metH) methionine synthase gene indicate that CHUG may have to acquire vitamin B12 or its precursor from the environment.

Fig. 5: Growth assay of CHUG strain HKCCA1288 and the model roseobacter Ruegeria pomeroyi DSS-3.

A Growth curves on defined marine ammonium mineral salts (MAMS) medium with and without vitamin B12 supplementation are plotted in red and blue, respectively. Three replicates were performed for each condition and error bars denote standard deviation. B Growth assay on phenotype microarray (PM) plates with l-fucose as the sole carbon source. The horizontal gray line represents the baseline defined by the negative control without any carbon source (well A01 on PM1, Fig. S3A, C). Curves above and below the baseline indicate that the strain can and cannot use l-fucose for growth, respectively. Three replicates were performed for each condition.

Utilization of organic compounds including metabolites released by marine phytoplankton and other marine organisms

Of the 190 organic compounds assayed through the phenotype microarrays, 43 are experimentally verified metabolites secreted by select species of the dominant eukaryotic marine phytoplankton groups including diatoms [52], dinoflagellates [53], and coccolithophores [54, 55]. The CHUG isolate HKCCA1288 is limited in using these phytoplankton-related substrates for growth compared to R. pomeroyi DSS-3. Specifically, while both strains can use five of these substrates and neither can use eight of them (Table S2 and Fig. S3), the remaining 30 compounds were used exclusively by R. pomeroyi DSS-3.

Most of the remaining 147 substrates covered by the phenotype microarrays do not differentiate the two strains. Specifically, 16 of these compounds supported both strains and 90 supported neither (Table S2 and Fig. S3). Among the compounds differentially utilized by the two strains, 32 and 9 were used exclusively by R. pomeroyi DSS-3 and HKCCA1288, respectively (Table S2 and Fig. S3). The latter include l-fucose, which can support HKCCA1288 as a sole carbon source (Fig. 5B). l-fucose is the degradation product of fucoidan made by marine brown algae [56, 57]. Consistently, the gene encoding l-fuculose-phosphate aldolase (fucA) responsible for l-fucose degradation was found in HKCCA1288 and four other CHUG genomes (Fig. 4) but not in R. pomeroyi DSS-3 (Table S2). Notably, the early-branching CHUG lineages represented by HKCCA1288 and HKCCA1065 were collected from the ambient seawater of the brown alga Sargassum hemiphyllum. These lines of evidence are consistent with the hypothesis that the ambient seawater of marine brown algae is an important microenvironment supporting CHUG members. Another compound of potential interest is d-fucose, since it also specifically supported HKCCA1288 and is a component of the glycosphingolipids [58] and glycosides [59] of some sponges, tentatively suggesting sponge ambient seawater as another microenvironment for some CHUG members.
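The substrate counts reported in this section can be cross-checked with a short tally. The sketch below uses only the numbers quoted in the text. The category labels and the `classify` helper are hypothetical names introduced for illustration, and the zero count for phytoplankton-related substrates used only by HKCCA1288 is implied by the text rather than stated.

```python
def classify(dss3_grows, hkcca1288_grows):
    """Label a substrate by which of the two strains it supports."""
    if dss3_grows and hkcca1288_grows:
        return "both"
    if dss3_grows:
        return "DSS-3 only"
    if hkcca1288_grows:
        return "HKCCA1288 only"
    return "neither"


# Counts quoted in the text for the two substrate sets.
phytoplankton_related = {"both": 5, "neither": 8, "DSS-3 only": 30, "HKCCA1288 only": 0}
other_substrates = {"both": 16, "neither": 90, "DSS-3 only": 32, "HKCCA1288 only": 9}

n_phyto = sum(phytoplankton_related.values())
n_other = sum(other_substrates.values())

# Total substrates supporting each strain across both sets.
dss3_total = sum(d["both"] + d["DSS-3 only"]
                 for d in (phytoplankton_related, other_substrates))
hkcca1288_total = sum(d["both"] + d["HKCCA1288 only"]
                      for d in (phytoplankton_related, other_substrates))

print(n_phyto, n_other, n_phyto + n_other)  # 43 147 190
print(dss3_total, hkcca1288_total)          # 83 30
```

The tallies add up to the 190 assayed compounds and show that, by these counts, R. pomeroyi DSS-3 grew on 83 substrates and HKCCA1288 on 30.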

CHUG members take an evolutionary path decoupled from phytoplankton

We have provided three independent lines of evidence for the hypothesis that CHUG takes a free-living lifestyle decoupled from phytoplankton. First, unlike other PRC lineages, the global distribution of CHUG is not correlated with marine phytoplankton biomass (approximated by Chl-a concentration). Moreover, when the TARA Ocean metagenomic sequencing reads at the nanoplankton-enriched size fraction (5–20 µm) were recruited, CHUG members exhibited a lower relative abundance than other PRC representatives by approximately one order of magnitude (Fig. 3C). Second, unlike all other PRC members and most other reference roseobacters, all CHUG members lack the genes for de novo vitamin B12 synthesis. The auxotrophy for vitamin B12 was also validated in a growth assay (Fig. 5A) for HKCCA1288, for which we generated a complete genome sequence. Marine eukaryotic algae are predominantly vitamin B12 auxotrophs [60], whereas most roseobacters have the potential to synthesize vitamin B12 [2]. This complementarity is one of the major mechanisms that facilitate mutualistic interactions between roseobacters and phytoplankton [2, 13,14,15]. Third, the high-throughput growth assays demonstrated that CHUG has a limited capacity to use phytoplankton-derived metabolites as carbon sources compared to R. pomeroyi DSS-3, which is known to interact with marine phytoplankton groups (Table S2).

These observations are unusual because members of the Roseobacter group are known as the dominant bacterial lineages associated with marine phytoplankton groups [17] and their evolutionary history was likely correlated with phytoplankton diversification [2, 61]. They usually benefit from the fixed carbon or other excretions released by phytoplankton and, in return, produce secondary metabolites (e.g., vitamins, indole-3-acetic acid) to promote phytoplankton growth [15, 62, 63]. These interactions likely occur in microzones immediately surrounding phytoplankton cells, which may create gene flow barriers and facilitate population differentiation of associated roseobacters [42, 64]. Therefore, the ecology and evolution of the Roseobacter group in the pelagic ocean are generally shaped by marine phytoplankton, making the possible separation from this ecological pattern in the CHUG lineage unique.

Other important metabolic potential of CHUG

Nitrogen (N) is a primary limiting nutrient in surface oceans [65]. Genes encoding the nitrogen regulatory protein P-II (glnBD) were highly expressed in the wild CHUG populations (Fig. S2). Genes encoding the high-affinity ammonium transporter (amtB) and the two-component regulatory systems (ntrBC and ntrXY) were found in CHUG genomes. Genes encoding urease (ureABC) were also identified in CHUG members, though the urea transport system (urtABCDE) was not found. It is possible that urea is assimilated via passive diffusion across the cell membrane in CHUG as shown in other bacteria [66], or that urea is taken up by another promiscuous transporter. The genes encoding the transporter for nitrate/nitrite assimilation (nrtABC) are also missing in CHUG genomes. CHUG members retain the genes for the spermidine/putrescine transporter (potABCD and ABC.SP) (Table S3), and the latter was among the most highly expressed genes in the oceanic CHUG populations (Fig. S2). However, CHUG members do not carry genes for other polyamine transport systems, such as potFGHI for putrescine acquisition. CHUG members also retain and highly express aapJMPQ for the general l-amino acid transporter (Fig. S2), but lost genes encoding the polar amino acid transport system ABC.PA, which is prevalent in all other roseobacters studied here. CHUG members further have a reduced number of genes (only one copy) encoding the branched-chain amino acid transport system (livFGHKM) compared to their sister group (at least three copies), the outgroup (at least three copies) and other PRC members (at least two copies; Table S3). Overall, fewer genes involved in the acquisition of amino acids were found in CHUG (Table S3), but amino acid uptake may remain efficient owing to the high expression level of the retained genes.

Phosphorus (P) is often a co-limiting nutrient in surface oceans [65]. To deal with P limitation, CHUG members may be assisted by the essential regulatory and metabolic pathways known to be induced by P limitation, including the two-component regulatory system (phoBR), the high-affinity phosphate transporter (pstABCS) and the C-P lyase (phnGHIJKLM) for phosphonate utilization. However, they lost phoX, which encodes an alkaline phosphatase for phosphodiester utilization [67], during the genome reduction process (Fig. 4A, B). A notable evolutionary innovation upon the emergence of the CHUG cluster was the gain of the gene encoding phospholipase C (plcP) (Fig. 4A, B), which is missing from all the traditional PRC members (Fig. 4C). plcP is the key gene of the pathway for substituting phospholipids with non-phosphorus lipids in response to P starvation, and is prevalent in marine bacterioplankton [68].

CHUG members have also lost genes for chemotaxis (cheAB) and flagellar assembly (fliC). These genes are essential for mediating roseobacter-phytoplankton interactions [69], but may become dispensable when switching to a planktonic lifestyle [70]. Consistent with this, the quorum-sensing (QS) system (luxR), type IV secretion system (virB), and type VI secretion system (vasKF) involved in organismal interactions were rarely found in CHUG genomes (Fig. 4A). CHUG members also lost the gene cluster encoding the gene transfer agent (GTA), which resembles small double-stranded DNA (dsDNA) bacteriophages and increases horizontal gene transfer and metabolic flexibility at high population density [71].

In terms of the strategies for energy conservation, CHUG members maintain a complete (Fig. S4) and actively expressed (Fig. S2) photosynthesis gene cluster enabling aerobic anoxygenic photosynthesis (e.g., puf, bch, and crt). Some of them further carry gene clusters for the oxidation of carbon monoxide (cox) and sulfide/thiosulfate (sox) as energy sources, but lack genes involved in dissimilatory nitrate/nitrite reduction. For important substrates commonly used by roseobacters, CHUG members are limited in their catabolic pathways. For the degradation of aromatic compounds, for example, CHUG members possess the protocatechuate ring cleavage pathway (pcaGH) but lost genes for the ring cleavage of phenylacetate (paaABCDE) and homogentisate (hmgA). CHUG members carry both the demethylation (dmdA) and cleavage (dddD and dddL) pathways to utilize dimethylsulfoniopropionate (DMSP), but lack genes to degrade other important methylated compounds including trimethylamine N-oxide (tmd), trimethylamine (tmm), and taurine (tauABC and xsc). More details are provided in Supplementary Text 2.1 and 2.2.

Potential evolutionary forces driving genome reduction of CHUG

The most abundant marine bacterioplankton, such as the Pelagibacterales (also called the SAR11 clade) in Alphaproteobacteria and Prochlorococcus in Cyanobacteria, are often equipped with very small genomes [72]. The evolutionary mechanisms driving their genome reduction have been discussed extensively. Among these, selection for metabolic efficiency under extreme nutrient limitation (i.e., ‘genome streamlining’) has been theorized as the dominant force driving their genome reduction [72, 73]. Although CHUG members possess smaller genomes and lower GC content compared to the sister group and the outgroup (Fig. 2A, C), they do not show other features of genome streamlining (Fig. 2B, E, J), such as higher coding density, fewer paralogues, or fewer pseudogenes [72, 74]. Likewise, these genomic features are also missing in other abundant PRC members (Fig. 2B, E, J) such as NAC11-7, DC5-80-3, and CHAB-I-5, though they have smaller genomes compared to other reference roseobacters (Fig. 2A). Therefore, the genome reduction process of CHUG and other PRC members does not meet the canonical definition of genome streamlining.
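As an illustration of one of the streamlining features mentioned above, coding density can be computed as the fraction of the genome covered by coding sequences. The following is a minimal sketch only, not the procedure used in this study: the function name `coding_density`, the 1-based inclusive coordinate convention, and the example coordinates are all assumptions made for illustration.

```python
def coding_density(genome_length, cds_intervals):
    """Fraction of the genome covered by at least one coding sequence.

    Assumptions (illustrative, not taken from the study):
    - coordinates are 1-based and inclusive, as in GenBank/GFF annotations;
    - overlapping genes are merged so that shared bases are counted once.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    covered = 0
    current_start = None
    current_end = None
    for start, end in sorted(cds_intervals):
        if start > end:
            raise ValueError("interval start must not exceed interval end")
        if current_end is None or start > current_end:
            # Close the previous merged block before opening a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping interval: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1

    return covered / genome_length


# Hypothetical toy annotation: two overlapping genes and one separate gene.
density = coding_density(1000, [(1, 300), (251, 500), (601, 900)])
```

In this toy case the first two genes merge into one block of 500 bp, so 800 of the 1000 bp are coding. Under the streamlining hypothesis, a reduced genome would be expected to show a higher value than its larger relatives.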

Further evidence against the genome streamlining explanation for CHUG genome reduction comes from the genomic proxies for nutrient acquisition and saving strategies used by marine bacterioplankton. Among the selective factors that may drive bacterioplankton genome reduction in the pelagic ocean, N limitation is considered the dominant factor [44, 72, 75, 76]. Although the relative abundance of gene transcripts (but not the genes) in the wild CHUG populations is positively correlated with the nitrate concentration (Fig. 3E; Bonferroni corrected p < 0.05), which provides marginal evidence for a role of N limitation, other key evidence is missing. For example, we did not observe a reduced use of N in the amino acid sequences (approximated by N-ARSC) in CHUG compared to the sister group and the outgroup. A similar observation has been used as evidence against the hypothesis that N limitation is a strong driver of genome streamlining in other marine bacterioplankton lineages [77, 78]. A second potential ecological factor driving genome streamlining is P limitation [79], though this theory has been debated [80]. Genome reduction likely leads to a sizable decrease in cellular P requirement and thus may confer a competitive advantage in P-limited marine environments [81]. Although a few important genes for P acquisition (pst for the high-affinity phosphate transporter and phn for the C-P lyase) are retained, and a gene encoding phospholipase C (plcP), responsible for replacing cell membrane phospholipids with non-phosphorus lipids [68], was even acquired, the key P scavenging gene encoding the PhoX alkaline phosphatase was lost during the CHUG genome reduction (Fig. 4). Therefore, the available evidence for either N or P limitation as a driver of CHUG genome reduction is self-contradictory.
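N-ARSC is commonly defined as the average number of nitrogen atoms per amino acid residue side chain. The sketch below shows how such a proxy could be computed from predicted protein sequences. It is an illustration under stated assumptions rather than the pipeline used here: the function name `n_arsc` is hypothetical, and non-standard symbols (e.g., X or stop characters) are assumed to be ignored.

```python
# Nitrogen atoms in the side chain of each standard amino acid.
# Residues not listed here carry no side-chain nitrogen.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def n_arsc(proteins):
    """Average number of N atoms per amino acid residue side chain.

    `proteins` is an iterable of protein sequences (one-letter codes).
    Assumption: ambiguous or non-standard symbols are skipped.
    """
    total_n = 0
    total_residues = 0
    for seq in proteins:
        for aa in seq.upper():
            if aa in STANDARD_AA:
                total_n += SIDE_CHAIN_N.get(aa, 0)
                total_residues += 1
    if total_residues == 0:
        raise ValueError("no standard amino acids found")
    return total_n / total_residues


# Hypothetical toy proteome; real input would be all predicted proteins of a genome.
value = n_arsc(["MKR", "AAHW"])
```

A lineage shaped by N limitation would be expected to show a lower value than its relatives, which is the comparison made between CHUG, the sister group, and the outgroup.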

Because evidence for genome streamlining is weak, we examined neutral evolutionary forces as potential explanations for CHUG genome reduction. In fact, neutral mechanisms have recently been considered to play important roles in driving genome reduction of marine bacterioplankton lineages [40, 82, 83]. Most of the prior studies focused on Prochlorococcus (see references cited in the following paragraphs). While some extended their discussions to Pelagibacterales [40, 84], knowledge of the evolutionary mechanisms driving genome reduction in most other marine bacterioplankton lineages is rather limited.

One potentially important neutral driver is random genetic drift due to a reduction of effective population size (Ne). A previous study showed that the major genome reduction event coincided with an accelerated rate of accumulating deleterious mutations in the early evolution of Prochlorococcus, providing important evidence that genetic drift is likely the primary mechanism of genome reduction in this lineage [40]. Specifically, the power of genetic drift (i.e., the inverse of Ne) of an ancestral lineage (e.g., the ancestral branch underlying the ancient genomic events) can be approximated by the ratio of the radical nonsynonymous nucleotide substitution rate (dR) to the conservative nonsynonymous nucleotide substitution rate (dC) [40]. Because a replacement by a physicochemically dissimilar amino acid (i.e., radical change) is likely to be more deleterious than a replacement by a similar amino acid (i.e., conservative change) [85, 86], an elevated dR/dC ratio is evidence for genetic drift acting to accumulate the deleterious type of mutations (i.e., the radical changes) in excess. For CHUG, the dR/dC ratio is not significantly elevated compared to its sister group (Fig. S5A) under two independent methods for biochemical classification of the 20 amino acids (Fig. S5B, C), suggesting that the deleterious type of mutations was not accumulated in excess at the ancestral branch leading to the LCA of the CHUG cluster (filled triangle in Fig. 4A). Since this ancestral branch corresponds to the time when major genome reduction occurred for CHUG, we conclude that genetic drift is unlikely to be an important driver of CHUG genome reduction.
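To make the radical versus conservative distinction concrete, the sketch below classifies amino acid replacements between two aligned protein sequences under a charge-based grouping. This is a simplified illustration only: it counts replacements and does not estimate dR or dC, which additionally require normalizing by the numbers of radical and conservative sites at the codon level. The charge grouping shown is one commonly used scheme and is assumed here; it is not necessarily one of the two classifications used in Fig. S5. The function name `classify_replacements` is hypothetical.

```python
# Charge-based grouping of the 20 amino acids (an assumed, commonly used scheme).
CHARGE_CLASS = {}
for aa in "RHK":
    CHARGE_CLASS[aa] = "positive"
for aa in "DE":
    CHARGE_CLASS[aa] = "negative"
for aa in "ACFGILMNPQSTVWY":
    CHARGE_CLASS[aa] = "neutral"


def classify_replacements(ancestral, derived):
    """Count radical and conservative replacements between aligned proteins.

    A replacement is radical if the two residues fall in different charge
    classes and conservative otherwise. Alignment columns containing gaps
    or non-standard symbols are skipped.
    Returns a tuple (radical_count, conservative_count).
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")

    radical = 0
    conservative = 0
    for a, b in zip(ancestral.upper(), derived.upper()):
        if a == b or a not in CHARGE_CLASS or b not in CHARGE_CLASS:
            continue
        if CHARGE_CLASS[a] == CHARGE_CLASS[b]:
            conservative += 1
        else:
            radical += 1
    return radical, conservative


# Hypothetical toy alignment: K->R (conservative), D->K (radical), L->V (conservative).
counts = classify_replacements("KDAL", "RKAV")
```

An excess of radical over conservative changes on a branch, once normalized by the corresponding numbers of sites, is what an elevated dR/dC ratio captures.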

A second potentially important neutral driver of prokaryotic genome reduction is increased mutation rate, which has also been proposed to explain Prochlorococcus genome reduction [87]. Mathematical modeling predicts that not all auxiliary genes can be maintained by purifying selection when mutation rate is increased, and that a 10-fold increase in mutation rate may lead to a 30% decrease in genome size [88]. More recently, this hypothesis was supported with empirical data from comparative genomics analyses [83], though whether increased mutation rate is a truly important driver of prokaryotic genome reduction is debated [89]. Given the potentially important role of increased mutation rate in driving prokaryotic genome reduction, determining the unbiased spontaneous mutation rate of CHUG and the sister lineage, using mutation accumulation experiments followed by whole genome sequencing of the mutant lines, is an urgent research need.

One more potentially important but rarely discussed neutral force leading to genome reduction is the loss of the genes that were important in the initial habitat but became dispensable after the bacteria switched to a new environment. This neutral loss mechanism, termed relaxation of purifying selection, may also have contributed to genome reduction in Prochlorococcus [90]. Importantly, the loss of dispensable genes under this mechanism is not related to the change of Ne but results instead from a change of habitat or lifestyle. In contrast to other PRC members and copiotrophic roseobacters such as R. pomeroyi DSS-3 sampled from the pelagic environments, CHUG members adopt a free-living lifestyle uncoupled from marine phytoplankton. This is supported by three lines of evidence: (i) unlike other PRC members, CHUG members do not exhibit a correlative pattern between their global distributions and Chl-a (Fig. 3D, E); (ii) vitamin B12, a fundamental metabolite many roseobacters produce and supply to phytoplankton, cannot be synthesized by CHUG; (iii) compared to R. pomeroyi DSS-3, CHUG members have a very limited capacity to use phytoplankton-derived metabolites as carbon sources. Since supplying vitamin B12 to phytoplankton in exchange for phytoplankton-derived carbon sources is an important mechanism underlying the symbiosis between roseobacters and phytoplankton [2, 13,14,15], the inability to synthesize vitamin B12 de novo and to use most phytoplankton-related metabolites indicates that the CHUG ancestor may have lost its ability to establish symbiosis with phytoplankton. As a consequence of this transition, other genes contributing to roseobacter-phytoplankton symbiosis (e.g., motility and chemotaxis), relying on population density (e.g., quorum sensing), and involved in interactions with other bacteria (e.g., gene transfer agent), may have become dispensable [70]. Indeed, the loss of these signature genes contributed to the genome reduction of CHUG (Fig. 4). We therefore propose that relaxation of purifying selection stemming from reduced interactions with marine phytoplankton may be one of the primary evolutionary forces leading to the major genome reduction of CHUG.

Concluding remarks

Although they adopt a planktonic lifestyle and have some of the smallest genomes among roseobacters, CHUG members lack some canonical features of typical marine bacterioplankton lineages undergoing genome streamlining. Because the genome streamlining process is driven by natural selection under extreme nutrient limitation [72], the absence of many genomic features commonly found in streamlined bacterioplankton genomes indicates that the genome reduction of CHUG may have taken place in some pelagic microenvironments enriched in nutrients. While the microalgal phycosphere has been considered the most common microenvironment colonized by roseobacters inhabiting the pelagic ocean, it is unlikely to support CHUG. This is indicated by its auxotrophy for vitamin B12, its inability to use many phytoplankton-derived metabolites, and its decoupling from phytoplankton in its global distribution. Instead, the available evidence supports a testable hypothesis that the seawater surrounding marine brown algae, and perhaps other marine macroorganisms, is a potentially important microenvironment that CHUG members may exploit. The discovery of the CHUG cluster greatly expands the diversity of the marine Roseobacter group, and its unique evolutionary trajectory complements our understanding of how the genomes of many marine bacterioplankton lineages become small.

Cultivar availability

One strain sampled from the East China Sea (FZCC0069) and two strains sampled from the South China Sea (HKCCA1065 and HKCCA1288) are available at the China General Microbiological Culture Collection Center (CGMCC) under the accession numbers CGMCC 1.19034, 1.19035, and 1.19036, respectively. Three strains sampled from the northern Gulf of Mexico (LSUCC1028, LSUCC0387, and LSUCC0374) are currently being deposited at the German Collection of Microorganisms and Cell Cultures (DSMZ).