Gene gains and losses play a major role in shaping genome content and ecological diversification of bacteria [1]. In bacteria, gene acquisition through lateral gene transfer (LGT) has been well studied [2, 3]. Compared with gene gains, the mechanisms and roles of gene losses in microbial evolution and ecological diversification are less clear. One of the most common routes of gene loss is pseudogenization [4]. Pseudogenes, first discovered in Xenopus laevis [5], have long been considered a paradigm of neutral evolution [6]. They represent genes lost as a result of disabling mutations, fixed in the population by genetic drift because the affected gene becomes dispensable in the organism’s current niche and is thus not subject to strong functional constraints [7]. Recently, accumulating evidence implies pseudogenization as an important contributor to genetic variation and evolution of microbial genomes [4, 8]. Despite being predominantly neutral, there are cases in which gene losses via pseudogenization increase fitness and are thus adaptive [9, 10]. A few characterized pseudogenization events are associated with increases in virulence [11], such as the marT gene in Salmonella Typhi, which likely contributes to the surV-dependent survival to oxidative stress imposed by abundant H2O2 inside human macrophages [12]. Further, pseudogenes may serve as raw materials from which novel genes and noncoding RNAs arise [13]. From an evolutionary perspective, pseudogenes provide a rapid way for ecological adaptation because (i) a single mutation could lead to the inactivation of a gene, and (ii) pseudogenes have the potential to be re-activated by back mutation or recombination, particularly when the functionality of the gene is selectively favored in a new environment [14]. In addition, pseudogenes represent unequivocal cases of ongoing gene loss, which is often inaccurately predicted by canonical methods [15, 16] based on the presence/absence pattern of a functional gene across a phylogenetic tree.

Despite their prevalence in eukaryotic genomes, pseudogenes are relatively infrequent in prokaryotes [17, 18]. Pseudogenes are known to be absent or depleted in streamlined free-living marine bacterioplankton lineages, such as Prochlorococcus, SAR11 and SAR86 among others [19, 20]. However, they are rarely explored in metabolically versatile lineages with larger and more fluid genomes. Members of the Roseobacter group comprise up to 20% of bacterial communities in coastal marine habitats [21]. In addition to their well-established free-living and patch-associated lifestyles in the pelagic environments [22], roseobacters often represent a significant fraction in the microbiomes associated with various marine algae and invertebrates [21, 23]. For instance, on average 19% of the epiphytic bacterial cells of a green Ulvacean alga fall within the Roseobacter group [24]. In coral species Acropora tenuis on the Great Barrier Reef, Roseobacter-affiliated sequences make up to 63% of the clone library [25].

Roseobacters and other marine bacteria are found in three unique niches in corals: the surface mucus layer, coral tissue, and skeleton [26, 27]. Coral nutrition is largely supported by Symbiodinium in coral tissue through translocation of photosynthates, and up to 50% of the net fixed carbon is released as mucus by corals [28]. Coral skeleton is characterized as a separate and relatively stable environment well-protected from external surroundings. Endolithic algae, shown as green bands beneath the coral tissue layer, have been reported in a wide array of coral species [29], including those in the genus Platygyra [30] sampled in the present study. While the photosynthetic rate is considerably lower in coral skeleton than in coral tissue due to the extremely low intensity of light that reaches the skeleton, the coral skeleton is subject to diurnal fluctuations in oxygen and pH levels, as is coral tissue [29, 31]. Members of the Roseobacter group are also commonly found in macroalgal ecosystems and are capable of degrading a variety of algal osmolytes [32]. In the mutualistic relationship between macroalgae and bacteria, the algae provide nutrients and oxygen for epiphytic bacteria, which in turn mineralize organic substrates and provide CO2, essential vitamins, minerals, and growth factors to the host [33, 34]. Macroalgae release a large portion of their fixed carbon in the form of carbohydrates to the surrounding environment [35]. Therefore, the ambient seawater and sediments are typically rich in organic carbon [36, 37].

In line with their habitat diversity, roseobacters exhibit tremendous genomic diversity and metabolic versatility [38]. Strikingly, even between closely related members sharing nearly identical 16S rRNA gene sequences, the genome contents can vary greatly, as shown in 42 globally distributed strains of the roseobacter species Epibacterium mobile (formerly known as Ruegeria mobilis) [39]. Such intraspecific variation could be associated with habitat diversity, as illustrated by two Sulfitobacter sp. strains (NAS-14.1 and EE-36) and two Phaeobacter gallaeciensis strains [40, 41]. In fact, a patchy distribution of ecologically relevant genes is often observed among roseobacters [23, 42]. This may, to some extent, reflect their adaptation to the diverse ecological niches. However, the evolutionary mechanisms underlying the tremendous amount of diversity, especially how the genome content diverged among closely related lineages, are poorly understood.

Many previous studies have investigated the genetic diversity in members of the Roseobacter group and its relationship to ecological niches [23, 38, 39, 43, 44]. However, these studies were based on canonical methods of gene gain/loss inference which ignored pseudogenes, or focused on distantly related species where traces of most pseudogenes have been eliminated from the genome. Here, we inferred the evolutionary history of pseudogenization events among closely related strains and linked them to the ancestral switches between niches of the coral and macroalgal ecosystems. Closely related genomes were analyzed because, according to previous studies, most bacterial pseudogenes are strain-specific, and of the few that are shared across strains, many may have been generated independently since pseudogenes are retained in bacterial genomes for only a short time on an evolutionary scale [45, 46, 47]. In terms of the ecosystem type, we chose a coral ecosystem represented by Platygyra acuta and a macroalgal ecosystem represented by Sargassum hemiphyllum. This is because these two species are ecologically significant and globally widespread throughout coastal environments including Hong Kong [48, 49, 50, 51], serving as important nursery habitats for various marine organisms [52]. To minimize the effect of confounding biological and stochastic factors that may also explain the evolutionary changes of functional genes, we collected most samples in the same season and from nearby locations in Hong Kong. We isolated bacteria and sequenced 279 genomes of closely related strains affiliated with the genus Ruegeria of the Roseobacter group. Ruegeria is widely distributed among various habitats provided by marine eukaryotes [53], and is among the few bacterial genera associated with the greatest number of coral species according to a recent comprehensive review [53]. We tested the hypothesis that genes dispensable in new niches were likely lost via pseudogenization by showing the correlations between inferred pseudogenization events and corresponding niche shifts. We elaborated on specific ecologically relevant genes contributing to microbial diversification in these niches from the novel perspective of the gene loss processes.

Materials and methods

Bacterial isolation and pseudogene identification

We collected most microbial samples from the brown alga Sargassum hemiphyllum ecosystem and the coral Platygyra acuta ecosystem in the same season during 2016–2018 and from nearby locations in Hong Kong waters (Fig. 1b; Supplementary Table S1). Focusing on different niches in the macroalgal ecosystem (algal tissue, ambient seawater, and sediment) and the coral ecosystem (mucus, tissue, skeleton, and ambient seawater), we isolated 279 closely related strains affiliated with the roseobacter genus Ruegeria and sequenced their genomes.

Fig. 1: The sampling information and phylogenomic analysis of the 279 isolates.

a The maximum likelihood phylogenomic tree based on core single-copy gene families identified by OrthoFinder was constructed using IQ-TREE. Black circles on the nodes denote those with ultrafast bootstrap values <95%. Clustering of isolates based on phylogenetic depth is shown in the outer layers. Different groups of genomes clustered using each phylogenetic depth cutoff are indicated by the differences in the height of the layer. The ecological niche and isolation method of each strain are marked on the tree. The two major marine habitats are represented by green (macroalga) and purple (coral), respectively. Niches within each habitat are marked with different shapes. Outgroups composed of public roseobacter genomes are collapsed into triangles and labeled with the numbers one and two. b Four different sites for the macroalgal and coral sample collection in Hong Kong. c Kernel density estimates of whole-genome ANI and pairwise 16S rRNA gene identity. The values above the plots indicate the ANI or 16S rRNA gene identity of the peak.

Next, we used the program suite Psi-Phi [45] for pseudogene identification. This program uses a conservative criterion, calling a pseudogene only when it has lost >20% of its original length, and has been widely employed in other studies [46, 47, 54]. By applying a comparative method, this approach enhances pseudogene recognition among closely related strains both in annotated regions, by identifying incorrectly annotated open reading frames (ORFs), and in intergenic regions, by detecting new pseudogenes [45]. We further applied the following three steps to filter out misidentified pseudogenes. First, Psi-Phi only works on complete chromosomes, whereas all of our newly sequenced genomes contain contigs (Supplementary Dataset S1). We therefore modified the original scripts so that they search for pseudogenes on merged contigs, and then removed identified pseudogenes whose locations spanned contigs (Supplementary Fig. S1A). Second, we removed pseudogenes identified by a query of less than 100 amino acids, which accounted for 19.7% of the original output, since the vast majority of such short annotated ORFs are less likely to be genuine genes [47, 55]. Last, as noted by Lerat and Ochman [45], if the query gene is annotated as being longer than its actual length, some “real” genes could be misidentified by this query ORF as pseudogenes truncated by a premature stop codon. The logic is that if a single query ORF identified many pseudogenes (here we used 10 as an arbitrary cutoff), the query was likely annotated as longer than its actual length (see Supplementary Fig. S1B for an example). Applying this criterion further removed 18.8% of the pseudogenes originally identified by Psi-Phi.
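
The three filtering steps can be illustrated with a minimal Python sketch. It is not the modified Psi-Phi scripts used in the study: the `Candidate` record, its field names, and the `filter_candidates` function are hypothetical, contig boundaries are assumed to be given as coordinates on the merged sequence, and the sketch assumes that a query is flagged when it identifies 10 or more distinct pseudogenes, counted before the other two filters are applied (the text does not state either detail).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one pseudogene candidate found by one query ORF."""
    pseudogene_id: str
    query_id: str
    query_length_aa: int
    start: int  # 0-based start on the merged-contig sequence
    end: int    # exclusive end on the merged-contig sequence


def spans_contig_boundary(start, end, boundaries):
    """True if a junction between two merged contigs lies inside [start, end)."""
    return any(start < b < end for b in boundaries)


def filter_candidates(candidates, boundaries, min_query_aa=100, max_hits_per_query=10):
    """Apply the three filters described in the text to a list of candidates."""
    # Count how many distinct pseudogenes each query ORF identified.
    hits_per_query = Counter()
    seen_pairs = set()
    for c in candidates:
        pair = (c.query_id, c.pseudogene_id)
        if pair not in seen_pairs:
            seen_pairs.add(pair)
            hits_per_query[c.query_id] += 1

    kept = []
    for c in candidates:
        if spans_contig_boundary(c.start, c.end, boundaries):
            continue  # step 1: location spans two contigs
        if c.query_length_aa < min_query_aa:
            continue  # step 2: query shorter than 100 amino acids
        if hits_per_query[c.query_id] >= max_hits_per_query:
            continue  # step 3: query is probably annotated too long (assumed >= cutoff)
        kept.append(c)
    return kept
```

Keeping the cutoffs as parameters makes it easy to check how sensitive the retained set is to the arbitrary value of 10.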

For pseudogene identification, annotated proteins of each genome were queried against the complete nucleotide sequence of every other genome. For the vast majority of identified pseudogenes (>93%), all of the query ORFs of each pseudogene were from the same gene family, and these pseudogenes were therefore referred to as “strict-consistent” pseudogenes. Approximately 6% of the pseudogenes had less than half of their query ORFs from different families, and these were termed “relaxed-consistent” pseudogenes. Both strict-consistent and relaxed-consistent pseudogenes were retained for subsequent analyses (Fig. 2). Further technical details are provided in Supplementary Text S1.
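
As an illustration of these consistency classes, the following Python sketch assigns a class to one pseudogene from the gene-family IDs of its query ORFs. The function name `classify_pseudogene` and the input format are assumptions made for this example; the thresholds follow the definitions given here and in the legend of Fig. 2.

```python
from collections import Counter


def classify_pseudogene(query_families):
    """Classify a pseudogene from the gene-family IDs of its query ORFs.

    Returns (label, family); family is None for inconsistent pseudogenes.
    """
    if not query_families:
        raise ValueError("a pseudogene needs at least one query ORF")
    family, count = Counter(query_families).most_common(1)[0]
    if count == len(query_families):
        return "strict-consistent", family   # all queries from one family
    if count > len(query_families) / 2:
        return "relaxed-consistent", family  # more than half from one family
    return "inconsistent", None              # no family holds a majority
```

For example, a pseudogene hit by queries from OG1, OG1, and OG2 would be relaxed-consistent and assigned to OG1, whereas one hit by a single OG1 query and a single OG2 query would be inconsistent.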

Fig. 2: Schematic illustration of pseudogene identification and ancestral states reconstruction.

Flowchart of pseudogene analysis. OG1, OG2, and OG3 refer to three different homologous gene family IDs assigned by OrthoFinder. Pseudogenes with all query ORFs from the same gene family are “strict-consistent” and those with >1/2 of their query ORFs clustered into the same gene family are “relaxed-consistent”. “Inconsistent” pseudogenes do not have >1/2 of query ORFs clustered into one family. Only strict- and relaxed-consistent pseudogenes were assigned to corresponding gene families. “Multi-state” gene families contain both ORFs and pseudogenes from the same genome, while in “single-state” gene families, either ORFs or pseudogenes are present in each of the involved genomes. Only single-state gene families were selected for ancestral reconstructions of genetic and ecological traits.

Phylogenomic analysis and ancestral state reconstruction

Amino acid sequences from core single-copy gene families identified by OrthoFinder v2 [56] were concatenated for the maximum likelihood (ML) phylogenomic tree inference using IQ-TREE v1.6.2 [57]. To choose proper candidates for ancestral state reconstruction (ASR), we explored all homologous gene families with assigned pseudogenes. Gene families containing both ORFs and pseudogenes from the same genome were designated as “multi-state” gene families (Fig. 2). In contrast, gene families in which each of the involved genomes corresponds to only one functional state, containing either ORFs or pseudogenes, were called “single-state” gene families (Fig. 2). In the case of the multi-state gene families, since pseudogenes likely evolved from a functional ancestor originating from vertical descent, or gene duplication, or LGT, it is difficult to identify (i) how the pseudogenes were formed, and (ii) whether the remaining ORF copy has changed its function. Thus, only single-state gene families were used for ASR (Fig. 2).
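
The distinction between the two family types can be sketched in a few lines of Python. The function `classify_family` and its input format (pairs of genome ID and functional state) are hypothetical and serve only to restate the definition above.

```python
def classify_family(members):
    """Label a gene family as single-state or multi-state.

    members: iterable of (genome_id, state), with state "ORF" or "pseudogene".
    """
    states_by_genome = {}
    for genome_id, state in members:
        if state not in ("ORF", "pseudogene"):
            raise ValueError(f"unknown functional state: {state!r}")
        states_by_genome.setdefault(genome_id, set()).add(state)
    # A family is multi-state as soon as one genome carries both states.
    if any(len(states) > 1 for states in states_by_genome.values()):
        return "multi-state"
    return "single-state"
```

Under this definition, a genome with several ORF copies and no pseudogene still counts as a single functional state.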

ASR was performed for functional state and ecological trait separately using the “MPR” (most-parsimonious reconstruction) algorithm implemented in the phangorn R package [58]. Based on the inferred functional states for ancestral nodes, pseudogenization, where an ancestral ORF became a pseudogene, or a back event, which refers to the functional restoration from an ancestral pseudogene to an ORF, was identified for each single-state gene family. By mapping the inferred functional transitions and niche shifts along the phylogeny for each single-state gene family, we identified cases where these two were matched on the same branch, i.e., an ORF became a pseudogene (or vice versa) during a niche shift. Gene families with at least one match were summarized (see details in Supplementary Text S1).
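
The idea of matching functional transitions with niche shifts on the same branch can be sketched as follows. This is a simplified stand-in, not the phangorn “MPR” implementation used in the study: it applies a plain two-pass Fitch parsimony to a strictly bifurcating tree and picks a single state per node, whereas MPR can report several equally parsimonious states at a node. The tree encoding (a dictionary mapping each internal node to its two children), the function names, and the niche labels in the usage note are assumptions made for illustration.

```python
def fitch(children, root, tip_states):
    """Return one most-parsimonious state per node (bifurcating tree assumed)."""
    sets = {}

    def down(node):
        kids = children.get(node, [])
        if not kids:
            sets[node] = {tip_states[node]}
            return
        for kid in kids:
            down(kid)
        common = set.intersection(*(sets[k] for k in kids))
        # Fitch rule: keep the intersection if non-empty, otherwise the union.
        sets[node] = common if common else set.union(*(sets[k] for k in kids))

    states = {}

    def up(node, parent_state):
        # Keep the parent's state when allowed; otherwise break ties alphabetically.
        states[node] = parent_state if parent_state in sets[node] else min(sets[node])
        for kid in children.get(node, []):
            up(kid, states[node])

    down(root)
    up(root, None)
    return states


def matched_branches(children, root, functional_tips, niche_tips):
    """List branches on which a functional transition coincides with a niche shift."""
    func = fitch(children, root, functional_tips)
    niche = fitch(children, root, niche_tips)
    matches = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            stack.append(child)
            if func[parent] != func[child] and niche[parent] != niche[child]:
                event = "pseudogenization" if func[parent] == "ORF" else "back event"
                matches.append((parent, child, event, niche[parent], niche[child]))
    return matches
```

For a four-tip tree in which three tips carry an ORF in coral skeleton (C-sk) and one tip carries a pseudogene in macroalgal sediment (M-sd), the sketch reports a single pseudogenization event on the terminal branch leading to the sediment isolate.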

Results and discussion

Phylogenomic analysis of the 279 newly isolated Ruegeria strains

Our phylogenomic tree of the 279 new strains together with the 166 publicly available roseobacter genomes (Fig. 1a; Supplementary Dataset S2) showed that the new strains were in general related to several described species of the genus Ruegeria, which together formed a monophyletic group phylogenetically distinct from the model roseobacter R. pomeroyi DSS-3 (Fig. 1a; Supplementary Dataset S1). The mean pairwise 16S rRNA gene identity and the mean whole-genome average nucleotide identity (ANI) were around 98% and 82%, respectively (Fig. 1c), strengthening the argument that these new isolates were members of multiple different species within Ruegeria. Even closely related strains did not necessarily share the same ecological niche (Fig. 1a), indicating rapid shifts in habitat. Overall, the 279 newly sequenced isolates greatly expanded the known genetic and ecological diversity of Ruegeria beyond what was previously appreciated (Supplementary Dataset S1).

Overview of the identified pseudogenes

In an attempt to explore the correlation of pseudogene formation with niche shifts, we first performed genome-wide pseudogene identification among all of the 279 isolates (Fig. 2). We tested whether increasing the number of analyzed genomes could identify more pseudogenes by grouping genomes using different cutoffs of the phylogenetic depth (which considers phylogenetic distance and topology to cluster all isolates into subclades; see Supplementary Fig. S2 for a schematic illustration). As shown in Fig. 3a, the number of genomes within each cluster increased as the phylogenetic depth gradually increased, and accordingly, the numbers of both identified pseudogenes and pseudogene-containing gene families per genome became larger. Importantly, the trend suggests that the identified pseudogenes represent a conservative set of the pseudogene repertoire within surveyed genomes, which awaits a more complete sampling of Ruegeria species in the future. Hence, in the following analyses, we pooled all 279 genomes together for pseudogene identification. This showed that the number of pseudogenes per genome ranged from 70 to 365 (median: 177). Approximately 16% of the families of the accessory genomes of the newly sequenced isolates contained pseudogenes. No significant differences in the number of pseudogenes per genome across niches were detected (p > 0.05 for both ANOVA and Kruskal–Wallis test; Fig. 3b).

Fig. 3: Overview of the pseudogenes in the 279 newly sequenced Ruegeria genomes.

a The number of pseudogenes per genome (left) and the number of pseudogene-containing gene families (right) plotted against the phylogenetic depth (shown in Fig. 1a). Genomes are first divided into different groups according to the phylogenetic depth cutoff. Pseudogenes are estimated using genomes within the same group. b The number of pseudogenes per genome in different ecological niches. Pseudogenes are estimated using the whole data set (i.e., 279 genomes). C-mu coral mucus, C-sk coral skeleton, C-sw coral ambient seawater, C-tis coral tissue, M-tis macroalgal tissue, M-sw macroalgal ambient seawater, M-sd macroalgal ambient sediment. c The pan-genome of the 279 sequenced Ruegeria isolates for the chromosome and the plasmid, respectively. Genes are sorted according to the genome of the model roseobacter Ruegeria pomeroyi DSS-3 (the innermost circle). The chromosome and the plasmid pan-genomes are not plotted in proportion to their number of nucleotides. d Proportions of different COG functional categories for the total pseudogenes and ORFs. The asterisk denotes significant difference (p < 0.05; Fisher’s exact test).

Approximately 99% of the total pseudogenes (strict- and relaxed-consistent; see “Materials and methods”) were assigned to corresponding homologous gene families according to their query ORFs. Of the gene families with assigned pseudogenes, 62% occurred as either ORFs or pseudogenes in a given genome. These gene families were thereafter referred to as “single-state” families, which make up 9.6% of the accessory genomes. The remaining families, where both ORFs and pseudogenes were found within a single genome, were termed “multi-state” gene families. Due to the uncertainties in multi-state gene families, only single-state gene families were used in ancestral state reconstruction (ASR; see “Materials and methods”). Occasionally, pseudogenes can be converted to ORFs by back mutation, nonsense suppression or site-specific recombination [14]. Such events, referred to as “back events”, may reflect that a pseudogene in an ancestral niche became “activated” in a new habitat. Overall, pseudogenization events exceeded back events by a factor of four. The “transposable elements” appeared to be the only functional category overrepresented in pseudogenes compared to ORFs across the 279 genomes (Fisher’s exact test, p < 0.001, Bonferroni correction; Fig. 3d), agreeing with the result of a previous study [59]. A further look at the distribution of pseudogenes on the pan-genomes of the 279 isolates (Fig. 3c) revealed that mobile genetic elements (MGEs), which consist of genomic islands, insertion sequences, prophages and plasmid genes, carried a significantly higher proportion of pseudogenes than the remaining regions of the genome (12.4% vs. 3.0%; p < 0.001, χ2 test). This could be because MGEs are more likely to be dispensable for the cell, but might also suggest selection against “junk DNA” whose activity could be harmful under certain conditions (although some genes carried by genomic islands might provide adaptive functions).

There were 343 gene families whose members had undergone pseudogenization or back events during ecological niche shifts (Fig. 4, Supplementary Dataset S3). For such families, pseudogenization was in line with niche shift; we therefore hypothesized that these genes were important in the ancestral environment and became dispensable in the new niche (and vice versa for the back events), potentially contributing to ecological adaptation. In 43 of these families, pseudogenization or back events occurred during the niche change in the same direction at least twice (see those marked with an asterisk in Table 1 as examples and an illustrated example in Fig. 4a), indicating convergent evolution in response to niche shift in different roseobacter strains (we also provide a summary of families experiencing repeated pseudogenization irrespective of niche shift in Supplementary Fig. S3 and Supplementary Dataset S4). A vast majority of these families were transposons, hypothetical proteins, or those with unknown or ecologically irrelevant functions, and after excluding them there remained 41 families that were of particular interest. Below, we elaborate on potential implications of pseudogenization (or back events) in these families for ecological adaptation of Ruegeria (Table 1, Supplementary Table S2; see also Supplementary Text S2).
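
Detecting such repeated events amounts to counting matched branches per family and direction. The Python sketch below assumes that each matched branch is recorded as a tuple of family ID, event type, ancestral niche, and derived niche, and that “the same direction” means the same event type with the same ancestral and derived niches; the function name and the record layout are hypothetical.

```python
from collections import Counter


def convergent_families(matched_events, min_repeats=2):
    """Return families with the same event in the same niche direction
    on at least `min_repeats` independent branches.

    matched_events: iterable of (family, event, niche_from, niche_to),
    one tuple per matched branch.
    """
    counts = Counter(matched_events)
    return sorted({family for (family, _event, _src, _dst), n in counts.items()
                   if n >= min_repeats})
```

A family with two pseudogenization events from coral skeleton to macroalgal sediment on different branches would be reported, while a family with one event in each of two different directions would not.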

Fig. 4: Examples of mapping pseudogenization/back events with niche shifts.

Ancestral state reconstructions were performed for functional state (left) and ecological trait (right) with the “MPR” algorithm. Ancestral states are indicated by circles with different colors. The internal nodes with multiple colors mean that there are multiple possibilities of niche states at these nodes. The matched branches are highlighted in red. a–d Nitrite-sensitive transcriptional repressor (NsrR), glycogen phosphorylase (OG0005684), membrane-bound lytic murein transglycosylase E (MltE), and Type VI secretion system protein (ImpB).

Table 1 Examples (n = 26) of the 41 ecologically relevant gene families in which functional transitions are in line with niche shifts. Families involved in different categories are marked with superscript c.

Resource recovery and energy acquisition in coral skeleton

The porous calcium carbonate skeleton is characterized as a separate and relatively stable environment protected from external surroundings. It is a harsh environment since light intensity is exceedingly low, with only <1% photosynthetically active radiation penetration due to the absorbance by zooxanthellae-rich coral tissues [60, 61]. We therefore hypothesized that genes involved in resource and energy utilization may be pseudogenized during the niche shift from coral skeleton to energy-richer niches as a result of the relaxation of energy limitation. In agreement with this idea, we found that two gene families involved in cell wall degradation and recycling were pseudogenized in the shifts from the energy-limited coral skeleton to energy-richer niches (Table 1). Cell wall recycling is a common process for resource recovery in bacteria [62], which can recycle up to 50% of the peptidoglycan (PG) components of their cell wall per generation to rebuild cell wall or to use as energy sources [63, 64]. One of the gene families encodes the N-acetylmuramic acid 6-phosphate etherase MurQ, which is required for the utilization of anhydro-N-acetylmuramic acid derived from cell wall murein [65]. The other encodes the membrane-bound lytic murein transglycosylase E, which is essential for non-hydrolytic cleavage of the glycan strands of bacterial cell wall [66]. Consistent with the increase in energy availability during the niche shift, pseudogenization of these families suggests the importance of cell wall recycling in recovering resources and preserving critical energy resources in coral skeleton, in a way similar to the low-light adaptation through PG recycling in two cyanobacteria species [67]. In addition, the family encoding the carbon monoxide (CO) dehydrogenase small chain CoxS was pseudogenized during the niche shift from coral skeleton to macroalgal tissue (Table 1). 
As this enzyme participates in the oxidation of CO into CO2, which is an energy supplement to roseobacters [38], the increase in energy availability during the niche shift may also relax the functional constraint of the gene involved in this pathway.

Pseudogenization events related to nitrogen regulation and carbohydrate utilization

Pore water within the coral skeleton is highly enriched in nitrogen (N) sources like ammonium, nitrite and nitrate, the concentrations of which are about ten times higher than those in the ambient seawater [68, 69]. Consequently, an increase in N availability during the niche change might relax the functional constraint on genes induced under N starvation. This idea was supported by the pseudogenization event in the gene family encoding the nitrogen regulation sensor histidine kinase NtrY during the shift from macroalgal ambient sediment to coral skeleton (Table 1). Genes of the Ntr family were reported to be significantly upregulated in response to N starvation [70]. Hence, we speculated that NtrY was important for the survival of roseobacters in the ancestral N-limited niche, but became dispensable when they shifted to coral skeleton where N is more abundant.

As macroalgae release a large portion of their fixed carbon mainly in the form of carbohydrates to the surrounding environment [35, 71], the ambient seawater and sediments are both rich in organic matter [36, 37]. Interestingly, we found two pseudogene-containing families related to carbohydrate utilization in the macroalgal ecosystem. During the shift from macroalgal ambient sediment to coral skeleton, one gene encoding ribose 5-phosphate isomerase A (RpiA) was pseudogenized. RpiA converts ribose 5-phosphate to ribulose 5-phosphate, and plays a pivotal role in the pentose phosphate pathway [72]. The other gene, pseudogenized during the shift from macroalgal ambient sediment to coral mucus, encodes the permease protein of a sugar ABC transporter. Considering the rich carbohydrates in the S. hemiphyllum ecosystem, it is conceivable that roseobacters inhabiting macroalgal ambient sediment harbored plenty of genes to utilize the abundant carbohydrates in the environment. The pseudogenization of the two gene families suggests that they are more necessary for carbohydrate utilization in the macroalgal ecosystem than in coral skeleton or mucus, where carbohydrates may be less available [73].

Transport systems suited to coral skeleton and macroalgal ecosystem

Among the families undergoing pseudogenization or back events during niche shift, we found several cases related to transport systems that may enable bacteria to obtain nutrients from their environments. Specifically, two gene families both encoding tripartite ATP-independent periplasmic (TRAP) transporter proteins were pseudogenized during the niche change from coral skeleton to macroalgal ambient sediment (Table 1). TRAP transporters are widely used by marine bacteria inhabiting high-salt and nutrient-poor environments, and have a lower energetic cost than ABC transporters [74]. The family encoding the secondary Na+/H+ antiporter was also pseudogenized during the shift from coral skeleton to macroalgal ambient sediment (Table 1). In addition, a pseudogenization event occurred in the gene family encoding Na+/solute symporter during the shift from coral skeleton to macroalgal tissue (Table 1). In agreement with the energy limitation in coral skeleton, functional changes of these secondary transporters upon niche shifts imply that they may be more favorable in coral skeleton than in macroalgal niches, serving as an extra energy-saving strategy.

Considering the above-mentioned gene family encoding the permease protein of a sugar ABC transporter (Table 1), we speculated that ATP-driven primary transporters are kept in energy-rich niches while less energy-demanding secondary transporters are favored in energy-limited environments. This idea was further supported by two gene families pseudogenized during the shift from macroalgal ambient sediment to coral skeleton (Table 1), which encode the ABC-type peptide/nickel transport system permease protein and the V/A-type H+/Na+-transporting ATPase subunit F, respectively [75].

While we interpreted the different preferences of transport systems in the context of energy sources, we were aware that the performance of transport systems is also correlated with other important factors such as substrate availability and ion gradients. It should be noted that contradictory cases might exist here, including the loss of functions of ATPase components (OG0005367) and oligopeptide transport ATP-binding protein OppF (OG0008986) upon shifts from coral skeleton to macroalgal ambient sediment (Supplementary Dataset S3). In these cases, the functional changes plausibly were associated with factors other than energy, such as the uncharacterized substrate levels in different niches.

Stress response genes coping with environmental changes

Coral and macroalgal ecosystems are both exposed to changing physicochemical conditions such as oxygen, osmolarity, temperature and pH [76]. We identified several matched cases associated with stress responses, involving functional transitions between macroalgal and coral ecosystems. For example, three gene families were pseudogenized during the shift from coral skeleton to macroalgal ambient sediment (Table 1), including a thioredoxin (a key antioxidant in defense against oxidative stress), a Na+/H+ antiporter involved in pH homeostasis, and a potassium efflux system protein PhaA for pH adaptation. Given the diurnal fluctuation in oxygen and pH levels in coral skeleton, which drop sharply during the night [29, 31], these pseudogenization events may suggest the roles of these genes in oxidative stress response and pH regulation during habitat change. When the niche shifted from macroalgal ambient sediment to coral skeleton, the nitrite-sensitive transcriptional repressor NsrR and the general stress protein A were both pseudogenized (Table 1). NsrR likely contains an [Fe–S] cluster, and is known as a key regulator to cope with oxidative and nitrosative stresses [77, 78]. These two genes may help resist stress conditions in macroalgal ambient sediment.

In addition, the gene family encoding glutathione S-transferase, a protein responsible for protection against oxidative stress [79], was pseudogenized during the shift from macroalgal ambient sediment to coral mucus (Table 1). Members of another gene family encoding the same function were repeatedly pseudogenized during the shifts from coral skeleton to macroalgal tissue and macroalgal ambient seawater (Table 1). The frequent functional changes of this gene likely reflect the ability of roseobacters inhabiting different ecological niches to sense and rapidly respond to oxidative stresses through pseudogenization.

Genes mediating organismal interactions correlated with environmental changes

We found that pseudogenization and back events frequently occurred in gene families encoding the components of secretion systems and flagella, potentially mediating bacteria-host and bacteria-bacteria interactions (Table 1). For example, genes encoding the components of Type II, IV, and VI secretion systems were pseudogenized during niche shifts between coral and macroalgal ecosystems or between niches within each ecosystem (Table 1). Particularly, T4SS and T6SS were demonstrated to be important in the interactions with host and other bacteria [80]. Considering the diversity and specificity of bacterial secretion systems, secreting effector proteins or cytotoxins into certain environments may represent a competitive strategy that helps roseobacters exploit the host or compete with other bacteria inhabiting the same niche [80].

Further, pseudogenization or back events occurred in gene families associated with bacterial flagellar assembly and export (TipF, FlaA, FlgD, and FliI) during niche shifts between macroalgal ambient sediment and coral skeleton (Table 1). Besides enabling motility, bacterial flagella play additional roles in surface adhesion and biofilm formation, which may enhance resistance to antimicrobial agents and enable the embedded cells to outcompete unrelated neighbors for both space and resources [81]. A classic example is the genes encoding flagellar components (mentioned above): mutations in these genes can lead to defective adhesion to coral [82] and to loss of biofilm formation and swarming motility (which may be relevant to roseobacters’ activities in the sediment) [83]. These events suggest the potential importance of dynamic regulation of flagellar assembly in host colonization and bacterial survival during niche shifts.

As each niche can be further divided into multiple physicochemical microenvironments, which vary over time, the nature of each niche is likely much more complex and heterogeneous than we can characterize. Accordingly, the resident microbial communities can be even more diverse and dynamic. Recently, a surprisingly greater microbiome diversity was revealed in coral skeleton than in coral mucus and tissue [84, 85], which may have been shaped by multiple physicochemical gradients across depth layers and limited dispersal of microbes in the skeletal matrix [86]. It was therefore speculated that roseobacters inhabiting coral skeleton may harbor some “niche factors” that aid competition with other members of the community or serve as a cell-cell communication strategy [87]. One explanation for the pseudogenization events upon the shift from coral skeleton to coral mucus could be that microbiome diversity decreased as a result of the niche shift, leading to fewer microbial interactions. Likewise, the functional transitions between coral and macroalgal ecosystems may reflect the different composition and diversity of the microbial communities in different niches. As microbial communities on algal surfaces differ remarkably from the free-living bacteria in the surrounding seawater [33, 88], it is possible that bacteria inhabiting distinct niches of the macroalgal ecosystem are equipped with various weapons to defend against competitors and persist in the niche. However, knowledge of specific niche factors in different environments is still very limited, and future studies are necessary to address these questions.

Caveats and future research needs

Similar to a recent study [89], we defined the ecological niche of roseobacters according to their isolation sites. This approach comes with several caveats. Importantly, the relationship between isolation sites and habitats is not simple, and it could be further complicated by potential disruption during sampling (e.g., a reliable method to separate coral tissue from skeleton and mucus is not yet available [90]). In the present study, although we presented interesting examples of correlation between pseudogenization events and niche shifts, it is possible that some of these occurred simply by chance. Most of the gene families with pseudogenization or back events occurring during niche shifts were annotated as hypothetical proteins (Supplementary Dataset S3). For those with an assigned function, the small number (n = 41) of ecologically matched cases (Table 1, Supplementary Table S2) makes it very difficult to perform rigorous statistical tests of whether specific functions are more likely to be associated with such events. As a matter of expediency, we elaborated on the relationship between their functions and the niche change, highlighting potential contributions of pseudogenization or back events to niche adaptation; these interpretations, however, could be speculative. In addition, except for a few cases (e.g., [91]), knowledge of the adaptiveness of pseudogenes is still scarce. With the above caveats in mind, we suggest that function-testing experiments and population genetics analyses are needed before any conclusion about the adaptive role of pseudogenization is reached. This is beyond the scope of this study and awaits future work.

In addition, while most samples were collected in the same season and from nearby locations in Hong Kong, we cannot rule out an effect of sampling sites and dates on the evolutionary changes of functional genes. Since roseobacters isolated from samples collected before May 2017 and on Feb 25, 2018 (Ngo Mei Chau) together account for only ~14% of all isolates (Supplementary Table S1), we removed them in an updated analysis to minimize the effect of sampling sites and dates. A total of 306 gene families were identified, among which 262 were consistent with the families identified in the dataset before removing the time-inconsistent isolates. Although the number of ecologically matched cases decreased from 41 to 31 in the updated analysis, 28 of them remained consistent with the previous ones (Supplementary Dataset S3). Among the remaining three newly identified cases, two families, encoding the multidrug ABC transporter ATP-binding protein and the flagellar biosynthesis protein FliR, respectively, were pseudogenized during the niche shift from macroalgal ambient sediment to coral skeleton. The third encodes a catalase for mitigating oxidative stress, and it was pseudogenized during the shift from coral skeleton to macroalgal ambient sediment.
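The consistency check for the ecologically matched cases can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `overlap_summary` and the case identifiers are hypothetical placeholders, constructed only so that the set sizes match the counts reported above (41 cases before filtering, 31 after, 28 shared).

```python
def overlap_summary(before: set, after: set) -> dict:
    """Compare the cases found before and after filtering the isolates."""
    shared = before & after
    return {
        "after_total": len(after),
        "shared": len(shared),
        "new": len(after - before),
        "lost": len(before - after),
        # Fraction of the updated cases that were already found before filtering.
        "fraction_retained": len(shared) / len(after) if after else 0.0,
    }

# Placeholder identifiers (assumption): only the set sizes follow the text.
before_cases = {f"case_{i}" for i in range(41)}
after_cases = {f"case_{i}" for i in range(28)} | {f"new_{i}" for i in range(3)}

summary = overlap_summary(before_cases, after_cases)
print(summary)
```

With real data, the two sets would hold identifiers that combine a gene family with the direction of the niche shift, so that a case counts as shared only when both agree.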

Concluding remarks

Through comprehensive genomic analyses of 279 newly sequenced Ruegeria isolates collected from nearby coral and macroalgal ecosystems, we showed that gene loss via pseudogenization is likely an important mechanism driving genome content differentiation of this ecologically diverse and metabolically versatile marine bacterial lineage. We further identified a potential correlation between changes in genome content mediated by pseudogenization and shifts among the ecological niches found in these two typical coastal ecosystems. Genes whose pseudogenization events may be correlated with niche switches include those involved in resource recovery and energy conservation, N metabolism and carbohydrate utilization, transport systems, stress response and organismal interactions (summarized in Fig. 5). Since gene loss by pseudogenization often requires only a single nonsense point mutation, this mechanism may enable roseobacters to rapidly respond to environmental changes and adapt to new habitats. Overall, our study suggests that gene loss mediated by pseudogenization is an important contributor to the genetic variation and ecological diversification of the Roseobacter group. This mechanism may act similarly in other generalist bacteria.

Fig. 5: A summary of the interpretation of pseudogenization and back events of the genetic traits in the Ruegeria isolates during different types of niche shifts.

The images of the coral and macroalgae were kindly provided by CHUI Pui Yi and ANG Put On Jr., respectively, and are used here with permission.