Introduction

A central tenet of our current understanding of marine microbiology is that aerobic anoxygenic phototrophic bacteria (AAnPs) are heterotrophic and lack the capacity for carbon fixation [1]. AAnPs utilize type-II photochemical reaction centers (RCIIs) and the photopigment bacteriochlorophyll to harness light energy in order to translocate protons across the cytoplasmic membrane [2, 3]. The presence and abundance of AAnPs in marine environments have been studied extensively, revealing a phylogenetically diverse [4, 5] and globally distributed group of microorganisms [6]. AAnPs with the capacity for carbon fixation have not been recognized, unlike anaerobic anoxygenic phototrophs, such as Rhodobacter sphaeroides, which perform photosynthesis only under suboxic or anoxic conditions [7, 8]. Utilizing the microbial metagenomes generated during the Tara Oceans expedition [9, 10] and the microbial metagenome-assembled genomes (MAGs) subsequently generated through various studies [11, 12, 13], we have identified the genomes of a globally distributed, novel alphaproteobacterial clade that encodes the genes necessary for aerobic anoxygenic phototrophy and carbon fixation through the Calvin-Benson-Bassham (CBB) cycle. While further research will be required to ascertain the extent to which anoxygenic phototrophy and carbon fixation interact within these organisms, this may represent a new mode of photosynthesis in the global ocean.

Materials and methods

A collection of non-redundant MAGs generated from several studies using the Tara Oceans metagenomic dataset [11, 12, 13] and from the Red Sea [14] was screened for the predicted presence of genes assigned as the M and L subunits of the type-II photochemical reaction center (PufML) and the large and small subunits of ribulose-1,5-bisphosphate carboxylase (RbcLS, RuBisCO). A detailed methodology can be found in the Supplemental Information.
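The two-step screen described above can be illustrated with a minimal sketch. This is not the pipeline used in the study (see the Supplemental Information for that). It assumes a hypothetical annotation table mapping each MAG identifier to a set of predicted gene names; the MAG identifiers, the gene labels used as keys, and the function name screen_mags are all illustrative.

```python
# Illustrative only: MAG IDs and gene sets below are made up, not study data.
REACTION_CENTER = {"pufM", "pufL"}   # M and L subunits of RCII
RUBISCO = {"rbcL", "rbcS"}           # large and small subunits of RuBisCO


def screen_mags(annotations):
    """Return (phototrophs, phototrophs_with_rubisco) as sorted lists of MAG IDs.

    annotations: dict mapping MAG ID -> set of predicted gene names.
    A MAG passes a step only if it carries every gene in the required set.
    """
    phototrophs = sorted(
        mag for mag, genes in annotations.items() if REACTION_CENTER <= genes
    )
    with_rubisco = sorted(
        mag for mag in phototrophs if RUBISCO <= annotations[mag]
    )
    return phototrophs, with_rubisco


example = {
    "MAG_A": {"pufM", "pufL", "rbcL", "rbcS"},
    "MAG_B": {"pufM", "pufL"},
    "MAG_C": {"rbcL", "rbcS"},
    "MAG_D": {"pufM", "rbcL"},  # incomplete gene pairs, fails both steps
}
phototrophs, with_rubisco = screen_mags(example)
print(phototrophs)
print(with_rubisco)
```

Requiring both subunits at each step is a deliberately strict choice for the sketch; incomplete MAGs may lack one subunit simply because the gene was not recovered.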

Results and discussion

A collection of 3655 marine microbial MAGs was screened for the presence of the genes encoding PufML, resulting in the identification of 102 genomes. Within this group of anoxygenic phototrophs, nine genomes were identified (six from Tully et al. [11] and three from Delmont et al. [12]) that also encoded the genes for RbcLS. The nine genomes of interest vary in quality, with estimated completion values of 38–95% (mean: 76%) and a limited amount of strain non-specific contamination (Table 1). Along with nine other genomes lacking the genes for PufML and/or RbcLS, they form a distinct sister clade to the Rhodobacteraceae (Fig. 1a) that has no cultured or genomic representatives in NCBI GenBank [15]. As is common for MAGs, none of the genomes possessed a full-length 16S rRNA gene sequence. A partial, 90 bp 16S rRNA gene fragment in one MAG was linked to a clade of 16S rRNA genes assigned to an uncultured group of organisms associated with the Rhodobacteraceae. To represent the novel nature of this clade, we propose the creation of a family-level group that encompasses all 18 genomes with the tentative name ‘Candidatus Luxescamonaceae’ (L. fem. lux, light; L. fem. esca, food; Gr. fem. monas, a unit, monad; N.L. fem. n. Luxescamonaceae, the light and food monad).

Table 1 Summary of genome statistics for the putative members of the ‘Ca. Luxescamonaceae’
Fig. 1

a Phylogenomic tree of 31 concatenated marker genes for the Alphaproteobacteria. Numbers in parentheses represent the number of genomes collapsed within a branch. Black stars denote genomes within the ‘Ca. Luxescamonaceae’ that possess PufM and RbcL; gray stars denote genomes that possess RbcL only; white stars denote genomes that possess PufM only. Bootstrap values >0.75 are shown. Circle size, representing the bootstrap value, is scaled from 0.75 to 1.0. b Cellular schematic comparing the six AAnP genomes derived from Tully et al. [11]. c Comparison of predicted functions for the entire ‘Ca. Luxescamonaceae’ clade, selected neighbors, and previously described anaerobic anoxygenic phototrophs. Dendrogram represents the phylogenetic relationship between members of the ‘Ca. Luxescamonaceae’. Black boxes and blue arrows denote specific comparisons discussed in the manuscript. Predicted functions are represented on a scale from 0 to 1 denoting the fraction of a pathway or function that is complete within a genome. KEGG module or ontology IDs used to determine function completeness are noted.

The nine AAnP genomes from this novel clade were detected in 51 samples from 29 stations in the Tara Oceans metagenomic dataset at >0.01% relative abundance (max. ~1.0% relative abundance; Fig. 2; Supplemental Information 1). Of those samples, 21 had relative abundances above approximately 0.1%, with the highest abundance in a sample from the North Pacific (TARA137, 1.04%). Tara Oceans metagenomic samples were classified into three partially overlapping size classes (<0.2, 0.2–3.0, 0.8–5.0 μm) and collected from three depths (surface, deep chlorophyll maximum [DCM], and mesopelagic). Predominantly, the AAnP genomes of interest occur in metagenomic samples from the 0.2–3.0 μm size fraction at the shallower sampling depths (surface and DCM), suggesting that these organisms occur as a component of the ‘free-living’ fraction of the microbial community in the photic zone. In many instances, the genomes can be detected at the same depth in both the medium and large size fractions, for which the most parsimonious explanation is that cells are regularly in the 0.8–3.0 μm size range.
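The detection thresholds above can be made concrete with a short sketch. The way relative abundance was actually calculated is described in the Supplemental Information; here it is assumed, purely for illustration, to be the percentage of a sample's reads recruited to the genomes of interest. The sample names, read counts, and function names are hypothetical.

```python
def relative_abundance(recruited_reads, total_reads):
    """Percentage of a sample's reads recruited to a set of genomes.

    Assumption for this sketch: abundance is a simple read fraction.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * recruited_reads / total_reads


def samples_above(counts, threshold_pct):
    """Return sorted sample IDs whose relative abundance exceeds threshold_pct.

    counts: dict mapping sample ID -> (recruited_reads, total_reads).
    """
    return sorted(
        sample
        for sample, (recruited, total) in counts.items()
        if relative_abundance(recruited, total) > threshold_pct
    )


# Made-up read counts, not values from the Tara Oceans dataset.
example_counts = {
    "sample_1": (500, 1_000_000),     # 0.05%
    "sample_2": (50, 1_000_000),      # 0.005%
    "sample_3": (12_000, 1_000_000),  # 1.2%
}
detected = samples_above(example_counts, 0.01)  # detection threshold
abundant = samples_above(example_counts, 0.1)   # higher-abundance threshold
print(detected)
print(abundant)
```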

Fig. 2

Global map illustrating the Tara Oceans sampling sites. Sites at which the AAnP members of the ‘Ca. Luxescamonaceae’ were detected at >0.01% relative abundance are depicted. For each site, filter size fractions that were not collected are represented by an ‘X’ and each column represents one of the three Tara Oceans filter size fractions. Red asterisks denote filter fractions in which the relative abundance of genomes from Delmont et al. [12] contributed at least 0.01% (max. 0.04%) of the total relative abundance (this study). Squares highlighted in red denote filter samples in which ‘Ca. Luxescamonaceae’ genomes from Delmont et al. [12] had a reported relative fraction of the metagenome of 0.01–0.03% [12]

Phylogenetic assessments of the RbcL and PufM sequences were performed to contextualize the AAnP genomes relative to the established diversity. The ‘Ca. Luxescamonaceae’ AAnP RuBisCO sequences clustered with the Type-IC/D RuBisCOs, suggesting that the AAnP sequences are bona fide RuBisCOs capable of carbon fixation (Supplemental Figure 1). Specifically, the AAnP sequences clustered exclusively with environmental RuBisCOs identified in the Global Ocean Survey (GOS) dataset. PufM sequences from the AAnP genomes cluster in two distinct clades that consist only of environmental GOS sequences (Supplemental Figure 2). Interestingly, PufM sequences in the clade encompassing NP970, SAT68, and MED800 do not match the canonical pufM primers, which may explain why this group has not been previously identified [5, 16, 17, 18, 19].

A comparative analysis of the six genomes generated in Tully et al. [11] reveals the genomic capacity for complete carbon fixation via the CBB cycle, the biosynthesis of bacteriochlorophyll, and the oxidation of inorganic sulfur compounds, either thiosulfate via the SOX system and/or sulfite via sulfite dehydrogenase (Fig. 1b). An expanded comparison of the ‘Ca. Luxescamonaceae’ clade and the genomes of anaerobic anoxygenic phototrophs reveals a distinct lack of microaerobic cytochromes, specifically cbb3-type cytochrome oxidase [20] and cytochrome bd-type [21], and the inability to utilize alternative electron acceptors, such as nitrate, nitrite, and sulfate (Fig. 1c). Additionally, all of the AAnPs from the ‘Ca. Luxescamonaceae’ possess the oxygen-dependent ring cyclase (acsF) necessary for bacteriochlorophyll biosynthesis in oxic environments. Collectively, these observations suggest that the genomes in the ‘Ca. Luxescamonaceae’ with RCII and RuBisCO are true AAnPs, capable of lithoautotrophic growth through the oxidation of various sulfur compounds [22].

Conclusion

A potential metabolic link between RCII and the CBB cycle in a previously unidentified clade of Alphaproteobacteria may represent a new avenue of photosynthesis in the ocean. It should be noted that, despite identification of the ‘Ca. Luxescamonaceae’ in MAGs reconstructed with independent assembly and binning methodologies, MAGs remain an imperfect tool because of their incomplete genomic content and the influence of contaminating DNA sequences. One avenue to test how the RCII apparatus and the CBB cycle are connected under oxic conditions would be to identify simultaneous expression of both metabolic pathways. However, due to the low abundance of these organisms in natural assemblages and known confounding issues in expression, such as the staggered expression of photosynthesis genes in Cyanobacteria [23], it will likely be difficult to observe this under in situ conditions. Attempts utilizing publicly available metatranscriptomes from Tara Oceans from which ribosomal RNA had been removed de novo (Accession No.: ERS490659, ERS494518, ERS1092158, ERS490542) generated read coverage values well below accepted quantifiable ranges. With this consideration, it seems likely that targeted identification of expression (e.g., qPCR) and/or the isolation/enrichment of members of this clade will be required to explore the potential metabolism of these organisms.

Data availability

For the ‘Ca. Luxescamonaceae’ genomes generated from Tully et al. [11], the original genomes can be accessed at DDBJ/ENA/GenBank with the Whole Genome Shotgun project deposited under the accessions: PAHT01000000, PACC01000000, PBQX01000000, PAKN01000000, PAGE01000000, NZPY01000000. For the ‘Ca. Luxescamonaceae’ genomes generated from Delmont et al. [12], the original genomes can be accessed at https://doi.org/10.6084/m9.figshare.4902923. Updated genomes, resulting from the applied refinement step, originating from Delmont et al. [12] can be accessed at https://doi.org/10.6084/m9.figshare.5615011.v1. Updated genomes, resulting from the applied refinement step, originating from Tully et al. [11] can be accessed with the GenBank accessions: PAHT02000000, PACC02000000, PBQX02000000, PAKN02000000, PAGE02000000, NZPY02000000.