Aerobic anoxygenic phototrophs (AAnPs) are common in marine environments and are associated with photoheterotrophic activity. To date, AAnPs that possess the potential for carbon fixation have not been identified in the surface ocean. Using the Tara Oceans metagenomic dataset, we have identified draft genomes of nine bacteria that possess the genomic potential for anoxygenic phototrophy, carbon fixation via the Calvin-Benson-Bassham cycle, and the oxidation of sulfite and thiosulfate. Forming a monophyletic clade within the Alphaproteobacteria and lacking cultured representatives, the organisms compose minor constituents of local microbial communities (0.1–1.0%), but are globally distributed, present in multiple samples from the North Pacific, Mediterranean Sea, the East Africa Coastal Province, and the Atlantic. This discovery may require re-examination of the microbial communities in the oceans to understand and constrain the role this group of organisms may play in the global carbon cycle.
A central tenet of our current understanding of marine microbiology is that aerobic anoxygenic phototrophic bacteria (AAnPs) are heterotrophic and lack the capacity for carbon fixation. AAnPs utilize type-II photochemical reaction centers (RCIIs) and the photopigment bacteriochlorophyll to harness light energy in order to translocate protons across the cytoplasmic membrane [2, 3]. The presence and abundance of AAnPs in marine environments have been studied extensively, revealing a phylogenetically diverse [4, 5] and globally distributed group of microorganisms. AAnPs with the capacity for carbon fixation have not been recognized, unlike anaerobic anoxygenic phototrophs, such as Rhodobacter sphaeroides, which only perform photosynthesis under suboxic or anoxic conditions [7, 8]. Utilizing the microbial metagenomes generated during the Tara Oceans expedition [9, 10] and the subsequent microbial metagenome-assembled genomes (MAGs) generated through various studies [11, 12, 13], we have identified the genomes of a globally distributed, novel alphaproteobacterial clade that encodes the genes necessary for aerobic anoxygenic phototrophy and carbon fixation through the Calvin-Benson-Bassham (CBB) cycle. While further research will be required to ascertain the extent to which anoxygenic phototrophy and carbon fixation interact within these organisms, this combination may represent a new mode of photosynthesis in the global ocean.
Materials and methods
A collection of non-redundant MAGs generated from several studies using the Tara Oceans metagenomic dataset [11, 12, 13] and from the Red Sea was screened for the predicted presence of genes assigned as the M and L subunits of the type-II photochemical reaction center (PufML) and the large and small subunits of ribulose-1,5-bisphosphate carboxylase (RbcLS, RuBisCO). A detailed methodology can be found in the Supplemental Information.
Results and discussions
A collection of 3655 marine microbial MAGs was screened for the presence of the genes encoding PufML, resulting in the identification of 102 genomes. Within this group of anoxygenic phototrophs, nine genomes were identified (six from Tully et al. and three from Delmont et al.) that also encoded the genes for RbcLS. The nine genomes of interest vary in quality, with estimated completion values of 38–95% (mean: 76%) and a limited amount of strain non-specific contamination (Table 1). Along with nine other genomes lacking the genes for PufML and/or RbcLS, they form a distinct sister clade to the Rhodobacteraceae (Fig. 1a) that lacks cultured or genomic representatives in NCBI GenBank. As is common for MAGs, none of the genomes possessed a full-length 16S rRNA gene sequence. A partial, 90 bp 16S rRNA gene fragment in one MAG was linked to a clade of 16S rRNA genes assigned to an uncultured group of organisms associated with the Rhodobacteraceae. To represent the novel nature of this clade, we propose the creation of a family-level group that encompasses all 18 genomes with the tentative name ‘Candidatus Luxescamonaceae’ (L. fem. lux, light; L. fem. esca, food; Gr. fem. monas, a unit, monad; N.L. fem. n. Luxescamonaceae, the light and food monad).
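The two-stage screen described above can be pictured as successive set filters over per-genome annotations. The following is a minimal sketch, not the pipeline used in the study (which is described in the Supplemental Information). The MAG identifiers, the gene labels, and the rule that both subunits must be present are assumptions made for illustration.

```python
# Toy annotation table: MAG id -> set of marker genes detected.
# Identifiers and gene sets are hypothetical, not data from the study.
annotations = {
    "MAG_A": {"pufM", "pufL", "rbcL", "rbcS"},
    "MAG_B": {"pufM", "pufL"},
    "MAG_C": {"rbcL", "rbcS"},
    "MAG_D": {"pufM", "rbcL", "rbcS"},
}

# Assumption: a genome passes a stage only if both subunits are annotated.
PUF = {"pufM", "pufL"}
RBC = {"rbcL", "rbcS"}


def screen(annotations, required):
    """Return sorted MAG ids whose gene set contains every required marker."""
    return sorted(mag for mag, genes in annotations.items() if required <= genes)


# Stage 1: genomes with the reaction-center subunits.
phototrophs = screen(annotations, PUF)

# Stage 2: among those, genomes that also carry both RuBisCO subunits.
candidates = screen({mag: annotations[mag] for mag in phototrophs}, RBC)

print(phototrophs)
print(candidates)
```

In this toy table only MAG_A passes both stages; MAG_D is dropped at the first stage because pufL is missing, which illustrates how an incomplete genome can be excluded by a strict both-subunits rule.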
The nine AAnP genomes from this novel clade were detected in 51 samples from 29 stations in the Tara Oceans metagenomic dataset at >0.01% relative abundance (max. ~1.0% relative abundance; Fig. 2; Supplemental Information 1). Just under half of those samples (n = 21) had >0.1% relative abundance, with the highest abundance in a sample from the North Pacific (TARA137, 1.04%). Tara Oceans metagenomic samples were classified into three partially overlapping size classes (<0.2, 0.2–3.0, 0.8–5.0 μm) and collected from three depths (surface, deep chlorophyll maximum [DCM], and mesopelagic). Predominantly, the AAnP genomes of interest occur in metagenomic samples from the 0.2–3.0 μm size fraction at the shallower sampling depths (surface and DCM), suggesting that these organisms occur as a component of the ‘free-living’ fraction of the microbial community in the photic zone. In many instances, the genomes can be detected at the same depth in both the medium and large size fractions, for which the most parsimonious explanation would be that cells are regularly in the 0.8–3.0 μm size range.
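The detection counts above amount to thresholding per-sample relative abundance and then tallying stations and size fractions. A minimal sketch follows, using the 0.01% and 0.1% cut-offs from the text. The sample records, station labels, and abundance values are invented for illustration, and how relative abundance was derived from read recruitment is left to the Supplemental Information.

```python
from collections import Counter

# Toy records: (station, depth, size fraction in um, relative abundance in %).
# All values are hypothetical.
samples = [
    ("ST1", "surface", "0.2-3.0", 1.04),
    ("ST1", "DCM", "0.2-3.0", 0.15),
    ("ST2", "surface", "0.8-5.0", 0.05),
    ("ST2", "mesopelagic", "0.2-3.0", 0.004),
    ("ST3", "DCM", "<0.2", 0.0),
]


def detected(samples, threshold):
    """Return the records whose relative abundance exceeds the threshold (%)."""
    return [s for s in samples if s[3] > threshold]


low = detected(samples, 0.01)   # detection cut-off used in the text
high = detected(samples, 0.1)   # higher-abundance subset

stations = {station for station, _, _, _ in low}
by_fraction = Counter(fraction for _, _, fraction, _ in low)

print(len(low), len(high), sorted(stations))
print(by_fraction.most_common())
```

Tallying by size fraction and depth in this way is what supports statements such as the predominance of the 0.2–3.0 μm fraction at the surface and DCM.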
Phylogenetic assessments of the RbcL and PufM sequences were performed to contextualize the AAnP genomes relative to the established diversity. The ‘Ca. Luxescamonaceae’ AAnP RuBisCO sequences clustered with the Type-IC/D RuBisCOs, suggesting that the AAnP sequences are bona fide RuBisCOs capable of carbon fixation (Supplemental Figure 1). Specifically, the AAnP sequences clustered exclusively with environmental RuBisCOs identified in the Global Ocean Survey (GOS) dataset. PufM sequences from the AAnP genomes cluster in two distinct clades that consist only of environmental GOS sequences (Supplemental Figure 2). Interestingly, PufM sequences in the clade encompassing NP970, SAT68, and MED800 do not match the canonical pufM primers, which may explain why this group has not been previously identified [5, 16, 17, 18, 19].
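To make concrete what it means for a sequence not to match a degenerate primer, the sketch below counts mismatches between a primer written with IUPAC ambiguity codes and each window of a template. The primer and templates are toy sequences chosen for illustration, not the canonical pufM primers or sequences from these genomes, and only the forward orientation is checked.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def best_site(primer, template):
    """Slide the primer along the template; return (fewest mismatches, offset)."""
    n = len(primer)
    return min(
        (mismatches(primer, template[i:i + n]), i)
        for i in range(len(template) - n + 1)
    )


primer = "TAYGGNAA"              # hypothetical degenerate primer
matching = "CCTACGGTAACC"        # toy template with a perfect binding site
divergent = "CCTAAGGTCACC"       # toy template with two substitutions in the site

print(best_site(primer, matching))
print(best_site(primer, divergent))
```

A template whose best site still carries mismatches, particularly near the 3' end of the primer, would be expected to amplify poorly, which is one way a clade can go undetected in primer-based surveys.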
A comparative analysis of the six genomes generated in Tully et al. reveals the genomic capacity for complete carbon fixation via the CBB cycle, the biosynthesis of bacteriochlorophyll, and the oxidation of inorganic sulfur compounds, either thiosulfate via the SOX system and/or sulfite via sulfite dehydrogenase (Fig. 1b). An expanded comparison of the ‘Ca. Luxescamonaceae’ clade and the genomes of anaerobic anoxygenic phototrophs reveals a distinct lack of microaerobic cytochromes, specifically cbb3-type cytochrome oxidase and cytochrome bd-type, and the inability to utilize alternative electron acceptors, such as nitrate, nitrite, and sulfate (Fig. 1c). Additionally, all of the AAnPs from the ‘Ca. Luxescamonaceae’ possess the oxygen-dependent ring cyclase (acsF) necessary for bacteriochlorophyll biosynthesis in oxic environments. Collectively, these observations suggest that the genomes in the ‘Ca. Luxescamonaceae’ with RCII and RuBisCO are true AAnPs, capable of lithoautotrophic growth through the oxidation of various sulfur compounds.
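The line of reasoning above combines required markers, at least one sulfur-oxidation marker, and the absence of markers for microaerobic or anaerobic respiration. It can be written as a simple presence/absence rule, sketched below. The specific gene names chosen to stand for each function (for example ccoN for the cbb3-type oxidase, cydA for cytochrome bd, soxB and sorA for sulfur oxidation) are illustrative assumptions rather than the marker set used in the study.

```python
# Illustrative marker sets; the gene choices are assumptions for this sketch.
REQUIRED = {"pufM", "pufL", "rbcL", "rbcS", "acsF"}
SULFUR_OXIDATION = {"soxB", "sorA"}          # at least one expected
EXCLUDED = {"ccoN", "cydA", "narG", "nirK", "nirS", "dsrA"}


def classify(genes):
    """Summarize whether a gene set fits the aerobic, autotrophic AAnP pattern."""
    genes = set(genes)
    missing = REQUIRED - genes
    unexpected = EXCLUDED & genes
    has_sulfur = bool(SULFUR_OXIDATION & genes)
    return {
        "consistent": not missing and not unexpected and has_sulfur,
        "missing": sorted(missing),
        "unexpected": sorted(unexpected),
        "sulfur_oxidation": has_sulfur,
    }


# Hypothetical gene sets for two genomes.
aerobic_like = {"pufM", "pufL", "rbcL", "rbcS", "acsF", "soxB"}
anaerobe_like = {"pufM", "pufL", "rbcL", "rbcS", "ccoN", "narG", "soxB"}

print(classify(aerobic_like))
print(classify(anaerobe_like))
```

Because MAGs are incomplete, an absent gene is weaker evidence than a present one, so a rule of this kind is best read as a consistency check rather than proof of physiology.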
The potential for a metabolic link between RCII and the CBB cycle in a previously unidentified clade of Alphaproteobacteria may represent a new avenue of photosynthesis in the ocean. It should be noted that, despite identification of the ‘Ca. Luxescamonaceae’ in MAGs reconstructed with independent assembly and binning methodologies, MAGs remain an imperfect tool because of incomplete genomic content and the influence of contaminating DNA sequences. One avenue to test how the RCII apparatus and the CBB cycle are connected under oxic conditions would be to identify simultaneous expression of both metabolic pathways. However, due to the low abundance of these organisms in natural assemblages and known confounding issues in expression, such as the staggered expression of photosynthesis genes in Cyanobacteria, it will likely be difficult to observe this under in situ conditions. Attempts utilizing publicly available Tara Oceans metatranscriptomes from which rRNA had been removed de novo (Accession No.: ERS490659, ERS494518, ERS1092158, ERS490542) generated read coverage values well below accepted quantifiable ranges. With this consideration, it seems likely that targeted identification of expression (e.g., qPCR) and/or the isolation/enrichment of members of this clade will be required to explore the potential metabolism of these organisms.
For the ‘Ca. Luxescamonaceae’ genomes generated from Tully et al., the original genomes can be accessed at DDBJ/ENA/GenBank with the Whole Genome Shotgun project deposited under the accessions: PAHT01000000, PACC01000000, PBQX01000000, PAKN01000000, PAGE01000000, NZPY01000000. For the ‘Ca. Luxescamonaceae’ genomes generated from Delmont et al., the original genomes can be accessed at https://doi.org/10.6084/m9.figshare.4902923. Updated genomes, as the result of the applied refinement step, originating from Delmont et al. can be accessed at https://doi.org/10.6084/m9.figshare.5615011.v1. Updated genomes, as the result of the applied refinement step, originating from Tully et al. can be accessed with the GenBank accessions: PAHT02000000, PACC02000000, PBQX02000000, PAKN02000000, PAGE02000000, NZPY02000000.
We would like to acknowledge and thank Drs. Eric Webb and William Nelson for providing invaluable comments and critiques in the early stages of this research. We are indebted to the Tara Oceans consortium for their commitment to open-access data that allows data aficionados to indulge in the data and attempt to add to the body of science contained within. And we thank the Center for Dark Energy Biosphere Investigations (C-DEBI) for providing funding to BJT and JFH (OCE-0939654). This is C-DEBI contribution number 418.