Introduction

Archaea of the phylum Thaumarchaeota are among the most abundant microorganisms on the planet, constituting up to 20% of single-celled life in marine systems alone [1]. Although most characterized members of Thaumarchaeota are ammonia-oxidizing archaea (AOA), the phylum also encompasses several archaeal clades for which ammonia oxidation has not yet been demonstrated (e.g., Group 1.1c and Group 1.3 [2]). These basal, non-AOA members of the phylum have primarily been described in terrestrial systems such as anoxic peat soils [3], subsurface aquifer sediments [4], geothermal springs [5, 6], and acidic forest soil [7]. Availability of molecular oxygen on Earth is hypothesized to have influenced the evolution and habitat expansion of AOA from the basal anaerobic guilds [8].

A deeply branching marine thaumarchaeal clade that has eluded cultivation and genomic analysis is the pSL12-like group, also referred to as Group 1A or the ALOHA group. First detected by DeLong et al. [9] in the North Pacific Subtropical Gyre at station ALOHA, this clade appeared divergent from Marine Group 1 Archaea and clustered with a hot spring-associated crenarchaeal 16S rRNA sequence designated pSL12 [10]. Mincer et al. [11] suggested that at least some members of the clade may harbor the ammonia-oxidation machinery, based on correlated abundances of the 16S rRNA gene and the amoA gene in oceanic water column samples (amoA encodes the alpha subunit of ammonia monooxygenase, AMO, and is conventionally used as the functional marker for AOA). The only available genomic information for the pSL12-like lineage comes from a fosmid clone library generated from the Mediterranean Sea [12]. One of the three pSL12-like fosmid sequences recovered by Martin-Cuadrado et al. [12] contained genes putatively involved in nitrogen fixation; however, no genomic or biogeochemical evidence has since supported this observation. Several SSU rRNA gene surveys have detected the pSL12-like group in various marine systems, such as the Atlantic Ocean [13], the Mediterranean Sea [14], multiple Pacific Ocean transects [15], the northern Gulf of Mexico [16], and Monterey Bay [17]. Despite their suggested roles in N-cycle transformations, the metabolic adaptations of the pSL12-like lineage remain an open question.

Here we analyze the genomic repertoire and metabolic strategies of the pSL12-like lineage, based on two metagenome-assembled genomes (MAGs) obtained from seawater incubation metagenomes derived from Monterey Bay. Metabolic reconstructions point to a heterotrophic lifestyle. Intriguingly, both genomes also encoded a form III ribulose-bisphosphate carboxylase (RuBisCO), which may participate in a CO2 incorporation pathway linked to nucleoside salvage reactions. The high degree of phylogenetic and metabolic separation between these MAGs and typical marine thaumarchaeal clades suggests that the pSL12-like lineage represents an evolutionary link between anaerobic basal clades of Thaumarchaeota and aerobic marine AOA.

Materials and methods

Sample collection, incubation, and DNA extraction

Water column samples for AOA enrichment incubations were collected from Monterey Bay, CA, in May 2010. ASW2 was collected from 150 m at station M1 (36.747 N, 122.022 W), and ASW8 was collected from 200 m at station M2 further offshore (36.697 N, 122.378 W). After 8 years of incubation at 12 °C (seawater samples were unamended; the long incubation period was intended to allow natural enrichment of AOA), 925 mL of ASW2 and 1000 mL of ASW8 were filtered through a 0.22-μm filter (Supor, Pall Inc, New York, USA). DNA was extracted using the DNeasy kit (Qiagen, Valencia, CA, USA) following the manufacturer’s protocol. To maximize DNA yield, DNeasy capture columns were eluted twice with 50 μL of elution buffer each, giving a total elution volume of 100 μL per sample. DNA concentration was measured using a Qubit Fluorometer (Invitrogen, NY, USA); concentrations of 1.41 and 1.88 μg/mL were obtained for ASW2 and ASW8, respectively.
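As a quick arithmetic check on these yields, total DNA mass is concentration multiplied by eluate volume. The sketch below assumes the Qubit readings refer to the pooled 100 μL eluate; the function and constant names are illustrative only.

```python
# Total DNA mass (ug) = concentration (ug/mL) x eluate volume (mL).
# Assumption: the reported concentrations apply to the pooled 100 uL eluate.
ELUTION_VOLUME_ML = 0.1  # two 50-uL elutions

concentrations_ug_per_ml = {"ASW2": 1.41, "ASW8": 1.88}


def total_yield_ug(conc_ug_per_ml: float, volume_ml: float = ELUTION_VOLUME_ML) -> float:
    """Return the total DNA mass in the eluate, in micrograms."""
    return conc_ug_per_ml * volume_ml


for sample, conc in concentrations_ug_per_ml.items():
    print(f"{sample}: {total_yield_ug(conc):.3f} ug")
```

Under that assumption, this works out to roughly 0.14 μg (ASW2) and 0.19 μg (ASW8) of total DNA.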

Metagenome sequencing, assembly, and binning

Metagenome sequencing was performed as part of a Community Science Program (CSP) project with the DOE Joint Genome Institute (JGI); the samples were sequenced (2 × 151 bp) on the HiSeq 2000 1TB platform. Read quality filtering was carried out using the custom JGI script jgi_mga_meta_rqc.py (v2.0.0). Briefly, trimmed paired-end reads filtered using BBDuk [18] (v37.50; BBTools software package, http://bbtools.jgi.doe.gov) were read-corrected using BFC (v.r181 [19]), and reads without a mate pair were removed.
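The final step above (discarding reads whose mate did not survive filtering) can be illustrated with a minimal sketch. It assumes read identifiers follow the common "/1" and "/2" suffix convention; it does not reproduce the JGI pipeline, and the function name is hypothetical.

```python
from collections import defaultdict


def split_pairs(read_ids):
    """Return (paired, orphans) base IDs from read IDs ending in '/1' or '/2'.

    Assumption: mate number is encoded as a '/1' or '/2' suffix.
    """
    mates = defaultdict(set)
    for rid in read_ids:
        base, _, mate = rid.rpartition("/")
        mates[base].add(mate)
    paired, orphans = [], []
    for base, found in mates.items():
        # A pair survives only if both mates are still present after filtering.
        if found == {"1", "2"}:
            paired.append(base)
        else:
            orphans.append(base)
    return sorted(paired), sorted(orphans)


ids = ["r1/1", "r1/2", "r2/1", "r3/2", "r4/1", "r4/2"]
paired, orphans = split_pairs(ids)
print(paired)   # ['r1', 'r4']
print(orphans)  # ['r2', 'r3']
```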

Quality-filtered reads were assembled using MEGAHIT (v1.1.3 [20, 21]) with a range of k-mer sizes (k = 21, 33, 55, 77, 99, 127). Contigs longer than 2000 bp were binned using two algorithms: MetaBAT2 (v2.12.1 [22]) and MaxBin2 (v2.2.6 [23, 24]). The resulting bins were refined using the bin refinement module in metaWRAP (v1.2.2 [25]) and subsequently re-assembled using SPAdes (v3.13.0 [26]) to improve assembly quality. CheckM (v1.0.12 [27]) was used to assess bin completeness and contamination. Taxonomic classifications were obtained using the GTDB-tk toolkit (v0.3.2 [28]). Dereplication based on average nucleotide identity (ANI) was performed using dRep (v2.3.2 [29]). Only bins with estimated completeness ≥70% and contamination <10% were retained for downstream analysis.
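The retention rule above (completeness ≥70% and contamination <10%) can be expressed as a simple filter over CheckM-style estimates. This is a minimal sketch; the bin names and values are hypothetical and are not the MAGs reported here.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    name: str
    completeness: float   # CheckM-style completeness estimate, %
    contamination: float  # CheckM-style contamination estimate, %


def passes_quality(b: Bin, min_completeness: float = 70.0, max_contamination: float = 10.0) -> bool:
    """Keep bins with completeness >= 70% and contamination strictly below 10%."""
    return b.completeness >= min_completeness and b.contamination < max_contamination


# Hypothetical bins illustrating the boundary cases.
bins = [
    Bin("bin_a", 97.1, 2.0),
    Bin("bin_b", 70.0, 9.9),   # exactly at the completeness cutoff: kept
    Bin("bin_c", 85.0, 10.0),  # contamination not < 10%: dropped
    Bin("bin_d", 62.0, 1.0),   # too incomplete: dropped
]
kept = [b.name for b in bins if passes_quality(b)]
print(kept)  # ['bin_a', 'bin_b']
```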

The assembled genome sequences can be accessed under the BioSample IDs SAMN14765629 and SAMN14765628, respectively, for ASW2_bin45 and ASW8_bin1 (corresponding BioProject accessions are PRJNA621967 and PRJNA539366, respectively).

MAG annotation and metabolic reconstruction

Prodigal (v2.6.3 [30]) was used for gene prediction, and functional annotations were obtained using Prokka (v1.12 [31]). In addition, the BlastKOALA and GhostKOALA tool servers [32] were used to obtain KO annotations for genes predicted by Prodigal. KEGG-decoder [33] was used to estimate pathway completeness based on KO annotations, and the results were plotted in R [34]. SEED annotations were obtained from the online Rapid Annotation using Subsystem Technology server [35]. Metabolic reconstructions were carried out using the ‘Reconstruct Pathway’ tool in KEGG mapper (https://www.genome.jp/kegg/mapper.html). TransportDB (v2.0 [36]) was used to predict membrane transporters; these annotations were further confirmed by BLASTp searches against the NCBI nonredundant protein database. SignalP-5.0 Server was used for signal peptide prediction (http://www.cbs.dtu.dk/services/SignalP-5.0/).
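Pathway completeness estimates of the kind produced from KO annotations can be approximated as the fraction of reaction steps for which at least one acceptable KO is present. The sketch below is a simplified analogue, not KEGG-decoder's actual scoring (which uses pathway-specific rules); the KO identifiers are placeholders, not real KEGG entries.

```python
def pathway_completeness(steps, annotated_kos):
    """Fraction of steps with at least one KO present in the genome.

    steps: list of sets; each set holds alternative KOs that can fill one step.
    annotated_kos: set of KOs assigned to the genome's predicted genes.
    """
    if not steps:
        return 0.0
    found = sum(1 for alternatives in steps if alternatives & annotated_kos)
    return found / len(steps)


# Hypothetical 4-step pathway; identifiers are placeholders for illustration.
toy_pathway = [{"K_a1", "K_a2"}, {"K_b1"}, {"K_c1"}, {"K_d1", "K_d2"}]
genome_kos = {"K_a2", "K_b1", "K_d1", "K_x9"}
print(pathway_completeness(toy_pathway, genome_kos))  # 0.75
```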

Phylogenetic analyses

Reference genomes for Thaumarchaeota and Aigarchaeota were downloaded from NCBI or the Integrated Microbial Genomes system. The phylogenomics module in Anvi’o (v5.4 [37]) was used to retrieve ribosomal protein sequences from the MAGs and the reference genomes. The ‘anvi-get-sequences-for-hmm-hits’ command was used to search for and retrieve 30 ribosomal proteins from each genome (L1, L10, L11, L11_N, L13, L14, L16, L18p, L2, L22, L23, L29, L2_C, L3, L4, L5, L5_C, L6, S11, S13, S15, S17, S19, S2, S3_C, S5, S5_C, S7, S8, and S9). Amino acid sequences for the retrieved proteins were aligned using MUSCLE [38] and concatenated. The alignment was trimmed using trimAl [39] with the parameters -gapthreshold 0.75 -simthreshold 0.001. Further manual refinement was carried out in Geneious (v10.2; Biomatters Ltd, New Zealand). Because some of the genomes included in the analysis were assembled from metagenomic or single-cell data, not all ribosomal proteins were identified in every genome. Only genes identified in all genomes were retained in the final alignment, which comprised 11 genes across 23 genomes. A maximum-likelihood tree was computed using FastTree [40] with 100 bootstrap replicates.
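Two steps above, retaining only genes found in every genome before concatenation and dropping gap-rich columns, can be sketched as follows. The gap filter approximates the intent of a gap threshold of 0.75 (keep columns where at least 75% of sequences have a residue); it is not trimAl itself and ignores the similarity threshold. The toy data and function names are hypothetical.

```python
def concatenate_shared(alignments, genomes):
    """Concatenate per-gene alignments, keeping only genes found in every genome.

    alignments: dict mapping gene name -> {genome name: aligned amino acid sequence}
    genomes: genome names that must all be present for a gene to be kept
    """
    kept = [gene for gene in sorted(alignments)
            if all(g in alignments[gene] for g in genomes)]
    concat = {g: "".join(alignments[gene][g] for gene in kept) for g in genomes}
    return kept, concat


def gap_filter(concat, min_nongap_fraction=0.75):
    """Drop columns in which fewer than min_nongap_fraction of sequences have a residue."""
    names = list(concat)
    n_cols = len(concat[names[0]])
    keep_cols = [
        i for i in range(n_cols)
        if sum(concat[name][i] != "-" for name in names) / len(names) >= min_nongap_fraction
    ]
    return {name: "".join(concat[name][i] for i in keep_cols) for name in names}


# Toy data: gene2 is missing from genome D, so it is excluded from the concatenation.
toy = {
    "gene1": {"A": "MK-", "B": "MKL", "C": "M--", "D": "MKL"},
    "gene2": {"A": "VV", "B": "VI", "C": "VV"},
}
keep, concat = concatenate_shared(toy, ["A", "B", "C", "D"])
trimmed = gap_filter(concat)
print(keep)     # ['gene1']
print(trimmed)  # {'A': 'MK', 'B': 'MK', 'C': 'M-', 'D': 'MK'}
```

This filtering explains why the final alignment retained 11 of the 30 searched proteins: any protein missing from even one incomplete genome is dropped.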

We used BLASTp [41] to search the MAGs for proteins of interest, both to confirm automatic annotations and to search for specific pathways/genes. Barrnap (v0.9; https://github.com/tseemann/barrnap) was used to identify rRNA genes. 16S rRNA sequences were aligned with reference sequences using MAFFT [42], and a maximum-likelihood phylogenetic tree was computed in FastTree [40] with 1000 bootstrap replicates. RuBisCO reference sequences were obtained from Jaffe et al. [43]; the alignment and phylogenetic tree were generated using MAFFT and FastTree, respectively.

FastANI [44] was used to compute ANI between the MAGs. GTDB-tk identified a moderate-quality MAG (62% estimated completeness) assembled from a deep hydrothermal plume [45] as a close relative of the MAGs assembled here; this genome (UBA57) was also included in the FastANI and functional comparison analyses.

Assessing environmental distribution of MAGs

As part of the time-series microbiome survey in Monterey Bay, we previously obtained a depth-resolved dataset of 16S rRNA V4-V5 amplicon sequences [46], as well as metagenomes and metatranscriptomes [47]. We were able to match one of the MAG-derived 16S rRNA sequences to an operational taxonomic unit (OTU) obtained in the 16S rRNA time-series dataset. We estimated the relative abundance of this OTU as well as another that shared 96–97% sequence identity with the MAG-derived sequences.
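Relative abundance here is expressed as a percentage of total thaumarchaeal reads in each sample (as in Fig. 4a). A minimal sketch of that calculation follows; the OTU labels echo those discussed in the Results, but all counts are invented for illustration, and the function name is hypothetical.

```python
def percent_of_thaumarchaeota(otu_counts, thaum_otus, target_otus):
    """Summed reads of target OTUs as a percentage of all thaumarchaeal reads in one sample.

    otu_counts: dict OTU -> read count for a single sample
    thaum_otus: OTUs classified as Thaumarchaeota (target OTUs assumed to be among them)
    """
    total = sum(otu_counts.get(otu, 0) for otu in thaum_otus)
    if total == 0:
        return 0.0
    target = sum(otu_counts.get(otu, 0) for otu in target_otus)
    return 100.0 * target / total


# Hypothetical counts for one sample; non-thaumarchaeal reads do not enter the denominator.
sample = {"OTU_694": 3, "OTU_8597": 2, "OTU_thaum_a": 900, "OTU_thaum_b": 95, "OTU_bacterial": 5000}
thaum = {"OTU_694", "OTU_8597", "OTU_thaum_a", "OTU_thaum_b"}
print(percent_of_thaumarchaeota(sample, thaum, {"OTU_694", "OTU_8597"}))  # 0.5
```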

We used three metagenome sets for read recruitment: (i) a depth- and time-resolved metagenome dataset from Monterey Bay; (ii) a North Atlantic Ocean depth profile from the TARA Oceans dataset; and (iii) a North Pacific Ocean depth profile from the TARA Oceans dataset. Note that the TARA Oceans datasets do not represent continuous depth profiles (Table S1). Bowtie2 (v2.3.5 [48]) was used to recruit metagenomic and metatranscriptomic reads against the MAGs. Read abundances were normalized as the number of reads recruited per kilobase of MAG per gigabase of metagenome (RPKG). RPKG values allow direct comparison of genome abundances (measured as coverage) between metagenomes of different sizes.
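The RPKG normalization divides recruited reads by MAG length (in kb) and by metagenome size (in Gb). The sketch below assumes metagenome size is measured as total sequenced bases; the input numbers are hypothetical.

```python
def rpkg(mapped_reads, genome_length_bp, metagenome_size_bp):
    """Reads recruited per kilobase of genome per gigabase of metagenome.

    Assumption: metagenome size is the total number of sequenced bases.
    """
    genome_kb = genome_length_bp / 1e3
    metagenome_gb = metagenome_size_bp / 1e9
    return mapped_reads / genome_kb / metagenome_gb


# Hypothetical numbers: the same 12,000 recruited reads to a 1.5-Mb MAG
# give half the RPKG when the metagenome is twice as large.
small = rpkg(12_000, 1_500_000, 10_000_000_000)
large = rpkg(12_000, 1_500_000, 20_000_000_000)
print(round(small, 3), round(large, 3))  # 0.8 0.4
```

Dividing by metagenome size is what makes abundances from deeply and shallowly sequenced samples comparable.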

Results and discussion

Genomes recovered from reduced-diversity metagenomes

Unamended seawater incubations were started in 2010, using water collected from various depths in Monterey Bay (see Materials and methods). Prior to metagenome sequencing, 16S rRNA gene amplicon libraries were generated to examine the community composition in each incubation. These libraries suggested an enrichment of Thaumarchaeota in both samples presented here (Fig. S1), which were therefore further examined via metagenome sequencing. Assembly and binning yielded three genomic bins of the pSL12-like lineage, which were dereplicated into two MAGs (see Materials and methods).

The MAGs assembled here represent the first high-quality genomes reported for the pSL12-like lineage (completeness estimates of 88.8 and 97.08%, with <3% contamination; Table 1). Their placement within the phylum Thaumarchaeota was confirmed by both phylogenomic and single-gene phylogenetic analyses (Fig. 1). Each MAG contained two partial copies of the 16S rRNA gene. In two separate maximum-likelihood trees computed from nucleotide alignments that included reference sequences from all major thaumarchaeal lineages, the MAG-derived 16S rRNA sequences clustered with Group 1A clone fragments generated from various ocean regions in prior studies (Figs. 1b and S2; [11,12,13]). The 3′-truncated 16S rRNA gene fragments from the two MAGs shared 93.85% nucleotide identity along the aligned region (910 aligned positions), while the 5′-truncated fragments shared 92.32% nucleotide identity (664 aligned positions). The original primer pair developed by Mincer et al. [11] to target the pSL12-like lineage aligned without any mismatches to the longer 3′-truncated 16S rRNA gene fragments from both genomes. Similarly, the widely used universal primers targeting the V4-V5 region of the 16S rRNA gene [49] aligned to the MAG-derived sequences without any mismatches. Microbial community surveys employing either primer set should therefore detect the pSL12-like/Group 1A lineage. We were able to verify this in a high-resolution 16S rRNA gene dataset generated from Monterey Bay targeting the V4-V5 region (see discussion below).
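The two comparisons in this paragraph, percent identity over aligned positions and primer-to-template mismatch counting, can be sketched as below. Identity is computed over columns where neither sequence has a gap, which is one common definition (tools differ in how they treat gaps), and primer matching honors IUPAC degenerate bases. The sequences and primer are toy examples, not the actual 16S fragments or the primers cited above.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def percent_identity(aln_a, aln_b):
    """Identity over columns where neither aligned sequence has a gap.

    Returns (percent identity, number of aligned positions compared).
    """
    pairs = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs), len(pairs)


def primer_mismatches(primer, target):
    """Count positions where the target base is not allowed by the (possibly degenerate) primer base.

    Assumes primer and target are already aligned, of equal length, and on the same strand.
    """
    if len(primer) != len(target):
        raise ValueError("primer and target must have equal length")
    return sum(t not in IUPAC[p] for p, t in zip(primer.upper(), target.upper()))


pid, n = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
print(round(pid, 2), n)                           # 88.89 9
print(primer_mismatches("CTRAGMTC", "CTGAGATC"))  # 0
print(primer_mismatches("CTRAGMTC", "CTCAGGTC"))  # 2
```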

Table 1 Metagenome-assembled genome (MAG) statistics.
Fig. 1: The assembled genomes cluster within the marine pSL12-like thaumarchaeal lineage.

a Maximum-likelihood phylogenomic tree computed using a concatenated alignment of 11 ribosomal proteins. Bootstrap values are indicated on nodes. See Materials and methods for details on alignment and tree computation. b Phylogeny of MAG-derived 16S rRNA gene sequences with genomic as well as environmental reference sequences. Node shading indicates bootstrap support.

The closest genomic relative in the database was a MAG obtained from a hydrothermal vent plume metagenome (4900 m depth at the Mid-Cayman Spreading Center [45]), which potentially represents a species-level relative [50] of ASW8_bin1 (Table 1). In a maximum-likelihood tree computed from a concatenated alignment of 11 core ribosomal proteins, the two MAGs were placed as a sister clade to all known ammonia-oxidizing Thaumarchaeota of Group 1.1a (marine AOA) and Group 1.1b (soil AOA) (Fig. 1a). Similarly, in the 16S rRNA gene phylogeny, the MAGs clustered with environmental clone sequences of the pSL12-like clade (Fig. 1b). The original hot spring pSL12 lineage (including the only available MAG for this lineage, DRTY-7 bin_36, assembled from a hot spring metagenome [6]) formed a distant sister clade to the marine pSL12-like group.

Metabolic potential distinct from typical marine Thaumarchaeota

Capacity for ammonia oxidation was not detected in either MAG, as we could not retrieve homologs of the AMO or nitrite reductase (nirK) genes. Moreover, the carbon-fixation pathway found uniquely in chemolithoautotrophic Thaumarchaeota, a modified version of the 3-hydroxypropionate/4-hydroxybutyrate (HP/HB) cycle [51], appeared to be missing in both genomes. The myriad multicopper oxidases characteristic of mesophilic AOA genomes [52] were also missing, although manual BLASTp searches did identify copper-binding proteins of the plastocyanin/azurin family in both genomes. These genes were located near cytochrome or ATP synthase genes, suggesting a role in electron transfer. Because the genomes are not closed, our failure to detect the ‘expected’ pathways/genes does not definitively indicate their absence. However, the overall genomic repertoire of the MAGs recovered here differed strikingly from that of typical AOA genomes (Fig. 2a), and these differences cannot be explained by genome incompleteness alone.

Fig. 2: Metabolic capabilities of pSL12-like clade distinct from typical AOA.

a Comparison of selected metabolic features across thaumarchaeal genomes. pSL12-like MAGs are highlighted in red. Caldiarchaeum subterraneum, belonging to the closely related candidate phylum Aigarchaeota, is also included for comparison. Gene abbreviations: AMO, ammonia monooxygenase; nirK, nitrite reductase; CA, carbonic anhydrase; pqq-adh, PQQ-dependent alcohol dehydrogenase; fixABC, electron transferring flavoprotein subunits A, B, and C. Taxa abbreviations: Caldi, Ca. Caldiarchaeum subterraneum; NO23, SCGC AAA007 O23; Nbrevis, Ca. Nitrosopelagicus brevis CN25; Ncatalina, Ca. Nitrosomarinus catalina SPOT01; Nexaq, Ca. Nitrosocosmicus exaquare; NAQ6f, Ca. Nitrosotenuis aquarius AQ6f; Nvien, Nitrososphaera viennensis; Ncavasc, Ca. Nitrosocaldus cavascurensis; Ndev, Ca. Nitrosotalea devanaterra; Nbavar, Ca. Nitrosotalea bavarica; BS4, Thaumarchaeota archaeon BS4 (MAG); DS1, Thaumarchaeota archaeon DS1 (MAG); and DRTY36, DRTY-7 bin_36 (MAG). b Phylogenetic tree of RuBisCO sequences computed in FastTree using a MAFFT alignment of amino acid sequences. The MAG-derived RuBisCO sequences are highlighted. Previously reported thaumarchaeal RuBisCO sequences are also highlighted. Forms I, II, and III exhibit carboxylation activity, whereas the form IV RuBisCO does not. c Subtree highlighting the relative placements of the MAG-derived RuBisCO sequences with respect to Form III-a sequences from methanogens. This subtree was extracted from the original maximum-likelihood tree presented in (b).

None of the six canonical carbon-fixation pathways was complete in the MAGs. These Thaumarchaeota might use the recently described reverse oxidative TCA (roTCA) cycle for CO2 fixation [53], since the genomes contained fumarate reductases and 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductases. In this pathway, a reversible citrate synthase cleaves citrate into acetyl-CoA and oxaloacetate, the reverse of its canonical reaction. Recently, metabolic reconstructions were used to predict the existence of the roTCA cycle in Aigarchaeota [6]. However, we are cautious about asserting roTCA CO2 fixation in pSL12-like Thaumarchaeota, since genomic inference alone is not sufficient evidence for this pathway (many of the enzymes are bifunctional and shared with the anabolic TCA cycle).

Metabolic reconstructions indicate aerobic heterotrophy

The presence of respiratory complexes and various organic carbon-assimilating metabolic pathways (e.g., fatty acid oxidation, sugar metabolism, amino acid degradation, and potential methylotrophy; Fig. 3) suggests a predominantly heterotrophic lifestyle for these Thaumarchaeota. No external inorganic electron donors were identified based on the genome annotations. In addition to the aerobic respiratory chain, both genomes contained electron transfer flavoprotein (fixABC) homologs. These proteins are involved in electron transfer to nitrogenase in diazotrophic bacteria [54]. Homologs of fixABC have previously been reported in non-diazotrophic archaea, including terrestrial AOA [55], yet their functional role in non-diazotrophs remains unclear. The fix operon has not been reported in marine AOA, but the deep marine AOA clade (the water column B (WCB) group, found predominantly at depths >200 m [56]) may also contain the fix genes (Fig. 2a; SCGC AAA007 O23 is a representative WCB genome). As discussed in a later section, the pSL12-like lineage appears to be particularly abundant deeper in the water column, resembling the distribution of the WCB lineage (also observed in a recent survey of Thaumarchaeota communities in Monterey Bay [17]). The presence of fixABC genes in these two clades might reflect their niche adaptation and warrants further investigation.

Fig. 3: Overview of metabolic potential based on metabolic reconstructions of the pSL12-like MAGs.

Red dashed arrows indicate unidentified genes. The TCA cycle is presented in the anabolic direction. For detailed gene information, see Dataset 1. PEP phosphoenolpyruvate, 2-PG 2-phosphoglycerate, 3-PG 3-phosphoglycerate, 1,3-BPG 1,3-bisphosphoglycerate, G3P glyceraldehyde 3-phosphate, DHAP dihydroxyacetone phosphate, F6P fructose 6-phosphate, F16P fructose 1,6-bisphosphate, G6P glucose 6-phosphate, M6P mannose 6-phosphate, E4P erythrose 4-phosphate, X5P xylulose 5-phosphate, S7P sedoheptulose 7-phosphate, R5P ribose 5-phosphate, Ru5P ribulose 5-phosphate, PRPP phosphoribosylpyrophosphate, AMP adenosine monophosphate, R15P ribose 1,5-bisphosphate, RuBP ribulose 1,5-bisphosphate.

Unlike typical AOA, our two MAGs encoded several pyrroloquinoline quinone (PQQ)-dependent dehydrogenases containing N-terminal signal peptides (indicating extracellular localization), which could directly contribute reducing equivalents to the respiratory chain via extracellular sugar and/or alcohol oxidation (Fig. 3). Specific proteins were identified in both MAGs as putative PQQ-dependent methanol, ethanol, and glucose dehydrogenases (Dataset 1). PQQ-dependent methanol and glucose dehydrogenases are known to oxidize diverse alcohols and hexoses/pentoses, respectively [57], suggesting some degree of metabolic versatility in these archaea. PQQ synthase proteins were also identified in both genomes (Dataset 1). Up to five quinoprotein dehydrogenases were colocalized on a single contig, along with amicyanin/plastocyanin-like small copper proteins and ATP synthase subunits (e.g., contigs ASW2bin45_2 and ASW8bin1_21; Dataset 1), suggesting their combined involvement in an electron transport chain coupled to energy conservation.
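Colocalization of the kind described above can be screened for by grouping annotated genes on the same contig into runs separated by no more than a chosen distance. The sketch below is illustrative only: the 10 kb gap, contig names, coordinates, and function name are hypothetical, and simple keyword matching stands in for curated annotation.

```python
from collections import defaultdict


def colocalized_clusters(genes, keyword, min_count=2, max_gap_bp=10_000):
    """Find runs of genes whose annotation contains `keyword` on the same contig.

    genes: iterable of (contig, start, end, annotation) tuples
    A run continues while the next matching gene starts within max_gap_bp of the
    previous matching gene's end. Returns (contig, [annotations]) for runs with
    at least min_count genes.
    """
    hits_by_contig = defaultdict(list)
    for contig, start, end, annotation in genes:
        if keyword.lower() in annotation.lower():
            hits_by_contig[contig].append((start, end, annotation))

    clusters = []
    for contig, hits in hits_by_contig.items():
        hits.sort()  # order by start coordinate
        run = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - run[-1][1] <= max_gap_bp:
                run.append(hit)
            else:
                if len(run) >= min_count:
                    clusters.append((contig, [h[2] for h in run]))
                run = [hit]
        if len(run) >= min_count:
            clusters.append((contig, [h[2] for h in run]))
    return clusters


# Hypothetical contig names and coordinates.
genes = [
    ("contig_2", 100, 1900, "PQQ-dependent methanol dehydrogenase"),
    ("contig_2", 2100, 3800, "PQQ-dependent glucose dehydrogenase"),
    ("contig_2", 4000, 4400, "plastocyanin-like copper protein"),
    ("contig_2", 90000, 91500, "PQQ-dependent alcohol dehydrogenase"),
    ("contig_7", 500, 2000, "PQQ-dependent dehydrogenase"),
]
result = colocalized_clusters(genes, "PQQ")
print(result)
```

Here only the two adjacent dehydrogenases on contig_2 form a cluster; the distant third copy and the lone hit on contig_7 do not.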

Formaldehyde resulting from methanol oxidation is cytotoxic and must therefore be promptly removed via dissimilatory or assimilatory pathways. In these archaea, formaldehyde oxidation to formate likely proceeds via the tetrahydromethanopterin (H4MPT) pathway, as the annotated genes included an F420-dependent methylene-tetrahydromethanopterin dehydrogenase (mtd) and a methylene-tetrahydromethanopterin cyclohydrolase/reductase (Dataset 1). Whether formaldehyde is oxidized all the way to CO2 is unclear from the annotations, since neither MAG encoded a formate dehydrogenase. Alternatively, formaldehyde may be assimilated via the tetrahydrofolate or serine pathway, although the annotations for neither pathway were complete.

Metabolic reconstructions suggest the use of diverse organic compounds as potential electron donors. In addition to the fatty acid oxidation pathway, multiple sugar transporters with homology to trehalose/maltose import proteins and arabinose permeases were annotated in the MAGs (Dataset 1). Both MAGs also encoded a halolysin-like protease; protein topology modeling predicted extracellular localization for this protease, consistent with a role in external protein degradation. The resulting peptides could then be imported as nutrients, a possibility supported by the peptide ABC transporter permease proteins and branched-chain amino acid transporters identified in both genomes.

Thaumarchaeal lineages previously identified as basal groups lacking the capacity to oxidize ammonia (all obtained from nonmarine environments) are reported to possess anaerobic energy generation pathways such as sulfate or nitrate reduction [5]. The MAGs assembled here contained no definitive evidence for anaerobic respiration, although we acknowledge this might be due to genome incompleteness. Moreover, many of the genomic features identified as unique/core features of the anaerobic basal thaumarchaeal lineages in a recent comparative meta-analysis [8] were also absent from these MAGs, namely pyruvate:ferredoxin oxidoreductase (porABDG), cytochrome bd-type terminal oxidase (cydA), and acetyl-CoA decarbonylase/synthase (codhAB). Thus, multiple lines of evidence point to these MAGs representing a divergent, basal lineage within the aerobic, mesophilic clade of Thaumarchaeota.

Metabolic hypothesis on a RuBisCO-mediated anaplerotic CO2 assimilation pathway

Unexpectedly, both MAGs harbored an archaeal form III RuBisCO gene (463 aa long; 96.76% amino acid identity between the two). Hypothesized to be the most ancient form of RuBisCO, form III is found predominantly in Archaea [58]. Recent metagenomic surveys have revealed that numerous members of the Candidate Phyla Radiation [59, 60] and DPANN archaea [43, 61] also encode a form III-like RuBisCO. A divergent variant, categorized as form III-a, is found in methanogenic archaea. Our MAG-derived sequences clustered with the methanogen form III-a RuBisCO sequences (Fig. 2b–c), albeit with only 30–35% amino acid identity.

Two separate studies have previously reported a form III RuBisCO in Thaumarchaeota, and in both cases the assembled genomes represented acidophilic terrestrial lineages: (i) Ca. Nitrosotalea bavarica SbT1 was assembled and binned from an acidic peatland metagenome [62], and (ii) the deeply branching BS4 and DS1 were assembled from acidic geothermal spring sediments in Yellowstone National Park [5]. RuBisCO sequences from these genomes clustered within the main archaeal form III clade (Fig. 2b) and were <30% identical at the amino acid level to the sequences obtained in this study.

Although form III RuBisCO exhibits carboxylase activity, genomic and biochemical evidence suggests that it is not involved in carbon fixation via the canonical Calvin–Benson–Bassham (CBB) cycle [63, 64]. In many archaea harboring RuBisCO, the phosphoribulokinase (PRK) required for regenerating the RuBisCO substrate (RuBP) is missing [63], suggesting the absence of a functional CBB cycle. Intriguingly, methanogenic archaea harboring form III-a RuBisCO encode a PRK, yet lack other key components of the CBB cycle [65]. RuBisCO in these methanogens is therefore thought to be involved in carbon assimilation via the reductive hexulose-phosphate (RHP) pathway [65]. As demonstrated in Methanospirillum hungatei, RuBP regeneration in the RHP pathway involves PRK activity together with the formaldehyde-assimilating ribulose monophosphate (RuMP) pathway operating in reverse [65].

The second proposed route for form III RuBisCO-mediated carbon metabolism involves nucleoside assimilation/degradation via the archaeal AMP pathway [63, 64]. Briefly, adenosine monophosphate (AMP, generated by phosphorylation of nucleosides) is converted by AMP phosphorylase into adenine and ribose 1,5-bisphosphate (R15P). R15P is then isomerized to ribulose 1,5-bisphosphate (RuBP) by ribose 1,5-bisphosphate isomerase (R15Pi). In an irreversible reaction, RuBisCO combines RuBP with CO2 and H2O to yield two molecules of 3-phosphoglycerate (3-PG), which then enter central carbon metabolism (via glycolysis or gluconeogenesis). Sato et al. [63] proposed that the reductive pentose phosphate pathway, if present, could close this series of transformations into a cycle, effectively rendering it a carbon-fixation pathway.
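The three steps of the AMP pathway described above can be summarized as a reaction scheme. Protons are omitted, so the equations are balanced for carbon but not for charge; the carbon count (five in RuBP plus one from CO2, ending as two three-carbon 3-PG) shows where the assimilated CO2 ends up.

```latex
\begin{align*}
\mathrm{AMP} + \mathrm{P_i} &\xrightarrow{\text{AMP phosphorylase}} \mathrm{adenine} + \mathrm{R15P} \\
\mathrm{R15P} &\xrightarrow{\text{R15Pi}} \mathrm{RuBP} \\
\mathrm{RuBP}\ (\mathrm{C_5}) + \mathrm{CO_2}\ (\mathrm{C_1}) + \mathrm{H_2O} &\xrightarrow{\text{RuBisCO}} 2\ \text{3-PG}\ (2 \times \mathrm{C_3})
\end{align*}
```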

Homology comparisons revealed that key active-site residues for carboxylation are conserved in our MAG-derived RuBisCO sequences (Fig. S3). However, there is little evidence for the methanogenic RHP CO2-fixation pathway: in addition to the missing PRK, many key enzymes of the methanogenic RHP and RuMP pathways could not be identified. Metabolic inferences best support an anaplerotic role for the carboxylation reaction, via the AMP pathway for nucleotide salvage. A key difference from the archaeal AMP pathway, however, is the presence of a complete non-oxidative pentose phosphate pathway (nPPP) and gluconeogenesis in the pSL12-like lineage. The nPPP operating in reverse to generate R5P from gluconeogenesis intermediates, combined with RuBP regeneration from PRPP and/or AMP, might constitute a cyclic CO2-fixation pathway ([63, 66]; Fig. 3). Several genes encoding key enzymes of the proposed pathway appeared to be colocalized on the same assembled contigs in both MAGs (Fig. S4), suggesting potential co-expression. Nevertheless, this pathway likely has an anaplerotic function, potentially regulated by intracellular levels of AMP and/or PRPP. We emphasize that the proposed pathway is inferred purely from bioinformatic analyses and may be affected by the incompleteness of the genomes.
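Checking active-site conservation typically means mapping reference-numbered residue positions onto alignment columns and reading off the query residue at each. The sketch below shows that mapping on a toy alignment; the sequences and site positions are hypothetical and do not correspond to real RuBisCO numbering or to the residues examined in Fig. S3.

```python
def ref_position_to_column(aligned_ref):
    """Map 1-based ungapped positions in the reference to 0-based alignment columns."""
    mapping, pos = {}, 0
    for col, residue in enumerate(aligned_ref):
        if residue != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def check_sites(aligned_ref, aligned_query, sites):
    """Compare query residues at reference-numbered sites.

    sites: dict of reference position -> expected residue
    Returns dict of position -> (expected, observed, conserved?).
    """
    columns = ref_position_to_column(aligned_ref)
    report = {}
    for pos, expected in sites.items():
        observed = aligned_query[columns[pos]]
        report[pos] = (expected, observed, observed == expected)
    return report


# Toy alignment and hypothetical site positions (not real RuBisCO numbering).
report = check_sites("MKT-LGEK", "MKSALGDK", {2: "K", 5: "G", 6: "E"})
print(report)  # {2: ('K', 'K', True), 5: ('G', 'G', True), 6: ('E', 'D', False)}
```

Mapping through the reference's ungapped numbering is what lets conventional residue numbers be used even when the alignment contains insertions.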

A gamma-class carbonic anhydrase (CA), which catalyzes the interconversion of CO2 and HCO3−, was present in both MAGs. CA homologs have been identified in several terrestrial AOA and are hypothesized to function extracellularly to facilitate CO2 uptake for carbon fixation [52]. However, marine lineages do not harbor CA genes (Fig. 2a). Unlike the CAs from terrestrial AOA, the pSL12-like CAs did not contain signal peptide sequences and are therefore likely involved in the intracellular reversible dehydration of HCO3− to CO2. While CA is not exclusively indicative of carbon fixation, its activity may facilitate CO2 incorporation by RuBisCO and/or phosphoenolpyruvate carboxykinase in the pSL12-like Thaumarchaeota.

Distribution of the pSL12-like lineage in the water column

To assess the environmental distribution of the pSL12-like lineage, we matched the MAG-derived 16S rRNA sequences to a previously generated 16S rRNA amplicon dataset from the Monterey Bay upwelling system [46]. One of the MAG-derived 16S rRNA gene sequences (from ASW8_bin1) was an exact match to OTU #694, which comprised <0.5% of total thaumarchaeal abundance at all time points and depths sampled. The next closest match was OTU #8597, which shared 96.02% and 97.08% sequence identity with the sequences from ASW2_bin45 and ASW8_bin1, respectively. At any given time, these two OTUs together comprised at most 0.5% of thaumarchaeal abundance in the time-series dataset (Fig. 4a). As in previous surveys, the pSL12-like group of Thaumarchaeota appeared to be more abundant below the euphotic zone [11, 13, 15, 16, 17], with potential seasonal variation in relative abundance. Occasional abundance peaks were observed in the photic zone during spring at M1 (Fig. 4a), likely reflecting upwelled populations (station M1 is situated directly above the upwelling plume in Monterey Bay).

Fig. 4: Distribution of pSL12-like lineage in oceanic water columns.

a Relative abundances (as a percentage of total thaumarchaeal abundance) of OTUs ≥ 96% identical to the 16S rRNA gene sequences retrieved from the MAGs. The two major panels correspond to two oceanographic sampling stations, M1 and M2, in Monterey Bay. Each subpanel represents a depth profile between 5 and 500 m. b Read recruitments of each MAG against Monterey Bay metagenomes. Size of the circle corresponds to normalized abundance. c, d Metagenome read recruitments against Atlantic Ocean and Pacific Ocean depth profiles, respectively, from the TARA Oceans dataset. Relative abundances are presented as number of reads mapped per kilobase of genome per gigabase of metagenome (RPKG). Metagenome sample accessions are provided in Table S1.

In recruiting metagenomic reads from Monterey Bay against the MAGs, we observed the highest recruitment at 500 m for ASW2_bin45. ASW8_bin1 recruited slightly fewer reads but showed a depth distribution similar to that of ASW2_bin45 (Fig. 4b). Relative abundances also appeared to vary with seasonal hydrographic changes in the system (Fig. 4b). Recruitment against TARA Oceans metagenomes representing Atlantic and Pacific Ocean depth profiles revealed a similar depth distribution of the pSL12-like lineage, with the greatest abundance at depths well below the euphotic zone (200–800 m; Fig. 4c, d).

Conclusions

In this work, we used reconstructed population genomes to infer the metabolic adaptations of the elusive pSL12-like lineage of Thaumarchaeota, which is widely distributed in marine systems. The high-quality genomes described here offer a first glimpse into the genomic repertoire of a marine thaumarchaeal group that lacks the chemolithoautotrophic energy metabolism typical of AOA. The basal lineages of Thaumarchaeota described thus far have all come from terrestrial systems (reviewed in [8]); the MAGs presented here provide the first genomic description of a basal lineage inhabiting the oxic marine environment, where it coexists with aerobic ammonia-oxidizing Thaumarchaeota. In this context, an especially intriguing question is the position of the pSL12-like clade within the evolutionary trajectory of Thaumarchaeota. The diversification of Thaumarchaeota from basal groups to the mesophilic AOA appears to have involved multiple metabolic changes, including acquisition of the 3-HP/4-HB pathway for CO2 fixation and of ammonia oxidation, as well as potential differences in cofactor use (Fig. 2a; also reviewed in [8]). These MAGs may thus enable more detailed probing of the trajectory leading from basal groups to marine AOA, and help constrain the relative timing of the acquisition of aerobic metabolism and ammonia oxidation within the phylum.

Overall, the divergent genomic features of the pSL12-like clade significantly alter our understanding of the metabolic diversity within this abundant archaeal phylum in the oceans. While further biochemical characterization is warranted to confirm the proposed metabolic pathways, our results suggest that obligate aerobic heterotrophy might be an overlooked metabolic strategy within pelagic Thaumarchaeota.