Introduction

The terrestrial subsurface is a major reservoir of phylogenetically and metabolically diverse bacteria and archaea that remain largely uncultivated. High-throughput culture-independent sequencing approaches in recent years have offered a window into the genetic diversity and metabolic novelty of subsurface microbes e.g., [1,2,3]. However, much remains unknown about the ecological adaptations of subsurface communities and their biogeochemical impacts. Among the most abundant members of the subsurface community are archaea of the phylum Thaumarchaeota [2,3,4,5,6]. Ubiquitous across terrestrial and marine systems, Thaumarchaeota are key members of the global biogeochemical nitrogen and carbon cycles. While the ubiquity of these organisms in shallow subsurface soils (<100 cm) is increasingly being documented e.g., [2, 3, 5], a systematic inquiry into their ecophysiological adaptations vertically along the soil profile is thus far lacking.

The most well-characterized members of the phylum Thaumarchaeota are the ammonia-oxidizing archaea (AOA), a monophyletic group of predominantly mesophilic organisms that mediate the first and rate-limiting step of nitrification (the aerobic oxidation of ammonia to nitrate via nitrite). The ecology of AOA in natural systems has been studied extensively e.g., [7,8,9,10,11], primarily using the molecular marker gene amoA, which encodes the α-subunit of the key metabolic enzyme ammonia monooxygenase. Deeply rooted lineages of Thaumarchaeota described to date, however, are not ammonia-oxidizers [12,13,14,15,16,17]. With the exception of the recently described marine heterotrophic clade [16, 17], basal lineages of Thaumarchaeota lack the ability to respire oxygen [18]. The lack of cultivated non-AOA representatives [15] limits explorations into the ecophysiological adaptations of these lineages.

Mesophilic AOA primarily constitute three major lineages: (i) Nitrososphaerales, comprised mostly of soil AOA [19]; (ii) Nitrosopumilales, predominantly found in marine systems [20]; and (iii) Candidatus Nitrosotaleales, an acidophilic lineage typically found in soils [21]. Oxygen availability and soil pH are hypothesized to have played key roles in the habitat-associated diversification of AOA over evolutionary timescales [18, 22]. In terrestrial systems, AOA population structure has been linked to soil pH [22, 23], moisture levels [24], ammonium availability [25], and temperature [26]. Notably, however, most studies assessing AOA community structure in terrestrial environments have focused on topsoil e.g., [23, 27, 28]. The few studies that examined subsurface AOA have uncovered significant changes in AOA community structure with soil depth [29, 30], which are not captured in topsoil studies.

A recent analysis of amoA gene abundance and diversity along a floodplain sediment profile in the Wind River Basin near Riverton, Wyoming revealed AOA as the predominant ammonia-oxidizers in these sediments, as well as regionally across four other floodplain sites spanning a 900-km north-south transect in the intermountain western United States [5]. 16S rRNA-based community profiling at a nearby site in Riverton also reported the dominance of AOA over bacterial ammonia oxidizers in the subsurface, and their community structure appeared to shift with changing soil horizons with depth [6]. Curiously, no relationship was found between AOA community structure and any of the physicochemical parameters measured, suggesting that depth-associated changes in lithology were the primary determinant of shifts in phylogenetic structure [6]. Phylogenetic analysis of the amoA sequences from Riverton sediments also pointed to a notable shift in AOA community structure with sediment depth, which appeared to be linked to the moisture content of the sediments [5]. To examine the ecophysiological adaptations of these AOA, we obtained metagenome-assembled genomes (MAGs) from subsurface sediment samples collected along a ~2 m depth profile at site KB1 near Riverton, WY. The assembled genomes include diverse AOA spanning the mesophilic order-level lineages Nitrososphaerales and Nitrosopumilales, as well as a basal lineage of Thaumarchaeota that harbors unique, previously unknown metabolic adaptations.

Materials and methods

Sample collection, geochemical analysis, and DNA extraction

Details of field sampling and general site characteristics are presented in Cardarelli et al. [5]. The sampling site at Riverton, WY is located on an alluvial terrace within the Wind River Basin. A 234 cm deep soil pit was carved into (first ~1 m) and dug out below (bottom ~1.5 m) the terrace wall to extract soil samples from the topsoil to the groundwater aquifer at a location (KB1) close to the former river bank (Latitude: 42° 59.322804, Longitude: −108° 23.977843) in August 2015. The water table was located at 235 cm below ground surface (BGS), and the transiently flooded capillary fringe was estimated to be located between 155 and 235 cm BGS. The sediment core consisted of dry, free draining soil (i.e., at field capacity) up to ~80 cm BGS. Sediment samples for molecular analyses were collected at discrete depths at ~10–20 cm intervals (depending on soil horizonation) along the length of the core, flash-frozen in liquid nitrogen, and stored at −80 °C until nucleic acid extraction. Geochemical characterization was performed on freeze-dried core samples, finely ground and homogenized with a mortar and pestle. For total carbon and nitrogen, 30–60 mg of sample was weighed into tin capsules and analyzed in duplicate on a Carlo Erba NA1500 elemental analyzer. Remaining samples were analyzed using energy-dispersive X-ray fluorescence spectroscopy (XEPOS, SPECTRO Analytical, Kleve, Germany) to measure elemental composition. Key geochemical variables measured include total carbon, nitrogen and sulfur content (Fig. S1).

Approximately 0.3 g of sediment from each sample was used for DNA extraction. To lyse cells, samples were subjected to mechanical agitation in a FastPrep bead beater (MP Biomedicals, Santa Ana, CA) for 2 cycles of 30 s at setting 5.5. Following this, DNA was extracted using the PowerSoil DNA Extraction Kit (MoBio, Carlsbad, CA) following the manufacturer’s instructions.

Metagenome sequencing, assembly, and genome reconstruction

Metagenome sequence libraries were constructed and sequenced 2 × 151 bp using the NovaSeq platform (Illumina) at the DOE Joint Genome Institute. Reads were quality trimmed using BBDuk (v38.24; ref. [31]): reads with 4 or more “N” bases were removed, and those with an average quality score >3 and minimum length ≥ 51 bp were retained. Read-correction was performed using BFC (v.r181; ref. [32]). Reads without a mate pair were removed. Quality-filtered reads from each library were assembled individually using MEGAHIT (v1.1.3; ref. [33, 34]), using a range of k-mers (k = 21, 33, 55, 77, 99, 127). Contigs longer than 2000 bp were binned using MetaBAT2 (v2.12.1; ref. [35]) and MaxBin2 (v2.2.6; ref. [36, 37]). Resulting bins were refined using the bin refinement module in metaWRAP (v1.2.2; ref. [38]) and re-assembled using metaSPAdes (v3.13.0; ref. [39]). Short contigs (<2000 bp) introduced during re-assembly were removed. CheckM (v1.0.12; ref. [40]) was used to assess bin completeness and redundancy. Taxonomic classifications were obtained using the Genome Taxonomy Database Toolkit (GTDB-Tk; ref. [41]), with genomes classified against GTDB Release 05-RS95 [42, 43]. Of the 27 MAGs, 13 representing species-level clusters on a ribosomal protein phylogenomic tree were selected for read recruitment in order to estimate relative abundances of the various phylogenetic clades across the soil/sediment profile. Bowtie2 [44] was used for the read recruitment analysis, with the flags “--sensitive --no-unal”. Reads were recruited against each MAG individually, as well as competitively by using a combined index built from all 13 MAGs. Mapped reads were normalized to the size of the MAG (kb) and the metagenome size (Gb), rendering values in units of reads per kilobase of genome per gigabase of metagenome.
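The normalization described in the last step can be sketched as follows. This is a minimal illustration rather than the script used in the study; the function name `rpkg`, the choice of base pairs as input units, and the example numbers are all hypothetical.

```python
def rpkg(mapped_reads: int, genome_length_bp: int, metagenome_size_bp: int) -> float:
    """Reads per kilobase of genome per gigabase of metagenome (RPKG)."""
    if genome_length_bp <= 0 or metagenome_size_bp <= 0:
        raise ValueError("genome and metagenome sizes must be positive")
    genome_kb = genome_length_bp / 1e3        # MAG size in kilobases
    metagenome_gb = metagenome_size_bp / 1e9  # metagenome size in gigabases
    return mapped_reads / genome_kb / metagenome_gb


# Hypothetical example: 30,000 reads mapped to a 1.5 Mb MAG
# from a metagenome containing 20 Gb of sequence.
example = rpkg(30_000, 1_500_000, 20_000_000_000)
```

Dividing by both genome size and sequencing depth makes values comparable across MAGs of different lengths and across metagenomes of different sizes.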

Functional annotations

Prodigal (v2.6.3; ref. [45]) was used to predict protein-coding genes, and initial functional annotations were obtained using Prokka (v1.12; ref. [46]). KO annotations were obtained using GhostKOALA (v2.2; ref. [47]), KAAS (v2.1; ref. [48]) and eggNOG-mapper (v2; ref. [49, 50]). SEED annotations were obtained from the online Rapid Annotation using Subsystem Technology server [51], and annotations for genes of interest were confirmed by BLASTP [52] searches against the NCBI non-redundant protein database. TransportDB (v2.0; ref. [53]) was used to predict membrane transporters. The SignalP-5.0 server was used for signal peptide prediction (http://www.cbs.dtu.dk/services/SignalP-5.0/; ref. [54, 55]), and transmembrane domains were identified using TMHMM-2.0 (ref. [56, 57]).

Phylogenetic analyses

Phylogenomic analysis was carried out using a concatenated alignment of ribosomal proteins, retrieved from the MAGs and thaumarchaeal reference genomes using the phylogenomics module in Anvi’o 5 [58]. The following proteins were included in the analysis: ribosomal protein L1, L13, L14, L15e, L16, L21e, L22, L23, L26, L29, L3, L31e, L32e, L37ae, L39, L4, L44, L5e, L6, S12/S23, S11, S13, S15, S17, S17e, S19, S19e, S2, S24e, S27e, S28e, S3Ae, S7, S8, S8e, and S9. A concatenated alignment of the protein sequences was generated using MUSCLE [59]. Alignment trimming was conducted using trimAl (-gt 0.80 -resoverlap 0.55 -seqoverlap 55; ref. [60]). The trimmed alignment was used for phylogenomic tree inference using IQ-TREE [61] with 1000 bootstrap replicates [62]. ModelFinder [63] in IQ-TREE chose LG + F + R6 as the best substitution model.
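To illustrate what a gap threshold of 0.80 does to an alignment, the sketch below keeps only columns in which at least 80% of sequences carry a residue rather than a gap. It mimics the column filter only and is not trimAl's implementation; the residue- and sequence-overlap filters are not modelled, and the function name `trim_columns` and the toy alignment are hypothetical.

```python
def trim_columns(alignment: dict[str, str], gap_threshold: float = 0.80) -> dict[str, str]:
    """Keep columns where the fraction of non-gap sequences is >= gap_threshold."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns that pass the gap filter
    keep = [
        i for i in range(length)
        if sum(seq[i] != "-" for seq in seqs) / len(seqs) >= gap_threshold
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in zip(names, seqs)}


# Toy alignment: the third column is a gap in 4 of 5 sequences and is dropped;
# the second column is a gap in only 1 of 5 sequences and is kept.
toy = {"a": "MK-LV", "b": "MKALV", "c": "M--LV", "d": "MK-LV", "e": "MK-LV"}
trimmed = trim_columns(toy)
```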

Specific functional gene sequences were identified via BLASTP searches [52], and single-protein phylogenies were computed using FastTree [64] with 100 bootstrap replicates each, based on Clustal Omega [65] alignments of protein sequences (unless otherwise specified in the figure legends). Trees were visualized in FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and edited in Adobe Illustrator to add annotations and highlight clusters.

Pangenomic analysis of basal Thaumarchaeota

Reference genomes of Thaumarchaeota lineages basal to AOA were downloaded from NCBI and the Integrated Microbial Genomes and Microbiomes (IMG/M; ref. [66]) databases. Defined family-level clusters in the GTDB release 06-RS202 were used as guides for identifying basal lineages and associated MAGs. When multiple genomes formed a species cluster, the highest quality genome was included in the reference set. Pangenome analysis was conducted in Anvi’o 6.2 [58], using a genome storage database containing HMM and NCBI COG annotations. The pangenome was summarized using ‘anvi-summarize’ and features of interest were manually selected to plot a presence/absence diagram alongside a phylogenomic tree of the basal MAGs computed as described earlier.
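Picking one representative per species cluster could look like the sketch below. The ranking rule is an assumption: the text does not define "highest quality" for reference genomes, so the example ranks by completeness first and breaks ties by lower redundancy. The `Genome` type, the function name, and all genome names and values are hypothetical.

```python
from typing import NamedTuple


class Genome(NamedTuple):
    name: str
    cluster: str         # species-level cluster label
    completeness: float  # percent
    redundancy: float    # percent


def best_per_cluster(genomes: list[Genome]) -> dict[str, Genome]:
    """Return the top-ranked genome for each species cluster."""
    best: dict[str, Genome] = {}
    for genome in genomes:
        current = best.get(genome.cluster)
        # Assumed ranking: higher completeness wins, then lower redundancy
        if current is None or (
            (genome.completeness, -genome.redundancy)
            > (current.completeness, -current.redundancy)
        ):
            best[genome.cluster] = genome
    return best


# Hypothetical reference genomes in two species clusters
genomes = [
    Genome("ref_A1", "sp1", 92.0, 3.5),
    Genome("ref_A2", "sp1", 96.1, 1.0),
    Genome("ref_B1", "sp2", 88.4, 0.9),
]
selected = best_per_cluster(genomes)
```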

Results and discussion

Diverse Thaumarchaeota populations along the sediment depth profile

Thaumarchaeota ranked among the top 15 phyla in the KB1 metagenomes across all depths, and ranked 7th at 155 and 175 cm depths (Fig. S2). Within the topsoil layers (38, 57, 67, 86 cm depths), 88–91% of the thaumarchaeal reads were classified as Nitrososphaerales. In contrast, Nitrosopumilales accounted for 76–86% of the thaumarchaeal reads in deeper sediments below 100 cm (Fig. S2). Assembly and binning of metagenomic data from the KB1 metagenomes yielded 27 medium- to high-quality genomes of Thaumarchaeota (Table 1). Note that recent versions of the GTDB Toolkit [41] classify Thaumarchaeota as the monophyletic class-level lineage Nitrososphaeria within the phylum Thermoproteota. To minimize confusion regarding name changes, and since the conventional Thaumarchaeota taxa definitions have not yet been formally redefined to accommodate the revised taxonomic framework in the GTDB, we continue using the conventional taxa names to refer to thaumarchaeal clades in this manuscript. Where appropriate, we point out the corresponding GTDB taxa names.

Table 1 Metagenome-assembled genome (MAG) statistics.

On the ribosomal protein phylogenomic tree, 25 of the thaumarchaeal MAGs clustered with previously published AOA genomes (Fig. 1). The remaining two MAGs, D197_2 and D197_116, clustered as a basal non-AOA lineage, forming a sister group to the recently described marine heterotrophic Thaumarchaeota ([16, 17]; Fig. 1). Clustering patterns in the phylogenomic tree (Fig. 1) were largely congruent with the single-gene phylogenies inferred based on the MAG-derived amoA and 16S rRNA genes (Fig. S3). Surface soil metagenomes mostly yielded AOA MAGs classifying within the order Nitrososphaerales, while MAGs assembled from the moist sediment metagenomes deeper in the profile classified within the typical marine order Nitrosopumilales (Fig. 1). This shift in phylogenetic structure along the profile clearly reflected the depth-related shift in AOA population structure previously observed in alluvial sediments, based on the amoA gene diversity [5].

Fig. 1: Maximum-likelihood phylogenomic tree inferred using a concatenated alignment of select ribosomal proteins.

See Methods for details on tree inference. Genus names listed next to each cluster (prefixed by “g__”) correspond to the classification obtained via the GTDB toolkit. In square brackets next to the genus names are the corresponding conventionally used thaumarchaeal clade names (family names prefixed by “f__”). For MAGs used as reference genomes, the habitat from which the corresponding metagenomes originated is indicated in italics. The naming convention used for the MAGs assembled in this study is as follows: “D[Depth cm]_[MAG #]”. For example, “D197_2” is the MAG #2 obtained from the 197 cm metagenome.
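For readers working with the MAG names programmatically, the naming convention can be parsed as in this minimal sketch. The regular expression and the function name `parse_mag_name` are illustrative assumptions, not part of the study's pipeline.

```python
import re

# "D[Depth cm]_[MAG #]", e.g. "D197_2"
MAG_NAME = re.compile(r"^D(?P<depth_cm>\d+)_(?P<mag_number>\d+)$")


def parse_mag_name(name: str) -> tuple[int, int]:
    """Split a MAG name into (sampling depth in cm, MAG number)."""
    match = MAG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a D[depth]_[MAG #] name: {name!r}")
    return int(match["depth_cm"]), int(match["mag_number"])


depth_cm, mag_number = parse_mag_name("D197_2")
```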

Metagenomes obtained from the top 4 depths of the sediment core (38, 57, 67, and 86 cm) yielded 12 MAGs classified as Nitrososphaerales (GTDB family Nitrososphaeraceae). On the ribosomal protein phylogeny (Fig. 1), these MAGs were placed within three distinct genus-level clusters: (i) the typical terrestrial genus Nitrososphaera (n = 4); (ii) a Nitrososphaera-sister group (n = 4), currently represented by genomic data alone (family NS-β as described in ref. [67]; GTDB genus UBA10452); and (iii) a sister-lineage to Nitrosocosmicus (n = 4) within the NS-ε lineage [67], which could not be assigned to any of the delineated AOA genera in GTDB (Table 1). The latter two groups included several MAGs from the recently published AOA MAG collection from sediments along the River Thames, UK ([68]; Fig. 1), suggesting a potentially wide distribution of these under-sampled AOA clusters in terrestrial subsurface and/or alluvial environments.

In contrast to the shallow soil layers, MAGs obtained from depths below 100 cm (111, 125, 155, 175, 185, 197, and 214 cm) were all classified as Nitrosopumilales (GTDB family Nitrosopumilaceae), spanning the genera Nitrosotenuis, Nitrosopumilus, Nitrosarchaeum and the uncultured CSP1-1 lineage (Table 1). CSP1-1, an AOA MAG assembled from alluvial sediments in Rifle, Colorado [2], was the phylogenetically closest database representative to 7 out of the 13 Nitrosopumilales MAGs assembled here (Table 1, Fig. 1). In the Rifle metagenomes, CSP1-1 recruited 0.7% of all sequence reads and represented the most abundant member of the sediment community [2]. We observed relatively high abundances for the Riverton MAGs clustering with CSP1-1, as up to 0.36% of metagenomic reads were mapped to the closest representative genome D185_135, whereas the remaining AOA MAGs recruited up to 0.26% of the reads across all metagenomes (Fig. 2, Table S1 in Data Set 1). Read recruitment profiles suggested a particularly high abundance of CSP1-1 AOA within the capillary fringe (i.e., >155 cm below ground surface; Fig. 2). The CSP1-1 lineage may thus represent a pervasive and dominant AOA group within subsurface alluvial sediments.

Fig. 2: Differential abundances of Thaumarchaeota lineages along the hydrological gradient.

a Schematic representation of the sediment hydrology at KB1, along with measured values of total carbon (%) and carbon to nitrogen (C:N) ratios. The water table was located at 235 cm below ground surface, and the sediments were completely dry (at field capacity) up to ~ 80 cm. Total carbon and nitrogen were determined to be zero at 175 cm (the blue and red dots are overlapping). At 185 cm, total carbon was determined to be 0.05 % while total nitrogen was again zero, hence the missing blue dot. b Results of read recruitment analysis. The highest quality MAG (i.e., highest percent completion, lowest redundancy) in each major phylogenetic cluster as presented in Fig. 1 was chosen for read recruitment analysis. Panels correspond to individual MAGs to which metagenomic reads were mapped. The panel shading indicates the AOA order level lineage (orange: Nitrososphaerales and blue: Nitrosopumilales); and bar colors indicate depth layers as indicated in a. Abundances are expressed as the number of reads mapped per kilobase of genome per gigabase of metagenome (RPKG). Genus names obtained via the GTDB Toolkit are noted for each MAG.

Nitrosarchaeum [68] was identified as an abundant AOA genus at depths below 100 cm at Riverton, based on amoA [5] and relative 16S rRNA gene abundances [6]. While we assembled two Nitrosarchaeum MAGs from the 185 and 197 cm metagenomes, the genome quality estimates were relatively lower for these compared to the other AOA MAGs (Table 1). Their abundance profile suggests some degree of habitat overlap with CSP1-1 AOA, as both lineages appeared to be numerically abundant at sediment depths experiencing transient water intrusion from the water table below (Fig. 2). However, unlike CSP1-1, the Nitrosarchaeum MAG recruited notably fewer reads from metagenomes from 125 cm and above, suggesting different environmental controls on their distributions (Fig. 2).

The sole Nitrosopumilus MAG assembled in our dataset was found to be abundant only at 214 cm (Fig. 2). This MAG recruited 0.035% of the reads from 214 cm, and was the only AOA MAG recruiting >0.01% of reads from this depth. All three MAGs classified as Nitrosarchaeum or Nitrosopumilus clustered within a clade of representative AOA isolated or enriched from marine, estuarine or rhizosphere sediments (Fig. 1). The Nitrosopumilus MAG D214_93 shared 81.7% average nucleotide identity (ANI) with Ca. Nitrosopumilus sediminus AR2 enriched from Arctic sediments [69]. Similarly, the two Nitrosarchaeum MAGs (D185_51 and D197_10) were most closely related to Ca. Nitrosarchaeum koreense MY1 isolated from rhizosphere sediments [70], sharing 87.2% and 87.6% ANI with MY1, respectively.

Dominant AOA groups in mesophilic terrestrial systems typically affiliate within the genera Nitrososphaera, Nitrosocosmicus, and Nitrosotalea e.g., [71,72,73]. While the majority of Nitrosopumilus AOA have been identified in marine systems [74,75,76,77], they are also found in soil/sedimentary environments, particularly in aquifer-associated sediment layers [2, 3, 5, 6, 30, 78, 79]. Groundwater environments, in particular, have been found to host high abundances of Nitrosarchaeum and Nitrosopumilus AOA [29, 30]. These observations suggest that soil moisture may be an important control not only on the relative abundances of AOA in soils as described before [24], but also on their population structure. Increasing energy limitation with depth, as evidenced by decreasing amounts of total carbon and relatively higher carbon to nitrogen ratios (C:N) within the moist sediment layers (111–155 cm BGS; Fig. 2A), might also explain the AOA taxonomic shift observed in the KB1 profile from the dominance of relatively generalist (i.e., greater number of metabolic adaptations as explained below) lineages in surface soils to more oligotrophic groups (i.e., with more streamlined genomes similar to marine AOA) in deeper sediments (Fig. 2B). Such a shift in phylogenetic structure with depth is likely pervasive in alluvial sediments, as similar patterns were recovered across multiple floodplain sites in the western United States, based on amoA gene diversity [5]. The depth-differentiation was also evident in a non-metric multidimensional scaling (nMDS) analysis of the MAG relative abundances (Fig. S4). The following taxa were identified as significantly correlating to the sample clustering in the nMDS space: Nitrosarchaeum, Nitrosopumilus, Nitrososphaera, unclassified genus of the NS-ε family, UBA10452, and CSP1-1. Nitrosopumilales (including CSP1-1) were particularly dominant within the capillary fringe (>155 cm), where both total carbon and nitrogen were generally lower (Fig. 2B). The elevated concentrations of sulfur below 111 cm (Fig. S1) reflect the influence of sulfate-rich groundwater plumes at the Riverton site [6], where sulfur has also been shown to be strongly correlated to salinity [5].

Notable functional features of AOA MAGs

While many of the AOA MAGs lacked relatives among the cultured reference AOA (Fig. 1), they often clustered with the MAGs described in Sheridan et al. [80]. The core genomic features of the Nitrososphaera MAGs appeared consistent with previously described AOA genomes. Genomic potential for ammonia oxidation was confirmed in all AOA clusters (determined by the presence of at least one of the three ammonia monooxygenase gene subunits; Figs. 3, S2). 4-hydroxybutyryl-CoA dehydratase, the key enzyme of the AOA-specific CO2 fixation pathway [81], is also encoded by all AOA MAGs in this study. Additionally, many of the AOA lineages found in Riverton sediments may be capable of utilizing nitrogenous organic compounds to supplement energy generation (Fig. 3). For instance, the capacity for urea hydrolysis appears to be pervasive across the subsurface lineages, as most MAGs (22 out of the 25 AOA) harbor urease subunits (Fig. 3). Many terrestrial and marine AOA are hypothesized to supplement their energy metabolism via urea hydrolysis to ammonia [69, 75, 77, 82,83,84,85]. Near-stoichiometric growth on urea has been observed for AOA strains in the mesophilic genera Nitrosopumilus [75, 77] and Nitrososphaera [86], as well as in the thermophilic genus Nitrosocaldus [85]. Notably, urease genes were not detected in the Nitrosopumilus and Nitrosarchaeum MAGs recovered here (Fig. 3).

Fig. 3: Metabolic features across MAG phylogenetic clusters.

The phylogeny on the left was inferred using the maximum-likelihood method in IQ-TREE, based on a concatenated alignment of ribosomal marker genes (see “Methods”). Highlighted next to each MAG name are estimates of genome completeness (green) and redundancy (red), which are also presented in Table 1. The colored bars next to each cluster indicate the genus-level classification as obtained via the GTDB toolkit. Bootstrap support values for 1000 replicates are indicated at each node. The tree was rooted with the Ca. Caldiarchaeum subterraneum genome. amoABC ammonia monooxygenase subunits A, B and C; nirK nitrite reductase; MCO multicopper oxidase; NHase nitrile hydratase; Hyd_4 group 4 [NiFe] hydrogenase; Hyd_3b group 3b [NiFe] hydrogenase; 3HP_HB 3-hydroxypropionate/4-hydroxybutyrate cycle (using the hcd gene that codes for hydroxybutyryl-CoA dehydratase as the pathway marker); CODH_ACL carbon monoxide dehydrogenase/acetyl-CoA synthase; porABCD pyruvate:ferredoxin oxidoreductase subunits A, B, C and D; PEPCase phosphoenolpyruvate carboxylase; SOD superoxide dismutase; feoAB ferrous ion transporter; fbpABC ferric ion transporter; W-AOR tungsten-containing aldehyde:ferredoxin oxidoreductase.

Additional reduced nitrogen compounds contributing to AOA metabolism in the Riverton sediments may include cyanate and nitriles. Indeed, several of the Nitrososphaera MAGs contain homologs of cyanate and nitrile hydratases (Fig. 3). Cyanate hydratase (cyanase) catalyzes the formation of NH3 and CO2 from cyanate and bicarbonate [87], and can enable growth on cyanate as a nitrogen source as demonstrated for the soil thaumarchaeon Ca. Nitrososphaera gargensis [88]. Moreover, isotope incorporation experiments suggest that marine AOA may assimilate cyanate-derived nitrogen, despite missing the cyanase homolog in their genomes [89]. All three cyanase-encoding MAGs from the Riverton metagenomes clustered within the genus Nitrososphaera (Fig. 3), corroborating the limited phylogenetic distribution of this gene among AOA. Nitriles are another group of N-containing organic compounds that could potentially serve as carbon and nitrogen sources for organisms harboring the nitrile-hydrolyzing enzymes, namely nitrilases and nitrile hydratases. Nitrilases catalyze the formation of ammonia from nitrile compounds directly [90], whereas nitrile hydratases (NHases) catalyze the hydrolysis of nitriles to the corresponding amides, which can then be converted to ammonia and carboxylic acids by an amidase [91]. Several AOA within the Nitrosopumilus, Nitrosotenuis and Nitrosocaldus genera are known to carry nitrilases e.g., [92,93,94,95]; however, these appear to be phylogenetically distinct from the NHases found in the Riverton Nitrososphaerales MAGs (Fig. S5). The relatively patchy distribution of NHases within Nitrososphaerales points to potential horizontal acquisition or loss of this gene by AOA lineages (Fig. S5).

Genomic capabilities for reducing oxidative stress varied between the Nitrosopumilales and Nitrososphaerales MAGs. Superoxide dismutase, which catalyzes the conversion of highly reactive superoxide to hydrogen peroxide (H2O2) and oxygen, was found in 21 out of the 25 AOA MAGs (Fig. 3), as expected based on the wide distribution of this gene across AOA clades [96, 97]. H2O2 detoxification is most efficiently catalyzed by the enzyme catalase [98], which is generally absent in AOA, although there are exceptions. Among cultured AOA, manganese-containing catalases (Mn-catalases; as opposed to the typical heme-group containing catalases) have been annotated in genomes of the soil thaumarchaeon Ca. Nitrososphaera evergladensis [99] and in Ca. Nitrosocosmicus exaquare [100]. A truncated copy of Mn-catalase, which appears to be horizontally acquired, is also found in the Ca. N. gargensis genome [97]. In accordance with this, several of the Nitrososphaera and Nitrosocosmicus-like MAGs from Riverton (but none of the Nitrosopumilales MAGs) contained Mn-catalases (Fig. 3). H2O2, even at nanomolar levels, has been shown to inhibit ammonia oxidation by marine AOA (i.e., Nitrosopumilales; ref. [101]); and culture experiments have suggested that these AOA might employ α-keto acids or co-occurring catalase-harboring bacteria as H2O2 scavengers [97, 102]. Whether the subsurface Nitrosopumilales have also adopted these strategies for H2O2 detoxification remains to be examined. Intriguingly, a canonical heme catalase was found in the non-AOA MAG D197_116; this sequence clustered with catalase sequences from anaerobic archaea and bacteria, including Methanomicrobia and ANME-1 cluster archaea (Fig. S6). Heme-catalase sequences were also found in two non-AOA thaumarchaeal MAGs assembled from a peat metagenome [103], one of which appears to be particularly distinct from the rest (Fig. S6). None of the other thaumarchaeal genomes we examined contained homologs of this gene, suggesting lateral acquisition by D197_116 and the peat Thaumarchaeota described above.

All four MAGs in the Nitrosocosmicus sister cluster (NS-ε) harbored group 3b [NiFe] hydrogenases (Fig. 3; Fig. S7), previously reported only in thermophilic AOA [95], and more recently in Nitrososphaerales MAGs obtained from a wastewater treatment plant [104]. The 3b-type hydrogenases are oxygen-tolerant bidirectional enzymes that couple NAD(P)H oxidation/reduction with H2 production/consumption [105]. These may also act as sulfhydrogenases that reduce elemental sulfur or polysulfide to hydrogen sulfide [106]. Hydrogenases were not detected in any of the remaining Nitrososphaerales MAGs (Fig. 3). However, all thaumarchaeal MAGs assembled from 185 cm and below (except for those falling within the CSP1-1 cluster) harbored group 4f [NiFe]-hydrogenases (Fig. 3; Fig. S7). These hydrogenases potentially comprise a respiratory complex that mediates formate oxidation to CO2 while reducing protons to generate H2 [107], and may function in cellular redox balance. Since these genes were found across all three lineages from 185 cm and below (Nitrosopumilus, Nitrosarchaeum and the non-AOA Thaumarchaeota; Fig. 1), the group 4f hydrogenases are potentially a habitat-specific adaptation in subsurface Thaumarchaeota that helps them cope with redox changes resulting from water table fluctuations.

Divergent non-AOA Thaumarchaeota encoding the Wood-Ljungdahl pathway, a form-III RuBisCO, and the potential for extracellular electron transfer

Two of the MAGs recovered from the 197 cm sediment sample, D197_116 and D197_2, appear to represent a non-AOA basal lineage of Thaumarchaeota (Fig. 1). On the ribosomal protein phylogeny, these MAGs form a sister group to the recently described heterotrophic marine Thaumarchaeota (HMT) lineage (Fig. 1; [16, 17]). While the HMT lineage was inferred to be capable of aerobic respiration, neither of the two non-AOA MAGs from Riverton harbors aerobic terminal oxidases. Despite this, these two lineages share many metabolic features, including the complete lack of ammonia-oxidation machinery. We expand upon these comparisons in later sections.

Both non-AOA MAGs encode a complete tetrahydromethanopterin (H4MPT)-dependent Wood-Ljungdahl pathway (WLP). In addition to the key enzyme CO dehydrogenase/acetyl-CoA synthase (CODH/ACS), both MAGs encode the complete archaeal methyl-branch of the WLP (Fig. 4). Homologs of methyl-CoM reductase (McrABC) were not identified in either MAG, excluding the potential for methane metabolism in this lineage. Thus, the WLP in these Thaumarchaeota likely functions in a non-methanogenic, autotrophic CO2 fixation pathway, as has been suggested recently for several archaeal phyla harboring the WLP without identifiable Mcr homologs [108,109,110,111,112]. Unlike in acetogenic Bathyarchaeota harboring the WLP [109], the non-AOA MAGs do not contain phosphate acetyltransferase (pta) or acetate kinase (ack), which are responsible for converting acetyl-CoA to acetate with concomitant generation of ATP via substrate-level phosphorylation. Instead, acetate formation is likely catalyzed by acetyl-CoA synthetase/acetate-CoA ligase, which was identified in both MAGs (Fig. 4).

Fig. 4: Metabolic reconstruction of the non-AOA MAG D197_116.

Amino acid sequences corresponding to each highlighted gene/pathway are presented in Table S3 in Data Set 1. Abbreviations: CHO-MF formyl-methanofuran, fwd formyl-MFR dehydrogenase, CHO-H4MPT formyl-tetrahydromethanopterin, ftr formyl-MFR:H4MPT formyltransferase, mch methenyl-H4MPT cyclohydrolase, CH = H4MPT methenyl-tetrahydromethanopterin, mtd F420-dependent methylene H4MPT dehydrogenase, CH2 = H4MPT methylene-tetrahydromethanopterin, mer methylene-H4MPT reductase, CH3-H4MPT methyl-tetrahydromethanopterin, CO carbon monoxide, CO2 carbon dioxide, cdhABC carbon monoxide dehydrogenase/acetyl-CoA synthase subunits, AMP adenosine monophosphate, AMPase AMP phosphorylase, R15P ribose 1,5-bisphosphate, R15Pi ribose 1,5-bisphosphate isomerase, RuBP ribulose 1,5-bisphosphate, RuBisCO ribulose-1,5-bisphosphate carboxylase, 3-PGA 3-phosphoglycerate, PEP phosphoenolpyruvate, PPDK pyruvate phosphate dikinase, PEPCK phosphoenolpyruvate carboxykinase, PFOR pyruvate:ferredoxin oxidoreductase, acs acetyl-CoA synthetase, W-AOR tungsten-dependent aldehyde:ferredoxin oxidoreductase, ADH aldehyde dehydrogenase, PQQ-DH pyrroloquinoline quinone-dependent dehydrogenase, Ttr tetrathionate reductase, MHC multiheme cytochromes.

The presence of the WLP within Thaumarchaeota was first implied when a previous study [113] identified the CODH/ACS gene cluster in the thaumarchaeal MAG RBG_16_49_8, assembled from a subsurface aquifer sediment metagenome [3]. A subsequent study [18] also highlighted the presence of CODH/ACS subunits in RBG_16_49_8, suggesting that the loss of the WLP may be a key evolutionary event marking the transition from basal anaerobic lineages of Thaumarchaeota to oxygen-respiring AOA. We were able to identify the complete H4MPT-dependent WLP in the RBG_16_49_8 genome as well. On the phylogenomic tree, RBG_16_49_8 clustered together with the two non-AOA MAGs from Riverton (Fig. 1).

The non-AOA MAGs also harbor pyruvate:ferredoxin oxidoreductase (POR), an oxygen-sensitive enzyme catalyzing the decarboxylation of pyruvate to acetyl-CoA. The POR system was identified as a key metabolic feature of anaerobic lineages within Thaumarchaeota [18]. In the non-AOA MAGs described here, acetyl-CoA generated via the WLP or by POR activity could either enter central carbon metabolism via an incomplete TCA cycle or be reduced to ethanol (Fig. 4). The MAGs uniquely contain multiple homologs of a tungsten-containing aldehyde:ferredoxin oxidoreductase (W-AOR), which is potentially involved in the conversion of acetate to ethanol (Fig. 4; [114, 115]). W-AORs are known to have a broad substrate range [116] and, therefore, might mediate various redox reactions involving organic acids and aldehydes. While the best-characterized W-AORs are from hyperthermophilic archaea such as Pyrococcus furiosus and Thermococcus litoralis [117], homologs have been detected in various anaerobic bacteria [118] and archaea, including protein-metabolizing (cren)archaea in marine sediments [119], as well as Aigarchaeota lineages [120]. To the best of our knowledge, this is the first report of W-AORs in Thaumarchaeota. Tungsten transporters are also present in these genomes, homologs of which were detected in several non-AOA thaumarchaeal genomes analyzed (Fig. 3; Table S2 in Data Set 1).

Similar to the previously described HMT lineage [16, 17], both RVT197_2 and RVT197_116 encode several pyrroloquinoline-quinone-dependent dehydrogenases (PQQ-DHs). As membrane-bound dehydrogenases, PQQ-DHs can supply electrons directly to the membrane quinone pool, and enable growth on various alcohols and sugars, including glucose, methanol and ethanol [121]. Diverse PQQ-DHs are present in the non-AOA MAGs, all of which harbor predicted signal peptides suggesting extracellular localization (Table S3 in Data Set 1). Another similarity between D197_116 and the HMT lineage is the presence of a form III ribulose-1,5-bisphosphate carboxylase (RuBisCO) gene (Fig. 3). Contig neighborhood comparisons between the two non-AOA MAGs suggest that D197_2 is missing the RuBisCO gene due to genome incompleteness, as the contig homologous to the RuBisCO-containing contig in D197_116 is truncated in this MAG. The D197_116 RuBisCO clusters with archaeal form III-b sequences, also found in bacteria of the Candidate Phyla Radiation (Fig. S8; [122]). RuBisCO genes previously reported in terrestrial Thaumarchaeota also affiliate within the form III-b cluster [96, 123]. The HMT RuBisCO sequences, in contrast, are homologous to the divergent form III-a sequences found in methanogens [16, 17]. The most parsimonious metabolic hypothesis for the HMT genomes suggested the involvement of RuBisCO in a potentially cyclic, anaplerotic CO2 incorporation pathway [17]. AMP phosphorylase (AMPase), a key enzyme in the archaeal AMP pathway for nucleotide salvage [124], was not identified in the HMT genomes, which rendered the generation of ribose 1,5-bisphosphate (R15P) in these archaea uncertain. In contrast, D197_116 contains an AMPase in the vicinity of the RuBisCO gene, in addition to an R15P isomerase required for the generation of the RuBisCO substrate ribulose bisphosphate. However, unlike the HMT genomes, the non-oxidative pentose phosphate pathway is missing in these MAGs and, therefore, a cyclic CO2 incorporation pathway is likely not present. Instead, the product of the AMP pathway, 3-phosphoglycerate, likely enters glycolysis to be converted to pyruvate and acetyl-CoA (Fig. 4).

Basal lineages of Thaumarchaeota described thus far, with the exception of the HMT lineage, are anaerobic heterotrophs respiring sulfate, nitrate, or iron [18, 96, 123]. The HMT lineage is genetically capable of respiring oxygen [16, 17]. Similar to other terrestrial basal groups, the non-AOA MAGs assembled here do not harbor aerobic terminal oxidases. Both MAGs, however, contain respiratory complexes I and II. The complex I (NADH:quinone oxidoreductase; Nuo) gene cluster in these genomes resembles that of the 2M-type complex I found in the HMT lineage [16], which features an extra copy of the NuoM subunit that may enhance the proton pumping efficiency of the complex [125].

D197_116 is potentially capable of extracellular electron transfer, as evidenced by the presence of multiple multiheme c-type cytochromes (MHCs) containing 10–12 heme (CXXCH) motifs each (Fig. 4). These MHCs may be involved in Fe(III) reduction, as the gene neighborhood resembles that of the Fe(III)-reducing archaea in the family Candidatus Methanoperedenaceae [126, 127]; the MHCs in D197_116 are adjacent to an electron-transporting ferredoxin iron-sulfur protein and membrane-spanning NrfD-like proteins, as found in Ca. Methanoperedenaceae [126]. Feo- and Fbp-like iron transporters (ferrous and ferric iron transporters, respectively) are uniquely found in the two basal MAGs, along with a bacterioferritin homolog that may be involved in intracellular Fe-storage (Fig. 4). Notably, many of these genes are absent in reference thaumarchaeal genomes (Table S2 in Data Set 1). Finally, D197_116 also harbors tetrathionate reductases (Fig. 4; Table S3 in Data Set 1), suggesting that these sulfur compounds may serve as external electron acceptors for these archaea. Membrane topology analysis of the tetrathionate reductase (Ttr) subunits predicted transmembrane domains for TtrC and external localization for TtrAB, potentially indicating their involvement in extracellular redox processes. The broader metabolic plasticity due to diverse pathways for energy generation and carbon assimilation might be advantageous for the survival of these archaea in the capillary fringe that experiences seasonal redox oscillations due to fluctuations in the water table [5].
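As an illustration of how heme-binding (CXXCH) motifs of the kind mentioned above can be tallied in a predicted protein sequence, the following is a minimal sketch, not the procedure used in this study. The function names, the `min_motifs` threshold, and the toy sequence are hypothetical and chosen only for demonstration.

```python
import re

# CXXCH: Cys, any two residues, Cys, His (canonical heme c-binding motif).
# The zero-width lookahead allows overlapping occurrences to be counted.
HEME_MOTIF = re.compile(r"(?=C[A-Z]{2}CH)")


def count_heme_motifs(protein_seq: str) -> int:
    """Return the number of CXXCH motifs in an amino acid sequence."""
    # Remove whitespace/newlines (e.g., from FASTA line wrapping) and uppercase.
    cleaned = "".join(protein_seq.split()).upper()
    return len(HEME_MOTIF.findall(cleaned))


def is_multiheme(protein_seq: str, min_motifs: int = 3) -> bool:
    """Flag a sequence as a candidate multiheme cytochrome.

    The default threshold is an arbitrary, hypothetical cutoff.
    """
    return count_heme_motifs(protein_seq) >= min_motifs


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any MAG in this study.
    toy_seq = "MKCAACHGGSCQECHLLPCTTCHW"
    print(count_heme_motifs(toy_seq), is_multiheme(toy_seq))
```

A motif count alone is only a first-pass screen; candidate MHCs would still need the kind of localization and gene-neighborhood evidence discussed in the text.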

Pangenomic inference into functional diversification among basal Thaumarchaeota

In order to compare the metabolic features of the non-AOA MAGs, we performed a pangenome analysis of a set of reference genomes representing basal lineages of Thaumarchaeota. The genomes were selected to represent species-level clusters on the GTDB r202 archaeal tree. Intriguingly, two of the genomes—UBA213_sp011331095 (Nitrososphaerales archaeon SpSt-435; ref. [128]) and UBA213_sp002713325 (Thaumarchaeota archaeon SAT137; ref. [129])—appear to be ammonia oxidizers, even though both genomes cluster basal to most non-AOA genomes on the ribosomal protein tree (Fig. 5) and harbor divergent copies of the amoA gene (Fig. S9). These two genomes thus represent an AOA family that falls outside the monophyletic clade of AOA families described so far.

Fig. 5: Pangenomic comparison of selected functions across basal lineages of Thaumarchaeota.

Included in the comparison are the two non-AOA MAGs assembled in this study (highlighted in red text), as well as genomes representing each family-level cluster of Nitrososphaerales defined in the GTDB release 06-RS202. Previously described genomes not included in the GTDB (potentially due to failing quality checks) are indicated in gray letters. Genome names in bold, black letters indicate the AOA family UBA213, which falls outside of the typical monophyletic AOA clade and harbors divergent copies of the amoA gene (Fig. S9). Filled circles indicate the presence of a function/gene. Horizontal shading of the clades signifies taxonomic lineage – either order or family (indicated by the prefixes “o_” and “f_”, respectively). For each clade, the habitats of origin for the corresponding genomes are indicated in italic typeface. Complete pangenome analysis results are presented in Table S4.

A pangenome analysis of the basal lineages based on COG functional categories suggested independent gains or losses of functional modules among basal Thaumarchaeota (Fig. 5). For example, the patchy distribution of functional modules including urease, RuBisCO, plastocyanin, catalase, and bacterioferritin indicates that multiple basal lineages may have acquired (or lost) these genes independently along thaumarchaeal evolution (Fig. 5, Table S4). Intriguingly, the subsurface non-AOA genomes (i.e., RVT MAGs and RBG_16_49_8 from Rifle sediments) share several functional modules with the hot spring-associated thaumarchaeal families JACAEJ01 and JAAOZN01 (Fig. 5). These include CO dehydrogenase/acetyl-CoA synthase and other enzymes of the WLP, aldehyde-ferredoxin oxidoreductase, tungstate transporters, formate hydrogen lyase, and a putative DMSO reductase (Table S4; Fig. 5). Consistent with genome streamlining observed among marine AOA, many of the functional modules seem to be lost in transition to the HMT lineage (Fig. 5), with the exception of RuBisCO and PQQ-dependent dehydrogenases.
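The presence/absence reasoning in the paragraph above can be sketched with simple set operations. This is an illustrative toy, not the pangenome workflow used here: the genome labels and their module assignments are hypothetical placeholders (they do not reproduce Fig. 5 or Table S4), and the function names are invented for the example.

```python
from typing import Dict, Iterable, Set


def shared_modules(presence: Dict[str, Set[str]],
                   group_a: Iterable[str],
                   group_b: Iterable[str]) -> Set[str]:
    """Modules present in every genome of both groups."""
    genomes = list(group_a) + list(group_b)
    return set.intersection(*(presence[g] for g in genomes))


def patchy_modules(presence: Dict[str, Set[str]]) -> Set[str]:
    """Modules found in at least one genome but not in all of them."""
    module_sets = list(presence.values())
    return set.union(*module_sets) - set.intersection(*module_sets)


if __name__ == "__main__":
    # Hypothetical presence/absence data for placeholder genomes.
    presence = {
        "subsurface_1": {"CODH/ACS", "W-AOR", "RuBisCO"},
        "subsurface_2": {"CODH/ACS", "W-AOR"},
        "hot_spring_1": {"CODH/ACS", "W-AOR", "urease"},
        "marine_1": {"RuBisCO", "PQQ-DH"},
    }
    print(sorted(shared_modules(presence,
                                ["subsurface_1", "subsurface_2"],
                                ["hot_spring_1"])))
    print(sorted(patchy_modules(presence)))
```

Patchiness in such a matrix is compatible with either independent gain or independent loss; distinguishing the two requires mapping presence/absence onto the phylogeny, as done in Fig. 5.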

Conclusions

This study examined the phylogenetic diversity and metabolic potential of subsurface Thaumarchaeota lineages in floodplain sediments in the Wind River Basin, WY. Metagenomes obtained at discrete depths along the sediment profile yielded diverse Thaumarchaeota MAGs with distinct functional potential. Particularly notable was the shift in phylogenetic identity with sediment depth, which appeared to be linked to soil moisture as well as carbon/nitrogen content. The predominantly terrestrial Nitrososphaerales were dominant in the top, well-drained (dry) layers with relatively higher total C (and lower C:N), while the typically marine Nitrosopumilales dominated the deeper, moister layers (especially > 111 cm), including the capillary fringe where total C and N were the lowest. Non-ammonia-oxidizing Thaumarchaeota MAGs were also recovered from within the capillary fringe. Thus, surface soils were dominated by relatively more generalist AOA capable of utilizing various organic compounds such as urea, cyanate and nitriles, whereas typically oligotrophic AOA lineages became prominent in deeper, moister layers. This shift in phylogenetic diversity and metabolic potential from aerobic ammonia-oxidizing lineages in the surface depths to anaerobic, non-AOA lineages with diverse metabolic strategies in deeper layers likely indicates a link between thaumarchaeal population structure and subsurface hydrology, and/or a habitat partitioning pattern resulting from increasing oligotrophy in deeper sediment layers. These results, therefore, emphasize that soil moisture content should be a key variable of consideration in studies of thaumarchaeal evolution. The non-AOA MAGs representing a late-diverging basal lineage are particularly intriguing as they potentially represent an autotrophic lineage among basal Thaumarchaeota. The presence of the WLP among multiple family-level lineages within Thaumarchaeota suggests independent acquisition of this pathway among Thaumarchaeota. Overall, the Riverton MAGs add to the diversity of thaumarchaeal genomic data while expanding our knowledge of the functional capabilities of these ubiquitous archaea in subsurface environments.