Introduction

Over the past several years, the advancement of culture-independent techniques has prompted the discovery and genomic characterization of several new archaeal lineages (see refs. [1, 2] and references within). Each new phylum fills gaps in the genomic tree of life, allowing for the continuous reevaluation of the evolutionary models that lead from a common ancestor [2,3,4,5] to the splitting of the domains [6,7,8,9]. Similarly, analyzing genome annotations of uncultivated lineages has provided valuable insight into each group’s metabolic potential and prospective geochemical role within its environment [10,11,12].

Recently, the first genomes from a unique, uncultivated lineage of Archaea, known as Candidatus Hydrothermarchaeota (previously Marine Benthic Group-E or MBG-E [13, 14]), were documented through metagenomic sequencing of crustal fluids collected from the deep subseafloor environment of the Juan de Fuca Ridge flank (JdFR; Figure S1) [15]. Samples were acquired using subseafloor borehole observatories called CORKs (Circulation Obviation Retrofit Kits) that were installed during Integrated Ocean Drilling Program (IODP) Expedition 327 and provide access to the oceanic crust and the fluids circulating therein [16]. In the JdFR environment, fluid circulating between outcrops undergoes extensive fluid-rock reactions [17,18,19], becoming warm (64 °C), depleted in oxygen and nitrate, and enriched in dissolved metals and reduced gases [19, 20]. Eventually, the chemically altered fluids escape from discharge outcrops and hydrothermal vents, connecting both abiotic and biologically mediated water-rock reactions to global biogeochemical cycles [21, 22].

The fluid circulating through the JdFR basement environment harbors microbial cell densities on the order of 10^4 cells per ml [23, 24]. The Hydrothermarchaeota appear particularly abundant in the JdFR environment, comprising up to half of the archaeal 16S ribosomal RNA (rRNA) gene amplicons and one-third of the single-amplified genomes (SAGs) sorted from the total community [23]. These abundances are significantly greater than those observed from sedimentary environments, but similar to some hydrothermal vent structures (Table S1, Figure S2). Thus, some clades of Hydrothermarchaeota may be well adapted to life in the warm crustal biosphere, and detailing the potential metabolisms of these organisms should advance our understanding of how life survives in this energy-limited environment.

This study combined the analysis of Hydrothermarchaeota metagenome-assembled genomes (MAGs [15]) with several newly generated SAGs from the same JdFR crustal fluids to evaluate Hydrothermarchaeota’s evolutionary relationship to other archaeal groups and assess the functional potential of the lineage. Our data reveal an early-branching archaeal candidate phylum arising between the Euryarchaeota superphyla and the superphylum containing Micrarchaeota, Altiarchaeota, UAP2, and Nanoarchaeota (DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota) [25, 26]). This basal evolutionary position is supported by the coding potential for early-evolved enzymes for anaerobic sulfate and nitrate reduction.

Materials and methods

Observatory description and fluid sampling

During IODP Expedition 327 in 2010, borehole observatories were placed in subseafloor basement at IODP Holes U1362A and U1362B (Table S2, Figure S1). These observatories feature epoxy-coated steel observatory casing to minimize corrosion and mitigate impact on in situ processes, Teflon-lined “umbilical” tubes for pristine fluid collection from isolated subsurface intervals, and a 4-inch diameter “free flow” ball valve at the wellhead for additional fluid sampling [27,28,29]. Each borehole penetrates approximately 235 m of sediment. Hole U1362B has an umbilical for fluid collection 30 m below the sediment-basement interface (meters sub-basement (msb)), while Hole U1362A has fluid collection horizons at approximately 30 and 190 msb.

In July 2011, fluid samples were collected using equipment on the remotely operated vehicle (ROV) Jason II from the Research Vessel Atlantis (Table S2). Fluids for metagenomic analyses were sampled from the umbilicals that accessed 190 and 30 msb at Holes U1362A and U1362B, respectively, as described previously [15]. At the seafloor, a mobile pumping system filtered approximately 124 and 70 l of crustal fluid from boreholes U1362A and U1362B, respectively, using Steripak-GP20 (Millipore, Billerica, MA, USA) polyethersulfone filter cartridges containing 0.22 μm pore-sized membranes [15]. Before filtering, at least three times the volume of the umbilical line was flushed through the system to remove any stagnant fluids. Fluids for single-cell genomic analyses were sampled from the 190 msb horizon of Hole U1362A using the same mobile pumping system [23] and from the open ball valve on the wellhead at Hole U1362B [16], after a long period of free flow to flush out the borehole dead volume, using a syringe cleaned with bleach and dilute trace-metal-grade acid. Temperatures as high as 62 °C were recorded with the ROV thermistor inside the ball valve opening. Immediately upon recovery, fluid was fixed with glycerol-Tris-EDTA (glyTE) buffer and frozen at −80 °C in cryovials for single-cell sorting [30].

Single-cell sorting, genome sequencing, and assembly

The generation, identification, sequencing, and de novo assembly of SAGs were performed at the Bigelow Laboratory for Ocean Sciences Single Cell Genomics Center (scgc.bigelow.org). The cryopreserved samples were thawed, pre-screened through a 40 μm mesh size cell strainer (Becton Dickinson), and incubated with 5 μM (final concentration) SYTO-9 DNA stain (Thermo Fisher Scientific) for 10–60 min. Fluorescence-activated cell sorting, cell lysis, multiple displacement amplification, sequencing (using Illumina technology), de novo genome assemblies, and quality control were performed using the workflow benchmarked in ref. [30]. Contigs >2 kbp in length were uploaded to the Joint Genome Institute (JGI) Integrated Microbial Genomes & Microbiomes (IMG/M) comparative data analysis system (Table S3 [31]) for gene prediction and annotation using the genome annotation pipeline [32].

Metagenome sequencing, assembly, and annotation

Metagenome sequencing, assembly, binning, and annotation have been reported previously [15]. Briefly, quality-filtered raw sequence reads from the crustal fluids of Hole U1362A (IMG/M ID 330002481) and Hole U1362B (IMG/M ID 3300002532) were assembled using SOAPdenovo version 1.05 with default settings, binned using CONCOCT [33], and curated within the Anvi’o package, version 1.1.0 [34]. In total, 98 MAGs were produced, of which 3 were identified as Hydrothermarchaeota. Hydrothermarchaeota MAGs JdFR-16, JdFR-17, and JdFR-18 were uploaded to IMG/M for gene prediction and annotation using the genome annotation pipeline [32]. Completeness and contamination estimates for SAGs and MAGs were made by comparing annotated protein sequences against the Euryarchaeota marker list within CheckM [35]. Average Nucleotide Identity (ANI) comparisons were calculated using the IMG/M pairwise ANI tool [31].

Phylogenetic and phylogenomic analyses

Phylogenetic trees of the 16S rRNA, nitrate reductase, and ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) genes were constructed using raxMLHPC (version 8.2.8, [36], see supplemental information). Briefly, 16S rRNA genes were aligned using the SILVA Incremental Aligner (SINA) online tool [37] and masked out with the lane1349 mask [38]. Nitrate reductase and RuBisCO genes were aligned using MUSCLE (version 3.8.31 [39]) and trimmed and masked using trimAl (version 1.2rev59 [40]). All alignments were manually inspected.

A phylogenomic tree based on 43 single-copy marker genes was created from all publicly available genomes from the archaeal domain from IMG/M, National Center for Biotechnology Information (NCBI), and other repositories of data (Tables S4, S5). MAGs and SAGs with CheckM-generated completeness, contamination, and strain heterogeneity information (version 1.0.11 [35]) were used as input to dRep (version 2.0.5 [41]) to produce a dereplicated set of genomes for phylogenomic analysis. Most default dRep parameters were used, but the required completeness was reduced to 50% to account for genes that are systematically absent from single-copy marker gene sets in archaeal groups of critical importance to the phylogeny (e.g., Hadesarchaeaota). From the dereplicated genomes (n = 1198) and the three most complete Hydrothermarchaeota genomes (JdFR-17, JdFR-18, and SAG AC-708-L17), the 43 single-copy marker gene alignment produced by CheckM was used as input to FastTree [42] with the WAG amino acid substitution model. Phylogenetic lineages were identified and collapsed in ARB (version 6.0.4 [43]) with the guided assistance of taxonomic information generated using GTDB-Tk (version 0.0.7) using the classify workflow with database release 83 [26].
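
The quality screen applied before dereplication can be illustrated with a minimal Python sketch. It is not the dRep implementation: the record type, the function names, and the example genomes are hypothetical, and the 25% contamination ceiling is an assumption made for illustration. Only the 50% completeness floor comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class GenomeQuality:
    """CheckM-style quality summary for one genome (hypothetical record type)."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_filter(genome, min_completeness=50.0, max_contamination=25.0):
    """Return True if the genome meets both quality thresholds.

    min_completeness=50.0 follows the text; max_contamination=25.0 is an
    assumed value used only for this illustration.
    """
    return (genome.completeness >= min_completeness
            and genome.contamination <= max_contamination)


def select_genomes(genomes, **thresholds):
    """Return the names of genomes that pass the quality filter, in input order."""
    return [g.name for g in genomes if passes_filter(g, **thresholds)]


if __name__ == "__main__":
    # Hypothetical genomes, not values from this study.
    example = [
        GenomeQuality("genome_A", 70.0, 5.0),
        GenomeQuality("genome_B", 31.0, 7.0),
        GenomeQuality("genome_C", 97.0, 25.0),
    ]
    print(select_genomes(example))
```

Lowering the completeness floor keeps partial genomes from sparsely sampled lineages in the tree, at the cost of more missing data in the marker gene alignment.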

Amplification of mcrA gene

Amplification of the mcrA gene on a sorted, whole genome-amplified Hydrothermarchaeota cell was attempted using primers qmcrA [44] and ML 5 ([45], see supplemental information).

Thermodynamic calculations of Gibbs free energy

The potential energy yields for sulfate reduction coupled to various electron donors were calculated according to the Gibbs energy of reaction (see supplemental information).

Results and discussion

Comparative genomics of Hydrothermarchaeota MAGs and SAGs

Hydrothermarchaeota constituted 42% (n = 28/66) and 24% (n = 23/94) of the identified SAGs from Holes U1362A and U1362B, respectively. Five SAGs were chosen for genome sequencing: SAGs AC-708-L17 and AC-708-N22 from Hole U1362A; and AC-334-K11, AC-335-G21, and AC-335-L21 from Hole U1362B (Table 1). These SAGs range in size and estimated completeness from 1.26 Mbp and 70% complete (AC-708-L17) to 0.47 Mbp and 23% complete (AC-708-N22; Table 1). Based on these SAGs, a complete Hydrothermarchaeota genome is estimated to be approximately 1.8 Mbp, comparable to other subsurface Archaea [46, 47]. The three Hydrothermarchaeota MAGs previously constructed from Holes U1362A (MAGs JdFR-17 and JdFR-18) and U1362B (MAG JdFR-16 [15]) range from 1.35 Mbp and 31% complete to 2.18 Mbp and 97% complete (Table 1). Contamination estimates (sequence redundancy) within the MAGs range from 7 to 25%, although strain heterogeneity estimates (72–80% [35]) suggest that the redundancy may reflect the binning of closely related Hydrothermarchaeota strains. Genomic guanine–cytosine (GC) content is approximately 50% for all genomes except MAG JdFR-18, which is 39%. The five Hydrothermarchaeota SAGs have similar 16S rRNA genes (>99%, Table S6). Of the two MAGs that contained 16S rRNA genes, JdFR-17 was >99% similar to the five Hydrothermarchaeota SAGs, while the JdFR-18 16S rRNA gene was only 88–89% similar to the Hydrothermarchaeota SAGs and MAG JdFR-17 (Table S6), potentially representing a second Hydrothermarchaeota family.
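
The text does not state how the 1.8 Mbp estimate was derived. One simple approach that reproduces it is to divide the assembly size by the estimated completeness, as in the Python sketch below. The function name is hypothetical, and the calculation assumes that completeness estimated from marker genes scales linearly with the fraction of the genome recovered.

```python
def estimate_genome_size(assembly_mbp, completeness_pct):
    """Extrapolate a full genome size (Mbp) from a partial assembly.

    Assumes recovered sequence is proportional to marker-gene completeness,
    which is a simplification.
    """
    if not 0 < completeness_pct <= 100:
        raise ValueError("completeness must be in the interval (0, 100]")
    return assembly_mbp / (completeness_pct / 100.0)


if __name__ == "__main__":
    # SAG sizes and completeness values as reported in the text.
    for name, size, comp in [("AC-708-L17", 1.26, 70), ("AC-708-N22", 0.47, 23)]:
        print(name, round(estimate_genome_size(size, comp), 2), "Mbp")
```

For AC-708-L17 this gives 1.26 / 0.70 = 1.8 Mbp. The least complete SAG (AC-708-N22) gives a somewhat larger value of about 2.0 Mbp, which illustrates how uncertain the extrapolation becomes at low completeness.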

Table 1 General characteristics of Candidatus Hydrothermarchaeota SAGs and MAGs

Phylogenetic analyses revealed that the JdFR Hydrothermarchaeota 16S rRNA gene sequences grouped most closely with environmental sequences from other crustal environments (Fig. 1a). The majority of sequences clustered with a sequence from black rust that formed on the exterior of a leaking subseafloor observatory at nearby Hole 1026B, where the black rust was still exposed to hydrothermal fluids leaking from the observatory [48]. MAG JdFR-18 branched separately with a sequence identified within crustal fluids collected from the hydrothermal vent of the Southern Mariana Trough [49].

Fig. 1

Phylogenetic associations of the Juan de Fuca Candidatus Hydrothermarchaeota. Black (100%) and white (99–80%) circles indicate nodes with high local support values. a Phylogenetic associations relative to other Ca. Hydrothermarchaeota. 16S rRNA gene sequences from this study are in bold; sequences from other studies are indicated with their accession numbers (Table S8). b Phylogenomic associations of Ca. Hydrothermarchaeota genomes among archaeal genomes publicly available in Integrated Microbial Genomes (IMG), National Center for Biotechnology Information (NCBI), and other repositories, using classifications suggested by the Genome Taxonomy Database [26] (Table S4). Tree represents the concatenation of 43 single-copy marker proteins (Table S5)

Genome-wide ANI values corroborate the 16S rRNA gene phylogeny (Table S7). With the exception of SAG AC-708-N22, the SAGs are very similar (mean 98.6 ± 1.4% s.d.; n = 4). SAG AC-708-N22 is most similar to MAG JdFR-17, sharing an ANI value of 98.2%. Consistent with the 16S rRNA gene phylogeny and GC content, MAG JdFR-18 is equally dissimilar to all other genomes (mean 68 ± 0.8% s.d.; n = 8). These genomic relationships highlight the connectivity between the two different borehole locations and depth horizons (Table S2) and the diversity within the Hydrothermarchaeota community.

Evolutionary placement of the Hydrothermarchaeota

This study sheds light on the taxonomic placement of Hydrothermarchaeota within the archaeal tree of life. Phylogenies based on 16S rRNA genes as well as concatenated alignments of 43 single-copy phylogenetic marker genes place the Hydrothermarchaeota lineage toward the root of the DPANN superphylum (Fig. 1b). Currently, there are several predictions for the root of the archaeal tree, including placement within the Euryarchaeota [4, 50] or between DPANN and the remaining archaeal lineages [5]. Regardless, the placement of Hydrothermarchaeota [26] between Euryarchaeota and DPANN lineages suggests that Hydrothermarchaeota represents an early-branched lineage that should be considered in future evolutionary models. Likewise, additional exploration of the JdFR crustal ecosystem is important for interpreting the metabolic potential of this early-branched lineage. Similar to other proposed early-life analogs, the crustal aquifer presents a hot, anoxic environment protected from sunlight and oxygen [51], but is drastically understudied relative to hydrothermal vent systems.

Terminal electron acceptors for Hydrothermarchaeota

By leveraging metagenomics with single-cell genomics, we were able to validate the binning of the previously constructed MAGs by confirming the presence of important genes within the partial SAGs. Here, the functional attributes of each genome are evaluated individually (Figure S4) in order to provide a collective summary of the lineage’s metabolic potential (Fig. 2).

Fig. 2

Metabolic interpretation of Candidatus Hydrothermarchaeota single-amplified genomes (SAGs) and metagenome-assembled genomes (MAGs) from the Juan de Fuca Ridge flank subsurface crustal aquifer, based on the genes present within all genomes collectively. Black labels represent metabolites, blue labels represent genes or gene subunits that are present within at least one of the genomes (for individual genomes see Figure S4), gray labels represent genes or subunits not found in the genomes studied. Two black arrows aligned in the same direction represent a pathway requiring multiple genes, all of which were found in at least one genome. Pathway abbreviations: WL Wood–Ljungdahl, RHP reductive hexulose-phosphate. Gene name abbreviations: cdhABCDE CO dehydrogenase/acetyl-CoA synthase (subunits alpha, A; epsilon, B; beta, C; delta, D; gamma, E), cooC CO dehydrogenase maturation factor, cooS carbon monoxide dehydrogenase catalytic subunit, fwdABCDEFG formylmethanofuran dehydrogenase (subunits A–G), ftr formylmethanofuran-tetrahydromethanopterin formyltransferase, mch methenyltetrahydromethanopterin cyclohydrolase, mtd methylenetetrahydromethanopterin dehydrogenase, mer methylenetetrahydromethanopterin reductase, mtrA tetrahydromethanopterin S-methyltransferase (subunit A), hdrBCD CoB–CoM heterodisulfide reductase (subunits B–D), fdo formate dehydrogenase, Fqo ferredoxin:NADP+ oxidoreductase, frhABG coenzyme F420-reducing hydrogenase (subunits ABG), pgm/pmm phosphoglucomutase/phosphomannomutase, gpi glucose-6-phosphate isomerase, fba fructose-bisphosphate aldolase, fbp D-fructose 1,6-bisphosphatase, gap glyceraldehyde 3-phosphate dehydrogenase, pgk phosphoglycerate kinase, pgm phosphoglycerate mutase, eno enolase, pk pyruvate kinase, porABGD pyruvate ferredoxin oxidoreductase (subunits A, B, G, D), acs acetyl-coenzyme A synthetase, apr dissimilatory adenylylsulfate reductase (subunits A, B), dsrAB dissimilatory sulfite reductase (subunits A, B), sat sulfate adenylyltransferase, NapADGH nitrate reductase (subunits ADGH). Biomolecule abbreviations: SO4 sulfate, APS adenosine-5′-phosphosulfate, SO3 sulfite, S sulfide, NO3 nitrate, NO2 nitrite, MQ menaquinone, F420 coenzyme F420, MF methanofuran, MPT methanopterin, CoA/CoB/CoM coenzyme A/B/M, P phosphate

The Hydrothermarchaeota genomes contain genes for the use of several different terminal electron acceptors including sulfate, nitrate, and potentially metal oxides (Figure S4), suggesting versatility in the choice of oxidant. Phylogenetic analyses of key sulfate and nitrate reductase subunits suggest that these genes represent some of the earliest evolved forms of these enzymes found to date, further supporting Hydrothermarchaeota as a deeply branching lineage (Figure S5 [52]).

The capacity for sulfate reduction is evident in the JdFR Hydrothermarchaeota (Fig. 2, Table S9). Most of the genomes had at least one sulP permease, suggesting that sulfate can readily enter Hydrothermarchaeota cells, while MAG JdFR-18 also included coding regions for sulfate adenylyltransferase (sat), the enzyme responsible for converting sulfate to adenosine-5′-phosphosulfate (APS), and APS reductase for reducing APS to sulfite. The identification of genes for dissimilatory sulfite reductase (dsrAB) in SAG AC-708-L17 and the three MAGs suggests that sulfite reduction to sulfide is also likely. The dsrA genes found in SAG AC-335-L21 and the three MAGs are most closely related to dsrA from Moorella species, “Candidatus Rokubacteria”, and “Candidatus Aigarchaeota”, and appear to be an early-evolved form of the sulfate reductase gene, as suggested previously [52]. The dsrA genes of Hydrothermarchaeota are not monophyletic: additional dsrA genes present in MAG JdFR-18 and SAGs AC-334-L17 and AC-334-K11 are similar to genes from bacteria and may have been obtained via horizontal gene transfer [52]. The electrons for sulfate reduction are possibly passed from the menaquinone loop using membrane-bound dsrMK, and then to a soluble, unidentified electron carrier protein. For example, the protein DsrC has been suggested to be an electron carrier in past studies of Archaeoglobus [53]. The possible roles of other sulfur species as electron donors or acceptors for Hydrothermarchaeota are limited, and no evidence of thiosulfate reductase was found in any of the Hydrothermarchaeota genomes.

There is ample evidence for microbial sulfate reduction within the JdFR crustal fluids. Fluids are replete with sulfate (~18 mM [19, 20]), demonstrate measurable sulfate reduction, and contain dissimilatory dsrAB genes [54]. dsrAB genes were also observed in JdFR rocks, along with pyrite sulfur stable isotope values that indicate microbial sulfate reduction [55]. We hypothesize that Hydrothermarchaeota contribute to the sulfate reduction potential in this ecosystem, along with the Deltaproteobacteria, Firmicutes, and Archaeoglobus microbial community members previously identified in this ecosystem [15, 54].

Hydrothermarchaeota may have metabolic flexibility in terminal electron acceptors for respiration, as evidenced by the presence of genes for nitrate reduction. Genomes contain subunits for two different types of nitrate reductase genes: nap, the periplasmic dissimilatory nitrate reductase genes, and nar, the cytoplasmic membrane-bound nitrate reductase (Fig. 2, S4, Table S10). Phylogenetic analysis of napA genes suggests that the Hydrothermarchaeota napA gene represents an early-evolved form, supporting other evidence for Hydrothermarchaeota’s basal placement in the archaeal tree of life (Figure S5). However, the possibility for nitrate reduction in this ecosystem is unclear. Nitrate is rapidly exhausted after entering the ocean crust [18] and measured concentrations within the JdFR fluids over multiple years have been below detection or at nanomolar concentrations [19, 20]. Plausibly, trace amounts of nitrate could be intensely cycled, as has been observed in continental subsurface systems with cryptic N cycling [56] and suggested from other JdFR metagenomic interpretations [15]. Thus, the possibility of nitrate reduction in this crustal ecosystem warrants further attention.

Most genomes also possess various subunits for cytoplasmic membrane-bound nitrate reductase (subunits narGHIJ, Table S10). The carboxydotroph A. fulgidus also has this nitrate reductase, but has not demonstrated nitrate reduction in the laboratory [53]. Interestingly, transcripts of A. fulgidus show an upregulation of narA when reducing sulfate, indicating that this nitrate reductase complex might be accepting electrons from ferredoxin to reduce menaquinone [53]. However, additional laboratory studies are necessary to decipher the true potential of nitrate reductase in A. fulgidus and JdFR Hydrothermarchaeota.

Hydrothermarchaeota genomes include genes for many different c-type cytochromes, which are likely involved in terminal electron transferring processes (Table S11). Within the Archaea, microorganisms known to contain c-type cytochromes are restricted to the orders Archaeoglobales, Methanosarcinales, Halobacteriales, and Thermoplasmatales [57], where they serve as electron transfer proteins. Given the metabolisms of neutrophilic anaerobes within these orders, we hypothesize that the c-type cytochromes in Hydrothermarchaeota are either (1) generating a proton gradient through the use of an electron transport system, as observed in methanogens and methanotrophs of the Methanosarcinales order [57], or (2) reducing extracellular ferric oxide species, as observed in Ferroglobus placidus and Geoglobus ahangari [57], iron reducing archaea of other marine hydrothermal systems.

Carbon cycling by Hydrothermarchaeota

All Hydrothermarchaeota genomes have genes for carbon monoxide (CO) cycling and the Wood–Ljungdahl pathway (Fig. 2, Tables S12,13), suggesting that these organisms are carboxydotrophs capable of using CO in dissimilative and assimilative pathways [58]. Most genomes contained the mono-functional CO dehydrogenase catalytic subunit (cooS) and/or the CO dehydrogenase maturation factor genes (cooC), which may be used together to oxidize CO to CO2 for dissimilative processes [59]. The Hydrothermarchaeota genomes lack evidence for CO-induced hydrogenase subunits (cooH and cooL) and associated electron carrier (cooF) used by bacterial hydrogenogenic carboxydotrophs Rhodospirillum rubrum and Carboxydothermus hydrogenoformans [60, 61]. Instead, Hydrothermarchaeota probably couple CO oxidation to sulfate reduction similar to Archaeoglobus fulgidus [62].

Gene subunits for the ferredoxin:NADP+ oxidoreductase (fqo) complex may represent a potential electron shunt into the membrane-bound respiratory chain, linking CO oxidation to an external electron acceptor (Table S14 [53, 63]). Genes for the biosynthesis of menaquinone suggest that menaquinone redox reactions could continue to transfer electrons along the respiratory chain to an ultimate acceptor, while translocating protons across the membrane (Table S14).

Hydrothermarchaeota SAGs and MAGs also contain one to two bifunctional CO dehydrogenase/acetyl-CoA synthase (CODH/ACS) complexes (cdhABC and cdhCDE; Fig. 2, Table S12), which couple the reversible reduction of CO2 to CO oxidation to form acetate (acetogenesis), a key step of the Wood–Ljungdahl pathway [64]. The CODH/ACS complex is hypothesized to be an early-evolved complex [58], and thus its presence in this early-branching lineage is consistent with its proposed ancestry. The presence of these complexes and other Wood–Ljungdahl enzymes suggests that the Hydrothermarchaeota can also assimilate carbon monoxide into biomass, again similar to A. fulgidus. Several of the SAGs and the three MAGs also contain formate dehydrogenase (Fig. 2, Table S13), suggesting that formate production during CO oxidation or the use of formate as an electron donor is possible, as observed with cultures of A. fulgidus [62, 65].

Hydrothermarchaeota do not appear to be involved in methane cycling, although they possess some genes known to be involved in methyl cycling. For example, the genomes contain genes that encode the methanogenic tetrahydromethanopterin S-methyltransferase subunit (mtrH, Table S13), and MAG JdFR-18 also encodes genes for methyl transferases of methylamide compounds (Table S15). These enzymes are all involved with the methylation of methyl-coenzyme M (methyl-SCoM); during methanogenesis, methanogens reduce methyl-SCoM with the enzyme methyl-coenzyme M reductase (MCR). No subunits of the mcr gene were found within the Hydrothermarchaeota genomes, suggesting that these organisms cannot produce methane. To verify that the absence of the mcr subunits did not result from a lack of recovery, a PCR reaction targeting the mcrA gene was performed on amplified SAG DNA from a sorted cell that was identified as Hydrothermarchaeota but not genome sequenced. No PCR product was observed (data not shown). A negative result supports the absence of mcrA in these Hydrothermarchaeota genomes, but cannot rule out possible biases related to the amplification reactions. Nevertheless, the presence of mtrH and lack of mcrA has been recognized in Theionarchaea (SAG DG-70, [66]) and Euryarchaeota genomes (A. fulgidus, IMG/M IDs 2588253768/638154502; Archaeoglobus sulfaticallidus, IMG/M ID 2522125074, and a partial MSBL1 genome [67]). This suggests that the methyl-SCoM metabolite may have an alternate fate in these organisms. Given that the last common ancestor has been hypothesized to be a methanogen [3, 68], the placement of Hydrothermarchaeota outside the Euryarchaeota (many of which are methanogens) may further advance our understanding of the distribution and evolution of this ancient and fundamental metabolism.

While no known CO-induced hydrogenases were identified, genes for Ni-Fe hydrogenases (large and small subunits) were found in SAG AC-334-K11 and MAGs JdFR-16 and JdFR-17 (Fig. 2, Table S16). These hydrogenases may serve as a sink for the electrons produced during CO oxidation. However, it is also possible that the Ni-Fe hydrogenases oxidize hydrogen for the hydrogenotrophic reduction of sulfate, as previously observed as an alternative to carbon monoxide oxidation in a culture of A. fulgidus [69]. In addition to Ni-Fe hydrogenases, most JdFR Hydrothermarchaeota contained genes encoding the F420-reducing hydrogenase beta subunit, a hydrogenase required for the Wood–Ljungdahl pathway, as well as genes for hydrogenase biosynthesis proteins (hypABCDEF) and hydrogenase maturation proteins.

Given the evidence for CO oxidation and sulfate reduction, Gibbs free energy yields were estimated for sulfate reduction with CO oxidation and compared to various other electron donors (equation S1, Tables S17-S19), following approaches described previously [70,71,72]. When possible, in situ concentrations were used for the calculations (Table S18 [19, 71]). To our knowledge, concentrations of CO and acetate have not been measured within JdFR crustal fluids; thus, calculations were based on a range of concentrations for these analytes from 10 nM to 100 µM, based on reported in situ CO concentrations in other marine and fluid-rock reaction environments ([73, 74], Table S18). Of the reactions tested, CO oxidation coupled with sulfate reduction yielded the most exergonic conditions when normalized per electron transferred (Fig. 3, Table S19). This supports the interpretation of dissimilatory carboxydotrophy in Hydrothermarchaeota, which, as an autotroph, may feed the microbial food web and drive carbon cycling in this warm crustal biosphere.
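
The per-electron normalization can be sketched in Python using the standard relation ΔG = ΔG° + RT ln Q. This is an illustrative sketch, not the calculation of equation S1: the function names are hypothetical, the balanced reaction 4CO + SO4^2- + H+ → 4CO2 + HS- (8 electrons transferred) is written here for illustration, and the standard Gibbs energy and all activities except the ~18 mM sulfate and 64 °C fluid temperature reported in the text are placeholders rather than measured values.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def ln_reaction_quotient(activities, stoich):
    """ln Q = sum(nu_i * ln a_i); nu_i is negative for reactants, positive for products."""
    return sum(nu * math.log(activities[species]) for species, nu in stoich.items())


def gibbs_per_electron(dg0_kj, temp_k, activities, stoich, n_electrons):
    """Gibbs energy of reaction (kJ per mol of electrons transferred).

    dg0_kj is the standard Gibbs energy at temp_k; it must be supplied by the
    caller (for example from thermodynamic tables).
    """
    dg = dg0_kj + R_KJ * temp_k * ln_reaction_quotient(activities, stoich)
    return dg / n_electrons


# 4CO + SO4^2- + H+ -> 4CO2 + HS-  (8 electrons transferred)
CO_SULFATE = {"CO": -4, "SO4": -1, "H+": -1, "CO2": 4, "HS-": 1}

if __name__ == "__main__":
    temp_k = 64 + 273.15        # fluid temperature reported in the text
    dg0_placeholder = -200.0    # PLACEHOLDER value, not from the source
    for co in (1e-8, 1e-6, 1e-4):  # 10 nM to 100 uM range used in the text
        activities = {
            "CO": co,
            "SO4": 18e-3,   # ~18 mM sulfate, from the text
            "H+": 1e-7,     # placeholder
            "CO2": 1e-3,    # placeholder
            "HS-": 1e-6,    # placeholder
        }
        dg = gibbs_per_electron(dg0_placeholder, temp_k, activities, CO_SULFATE, 8)
        print(f"CO = {co:.0e} M: {dg:.1f} kJ per mol e-")
```

Dividing by the number of electrons transferred puts reactions with different stoichiometries on a common scale, which is what allows the electron donors to be compared directly.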

Fig. 3

Free energy yield (kJ per mol of electrons transferred) for sulfate reduction coupled to acetate, hydrogen, methane, or carbon monoxide (CO) oxidation (Table S19) at various electron donor concentrations, based on the in situ conditions of Juan de Fuca Ridge fluids (Table S18)

Metabolic pathways for essential metabolites and biomolecules

The JdFR Hydrothermarchaeota contain pathways for sugar, amino acid, nucleic acid, and lipid metabolism. Collectively, the genomes have many of the genes required for the gluconeogenesis/glycolysis pathway, missing only pyruvate kinase (Table S20). Instead of being converted to pyruvate, we hypothesize that phosphoenolpyruvate is converted to oxaloacetate by phosphoenolpyruvate carboxylase. The presence of fumarate hydratase, succinate dehydrogenase, succinyl-CoA synthetase, 2-oxoglutarate ferredoxin reductase, and isocitrate dehydrogenase suggests the presence of an incomplete tricarboxylic acid (TCA) cycle (Table S21), which is similar to other anaerobic archaeal groups such as methanogens [75]. In anaerobic organisms, TCA-related genes provide for the potential synthesis of several important biosynthetic intermediates such as fumarate, succinate, succinyl-CoA, and 2-oxoglutarate. These intermediates can then serve as the building blocks for amino acid, pyrimidine, and purine metabolisms (Table S22). The Hydrothermarchaeota possess many genes for synthesis of isoprenoid-based lipids using the mevalonate pathway, including hydroxymethylglutaryl-CoA synthase, isopentenyl phosphate kinase, isopentenyl-diphosphate delta-isomerase, and mevalonate kinase (Table S23). Transporters for trace elements (Co, Ni, Mo, W) and the vitamin biotin were identified, along with transporters for branched amino acids (Table S24). The imported amino acids could then serve as a potential source of nitrogen and organic carbon for the cell.

Some JdFR Hydrothermarchaeota genomes contain RuBisCO genes (Fig. 2), which were aligned against other RuBisCO genes to understand their potential metabolic function. These Hydrothermarchaeota RuBisCO genes group phylogenetically with form III-a RuBisCOs (Figure S6), which are known to fix CO2 for the synthesis of metabolites (including nucleic acids and sugars) using the reductive hexulose-phosphate (RHP) cycle [76]. Hydrothermarchaeota include many of the genes necessary for the RHP cycle, including a gene for a fused hexulose-6-phosphate/formaldehyde activating enzyme (Table S25). When these two enzymes act in concert, they produce methylene-H4MPT from D-arabino-3-hexulose-6-phosphate, an important metabolite for the Wood–Ljungdahl pathway. However, phosphoribulokinase, an important RHP enzyme, has yet to be identified within the JdFR Hydrothermarchaeota genomes, and thus the potential for the RHP cycle cannot be confirmed.

Motility as an adaptive strategy of Hydrothermarchaeota

JdFR Hydrothermarchaeota partial genomes contain more chemotaxis and motility-related genes than many of the archaeal SAGs publicly available in IMG/M (Tables S26, S27). Specifically, when compared to archaeal genomes of marine environments, and accounting for relative completeness, JdFR genomes generally have more motility genes than genomes from sedimentary environments (Fig. 4a). A similar trend can be observed when comparing community-wide metagenomic samples (Fig. 4b). The relative abundance of motility genes within metagenomes from crustal environments (Juan de Fuca and the Mid-Atlantic Ridge [77]) is greater than in sediment environments and comparable to that in ocean water column samples.
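
A relative abundance of this kind can be computed as the fraction of COG-annotated genes assigned to a motility category, as in the Python sketch below. The text specifies only that motility genes were defined by COG annotations, so restricting the count to COG functional category N (cell motility), the function name, and the input format are assumptions made for illustration.

```python
MOTILITY_CATEGORY = "N"  # COG functional category for cell motility (assumed definition)


def motility_fraction(cog_categories):
    """Fraction of annotated genes whose COG category string includes motility.

    cog_categories holds one category string per annotated gene; a gene with
    several categories (for example "NT") counts once if "N" is among them.
    Returns 0.0 for an empty annotation list.
    """
    total = len(cog_categories)
    if total == 0:
        return 0.0
    hits = sum(1 for cats in cog_categories if MOTILITY_CATEGORY in cats)
    return hits / total


if __name__ == "__main__":
    # Hypothetical annotation list, not data from this study.
    genes = ["N", "NT", "C", "J", "E", "T", "N", "K"]
    print(f"{100 * motility_fraction(genes):.1f}% of annotated genes are motility-related")
```

Expressing the count as a fraction of annotated genes, rather than as a raw count, is one way to compare partial genomes of different completeness.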

Fig. 4

The relative abundance of motility genes in publicly available marine-related genomes and metagenomes as defined by COG (Clusters of Orthologous Groups) annotations. a Relative abundance of motility genes in single-amplified genomes and metagenome-assembled genomes collected from various marine environments relative to their genome completeness (considering genomes that are at least 10% complete, Table S28): sediments (brown triangle), hydrothermal vents (purple circle), crustal aquifers (orange circles), and ocean water column samples (blue squares). Candidatus Hydrothermarchaeota genomes are highlighted as yellow diamonds. b The relative abundance of motility genes in metagenomes collected from various marine environments (Table S29). The box and whiskers represent the range of relative abundance as defined by quartiles

Motility genes found in the Hydrothermarchaeota genomes include those for the archaellum and chemotaxis. It is unclear whether Hydrothermarchaeota use the archaellum solely for motility or also to attach to surfaces, leading to biofilm formation. The processes for attachment are diverse within Archaea [78], and thus the presence of archaellum-related genes does not necessarily indicate or exclude potential biofilm production. In either case, archaellum rotation would likely be driven by adenosine triphosphate hydrolysis [79]. This poses a problem for microorganisms living in low-energy subsurface environments, which are thought to survive only slightly above their minimum energy requirements [72, 80]. Subsurface cells are expected to focus on maintenance over growth, suggesting that motility would be impractical [80]. However, the possibility of motility in sediment is suggested by metatranscriptomes from sediments of the Peru Margin [81]. This prior work suggested that motility decreases with decreasing porosity, a trend consistent with the comparisons in Fig. 4. Given that the crustal biosphere is relatively porous compared to a sedimentary environment, the advantages of motility or biofilm production in this fluid environment can be appreciated, even though the energy requirements remain a paradox. Assuming that nutrients and electron acceptors exist as patches that disperse with time (e.g., decaying cells, marine snow particles), the energetic gain of motility would depend on the size and concentration of the chemoattractant, the degree of fluid mixing, the distance between the cell and the nutrient packet, and the speed at which a cell could travel to the nutrient packet [82]. Alternatively, surface attachment and biofilm production can provide protection and foster metabolic interdependencies [78]. Either way, the abundance of genes related to motility and chemotaxis in these Hydrothermarchaeota genomes, and in crustal metagenomes in general, suggests that organisms of the deep crustal biosphere have adopted a strategy for balancing these energetic gains and costs.

Conclusions

Single-cell and metagenome-assembled genomes from the uncultivated Hydrothermarchaeota lineage were derived from warm, anoxic subsurface crustal fluids collected from the Juan de Fuca Ridge flank. Comparative genomic analysis provided the first evolutionary and metabolic characterization of this lineage. These genomic datasets revealed Hydrothermarchaeota to be an early-branching archaeal phylum, arising near the central branch points of the DPANN and Euryarchaeota groups. The genomes harbor evidence for many early-evolved metabolisms including ancient forms of sulfate and nitrate reductases. These observations underscore the significance of the hot, deep marine crustal biosphere as an important habitat for understanding the evolution of early life. Hydrothermarchaeota appear to be carboxydotrophs, highlighting this often-overlooked metabolic pathway as playing an important role in subsurface fluid-rock reaction environments, as suggested recently [73]. The presence of chemotactic and motility genes suggests that Hydrothermarchaeota may be capable of seeking favorable redox conditions or other nutrients. Despite a small average genome size, the versatility afforded by the inferred metabolic and phenotypic characteristics of the Hydrothermarchaeota may represent important survival strategies for life in the warm crustal biosphere.