Introduction

The phylum Thaumarchaeota, encompassing all ammonia-oxidizing archaea (AOA), represents up to ~30% of the prokaryotic community in the marine water column [1, 2] and up to 3% in the soil [3, 4], playing a major role in the global nitrogen cycle [5,6,7,8]. They are prevalent in seawater, soil, marine sediment, freshwater sediment and hot springs among other habitats [6, 7, 9]. While members of Thaumarchaeota remain largely uncultivated, genomic sequences from pure cultures, enrichment cultures, uncultivated single cells, and metagenomic assemblies are becoming increasingly available from all of these environments, providing an opportunity to study metabolic adaptation of this important group during its evolution and niche expansion.

Knowledge of the evolutionary history of Thaumarchaeota could provide insights into the mechanisms underlying their colonization and adaptation to new environments. Phylogenetic trees based on alignments of either single marker genes (including 16S rRNA and amoA genes) or concatenated conserved protein-coding genes indicate that the deepest lineages of AOA members of Thaumarchaeota largely consist of the organisms inhabiting terrestrial thermal environments [6, 10, 11], suggesting a thermophilic ancestor for AOA [12,13,14,15]. Gene acquisition through lateral gene transfer (LGT) from Bacteria to Thaumarchaeota was proposed to have facilitated the transition of Thaumarchaeota from thermal environments to temperate waters, though this conclusion was initially based on phylogenetic analyses of a limited number of taxa and genes [16]. A later study of 545 thaumarchaeotal fosmid clones from metagenomic libraries derived from samples of deep Mediterranean waters suggested that 24% of the gene families in these Thaumarchaeota were acquired from Bacteria [17]. If validated, such massive inter-domain transfers would have fundamental implications for the evolutionary history of Thaumarchaeota. More recently, some uncultivated non-AOA members of Thaumarchaeota were discovered in hot springs [18,19,20], anoxic peat [21], aquifer sediments [22], and acidic forest soils [23], and they consistently represent the deepest lineages in the Thaumarchaeota phylogenomic trees [18, 20, 21].

To gain deeper insight into the evolutionary history of Thaumarchaeota, we constructed a phylogenomic tree of Thaumarchaeota using sequences from phylogenetically and ecologically diverse lineages. We also evaluated the impact of LGT from Bacteria on thaumarchaeotal niche expansions at various evolutionary stages by integrating genomic data from Bacteria and other Archaea clades. Our analyses showed strong coincidences of the origin and habitat expansions of Thaumarchaeota with the availability of molecular oxygen in these habitats, and a potential role for acquisition of genes from Bacteria as drivers of these major evolutionary transitions.

Materials and methods

Phylogenomic tree construction and molecular dating analysis

Amino acid sequences from 77 single-copy gene families were concatenated for maximum-likelihood (ML) phylogenomic tree construction with IQ-Tree (v1.6.2) [24]. Each of these families was largely shared by 64 AOA Thaumarchaeota, 17 non-AOA Thaumarchaeota, 13 Aigarchaeota, 13 Bathyarchaeota, 21 Crenarchaeota, and 38 Euryarchaeota. The complete list of genomes used is provided in Table S1 and metadata for the single amplified genomes (SAGs) and metagenome-assembled genomes (MAGs) [25] used is provided in Table S2.
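The concatenation step can be pictured with a minimal sketch. It assumes that each gene family has already been aligned separately and that a genome lacking a family is padded with gap characters; the function name `concatenate_alignments`, the genome names, and the toy sequences are hypothetical, and the procedure actually used is described in the Supplementary Methods.

```python
def concatenate_alignments(alignments, genomes):
    """Join per-family alignments into one supermatrix.

    alignments: list of dicts mapping genome name -> aligned sequence
                (all sequences within one dict must have equal length).
    genomes:    ordered list of genome names to include.
    A genome missing from a family is padded with gaps (an assumption).
    """
    parts = {g: [] for g in genomes}
    for family in alignments:
        lengths = {len(seq) for seq in family.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a family must be aligned")
        width = lengths.pop()
        for g in genomes:
            parts[g].append(family.get(g, "-" * width))
    return {g: "".join(chunks) for g, chunks in parts.items()}


# Toy data (hypothetical): two families, three genomes.
toy_families = [
    {"genomeA": "MKV", "genomeB": "MRV", "genomeC": "MKI"},
    {"genomeA": "LLSA", "genomeC": "LMSA"},  # genomeB lacks this family
]
toy_matrix = concatenate_alignments(
    toy_families, ["genomeA", "genomeB", "genomeC"]
)
```

Every row of the resulting supermatrix has the same length, which is what a concatenated ML analysis requires.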

Next, a Bayesian estimate of the Thaumarchaeota divergence time on the ML concatenated tree was carried out with MCMCTree [26], calibrated with several temporal constraints (Table S3), including the timing of the archaeal root and the timing of diversification of a few crenarchaeotal lineages. Either a recent root of 3.8–2.7 billion years ago (Gya) [27] or an ancient root of 4.38–3.46 Gya [28] was employed. The time constraints of crenarchaeotal diversification events were derived from distinguishing features of the different lineages. For example, both Thermoproteales and Sulfolobales likely originated after the Great Oxygenation Event (GOE), because they use oxygen as a terminal electron acceptor [29]. Given the uncertainty of these calibrations, we used different combinations of these two types of constraints in the dating analyses. Other constraints have been published for Euryarchaeota [29], but they were not used here because whether the root of Archaea is placed within Euryarchaeota remains highly controversial [30,31,32,33]. Further technical details are provided in the Supplementary Methods.
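One way to picture the combinations of the two constraint types is the sketch below. It assumes that each root calibration is paired either with or without the crenarchaeotal constraints, giving four schemes; the dictionary keys and the function name are hypothetical, and the actual constraint values are those listed in Table S3.

```python
from itertools import product

# Root calibrations quoted in the text, as (older, younger) bounds in Gya.
ROOT_CALIBRATIONS = {
    "recent": (3.8, 2.7),
    "ancient": (4.38, 3.46),
}


def calibration_schemes():
    """Enumerate root choice x use of crenarchaeotal constraints."""
    schemes = []
    for root, use_cren in product(ROOT_CALIBRATIONS, (True, False)):
        schemes.append({
            "root": root,
            "root_bounds_gya": ROOT_CALIBRATIONS[root],
            "crenarchaeota_constraints": use_cren,
        })
    return schemes


schemes = calibration_schemes()
```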

Evaluating the role of LGT from Bacteria in Thaumarchaeota evolution

LGT events from Bacteria to Thaumarchaeota have been proposed to be crucial to the evolution and adaptation of Thaumarchaeota [15,16,17, 34]. We used two complementary approaches to infer potential LGT events. First, we inferred potential LGT events using the gene presence/absence pattern, which is independent of gene phylogeny. For example, LGT events may have occurred during the lifestyle transition from non-AOA to AOA if a gene family was absent from non-AOA members but prevalent among AOA members. LGT events may have also occurred during habitat expansion (from terrestrial to shallow ocean and from shallow ocean to deep ocean) if a gene family was absent from the members in the old habitat but present universally among the members in the new habitat. Once a gene family of interest was identified as a potential LGT based on the presence/absence pattern, the putative bacterial origin of this gene family was inferred through ML phylogenetic analysis using IQ-Tree. The second method detected potential LGT events by systematically analyzing ML gene trees of 1346 gene families that we mapped to the archaeal Clusters of Orthologous Groups (arCOGs) database [35]. Many genomes of Thaumarchaeota, Aigarchaeota, and Bathyarchaeota were not available when we performed this part of the analysis; thus, a reduced genomic data set of 63 AOA Thaumarchaeota, 14 Crenarchaeota, 32 Euryarchaeota, and 19 Bacteria was used. In addition, a large collection of fosmid clone sequences putatively derived from Thaumarchaeota [17] was included. Further technical details are provided in the Supplementary Methods.
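The presence/absence screen can be sketched as follows. This is only an illustration of the idea: the prevalence cutoffs, the function names, and the toy genomes and families are hypothetical, and the text does not state numerical thresholds for "absent" or "prevalent".

```python
def prevalence(carriers, group):
    """Fraction of genomes in `group` that carry the gene family."""
    return sum(genome in carriers for genome in group) / len(group)


def is_candidate_gain(carriers, old_group, new_group,
                      max_old=0.1, min_new=0.9):
    """Flag a family that is rare in the old group and common in the new one.

    The two cutoffs are illustrative, not values from the study.
    """
    return (prevalence(carriers, old_group) <= max_old
            and prevalence(carriers, new_group) >= min_new)


# Toy genomes and gene families (all names hypothetical).
non_aoa = ["basal_1", "basal_2", "basal_3"]
aoa = ["terr_1", "terr_2", "marine_1", "marine_2"]
families = {
    "family_X": {"terr_1", "terr_2", "marine_1", "marine_2"},  # AOA only
    "family_Y": {"basal_1", "basal_2", "terr_1", "marine_1"},  # mixed
}
candidates = [name for name, carriers in families.items()
              if is_candidate_gain(carriers, non_aoa, aoa)]
```

A family flagged in this way is only a candidate; as described above, its donor lineage still has to be inferred from a gene tree.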

Results and discussion

The evolutionary origin of ammonia-oxidizing Thaumarchaeota coincided with the Great Oxygenation Event

Our Thaumarchaeota phylogenomic tree (Fig. 1) identifies four main groups that evolved sequentially: the basal group, the terrestrial group, the shallow-water group, and the deep-water group. This branching order is congruent with that estimated in previous studies [18]. The hallmark genes responsible for ammonia oxidation [36], the amoABC genes encoding the ammonia monooxygenase (AMO), were completely missing in the basal group but universally present in the terrestrial and marine groups of Thaumarchaeota (Fig. 2), confirming previous findings that the basal group is non-AOA and the remaining groups are AOA [18, 20, 21, 23]. The phylogenomic tree also showed that the basal, the terrestrial, and the shallow-water groups are paraphyletic and that the deep-water group is monophyletic (Fig. 1). This is again fully consistent with previous phylogenetic inferences that Thaumarchaeota originated from non-AOA members [20, 21], that AOA evolved from non-AOA [18], and that marine AOA evolved from terrestrial AOA. Within the marine Thaumarchaeota lineages (Fig. 1), the deep-water group clustered with the two Candidatus Nitrosopelagicus brevis strains enriched from shallow water sampled at 25 m depth [37, 38] and with two Candidatus Nitrosopelagicus MAGs reconstructed from water sampled at 100 m depth [39].

Fig. 1

Evolutionary timeline of Thaumarchaeota estimated using MCMCTree on top of the rooted ML concatenated tree consisting of AOA Thaumarchaeota (the deep-water group, the shallow-water group, and the terrestrial group), and the non-AOA Thaumarchaeota (the basal group), Aigarchaeota, Bathyarchaeota, Crenarchaeota, and Euryarchaeota. Nodes with known time constraints were marked with brown ovals. Nodes showing the non-AOA Thaumarchaeota origin (Node 1), the AOA Thaumarchaeota origin (Node 2), and major habitat expansions (Nodes 3 and 4) are labeled, with the flanking horizontal gray bars representing the posterior 95% confidence intervals (CI). The two vertical gray bars represent the timing of great oxygenation event (GOE, 2330 Mya) and multiple deep ocean oxygenation events in the overall anoxic Ediacaran (635–560 Mya) ocean. The schematic representation of oxygen evolutionary history in the atmosphere, the shallow and deep ocean is placed at the bottom and adapted from [96]. The ancient atmospheric oxygen concentration represented by the percent of present atmospheric level is shown as a vertical bar with red-to-orange gradient with the percent value placed above the bar. For the purpose of illustration, the height of the bars representing oxygen concentration is not proportional to the corresponding data above the bar

Fig. 2

The phyletic pattern of the ecologically relevant genes in 81 Thaumarchaeota and 13 Aigarchaeota. The phylogenomic tree shown in the left was pruned from the full phylogenomic tree used to predict the chronogram shown in Fig. 1, and the phylogenetic groups were colored according to Fig. 1 (with color scheme shown in a box at the top-left corner). The nodes with a bootstrap value >95% were marked in black solid dots. The estimated completeness of each genome was placed near the taxon name. The potential gene gains based on the gene presence/absence pattern and based on gene tree topology (resulting from the analyses of 1346 gene trees) were separated by a vertical dashed line. The solid and open circles represent the presence and absence of the genes, respectively. amoABC ammonia monooxygenase subunit A, B, and C, fnr ferredoxin-NADP+ reductase; mut cobalamin-dependent methylmalonyl-CoA mutase, abfD 4-hydroxybutyryl-CoA dehydratase, Carboxylase* biotin-dependent acetyl-CoA/propionyl-CoA carboxylase, HP/HB cycle hydroxypropionate/hydroxybutyrate cycle, kefA K+ transporter, aprAB adenylylsulfate reductase, subunit A, B, narI nitrate reductase gamma subunit, narJ nitrate reductase molybdenum cofactor assembly chaperone, porABDG pyruvate ferredoxin oxidoreductase subunit alpha, beta, delta and gamma, cydA cytochrome bd ubiquinol oxidase subunit I (anaerobic), codhA/B acetyl-CoA decarbonylase/synthase complex subunit alpha and epsilon (component of the Wood–Ljungdahl pathway for anaerobic carbon fixation), uvrABC excinuclease ABC subunit A, B and C (for UV-induced DNA lesion repair), pstACS phosphate transport system, speE spermidine synthase, ferritin-like, DNA-binding ferritin-like protein, ynaK redox-sensitive bicupin protein, ectA l-2,4-diaminobutyric acid acetyltransferase, ectB diaminobutyrate-2-oxoglutarate transaminase, ectC l-ectoine synthase, ectD ectoine hydroxylase. 
The full names of the genes involved in vitamin (riboflavin, biotin, cobalamin) syntheses are not listed but available in a previous study on Nitrosopumilus maritimus SCM1 [36]

The Bayesian relaxed clock model implemented in MCMCTree [26] dated the origin of all non-AOA lineages (Node 1 in Fig. 1) at 2631 million years ago (Mya) (posterior 95% confidence interval (CI): 2765−2518 Mya), and the origin of all AOA lineages (Node 2 in Fig. 1) around 2165 Mya (95% CI: 2285−2060 Mya). Therefore, the estimated origins of non-AOA members and AOA members of Thaumarchaeota were well separated by the GOE, the latter being recently constrained to approximately 2330 Mya based on the analysis of multiple sulfur isotope records [40]. These results suggest that the origin of terrestrial AOA and the first major ecological and metabolic diversification within Thaumarchaeota were closely associated with the first major oxygenation event on Earth. The fact that the known terrestrial and marine AOA cultures are all oxygen dependent [41,42,43,44,45,46] further supports this hypothesis.
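The statement that the two origins are separated by the GOE can be checked directly from the numbers quoted above. The sketch below treats the GOE as a single point at 2330 Mya, which is a simplification; the function name and labels are hypothetical.

```python
# Dates quoted in the text, in million years ago (Mya); larger means older.
GOE_MYA = 2330
NODE_INTERVALS_MYA = {
    "Node 1": (2765, 2518),  # origin of non-AOA lineages, 95% CI
    "Node 2": (2285, 2060),  # origin of AOA lineages, 95% CI
}


def relation_to_event(interval, event_mya):
    """Classify an (older, younger) interval relative to a dated event."""
    older, younger = interval
    if younger > event_mya:
        return "before"    # the whole interval predates the event
    if older < event_mya:
        return "after"     # the whole interval postdates the event
    return "overlaps"


relations = {name: relation_to_event(ci, GOE_MYA)
             for name, ci in NODE_INTERVALS_MYA.items()}
```

With these values, the whole interval for Node 1 lies before the GOE and the whole interval for Node 2 lies after it, so neither interval contains the 2330 Mya date.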

The molecular dating analyses were calibrated with the age of the ancient Archaea root (4.38−3.46 Gya) and the diversification time of several Crenarchaeota lineages. This ancient Archaea root was used as a calibration in a recent study estimating the age of methanogens [28], but it was challenged by an independent study [47], which estimated that Archaea emerged much later (<3.4 Gya). The use of the diversification time of several Crenarchaeota lineages as calibrations is also problematic, as these ages are secondary calibrations derived from ancestral state reconstructions [48, 49]. For these reasons, another meaningful way to calibrate the timing of these events is to use the more recent Archaea root (3.8−2.7 Gya), which has been employed in multiple studies [27, 50], as the sole calibration. Interestingly, this new analysis led to nearly identical estimates of the timing of all four major nodes in the Thaumarchaeota phylogeny (Fig. S1) (Nodes 1 and 2 discussed above, and Nodes 3 and 4 discussed below). Other calibration alternatives, including the use of the ancient Archaea root as the sole calibration and the use of the recent Archaea root along with the Crenarchaeota diversification as the calibration, yielded reasonably older and younger ages of all nodes under discussion, respectively (Fig. S1).

Metabolic changes upon the origin of the ammonia-oxidizing Thaumarchaeota correlate with oxygen availability

In addition to amoABC mentioned above, several potentially important pathways related to electron transport and energy production were likely acquired to facilitate the transition of Thaumarchaeota from an anaerobic to an aerobic lifestyle. For example, the gene family encoding a multicopper oxidase was found universally in AOA but rarely in non-AOA Thaumarchaeota. This gene family is represented by multiple copies in most AOA Thaumarchaeota genomes, including the model marine AOA Nitrosopumilus maritimus SCM1 [36], and its members might be involved in electron transfer during ammonia oxidation. During ammonia oxidation, ammonia is first oxidized to hydroxylamine, which is further oxidized to nitrite [51]. While it is well established that the first step is catalyzed by AMO, the enzyme and electron carrier associated with the oxidation of hydroxylamine to nitrite remain unknown. One hypothesis is that the multicopper oxidases are involved in the second step [6, 36, 52]. Furthermore, the gene encoding ferredoxin-NADP+ reductase (fnr) is prevalent among the AOA members but completely missing in the non-AOA Thaumarchaeota (Fig. 2). FNR mainly participates in photosynthesis, where it transfers an electron from ferredoxin to NADP+ to form NADPH [53]. In nonphotosynthetic organisms, FNR works in reverse to provide reduced ferredoxin to various metabolic pathways [54], including the oxidative stress response [55].

The biosynthetic pathways of vitamins B2 (riboflavin), B7 (biotin), and B12 (cobalamin) were prevalent among AOA members, but completely missing from the non-AOA members of the Thaumarchaeota (Fig. 2), whereas those of vitamins B1 (thiamine) and B6 (pyridoxine) were commonly found in both AOA and non-AOA members (data not shown). The ability to synthesize vitamins B1, B2, B6, and B12 has been demonstrated in several pure cultures of the marine AOA of the genus Nitrosopumilus [56, 57], and the genes for biotin synthesis were shown to be functional in Candidatus Nitrososphaera gargensis [58]. Vitamins often serve as enzyme cofactors, for example in biotin-dependent enzymes such as acetyl-CoA/propionyl-CoA carboxylase [59, 60] and in cobalamin-dependent enzymes such as methionine synthase (metH), methylmalonyl-CoA mutase (mut), and ribonucleotide reductase (rnr) [61]. Among these, acetyl-CoA/propionyl-CoA carboxylase and methylmalonyl-CoA mutase are known to be associated with inorganic carbon fixation via the autotrophic hydroxypropionate/hydroxybutyrate (HP/HB) cycle [62], which has been shown to be highly energy-efficient in the presence of oxygen in N. maritimus SCM1 [63]. Consistent with their tolerance of oxygen, these two genes are commonly found in all AOA groups, but largely missing in non-AOA Thaumarchaeota (Fig. 2). A similar phyletic pattern was found for another key enzyme of the HP/HB cycle, 4-hydroxybutyryl-CoA dehydratase (abfD), which is involved in the formation of crotonyl-CoA from 4-hydroxybutyryl-CoA.

The successful transition from anaerobic non-AOA to aerobic AOA was also accompanied by the loss of genes involved in anaerobic metabolism. For example, the genes encoding adenylylsulfate reductase (aprAB) and nitrate reductase (narIJ) were found in non-AOA members only (Fig. 2), suggesting that some of the non-AOA members may use sulfate/nitrate as an electron acceptor in anaerobic respiration and gain energy through dissimilatory sulfate/nitrate reduction [20, 64]. Other genes (Fig. 2) involved in anaerobic energy production and restricted to non-AOA included four subunits of pyruvate:ferredoxin oxidoreductase (porABDG), a subunit of cytochrome bd-type terminal oxidase (cydA), and two subunits of the acetyl-CoA decarbonylase/synthase complex (codhAB). Pyruvate:ferredoxin oxidoreductase contains [4Fe-4S] clusters and thiamine pyrophosphate (TPP), and catalyzes the reversible oxidative decarboxylation of pyruvate to form acetyl-CoA in most anaerobes [65]. Cytochrome bd may serve as a high-affinity oxidase to support energy-requiring processes under microaerobiosis and to protect anaerobic processes from inhibition by oxygen [66]. Finally, acetyl-CoA decarbonylase/synthase is part of the Wood−Ljungdahl pathway for inorganic carbon fixation under anaerobic conditions and is prevalent in Aigarchaeota and Bathyarchaeota, two sister clades of Thaumarchaeota that are strictly or predominantly anaerobes [18, 67, 68]. These lines of evidence support the previous conclusion that the non-AOA Thaumarchaeota are anaerobes [18], suggesting that the oxygen requirement of the AOA Thaumarchaeota and the metabolism of ammonia oxidation were evolutionary innovations rather than vertically inherited traits.

The timeline of major habitat expansion of ammonia-oxidizing Thaumarchaeota

Our molecular dating analysis suggests that the transition of the AOA thaumarchaeota from the terrestrial habitat to shallow ocean water (Node 3 in Fig. 1) occurred approximately 1017 Mya (95% CI: 1084−959 Mya). This estimate agrees with the results of a previous study that estimated the divergence of Thaumarchaeota group I.1a (corresponding to the marine groups) from the group I.1b (corresponding to the terrestrial group, except that the genus Candidatus Nitrosotenuis is a member of the terrestrial group in the present study but was assigned to group I.1a previously) later than 950 Mya. This previous inference was based on an LGT event involving the gene encoding the DnaJ-Ferredoxin fused protein, from the common ancestor (LGT donor) of green algae and land plants to the common ancestor (LGT receptor) of Thaumarchaeota groups I.1a and I.1b [69]. The underlying principle for estimating diversification time based on LGT is that the donor must have existed before the receptor, and thus the diversification time of the donor serves as a constraint for the timing of the receptor diversification. Rather than linking the invasion of the shallow ocean by Thaumarchaeota to a geological event, we asked why it did not occur earlier. We conclude that this is because the Earth underwent a “boring billion” around 1800−800 Mya, during which the atmospheric oxygen levels were around 0.1% of the present atmospheric level (Fig. 1) and the surface ocean was largely as anoxic as the deep ocean [70, 71].

To overcome the salinity barrier during the transition from the land to ocean habitats, Thaumarchaeota may have acquired genes to regulate osmotic pressure. For example, the gene kefA encoding K+ transporter is universally present in marine Thaumarchaeota but largely missing from the terrestrial group (Fig. 2), consistent with the fact that transporting K+ across cell membranes is necessary for prokaryotes to respond and adapt to osmotic pressure [72, 73]. Furthermore, we estimated that the expansion of marine AOA from shallow to deep-water habitats (Node 4 in Fig. 1) occurred approximately 643 Mya (95% CI: 701–588 Mya), coinciding with multiple events of deep sea oxygenation embedded in an overall anoxic Ediacaran ocean (ca. 635−560 Mya) [74]. The coincidence of deep ocean oxygenation and expansion of Thaumarchaeota to deep water is consistent with the potential oxygen requirement for the deep-water group of Thaumarchaeota [75]. Although members of the deep-water group of Thaumarchaeota have not been cultivated, the metabolic profile of the deep-water group, including ammonia oxidation, electron transfer and vitamin synthesis, aligns well with the shallow-water group and the terrestrial group but differs fundamentally from the basal group, strongly suggesting that members of the deep-water group required oxygen for growth. Interestingly, the transition of marine AOA from shallow to deep water seems to have coincided with the loss of the uvr system encoding excinuclease for repairing ultraviolet (UV) light-induced DNA lesions and the loss of the pst system for phosphate scavenging under phosphorus limitation, as these genes are completely missing in all deep-water group members but consistently present in all other AOA thaumarchaeota (Fig. 2). These gene losses are consistent with the physicochemical conditions in the deep ocean where UV light is absent and phosphate is replete [76, 77].
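How far the Node 4 estimate coincides with the Ediacaran window can be read off by intersecting the two intervals quoted above. The sketch treats the approximate window of 635−560 Mya as exact bounds, which is an assumption; the function name is hypothetical.

```python
def interval_overlap(a, b):
    """Return the shared (older, younger) span of two intervals in Mya,
    or None if they do not overlap. Larger values are older."""
    older = min(a[0], b[0])
    younger = max(a[1], b[1])
    return (older, younger) if older >= younger else None


NODE4_CI_MYA = (701, 588)           # 95% CI for Node 4 quoted in the text
EDIACARAN_WINDOW_MYA = (635, 560)   # approximate window quoted in the text

shared = interval_overlap(NODE4_CI_MYA, EDIACARAN_WINDOW_MYA)
```

Under these bounds the shared span is 635−588 Mya, which is the younger portion of the Node 4 interval.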

Potential bacterial and archaeal gene transfers to Thaumarchaeota

Several studies proposed that LGT from Bacteria to Thaumarchaeota occurred frequently and played a major role in Thaumarchaeota adaptation [15,16,17, 34]. However, these studies did not resolve the relative timing of the LGT events, and thus were not able to link them to ancient evolutionary events such as the origin of AOA Thaumarchaeota and their subsequent expansion. As discussed in the above sections, the universal presence of a gene family in the terrestrial and marine AOA, together with its complete or near-complete absence from the non-AOA Thaumarchaeota (34 out of 45 gene families shown in Fig. 2), strongly suggests that the gene was acquired during the evolutionary transition from non-AOA to AOA. Likewise, the universal presence of a gene family in the marine AOA, together with its complete or near-complete absence from the terrestrial AOA (11 out of 45 gene families shown in Fig. 2), strongly suggests that the gene was acquired during the evolutionary transition from terrestrial AOA to marine AOA. Our phylogenetic analyses imply that during the transition from non-AOA to AOA, 11 of 34 gene families were potentially acquired from Euryarchaeota (Fig. S2), two from Crenarchaeota (Fig. S3), the sources of 14 were unresolved (Fig. S4), five were exclusive to AOA (amoABC, cbtA and kefA), and the remaining two were consistent with a pattern of vertical inheritance (Fig. S5). Among the 17 genes responsible for cobalamin synthesis shown in Fig. 2, nine were potentially acquired from Euryarchaeota (Fig. S2A-I), one from Crenarchaeota (Fig. S3A), one was exclusive to Thaumarchaeota, and the remaining six had unresolved sources (Fig. S4A–F). Among the three genes involved in riboflavin synthesis in AOA, two were potentially imported from Euryarchaeota (Fig. S2K) and Crenarchaeota (Fig. S3B), respectively, and the remaining one had an unresolved source (Fig. S4J). 
The inconsistent sources of potential LGT may be explained by insufficient sampling of prokaryotic genomes in our analysis, or by the independent acquisition of component genes from multiple sources during the expansion of Thaumarchaeota. We did not find strong evidence for LGT from Bacteria to Thaumarchaeota among these gene families.
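The counts given in this paragraph can be cross-checked with a short tally. The category labels below are paraphrases of the text and the variable names are hypothetical; only the numbers come from the paragraph itself.

```python
# Inferred sources of the 34 families gained at the non-AOA to AOA transition.
NON_AOA_TO_AOA_SOURCES = {
    "Euryarchaeota": 11,
    "Crenarchaeota": 2,
    "unresolved": 14,
    "exclusive to AOA": 5,        # amoA, amoB, amoC, cbtA, kefA
    "vertical inheritance": 2,
}

# Inferred sources of the 17 cobalamin synthesis genes.
COBALAMIN_SOURCES = {
    "Euryarchaeota": 9,
    "Crenarchaeota": 1,
    "exclusive to Thaumarchaeota": 1,
    "unresolved": 6,
}

FAMILIES_IN_FIG2 = 45
TERRESTRIAL_TO_MARINE = 11

total_non_aoa_to_aoa = sum(NON_AOA_TO_AOA_SOURCES.values())
total_cobalamin = sum(COBALAMIN_SOURCES.values())
```

The two breakdowns sum to 34 and 17, and the 34 and 11 families assigned to the two transitions together account for the 45 families shown in Fig. 2.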

The identification of LGT events based on gene presence and absence is not perfect, as potential LGT events within gene families shared by all groups of Thaumarchaeota cannot be captured. We therefore performed extensive phylogenetic analyses of 1346 gene families from arCOGs [78, 79]. Each arCOG selected for this analysis was broadly represented by genes from Thaumarchaeota, Crenarchaeota, Euryarchaeota, and Bacteria. Bacterial genes were automatically removed from 261 arCOGs as a result of quality check of the alignments using trimAl [80], suggesting that these bacterial genes may not be true orthologs or that they are too divergent to align well. In 689 of the remaining 1085 arCOGs (Table S4), thaumarchaeotal members comprise one or more well-supported monophyletic groups that are more closely related to crenarchaeotal or euryarchaeotal genes than to bacterial genes (Fig. S6A), suggesting that thaumarchaeotal genes in these families were not likely acquired from Bacteria. There was variable support for LGT from Bacteria to Thaumarchaeota in the remaining arCOGs, but few of these genes mapped to ancestral lineages present at the origin of AOA Thaumarchaeota prior to their expansion into major ocean niches.
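The topological reasoning behind this classification can be illustrated with a deliberately simplified sketch. It assumes a rooted gene tree written as nested tuples with leaves labelled 'Domain|gene_id', and it only inspects the sister clade of the smallest clade holding all thaumarchaeotal genes; branch support, unrooted trees, and multiple thaumarchaeotal clusters are ignored. The function names, label format, and toy trees are hypothetical and do not reproduce the procedure in the Supplementary Methods.

```python
def leaves(node):
    """Return all leaf labels under a node (leaf = str, clade = tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def domain(label):
    """Leaf labels are written as 'Domain|gene_id' in this toy format."""
    return label.split("|")[0]


def sister_domains(tree, target="Thaumarchaeota"):
    """Domains in the sister clade(s) of the smallest clade that contains
    every target leaf of a rooted tree."""
    def count(node):
        return sum(domain(leaf) == target for leaf in leaves(node))

    total = count(tree)
    if total == 0:
        raise ValueError("tree contains no target leaves")
    node, parent = tree, None
    while not isinstance(node, str):
        holder = next((c for c in node if count(c) == total), None)
        if holder is None:
            break  # target leaves are split across children: node is the MRCA
        node, parent = holder, node
    if parent is None:
        return set()
    return {domain(leaf) for sib in parent if sib is not node
            for leaf in leaves(sib)}


# Toy trees (hypothetical): archaeal sister vs bacterial sister.
vertical_tree = ((("Thaumarchaeota|t1", "Thaumarchaeota|t2"),
                  ("Crenarchaeota|c1", "Crenarchaeota|c2")),
                 ("Bacteria|b1", "Bacteria|b2"))
transfer_tree = ((("Thaumarchaeota|t1", "Thaumarchaeota|t2"), "Bacteria|b1"),
                 ("Bacteria|b2", ("Crenarchaeota|c1", "Euryarchaeota|e1")))
vertical_sisters = sister_domains(vertical_tree)
transfer_sisters = sister_domains(transfer_tree)
```

In the first toy tree the thaumarchaeotal genes are sister to crenarchaeotal genes, the pattern interpreted above as evidence against a bacterial origin; in the second they are sister to a bacterial gene, the kind of pattern that would prompt closer inspection for LGT.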

Our phylogenetic analysis suggests that the thaumarchaeotal gene clusters in 32 arCOGs (Table S5) are embedded in the bacterial lineages (Fig. S6B & C), supporting their transfer from Bacteria to Thaumarchaeota. Among these arCOGs are 15 gene families where the embedded thaumarchaeotal lineages contain members of both marine and terrestrial groups (Fig. S6B), suggesting that gains of these bacterial genes potentially occurred at the MRCA of all AOA Thaumarchaeota lineages. One example is the gene family arCOG00050, encoding a spermidine synthase (speE, Fig. S7A) responsible for spermidine synthesis. Apart from its known function in cell growth and proliferation, spermidine also functions as a scavenger of free radicals, protecting DNA from reactive oxygen species (ROS) damage [81]. A second example is arCOG01101 (Fig. S7B), encoding a DNA-binding ferritin-like protein, which is involved in the protection of DNA from oxidative stress by using ferroxidase activity to deplete ferrous iron (Fe2+) and hydrogen peroxide [82,83,84]. A third example is the arCOG02935 (Fig. S7C), which encodes a redox-sensitive bicupin protein (YnaK) belonging to the pirin family. Experimental evidence from studies in Escherichia coli shows that this family is sensitive to oxidative stress and suggests that it may have a role in cellular redox regulation [85]. Since the origin of AOA coincided with the GOE, the recruitment of these gene families from Bacteria may have helped them overcome the oxidative stress imposed by the GOE, facilitating the adaptation of Thaumarchaeota from the anoxic to oxic habitats.

The remaining 17 arCOGs embedded in Bacteria gene clusters contain exclusively marine members (Fig. S6C), suggesting that these gene gains potentially occurred at the MRCA of marine Thaumarchaeota, and thus likely facilitated the expansion of this lineage to the marine environment. For example, arCOG00915 (ectB, Fig. S8) in this group is a key member of the operon ectABCD responsible for the biosynthesis of ectoine, a prominent member of the compatible solutes [86]. Its role in osmotic stress resistance has been demonstrated recently in the model thaumarchaeote Nitrosopumilus maritimus SCM1 [87]. In fact, the remaining three genes (ectACD) show identical phyletic patterns to ectB (Fig. 2) and the ectC was inferred to be derived from Bacteria via LGT in a previous study [87]. While these gene trees favor the hypothesis that these gene acquisitions occurred at the branch leading to the MRCA of all AOA or the MRCA of marine AOA, these hypotheses remain to be confirmed, as these genes show patchy distributions among the lineages of AOA sampled (Fig. 2). We were not able to map any LGT event specifically to the ancestral branch giving rise to the deep-water group of Thaumarchaeota. This is consistent with the conclusion that the deep-water group members evolved from the shallow-water group, and that members of the latter were already equipped with antioxidative capability.

More gains of bacterial genes may have occurred after the thaumarchaeotal transition into the ocean, which may have facilitated the adaptation of descendant lineages to particular micro-niches. Several phylogenetic patterns support this idea. For example, we found 86 arCOGs each containing many thaumarchaeotal members, among which only a few Thaumarchaeota genes clustered with bacterial genes, whereas the majority formed sister groups with Crenarchaeota or Euryarchaeota lineages (Fig. S6D). This distribution suggests that these few thaumarchaeotal genes were acquired recently from Bacteria. In contrast, another 268 arCOGs each contain few (i.e., fewer than five) thaumarchaeotal genes. Among these are 121 arCOGs where the thaumarchaeotal genes are exclusively derived from organisms inhabiting thermal springs, namely Candidatus Nitrososphaera gargensis Ga9.2 [41] and the single cell AB-179-E04 [19], or from the marine sponge associate Cenarchaeum symbiosum A [88] (Fig. S6E), which explains in part why N. gargensis Ga9.2 (2.83 Mbp) and C. symbiosum A (2.05 Mbp) have larger genomes than the pelagic members (1.60–1.70 Mbp). In fact, the inflated genome of N. gargensis Ga9.2 has been considered to be the result of extensive gene duplication and LGT after it diverged from other Thaumarchaeota lineages [52]. Among the remaining 147 arCOGs where Thaumarchaeota genes are rare, 58 contain Thaumarchaeota clustered with Bacteria (Fig. S6F). A parsimonious explanation for this might be that these LGT events occurred at the tips of the tree rather than at the MRCA of marine Thaumarchaeota, and thus had a limited impact on the ancient transition to life in the pelagic ocean.

Several genes, while acquired at the tips of the phylogeny, may also have contributed to the adaptation of recently evolved Thaumarchaeota lineages. An example is the restriction-modification (RM) system, which may protect marine Thaumarchaeota from viral infection or other foreign DNA invasion [89]. This includes a type I RM system comprising the restriction subunit (arCOG00878), methyltransferase subunit (arCOG02632) and S subunit (arCOG02626), which were potentially recently acquired from Bacteria (Table S4). Lastly, the thaumarchaeotal genes in the remaining 89 arCOGs are intermingled with Euryarchaeota and Crenarchaeota lineages in the tree, suggesting possible LGT from both major archaeal clades.

The extensive taxonomic sampling performed in the present study allowed identification of important vertically inherited genes previously thought to be acquired from Bacteria. One example is dnaK (arCOG03060), also known as Hsp70, which is a heat shock chaperone protein expressed universally to protect cells from heat shock, among other stresses [90]. The thaumarchaeotal dnaK gene has been considered an important example of LGT from Bacteria, and it was hypothesized to have facilitated Thaumarchaeota adaptation to the pelagic environment [16, 69]. We found that this gene (arCOG03060, Fig. S9) is widespread among Thaumarchaeota members and that the marine and terrestrial members form a monophyletic group, except that the gene from Candidatus Nitrosopumilus salaria BD31 is embedded within bacterial lineages. Because the dnaK genes of Bathyarchaeota form a sister lineage to those of Thaumarchaeota (Fig. S9), which is congruent with the concatenated evolutionary tree of these two archaeal phyla, dnaK must be an ancestral trait found in the MRCA of these phyla rather than the product of LGT from Bacteria to Thaumarchaeota.

Concluding remarks

Genome sequences from phylogenetically diverse Thaumarchaeota lineages occupying distinct niches in terrestrial and marine biomes have only recently become available. A few of them are derived from organisms occupying crucial phylogenetic positions. These include the two SAGs [19, 91] and 15 MAGs located at the base of Thaumarchaeota (known as non-AOA) [20,21,22, 39], and two surface ocean Thaumarchaeota enrichment cultures Candidatus Nitrosopelagicus brevis CN25 [38] and U25 [37] and two Candidatus Nitrosopelagicus MAGs [39] clustering with the deep-water group. These phylogenetic branches are critical in providing an accurate estimate of the timing of the origin of ammonia-oxidizing Thaumarchaeota and their subsequent major niche expansions using genome-based molecular dating analysis. Our study also greatly benefited from the recently available genome sequences of the phylogenetically diverse Aigarchaeota [18, 92] and Bathyarchaeota [67, 93,94,95]. The genome sequences of sister groups of Thaumarchaeota make it possible to validate key LGT events (e.g., speE and ectB) from Bacteria to Thaumarchaeota and to rule out other putative LGT events (e.g., dnaK). In conclusion, our study places Thaumarchaeota evolution in the context of major geological events on the ancient Earth. A major finding is that oxygen availability was one of the most important drivers underlying the evolution of Thaumarchaeota at geological timescales. Although the expansion of Thaumarchaeota into new niches was facilitated by LGT of crucial genes from Bacteria, our analysis suggests that the overall impact of this process on the evolution of Thaumarchaeota is far less than previously reported.