Introduction

Candidatus phylum Eremiobacterota is an as-yet-uncultured bacterial lineage, with the type genus Ca. Eremiobacter based on a metagenome-assembled genome (MAG) from Antarctic soil [1]. Ca. Eremiobacter contains genes encoding ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) type 1E and a high-affinity Group 1h [NiFe]-hydrogenase [2, 3]. Together, these genes are indicative of a novel form of chemolithoautotrophy called atmospheric chemosynthesis [4, 5]. Discovered in cold desert soils of Antarctica, bacteria genetically capable of this process use high-affinity Group 1h [NiFe]-hydrogenases to oxidize trace levels of hydrogen gas, down to sub-atmospheric concentrations [1, 6]. The energy and reductant derived from trace gas oxidation allow these chemolithoautotrophs to fix CO2 through the Calvin–Benson–Bassham (CBB) cycle [1, 2]. This process is one strategy for survival of bacteria in nutrient-poor desert soils [5], and its discovery is a major advance in our understanding of the potential ecological importance of members of Ca. Eremiobacterota. However, Ca. Eremiobacter is just one example of the potential ecological and metabolic diversity of the entire phylum Ca. Eremiobacterota.

The first 16S rRNA gene sequence of Ca. Eremiobacterota (originally candidate phylum WPS-2) was identified from a soil contaminated with polychlorinated biphenyl, with the designation of WPS-2 derived from “Wittenberg-polluted soil” [7]. As part of the rare biosphere (relative abundances usually <0.1%) [8, 9], growing evidence suggests a global distribution of Ca. Eremiobacterota, with 16S rRNA gene sequences reported from diverse terrestrial environments. Improved sequencing efforts have led to Ca. Eremiobacterota 16S rRNA gene sequences being recovered from environmental and animal sources: peatlands [9]; permafrost [10]; mosses in boreal forests [11, 12]; acidic, polluted soil [7]; bare, unvegetated soil [13]; canine and human oral microbiomes [14, 15]; fecal samples [16]; industrial waste processes [16]; and Antarctic desert soils [17,18,19].

Whole-genome phylogenetic analysis has revealed phylum Ca. Eremiobacterota to be closest to the phyla Armatimonadota and Chloroflexota [1, 16]. Ca. Eremiobacterota is composed of two class-level lineages, Ca. Eremiobacteria and UBP9 (formerly SHA-109) [12, 16]. In recent years, Ca. Eremiobacteria MAGs have been extensively recovered from natural terrestrial environments [1, 10, 11]; UBP9 MAGs have been recovered from baboon feces and industrial waste [16]. Ca. Eremiobacteria is composed of two candidate orders, Ca. Eremiobacterales (type genus Ca. Eremiobacter) [1] and Ca. Baltobacterales (formerly UBP12) [12]. Ca. Baltobacterales MAGs have originated from forests and peatlands, with certain members inferred to be facultative aerobes capable of bacteriochlorophyll-based anoxygenic photosynthesis [12], whereas others were predicted to be aerobic organoheterotrophs [1, 13].

Given the phylogenetic diversity, potential metabolic disparity, and ecological importance of Ca. Eremiobacterota, we sought to elucidate the phylogeny, environmental distribution, and ecological roles of this clade. To do so, we performed a meta-analysis by collecting 16S rRNA gene amplicon sequencing data together with environmental metadata from cold and temperate terrestrial environments. We also comparatively analyzed 72 Ca. Eremiobacterota MAGs (including UBP9) to assess the metabolic potential of this clade. Finally, we used catalyzed reporter deposition fluorescence in situ hybridization (CARD-FISH) to elucidate the morphology of members of Ca. Eremiobacteria inhabiting Antarctic desert soil.

Materials and methods

16S rRNA gene sequence diversity, associated metadata, and multivariate analysis

Australian temperate and polar soil data were obtained from the Biome of Australia Soil Environments (BASE) project [19] and the Polar Soil Archive (PSA) biodiversity survey [20]. The former contains amplicon sequencing data across a range of terrestrial ecosystems within Australia (1641 samples, accessed on 15 November 2018), while the latter contains data spanning 223 soils from eight locations across the High Arctic and East Antarctica. The 16S rRNA gene fragments in both datasets were amplified using primers 27F and 519R [19, 21]. The PCR amplicons in the BASE dataset were sequenced on either the Illumina or the 454 platform [19], while those in the PSA dataset were sequenced on the 454 platform only [22]. The raw sequences for the two datasets were processed using the Mothur pipeline, and OTUs were identified at 96% or 97% sequence identity using UCLUST, respectively. Neither dataset was subsampled, as the read distribution was uneven across samples due to the mixed usage of Illumina and 454 sequencing platforms. A previous analysis compared subsampled and non-subsampled versions of the dataset [22]; it suggested that subsampling had no significant effect on the outcome of the diversity analysis, and we therefore elected to leave the dataset intact. Both datasets contain comprehensive soil physicochemical data, comprising 64 measured physicochemical parameters, including pH, moisture, total carbon, nitrogen, and phosphorus, and 67 microclimate factors.

Random Forest modeling was used to estimate the significance of soil physicochemical factors for the number of Ca. Eremiobacteria operational taxonomic units (OTUs) recovered (richness) and their relative abundance in the Australian and polar soil datasets, separately. The importance of each predictor variable was determined by evaluating the decrease in prediction accuracy averaged over the forest (5000 trees), using the rfPermute package [23] in the R environment. The significance of the model and the cross-validated R2 were assessed with 1000 permutations of each predictor. The influence (promoting or inhibiting) of these predictors was assessed based on Pearson correlation analysis using SPSS (version 22; IBM, 2013). The relationships between pH and Ca. Eremiobacteria were visualized using Origin (version b9.5.1.195; OriginLab Corporation, 2018).
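The idea of scoring a predictor by the loss of prediction accuracy when its values are shuffled can be illustrated with a minimal Python sketch. This is not the rfPermute workflow used in this study: the model below is a fixed toy function standing in for a fitted forest, and the data, the function names, and the choice of mean squared error as the accuracy measure are all assumptions made for illustration.

```python
import random

def mse(predict, rows, targets):
    """Mean squared error of `predict` over all samples."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, col, n_perm=100, seed=0):
    """Average increase in MSE after shuffling predictor column `col`."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    increases = []
    for _ in range(n_perm):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each sample with the shuffled value in place of the original.
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
        increases.append(mse(predict, permuted, targets) - baseline)
    return sum(increases) / n_perm

# Hypothetical toy data: columns are [pH, moisture]; richness depends on pH only.
rows = [[4.0, 0.10], [4.5, 0.30], [5.0, 0.20], [5.5, 0.40],
        [6.0, 0.15], [6.5, 0.35], [7.0, 0.25], [7.5, 0.05]]
targets = [60 - 6 * r[0] for r in rows]

def toy_model(row):
    """Stand-in for a fitted model (NOT a Random Forest)."""
    return 60 - 6 * row[0]

ph_importance = permutation_importance(toy_model, rows, targets, col=0)
moisture_importance = permutation_importance(toy_model, rows, targets, col=1)
```

In this toy setting, shuffling pH degrades the predictions while shuffling moisture leaves them unchanged, so only pH receives a positive importance score.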

Sequencing, metagenome assembly, and binning

Three Antarctic soil samples were collected from Mitchell Peninsula in East Antarctica [20]. DNA extraction and shotgun sequencing were performed as described [1]. Raw reads were processed with Trimmomatic (ver. 0.36) for adapter removal and quality filtering [24]. Quality-controlled reads were then assembled using MetaSPAdes (ver. 3.11.1) with default parameters [25]. Contigs shorter than 2000 bp were removed using BBMap (ver. 36.92) [26]. All samples were also co-assembled by combining their respective quality-controlled reads and repeating the assembly and contig filtering steps. Quality-controlled reads for each sample, or combined samples, were mapped onto their respective (co-)assemblies using BamM “make” (ver. 1.73, M. Imelfort, unpublished, https://github.com/minillinim/BamM). Assemblies were binned by providing the contigs of each assembly and the mapped reads (as BAM files) as input to UniteM (ver. 0.0.15, D. Parks, unpublished, https://github.com/dparks1134/UniteM), using the GroopM (ver. 2.0.0) [27], Maxbin (ver. 2.2.4) [28], MetaBAT (ver. 0.32.5) [29], and MetaBAT2 (ver. 2.12.1) binning methods [27, 29,30,31].
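The contig length filter can be sketched in a few lines of Python. This is only an illustration of the 2000 bp cutoff, not the BBMap command used here; the FASTA parser, the function names, and the toy contigs are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_contigs(records, min_len=2000):
    """Keep contigs of at least `min_len` bp, i.e. drop those shorter than it."""
    return [(h, s) for h, s in records if len(s) >= min_len]

# Hypothetical toy assembly: contig_1 spans two sequence lines (2500 bp in total).
toy_fasta = [">contig_1", "A" * 1500, "C" * 1000,
             ">contig_2", "G" * 1999,
             ">contig_3", "T" * 2000]
kept = filter_contigs(read_fasta(toy_fasta))
kept_names = [h for h, _ in kept]
```

Here the 1999 bp contig is discarded, while the 2000 bp and 2500 bp contigs are retained.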

Ca. Eremiobacterota MAGs retrieval, taxonomic classification, and analysis

Sixty-three MAGs classified as Ca. Eremiobacterota were retrieved from previous studies (Supplementary Table S1) [1, 10, 16]. Taxonomic assignment of MAGs, including identification of novel taxa, was performed using the Genome Taxonomy Database Toolkit (GTDB-Tk) (ver. 0.3.2; with reference to GTDB R04-R89), which provides an objective taxonomic classification of prokaryote genomes (including MAGs) by placing them into concatenated protein reference trees, using relative evolutionary divergence and average nucleotide identity to establish taxonomic ranks [32, 33]. MAG completeness and contamination were evaluated for the 63 retrieved MAGs and nine new MAGs using CheckM (ver. 1.0.12) [34]. Novel Candidatus taxa were named according to the proposed genomic standards [35]. Open reading frame (ORF) and gene functional prediction were performed using Prokka [36], with the annotations of all proteins discussed here, including RuBisCO large subunit (CbbL), manually confirmed using ExPASy BLAST searches against previously reported sequences [1, 37, 38]. Potential carbon monoxide dehydrogenase (CODH) large subunit and CbbL amino acid sequences were extracted from the respective genomes and aligned using MAFFT together with their respective reference sequences. Phylogenetic trees were generated using IQ-Tree with an approximate-maximum likelihood method and visualized using iTOL [39]. Genome analysis methods including CAZy (for carbohydrate-active enzymes) and genome-based phylogenetics are described in Supplementary Methods. Protein sequences identified as hydrogenases based on catalytic domains were classified further using the hydrogenase classifier HydDB [40].

Ca. Eremiobacteria fluorescence in situ hybridization (FISH) probe design

Near-complete SSU rRNA gene sequences (>1300 bp) classified as Ca. Eremiobacteria were retrieved from the SILVA databases (n = 185) [41]. The retrieved sequences were used to design a class-level Ca. Eremiobacteria-specific FISH probe (Erem-289-Cy3, 5′-TCGCTCTCTCAAACCAGC[CY3]-3′) using the ARB probe design function [42], with the hybridization site selected based on the SSU rRNA accessibility map [43]. The specificity of the Ca. Eremiobacteria probe was tested in silico using testprobe 3.0 against the nonredundant SILVA Reference dataset [41]. Due to the lack of a pure or enriched Ca. Eremiobacteria culture for use as a positive control, the Erem-289 FISH probe was optimized and validated using clone-FISH [44], with further details on the design of positive and negative control clones provided in the Supplementary Methods, Supplementary Fig. S1, and Supplementary Table S2.
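Because a FISH probe binds the reverse complement of its own sequence, the target site of Erem-289 can be derived directly from the probe sequence given above. The Python sketch below does this and scans a sequence for perfect matches. It is a simplified illustration, not the ARB or testprobe procedure: it ignores mismatches and accessibility, and the flanking bases in the toy sequence are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def find_probe_sites(probe, target):
    """0-based start positions of perfect-match probe target sites.

    `target` is the rRNA (or its gene) written 5'->3'; U is treated as T.
    """
    site = reverse_complement(probe)
    t = target.upper().replace("U", "T")
    return [i for i in range(len(t) - len(site) + 1) if t[i:i + len(site)] == site]

EREM_289 = "TCGCTCTCTCAAACCAGC"  # probe sequence from the text, without the Cy3 label
target_site = reverse_complement(EREM_289)

# Hypothetical toy sequence: the target site with invented flanking bases.
toy_rrna = "AAGG" + target_site + "CCTT"
hits = find_probe_sites(EREM_289, toy_rrna)
```

For the 18-mer probe, the derived target site is GCTGGTTTGAGAGAGCGA, and 10 of its 18 bases are G or C.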

CARD-FISH

Bacterial cells were extracted from 0.25 g of soil collected from Mitchell Peninsula using a Nycodenz density gradient medium, in triplicate [45] (see Supplementary Methods). Isolated bacterial cells were filtered onto a 0.2-µm polycarbonate membrane (Millipore, Australia) using a vacuum manifold and washed twice with 100 µL of sterile Milli Q water [46]. The membranes were embedded in 0.1% (w/v) low melting temperature agarose and cells fixed with 100 µL of 4% paraformaldehyde at 4 °C for 24 h, followed by a second fixation step in 100 µL of ethanol and PBS (1:1) at 4 °C for 24 h. Hybridization was performed overnight as described [46]. The CARD reaction was performed by incubation of the membrane in 100 µL of the tyramide-Cy3 working solution from the TSA Plus Fluorescence Kit (Perkin Elmer, Melbourne, Australia), for 10 min at RT. Membranes were then washed in 100 µL of 1x PBS, followed by 100 µL of sterile water (1 min), before dehydration in 100 µL of 96% ethanol, and counterstaining with 4′,6-diamidino-2-phenylindole (DAPI; Thermo Fisher, Australia) at a concentration of 2 µg/mL. Cells were visualized using a BX61 motorized epifluorescence microscope with an LED light source and a DP71 digital camera (Olympus), using appropriate filters for the emission of Cy3 (excitation at 531 nm, emission at 593 nm), Cy5 (excitation at 628 nm, emission at 692 nm), and DAPI (excitation at 377 nm, emission at 447 nm). Soil samples and positive and negative clone-FISH controls were prepared and analysed in triplicate (Supplementary Information). In total, 65 Cy3 and DAPI-positive cells were identified across the three replicate soil samples.

Results and discussion

Ca. Eremiobacterota: a taxonomic and phylogenetic overview

Phylogenetic analysis of MAGs (Figs. 1 and 2) and 16S rRNA gene sequences (Supplementary Fig. S2 and Supplementary Table S3) support the subdivision of Ca. Eremiobacterota into two class-level lineages, Ca. Eremiobacteria and UBP9. Within Ca. Eremiobacteria, two order-level clades were resolved, corresponding to Ca. Eremiobacterales and Ca. Baltobacterales [1, 12], each of which contained a single family-level clade, Ca. Eremiobacteraceae fam. nov. and Ca. Baltobacteraceae, respectively (Fig. 2 and Supplementary Table S1). Ca. Eremiobacterales include Ca. Eremiobacter and Ca. Mawsoniella gen. nov. (Table 1 and Supplementary Table S1). Ca. Baltobacterales include the type genus Ca. Baltobacter, which was inferred to be capable of both heterotrophy and anoxygenic photosynthesis [12]; and Ca. Rubrimentiphilum, which was predicted to be an obligate heterotroph (Fig. 2 and Table 1) [13]. An additional 59 Ca. Eremiobacteria MAGs were found to be representatives of Ca. Baltobacterales [10]; based on GTDB taxonomy and phylogenetic analysis, these were resolved into 16 novel genus-level taxa (Fig. 2, Table 1, and Supplementary Table S1). Of these, Ca. Cryoxeromicrobium gen. nov. is resolved as the most basal known genus within Ca. Baltobacterales, and Ca. Nyctobacter gen. nov. is the sister taxon of Ca. Baltobacter (Fig. 2). These genera and Ca. Erabacter gen. nov., Ca. Hesperobacter gen. nov., and Ca. Zemynaea gen. nov. form a single cluster within Ca. Baltobacterales. A second Ca. Baltobacterales cluster includes Ca. Rubrimentiphilum and Ca. Meridianibacter gen. nov. (Bin 23) [1, 13]. A third cluster comprises six genera (Ca. Aquilonibacter gen. nov., Ca. Tyrphobacter gen. nov., Ca. Tumulicola gen. nov., Ca. Cybelea gen. nov., Ca. Palsibacter gen. nov., Ca. Hemerobacter gen. nov.). The fourth Ca. Baltobacterales cluster comprises four genera (Ca. Velthaea gen. nov.; Ca. Lustribacter gen. nov.; Ca. Elarobacter gen. nov.; Ca. Tityobacter gen. nov.) and tended to contain the largest genomes within Ca. Eremiobacteria (~4.5–5 Mbp) (Table 1 and Supplementary Table S1).

Fig. 1: Phylogenetic tree based on 15 concatenated marker protein sequences, which demonstrates the relationship of Candidatus Eremiobacterota to closely related bacterial phyla.

Escherichia coli (Proteobacteria) was set as the outgroup.

Fig. 2: Phylogenetic tree of Candidatus Eremiobacterota based on 15 concatenated ribosomal protein sequences.

Phylum Ca. Eremiobacterota is composed of two classes, Ca. Eremiobacteria and Ca. Xenobia class. nov. Ca. Eremiobacteria contains two orders, Ca. Eremiobacterales and Ca. Baltobacterales. Ca. Eremiobacter (Ga011786) [1] is the type genus of family Ca. Eremiobacteraceae fam. nov., order Ca. Eremiobacterales, class Ca. Eremiobacteria, and phylum Ca. Eremiobacterota. Ca. Baltobacter (WPS2_44; IMG accession #2734482170) is the type genus of family Ca. Baltobacteraceae and order Ca. Baltobacterales [12]. Note that because genome completeness was <70% for GCA_014304975.1, no genus or species name was proposed for this metagenome-assembled genome. Ca. Xenobia class. nov. contains a single order Ca. Xenobiales ord. nov., a single family Ca. Xenobiaceae fam. nov., and two genera Ca. Xenobium gen. nov. and Ca. Bruticola gen. nov. Chloroflexus aurantiacus (Chloroflexota) was set as the outgroup. Information for the MAGs discussed in this study, including sampled location, is presented in Supplementary Table S1.

Table 1 Predicted physiological and metabolic traits inferred for Candidatus Eremiobacterota genera, for the three known orders Ca. Eremiobacterales, Ca. Baltobacterales, and Ca. Xenobiales ord. nov.

Ca. Eremiobacterota also comprises the candidate class UBP9, the first genomic representatives of clade SHA-109 (Fig. 2 and Supplementary Table S1) [16]. UBP9 is here renamed Ca. Xenobia class. nov. and was determined to include a single order and family, with the type genus Ca. Xenobium gen. nov. recovered from samples sourced from industrial processes, and containing the largest estimated genome size for Ca. Eremiobacterota (~5 Mbp) (Supplementary Table S1). A second genus, Ca. Bruticola gen. nov., which was recovered from baboon fecal samples, has a genome almost half the size of that of Ca. Xenobium, with a much lower % GC content (Ca. Bruticola, ~43–44%; Ca. Xenobium, ~67–68%). The % GC content of the Ca. Xenobium genome is comparable to that of genomes from Ca. Eremiobacteria (~58–77%) (Supplementary Table S1).

Identification of environmental determinants of Ca. Eremiobacteria

Ca. Eremiobacteria 16S rRNA gene sequences were identified in 163 of the 223 samples in the PSA dataset, with an average relative abundance of 5.8% and an average of 16 OTUs identified (Fig. 3 and Supplementary Fig. S3). The two soils containing the most abundant and diverse Ca. Eremiobacteria communities were from Mitchell Peninsula (Antarctica), with 66 OTUs comprising up to 25% of the relative abundance. In contrast, 1440 out of the 1641 samples in the Australian biome dataset contained Ca. Eremiobacteria OTUs, with an average relative abundance of only 0.5%, and an average richness of 27 OTUs (Supplementary Fig. S3). In general, Ca. Eremiobacteria showed higher relative abundances and richness in polar soils that had an organic carbon content of 0.1–1%, but this relationship was not evident for Australian soils (Supplementary Fig. S4 and Supplementary Fig. S5). In the Australian soils, Ca. Eremiobacteria were most abundant in soils from Rutherglen (Victoria, Australia), with a relative abundance of 7.2%. Soils from Booderee (NSW, Australia) exhibited the highest richness, with 252 OTUs identified from a single sample. This demonstrates that while the polar soils exhibit a greater relative abundance of Ca. Eremiobacteria, richness in these soils was lower than in the Australian temperate samples.
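The two per-sample quantities compared above, relative abundance and richness, can be computed from an OTU count table as in the following Python sketch. The OTU identifiers, read counts, and function name are hypothetical, and the sketch assumes one dictionary of read counts per sample.

```python
def clade_summary(otu_counts, clade_otus):
    """Return (relative abundance in %, richness) of a clade in one sample.

    otu_counts: dict mapping OTU id -> read count for the sample.
    clade_otus: ids of the OTUs assigned to the clade of interest.
    """
    total_reads = sum(otu_counts.values())
    # Richness counts only clade OTUs actually detected in this sample.
    present = [otu for otu in clade_otus if otu_counts.get(otu, 0) > 0]
    clade_reads = sum(otu_counts[otu] for otu in present)
    rel_abundance = 100.0 * clade_reads / total_reads if total_reads else 0.0
    return rel_abundance, len(present)

# Hypothetical sample: three OTUs assigned to the clade, one of them undetected.
sample = {"otu1": 50, "otu2": 25, "otu3": 0, "otu4": 925}
clade = {"otu1", "otu2", "otu3"}
rel_abundance, richness = clade_summary(sample, clade)
```

In this toy sample the clade accounts for 75 of 1000 reads (7.5%) with a richness of two OTUs. Averaging these values across samples would give dataset-level figures of the kind reported above.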

Fig. 3: Scatter plot showing the significant correlations between Candidatus Eremiobacteria relative abundance and pH in Australian and Polar soils.

Dot size represents the richness (number of OTUs) of Ca. Eremiobacteria.

Random Forest and Spearman correlations identified pH and moisture (p < 0.001) (Fig. 3, Supplementary Fig. S4, and Supplementary Table S4) as two of the strongest determinants for Ca. Eremiobacteria, with Ca. Eremiobacteria richness and relative abundance negatively correlated with increased pH [47]. Ca. Eremiobacteria is acidotolerant, with a preference for pH < 6, which is consistent with previous work [12]. Of the 195 soil samples with a relative abundance of Ca. Eremiobacteria >1%, the majority (n = 154) were identified from soil samples with acidic pH (pH < 6) (Fig. 3). However, 19 samples with alkaline pH (pH > 8) also exhibited Ca. Eremiobacteria relative abundances >1%; these were from a single locality (Yellabinna Regional Reserve, SA, Australia) which was very low in organic carbon content (0.24–0.6%) (Supplementary Table S4). Further analysis revealed that 16S rRNA gene sequences from the Yellabinna locality were dispersed within the Ca. Eremiobacteria tree, rather than constituting a separate Ca. Eremiobacteria clade (Supplementary Fig. S6). Although pH appears to be the major determinant of Ca. Eremiobacteria distribution, it is possible that Ca. Eremiobacteria abundance at this alkaline locality was influenced more by their capacity to survive in dry, carbon-limited soils, such as by scavenging atmospheric gases as energy and carbon sources [1, 5].
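A Spearman correlation is a Pearson correlation computed on ranks, so a negative coefficient indicates that abundance tends to fall as pH rises. The Python sketch below shows the calculation with average ranks for ties. The pH and abundance values are invented for illustration and are not taken from either dataset, and no significance test is included.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: relative abundance (%) falling steadily as pH rises.
ph = [4.2, 5.0, 5.8, 6.5, 7.3, 8.1]
abundance = [9.0, 6.5, 4.0, 1.2, 0.8, 0.3]
rho = spearman(ph, abundance)
```

Because the toy abundance values decrease strictly with pH, the coefficient here is -1; real data would give a weaker negative value.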

Predicted metabolic properties of Ca. Eremiobacteria

Autotrophy and trace gas oxidation

The genetic capacity for trace gas oxidation and carbon fixation through the CBB cycle [1] was represented among Ca. Eremiobacteria genomes. Ca. Eremiobacterales genera (Ca. Eremiobacter and Ca. Mawsoniella) and six Ca. Baltobacterales genera (Ca. Nyctobacter, Ca. Hesperobacter, Ca. Cybelea, Ca. Palsibacter, Ca. Velthaea, and Ca. Tityobacter) encode the high-affinity Group 1h [NiFe]-hydrogenase (hhyL) (Table 1, Supplementary Table S5, and Supplementary Table S6). Of these, five encode RuBisCO (cbbL) type 1E (Ca. Eremiobacter, Ca. Nyctobacter, Ca. Palsibacter, Ca. Velthaea, and Ca. Tityobacter) (Table 1, Supplementary Table S5, and Supplementary Fig. S7). Together, these two genes indicate the potential to use atmospheric H2-oxidation to drive carbon fixation (“atmospheric chemosynthesis”) (Fig. 4) [1, 5]. In addition, Ca. Hemerobacter encodes both Group 1f and 2a [NiFe] hydrogenases, as well as RuBisCO (both type 1A and 1E), suggesting a capacity to use atmospheric H2-oxidation to drive carbon fixation; bacteria with Group 1f and 2a [NiFe] hydrogenases have previously been found to be capable of H2 uptake at sub-atmospheric concentrations [48, 49]. One proposed explanation for different atmospheric H2-oxidizing hydrogenases is that Group 2a is adapted for exponential phase growth, whereas Group 1h is linked to energy conservation during persistence [50]. Group 1f hydrogenases have also been accorded a function in protection against reactive oxygen species [3, 51]. Given that two genera (Ca. Hemerobacter and Ca. Lustribacter) that encode Group 1f hydrogenases are inferred to be capable of anaerobic respiration (see Respiration, below) and both were recovered from bog soil samples [10], we propose that Group 1f hydrogenases might be employed by Ca. Eremiobacteria for H2-oxidation under anaerobic conditions. The distribution of genes required for atmospheric H2-oxidation and carbon fixation across Ca. Eremiobacteria taxa is best explained by horizontal gene transfer (HGT) rather than these inferred abilities being ancestral for this phylum. For other [NiFe] hydrogenases found in Ca. Eremiobacteria, Group 3b (Ca. Velthaea) and Group 3d (Ca. Palsibacter and Ca. Lustribacter) hydrogenases are bidirectional, and may help to maintain redox balance within the cell (Table 1 and Supplementary Table S6) [3, 52].

Aerobic CODH is also encoded across Ca. Baltobacterales (Supplementary Fig. S8) with five of these genera also containing RuBisCO (either type 1E or 1A) (Ca. Hesperobacter, Ca. Hemerobacter, Ca. Palsibacter, Ca. Velthaea, and Ca. Tityobacter) (Supplementary Fig. S7), suggesting that these genera can oxidize atmospheric CO to generate CO2 for autotrophic growth via the CBB cycle [1, 2]. In addition, for heterotrophic growth, aerobic atmospheric CO oxidation serves as a potential supplemental energy source to support survival during nutrient starvation [50].

Three Ca. Baltobacterales genera contain genes for RuBisCO type 1A, of which two (Ca. Hemerobacter and Ca. Velthaea) also contain RuBisCO type 1E (Table 1, Supplementary Table S5, and Supplementary Fig. S7). Many autotrophic bacteria improve CO2 fixation by sequestering RuBisCO into inclusions called carboxysomes [53]; these were encoded in Ca. Hemerobacter and Ca. Velthaea (Table 1 and Supplementary Table S5). Four Ca. Baltobacterales genera have the genetic potential for anoxygenic phototrophy using Type II reaction centers (Ca. Baltobacter, Ca. Hesperobacter, Ca. Hemerobacter, and Ca. Velthaea), as inferred previously for MAGs belonging to these genera [11, 12] (Table 1 and Supplementary Table S5). HGT has been proposed to explain the distribution of phototrophy-related genes within Ca. Eremiobacteria [12]. For bacteriochlorophyll synthesis, magnesium-protoporphyrin IX monomethyl ester cyclase exists as alternative aerobic (acsF) or anaerobic (bchE) enzymes, and both are encoded in Ca. Hemerobacter and Ca. Velthaea. This is consistent with an ability to grow phototrophically under both aerobic and anaerobic conditions; those genera that had bchE also have the genetic capacity for anaerobic respiration (see “Respiration,” below).

Ca. Velthaea has two photoreceptor genes: photoactive yellow protein (PYP) and bacteriophytochrome (Table 1, Fig. 4b, and Supplementary Table S5). Ca. Hesperobacter has a gene for PYP. Both PYP and bacteriophytochrome are light-detecting photoreceptors found across prokaryotes, and are involved in cellular processes such as phototaxis and upregulation of hydrolytic enzymes involved in degradation of plant material [54, 55].

Respiration

Ca. Eremiobacteria are typically associated with oxygen-rich environments, including moss and surface soils [12, 56]. All genera examined here have the capacity for aerobic respiration using cytochrome c oxidase. Many species additionally encoded a cytochrome bd quinol oxidase, associated with aerobic respiration under O2-limited conditions (Supplementary Table S5) [57]. Several genera also encoded the capacity to use terminal electron acceptors other than oxygen, including nitrate (Ca. Zemynaea), nitric oxide (Ca. Hemerobacter), sulfoxides (possibly dimethylsulfoxide and/or trimethylamine N-oxide) (Ca. Velthaea and Ca. Lustribacter) [58], and urocanate (Ca. Lustribacter) [59] (Supplementary Table S5). Microaerobic and anaerobic capacities are consistent with low oxygen levels that prevail in peatlands, including palsa and bog anoxic layers [10]. There was no evidence of anaerobic respiration in any genus sourced from Antarctic soils (either Ca. Eremiobacterales or Ca. Baltobacterales), although Ca. Eremiobacter encodes a dissimilatory nitrite reductase (Fig. 4a), possibly for redox balance [60].

Fig. 4: Illustrations depicting the predicted diverse metabolic capacities within phylum Candidatus Eremiobacterota.

Class Candidatus Eremiobacteria a order Candidatus Eremiobacterales, represented by Candidatus Eremiobacter antarcticus sp. nov.; b order Candidatus Baltobacterales, represented by Candidatus Velthaea versatilis gen. et. sp. nov.; and class Candidatus Xenobia class. nov., represented by c Candidatus Xenobium occultum gen. et. sp. nov.; and d Candidatus Bruticola papionis gen. et. sp. nov. All cells are shown as diderm, based on the presence of lipopolysaccharide biosynthesis genes across Ca. Eremiobacterota (Supplementary Table S5). All cells are shown as coccoid, based on observed cell morphology of Antarctic Ca. Eremiobacteria cells, even though the rod-cell shape determinant MreB was encoded across both classes (Supplementary Table S5). ABC ATP-binding cassette, BCAA branched-chain amino acids, BCKDC branched-chain 2-oxoacid dehydrogenase complex, BMC bacterial microcompartment, CBB cycle Calvin–Benson–Bassham cycle, CoA coenzyme A, CODH carbon monoxide dehydrogenase, cyt bd cytochrome bd, DMS dimethylsulfide, DMSO dimethylsulfoxide, DNRA dissimilatory nitrate reduction to ammonia, Dtp di-/tripeptide transporter, F420 coenzyme F420 (8-hydroxy-5-deazaflavin), GDH glutamate dehydrogenase, GS glutamine synthetase, GOGAT glutamate synthase, ox oxidized, PAPS 3′-phosphoadenosine-5′-phosphosulfate, PEP phosphoenolpyruvate, PHA polyhydroxyalkanoate (storage product), PitA inorganic phosphate transporter, POT proton-dependent oligopeptide transporter, PTS phosphotransferase, PYP photoactive yellow protein, red reduced, RuBisCO ribulose-bisphosphate carboxylase/oxygenase, TCA cycle tricarboxylic acid cycle, 6PGL 6-phosphogluconolactone.

Organic compounds

Ca. Eremiobacteria possess genes for a complete glycolysis (Embden–Meyerhof–Parnas [EMP] pathway), pentose phosphate pathway, and tricarboxylic acid (TCA) cycle (Supplementary Table S5). CAZy analysis revealed glycoside hydrolases (GH) for the degradation of complex polysaccharides, including cellulose, xyloglucan, xylan, and possibly chitin in certain Ca. Eremiobacteria (Table 1 and Supplementary Table S7). This could be especially relevant to Ca. Baltobacterales from Stordalen Mire, given that high molecular-weight, plant-derived polysaccharides (primarily cellulose and hemicellulose) comprise a large proportion of carbon in this system [10]. Aromatic compounds, present in soil as degradation products of plant material [61], could potentially be used as carbon and energy sources by Ca. Eremiobacteria (see “Supplementary Results and discussion: catabolism of plant metabolites and aromatic compounds”). Aromatic compound transporters and catabolic enzymes (including those capable of cleavage of aromatic rings) are encoded across Ca. Eremiobacteria genera (Supplementary Table S5; see “Supplementary Results and discussion: oxygen as an agent for substrate degradation”).

A number of Ca. Eremiobacteria encode pathways for glycogen synthesis, by either the classical pathway or the mycobacterial-type pathway sourced from trehalose (Supplementary Table S5) [62]. Certain Ca. Eremiobacteria encode the ability to synthesize trehalose, which can be accumulated as protection against osmotic stress [63] or converted to 2-sulfotrehalose, which is both a precursor for sulfolipid biosynthesis [64] and a compatible solute that protects against osmotic stress (Supplementary Table S5) [65]. Polyhydroxyalkanoate (PHA) synthase (PhaC) is encoded throughout the Ca. Eremiobacteria, indicating the capacity for carbon storage as PHA (Table 1, Fig. 4, and Supplementary Table S5).

The primary transporters in Ca. Eremiobacteria were dominated by those for oligopeptides and branched-chain amino acids (BCAAs) (Supplementary Table S5). Ca. Eremiobacteria genera encode numerous peptidases, both cytoplasmic and extracytoplasmic (Supplementary Table S5). Among these diverse peptidases, the majority of Ca. Eremiobacteria encode d-aminopeptidases (Peptidase M55) and β-peptidyl aminopeptidases (Peptidase S58), suggesting that they can utilize d-alanyl-d-alanine dipeptides from peptidoglycans [66] and natural peptides that contain β-amino acids (e.g., microcystins, nodularins, carnosine) [67], respectively. Combined with the prevalence of genes for peptide ATP-binding cassette (ABC) transporters, this suggests that amino acids are important sources of carbon, nitrogen, and sulfur. Further, certain amino acids could be utilized for pH homeostasis (see “Acid tolerance and resistance,” below). Ca. Eremiobacteria are notable for the prevalence of BCAA aminotransferases and branched-chain 2-oxoacid dehydrogenase complexes; along with BCAA ABC transporters, this suggests that BCAAs are a major source of nitrogen and reductant (Fig. 4). Across Ca. Eremiobacteria, other potential organic nitrogen sources include urea, cyanate, sarcosine, creatine, and putrescine (Table 1 and Supplementary Table S5). In general, Ca. Eremiobacteria appear to rely predominantly on organic nitrogen and sulfur sources. Organosulfonates (including taurine and alkanesulfonates) appear to be an important source of sulfur for many Ca. Baltobacterales. There was no evidence of nitrogenase in the Ca. Eremiobacteria surveyed here, despite the high abundance of Ca. Eremiobacteria in nitrogen-limited Antarctic soils [5]. Only one Ca. Eremiobacteria genus (Ca. Velthaea) showed genomic evidence of reductive nitrate assimilation (Fig. 4b). Only a relatively small number of Ca. Eremiobacteria genera possess identifiable genes for the complete reductive assimilation of sulfate to sulfide (Supplementary Table S5). Phosphate ABC transporters, for the primary import of phosphate, and polyphosphate kinases, for high-energy phosphate storage, are encoded throughout the Ca. Eremiobacteria (Supplementary Table S5).

Acid tolerance and resistance

The globally distributed Ca. Eremiobacteria is most often found in acidic and aerobic environments (Fig. 3) [12, 18]. The proton permeability of the cell membrane increases with temperature [68], so cooler environments would minimize proton permeation. Hopanoid lipids reduce membrane permeability to protons, and therefore are an important adaptation for life in acidic environments [69, 70]. All Ca. Baltobacterales genera encode squalene-hopene cyclase for hopanoid synthesis, with isoprenoids synthesized via the 1-deoxy-d-xylulose-5-phosphate pathway (Supplementary Table S5) [71]. Additional genomic features were identified that facilitate maintenance of pH homeostasis under acidic conditions. Proton influx under acidic conditions is offset by maintaining an inside-positive membrane potential as a charge barrier, achieved by uptake of potassium and other cations [69, 72]. Potassium uptake systems were present in most Ca. Eremiobacterota, including potassium channels and active potassium uptake systems (Supplementary Table S5) [73,74,75]. Ca. Eremiobacterota encode Na+/H+ antiporters (Nha), which may function to export excess protons and simultaneously import sodium ions [70].

Another acid resistance mechanism is to consume excess cytoplasmic protons using amino acid decarboxylases, such as arginine decarboxylase and glutamate decarboxylase, which generate agmatine and γ-aminobutyrate (GABA), respectively; specific antiporters then release these products in exchange for import of exogenous precursors [76]. Arginine decarboxylase is more widely distributed across Ca. Eremiobacteria, although the cognate arginine/agmatine antiporter could only be identified in three Ca. Baltobacterales genera (Supplementary Table S5). In addition, genes for glutamate decarboxylase and/or glutamate/GABA antiporters were identified in five Ca. Baltobacterales genera (Supplementary Table S5).

Visualization of Ca. Eremiobacteria cells isolated from Antarctic soil

CARD-FISH was employed to visualize Ca. Eremiobacteria cells present in Antarctic desert soils [17]. Given the slow-growing nature of cold-adapted soil bacteria and their presumably low ribosomal content [46] (Supplementary Fig. S9), combined with the natural autofluorescence of soil particles (Supplementary Fig. S10), we elected to use CARD-FISH incorporating a horseradish peroxidase-labeled Erem-289 probe, tyramide signal amplification with Cy3, and counterstaining with DAPI to enhance fluorescence signal strength. Epifluorescence microscopy was used to visualize the morphology of members of this class for the first time. The majority of Ca. Eremiobacteria cells were coccoid in shape, with 7.7% of cells analyzed exhibiting a larger “spindle-shaped” morphology (Fig. 5; Supplementary Methods). The lower volume/surface area quotient of coccoid cells permits rapid and efficient nutrient uptake [77].

Fig. 5: Visualization of Candidatus Eremiobacteria cells isolated from an Antarctic soil.

CARD-FISH was employed using the class-level Ca. Eremiobacteria-specific probe (Erem-289) labeled with Cy3 (red) and counterstained with DAPI (blue). The majority of Ca. Eremiobacteria cells are of coccoid morphology, although note the more “spindle-shaped” cells at the top of the field.

Ca. Xenobia is metabolically divergent from Ca. Eremiobacteria

Ca. Xenobium metabolism

Ca. Xenobium MAGs were recovered from separate and geographically diverse industrial waste samples (Table 1 and Supplementary Table S1) [16]. Both Ca. Xenobia genera, Ca. Xenobium (Fig. 4c) and Ca. Bruticola (Fig. 4d), encode the same acidophilic adaptations as Ca. Eremiobacteria (hopanoid biosynthesis, arginine decarboxylase, Ktr potassium uptake system, Nha sodium/proton antiporters) (Supplementary Table S5), suggesting that an acidophilic or acidotolerant lifestyle is common to this terrestrial phylum. Ca. Xenobium has a relatively large genome (>5 Mbp), comparable in size to the largest genomes identified in Ca. Eremiobacteria (e.g., Ca. Velthaea), and far larger than that of Ca. Bruticola (Supplementary Table S1).

Ca. Xenobium is inferred to be capable of both microaerobic and anaerobic respiration (Fig. 4c and Supplementary Table S5). Both Ca. Xenobium species encode a complete glycolytic (EMP) pathway, TCA cycle, pentose phosphate pathway, Complex I, and cytochrome bd oxygen reductase, the latter suggesting they are adapted to microaerobic conditions [57]. A potential for anaerobic respiration via dissimilatory nitrate reduction to ammonium (DNRA) is suggested by a putative respiratory complex for nitrate reduction to nitrite (NarGH), and a nitrite reductase (NrfAH) for the direct reduction of nitrite to ammonium (Fig. 4c and Supplementary Table S5) [78, 79]. DNRA generates nitric oxide and hydroxylamine as toxic intermediates, and Ca. Xenobium encodes enzymes for protection against oxidative and nitrosative stress, including F420-dependent glucose-6-phosphate dehydrogenase, nitrite reductase, and hydroxylamine reductase (Supplementary Table S5) [78]. Certain polysaccharides and oligosaccharides appear to be utilized as substrates, based on genes for secreted glycoside hydrolases (GH) (see “Supplementary Results and discussion: additional metabolic properties of Ca. Xenobia”). Other potential sources of reductant include the oxidation of amino acids (including those derived from peptide degradation), alcohols, and aldehydes (Supplementary Table S5) [80]. Ca. Xenobium also has the capacity for both PHA and glycogen synthesis (Supplementary Table S5). Ca. Xenobium encodes proteins required for the construction of bacterial microcompartments (also encoded in Ca. Tityobacter from Ca. Baltobacterales), likely for sequestering toxic aldehydes [81, 82] (see “Supplementary Results and discussion: additional metabolic properties of Ca. Xenobia”).

Ca. Bruticola metabolism

All five MAGs (81.41–85.65% completeness) that were derived from the baboon fecal metagenome belong to a single genus and species (Table 1 and Supplementary Table S1). Ca. Bruticola has a smaller genome (~2.8 Mbp) compared to Ca. Xenobium (Supplementary Table S1). There is no evidence of genomic streamlining [83], and the genome encodes biosynthetic abilities for lipids and certain amino acids, but not nucleosides (presumably imported) (Fig. 4d). Overall, Ca. Bruticola is inferred to have an extremely simplified, heterotrophic metabolism, with substrates that include oligopeptides, amino acids, and sugars (Supplementary Information; Supplementary Tables S5 and S7).

Ca. Bruticola lacks any evidence for genes associated with respiration, and this genus is therefore predicted to be obligately fermentative. A glycolytic (EMP) pathway is evident, possibly initiated by a phosphoenolpyruvate-phosphotransferase system for concomitant uptake and phosphorylation of glucose. Ca. Bruticola is predicted to lack a TCA cycle, with the only two identifiable genes being those for aconitase and isocitrate dehydrogenase (Fig. 4d and Supplementary Table S5). The truncated central metabolism is sufficient to synthesize the five universal anabolic precursors: acetyl-CoA, pyruvate, PEP, oxaloacetate, and 2-oxoglutarate [84, 85]. Unique among Ca. Eremiobacterota, Ca. Bruticola encodes all the components of citrate lyase, which catalyzes the acyl-carrier-protein (ACP)-dependent cleavage of citrate into oxaloacetate and acetate (Supplementary Table S5) [86]. The genes for citrate lyase (citGCXFED) form a cluster with genes for aconitase, isocitrate dehydrogenase, and 3-oxoacyl-ACP-reductase (fabG), the last of which catalyzes the first reductive step in the chain elongation cycle of fatty acid biosynthesis. Thus, we posit that citrate has two possible fates: conversion to 2-oxoglutarate via isocitrate, or generation of oxaloacetate and acetate. Acetyl-CoA can be converted to acetate to generate ATP by substrate-level phosphorylation (phosphate acetyltransferase, acetate kinase), as in Ca. Xenobium. A Na+-translocating oxaloacetate decarboxylase [87] is encoded by Ca. Bruticola; as well as generating pyruvate, this enzyme extrudes Na+ across the cytoplasmic membrane to create a Na+-motive force. One possibility is that Ca. Bruticola has a Na+-dependent ATP synthase [88], meaning oxaloacetate decarboxylation would be a source of ATP generation for the cell (Fig. 4d).

Ca. Eremiobacterota ecology

Members of Ca. Eremiobacterota were found to be united by multiple adaptations for living in acidic environments, in terms of cell envelope composition, mechanisms for maintenance of a charge barrier, and efflux and consumption of cytoplasmic protons. However, Ca. Eremiobacterota are metabolically diverse, with disparate metabolic strategies exhibited across genera reflecting the localities from which they were recovered (Fig. 4, Table 1, and Supplementary Table S5). Members of Ca. Eremiobacterota have adaptations that allow cells to survive in a diverse range of ecosystems, including acidic, contaminated, and severely nutrient-poor environments. Class Ca. Eremiobacteria has genomic capacities for trace gas oxidation and carbon fixation, and is highly adapted for growth and survival in low-pH, aerobic environments, including polar soils, desert soils, and peatlands [1, 10, 11]. The majority of the Ca. Eremiobacteria MAGs assessed in the current study were derived from peatlands, which are replete with plant-derived organic matter, much of it recalcitrant [10], with Ca. Eremiobacteria from this system typically encoding enzymes for the degradation of a narrow range of polysaccharides (Supplementary Tables S5 and S7). However, the presence of some polysaccharide-degrading enzymes (especially for glucan and xyloglucan) in Antarctic Ca. Eremiobacteria suggests that they also have access to plant degradation products in their cold and arid habitats. Vegetation (lichens and mosses) is present in Antarctica, including in the Windmill Islands region [89, 90] from which Ca. Eremiobacteria MAGs were recovered.

Ca. Eremiobacteria encode abundant mechanisms for obtaining reduced nitrogen from amino acids and proteins, with abilities to obtain nitrogen from diverse peptides, including those that contain β- or d-amino acids (Fig. 4). The latter abilities are not unique among bacteria, but may give Ca. Eremiobacteria a competitive advantage in environments depleted in organic nitrogen sources [5]. Multiple genera within Ca. Eremiobacteria encode the capacity to scavenge H2, CO2, and CO from the atmosphere for use as carbon and/or energy sources (Fig. 4 and Supplementary Table S5) [1]. In addition, several members of the order Ca. Baltobacterales encode the capacity for bacteriochlorophyll-based anoxygenic photosynthesis and anaerobic respiration [12]. Thus, certain Ca. Baltobacterales encode impressive metabolic capacities, with Ca. Velthaea as an exemplar; in addition to the aforementioned capabilities, this genus has genes for dual glycolytic pathways, reductive nitrate assimilation, photoreception, and chemotaxis using flagella (Fig. 4b and Supplementary Table S5).

Given the presence of genes associated with the degradation of aromatic acids, organohalogens, alkanes, and alcohols (Supplementary Table S5), Ca. Eremiobacteria have the capacity to contribute toward the degradation of environmental contaminants. Class Ca. Xenobia (formerly UBP9, SHA-109) [12, 16], comprising two genera, Ca. Xenobium and Ca. Bruticola (Fig. 2), appears to be highly divergent compared to Ca. Eremiobacteria. Yet, the ability to survive in polluted habitats is shared by Ca. Xenobium, with MAGs being recovered from industrial waste sources [16]. Ca. Xenobium encodes mechanisms to deal with heavy metals, arsenic, oxidative stress, and nitrosative stress, with all but the last shared with Ca. Eremiobacteria. Members of genus Ca. Xenobium are inferred to be facultative anaerobes adapted to microaerobic conditions (based on the nature of encoded terminal reductases), with a preference for peptides and amino acids, and capable of fermentation (Fig. 4c). The related genus Ca. Bruticola has the capacity to metabolize a narrow repertoire of organic substrates, has an incomplete TCA cycle, and is incapable of respiration (Fig. 4d). Ca. Bruticola may require an animal host, given that MAGs were recovered from baboon fecal samples. However, the natural environments of Ca. Xenobium and Ca. Bruticola are currently not known, and the exact ecologies of both genera await future discoveries.