Introduction

Nitrification, the biological oxidation of ammonia to nitrate, is an essential process in terrestrial and aquatic environments as well as in water quality engineering applications. For the last century, nitrification was assumed a two-step process executed by two complementary functional groups, ammonia-oxidizing prokaryotes (AOP) and nitrite-oxidizing bacteria (NOB). Recently, several groups have shown that single microorganisms belonging to the genus Nitrospira can carry out the complete oxidation of ammonia to nitrate, a process abbreviated comammox [1, 2]. Nitrospira spp., long-recognized as nitrite-oxidizers, are widespread in both natural and engineered ecosystems associated with nitrogen cycling [3, 4]. The Nitrospira genus is extremely diverse and comprises at least six lineages, which frequently coexist [3, 5]. Comammox Nitrospira genomes described to date belong to Nitrospira lineage II, and comprise two clades (clade A and B) based on the phylogeny of their ammonia monooxygenases [1, 2]. Both clades were detected in samples retrieved from a groundwater well and in a groundwater-treating rapid sand filter, together with conventional Nitrospira spp. [1, 6]. The diversity of Nitrospira spp. points towards ecological niche-partitioning; nitrite and dissolved oxygen concentrations are potential niche determinants for Nitrospira lineages I and II [7,8,9]. The recently described metabolic versatility of some Nitrospira spp., which includes formate, hydrogen, urea and cyanate metabolisms [10,11,12] may also contribute to their coexistence. Nevertheless, little is known about differences in the functional potential of the two comammox Nitrospira clades and niche-partitioning between comammox Nitrospira and AOP. Only the hypothesis that comammox Nitrospira could outcompete canonical AOB in surface-attached and substrate-limited environments has been proposed [13]. The evolutionary history of ammonia oxidation in Nitrospira is unknown. 
Based on their ammonia monooxygenase (AMO) and hydroxylamine dehydrogenase (HAO) sequences, β-proteobacterial ammonia-oxidizing bacteria (β-AOB) are most similar to comammox Nitrospira [1, 2, 6, 14]. The AMO- and HAO-encoding genes are either ancestral in Nitrospira spp. or were acquired by horizontal gene transfer from AOP, although the high degree of phylogenetic divergence of these proteins between comammox Nitrospira and AOP makes a recent acquisition improbable [2].

To examine the functional potential which might allow niche-partitioning among comammox Nitrospira and between comammox Nitrospira and AOP, and to unravel the evolutionary history of ammonia oxidation in Nitrospira, we performed differential coverage genome binning on a Nitrospira composite population genome recovered from a metagenome in which comammox Nitrospira, canonical AOB and ammonia oxidizing archaea (AOA) co-occurred [3, 6]. Comparative genome analysis was conducted with the recovered genomes and high-quality published Nitrospira genomes. The evolutionary history of genes involved in ammonia oxidation in comammox Nitrospira was also explored with a special focus on understanding the potential role of horizontal gene transfer, gene duplication, and gene loss. For this purpose, we executed pairwise protein dissimilarity comparison between and within comammox Nitrospira clades as well as with other AOB, explored the genomic arrangement in the relevant pathways, and performed probabilistic and parsimony-based reconciliation analysis where trees for individual genes were compared to a clock-like species-tree constructed from ribosomal proteins. Our study revealed comammox-, comammox clade- and canonical Nitrospira-specific features related to nitrogen assimilation, electron donor versatility and substrate-limitation tolerance. Our analysis suggests that ammonia-oxidation related genes were transferred from β-proteobacterial ammonia oxidizers to comammox Nitrospira.

Materials and methods

Recovery of individual genomes

As described in Palomo et al. [6], DNA was extracted from triplicate 0.5 g samples taken at two depths from each of three adjacent locations (ca. 0.5 m apart) in a single rapid sand filter (RSF) and subjected to shotgun sequencing to describe the microbial community involved in groundwater purification. The work presented here focuses on the metagenomes from which a Nitrospira composite population genome (CG24) was recovered [6]. In order to separate the composite population genome into individual genomes, a strategy based on subsample assembly [15] followed by differential coverage binning was applied [16] (further details in Supplementary Information and Supplementary Figure S1). The quality of recovered genomes was evaluated using coverage plots from mmgenome [17], hidden Markov models (HMMs) for 107 essential single-copy genes, and CheckM [18] (further details in Supplementary Information). The selected draft genomes were refined using Anvi’o [19] to remove contigs with inconsistent coverage. Assembly quality was improved by alignment against related complete or draft genomes using the Multi-Draft based Scaffolder (MeDuSa) [20] and gaps were closed with GapFiller v.1.10 [21]. All raw sequence data and genome sequences have been deposited at NCBI under the project PRJNA384587, with sequence accessions for the raw sequence data: SRR5739198–SRR5739203; and for the draft genomes: GCA_002869925.1, GCA_002869845.1, GCA_002869885.1, GCA_002869855.1 and GCA_002869895.1 (GenBank assembly accessions).

Comparative genome analysis

High quality published Nitrospira genomes (more than 90% complete and with less than 5% redundancy) were also included in the comparative genomic analysis. Gene calling on the recovered genome bins and published genomes was performed using Prodigal v.2.62 [22]. Annotation was conducted in RAST [23] and protein functional assignments of genes of interest were confirmed using KEGG and blastp. Pangenomic analysis was executed using the meta-pangenomic workflow of Anvi’o [19] with default parameters with the exception --maxbit = 0.3 (further details in Supplementary Information). All comparative genome information was visualized using the program anvi-interactive in Anvi’o [19].

Sequence dissimilarity and gene synteny analysis

Pairwise amino acid dissimilarities (Supplementary Information) among the genomes for 18 housekeeping genes, 27 ammonia oxidation-related proteins and 14 syntenic ribosomal proteins (Supplementary Table S1) were calculated using Clustal Omega v.1.2.1 [24]. Gene arrangement of ammonia oxidation related genes was visualized using the R package genoPlotR [25].
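
The dissimilarity measure used here, the fraction of differing amino acid sites between two aligned proteins, can be illustrated with a minimal Python sketch. It is not the Clustal Omega computation itself: the function names, the toy alignment and the choice to skip any column containing a gap are assumptions made for illustration only.

```python
def pairwise_dissimilarity(seq_a, seq_b, gap_chars="-."):
    """Fraction of differing residues over columns where neither sequence has a gap.

    Assumption: gapped columns are ignored; other tools may treat gaps differently.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def dissimilarity_matrix(alignment):
    """Return {(name_x, name_y): dissimilarity} for every unordered pair of sequences."""
    names = sorted(alignment)
    return {
        (x, y): pairwise_dissimilarity(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical toy alignment (invented sequences, not data from this study).
toy_alignment = {
    "genome_1": "MKT-LLVA",
    "genome_2": "MKTALLIA",
    "genome_3": "MRT-LWVA",
}
matrix = dissimilarity_matrix(toy_alignment)
```

In practice the input would be one alignment per protein, with one dissimilarity value reported per genome pair.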

Construction of clock-based species-tree and guest-tree sets

The software BEAST, version 2.4.4, was used to construct a dated species-tree using a relaxed log-normal clock model [26, 27]. The input data was a set of 14 ribosomal proteins assumed to be inherited mostly vertically (Supplementary Information and Supplementary Table S1). The timetree.org resource [28] together with the substitution rate of 16S rRNA [29] were used as a main starting point for finding calibration information for internal nodes (Supplementary Information).

For each investigated gene the protein sequences were aligned using the G-INS-i method in MAFFT [30]. The software MrBayes version 3.2.6 [31] was then used to reconstruct phylogenies (further details in Supplementary Information).

Reconciliation analysis

The analysis of event probabilities was done using the software JPrIME-DLTRS version 0.3.6 [32] (Supplementary Information). The clock-based tree mentioned above was used as the reference species-tree for all analyses (Supplementary Information). For each gene, the sample file containing 1500 midpoint-rooted trees constructed using MrBayes was used as a guest-tree set from which JPrIME-DLTRS could sample guest-tree topologies. MCMC was run for 4 million iterations with a thinning of 200, using 3 parallel chains for each gene (Supplementary Information). To investigate which branches the transfer events most likely involved, we used the software ecceTERA version 1.2.4 [33] (Supplementary Information). The software SylvX version 1.3 [34] was used for visualizing reconciliations computed by ecceTERA.

Phylogenetic analysis

Phylogenetic trees for AmoA and HaoA were constructed using MrBayes. The substitution model for both data sets was LG with gamma-distributed rates (determined using jmodeltest), using 5 million iterations for 2 sets of 3 chains, with a thinning of 5000 and a burn-in of 25%. Convergence was checked using Tracer and by computing the potential scale reduction factor (ensuring that R was close to 1.0 for all parameters) and the average standard deviation of split frequencies (which was well below 0.01 in both cases).
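
As an illustration of the convergence check described above, the following Python sketch discards a burn-in fraction and computes a basic Gelman-Rubin potential scale reduction factor for one parameter sampled by several independent chains. It is a textbook version written for clarity; the exact formula used by MrBayes may differ in detail, and the function names and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def discard_burn_in(samples, fraction=0.25):
    """Drop the first `fraction` of the samples (25% in the analysis above)."""
    start = int(len(samples) * fraction)
    return samples[start:]


def psrf(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean([variance(c) for c in chains])        # mean within-chain variance
    between = n * variance([mean(c) for c in chains])   # between-chain variance
    pooled = (n - 1) / n * within + between / n         # pooled variance estimate
    return sqrt(pooled / within)


# Sampling every 5000th of 5 million iterations gives 1000 samples per chain
# (ignoring the initial state); a 25% burn-in leaves 750 of them.
retained = discard_burn_in(list(range(5_000_000 // 5000)))

# Invented toy chains: the first pair agrees, the second pair has not mixed.
r_mixed = psrf([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]])
r_unmixed = psrf([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
```

Values near 1.0 indicate that the chains sample the same distribution, whereas values well above 1.0, as for the second toy pair, indicate a lack of convergence.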

Statistical analysis

All statistical tests were performed using R v3.3.1 [35]. Correlations between the phylogenetic distance of genomes (based on 14 concatenated ribosomal proteins) and sequence divergence for the housekeeping or the ammonia oxidation-related proteins were determined using a linear regression model. The significance of the difference between two slopes was assessed using an ANOVA test. The significance of the difference between two means of sequence dissimilarities, for either housekeeping proteins or ammonia oxidation-related proteins, was evaluated using the Welch two-sample t-test.
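
The two core calculations named above can be sketched in plain Python: an ordinary least-squares fit of protein dissimilarity against ribosomal-protein dissimilarity, and the Welch t statistic with its Welch-Satterthwaite degrees of freedom. The study used R; this re-implementation, its function names and its toy numbers are assumptions for illustration, and the p-value lookup is omitted.

```python
from math import sqrt
from statistics import mean, variance


def least_squares(x, y):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented toy values (not data from this study).
ribosomal = [0.1, 0.2, 0.3, 0.4]      # x: ribosomal-protein dissimilarity
housekeeping = [0.2, 0.4, 0.6, 0.8]   # y: housekeeping-protein dissimilarity
slope, intercept = least_squares(ribosomal, housekeeping)

t_stat, dof = welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Unlike Student's t-test, the Welch test does not assume equal variances in the two groups, which suits groups of genome pairs that differ in both size and spread.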

Results and Discussion

Genome extraction from the metagenome

Metagenomes from adjacent locations in a RSF were used for individual genome extraction [6]. Five Nitrospira genomes (CG24_A, CG24_B, CG24_C, CG24_D, CG24_E) were recovered from the metagenomes using differential coverage binning [16] (Fig. 1). The characteristics of these and other genomes used in this study are shown in Table 1. The size of the comammox genomes assembled here ranges from 3.0 to 3.6 Mb, with a completeness of 95–100%. Based on phylogenetic analysis of 14 syntenic ribosomal proteins, all genomes belong to Nitrospira sublineage II (Supplementary Figure S2). Genes required for complete ammonia oxidation (amo and hao operons) were detected in four of the genomes. Based on amoA phylogeny, CG24_A, CG24_C and CG24_E belong to comammox clade B and CG24_B to comammox clade A (Supplementary Figure S3). CG24_A was the most abundant, at an average relative abundance of 18.6% (of all metagenome sequence reads), followed by CG24_B (8.6%), CG24_C (4.6%), CG24_D (2.4%, the only canonical NOB Nitrospira recovered) and CG24_E (1.1%) (Supplementary Figure S4).

Fig. 1

Differential coverage plot of two metagenomes obtained from Islevbro waterworks at different sampling depths. Scaffolds are displayed as circles, scaled by length and colored according to phylum-level taxonomic affiliation. Only scaffolds >4 kbp are shown. A second differential coverage plot showing the CG24_E bin extraction (enclosed by the polygon) is presented

Table 1 Characteristics of examined genomes

Comparative genomics

The newly recovered genomes were compared with each other and with publicly available, high-quality Nitrospira genomes, including in total 11 comammox and five canonical Nitrospira (Table 1). Based on average amino acid identity (AAI) analysis, the 16 studied Nitrospira genomes constitute 11 different species, seven of them comammox Nitrospira (at a species-level cut-off of 85% AAI [36]). CG24_A, CG24_D and CG24_E are divergent enough from each other and from previously published genomes to be separate species (Supplementary Figure S5).
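
One way to turn pairwise AAI values into species groups at the 85% cut-off is single-linkage grouping, sketched below in Python. The linkage rule, the function name and the toy AAI values are assumptions for illustration; the text does not state how genomes were grouped once AAI had been computed.

```python
def species_clusters(genomes, aai, cutoff=85.0):
    """Group genomes whose pairwise AAI (in %) is at or above the cut-off.

    Assumption: single linkage, so a genome joins a group if it reaches the
    cut-off with at least one member of that group.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in aai.items():
        if value >= cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Invented toy AAI values in percent (not values from this study).
toy_genomes = ["g1", "g2", "g3", "g4"]
toy_aai = {
    ("g1", "g2"): 92.0, ("g1", "g3"): 78.5, ("g1", "g4"): 70.0,
    ("g2", "g3"): 79.0, ("g2", "g4"): 71.0, ("g3", "g4"): 86.1,
}
clusters = species_clusters(toy_genomes, toy_aai)
```

With these toy values the four genomes fall into two species-level groups, g1 with g2 and g3 with g4.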

Comparison of the 16 Nitrospira genomes was based on SEED subsystems (further details in Supplementary Information and Supplementary Figure S6) and pangenomic analysis (Fig. 2). The 59,744 coding sequences (CDS) of the 16 Nitrospira genomes clustered into 12,337 protein clusters (PCs), with a core Nitrospira genome consisting of 1382 PCs. The core genome includes genes for the nitrite oxidation pathway, the reductive tricarboxylic acid cycle for CO2 fixation (rTCA), gluconeogenesis, the pentose phosphate pathway, and the oxidative TCA cycle. Chlorite dismutase and copper-containing nitrite reductase (nirK) are also present in the core genome (Fig. 3). Thirty-five comammox-specific PCs were identified; 16 and 3 of these PCs had highest sequence similarity to homologs in β-AOB and methane oxidizers, respectively (Supplementary Table S2). In addition, we detected 57 and 52 comammox clade A- and clade B-specific PCs, respectively (Supplementary Table S2 and Fig. 3). The specific metabolic characteristics inferred from the annotations of genes with known homologs and identified with the pangenomic analysis are described below.

Fig. 2

Communality and uniqueness in the Nitrospira pangenome as derived from the clustering of 16 genomes based on 12,337 protein clusters (PCs). Each radial layer represents a genome, and each bar in a layer represents the occurrence of a PC (dark presence, light absence). Clade A and Clade B comammox Nitrospira genomes are denoted in green and blue, respectively

Fig. 3

Cartoon of core and specific key metabolic features in the Nitrospira pangenome, as predicted from genome annotation. AIO arsenite oxidase, CynS cyanate hydratase, FDH formate dehydrogenase, H2ase hydrogenase, MSP methionine salvage pathway, SOR sulfite dehydrogenase, TrHb2 2/2 hemoglobin group 2. Enzyme complexes of the electron transport chain are labeled by Roman numerals. * N. moscoviensis possesses an octaheme nitrite reductase (ONR) putatively involved in nitrite reduction to ammonia

Nitrogen metabolism

All the recovered genomes encoded the nitrite oxidoreductase (NXR), consistent with other Nitrospira spp. The copy number of NXR in the investigated genomes varied from one to two in most of the comammox genomes (except Ca. N. nitrificans, which has four copies), to five in N. moscoviensis (Supplementary Table S2). The ammonia oxidation pathway (AMO structural genes amoCAB and the putative AMO subunits amoEDD2, as well as genes for HaoAB and the associated cytochromes CycAB) is present in four of the newly recovered genomes (CG24_A, CG24_B, CG24_C and CG24_E).

Regarding nitrogen uptake, clade B and canonical Nitrospira genomes encode MEP-type ammonia transporters, which are also found in AOA [37]. Clade A genomes encode an ammonia transporter with high homology (>70% amino acid similarity) to Rh-type transporters found in most β-AOB (Supplementary Table S3). Experimental analysis in Nitrosomonas europaea showed that its Rh-type transporter proteins have ammonium affinity in the millimolar range and high uptake capacity [38], and similar results were observed for Rh-type ammonia transporters in other organisms [39]. In contrast, investigated MEP-type ammonium transporters have higher affinity (micromolar range) and lower uptake capacity [40, 41]. Whether these observations are also valid for Nitrospira spp. awaits experimental confirmation. Besides the uptake of exogenous ammonia by Nitrospira, ammonia can be produced through urea degradation as most of the comammox genomes harbor urease genes. This enzyme, which is functional in N. moscoviensis [11], is not detected in genomes of Nitrospira sublineage I. The comammox genomes contain a diversity of urea transporters. In addition to the high affinity urea ABC transporters (urtABCDE) present in some Nitrospira spp. such as N. lenta [11], comammox genomes also harbor two additional urea transporters: an outer-membrane porin (fmdC) involved in uptake of short-chain amides and urea at extremely low concentrations [42], and a urea carboxylase-related transporter (uctT). These extra urea transporters present in comammox Nitrospira might confer a competitive advantage in urea uptake with respect to other Nitrospira in environments with low and/or fluctuating urea concentrations. On the other hand, cyanate hydratase genes (cynS) are only detected in canonical Nitrospira. This may confer a benefit over comammox Nitrospira in environments with very low ammonium concentrations and the presence of cyanate coupled with reciprocal feeding with ammonia oxidizers [12]. All genomes except Ca. N. inopinata encode a NO2/NO3 transporter gene (narK). Furthermore, canonical Nitrospira genomes contain additional transporters for the uptake of nitrite: lineage I Nitrospira and CG24_D genomes harbor a nitrite transporter (nirC) and a formate/nitrite family transporter, while N. moscoviensis and CG24_D contain a nitrite-nitrate ABC transporter (nrtABC). The comammox genomes lack an assimilatory nitrite reductase (nirA) or octaheme nitrite reductase (ONR) (which has been proposed to be associated with nitrite reduction to ammonia in N. moscoviensis [11]), which would prevent comammox Nitrospira growth in the presence of nitrite as sole N source, as observed in Ca. N. inopinata [1].

Alternative electron donors

Some Nitrospira can utilize substrates beyond nitrite as electron donors including formate and hydrogen [7, 10, 11, 43]. The genes coding for formate dehydrogenase (fdh) are present in canonical Nitrospira and comammox clade B but were not detected in comammox clade A. This could provide an opportunity for niche-differentiation particularly in oxic–anoxic transition zones where formate can be found as a product of fermentation. With respect to hydrogen oxidation, the group 2a [NiFe] hydrogenase and accessory proteins involved in aerobic hydrogen oxidation in N. moscoviensis [10] are absent from the other Nitrospira genomes. However, all comammox clade A genomes encode a complete group 3b [NiFe]-hydrogenase. Although this bidirectional hydrogenase (sulfhydrogenase) is extensively distributed across the bacterial domain [44], little is known about its actual role (further details in Supplementary Information). Comammox genomes do not contain genes for a periplasmic sulfite:cytochrome c oxidoreductase (sulfite dehydrogenase, sorAB), which is characteristic of canonical Nitrospira. These genomes also possess the genetic inventory for sulfur assimilation, suggesting that this sulfite dehydrogenase might be involved in sulfite oxidation, as suggested for another NOB (Nitrospina gracilis) [45]. Lastly, a complete arsenite oxidase was exclusively found in Ca. N. defluvii. Although all examined genomes harbor arsenic resistance genes, only Ca. N. defluvii has the genetic potential for energy conservation during arsenite oxidation (Supplementary Information).

Energy conservation and transduction

All Nitrospira genomes harbor two homologous sets of genes encoding the complex I of the electron transport chain (NADH:ubiquinone oxidoreductase, NUO). One of these NUO lacks the subunits involved in NADH binding (NuoE and NuoF). This feature has been observed in other prokaryotes [46]; in Nitrospira spp. this incomplete complex I may be used for reverse electron transport from quinol to low-potential ferredoxin, as proposed in Nitrospina gracilis [45], thereby providing reduced ferredoxin required for CO2 fixation in the rTCA cycle. Additionally, CG24_A, Ca. N. nitrosa, AAU_MBR1, A2_bin and N. moscoviensis possess a third complete homologous set of genes coding for complex I, which is divergent from the other two, and is closely related to the γ-proteobacterial clade E type. This type of complex I has also been found in other distantly related organisms, including Nitrosospira multiformis and some Bacteroidetes spp., suggesting acquisition by horizontal gene transfer [47]. Although the exact function of this additional complex I is not known, differential expression of distinct complex I isozymes has been observed in other microorganisms, pointing towards physiological versatility [48, 49]. Also, differences with respect to complex II, the succinate:quinone oxidoreductase (SQR), were noted across genomes. SQR connects the TCA cycle to the quinone pool. This complex is classified into five types based on the number of membrane-bound domains and haem content [50]. Nitrospira genomes do not all contain the same SQR type. Clade B, CG24_D, and genomes belonging to Nitrospira lineage I contain a SQR type E, whereas clade A genomes and Nitrospira moscoviensis possess a SQR type B. Differences were also detected for the cytochrome bc complex (complex III) which is involved in the transfer of electrons from quinol to a c-type cytochrome. The canonical Nitrospira genomes contain at least two copies of complex III (three in N. moscoviensis and OLB3), while the comammox genomes harbor just one copy of this complex. With respect to cytochrome oxidases (complex IV), the studied genomes harbor a CDS similar to the uncharacterized but functionally transcribed ‘cyt bd-like oxidase’ (NIDE0901) of Ca. N. defluvii which was proposed to be a terminal oxidase [51]. In addition to this putative terminal oxidase, canonical Nitrospira (except CG24_D), comammox clade A genomes (excluding Ca. N. nitrosa) and the clade B GWW3 bin contain the cydAB genes encoding a heterodimeric cyt. bd quinol oxidase. The reaction catalyzed by this respiratory oxidase is not coupled to proton pumping [52], and may be important when electrons derive from low potential electron donors.

Carbon metabolism

Comammox Nitrospira, like other sequenced Nitrospira genomes [51], encode genes for glycolysis, gluconeogenesis, the pentose phosphate pathway and the TCA cycle. Moreover, clade B comammox genomes and AAU_MBR2 carry an acetate permease gene (actP). Nitrospira genomes have the potential to degrade catechol and protocatechuate (Supplementary Table S2). Although these degradation pathways are only partially complete in Nitrospira genomes, comammox Nitrospira possess an intradiol ring-cleavage dioxygenase, which could transform catecholate derivatives to TCA cycle intermediates [55]. Additionally, comammox clade B and N. moscoviensis harbor a 4-oxalocrotonate tautomerase, essential in the conversion pathway of various aromatic compounds such as catechol to intermediates for the TCA cycle [54]. The phaZ gene, which codes for the polyhydroxybutyrate (PHB) depolymerase involved in PHB degradation, was exclusively found in comammox Nitrospira genomes. Genes related to PHB synthesis are not found in the studied genomes, hence whether the PHB depolymerase is a relic or has any functional role remains unknown. Overall, the presence of enzymes and pathways involved in the degradation of different carbon sources suggests that comammox Nitrospira have the potential to grow mixotrophically, as reported for other Nitrospira spp. [43, 55, 56].

Stress response, resistance and defense

Genes encoding catalase (cat) or superoxide dismutase (sod), providing protection against reactive oxygen species (ROS), are absent from the closely related comammox genomes Ca. N. nitrosa and AAU_MBR1 as well as from Ca. N. defluvii. On the other hand, CG24_A and N. moscoviensis harbor diverse genes related to ROS protection including one catalase and one peroxidase (two in both cases for N. moscoviensis), and two dissimilar superoxide dismutases. The remaining genomes encode either catalases or superoxide dismutases, and all genomes harbor the putative ROS defense mechanisms predicted for Ca. N. defluvii (consisting of cyt. c peroxidases, thioredoxin-dependent peroxiredoxins, manganese, bacterioferritin and carotenoids) [51]. Contrary to canonical Nitrospira, the comammox genomes contain a 2/2 hemoglobin type II (TrHb2), which has been associated with oxidative stress resistance and oxygen scavenging [57, 58]. Thus, the distinct capacity to deal with oxidative stress within Nitrospira spp. may enable these organisms to coexist by occupying different microniches in environments with oxygen gradients such as biofilms and flocs. Most of the genomes encode two proteins homologous to RsbUV, related to environmental stress response, and comammox clade A genomes possess a fusion protein distinct from other Nitrospira spp. that is homologous to RsbUVW. N. moscoviensis also encodes all the genes for the stressosome (rsbRSTX), a complex that controls several signaling pathways in response to diverse environmental stresses [59] (further details in Supplementary Information). Hence, based on these genomic characteristics, comammox clade A and especially canonical N. moscoviensis might be better adapted to changing environmental conditions than other Nitrospira. On another note, genes associated with low-resource environments were encountered in some of the Nitrospira genomes. 
Comammox Nitrospira may have higher requirements for copper than canonical Nitrospira, as ammonia monooxygenase contains Cu as a cofactor. Unlike canonical Nitrospira, the comammox genomes contain Cu homeostasis genes (copCD and copAB) with high sequence similarities to homologs in β-AOB. These proteins confer higher Cu2+ tolerance or increased Cu2+ uptake [60, 61] and may allow comammox Nitrospira to survive in conditions of low Cu2+ availability. Canonical Nitrospira genomes harbor the genes for the cytochrome c biogenesis system II. In contrast, comammox genomes contain the genes for the cytochrome c biogenesis system I. While system II requires less energy for cytochrome synthesis, system I is advantageous in iron-limited environments as this system has hemes with higher iron affinity [62]. As nitrifiers have a high iron requirement for their [FeS] cluster- and heme-containing enzymes, this could provide an important competitive advantage for comammox Nitrospira in iron-limited environments [63]. Comammox Nitrospira clade B genomes contain the methionine salvage pathway, which was not detected in clade A or canonical Nitrospira genomes (Fig. 3 and Supplementary Table S2). This pathway is involved in the recycling of sulfur-containing metabolites to methionine and is upregulated under sulfur-limiting conditions [64, 65]. Thus, comammox clade B could better persist during periods of sulfur depletion. Genes related to arsenic and mercury resistance were found in all studied genomes. Especially striking were the cases of comammox CG24_B and AAU_MBR1, which contain phylogenetically distinct chromate, arsenic and mercury resistance proteins in a putative integrative conjugative element (Supplementary Information).

Comparison between comammox Nitrospira and ammonia oxidizing microorganisms

Comammox Nitrospira generally harbor a single copy of AMO and HAO genes (Ca. N. nitrosa contains two amoA genes and all comammox genomes harbor an extra amoC besides the one present in the amo operon). A similar scenario is observed for γ-proteobacterial AOB (γ-AOB) and AOA. In contrast, β-AOB genomes are characterized by the presence of two or three copies of amo and hao gene clusters. Differences are also detected for genes related to NOx metabolism. Copiotrophic AOB genomes generally contain cytochrome c nitric oxide reductase (cNOR), heme-copper nitric oxide reductase (sNOR), nitrosocyanin, and cytochrome P460. Oligotrophic AOB genomes lack sNOR and in some cases nitrosocyanin or cNOR (Supplementary Table S3). On the other hand, comammox Nitrospira genomes lack all four of these NOx metabolism-related proteins, as also observed for AOA genomes (Supplementary Table S3). This lack of NOx metabolism-related proteins may be associated with low ammonium concentration environments where nitrosative stress would be minor. Nevertheless, the investigated comammox Nitrospira genomes contain two proteins putatively linked to resistance to nitrosative stress: a NO-responsive regulator (nnrS) and a 2/2 hemoglobin type I (TrHb1), which are not found in AOA. The different AOP groups also clearly differ in their carbon fixation pathways. AOB and AOA utilize the oxygen-tolerant Calvin–Benson–Bassham and hydroxypropionate–hydroxybutyrate cycles, respectively, while comammox Nitrospira possess the microaerophilic related rTCA pathway [66]. Furthermore, canonical AOP genomes encode the low-affinity aa3-type heme-copper oxidase (Nitrosomonas eutropha also contains a high-affinity cytochrome c oxidase cbb3). In contrast, comammox Nitrospira harbor cytochrome bd-like oxidases. 
Cytochrome bd oxidases have high affinity for O2 [67] and are expressed under O2-limited conditions in other microorganisms [68, 69], but it remains unknown whether this is also the case for the cyt bd-like oxidases of Nitrospira spp. Additionally, the 2/2 hemoglobin type II (TrHb2), associated with oxidative stress resistance and oxygen scavenging, which was detected in comammox Nitrospira, is not universal for AOB (only present in Nitrosospira spp.) and has not been detected in AOA. It remains to be experimentally confirmed whether these low oxygen-tolerance features are also physiologically expressed in comammox Nitrospira and may provide an advantage for growth in microaerophilic environments. Another characteristic of comammox Nitrospira is their genetic potential to compete under phosphorus- and copper-limiting conditions. Comammox Nitrospira genomes contain an alkaline phosphatase (phoD), which has been found to be highly expressed under phosphorus limitation and starvation in other microorganisms [70, 71]. This enzyme was not detected in AOA genomes and is not universal in AOB (putative homologs are present in Nitrosomonas sp. Is79A3, Nitrosococcus halophilus and Nitrosospira spp.). In relation to Cu homeostasis, while copCD genes are present in both AOB and comammox genomes, the genes copAB detected in comammox Nitrospira are not common for AOB as just a few species harbor homologs of these genes (Nitrosomonas europaea, Nitrosomonas eutropha, Nitrosospira multiformis and Nitrosospira sp. Nv17). Although AOA seem to have a high Cu2+ demand due to their high presence of copper-containing proteins, their Cu2+ acquisition mechanisms are unknown [72]. These two aspects could be of great importance as both phosphorus and copper deficiency can impact nitrification in engineered environments [73,74,75]. Recently, it was described that N. inopinata, the only comammox pure culture to date, has a high affinity for ammonia similar to AOA and much higher than AOB cultures [76]. Taken together, comammox Nitrospira might be adapted to nutrient-limited conditions. Consistently, comammox Nitrospira have so far mainly been detected in substrate-limited environments as well as in low oxygen or microaerophilic environments [1, 2, 6, 77,78,79,80,81].

Horizontal gene transfer events shaped comammox Nitrospira

The AMO and HAO sequences of β-AOB are most closely related to those of comammox Nitrospira [1, 2, 6, 14]. In fact, the sequence similarity of amo and hao genes between β-AOB and comammox Nitrospira (ca. 60% amino acid sequence identity for amoA and ca. 66% for haoA) is higher than between β-AOB and γ-AOB (ca. 45% for amoA and ca. 53% for haoA), even though the latter two groups belong to the same phylum (Proteobacteria) and are only distantly related to the Nitrospirae phylum. Discrepancies between gene-trees and species-trees can be caused by horizontal gene transfer (HGT; in this case, for instance, of amo and hao genes between comammox Nitrospira and β-AOB), or by gene duplication followed by differential loss (one copy lost in one group, the other copy lost in a second group). To explore whether HGT has played a role in the evolutionary history of comammox Nitrospira and β-AOB, we compared the sequence dissimilarity of proteins related to the ammonia oxidation pathway and of 18 housekeeping proteins to that of a set of 14 ribosomal proteins (Supplementary Table S1). We included sequences from the 11 comammox genomes investigated in this study as well as from eight and two previously published β-AOB and γ-AOB genomes, respectively (Supplementary Table S4). Pairwise comparisons showed that the dissimilarities of housekeeping proteins are linearly related to those of the ribosomal proteins across essentially all genomes (Fig. 4 and Supplementary Figure S7), suggesting vertical inheritance. However, for most of the proteins associated with ammonia oxidation, the relationship is different, with obvious discontinuities (Fig. 4 and Supplementary Figure S8). Generally, the sequence dissimilarities between β-AOB and comammox genomes are smaller than expected, whereas those between β-AOB and γ-AOB genomes are larger than expected (Fig. 4, Supplementary Figure S8, Supplementary Table S5 and Supplementary Information). Additionally, in some cases, the sequence dissimilarities between β-AOB and comammox genomes are as high as those between comammox clade A and clade B (Supplementary Figure S8), although these two clades are closely related based on ribosomal proteins (Supplementary Figure S2).

Fig. 4

Relationship between average phylogenetic distance of genomes and protein sequence divergence for housekeeping proteins (left) and for ammonia-oxidation related proteins (right) for comammox Nitrospira, β-AOB and γ-AOB genomes. Boxplots are colored according to the groups to which the compared genomes belong. The y-axis shows the pairwise protein dissimilarity (fraction of differing amino-acid sites) for a set of 18 housekeeping proteins or 27 ammonia-oxidation related proteins, while the x-axis shows the corresponding pairwise dissimilarity for a set of 14 ribosomal proteins. Asterisks (* and **) indicate non-significant (P > 0.01) and significant (P < 0.01) differences, respectively, between the mean sequence dissimilarities of the two compared groups

This pairwise dissimilarity analysis indicates that the evolutionary history of the ammonia oxidation-related genes is complex, and it is consistent with the occurrence of horizontal transfer(s) between comammox Nitrospira and β-AOB.
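The comparison described above can be illustrated with a minimal Python sketch. It assumes that dissimilarity is the fraction of differing amino-acid sites between two aligned sequences (as stated in the caption of Fig. 4) and that gapped columns are skipped, which is an assumption of this sketch rather than a documented detail of the analysis. The sequences, genome labels and function names are hypothetical toy values, not data from this study.

```python
from itertools import combinations

GAP_CHARS = {"-", "."}  # assumption: columns with a gap in either sequence are ignored


def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def pairwise_dissimilarity(alignment):
    """Map each genome pair (sorted by name) to its p-distance."""
    return {
        (g1, g2): p_distance(alignment[g1], alignment[g2])
        for g1, g2 in combinations(sorted(alignment), 2)
    }


def ratio_to_reference(target, reference):
    """Target dissimilarity divided by ribosomal dissimilarity, per genome pair."""
    return {
        pair: target[pair] / reference[pair]
        for pair in target
        if reference.get(pair, 0) > 0
    }


# Hypothetical toy alignments (10 sites each), for illustration only.
ribosomal = {
    "comammox_A": "MKTAYIAKQR",
    "comammox_B": "MKTAYIAKQK",
    "beta_AOB": "MRSGWLVEHD",
}
amo_like = {
    "comammox_A": "MSLFRTDEWK",
    "comammox_B": "MSIYRTEEWR",
    "beta_AOB": "MSIFRTEEWK",
}

ribo_d = pairwise_dissimilarity(ribosomal)
amo_d = pairwise_dissimilarity(amo_like)
ratios = ratio_to_reference(amo_d, ribo_d)

for pair in sorted(ratios):
    print(pair, f"ribosomal={ribo_d[pair]:.2f}", f"target={amo_d[pair]:.2f}",
          f"ratio={ratios[pair]:.2f}")
```

In this toy case the ratio is well below 1 for the β-AOB versus comammox pairs and well above 1 for the clade A versus clade B pair, which is the kind of departure from a single linear trend that the text describes. A vertically inherited protein would be expected to give similar ratios across all genome pairs.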

We subsequently analysed the genomic arrangement of ammonia oxidation-related genes for all comammox genomes and for representative β-AOB genomes (Fig. 5 and Supplementary Figure S9). For all comammox genomes except Ca. N. inopinata, the AMO and HAO gene clusters as well as the cytochrome c-biosynthesis genes are situated in the same genomic region. This is not the case for AOB genomes. Additionally, comammox genomes have two copies of amoD, while AOB have only one copy. Among the comammox genomes, clade B genomes uniquely contain a duplication of one of the cytochrome c-biosynthesis genes (encoding CcmI). Clade A genomes have two copies of a gene coding for a hypothetical protein next to the hao operon, while comammox clade B and AOB genomes have only one copy (more details of the genomic arrangement analysis are given in Supplementary Information). The features in the genomic regions encoding the ammonia oxidation pathway that are shared between the two comammox clades, but not with other ammonia oxidizers, suggest that the comammox clades have a common ancestor for this specific genetic capacity.

Fig. 5

Schematic of the ammonia oxidation pathway genomic region as well as other AOB-related genomic features in comammox Nitrospira clade A (Ca. N. nitrificans), clade B (GWW3 bin) and selected ammonia-oxidizing bacteria (Nitrosospira multiformis and Nitrosomonas europaea). Homologous genes are connected by lines. Functions of the encoded proteins are represented by color. Parallel double lines designate a break in locus organization (other genes lie between the genes of interest). A single line designates a break probably due to contig fragmentation. The position of the blocks denotes the orientation of the coding strand (upper strand for forward; lower strand for reverse)

A reconciliation analysis was performed to stringently examine the possible occurrence of gene duplication(s), gene loss(es), and horizontal gene transfer(s). First, we constructed a clock-model based species-tree from the 14 ribosomal proteins (Supplementary Table S1), including the 16 examined Nitrospira genomes, Leptospirillum ferrooxidans (of the Nitrospirae phylum), eight and two publicly available β-AOB and γ-AOB genomes, respectively, and two additional non-nitrifier β- and γ-proteobacterial genomes closely related to β-AOB and γ-AOB, respectively (Supplementary Figure S10 and Supplementary Table S4). Gene trees were subsequently constructed for the ammonia oxidation-related genes under investigation (Supplementary Table S4). The probabilistic (Bayesian) analysis provided strong support for horizontal transfer of the majority of the investigated genes (Supplementary Table S6): for 12 genes, the posterior probability of at least one transfer event was between 95 and 100%, and for another 10 genes it was between 80 and 95%. For all investigated genes, we found very strong support (95–100%) for either a transfer or a duplication event.
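As an illustration of how such support values can be read, the following Python sketch treats the posterior probability of "at least one transfer event" as the fraction of posterior samples in which one or more transfers were inferred. This is a common interpretation of sampled Bayesian output and is an assumption here, not a description of the software used in this study. The per-sample event counts, the function names and the handling of the 80% and 95% bin boundaries are all hypothetical.

```python
def posterior_at_least_one(samples, *event_types):
    """Fraction of posterior samples with >= 1 event of any of the given types."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    if not event_types:
        raise ValueError("name at least one event type")
    hits = sum(1 for s in samples if sum(s.get(e, 0) for e in event_types) >= 1)
    return hits / len(samples)


def support_category(prob):
    """Bin a probability into the ranges quoted in the text (boundaries assumed)."""
    if prob >= 0.95:
        return "95-100%"
    if prob >= 0.80:
        return "80-95%"
    return "<80%"


# Hypothetical event counts for one gene across 20 posterior samples.
samples = (
    [{"transfers": 1, "duplications": 0}] * 18   # a transfer is inferred
    + [{"transfers": 0, "duplications": 1}] * 1  # a duplication only
    + [{"transfers": 0, "duplications": 0}] * 1  # neither event
)

p_transfer = posterior_at_least_one(samples, "transfers")
p_either = posterior_at_least_one(samples, "transfers", "duplications")

print("transfer:", p_transfer, support_category(p_transfer))
print("transfer or duplication:", p_either, support_category(p_either))
```

The toy gene falls in the 80 to 95% range for transfer alone but in the 95 to 100% range for "transfer or duplication", mirroring how a gene can have strong support for some discordance-generating event even when the transfer itself is less certain.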

The evolutionary history of amoA and haoA was further investigated. For amoA, the reconciliation model suggests a gene transfer event from β-AOB to an ancestor of comammox Nitrospira (Fig. 6). In addition, a transfer of amoA from the common ancestor of Ca. N. nitrosa and AAU_MBR_1 to Ca. N. inopinata was inferred. This transfer would explain the deviation in the tetranucleotide pattern of the amoCAB-containing region compared to the genome-wide signature identified in Ca. N. inopinata by Daims et al. [1]. Furthermore, unlike the other comammox genomes, Ca. N. inopinata does not have the amo operon located in the same genomic region as the hao operon and the genes for cytochrome c-biogenesis system II (Supplementary Figure S9).
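A deviation in tetranucleotide pattern of the kind mentioned above can be sketched as follows. This simplified Python example counts overlapping 4-mers on one strand only and compares the profile of a region with the genome-wide profile using a Pearson correlation; it is not the procedure used by Daims et al. [1], and published tetranucleotide analyses often add strand symmetrization or z-score normalization. All sequences and names below are hypothetical toy values.

```python
from collections import Counter
from itertools import product
from math import sqrt

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 4-mers


def tetranucleotide_profile(seq):
    """Relative frequency of each overlapping 4-mer (single strand, ACGT only)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        raise ValueError("sequence has no unambiguous tetranucleotides")
    return [counts[t] / total for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy sequences: a "genome", a native window cut from it,
# and a compositionally distinct window standing in for acquired DNA.
genome = "ATGCGATCGTAGCTAGCTAGGATCCGATCGATTAGC" * 20
native_region = genome[100:400]
foreign_region = "GGGCCCGGGCCCGCGCGGCC" * 15

genome_profile = tetranucleotide_profile(genome)
r_native = pearson(genome_profile, tetranucleotide_profile(native_region))
r_foreign = pearson(genome_profile, tetranucleotide_profile(foreign_region))

print(f"native window vs genome:  r = {r_native:.3f}")
print(f"foreign window vs genome: r = {r_foreign:.3f}")
```

A window whose correlation with the genome-wide profile is markedly lower than that of other windows is a candidate for recently acquired DNA, although compositional deviation alone does not establish the donor or the direction of transfer.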

Fig. 6

Reconciliation of functional gene trees (based on amino acid sequences; top: AmoA; bottom: HaoA) with the species tree for comammox Nitrospira, β-AOB, and γ-AOB genomes. The species tree, based on 14 ribosomal proteins, is shown in gray with the gene-trees superimposed on top in narrower black lines. Arrows and red dots denote transfer and loss events, respectively. The displayed tree is the most parsimonious tree. Dashed arrows indicate an alternative reconciliation scenario

The high dissimilarity between the amoA sequences of the otherwise closely related clade A and clade B comammox genomes (Supplementary Figure S10) may be explained by adaptation of the comammox organisms to different niches. Based on the pairwise sequence dissimilarity analysis, the ammonia oxidation-related proteins evolve faster in clade B comammox genomes than in clade A genomes (P < 0.0001; for AmoA alone, P < 0.0005), while the evolution rate of the housekeeping proteins does not differ between the two clades (P > 0.5).

Regarding haoA, the reconciliation model predicted two transfer events (Fig. 6). As for amoA, a transfer from β-AOB to an ancestor of comammox Nitrospira is suggested. A second transfer was inferred from clade B to the ancestor of the clade A genomes A2 and CG24_B. Consistent with this hypothesis, A2 and CG24_B contain genes coding for a copper-containing nitrite reductase next to the HAO cluster, as is also observed for clade B genomes (Supplementary Figure S10).

Besides the scenarios predicted by the reconciliation model, alternative scenarios cannot be ruled out because of the uncertainties associated with the limited number of sequenced Nitrospira and AOB genomes. One alternative scenario would be HGT of amoA and haoA from comammox Nitrospira to β-AOB (Fig. 6). In another possible scenario, both comammox Nitrospira and β-AOB would have acquired the ammonia-oxidation genes from an unknown third donor. This final scenario was tested in our reconciliation analysis by including the option of transfers to and from unsampled species. The potential involvement of an unknown donor in the evolutionary history of amoA cannot be discarded, although these scenarios are less parsimonious than the one shown in Fig. 6. On the other hand, no evidence of HGT from an unsampled species was detected for haoA.

As comammox clade A and clade B do not form a monophyletic group within Nitrospira sublineage II (Supplementary Figure S2), loss events for both amoA and haoA are expected in some of the Nitrospira spp. that are phylogenetically placed between these clades, such as N. moscoviensis and CG24_D. One possible explanation for the loss of ammonia oxidation capacity may be the postulated trade-off between rate and yield of ATP production [13]. Under this hypothesis, shortening the nitrification pathway would lead to an increased specific growth rate, which could be advantageous in some scenarios. Thus, the lack of the putatively acquired AMO and HAO-related genes in extant canonical Nitrospira spp. could be the result of selection for optimal pathway length.

The clock-based species-tree estimated the separation between clade A and clade B comammox genomes to have occurred approximately 300 ± 90 MYA, while the split between Nitrospira lineage I and II was dated at 375 ± 50 MYA (Supplementary Figure S10). Our model points towards an earlier transfer event for both amoA and haoA, although it was not possible to date the HGT event with a high level of certainty. Hence, it remains unknown whether additional loss events would be required to explain the lack of comammox genes in Nitrospira genomes belonging to lineages other than lineage II.

Conclusions

In summary, our findings reveal diverse genetic capabilities of the two comammox clades, canonical Nitrospira and canonical ammonia oxidizing prokaryotes. The absence of NOx metabolism pathways in the studied comammox genomes, as observed in AOA and some oligotrophic AOB strains in comparison with copiotrophic AOB, and the diverse genetic capacity for tolerance of low micronutrient concentrations, together with the dominant detection of comammox Nitrospira in substrate-limited environments, might indicate an oligotrophic lifestyle of comammox Nitrospira. In fact, an oligotrophic lifestyle has been experimentally demonstrated in N. inopinata [76], the only comammox organism isolated to date.

Additionally, we identified a high probability of transfer events between β-proteobacterial ammonia oxidizers and comammox Nitrospira for genes belonging to the ammonia oxidation pathway. Together, these results expand our knowledge of the ecology and evolution of the recently discovered comammox Nitrospira. Further discovery and characterization of new comammox Nitrospira genomes, as well as of canonical NOB and AOP genomes, will help confirm these observations.