Stepwise pathway for early evolutionary assembly of dissimilatory sulfite and sulfate reduction

Microbial dissimilatory sulfur metabolism utilizing dissimilatory sulfite reductases (Dsr) influenced the biochemical sulfur cycle during Earth’s history and the Dsr pathway is thought to be an ancient metabolic process. Here we performed comparative genomics, phylogenetic, and synteny analyses of several Dsr proteins involved in or associated with the Dsr pathway across over 195,000 prokaryotic metagenomes. The results point to an archaeal origin of the minimal DsrABCMK(N) protein set, having as primordial function sulfite reduction. The acquisition of additional Dsr proteins (DsrJOPT) increased the Dsr pathway complexity. Archaeoglobus would originally possess the archaeal-type Dsr pathway and the archaeal DsrAB proteins were replaced with the bacterial reductive-type version, possibly at the same time as the acquisition of the QmoABC and DsrD proteins. Further inventions of two Qmo complex types, which are more spread than previously thought, allowed microorganisms to use sulfate as electron acceptor. The ability to use the Dsr pathway for sulfur oxidation evolved at least twice, with Chlorobi and Proteobacteria being extant descendants of these two independent adaptations.


Tree reconstruction statistics
For each protein set four phylogenies were reconstructed with firstly running the best model selection [1] (Supplementary Table 5) and, in the second phylogeny, with the amino acid substitution model LG+I+G4.An additional set of phylogenies was performed, with a reduction of sequences to proteins encoded in only complete genomes, to investigate the impact of metagenome-derived sequences (Supplementary Table 3).Phylogenies with only sequences from complete genomes were also computed with the model LG+I+G4 and additionally with the ModelFinder of IQ-tree [1].24 out 42 phylogenies were reconstructed with the identically chosen best model LG+I+G4 according to BIC (Bayesian information criterion), in seven cases the model WAG+I+G4 was chosen.In the remaining 17 cases the best model was always a LG model [2] with different combinations of parameters including the proportion of invariable sites +I, empirical frequencies +F, and discrete Gamma model with four categories +G4 (Supplementary Table 5).
Four phylogenies were reconstructed for each of the 15 Dsr proteins, however, some phylogenetic reconstructions needed to be excluded from further analysis.The proteins DsrE, DsrF, and DsrH belong to the DsrEFH sulfurtransferase complex [3].The DsrE proteins bind sulfur with an conserved cysteine residue and transfer it to the DsrC protein [3].Although DsrH has also one conserved cysteine, DsrE proteins are directly involved in the transfer and maintain a stronger phylogenetic signal.In contrast, the small DsrF and DsrH protein sequences create 'star-like' topologies and do not allow conclusive results.The same issue exists for the small sequences of DsrT proteins.Another problem occurs for DsrD proteins.
The around 80 amino acids long DsrD sequences (Supplementary Table 4) have only low allvs.-allsequence similarities, mostly below 25% global identity.Thus, the phylogeny creates artifacts with highly supported clades corresponding to the sequence identities below 25%.

Effect of metagenomic data in DsrA(B) phylogenies
We used three different approaches to reconstruct the phylogenies of DsrA(B) proteins: using sequences only from complete genomes, including all metagenomic diversity and using the paralogous rooting analysis.Both DsrA and DsrB clades mirror each other with the main differences being in missing representatives in either of the clades since multiple metagenome-derived lineages encode for only one of the two proteins due to their genome incompleteness.Further, some clades within the bacterial reductive-type DsrA and DsrB proteins have low phylogenetic resolution and are poorly supported.
While two of our methodological approaches retrieved the expected phylogeny, the inclusion of the extended diversity of Dsr protein sequences within a larger phylogeny led to a changed topology.In this phylogeny, the archaeal reductive-type DsrA proteins branch from within the bacterial reductive-type DsrA proteins (Supplementary Fig. 2a), instead of being inbetween the reductive-type and oxidative-type bacterial clades (Supplementary Fig. 2b), as observed previously [4][5][6].Here, the MAD method [7] separates the proteins by function, with the root between oxidative-type and reductive-type proteins (Supplementary Fig. 2a).
Although not discussed, a similar topology was also found in a recent study in which several new Dsr-containing lineages were identified [8].

Alternative scenarios regarding the origin of DsrABCMK(N)-dependent sulfite reduction in Archaea
Besides the assembly of the DsrABC, DsrMK and DsrN pathway in Archaea, leading to an archaeal origin of this pathway (but not necessarily of the ancestral modules at the origin of each one of these protein families), alternative hypotheses were or can be put forward regarding the appearance of the ancestral module: i) the ancestral version of the pathway was present within the last universal common ancestor.This version would have been maintained within the archaeal domain and lost in the majority of Bacteria, with the ancestral dsrAB genes being kept only in Moorella spp.(2 nd copy) and the ancestral dsrMK in the remaining Clostridia representatives.This hypothesis is not supported by our as well as other's analyses [4,6,9].Briefly, although sulfite and sulfate records possibly trace back to 3.47 Gya [10], it is though that more reduced compounds (in this case sulfide) would have been the predominant sulfur species at the time of the origin of Life [11].Taking into account the early-branching lineage analysis of the minimal set, this hypothesis is further unlikely, as it invokes multiple losses and regains (replacements) of the same metabolic pathway within Bacteria, especially within Clostridia genera.
Another hypothesis would be for the pathway to have originated within a so far unidentified or extinct bacterial lineage, and to have been acquired early in Archaea by possibly several interdomain LGT events.In this case, within Bacteria, the pathway would have evolved faster that in Archaea, where multiple events of gene losses, LGT, and replacements would have occurred.This hypothesis is unlikely as only the genes from archaeal assemblies, and not Bacteria, consistently branch basal within the phylogenies presented here (Supplementary Fig. 1, Supplementary Figs.3-6).

Intertwined evolution of QmoABC and AprAB
The AprAB-Sat-Qmo complexes are necessary for sulfate activation to APS and its reduction to sulfite [12][13][14][15][16].The AprA(B) phylogenies show a basal clade containing archaeal-type AprA(B) from sulfite reducers (where the sulfate reducers Vulcanisaeta moutnovskia [14] is included) having as sister clade SOB with the AprM protein.On the other side are several AprA clades of QmoABC-containing lineages, with AprAB sequences from Chlorobi SOB (with the QmoABC complex) branching in between.Other AprA from SOB with the QmoAB-HdrBC complex group in a later diverged clade.This functional separation into clades corresponding to the presence (or absence) of the electron transfer units Qmo and AprM was also previously reported [14,17] .Here, we have further observed that besides the AprA clades corresponding to the QmoABC complex, a clade containing putative sulfate reducers, with variations of the canonical QmoABC complex is also found (Supplementary Figure 11) where the QmoAB-HdrD/HdrBC from Clostridia, Deltaproteobacteria, and several unclassified lineages are found.This tends to be associated with the synteny of AprAB linked to Qmo clades, sometimes with Sat in one genomic arrangement.For instance, AprAB clades of Deltaproteobacteria are divided into three major clades according to their taxonomic order, having Thermodesulfobacteria as sister clade.This organization occurs also in QmoB and QmoC phylogenies, and corresponds to the Sat-AprAB-[typeI-QmoABC] operon.
Based on our synteny analysis, in most bacteria where the type I QmoABC complex is present, the DsrJOP, DsrT, and DsrD proteins are co-distributed (Fig 3 .,Supplementary Table 1).An exception to this observed trend is found in Ca.Acidulodesulfobacterales and Nitrospirae that branch basal to the oxidative-type DsrK proteins (Fig. 3.).In these lineages, the DsrMKJOP complex and the Sat-AprAB-QmoABC cascade are incomplete, and, if present, are a mixture of both reductive-and oxidative-type proteins.Ca.
Acidulodesulfobacterales were proposed to be able to switch between oxidation and reduction of sulfur compounds [18], which may also be the case for these Nitrospirae lineages.These correspond to exceptions of this grouping, where Ca.Acidulodesulfobacterales and Nitrospira present in AprAB do not form a consistent clade across Qmo phylogenies (represented in green in Supplementary Figure 11), since the individual subunits of their Qmo complex are from type-I (QmoAB) and type-II (QmoC) in the case of Ca.Acidulodesulfobacterales.In summary, the overall topologies of AprA, QmoB, and QmoC (Supplementary Figure 11) show the existence of six major groups in AprAB phylogenies, four of which are also found within QmoABC proteins indicating their co-evolution over time.

Supplementary Table 1 | Syntenic regions of DiSCo/diamond blastp hits across 2,070
genomes with at least one hit to the minimal protein set DsrABCMK.Each syntenic region is separated by a semicolon.Proteins with hits for two non-homologous proteins were identified as fusion proteins and are indicated with ProteinX+ProteinY.Hits to AsrC using the TigrFam TIGR02912 profile are also listed.LG+I+G4 LG+I+G4

Fig. 1 | 18 Archaeoglobus
Genomic neighborhood of genes encoding for Dsr proteins branching basal in the Dsr phylogenies.Genes whose protein sequences branch in basal clades in the corresponding phylogenies are colored in orange, the ones at the base of the bacterial reductive-type clade in red, the remaining sequences located within non-basal clades in dark grey.Phylogenies with low resolution (DsrD; DsrPJT) or type-specific proteins (DsrE/DsrL) were not classifiable regarding basal clades, and only the presence is indicated in black.Co-syntenic non-dsr genes are indicated with light gray and pseudo genes in light brown.Letters in genomic colocalizations indicate the subunit of Dsr proteins.Ca.Hydrothermarchaeota archaeon JdFR−

N M Supplementary Fig. 2 |Supplementary Fig. 3 |
Phylogenetic reconstruction sulfite reductases a) of DsrA proteins from (meta)genomes and b) of DsrA proteins from complete genomes.(models for phylogenetic reconstructions: LG+I+G4).Distribution and phylogenetic clade classification of Dsr proteins across Archaea.Color code according to Supplementary Figure 1.

Table 1
is provided as separate Excel file.

Table 2 | Query sequences used for diamond blastp.
Sequences without reference are homologues present in the genomes of query sequences.

Table 2 |
Query sequences used for diamond blastp.Continued.

Table 3 | Redundancy reduction using a 90% global identity cut-off.
The total number of identified sequences, unique sequences, sequences after the redundancy filtering used in phylogenetic analysis, and sequences belonging to complete genomes is given.

Table 4 |
Alignment lengths before and after trimming.The number of positions of each multiple sequence alignment before and after trimming using a 95% gap threshold is given for the two sequence sets.

Table 5 | Best model for each Dsr protein set according to BIC.
Amino acid substitution model of the phylogenies of Dsr proteins present in only complete genomes and phylogenies of the sequences set covering the full diversity including metagenomederived genomic assemblies.

Distribution and phylogenetic clade classification of Dsr proteins across Acidobacteria, Elusimicrobia, FCB group, Nitrospinae, and Nitrospirae.
Color code according to Supplementary Figure1.
Nitrospirae Supplementary Fig. 5 | Distribution and phylogenetic clade classification of Dsr proteins across Proteobacteria.Color code according to Supplementary Figure 1.