An ancient metalloenzyme evolves through metal preference modulation

Evolution creates functional diversity of proteins, the essential building blocks of all biological systems. However, studies of natural proteins sampled across the tree of life and evaluated in a single experimental system are lacking. Almost half of enzymes require metals, and metalloproteins tend to optimally utilize the physicochemical properties of a specific metal co-factor. Life must adapt to changes in metal bioavailability, including those during the transition from anoxic to oxic Earth or pathogens’ exposure to nutritional immunity. These changes can challenge the ability of metalloenzymes to maintain activity, presumptively driving their evolution. Here we studied metal-preference evolution within the natural diversity of the iron/manganese superoxide dismutase (SodFM) family of reactive oxygen species scavengers. We identified and experimentally verified residues with conserved roles in determining metal preference that, when combined with an understanding of the protein’s evolutionary history, improved prediction of metal utilization across the five SodFM subfamilies defined herein. By combining phylogenetics, biochemistry and structural biology, we demonstrate that SodFM metal utilization can be evolutionarily fine tuned by sliding along a scale between perfect manganese and iron specificities. Over the history of life, SodFM metal preference has been modulated multiple independent times within different evolutionary and ecological contexts, and can be changed within short evolutionary timeframes.

Article https://doi.org/10.1038/s41559-023-02012-0 to those of dimeric SodFM1s (Figs. 1c and 2b). SodFM5s form a sister group to SodFM3s in phylogenetic trees (Extended Data Fig. 2b,e) and structural comparisons (Extended Data Fig. 2c,d), and consist mostly of eukaryotic mitochondrial SodFMs (for example, human mitochondrial Mn-SOD) and a few bacterial sequences including that of Parachlamydia acanthamoebae (Fig. 2a). Grouping of the highly distinct mitochondrial SODs with sequences from Chlamydia could represent functional convergence, eukaryote to prokaryote lateral gene transfer (LGT) or be evidence of ancient LGT between the bacterial ancestor of mitochondria and that of Chlamydia.

SodFM metal preference is continuous rather than discrete
Within subfamilies SodFM1, SodFM2 and SodFM4, we identified examples of cambialistic enzymes 12,18 , and four subfamilies (SodFM1-4) had members that exhibited multiple different metal utilization preferences (Fig. 2a). We thus leveraged our large set of natural enzymes sampled from across the tree of life (Fig. 2a) to investigate if metal preference is a discrete three-state property (Fe, Mn or cambialistic), or rather forms a range of values between perfect Fe and Mn specificities. Although the term 'metal-specificity' is commonly used in reference to the disproportional activity of a SOD enzyme with Fe and Mn, the cambialistic SODs are inherently non-specific for either metal as they can use both 13 . Across the tested SodFMs, metal preferences, represented by approximate cambialism ratios (aCR, equal to the Fe-dependent activity divided by the Mn-dependent activity), formed a range of values ( Fig. 2a-c) between higher Fe preference (aCR >2) and higher Mn preference (aCR <0.5), with perfect cambialism at the midpoint (aCR of 1), indicating that metal preference is best thought of as a continuum. We propose the term 'metal preference' instead of 'metal specificity' to better reflect the observation that most SodFMs are not completely metal specific. Note that the terms metal preference and metal specificity reflect the relative activity levels with alternative metal co-factors, and should not be confused with metal selectivity (metal-binding preference 19 ), which refers to differences in the affinity of metal binding 20 .

Secondary sphere is key to SodFM metal-preference evolution
To test if aCR values matched with the identity of previously identified residues spatially located within the metal's secondary coordination sphere, X D-2 (refs. 13,21,22) and H Cterm /Q Cterm /Q Nterm [23][24][25][26][27] (Fig. 1c), we analysed the distribution of aCRs with respect to the residue found at these positions across SodFMs found in nature (Fig. 2). Higher Fe preference was usually associated with the presence of two alternative water-coordinating residues, H Cterm (69% FeSODs, 9/13 enzymes) or Q Nterm (72% FeSODs, 18/25 enzymes), whereas the enzymes containing Q Cterm were mostly Mn preferring (78% MnSODs, 21/27 enzymes). At the position X D-2 , the two smallest amino acids, glycine (G D-2 ) or alanine (A D-2 ) (Fig. 2b), were frequently found in SodFMs with higher Mn preference (Fig. 2c). More cambialistic outliers included the Bacteroides fragilis (aCR 0.98) and Coprobacter fastidiosus (aCR 0.99) SodFM2s, which seem to have evolved higher Mn activity from their most likely Fe-preferring ancestor. This confirmed that the presence of G D-2 and A D-2 within the context of otherwise predominantly Fe-preferring SodFMs can indicate potential evolutionary metal-preference modulation events. In contrast, either threonine (T D-2 ) or valine (V D-2 ) at this site (Fig. 2b) were usually found in SodFMs with higher Fe preference (Fig. 2c). The only two non-Fe-preferring outliers were two Bacteroidales SodFM1s from Anaerophaga thermohalophila (T D-2 ; aCR 0.87) and Bacteroidetes RBG_13_43_22 (T D-2 ; aCR 0.46). However, these two SodFM1s have probably evolved from highly Mn-preferring ancestors towards increased activity with Fe. This suggests that mutations of X D-2 and H Cterm /Q Cterm /Q Nterm residues were involved in the identified evolutionary changes of metal preference.
Evidence shows that metal specificities within protein families have frequently changed during their evolutionary history 11 . These changes have been important in the evolution of commensals into pathogens as they adapt to metal restriction in the host 12,13 , and in the adaptation of organisms to changes in terrestrial conditions such as altered elemental bioavailability 11,14 induced by oxygenation of the atmosphere and oceans by the emergence of photosynthesis. In contrast to metal-specific proteins, metal-interchangeable (cambialistic) proteins can function using alternative metals 15,16 . Only a few metalloenzymes have been rigorously demonstrated to be cambialistic in vivo, including some iron (Fe)-or manganese (Mn)-dependent superoxide dismutases (SODs; SodFMs) 16 ; metalloenzymes that play an important role in cellular reactive oxygen species defence 17 .
In this Article, we sought to investigate the mechanisms and extent of metal-preference evolution across the tree of life in a widely distributed and highly conserved metalloenzyme family. To study this process, we leveraged the existence of naturally Fe-specific, Mn-specific and flexible Fe/Mn-cambialistic SodFM homologues, which differentially utilize these metals to catalyse the detoxification of superoxide.

SodFM metal preference is not phylogenetically constrained
Bioinformatic analyses of 3,058 genomes sampled across the tree of life (146 eukaryotic, 2,613 bacterial and 281 archaeal genomes) showed that SodFMs are the most widely distributed and most highly conserved of the canonical superoxide detoxification enzymes (Extended Data Fig. 1a,b). Phylogenetics (Extended Data Fig. 2b,d), sequence (Extended Data Fig. 3a) and structural comparisons (Extended Data Fig. 2a,c,e) enabled us to subdivide the SodFM protein family into five main subfamilies: SodFM1-5 (Fig. 1a-c and Extended Data Fig. 2). Members of the two main subfamilies, SodFM1 and SodFM2, are commonly annotated in databases as being either Mn or Fe specific, respectively 14 . To test this, we heterologously expressed 64 representative SodFMs (Extended Data Fig. 4) sampled from across their protein family tree (Fig. 2a) and characterized their activity with Fe and Mn. This revealed that neither subfamily was homogeneous for metal utilization (Fig. 2a) and should no longer be referred to as Fe or Mn specific on the basis of homology searches alone. The robustness of the SodFM1 and SodFM2 groupings (Extended Data Fig. 3), and their wide distribution across the tree of life (Fig. 1a), is consistent with the hypothesis that the split between the SodFM1 and SodFM2 subfamilies is ancient 14 . On the basis of the distribution of the different metal preferences across their protein trees, ancestral sequence reconstruction (Fig. 2a) and estimations of ancestral metal utilization (Extended Data Fig. 2f,g), it seems that the ancient last common ancestors (LCAs) of SodFM1s and SodFM2s were probably Mn and Fe specific, respectively, but these ancestral states were later changed multiple independent times throughout the SodFM family's evolution (Fig. 2a).

Uncovering untapped evolutionary diversity of SodFMs
We also identified three further subfamilies of distinct SodFMs, SodFM3-5 (Extended Data Fig. 2). SodFM3s and SodFM4s seem to be restricted mostly to prokaryotes as none were identified in our sampling of eukaryotes (Fig. 1b). SodFM4s are the most divergent of all SodFMs (Fig. 1b,c and Extended Data Fig. 2b) but are largely uncharacterized, with only a single available crystal structure (Extended Data Fig. 2a). Like the SodFM1s and SodFM2s, metal preference is not conserved across the SodFM3 and SodFM4 subfamilies (Fig. 2a). One of the key structural distinctions between these subfamilies is that SodFM1/2s are homodimeric and SodFM3/4s are homotetrameric (Extended Data Fig. 2a). The subfamily of SodFM5s provides an interesting intermediate group of tetrameric enzymes (Extended Data Fig. 2a) with some sequence features, including the frequency of X D-2 residue (second residue towards the N-terminus from Asp metal ligand), more akin Article https://doi.org/10.1038/s41559-023-02012-0 Importantly, these isozymes also demonstrated that understanding an enzyme's evolutionary history can aid predictions of their properties.

Switched metal preference evolved multiple independent times
To better understand the context of metal-preference evolution, we mapped the distribution of the identified residues and measured aCRs onto the trees of life (Fig. 3a-e and Extended Data Figs. 5 and 6). Clear patterns of metal-preference modulation were identified during the evolution of bacterial lineages important for human health including Bacteroidales (Fig. 3a and Extended Data Fig. 5), Mycobacteria ( Fig. 3d) and Staphylococci (Fig. 3e); and enigmatic uncultured Wolfebacteria (Fig. 3c) from bacterial candidate phyla radiations (CPR). In human-pathogenic Mycobacteria, acquisition of more Fe-preferring tetrameric SodFM3s has occurred at least twice (Fig. 3d), whereas a single neofunctionalization event could be identified in the ancestor of Staphylococcus aureus and Staphylococcus argenteus (Fig. 3e). The largely anaerobic Bacteroidales are a taxonomic group in which metal preferences have frequently switched. We identified at least two independent emergences of cambialism in this group, once from Fe-SodFM2 and a second time from Mn-SodFM1 ancestors. CPR Wolfebacteria represented a striking and highly unusual emergence of dimeric (Extended       (Fig. 1a). A range of aCRs, from higher Fe preference (aCR >2, red), through cambialistic (0.5 < aCR < 2, green), to higher Mn preference (aCR <0.5, orange), were found across the SodFM subfamilies. Sensitivity to peroxide inhibition (+) was detected for Fe-preferring and Fe-loaded cambialistic enzymes, but not in the Mn-preferring enzymes, which were resistant to peroxide treatment (−). Identity of the metal's two key secondary coordination sphere residues, namely water-coordinating Gln/His and the X D-2 residue located close to the Asp ligand (Fig. 1c), are displayed next to aCRs. b, Distribution of the aCRs for the SodFM subfamilies is presented as box and whiskers plots (left), and the frequencies of different amino acid residues found at the X D-2 position are presented as pie charts (right). SodFM1s and SodFM5s were grouped together here as they both contained C-terminal Gln and displayed similar metal preference and X D-2 distribution patterns (SodFM5s X D-2 , G: 89%; A: 7%; T: 1.7%). The high proportion of Mn-preferring SodFM1s and Fe-preferring SodFM2s, and the predicted Gln/His and the X D-2 residues in the reconstructed sequences of the ancestral nodes (a, dashed outlines), are consistent with the hypothesis that the LCAs of these two subfamilies were Mn and Fe preferring, respectively (Extended Data Fig. 2f,g). c, Distribution of the aCR values for SodFMs (top) with different water-coordinating residues (C-terminal Gln, N-terminal Gln or C-terminal His) and different X D-2 residues (bottom) are displayed as box and whiskers plots on a logarithmic scale, colour coded to reflect apparent metal specificities (Mn: orange; cambialistic (camb): green; Fe: red). Amino acid residues were colour coded to enable comparisons among a, b and c.  (Fig. 1c). Importantly, our extensive analysis of SodFM metal utilization evolution provided further evidence demonstrating that the identity of X D-2 and H Cterm /Q Cterm /Q Nterm combined with phylogenetic analysis can enable predictions of their metal preferences more accurate than those on the basis of sequence analysis or phylogenetics alone.

Ancient SodFM metal-preference switch in Alphaproteobacteria
Based on predictions derived from our analysis, we identified a distinct group of Mn-preferring SodFM2s in Alphaproteobacteria (SodFM2.4;  Fig. 6a), and represent the most ancient identified evolutionary metal-preference switch (Fig. 3f) other than that at the split between SodFM1s and SodFM2s (Fig. 2a). Later, five divergent SodFM2.4 sequences reacquired more typical SodFM2 residues, including T/V/C D-2 in place of the SodFM2.4 hallmark residues G D-2 , and display Fe preference (for example, Methylocapsa acidiphila; Fig. 3b and Extended Data Fig. 6a), suggesting they have evolutionarily reverted to their ancestral metal preference on at least three independent occasions ( Fig. 3b and Extended Data Figs. 6a,b and 8a). Extensive sequence analysis-driven mutagenesis of Agrobacterium tumefaciens SodFM2.4 (Extended Data Fig. 9m) revealed that, compared with changing from preferential utilization of a single metal co-factor to cambialism, as observed in S. aureus and Bacteroidetes (Fig. 3a,e), reversion from high Mn to high Fe preference may require more than a single mutation. It is therefore striking that SodFM2.4s from multiple species have evolved back to the ancestral Fe preference (Extended Data Fig. 6) rather than acquiring a canonical Fe-preferring SodFM2 via LGT.

SodFM metal preference co-evolves with peroxide sensitivity
In addition to higher Mn preference, similarly to MnSodFM1s, MnSodFM2.4s were resistant to hydrogen peroxide inhibition (Extended Data Figs. 4c,k and 8a). It is well established that metal preference is strongly associated with the sensitivity of SodFMs to peroxide 28     Staphylococcaceae contained only SodFM1s (e). The identities of the watercoordinating Gln/His, the residue X D-2 , and verified aCR and peroxide inhibition for the SodFMs characterized in this study were mapped onto the trees and annotated as previously described (Fig. 2). Inferred multiple independent evolutionary shifts towards higher Fe preference and peroxide sensitivity (red circles), higher Mn preference and peroxide resistance (orange circles), emergences of cambialism (green circles) and likely SodFM repertoire changes via LGT (magenta circles) were annotated onto the trees (a-e). f, The average phylogenetic distances between the inferred LCA nodes (coloured circles in a-e) and their descendent tree tips were calculated and presented on a colourcoded scale, where the minimum value represents the extant homologues (tree tips) and the maximum value represents the average distances of all studied archaea, bacteria and eukaryotes from their LCA (universal LCA). Emergence of alphaproteobacterial SodFM2.4 was the most ancient and that of the cambialistic SodFM1 within staphylococci the most recent of the identified metalpreference evolutionary modulation events. Scale bars represent the number of substitutions per site.
Article https://doi.org/10.1038/s41559-023-02012-0 is typically inhibited by peroxide. Results from our characterization of isozymes sampled across the tree of life ( Fig. 2a and Extended Data Fig. 4) showed that this biochemical feature was universally conserved across all studied SodFMs, consistent with it co-evolving with metal preference. Consequently, as in the case of metal preference, peroxide inhibition is not constrained to phylogenetic subfamilies, but has changed multiple independent times. This suggests that, in addition to adapting to changing metal bioavailability, metal preference could evolve in response to high concentrations of hydrogen peroxide, which is often used as a potent anti-microbial agent, both artificially in healthcare and by living organisms in nature.

SodFMs vary in susceptibility to metal-preference modulation
To verify our hypothesis that SodFMs can evolve their metal preference and peroxide inhibition by mutating secondary coordination sphere residues X D-2 and H Cterm /Q Cterm /Q Nterm , we performed biochemical characterization of 50 synthetically generated mutants of the natural enzymes. Mutations were designed to alter metal preference by introducing residues from isozymes with opposing preference, guided by our analysis of the frequencies of the amino acids in these positions in natural enzymes (Fig. 2). Of all tested mutants, 35 (70%) exhibited a significant shift in aCR value, indicating change in specific activity with one metal relative to that with the other metal. Of these 35 mutants, a total of 26 variants (52% of all mutants) displayed changes in the preferred metal relative to the wild type (WT) (Extended Data Fig. 9 and Supplementary Data 8), further supporting the significance of these residues in metal-preference determination. Decreased expression levels were observed for nine variants (18% of all mutants), suggesting that mutation of these residues can also have deleterious effects on protein stability and/or folding. No significant effects were observed in 15 (30%) mutants (Extended Data Fig. 9 and Supplementary Data 8), thus mutagenesis of X D-2 and H Cterm /Q Cterm /Q Nterm does not always result in metal preference change. This indicates the importance of other regions of the protein fold for metal-preference modulation by X D-2 and H Cterm /Q Cterm /Q Nterm mutagenesis. This is particularly evident in specialized metal-preferring SODs such as Escherichia coli SodA, which does not acquire Fe when homologously overexpressed in our E. coli strain in aerobic conditions (Extended Data Fig. 10f); and in Bacillus anthracis SodA2 (Fe-preferring SodFM1) or A. tumefaciens Mn-preferring SodFM2.4, where mutagenesis of X D-2 had no effect on their metal preference, although it played a clear role in the metal preference switch of their ancestor. This suggests that, following the metal-preference switch, subsequent changes in other regions of the fold can de-correlate later effects of the X D-2 and H Cterm /Q Cterm /Q Nterm mutations from their initial effects, a process termed epistatic drift 29 . Alternatively, metal-preference changes via X D-2 and H Cterm /Q Cterm /Q Nterm in some SodFMs could require preceding changes in other regions of the fold (intramolecular pre-adaptations).
Regardless of the underlying evolutionary mechanisms, the residues X D-2 and H Cterm /Q Cterm /Q Nterm constitute a strong indicator of metal preference in natural enzymes (Fig. 2). Although the susceptibility of SodFMs to metal-preference modulation by mutating these residues was not universal, the majority of variants in which these residues were mutated exhibited significant modulation of metal utilization.

Cambialism can be a positively selected property in nature
Next, we leveraged the cambialistic SODs from S. aureus and B. fragilis to investigate the specific role of the X D-2 position in determining metal preference (Fig. 4a,b and Extended Data Fig. 9a-d). We investigated multiple variants in which their X D-2 residues were mutated (Extended Data Fig. 9a-d), guided by our analysis of which amino acid appears at this position in natural SodFM sequences (Fig. 2b). In the S. aureus cambialistic SodFM1, the magnitude of aCR change was dependent on which amino acid was introduced at X D-2 (Fig. 4a), and was consistent with the correlation between the aCR and X D-2 residues of SodFMs sampled across the tree of life (Fig. 2c). Importantly, while three S. aureus mutants shifted back towards the ancestral Mn preference, three mutants showed greater Fe preference (Fig. 4a). The shift towards Fe preference was observed for the two amino acids (Thr and Val) most often present in Fe-preferring SODs, while a strong change towards the ancestral Mn specificity was observed for the two amino acids (Ala and Gly) most commonly found in Mn-preferring SODs. This capacity for bidirectionally changing metal preference in S. aureus SodM was unique across the tested SODs, suggesting it can provide a powerful model system for future studies of how metal preference is determined. This contrasted with the B. fragilis cambialistic SodFM2 (Fig. 4b and Extended Data Fig. 9c,d) where the majority of mutants, regardless of the amino acid introduced at the X D-2 position, were shifted towards the ancestral Fe preference. This can probably be explained by the structure of the amino acid (glycine) found in B. fragilis X D-2 position (Fig. 4e), which has no sidechain, thus replacement with any sidechain found in other amino acids affects its local environment and influences metal preference (Fig. 4b). Therefore, our results suggest that the S. aureus cambialistic SodFM1 has potential for evolution towards higher Fe preference, whereas the B. fragilis cambialistic SodFM2 has reached the highest possible level of Mn preference via mutagenesis of the X D-2 residue alone. Crucially, the cambialistic sequence is conserved in all analysed S. aureus and S. argenteus genomes, despite our observation that only a single X D-2 mutation is required to switch S. aureus cambialistic SodFM1 into an Fe-preferring SOD with substantially increased Fe-dependent activity, providing evidence for the selection of cambialism rather than Fe preference in S. aureus. We thus propose that cambialism need not strictly be an evolutionary intermediate stage between two metal specificities but rather the desired property that can be selected for in nature, probably in this case for circumventing nutritional immunity 12 .

Multiple evolutionary pathways can modulate metal preference
To further test the wider evolutionary significance of the secondary coordination sphere residues, we investigated double mutant variants combining mutations at positions X D-2 and the water-coordinating Q/ H Cterm . Our characterization of these variants revealed multiple evolutionary pathways towards metal-preference switching. In some cases, such as Nesterenkonia alba SodFM3 Q/H, a single residue has a major impact on the preferred metal co-factor ( Fig. 4g and Extended Data Fig. 9k,l), while in others, such as CPR SodFM1 H/Q, mutation at the same position had a negligible effect ( Fig. 4c and Extended Data Fig. 9e,f). In some, such as CPR SodFM1 H/Q V D-2 G, only a cumulative effect of the two mutations affected the metal-preference ( Fig. 4c and Extended Data Fig. 9e,f), whereas no significant effect was observed in either single or double mutants of nanoarchaeon Nanopusillus acidilobi SodFM4 (Extended Data Fig. 9i,j). The latter observation suggests that additional loci in the SodFM sequences must also regulate metal preference, which await identification. SodFM metal preference is probably modulated via fine tuning of the metal co-factor's redox potential 13,26,27,30 . Interestingly, in all X D-2 and Q/H Cterm mutants with significantly changed metal preference, activity with at least one metal decreased (Fig. 4i). Significantly increased activity with both metals at the same time was not observed in any of these mutants. This suggests that a substantial and concomitant increase in activity with both metals via X D-2 and H Cterm /Q Cterm /Q Nterm mutagenesis is impossible due to biophysical constraints. This is consistent with the 'redox tuning' hypothesis 31 under the assumption that structural changes that modulate the reduction potential of one metal should also change the reduction potential of the alternative metal within the same active site. This also indicates that SodFMs have evolved to maximize their metal-specific activity via X D-2 and Q/H mutagenesis, but within limitations enforced  Mn-specific activity U mg −1 total protein

Staphylococcus aureus SodM
Bacteroides fragilis CPR-His were mutated to those found commonly at this position in natural SodFMs (Fig. 2b). The average specific activities of the WTs and mutants loaded with either Mn (x axis) or Fe (y axis) were plotted as colour-coded circles. In S. aureus (a), residues most commonly found in Mn-preferring SODs, A D-2 and G D-2 (Fig. 2a,b), shifted metal preference towards Mn (orange circles), whereas those found mainly in Fe-preferring SODs, V D-2 and T D-2 (Fig. 2a,b), shifted towards higher Fe preference (red circles). In the cambialistic SodFM2 from B. fragilis (b), all mutants displayed lower activity with Mn and the majority had higher activity with Fe, representing change towards its most likely ancestral Fe-preferring phenotype. c, For Fe-preferring CPR Wolfebacteria SodFM1 (CPR-His), mutations in both the X D-2 and water-coordinating H/Q residues were necessary to shift towards the inferred ancestral higher Mn activity. The likely ancestral phenotype was proxied by the closely related SodFM1 (Fig. 2a) from more basal Wolfebacteria (Fig. 3c). the resolution limits of the solved structures. This provides further support for 'redox tuning' via subtle structural changes 13,34 . Better understanding of the residues that determine metal preference and its evolvability 35 in SodFMs can enable development of models describing how evolutionary changes in metalloprotein structure can shape their biophysical properties.

Discussion
Despite substantial and continuing progress in our understanding of protein functions and the evolutionary processes that shape them, a vast gap exists between the number of known natural protein sequences and the number of those that have been investigated experimentally. Bridging this gap is required to further our understanding of the fundamental mechanisms involved in the functioning and evolution of all living systems, and to apply this knowledge in medicine, biotechnology and synthetic biology. Although SodFMs are a well-studied protein family, with over 50 WT protein crystal structures available from various organisms, we were able to uncover additional diversity via application of bioinformatics and phylogenetics on a dataset of genomes sampled across the tree of life. This included the novel categorization of the SodFM protein family into five distinct subfamilies, the first comparison of multiple members of the largely unstudied group of SodFM4s, and identification of atypical isozymes such as the Fe-preferring SodFM1 from CPR Wolfebacterium. It also enabled us to identify multiple independent evolutionary events, both ancient and more recent, in which their key biophysical property of Fe/Mn metal preference was switched. S. aureus cambialistic SodFM1, involved in evasion of the mammalian immune response 12,13 , has emerged from the most recent of these switches, and is susceptible to modulation of its metal preference in both directions. This contrasted with the Agrobacterium Mn-preferring SodFM2.4, which descended from the most ancient identified metal-preference switch, where modulation via mutagenesis of the identified key residues was no longer possible. This is probably due to epistatic drift, which can de-correlate the later effects of mutations from their initial effects 29 . We also demonstrated that Agrobacterium Mn-SodFM2.4, and all other tested SodFMs active with Mn, are resistant to hydrogen peroxide inhibition, which could provide additional selective advantage, for example, to organisms involved in peroxide scavenging within microbial communities [36][37][38] .
Metal preferences of SodFMs are commonly misannotated in databases and cannot be predicted by sequence analysis or phylogenetics alone, but our application of both methodologies in combination improved prediction accuracy. Many other protein families include metalloproteins, and although metal preference has been established for some members of some families, its diversity has probably been frequently underappreciated. For example, the unexpected discovery of copper-only SODs, devoid of the requirement for zinc, among the Cu/Zn-dependent SOD family requires reassessment of the annotation of these distinct superoxide detoxification enzymes 39,40 . Similarly, a carbonic anhydrase isozyme from marine diatoms has evolved to utilize cadmium, in place of the usual Zn, as an adaptation to oceanic Zn deficiency 41 . In SodFMs, metal preference can be changed within short evolutionary timeframes. Although the selection pressures driving their evolution remain to be evaluated, we hypothesize that environmental conditions affecting intracellular metal homeostasis are probably frequent drivers. The same selection pressures would affect other metalloproteins, making the groups of organisms with signatures of evolutionary SodFM metal preferences changes, such as Alphaproteobacteria and Bacteroides, prime candidates for investigations of metal-preference evolution in other protein families. This could have particularly important implications in healthcare as metal homeostasis and correct protein metalation are important in human disease 9 and in microbial infections 6 , and have been suggested to be promising anti-microbial therapeutic targets 10 . Furthermore, the ability to engineer cambialistic synthetic proteins can enable the utilization of different metals to perform the same activity, probably providing economic and environmental benefits in biotechnological applications.
The existence of Fe-preferring, Mn-preferring and cambialistic SodFMs is well established, and their metal-preference changes were previously hypothesized as being related to ancient geological changes such as Earth's atmosphere oxygenation 11,14 and adaptation of pathogens to nutritional immunity 12,13 . Here we identified multiple additional metal-preference changes and demonstrated that, rather than switching between three discrete states of metal specificity, SodFM metal preference evolves by sliding on a continuous scale of cambialism ratio (CR). This evolutionary fine tuning probably occurs through the proposed biophysical mechanism of redox tuning 31 , a hypothesis supported by our biochemical and structural analysis of natural Sod-FMs and their metal-preference switched mutants. It was previously hypothesized that ancient ancestral SodFMs were cambialistic 14,42 , but in light of our data, we hypothesize that the ancient SodFMs would have been susceptible to metal-preference modulation, similarly to the extant homologues. A physiological role has only been previously demonstrated for one cambialistic SodFM, in S. aureus pathogenesis. Yet our data also indicate that on modern Earth, cambialism can be more common and more important than previously anticipated as, for example, in the Bacteroidales, cambialism has evolved multiple independent times from different ancestral backgrounds. However, the wider biological significance and frequency of cambialism and evolutionary metal-preference modulation in other protein families remains to be explored. Metals are indispensable for modern cellular life, probably played important catalytic functions at its origin, and their bioavailability has changed dramatically over the Earth's history. However, it is not always clear why one metal was selected over another to play a particular biological role or whether the selected metal can be easily changed for another metal with similar biophysical properties in the course of evolution. Our findings suggest that in the case of SodFMs, the preference for one of the two possible metal co-factors can and has been modulated multiple independent times in the course of the SodFM protein family's evolution, and remains susceptible to further changes in the extant family members.

Generation of ΔsodAΔsodB expression strains of E. coli BL21(DE3) and pLysSRARE2 (DE3) pLysS
The sodFM gene knockout (KO) strain of E. coli BL21(λ DE3) was generated following a standard protocol 43 with minor modifications. First, the sequences of sodA and sodB genes in the BL21 genome (GCF_000022665.1, ASM2266v1) were compared with that of the E. coli K-12 strain (GCF_000005845.2, ASM584v2) where successful gene KO strains were previously generated 44 . Specificity of designed primers with homology to the targeted genes was verified using local BlastN (-task blastn-short, e-value 0.1) 45 against the whole BL21 genome. E. coli BL21(λDE3) cells expressing λ Red recombinase (BL21-RED) from pKD46 plasmid were induced with 10 mM arabinose at OD 600 of 0.2 and made electrocompetent at OD 600 of 0.6 by three washes in ice cold 10% (vol/vol) glycerol. A kanamycin resistance cassette with recombination overhangs was amplified from pKD4 plasmid template with Q5 Hot Start DNA Polymerase (NEB), purified with Monarch DNA gel extraction kit (NEB), and electroporated (500 ng DNA, 250 ng μl −1 ) into 90 μl electrocompetent BL21-RED cells at 2.5 kV, 200 Ohm, 0.025 F for 4.5 ms (BTX ECM 630) in 0.1 cm electroporation cuvettes (VWR), followed by 2 h recovery in super optimal broth with catabolite repression (SOC) medium. All bacterial cultures (except recovery following electroporation in SOC medium) were grown in Luria-Bertani medium supplemented with 0.2% (wt/vol) glucose, with 1.5% (wt/vol) agar for solid medium. The remaining steps of the protocol were performed as described in the original publication 43 . The BL21(λDE3) ΔsodB mutant was used for subsequent generation of the BL21(λDE3) ΔsodAΔsodB Article https://doi.org/10.1038/s41559-023-02012-0 double mutant following the same protocol. Correct antibiotic resistance cassette insertions were verified using diagnostic polymerase chain reaction (PCR) with OneTaq Quick-Load 2× Master Mix (NEB). The gene deletion was verified using PCR with sodA(B)_screen_F(R) primers complementary to regions upstream and downstream from the deleted loci, and its phenotype confirmed using the in-gel SOD activity assay on soluble protein extracts from the WT and KO strains (Extended Data Fig. 10c). The BL21(DE3) pLysSRARE2 ΔsodAΔsodB strain was constructed by transforming the BL21(λDE3) ΔsodAΔsodB with pLysSRARE2 plasmid isolated from the commercially available ROSETTA2 (λDE3) pLysS strain (Novagen). All primers used for KO strain generation and verification were synthesized by Eurofins Genomics, with sequences provided in Supplementary Data 1.

Expressing SodFMs in the presence of Fe or Mn in 24-well culture plates
For screening of soluble expression and aCR verification experiments, E. coli BL21(λDE3) ΔsodAΔsodB cells transformed with pET-22b SOD expression constructs were cultured in 4-5 ml M9 medium (0.1 mM CaCl 2 , 2 mM MgSO 4 , 0.4% glucose, 0.2% casamino acids, 1 mM thiamine, 12.8 g l −1 Na 2 HPO 4 × 7H 2 O, 3 g l −1 KH 2 PO 4 , 0.5 g l −1 NaCl and 1 g l −1 NH 4 Cl) in 24-well culture plates (Whatman Uniplate) at 37 °C with 180 r.p.m. orbital shaking. Pre-cultures were grown for 16 h in M9 medium containing 100 μg ml −1 ampicillin without metal supplementation. Aliquots (200 μl) of the pre-cultures were used to inoculate 4 ml M9 medium with ampicillin (100 μg ml −1 ) supplemented with either 500 μM MnCl 2 or 100 μM Fe(NH 4 ) 2 SO 4 and cultured until OD 600 reached 0.4-0.6. At that point, additional Mn (1 mM final concentration) or Fe (200 μM final concentration) and isopropyl β-d-1-thiogalactopyranoside (1 mM final concentration) were added, and culture was continued for 4 h. Cultures in Fe-or Mn-supplemented media were always performed in separate 24-well plates to avoid metal cross-contamination. The final OD 600 was measured, and the 24-well culture plates were centrifuged for 10 min at 4,000g. Spent media were removed, and cell pellets were resuspended in cell lysis solution (20 mM Tris pH 7.5, 100 μg ml −1 lysozyme, 10 μg ml −1 DNAse I, 1× Roche cOmplete ethylenediaminetetraacetic acid (EDTA)-free protease inhibitors), volume normalized on the basis of their final OD 600 measurements and transferred to 1.5 ml Eppendorf tubes. E. coli cells were lysed by 10 s sonication (MSE Soniprep 150 with exponential probe at 10-14 μm amplitude) in a cold room and centrifuged for 20 min at 21,100g. The same protocol was followed for BL21(DE3) pLysSRARE2 ΔsodAΔsodB except for 16 h expression at 16 °C in M9 media containing 100 μg ml −1 ampicillin and 30 μg ml −1 chloramphenicol. The supernatants (soluble protein extracts) containing the expressed SodFMs were used in the subsequent bicinchoninic acid protein concentration assay (Pierce), liquid and in-gel SOD activity assays, and sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). Metal purity of culture media were verified using inductively coupled plasma mass spectrometry (ICP-MS).

Protein expression and purification
Proteins were expressed in E. coli BL21(DE3) ΔsodAΔsodB in 1 l M9 medium supplemented with 200 μM Fe(NH 4 ) 2 SO 4 or 1 mM MnCl 2 and induced with 1 mM isopropyl β-d-1-thiogalactopyranoside for 4 h at 37 °C with 220 r.p.m. orbital shaking in 2 l baffled flasks. Bacterial cells were lysed by sonication of washed cell pellets in 20 mM Tris pH 7.5, 1× cOmplete EDTA-free protease inhibitor (Roche), 100 μg ml −1 lysozyme, 10 μg ml −1 DNAse I, followed by centrifugation at 19,000 g at 4 °C. Cleared cell lysates were subjected to chromatographic separation using an ÄKTA purification system (Cytiva). Standard protocol for most characterized recombinant SOD proteins consisted of purification using anion exchange chromatography (HiTrap Q HP column, Cytiva) in 20 mM Tris pH 7.5 buffer with 0-1 M NaCl linear gradient elution, and subsequent size exclusion chromatography in 20 mM Tris pH 7.5, 150 mM NaCl buffer on a Superdex 200 16/600 (Cytiva) column. For purification of B. fragilis SodFM2 and its mutants, 20 mM Tris pH 8.5 was used. The E. coli SodFM1 (SodA) was collected in the flowthrough during anion exchange chromatography and was subsequently subjected to additional cation exchange chromatography (5 ml SP FF column (Cytiva) in 20 mM MES pH 5.5, 0-1 M NaCl linear gradient elution), where it was also collected in the flowthrough. Metal content of protein extraction buffers as well as metal loading of the purified protein preparations were verified using ICP-MS.
Analytical size exclusion chromatography using a Superdex 200 10/300 increase column (Cytiva) was performed to estimate the molecular weight of protein preparations in non-denaturating conditions based on elution volume. The column was calibrated using a mixture of molecular weight standards ranging from 1,350 to 670,000 Da (Bio-Rad) and blue dextran (2,000 kDa, Sigma) in 20 mM Tris pH 7.5, 150 mM NaCl buffer.

SOD activity assays and (a)CR calculations
SOD activity was assessed quantitatively using the riboflavin-nitro blue tetrazolium (NBT) liquid assay 53 with minor modifications. The assay was performed in 96-well plates, in which 20 μl of a sample was mixed in each well with 180 μl assay solution (10 mM methionine, 1.4 μM riboflavin, 66 μM NBT and 10 μM EDTA in 50 mM potassium phosphate buffer, pH 7.8), immediately before 10 min incubation on a white light box followed by immediate measurement of absorbance at 560 nm. Serial twofold dilutions of tested samples were assayed to identify enzyme concentrations within the linear range of the assay that were close to 1 U in reference to a standard curve obtained using commercial bovine SOD standard (S5639, Sigma). To calculate specific SOD activities, the assay was performed in triplicate using samples diluted to the concentrations within the assay's linear range. CRs were calculated as a ratio of SOD activities of the verified Fe-loaded protein preparation divided by that of the verified Mn-loaded protein preparation. aCRs were calculated the same way but using SOD activity values of soluble protein extract samples from the 24-well culture experiments after lysis and centrifugation. CR values >0.5 were considered as evidence of higher Fe preference, CR <0.5 as evidence of higher Mn preference and enzymes with 0.5 < CRs < 2 were considered cambialistic. Article https://doi.org/10.1038/s41559-023-02012-0 For qualitative SOD activity assessment, approximately 15 μg (determined by bicinchoninic acid assay) of soluble protein extracts were separated by 15% (wt/vol) acrylamide native polyacrylamide gel electrophoresis (PAGE) and assayed using an in-gel assay as described previously 13 . The in-gel peroxide inhibition assay was performed as described previously 13 . For protein staining, samples were resolved on 15% (wt/vol) acrylamide SDS-PAGE gels and stained with Quick Coomassie (Protein Ark). All gels were imaged using a ChemiDoc imaging system (Bio-Rad), using the same settings (including exposure time) for all gels compared within a single experiment. Band intensity was quantified with Image Lab 6.1 software (Bio-Rad) using a rectangle volume tool to outline activity bands with local background subtraction. aCRs from the in-gel assay were calculated using the measured band intensities following the formula: aCR = (intFe − intFeH 2 O 2 )/ intMnH 2 O 2 , where intFeH 2 O 2 is band intensity of the Fe-loaded form following peroxide inhibition, intFe refers to the band intensity of Fe samples in non-peroxide-treated gels, and intMnH 2 O 2 is the band intensity of the Mn-loaded form following peroxide inhibition. Data available in Supplementary Data 8, 9 and 10.

Metal concentration verification using ICP-MS
Aliquots of protein extraction buffers or purified protein samples (20 μM) in 20 mM Tris pH 7.5, 150 mM NaCl were each diluted to 5 ml with 2% HNO 3 for elemental analysis. Elemental composition of the resulting acid solutions were quantified using an iCAP RQ ICP-MS instrument (Thermo Fisher Scientific) according to the manufacturer's specifications through comparison with matrix-matched elemental standard solutions, using In and Ir (10 μg l −1 ) as internal standards (University of Plymouth Enterprise Ltd).

Protein crystallography
Purified and concentrated protein samples were subjected to crystallization with commercially available matrix screens:  Structural comparison was performed for global secondary structure matching in Coot and for least square comparison (LSQ-kab) of metal-binding residues in CCP4i. To visualize and compare electron density between structures, the 2Fo-Fc map was displayed at 1σ level.

Bioinformatic protein family identification and characterization
Our previously described dataset of genomes sampled across the tree of life 13 was expanded with additional Bacteroidales and Alphaproteobacteria genomes to improve sampling of the lineages identified as metal-preference switching hotspots. The final dataset contained 3,058 genomes including 2,613 bacteria, 281 archaea and 146 eukaryotes (full list in Supplementary Data 4). Members of the SOD protein families were identified using PFAM 64

Generation of the tree of life
HMM profiles were generated from a curated set of 21 single-copy orthologues shared between bacteria, archaea and eukaryotes 74 . Sequences of the orthologues were downloaded from the supplementary material of the previous study of the universally conserved orthologues 74 , aligned with MAFFT 66 , trimmed with trimAL 67 (gappyout mode) and used for the generation of HMM profiles in HMMER3.3 65 . The profiles were subsequently used in hmmsearch (-E 1e-5) profile search 65 against the local database of 10,304,216 protein sequences encoded within 3,058 genomes sampled from across the three domains of life. Each of the identified orthologue groups were aligned with MAFFT 66 , trimmed with trimAL 67 (gappyout mode) and used for the generation of phylogenies in IQ-TREE 69 under LG + G4 model with 1,000 ultrafast bootstraps 68 . All alignments and phylogenies were manually inspected, eukaryotic organellar sequences were filtered out on the basis of the tree topologies and verified with BLASTP searches against the nr database, and seven widely distributed orthologous groups were selected for the final analysis. The final dataset contained the following eukaryotic (prokaryotic) orthologues: Ribosomal 40S S3 (Ribosomal 30S S3), Ribosomal 40S S15 (Ribosomal 30S S19), Ribosomal 60S L23 (Ribosomal 50S L14), Ribosomal 40S S11/14 (Ribosomal 30S S11), Ribosomal 40S S5/S7 (Ribosomal 30S S7), Ribosomal 40S S0/S2 (Ribosomal 30S S2) and RNA polymerase II subunit 3 (DNA-directed RNA polymerase subunit D). The trimmed alignments of the seven orthologues groups were concatenated with module Nexus of the Biopython package 75 to a final alignment of 1,097 amino acid sites, and was used to generate Article https://doi.org/10.1038/s41559-023-02012-0 the tree of life (Supplementary Data 7) with 1,000 ultrafast bootstrap 68 replicates in IQ-TREE 69 under LG + G4 model.

Ancestral sequence prediction and ancestral aCR state estimation
Ancestral sequence reconstruction (ASR) was used to infer the most likely residues at the identified metal-preference determination positions (X D-2 and H/Q Cterm or Q Nterm ) for the key ancestral nodes of SodFM1, SodFM2, SodFM3, SodFM4 and SodFM5 LCAs.
ASR was performed with the maximum likelihood phylogeny of 2,820 SodFMs (Supplementary Data 6) and the alignment used for the generation of the phylogeny (Supplementary Data 5). The reconstruction was performed using IQ-TREE 69 (-asr option) with the best fitting model used for the tree reconstruction (WAG + R10), and FastML 76 with WAG model. Both approaches gave similar results with only a second decimal place variation in posterior probability value between the two methods. Marginal and joint reconstruction in FastML predicted the same residues at the analysed positions of interest.
The possible ancestral aCR values for the key nodes of SodFM1 and SodFM2 ancestors were estimated using an IQ-TREE 69 maximum likelihood phylogeny of a representative sample of SodFMs with experimentally verified aCR values. The ancestral aCRs were estimated from the consensus phylogeny (of 1,000 ultrafast bootstrap 68 replicates) with a maximum likelihood approach implemented in fastAnc function of phytools package 77 and a Bayesian Markov chain Monte Carlo (MCMC) approach implemented in anc.Bayes 77 from the same package. To account for the uncertainty of the phylogenetic reconstruction, additional Bayesian estimation was performed in BayesTraits V4.0.0 (continuous random walk model A and MCMC analysis) 78,79 using 20,000 locally optimal trees (IQ-TREE -wt option) 69 .

Other bioinformatics methods
Mean sequence identities were calculated as an average of trimAL 67 pairwise identity comparisons (-sident) on MAFFT 66 alignments trimmed with trimAL 67 (gappyout mode). Sequence logos were generated with WebLogo3 80 and displayed so that the total height of each residue corresponds to the information content of the residue in the given position, and the relative size of multiple letters within each position corresponds to their relative frequency at this position. The phylogenetic trees were visualized and annotated with the protein presence/absence in the analysed genomes or with co-evolving amino acid residues using GraPhlAn 81 . Multiple sequence alignments were visualized and annotated using Jalview 82 . Phylogenetic trees were visualized and annotated using FigTree (http://tree.bio.ed.ac.uk/software/ figtree/), Archaeopteryx 83 .

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
Structural data that support the findings of this study have been deposited in the Protein Data Bank with the accession codes 8AVK, 8AVL, 8AVM, 8AVN (Supplementary Data 16). There are no restrictions on any data within the paper. Source data are provided with this paper. Article https://doi.org/10.1038/s41559-023-02012-0 Extended Data Fig. 2 | SodFM protein family can be subdivided into five distinct subfamilies based on phylogenetics, protein sequence, and 3D structure comparative analyses. a. The rings, coloured to represent the presence of either of the three alternative water-coordinating residues, H Cterm / Q Cterm /Q Nterm , within SOD catalytic centres (three innermost rings) and the oligomerisation state identified in available crystal structures (two outermost rings), were mapped onto the maximum likelihood protein tree of all SodFMs identified in our sampling of genomes from across the tree of life (Extended Data Fig. 1). N-terminal Gln (Q Nterm : blue) is exclusively found in dimeric (green) SodFM2s, C-terminal water-coordinating Gln (Q Cterm : orange) is found mainly in dimeric SodFM1s and tetrameric (brown) SodFM5s, and C-terminal His (H Cterm : red) is most commonly (Fig. 2b) found in tetrameric (purple) SodFM3s and SodFM4s. b. An unrooted version of the SodFM protein tree presented in (a) c. Each cartoon represents a structural alignment of monomers of all SodFM1-5s with an available crystal structure in the protein data bank (PDB). The architecture of the two N-terminal α-helices (blue/turquoise/green) within monomers distinguishes the dimeric SodFM1s and SodFM2s from the tetrameric SodFM3s, SodFM4s, and SodFM5s. d. The tree represents a structural comparison of SodFMs based on crystal structures available in the PDB, generated using a Distance matrix alignment (DALI). The DALI analysis reveals that SodFM subfamilies identified through phylogenetics (b) and amino acid sequence analysis (a) also form distinct groups when only their 3D structures are compared. e. Maximum likelihood phylogenetic protein tree generated using amino acid sequences of the SodFMs with available crystal structures used in the DALI analysis (d). f. The fastAnc estimates of the ancestral aCR states (black numbers, and tree colour coding) were mapped onto a consensus phylogeny of a representative sample of extant SodFMs with experimentally verified aCRs. Ultrafast bootstrap support values were presented for the key nodes (grey numbers) g. The frequency distributions of the ancestral SodFM1 (left panel) and SodFM2 (right panel) aCR state estimates from Bayesian analyses were presented as density plots on a log scale. Bayesian MCMC estimates from anc. Bayes function (PhyTools, bottom panel) were estimated from the consensus phylogeny (f). To account for the uncertainty of the phylogeny (f) BayesTraits analysis (top panel) was performed using 20,000 locally optimal trees generated with IQtree. Fig. 8 | Signatures of metal-preference switching can be detected in amino acid correlation analysis of SodFM2 subfamily members. a. Residues identified in amino acid correlation analysis of SodFM2s were colour coded and mapped onto the SodFM2 protein phylogeny as separate semi-circles. Multiple amino acid correlation groups with distinct distribution patterns across the SodFM2 tree were identified. The alphaproteobacterial SodFM2.4s constituted the most distinct subgroup of SodFM2s, which grouped closely with eukaryotic chloroplast SodFM2.3s and Cyanobacterial SodFM2.2s. Alongside Bacteroidales SodFM2.6, SodFM2.4s also represented one of the two groups with strong patterns of evolutionary metal-preference switching (black elliptical outlines) identified in our dataset. Characteristically in these two groups, residues from the first correlation group (innermost magenta semi-circles) were often replaced with those from the second group (the innermost green semi-circle), one of which was the key metal-preference determinant X D-2 (Figs. 1c and 2c). The top ring (black) indicates SODs with available crystal structures. Species names and the outermost semi-circle (orange) indicated the SODs with aCR values experimentally verified in this study (Fig. 2a). b. Sequence logos of each SodFM2 sub-group, with amino acid correlation groups mapped onto the logos as coloured semi-circles. Within the analysed SodFM2 alignment residue X D-2 is at position 160. Black arrows indicated two highly distinct His residues identified in SodFM2.5, which contained many sequences from diverse Eukaryotic microbial pathogens including Plasmodium falciparum, Toxoplasma gondii, Trypanosoma brucei, and Cryptosporidium parvum. Arrowheads under each logo signify metal coordinating residues (magenta), catalytic water coordinating Q Nterm (orange), and the residue X D-2 (green).