The novel genus, ‘Candidatus Phosphoribacter’, previously identified as Tetrasphaera, is the dominant polyphosphate accumulating lineage in EBPR wastewater treatment plants worldwide

The bacterial genus Tetrasphaera encompasses abundant polyphosphate accumulating organisms (PAOs) that are responsible for enhanced biological phosphorus removal (EBPR) in wastewater treatment plants. Recent analyses of genomes from pure cultures revealed that 16S rRNA genes cannot resolve the lineage, and that Tetrasphaera spp. belong to several different genera within the Dermatophilaceae. Here, we examine 14 recently recovered high-quality metagenome-assembled genomes from wastewater treatment plants containing full-length 16S rRNA genes identified as Tetrasphaera, 11 of which belong to the uncultured Tetrasphaera clade 3. We find that this clade represents two distinct genera, named here Ca. Phosphoribacter and Ca. Lutibacillus, and reveal that the widely used model organism Tetrasphaera elongata is less relevant for physiological predictions of this uncultured group. Ca. Phosphoribacter incorporates species diversity unresolved at the 16S rRNA gene level, with the two most abundant and often co-occurring species encoding identical V1-V3 16S rRNA gene amplicon sequence variants but different metabolic capabilities and, possibly, niches. Both Ca. P. hodrii and Ca. P. baldrii were visualised using fluorescence in situ hybridisation (FISH), and PAO capabilities were confirmed with FISH-Raman microspectroscopy and phosphate cycling experiments. Ca. Phosphoribacter represents the most abundant former Tetrasphaera lineage and PAO in EBPR systems in Denmark and globally.

, including resolution of the two midas_s_5 ASV1 species (Ca. P. baldrii and Ca. P. hodrii, represented by Pbr1_AalW_b524 and Pbr2_Ega_b001). Relative abundances were determined using stringent mapping of the reads (95% identity and 75% alignment) as in [3]. Metagenomes are shown across the x axis, lines indicate changes to different WWTPs, and three time points are shown for each WWTP. The percentage relative abundance from the reads that mapped compared to the metagenome is shown on the y axis. The axis is square root transformed. MAG IDs follow the naming in Accumulibacter and Dechloromonas, partly based on experiments presented elsewhere from the same WWTP and the same sample [4]. A) Bulk ortho-P concentration during the anaerobic P-release experiments with activated sludge from the Aalborg West WWTP. B) Total intracellular polyphosphate measured as an average of 1,500 randomly selected microbial cells by Raman microspectroscopy in initial oxic samples (0 h) and after anaerobic P-release (3 h). FISH probes covered all Ca. Accumulibacter (FISH probe PAO651), most "Tetrasphaera" (FISH probe Actino658), including Ca. Phosphoribacter hodrii and Ca. Phosphoribacter baldrii (see Figure 5), and Dechloromonas (FISH probe Bet135). Mass balances based on cell count and polyP content showed that "Tetrasphaera" constituted approx. 22% of the total polyP, while Ca. Accumulibacter and Dechloromonas constituted approx. 14% and 6%, respectively [4].
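The stringent mapping filter and the square-root-transformed y axis mentioned in this caption can be illustrated with a minimal Python sketch. It is not the pipeline used in [3]: the `Alignment` record, its field names and the example numbers are hypothetical, and "75% alignment" is assumed here to mean that at least 75% of the read length is aligned.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Alignment:
    """Hypothetical per-read alignment summary (field names are assumptions)."""
    read_length: int      # length of the read in bases
    aligned_length: int   # number of read bases covered by the alignment
    matches: int          # number of identical bases within the alignment


def passes_stringent_filter(aln, min_identity=0.95, min_aligned_fraction=0.75):
    """Keep a read only if identity >= 95% and >= 75% of the read is aligned."""
    if aln.aligned_length == 0:
        return False
    identity = aln.matches / aln.aligned_length
    aligned_fraction = aln.aligned_length / aln.read_length
    return identity >= min_identity and aligned_fraction >= min_aligned_fraction


def relative_abundance_percent(alignments, total_reads):
    """Percentage of all metagenome reads that map to a MAG after filtering."""
    mapped = sum(passes_stringent_filter(a) for a in alignments)
    return 100.0 * mapped / total_reads


# Made-up example: three candidate alignments from a metagenome of 200 reads.
example = [
    Alignment(read_length=150, aligned_length=150, matches=148),  # kept
    Alignment(read_length=150, aligned_length=100, matches=100),  # aligned fraction too low
    Alignment(read_length=150, aligned_length=140, matches=126),  # identity too low
]
abundance = relative_abundance_percent(example, total_reads=200)
plotted_value = sqrt(abundance)  # position on a square-root-transformed axis
```

The square-root transform compresses the high-abundance end of the axis, which keeps low-abundance MAGs visible alongside dominant ones.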

Supplementary Tables
Supplementary Table 1
Supplementary Table 2. Probes used in this study. * Taxonomy and coverage of groups are defined as in the MiDAS 4.8 database [2]. Values are given as group hits/group totals; ** Recommended optimal formamide concentration for use in FISH hybridisations; N/A - not applicable. # The probes hybridise to the 23S rRNA of the target organisms.

Fluorescence in situ hybridisation (FISH)
An optimal formamide concentration was determined for each novel FISH probe by carrying out hybridisations at different formamide concentrations (0-70% in increments of 5%). Where available, suitable pure cultures having defined mismatches in the rRNA probe target region were obtained from DSMZ and applied in the optimisation process. Sanguibacter suarezii (DSM10543), Lactobacillus reuteri (DSM20016) and Janibacter melonis (DSM16063) were used to assess the need for the specific unlabelled competitor probes Tetra67_C1, Actino221_C3 and Tetra732_C1, respectively. If suitable pure cultures were not available, hybridisation conditions for probes were optimised by selecting activated sludge biomass with a high abundance of the target organism predicted by amplicon sequencing. Microscopic analysis was performed with an Axioskop epifluorescence microscope (Carl Zeiss, Germany) equipped with a LEICA DFC7000 T CCD camera or with a white light laser confocal microscope (Leica TCS SP8 X). The intensity of at least 50 cells at each formamide concentration was measured with ImageJ [5]. Optimal hybridisation conditions and details on the coverage and specificity of the FISH probes can be found in Supplementary Table 2. The EUBmix probes [6,7] and the NON-EUB probe [8] were used to target all bacteria and as a negative control for sequence-independent probe binding, respectively. For multicolour FISH, a 30% formamide concentration was selected to obtain optimal signal intensity, as it was experimentally determined to be optimal for all the probes used in the experiment except one (Actino658). To avoid nonspecific binding of the latter, a sample containing no organisms whose 16S rRNA genes had fewer than three mismatches to the probe was selected.
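The formamide series described above can be summarised programmatically once per-cell intensities have been exported from ImageJ. The following Python sketch is illustrative only. The text does not state the rule used to pick the optimum, so the selection rule here (the highest concentration that still retains at least 90% of the maximum mean signal) is an assumption, as are the function names and the synthetic data.

```python
from statistics import mean

FORMAMIDE_SERIES = list(range(0, 75, 5))  # 0-70% in 5% increments, as in the text
MIN_CELLS = 50                            # at least 50 cells measured per concentration


def mean_intensities(measurements):
    """Average per-cell intensity for each formamide concentration.

    `measurements` maps formamide % -> list of per-cell intensities
    (e.g. exported from ImageJ). Raises if fewer than MIN_CELLS were measured.
    """
    result = {}
    for conc, intensities in measurements.items():
        if len(intensities) < MIN_CELLS:
            raise ValueError(f"{conc}% formamide: only {len(intensities)} cells measured")
        result[conc] = mean(intensities)
    return result


def pick_formamide(means, tolerance=0.9):
    """ASSUMED rule: highest concentration still giving >= tolerance * max signal."""
    threshold = tolerance * max(means.values())
    return max(conc for conc, value in means.items() if value >= threshold)


# Synthetic data: signal is stable up to 30% formamide and then declines.
synthetic = {
    conc: [100.0 if conc <= 30 else max(100.0 - 4.0 * (conc - 30), 0.0)] * MIN_CELLS
    for conc in FORMAMIDE_SERIES
}
best = pick_formamide(mean_intensities(synthetic))
```

With the synthetic curve above, the sketch selects 30% formamide. Real dissociation curves are noisier, and the tolerance would need to be chosen per probe.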

Raman microspectroscopy
FISH was conducted on optically polished CaF2 Raman windows (Crystran, UK). Cells with probe-conferred fluorescence were located with a 50× dry objective (Olympus M Plan Achromat, Japan) of the in-built Olympus (model BX-41) fluorescence microscope. After bleaching of fluorophore-derived Raman signals, Raman spectra from single cells were obtained using a Horiba LabRam HR 800 Evolution (Jobin Yvon, France) equipped with a Torus MPC 3000 (UK) 532 nm 341 mW solid-state semiconductor laser. The specific settings for the spectrometer were: 5% neutral density (ND) filter, 600 grooves/mm diffraction grating, 100 µm slit width and 72 µm confocal pinhole. Raman spectra collected spanned the wavenumber region of 200 cm⁻¹ to 3000 cm⁻¹. The Raman spectrometer was calibrated prior to obtaining all measurements to the first-order Raman signal of silicon, occurring at 520.7 cm⁻¹. Raman spectrometer operation and subsequent processing of spectra were conducted using LabSpec version 6.4 software (Horiba Scientific, France). Absolute quantification of intracellular poly-P was carried out as described previously [9]. The method assumes that the intensity of the Raman signal is directly dependent on the amount of the analyte in a determined area. The average amount of poly-P per cell was calculated from a constant determined during calibration for poly-P, the average charge-coupled device (CCD) counts determined during the experiment, and the average area of cells measured by image analysis [9].
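The per-cell poly-P calculation described above can be sketched in Python. This is a hedged illustration, not the method of [9]: it assumes that the calibration constant, the average CCD counts and the average cell area combine as a simple product, and every number below is a placeholder rather than a measured or published value.

```python
def polyp_per_cell(calibration_constant, mean_ccd_counts, mean_cell_area_um2):
    """Average poly-P per cell (g P per cell).

    ASSUMPTION: the three terms named in the text combine as a simple product,
    with the calibration constant in g P per (CCD count * um^2).
    """
    if min(calibration_constant, mean_ccd_counts, mean_cell_area_um2) < 0:
        raise ValueError("inputs must be non-negative")
    return calibration_constant * mean_ccd_counts * mean_cell_area_um2


# Placeholder numbers for illustration only; they are not values from [9].
k = 1.0e-17          # hypothetical calibration constant
ccd_counts = 500.0   # hypothetical average CCD counts of the poly-P peak
area = 2.0           # hypothetical average cell area in um^2
amount = polyp_per_cell(k, ccd_counts, area)
```

Under these placeholder inputs the result is on the order of 10⁻¹⁴ g P per cell. The actual calibration constant and its units should be taken from [9].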

Carbon sources, processing and adaptations to anaerobic growth
Transporters predicted for xylose, ribose and glucose were not widely distributed, particularly in clade 1, 2 and 3 Tetrasphaera groups (Supplementary Data Files 6 & 7). P. duodecadis was an exception encoding a range of sugar transporters, and T. australiensis also has the potential for fructose and xylose import.
However, three additional ABC sugar transporters, two putative and one multiple sugar transporter, are encoded in the former Tetrasphaera isolate genomes and MAGs (Supplementary Data Files 6 & 7). As previous isolates have shown growth on glucose [10], it is likely these transporters facilitate import of a range of simple sugars such as glucose and xylose [11]. Fructose is another potential carbon source for the Ca. Phosphoribacter group, which encoded a PTS sugar transporter subunit IIABC component (fruB, KO number: K02768), a 1-phosphofructokinase (fruK K00882) and a fructose operon transcriptional repressor (fruR K03436) all adjacent to each other. Glycerol 3-phosphate could also potentially be used as a carbon and phosphate source by Pbr3-6 based on the presence of an ABC transporter encoded by the ugpABE (K05814, K05813, K05815) and malK (K10112) genes, and processed via glycolysis (Figure 3) [12].
The potential for beta-oxidation was also widely distributed across the TRC. While transporters for long-chain fatty acids (fadL) were missing, the acyl-CoA synthetase (K01897 fadD) specific for C6 to C18 fatty acid biosynthesis or degradation was present in 68 of the 69 TRC genomes, as were the genes for beta-oxidation (K00249 acd, K01782 fadJ, K00632 fadA) [14]. However, the beta-oxidation enzymes overlap with those involved in isoleucine, valine or leucine degradation; consequently, it is difficult to determine whether long-chain fatty acids are degraded or only synthesised. The glyoxylate cycle was complete (with K01637 aceA isocitrate lyase) in only two genomes, Terracoccus luteus and the MAG GCA-2748155.
Fermentation of substrates to acetate, lactate, alanine and succinate has been determined in the former Tetrasphaera isolates either through experimental measurements or based on genomic potential [10]. The clade 3 Ca. Phosphoribacter (midas_s_5) MAGs also encode the pyruvate:ferredoxin oxidoreductase (porA K00169, porB K00170), which converts pyruvate to acetyl-CoA under anoxic or microoxic conditions [15,16], indicating an adaptation to oxygen-limited environments. All former clade 3 and nearly all TRC MAGs (58/69) encoded the alanine dehydrogenase (ald K00259) for the reduction of pyruvate to alanine. This reaction is reversible, is potentially involved in maintaining redox balance, and could facilitate anaerobic growth or alanine use, similar to other Actinobacteriota [17]. However, only the Ca. P. hodrii and Pbr3 MAGs encoded the full pathway for fermentation to acetate, which is missing in the other Ca. Phosphoribacter MAGs (missing pta and ackA) (Figure 3). Most TRC MAGs (61/69), including clade 3, encode the cytochrome bd oxidase (cydA K00425, cydB K00426), which is less efficient but better suited to low-oxygen conditions than the cytochrome c oxidase [18], again indicating versatility suited to fluctuating oxygen conditions in EBPR systems. Clade 3 MAGs were enriched for all three genes encoding standard formate dehydrogenase (fdoGHI, K00123, K00124, K00127) and an operon encoding multiple subunits of a putative formate-hydrogen lyase-like complex (hycE K15830, hyfFECB K12141, K12140, K12138, K12137, mbhJ K18023 and arsR K03892), indicating they have the capacity to dissipate formate that may accumulate from anaerobic fermentation, producing CO2 and hydrogen [19]. Genes for indolepyruvate oxidoreductases (iorA K00179, iorB K00180) were also present in 8/10 clade 3 MAGs but were absent from nearly all of the remaining TRC (Figure 3). These typically oxygen-sensitive enzymes may be used during anaerobic aromatic amino acid fermentation [20]. Overall, the clade 3 populations have various adaptations for anaerobic and/or microaerophilic growth compared to the former Tetrasphaera isolates, which indicates differences in their potential to utilise organics and, possibly, to fulfil unique nutrient niches in doing so.

Nitrogen cycling
Nitrogen cycling is an important target for optimisation and sustainability improvements in WWTPs [21]. Consequently, we examined the distribution of nitrogen metabolism genes across the TRC. The potential for nitrate reduction to ammonia was prevalent in the Knoellia, Janibacter, Pedococcus and Terrabacter [...] Nitric oxide reductase (NorBC K04561, K02305) was missing in all TRC genomes, and nitrous oxide reductase (NosZ K00376) was missing in all but one (UBA4719 sp002404345), showing the group is devoid of complete denitrifiers. Overall, the differences in nitrate and nitrite reduction potential across the group indicate some niche differentiation for respiration under anaerobic conditions using nitrate or nitrite as electron acceptors.

Polyphosphate accumulation
We examined the prevalence of genes important for, but not limited to, polyphosphate accumulation and storage. These genes were identified widely across the TRC (Figure 3). Nearly all MAGs encoded the low-affinity phosphate transporter Pit (K03306). The high-affinity phosphate transporter encoded by PstSCAB (K02040, K02037, K02038, K02036) was also prevalent across the TRC, but less widespread than Pit, and missing in a few of the MAGs. One MAG in each of the Ca. P. baldrii and Ca. P. hodrii species lacked PstSCAB, but as two of the three MAGs in each species cluster encoded it, the absence could be due to genome incompleteness or indicate strain variation. At the genomic level, polyphosphate accumulation appears possible for many members of the Dermatophilaceae, but the environmental conditions likely determine the storage and cycling phenotype, and, as always, experimental evidence is required for confirmation of this metabolic trait (see below).
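Presence or absence calls of this kind can be derived from per-genome KO annotations. The Python sketch below counts genomes that encode Pit and the complete PstSCAB transporter, using the KO numbers given above. The MAG names and annotation sets are invented for illustration, and requiring all four PstSCAB KOs for a positive call is an assumption about how completeness is scored.

```python
# KO numbers as given in the text.
PIT = {"K03306"}
PST_SCAB = {"K02040", "K02037", "K02038", "K02036"}


def encodes(genome_kos, required_kos):
    """True if every KO in `required_kos` is annotated in the genome."""
    return required_kos <= set(genome_kos)


def summarise(annotations):
    """Count genomes encoding Pit and the complete PstSCAB transporter."""
    return {
        "pit": sum(encodes(kos, PIT) for kos in annotations.values()),
        "pstSCAB": sum(encodes(kos, PST_SCAB) for kos in annotations.values()),
        "total": len(annotations),
    }


# Hypothetical annotation table: MAG name -> set of KOs (names are invented).
annotations = {
    "mag_A": {"K03306", "K02040", "K02037", "K02038", "K02036"},
    "mag_B": {"K03306", "K02040", "K02037"},   # incomplete PstSCAB
    "mag_C": {"K03306"},
}
summary = summarise(annotations)
```

A missing KO in a MAG does not prove the gene is absent from the organism, which is why the text interprets the PstSCAB gaps cautiously in light of genome completeness.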

Glycogen, PHA and amino acid storage
Additional storage compounds, such as glycogen and PHA, are believed to be integral to the PAO phenotype by providing energy for polyP accumulation during aerobic conditions [10]. None of the TRC genomes encoded all genes for PHA synthesis (PhaABC, K00626, K00023, K03821), with PhaA and PhaB or PhaA and PhaC found in only 15 of the 69 genomes (Figure 3). Two Ca. P. hodrii MAGs and T. japonica encoded PhaA and PhaC, and PHA has been detected using gas chromatography in T. japonica [10], suggesting that Ca. P. hodrii may also be capable of PHA storage; however, no PHA was detected experimentally (see below). Previous genome studies predicted that Tetrasphaera produced glycogen as an energy storage compound [10], although recent work showed glycogen was not detectable in individual FISH-defined Tetrasphaera cells by Raman microspectroscopy in activated sludge samples from EBPR plants [9]. Genes for glycogen synthesis were identified in many TRC genomes (Figure 3); however, we propose that the TRC may synthesise glycogen-like α-glucan polysaccharides as cell-wall capsular material, similar to other Gram-positive Actinobacteriota, rather than glycogen for storage.
Clustered among several genes previously assigned to glycogen synthesis in P. elongatus (i.e., glgB, glgP, glgX, glgY) [10], we identified several genes encoding enzymes for trehalose and maltose conversions, which together resemble the 'TreS-Pep2-GlgE' pathway for capsular glycogen synthesis in Mycobacterium tuberculosis [22,23] (MetaCyc pathway 'glycogen biosynthesis III'). These gene complements were present in all novel MAGs and former Tetrasphaera isolates. Considering that the TRC are also Gram-positive Actinobacteriota, it is likely TRC bacteria also produce capsular polysaccharides using this pathway or a variation thereof.
Proteasomes could give the clade 3 lineages an advantage over Ca. Accumulibacter, enabling them to recycle resources from proteins and potentially respond quickly to challenging and fluctuating conditions [28], such as those in WWTPs.

Supplementary Note 2 - Differences between the most abundant species, Ca. P. baldrii and Ca. P. hodrii: extended discussion
The metabolism unique to Ca. P. baldrii (and Ca. Lutibacillus vidarii in the TRC) is the potential use of ethanolamine using an ethanolamine utilisation operon eutNABCLEMQJ with an alcohol dehydrogenase and an araC (eutR -MAGE) family transcriptional regulator [29] (Figure 4). This operon is much longer than those previously detected in Actinobacteriota, which normally comprise only eutBC (K03735, K03736) and a transporter [30]. Ethanolamine is present in the membranes of all living cells as the lipid phosphatidylethanolamine, and would be readily available in the AS system, and able to diffuse across cell membranes at a neutral pH or via the EutH transporter [29]. Ethanolamine is a source of both carbon (acetaldehyde) and nitrogen (ammonia) and is likely processed in an organelle-like microcompartment that would contain the toxic and gaseous acetaldehyde [30]. Potentially, the acetaldehyde dehydrogenase (eutE K00132) can process acetaldehyde to acetyl-CoA, which can be used in the TCA cycle. The maintenance cost of such a complex operon and metabolism is high [29], suggesting that this pathway is used and could differentiate the Ca. P. baldrii niche from the similarly abundant Ca. P. hodrii.
Ca. P. hodrii encodes several metabolic pathways distinct from Ca. P. baldrii. These included the capacity for assimilatory sulfate reduction and siroheme biosynthesis (ssuBC K02049-K02050, cysN K00956, cysD K00957, cysH K00390, sir K00392, cysG K02302), as well as a long protoheme synthesis operon (hemABCDEHL, K02492, K01698, K01749, K01719, K01599, K01772, K01845). The potential to use the sugar N-acetylglucosamine as a carbon and nitrogen source, similar to T. remsis [31], is suggested by the presence of the N-acetylglucosamine-6-P deacetylase and deaminase (nagA K01443, nagB K02564), PTS transporter genes (nagE K02802-K02804) and YvoA (K03710) regulator in all three MAGs. Use of this sugar is uncommon in WWTP microorganisms, indicating a distinct niche for this population [32]. This species also encoded the potential for aerobic acetate production from acetyl-CoA, acetate uptake, or fermentation of pyruvate to acetate via the pta (K13788) and ackA (K00925) genes, both of which were missing in the Ca. P. baldrii MAGs (Figure 4). Aerobic acetate production results from an overflow metabolism during exponential growth in Escherichia coli K12, which is hypothesised to be a consequence of reaching metabolic capacity limits in the TCA cycle, respiratory chain, or acetyl-CoA concentrations [33]. Under anaerobic conditions, Pta and AckA can act in reverse to produce acetate and ATP from acetyl-CoA [13], suggesting that this population inhabits a different anaerobic niche to Ca. P. baldrii.

Supplementary Note 3 - FISH details
Genus- and, when possible, species-specific FISH probes (Supplementary Table 2) were designed to cover the most abundant species in each clade, showing a variety of different morphologies. The existing FISH probes Actino658 and Actino221 [34] target with high specificity and good coverage part of Ca.
The FISH probe Phos741 was designed to cover the remaining part of Ca. Phosphoribacter (Pbr4, Pbr5, Pbr6) and hybridised to rod-shaped bacterial cells with a morphology similar to that observed with Actino658 (Supplementary Figure 6D). The FISH probe Phos601, which targets Ca. P. hodrii-related sequences (Supplementary Figure 6A), was the only 16S rRNA species-specific probe that could be designed and optimised for this microorganism, and its application confirmed the morphology already observed with Actino658. Two additional probes, Phos1260-23S-Pbr1 and Phos1260-23S-Pbr2, targeting 23S rRNA, were designed to specifically distinguish between Ca. P. baldrii and Ca. P. hodrii (Supplementary Table 2), and both hybridised with rod-shaped cells. Specificity of the species-specific probes was assessed by overlap with broader probes (Supplementary Figure 6B-C).
Application of the FISH probes with Raman microspectroscopy revealed the presence of polyP in all the genera/species (Table 2). No other storage polymers were detected in situ, as previously observed [4,9]. In order to quantify and explore the dynamics of polyP in Ca. Phosphoribacter, we performed anaerobic-aerobic P-cycling experiments with fresh activated sludge from a full-scale EBPR plant. Different carbon sources (acetate, glucose and casamino acids) were used during the anoxic phase, as Tetrasphaera is known to use amino acids or sugars as substrates for P release under anoxic conditions [9,35]. In situ quantification of polyP was performed for the two most abundant species, Ca. P. baldrii and Ca. P. hodrii. Both species exhibited dynamic cycling of intracellular polyP (Figure 5C), with levels higher after the oxic phase and substantially lower after the anoxic phase, and with small variations between the two species. The highest value was measured for Ca. P. hodrii (1.82 × 10⁻¹⁴ g P cell⁻¹), while Ca. P. baldrii accumulated 1.68 × 10⁻¹⁴ g P cell⁻¹ (Figure 5B). These values are in accordance with polyP contents measured with the same method but using the broader FISH probe Actino658 [4].