Marine Dadabacteria exhibit genome streamlining and phototrophy-driven niche partitioning

The remineralization of organic material via heterotrophy in the marine environment is performed by a diverse group of microorganisms that can specialize in the type of organic material degraded and the niche they occupy. The marine Dadabacteria are cosmopolitan in the marine environment and belong to a candidate phylum for which there has not been a comprehensive assessment of the available genomic data to date. Herein, we assess the functional potential of the marine pelagic Dadabacteria in comparison to members of the phylum that originate from terrestrial, hydrothermal, and subsurface environments. Our analysis reveals that the marine pelagic Dadabacteria have streamlined genomes, corresponding to smaller genome sizes and lower nitrogen content of their DNA and predicted proteome, relative to their phylogenetic counterparts. Collectively, the Dadabacteria have the potential to degrade microbial dissolved organic matter, specifically peptidoglycan and phospholipids. The marine Dadabacteria belong to two clades with apparently distinct ecological niches in global metagenomic data: a clade with the potential for photoheterotrophy through the use of proteorhodopsin, present predominantly in surface waters up to 100 m depth; and a clade lacking the potential for photoheterotrophy that is more abundant in the deep photic zone.


Introduction
Heterotrophy in the marine environment is a complex process with many organisms contributing to the remineralization of organic matter. In the surface ocean, ~50% of new organic carbon is remineralized by heterotrophs within the first 100 m [1,2]. Despite the importance of this process to the overall ocean carbon budget, the specific contributions of the phylogenetically diverse marine bacterioplankton community remain poorly constrained. The metabolic capacity of the community members directly governs the types of organic material that can be degraded in a particular environment [3]. Heterotrophs occupy a spectrum of metabolic diversity and growth strategies [4]. While copiotrophs exploit multiple organic resources and/or undergo rapid growth in response to nutrient availability, oligotrophs specialize in a limited number of resources and dominate in low nutrient environments [5]. Because of the interplay of heterotrophs on this spectrum of metabolic diversity, it is important to understand the role(s) that specific groups play in the degradation of organic matter in the surface ocean.
An evolutionary feature that has been observed among marine oligotrophs is the reduction and simplification of the genome. This evolutionary trajectory has been posited as the theory of genome streamlining, in which organisms that grow in nutrient-limited environments undergo selection to reduce cellular demand for specific compounds and nutrients [6]. While first described in the marine environment [7,8], genome streamlining has been identified in numerous habitats for a variety of microorganisms [9-12]. Streamlined genomes will tend to have smaller genome sizes as a result of increased coding density and a decreased number of paralogs/gene duplication events, which overall reduce cellular demand for nutrients [13]. Additionally, in nitrogen-limited environments, streamlined genomes may reduce the contribution of nitrogen to the DNA by decreasing genomic GC content and to the proteome through the selection of amino acids with side chains that contain fewer nitrogen atoms [13]. The theory of genome streamlining therefore provides insights into the evolutionary history and ecological distribution of a microorganism.
Herein, we assess the potential contributions to marine heterotrophy of the Dadabacteria, a phylum-level group that clusters phylogenetically near the phyla Campylobacteria, Aquificota, and Deferribacteres. The Dadabacteria (formerly SBR1093) lack a cultured representative and have not been extensively assessed for their potential contributions to biogeochemical cycles, though they have been detected in numerous environments. The first Dadabacteria genome was reconstructed from industrial activated sludge and reported to possess the capacity for carbon fixation through the 3-hydroxypropionate/4-hydroxybutyrate cycle [14]. Interestingly, multiple Dadabacteria metagenome-assembled genomes (MAGs) were reconstructed from the Tara Oceans global marine metagenomic samples, though their exact role in the marine environment was unknown [15-17]. Our analysis reveals that the marine Dadabacteria are likely heterotrophic oligotrophs that have undergone genome streamlining, with the capacity to degrade microbially derived peptidoglycan as a carbon source and with further metabolic diversification between shallow and deep photic zone niches.

Functional annotation
Because of the limited number of available MAGs, all genomes were considered in the functional annotation and genome streamlining analyses. Dadabacteria MAGs were assessed for putative metabolic functionality through the FuncSanity workflow of the tool MetaSanity [28] (beta version; v1). All downstream analyses use the putative CDS (coding DNA sequences) as predicted by Prokka (v1.13.3) [34]. Putative CDS were assigned to carbohydrate-active enzyme (CAZy) families based on HMMs (hidden Markov models) from dbCAN (v6) [35] using hmmsearch (v3.1b2; parameter: -T 75) [36]. The output from MetaSanity that combines the CAZy matches for all submitted genomes (MetaSanity output file: combined.cazy) was used to determine the number of CAZy matches per Mbp in each MAG, including a curated selection of glycoside hydrolases (GH) and carbohydrate-binding module (CBM) containing proteins and excluding matches to CAZy subfamily HMMs (e.g., matches to the GH13 model were included, while matches to the GH13_9 model, etc. were excluded).
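The per-Mbp normalization and the exclusion of subfamily models can be illustrated with a minimal sketch. It assumes that each match is available as a plain model name (for example GH13.hmm or GH13_9.hmm) and that subfamilies are recognizable by a numeric suffix after an underscore; the function name, the input list, and the genome length are hypothetical and do not reflect the actual layout of the combined.cazy file.

```python
import re
from collections import Counter

# Assumption: subfamily models end in an underscore plus digits, e.g. GH13_9.
SUBFAMILY = re.compile(r"_\d+$")


def cazy_per_mbp(matches, genome_size_bp):
    """Count family-level CAZy matches and normalize by MAG length in Mbp.

    matches: iterable of HMM model names matched by the CDS of one MAG.
    genome_size_bp: assembled length of the MAG in base pairs.
    """
    families = []
    for name in matches:
        # Drop an optional ".hmm" suffix from the model name.
        if name.endswith(".hmm"):
            name = name[:-4]
        # Skip subfamily models so that only family-level matches are counted.
        if SUBFAMILY.search(name):
            continue
        families.append(name)
    counts = Counter(families)
    per_mbp = len(families) / (genome_size_bp / 1e6)
    return counts, per_mbp


# Hypothetical matches for a single 1.22 Mbp MAG.
hits = ["GH13.hmm", "GH13_9.hmm", "CBM50.hmm", "GH23.hmm", "CBM50.hmm"]
counts, density = cazy_per_mbp(hits, 1_220_000)
print(counts, round(density, 2))
```

Normalizing by length rather than reporting raw counts keeps MAGs of different size and completeness comparable, which matters when streamlined and non-streamlined genomes are contrasted.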
Metabolic functions of interest were identified based on the KEGG-Decoder [25] output (v1.0.10) as implemented in MetaSanity (MetaSanity output file: KEGG.final.tsv). As part of this workflow, CDS were assigned to KEGG Ontology (KO) identifiers using KofamScan (v1.2.0) [41] and the accompanying KOfam HMMs. KO annotations were then assigned to a set of manually curated pathways and processes. Additionally, metabolisms of interest, especially those lacking KOfam HMMs, were searched independently and incorporated using KEGG-Expander as implemented in MetaSanity.

Genomic streamlining
Putative CDS were used to calculate the total number of carbon and nitrogen atoms present in the predicted proteome and the corresponding ratio of each MAG (https://github.com/edgraham/CNratio). To identify duplicate genes in a MAG, all putative CDS in the MAG were first compared against each other using DIAMOND BLASTP [47] (parameters: --more-sensitive --max-target-seqs 300). BLAST matches were filtered using the minbit approach [48], where significant matches were determined based on the relative comparison of bitscore values. Minbit was calculated for protein A compared to protein B, as in Eq. (1), retaining all BLAST matches with a minbit value ≥0.5. BLAST matches meeting this threshold were reformatted and clustered using MCL [49] (mcxload parameters: --abc --stream-mirror --stream-neg-log10 -stream-tf ceil(200); mcl default parameters; mcxdump parameter: -icl). All clusters in the mcxdump output were considered to be gene duplication events within the MAG.
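The minbit filter can be sketched as follows. This is a minimal illustration that assumes the commonly used definition of minbit, in which the bitscore of protein A against protein B is divided by the smaller of the two self-match bitscores; that definition, the function name, and the example scores are assumptions rather than the exact form of Eq. (1) or of the pipeline used in this study.

```python
def minbit_filter(hits, threshold=0.5):
    """Keep non-self matches whose minbit value meets the threshold.

    hits: list of (query, subject, bitscore) tuples from an all-vs-all search.
    Assumed definition:
        minbit(A, B) = bitscore(A, B) / min(bitscore(A, A), bitscore(B, B))
    """
    # Record the self-match bitscore of every protein.
    self_score = {}
    for query, subject, bitscore in hits:
        if query == subject:
            self_score[query] = max(bitscore, self_score.get(query, 0.0))

    kept = []
    for query, subject, bitscore in hits:
        if query == subject:
            continue
        # A pair can only be scored if both self-matches were reported.
        if query not in self_score or subject not in self_score:
            continue
        minbit = bitscore / min(self_score[query], self_score[subject])
        if minbit >= threshold:
            kept.append((query, subject, round(minbit, 3)))
    return kept


# Hypothetical bitscores for three proteins of one MAG.
example_hits = [
    ("A", "A", 200.0), ("B", "B", 100.0), ("C", "C", 300.0),
    ("A", "B", 60.0),   # 60 / min(200, 100) = 0.60, retained
    ("A", "C", 90.0),   # 90 / min(200, 300) = 0.45, discarded
]
filtered = minbit_filter(example_hits)
print(filtered)
```

Each retained tuple has the label, label, value shape expected by a label-based (abc) graph loader, so the output of such a filter could be written one tuple per line before clustering.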

Ecological distribution and environmental correlations
For determining the ecological distribution and environmental correlations, a nonredundant set of MAGs was determined using FastANI [50] (v1.3; parameters: --frag-length 1500) with a representative selected from a cluster of genomes with ≥98.5% average nucleotide identity [51]. Metagenomes derived from bioGEOTRACES [52] (bGT) and Tara Oceans [53] were mapped against the nonredundant set of Dadabacteria genomes using bowtie2 [54]
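The dereplication step can be illustrated with a short sketch that groups genomes by single-linkage clustering at the 98.5% ANI threshold and then picks one representative per cluster. The text does not state how the representative was chosen, so the use of estimated completeness as the criterion is an assumption, as are the function names and the genome identifiers.

```python
def cluster_by_ani(genomes, ani_pairs, threshold=98.5):
    """Single-linkage clustering of genomes from pairwise ANI values.

    genomes: list of genome identifiers.
    ani_pairs: iterable of (genome_a, genome_b, ani_percent) tuples.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, ani in ani_pairs:
        if a != b and ani >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return list(clusters.values())


def pick_representatives(clusters, completeness):
    """Assumed criterion: the most complete genome represents each cluster."""
    return sorted(max(c, key=lambda g: completeness[g]) for c in clusters)


# Hypothetical genomes, ANI values, and completeness estimates.
genomes = ["mag1", "mag2", "mag3", "mag4"]
pairs = [("mag1", "mag2", 99.1), ("mag2", "mag3", 98.6), ("mag3", "mag4", 95.0)]
completeness = {"mag1": 80.0, "mag2": 92.0, "mag3": 75.0, "mag4": 60.0}

clusters = cluster_by_ani(genomes, pairs)
representatives = pick_representatives(clusters, completeness)
print(representatives)
```

Single linkage is the simplest choice here; it can chain genomes that are not all within the threshold of each other, so a stricter linkage may be preferable when clusters are large.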

Results and discussion
As a candidate phylum, a broad understanding of the ecological role of the Dadabacteria has remained elusive due to the limited amount of metabolic information available for the clade. Based on the phylogenetic reconstruction of 48 MAGs (mean ± s.d. completeness 75.72% ± 17.77% and contamination 1.85% ± 1.48%; Fig. 1a; Supplementary Table 1), the phylum partitions into three distinct clades which share common environmental features: hydrothermal systems (terrestrial hot springs and hydrothermal vents), organic carbon-associated systems (the terrestrial subsurface, oil-polluted marine systems, marine sponges, marine sediment, and hydrothermal vent sediments), and marine pelagic systems. Within the "marine pelagic" clade, there are two distinct subclades, designated as marine pelagic clade I and II. The marine pelagic clades harbor genomic features that differentiate them from the other clades, specifically with regard to genomic evolutionary selection (e.g., streamlining) and putative metabolisms.
The pelagic marine Dadabacteria have undergone a genome streamlining process in comparison to the organic carbon-associated and hydrothermal lineages. The marine pelagic Dadabacteria exhibit all five traits associated with genome streamlining: reduced genome size, decreased %GC content, increased C/N ratio in the predicted proteome, increased coding density, and limited/no gene duplication events (Fig. 1b-d). The mean estimated genome size of the marine pelagic Dadabacteria is ~1.22 Mb (± 0.05, 95% CI) with >96% coding density, smaller in size and similar in coding density to the well-studied marine SAR11 clades [8,60]. The presence of the Dadabacteria MAGs reconstructed from multiple oligotrophic Tara Oceans regions would suggest that these organisms, like other oligotrophs, are adapted to environments with low nutrient concentrations [6] (Supplementary Fig. 1). Modifications in GC content and proteome C/N ratio are associated with lowering the nitrogen demand for organisms in nitrogen-limited environments [6]. Small genomes, devoid of paralogs and with high coding density, are thought to have reduced energy requirements for division and growth. These genomic modifications, which confer an advantage in oligotrophic marine environments, are the result of changes in selection pressure that occurred at the transition between the marine pelagic and hydrothermal/organic carbon-associated Dadabacteria clades [61,62]. These results provide further evidence that genome streamlining is a common evolutionary response of organisms that undergo a transition from nutrient-rich to nutrient-poor environments [63].
Fig. 1 (caption, partially recovered) a ... KEGG are displayed on a scale of 0-1, as a fraction of a particular metabolism detected. MAG abbreviations: TOBG from Tully et al. [15]; TMED from Tully et al. [24]; TARA from Delmont et al. [17]; MED from López-Pérez et al. [69]; UBA from Parks et al. [16]. b A scatter plot of percent G + C (%G + C) and approximate complete genome size in megabase pairs (Mbp) for each Dadabacteria MAG. c A scatter plot of putative proteome carbon-to-nitrogen content ratio and percent coding density for each Dadabacteria MAG. d The number of duplicate gene events in each Dadabacteria MAG.
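The link between nitrogen demand, GC content, and proteome C/N ratio can be made concrete with a small worked sketch. It counts carbon and nitrogen atoms per amino acid from standard molecular formulas and nitrogen atoms per DNA base pair (7 for an A:T pair, 8 for a G:C pair). It is an illustration under those assumptions, not the CNratio script used in this study, and the sequences and GC fractions are hypothetical.

```python
# (carbon, nitrogen) atoms per amino acid, from standard molecular formulas.
AA_CN = {
    "A": (3, 1), "R": (6, 4), "N": (4, 2), "D": (4, 1), "C": (3, 1),
    "Q": (5, 2), "E": (5, 1), "G": (2, 1), "H": (6, 3), "I": (6, 1),
    "L": (6, 1), "K": (6, 2), "M": (5, 1), "F": (9, 1), "P": (5, 1),
    "S": (3, 1), "T": (4, 1), "W": (11, 2), "Y": (9, 1), "V": (5, 1),
}


def proteome_cn_ratio(proteins):
    """Sum carbon and nitrogen atoms over all proteins and return C, N, C/N."""
    carbon = nitrogen = 0
    for seq in proteins:
        for aa in seq.upper():
            if aa in AA_CN:  # ignore stop symbols and ambiguous residues
                c, n = AA_CN[aa]
                carbon += c
                nitrogen += n
    if nitrogen == 0:
        raise ValueError("no recognizable amino acids in input")
    return carbon, nitrogen, carbon / nitrogen


def dna_nitrogen_per_bp(gc_fraction):
    """Average nitrogen atoms per base pair: 8 for G:C, 7 for A:T."""
    return 8 * gc_fraction + 7 * (1 - gc_fraction)


# Hypothetical two-protein proteome: an N-rich and an N-poor peptide.
carbon, nitrogen, ratio = proteome_cn_ratio(["MKR", "GAL"])
print(carbon, nitrogen, ratio)

# Lower GC content means fewer nitrogen atoms per base pair.
low_gc = dna_nitrogen_per_bp(0.30)
high_gc = dna_nitrogen_per_bp(0.60)
print(low_gc, high_gc)
```

In this toy case, replacing arginine (four nitrogen atoms) with leucine (one) raises the C/N ratio, and lowering GC content from 60% to 30% removes 0.3 nitrogen atoms per base pair on average, which is the direction of change expected under nitrogen limitation.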
While the SBR1093 MAG was implicated in carbon fixation via the 3-hydroxypropionate/4-hydroxybutyrate cycle [14], analysis of the Dadabacteria phylum reveals, especially for the marine pelagic clades, a predominantly heterotrophic lifestyle (Fig. 1a). Except for the SBR1093 MAG, none of the publicly available Dadabacteria MAGs have the potential for carbon fixation (Supplementary Table 2). Several MAGs from the hydrothermal and organic carbon-associated clades have the potential to interface with the nitrogen and sulfur cycles with metabolic processes involved in denitrification, dissimilatory nitrate reduction to ammonia (DNRA), sulfate reduction, sulfide oxidation, and the production of dimethylsulfoniopropionate (DMSP) (Fig. 1a). However, while both marine pelagic clades lack these particular metabolic pathways, all four clades share the potential for the heterotrophic degradation of proteins and complex carbohydrates, including starch/glycogen (β-glucosidase and α-amylase). One consistent target for the extracellular peptidases (LysM) and carbohydrate-active enzymes (CAZymes; peptidoglycan lyase and CBM Family 50) across the Dadabacteria clades is peptidoglycan, the polymer of the microbial cell wall. These predicted proteins may be responsible for the internal recycling of the cell wall during cell division, or they may indicate that the Dadabacteria occupy a niche based on recycling microbially derived dissolved organic matter (DOM).
Interestingly, the number of extracellular peptidases, CAZymes, and ATP-binding cassette-type (ABC-type) transporter components normalized for MAG length across all four clades remains consistent even as the overall diversity within each group of proteins decreases (Fig. 1a; Supplementary ...). This may highlight an interplay between heterotrophic metabolic diversity and changes in carbon utilization as Dadabacteria genome size decreases during streamlining. Additionally, there are several other metabolic processes that distinguish the four clades and highlight the divide between the hydrothermal and organic carbon-associated clades and the marine pelagic clades. Specifically, for the hydrothermal clade, the prevalence of CRISPR-associated proteins (used as a proxy for CRISPR arrays due to low recovery in MAGs), motility, and two-component regulatory chemotaxis suggests that both avoidance of viral predation and physical adjustments within the hydrothermal environment are important evolutionary advantages (Supplementary Tables 2 and 5). Distinct to the hydrothermal and organic carbon-associated clades are the presence of phosphonate and phosphate ABC transporters, the Entner-Doudoroff pathway (an alternative pathway to glycolysis for glucose degradation), and a Type II secretion system (Supplementary Tables 2 and 6). In many marine systems, phosphorus, like nitrogen, can be a limiting resource. All four clades possess ABC-type phospholipid transporters (Supplementary Table 6), so while most of the marine pelagic clades (63%) lack phosphonate and phosphate transporters, the presence of phospholipid transporters suggests these organisms may recover phosphorus for cellular demand from DOM.
The marine pelagic I and II clades have several distinguishing metabolic properties. Potentially the most important are the mechanisms related to utilizing light energy. Uniquely amongst the Dadabacteria, the marine pelagic I clade possesses rhodopsins and the biosynthetic capacity for retinal synthesis (Fig. 1a). Based on the amino acids present, it is predicted that all of the identified rhodopsins are H+-pumping proteorhodopsins [64] (Supplementary Table 7). For the eight identified proteorhodopsins within the marine pelagic I clade, all but one are predicted to be spectrally tuned to absorb blue light [65,66] (Supplementary Table 7). The marine pelagic I clade also has the capacity to produce terpene secondary metabolites (Supplementary Table 8). Terpenes are organic hydrocarbons that have been shown to be associated with carotenoid synthesis [67]. These terpenes may be related to the production of β-carotene, a biological precursor to retinal, or to the production of other unidentified carotenoids (Supplementary Table 6). The marine pelagic II clade lacks proteorhodopsins, retinal biosynthesis, and terpene secondary metabolites (Fig. 1a). Like all other Dadabacteria clades, the marine pelagic clades possess starch/glycogen and peptidoglycan degradation mechanisms, which may suggest that these heterotrophic processes are the predominant avenues for energy acquisition.
The metabolic division based on the utilization of light via proteorhodopsins between the marine pelagic clades is reflected in the ecological distribution of the clades. Using a nonredundant set of the marine pelagic Dadabacteria MAGs, the large global metagenomic datasets (Tara Oceans and bGT) were mapped against the MAGs and used to assess where the Dadabacteria occurred through the water column (Supplementary Tables 9 and 10). The two datasets have distinct properties that allow for varying perspectives on the ecology of the Dadabacteria. Tara Oceans is globally distributed with multiple size fractions and samples from the mesopelagic, while bGT provides several high-resolution cruise tracks with multiple depths between the surface and ~250 m depth. The results from Tara Oceans demonstrate that, broadly, the marine clades are present in the planktonic size fraction (<3 μm) and almost exclusively found in the epipelagic (Supplementary Fig. 2).
As exemplified by the GA03 cruise track in the North Atlantic, the resolution provided by bGT reveals that the marine pelagic I and II clades tend to be dominant above and below ~100 m depth (~1% light level), respectively, and that this niche transition can be sharp, with the marine pelagic I clade dropping to a negligible component of the microbial community at this partitioning depth (Fig. 2; Supplementary Table 11). This relationship can be observed for the other three cruise tracks, station ALOHA (Hawaii Ocean Time-series), and hydrostation S (Bermuda Atlantic Time-series) with some localized variation, potentially due to surficial mixing and/or downwelling/upwelling events, where the marine pelagic II clade can be found at the surface and the marine pelagic I clade can be found at 250 m. However, for many of the sampling stations there remains a divide between the two clades at the ~1% light depth (Supplementary Figs. 2 and 3). Canonical correspondence analysis (CCA) of the GA03 environmental parameters supports this niche transition, as a majority of the marine pelagic II clade MAGs correlated with depth and depth-associated parameters (nutrients, temperature, etc.; Fig. 2c). Similar correlations between depth-associated parameters and the marine pelagic clades are observed for the other cruise tracks (Supplementary Fig. 4). As has been shown previously, deep euphotic zone blue-light proteorhodopsins are adapted to low light incidence and capture a limited amount of light at 75 m [68]. The apparent depth partitioning linked to encoding proteorhodopsin therefore likely reflects an evolutionary selective pressure against maintaining a light-responsive protein apparatus at depth and manifests as depth-specific niche boundaries between the two marine pelagic clades.

Conclusion
The Dadabacteria phylum is an understudied clade with a limited number of genomic representatives. Analysis of the four major clades represented among publicly available genomes reveals a broad range of heterotrophic organisms, putatively involved in the recycling of microbially derived DOM, such as peptidoglycan and phospholipids. The hydrothermal and organic carbon-associated clades appear to be facultative anaerobes capable of using alternative electron acceptors, while the marine pelagic clades appear to be obligate aerobes. The marine pelagic clades have genomic features indicative of extensive evolutionary pressure toward genome streamlining, mirroring their ecological distribution in oligotrophic environments. Genome streamlining theory is an important hypothesis for explaining the prevalence of small genomes among cosmopolitan microorganisms, and the Dadabacteria represent a clear example of the theory in action. The two distinct marine pelagic clades are differentiated in metabolic potential by the presence of light-associated adaptations, such as proteorhodopsin, terpenes, and carotenoids, supporting an argument that the marine pelagic I clade possesses a photoheterotrophic lifestyle. These adaptations are reflected in the ecological distribution of these clades, with depth-partitioned niches for the marine pelagic I and II clades. The Dadabacteria have multiple transitions that are of interest for understanding evolutionary pressures and adaptations in different environments, including: terrestrial to marine transitions; high to moderate/low temperature transitions; and adaptations from organic-rich to organic-poor environments. Further studies and the expansion of available genomes for this clade may provide specific insights as to how these transitions occur and manifest in microbial genomes.

Data availability
Several of the MAGs used in this study that underwent manual curation (TOBG-EAC99, TARA-RED-00009, TOBG-IN994, TOBG-MED731, TOBG-MED713, and TOBG-SP357) originated from the Tara Oceans dataset and were never submitted to NCBI, to avoid duplication in GenBank. These curated MAGs are noted in Supplementary Table 1 and are available here: https://doi.org/10.6084/m9.figshare.12344207. As noted in Supplementary Table 1, MAGs with corresponding submissions in NCBI GenBank have been updated.
Acknowledgements ... providing access to such important global marine metagenomic datasets. Their commitment to open-access data has proven to be a valuable asset to all who build on the shoulders of giants. We also thank the Center for Dark Energy Biosphere Investigations (C-DEBI) for providing funding to BJT (OCE-0939654). This is C-DEBI contribution number 554.
Author contributions Analyses were conducted by EDG and BJT. Specifically, EDG performed quality assessments and manual improvement of the MAGs, reconstructed the phylogeny, and performed the recruitment procedure to determine ecological distributions. BJT performed analyses related to functional annotations and genome streamlining. EDG and BJT wrote the manuscript. The study was conceived by BJT.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.