Genomic evidence for the degradation of terrestrial organic matter by pelagic Arctic Ocean Chloroflexi bacteria

The Arctic Ocean currently receives a large supply of global river discharge and terrestrial dissolved organic matter. Moreover, an increase in freshwater runoff and riverine transport of organic matter to the Arctic Ocean is a predicted consequence of thawing permafrost and increased precipitation. The fate of the terrestrial humic-rich organic material and its impact on the marine carbon cycle are largely unknown. Here, a metagenomic survey of the Canada Basin in the Western Arctic Ocean showed that pelagic Chloroflexi from the Arctic Ocean are replete with aromatic compound degradation genes, acquired in part by lateral transfer from terrestrial bacteria. Our results imply marine Chloroflexi have the capacity to use terrestrial organic matter and that their role in the carbon cycle may increase with the changing hydrological cycle.

T he Arctic Ocean accounts for 1.4% of global ocean volume but receives 11% of global river discharge 1 . Up to 33% of the dissolved organic matter in the Arctic Ocean is of terrestrial origin and a major fraction of this terrestrial dissolved organic matter (tDOM) originates from carbon-rich soils and peatlands 2,3 . With thawing permafrost and increased precipitation occurring across the Arctic 4 , increases in freshwater runoff and riverine transport of organic matter to the Arctic Ocean are predicted, which will increase tDOM fluxes and loadings 5,6 . The additional tDOM may represent new carbon and energy sources for the Arctic Ocean microbial community and contribute to increased respiration, which would result in the Arctic being a source of dissolved inorganic carbon to the ocean. Alternatively, as it moves from its source of origin to the Arctic Ocean tDOM could become more recalcitrant to bacterial metabolism and represent a long-term sequestration of the newly released carbon making the Arctic more carbon neutral 7,8 . However, an estimated 50% of Arctic Ocean tDOM is removed before being released to the Atlantic, at least in part by microbial processes 9 . As input of tDOM increases, knowledge on its microbial transformation will be critical for understanding changes in Arctic carbon cycling.
The marine SAR202 is a diverse and uncultivated clade of Chloroflexi bacteria that comprise roughly 10% of planktonic cells in the dark ocean [10][11][12][13][14] . SAR202 is also common in marine sediments and deep lakes [15][16][17] . It has long been speculated that SAR202 may have a role in the degradation of recalcitrant organic matter 11,14 , and the recent analysis of SAR202 single-cell-amplified genomes (SAGs) lends support to this notion 18 . More generally, Chloroflexi, including those in the SAR202 clade, are also present in the upper layers of the Arctic Ocean 19 , leading to the hypothesis that recalcitrant organic compounds present in high Arctic tDOM could be utilized by this group.

Results
In this study, we analyzed Chloroflexi metagenome-assembled genomes (MAGs) generated from samples collected from the vertically stratified waters of the Canada Basin in the Western Arctic Ocean (Fig. 1a). A metagenomic co-assembly was generated from samples originating from the surface layer (5-7 m), the subsurface chlorophyll maximum (25-79 m) and a layer corresponding to the terrestrially-derived DOM fluorescence (FDOM) maximum previously described within the cold Canada Basin halocline comprised of Pacific-origin waters (177-213 m) 20 . The Pacific-origin FDOM maximum is due to sea ice formation and interactions with bottom sediments on the Beaufort and Chukchi shelves, which themselves are influenced by coastal erosion and river runoff 20 . Binning based on tetranucleotide frequency and coverage resulted in 360 MAGs from a diversity of marine microbes (Fig. 1b). Six near-complete Chloroflexi MAGs were identified. Based on 16S rRNA gene phylogeny, these MAGs represented three distinct SAR202 subclades (SAR202-II, -VI, -VII), the AncK29 clade and the TK10 clade (Fig. 2a). Estimated MAG completeness ranged from 77 to 99%, while contamination ranged from 0 to 2.3% (Table 1). All MAGs exhibited highest coverage just below the subsurface chlorophyll maximum (Fig. 2b) which is consistent with earlier findings on SAR202 distribution in the North Pacific Ocean 12 . However, the concentration and composition of the FDOM maximum in the Canada Basin is significantly different compared to the North Pacific Ocean 21 and the North Atlantic Subtropical Gyre 22 . A concatenated protein phylogeny demonstrated that the SAR202 MAGs were distinct from previously published MAGs from the deep ocean 18 and oxygen minimum zones 23 (Supplementary Figure 1). Fragment recruitment of 21 TARA Ocean metagenomic datasets spanning epipelagic to mesopelagic waters at 7 locations and 4 separate bathypelagic metagenomes indicated that the Canada Basin Chloroflexi MAGs were not widely distributed in the oceans (Fig. 2c, Supplementary Data 1). These findings are evidence that the Chloroflexi MAGs represent genotypes that are rare outside Arctic marine waters.
The Chloroflexi MAGs contained many genes implicated in the degradation of aromatic compounds typically associated with humic-rich tDOM (Supplementary Data 2). A single MAG (SAR202-VII-2) from a previously undescribed clade (SAR202-VII) exhibited a striking enrichment in these genes (Fig. 3a). Partial pathways for the catabolism of aromatic compounds were recently reported from deep ocean SAR202 SAGs 18 . To assess whether the abundance and diversity of SAR202-VII-2 genes involved in aromatic compound catabolism is unique to Arctic Ocean MAGs or is a more broad characteristic of marine Chloroflexi, we compared gene content between SAR202-VII-2 and two SAGs (SAR202-V-AB-629-P13 and SAR202-III-AAA240-O15) reported in Landry et al. 18 Of the 117 SAR202-VII-2 orthologs implicated in aromatic compound degradation, 12 were identified in SAR202-III-AAA240-O15 and only one was identified in SAR202-V-AB-629-P13, implying distinct and less diverse pathways in deep ocean SAR202 compared to the Arctic Ocean populations (Supplementary Data 2).
Proteins for the modification and degradation of monoaryl and biaryl compounds were predicted, including a diversity of aromatic ring-cleaving dioxygenases [24][25][26] . A total of 42 ring-cleaving dioxygenases targeting compounds related to catechol, protocatechuate and gentisate were present in the six MAGs, with 25 dioxygenases predicted in SAR202-VII-2 alone (Fig. 3a, b). Ring demethylation, hydroxylation, and decarboxylation are important prerequisite steps to prime diverse aromatic compounds for downstream oxidative cleavage 27,28 . Thirty ring-demethylating monooxygenases, ten ring-hydroxylating dioxygenases, and eleven ring-decarboxylases were annotated in the SAR202-VII-2 MAG (Fig. 3c, Supplementary Data 2). Proteins involved in the conversion of ring-cleavage products to central intermediates of the citric acid cycle were also present in the SAR202-VII-2 MAG, including dehydrogenases (i.e., 2,3-dihydroxy-2,3-dihydrophenylpropionate dehydrogenase), decarboxylases (i.e., oxaloacetate B-decarboxylase), aldolases (i.e., HMG aldolase and 4-carboxymuconolactone decarboxylase), hydratases (i.e., 4oxalmescanoate hydratase and 2-oxopent-4-enoate hydratase), isomerases (i.e., mycothiol maleulpyruvate isomerase and muconolactone isomerase) and hydrolases (i.e., 3-oxoadipate enol-lactonase and β-ketoadipate enol-lactone hydrolase) (Supplementary Data 2, Supplementary Figure 2). We note that we were unable to identify a single complete reference pathway for humic-like aromatic compound degradation. Since estimated genome completeness for SAR202-VII-2 was 99%, it is unlikely the genes were missed due to an incomplete genome. Another explanation is that marine Chloroflexi genomes encode novel pathway variants. Indeed, numerous metal-dependent hydrolases, hydrolases of the HAD family and NAD(P)-dependent dehydrogenase were clustered in genomic regions with the ringmodifying oxygenases, decarboxylases, and demethylases described above. In addition to the array of aromatic compound degradation genes, the SAR202-VII-2 MAG also contained 34 copies of the flavin mononucleotide (FM)/F420-dependent monooxygenase catalytic subunit (FMNO) proteins previously implicated in activation of recalcitrant organic compounds in the deep ocean 18 . These results are consistent with Chloroflexi in the Arctic Ocean having the metabolic potential to access carbon and energy available in aromatic compounds typically associated with tDOM.
The diversity of SAR202-VII-2 genes implicated in aromatic compound degradation lead us to hypothesize that they may have originated by lateral gene transfer from terrestrial bacteria. To test this, we targeted aromatic compound degradation genes (the ring-cleaving dioxygenases, specifically) in the Chloroflexi MAGs for in-depth phylogenetic analyses. The genomic diversity of marine Chloroflexi was expanded in our analysis by including 130 Chloroflexi MAGs recently assembled and binned from the TARA Oceans project 29 . A number of the SAR202-VII-2 ringcleaving dioxygenase homologs were most closely related to Ocean data view   Figure 4) indicating that aromatic compound degradation in Chloroflexi is not restricted to the Arctic Ocean. However, lateral gene transfer from terrestrial bacteria was also evident. For example, an annotated gentisate dioxygenase gene was positioned within a clade of terrestrial Actinomycetes (Fig. 4). Additional genes involved in the degradation of structures related to catechol, protocatechuate and gentisate were phylogenetically associated with homologs from terrestrial Acidobacteria (Supplementary    Figure 7) and a clade of diverse terrestrial phyla (Supplementary Figure 8). Additionally, 2 gentisate 1,2-dioxygenase genes and 1 protocatechuate dioxygenase ligB gene were phylogenetically associated to a clade of genes from both terrestrial Delta-proteobacteria and marine microbes (Fig. 4 and Supplementary Figure 8). These putative gene acquisitions were unlikely due to contaminating scaffolds because the genes were located on long scaffolds that were assigned to Chloroflexi with high confidence based on tetranucleotide frequencies and the phylogenetic identity of housekeeping genes. Such a phylogenetic pattern supports the hypothesis that marine Chloroflexi acquired the capacity for aromatic compound degradation, at least in part, by lateral gene transfer from terrestrial bacteria.

Conclusions
In total, these results are consistent with Chloroflexi having a role in tDOM transformation in waters of the Arctic Ocean. This is the first study to our knowledge to associate a specific microbial group with tDOM metabolism in the Arctic Ocean and it expands on recent studies contributing to our understanding of the metabolic diversity of the abundant yet uncultivated marine Chloroflexi 18,23 . Moreover, lateral gene transfer from terrestrial bacteria appears to have contributed to the evolution of aromatic compound degradation capabilities within marine Chloroflexi, particularly in regions of the Arctic Ocean impacted by terrestrial input. The majority of MAGs were restricted to the humic-rich Pacific-origin halocline of the Canada Basin, however it is the surface waters that will be most immediately affected by increased freshwater input 1 . Hence, our initial observations suggest a need for further research on the distribution of tDOM-utilizing microbes in other Arctic water masses with an aim to establish how common and phylogenetically widespread tDOM metabolism is in the Arctic Ocean. These water masses could include coastal surface waters at the mouth of the Mackenzie River, as well as regions of differing DOM composition such as the East Siberian Sea 20 . Moreover, metagenomic studies such as this are, in essence, hypothesis-generating and future work that includes targeted cultivation, in situ gene expression analysis, and rate measurement-based approaches are required to validate and quantify microbial metabolic contributions to nutrient cycling. Overall, it is likely that marine Chloroflexi have the capacity to degrade tDOM, and their role in the Arctic carbon cycle may increase as Arctic warming leads to greater inputs of terrestrial organic matter.

Methods
Sampling and DNA extraction. Twelve samples for metagenomics were collected in September 2015 during the Joint Ocean Ice Study cruise to the Canada Basin. For each sample, 4-8 L of seawater was sequentially filtered through a 50 μm pore mesh, followed by a 3 μm pore size polycarbonate filter and a 0.22 μm pore size Sterivex filter (Durapore; Millipore, Billerica, MA, USA). Filters were preserved in RNAlater and stored at −80°C until processed in the laboratory. DNA was extracted from the Sterivex filter using the following method: filters were thawed on ice and RNAlater was removed. The Sterivex was then rinsed twice with a sucrosebased lysis buffer, and filled with 1.8 mL of the lysis buffer. Filters were treated with 100 μL of 125 mg mL −1 lysozyme and 20 μL of 10 μg mL −1 RNAse A and left to rotate at 37°C for 1 h. After incubation, 100 μL of 10 mg mL −1 proteinase K and 100 μL of 20% SDS was added. Filters were left to rotate for 2 h at 55°C. Lysate was removed from the filters. Protein was precipitated and removed with 0.583 volumes of MPC Protein Precipitation Reagent (Epicentre, Madison, WI, USA) and centrifugation at 10,000×g at 4°C for 10 min. The supernatant was transferred to a clean tube. DNA was precipitated with cold isopropanol, and resuspended in low TE buffer.
Metagenomic sequencing, assembly, annotation, and binning. DNA sequencing of 12 samples was performed at the Department of Energy Joint Genome Institute (Walnut Creek, CA, USA) on the HiSeq 2500-1TB (Illumina) platform.
Paired-end sequences of 2 × 150 bp were generated for all libraries. A metagenome co-assembly of all raw reads was generated using MEGAHIT 30 with kmer sizes of 23,43,63,83,103,123. Gene prediction and annotation was performed using the DOE Joint Genome Institute's Integrated Microbial Genomes (IMG) database tool (version 4.11.0) 31 . Metagenomic binning was performed on scaffolds ≥10 kb in length using MetaWatt 32 . Relative weight of coverage binning was set to 0.75 and the optimize bins and polish bins options were set to on. The taxonomic identity of MAGs was assessed using a concatenated phylogenetic tree based on 138 single copy conserved genes as implemented in MetaWatt 32 . Visualization of the tree and mapping of data on to taxa was performed with iTOL 33 . Estimation of MAG completeness and contamination was performed using CheckM 34 . Six Chloroflexi MAGs were selected for further analysis based on the presence of a 16S rRNA gene, high completeness and low contamination. Manual curation of the six Chloroflexi MAGs was performed and suspected contaminating scaffolds (single copy genes most similar to non-Chloroflexi taxa) were removed prior to further analysis of MAGs.
16S rRNA phylogenetic analysis. Chloroflexi diversity in the metagenomic assembly was assessed by 16S rRNA gene analysis. All 16S rRNA genes in the coassembly were assigned to taxonomic groups using mothur 35 and the Wang method with a bootstrap value cutoff of 60% 36 . Chloroflexi 16S rRNA genes greater than 360 bp were included in a phylogenetic analysis with Chloroflexi reference sequences. A multiple sequence alignment was generated using MUSCLE (implemented in MEGA6) 37 . Phylogenetic reconstructions were conducted by maximum likelihood using MEGA6-v.0.6 and the following settings: general time reversible model, gamma distribution model for the rate variation with four discrete gamma categories, and the nearest-neighbor interchange (NNI) heuristic search method 38 with a bootstrap analysis using 100 replicates.
Single protein and concatenated protein phylogenies. A concatenated protein phylogeny was constructed using 30 Chloroflexi reference genomes and the 6 Canada Basin Chloroflexi MAGs. Orthologous genes in the 36 Chloroflexi genomes were identified using ProteinOrtho 38 . Fifty orthologs present in at least 34 of the 36 genomes were selected for concatenated phylogenetic analysis (Supplemetary Data 3). Each orthologous protein family was aligned using MUSCLE (implemented in MEGA6) and alignment positions were masked using the probabilistic masker ZORRO 39 , masking columns with weights <0.5. The concatenated alignment consisted of 14,815 amino acid positions. Phylogenetic reconstructions were conducted by maximum likelihood using MEGA6-v.0.6 and the following settings: JTT substitution model, gamma distribution with invariant sites model for the rate variation with four discrete gamma categories, and the nearest-neighbor interchange (NNI) heuristic search method 40 with a bootstrap analysis using 100 replicates.
For phylogenetic analysis of ring-cleaving dioxygenase sequences identified in SAR202-VII-2, query sequences were searched against UniRef90 and 130 Chloroflexi MAGs constructed from the TARA Oceans dataset 29 . The TARA Ocean Chloroflexi MAGs were used as is with no manual curation. UniRef90 sequences and the top TARA Ocean MAG hits for each dioxygenase were aligned with their respective SAR202-VII-2 homologs with MUSCLE (implemented in MEGA6) and alignment positions were masked using the probabilistic masker ZORRO 39 , masking columns with weights less than 0.5. Phylogenetic reconstruction was conducted using the same settings as the concatenated phylogeny.
Comparative genomics and metabolic reconstruction. The distribution of orthologs across Arctic Ocean genomes, as well as the identification of orthologs shared with the deep ocean SAGs, was determined using proteinortho 38 . Inference of protein function and metabolic reconstruction was based on the IMG annotations provided by the JGI, including KEGG, Pfam, EC numbers, and Metacyc annotations. Metabolic reconstruction was also facilitated by generated pathway genome databases for each MAG using the pathologic software available through Pathway Tools 41 .
Metagenomic fragment recruitment. The distribution of the Canada Basin MAGs in the global ocean was determined using best-hit reciprocal blast analysis similar to Landry et al. 18 Unassembled metagenomic data from 25 samples (Supplementary Data 1) was first recruited to the six Canada Basin Chloroflexi MAGS as well as two SAR202 SAGs originating from the deep North Pacific Ocean from the Hawaiian ocean time series (HOTS) and the deep North Atlantic Ocean 18 . Metagenomes from the TARA Ocean project used here were representative of the surface, chlorophyll maximum and mesopelagic waters from the North Atlantic, South Atlantic, North Pacific, Coastal North Pacific, South Pacific, Coast of Brazil and the Antarctic peninsula. To reduce computational demand, only part 1 (1 Gbp of a random subset of reads) of each metagenomic dataset available at EBI was used (Supplementary Data 1). Additional bathypelagic metagenomes from the North Pacific and South Atlantic Oceans (LineP P04, P12 and P26, and Knorr S15 2500 m) were also included. All hits from the initial blast were then reciprocally queried against the Canada Basin Chloroflexi MAGs, bathypelagic SAR202 SAGs, and 130 Chloroflexi MAGs constructed from the TARA oceans data. The best-hit was reported. Only hits with an alignment length ≥100 bp and a percent identity of 95% or more were counted (lower % identity cutoffs did not alter the number of reads recruited in any significant manner). To compare the results among the different datasets, the number of recruited reads was normalized to total number of reads in each sample. The final coverage results were expressed as the number of reads per kilobase of the MAG per gigabase of metagenome (rpkg).