Potential for microbial anaerobic hydrocarbon degradation in naturally petroleum-associated deep-sea sediments

The lack of cultured isolates and microbial genomes from the deep seabed means that very little is known about the ecology of this vast habitat. Here, we investigated energy and carbon acquisition strategies of microbial communities from three deep seabed petroleum seeps (3 km water depth) in the Eastern Gulf of Mexico. Shotgun metagenomic analysis revealed that each sediment harbored diverse communities of chemoheterotrophs and chemolithotrophs. We recovered 82 metagenome-assembled genomes affiliated with 21 different archaeal and bacterial phyla. Multiple genomes encoded enzymes for acetogenic fermentation of aliphatic and aromatic compounds, specifically those of candidate phyla Aerophobetes, Aminicenantes, TA06 and Bathyarchaeota. Microbial interactions in these communities are predicted to be driven by acetate and molecular hydrogen, as indicated by a high abundance of fermentation, acetogenesis, and hydrogen utilization pathways. These findings are supported by sediment geochemistry, metabolomics and thermodynamic modelling of hydrocarbon degradation. Overall, we infer that deep-sea sediments experiencing thermogenic hydrocarbon inputs harbor phylogenetically and functionally diverse communities potentially sustained through anaerobic hydrocarbon, acetate and hydrogen metabolism.

classification based on concatenation of 120 ubiquitous, single-copy marker genes 29 as well as classification of 16S rRNA genes using the SILVA database 30 (Tables S4 and S5). In summary, while there are considerable community-level differences between the three sample locations, the recovered MAGs share common taxonomic affiliations at the phylum and class levels. Guided by sediment geochemistry (Table 1), we subsequently analyzed the metabolic potential of these MAGs to understand how bacterial and archaeal community members generate energy and biomass in these natural petroleum-associated deep-sea environments. Hidden Markov models (HMMs) and homology-based models were used to search for the presence of different metabolic genes in both the recovered MAGs and unbinned metagenomes. Where appropriate, findings were further validated through metabolomic analyses, phylogenetic visualization, and analysis of gene context.
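The gene survey described above can be pictured as a presence/absence table of marker genes per MAG. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function name `presence_matrix`, the hit tuple layout, the E-value cutoff and the example bin names are all assumptions made for illustration, and the E-values are invented rather than taken from the study.

```python
from collections import defaultdict


def presence_matrix(hits, mags, genes, evalue_cutoff=1e-5):
    """Build a MAG-by-gene presence/absence table.

    hits: iterable of (mag_id, gene, evalue) tuples, e.g. parsed from
          an HMM or homology search report (hypothetical layout).
    mags: all MAG identifiers, so MAGs without any hit still get a row.
    genes: marker genes of interest.
    evalue_cutoff: assumed significance threshold, not from the paper.
    """
    found = defaultdict(set)
    for mag_id, gene, evalue in hits:
        if evalue <= evalue_cutoff:
            found[mag_id].add(gene)
    return {mag: {gene: gene in found[mag] for gene in genes} for mag in mags}


# Illustrative input only: bin names and E-values are placeholders.
example_hits = [
    ("bin_A", "assA", 1e-40),
    ("bin_A", "dsrA", 0.5),    # fails the cutoff, so counted as absent
    ("bin_B", "bcrA", 1e-12),
]
table = presence_matrix(example_hits, ["bin_A", "bin_B"], ["assA", "bcrA", "dsrA"])

# Number of MAGs carrying each gene, the kind of tally reported per marker.
counts = {g: sum(row[g] for row in table.values()) for g in ["assA", "bcrA", "dsrA"]}
```

Passing the full list of MAGs explicitly keeps bins without any significant hit in the table, which matters when reporting counts such as "n out of 82 MAGs".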

Capacity for detrital biomass and hydrocarbon degradation in sediment microbial communities
In deep-sea marine sediments organic carbon is supplied either as detrital matter from the overlying water column or as aliphatic and aromatic petroleum compounds that migrate upwards from underlying petroleum-bearing sediments 11. With respect to detrital matter, genes involved in carbon acquisition and breakdown were prevalent across both archaeal and bacterial MAGs. These include genes encoding intracellular and extracellular carbohydrate-active enzymes and peptidases, as well as relevant transporters and glycolysis enzymes (Figure 3 and Table S6). The importance of these carbon acquisition mechanisms is supported by the detection of corresponding intermediate metabolites, such as glucose and amino acids, in all three sediments (Table S1). The ability to break down fatty acids and other organic acids via the beta-oxidation pathway was identified in 13 MAGs, including members of Chloroflexi, Deltaproteobacteria, Aerophobetes and Lokiarchaeota (Figure 3 and Table S6). These results align with many other studies suggesting that the majority of seabed microorganisms are involved in recycling of residual organic matter, including complex carbohydrates, proteins and lipids 13, 31, 32.

Unlike in other studies, the presence of petroleum hydrocarbons is a defining feature of the sediments investigated here and thus a key goal of this study was to identify the potential for microbial degradation of hydrocarbons as a source of energy and carbon. To this end, we focused on functional marker genes encoding enzymes that catalyze the activation of mechanistically sophisticated C-H bonds, to initiate hydrocarbon biodegradation 33 [...] contaminated aquifers, whereas mechanism (4) has been studied extensively in marine sediments 23, 37.
Evidence for glycyl-radical enzymes that catalyze fumarate addition was found in 15 out of the 82 MAGs based on identifying genes encoding alkylsuccinate synthase (AssA) (Figures 3 and 4a). The assA sequences identified, while phylogenetically distant from canonical fumarate-adding enzymes and pyruvate formate lyases (Pfl), form a common clade with Pfl-like AssA from Archaeoglobus fulgidus VC-16 and Abyssivirga alkaniphila L81 (Figure 4a). Both of these organisms have been shown experimentally to be capable of anaerobic alkane degradation 38, 39. The putative assA genes identified here are present in all three samples regardless of hydrocarbon concentrations. They belong to MAGs affiliated with the bacterial phyla Aerophobetes, Aminicenantes and Chloroflexi as well as the archaeal phyla Bathyarchaeota, Lokiarchaeota and Thorarchaeota. The highest relative abundance of putative assA sequences was found at Site E29 as indicated by quality-filtered reads, which is consistent with this sediment containing the highest concentration of aliphatic compounds (Tables 1 and S7). Additional searching for other genes encoding fumarate-adding enzymes in the quality-filtered reads (e.g. bssA, nmsA, and canonical assA) did not return significant counts (Figure 4 and Table S7). Among the other three anaerobic hydrocarbon biodegradation mechanisms mentioned above, a MAG classified as Dehalococcoidia (Chloroflexi E29_bin2) contained genes encoding putative catalytic subunits of p-cymene dehydrogenase (Cmd) and alkane C2-methylene hydroxylase (Ahy) (Figures 3 and S2), known to support p-cymene and alkane utilization 37. Genes encoding enzymes catalyzing hydrocarbon carboxylation, reverse methanogenesis and aerobic hydrocarbon degradation (e.g. alkB, nahC and nahG) were not detected (Table S6).
The latter result is expected due to the low concentrations of oxygen in the top 20 cm of organic-rich seabed sediments 11.

Considering the degradation of aromatic hydrocarbons, genes responsible for reduction of benzoyl-CoA were detected in 12 MAGs (Figures 3 and 4b). Benzoyl-CoA is a universal biomarker for anaerobic degradation of monoaromatic compounds as it is a common intermediate in the biochemical pathways catalyzing this process 40. Benzoyl-CoA reduction to cyclohex-1,5-diene-1-carboxyl-CoA is performed by Class I ATP-dependent benzoyl-CoA reductase (BCR; BcrABCD) in facultative anaerobes (e.g. Thauera aromatica) or Class II ATP-independent reductase (Bam; BamBCDEFGHI) in strict anaerobes like sulfate reducers 41. The bcr genes detected are all Class I, and were found in bacterial MAGs (i.e., Dehalococcoidia, Anaerolineae, Deltaproteobacteria, Aminicenantes and TA06) and archaeal MAGs (i.e., Thermoplasmata and Bathyarchaeota) (Figures 3 and 4b). Genes for further transformation of dienoyl-CoA to 3-hydroxypimelyl-CoA were also identified (Figures 3 and 4b), i.e., those encoding 6-oxo-cyclohex-1-ene-carbonyl-CoA hydrolase (Oah), cyclohex-1,5-diencarbonyl-CoA hydratase (Dch) and 6-hydroxycyclohex-1-ene-1-carbonyl-CoA dehydrogenases (Had) 42. Together with the detection of 23-162 nM benzoate in these sediments (Table 1) [...]

Analysis of MAGs from these deep-sea hydrocarbon-associated sediments suggests that fermentation, rather than respiration, is the primary mode of organic carbon turnover in these environments. Most recovered MAGs with capacity for heterotrophic carbon degradation lacked respiratory primary dehydrogenases and terminal reductases, with exceptions being several Proteobacteria and one Chloroflexi (Table S6).
In contrast, 6 and 14 MAGs contained genes indicating the capability for fermentative production of ethanol and lactate, respectively, whereas some 69 MAGs contained genes for fermentative acetate production (Figure 3 and Table S6). These findings are consistent with other studies emphasizing the importance of fermentation, including acetate production, in deep-sea sediments 12, 43.

Acetate can also be produced by acetogenic CO₂ reduction through the Wood-Ljungdahl pathway using a range of substrates, including heterotrophic compounds 15. Partial or complete sets of genes for the Wood-Ljungdahl pathway were found in 50 MAGs (Figures 3 and S3 [...] unresolved 51 yet their corresponding genes co-occur with heterodisulfide reductases across multiple archaeal and bacterial MAGs (Figure 3). Various Group 1 [NiFe]-hydrogenases were also detected, which are known to support hydrogenotrophic respiration in conjunction with a wide range of terminal reductases. This is consistent with previous studies in the Gulf of Mexico that experimentally measured the potential for hydrogen oxidation catalyzed by hydrogenase enzymes 53.

Given the genomic evidence for hydrogen and acetate production in these sediments, we investigated whether any of the MAGs encoded terminal reductases to respire using these compounds as electron donors. In agreement with the high sulfate concentrations (Table 1), the key genes for dissimilatory sulfate reduction (dsrAB) were widespread across the metagenome reads, particularly at Site E29 (Table S7). These genes were recovered from MAGs affiliated with Deltaproteobacteria and Dehalococcoidia (Table S6).
We also identified 31 novel reductive dehalogenase (rdhA) genes across 22 MAGs, mainly from Aminicenantes and Bathyarchaeota (Figure 3 and Table S6), suggesting that organohalides, which can be produced through abiotic and biotic processes in marine ecosystems 54, may be electron acceptors in these deep-sea sediments. All MAGs corresponding to putative sulfate reducers and dehalorespirers encoded the capacity to completely oxidize acetate and other organic acids to CO₂ using either the reverse Wood-Ljungdahl pathway or the TCA cycle (Figure 3 and Table S6). Several of these MAGs also harbored the capacity for hydrogenotrophic dehalorespiration via Group 1a and 1b [NiFe]-hydrogenases (Figure 3). In addition to these dominant uptake pathways, one MAG belonging to the epsilonproteobacterial genus Sulfurovum (E29_bin29) included genes for the enzymes needed to oxidize H₂ (Group 1b [NiFe]-hydrogenase), elemental sulfur (SoxABXYZ), and sulfide (Sqr), using nitrate as an electron acceptor (NapAGH); this MAG also has a complete set of genes for autotrophic CO₂ fixation via the reductive TCA cycle (Figure 3 and Table S6). In contrast, the capacity for methanogenesis appears to be relatively low and none of the MAGs contained mcrA genes. The genes for methanogenesis were detected in quality-filtered unassembled reads in all three sediments and were mainly affiliated with acetoclastic methanogens at Site E29, and hydrogenotrophic methanogens at the other two sites (Figures 1d and S4).
Overall, the collectively weak mcrA signal in the metagenomes suggests that the high levels of biogenic methane detected by geochemical analysis (Table 1) [...] Instead, we provide theoretical evidence that hydrocarbon degradation is feasible in this environment by modelling whether these processes are thermodynamically favorable in the conditions typical of deep-sea sediments, namely high pressure and low temperature.

As concluded from the genome analysis and supported by metabolomics (Table 1), it is likely that most hydrocarbon oxidation occurs through fermentation rather than respiration. Taking hydrogen production and the Wood-Ljungdahl pathway into consideration (Figures 3 and 4), we compared the thermodynamic constraints on hydrocarbon biodegradation for two plausible scenarios: (1) fermentation with production of hydrogen and acetate, and (2) fermentation with production of acetate alone. Hexadecane and benzoate are used as representative aliphatic and aromatic compounds, respectively, based on the geochemistry results (e.g. C₂₊ alkane detection) and genomic analysis (e.g. bcr genes) 47, 55. The calculations show that, for fermentative co-generation of acetate and hydrogen to be energetically favorable (ΔG′ < 0 kJ mol⁻¹), acetate must be extremely low in a hexadecane degradation scenario (< 10⁻¹² mM acetate) but can be at moderate levels in a benzoate degradation scenario (< 3.8 mM acetate) (Figure 5). By contrast, for fermentation leading to production of only acetate, its concentration can be as high as 470 mM in a benzoate degradation scenario and as high as 300 mM in a hexadecane degradation scenario (Figure 5).
Fermentative degradation of hexadecane to hydrogen and acetate in the deep seabed could therefore be less favorable than acetate production alone via the Wood-Ljungdahl pathway. Thus, if microbial communities consume hexadecane or more complex hydrocarbons as carbon and energy sources, it is likely that they employ the Wood-Ljungdahl pathway to produce acetate. However, other reactions such as fermentation to H₂ still cannot be excluded, e.g., for less complex hydrocarbons such as benzoate and related compounds.
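The threshold reasoning above can be illustrated with the standard relation ΔG′ = ΔG°′ + RT ln Q, solved for the acetate level at which ΔG′ reaches zero. The sketch below is a simplified illustration, not the model used in this study. It assumes that activities can be approximated by concentrations and it omits the pressure correction that deep-sea conditions would call for. The temperature, the standard Gibbs energy and the hydrogen level are placeholders rather than values from the paper. The coefficients 8 and 17 follow one commonly written balanced equation for hexadecane fermentation (C16H34 + 16 H2O → 8 CH3COO⁻ + 8 H⁺ + 17 H2), which may differ from the reaction the authors modelled.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def ln_quotient(species):
    """Sum of nu * ln(activity) over (activity, nu) pairs.

    nu is positive for products and negative for reactants.
    """
    return sum(nu * math.log(activity) for activity, nu in species)


def delta_g(dg0, species, temp_k):
    """Gibbs energy change (kJ per mol reaction) at the given activities."""
    return dg0 + R * temp_k * ln_quotient(species)


def threshold_activity(dg0, nu_target, others, temp_k):
    """Activity of one product (coefficient nu_target > 0) where delta_g is 0.

    Below this activity the reaction is exergonic, above it endergonic.
    """
    ln_a = (-dg0 / (R * temp_k) - ln_quotient(others)) / nu_target
    return math.exp(ln_a)


# Placeholders for illustration only; none of these numbers come from the study.
TEMP_K = 277.15          # assumed cold seabed temperature (4 degrees C)
DG0_PLACEHOLDER = 500.0  # hypothetical standard Gibbs energy, kJ mol^-1
other_species = [(1e-6, 17)]  # hypothetical H2 activity with coefficient 17

acetate_limit = threshold_activity(DG0_PLACEHOLDER, 8, other_species, TEMP_K)
```

Because acetate enters the reaction quotient raised to its stoichiometric coefficient, small changes in the hydrogen level or in ΔG°′ shift the acetate threshold by orders of magnitude, which is the qualitative behaviour behind the contrast between the two scenarios.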

Discussion
In this study, metagenomics revealed that most of the Bacteria and Archaea in the deep-sea sediment microbial communities sampled belong to candidate phyla that lack cultured representatives and sequenced genomes (Figures 1 and 2). As a consequence, it is challenging to link phylogenetic patterns with the microbial functional traits underpinning the biogeochemistry of deep seabed habitats. Here, we were able to address this by combining de novo assembly and binning of metagenomic data with geochemical and metabolomic analyses, and complementing our observations with thermodynamic modeling. Pathway reconstruction from 82 MAGs recovered from the three deep-sea near-surface sediments revealed that many community members were capable of anaerobic hydrocarbon degradation as well as acquiring and hydrolyzing residual organic matter (Figure 3), whether supplied as detritus from the overlying water column or as autochthonously produced necromass (Figure 6). Heterotrophic fermenters and acetogens were in considerably higher relative abundance than heterotrophic respirers, despite the abundance of sulfate in the sediments (Table 1) [...] Figures 3-5). Whereas capacity for detrital organic matter degradation is a common feature in the genomes retrieved in this study, and from many other environments 26, anaerobic hydrocarbon degradation is a more exclusive feature that was [...]

SD-Bact-0341-bS17/SD-Bact-0785-aA21 and SD-Arch-0519-aS15/SD-Arch-0911-aA20, respectively 61 as described previously 20 on a ~15 Gb 600-cycle (2 × 300 bp) sequencing run (for results see Figure S1).

Metagenomic assembly and binning
Raw reads were quality-controlled by (1) clipping off primers and adapters and (2) filtering out artifacts and low-quality reads as described previously 62. Filtered reads were assembled using metaSPAdes version 3.11.0 63 and short contigs (<500 bp) were removed. Sequence

Phylogenetic analyses
For taxonomic classification of each MAG, two methods were used to produce genome trees that were then used to validate each other. In the first method the tree was constructed using concatenated proteins of up to 16 syntenic ribosomal protein genes following procedures reported elsewhere 74; the second tree was constructed using concatenated amino acid sequences of up to 43 conserved single-copy genes following procedures described previously 75. Both trees were calculated using FastTree version 2.1.9 (-lg -gamma) 76 and resulting phylogenies were congruent. Reference genomes for relatives were accessed from NCBI GenBank, including genomes selected from several recent studies representing the majority of candidate bacterial and archaeal phylogenetic groups 4, 65, 77-80. The tree in Figure 2 was inferred based on concatenation of 43 conserved single-copy genes (Database S1). Specifically, it was built using RAxML