Introduction

Convergent margins are regions where two or more tectonic plates collide, and the denser “subducting slab” is pushed beneath the less dense overriding plate. As the slab descends, it devolatilizes under higher temperatures and pressures, allowing dissolved inorganic carbon (DIC) and redox-active volatiles to cycle between the upper crust and Earth’s mantle [1] (Fig. 1A). These volatiles are brought to the surface by hydrothermal fluids migrating through cracks and fissures in the upper mantle and crust, possibly stimulating microbial activity [2]. In addition to providing volatiles, convergent margins offer a large potential space for a subsurface habitat since the 122 °C sterilizing isotherm (Fig. 1A) can be many kilometers deep [3, 4].

Fig. 1: General convergent margin geological structure and sample sites spanning the Costa Rican convergent margin.

Geological provinces (A) and sample sites (B) are colored by province: outer forearc (blue), forearc (orange), and arc (red). (A) The green arrow indicates the location of the trench, black arrows show the direction of plate movement, and blue arrows show fluid release and degassing during subduction. (B) The green dotted line indicates the trench, and gray dotted lines mark the depth of the 122 °C isotherm in 5 km increments, based on data for this region of Costa Rica from Harris and Wang 2010. Volcano locations are shown with volcano icons. The dotted white line demarcates the plates emerging from the East Pacific Rise (EPR) and the Cocos–Nazca Spreading Center (CNS).

At the Costa Rican convergent margin, the Cocos Plate subducts beneath the Caribbean Plate, releasing fluids that carry volatiles and other redox-active species such as sulfur, iron, hydrogen, and DIC [5, 6]. These fluids fuel chemolithoautotrophy-based microbial ecosystems in the subsurface before they are expelled at the surface at hydrothermal seeps and springs [7]. These chemolithoautotrophic ecosystems generate substantial amounts of organic matter, with up to 24 mM dissolved organic carbon (DOC), and may sequester 1.4 × 10⁹ to 1.4 × 10¹⁰ mol C/year [7]. This is similar to the amount of carbon estimated to be sequestered by calcite deposition in the overriding plate [8]. However, carbon sequestration by microbial autotrophy is coupled to other elemental cycles differently than calcite precipitation is, because chemolithoautotrophs derive the energy for carbon fixation by catalyzing a wide variety of exergonic redox reactions [9].
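For scale, the molar carbon flux above can be expressed as a mass of carbon. The sketch below is only a unit conversion, assuming the standard atomic weight of carbon (12.011 g/mol) and metric tonnes; it does not add a new estimate.

```python
# Convert a carbon flux from mol C/year to tonnes C/year.
# Assumes the standard atomic weight of carbon and metric tonnes (1 t = 1e6 g).
CARBON_MOLAR_MASS_G_PER_MOL = 12.011


def mol_c_to_tonnes(mol_c_per_year: float) -> float:
    """Return the equivalent carbon mass flux in tonnes C per year."""
    grams = mol_c_per_year * CARBON_MOLAR_MASS_G_PER_MOL
    return grams / 1e6


low, high = mol_c_to_tonnes(1.4e9), mol_c_to_tonnes(1.4e10)
print(f"{low:.3g} to {high:.3g} t C/year")
```

This corresponds to roughly 1.7 × 10⁴ to 1.7 × 10⁵ t C/year (about 0.017 to 0.17 Mt C/year).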

Using an approach that combines co-located metagenome sequences and geochemical data, we investigated whether chemolithoautotrophic organisms, their encoded carbon fixation pathways (CFPs), and/or the types of redox reactions they potentially use to fuel carbon fixation vary across the major geological provinces of the Costa Rican convergent margin (outer forearc, forearc, and volcanic arc). The outer forearc is the province closest to the trench, where the downgoing oceanic plate subducts beneath the overriding plate. Here, the subducting slab is relatively shallow, and seawater, pore fluids, and serpentinization products are released from subducted material [10,11,12]. In the forearc, farther inland from the trench, deeply derived volatiles that flux to the surface in natural seep fluids have likely undergone some hydrothermal alteration and mixing with mantle-derived fluids. The arc consists of volcanic edifices farther inland that run parallel to the trench [5], typically located where the slab is approximately 90–110 km deep. Here, slab fluids and melts interact with the mantle wedge to generate magmas [13, 14].

Subsurface microorganisms in other convergent margins have been shown to perform carbon fixation as well as hydrogen, sulfur, and nitrogen cycling, but this has only been studied within single provinces of a convergent margin such as the backarc [15,16,17] or outer forearc [18]. Landscape-scale (defined here as tens to hundreds of kilometers) distributions of chemolithoautotrophic metabolisms have been observed in intraplate natural springs in Yellowstone National Park, USA [19,20,21], and Tengchong, Yunnan, China [22], but not across a subducting margin. Here, we expand our understanding of the geobiological underpinnings of carbon release and sequestration across a convergent margin.

We report 404 metagenome-assembled genomes (MAGs) from 14 natural, deeply sourced springs (27 samples total) distributed across a ~200 km transect of the Costa Rican convergent margin spanning the outer forearc, forearc, and arc provinces (Fig. 1B and Supplementary Fig. S1). These springs are considered deeply sourced because their high helium-3/helium-4 ratios indicate substantial mantle contributions to all samples [8]. A previous study of the unassembled metagenomes from these springs indicated the presence of genes for carbon fixation via the reductive acetyl-CoA pathway (Wood-Ljungdahl; WL), the reductive tricarboxylic acid cycle (rTCA), and the reductive pentose phosphate cycle (Calvin–Benson–Bassham cycle; CBB) [7]. However, unassembled metagenomic reads cannot directly link CFPs, or redox-active metabolic pathways, to the taxa that encode them. Here, we show that subsurface- and surface-associated MAGs are distinguished by their distributions and metabolic pathways. Genes encoding WL and rTCA are abundant across the margin but are found in MAGs with different taxonomic affiliations. In addition, these CFPs are potentially powered by different redox metabolisms in different geological provinces. Together, these results suggest a strong geological control over the distribution of metabolic strategies across the convergent margin landscape.

Methods

Sample collection, DNA extraction, and sequencing

Collection, DNA extraction, and sequencing of samples from actively venting fluids of 14 natural springs (Fig. 1B) have been described in detail previously [7, 8].

Metagenomic assembly, binning, annotation, and metabolic predictions

Raw reads were trimmed and mate-paired with Trimmomatic (version 0.38 [23]) using default settings. Reads were then assembled de novo with metaSPAdes [24] to create site-specific assemblies with a minimum contig length of 1.5 kb. These assemblies were used to construct 404 metagenome-assembled genomes (MAGs/bins) with ≥70% completeness and <5% contamination (Supplementary Table S1) [25,26,27,28,29,30] through the MetaWRAP pipeline [31]. Prokka [32], FeGenie [33], GhostKOALA [34], and KEGGdecoder [35] were used for open reading frame prediction, gene annotation, KEGG orthology identifier linkage, and metabolic predictions. The genetic potential to use hydrogen, sulfur, nitrogen, and/or iron species as electron donors or acceptors was assessed based on the presence of the enzymatic components detailed in the Supplementary Methods (Supplementary Table S2). Elemental sulfur oxidation and sulfate reduction were distinguished by phylogenetic analysis of the dissimilatory (bi)sulfite reductase alpha subunit (dsrA; Supplementary Fig. S2). A MAG was considered capable of autotrophy if at least 60% of the genes and all key enzymes for a CFP were present. As with dsrA, phylogenetic analysis was conducted on the ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) large subunit to delineate the different RuBisCO forms of CBB (Supplementary Fig. S3). Full details of metagenomic assembly, binning, annotation, and metabolic predictions are given in the Supplementary Methods.
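The autotrophy criterion above combines a gene-completeness threshold with a key-enzyme requirement. The sketch below illustrates that rule only, assuming the 60% threshold is inclusive and using a hypothetical, simplified CBB gene list; the actual gene sets are defined in the Supplementary Methods.

```python
def encodes_cfp(present_genes, pathway_genes, key_enzymes, min_fraction=0.6):
    """Return True if a MAG passes a completeness-plus-key-enzyme rule for one CFP.

    present_genes: gene labels annotated in the MAG
    pathway_genes: all genes counted for the pathway
    key_enzymes:   genes that must all be present (e.g., RuBisCO for CBB)
    """
    present = set(present_genes)
    pathway = set(pathway_genes)
    fraction = len(pathway & present) / len(pathway)
    return fraction >= min_fraction and set(key_enzymes) <= present


# Hypothetical, simplified CBB gene list for illustration only.
CBB_GENES = ["rbcL", "rbcS", "prkB", "pgk", "gapA", "tpiA", "fbp", "tktA", "rpe", "rpiA"]
CBB_KEY = ["rbcL", "prkB"]  # RuBisCO large subunit and phosphoribulokinase

mag_a = ["rbcL", "prkB", "pgk", "gapA", "tpiA", "tktA", "rpe"]  # 7/10 genes, keys present
mag_b = ["rbcL", "rbcS", "pgk", "gapA", "tpiA", "fbp", "tktA", "rpe", "rpiA"]  # 9/10, no prkB
print(encodes_cfp(mag_a, CBB_GENES, CBB_KEY))  # True
print(encodes_cfp(mag_b, CBB_GENES, CBB_KEY))  # False
```

The second example shows why the key-enzyme requirement matters: a MAG can carry most pathway genes (many of which also function in central metabolism) without encoding the enzymes that commit it to carbon fixation.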

MAG phylogenetic and statistical analysis

The Genome Taxonomy Database Toolkit (GTDB-Tk [36]; version 1.4.0) was used for taxonomic identification, and GTDB nomenclature is used throughout this manuscript, although we refer to other commonly used names where this improves clarity. MAG abundance was quantified in units of GCPM (genome copies per million reads); the calculation is described in detail in the Supplementary Methods. Abundance correlations across and within samples were determined using hierarchical clustering of Spearman rank correlation distance matrices (Fig. 2 and Supplementary Fig. S1). MAG abundance distributions across the convergent margin and their correlations with relevant environmental variables [7] were then assessed via transformation-based canonical correspondence analysis (tb-CCA). See the Supplementary Methods for details of the statistical analysis.
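The exact GCPM formula is given in the Supplementary Methods and is not reproduced here. As an illustration only, one common way to express genome copies per million reads is to divide a MAG's read coverage (mapped bases over genome length) by the library size in millions of reads; the function below assumes that definition and a fixed read length, and its inputs are hypothetical.

```python
def gcpm(mapped_reads: int, read_length: int, genome_length: int, total_reads: int) -> float:
    """Genome copies per million reads under an assumed coverage-based definition.

    Coverage of the MAG (mapped bases / genome length) is divided by the
    sample's sequencing depth in millions of reads.
    """
    genome_copies = mapped_reads * read_length / genome_length
    return genome_copies / (total_reads / 1e6)


# Hypothetical sample: 10,000 reads of 150 bp map to a 3 Mb MAG in a 20 M-read library.
print(gcpm(10_000, 150, 3_000_000, 20_000_000))  # 0.025
```

Normalizing by genome length lets large and small genomes be compared, and normalizing by library size lets samples sequenced to different depths be compared.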

Fig. 2: Heatmap of metagenomic read recruitment to each MAG and hierarchical clustering based on Spearman rank correlations between MAGs (vertical clusters) and sample sites (horizontal clusters) shows the separation of MAGs into those found mostly in the forearc/arc, the outer forearc, or delocalized across geological provinces.

The entire heatmap is shown on the left, with numbers corresponding to clusters, and representative enlarged sections are shown on the right, listing metabolic pathways for redox couples and carbon fixation. Subsections were chosen to include the key metabolisms and taxa that characterize the province-specific clusters (clusters 1 and 3) and the delocalized clusters (clusters 2, 4, and 5). Sulfur* and nitrogen* refer to the capability to metabolize multiple oxidation states. The full high-resolution heatmap is shown in Supplementary Fig. S1.

Results

General description of the MAGs

In total, 404 medium- to high-quality MAGs spanning 5 archaeal and 43 bacterial phyla were recovered from the 27 samples of hydrothermal fluids and sediments. All 372 bacterial and 32 archaeal MAGs are ≥70% complete with <5% multi-genomic contamination. Of these, 221 MAGs are >90% complete with <5% contamination, and 43 of these are high-quality according to the MIMAG criteria [25] (Supplementary Table S1). Most MAGs belong to lineages with no cultured relatives at the family level or higher (69% of archaea and 64% of bacteria). One archaeal MAG could be classified only at the domain level.
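Under the MIMAG standard, a high-quality draft must also encode 23S, 16S, and 5S rRNA genes and at least 18 tRNAs, which would explain why only a subset of the >90%-complete MAGs qualify as high quality. A minimal sketch of the tiering logic next to the study's own inclusion filter; the function names are hypothetical.

```python
def mimag_tier(completeness, contamination, has_5s_16s_23s=False, n_trnas=0):
    """Assign a MIMAG draft-quality tier from completeness/contamination percentages.

    High quality additionally requires 23S, 16S, and 5S rRNA genes and >=18 tRNAs.
    Returns None when contamination is >=10%, which MIMAG does not tier.
    """
    if contamination >= 10:
        return None
    if completeness > 90 and contamination < 5 and has_5s_16s_23s and n_trnas >= 18:
        return "high"
    if completeness >= 50:
        return "medium"
    return "low"


def passes_study_filter(completeness, contamination):
    """Inclusion thresholds stated for this MAG set: >=70% complete, <5% contaminated."""
    return completeness >= 70 and contamination < 5


print(mimag_tier(95, 2, True, 20))  # high
print(mimag_tier(95, 2))            # medium: no rRNA/tRNA evidence supplied
print(passes_study_filter(72, 3))   # True
```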

Most MAGs (92%) have predictions for at least one metabolic pathway (Fig. 2 and Supplementary Fig. S1). Of the MAGs lacking all genes of interest for electron donors or acceptors, 13 are archaea (Crenarchaeota = 4, Micrarchaeota = 4, Nanoarchaeota = 4, and Thermoplasmatota = 1) and 19 are bacteria (Actinobacteriota = 1, Chloroflexota = 2, Desulfobacterota = 1, Nitrospirota = 1, Omnitrophota = 1, Patescibacteria = 8, and WOR-3 = 4). None of the MAGs contain a complete methanogenesis pathway or the methyl-coenzyme M reductase gene. Photosystems are present only in the four Cyanobacteria MAGs, which account for 0.2% of total GCPM.

Genes for the CBB, rTCA, and WL pathways are present in more than a quarter of the MAGs (n = 117), which account for ~35% of total GCPM. The most abundant is WL (n = 67, 19% of total GCPM), followed by CBB (n = 42, 8% of total GCPM) and rTCA (n = 12, 7% of total GCPM) (Supplementary Fig. S4). Although three MAGs (bins 112, 253, and 91) contain ≥60% of the genes of the 3-hydroxypropionate/4-hydroxybutyrate cycle, none contain the key enzymes for this pathway [37, 38]. Three Anaerolineae MAGs (bins 55, 251, and 386) contain both WL and CBB, encoding RuBisCO (for CBB) and carbon monoxide dehydrogenase/acetyl-CoA synthase (CODH/ACS, for WL). These genes are closely related to others from Chloroflexi, suggesting that the co-occurrence of these pathways is not due to genomic contamination. Another MAG, in the class Syntrophia (bin 351), contains genes for both WL and rTCA. Genes for the WL pathway are found in MAGs from 22 classes across 12 phyla (Actinobacteriota, Bipolaricaulota, Chloroflexota, Deferrisomatota, Desulfobacterota, Firmicutes, MBNT15, Acidobacteriota, Coprothermobacterota, Nitrospirota, Omnitrophota, and Planctomycetota). Genes for the rTCA cycle are found in MAGs from 4 classes across 4 phyla (Aquificota, Chloroflexota, Desulfobacterota, and Nitrospirota). Genes for the CBB cycle are found in MAGs from 8 classes across 6 phyla (Chloroflexota, Proteobacteria, Actinobacteriota, Cyanobacteria, Deinococcota, and Firmicutes), with RuBisCO Form I (25 MAGs), Form II (9 MAGs), both Form I and II (3 MAGs), or an unknown form (4 MAGs) (Supplementary Fig. S3).
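The per-pathway counts overlap because four MAGs encode two CFPs each. Assuming those four (three WL + CBB and one WL + rTCA) are the only overlaps, as the text implies, inclusion-exclusion reconciles the per-pathway counts with the 117 CFP-containing MAGs:

```latex
\begin{aligned}
|\mathrm{WL} \cup \mathrm{CBB} \cup \mathrm{rTCA}|
  &= |\mathrm{WL}| + |\mathrm{CBB}| + |\mathrm{rTCA}|
     - |\mathrm{WL} \cap \mathrm{CBB}| - |\mathrm{WL} \cap \mathrm{rTCA}|
     - |\mathrm{CBB} \cap \mathrm{rTCA}|
     + |\mathrm{WL} \cap \mathrm{CBB} \cap \mathrm{rTCA}| \\
  &= 67 + 42 + 12 - 3 - 1 - 0 + 0 = 117
\end{aligned}
```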

Collectively, WL-containing MAGs have the most diverse metabolic predictions, encompassing, in some combination, all the metabolic pathways identified in this study. Among the WL-containing MAGs, the most abundant have the genetic potential to oxidize hydrogen and sulfide, and some encode cytochrome bd and aa3-type oxidases. rTCA-containing MAGs have the genetic potential for every redox pathway except nitrogen cycling and iron(III) reduction; the most abundant of these have the genetic potential for sulfide and thiosulfate oxidation and oxygen reduction. CBB-containing MAGs have the genetic potential for every redox pathway except sulfate reduction, including hydrogen, sulfide, and thiosulfate oxidation and the reduction of oxygen (microaerophilic and/or aerobic) and nitrate.

Cluster and multivariate analysis

Hierarchical clustering of sites based on Spearman correlations of MAG abundance grouped all the outer forearc sites together and most of the forearc and arc sites together (Fig. 2 and Supplementary Fig. S1, horizontal sorting). The only exception is the sediment sample from Borinquen, as discussed below. Of the 10 sites where metagenomes were sequenced from both fluids and sediments, four have their fluid and sediment samples clustering together (El Tucano, Quebrada Naranja, Ranchero el Salitral, and Quepos). At four other forearc/arc sites (Blue River 1 and 2, Santa Lucia, and Finca Asociación Nacional de Educadores y Educadoras (ANDE)), the fluid samples cluster together and the sediment samples cluster adjacent to them (Fig. 2 and Supplementary Fig. S1). This suggests that fluids and sediments from the same site are generally more similar to each other than to fluids and sediments from other pools, in agreement with 16S rRNA gene results [7]. At a linkage height of 10, MAGs separate into five clusters based on their distribution across the sites (Fig. 2 and Supplementary Fig. S1, vertical sorting). Cluster 1 MAGs occur mainly in the forearc/arc; cluster 2 MAGs are abundant in the forearc/arc but are also common elsewhere; cluster 3 MAGs occur mainly in the outer forearc; and cluster 4 and 5 MAGs are either not localized to any geological province or are present at only a few sites with no clear distribution pattern across the convergent margin. Based on these patterns, MAGs fall into three groups: two province-specific groups (cluster 1 from the forearc/arc and cluster 3 from the outer forearc) and a delocalized group (clusters 2, 4, and 5).
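A minimal, dependency-free sketch of this kind of clustering is shown below. It assumes 1 − Spearman ρ as the distance between MAG abundance profiles and average linkage (UPGMA); the distance transform and linkage method actually used are described in the Supplementary Methods, so the threshold here is not on the same scale as the linkage height of 10. Site values and bin names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho as the Pearson correlation of ranks (0.0 if a profile is constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def cluster_mags(profiles, threshold):
    """Average-linkage clustering on 1 - rho; stop once the next merge exceeds threshold."""
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        dist[a, b] = dist[b, a] = 1 - spearman(profiles[a], profiles[b])
    clusters = [[n] for n in names]
    while len(clusters) > 1:
        d, i, j = min(
            (mean(dist[a, b] for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Hypothetical GCPM values at six sites (three outer forearc, then three forearc/arc).
profiles = {
    "bin_A": [9, 8, 7, 1, 0.5, 0.2],
    "bin_B": [7, 9, 8, 0.3, 1, 0.1],
    "bin_C": [0.1, 0.4, 0.2, 6, 9, 8],
    "bin_D": [0.2, 0.1, 0.3, 8, 7, 9],
}
print(cluster_mags(profiles, threshold=0.5))  # [['bin_A', 'bin_B'], ['bin_C', 'bin_D']]
```

Because Spearman correlation uses only ranks, two MAGs that rise and fall together across sites cluster together even if their absolute abundances differ by orders of magnitude.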

Metabolic and taxonomic descriptions of the clusters

Outer forearc (cluster 3)

Cluster 3 comprises 94 MAGs from 19 phyla, accounts for 25% of total GCPM, and occurs almost exclusively at outer forearc sites. Bacteroidota, Nitrospirota, and Chloroflexota contribute the most MAGs (n = 21, 16, and 13, respectively) and account for 19%, 21%, and 9% of cluster 3 GCPM, respectively. Most cluster 3 MAGs (73%) belong to uncultured families or higher taxonomic levels (Supplementary Fig. S5). Metabolic pathway predictions for cluster 3 MAGs are dominated by hydrogen oxidation (79% of cluster 3 GCPM) and sulfur oxidation (26% of cluster 3 GCPM), as well as sulfate reduction (29% of cluster 3 GCPM), microaerophilic respiration (22% of cluster 3 GCPM), and nitrogen reduction (13% of cluster 3 GCPM) (Fig. 3). Previously sequenced genomic material from the top 10 classes is predominantly found in geothermal areas, aquifers, and other deep subsurface systems (Supplementary Table S3).
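Because a single MAG can encode several redox pathways, per-pathway shares of a cluster's GCPM are not mutually exclusive and can sum to more than 100%, as the cluster 3 values above do. A minimal sketch of that bookkeeping, with hypothetical MAGs and values:

```python
def gcpm_share(mags, feature):
    """Percent of a cluster's total GCPM held by MAGs predicted to carry `feature`."""
    total = sum(m["gcpm"] for m in mags)
    with_feature = sum(m["gcpm"] for m in mags if feature in m["features"])
    return 100 * with_feature / total


# Hypothetical three-MAG cluster; the first MAG carries two pathways.
cluster = [
    {"gcpm": 50.0, "features": {"H2 oxidation", "sulfate reduction"}},
    {"gcpm": 30.0, "features": {"H2 oxidation"}},
    {"gcpm": 20.0, "features": {"sulfur oxidation"}},
]
shares = {f: gcpm_share(cluster, f) for f in ("H2 oxidation", "sulfate reduction", "sulfur oxidation")}
print(shares)  # {'H2 oxidation': 80.0, 'sulfate reduction': 50.0, 'sulfur oxidation': 20.0}
print(sum(shares.values()))  # 150.0
```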

Fig. 3: Average read abundance (GCPM) per site of MAGs with metabolic pathways for redox reactions and carbon fixation pathways.

(A–C) Province-specific average read recruitment of MAGs with these metabolic pathways at all (A) outer forearc sites, (B) forearc sites, and (C) arc sites, after removing MAGs in the delocalized cluster, which are shown in (D). Colors correspond to the 10 most abundant classes for each province and for the delocalized community. Panels B and C show only the 9 and 8 most abundant classes, respectively, because MAGs lacking redox genes and carbon fixation pathways were among the 10 most abundant classes. Bars represent the mean GCPM of MAGs at each province-specific site.

In cluster 3, WL is the most abundant and diverse CFP, found in 23% of MAGs and accounting for 29% of cluster 3 GCPM. Thermodesulfovibrionia MAGs form the dominant WL-containing group and have the genetic potential for hydrogenotrophic sulfate reduction. A single MAG classified as Ca. Bipolaricaulia (bin 327) is the second most abundant WL-containing member and also has the genetic potential for hydrogenotrophic growth [39]. Four MAGs classified as CSP1-3 (in phylum Armatimonadetes), the third most abundant WL-containing group, have the genetic potential for sulfur oxidation (Fig. 4). Together, these cluster 3 WL-containing members comprise 11 MAGs and 19% of cluster 3 GCPM. Genes for rTCA and CBB are less abundant in this cluster (5% of cluster 3 GCPM each). Syntrophia MAGs dominate the rTCA-containing MAGs and have the genetic potential for sulfate reduction (Fig. 4). Most CBB-containing MAGs in cluster 3 are classified as Gammaproteobacteria and encode Form IA RuBisCO, the primary form in chemolithoautotrophs from a diversity of subsurface ecosystems [40]. These MAGs have the genetic potential for sulfide oxidation and for the reduction of nitrate/nitrite [41] and/or oxygen (microaerophilic) (Fig. 4).

Fig. 4: Nitrogen, sulfur, and hydrogen cycling pathways in MAGs with carbon fixation pathways from the outer forearc and the forearc/arc.

Arrow width corresponds to the abundance of MAGs with that gene, and dots show which CFPs were present in the MAGs that had that gene (blue, WL; yellow, rTCA; pink, CBB). Class names are listed from most to least abundant (top to bottom), and MAGs in the top 3 CFP-containing classes are in bold. The pink dashed box highlights sulfur-associated chemoautotrophic nitrogen reduction. The blue dashed box highlights hydrogenotrophic sulfur reduction. The gray dotted line with a question mark represents missing aprBA predictions for all sulfide-oxidizing MAGs.

Forearc/arc (cluster 1)

Cluster 1 comprises 102 MAGs from 34 phyla, accounts for 38% of total GCPM, and occurs mainly at forearc/arc sites. Chloroflexota, Acidobacteriota, and Aquificota dominate this cluster (~15%, 14%, and 12% of cluster 1 GCPM, respectively). Chloroflexota are represented by the most MAGs (n = 15), followed by Acidobacteriota (n = 8), whereas only one MAG is classified as Aquificota (bin 303, Sulfurihydrogenibium sp.). This cluster contains most of the archaeal MAGs (n = 19), and 88% of its MAGs belong to uncultured families or higher taxonomic levels (Supplementary Fig. S5). The cluster contains MAGs with many redox metabolisms, predominantly nitrogen reduction (60% of cluster 1 GCPM) and oxygen reduction (microaerophilic = 32% and aerobic = 26% of cluster 1 GCPM), as well as sulfur oxidation (39% of cluster 1 GCPM) and hydrogen oxidation (16% of cluster 1 GCPM). The only MAGs capable of ammonia oxidation (n = 4; 5% of cluster 1 GCPM), all within the archaeal class Nitrososphaeria, occur at forearc sites (Figs. 2 and 3). DNA sequences from previously sequenced representatives of the top 10 classes are predominantly found in geothermal areas, aquifers, and other deep subsurface systems, as well as soils (Supplementary Table S3).

As in cluster 3, WL is the most abundant CFP in cluster 1 (n = 24, 25% of cluster 1 GCPM) and is found mostly in Anaerolineae MAGs with the genetic potential for sulfur-dependent reduction of nitrate and/or nitrite. Ca. Bipolaricaulia MAGs, the second most abundant group of WL-containing MAGs, have the genetic potential for hydrogen oxidation and the reduction of nitrogen species. All Thermodesulfovibrionia MAGs, the next most abundant group of WL-containing MAGs, contain all genes for sulfate reduction (dsrAB, aprAB, and sat). Together, the WL-containing members of Anaerolineae, Ca. Bipolaricaulia, and Thermodesulfovibrionia account for 16 MAGs and 20% of cluster 1 GCPM. A single Sulfurihydrogenibium MAG makes rTCA the second most abundant CFP (12% of cluster 1 GCPM). This MAG has the genetic potential for sulfur oxidation and microaerophilic oxygen reduction (Fig. 4), similar to its cultured relatives [42, 43]. It recruited no reads from the outer forearc and accounts for <0.05% of total GCPM at the arc sites, suggesting that it is specialized to the forearc (Supplementary Fig. S1). CBB is present in 5 MAGs from the classes Deinococci, Anaerolineae, UBA4738 (in phylum Actinobacteriota), and Cyanobacteria, which together account for only 8% of cluster 1 GCPM. The Deinococci and Anaerolineae MAGs have the genetic potential to oxidize sulfur and reduce nitrogen species (Fig. 4).

Delocalized (clusters 2, 4, and 5)

MAGs in clusters 2 (78 MAGs, 24% of total GCPM), 4 (18 MAGs, 1% of total GCPM), and 5 (112 MAGs, 12% of total GCPM) are common across multiple provinces. Cluster 2 MAGs are more concentrated in the forearc and arc, but they are much more common in other provinces than cluster 1 MAGs and share the characteristics discussed below with clusters 4 and 5, so we include them in the delocalized group. The top 10 classes in the delocalized clusters are commonly found in humans, animals, and/or plants [44]; human-influenced environments [45]; soil [46,47,48,49,50,51]; marine/freshwater environments [39, 47, 48, 50, 52,53,54,55,56]; and hot spring microbial mats [57]. These clusters contain three of the four Cyanobacteria MAGs in the dataset, all of the common laboratory contaminant genera in the dataset (Sphingomonas, Pelomonas, and Aquabacterium) [58], and high abundances of human commensals (Chromobacterium haemolyticum, Acinetobacter junii, and Acinetobacter indicus [59, 60]) in hot spring spas at resort sites (Blue River and Recreo Verde). Borinquen is the only site dominated by MAGs from clusters 4 and 5. This could be because the low biomass at Borinquen allowed laboratory and human contaminants to make up a larger proportion of the extracted and amplified DNA, causing it to cluster away from other forearc/arc sites. Most delocalized MAGs have the genetic potential for sulfur and/or hydrogen oxidation as well as oxygen and/or nitrate reduction. CBB dominates the CFPs in these clusters, mostly with Form I RuBisCO, which is typically found in organisms that respire atmospheric levels of oxygen (Supplementary Fig. S1).

Transformation-based canonical correspondence analysis

The tb-CCA of MAGs from the province-specific clusters (clusters 1 and 3) explains 36.3% of the variation in the samples and recovers the same province-specific clusters identified by Spearman-based hierarchical clustering. In decreasing order of marginal-effect significance, the important variables are the concentrations of sediment-associated aluminum, aqueous iron, DIC, temperature, and the concentrations of aqueous phosphate and aqueous nickel (Fig. 5 and Supplementary Table S4). Because aqueous iron and zinc concentrations are strongly correlated, aqueous iron also serves as a proxy for aqueous zinc. MAGs at outer forearc sites correlate with sediment-associated aluminum; MAGs at forearc sites correlate with DIC and phosphate concentrations and, to a lesser extent, with aqueous iron and nickel concentrations; and arc sites are tightly associated with iron and nickel concentrations. The distribution of WL-containing MAGs is well explained by aluminum, nickel, and iron concentrations, whereas rTCA- and CBB-containing MAGs are more influenced by DIC and phosphate concentrations. Heterotrophs are distributed alongside autotrophs, with the most abundant heterotrophs co-occurring with rTCA- and CBB-containing MAGs. For MAGs in the delocalized clusters, a tb-CCA explains only 14.2% of the between-site variation and shows no discernible clustering (Supplementary Fig. S6), suggesting that local geochemical variables are not major determinants of the distribution of the delocalized MAGs.

Fig. 5: Transformation-based canonical correspondence analysis showing variables most closely correlated with MAG abundance distribution at each site and site distribution of each MAG (scaling = 2).

Distances between MAGs are chi-square distances. A perpendicular (90°) projection of a MAG marker onto an environmental vector approximates the position of that MAG's maximum abundance along that vector. Site markers are distributed around the weighted centroid of the MAGs. The model explains 36% of the total variation in community structure, and the displayed axes account for 62% of the explained variation. The four MAGs with two carbon fixation pathways are shown with both pathway colors overlaid.
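The chi-square distance between two MAGs compares their relative abundance profiles across sites rather than their absolute abundances, weighting each site by the inverse of its share of the total. The sketch below computes the unconstrained chi-square distance of correspondence analysis on a hypothetical MAG × site table; in a CCA triplot with scaling 2, these distances are approximated within the space constrained by the environmental variables, so plotted distances will not match it exactly.

```python
def chi_square_distance(table, a, b):
    """Chi-square distance between row profiles a and b of a MAG x site table.

    Each squared difference in row proportions is weighted by the inverse of
    that site's share of the grand total, as in correspondence analysis.
    """
    grand = sum(sum(row) for row in table)
    col_totals = [sum(row[j] for row in table) for j in range(len(table[0]))]
    ra, rb = sum(table[a]), sum(table[b])
    return sum(
        (grand / cj) * (table[a][j] / ra - table[b][j] / rb) ** 2
        for j, cj in enumerate(col_totals)
    ) ** 0.5


# Hypothetical abundances (rows = MAGs, columns = sites).
table = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],  # same relative profile as row 0, twice the abundance
    [3.0, 0.0, 1.0],
]
print(chi_square_distance(table, 0, 1))      # 0.0: identical relative profiles
print(chi_square_distance(table, 0, 2) > 0)  # True
```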

Discussion

Each site contains MAGs with multiple carbon fixation and metabolic pathway predictions, suggesting taxonomically diverse populations whose carbon fixation and metabolic pathways differ systematically across the three geological provinces (outer forearc, forearc, and arc) (Fig. 6). Approximately one-third of the metagenomic reads map to MAGs containing CFPs, consistent with the proportion of chemolithoautotrophs found in other subsurface lithoautotrophic ecosystems [27, 61, 62]. Even though temperatures are low enough for photosynthesis (<70 °C at >90% of sites), only four MAGs contain photosystems, and all are Cyanobacteria (<1% of total GCPM). Only one was found in cluster 1 (bin 379; 0.04% of total GCPM), while the other three were found in the delocalized group. These findings support the conclusions of Barry et al. (2019) and Fullerton et al. (2021) that chemolithoautotrophy is the main driver of primary productivity in the subsurface of these natural spring communities.

Fig. 6: Schematic cross-section (to scale) of the 3D distribution of subsurface microbial communities and their metabolisms across the Costa Rica convergent margin, showing a shallowing of potential habitable area from the outer forearc to the arc.

A subset of sites is shown. Crustal thickness, angle of subduction, and the depth of the 122 °C isotherm are from Harris and Wang [3], while aquifer depth, location, and size are from Worzewski et al. [63]. The top 3 classes per province and their associated metabolisms are listed. Cell morphologies are based on those of cultures from each class. Site colors follow those of Fig. 1.

While most of the MAG abundance (63% of total GCPM) separates by Spearman-based hierarchical clustering into MAGs abundant at outer forearc sites and those abundant at forearc/arc sites (Fig. 2 and Supplementary Fig. S1), forearc and arc sites are distinguished from each other only in the tb-CCA (Fig. 5). The tb-CCA separates sites by province (outer forearc, forearc, and arc) based on how their associated MAGs correlate with metal, slab-derived carbon, and phosphate concentrations, as well as temperature. Differences in community composition are significantly correlated with changes in the concentrations of aqueous iron, nickel (and, by proxy, zinc), and sediment-associated aluminum (p < 0.01). Correlations with aqueous iron and nickel concentrations are strongest in the more acidic forearc/arc samples, where reactions between the magmatic fluid and the underlying mafic rocks lead to higher concentrations of dissolved iron and iron sulfides and also yield pyrite framboids [7]. In particular, MAGs from Blue River (arc) and Finca ANDE (forearc) have the strongest association with aqueous iron and nickel concentrations, likely reflecting the greater influence of metals from the large hydrothermal system.

MAGs at outer forearc sites correlate with lower concentrations of aqueous phosphate and higher concentrations of sediment-associated aluminum. These fluids, which are likely derived from the dehydration of mineral-bound water in slab material 12–15 km below the surface [63, 64], contain the lowest concentrations of aqueous phosphate (<0.001 mmol/L) and are partially equilibrated with feldspars and clays [7]. Because aluminum phosphate is isostructural with polymorphs of silica, aluminum phosphate can extensively substitute for silica in feldspar [65, 66]. Microorganisms in soils and other environments low in aqueous phosphate can liberate this phosphate through biological weathering of feldspar [67, 68], which increases the concentration of solid and aqueous aluminum species [69, 70]. The correlation of outer forearc MAGs with aluminum concentrations may therefore reflect enhanced liberation of phosphate from feldspars. Relative to the other provinces, the outer forearc has a higher abundance of Thermodesulfovibrionia, Actinobacteriota, Desulfobacterota, and Gammaproteobacteria; the forearc has a higher abundance of Sulfurihydrogenibium sp., Thaumarchaeota, and Acidobacteriota; and the arc has a higher abundance of Firmicutes. Chloroflexi and Ca. Bipolaricaulota are abundant everywhere, although their individual MAGs differ between provinces.
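The aluminum phosphate substitution referred to above is a coupled one that preserves charge balance: one aluminum and one phosphorus together occupy the positions of two tetrahedral silicon atoms, which is why AlPO₄ (berlinite) can share the quartz structure. As a charge balance:

```latex
\mathrm{Al^{3+}} + \mathrm{P^{5+}} \rightleftharpoons 2\,\mathrm{Si^{4+}},
\qquad (+3) + (+5) = 2 \times (+4) = +8
```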

Distribution of carbon fixation pathways across geological provinces

WL is the dominant CFP across the outer forearc, forearc, and arc, whereas rTCA is abundant only in the forearc and CBB is in very low abundance throughout the provinces (Fig. 2 and Supplementary Figs. S1 and S4). The WL pathway is highly oxygen-sensitive and the rTCA cycle moderately so, and both tolerate high temperatures [38, 71], making them suitable for subsurface hydrothermal systems. In addition to being more abundant and widespread, WL-containing MAGs are more taxonomically diverse and carry a wider range of redox pathway predictions than rTCA- and CBB-containing MAGs. Because WL has the lowest energy requirement of all known CFPs [38], it may give autotrophs a selective advantage in energy-limited subsurface settings. In the forearc, rTCA is encoded mostly by the Sulfurihydrogenibium sp. MAG. The four MAGs that encode two CFPs each (class Anaerolineae: bins 55, 251, and 386; class Syntrophia: bin 351) may be adapted to variable carbon dioxide and oxygen concentrations [27, 38, 72].
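To make the energetic argument concrete, the sketch below compares how much pyruvate each pathway could make from the same ATP budget. The per-pathway ATP costs are approximate literature values assumed here for illustration; they are not results of this study and vary with organism and enzyme variant (for example, WL costs less in methanogens than in acetogens).

```python
# Approximate ATP cost per pyruvate synthesized from CO2 (assumed, commonly cited
# round values; actual costs depend on organism and enzyme variants).
ATP_PER_PYRUVATE = {"WL": 1, "rTCA": 2, "CBB": 7}


def pyruvate_per_atp_budget(pathway: str, atp_budget: float) -> float:
    """Moles of pyruvate a pathway could make from a fixed ATP budget."""
    return atp_budget / ATP_PER_PYRUVATE[pathway]


for name in sorted(ATP_PER_PYRUVATE, key=ATP_PER_PYRUVATE.get):
    print(name, pyruvate_per_atp_budget(name, 14))
```

Under these assumptions, the same ATP budget yields several times more fixed carbon via WL than via CBB, which is the sense in which a low-cost pathway could be favored where energy supply is limiting.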

WL-containing MAGs contribute most to the correlations between the microbial composition of the sites and higher iron and nickel concentrations (Fig. 5). WL has high metal requirements, especially for iron and nickel, since nickel is required for CODH and iron is required for ferredoxin [38]. In addition, higher iron, nickel, and zinc concentrations are proxies for more reducing, low-pH, and high-temperature environments, which would favor WL-containing organisms. Abundances of CBB- and rTCA-containing MAGs, on the other hand, are more closely associated with DIC and phosphate in the arc and forearc, suggesting that the availability of carbon and phosphorus, rather than pH, redox-active elements, or other nutrients, may limit these chemolithoautotrophic communities. This is supported by a positive correlation between DOC and DIC concentrations across these sites, suggesting that more DOC can be produced when more DIC is available [7]. DIC in this system is almost entirely derived from the deep slab/mantle mixture, and its concentration is higher at the arc because of greater volcanic inputs and less removal of carbon in the crust through calcite formation [8]. The low-abundance rTCA-containing autotrophs encode a pyruvate synthase with low affinity for carbon dioxide [73], which may be disadvantageous in the outer forearc, where alkaline conditions favor calcite precipitation and low carbon dioxide concentrations.
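The link between alkaline conditions and low carbon dioxide availability follows from carbonate speciation: as pH rises, a growing share of DIC is held as bicarbonate and carbonate rather than dissolved CO₂. A minimal sketch, assuming 25 °C freshwater dissociation constants (pK₁ ≈ 6.35, pK₂ ≈ 10.33) and ignoring temperature and ionic-strength corrections, which matter for warm spring fluids:

```python
def co2_fraction_of_dic(ph: float, pk1: float = 6.35, pk2: float = 10.33) -> float:
    """Fraction of DIC present as dissolved CO2 (H2CO3*) at a given pH.

    alpha0 = 1 / (1 + K1/[H+] + K1*K2/[H+]^2), with activities approximated by
    concentrations. Default pK values are assumed 25 °C freshwater constants.
    """
    h = 10.0 ** -ph
    k1, k2 = 10.0 ** -pk1, 10.0 ** -pk2
    return 1.0 / (1.0 + k1 / h + k1 * k2 / h**2)


for ph in (6.0, 7.0, 8.0, 9.0, 10.0):
    print(f"pH {ph:4.1f}: {co2_fraction_of_dic(ph):.4f} of DIC as CO2")
```

At the same DIC concentration, an alkaline fluid therefore offers far less CO₂ to a low-affinity carboxylating enzyme than a near-neutral one.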

Outer forearc WL-containing MAGs have a stronger association with aluminum and a weaker association with nickel and iron than their counterparts in the arc and forearc. In the outer forearc, iron and nickel concentrations remain constant across sites (0.7 mg/g and 0.8 µmol/L, respectively), except in the Quepos samples (1.83 mg/g and 2.5 µmol/L, respectively). As noted above, however, aqueous phosphate is scarce here. Nutrient-limited environments, such as phosphate-limited ones [70], have been shown to stimulate lithoautotrophic communities to enhance the weathering of minerals [69, 70, 74, 75] such as feldspar, leaving behind an aluminum-rich residue [69, 70]. The distribution of heterotrophic MAGs is similar to that of autotrophic MAGs, suggesting either that both are similarly influenced by the geochemical variables or that autotroph distribution drives heterotroph distribution.

Metabolic potential in the outer forearc

In the outer forearc, hydrogenotrophic sulfate reduction is the most abundant predicted metabolic pathway and occurs in the widest range of MAGs, whether or not they have CFPs. There, the dominant WL-containing MAGs, classified as Thermodesulfovibrionia and Ca. Bipolaricaulia, have the genetic potential for hydrogen oxidation. The Thermodesulfovibrionia MAGs also have genes for sulfate reduction, in agreement with other studies of this strictly anaerobic class [76]. Two of the four dominant rTCA-containing MAGs are classified as Syntrophia and contain hydrogenases that have been shown to perform hydrogen uptake (Supplementary methods), while all four contain dsrAB genes for sulfate reduction. The genetic potential to couple carbon fixation to hydrogen oxidation and sulfate reduction has been found in subsurface organisms from several analogous settings. These include submarine serpentinizing systems (e.g., the Lost City hydrothermal field), terrestrial hydrothermal systems [77, 78], and a ~3 km deep borehole in South Africa [79].

Serpentinization-sourced hydrogen and low dissolved oxygen concentrations in the outer forearc create favorable conditions for the WL and rTCA pathways, hydrogen oxidation, and sulfate reduction. These results contrast sharply with the Santa Elena serpentinizing system just to the north. That system has a high prevalence of methanogens, no sulfate reducers, and evidence for very limited biogenic sulfate reduction [80, 81, 82]. The Santa Elena system is ophiolite-hosted [81], whereas outer forearc sites on the Nicoya Peninsula are more influenced by the mantle wedge [83]. Santa Elena also has lower sulfate concentrations, which may allow methanogens to compete for electron donors [81]. Both systems are serpentinizing and have similar pH and temperatures. Their contrast therefore suggests that differences in underlying geology, or possibly in the degree of mixing with subducted, sulfate-rich seawater, can drive differences in microbial metabolic pathways.
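The competition for electron donors described above can be made concrete with the idealized overall reactions for hydrogenotrophic sulfate reduction and hydrogenotrophic methanogenesis. These are textbook stoichiometries, not reactions measured in this study, and the dominant species (HS⁻ versus H2S, HCO3⁻ versus CO2) depend on pH. Both reactions consume four H2 per turnover. Where sulfate is plentiful, sulfate reducers generally outcompete methanogens for hydrogen. Lower sulfate, as at Santa Elena, leaves more hydrogen available to methanogens.

```latex
\begin{aligned}
\text{Sulfate reduction:}\quad & 4\,\mathrm{H_2} + \mathrm{SO_4^{2-}} + \mathrm{H^+} \rightarrow \mathrm{HS^-} + 4\,\mathrm{H_2O} \\
\text{Methanogenesis:}\quad & 4\,\mathrm{H_2} + \mathrm{CO_2} \rightarrow \mathrm{CH_4} + 2\,\mathrm{H_2O}
\end{aligned}
```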

Metabolic potential in the forearc and arc

The most abundant and widespread predicted metabolic pathways in the forearc are sulfur oxidation and the reduction of oxygen and oxidized nitrogen compounds. Compared with forearc MAGs, arc MAGs have fewer genes encoding fully aerobic oxygen reduction (i.e., aa3- and/or bo-type cytochromes) and lack ammonium oxidation genes. Acidic water-rock interactions in the arc and forearc enable higher concentrations of aqueous carbon dioxide, which may alleviate DIC limitation in these chemosynthetic communities. Several processes also favor reduced sulfide over oxidized sulfate [84, 85]: the high pressure of deep magmatic degassing, the cooling of these gases as they rise, and gas-fluid-rock interactions in local hydrothermal systems. Together, these processes create ideal conditions for sulfide oxidation, with abundant carbon dioxide available as an autotrophic substrate. In the forearc and arc, two groups have the potential to power carbon fixation with sulfur-dependent denitrification: the most abundant WL-containing MAGs (class Anaerolineae) and the less abundant CBB-containing MAGs (classes Deinococci and Anaerolineae). The second most abundant WL-containing MAGs (class Ca. Bipolaricaulia) have the genetic potential to power carbon fixation with hydrogen oxidation, using nitrate/nitrite as terminal electron acceptors. In the forearc, the rTCA-containing MAG (genus Sulfurihydrogenibium) has the genetic potential for the aerobic oxidation of sulfide and thiosulfate. Sulfide oxidation is the most abundant energy pathway in the arc and forearc. The resulting sulfate can then be used by the third most abundant WL-containing MAGs (class Thermodesulfovibrionia) to power carbon fixation with sulfate reduction. Although CBB is less prevalent, the CBB-containing MAGs (class Deinococci) contain form IC RuBisCO, which may be advantageous in suboxic environments with moderate to high carbon dioxide concentrations [86].
MAGs classified as Anaerolineae in the forearc and arc contain an unclassified form of RuBisCO. This may be the recently discovered form I′ or form I-α RuBisCO, which have been proposed to be the ancestral form of all RuBisCO forms [87].
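The sulfur-dependent denitrification invoked above can be illustrated with an idealized overall reaction for complete sulfide oxidation coupled to nitrate reduction to dinitrogen. This is a textbook stoichiometry, not a measured reaction. The organisms here may instead oxidize thiosulfate or elemental sulfur, or stop at nitrite or nitrous oxide. Each sulfide releases eight electrons (S from −2 to +6) and each nitrate accepts five (N from +5 to 0), which fixes the 5:8 ratio:

```latex
5\,\mathrm{HS^-} + 8\,\mathrm{NO_3^-} + 3\,\mathrm{H^+} \rightarrow 5\,\mathrm{SO_4^{2-}} + 4\,\mathrm{N_2} + 4\,\mathrm{H_2O}
```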

With increasing distance from the trench toward the arc, the geothermal gradient steepens and the 122 °C isotherm becomes shallower [3]. This potentially compresses the overall volume of habitable subsurface. The greater prevalence of respiratory pathways in the forearc and arc may therefore reflect a habitable zone that lies closer to the surface. Faster recharge with oxygenated meteoric water, driven by the steep geothermal gradient and high rainfall in these mountainous areas, may also contribute.
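The habitat-compression argument can be illustrated with a simple linear conductive geotherm. In that model, the depth of the 122 °C isotherm is z = (122 °C − T_surface) / (dT/dz). The sketch below assumes a uniform gradient and a hypothetical surface temperature of 25 °C. The two gradient values are placeholders chosen for illustration, not the estimates cited in [3], and the function name `isotherm_depth_km` is hypothetical.

```python
"""Depth of the 122 °C isotherm under a linear (uniform) geothermal gradient.

Illustrative sketch: real geotherms are not linear, and the surface
temperature and gradients used below are hypothetical placeholders.
"""

T_LIMIT_C = 122.0  # the "sterilizing isotherm": upper temperature limit for known life (°C)


def isotherm_depth_km(surface_temp_c: float, gradient_c_per_km: float,
                      limit_c: float = T_LIMIT_C) -> float:
    """Return the depth (km) at which a linear geotherm reaches limit_c."""
    if gradient_c_per_km <= 0:
        raise ValueError("gradient must be positive")
    return (limit_c - surface_temp_c) / gradient_c_per_km


if __name__ == "__main__":
    # Hypothetical gradients: gentler near the trench, steeper near the arc.
    for label, gradient in (("trench-side", 20.0), ("arc-side", 60.0)):
        depth = isotherm_depth_km(25.0, gradient)
        print(f"{label}: 122 °C isotherm at ~{depth:.2f} km")
```

Under these assumptions, tripling the gradient shrinks the potentially habitable column by the same factor, from about 4.85 km to about 1.62 km. This is the sense in which a steeper gradient compresses the subsurface habitat toward the surface.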

Delocalized clusters are surface associated

We explored the possibility that the delocalized MAGs in clusters 2, 4, and 5 constitute a low-abundance, surface-associated community. The tb-CCA analysis shows that the distribution of the delocalized MAGs across sites is poorly explained by local geochemical variables (Supplementary Fig. S6). Their ten most abundant classes are common inhabitants of soils or animals. The delocalized clusters also contain the only MAGs from the genera Aquabacterium, Alishewanella, Rheinheimera, and Exiguobacterium, whose 16S rRNA genes were previously found to lack correlation with geological parameters [7]. Most MAGs in clusters 2, 4, and 5 (51%) have cultured representatives at the family to species level (Supplementary Fig. S5), a feature common in human-influenced environments [88]. Several lines of evidence point to a surface origin: common soil inhabitants and human commensals, a prevalence of fully aerobic metabolisms and of CBB, and a lack of correlation with local geochemical and geological parameters. Together, these suggest that the delocalized clusters are composed mostly of surface-associated MAGs and a few laboratory or in situ contaminants.

Conclusions

We conclude that the distribution of province-specific MAGs across the subsurface of the Costa Rican convergent margin correlates with aluminum, iron, nickel, DIC, and phosphate concentrations. These concentrations reflect the underlying geology, fluid sources, and fluid-rock interactions. Hydrogenotrophic sulfate reduction dominates the outer forearc, whereas in the forearc and arc, more oxidized electron acceptors such as nitrate and oxygen are used to oxidize sulfide. This suggests that, unlike in individual volcanic systems (e.g., Yellowstone National Park [19, 89]), pH and temperature are not the main variables controlling microbial distribution at the landscape scale of a convergent margin. MAGs with chemolithoautotrophic pathways comprise a third of this subsurface ecosystem. The WL pathway is found in a wide diversity of taxa in every part of the subsurface landscape. These taxa collectively encode all of the metabolic pathways identified, and their distributions are associated with metal availability. rTCA, CBB, and sulfur oxidation are more prominent in the forearc and arc, which have high carbon dioxide concentrations. Overall, these subsurface chemolithoautotrophic ecosystems support a diverse set of metabolic processes and associated autotrophic pathways. These processes are tightly connected to deep subsurface geological and geochemical parameters, such as dissolved metal content and the extent to which deep volcanic inorganic carbon and other volatiles are available for microbial use.