Metabolic pathways inferred from a bacterial marker gene illuminate ecological changes across South Pacific frontal boundaries

Global oceanographic monitoring initiatives originally measured abiotic essential ocean variables but are currently incorporating biological and metagenomic sampling programs. There is, however, a large knowledge gap on how to infer bacterial functions, the information sought by biogeochemists, ecologists, and modelers, from the bacterial taxonomic information (produced by bacterial marker gene surveys). Here, we provide a correlative understanding of how a bacterial marker gene (16S rRNA) can be used to infer latitudinal trends for metabolic pathways in global monitoring campaigns. From a transect spanning 7000 km in the South Pacific Ocean we infer ten metabolic pathways from 16S rRNA gene sequences and 11 corresponding metagenome samples, which relate to metabolic processes of primary productivity, temperature-regulated thermodynamic effects, coping strategies for nutrient limitation, energy metabolism, and organic matter degradation. This study demonstrates that low-cost, high-throughput bacterial marker gene data, can be used to infer shifts in the metabolic strategies at the community scale.

T he oceans cover 71% of our planet, and the microbial organisms that inhabit them catalyze important ecosystem services (such as O 2 production, C sequestration, and elemental cycling) which sustain life on Earth 1 . Because microbes execute key roles in numerous biogeochemical pathways, it is important to understand how the species and functional composition of these communities respond to environmental changes. On a geological timescale, a 13-million-year-long nanoplankton abundance time series analysis suggested that ecological functions are more important to community resilience and biochemical functions than species richness 2 . Mapping microbial biogeography in relation to abiotic and biotic parameters therefore merits intensive investigation 3 . A better understanding of the ocean genome 4 will allow society to better preserve and utilize the vast genetic diversity in marine ecosystems. Global oceanographic initiatives such as the GO-SHIP 5 and the GEOTRACES 6 programs originally surveyed only abiotic essential ocean variables such as temperature, salinity, and dissolved inorganic trace metals. These initiatives have recently started to include biological essential ocean variables, such as marker gene sequencing and metagenomics, in their sampling programs, potentially enabling scientists to fulfill critical knowledge gaps on how microbial diversity relates to the microbial community metabolic potential. The combination of these two data sets with measurements on microbial processes, such as primary productivity, will provide insights into how local environmental conditions modulate the relationship between functional microbial diversity and productivity across frontal zones and within ocean provinces.
The highly conserved 16S rRNA gene is commonly sequenced for prokaryotic identification and microbial community profiling; an analysis that has been employed to study many biomes around the world 3,7-9 . 16S rRNA gene sequencing, however, does not provide direct information on the metabolic capacity of the microbial communities studied; this information can be obtained from shotgun metagenomics and genome sequencing. Because metagenome assembly, binning, and taxonomic profiling is a complex process and requires a higher level of computational resources than 16S rRNA gene analyses 10 ; and because the amount of spatial and temporal 16S rRNA gene data readily available vastly surpasses that of shotgun data 11,12 , evolutionary modelers have often inferred the potential metabolic profiles of microbial communities from sequence data of marker genes such as the 16S rRNA gene 13 . Although this is an indirect method to estimate microbial metabolic functions, it has been shown that 16S rRNA gene data analysis with the open source software Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt2 13,14 ) results in predictions of metabolic microbial profiles that strongly agree (i.e., high Spearman correlations) with results from shotgun metagenomics 14 . The best predictions generated so far were for the human microbiome, followed by those for the ocean biome 14,15 . It should be mentioned, however, that high Spearman correlations should be carefully interpreted, as metabolic profiles are highly conserved in bacteria 16 .
Our aim in this study is to test whether 16S rRNA gene-based metabolic reconstructions generated by the software PICRUSt2 can predict broad-scale latitudinal patterns in microbial metabolic capacity which agree with our current mechanistic understanding of functional microbial biogeography, both within and across ecological provinces in the South Pacific Ocean 17,18 . More specifically, by contrasting biomass estimators (concentrations of various photosynthetic pigments and of particulate organic carbon and nitrogen (PON)), primary productivity, and N assimilation measurements with existing quantitative and qualitative data from oceanographic literature, we test the validity of the following hypotheses: • H1: Primary productivity will be positively correlated with pathways associated with CO 2 -fixation. Frontal zones, which stimulate primary productivity 19 , should display a higher relative abundance of pathways associated with CO 2 -fixation and energy metabolism than less productive regions.
• H2: Cell metabolism pathways are positively affected by thermodynamics 20 -as temperature increases, more kinetic energy (adenosine triphosphate (ATP)) is required to maintain the cellular machinery and fuel metabolic processes. Therefore, it can be expected that an increase in temperatures will lead to an increase in the relative abundance of cell biosystems machinery (cell structure and cell wall biosynthesis pathways).
• H3: Pathways which reflect microbial strategies for coping with trace metal and macro-nutrient limitations will show latitudinal trends corresponding to element-specific abundances (i.e., high relative abundances of cofactor and secondary metabolite biosynthesis pathways due to iron limitation in the Southern Ocean (SO) 21 and to co-nutrient stress in the oligotrophic gyre 22 ).
• H4: The high availability of nutrients and seasonally defined production of organic matter in the SO and in the Subtropical Frontal Zone (STF) will result in higher relative abundances of pathways associated with energy production, such as lipid and carbohydrate biosyntheses, in these ocean provinces 23,24 ).
• H5: Higher rates of bacterial degradation of particulate and dissolved organic material in the SO 25,26 in comparison to the other zones should result in the former having a greater diversity (and relative abundance) of degradation pathways, including the presence of more complex compound degradation pathways.
In this study we compare these hypotheses and infer metabolic pathways with measured physico-biochemical parameters, 11 corresponding metagenomes and show evidence supporting that microbial functional diversity follows trends within and between oceanographic provinces as expected from existing quantitative and qualitative oceanographic literature. Such analyses may provide insight into the drivers of ecological changes and, overall, into the effects of biodiversity on marine ecosystem functioning.

Results and discussion
Hydrographic conditions. This study was conducted during late autumn and early winter in 2016 along the decadal repeated P15S GO-SHIP transect, which runs from the Antarctic ice edge to the equator along the 170°W meridional in the South Pacific Ocean (Fig. 1a). Sea surface temperatures along the transect gradually increased from −1.5°C at 66°S to 30.4°C at 5.5°S, and then decreased slightly, due to the equatorial upwelling, to 28.1°C at the equator (Fig. 1a, b). Surface salinity was lowest in the SO (33.80-34.30), increasing north of the Polar Front (60°S) to a maximum of 35.87 at 30°S, then decreasing north of 30°S to 34.50 at 10°S, and then increasing to 35.25 at the equator (Fig. 1a, b). Dissolved NO 3 − , Si, and PO 4 3− concentrations covaried above the mixed layer depth (MLD), and were closely linked to the major oceanographic provinces 17 . NO 3 − concentrations above the MLD were >16 µmol l −1 in the SO; between 1 and 16 µmol l −1 in the STF; ≤0.05 µmol l −1 in the South Pacific Subtropical Gyre Province (SPSG); and ≤2 µmol l −1 in the Pacific Equatorial Divergence Province (PED) (Supplementary Fig. 1). Differences in NO 3 − :PO 4 3− ratios above the MLD showed differences that illustrated the four distinct oceanographic provinces, averaging (±standard deviation (SD)) 14.24 ± 0.25 in the SO, 9.76 ± 2.3 in the STF, 0.43 ± 0.87 in the oligotrophic SPSG, and 5.6 ± 1.4 in the PED (Fig. 1c).
PICRUSt2 predictions, Shotgun metagenomes, and PP data. Ninety-two percentage of the 16S rRNA gene sequencing data were used to infer Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthologs (KO) using PICRUSt2. On average, 30% of the shotgun metagenome sequences (MGS) could be assigned to a KO (Supplementary Table 1), and 1.4 ± 0.9% (average ± SD) of the total reads were eukaryotic. A comparative analysis between the KO predictions from PICRUSt2 and the KOs profiled from corresponding MGS was performed in a similar way as presented by Douglas et al. 14 to validate the PICRUSt2 predicted metabolic pathways. Spearman correlation coefficients (shown as R in Fig. 2a-d) were calculated between the KO abundances predicted by PICRUSt2 and the KO abundances profiled from MGS for all individual samples and for each distinct metabolic function. The Spearman correlation coefficients, which represent the similarity in rank ordering of KO abundances between the predicted and observed data, are summarized for the 11 samples and for each metabolic function in Fig. 2e (pathway correlations for all samples are shown in the Supplementary methods). Overall, the correlation coefficients showed strong associations between the ranks of the predicted KO and the MGS KO abundances, with the strongest correlation coefficients recorded for cofactor and vitamin biosynthesis pathways. Although the correlation coefficients for CO 2 -fixation pathways were the weakest, these PICRUSt2 predicted pathways were positively correlated with PP ( Fig. 2f). Similar to the work of Agrawal et al. 15 , who validated their PICRUSt2 predictions with qPCR data, we herein show that PICRUSt2 predictions related to CO 2 -fixation pathways can be verified (independently from MGS data) with PP rates. We do note that the saturation of the relative abundance of predicted CO 2 -fixation pathways (at >20 nmol C l −1 h −1 ) requires further investigation when extrapolating marker gene predictions to rates.
Bacterial and metabolic community diversities. Bacterial alpha diversity (Chao index) increased from the SO to the northern edge of the STF (between 66 and 40°S). Diversity decreased within the SPSG to then increase again north of 10°S (Fig. 3a). Canonical analysis of principal coordinates (CAP) plots and ANOSIM results of the sequence data revealed clear and significant differences in the bacterial communities (beta diversity) between all four oceanographic provinces ( Fig. 3b and Supplementary Table 2). PICRUSt2 analyses of 387 DNA samples c Conceptual mechanistic understanding of relative changes in the functional prokaryotic and microbial-eukaryotic biogeography. Blue rods represent heterotrophic archaea and bacteria, which recycle in winter the organic matter produced in the summer and autumn months in the high nutrient low chlorophyll (HNLC) region of the SO. High primary productivity (PP) driven by phytoplankton rich in chlorophyll (green discs) is expected in the STF due to turbulence and mixing (curved arrows). The oligotrophic SPSG is characterized by low PP and nutrient co-limitation, as well as by higher abundances of photosynthetic prokaryotes (yellow circles). The westward returning branch of the SEC is indicated with the gray arrow in c, d and can be the source of the increase in N:P ratios at 100 m depth. Equatorial upwelling and mixing in the PED results in an increase of the N:P ratio at the surface and, thus, in increased PP and chlorophyll concentrations. d Latitudinal plot of N:P ratios from the surface to 350 m depth. The thick white line represents the MLD; 16S rRNA sampling stations are shown in large gray circles; shotgun metagenome samples are represented by yellow stars; and CTD stations appear as small gray dots.
resulted in the inference of 400 MetaCyc pathways (the inferred functional diversity). No significant differences were observed in the alpha diversity of MetaCyc pathways diversities between the four oceanographic provinces (two-sided Wilcoxon test p > 0.05; Fig. 3c). CAP plots and ANOSIM results for the MetaCyc pathways, however, revealed significant differences in the composition of the metabolic potential between the four oceanographic provinces ( Fig. 3d and Supplementary Table 2). Metabolic community composition. The 400 inferred MetaCyc pathways could be collapsed into 41 secondary superclasses (Supplementary Data 1). Across the transect, the sum of the relative abundance of ten of these superclasses accounted approximately for 75-80% of the total relative abundance. These metabolic functions covered (in descending order of relative abundance) pathways of (1) amino acid and (2) nucleotide biosynthesis; (3) energy metabolism; (4) lipid and (5) carbohydrates biosynthesis; (6) cell structure and cell wall biosynthesis; (7) cofactor and (8) secondary metabolite biosynthesis; (9) vitamin biosynthesis; and (10) fermentation. CO 2 -fixation pathways accounted for <2% of the total relative abundance but are singled out to illustrate patterns in the data, as are a few degradation pathways discussed below. Trends in the relative abundance of these metabolic functions (see Fig. 4 for functions 3-8 and Supplementary Fig. 2 for the other functions) are described in more detail in the following paragraphs in relation to our hypotheses.
Primary productivity shapes ecological functions of the bacterial community (H1). Our first hypothesis, based upon the fact that PP is stimulated by increases in nutrient concentrations in frontal zones 19 , proposed that CO 2 -fixation pathways in frontal systems should be positively related to autotrophic production. Raes et al. 18 showed that, along the P15S GO-SHIP transect, the SO and the SPSG were areas of low PP, whereas the STF and the PED had relatively high PP (Figs. 1b and 4a). The authors also observed an important trend along this transect: a switch from net autotrophy (i.e., high CO 2 -fixation) in the STF to heterotrophy (i.e., high nitrification rates and degradation of organic matter) when light availability is reduced during the winter months in the SO 18 (more information under the subheading H3 and H4-Energy production and Degradation).
Pathways associated with CO 2 -fixation increased from 66°S to the northern edge of the STF at 40°S (35%) and from 10°S to the equator (by 30%), and decreased (by 36%) northwards within the SPSG (all oceanographic provinces were significantly different from one another; two-sided Wilcoxon tests, p < 0.05; Fig. 4b).
Energy metabolism and nucleotide biosynthesis pathways were positively related to the frontal zones, with both the STF and the equatorial upwelling showing significantly higher relative abundances than the SO and SPSG (two-sided Wilcoxon tests, p < 0.05; Fig. 4c and Supplementary Figs. 2B and 3).
Metabolic pathways associated with CO 2 -fixation, energy metabolism, and nucleotide biosynthesis, thus, showed similar latitudinal trends, which in turn were well aligned with basin-wide variations in PP (Fig. 4a-c). These trends were also in agreement with the distinction between productive (the STF and the PED) and less productive (the SO and the SPSG) oceanographic provinces (Fig. 1b). In our model predictions, the concentrations of nanoplankton and 19′-hexanoyloxyfucoxanthin (a diagnostic . Shifts in salinity and temperature characterize frontal zones, and both parameters were the main explanatory variables in predicting energy metabolism pathways. Besides supporting H1, these results also support previous findings that PP (autotrophic energy production) is a main driver for archaeal and bacterial richness across frontal boundaries in the South Pacific Ocean 17 .
Thermodynamic effects on cell biosynthesis machinery (H2). Our second hypothesis was formulated upon the concept that cell metabolism pathways are positively (though not solely) affected by thermodynamics 20 . As temperature increases, so does the energetic demand for an organism to maintain cellular machinery and metabolic processes (basal metabolism). In our study, metabolic pathways associated with cell biosystems machinery showed trends similar to those that would be expected with an increase in temperature. The relative abundances of cell structure and cell wall biosynthesis pathways were significantly different between all oceanographic provinces (two-sided Wilcoxon test, p < 0.05). Temperature-wise, these abundances showed a quasilinear increase of 20% from~3°C (55°S) in the SO to~25°C (22°S; Fig. 4e), though they decreased significantly thereafter (north of 22°S; two-sided Wilcoxon test, p < 0.05).
Though thermodynamic regulation by increasing basal metabolic rates comes with an energetic cost, the resulting increase in cellular machinery will ultimately enable the organism a more active lifestyle and faster responses to environmental challenges 20 . It should also be noted that basal metabolism involves ATPrequiring pathways that are essential for cell survival, with a significant proportion of these pathways being protein synthesis, which maintains potential energy gradients across membranes 20 . Furthermore, increasing temperatures have also been positively linked to a higher production of cell wall proteins 27 and to a change in the composition of cell structural membranes 28 . The fact that temperature has a regulating effect on the cell biosystems machinery suggests that cellular physiological adaptations, including qualitative and quantitative variations in the cell wall proteins 27,28 , would result in a higher relative abundance of (inferred) cell structure and cell wall biosynthesis pathways. As expected, we recorded an increase in relative abundances of cell structure and cell wall biosynthesis pathways as temperatures increased northward along the transect.
Although temperatures predicted 56% of the variability in our regression models for these pathways, we should also note that the concentrations of NH 4 + and of photosynthetic prokaryotes (picoplankton) together predicted 22% of those models ( Fig. 4e and Supplementary Data 2). The steepest increase in relative abundances of cell structure and cell wall biosynthesis pathways was observed in the SPSG. This tropical oceanic province is characterized by relatively higher NH 4 + uptake rates (in comparison to other provinces), high picoplankton concentrations, and low concentrations of organic matter with relatively high δ 15 N-indications of a food web that is dominated by high turnover of organic material 18 . The turnover rate of organic material and its incorporation into new bacterial biomass has been shown to be regulated by bacterial particle colonization 29 , which itself has also been positively correlated to the presence of cell wall proteins 30 . We therefore suggest another possible (additional) explanation for the increase in cell structure and cell wall biosynthesis pathways: in a system with low concentrations of organic matter, the ability to display multiple particle adhesion strategies would be an added advantage in the competition for particle colonization.
Coping strategies for nutrient limitations (H3). Our third hypothesis postulated that the inferred metabolic predictions would result in latitudinal trends corresponding to element-specific abundances and, thus, reflect microbial strategies for coping with trace metal and macro-nutrient limitations. As expected, we observed bimodal latitudinal trends for the biosynthesis of secondary metabolites and cofactors, which are related to trace metal and co-nutrient limitations. The relative abundance of these pathways significantly: decreased from 66°S toward the STF, increased north of the STF (with highest values in the SPSG), and then decreased again toward the equator (two-sided Wilcoxon tests, p < 0.05; Fig. 4g, h and Supplementary Fig. 2A-C), such that they were lower in the more productive regions (STF and PED) than in the least productive regions. Amino acid and vitamin biosynthesis pathways showed the same significant trends as described for secondary metabolite and cofactor biosynthesis pathways (two-sided Wilcoxon test, p < 0.05; Supplementary Fig. 2A, C).
The observed trends in the above-mentioned pathways agree with our conceptual understanding of these oceanic provinces, and with the available literature on the topic 21,22 . These results suggest that 16S rRNA data can potentially be used to track changes in how the microbial community copes with (essential) micro-and macro-nutrient limitation. Transport membrane proteins, much like secondary metabolites and cofactors, also play an important role in the transport of vital compounds, and the relative abundance of genes regulating their expression could also be higher in the nutrient-limited provinces. This is, however, still to be verified, as the functional output from PICRUSt2 does not resolve specific transporter pathways.
Seasonally defined energy production (H4). Our fourth hypothesis suggested that pathways associated with energy production, such as lipid and carbohydrate biosynthesis, would show higher relative abundances in the SO and in the STF because of the seasonally defined production of organic matter in these provinces. Indeed, we observed higher relative abundances of carbohydrates and lipid biosynthesis pathways in the SO and STF in comparison to the other provinces. The relative abundance of carbohydrate biosynthesis pathways increased significantly (by 12%) from 66°S toward the southern edge of the STF (52°S; Fig. 4j), and the average value in the SPSG was significantly lower (15%) than that recorded at the northern edge of the STF (40°S). North of 10°S their relative abundance increased significantly once again (by 10%; all trends were significant; two-sided Wilcoxon tests, p < 0.05; Fig. 4j). Similar trends were observed for changes in the relative abundance of lipid biosynthesis pathways (Fig. 4k), with values being highest in the SO, then declining by 22% from~52°S until~30°S, and subsequently increasing (by 10%) toward the equator (two-sided Wilcoxon tests, p < 0.05; Fig. 4k).
Lipids and carbohydrates are structurally essential molecules and important energy sources 31 ; and several prokaryotes are known to accumulate large amounts of lipid reserves, as these are advantageous for survival/against starvation and confer a strong evolutionary advantage 32,33 . The type (structural or storage) of lipid synthesized by microorganisms will depend on stress imposed on cells, the growth phase, and on environmental conditions such as nutrient availability in relation to (abundant) C sources, with storage lipids usually being accumulated under nutrient-limiting conditions 32,34 . The strong differences in light availability between the winter and summer in the SO and STF profoundly impact lipid trophodynamics 35 . During early winter the bacterial community will consume the remainder of the autumn production, likely using it to fuel (structural and/or storage) lipid synthesis. Our study was conducted during early winter in the SO and the STF, which could explain the high relative abundance of lipid biosynthesis pathways in the bacterial community at these latitudes (Fig. 3k, l).
The relative abundances likely also reflect bacterial adaptation to low temperatures (changes in phospholipid fatty acid composition to maintain membrane fluidity 23 ). The low temperatures in the SO, which result in slow cell growth and division, coupled with a high availability of C, N, and PO 4 3− , likely enable bacteria in this region to synthesize both structural and storage lipids. Any decreasing trends in the relative abundance of lipid biosynthesis pathways seen in the STF (in relation to the SO) would likely be due to increasing metabolic activities (due to increasing temperatures) and, thus, decreasing synthesis of lipids for storage; and to a decreasing availability of PO 4 3− for phospholipid biosynthesis. North of the STF in the tropical region the seasons are not as distinguishable, and the conditions are oligotrophic (Fig. 1b, c). Although laboratory studies have shown that prokaryotes will readily synthesize storage lipids under N-limiting conditions, this is only true when a C source is abundant. The low availability of organic C, regardless of nutrient input, in the SPSG thus likely explains the rapidly declining relative abundance of lipid and carbohydrate biosynthesis pathways observed in the region (Fig. 4j, k).
The bioavailability of PO 4 3− might also explain the declining trend in the relative abundance of lipid biosynthesis pathways 36 . It has been shown that, in PO 4 3− -deficient environments (such as the SPSG; Fig. 1c), heterotrophic bacteria and photosynthetic prokaryotes (picocyanobacteria) are able to engage in lipid remodeling (substituting phospholipids with non-phosphorus lipids, such as sulfolipids or glycolipids 37 ), a strategy which increases their survival at an evolutionary scale in oligotrophic areas of the ocean 37,38 . As this lipid remodeling is expressed at a community level, the shift in trends of the metabolic pathways might give insight regarding how bacterial communities cope with PO 4 3− -limitation in the South Pacific Ocean. Slightly higher relative abundances in the PED than in the SPSG would reflect an increase in PO 4 3− availability in this region (phosphorusassociated pathways were also observed predominantly in the PED).
Degradation pathways (H5). Our last hypothesis suggested that we would detect more degradation-type pathways during the onset of winter in the SO, given that it has been previously shown that the SO is a region of high nutrient recycling rates and breakdown of organic matter in winter (e.g., measurements of high nitrification rates 18 ). Although the secondary superclasses constituted~75-80% of the total abundance of pathways, other superclasses were also of importance when distinguishing (significant differences in) the main bacterial processes occurring within and between the ocean provinces. The results from our indicator analysis (see Supplementary methods) revealed that degradation pathways (proxies for heterotrophy; Supplementary  Fig. 4B-H) were characteristic of the SO, even though they occurred in low relative abundances. More specifically, we observed a greater variety of degradation pathways and the presence of more complex compound degradation pathways, such as aromatic compound and amino acid degradation (which decreased significantly between the SO and SPSG by 153% and 107%, respectively; two-sided Wilcoxon tests, p < 0.05), in the SO. Other degradation pathways included carbohydrate, sugars and acids, and fatty acid and lipid degradation (all decreased significantly from the SO to the SPSG by 54%, 133%, and 50%, respectively; two-sided Wilcoxon tests, p < 0.05; Supplementary  Fig. 4A-F). Our results also support those of Manganelli and Malfatti 26 , who concluded that bacteria and archaea are the most important producers of organic particles via organic degradation when light availability is reduced at higher latitudes.
Other important pathways. Fermentation pathways were observed along the entire transect (~2% relative abundance). On average, the highest relative abundances were recorded in the SO, followed by a steady decline (by 26%) from the SO to the southern edge of the PED. All oceanographic provinces were significantly different from each other, except the SPSG and the PED, for which differences in the relative abundances of fermentation pathways were not significant (two-sided Wilcoxon tests, p < 0.05; Supplementary Fig. 2D). The anaerobic degradation of organic material (including fermentation) contributes significantly to the degradation processes in marine sediments 39 . Because fermentation is favored under anoxic environments, studies targeting the potential of this process in the (mostly oxygenated) photic zone are absent to the best of our knowledge. Nevertheless anaerobic N-cycling processes such as denitrification and anammox have been shown to occur in anoxic and suboxic marine aggregates in oxygenated waters of the photic zone 40,41 . These microhabitats offer niches for a diverse range of metabolic pathways 42 , and the anoxic zones within marine snow particles could potentially harbor fermentative bacteria. We should note that the presence of fermentation pathways could be an artifact due to the presence of inactive sulfate-reducing bacteria and methanogenic archaea, which are capable of fermenting under favorable environmental condition 39 .
Sulfur metabolism (which significantly decreased by 18% between the SO and the SPSG; Supplementary Fig. 4A) pathways were also found to be characteristic of the SO by the indicator analysis. The SO is a known hotspot for sulfur cycling processes, in particular the production of dimethylsulfoniopropionate (DMSP) (as shown by the high presence of Phaeocystis sp. in Sow and Trull 43 ) and of the climate cooling dimethylsulfide gas (DMS 44,45 ). Members of the SAR11 and Planktomarina genera are also known DMSP degraders and were found to dominate the SO in this data set 17 , explaining the higher relative abundance of sulfur-metabolizing genes in this province ( Supplementary  Fig. 4A). The relative abundance of sulfur-associated pathways declined significantly north of the STF, but pathways were still detectable up to the equator (Supplementary Fig. 4A). The 16S rRNA-derived predictions were, therefore, in agreement with Landa et al. 46 , whose metagenomic analyses from the Tara Ocean' data set showed that a large range of marine bacteria are able to use dissolved organic sulfur metabolites, and that the latter play an important part in the global pelagic ocean.
The indicator analyses also highlighted the importance of denitrification pathways in the SO, with relative abundances decreasing significantly by 165% thereafter to the SPSG (twosided Wilcoxon tests, p < 0.05; Supplementary Fig. 4G).
Methanogenesis-associated pathways were characteristic of the STF (Supplementary Fig. 5), whereas phosphorous compounds-associated pathways were characteristic of the SPSG and of the PED, respectively (Supplementary Fig. 6).
Regression modeling. From the 22 biotic and abiotic parameters used to model latitudinal trends in metabolic pathways and PP (jointly referred to herein as pathways), only five were identified as the main predictors for ≥4 pathways (Table 1 and Supplementary Data 2). Temperature was the major predictor of energy metabolism; lipid, carbohydrate, cell structure and cell wall, and vitamin biosynthesis; fermentation; and CO 2 -fixation pathways (12-56%), and one of the main predicting parameters for amino acid and nucleotide biosynthesis pathways (11-15%). Total pigment concentration was the main predictor of cofactor biosynthesis pathways (43%), but also helped predict trends in amino acid, secondary metabolite, and vitamin biosynthesis and CO 2 -fixation pathways (11-20%). Chl b and nanoplankton concentrations and salinity each helped predict four pathways, being the main predictors for secondary metabolite biosynthesis pathways (29%), PP (26%), and nucleotide biosynthesis pathways (20%), respectively. Chl b concentrations also helped predict variations in amino acid biosynthesis, energy metabolism, and CO 2 -fixation pathways (8-13%); whereas nanoplankton concentrations helped predict nucleotide and secondary metabolite biosynthesis and CO 2 -fixation pathways (7-15%) and salinity helped predict energy metabolism and lipid and cofactor biosynthesis pathways (5-12%). δ 15 N and the concentrations of dissolved inorganic nutrients (NH 4 + , NO 2 − , NO 3 − , and Si) and of picoplankton were one of the main predicting parameters for only two pathways.
Considerations on the use of 16S rRNA gene sequencing for inferences on microbial functional ecology. We acknowledge that 16S rRNA metabarcoding is a broad-brush approach with a number of limitations for drawing conclusions about changes in functional ecology. Douglas et al. 14 and Langille et al. 13 clearly noted two main criticisms of functional estimates based on 16S rRNA amplicon-based hidden-state predictions. The first is that the predictions are obviously biased toward the available reference genomes (which was empirically quantified by Langille et al. 13 ). This limitation will be partially addressed in the near future as the number of metagenome-assembled genomes, and sequenced genomes in general, continues to increase. The second criticism, also confirmed through permutation analyses by Douglas et al. 14 , is that the 16S rRNA-based predictions do not provide the necessary resolution to detect biogeographic patterns of bacterial ecotypes of interest 47 .
We should note two examples from our results that clearly illustrate these limitations. First, the N 2 -fixation and nitrification metabolic pathways, which have been shown to be important in the South Pacific Ocean 18 , were not present in the PICRUSt2 MetaCyc outputs. This is likely because N 2 -fixation is not well resolved by 16S rRNA gene sequencing 48 and because bacteria involved in nitrification made up only 1% of the bacterial biomass in the samples (see Supplementary Fig. 9 in Raes et al. 18 ). Underestimating the occurrence of these pathways that contribute to inputs of new (N 2 -fixation) and regenerated (nitrification) N could ultimately lead to global oceanic models underestimating PP 49,50 . This limitation can, however, be addressed with the use of additional amplicon-based analyses. For example, N 2 -fixation functional gene (nifH) sequencing data have been coupled with direct rate measurements to reveal biogeographic patterns of the diazotrophic community 18 . Secondly, our analyses do not provide the necessary resolution to detect biogeographic patterns of ecotypes of interest. While ecotypes such as the Pelagibacter SAR11, the cyanobacteria Prochlorococcus, and the prymnesiophyte Phaeocystis sp., among others, appear functionally redundant in a broad, amplicon-based functional analysis, the fine-scale metabolic variations that have evolved among these ecotypes may have important bearing on the temporal and spatial structure of the community and productivity of the ecosystem 43,51-53 . Such limitations have been addressed with focused taxonomic analysis of, for example, amplicon sequence variant (ASV) data 43 .
A tool for large-scale functional ecology. We set out to test five hypotheses under the assumption that 16S rRNA gene sequences can offer significant insight into the functional diversity of bacterial communities in oceanographic studies. Our results demonstrated that the observed latitudinal trends in metabolic pathways generated by the PICRUSt2 software were consistent with measured physico-chemical parameters such as temperature, nutrient bioavailability, diagnostic pigments (e.g., fucoxanthin, prasinoxanthin, chl b, and zeaxanthin), and the isotopic fractionation of PON, among others. In addition, our observations aligned with our measurements of biogeochemical rates, with quantitative and qualitative predictions from the available literature, and with our overall mechanistic understanding of functional microbial biogeography in the South Pacific Ocean. A comparative analysis between the KO predictions from PICRUSt2 and the KOs profiled from the corresponding MGS provided support to the inferred metabolic pathways and, thus, to the proposed hypotheses. Our results exemplify the potential for lowcost, high-throughput mapping of (functional) biodiversity and ecosystem change in global monitoring campaigns such as GO-SHIP and (bio)GEOTRACES. Community-level metabolic information directly speaks to the state of, and changes in, ecosystems, while also complementing the information provided by abiotic variables, which are more routinely used to monitor the state of the oceans. The ability to query metabolic pathways in existing and future 16S rRNA gene data sets on a global scale establishes the opportunity to test hypotheses regarding how biodiversity influences functional diversity, and how these are related to energy production in the ocean. Deriving metabolic profiles from 16S rRNA gene data sets obtained by oceanic sampling programs on a global scale may, thus, provide a better understanding of the components of a resilient marine ecosystem and of how that resilience is tested through existing and emerging environmental stressors.

Methods
Study area and water sampling. This oceanographic study was conducted in the South Pacific Ocean onboard the R.V. Investigator from 23 April to 29 June 2016 along the longitudinal P15S GO-SHIP line at 170 o W (Fig. 1a). The P15S GO-SHIP line is a transect that runs from the ice edge (~66 o S) to the equator (0 o S; Fig. 1; http://www.go-ship.org/). The results presented herein are a continuation of the work from Raes et al. 17,18 along the P15S GO-SHIP transect. For clarity, we briefly reiterate some of the methodology applied to our study, but for an in-depth explanation on the physical and biochemical data validation and the presented C and N rate measurements we refer the reader to the aforementioned papers and the Supplementary methods.
For the purposes of this study, the P15S GO-SHIP transect was divided into four oceanographic provinces or Longhurst provinces 54 . From south to north the transect covered: (1) the SO between 66°and 52°S; (2) the STF between 52°and 40°S; (3) the SPSG between 40°and 10°S; and (4) the PED, between 10°and 0°S. Physical, biogeochemical and metadata were collected from 36 depth horizons at 140 stations (approximately every half a latitudinal degree). Full depth profiles for temperature, salinity, and dissolved oxygen were conducted using a Seabird (SBE25 plus) conductivity-temperature-depth profiler with a SBE43 O 2 sensor mounted on a 36 Niskin bottle rosette sampler.
DNA sequencing and bioinformatics. Samples for DNA analyses were collected from 12 litres Niskin bottles at three depths within the mixed layer. Depth horizons  61 . Functions were assigned using Diamond Blastx alignments 62 of the reads against the Clusters of Orthologous Groups of proteins and KEGG databases 63-65 using the lowest common ancestor (Huson et al. 66 ) and fun3 methods. The script make_databases.pl was used to download and format the latest versions of the databases (database creation on Tuesday May 5, 2020). The script combine-sqmtables.py was then used to generate and combine tabular outputs from all 11 samples. All scripts used in this study are available on the SqueezeMeta GitHub page https:// github.com/jtamames/SqueezeMeta.
Functional composition. The software PICRUSt2 (version 2.3.0b) 13,14 was used with default settings to infer approximate functional potential of the microbial communities sampled across the 7000 km transect in the South Pacific Ocean. The average Nearest Sequenced Taxon Index (NSTI) score, based on 387 samples (covering the three depths), was 0.134 ± 0.031 (±SD). Approximately 1% of OTUs (51 out of 4189) were above the maximum NSTI cut-off score of values >2, and were removed. These removed OTUs represented 2.9% of the relative abundance of the bacterial community, and included two OTUs which showed 94 and 99% match with Bathycoccus prasinos mitochondrial DNA and represented 2.8% of the bacterial community relative abundance. It should be noted that chloroplasts and mitochondrial sequences (7% of the data) were removed from the data set prior to the PICRUSt2 analyses. The relative abundance of each OTU (including its sequence at the 97% similarity threshold) with NSTI values >2 is shown in Supplementary Data 3. Two pathways with <10 reads were removed from the data set, namely PWY-6948/sitosterol degradation to androstenedione and PWY-6713/Lrhamnose degradation II. The final predicted metagenome pathway abundance data were converted to relative abundances per sample by rarefying to the lowest abundance per sample as suggested by Douglas et al. 14 .
Statistical analyses. Statistical analyses and data visualizations were performed with R version 3.6.1 67

Code availability
Scripts, key data files, and workflows to reproduce the analyses and plots are available from GitHub 68 .
Received: 27 August 2020; Accepted: 9 March 2021; Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/ licenses/by/4.0/.