Introduction

Heterotrophic prokaryotes play a key role in the cycling of elements in the ocean [1,2,3]. Efforts to decipher prokaryote-environment interactions, facilitated by meta-omics technologies [4, 5], have revealed highly complex dynamics in marine microbial community structure and function [6,7,8,9,10,11,12,13,14,15,16,17,18]. Recent studies from global ocean expeditions provided insights into microbial diversity coupled with significant changes in gene repertoire and expression along temperature gradients and depths [13, 14, 18]. However, there are still fundamental gaps in understanding how individual prokaryotic taxa regulate their metabolic capacities in response to changes in other environmental factors, such as nutrient availability, and how their adaptation strategies may influence energy flows and nutrient cycling in their ecological niches.

The Southern Ocean remains one of the least explored ocean regions. The perennially cold waters present the largest high-nutrient, low-chlorophyll (HNLC) region of the global ocean, where iron is the primary limiting factor of biological productivity [19]. Heterotrophic prokaryotes experience a double constraint due to low concentrations of bioavailable iron and dissolved organic carbon (DOC) [20, 21]. The availability of these nutrients affects prokaryotic heterotrophic activities, particularly growth and respiration [22, 23], and as a consequence their functions in microbial food webs. Furthermore, iron is present in various chemical forms in the ocean [24] and a multitude of substrates constitute the pool of organic matter [25], challenging the exploration of the functional roles of diverse prokaryotic taxa in accessing these essential resources.

Southern Ocean islands are a source of iron to the surrounding seawater, leading to localized spring phytoplankton blooms [26]. Kerguelen Island, located in the Indian Sector of the Southern Ocean, sustains the largest iron-fertilized region [27,28,29,30]. The annually occurring diatom-dominated phytoplankton blooms east of the island have important consequences for microbial communities. Field studies and onboard experiments identified diatom-derived DOC as a key driver of prokaryotic diversity, activity, and seasonal community succession [21, 31,32,33,34]. The naturally iron-fertilized region off Kerguelen Island within HNLC waters provides a natural laboratory to examine phylogenetic and functional diversity of microbial lineages that mediate iron and carbon cycling.

In this study, we provide a comprehensive survey of the structure, genetic repertoire and expression pattern of the free-living (<0.8 µm size fraction) prokaryotic community in contrasting Southern Ocean productivity regions. We sampled three stations during the Marine Ecosystem Biodiversity and Dynamics of Carbon around Kerguelen (MOBYDICK) cruise in late austral summer (18th February to 30th March 2018), including one located in the naturally iron-fertilized waters and two off-plateau ones within HNLC waters. Metagenomic assembly and curation recovered a novel Southern Ocean meta-omics resource with 3 million protein-coding genes and characterized 133 metagenome-assembled genomes (MAGs) complementary to existing oceanic databases. Our main objective was to explore the distribution of prokaryotic functions related to iron and carbon metabolism in contrasting nutrient regimes and their links to taxonomy. We addressed this objective on the community and taxon-specific level by considering both the functional potential and the gene expression patterns.

Materials and methods

Sample collection, metagenome and metatranscriptome sequencing

Surface seawater (10 m) was collected at three stations in contrasting oceanic regions during the MOBYDICK cruise (Supplementary Fig. 1A). Station M2 was located in the naturally iron-fertilized waters above the central Kerguelen Plateau and stations M3 and M4 were located in off-plateau HNLC waters. The timing of the cruise covered the demise of the summer phytoplankton blooms (Supplementary Fig. 1B), as reflected in enhanced concentrations of dissolved organic carbon, prokaryotic abundance and heterotrophic production in on-plateau surface waters as compared to HNLC waters (Supplementary Fig. 1C, Supplementary Table 1 and Supplementary Methods) [35]. Station M2 was visited three times at 8-day intervals, and stations M3 and M4 were each visited twice at a two-week interval. For metagenomes, triplicate 6 L seawater samples, collected with Niskin bottles, were each filtered through 0.8 µm polycarbonate (PC) filters (Nuclepore). The cells in the <0.8 µm fraction were concentrated in 0.2 µm Sterivex filter units (Millipore). Total genomic DNA was extracted from the Sterivex filter units using the AllPrep DNA/RNA kit (Qiagen, Hilden, Germany) with modifications (Supplementary Methods). The DNA was extracted from the triplicate seawater samples collected during each of the repeated visits per station. Triplicate DNA extracts were pooled in equimolar amounts, providing one pooled DNA extract per visit and station. The DNA extracts from the repeated visits (3 at M2 and 2 each at M3 and M4) were then pooled for each station to achieve 1 µg in 30 µL Tris for sequencing purposes. Three metagenomic libraries (one per station) were prepared using the Illumina Nano library preparation kit. For metatranscriptomes, 10 L seawater samples were immediately pre-filtered through 0.8 µm PC filters (Nuclepore) and the cells in the <0.8 µm fraction were concentrated on 0.22 µm Express Plus polyethersulfone (PES) filters (Millipore). RNA was extracted from the samples collected during the first visit at each site using the NucleoSpin® RNA Midi kit (Macherey-Nagel, Düren, Germany). Two internal standard RNA molecules were synthesized and added to each sample with known copy numbers. Technical details are provided in Supplementary Methods. Nine metatranscriptomic libraries (3 replicates × 3 stations) were prepared using the Illumina TruSeq Stranded mRNA Library Prep kit. Paired-end sequencing (2 × 150 bp) was performed on an Illumina HiSeq 4000 platform at Fasteris SA, Inc. (Switzerland).

Metagenome assembly and binning

Reads that passed quality control (QC) from each sample were co-assembled using MEGAHIT (v1.0.4) [36] with default settings, resulting in 949,228 contigs with a minimum length of 1,000 bp (Supplementary Methods). Sequencing and assembly statistics are summarized in Supplementary Fig. 2 and Supplementary Table 2. MetaWRAP (v1.1.3) [37] was used to assign contigs (≥2,500 bp) to metagenome-assembled genomes (MAGs) with the aid of three binning tools: CONCOCT (v1.0.0) [38], MaxBin (v2.2.5; -markerset 40 -prob_threshold 0.5) [39] and MetaBAT (v2.12.1) [40]. Further refinement was implemented, based on the read coverage and GC content of each contig in a MAG, by multivariate outlier detection using the aq.plot function in the mvoutlier package (v2.0.9) in R (v3.6.1). After contig outlier removal, MAGs were reassessed for completeness and redundancy using CheckM (v1.1.2) [41]. Sequence-discrete populations closely related to the MAGs were identified as previously described using BBMap (v38.22) [42] (Supplementary Fig. 3). Comparisons between our Southern Ocean assemblies and existing databases, including the NCBI and the TARA Ocean Global Expedition Project, were performed to evaluate the novelty of our data (Supplementary Figs. 2 and 4). Technical details are described in Supplementary Methods.
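The contig refinement step can be illustrated with a minimal sketch. It is a simplified stand-in, not the aq.plot procedure: aq.plot relies on robust distance estimates and an adaptive quantile, whereas the sketch below uses the classical mean and covariance with a fixed chi-square cutoff. The function name, the log-scaling of coverage and the 0.975 quantile are assumptions made for illustration only.

```python
import math
import numpy as np

def flag_outlier_contigs(coverage, gc, quantile=0.975):
    """Flag contigs whose (coverage, GC) pair lies far from the bulk of a MAG.

    Simplified stand-in for robust multivariate outlier detection:
    classical Mahalanobis distance with a fixed chi-square cutoff.
    """
    # Log-scaling coverage is an assumption, not taken from the study.
    x = np.column_stack([np.log10(coverage), gc])
    centre = x.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(x, rowvar=False))
    diff = x - centre
    # Squared Mahalanobis distance of every contig to the centre.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # For 2 degrees of freedom the chi-square quantile has a closed form.
    cutoff = -2.0 * math.log(1.0 - quantile)
    return d2 > cutoff

# Hypothetical bin: 30 consistent contigs plus one with deviating coverage and GC.
cov = [20 + (i % 5 - 2) for i in range(30)] + [200]
gc = [0.40 + 0.01 * ((2 * i) % 5 - 2) for i in range(30)] + [0.65]
flags = flag_outlier_contigs(cov, gc)
```

Contigs flagged in this way would be removed from the bin before completeness and redundancy are reassessed.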

Metagenome functional profiling

A total of 3,003,586 protein-coding genes were identified from the 949,228 contigs by Prodigal (v2.6.3) [43] under meta mode (-p meta). Functional annotation was carried out against eggNOG [44] using eggNOG-mapper (v1.0.3) [45], Pfam [46] using HMMER (v3.2.1) [47], KEGG [48] using GhostKOALA (v2.2) [49] and KofamKOALA (v1.0.0) [50], TCDB [51] using BLASTP (v2.7.1) [52], CAZy [53] using dbCAN2 (v2.0.1) [54], and MEROPS [55] using BLASTP (v2.7.1) [52]. Iron-related genes were further examined by FeGenie [56], and Fe-containing domains were characterized using Superfamily (v1.75) [57].

Metagenome taxonomic profiling

Taxonomy classification of the 133 MAGs was determined using the classify_wf function of the GTDB-Tk toolkit [58] based on the Genome Taxonomy Database (v0.3.0) (Supplementary Table 3). For phylogeny inference, 218 single-copy orthologous gene families shared by at least 20 (out of 133) MAGs were identified by OrthoFinder (v2.2.3) [59], aligned with MAFFT (v7.313) [60] and filtered by trimAl (v1.4) [61]. Maximum Likelihood (ML) phylogenetic reconstruction was performed based on the concatenation of the proteins using IQ-Tree (v1.6.8; -m TESTMERGE -bb 1000 -bnni) [62] (Fig. 1). Metagenome-assembled genes which were not included in the MAGs were subjected to taxonomic classification using Kaiju (v1.7.0) [63] with its precompiled nr databases. We also quantified taxonomic diversity and relative abundance in each sample by using SSU reconstruction and assembly-free taxonomic classifiers (Supplementary Fig. 5). Technical details are thoroughly described in Supplementary Methods.
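The selection of marker families for the phylogeny can be sketched as a simple filter. The sketch assumes that "single-copy" means exactly one copy in every MAG that carries the family, and that copy numbers are available as a nested mapping; the function name and data layout are hypothetical and do not reflect the OrthoFinder output format.

```python
def select_single_copy_families(copy_counts, min_mags=20):
    """Return families present in at least `min_mags` MAGs, one copy in each.

    copy_counts: {family_id: {mag_id: number_of_gene_copies}} (hypothetical layout)
    """
    selected = []
    for family, per_mag in copy_counts.items():
        present = [n for n in per_mag.values() if n > 0]
        if len(present) >= min_mags and all(n == 1 for n in present):
            selected.append(family)
    return sorted(selected)

# Toy example with a lowered threshold so that it fits on a few lines.
toy = {
    "OG1": {"MAG_A": 1, "MAG_B": 1, "MAG_C": 0},  # kept: single copy in 2 MAGs
    "OG2": {"MAG_A": 1, "MAG_B": 2},              # dropped: duplicated in MAG_B
    "OG3": {"MAG_A": 1},                          # dropped: found in 1 MAG only
}
markers = select_single_copy_families(toy, min_mags=2)
```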

Fig. 1: Genomic features of the 133 Southern Ocean (SO) metagenome-assembled genomes (MAGs) visualized using the circlize package (v0.4.9) in R (v3.6.1).

The outermost circle shows the phylogenetic tree derived from the concatenation of 218 single-copy orthologous genes shared by the MAGs. The tip labels and branches are colored according to their taxonomic affiliations determined by GTDB-Tk [58]. The 50 MAGs whose genes were excluded from the DGE analysis due to low read counts in either metagenomes or metatranscriptomes are marked with asterisks. The second circle (“Cov.”) is a heatmap displaying the average depth of coverage (per million reads) of each MAG in each sample. The third to seventh circles are heatmaps showing the number of total transcripts (“Trans L−1”), transcripts from genes encoding ribosomal proteins (“Ribo.”), genes involved in Fe-related metabolic activities (“Fe”) and the TCA cycle (“TCA”), as well as genes encoding carbohydrate-active enzymes (“CAZy”) in one liter of sampled seawater (L−1). The value of transcripts L−1 of each MAG was further normalized by the length of the MAG (Mbp). The color schemes are given at the bottom left. The 8th circle illustrates the number of significantly differentially expressed genes between contrasting oceanic regions (on-plateau iron-fertilized vs. off-plateau HNLC waters). The orange bars represent the number of genes that are significantly more highly expressed at the on-plateau M2 site than at the off-plateau M3 and M4 sites. The blue bars summarize genes that are significantly more highly expressed in the off-plateau HNLC waters.

Metatranscriptome transcript abundance and gene expression profiling

Read counts of each gene were generated using featureCounts (v2.0.0) [64] with the BAM files produced by mapping QC-passed metagenomic and metatranscriptomic reads to the 949,228 annotated contigs using Bowtie2 (v2.3.5) [65] (Supplementary Methods). In addition to the shared options “-Q 1 --primary -p -B -P -C”, different settings were used for metagenomic (“-O --fracOverlap 0.25 --ignoreDup -s 0”) and metatranscriptomic (“-s 2”) reads. Based on internal standard recoveries (Supplementary Methods), we estimated the quantitative inventories of transcripts per liter of each gene, and enumerated transcripts mediating key iron uptake and carbon metabolism pathways (Fig. 2, Supplementary Figs. 6–8, and Supplementary Table 4). Functional diversity was measured by the Shannon index based on the abundance matrix of functional groups in each sample using the “diversity” function of the vegan package in R (v3.6.1). The abundance of each functional group in a sample (M2, M3, or M4) was defined as the sum of all transcripts of genes assigned to the corresponding function, \(f_{\mathrm{Abundance}}(\mathrm{Func}_i) = \sum_{\mathrm{gene} \in \mathrm{Func}_i} f_{\mathrm{Transcripts\,L^{-1}}}(\mathrm{gene})\). The taxonomic composition of a functional group was assessed by the ratio \(\frac{\sum_{\mathrm{gene} \in \mathrm{Func}_i \cap \mathrm{Tax}_j} f_{\mathrm{Transcripts\,L^{-1}}}(\mathrm{gene})}{\sum_{\mathrm{gene} \in \mathrm{Func}_i} f_{\mathrm{Transcripts\,L^{-1}}}(\mathrm{gene})}\).
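The two definitions above, together with the Shannon index, can be illustrated by a minimal sketch. It takes per-gene transcripts per liter as given and uses the natural logarithm, which is the default of the vegan “diversity” function. The function names, the record keys ("func", "taxon", "tpl") and the example values are hypothetical.

```python
import math
from collections import defaultdict

def group_abundance(genes):
    """Sum transcripts per liter over all genes of each functional group."""
    totals = defaultdict(float)
    for g in genes:
        totals[g["func"]] += g["tpl"]
    return dict(totals)

def shannon(abundances):
    """Shannon index (natural log) of a vector of group abundances."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)

def taxon_share(genes, func, taxon):
    """Fraction of a functional group's transcripts contributed by one taxon."""
    denom = sum(g["tpl"] for g in genes if g["func"] == func)
    numer = sum(g["tpl"] for g in genes
                if g["func"] == func and g["taxon"] == taxon)
    return numer / denom if denom else float("nan")

# Hypothetical per-gene transcript inventories for one sample.
sample = [
    {"func": "FeoA", "taxon": "Gammaproteobacteria", "tpl": 30.0},
    {"func": "FeoA", "taxon": "SAR11", "tpl": 10.0},
    {"func": "Fur", "taxon": "SAR11", "tpl": 40.0},
]
abund = group_abundance(sample)
h = shannon(abund.values())
share = taxon_share(sample, "FeoA", "Gammaproteobacteria")
```

In this toy sample the two functional groups are equally abundant, so the Shannon index equals ln 2, and Gammaproteobacteria account for three quarters of the FeoA transcripts.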

Fig. 2: Community functional diversity and taxonomic composition within functional groups.

Shannon index based on the abundance of functional groups (A) and shifts in taxonomic composition within functional groups (B) across sampling sites were calculated based on the community-level transcript abundance represented by the normalized per-liter transcripts estimated following the internal standards protocol [70] (E-J in Supplementary Figs. 6, 7). In B, the relative contribution (%) of a specific taxonomic category (e.g., Gammaproteobacteria) to a functional group (e.g., ferrous iron transporter FeoA) at each sampling station was calculated (Materials and methods). Shifts in the relative contribution across stations were estimated using the ratio of the relative contribution in M2 to that in M3 (or M4) and visualized by violin plots. A ratio less than 1 indicates that the taxonomic category accounts for a larger share of the transcripts (L−1) of a functional group in the off-plateau HNLC waters, and vice versa. Multiple databases were considered, including CAZy, FeGenie, KEGG, Pfam, Superfamily and TCDB. The five dominant taxonomic groups in the gene pool and transcript inventories across all sampling sites are shown. The color code is the same as in Supplementary Figs. 6, 7. Only functional groups consisting of at least 50 genes, out of the 3,003,586 protein-coding genes predicted from the metagenome assemblies, were used in the calculation.

Furthermore, differential gene expression analyses were performed in two ways: on the original metatranscriptomic read counts and on the metagenome-normalized metatranscriptomic profile. The normalization was performed to minimize the influence of genome abundance on the assessment of gene expression levels, given that fluctuations in transcript abundance could result from shifting genome copies rather than changes in expression levels (Supplementary Fig. 9A) [14]. We normalized the metatranscriptomic profile by relative gene abundance through the division of variance-stabilizing transformed count tables, and then converted the ratios to integer pseudo-counts in a range of 0 to 10^6 (Supplementary Fig. 9B). DESeq2 (v1.24.0) was applied to identify significantly differentially expressed genes (SDEGs) across contrasting oceanic regions (on- vs. off-plateau) at a false discovery rate (FDR) threshold of 0.1 [66]. Considering that during the MOBYDICK cruise the two HNLC sites (M3 and M4) were located in distinct water bodies separated by the Antarctic Polar Front (Supplementary Fig. 1A), we included in our design formula, besides the factor of iron concentration gradients, a term representing the influence of the Antarctic Polar Front. Genes identified as significantly differentially expressed were further summarized according to their functional categories (Fig. 3, Supplementary Figs. 10, 11 and Supplementary Tables 4, 5).
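The metagenome-based normalization can be sketched as follows. The sketch assumes that both count tables have already been variance-stabilized and arranged with matching rows (genes) and columns (samples), and that the ratios are mapped to integers by linear min-max rescaling. The rescaling rule, the function name and the upper bound used in the example are assumptions; the exact procedure is described in the Supplementary Methods.

```python
import numpy as np

def normalised_pseudocounts(rna_vst, dna_vst, upper=10**6):
    """Divide transformed transcript values by transformed gene abundances
    and rescale the ratios to integer pseudo-counts in [0, upper]."""
    ratio = np.asarray(rna_vst, dtype=float) / np.asarray(dna_vst, dtype=float)
    lo, hi = ratio.min(), ratio.max()
    if hi == lo:
        # Degenerate case: all ratios identical, nothing to rescale.
        return np.zeros(ratio.shape, dtype=np.int64)
    # Linear rescaling is an assumption made for this illustration.
    scaled = (ratio - lo) / (hi - lo) * upper
    return np.rint(scaled).astype(np.int64)

# Hypothetical transformed values: 2 genes (rows) x 2 samples (columns).
rna = [[2.0, 4.0], [6.0, 8.0]]
dna = [[2.0, 2.0], [2.0, 2.0]]
pseudo = normalised_pseudocounts(rna, dna)
```

Integer pseudo-counts are needed because count-based tools such as DESeq2 expect non-negative integers as input.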

Fig. 3: Statistics of significantly differentially expressed genes (SDEGs) involved in glycoside hydrolysis and key iron metabolic pathways.

Panels from top to bottom represent glycoside hydrolases (GH), iron uptake regulators (Reg.), ferrous uptake (Fe2+), ferric uptake (Fe3+), siderophore biosynthesis and uptake (Sid. Syn./Upt.), heme uptake (Heme), iron storage (Sto.) and the ferredoxin/flavodoxin switch (F/F). The full list of KEGG Orthology groups (KOs) related to iron metabolism examined in this study can be found in Supplementary Table 4. Each row represents one functional group. The two vertical panels show statistics of the SDEGs based on the metagenome-normalized metatranscriptomic pseudo-counts and the corresponding log2-based fold changes. In the bi-directional bar plots, the bars pointing to the left indicate the number of genes that are significantly more highly expressed at the on-plateau iron-fertilized M2 site than at the off-plateau HNLC M3 and M4 sites. Conversely, the bars pointing to the right represent genes that are significantly more highly expressed in the off-plateau HNLC waters. The color scheme of taxonomy is shown at the top.

Results and discussion

A novel Southern Ocean meta-omics resource

An average of 316.4 million (M) pairs of high-quality metagenomic reads were obtained from each station, achieving ~95% average coverage of the sampled communities (Supplementary Fig. 2A and Supplementary Table 2) [67]. Combined with another pre-sequenced metagenome from station M2 in early spring, a total of 1,286.5 M pairs of reads were assembled into 949,228 non-redundant contigs (≥1,000 bp), on which 3,003,586 protein-coding genes were identified. Most contigs and genes in the assembled metagenome had low similarity to sequences in the NCBI nt database (Supplementary Fig. 2B–E and Supplementary Methods), underlining the novelty of our data. Although almost half of the predicted proteins displayed high-similarity homology to sequences in the NCBI nr database (bitscore ≥200 and E-value <1e-10) [68], the proportion of near-identical matches (percentage of identity ≥90%) did not exceed 15.32%, and another 717,088 (23.87%) proteins had no significant homologs in the nr database (Supplementary Fig. 2F, G).

A total of 133 Southern Ocean MAGs were recovered, among which 116 have a completeness ≥50% and a redundancy <5% (Fig. 1 and Supplementary Table 3). The Southern Ocean MAGs represent a wide range of taxonomic groups, including 4 archaeal and 129 bacterial genomes. The classes of Alphaproteobacteria (n = 34), Gammaproteobacteria (n = 35), and Bacteroidia (n = 39) dominated the bacterial Southern Ocean MAGs, while other members belonging to Planctomycetota (n = 5), Myxococcota (n = 3), Verrucomicrobiota (n = 4), and Actinobacteriota (n = 3) were also present. Metagenomic read recruitment revealed variable abundance for some taxonomic groups across sampling sites, including the Pelagibacterales order (also known as the SAR11 clade) and Flavobacteriaceae family (Supplementary Fig. 3).

Only 13 of our Southern Ocean MAGs conform to >95% intra-species ANI values with counterparts from the TARA Ocean Global Expedition (Supplementary Fig. 4A and Supplementary Table 3) [69]. Metagenomic read recruitment analysis further confirmed that the novelty of our Southern Ocean assemblies is not derived from biases introduced during metagenome assembly and binning (Supplementary Fig. 4B–F and Supplementary Information). The protein-level comparison indicated more functional similarity than diversity, given that around 90.51% of the proteins in the Southern Ocean MAGs were assigned with orthologs in the TARA assemblies (Supplementary Fig. 4G, H).

Taxonomic profiling of prokaryotic communities

Given the considerable number of metagenomic reads that could not be assembled into MAGs (Supplementary Table 2), we carried out additional taxonomic profiling analyses using both 16S rRNA reconstruction-based and assembly-free methods (Supplementary Methods), in order to obtain a less biased estimate of the microbial community composition in our samples. Overall, the dominance of Alphaproteobacteria, Gammaproteobacteria and Bacteroidia, as well as the variations in diversity and abundance of individual taxa across samples (Supplementary Fig. 5A, B, E), agreed with the observations in the 133 Southern Ocean MAGs. The SAR11 clade was ubiquitous and abundant across all sampling sites, and its 16S rRNA assemblies displayed high phylogenetic diversity (Supplementary Fig. 5A). Species diversity, measured by the Shannon index, was higher for the microbial communities at the off-plateau HNLC M3 and M4 sites than for those at the on-plateau M2 site (Supplementary Fig. 5C). Microbial community variability among stations was explored with double principal coordinate analysis (DPCoA) followed by Monte Carlo permutation tests, incorporating not only information on abundance patterns but also phylogenetic structures (Supplementary Methods). Ordination of communities by DPCoA revealed a significant clustering of taxonomic groups along the first principal component, correlated with contrasting nutrient regimes (p value ≈ 0.001; Supplementary Fig. 5D and Supplementary Methods). However, the statistical significance of the categorical explanatory variable (on- vs. off-plateau waters) could not be appropriately assessed due to the small number of representative samples per environment (Supplementary Methods).

Community functional potential and gene expression patterns

To assess the functional potential and gene expression patterns at the community level, we examined key functions in carbon and iron metabolism and the contribution of prokaryotic taxa to the respective functional groups across the metagenomes and metatranscriptomes obtained from the different sites (Supplementary Figs. 6, 7 and Supplementary Table 4). The overall functional potential, based on gene presence and absence in metagenomes, was similar across sampling sites at the community level (A–D in Supplementary Figs. 6, 7). Key metabolic genes were universally present in all samples, including those involved in iron uptake and those encoding carbohydrate-active enzymes (CAZymes). We further examined the community-level transcript abundance based on the normalized per-liter transcripts estimated following the internal standards protocol (E-J in Supplementary Figs. 6, 7) [70]. We did not observe an overall enrichment of functional groups related to iron and carbon metabolism in either the on- or off-plateau prokaryotic communities; instead, the patterns were patchy. For example, the siderophore transporters belonging to ExbD (K03559), ExbB (K03561), TonB (K03832), TonB-dependent outer membrane receptors (K16087 and K02014), ferrous iron transporters FeoA (K04758) and FeoB (K04759), as well as two transcriptional regulators Fur (K09823) and TroR (K03709), had higher abundance in the iron-fertilized waters. In contrast, the vitamin B12 transporter (K16092), putative hemin transport protein HmuS (K07225), heme iron utilization protein HugZ (K07226), vacuolar iron transporter VIT (K22736), ferredoxin/flavodoxin switch-related HemG (K00230), ferric transporter FbpA (K02012), a siderophore transporter (K02016), and another Fur-family transcriptional regulator Irr (K09826) were more abundant in the off-plateau waters.

To better explain this mosaic pattern, we explored the possible link between taxonomy and function. We measured the functional diversity using the Shannon index based on the abundance matrix of functional groups across samples. In contrast to the species diversity (Supplementary Fig. 5C), the functional diversity of the on-plateau M2 site was no less than the average of the off-plateau M3 and M4 sites (Fig. 2A and Supplementary Fig. 8A). That is, the evidently lower species diversity in the on-plateau waters was decoupled from the community functional structure. The taxonomic compositions within functional groups across study sites provided a complementary perspective (Supplementary Figs. 6, 7). Overall, the SAR11 clade contributed slightly more to the gene pool and transcript inventories in the off-plateau waters, whilst Flavobacteriales made up a larger share in the iron-fertilized on-plateau zone (Fig. 2B and Supplementary Fig. 8B). The Roseobacterales and Gammaproteobacteria adopted more flexible ecological strategies, as their contributions to the functional pool were similar in different waters. This pattern is consistent with the clear separation in community taxonomic composition across divergent environmental conditions (Supplementary Fig. 5D) and suggests that the aforementioned mosaic transcript abundance results from environmental nutrient availability and microbial life strategies. That is to say, when the variation in environmental conditions leads to the selection for specific metabolic functions (e.g., DOC degradation, access to iron), the taxonomic variation within functional groups would be a result of both the importance of the specific function and the phylogenetic distribution of those functions [71].

We further recovered the gene expression profiles by normalizing the metatranscriptomic transcript abundance using the metagenomic gene abundance (Supplementary Fig. 9). The SDEGs obtained with and without the metagenome-based normalization partially overlapped (Supplementary Figs. 10, 11), confirming that prokaryotic community transcripts vary as a function of shifts in both community composition and gene expression levels [14]. We classified SDEGs according to their functional groups and taxonomic affiliations, and confirmed that gene expression patterns were not fully determined by nutrient regimes, but were taxon-specific and likely also shaped by microenvironmental conditions (Fig. 3 and Supplementary Figs. 10, 11).

The Flavobacteriales group and Gammaproteobacteria constituted the majority of the SDEGs belonging to the glycoside hydrolase (GH) and glycosyltransferase (GT) families, which primarily had higher expression levels in the on-plateau iron-fertilized waters. Among them, the most represented GH families included GH16 and GH17, responsible for the decomposition of glucans and galactans, GH92 for the degradation of mannoses, and the β-1,3-D-glucan phosphorylases GH149 with inconclusive roles [72]. Gammaproteobacteria and Flavobacteriales were also enriched in the SDEGs of the GH3 family, which facilitates the utilization of glucose, arabinose and xylose. These GH3 SDEGs were more highly expressed in either the on- or the off-plateau region. This intra-taxon difference in expression patterns across nutrient regimes suggests a mixture of copio- and oligotrophic life strategists within these two taxonomic groups [73]. In contrast, the SDEGs of the GH23 and GH73 families were mainly from the SAR11 clade. GH23 (lytic transglycosylases) and GH73 (β-N-acetylglucosaminidases) are both involved in the degradation of peptidoglycan, an essential macromolecule of the bacterial cell wall. The higher expression of these families in the off-plateau HNLC waters could indicate the use of peptidoglycan as a carbon source or the accelerated growth of SAR11.

The SDEGs involved in Fe-uptake and Fe-related pathways exhibited similar patterns (Fig. 3 and Supplementary Fig. 11). SAR11 constituted a large proportion of the SDEGs coding for two iron-related transcriptional regulators (IscR and Irr; Fig. 3). IscR monitors Fe-S cluster homeostasis and is responsible for the autorepression of genes involved in Fe-S cluster biogenesis, such as the sufBCD operon [74]. Under oxidative stress and iron starvation, IscR is in its apoform and relieves its repression of the suf operon [75, 76]. Irr, a global regulator of iron homeostasis, functions as a sensor of cellular heme biosynthesis and accumulates under iron limitation to control target genes [77, 78]. It is reported to be conserved in the SAR11 subgroup Ia and maintained by selection due to a fitness advantage [79, 80]. The induction of the glyoxylate shunt (GS) is an efficient strategy for heterotrophic prokaryotes to maintain growth and respiration rates under iron stress [23, 81]. We examined three key enzymes related to the GS, including isocitrate lyase (K01637; ICL encoded by aceA) and malate synthase (K01638; MS encoded by aceB) within the GS pathway, as well as isocitrate dehydrogenase, which catalyses the oxidative decarboxylation of isocitrate (K00031; IDH encoded by icd) (Supplementary Fig. 11). The upregulation of aceA and aceB indicates the elevation of the GS under stress conditions, whereas the variable expression of icd may provide a clue to the competition between IDH and ICL for the substrate isocitrate. The SAR11 clade accounts for a large share of the aceA and aceB SDEGs that have higher expression levels in off-plateau waters. The community-level abundance of GS-related transcripts generally agrees with the observations of SDEGs (Supplementary Fig. 6). For instance, the number of SAR11 aceA transcripts L−1 was more than twice as high in the off-plateau samples. The enrichment of both aceA/aceB and icd suggests that the SAR11 GS system functions as a supplement to the classic TCA cycle in response to iron and/or carbon limitation. Gammaproteobacteria and Flavobacteriales dominated the SDEGs encoding the TonB-ExbB-ExbD complex for siderophore uptake. Related functional groups were generally enriched with SDEGs in both the iron-fertilized and the HNLC waters at the community level, but these SDEGs were associated with different taxa.

Taxon-specific ecological roles

To resolve ecological roles of prokaryotic taxa across contrasting oceanic waters, we continued our data mining effort at a finer resolution with the 133 MAGs (Fig. 1). Initially, we performed a systematic survey for metabolically active prokaryotes through transcript abundances. Ribosomal proteins (RP) are critical for protein synthesis, and levels of RP transcripts have been proposed as an indicator of prokaryotic growth rates [82,83,84,85]. We surveyed 93 prokaryotic RP KEGG Orthology groups (KOs) through all our assemblies (Supplementary Table 4). Generally, taxa with higher growth rates (more RP transcripts) also had high cell metabolism (more total transcripts) in the off-plateau HNLC waters (R² > 0.9), whereas in the on-plateau zone several MAGs showed an all-vs.-RP ratio that departed from the fitted line (R² < 0.5; Fig. 4). This provided two insights. First, environmental properties in the iron-fertilized on-plateau zone led to a decoupling between cell metabolism and growth in several MAGs. Second, individual species from closely related taxonomic groups revealed diverse ecological strategies. For example, MAG_91, although it forms a monophyletic clade with MAG_126 on the phylogenetic tree (Fig. 1), was better adapted to the HNLC environment at site M4 than to the on-plateau zone (Fig. 4).

Fig. 4: Ratios of ribosomal-protein versus all transcripts (L−1 Mbp−1) from 133 MAGs.

Lines were calculated from Model II linear regression analyses. The corresponding formulas and R-squared measures are shown at the bottom.
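As an illustration of a Model II fit, the following minimal sketch implements reduced major axis (RMA) regression, which treats both variables as subject to error. Which Model II variant was used for Fig. 4 is not stated here, so the choice of RMA, the function name and the example values are assumptions.

```python
import math

def rma_regression(x, y):
    """Reduced major axis (Model II) regression of y on x.

    Returns (slope, intercept, r_squared). The slope is the ratio of the
    standard deviations, signed by the Pearson correlation.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = math.copysign(math.sqrt(syy / sxx), r)
    intercept = mean_y - slope * mean_x
    return slope, intercept, r * r

# Hypothetical values: total transcripts (x) vs. ribosomal-protein transcripts (y).
slope, intercept, r2 = rma_regression([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Unlike ordinary least squares, the RMA slope does not depend on which variable is placed on the x axis (the two slopes are reciprocals), which suits comparisons where neither transcript pool is a controlled predictor.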

The diversity in ecological strategies became more evident when we examined central metabolic pathways gene by gene (Fig. 5, Supplementary Fig. 12 and Supplementary Table 4, 5). While the SAR11 SDEGs tuned their expression in a relatively consistent manner among individual MAGs, we observed diverse expression patterns of SDEGs belonging to Flavobacteriales and Gammaproteobacteria. Generally, the Flavobacteriales MAGs constituted more genes that were significantly higher expressed in the iron-fertilized on-plateau zone, whereas a limited number of them displayed the opposite pattern by downregulating the expression of genes responsible for iron uptake and carbon metabolism in the same water.

Fig. 5: The distribution of significantly differentially expressed genes (SDEGs) in the MAGs among diverse functional categories related to iron uptake and carbon metabolism.

Only 47 MAGs with SDEGs are shown here. From left to right, the panels represent the phylogenetic tree (the same as shown in Fig. 1), the iron-related KEGG Orthology groups (KOs), the KOs involved in the tricarboxylic acid (TCA) cycle, and carbohydrate-active enzymes (CAZymes). Each square block describes the statistics of a protein family in a MAG. An empty square suggests that no genes in the MAG (y axis) are classified into the corresponding functional group (x axis). A circle in the square block indicates the identification of homologs to a protein family in the MAG, with its size proportional to the number of genes assigned to that family. The square blocks are colored according to the differential expression patterns of its gene(s). As illustrated in Fig. 1, genes, which are significantly higher expressed in the iron-fertilized site M2 as compared to the HNLC M3 and M4 sites, are highlighted in orange; vice versa, in blue. Given that genes belonging to the same functional group might not be synchronized in their expression patterns, the transparency of each square block shows the percentage of genes that are significantly differentially expressed. We have not detected protein families whose genes were significantly shifting their expression levels in opposite directions (e.g., parts of the genes in the same family significantly upregulate their expression levels whilst others significantly downregulate theirs). The KO “K00240” (marked with an asterisk) is shown twice, because it is a Fe-S protein family and also involved in the TCA cycle. Among the pathways involved in carbon metabolism, KOs shared by multiple pathways are only shown once. All the information illustrated in this graph is summarized based on the differential expression analysis performed on the metagenome-normalized metatranscriptomic profile (see Methods).

Particularly, with respect to polysaccharide metabolism, two MAGs (MAG_51 and 99), belonging to the same Flavobacteriales UA16 genus but sharing an inter-species ANI value of 79.05% [69], exhibited contrasting expression patterns. Moreover, four MAGs (MAG_78, 3, 134, and 73) from the Flavobacteriales 1G12 family but distinct genera, which formed a monophyletic clade on the phylogenomic tree, also reflected niche divergence. Polysaccharide utilization loci (PULs) provide an example. PULs are operon-like gene structures that encode co-regulated proteins specialized in polysaccharide detection, uptake and hydrolysis [86]. They are prevalent in the Bacteroidota phylum and typically feature SusD-like substrate-binding proteins, TonB-dependent receptors (TBDR) and various CAZymes. We identified several PUL-like loci in four Flavobacteriales UA16 MAGs (Supplementary Fig. 13A–D and Supplementary Methods). All the UA16 MAGs contained a “GH149 + GH30 + GH16 (and/or GH17) + transporters” PUL structure, indicating a general utilization of glucans and galactans, but MAG_73 contained a unique “GH92 + GH78 ( + CBM67 domain) + transporter” locus. MAG_73 was ubiquitous in all sampling sites but more abundant in the off-plateau waters (Fig. 1 and Supplementary Table 3) and exhibited an expression pattern opposite to that of the other three (Fig. 5). The GH92 family exo-α-mannosidases act on α-linked mannose residues in an exo-acting manner and are therefore responsible for the depolymerization of α-linked mannans [87]. Algal mannans are widely distributed in marine ecosystems, and α-mannans were identified in red seaweed [88, 89] and the diatom Phaeodactylum tricornutum [90]. Recent studies demonstrated that marine bacteria, especially Bacteroidota, can degrade mannans [91,92,93]. Whether the GH92 PUL facilitates the growth of MAG_73 across all sampling sites requires further study because the MAGs are incomplete. Nevertheless, the predicted capacity to utilize mannans highlights the metabolic potential of MAG_73 and suggests that it could occupy a niche distinct from those of MAG_78, 3 and 134. Fucose is another bioavailable monosaccharide common in marine waters and released by diatoms [94]. We identified candidate gene clusters specific for fucose utilization among Verrucomicrobiae MAGs, showing orthology to the recently discovered functional loci in Lentimonas sp. CC4 (Supplementary Fig. 13E) [95].

We also observed MAGs with similar expression patterns in late summer above the plateau but contrasting abundances in samples from early spring at the same site (Supplementary Table 3), providing subtle clues to seasonal adaptation. This was specifically the case for the metabolically active on-plateau MAG_103 and MAG_62, which belong to the Pseudomonadales HTCC2089 family but to different genera (Fig. 5). To explore the potential genetic reasons behind this observation, we constructed pan-genomes using both our MAGs and their closely related reference genomes. Reference genomes were incorporated to compensate for the incompleteness of our MAGs. For the comparison between MAG_103 and MAG_62, a total of 19 Pseudomonadales HTCC2089 draft genomes were retrieved from the NCBI GenBank database based on the phylogenetic information provided by GTDB [58], including 15 from the UBA4421 genus and 4 from UBA9926 (Supplementary Methods). The two MAGs shared most of their polysaccharide degradation and proteolysis genes and are therefore likely to compete for similar resources (Supplementary Table 6). We identified one singleton chitinase (GH18) unique to MAG_103 and missing from all other Pseudomonadales HTCC2089 draft genomes, and three GHs (GH17, GH149, and GH158) conserved in MAG_62 but absent from the UBA4421 genus. However, as discussed above, the function of GH149 remains unresolved, and GH17 and GH158 share similar substrate specificities with GH16, which is common in HTCC2089. The number of peptidase-encoding genes was also comparable in MAG_103 (n = 102) and MAG_62 (n = 104).
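
The two kinds of pan-genome comparison described above (a family found in one genome only, and families conserved in one group of genomes but absent from another) reduce to set operations on presence/absence data. The following Python sketch illustrates the idea under simplifying assumptions: each genome is represented as a set of family labels, and all function names, genome names and family labels are placeholders rather than the pipeline or data used in this study.

```python
def pan(genomes):
    """Union of families across a group of genomes (the group's pan-genome)."""
    return set().union(*genomes.values())


def core(genomes):
    """Families present in every genome of a group (the group's core genome)."""
    if not genomes:
        return set()
    return set.intersection(*(set(families) for families in genomes.values()))


def singletons(target, others):
    """Families present in the target genome but in none of the other genomes."""
    return set(target) - pan(others)


def conserved_but_absent(group_a, group_b):
    """Families conserved across group_a but absent from every genome in group_b."""
    return core(group_a) - pan(group_b)


# Placeholder presence/absence data, not results from the study.
genus_1 = {"g1_a": {"fam1", "fam2", "fam9"}, "g1_b": {"fam1", "fam2"}}
genus_2 = {"g2_a": {"fam1", "fam3", "fam4"}, "g2_b": {"fam1", "fam3"}}

others = {**genus_2, "g1_b": genus_1["g1_b"]}
unique_to_g1_a = singletons(genus_1["g1_a"], others)
genus_2_specific = conserved_but_absent(genus_2, genus_1)
```

Because MAGs are incomplete, an absence in presence/absence data of this kind is only suggestive, which is why the comparison in the text also draws on reference genomes.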

We made an unexpected observation that could explain the different abundance patterns of these MAGs in early spring and late summer. We identified a gene cluster in MAG_103 related to light-induced energy acquisition, which was conserved in the UBA4421 genus but absent from the UBA9926 genomes (Supplementary Fig. 14A, B and Supplementary Table 6). This gene cluster consisted of 6 genes, encoding a bacteriorhodopsin (PF01036.18), a synthase (PF00348.17), a phytoene desaturase (crtI; K10027), a 15-cis-phytoene synthase (crtB; K02291), a lycopene beta-cyclase (crtL1; K06443), and a beta-carotene 15,15’-dioxygenase (blh; K21817). Bacteriorhodopsin could provide MAG_103 with the capability to use light as a supplemental energy source [96, 97]. The MAG_103 bacteriorhodopsin sequence contained a blue-light-absorbing glutamine “Q” and a proton-pumping motif “DTE” (Supplementary Fig. 14C) [98]. The crtI, crtB, and crtL1 genes belong to the internal retinal biosynthesis system and are responsible for beta-carotene biosynthesis. Furthermore, the beta-carotene dioxygenase encoded by blh cleaves beta-carotene to produce all-trans retinal, which could be used by MAG_103 as its photoreactive chromophore [99]. The activity of beta-carotene dioxygenase was reported to be iron dependent [100], and the expression of the blh gene was slightly, but not significantly, upregulated in the iron-fertilized region. All the other four genes involved in light harvest were expressed at significantly higher levels in the on-plateau region than in the off-plateau waters. The extra energy supplied by light might explain why MAG_103 became more competitive in stratified summer surface waters.

MAG_103 and MAG_62 differed in their potential to use and resist antibiotics (Supplementary Results and Supplementary Table 6). In competition with other prokaryotes in the late summer surface waters, when bulk abundances reached 1.18 × 10⁹ cells L⁻¹ (Supplementary Table 1), the production of antibiotics might greatly facilitate MAG_103’s dominance over other species. MAG_103 genes involved in antibiotic production were expressed at higher levels in the on-plateau region. The MAG_103 genes encoding the general secretory pathway proteins (gspC, gspD, gspE, gspF, gspG, and gspL), as well as the Sec translocase system (secA, secB, and secD), were significantly upregulated in the iron-fertilized on-plateau water. The type II secretion (T2S) pathway, coupled with the Sec translocon, is regarded as the main protein secretion pathway of bacteria and is capable of transporting a wide range of substrates, including proteases, lipases, phosphatases, carbohydrate-degrading enzymes and toxins [101]. The translocation of antibiotics produced by MAG_103 across the outer membrane might be facilitated mainly by the T2S. Within the two-component system (TCS), MAG_103 genes encoding a phosphate regulon sensor histidine kinase PhoR, a phosphate regulon response regulator OmpR, an osmolarity sensor histidine kinase EnvZ, and an invasion response regulator UvrY were all significantly upregulated, possibly reflecting stress-specific responses. Further, MAG_103 genes related to iron uptake, encoding a ferric transporter (fbpA), a ferredoxin (fdx), the TonB-ExbB-ExbD system, a bacterioferritin (bfr), a vitamin B12 transporter (btuB), a bacteriorhodopsin, and the Fe-S cluster assembly proteins (nfuA and sufBCD), were also expressed at significantly higher levels in the iron-fertilized water. The significant increase in expression levels of TCA cycle enzymes, CAZymes, more than half of the peptidases [55], and the aerobic carbon-monoxide dehydrogenase subunits (coxS and coxL) indicated an enhanced carbon flux between phytoplankton and the Pseudomonadales population represented by MAG_103 during the bloom decline. Although these accessory genomic features and their corresponding expression patterns are not direct evidence related to iron and carbon metabolism, the enhanced competitiveness may facilitate the survival and growth of the microbes, whose abundances influence their roles in nutrient cycles.

Deciphering the many unknowns regarding the ecological roles of marine prokaryotes inhabiting the Southern Ocean, undoubtedly a region of key importance in ongoing global warming, remains profoundly challenging. Here we provide a comprehensive investigation of prokaryotic functional activities from the community level to individual taxa, targeting in situ responses linked to iron and carbon cycling. Despite remarkable shifts in community composition across contrasting nutrient regimes, we observed conservation of functional diversity through functional redundancy among community members inhabiting each ecosystem. The distinct gene expression patterns of individual taxa illustrate the link between the genetic repertoire of prokaryotic taxa and their diverse responses to the multitude of environmental factors. Our observations of a mosaic of taxonomy-specific ecological strategies in the cycling of iron and organic carbon provide insights into how the habitat shapes microbial diversity in the ocean.