Introduction

Prochlorococcus and its sister lineage Synechococcus are some of the smallest known photosynthetic organisms and are among the most numerically abundant life forms on the planet. These marine picocyanobacteria are unicellular, free-living, geographically widespread, and highly abundant in the oligotrophic subtropical/tropical ocean, often comprising half of the total chlorophyll [1]. Prochlorococcus and Synechococcus account for approximately 25% of global marine net primary productivity [2], making the picocyanobacteria key drivers of marine biogeochemical cycles [3]. Light gradients drive the vertical distribution of Prochlorococcus with low-light (LL) adapted clades occupying the deeper parts of the euphotic zone, and high-light (HL) adapted clades near the surface. This broad light-driven ecological and evolutionary division is further partitioned into mostly coherent genomic clusters (clades), each with distinct ecological and physiological properties [4]. Synechococcus clades partition along temperature gradients in the sea, but lack clear association with light and depth [5]. At the finest scale of diversity, picocyanobacterial clades further separate into distinct sympatric subpopulations, which share the majority of their genes but also contain segments of unique genetic material [6]. These variable genomic regions encode functionally adaptive traits that tune each population’s physiology to its local environment.

Iron (Fe) is a crucial micronutrient for marine phytoplankton due to its central role as an enzyme cofactor in cellular processes, including respiration and photosynthesis. As a result, the concentrations and chemical forms of Fe influence global carbon cycle dynamics [7]. Dissolved Fe (dFe) is scarce in much of the ocean and is mostly (>99%) complexed with organic chelating ligands that solubilize and stabilize the ions in solution [8]. Concentrations of these ligands quickly increase in response to Fe fertilization events, which implies that they are actively produced by members of the microbial community [9]. It is challenging to determine the chemical structure of these ligands [10], so their abundance and stability coefficients (a measure of how “strongly” the ligand binds Fe) are typically inferred electrochemically [11]. Many electrochemical studies operationally partition the ligand pool into weak (L2) and strong (L1) binding classes. The weak L2 class includes humic-like substances, exopolysaccharides, and undefined colloids. The strong L1 class includes siderophores, small Fe-binding molecules that microbes produce during periods of Fe starvation, but it is not clear how much of the L1 class are genuine siderophores. Under some conditions, siderophores appear to account for much of the strong L1 ligand fraction in seawater [12, 13].

There are two primary mechanisms by which marine microbes extract dFe bound to the organic ligands in the ocean. First, dFe can dissociate from organic ligands in the extracellular environment (via kinetic control, photodegradation, or cell surface reductases) and is imported across the outer membrane as an unbound, inorganic ion [14]. Second, whole Fe-ligand complexes can be directly translocated across cell membranes (Supplementary Fig. S1) [15]. Direct uptake pathways are prevalent in fast-growing copiotrophic marine bacteria with large genomes but absent in free-living marine bacteria with streamlined genomes such as Prochlorococcus, Synechococcus, and SAR11 [16, 17]. In these organisms, selection favors the minimization of genome size and metabolic complexity over the versatility of maintaining multiple direct Fe uptake pathways [16]. A decade ago, it was believed that marine picocyanobacteria fulfilled their Fe requirements only via the dissociation mechanism while relying upon a single inner membrane Fe(III) ATP binding cassette transporter [18, 19]. Prior work also showed that Prochlorococcus and marine Synechococcus isolate genomes lacked the genes necessary for siderophore biosynthesis and uptake [19]. This image changed when putative siderophore uptake gene clusters were identified from a handful of genomes from Prochlorococcus surface clades HLII and HLIV collected from remote, low-Fe regions of the global ocean [17, 20]. This exciting finding suggested that some Prochlorococcus populations had adapted to Fe scarcity by supplementing the uptake of dissociated dFe ions with the direct uptake of siderophore-bound Fe.

Siderophore uptake genes from cells belonging to HL adapted Prochlorococcus clades have been shown to be most abundant in surface waters that have low modeled Fe concentrations [21, 22]. However, the extent of siderophore uptake potential in cells belonging to LL-adapted clades, which are uniquely adapted to waters deep in the euphotic zone, is unknown. This is important because interactions between Fe and light limitation or Fe-light co-limitation may increase Fe demand for Prochlorococcus clades inhabiting deeper waters and the deep chlorophyll maximum layer (DCM) [23, 24]. Also of interest is understanding what environmental features, in addition to Fe, are associated with picocyanobacterial siderophore consumers. Here, we survey an extensive data set of picocyanobacterial uncultivated single-cell and cultivated isolate genomes to obtain a general picture of the phylogenetic and biogeographic structure of picocyanobacterial siderophore traits. Next, we combine biogeochemical and metagenomic time-series datasets from the Hawai’i Ocean Time-series (HOT) station ALOHA and the Bermuda Atlantic Time-series (BATS) station BATS [25] to more deeply understand seasonal dynamics and the depth distributions of Prochlorococcus siderophore traits in a community context. Finally, we quantify the Prochlorococcus and Synechococcus siderophore traits in 645 metagenomic samples from Tara Oceans [26] and GEOTRACES [25] and connect the distribution of the trait to thousands of trace metal and other biogeochemical measurements from across the world’s oceans. Using these rich data, we identify environmental features associated with picocyanobacterial siderophore uptake and reveal that siderophore uptake is predominantly a feature of low-light adapted LLI clade Prochlorococcus residing near DCM layers from the lower limits of the euphotic zone. These findings reveal new regions of phytoplankton Fe stress in the global ocean and underscore light and Fe availability as key features shaping the evolution of Prochlorococcus and Synechococcus.

Materials and methods

Data sources

We used a collection of over 700 Prochlorococcus and Synechococcus single-cell and isolate genomes collected from across the global ocean [27, 28], 195 surface and DCM metagenomes from Tara Oceans project [29], 480 metagenomes acquired from GEOTRACES cruises [25], and 133 metagenomes from the HOT and BATS time-series [25]. Trace metal and other chemical concentrations are from the GEOTRACES Intermediate Data Product IDP2017 version 2 (accessed January 2019) [30]. Samples were from sections GA02 [31, 32], GA03, GA10 [33], and GP13. Biogeochemical data from the Tara Oceans project was obtained from https://doi.pangaea.de/10.1594/PANGAEA.875579. Modeled climatological dFe from the MIT Darwin model v0.1_llc90 was obtained from the Simons Collaborative Marine Atlas Project (CMAP) [34]. Modeled dFe was averaged over 12 months at a 0.5 degree grid in 5 m depth bins from the upper 250 meters. Metagenome sequence data and associated metagenomes with environmental variables were quality controlled as described earlier [35, 36]. See supplementary material for details.

Prochlorococcus cell concentrations (qPCR-calibrated) from HOT and BATS [37] were obtained from the Biological and Chemical Oceanography Data Management Office https://www.bco-dmo.org/dataset/3381. L1 organic Fe-binding ligand data are from previous studies at HOT and BATS [12, 38, 39]. Data for BATS was derived from the occupation of BATS (Station 12) during the U.S. GEOTRACES GA03 cruise [39]. Siderophore concentrations (ferrioxamine E and G) from surface waters of the North Atlantic at stations 41–62 (data not available from the DCM) and the surface and DCM at HOT are from previous studies [12, 40]. In these samples, the DCM was defined as the depth range of maximum chlorophyll fluorescence at HOT (100–125 m) and BATS (90–135 m). The surface was defined as all depths shallower than the DCM.

Measurements of Fe-binding ligands and siderophores

We analyzed organic Fe-binding ligand data from BATS and HOT using competitive ligand exchange adsorptive cathodic stripping voltammetry (CLE-ACSV) as described previously [12, 38, 41]. We measured ferrioxamine E and G siderophore concentrations using liquid chromatography (LC) coupled to inductively coupled plasma mass spectrometry (ICP-MS) after pre-concentration via solid-phase extraction [12, 13, 42]. DCM samples were from a depth range of 70–200 meters, and surface samples were within a depth range of 3–45 meters.

Comparative genomics

The steps for producing the genome phylogeny in Fig. 1 are detailed in a prior publication [36]. All Prochlorococcus (n = 663) and Synechococcus (n = 96) genomes were annotated against eggNOG 4.5.1 [43] using eggNOG-Mapper v1.0.3–3-g3e22728 [44]. Siderophore transport gene clusters were identified preliminarily using best hits to COGs and PFAMs for the outer membrane TonB dependent receptor (Pfam ID: PF00593, COG ID: COG4771). This gene was used as an anchor gene for examining adjacent gene neighborhoods (Supplementary Fig. S1). Genomes were considered to be authentic siderophore consumers if at least five of the following families were detected adjacent to the outer membrane receptor or if the contig was interrupted: ATP binding cassette (ABC) solute binding protein (Pfam ID: PF01497, COG ID: COG0614), ABC permease (Pfam ID: PF01032, COG ID: COG0609), ABC ATPase (Pfam ID: PF00005, COG ID: COG1120), TonB protein (Pfam ID: PF03544, COG ID: COG0810), and ExbB (Pfam ID: PF01618, COG ID: COG0811), and ExbD (Pfam ID: PF02472). All Prochlorococcus genomes were surveyed for siderophore biosynthesis potential using antiSMASH v5.0 [45], which searches for known non­ribosomal peptide synthetases and poly­ketide synthase siderophore biosynthesis pathways. The true proportion of genomes with the siderophore transport cluster was estimated by accounting for genome incompleteness in the single-cell genomes as described earlier [35] and in the supplementary material.

Fig. 1: Phylogenetic and biogeographic patterns of picocyanobacterial siderophore consumers.
figure 1

A Sampling locations of picocyanobacteria isolate and single-cell genomes used for this study. Point size is proportional to the total number of isolated genomes from each location, and point color shows whether a siderophore consumer genome was isolated at that location. Map color displays the climatological annual mean total dissolved Fe concentration (nmol kg−1) integrated over the upper 55 meters of the water column. Oceanographic stations ALOHA and BATS are denoted for reference. B Siderophore uptake potential across picocyanobacterial clades. The phylogenetic tree was built with 96 Synechococcus genomes and 605 Prochlorococcus genomes and is rooted at Synechococcus sp. WH5701; nodes show bootstrap values (250 bootstraps). Triangles depict monophyletic clades, and the triangle area is proportional to the number of siderophore consumer genomes in each clade.

Metagenomic read classification and count processing

Metagenome reads were mapped to the MARMICRODB marine microbial database using Kaiju v1.6.0 [46] as described before [36]. Reads mapping best to Prochlorococcus and Synechococcus TonB dependent outer membrane receptors or single-copy, core gene families from Prochlorococcus and Synechococcus were retained for further analysis. The TonB dependent receptor was selected as an indicator for the presence of the entire siderophore uptake cluster because it is the longest gene in the cluster and because its sequence composition is conserved and distinct in Prochlorococcus and Synechococcus (Supplementary Fig. S2). There was no clear similarity cutoff between metagenomic reads and the receptor sequence that differentiated Prochlorococcus or Synechococcus clades, and we did not attempt to resolve individual clades by the TonB receptor (Supplementary Fig. S2). Receptor and core gene counts were length-normalized to the median length of the corresponding gene, then divided by the length-normalized median abundance of all Prochlorococcus or Synechococcus core genes. Samples where the median core gene coverage was less than 100 were excluded. See supplementary material for details.

Statistics and data analysis

Random forest regression was performed using the normalized Prochlorococcus TonB dependent receptor abundance as the response variable and a collection of 45 potential explanatory variables with the R package Ranger v0.12.1 [47]. Hyperparameters were tuned and the model trained using nested ten-fold cross-validation, reserving 20% of the data for estimating final model performance. Signal extraction using principal component analysis (PCA) was performed on scaled and centered predictors [48] in cross-validation. All principal components (PCs) cumulatively explaining 99% of the variance in at least one training partition were retained (27 PCs total). Signal extraction was performed to ensure that model predictors were statistically independent and to circumvent the significant inter-variable correlations in the environmental data. PC importance rankings were determined in each training partition using the Boruta heuristic [49], resulting in all predictor PCs consistently performing better than random. This step buffers against overfitting the model while assessing variable importance relative to baseline importance (i.e., noise). Boruta rankings were aggregated over the training partitions, resulting in a total of 20 PCs that, in combination, were better predictors of siderophore abundance than randomly generated data. Cumulatively, these PCs explained 97% of the variance in the original data.

After training and testing the final random forest, the original predictor variables were related to PCs using PCA loadings by taking the square of the eigenvector matrix to get the variance explained (R2) by each variable for each PC. All variables with an R2 ≥ 0.1 with at least one of the 20 informative PCs were retained (n = 26). These variables were then ordered by their rank contribution to predicting the distribution of Prochlorococcus siderophore consumers. The rank contribution was estimated as the sum of the variance explained by each variable across all 20 informative PCs, with each PC weighted by the relative importance obtained from the feature importance algorithm. Pearson’s correlation was used to measure the strength of the linear relationship between siderophore relative abundance and each predictor variable.

Beta regression was used to model the relationship between Prochlorococcus siderophore transporter relative abundance, nitrite, and DCM depth, while allowing the effect of nitrite to vary as a function of DCM depth. This model was selected because light-limited phytoplankton generate nitrite [50] and deeper DCMs are, presumably, more light-limited than shallower DCMs [51]. Continuous covariates were transformed to categorical covariates by binning into three roughly equally sized groups: 3.3e–4 < Lo NO2 ≤ 4.0e–2; 4.0e-2 < Md NO2 ≤ 7.3e-2; 7.3e-2 < Hi NO2 ≤ 1.4. The emmeans v1.6.0 R package was used to estimate the magnitude and statistical significance of marginal means for each covariate in the model. The seasonal effects in time-series metagenomes were estimated using Generalized Additive Mixed Models and Linear Mixed-Effect Models with the mgcv v1.8–26 and nlme v3.1–148 libraries in R as described in the supplementary material.

Results and discussion

Siderophore uptake potential in picocyanobacterial genomes

We searched a collection of over 700 marine Prochlorococcus and Synechococcus genomes (including cultivated isolate genomes and uncultivated single-cell genomes) for siderophore biosynthesis and siderophore uptake gene clusters. These genomes are from over 40 distinct geographic locations across the world’s oceans (Fig. 1A, Supplementary Fig. S3), including the well-studied oceanographic stations ALOHA and BATS, and span the breadth of Prochlorococcus and Synechococcus phylogenetic diversity (Fig. 1B). Samples were collected throughout the euphotic zone at depths ranging from five meters to over 200 meters depth. In agreement with previous studies, none of the Prochlorococcus and marine Synechococcus genomes contained siderophore biosynthesis gene clusters [19, 52]. However, we identified 47 genomes with siderophore transport gene clusters (Supplementary Fig. S1) - hereafter siderophore consumers - and they were all either found in, or isolated from remote regions spanning the oligotrophic N. and S. Pacific Ocean gyres (Fig. 1A). Synechococcus siderophore consumers were from the North Pacific subtropical gyre (Supplementary Fig. S3) and polar frontal regions between 28 °N and 37 °N [27], where surface dFe concentrations are typically low. The restriction of picocyanobacterial siderophore consumers to the Pacific is analogous to the strong ocean basin segregation of genomic adaptations to nitrogen and phosphorus limitation in Prochlorococcus [53,54,55]. Because the Pacific Ocean encompasses over 70% of Fe-limited ocean regions [56] and siderophore uptake is a form of Fe scavenging [57], together these findings suggest that Fe scarcity is an important selective pressure on picocyanobacterial genome content in the oligotrophic Pacific ocean.

We next examined the distribution of the siderophore uptake trait among clades within the marine picocyanobacterial phylogeny. The trait is distributed unevenly among different clades of Prochlorococcus: 2% and 14% of the genomes from HLII and HLI clades of Prochlorococcus, respectively, contained these clusters, while 31% of those from LLI Prochlorococcus contained them (Fig. 1B, Supplementary Fig. S1). The relative frequency of the siderophore trait increased within Prochlorococcus ecotypes with lower temperature and light level optima: for example, HLI (14%, N = 89) and LLI (31%, N = 84) [4]. We also identified the siderophore transport gene cluster within genomes from Synechococcus clade 5.1 A (37%, N = 42) in subclades II, III, IV, UC-B, and CRD2, with subclades IV and CRD2 accounting for 70% of all Synechococcus siderophore consumers. Although members of Synechococcus clades do not stratify with depth like Prochlorococcus clades, some members of clade 5.1 A display low-light adapted phenotypes [58], and subclades III and I (whose distribution follows clade IV) in particular are optimized for growth at low irradiance [58].

One caveat is that the total number of picocyanobacterial genomes and the clade composition between ocean basins were different, so the observed gene frequencies are likely biased to some extent. Furthermore, many sampling sites were represented by only a handful of genomes, which is too small of a sample size to reliably assess whether a trait is absent in the sampled population. Still, these findings from complete and partial genomes clearly hint towards genuine trends of higher picocyanobacterial siderophore consumer frequency in the pacific ocean and in clades that thrive deep in the euphotic zone.

Previous studies of siderophore uptake genes in Prochlorococcus have been limited to Fe-limited surface waters which are dominated by cells belonging to HL adapted clades [20,21,22]. Our finding that potential siderophore uptake is most prevalent within the LLI clade (Fig. 1B) implies that picocyanobacterial siderophore use is most likely to be associated with low-light conditions. LLI Prochlorococcus is the dominant clade at DCM layers, which form near the base of the euphotic zone and are shaped by a balance between diminished light and enhanced macronutrients (N, P, Si) from deep waters. Furthermore, Fe-limitation or Fe/light co-limitation can be a persistent feature of this layer [23]. Phytoplankton Fe-limitation at the DCM may emerge due to the upregulation of the Fe-rich photosynthetic apparatus under low-light [59], which may increase Fe demand relative to supply [60] and thus low-light Prochlorococcus clades likely require more Fe than high-light clades [24]. Indeed, Prochlorococcus has relatively high photosynthetic Fe requirements under low irradiance, and there is evidence that LLI Prochlorococcus, in particular, is uniquely adapted to the low-Fe and low-light conditions typical of the DCM [24]. In some cases, the high Fe requirements of LLI Prochlorococcus may make them more sensitive to fluctuations in Fe concentration than eukaryotic phytoplankton [61].

Furthermore, low irradiance at the DCM should suppress the photodegradation of siderophores and the release of free dFe [62], which implies that direct siderophore uptake traits may provide higher relative fitness for low-light adapted Prochlorococcus. Thus, competition for inorganic dFe ions at the DCM is intense, and the ability to utilize siderophores may allow Prochlorococcus to reduce relative competition for Fe through niche differentiation.

Spatial and temporal patterns at two long-term ocean study sites

We further explored the Pacific/Atlantic divide and prevalence of the uptake trait in picocyanobacteria genomes by examining metagenomes from the surface, the DCM, and the bottom of the euphotic zone at Stations ALOHA and BATS. We focused on the distributions of Prochlorococcus because Synechococcus genomes did not recruit enough reads at ALOHA for a robust comparison between the two oceans. Indeed, Prochlorococcus is generally two orders of magnitude more abundant than Synechococcus at ALOHA [63]. ALOHA is located in the N. Pacific subtropical gyre [64], while BATS is located in the N. Atlantic subtropical gyre. Both stations are oligotrophic, have comparable net primary productivity and carbon export [65], and are numerically dominated by Prochlorococcus and SAR11 [66]. However, atmospheric iron fluxes are significantly higher at BATS compared with ALOHA [67]. Siderophores, strong Fe-binding ligands (L1), and dFe concentrations have been measured in many studies at these locations [12, 38,39,40]. This wealth of data allowed us to examine relationships between siderophore consumers and Fe-binding ligands, total dFe, and siderophore concentrations on seasonal time scales.

Prochlorococcus siderophore consumers were absent at BATS but constituted up to half of all Prochlorococcus genomes from DCM metagenomes at ALOHA (Fig. 2A), where Prochlrocococus is typically responsible for the majority of productivity, biomass, and chlorophyll concentration [68]. We estimated the clade-integrated proportion of siderophore consumers within the total Prochlorococcus population because we could not identify one sequence similarity threshold for short metagenomic reads that distinguished Prochlorococcus outer membrane siderophore receptors by clade (Supplementary Fig. S2, see methods). The LLI clade dominates the DCM at both ALOHA and BATS [37] and has the highest frequency of single-cell genomes with siderophore transporters. Likewise, the frequency of siderophore consumers in the total Prochlorococcus community at the DCM (i.e., from metagenomes) was strongly positively correlated with the relative abundance of the LLI clade (Fig. 2A). Siderophore consumers also followed a seasonal pattern at the ALOHA DCM, coinciding with peaks in LLI abundance in late summer and early fall (Fig. 2B, Supplementary Fig. S4). This pattern was not present at BATS, where the trait was effectively absent. The annual spring/winter Asian dust flux is the dominant new Fe source to the N. Pacific subtropical gyre [38, 69], and this annual deposition event coincides with the lowest abundance of siderophore consumers at the ALOHA DCM (Fig. 2B). Seasonal variations in Fe supply may contribute to the seasonal succession of siderophore consumers and Prochlorococcus clades [37] at the DCM in the subtropical N. Pacific.

Fig. 2: Siderophore consumers associate with the DCM in the N. Pacific, but not the N. Atlantic.
figure 2

A Prochlorococcus HLII and LLI cell density over 2 years at HOT and BATS as determined by qPCR [37]. Contour plots of cell concentrations are cube root transformed. The deep chlorophyll maximum (DCM) depth range is highlighted in red. fsidero. consumer is the fraction of Prochlorococcus siderophore consumers, and point size is proportional to relative abundance. B Modeled seasonality of of siderophore consumers, LLI, and HLII Prochlorococcus clade abundance at the DCM at stations BATS and ALOHA. Seasonality is represented with a smoother function in a Generalized Additive Model (see methods). Periods with a positive seasonal effect can be interpreted as having higher abundance than the annual mean. Bands are 95% confidence intervals of the smooth function fit. Only terms with a significant seasonal effect smoother (P < 0.05) are shown. C dFe, L1 strong Fe-binding ligands, and summed ferrioxamine E and G siderophore concentrations in the N. Pacific and N. Atlantic subtropical gyres. Points and error bars are marginal population means and 95% confidence intervals from regression. Red asterisk indicates p < 0.05. Full model results are in Supplementary Table S1.

We also found that biogeochemical patterns of dFe and Fe-binding ligands were inversely related to patterns of siderophore consumer abundance at stations ALOHA and BATS (Fig. 2C, Supplementary Table S1). Station ALOHA had the lowest total dFe and the lowest L1 ligand concentrations. The significant enrichment of siderophore consumers at the ALOHA DCM relative to the surface also coincided with a sharp decrease in total L1 concentration, potentially due to biological demand and uptake. However, measurements of dFe and dFe-binding ligand concentrations are positively correlated [70], and this correlation may be, in part, an artifact of the analytical methods used to estimate complexometric titrations parameters in seawater [71]. Thus, the local L1 minimum at the ALOHA DCM cannot be unambiguously attributed to biological uptake. Furthermore, L1 does not represent the concentration of any particular siderophore molecule but is an aggregate metric for the Fe-binding organic matter in the sample. Ferrioxamine siderophores showed no systematic concentration difference between the Atlantic and Pacific nor between the DCM and surface at ALOHA (Fig. 2C), which suggests that ferrioxamine siderophore production may be similar in both oceans. Thus, there is a potential for differences in the chemistry of iron-binding ligands at the DCMs of BATS and ALOHA, but the pattern of Prochlorococcus siderophore consumers most strongly follows differences in total dFe concentrations between the DCM at BATS and ALOHA. These concentration differences reflect higher atmospheric Fe fluxes in the Atlantic [72], as well as differences in dFe flux from below the DCM. For example, there is a weak ferric line below the DCM at ALOHA [38], while the dFe gradient below the DCM at BATS is comparatively strong [73], which implies a higher upwards flux at BATS (Supplementary Fig. S5).

Global patterns of siderophore uptake genes

We next examined the abundance of Prochlorococcus and Synechococcus siderophore consumers in ocean areas not covered by the genomic and time-series metagenomic datasets by quantifying the abundance of picocyanobacterial siderophore consumers in metagenomes from the upper 300 meters of the global tropical and subtropical ocean. In general, results from the global metagenomic data set reinforced our findings from the time-series metagenomes for Prochlorococcus (Fig. 2) and the single-cell and isolate picocyanobacterial genomes (Fig. 1, Supplementary Fig. S3). Collectively, the global metagenomes revealed: (1) a high frequency of Prochlorococcus siderophore consumers in the subtropical and tropical Pacific relative to the N. Atlantic Ocean, and (2) high frequencies of Prochlorococcus siderophore consumers at the DCM - especially oligotrophic DCMs from the Pacific Ocean from depths over 100 meters (Fig. 3A). Synechococcus consumers generally followed a similar trend to Prochlorococcus with highest abundance in the Pacific ocean and at the DCM. However, there was generally low total abundance of total Synechococcus at many sites from the interior of the gyres - e.g., GEOTRACES line GP13 (Fig. 3B). Both Prochlorococcus and Synechococcus siderophore consumers were also particularly abundant in the eastern South Pacific near known oxygen minimum zones. All samples used in our study were collected from fully oxygenated seawater and little variance in our analysis was explained by oxygen concentrations (Fig. 4, Supplementary Fig. S6). However, prokaryotic biomass in oxygen minimum zones has elevated trace metal signatures which may indirectly influence Fe cycling above the oxycline [74].

Fig. 3: Global biogeography of picocyanobacterial siderophore consumers identified from metagenomes collected from deep chlorophyll maximum layers.
figure 3

Fraction of (A) Prochlorococcus genomes and (B) Synechococcus genomes with siderophore uptake gene clusters (fsidero. consumer) at the DCM in the GEOTRACES (circles) and Tara Oceans (triangles) metagenome dataset. Samples where median core gene coverage was less than 100 are denoted by X. In (A) samples from the GP13 section (S. Pacific) are expanded for increased visibility. Point color and size are proportional to abundance.

Fig. 4: Oceanographic features associated with Prochlorococcus siderophore consumers.
figure 4

Environmental variables (heatmap left) are colored by variable type and ordered (heatmap right) by their rank contribution to predicting the distribution of Prochlorococcus siderophore consumers (see methods). Top barplot shows the informative principal components derived from the combined dataset and ranked by their importance (unitless) to the random forest model. The lower barplot shows the variance explained (%) by each of the informative principal components. Heatmap color shows the correlation (R2) between each variable and each informative principal component.

In most Pacific DCM samples, over half of Prochlorococcus cells could potentially use siderophores (assuming the gene cluster is single-copy), which implies that a substantial fraction of Prochlorococcus Fe-demand at the DCM may be fulfilled by siderophores. Unexpectedly, we observed a high abundance of Prochlorococcus siderophore consumers in the S. Atlantic subtropical gyre between 5 °S and 25 °S (Fig. 3A), where the predicted climatological mean of dFe is relatively high (Fig. 1A). Furthermore, Synechococcus consumers were highly abundant in samples south of 40 °S in the S. Atlantic gyre (Fig. 3B), which is at the limit of Prochlorococcus’ latitudinal range [2]. Over 40% of Prochlorococcus genomes at the South Atlantic ocean DCM between 5 °S and 25 °S were putative siderophore consumers, consistent with recent studies suggesting Fe-deficiency and Fe-N co-limitation in this region [75, 76]. Indeed, there is a persistent local minimum in dFe concentrations within the euphotic zone between 5 °S and 25 °S on GEOTRACES transect GA02 [32], and the thermocline waters (approximately 200–400 meters depth) of this region are Fe-deficient relative to macronutrients [77]. Most biogeochemical models predict community N-limitation in the S. Atlantic gyre [78, 79], but there are an increasing number of studies demonstrating Fe-limitation or Fe-macronutrient co-limitation in phytoplankton communities from this region [75, 76]. Like the equatorial Pacific, primary productivity in the S. Atlantic gyre may be primarily sustained by rapid and efficient internal Fe recycling due to low new Fe inputs [80, 81].

The finding of abundant Prochlorococcus and Synechococcus siderophore consumers at the DCM and in the S. Atlantic subtropical gyre led us to ask what specific chemical, biological, and hydrographic mechanisms might influence these distributions. We focused on the distribution of Prochlorococcus siderophore consumers because they were more abundant with a wider geographic distribution than Synechococcus consumers (Fig. 3). We trained a random forest regression for Prochlorococcus siderophore consumer abundance using a set of 27 predictive features derived from PCA on a 45 variable dataset of hydrographic, geochemical, and biological measurements (see methods, supplementary material). We then ranked the original variables by their weighted contribution to explaining the variance in the 20 informative PCs (Fig. 4). The final model had excellent predictive performance in the test dataset (R2 = 0.93, RMSE = 0.00017) indicating that the abundance of Prochlorococcus siderophore consumers can accurately be predicted from GEOTRACES and Tara Oceans biogeochemical data alone. Prochlorococcus siderophore consumers covaried with the biogeochemical parameters that distinguish the subtropical/tropical N. Atlantic ocean, the Mediterranean Sea, and the Red Sea from the rest of the ocean. The N. Atlantic, Mediterranean Sea, and the Red Sea have the highest salinity [82] and receive a large flux of atmospherically deposited material due to their proximity to the Sahara desert and the Arabian peninsula [67]. Correspondingly, salinity and atmospherically deposited trace elements - such as Cu, Al, Zn, Mn, Fe, and Pb [72] - were strongly negatively associated with Prochlorococcus siderophore consumers (Figs. 45, and Supplementary Fig. S6). Siderophore transporters were highly abundant in equatorial Pacific samples dominated by the Prochlorococcus HLIV clade (>60% HLIV), which has been shown previously to have siderophore transport genes [20] and is associated with Fe-limited ocean regions [83]. Overall, our findings reveal that picocyanobacterial siderophore uptake is common in the remote reaches of the subtropical/tropical oligotrophic ocean, where the atmospheric input fluxes of trace elements to the upper ocean are the smallest.

Fig. 5: Top ten most predictive variables for Prochlorococcus siderophore consumer distribution.
figure 5

These ten environmental variables (ranked 1–10) have the greatest cumulative predictive contribution to the random forest model (Fig. 4, see methods). Each point is a metagenomic observation colored by ocean basin. Small squares represent samples with missing data that was imputed for each variable (see methods). dFe climatological means are from the MIT Darwin model. The remaining measurements are in situ chemical, biological, or hydrographic measurements from GEOTRACES and Tara Oceans. Boxes show Pearson correlation coefficients (r) and the p value testing whether true r is equal to 0.

Of all trace metals in the global dataset, dFe measured from GEOTRACES samples explained most of the weighted variance across the principal component predictors used in the random forest. However, the magnitude of its linear correlation with siderophore consumers (r = −0.10) was smaller than for other metals like Al (r = −0.28) or Cu (r = −0.28) (Fig. 5). Furthermore, dFe concentrations from the Darwin biogeochemical model were more strongly correlated with Prochlorococcus siderophore consumers (r = −0.54) than for dFe or any other trace metal measurement from the GEOTRACES program. Although dFe had the weakest linear correlation with siderophore consumers, the dFe variance was partitioned across multiple, strongly predictive principal components (Fig. 4). This implies that dFe’s predictive power in the random forest was primarily derived from its nonlinear associations and interactions with other environmental variables (e.g., LLI abundance, DCM depth, nitrite, and salinity) and not its linear correlation with siderophore consumers. This finding is compatible with current views of marine Fe biogeochemistry: much of the upper ocean dFe inventory is cycled rapidly and shaped by multiple interacting biotic and abiotic biogeochemical processes [7].

The strong linear correlation between modeled climatological dFe concentrations and Prochlorococcus siderophore consumers is likely because atmospheric dust deposition is the dominant Fe source to the ocean in the MIT Darwin model [84]. This is consistent with the strong negative linear relationship we observe between siderophore consumers and atmospherically deposited elements (e.g., aluminum) in the GEOTRACES data and the strong partitioning of Prochlorococcus siderophore consumers to the oligotrophic Pacific and S. Atlantic. We argue that the global biogeography of Prochlorococcus siderophore consumers is driven by basin-scale geological forcing in the marine Fe cycle [72, 85]. Specifically, it appears that patterns of atmospheric dust deposition set the biogeographical boundary for where the relative fitness benefit of siderophore use exceeds that for reduced genome size and gene loss [86]. On a local scale, the fraction of a Prochlorococcus population that can use siderophores will also reflect a combination of physiological and ecological processes (e.g., light availability), which affects local dynamic biogeochemical processes like regeneration, scavenging, uptake, and colloid formation. In this way, the abundance of Prochlorococcus siderophore consumers reflects a balance of both the “fast” and “slow” biogeochemical processes shaping the marine Fe cycle.

Surprisingly, the random forest regression also showed a strong association between Prochlorococcus siderophore consumers, nitrite concentrations, and DCM depth. We explored this relationship further using linear regression (Fig. 6, Supplementary Table S2). The results were most consistent with a scenario where Prochlorococcus siderophore consumers are abundant at depths with the highest nitrite concentrations, especially where the DCM layer is deeper than 100 meters and to a lesser degree where LLI cells are most abundant. We propose that this pattern is due to phytoplankton Fe and light limitation (or co-limitation) at the DCM and the primary nitrite maximum layer. The origin of the primary nitrite maximum is debated, but it likely forms due to incomplete assimilatory nitrate reduction by Fe- or light-limited phytoplankton [51] in parallel with uncoupled chemoautotrophic nitrification [87]. Due to fast phytoplankton uptake kinetics, nitrite should only accumulate in waters where phytoplankton inorganic nitrogen uptake is limited by Fe and light [87]. Nearly all LLI cells can assimilate nitrite and some can also assimilate nitrate [55], but we found no evidence that nitrate assimilation genes were more or less abundant in LLI siderophore consumers than expected by chance. Overall, the significant positive association of Prochlorococcus siderophore consumers with high nitrite samples - especially from the deepest DCMs - provides indirect support for the hypothesis that the primary nitrite maximum layer in the euphotic subtropical/tropical ocean is associated with Fe and light limitation of photoautotrophic cells.

Fig. 6: Prochlorococcus siderophore consumers associate with the highest nitrite concentrations and the deepest deep chlorophyll maximum (DCM) layers.
figure 6

Each point is a direct metagenomic observation. Subplots show different DCM depth ranges. Points and error bars show estimated marginal population means and 95% confidence intervals from beta regression using nitrite concentration and DCM depth as model covariates. These continuous covariates were transformed into categorical covariates for regression by binning into three equal-sized groups (see methods).

Synthesis

High frequency dynamics in the organic Fe-binding ligand pool of the upper ocean balance the interplay between regeneration and removal fluxes which determines the global inventory of oceanic Fe [88]. Siderophores are increasingly understood to be an important molecular mechanism regenerating Fe from biomass and incorporating atmospherically deposited lithogenic Fe into microbial ecosystems [57]. Many copiotrophic marine bacteria appear to use siderophores, but nearly all genome-streamlined oligotrophic marine bacteria do not - likely to minimize overall metabolic complexity [16, 17]. This raises the question of how minimalist cells like Prochlorococcus fulfill their Fe requirements, especially in the most Fe-limited regions.

Using rich genomic and metagenome data sets, we found that large populations of Prochlorococcus and Synechococcus from remote regions of the global ocean have evolved the ability to scavenge exogenous siderophore-bound Fe, which likely helps them fulfill cellular iron demand in limiting or stressful conditions. The Prochlorococcus siderophore uptake trait is prominent in ocean regions where the atmospheric Fe flux to the surface ocean is lowest (Fig. 7). In these regions, Fe recycling mediated by organic ligands fuels a substantial portion of primary productivity [80]. Here siderophores are likely a critical molecular shuttle between particulate, colloidal, and dissolved phases, ultimately retaining Fe in the upper ocean (Fig. 7). The siderophore uptake trait is most abundant in picocyanobacteria inhabiting remote Fe-depleted regions and in low-light adapted Prochlorococcus, particularly those at deep DCMs near the primary nitrite maximum layer. The vertical distribution of this trait in the water column implies that light availability is also an important control on Prochlorococcus Fe demand. Indeed, the association of siderophore consumers with the highest nitrite concentrations suggests that nitrite accumulation in the euphotic zone may be a consequence of both iron and light limitation.

Fig. 7: Biogeochemical drivers of Prochlorococcus siderophore uptake.
figure 7

Conceptual diagram depicting differences in Prochlorococcus siderophore use between the Pacific and Atlantic oceans as exemplified by (A) station ALOHA and (B) station BATS. The total abundance of Prochlorococcus is proportional to the number of points at each depth, while the color represents high-light and low-light adaptation. DCM Deep chlorophyll maximum layer.

There are some caveats to the interpretations of our findings that should be considered. First, we can not unambiguously pinpoint the substrate of the picocyanobacterial transport system, but there are clues as to which molecule is targeted. The siderophore gene cluster includes an outer membrane TonB dependent receptor - a protein family with diverse substrates including siderophores, heme, vitamin B12, and polysaccharides. A prior study with a HLII Prochlorococcus strain found that the outer membrane receptor and substrate-binding protein were strongly upregulated only under Fe-deficient culture conditions [20], which suggests polysaccharides and vitamin B12 are not the primary substrates. TonB dependent receptors can also transport heme, an iron-containing enzyme cofactor, but authentic heme uptake operons usually contain a highly expressed heme oxygenase [89, 90], which is missing from the picocyanobacterial gene clusters. Therefore, we argue that the most parsimonious explanation is that the target substrate is a siderophore or a novel iron-containing metallophore. As a final caveat, we note that we have primarily focused on atmospheric Fe supply due to the strong negative correlations between the distribution of Prochlorococcus siderophore transporters and atmospherically sourced metals like Cu, Al, Mn, and Pb (Fig. 5, Supplementary Fig. S6). However, dFe supply from below the euphotic zone is also an important new Fe source [61, 91], and rapid internal recycling of Fe in the euphotic zone may prolong seasonal phytoplankton productivity and fuel macronutrient consumption [80, 81]. These potential dFe supply processes and other aspects of the Fe budget (e.g., particulate Fe lability, variable mesopelagic regeneration efficiencies [92], and mechanisms of dust bioavailability [93]) were not explicitly considered in this work but are likely relevant to the global distribution of picocyanobacteria siderophore consumers.

Our results imply that the presence of siderophore-consuming Prochlorococcus and Synechococcus could be a valuable biomarker for diagnosing ecosystem Fe deficiency. Indeed, our study highlights two areas of potential iron deficiency that warrant further investigation - the S. Atlantic subtropical gyre and the DCM of the oligotrophic Pacific and S. Atlantic oceans. These areas have historically been considered nitrogen- and light-limited ecosystems, but our results suggest an essential role for Fe. Most of the potentially Fe-deficient regions revealed by our analysis are remote, and sampling Fe at these locations requires specialized knowledge of shipboard trace metal incubations, sampling, and analysis. Because DNA sampling does not require specialized trace metal clean conditions, omics-based approaches based on robust markers of nutrient deficiency could be one complementary tool for increasing the breadth of trace metal surveys in the ocean. Here we have leveraged cross-scale biology of the well-described and ubiquitous marine picocyanobacterium, Prochlorococcus, to reveal vast regions of community iron deficiency in the global ocean using a siderophore transport system as a biomarker. These findings high-light the intricate connection between the ecology and evolution of Prochlorococcus and the Fe cycle of the surface ocean.