Introduction

The autotrophic phytoplankton species in the upper ocean, constituting <1% of the entire photosynthetic biomass on the globe, are responsible not only for ~50% of the global annual carbon fixation [1, 2], but also for providing life support to marine food webs through their trophic connections. In addition to their biomass and species composition, the cellular macromolecular contents and energy value of phytoplankton can strongly affect the trophic balance within a marine ecosystem, e.g., by directly affecting the developmental stages of grazers, and by influencing the trophic-energy flow that supports the production of higher trophic species [3,4,5,6]. The stoichiometric ratio, i.e., the relative elemental composition of carbon, nitrogen and phosphorus in phytoplankton, is known to vary with phytoplankton assemblages across resource gradients in marine biomes [7, 8]. Stoichiometric variations alter the nutritional quality of phytoplankton as food for grazers [9, 10], and variations in the nutrient content or energy value of phytoplankton affect the stability and oscillatory dynamics of producer-grazer interactions (e.g., see refs. [11,12,13]). It is, therefore, imperative to monitor the variations in cellular macromolecular contents of marine phytoplankton on local, regional and global scales. In this context, satellite-based estimates would be invaluable, given that in situ observations are often infrequent and inadequate for monitoring over large spatial scales. Moreover, conducting in situ measurements of the macromolecular contents of phytoplankton across the global ocean would be extremely time consuming and expensive.

Over the last two decades, several satellite-based methods have been developed to extend our capabilities from routinely estimating chlorophyll concentration to distinguishing phytoplankton functional types (PFTs), in terms of the proportions of chlorophyll either in major taxonomic groups or in phytoplankton size classes (PSCs) (for more details, see [14, 15]). Some progress has also been made in estimating phytoplankton carbon [16,17,18,19], and in carbon-based classification of PSCs [17, 18], from ocean colour. However, strong variations in phytoplankton cellular carbon and carbon-based macromolecules with taxa, cell morphology, and environmental conditions such as ambient light and nutrient availability [20,21,22,23,24] impose additional layers of difficulty in converting the satellite-derived estimates of chlorophyll or carbon into macromolecular concentrations. Certain phytoplankton macromolecules, such as cellular fatty acids, vary strongly (e.g., between 1 and 85%, [25]), not only among algal groups and species, but also within a specific algal group, e.g., diatoms under different culture conditions [26]. In addition to laboratory cultures, essential fatty acids in phytoplankton have also been reported to vary with oceanographic conditions, such as sea-surface temperature and chlorophyll-a, on regional scales (e.g., [27, 28]). However, progress is yet to be made in estimating the variations in total phytoplankton lipid from satellite data on a global scale. Moreover, no method yet exists to estimate from satellite data, on either a regional or a global scale, the spatiotemporal variations of other essential phytoplankton macromolecules, such as carbohydrate or protein. Given that the proportional contributions of these macromolecules determine the energy value of phytoplankton, it would be useful to develop an advanced ocean-colour-based method for estimating the macromolecular concentrations in ocean waters.

In this paper, the cellular macromolecular contents of marine phytoplankton, in particular, the concentrations of carbohydrate, protein and lipid are estimated on a global scale, for the first time, based on ocean-colour data from satellite remote-sensing. To do so, a novel method is derived that utilises light-absorption coefficients of phytoplankton (aph)—an inherent optical property (IOP) retrievable from ocean colour (e.g., [29]), coupled with allometric relationships between phytoplankton cells and their cellular macromolecular contents, reported in the literature [20,21,22,23,24, 30,31,32]. The method builds on a semi-analytical algorithm for retrieving the exponent of the phytoplankton size spectrum from satellite ocean colour, developed recently by [33, 34]. The concentrations of the total macromolecular contents are further partitioned according to their contributions in three bulk PSCs, namely, picoplankton, nanoplankton and microplankton. The estimates are obtained over the global ocean, and for different marine biomes represented by Longhurst oceanographic provinces [35, 36]. Further, insights on the estimation uncertainties are provided through detailed sensitivity analyses, highlighting the possibilities of further improvements of the estimates, with the expectation that the input satellite data would further improve, as the satellite era enters into higher temporal and spatial resolution.

Methodology

Satellite validation

Global 4-km, level-3 mapped chlorophyll concentration, remote-sensing reflectance and the IOPs were obtained from the European Space Agency’s Ocean Colour Climate Change Initiative (OC-CCI) project (freely available on http://www.esa-oceancolour-cci.org). The OC-CCI data were produced by merging ocean-colour data from three satellite sensors: NASA-SeaWiFS, NASA-MODIS-Aqua and ESA-MERIS; further details on OC-CCI, including data processing, temporal consistency of the data products and details of the algorithms used, can be found in [37, 38]. Monthly climatologies of the mixed-layer depth were obtained on 0.5° × 0.5° spatial grid from Monthly Isopycnal & Mixed-layer Ocean Climatology (MIMOC, [39], available on http://www.pmel.noaa.gov/mimoc/). To obtain depth-integrated estimates of the satellite-derived products from OC-CCI, the mixed-layer depths were remapped onto OC-CCI 4-km grids using nearest-neighbour interpolation by implementing MATLAB2015b interpolation routine (similar to previous studies, e.g., [18]).
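The nearest-neighbour remapping of a coarse mixed-layer-depth grid onto a finer ocean-colour grid can be sketched as follows. This is a minimal Python illustration and not the MATLAB routine used in the study; the function names, grid centres and depth values are hypothetical, and longitude wrap-around at the date line is ignored for brevity.

```python
def nearest_index(centres, value):
    """Index of the grid centre closest to value (the first one wins on ties)."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - value))


def remap_nearest(coarse_lats, coarse_lons, coarse_field, fine_lats, fine_lons):
    """Sample coarse_field[lat][lon] at every fine-grid centre by nearest neighbour."""
    lat_idx = [nearest_index(coarse_lats, lat) for lat in fine_lats]
    lon_idx = [nearest_index(coarse_lons, lon) for lon in fine_lons]
    return [[coarse_field[i][j] for j in lon_idx] for i in lat_idx]


# Hypothetical 0.5-degree mixed-layer depths (m) on a 2 x 2 patch.
mld_lats = [10.25, 10.75]
mld_lons = [-30.25, -29.75]
mld = [[40.0, 45.0],
       [50.0, 55.0]]

# Hypothetical centres of a finer target grid (only a few cells shown).
fine_lats = [10.02, 10.48, 10.52, 10.98]
fine_lons = [-30.40, -30.02, -29.98]

mld_fine = remap_nearest(mld_lats, mld_lons, mld, fine_lats, fine_lons)
```

Each fine-grid cell simply inherits the depth of the closest coarse cell, so no new depth values are created by the remapping.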

A sufficiently large global in situ dataset on phytoplankton carbohydrate, protein and lipid that would ideally be required to validate the satellite-based estimates was unavailable. The historical in situ measurements on carbohydrate, protein and lipid, which were already compiled by [40], did not cover the period over which satellite data (e.g., OC-CCI v2) were available (i.e., September 1997 onwards). These constraints hindered satellite validation exercise in different oceanographic conditions.

Whilst direct measurements of carbohydrate, protein and lipid were unavailable, large datasets on in situ phytoplankton abundance were available, e.g., those compiled in the marine biodiversity database [41], which included phytoplankton cell counts from samples collected on different oceanographic cruises between 1992 and 2002, partly covering the satellite period. Owing to the constraints on direct measurements, a validation exercise was attempted by converting the in situ data on phytoplankton abundance [41] into estimates of phytoplankton macromolecular concentrations, using allometric relationships from the literature [40]. To do so, a subset of the phytoplankton abundance data [41] that overlapped with the OC-CCI v2 temporal coverage (September 1997 to December 2013) was considered, and the concentrations of phytoplankton carbohydrate, protein and lipid were computed using the information on phytoplankton cell size (reported in [41]) and the corresponding allometric relationships (reported in [40]). This subset included 250 samples collected from 1997 to 2002, across various oceanographic regions (see Section 2.2); and consisted of 943 species of diatoms, dinoflagellates and coccolithophores with equivalent spherical diameters ranging from 1.34 to 50 μm (to be consistent with the size range of microplankton assumed within the algorithm, only the species with diameter <50 μm were considered). This cell-diameter range covered nanoplankton, microplankton and a part of picoplankton. To be consistent with previous studies [18, 33], the diameter ranges of the three phytoplankton size classes used in the model were picoplankton: 0.2–2 μm, nanoplankton: 2–20 μm, and microplankton: 20–50 μm. Satellite matched-up chlorophyll concentrations and IOPs were retrieved from the OC-CCI data archive. Given that the sampling times were mostly within the early years of SeaWiFS coverage (and SeaWiFS was the only contributing ocean-colour sensor over 1997–2002), a large number of gaps in the satellite data were identified. To maximise the number of validation data points, match-ups from composite satellite images on daily (n = 39) and monthly (n = 249) scales were used.

The global annual stocks of phytoplankton carbohydrate, protein and lipid within the oceanic mixed layer were computed from the estimated surface concentrations, grid-by-grid, using the available mixed-layer depth values obtained from MIMOC (no specific depth profiles of the macromolecular concentrations were known from either in situ or remote sensing, on a global scale).
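The grid-by-grid stock calculation can be sketched as below. This is a minimal illustration, assuming that the surface concentration is uniform throughout the mixed layer, that the grid is a regular latitude-longitude grid on a spherical Earth, and that missing pixels are marked with `None`; the function names and the Earth radius constant are choices made for this sketch, not details taken from the study.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius; spherical Earth is an assumption


def cell_area_m2(lat_centre_deg, dlat_deg, dlon_deg):
    """Area of one latitude-longitude grid cell on a sphere."""
    lat_s = math.radians(lat_centre_deg - dlat_deg / 2.0)
    lat_n = math.radians(lat_centre_deg + dlat_deg / 2.0)
    return (EARTH_RADIUS_M ** 2 * math.radians(dlon_deg)
            * (math.sin(lat_n) - math.sin(lat_s)))


def mixed_layer_stock_tg(conc, mld, lats, dlat_deg, dlon_deg):
    """Sum conc [mg m^-3] x MLD [m] x cell area [m^2] over valid cells; result in Tg."""
    total_mg = 0.0
    for i, lat in enumerate(lats):
        area = cell_area_m2(lat, dlat_deg, dlon_deg)
        for c, h in zip(conc[i], mld[i]):
            if c is None or h is None:  # skip land, cloud or otherwise missing pixels
                continue
            total_mg += c * h * area
    return total_mg * 1e-15  # 1 Tg = 1e12 g = 1e15 mg


# Toy example: a 10-degree global grid with 1 mg m^-3 everywhere and a 50 m mixed layer.
lats = [-85.0 + 10.0 * i for i in range(18)]
conc = [[1.0] * 36 for _ in lats]
mld = [[50.0] * 36 for _ in lats]
stock = mixed_layer_stock_tg(conc, mld, lats, 10.0, 10.0)
```

Weighting by cell area matters because cells of equal angular size shrink towards the poles.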

Relating the size spectrum of phytoplankton to its cellular macromolecular concentrations

Studies have shown that phytoplankton cell size strongly determines its cellular concentrations of chlorophyll, carbon and carbon-based macromolecules through allometric relationships [20,21,22,23,24, 30,31,32]. The allometric relationship between the cellular concentration of a macromolecule ([M]cell, expressed in units of pg per cell) and the volume of a phytoplankton cell (Vcell, in \(\mu\mathrm{m}^3\)) can be described by the canonical equation \(\left[ M \right]_{cell} = a_MV_{cell}^{b_M}\), where M stands for the macromolecule, which can be carbohydrate, protein or lipid, and aM, bM are the allometric parameters with magnitudes specific to the macromolecule M. For a given macromolecule, aM and bM remain constant across the size spectrum of phytoplankton cells. Assuming that the particle size distribution of phytoplankton cells follows a power law [42,43,44], the number of phytoplankton cells with equivalent spherical diameter D per unit volume of seawater can be expressed as \(N(D) = kD^{-\xi}\), with ξ as the exponent of the phytoplankton size spectrum, and k as a constant related to the abundance of the total population. Following [33], the concentration of phytoplankton chlorophyll-a (Btotal, in mg Chl \(\mathrm{m}^{-3}\)) within the cell-diameter range [Dmin, Dmax] can be expressed as the integral, over that range, of the product of the number of phytoplankton cells, the volume of each cell (\(\pi D^3/6\)), and the intracellular concentration of chlorophyll-a ci (in mg \(\mathrm{m}^{-3}\), parameterised as \(c_i = c_0 D^{-m}\) with \(c_0 = 3.9 \times 10^6\) and m = 0.06 by [34] using the in situ measurements of [30]), as follows:

$$\begin{aligned} B_{total} &= \int_{D_{min}}^{D_{max}} \left[ \left( \frac{\pi}{6}D^3 \right) \left( c_0 D^{-m} \right) \left( kD^{-\xi} \right) \right] dD \\ &= \left( \frac{\pi}{6}kc_0 \right) \frac{D_{max}^{4 - \xi - m} - D_{min}^{4 - \xi - m}}{4 - \xi - m}. \end{aligned}$$
(1)

Similarly, the total concentration of the macromolecule M (in mg m−3) due to all phytoplankton cells within a diameter range [Dmin, Dmax] can be expressed as a product of the number of cells and the cellular concentration [M]cell:

$$\begin{aligned} \left[ M \right]_{total} &= \int_{D_{min}}^{D_{max}} \left[ N(D) \times [M]_{cell} \right] dD \\ &= \int_{D_{min}}^{D_{max}} \left( kD^{-\xi} \right) \left[ 10^{-9}\, a_M \left( 10^{18}\frac{\pi}{6}D^3 \right)^{b_M} \right] dD, \\ &= 10^{-9} k\, a_M \left( 10^{18}\frac{\pi}{6} \right)^{b_M} \left( \frac{D_{max}^{3b_M - \xi + 1} - D_{min}^{3b_M - \xi + 1}}{3b_M - \xi + 1} \right); \end{aligned}$$
(2)

with the condition that \([M]_{total} \to \left[ 10^{-9}ka_M\left( 10^{18}\pi/6 \right)^{b_M}\log_e\left( D_{max}/D_{min} \right) \right]\) when \(\xi \to (3b_M + 1)\), applied to avoid division by zero. The factors \(10^{-9}\) and \(10^{18}\) are associated with the conversions of units from pg to mg, and from \(\mathrm{m}^3\) to \(\mu\mathrm{m}^3\), respectively. Using Eqs. (1) and (2), the ratio of the macromolecular concentration to the chlorophyll concentration (χM) can be expressed as:

$$\begin{aligned} \chi_M &= \frac{[M]_{total}}{B_{total}} \\ &= \frac{10^{-9} a_M \left( 10^{18}\pi/6 \right)^{b_M}}{(\pi/6)c_0} \left( \frac{D_{max}^{3b_M - \xi + 1} - D_{min}^{3b_M - \xi + 1}}{D_{max}^{4 - \xi - m} - D_{min}^{4 - \xi - m}} \right) \left( \frac{4 - \xi - m}{3b_M - \xi + 1} \right). \end{aligned}$$
(3)

Note that the expression of macromolecule-to-chlorophyll ratio χM in Eq. (3) does not depend on the parameter k appearing in Eqs. (1) and (2). So, once χM is computed, Mtotal can be computed from the observed value of Btotal as:

$$M_{total} = \chi _MB_{total},$$
(4)

provided that ξ, aM and bM of the population are known (see the following Sections).
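Eqs. (3) and (4) can be evaluated numerically as in the following minimal sketch. The values of c0, m and the diameter bounds are taken from the text, with D expressed in metres (as implied by the factor 10^18 in Eq. (2)). The allometric parameters `A_M_DEMO` and `B_M_DEMO` are hypothetical placeholders, not the values reported in ref. [40], and the function names are illustrative only. The logarithmic limit stated after Eq. (2) is applied whenever an exponent approaches zero.

```python
import math

C0 = 3.9e6      # c0 from the text (mg Chl m^-3)
M_EXP = 0.06    # m from the text

# Hypothetical allometric parameters for illustration only (NOT from ref. [40]).
A_M_DEMO = 0.1  # pg per cell at unit cell volume
B_M_DEMO = 0.9  # allometric exponent


def power_integral(d_lo, d_hi, p):
    """Integral of D**(p - 1) dD from d_lo to d_hi, using the log limit as p -> 0."""
    if abs(p) < 1e-9:
        return math.log(d_hi / d_lo)
    return (d_hi ** p - d_lo ** p) / p


def chi_m(xi, a_m, b_m, d_min=0.25e-6, d_max=50e-6):
    """Macromolecule-to-chlorophyll ratio of Eq. (3); diameters in metres."""
    prefactor = (1e-9 * a_m * (1e18 * math.pi / 6.0) ** b_m
                 / ((math.pi / 6.0) * C0))
    numerator = power_integral(d_min, d_max, 3.0 * b_m - xi + 1.0)
    denominator = power_integral(d_min, d_max, 4.0 - xi - M_EXP)
    return prefactor * numerator / denominator


def macromolecule_conc(b_total, xi, a_m, b_m):
    """Eq. (4): macromolecular concentration (mg m^-3) from chlorophyll (mg m^-3)."""
    return chi_m(xi, a_m, b_m) * b_total


ratio = chi_m(4.0, A_M_DEMO, B_M_DEMO)
conc = macromolecule_conc(0.2, 4.0, A_M_DEMO, B_M_DEMO)
```

Because the abundance constant k cancels in the ratio, the only inputs needed per pixel are the chlorophyll concentration and ξ.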

Size-partitioned cellular contents of phytoplankton

Assuming that the total biomass of phytoplankton is a sum of the biomasses of n non-overlapping PSCs defined by cell-diameter ranges [Di, Dj] with 0 ≤ i < j ≤ n, [M]total = ∑ [M]ij, where [M]ij denotes the macromolecular concentration within the size class [Di, Dj]. It follows from Eq. (4) that [M]ij = χMij Bij, where χMij and Bij are, respectively, the macromolecule-to-chlorophyll ratio and the chlorophyll concentration in the size class [Di, Dj], and χMij follows directly from Eq. (3):

$$\chi_{Mij} = \frac{10^{-9} a_M \left( 10^{18}\pi/6 \right)^{b_M}}{(\pi/6)c_0} \left[ \frac{D_j^{3b_M - \xi + 1} - D_i^{3b_M - \xi + 1}}{D_j^{4 - \xi - m} - D_i^{4 - \xi - m}} \right] \left[ \frac{4 - \xi - m}{3b_M - \xi + 1} \right],$$
(5)

and the expression of Bij is taken from [33], so that,

$$[M]_{ij} = \chi _{Mij}\,B_{ij} = \chi _{Mij}\left( {\frac{{D_j^{4 - \xi - m} - D_i^{4 - \xi - m}}}{{D_{max}^{4 - \xi - m} - D_{min}^{4 - \xi - m}}}} \right)B_{total};$$
(6)

and therefore,

$$\begin{aligned} \left[ M \right]_{total} &= \sum\limits_{i = 0,\, j = i + 1}^{i = n - 1,\, j = n} [M]_{ij} \\ &= \frac{B_{total}}{D_{max}^{4 - \xi - m} - D_{min}^{4 - \xi - m}} \sum\limits_{i = 0,\, j = i + 1}^{i = n - 1,\, j = n} \left[ \chi_{Mij} \left( D_j^{4 - \xi - m} - D_i^{4 - \xi - m} \right) \right]. \end{aligned}$$
(7)

Also, the fraction of [M]ij to [M]total can be computed as:

$$F_{M,ij} = \frac{[M]_{ij}}{[M]_{total}} = \frac{\chi_{Mij} \left( D_j^{4 - \xi - m} - D_i^{4 - \xi - m} \right)}{\sum\limits_{i = 0,\, j = i + 1}^{i = n - 1,\, j = n} \left[ \chi_{Mij} \left( D_j^{4 - \xi - m} - D_i^{4 - \xi - m} \right) \right]}.$$
(8)

Using the equations derived above, the concentrations of carbohydrate, protein and lipid can be partitioned into any number of PSCs. However, for the sake of discussion, in this study, the estimates are obtained for three major PSCs, namely, picoplankton, nanoplankton and microplankton, with cell-diameter bounds [D0, D1], [D1, D2] and [D2, D3], respectively, where D0 = 0.25 μm, D1 = 2 μm, D2 = 20 μm, and D3 = 50 μm based on previous studies [33, 45, 46].
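The size partitioning of Eq. (8) can be sketched as follows. Substituting Eq. (5) into Eq. (8), the common prefactor and the factor (4 − ξ − m) cancel, so that each fraction reduces to a ratio of integrals of D raised to the power (3bM − ξ); the sketch uses this reduced form. The diameter bounds are those given in the text; the value of bM is a hypothetical placeholder rather than a value from ref. [40], and the function names are illustrative.

```python
import math

SIZE_BOUNDS_UM = [0.25, 2.0, 20.0, 50.0]  # D0..D3 from the text (micrometres)
CLASS_NAMES = ["picoplankton", "nanoplankton", "microplankton"]


def power_integral(d_lo, d_hi, p):
    """Integral of D**(p - 1) dD from d_lo to d_hi, using the log limit as p -> 0."""
    if abs(p) < 1e-9:
        return math.log(d_hi / d_lo)
    return (d_hi ** p - d_lo ** p) / p


def macromolecule_fractions(xi, b_m, bounds=SIZE_BOUNDS_UM):
    """Fractions F_M,ij of Eq. (8) for consecutive size classes.

    The ratio is independent of the length unit used for the diameters,
    so micrometres can be used directly.
    """
    p = 3.0 * b_m - xi + 1.0
    total = power_integral(bounds[0], bounds[-1], p)
    return [power_integral(lo, hi, p) / total
            for lo, hi in zip(bounds[:-1], bounds[1:])]


B_M_DEMO = 0.9  # hypothetical allometric exponent, for illustration only
for xi in (3.0, 4.0, 5.0):
    fractions = macromolecule_fractions(xi, B_M_DEMO)
    print(xi, dict(zip(CLASS_NAMES, (round(f, 3) for f in fractions))))
```

In this reduced form the partitioning of a macromolecule depends only on ξ and bM, and not on aM, c0 or m.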

Allometric parameters aM and bM from the literature, and retrieval of ξ from satellite data

The allometric parameters aM and bM corresponding to phytoplankton species are reported in several studies, e.g., [20, 23, 31, 40]. More recently, [40] compiled a large database of macromolecular concentrations in various eukaryotic microalgae from 53 published studies, covering various taxonomic groups, culture conditions and growth phases; and reported the allometric relationships between cell volume and concentrations of carbohydrate, protein, and lipid in phytoplankton. In the current study, aM and bM are fixed based on ref. [40] (see their Table II), and their reported values, along with the confidence intervals, are used for estimating the macromolecular concentrations and for performing the uncertainty analyses described in Section Uncertainties and biases.

The exponent of the phytoplankton size spectrum ξ is retrieved from the specific-absorption coefficient of phytoplankton at 676 nm using a semi-analytical ocean-colour algorithm developed by ref. [33]. For completeness, the major steps of this methodology are described in the Supplementary Materials, without fully reproducing them from ref. [33]. For further details on the parameterisation and optimisation steps related to the retrieval of ξ, readers are referred to refs. [33, 34].

Uncertainties and biases

Although the method described above is founded on theories of light-absorption properties and cellular allometric relationships of phytoplankton, the estimates need to be validated against direct in situ measurements, which are currently unavailable. This limitation raises the possibility of bias and uncertainties in satellite products at each pixel, leading to biased estimates of the macromolecules on a global scale. The inaccuracy of the estimates may arise from several sources, the most prominent of which is the uncertainty associated with the satellite products used as inputs to the model, e.g., chlorophyll-a and absorption coefficients of phytoplankton. The uncertainties in chlorophyll-a retrievals for optically complex (Case II) waters are considerably larger than those for the open oceans (Case I waters), mainly due to the limitations of the empirical chlorophyll algorithms used (e.g., [47]). The absorption coefficients of phytoplankton, on the other hand, being IOPs, are generally retrieved by semi-analytical algorithms, the performance of which also varies for optically complex waters (e.g., [29]).

In the coastal oceans and optically complex waters, the retrievals are affected by the presence of high concentrations of coloured dissolved organic matter (CDOM), sediments, other suspended materials and water constituents that interfere with light penetration and reflectance [47]. Uncertainties in remote-sensing retrievals can further be attributed to clouds, ice cover, solar zenith angles, sun glint, atmospheric dust and aerosols (e.g., [47, 48]). Thus, the satellite-derived estimates of carbohydrate, protein and lipid presented on global maps (in the results section) come with uncertainty and bias, an accurate estimation of which would be possible only when adequate in situ measurements of these quantities become available.

Nevertheless, to understand and quantify the overall uncertainty levels in the satellite-derived estimates, a model sensitivity analysis was carried out. Theoretically, accurate estimations of the macromolecular concentrations in phytoplankton based on the above method would depend on the allometric parameters (aM and bM) and the estimates of ξ. The retrieval of ξ further depends on satellite-derived estimates of chlorophyll-a and aph. Using Eqs. (1)–(3), the relative sensitivities of the estimates of Mtotal, i.e., \(\frac{{{\mathrm{\Delta }}M_{total}}}{{M_{total}}}\), can be computed as a combined function of \(\frac{{{\mathrm{\Delta }}\xi }}{\xi }\), \(\frac{{{\mathrm{\Delta }}a_M}}{{a_M}}\), and \(\frac{{{\mathrm{\Delta }}b_M}}{{b_M}}\). Following [33], where \(\frac{{{\mathrm{\Delta }}\xi }}{\xi }\) are reported pixel-by-pixel in the global ocean, a maximum overall \(\frac{{{\mathrm{\Delta }}\xi }}{\xi }\) in the range 0–25% is considered. For \(\frac{{{\mathrm{\Delta }}a_M}}{{a_M}}\) and \(\frac{{{\mathrm{\Delta }}b_M}}{{b_M}}\), the half of the 95% spread with respect to the mean levels reported by ref. [40] are considered. The resultant \(\frac{{{\mathrm{\Delta }}M_{total}}}{{M_{total}}}\) are then computed pixel-by-pixel, as percentages of the default estimates. So, without the availability of adequate in situ measurements, the uncertainties discussed in the following sections should be interpreted as model-based uncertainties, and not as those based on the in situ observations.
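One way to explore such a sensitivity numerically is sketched below. Since Mtotal = χM Btotal, a relative change in χM for fixed chlorophyll equals the relative change in Mtotal. The sketch perturbs ξ, aM and bM by given relative amounts in both directions and reports the largest resulting change; this worst-case combination rule is an assumption made for illustration and is not necessarily the combination used in the study. The allometric values passed in the example are hypothetical placeholders, not values from ref. [40].

```python
import itertools
import math

C0, M_EXP = 3.9e6, 0.06          # c0 and m from the text
D_MIN, D_MAX = 0.25e-6, 50e-6    # diameter bounds from the text, in metres


def _integral(p):
    """Integral of D**(p - 1) dD over [D_MIN, D_MAX], with the log limit at p -> 0."""
    if abs(p) < 1e-9:
        return math.log(D_MAX / D_MIN)
    return (D_MAX ** p - D_MIN ** p) / p


def chi_m(xi, a_m, b_m):
    """Macromolecule-to-chlorophyll ratio, following Eq. (3)."""
    prefactor = 1e-9 * a_m * (1e18 * math.pi / 6.0) ** b_m / ((math.pi / 6.0) * C0)
    return prefactor * _integral(3.0 * b_m - xi + 1.0) / _integral(4.0 - xi - M_EXP)


def relative_uncertainty_pct(xi, a_m, b_m, rel_xi, rel_a, rel_b):
    """Largest |delta M / M| in percent over all +/- combinations of the perturbations."""
    reference = chi_m(xi, a_m, b_m)
    worst = 0.0
    for s_xi, s_a, s_b in itertools.product((-1.0, 1.0), repeat=3):
        perturbed = chi_m(xi * (1.0 + s_xi * rel_xi),
                          a_m * (1.0 + s_a * rel_a),
                          b_m * (1.0 + s_b * rel_b))
        worst = max(worst, abs(perturbed - reference) / reference * 100.0)
    return worst


# Hypothetical example: 10% uncertainty in xi, 20% in a_M and 5% in b_M.
print(relative_uncertainty_pct(4.0, 0.1, 0.9, 0.10, 0.20, 0.05))
```

Applying the same function pixel by pixel, with the retrieved ξ at each pixel, would give a map of model-based relative uncertainty.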

Results and discussion

Macromolecular concentrations across phytoplankton size range

The ratios of carbohydrate-to-chlorophyll (χcarbo), protein-to-chlorophyll (χprot) and lipid-to-chlorophyll (χlipid) increase with ξ within the ranges given by [5.0, 9.5], [7.1, 48.9] and [3.1, 32], respectively (Fig. 1a). For any given value of ξ, χprot is higher than χcarbo and χlipid. For low values of ξ, χlipid is lower than χcarbo, but it increases more rapidly with the assemblages of small phytoplankton cells, and so, for high values of ξ, χlipid is significantly higher than χcarbo (Fig. 1a).

Fig. 1

a Carbohydrate-to-chlorophyll (χcarbo), protein-to-chlorophyll (χprot) and lipid-to-chlorophyll (χlipid) ratios of the mixed-phytoplankton population derived (using Eq. (3)) as functions of the exponent of the phytoplankton size spectrum (ξ). b–d Size-partitioned carbohydrate, protein and lipid proportions in: b picoplankton, c nanoplankton and d microplankton, derived using Eq. (8). e–g Algorithm-based relative uncertainties in the estimates of: e carbohydrate, f protein and g lipid, quantified as a joint function of the relative uncertainties in ξ, aM and bM (see Section Uncertainties and biases). The 95% confidence levels for the allometric parameters reported in ref. [40] are considered for computing the % uncertainties in the parameters with respect to their reported means, along with a range of 0–25% relative uncertainty in ξ (following ref. [33]) (Color figure online)

The proportions of carbohydrate, protein and lipid increase with ξ in picoplankton (Fig. 1b), and decrease with ξ in microplankton (Fig. 1d), but are unimodal in nanoplankton, with magnitudes typically less than 50% and highest values in the middle range of ξ (Fig. 1c). At any given level of ξ, the proportion of lipid in picoplankton is higher than that of carbohydrate or protein (with carbohydrate < protein < lipid) (Fig. 1b); but in microplankton the order is reversed to carbohydrate > protein > lipid (Fig. 1d). For nanoplankton these proportions change from carbohydrate < protein < lipid at the lower end of ξ to carbohydrate > protein > lipid at the higher end of ξ (Fig. 1c). These results show that the macromolecular concentrations depend strongly on phytoplankton size structure, with implications for their stocks in mixed populations of phytoplankton.

For carbohydrate estimates, the relative uncertainties would be <30% for 3.25 < ξ < 5 (typically representing small-cell dominated populations), but would increase up to 60% at the lower end of ξ (typically representing large-cell dominated populations) (Fig. 1e, Table 1). For protein estimates (Fig. 1f), the relative uncertainties would be <40% across the range of ξ provided that the relative uncertainty in ξ is <10%. If the relative uncertainties in ξ are >15%, the uncertainties in protein would increase to >60%, typically for 3.25 < ξ < 4.5, but would generally remain <40% for populations dominated by either large or small cells (i.e., at the low and high ends of ξ, see Table 1 for more details). For lipid estimates, the relative uncertainties would be similar to those for protein: <40% for the low and high ends of ξ, but >60% for the mid-range of ξ, if the uncertainties in ξ are >15% (Fig. 1g). Further details on these uncertainty estimates for various combinations of uncertainties in ξ estimates (based on Fig. 1e–g) are summarised in Table 1, and the propagation of the uncertainties in the global ocean is discussed in Section Algorithm uncertainties on global map.

Table 1 Summary of overall uncertainties (mean with ranges) in carbohydrate, protein and lipid estimates as a function of uncertainties in ξ and allometric parameters, as shown in Fig. 1e, f, g.

Comparison with estimates based on in situ abundance data

The matched-up in situ data were from specific cruises (see Fig. 2a) with moderate sample size and a non-normal distribution; therefore, non-parametric statistics were implemented, in particular, Spearman’s correlation instead of Pearson’s, and other non-parametric metrics following ref. [49]. The in situ and satellite-based estimates generally follow the 1:1 line, but with some level of spread around it (Fig. 2b–d, Supplementary Fig. S1), with significant correlations (Spearman’s ρ) between them on a linear scale, for carbohydrate: ρ = 0.25, p < 0.001; protein: ρ = 0.24, p < 0.001; and lipid: ρ = 0.23, p < 0.001 (Fig. 2b–d). The root mean squared error (RMSE) and bias of the estimates vary for carbohydrate (RMSE 10.20, bias −7.28 mg m−3), protein (RMSE 21.55, bias −10.93 mg m−3) and lipid (RMSE 9.77, bias −4.87 mg m−3). As expected, the RMSE and bias for daily match-ups turn out to be lower than those for monthly match-ups (see Supplementary Table S1); but in both cases their magnitudes are within a reasonable range, when compared with those for other derived products, such as phytoplankton carbon [17, 18].
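The statistics named above can be computed as in the following minimal sketch, written with the Python standard library only. Spearman's ρ is obtained as the Pearson correlation of average ranks; the sign convention for bias (satellite minus in situ) is an assumption, chosen so that a negative bias indicates underestimation by the satellite estimate. The p-values are not computed here, and the sample values are made up for illustration.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(sat, insitu):
    """Root mean squared error of satellite estimates against in situ estimates."""
    return math.sqrt(sum((s - i) ** 2 for s, i in zip(sat, insitu)) / len(sat))


def bias(sat, insitu):
    """Mean difference, satellite minus in situ (assumed sign convention)."""
    return sum(s - i for s, i in zip(sat, insitu)) / len(sat)


# Made-up match-up values (mg m^-3), for illustration only.
insitu = [1.0, 2.0, 3.0, 4.0]
sat = [0.5, 1.5, 2.5, 3.0]
print(spearman_rho(sat, insitu), rmse(sat, insitu), bias(sat, insitu))
```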

Fig. 2

a Geographic locations of the in situ samples (yellow dots) used from the marine biodiversity database [41]; this subset overlapped with the temporal coverage of satellite data, and was considered for computing phytoplankton carbohydrate, protein and lipid using species size and cell abundances, and by applying the allometric relationships reported in [40]. The Westerlies, Trades and Coastal Longhurst provinces are shown in different colours. b–d Satellite match-ups from daily (green dots) and monthly (black dots) images were considered for comparing the satellite-derived b carbohydrate, c protein and d lipid with the in situ estimates. e Box-plots comparing the estimates from in situ with satellite based on the current method (Color figure online)

Following [49], three further metrics are computed for comparing the estimates with daily (monthly) match-ups: the median satellite-to-in-situ ratio (median ratio, found to be 0.51 (0.71), 0.59 (0.73), and 0.59 (0.73), respectively), the median of the relative-percent difference (median RPD, found to be −49.41 (−29.36), −40.86 (−27.38) and −41.11 (−26.75), respectively), and the semi-interquartile percent differences (SIQ-PD, found to be −48.50 (−67.65), −50.66 (−65.08) and −51.36 (−63.82), respectively) (see Supplementary Table S1). The median RPDs and SIQ-PDs are lowest for lipid estimates, followed by those for protein and carbohydrate (Supplementary Table S1). The median ratios are <1, suggesting that the algorithm would generally underestimate the macromolecular concentrations (Fig. 2e). Also, the algorithm seems to produce relatively less natural variability of the macromolecular concentrations, in comparison with those estimated from the in situ abundance data (Fig. 2e). However, it is worth mentioning that the median ratio, median RPD and SIQ-PD for SeaWiFS chlorophyll were reported [49] to be in the ranges [1.7, 81.5], [−34.7, 122.3], and [0.88, 1.69], respectively. Therefore, in terms of these metrics, the accuracy of the current estimates of the macromolecular concentrations is generally comparable with that reported for SeaWiFS chlorophyll.
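The median ratio and median RPD can be illustrated with the short sketch below. It assumes that the relative-percent difference is defined as 100 × (satellite − in situ)/in situ; under that definition the RPD is a linear function of the ratio, so the median RPD equals 100 × (median ratio − 1), which is consistent with the pairs of values quoted above (e.g., a median ratio of 0.51 alongside a median RPD of about −49%). The sample values are made up, and the SIQ-PD is not computed here.

```python
import statistics


def median_ratio(sat, insitu):
    """Median of the satellite-to-in-situ ratios."""
    return statistics.median(s / i for s, i in zip(sat, insitu))


def median_rpd(sat, insitu):
    """Median relative-percent difference, assuming RPD = 100 * (sat - insitu) / insitu."""
    return statistics.median(100.0 * (s - i) / i for s, i in zip(sat, insitu))


# Made-up match-up values (mg m^-3), for illustration only.
insitu = [1.0, 2.0, 3.0]
sat = [0.5, 1.4, 3.0]

ratio = median_ratio(sat, insitu)
rpd = median_rpd(sat, insitu)
print(ratio, rpd)
```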

Nevertheless, these comparisons would be affected by several layers of uncertainties associated with the in situ and satellite estimates. For example, the prominent natural variability in the cell size of the 943 phytoplankton species, which could not be included in the in situ calculations, would alter the in situ estimates of carbohydrate, protein and lipid; and the uncertainties in satellite inputs (chlorophyll, IOPs) would also affect the satellite retrievals of ξ (see also Section Algorithm uncertainties on global map).

Phytoplankton carbohydrate, protein and lipid in the world oceans

Strong spatial variability of the annually averaged χcarbo, χprot, χlipid, carbohydrate, protein and lipid is found over the world’s oceanic biomes for the period of study (Fig. 3). The magnitude of χcarbo varies from <5 in the high-chlorophyll coastal waters and large parts of the northern latitudes beyond 40° N (Fig. 3a, c), to >9 in the open oceans and Case I waters (Fig. 3c). Similarly, χprot (Fig. 3e) or χlipid (Fig. 3g) vary, respectively, from <15 or <10 in the coastal waters and northern latitudes, to >45 or >30 in the open oceans and Case I waters. These results generally reflect that the oceanographic regions dominated by large and small phytoplankton are, respectively, represented by low and high values of χcarbo, χprot or χlipid. In the Atlantic and Pacific subtropical gyres, despite the high magnitudes of χcarbo, χprot and χlipid, the concentrations of carbohydrate, protein and lipid are typically low (<0.5, 1.0 and 1.0 mg m−3, respectively), and the spatial pattern is similar to the distribution of low chlorophyll. Most of the coastal oceans and Case II waters are generally characterised by more than 5, 10 and 10 mg m−3 of carbohydrate, protein and lipid, respectively, which in places spike beyond 50, 100 and 100 mg m−3, respectively (Fig. 3d, f, h). It is noteworthy that some of these very high values may be attributed to uncertain or erroneous retrievals of chlorophyll and other optical properties in optically complex waters (as also discussed in Section Macromolecular concentrations in Longhurst provinces).

Fig. 3

Distributions of the annually averaged surface concentrations of macromolecules and energy value of phytoplankton over 1997–2013. Overlaid on the global maps are thin black lines representing the boundaries of the Longhurst biogeographical provinces [35, 36]. Annual averages of a surface chlorophyll in (mg m−3); b chemical energy value of phytoplankton in (J m−3) as a combination of the estimated carbohydrate, protein and lipid; c carbohydrate to chlorophyll ratio (dimensionless); d concentration of carbohydrate in (mg m−3); e protein to chlorophyll ratio (dimensionless); f concentration of protein in (mg m−3); g lipid to chlorophyll ratio (dimensionless); h concentration of lipid in (mg m−3), computed based on the methodology described in Section Methodology (Color figure online)

Applying the macromolecular concentration-to-energy conversion factors, i.e., 4.2 kcal g−1 for carbohydrate, 4.19 kcal g−1 for protein and 9.5 kcal g−1 for lipid [20, 40], the chemical-energy values of the surface-ocean phytoplankton can be computed (Fig. 3b, d, f, h). The annual average of the phytoplankton energy value is generally <0.1 kJ m−3 of ocean water in the subtropical gyres, but goes up to 0.5–1.0 kJ m−3 in parts of the equatorial, northern and southern latitudes, and beyond 10 kJ m−3 in certain coastal and optically complex waters (Fig. 3b).
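The conversion from concentrations to energy value can be sketched as follows. The conversion factors are those quoted in the text; the use of the thermochemical calorie (1 kcal = 4184 J) and the function name are assumptions made for this illustration.

```python
# Conversion factors quoted in the text (kcal per gram).
KCAL_PER_G = {"carbohydrate": 4.2, "protein": 4.19, "lipid": 9.5}
J_PER_KCAL = 4184.0  # thermochemical calorie; an assumption of this sketch


def energy_j_per_m3(carbohydrate, protein, lipid):
    """Chemical energy (J m^-3) from macromolecular concentrations (mg m^-3)."""
    kcal_per_m3 = 1e-3 * (  # mg -> g
        KCAL_PER_G["carbohydrate"] * carbohydrate
        + KCAL_PER_G["protein"] * protein
        + KCAL_PER_G["lipid"] * lipid
    )
    return kcal_per_m3 * J_PER_KCAL


# Upper-end gyre concentrations quoted in the text: 0.5, 1.0 and 1.0 mg m^-3.
gyre_energy = energy_j_per_m3(0.5, 1.0, 1.0)
print(gyre_energy)
```

With these inputs the result is about 66 J m−3, which is in line with the statement that the energy value in the subtropical gyres is generally below 0.1 kJ m−3.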

Size-partitioned phytoplankton carbohydrate, protein and lipid in the world oceans

Picoplankton contributions to carbohydrate (in the range (0.1, 1.0) mg m−3), protein (in the range (1.0, 5.0) mg m−3) or lipid (in the range (0.5, 3.0) mg m−3) dominate over the contributions of nanoplankton and microplankton in the open oceans and equatorial gyres (Fig. 4a, d, g). In the northern latitudes beyond 40 degrees and in coastal waters, microplankton contributions to carbohydrate, protein and lipid are higher than those of picoplankton and nanoplankton, with approximate ranges (2.5, 10), (2.0, 25) and (0.5, 5.0) mg m−3, respectively (Fig. 4c, f, i). Nanoplankton contributions are generally in the range (1, 3) mg m−3 of carbohydrate, (1, 5) mg m−3 of protein and (1, 5) mg m−3 of lipid, respectively (Fig. 4b, e, h), except in the oligotrophic gyres, where all the concentrations reduce to <0.05 mg m−3 (Fig. 4b, e, h).

Fig. 4

Annually averaged surface macromolecular concentrations (mg m−3) in picoplankton, nanoplankton, and microplankton over 1997–2013: a picoplankton carbohydrate, b nanoplankton carbohydrate, c microplankton carbohydrate; d picoplankton protein, e nanoplankton protein, f microplankton protein; and g picoplankton lipid, h nanoplankton lipid, i microplankton lipid, computed based on the methodology described in Section Methodology. Overlaid on the global maps are thin black lines representing the boundaries of the Longhurst biogeographical provinces [35, 36] (Color figure online)

Macromolecular concentrations in Longhurst provinces

The geographical variations of carbohydrate, protein and lipid in the world oceans can be inferred from their regionally binned concentrations in the Longhurst biogeographical provinces [35, 36]. Given that the ocean-colour data from satellites are inadequate (and may be more erroneous) in the polar regions over most of the year, the estimates from the polar provinces (6 out of 54 Longhurst provinces) are excluded from further discussion. For the remaining 48 provinces, the spatial estimates of χcarbo, χprot, χlipid and the concentrations of carbohydrate, protein and lipid are computed from their corresponding annually averaged global maps (Fig. 5). These provinces include 14 Westerlies (NADR, GFST, NASW, MEDI, NASE, PSAE, PSAW, KURO, NPPF, NPSW, TASM, SPSG, SSTC, SANT), 12 Trades (NATR, WTRA, ETRA, SATL, CARB, MONS, ISSG, NPTG, PNEC, PEQD, WARM, ARCH) and 22 Coastal (NECS, CNRY, GUIN, GUIA, NWCS, BRAZ, FKLD, BENG, EAFR, REDS, ARAB, INDE, INDW, AUSW, ALSK, CCAL, CAMR, CHIL, CHIN, SUND, AUSE, NEWZ) provinces (full names of the provinces are given in Supplementary Table S2, and the descriptions in refs. [35, 36]). The Westerlies, Trades and Coastal provinces are shown in Fig. 2a.
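The regional binning described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical integer (or string) province mask on the same grid as the annually averaged map. The function name `province_stats` and the choice of summary statistics (median, quartiles, minimum and maximum, mirroring a box plot) are assumptions made for illustration.

```python
import numpy as np


def province_stats(values, province_ids, excluded=()):
    """Summarise a gridded field within each province.

    values       : array of annually averaged estimates (NaN where no data)
    province_ids : array of the same shape holding a province label per pixel
    excluded     : labels to skip (e.g. the polar provinces)
    """
    values = np.asarray(values, dtype=float)
    province_ids = np.asarray(province_ids)
    stats = {}
    for pid in np.unique(province_ids):
        key = pid.item()  # plain Python label (int or str)
        if key in excluded:
            continue
        # Keep only valid pixels that belong to this province.
        v = values[(province_ids == pid) & np.isfinite(values)]
        if v.size == 0:
            continue
        q1, median, q3 = np.percentile(v, [25, 50, 75])
        stats[key] = {
            "median": float(median),
            "q1": float(q1),
            "q3": float(q3),
            "min": float(v.min()),
            "max": float(v.max()),
        }
    return stats


# Toy example: two provinces on a six-pixel grid, one pixel without data.
toy = province_stats([1.0, 2.0, 3.0, 10.0, 20.0, np.nan], [1, 1, 1, 2, 2, 2])
```

Passing the labels of the polar provinces through `excluded` reproduces the restriction to the remaining 48 provinces.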

Fig. 5

Annually averaged surface macromolecular composition within Longhurst biogeographical provinces [35] computed over 1997–2013. Box plots with annual median (black dots), interquartile ranges (thick red bar), and ranges (thin whiskers) for a carbohydrate-to-chlorophyll ratio (χcarbo), b protein-to-chlorophyll ratio (χprot), c lipid-to-chlorophyll ratio (χlipid), d carbohydrate (mg m−3), e protein (mg m−3) and f lipid (mg m−3), are shown for 48 Longhurst provinces. The provinces include 14 Westerlies (NADR, GFST, NASW, MEDI, NASE, PSAE, PSAW, KURO, NPPF, NPSW, TASM, SPSG, SSTC, SANT), 12 Trades (NATR, WTRA, ETRA, SATL, CARB, MONS, ISSG, NPTG, PNEC, PEQD, WARM, ARCH) and 22 Coastal (NECS, CNRY, GUIN, GUIA, NWCS, BRAZ, FKLD, BENG, EAFR, REDS, ARAB, INDE, INDW, AUSW, ALSK, CCAL, CAMR, CHIL, CHIN, SUND, AUSE, NEWZ) provinces. The provinces within Westerlies, Trades and Coastal are arranged from north to south as they appear in Longhurst's original list. Descriptions of the provinces can be found in [35, 36], and the full names of the provinces along with the plotted median values of the annual averages are given in Table S1

Spatial variability of the estimates in the Coastal provinces is found to be higher than that in the Westerlies or Trades provinces (Fig. 5), with the lowest variability in the Westerlies provinces (Figs. 3 and 5), reflecting that coastal upwellings would strongly influence the distribution of phytoplankton macromolecules (similar to chlorophyll distribution). The spatial medians of χcarbo, χprot and χlipid are lowest (5.69, 13.86 and 8.0, respectively) for the NWCS (North-West Atlantic Coastal Shelves) province, and highest (8.95, 45.13 and 29.56, respectively) for the NPSW (North Pacific Subtropical Gyre West) province (Fig. 5a–c, and Supplementary Table S2). The NECS (North-East Atlantic Coastal Shelves) province is characterised by the highest surface concentrations (Fig. 5d–f) of the annually averaged spatial medians of carbohydrate (9.53 mg m−3), protein (25.2 mg m−3), and lipid (14.81 mg m−3). The lowest surface concentrations (spatial median) of carbohydrate (0.60 mg m−3) and lipid (1.75 mg m−3) are obtained in the NATR (North Atlantic Tropical Gyral) province (Fig. 5d, f, Table S2), whereas the lowest concentration of protein (2.11 mg m−3) is obtained in the NPSW (North Pacific Subtropical Gyre West) province (Fig. 5e, Table S2); both provinces are generally populated by small picoplankton throughout the year.

The size-partitioned estimates also vary considerably across the 48 Longhurst provinces (Table S3, also Fig. 4). The spatial medians of picoplankton carbohydrate, protein and lipid are lowest (0.13, 1.11 and 0.83 mg m−3, respectively) in the MEDI (Mediterranean Sea, Black Sea) province, and highest (1.87, 13.33 and 9.43 mg m−3, respectively) in the NECS (NE Atlantic Coastal Shelves) province (Fig. 4a, d, g, Table S2). For nanoplankton, the median concentrations vary from their lowest values (0.09, 0.13 and 0.06 mg m−3, respectively) in the WARM (W. Pacific Warm Pool Trades) province, to their highest values (3.34, 7.36 and 3.66 mg m−3, respectively) in the CHIN (China Sea Coastal) province (Fig. 4b, e, h, Table S2). For microplankton, the median concentrations of carbohydrate and protein vary from their lowest values (0.01 mg m−3 for both) in the WARM province, to their highest values (3.39 and 3.37 mg m−3, respectively) in the CHIN province; the median concentration of lipid, however, is highest (1.42 mg m−3) in the NECS province, and lowest (0.01 mg m−3) in the WARM province (Fig. 4c, f, i, Table S2). Unsurprisingly, the province-wise distributions of the three macromolecular concentrations show spatial patterns generally consistent with our understanding of the biogeography of phytoplankton size structure.

Global-ocean stocks of phytoplankton macromolecules

The annually averaged global stocks are: 0.044 Gt of carbohydrate with monthly range (0.041, 0.05) Gt; 0.17 Gt of protein with monthly range (0.155, 0.18) Gt; and 0.108 Gt of lipid with monthly range (0.098, 0.121) Gt (Fig. 6, and Supplementary Table S4). The largest global stocks are obtained in the month of September, which generally matches with the time of phytoplankton bloom in large parts of the equatorial-southern hemisphere [50]. The smallest stocks are obtained in the month of June, generally after the termination of the spring blooms.

Fig. 6

Annually averaged macromolecular compositions for three phytoplankton size classes over 1997–2013. Grouped bars represent the monthly and annual stocks of the total (height of each bar) and size-partitioned (blue—picoplankton fraction, green—nanoplankton fraction, and red—microplankton fraction) estimates of carbohydrate (first bar in each group), protein (second bar in each group) and lipid (third bar in each group), computed from the surface concentrations through integrations over the mixed-layer depths. All concentrations are expressed in gigatonnes (Gt) (Color figure online)
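The integration of surface concentrations over the mixed-layer depth can be sketched as below. This is only an illustration under stated assumptions: the surface concentration is taken as uniform throughout the mixed layer, the data sit on a regular latitude-longitude grid, and the Earth is treated as a sphere. The function names, the grid and the Earth-radius constant are hypothetical choices, not details given in the text.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (spherical-Earth assumption)


def cell_areas_m2(lat_edges_deg, lon_edges_deg):
    """Areas (m^2) of the cells of a latitude-longitude grid, from cell edges."""
    lat = np.radians(np.asarray(lat_edges_deg, dtype=float))
    lon = np.radians(np.asarray(lon_edges_deg, dtype=float))
    band = np.abs(np.diff(np.sin(lat)))  # latitude-band factor per row
    width = np.abs(np.diff(lon))         # longitude width per column (radians)
    return EARTH_RADIUS_M**2 * np.outer(band, width)


def stock_gt(conc_mg_m3, mld_m, area_m2):
    """Stock (Gt) = sum over pixels of concentration x mixed-layer depth x area.

    Pixels with missing data (NaN in any input, e.g. land or cloud) are ignored.
    """
    mass_mg = np.nansum(
        np.asarray(conc_mg_m3, dtype=float)
        * np.asarray(mld_m, dtype=float)
        * np.asarray(area_m2, dtype=float)
    )
    return mass_mg * 1e-18  # 1 Gt = 1e15 g = 1e18 mg


# Toy example: a coarse 2 x 2 global grid, 1 mg m^-3 over a 50 m mixed layer.
areas = cell_areas_m2([-90.0, 0.0, 90.0], [0.0, 180.0, 360.0])
toy_stock = stock_gt(np.ones((2, 2)), np.full((2, 2), 50.0), areas)
```

Applying the same function to each size class separately would give the size-partitioned stocks shown as stacked fractions.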

The percentages of the size-partitioned carbohydrate, protein and lipid stocks also vary over the months of the year (Fig. 6). Picoplankton contribute a lower percentage of the carbohydrate stock, ~46% (equivalent to 0.02 Gt, with monthly range of 43–53%), than of the protein stock, ~78% (equivalent to 0.133 Gt, with monthly range of 76–83%), or the lipid stock, ~85% (equivalent to 0.092 Gt, with monthly range of 83–88%) (Supplementary Table S4). Nanoplankton contribute ~33% of the carbohydrate stock (equivalent to 0.015 Gt, with monthly range of 32–36%), which is considerably higher than their contributions to the protein stock, ~17% (equivalent to 0.028 Gt, with monthly range of 14–18%), and the lipid stock, ~12% (equivalent to 0.013 Gt, with monthly range of 10–13%). Similarly, the microplankton contribution to the carbohydrate stock, ~21% (equivalent to 0.009 Gt, with monthly range of 16–24%), is significantly higher than their contributions to the protein stock, ~5% (equivalent to 0.009 Gt, with monthly range of 3–7%), and the lipid stock, ~3% (equivalent to 0.003 Gt, with monthly range of 2–4%). But clearly, for any given macromolecular stock, the largest contribution comes from picoplankton and the smallest from microplankton (Fig. 6).

No previous estimates were available to compare with the stocks of carbohydrate, protein and lipid reported here. However, the carbon-based macromolecular stocks could be viewed in conjunction with the stocks of total phytoplankton biomass (in carbon units), which were estimated previously from satellite remote sensing (e.g., see refs. [16,17,18]). For example, the annually averaged stocks of the total phytoplankton biomass varied between 0.2 and 1.0 GtC depending on the estimation method (e.g., see refs. [16,17,18, 51,52,53]). The annually averaged stocks of carbohydrate, protein and lipid, and their sum total, estimated above, are within this range. Recent studies [40, 54] also suggested that under ‘nutrient-sufficient, exponential growth conditions’ the median composition of the dry weight of microalgae contains 15% carbohydrate, 32.2% protein and 17.3% lipid. With respect to the most recent satellite-based estimates of phytoplankton biomass (i.e., ~0.3 GtC, based on refs. [17, 18]), the percentages of the annually averaged global stocks (which included both nutrient-sufficient and oligotrophic waters) of carbohydrate, protein and lipid are ~15%, ~57% and ~36%, respectively. These preliminary results thus suggest that on a global scale, the relative proportions of carbohydrate in phytoplankton might be more robust than the proportions of protein and lipid. However, direct in situ measurements would be required to further validate these results.
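The percentages quoted above follow directly from the reported stocks. The short sketch below reproduces that arithmetic, using only numbers given in the text (global stocks in Gt, a reference biomass of ~0.3 GtC, and the 0.2 to 1.0 GtC range of earlier biomass estimates); the variable names are illustrative.

```python
# Annually averaged global stocks reported in the text (Gt).
stocks_gt = {"carbohydrate": 0.044, "protein": 0.17, "lipid": 0.108}

# Most recent satellite-based estimate of phytoplankton biomass quoted in the text (GtC).
biomass_gtc = 0.3

# Sum of the three macromolecular stocks.
total_gt = sum(stocks_gt.values())

# Each stock expressed as a percentage of the reference biomass, as done in the text.
percent_of_biomass = {
    name: 100.0 * stock / biomass_gtc for name, stock in stocks_gt.items()
}

# Does the summed stock fall inside the range of earlier biomass estimates?
sum_within_reported_range = 0.2 <= total_gt <= 1.0

for name, pct in percent_of_biomass.items():
    print(f"{name}: ~{pct:.0f}% of {biomass_gtc} GtC")
print(f"sum of stocks: {total_gt:.3f} Gt, within 0.2-1.0: {sum_within_reported_range}")
```

The rounded percentages match the ~15%, ~57% and ~36% given in the text, and the summed stock is 0.322 Gt.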

Algorithm uncertainties on global map

The uncertainty propagation maps based on the sensitivity analysis suggest that the relative uncertainties in lipid estimates would be higher than those in protein or carbohydrate for most of the world’s productive regions (Fig. 7); but in the less productive oligotrophic waters, the relative uncertainties in all the estimates would be generally comparable. The relative uncertainties in carbohydrate estimates would be within 30–45% in most of the upwelling and productive regions and coastal waters, but would reduce to <15% in the subtropical gyres and oligotrophic waters (Fig. 7a). Similar spatial patterns are obtained for the relative uncertainties in protein and lipid estimates, although the magnitudes of the relative uncertainties would be different. For protein and lipid, the relative uncertainties would be <15% and <25%, respectively, inside the gyres, and between 30–40% and 35–50%, respectively, in major parts of the Northern hemisphere; but would increase up to 60–64% and 65–80%, respectively, in large parts of the southern ocean and around the overlapping regions of the oligotrophic and eutrophic waters (Fig. 7b, c).

Fig. 7

Algorithm uncertainty maps corresponding to the estimates of a phytoplankton carbohydrate, b phytoplankton protein and c phytoplankton lipid based on the sensitivity analysis in Section Uncertainties and biases. Annually averaged uncertainties in estimating the surface concentrations of carbohydrate, protein and lipid are shown for an overall relative uncertainty of 25% in ξ retrievals combined with 95% confidence intervals of the allometric parameters reported by ref. [40] (Color figure online)

Concluding remarks

Although a variety of satellite-based ocean-colour algorithms have already been developed to retrieve chlorophyll-a and its contributions in PFTs and PSCs (e.g., review by [15]), and phytoplankton carbon [16,17,18,19], no methodology exists so far to estimate, from satellites, the concentrations of macromolecules that essentially determine the energy value of phytoplankton. The bio-optical method presented here would be the first one to compute, from satellite data, the concentrations of phytoplankton carbohydrate, protein and lipid, and the resultant energy value of phytoplankton on a global scale. This novel approach uses the satellite-derived bio-optical fingerprints of the living phytoplankton combined with allometric relationships, and builds on the ocean-colour algorithms recently developed for retrieving phytoplankton cell size, the exponent of the phytoplankton size spectra, phytoplankton carbon and PSCs from satellite [18, 33, 34]. Presented are the first estimates of annually averaged concentrations of carbohydrate, protein and lipid, and of the ratios of cellular macromolecular concentrations to chlorophyll-a, over the global oceans as well as for the Longhurst biogeochemical provinces, over the period 1997–2013. Although the current estimates are based on the OC-CCI merged satellite products, by design, the methodology would be equally applicable to ocean-colour data from any other satellite sensor.

Recent studies based on either ocean-colour data [16,17,18, 52] or Earth System models (e.g., CMIP5, [53]) have attempted to improve the estimates of the stocks of phytoplankton carbon, and have narrowed down the estimation range of the annually averaged stocks. However, it remains unclear how the total carbon stock partitions into the stocks of essential carbon-based macromolecules in phytoplankton. For example, although the proportions of the macromolecules to dry weight of phytoplankton are reported for ideal nutrient-rich conditions [40, 54], little is known about those proportions in diverse oceanographic regions where growth conditions deviate from ideal. This study independently estimates the annually averaged stocks of the three essential phytoplankton macromolecules, and finds that the sum total of these estimates is well within the range of the reported stocks of total phytoplankton carbon. The estimates would be potentially useful for understanding the cellular allocation of carbon to carbohydrate, protein and lipid pools in phytoplankton, both spatially and over time, with implications for trophic transfer models, and higher trophic or fisheries models.

The lack of adequate direct measurements of carbohydrate, protein and lipid overlapping the temporal coverage of the ocean-colour data has restricted rigorous validation of the satellite-derived estimates. Therefore, new in situ measurements of phytoplankton macromolecules across various oceanic conditions should be a priority for increasing the reliability and reducing the bias and uncertainties of the satellite-based estimates. Adequate direct measurements would also allow observation-based uncertainties such as RMSE and bias to be computed pixel-by-pixel and provided to users. The sensitivity analyses carried out here, with assumptions on fixed relative uncertainties of <30% for the input parameters (following the requirement provided by the Global Climate Observing System, [55]), have identified oceanographic regions where the estimates would be less (or more) sensitive to relative uncertainties in satellite inputs. However, how the relative uncertainties may change (reduce or increase) due to regional variations of uncertainties in the input parameters, and how those changes may impact the estimates of the global stocks, would require further investigation. The sensitivity analyses have nevertheless shown promise that the estimation errors could reduce as the retrievals of satellite-based IOPs become more accurate. Finally, due to the constraints of inadequate in situ validation data, and large uncertainties and biases in the optically complex waters arising from the presence of high concentrations of coloured dissolved organic matter, sediments, clouds and ice, the current estimates may be less reliable in coastal waters and high latitudes than in the open oceans. The applicability and reliability of the estimates in optically complex waters would therefore also be subject to further investigation, possibly including improved satellite inputs, as the satellite era enters into higher temporal and spatial resolution.