MetaFlux: Meta-learning global carbon fluxes from sparse spatiotemporal observations

We provide MetaFlux, a global, long-term carbon flux dataset of gross primary production and ecosystem respiration generated using meta-learning. The idea behind meta-learning stems from the need to learn efficiently from sparse data: by learning broad features across well-sampled tasks, a model can better infer other, poorly sampled ones. Using a meta-trained ensemble of deep models, we generate global carbon products on daily and monthly timescales at a 0.25-degree spatial resolution from 2001 to 2021, through a combination of reanalysis and remote-sensing products. Site-level validation finds that MetaFlux ensembles have 5–7% lower validation error than their non-meta-trained counterparts. In addition, they are more robust to extreme observations, with 4–24% lower errors. We also checked the seasonality, interannual variability, and correlation to solar-induced fluorescence of the upscaled product and found that MetaFlux outperforms other machine-learning based carbon products by 10–40%, especially in the tropics and semi-arid regions. Overall, MetaFlux can be used to study a wide range of biogeochemical processes.

There is a growing need for a globally continuous, high-resolution dataset that best represents critical regions that, unfortunately, tend to have few data points.
To bridge these gaps, we aim to (i) evaluate the performance of meta-learning in environments where spatiotemporal information is sparse, (ii) check its robustness when predicting extreme cases that are critical to the carbon cycle, and (iii) upscale point data to a globally continuous map using an ensemble of meta-trained deep models. In particular, we focus on gross primary production (GPP) and ecosystem respiration (R eco) as specific applications of the broader terrestrial carbon cycle. The upscaled product is resolved at a 0.25-degree spatial resolution and at daily and monthly temporal resolutions between the years 2001 and 2021. Preliminary analysis shows that our global product ("MetaFlux") is internally consistent in terms of its seasonality, interannual trend, and variability, and correlates more strongly with satellite-based solar-induced fluorescence (SIF), a proxy for photosynthesis, than other data-driven products, especially in the tropics and semi-arid critical regions.
Finally, the dataset is freely accessible on Zenodo at https://doi.org/10.5281/zenodo.7761881 12, while the meta-learning code can be reproduced and extended from https://github.com/juannat7/metaflux, which includes example notebooks for applying our approach to specific use cases. The overall methodology is summarized in Fig. 1.

Methods
Meta-learning: learning how to learn. Meta-learning is a machine-learning paradigm that trains a model to learn new tasks from sparse data efficiently 13, leveraging information that comes from tasks with more available data. In general, and as illustrated in Fig. 3a, meta-learning involves two stages: a meta-training and a meta-update stage. During meta-training, the model, f θ (a function f parameterized by θ), proposes intermediate parameters, φ, that minimize the loss on the base tasks 14. In the meta-update step, the model fine-tunes its parameters, θ, given φ and the target tasks 14, seeking the optimal parameters θ*. We define target tasks as a collection of stations in data-sparse regions, including the tropics, semi-arid regions, and representative stations in each ecoregion defined by plant functional types (PFTs), while the base tasks consist of the complement of the former (i.e. stations in data-abundant regions). Given the two-step gradient update procedure in the meta-training and meta-update loops, the optimal parameters, θ*, are not biased toward data-abundant base tasks, as illustrated in Fig. 3a, whereas in the baseline case (Fig. 3b) the optimal learned parameters would be biased toward data-abundant tasks because each data point contributes equally to the model's learning. The details of how the data is split, including how the base (D_base) and target (D_target) datasets are divided for training and testing, are provided in the "Training setup" section below and illustrated in Fig. 4. For this work, we use an optimization-based meta-learning approach adapted from model-agnostic meta-learning (MAML) 13, as detailed in Fig. 2a.

Differentiable learners. Next, we discuss the different deep learning models used for this work: a multilayer perceptron (MLP), long short-term memory (LSTM), and bi-directional LSTM (BiLSTM).
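The two-stage update described above can be sketched with a first-order, MAML-style loop on a toy scalar regression problem. This is a minimal illustration under stated assumptions (a scalar model y = θ·x, synthetic tasks sharing one underlying slope, and hand-picked learning rates), not the authors' implementation:

```python
import random

random.seed(0)
TRUE_W = 3.0  # shared underlying slope across tasks (toy assumption)

def make_task(n):
    # a "task" is a station-like dataset: inputs x and noisy fluxes y
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [TRUE_W * x + random.gauss(0.0, 0.1) for x in xs]
    return xs, ys

base_tasks = [make_task(200) for _ in range(4)]   # data-abundant (D_base)
target_tasks = [make_task(5) for _ in range(4)]   # data-sparse (D_target)

def mse_grad(theta, xs, ys):
    # gradient of the mean squared error for the scalar model y_hat = theta * x
    return sum(2.0 * (theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)

theta, inner_lr, outer_lr = 0.0, 0.05, 0.05
for _ in range(300):
    for (bx, by), (tx, ty) in zip(base_tasks, target_tasks):
        # meta-training: propose intermediate parameters phi on a base task
        phi = theta - inner_lr * mse_grad(theta, bx, by)
        # meta-update: fine-tune theta using phi's loss on a target task
        # (first-order approximation of the MAML outer gradient)
        theta -= outer_lr * mse_grad(phi, tx, ty)
```

Because the outer step is driven by the target-task loss, θ is pulled toward parameters that adapt well on the data-sparse tasks rather than overfitting the data-abundant ones.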
Multilayer perceptron (MLP). The MLP, or feedforward artificial neural network, is a fully connected deep model that can capture nonlinear relationships between inputs and the response variable 15. Generally, an MLP consists of input and output layers with several hidden layers, activated by a set of nonlinear functions. In this paper, we use the Leaky Rectified Linear Unit (LeakyReLU), formulated in Eq. 1 as LeakyReLU(x) = x for x ≥ 0 and αx for x < 0, where α controls the extent of "leakiness" in the negative x direction. The MLP model receives instantaneous weather data and a vegetation index.
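Eq. 1 can be written in a couple of lines; note that the default α = 0.01 below is a common convention, not a value reported in the text:

```python
def leaky_relu(x, alpha=0.01):
    # Eq. 1: identity for non-negative inputs, small negative slope otherwise
    # (alpha=0.01 is an assumed default, not the paper's value)
    return x if x >= 0 else alpha * x
```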
Long Short-Term Memory (LSTM). Time series representing environmental processes tend to be strongly autocorrelated in time 16, and recurrent neural networks (RNNs) were first introduced to model such sequential dependence. The LSTM model was then proposed by 17 to address the vanishing and exploding gradients commonly observed in RNNs. The model can preserve long-term dependencies in sequential data through its gated structure, which controls how information flows across cells. It is able to leverage associations across multiple timesteps to inform inferential tasks where time dependency is present and significant. This can be especially useful to represent water stress, for instance, which depends not only on current daily precipitation but also on previous time steps of precipitation (water supply) and evaporative demand (temperature, radiation, humidity).

Fig. 3 Schematic diagram of our (a) meta-learning approach that is meta-trained on data-abundant tasks to obtain a set of φ, and fine-tuned on data-sparse tasks to get θ*; (b) baseline algorithm that is not meta-trained. The optimal θ* in the latter case tends to be biased towards data-abundant tasks as represented by the gradient sets φ*_n.

Fig. 2 Algorithm for both optimization-based meta-learning and its non-meta-learning counterpart.

Bi-directional LSTM (BiLSTM).
The BiLSTM model is trained on both forward and backward timesteps to best estimate the value at the current timestep, t 18, similar in spirit to reanalysis products, in contrast to weather forecasts (and LSTMs) that use only past information. BiLSTM has been used in cases where past context is as important as future context 19. The equations describing a BiLSTM cell are similar to those of the LSTM, with a slight modification in the hidden state representations, which must capture both the forward and backward timesteps.
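The forward/backward idea can be illustrated with a toy recurrent cell. Here a plain tanh RNN stands in for the LSTM gates, and the fixed weights are illustrative assumptions; only the bidirectional pairing of states mirrors the BiLSTM:

```python
import math

def rnn_pass(xs, w_x=0.5, w_h=0.5):
    # toy recurrent cell standing in for an LSTM: h_t = tanh(w_x*x_t + w_h*h_{t-1})
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional_states(xs):
    # run the sequence forwards and backwards, then pair (concatenate)
    # the two hidden states aligned to the same timestep, as in a BiLSTM
    fwd = rnn_pass(xs)              # fwd[t] summarizes xs[:t+1] (the past)
    bwd = rnn_pass(xs[::-1])[::-1]  # bwd[t] summarizes xs[t:] (the future)
    return list(zip(fwd, bwd))

states = bidirectional_states([0.5, -0.2, 0.9])
```

Each timestep thus sees both past and future context, which is why a BiLSTM resembles reanalysis rather than forecasting.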
Training setup. We train an ensemble of meta-trained deep networks. The purpose of training an ensemble is to quantify uncertainty and reduce the bias of each individual model 20. The final model architecture and hyperparameters, including batch size and learning rates, are determined after performing a k-fold cross-validation (k = 5) on the training set. In all, we use a 3-layer MLP with a hidden size of 350. We replace the first layer with the LSTM or BiLSTM module for the LSTM and BiLSTM models, respectively, to capture temporal features prior to the final prediction layer. The optimized ensemble is trained by minimizing the mean squared error (MSE). We train our models to estimate GPP and R eco at daily and monthly timescales across 206 FLUXNET2015 sites 1 (retrievable from https://fluxnet.org/data/fluxnet2015-dataset/) using a combination of meteorological and remote sensing inputs, including precipitation, 2-meter air temperature (Ta), vapor pressure deficit (VPD), and incoming short-wave radiation from ERA5 reanalysis data 21 (retrievable from https://cds.climate.copernicus.eu/), as well as leaf area index (LAI) from the Moderate Resolution Imaging Spectroradiometer (MODIS) product 22 (retrievable from https://modis-land.gsfc.nasa.gov/). We retrieve the associated time series closest to each tower site. In particular, we use the night-time partitioning method for GPP and R eco and match each flux record with the reanalysis and remote sensing data corresponding to the station of interest. We perform a z-normalization of the inputs, but not of the target variable, to improve the learning process (Eq. 2).
X′ = (X − E(X)) / σ_X, (Eq. 2)

where E(X) is the estimated expected value of the variable X on our training data, and σ_X the corresponding sample standard deviation. Next, we define a class C_i as a set of batches consisting of input variables and target flux observations, with i denoting the index of batches of size 256. The definition of a batch differs between the linear MLP and the time-series models (LSTM and BiLSTM). A batch in the linear MLP case refers to a collection of instantaneous data points, while in the time-series models it corresponds to a set of 30-day continuous data points. The choice of a 30-day window to form a single data point in the latter case is to better capture seasonal water stress 23. We construct D_target by randomly selecting half of the stations in the tropical and semi-arid regions (defined by the Köppen classification 24, retrievable from http://www.gloh2o.org/koppen/), which are sparse, and one station from each plant functional type (PFT), including those in the cropland and boreal areas 25. By extension, D_base consists of the stations that are the complement of D_target. Each dataset is divided into training (D_train) and testing (D_test) sets with an 80:20 split ratio. As the terms suggest, the former is used to train the model f θ, while the latter is used to validate the performance of the model. Overall, the set of all possible datasets, D, includes (D_base^train, D_base^test, D_target^train, D_target^test). Figure 4 illustrates how the dataset for meta-learning is constructed, with base-task data being used for meta-training and target-task data for the meta-update steps. We meta-train three models, MLP, LSTM, and BiLSTM, with an ensemble of five members each, where the individual weights are randomly initialized.
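Eq. 2 and the 30-day windowing can be sketched as follows; this is a minimal illustration in which batch assembly is simplified, and the `ref` argument (to compute statistics on training data only) is our own addition:

```python
from statistics import mean, stdev

def z_normalize(values, ref=None):
    # Eq. 2: X' = (X - E(X)) / sigma_X, with E(X) and sigma_X estimated
    # on the training data (passed as `ref`) to avoid information leakage
    ref = values if ref is None else ref
    mu, sigma = mean(ref), stdev(ref)
    return [(v - mu) / sigma for v in values]

def make_windows(series, window=30):
    # a single "data point" for the LSTM/BiLSTM is a 30-day continuous slice
    return [series[i:i + window] for i in range(len(series) - window + 1)]

daily = [float(v) for v in range(60)]  # toy 60-day series
normed = z_normalize(daily)
windows = make_windows(normed)         # overlapping 30-day windows
```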
The non-meta-learning baseline uses identical architectures and hyperparameters, but the models do not have a meta-update outer loop (see Fig. 2b). This learning paradigm is similar to a single-step gradient descent learning approach 26, where we backpropagate the gradient of the loss function to update f θ. This learning mode, however, can be biased toward representations of tasks that have much more data. To ensure that a similar data structure is used in the baseline case, we compile the union of D_base and D_target.

Upscaling of global products. For the upscaling portion of this work, we use a similar set of meteorological and remote sensing inputs as during training, at either daily or monthly timesteps. Since VPD is not available in the existing ERA5 catalogue, we estimate it from air and dewpoint temperatures through the saturated (SVP) and actual vapor pressure (AVP) relation, VPD = SVP − AVP, where the two terms are functions of the air (Ta) and dewpoint (Td) temperatures, respectively. Finally, the spatial resolution of the resulting data inputs is harmonized to 0.25-degree using arithmetic averaging. The final product has four variables: the ensemble mean estimates of GPP and R eco, and their uncertainties as captured by the standard deviation.
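The VPD reconstruction can be sketched as below. The Magnus (Tetens-type) approximation for saturation vapor pressure is our assumption here, since the exact formulation used is not specified in the text:

```python
import math

def svp_kpa(t_c):
    # Magnus/Tetens approximation of saturation vapor pressure (kPa)
    # at temperature t_c in degrees Celsius (an assumed formulation)
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def vpd_kpa(ta_c, td_c):
    # VPD = SVP(Ta) - AVP, where the actual vapor pressure AVP equals the
    # saturation vapor pressure evaluated at the dewpoint temperature Td
    return svp_kpa(ta_c) - svp_kpa(td_c)
```

When the air is saturated (Ta = Td), the VPD is zero; drier air gives a larger deficit.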
Evaluation on the site and global level. First, we compare the performance of meta-trained versus non-meta-trained models in terms of their RMSE scores on the testing sets. In addition, we evaluate how robust meta-trained models are in predicting extreme fluxes. This is done by selecting GPP or R eco fluxes that exceed a predefined z-normalized threshold, t, which we vary between 1.0 and 2.0 (a higher threshold selects more extreme observations, further from the mean value). Next, we evaluate the upscaled product by analyzing its seasonality and interannual trends across climate zones. Thereafter, we compute the interannual variability using the interannual coefficient of variation (CV; Eq. 3) at the pixel level:

CV = σ / μ, (Eq. 3)

where σ and μ are the interannual standard deviation and mean, respectively.
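At a single pixel, Eq. 3 reduces to a ratio of the interannual standard deviation to the mean of the annual flux series, e.g.:

```python
from statistics import mean, stdev

def interannual_cv(annual_values):
    # Eq. 3: CV = sigma / mu over a pixel's annual flux series
    return stdev(annual_values) / mean(annual_values)
```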

Data records
The global products amount to around 50 GB and are freely accessible on Zenodo at https://doi.org/10.5281/zenodo.7761881 12. The spatial resolution is 0.25-degree, extending from 90 degrees north to 90 degrees south and from 180 degrees west to 180 degrees east. We mask out cold regions consisting of the Arctic circle and Antarctica. Each Network Common Data Form (NetCDF) file contains four variables, GPP, R eco, GPP_std, and R eco_std, which represent the GPP and R eco ensemble means and their uncertainties, respectively. Temporally, each file is resolved at either the daily or monthly timescale. For instance, Fig. 5 illustrates the annual ensemble mean, and Fig. 6 the ensemble uncertainties, of GPP and R eco for the year 2021. We note that GPP tends to have higher uncertainty than R eco, especially near the equator and in higher-latitude regions.
For the daily product, the naming convention for each .nc file is METAFLUX_GPP_RECO_daily_<year><month>.nc, where <year> takes a value between 2001 and 2021 and <month> between 01 and 12 for January through December.
For the monthly product, we perform identical training and upscaling steps but using monthly, rather than daily fluxes, reanalysis, and remote sensing products. The naming convention for each file is METAFLUX_GPP_ RECO_monthly_<year>.nc; where <year> takes a value between 2001 and 2021.
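The naming conventions above can be generated programmatically, for instance when batch-downloading files. The names follow the text; the helper functions themselves are our own:

```python
def daily_filename(year, month):
    # daily-resolution data, one file per year-month:
    # METAFLUX_GPP_RECO_daily_<year><month>.nc
    return f"METAFLUX_GPP_RECO_daily_{year}{month:02d}.nc"

def monthly_filename(year):
    # monthly-resolution data, one file per year:
    # METAFLUX_GPP_RECO_monthly_<year>.nc
    return f"METAFLUX_GPP_RECO_monthly_{year}.nc"

all_daily = [daily_filename(y, m)
             for y in range(2001, 2022) for m in range(1, 13)]
```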

Technical Validation
In this section, we first evaluate our meta-learning approach based on site-level validation RMSE and its robustness to extreme observations. Next, we examine the seasonality, interannual trend and variability, and correlation with independent SIF products.

Evaluation of meta-learning as a learning framework. Convergence and site-level performance. As illustrated in Fig. 7 and Table 1, meta-trained deep models generally perform better than their baseline non-meta-trained counterparts. For instance, the validation RMSE of the meta-trained MLP on GPP is 3.13 ± 0.06 gC m−2 d−1, compared to 3.47 ± 0.07 gC m−2 d−1 in the baseline case. A similar result is observed for R eco, where the RMSE of the meta-trained MLP is 3.07 ± 0.05 gC m−2 d−1, compared to 3.31 ± 0.07 gC m−2 d−1 in the baseline case. In addition, the choice of deep network matters. Overall, models that incorporate temporal information, i.e. the LSTM and BiLSTM models, perform better than models that do not. In the GPP case, for example, the non-meta-trained BiLSTM model has the lowest validation error of 3.00 ± 0.04 gC m−2 d−1, followed by the meta-trained LSTM model with an RMSE of 3.06 ± 0.06 gC m−2 d−1. This confirms our physical intuition that water stress, which tends to regulate productivity, builds up over many days to months and thus requires a memory process, as captured by the recurrent neural networks. Moreover, plant photosynthesis and respiration can acclimate to the prevailing environmental conditions, such as temperature, light, and VPD 31,32, which tend to be captured more effectively by memory-informed models. Nonetheless, the addition of bi-directionality in the BiLSTM model does not appear to significantly reduce error in the meta-trained models.
This may be because the concept of assimilating future context has already been captured through the meta-learning process itself, or because the signal from the unidirectional time series is already sufficient to parameterize the model. In other words, since our meta-learning approach primarily considers the spatial heterogeneity of the fluxes (e.g., across climate zones and PFTs), this spatial information, combined with the temporal signals coming from the BiLSTM gradient steps, results in less stable learning due to signal oversaturation, which is evident from the larger convergence spread across model runs. This could be regularized by considering not just spatial but spatiotemporal heterogeneity in the meta-learning approach 33, though this would increase the complexity of the algorithm and could potentially limit its extrapolation capacity. This remains the subject of future work.
Robustness under extreme conditions. Making accurate estimates for extreme cases is especially important in climate science because extreme weather tends to cause catastrophic damage, such as major droughts, wildfires, or plant mortality 34,35. Figure 8 illustrates the performance of our meta-trained models under an increasing magnitude of extremes, as defined by the z-normalized threshold, t. In general, our meta-trained models (orange line) are more robust in predicting extreme cases of observed GPP and R eco (i.e. lower validation RMSE) than their baseline counterparts (blue line), with a difference of around 1.2 gC m−2 d−1 and 0.7 gC m−2 d−1 for GPP and R eco, respectively. If we further examine model performance across climate zones (Tables 2, 3) and select extreme fluxes with a normalized-target threshold, t, greater than 1.0, we find that our meta-trained models outperform the baselines. We choose this threshold to retain sufficient extreme observations across climate zones so that a meaningful comparison can be made. In the GPP case, for example, the meta-trained ensemble has a lower validation RMSE of 3.78 ± 0.33 gC m−2 d−1 (versus 4.10 ± 0.29) and 3.04 ± 0.02 gC m−2 d−1 (versus 3.45 ± 0.06) in the semi-arid and tropical regions, respectively. A similar finding is observed for R eco, where the meta-trained ensemble has a lower validation RMSE of 2.35 ± 0.04 gC m−2 d−1 (versus 2.65 ± 0.06) in the tropics. These results are promising, as representing both the tropics and semi-arid regions in many upscaled products is often challenging due to the limited number of observations available and the complex, memory-like processes involved. For example, in semi-arid regions there is a build-up of time-dependent water stress 36, while in the tropics there is a complex seasonal cycle of leaf flushing and phenology 37,38.
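The extreme-flux evaluation described above can be sketched as follows: observations are z-normalized, those exceeding the threshold t are retained, and the RMSE is computed on that subset only (a minimal sketch with toy data):

```python
import math
from statistics import mean, stdev

def rmse_above_threshold(y_true, y_pred, t):
    # keep observations whose z-score exceeds t, then score only those
    mu, sigma = mean(y_true), stdev(y_true)
    pairs = [(yt, yp) for yt, yp in zip(y_true, y_pred)
             if (yt - mu) / sigma > t]
    if not pairs:
        return None  # no observations this extreme in the sample
    return math.sqrt(mean([(yt - yp) ** 2 for yt, yp in pairs]))
```

Raising t shrinks the evaluation subset toward ever more extreme observations, which is why a sufficiently low threshold (t > 1.0 in the text) is needed for per-climate-zone comparisons.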
Our approach is better able to reproduce carbon fluxes in the tropics and semi-arid areas because the limited data there are enriched with shared information from other, data-abundant regions through meta-learning.

Evaluation of meta-learned global data. Now that we have validated our meta-learning framework at the site level, we proceed to evaluate the internal consistency of our upscaled product. This includes the analysis of seasonality, interannual variability, and comparison to SIF as an independent photosynthesis product.

Temporal analysis. First, we analyze the seasonality of our upscaled GPP and R eco across months for the years between 2001 and 2021. As shown in Fig. 9, both fluxes exhibit similar seasonality, albeit at different magnitudes. The tropics (including the dry and wet regions) contribute the most to global GPP and R eco, as expected 39,40, while the semi-arid regions contribute the least 41. Carbon fluxes in the temperate (northern hemisphere) and continental regions exhibit unimodal variations that peak in the boreal summer (June, July, and August; JJA), while those in the southern temperate regions peak in December, January, and February (DJF) 42. On average, the temperate regions have higher carbon fluxes than the continental areas, which tend to be limited by light and temperature, with shorter growing seasons 43.
Another interesting analysis is to examine the long-term trends of our global carbon fluxes. As observed in Fig. 10, our meta-trained global carbon product shows an overall increase in GPP of 0.0113 PgC yr−1 and in R eco of 0.0101 PgC yr−1. We extend Fig. 10 by making a comparison with other carbon flux products, including those from the light response function (LRF) 44, P-model 45, MODIS17 (MOD17) 46, Soil Moisture Active Passive (SMAP) 47, vegetation photosynthesis model (VPM) 27, and Global LAnd Surface Satellite (GLASS) 48 for GPP, as well as Fluxcom 29 for both GPP and R eco. Overall, they show similar peaks and declines, albeit at varying magnitudes (between 100–140 PgC yr−1), as shown in Fig. 11.

Fig. 8 Comparison between the performance of meta-trained and baseline models at an increasing level of extreme (a) GPP and (b) R eco daily observations. The y-axis indicates the average RMSE when the ensemble predicts carbon fluxes with a z-value greater than the threshold t. Overall, we find that the meta-trained ensemble has lower RMSE across increasing extreme thresholds than its baseline counterpart.

Table 3. Robustness of the meta-trained ensemble when inferring extreme R eco observations across climate zones at the normalized-target threshold, t > 1.0 (all units in gC m−2 d−1). The numbers in bold represent the models with the lower RMSE between the baseline and meta-learning cases across different climate zones.

(2023) 10:440 | https://doi.org/10.1038/s41597-023-02349-y

Table 2. Robustness of the meta-trained ensemble when inferring extreme GPP observations across climate zones (columns: Climate zones, Mean extreme GPP, Baseline RMSE, Meta-trained RMSE).
Interannual variability. We find that the semi-arid regions of Australia and America, and some parts of the northern latitudes, have the largest interannual variability of GPP and R eco (Fig. 12). This is consistent with results from 2 and 3, which reported a significant contribution of these regions, particularly the Australian ecosystems, in explaining much of the global carbon interannual variability. The high turnover rate of carbon pools in these semi-arid environments therefore warrants further research into how climate and anthropogenic factors account for this large interannual variability, such as the extent of carbon stock decomposition (e.g. due to wildfire) and accumulation during the dry and wet seasons. In addition, our upscaled product shows high interannual variability in the dry tropical regions of Asia, which can be attributed to the region's sensitivity to rainfall patterns driven by the El Niño-Southern Oscillation (ENSO) or soil moisture 49,50, and to rapid land-use changes 51. This variation becomes smaller in the tropical forests of Asia, Africa, and America owing to their relatively stable climate. In contrast to Fluxcom, our upscaled product does not show as much interannual variability, especially in desert regions (e.g. Australia, Central America, South America, and Central Asia), which may be more accurate owing to the extremely low primary productivity there in the first place 52. Nonetheless, we note that in some parts of the globe, especially along the Sahel and in continental Western Europe, the interannual variability of carbon fluxes from MetaFlux is smaller. Physically, this phenomenon has been reported by 53 and 54, who observe that variations in terrestrial carbon productivity tend to be stronger in space than in time. A second plausible reason is that the ensemble captures much of this variability (i.e. expressed as the standard deviation), where each member model learns a different temporal structure, which can result in lower-than-expected mean interannual variability. Lastly, and as highlighted in Figs. 13, 14, meta-learning attempts to learn efficiently from historically underrepresented regions, such as the tropics, which tend to have low interannual variability. This potentially results in a reduction of such variability at higher latitudes, especially across the temperate and continental regions.
Comparison with solar-induced fluorescence (SIF). To evaluate the quality of the seasonal cycle of our product, in particular GPP, we measure its correlation coefficient with several SIF products. MetaFlux GPP demonstrates a higher Pearson correlation coefficient with both CSIF and TROPOSIF (Figs. 13, 14, Tables 4, 5) than Fluxcom GPP across the temperate, semi-arid, and tropical regions, with values of 0.8 to 0.9 or higher even at the very northern latitudes. In particular, the correlation coefficients of our upscaled product with TROPOSIF in the semi-arid, tropical, and temperate regions are 0.856 ± 0.083 (versus 0.726 ± 0.165), 0.546 ± 0.299 (versus 0.343 ± 0.164), and 0.919 ± 0.002 (versus 0.826 ± 0.021), respectively. A similar trend is observed in the CSIF case, where the correlations in the semi-arid, tropical, and temperate regions are 0.925 ± 0.026 (versus 0.914 ± 0.060), 0.772 ± 0.080 (versus 0.608 ± 0.105), and 0.922 ± 0.022 (versus 0.914 ± 0.037), respectively. Across the two SIF products, however, we observe weaker correlation strength in the continental regions. Upon further inspection of Figs. 13, 14, the weaker association could be due to the slight uptick of our GPP estimate during the DJF period, which can be attributed to the lower quality of LAI retrievals because of snow cover. Finally, we inspect the pixel-level correlation distribution of MetaFlux and Fluxcom with the long-term CSIF product, as illustrated in Fig. 15. In general, MetaFlux has lower correlation with SIF in the tropical rainforests of Indonesia and the Amazon, as well as in the arid regions of Australia, the Gobi, Arabian, Syrian, Karakum, and Taklamakan deserts, and the Great Plains in North America. This trend is consistent with earlier reports by 55, for example, who showed how arid and extremely wet tropical regions (e.g. rainforests) tend to have low GPP-SIF correlation because weak seasonality essentially drops correlations to the background noise level.
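Pixel-level correlation maps like Fig. 15 boil down to a Pearson coefficient between the GPP and SIF time series at each grid cell; a minimal implementation for one pixel:

```python
from math import sqrt

def pearson_r(x, y):
    # Pearson correlation between two equal-length time series (one pixel)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Regions with weak seasonality have little shared variance for the coefficient to latch onto, which is why both products drop toward noise-level correlations there.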
In summary, we have developed a new terrestrial carbon flux product, MetaFlux, using an ensemble of meta-learned deep networks. We have demonstrated that meta-learning can better estimate fluxes in data-sparse yet critical regions (e.g. the semi-arid regions and the tropics) and is more robust when predicting extreme observations. Our global product outperforms other reference products when evaluated against independent measurements, such as SIF, or on flux tower networks. We believe that although data sparsity can be a major limiting factor to our complete understanding of many climate processes, leveraging knowledge from similar domains can be a powerful way to better understand processes and their response to the environment.

Usage Notes
The data is permanently stored on Zenodo at https://doi.org/10.5281/zenodo.7761881 12 and is available at either the monthly or daily temporal scale. Each file contains four variables, GPP, R eco, GPP_std, and R eco_std, which are resolved continuously at a 0.25-degree spatial resolution. We purposely mask out the cold regions of Antarctica and the Arctic circle because we assume the absence of GPP and R eco there. Although we do not mask out the arid regions (e.g. deserts), we recommend that users do so in order to remove any artificial, though small, estimates. In addition, we do not estimate net ecosystem exchange (NEE) explicitly. One primary reason is that NEE fluxes (and, by extension, their magnitude of variability) are significantly lower than those of GPP and R eco, making estimation from the current input variables difficult; another is that the underlying drivers of GPP and R eco can differ, so separating the fluxes ensures better generalization across regimes. Nonetheless, since we use the night-time partitioning algorithm, which extrapolates respiration-based NEE estimates (where GPP is assumed to be absent during night-time) to the daytime 56, users can obtain an approximation of NEE by subtracting GPP from R eco (i.e., NEE ≈ R eco − GPP). However, this approximation is still subject to broader validation, which we leave for future work.

Table 5. Mean Pearson correlation coefficient for the seasonality of GPP (MetaFlux and Fluxcom) and TROPOSIF in the years 2019-2020 across climate zones. The numbers in bold represent the product with the higher GPP-SIF correlation.
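The NEE approximation described above is a simple per-cell difference; how users read the gridded variables from the NetCDF files is up to them, so only the arithmetic is shown here:

```python
def nee_approx(reco, gpp):
    # NEE ~ R_eco - GPP, per the night-time partitioning convention;
    # negative NEE indicates the ecosystem is a net carbon sink
    return [r - g for r, g in zip(reco, gpp)]
```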

Code availability
The meta-learning code is freely available and accessible at https://github.com/juannat7/metaflux. The repository contains notebooks that are customizable to one's needs beyond the scope of this work. Further questions, feedback, or comments can be directed to the corresponding author.