The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data

The FLUXNET2015 dataset provides ecosystem-scale data on CO2, water, and energy exchange between the biosphere and the atmosphere, and other meteorological and biological measurements, from 212 sites around the globe (over 1500 site-years, up to and including year 2014). These sites, independently managed and operated, voluntarily contributed their data to create global datasets. Data were quality controlled and processed using uniform methods, to improve consistency and intercomparability across sites. The dataset is already being used in a number of applications, including ecophysiology studies, remote sensing studies, and development of ecosystem and Earth system models. FLUXNET2015 includes derived-data products, such as gap-filled time series, ecosystem respiration and photosynthetic uptake estimates, estimation of uncertainties, and metadata about the measurements, presented for the first time in this paper. In addition, 206 of these sites are for the first time distributed under a Creative Commons (CC-BY 4.0) license. This paper details this enhanced dataset and the processing methods, now made available as open-source codes, making the dataset more accessible, transparent, and reproducible.


Background & Summary
For over three decades, the eddy covariance technique 1 has been used to measure land-atmosphere exchanges of greenhouse gases and energy at sites around the world to study and determine the function and trajectories of both ecosystems and the climate system. The technique allows nondestructive measurement of these fluxes at a high temporal resolution and ecosystem level, making it a unique tool. Based on high frequency (10-20 Hz) measurements of vertical wind velocity and a scalar (CO 2 , H 2 O, temperature, etc.), it provides estimates of the net exchange of the scalar over a source footprint area that extends up to hundreds of meters around the point of measurement. Soon after the first consistent measurement sites were operational, regional networks of sites were formed in Europe 2 and the US 3,4 , followed by similar initiatives in other continents [5][6][7] . Networks enabled the use of eddy covariance data beyond a single site or ecosystem for cross-site comparisons and regional-to-global studies [8][9][10][11][12][13][14][15] . These regional networks have evolved into long-term research infrastructures or monitoring activities, such as ICOS, AmeriFlux, NEON, AsiaFlux, ChinaFLUX, and TERN-OzFlux.
FLUXNET was created as a global network of networks [16][17][18] , a joint effort among regional networks to harmonize and increase standardization of the data being collected. It made possible the creation of global eddy covariance datasets. The first gap-filled, global FLUXNET dataset, which included derived, partitioned fluxes like photosynthesis and respiration, was the Marconi dataset 19 in 2000, with 97 site-years of data, followed by the 2007 LaThuile dataset 20 with 965 site-years of data, and finally in 2015 the FLUXNET2015 dataset 18,21 (hereafter FLUXNET2015) with 1532 site-years of data. Two main factors limited the numbers of sites and years included in each dataset: data policy and data quality. Willingness to share data under the selected data policy is a major reason why FLUXNET2015 likely only includes between 10-20% of existing sites globally-the total number of existing sites is still unknown. Then there is the evolution of processing pipelines and quality controls, leading to new issues being identified in the data that, if not solved in time, led to leaving out that data. The LaThuile dataset had a more restrictive policy and, in a few cases, previously undiscovered data issues, leading to fewer sites being included in FLUXNET2015. # A full list of authors and their affiliations appears at the end of the paper.

OPEN Data Processing Methods
The data contributed by site teams for inclusion in FLUXNET2015 encompassed fluxes, meteorological, environmental, and soil time series at half-hourly or hourly resolutions. Contributed data underwent a uniform data quality control process, with issues addressed in consultation with site teams. Data were then processed using the pipeline (Fig. 2) described in this section. The resulting data products were distributed through the FLUXNET-Fluxdata web portal 23 , where the usage of the dataset is tracked through a registration of all the Data sources. The first level of processing was completed by the site teams, including the calculation of half-hourly or hourly turbulent fluxes from high-frequency wind and concentration measurements, the averaging of meteorological variables sampled at shorter temporal resolutions, and the site team's own quality control procedures. The contributed fluxes were required to be submitted separately as turbulent and storage fluxes (components to be added for total flux), and not gap-filled or filtered for low turbulence conditions-see Aubinet et al. 24 for details. Control checks were implemented to ensure that this was the case, and to detect additional inconsistencies that site teams were asked to address, for example coherence of the timestamps, consistency across correlated variables, etc 25 . Starting from these data, we applied the same set of methods for all the remaining processing steps (from filtering to gap-filling and partitioning), increasing uniformity, and allowing quantification of the uncertainty introduced by the processing. The harmonization of data-especially of data quality-was a high priority while creating the dataset, and therefore extensive interaction with the site teams was necessary. Data were contributed through a regional network, and the formats provided by the networks were converted into a standard input for processing. This FLUXNET dataset led to the creation of a new cross-network specification for standard site data and metadata submission formats, now adopted by different regional networks.
Data processing pipeline overview. The data processing pipeline (Fig. 2) is divided into four main processing blocks. The first one is the data quality assurance and quality control (QA/QC) activities our team applied to data from all sites. This part was done with a combination of automated procedures and manual checks for all the variables in the contributed data. The next three blocks were all part of an automated pipeline that was executed separately for each site: the Energy & Water Fluxes processing block encompasses sensible and latent heat variables; the Carbon Fluxes processing block handles the variables for CO 2 fluxes like net ecosystem exchange (NEE); and, the Meteorological Variables processing block deals with all the meteorological measurements that are also used in the processing of fluxes and other products. At each processing step, a set of automated pre-and post-conditions are enforced, making sure the inputs and outputs of each step are within the expected behavior. The final step involved merging all the products generated at previous steps, and adding daily through yearly temporal aggregations of most of the variables in the dataset and related quality flags. At this step, automated checks were performed on all the variables to ensure consistency, and the final files with all the contributed and derived data products to be distributed are created. Supplementary Fig. SM2 shows the general steps involved in the processing in the sequence organized in the code, as available in the ONEFlux package 22 . All the steps have been implemented in the shared code except the Sundown partitioning (see Implementation Approach section for details).
Ensemble of results. The adoption of multiple methods for the same step (e.g., two methods for USTAR threshold calculation, or two or three methods for CO 2 flux partitioning, see below in this section for details) is motivated by the existence of different methods in the literature, using different assumptions and potentially  File format standardization. To process hundreds of sites, we needed consistent file formats that supported the input data and metadata. This led to multi-network agreements and creation of formats for data and metadata contribution to the regional networks 28,29 . These formats have now been adopted by networks in Europe and the Americas and by some instrument manufacturers, and are under consideration by other regional networks. In addition, automated extraction and conversion tools for direct format translation were implemented to work with data in older formats.
Data QA/QC steps. Data quality was checked before the processing started. If issues were identified that could not be resolved by the network-level data team, the site team was asked to suggest a course of action or send a new version of the data addressing the quality issue identified. The main data QA/QC steps were: single-variable checks, multi-variable checks, specialized checks, and automatic checks. Single-variable checks look at patterns in the time series of one variable at a time, for long-and short-term trends and other issues. Multi-variable checks look at the relationships among correlated variables (e.g., different radiation variables) to identify periods with disagreements. Specialized checks test for common issues in EC and meteorological data, like timestamp shifts or sensor deterioration. During this phase, a time series of top-of-the-atmosphere potential radiation (SW_IN_ POT) is also computed, using latitude/longitude coordinates and time 22 . These three types of checks are detailed in Pastorello et al. 25 . The automated checks apply variable-specific despiking routines adapted from Papale et al. 30 and apply a set of range controls per variable. This last step creates a series of flags that were discussed with the site managers for corrections and resubmissions and then used to filter the data in subsequent steps.
Meteorological products. The main processing applied to meteorological data was gap-filling by two independent methods: Marginal Distribution Sampling 31 (MDS) and ERA-Interim 32 . Data gap-filled by MDS (applied to all variables that are gap-filled) are identified by the _F_MDS suffix. Data gap-filled using ERA-Interim (2020) 7:225 | https://doi.org/10.1038/s41597-020-0534-3 www.nature.com/scientificdata www.nature.com/scientificdata/ downscaling (six variables that are available in the reanalysis dataset) have a _F_ERA suffix. The final gap-filled time series for variables combines both of these methods (indicated by an unqualified _F suffix), following a data-quality-based selection approach (see below). For SW_IN, in case of gaps or in case the variable was not measured, we performed a calculation from PPFD_IN when available, calculating the conversion factor from the periods of overlap of the two measurements (and assuming a factor 0.48 J (µmol photon) −1 when the sensors did not run in parallel at the site).
MDS. The MDS method, introduced in Reichstein et al. 31 , is applied to all variables that may be gap-filled. It works by seeking meteorological conditions physically and temporally similar to the ones for the missing data point(s). The restrictions on the size of the time window and which variables must be available are incrementally relaxed until a suitable set of records is found to fill the gap in the target variable. All values of the target variable satisfying the current set of conditions are averaged to generate the fill value. The method was applied as described in the original implementation 31 , using SW_IN, TA and VPD as drivers. The basic three scenarios for the time when the target variable is missing are: (i) all three drivers are available; (ii) only SW_IN is available; and (iii) all three drivers are missing. Based on the available co-located variables, a search for similar conditions is started, keeping the searching window as small as possible to avoid changes in other slow-changing drivers (phenology, water availability, etc.). The more variables missing and the larger the time window, the lower the confidence indicated by the _F_MDS_QC flag. The values for this flag are (0-3): _F_MDS_QC = 0 (measured); _F_MDS_QC = 1 (filled with high confidence); _F_MDS_QC = 2 (filled with medium confidence); _F_MDS_QC = 3 (filled with low confidence). For details on the implementation, see the original paper 31 and the ONEFlux source code 22 .
ERA-Interim. This method is based on the ERA-Interim (ERA-I) Reanalysis global atmospheric product 33,34 created by the European Centre for Medium-Range Weather Forecasts (ECMWF)-ERA stands for ECMWF Re-Analysis. Applied to the subset of variables that are also available in the ERA-Interim product, the method involves a spatial and temporal downscaling process using the measured variable at the site. The ERA-I variables that were used are: air temperature at 2 m (t2m, K), incoming shortwave solar radiation at the surface (Sw, W m −2 ), dew point temperature at 2 m (dt2m, K), wind speed horizontal components at 10 m (u10 and v10, m s −1 ), total precipitation (Pr, m of water per time step), and incoming longwave solar radiation at the surface (Lw, W m −2 ). The gap-filling procedure harmonizes units, identifies periods that are long enough to allow a linear relationship to be built, a simple debiasing of the linear relationship, evaluation of the diurnal cycle in the subset of variables, and other evaluations of the results to identify potential missing or incorrect information (e.g., coordinate or temporal mismatches). The linear relationships are built taking into account instantaneous and averaged variables, and then applied to the whole ERA-I record, generating the spatially (coordinate-based) and temporally (diurnal cycle-based) downscaled version of each variable. The method was applied as in the original implementation; additional details can be found in Vuichard and Papale 32 .
Final gap-filled product. Measured or high quality gap-filled records using MDS (_F_MDS_QC < 2) are used in the final gap-filled products (_F suffixed variables, without _MDS or _ERA). If the variable has a low quality gap-fill flag (2 or 3), the ERA-I product is used instead. The final quality flag (_F_QC) is 0 for measured, 1 for high quality fill using MDS, and 2 for data gap-filled with the ERA-I downscaled product. A gap-filled version of CO 2 concentration is also generated (CO 2 _F_MDS) using the MDS method as described above, including the corresponding quality flag.
Energy and water products. The main data products associated with energy and water fluxes are the gap-filled versions of the data and the estimation of a version ensuring the energy balance closure and estimating its uncertainty-for a description of the issue see Stoy et al. 35 Turbulent energy fluxes (sensible and latent heat, H and LE, respectively) are gap-filled using the MDS method 31 described above. From LE, it is possible to calculate the water flux (evapotranspiration) using the latent heat of vaporization. An energy balance corrected version of LE and H is also created, a data product often needed when data are used in model parameterization and validation for which the closure of the energy balance is prescribed. There is no general agreement on the reasons and approaches to correct the imbalance in the energy budget within EC measurements. In this product, the methodology used to calculate the energy balance corrected fluxes is based on the assumption that the Bowen ratio is correct 36 . Fluxes are corrected by multiplying the original, gap-filled LE and H data by an energy balance closure correction factor (EBC_CF, in the dataset). The correction factor is calculated starting from the half-hours where all the variables needed to estimate energy balance closure are available (measured NETRAD and G, and measured or good-quality gap-filled H and LE). The correction factor for each single half-hour is calculated as in Eq. (1), but is not applied directly.
First, to avoid transient conditions, the calculated EBC_CF time series is filtered by removing values outside of 1.5 times its own interquartile range. Then, the correction factor used in the calculations is obtained using one of three methods, applied hierarchically (see also diagram in Supplementary Fig. SM3): • EBC_CF Method 1: For each half-hour, a sliding window of ±15 days (31 days total) is used to select half-hours between time periods 22:00-02:30 and 10:00-14:30 (local standard time). These time-of-day restrictions aim at removing sunrise and sunset time periods, when changes in ecosystem heat storage (not measured) are more significant, preventing energy balance closures. For all half-hours meeting these criteria, the corresponding EBC_CFs are selected and used to calculate the corrected values of H and LE for the halfhour processed (center of the sliding window), generating a pool of values for each of these two variables.
From each of these two pools, the 25th, 50th (median), and 75th percentiles are extracted for their corre-  37 and then aggregated at the other temporal resolutions. The random uncertainty (indicated by the suffix _RANDUNC) in the measurements is estimated using one of two methods, applied hierarchically: • H-LE-RANDUNC Method 1 (direct standard deviation method): For a sliding window of ±5 days and ±1 hour of the time-of-day of the current timestamp, the random uncertainty is calculated as the standard deviation of the measured fluxes. The similarity in the meteorological conditions evaluated as in the MDS gap-filling method 31 and a minimum of five measured values must be present; otherwise, method 2 is used. The joint uncertainty for H and LE is computed from the combination of the uncertainty from the energy balance closure correction factor and random uncertainty. CO 2 products. The processing steps applied to CO 2 fluxes were: calculation of net ecosystem exchange (NEE) from CO 2 turbulent and storage fluxes, applying a spike detection algorithm, filtering for low turbulence conditions using multiple friction velocity (USTAR) thresholds, gap-filling of all NEE time series generated by an ensemble of USTAR thresholds, estimation of random uncertainty, and partitioning of NEE into its ecosystem respiration (RECO) and gross primary production (GPP) components.
Calculation of NEE. CO 2 storage fluxes (SC) express the change of CO 2 concentration below the measurement level of the eddy covariance system within the half-hour. NEE was calculated as the sum of the CO 2 turbulent fluxes (FC) and SC. Both FC and SC are part of the required data contributed by site teams. SC is usually estimated using a profile system 38 . If SC was not provided or missing, two cases were implemented: for measurement heights lower than 3 m and short canopies, the SC term was considered to be 0; for taller towers/canopies, a discrete estimation based on the top measurement of CO 2 concentration was used to compute SC 39 .
Despiking of NEE. Although the processing of high frequency data into half-hourly fluxes usually includes steps to remove spikes from instantaneous measurements, spikes can also occur in the half-hourly data. The method described in Papale et al. 30 , based on the median absolute deviation (MAD) with z = 5.5, was applied to filter NEE for residual spikes that were removed.
USTAR threshold estimation and filtering. Filtering for low turbulence conditions is necessary when there is not enough turbulence, causing the ecosystem flux to be transported by advective flows and missed by both the eddy covariance system and the storage profile, resulting in underestimated fluxes. Despite different approaches having been tested to measure and quantify horizontal and vertical advection 40 , the most often used method to avoid the www.nature.com/scientificdata www.nature.com/scientificdata/ underestimation of fluxes is removing the data points potentially affected by strong advection 1 . These points are identified using the friction velocity (USTAR) as an indicator of turbulence strength, defining a threshold value under which NEE measurements are discarded and replaced by gap-filled estimates.
This USTAR threshold is linked to the canopy structure, measurement height, wind regimes, and other factors specific to an individual site. It is estimated using nighttime NEE measurements (only ecosystem respiration), based on the dependency between USTAR and NEE at similar temperatures and periods of the year (main drivers of ecosystem respiration). Under these conditions, NEE is assumed (and expected) to be independent from USTAR, which is not a driver of respiration. However, in most sites below a certain USTAR threshold value, NEE is found to increase with USTAR; this USTAR value is selected as the threshold to define conditions with reduced risk of flux underestimation. Different methods have been proposed to estimate the USTAR thresholds and the related uncertainty as to how the approach works at a specific site 1 .
CP and MP USTAR threshold methods. Two methods to calculate USTAR thresholds were used: change-point-detection (CP) proposed by Barr et al. 41 and a modified version of the moving-point-transition (MP) described originally by Reichstein et al. 31 and Papale et al. 30 . Both methods are similar in terms of data selection, preparation and grouping and aim to estimate the USTAR threshold value. Measurements collected when USTAR is below the threshold are removed. The difference between these methods is in how this threshold value is estimated. For both methods, the nighttime data of a full year are divided in four three-month periods (seasons) and 7 temperature classes (of equal size in terms of number of observations). For each season/temperature group the data are divided into 20 USTAR classes (also with equal number of observations) and the average NEE for each USTAR class is computed. The calculation of the threshold uses each of the methods (see below for details on their differences). For each season, the median value of the 7 temperature classes is calculated and a final threshold is defined by selecting the maximum of the 4 seasonal values.
The CP method uses two linear regressions between NEE and USTAR, the second with an imposed zero slope. The change point is defined as where the two lines cross, i.e., constraining the shape of the NEE-USTAR dependency. The method is extensively used to detect temporal discontinuities in climatic data. Details can be found in Barr et al. 41 .
For the MP method 30,31 , the mean NEE value in each of the 20 USTAR classes is compared to the mean NEE measured in the 10 higher USTAR classes. The threshold selected is the USTAR class in which the average nighttime NEE reaches more than 99% of the average NEE at the higher USTAR classes. An improvement of the MP method was implemented here for robustness over noisy data, by adding a second step to the original MP implementation: when a threshold is selected, it was tested to ensure it was also valid for the following USTAR class. In other words, assuming that Eq. (3) holds, where x is one of the 20 USTAR classes and NEE USTAR x _ ( ) is the average NEE for that USTAR class.
> . × + The USTAR value associated to the x th -class was selected as threshold only if Eq. (4) also holds, to confirm that the plateau where NEE is USTAR-independent was reached. If not, the search for the plateau and threshold continued toward higher USTAR values.
Bootstrapping USTAR threshold estimation. For each of the two methods, a bootstrapping technique was used. The full dataset (year of measurement) was re-sampled 100 times with the possibility to select the same data point multiple times (i.e., with replacement), creating 100 versions of the dataset. The threshold values were calculated for each of them, obtaining 100 threshold values per method (CP and MP) and year, for a total of 200 USTAR threshold estimates for each year. This process and next steps are illustrated in Fig. 3. These 200 threshold values represent the uncertainty in the threshold estimation that could also impact the uncertainty of NEE. It is worth noting that there is not always a direct relationship between the threshold and NEE uncertainties. It is possible, for instance, that a small variability in the thresholds has a strong effect on NEE or, conversely, with NEE almost insensitive to the threshold value. This is related to the site characteristics (USTAR variability) and to the level of difficulty in filling the gaps created by the filtering. There are cases where not enough data are present to calculate a USTAR threshold (for both the CP and MP methods) or where it is not possible to identify a clear change point (CP method only). This leads to the uncertainty being underestimated (fewer or no USTAR threshold values available). This should be considered as a general indication of difficulties in the application of the USTAR filtering for the specific sites or years. Sites and years where these conditions occurred are reported in the SUCCESS_RUN variable in the AUXNEE product (values 1: threshold found, 0: failed/no threshold found). • Variable USTAR Threshold (VUT): The thresholds found for each year and the years immediately before and after (if available) have been pooled together, and from their joint population, the final 40 thresholds extracted. With that, the USTAR thresholds vary from year to year; however, they are still influenced by neighboring years. This is identified in FLUXNET2015 variables by the "_VUT" suffix; • Constant USTAR Threshold (CUT): Across years, all the thresholds found have been pooled together and the final 40 thresholds extracted from this dataset. With that, all years were filtered with the same USTAR threshold. This is identified in FLUXNET2015 variables by the "_CUT" suffix.
If the dataset includes up to two years of data, the two methods give the same result, and only the _VUT is generated.
For both the VUT and CUT approaches, 40 NEE datasets have been created, filtering the original NEE time series using 40 different USTAR values estimated as explained above. The values of the thresholds are reported in the AUXNEE product file. These 40 NEE versions have been used as the basis for all the derived variables provided. An example of the variability of the two methods (CP and MP) is shown in Fig. 4, contrasting the distribution of the bootstrapped results for each method, showing comparable values for some years and divergent values for other years (of the same site). This highlights the importance of applying both methods in this ensemble-like way.
Filtering NEE based on USTAR thresholds. The USTAR thresholds are applied to daytime and nighttime data, removing NEE values collected when USTAR is below the threshold and removing also the first half-hour with high turbulence after a period of low turbulence to avoid false emission pulses due to CO 2 accumulated under the canopy and not detected by the storage system (in particular, when a profile is not available at the site). The USTAR filtering is not applied to H and LE, because it has not been proved that when there are CO 2 advective fluxes, these also impact energy fluxes, specifically due to the fact that when advection is in general large (nighttime), energy fluxes are small. Figure 5 shows the range of thresholds found (interquartile ranges) across sites in FLUXNET2015. While some sites had low thresholds and low variability in the USTAR thresholds, others show large ranges of values in some more extreme cases (indicating difficulties in estimating the "real" threshold). www.nature.com/scientificdata www.nature.com/scientificdata/ Gap-filling of NEE. Existing gaps from instrument or power failures are further increased after QC and USTAR filtering. The time series with gaps need to be filled, especially before aggregated values can be calculated (from daily to annual). Moffat et al. 26 compared different gap-filling methods for CO 2 concluding that most of the methods currently available perform sufficiently well with respect to the general uncertainty associated with the measurements. The method implemented here is the Marginal Distribution Sampling (MDS) method already described in the meteorological products.
Selection of reference NEE variables. After filtering NEE using the 40 USTAR thresholds and gap-filling, 40 complete (gap-free) NEE time series were available for each site. For each half hour, it is possible to use the 40 values to estimate the NEE uncertainty resulting from the USTAR threshold estimation (reported as percentiles Fig. 4 Example of the distribution of USTAR thresholds calculated for each year using the MP 30 method in blue and CP 41 method in green for the US-UMB site (dark green where they overlap). All these thresholds were pulled together to extract the CUT final 40 thresholds, while for the VUT thresholds, each year was pulled with the two immediately before and after (e.g., 2005 + 2006 + 2007 to extract the 40 thresholds to be used to filter 2006). Note that the level of agreement between methods and between subsequent years is variable, justifying the approach that propagates this variability into uncertainty in NEE. www.nature.com/scientificdata www.nature.com/scientificdata/ of the NEE distribution, identified by the "_XX" numeric suffix) and the average value (identified in the dataset by the "_MEAN" suffix). Since the average value has a smoothing effect on the time series, an additional reference value of NEE was selected and identified in FLUXNET2015 variables by the "_REF" suffix, in an attempt to identify which of the 40 NEE realizations was the most representative of the ensemble. The "_REF" NEE was selected among the 40 different NEE instances in this way: (1) the Nash-Sutcliffe model efficiency coefficient 42 was calculated between each NEE instance and the remaining 39; (2) the reference NEE has been selected as the one with the highest model efficiency coefficients sum, i.e., the most similar to the other 39. Note that determining the reference NEE is done independently for variables using VUT and CUT USTAR thresholds, as well as for each temporal aggregation. Therefore, the version selected as REF could be different for different temporal resolutions. For instance, NEE_VUT_REF at half-hourly resolution might have been generated using a different USTAR threshold than NEE_VUT_REF at daily resolution. Information on which threshold values were used for each version and temporal aggregation can be found in the auxiliary products for NEE processing (AUXNEE). In addition to the reference NEE, the NEE instance obtained by filtering the data with the median value of the USTAR thresholds distribution is also included. This NEE is identified in FLUXNET2015 variables by the "_USTAR50" suffix (for both CP and MP methods, and both VUT and CUT approaches) and is stable across temporal aggregation resolutions. Individual percentiles of the USTAR thresholds distribution are reported in the AUXNEE file (40 instances for CUT and 40 per year for VUT).
Random uncertainty for NEE. In addition to the uncertainty estimates based on multiple thresholds for USTAR filtering, the random uncertainty for NEE is also estimated based on the method used by Hollinger & Richardson 37 . Variables expressing random uncertainty are identified by the suffix _RANDUNC. One of two methods are used to estimate random uncertainty, applied hierarchically: The joint uncertainty for NEE is computed from the combination of the uncertainty from multiple USTAR thresholds and random uncertainty. These variables identified by the _JOINTUNC suffix and are computed for NEE filtered using the VUT method as in Eq. (5), and similarly for NEE filtered with the CUT method.
The 16th and 84th percentiles are used because they are equivalent to ±1 Standard Deviation in case of a normal distribution. (Note on temporal aggregations: joint uncertainties for NEE are recomputed at all temporal resolutions.) CO 2 flux partitioning in GPP and RECO. Partitioning CO 2 fluxes from NEE into estimates of its two main components, Gross Primary Production (GPP) and Ecosystem Respiration (RECO), was done by parameterizations of models using measured data. All sites were partitioned with the nighttime fluxes method 31 (_NT suffixes) and the daytime fluxes method 43 (_DT suffixes), while a third method, sundown reference respiration 44 (_SR suffixes), was applied to all sites meeting the method's requirements (e.g., high quality storage measurement).
The nighttime method uses nighttime data to parameterize a respiration-temperature model that is then applied to the whole dataset to estimate RECO. GPP is then calculated as the difference between RECO and NEE. The parameterization uses short windows of time (14 days) to account for the dynamic of other important respiration drivers such as water, substrate availability, and phenology (see Reichstein et al. 31 for details on the implementation and ONEFlux 22 for the code).
The daytime method uses daytime and nighttime data to parameterize a model with one component based on a light-response curve and vapor pressure deficit for GPP, and a second component using a respiration-temperature relationship similar to the nighttime method. In this case, NEE becomes a function of both GPP and RECO, both of which are estimated by the model. Similarly to the nighttime method, the parameterization is done for short windows (8 days) to take into consideration other slower-changing factors (see Lasslop et al. 43 for details on the implementation and ONEFlux 22 for the code).
For forest sites where a CO 2 concentration profile for storage fluxes was available, an additional RECO estimate was calculated using the method from van Gorsel et al. 44 , with variables identified by the _SR suffix. In this method, the parameterization of a respiration-temperature model is based solely on data acquired just after sundown, aiming at excluding the measurements potentially affected by advection and also assuming that in the first hour of the evening the advective transport is not yet established.
The sundown partitioning method requires that the NEE is not filtered for low turbulence conditions (USTAR), and for this reason it was applied only to the original time series. The nighttime and daytime methods instead require NEE filtered for low turbulence conditions. For this reason they were applied to all the 40 NEE (2020) 7:225 | https://doi.org/10.1038/s41597-020-0534-3 www.nature.com/scientificdata www.nature.com/scientificdata/ versions resulting from the 40 USTAR thresholds, obtaining 40 versions of GPP and RECO for each of the two partitioning methods, propagating the uncertainty from NEE to GPP and RECO. This has been done for both the CUT and VUT filtering methods.
Similarly to NEE, the 40 GPP and RECO estimates (for each method and for CUT and VUT) have been used to calculate the percentiles of their distribution for each timestep (describing their uncertainty due to the NEE uncertainty). The average value (_MEAN) and the reference value use the same model efficiency approach used for NEE for each temporal aggregation. Similarly, NEE filtered with the median USTAR value (_USTAR50) has been partitioned into GPP and RECO. Information on the threshold values used for all versions of GPP and RECO (_NT and _DT, _VUT and _CUT, HH to YY resolutions) are in the auxiliary files for NEE processing (AUXNEE). Variables for reference GPP and RECO are also identified by a _REF suffix. The two methods for the partitioning (three for the cases in which the sundown method is applied) are not merged in any way, because their difference is informative with the respect to the uncertainty of the methods, as in the case of a model comparison exercise. Implementation approach. To increase the traceability of changes between versions of datasets and reduce uncertainty stemming from choices made at implementation time, we favored using original code implementations or thoroughly validated re-implementations of original codes. Thus, our code organization strings together loosely coupled components which implement each step, with clear-cut interfaces between steps. This modular approach eases the maintenance and change efforts for any individual step, but adds complexity to evaluating changes for the entire pipeline. Different programming languages (Python, C, MATLAB and IDL, plus PV-WAVE for FLUXNET2015) were used to implement the different steps, all connected using a controller code that makes appropriate calls in the correct order. The ONEFlux 22 code collection replaced the PV-WAVE code with a re-implementation in Python, and also collates most of these steps into a cohesive pipeline (see also the Code Availability section). The IDL code, which applies the sundown partitioning method 44 , is not yet currently implemented in ONEFlux, because some additional testing and development are needed to make it robust and more suitable for general application. Implementation details of individual steps are discussed next, with references to the outputs each step identified by an execution sequential number and the step name-e.g., 01_qc_visual contains the results of the first processing step, the visual check step. Each of these steps correspond to a code module. Supplementary Fig. SM2 shows the steps and their inter-dependencies.
Steps implemented in python. The main controller code for ONEFlux is implemented in Python. Besides being the glue code that executes each step, pre-and post-checks are also executed before and after each step. These checks guarantee that the input data meet the minimum requirements to run the step, that the minimum expected outputs were generated by the execution of the step, and that any errors or exception conditions were handled correctly. Information about execution is recorded in a log for the entire pipeline, along with logs for individual steps. Besides the controller code, two of the three flux partitioning steps were re-implemented in Python (the nighttime and daytime methods, 10_nee_partition_nt and 11_nee_partition_dt), together with other specific steps such as data preparation for the uncertainty estimates (12_ure_input), and the creation and checking of final products (99_fluxnet2015). The original flux partitioning implementation in PV-WAVE was used for the LaThuile2007 and FLUXNET2015 datasets. Also, the tool for the downscaling of the ERA-I meteorological data is implemented in Python and runs on a server connected to the ERA data.
Steps implemented in C. Several steps are implemented in the C programming language, allowing better control over execution performance of these steps. These steps include: automated QA/QC flagging (02_qc_auto), USTAR threshold estimation using the MP method (04_ustar_ mp), the filtering and gap-filling of meteorological data, including the merging with the ERA-I downscaled data (07_meteo_proc), the filtering and gap-filling of CO 2 fluxes (08_nee_proc), the filtering, gap-filling, and energy corrections of energy fluxes (09_energy_proc), and the computation of uncertainty products (12_ure). The source codes and the compiled executables are provided for steps implemented in C, as well as build procedures in make/ Makefile format.
Steps implemented in MATLAB. The estimation of USTAR thresholds using the CP method (05_ustar_cp) is the only step implemented in MATLAB. It is distributed both as source code and compiled code to be used with the MATLAB Runtime Environment, such that it does not require a license purchase.

Data records
The FLUXNET2015 portion presented in this paper contains 1496 site-years of data from 206 sites 45-250 , characterizing ecosystem-level carbon and energy fluxes in diverse ecosystems across the globe (Fig. 1, Supplementary  Fig. SM1 251,252 ), spanning from the early 1990s to 2014, with 69 sites having decade-long records. The dataset covers the distribution of ecosystem fluxes as reported in the recent meta-analyses 253,254 (Fig. 6).
The dataset is distributed in files separated by sites, by temporal aggregation resolutions (e.g., hourly, weekly), and by data products (e.g., FULLSET with all the variables and SUBSET designed for less experienced users). All data files for a site are available for download as a single ZIP file archive with site-specific DOI. The file-naming conventions details these options for each file (Table 1). Site metadata are also available as a single file containing metadata for all sites, detailed later in this section and Supplementary Table SM8. Note that DOIs are assigned at the site level, one DOI per site for all of that site's products. A DOI was not assigned to the whole FLUXNET2015 dataset, since this would make citation and assigning credit imprecise and hard to track.
The FLUXNET2015 dataset provides data at five temporal resolutions. Site teams contribute either half-hourly (HH) or hourly (HR) datasets, depending on the integration/aggregation time decided by the site managers and (2020) 7:225 | https://doi.org/10.1038/s41597-020-0534-3 www.nature.com/scientificdata www.nature.com/scientificdata/ function of the characteristics of the turbulence. References to half-hourly in this paper also apply to hourly data, unless explicitly stated otherwise. Half-hourly data are the basis of all the processing done for this dataset and are the finest grained temporal resolution provided. Coarser aggregations are generated uniformly from half-hourly data within the data processing pipeline. The other standard temporal aggregations are: daily (DD), weekly (WW), monthly (MM), and yearly (YY).
The complete output from the data processing pipeline includes over 200 variables-among which are measured and derived data, quality flags, uncertainty quantification variables, and results from intermediate data processing steps. The variable names follow the naming conventions of <BASENAME>_<QUALIFIER>, where BASENAME describes the physical quantities (e.g., TA, NEE, Table 2) and QUALIFIER describes the information of processing methods (e.g., VUT, CUT), uncertainties (e.g., RANDUNC), and quality flags (e.g., QC) (see Supplementary Table SM1).
To serve the users with an easier-to-use data product, we created two variants with different selections of variables for data distribution: the FULLSET with all the results and variables; and the SUBSET, designed to help non-expert users, with a reduced set of variables that should fit most needs.
• FULLSET: variables generated by the processing such as uncertainty quantification variables, all variants of the data products, all quality information flags, and many variables generated by intermediate processing steps to allow in-depth understanding of individual processing steps and their effect in the final data products. A summary of the main variable basenames is in Table 2, while a full list of variables is provided in Supplementary Table SM1. Key features of the FULLSET version are: www.nature.com/scientificdata www.nature.com/scientificdata/ • Meteorological variables filled with multiple gap-filling methods (e.g., MDS, ERA) are provided separately. • NEE versions filtered with two different methods of extracting the USTAR thresholds (i.e., CUT, VUT) are provided. Multiple percentiles and reference NEE are also provided. • GPP and RECO partitioned from NEE filtered with VUT and CUT methods, using both daytime and nighttime partitioning methods (i.e., NT, DT). Multiple percentiles and reference GPP and RECO are provided. • LE and H gap-filled, adjusted and non-adjusted for energy balance closure, are both provided.
• Random, methodological, and joint uncertainties for NEE, GPP, RECO, LE, and H are provided.
• SUBSET: Includes a subset of the data product. The selection of the variables for this data product was done based on the expected usage for most users and to help less experienced users. Although the number of variables used is reduced, they are still accompanied by a set of quality flags and uncertainty quantification variables essential to correctly interpret the data. Key features of the SUBSET version are: • Only the consolidated gap-filled meteorological variables are provided. DATA_PRODUCT Grouping of variables from release included in file.
• SUBSET: Core set of variables with quality and uncertainty information needed for general uses of the data • FULLSET: All variables, including all quality and uncertainty information, and key variables from intermediate processing steps  www.nature.com/scientificdata www.nature.com/scientificdata/ • LE and H gap-filled, adjusted and non-adjusted for energy balance closure, are both provided.
• Random and methodological uncertainties for NEE, GPP, RECO, LE, and H are provided.
The variable proposed in the SUBSET product is NEE_VUT_REF since it maintains the temporal variability (as opposed to the MEAN NEE), it is representative of the ensemble, and the VUT method is sensitive to possible changes of the canopy (density and height) and site setup, which can have an impact on the turbulence and consequently on the USTAR threshold. The RECO and GPP products in SUBSET are calculated from the corresponding NEE variables filtered with the VUT method, generating RECO_NT_VUT_REF and RECO_DT_VUT_REF for RECO, and GPP_NT_VUT_REF and GPP_DT_VUT_REF for GPP. It is important to use both daytime (DT) and nighttime (NT) variables, and consider their difference as uncertainty.
Auxiliary data products provide extra information on specific parameters of the data processing pipeline. The groups of products are: • AUXMETEO: Auxiliary data product containing information about the downscaling of meteorological variables using the ERA-Interim reanalysis data product (TA, PA, VPD, WS, P, SW_IN, and LW_IN). Variables in these files relate to the linear regression and error/correlation estimates for each data variable used in the downscaling.  .xlsx) for all sites for each data product resolution (see Table 1 for resolution options). The metadata follow the Biological, Ancillary, Disturbance, and Metadata (BADM 255,256 ) standards and are provided in the BADM Interchange Format 29 (BIF). Table 3 illustrates the type of metadata included with selected metadata variables (See full lists and descriptions of the metadata in Supplementary Tables SM2-SM7). Height and instrument models for the flux variables, as well as soil temperature and moisture depths, are reported in the Variable Information metadata.

Technical Validation
Eddy covariance measurements offer a direct method to estimate trace gas or energy exchanges between surface and atmosphere at an ecosystem scale (approximately up to 1 km around the measurement point). This makes eddy covariance difficult to compare with other methods. Nonetheless, eddy covariance data have been extensively used in numerous scientific papers and studies that indirectly validate their reliability and usefulness. Hundreds of articles have been published based on eddy covariance measurements; examples of multi-site studies using FLUXNET2015 data include Jung et al. 15 , Tramontana et al. 257 , and Keenan et al. 258 .
Eddy covariance data were evaluated with respect to other methods such as inventory and chambers by Campioli et al. 259 , who showed that "EC [eddy covariance] biases are not apparent across sites, suggesting the effectiveness of standard post-processing procedures. Our results increase confidence in EC …". The approach of Campioli et al. 259 requires sites that have several additional (and rare) pieces of information; therefore, it is not generally applicable, particularly not across the sites used in this study. However, the eddy covariance site teams co-authoring this paper have compared and technically validated their measurements with respect to knowledge of their site. Unavoidably, measurement and processing uncertainties exist, and can be large for certain sites and ecosystem conditions. However, in general, flux values provided in this dataset are consistent with expectations, and eddy covariance remains one of the more reliable techniques for assessing land-air exchanges at ecosystem scales.

Usage Notes
Detailed documentation on how to use and interpret FLUXNET2015 is available online at https://fluxnet.fluxdata.org/data/fluxnet2015-dataset/. Here, we present some of the main points to guide the usage of the data. risks in the application of standard procedures. When standardized procedures are applied across different sites, the possible differences owing to data treatment are avoided or minimized; this is one of the main (2020) 7:225 | https://doi.org/10.1038/s41597-020-0534-3 www.nature.com/scientificdata www.nature.com/scientificdata/ goals of FLUXNET2015 and ONEFlux. However, there is also the risk and possibility that the standard methods don't work properly or as expected at specific sites and under certain conditions. This is particularly true for the CO 2 flux partitioning, which as with all models is based on assumptions that could not always be not valid. For this reason, it could be necessary to contact the site PIs that are listed in Supplementary Table SM9.
Using the QC flags. There are quality-control flag variables in the dataset to help users filter and interpret variables, especially for gap-filled and process knowledge-based variables. These flags are described in the variable documentation (Supplementary Table SM1). It is highly recommended that one carefully considers the QC flags when using the data.

Percentile variants for fluxes and reference values.
For most flux variables, there are reference values and percentile versions of the variables to help understand some of the uncertainty in the record. For NEE, RECO, and GPP, the percentiles are generated from the bootstrapping of the USTAR threshold estimation step, i.e., they characterize the variability from a range of values obtained as USTAR thresholds. In addition, three different reference values are provided ("_MEAN", "_USTAR50" and "_REF") in order to cover different user needs. In general the "_REF" version should be the most representative, particularly if related to the percentiles. It is, however, important to clearly refer to which NEE version is used in order to ensure reproducibility. For the energy balance corrected H and LE variables, the percentiles indicate the variability due to the uncertainty in the correction factor applied. Similarly to NEE, there are gap-filled and energy balance corrected versions of H and LE variables; therefore, it is also important to clearly refer to which version is used. The SUBSET version of the dataset includes a reduced number of variables, selected for non-expert users. We encourage users to carefully evaluate their requirements and options in the dataset, and if needed to contact regional networks, site teams, or even co-authors of this article for help and recommendations. For more detail, see the Methods section above.
Temporally aggregated resolutions. All data products are provided at multiple temporal resolutions where feasible. The finest resolution is either hourly or half-hourly (indicated by the filename tags HR and HH, respectively). These data are then aggregated into daily (DD), 7-day weekly (WW), monthly (MM), and yearly (YY) resolutions, with appropriate aggregations for each variable, such as averaging for TA and summation for P.
Timestamps. Timestamps in the data and metadata files use the format YYYYMMDDHHMM, truncated at the appropriate resolution (e.g., YYYYMMDD for a date or YYYYMM for a month). Two formats of time associated with a record are used: (1) single timestamp, where the meaning of the timestamp is unambiguous, and, (2) a pair of timestamps, indicating the start and end of the period the timestamps represent.

Time zones.
To allow more direct site comparability, all time variables are reported in local standard time (i.e., without daylight saving time). The time zone information with respect to UTC time is reported in the site metadata.
Numeric resolution. The floating point numbers are maintained at their original resolution throughout processing steps, using double precision for the majority of cases, and are truncated at up to nine decimal places in the distributed files for numbers between 0.0 and 1.0, and at up to five decimal places for larger numbers.  www.nature.com/scientificdata www.nature.com/scientificdata/