Abstract
AmeriFlux is a network of research sites that measure carbon, water, and energy fluxes between ecosystems and the atmosphere using the eddy covariance technique to study a variety of Earth science questions. AmeriFlux’s diversity of ecosystems, instruments, and data-processing routines creates challenges for data standardization, quality assurance, and sharing across the network. To address these challenges, the AmeriFlux Management Project (AMP) designed and implemented the BASE data-processing pipeline. The pipeline begins with data uploaded by the site teams, followed by the AMP team’s quality assurance and quality control (QA/QC), ingestion of site metadata, and publication of the BASE data product. The semi-automated pipeline enables us to keep pace with the rapid growth of the network. As of 2022, the AmeriFlux BASE data product contains 3,130 site years of data from 444 sites, with standardized units and variable names of more than 60 common variables, representing the largest long-term data repository for flux-met data in the world. The standardized, quality-ensured data product facilitates multisite comparisons, model evaluations, and data syntheses.
Introduction
AmeriFlux is a network of research sites and scientists that use the eddy-covariance technique to measure ecosystem carbon, water, energy, and momentum fluxes in ecosystems across the Americas1. It was established in 1996 to connect independently-managed research in these diverse ecosystems, thus jointly representing major climatic and ecological contexts. Over the last few decades, AmeriFlux has been at the forefront of land-atmosphere interaction research, committed to collecting and sharing high-quality flux and meteorological (flux-met) data among the community of flux researchers. This broader AmeriFlux community of both site teams and data users contributes to science in many ways, including fundamental research, Earth system model development, data science, technical innovation, and science education. For example, AmeriFlux data are widely used to benchmark, validate, and develop new algorithms in the land models of Earth system models2,3. Remote-sensing scientists use AmeriFlux data to parameterize and validate models to upscale carbon and water fluxes in space and time4,5,6. The biogeochemistry and ecology communities use AmeriFlux data to construct budgets of elements with high precision and sampling frequency7,8,9 and identify new and emerging processes, such as the divergence/convergence of ecosystem functions (e.g., carbon uptake, water use, carbon use, energy partition) across space and time10,11,12,13. Long-term AmeriFlux data are valuable in assessing ecosystem carbon sequestration, water and energy budget, and response to climate change, disturbances, management practices, and climatic extremes14,15,16,17. The impact of research based on AmeriFlux data goes beyond these examples and continues to grow, integrating processes across disciplines and spatiotemporal scales.
Since its launch in 1996, AmeriFlux has grown from 15 sites to >110 in 2012 when the AmeriFlux Management Project (AMP, see below) was established, and to 590 sites at the end of 2022 (Fig. 1). These sites represent a broad spectrum of ecosystems across climatic and ecological gradients and diverse regimes of natural disturbance and human management (Fig. 1, Supplementary Figure S1). AmeriFlux is distinguished among all flux networks by having more than 100 sites with time series longer than a decade, including several of the longest-running sites in the world (e.g., Harvard Forest (US-Ha1, 1991-current), Borden Forest (CA-Cbo, 1994-current), Park Falls (US-PFa, 1995-current), Howland Forest (US-Ho1, 1995-current)). These long flux records allow scientists to address questions requiring decades of observations18, such as understanding ecosystem response to climate variability and atmospheric change14,15,19,20. AmeriFlux also contains many clusters of neighboring sites established by individual research groups21. Driven by research questions, many site clusters were established across gradients of land cover and land use, chronosequence stages, microclimate, management, disturbance, and restoration22,23,24,25. The site clusters enable the research communities to understand how different ecosystems respond to similar climatic and, in some cases, edaphic conditions. Moreover, measurements across wide environmental gradients can be constructed from the network’s sites at a regional or continental scale. This distinctive cluster/gradient design makes AmeriFlux data a powerful testbed for model benchmarking and for assessing the effects of changes in climate, land cover, and land use26,27.
AmeriFlux’s wide diversity of ecosystems, instruments, data-processing routines, and science activities is both its strength and its challenge. AmeriFlux sites are established by individual site teams driven by diverse research needs and questions1. As a result, research designs and measurements vary among sites, being tailored to each ecosystem and project. This individuality distinguishes AmeriFlux from other flux networks, such as the National Ecological Observatory Network (NEON) and Integrated Carbon Observation System (ICOS28), which have standardized instrument packages and data-processing protocols29,30,31,32,33. AmeriFlux’s diverse and innovative nature has enabled the network to evolve and adapt to new technology when available (and promote that evolution)34,35,36,37. However, the diversity in approaches also challenges data standardization, quality assurance, and data sharing across the network.
In 2012, the United States Department of Energy (DOE) established AMP at the Lawrence Berkeley National Lab (LBNL) to support the broader AmeriFlux community, composed of the AmeriFlux site teams that produce flux-met data and the researchers who use these data. AMP collaborates with AmeriFlux researchers to ensure the quality and availability of the continuous, long-term measurements necessary to understand ecosystems and to build effective models and multisite syntheses. To achieve these goals, AMP has established technical, data, and outreach services, held annual meetings and workshops, and provided operational support to 13–14 flux site clusters (Core sites) to ensure public access to high-quality and long-term flux-met datasets. AMP further supports the community by creating new opportunities (e.g., AmeriFlux Annual Meetings, theme years, working groups, synthesis workshops, webinars) for AmeriFlux researchers to contribute to high-impact research.
AMP’s data support centers on developing standards, data QA/QC, data-processing, and data repositories. AMP worked collaboratively with international partners, particularly ICOS, to design and develop standard formats and processing routines. In 2017, the AMP team at LBNL took full responsibility for the AmeriFlux data repository, previously maintained by the Carbon Dioxide Information Analysis Center (CDIAC) at the Oak Ridge National Lab (ORNL). With that, AMP redesigned, implemented, and launched the new BASE data-processing pipeline (details below), with the objectives of (1) standardizing the flux-met data formats, (2) ensuring and improving the data quality, (3) facilitating regular and frequent data submissions and publications, and (4) tracking the data and communications with site teams through the pipeline. The following sections summarize the outcome of the data-processing pipeline. The methodology behind its design and implementation is detailed in the Methods.
Results
The BASE data-processing pipeline begins with site teams submitting their flux-met data in a standardized format, followed by a series of quality assurance and quality control (QA/QC) checks performed by AMP, e.g., Format QA/QC for format compliance and Data QA/QC for data quality (Fig. 2). AMP then communicates the check results and any needed corrections to site teams through Format and Data QA/QC reports. Once the flux-met data pass the QA/QC checks, they are published as the BASE data product for each site, i.e., made publicly available on the AmeriFlux website. The BASE data format follows an international standard compatible with other flux networks like ICOS and the European Database.
Data upload and release
Between implementing the pipeline in May 2017 and December 31, 2022, we have received 3,468 data uploads containing 6,195 files of flux-met data from 385 sites (Fig. 3a). AMP generated 3,538 Format QA/QC and 1,980 Data QA/QC reports that were emailed to site teams (Fig. 3b). Notably, in 2020–2022, we received data uploads from ~200 sites each year and sent more than 600 and 400 Format and Data QA/QC reports yearly. As a reference, the BASE data repository contained 1,256 site-years of data from 174 sites in April 2017. The 2017–2022 period coincided with the rapid growth of the network (Fig. 1). The implemented pipeline enables us to keep up with the growth, publishing on average ~48 new sites and ~330 new site years each year. As of 2022, there are 3,130 site years of AmeriFlux BASE data from 444 sites, representing the world’s largest data repository for flux-met data. Moreover, 344 sites (~77%) are under the CC-BY-4.0 license.
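The average publication rates quoted above can be approximately reproduced from the repository counts given in the same paragraph. The following is a minimal sketch of that arithmetic, assuming the averaging window runs from the end of April 2017 to the end of 2022; the exact window used for the published averages is not stated, so the window length here is an assumption.

```python
# Counts quoted in the text.
sites_apr_2017, sites_end_2022 = 174, 444
site_years_apr_2017, site_years_end_2022 = 1256, 3130

# Assumption: the window runs from the end of April 2017 to the end of 2022,
# i.e., 5 years and 8 months.
years_elapsed = 5 + 8 / 12

new_sites_per_year = (sites_end_2022 - sites_apr_2017) / years_elapsed
new_site_years_per_year = (
    site_years_end_2022 - site_years_apr_2017
) / years_elapsed

print(f"{new_sites_per_year:.1f} new sites per year")
print(f"{new_site_years_per_year:.1f} new site years per year")
```

Under this assumed window, both rates land close to the ~48 sites and ~330 site years per year reported above.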
During 2017–2022, 288 sites submitted data for the first time and were checked by the Format and Data QA/QC. Around 94% of these new sites’ data was published in the BASE data product as of 2022. For each site’s first complete publishing cycle, these new sites took a median of 127 days from the first-time data upload to BASE publication. The durations varied depending on the number of iterations required to resolve the identified issues, particularly in the Data QA/QC. While varied among sites, common data issues include shifts in timestamps, sensor degradation, excessive outliers, incorrect units, and flipped sign conventions. About 29%, 33%, and 28% of sites went through one, two, and three (re)submission cycles, with median durations of 60, 116, and 154 days, respectively. This latency time, especially for the new sites, is reflected in the difference between the number of sites uploading and publishing data within each year (Fig. 3a).
Around 217 sites updated their BASE data product (e.g., adding additional years of data to previously published BASE) in 2017–2022, including 150 new sites discussed above. The median turnaround duration was around 42 days from upload to BASE publication, much shorter than the first-time submissions from the new sites. Most (80%) of these returning sites took less than 90 days from the upload to BASE publication. Seventy-five sites updated their BASE data product more than five times in 2017–2022.
In sum, the BASE pipeline facilitates more frequent data uploads and releases and allows data users to access recent-year data. While traversing the pipeline entailed a few iterations and months for new sites to address the identified issues, it significantly decreased the overall latency time between data collection and release for many returning sites. For example, the number of sites with data available for the prior year increased from 0 sites in 2017 to 90 sites in 2022 (Supplementary Figure S2). Over 2017–2022, the BASE data products were downloaded more than 27,000 times by ~4,800 users globally. Many of these downloads included multiple sites, resulting in total site downloads of 318,553 for the period. Notably, the total site downloads increased from 18,644 in 2017 to 86,371 in 2022. The data-download interface logs the downloader’s intended data use, and these covered a wide range38, such as multisite synthesis, benchmarking remote-sensing and land surface models, and education.
Data summary
The BASE data pipeline generates the BASE data product: time series flux-met data at a half-hourly or hourly resolution. The BASE data product follows the global FP (Flux Processing) Standard39, ensuring that variable names, units, and file formats are defined and consistent. Around 52 out of 143 variables supported by the FP Standard are commonly submitted (>50 sites, Fig. 4, Supplementary Table S1). These variables can be categorized into flux-related groups, such as the trace gases (e.g., CO2 and CH4 fluxes and concentrations), energy (e.g., latent and sensible heat fluxes), derived products (e.g., gross primary production, ecosystem respiration), quality flags (e.g., steady-state and integral turbulence characteristics), and footprints (e.g., distance with maximum footprint contribution). The BASE data product also includes data on meteorology and soil, such as the groups of radiation (e.g., net radiation, incoming shortwave radiation), atmosphere (e.g., air temperature, relative humidity), wind (e.g., friction velocity, wind speed), precipitation, and soil (e.g., soil temperature, soil water content). Some sites have data measured at multiple locations (dark colors in Fig. 4) for replication or spatial variation. In particular, soil temperature and water content are measured extensively in vertical or horizontal locations at most sites. Air temperature, wind speed, direction, CO2 and H2O concentrations, and soil heat fluxes are also measured at multiple locations at around 80–120 sites.
BASE flux-met data are rich time series, typically with half-hour resolutions and data records that span from years to decades. While a portion of sites (<50) started in the 1990s, most sites’ data records were concentrated in 2004–2020, with around 140 sites operating concurrently (Supplementary Figure S2). Figure 5 illustrates the temporal characteristics of selected flux data across AmeriFlux sites, highlighting a few long-running sites (red lines in left panels and time series in right panels). Most flux data show evident temporal variation at the sub-daily to daily and seasonal to annual scales, reflecting biological (e.g., phenology) and climatic regulation (Fig. 5a,c,e,g,i). Yet, distinct temporal variations were observed across sites depending on the temporal scales. For example, CH4 fluxes (FCH4) show weak to negligible seasonality at some but not all sites (Fig. 5i). And no consistent temporal variation was observed for all flux variables on weekly to monthly scales. With more than 100 sites now having decade-long records, it becomes feasible to explore the temporal characteristics at a longer scale. While some sites reveal weak variability near the quinquennial scale, we did not find a general pattern across sites.
Discussion
Network growth and data sharing
Since its onset, AMP has engaged with the AmeriFlux community, both the site teams and data users, through services centered on data, technique, and outreach. During this period, AMP supported and facilitated the growth of the AmeriFlux network, reflected in the rapid increase in registered sites, available data, and data usage (Figs. 1, 3, Supplementary Figure S2).
Since the network’s conception, data sharing has been a core tenet of AmeriFlux. AMP strives to maintain this practice, focusing on the dual goals of increasing the number of site teams contributing data and improving the quality and quantity of the data available. Key to this approach is semi-automation in the BASE data-processing pipeline, which has led to dramatic improvements in the breadth of QA/QC checks performed and the consistency of a high-quality BASE data product. Additionally, the BASE data-processing pipeline reduces the turnaround time for site teams to receive feedback from 6–12 months to 1–2 months, enabling more rapid data correction. While the QA/QC checks may present a hurdle for new site teams submitting their data for the first time, the independent data quality assessment by AMP is a key benefit of joining the network. Once the site teams became familiar with the QA/QC processes, the time from submission to publication was significantly reduced. Overall, the pipeline decreased the latency time from data collection to release. The addition of a CC-BY-4.0 data policy adopted by a majority of the network has significantly improved the findability, accessibility, interoperability, and reusability of the data.
Synthesis and extended products
The AmeriFlux BASE data product’s life cycle continues after its release, further enabling and facilitating numerous data products and syntheses. For example, the FLUXNET data products, which are gap-filled data products with value-added variables (e.g., partitioned gross primary productivity), are part of global datasets that have been used for model validation and benchmarking for decades40,41. In this regard, AMP collaborates with international partners like ICOS to develop the ONEFlux (Open Network-Enabled Flux) codes, fostering the creation of the FLUXNET2015 data product42. Furthermore, AMP is leveraging the high-quality standardized BASE data product as input to the ONEFlux codes to produce the next-generation FLUXNET data product for AmeriFlux sites. Additionally, the infrastructure and workflows developed for the BASE data-processing pipeline are being extended to produce the FLUXNET product (Fig. 2). As of 2022, AMP had released the new AmeriFlux FLUXNET data product for 79 AmeriFlux sites. AMP anticipates continuing to release and update the AmeriFlux FLUXNET data products in coordination with other flux network partners43. The FLUXNET-CH4 community data product is another example of an extended product based on the BASE data product44,45. Among 81 sites included in the FLUXNET-CH4 data product, 45 are AmeriFlux sites that make their data available through the BASE data product.
AmeriFlux BASE data also facilitate syntheses that utilize data from multiple sites, a unique tool for scientific discovery. Recent examples include fundamental research13,46,47,48, model evaluation and benchmarking49,50, remote-sensing validation51,52,53,54, machine learning55,56, and science education57.
Future direction of the data pipeline
The AmeriFlux BASE data-processing pipeline design considers the network’s unique aspects, such as distributed site teams, diverse instrumentation and processing routines, which distinguishes it from those implemented by other flux networks30,31,58,59. The data-processing pipeline incorporates many features (e.g., visualization, QA/QC report summaries, central communication tracking) to facilitate interactions with individual site teams. While the Format QA/QC assessment was fully automated earlier in the pipeline development, the Data QA/QC assessment remains a semi-automated process. The Data QA/QC module automatically generates statistics and figures, and AMP team members evaluate results and synthesize identified issues into a concise, readable, and actionable report. Full automation is challenging to achieve. For example, a single data issue can trigger warnings in multiple QA/QC checks. Thus, identifying and interpreting the root cause can be non-trivial. Without a concise report, the figures and statistics alone are difficult for data providers (particularly new site teams) to interpret and take appropriate action. At the same time, manual review by AMP is unsustainable, given the expected network and data-submission growth. Further development on fully automatic and self-interpretable Data QA/QC reports and training for site teams is in progress to further reduce the turnaround time and keep pace with the network growth and continuous data updates.
While most AmeriFlux sites’ data concentrated on about 60 common variables (e.g., fluxes, radiation, meteorology, soil, Fig. 4), research innovation has promoted the discussion of new variables and/or metadata. We partner with the AmeriFlux community members and other networks to develop new variables and their corresponding metadata and data check and processing routines. For example, to support the activities in the Year of Methane in 2018–2019, we worked with the Global Carbon Project, FLUXNET, and ICOS to add new aquatic variables (e.g., water temperature, dissolved oxygen) to the FP Standard. Most recently, the Year of Remote Sensing also facilitated the addition of new tower-based spectral variables, e.g., Near Infrared Vegetation Index60. The pipeline is designed to seamlessly support these new types of continuous measurements as they are added to the FP Standard. If new variables require additional quality assessments, the Data QA/QC module can be easily extended due to its modular design.
Methods
Data collection and processing at individual sites
AmeriFlux flux-met data’s life cycle begins with data collection at each field site using a suite of automated instruments. The instruments may vary from site to site but include eddy-covariance instruments (i.e., sonic anemometer, gas analyzer) and a selected set of meteorological, soil, and biological sensors. The data streams are recorded continuously (e.g., 10–20 Hz for flux measurements, 0.1–1 Hz or slower for others) by the data acquisition systems (e.g., logger, computer) and retrieved via physical visits or remote connection (e.g., cellular modem, radio transfer, Ethernet, satellite). Next, the site teams apply quality control and process the high-frequency data using selected software or in-house codes to produce flux-met data at a half-hourly or hourly resolution. Previous comparison studies showed that software selection generally led to marginal differences29,61,62,63, although differences in the corrections implemented could also lead to systematic biases (e.g., spectral corrections as a function of air humidity64). Yet, the selection of corrections applied in the flux calculation (e.g., coordinate rotation, despiking, time lag optimization, spectral corrections), judged and augmented by individual researchers, can vary among sites based on sites’ characteristics (e.g., climate, canopy heights, tower structures, instrument types, and setup). Last, the data are checked and filtered by the site teams before uploading to the AmeriFlux website. Gap-filling is not required, but gap-filled variables can be provided in addition to non-filled ones.
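As a rough illustration of how high-frequency data reduce to one flux value per averaging period, the sketch below computes the bare covariance between vertical wind speed and a scalar concentration over one block of samples. It is a simplified sketch, not any site team's processing code: it omits the corrections named above (coordinate rotation, despiking, time lag optimization, spectral corrections) and unit conversions, and the function name and sample values are hypothetical.

```python
from statistics import fmean


def covariance_flux(w, c):
    """Mean product of fluctuations of w and c about their block means.

    w: vertical wind speed samples for one averaging period
    c: scalar (e.g., gas concentration) samples taken at the same instants
    """
    if len(w) != len(c) or not w:
        raise ValueError("w and c must be non-empty and of equal length")
    w_mean, c_mean = fmean(w), fmean(c)
    products = [(wi - w_mean) * (ci - c_mean) for wi, ci in zip(w, c)]
    return fmean(products)


# Tiny hypothetical block; a real half-hour at 10 Hz holds 18,000 samples.
w = [0.1, -0.1, 0.2, -0.2]
c = [1.0, 3.0, 0.0, 4.0]
flux = covariance_flux(w, c)
```

In this toy block, updrafts coincide with lower concentrations, so the covariance is negative.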
AmeriFlux BASE data-processing pipeline
The goal of the AmeriFlux BASE data-processing pipeline is to provide high-quality flux-met data in a standardized format that enables a broad range of Earth science research and educational activities. Our approach requires site teams to process high-frequency observations into half-hour or hourly fluxes (described above), prepare them in a standardized format (details below), and then submit these data to the AmeriFlux website. Upon submission, our semi-automated BASE processing pipeline is initiated and performs QA/QC checks (Fig. 2). If the submitted data pass the QA/QC checks, the resulting BASE data product is published, i.e., made publicly available on the AmeriFlux website. All data uploads are logged, all communications are tracked, and the data provenance is maintained.
The BASE processing pipeline consists of three modular components: Format QA/QC, Data QA/QC, and BASE Publish (Fig. 2). The automated portions of the pipeline are primarily written in Python (see Code Availability for the code repository). The pipeline logs the processing status of all data submissions and published BASE data products in a SQL database. All detected data issues and communication between the site team and AMP are recorded in the JIRA Service Management ticket-tracking system.
The Format QA/QC module assesses compliance of submitted data files with the AmeriFlux FP-In (Flux Processing In) standardized format65. It makes one attempt to automatically correct minor issues if discovered (Fig. 6). The site teams receive a Format QA/QC report within a few hours after submission (Supplementary Figure S3). The FP-In format follows the timestamp, variable name, units, and data formatting conventions of the global FP (Flux Processing) format, namely a comma-delimited file with variables in columns at a timestep of half-hour or an hour in rows. The minimum variables required are the start and end timestamps and one carbon flux observation (FC or FCH4). However, most site teams also submit gas concentrations, gas and energy fluxes, basic meteorological observations (e.g., air temperature, wind speed and direction), and radiation observations. In requiring the FP-In format, the automated pipeline code can attempt fully automatic correction of various minor errors, including filling the skipped time intervals with the missing value designator −9999, fixing incorrect variable names, changing the file format to CSV, etc. Site teams can submit a site’s full data record, replacement data for previously submitted data, or new data that extend the site’s record.
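One of the automatic corrections described above, filling skipped time intervals with the missing value designator −9999, can be sketched as follows. This is a minimal sketch and not the pipeline's actual code: the column names `TIMESTAMP_START` and `TIMESTAMP_END`, the `YYYYMMDDHHMM` timestamp layout, and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta

MISSING = "-9999"          # missing-value designator named in the text
TS_FORMAT = "%Y%m%d%H%M"   # assumed timestamp layout (YYYYMMDDHHMM)


def fill_skipped_intervals(rows, variables, step_minutes=30):
    """Return rows on a regular time grid, padding skipped steps with -9999.

    rows: list of dicts with TIMESTAMP_START, TIMESTAMP_END, and variables.
    variables: names of the data columns to pad in inserted rows.
    """
    step = timedelta(minutes=step_minutes)
    by_start = {
        datetime.strptime(r["TIMESTAMP_START"], TS_FORMAT): r for r in rows
    }
    current, last = min(by_start), max(by_start)
    filled = []
    while current <= last:
        row = by_start.get(current)
        if row is None:
            # This interval was skipped in the upload: insert a placeholder.
            row = {
                "TIMESTAMP_START": current.strftime(TS_FORMAT),
                "TIMESTAMP_END": (current + step).strftime(TS_FORMAT),
            }
            row.update({name: MISSING for name in variables})
        filled.append(row)
        current += step
    return filled


rows = [
    {"TIMESTAMP_START": "202201010000", "TIMESTAMP_END": "202201010030", "FC": "1.2"},
    {"TIMESTAMP_START": "202201010100", "TIMESTAMP_END": "202201010130", "FC": "0.9"},
]
filled = fill_skipped_intervals(rows, ["FC"])
```

Here the 00:30 to 01:00 interval is absent from the upload, so the sketch inserts one row whose `FC` value is the missing value designator.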
The Data QA/QC module assesses the quality of flux-met data uploaded to AmeriFlux. It is a secondary data quality assessment that is independent of and complementary to the data quality checks performed by site teams prior to upload. The Data QA/QC follows a similar methodology to the FLUXNET2015 dataset42,66 but includes additional checks based on data user feedback (e.g., emails, workshops). Also, its design considers the long history of AmeriFlux data repositories and the diverse ecosystems and climates of AmeriFlux sites. For example, specific checks were developed to detect spurious trends and shifts in long-term records. Site-specific plausible ranges were constructed for each site to accommodate the wide range of climatic and ecosystem conditions. Last, the Data QA/QC uses data visualization and a ticket-tracking system (i.e., JIRA Service Management) to facilitate communication with site teams. Six Data QA/QC check modules are implemented currently: timestamp alignment, physical range, multivariate comparison, diurnal-seasonal pattern, USTAR filtering, and variable coverage (Table 2). Details and example figures of each module are provided in Supplementary Materials (Supplementary Text S1, Supplementary Figures S4-S17). AMP also hosts workshops and webinars for site teams to learn about the QA/QC (recordings available at https://ameriflux.lbl.gov/community/amp-webinar-series/).
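The idea behind a physical range check can be sketched in a few lines. This is an illustrative sketch only: the function name, the example limits, and the reporting of a flagged fraction are assumptions, and the construction of AMP's site-specific plausible ranges is described in the Supplementary Materials rather than reproduced here.

```python
MISSING = -9999.0  # missing-value designator; excluded from the check


def range_check(values, lower, upper, missing=MISSING):
    """Flag non-missing values outside [lower, upper].

    Returns the flagged values and their share of the non-missing data.
    """
    valid = [v for v in values if v != missing]
    flagged = [v for v in valid if v < lower or v > upper]
    fraction = len(flagged) / len(valid) if valid else 0.0
    return flagged, fraction


# Hypothetical air temperature record in deg C with one implausible value.
ta = [12.5, 14.1, -9999.0, 85.0, 13.3]
flagged, fraction = range_check(ta, lower=-50.0, upper=50.0)
```

A reviewer could then judge whether the flagged share points to isolated outliers or to a systematic problem such as incorrect units.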
Once passing Format QA/QC, the uploaded files are combined with a site’s previously published BASE data product to form a complete data record (Fig. 7). Data QA/QC modules are executed and automatically generate figures and summary statistics (e.g., Supplementary Figure S18). The module execution time is typically within a few hours for a site’s data. Then, AMP conducts Data QA/QC reviews of sites in batches ranging from weekly to monthly and synthesizes the identified issues into a concise, actionable report (e.g., Supplementary Figure S19). While varying among cases, the average time for Data QA/QC review is typically less than an hour for each site. The report also explains the background of Data QA/QC and provides links to all summary statistics and figures generated. If there are identified issues, AMP notifies the site team of corrections needed. Otherwise, the data are queued for BASE data publication.
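The step of combining an upload with the previously published record can be sketched as a merge keyed on the interval start time. This is a sketch under stated assumptions, not the pipeline's implementation: it assumes that uploaded rows supersede published rows for the same interval, that rows are keyed by a `TIMESTAMP_START` string of fixed width, and the function name is hypothetical.

```python
def combine_records(published, uploaded):
    """Merge an upload into the published record, keyed by interval start.

    Assumption: an uploaded row replaces a published row for the same
    interval, and rows for new intervals extend the record.
    """
    merged = {row["TIMESTAMP_START"]: row for row in published}
    merged.update({row["TIMESTAMP_START"]: row for row in uploaded})
    # Fixed-width YYYYMMDDHHMM strings sort chronologically as text.
    return [merged[key] for key in sorted(merged)]


published = [
    {"TIMESTAMP_START": "202112312300", "FC": "0.5"},
    {"TIMESTAMP_START": "202112312330", "FC": "0.7"},
]
uploaded = [
    {"TIMESTAMP_START": "202112312330", "FC": "0.6"},  # replacement data
    {"TIMESTAMP_START": "202201010000", "FC": "0.4"},  # extends the record
]
record = combine_records(published, uploaded)
```

The sketch covers the replacement and extension submission types with one rule, since in both cases the newest row for an interval is kept.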
The BASE Publish module runs after data pass the Data QA/QC, typically in batches once every 1–2 months for both new sites publishing for the first time and returning sites updating data. AMP formats the flux-met data in the FP Standard format, bundles them with Biological, Ancillary, Disturbance, and Metadata (BADM, details below), and versions the bundled data. In addition, the module obtains a Digital Object Identifier (DOI) for new data and updates metadata for existing DOIs before making the BASE data product available on the AmeriFlux website. The BASE data product is organized by sites, with one zipped file containing both BASE and BADM data of an AmeriFlux site. Details of the file format and structure are provided in Supplementary Text S2.
In addition to data search and download access, the AmeriFlux website also supports a suite of web-based features for showing each site’s general information, data citation, download logs, images, publications, and related data (e.g., prevailing wind visualizations). Each site with published BASE data that has been assigned a DOI can edit its contributor lists. Last, external links to the sites’ cut-outs of remote-sensing and gridded products, such as MODIS, VIIRS, ECOSTRESS, and Daymet, are also provided through collaborative agreements with the Distributed Active Archive Center (DAAC) at ORNL. See Supplementary Text S3 for a quick guide for BASE data use.
BASE data policy
Starting in Fall 2021, AMP worked with AmeriFlux site teams to adopt the new AmeriFlux CC-BY-4.0 Data Use License, which allows data to be shared under the widely-used Creative Commons BY 4.0 license (CC-BY-4.0). As of the end of 2022, 406 AmeriFlux sites (~69% of registered sites) have adopted the CC-BY-4.0 Data Use License. Among 444 sites with BASE data, 344 sites (~77%) are under the CC-BY-4.0 license. The CC-BY-4.0 license makes AmeriFlux data more compatible with other flux networks (e.g., ICOS, OzFlux, and NEON) and more consistent with the FAIR (Findable, Accessible, Interoperable, and Reusable) principle of accessibility, which is now widely encouraged or required by many journal publishers and funding agencies.
Relevant metadata supporting BASE data
Biological, Ancillary, Disturbance, and Metadata (BADM) are non-continuous information that characterizes a site and complements the BASE flux-met data. BADM includes general site descriptions, metadata about the instruments, maintenance and disturbance events, and biological and ecological data67. See the AmeriFlux website for a complete and updated list of all BADM groups and variables68.
To support AmeriFlux BASE data use, AMP developed and released multiple new BADM sets, including the Measurement Height data, which provides information on BASE data measurement heights/depths and instrument models. The Measurement Height information is provided directly by the site teams or pulled by AMP from historical records and is updated in conjunction with the BASE Publish schedule.
Data availability
All data discussed in this paper are publicly available at AmeriFlux (https://ameriflux.lbl.gov/) as the BASE and BADM data products. The published data are licensed under the AmeriFlux CC-BY-4.0 or the AmeriFlux Legacy Use Data License based on the site team’s selection. Additional data will be published as they are submitted and pass the QA/QC process described in this paper.
Code availability
The core Python-based BASE data-processing pipeline code is available under a modified BSD license at https://github.com/AMF-FLX/AMF-BASE-QAQC. The R-based code for generating the article’s figures is available at https://doi.org/10.5281/zenodo.8250754.
References
Novick, K. A. et al. The AmeriFlux network: A coalition of the willing. Agric. For. Meteorol. 249, 444–456 (2018).
Collier, N. et al. The international land model benchmarking (ILAMB) system: Design, theory, and implementation. J. Adv. Model. Earth Syst. 10, 2731–2754 (2018).
Chen, D. et al. Framing, Context, and Methods. in Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (ed. Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou) 147–286 (2021).
Heinsch, F. A. et al. Evaluation of remote sensing based terrestrial productivity from MODIS using regional tower eddy flux network observations. IEEE Trans. Geosci. Remote Sens. 44, 1908–1925 (2006).
Verma, M. et al. Improving the performance of remote sensing models for capturing intra- and inter-annual variations in daily GPP: An analysis using global FLUXNET tower data. Agric. For. Meteorol. 214–215, 416–429 (2015).
Xiao, J. et al. Data-driven diagnostics of terrestrial carbon dynamics over North America. Agric. For. Meteorol. 197, 142–157 (2014).
Marino, B. D. V., Bautista, N. & Rousseaux, B. Howland Forest, ME, USA: Multi-Gas Flux (CO2, CH4, N2O) Social Cost Product Underscores Limited Carbon Proxies. Land 10, 436 (2021).
Aguilos, M. et al. Effects of land-use change and drought on decadal evapotranspiration and water balance of natural and managed forested wetlands along the southeastern US lower coastal plain. Agric. For. Meteorol. 303, 108381 (2021).
Hemes, K. S. et al. Assessing the carbon and climate benefit of restoring degraded agricultural peat soils to managed wetlands. Agric. For. Meteorol. 268, 202–214 (2019).
Migliavacca, M. et al. The three major axes of terrestrial ecosystem function. Nature 598, 468–472, https://doi.org/10.1038/s41586-021-03939-9 (2021).
Yi, C. et al. Climate control of terrestrial carbon exchange across biomes and continents. Environ. Res. Lett. 5, 034007 (2010).
Duffy, K. A. et al. How close are we to the temperature tipping point of the terrestrial biosphere? Science Advances 7, eaay1052 (2021).
Biederman, J. A. et al. CO2 exchange and evapotranspiration across dryland ecosystems of southwestern North America. Glob. Chang. Biol. 23, 4204–4221, https://doi.org/10.1111/gcb.13686 (2017).
Hollinger, D. Y. et al. Multi-Decadal Carbon Cycle Measurements Indicate Resistance to External Drivers of Change at the Howland Forest AmeriFlux Site. Journal of Geophysical Research: Biogeosciences 126, e2021JG006276 (2021).
Desai, A. R. et al. Drivers of decadal carbon fluxes across temperate ecosystems. J. Geophys. Res. Biogeosci. 127, e2022JG007014 (2022).
Wolf, S. et al. Warm spring reduced carbon cycle impact of the 2012 US summer drought. Proceedings of the National Academy of Sciences 113, 5880–5885 (2016).
Biederman, J. A. et al. Terrestrial carbon balance in a drier world: the effects of water availability in southwestern North America. Glob. Chang. Biol. 22, 1867–1879 (2016).
Keenan, T. F., Moore, D. J. P. & Desai, A. Growth and opportunities in networked synthesis through AmeriFlux. New Phytol. 222, 1685–1687 (2019).
Baldocchi, D., Chu, H. & Reichstein, M. Inter-annual variability of net and gross ecosystem carbon fluxes: A review. Agric. For. Meteorol. 249, 520–533 (2018).
Finzi, A. C. et al. Carbon budget of the Harvard Forest Long‐Term Ecological Research site: pattern, process, and response to global change. Ecol. Monogr. 90, e01423 (2020).
Stoy, P. C. et al. The global distribution of paired eddy covariance towers. bioRxiv 2023.03.03.530958, https://doi.org/10.1101/2023.03.03.530958 (2023).
Biederman, J. A. et al. Shrubland carbon sink depends upon winter water availability in the warm deserts of North America. Agric. For. Meteorol. 249, 407–419 (2018).
Knox, S. H. et al. Agricultural peatland restoration: effects of land-use change on greenhouse gas (CO2 and CH4) fluxes in the Sacramento-San Joaquin Delta. Glob. Chang. Biol. 21, 750–765, https://doi.org/10.1111/gcb.12745 (2014).
Goulden, M. L. et al. An eddy covariance mesonet to measure the effect of forest age on land–atmosphere exchange. Glob. Chang. Biol. 12, 2146–2162 (2006).
Verma, S. B. et al. Annual carbon dioxide exchange in irrigated and rainfed maize-based agroecosystems. Agric. For. Meteorol. 131, 77–96 (2005).
Chen, L., Dirmeyer, P. A., Guo, Z. & Schultz, N. M. Pairing FLUXNET sites to validate model representations of land-use/land-cover change. Hydrol. Earth Syst. Sci. 22, 111 (2018).
Novick, K. A. et al. Informing Nature-based Climate Solutions for the United States with the best-available science. Glob. Chang. Biol. 28, 3778–3794 (2022).
Heiskanen, J., Brümmer, C. & Buchmann, N. The integrated carbon observation system in Europe. Bull. Am. Meteorol. Soc. 103, E855–E872 (2022).
Franz, D. et al. Towards long-term standardised carbon and greenhouse gas observations for monitoring Europe’s terrestrial ecosystems: a review. Int. Agrophys. 32, 439–455 (2018).
Metzger, S. et al. From NEON Field Sites to Data Portal: A Community Resource for Surface–Atmosphere Research Comes Online. Bull. Am. Meteorol. Soc. 100, 2305–2325 (2019).
Sabbatini, S. et al. Eddy covariance raw data processing for CO2 and energy fluxes calculation at ICOS ecosystem stations. Int. Agrophys. 32, 495–515 (2018).
Rebmann, C. et al. ICOS eddy covariance flux-station site setup: a review. International Agrophysics 32, 471–494 (2018).
Vitale, D. et al. A robust data cleaning procedure for eddy covariance flux measurements. Biogeosciences 17, 1367–1391 (2020).
Detto, M., Verfaillie, J., Anderson, F., Xu, L. & Baldocchi, D. Comparing laser-based open- and closed-path gas analyzers to measure methane fluxes using the eddy covariance method. Agric. For. Meteorol. 151, 1312–1324 (2011).
Kim, J., Verma, S. B. & Billesbach, D. P. Seasonal variation in methane emission from a temperate Phragmites-dominated marsh: effect of growth stage and plant-mediated transport. Glob. Chang. Biol. 5, 433–440 (1999).
Wofsy, S. C. et al. Net exchange of CO2 in a mid-latitude forest. Science 260, 1314–1317 (1993).
Bowling, D. R., Baldocchi, D. D. & Monson, R. K. Dynamics of isotopic exchange of carbon dioxide in a Tennessee deciduous forest. Global Biogeochem. Cycles 13, 903–922 (1999).
AmeriFlux Management Project. Network-at-a-Glance. https://ameriflux.lbl.gov/about/network-at-a-glance/ (2017).
AmeriFlux Management Project. Data Variable. https://ameriflux.lbl.gov/data/aboutdata/data-variables/ (2015).
Running, S. W. et al. A global terrestrial monitoring network integrating tower fluxes, flask sampling, ecosystem modeling and EOS satellite data. Remote Sens. Environ. 70, 108–127 (1999).
Baldocchi, D. D. et al. FLUXNET: A new tool to study the temporal and spatial variability of ecosystem-scale carbon dioxide, water vapor, and energy flux densities. Bull. Am. Meteorol. Soc. 82, 2415–2434 (2001).
Pastorello, G. et al. The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Scientific Data 7, 225 (2020).
Papale, D. Ideas and perspectives: enhancing the impact of the FLUXNET network of eddy covariance sites. Biogeosci. 17, 5587–5598 (2020).
Knox, S. H. et al. FLUXNET-CH4 Synthesis Activity: Objectives, Observations, and Future Directions. Bull. Am. Meteorol. Soc. 100, 2607–2632 (2019).
Delwiche, K. B. et al. FLUXNET-CH4: A global, multi-ecosystem dataset and analysis of methane seasonality from freshwater wetlands. Earth Syst. Sci. Data 13, 3607–3689 (2021).
Chu, H. et al. Temporal dynamics of aerodynamic canopy height derived from eddy covariance momentum flux data across North American Flux Networks. Geophys. Res. Lett. 45, 9275–9287 (2018).
Young, A. M. et al. Disentangling the Relative Drivers of Seasonal Evapotranspiration Across a Continental-Scale Aridity Gradient. Journal of Geophysical Research: Biogeosciences 127, e2022JG006916 (2022).
Moon, M., Li, D., Liao, W., Rigden, A. J. & Friedl, M. A. Modification of surface energy balance during springtime: The relative importance of biophysical and meteorological changes. Agric. For. Meteorol. 284, 107905 (2020).
Burakowski, E. A. et al. Simulating surface energy fluxes using the variable-resolution Community Earth System Model (VR-CESM). Theor. Appl. Climatol. 138, 115–133 (2019).
Fu, C., Wang, G., Goulden, M. L. & Scott, R. L. Combined measurement and modeling of the hydrological impact of hydraulic redistribution using CLM4.5 at eight AmeriFlux sites. Hydrol. Earth Syst. Sci. 20, 2001–2018 (2016).
Fisher, J. B. et al. ECOSTRESS: NASA’s Next Generation Mission to Measure Evapotranspiration From the International Space Station. Water Resour. Res. 56, e2019WR026058 (2020).
Feagin, R. A. et al. Tidal Wetland Gross Primary Production Across the Continental United States, 2000–2019. Global Biogeochem. Cycles 34, e2019GB006349 (2020).
Zhou, H. et al. Evaluating the Spatial Representativeness of the MODerate Resolution Image Spectroradiometer Albedo Product (MCD43) at AmeriFlux Sites. Remote Sensing 11, 547 (2019).
Zeng, Q., Cheng, J. & Dong, L. Assessment of the Long-Term High-Spatial-Resolution Global LAnd Surface Satellite (GLASS) Surface Longwave Radiation Product Using Ground Measurements. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13, 2032–2055 (2020).
Barnes, M. L. et al. Improved dryland carbon flux predictions with explicit consideration of water-carbon coupling. Communications Earth & Environment 2, 1–9 (2021).
Wang, X. et al. MODIS-Based Estimation of Terrestrial Latent Heat Flux over North America Using Three Machine Learning Algorithms. Remote Sensing 9, 1326 (2017).
Duffy, K. et al. Environmental Informatics Using Research Infrastructures and their Data: Fall 2020 Edition. https://doi.org/10.5281/zenodo.4576496 (2021).
Isaac, P. et al. OzFlux data: network integration from collection to curation. Biogeosciences 14, 2903–2928 (2017).
Sturtevant, C. et al. A process approach to quality management doubles NEON sensor data quality. Methods Ecol. Evol. 13, 1849–1865 (2022).
Badgley, G., Field, C. B. & Berry, J. A. Canopy near-infrared reflectance and terrestrial photosynthesis. Science Advances 3, e1602244 (2017).
Mammarella, I., Peltola, O., Nordbo, A., Järvi, L. & Rannik, Ü. Quantifying the uncertainty of eddy covariance fluxes due to the use of different software packages and combinations of processing steps in two contrasting ecosystems. Atmospheric Measurement Techniques 9, 4915–4933 (2016).
Metzger, S. et al. eddy4R 0.2.0: a DevOps model for community-extensible processing and analysis of eddy-covariance data based on R, Git, Docker, and HDF5. Geoscientific Model Development 10, 3189 (2017).
Mauder, M. & Foken, T. Impact of post-field data processing on eddy covariance flux estimates and energy balance closure. Meteorol. Z. 15, 597–609 (2006).
Fratini, G., Ibrom, A., Arriga, N., Burba, G. & Papale, D. Relative humidity effects on water vapour fluxes measured with closed-path eddy-covariance systems with short sampling lines. Agric. For. Meteorol. 165, 53–63 (2012).
AmeriFlux Management Project. Uploading half-hourly/hourly data. https://ameriflux.lbl.gov/data/uploading-half-hourly-hourly-data/ (2017).
Pastorello, G. et al. Observational Data Patterns for Time Series Data Quality Assessment. 2014 IEEE 10th International Conference on e-Science, Sao Paulo, Brazil, 2014, pp. 271–278 (2014).
Law, B. E. et al. Terrestrial carbon observations: Protocols for vegetation sampling and data submission. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.687.4981&rep=rep1&type=pdf (2008).
AmeriFlux Management Project. BADM Standards. https://ameriflux.lbl.gov/data/badm/badm-standards/ (2021).
Liu, Y., San Liang, X. & Weisberg, R. H. Rectification of the bias in the wavelet power spectrum. J. Atmos. Ocean. Technol. 24, 2093–2102 (2007).
Faybishenko, B. et al. Challenging problems of quality assurance and quality control (QA/QC) of meteorological time series data. Stoch. Environ. Res. Risk Assess. 36, 1049–1062 (2022).
Acknowledgements
We thank the AmeriFlux community, whose members generated high-quality data throughout the years and provided intellectual guidance for the data standards, policies, and sharing. The AmeriFlux data portal and processing pipeline were supported by funding provided to the AmeriFlux Management Project by the U.S. Department of Energy’s Office of Science under Contract No. DE-AC02-05CH11231. D. Papale thanks the support of the Open-Earth-Monitor European Union’s Horizon Europe research project (GA 101059548). We acknowledge Yeongshnn Ong, Catherine van Ingen, Marty Humphrey, and Marilyn Saarni for contributing to the data pipeline and services development. We also thank the many people who provided valuable feedback and helped test the web features and data pipeline. We acknowledge the Carbon Dioxide Information Analysis Center (CDIAC) at the Oak Ridge National Lab (ORNL) for maintaining the earlier AmeriFlux data repository before it was transitioned to the AmeriFlux Management Project.
Author information
Contributions
Writing – original draft: H. Chu, D.S. Christianson, D.A. Agarwal, and M.S. Torn; Writing – review & editing: all co-authors; Data curation: H. Chu, D.S. Christianson, G. Pastorello, Y.-W. Cheah, S. Dengel, S.W. Chan, S.C. Biraud, and D.A. Agarwal; Formal Analysis: H. Chu, G. Pastorello, D.S. Christianson, and Y.-W. Cheah; Conceptualization: D.A. Agarwal, D.S. Christianson, H. Chu, Y.-W. Cheah, G. Pastorello, and N.F. Beekwilder; Funding acquisition: M.S. Torn, D.A. Agarwal, S.C. Biraud, T.F. Keenan, and D. Baldocchi; Investigation: R. Hollowgrass, D.S. Christianson, and H. Chu; Project administration: C. Buechner; Resources: K. Delwiche, K. Yi, A. Santos, D. Baldocchi, and D. Papale; Software: D.S. Christianson, Y.-W. Cheah, G. Pastorello, J. Geden, F. O’Brien, S. Ngo, K. Leibowitz, N.F. Beekwilder, and M. Sandesh; Validation: K. Delwiche, K. Yi, A. Santos, S. Dengel, S.W. Chan, and S.C. Biraud; Visualization: D.A. Agarwal, D.S. Christianson, R. Hollowgrass, K. Leibowitz, F. O’Brien, Y.-W. Cheah, G. Pastorello, M.S. Torn, and M. Sandesh.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chu, H., Christianson, D.S., Cheah, YW. et al. AmeriFlux BASE data pipeline to support network growth and data sharing. Sci Data 10, 614 (2023). https://doi.org/10.1038/s41597-023-02531-2
DOI: https://doi.org/10.1038/s41597-023-02531-2