Highly sampled measurements in a controlled atmosphere at the Biosphere 2 Landscape Evolution Observatory

Arevalo, Jorge; Zeng, Xubin; Durcik, Matej; Sibayan, Michael; Pangle, Luke; Abramson, Nate; Bugaj, Aaron; Ng, Wei-Ren; Kim, Minseok; Barron-Gafford, Greg; van Haren, Joost; Niu, Guo-Yue; Adams, John; Ruiz, Joaquin; Troch, Peter A.

doi:10.1038/s41597-020-00645-5

Data Descriptor
Open access
Published: 15 September 2020

Jorge Arevalo ORCID: orcid.org/0000-0002-6889-5395^1,2,
Xubin Zeng ORCID: orcid.org/0000-0001-7352-2764^1,3,
Matej Durcik³,
Michael Sibayan⁴,
Luke Pangle⁵,
Nate Abramson⁶,
Aaron Bugaj³,
Wei-Ren Ng³,
Minseok Kim³,
Greg Barron-Gafford ORCID: orcid.org/0000-0003-1333-3843^3,7,
Joost van Haren^3,8,9,
Guo-Yue Niu ORCID: orcid.org/0000-0003-2105-7690^1,3,
John Adams³,
Joaquin Ruiz^3,6 &
…
Peter A. Troch^1,3

Scientific Data volume 7, Article number: 306 (2020)

Abstract

Land-atmosphere interactions at different temporal and spatial scales are important for our understanding of the Earth system and its modeling. The Landscape Evolution Observatory (LEO) at Biosphere 2, managed by the University of Arizona, hosts three nearly identical artificial bare-soil hillslopes with dimensions of 11 × 30 m² (1 m depth) in a controlled and highly monitored environment within three large greenhouses. These facilities provide a unique opportunity to explore these interactions. The dataset presented here is a subset of the measurements from each of LEO’s hillslopes, taken every 15 minutes from 1 July 2015 to 30 June 2019. It consists of temperature, water content and heat flux of the soil (at 5 cm depth) for 12 co-located points; temperature, relative humidity and wind speed above ground at 5 locations and 5 different heights ranging from 0.25 m to 9–10 m; 3D wind at 1 location; the four components of radiation at 2 locations; spatially aggregated precipitation rates, total subsurface discharge, and relative water storage; and the measurements from a weather station outside the greenhouses.

Measurement(s)	temperature of soil • wetness of soil • heat conductivity • atmospheric wind speed • Humidity • temperature of air • longwave radiation • shortwave radiation • atmospheric wind direction • hydrological precipitation process
Technology Type(s)	thermocouple • dielectrometry • thermopile • anemometer • weather station • pyrgeometer • pyranometer • Flowmeter Device
Factor Type(s)	replicate hillslope • location • height
Sample Characteristic - Environment	soil • atmosphere

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12881810

Background & Summary

The understanding of land-atmosphere interactions is important for improvements in Earth System Modelling^1,2,3 for climate assessment, weather prediction, and subseasonal-to-seasonal forecasts⁴. Although the impact of some of these interactions occurs at large spatiotemporal scales, affecting regional climates^5,6 through, e.g., soil moisture-precipitation feedbacks^7,8 and mesoscale circulations, they are primarily driven by local interactions between the land surface and the atmospheric boundary layer^9,10. Studies of these interactions face three major challenges^11,12,13: (1) lack of observations with adequate spatiotemporal resolution and precision¹⁴, (2) uncertainties due to the large number of processes and feedbacks involved, and (3) the difficulty of controlled and replicated experimentation.

In this context, Biosphere 2 of the University of Arizona is committed to contributing to the understanding of the environment through experimentation in several large-scale, highly controlled and densely monitored model ecosystems. Knowledge acquired in those experiments helps scientists to build or improve computer models representing the physical, biological, and chemical processes, which can later be tested in nature. These models, in turn, help guide new experiments in Biosphere 2.

One of those model ecosystems is the Landscape Evolution Observatory¹⁵ (LEO). Its main goal is to better understand, through controlled experimentation, the physical, chemical, and biological processes occurring in the critical zone at the hillslope scale and their interactions with the atmosphere in the context of landscape evolution and climate change. LEO’s design^12,15,16,17 was driven by the need for controlled experimentation at a larger scale than was available in the past, and it was the result of input from a large scientific community¹⁸, with a focus on interdisciplinary research.

Research in LEO has mostly focused on hydrological and biogeochemical processes at the hillslope scale, including water and tracer transport^19,20,21, microbial patterns and soil evolution^22,23, and bare soil carbon cycling^24,25, but the dataset has not been extensively used for land-atmosphere interaction studies. Hence, part of the measurements in LEO are being made public to the scientific community to advance our understanding of microscale land-atmosphere interactions.

Atmospheric variables included in this dataset are temperature, relative humidity, wind speed, 3D components of the wind vector and the four components of radiation. Volumetric water content, heat flux, and temperature from the topmost soil layer (at 5 cm depth) are also included. Precipitation, discharge, and water storage content were measured for each hillslope and made available to complete the water related variables. Additionally, measurements from an automatic weather station outside of the bays are included as reference for the outside weather conditions. All the data are available every 15 minutes from 1 July 2015 to 30 June 2019. An automated data quality control was performed to account for missing values, expected range of measurements, outliers, and spatial and temporal consistency.

This dataset is expected to contribute to our understanding of land-atmosphere interactions by providing a highly detailed set of measurements in a controlled environment. Science questions that could be addressed with this dataset include, but are not limited to, the following. What is the microscale spatial variability of atmospheric and land surface states in a controlled environment? How does this microscale variability change diurnally, from day to day, and seasonally? What is the temporal relationship between the atmospheric and land surface microscale variabilities? How do atmospheric variables vary with height? What are the surface turbulent fluxes over bare soil^26,27 obtained through the closure of water²¹ and energy balances? What is the relationship of these turbulent fluxes with atmospheric and land surface states (e.g., the vertical gradient of atmospheric variables, the horizontal variance of near-surface atmospheric and soil variables)? It is further expected that the analysis of the existing dataset can lead to new hypotheses about the interactions between the land surface and the atmosphere, and that these hypotheses can be tested through experimentation involving manipulation of environmental variables, such as rainfall and wind speed.

Methods

Site and instruments description

LEO (https://biosphere2.org/research/projects/landscape-evolution-observatory) is located at Biosphere 2, Oracle, Arizona, USA (https://biosphere2.org) and operated by the University of Arizona. It is composed of three near-identical greenhouses (Fig. 1a) covered, but not sealed, by 11 mm thick glass with an interior mylar sheet. The glass has a solar heat gain coefficient of 0.7, transmitting between 50% and 60% of total solar radiation but less than 1% of UV solar radiation^28,29. The three greenhouses are named East, Center and West bays, containing air volumes of approximately 12,550 m³, 12,950 m³ and 12,550 m³, respectively; they all face south-southwest. Although the enclosed atmosphere could be highly controlled, it has been naturally driven most of the time, with two exceptions: precipitation during the experiments, and temperature, which is regulated to keep the bays at temperatures that allow the scientists to work.

Inside each bay there is one artificial bare soil hillslope with a surface of 11 × 30 m², an average slope of 10° and an average soil depth of 1 m. The soil is ground basaltic tephra with a loamy sand texture and a dry bulk density of 1.5 g cm⁻³; more detailed information on soil physical and chemical properties can be found in the main article describing LEO¹⁵.

Buried in the soil of each hillslope are more than 1,200 sensors measuring soil water content, soil water potential, soil temperature, soil carbon dioxide concentration, heat flux, electrical resistivity, and hydrostatic water pressure. There are also more than 630 sampling points, allowing physicochemical analyses of water and gases within the soil. Outside the hillslopes, water storage in the soil is accounted for through 10 large load cells for each hillslope whereas discharge is monitored by a combination of tipping buckets and electromagnetic flowmeters. Above ground, there are more than 50 sensors in each bay to monitor the enclosed atmosphere by measuring temperature, relative humidity, wind speed and direction, and radiation fluxes at different heights (Fig. 1b). Precipitation is not measured directly but precisely controlled by the irrigation system. The most recent data acquired in LEO can be visualized in http://biosphere2.org/research/leo-data.

Although all the monitored variables in LEO are valuable for the scientific community, the scope here is to provide data that helps to improve our understanding of the soil-atmosphere interactions. Careful processing, quality control, and then sharing of other data, including that from more than 3,000 sensors buried deeper in the soil, are left to future efforts. Hence, this dataset³⁰ compiles part of the LEO measurements in the three individual hillslopes of (1) meteorological variables above the hillslopes’ surface, (2) soil moisture, heat flux and temperature of the soil near the surface, (3) precipitation, discharge and water storage aggregated for the entire hillslope, and (4) meteorological variables outside LEO from an automatic weather station. A basic description of each instrument used for the measurements included in this dataset is available in Tables 1, 2, and 3, while their locations within each bay are shown in Fig. 2.

Table 1 Summary of the above ground instruments’ specifications.

Table 2 Summary of the in-soil instruments’ specifications.

The atmospheric sensors are located on five retractable masts at 0.25, 1, 3, 6, and 9–10 m above ground, with the exception of the mast at 4 m from the bottom of the hillslope where only the four lowest levels are present. At each level along the masts (Fig. 2), air temperature, relative humidity, and wind speed and direction are measured. On the two masts located at 10 m from the bottom of the hillslope, there are 4-channel radiometers measuring downward and upward longwave and shortwave radiation fluxes. In addition, a 3D sonic anemometer is located on the mast at 17 m from the bottom of the hillslope. Specifications for the above ground instruments are listed in Table 1. All masts are lifted during each rain event to avoid interference with the rain and nonuniform erosion of the soil through dripping. When the masts are lifted, most of the above ground measurements are not usable, as the location and orientation of the sensors are changed. Only the topmost measurements of temperature and relative humidity show some temporal consistency during such rain periods as they do not have much dependence on the orientation of sensors and their location is only slightly modified.

Among the sensors buried in the soil, there are measurements of near surface (at 5 cm depth) soil heat flux, soil temperature, and soil volumetric water content which are also included in the dataset (Fig. 2, Table 2). Heat flux is measured with two independent instruments at each location, HFP-1 and HFP-1SC.

Precipitation, discharge and water storage are also monitored in LEO, but they required additional processing, which is explained in the following paragraphs. Details of the instruments and the aforementioned hillslope-scale quantities are available in Table 3.

Table 3 Summary of the hillslope instruments’ specifications.

Precipitation is not directly measured but highly controlled through a complex irrigation system. The total volume of water flowing through the irrigation main line is recorded and then converted to rain rates by dividing the water volume by the area of the hillslopes (330 m²) and by the time aggregation period in hours, to be included in this dataset. Droplet size distributions, terminal velocities of the droplets, and spatial homogeneity of the precipitation have been studied in the past¹⁵, showing that the irrigation system is able to produce rain droplets that achieve velocities close to that of natural rain. The spatial distribution has coefficients of variation between 0.2 and 0.7 with more homogeneous distributions occurring at higher rain rates.
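The conversion described above is a depth of water per unit time. A minimal sketch follows, assuming the recorded volume is expressed in litres (1 litre spread over 1 m² forms a 1 mm layer); the function name and the unit of the raw record are illustrative assumptions, not taken from the dataset documentation.

```python
HILLSLOPE_AREA_M2 = 330.0  # hillslope surface area stated in the text


def rain_rate_mm_per_h(volume_litres, period_hours=0.25):
    """Convert an irrigation volume to a rain rate in mm/h.

    Assumption: the volume is given in litres, so litres / m^2 = mm of water.
    period_hours defaults to the 15-minute aggregation period.
    """
    depth_mm = volume_litres / HILLSLOPE_AREA_M2
    return depth_mm / period_hours


# Example: 990 L delivered within one 15-minute period
print(rain_rate_mm_per_h(990.0))  # 12.0
```

In this example, 990 L over 330 m² is a 3 mm layer, which in a quarter of an hour corresponds to 12 mm h⁻¹.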

LEO’s discharge is routed through a porous plate at the seepage face (11 m²) to six dividers, where it flows through a SeaMetrics PE102 flow meter (for high flow discharge) and then through a NovaLynx 26-2501-A tipping bucket gauge (for low flow discharge); both measurements are included in this dataset. The Center hillslope developed a leak at its bottom, and an additional flow meter and tipping bucket gauge were added to account for this in the total discharge. Flow meters have a low accuracy for flows below the equivalent of 0.025 mm h⁻¹, while tipping buckets tend to underestimate high flows. Our dataset also includes the computed total discharge as a reference. This quantity was calculated as the sum over all dividers of the best available measurement in each divider, selected according to the discharge rate. The flow meter value was selected for flows higher than 0.025 mm h⁻¹ and the tipping bucket value for lower flows if both measurements passed the quality control; otherwise, whichever measurement passed the quality control was used. For this computation, a zero value was assigned to a divider when there was no discharge flow or when measurements were missing or below the range.
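The selection rule for the total discharge can be sketched as below. This is only an illustration under stated assumptions: the text does not say which reading decides whether the flow is above the cutoff, so the flow-meter reading is used here, and the function and argument names are hypothetical.

```python
LOW_FLOW_CUTOFF = 0.025  # mm/h, flow-meter low-flow cutoff from the text


def best_divider_flow(q_high, q_low, high_ok, low_ok):
    """Pick one discharge value (mm/h) for a single divider.

    q_high: flow-meter reading; q_low: tipping-bucket reading.
    high_ok / low_ok: True when the reading passed quality control.
    """
    if high_ok and low_ok:
        # Assumption: the flow-meter reading decides which regime applies.
        return q_high if q_high > LOW_FLOW_CUTOFF else q_low
    if high_ok:
        return q_high
    if low_ok:
        return q_low
    return 0.0  # no usable measurement: treated as no flow


def total_discharge(dividers):
    """Sum the best available flow over all dividers of one hillslope."""
    return sum(best_divider_flow(*d) for d in dividers)
```

Because a divider without a usable measurement contributes zero, totals computed this way are a lower bound during periods with missing data.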

Our dataset also includes measurements from 10 Honeywell Model 3130 load cells on each hillslope to monitor the total water storage. Relative water storage was computed by adding the weights from the 10 load cells (only when all of the measurements passed the quality control) and then subtracting the total mass recorded at the lowest water storage content during the time period of this dataset, which occurred on 6 November 2016. Assuming a water density of 1,000 kg m⁻³, the results were divided by the slope area (330 m²) to obtain the relative water storage content in mm, which is also included in the dataset.
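As a worked illustration of this computation, the sketch below assumes that the load-cell readings are masses in kg and that a QC code of 0 means the reading passed all checks; the function names are hypothetical.

```python
WATER_DENSITY = 1000.0  # kg m^-3, as assumed in the text
SLOPE_AREA = 330.0      # m^2


def total_mass(cell_masses_kg, qc_codes):
    """Sum the load cells, or return None if any reading failed QC."""
    if any(code != 0 for code in qc_codes):
        return None
    return sum(cell_masses_kg)


def relative_storage_mm(total_mass_kg, reference_mass_kg):
    """Water storage (mm) relative to the driest state of the hillslope."""
    volume_m3 = (total_mass_kg - reference_mass_kg) / WATER_DENSITY
    return volume_m3 / SLOPE_AREA * 1000.0  # metres of water -> millimetres
```

With these constants, every 330 kg of additional water on a hillslope corresponds to 1 mm of relative storage.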

Although the greenhouses in LEO allow the control of many variables, some of them are still impacted by the external conditions. For instance, solar radiation inside the bays depends directly on the amount of solar radiation outside LEO, and indoor temperature is strongly affected by the outside temperature. Hence, data from a WeatherHawk 710 weather station located outside the building were also included. Pressure, solar radiation (300–1100 nm), air temperature and relative humidity are the main variables affecting the internal environment of LEO, but precipitation, wind speed and wind direction were also included for completeness.

Experiments and precipitation control in LEO

Several specific experiments have been conducted in LEO, which have led to particular rain patterns. Within the period of the provided dataset, two extensive tracer experiments were carried out, one at the end of 2016 and one in mid-2019.

The first extensive experiment was a 28-day tracer experiment conducted from 1 to 28 December 2016. The experiment was designed to observe the transit time distributions (TTDs) and the StorAge Selection (SAS) functions³¹, which are system-scale hydrologic transport signatures. Those functions were directly observed using the PERiodic Tracer Hierarchy (PERTH) experimental protocol³². The method required driving the hillslopes to a periodic steady state. Therefore, the hillslopes were irrigated with two 3-hour pulses of 12 mm h⁻¹ at 7-hour intervals every 3.5 days.

Before this experiment, no irrigation was performed within the period covered by this dataset until 6 November 2016, leading to the driest period recorded in those hillslopes. After this experiment was concluded, no irrigation was performed for about 4 months. In order to support the biogeochemical dynamics of the soil, a quasi-regular irrigation sequence, with precipitation almost every two weeks but with a few longer dry periods in between, was performed until late June 2019, when the next major experiment started.

The second major experiment was conducted from 24 June to 16 August 2019, with only 7 days within this dataset. Its goal was to test a new TTD estimation method, which is not limited to a periodic steady state. The irrigation sequence was generated stochastically using a rainfall generator³³. During this experiment, the total irrigation amount was about 750 mm with a mean irrigation rate of about 5.3 mm h⁻¹. The mean duration of the irrigation pulses was 2.5 hours, and the mean inter-irrigation time was about 17 hours. Due to the complicated irrigation sequence, the mast operation differed from its regular operation, and the corresponding times when masts were lifted were flagged in the quality control companion files.

Instruments calibration and uncertainty

All the instruments were calibrated in the factory by the manufacturers; some of them have unique calibration coefficients that are applied before storing the data. The uncertainty reported in Tables 1, 2, and 3 corresponds to that reported by the manufacturers. Additional uncertainties arise from different sources such as the data-acquisition devices, numerical roundoff, and dependence on other environmental conditions, among others.

The DVI7911 anemometers are known to have a very low accuracy for wind speeds below 0.5 m s⁻¹, which occur within LEO more than 99% of the time. Their corresponding wind vanes have even lower accuracy at such wind speeds and hence their measurements were not included in this dataset. The CNR4 net radiation sensor is composed of 2 pyranometers and 2 pyrgeometers, each of them having unique calibration coefficients that are applied before storing the data. They also have internal temperature sensors that automatically compensate for thermal drifts.

For the volumetric water content, a calibration curve was developed specifically for the LEO soil through lab experiments based on four 5TM sensors buried in a sample of the same basaltic material with the same bulk porosity. This fitted curve allows the volumetric water content to be derived from the directly measured dielectric permittivity, with 95% confidence. The custom calibration was applied by the factory for Biosphere 2.

The EX81 flowmeter sensors use a calibration coefficient dependent on the material and size of the irrigation piping and provide the total volume of water sprayed by the irrigation system. This does not account for the small losses of water that falls outside the hillslopes’ surface nor for the evaporation from the rain drops that occurs before they reach the surface. Hence, the precipitation is slightly overestimated by this measure. To reduce the uncertainty in the total precipitation rates, the changes in water storage content derived from the load cells and mass conservation can also be used.

The PE102 flowmeters have a low flow cutoff of 0.025 mm h⁻¹; hence the flow is then routed to the tipping buckets, which in contrast are known to underestimate high flows. The NovaLynx conversion rate between tip pulses and volume of water was derived empirically through manual calibration. The total discharge computed for this dataset has a larger uncertainty due to the assumption of no flow when data were missing or below the range. Therefore, a more detailed filling of the missing data is recommended for case studies.

Data aggregation

The data were typically logged every 15 minutes but during specific experiments where high temporal resolution was required, the data were logged every minute for many sensors. Hence, in order to homogenize the series, all the data were aggregated/averaged to 15-minute intervals and labeled with the time for the end of the aggregation period.
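The aggregation to 15-minute intervals labeled by the end of each period can be sketched as follows. The sketch assumes right-closed intervals (a sample stamped exactly 12:15:00 belongs to the period ending at 12:15:00) and a plain arithmetic mean; neither detail is stated in the text, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

PERIOD = timedelta(minutes=15)


def period_end(ts):
    """Return the end time of the 15-minute period containing ts.

    Assumption: periods are right-closed, i.e. (end - 15 min, end].
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = ts - midnight
    n_periods = -(-elapsed // PERIOD)  # ceiling division on timedeltas
    return midnight + n_periods * PERIOD


def aggregate(samples):
    """Average (timestamp, value) samples into 15-minute records."""
    groups = defaultdict(list)
    for ts, value in samples:
        groups[period_end(ts)].append(value)
    return {end: sum(v) / len(v) for end, v in sorted(groups.items())}


# One-minute samples from 12:01 to 12:15 collapse into one record at 12:15
samples = [(datetime(2016, 12, 1, 12, m), float(m)) for m in range(1, 16)]
print(aggregate(samples))
```

Quantities that are totals rather than states (for example irrigation volume) would need a sum instead of a mean before conversion to a rate.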

Data Records

This dataset³⁰ is available in comma separated files. For the data inside LEO, there is one file for each variable and bay with the name BAY_VARIABLE.csv, where BAY is one of the three bays (East, Center or West) and VARIABLE is the variable code listed in Tables 1, 2, and 3. The first column of each file provides the time corresponding to the end of the aggregation period for each record in the format YYYY-MM-DD HH:MM:SS (Mountain Standard Time). The subsequent columns provide the time-aggregated value for the variable as measured by the sensor identified in the header row, using the unit listed in the range of measurement in Tables 1, 2, and 3. The sensor naming convention, BAY_VARIABLE_UP_CROSS_LVL, comprises the bay identifier, the variable code, and three numerical identifiers of position. BAY and VARIABLE are as defined before; UP and CROSS are the coordinates in meters for the up-slope and cross-slope directions respectively, measured from the center of the bottom edge of each hillslope, as illustrated in Fig. 2. LVL is an integer with a value of 0 for the instruments with a single available level (i.e., ST, VWC, SHF, DSW, USW, DLW, ULW, U, V, W, RRate, QHigh, QLow, QRate, Mass, and RelS) and from 1 to 5 for the sensor heights on the masts for multilevel data (lowest to highest).
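A header entry following the BAY_VARIABLE_UP_CROSS_LVL convention could be split into its parts as sketched below. The exact spelling of the coordinates in the files (decimals, sign of the cross-slope offset) is not described here, so the sketch assumes underscore-separated fields whose coordinates parse as numbers; the example name is hypothetical.

```python
from typing import NamedTuple


class SensorId(NamedTuple):
    bay: str        # East, Center or West
    variable: str   # variable code, e.g. ST for soil temperature
    up: float       # up-slope coordinate in meters
    cross: float    # cross-slope coordinate in meters
    level: int      # 0 for single-level sensors, 1 to 5 on the masts


def parse_sensor_name(name):
    """Split a column header into its five components."""
    bay, variable, up, cross, level = name.split("_")
    return SensorId(bay, variable, float(up), float(cross), int(level))


# Hypothetical header entry, for illustration only
print(parse_sensor_name("West_ST_10_-2_0"))
```

A header that does not have exactly five fields raises a ValueError, which makes deviations from the assumed convention visible early.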

For each variable and bay there is also a file containing a quality control (QC) flag code, as explained in the Technical Validation section below. Each file is named BAY_VARIABLE_qc.csv and has the same dimensions, records and format as the data record files but instead of the aggregated data, an integer QC flag code is provided (Table 4).

Table 4 Quality control codes. Specific details of the quality control process are in the text.

Finally, the files EXTAWS.csv and EXTAWS_qc.csv provide the data and the QC flag codes, respectively, for the automatic weather station located outside LEO. The format is similar to the aforementioned files, with the first column corresponding to the time of the end of the aggregation period for each record and the subsequent columns corresponding to the time-aggregated values for each variable or the QC flag codes, respectively. The header row indicates the variable in each column; the unit of measurement for each variable is the same as in the previous files, and for pressure (P) it is hPa. The QC flag codes are the same as defined in Table 4. Although the EXTAWS files span the same time period as the LEO data, the first non-missing record corresponds to 18 February 2016 12:30 (Mountain Standard Time; or 19:30 UTC).

All the files contain exactly one row for each aggregation period of 15 minutes (140,257 records per sensor), even if no observations are available for such time. The missing values were filled with the value −9999.9. The quality control files have no missing values.
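The fill value and the fixed record count lend themselves to a quick consistency check when reading the files. The sketch below is illustrative: the endpoint timestamps are an assumption (one choice of first and last period-end labels that reproduces the stated count), and the function names are hypothetical.

```python
from datetime import datetime, timedelta

MISSING_VALUE = -9999.9  # fill value stated in the text


def parse_value(field):
    """Return the field as a float, or None when it holds the fill value."""
    value = float(field)
    return None if value == MISSING_VALUE else value


def expected_records(start, end, step=timedelta(minutes=15)):
    """Number of period-end timestamps from start to end, both included."""
    return (end - start) // step + 1


# Assumed endpoints: first and last period-end labels of the dataset
n = expected_records(datetime(2015, 7, 1, 0, 0), datetime(2019, 7, 1, 0, 0))
print(n)  # 140257
```

The period contains 1,461 days (including 29 February 2016), that is 1,461 × 96 = 140,256 intervals, plus one label for the inclusive endpoint.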

Technical Validation

Quality control overview

Although the quality and calibration of the sensors were carefully considered and protocols to avoid problems in the measurements have been followed, data were still prone to errors. Hence, an automated data quality control has been performed to account for missing values, values out of the range, outliers, temporal inconsistency and spatial inconsistency. No data were deleted from the dataset but a companion file with QC flag codes was included. Additionally, during the irrigation periods (rain), the masts were lifted and hence the above ground sensors were moved from their original locations and consequently flagged. Finally, data for some sensors were manually flagged to account for obvious problems.

The different quality control codes are compiled in Table 4. They were chosen to be a power of 2, to mark one specific bit in its binary representation. As errors in the data could be attributed to more than one type, a QC code was assigned to each record as the sum of all the codes that apply to the specific errors, or equivalently the error code has marked each bit corresponding to the sources of error in its binary representation. A value of 0 means that this record has passed all the quality controls. Different sources of error were analyzed sequentially, excluding the data flagged in the previous steps.
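Because each code is a distinct power of 2, a summed QC value can be decomposed unambiguously with bitwise operations. A minimal sketch, using the code values given in the following subsections (the label strings and the function name are illustrative choices):

```python
# Code values as described in the text; labels are paraphrased.
QC_BITS = {
    1: "missing",
    2: "below range",
    4: "above range",
    8: "outlier",
    16: "temporal inconsistency",
    32: "spatial inconsistency",
    64: "masts lifted",
    128: "manually flagged",
}


def decode_qc(code):
    """List the error sources encoded in a summed QC flag code."""
    return [label for bit, label in QC_BITS.items() if code & bit]


print(decode_qc(0))   # []
print(decode_qc(72))  # ['outlier', 'masts lifted']
```

A record is fully trusted only when its code is 0; users who can tolerate a specific flag can mask it out, for example with `code & ~64` to ignore the mast-lifting flag.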

Missing and constant data

Missing data records were flagged with a value of 1 (or equivalently the first bit of the QC flag was marked). During some periods of time, the values stored for some sensors were found to be frozen or remained constant. For this reason, the data were searched for periods of at least 4 hours of constant values for the air temperature, relative humidity, volumetric water content, soil temperature, and soil heat flux, and the corresponding records were marked as manually flagged by adding 128 to the QC code (equivalently, the eighth bit was marked). For radiation variables, constant data were searched for over periods of 24 hours. Wind, precipitation, discharge flows, mass and relative water storage variables were excluded from this check because time periods with constant or nearly constant values are possible for these variables in LEO.
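A search for frozen values of this kind can be sketched as a run-length scan. The sketch assumes that 4 hours corresponds to 16 consecutive 15-minute records with exactly equal values and that missing records have been excluded beforehand; the function name is hypothetical.

```python
def constant_run_mask(values, min_len=16):
    """Mark records belonging to runs of at least min_len equal values.

    min_len=16 corresponds to 4 hours of 15-minute records;
    min_len=96 would correspond to the 24-hour check for radiation.
    """
    mask = [False] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_len:
                mask[start:i] = [True] * (i - start)
            start = i
    return mask
```

Records marked True would receive 128 added to their QC code.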

Values out of range

Each record was quality controlled to ensure that the value is within the plausible range of measurements for that sensor within the LEO bays; the minimum (R_min) and maximum (R_max) of the corresponding range for each variable/instrument are listed in Table 5. Whenever the data value was lower than R_min, the QC code was incremented by 2 (or equivalently the second bit was marked). If the data value was higher than R_max, the QC code was incremented by 4 (or equivalently the third bit was marked). It is important to note that it is not uncommon for temperature records on the topmost sensors to reach about 70 °C. Although the specifications of the sensors indicate that the valid range of measurements is up to 60 °C, there is no evidence that those values are wrong, and hence they can still be used with caution.
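The range check reduces to two comparisons per record. In the sketch below the bounds are passed in by the caller; the example bounds of 0 and 15 m s⁻¹ are used for illustration only (the upper value is the one quoted later in the text for indoor wind speed, the lower value is an assumption).

```python
def range_code(value, r_min, r_max):
    """Return the QC increment for the range check: 0, 2 or 4."""
    if value < r_min:
        return 2  # below the plausible range (second bit)
    if value > r_max:
        return 4  # above the plausible range (third bit)
    return 0


print(range_code(16.0, 0.0, 15.0))  # 4
```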

Table 5 Quality control coefficients.

Lifting of the masts

The masts holding the above ground instruments were lifted a few minutes before each controlled rain event and were kept in a horizontal position until a few minutes after the rain had ended. Hence, records for each above ground sensor during a time window extending from 60 minutes before to 60 minutes after any rain period were flagged with the code corresponding to “Masts lifted” by adding 64 to the corresponding QC code or, equivalently, turning on the seventh bit in its binary representation. During the last 7 days of the period covered by this dataset, a tracer experiment with randomly generated sequences of rain was performed. This led to periods with the masts lifted beyond the usual window; consequently, the corresponding records were manually flagged as “Masts lifted”.
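The 60-minute window around rain periods can be sketched on the 15-minute record grid as a padding of four records on each side of every record with rain. How the window edges align with the period-end labels is not specified, so this alignment is an assumption, as is the function name.

```python
def masts_lifted_mask(rain_rate, pad=4):
    """Mark records within `pad` records of any record with rain.

    pad=4 corresponds to 60 minutes of 15-minute records.
    """
    n = len(rain_rate)
    mask = [False] * n
    for i, rate in enumerate(rain_rate):
        if rate > 0:
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                mask[j] = True
    return mask


# A single record with rain flags nine records in total
rain = [0.0] * 10 + [12.0] + [0.0] * 10
print(sum(masts_lifted_mask(rain)))  # 9
```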

Outliers

Individual records that deviate strongly from the mean usually, but not always, correspond to errors in the measurements. Reasons for such values require a case by case analysis beyond the scope of this article. A variable/sensor-type specific threshold (k_o in Table 5) was defined here, based on empirical trials, to identify outliers. Note that even after defining such a threshold, it is still not possible to state that all outliers are incorrect data. Hence, a case by case outlier analysis is advised if the use of the data could be affected by extreme values. The QC procedure for this outlier detection analysis is as follows: for each sensor, the centered running mean (μ_r) and centered running standard deviation (σ_r) with a time window of 30 days were computed for the entire time series. Then, for any value outside the range μ_r ± k_o·σ_r, the corresponding QC code was incremented by 8 (equivalently, the fourth bit was marked). As the occurrence of precipitation and discharge is sparse, an analysis of this kind is not possible, and hence this test was not applied to these variables.
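The running-statistics test described above can be sketched in a few lines. This is a simple, unoptimized illustration: the window is truncated at the ends of the series and the population standard deviation is used, both of which are assumptions, and previously flagged records are assumed to have been removed already.

```python
from statistics import fmean, pstdev


def outlier_mask(values, k_o, window=2880):
    """Flag values outside mu_r +/- k_o * sigma_r.

    window=2880 records corresponds to 30 days of 15-minute data.
    The centered window is truncated near the ends of the series.
    """
    half = window // 2
    mask = []
    for i, v in enumerate(values):
        segment = values[max(0, i - half): i + half + 1]
        mu, sigma = fmean(segment), pstdev(segment)
        mask.append(abs(v - mu) > k_o * sigma)
    return mask


# A short synthetic series with one spike at index 10
series = [float(i % 2) for i in range(21)]
series[10] = 50.0
print(outlier_mask(series, k_o=3.0, window=21).index(True))  # 10
```

The loop recomputes each window from scratch, which is adequate for a demonstration; a full-length series would benefit from cumulative sums or a rolling-window library.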

Temporal consistency

In an enclosed environment, sudden changes of atmospheric or soil parameters are not expected for most variables. Hence, when the change in the value between consecutive aggregation periods is large, it is reasonable to suspect an error in the measurement. However, it is not easy to define how large such a change must be to be considered anomalous. Therefore, after several trials, a specific threshold was defined for each variable/sensor-type (k_t in Table 5). The procedure was as follows: for every sensor, the differences between each record and the record for the previous aggregation period were computed (∇); then the centered running mean (μ_r^∇) and the centered running standard deviation (σ_r^∇) of these differences were computed with a time window of 30 days. Any record was flagged as temporally inconsistent when its difference fell outside the range μ_r^∇ ± k_t·σ_r^∇. For the flagged records, the QC flag code was incremented by 16 (equivalently, the fifth bit was marked). This QC check was not applied to precipitation, wind speed, shortwave radiation, volumetric water content, mass, water storage content, or discharge because sudden changes are possible for these variables. For precipitation, water content, and discharge this is expected during irrigation periods. Shortwave radiation fluxes also change abruptly at sunrise and sunset or during cloud passages.
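The temporal check applies running statistics to the record-to-record differences rather than to the values themselves. A minimal sketch under the same kind of assumptions as any such illustration (truncated windows at the series ends, population standard deviation, no missing records; the function name is hypothetical):

```python
from statistics import fmean, pstdev


def temporal_mask(values, k_t, window=2880):
    """Flag records whose change from the previous record is anomalous."""
    diffs = [values[i] - values[i - 1] for i in range(1, len(values))]
    half = window // 2
    mask = [False]  # the first record has no predecessor
    for i, d in enumerate(diffs):
        segment = diffs[max(0, i - half): i + half + 1]
        mu, sigma = fmean(segment), pstdev(segment)
        mask.append(abs(d - mu) > k_t * sigma)
    return mask


# A smooth ramp with a single spike at index 20
ramp = [0.1 * i for i in range(41)]
ramp[20] += 10.0
flags = temporal_mask(ramp, k_t=3.0, window=40)
print([i for i, f in enumerate(flags) if f])  # [20, 21]
```

Note that an isolated spike flags two records: the spike itself and the following record, whose difference reflects the return to normal values.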

Spatial consistency

In an enclosed environment with sensors very close to each other, the measurements of any given variable at the same vertical level are expected to be very highly correlated, and the values for the same aggregation period are expected to be very similar. In order to identify the data that do not present such spatial consistency, a two-step procedure was applied:

1. The standard deviation between all the sensors for a given variable at the same vertical level was computed for each aggregation time (s); the centered running mean value of this new time series was also calculated with a 30-day window (μ_r^s). With those values, the aggregation times when s was higher than a variable-specific constant (k_m in Table 5) times the mean standard deviation μ_r^s were selected for further analysis. Those selected aggregation times show a “high” inter-location standard deviation, suggesting that the value for at least one of the locations is very different from the others at the same level for that time; hence, the second step is devoted to identifying which value contributes to such a large standard deviation.
2. If the standard deviation recalculated after removal of one specific sensor (s₋₁) is greatly reduced, we can conclude that the value for that sensor was very different from the others, and hence it was flagged. The criterion used for this decision was to identify the sensor whose removal reduces the standard deviation to less than a variable/sensor-type specific fraction (k_s in Table 5) of the original standard deviation with all the sensors for that level.

In summary, any record for an aggregation time when \(s > {k}_{m}\cdot {\mu }_{r}^{s}\) and for which \({s}_{-1}/s < {k}_{s}\) was flagged as spatially inconsistent by adding 32 to the QC flag code (equivalently, the sixth bit was marked). This check was performed for wind speed, air and soil temperatures, relative humidity, volumetric water content, and soil heat flux, as these variables have more than three sensors at the same vertical level, as required by this method.
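The two-step procedure can be sketched as follows in Python. This is an illustrative reading of the description above, not the authors' implementation. It assumes complete data (no missing values), at least three sensors per level, and a window truncated at the ends of the series; `flag_spatial` and `half_width` are hypothetical names.

```python
from statistics import mean, stdev

SPATIAL_FLAG = 32  # sixth bit, as stated in the text


def flag_spatial(readings, flags, k_m, k_s, half_width):
    """Flag sensors that dominate an unusually large inter-sensor spread.

    readings[t][j]: value of sensor j at aggregation time t (same level).
    flags[t][j]: integer QC codes; a modified copy is returned.
    """
    # Inter-sensor standard deviation s for every aggregation time.
    s = [stdev(row) for row in readings]
    out = [list(row) for row in flags]
    for t, row in enumerate(readings):
        lo, hi = max(0, t - half_width), min(len(s), t + half_width + 1)
        mu_s = mean(s[lo:hi])  # centered running mean of s
        # Step 1: keep only times with an unusually large spread.
        if s[t] <= k_m * mu_s:
            continue
        # Step 2: leave one sensor out and test the reduction in spread.
        for j in range(len(row)):
            s_minus_1 = stdev(row[:j] + row[j + 1:])
            if s_minus_1 / s[t] < k_s:
                out[t][j] |= SPATIAL_FLAG
    return out


# Usage: five co-located sensors, one of which jumps at time step 25.
base = [20.0, 20.1, 19.9, 20.05, 19.95]
demo = [list(base) for _ in range(50)]
demo[25][2] = 25.0
demo_flags = flag_spatial(demo, [[0] * 5 for _ in range(50)],
                          k_m=3.0, k_s=0.5, half_width=10)
print([(t, j) for t, r in enumerate(demo_flags)
       for j, f in enumerate(r) if f & SPATIAL_FLAG])
```

The values of k_m and k_s used here are placeholders; the values actually applied to each variable are those listed in Table 5.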

Checks for the external weather station

The data from the external weather station were also quality controlled. Wind direction was marked as manually flagged (128 added to the QC code, or the eighth bit marked) every time the wind speed was lower than 0.5 m s⁻¹, because of the low accuracy of the sensor below that threshold. For wind speed, the upper limit of the range was set to 40 m s⁻¹, instead of the 15 m s⁻¹ used for the instruments inside LEO, as higher values are possible outdoors. Spatial consistency could not be assessed with just one location; hence, that check was not performed. Temporal consistency and outlier checks were performed only for relative humidity, temperature, and atmospheric pressure, with values for the outlier threshold k_o of 5.5, 3.5, and 4.0, respectively, while the temporal threshold k_t was set to 10.0 for all of them.

Quality control results

After applying the QC checks described above, the total, missing, flagged, and non-flagged numbers of records were counted; they are shown in Table 6. Only about 0.2% of the wind speed records are non-flagged. This is mostly due to speeds below the threshold of 0.5 m s⁻¹, which is expected for indoor measurements; therefore, wind direction is not reliable and was not included in the dataset. For the 3D wind components, a much higher percentage of the records passed the QC, as the sensor is much more reliable at low wind speeds; most of the flagged records correspond to missing values, mainly from the sensor located in the West bay, where frequent and long outages occurred. Beyond the missing values, the most common cause of flagged records in the 3D wind components was the lifting of the masts.

Table 6 Summary of the quality control results.

Temperature and relative humidity records passed the QC checks in more than 85% of the cases. The missing records came mostly from an outage of about six months during 2016 that affected those sensors in the Center and West bays, and also from several shorter periods that were manually flagged because the values remained unrealistically constant for at least 4 hours.

Less than 12% of the longwave radiation data were flagged, and only a few of those for reasons other than missing values or the lifting of the masts. A more detailed analysis is advised for the longwave radiation data between late 2017 and mid-2018 in the West bay, as those data show a much larger dispersion than in any other period or bay. For downward shortwave radiation, only about 40% of the data were not flagged; about 350,000 records (about another 40%) show slightly negative values but, in most cases, they can be treated as 0 W m⁻² because they mostly occurred at night. A similar situation occurs for upward shortwave radiation, but about 70% of the data passed the QC.

For the soil variables, both heat flux sensors (HFP1 and HFP1sc) at location 0_12_0 of the Center bay showed mostly anomalous values, and hence all the records for those sensors were manually flagged. More than 89% of the soil temperature records passed the QC, with most of the flagged values being just missing records. More than 45% of the volumetric water content records were flagged due to missing values or slightly negative values during the driest periods (most of the negative values could be interpreted as a VWC near 0%).

For precipitation, only one value was found to be above the range of expected measurements. Although 12.2% of the precipitation data were missing, most of them could be assumed to be 0 mm h⁻¹, as they mostly occurred during no-irrigation periods. About 13% of the data from the tipping buckets and flowmeters were QC flagged, mostly because the data were missing or negative, with only 308 and 16 records, respectively, flagged for other reasons. It is also reasonable to assume a flow of 0 mm h⁻¹ for missing or below-range values. The total discharge data passed the QC checks whenever low or high discharge data were available. For the mass measured by the load cells, about 350,000 records were missing, while about 30,000 records were out of range. Flagged mass data resulted in about 50,000 missing records for relative water storage content, while fewer than 300 records were flagged as outliers.

After the QC, several key variables of the energy and water cycles were aggregated to a daily time scale and spatially averaged, where applicable, for each of the three artificial hillslopes in LEO. Only the non-flagged data were used to prepare the plot (Fig. 3), with the sole exception of the radiation components, which were set to 0 W m⁻² when flagged as below range. The spatial averages were computed using all the non-flagged 15-minute records for the applicable variable. These data were then used to compute the daily aggregated value when at least 75% of the daily data were available.
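The 75% availability rule for the daily aggregation could be applied along the following lines. This Python sketch is only an assumption about how such a step may look: it treats a single series of (timestamp, value, QC code) tuples, takes the daily aggregate to be a mean, and uses the hypothetical names `daily_means`, `RECORDS_PER_DAY`, and `MIN_FRACTION`. The script actually used for Fig. 3 is the one shared by the authors.

```python
from collections import defaultdict
from datetime import datetime, timedelta

RECORDS_PER_DAY = 96   # 24 hours of 15-minute records
MIN_FRACTION = 0.75    # availability threshold stated in the text


def daily_means(records):
    """Return {date: mean} for days with enough non-flagged records.

    records: iterable of (timestamp, value, qc_flag) tuples.
    """
    by_day = defaultdict(list)
    for stamp, value, flag in records:
        # Keep only records that passed every QC check.
        if flag == 0 and value is not None:
            by_day[stamp.date()].append(value)
    return {
        day: sum(vals) / len(vals)
        for day, vals in by_day.items()
        if len(vals) >= MIN_FRACTION * RECORDS_PER_DAY
    }


# Usage: one fully valid day of synthetic records.
start = datetime(2015, 7, 1)
demo = [(start + timedelta(minutes=15 * i), 20.0, 0) for i in range(96)]
print(daily_means(demo))
```

With 96 records per day, the threshold corresponds to at least 72 usable records; days with fewer are simply absent from the result.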

The resulting aggregated time series in Fig. 3 show consistent day-to-day, seasonal, and interannual variabilities related to the energy cycle in the three hillslopes. The variability of water cycle variables is also consistent with that of the controlled precipitation. Figure 3 also shows an outage of approximately 6 months in the West and Center bays, during the driest period when no irrigation was performed.

Overall, the dataset exhibits good quality. Indeed, very few records were flagged for reasons other than missing values, making it a suitable resource for case studies at high temporal resolution and very high spatial resolution.

Usage Notes

This dataset does not represent a natural environment, as most of the variables are heavily influenced by the greenhouse microclimate.

Gap filling of the flagged data

Using the data may require filling the gaps left by QC-flagged records. The high density of the sensors for some variables, the redundant information for others, and their own behavior can be exploited to fill the gaps more accurately than in other environments.

For temperature and relative humidity of the air, and heat flux, temperature, and volumetric water content of the soil, measurements from any of the other sensors at the same level can provide a good estimate of the tendencies when data from just a few sensors are unavailable. For short gaps, standard interpolation methods are still suitable.

Shortwave radiation must be 0 W m⁻² at night, when most of the records flagged as below range occurred. Tendencies of downward shortwave radiation at the two sensors in each bay are expected to be similar, but that assumption may not hold for the upward component, as the sensors are located over portions of the hillslopes with opposite slopes, as can be seen in Fig. 1. For longwave radiation, the data flagged as below range could also be assumed to be 0 W m⁻² in most cases, which can be checked using the temperature measurements and the basic radiation laws.

Precipitation data could be adjusted, and their gaps filled, using mass conservation and the water storage content, although almost every missing precipitation record occurred during periods of no irrigation (except for the missing data during January 2018). Similar approaches could also be used for the discharge data.

Quality control usage

Most of the flagged values do not necessarily represent bad data; rather, they are atypical and require more detailed analysis. For data access and analysis, users can apply bitwise logical operators to identify specific sources of flagging. For instance, a record is flagged for temporal inconsistency when bitwiseand(flagCode, 16) == 16, where 16 is the code for temporally inconsistent data and bitwiseand stands for the bit-by-bit AND operation in the user’s programming language.
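As one possible rendering of this test, the Python sketch below uses the `&` operator. Only the three flag values stated in this part of the descriptor (16, 32, and 128) are listed; the remaining bits are defined in the quality control overview and are deliberately left out. The names `QC_BITS`, `has_flag`, and `decode` are hypothetical.

```python
# Flag values stated in the text; other bits are defined elsewhere in the paper.
QC_BITS = {
    16: "temporal inconsistency",
    32: "spatial inconsistency",
    128: "manually flagged",
}


def has_flag(flag_code, bit_value):
    """True when the bit for bit_value is set in the QC flag code."""
    return (flag_code & bit_value) == bit_value


def decode(flag_code):
    """List the known reasons encoded in a QC flag code."""
    return [name for value, name in QC_BITS.items()
            if has_flag(flag_code, value)]


# Usage: a record flagged for both temporal and spatial inconsistency.
print(has_flag(48, 16))
print(decode(48))
```

Because each check occupies its own bit, a single integer can carry several reasons at once, and a code of 0 means that the record passed all the checks.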

Code availability

The processing of all the data was performed using R v3.3.2 within RStudio v1.1.447 for Mac. The library openair³⁴ v2.0.0 was used for time aggregation, and the library caTools v1.17.1.1 was used to compute rolling means and rolling standard deviations of the time series. Figure 3 was produced with MATLAB™ R2019b Update 4. The code is publicly available in figshare³⁵. It contains all the functions to read, homogenize, and quality control the raw data files to produce the currently shared dataset. Also included are functions to read the data and QC flag codes of the final dataset, together with the script used to produce the aggregated data for Fig. 3, which can serve as an example of how to process the data.

References

Wood, E. et al. Hyperresolution global land surface modeling: Meeting a grand challenge for monitoring Earth’s terrestrial water. Water Resour. Res. 47(5), (2011).
Prentice, I., Liang, X., Medlyn, B. & Wang, Y. Reliable, robust and realistic: the three R’s of next-generation land-surface modelling. Atmos. Chem. Phys. 15, 5987–6005 (2015).
Bierkens, M. et al. Hyper‐resolution global hydrological modelling: what is next? “Everywhere and locally relevant”. Hydrol. Process. 29, 310–320 (2015).
Prodhomme, C., Doblas-Reyes, F., Bellprat, O. & Dutra, E. Impact of land-surface initialization on sub-seasonal to seasonal forecasts over Europe. Clim. Dynam. 47, 919–935 (2016).
Jimenez, P., de Arellano, J., Navarro, J. & Gonzalez-Rouco, J. Understanding land–atmosphere interactions across a range of spatial and temporal scales. B. Am. Meteorol. Soc. 95, ES14–ES17 (2014).
Salazar, A., Baldi, G., Hirota, M., Syktus, J. & McAlpine, C. Land use and land cover change impacts on the regional climate of non-Amazonian South America: A review. Global Planet. Change 128, 103–119 (2015).
Guillod, B., Orlowsky, B., Miralles, D., Teuling, A. & Seneviratne, S. Reconciling spatial and temporal soil moisture effects on afternoon rainfall. Nat. Commun. 6, 6443 (2015).
Welty, J. & Zeng, X. Does soil moisture affect warm season precipitation over the Southern Great Plains? Geophys. Res. Lett. 45, 7866–7873 (2018).
Alemohammad, S. et al. Water, Energy, and Carbon with Artificial Neural Networks (WECANN): A statistically-based estimate of global surface turbulent fluxes and gross primary productivity using solar-induced fluorescence. Biogeosciences (Online) 14, 4101–4124 (2017).
Chen, L. & Dirmeyer, P. Global observed and modelled impacts of irrigation on surface temperature. Int. J. Climatol. 39, 2587–2600 (2019).
Santanello, J. Jr et al. Land–atmosphere interactions: the LoCo perspective. B. Am. Meteorol. Soc. 99, 1253–1272 (2018).
Hopp, L. et al. Hillslope hydrology under glass: confronting fundamental questions of soil-water-biota co-evolution at Biosphere 2. Hydrol. Earth Syst. Sc. 13, 2105–2118 (2009).
Draper, C., Reichle, R. & Koster, R. Assessment of MERRA-2 land surface energy flux estimates. J. Climate 31, 671–691 (2018).
Balsamo, G. et al. Satellite and in situ observations for advancing global Earth surface modelling: A review. Rem. Sens. 10, 2038 (2018).
Pangle, L. et al. The Landscape Evolution Observatory: A large-scale controllable infrastructure to study coupled Earth-surface processes. Geomorphology 244, 190–203 (2015).
Volkmann, T. et al. In Hydrology of Artificial and Controlled Experiments (ed. Liu, J. & Gu, W.) Ch. 2 (IntechOpen, 2018).
Sengupta, A. et al. In Terrestrial Ecosystem Research Infrastructures: Challenges and Opportunities (ed. Chabbi, A. & Loescher, H.) Ch 4 (CRC Press, 2016).
Huxman, T. et al. The hills are alive: Earth science in a controlled environment. Eos, Trans. Am. Geophys. Un. 90, 120–120 (2009).
Wang, C. et al. Particle tracer transport in a sloping soil lysimeter under periodic, steady state conditions. J. Hydrol. 569, 61–76 (2019).
van den Heuvel, D. et al. Effects of differential hillslope‐scale water retention characteristics on rainfall–runoff response at the Landscape Evolution Observatory. Hydrol. Process. 32, 2118–2127 (2018).
Pangle, L. et al. The mechanistic basis for storage‐dependent age distributions of water discharged from an experimental hillslope. Water Resour. Res. 53, 2733–2754 (2017).
Sengupta, A. et al. Assessing microbial community patterns during incipient soil formation from basalt. J. Geophys. Res. Biogeo. 124, 941–958 (2019).
Pohlmann, M. et al. Pore water chemistry reveals gradients in mineral transformation across a model basaltic hillslope. Geochem. Geophy. Geosy. 17, 2054–2069 (2016).
Cueva, A., Volkmann, T., Haren, J., Troch, P. & Meredith, L. Reconciling negative soil CO₂ fluxes: Insights from a large-scale experimental hillslope. Soil Systems 3, 10 (2019).
van Haren, J. et al. CO₂ diffusion into pore spaces limits weathering rate of an experimental basalt landscape. Geology 45, 203–206 (2017).
Zeng, X., Wang, Z. & Wang, A. Surface skin temperature and the interplay between sensible and ground heat fluxes over arid regions. J. Hydrometeor. 13, 1359–1370 (2012).
Bitelli, M. et al. Coupling of heat, water vapor, and liquid water fluxes to compute evaporation in bare soils. J. Hydrol. 362, 191–205 (2007).
Finn, M. The mangrove mesocosm of Biosphere 2: design, establishment and preliminary results. Ecol. Eng. 6, 21–56 (1996).
Marino, B. et al. The agricultural biome of Biosphere 2: Structure, composition and function. Ecol. Eng. 13, 199–234 (1999).
Arevalo, J. et al. Highly sampled measurements in a controlled atmosphere at the Biosphere 2 Landscape Evolution Observatory. PANGAEA https://doi.org/10.1594/PANGAEA.912032 (2020).
Harman, C. J. Time‐variable transit time distributions and transport: Theory and application to storage‐dependent transport of chloride in a watershed. Water Resour. Res. 51(1), 1–30 (2015).
Harman, C. J. & Kim, M. An efficient tracer test for time-variable transit time distributions in periodic hydrodynamic systems. Geophys. Res. Lett. 41, 1567–1575 (2014).
Robinson, J. S. & Sivapalan, M. Temporal scales and hydrological regimes: Implications for flood frequency scaling. Water Resour. Res. 33(12), 2981–2999 (1997).
Carslaw, D. C. & Ropkins, K. Openair - an R package for air quality data analysis. Environ. Modell. Softw. 27-28, 52–61 (2012).
Arevalo, J. Data processing code for Landscape Evolution Observatory. figshare https://doi.org/10.6084/m9.figshare.12366458 (2020).

Acknowledgements

We thank the Philecology Foundation, and its founder Edward Bass, for the charitable donation that allowed the construction of LEO, the University of Arizona Office of Research, Discovery, and Innovation and the College of Science for financial support. We also acknowledge financial support from the National Science Foundation through grant EAR-1340912 and the Agnese Nelms Haury Program in Environment and Social Justice.

Author information

Authors and Affiliations

Department of Hydrology and Atmospheric Sciences, University of Arizona, 1133 James E. Rogers Way, Tucson, AZ, 85721, USA
Jorge Arevalo, Xubin Zeng, Guo-Yue Niu & Peter A. Troch
Departamento de Meteorología, Universidad de Valparaíso, Av. Gran Bretaña 644, Playa Ancha, Valparaíso, Chile
Jorge Arevalo
Biosphere 2, University of Arizona, 32540 S Biosphere Road, Oracle, AZ, 85623, USA
Xubin Zeng, Matej Durcik, Aaron Bugaj, Wei-Ren Ng, Minseok Kim, Greg Barron-Gafford, Joost van Haren, Guo-Yue Niu, John Adams, Joaquin Ruiz & Peter A. Troch
Department of Astronomy/Steward Observatory, University of Arizona, 933 N Cherry Avenue, Tucson, AZ, 85721, USA
Michael Sibayan
Department of Geosciences, Georgia State University, 38 Peachtree Center Avenue, Atlanta, GA, 30303, USA
Luke Pangle
Department of Geosciences, University of Arizona, 1040 E Fourth Street, Tucson, AZ, 85721, USA
Nate Abramson & Joaquin Ruiz
School of Geography and Development, University of Arizona, 1064 E Lowell Street, Tucson, AZ, 85721, USA
Greg Barron-Gafford
Department of Soil, Water and Environmental Science, University of Arizona, 1177 E. 4th Street, Tucson, AZ, 85721, USA
Joost van Haren
Honors College, 1101 East Mabel Street, Tucson, AZ, 18719, USA
Joost van Haren

Contributions

J. Arevalo: data compilation, quality control, manuscript preparation, and manuscript revisions. X. Zeng: paper conception, manuscript suggestions and manuscript revisions. M. Durcik: data acquisition and quality control. M. Sibayan: data acquisition and quality control. L. Pangle: sensors installation. N. Abramson: sensors installation. A. Bugaj: sensors installation. W. Ng: data acquisition, quality control, and manuscript revision. M. Kim: experiments history. G. Barron-Gafford: sensors array design. J. van Haren: sensors array design. G. Niu: sensors array design. J. Adams: sensors array design. J. Ruiz: LEO’s design guidance. P.A. Troch: sensors array design, manuscript suggestions, and manuscript revisions.

Corresponding author

Correspondence to Jorge Arevalo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Arevalo, J., Zeng, X., Durcik, M. et al. Highly sampled measurements in a controlled atmosphere at the Biosphere 2 Landscape Evolution Observatory. Sci Data 7, 306 (2020). https://doi.org/10.1038/s41597-020-00645-5

Received: 28 February 2020
Accepted: 14 August 2020
Published: 15 September 2020
DOI: https://doi.org/10.1038/s41597-020-00645-5
