A global dataset for crop production under conventional tillage and no tillage systems

No tillage (NT) is often presented as a means to grow crops with positive environmental externalities, such as enhanced carbon sequestration, improved soil quality, reduced soil erosion, and increased biodiversity. However, whether NT systems are as productive as those relying on conventional tillage (CT) is a controversial issue, fraught by a high variability over time and space. Here, we expand existing datasets to include the results of the most recent field experiments, and we produce a global dataset comparing the crop yields obtained under CT and NT systems. In addition to crop yield, our dataset also reports information on crop growing season, management practices, soil characteristics and key climate parameters throughout the experimental year. The final dataset contains 4403 paired yield observations between 1980 and 2017 for eight major staple crops in 50 countries. This dataset can help to gain insight into the main drivers explaining the variability of the productivity of NT and the consequence of its adoption on crop yields. Measurement(s) yield trait • crop Technology Type(s) digital curation Factor Type(s) precipitation balance • temperature • crop species • soil texture • management of crop rotation • management of soil cover • management of field fertilization • management of weed and pest control • management of crop irrigation • geographic location Sample Characteristic - Organism Viridiplantae Sample Characteristic - Environment cultivated environment Sample Characteristic - Location global Measurement(s) yield trait • crop Technology Type(s) digital curation Factor Type(s) precipitation balance • temperature • crop species • soil texture • management of crop rotation • management of soil cover • management of field fertilization • management of weed and pest control • management of crop irrigation • geographic location Sample Characteristic - Organism Viridiplantae Sample Characteristic - Environment cultivated environment Sample Characteristic - Location global Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.13303118


Background & Summary
Often featured among promising climate change mitigation measures, NT systems, including conservation agriculture (CA), contribute to environmental preservation and sustainable agricultural production 1,2 . NT is expected to mitigate soil degradation, improve soil structure and water retention properties [3][4][5] . Several studies indicate that this cropping system can provide a large range of positive environmental externalities such as increased biodiversity, enhanced carbon sequestration and improved soil quality through an increase in soil organic matter 6-10 . However, the productivity of NT systems compared to conventional cropping systems is still controversial. Since the productivity of NT depends on several interacting factors such as climatic conditions 11 , soil characteristics 1,12 , and other agricultural management activities [13][14][15][16][17][18][19] , the potential of NT to increase agricultural productivity remains highly uncertain.
Several studies 1,12,20-22 have been conducted to synthetize the current evidence on the productivity in NT systems. Some of these studies relied on global datasets including results of field experiments comparing NT and CT cropping systems over a large range of soil and climate conditions. However, these datasets do not include the most recent published experiments, and provide no or limited information on soil characteristics, climate variables, and management practices. In particular, information on fertilizer inputs, weed and pest control, and intraand inter-annual climatic variability are frequently missing. Other studies comparing NT and CT rely on a limited number of experiments, are only conducted at a regional scale, or did not make their data fully available 23-25 . Thus, a global dataset reporting findings from the most recent field experiments and including information about a wide range of climatic parameters, soil characteristics and agricultural management practices is still lacking.
To address this gap, we present an updated and extended dataset comparing CT and NT productivity including the most recently published experimental studies, and a detailed description of their environmental characteristics and management practices. Our dataset contains 4403 paired (NT vs. CT) yield observations collected between 1980 and 2017 for eight major staple crops in 50 countries. For each experiment, we provide information on soil texture, pH, the year and month of crop planting and harvesting, the location of the experiment, fertilization, weed and pest control practices, crop type, crop rotation, crop residue management, and crop irrigation. Besides soil characteristics and information on management practices, we also report a large range of climate variables derived from several external databases. These include precipitation, potential evapotranspiration, average www.nature.com/scientificdata www.nature.com/scientificdata/ temperature, maximum temperature, and minimum temperature during the crop growing season. This dataset can prove useful to disentangle the effects of soil, climate and agronomic drivers of crop yields when comparing NT with CT systems.

Methods
Data collection. The literature search was done in February 2020 using the following keywords 'Conservation agriculture/No-till/No tillage/Zero tillage' & 'Yield/Yield change' in the websites 'ScienceDirect' , 'Science Citation Index (web of science)' . A total of 1012 potentially relevant papers were identified by reviewing the title and abstract, and these papers were then screened according to the procedure summarized in Fig. 1. Papers not reporting yield data for CT and NT systems were excluded, as well as papers reporting experiments on reduced tillage (RT) systems. Papers reporting only mean yield data across different years or sites were also excluded. We then checked whether information on fertilization, weed and pest control, crop irrigation, crop rotation and crop residue management were reported for both CT and NT practices. After these screening and selection steps, all relevant data were manually extracted from the selected papers, including general information about the paper, location and year of the experiment, the number of years under NT when the crop was sown, soil characteristics, crop growing season, crop type, crop management practices and crop yield of CT and NT. However, due to a large number of missing data, the crop growing season, climatic variables and soil characteristics were finally collected through several external databases (Online-only Table 1). The growing season information was generated from a crop calendar database 26,27 based on the crop type and the locations of the experiments reported in the papers. The precipitation, average temperature in the growing season were extracted from the UDel_AirT_Precip data provided by NOAA/OAR/ESRL PSL 28 . The maximum and minimum air temperature during the growing season were generated from CPC Global Temperature data provided by NOAA/OAR/ESRL PSL 29 and the potential evapotranspiration data over the growing season were extracted from GLEAM database 30,31 . Soil textures were collected from the HWSD database 32 using the locations of the experimental sites reported in the selected papers  • Crop rotation with at least 3 crops involved (based on the crop rotation principle of CA defined by FAO 34 ): "Yes", "No", "Not reported". The details of crop sequence are also provided. • Soil cover: "Yes", "No", "Mixed", "Not reported". Details of residue management for the previous crops are also provided. • Weed and pest control: "Yes", "No", "Mixed", "Not reported". • Field fertilization: "Yes", "No", "Mixed", "Not reported". The details of N input and other fertilizer input are also provided. • Crop irrigation: "Yes", "No", "Mixed", "Not reported". The details of the amount of water applied are also provided.
Category IV contains detailed information about the experiment site, experiment setting, management activities, depending on the papers, it may also include the type and quantity used of fertilizer, herbicide, or pesticide. www.nature.com/scientificdata www.nature.com/scientificdata/ Category V corresponds to data related to crop yield. It includes the paired crop yields under CT (Yield CT ) and NT (Yield NT ) systems. The relative yield changes is defined as The column "Yield increase with NT" reports whether the differences between Yield CT and Yield NT are positive or negative ("Yes" indicates that crop yield is increased with NT practice, while "No" indicates that yield is not increased).
Category VI includes data extracted from the external databases, including crop growing season, climate variables (including precipitation, potential evapotranspiration, minimum/average/maximum temperature) during the growing season, and soil texture.
Crop growing season is defined by a start month and end month, which were extracted from the external crop calendar databases 26 of spring barley, winter barley, cotton, maize, rice, sorghum, soybean, sunflower, spring wheat and winter wheat based on the crop type and study sites. Data on the crop calendar corresponds to averaged data and does not change intra-annually, thus the growing season extracted may be different from the actual growing season.
The climatic variables from the external databases are: • Accumulated precipitation (P) during the growing season (sum of monthly precipitations during the growing season), • Accumulated potential evapotranspiration (E) (sum of monthly evapotranspiration rates during the growing season), • Precipitation balance (PB), defined as PB = P -E 12 , • Average air temperature (Tave) during the growing season, • Maximum air temperature (Tmax): the maximum value among the daily temperatures in the growing season, • Minimum air temperature (Tmin): the minimum value among the daily temperatures in the growing season.
Soil texture was extracted from an external database based 32 on the experiments' locations. In total seven texture classes were included: sandy loam, loam, silt loam, sandy clay loam, clay loam, sandy clay and clay.

technical Validation
To ensure the reliability of the information collected from the papers, we carefully checked and compared all the collected data with the original paper several times. Quality control of the database was conducted based on outlier detection. For each crop, the outliers of crop yield in CT system and NT system were identified based on the Interquartile Rule 35 outlier detection method. For each crop species, an interquartile range (IQR) is defined as the difference between the first and third quartile of crop yield, and a threshold is calculated by adding 1.5 IQR to the third quartile. Any yield data beyond this threshold is flagged as an outlier for the crop species considered. The ratio of crop yield in NT and CT systems ( ) Yield Yield NT CT were also calculated. All outliers and all the observations with a ratio higher than 2 were checked and compared with the values reported in the original papers one more time.
The crop yield values reported in our dataset are consistent with results of previous published studies. Comparing crop yield data of NT and CT, the adoption of NT practice overall leads to a yield decrease (Fig. 4a). A similar trend of crop production decrease with NT was reported in previous studies 1,12,20,36 . We also find that the combination of NT with crop rotation and soil cover (known as CA) trends to increase crop yield compared to www.nature.com/scientificdata www.nature.com/scientificdata/ NT practice without rotation and soil cover (Fig. 4a), which is also in line with previous studies 1, 16,37 . Further analysis conducted on each crop confirms that NT tends to decrease the yield of maize 1 , rice 21 , and wheat 1 (Fig. 4b).
The productivity of NT is found higher under dry conditions compared with wetter conditions (Fig. 4c), and similar trends were reported in previous studies 1,12 .
Usage notes. Our dataset can be used to analyze the factors influencing the productivity of NT (or CA) vs.
CT. It is possible to train machine learning models to predict the probability of yield increase with NT (or CA) system (e.g. Supplementary Figs. 1 and 2) or the range of yield changes resulting based on the soil type, climate and agronomic inputs provided by this dataset. Global maps of probability of yield increase with NT (or CA) or the range of yield changes can be generated based on the outputs of machine learning models trained with our dataset, and enable policy-makers or agricultural advisors to identify the most promising regions for CA implementation. Details of how to train machine leaning models with our dataset are provided in Supplementary Materials 38-450 .
The crop yield data for 2018 and later can be extracted from the identified papers, but since some key climatic variables are missing in the external database for this time period (in particular, evapotranspiration), those data are not listed in the dataset provided. We will update the dataset once we have the latest data access to the missing climate variables.
Importantly, our dataset could be easily updated using data produced by new experiments. We welcome anyone interested to share data or papers not included in this meta-database to send them to the corresponding author (YS, yang.su@inrae.fr). We will maintain and add the new observations in the future to expand our dataset with the latest experimental data.

Code availability
Scripts using the R and MATLAB programming language are provided to produce figures and extract data from external databases. Additional code and related files are available from the corresponding author upon request.

Fig. 4
Comparison of crop yield between NT and CT systems. The boxplots indicate the distributions (min, 1 st quartile, median, 3 rd quartile, max) of the log yield ratio of NT to CT. The mean log yield ratios of NT to CT were calculated based on a linear mixed effect model and marked as the red diamonds in the boxplots. Statistical tests were conducted to test the significance of the estimated values, ***indicates P-value < 0.001, **indicates P-value < 0.01, *indicates P-value < 0.05, indicates P-value < 0.1. Plot (a) shows the mean log ratios for different types of NT systems vs. CT systems. NT overall represents all the experiments involving NT systems in the dataset, NT -R-SC represents the NT systems without crop rotation and without soil cover, NT +R+SC represents the CA systems or NT systems with both crop rotation and soil cover, and CT is the corresponding control in the experiments. Plot (b) shows the mean log ratios for different crop species. S. indicates spring, while W. indicates winter. Plot (c) shows the mean log ratios for different levels of PBs, corresponding to different level of water stress.