A global dataset to parametrize critical nitrogen dilution curves for major crop species

Precise management of crop nitrogen nutrition is essential to maximize yields while limiting pollution risks. For several decades, the critical nitrogen (N) dilution curve - relating plant biomass (W) to N concentration (%N) - has become a key tool for diagnosing plant nutritional status. Increasing number of studies are being conducted to parameterize critical N dilution curves of a wide range of crop species in different environments and N-fertilized conditions. A global synthesis of the resulting data is lacking on this topic. Here, we conduct a systematic review of the experimental data collected worldwide to parametrize critical N dilution curves. The dataset consists of 36 papers containing a total of 4454 observations for 19 major crop species distributed in 16 countries. The key variables of this dataset are the W and %N collected at three or more sampling times, containing three or more fertilizer N rate levels. This dataset can guide the development of generic critical N dilution curves, helps scientists to identify factors influencing plant N status, and leads to the formulation of more robust N recommendations for a broad range of environmental conditions. Measurement(s) biomass • N concentration Technology Type(s) field data collection • field data collection and lab analysis Factor Type(s) nitrogen nutrition index Sample Characteristic - Organism field crops Sample Characteristic - Environment agricultural systems Sample Characteristic - Location global dataset Measurement(s) biomass • N concentration Technology Type(s) field data collection • field data collection and lab analysis Factor Type(s) nitrogen nutrition index Sample Characteristic - Organism field crops Sample Characteristic - Environment agricultural systems Sample Characteristic - Location global dataset


Background & Summary
Nitrogen (N) is one of the most limiting factors for agricultural productivity, and mineral N fertilizers represent a key input for many cropping systems worldwide 1 . However, the over-application of N fertilizers to field crops has a significant impact on the environment through release of greenhouse-gas emissions and air pollutants 2,3 , groundwater pollution, and eutrophication of freshwater 4 and marine ecosystems 5 . Increasing N use efficiency could reduce the footprint of N fertilization in agricultural systems while ensuring a high level of production. This goal could be achieved through more precise N rate recommendations and a better application schedule adapted to the crop N requirements.
Defining the optimal N fertilization is a daunting task, due to the large uncertainties in predicting soil N supply 6 , plant growth and N demand 7 . Nitrogen fertilization is often either insufficient (under high fertilizer prices) or excessive when growers adopt conservative overfertilization strategies 8 . Therefore, for improving N management and recommendation guidelines, both soil and plant processes should be considered for defining optimal N fertilization rates and application schedules.
Improving our understanding of the co-regulation of N uptake by the availability of N from the soil and plant N demand is relevant to define the overall critical N plant concentration (%Nc), herein defined as the minimum plant N concentration (%N) required to achieve maximum crop mass (W) for a given range of crop species in different environments and N-fertilized conditions. This critical value %Nc is known to decline over time as W increases, and the relationship between %Nc and W defines a function named "critical N dilution curve" 9 . This function is very useful to conduct crop N diagnosis because it allows the computation of the N nutrition index (NNI), defined as the ratio of the actual %N (%Nact) to the value of %Nc corresponding to the observed value of W (Wact). A NNI value close to 1 indicated N sufficiency, while a NNI below or above 1 reveal N deficiency and excessive N status, respectively.
Critical N dilution curves were established for several major field crops, including wheat (Triticum aestivum L.) 10 ,winter oilseed rape (Brassica napus L.) 11 , maize (Zea mays L.) 12 , potato (Solanum tuberosum L.) 13 , rice (Oryza sativa L.) 14 , and tall fescue (Festuca arundinacea Schreb.) 15 , among other crops. These curves have been parametrized using either data collected in single experiments or data set pooled from multi-year and multi-sites experiment networks. These curves are now used by many agricultural scientists and engineers for improving fertilization guidelines based on NNI. Many investigations were recently conducted to develop site-specific critical N dilution curves in different parts of the world for many crop species [10][11][12][13][14][15] . However, the scientific value of such local critical N dilution curves is debatable because these curves were parametrized using a small number of local data, leading to high uncertainty and to a risk of false discoveries 16 . Although sensitivity analyses were performed in some of these studies, they generally suffer from inadequate protocols and methods 17 . Thus, there is a need for generic and robust critical N dilution curves valid across a large number of crop species in different environments and N-fertilized conditions 18 . To date, any attempt to develop generic critical N dilution curves has been hampered by the lack of a reliable large-scale dataset. Yet, a global dataset including a large number of pairs of W and %N values is required to drive research efforts on N plant status diagnosis.
Here, we present a new dataset resulting from a systematic search of all data made publicly available on critical N dilution curves. Our dataset contains 4454 observed pairs of plant W and %N for 19 major crop species collected between 1982 to 2021 in 16 countries worldwide. These data were extracted from a total of 36 peer-reviewed scientific manuscripts. It offers a unique source of information for developing generic critical N dilution curves and identifying knowledge gaps to guide future research programs on N plant status diagnosis.

Methods
Data collection. A literature search was executed between June and August 2021 (2 months) using the following keywords: 'Nitrogen dilution curve' & 'Nitrogen nutrition index' & 'Critical nitrogen concentration' & the name of each crop. The search was performed in the scientific databases 'ScienceDirect' , 'Scopus' , and 'Science Citation Index (Web of Science)' . A last update on this revision to conclude the dataset and include any potential new studies was executed during January 2022. The overall selection process was conducted using the R package revtools 19 . A total of 1612 potentially critical papers were identified by screening the title of the paper, and the following steps on the selection process are highlighted in Fig. 1. The first step filtered papers by excluding studies that did not have relevant keywords and abstract information, resulting in the exclusion of 1124 papers. As the second step, manuscripts not reporting data of W and %N for different sampling times (corresponding to different vegetative stages of fields crops) were excluded; a total of 320 papers were removed at this step. At the third step, relevant studies were selected using the following criteria: i) at least 3 or more sampling dates during the vegetative period are recommended to explore a large variation on both plant W and %Nact to achieve a reasonable level of uncertainty for the critical N dilution curve (Fig. 2), ii) reporting of plant W and %Nact, iii) at least 3 or more fertilizer N rate levels, iv) reporting of crop, location, and year of experiment. In each study, three or even more N fertilizer rates are required to discriminate the non-limiting from the limiting N data, and then to determine the maximum plant W (Wmax) achieved at each sampling date. Data were visually inspected to verify that at least one biomass could be considered obtained under non-limiting nitrogen conditions on each observation date. This step helps to assure that from the studies included within a crop, a plateau of plant W has been achieved reducing the uncertainty for the estimation of the critical plant %N in the dilution model. In addition, the determination of critical %N-W data point require the fitting of a linear and plateau %N -W model (Fig. 2). To avoid issue related to lack of identifiability (lack of sufficient data to allow the estimation of the linear-plus-plateau model and the determination of the critical %N-W model), it is necessary to have at least three fertilizer N rate to fit this type of model (Fig. 3). Based on these criteria, 132 papers were discarded (Fig. 1). The total number of papers retained from this search were 36, each paper provided data for one crop except for two papers providing data for two crops each and another one for four crops, totalizing 41 entries (further details presented in Table 1).
The data from all those papers were manually extracted (with the main plant traits reported or derived), together with relevant details from each study (e.g., author, year of experimentation, fertilizer N rates, description of treatments). If the data were not available in table format, information was retrieved from figures. Data extraction was assisted using the R package juicr 20 . The main crop species, papers, number of observations and geographical distribution of the data are presented in Fig. 3. Among the 36 papers selected (between 1982 and 2021), 33 report data for one crop species, 2 papers report data on 2 crops, and one study provides data on

Data Records
The data are accessible on the figshare repository 21 , available at https://doi.org/10.6084/m9.figshare.19105049. v1, and which includes the following files: 1. "NNI_Database.csv" includes the data. 2. "Summary of the database.docx", includes a summary of the dataset (meta-data), defining each column, trait collected in the data and the units for each variable.     Table 1 describes the main characteristics of the 41 selected studies, including a specific identification number for each study by crop (ID, from 1 to 41 total), species, country for the study location, author, experimental design of the field trial, years where the study was carried out, number and N fertilizer rates, information on crop material, number of total observations, and relevant keyword for the study.
Data of plant %N and W are presented in Fig. 4 for all 19 species across sampling times starting at early vegetative growth stages. The W-%N relationships presented for different crops reveal contrasted plant N status (Fig. 5).
Observed values of W and %N are typically used to fit critical N dilution curves. Fitted curves can help to delineate situations of luxury (excess of N), sufficiency, and deficient (lack of N) plant N status. A recent review of critical N curves obtained for maize crop 18 reported negligible differences across studies. This result revealed that it is more relevant to fit generic critical N dilution curves from a large set of studies covering different environments rather than fitting individual curves to local data. Below, we show that our dataset can be used to establish more universal critical N dilution curve than those generally parametrized from local data.

technical Validation
To demonstrate the practical value of the dataset, data collected can be used to parametrize a generic critical N dilution curve for the given field crop. After the step on checking for outliers, the field crop W and %N data is used to fit a critical N dilution curve using the Bayesian modeling approach proposed by Makowski et al. 22 . The fitted curve for the field crop can be then compared with a reference curve available in the scientific literature. Lastly as the final part of the second step, an independent dataset can be utilized to assess the plausibility of the NNI values derived from the fitted curve compared to the NNI values derived from the reference critical N dilution model established in the literature. Details of all steps are provided below.
Step 1 -Technical validation of the range of biomass and N%. This first step is relevant for inspecting the data and detecting potential outliers. Errors of data extraction were eliminated by comparing the extracted data with tables and figures of the original manuscripts. For each crop and paper, the data were inspected for outliers (by study and sampling time) based on the interquartile range (IQR) rule detection method 23 and with boxplots, as summarized in the Supplementary material. Any observation with W or %N beyond the threshold of 1.5 difference for the IQR to third quartile was pinpointed as a potential outlier for each trait documented for those studies (Fig. 5).
Step 2-Fitting a critical N dilution curve for maize. For this step, a case study was established for maize field crop utilizing a subset of the plant W-%N data from the 11 studies on this crop analyzed using a Bayesian model for estimating coefficients of the critical N dilution curve (Fig. 7). A standard equation was used to relate %Nc to W, specifically %N c = A1 × W −A2 , where A1 and A2 are two parameters [9][10][11][12]18,22 . The modeling and validation of the critical N dilution curve were performed into four phases: (i) A pre-processing of the data was conducted following the approach used by Plénet and Lemaire 12 for maize crop. First, the data were filtered to include plant W above 1 Mg ha −1 . In addition, dates of measures were selected to include maize vegetative and reproductive growth until silking plus 25 days (or approximately until milk stage, R3 growth stage). (ii) A Bayesian hierarchical model was fitted to the data following the procedure defined by Makowski et al. 22 .
A Markov chain Monte Carlo algorithm (MCMC) was implemented using the R package rjags 24 . The algorithm was first run with three chains of 50,000 iterations each. Convergence was achieved approximately near 50,000 iterations according to visual inspection of the trace plots and Gelman-Rubin diagnosis. These first 50,000 samples were discarded as "burn-in", and the algorithm was run again during 100,000 additional iterations to determine the posterior medians and 95% credible intervals of A1 and A2 for maize crop. (iii) The results obtained were used to compute the posterior distribution for the critical N dilution curve. The critical N dilution for maize crop was estimated from 1 Mg ha −1 to the maximum value of W in the dataset. Median and 95% credible intervals were determined and plotted against the reference curve established for maize crop by Plénet and Lemaire 12 (Fig. 7A). (iv) The critical N% determined using the Bayesian procedure on this data was compared against the corresponding values estimated by the reference curve for maize 12 . This reference N dilution curve was derived from five studies located in France. To avoid any bias due to the use of the same data for fitting and testing, an independent dataset of four maize studies [25][26][27] was used to compute NNI using the two fitted critical N dilution curves (Bayesian N curve fitted to our global dataset vs. the traditional model for maize 12 ). Error metrics describing the agreement between NNI values were computed using the metrica package 28 . The root mean square error (RMSE) and mean absolute error (MAE) are standard measures of prediction accuracy, quantifying the average magnitude of the errors in the predictions. The concordance correlation coefficient (CCC) is a measure of both accuracy and precision of the model, quantifying the agreement between an estimated and a reference value. From these metrics, the CCC reflected a strong agreement between both models (CCC = 0.96) and both the RMSE (RMSE = 0.053) and MAE (MAE = 0.046) confirmed that the critical N dilution curve developed with this dataset is very similar to the well-established reference curve 12 available for maize (Fig. 7B). Lastly, the lack of departure of the two critical N dilution curves from the 1:1 line also confirm that these two models derived in similar NNI values. Thus, these   www.nature.com/scientificdata www.nature.com/scientificdata/ critical curves are not statistically different, confirming the past findings for maize crop from Ciampitti et al. 18 . Likewise, Fernandez et al. 15 reported a universal critical N dilution curve for 14 environments and N-fertilized conditions for tall fescue. A similar method could be used to fit critical N dilution curves for other species from our data.

Usage Notes
The current data set can be used to develop critical N dilution curves for the diagnostic of N nutritional status of wide range of crop species in different environments and N-fertilized conditions. Specifically, our dataset can be used to fit critical N dilution curves, assess their uncertainties, and determine the statistical significance of the difference between two critical N dilution curves.
In addition, the current data set could be used for testing the universality of critical N dilution curves among crop species. For example, when comparing the N dilution curves of multiple grasses (annual, hybrid and perennial ryegrass, oat, rescue grass, timothy grass, and wheat crop), we found that the parameters of the model (A1 and A2) did not differ significantly (Fig. 8). For the different species, the N dilution models were benchmarked with the critical N dilution curves established in previous studies. Results show that the curves were relatively similar for all species considered. Results also reveal the uncertainty (reflected as the length of the 95% credibility interval) is higher for the A2 parameter than for the A1 parameter, except wheat crop. The level of uncertainty depends on the number of observations within a study and on the total number of studies for a crop. When the number of data is small, the determination of the critical N curve can produce estimates with large uncertainty (wide credibility intervals).
In the future, our dataset could be easily updated with data generated by new studies and, also, with previous studies where data were not available. Studies to be incorporated in this global dataset may be sent to the corresponding author (IAC, ciampitti@ksu.edu). The goal of our global initiative is expand the dataset by including more data and involve more collaborators. The ambition is the make the largest possible amount of W and %N data available and stimulate the development of reliable critical N dilution curves. In addition to %N, this database could be expanded in near future to cover other nutrients such as phosphorus, sulfur, and potassium, and their interaction with other environmental factors such water stress. Such a collaborative approach may set a milestone in plant physiology and nutrition from which more universal and reliable models could be developed to improve fertilizer management practices and reduce their environmental footprint.

Code availability
Scripts using R programming language are provided to produce figures. Additional code and related files are available at figshare repository 21 : https://doi.org/10.6084/m9.figshare.19105049.v1.