Emissions of pesticides in the European Union: a new regional-level dataset

We present a European Union (EU)-wide dataset of estimated quantities of active substances of plant protection products applied to crops (also called "emissions"). Our estimates are derived from data reported by eight EU countries and extrapolated to all EU regions using regression models that consider both climate and agricultural land use data. This allows us to represent pesticide use spatially at NUTS Level 3 of the European statistical mapping units, and within various agricultural land cover classes in each region. We compare our estimates with aggregated data provided by EUROSTAT and with independent, detailed data for the United Kingdom, and find an error typically within one order of magnitude. Our estimates can provide insights into the distribution and patterns of pesticide use in the EU around the year 2015. The estimates are most reliable for Western and Southern Europe. Outside these regions, data scarcity makes extrapolation more uncertain, potentially limiting the ability to accurately depict regional variations in pesticide use.


Supplementary Material 1 Table of contents
NUTS (Nomenclature of Territorial Units for Statistics) is a geocode standard used by the European Union to divide up the territories of its member countries. The NUTS classification is hierarchical and divides countries into larger regions, smaller regions, and finally, local administrative units. NUTS Level 3 refers to the third level in this hierarchy, which is the level at which relatively small regions are defined within a country. These regions are designed to be homogeneous with respect to certain socio-economic factors and are often used for statistical purposes. NUTS Level 3 units are used as a way of aggregating pesticide use data according to small geographic regions within European countries. This allows for a more detailed and nuanced analysis of pesticide use patterns than would be possible using large administrative units. The average area of a NUTS Level 3 unit can vary significantly between European countries, as it depends on factors such as population density and geographic size.

Spatial model of emission data from selected countries
As already mentioned, one of the focal points of this analysis is to retrieve data on the use or sales of individual pesticides for a number of European countries at a reasonable spatial scale. A workshop with experts from European countries was therefore organized at the beginning of 2019 with this main objective (Galimberti, F., Dorati, C., Udias, A., Pistocchi, A., 2020), and the gathered data are listed in Table SI 1 (Data availability). The spatial and temporal resolution of the data varies markedly from country to country, so a harmonization process was necessary to create a single dataset following a robust template. In addition, several other technical aspects were taken into account: pesticide identifiers and names in different languages; data reported as actual use versus sales; data reported for active substances versus pesticide products; and the specifics of application on crops and crop coverage (a country's main characteristic crops versus all crops).
The provided data refer to different years and time spans. Where possible, data were harmonized by averaging over the available years to obtain a hypothetical average year of pesticide application.
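The averaging step can be illustrated with a minimal Python sketch. The record layout, the country labels and the quantities below are hypothetical placeholders rather than values from the collected data, and the sketch assumes a plain arithmetic mean over the reported years.

```python
from collections import defaultdict
from statistics import mean


def average_year(records):
    """Collapse multi-year records into one hypothetical average year.

    records: iterable of (country, substance, year, kg) tuples.
    This layout is an assumption made for illustration only.
    Returns {(country, substance): mean kg over the reported years}.
    """
    grouped = defaultdict(list)
    for country, substance, _year, kg in records:
        grouped[(country, substance)].append(kg)
    return {key: mean(values) for key, values in grouped.items()}


# Made-up example quantities (kg); not taken from the dataset.
example_records = [
    ("country_A", "glyphosate", 2014, 100.0),
    ("country_A", "glyphosate", 2015, 120.0),
    ("country_A", "glyphosate", 2016, 110.0),
    ("country_B", "glyphosate", 2015, 80.0),
]
averaged = average_year(example_records)
```

A country that reported a single year simply keeps that year's value, which is consistent with averaging only "where possible".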
Because the data differ in spatial detail (the NUTS level they refer to) and in crop detail (different crop groups and different levels of detail on single crops), a specific analysis was performed to harmonize all the provided agricultural information to a single template: a combination of the Eurostat NUTS2 agricultural dataset (https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=ef_lus_allcrops&lang=en) and Corine Land Cover at NUTS3 was used as the base spatial template (Galimberti, F., Dorati, C., Udias, A., Pistocchi, A., 2020).
Actual use and sales data were treated as equivalent and handled in the same way. All retrieved data were converted to pesticide active substance: where data were expressed as pesticide products, the effective kilograms were calculated from the percentage content of the active ingredient(s) in each product. The workshop experts also provided information on the authorized uses of pesticides on crops or groups of crops, which made it possible to allocate pesticide amounts to crop areas. National and regional pesticide loads were spatialized according to the percentage distribution of crops within each NUTS3 region.
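The proportional allocation of a national or regional load to NUTS3 units can be sketched as follows. This is only an illustration under stated assumptions: the function name, the region labels and the areas are hypothetical, and the sketch assumes that the load of one active substance on one crop group is split in proportion to the area of that crop group in each NUTS3 region.

```python
def spatialize_load(load_kg, crop_area_by_region):
    """Split a load (kg) across regions in proportion to crop area.

    load_kg: total kg of one active substance applied to one crop group.
    crop_area_by_region: {region_id: area of that crop group}; any
        consistent area unit works because only the shares matter.
    Both arguments are hypothetical inputs used for illustration.
    """
    total_area = sum(crop_area_by_region.values())
    if total_area <= 0:
        raise ValueError("crop area must be positive to allocate a load")
    return {
        region: load_kg * area / total_area
        for region, area in crop_area_by_region.items()
    }


# Made-up areas for three hypothetical NUTS3 regions.
allocation = spatialize_load(
    1000.0, {"R1": 200.0, "R2": 300.0, "R3": 500.0}
)
```

By construction the regional values sum to the original load, so the allocation redistributes the reported quantity without creating or losing mass.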
The resulting values were stored in a dataset expressed in kg of each active substance, per crop, by NUTS3 region.

Crop codes in each Corine Land Classes
Table SI 2 shows which EUROSTAT crop categories are associated with each of the six land cover classes.
https://www.eppo.int/ACTIVITIES/plant_protection_products/registered_products
An example of a product label for France is CAMIX, which can be viewed at the following address: https://ephy.anses.fr/ppp/camix
CAMIX has an S-metolachlor content of 400 g/L of product and also contains other active substances: benoxacor (20 g/L) and mesotrione (40 g/L).
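Using the CAMIX label contents quoted above, the conversion from a product volume to kilograms of each active substance can be sketched as below. The function name and the product volume of 250 L are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def active_substance_kg(product_litres, content_g_per_l):
    """Convert a product volume into kg of each active substance.

    product_litres: volume of formulated product (hypothetical input).
    content_g_per_l: {substance: grams of substance per litre of product}.
    """
    return {
        substance: product_litres * grams_per_litre / 1000.0
        for substance, grams_per_litre in content_g_per_l.items()
    }


# Contents as stated for CAMIX in the text (g per litre of product).
camix_content = {"S-metolachlor": 400, "benoxacor": 20, "mesotrione": 40}

# 250 L is an arbitrary illustrative volume, not a reported quantity.
camix_kg = active_substance_kg(250, camix_content)
```

For 250 L of product this gives 100 kg of S-metolachlor, 5 kg of benoxacor and 10 kg of mesotrione.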

Fig. SI 1: Boxplots by AS category showing the statistical distribution of model prediction quality based on the R2 (coefficient of determination) metric. The data reflect the results of the 10-fold cross-validation process for the best model for each AS, applied to the arable class. For the R2 metric, a result closer to 1 indicates a better outcome.

Fig. SI 2: Boxplots by AS category showing the statistical distribution of model prediction quality based on the NRMSE (normalized root mean square error) metric. The data reflect the results of the 10-fold cross-validation process for the best model for each AS, applied to the arable class. For the NRMSE metric, a result closer to zero indicates a better outcome.

Fig. SI 3: Boxplots by AS category showing the statistical distribution of model prediction quality based on the SMAPE (symmetric mean absolute percentage error) metric. The data reflect the results of the 10-fold cross-validation process for the best model for each AS, applied to the arable class. For the SMAPE metric, a result closer to zero indicates a better outcome.

Fig. SI 4: Boxplots by AS category showing the statistical distribution of model prediction quality based on the NMAE (normalized mean absolute error) metric. The data reflect the results of the 10-fold cross-validation process for the best model for each AS, applied to the arable class. For the NMAE metric, a result closer to zero indicates a better outcome.

Fig. SI 5: Boxplots by AS category showing the statistical distribution of model prediction quality based on the NSE (Nash-Sutcliffe model efficiency coefficient) metric. The data reflect the results of the 10-fold cross-validation process for the best model for each AS, applied to the arable class. For the NSE metric, a result closer to one indicates a better outcome.
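For readers who wish to reproduce comparable scores, the five metrics named in the figure captions can be sketched in Python as follows. The exact definitions used for the figures are not stated here, so several choices below are assumptions: R2 is taken as the squared Pearson correlation, NRMSE and NMAE are normalized by the range of the observations, and SMAPE uses the mean of the absolute observed and predicted values as the denominator. Only the NSE formula follows its standard definition.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the mean of obs."""
    m = _mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r2(obs, pred):
    """Squared Pearson correlation (assumed definition of R2)."""
    mo, mp = _mean(obs), _mean(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)


def nrmse(obs, pred):
    """RMSE divided by the range of obs (assumed normalization)."""
    rmse = math.sqrt(_mean([(o - p) ** 2 for o, p in zip(obs, pred)]))
    return rmse / (max(obs) - min(obs))


def nmae(obs, pred):
    """MAE divided by the range of obs (assumed normalization)."""
    mae = _mean([abs(o - p) for o, p in zip(obs, pred)])
    return mae / (max(obs) - min(obs))


def smape(obs, pred):
    """Symmetric MAPE on a 0 to 2 scale (assumed variant)."""
    terms = []
    for o, p in zip(obs, pred):
        denom = (abs(o) + abs(p)) / 2.0
        # A pair where both values are zero counts as zero error.
        terms.append(0.0 if denom == 0 else abs(o - p) / denom)
    return _mean(terms)


# Made-up values used only to exercise the functions.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = list(observed)
mean_only = [2.5, 2.5, 2.5, 2.5]
```

With these definitions a perfect prediction gives R2 = 1, NSE = 1 and NRMSE = NMAE = SMAPE = 0, while predicting the mean of the observations gives NSE = 0. Since Table SI 3 reports metrics on logarithmically transformed values, the same functions could be applied after taking a logarithm of both series.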

Table SI 1: Data availability

Table SI 2: CLC codes (corresponding to crops) included in each main agricultural class according to EUROSTAT, grouped into 6 main Corine Land Classes

Table SI 3: Goodness-of-fit metrics, namely NRMSE (Normalized Root Mean Square Error) and NSE (Nash-Sutcliffe Efficiency), for models M1 through M6 with respect to five active substances (AS): Glyphosate, Tebuconazole, Dimethomorph, Thiacloprid, and Esfenvalerate. The values of the goodness-of-fit metrics were derived from logarithmically transformed reference and modeled values.

Statistical distribution of the result for the cross-validation process according to different metrics