Background & Summary

With nearly 90% of flowering plant species benefiting from the services of pollinators to set seed and produce fruit, pollinators are an essential component of healthy and diverse ecosystems and contribute significantly to food production1,2,3,4. However, populations of both wild and managed pollinators are facing serious challenges5. Population declines have been documented in several bee and butterfly species6,7,8, including the Eastern population of the monarch butterfly (Danaus plexippus), as indicated by strong reductions in overwintering colony size since the mid-1990s9. U.S. beekeepers lose around a third of their managed honey bee colonies each year10. The causes of pollinator declines are multifaceted and somewhat distinct for different taxa, but current evidence suggests that wild bees, honey bees, and butterflies share at least two key stressors: habitat loss and pesticide exposure5,11,12. Habitat loss limits the food and nesting resources available to support pollinator populations, while exposure to pesticides can kill pollinators outright or lead to sublethal effects on behavior, immunity, and reproduction5,11,12,13,14. Furthermore, use of herbicides may influence pollinators indirectly by reducing the availability of their food plants15.

In the past decade, researchers have made significant progress in developing models to predict pollinator abundance and ecosystem services as a function of the landscape. For wild bees, the ‘Lonsdorf model’ translates land cover to abundance of nest sites and seasonal floral resources (predicted based on expert opinion), and combines this with flight ranges to derive indices of bee abundance and pollination service on each cell on a landscape16,17; the model has been adapted for honey bees as well18. For monarch butterflies, researchers recently developed a spatial model that simulates the annual cycle of the Eastern monarch population, identifying regions where conservation actions could enhance monarch population stability19. Despite the significant value of current pollinator models based on resource availability, they could be improved by incorporating patterns of pesticide use.

There have been three main obstacles to incorporating pesticide use into landscape-scale research on pollinator health. First, although the U.S. has a substantial amount of public data on pesticide use, pesticide toxicity, and land use, these data are distributed across disparate government databases, each with idiosyncratic nomenclature and organization. Second, the mosaic of pesticide use data that are available are reported mainly at scales of counties, states, or national averages. In contrast, pollinator populations are structured at smaller spatial scales; for example bee foraging ranges are typically hundreds to thousands of meters20. Finally, there are hundreds of common pesticide active ingredients that vary by many orders of magnitude in their toxicity to pollinators13. Transforming pesticide use into relevant units of toxicity can help to evaluate aggregate effects21,22,23,24.

Here, we synthesize several large, publicly available datasets on pesticide use, land use, and insecticide toxicity to generate novel datasets describing pesticide use patterns. In a previous study21, we generated county-level estimates for total bee toxic load for insecticides across the contiguous U.S. Here, we produce pesticide datasets by active ingredient (in kg and kg/ha, 1997–2017) and aggregate insecticide loading (in kg, kg/ha, honey bee lethal doses, and lethal doses/ha, 1997–2014) for state-crop-year combinations in the contiguous U.S. Since crop type explains much of the variation in pesticide use25,26,27,28, we suggest ‘downscaling’ pesticide use by matching state-level, crop- and year-specific pesticide use averages to land use estimated through remote sensing. We describe a method to map our pesticide use estimates to existing land use data. Through this approach, it is possible to generate maps of predicted bee toxic load and individual active ingredients at 30 m resolution, a finer spatial grain than reported in our county-level analysis, and one more suited to landscape research on pollinator populations and communities. This methodology can readily be adapted to include more refined data that may be locally available or become available in the future, such as toxicity data for other pollinator species or site-specific information about pesticide use. Moreover, while this work is motivated primarily by the effects of pesticides on pollinators, the estimates and mapping method we describe have potential application in a wide array of settings ranging from water quality monitoring to human epidemiology.

Methods

Overall strategy

The aim of this project was to synthesize publicly available data on land use, pesticide use, and toxicity to generate a ‘toolkit’ of data resources enabling improved landscape-scale research on pesticide-pollinator interactions. The main outcomes are several novel datasets covering ten major crops or crop groups in each of the 48 contiguous U.S. states:

  1. I)

    Average application rate (kg/ha/yr) of >500 common pesticide active ingredients (1997–2017),

  2. II)

    Aggregate bee toxic load (honey bee lethal doses/ha/yr) of all insecticides combined (1997–2014), (Note that this dataset ends in 2014 because after that year, data on seed-applied pesticides were excluded29, and these contribute significantly to bee toxic load21)

  3. III)

    Reclass tables relating these pesticide-use indicators to land use/land cover classes to enable the creation of maps predicting annual pesticide loading at 30–56 m resolution.

An overview of the steps, inputs, and outcomes are provided in Fig. 1.

Fig. 1
figure 1

Overview of the data synthesis workflow described in this paper.

Data inputs

A summary of input datasets is provided in Table 1.

Table 1 Data inputs used in this study.

Pesticide data

Pesticide use data were last downloaded from the USGS National Pesticide Synthesis Project30,31 in June 2020. This dataset reports total kg applied of 508 common pesticide active ingredients by combinations of state, crop group, and year for the contiguous U.S. from 1992–2017 (crop groups explained in Table 2). The data are derived primarily from farmer surveys conducted by a private firm (Kynetec). For California, USGS obtains data from the state’s pesticide use reporting program32. USGS then aggregates and standardizes both data sources into a common national dataset that is released to the public and was used in this effort. The USGS dataset includes both a ‘high’ and a ‘low’ estimate of pesticide use, varying based on the treatment of missing values in the source data31. Because previous work on this dataset suggested that the ‘low’ estimate more closely matches independent pesticide estimates33, we used the ‘low’ estimate throughout, but assess the influence of this choice on the resulting estimates (see Technical Validation). While we focus on the ‘low’ estimate for the data and outputs presented in this manuscript, the workflow we developed can accommodate both the low and high estimates.

Table 2 USGS crop categories in pesticide source data, based on metadata from USGS30,31 and personal communication with USGS staff scientists.

Crop area data

To translate pesticide use estimates into average application rates, it was necessary to divide total kg of pesticide applied by the land area to which it was potentially applied. Crop area data were last downloaded from the Quick Stats Database of the USDA34 in May 2020, using data files downloaded from the ‘developer’ page. This USDA dataset contains crop acreage estimates generated from two sources: the Census of Agriculture (Census), which is comprehensive but conducted only once every five years35 and the crop survey conducted by the National Agricultural Statistics Service (NASS), which is an annual survey based on a representative sample of farmers in major production regions for a more limited subset of crops36.

Honey bee toxicity data

Translating insecticide application rates into estimates of bee toxic load (honey bee lethal doses/ha/yr) required toxicity values for each insecticide active ingredient in the USGS dataset. We used LD50 values for the honey bee (Apis mellifera) because this is the standard terrestrial insect species used in regulatory procedures, and so has the most comprehensive data available. This species is also of particular concern as an important provider of pollination services to agriculture. As previously reported21, the LD50 values were derived from two sources, the ECOTOX database37 of the U.S. Environmental Protection Agency (US-EPA), and the Pesticide Properties Database (PPDB)038. ECOTOX was queried in July 2017, by searching for all LD50 values for the honey bee (Apis mellifera) that were generated under laboratory conditions. Acute contact and oral LD50 values for the honey bee were recorded manually from the PPDB in June 2018.

Land cover data

Mapping pesticides to the landscape requires land use/land cover data indicating where crops are grown. We used the USDA Cropland Data Layer (CDL)39, a land cover dataset at 30–56 m resolution produced through remote sensing. This dataset is available starting in 2008 for states in the contiguous U.S., with some states (primarily in the Midwest and Mid-South) available back to the early 2000s.

Data preparation

Relating datasets

A major challenge in this data synthesis effort was relating the various data sources to each other, given that each dataset has unique nomenclature and organization. We created the following keys (summarized in Table 3) to facilitate joining datasets:

  1. I)

    USGS-USDA crop keys – Using documentation and metadata associated with the USGS pesticide dataset31,33,40, we created keys relating the USGS surveyed crop names (‘ePest’ crops) and the ten USGS crop categories to the large number of corresponding crop acreage data items in the Census and NASS datasets. For annual crops and hay crops we used ‘harvested acres,’ and for tree crops we used ‘acres bearing & non-bearing.’ These choices were made to maximize data availability and to correspond as closely as possible to the crop acreage from which the pesticide data were derived31. A separate key was developed for California because California pesticide data derives from different source data and covers a larger range of crops.

  2. II)

    USGS-CASRN compound key – Using USGS documentation as well as background information on pesticide active ingredients38,41, we generated keys relating USGS active ingredient names to chemical abstracts service (CAS) registry numbers to facilitate matching compounds to the ECOTOX and PPDB databases.

  3. III)

    USGS compound-category key – In this key we classified active ingredients into major groups (insecticides, fungicides, nematicides, etc.) and into mode-of-action classes on the basis of information from pesticide databases and resistance action committees38,41,42,43,44.

  4. IV)

    USGS-USDA compound key – To facilitate our data validation effort, we generated a key relating USGS compound names to USDA compound names, on the basis of information from several pesticide databases38,41.

  5. V)

    USGS-CDL land use-land cover keys – Using documentation from the USGS pesticide dataset describing the crop composition of each of the ten crop categories31, we created a key that matches these categories to land cover classes in the CDL. A separate key was developed for California given the differences in surveyed crops in this state, noted above.

Table 3 Keys generated to relate datasets.

Processing crop area data

Because of differences in the crops included in pesticide use estimates, crop acreage data were processed separately for California and for all other states, and then re-joined, as follows: Acreage data were first filtered to include only data at the state level, reporting total annual acreage for states in the contiguous U.S. after 1996. Acreage data were joined to the appropriate USGS-USDA crop key and only those crops represented in the pesticide dataset were retained. We then generated an acreage dataset with single rows for each combination of crop, state, and year using data from the Census when available (1997, 2002, 2007, 2012, 2017), data from NASS in non-Census years, and temporal interpolation to fill in remaining missing values (i.e. linear interpolation between values in the same state and crop in the nearest surrounding years). This process was repeated for California, using acreage data for only that state in combination with the CA crop key. Finally, acreage data in the two datasets were recombined, converted to hectares, and summed by USGS crop group.

Processing honey bee toxicity data

Processing for the honey bee toxicity data has been described in detail elsewhere21. Briefly, toxicity values were categorized as contact, oral, or other and standardized where possible into µg/bee. Records were retained if they represented acute exposure (4 days or less) for adult bees representing contact or oral LD50 values in µg/bee. To generate a consensus list of contact and oral LD50 values for all insecticides reported in the USGS dataset, we gave preference to point estimates and estimates generated through U.S. or E.U. regulatory procedures, taking a geometric mean if multiple such estimates were available. Unbounded estimates (“greater than” or “less than” some value) were only used when point estimates were unavailable, using the minimum (for “less than”) or the maximum (for “greater than”). If values for a compound were unavailable in both datasets, we used the median toxicity value for the insecticide mode-of-action group. And finally, in rare cases (n = 1/148 compounds for contact toxicity and 8/148 compounds for oral toxicity) we were still left without a toxicity estimate for a particular insecticide. In those cases, we used the median value for all insecticides.

Data synthesis

Compound-specific application rates for state-crop-year combinations

USGS data on pesticide application were joined to data on crop area. Average pesticide application rates were calculated by dividing kg applied by crop area (ha) for each combination of compound, crop group, state, and year.

Aggregate insecticide application rates for state-crop-year combinations

The dataset from the previous step was filtered to include only insecticides, and then joined to LD50 data by compound name. Bee toxic load associated with each insecticide active ingredient was calculated by dividing the application rate by the contact or oral LD50 value (µg/bee) to generate a number of lethal doses applied per unit area. These values were then summed across compounds to generate estimates of kg and bee toxic load per ha for combinations of crop group, state, and year.

Missing values were estimated using temporal interpolation, where possible (i.e. linear interpolation between values in the same state and crop group in the nearest surrounding years). This dataset ends in 2014 because after that year seed-applied pesticides were excluded from the source data29, and they constitute a major contribution to bee toxic load21.

We focused bee toxic load on insecticides for three reasons. First, quality of LD50 data is highest for insecticides and uneven for fungicides and herbicides. Point estimates make up the majority of LD50 values for insecticides, whereas < 25% of herbicide and fungicide LD50 values are represented by a point estimate (i.e. a majority of these compounds have a best estimate of the form “ > 100 µg/bee”, increasing the uncertainty of downstream estimates). Second, insecticides tend to have greater acute toxicity toward insects than fungicides and herbicides (median [IQR] LD50 = 100 [44–129] µg/bee for fungicides, 100 [75–112] µg/bee for herbicides, and 1.36 [0.16–12] µg/bee for insecticides). As a result, insecticides account for > 95% of bee toxic load nationally, even when herbicides and fungicides are included (and even though insecticides make up only 6.5% of pesticides applied on a weight basis). Third, focusing these values on insecticides increases their interpretability, reflecting efforts directed toward insect pest management, rather than a mix of insect, weed, and fungal pest management (which often have distinct dynamics and constraints for farmers).

While we chose to include only insecticides in this aggregate value, users are welcome to adjust the workflow to include fungicides and herbicides if desired. To this end, we provide our best estimates for LD50 values for fungicides and herbicides in the USGS dataset (Table 4).

Table 4 Data outputs generated by this study.

Reclassification tables

To generate reclassification tables for the CDL, the pesticide datasets described above were joined by crop group to CDL land use categories. The output of these processes was a set of reclassification tables for combinations of compound, state, and year. Also generated was a set of reclassification tables for aggregate insecticide use for combinations of state and year.

Of the 131 land use categories in the CDL, 16 represent two crops grown sequentially in the same year (double crops, found on ~2% of U.S. cropland in 201245), which required a modified accounting in our workflow. Pesticide use practices on double crops are not well described, but one study suggested that pesticide expenditures on soybean grown after wheat were similar to pesticide expenditures in soybean grown alone46. Therefore, we assumed that pesticide use on double crops would be additive (e.g. for a wheat-soybean double crop, the annual pesticide use estimate was generated by summing pesticide use associated with wheat and soybean).

Missing values in the reclassification tables resulted from several distinct issues. Some values were missing because a particular crop was not included in the underlying pesticide use survey (e.g. oats was not included in the Kynetec survey), or because the land use category was not a crop at all (e.g. deciduous forest). These two issues were indicated with values of ‘1’ in columns called ‘unsurveyed’ and ‘noncrop,’ respectively. For double crops, a value of 0.5 in the ‘unsurveyed’ column indicates that one of the crops was surveyed and the other was not. For compound-specific datasets, missing values may reflect that a given compound was not used in a state-crop group-year combination. For the aggregate insecticide dataset, even after interpolation there were some missing values, usually when a state had very little area of a particular crop or crop group.

Finally, missing data for double crops were treated slightly differently in the aggregate vs. compound-specific reclassification tables. For the aggregate insecticide dataset, estimates for double crops were only included if estimates were available for both crops; otherwise the value was reported as missing. For the compound-specific datasets, estimates for double crops were included if there was an estimate for at least one of the crops, since specific compounds may be used in one crop but not another.

Data Records

All data files generated in this work are available in .csv format on figshare47. A summary of output data files is provided in Table 4.

Technical Validation

Validity of pesticide use estimates

To assess the validity of the dataset, we compared our estimates for specific compounds to those available from the USDA’s Agricultural Chemical Use Program48 for the period 1997–2017. While this dataset is not as comprehensive as the USGS dataset in terms of its coverage of states, years, crops, and active ingredients, the values that are available can be compared to USGS values for total kg applied and estimates of kg/ha/yr. An important limitation of this comparison is that the USDA dataset does not include seed-applied pesticides, affecting the estimation of certain fungicide and insecticide active ingredients29.

For combinations of state, crop, and active ingredient, the USDA survey reports several values that are relevant to this effort: total amount (kg) of pesticide applied, percent area treated, and application rate (kg/ha) on treated hectares. Application rates across the two datasets are not directly comparable because USDA reports the rate (kg/ac) applied on treated acres, whereas our dataset estimates the rate (kg/ha) averaged across all crop area in a state. To test the agreement between our estimates and the USDA estimates we compared:

  1. I)

    Total amount applied (kg),

  2. II)

    Application rate (kg/ha) calculated by dividing the USDA total kg by our estimate of crop area (ha) (Method 1),

  3. III)

    Application rate (kg/ha) derived indirectly by multiplying the USDA average application rate on treated acres by the percent of area treated (Method 2).

We focused on major field crops (corn, soy, wheat, cotton, rice) because they are the most frequently surveyed by USDA and have the closest one-to-one match between datasets. We suspect that estimates for fungicide and insecticide use may be more precise in high-value fruit and vegetable crops, where these pesticides are more widely used25,27,28. However, fruit and vegetable crops could not be included in the validation because estimates are incommensurable across the two datasets: USGS reports pesticide use for crop groups like ‘orchards and grapes’ while USDA reports pesticide use on a small subset of individual crops in major production states (e.g. apples in Washington).

For each set of comparisons, we calculated and visualized relative percent difference between the two estimates, visualized the correlation between estimates, and calculated Spearman’s and Pearson’s correlation coefficients for each major pesticide class (fungicides, herbicides, insecticides).

Estimates for total amount applied and per-ha application rate were strongly correlated for fungicides and herbicides, with Pearson’s correlation coefficients > 0.85 for all comparisons (Table 5, Fig. 2). Estimates for insecticides were well correlated on the basis of ranks (Spearman’s rho = 0.85) but only weakly linearly related (Pearson’s r = 0.20–0.38, Table 5, Fig. 2). This pattern was driven by malathion in cotton, which had very low estimates in the USGS dataset and fairly high estimates in the USDA dataset, for reasons unknown. It is possible that this discrepancy is related to use of malathion in boll weevil eradication49. Once these outliers were removed (n = 30 out of 1600 + observations), estimates for insecticides were well correlated for all three comparisons (Pearson’s r > 0.75).

Table 5 Statistical results from cross-validation comparing USDA pesticide use estimates to the estimates generated in this study from USGS pesticide use data and USDA crop area data.
Fig. 2
figure 2

Correlation between USDA pesticide use estimates and novel estimates generated from USGS data reported in this paper. Dotted black line shows USDA = USGS. For application rate, Method 1 (M1) compared our estimate to one calculated by dividing the USDA total kg by our estimate of crop area (ha), and Method 2 (M2) compared our estimate to one calculated by multiplying the USDA average application rate on treated hectares by the percent of area treated. Each point represents a combination of state, crop, year, and active ingredient. Outliers (malathion in cotton [n = 30] and copper sulfate in rice [n = 2]) were removed before plotting.

Among pesticide categories, the median relative difference between USGS-based datasets and USDA estimates was smallest for herbicides and largest for insecticides (Table 5). For fungicides, the median relative difference did not differ significantly from zero for any of the three comparisons, indicating a lack of consistent over- or under-estimation. For herbicides, median relative difference was not significantly different from zero for total kg or application rate (Method 1). However, it was significantly positive for application rate (Method 2), indicating a tendency for the USGS-based estimate to be slightly larger than the USDA-based estimate (median difference of 3%). For insecticides, median relative difference was significantly less than zero for all comparisons (−10–14%), indicating that the USGS-based estimates are conservative compared to the USDA estimates. Median relative difference decreased slightly when malathion outliers were removed (−8–12%), but were still significantly less than zero. This result may derive from using the USGS ‘low’ pesticide use values (see Sensitivity analyses).

Importantly, relative difference between estimates declined with percent of crop area treated (Fig. 3), suggesting that estimates are more precise for active ingredients that are widely used. This pattern may help to explain why median relative difference between datasets was smallest for herbicides and largest for insecticides, given that field crops formed the basis of the validation. In field crops, herbicide use is very widespread while use of traditional insecticides and fungicides is more limited28. Since the mid-2000’s the most widely applied insecticides in field crops have been neonicotinoid seed treatments29,50, which are excluded from the USDA survey and by extension, this comparison. Previous work50 suggests that USGS estimates of neonicotinoid use in major field crops are consistent with independent estimates.

Fig. 3
figure 3

Relationships between the percent of cropland treated with a pesticide and the relative difference between the USDA pesticide use estimates and novel estimates generated from USGS data reported in this paper. Each point represents a combination of state, crop, year, and active ingredient. Relative difference = [(USGS – USDA)/average of the two values] × 100. Method 1 (M1) compared our estimate to one calculated by dividing the USDA total kg by our estimate of crop area (ha), and Method 2 (M2) compared our estimate to one calculated by multiplying the USDA average application rate on treated hectares by the percent of area treated.

Geographic coverage

To characterize the geographic coverage of our datasets, we calculated the percent of total land area and agricultural area in each state that is included in the underlying pesticide use survey, as well as the percent of agricultural area that is represented by a crop-specific estimate (vs. an estimate for a group of crops such as “vegetables and fruit”). To do this, we downloaded the CDL for all states in even years, calculated the area in each CDL category, joined the data to the CDL-USGS land use/land cover key, and aggregated the area depending on whether it was surveyed, unsurveyed, or non-agricultural. In most states, the survey covers > 95% of agricultural land (> 80% of agricultural land in all states, Fig. 4). States on the lower end of this range have significant area of regionally important yet unsurveyed crops (e.g. blueberries in Maine, cranberries in Massachusetts, grass seed in Oregon). The majority of crop area is associated with crop-specific pesticide estimates, although this is not true for all states (Fig. 4). Double crops contributed < 5% to agricultural area in most states, except North Carolina, Maryland, and Delaware, where they comprised 5–23% of agricultural area. As expected, survey coverage of total land area was variable among states, ranging from < 10% in states dominated by forest (e.g. New Hampshire) or shrubland (e.g. Nevada), to > 80% in states with abundant cropland (e.g. Iowa).

Fig. 4
figure 4

Pesticide survey coverage for the 48 contiguous states in even years from 2008–2016. (left) Percent of total land area represented by surveyed land uses, (center) percent of cropland represented by surveyed land uses (where ‘surveyed’ means included in the underlying pesticide use survey), and (right) percent of cropland (excluding pasture/hay) represented by a crop-specific estimate in the underlying pesticide use survey (crop-specific categories included corn, cotton, soybeans, alfalfa, wheat, and rice).

Sensitivity analyses

We conducted several additional analyses to quantify the sensitivity of our results to the methods employed in generating the dataset. To test the influence of using the low vs. the high USGS pesticide estimate, we regenerated the dataset using the high estimate and compared validation metrics. Consistent with previous analyses33, the high estimate from USGS tended to exceed the USDA value, with median relative differences ~30% for fungicides, 10–12% for herbicides, and 6–10% for insecticides. For all active ingredients combined, the median relative differences were 10–12% for the high estimate, versus -0.1-1% for the low estimate. The low estimate was therefore closer to the USDA value except in the case of insecticides, in which the low estimate underestimated the USDA value to a slightly greater degree than the high estimate overshot. However, correlations between USDA and USGS estimates were similarly strong regardless of whether the high or low estimate was used (Pearson’s r > 0.75 for all comparisons), as was the correlation between high and low USGS estimates themselves (Pearson’s r = 0.95). Both findings suggest that relative patterns in the data are robust.

Another concern related to the low vs. high USGS pesticide estimate is that missing data are more prevalent in the low dataset. Across the entire compound-specific dataset, 17% of crop-state-year-compound combinations that are present in the high dataset are absent in the low dataset (see https://land-4-bees.github.io/bee_tox_index/validation_low_v_high.html). Missing data are most prevalent in alfalfa and pasture/hay (37% and 33%, respectively), land uses with generally low pesticide use intensity26,51. They are least prevalent in vegetables and fruit (7%), and orchards and grapes (8%), high-value crop types that tend to have high pesticide use intensity26,27,52,53. It is important to recognize that missing data may be ‘real’, i.e. reflect combinations of crops, states, and years in which a given pesticide was not used. There is no simple way to know whether this is the case or that the data are missing due to limitations in the methodology generating the estimates (see further discussion from USGS31). Given that the low estimate better matches independent estimates, we decided to focus our analyses on the low dataset. That said, the workflow we provide can be adapted to generate estimates from the high values or interpolate to fill missing values if that is desired (see code at https://land-4-bees.github.io/bee_tox_index).

To determine the potential uncertainty introduced by interpolating crop area when it was missing, we investigated the amount of total area contributed by interpolated values across the dataset. Across all years combined, interpolated area was a small percentage of most crops or crop groups (<4% of total area), with the exceptions of ‘orchards and grapes’ (76%), ‘pasture and hay’ (76%), and ‘vegetables and fruit’ (36%). Estimates for land area in these groups relied heavily on interpolation between Census years. We are confident in the interpolated values because they are mostly for perennial crops, so it is less likely that the acreage would change substantially in less than 5 years. That said, for the most accurate estimates of crop area, we recommend using the pesticide datasets matching Census years (1997, 2002, 2007, 2012, 2017).

We also investigated the influence of interpolated values for aggregate insecticide indicators. While interpolated values occupied a significant proportion of surveyed agricultural area (59%), they contributed < 1.5% to total insecticide load in kg or honey bee lethal doses. This pattern was driven by extensive interpolation for ‘pasture and hay,’ a land use category that occupies significant land area but typically has very low insecticide inputs. Conversely, land use categories with heavy insecticide inputs (e.g. cotton, ‘vegetables and fruits’, ‘orchards and grapes’) had little to no interpolation.

Usage Notes

Functions to generate maps

One of the main goals of this project was to develop a methodology to generate maps of predicted pesticide use and bee toxic load. We provide two R functions (available in the ‘beecoSp’ package, https://github.com/land-4-bees/beecoSp) to reclassify land cover into predicted pesticide application rate (kg/ha) for combinations of active ingredients, states, and years, or to reclassify land cover into predicted bee toxic load (honey bee LD50/ha) for all insecticides combined. The ‘reclasstables’ function facilitates the creation of reclass tables for particular state-year combinations from the master files provided here, while the ‘CDL_reclass’ function uses such a reclass table to reclassify a land use raster, with the option to calculate the mean of the new raster. The inputs include an appropriate land cover raster file (from the CDL) and an appropriate master reclassification file from those accompanying this paper. Examples of the input file and output of maps for a particular active ingredient and for bee toxic load are shown in Fig. 5. Alternatively, for users interested in performing reclassification in another GIS program, reclassification tables may be generated from the datasets produced here or downloaded from the website described below.

Fig. 5
figure 5

Maps illustrating the conversion of land cover to predicted landscape loading of agricultural pesticide use. Maps are shown for Pennsylvania in 2012 representing (a) input raster from the Cropland Data Layer, and output rasters illustrating (b) coverage in the underlying pesticide dataset, (c) bee toxic load for all insecticides combined on an oral-toxicity basis, (d) the insecticide imidacloprid, (e) the herbicide glyphosate, and (f) the fungicide mefenoxam. Resolution of maps was reduced for plotting.

Insecticide Explorer web application

To enable users to explore insecticide use patterns and easily download reclassification tables for bee toxic load, we created an interactive website. The website allows users to generate graphs describing trends in national and state-level total and per ha insecticide use (in kg and honey bee lethal doses per year) from 1997–2014 (‘Explore’ tab). There is also a section of the website allowing download of reclassification tables for bee toxic load for particular state-year combinations (‘Data’ tab). The application can be accessed at the following URL: https://insecticideexplorer.shinyapps.io/insecticideexplorer/.

Limitations and possible adjustments

Users should be aware of several limitations of these data. First, the pesticide use estimates include only pesticides applied to agricultural land, and so do not account for pesticide applications on other land uses, i.e. urban and unmanaged areas. Similarly, for minor crops (e.g. fruits and vegetables) the estimates of pesticide use represent weighted averages for crop groups rather than crop-specific estimates, and some crops were not surveyed and so are not included. The weighting within crop groups is influenced by both the areas of contributing crops in a particular state and year, and their corresponding pesticide use intensities. If users have access to independent pesticide use estimates for non-agricultural land covers, unsurveyed crops, or crops included in larger categories, the reclassification tables can be adjusted or supplemented to better represent pesticide use on those land cover classes.

Second, these estimates represent crop-state-year pesticide use averages, not field-specific application. We used aggregated data to predict finer scale pesticide use patterns because i) field-level pesticide use data are unavailable for the vast majority of the U.S., and ii) crop groups, states, and years capture significant variation in pesticide use25,26,27,28. That said, we acknowledge that the uncertainty associated with these estimates at the field level is unknown; there will be significant local variation not reflected in these estimates. Furthermore, the resulting maps are constrained by the quality of the underlying land cover data, which are most accurate for large-acreage crops and major production regions and least accurate for minor crops and minor production regions54,55. These estimates and associated maps are therefore most suited to applications in which the goal is to characterize or compare pesticide use patterns on a relatively broad spatial scale (e.g. comparing across multiple sites within or across states and years where landscapes vary in crop composition), and are not suitable for applications in which site-specific or field-scale accuracy is paramount (e.g. predicting pesticide use on an individual farm). We caution against inferring location-specific pesticide use from these data. It is still an open question how detailed pesticide use data need to be to elucidate pollinator-pesticide linkages; we hope the datasets we provide will enable research clarifying this issue.

Third, the temporal dimension of these estimates is limited by the source data, which are reported annually, and complete only through 2014, after which seed-applied pesticides are excluded. The annual nature of the data constrains their use in mechanistic risk assessments, which require more specific information about site, method, and time of application. A key issue in the exposure of pollinators will be the degree of temporal overlap of insecticide application and persistence with pollinator activity in and around contaminated matrices (e.g. soil for ground-nesting species, floral resources)14. Given the annual grain of these data, we expect them to best reflect the hazard to species that are active for a large portion of the year (e.g. honey bees, bumble bees) and for compounds with long systemic or residual activity (e.g. many neonicotinoid applications). The exclusion of seed treatments after 2014, and in California for all years, is problematic because seed-applied pesticides are increasingly prevalent in U.S. agriculture29 and seed-applied neonicotinoids represent the most important contributor to national bee toxic load calculated based on oral toxicity21,22. We encourage users to investigate temporal patterns in use estimates for their target area to determine the most appropriate course to take for years after 2014. In some cases, projecting earlier use rates to later years – while clearly imperfect – may be the best available option until more valid public data becomes available.

There are also important limitations related to the ‘bee toxic load’ estimate for all insecticides combined. Most obviously, because they are based on honey bee LD50 units, these estimates are most relevant to honey bees and closely related species. The literature suggests that broad patterns of toxicity are similar between honey bees and other bee species56, but there is significant variation among species, which is more pronounced when more distant insect taxa are considered57,58. If a quantitative estimate of lethal doses for some other bee species is desired, at minimum, users should consider adjusting for body weight (honey bee workers weigh ~120 mg each59). Also, because these estimates are based on acute toxicity to adult honey bees, they do not reflect chronic toxicity or effects on other life stages such as larvae. It is up to the user to decide if toxic load on a contact- or oral-toxicity basis, or application rate of specific active ingredients, is most appropriate for a given application. Finally, because we summed across active ingredients, our estimate does not account for potential synergy between them.

And lastly, these estimates represent loading of pesticides to the landscape but not persistence or movement in the environment. These data could be combined with additional information to model fate in the environment and exposure of recipient organisms.