The pollination services provided by bees are critical for supporting healthy and diverse natural and agricultural ecosystems1. Continuing declines documented in populations of wild and managed bees across the world thus pose significant threats to the stability of these systems2,3. Declines in bee populations have been attributed to several factors4. Extensive habitat loss and degradation results in a dearth of floral resources and nest sites which has contributed to loss of wild bee abundance and diversity5,6. Bee losses, especially those of honey bees (Apis mellifera) and bumble bees (Bombus spp.), have more recently been ascribed to rising levels of novel bee pathogens3,7. For example, the microsporidian Vairimorpha (= Nosema) bombi (hereafter, Vairimorpha) has been identified as a major contributor to bumble bee population losses, leading to the extirpation and near extinction of several bumble bee species in North America8,9. Several bee viruses also contribute to declining honey bee survival7. These honey bee viruses often spill over into native bee populations10,11,12, where they impose negative effects on wild bee health7,13. In the US, the toxic load to bees of insecticides applied in agricultural landscapes has increased significantly over the last few decades14. Exposure to insecticides, particularly neonicotinoids, can lead to a variety of negative effects on native bees and managed honey bees alike, including compromised learning and foraging capabilities and reduced reproductive output2,15,16,17. More recently, climate change has been added to the list of major threats to global bee populations, as large-scale geographic analyses identified climate shifts as a contributing factor to local bumble bee extirpation18. Given the myriad of threats to pollinator populations, it can be difficult to tease apart the relative importance of each of these factors to bee health.

Determining the leading threats to bee populations is further complicated by interactions among these stressors in the landscape, specifically many of the factors known to undermine bee health (such as poor nutrition or exposure to pesticides) can increase susceptibility to disease19. Bees are more likely to be nutritionally deprived in landscapes with fewer and less diverse flowering plants, and nutritional deprivation can reduce immunocompetence and increase pathogen and parasite load19,20. For example, honey bees fed diets lacking pollen had higher loads of viruses21 and diets with low pollen species diversity increased honey bee mortality when bees were infected with Vairimorpha22. Caged bumble bees (B. terrestris) exhibited increased mortality from viruses only if starved23. Nutritional deprivation is likely to act through altering immune function, as immune genes are more highly expressed when forage is abundant24.

Pesticide exposure can also reduce bee immunocompetence, resulting in increased susceptibility to pathogens and parasites25. Honey bees have higher viral loads after exposure to neonicotinoids26, KATP channel agonists27, and organosilicones28, with suppression of immune genes implicated in this response in neonicotinoid-fed bees26. Acaricides have been found to increase mortality in virus-infected bees25 and exposure to a neonicotinoid and fungicides increased loads of Vairimorpha in honey bees29,30. Similarly, a study in bumble bees using data compiled from across the United States found that Vairimorpha prevalence in declining bumble bee species is best predicted by usage of the insecticide clothianidin31.

The incidence and loads of a particular pathogen or parasite in bee populations is also likely influenced by bee population density and the composition of bee communities. Viruses, ectoparasites (such as mites) and endoparasites (including Vairimorpha spp.) can be transmitted through interactions from parent to offspring, within colonies of social bees, or within and between bee species when bees co-forage on flowers7,32. Landscapes in which floral resources are limited can have increased transmission risk of pathogens and parasites through more extensive sharing of limited resources33, and viral prevalence in honey bees and bumble bees has been shown to increase with colony density34. Similarly, as most viruses can be shared among individual bees and even different species10,35, incidence of viruses and parasites in wild bumble bees has been found to be heavily influenced by the presence of honey bees, which often harbor high pathogen loads12,36.

Given all these interacting factors, disease prevalence and virulence in a population of bees can be challenging to model or predict in wild bee populations37. Quantifying how different stressors in the landscape impact bee disease pathogen prevalence in the wild would mark an important step in our understanding of pollinator disease ecology. Much of the previous work laying the foundation for our understanding of the factors influencing bee health has been lab-based. Some small-scale studies (e.g., Refs.38,39) have helped to understand how pathogen patterns manifest in the field, however, understanding how disease patterns arise in nature remains largely theoretical. Understanding of which factors are most likely to contribute to pathogen loads in the wild may be best gained through examining trends occurring across larger, more diverse landscapes (cf., Refs.18,32). Herein we evaluate wild bee pathogen loads to assess the relative role of diverse landscape characteristics on infectious disease prevalence, and more generally bee health, in wild bumble bee populations. In particular, we explore how three pathogens (deformed wing virus (DWV), black queen cell virus (BQCV), and Vairimorpha (= Nosema) bombi), as well as expression of an immune gene that upregulated in response to a range of pathogens (Defensin40), vary in wild bumble bees (B. impatiens) over two years and across varied landscapes in Pennsylvania, USA. We examined the extent to which several potential landscape-scale stressors—floral abundance, nesting habitat quality, insecticide loading, climatic factors, and interactions with managed honey bees—drive variation in pathogen loads in bumble bees. Not only do such analyses allow us to examine the environmental factors contributing to pathogen loads, but, given that bee disease is so influenced by environmental stressors, loads of bee pathogens in populations can be an indicator of which environmental factors may be most detrimental to overall bee health.


Pathogen incidence

For pathogen analysis, we collected workers of B. impatiens, as it is the most abundant species across our Pennsylvania study region. We collected 890 workers from sites spanning a diversity of habitats across Pennsylvania, USA (Fig. 1A), for 2–3 weeks during the peak of bumble bee abundance (late June–mid July), and across 2 years, including 21 sites in 2018 (n = 310 bees) and 41 sites in 2019 (n = 580 bees). Screens of loads of pathogens among pooled bees (n = 5 bees/pool) from each site (typically 3 pools/site), performed via quantitative PCR, revealed that most bumble bee samples (97%) contained BQCV and that BQCV exhibited a broad range of pathogen load values. DWV was much less common, with several samples (29%) lacking notable DWV detection (Fig. 2), and most of those samples that did test positive for DWV exhibiting low loads. Honey bee samples from the same sites (unpublished data) had much higher loads of DWV, suggesting that low loads were not due to issues with our detection protocol, but rather that B. impatiens tends not to harbor high loads of this pathogen. Vairimorpha detections were sporadic, detected in about 40% of samples, but usually at low loads. All bumble bee samples exhibited some degree of expression of the immune gene, Defensin (Fig. 2).

Figure 1
figure 1

Map depicting locations (circles) where we sampled Bombus impatiens workers across 38 counties in Pennsylvania. Hollow circles indicate locations where sites were sampled in 2018 only, shaded circles indicate sites sampled in 2019 only and black circles indicate locations sampled in both years (A). Also shown are spatial patterns of several important habitat covariates: honey bee colony density (B), agricultural insecticide loading (C), spring floral resources (D), and forest cover (E). For maps (BE), darker shades indicate higher values for each variable.

Figure 2
figure 2

Histograms of the six response variables evaluated in Bombus impatiens workers across 38 counties in Pennsylvania: Defensin (top left), black queen cell virus (BQCV) (top center), deformed wing virus (DWV) (top right), Vairimorpha (bottom left), combined pathogens (bottom center; scaled to zero) and marginal cell length (bottom right). Shown on the X-axes are normalized expression values for DWV, BQCV, Vairimorpha and defensin, normalized to the amount of EF-1α in each sample.

Landscape correlates with pathogen loads

Pathogen loads inferred for DWV, BQCV, Vairimorpha, and a combined pathogen index were modeled against several landscape predictor variables in two separate modeling analyses (tiers) using ranked candidate sets of linear mixed-effects models (see “Statistical analyses” section, below). Our first model tier (Tier 1) examined associations between pathogen loads and key landscape indices: floral resource quality in spring and summer, nesting resource quality, insecticide loading, and honey bee abundance, with each year modeled separately. Tier 1 models revealed higher loads of BQCV at sites with lower quality spring floral resources (both years), lower quality nesting resources (2018), and higher honey bee colony density (2019; Supplementary Table S2). These models suggested higher DWV loads at sites with lower quality summer floral resources (2018), lower quality nesting resources (2019) and intermediate insecticide exposure (2019; Supplementary Table S4). Vairimorpha infection had no strong covariate relationships, as a null model was supported in Tier 1 models (Supplementary Table S3). Our Tier 1 combined pathogen index indicated higher B. impatiens pathogen loads with lower quality spring floral resources (2018), lower quality nesting resources (2019), and higher honey bee colony densities (2018; Fig. 3; Supplementary Table S5). These results resemble the BQCV results, which is not surprising given that BQCV had the highest infection rates among the three pathogens. Tier 1 Defensin expression in both years indicated support for a null model, suggesting no strong covariate relationships with any predictor (Table 1; Supplementary Table S1), and our analyses revealed no relationship between Tier 1 covariates and marginal cell size (Table 1; Supplementary Table S6), suggesting that B. impatiens body size was not significantly influenced by these landscape variables.

Figure 3
figure 3

Functional relationships for mixed-effects regression analyses of bumble bee (Bombus impatiens) pathogen loads across Pennsylvania landscapes (2018–19). Shown are relationships for our ‘combined’ pathogen index (scaled) which included Vairimorpha, black queen cell virus, and deformed wing virus infection. Only patterns with statistical support based on Akaike’s Information Criterion are shown. Points on each graph indicate data points while black lines indicate model estimates and gray shaded regions are 95% confidence intervals. Pathogen loads were regressed against honey bee colony density (# colonies within 5 km; 2018; top left), spring floral resources (2018; top right), nesting habitat quality (2019; center left), latitude (2019; center right) and longitude (2018; bottom left).

Table 1 Habitat variables associated with metrics of bumble bee (Bombus impatiens) health across varied landscapes in Pennsylvania 2018–19.

Our second model tier considered the same covariates as model Tier 1, and also considered additional variables that might be useful predictors of bumble bee health: latitude, longitude, elevation, weather variables, catch-per-effort, local Bombus spp. diversity (H′; measured from the first 20 workers collected per site) and land cover variables (i.e., NLCD percent cover). These models revealed positive relationships between BQCV loads and both longitude (2018) and developed land cover (2018), indicating that more eastern- and developed sites hosted the most BQCV-infected bumble bees (Supplementary Table S2). Likewise, we found relationships between BQCV loads and latitude (negative; 2018), longitude (positive; 2018), forest cover (negative; 2018), natural cover (negative; 2018), developed cover (quadratic; 2018), shrubland cover (negative; 2019), honey bee colony density (positive; 2019), spring growing degree days (quadratic; 2019), summer floral resources (negative; 2018), and spring floral resources (negative; both years; Supplementary Table S2). DWV was most prevalent in sites with low arable cover (2019), intermediate insecticide loading (2019), sparse summer floral resources (2018), high longitudes (i.e., eastern; 2018) and low latitudes (2019; i.e., southerly sites; Supplementary Table S4). Further, our analysis demonstrated that sites with more spring rain hosted bees with greater Vairimorpha loads (2019; Supplementary Table S3). Tier 2 model sets for Defensin expression indicated a positive relationship with percent grassland/pasture cover in 2018 only (Table 1; Supplementary Table S1). Our Tier 2 combined pathogen models indicated that latitude (negative; 2019), longitude (positive; 2018) and spring floral resources (negative; 2018) were the best predictors of overall pathogen loads in B. impatiens across Pennsylvania (Fig. 3; Supplementary Table S5). Finally, as with Tier 1 analyses, we found no relationship between Tier 2 covariates and marginal cell size (Table 1; Supplementary Table S6) suggesting that B. impatiens body size varied independently with respect to covariates in either model tier. Similarly, no significant relationships were found between pathogen loads and bee species diversity or bee abundance per unit sampling time.

Although the variables that were most significant varied between years by pathogen, most trends found in both Tier 1 and Tier 2 variables were consistent in directionality between years, indicating that results were repeatable across years (Fig. 4). These models sought to find the variables that explained the data the most, allowing up to 2 variables to be combined within a model, without including correlated variables in a given combinatorial model. Several of our variables, however, were correlated and may alternatively explain patterns. For example, percent ‘natural’ cover was positively correlated (|r|> 0.70) with the spring floral resource and nesting indices, as well as percent forest cover (Fig. 4). These two variables were also both negatively correlated with percent ‘arable’ land cover. Spring GDD and summer GDD were positively correlated and both were negatively correlated with percent forest cover, percent natural cover, latitude, and elevation (Fig. 4).

Figure 4
figure 4

Correlation matrix for all pairwise independent- (2019 shown) and response variables in 2018 (center) and 2019 (right) for Bombus impatiens workers from 38 counties in Pennsylvania. Values within each cell represent Pearson’s Correlation Coefficient (r). Circle size indicates the magnitude of r with values ≥ 0.70 denoted with a sign overlaying each circle. Habitat variables shown include summer floral (“Sum. Floral”), spring floral (“Spr. Floral”), nesting resources (“Nesting”), insecticide loading (“Insecticides”), honey bee (Apis mellifera) colony density (“Honey Bees”), latitude, longitude, percent shrubland (within 500 m; “Shrubland”), percent urban development (“Developed”), percent arable (“Arable”), percent natural (“Natural”), percent grassland/pasture (“Grass/pasture”), percent forest (“Forest”), and elevation. Additionally shown are weather variables: spring growing degree days (“Spr. GDD”), summer growing degree days (“Sum. GDD”), amount of spring precipitation (“Spr. Precip.”), and amount of summer precipitation (“Sum. Precip.”). Finally, we also included Bombus spp. standardized Shannon diversity (H′; “Spp. Div.”) and number of B. impatiens captures per hour/observer (“CPUE”). Response variables included forewing marginal cell size (“Marg. Cell”), Vairimorpha infection, black queen cell virus infection (“BQCV”), deformed wing virus infection (“DWV”), and Defensin expression.


Our study is among the first to evaluate how different aspects of landscape quality correlate with the distribution and loads of key pathogens and parasites in wild bees. We used generalizable landscape indices of key landscape characteristics known to influence bee health (e.g., forage resource quality, nesting resource quality, and insecticide toxic loads) in our analyses, which allowed us to assess pathogen load patterns over diverse and geographically distributed landscapes. Our results demonstrate that bees sampled from lower quality landscapes have higher loads of pathogens and parasites, with spring floral resources, nesting habitat availability, and honey bee colony density driving the strongest patterns. While clearly these patterns need to be verified across a larger spatio-temporal scale and with more bee species, our results suggest it may be possible to predict potential risks from pathogens and parasites based on these landscape indices. These indices and models can thus help inform decisions as to where habitat restoration and conservation practices should be applied.

Inadequate forage (i.e., floral dearth) has been cited, along with emerging infectious disease, as among the most important threats to global bee populations4. While these two stressors clearly affect bees independently32,41, our study provides support for the hypothesis that nutritional stress (via lack of floral resources) and disease infection may also interact to impact wild bee populations19,42. This is consistent with laboratory-based studies that suggest nutritionally-stressed bees exhibit compromised immunocompetence38 and greater pathogen loads43, and analogous results have been reported in vertebrate systems44,45. Additionally, wild bees may benefit from having a choice of floral secondary plant compounds that may be constrained in conditions of floral dearth, as some such compounds are known to reduce pathogen loads32. Although a negative relationship between floral availability and disease infection seems intuitive, alternative hypotheses predict the opposite pattern whereby abundant floral resources facilitate more interaction between managed- and wild bee populations4,46. Although floral resources were strongly correlated with pathogen loads, we were surprised that features like urban/agricultural development did not predict pathogen loads (Table 1). One potential explanation for this pattern was that such habitats host adequate floral resources to support healthy wild bee populations47. Our results indicate that spring resources, as opposed to summer resources, were more strongly correlated with pathogen loads. The spring is a vulnerable period for bumble bees, as resources can be patchy and phenologically mismatched, while queen bees emerge partially depleted of resources from diapause and must find sufficient resources to successfully start their colonies48. More attention should be drawn to the abundance of spring resources for supporting bumble bees and other wild bees.

Honey bees, which are not native to North America, are well-known disease vectors with the potential to impact wild bees through the introduction of novel pathogens or promotion of those already present within populations4,35. In further support of this, our analyses reveal a positive correlation between managed honey bee colony density and some bumble bee pathogen loads (Table 1). Honey bees host high pathogen loads as compared to ‘background’ wild bee populations, especially for DWV and BQCV (but not Vairimorpha bombi12,36). Pathogen loads present within managed honey bee colonies can be transmitted to wild bees through honey bee foraging activities which usually extend several kilometers around each colony49. While our analysis suggested that higher honey bee density was associated with higher bumble bee pathogen loads, patterns varied by year and disease. Although directionality of honey bee density effect was similar across pathogens and years, honey bee impacts were only significant in 2018 and only for BQCV and our combined pathogen index, but not DWV, Vairimorpha, or our non-pathogen metrics (Defensin and marginal cell length). Moreover, our tier 2 models indicated that variables like latitude and longitude were better explanators of pathogen loads than was honey bee colony density. It is important to note that the presence of viruses commonly found in honey bees within wild bees is not necessarily indicative of pathogenicity, though it is possible that these viruses can become more problematic if wild bees are stressed or immunocompromised due to other factors13,50.

Pesticides pose a significant threat to wild bee populations4,51 and the aggregate hazard of insecticides applied to agricultural landscapes across North America has risen in recent decades as a result of widespread use of neonicotinoid seed treatments in common field crops14. However, our analyses revealed few clear relationships between pathogen loads and insecticide loading, with several possible explanations. To observe an impact on bumble bee pathogen loads, pesticide exposure likely needs to be high enough to suppress bees’ immune systems while remaining sublethal42. To this end, we may not expect a linear increase in pathogen loads with increasing insecticide loading because highly toxic sites have few bees42. Furthermore, Pennsylvania may not have enough variation in insecticide use across our study sites (Fig. 1C) because most of the state is forested52 and such natural habitats typically have relatively low insecticide loads53. Finally, the insecticide index considered here is based on annual, state-wide insecticide application data for croplands, and it is possible that field- and time-specific insecticide data—or data including non-agricultural insecticide uses—may reveal clearer relationships with pathogen loads. Still, even under the best circumstances, linking pesticides to bee health has proven challenging for observational studies like ours51.

In this study, we quantified expression of an immune gene Defensin, considering that this may be a direct metric of overall pathogen strain on the bee. Defensin is one of four antimicrobial peptides produced in bees as an immune response to pathogen infection from bacteria, protozoa, fungi, and viruses, and thus can serve as an indicator of pathogen load40. We, however, found that Defensin expression did not correlate well with any of our landscape factors. The use of immune genes for understanding pathogen loads has had mixed results in the literature, as high levels of nutritional resources can boost both immunocompetence and immune gene expression40. Immune gene expression in response to pathogens is therefore likely to be complex.

One of the most consistent patterns we observed in our study was that pathogen loads varied geographically across our study area. This finding echoes the spatially heterogeneous disease patterns reported for wild bees by other studies12,32. With this in mind, latitude and longitude are likely proxies for variables (or suites of variables) not fully measured in our study. For instance, both BQCV and DWV were least common in northern Pennsylvania. Indeed, northern Pennsylvania supports a suite of landscape characteristics expected to support healthy bee populations; these landscapes tend to support more native habitat, host better floral resources, and fewer managed honey bee colonies (Fig. 1B–E). Indeed, past predictive models of bee abundance suggest northern Pennsylvania as among the highest quality regions in the eastern United States54 and our data seem to support this notion, at least from a disease perspective. Ultimately, the geographic variation we observed in bumble bee pathogen loads during our study highlights the importance of accounting for regional variation in assessments of disease risk for wild bees, especially for widely-dispersed species like B. impatiens55.

Weather patterns can have profound impacts on animal disease dynamics because weather impacts disease agents, vectors, and even host activity56,57. For example, Retschnig et al.58 observed a negative relationship between ambient temperature and Vairimorpha infection rates in honey bee colonies because colder temperatures kept bees from foraging outside the colony and within close proximity to nest-mates. The opposite pattern has been reported for DWV virus titers across a temperature gradient59, however, higher temperatures also reduced bee survival rates. Given the complexity of relationships between insect pathogens and environmental conditions, it is not entirely surprising that our models suggested only modest correlations between weather and pathogen loads (Table 1). The clearest pattern we observed was a positive relationship between spring precipitation and infection by Vairimorpha bombi. Indeed, several previous studies have indicated that spore viability and infection for other Vairimorpha spp. to be positively correlated with rainfall (e.g., Ref.60), presumably because rainfall enhances transmission rates61 (but see Ref.62). Future studies should model more detailed weather parameters than we have to examine their relative impacts.

Although our study marks an important exploration into the macro-ecological patterns of pathogen prevalence in a wild bee population, it is important to keep several important caveats in mind with the interpretation of our results. We observed year-to-year variation in the strength, but not direction, of habitat trends (i.e., Fig. 4). The consistency in general patterns highlights the value of multi-year studies for revealing reliable trends, but the variance suggests that we may have uncovered additional patterns had our study continued beyond two years. Additionally, Pennsylvania is a highly forested state51; analyses of data similar to those presented here from other regions with different ecosystem types (e.g., grasslands, intense agriculture, etc.) would help to decouple co-correlated variables and better understand the relative impacts of different habitats. Furthermore, we did not sample pathogens from sites with very few bees (i.e., those where we could not collect ≥ 5 B. impatiens), which could have hampered our ability to assess some of the effects of the lowest quality sites on bees. Finally, many variables in our analyses were correlated (|r|≥ 0.70) and could not be modeled simultaneously. Because correlated variables were included as separate variables in each tier, competing models often included different members of a set of correlated variables (e.g., Supplementary Table S2), indicating uncertainty as to which variable best predicted disease loads. Future analyses might consider methods that account for correlated variables (e.g., ordination) or are robust to variable correlation (e.g., random forest, etc.).

Collectively, our results highlight the need to support high-quality landscapes (i.e., those with abundant floral/nesting resources) to maintain healthy wild bee populations. These results are particularly timely in light of widespread population declines in many insect groups63, especially pollinators like bumble bees2. In addition to helping support healthy pathogen-free bees, conservation of low-pathogen habitats identified here is also important from other perspectives; landscape features that support abundant floral and nesting resources also host more abundant and diverse pollinator populations and communities54,64, and the critical ecosystem services they provide (Russo et al. 2013). The model results presented here could also be mapped across landscapes and incorporated into conservation/planning tools (e.g., The ‘Beescape’ program; Indeed, spatially explicit conservation tools like Beescape that map biological patterns in space can be instrumental in ensuring conservation success for sensitive species like bees65,66. With that in mind, the analytical approaches used here coupled with the habitat indices we incorporated could be applied to assess wide array of conditions and study areas, thus allowing even broader insights to relative value of different landscape features for pollinators.


Site selection

Our sample sites (21 sites in 2018 and 41 sites in 2019) includes 38 of the 67 counties across Pennsylvania (Fig. 1A). These sites were selected to represent evenly the span of the state while also collecting from a wide suite of habitat types and land use patterns. The central Appalachian Mountains run through the center of the state and, therefore, vary in elevation from relatively low (~ 100–200 m) to moderately high (> 500 m67). Pennsylvania is heavily dominated by mature forest68 with northern hardwood (e.g., Acer spp.), Appalachian oak (Quercus spp.), and coniferous (e.g., Tsuga canadensis) among the most common forest types69,70. Although high-elevation regions in Pennsylvania remain largely forested52, lower elevations host fertile soils on which row cropping and livestock-based agriculture are important land uses, as is human development71. As a result, a range of values for each of nesting habitat, floral resources, and agricultural insecticide loading was available for sampling (Fig. 1).

Bee sampling

Upon arriving at each site, we conducted an unlimited-length visual encounter survey (VES) for bumble bees, where 1–2 surveyors examined all available flowers as evenly as possible for bumble bees, ignoring bumble bee species identity, until we caught 20 Bombus workers. In most sites, bees were captured within 100–200 m of the site centroid. Gathered bumble bees were identified in the field and retained for lab identification if identity was uncertain (rarely necessary). Given the difficulty of identifying B. perplexus, B. vagans, and B. sandersoni workers, these were lumped and treated as a single taxon. This 20-bee survey allowed us to obtain a diversity metric for each site (standardized Shannon Diversity Index (H′)72). All bees were captured legally under Pennsylvania special use permit #2019-75.

For pathogen analysis, we sought to collect 15 B. impatiens from each site; if our initial sample of 20 Bombus included fewer than 15 B. impatiens workers, we continued sampling until 15 B. impatiens workers were collected. During 2019 sampling, we recorded the start- and end times at each site which allowed us to calculate ‘catch per unit effort’ for B. impatiens in 2019 as: (# B. impatiens within the first 20 bees sampled/survey duration)/the number of surveyors. A few sites where 15 B. impatiens could not be obtained across many hours of sampling were sampled for 5 or 10 bees. If a site yielded < 5 B. impatiens workers across several hours of sampling, it was not included in the study, thus some of the poorest quality sites were potentially excluded. Bees were gathered into vials in groups of five and stored immediately onto dry ice in the field, followed by transfer to a – 80 °C freezer. Most sites thus contained three replicate vials with five individuals each for subsequent pooled pathogen screens.

Molecular quantification of pathogens

Abdomens were removed from workers while frozen, placed in a 5 ml tube with 5 metal beads and 2.0 ml of Qiazol (2018) or 2.5 ml Trizol (2019) buffer and homogenized using an Omni Bead ruptor for 35 s on ‘low’. After brief centrifugation to remove particulate matter, 350 μl of homogenate was used for RNA extraction. RNA was extracted using the standard protocol for the Direct-zol RNA Miniprep Kit (Zymo Research, Irvine, California) including incorporated DNA removal steps with DNaseI. RNA samples eluted in water were quantified and quality assessed using a Nanodrop One (Thermo Fisher Scientific, Waltham, Massachusetts). Then, 500 ng (2018) or 400 ng (2019) was taken through cDNA synthesis using standard protocols of the High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific) (10 µl RNA + 10 µl Mastermix reaction, proportional to recommended quantities), followed by a 1(cDNA):4(water) dilution. Quantitative PCR was performed using this cDNA to assess loads Vairimorpha bombi (SSU 16S ribosomal RNA, BOMBICAR primers73,74) and loads of single stranded positive-sense RNA viruses BQCV and DWV (primers from Ref.75). We also quantified expression of the immune gene, Defensin, using primers designed for bumble bees and previously used on B. impatiens (Bi’def-278F, Bi’def-397R76. Bee housekeeping gene Elongation factor 1α (EF-1α) was used as a control gene for normalization (primers sequences and conditions in 78) of immune genes and pathogen loads against available bee tissue. For qPCR, 1 µl of cDNA product, 5 µl of SYBR green, and 0.3 µl each of 10 µM primer were combined and run on an 7900RT Fast Real-time PCR system machine (40 cycles, 60 °C annealing temperature; Applied Biosystems, Waltham, Massachusetts) using dissociation curves to ensure no non-specific products were generated. qPCRs were run in triplicate using standard curves and no template controls. Standard curves included eight serial fivefold dilutions of original PCR amplified or concentrated cDNA product. Within-site samples were randomized across extraction and cDNA runs and plates, with 2018 and 2019 samples run separately.

Extent of gene expression was calculated from qPCR CT values by first adjusting CT values based on inferred amplification efficiency of standard curves. Standard curves showed good primer efficiency for all genes, with linear increase of CT values closely matching doubling of sampling concentration across values. The resulting adjusted values were normalized against the standard curve adjusted expression of the EF-1α control gene (subtracting EF-1α CT values from CT values). Samples which deviated strongly from the others for EF-1α (most samples were within 2 CT values of one another) were removed from analysis. Samples where no expression was detected, which was found in some pathogen samples, were assigned a final CT value of the sample of lowest detected expression level (highest CT) rounded up to the nearest integer which included CT cycles of 35 (BQCV), 37 (Vairimorpha), and 39 (DWV). For statistical analyses, we used the normalized values (normalized to the amount of EF-1α in each sample) for DWV, BQCV, Vairimorpha and defensin. To quantify body size, we measured the marginal (radial) cells on the right forewing of each worker (to the nearest 0.1 mm) using a dissecting microscope77.

Landscape data metrics

We characterized broad land cover categories around each sampling location from the most recent version (2016) of the National Land Cover Database (NLCD69, 30 m resolution). Using the raster package in program R version 3.6.178,79, we extracted percent area for forest cover (deciduous + mixed + coniferous), developed cover (‘open space’ through-‘intensely developed’), pasture, row crop, and shrubland, within 2 km radius of the starting location for each study site. We also created a category for all cultivated crops combined (a sum of all crop categories; hereafter, ‘arable’ land cover) and all ‘natural’/semi-natural communities combined (e.g., the sum of: forest, shrubland, wetland, etc.; hereafter, ‘natural’ land cover). Although bumble bees are known to occasionally use habitat at broader spatial scales5,80, the majority of foraging activity for B. impatiens and other bumble bee species likely occurs within 2 km of colonies81,82 suggesting this as a useful scale for analysis.

Although we consider broad land cover classes (e.g., NLCD, described above; Fig. 1), a primary interest of ours was to assess the value of published habitat indices of bee habitat quality. Of particular focus herein were the spring and summer floral quality indices published by Koh et al.54, their nesting habitat quality index, and the insecticide loading index published by Douglas et al.14. To predict floral and nesting resources at each of our study sites, we used the Integrated Valuation of Environmental Services and Tradeoffs (InVEST v. 3.1.0) crop pollination model83. The floral and nesting indices available through the InVEST crop pollination model were developed to explain and predict the relative value of different habitats to pollinator communities from the perspectives of floral resource availability and nest site availability53,82. These habitat indices were developed by surveying a panel of experts on wild bee ecology regarding their understanding of relative resource quality among vegetation communities in the NLCD. Expert ranks were then averaged across each land cover category to produce a re-class table which can be used to translate the NLCD into maps of predicted habitat quality for components of bee habitat53. As in Kammerer et al.84, we translated land use into relative value of nesting and floral resources with reclassification tables from Koh et al.54, and, for distance-weighting procedures within the model, assumed a Bombus foraging range of 2 km. To generate maps of agricultural insecticide loading (bee lethal doses applied per ha), we obtained year × state × crop reclassification tables presented by Douglas et al.14 [Douglas et al., unpublished data] through email correspondence with the corresponding author (M. Douglas). As with our floral and nesting indices, we used the Douglas et al. 14 reclassification tables to scale land cover into insecticide toxic load and applied a distance-weighting procedure to more heavily weight insecticide application proximate to our study sites. To calculate nesting, floral, and insecticide indices, we used the 2018–19 Cropland Data Layer (USDA NASS 2018) as our land-use layer, which includes natural habitats from the NLCD and greater resolution of agricultural crop types (Ref.85; Fig. 1C,D).

In addition to data on cover type and resource availability indices, we incorporated data related to site-specific weather conditions during each sampling year. Specifically, we quantified cumulative precipitation and growing degree day (GDD; base = 0 °C) maps for our study area using the publicly available PRISM daily weather dataset (Ref.86; 4 km resolution). From these maps, we extracted precipitation and GDD values at each sampling location for ‘spring’ (March–May) and ‘summer’ (June–August) of each survey year. We quantified honey bee colony density using the Pennsylvania Department of Agriculture registered apiary database (K. Roccasecca, unpub. data) with each apiary buffered by 5 km49, scaled by the number of colonies in each apiary (Fig. 1B).

Statistical analyses

Prior to analysis we scaled all variables around a mean of zero to improve model convergence using the scale function in R. We assessed the relationships between bumble bee metrics of pathogen prevalence (Vairimorpha, BQCV, DWV, and a combined pathogen metric), expression of the immune gene Defensin, and body size, against landscape features described above using an information-theoretic approach87. Our use of the information-theoretic approach involved ranking a series of models with the same response variable (hereafter, a ‘set’ of models), each of which is a ‘candidate’ for the position of top rank in a statistical hierarchy, as determined by an information criterion like Akaike’s Information Criterion (AIC87). All models presented here are analyzed as ranked candidate model sets. Our combined pathogen metric was the mean normalized pathogen value averaged across our three pathogens, Vairimorpha, BQCV, and DWV. Specifically, we developed candidate sets of mixed-effects linear models using the lme4 package in R88. All models included a random effect for ‘site’ to account for potential non-independence of individual samples within each site89 and each year was modeled separately. In each model, bumble bee pathogen metrics for each group of five workers were used as response variables with site-specific habitat characteristics as predictors. For Defensin expression, we also considered pathogen loads as predictor variables. Within each model set, we considered all possible combinations of 0–2 predictor variables with both linear (x) and quadradic (x2) relationships. To assess the extent to which both independent and dependent variables were correlated prior to analyses, we created correlation matrices of all pairwise variables. We considered two variables to be correlated if |r|> 0.7090. Because there were numerous covariates that were correlated (Fig. 2) and correlated variables impact numerical optimization of linear models89, we specified only models where correlated predictors did not occur together in a single model. Additionally, in all model sets, we also specified a ‘null’ model that included only the random effect for ‘site’ with no fixed effects.

We specified a total of 22 candidate model sets with 11 sets in our first tier and 11 analogous sets in our second tier. The 11 model sets in each tier consisted of our six response variables: (1) Vairimorpha, (2) BQCV, (3) DWV, (4) Defensin, (5) combined pathogen load, and (6) marginal cell size analyzed separately for each year (marginal cell size only for 2019). Within each candidate model set, we assessed models and associated covariates using AIC adjusted for small sample size (AICc; models < 2.0 AICc considered equivalent) applying β 95% confidence intervals (with those overlapping zero considered weak effects87.