Contrasting emergence of Lyme disease across ecosystems

Global environmental changes are causing Lyme disease to emerge in Europe. The life cycle of Ixodes ricinus, the tick vector of Lyme disease, involves an ontogenetic niche shift, from the larval and nymphal stages utilizing a wide range of hosts, picking up the pathogens causing Lyme disease from small vertebrates, to the adult stage depending on larger (non-transmission) hosts, typically deer. Because of this complexity the role of different host species for emergence of Lyme disease remains controversial. Here, by analysing long-term data on incidence in humans over a broad geographical scale in Norway, we show that both high spatial and temporal deer population density increase Lyme disease incidence. However, the trajectories of deer population sizes play an overall limited role for the recent emergence of the disease. Our study suggests that managing deer populations will have some effect on disease incidence, but that Lyme disease may nevertheless increase as multiple drivers are involved.


Supplementary Notes
Supplementary Note 1. Using harvest statistics as measure of population abundance.
Harvest statistics in general may provide biased estimates of population abundances, as they also may reflect hunter effort 30 and harvest quotas 31 . The population indices used here have been thoroughly tested against independent data to assess their reliability as indices of population trends. We here briefly review the evidence for using them and their limitations.
A. Deer density index. As a deer population density index, we used the number of harvested deer per km 2 of deer habitat as defined by management providing the basis for harvest quotas 32,33 . This so-called counting area forms the main basis for setting quotas rather than number of deer per see. We retrieved counting area for roe deer, red deer and moose from Statistics Norway, and we used the largest available figure for each municipality. In most cases this will be the same, but in some areas with only one species this is important to control for.
This deer density index have been tested against independent abundance data and has been used widely in demographic studies 32,34 , showing clear links to deer performance such as body mass 32 , age at first reproduction and timing of ovulation 34 , suggesting it reflects density relative to resource levels and is mainly caused by management differences 32 . For red deer, the harvest density correlated with population density estimated from cohort analysis 35 .
Similarly for moose, the harvest density of moose correlated with population density estimated from cohort analysis both within and between regions 36 . There might be 1-2 year lag between the true population size and the harvest size due to the quota system (for moose 31,37 ), but the most recent analysis suggests a good correlation between the harvest density of moose and population dynamics 36 . For roe deer, harvest density index is used in analysis of population dynamics and regarded a very good proxy for population size 38 Figure 3). Note, however, that the rodent abundance index is calculated at a county scale, i.e., at a much coarser scale than municipality for which we have other data.

Supplementary Note 2. Biodiversity.
We used mammal host species richness as a measure of biodiversity, similar to Turney et al. 42 for  47 . From other studies, we have added to this list roe deer and moose 48,49 . We also discussed the list with the author of the early review (a well-known tick expert, Reidar Mehl) to make sure we had not missed species being described later.
We used the most updated account of mammal distribution of Norway 50 , based on advice from a mammal expert from the Norwegian Zoological Society (Kjell Isaksen). We checked for consistency with another account 51 , and there were very few differences in species distributions between. For cervids (roe deer, red deer and moose), we used distribution data from Statistics Norway, as these species are all harvested very soon if they appear in a new area 52 . For each species, we recorded whether the species was present in a given municipality based on the maps. For each municipality, the number of species recorded was then used as a measure of biodiversity.

Supplementary Note 3. A detailed overview of model specifications.
We here give some detailed background on the modelling approaches used. Most of this is regarding LD incidence being the by far most comprehensive analyses due to its spatial and temporal extent. But the main approach of how to achieve a good model fit, model selection and presentation of results are similar to all analyses.

Model fit and appropriate residuals.
An overall important aim is to achieve an appropriate model fit; most importantly that there are no remaining patterns in the residuals. This was achieved by accounting for potential dependency between observations by the aid of a random term ("municipality") and terms for spatial and temporal autocorrelation. Our data had frequent zero observations. This was modelled by negative binomial models with low mean () and overdispersion. We tested whether using zero-inflation were necessary to achieve a good fit. Further, we linearized relationships by aid of appropriate transformations such as "log" (ln-transformed) or "square root". To achieve appropriate fit of more strongly nonlinear relationships, we also tried polynomial terms or cubic splines. All numeric variables was standardized to facilitate comparison of effect sizes (i.e., scaled to mean zero and  from the best models. These estimates are so-called treatment contrasts, i.e. relative to a baseline. For example, our baseline is the region "west" (table 1). The estimate for region "south" (0.830) is thus how much LD incidence in "south" differs from that in the region "west" (log scale). This also implies that the estimate for the main effect of "Year" (0.638) is for region "west", and that the negative estimate for "Year * Region (south vs. west)" (-0.331) is relative to the baseline for "west". That is, the estimated effect for "Year" in region "south" is (0.638-0.331=0.307). Hence, the estimated trend for "Year" is positive (0.307) also in the region "south", but significantly lower than in region "west" (0.638). Notice that estimated effects are given on log scale (due to the log link of the negative binomial model). of presenting results is used in figure 3, with lines being predictions for average conditions and points being residuals (not binned due to lower sample size).

Table 1 and 2: Analysis of LD incidence
The raw data are number of cases of systemic stages of Lyme disease with a known origin of the tick bite. We are interested in analysing the LD incidence rather than the number of LD cases, and hence need to account for human population size. Since "cases per capita" are not normally distributed, incidence can better be modelled with the negative binomial (NB) distribution and an offset variable (log population size) (ref. 1 , page 241).
The following variables were tested for, but not all included in the final model in table 1.

Terms to account for potential dependency in data
 "Municipality". A random term to account for repeated observations from the same area.
 "Spatial autocorrelation". This was accounted for by including for each year and municipality an index (1-0) of "Previous year's presence of LD in neighbouring municipalities".
 "Temporal autocorrelation". This was "Previous year's incidence of LD cases in the municipality".

Descriptive terms
 "Year". A continuous term for year to assess temporal changes in LD incidence. This is used to formally test for an emergence, defined here as an increase in LD incidence over years. If this terms remain significant after adding a driver (such as deer density), it is evidence this driver alone cannot explain the emergence.
 "Region". A 4 level categorical variable (west, east, south, north) separating the main nature-geographic regions of Norway with different deer population developments (see study area section).

Land use variables
 "Proportion of residential settlement area". The proportion of the area in the municipality with human settlements.
 "Proportion of agricultural area". The proportion of the area in the municipality with agriculture.
 "Proportion of forest". The proportion of the area in the municipality with forest.
 "Proportion of people living in city". The proportion of the human population size within each municipality living in cities rather than in more rural environments.
Deer density indexes. If not specified, deer refers to sum of red deer, roe deer and moose. See Supplementary Note 1. The best model included "Spatial deer density" and "Temporal deer density".
 "Deer density". Number of deer harvested per km 2 in a given year in a given municipality.
 "Spatial deer density". The mean number of deer harvested per km 2 in a given municipality over the entire time period.
 "Temporal deer density". The residual between log number of deer harvested in a given year minus the log mean number of deer harvested per km 2 in a given municipality over the entire time period. (i.e., the log ratio of "Deer density" to "Spatial deer density") for each year). Climate proxy variables (spatial). These variables are used to control for local climate variation in space for which there are no direct estimates of climate. Analysis of tick abundance shows a strong effect of "distance to coast" in these highly coastal regions 54 and also with latitude 55 . These indices hence predict tick population abundance in Norway, but not necessarily in other regions.

Indices of other mammals
 "Distance to coast". Distance from coast to centre of a given municipality.
 "Latitude". The latitude in UTM at the centre of a given municipality.
Climate variables (temporal): the North Atlantic Oscillation (NAO) index is well known to have a major influence on climate in Norway, and it is used extensively in ecological research 56 . The benefit of using such indexes rather than local measurements is that it accounts for climate over a broader scale, cfr. refs. 57,58 .
 "NAO-MAM": The NAO index for months March-April-May.
 "NAO-DJF": The NAO index for months December-January-February.  "local management unit". A random term to account for possible dependency between measurements from the same unit.
 "Year". Year as a categorical term. This was years 2009-14 for the SF area and 2011-13 for the MR area.
 "Spatial red deer density". The mean number of red deer harvested per km 2 in a given local management unit over the entire time period.

Climate proxy variables (spatial):
 "Elevation". The elevation in m above sea level for each survey plot.
 "Slope". The slope at each survey plot.
 "Distance to coast". This is the mean distance to fjord within each "local management unit", as the mean values gave a better fit than distance to fjord at each survey plot and was hence retained.  "municipality". A random term to account for possible spatial dependency.
 "Year". Year as a categorical term. This was years 2009-14 for the SF area and 2011-13 for the MR area.
 "Red deer density". Number of red deer harvested per km 2 in a given year in a given municipality.

Climate proxy variables (spatial):
 "Elevation". The elevation in m above sea level for each survey plot.
 "Slope". The slope at each survey plot.
 "Distance to coast". This was distance to nearest fjord in m from each survey plot.

Table 5: Tick load analysis
The response variable was the "number of nymphal ticks" on ears from GPS-marked red deer and a negative binomial model was used. The negative binomial model included the following fixed effects:  "Red deer density". Number of red deer harvested per km 2 in a given year in a given municipality.
 "Julian date". A continuous variable counting days from January 1 st , as number of ticks on deer decline from September onwards.
 "Elevational difference between summer and winter range". The difference in elevation of the seasonal centre of the home range during winter and summer. The response variable was the prevalence of Borrelia burgdorferi sensu lato and hence a binomial model was used. Analysis was done separately for each season and the two counties SF and MR. Note that "Year" (as categorical) and "Rodent index" was never entered in the same model. This is because the "Rodent index" is an annual index and there would be no residual variation if first having year (as categorical) in the model.
Step 1 using "Year" as categorical:  "municipality" (for SF) or "local management unit" (for MR). A random term to account for possible spatial dependency.
 "Year". Year as a categorical term. This was years 2009-14 for the SF area and 2011 and 13 for the MR area.
 "Red deer density". Number of red deer harvested per km 2 in a given year in a given municipality.
Step 2 using "Rodent index":  "municipality" (for SF) or "local management unit" (for MR). A random term to account for possible spatial dependency.
 "Rodent index". An annual value for the rodent population size. See Supplementary Note 1.
 "Red deer density". Number of red deer harvested per km 2 in a given year in a given municipality.