An altered microbiome in urban coyotes mediates relationships between anthropogenic diet and poor health

Generalist species able to exploit anthropogenic food sources are becoming increasingly common in urban environments. Coyotes (Canis latrans) are one such urban generalist that now resides in cities across North America, where diseased or unhealthy coyotes are frequently reported in cases of human-wildlife conflict. Coyote health and fitness may be related to habitat use and diet via the gut microbiome, which has far-reaching effects on animal nutrition and physiology. In this study, we used stomach contents, stable isotope analysis, 16S rRNA gene amplicon sequencing, and measures of body condition to identify relationships among habitat use, diet, fecal microbiome composition, and health in urban and rural coyotes. Three distinct relationships emerged: (1) Urban coyotes consumed more anthropogenic food, which was associated with increased microbiome diversity, higher abundances of Streptococcus and Enterococcus, and poorer average body condition. (2) Conversely, rural coyotes harbored microbiomes rich in Fusobacteria, Sutterella, and Anaerobiospirillum, which were associated with protein-rich diets and improved body condition. (3) Diets rich in anthropogenic food were associated with increased abundances of Erysipelotrichiaceae, Lachnospiraceae, and Coriobacteriaceae, which correlated with larger spleens in urban coyotes. Urban coyotes also had an increased prevalence of the zoonotic parasite Echinococcus multilocularis, but there were no detectable connections between parasite infection and microbiome composition. Our results demonstrate how the consumption of carbohydrate-rich anthropogenic food by urban coyotes alters the microbiome to negatively affect body condition, with potential relationships to parasite susceptibility and conflict-prone behavior.


Data processing
All statistical analyses were performed in R 3.6.2 2 . There were a small number of cases where a coyote was missing a physiological measurement; for example, two coyotes were decapitated for rabies testing prior to sample acquisition, so body mass and length data could not be measured. In these cases, missing data were imputed using linear regressions constructed with data from the remaining samples, with available physiological measures as predictors. The composite health index was generating using principal components analysis implementing using the 'rda' function in the R package vegan 3 . Axis scores on the first principal component explained 83.7% of variation among individuals and were used as a single index of physical condition ( Table S6).
16S rRNA sequence data was processed to generate amplicon sequence variants (ASVs) using the R package dada2 4 following a previously described method 5 . In brief, forward and reverse reads were truncated at 240 bp and 160 bp, respectively, and low-quality reads were removed using the dada2 default filtering parameters. Because dada2 evaluates error rates for an entire sequencing run, and is therefore more accurate when more samples from the same run are evaluated together, the 16S rRNA sequence data described in this experiment was processed in dada2 alongside data from concurrently sequenced experiments in our lab. The sequence data from separate experiments was removed immediately after the dada2 processing pipeline.
Five coyote fecal samples with fewer than 4,500 reads were excluded from downstream analysis; this cutoff threshold was determined based on a visual examination of our read count distributions across all samples and supplemented by an evaluation of sample completeness measures calculated using iNext 6 . ASVs were then aligned against taxa in the RDP reference database (release 11.5) 7 using the naïve Bayesian classifier method implemented in dada2 8 . We removed 11 ASVs that were identified as chloroplasts or mitochondria, as well as 21 ASVs identified as contaminants using a prevalence-based detection threshold of 0.25 in the R package decontam 9 , and confirmed that contaminants were neither abundant nor prevalent in our experimental data (Fig. S13). We then used the package phangorn 10 to generate a generalized time-reversible maximum likelihood phylogenetic tree for our data following previously described procedures 11 . The resulting feature table, taxonomic information, and phylogenetic tree were imported into the package phyloseq 12 for analysis.
For all downstream analyses, we present results obtained using an unrarefied, centered log ratio (CLR)-transformed feature table, as the CLR transformation best accounts for the compositional nature of microbiome data 13 . Total ASV richness, Shannon diversity, and Faith's phylogenetic diversity were determined as asymptotic estimates calculated from rarefaction curves implemented in the packages iNext 6 and iNextPD 14 , and the nearest taxon index (NTI) was calculated using the package picante 15 . To ensure that our results were robust to the analysis method we chose, we additionally averaged ASV abundances across 1,000 rarefactions to the minimum remaining library size of 4,559 reads. Alpha-and beta-diversity analyses were repeated using this rarefied feature table, with alpha diversity measures calculated directly from rarefied data rather than estimated from rarefaction curves.

Regression models
We used linear regressions to determine if any of our measures of body condition, diet, or microbiome richness or diversity varied significantly between urban and rural coyotes, while controlling for any effects of sex and age. In these analyses, stomach content volumes were natural log-transformed with a pseudocount of 0.01 ml to meet the assumptions of a normal distribution. Significance (Type II ANOVA) was measured as p<0.05. The effects of sex, age, and location on E. multilocularis infection status were similarly evaluated using a logistic regression. Because there was a significant association between E. multilocularis infection and coyote location, the effects of E. multilocularis infection on body condition, diet, and microbiome alpha diversity were tested while controlling for location in addition to age. All regression models were implemented using the 'lm' and 'glm' functions in R, with the exception of models predicting ASV richness, which was fit with a negative binomial distribution with the 'glm.nb' function in the R package MASS 16 .
To determine which measures were the best indicators of the four microbiome metrics (richness, Shannon diversity, Faith's PD, and NTI), we used GLMs with the microbiome metric as the response variable and stable isotope measures, stomach contents, E. multilocularis infection status, the physical condition index, and spleen mass as predictors. Continuous predictors were centered and standardized prior to model construction, and we confirmed that no predictor had a variance inflation factor greater than two. We examined all models subsets using the package MuMIn 17 and evaluated the relative importance of each predictor by 1) summing the AIC model weights for each model in which the predictor appeared 18 and 2) comparing modelaveraged coefficients across models with a ΔAICc less than two. To control for even slight collinearity among predictors, we standardized predictor coefficients by the partial standard deviation of their variables before averaging coefficients across top-ranked models, following the recommendations of Cade (2015) 19 .

Microbiome community analyses
Differential abundance analyses were performed at the phylum, class, order, family, and genus levels using the R package ALDEx2 20 . Tests were performed using the 'aldex.glm' function and a pre-specified model matrix to control for confounding variables. Specifically, differential abundance was tested based on 1) sex, controlling for the effect of age; 2) location, controlling for the effects of sex and age; and 3) E. multilocularis infection status, controlling for the effects of location.
Random forest models trained to predict coyote location from CLR-transformed taxon abundances were implemented using the R package randomForest 21 . Models were run with 1,000 trees and included only ASVs with an average relative abundance greater than 0.01%.
We used an Aitchison distance-based permutational multivariate analysis of variance (PERMANOVA) with 1,000 permutations to test for differences in overall microbiome composition due to sex, age, location, or E. multilocularis infection. This analysis was implementing using the 'adonis' function in the R package vegan. We also mapped continuous diet and health measures onto a principal component analysis using the 'envfit' function in vegan. These analyses were repeated using the Bray-Curtis and Jaccard distance calculated from rarefied data and using the weighted and unweighted UniFrac distances 22 to test for potential phylogenetic clustering effects.

Relationships with taxon abundances
Multivariate associations among CLR-transformed abundances, short-and long-term diet measures, the physical condition index, spleen mass, and age were investigated using regularized canonical correlation analysis (rCCA) with three components, implemented in the package mixOmics 23 . We used correlation distance-based hierarchical clustering on the resulting similarity matrix to identify taxa that responded similarly to the various explanatory variables and determined the importance of individual taxa based on the sum of the absolute values of their relevance scores.
Structural equation models were implemented in the R package lavaan 24 . Models were designed to test for causal linkages among location, diet, taxon abundances, physical condition score, and spleen mass. Where necessary, variables included in each model were log-transformed or scaled to meet the assumptions of homogenous variance and a normal distribution: the volumes of prey and anthropogenic food in the stomach were log-transformed after the addition of a pseudocount of 0.01 ml and then rescaled, and ASV richness was rescaled without transformation. To maintain a minimum 10:1 ratio of variables to observations, we limited models to nine variables. For each taxon cluster identified in the rCCA, we constructed initial models for evaluation that represented general hypotheses of causal linkages among variables, and we specified residual correlations among all microbiome features. Additional paths were added to the models as recommended by modification indices and non-significant paths (p>0.1) were removed. The final models were selected when adding or removing an additional path either caused model AIC to increase or caused other fit parameters to exceed conventional thresholds, even if the path was non-significant. Model coefficients were standardized in the top models to facilitate comparison among paths. Table S1: Summary of coyotes included in this study.

SUPPLEMENTARY TABLES & FIGURES
Numbers indicate the numbers of coyotes collected from each location, sex, and source.
"Managed" refers to coyotes that were lethally managed due to conflict or conflict-prone behavior.        Coefficients (y-axes) were averaged across all top-ranked models (ΔAIC C < 2) after being standardized by their partial standard deviation, and predictors were ordered along the x-axis based on the sum of the AIC model weights of the models in which they appeared (size of points). No coefficients are shown for predictors that did not appear in the top-ranked models.       The CLR-transformed abundances of genera identified using a regularized canonical correlation analysis (see Fig. 4, main text) were used as a response variable in generalized linear models with diet and health measures as predictors. All model subsets were evaluated. Coefficients were averaged across top-ranked models (ΔAIC c < 2) after being standardized by their partial standard deviation, and predictors were ranked based on the sum of the AIC model weights of the models in which they appear. For each taxon, model-averaged coefficients are shown, ranked along the x-axis in order of decreasing sums of weights. Generalized linear models for genera associated with protein consumption were evaluated in the same way as Fig. S11. Generalized linear models for genera associated with spleen mass were evaluated in the same way as Fig. S11.   Scatter plot showing best-fit linear regressions between age and health. Due to the strong correlation between age and health, and the limited spread of ages, age was not included as a control in models predicting health. In natural environments, only healthy coyotes live to be old. There are no clear indicators of variability due to the differences in sample preservation associated with sample source or collection method.