Community confounding in joint species distribution models

Joint species distribution models have become ubiquitous for studying species-environment relationships and dependence among species. Accounting for community structure often improves predictive power, but can also affect inference on species-environment relationships. Specifically, some parameterizations of joint species distribution models allow interspecies dependence and environmental effects to explain the same sources of variability in species distributions, a phenomenon we call community confounding. We present a method for measuring community confounding and show how to orthogonalize the environmental and random species effects in suite of joint species distribution models. In a simulation study, we show that community confounding can lead to computational difficulties and that orthogonalizing the environmental and random species effects can alleviate these difficulties. We also discuss the inferential implications of community confounding and orthogonalizing the environmental and random species effects in a case study of mammalian responses to the Colorado bark beetle epidemic in the subalpine forest by comparing the outputs from occupancy models that treat species independently or account for interspecies dependence. We illustrate how joint species distribution models that restrict the random species effects to be orthogonal to the fixed effects can have computational benefits and still recover the inference provided by an unrestricted joint species distribution model.

Historically, species distributions have been modeled independently from each other due to unavailability of multispecies datasets and computational restraints. However, ecological datasets that provide insights about collections of organisms have become prevalent over the last decade thanks to efforts like Long Term Ecological Research Network (LTER), National Ecological Observatory Network (NEON), and citizen science surveys 1 . In addition, technology has improved our ability to fit modern statistical models to these datasets that account for both species environmental preferences and interspecies dependence. These advancements have allowed for the development of joint species distribution models (JSDM) 2-4 that can model dependence among species simultaneously with environmental drivers of occurrence and/or abundance. Species distributions are shaped by both interspecies dynamics and environmental preferences [5][6][7][8] . JSDMs integrate both sources of variability and adjust uncertainty to reflect that multiple confounded factors can contribute to similar patterns in species distributions. Some have proposed that JSDMs not only account for biotic interactions but also correct estimates of association between species distributions and environmental drivers 3,9 , while others claim JSDMs cannot disentangle the roles of interspecies dependence and environmental drivers 5 . We address why JSDMs can provide inference distinct from their concomitant independent SDMs, how certain parameterizations of a JSDM induce confounding between the environmental and random species effects, and when deconfounding these effects may be appealing for computation and interpretation.
Because of the prevalence of occupancy data for biomonitoring in ecology, we focus our discussion of community confounding in JSDMs on occupancy models, although we also consider a JSDM for species density data in the simulation study. The individual species occupancy model was first formulated by MacKenzie et al. 10 and has several joint species extensions 4,[11][12][13][14][15][16] . We chose to investigate the impacts of community confounding on the probit model since it has been widely used in the analysis of occupancy data 4,13,17 . We also developed a joint species extension to the Royle-Nichols model 18 and consider community confounding in that model.
We use the probit and Royle-Nichols occupancy models to improve our understanding of montaine mammal communities in what follows. We show that including unstructured random species effects in either occupancy model induces confounding between the fixed environmental and random species effects. We demonstrate how to orthogonalize these effects in the model and compare the resulting inference compared to models where species are treated independently. Community confounding. Species distributions are shaped by environment as well as competition and mutualism within the community 8,22,23 . Community confounding occurs when species distributions are explained by a convolution of environmental and interspecies effects and can lead to inferential differences between a joint and single species distribution model as well as create difficulties for fitting JSDMs. Former studies have incorporated interspecies dependence into an occupancy model 4,[11][12][13][14][15][16] , and others have addressed spatial confounding 1,17,24,25 , but none of these explicitly addressed community confounding. However, all Bayesian joint occupancy models naturally attenuate the effects of community confounding due to the prior on the regression coefficients. The prior, assuming it is proper, induces regularization on the regression coefficients 26 that can lessen the inferential and computational impacts of confounding 27 . Furthermore, latent factor models like that described by Tobler et al. 4 restrict the dimensionality of the random species effect which should also reduce confounding with the environmental effects.
We address community confounding by formulating a version of our model that orthogonalizes the environmental effects and random species effects. Orthogonalizing the fixed and random effects is common practice in spatial statistics and often referred to as restricted spatial regression [27][28][29][30][31] . Restricted regression has been applied to spatial generalized linear mixed models (SGLMM) for observations y, which can be expressed as where g(·) is a link function, ψ are additional parameters for the data model, and is the covariance matrix of the spatial random effect. In the SGLMM, prior information facilitates the estimation of η, which would not be estimable otherwise due to its shared column space with β 30 . This is analogous to applying a ridge penalty to η, which stabilizes the likelihood. Another method for fitting the confounded SGLMM is to specify a restricted version: where P X = X(XX) −1 X ′ is the projection matrix onto the column space of X. In the unrestricted SGLMM, the regression coefficients β and random effect η in (1) compete to explain variability in the latent mean µ in the direction of X 27 . In the restricted model, however, all variability in the direction of X is explained solely by the regression coefficients δ in (4) 31 , and η explains residual variation that is orthogonal to X . We refer to β as the conditional effects because they depend on η , and δ as the unconditional effects.
Restricted regression, as specified in (4), was proposed by Reich et al. 28 . Reich et al. 28 described a diseasemapping example in which the inclusion of a spatial random effect rendered one covariate effect unimportant that was important in the non-spatial model. Spatial maps indicated an association between the covariate and (1) y ∼ [y|µ, ψ], www.nature.com/scientificreports/ response, making inference from the spatial model appear untenable. Reich et al. 28 proposed restricted spatial regression as a method for recovering the posterior expectations of the non-spatial model and shrinking the posterior variances which tend to be inflated for the unrestricted SGLMM. Several modifications of restricted spatial regression have been proposed 30,[32][33][34][35] . All restricted spatial regression methods seek to provide posterior means E δ j |y and marginal posterior variances Var δ j |y , j = 1, ..., p that satisfy the following two conditions 36 : 1. E δ|y = E β NS |y and, 2. Var β NS,j |y ≤ Var δ j |y ≤ Var β Spatial,j |y for j = 1, ..., p, where β NS and β Spatial are the regression coefficients corresponding to the non-spatial and unrestricted spatial models, respectively.
The inferential impacts of spatial confounding on the regression coefficients has been debated. Hodges and Reich 29 outlined five viewpoints on spatial confounding and restricted regression in the literature and refuted the two following views: 1. Adding the random effect η corrects for bias in β resulting from missing covariates. 2. Estimates of β in a SGLMM are shrunk by the random effect and hence conservative.
The random effect η can increase or decrease the magnitude of β , and the change may be galvanized by mechanisms not related to missing covariates. Therefore, we cannot assume the regression coefficients in the SGLMM will exceed those of the restricted model, nor should we regard the estimates in either model as biased due to misspecification. Confounding in the SGLMM causes Var β j |y ≥ Var δ j |y , j = 1, ..., p , because of the shared column space of the fixed and random effects. Thus, we refer to the conditional coefficients as conservative with regard to their credible intervals, not their posterior expectations.
Reich et al. 28 argued that restricted spatial regression should always be applied because the spatial random effect is generally added to improve predictions and/or correct the fixed effect variance estimate. While it may be inappropriate to orthogonalize a set of fixed effects in an ordinary linear model, orthogonalizing the fixed and random effect in a spatial model is permissible because the random effect is generally not of inferential interest. Paciorek 37 provided the alternative perspective that, if confounding exists, it is inappropriate to attribute all contested variability in y to the fixed effects. Hanks et al. 31 discussed factors for deciding between the unrestricted and restricted SGLMM on a continuous spatial support. The restricted SGLMM leads to improved computational stability, but the unconditional effects are less conservative under model misspecification and more prone to type-S errors: The Bayesian analogue of Type I error. Fitting the unrestricted SGLMM when the fixed and random effects are truly orthogonal does not introduce bias, but it will increase the fixed effect variance. Given these considerations, Hanks et al. 31 suggested a hybrid approach where the conditional effects, β , are extracted from the restricted SGLMM. This is possible because the restricted SGLMM is a reparameterization of the unrestricted SGLMM. This hybrid approach leads to improved computational stability but yields the more conservative parameter estimates. We describe how to implement this hybrid approach for joint species distribution models in the "Community confounding" section.
Restricted regression has also been applied in time series applications. Dominici et al. 38 debiased estimates of fixed effects confounded by time using restricted smoothing splines. Without the temporal random effect, Dominici et al. 38 asserted all temporal variation in the response would be wrongly attributed to temporally correlated fixed effects. Houseman et al. 39 used restricted regression to ensure identifiability of a nonparametric temporal effect and highlighted certain covariate effects that were more evident in the restricted model (i.e., the unconditional effects' magnitude was greater). Furthermore, restricted regression is implicit in restricted maximum likelihood estimation (REML). REML is often employed for debiasing the estimate of the variance of y in linear regression and fitting linear mixed models that are not estimable in their unrestricted format 40 . Because REML is generally applied in the context of variance and covariance estimation, considerations regarding the effects of REML on inference for the fixed effects are lacking in the literature.
In ecological science, JSDMs often include an unstructured random effect like η in (1) to account for interspecies dependence, and hence can also experience community confounding between X and η analogous to spatial confounding. Unlike a spatial or temporal random effect, we consider random species effects to be inferentially important, rather than a tool solely for improving predictions or catch-all for missing covariates. An orthogonalization approach in a JSDM attributes contested variation between the fixed effects (environmental information) and random effect (community information) to the fixed effect.
We describe how to orthogonalize the fixed and random species effects in a suite of JSDMs and present a method for detecting community confounding. In the simulation study, we test the efficacy of our method for detecting confounding, show that community confounding can lead to computational difficulties similar to those caused by spatial confounding 31 , and highlight that, for some models, restricted regression can improve model fitting. We also investigate the inferential implications of community confouding and restricted regression in JSDMs by comparing outputs from the SDM, unrestricted JSDM, and restricted JSDM of the Royle-Nichols and probit occupancy models fit to mammalian camera trap data. Lastly, we discuss other inferential and computational methods for confounded models and consider their appropriateness for joint species distribution modeling. www.nature.com/scientificreports/ Model Data model. The probit and Royle-Nichols occupancy models were developed for analyzing multispecies binary detection data, y ijk , arising from a zero-inflated Bernoulli process with probability of success p ijk , where i = 1, . . . , n , j = 1, . . . , J i , and k = 1, . . . , K correspond to sites, occasions, and species, respectively. Occupancy data of this form have traditionally been analyzed in a latent variable framework 10,41,42 . In what follows, we let z ik ∼ Bern(ψ ik ) be an indicator on whether species k occupies site i. Given a site is occupied, we detect species k on occasion j with some probability p ijk , such that (y ijk |z ik = 1) ∼ Bern(p ijk ) , but if species k is absent from the site, we have zero probability of detecting it, P(y ijk = 0|z ik = 0) = 1. The probit occupancy model is so named because it links ψ ik and p ijk to occupancy and detection covariates x ik and w ijk , respectively, with the standard normal CDF . The probit link function can be paired with data augmentation 17,[43][44][45] to yield efficient Gibbs samplers for the occupancy and detection regression coefficients β and α , respectively.
Royle and Nichols 18 introduced a method for analyzing occupancy data that explicitly modeled the probability of detecting species k at a site as a function of a surrogate related to the true species abundance. Assuming there are N ik individuals of species k in sample region i and that all individuals in species k on the sample unit have identical detection probabilities and are detected independently of other individuals, the probability of detecting at least one of these individuals can be expressed as where r jk is a binomial sampling probability that a particular individual of species k is detected on occasion j. While the Royle-Nichols model facilitates inference on number of individuals of species k, N ik , at each site when all the assumptions are met, we do not interpret them as such because sites are not necessarily closed in camera trap studies due to mobile species with home ranges larger than the sampling radius of the camera. Note that ρ ijk in (7) corresponds to the species probability of detection conditional on an intensity process. This is distinct from p ijk in the probit model that is conditional on an occupancy process.
The nonlinear function of r jk and N ik in (7) involves more parameters than would be identifiable in a typical occupancy model, especially when the individual detection probability is heterogeneous across occasions (e.g., r jk are heterogeneous). In the heterogeneous case, r jk is connected to covariates with the logit link function: where f (w ijk , α k ) is a linear function of the detection covariates w ijk and regression parameters α k .

Modeling interspecies dependence.
We extend both occupancy models to account for interspecies dependence by including random species effects in their process models. Following Royle and Nichols 18 , we assume N ik ∼ Pois( ik ) , where ik is mean intensity of species k at site i. We let denote the vector of site specific intensities stacked across the K species in the community. To model interspecies dependence, we specify the conditional multivariate normal distribution: where X is a block-diagonal matrix of the K species design matrices, β = (β ′ 1 , . . . , β ′ K ) ′ is a stacked vector of species specific regression coefficients, η represents the random species effects, and spp is a species covariance matrix, and is a matrix that allows for additional covariance structures such as spatial dependence. For our purposes of comparing the SDM, unrestricted JSDM, and restricted JSDM for differences galvanized by community confounding, we assumed a simple independent structure for log( ) and set = τ I. In the probit model, we include a random species effect in the latent probability of occupancy: �(ψ) = Xβ + η , where ψ is a vector of site specific occupancy probabilities stacked across the K species in the community and X , β = (β ′ 1 , . . . , β ′ K ) ′ , and η are defined as above. In both occupancy models, η allows for dependence between all K species in the community at each site. In the probit model, η characterizes interspecies dependence in the probability of occupancy, whereas in the Royle-Nichols model interspecies dependence is characterized in the species latent intensities. Just as certain environmental features may not preclude species occupancy but can curb intensity, some species may coexist in a region but not be able to jointly flourish 46 . Hence, interspecies dependence on latent intensity is conceptually distinct from interspecies dependence on probability of occupancy and may lead to inferential differences in η in the two occupancy models.
Tobler et al. 4 developed a joint occupancy model that accounts for community structure using a latent variable approach. They express the latent probability of occupancy of species k at site i as where l ′ i is a vector of length T of latent variables, and θ k are species specific regression coefficients. The latent variable model (LVM) is a computationally efficient and implicitly accounts for community structure. Other occupancy models have included interspecies dependence in the structure of the regression coefficients. Known as multispecies models, these models assume the species specific regression coefficients β k stem from a common multivariate normal distribution β k ∼ N (µ, � β ) where µ is the typical response of a species to covariates x and � β allows for dependence in different species response to the same covariates 47 . In our study of mammalian www.nature.com/scientificreports/ camera trap data, each species is modeled with unique covariates, and we do not consider shared environmental responses. Scheffe 48 stipulated that the levels of a random effect are draws from a population, and the draws are not of interest in themselves but only as samples from the larger population, which is of interest. In more recent literature, the term random effect is used more broadly. Hodges and Clayton 49 categorized modern definitions of a random effect into three different varieties. The definition commonly used in spatial statistics is, the levels of the effect arise from a meaningful population, but they are the whole population and these particular levels are of interest. We adopt this definition for the random species effects in (9). In practice, some levels of the population will likely not be included in the random species effects. For example, in Ivan et al. 50 , cameras were baited and arranged to capture all members of the mammalian community, but several species were excluded from the random species effects due to a lack of detections.
Priors. We specified normal priors for the regression coefficients, β , in the intensity and occupancy processes of the Royle-Nichols and probit models, respectively to facilitate comparison with the occupancy and spatial confounding literature. We also specified normal priors for the detection coefficients, α , in the observation model and the conjugate Inverse-Wishart prior for the species covariance matrix spp . A more general alternative to the Inverse-Wishart prior is to apply a Cholesky decomposition, spp = LD −1 L ′ , where L is lower diagonal with ones along the diagonal and D is diagonal with positive diagonal elements, and specify priors for the lower diagonal elements of L and diagonal elements of D 51 . We found the Inverse-Wishart prior suitable for our inferential goals, but see Chan and Jeliazkov 51 for alternative covariance matrix priors.
The joint posterior distribution associated with our model is See Appendix A for the full statements of both the joint probit and Royle-Nichols occupancy models.

Community confounding
Restricted regression approach. We fit a restricted version of the each JSDM that orthogonalizes the fixed and random species effects. In the Royle-Nichols model, we express the species latent intensity and occupancy process conditionally as where P X is the projection matrix onto the column space of X . Likewise, in the probit model we specify �(ψ) = Xδ + (I − P X )η and retain the same prior for η as in (14). This specification forces the random species effects to explain patterns in the community that are orthogonal to the fixed effects. The latent variables and fixed effects in the LVM can also be orthogonalized. Writing (11) in matrix form, we have where X and L are the matrices of covariates and latent variables vertically stacked across sites, respectively. If we assume common covariates across all K species, we can specify a restricted LVM as follows: However, if covariates differ by species, i.e., X = X k , then the posterior distribution of latent variables will differ by species. To retain a common posterior distribution of latent variables across all species, the latent variables need to be orthogonalized against all covariates among the k species, The specification of (17) is more restrictive than the orthogalization in the Royle-Nichols and probit model, and so we omit the LVM from our case study. Hanks et al. 31 showed that the restricted (13) and unrestricted (9) generalized linear mixed models GLMM are reparameterizations of the same model and derived the following relationship between the unconditional δ and conditional β fixed effects: Using (18), one can easily sample both sets of fixed effects by fitting either the restricted or unrestricted parameterization. We can also sample the covariance structure of the restricted random species effect from either model fit by drawing samples from the distribution (13) log( ) ∼ N (Xδ + (I − P X )η, τ 2 I), (14) η ∼ N (0, � spp ⊗ I n ), Measuring confounding. Hefley et al. 27 showed how to assess confounding in SGLMM models by computing the Pearson correlation coefficient between each pair of covariates and eigenvectors from the spectral decomposition of the spatial covariance matrix. Likewise, Prates et al. 35 proposed a test for spatial confounding that can be calculated prior to model fitting. We propose another approach relevant to our method that aids in interpretation. We compute the coefficient of determination of each covariate for species k regressed on the estimated random species effects. Because the latent intensities are unknown, the coefficents of determination of all covariates are derived quantities and can be computed at each iteration of the MCMC algorithm: where x k = (x k , . . . ,x k ) ′ is the mean of the covariate x k for species k repeated n times, K is a matrix of the random species effects sampled for MCMC iteration l, and θ (l) are estimated regression coefficients relating the estimated species intensities at iteration l to x k . The posterior mean E R 2 (x k )|Y provides a measure of community confounding for the covariate x k and can help identify which fixed effects will vary between the unrestricted and restricted models. Furthermore, we can use the global F-test of the linear relationship between x k and to determine if confounding exists.

Simulation study
We performed a simulation study to investigate the effects of community confounding and orthogonalization of the fixed and random species effects on model fitting. Specifically, we compared the effective sample sizes of β and η for three different models for confounded and unconfounded data with unrestricted and restricted parameterizations. The effective sample size (ESS) is the number of independent MCMC samples of a quantity and is a metric for measuring the sampling efficiency of an MCMC algorithm. Higher ESS are preferable as posterior distributions of quantities of interest can be obtained in fewer iterations.
We considered three models: The joint probit occupancy model, joint Royle-Nichols model, and joint normal model, which is derived from the scenario where in the Royle-Nichols is known (e.g., species density data). For each model, 150 datasets were generated with the fixed and random species effects independent and another 150 datasets were generated with confounding between the fixed and random species effects. To induce confounding between the fixed and random species effect, we expressed one covariate of the first species as a linear combination of the random species effects (i.e., x 1 = �θ).
Because the ratio of the random effects and random error magnitude is known to affect the severity of confounding in the spatial context 29,31,35 , we varied the magnitude of the random species effect in each model while holding the random error magnitude constant. Specifically, each dataset was subdivided into thirds with 50 datasets simulated to have small, medium, and large random species effects relative to the random error.
All 900 simulated datasets across models and confounding levels were for K = 2 species across n = 50 sites with J = 10 occasions per site for the occupancy models. The correlation between the two species was allowed to vary for each dataset. Each habitat design matrix included an intercept and one continuous covariate. Each MCMC algorithm was run for a burn-in period of L = 10000 to ensure convergence. The next L = 10000 iterations were used to calculate the posterior quantities in Table 1. Code for performing the simulation study in R are available in the supplementary electronic files.
For both β and η , ESS was lower on average for the confounded data than the unconfounded data for all three models demonstrating the negative impacts confounding can have on model fitting. For all three models, the computational impact of fitting the restricted parameterization did not differ depending on whether confounding Table 1. Summary of simulations results. All results are averaged across 3 magnitudes of random species effects and 50 simulated datasets. ESS Ratio is the effective sample size of the restricted parameterizations over the unrestricted and the mean ESS is the average of the two. E R 2 (x 1 )|Y is the posterior mean R 2 of confounding for species 1 continuous habitat covariate. Rejection rate is the portion of times the the posterior mean p-value from overall F-test of a linear relationship between x 1 and was below 0.05. www.nature.com/scientificreports/ exists or not. In the case of the normal and probit models, fitting the restricted parameterization improved ESS for both β and η , although the gains were much greater for the normal model. On the other hand, the restricted parameterization of the Royle-Nichols model did not improve ESS for β or η . The success of our method for detecting community confounding differed across models. The method was most powerful for the normal model followed by the probit and Royle-Nichols models.

Camera trap survey
Study area. We analyzed data arising from a study area comprised of subalpine forests in the state of Colorado between 2590 and 3660 m elevation (Fig. 1). Sites were restricted to public lands managed by the United States Forest Service, National Park Service, Bureau of Land Management, and Colorado State Forest Service. Forests in our study area were primarily composed of Lodgepole pine (Pinus contorta), Engelmann spruce (Picea engelmannii), and subalpine fir (Abies lasiocarpa). Lodgepole pine was dominant at lower elevations as well as higher elevations that were drier and/or on south-facing slopes; high elevation regions that had cool north-facing slopes were co-dominated by Engelmann spruce and subalpine fir. Lodgepole pine is restricted to the northern two-thirds of Colorado, so all sites in the southern region of the study area were Engelmann spruce, subalpine fir co-dominated. Quaking aspen (Populus tremuloides), Douglas-fir (Pseudotsuga menziesii), bristlecone pine (Pinus aristata), limber pine (Pinus flexilis), and blue spruce (Picea pungens) were also present at some sites. Mean July and January temperature across the study area were 14 and − 6.1 °C respectively. All camera data were collected during summers 2013-2014.
Sampling design. The primary goal of Ivan et al. 50 was to assess mammalian responses to bark beetle outbreaks, thus sites were randomly selected to facilitate inference on the beetle outbreak covariates. Beetle outbreak covariates included the number of years since the initial outbreak (YSO) and the severity of the outbreak measured by mean overstory mortality (severity). The sample of n = 300 , 1 km 2 sites was evenly split across the two dominant forest types, spruce-fir and lodgepole pine. Additional environmental covariates were collected at each site, and a description of these is included in Appendix B. www.nature.com/scientificreports/ Passive infrared camera traps (Reconyx PC800, Holmen, Wisconsin, USA) were deployed near the center of each site. Cameras were approximately 0.5 m above the ground and pointed toward a lure tree 4-5 m away 52 . The setup was designed to maximize detections of both large and small-bodied mammals in the local community while minimizing attraction of individuals from outside the sampling region of the site. The sampling regions were likely not closed to immigration/emigration; thus, we interpret elevated detections at a site as more individuals using, as opposed to occupying, that site 53 . For additional details regarding the sampling design and study area see Ivan et al. 50 .
Model fitting. We fit both the Royle-Nichols and probit occupancy models to the camera trap data binned into 20 two-day occasions because simulations showed this was the number of replications needed to identify a quadratic effect of occasion on individual detection probability. Not all cameras were operational for the entire 40 day sampling period, and thus the number of occasions varied from 7-20. We discarded four sites at which the camera was operational for less than one occasion. We also discarded another 12 sites that had been infested by bark beetles for more than 10 years. Ivan et al. 50 truncated the bark beetle infestation covariate at 10 years because estimates of response curves beyond 10 years would be unreliable with so few sites. The final sample size was n = 284 sites. We built distribution models for the 13 species for which Ivan et al. 50 performed a single species analysis; several rare species were excluded from analysis due to insufficient detections. We note, however, that these rare species parameters may be identifiable in the joint model as has been the case in previous studies 2,47,54-57 . Our final dataset then included 3692 unique encounter histories at n = 284 sites, stacked across K = 13 species.
Ivan et al. 50 used a sequential procedure similar to that described in Lebreton et al. 58 to select the covariates in the occupancy and detection processes for each species. We adopted their detection model and used the same covariates but a different set of basis functions for YSO. Ivan et al. 50 treated YSO as a grouping variable and considered probability of use response curves that allowed for cubic associations and delayed responses to bark beetle infestation. Multiple response curves were model averaged to produce predictive YSO response curves for each species. We used orthogonal polynomial basis functions for the YSO variable in the species intensity models. The basis functions included a linear (YSO1) and quadratic (YSO2) effect. Appendix E provides a full description of the intensity and detection models. All continuous covariates were scaled to have mean 0 and variance 1.
We fit all models using Markov chain Monte Carlo (MCMC). To improve mixing and predictive ability, we regularized the coefficients β and α with informative priors: β ∼ N (0, I) and α ∼ N (0, I) 26 . We specified a vague prior of −1 spp ∼ Wishart(15, (15I) −1 ) for the species variance-covariance matrix 59 . For the Royle-Nichols model, we used Gibbs sampling based on conjugate priors for parameters spp , η , and β and Metropolis-Hastings updates for N , , and α . Derivations of the conjugate full-conditional distributions are provided in Appendix C with details about the Metropolis-Hastings updates. We tuned the Metropolis-Hastings updates so that acceptance rates varied between 20 and 40% for α , N , and . Using data augmentation 17,[43][44][45] , all the parameters of the probit model can be sampled with Gibbs updates.
We set τ 2 = 2.25 in both (9) and (13). This choice was supported by the asymptotic equivalence between Poisson and logistic regression. In a generalized occupancy model, the latent probability of occupancy is specified as logit(ψ i ) ∼ N (x ′ i β, τ 2 ) . Hanson et al. 60 investigated the relationship between the prior on β and induced prior on the latent probability of success ψ i in logistic regression; their work showed that specifying an uninformative normal prior on β (i.e., setting τ 2 large) induces a U-shaped prior for ψ i with most of the density concentrated near 0 and 1. Broms et al. 14 recommended setting τ 2 = 2.25 in occupancy models, which results in a relatively flat prior for ψ . For rare species, i in (9) and (13) is analogous to ψ i , and specifying a variance of τ 2 = 2.25 is minimally informative.
Baddeley 61 motivated the asymptotic equivalence of Poisson and logistic regression in a spatial context where counts of points from a non-homogeneous Poisson process are recorded in a lattice; they showed that, as the grid cells of the lattice become infinitesimally small, the inference yielded from Poisson and logistic regression are equivalent. This result can be applied more generally to any dataset where there is a high proportion of zero counts. We demonstrate the asymptotic equivalence between Poisson and logistic regression in the Royle-Nichols model in Appendix D.
We ran the MCMC algorithm for L = 50000 iterations, and discarded the first 12500 iterations as burn-in. We fit an SDM, unrestricted JSDM, and restricted JSDM of both the Royle-Nichols and probit occupancy models. The "Results" section presents inference for regression coefficients for all six model fits. Ivan et al. 50 fit SDMs to infer changes in mammalian use of stands impacted by the bark beetle epidemic. The impact of bark beetle damage was measured by years since initial infestation (YSO) and severity of outbreak quantified by mean overstory mortality (DeafConif). The posterior distributions of the regression coefficients varied between the probit SDM and unrestricted JSDM, although the magnitude of difference differed by species (Fig. 2). The posterior variances of the SDM regression coefficients were smaller than the unrestricted JSDM. Posterior variances and means of the restricted probit JSDM regression coefficients quite similar to those from unrestricted JSDM. The only noticeable difference between the unrestricted and restricted regression coefficients was that the restricted coefficients had slightly smaller posterior variances on average.

Results.
As with the probit modeling results, posterior distributions of the regression coefficients in the Royle-Nichols SDM were more concentrated near zero than those of the JSDM (Fig. 3). Also, posterior distributions of the restricted JSDM regression coefficients were slightly tighter and centered closer to zero.
We calculated the unrestricted and restricted posterior correlation matrices for both the probit and Royle-Nichols models. Pairwise differences between each entry of the posterior mean of the four correlation matrices were bounded between (−0.2, 0.2) , so only the correlation matrix of the unrestricted Royle-Nichols model  www.nature.com/scientificreports/ is shown (Fig. 4). The posterior distributions of the pairwise correlations all overlapped zero except for the pairwise correlations between coyotes and golden-mantled ground squirrels, coyotes and red squirrels, and golden-mantled ground squirrels and red squirrels. In the restricted probit JSDM, the correlations between coyotes and snowshoe hares, and snowshoe hares and red squirrels also did not overlap zero. We calculated the posterior R 2 of confounding for each covariate in each species specific model as described in (20). All posterior R 2 were below 0.05 for both the Royle-Nichols and probit models giving no indication of community confounding for all covariates considered.

Discussion
We found that confounding between the fixed and random species effects can reduce sampling efficiency in MCMC algorithms and that orthogonalizing the fixed and random species effects can alleviate this problem when fitting some joint species distribution models. In the simulation study, we discovered that, even when the data were not confounded, orthogonalizing the fixed and random species effects still conferred a computational benefit for the normal and probit model. This was also true for our case study where the mean effective sample size of the conditional habitat effects β in the probit model was 32% larger when fit with the restricted parameterization. The effective sample size of η in the probit model was 3% greater for the restricted parameterization.
The case study indicated that inference on species-environment associations in occupancy models can change based on whether the distribution model accounts for community structure. Orthogonalizing the fixed and random species effects in the probit and Royle-Nichols model slightly reduced but did not nullify the differences as in the case for normal data. The similarity between the restricted and unrestricted JSDM coupled with the lack of evidence for community confounding suggests additional mechanisms lead inference in SDMs and JSDMs to www.nature.com/scientificreports/ differ, a finding consistent with Caradima et al. 63 . Overall, there was still large agreement in posterior inference produced by the SDM and JSDMs for both occupancy models. In additional simulation studies on the probit and Rolye-Nichols occopancy models, we found that community confounding can lead to larger differences between the SDM and unrestricted JSDM and that the restricted JSDM again mitigates but rarely nullifies these differences. We were also interested in whether the Royle-Nichols model could identify additional associations compared with the probit model. The Royle-Nichols model measures associations conditional on an intensity process rather than an occupancy process, and intensity is likely a function of additional factors beyond those influencing occupancy [19][20][21] . For the camera trap data, the opposite was true, in that the probit model identified more environmental-species and species-species associations. One possible explanation for this is that the probit model is more parsimonious which sharpens posterior distributions.
A related method to restricted regression, which orthogonalizes the fixed and random effects, is principal components regression, which performs an orthogonalization procedure solely among the fixed effects. To motivate their similarities, consider a simpler case where the latent intensities, , of the K species in our community were known. We could construct K regression models for predicting each species intensity as follows: where −k = 1 , . . . , k−1 , k+1 , . . . , K is a matrix of the K − 1 other species intensities. If X k and −k were highly collinear, principal component regression might be applied. Principal components regression is so named because it decomposes the variation explained by X k and −k into p = p 1 + p 2 principal components, Ŵ k = γ 1 , . . . , γ p , where p 1 and p 2 are the number of columns of X k and −k respectively. The p principal components retain all the information explained by X k and −k but are orthogonal. The regression model often improves sampling efficiency and can recover the posterior means and variances of β k and η k in (21). However, inference on β k and η k is often adjusted by truncating off the last p − r , for r < p , eigenvectors of Ŵ k and employing the new design matrix By retaining only the first r principal components, the smallest sources of variation are ignored in the estimation of β k and η k . Jeffers 64 implemented this approach truncating off the last 7 of 13 principal components to adjust the estimates of regression coefficients relating various tree characteristics to maximum compressive strength.
Other studies have selected a subset of principal components based on their strength of association with the response variable [65][66][67][68] . In some cases, the coefficient estimates from these reduced rank approaches appeared more tenable than those from the full rank specifications based on known physical relationships between the predictors and response. Thus, like restricted regression, principal components regression can be used for solely computation purposes or to adjust inference.
Recently, concerns regarding the coverage properties of the fixed effects estimator under restricted regression have been expressed 36,69 . For example, Zimmerman and Ver Hoef 69 showed that applying any restricted regression method to a SGLMM leads to frequentest coverage of the fixed effects that is lower than the corresponding non-spatial model. Similarly, Khan and Calder 36 found that when fitting a restricted version of the SGLMM with an intrinsic conditional autoregressive prior, credible intervals of the fixed effects from the restricted model were generally nested inside those yielded by the non-spatial model. Given these results, both Zimmerman and Ver Hoef 69 and Khan and Calder 36 recommended reverting to inference from the non-spatial model, rather than that of the restricted SGLMM, when inference from the unrestricted SGLMM appears untenable.
We did not observe the same pattern in our restricted JSDM but found the length of credible intervals of the restricted regression coefficients to generally be between that of the SDM and unrestricted JSDM. Nonetheless, if higher coverage is desired, one can always extract the conditional coefficients from the restricted JSDM while still benefiting from the increased stability that results from orthogonalizing the fixed and random effets. When deciding between inference from the restricted and unrestricted JSDM, one should also consider the random species effects η . Because the random effect η is rarely of interest in spatial applications, there has been little investigation on the inferential impacts of restricted regression on η . Such investigation, however, may be helpful in determining the appropriateness of restricted regression for JSDMs.
There are several conceptual facets to consider regarding the application of restricted regression in joint species distribution modeling. Frequently, JSDMs are described as accounting for residual correlations between species that cannot be explained by the environmental covariates 4,5,63 . We have shown, however, that in some www.nature.com/scientificreports/ JSDMs have been described as correcting our knowledge of species-environment relationship by accounting for interspecies dependence 5 . Poggiato et al. 5 argued that JSDMs help us better quantify uncertainty regarding species-environment relationships, but they cannot explain discrepancies in a species theoretical and realized niche. We agree that phenomenological JSDMs should not be used to disentangle the marginal effects of environment and interspecies dependence on species distributions and would recommend the development of mechanistic models to investigate interspecies-environment associations.
Experimental methods and modeling techniques for alleviating confounding have been proposed in ecology. Hefley et al. 70 showed that replicate populations can help disentangle confounded fixed and random effects. In the context of joint species distribution modeling, replication involves analyzing several communities simultaneously, which is often infeasible. Hefley et al. 70 also recommended explicit population models rather than phenomenological regression-based models for analysis of temporally confounded count data. Similarly, Fieberg et al. 71 advocated for mechanistic models guided by causal diagrams for analyzing temporally confounded animal movement data. An avenue of future research for joint species distribution modeling is to compare inference from phenomenological regression-based models, such as the one proposed here, with that of models that explicitly include ecological mechanisms such as competitive exclusion, mutualism, and predation. Because community and temporal confounding have the same mathematical framework, mechanistic models are a promising solution for confounded multispecies data.
In summary, we specified a JSDM that accounts for interspecies dependence at the intensity level, and examined how inference from the joint model differed from the joint probit model. We performed a simulation study on three JSDMs to examine the computational difficulties associated with community confounding and investigated whether orthogonalizing the fixed and random species effect could alleviate these difficulties. Further, we considered how inference in both occupancy models differed depending on the assumed community structure. Lastly, we discussed how joint species distribution modeling is distinct from spatial and time series applications in that the random effect is almost always of inferential interest, and hence, adjustments to the regression coefficients, β , and random effects, η , should both be considered. Our main conclusion is that, even for researchers who desire inference solely on the conditional relationship between the fixed species-environment and random species effects, fitting the JSDM with a restricted parameterization can give computational benefits.

Data availability
The data are available in the Supplementary Information files.

Code availability
All algorithms and code for fitting and analyzing results from the six model variants are available in the Supplementary Information files. All MCMC algorithms and analyses were coded in R 4.1.2 62 .