Abstract
Two facts about cortex are widely accepted: neuronal responses show large spiking variability with near-Poisson statistics, and cortical circuits feature abundant recurrent connections between neurons. How these spiking and circuit properties combine to support sensory representation and information processing is not well understood. We build a theoretical framework showing that these two ubiquitous features of cortex combine to produce optimal sampling-based Bayesian inference. Recurrent connections store an internal model of the external world, and Poissonian variability of spike responses drives flexible sampling from the posterior stimulus distributions obtained by combining feedforward and recurrent neuronal inputs. We illustrate how this framework for sampling-based inference can be used by cortex to represent latent multivariate stimuli organized either hierarchically or in parallel. A neural signature of such network sampling is internally generated differential correlations whose amplitude is determined by the prior stored in the circuit, which provides an experimentally testable prediction for our framework.
Introduction
In an uncertain and changing world, it is imperative for the brain to reliably represent and interpret external stimuli. The cortex is essential for the representation of the sensory world, and it is believed that populations of neurons collectively code for richly structured sensory scenes^{1}. However, two central characteristics of cortical circuits remain to be properly integrated into population coding frameworks. First, neuronal activity in sensory cortices is often noisy, showing significant variability of spiking responses evoked by the same stimulus^{2,3}. In many traditional coding frameworks such spiking variability degrades the representation of stimuli by cortical activity^{4}. Why cortical responses display large spiking variability while isolated cortical neurons can respond reliably remains far from clear. Second, the primary source of synaptic inputs to cortical neurons does not come from upstream centers which convey sensory signals, but rather from recurrent pathways between cortical neurons^{5,6,7}. While such recurrent connections are often organized about a stimulus feature axis^{8,9}, it is not obvious whether or how their presence improves overall representation. We propose a biologically motivated inference coding scheme where these two ubiquitous cortical circuit features, variability in spike generation and recurrent connections, together support a probabilistic representation of stimuli in rich sensory scenes.
Numerous studies have framed sensory processing in the cortex in terms of Bayesian inference (e.g., refs. ^{10,11,12,13,14,15,16}). Specifically, the ‘Bayesian brain’ hypothesis posits that sensory cortex infers and synthesizes a posterior distribution of the latent stimuli which describes the probability of possible stimuli that could have given rise to the sensory inputs. Performing Bayesian inference requires cortex to store an internal model that represents how sensory inputs and external stimuli are generated. Once a sensory input is received, cortical dynamics inverts this internal model in a process termed “analysis-by-synthesis”^{12}, and represents the posterior distributively across neurons and/or across time^{15,16}. In this study, we propose that recurrent connections in cortical circuits store the prior of latent stimuli to produce the posterior distribution when combined with evidence from sensory inputs. Moreover, we posit that Poisson spiking variability provides a source of fluctuations needed for generating random samples from the inferred posterior.
To test these hypotheses, we consider a recurrent circuit model where neurons receive stochastic feedforward inputs which carry information about the external world, and respond with Poisson-distributed spiking activity. We find that such Poissonian spiking provides the variability that allows the network to generate samples from posterior stimulus distributions with differing uncertainties. We use this sampling framework to illustrate circuit-based Bayesian inference given two distinct generative models of stimuli in the external world: one organized hierarchically with a stimulus variable that depends on a latent stimulus parameter, and a second where a pair of latent stimuli are organized in parallel. In both cases, a recurrent circuit is able to generate samples from the joint posterior, and infer the values of the latent variables. We show through both analytic derivation and simulations that recurrent connections represent the correlation structure of these models, and the weight of these connections can be tuned to optimally capture the prior distribution of stimuli in the external world. The stronger the correlation between the latent variables, the stronger the recurrent connections need to be for the network to generate samples from the correct posterior distribution.
Finally, a neural signature of this circuit-based sampling mechanism is internally generated population noise correlations aligned with the stimulus response direction, often referred to as “differential correlations”^{4,17}. In our framework, the amplitude of internally generated differential correlations is determined by the recurrent connection strength, which also determines the prior stored by the circuit. Since optimal inference requires a specific magnitude of recurrent connectivity, differential correlations resulting from such recurrent connectivity are a potential signature of optimal coding. This is in contrast to the deleterious impact of externally generated differential correlations. We thus predict that the correlation structure of the external world shapes recurrent wiring in neural circuits, and is reflected in the pattern of differential noise correlations. We use this logic to provide testable predictions from our framework for sampling-based Bayesian inference by recurrent, stochastic cortical circuits.
Results
Recurrent circuitry and spiking variability do not improve conventional neural codes
We start with the classic example of a sensory stimulus, s, encoded in neuronal population activity, r, from which a stimulus estimate \(\hat{s}\) can be decoded (Fig. 1a, top)^{18}. It is reasonable to expect that neuronal circuitry is adapted to accurately represent ethologically relevant stimuli. However, as we will show next, in simple coding schemes two ubiquitous features of cortical circuits – internal spiking variability and recurrent connectivity – are at best irrelevant for, and in many cases degrade, the accuracy of these representations.
In population coding frameworks stimuli are encoded by a neuronal population with individual neurons tuned to a preferred stimulus value. The preferred values of all neurons cover the whole range of stimuli^{18,19,20} (Fig. 1b, bottom); if s ranges over a periodic domain (such as the orientation of a bar in a visual scene, or the direction of an arm reach) then it is commonly assumed that the neurons’ preferred stimuli are distributed on a ring (Fig. 1b, top). To generate neuronal responses from such a population we simulate a network of neurons whose spiking activity, r_{t}, at time t is Poissonian with instantaneous firing rate λ_{t} (Eq. (11)). For simplicity we assume linear (or linearized) neuronal transfer and synaptic interactions (Eqs. (10), (11)), so that the firing rate is a linear function of the feedforward and recurrent inputs. We couple excitatory (E) neurons with similar stimulus preferences more strongly^{8,9} to one another, compared to neuron pairs with dissimilar tuning. In this way, the recurrent E connectivity has the same circular symmetry as the stimulus (Fig. 1b). In contrast, connections between inhibitory (I) neurons are unstructured, and inhibitory activity acts to stabilize network activity^{21}. A stimulus, e.g., s = 0, results in elevated activity of E neurons with the corresponding preference (Fig. S1a). As expected, an increase in the strength of recurrent excitatory connections increases both the firing rates and the trial-to-trial pairwise covariability (i.e., noise correlations) in the responses^{2} (Fig. S2a). This canonical network model has been widely used to explain cortical network dynamics and neural coding^{21,22,23}. Our network model produces neuronal responses that are qualitatively similar to experimental observations, including the variance of neuronal firing rates, the Fano factor, and the noise correlations (Fig. S2b–d).
We use linear Fisher Information (LFI) to quantify the impact of recurrent connectivity and internal spiking variability on the accuracy of the stimulus estimate, \({\hat{s}}_{t}\), from the activity vector r_{t} (see details in Eq. S39 in Supplementary Information). The inverse of LFI provides a lower bound on the expected square of the difference between the true value, s, and the estimate, \({\hat{s}}_{t}\), made by a linear decoder^{1,4,17,18,19,24}. In the limit of an infinite number of neurons available to the decoder, LFI is unaffected by recurrent connectivity strength, w_{E} (Fig. 1d, dashed line). This is because the mean response of the network is linear in its inputs, and an (invertible) linear transformation can neither increase nor decrease LFI (see Eq. S38 in Supplementary Information). For networks with a finite number of neurons, the variability from spike generation is shared between neurons via recurrent interactions. Consequently, an increase in coupling strength, w_{E}, reduces LFI in finite networks (Fig. 1d, colored lines).
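The invariance of LFI under an invertible linear transformation can be checked directly. The sketch below (not the paper's network model; the population size, tuning derivative, and noise covariance are arbitrary illustrative choices) computes LFI = f′ᵀΣ⁻¹f′ before and after mapping the responses r → Ar, which transforms f′ → Af′ and Σ → AΣAᵀ:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8  # small illustrative population

# Tuning-curve derivative f'(s) and a positive-definite noise covariance
fp = rng.standard_normal(N)
M = rng.standard_normal((N, N))
Sigma = M @ M.T + N * np.eye(N)

def linear_fisher_info(fp, Sigma):
    """LFI = f'^T Sigma^{-1} f' for a linear decoder."""
    return fp @ np.linalg.solve(Sigma, fp)

# An invertible linear transform of the responses, r -> A r,
# maps f' -> A f' and Sigma -> A Sigma A^T
A = rng.standard_normal((N, N)) + 2 * np.eye(N)
I_before = linear_fisher_info(fp, Sigma)
I_after = linear_fisher_info(A @ fp, A @ Sigma @ A.T)
print(I_before, I_after)  # equal up to numerical error
```

The equality follows because (Af′)ᵀ(AΣAᵀ)⁻¹(Af′) = f′ᵀΣ⁻¹f′ for any invertible A, which is the algebra behind the dashed line in Fig. 1d.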
In sum, recurrent connectivity and spiking variability do not improve, and often degrade, stimulus representation in the network (as measured by LFI). Since synaptic coupling is biologically expensive, a network that most accurately and cheaply represents a stimulus is then one with no recurrent connections (i.e., w_{E} = 0) and minimal spiking variability. Nevertheless, connectivity in mammalian cortex is highly recurrent^{5,6,7,9}, and neural responses are highly variable^{2,3}. What is then the function of these extensive recurrent connections between cortical neurons in information representation, and why are their responses so noisy?
While classical population code theory often explains how to generate point estimates of a stimulus (Fig. 1a), numerous studies suggest that the brain performs Bayesian inference to synthesize and estimate the probability distribution of latent stimuli from sensory inputs (e.g., refs. ^{10,11,12,13,14,15,25,26}). To compute this posterior a neural circuit needs to combine a stored representation of the prior distribution of the stimulus with the likelihood conveyed by feedforward inputs. We propose that recurrent connectivity can be used to represent the prior and spiking variability can generate samples from this posterior distribution. Before we present our full model we first show how samplingbased inference can be implemented in a population of spiking neurons.
Internally generated Poisson spiking variability drives samplingbased Bayesian inference
Many studies suggest that neuronal response variability is a signature of sampling-based Bayesian inference in neural circuits (e.g., refs. ^{16,27,28,29,30,31,32,33,34}). In these studies, the instantaneous population responses, r_{t}, represent a sample of a latent stimulus, and the empirical distribution of stimulus samples collected over time is an approximation of the posterior distribution. Implementing sampling requires a network that generates variable output with stable statistics. It has been well documented that cortical spiking responses are often approximately Poissonian^{3,35}. Theoretical studies suggest that such Poissonian variability can be internally generated in a network with dynamically balanced recurrent excitation and inhibition^{36,37}. We thus assumed that our model neurons are Poissonian, and used the resulting fluctuations as the internal source of variability needed for sampling-based Bayesian inference. It remains to be shown if discrete Poissonian variability can be used to generate samples from stimuli with continuous probability distributions (e.g., orientation, moving direction) with the flexibility needed to represent different stimulus uncertainties. However, spike counts are discrete, and it is possible that errors arising from representing continuous parameters by discrete random variables are characteristic of stimulus inference by animals that use sampling.
We address this question using a theory based on a simple model network composed of excitatory (E) Poissonian neurons (Eqs. (10), (11)), and subsequently support our findings by simulating a network containing both E and inhibitory (I) neurons (e.g., Fig. 1b). We start by showing that Poissonian spiking in a population of tuned neurons can drive sampling from a well-defined distribution. We assume that the instantaneous firing rates of a population of E neurons, λ_{t}, have a bell-shaped (Gaussian) profile (Fig. 2b), so that for the jth neuron \(\lambda_{tj}=R\exp[\mathbf{h}_j(\bar{s}_t)]=R\exp[-(\bar{s}_t-\theta_j)^2/2a^2]\) (see Eq. (12) in Methods). Here θ_{j} is the preferred stimulus of neuron j, a is the width of the tuning curve, and \(\bar{s}_t\) is the location of the peak of the firing rate profile, λ_{t}, in stimulus space (x-axis in Fig. 2b). Note that the value of \(\bar{s}_t\) is arbitrary here, but we will later relate it to the input to the population. Finally, the preferred stimuli of the E neurons, \(\{\theta_j\}_{j=1}^{N_E}\), are uniformly distributed over the stimulus range (Fig. 1b). In each time interval the population activity is given by a vector of independent Poisson random variables, r_{t}, with means determined by the instantaneous firing rate vector λ_{t} (Fig. 2b, c). At each time, t, this spiking activity produces a stimulus sample, \(\tilde{s}_t\), from the probability distribution determined by the instantaneous firing rates, λ_{t} (Fig. 2d, see Methods),
With the Gaussian firing rate profile we use here, the stimulus sample, \(\tilde{s}_t\), can be read out as \(\tilde{s}_t=\sum_j \mathbf{r}_{tj}\theta_j/\sum_j \mathbf{r}_{tj}\) (Eq. (14) and Fig. 2d), which can be thought of as the location of the response, r_{t}, in stimulus space (y-axis in Fig. 2c). The collection of stimulus samples across time (\(\{\tilde{s}_t\}\); Fig. 2e) determines the sampling distribution \(q(s)=T^{-1}\sum_t\delta(s-\tilde{s}_t)\), which approximates the distribution p(s∣λ_{t}), i.e., p(s∣λ_{t}) ≈ q(s)^{16,38}. Here, δ( ⋅ ) is the Dirac delta function and T is the number of samples. We assumed that instantaneous population firing rates are smooth to simplify the analysis, but this assumption is not essential. Sampling driven by Poissonian variability will work as long as the temporally averaged population firing rate is smooth, even if the instantaneous population firing rate is noisy (see Eq. (17)).
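This sampling mechanism can be illustrated numerically. The sketch below (a minimal illustration, not the full E–I network; the population size, peak rate R, tuning width a, and stimulus value s̄ are hypothetical parameter choices) draws Poisson spike counts from a Gaussian rate profile and applies the readout \(\tilde{s}_t=\sum_j \mathbf{r}_{tj}\theta_j/\sum_j \mathbf{r}_{tj}\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters: N_E neurons with preferred stimuli on [-pi, pi)
N_E, R, a, s_bar = 180, 5.0, 0.5, 0.3
theta = np.linspace(-np.pi, np.pi, N_E, endpoint=False)

# Gaussian firing-rate profile peaked at s_bar (cf. Eq. (12))
lam = R * np.exp(-(s_bar - theta) ** 2 / (2 * a ** 2))

# Each time step: independent Poisson spike counts, then the readout
# s_tilde = sum_j r_j * theta_j / sum_j r_j (cf. Eq. (14))
T = 2000
samples = []
for _ in range(T):
    r = rng.poisson(lam)
    if r.sum() > 0:  # readout undefined if no spikes are fired
        samples.append(r @ theta / r.sum())
samples = np.array(samples)

# The empirical sample distribution is centered on s_bar, with a spread
# set by the total spike count (higher R gives tighter samples)
print(samples.mean(), samples.std())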
To use this mechanism to produce samples from the posterior distribution of a stimulus, we must define a generative model for the feedforward inputs evoked by a stimulus. We take the feedforward input to the neural population, u^{f}, to be a vector of independent Poisson spike counts with Gaussian tuning over the stimulus, s. Following assumptions widely used in previous studies of probabilistic population codes (PPC)^{39,40}, we assume that the mean input spike count to the jth excitatory neuron in the population is \(\langle \mathbf{u}_j^{\mathsf{f}}(s)\rangle\propto\exp[\mathbf{h}_j(s)]=\exp[-(s-\theta_j)^2/2a^2]\). A single realization of the input, u^{f}, in a time interval encodes the whole likelihood function over the stimulus, p(u^{f}∣s)^{39}. This likelihood is proportional to a Gaussian due to the Gaussian profile of feedforward input (Eq. (19)),
Here the likelihood mean, μ_{f}, is determined by the location of u^{f} in stimulus space, and the precision, Λ_{f}, is proportional to the spike count (or height) of u^{f} (Eq. (20)). Since a realization of the feedforward input encodes the whole likelihood function, we present a fixed u^{f} to the network over time (dropping the time index t), and describe how samples from the posterior p(s∣u^{f}) are generated by the network.
A simple example of inference via sampling is provided by a population of E neurons without recurrent connections and instantaneous firing rates equal to the feedforward input, λ_{t} = u^{f} (Eq. (10)), and hence constant in time (Fig. 2a). In this feedforward network Poisson spike generation produces samples from the normalized likelihood, i.e., \(\tilde{s}_t\sim p(\tilde{s}\mid\boldsymbol{\lambda}_t)\propto p(\mathbf{u}^{\mathsf{f}}\mid\tilde{s})\), and consequently the network represents a uniform stimulus prior (i.e., p(s) is a constant).
To test our theory, we simulated the response of a network of tuned excitatory (E) and untuned inhibitory (I) neurons (Fig. 2a, c) to a fixed but randomly generated feedforward input (Eq. (18)). While the E neurons shared no recurrent connections, the E and I neurons were connected to maintain stable network activity. To confirm that the overall firing rate dictated the sampling variability (Eq. (1)), we increased the feedforward input rate, which reduced the width of the likelihood (Eq. (2)). As a result, the sampling precision (inverse of the sampling variance) increased and matched the precision of the likelihood (Fig. 2g, h), even as the normalized response variability (measured by the Fano factor) of single neurons remained unchanged.
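The dissociation between sampling precision and single-neuron variability can be demonstrated in the same Poisson-population sketch used above (again a minimal illustration with hypothetical parameters, not the E–I simulation of the paper): scaling up the input rate tightens the sample distribution while each neuron's Fano factor stays near 1.

```python
import numpy as np

rng = np.random.default_rng(5)
N_E, a, s0 = 180, 0.5, 0.0
theta = np.linspace(-np.pi, np.pi, N_E, endpoint=False)

def sample_stats(R, T=3000):
    """Sampling std and single-neuron Fano factor at peak rate R."""
    lam = R * np.exp(-(s0 - theta) ** 2 / (2 * a ** 2))
    counts = rng.poisson(lam, size=(T, N_E))
    s_hat = counts @ theta / counts.sum(axis=1)
    j = N_E // 2  # the neuron tuned to s0
    fano = counts[:, j].var() / counts[:, j].mean()
    return s_hat.std(), fano

std_lo, fano_lo = sample_stats(R=4.0)
std_hi, fano_hi = sample_stats(R=16.0)

# Quadrupling the input rate roughly halves the sampling spread
# (precision grows with spike count), while Fano factors stay near 1
print(std_lo / std_hi, fano_lo, fano_hi)
```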
While the above analysis introduces the key components of a sampling-based theory of inference, stimulus sampling using a feedforward network is unnecessary: A single observation of the response r in a deterministic feedforward network (r = u^{f} after removing spike generation in Eq. (11)) would also represent the whole likelihood^{39}, avoiding the costly process of collecting samples \(\tilde{s}_t\) across time. We next consider more interesting cases, and show that spiking variability in recurrent networks can drive sampling from more complex posterior distributions.
Recurrent cortical circuit samples a hierarchical generative model
Recurrent networks can store a variety of generative model structures; to demonstrate the generality of our sampling framework we provide two example generative models which serve as building blocks for more complex models. We first consider a two-stage hierarchical model of feedforward inputs received by the cortical circuit (Fig. 3a). The first stage of our model consists of a stimulus, s, and a stimulus parameter, z, both of which are one-dimensional for simplicity. The structure of the world is described by the joint distribution, p(s, z). Using the visual system as motivation, s could be the orientation of the visual texture within a classical receptive field (local information) of a hypercolumn of V1 neurons, while the stimulus parameter, z, may refer to the context orientation within a non-classical receptive field of these cells (Fig. 3a). The likelihood of the stimulus given the parameter, \(p(s\mid z)=\mathcal{N}(s\mid z,\Lambda_s^{-1})\), is Gaussian with precision Λ_{s}. For simplicity, we assume that the prior, p(z), is uniform, which implies that the marginal prior of s is also uniform (Fig. 3b). This assumption is not essential for our main conclusions but does simplify the analysis. Importantly, the joint prior of stimulus and stimulus parameter, p(s, z), can have nontrivial structure with the density concentrated around the diagonal s = z (Fig. 3b). The precision Λ_{s} measures how strongly z and s are related, and thus determines how strongly their joint distribution is concentrated around the diagonal.
The second stage of the generative model describes how the feedforward input depends on the stimulus, s; this is identical to our prior treatment (See Eq. (2)). Combining these two stages provides a complete description of the generative model for the feedforward input received by neurons in the population,
Given this hierarchical model, we can show that the joint posterior over stimulus and stimulus parameters, p(s, z∣u^{f}) is a bivariate normal distribution (see Eq. (24)), and we next use it to evaluate the accuracy of the sampling distribution.
Gibbs sampling of the joint posterior of stimulus and stimulus parameter
One approach to approximate the joint distribution over stimulus and stimulus parameter is Gibbs sampling^{31,38,41,42} which starts with an initial guess for the value of the two latent variables, and proceeds by alternately generating samples of one variable from the distribution conditioned on the value of the second variable. More precisely, to approximate the joint posterior of s and z (Eq. (3)), Gibbs sampling proceeds by generating a sequence of samples, \(({\tilde{s}}_{t},{\tilde{z}}_{t})\) indexed by time t, through recursive iteration of the following steps (Fig. 3c and Eq. (25)),
Here Δt is the time increment between successive samples. The samples (red dots in Fig. 3d) are generated by alternately fixing the values of the two variables, so that sampling trajectories alternate between horizontal and vertical jumps (cyan lines in Fig. 3d). The empirical distribution of samples, i.e., \(q(s,z\mid\mathbf{u}^{\mathsf{f}})=T^{-1}\sum_t\delta[(s,z)^\top-(\tilde{s}_t,\tilde{z}_t)^\top]\) with ⊤ denoting vector transpose, approximates the joint posterior p(s, z∣u^{f}) (blue contour map in Fig. 3d, Eq. (24))^{38}. To approximate p(s∣u^{f}), the marginal posterior distribution of s, we can use only the samples \(\tilde{s}_t\) to obtain the approximating distribution q(s∣u^{f}) (compare the two green lines at the margin in Fig. 3d). The same is true for the marginal posterior over z.
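The alternating updates can be sketched directly for the Gaussian hierarchical model with a uniform prior on z. This is a minimal abstraction of Eqs. (4a)–(4b) (the precisions Λ_f, Λ_s and likelihood mean μ_f are hypothetical values, and the spiking implementation is omitted); under these assumptions the two conditionals are the Gaussians below, and the marginal posterior of s reduces to the normalized likelihood:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical precisions and likelihood mean for the hierarchical model
Lam_f, Lam_s, mu_f = 4.0, 2.0, 0.5

# Gibbs sampling: alternate draws from the two conditional distributions
T = 20000
s, z = 0.0, 0.0
S, Z = np.empty(T), np.empty(T)
for t in range(T):
    # p(s | z, u^f) = N((Lam_f*mu_f + Lam_s*z)/(Lam_f+Lam_s), 1/(Lam_f+Lam_s))
    s = rng.normal((Lam_f * mu_f + Lam_s * z) / (Lam_f + Lam_s),
                   1 / np.sqrt(Lam_f + Lam_s))
    # p(z | s) = N(s, 1/Lam_s), since the prior on z is uniform
    z = rng.normal(s, 1 / np.sqrt(Lam_s))
    S[t], Z[t] = s, z

# With a uniform prior on z, the marginal posterior of s is the
# normalized likelihood N(mu_f, 1/Lam_f), and Var[z] = 1/Lam_f + 1/Lam_s
print(S.mean(), S.var(), Z.var())
```

The sample pairs (S[t], Z[t]) trace the horizontal/vertical jumps of Fig. 3d, and their empirical moments match the analytic posterior.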
Implementing Gibbs sampling of stimulus and stimulus parameter in a recurrently coupled cortical circuit
An implementation of Gibbs sampling in a recurrent E circuit can be intuitively understood by comparing the recurrent network dynamics (Fig. 4a) with the dynamics described by the Gibbs sampling algorithm (Fig. 3c). In the recurrent network a stimulus sample, \(\tilde{s}_t\), is represented by the activity of E cells, r_{t}, while a stimulus parameter sample, \(\tilde{z}_t\), is represented by recurrent inputs, \(\mathbf{u}_t^{\mathsf{r}}\). To generate correct samples we require that the conditional distribution that is represented by the instantaneous firing rate, λ_{t} (Eq. (1)), matches the conditional distribution used in the Gibbs sampling algorithm (Eq. (4b)), so that \(p(\tilde{s}\mid\tilde{z}_t,\mathbf{u}^{\mathsf{f}})=p(\tilde{s}\mid\boldsymbol{\lambda}_t)\propto\exp[\mathbf{h}(\tilde{s})^\top\boldsymbol{\lambda}_t]\). Equating the two distributions (see Eqs. (4a) and (10)) yields the relation,
This equation holds when two constraints are satisfied: First, the firing rate vector, λ_{t}, needs to have a Gaussian profile peaked at \(\bar{s}_t\), i.e., the mean of \(p(\tilde{s}\mid\tilde{z}_t,\mathbf{u}^{\mathsf{f}})\) (Eq. (4a)). Second, the peak firing rate, R, needs to be proportional to the precision of \(p(\tilde{s}\mid\tilde{z}_t,\mathbf{u}^{\mathsf{f}})\), i.e., R ∝ Λ (see Fig. 2f, g). In a neural circuit one way for λ_{t} to satisfy these constraints is for feedforward inputs, u^{f}, and recurrent inputs, \(\mathbf{u}_t^{\mathsf{r}}\), to both have Gaussian profiles with the same width, a, as that of λ_{t} (by sharing the same \(\mathbf{h}(\tilde{s})\), Eqs. (5) and (12)). This is because the sum of two Gaussian-profile inputs with the same width, a, gives a firing rate, λ_{t}, with the same tuning, as long as the difference between the locations of the two inputs is much smaller than the width, a. Our generative model (Eq. (3)) produces feedforward input, u^{f}, with a Gaussian profile that encodes the likelihood function \(p(\mathbf{u}^{\mathsf{f}}\mid\tilde{s})\). The recurrent input, \(\mathbf{u}_t^{\mathsf{r}}\), then needs to represent the conditional distribution \(p(\tilde{s}\mid\tilde{z}_t)\). Hence, to satisfy Eq. (5) the recurrent input \(\mathbf{u}_t^{\mathsf{r}}\) should have the same Gaussian profile as u^{f} (Eq. (29)), with its location and magnitude determined by the mean and precision of \(p(\tilde{s}\mid\tilde{z}_t)\), respectively.
If recurrent interactions are absent (setting \(\mathbf{u}_t^{\mathsf{r}}=0\)), then network activity, r_{t}, generates samples from the normalized likelihood, \(p(\mathbf{u}^{\mathsf{f}}\mid\tilde{s})\), as we showed previously when describing feedforward networks (Fig. 2). When neurons only receive recurrent inputs (setting u^{f} = 0), the network generates samples from the conditional distribution \(p(\tilde{s}\mid\tilde{z}_t)\). Driven by a sum of recurrent and feedforward inputs, the network generates samples from a distribution given by the product of the conditional distributions encoded by the two inputs respectively (Fig. 4b, c).
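The product-of-distributions property follows from the readout: the centroid of a sum of two same-width Gaussian profiles is the spike-count-weighted average of their centroids, which matches the product-of-Gaussians posterior mean when peak heights are proportional to precisions (R ∝ Λ). A minimal numerical sketch (hypothetical profile locations and heights, not taken from the paper's simulations):

```python
import numpy as np

rng = np.random.default_rng(3)

N_E, a = 180, 0.5
theta = np.linspace(-np.pi, np.pi, N_E, endpoint=False)
gauss = lambda mu, R: R * np.exp(-(mu - theta) ** 2 / (2 * a ** 2))

# Feedforward and recurrent inputs: same Gaussian profile width, with
# peak heights proportional to the precisions they encode (R ∝ Λ)
mu_f, R_f = 0.2, 6.0   # likelihood carried by u^f
mu_r, R_r = 0.5, 3.0   # conditional carried by the recurrent input
lam = gauss(mu_f, R_f) + gauss(mu_r, R_r)

# Poisson samples from the summed drive, decoded by the centroid readout
T = 5000
est = []
for _ in range(T):
    r = rng.poisson(lam)
    if r.sum() > 0:
        est.append(r @ theta / r.sum())

# Product-of-Gaussians prediction for the sample mean:
# (R_f*mu_f + R_r*mu_r) / (R_f + R_r)
pred = (R_f * mu_f + R_r * mu_r) / (R_f + R_r)
print(np.mean(est), pred)
```

The sample mean sits between the two input locations, weighted by the input heights, exactly as the product of the two encoded Gaussians requires.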
The recurrent weights must be adjusted so that the recurrent input has the appropriate magnitude and width to encode the likelihood p(s∣z). To simplify the exposition we first assume that E neurons are only selfconnected, so that the width of recurrent input trivially matches that of the feedforward input (otherwise recurrence will broaden the profile of the firing rate activity λ_{t} over the network). To constrain the magnitude of the recurrent weights we require that the sum of the recurrent inputs satisfies \({\sum }_{j}{{{{{{{{\bf{u}}}}}}}}}_{tj}^{{\mathsf{r}}}\propto {\Lambda }_{s}\). Since \({{{{{{{{\bf{u}}}}}}}}}_{j}^{{\mathsf{r}}}={w}_{E}{{{{{{{{\bf{r}}}}}}}}}_{j}\) and the width of \({{{{{{{{\bf{u}}}}}}}}}_{j}^{{\mathsf{r}}}\) and r_{j} are equal, the magnitude of the recurrent weights that result in samples from the correct posterior must satisfy:
where Λ_{s} and Λ_{f} are the precision of likelihood p(s∣z) and p(u^{f}∣s) respectively (Eq. (3)). The optimal recurrent weight, \({w}_{E}^{*},\) thus encodes the correlation between the stimulus s and the stimulus parameter z. An increase in correlation between s and z, resulting in a narrower diagonal band in p(s, z) (Fig. 3b), requires an increase in the recurrent weight \({w}_{E}^{*}\) for optimal sampling. When the underlying parameter and stimulus are uncorrelated so that Λ_{s} = 0, the hierarchical generative model (Fig. 3a) is equivalent to the generative model without stimulus parameter (Fig. 2a) and recurrent interactions are not needed for sampling (i.e., \({w}_{E}^{*}\) = 0). Moreover, the optimal recurrent weight also depends on the likelihood precision Λ_{f} that is determined by the input spike count. Hence, the optimal weight needs to be adjusted depending on feedforward inputs so that samples from the correct posterior are generated (see Discussion of how this feature impacts the network sampling). Overall, our framework (Eq. (6)) thus predicts that optimal Bayesian inference is achieved with recurrent synaptic weights which depend on the correlative structure of the external world. We numerically test this prediction in the next section.
A stochastic EI spiking network jointly samples stimulus and stimulus parameter
To confirm the predictions of this analysis, we simulated a full recurrent network consisting of both E and I neurons with Poisson spike train statistics (see details in Eqs. (47)–(50)). The E neurons were synaptically connected to each other (Eq. (49), see Fig. 1a), in contrast to the simple network of selfconnected E neurons we described above. While recurrent E to E coupling broadens the tuning of excitatory recurrent input, lateral inhibition can sharpen Gaussian firing rate profiles so that it matches that of the feedforward inputs (as required by Eq. (5)).
The activity of the recurrent network in response to a fixed but randomly generated feedforward input (Eq. (3)) can be decoded to produce samples from the bivariate posterior distribution of the stimulus and stimulus parameter. As above, samples from the conditional stimulus distribution are represented by the activity of E neurons (Eq. (14)), while samples from the conditional stimulus parameter distribution are represented by recurrent inputs received by E neurons (Eq. (29); black curves overlaid on top of population responses in Fig. 4d, e, respectively). To update recurrent inputs we only used neuronal activity at the previous time step. Thus, the activities of E neurons and their recurrent inputs were updated in alternation, consistent with Gibbs sampling. The trajectory obtained by plotting the stimulus sample read out from the network activity on one axis, and the stimulus parameter sample read out from recurrent E inputs on the other axis, then exhibits the characteristics of Gibbs sampling (Fig. 4f, cyan line). The resulting sampling distribution provides a good approximation to the joint posterior of stimulus and context (compare red dots and blue contour in Fig. 4f). Inhibitory neurons again did not respond selectively to either the stimulus or the stimulus parameter.
For the network to generate samples from the joint posterior, the recurrent connectivity should depend on the correlation between the stimulus and the stimulus parameter (Eq. (6)). To verify this prediction, we fixed the generative model (Eq. (3)) and changed only the recurrent weights in the network. For simplicity, we only varied the peak E weight, w_{E} (Eq. (49)), and maintained network stability by fixing the ratio between E and I synaptic weights. While increasing w_{E} did not change the sampling mean, it did increase the variance of the stimulus parameter sampling distribution, and increased the correlation between stimulus and stimulus parameter samples (Fig. 5a).
We use Kullback–Leibler (KL) divergence to measure the distance between the sampling distribution, q(s, z∣u^{f}), and the true posterior, p(s, z∣u^{f}) (Eq. (24)). The KL divergence quantifies the loss of mutual information, measured in bits, between the latent variables (s and z) and the feedforward inputs, u^{f}, when the true posterior, p, is approximated by the distribution, q (Eq. (42))^{38}. The mutual information loss in the network is minimized at a unique value of the recurrent weight, \({w}_{E}^{*}\), at which the sampling distribution, q, best matches the posterior, p (Fig. 5b, black circle). To confirm that this optimal recurrent weight, \({w}_{E}^{*},\) increases with the correlation in the prior (precision Λ_{s}, Eq. (6)), we numerically obtained the recurrent weight that minimizes the mutual information loss for each value of Λ_{s} in the generative model. These results confirmed the predictions of our theory (Eq. (6), Fig. 5c): When Λ_{s} = 0, i.e., when stimulus parameter and stimulus are uncorrelated, a network with no interactions performs best (\({w}_{E}^{*}=0\)), while for small Λ_{s} (relative to Λ_{f}) the optimal weight \({w}_{E}^{*}\) is positive and increases with Λ_{s}. In total, we have described a potential mechanism for a recurrent network of spiking neurons to perform samplingbased Bayesian inference.
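Because both the sampling distribution and the posterior are approximately bivariate Gaussian, the KL divergence has a closed form. The sketch below implements that standard formula (the example means and covariances are hypothetical, chosen only to illustrate the information cost of a mismatched sampling distribution, as when w_{E} deviates from \(w_E^*\)):

```python
import numpy as np

def kl_gauss(mu_q, Sig_q, mu_p, Sig_p):
    """KL(q || p) between two multivariate Gaussians, in nats."""
    k = len(mu_q)
    d = mu_p - mu_q
    Sig_p_inv = np.linalg.inv(Sig_p)
    return 0.5 * (np.trace(Sig_p_inv @ Sig_q)
                  + d @ Sig_p_inv @ d
                  - k
                  + np.log(np.linalg.det(Sig_p) / np.linalg.det(Sig_q)))

mu = np.zeros(2)
Sig_p = np.array([[1.0, 0.5], [0.5, 1.0]])   # true posterior covariance
kl_same = kl_gauss(mu, Sig_p, mu, Sig_p)      # identical distributions: 0

# A sampling distribution with too-weak correlation between s and z
# (as produced by an under-strength w_E) pays an information cost
Sig_q = np.array([[1.0, 0.1], [0.1, 1.0]])
kl_mismatch = kl_gauss(mu, Sig_q, mu, Sig_p)
print(kl_same, kl_mismatch / np.log(2), "bits")
```

Dividing by ln 2 converts nats to bits, matching the units used for the mutual information loss in Fig. 5b.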
Generating samples from multidimensional posteriors with coupled neural circuits
To demonstrate the generality of the proposed neural code we next consider a world described by a broad, rather than deep (hierarchical), generative model. Information about each of two latent stimuli, s = (s_{1}, s_{2}), is relayed by corresponding feedforward inputs received by a neural circuit (Fig. 6a). We assume the prior is a bivariate Gaussian distribution (Fig. 6b), i.e., \(p(\mathbf{s})\propto\exp[-\Lambda_s(s_1-s_2)^2/2]\equiv\mathcal{N}(s_1\mid s_2,\Lambda_s^{-1})\), so that Λ_{s} (Λ_{s} ≥ 0) characterizes the correlation between s_{1} and s_{2}. Furthermore, each stimulus, s_{m}, independently generates feedforward spiking inputs, \(\mathbf{u}_m^{\mathsf{f}}\), each of which is received by a separate network and produces responses r_{m} for m = 1, 2 (Fig. 6a). Thus, the full generative model of the input has the form,
The likelihood \(p({{\bf{u}}}_{m}^{{\mathsf{f}}}\,|\,s_{m})\) is the same as that given previously (Eq. (2)), where the feedforward inputs, \({{\bf{u}}}_{m}^{{\mathsf{f}}}\), are again described by conditionally independent Poisson spike counts with Gaussian tuning over the stimulus s_{m}. As a concrete example, the two stimuli, s_{m}, could represent orientations of local edges falling in the central receptive fields of a V1 hypercolumn (Fig. 6a, bottom), with each V1 hypercolumn modeled by a network producing the response r_{m} (Fig. 6a, top). Then Λ_{s} characterizes the a priori tendency of the stimuli to share similar orientations, and determines how likely two local edges are to be part of a global line, as in the case of contour integration^{43,44}. However, the generative model defined by Eq. (7) is quite general and has also been used to explain multisensory cue integration^{10} and sensorimotor learning^{13}.
The posterior is a bivariate Gaussian distribution (Fig. 6d, Eq. (34)) whose mean is shifted from the likelihood mean (Fig. 6c) towards the diagonal line because of the correlations between the stimuli in the prior (Fig. 6b). We can again use Gibbs sampling to approximate the posterior p(s∣u^{f}) using the following steps,
where \({\tilde{s}}_{1t}\) and \({\tilde{s}}_{2t}\) are instantaneous samples at time t of stimuli s_{1} and s_{2}, respectively. We only give the steps needed to produce samples from the conditional distribution of s_{1}, as samples from the conditional distribution of s_{2} can be obtained using the same steps after exchanging indices.
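A minimal sketch of these alternating sampling steps (Eq. (8)), using toy precisions and likelihood means rather than the spiking implementation, shows that the chain converges to the correct bivariate Gaussian posterior:

```python
import numpy as np

rng = np.random.default_rng(0)
lam_f1, lam_f2, lam_s = 2.0, 1.0, 1.5   # toy likelihood precisions and prior coupling
mu1, mu2 = -10.0, 10.0                  # toy likelihood means (e.g., edge orientations, deg)

def gibbs(n_steps=50_000, burn=1_000):
    """Alternately draw s1 | s2 and s2 | s1 from their Gaussian conditionals."""
    s1, s2 = mu1, mu2
    out = np.empty((n_steps, 2))
    for t in range(n_steps):
        # p(s1 | u1, s2) = N( (lam_f1*mu1 + lam_s*s2)/(lam_f1+lam_s), 1/(lam_f1+lam_s) )
        s1 = ((lam_f1 * mu1 + lam_s * s2) / (lam_f1 + lam_s)
              + rng.standard_normal() / np.sqrt(lam_f1 + lam_s))
        s2 = ((lam_f2 * mu2 + lam_s * s1) / (lam_f2 + lam_s)
              + rng.standard_normal() / np.sqrt(lam_f2 + lam_s))
        out[t] = s1, s2
    return out[burn:]

S = gibbs()
# exact posterior for comparison: precision matrix and mean of p(s | u^f)
Lam = np.array([[lam_f1 + lam_s, -lam_s], [-lam_s, lam_f2 + lam_s]])
mu_post = np.linalg.solve(Lam, [lam_f1 * mu1, lam_f2 * mu2])
print(S.mean(0), mu_post)  # sample mean approaches the posterior mean
```

Because of the prior coupling Λ_{s}, both sample means are pulled from the likelihood means (−10° and 10°) toward each other, i.e., toward the diagonal s_{1} = s_{2}.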
These sampling steps can be implemented distributively in a coupled neural circuit using a mechanism similar to the one we described for the hierarchical generative model. The activity of each network, r_{m}, individually represents samples from the (marginal) posterior of s_{m} (Fig. 6a, top). The joint posterior is then approximated by the collection of samples represented by the activity pairs (r_{1}, r_{2}). Taking network m = 1 as an example, the spike response r_{1t} produces a stimulus sample \({\tilde{s}}_{1t}\) as long as the instantaneous firing rate λ_{1t} represents the conditional distribution \(p({\tilde{s}}_{1}\,|\,{{\bf{u}}}_{1}^{{\mathsf{f}}},{\tilde{s}}_{2,t-\Delta t})\) (Eq. (8a)). Since the feedforward input, \({{\bf{u}}}_{1}^{{\mathsf{f}}}\), represents the likelihood \(p({{\bf{u}}}_{1}^{{\mathsf{f}}}\,|\,{\tilde{s}}_{1})\), to obtain the appropriate firing rates, λ_{1t}, the recurrent input from network 2 to network 1, \({{\bf{u}}}_{12,t}^{{\mathsf{r}}}\), must encode the correct conditional distribution, \(p({\tilde{s}}_{2,t-\Delta t}\,|\,{\tilde{s}}_{1})\). As in the mechanism we proposed to implement the sampling described by Eq. (5), \({{\bf{u}}}_{12,t}^{{\mathsf{r}}}\) needs to have the same Gaussian profile as the firing rate λ_{1t}, the position of \({{\bf{u}}}_{12,t}^{{\mathsf{r}}}\) in the stimulus subspace should match the mean of \(p({\tilde{s}}_{2,t-\Delta t}\,|\,{\tilde{s}}_{1})\), i.e., \({\tilde{s}}_{2,t-\Delta t}={\sum }_{j}{{\bf{u}}}_{12,tj}^{{\mathsf{r}}}{\theta }_{j}/{\sum }_{j}{{\bf{u}}}_{12,tj}^{{\mathsf{r}}}\), and the magnitude of \({{\bf{u}}}_{12,t}^{{\mathsf{r}}}\) must be proportional to the prior correlation, \({\Lambda }_{s}\propto {\sum }_{j}{{\bf{u}}}_{12,tj}^{{\mathsf{r}}}\) (Eq. (39)).
Hence, each network can sum the feedforward input and the recurrent input from its counterpart to obtain an update to the instantaneous conditional distribution given by Eq. (8a), and generate independent Poisson spikes to produce a sample from this conditional distribution (Eq. (8b)). Notably, the sample of each stimulus can be locally read out from the corresponding network (Eq. (41), Fig. 6a), even if the activities of the two networks are correlated.
Since the recurrent input strength represents the stimulus correlation in the prior determined by precision Λ_{s}, the coupling between the two networks needs to be tuned to generate the appropriate recurrent input. Indeed, in a network with only E neurons, and connections only between neurons with the same preferred stimulus value but in different networks, the optimal homogeneous connection strength is \({w}_{mn}^{*}=\langle {{{{{{{{\bf{u}}}}}}}}}_{mn,\, j}^{{\mathsf{r}}}\rangle /\langle {{{{{{{{\bf{r}}}}}}}}}_{n,\, j}\rangle={\Lambda }_{s}/({\Lambda }_{{\mathsf{f}}n}+{\Lambda }_{s})\) (Eq. (40)). This mirrors the result obtained with the hierarchical model presented earlier in Eq. (6).
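The dependence of the optimal coupling on the prior can be made concrete with a small worked example (the precisions below are arbitrary toy values); the function restates Eq. (40) from the text:

```python
def optimal_coupling(lam_s, lam_f):
    """Optimal homogeneous coupling between the two networks,
    w* = Lam_s / (Lam_f + Lam_s), restating Eq. (40)."""
    return lam_s / (lam_f + lam_s)

lam_f = 2.0  # toy feedforward (likelihood) precision
for lam_s in (0.0, 1.0, 2.0, 8.0):
    print(lam_s, optimal_coupling(lam_s, lam_f))
# w* = 0 for an uncorrelated prior, grows with lam_s, and saturates below 1
```

An uncorrelated prior (Λ_{s} = 0) calls for uncoupled networks, while an increasingly correlated prior demands proportionally stronger (but always bounded) coupling, consistent with Eq. (6) for the hierarchical model.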
Coupled E-I spiking networks sample bivariate posteriors
To test the feasibility of the proposed mechanisms for generating samples from a bivariate posterior, we simulated a pair of bidirectionally coupled circuits consisting of E and I neurons (Fig. 7a). This neural circuit model can be extended to generate samples from higher-dimensional posterior distributions (see Discussion). Each circuit receives feedforward input generated by one of the two stimuli. On every time step the sample of each stimulus, \({\tilde{s}}_{mt}\), can be individually and linearly read out from the response of the corresponding network, r_{mt} (Eq. (41)). Jointly, the two stimulus samples, one from each network, \({\tilde{{\bf{s}}}}_{t}={({\tilde{s}}_{1t},{\tilde{s}}_{2t})}^{\top }\), provide a sample from the joint posterior of the two latent stimuli (Fig. 7b). We assumed that the synaptic connections between the networks, w_{mn} (m, n = 1, 2; m ≠ n), are excitatory, but target both E and I neurons, while inhibitory connections are local to each network. We also adjusted network parameters so that the inputs across networks (e.g., the inputs from network 2 to 1) have the same tuning profile as the feedforward inputs (see Methods). Since we assumed uniform marginal priors (see Eq. (32)), recurrent connections between E neurons within a circuit were absent, while E and I neurons within a circuit were recurrently connected to ensure network stability. For simplicity, we chose parameters so that the two circuits were symmetric, but the strength of the feedforward inputs to each could differ.
We asked whether the activity of the two coupled circuits can generate samples from bivariate posteriors, and how the sampling distribution depends on the coupling, w_{mn}, between the two circuits. An increase in synaptic coupling between the two networks caused the sampling distribution to shift from the likelihood mean towards the diagonal (Fig. 7b), resulting in stimulus samples, \({\tilde{s}}_{1t}\) and \({\tilde{s}}_{2t}\), that were more similar. This is consistent with an increase in the stimulus correlation in the multivariate prior, Λ_{s} (Eq. (7)). To confirm our prediction that the optimal coupling strength between the two networks, \({w}_{mn}^{*}\), increases with the stimulus correlation in the prior, Λ_{s}, we numerically obtained the coupling weight that minimizes the loss of mutual information between latent stimuli and feedforward inputs (Fig. 7c). The optimal synaptic weight between the circuits increased with the stimulus correlation in the prior. At the optimal weight, \({w}_{mn}^{*}\), the sampling distribution was close to the true posterior, showing that a properly tuned circuit can generate samples from the correct distribution (Fig. 7d).
We next asked how the sampling distribution in the network depends on network and feedforward input parameters. As the coupling between the two circuits increased, the sample means of the two stimuli converged (Fig. 7e, top) and the sampling precision of both stimuli increased as well (Fig. 7e, bottom), in agreement with a more correlated stimulus prior. We also tested whether a network with fixed parameters can generate samples from a family of posteriors with different uncertainties. To do so, we changed the uncertainty of the likelihood of s_{1} by changing the firing rate of the feedforward input \({{\bf{u}}}_{1}^{{\mathsf{f}}}\) received by network 1. We observed that with a narrower likelihood of s_{1}, the sample means of both stimuli shifted towards the mean of the likelihood of s_{1} (−10°), and sampling precision increased, consistent with a change in the posterior distribution (Fig. 7f). Lastly, to demonstrate the robustness of this network implementation of sampling-based inference we compared the sampling distributions to the true posteriors under different combinations of input and network parameters (Fig. 7g, h), in each case setting the recurrent coupling to the optimal value, \({w}_{mn}^{*}\), obtained numerically. Across different parameter values, we observed excellent agreement in both the mean (Fig. 7g) and precision (Fig. 7h) of the two densities. In sum, our recurrent network of spiking neuron models can be extended to support sampling-based Bayesian inference with multidimensional stimuli.
A signature of stimulus sampling: internally generated differential noise correlations
A central prediction of our circuit framework for sampling-based Bayesian inference is that an increase in the correlation between stimuli in the sensory world should result in stronger synapses between neurons whose activities represent these stimuli (see Eq. (6)). This is a difficult prediction to test, since measuring synaptic connectivity along a functional axis is already challenging^{45}, let alone measuring a change in synaptic strength owing to a change in stimulus statistics. Here, we outline a testable prediction of our theory by identifying a measurable, population-level signature of changes in functionally related recurrent synaptic strengths.
In response to a fixed feedforward input, the responses of a recurrent circuit implementing stimulus sampling will fluctuate. The alignment of the recurrent circuitry with neuronal stimulus tuning causes a portion of these activity fluctuations to align with the subspace in which stimuli are coded. As an example, consider the sampling implemented by a single recurrent network (Fig. 4a), and suppose the population response fluctuates around its mean position (0° in the example of Fig. 8a), ignoring fluctuations along other directions in neuronal response space. The activities of neuron pairs with stimulus preferences both above or both below the mean position are positively correlated (the black and blue neurons in Fig. 8a), while the activities of neuron pairs with preferences straddling the mean are negatively correlated (the black and red neurons in Fig. 8a). Such stimulus sampling generates a covariance component proportional to the outer product of the derivative of neuronal tuning (Fig. 8b), i.e., \({{\bf{f}}}_{s}^{{\prime} }{{\bf{f}}}_{s}^{{\prime} \top }\), where \({{\bf{f}}}_{s}^{{\prime} }\) denotes the derivative of the tuning f(s) = 〈λ_{t}〉 (mean firing rate) with respect to the stimulus s. Such noise correlations have been referred to as differential correlations^{4,17}, and are generally viewed as deleterious to stimulus coding. Stochastic sampling in coupled networks (Fig. 6a) produces similar differential noise correlations (see Supplementary Information).
In our network implementation of sampling, the amplitude of internally generated differential correlations is not arbitrary, but is determined by the recurrent connection strength, \({w}_{E}^{*}\). Here, the differential covariance matrix of population responses has the form (see Eq. (44))
where \(V(\bar{s}\,|\,{{\bf{u}}}^{{\mathsf{f}}})\) is the equilibrium variance of \({\bar{s}}_{t}\) over time, and \({\bar{s}}_{t}\) is the mean of the instantaneous conditional distribution (Eq. (4a)), represented by the position of the instantaneous firing rate λ_{t} (Fig. 2b). Importantly, the amplitude of differential correlations increases with the recurrent weight, \({w}_{E}^{*}\), which is set by the prior precision Λ_{s} (Eq. (6); Fig. 8c). Thus, in our framework internally generated differential correlations are a byproduct of inference by sampling from posterior distributions of stimuli in a structured world.
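The outer-product structure of the differential covariance can be checked numerically. In the sketch below (all tuning parameters are illustrative toy values), a Gaussian population rate profile jitters along the stimulus axis; the resulting rate covariance is well approximated by the outer product of the tuning derivative scaled by the jitter variance, as in Eq. (9):

```python
import numpy as np

rng = np.random.default_rng(1)
N, a, R = 60, 20.0, 5.0                          # toy: neurons, tuning width (deg), peak rate
theta = np.linspace(-90, 90, N, endpoint=False)  # preferred stimuli

def f(s):
    """Gaussian population rate profile centred at sampled position s."""
    return R * np.exp(-(theta - s) ** 2 / (2 * a ** 2))

# rate fluctuations caused by the sampled position jittering around s = 0
sd = 2.0                                         # sqrt of V(s_bar | u^f), toy value
rates = np.stack([f(sd * rng.standard_normal()) for _ in range(40_000)])
cov_sim = np.cov(rates.T)

# prediction: covariance ~ V(s_bar | u^f) * f' f'^T (small-jitter expansion)
fp = (f(0.5) - f(-0.5)) / 1.0                    # finite-difference tuning derivative at s = 0
cov_pred = sd ** 2 * np.outer(fp, fp)

match = np.corrcoef(cov_sim.ravel(), cov_pred.ravel())[0, 1]
print(match)  # close to 1: fluctuations concentrate along f'
```

The simulated covariance also reproduces the sign pattern described above: pairs of neurons tuned to the same side of the mean position covary positively, while pairs straddling it covary negatively.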
Distinguishing external and internal differential correlations
The previous analysis of internally generated differential correlations in a circuit implementing sampling-based inference is based on the assumption of a fixed feedforward input (Eq. (9)). However, in typical neurophysiology experiments an external stimulus, s, is fixed, while the feedforward input, u^{f}, fluctuates due to variability in sensory acquisition and transmission noise (Eqs. (3) and (7)). Hence, differential correlations of neuronal population responses are a combination of correlations inherited from the feedforward input^{46}, and correlations generated by recurrent network interactions that align with the population stimulus tuning^{24}. When the feedforward input is described by a hierarchical generative model (Eq. (2)), the total magnitude of differential correlations in the evoked response is \({a}^{2}{n}_{{\mathsf{f}}}^{-1}{w}_{E}{{\bf{f}}}_{s}^{{\prime} }{{\bf{f}}}_{s}^{{\prime} \top }+{a}^{2}{n}_{{\mathsf{f}}}^{-1}{{\bf{f}}}_{s}^{{\prime} }{{\bf{f}}}_{s}^{{\prime} \top }\) (see Eq. (46)), where the second term reflects the differential correlations inherited from the feedforward input (compare with Eq. (9)). Although the two sources of differential correlations are intertwined in the neuronal response, they impact the information content differently, offering a potential way to distinguish between them in neural data.
Externally generated differential correlations decrease with the feedforward input rate, which can be modulated by visual stimulus strength such as contrast (Fig. 8d, red curve). As a consequence, the mutual information (the information between the feedforward inputs u^{f} and the latent variables, s and z, sampled by the recurrent network in Fig. 4a; Eq. (42)) increases with feedforward input intensity (Fig. 8d, blue curve). There is, therefore, a monotonically decreasing relationship between externally generated differential correlations and mutual information. This is expected, since such inherited correlations always impair information processing, as observed previously^{4,17}. In contrast, an increase in recurrent weights, w_{E}, increases internally generated differential correlations, but results in a non-monotonic change in mutual information (Fig. 8b). Hence there is a non-monotonic relation between internally generated differential correlations and the mutual information between stimulus and feedforward inputs. In sum, the impact of external and internal differential correlations on stimulus coding can be distinguished by their respective monotonic and non-monotonic relations with the mutual information between stimulus and response.
Discussion
We have presented a framework in which neuronal response variability and recurrent synaptic connections, two ubiquitous features of cortex, are jointly used to implement sampling-based Bayesian inference in neuronal circuit models. Combining mathematical analysis and network simulations, we established that stereotypical Poisson variability of discrete spike counts can drive flexible sampling from a family of continuous distributions. The sampling statistics are determined by the structure of the recurrent coupling, which stores information about the stimulus prior, and by the feedforward inputs conveying the stimulus likelihood. Sampling-based inference is implemented in two steps: the instantaneous firing rate, determined by the sum of feedforward and recurrent inputs, represents the instantaneous conditional distribution of the latent stimulus, while Poissonian variability in spike generation produces a random stimulus sample from this conditional distribution. We have shown how sampling can be implemented using biologically feasible mechanisms for three different generative models of increasing complexity. The simplest model includes one latent stimulus, while the more complex models include multiple latent stimuli organized hierarchically or in parallel. These three generative models form the basic building blocks of more complex models, so our ideas can be extended to a wide range of perceptual and cognitive processes^{47}.
The neural code we described shares some features with codes described in previous studies, including parametric representations in probabilistic population codes (PPCs)^{15,39,40} and sampling-based codes (SBCs)^{16,27,28,29,30,31,32}. In our framework, the conditional distributions of latent variables are represented by instantaneous firing rates which linearly encode the logarithms of these conditional distributions, a mathematical form similar to that used in past studies describing PPCs (e.g., Eq. (5)). Further, the posterior is represented by stimulus samples generated through a random process, a feature of all SBCs. Despite these similarities, there are fundamental differences between the neural code we described and previously proposed PPCs and SBCs.
PPCs are generally implemented in networks with no internally generated variability, with stochasticity inherited from the stimulus. In contrast, our proposed network is doubly stochastic: the Poisson variability in the feedforward input allows a single realization of the feedforward input to represent the whole stimulus likelihood^{39}, while internally generated Poisson variability drives stimulus sampling. Further, in PPCs the posterior is represented parametrically by a one-shot neuronal response, while in our proposed network the joint posterior is approximated by a sequence of samples, each obtained as a linear readout from the instantaneous neuronal responses. Although it takes time to collect sufficiently many samples to approximate the posterior well, an advantage of sampling codes over PPCs is that inference with multivariate posteriors can be implemented using linearly coupled subnetworks (Fig. 6), with the number of subnetworks determined by the dimension of the latent stimulus features. In contrast, representing an M-dimensional multivariate posterior using PPCs requires N^{M} neurons in a linear network (where N is the number of neurons representing each dimension), so that the number of neurons increases exponentially with the latent stimulus dimension, M^{16}. Alternatively, coupled networks with NM neurons can be used, but these require complex, nonlinear coupling between the networks^{48,49}.
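The scaling argument above is easy to make concrete. With, say, N = 100 neurons per stimulus dimension (an arbitrary illustrative number):

```python
# neurons needed to represent an M-dimensional posterior
N = 100                  # neurons per stimulus dimension (illustrative)
for M in (1, 2, 3):
    ppc_joint = N ** M   # single linear PPC network tiling the joint stimulus space
    sbc_coupled = N * M  # one sampling subnetwork per dimension (this framework)
    print(M, ppc_joint, sbc_coupled)
# M = 3: 1,000,000 neurons for a joint PPC vs 300 for coupled sampling networks
```

The exponential gap is what makes linearly coupled sampling subnetworks attractive for multivariate inference.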
Neurons emit a discrete number of spikes, but their responses often need to represent continuous quantities. Most studies of neural sampling implicitly rely on approximating Poissonian spike counts with Gaussian variables (e.g., refs. ^{29,31,51}). However, this approximation does not work well when only a few spikes are emitted. Here, we showed that discrete Poisson spike generation can be used to generate samples from the posterior distribution of a continuous stimulus feature using a temporally averaged, smooth population firing rate profile. Thus, we have shown how a sample of a continuous variable can be generated even with only a few spikes from the neuronal population. Moreover, conventional SBCs generate samples directly in a neural space whose dimension is given by the number of neurons in the population^{16,27,28,30,31,32,33,34,50}, where a neuronal response, r_{t}, is interpreted directly as a sample from the (marginal) posterior of neuronal responses, p(r). Hence the posterior mean is the temporally averaged population response, and the covariance of population responses is the posterior covariance. In contrast, our proposed network generates samples in a low-dimensional stimulus subspace embedded in the high-dimensional neural activity space. The linear projection of the network activity, r_{t}, onto the stimulus subspace represents a sample from the stimulus posterior, similar to a previous study^{29}. A computational benefit of sampling in a low-dimensional stimulus subspace is convergence speed, as the volume of the stimulus subspace is significantly smaller than that of the neural activity space. Indeed, in our examples sequences of samples generated by a single recurrent network (Fig. 4) and by coupled networks (Fig. 6) both converge to an equilibrium distribution in less than 20 ms, which is fast enough to complete inference on a behaviorally relevant time scale (Fig. S6).
Furthermore, the multiplication of probability distributions of the latent stimulus, which is central to Bayesian inference (e.g., in cue combination and decision making; see the review in ref. ^{15}), can be implemented by summing the inputs to a neuronal population (Eq. (5)). This follows from the fact that the instantaneous population input (or firing rate) linearly encodes the logarithm of a probability distribution (Eqs. (1) and (5)). In contrast, producing samples in neural activity space using conventional SBCs requires nonlinear operations in neural circuits in order to multiply probability distributions (or histograms) of the samples^{15}.
A recent study demonstrated that an E-I recurrent network of rate-based neurons can be numerically optimized for sampling-based Bayesian inference^{32}. In contrast, we used a theoretical approach to derive a network model of simplified spiking neurons which implements sampling-based inference. This allowed us to explicitly describe the putative neural mechanisms needed for such sampling. Although the two studies use different generative models and neural representations, the network models in both share some common characteristics: ring structure, Poisson-like response variability, and tuning-dependent noise correlations (Fig. 1d). This implies that the seemingly different generative models and neural representations in the two studies reflect more general principles, as suggested in ref. ^{51}. It will be interesting to extend our theoretical approach to dynamical spiking neurons to determine how the timescales of neuronal dynamics and neuronal oscillations impact inference in rich, dynamic sensory scenes (see below).
Differential noise correlations generated by recurrent network interactions are a signature of network sampling in our framework (Figs. 5c and 8c). This is in contrast with earlier studies where differential correlations were inherited from feedforward inputs^{17,52}. While internally generated differential correlations could also emerge from a recurrent circuit that is not implementing inference^{22,24,52,53,54,55}, or one implementing inference via other algorithms^{56}, in our framework the relation between the magnitude of internally generated differential correlations, the posterior uncertainty, and the strength of the recurrent synaptic weights (Eq. (9)) provides a clear test which can be used to verify our proposed circuit mechanism of sampling-based inference. One possible experimental approach would modulate the functional recurrent strength using a perceptual learning task. Specifically, after using a reference stimulus set with a prescribed correlation between latent stimuli to fully train an animal, we expect that recurrent synaptic weights will strengthen or weaken to improve inference (Fig. 8e, dashed line). This will result in a fixed value of differential noise correlations in the population response due to the recurrent circuitry. Retraining with a stimulus set that has more (less) correlated latent stimuli compared to the reference set will cause the recurrent weights to increase (decrease) (Fig. 8e, red line). When the reference stimulus set is again used to drive task behavior, performance (as a proxy of mutual information) will decrease, regardless of whether differential correlations have increased or decreased compared to those resulting from the reference stimulus set (Fig. 8e, arrows). In brief, the non-monotonic relationship between differential noise correlations and the mutual information between stimulus and response offers a clear (and falsifiable) experimental prediction.
Implementing sampling-based inference in our proposed network requires that feedforward and recurrent inputs have the same tuning profile over the stimulus (Eq. (5)). This assumption is supported by experiments in layers 4 and 2/3 of mouse V1^{8}. Moreover, the recurrent connections in our network model are translation-invariant in the stimulus subspace, an assumption widely used in studies of continuous attractor networks (CANs)^{22,54,57,58} and in a recent network model implementing sampling^{32}. Perfectly translation-invariant connections are not strictly required for a circuit to implement sampling, but this assumption simplifies the mathematical analysis. Adding randomness to the recurrent connectivity would increase the variance of the sampling distributions. We could then adjust the overall recurrent weight (a scalar) so that the sampling distribution matches the posterior, with no need to fine-tune individual synaptic weights in the network model. In the past, CANs have been shown to achieve maximum likelihood estimation (a point estimate) via template matching^{15,58,59}. Here we have shown that a network with CAN-like structure and internal Poisson spiking variability can perform sampling-based Bayesian inference. In our network, correlations in the stimulus prior are represented by the strength of recurrent synaptic activity, which implies that the (subjective) prior precision in the network increases with the feedforward input strength.
To maintain a fixed prior in the network, recurrent weights need to decrease as the feedforward input strength, which encodes the likelihood precision, Λ_{f}, increases (Eq. (6)). Therefore, the (subjective) prior stored in a network with fixed recurrent weights may differ from the objective stimulus prior in the world (Λ_{s} in Eqs. (3) and (7)) when feedforward inputs have different strengths. One possibility is that the proposed network model does not generate samples from each distinct posterior determined by a specific feedforward input, p(s∣u^{f}), but rather generates samples from the average sampling distribution over all possible feedforward inputs, and hence matches the average posterior distribution \({{\mathbb{E}}}_{p({{\bf{u}}}^{{\mathsf{f}}})}[p({\bf{s}}\,|\,{{\bf{u}}}^{{\mathsf{f}}})]={{\mathbb{E}}}_{p({{\bf{u}}}^{{\mathsf{f}}})}[q({\bf{s}}\,|\,{{\bf{u}}}^{{\mathsf{f}}})]\), where \({{\mathbb{E}}}_{p({{\bf{u}}}^{{\mathsf{f}}})}[\cdot ]\) denotes the average over the distribution p(u^{f}). Since the proposed recurrent circuit is general, this result may explain one source of inductive bias in cortical processing^{60}. On the other hand, sampling correctly from each specific posterior could be achieved using biophysical mechanisms that modulate synaptic strengths and that we have not included in our model. For instance, short-term synaptic depression^{61} and spike frequency adaptation^{62} are gain control mechanisms that would allow the recurrent input strength (representing the prior correlation) to remain relatively fixed despite an increase in the feedforward input strength. Another possibility is that the recurrent circuit represents a more complex generative model which better captures the statistical structure of natural stimuli^{30,32,63}. Here we assumed that the generative model represented by the network matches the model that generates the sensory stimuli.
This is unlikely to be the case in practice. Such mismatch between the true and internal model of the world can lead to biases and increased noise which are likely to manifest in specific ways in neural circuits that perform inference via sampling^{64}. Furthermore, we only considered sampling driven by spiking variability with a Fano factor of 1, while cortical responses often have Fano factors that differ from 1^{65,66}. In the latter case, our theory can still work by changing the feedforward connection weight to compensate for the change in Fano factor, as suggested in a recent study^{67}.
To keep our exposition transparent, we only presented models with minimal complexity. Our proposed network mechanism of sampling-based inference can be generalized to more complex generative models, since the assumption of Gaussianity (Eqs. (21) and (22)) and the analytical expression in Eq. (24) are not essential, and several relaxed frameworks may be explored. First, similar networks can generate samples from other multidimensional distributions in which the conditional distribution of each latent variable belongs to the linear exponential family^{38,39}. This could be done by changing the tuning functions of neurons to another appropriate profile, as the logarithm of the tuning determines the type of sampling distribution (Eq. (1)). When sampling from non-Gaussian distributions, the stimulus samples can be linearly read out with weights determined by the tuning profile (i.e., h(s) in Eq. (1))^{39}. Second, the tuning of recurrent inputs does not need to be the same as that of feedforward inputs. Instead, the logarithm of the recurrent input tuning can take the form of the conjugate prior of the likelihood conveyed by feedforward inputs. Third, the network model could also be used to infer latent variables with a non-uniform marginal prior if, for example, the preferred stimuli of neurons in the population are not distributed uniformly in the stimulus subspace^{68}. Fourth, the proposed network model has the potential to produce samples from the posterior distribution of latent dynamic stimuli described by a hidden Markov model. Lastly, we considered only unstructured inhibition for simplicity. Structured inhibitory connections could modulate the position of excitatory responses in the stimulus subspace, i.e., the mean of the conditional distribution. Such interplay between E and I neurons with structured inhibition has the potential to implement Hamiltonian sampling, where the I neurons represent samples of auxiliary variables^{38,50}.
In conclusion, we have shown that a recurrent circuit of neurons with Poisson spiking statistics can implement sampling from a family of multivariate posterior distributions, with internal spiking variability driving the generation of stimulus samples, and the recurrent connections representing the stimulus prior. The proposed neural code may help us understand the structure of neuronal activity, and provide a building block for more complicated population computations.
Methods
A linear network of excitatory neurons
We study how a generic recurrent network model consisting solely of N_{E} excitatory (E) neurons with Poisson spiking statistics (no inhibitory neurons) can implement sampling-based Bayesian inference to approximate the stimulus posterior. We describe neuronal activity using a time-discretized Hawkes process (a type of multivariate, inhomogeneous Poisson process^{69}). The instantaneous firing rates of the neurons in the network at time t, λ_{t}, obey the following recurrent equations:
where u^{f} is the feedforward Poisson spiking input (described below; Eq. (18)), \({{\bf{u}}}_{t}^{{\mathsf{r}}}\) is the continuous-valued recurrent input at time t, and ξ_{t} is an N_{E}-dimensional vector of independent Gaussian white noise. Hence, over each time interval [t − Δt, t] the activity of the neurons in the network is modeled by a vector of independently generated Poisson spike counts, r_{t}, with means determined by the rates λ_{t}. The parameters w_{E} and σ_{r} determine the excitatory recurrent weight and the recurrent variability, respectively. The instantaneous firing rate λ_{t} can be negative due to the recurrent input and noise (Eq. (36)); we interpret a negative firing rate as a zero probability of generating a spike.
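A minimal discrete-time sketch of this update, with toy parameters (the Gaussian ring connectivity, rates, and weight below are illustrative, not the values used in the figures), shows the basic rate-then-spike loop and the amplification of activity by recurrence:

```python
import numpy as np

rng = np.random.default_rng(2)
N_E, a = 50, 5.0                 # toy network size and connectivity width (neuron index units)
w_E, sigma_r = 0.4, 0.05         # toy recurrent weight and recurrent noise level

# translation-invariant (ring) Gaussian connectivity, rows normalized to sum to w_E
idx = np.arange(N_E)
d = np.abs(idx[:, None] - idx[None, :])
dist = np.minimum(d, N_E - d)
W = np.exp(-dist ** 2 / (2 * a ** 2))
W *= w_E / W.sum(1, keepdims=True)

u_f = rng.poisson(5.0, N_E).astype(float)  # one frozen feedforward spike-count vector

r = np.zeros(N_E)
pop_mean = []
for t in range(2_000):
    # instantaneous rate: feedforward + recurrent input + noise (cf. Eq. (11)),
    # expressed here directly as the mean spike count per time bin
    lam = u_f + W @ r + sigma_r * rng.standard_normal(N_E)
    r = rng.poisson(np.maximum(lam, 0.0))  # negative rate -> zero spike probability
    pop_mean.append(r.mean())

# stationary mean count per neuron is amplified by recurrence: <r> ~ <u_f>/(1 - w_E)
print(np.mean(pop_mean[100:]), u_f.mean() / (1 - w_E))
```

Because the connectivity rows sum to w_{E} < 1, the linear recurrence is stable and the stationary population activity is a scaled copy of the feedforward drive, with Poisson spike generation supplying the internal variability used for sampling.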
Poisson spike generation samples the stimulus
Independent Poisson spike generation in the network whose activity is described by Eq. (11) can drive sampling across time or across trials from a conditional stimulus distribution determined by the instantaneous firing rate λ_{t}. Below, we compute the distribution of stimulus samples given λ_{t}. We assume that the instantaneous firing rate, λ_{t}, has a smooth bell-shaped profile and can be parameterized as,
where \({\bar{s}}_{t}\) characterizes the position of the population firing rate on the stimulus subspace (Fig. 1b, x-axis), while R and a denote the height and width of the population firing rate, respectively. Further, θ_{j} is the preferred stimulus value of neuron j, and the preferred stimuli of all neurons, \({\{{\theta }_{j}\}}_{j=1}^{{N}_{E}}\), are uniformly distributed over the range of the stimulus s (Fig. 1b).
To simplify the analysis, we first assume that the instantaneous firing rate is fixed over time. When generating Poisson spikes r_{t} from λ_{t}, the probability of observing a stimulus sample \({\tilde{s}}_{t}\) (embedded in r_{t}) can be derived as (see details in Supplementary Information),
where n_{r} = ∑_{j}r_{tj} is the number of spikes emitted across the whole neural population, and n_{λ} = ∑_{j}〈λ_{j}〉Δt is the summed population firing rate. Here \({\mathcal{N}}(s\mid \mu,{\sigma }^{2})\) denotes a Gaussian distribution with mean μ and variance σ^{2}, and \({\mathbf{h}}({\bar{s}}_{t})\) is a vector whose jth element is \({{\mathbf{h}}}_{j}({\bar{s}}_{t})\), as shown in Eq. (12). The logarithm of the firing rate profile, \({\mathbf{h}}({\bar{s}}_{t}),\) determines how the stimulus sample \({\tilde{s}}_{t}\) and its mean, \({\bar{s}}_{t},\) can be read out from r_{t} and λ_{t}, respectively,
where \({\tilde{s}}_{t}\) and \({\bar{s}}_{t}\) characterize the positions of r_{t} and λ_{t} on the stimulus subspace.
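The readout above amounts to a population vector. A minimal numerical sketch (Python; the grid spacing, peak rate R, and width a = 40° are illustrative choices, not the paper's simulation parameters) shows that repeated Poisson spike generation from a fixed Gaussian rate profile yields samples concentrated around \(\bar{s}\), with spread set by the total rate n_λ:

```python
import numpy as np

# Population-vector readout of stimulus samples from Poisson spike counts
# generated by a fixed Gaussian rate profile (cf. Eqs. (12) and (14)).
rng = np.random.default_rng(1)
theta = np.arange(-180.0, 180.0, 4.0)                    # preferred stimuli on the ring
a, R, s_bar = 40.0, 4.0, 0.0                             # illustrative profile parameters
lam = R * np.exp(-(s_bar - theta) ** 2 / (4 * a ** 2))   # bell-shaped rate profile

r = rng.poisson(lam, size=(20000, theta.size))           # independent Poisson counts per draw
samples = (r * theta).sum(axis=1) / r.sum(axis=1)        # linear readout of s_tilde
```

The empirical mean of the samples sits at \(\bar{s}\), and their variance shrinks as the summed rate n_λ grows, in line with the precision Λ being proportional to n_λ.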
The sampling variability of \({\tilde{s}}_{t}\) in a single time step depends on the number of emitted spikes, n_{r}. When the fixed rates, λ_{t}, repeatedly generate spikes over time, the sampling distribution of \({\tilde{s}}_{t}\) can be calculated by marginalizing the likelihood (Eq. (13), last line) over different values of n_{r}, since n_{r} varies across time (a detailed calculation using the Laplace approximation can be found in the Supplementary Information),
Each stimulus sample, \({\tilde{s}}_{t},\) is thus drawn from a conditional distribution determined by the instantaneous firing rate, \(p(\tilde{s}\mid {{\boldsymbol{\lambda }}}_{t})\), which can be written as
The last proportionality in the above equation is satisfied for a Gaussian profile of the firing rate (a more general derivation can be found in the Supplementary Information). Introducing Λ = a^{−2}n_{λ} gives Eq. (1) in the main text.
Eq. (16) shows that the type of sampling distribution (the conditional distribution) obtained from spike generation variability is determined by the profile of the instantaneous firing rate, i.e., \({\mathbf{h}}({\bar{s}}_{t})\) (Eq. (12)). Although the sampling distribution belongs to the linear exponential family of distributions, as in the probabilistic population code (PPC)^{39}, the two frameworks represent these distributions differently. In PPCs, the likelihood over \({\bar{s}}_{t}\) is parametrically represented by a single realization of independent neuronal responses r (Eq. (13)), while in our work the distribution is approximated by a sequence of samples, \({\tilde{s}}_{t},\) effectively generated by conditionally independent Poisson spike discharges.
The above analysis can be extended to the case where the instantaneous firing rate, λ_{t}, in a time step deviates from a smooth Gaussian profile (Eq. (12)), as is the case in the actual network simulations. In general, λ_{t} can be expressed as,
where δ_{⊥}λ_{t} denotes the deviation from a smooth Gaussian profile. Note that the sampling distribution depends only on the position, \({\bar{s}}_{t}\), and the sum of the instantaneous firing rate, n_{λ} (Eq. (16)), which correspond to two perpendicular directions in the N_{E}-dimensional space of λ_{t}. For any instantaneous firing rate vector, λ_{t}, we can always find \({\bar{s}}_{t}\) and R_{t} that make the deviation δ_{⊥}λ_{t} perpendicular to these two directions, i.e., ∑_{j}δ_{⊥}λ_{tj}θ_{j} = 0 and ∑_{j}δ_{⊥}λ_{tj} = 0. This observation implies that deviations from Gaussian firing rate profiles do not affect our theory.
Feedforward spiking input conveys the likelihood of stimulus
We model the feedforward inputs to the E neurons in the network, u^{f}, as independent Poisson spikes, with Gaussian tuning over stimulus s,
Here \({{{{{{{{\bf{u}}}}}}}}}_{j}^{{\mathsf{f}}}\) denotes the feedforward input received by the jth E neuron, and \(\langle {{{{{{{{\bf{u}}}}}}}}}_{j}^{{\mathsf{f}}}(s)\rangle\) is the tuning of the feedforward input. This mathematical description of the feedforward input is the same as the one used in the definition of typical PPCs^{15,39,40}. Since the preferred stimulus values, \({\{{\theta }_{j}\}}_{j=1}^{{N}_{E}}\), of all feedforward inputs are uniformly distributed in stimulus space, the likelihood of s given a single observation of the input, u^{f}, satisfies^{39,40},
The logarithm of the tuning, h(s), determines the type of likelihood^{15}. Specifically, Gaussian tuning leads to a Gaussian likelihood (Eq. (19)), whose mean, μ_{f}, and precision, Λ_{f}, are both linear functions of the inputs,
The mean, μ_{f}, represents the position of u^{f} in the stimulus subspace, and the precision, Λ_{f}, is proportional to the total feedforward spike count, n_{f}.
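Because Λ_{f} grows with the total spike count n_{f}, doubling the input gain should roughly halve the trial-to-trial variance of the read-out μ_{f}. A hedged numerical check (Python; the gains and tuning parameters below are illustrative, not the paper's values):

```python
import numpy as np

# Read out mu_f from feedforward Poisson spikes with a population vector and
# check that its variance scales inversely with the input gain (i.e., with n_f).
rng = np.random.default_rng(2)
theta = np.arange(-180.0, 180.0, 4.0)
a, s = 40.0, 0.0

def mu_f_var(gain, n_trials=20000):
    rate = gain * np.exp(-(s - theta) ** 2 / (4 * a ** 2))   # Gaussian tuning, cf. Eq. (18)
    u = rng.poisson(rate, size=(n_trials, theta.size))       # feedforward Poisson spikes
    return ((u * theta).sum(axis=1) / u.sum(axis=1)).var()   # variance of read-out mu_f

v1 = mu_f_var(2.0)   # lower gain: fewer spikes, broader likelihood
v2 = mu_f_var(4.0)   # doubled gain: precision roughly doubles
```

The ratio v1/v2 is close to 2, consistent with Λ_{f} ∝ n_{f}.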
A recurrent network samples hierarchical latent variables
A hierarchical generative model
We consider a hierarchical generative model for which inference can be implemented in a recurrent circuit of Poisson neurons. We extend the simple generative model of the feedforward input (Eq. (19)) by considering the stimulus s to depend on a one-dimensional stimulus parameter, z. For simplicity, we assume that z follows a uniform distribution (Fig. 3b, marginal plots)
where \({{{{{{{\mathcal{U}}}}}}}}(a,b)\) denotes a uniform distribution over [a, b]. The assumption of a uniform prior, p(z), simplifies our model significantly, as it implies the spatial homogeneity of the network model as given by Eqs. (18) and (19). However, this assumption is not essential for our main results. The stimulus, s, is not identical to the underlying scene parameter, z, but we assume that the two are correlated, so that
In sum, the whole generative model is determined by,
where p(u^{f}∣s) is the same as in Eq. (19).
Approximate Bayesian inference via Gibbs sampling
The joint posterior of s and z can be analytically derived given the generative model (Eq. (23)),
We will use this expression to verify that the samples produced by our network converge to the true posterior.
We use the stochastic response of our recurrent network (Eqs. (10), (11)) as a basis for Gibbs sampling^{31,38,42} (a type of Monte Carlo method) to approximate the joint posterior p(s, z). To describe the iterative Gibbs algorithm, we assume that a stimulus parameter sample, \({\tilde{z}}_{t}\), is available at time t, which is then combined with the feedforward input to update the conditional distribution of the stimulus s (step 1 in Fig. 3c),
The next step in the algorithm is to draw a sample, \({\tilde{s}}_{t},\) from the conditional distribution \(p(\tilde{s}\mid {\tilde{z}}_{t},{{\mathbf{u}}}^{{\mathsf{f}}})\) (step 2 in Fig. 3c),
Next, the conditional distribution of the stimulus parameter, z, is updated given this new sample, \({\tilde{s}}_{t}\), and a new sample, \({\tilde{z}}_{t+\Delta t},\) is drawn (step 3 in Fig. 3c),
These three steps of the Gibbs sampling algorithm (Eqs. (25), (26)) are iterated until enough samples, \({\tilde{s}}_{t}\) and \({\tilde{z}}_{t}\), have been generated to approximate the true posterior distribution to the desired accuracy (Fig. 3d; compare the red dots with the blue contour map).
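The three-step loop can be sketched directly at the level of the Gaussian conditionals (Python; the values of Λ_f, Λ_s, and μ_f below are illustrative stand-ins for the quantities read out from u^f):

```python
import numpy as np

# Gibbs sampling for the hierarchical model with a uniform prior on z:
#   p(s | z, u^f) ~ N((Lam_f*mu_f + Lam_s*z)/(Lam_f+Lam_s), 1/(Lam_f+Lam_s))
#   p(z | s)      ~ N(s, 1/Lam_s)
rng = np.random.default_rng(3)
Lam_f, Lam_s, mu_f = 0.25, 1.0 / 9.0, 2.0            # illustrative precisions and mean

z, s_list, z_list = 0.0, [], []
for t in range(200000):
    # steps 1-2: combine feedforward likelihood with the current z sample, draw s
    prec = Lam_f + Lam_s
    mean = (Lam_f * mu_f + Lam_s * z) / prec
    s = rng.normal(mean, 1.0 / np.sqrt(prec))
    # step 3: update the conditional of z and draw the next sample
    z = rng.normal(s, 1.0 / np.sqrt(Lam_s))
    s_list.append(s); z_list.append(z)

s_samps = np.asarray(s_list[2000:])                  # discard burn-in
z_samps = np.asarray(z_list[2000:])
```

With a uniform prior on z, the stationary marginals are s ~ N(μ_f, Λ_f^{-1}) and z ~ N(μ_f, Λ_f^{-1} + Λ_s^{-1}), which the empirical moments of the chain reproduce.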
Implementing the Gibbs sampling in a recurrent circuit model
Gibbs sampling of the stimulus (Eq. (4b)) can be implemented via independent Poisson spike generation, as long as the conditional distribution encoded in λ_{t} (Eq. (16)) is the same as the conditional distribution in the Gibbs sampling algorithm (Eq. (4a)), i.e., \(\ln p(\tilde{s}\mid {{\boldsymbol{\lambda }}}_{t})={\mathbf{h}}{(\tilde{s})}^{\top }{{\boldsymbol{\lambda }}}_{t}=\ln p(\tilde{s}\mid {\tilde{z}}_{t},{{\mathbf{u}}}^{{\mathsf{f}}})\). This condition can be realized in the recurrent circuit by relating the expressions describing the neural dynamics (Eq. (10)) to those describing the Gibbs sampling distribution (Eq. (4a)), yielding,
The generative model for the feedforward input u^{f} (Eq. (19)) implies that \(\ln p({{\mathbf{u}}}^{{\mathsf{f}}}\mid \tilde{s})={\mathbf{h}}{(\tilde{s})}^{\top }{{\mathbf{u}}}^{{\mathsf{f}}}\). Hence, to satisfy Eq. (27) we require
which implies that the recurrent input, \({{{{{{{{\bf{u}}}}}}}}}_{t}^{{\mathsf{r}}},\) should approximately have a Gaussian profile,
whose position on the stimulus subspace is \({\tilde{z}}_{t}\), and whose summed input (height) is determined by Λ_{s}, the precision of the conditional distribution \(p(s\mid {\tilde{z}}_{t})\). In a similar fashion to Eq. (17), \({\delta }_{\perp }{{\mathbf{u}}}_{t}^{{\mathsf{r}}}\) denotes the deviation from a smooth Gaussian profile and is perpendicular to the directions of \({\tilde{z}}_{t}\) and Λ_{s}.
The optimal recurrent weight can be derived by combining Eqs. (29) and (17). We note that the recurrent input, u^{r}, and the neuronal responses, r_{t}, have the same tuning width, a, in a network with only E neurons. This can only be achieved if E neurons are exclusively self-connected (Eq. (10)), since lateral connections broaden their tuning. The optimal recurrent weight generating recurrent input of the appropriate strength is then,
which yields Eq. (6) in the main text. Note that the self-connection is a result of the simplifying assumption that the network consists solely of E neurons (Eq. (10)); this assumption can be relaxed in a full network consisting of both E and I neurons, as we show below.
The sampling of the stimulus parameter (Eq. (4c)) can be implemented through variability in the recurrent input. To do this, we include a diffusive term in the recurrent interactions, \({{\mathbf{u}}}_{t}^{{\mathsf{r}}}\), and equate the variance of the fluctuations with their mean to mimic a Poisson distribution:
where [⋅]_{+} denotes rectification of negative values. Here ξ_{t} is an N_{E}-dimensional Gaussian white noise with \(\langle {{\boldsymbol{\xi }}}_{t}(i){{\boldsymbol{\xi }}}_{{t}^{{\prime} }}(j)\rangle={\delta }_{ij}\delta (t-{t}^{{\prime} })\), where δ_{ij} and \(\delta (t-{t}^{{\prime} })\) are the Kronecker and Dirac delta functions, respectively; \({\bar{{\mathbf{u}}}}_{t}^{{\mathsf{r}}}\) represents the conditional distribution \(p(\tilde{z}\mid {\tilde{s}}_{t-\Delta t})\), and \({{\mathbf{u}}}_{t}^{{\mathsf{r}}}\) represents a stimulus parameter sample \({\tilde{z}}_{t}\) (Eq. (29)). The multiplicative variability of the recurrent interaction may arise from synaptic noise^{37,70}.
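The mean-matched noise model can be checked in isolation: rectified Gaussian noise whose variance equals its mean reproduces the first two moments of Poisson-like variability whenever the mean drive is not too small. A minimal sketch (the drive u_bar = 20 is an arbitrary illustrative value):

```python
import numpy as np

# Mean-matched (variance = mean) Gaussian noise plus rectification as a
# surrogate for Poisson-like variability in the recurrent input.
rng = np.random.default_rng(4)
u_bar = 20.0                                                   # illustrative mean drive
u = np.clip(u_bar + np.sqrt(u_bar) * rng.standard_normal(1_000_000), 0.0, None)
```

For u_bar = 20 the rectification is essentially never active (the mean sits ~4.5 standard deviations above zero), so both the mean and the variance of u stay close to u_bar, as for a Poisson variable.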
Coupled circuits sample a multidimensional posterior
We consider a generative model with multiple latent stimuli, s = (s_{1}, s_{2}, ⋯ , s_{m}), organized in parallel (Fig. 6a). Without loss of generality, we consider the simplest case, m = 2; the same mechanism extends straightforwardly to any m > 2. We assume the joint prior of s is a multivariate normal distribution,
and each stimulus s_{m} is uniformly distributed in (−180°, 180°] with periodic boundaries imposed. Defining a Gaussian distribution on a circular space works well as long as the variance of the distribution is much smaller than the range of the stimulus space. Here Λ_{s} is the precision matrix, while the scalar Λ_{s} (Λ_{s} ≥ 0) characterizes the correlation between s_{1} and s_{2}. Note that the covariance matrix \({{{{{{{{\boldsymbol{\Lambda }}}}}}}}}_{s}^{-1}\) is not defined, so the prior (Eq. (32)) is improper. The mean, μ_{s}, is a free parameter, because it does not appear in the detailed expression of the prior (Eq. (32)); this is a consequence of the zero determinant of the precision matrix, i.e., ∣Λ_{s}∣ = 0. A further consequence is that the prior is not centered at μ_{s}, but instead has a band structure along the diagonal, and the marginal prior of each stimulus feature, p(s_{m}) (m = 1, 2), is uniform (Fig. 6b). The uniform marginal prior simplifies our theoretical derivation, as it implies the spatial homogeneity of the network model, but does not impact the proposed neural coding mechanism.
Each stimulus s_{m} (m = 1, 2) individually generates a feedforward spiking input \({{\mathbf{u}}}_{m}^{{\mathsf{f}}}\), whose likelihood \(p({{\mathbf{u}}}_{m}^{{\mathsf{f}}}\mid {s}_{m})\) is exactly the same as in Eq. (2). Combined together, the generative model is
where \({{{{{{{{\boldsymbol{\mu }}}}}}}}}_{{\mathsf{f}}}={({\mu }_{{\mathsf{f}}1},{\mu }_{{\mathsf{f}}2})}^{\top }\), and the likelihood precision matrix Λ_{f} = diag(Λ_{f1}, Λ_{f2}) is a diagonal matrix.
Gibbs sampling of the multidimensional posterior in a coupled neural circuit
Given the generative model (Eq. (33)), the joint posterior of s_{1} and s_{2} is a bivariate normal distribution, i.e., \(p({\mathbf{s}}\mid {{\mathbf{u}}}^{{\mathsf{f}}})={\mathcal{N}}\left({\mathbf{s}}\mid {{\boldsymbol{\mu }}}_{p},{{\mathbf{K}}}_{p}^{-1}\right)\), whose precision matrix, K_{p}, and mean, μ_{p}, are,
The precision matrix of the posterior is the sum of the precisions of the likelihood and the prior, implying an increased reliability of the distribution after combining it with the prior. Meanwhile, the posterior mean is the weighted average of the means of the two likelihoods, with each weight proportional to the precision of the corresponding likelihood. We use this expression for the posterior to evaluate the performance of the proposed sampling-based algorithm.
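A worked numerical instance of Eq. (34), with illustrative precisions and likelihood means (the band prior here has precision Λ_s·[[1, −1], [−1, 1]], whose determinant vanishes, consistent with the improper prior of Eq. (32)):

```python
import numpy as np

# Posterior of the parallel model: precision adds, mean is precision-weighted.
Lam_s = 0.2                                             # scalar prior coupling (illustrative)
Prior = Lam_s * np.array([[1.0, -1.0], [-1.0, 1.0]])    # band prior precision, |Prior| = 0
Lam_f = np.diag([0.5, 0.25])                            # likelihood precisions (illustrative)
mu_f = np.array([10.0, -10.0])                          # likelihood means (illustrative)

K_p = Prior + Lam_f                                     # posterior precision, Eq. (34)
mu_p = np.linalg.solve(K_p, Lam_f @ mu_f)               # posterior mean, Eq. (34)
```

For these values the posterior means (≈6.36 and ≈−2.73) are pulled from the likelihood means 10 and −10 toward each other, and the posterior covariance acquires a positive off-diagonal entry, reflecting the correlation imposed by the prior.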
Using Gibbs sampling to approximate the posterior (Eq. (34)) involves the following steps:
We note that we only describe sampling from the posterior distribution of s_{1}, as samples from the posterior of s_{2} can be obtained similarly after exchanging indices. This sampling can be implemented in a neural circuit model consisting of several coupled networks, in which each network generates samples from the posterior distribution of the corresponding stimulus. Therefore, the number of networks in the coupled circuit equals the dimension of the latent stimuli. The dynamics of the coupled neural circuit are defined by:
We again note that the dynamics of network 2 can be similarly obtained by exchanging indices. To implement Gibbs sampling (Eqs. (35a), (35b)) in the coupled circuit (Eqs. (36), (37)), spike generation in network 1 (Eq. (37)) can be used to produce stimulus samples, \({\tilde{s}}_{1t}\), when the conditional distribution determined by λ_{1t} matches the conditional distribution required in the definition of Gibbs sampling (Eq. (35a)), i.e., \(\ln p({\tilde{s}}_{1}\mid {{\mathbf{u}}}_{1}^{{\mathsf{f}}},{\tilde{s}}_{2,t-\Delta t})=\ln p({\tilde{s}}_{1t}\mid {{\boldsymbol{\lambda }}}_{1t})={\mathbf{h}}{({\tilde{s}}_{1})}^{\top }{{\boldsymbol{\lambda }}}_{1t}\). Taking the logarithm of Eq. (35a) yields,
Comparing this expression with Eq. (36), we see that the feedforward input, \({{\mathbf{u}}}_{1}^{{\mathsf{f}}},\) matches the conditional distribution \(p({{\mathbf{u}}}_{1}^{{\mathsf{f}}}\mid {\tilde{s}}_{1})\) (Eq. (33)). We therefore require the recurrent input from network 2 to network 1 to encode the conditional distribution \(p({\tilde{s}}_{2,t-\Delta t}\mid {\tilde{s}}_{1})\), i.e., \(\ln p({\tilde{s}}_{2,t-\Delta t}\mid {\tilde{s}}_{1})={\mathbf{h}}{({\tilde{s}}_{1})}^{\top }{{\mathbf{u}}}_{12,t}^{{\mathsf{r}}}\). This implies that \({{\mathbf{u}}}_{12,t}^{{\mathsf{r}}}\) should approximately have a Gaussian profile,
where \({\delta }_{\perp }{{\mathbf{u}}}_{12,tj}^{{\mathsf{r}}}\) quantifies the deviation from a perfect Gaussian profile and does not affect the decoded value \({\tilde{s}}_{2,t-\Delta t}\) or Λ_{s}.
The recurrent input, \({{{{{{{{\bf{u}}}}}}}}}_{12}^{{\mathsf{r}}}\) (Eq. (39)), has the same width, a, as the neuronal response, r_{1}. In a circuit containing only E neurons, if the two networks have the same number of neurons, then across networks only neurons with the same preferred stimulus should be connected. The optimal recurrent weight between the two networks is then
Since each network individually generates a stimulus sample, the sample of stimulus m can be read out locally from network m's responses even if the activities of the two networks are correlated (Fig. 6a), which greatly simplifies readout. Furthermore, because the population firing rate of each network has a Gaussian profile, the stimulus sample \({\tilde{s}}_{mt}\) can be linearly read out from r_{mt} as
We note that the circuit implementation of Gibbs sampling from a multidimensional posterior (Eq. (8a)) does not require recurrent connections between E neurons within a network. This is due to the assumption that the marginal prior of each stimulus feature, p(s_{m}), is uniform. For a non-uniform marginal prior p(s_{m}), recurrent connections between E neurons within a network would be required to generate samples from a distribution matching the true posterior.
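At the level of the read-out variables, the alternating sampling across the two networks reduces to a two-component Gibbs chain. A sketch with illustrative parameters (the same hypothetical values as in the worked example of Eq. (34): Λ_f = diag(0.5, 0.25), Λ_s = 0.2, μ_f = (10, −10)) confirms that the empirical precision of the samples converges to K_p = Λ_s + Λ_f:

```python
import numpy as np

# Two-network Gibbs chain: each "network" draws its stimulus sample from a
# Gaussian conditional combining its own feedforward input with the other
# network's most recent sample (cf. Eqs. (35a), (35b)).
rng = np.random.default_rng(5)
Lam_f1, Lam_f2, Lam_s = 0.5, 0.25, 0.2              # illustrative precisions
mu_f1, mu_f2 = 10.0, -10.0                          # illustrative likelihood means

s1, s2, out = 0.0, 0.0, []
for t in range(200000):
    m1 = (Lam_f1 * mu_f1 + Lam_s * s2) / (Lam_f1 + Lam_s)
    s1 = rng.normal(m1, 1.0 / np.sqrt(Lam_f1 + Lam_s))     # network 1 sample
    m2 = (Lam_f2 * mu_f2 + Lam_s * s1) / (Lam_f2 + Lam_s)
    s2 = rng.normal(m2, 1.0 / np.sqrt(Lam_f2 + Lam_s))     # network 2 sample
    out.append((s1, s2))

samps = np.asarray(out[2000:])                      # discard burn-in
K_emp = np.linalg.inv(np.cov(samps.T))              # empirical precision of the samples
```

The empirical precision matrix approaches [[0.7, −0.2], [−0.2, 0.45]], i.e., the sum of the likelihood precision and the band prior precision.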
Inference from an information-theoretic point of view
The goal of the sampling algorithm is to approximate the posterior distribution of latent variables, Θ, given a feedforward input, u^{f}. Specifically, Θ = {s, z} in the hierarchical generative model (Eq. (23)), or Θ = s = {s_{1}, s_{2}} in the parallel generative model (Eq. (33)). When the sampling algorithm uses an internal model that does not match the structure of the generative model, the sampling distribution q(Θ∣u^{f}) will differ from the true posterior, p(Θ∣u^{f}) (Eq. (24)). In this case the mutual information between the sampled latent variables, Θ, and u^{f} will be smaller than when the samples come from the true posterior, p(Θ∣u^{f}),
It is straightforward to show that the difference between I(Θ; u^{f}) and I_{q}(Θ; u^{f}) is the Kullback–Leibler (KL) divergence between p and q, i.e., \({D}_{KL}[p\,\|\,q]=I(\Theta; {{\mathbf{u}}}^{{\mathsf{f}}})-{I}_{q}(\Theta; {{\mathbf{u}}}^{{\mathsf{f}}})={{\mathbb{E}}}_{p}(\ln p-\ln q)\ge 0\). Equality in Eq. (42) holds only if the distribution q matches the true posterior p.
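For Gaussian p and q this information gap has a closed form; a short check (with illustrative means and covariances) confirms that it is non-negative and vanishes when q = p:

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """D_KL[ N(mu0, S0) || N(mu1, S1) ] for multivariate normal distributions."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    d = np.asarray(mu1) - np.asarray(mu0)
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# Illustrative "true posterior" p and mismatched approximation q.
mu_p, S_p = np.array([1.0, 0.0]), np.array([[2.0, 0.5], [0.5, 1.0]])
mu_q, S_q = np.array([1.5, 0.2]), np.array([[2.5, 0.0], [0.0, 1.2]])
gap = kl_gauss(mu_p, S_p, mu_q, S_q)   # information lost by sampling from q instead of p
```

The gap is strictly positive for the mismatched q above and exactly zero when the two distributions coincide, mirroring the equality condition of Eq. (42).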
The mutual information I_{q}(Θ; u^{f}) can be computed analytically when the approximating distribution \(q(\Theta \mid {{\mathbf{u}}}^{{\mathsf{f}}})={\mathcal{N}}(\Theta \mid {{\boldsymbol{\mu }}}_{q},{{\mathbf{K}}}_{q}^{-1})\) is a bivariate normal (substituting Eqs. (23) and (24) into Eq. (42)),
Here L = 360° is the length of the stimulus feature subspace, while μ_{p} and K_{p} are the mean and precision matrix of the posterior distribution (Eqs. (24) or (34)). When q matches the posterior distribution, p, we have \(I(\Theta; {{\mathbf{u}}}^{{\mathsf{f}}})=\log L-\frac{1}{2}[1+\log (2\pi {\Lambda }_{s})-\log | {{\mathbf{K}}}_{p}| ].\)
The neuronal response distribution conditioned on external stimulus
We compute the distribution of neuronal responses r over time/trials in response to an external stimulus s, i.e., p(r∣s), in order to identify a neural signature of network sampling and compare it with experimental data. For a fixed external stimulus s, the neuronal response r fluctuates due both to sensory transmission noise, described by p(u^{f}∣s) (Eq. (18)), and to internally generated variability, described by p(r∣u^{f}) (Fig. 4a). Therefore, the distribution of r in response to an external stimulus s has the form
For simplicity, we only compute the covariability of p(r∣u^{f}) along the stimulus subspace (Fig. 1b, x-axis), because the covariability along other directions is not related to stimulus sampling. By approximating the Poissonian spiking variability p(r∣λ) with a multivariate normal distribution (Eq. (11)), and considering the limit of weak fluctuations in λ along the stimulus subspace over time, p(r∣u^{f}) can be approximated as (see mathematical details in the Supplementary Information),
f(s) = 〈λ_{t}〉 denotes the temporally averaged population response. The covariance structure of the neuronal response includes two terms: diag(f(s)), a diagonal matrix whose entries are those of the vector f(s), capturing the (independent) Poisson spiking variability (Eq. (23)); and \(V(\bar{s}\mid {\mu }_{{\mathsf{f}}}){{\mathbf{f}}}_{s}^{{\prime} }{{\mathbf{f}}}_{s}^{{\prime} \top }\), a term that captures the covariability due to firing rate fluctuations along the stimulus subspace (Fig. 8a), where \({{\mathbf{f}}}_{s}^{{\prime} }=d{\mathbf{f}}(s)/ds\) is the derivative of f(s) with respect to the stimulus feature s. The covariance \({{\mathbf{f}}}_{s}^{{\prime} }{{\mathbf{f}}}_{s}^{{\prime} \top }\) is often termed differential (noise) correlations^{4,17}. With the Gaussian profile of f(s) (Eqs. (18) and (29)), \({{\mathbf{f}}}_{s}^{{\prime} }{{\mathbf{f}}}_{s}^{{\prime} \top }\) exhibits an anti-symmetric structure (Fig. 8b)^{17,22,53,71,72}.
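The amplitude of these internally generated differential correlations can be probed directly in simulation: jittering the position of a Gaussian rate profile with variance V before Poisson spike generation adds approximately V to the variance of the population-vector readout. A sketch with illustrative magnitudes (a = 40°, V = 25 deg², peak rate R; none of these are the paper's parameters):

```python
import numpy as np

# Position jitter of the rate profile adds its variance V to the variance of
# the read-out stimulus -- the scalar signature of the V * f' f'^T term.
rng = np.random.default_rng(6)
theta = np.arange(-180.0, 180.0, 4.0)
a, R, V = 40.0, 4.0, 25.0

def readout_var(jitter_sd, n_trials=50000):
    s_bar = jitter_sd * rng.standard_normal(n_trials)        # position fluctuations
    lam = R * np.exp(-(s_bar[:, None] - theta) ** 2 / (4 * a ** 2))
    r = rng.poisson(lam)                                     # Poisson spike counts
    return ((r * theta).sum(axis=1) / r.sum(axis=1)).var()   # population-vector readout

extra = readout_var(np.sqrt(V)) - readout_var(0.0)           # added variance, ~ V
```

The difference between the jittered and fixed-profile readout variances approaches V, consistent with internally generated fluctuations contributing additively along the stimulus subspace.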
In Eq. (44), \(V(\bar{s}\mid {\mu }_{{\mathsf{f}}})\) is the variance of \({\bar{s}}_{t}\) (the mean of the conditional distribution in Eq. (4a)) over time, and characterizes the amplitude of the internally generated differential correlations. In the network implementation, \({\bar{s}}_{t}\) and μ_{f} are represented as the positions of λ_{t} and u^{f} on the stimulus subspace, respectively (Eqs. (14) and (20)). The dynamics of Gibbs sampling (Eq. S20 in the Supplementary Information) and the network structure (Eq. (6)) imply that
Note that \(V(\bar{s}\mid {\mu }_{{\mathsf{f}}})\) is constrained by the network connections, in that it is internally generated and shared within the network (for \({w}_{E}^{*} \, > \, 0\)).
An expression for p(r∣s) can be derived similarly; compared with p(r∣u^{f}) (Eq. (44)), it includes an additional term contributing to differential correlations due to fluctuations in the feedforward inputs,
Here the variance, \(V(\bar{s}\mid s)\), in the stimulus feature subspace is a mixture of the internal variability, \(V(\bar{s}\mid {\mu }_{{\mathsf{f}}})\), and the sensory noise, V(μ_{f}∣s) (Eq. (23)). The neuronal response distribution in the coupled networks (Fig. 6a) can be obtained similarly (see the Supplementary Information).
A spiking network model with excitatory and inhibitory Poisson neurons
To test the inference mechanisms proposed for the network consisting solely of E neurons (Eqs. (10)–(37)), we simulated a well-studied recurrently coupled cortical model^{21,22}. The network consisted of N_{E} excitatory (E) and N_{I} inhibitory (I) spiking neurons, with the activity of each neuron modeled as a Hawkes process^{69}. At time t, we represent the response of neuron j in population a ∈ {E, I}, \({{\mathbf{r}}}_{tj}^{a}\), as a spike count drawn from a Poisson distribution with instantaneous firing rate \({{\boldsymbol{\lambda }}}_{tj}^{a}\),
Each neuron has a refractory period of 2 ms after emitting a spike. The firing rate \({{\boldsymbol{\lambda }}}_{tj}^{a}\) is the sum of the feedforward input \({{\mathbf{u}}}_{tj}^{a{\mathsf{f}}}\) and the recurrent input \({{\mathbf{u}}}_{tj}^{a{\mathsf{r}}}\), so that \({{\boldsymbol{\lambda }}}_{tj}^{a}={{\mathbf{u}}}_{tj}^{a{\mathsf{f}}}+{{\mathbf{u}}}_{tj}^{a{\mathsf{r}}}.\) The feedforward inputs are filtered spikes from upstream neurons, \({{\mathbf{u}}}_{tj}^{a{\mathsf{f}}}={\sum }_{n}\eta \left(t-{t}_{jn}^{{\mathsf{f}}}\right),\) where \({t}_{jn}^{{\mathsf{f}}}\) is the time of the nth spike received by neuron j of population a from the feedforward inputs. Here η(t) is the synaptic input profile, modeled as \(\eta (t)=\exp (-t/{\tau }_{d})/{\tau }_{d}\) for t > 0. Throughout, we set the synaptic time constant τ_{d} = 2 ms. To mimic the Poisson-like variability needed to sample a stimulus parameter in the hierarchical generative model (Eqs. (23) and (31)), the recurrent input received by neuron j in population a is defined by
where \({\bar{{{{{{{{\bf{u}}}}}}}}}}_{tj}^{a{\mathsf{r}}}\) is the mean recurrent input at time t given the activities of the presynaptic neurons. The recurrent input in the network is corrupted by noise whose variance equals the mean of the recurrent input. In a physiological network, recurrent noise may be generated by chaotic network dynamics^{36} or synaptic noise^{37,70}. In Eq. (48), the function [⋅]_{+} rectifies negative input, and ξ_{t} is a random variable following a standard Gaussian distribution. The coefficient \({J}_{ij}^{ab}\) is the synaptic weight from neuron j in population b to neuron i in population a. The time \({t}_{kn}^{b}\) is the time of the nth spike fired by neuron k in population b. The parameter N = N_{E} + N_{I} is the total number of neurons in the network. The scaling of the synaptic weights by \(1/\sqrt{N}\) is standard in networks where excitation is balanced by recurrent inhibition^{36}. Finally, the synaptic input profile of the recurrent input, η(t), is, for convenience, the same as the one we chose for the feedforward input. Note that the rectification of recurrent inputs in Eq. (48) introduces errors that cause deviations of the sampling distribution from the true posterior, and hence we chose the recurrent weights to be small (Fig. 5). The rectification only arises when using (continuous) recurrent inputs to sample the stimulus parameter, and does not impact the generality of sampling via (discrete) Poisson spiking variability.
To model the coding of a circular stimulus, such as orientation, the excitatory neurons are arranged on a ring^{22,71}. The preferred stimuli, θ_{j}, of the excitatory neurons are equally spaced on the interval (−180°, 180°], consistent with the range of the latent features (Eq. (21)). Inhibitory neurons are not tuned to the stimulus; their role is to stabilize network responses. Note that the recurrent connections between E neurons are modeled using a Gaussian function decaying with the distance between the stimuli preferred by the two cells, rather than the pure self-connections of the simplified network containing only E neurons (Eq. (30)),
We imposed periodic boundaries on the Gaussian function to avoid boundary effects in the simulations. Although in the generative model we assumed non-periodic feature variables (Eq. (3)), as long as the variances of the associated distributions are smaller than the width of the feature space, the network model with periodic boundaries on the recurrent connections (Eq. (49)) provides a good approximation of the non-periodic Gaussian posterior (Eq. (24)). The weight w_{EE} denotes the average strength of all E-to-E connections. The parameter a = 40° defines the footprint of connectivity in feature space (i.e., on the ring), and L = 360° is the length of the ring manifold (Eq. (21)); multiplication by L in Eq. (49) sets the sum of all E-to-E connection strengths equal to N_{E}w_{EE}. Moreover, the excitatory and inhibitory neurons are all-to-all connected with each other (and similarly for I-to-I connections). For simplicity, we consider the E-to-I, I-to-I, and I-to-E connections to be unstructured (in feature space) and assume that connections of the same type have equal weight, i.e., \({J}_{jk}^{EI}={w}_{EI}\), \({J}_{jk}^{IE}={w}_{IE}\), and \({J}_{jk}^{II}={w}_{II}\). To simplify the network further, we consider connections from the same population of neurons to have the same average weight, i.e., w_{EE} = w_{IE} ≡ w_{E} and w_{II} = w_{EI} ≡ w_{I}. For the feedforward network model shown in Fig. 2, we remove only the recurrent connections between E neurons, i.e., set w_{EE} = 0, while keeping all other connections, including w_{EI}, w_{II}, and w_{IE}, the same as in the recurrent network.
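The normalization described above can be checked by constructing the connectivity of Eq. (49) explicitly. In this sketch the wrapped (nearest-image) distance implements the periodic boundary, and w_EE is an illustrative value:

```python
import numpy as np

# Ring connectivity of Eq. (49): Gaussian footprint over the wrapped feature
# distance, normalized so each row sums to w_EE (total weight N_E * w_EE).
N_E, L, a, w_EE = 180, 360.0, 40.0, 0.1
theta = np.linspace(-L / 2, L / 2, N_E, endpoint=False)    # preferred stimuli on the ring
d = theta[:, None] - theta[None, :]
d = (d + L / 2) % L - L / 2                                # periodic (wrapped) distance
J = w_EE * (L / N_E) * np.exp(-d ** 2 / (2 * a ** 2)) / np.sqrt(2 * np.pi * a ** 2)
```

Each row of J sums (up to the exponentially small tail beyond half the ring) to w_EE, so the total E-to-E weight is N_E·w_EE, as stated above.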
The feedforward inputs applied to E neurons consist of independent Poisson spike counts as described by Eq. (18), with rate \(\langle {{\mathbf{u}}}_{j}^{E{\mathsf{f}}}(s)\rangle={U}^{{\mathsf{f}}}{e}^{-{(s-{\theta }_{j})}^{2}/(4{a}^{2})}\). The inhibitory neurons also receive independent feedforward Poissonian inputs. The firing rate of the input received by every I neuron is proportional to the overall rate of the feedforward input to E neurons, in order to maintain the balance of excitatory and inhibitory activity in the network,
In the simulations, we started with a network of N_{E} = 180 excitatory and N_{I} = 45 inhibitory neurons, and increased the number of neurons by a fixed factor in Fig. 1d. The ratio between the average connection weight from I neurons and that from E neurons was kept fixed at w_{I}/w_{E} = 5. We set the feedforward weight of the input to I neurons to w_{If} = 0.8. We simulated the dynamics of the model network using the Euler method with a time step of 0.1 ms. The typical parameters used in the simulations can be found in Table 1 in the Supplementary Information. Further details about the simulations and the numerical estimates of mutual information and linear Fisher information are also presented in the Supplementary Information.
A spiking network model of coupled neural circuits
In the coupled neural circuits used to infer latent variables organized in parallel (Fig. 6a), the two networks are copies of each other, i.e., they have the same intrinsic parameters. Each network is equivalent to the one described in the previous section, except that there are no recurrent connections between E neurons within the same network, and no variability in the recurrent interactions (no noise in Eq. (48)). The absence of recurrent connections between E neurons within a network is due to the uniform marginal prior over the stimulus. Nevertheless, within each network the E and I neurons are connected using the same connection profile as above to keep the network activity stable. Between the two networks there are only E connections, which target both E and I neurons. The connections between E neurons across networks have the same pattern as that described by Eq. (49), with the peak connection strength from network n to network m denoted \({w}_{EE}^{mn}\). The connections from E neurons in one network to I neurons in the other are, for simplicity, set equal to the peak strength of the E connections across networks, i.e., \({w}_{IE}^{mn}={w}_{EE}^{mn}\). To simplify the network model further, we set the inter-network connections to be symmetric, i.e., \({w}_{EE}^{mn}={w}_{EE}^{nm}\). In the simulations, \({w}_{EE}^{mn}\) was adjusted to determine how the sampling distribution is affected (Fig. 7a).
Comparing the sampling distribution with the posterior in coupled neural circuits
We read out samples from the posterior distribution of each stimulus, \({\tilde{s}}_{mt},\) individually from the spiking activity of the E neurons, r_{mt}, in network m in every 20 ms time window by using a population vector. We used this collection of samples to estimate the mean, \(\langle \tilde{{{{{{{{\bf{s}}}}}}}}}\rangle={(\langle {\tilde{s}}_{1}\rangle,\langle {\tilde{s}}_{2}\rangle )}^{\top }\), and covariance matrix, Σ_{s}, of the sampling distribution. Meanwhile, the mean, μ_{f}, and precision matrix, Λ_{f}, of the likelihood are linearly read out from the feedforward inputs fed into the network model (Eq. (33)).
If the sampling distribution matches the posterior, the sampling mean \(\langle \tilde{{{{{{{{\bf{s}}}}}}}}}\rangle\) and covariance Σ_{s} should satisfy Eq. (34). We use the actual sampling covariance and the likelihood parameters to predict the sampling mean, i.e., \({\langle \tilde{{{{{{{{\bf{s}}}}}}}}}\rangle }_{{{{{{{{\rm{pred}}}}}}}}}={{{{{{{{\boldsymbol{\Sigma }}}}}}}}}_{{{{{{{{\bf{s}}}}}}}}}{{{{{{{{\boldsymbol{\Lambda }}}}}}}}}_{{\mathsf{f}}}{{{{{{{{\boldsymbol{\mu }}}}}}}}}_{{\mathsf{f}}}\), and compare it with the actual \(\langle \tilde{{{{{{{{\bf{s}}}}}}}}}\rangle\) (Fig. 7d–f). To obtain the posterior precision matrix, given the sampling mean \(\langle \tilde{{{{{{{{\bf{s}}}}}}}}}\rangle\) and the likelihood parameters, we vary the single prior precision parameter Λ_{s} to minimize the KL divergence between the posterior predicted using that value of Λ_{s} and the actual sampling distribution. Given this best-fitting Λ_{s}, the predicted posterior precision is computed as K_{pred} = Λ_{s} + Λ_{f} (Eq. (34)), which is then compared with the actual sampling precision matrix (\({{{{{{{{\boldsymbol{\Sigma }}}}}}}}}_{{{{{{{{\bf{s}}}}}}}}}^{-1}\); see Fig. 7c–g). The prior precision, Λ_{s}, is a subjective prior: it reflects the prior stored in the recurrent network and may change with the input (see Discussion). More details of the network simulations and parameters can be found in the Supplementary Information.
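The fitting procedure above can be sketched as a one-dimensional scan over the scalar Λ_{s}. This is an illustrative Python sketch, not the paper's MATLAB code; the direction of the KL divergence and the structure of the prior precision (here assumed to be Λ_{s}·[[1, -1], [-1, 1]], i.e., a prior that couples the two stimuli) are assumptions for the purpose of the example.

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL divergence D(N(mu0, S0) || N(mu1, S1)) between two Gaussians."""
    k = len(mu0)
    iS1 = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(iS1 @ S0) + d @ iS1 @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def fit_prior_precision(mu_s, Sigma_s, Lambda_f, mu_f, grid):
    """Scan the scalar prior precision, predict the posterior via Eq. (34),
    and keep the value minimizing the KL divergence from the predicted
    posterior to the actual sampling distribution.

    Assumed prior-precision structure (hypothetical): lam * [[1, -1], [-1, 1]].
    """
    best = None
    for lam in grid:
        Lambda_s = lam * np.array([[1.0, -1.0], [-1.0, 1.0]])
        K_pred = Lambda_s + Lambda_f                    # posterior precision, Eq. (34)
        mu_pred = np.linalg.solve(K_pred, Lambda_f @ mu_f)
        kl = kl_gauss(mu_pred, np.linalg.inv(K_pred), mu_s, Sigma_s)
        if best is None or kl < best[0]:
            best = (kl, lam, K_pred)
    return best[1], best[2]                             # (fitted lam, K_pred)
```

When the sampling distribution truly follows Eq. (34), the scan recovers the underlying prior precision and the returned K_pred equals the inverse of the sampling covariance.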
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
This is a strictly computational study; all data used to make the figures were generated by computer simulations of the proposed model, with a link to the code given in Code Availability.
Code availability
The network simulation code was written in MATLAB 2018b and can be found on GitHub (https://github.com/wenhaoz/Sampling_PoissSpk_Neuron)^{73}.
References
Pouget, A., Dayan, P. & Zemel, R. S. Inference and computation with population codes. Ann. Rev. Neurosci. 26, 381–410 (2003).
Doiron, B., Litwin-Kumar, A., Rosenbaum, R., Ocker, G. K. & Josić, K. The mechanics of state-dependent neural correlations. Nat. Neurosci. 19, 383–393 (2016).
Goris, R. L., Movshon, J. A. & Simoncelli, E. P. Partitioning neuronal variability. Nat. Neurosci. 17, 858–865 (2014).
Kohn, A., Coen-Cagli, R., Kanitscheider, I. & Pouget, A. Correlations and neuronal population information. Ann. Rev. Neurosci. 39, 237–256 (2016).
Harris, J. A. et al. Hierarchical organization of cortical and thalamic connectivity. Nature 575, 195–202 (2019).
Oh, S. W. et al. A mesoscale connectome of the mouse brain. Nature 508, 207–214 (2014).
Douglas, R. J. & Martin, K. A. Neuronal circuits of the neocortex. Annu. Rev. Neurosci. 27, 419–451 (2004).
Rossi, L. F., Harris, K. D. & Carandini, M. Spatial connectivity matches direction selectivity in visual cortex. Nature 588, 648–652 (2020).
Harris, K. D. & Mrsic-Flogel, T. D. Cortical connectivity and sensory coding. Nature 503, 51–58 (2013).
Ernst, M. O. & Banks, M. S. Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415, 429–433 (2002).
Yuille, A. & Kersten, D. Vision as Bayesian inference: analysis by synthesis? Trends Cogn. Sci. 10, 301–308 (2006).
Lee, T. S. & Mumford, D. Hierarchical Bayesian inference in the visual cortex. JOSA A 20, 1434–1448 (2003).
Körding, K. P. & Wolpert, D. M. Bayesian integration in sensorimotor learning. Nature 427, 244–247 (2004).
Knill, D. C. & Pouget, A. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719 (2004).
Pouget, A., Beck, J. M., Ma, W. J. & Latham, P. E. Probabilistic brains: knowns and unknowns. Nat. Neurosci. 16, 1170 (2013).
Fiser, J., Berkes, P., Orbán, G. & Lengyel, M. Statistically optimal perception and learning: from behavior to neural representations. Trends. Cogn. Sci. 14, 119–130 (2010).
Moreno-Bote, R. et al. Information-limiting correlations. Nat. Neurosci. 17, 1410 (2014).
Dayan, P. & Abbott, L. F. Theoretical Neuroscience, Vol. 806 (MIT Press, 2001).
Averbeck, B. B., Latham, P. E. & Pouget, A. Neural correlations, population coding and computation. Nat. Rev. Neurosci. 7, 358 (2006).
Georgopoulos, A. P., Schwartz, A. B. & Kettner, R. E. Neuronal population coding of movement direction. Science 233, 1416–1419 (1986).
Rubin, D. B., Van Hooser, S. D. & Miller, K. D. The stabilized supralinear network: a unifying circuit motif underlying multi-input integration in sensory cortex. Neuron 85, 402–417 (2015).
Ben-Yishai, R., Bar-Or, R. L. & Sompolinsky, H. Theory of orientation tuning in visual cortex. Proc. Natl Acad. Sci. 92, 3844–3848 (1995).
Somers, D. C., Nelson, S. B. & Sur, M. An emergent model of orientation selectivity in cat visual cortical simple cells. J. Neurosci. 15, 5448–5465 (1995).
Huang, C., Pouget, A. & Doiron, B. D. Internally generated population activity in cortical networks hinders information transmission. Sci. Adv. 8, eabg5244 (2022).
Kersten, D., Mamassian, P. & Yuille, A. Object perception as Bayesian inference. Annu. Rev. Psychol. 55, 271–304 (2004).
Doya, K., Ishii, S., Pouget, A. & Rao, R. P. Bayesian Brain: Probabilistic Approaches to Neural Coding (MIT press, 2007).
Hoyer, P. O. & Hyvärinen, A. Interpreting neural response variability as Monte Carlo sampling of the posterior. In Advances in Neural Information Processing Systems, 293–300 (2003).
Buesing, L., Bill, J., Nessler, B. & Maass, W. Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons. PLoS Comput. Biol. 7, e1002211 (2011).
Savin, C. & Deneve, S. Spatiotemporal representations of uncertainty in spiking neural networks. In NIPS, vol. 27, 2024–2032 (2014).
Orbán, G., Berkes, P., Fiser, J. & Lengyel, M. Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron 92, 530–543 (2016).
Haefner, R. M., Berkes, P. & Fiser, J. Perceptual decisionmaking as probabilistic inference by neural sampling. Neuron 90, 649–660 (2016).
Echeveste, R., Aitchison, L., Hennequin, G. & Lengyel, M. Cortical-like dynamics in recurrent circuits optimized for sampling-based probabilistic inference. Nat. Neurosci. 23, 1138–1149 (2020).
Festa, D., Aschner, A., Davila, A., Kohn, A. & CoenCagli, R. Neuronal variability reflects probabilistic inference tuned to natural image statistics. Nat. Commun. 12, 1–11 (2021).
Hénaff, O. J., Boundy-Singer, Z. M., Meding, K., Ziemba, C. M. & Goris, R. L. Representation of visual uncertainty through neural gain variability. Nat. Commun. 11, 1–12 (2020).
Shadlen, M. N. & Newsome, W. T. The variable discharge of cortical neurons: implications for connectivity, computation, and information coding. J. Neurosci. 18, 3870–3896 (1998).
Vreeswijk, C. V. & Sompolinsky, H. Chaotic balanced state in a model of cortical circuits. Neural Comput. 10, 1321–1371 (1998).
Rosenbaum, R., Rubin, J. & Doiron, B. Short term synaptic depression imposes a frequency dependent filter on synaptic information transfer. PLoS Comput. Biol. 8, e1002557 (2012).
Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).
Ma, W. J., Beck, J. M., Latham, P. E. & Pouget, A. Bayesian inference with probabilistic population codes. Nat. Neurosci. 9, 1432–1438 (2006).
Jazayeri, M. & Movshon, J. A. Optimal representation of sensory information by neural populations. Nat. Neurosci. 9, 690–696 (2006).
Lewicki, M. S. & Sejnowski, T. J. Bayesian unsupervised learning of higher order structure. Adv. Neural Inf. Process. Syst. 9, 529–535 (1996).
Grabska-Barwinska, A., Beck, J. M., Pouget, A. & Latham, P. E. Demixing odors - fast inference in olfaction. Adv. Neural Inf. Process. Syst. 26, 1–9 (2013).
Field, D. J., Hayes, A. & Hess, R. F. Contour integration by the human visual system: evidence for a local “association field”. Vision Res. 33, 173–193 (1993).
Geisler, W. S., Perry, J. S., Super, B. & Gallogly, D. Edge co-occurrence in natural images predicts contour grouping performance. Vision Res. 41, 711–724 (2001).
Cossell, L. et al. Functional organization of excitatory synaptic strength in primary visual cortex. Nature 518, 399–403 (2015).
Kanitscheider, I., Coen-Cagli, R., Kohn, A. & Pouget, A. Measuring Fisher information accurately in correlated neural populations. PLoS Comput. Biol. 11, e1004218 (2015).
Lee, T. S. The visual system’s internal model of the world. Proc. IEEE 103, 1359–1378 (2015).
Vasudeva Raju, R. & Pitkow, Z. Inference by reparameterization in neural population codes. Adv. Neural Inf. Process. Syst. 29, 2029–2037 (2016).
Beck, J. M., Latham, P. E. & Pouget, A. Marginalization in neural circuits with divisive normalization. J. Neurosci. 31, 15310–15319 (2011).
Aitchison, L. & Lengyel, M. The Hamiltonian brain: efficient probabilistic inference with excitatory-inhibitory neural circuit dynamics. PLoS Comput. Biol. 12, e1005186 (2016).
Shivkumar, S., Lange, R., Chattoraj, A. & Haefner, R. A probabilistic population code based on neural samples. In Advances in Neural Information Processing Systems, Vol. 31 (eds Bengio, S. et al.) (Curran Associates, Inc., 2018). https://proceedings.neurips.cc/paper/2018/file/5401acfe633e6817b508b84d23686743-Paper.pdf.
Kanitscheider, I., Coen-Cagli, R. & Pouget, A. Origin of information-limiting noise correlations. Proc. Natl Acad. Sci. 112, E6973–E6982 (2015).
Ponce-Alvarez, A., Thiele, A., Albright, T. D., Stoner, G. R. & Deco, G. Stimulus-dependent variability and noise correlations in cortical MT neurons. Proc. Natl Acad. Sci. 110, 13162–13167 (2013).
Wu, S., Wong, K. M., Fung, C. A., Mi, Y. & Zhang, W. Continuous attractor neural networks: candidate of a canonical model for neural information representation. F1000Research 5, F1000 (2016).
Hennequin, G., Ahmadian, Y., Rubin, D. B., Lengyel, M. & Miller, K. D. The dynamical regime of sensory cortex: stable dynamics around a single stimulustuned attractor account for patterns of noise variability. Neuron 98, 846–860 (2018).
Lange, R. D. & Haefner, R. M. Task-induced neural covariability as a signature of approximate Bayesian learning and inference. PLoS Comput. Biol. 18, e1009557 (2022).
Zhang, K. Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: a theory. J. Neurosci. 16, 2112–2126 (1996).
Deneve, S., Latham, P. E. & Pouget, A. Reading population codes: a neural implementation of ideal observers. Nat. Neurosci. 2, 740–745 (1999).
Wu, S., Amari, S.-I. & Nakahara, H. Population coding and decoding in a neural field: a computational study. Neural Comput. 14, 999–1026 (2002).
Schulz, E., Tenenbaum, J. B., Duvenaud, D., Speekenbrink, M. & Gershman, S. J. Compositional inductive biases in function learning. Cogn. Psychol. 99, 44–79 (2017).
Abbott, L. F., Varela, J., Sen, K. & Nelson, S. Synaptic depression and cortical gain control. Science 275, 221–224 (1997).
Ermentrout, B. Linearization of F-I curves by adaptation. Neural Comput. 10, 1721–1729 (1998).
Coen-Cagli, R., Kohn, A. & Schwartz, O. Flexible gating of contextual influences in natural vision. Nat. Neurosci. 18, 1648–1655 (2015).
Beck, J. M., Ma, W. J., Pitkow, X., Latham, P. E. & Pouget, A. Not noisy, just wrong: the role of suboptimal inference in behavioral variability. Neuron 74, 30–39 (2012).
Churchland, M. M. et al. Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nat. Neurosci. 13, 369–378 (2010).
Maimon, G. & Assad, J. A. Beyond Poisson: increased spiketime regularity across primate parietal cortex. Neuron 62, 426–440 (2009).
Zhang, W., Lee, T. S., Doiron, B. & Wu, S. Distributed sampling-based Bayesian inference in coupled neural circuits. bioRxiv (2020).
Ganguli, D. & Simoncelli, E. P. Implicit encoding of prior probabilities in optimal neural populations. Adv. Neural Inf. Process. Syst. 2010, 658 (2010).
Trousdale, J., Hu, Y., Shea-Brown, E. & Josić, K. Impact of network structure and cellular response on spike time correlations. PLoS Comput. Biol. 8, e1002408 (2012).
Rusakov, D. A., Savtchenko, L. P. & Latham, P. E. Noisy synaptic conductance: bug or a feature? Trends Neurosci. 43, 363–372 (2020).
Wu, S., Hamaguchi, K. & Amari, S.-I. Dynamics and computation of continuous attractors. Neural Comput. 20, 994–1025 (2008).
Wimmer, K., Nykamp, D. Q., Constantinidis, C. & Compte, A. Bump attractor dynamics in prefrontal cortex explains behavioral precision in spatial working memory. Nat. Neurosci. 17, 431 (2014).
Zhang, W.-H. Sampling-based Bayesian inference in recurrent circuits of stochastic spiking neurons. Sampling_PoissSpk_Neuron. https://doi.org/10.5281/zenodo.8088755 (2023).
Acknowledgements
National Institutes of Health grants 1R01MH115557 (K.J.), 1U19NS107613-01 (B.D.), R01EB026953 (B.D.), R01EY034723 (B.D.); National Science Foundation grants DBI-1707400 (K.J.) and DMS-2207647 (K.J.); Vannevar Bush faculty fellowship N00014-18-1-2002 (B.D.); Simons Foundation Collaboration on the Global Brain (B.D.).
Author information
Authors and Affiliations
Contributions
W.-H.Z., S.W., K.J., and B.D. conceived and designed the study. W.-H.Z. developed the ideas and performed the analyses and numerical simulations. W.-H.Z., K.J., and B.D. discussed the results and wrote the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhang, W.-H., Wu, S., Josić, K. et al. Sampling-based Bayesian inference in recurrent circuits of stochastic spiking neurons. Nat. Commun. 14, 7074 (2023). https://doi.org/10.1038/s41467-023-41743-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-023-41743-3
This article is cited by
Brain-inspired artificial intelligence research: A review
Science China Technological Sciences (2024)