Introduction

In an uncertain and changing world, it is imperative for the brain to reliably represent and interpret external stimuli. The cortex is essential for the representation of the sensory world, and it is believed that populations of neurons collectively code for richly structured sensory scenes1. However, two central characteristics of cortical circuits remain to be properly integrated into population coding frameworks. First, neuronal activity in sensory cortices is often noisy, showing significant variability of spiking responses evoked by the same stimulus2,3. In many traditional coding frameworks such spiking variability degrades the representation of stimuli by cortical activity4. Why cortical responses display large spiking variability while isolated cortical neurons can respond reliably remains far from clear. Second, the primary source of synaptic inputs to cortical neurons does not come from upstream centers which convey sensory signals, but rather from recurrent pathways between cortical neurons5,6,7. While such recurrent connections are often organized about a stimulus feature axis8,9, it is not obvious whether or how their presence improves overall representation. We propose a biologically motivated inference coding scheme where these two ubiquitous cortical circuit features, variability in spike generation and recurrent connections, together support a probabilistic representation of stimuli in rich sensory scenes.

Numerous studies have framed sensory processing in the cortex in terms of Bayesian inference (e.g., refs. 10,11,12,13,14,15,16). Specifically, the ‘Bayesian brain’ hypothesis posits that sensory cortex infers and synthesizes a posterior distribution of the latent stimuli which describes the probability of possible stimuli that could have given rise to the sensory inputs. Performing Bayesian inference requires cortex to store an internal model that represents how sensory inputs and external stimuli are generated. Once a sensory input is received, cortical dynamics inverts this internal model in a process termed “analysis-by-synthesis”12, and represents the posterior distributively across neurons and/or across time15,16. In this study, we propose that recurrent connections in cortical circuits store the prior of latent stimuli to produce the posterior distribution when combined with evidence from sensory inputs. Moreover, we posit that Poisson spiking variability provides a source of fluctuations needed for generating random samples from the inferred posterior.

To test these hypotheses, we consider a recurrent circuit model where neurons receive stochastic feedforward inputs which carry information about the external world, and respond with Poisson-distributed spiking activity. We find that such Poissonian spiking provides the variability that allows the network to generate samples from posterior stimulus distributions with differing uncertainties. We use this sampling framework to illustrate circuit-based Bayesian inference given two distinct generative models of stimuli in the external world: one organized hierarchically with a stimulus variable that depends on a latent stimulus parameter, and a second where a pair of latent stimuli are organized in parallel. In both cases, a recurrent circuit is able to generate samples from the joint posterior, and infer the values of the latent variables. We show through both analytic derivation and simulations that recurrent connections represent the correlation structure of these models, and the weight of these connections can be tuned to optimally capture the prior distribution of stimuli in the external world. The stronger the correlation between the latent variables, the stronger the recurrent connections need to be for the network to generate samples from the correct posterior distribution.

Finally, a neural signature of this circuit-based sampling mechanism is internally generated population noise correlations aligned with the stimulus response direction, often referred to as “differential correlations”4,17. In our framework, the amplitude of internally generated differential correlations is determined by the recurrent connection strength, which also determines the prior stored by the circuit. Since optimal inference requires a specific magnitude of recurrent connectivity, differential correlations resulting from such recurrent connectivity are a potential signature of optimal coding. This is in contrast to the deleterious impact of externally generated differential correlations. We thus predict that the correlation structure of the external world shapes recurrent wiring in neural circuits, and is reflected in the pattern of differential noise correlations. We use this logic to provide testable predictions from our framework for sampling-based Bayesian inference by recurrent, stochastic cortical circuits.

Results

Recurrent circuitry and spiking variability do not improve conventional neural codes

We start with the classic example of a sensory stimulus, s, encoded in neuronal population activity, r, from which a stimulus estimate \(\hat{s}\) can be decoded (Fig. 1a, top)18. It is reasonable to expect that neuronal circuitry is adapted to accurately represent ethologically relevant stimuli. However, as we will show next, in simple coding schemes two ubiquitous features of cortical circuits – internal spiking variability and recurrent connectivity – are at best irrelevant for, and in many cases degrade, the accuracy of these representations.

Fig. 1: A network with structured recurrent connections limits the linear Fisher Information (LFI) about external stimuli.
figure 1

a A schematic diagram showing how a stimulus, s, is encoded in the neuronal response, rt. A stimulus estimate, \(\hat{s}_{t}\), can be obtained from rt. b A recurrent ring model (top) where the connections between excitatory neurons depend on their distance along the ring. Blue arrows: excitatory synapses, with line width denoting connection strength; red arrows: inhibitory synapses. c The population activity of excitatory neurons in the ring model, rt, depends on the stimulus, s. The blue curve shows the population activity in response to s = 0, and gray curves the activities in response to stimuli whose values lie at the peak locations of the curves. d For finite-size networks (colored lines; ratio of excitatory to inhibitory neurons kept constant), LFI decreases as wE increases. In the limit of infinite network size, LFI does not depend on wE (dashed line). Since neural responses are variable, LFI in the neuronal response converges to only half of the LFI in the feedforward input. Error bars denote one standard deviation (SD), estimated from N = 50 bootstrap samples.

In population coding frameworks stimuli are encoded by a neuronal population with individual neurons tuned to a preferred stimulus value. The preferred values of all neurons cover the whole range of stimuli18,19,20 (Fig. 1b, bottom); if s ranges over a periodic domain (such as the orientation of a bar in a visual scene, or the direction of an arm reach) then it is commonly assumed that the neurons’ preferred stimuli are distributed on a ring (Fig. 1b, top). To generate neuronal responses from such a population we simulate a network of neurons whose spiking activity, rt, at time t is Poissonian with instantaneous firing rate λt (Eq. (11)). For simplicity we assume linear (or linearized) neuronal transfer and synaptic interactions (Eqs. (10), (11)), so that the firing rate is a linear function of the feedforward and recurrent inputs. We couple excitatory (E) neurons with similar stimulus preferences more strongly8,9 to one another, compared to neuron pairs with dissimilar tuning. In this way, the recurrent E connectivity has the same circular symmetry as the stimulus (Fig. 1b). In contrast, connections between inhibitory (I) neurons are unstructured, and inhibitory activity acts to stabilize network activity21. A stimulus, e.g., s = 0, results in elevated activity of E neurons with the corresponding preference (Fig. S1a). As expected, an increase in the strength of recurrent excitatory connections increases both the firing rates and the trial-to-trial pairwise covariability (i.e., noise correlations) of the responses2 (Fig. S2a). This canonical network model has been widely used to explain cortical network dynamics and neural coding21,22,23, and it produces neuronal responses that are qualitatively similar to experimental observations, including the variance of neuronal firing rates, the Fano factor, and the noise correlations (Fig. S2b–d).
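
As an illustration of this network structure, the following minimal sketch (in Python) simulates a linearized ring of Poissonian neurons. It omits the inhibitory population and the exact parameterization of Eqs. (10), (11) and (49); all parameter values are illustrative assumptions, not those used in this study.

```python
# Minimal sketch (illustrative parameters; inhibition and the exact
# parameterization of Eqs. (10)-(11) are omitted): a linearized ring of
# Poissonian neurons whose rate is the sum of feedforward and recurrent input,
# with E-to-E weights decaying with distance between preferred stimuli.
import numpy as np

rng = np.random.default_rng(0)
n_e, a, w_e = 180, 20.0, 0.3                        # network size, tuning width, total E weight
theta = np.linspace(-90, 90, n_e, endpoint=False)   # preferred stimuli (deg)

d = np.abs(theta[:, None] - theta[None, :])         # circular distance on the ring
d = np.minimum(d, 180 - d)
W = np.exp(-d ** 2 / (2 * a ** 2))
W *= w_e / W.sum(axis=1, keepdims=True)             # each row sums to w_e < 1 (stable)

u_f = 5.0 * np.exp(-(0.0 - theta) ** 2 / (2 * a ** 2))   # feedforward input for s = 0

r = np.zeros(n_e)
lam_trace = []
for _ in range(500):
    lam = u_f + W @ r                               # rate = feedforward + recurrent input
    r = rng.poisson(lam).astype(float)              # Poisson spike generation
    lam_trace.append(lam)

mean_profile = np.mean(lam_trace[100:], axis=0)
print("population activity peaks at", theta[mean_profile.argmax()], "deg")
```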

We use linear Fisher Information (LFI) to quantify the impact of recurrent connectivity and internal spiking variability on the accuracy of the stimulus estimate, \({\hat{s}}_{t},\) from the activity vector rt (see details in Eq. S39 in Supplementary Information). The inverse of LFI provides a lower bound on the expected square of the difference between the true value, s, and the estimate, \({\hat{s}}_{t},\) made by a linear decoder1,4,17,18,19,24. In the limit of an infinite number of neurons available to the decoder LFI is unaffected by recurrent connectivity strength, wE (Fig. 1d, dashed line). This is because the mean response of the network is linear in its inputs, and an (invertible) linear transformation can neither increase nor decrease LFI (see Eq. S38 in Supplementary Information). For networks with a finite number of neurons, the variability from spike generation is shared between neurons via recurrent interactions. Consequently, an increase in coupling strength, wE, reduces LFI in finite networks (Fig. 1d, colored lines).
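
To make the LFI measure concrete, a minimal sketch is given below. It uses the standard estimator \(I_F = \mathbf{f}'^{\top}\Sigma^{-1}\mathbf{f}'\) applied to responses at two nearby stimulus values; the bias-corrected estimator of Eq. S39 is not reproduced, and the function name and parameter values below are assumptions for illustration only.

```python
# Minimal sketch: estimate linear Fisher information I_F = f'^T Sigma^{-1} f'
# from population responses at two nearby stimuli s and s + ds. (Naive
# estimator; finite-trial bias corrections are not included.)
import numpy as np

def linear_fisher_information(resp_s, resp_s_ds, ds):
    """resp_s, resp_s_ds: (trials, neurons) arrays of responses at s and s + ds."""
    f_prime = (resp_s_ds.mean(axis=0) - resp_s.mean(axis=0)) / ds   # tuning derivative
    sigma = 0.5 * (np.cov(resp_s, rowvar=False) + np.cov(resp_s_ds, rowvar=False))
    return float(f_prime @ np.linalg.solve(sigma, f_prime))

# Example with synthetic Poisson responses from Gaussian tuning curves
rng = np.random.default_rng(1)
theta = np.linspace(-90, 90, 60)                     # preferred stimuli (deg)
rate = lambda s: 20.0 * np.exp(-(s - theta) ** 2 / (2 * 20.0 ** 2))
r0 = rng.poisson(rate(0.0), size=(2000, theta.size))
r1 = rng.poisson(rate(1.0), size=(2000, theta.size))
print(linear_fisher_information(r0, r1, ds=1.0))
```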

In sum, recurrent connectivity and spiking variability do not improve, and often degrade, stimulus representation in the network (as measured by LFI). Since synaptic coupling is biologically expensive, the network that most accurately and cheaply represents a stimulus is one with no recurrent connections (i.e., wE = 0) and minimal spiking variability. Nevertheless, connectivity in mammalian cortex is highly recurrent5,6,7,9, and neural responses are highly variable2,3. What, then, is the function of these extensive recurrent connections between cortical neurons in representing information, and why are cortical responses so noisy?

While classical population code theory often explains how to generate point estimates of a stimulus (Fig. 1a), numerous studies suggest that the brain performs Bayesian inference to synthesize and estimate the probability distribution of latent stimuli from sensory inputs (e.g., refs. 10,11,12,13,14,15,25,26). To compute this posterior a neural circuit needs to combine a stored representation of the prior distribution of the stimulus with the likelihood conveyed by feedforward inputs. We propose that recurrent connectivity can be used to represent the prior and spiking variability can generate samples from this posterior distribution. Before we present our full model we first show how sampling-based inference can be implemented in a population of spiking neurons.

Internally generated Poisson spiking variability drives sampling-based Bayesian inference

Many studies suggest that neuronal response variability is a signature of sampling-based Bayesian inference in neural circuits (e.g., refs. 16,27,28,29,30,31,32,33,34). In these studies, the instantaneous population response, rt, represents a sample of a latent stimulus, and the empirical distribution of stimulus samples collected over time approximates the posterior distribution. Implementing sampling requires a network that generates variable output with stable statistics. It has been well documented that cortical spiking responses are often approximately Poissonian3,35. Theoretical studies suggest that such Poissonian variability can be internally generated in a network with dynamically balanced recurrent excitation and inhibition36,37. We thus assumed that our model neurons are Poissonian, and used the resulting fluctuations as the internal source of variability needed for sampling-based Bayesian inference. It remains to be shown whether discrete Poissonian variability can be used to generate samples of stimuli with continuous probability distributions (e.g., orientation, motion direction) with the flexibility needed to represent different stimulus uncertainties. Moreover, spike counts are discrete, and it is possible that errors arising from representing continuous parameters by discrete random variables are characteristic of stimulus inference by animals that use sampling.

We address this question using a theory based on a simple model network composed of excitatory (E) Poissonian neurons (Eqs. (10), (11)), and subsequently support our findings by simulating a network containing both E and inhibitory (I) neurons (e.g., Fig. 1b). We start by showing that Poissonian spiking in a population of tuned neurons can drive sampling from a well-defined distribution. We assume that the instantaneous firing rates of a population of E neurons, λt, have a bell-shaped (Gaussian) profile (Fig. 2b), so that for the jth neuron \(\lambda_{tj}=R\exp[\mathbf{h}_{j}(\bar{s}_{t})]=R\exp[-(\bar{s}_{t}-\theta_{j})^{2}/2a^{2}]\) (see Eq. (12) in Methods). Here θj is the preferred stimulus of neuron j, a is the width of the tuning curve, and \(\bar{s}_{t}\) is the location of the peak of the firing rate profile, λt, in stimulus space (x-axis in Fig. 2b). Note that the value of \(\bar{s}_{t}\) is arbitrary here, but we will later relate it to the input to the population. Finally, the preferred stimuli of the E neurons, \(\{\theta_{j}\}_{j=1}^{N_{E}}\), are uniformly distributed over the stimulus range (Fig. 1b). In each time interval the population activity is given by a vector of independent Poisson random variables, rt, with means determined by the instantaneous firing rate vector λt (Fig. 2b, c). At each time, t, this spiking activity produces a stimulus sample, \(\tilde{s}_{t}\), from the probability distribution determined by the instantaneous firing rates, λt (Fig. 2d, see Methods),

$$\tilde{s}_{t} \sim p(\tilde{s}\mid \boldsymbol{\lambda}_{t}) \propto \exp[\mathbf{h}(\tilde{s})^{\top}\boldsymbol{\lambda}_{t}] \propto \mathcal{N}(\tilde{s}\mid \bar{s}_{t},\Lambda^{-1}).$$
(1)

With the Gaussian firing rate profile we use here, the stimulus sample, \(\tilde{s}_{t}\), can be read out as \(\tilde{s}_{t}={\sum}_{j}\mathbf{r}_{tj}\theta_{j}/{\sum}_{j}\mathbf{r}_{tj}\) (Eq. (14) and Fig. 2d), which can be thought of as the location of the response, rt, in stimulus space (y-axis in Fig. 2c). The collection of stimulus samples across time (\(\{\tilde{s}_{t}\}\); Fig. 2e) determines the sampling distribution \(q(s)={T}^{-1}{\sum}_{t}\delta(s-\tilde{s}_{t})\), which approximates the distribution p(s∣λt), i.e., p(s∣λt) ≈ q(s)16,38. Here, δ( ⋅ ) is the Dirac delta function and T is the number of samples. We assumed that instantaneous population firing rates are smooth to simplify the analysis, but this assumption is not essential. Sampling driven by Poissonian variability will work as long as the temporally averaged population firing rate is smooth, even if the instantaneous population firing rate is noisy (see Eq. (17)).
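
The following minimal sketch illustrates Eq. (1) numerically: independent Poisson counts drawn around a Gaussian rate profile, read out with the centre-of-mass decoder above, yield samples whose variance shrinks as the peak rate R grows. Parameter values are illustrative assumptions.

```python
# Minimal sketch of Eq. (1): Poisson spikes around a Gaussian rate profile give
# stimulus samples centred on s_bar, with precision growing with the peak rate R.
import numpy as np

rng = np.random.default_rng(2)
theta = np.linspace(-90, 90, 180)      # preferred stimuli theta_j (deg)
a, s_bar = 20.0, 0.0                   # tuning width and peak location of lambda_t

def draw_samples(R, n_steps=5000):
    lam = R * np.exp(-(s_bar - theta) ** 2 / (2 * a ** 2))   # lambda_t (fixed here)
    r = rng.poisson(lam, size=(n_steps, theta.size))         # r_t
    r = r[r.sum(axis=1) > 0]                                 # discard silent time bins
    return (r @ theta) / r.sum(axis=1)    # s_tilde = sum_j r_j theta_j / sum_j r_j

for R in (1.0, 4.0):
    s_tilde = draw_samples(R)
    print(f"R = {R}: mean {s_tilde.mean():+.2f}, variance {s_tilde.var():.2f}")
# The sampling variance scales roughly as 1/R, i.e., precision is proportional
# to the peak firing rate, as in Fig. 2f, g.
```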

Fig. 2: Spike generation with Poissonian variability can support sampling-based Bayesian inference.
figure 2

a We use a feedforward network model (no recurrent connections) to demonstrate how spiking variability drives sampling. Neurons receive feedforward inputs, uf, modeled as independent Poisson spike trains, resulting in a Poissonian population response, rt, with means determined by the instantaneous firing rate vector, λt. (b–e) Demonstration of sampling via stochastic spike generation. A population of neurons with Gaussian tuning and firing rates λt (b) generates a realization of a population response, rt (c). A sample from the posterior distribution of the stimulus (d, orange box) can be linearly read out from the population response (c, orange box). e The sampling distribution is obtained by collecting stimulus samples over time. The profile of the population firing rates (f) determines the sampling distribution (g). The position of the population firing rate profile, \(\bar{s}_{t}\), determines the mean of the sampling distribution, and the variance of the sampling distribution is inversely proportional to the peak firing rate, R. We show two population activity profiles, one in blue and the other in orange, to illustrate these points. h In an E-I network, the precision of the sampling distribution (the inverse of the sampling variance) read out from E neurons increases with the height of the firing rate profile, and is consistent with the likelihood directly read out from the feedforward input.

To use this mechanism to produce samples from the posterior distribution of a stimulus, we must define a generative model for the feedforward inputs evoked by a stimulus. We take the feedforward input to the neural population, uf, to be a vector of independent Poisson spike counts with Gaussian tuning over the stimulus, s. Following assumptions widely used in previous studies of probabilistic population codes (PPC)39,40, we assume that the mean input spike count to the jth excitatory neuron in the population is \(\langle \mathbf{u}_{j}^{\mathsf{f}}(s)\rangle \propto \exp[\mathbf{h}_{j}(s)]=\exp[-(s-\theta_{j})^{2}/2a^{2}]\). A single realization of the input, uf, in a time interval encodes the whole likelihood function over the stimulus, p(uf∣s)39. This likelihood is proportional to a Gaussian due to the Gaussian profile of the feedforward input (Eq. (19)),

$$p(\mathbf{u}^{\mathsf{f}}\mid s) = \mathop{\prod}\limits_{j=1}^{N_{E}}\mathrm{Poisson}\left[\langle \mathbf{u}_{j}^{\mathsf{f}}(s)\rangle\right] \propto \exp\left[\mathbf{h}(s)^{\top}\mathbf{u}^{\mathsf{f}}\right] \propto \mathcal{N}(s\mid \mu_{\mathsf{f}},\Lambda_{\mathsf{f}}^{-1}).$$
(2)

Here the likelihood mean, μf, is determined by the location of uf in stimulus space, and the precision, Λf, is proportional to the spike count (or height) of uf (Eq. (20)). Since a realization of the feedforward input encodes the whole likelihood function, we present a fixed uf to the network over time (dropping the time index t), and describe how samples from the posterior p(s∣uf) are generated by the network.
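
As a small numerical check of this encoding, the sketch below draws one realization of uf and reads out the likelihood mean from its position and the likelihood precision from its total spike count (up to the constants of Eq. (20), which are not reproduced); all parameter values are illustrative assumptions.

```python
# Minimal sketch: a single realization of u^f encodes a Gaussian likelihood
# over s, with mean given by the input's position in stimulus space and
# precision proportional to the total spike count (cf. Eq. (2)).
import numpy as np

rng = np.random.default_rng(3)
theta = np.linspace(-90, 90, 180)                    # preferred stimuli (deg)
a, s_true, gain = 20.0, 10.0, 2.0                    # illustrative values
u_f = rng.poisson(gain * np.exp(-(s_true - theta) ** 2 / (2 * a ** 2)))

mu_f = (u_f @ theta) / u_f.sum()                     # likelihood mean
prec_f = u_f.sum() / a ** 2                          # likelihood precision (up to constants)
print(f"mu_f = {mu_f:.2f}, likelihood s.d. = {1 / np.sqrt(prec_f):.2f}")
```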

A simple example of inference via sampling is provided by a population of E neurons without recurrent connections and with instantaneous firing rates equal to the feedforward input, λt = uf (Eq. (10)), and hence constant in time (Fig. 2a). In this feedforward network Poisson spike generation produces samples from the normalized likelihood, i.e., \(\tilde{s}_{t} \sim p(\tilde{s}\mid \boldsymbol{\lambda}_{t})\propto p(\mathbf{u}^{\mathsf{f}}\mid \tilde{s})\), and consequently the network represents a uniform stimulus prior (i.e., p(s) is constant).

To test our theory, we simulated the response of a network of tuned excitatory (E) and untuned inhibitory (I) neurons (Fig. 2a, c) to a fixed but randomly generated feedforward input (Eq. (18)). While the E neurons shared no recurrent connections, the E and I neurons were connected to maintain stable network activity. To confirm that the overall firing rate dictates the sampling variability (Eq. (1)), we increased the feedforward input rate, which reduced the width of the likelihood (Eq. (2)). As a result, the sampling precision (inverse of the sampling variance) increased and matched the precision of the likelihood (Fig. 2g, h), even as the normalized response variability (measured by the Fano factor) of single neurons remained unchanged.

While the above analysis introduces the key components of a sampling-based theory of inference, stimulus sampling using a feedforward network is unnecessary: A single observation of the response r in a deterministic feedforward network (r = uf after removing spike generation in Eq. (11)) would also represent the whole likelihood39, avoiding the costly process of collecting samples \({\tilde{s}}_{t}\) across time. We next consider more interesting cases, and show that spiking variability in recurrent networks can drive sampling from more complex posterior distributions.

Recurrent cortical circuit samples a hierarchical generative model

Recurrent networks can store a variety of generative model structures; to demonstrate the generality of our sampling framework we provide two example generative models which serve as building blocks for more complex models. We first consider a two-stage hierarchical model of the feedforward inputs received by the cortical circuit (Fig. 3a). The first stage of our model consists of a stimulus, s, and a stimulus parameter, z, both of which are one-dimensional for simplicity. The structure of the world is described by the joint distribution, p(s, z). Using the visual system as motivation, s could be the orientation of the visual texture within the classical receptive field (local information) of a hypercolumn of V1 neurons, while the stimulus parameter, z, may refer to the context orientation within the non-classical receptive field of these cells (Fig. 3a). The likelihood of the stimulus given the parameter, \(p(s\mid z)=\mathcal{N}(s\mid z,\Lambda_{s}^{-1})\), is Gaussian with precision Λs. For simplicity, we assume that the prior, p(z), is uniform, which implies that the marginal prior of s is also uniform (Fig. 3b). This assumption is not essential for our main conclusions but does simplify the analysis. Importantly, the joint prior of stimulus and stimulus parameter, p(s, z), can have non-trivial structure, with the density concentrated around the diagonal s = z (Fig. 3b). The precision Λs measures how strongly z and s are related, and thus determines how tightly their joint distribution is concentrated around the diagonal.

Fig. 3: A hierarchical generative model and posterior inference via Gibbs sampling.
figure 3

a An example of sensory feedforward input generation: the stimulus parameter, z, is the orientation of the tree trunk, and the stimulus, s, is the orientation of the bark texture located in the classical receptive field of a V1 hypercolumn. The recurrent circuit generates samples from the joint posterior over stimulus and stimulus parameter. Solid circles: observations and responses in the brain; dashed circles: latent variables in the external world. Natural image adapted from Tkačik, G. et al. Natural images from the birthplace of the human eye. PLoS ONE 6, e20409 (2011). b The joint prior over the stimulus parameter, z, and stimulus, s, is concentrated on the diagonal. The correlation between stimulus parameter and stimulus is determined by the precision Λs. c The posterior over stimulus parameter and stimulus can be approximated via Gibbs sampling (Eqs. (4a)–(4c)) by iteratively generating samples of s and z from their respective conditional distributions. d The resulting approximations of the joint and marginal posteriors over s and z. Light blue contour: the posterior distribution (Eq. (24)); red dots: samples obtained using Gibbs sampling. The green and orange projections are the marginal posterior distributions of s and z, respectively.

The second stage of the generative model describes how the feedforward input depends on the stimulus, s; this is identical to our prior treatment (See Eq. (2)). Combining these two stages provides a complete description of the generative model for the feedforward input received by neurons in the population,

$$p(\mathbf{u}^{\mathsf{f}}\mid s)\,p(s\mid z)\,p(z) \propto \mathop{\prod}\limits_{j=1}^{N_{E}}\mathrm{Poisson}\left(\mathbf{u}_{j}^{\mathsf{f}}\mid s\right)p(s\mid z) \propto \mathcal{N}(s\mid \mu_{\mathsf{f}},\Lambda_{\mathsf{f}}^{-1})\,\mathcal{N}(s\mid z,\Lambda_{s}^{-1}).$$
(3)

Given this hierarchical model, we can show that the joint posterior over stimulus and stimulus parameter, p(s, z∣uf), is a bivariate normal distribution (see Eq. (24)), and we next use it to evaluate the accuracy of the sampling distribution.

Gibbs sampling of the joint posterior of stimulus and stimulus parameter

One approach to approximate the joint distribution over stimulus and stimulus parameter is Gibbs sampling31,38,41,42 which starts with an initial guess for the value of the two latent variables, and proceeds by alternately generating samples of one variable from the distribution conditioned on the value of the second variable. More precisely, to approximate the joint posterior of s and z (Eq. (3)), Gibbs sampling proceeds by generating a sequence of samples, \(({\tilde{s}}_{t},{\tilde{z}}_{t})\) indexed by time t, through recursive iteration of the following steps (Fig. 3c and Eq. (25)),

$$\mathrm{Compute:}\quad p(\tilde{s}\mid \tilde{z}_{t},\mathbf{u}^{\mathsf{f}}) \propto p(\mathbf{u}^{\mathsf{f}}\mid \tilde{s})\,p(\tilde{s}\mid \tilde{z}_{t}) \equiv \mathcal{N}(\tilde{s}\mid \bar{s}_{t},\Lambda^{-1}),$$
(4a)
$$\mathrm{Sample:}\quad \tilde{s}_{t} \sim p(\tilde{s}\mid \tilde{z}_{t},\mathbf{u}^{\mathsf{f}}),$$
(4b)
$$\mathrm{Sample:}\quad \tilde{z}_{t+\Delta t} \sim p(\tilde{z}\mid \tilde{s}_{t}) = \mathcal{N}(\tilde{z}\mid \tilde{s}_{t},\Lambda_{s}^{-1}).$$
(4c)

Here Δt is the time increment between successive samples. The samples (red dots in Fig. 3d) are generated by alternately fixing the values of the two variables, so that sampling trajectories alternate between horizontal and vertical jumps (cyan lines in Fig. 3d). The empirical distribution of samples, i.e., \(q(s,z\mid \mathbf{u}^{\mathsf{f}})={T}^{-1}{\sum}_{t}\delta[(s,z)^{\top}-(\tilde{s}_{t},\tilde{z}_{t})^{\top}]\), with ⊤ denoting vector transpose, approximates the joint posterior p(s, z∣uf) (blue contour map in Fig. 3d, Eq. (24))38. To approximate p(s∣uf), the marginal posterior distribution of s, we can use only the samples \(\tilde{s}_{t}\) to obtain the approximating distribution q(s∣uf) (compare the two green lines at the margin in Fig. 3d). The same is true for the marginal posterior over z.
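
A minimal sketch of this Gibbs sampler, written directly in terms of the Gaussian conditionals implied by Eq. (3) (the spiking implementation comes later), is given below; the values of μf, Λf and Λs are arbitrary illustrative assumptions.

```python
# Minimal sketch of Eqs. (4a)-(4c): Gibbs sampling of p(s, z | u^f) for the
# hierarchical model of Eq. (3), using the Gaussian conditionals directly.
import numpy as np

rng = np.random.default_rng(4)
mu_f, prec_f = 10.0, 0.05      # likelihood mean and precision from u^f (illustrative)
prec_s = 0.05                  # precision coupling s and z in the prior

def gibbs(n_steps=20000, z_init=0.0):
    z, samples = z_init, []
    for _ in range(n_steps):
        prec = prec_f + prec_s                              # Eq. (4a): p(s | z, u^f)
        s_bar = (prec_f * mu_f + prec_s * z) / prec
        s = rng.normal(s_bar, 1.0 / np.sqrt(prec))          # Eq. (4b)
        z = rng.normal(s, 1.0 / np.sqrt(prec_s))            # Eq. (4c)
        samples.append((s, z))
    return np.array(samples)

sz = gibbs()
print("sample mean of s:", sz[:, 0].mean())   # approaches mu_f for a flat prior on z
print("sample covariance:\n", np.cov(sz, rowvar=False))
```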

Implementing Gibbs sampling of stimulus and stimulus parameter in a recurrently coupled cortical circuit

An implementation of Gibbs sampling in a recurrent E circuit can be intuitively understood by comparing the recurrent network dynamics (Fig. 4a) with the dynamics described by the Gibbs sampling algorithm (Fig. 3c). In the recurrent network a stimulus sample, \(\tilde{s}_{t}\), is represented by the activity of E cells, rt, while a stimulus parameter sample, \(\tilde{z}_{t}\), is represented by recurrent inputs, \(\mathbf{u}_{t}^{\mathsf{r}}\). To generate correct samples we require that the conditional distribution that is represented by the instantaneous firing rate, λt (Eq. (1)), matches the conditional distribution used in the Gibbs sampling algorithm (Eq. (4b)), so that \(p(\tilde{s}\mid \tilde{z}_{t},\mathbf{u}^{\mathsf{f}})=p(\tilde{s}\mid \boldsymbol{\lambda}_{t})\propto \exp[\mathbf{h}(\tilde{s})^{\top}\boldsymbol{\lambda}_{t}]\). Equating the two distributions (see Eqs. (4a) and (10)) yields the relation,

$$\ln p(\tilde{s}\mid \tilde{z}_{t},\mathbf{u}^{\mathsf{f}}) = \ln p(\mathbf{u}^{\mathsf{f}}\mid \tilde{s}) + \ln p(\tilde{s}\mid \tilde{z}_{t}) \iff \mathbf{h}(\tilde{s})^{\top}\boldsymbol{\lambda}_{t} = \mathbf{h}(\tilde{s})^{\top}\mathbf{u}^{\mathsf{f}} + \mathbf{h}(\tilde{s})^{\top}\mathbf{u}_{t}^{\mathsf{r}}.$$
(5)

This equation holds when two constraints are satisfied: First, the firing rate vector, λt, needs to have a Gaussian profile peaked at \(\bar{s}_{t}\), i.e., the mean of \(p(\tilde{s}\mid \tilde{z}_{t},\mathbf{u}^{\mathsf{f}})\) (Eq. (4a)). Second, the peak firing rate, R, needs to be proportional to the precision of \(p(\tilde{s}\mid \tilde{z}_{t},\mathbf{u}^{\mathsf{f}})\), i.e., R ∝ Λ (see Fig. 2f, g). In a neural circuit one way for λt to satisfy these constraints is for the feedforward inputs, uf, and recurrent inputs, \(\mathbf{u}_{t}^{\mathsf{r}}\), to both have Gaussian profiles with the same width, a, as that of λt (by sharing the same \(\mathbf{h}(\tilde{s})\), Eqs. (5) and (12)). This is because the sum of two Gaussian-profile inputs with the same width, a, gives a firing rate, λt, with the same tuning width, as long as the difference between the locations of the two inputs is much smaller than the width, a. Our generative model (Eq. (3)) produces a feedforward input, uf, with a Gaussian profile that encodes the likelihood function \(p(\mathbf{u}^{\mathsf{f}}\mid \tilde{s})\). The recurrent input, \(\mathbf{u}_{t}^{\mathsf{r}}\), then needs to represent the conditional distribution \(p(\tilde{s}\mid \tilde{z}_{t})\). Hence, to satisfy Eq. (5) the recurrent input \(\mathbf{u}_{t}^{\mathsf{r}}\) should have the same Gaussian profile as uf (Eq. (29)), with its location and magnitude determined by the mean and precision of \(p(\tilde{s}\mid \tilde{z}_{t})\), respectively.
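
A small numerical check of this argument, with illustrative numbers, is shown below: summing two inputs that share the same Gaussian tuning width moves the peak of λt to the precision-weighted mean of the two encoded Gaussians, provided the peak offset is small relative to a.

```python
# Minimal numerical check of the logic behind Eq. (5): adding two inputs with
# the same Gaussian tuning width approximately multiplies the Gaussians they
# encode (precision-weighted mean, summed precision).
import numpy as np

theta = np.linspace(-90, 90, 1801)
a = 20.0
profile = lambda mu, amp: amp * np.exp(-(mu - theta) ** 2 / (2 * a ** 2))

u_f = profile(mu=5.0, amp=3.0)       # feedforward input (encodes a Gaussian at +5)
u_r = profile(mu=-5.0, amp=1.0)      # recurrent input (encodes a Gaussian at -5)
lam = u_f + u_r                      # instantaneous firing rate lambda_t

position = (lam @ theta) / lam.sum()                 # location of lambda_t in stimulus space
weighted = (u_f.sum() * 5.0 + u_r.sum() * (-5.0)) / (u_f.sum() + u_r.sum())
print(position, weighted)            # both close to the precision-weighted mean (+2.5)
```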

Fig. 4: A recurrent circuit generates samples from the posterior defined by a hierarchical generative model.
figure 4

a Schematic of the recurrent circuit dynamics, in which the stimulus, s, and the stimulus parameter, z, are encoded respectively in the population response, rt, and the recurrent inputs, \(\mathbf{u}_{t}^{\mathsf{r}}\). b, c When the feedforward inputs and recurrent inputs share the same tuning profile, summing the two inputs to define the instantaneous firing rate (b) is equivalent to multiplying the conditional distributions encoded by the two inputs to obtain the conditional distribution of the stimulus, \(p(s\mid \tilde{z}_{t},\mathbf{u}^{\mathsf{f}})\). c The conditional distributions of the stimulus can be explicitly read out from the corresponding population responses by a linear decoder (b). (d–f) Reading out the joint sampling distribution from the recurrent circuit. The projections of the spiking activity (Eq. (14)) and of the recurrent inputs (Eq. (29)) onto the stimulus subspace (black curves) can be read out linearly from the population activity and interpreted as samples of the stimulus and stimulus parameter, respectively (Eqs. (4b), (4c)). Top right insets: the empirical marginal distributions of samples and the marginal posteriors (smooth lines). f The joint values (red dots) of the instantaneous samples of the stimulus (black curve on the surface in d) and the stimulus parameter (black curve on the surface in e) represent samples from the joint posterior of the stimulus and stimulus parameter. The true joint posterior is represented by the blue contour.

If recurrent interactions are absent (setting \(\mathbf{u}_{t}^{\mathsf{r}}=0\)), then network activity, rt, generates samples from the normalized likelihood, \(p(\mathbf{u}^{\mathsf{f}}\mid \tilde{s})\), as we showed previously when describing feedforward networks (Fig. 2). When neurons only receive recurrent inputs (setting uf = 0), the network generates samples from the conditional distribution \(p(\tilde{s}\mid \tilde{z}_{t})\). Driven by the sum of recurrent and feedforward inputs, the network generates samples from the distribution given by the product of the conditional distributions encoded by the two inputs (Fig. 4b, c).

The recurrent weights must be adjusted so that the recurrent input has the appropriate magnitude and width to encode the conditional distribution p(s∣z). To simplify the exposition we first assume that E neurons are only self-connected, so that the width of the recurrent input trivially matches that of the feedforward input (otherwise recurrence will broaden the profile of the firing rate activity λt over the network). To constrain the magnitude of the recurrent weights we require that the sum of the recurrent inputs satisfies \({\sum}_{j}\mathbf{u}_{tj}^{\mathsf{r}}\propto \Lambda_{s}\). Since \(\mathbf{u}_{j}^{\mathsf{r}}={w}_{E}\mathbf{r}_{j}\) and the widths of \(\mathbf{u}_{j}^{\mathsf{r}}\) and rj are equal, the magnitude of the recurrent weights that result in samples from the correct posterior must satisfy:

$${w}_{E}^{*} = \frac{\langle \mathbf{u}_{j}^{\mathsf{r}}\rangle}{\langle \mathbf{r}_{j}\rangle} = \frac{\langle {\sum}_{j}\mathbf{u}_{j}^{\mathsf{r}}\rangle}{\langle {\sum}_{j}\mathbf{r}_{j}\rangle} = \frac{\Lambda_{s}}{\Lambda_{\mathsf{f}}+\Lambda_{s}},$$
(6)

where Λs and Λf are the precisions of p(s∣z) and p(uf∣s), respectively (Eq. (3)). The optimal recurrent weight, \({w}_{E}^{*}\), thus encodes the correlation between the stimulus s and the stimulus parameter z. An increase in the correlation between s and z, resulting in a narrower diagonal band in p(s, z) (Fig. 3b), requires an increase in the recurrent weight \({w}_{E}^{*}\) for optimal sampling. When the stimulus parameter and stimulus are uncorrelated, so that Λs = 0, the hierarchical generative model (Fig. 3a) is equivalent to the generative model without a stimulus parameter (Fig. 2a), and recurrent interactions are not needed for sampling (i.e., \({w}_{E}^{*}=0\)). Moreover, the optimal recurrent weight also depends on the likelihood precision Λf, which is determined by the input spike count. Hence, the optimal weight needs to be adjusted depending on the feedforward inputs so that samples from the correct posterior are generated (see Discussion for how this feature impacts network sampling). Overall, our framework (Eq. (6)) predicts that optimal Bayesian inference is achieved with recurrent synaptic weights that depend on the correlative structure of the external world. We numerically test this prediction in the next section.
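
Equation (6) can be illustrated with a few numbers (arbitrary illustrative values): the optimal weight vanishes when Λs = 0 and approaches 1 as the prior coupling dominates the feedforward precision.

```python
# Worked illustration of Eq. (6): w_E* = Lambda_s / (Lambda_f + Lambda_s).
def optimal_weight(prec_s, prec_f):
    return prec_s / (prec_f + prec_s)

for prec_s in (0.0, 0.05, 0.1, 0.2):                    # illustrative prior precisions
    print(prec_s, optimal_weight(prec_s, prec_f=0.1))   # grows from 0 towards 1
# prec_s = 0 (uncorrelated s and z) gives w_E* = 0: no recurrence is needed,
# while a stronger prior coupling, or a weaker feedforward input, demands
# stronger recurrent weights.
```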

A stochastic E-I spiking network jointly samples stimulus and stimulus parameter

To confirm the predictions of this analysis, we simulated a full recurrent network consisting of both E and I neurons with Poisson spike train statistics (see details in Eqs. (47)–(50)). The E neurons were synaptically connected to each other (Eq. (49), see Fig. 1a), in contrast to the simple network of self-connected E neurons described above. While recurrent E-to-E coupling broadens the tuning of the excitatory recurrent input, lateral inhibition can sharpen the Gaussian firing rate profile so that it matches that of the feedforward inputs (as required by Eq. (5)).

The activity of the recurrent network in response to a fixed but randomly generated feedforward input (Eq. (3)) can be decoded to produce samples from the bivariate posterior distribution of the stimulus and stimulus parameter. As above, samples from the conditional stimulus distribution are represented by the activity of E neurons (Eq. (14)), while samples from the conditional stimulus parameter distribution are represented by the recurrent inputs received by E neurons (Eq. (29); black curves overlaid on top of the population responses in Fig. 4d, e, respectively). To update the recurrent inputs we only used neuronal activity at the previous time step. Thus, the activities of E neurons and their recurrent inputs were updated in alternation, consistent with Gibbs sampling. The trajectory obtained by plotting the stimulus sample read out from the network activity on one axis, and the stimulus parameter sample read out from the recurrent E inputs on the other axis, then exhibits the characteristics of Gibbs sampling (Fig. 4f, cyan line). The resulting sampling distribution provides a good approximation to the joint posterior of stimulus and stimulus parameter (compare red dots and blue contour in Fig. 4f). Inhibitory neurons again did not respond selectively to either the stimulus or the stimulus parameter.

For the network to generate samples from the joint posterior, the recurrent connectivity should depend on the correlation between the stimulus and the stimulus parameter (Eq. (6)). To verify this prediction, we fixed the generative model (Eq. (3)) and changed only the recurrent weights in the network. For simplicity, we only varied the peak E weight, wE (Eq. (49)), and maintained network stability by fixing the ratio between E and I synaptic weights. While increasing wE did not change the sampling mean, it did increase the variance of the stimulus parameter sampling distribution, and increased the correlation between stimulus and stimulus parameter samples (Fig. 5a).

Fig. 5: The joint sampling distribution of stimulus and stimulus parameter changes with the recurrent weight in the network.
figure 5

a The sampling distribution for different recurrent excitatory weights, wE. The ratio of excitatory and inhibitory weights was fixed. Ellipses capture three standard deviations from the mean of the joint sampling distribution. Different colors correspond to the three values of wE denoted by different symbols in b. b The mutual information between the latent variables, s and z, and the feedforward inputs for an ideal Bayesian observer (black horizontal line) and for the sampling distribution generated by the network model (blue curve). The difference between the two lines is the KL divergence between the posterior, p(s, z∣uf), and the sampling distribution, q(s, z∣uf). The KL divergence is minimized when the recurrent weight is set to the value, \({w}_{E}^{*}\), at which the sampling distribution, q, best matches the true posterior, p (black circle). c This optimal weight, \({w}_{E}^{*}\), increases with the prior precision, Λs.

We use the Kullback–Leibler (KL) divergence to measure the distance between the sampling distribution, q(s, z∣uf), and the true posterior, p(s, z∣uf) (Eq. (24)). The KL divergence quantifies the loss of mutual information, measured in bits, between the latent variables (s and z) and the feedforward inputs, uf, when the true posterior, p, is approximated by the distribution, q (Eq. (42))38. The mutual information loss in the network is minimized at a unique value of the recurrent weight, \({w}_{E}^{*}\), at which the sampling distribution, q, best matches the posterior, p (Fig. 5b, black circle). To confirm that this optimal recurrent weight, \({w}_{E}^{*}\), increases with the correlation in the prior (precision Λs, Eq. (6)), we numerically obtained the recurrent weight that minimizes the mutual information loss for each value of Λs in the generative model. These results confirmed the predictions of our theory (Eq. (6), Fig. 5c): when Λs = 0, i.e., when stimulus parameter and stimulus are uncorrelated, a network with no interactions performs best (\({w}_{E}^{*}=0\)), while for small Λs (relative to Λf) the optimal weight \({w}_{E}^{*}\) is positive and increases with Λs. In total, we have described a potential mechanism by which a recurrent network of spiking neurons can perform sampling-based Bayesian inference.

Generating samples from multi-dimensional posteriors with coupled neural circuits

To demonstrate the generality of the proposed neural code we next consider a world described by a broad, rather than deep (hierarchical), generative model. Information about each of two latent stimuli, s = (s1, s2), is relayed by corresponding feedforward inputs received by a neural circuit (Fig. 6a). We assume the prior is a bivariate Gaussian distribution (Fig. 6b), i.e., \(p(\mathbf{s})\propto \exp[-\Lambda_{s}(s_{1}-s_{2})^{2}/2]\equiv \mathcal{N}(s_{1}-s_{2},\Lambda_{s}^{-1})\), so that Λs ≥ 0 characterizes the correlation between s1 and s2. Furthermore, each stimulus, sm, independently generates feedforward spiking inputs, \(\mathbf{u}_{m}^{\mathsf{f}}\), each of which is received by a separate network and produces responses rm for m = 1, 2 (Fig. 6a). Thus, the full generative model of the input has the form,

$$p(\mathbf{u}^{\mathsf{f}}\mid \mathbf{s})\,p(\mathbf{s}) = \left[\mathop{\prod}\limits_{m=1}^{2}p(\mathbf{u}_{m}^{\mathsf{f}}\mid s_{m})\right]p(s_{1},s_{2}) \propto \left[\mathop{\prod}\limits_{m=1}^{2}\mathcal{N}(s_{m}\mid \mu_{{\mathsf{f}}m},\Lambda_{{\mathsf{f}}m}^{-1})\right]\mathcal{N}(s_{1}-s_{2},\Lambda_{s}^{-1}).$$
(7)

The likelihood \(p(\mathbf{u}_{m}^{\mathsf{f}}\mid s_{m})\) is the same as that given previously (Eq. (2)), where the feedforward inputs, \(\mathbf{u}_{m}^{\mathsf{f}}\), are again described by conditionally independent Poisson spike counts with Gaussian tuning over stimulus sm. As a concrete example, the two stimuli, sm, could represent the orientations of local edges falling in the central receptive fields of a V1 hypercolumn (Fig. 6a, bottom), with each V1 hypercolumn modeled by a network producing the response rm (Fig. 6a, top). Then Λs characterizes the a priori tendency of the stimuli to share similar orientations, and determines how likely two local edges are to be part of a global line, as in the case of contour integration43,44. However, the generative model defined by Eq. (7) is quite general, and has also been used to explain multisensory cue integration10 and sensorimotor learning13.
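
For reference, the bivariate posterior implied by Eq. (7) follows from standard Gaussian algebra; the minimal sketch below (illustrative parameter values; the paper's Eq. (34) is not reproduced) adds the likelihood precisions to the prior coupling term.

```python
# Minimal sketch of the posterior implied by Eq. (7) (cf. Eq. (34)): the two
# Gaussian likelihoods combine with the coupled prior by adding precisions.
import numpy as np

mu_f = np.array([-10.0, 10.0])       # likelihood means mu_f1, mu_f2 (as in Fig. 6c)
prec_f = np.array([0.05, 0.05])      # likelihood precisions (illustrative)
prec_s = 0.05                        # prior coupling Lambda_s (illustrative)

prec_post = np.diag(prec_f) + prec_s * np.array([[1.0, -1.0], [-1.0, 1.0]])
mean_post = np.linalg.solve(prec_post, prec_f * mu_f)
print("posterior mean:", mean_post)                  # pulled from (-10, 10) towards s1 = s2
print("posterior covariance:\n", np.linalg.inv(prec_post))
```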

Fig. 6: Distributed sampling from a multivariate posterior distribution using coupled networks.
figure 6

a Network m (m = 1, 2) receives a feedforward input evoked by a stimulus, sm. The coupling between the two networks represents the stimulus prior. A linear readout from each network, m, can be interpreted as a sample from the posterior of the stimulus, sm. Examples of a prior (b) and likelihood (c). The prior distribution is concentrated around the diagonal line (dashed line), indicating that the two stimuli are more likely to be collinear. In c, μf1 = −10 and μf2 = 10 are the means of the likelihoods of s1 and s2, respectively. d The joint posterior of the stimuli and the corresponding approximate sampling distribution generated by the coupled networks. A sample of each stimulus can be read out individually from the activity of the corresponding network (shown in a). Light blue contour: the posterior distribution (Eq. (34)); red dots: stimulus samples generated by the network.

The posterior is a bivariate Gaussian distribution (Fig. 6d, Eq. (34)) whose mean is shifted from the likelihood mean (Fig. 6c) towards the diagonal line, because of the correlations between the stimuli in the prior (Fig. 6b). We can again use Gibbs sampling to approximate the posterior p(s∣uf) using the following steps,

$$\mathrm{Compute:}\quad p(\tilde{s}_{1}\mid \mathbf{u}_{1}^{\mathsf{f}},\tilde{s}_{2,t-\Delta t}) \propto p(\mathbf{u}_{1}^{\mathsf{f}}\mid \tilde{s}_{1})\,p(\tilde{s}_{2,t-\Delta t}\mid \tilde{s}_{1}),$$
(8a)
$$\mathrm{Sample:}\quad \tilde{s}_{1t} \sim p(\tilde{s}_{1}\mid \mathbf{u}_{1}^{\mathsf{f}},\tilde{s}_{2,t-\Delta t}),$$
(8b)

where \({\tilde{s}}_{1t}\) and \({\tilde{s}}_{2t}\) are instantaneous samples at time t of stimuli s1 and s2, respectively. We only give the steps needed to produce samples from the conditional distribution of s1, as samples from the conditional distribution of s2 can be obtained using the same steps after exchanging indices.

These sampling steps can be implemented distributively in coupled neural circuits using a mechanism similar to the one we described for the hierarchical generative model. The activity of each network, rm, individually represents samples from the (marginal) posterior of sm (Fig. 6a, top). The joint posterior is then approximated by the collection of samples represented by the activity pairs (r1, r2). Taking network m = 1 as an example, the spike response r1t produces a stimulus sample \(\tilde{s}_{1t}\) as long as the instantaneous firing rate λ1t represents the conditional distribution \(p(\tilde{s}_{1}\mid \mathbf{u}_{1}^{\mathsf{f}},\tilde{s}_{2,t-\Delta t})\) (Eq. (8a)). Since the feedforward input, \(\mathbf{u}_{1}^{\mathsf{f}}\), represents the likelihood \(p(\mathbf{u}_{1}^{\mathsf{f}}\mid \tilde{s}_{1})\), to obtain the appropriate firing rates, λ1t, the recurrent input from network 2 to network 1, \(\mathbf{u}_{12,t}^{\mathsf{r}}\), must encode the correct conditional distribution, \(p(\tilde{s}_{2,t-\Delta t}\mid \tilde{s}_{1})\). As in the mechanism we proposed to implement sampling as described by Eq. (5), \(\mathbf{u}_{12,t}^{\mathsf{r}}\) needs to have the same Gaussian profile as the firing rate λ1t; its position in stimulus space should match the mean of \(p(\tilde{s}_{2,t-\Delta t}\mid \tilde{s}_{1})\), i.e., \(\tilde{s}_{2,t-\Delta t}={\sum}_{j}\mathbf{u}_{12,tj}^{\mathsf{r}}\theta_{j}/{\sum}_{j}\mathbf{u}_{12,tj}^{\mathsf{r}}\); and its magnitude must be proportional to the prior precision, \(\Lambda_{s}\propto {\sum}_{j}\mathbf{u}_{12,tj}^{\mathsf{r}}\) (Eq. (39)). Hence, each network can sum the feedforward input and the recurrent input from its counterpart to obtain an update of the instantaneous conditional distribution given by Eq. (8a), and generate independent Poisson spikes to produce a sample from this conditional distribution (Eq. (8b)). Notably, the sample of each stimulus can be read out locally from the corresponding network (Eq. (41), Fig. 6a), even though the activities of the two networks are correlated.

Since the recurrent input strength represents the stimulus correlation in the prior determined by precision Λs, the coupling between the two networks needs to be tuned to generate the appropriate recurrent input. Indeed, in a network with only E neurons, and connections only between neurons with the same preferred stimulus value but in different networks, the optimal homogeneous connection strength is \({w}_{mn}^{*}=\langle \mathbf{u}_{mn,j}^{\mathsf{r}}\rangle/\langle \mathbf{r}_{n,j}\rangle=\Lambda_{s}/(\Lambda_{{\mathsf{f}}n}+\Lambda_{s})\) (Eq. (40)). This mirrors the result obtained with the hierarchical model presented earlier in Eq. (6).
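
The distributed sampler of Eqs. (8a), (8b) can be sketched at the level of the readouts the two networks exchange, as below; spiking dynamics are abstracted away and parameter values are illustrative assumptions.

```python
# Minimal sketch of Eqs. (8a)-(8b): two "networks" alternately combine their own
# likelihood with the latest sample received from the other network and draw a
# new sample; the spiking implementation is abstracted away.
import numpy as np

rng = np.random.default_rng(5)
mu_f = np.array([-10.0, 10.0])       # likelihood means (illustrative)
prec_f = np.array([0.05, 0.05])      # likelihood precisions (illustrative)
prec_s = 0.05                        # prior coupling Lambda_s

def coupled_gibbs(n_steps=20000):
    s = mu_f.copy()
    out = []
    for _ in range(n_steps):
        for m, n in ((0, 1), (1, 0)):
            prec = prec_f[m] + prec_s                          # Eq. (8a)
            mean = (prec_f[m] * mu_f[m] + prec_s * s[n]) / prec
            s[m] = rng.normal(mean, 1.0 / np.sqrt(prec))       # Eq. (8b)
        out.append(s.copy())
    return np.array(out)

samples = coupled_gibbs()
print("sample means:", samples.mean(axis=0))   # pulled from (-10, 10) towards the diagonal
print("sample covariance:\n", np.cov(samples, rowvar=False))
```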

Coupled E-I spiking networks sample bivariate posteriors

To test the feasibility of the proposed mechanism for generating samples from a bivariate posterior, we simulated a pair of bidirectionally coupled circuits consisting of E and I neurons (Fig. 7a). This neural circuit model can be extended to generate samples from higher-dimensional posterior distributions (see Discussion). Each circuit receives feedforward input generated by one of the two stimuli. On every time step the sample of each stimulus, \(\tilde{s}_{mt}\), can be individually and linearly read out from the response of the corresponding network, rmt (Eq. (41)). Jointly, the two stimulus samples, one from each network, \(\tilde{\mathbf{s}}_{t}=(\tilde{s}_{1t},\tilde{s}_{2t})^{\top}\), provide a sample from the joint posterior of the two latent stimuli (Fig. 7b). We assumed that the synaptic connections between the networks, wmn (m, n = 1, 2; m ≠ n), are excitatory, but target both E and I neurons, while inhibitory connections are local to each network. We also adjusted network parameters so that the inputs across networks (e.g., the inputs from network 2 to network 1) have the same tuning profile as the feedforward inputs (see Methods). Since we assumed uniform marginal priors (see Eq. (32)), recurrent connections between E neurons within a circuit were absent, while E and I neurons within a circuit were recurrently connected to ensure network stability. For simplicity, we chose parameters so that the two circuits were symmetric, but the strength of the feedforward inputs to each could differ.

Fig. 7: The statistics of the multivariate sampling distribution of stimuli generated by coupled E-I circuits.
figure 7

a Each of the two circuits individually generates a sample of the corresponding stimulus, which can be read out linearly from that circuit’s activity. Combining the readouts from the two networks yields the joint sampling distribution. The ring color indicates the stimulus sampled by the circuit: green and orange represent the stimuli s1 and s2, respectively. Blue arrows: E synapses with width denoting connection strength; red arrows: I synapses. b The sampling distribution shifts from the likelihood mean to the diagonal line as the coupling between the networks increases. Ellipses capture one standard deviation from the mean of the sampling distribution. Different colors correspond to the three different coupling weights between the circuits shown in c. c The mutual information between the latent variables and the feedforward inputs for the ideal Bayesian observer (black) and for the sampling distributions generated by the network with different coupling weights between the two circuits. d The optimal coupling weight that minimizes information loss increases with the prior precision (which is inversely proportional to the width of the band in Fig. 6b). e The mean and precision of the sampling distribution over the two stimuli change with the coupling weight between the circuits when the feedforward input is fixed. f The mean and precision of the sampling distribution over the two stimuli change with the firing rate of the feedforward input to network 1, with other network parameters fixed. Comparison of the mean (g) and precision (h) of the sampling distributions with those of the posteriors under different combinations of feedforward inputs and coupling weights. Each dot corresponds to a sampling distribution obtained under a different combination of input direction, input strength, and coupling weight between the networks.

We asked whether the activity of the two coupled circuits can generate samples from bivariate posteriors, and how the sampling distribution depends on the coupling, wmn, between the two circuits. An increase in synaptic coupling between the two networks caused the sampling distribution to shift from the likelihood mean towards the diagonal (Fig. 7b), resulting in stimulus samples, \(\tilde{s}_{1t}\) and \(\tilde{s}_{2t}\), that were more similar. This is consistent with an increase in the stimulus correlation in the multivariate prior, Λs (Eq. (7)). To confirm our prediction that the optimal coupling strength between the two networks, \({w}_{mn}^{*}\), increases with the stimulus correlation in the prior, Λs, we numerically obtained the coupling weight that minimizes the loss of mutual information between the latent stimuli and the feedforward inputs (Fig. 7c). The optimal synaptic weight between the circuits increased with the stimulus correlation in the prior. At the optimal weight, \({w}_{mn}^{*}\), the sampling distribution was close to the true posterior, showing that a properly tuned circuit can generate samples from the correct distribution (Fig. 7d).

We next asked how the sampling distribution in the network depends on network and feedforward input parameters. As the coupling between the two circuits increased, the sample means of both stimuli converged (Fig. 7e, top) and the sampling precision of both stimuli increased as well (Fig. 7e, bottom), in agreement with a more correlated stimulus prior. We also tested whether a network with fixed parameters can generate samples from a family of posteriors with different uncertainties. To do so, we changed the uncertainty of the likelihood of s1 by changing the firing rate of the feedforward input \(\mathbf{u}_{1}^{\mathsf{f}}\) received by network 1. We observed that with a narrower likelihood of s1, the sample means of both stimuli shifted towards the mean of the likelihood of s1 (−10°), and the sampling precision increased, consistent with the change in the posterior distribution (Fig. 7f). Lastly, to demonstrate the robustness of this network implementation of sampling-based inference, we compared the sampling distributions to the true posteriors under different combinations of input and network parameters (Fig. 7g, h), in each case setting the recurrent coupling to the optimal value, \({w}_{mn}^{*}\), obtained numerically. Across different parameter values, we observed excellent agreement in both the mean (Fig. 7g) and precision (Fig. 7h) of the two densities. In sum, our recurrent network of spiking neurons can be extended to support sampling-based Bayesian inference with multi-dimensional stimuli.

A signature of stimulus sampling: internally generated differential noise correlations

A central prediction of our circuit framework for sampling-based Bayesian inference is that an increase in the correlation between stimuli in the sensory world should result in stronger synapses between neurons whose activities represent these stimuli (see Eq. (6)). This is a difficult prediction to test since measuring synaptic connectivity along a functional axis is already challenging45, let alone measuring a change in synaptic strength owing to a change in stimulus statistics. Here, we outline a testable prediction of our theory by identifying a measurable, population-level signature of changes in functionally related recurrent synaptic strengths.

In response to a fixed feedforward input, the responses of a recurrent circuit implementing stimulus sampling will fluctuate. The alignment of the recurrent circuitry with neuronal stimulus tuning causes a portion of these activity fluctuations to align with the subspace in which stimuli are coded. As an example, consider the sampling implemented by a single recurrent network (Fig. 4a), and suppose the population response fluctuates around its mean position (0° in the example of Fig. 8a), ignoring fluctuations along other directions in neuronal response space. The activities of neuron pairs whose stimulus preferences are both above or both below the mean position are positively correlated (the black and blue neurons in Fig. 8a), while the activities of neuron pairs whose preferences straddle the mean are negatively correlated (the black and red neurons in Fig. 8a). Such stimulus sampling generates a covariance component proportional to the outer product of the derivative of the neuronal tuning curves (Fig. 8b), i.e., \(\mathbf{f}_{s}^{\prime}\mathbf{f}_{s}^{\prime\top}\), where \(\mathbf{f}_{s}^{\prime}\) denotes the derivative of the tuning f(s) = 〈λt〉 (mean firing rate) with respect to the stimulus s. Such noise correlations have been referred to as differential correlations4,17, and are generally viewed as deleterious to stimulus coding. Stochastic sampling in coupled networks (Fig. 6a) produces similar differential noise correlations (see Supplementary Information).

Fig. 8: Stimulus sampling by a network is reflected in the internally generated differential correlations, whose impact differs from differential correlations inherited from feedforward inputs.
figure 8

a Stimulus sampling via spike generation causes the population firing rate to fluctuate along the stimulus subspace (x-axis). b The pattern of internally generated differential correlations in a network implementing sampling, composed of neurons with Gaussian tuning. c Internally generated differential correlations in such a network increase with the recurrent weight, wE. d Increasing the feedforward input rate decreases externally generated differential correlations and increases the mutual information between the feedforward inputs and the latent stimulus. e Increasing the recurrent weight increases internally generated differential correlations, while the mutual information between stimulus and feedforward inputs changes non-monotonically with the recurrent weight. The direction of the arrows indicates the predicted direction of change of the recurrent weights after an animal is retrained on a new stimulus set with different correlations compared to the reference stimulus set.

In our network implementation of sampling, the amplitude of internally generated differential correlations is not arbitrary, but is determined by the recurrent connection strength, \({w}_{E}^{*}\). Here, the differential covariance matrix of population responses has the form (see Eq. (44))

$$\Sigma_{DC} = V(\bar{s}\mid \mathbf{u}^{\mathsf{f}})\,\mathbf{f}_{s}^{\prime}\mathbf{f}_{s}^{\prime\top},\quad \mathrm{where}\quad V(\bar{s}\mid \mathbf{u}^{\mathsf{f}}) = \frac{\Lambda_{s}}{\Lambda_{\mathsf{f}}(\Lambda_{\mathsf{f}}+\Lambda_{s})} = {a}^{2}{n}_{\mathsf{f}}^{-1}{w}_{E}^{*},$$
(9)

where \(V(\bar{s}|\mathbf{u}^{\mathsf{f}})\) is the variance of \(\bar{s}_{t}\) in equilibrium over time, and \(\bar{s}_{t}\) is the mean of the instantaneous conditional distribution (Eq. (4a)) represented by the position of the instantaneous firing rate λt (Fig. 2b). Importantly, the amplitude of differential correlations increases with the recurrent weight, \(w_{E}^{*}\), which is set by the prior precision Λs (Eq. (6); Fig. 8c). Thus, in our framework internally generated differential correlations are a by-product of inference by sampling from posterior distributions of stimuli in a structured world.
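As a concrete illustration of Eq. (9), the following minimal sketch (not the authors' code; all parameter values are assumed for illustration) constructs the rank-one differential covariance component from a population of Gaussian tuning curves:

```python
# Minimal sketch (not the authors' code) of the differential covariance predicted by
# Eq. (9) for a population with Gaussian tuning; all parameter values are illustrative.
import numpy as np

N_E = 180                                    # number of E neurons (assumed)
a = 40.0                                     # tuning width in degrees
theta = np.linspace(-180.0, 180.0, N_E, endpoint=False)   # preferred stimuli

def tuning(s, R=20.0):
    """Gaussian tuning curves f(s): mean firing rate of each neuron for stimulus s."""
    return R * np.exp(-(theta - s) ** 2 / (2 * a ** 2))

s0, ds = 0.0, 0.5
f_prime = (tuning(s0 + ds) - tuning(s0 - ds)) / (2 * ds)   # numerical derivative f'(s)

n_f = 200.0                                  # total feedforward spike count (assumed)
w_E_star = 0.3                               # recurrent weight (assumed)
V = a ** 2 / n_f * w_E_star                  # V(s_bar | u^f) from Eq. (9)

Sigma_DC = V * np.outer(f_prime, f_prime)    # rank-one differential covariance component
```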

Distinguishing external and internal differential correlations

The previous analysis of internally generated differential correlations in a circuit implementing sampling-based inference is based on the assumption of a fixed feedforward input (Eq. (9)). However, in typical neurophysiology experiments an external stimulus, s, is fixed, while the feedforward input, uf, fluctuates due to variability in sensory acquisition and transmission noise (Eqs. (3) and (7)). Hence, differential correlations of neuronal population responses are a combination of correlations inherited from the feedforward input46, and correlations generated by recurrent network interactions that align with the population stimulus tuning24. When the feedforward input is described by a hierarchical generative model (Eq. (2)), the total magnitude of differential correlations in the evoked response is \(a^{2}n_{\mathsf{f}}^{-1}w_{E}\mathbf{f}_{s}^{\prime}\mathbf{f}_{s}^{\prime\top} + a^{2}n_{\mathsf{f}}^{-1}\mathbf{f}_{s}^{\prime}\mathbf{f}_{s}^{\prime\top}\) (see Eq. (46)), where the second term reflects differential correlations inherited from the feedforward input (compare with Eq. (9)). Although the two sources of differential correlations are intertwined in the neuronal response, they impact the information content differently, thus offering a potential way to distinguish between them in neural data.

Externally generated differential correlations decrease with feedforward input rate, which could be modulated by visual stimulus strength such as contrast (Fig. 8d, red curve). As a consequence, the mutual information (the information between the feedforward inputs uf and the latent variables, i.e., s and z, sampled by the recurrent network in Fig. 4a, Eq. (42)) increases with feedforward input intensity (Fig. 8d, blue curve). We therefore have a monotonically decreasing relationship between externally generated differential correlations and mutual information. This is expected, since such inherited correlations always impair information processing, as observed previously4,17. In contrast, an increase in recurrent weights, wE, increases internally generated differential correlations, but results in a non-monotonic change in mutual information (Fig. 8e). Hence there is a non-monotonic relation between internally generated differential correlations and the mutual information between stimulus and feedforward inputs. In sum, the impact of external and internal differential correlations on stimulus coding can be distinguished by their respective monotonic and non-monotonic relations with the mutual information between stimulus and response.

Discussion

We have presented a framework in which neuronal response variability and recurrent synaptic connections, two ubiquitous features of cortex, are jointly used to implement sampling-based Bayesian inference in neuronal circuit models. Combining mathematical analysis and network simulations, we established that stereotypical Poisson variability of discrete spike counts can drive flexible sampling from a family of continuous distributions. The sampling statistics are determined by the structure of recurrent coupling, which stores information about the stimulus prior, and feedforward inputs conveying the stimulus likelihood. Sampling-based inference is implemented in two steps: the instantaneous firing rate, determined by the sum of feedforward and recurrent inputs, represents the instantaneous conditional distribution of latent stimulus, while Poissonian variability in spike generation is used to generate a random stimulus sample from this conditional distribution. We have shown how sampling can be implemented using biologically feasible mechanisms for three different generative models of increasing complexity. The simplest model includes one latent stimulus, while the more complex models include multiple latent stimuli organized hierarchically or in parallel. These three generative models form the basic building blocks of more complex models. Thus our ideas can be extended to a wide range of perceptual and cognitive processes47.

The neural code we described shares some features with codes described in previous studies, including parametric representations in probabilistic population codes (PPCs)15,39,40, and sampling-based codes (SBCs)16,27,28,29,30,31,32. In our framework, the conditional distributions of latent variables are represented by instantaneous firing rates which linearly encode the logarithms of these conditional distributions, and have a mathematical form that is similar to that used in past studies describing PPCs (e.g., Eq. (5)). Further, the posterior is represented by stimulus samples generated through a random process, a feature of all SBCs. Despite these similarities, there are fundamental differences between the neural code we described and previously proposed PPCs and SBCs.

PPCs are generally implemented in networks with no internally generated variability, with stochasticity inherited from the stimulus. In contrast, our proposed network is doubly stochastic: the Poisson variability in the feedforward input allows a single realization of the feedforward input to represent the whole stimulus likelihood39, while internally generated Poisson variability drives stimulus sampling. Further, in PPCs the posterior is represented parametrically by a one-shot neuronal response, while in our proposed network the joint posterior is approximated by a sequence of samples, each obtained as a linear readout from the instantaneous neuronal responses. Although it takes time to collect sufficiently many samples to approximate the posterior well, an advantage of sampling codes compared to PPCs is that inference with multivariate posteriors can be implemented using linearly coupled subnetworks (Fig. 6), with the number of subnetworks determined by the dimension of the latent stimulus features. In contrast, representing an M-dimensional multivariate posterior using PPCs requires \(N^{M}\) neurons in a linear network (N is the number of neurons representing each dimension), so that the number of neurons increases exponentially with the latent stimulus dimension M (ref. 16). Alternatively, coupled networks with MN neurons in total can be used, but these require complex, nonlinear coupling between the networks48,49.

Neurons emit a discrete number of spikes, but their responses often need to represent continuous quantities. Most studies of neural sampling implicitly rely on approximating Poissonian spike counts with Gaussian variables (e.g., refs. 29,31,51). However, this approximation does not work well when only a few spikes are emitted. Here, we showed that discrete Poisson spike generation can be used to generate samples from a posterior distribution of a continuous stimulus feature using a temporally averaged, smooth population firing rate profile. Thus, we have shown how a sample from a continuous variable can be generated even with only a few spikes from the neuronal population. Moreover, conventional SBCs are used to generate samples directly in a neural space whose dimension is given by the number of neurons in the population16,27,28,30,31,32,33,34,50, where a neuronal response, rt, is interpreted directly as a sample from the (marginal) posterior of neuronal responses, p(r). Hence the posterior mean is the temporally averaged population response, and the covariance of population responses is the posterior covariance. In contrast, our proposed network generates samples in a low dimensional stimulus subspace embedded in high dimensional neural activity space. The linear projection of network activity, rt, onto the stimulus subspace represents a sample from the stimulus posterior, similar to a previous study29. A computational benefit of sampling in a low dimensional stimulus subspace is convergence speed, as the volume of the stimulus subspace is significantly smaller than that of the neural activity space. Indeed, in our examples sequences of samples generated by a single recurrent network (Fig. 4) and coupled networks (Fig. 6) can both converge to an equilibrium distribution in less than 20 ms, which is fast enough to complete inference on a behaviorally relevant time scale (Fig. S6). Furthermore, the multiplication of probability distributions of latent stimulus, which is central to Bayesian inference (e.g., cue combination, decision making, see review in ref. 15), can be implemented by summing the inputs to a neuronal population (Eq. (5)). This follows from the fact that the instantaneous population input (or firing rate) linearly encodes the logarithm of a probability distribution (Eqs. (1) and (5)). In contrast, producing samples in neural activity space using conventional SBCs requires nonlinear operations in neural circuits in order to multiply probability distributions (or histograms) of the samples15.

A recent study demonstrated that an E-I recurrent network of rate-based neurons can be numerically optimized for sampling-based Bayesian inference32. In contrast, we used a theoretical approach to derive a network model of simplified spiking neurons which implements sampling-based inference. This allowed us to explicitly describe the putative neural mechanisms needed for such sampling. Although the two studies use different generative models and neural representations, the network models in both studies share some common characteristics: ring structure, Poisson-like response variability, and tuning-dependent noise correlations (Fig. 1d). This implies that the seemingly different generative models and neural representations in the two studies reflect more general principles, as suggested in ref. 51. It will be interesting to extend our theoretical approach to dynamical spiking neurons to determine how the timescales of neuronal dynamics and neuronal oscillations impact inference in rich, dynamic sensory scenes (see below).

Differential noise correlations generated by recurrent network interactions are a signature of network sampling in our framework (Figs. 5c and 8c). This is in contrast to earlier studies where differential correlations were inherited from feedforward inputs17,52. While internally generated differential correlations could also emerge from a recurrent circuit which is not implementing inference22,24,52,53,54,55 or implementing inference via other algorithms56, in our framework the relation between the magnitude of internally generated differential correlations, the posterior uncertainty, and the strength of the recurrent synaptic weight (Eq. (9)) provides a clear test which can be used to verify our proposed circuit mechanism of sampling-based inference. One possible experimental approach would modulate the functional recurrent strength by using a perceptual learning task. Specifically, after using a reference stimulus set with a prescribed correlation between latent stimuli to fully train an animal, we expect that recurrent synaptic weights will strengthen or weaken to improve inference (Fig. 8e, dashed line). This will result in a fixed value of differential noise correlations in the population response due to the recurrent circuitry. Re-training with a stimulus set that has more (less) correlated latent stimuli compared to the reference set will cause the recurrent weights to increase (decrease) (Fig. 8e, red line). When the reference stimulus set is again used to drive task behavior, performance (as a proxy of mutual information) will decrease, regardless of whether differential correlations have increased or decreased compared to those resulting from the reference stimulus set (Fig. 8e, arrows). In brief, the non-monotonic relationship between internally generated differential noise correlations and the mutual information between stimulus and response offers a clear (and falsifiable) experimental prediction for circuits that perform Bayesian inference.

Implementing sampling-based inference in our proposed network requires that feedforward and recurrent inputs have the same tuning profile over the stimulus (Eq. (5)). This assumption is supported by experiments in layers 4 and 2/3 of mouse V1 (ref. 8). Moreover, the recurrent connections in our network model are translation-invariant in the stimulus subspace, an assumption widely used in studies of continuous attractor networks (CANs)22,54,57,58, and in a recent network model implementing sampling32. Perfectly translation-invariant connections are not strictly required for a circuit to implement sampling, but this assumption allows us to simplify the mathematical analysis. Adding randomness to the recurrent connectivity would increase the variance of the sampling distributions. We could then adjust the overall recurrent weight (a scalar) so that the sampling distribution matches the posterior, with no need to fine-tune individual synaptic weights in the network model. In the past, CANs have been shown to achieve maximum likelihood estimation (a point estimate) via template matching15,58,59. Here we have shown that a network with CAN-like structure and internal Poisson spiking variability is able to perform sampling-based Bayesian inference. In our network, correlations in the stimulus prior are represented by the strength of recurrent synaptic activity, which implies that the (subjective) prior precision in the network increases with the feedforward input strength.

To maintain a fixed prior, the recurrent weights of the network need to decrease as the feedforward input strength, which encodes the likelihood precision Λf, increases (Eq. (6)). Therefore, the (subjective) prior stored in a network with fixed recurrent weights may differ from the objective stimulus prior in the world (Λs in Eqs. (3) and (7)) when feedforward inputs of different strengths are received. One possibility is that the proposed network model does not generate samples from each distinct posterior determined by a specific feedforward input, \(p(s|\mathbf{u}^{\mathsf{f}})\), but rather generates samples from the average sampling distribution over all possible feedforward inputs and hence matches the average posterior distribution \(\mathbb{E}_{p(\mathbf{u}^{\mathsf{f}})}[p(\mathbf{s}|\mathbf{u}^{\mathsf{f}})] = \mathbb{E}_{p(\mathbf{u}^{\mathsf{f}})}[q(\mathbf{s}|\mathbf{u}^{\mathsf{f}})]\), where \(\mathbb{E}_{p(\mathbf{u}^{\mathsf{f}})}[\cdot]\) denotes the average over the distribution p(uf). Since the proposed recurrent circuit is general, this result may explain one source of inductive bias in cortical processing60. On the other hand, sampling correctly from each specific posterior could be achieved using different biophysical mechanisms that can modulate synaptic strengths and that we have not included in our model. For instance, short-term synaptic depression61 or spike frequency adaptation62 are gain control mechanisms that would allow the recurrent input strength (representing the prior correlation) to remain relatively fixed despite an increase in the feedforward input strength. Another possibility is that the recurrent circuit represents a more complex generative model which better captures the statistical structure of natural stimuli30,32,63. Here we assumed that the generative model represented by the network matches the model that generates the sensory stimuli. This is unlikely to be the case in practice. Such a mismatch between the true and internal model of the world can lead to biases and increased noise, which are likely to manifest in specific ways in neural circuits that perform inference via sampling64. Furthermore, we only considered sampling driven by spiking variability with a Fano factor of 1, while cortical responses often have Fano factors that differ from 1 (refs. 65,66). In the latter case, our theory can still work by changing the feedforward connection weight to compensate for the change in Fano factor, as suggested in a recent study67.

To keep our exposition transparent, we only presented models with minimal complexity. Our proposed network mechanism of sampling-based inference can be generalized to more complex generative models, since the assumption of Gaussianity (Eqs. (21) and (22)) and the analytical expression in Eq. (24) are not essential, and several relaxed frameworks may be explored. First, similar networks can generate samples from other multi-dimensional distributions where the conditional distribution of each latent variable belongs to the linear exponential family38,39. This could be done by changing the tuning functions of neurons to another appropriate profile, as the logarithm of the tuning determines the type of sampling distribution (Eq. (1)). When sampling from non-Gaussian distributions, the stimulus samples can be linearly read out with the weight determined by the tuning profile (i.e., h(s) in Eq. (1))39. Second, the tuning of recurrent inputs does not need to be the same as that of feedforward inputs. Instead, the logarithm of the recurrent input tuning can take the form of the conjugate prior to the likelihood conveyed by feedforward inputs. Third, the network model could also be used to infer latent variables with a non-uniform marginal prior if, for example, the preferred stimuli of neurons in the population are not distributed uniformly in the stimulus subspace68. Fourth, the proposed network model has the potential to produce samples from the posterior distribution of latent dynamic stimuli which can be described by a hidden Markov model. Lastly, we considered only non-structured inhibition for simplicity. Structured inhibitory connections could modulate the position of excitatory responses in the stimulus subspace, i.e., the mean of the conditional distribution. Such interplay between E and I neurons with structured inhibition has the potential to implement Hamiltonian sampling, where the I neurons represent samples of auxiliary variables38,50.

In conclusion, we have shown that a recurrent circuit of neurons with Poisson spiking statistics can implement sampling from a family of multivariate posterior distributions, with internal spiking variability driving the generation of stimulus samples, and the recurrent connections representing the stimulus prior. The proposed neural code may help us understand the structure of neuronal activity, and provide a building block for more complicated population computations.

Methods

A linear network of excitatory neurons

We study how a generic recurrent network model consisting solely of NE excitatory (E) neurons with Poisson spiking statistics (no inhibitory neurons) can implement sampling-based Bayesian inference to approximate the stimulus posterior. We describe neuronal activity using a time-discretized Hawkes process (a type of multivariate, inhomogeneous Poisson process69). The instantaneous firing rates of the neurons in the network at time t, λt, obey the following recurrent equations:

$$\boldsymbol{\lambda}_{t}\Delta t = \mathbf{u}^{\mathsf{f}} + \mathbf{u}_{t}^{\mathsf{r}} = \mathbf{u}^{\mathsf{f}} + \left(w_{E}\mathbf{r}_{t-\Delta t} + \sigma_{r}\boldsymbol{\xi}_{t}\right),$$
(10)
$$\mathbf{r}_{t} \sim \prod_{j=1}^{N_{E}}\mathrm{Poisson}\left(\boldsymbol{\lambda}_{tj}\Delta t\right),$$
(11)

where uf is the feedforward Poisson spiking input (described below; Eq. (18)), \(\mathbf{u}_{t}^{\mathsf{r}}\) is the continuous-valued recurrent input at time t, and ξt is an NE-dimensional vector of independent Gaussian white noise. Hence, over each time interval [t − Δt, t] the activity of the neurons in the network is modeled by a vector of independently generated Poisson spike counts, rt, with means determined by the rates λt. The parameters wE and σr determine the excitatory recurrent weight and the recurrent variability, respectively. The instantaneous firing rate λt can be negative due to the recurrent input and noise (Eq. (36)). We interpret a negative firing rate, λt, as a zero probability of generating a spike.
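The discrete-time dynamics of Eqs. (10) and (11) can be summarized in a short simulation sketch; the parameter values and the feedforward tuning below are illustrative assumptions rather than the values used in the paper:

```python
# Sketch of the time-discretized dynamics in Eqs. (10)-(11); parameter values and the
# feedforward tuning are illustrative assumptions, not the values used in the paper.
import numpy as np

rng = np.random.default_rng(0)
N_E = 180
w_E, sigma_r = 0.3, 0.1                      # recurrent weight and recurrent noise (assumed)
theta = np.linspace(-180.0, 180.0, N_E, endpoint=False)

# One fixed realization of the feedforward Poisson input (Gaussian tuning, Eq. (18)).
u_f = rng.poisson(3.0 * np.exp(-(theta - 0.0) ** 2 / (2 * 40.0 ** 2)))

r = np.zeros(N_E)                            # spike counts in the previous time bin
for t in range(200):
    u_r = w_E * r + sigma_r * rng.standard_normal(N_E)     # recurrent input + noise
    lam_dt = np.maximum(u_f + u_r, 0.0)      # Eq. (10); negative rates -> zero spike probability
    r = rng.poisson(lam_dt)                  # Eq. (11): independent Poisson spike counts
```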

Poisson spike generation samples stimulus

Independent Poisson spike generation in the network whose activity is described by Eq. (11) can drive sampling across time or across trials from a conditional stimulus distribution determined by the instantaneous firing rate λt. Below, we compute the distribution of stimulus samples given λt. We assume that the instantaneous firing rate, λt, has a smooth bell-shaped profile and can be parameterized as,

$$\boldsymbol{\lambda}_{tj} = R\exp[-(\bar{s}_{t}-\theta_{j})^{2}/2a^{2}] = R\exp[\mathbf{h}_{j}(\bar{s}_{t})],$$
(12)

where \({\bar{s}}_{t}\) characterizes the position of the population firing rate on the stimulus subspace (Fig. 1b, x-axis), while R and a denote the height and width of the population firing rate, respectively. Further, θj is the preferred stimulus value of neuron j, and the preferred stimuli of all neurons, \({\{{\theta }_{j}\}}_{j=1}^{{N}_{E}}\), are uniformly distributed over the range of stimulus s (Fig. 1b).

To simplify the analysis, we first assume that the instantaneous firing rate is fixed over time. When generating Poisson spikes rt from λt, the probability of observing a stimulus sample \({\tilde{s}}_{t}\) (embedded in rt) can be derived as (see details in Supplementary Information),

$$p(\mathbf{r}_{t}|\boldsymbol{\lambda}_{t}) = \prod_{j=1}^{N_{E}}\mathrm{Poisson}\left(\mathbf{r}_{tj}|\boldsymbol{\lambda}_{tj}\Delta t\right) \propto \exp[\mathbf{h}(\bar{s}_{t})^{\top}\mathbf{r}_{t}]\cdot\left[n_{\boldsymbol{\lambda}}^{n_{\mathbf{r}}}\exp(-n_{\boldsymbol{\lambda}})\right] \propto \mathcal{N}\left(\tilde{s}_{t}|\bar{s}_{t}, a^{2}n_{\mathbf{r}}^{-1}\right)\mathrm{Poisson}(n_{\mathbf{r}}|n_{\boldsymbol{\lambda}}),$$
(13)

where nr = ∑jrtj is the number of emitted spikes across the whole neural population, and nλ = ∑jλtjΔt is the sum of the population firing rate. Here \(\mathcal{N}(s|\mu, \sigma^{2})\) denotes a Gaussian distribution with mean μ and variance σ2, and \(\mathbf{h}(\bar{s}_{t})\) is a vector whose jth element \(\mathbf{h}_{j}(\bar{s}_{t})\) is given in Eq. (12). The logarithm of the firing rate profile, \(\mathbf{h}(\bar{s}_{t})\), determines how the stimulus sample \(\tilde{s}_{t}\) and its mean, \(\bar{s}_{t}\), can be read out from rt and λt, respectively,

$$\tilde{s}_{t} = \sum_{j}\mathbf{r}_{tj}\theta_{j}\Big/\sum_{j}\mathbf{r}_{tj}, \quad \bar{s}_{t} = \sum_{j}\boldsymbol{\lambda}_{tj}\theta_{j}\Big/\sum_{j}\boldsymbol{\lambda}_{tj},$$
(14)

where \(\tilde{s}_{t}\) and \(\bar{s}_{t}\) characterize the positions of rt and λt on the stimulus subspace.
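A minimal sketch of the readout in Eq. (14), assuming a Gaussian rate profile and illustrative parameter values (the plain weighted average used here is adequate when the activity bump is far from the boundary of the stimulus range):

```python
# Sketch of the linear readout in Eq. (14): the positions of the rate vector lambda_t and
# the spike-count vector r_t on the stimulus subspace. Values are illustrative.
import numpy as np

theta = np.linspace(-180.0, 180.0, 180, endpoint=False)    # preferred stimuli

def position(x, theta):
    """sum_j x_j * theta_j / sum_j x_j"""
    return np.sum(x * theta) / np.sum(x)

rng = np.random.default_rng(1)
lam_dt = 2.0 * np.exp(-(theta - 30.0) ** 2 / (2 * 40.0 ** 2))   # Gaussian rate profile at 30 deg
r = rng.poisson(lam_dt)                                          # one vector of spike counts

s_bar = position(lam_dt, theta)   # mean of the conditional distribution, \bar{s}_t
s_tilde = position(r, theta)      # one stimulus sample, \tilde{s}_t, fluctuating around s_bar
```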

The sampling variability of \(\tilde{s}_{t}\) in a single time step depends on the number of emitted spikes, nr. When the fixed rates, λt, repeatedly generate spikes over time, the sampling distribution of \(\tilde{s}_{t}\) can be calculated by marginalizing the likelihood (Eq. (13), last line) over the different values of nr, since nr varies across time (a detailed calculation using a Laplace approximation can be found in the Supplementary Information),

$$p(\tilde{s}_{t}|\boldsymbol{\lambda}_{t}) = \sum_{n_{\mathbf{r}}}\mathcal{N}\left(\tilde{s}_{t}|\bar{s}_{t}, a^{2}n_{\mathbf{r}}^{-1}\right)\mathrm{Poisson}(n_{\mathbf{r}}|n_{\boldsymbol{\lambda}}) \approx \mathcal{N}\left(\tilde{s}_{t}|\bar{s}_{t}, a^{2}n_{\boldsymbol{\lambda}}^{-1}\right).$$
(15)

Each stimulus sample, \(\tilde{s}_{t}\), is thus drawn from a conditional distribution determined by the instantaneous firing rate, \(p(\tilde{s}|\boldsymbol{\lambda}_{t})\), and can be written as

$$\tilde{s}_{t} \sim p(\tilde{s}|\boldsymbol{\lambda}_{t}) = \mathcal{N}\left(\tilde{s}|\bar{s}_{t}, a^{2}n_{\boldsymbol{\lambda}}^{-1}\right) \propto \exp[\mathbf{h}(\tilde{s})^{\top}\boldsymbol{\lambda}_{t}].$$
(16)

The last proportionality in the above equation is satisfied by a Gaussian profile of the firing rate (a more general derivation can be found in the Supplementary Information). Introducing \(\Lambda = a^{-2}n_{\boldsymbol{\lambda}}\) gives Eq. (1) in the main text.

Eq. (16) suggests that the type of sampling distribution (or the conditional distribution) that is obtained from spike-generation variability is determined by the profile of the instantaneous firing rate, i.e., \(\mathbf{h}(\bar{s}_{t})\) (Eq. (12)). Although the sampling distribution belongs to the linear exponential family of distributions, which is similar to the probabilistic population code (PPC)39, the two frameworks represent these distributions in different ways. In PPCs, the likelihood over \(\bar{s}_{t}\) is parametrically represented by a single realization of the independent neuronal response r (Eq. (13)), while in our work the distribution is approximated by a sequence of samples, \(\tilde{s}_{t}\), effectively generated by conditionally independent Poisson spike discharges.

The above analysis can be extended to the case where the instantaneous firing rate, λt, in a time step deviates from a smooth Gaussian profile (Eq. (12)), which is the case in the actual network simulations. In general, λt can be expressed as,

$$\boldsymbol{\lambda}_{tj} = R_{t}\exp[\mathbf{h}_{j}(\bar{s}_{t})] + \delta_{\perp}\boldsymbol{\lambda}_{tj},$$
(17)

where δλt denotes the deviation from a smooth Gaussian profile. Note that the sampling distribution only depends on the position, \({\bar{s}}_{t}\), and the sum of instantaneous firing rate, nλ (Eq. (16)), which corresponds to two perpendicular directions in the NE dimensional space of λt. For any instantaneous firing rate vector, λt, we can always find \({\bar{s}}_{t}\) and Rt that make the deviation δλt perpendicular to the two directions, i.e., ∑jδλtjθj = 0, and ∑jδλtj = 0. This observation implies that deviations from Gaussian firing rate profiles do not affect our theory.

Feedforward spiking input conveys the likelihood of stimulus

We model the feedforward inputs to the E neurons in the network, uf, as independent Poisson spikes, with Gaussian tuning over stimulus s,

$$p(\mathbf{u}^{\mathsf{f}}|s) = \prod_{j=1}^{N_{E}}\mathrm{Poisson}\left[\mathbf{u}_{j}^{\mathsf{f}}\,\big|\,\langle\mathbf{u}_{j}^{\mathsf{f}}(s)\rangle\right], \quad \langle\mathbf{u}_{j}^{\mathsf{f}}(s)\rangle = U^{\mathsf{f}}\exp[\mathbf{h}_{j}(s)] = U^{\mathsf{f}}\exp[-(\theta_{j}-s)^{2}/2a^{2}].$$
(18)

Here \(\mathbf{u}_{j}^{\mathsf{f}}\) denotes the feedforward input received by the jth E neuron, and \(\langle\mathbf{u}_{j}^{\mathsf{f}}(s)\rangle\) is the tuning of the feedforward input. This mathematical description of the feedforward input is the same as the one used in the definition of typical PPCs15,39,40. Since the preferred stimulus values, \(\{\theta_{j}\}_{j=1}^{N_{E}}\), of all feedforward inputs are uniformly distributed in stimulus space, the likelihood of s given a single observation of the input, uf, satisfies39,40,

$$p(\mathbf{u}^{\mathsf{f}}|s) \propto \exp\left[\mathbf{h}(s)^{\top}\mathbf{u}^{\mathsf{f}}\right] \propto \mathcal{N}\left(s|\mu_{\mathsf{f}}, \Lambda_{\mathsf{f}}^{-1}\right).$$
(19)

The logarithm of tuning, h(s), determines the type of likelihood15. Specifically, the Gaussian tuning leads to a Gaussian likelihood (Eq. (19)), whose mean, μf, and precision, Λf, are both linear functions of the inputs,

$$\mu_{\mathsf{f}} = n_{\mathsf{f}}^{-1}\sum_{j}\mathbf{u}_{j}^{\mathsf{f}}\theta_{j}, \quad \Lambda_{\mathsf{f}} = a^{-2}n_{\mathsf{f}} = a^{-2}\sum_{j}\mathbf{u}_{j}^{\mathsf{f}}.$$
(20)

The mean, μf, represents the position of uf in the stimulus subspace, and the precision, Λf, is proportional to the total feedforward spike count, nf.
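The following sketch illustrates Eqs. (18)–(20): it draws one feedforward Poisson input for a given stimulus and reads out the likelihood mean and precision it encodes; all numerical values are assumptions made for illustration:

```python
# Sketch of Eqs. (18)-(20): draw one feedforward Poisson input for a stimulus s and read
# out the likelihood mean and precision it encodes. Numerical values are assumptions.
import numpy as np

rng = np.random.default_rng(2)
N_E, a, U_f = 180, 40.0, 3.0
theta = np.linspace(-180.0, 180.0, N_E, endpoint=False)

def feedforward_input(s):
    """Eq. (18): independent Poisson spike counts with Gaussian tuning over s."""
    return rng.poisson(U_f * np.exp(-(theta - s) ** 2 / (2 * a ** 2)))

u_f = feedforward_input(s=20.0)
n_f = u_f.sum()                              # total feedforward spike count
mu_f = np.sum(u_f * theta) / n_f             # Eq. (20): likelihood mean (position of u_f)
Lambda_f = n_f / a ** 2                      # Eq. (20): likelihood precision
```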

A recurrent network samples hierarchical latent variables

A hierarchical generative model

We consider a hierarchical generative model for which inference can be implemented in a recurrent circuit of Poisson neurons. We extend the simple generative model of feedforward input (Eq. (19)) by considering the stimulus s to depend on a one-dimensional stimulus parameter variable, z. For simplicity, we assume that z follows a uniform distribution (Fig. 3b, marginal plots)

$$p(z) = \mathcal{U}(-180^{\circ}, 180^{\circ}),$$
(21)

where \(\mathcal{U}(a,b)\) denotes a uniform distribution over [a, b]. The assumption of a uniform prior, p(z), simplifies our model significantly, as it implies the spatial homogeneity of the network model, as given by Eqs. (18) and (19). However, this assumption is not essential for our main results. The stimulus, s, is not identical to the underlying parameter of the sensory scene, z, but we assume that the two are correlated, so that

$$p(s|z,\Lambda_{s}) = \mathcal{N}\left(s|z, \Lambda_{s}^{-1}\right).$$
(22)

In sum, the whole generative model is determined by,

$$p(\mathbf{u}^{\mathsf{f}}, s, z) = p(\mathbf{u}^{\mathsf{f}}|s)\,p(s|z)\,p(z) \propto \mathcal{N}\left(s|\mu_{\mathsf{f}}, \Lambda_{\mathsf{f}}^{-1}\right)\mathcal{N}\left(s|z, \Lambda_{s}^{-1}\right),$$
(23)

where \(p(\mathbf{u}^{\mathsf{f}}|s)\) is the same as in Eq. (19).

Approximate Bayesian inference via Gibbs sampling

The joint posterior of s and z can be analytically derived given the generative model (Eq. (23)),

$$p(s,z|\mathbf{u}^{\mathsf{f}}) = \mathcal{N}\left[(s,z)^{\top}\,\big|\,\boldsymbol{\mu}_{p}, \mathbf{K}_{p}^{-1}\right], \quad \boldsymbol{\mu}_{p} = (\mu_{\mathsf{f}}, \mu_{\mathsf{f}})^{\top}, \quad \mathbf{K}_{p} = \begin{pmatrix}\Lambda_{\mathsf{f}}+\Lambda_{s} & -\Lambda_{s}\\ -\Lambda_{s} & \Lambda_{s}\end{pmatrix}.$$
(24)

We will use this expression to verify that the samples produced by our algorithm converge to the true posterior.

We use the stochastic response of our recurrent network (Eqs. (10), (11)) as a basis for Gibbs sampling31,38,42 (a type of Monte Carlo method) to approximate the joint posterior p(s, z∣uf). To describe the iterative Gibbs algorithm, we assume that a stimulus parameter sample, \(\tilde{z}_{t}\), is provided at time t, which is then combined with the feedforward input to update the conditional distribution of the stimulus s (step 1 in Fig. 3c),

$$p(\tilde{s}|\tilde{z}_{t}, \mathbf{u}^{\mathsf{f}}) \propto p(\mathbf{u}^{\mathsf{f}}|\tilde{s})\,p(\tilde{s}|\tilde{z}_{t}) \propto \mathcal{N}\left(\tilde{s}|\bar{s}_{t}, \Lambda^{-1}\right), \quad \bar{s}_{t} = \frac{\Lambda_{\mathsf{f}}\mu_{\mathsf{f}} + \Lambda_{s}\tilde{z}_{t}}{\Lambda_{\mathsf{f}}+\Lambda_{s}}, \quad \Lambda = \Lambda_{\mathsf{f}} + \Lambda_{s}.$$
(25)

The next step in the algorithm is to draw a sample, \(\tilde{s}_{t}\), from the conditional distribution \(p(\tilde{s}|\tilde{z}_{t}, \mathbf{u}^{\mathsf{f}})\) (step 2 in Fig. 3c),

$$\tilde{s}_{t} \sim p(\tilde{s}|\tilde{z}_{t}, \mathbf{u}^{\mathsf{f}}) = \mathcal{N}\left(\tilde{s}|\bar{s}_{t}, \Lambda^{-1}\right).$$

Next, the conditional distribution of stimulus parameter, z, is updated given this new sample, \({\tilde{s}}_{t}\), and a new sample, \({\tilde{z}}_{t+\Delta t},\) is drawn (step 3 in Fig. 3c),

$$\tilde{z}_{t+\Delta t} \sim p(\tilde{z}|\tilde{s}_{t}) = \mathcal{N}(\tilde{z}|\tilde{s}_{t}, \Lambda_{s}^{-1}).$$
(26)

These three steps of the Gibbs sampling algorithm (Eqs. (25), (26)) are performed iteratively until enough samples, \(\tilde{s}_{t}\) and \(\tilde{z}_{t}\), have been generated to approximate the true posterior distribution to the desired accuracy (Fig. 3d; compare the red dots with the blue contour map).
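The iterative algorithm in Eqs. (25) and (26) is a standard two-variable Gibbs sampler and can be transcribed directly; in the sketch below (written for illustration, with assumed values of μf, Λf, and Λs) the empirical mean and covariance of the samples can be compared against the analytic posterior of Eq. (24):

```python
# Direct transcription of the Gibbs sampler in Eqs. (25)-(26), run on the abstract latent
# variables (no network). mu_f, Lambda_f, Lambda_s and the iteration count are assumed.
import numpy as np

rng = np.random.default_rng(3)
mu_f, Lambda_f, Lambda_s = 20.0, 0.1, 0.05
n_iter = 5000

z = 0.0                                      # initial stimulus-parameter sample
samples = np.empty((n_iter, 2))
for t in range(n_iter):
    # Steps 1-2 (Eq. (25)): conditional of s given z and the feedforward input, then sample.
    Lam = Lambda_f + Lambda_s
    s_bar = (Lambda_f * mu_f + Lambda_s * z) / Lam
    s = rng.normal(s_bar, 1.0 / np.sqrt(Lam))
    # Step 3 (Eq. (26)): conditional of z given s, then sample.
    z = rng.normal(s, 1.0 / np.sqrt(Lambda_s))
    samples[t] = (s, z)

# Sample mean and covariance should approach mu_p and K_p^{-1} of Eq. (24).
print(samples.mean(axis=0), np.cov(samples.T))
```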

Implementing the Gibbs sampling in a recurrent circuit model

Gibbs sampling of the stimulus (Eq. (4b)) can be implemented via independent Poisson spike generation, as long as the conditional distribution encoded in λt (Eq. (16)) is the same as the conditional distribution in the Gibbs sampling algorithm (Eq. (4a)), i.e., \(\ln p(\tilde{s}|\boldsymbol{\lambda}_{t}) = \mathbf{h}(\tilde{s})^{\top}\boldsymbol{\lambda}_{t} = \ln p(\tilde{s}|\tilde{z}_{t}, \mathbf{u}^{\mathsf{f}})\). This condition can be realized in the recurrent circuit by relating the expressions describing the neural dynamics (Eq. (10)) and those describing the Gibbs sampling distribution (Eq. (4a)) to yield,

$$\ln p(\tilde{s}|\tilde{z}_{t}, \mathbf{u}^{\mathsf{f}}) = \mathbf{h}(\tilde{s})^{\top}\boldsymbol{\lambda}_{t} = \mathbf{h}(\tilde{s})^{\top}\mathbf{u}^{\mathsf{f}} + \mathbf{h}(\tilde{s})^{\top}\mathbf{u}_{t}^{\mathsf{r}} = \ln p(\mathbf{u}^{\mathsf{f}}|\tilde{s}) + \ln p(\tilde{s}|\tilde{z}_{t}).$$
(27)

The generative model for the feedforward input uf (Eq. (19)) suggests that \(\ln p(\mathbf{u}^{\mathsf{f}}|\tilde{s}) = \mathbf{h}(\tilde{s})^{\top}\mathbf{u}^{\mathsf{f}}\). Hence to satisfy Eq. (27) we require

$$\ln p(\tilde{s}|\tilde{z}_{t}) = \mathbf{h}(\tilde{s})^{\top}\mathbf{u}_{t}^{\mathsf{r}},$$
(28)

which implies that the recurrent input, \(\mathbf{u}_{t}^{\mathsf{r}}\), should approximately have a Gaussian profile,

$$\mathbf{u}_{tj}^{\mathsf{r}}(\tilde{z}_{t}) = U^{\mathsf{r}}\exp[-(\theta_{j}-\tilde{z}_{t})^{2}/2a^{2}] + \delta_{\perp}\mathbf{u}_{tj}^{\mathsf{r}}, \quad \tilde{z}_{t} = \sum_{j}\mathbf{u}_{tj}^{\mathsf{r}}\theta_{j}\Big/\sum_{j}\mathbf{u}_{tj}^{\mathsf{r}}, \quad \Lambda_{s} = a^{-2}\sum_{j}\mathbf{u}_{tj}^{\mathsf{r}},$$
(29)

whose position on the stimulus subspace is \(\tilde{z}_{t}\), and whose summed input (height) is determined by Λs, the precision of the conditional distribution \(p(s|\tilde{z}_{t})\). In a similar fashion to Eq. (17), \(\delta_{\perp}\mathbf{u}_{t}^{\mathsf{r}}\) denotes the deviation from a smooth Gaussian profile and is perpendicular to the directions encoding \(\tilde{z}_{t}\) and Λs.

The optimal recurrent weight can be derived by combining Eq. (29) and Eq. (17). We note that the recurrent input, ur, and the neuronal responses, rt, have the same tuning width, a, in a network with only E neurons. This can only be achieved if E neurons are solely self-connected (Eq. (10)), as lateral connections would broaden their tuning. The optimal recurrent weight generating a recurrent input with the appropriate strength is then

$$w_{E}^{*} = \frac{\langle\mathbf{u}_{j}^{\mathsf{r}}\rangle}{\langle\mathbf{r}_{j}\rangle} = \frac{\sum_{j}\langle\mathbf{u}_{j}^{\mathsf{r}}\rangle}{\sum_{j}\langle\mathbf{r}_{j}\rangle} = \frac{\sum_{j}\langle\mathbf{u}_{j}^{\mathsf{r}}\rangle}{\sum_{j}\left(\langle\mathbf{u}_{j}^{\mathsf{f}}\rangle + \langle\mathbf{u}_{j}^{\mathsf{r}}\rangle\right)} = \frac{\Lambda_{s}}{\Lambda_{\mathsf{f}}+\Lambda_{s}},$$
(30)

which yields Eq. (6) in the main text. Note that the self-connection is a result of the simplifying assumption that the network consists solely of E neurons (Eq. (10)), which can be relaxed in a full network consisting of both E and I neurons, as we show below.

The sampling of the stimulus parameter (Eq. (4c)) can be implemented through variability in the recurrent input. To do this, we include a diffusive term in the recurrent interactions, \(\mathbf{u}_{t}^{\mathsf{r}}\), and we equate the variance of the fluctuations with the mean to mimic a Poisson distribution:

$$\mathbf{u}_{t}^{\mathsf{r}} = \bar{\mathbf{u}}_{t}^{\mathsf{r}} + \sqrt{[\bar{\mathbf{u}}_{t}^{\mathsf{r}}]_{+}}\,\boldsymbol{\xi}_{t}, \quad \bar{\mathbf{u}}_{t}^{\mathsf{r}} = w_{E}^{*}\mathbf{r}_{t-\Delta t},$$
(31)

where [⋅]+ denotes rectification (negative values are set to zero). Here ξt is an NE-dimensional Gaussian white noise vector with \(\langle\boldsymbol{\xi}_{t}(i)\boldsymbol{\xi}_{t^{\prime}}(j)\rangle = \delta_{ij}\delta(t-t^{\prime})\), where δij and \(\delta(t-t^{\prime})\) are the Kronecker and Dirac delta functions, respectively; \(\bar{\mathbf{u}}_{t}^{\mathsf{r}}\) represents the conditional distribution \(p(\tilde{z}|\tilde{s}_{t-\Delta t})\), and \(\mathbf{u}_{t}^{\mathsf{r}}\) represents a stimulus parameter sample \(\tilde{z}_{t}\) (Eq. (29)). The multiplicative variability in the recurrent interactions may arise from synaptic noise37,70.
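Putting the pieces together, the following sketch runs the single-network sampler: it combines the rate dynamics of Eq. (10) with the noisy recurrent input of Eq. (31), Poisson spike generation (Eq. (11)), and linear readout of the stimulus and stimulus-parameter samples (Eqs. (14) and (29)). It is a simplified illustration, not the authors' simulation code, and all parameter values (and the rectification used before the readout) are assumptions:

```python
# Simplified illustration of the single-network sampler: rate dynamics of Eq. (10), noisy
# recurrent input of Eq. (31), Poisson spiking (Eq. (11)), and linear readout of the
# stimulus and stimulus-parameter samples (Eqs. (14) and (29)). All values are assumed.
import numpy as np

rng = np.random.default_rng(4)
N_E, a, U_f = 180, 40.0, 3.0
theta = np.linspace(-180.0, 180.0, N_E, endpoint=False)

u_f = rng.poisson(U_f * np.exp(-(theta - 20.0) ** 2 / (2 * a ** 2)))  # fixed feedforward input
Lambda_f = u_f.sum() / a ** 2
Lambda_s = 0.5 * Lambda_f                    # assumed prior precision
w_E_star = Lambda_s / (Lambda_f + Lambda_s)  # optimal recurrent weight, Eq. (30)

def position(x):
    return np.sum(x * theta) / np.sum(x)

r = np.zeros(N_E)
s_samples, z_samples = [], []
for t in range(2000):
    u_r_mean = w_E_star * r                                        # mean recurrent input, Eq. (31)
    u_r = u_r_mean + np.sqrt(np.maximum(u_r_mean, 0.0)) * rng.standard_normal(N_E)
    lam = np.maximum(u_f + u_r, 0.0)                               # Eq. (10); negative rates set to 0
    r = rng.poisson(lam)                                           # Eq. (11)
    if r.sum() > 0 and u_r.sum() > 0:
        s_samples.append(position(r))                              # stimulus sample (Eq. (14))
        z_samples.append(position(np.maximum(u_r, 0.0)))           # parameter sample (Eq. (29))
```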

Coupled circuits sample a multi-dimensional posterior

We consider a generative model which has multiple latent stimuli, s = (s1, s2, …, sm), which are organized in parallel (Fig. 6a). Without loss of generality, we consider the simplest case where m = 2; the same mechanism can be straightforwardly extended to any m > 2. We assume the joint prior of s is a multivariate normal distribution,

$$p(\mathbf{s}) = \mathcal{N}(\mathbf{s}|\boldsymbol{\mu}_{s}, \boldsymbol{\Lambda}_{s}^{-1}) \propto \exp[-\Lambda_{s}(s_{1}-s_{2})^{2}/2], \quad \mathrm{with}\quad \boldsymbol{\Lambda}_{s} = \Lambda_{s}\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix},$$
(32)

and each stimulus sm is uniformly distributed in (−180°, 180°] with periodic boundaries imposed. This definition of a Gaussian distribution on a circular space works well as long as the variance of the distribution is much smaller than the range of the stimulus space. Here \(\boldsymbol{\Lambda}_{s}\) is the precision matrix, while the scalar Λs (Λs ≥ 0) characterizes the correlation between s1 and s2. Note that the covariance matrix \(\boldsymbol{\Lambda}_{s}^{-1}\) is not defined, and the prior (Eq. (32)) is improper. The mean, μs, is a free parameter because it does not appear in the detailed expression of the prior (Eq. (32)), which is a consequence of the zero determinant of the precision matrix, i.e., \(|\boldsymbol{\Lambda}_{s}| = 0\). A further consequence is that the prior is not centered at μs, but instead has a band structure along the diagonal, and the marginal prior of each stimulus feature p(sm) (m = 1, 2) is uniform (Fig. 6b). The uniform marginal prior simplifies our theoretical derivation, as it implies the spatial homogeneity of the network model, but it does not impact the proposed neural coding mechanism.

Each stimulus sm (m = 1, 2) individually generates a feedforward spiking input \(\mathbf{u}_{m}^{\mathsf{f}}\), whose likelihood \(p(\mathbf{u}_{m}^{\mathsf{f}}|s_{m})\) is exactly the same as in Eq. (2). Combined, the generative model is

$$p(\mathbf{u}^{\mathsf{f}}|\mathbf{s})\,p(\mathbf{s}) = \left[\prod_{m=1}^{2}p(\mathbf{u}_{m}^{\mathsf{f}}|s_{m})\right]p(s_{1},s_{2}) \propto \left[\prod_{m=1}^{2}\mathcal{N}(s_{m}|\mu_{{\mathsf{f}}m}, \Lambda_{{\mathsf{f}}m}^{-1})\right]\mathcal{N}(\mathbf{s}|\boldsymbol{\mu}_{s}, \boldsymbol{\Lambda}_{s}^{-1}) \propto \mathcal{N}(\mathbf{s}|\boldsymbol{\mu}_{\mathsf{f}}, \boldsymbol{\Lambda}_{\mathsf{f}}^{-1})\,\mathcal{N}(\mathbf{s}|\boldsymbol{\mu}_{s}, \boldsymbol{\Lambda}_{s}^{-1}),$$
(33)

where \(\boldsymbol{\mu}_{\mathsf{f}} = (\mu_{{\mathsf{f}}1}, \mu_{{\mathsf{f}}2})^{\top}\), and the likelihood precision matrix Λf = diag(Λf1, Λf2) is diagonal.

Gibbs sampling of the multi-dimensional posterior in a coupled neural circuit

Given the generative model (Eq. (33)), the joint posterior of s1 and s2 is a bivariate normal distribution, i.e., \(p(\mathbf{s}|\mathbf{u}^{\mathsf{f}}) = \mathcal{N}\left(\mathbf{s}|\boldsymbol{\mu}_{p}, \mathbf{K}_{p}^{-1}\right)\), whose precision matrix Kp and mean μp are

$$\mathbf{K}_{p} = \boldsymbol{\Lambda}_{\mathsf{f}} + \boldsymbol{\Lambda}_{s}, \quad \boldsymbol{\mu}_{p} = \mathbf{K}_{p}^{-1}\boldsymbol{\Lambda}_{\mathsf{f}}\boldsymbol{\mu}_{\mathsf{f}}.$$
(34)

The precision matrix of the posterior is the sum of the precision of the likelihood and the prior, implying increased reliability of the distribution after combining with the prior. Meanwhile, the posterior mean is the weighted average of the means of the two likelihoods, with the weight proportional to the precision of each likelihood. We use this expression for the posterior to evaluate the performance of the proposed sampling-based algorithm.
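For reference, the analytic posterior of Eq. (34) can be computed in a few lines; the likelihood means, precisions, and prior coupling used below are placeholder values:

```python
# Direct computation of the analytic posterior in Eq. (34) for the two-stimulus model;
# the likelihood means/precisions and the prior coupling below are placeholder values.
import numpy as np

Lambda_s = 0.05
Lambda_prior = Lambda_s * np.array([[1.0, -1.0], [-1.0, 1.0]])   # prior precision, Eq. (32)
Lambda_f = np.diag([0.1, 0.2])                                   # likelihood precisions (assumed)
mu_f = np.array([10.0, -5.0])                                    # likelihood means (assumed)

K_p = Lambda_f + Lambda_prior                                    # posterior precision, Eq. (34)
mu_p = np.linalg.solve(K_p, Lambda_f @ mu_f)                     # posterior mean, Eq. (34)
cov_p = np.linalg.inv(K_p)                                       # posterior covariance
```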

Using Gibbs sampling to approximate the posterior (Eq. (34)) involves the following steps:

$$\mathrm{Compute:}\quad p(\tilde{s}_{1}|\mathbf{u}_{1}^{\mathsf{f}}, \tilde{s}_{2,t-\Delta t}) \propto p(\mathbf{u}_{1}^{\mathsf{f}}|\tilde{s}_{1})\,p(\tilde{s}_{2,t-\Delta t}|\tilde{s}_{1}),$$
(35a)
$$\mathrm{Sample:}\quad \tilde{s}_{1t} \sim p(\tilde{s}_{1}|\mathbf{u}_{1}^{\mathsf{f}}, \tilde{s}_{2,t-\Delta t}).$$
(35b)

We note that we only describe sampling from the posterior of s1; samples from the posterior of s2 can be obtained similarly after exchanging indices. This sampling can be implemented in a neural circuit model consisting of several coupled networks, in which each network generates samples from the posterior distribution of the corresponding stimulus. Therefore, the number of networks in the coupled circuit equals the dimension of the latent stimuli. The dynamics of the coupled neural circuit is defined by:

$$\boldsymbol{\lambda}_{1t} = \mathbf{u}_{1}^{\mathsf{f}} + \mathbf{u}_{12,t}^{\mathsf{r}} = \mathbf{u}_{1}^{\mathsf{f}} + w_{12}\mathbf{r}_{2,t-\Delta t},$$
(36)
$$\mathbf{r}_{1t} \sim \prod_{j=1}^{N_{E}}\mathrm{Poisson}(\boldsymbol{\lambda}_{1t,j}).$$
(37)

We again note that the dynamics of network 2 can be obtained similarly by exchanging indices. To implement Gibbs sampling (Eqs. (35a), (35b)) in the coupled circuit (Eqs. (36), (37)), spike generation in network 1 (Eq. (37)) can be used to produce stimulus samples, \(\tilde{s}_{1t}\), when the conditional distribution determined by λ1t matches the conditional distribution required in the definition of Gibbs sampling (Eq. (35a)), i.e., \(\ln p(\tilde{s}_{1}|\mathbf{u}_{1}^{\mathsf{f}}, \tilde{s}_{2,t-\Delta t}) = \ln p(\tilde{s}_{1t}|\boldsymbol{\lambda}_{1t}) = \mathbf{h}(\tilde{s}_{1})^{\top}\boldsymbol{\lambda}_{1t}\). Taking the logarithm of Eq. (35a) yields,

$$\ln p(\tilde{s}_{1}|\mathbf{u}_{1}^{\mathsf{f}}, \tilde{s}_{2,t-\Delta t}) = \ln p(\mathbf{u}_{1}^{\mathsf{f}}|\tilde{s}_{1}) + \ln p(\tilde{s}_{2,t-\Delta t}|\tilde{s}_{1}).$$
(38)

Comparing this expression with Eq. (36), we see that the feedforward input, \(\mathbf{u}_{1}^{\mathsf{f}}\), matches the conditional distribution \(p(\mathbf{u}_{1}^{\mathsf{f}}|\tilde{s}_{1})\) (Eq. (33)). We therefore require the recurrent input from network 2 to network 1 to encode the conditional distribution \(p(\tilde{s}_{2,t-\Delta t}|\tilde{s}_{1})\), i.e., \(\ln p(\tilde{s}_{2,t-\Delta t}|\tilde{s}_{1}) = \mathbf{h}(\tilde{s}_{1})^{\top}\mathbf{u}_{12,t}^{\mathsf{r}}\). This implies that \(\mathbf{u}_{12,t}^{\mathsf{r}}\) should approximately have a Gaussian profile,

$$\mathbf{u}_{12,tj}^{\mathsf{r}} = U_{12}^{\mathsf{r}}\exp[-(\theta_{j}-\tilde{s}_{2,t-\Delta t})^{2}/2a^{2}] + \delta_{\perp}\mathbf{u}_{12,tj}^{\mathsf{r}}, \quad \tilde{s}_{2,t-\Delta t} = \sum_{j}\mathbf{u}_{12,tj}^{\mathsf{r}}\theta_{j}\Big/\sum_{j}\mathbf{u}_{12,tj}^{\mathsf{r}}, \quad \Lambda_{s} = a^{-2}\sum_{j}\mathbf{u}_{12,tj}^{\mathsf{r}},$$
(39)

where \(\delta_{\perp}\mathbf{u}_{12,tj}^{\mathsf{r}}\) quantifies the deviation from a perfect Gaussian profile, and does not affect the decoded value \(\tilde{s}_{2,t-\Delta t}\) or Λs.

The recurrent input, \(\mathbf{u}_{12}^{\mathsf{r}}\) (Eq. (39)), has the same width a as the neuronal response, r1. In a circuit containing only E neurons, if the two networks have the same number of neurons, then across networks only neurons with the same preferred stimulus should be connected. The optimal recurrent weight between the two networks is then

$$w_{mn} = \frac{\langle\mathbf{u}_{mn,j}^{\mathsf{r}}\rangle}{\langle\mathbf{r}_{nj}\rangle} = \frac{\sum_{j}\langle\mathbf{u}_{mn,j}^{\mathsf{r}}\rangle}{\sum_{j}\langle\mathbf{r}_{nj}\rangle} = \frac{\Lambda_{s}}{\Lambda_{s}+\Lambda_{n}^{\mathsf{f}}},\quad (m \ne n)$$
(40)

Since each network individually generates a stimulus sample, the sample of stimulus m can be locally read out from network m's responses even if the activities of the two networks are correlated (Fig. 6a), which greatly simplifies the readout. Furthermore, because the population firing rate of each network has a Gaussian profile, the stimulus sample \(\tilde{s}_{mt}\) can be linearly read out from rmt as

$$\tilde{s}_{mt} = \sum_{j}\theta_{j}\mathbf{r}_{mt,j}\Big/\sum_{j}\mathbf{r}_{mt,j}.$$
(41)

We note that the circuit implementation of Gibbs sampling from a multi-dimensional posterior (Eq. (8a)) does not require recurrent connections between E neurons within a network. This is due to the assumption that the marginal priors of each stimulus feature, p(sm), are uniform. For a non-uniform marginal prior p(sm), recurrent connections between E neurons within a network would be required to generate samples from a distribution that matches the true posterior.
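The coupled-circuit sampler of Eqs. (36)–(41) can be sketched as follows for m = 2; the stimulus values, tuning parameters, and prior precision are illustrative assumptions, and the cross-coupling weights follow Eq. (40):

```python
# Sketch of the coupled-circuit sampler (Eqs. (36)-(41)) for two parallel stimuli. Each
# network receives its own feedforward input plus the other network's spikes through the
# cross-coupling weight of Eq. (40). Stimulus values and parameters are assumed.
import numpy as np

rng = np.random.default_rng(5)
N_E, a, U_f = 180, 40.0, 3.0
theta = np.linspace(-180.0, 180.0, N_E, endpoint=False)

s_true = np.array([10.0, -10.0])             # external stimuli driving the two inputs
u_f = np.stack([rng.poisson(U_f * np.exp(-(theta - s) ** 2 / (2 * a ** 2))) for s in s_true])
Lambda_f = u_f.sum(axis=1) / a ** 2          # likelihood precision of each input
Lambda_s = 0.05                              # assumed prior coupling precision
w = Lambda_s / (Lambda_s + Lambda_f[::-1])   # w_12 and w_21 from Eq. (40)

def position(x):
    return np.sum(x * theta) / np.sum(x)

r = np.zeros((2, N_E))
samples = []
for t in range(2000):
    lam = np.maximum(u_f + w[:, None] * r[::-1], 0.0)   # Eq. (36): cross-network recurrent drive
    r = rng.poisson(lam)                                # Eq. (37): Poisson spikes in each network
    if (r.sum(axis=1) > 0).all():
        samples.append([position(r[0]), position(r[1])])   # Eq. (41): local linear readout
samples = np.array(samples)
```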

Inference from an information-theoretic point of view

The goal of the sampling algorithm is to approximate the posterior distribution of the latent variables, Θ, given a feedforward input, uf. Specifically, Θ = {s, z} in the hierarchical generative model (Eq. (23)), or Θ = s = {s1, s2} in the generative model with breadth (Eq. (33)). When the sampling algorithm uses an internal model which does not match the structure of the generative model, the sampling distribution q(Θ∣uf) will differ from the true posterior, p(Θ∣uf) (Eq. (24)). In this case the mutual information between the sampled latent variables, Θ, and uf will be smaller than when the samples come from the true posterior, p(Θ∣uf),

$$I(\Theta, \mathbf{u}^{\mathsf{f}}) = -\mathbb{E}_{p(\Theta)}[\log p(\Theta)] + \mathbb{E}_{p(\Theta,\mathbf{u}^{\mathsf{f}})}[\log p(\Theta|\mathbf{u}^{\mathsf{f}})] \ge -\mathbb{E}_{p(\Theta)}[\log p(\Theta)] + \mathbb{E}_{p(\Theta,\mathbf{u}^{\mathsf{f}})}[\log q(\Theta|\mathbf{u}^{\mathsf{f}})] \equiv I_{q}(\Theta; \mathbf{u}^{\mathsf{f}}).$$
(42)

It is straightforward to show that the difference between I(Θ, uf) and Iq(Θ, uf) is the Kullback–Leibler (KL) divergence between p and q, i.e., \(D_{KL}[p||q] = I(\Theta, \mathbf{u}^{\mathsf{f}}) - I_{q}(\Theta, \mathbf{u}^{\mathsf{f}}) = \mathbb{E}_{p}(\ln p - \ln q) \ge 0\). Equality in Eq. (42) holds only if the distribution q matches the true posterior p.

The mutual information Iq(Θ; uf) can be computed analytically when the approximating distribution \(q(\Theta|\mathbf{u}^{\mathsf{f}}) = \mathcal{N}(\Theta|\boldsymbol{\mu}_{q}, \mathbf{K}_{q}^{-1})\) is a bivariate normal (substituting Eqs. (23) and (24) into Eq. (42)),

$$I_{q}(\Theta; \mathbf{u}^{\mathsf{f}}) = \log L + \frac{1}{2}\left[1 + \log\frac{|\mathbf{K}_{q}|}{2\pi\Lambda_{s}} - \mathrm{tr}(\mathbf{K}_{q}\mathbf{K}_{p}^{-1}) - (\boldsymbol{\mu}_{p}-\boldsymbol{\mu}_{q})^{\top}\mathbf{K}_{q}(\boldsymbol{\mu}_{p}-\boldsymbol{\mu}_{q})\right].$$
(43)

Here L = 360° is the length of the stimulus feature subspace, while μp and Kp are the mean and the precision matrix of the posterior distribution (Eq. (24) or (34)). When q matches the posterior distribution, p, we have \(I(\Theta;\mathbf{u}^{\mathsf{f}}) = \log L - \frac{1}{2}\left[1 + \log(2\pi\Lambda_{s}) - \log|\mathbf{K}_{p}|\right]\).
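Eq. (43) can be transcribed directly; the sketch below evaluates Iq for an assumed approximating distribution and, when q equals the posterior, reduces to the expression for I(Θ; uf) given above:

```python
# Direct transcription of Eq. (43): the information captured by a Gaussian approximating
# distribution q relative to the true bivariate Gaussian posterior. Inputs are illustrative.
import numpy as np

def I_q(mu_q, K_q, mu_p, K_p, Lambda_s, L=360.0):
    d = mu_p - mu_q
    return (np.log(L)
            + 0.5 * (1.0
                     + np.log(np.linalg.det(K_q) / (2.0 * np.pi * Lambda_s))
                     - np.trace(K_q @ np.linalg.inv(K_p))
                     - d @ K_q @ d))

Lambda_f, Lambda_s = 0.1, 0.05
K_p = np.array([[Lambda_f + Lambda_s, -Lambda_s], [-Lambda_s, Lambda_s]])   # Eq. (24)
mu_p = np.array([20.0, 20.0])

# When q equals the posterior this reduces to I(Theta; u^f) as given in the text.
print(I_q(mu_p, K_p, mu_p, K_p, Lambda_s))
```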

The neuronal response distribution conditioned on external stimulus

We compute the distribution of neuronal responses r over time/trials in response to an external stimulus s, i.e., p(r∣s), in order to find a neural signature of network sampling and compare it with experimental data. For a fixed external stimulus s, the neuronal response r fluctuates due to both sensory transmission noise, described by p(uf∣s) (Eq. (18)), and the internally generated variability, described by p(r∣uf) (Fig. 4a). Therefore, the distribution of r in response to an external stimulus s has the form

$$p(\mathbf{r}|s) = \int p(\mathbf{r}|\mathbf{u}^{\mathsf{f}})\,p(\mathbf{u}^{\mathsf{f}}|s)\,d\mathbf{u}^{\mathsf{f}}.$$

For simplicity, we only compute the covariability of p(r∣uf) along the stimulus subspace (Fig. 1b, x-axis), because the covariability along other directions is not related to stimulus sampling. By approximating the Poissonian spiking variability p(r∣λ) with a multivariate normal distribution (Eq. (11)), and considering the limit of weak fluctuations in λ along the stimulus subspace over time, p(r∣uf) can be computed approximately as (see mathematical details in the Supplementary Information),

$$p(\mathbf{r}|\mathbf{u}^{\mathsf{f}}) = \int p(\mathbf{r}|\boldsymbol{\lambda})\,p(\boldsymbol{\lambda}|\mathbf{u}^{\mathsf{f}})\,d\boldsymbol{\lambda} \approx \mathcal{N}\left[\mathbf{r}\,\big|\,\mathbf{f}(s),\,\mathrm{diag}(\mathbf{f}(s)) + V(\bar{s}|\mu_{\mathsf{f}})\mathbf{f}_{s}^{\prime}\mathbf{f}_{s}^{\prime\top}\right], \quad \mathrm{where}\ s = \mu_{\mathsf{f}}.$$
(44)

Here f(s) = 〈λt〉 denotes the temporally averaged population response. The covariance structure of the neuronal response includes two terms: diag(f(s)), a diagonal matrix whose entries equal those of the vector f(s) and which reflects the (independent) Poisson spiking variability (Eq. (23)), and \(V(\bar{s}|\mu_{\mathsf{f}})\mathbf{f}_{s}^{\prime}\mathbf{f}_{s}^{\prime\top}\), a term that captures the covariability due to firing rate fluctuations along the stimulus subspace (Fig. 8a), where \(\mathbf{f}_{s}^{\prime} = d\mathbf{f}(s)/ds\) is the derivative of f(s) with respect to the stimulus feature s. The covariance \(\mathbf{f}_{s}^{\prime}\mathbf{f}_{s}^{\prime\top}\) is often termed differential (noise) correlations4,17. With the Gaussian profile of f(s) (Eqs. (18) and (29)), \(\mathbf{f}_{s}^{\prime}\mathbf{f}_{s}^{\prime\top}\) exhibits an anti-symmetric structure (Fig. 8b)17,22,53,71,72.

In Eq. (44), \(V(\bar{s}|\mu_{\mathsf{f}})\) is the variance of \(\bar{s}_{t}\) (the mean of the conditional distribution in Eq. (4a)) over time and characterizes the amplitude of internally generated differential correlations. In the network implementation, \(\bar{s}_{t}\) and μf are represented as the positions of λt and uf on the stimulus subspace, respectively (Eqs. (14) and (20)). The dynamics of Gibbs sampling (Eq. S20 in the Supplementary Information) and the network structure (Eq. (6)) imply that

$$V(\bar{s}|\mu_{\mathsf{f}}) = \frac{\Lambda_{s}}{\Lambda_{\mathsf{f}}(\Lambda_{\mathsf{f}}+\Lambda_{s})} = a^{2}n_{\mathsf{f}}^{-1}w_{E}^{*}.$$
(45)

Note that \(V(\bar{s}| {\mu }_{{\mathsf{f}}})\) is constrained by network connections, in that it is internally generated and shared within the network (for \({w}_{E}^{*} \, > \, 0\)).

An expression for p(r∣s) can be derived similarly; compared with p(r∣uf) (Eq. (44)), it includes an additional term contributing to differential correlations due to fluctuations in the feedforward inputs,

$$p(\mathbf{r}|s) \approx \mathcal{N}\left[\mathbf{r}\,\big|\,\mathbf{f}(s),\,\mathrm{diag}(\mathbf{f}(s)) + V(\bar{s}|s)\mathbf{f}_{s}^{\prime}\mathbf{f}_{s}^{\prime\top}\right], \quad V(\bar{s}|s) = V(\bar{s}|\mu_{\mathsf{f}}) + V(\mu_{\mathsf{f}}|s) = \frac{\Lambda_{s}}{\Lambda_{\mathsf{f}}(\Lambda_{\mathsf{f}}+\Lambda_{s})} + \frac{1}{\Lambda_{\mathsf{f}}} = a^{2}n_{\mathsf{f}}^{-1}(w_{E}^{*}+1).$$
(46)

Here the variance, \(V(\bar{s}|s)\), in the stimulus feature subspace is a mixture of the internal variability, \(V(\bar{s}|\mu_{\mathsf{f}})\), and the sensory noise, V(μf∣s) (Eq. (23)). The neuronal response distribution in coupled networks (Fig. 6a) can be obtained similarly (see the Supplementary Information).

A spiking network model with excitatory and inhibitory Poisson neurons

To test the inference mechanisms proposed for networks consisting only of E neurons (Eqs. (10)–(37)) in a more realistic setting, we simulated a well-studied recurrently coupled cortical model21,22. The network consisted of NE excitatory (E) and NI inhibitory (I) spiking neurons, with the activity of each neuron modeled as a Hawkes process69. At time t, we represent the response of neuron j in population a = {E, I}, \(\mathbf{r}_{tj}^{a}\), as a spike count drawn from a Poisson distribution with instantaneous firing rate, \(\boldsymbol{\lambda}_{tj}^{a}\),

$$\mathbf{r}_{tj}^{a} \sim \mathrm{Poisson}\left[\boldsymbol{\lambda}_{tj}^{a}\right].$$
(47)

Each neuron has a refractory period of 2 ms after emitting a spike. The firing rate \(\boldsymbol{\lambda}_{tj}^{a}\) is the sum of the feedforward input \(\mathbf{u}_{tj}^{a\mathsf{f}}\) and the recurrent input \(\mathbf{u}_{tj}^{a\mathsf{r}}\), so that \(\boldsymbol{\lambda}_{tj}^{a} = \mathbf{u}_{tj}^{a\mathsf{f}} + \mathbf{u}_{tj}^{a\mathsf{r}}\). The feedforward inputs are filtered spikes from upstream neurons, \(\mathbf{u}_{tj}^{a\mathsf{f}} = \sum_{n}\eta(t - t_{jn}^{\mathsf{f}})\), where \(t_{jn}^{\mathsf{f}}\) is the time of the nth spike received by neuron j of population a from the feedforward inputs. Here η(t) is the synaptic input profile, modeled as \(\eta(t) = \exp(-t/\tau_{d})/\tau_{d}\) for t > 0. Throughout, we set the synaptic time constant τd = 2 ms. To mimic the Poisson-like variability needed to sample the stimulus parameter in the hierarchical generative model (Eqs. (23) and (31)), the recurrent input received by neuron j in population a is defined by

$${{{{{{{{\bf{u}}}}}}}}}_{tj}^{a{\mathsf{r}}} ={\bar{{{{{{{{\bf{u}}}}}}}}}}_{tj}^{a{\mathsf{r}}}+\sqrt{{[{\bar{{{{{{{{\bf{u}}}}}}}}}}_{tj}^{a{\mathsf{r}}}]}_{+}}{\xi }_{t},\\ {\bar{{{{{{{{\bf{u}}}}}}}}}}_{tj}^{a{\mathsf{r}}} =\mathop{\sum}\limits_{b=\{E,I\}}\mathop{\sum }\limits_{k=1}^{{N}_{b}}\frac{{J}_{jk}^{ab}}{\sqrt{N}}\mathop{\sum}\limits_{n}\eta (t-{t}_{kn}^{b}),$$
(48)

where \({\bar{{{{{{{{\bf{u}}}}}}}}}}_{tj}^{a{\mathsf{r}}}\) is the mean recurrent input at time t given the activity of the presynaptic neurons. The recurrent input is corrupted by noise whose variance equals the mean of the recurrent input. In a physiological network, such recurrent noise may be generated by chaotic network dynamics36 or by synaptic noise37,70. In Eq. (48), the function [⋅]+ rectifies negative inputs, and ξt is a random variable following a standard Gaussian distribution. The coefficient \({J}_{jk}^{ab}\) is the synaptic weight from neuron k in population b to neuron j in population a. The time \({t}_{kn}^{b}\) is the time of the nth spike fired by neuron k in population b. The parameter N = NE + NI is the total number of neurons in the network. The scaling of the synaptic weights by \(1/\sqrt{N}\) is standard in networks where excitation is balanced by recurrent inhibition36. Finally, for convenience the synaptic input profile of the recurrent input, η(t), is the same as the one chosen for the feedforward input. Note that the rectification of recurrent inputs in Eq. (48) introduces errors that cause the sampling distribution to deviate from the true posterior, and hence we chose the recurrent weights to be small (Fig. 5). The rectification only arises when using (continuous) recurrent inputs to sample the stimulus parameter, and does not affect the generality of sampling via (discrete) Poisson spiking variability.
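
For concreteness, below is a minimal Python sketch of one Euler time step implementing Eqs. (47) and (48): filtered Poisson feedforward spikes, a mean recurrent input corrupted by multiplicative noise, and Poisson spike generation with a 2 ms refractory period. This is a schematic, not the published MATLAB implementation; the weight matrix, input rates, and other numbers are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes and parameters (assumptions, not the values in the paper's Table 1)
N_E, N_I = 180, 45
N = N_E + N_I
dt, tau_d, t_ref = 1e-4, 2e-3, 2e-3       # 0.1 ms Euler step, 2 ms synapse, 2 ms refractory period
J = rng.normal(0.0, 0.05, size=(N, N))    # placeholder weights J_{jk}^{ab}; a structured J^{EE} is sketched below
rate_ff_in = 20.0 * np.ones(N)            # assumed rate (Hz) of the feedforward Poisson spike trains

x_ff = np.zeros(N)                        # filtered feedforward spikes, u^f = sum_n eta(t - t_n)
x_rec = np.zeros(N)                       # filtered recurrent spikes, sum_n eta(t - t_kn)
last_spike = np.full(N, -np.inf)

for step in range(2000):                  # 200 ms of simulated time
    t = step * dt

    # Feedforward input: exponentially filtered Poisson spikes (eta(t) = exp(-t/tau_d)/tau_d)
    ff_spikes = rng.poisson(rate_ff_in * dt)
    x_ff += dt * (-x_ff / tau_d) + ff_spikes / tau_d
    u_f = x_ff

    # Recurrent input with multiplicative noise, Eq. (48)
    u_r_mean = (J / np.sqrt(N)) @ x_rec
    u_r = u_r_mean + np.sqrt(np.maximum(u_r_mean, 0.0)) * rng.standard_normal(N)

    # Instantaneous rate and Poisson spike counts, Eq. (47); the rate is rectified for the
    # Poisson draw and silenced during the refractory period
    lam = np.maximum(u_f + u_r, 0.0)
    lam[t - last_spike < t_ref] = 0.0
    spikes = rng.poisson(lam * dt)
    last_spike[spikes > 0] = t

    # Spikes just emitted feed back through the same synaptic filter
    x_rec += dt * (-x_rec / tau_d) + spikes / tau_d
```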

To model the coding of a circular stimulus such as orientation, the excitatory neurons are arranged on a ring22,71. The preferred stimuli, θj, of the excitatory neurons are equally spaced on the interval (−180°, 180°], consistent with the range of latent features (Eq. (21)). Inhibitory neurons are not tuned to the stimulus; their role is to stabilize network responses. Note that the recurrent connections between E neurons are modeled with a Gaussian function that decays with the distance between the stimuli preferred by the two cells, rather than only self-connections as in the simple network containing only E neurons (Eq. (30)),

$${J}_{jk}^{EE}=\frac{{w}_{EE}L}{\sqrt{2\pi }a}\exp \left[-\frac{{({\theta }_{j}-{\theta }_{k})}^{2}}{2{a}^{2}}\right].$$
(49)

We imposed periodic boundaries on the Gaussian function to avoid boundary effects in simulations. Although the generative model assumes non-periodic feature variables (Eq. (3)), as long as the variances of the associated distributions are small compared with the width of the feature space, the network model with periodic boundaries on the recurrent connections (Eq. (49)) provides a good approximation of the non-periodic Gaussian posterior (Eq. (24)). The weight wEE denotes the average strength of all E to E connections. The parameter a = 40° defines the footprint of connectivity in feature space (i.e., the ring), and L = 360° is the length of the ring manifold (Eq. (21)); multiplication by L in Eq. (49) sets the sum of the E to E connection strengths received by each E neuron to NEwEE. Moreover, the excitatory and inhibitory populations are connected all-to-all with each other, as are the I neurons with each other. For simplicity, we consider the E to I, I to I and I to E connections all to be unstructured (in feature space) and assume that connections of the same type have equal weight, i.e., \({J}_{jk}^{EI}={w}_{EI}\), \({J}_{jk}^{IE}={w}_{IE}\) and \({J}_{jk}^{II}={w}_{II}\). To simplify the network further, we take connections originating from the same population to have the same average weight, i.e., wEE = wIE ≡ wE and wII = wEI ≡ wI. For the feedforward network model shown in Fig. 2, we only remove the recurrent connections between E neurons, i.e., wEE = 0, while keeping all other connections, including wEI, wII, and wIE, the same as in the recurrent network.
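
A minimal sketch of how this connectivity could be assembled is given below, with the periodic boundary implemented by wrapping the difference of preferred stimuli; the weight values are illustrative assumptions, and only the connection matrices themselves (not their signs or how they enter the input equation) are shown.

```python
import numpy as np

N_E, N_I = 180, 45
L, a = 360.0, 40.0                        # ring length and connectivity footprint (deg)
w_E, w_I = 0.1, 0.5                       # illustrative weights obeying w_I / w_E = 5 (assumed values)
w_EE = w_IE = w_E                         # connections originating from E neurons
w_II = w_EI = w_I                         # connections originating from I neurons

theta = np.linspace(-180.0, 180.0, N_E, endpoint=False) + L / N_E   # preferred stimuli in (-180, 180]

# E-to-E connections: Gaussian profile of Eq. (49), with the angle difference wrapped
# onto [-180, 180) to implement the periodic boundary
dtheta = theta[:, None] - theta[None, :]
dtheta = (dtheta + 180.0) % 360.0 - 180.0
J_EE = w_EE * L / (np.sqrt(2.0 * np.pi) * a) * np.exp(-dtheta ** 2 / (2.0 * a ** 2))

# The remaining connection types are unstructured in feature space, with equal weights per type
J_IE = np.full((N_I, N_E), w_IE)          # E -> I
J_EI = np.full((N_E, N_I), w_EI)          # I -> E
J_II = np.full((N_I, N_I), w_II)          # I -> I

# Each row sum of J_EE is approximately N_E * w_EE, as set by the factor L in Eq. (49)
print(J_EE.sum(axis=1).mean(), N_E * w_EE)
```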

The feedforward inputs to E neurons consist of independent Poisson spike counts, as described by Eq. (18), with rate \(\langle {{{{{{{{\bf{u}}}}}}}}}_{j}^{E{\mathsf{f}}}(s)\rangle={U}^{{\mathsf{f}}}{e}^{-{(s-{\theta }_{j})}^{2}/(4{a}^{2})}\). The inhibitory neurons also receive independent Poisson feedforward inputs. The rate of the input received by every I neuron is proportional to the overall feedforward rate to the E neurons, in order to maintain the balance between excitation and inhibition in the network,

$$\langle {{{{{{{{\bf{u}}}}}}}}}_{j}^{I{\mathsf{f}}}\rangle=\frac{{w}_{I{\mathsf{f}}}}{{N}_{I}}\mathop{\sum }\limits_{k=1}^{{N}_{E}}\langle {{{{{{{{\bf{u}}}}}}}}}_{k}^{E{\mathsf{f}}}(s)\rangle .$$
(50)
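
The feedforward drive can be sketched as follows, with the E-neuron rates following the Gaussian tuning given above and the common I-neuron rate following Eq. (50); the peak rate Uf and the stimulus value are assumptions, while wIf = 0.8 is taken from the simulation parameters described below.

```python
import numpy as np

N_E, N_I = 180, 45
a, U_f, w_If = 40.0, 50.0, 0.8            # footprint (deg), assumed peak feedforward rate (Hz), w_If
theta = np.linspace(-180.0, 180.0, N_E, endpoint=False) + 360.0 / N_E
s = 30.0                                  # presented stimulus (deg)

# Mean feedforward rate to E neurons: U^f * exp(-(s - theta_j)^2 / (4 a^2))
rate_E = U_f * np.exp(-(s - theta) ** 2 / (4 * a ** 2))

# Eq. (50): every I neuron receives the same rate, proportional to the total E feedforward rate
rate_I = (w_If / N_I) * rate_E.sum() * np.ones(N_I)

# Independent Poisson spike counts over one 0.1 ms Euler step
dt = 1e-4
rng = np.random.default_rng(2)
ff_spikes_E = rng.poisson(rate_E * dt)
ff_spikes_I = rng.poisson(rate_I * dt)
```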

In the simulations, we started with a network of NE = 180 excitatory and NI = 45 inhibitory neurons, and increased the number of neurons by a fixed factor for Fig. 1d. The ratio between the average connection weight from I neurons and that from E neurons was kept fixed at wI/wE = 5. The feedforward weight of the input to I neurons was set to wIf = 0.8. We simulated the network dynamics using the Euler method with a time step of 0.1 ms. Typical parameter values used in the simulations are listed in Table 1 of the Supplementary Information, which also contains further details about the simulations and the numerical estimates of mutual information and linear Fisher information. The network simulation code was written in MATLAB 2018b and is available on GitHub (https://github.com/wenhao-z/Sampling_PoissSpk_Neuron).

A spiking network model of coupled neural circuits

In the coupled neural circuits used to infer latent variables organized in parallel (Fig. 6a), the two networks are copies of each other, i.e., they have the same intrinsic parameters. Each network is equivalent to the one described in the previous section, except that there are no recurrent connections between E neurons within the same network and no variability in the recurrent interactions (no noise in Eq. (48)). The absence of recurrent connections between E neurons in the same network reflects the uniform marginal prior over each stimulus. Within each network, however, the E and I neurons are connected with the same connection profile as above to keep the network activity stable. Between the two networks there are only excitatory connections, which target both E and I neurons. The connections between E neurons across networks follow the same pattern as that described by Eq. (49), with the peak connection strength from network n to network m denoted \({w}_{EE}^{mn}\). For simplicity, the connections from E neurons in one network to I neurons in the other are set equal to the peak strength of the E to E connections across networks, i.e., \({w}_{IE}^{mn}={w}_{EE}^{mn}\). To simplify the network model further, we set the inter-network connections to be symmetric, i.e., \({w}_{EE}^{mn}={w}_{EE}^{nm}\). In the simulations, \({w}_{EE}^{mn}\) was adjusted to determine how the sampling distribution is affected (Fig. 7a).
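
A minimal sketch of the cross-network connectivity described above: the E-to-E connections across networks reuse the Gaussian profile of Eq. (49) with symmetric peak strengths, and the cross-network E-to-I connections are unstructured with the same peak strength; all numbers are illustrative assumptions.

```python
import numpy as np

N_E, N_I = 180, 45
L, a = 360.0, 40.0
w_EE_12 = w_EE_21 = 0.05                  # symmetric peak cross-network E-to-E strength (assumed value)
theta = np.linspace(-180.0, 180.0, N_E, endpoint=False) + L / N_E

# Gaussian profile of Eq. (49) with periodic wrapping of the preferred-stimulus difference
dtheta = (theta[:, None] - theta[None, :] + 180.0) % 360.0 - 180.0
gauss = L / (np.sqrt(2.0 * np.pi) * a) * np.exp(-dtheta ** 2 / (2.0 * a ** 2))

# E neurons of network 2 -> E neurons of network 1, and vice versa
J_EE_12 = w_EE_12 * gauss
J_EE_21 = w_EE_21 * gauss

# E neurons in one network -> I neurons of the other: unstructured, with the same peak strength
J_IE_12 = np.full((N_I, N_E), w_EE_12)
J_IE_21 = np.full((N_I, N_E), w_EE_21)
```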

Comparing the sampling distribution with posterior in coupled neural circuits

We read out samples from the posterior distribution over each stimulus, \({\tilde{s}}_{mt}\), individually from the spiking activity of the E neurons, rmt, in network m in every 20 ms time window using a population vector. We used this collection of samples to estimate the mean, \(\langle \tilde{{{{{{{{\bf{s}}}}}}}}}\rangle={(\langle {\tilde{s}}_{1}\rangle,\langle {\tilde{s}}_{2}\rangle )}^{\top }\), and covariance matrix, Σs, of the sampling distribution. Meanwhile, the mean μf and precision matrix Λf of the likelihood are read out linearly from the feedforward inputs to the network model (Eq. (33)).
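
A minimal sketch of this readout is given below: E spike counts are binned in 20 ms windows, each window is decoded with a population vector (implemented here as the circular mean of the preferred stimuli weighted by spike counts, one common choice), and the resulting samples are summarized by their mean and covariance. The placeholder spike counts and rates are assumptions standing in for the simulated activity.

```python
import numpy as np

N_E, L, a = 180, 360.0, 40.0
theta = np.linspace(-180.0, 180.0, N_E, endpoint=False) + L / N_E   # preferred stimuli (deg)
dt, win = 1e-4, 20e-3
bin_steps = int(win / dt)                                           # 200 Euler steps per 20 ms window

def population_vector(counts):
    """Decode one stimulus sample from a vector of E spike counts (circular mean of theta)."""
    z = np.sum(counts * np.exp(1j * np.deg2rad(theta)))
    return np.rad2deg(np.angle(z))

# Placeholder spike counts standing in for the simulated E activity of the two networks,
# generated from Gaussian-bump rates so that the decoder has something sensible to recover
rng = np.random.default_rng(3)
s_true = np.array([20.0, -10.0])
rates = [0.02 * np.exp(-(((theta - s_true[m] + 180.0) % 360.0 - 180.0) ** 2) / (2 * a ** 2))
         for m in range(2)]
spikes_E = [rng.poisson(rates[m], size=(20000, N_E)) for m in range(2)]

samples = []
for m in range(2):
    counts = spikes_E[m].reshape(-1, bin_steps, N_E).sum(axis=1)    # spike counts per 20 ms window
    samples.append(np.array([population_vector(c) for c in counts]))
samples = np.stack(samples, axis=1)                                 # shape (n_windows, 2)

mean_s = samples.mean(axis=0)        # sampling mean (one entry per network)
Sigma_s = np.cov(samples.T)          # sampling covariance
print(mean_s, Sigma_s)
```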

If the sampling distribution matches the posterior, the sampling mean \(\langle \tilde{{{{{{{{\bf{s}}}}}}}}}\rangle\) and covariance Σs should satisfy Eq. (34). We use the measured sampling covariance and the likelihood parameters to predict the sampling mean, i.e., \({\langle \tilde{{{{{{{{\bf{s}}}}}}}}}\rangle }_{{{{{{{{\rm{pred}}}}}}}}}={{{{{{{{\boldsymbol{\Sigma }}}}}}}}}_{{{{{{{{\bf{s}}}}}}}}}{{{{{{{{\boldsymbol{\Lambda }}}}}}}}}_{{\mathsf{f}}}{{{{{{{{\boldsymbol{\mu }}}}}}}}}_{{\mathsf{f}}}\), and compare it with the measured \(\langle \tilde{{{{{{{{\bf{s}}}}}}}}}\rangle\) (Fig. 7d–f). To obtain the posterior precision matrix, given the sampling mean \(\langle \tilde{{{{{{{{\bf{s}}}}}}}}}\rangle\) and the likelihood parameters, we vary the single prior-precision parameter Λs to minimize the KL divergence between the posterior predicted using that value of Λs and the actual sampling distribution. Given this fitted Λs, the predicted posterior precision is computed as Kpred = Λs + Λf (Eq. (34)), which is then compared with the measured sampling precision matrix (\({{{{{{{{\boldsymbol{\Sigma }}}}}}}}}_{{{{{{{{\bf{s}}}}}}}}}^{-1}\); see Fig. 7c–g). The prior precision, Λs, is a subjective prior which reflects the prior stored in the recurrent network and may change with the input (see Discussion). More details of the network simulations and parameters can be found in the Supplementary Information.
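
The fitting procedure can be sketched as follows. Given placeholder readout values, we predict the sampling mean from the measured covariance and the likelihood parameters, and fit the single prior-precision parameter by minimizing a Gaussian KL divergence. The parametrization of Λs as λs[[1, −1], [−1, 1]] (a prior on the difference of the two stimuli), the direction of the KL divergence, and all numerical values are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Placeholder readout values standing in for quantities decoded from the simulation
mu_f = np.array([10.0, -5.0])                       # likelihood mean read out from the feedforward input
Lambda_f = np.diag([0.02, 0.02])                    # likelihood precision (deg^-2)
mean_s = np.array([7.0, -2.0])                      # measured sampling mean
Sigma_s = np.array([[60.0, 25.0], [25.0, 60.0]])    # measured sampling covariance

# Predicted sampling mean: Sigma_s @ Lambda_f @ mu_f (Eq. (34))
mean_pred = Sigma_s @ Lambda_f @ mu_f

def kl_gauss(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for 2-d Gaussians."""
    S1inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + d @ S1inv @ d - 2
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def fit_objective(lam_s):
    """KL between the posterior predicted with prior precision lam_s and the sampling distribution."""
    Lambda_s = lam_s * np.array([[1.0, -1.0], [-1.0, 1.0]])   # assumed one-parameter prior precision
    K = Lambda_s + Lambda_f                                   # predicted posterior precision, Eq. (34)
    Sigma_post = np.linalg.inv(K)
    mu_post = Sigma_post @ Lambda_f @ mu_f
    return kl_gauss(mu_post, Sigma_post, mean_s, Sigma_s)     # one possible choice of KL direction

res = minimize_scalar(fit_objective, bounds=(0.0, 1.0), method="bounded")
K_pred = res.x * np.array([[1.0, -1.0], [-1.0, 1.0]]) + Lambda_f
print("fitted prior precision parameter:", res.x)
print("predicted posterior precision K_pred:\n", K_pred)
print("measured sampling precision:\n", np.linalg.inv(Sigma_s))
```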

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.