Abstract
Neuronal network models of high-level brain functions such as memory recall and reasoning often rely on the presence of some form of noise. The majority of these models assumes that each neuron in the functional network is equipped with its own private source of randomness, often in the form of uncorrelated external noise. In vivo, synaptic background input has been suggested to serve as the main source of noise in biological neuronal networks. However, the finiteness of the number of such noise sources constitutes a challenge to this idea. Here, we show that shared-noise correlations resulting from a finite number of independent noise sources can substantially impair the performance of stochastic network models. We demonstrate that this problem is naturally overcome by replacing the ensemble of independent noise sources by a deterministic recurrent neuronal network. By virtue of inhibitory feedback, such networks can generate small residual spatial correlations in their activity which, counter to intuition, suppress the detrimental effect of shared input. We exploit this mechanism to show that a single recurrent network of a few hundred neurons can serve as a natural noise source for a large ensemble of functional networks performing probabilistic computations, each comprising thousands of units.
Introduction
Probabilistic inference as a principle of brain function has attracted increasing attention over the past decades^{1,2}. In support of a sampling-based “Bayesian-brain hypothesis”, the high in-vivo response variability of cortical neurons observed in electrophysiological recordings^{3} is interpreted in the context of ongoing probabilistic computation^{4,5,6,7,8}. Simultaneously, it has been found that intrinsically stochastic neural networks are a suitable substrate for machine learning^{9,10}. These findings have led to the incorporation of noise into computational neuroscience models^{11,12,13}, in particular to account for the mechanisms underlying stochastic computing such as sampling-based probabilistic inference in biological neuronal substrates^{7,14,15,16}. Note that the term “stochastic computing” refers to the idea that the variability required for this form of computing can be mathematically described as (or replaced by) quasi-stochasticity without altering the functionality of the network. It does not imply that the implementation relies on truly stochastic sources of noise, in either natural or synthetic neuronal substrates.
A number of potential sources of noise in biological circuits have been discussed in the past^{17}, such as variability in synaptic transmission^{18}, ion channel noise^{19} or synaptic background input^{20,21}. Arguably most widespread is the implementation of noise in neural-network models at the level of individual neurons. In this view, neurons are described as intrinsically stochastic units (Fig. 1, intrinsic) updating their states as a stochastic function of their synaptic input^{14,22,23}. This description is, however, at odds with experimental data. In vitro, isolated neurons exhibit little response variability^{20,24,25}. Researchers have reconciled the apparent discrepancy by equipping deterministic model neurons with additive private independent noise (Fig. 1, private), often in the form of Gaussian white noise or random sequences of action potentials (spikes) modeled as Poisson point processes^{15,26}. This restores the variability required for stochastic computing and is justified as originating from the background input a neuron in nature receives from the remainder of the network. So far, it is unclear though how the biological substrate could provide such a well controlled source of stochasticity for each individual unit in the functional network. The implicit assumption of independence of the background noise across units in the network is usually mentioned en passant and goes unchallenged.
Previous experimental and theoretical studies have shown that cortical neuronal networks can generate highly irregular spiking activity with small spatial and temporal correlations, resembling ensembles of independent realizations of Poisson point processes^{3,27,28,29}. It is hence tempting to assume that such networks may serve as appropriate effective noise sources for functional networks performing stochastic computing. However, as the size of these noise-generating background networks is necessarily finite, units in the functional network have to share noise sources. Assuming that the background spiking activity is uncorrelated, these shared inputs give rise to correlations in the inputs of the functional units, thereby violating the assumption of independence and potentially impairing network performance.
The present work demonstrates that a finite ensemble of uncorrelated noise sources (Fig. 1, shared) indeed leads to a substantial degradation of network performance due to shared-noise correlations. However, replacing the finite ensemble of uncorrelated noise sources by a recurrent neuronal network (Fig. 1, network) alleviates this problem. As shown in previous studies^{30,31}, networks with dominant inhibitory feedback can generate small residual spatial correlations in their activity which counteract the effect of shared input. We propose that biological neuronal networks exploit this effect to supply functional networks with nearly uncorrelated noise despite a finite number of background inputs. Moreover, a similar noise-generation strategy may prove useful for the implementation of sampling-based probabilistic computing on large-scale neuromorphic platforms^{32,33}.
In this study, we focus on neuronal networks derived from Boltzmann machines^{22} as representatives of stochastic functional networks. Such networks are widely used in machine learning^{9,10}, but also in theoretical neuroscience as models of brain dynamics and function^{14,15,34}. For the purpose of the present study, the advantage of models of this class lies in our ability to quantify their functional performance when subject to limitations in the quality of the noise.
Results
Networks with additive private Gaussian noise approximate Boltzmann machines
Boltzmann machines (BMs) are symmetrically connected networks of intrinsically stochastic binary units^{22}. With an appropriate update schedule and parametrization, the network dynamics effectively implement Gibbs sampling from arbitrary Boltzmann distributions^{35}. A given network realization leads to a particular frequency distribution of network states. Efficient training methods^{9,36} can fit this distribution to a given data distribution by modifying network parameters, thereby constructing a generative model of the data. In the following we investigate to what extent the functional performance of BM-like stochastic networks is altered if the intrinsic stochasticity assumed in BMs is replaced by private, shared or network-generated noise (Fig. 1). If not otherwise indicated, we consider BMs with random connectivity not trained for a specific task. Due to the specific noise-generation processes, the neural network implementations deviate from the mathematical definition of a BM. We therefore refer to these implementations as “sampling networks”.
In BMs, the intrinsically stochastic units i ∈ {1, …, M} are activated according to a logistic function \({F}_{i}({h}_{i})={(1+{e}^{-\beta {h}_{i}})}^{-1}\) of their input field \({h}_{i}={\sum }_{j=1}^{M}{w}_{ij}{s}_{j}+{b}_{i}\) with inverse temperature β, synaptic weight w_{ij} between unit j and unit i, presynaptic activity s_{j} ∈ {0, 1}, and bias b_{i} (see Methods for details). Equivalently, the network nodes of a BM may be regarded as deterministic units with Heaviside activation function F_{i}(h_{i}) = Θ(h_{i} + ξ_{i}), receiving additive noise ξ_{i} distributed according to \(\frac{\beta }{4}[1-{\tanh }^{2}(\frac{\beta }{2}{\xi }_{i})]\) (^{37}, see also Methods).
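The stochastic update rule can be made concrete with a short NumPy sketch. All names and parameter values below (`W`, `b`, `beta`, the network size) are illustrative placeholders, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Boltzmann-machine parameters: symmetric weights, zero diagonal.
M = 5
W = rng.normal(0.0, 0.5, (M, M))
W = np.triu(W, 1) + np.triu(W, 1).T      # enforce w_ij = w_ji, w_ii = 0
b = rng.normal(0.0, 0.1, M)
beta = 1.0

def update_unit(s, i):
    """Set s_i = 1 with probability F_i(h_i) = 1 / (1 + exp(-beta * h_i))."""
    h = W[i] @ s + b[i]                  # input field of unit i
    s[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-beta * h)))

# Asynchronous updates of randomly chosen units implement Gibbs sampling
# from the corresponding Boltzmann distribution.
s = rng.integers(0, 2, M).astype(float)
for _ in range(10_000):
    update_unit(s, rng.integers(M))
```

Collecting the visited states `s` over many such updates yields the empirical state distribution referred to in the following sections.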
Additive Gaussian noise \({\xi }_{i} \sim {\mathscr{N}}(\mu ,{\sigma }^{2})\) constitutes a more plausible form of noise as it emerges naturally in units receiving a large number of inputs from uncorrelated sources^{24,25,38}. Deterministic units receiving private Gaussian noise resemble units with a probabilistic update rule. Their effective gain function, however, corresponds to a shifted error function \({F}_{i}({h}_{i})={\rm{erfc}}(-({h}_{i}+{\mu }_{i})/\sqrt{2{\sigma }^{2}})/2\), rather than a logistic function. We minimize the mismatch between the two activation functions by relating the standard deviation σ of the Gaussian noise to the inverse temperature β (see Methods). For a given noise strength, this defines an effective inverse temperature β_{eff}. To emulate a BM at inverse temperature β, we rescale all weights and biases: b_{i} → (β/β_{eff})b_{i} − μ_{i}, w_{ij} → (β/β_{eff})w_{ij}. The Kullback-Leibler divergence D_{KL}(p, p^{*}) between the empirical state distribution p of the sampling network and the state distribution p^{*} generated by a BM over a subset of m units quantifies the sampling error.
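One simple way to relate σ and β is to match the slopes of the two activation functions at their midpoint, which gives β_eff = √(8/π)/σ; the exact calibration used in the study may differ, so the sketch below is only an illustration of the idea:

```python
import numpy as np
from math import erfc, sqrt, pi, exp

def logistic(h, beta):
    """Logistic activation of a BM unit, F(h) = 1 / (1 + exp(-beta * h))."""
    return 1.0 / (1.0 + exp(-beta * h))

def gaussian_gain(h, mu, sigma):
    """Activation of a deterministic Heaviside unit with noise xi ~ N(mu, sigma^2):
    P(h + xi > 0) = erfc(-(h + mu) / sqrt(2 sigma^2)) / 2."""
    return 0.5 * erfc(-(h + mu) / (sqrt(2.0) * sigma))

# Matching the slopes at the midpoint: beta_eff / 4 = 1 / (sigma * sqrt(2 pi)),
# i.e. beta_eff = sqrt(8 / pi) / sigma.
sigma = 1.0
beta_eff = sqrt(8.0 / pi) / sigma

# With this choice the two gain functions agree closely over the dynamic range.
hs = np.linspace(-3.0, 3.0, 61)
mismatch = max(abs(logistic(h, beta_eff) - gaussian_gain(h, 0.0, sigma)) for h in hs)
```

The residual `mismatch` stays below a few percent, which is the source of the small but finite sampling error discussed below.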
For matched temperature, networks of deterministic units with additive Gaussian noise closely approximate BMs (Fig. 2, gray vs. black). The sampling error decreases as a function of the sampling duration T, and saturates at a small but finite value (Fig. 2a, gray) due to remaining differences in the activation functions and hence sampling dynamics. The residual differences between the stationary distributions (Fig. 2b, black vs. gray bars) are small compared to the differences in probability between different network states.
The assumption of idealized private Gaussian noise generated by pseudorandom number generators is hard to reconcile with biology. In the following, the Gaussian noise is therefore replaced by input from binary and, subsequently, spiking units, thereby mimicking an embedding of the functional circuit into a cortical network. As a consequence, the noise of the sampling units exhibits jumps with finite amplitudes determined by the weights of the incoming connections. Only if the number of input events per sampling unit is large and the weights are small does the collective signal resemble Gaussian noise (see Supplementary Material). The sampling error resulting from private Gaussian noise therefore constitutes a lower bound on the error achievable by sampling networks supplied with noise from a finite ensemble of binary or spiking sources.
Shared-noise correlations impair sampling performance
Neurons in a functional circuit embedded in a finite surrounding network have to share noise sources to gather random input at a sufficiently high frequency. In consequence, the input noise for pairs of sampling units is typically correlated, even (or, in particular) if the noise sources are independent (Fig. 3, left).
By replacing private noise with a large number of inputs from a finite ensemble of independent noise sources, we investigate to what extent these shared-noise correlations distort the sampled distribution of network states. The noise sources are stochastic binary units with an adjustable average activity 〈z〉. To achieve a high input event count, each sampling unit is randomly assigned a large number K of inputs. For each unit, these are randomly chosen from a common ensemble of N sources. On average, a pair of neurons in the sampling network hence shares K^{2}/N noise sources. The ensemble of noise sources comprises γN excitatory and (1 − γ)N inhibitory units, projecting to their targets with weights w and −gw, respectively. The input field for a single unit in the sampling network is then given by \({h^{\prime} }_{i}={\sum }_{j=1}^{M}\,{w}_{ij}{s}_{j}+{b}_{i}+{\sum }_{k=1}^{N}\,{m}_{ik}{z}_{k}\), where m_{ik} represents the strength of the connection from the kth noise source to the ith sampling unit.
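The K²/N expectation for the number of shared sources follows directly from the random assignment and is easy to check numerically; the sizes `N` and `K` below are hypothetical, not taken from the study:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: each of two sampling units draws K of N common sources.
N, K = 500, 100
trials = 5000
overlap = np.empty(trials)
for t in range(trials):
    a = rng.choice(N, size=K, replace=False)   # sources of unit i
    b = rng.choice(N, size=K, replace=False)   # sources of unit j
    overlap[t] = np.intersect1d(a, b).size     # number of shared sources

# Each source is picked by a unit with probability K/N, so the expected
# overlap is N * (K/N)^2 = K^2 / N = 20 shared sources per pair.
```

Because the overlap shrinks only as 1/N for fixed K, a large pool of sources is needed before the sharing becomes negligible.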
For homogeneous connectivity, i.e., identical input statistics for each sampling unit, the second term in \({h^{\prime} }_{i}\) can be approximated by a Gaussian noise with mean μ = Kw(γ − (1 − γ)g)〈z〉 and variance σ^{2} = Kw^{2}(γ + (1 − γ)g^{2})〈z〉(1 − 〈z〉) (see Methods). These measures allow us to perform a similar calibration of the activation function as in the previous section. For heterogeneous connectivity, a similar calibration can be performed based on the empirically obtained mean and variance of the noise input distribution.
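These expressions for μ and σ² can be verified directly by simulation. The sketch below fixes, for simplicity, exactly γK excitatory and (1 − γ)K inhibitory inputs per unit and uses purely illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters: K inputs per unit, excitatory fraction gamma,
# inhibition ratio g, weight w, mean source activity <z>.
K, gamma, g, w, z_mean = 100, 0.8, 4.0, 0.1, 0.5

k_exc = int(gamma * K)                   # fix the input composition exactly
m = np.concatenate([np.full(k_exc, w), np.full(K - k_exc, -g * w)])

trials = 50_000
z = (rng.random((trials, K)) < z_mean).astype(float)   # independent sources
noise_input = z @ m                      # summed noise input per trial

# Predicted statistics of the noise contribution to the input field:
mu_theory = K * w * (gamma - (1 - gamma) * g) * z_mean
var_theory = K * w**2 * (gamma + (1 - gamma) * g**2) * z_mean * (1 - z_mean)
```

With these values the mean input vanishes (γ = (1 − γ)g, a balanced choice) while the variance remains of order one, matching the empirical statistics of `noise_input`.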
If K ≈ N, shared-input correlations are large and the sampling error is substantial, even for long sampling duration (Fig. 2, blue curve and bars). Increasing N while keeping K fixed leads to a gradual decrease of shared-input correlations (~1/N) and therefore to a reduction of the sampling error (Fig. 4, blue curves). For large N ≫ K, the sampling error approaches values comparable to those obtained with private Gaussian noise (Fig. 4, blue vs. gray curves). For a broad range of N, the sampling error and the average shared-input correlation exhibit a similar trend (~1/N).
Networkgenerated noise recovers sampling performance
In recurrent neural networks, inhibitory feedback naturally suppresses shared-input correlations through the emerging activity patterns^{30,31}. Here we exploit this effect to minimize the detrimental influence of shared-input correlations arising from a limited number of noise sources. To this end, we replace the finite ensemble of independent stochastic sources by a recurrent network of deterministic units with Heaviside activation function (Fig. 1, red; see Methods). The noise-generating network comprises an excitatory and an inhibitory subpopulation with random, sparse and homogeneous connectivity. Connectivity parameters are chosen such that the recurrent dynamics is dominated by inhibition, thereby guaranteeing stable, nearly uncorrelated activity^{27,30,39,40}. To achieve optimal suppression of shared-input correlations in the sampling network, the connectivity between the noise network and the sampling network needs to match the connectivity within the noise network, i.e., the number and the (relative) weights of excitatory and inhibitory inputs have to be identical. Similar to the previous sections, we map the sampling network to a corresponding BM by relating the noise intensity to the inverse temperature β. As above, the additional contribution to the input fields \({h^{\prime} }_{i}\) of neurons in the sampling network resulting from the noise network can be approximated by a normal distribution \({\mathscr{N}}(\mu ,{\sigma }^{2})\). Here, we account for an additional contribution to the input variance resulting from residual correlations between units in the noise network (see Supplementary Material).
Using a recurrent network for noise generation considerably decreases the sampling error compared to the error obtained with a finite number of independent sources (shared-noise scenario), even if the shared-input correlations are substantial (Fig. 2, red vs. blue curve). Precisely because the activity of the noise network is not uncorrelated, the shared-input correlations in the units of the functional circuit are counterbalanced (cf. Fig. 3). For a broad range of noise-network sizes N, the noise input correlation, and hence the sampling error, are significantly reduced (Fig. 4, red vs. blue). In this range, the sampling error is comparable to the error obtained with private Gaussian noise and almost independent of N (Fig. 4, red vs. gray). Only if the noise network becomes too dense (K ≈ N), its dynamics lock into a fixed point (see Supplementary Material) and the sampling performance breaks down.
At first glance, it may seem counterintuitive that correlated network-generated noise can suppress correlations resulting from shared input. To resolve this, consider the noise input correlation

\({C}_{ij}^{{\rm{in}}}=\sum _{k,l\in {\mathcal B}}{m}_{ik}{m}_{jl}\langle {\bar{z}}_{k}{\bar{z}}_{l}\rangle =\underbrace{\sum _{k,l\in {\mathcal B}}{\delta }_{kl}\,{m}_{ik}{m}_{jl}\langle {\bar{z}}_{k}{\bar{z}}_{l}\rangle }_{{C}_{{\rm{shared}},ij}^{{\rm{in}}}}+\underbrace{\sum _{k,l\in {\mathcal B}}(1-{\delta }_{kl})\,{m}_{ik}{m}_{jl}\langle {\bar{z}}_{k}{\bar{z}}_{l}\rangle }_{{C}_{{\rm{corr}},ij}^{{\rm{in}}}}\)    (1)

of two units i and j in the (unconnected) sampling network. Here, \({\bar{z}}_{k}={z}_{k}-\langle {z}_{k}\rangle \) denotes the centered activity of unit k in the pool \( {\mathcal B} \) of noise sources, m_{ik} the weight of the connection between noise unit k and the target unit i, 〈·〉 the trial average (average across different initial conditions), and δ_{kl} the Kronecker delta. The first term \({C}_{{\rm{shared}},ij}^{{\rm{in}}}\) in (1) describes shared-input correlations arising from common noise sources. The second term \({C}_{{\rm{corr}},ij}^{{\rm{in}}}\) represents pairwise correlations between noise sources (Fig. 3; see also Eq. (19) in^{31}). If Dale’s principle is respected, i.e., if the weights m_{ik} from a given noise source k have identical sign for all targets i, the first contribution is always positive. In the shared-noise scenario, \({C}_{{\rm{corr}},ij}^{{\rm{in}}}\) is zero, since, by definition, the sources are uncorrelated. In this case, the average noise input correlation is solely determined by the connectivity statistics (see inset in Fig. 4, compare dark gray and blue). In the network-noise scenario, in contrast, the sources are not uncorrelated due to recurrent interactions in the noise network. As shown in^{30,31}, \({C}_{{\rm{corr}},ij}^{{\rm{in}}}\) is negative in balanced inhibition-dominated recurrent neuronal networks (both in purely inhibitory and in excitatory-inhibitory networks) and nearly cancels the contribution \({C}_{{\rm{shared}},ij}^{{\rm{in}}}\) of shared inputs, such that the total input correlation \({C}_{ij}^{{\rm{in}}}\) is close to zero. Shared components of the input fluctuations are canceled by inhibitory feedback, resulting in nearly uncorrelated inputs despite substantial overlap in the presynaptic populations. Here, we exploit exactly this effect: a network in the balanced state supplies noise with a correlation structure that suppresses shared-input correlations.
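The cancellation in Eq. (1) can be illustrated numerically: given the covariance matrix of the noise sources, its diagonal yields the shared-source term and its off-diagonal part the source-correlation term. The sketch below uses an idealized covariance with weak uniform negative pairwise covariance as a caricature of an inhibition-dominated network; all values and names are illustrative, not parameters from the study:

```python
import numpy as np

def input_correlation_terms(m_i, m_j, C_z):
    """Split the covariance of the noise inputs to two target units into the
    shared-source term (diagonal of C_z) and the source-correlation term
    (off-diagonal part of C_z), as in Eq. (1)."""
    C_diag = np.diag(np.diag(C_z))
    c_shared = m_i @ C_diag @ m_j
    c_corr = m_i @ (C_z - C_diag) @ m_j
    return c_shared, c_corr

# Toy source covariance: variance a on the diagonal, weak uniform negative
# covariance c off the diagonal (purely illustrative numbers).
N = 200
a = 0.25
c = -a / (N - 1)
C_z = np.full((N, N), c) + (a - c) * np.eye(N)

rng = np.random.default_rng(3)
m_i = 0.1 * rng.random(N)       # positive weights -> positive shared term
m_j = 0.1 * rng.random(N)
c_shared, c_corr = input_correlation_terms(m_i, m_j, C_z)
# c_corr is negative and largely cancels c_shared, so the total input
# covariance c_shared + c_corr is much smaller than either term alone.
```

Even though each pairwise source covariance is tiny (of order 1/N), the sum over all N² ordered pairs makes the second term comparable in magnitude to the shared-input term, which is the essence of the decorrelation mechanism.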
As noise input correlations decrease, the performance of the sampling networks improves (Fig. 4).
Deterministic neural networks serve as a suitable noise source for a model of handwritten-digit generation
All realizations within the ensemble of unspecific, randomly generated sampling networks considered so far exhibit consistent performance characteristics (cf. narrow error bands in Figs. 2 and 4). Here, we demonstrate similar behaviour for a sampling network where the weights and biases are not chosen randomly but trained for a specific task – the generation of handwritten digits with imbalanced class frequencies (see Methods). Since it is not possible to measure the state distribution over all units in the network (2^{786+10} states), we restrict the analysis to the states of the label units as a compressed representation of the full network states. Training is performed using ideal Boltzmann machines. Weights and biases are calibrated as before. Noise is added to the training samples to reduce overfitting and thereby improve mixing performance. To make the task more challenging, the Boltzmann machine is trained to generate odd digits twice as often as even digits (see Methods).
The results are similar to those obtained for sampling networks with random weights and biases: (i) networks with private external noise perform close to optimal, (ii) shared-noise correlations impair network performance, and (iii) the performance is restored by employing a recurrent network for noise generation (Fig. 5). Thus, deterministic recurrent neural networks qualify as a suitable noise source for practical applications of neural networks performing probabilistic computations.
Shared-input correlations impair network performance for high-entropy tasks
The dynamics of a BM representing a high-entropy distribution evolve on a flat energy landscape with shallow minima, resulting in small pairwise correlations between sampling units. Here, the sampling process is sensitive to perturbations in statistical dependencies, such as those caused by shared-input correlations. In contrast, the sampling dynamics in BMs representing low-entropy distributions with pronounced peaks are dominated by deep minima in the energy landscape. In this case, correlations between sampling units are typically large and noise input correlations have little effect.
We systematically vary the entropy of the target distribution by changing the inverse temperature β in a BM and adjusting the relative noise strength in the other cases accordingly (Fig. 6). Since β always appears as a multiplicative factor in front of weights and biases, this is equivalent to scaling weights and biases globally. For small entropies, the sampling error for shared and network noise is comparable to the error obtained with private noise, despite substantial sharedinput correlations. Consistent with the intuition provided above, the sampling error for shared noise increases significantly with increasing entropy, whereas in the other cases it remains low.
We conclude that generally the effect of shared-noise correlations on the functional performance of sampling networks depends on the entropy of the target distribution, or, equivalently, on the absolute magnitude of functional correlations between sampling units. For high-entropy tasks, such as pattern generation, shared-input correlations can be highly detrimental. For low-entropy tasks, such as pattern classification, they presumably play a less significant role. Nevertheless, independent of the entropy of the task, functional performance for network-generated noise is close to optimal.
Small recurrent networks provide large sampling networks with noise
Both from a biological as well as a technical point of view, it makes sense to minimize material and energy costs for noise generation. To achieve a good sampling performance, the number N of noise sources as well as the number K of noise inputs per functional unit need to be sufficiently large (Fig. 4). Therefore, a certain minimal amount of resources has to be reserved for noise generation. However, once these resources are allocated, small recurrent networks can provide noise for large sampling networks without sacrificing computational performance. We note in passing that a single noise network can, moreover, supply an arbitrary number of independent functional networks with noise.
Here, we vary the size of the sampling network M, while keeping N and the number m of observed neurons fixed (Fig. 7). As the variance of the input distribution for a neuron in the sampling network scales proportionally to its in-degree, and the sampling network is fully connected, increasing M reduces the effective noise amplitude. As a consequence, the entropy of the marginal distribution over the subset of observed neurons changes (see Supplementary Material), thereby influencing the sampling performance in the presence of shared-noise correlations (see previous section). To avoid this effect, we scale the weights in the sampling network with \(1/\sqrt{M}\)^{30,39,41}, thereby keeping the entropy of the marginal target distribution approximately constant (Fig. 7 inset, gray curve).
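The effect of this scaling on the input fluctuations can be checked with a toy sketch; independent binary activity stands in for the actual network dynamics, and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def input_std(M, scale_weights=True):
    """Standard deviation of the summed input to one unit of a fully
    connected network with i.i.d. random weights and binary activity."""
    w = rng.normal(0.0, 0.3, M)          # illustrative base weight scale
    if scale_weights:
        w = w / np.sqrt(M)               # the 1/sqrt(M) scaling
    s = (rng.random((10_000, M)) < 0.5).astype(float)
    return (s @ w).std()

# With the scaling, input fluctuations are approximately independent of M;
# without it, they grow like sqrt(M).
```

Since the input variance is Σ_j w_j² Var(s_j) ∝ M w² for unscaled weights, dividing the weights by √M removes the M-dependence, which is exactly what keeps the marginal entropy approximately constant.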
In the presence of private noise, the sampling error is small and independent of M (Fig. 7). As before, the performance is considerably impaired for shared noise. The decrease in the error for larger sampling networks cannot be traced back to a change in entropy, by virtue of the weight scaling. Instead, the decrease results from a more efficient suppression of external correlations within the sampling network arising from the growing negative feedback for increasing M in sampling networks with net recurrent inhibition^{39}. Still, even for large M, the error remains significantly larger than the one obtained with private noise. For network noise, in contrast, the error is almost as small as for private noise, and independent of M. Qualitatively similar findings are also obtained without scaling synaptic weights unless the entropy of the target distribution is too small (see Supplementary Material for details).
Networks of spiking neurons implement neural sampling without noise
The results so far rest on networks of binary model neurons. Their dynamics are well understood^{30,34,39,40,41}, and their mathematical tractability simplifies the calibration of sampling-network parameters for network-generated noise. Neurons in mammalian brains communicate, however, predominantly via short electrical pulses (spikes). It was shown previously^{15,42} that networks of spiking neurons with private external noise can approximately represent arbitrary Boltzmann distributions, if binary-unit parameters are properly translated to spiking-neuron parameters (see gray curve in Fig. 8; see Methods and Supplementary Material).
Consistent with our results on binary networks, the sampling performance of networks of spiking leaky integrate-and-fire neurons decreases in the presence of shared-noise correlations, but recovers for noise provided by a recurrent network of spiking neurons resembling a local cortical circuit with natural connection density and activity statistics (Fig. 8). Similar to binary networks, a minimal noise-network size N ensures an asynchronous activity regime, a prerequisite for good sampling performance. Spiking noise networks that are too densely connected (K/N → 1) tend to synchronize, causing large sampling errors (see red curve in Fig. 8 for small N).
Discussion
Consistent with the high variability in the activity of biological neural networks^{17}, many models of high-level brain function rely on the presence of some form of noise. We propose that additive input from deterministic recurrent neural networks serves as a well-controllable source of noise for functional network models. The article demonstrates that networks of deterministic units with input from such noise-generating networks can approximate a large variety of target distributions and perform well in probabilistic generative tasks. This scheme covers both networks of binary and networks of spiking model neurons, and leads to an economical use of resources in biological and artificial neuromorphic systems.
From a biological perspective, our concept is tightly linked to experimental evidence. In the absence of synaptic input (in vitro), fluctuations in the membrane potentials of single neurons are negligible. Consequently, the variability in neuronal in-vitro responses is small^{20,24,25}. In the presence of synaptic inputs from an active surrounding network (in vivo), in contrast, fluctuations in membrane potentials are substantial and the response variability is large^{3}. Furthermore, biological neural networks exhibit an abundance of inhibitory feedback connections. The active suppression of shared-input correlations by inhibitory feedback^{30,31}, i.e., the mechanism underlying the present work, accounts for the small correlations in the activity of pairs of neurons observed in vivo^{29}. Moreover, the theory correctly describes the specific correlation structure of inputs observed in pairwise in-vivo cell recordings^{43}. Hence, active decorrelation via inhibitory feedback shapes the in-vivo activity. Here, we propose a functional role for this decorrelation mechanism: cortical circuits supply each other with quasi-uncorrelated noise. Note that a similar mechanism for variability injection has been hypothesized to lie at the basis of song learning in zebra finches: a cortical-basal ganglia loop actively generates the variability necessary for successful motor learning^{44,45}.
For conceptual simplicity, the study segregates a neuronal network into a functional and a noise-generating module. In biological substrates, these two modules may be intermingled. The noise-generating module may be interpreted as an ensemble of distinct functional networks serving as a “heat bath” for a specific functional circuit. In this view, one network’s function is another network’s noise.
We show that shared-noise correlations can be highly detrimental for sampling from given target distributions. Generating noise with recurrent neural networks overcomes this problem by exploiting active decorrelation in networks with inhibitory feedback^{30,31}. As an alternative solution, the effect of shared-input correlations could be mitigated by training functional network models in the presence of these correlations^{46}. However, this approach is specific to particular network models. Moreover, it prohibits porting of models between different substrates: networks previously trained under specific noise conditions will not perform well in the presence of noise with a different correlation structure. Our approach, in contrast, constitutes a general-purpose solution which can also be employed for models that cannot easily be adapted to the noise statistics, such as hardwired functional network models^{26,47} or bottom-up biophysical neural-network models^{48,49}.
In biological neural networks, the probabilistic gating of ion channels in the cell membrane^{19} and the variability in synaptic transmission^{18} constitute alternative potential sources of stochasticity. However, for the majority of stochastic network models, ion-channel noise is too small to be relevant: in the absence of (evoked or spontaneous) synaptic input, fluctuations in membrane potentials recorded in vitro are in the μV range and hence negligible compared to the mV fluctuations necessary to support sampling-based approaches^{15}. Synaptic stochasticity has been studied both in vitro^{18,50,51} and in vivo^{52,53} and comes in two distinct forms: spontaneous release and variability in evoked postsynaptic response amplitudes, including synaptic failure. The rate of spontaneous synaptic events measured at the soma of the target neuron is in the range of a few events per second^{54,55}. The resulting fluctuations in the input are therefore negligible. The variability in postsynaptic response amplitudes, in contrast, is substantial and can have multiple origins: in the absence of background activity (in vitro), response amplitudes vary due to a quasi-stochastic fusion of vesicles with the presynaptic membrane and release of neurotransmitter. In vivo, other complex deterministic processes such as the interplay between background input and short-term plasticity, the voltage dependence of synaptic currents or shunting may further contribute to this form of quasi-stochasticity. The variability in postsynaptic response amplitudes has often been suggested as a plausible noise resource for computations in neural circuits^{16,56,57,58,59,60}. Due to its multiplicative, state-dependent nature, this form of noise is fundamentally different from the additive noise usually employed in sampling models. Neftci et al.^{16} propose a model of stochastic computation in neuronal substrates employing a specific model of synaptic stochasticity.
Due to the state-dependent nature of noise generated by stochastic synapses, the resulting systems do not resemble Boltzmann machines in general. The authors nevertheless demonstrate that such networks can be trained to classify handwritten digits with contrastive divergence, a learning algorithm specific to Boltzmann machines. Apart from this specific experimental demonstration, the authors do not provide any systematic analysis of their model. In particular, it remains unclear why and under what conditions contrastive divergence is a suitable learning algorithm. A theoretically solid model of sampling-based computations in neuronal substrates employing synaptic stochasticity as a noise resource remains a topic for future studies.
The present work focuses on a specific class of neuronal networks performing sampling-based probabilistic inference. An alternative approach to sampling-based Bayesian computation in neural circuits is provided by models relying on a parametric instead of a sample-based representation of probability distributions^{5,61,62,63}. In contrast to the methods considered here, the posterior distributions are computed essentially instantaneously without requiring the collection of samples. Such a parametric approach however comes at the cost of restricting the distributions that can be represented by a particular network architecture. In addition, learning in these systems remains a topic of ongoing research, while powerful learning algorithms exist for networks performing sampling-based inference^{36}.
Some neuromorphic-hardware systems follow innovative approaches to the generation of uncorrelated noise for stochastic network models, such as exploiting thermal noise and trial-to-trial fluctuations in neuron parameters^{64,65,66}. However, hardware systems need to be specifically designed for a particular technique and sacrifice chip area that otherwise could be used to house neurons and synapses. The solution proposed in this article does not require specific hardware components for noise generation. It solely relies on the capability of emulating recurrent neural networks, the functionality most neuromorphic-hardware systems are designed for. On the analog neuromorphic system Spikey^{67}, for example, it has already been demonstrated that decorrelation by inhibitory feedback is effective and robust, despite large heterogeneity in neuron and synapse parameters and without the need for time-consuming calibrations^{68}. While a full neuromorphic-hardware implementation of the framework proposed here is still pending, the demonstration on Spikey shows that our solution is immediately implementable and feasible.
Methods
Binary network simulation
Sampling networks consist of M binary units that switch from the inactive (0) to the active (1) state with a probability F_{i}(h_{i}) := p(s_{i} = 1|h_{i}), also referred to as the “activation function”. The input field h_{i} of a unit depends on the state of the presynaptic units and is given by:
Here w_{ij} denotes the weight of the connection from unit j to unit i and b_{i} denotes the bias of unit i. We perform an event-driven update, drawing for each unit subsequent inter-update intervals τ_{i} ~ Exp(λ) from an exponential distribution with rate λ := 1/τ, corresponding to an average update interval τ. Starting from t = 0, we update the neuron with the smallest update time t_{i}, draw a new update time t_{i} + τ_{i} for this unit, and repeat this procedure until the smallest t_{i} exceeds the maximal simulation duration T_{max}. Formally, this update schedule is equivalent to an asynchronous update in which a random unit is selected at every update step^{22,30,39,69}. The introduction of “update times” only serves to introduce a natural timescale of neuronal dynamics (see, e.g.^{39}).
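For illustration, this event-driven schedule can be sketched as follows (function names hypothetical; the logistic activation of the Intrinsic noise section is assumed for concreteness):

```python
import heapq
import numpy as np

def run_sampling_network(w, b, beta, t_max, tau=1.0, seed=0):
    """Event-driven asynchronous update of M binary units.

    Each unit draws exponential inter-update intervals with mean tau; at its
    update time it becomes active with probability F(h) = 1/(1 + exp(-beta*h)).
    """
    rng = np.random.default_rng(seed)
    M = len(b)
    s = rng.integers(0, 2, size=M)                       # random initial state
    queue = [(rng.exponential(tau), i) for i in range(M)]
    heapq.heapify(queue)                                 # (update time, unit index)
    samples = []
    while True:
        t, i = heapq.heappop(queue)
        if t > t_max:                                    # smallest t_i exceeds T_max
            break
        h = w[i] @ s + b[i]                              # input field of unit i
        s[i] = rng.random() < 1.0 / (1.0 + np.exp(-beta * h))
        samples.append(s.copy())
        heapq.heappush(queue, (t + rng.exponential(tau), i))
    return np.array(samples)
```

Because the smallest update time is popped first, terminating when it exceeds T_max implements the stopping criterion described above.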
Random sampling networks
Weights are randomly drawn from a beta distribution Beta(a, b) and shifted to have mean μ_{BM}. We choose the beta distribution with a = 2, b = 2 as it generates interesting Boltzmann distributions while having finite support, thereby reducing the probability of generating distributions with almost isolated states. The small error across randomly chosen initial conditions in Figs. 2, 4, 6 and 7 indicates that all randomly generated sampling networks indeed possess good mixing properties, i.e., the typical time taken to traverse the state space is much smaller than the total sampling duration. Weights are symmetric (w_{ij} = w_{ji}) and self-connections are absent (w_{ii} = 0). To control the average activity in the network, the bias of each unit is chosen such that, at a desired average activity 〈s〉, it cancels the mean input from the other units in the network: b_{i} = −Mμ_{BM}〈s〉^{39}. Whenever a unit is updated, the state of (a subset of) all units in the sampling network is recorded. To remove the influence of initial transients, i.e., the burn-in time of the Markov chain, samples during the initial interval of each simulation (T_{warmup}) are excluded from the analysis. From the remaining samples we compute the empirical distribution p of network states. The following sections introduce the activation function of the units for different ways of introducing noise to the system.
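The construction of a random sampling network can be summarized in a short sketch (function name hypothetical; the bias is set to cancel the mean recurrent input at the target activity, as described above):

```python
import numpy as np

def random_sampling_network(M, mu_bm, target_activity, seed=0):
    """Symmetric random Boltzmann machine with Beta(2,2)-distributed weights.

    Beta(2,2) has mean 1/2, so subtracting 1/2 and adding mu_bm shifts the
    weights to mean mu_bm; the bias cancels the mean recurrent input at the
    desired average activity.
    """
    rng = np.random.default_rng(seed)
    w = rng.beta(2.0, 2.0, size=(M, M)) - 0.5 + mu_bm
    w = np.triu(w, k=1)
    w = w + w.T                                  # symmetric, no self-connections
    b = -M * mu_bm * target_activity * np.ones(M)
    return w, b
```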
Intrinsic noise
Intrinsically stochastic units switch to the active state with probability
where β determines the slope of the logistic function and is also referred to as the “inverse temperature”. For small β, changes in the input field have little influence on the update probability, while for large β a unit is very sensitive to changes in h_{i}; in the limit β → ∞ the activation function becomes a Heaviside step function. Symmetric networks with these single-unit dynamics and the update schedule described in Binary network simulation are identical to Boltzmann machines, leading to a stationary distribution of network states of Boltzmann form:
Instead of directly prescribing a stochastic update rule like Eq. 3, we can view these units as deterministic units with a Heaviside activation function and additive noise on the input field:
with noise distribution \(p({\xi }_{i})=\frac{\beta }{4}(1-{\tanh }^{2}(\beta {\xi }_{i}/2))\)^{37} and Θ denoting the Heaviside step function
Averaging over the noise ξ_{i} yields the probabilistic update rule (Eq. 3). However, on biophysical grounds it is difficult to argue for this particular distribution of the noise.
Private noise
We consider a deterministic model in which we assume a more natural distribution for the additive noise, namely a Gaussian (\({\xi }_{i} \sim {\mathscr{N}}({\mu }_{i},{\sigma }_{i}^{2})\)), arising for example from a large number of independent background inputs^{38}. In this case, the noise-averaged activity for fixed h_{i} is given by:
Similar to the intrinsically stochastic units (Intrinsic noise), the update rule for deterministic units with Gaussian noise is effectively probabilistic. Both functions share some general properties (bounded, monotonic):
and one can hence hope to approximate the dynamics in Boltzmann machines with a network of deterministic units with Gaussian noise by a systematic matching of parameters.
One approach is to choose the parameters of the Gaussian noise such that the difference between the two activation functions is minimized. To simplify notation, we drop the index i in the following calculations. Since both activation functions are symmetric around zero, we require that their values at h = 0 are identical, fixing one parameter of the noise distribution (μ = 0). To find an expression for the noise strength σ, the simplest method is to equate the coefficients of the Taylor expansions of both activation functions around zero, up to linear order. For the logistic activation function (Eq. 3) this yields:
while for the units with Gaussian noise (Eq. 6) we obtain
Equating the coefficients of h gives an expression for the noise strength σ as a function of the inverse temperature β:
While this approach is conceptually simple, the Taylor expansion around zero leads to large deviations between the activation functions for input fields different from zero (Fig. 9).
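Both slopes at h = 0 follow from the respective derivatives (β/4 for the logistic function, 1/(√(2π)σ) for the Gaussian case), and the mismatch away from zero can be checked numerically (a small self-contained check, not taken from the original code):

```python
import math

beta = 2.0
sigma_taylor = 4.0 / (math.sqrt(2.0 * math.pi) * beta)   # slope matching at h = 0

def logistic(h):
    """Logistic activation function (Eq. 3)."""
    return 1.0 / (1.0 + math.exp(-beta * h))

def gaussian_activation(h, sigma):
    """Noise-averaged activation for zero-mean Gaussian noise: Phi(h/sigma)."""
    return 0.5 * math.erfc(-h / (sigma * math.sqrt(2.0)))

# Slopes agree at h = 0 (finite-difference check) ...
eps = 1e-6
slope_l = (logistic(eps) - logistic(-eps)) / (2.0 * eps)
slope_g = (gaussian_activation(eps, sigma_taylor)
           - gaussian_activation(-eps, sigma_taylor)) / (2.0 * eps)
# ... but the activation functions deviate for larger |h|
mismatch = abs(logistic(1.5) - gaussian_activation(1.5, sigma_taylor))
```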
Another option, which takes into account all possible values of h, is to minimize the L^{2} difference of the two activation functions:
where l denotes the logistic activation function and g the activation function for Gaussian noise. Since the resulting integral cannot be evaluated analytically, we opt for a slightly simpler approach: minimizing the L^{2} difference of the integrals of the activation functions from −∞ to 0:
with capital letters denoting antiderivatives. To find the minimizing σ, we take the derivative of the right-hand side with respect to σ′ and set it to zero:
From this we observe that
is a sufficient condition to satisfy this equation. We compute the integral of both activation functions. For the logistic activation function (Eq. 3) we obtain:
with the definite integral
since the two diverging terms for h → −∞ cancel. For the activation function with Gaussian noise (Eq. 6) we get:
and computing the definite integral leads to:
since the second term vanishes for h → −∞ as the complementary error function decreases faster than h^{−1}. From Eq. 9 we hence find σ as a function of β:
Even though this value does not minimize the L^{2} difference, it provides a better fit than that obtained by the Taylor expansion around zero, since it also takes the mismatch at larger absolute values of h into account (Fig. 9). We will hence use Eq. 10 to translate between the inverse temperature β of the logistic activation function and the strength σ of the Gaussian noise.
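The translation that follows from the two definite integrals, σ = √(2π) ln(2)/β (Eq. 10), can be verified numerically: with this σ, both integrals over (−∞, 0] coincide (a small check under the definitions above):

```python
import math

beta = 2.0
sigma = math.sqrt(2.0 * math.pi) * math.log(2.0) / beta  # Eq. 10

def logistic(h):
    return 1.0 / (1.0 + math.exp(-beta * h))

def gaussian_activation(h):
    return 0.5 * math.erfc(-h / (sigma * math.sqrt(2.0)))

# Riemann sums for the definite integrals over (-inf, 0], truncated at -20,
# where both activation functions are vanishingly small
n = 20000
dh = 20.0 / n
grid = [-20.0 + k * dh for k in range(n)]
area_logistic = sum(logistic(h) for h in grid) * dh      # analytically ln(2)/beta
area_gaussian = sum(gaussian_activation(h) for h in grid) * dh  # sigma/sqrt(2 pi)
```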
Shared noise
In the previous section we have assumed that each deterministic unit in the sampling network receives private, uncorrelated Gaussian noise. Now we instead consider a second population \( {\mathcal B} \) of \(N=| {\mathcal B} |\) mutually unconnected, intrinsically stochastic units with logistic activation functions (cf. Intrinsic noise) that provide additional input to units in the sampling network. In the following we will denote the population/set of units in the sampling network by \({\mathscr{S}}\) and refer to the second population as the background population or noise population. The input field of a unit i in the sampling network \({\mathscr{S}}\) hence contains an additional term arising from projections from the background population (cf. Eq. 2):
Here z_{k} denotes the state of the kth unit in the background population \({\mathscr{B}}\) and m_{ik} the weight from unit k in the background population to unit i in the sampling network. Given the total input field \({h^{\prime} }_{i}\), the neurons in the sampling network change their state deterministically, according to
Since the units in the background population are mutually unconnected, their average activity 〈z_{i}〉 can be arbitrarily set by adjusting their bias: b_{k} = F^{−1}(〈z_{k}〉), where F^{−1} denotes the inverse of the logistic activation function:
Ignoring the actual state of the background population, we can employ the central limit theorem and approximate the background input in the input field \({h^{\prime} }_{i}\) by a normal distribution with mean and variance given by
The total input field can then be written as \({h^{\prime} }_{i}\) = h_{i} + ξ_{i} with \({\xi }_{i} \sim {\mathscr{N}}({\mu }_{i},{\sigma }_{i}^{2})\), as in the case of private uncorrelated Gaussian noise. However, note that correlations in input fields \({h^{\prime} }_{i}\) and \({h^{\prime} }_{j}\) in the sampling network arise due to units in the background population projecting to multiple units in the sampling network (〈(ξ_{i} − μ_{i})(ξ_{j} − μ_{j})〉 does not necessarily vanish for all \(i,j\in {\mathscr{S}}\)).
For the connections from the background population we use fixed weights and impose Dale’s law, i.e., units are either excitatory (m_{ij} = w > 0 ∀i) or inhibitory (m_{ij} = −gw < 0 ∀i), with a ratio of excitatory units of \(\gamma =|{ {\mathcal B} }_{E}|/| {\mathcal B} |\). Here \(w\in {{\mathbb{R}}}^{+}\) denotes the excitatory synaptic weight and \(g\in {{\mathbb{R}}}^{+}\) a scaling factor for the inhibitory weights. Each unit \(i\in {\mathscr{S}}\) in the sampling network receives exactly \(K=\varepsilon N\) inputs from units in the background population; \(\varepsilon =K/N\in [0,1]\) is referred to as the connectivity. We do not allow multiple connections between a unit in the sampling network and a unit in the background population. Assuming all units in the background population have identical average activity 〈z〉, all units in the sampling network receive statistically identical input and the equations for the mean and variance simplify to
We can hence employ the same procedure as in the previous section to relate the strength of the background input to the inverse temperature of a Boltzmann machine.
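The simplified input statistics can be checked against a direct Monte Carlo estimate (an illustrative sketch; the closed-form expressions in the code are our reading of the simplified mean and variance for K inputs, a fraction γ of them excitatory):

```python
import numpy as np

def background_input_stats(K, gamma, w, g, z_mean):
    """CLT approximation of the background input: K inputs per unit, a
    fraction gamma excitatory (weight w), the rest inhibitory (weight -g*w);
    background units are treated as independent Bernoulli with mean z_mean.
    """
    mu = K * w * (gamma - (1.0 - gamma) * g) * z_mean
    var = K * w**2 * (gamma + (1.0 - gamma) * g**2) * z_mean * (1.0 - z_mean)
    return mu, var

# Monte Carlo check with independent binary background units
rng = np.random.default_rng(2)
K, gamma, w, g, z_mean = 100, 0.8, 0.1, 4.0, 0.5
n_exc = int(gamma * K)
m = np.concatenate([np.full(n_exc, w), np.full(K - n_exc, -g * w)])
z = (rng.random((20000, K)) < z_mean).astype(float)  # background states
fields = z @ m                                       # contribution to h'_i
mu, var = background_input_stats(K, gamma, w, g, z_mean)
```

Note that for g = γ/(1 − γ) (here g = 4 at γ = 0.8) the mean input cancels, while the variance remains finite.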
Network noise
We now consider a background population of deterministic units projecting to the sampling network. The background population has sparse, random, recurrent connectivity with a fixed indegree. Connections in the background population are realized with the same indegrees K, weights w and −gw and ratio of excitatory inputs γ as the connections to the sampling network (cf. Shared noise). The connection matrix of the background population is hence generally asymmetric. As before, we can approximate the additional contribution to the input fields of neurons in the sampling network with a normal distribution, with parameters
where the additional term in the input variances arises from correlations c_{kl} := 〈(z_{k} − 〈z_{k}〉)(z_{l} − 〈z_{l}〉)〉 between units in the background population. As in the sampling network, we choose the bias to cancel the expected average input from other units in the network for a desired mean activity 〈z_{k}〉. However, since the second population exhibits rich dynamics due to its recurrent connectivity, the actual average activity will deviate from this value, in particular due to an influence of correlations on the mean activity. We employ an iterative mean-field approach that allows us to compute average activities and average correlations approximately from the statistics of the connectivity. We now briefly summarize this approach following^{39}. Note that in the literature a threshold variable θ_{i} is often used instead of the bias b_{i}, which differs in sign: b_{i} = −θ_{i}.
For a network of binary units, the joint distribution of network states p(s) contains all information necessary to statistically describe the network activity, in particular mean activities and correlations. It can be obtained by solving the Master equation of the system, which determines how the probability masses of network states evolve over time in terms of transition probabilities between different states^{70}
The first term describes probability mass moving into state i from other states j and the second term probability mass moving from state i to other states j. Since Eq. 19 is in general, and in particular for large networks, too difficult to solve directly, we focus on obtaining equations for the first two moments of p(s). Starting from the master equation, one can derive the following self-consistency equations for the mean activity of units in a homogeneous network by assuming fluctuations of the inputs around their means to be statistically independent^{39}:
where the μ_{i} and σ_{i} are given by Eqs. 17 and 18, respectively. To obtain the average activity in the stationary state, i.e., for ∂_{t}〈s_{i}〉 = 0, this equation needs to be solved self-consistently, since the activity of unit i can influence its input statistics (μ_{i}, σ_{i}) through the recurrent connections. By assuming homogeneous excitatory and inhibitory populations, the N-dimensional problem reduces to a two-dimensional one^{39}:
with \(\alpha \in \{E,I\}\). The population-averaged equations for the mean and variance of the input hence are^{39}:
with K_{EE} = K_{IE} = γN, K_{EI} = K_{II} = (1 − γ)N and w_{EE} = w_{IE} = w, w_{EI} = w_{II} = −gw. To derive a self-consistency equation for pairwise correlations from the master equation, one linearizes the threshold activation function by considering a Gaussian distribution of the input field caused by recurrent inputs. This leads to the following set of linear equations for the population-averaged covariances^{39}:
with
The effective population-averaged weights \({\tilde{w}}_{\alpha \beta }\) are defined as:
with the susceptibility given by \(S({\mu }_{\alpha },{\sigma }_{\alpha }):=\frac{1}{\sqrt{2\pi }{\sigma }_{\alpha }}\exp (-\frac{{({\mu }_{\alpha }+{b}_{\alpha })}^{2}}{2{\sigma }_{\alpha }^{2}})\)^{39}. Since the average activities and covariances are mutually dependent, we employ an iterative numerical scheme in which we first determine the stationary activity under the assumption of zero correlations according to Eq. 20. Using this result, we compute the population-averaged covariances from Eq. 23, which in turn can be used to improve the estimate of the stationary activity, since they influence the input statistics according to Eq. 22. We repeat this procedure until the values of the population-averaged activities and covariances in two subsequent iterations no longer differ significantly. The mean activity and correlations in the recurrent background population obtained via this procedure allow us to compute the input statistics in the sampling network and hence to relate the inverse temperature to the mean and variance of the input as in Private noise.
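A stripped-down sketch of the first stage of this iterative scheme, the self-consistent mean activity with correlations neglected (function and parameter names hypothetical; the covariance stage of Eq. 23 is omitted for brevity):

```python
import math

def mean_field_activity(K, gamma, w, g, b, n_iter=200):
    """Damped fixed-point iteration for the population-averaged mean
    activity of the recurrent background network, neglecting correlations.

    Gaussian approximation of the input: a unit is active with probability
    Phi((mu + b)/sigma), where mu and sigma^2 follow from K inputs with a
    fraction gamma excitatory (weight w) and the rest inhibitory (-g*w).
    """
    m = 0.5                                              # initial guess
    for _ in range(n_iter):
        mu = K * w * (gamma - (1.0 - gamma) * g) * m
        var = K * w**2 * (gamma + (1.0 - gamma) * g**2) * m * (1.0 - m)
        sigma = math.sqrt(max(var, 1e-12))
        m_target = 0.5 * math.erfc(-(mu + b) / (math.sqrt(2.0) * sigma))
        m = 0.5 * m + 0.5 * m_target                     # damping for stability
    return m
```

In the full scheme, the activity obtained here feeds the covariance equations, whose solution in turn corrects the input statistics, and the two stages are alternated until convergence.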
Certain assumptions enter this analytical description of the network statistics, which might not be fulfilled in general. The description becomes much more complicated for spiking neuron models with nonlinear subthreshold dynamics, such as the conductance-based neurons in neuromorphic systems^{15}. In this case, one can resort to empirically measuring the input statistics of a single isolated neuron given a certain arrangement of background sources (cf. Calibration (spiking networks)). An advantage of this method is that it is straightforward to implement and works for any configuration of background populations and sampling networks, allowing for arbitrary neuron models and parameters. However, to estimate the statistics of the input accurately, one needs to collect statistics over a significant amount of time.
Calibration (binary networks)
The methods discussed above allow us to compute the effective inverse temperature β_{eff} from the statistics of different background inputs: additive Gaussian noise, a population of intrinsically stochastic units or a recurrent network of deterministic units. To approximate Boltzmann distributions via samples generated by networks with noise implemented via these alternative methods, we match their (effective) inverse temperatures. A straightforward option is to adjust the noise parameters according to the desired input statistics. While this is possible in the case of additive Gaussian noise, for which we can freely adjust μ_{i} and σ_{i}, it is difficult to achieve in practice for the other methods. We can achieve the same effect by rescaling the weights and biases in the sampling network. The inverse temperature β appears as a multiplicative factor in front of weights and biases in the stationary distribution of network states (Eq. 4). Scaling β is hence equivalent to scaling all weights and biases by the inverse factor^{15,40,71}. An infinite family of Boltzmann machines hence exists, all differing in weights (\(w\to \alpha w,\alpha \in {{\mathbb{R}}}^{+}\)), biases (b → αb) and inverse temperatures (β → β/α), but producing statistically identical samples. Given a mean background input μ_{i} and an effective inverse temperature β_{eff}(σ_{i}) (cf. Eq. 10) arising from a particular realization of noise sources, we can emulate a Boltzmann machine at inverse temperature β by rescaling all weights and biases in the sampling network according to
This method hence only requires us to adapt weights and biases globally in the sampling network according to the statistics arising from an arbitrary realization of background input.
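The rescaling can be summarized in a short sketch (our reading of the relations above; the translation from σ to β_eff uses Eq. 10, and the mean background input is absorbed into the bias):

```python
import math
import numpy as np

def rescale_to_temperature(w, b, beta, mu, sigma):
    """Rescale weights and biases of the sampling network so that, driven by
    background input with statistics N(mu, sigma^2), it emulates a Boltzmann
    machine at inverse temperature beta.
    """
    beta_eff = math.sqrt(2.0 * math.pi) * math.log(2.0) / sigma  # Eq. 10 inverted
    scale = beta / beta_eff
    # mean background input mu is subtracted so that only the zero-mean
    # fluctuations act as noise
    return scale * np.asarray(w), scale * np.asarray(b) - mu
```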
Handwritten-digit generation
In the generative task, we measure how well a sampling network with various realizations of background noise can approximate a trained data distribution, in contrast to the random distributions considered in the other simulations. We use contrastive divergence (CD1^{36}) to train a Boltzmann machine on a specific dataset. We consider a dataset consisting of a subset of MNIST digits^{72}, downscaled to 12 × 12 pixels and with grayscale values converted to black and white. We select one representative from each class (0…9) and extend the 144-entry array of pixel values by 10 entries for a one-hot encoding of the corresponding class; e.g., for the pattern zero, the last ten entries contain a 1 in the first position and zeros elsewhere. These ten 154-dimensional patterns form the prototype dataset. A (noisy) training sample is generated by flipping each of the first 144 entries of a prototype pattern with probability p^{flip}. After training, the network should represent a particular distribution q^{*} over classes. Training directly on samples generated according to the class distribution q^{*} will, in general, lead to a different stationary distribution p of one-hot readout states generated by the network, since some patterns are more salient than others. For example, when trained on equal amounts of patterns of zeros and ones, the network will typically generate more zero states. To nevertheless represent q^{*} with the network, we iteratively train the Boltzmann machine, choosing images and labels from a distribution q that is adjusted between training sessions (Alg. 1).
Over many repetitions this procedure will lead to a stationary distribution of classes p that closely approximates q^{*}.
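A plausible instantiation of the re-weighting step between training sessions is sketched below (Alg. 1 itself is not reproduced here; the multiplicative update rule is an assumption chosen only to illustrate the principle):

```python
import numpy as np

def adjust_training_distribution(q_target, p_generated, q_current, eta=0.5):
    """Down-weight classes that the network over-generates relative to the
    target distribution q*, so that repeated training sessions drive the
    stationary class distribution p towards q*.
    """
    ratio = q_target / np.maximum(p_generated, 1e-12)
    q_new = q_current * ratio**eta            # eta damps the correction
    return q_new / q_new.sum()                # renormalize to a distribution
```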
After training a Boltzmann machine using this approach, we obtain a set of parameters, w and b, that can be translated to parameters for sampling networks by appropriate rescaling as discussed above. We collect samples from p by running the network in the absence of any input and recording the states of all label units.
Calibration (spiking networks)
As for binary units, we need to match the parameters of spiking sampling networks to their respective counterparts in Boltzmann machines. We use high-rate excitatory and inhibitory inputs to turn the deterministic behavior of a leaky integrate-and-fire neuron into an effectively stochastic response^{15}. However, in contrast to the original publication, we consider current-based synapses for simplicity. Since the calibration is performed at the single-cell level, we use the identical calibration scheme for the private, shared and network cases. For a given configuration of noise sources, we first simulate the noise network with the specified parameters and measure its average firing rate. The corresponding independent Poisson sources are set to fire at the same rate to ensure comparability between the two approaches. The calibration is then performed by varying the resting potential and recording the average activity of a single cell that receives input from either a noise network or Poisson sources. The private case is calibrated separately in a similar manner. By fitting a logistic function to the activation obtained by this procedure, we obtain two parameters, a shift and a scaling parameter, which are used to translate the synaptic weights from binary units to spiking neurons^{15,73}.
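The logistic fit at the heart of this calibration can be sketched as follows (a simplified stand-in for the actual procedure: the "measurement" is synthetic, and a linear fit in logit space replaces a nonlinear fit):

```python
import numpy as np

def fit_activation(v_rest, activity):
    """Fit shift v0 and scale alpha of a logistic activation
    p(active) = 1/(1 + exp(-(v_rest - v0)/alpha)) to measured activities,
    via linear regression in logit space: logit(p) = (v_rest - v0)/alpha.
    """
    p = np.clip(activity, 1e-3, 1.0 - 1e-3)      # keep the logit finite
    slope, intercept = np.polyfit(v_rest, np.log(p / (1.0 - p)), 1)
    alpha = 1.0 / slope
    return -intercept * alpha, alpha              # (v0, alpha)

# Synthetic "measurement": sweep the resting potential, record mean activity
rng = np.random.default_rng(0)
v = np.linspace(-66.0, -54.0, 25)                 # mV, hypothetical sweep
activity = 1.0 / (1.0 + np.exp(-(v + 60.0) / 2.0)) + 0.005 * rng.standard_normal(25)
v0, alpha = fit_activation(v, activity)
```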
References
 1.
Knill, D. C. & Pouget, A. The bayesian brain: the role of uncertainty in neural coding and computation. TRENDS Neurosci. 27, 712–719 (2004).
 2.
Fiser, J., Berkes, P., Orbán, G. & Lengyel, M. Statistically optimal perception and learning: from behavior to neural representations. Trends cognitive sciences 14, 119–130 (2010).
 3.
Shadlen, M. N. & Newsome, W. T. The variable discharge of cortical neurons: implications for connectivity, computation, and information coding. J. neuroscience 18, 3870–3896 (1998).
 4.
Hoyer, P. O. & Hyvärinen, A. Interpreting neural response variability as monte carlo sampling of the posterior. In Advances in neural information processing systems, 293–300 (2003).
 5.
Ma, W. J., Beck, J. M., Latham, P. E. & Pouget, A. Bayesian inference with probabilistic population codes. Nat. neuroscience 9, 1432 (2006).
 6.
Berkes, P., Orbán, G., Lengyel, M. & Fiser, J. Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment. Sci. 331, 83–87 (2011).
 7.
Hartmann, C., Lazar, A., Nessler, B. & Triesch, J. Where’s the noise? Key features of spontaneous activity and neural variability arise through learning in a deterministic network. PLoS computational biology 11, e1004640 (2015).
 8.
Orbán, G., Berkes, P., Fiser, J. & Lengyel, M. Neural variability and samplingbased probabilistic representations in the visual cortex. Neuron 92, 530–543 (2016).
 9.
Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. science 313, 504–507 (2006).
 10.
Salakhutdinov, R. & Hinton, G. E. Deep boltzmann machines. In AISTATS 1, 3 (2009).
 11.
Burkitt, A. N. A review of the integrateandfire neuron model: I. homogeneous synaptic input. Biol. cybernetics 95, 1–19 (2006).
 12.
Burkitt, A. N. A review of the integrateandfire neuron model: Ii. inhomogeneous synaptic input and network properties. Biol. cybernetics 95, 97–112 (2006).
 13.
Destexhe, A. & Contreras, D. Neuronal computations with stochastic network states. Sci. 314, 85–90 (2006).
 14.
Buesing, L., Bill, J., Nessler, B. & Maass, W. Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons. PLoS computational biology 7, e1002211 (2011).
 15.
Petrovici, M. A., Bill, J., Bytschok, I., Schemmel, J. & Meier, K. Stochastic inference with spiking neurons in the highconductance state. Phys. Rev. E 94, 042312 (2016).
 16.
Neftci, E. O., Pedroni, B. U., Joshi, S., AlShedivat, M. & Cauwenberghs, G. Stochastic synapses enable efficient braininspired learning machines. Front. neuroscience 10 (2016).
 17.
Faisal, A. A., Selen, L. P. & Wolpert, D. M. Noise in the nervous system. Nat. reviews. Neurosci. 9, 292 (2008).
 18.
Branco, T. & Staras, K. The probability of neurotransmitter release: variability and feedback control at single synapses. Nat. Rev. Neurosci. 10, 373–383 (2009).
 19.
White, J. A., Rubinstein, J. T. & Kay, A. R. Channel noise in neurons. Trends neurosciences 23, 131–137 (2000).
 20.
Holt, G. R., Softky, W. R., Koch, C. & Douglas, R. J. Comparison of discharge variability in vitro and in vivo in cat visual cortex neurons. J. Neurophysiol. 75, 1806–1814 (1996).
 21.
Destexhe, A. & RudolphLilith, M. Neuronal Noise, Volume 8 of Springer Series in Computational Neuroscience (New York, NY: Springer, 2012).
 22.
Ackley, D. H., Hinton, G. E. & Sejnowski, T. J. A learning algorithm for boltzmann machines. Cogn. science 9, 147–169 (1985).
 23.
Habenschuss, S., Jonke, Z. & Maass, W. Stochastic computations in cortical microcircuit models. PLoS computational biology 9, e1003311 (2013).
 24.
Bryant, H. L. & Segundo, J. P. Spike initiation by transmembrane current: a whitenoise analysis. The J. physiology 260, 279–314 (1976).
 25.
Mainen, Z. F. & Sejnowski, T. J. Reliability of spike timing in neocortical neurons. Sci. 268, 1503–1506 (1995).
 26.
Lundqvist, M., Rehn, M., Djurfeldt, M. & Lansner, A. Attractor dynamics in a modular network model of neocortex. Network: Comput. Neural Syst. 17, 253–276 (2006).
 27.
van Vreeswijk, C. & Sompolinsky, H. Chaos in neuronal networks with balanced excitatory and inhibitory activity. Sci. 274, 1724–1726 (1996).
 28.
Brunel, N. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. J. computational neuroscience 8, 183–208 (2000).
 29.
Ecker, A. S. et al. Decorrelated neuronal firing in cortical microcircuits. science 327, 584–587 (2010).
 30.
Renart, A. et al. The asynchronous state in cortical circuits. science 327, 587–590 (2010).
 31.
Tetzlaff, T., Helias, M., Einevoll, G. T. & Diesmann, M. Decorrelation of neuralnetwork activity by inhibitory feedback. PLoS Comput. Biol 8, e1002596 (2012).
 32.
Schemmel, J. et al. A waferscale neuromorphic hardware system for largescale neural modeling. In Circuits and systems (ISCAS), proceedings of 2010 IEEE international symposium on, 1947–1950 (IEEE, 2010).
 33.
Furber, S. B. et al. Overview of the spinnaker system architecture. IEEE Transactions on Comput. 62, 2454–2467 (2013).
 34.
Ginzburg, I. & Sompolinsky, H. Theory of correlations in stochastic neural networks. Phys. review E 50, 3171 (1994).
 35.
Geman, S. & Geman, D. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Transactions on Pattern Analysis Mach. Intell. 6, 721–741 (1984).
 36.
Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural computation 14, 1771–1800 (2002).
 37.
Coolen, A. C. C. Statistical mechanics of recurrent neural networks i. statics. Handb. biological physics 4, 553–618 (2001).
 38.
Hinton, G. E., Sejnowski, T. J. & Ackley, D. H. Boltzmann machines: Constraint satisfaction networks that learn. Tech. Rep., Department of Computer Science, CarnegieMellon University Pittsburgh, PA (1984).
 39.
Helias, M., Tetzlaff, T. & Diesmann, M. The correlation structure of local cortical networks intrinsically results from recurrent dynamics. PLoS Comput. Biol 10, e1003428 (2014).
 40.
Dahmen, D., Bos, H. & Helias, M. Correlated fluctuations in strongly coupled binary networks beyond equilibrium. Phys. Rev. X 6, 031024, https://doi.org/10.1103/PhysRevX.6.031024 (2016).
 41.
van Vreeswijk, C. & Sompolinsky, H. Chaotic balanced state in a model of cortical circuits. Neural computation 10, 1321–1371 (1998).
 42.
Probst, D. et al. Probabilistic inference in discrete spaces can be implemented into networks of lif neurons. Front. computational neuroscience 9 (2015).
 43.
Okun, M. & Lampl, I. Instantaneous correlation of excitation and inhibition during ongoing and sensoryevoked activities. Nat. neuroscience 11, 535–537 (2008).
 44.
Woolley, S. & Kao, M. Variability in action: contributions of a songbird corticalbasal ganglia circuit to vocal motor learning and control. Neurosci. 296, 39–47 (2015).
 45.
Heston, J. B., Simon, J. IV, Day, N. F., Coleman, M. J. & White, S. A. Bidirectional scaling of vocal variability by an avian corticobasal ganglia circuit. Physiol. reports 6, e13638 (2018).
 46.
Bytschok, I., Dold, D., Schemmel, J., Meier, K. & Petrovici, M. A. Spikebased probabilistic inference with correlated noise. arXiv preprint arXiv:1707.01746 (2017).
 47.
Jonke, Z., Habenschuss, S. & Maass, W. Solving constraint satisfaction problems with networks of spiking neurons. Front. neuroscience 10 (2016).
 48.
Potjans, T. C. & Diesmann, M. The celltype specific cortical microcircuit: relating structure and activity in a fullscale spiking network model. Cereb. cortex 24, 785–806 (2012).
 49.
Schmidt, M. et al. Fulldensity multiscale account of structure and dynamics of macaque visual cortex. arXiv preprint arXiv:1511.09364 (2015).
 50.
Markram, H., Lübke, J., Frotscher, M. & Sakmann, B. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Sci. 275, 213–215 (1997).
 51.
Silver, R. A., Lübke, J., Sakmann, B. & Feldmeyer, D. Highprobability uniquantal transmission at excitatory synapses in barrel cortex. Sci. 302, 1981–1984 (2003).
 52.
52. Crochet, S., Chauvette, S., Boucetta, S. & Timofeev, I. Modulation of synaptic transmission in neocortex by network activities. Eur. J. Neurosci. 21, 1030–1044 (2005).
53. Pala, A. & Petersen, C. C. In vivo measurement of cell-type-specific synaptic connectivity and synaptic transmission in layer 2/3 mouse barrel cortex. Neuron 85, 68–75 (2015).
54. Hardingham, N. R. & Larkman, A. U. The reliability of excitatory synaptic transmission in slices of rat visual cortex in vitro is temperature dependent. J. Physiol. 507, 249–256 (1998).
55. Locke, R., Vautrin, J. & Highstein, S. Miniature EPSPs and sensory encoding in the primary afferents of the vestibular lagena of the toadfish, Opsanus tau. Ann. N. Y. Acad. Sci. 871, 35–50 (1999).
56. Levy, W. B. & Baxter, R. A. Energy-efficient neuronal computation via quantal synaptic failures. J. Neurosci. 22, 4746–4755 (2002).
57. Rosenbaum, R., Rubin, J. & Doiron, B. Short-term synaptic depression imposes a frequency-dependent filter on synaptic information transfer. PLoS Comput. Biol. 8, e1002557 (2012).
58. Maass, W. Noise as a resource for computation and learning in networks of spiking neurons. Proc. IEEE 102, 860–880 (2014).
59. Kappel, D., Habenschuss, S., Legenstein, R. & Maass, W. Network plasticity as Bayesian inference. PLoS Comput. Biol. 11, e1004485 (2015).
60. Muller, L. K. & Indiveri, G. Neural sampling by irregular gating inhibition of spiking neurons and attractor networks. arXiv preprint arXiv:1605.06925 (2017).
61. Deneve, S. Bayesian spiking neurons I: Inference. Neural Comput. 20, 91–117 (2008).
62. Beck, J. M. et al. Probabilistic population codes for Bayesian decision making. Neuron 60, 1142–1152 (2008).
63. Moreno-Bote, R., Knill, D. C. & Pouget, A. Bayesian sampling in visual perception. Proc. Natl. Acad. Sci. 108, 12491–12496 (2011).
64. Hamid, N. H., Tang, T. B. & Murray, A. F. Probabilistic neural computing with advanced nanoscale MOSFETs. Neurocomputing 74, 930–940 (2011).
65. Binas, J., Indiveri, G. & Pfeiffer, M. Spiking analog VLSI neuron assemblies as constraint satisfaction problem solvers. In 2016 IEEE International Symposium on Circuits and Systems (ISCAS), 2094–2097 (IEEE, 2016).
66. Sengupta, A., Panda, P., Wijesinghe, P., Kim, Y. & Roy, K. Magnetic tunnel junction mimics stochastic cortical spiking neurons. Sci. Rep. 6, 30039 (2016).
67. Pfeil, T. et al. Six networks on a universal neuromorphic computing substrate. Front. Neurosci. 7 (2013).
68. Pfeil, T. et al. Effect of heterogeneity on decorrelation mechanisms in spiking neural networks: A neuromorphic-hardware study. Phys. Rev. X 6, 021023 (2016).
69. Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. 79, 2554–2558 (1982).
70. Kelly, F. P. Reversibility and Stochastic Networks (Cambridge University Press, 2011).
71. Grytskyy, D., Tetzlaff, T., Diesmann, M. & Helias, M. A unified view on weakly correlated recurrent networks. Front. Comput. Neurosci. 7 (2013).
72. LeCun, Y. The MNIST database of handwritten digits (1998).
73. Gewaltig, M.-O. & Diesmann, M. NEST (NEural Simulation Tool). Scholarpedia 2, 1430, https://doi.org/10.4249/scholarpedia.1430 (2007).
Acknowledgements
This research was supported by the Helmholtz Association portfolio theme SMHB, the Helmholtz Association Initiative and Networking Fund (project number SO-092, Advanced Computing Architectures), the Jülich Aachen Research Alliance (JARA) and EU grants #269921 (BrainScaleS), #604102 and #720270 (Human Brain Project). The authors acknowledge support by the state of Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through grant no. INST 39/963-1 FUGG. All spiking network simulations were carried out with NEST (http://www.nest-simulator.org).
Author information
Contributions
J.J., M.A.P. and T.T. conceived and designed the experiments. J.J. and O.B. performed the simulations. J.J., O.B., M.A.P. and T.T. analyzed the data. J.J., O.B., M.A.P. and T.T. wrote the manuscript. J.J., M.A.P., O.B., J.S., K.M., M.D. and T.T. reviewed the manuscript and approved it for publication.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jordan, J., Petrovici, M. A., Breitwieser, O. et al. Deterministic networks for probabilistic computing. Sci. Rep. 9, 18303 (2019). https://doi.org/10.1038/s41598-019-54137-7