Somatodendritic consistency check for temporal feature segmentation


The brain identifies potentially salient features within continuous information streams to process hierarchical temporal events. This requires the compression of information streams, for which effective computational principles are yet to be explored. Backpropagating action potentials can induce synaptic plasticity in the dendrites of cortical pyramidal neurons. By analogy with this effect, we model a self-supervising process that increases the similarity between dendritic and somatic activities, where the somatic activity is normalized by a running average. We further show that a family of networks composed of these two-compartment neurons performs a surprisingly wide variety of complex unsupervised learning tasks, including chunking of temporal sequences and the source separation of mixed correlated signals. No common method applicable to both of these temporal feature analyses was previously known. Our results suggest the powerful ability of neural networks with dendrites to analyze temporal features. This simple neuron model may also prove useful in neural engineering applications.


Cognitive functions of the brain entail modeling of externally or internally driven dynamical processes. For this modeling, the brain has to identify the salient temporal features of continuous information streams. How the brain conducts this time-series analysis remains unknown, but the component processes necessary for the analysis are partly known. The process by which frequently recurring segments of temporal sequences are concatenated into single units that are easy to process is called chunking or bracketing1. Chunking underlies sensory scene analyses, motor learning, episodic memory, and language processing2,3,4,5,6. In predictive coding7,8,9, the brain may chunk information in bottom-up and top-down pathways to identify variables relevant to the hierarchical Bayesian modeling of mental processes. Another important class of temporal feature analysis is blind source separation (BSS: related to the so-called cocktail party effect) in which the brain separates mixed sensory signals (typically auditory) from multiple sources in order to recognize the individual sources10. Despite their functional importance, the mechanisms by which neural circuits in the brain analyze and learn temporal features remain largely unclear. Whether different temporal feature analyses require specialized network architectures and learning rules is also unknown.

In this study, we introduce a novel solution to these fundamental problems of brain computing. We show, in a two-compartment neuron model, that the minimization of information loss between dendritic synaptic input and a neuron’s own output spike trains enables efficient learning of clustered temporal events in a completely unsupervised manner. This learning proceeds intracellularly and can be viewed as a self-supervising process in which a single neuron (more precisely, the soma) generates an appropriate supervision signal to learn the spatiotemporal firing patterns repeated in upstream neurons (projecting to the dendrites of the neuron). The resultant learning rule conceptually resembles Hebbian learning with backpropagating action potentials, which experimental results11,12,13,14,15 have demonstrated to be crucial to synaptic plasticity in cortical neurons. Importantly, our learning rule exploits the fact that neuronal adaptation is able to maintain somatic membrane potential in a regime where spiking has high information content16,17,18,19. Therefore, the gain and threshold of the somatic transfer function in our model are adapted in a history-dependent manner.

To our surprise, a family of competitive networks of the proposed neuron model can perform a variety of unsupervised learning tasks ranging from chunking to BSS, which were previously performed by specialized, distinct networks and learning rules. Members of this family have the same network architecture but different network parameters (e.g., synaptic weights). We emphasize that some chunking tasks solvable with our model (and also by humans) are difficult for conventional machine learning methods due to uniform transition probabilities between consecutive items5. Furthermore, the same network model successfully separates the mixed signals of highly correlated sources, namely musical instruments playing the same note. BSS has been extensively studied in machine learning20,21,22,23, but how the brain solves this problem is not fully understood. Our results provide suggestions for computational principles which may underlie the wide range of subconscious temporal feature analyses by cortical networks and the active role of dendrites in these processes.

Our algorithm builds on ideas introduced by the two-compartment learning rule of Urbanczik and Senn24, expanding the scope of neural computing towards slow-feature analysis (SFA25) and independent-component analysis (ICA) based on temporal correlations26. A central feature of our learning rule is that synaptic weights on the dendrite are changed such that the somatic membrane potential fluctuates with unit variance around a target value. Our formulation is inspired by the observation that neuronal adaptation always shifts the neuron toward a regime of efficient information transmission16,17,18,19.


The minimization of regularized information loss

Our model learns temporal features of an input based on a novel learning rule which we call minimization of regularized information loss (MRIL). Suppose the dendrite attempts to predict the responses of the soma. In short, MRIL achieves this by minimizing the information loss (within a certain recent period) when the somatic activity is replaced with its model, that is, the dendritic activity driven by given synaptic inputs. The loss can be easily minimized if the somatic responses are well predicted. This will be the case when the neuron learns to selectively respond to temporal patterns recurring in synaptic input. Figure 1a schematically illustrates the present learning rule in a two-compartment spiking neuron model. Mathematically, MRIL minimizes the Kullback–Leibler (KL) divergence between the probability distributions of somatic and dendritic activities (see Methods for mathematical details). Note that in the resultant learning rule the somatic response is fed back to the dendrite to train dendritic synapses. These processes may be regarded as a consistency check between the soma and dendrite. Although the underlying biological mechanisms are not modeled here, backpropagating action potentials may provide such a feedback signal in cortical pyramidal neurons27.
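The learning rule above can be sketched in a minimal rate-based form. The sketch below is an illustrative reconstruction, not the paper's exact equations: the transfer function `phi`, the attenuation factor `g`, the averaging rate `tau`, and the random placeholder input are all assumptions, and the gradient of the KL loss is reduced to a simple error-times-input update.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(v):
    """Sigmoidal rate function of membrane potential (assumed form)."""
    return 1.0 / (1.0 + np.exp(-v))

n_in, eta, g = 100, 0.05, 0.7
w = rng.normal(0.0, 0.1, n_in)       # dendritic synaptic weights
mu, var = 0.0, 1.0                   # running estimates of somatic statistics
tau = 0.01                           # averaging rate for the running statistics

for step in range(5000):
    x = rng.random(n_in)             # presynaptic activity (placeholder input)
    v_dend = w @ x                   # dendritic potential
    v_raw = g * v_dend               # attenuated dendritic potential at the soma
    mu += tau * (v_raw - mu)         # history-dependent threshold adaptation
    var += tau * ((v_raw - mu) ** 2 - var)    # history-dependent gain adaptation
    v_soma = (v_raw - mu) / np.sqrt(var)      # normalized somatic potential
    # self-supervision: dendritic synapses reduce the prediction error
    # between the dendritic rate and the (normalized) somatic rate
    w += eta * (phi(v_soma) - phi(v_dend)) * x
```

The running normalization of `v_soma` plays the role of the history-dependent gain and threshold modulation described in the text; without it, the dendrite could trivially match an unnormalized copy of its own output.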

Fig. 1: Unsupervised learning in two-compartment neurons.

a The model neuron consists of somatic and dendritic compartments and undergoes MRIL learning. The dendritic component receives Poisson spike trains, and the somatic membrane potential is given as an attenuated version of the dendritic membrane potential. Output of the soma backpropagates to dendritic synapses as a self-teaching signal. Learning stops when the dendrite minimizes the error between its prediction and the actual somatic firing rate. b Three frozen spatiotemporal patterns (red, blue, and green) were repeated in irregular spike trains from 2000 input neurons. c A two-compartment neuron selectively learned one of the recurring patterns. Examples of the somatic (red) and dendritic (black dashed lines) activities are shown at the initial (top), middle (middle), and final (bottom) stages of learning. d Learning curve is shown, with circles indicating the time points at which the examples were drawn. Instantaneous correlations were calculated between the activities of the dendrite and soma every 15 s during learning. e The fraction of trials in which a single neuron model learned a selective response to one of the three repeated spike patterns is shown. The number of trials was 100. In some trials (Others), the neuron had more than one preferred pattern, i.e., the peak response to the second preferred pattern was greater than 50% of that to the most preferred pattern.

The division of labor between the soma and dendrite was previously modeled with a teaching signal given explicitly or implicitly to the soma24. Unlike the previous model, our model modulates the gain and threshold of somatic responses according to the recent history of somatic responses. These modulations enable the model to avoid a trivial solution to the learning rule (see Methods), and therefore ensure successful learning of nontrivial temporal features. Differences between the present and previous models will be further discussed later.

Our learning rule (Eq. 16 in Methods) looks similar to maximum likelihood estimation28, a well-studied framework of supervised learning. However, there is a conceptual difference between them. In maximum likelihood estimation, the target data distribution (somatic activity) is provided externally as teaching signals. By contrast, our model simultaneously learns the probability distributions of input and output data without teaching signals. The consistency between the two data sets constrains the self-supervised learning, thereby avoiding an overly redundant or an overly simplistic categorization of temporal inputs. We emphasize that MRIL fits particularly well with neurons with dendrites, but the principle is generic and applicable to a broad range of information processing systems.

Learning patterned temporal inputs in single neurons

We first demonstrate that the two-compartment neuron model detects the salient temporal features recurring in synaptic input. Learning to detect and discriminate repeated temporal input patterns is crucial for various cognitive functions such as language acquisition29,30 and motor sequence learning2,3,4,31. In Fig. 1b, presynaptic spike trains intermittently repeated three fixed temporal patterns of 50 ms each with equal probabilities of occurrence. These patterns may be regarded as chunks. As learning of the temporal input proceeds through the consistency check between the soma and dendrite, a single neuron gradually learned to respond selectively to an input pattern (Fig. 1c, d). The neuron learned one of the input patterns with approximately equal probabilities among the trials, although it responded to more than one input pattern in some trials (Fig. 1e). We note that all presynaptic neurons had the same average firing rates, which were constant during the entire task period (Methods). Therefore, the discrimination does not rely on differences in firing rates. Cortical neurons are actually capable of discriminating temporal inputs and generating sequence-selective spike outputs, although the synaptic sequences tested in the experiment were relatively simple32.
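An input of the kind used in Fig. 1b can be synthesized as follows. This is a hedged sketch: the background firing rate, the 1-ms bin width, and the equal mix of frozen and fresh segments are assumed values chosen only to reproduce the qualitative structure (three frozen 50-ms patterns recurring among statistically identical Poisson segments).

```python
import numpy as np

rng = np.random.default_rng(1)
n_pre, rate, dt = 2000, 5.0, 1e-3        # neurons, Hz, 1-ms bins (assumed values)
pat_len = 50                             # 50-ms frozen patterns (Fig. 1b)

# three frozen patterns: fixed Boolean spike rasters, re-used at every recurrence
patterns = [rng.random((n_pre, pat_len)) < rate * dt for _ in range(3)]

def make_input(n_segments):
    """Concatenate random 50-ms segments; each is either a fresh Poisson draw
    or one of the three frozen patterns, chosen with equal probability."""
    segs = []
    for _ in range(n_segments):
        k = rng.integers(6)              # 3 frozen + 3 'fresh noise' slots
        if k < 3:
            segs.append(patterns[k])     # frozen pattern: identical every time
        else:
            segs.append(rng.random((n_pre, pat_len)) < rate * dt)
    return np.concatenate(segs, axis=1)

spikes = make_input(40)                  # Boolean raster: neurons x time bins
```

Because frozen and fresh segments share the same mean rate, any selectivity a neuron develops must come from spike timing, consistent with the point made in the text that discrimination cannot rely on firing-rate differences.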

Automatic chunking with MRIL and inhibitory STDP

Next, we considered a competitive network of the two-compartment model neurons receiving similar presynaptic spike trains (Fig. 2a). To study whether chunk-specific cell assemblies can be formed, we made recurrent inhibitory connections among these neurons modifiable by inhibitory spike timing-dependent plasticity (iSTDP; Fig. 2b). For near-synchronous presynaptic and postsynaptic spikes, the changes in inhibitory weights are negative in our iSTDP rule. Consequently, this rule weakens inhibition between two neurons when both of them respond to the same temporal feature, as shown below. The use of this plasticity rule for lateral inhibition is realistic given that this type of STDP has been found at cortical excitatory synapses on inhibitory interneurons33 and at inhibitory synapses in the hippocampus34. In either case, inhibitory circuits will exhibit the desired changes. Note that inhibitory weights were restricted to the positive regime (Methods). During learning, each neuron gradually increased coherence between the somatic and dendritic activities (Fig. 2c). The postsynaptic neurons self-organized into three neuron ensembles, each detecting one of the input activity patterns (Fig. 2d), through iSTDP which enabled mutual inhibition between the neural ensembles (Fig. 2e). The strength of lateral inhibition needs to be within an appropriate range, as too strong (Supplementary Fig. 1a) or too weak (Supplementary Fig. 1b) inhibition failed to generate chunk-specific cell assemblies. The regularization parameter γ (see Methods) also has to be in an appropriate range, as values which were too large suppressed all neural responses and those which were too small did not generate selective responses to chunks (Supplementary Fig. 1c).
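A window function with the stated property (depression for near-synchronous pre/post spikes) can be sketched as a difference of exponentials. All constants here are illustrative assumptions, not the paper's fitted parameters; only the sign structure (negative near zero lag, with weights clipped to remain non-negative) reflects the rule described in the text and Fig. 2b.

```python
import numpy as np

def istdp_window(dt_ms, a_minus=1.0, a_plus=0.6, tau_minus=10.0, tau_plus=40.0):
    """Illustrative iSTDP window: depression dominates for near-synchronous
    pre/post spike pairs, weaker potentiation at larger lags (constants assumed)."""
    return (-a_minus * np.exp(-np.abs(dt_ms) / tau_minus)
            + a_plus * np.exp(-np.abs(dt_ms) / tau_plus))

def update_inhibitory_weight(w, dt_ms, eta=0.01, w_max=5.0):
    """Apply the window and clip so inhibitory weights stay non-negative."""
    return float(np.clip(w + eta * istdp_window(dt_ms), 0.0, w_max))
```

With this shape, two neurons firing to the same chunk (small spike-timing lags) steadily lose their mutual inhibition, allowing them to join the same assembly, while neurons with uncorrelated timing retain it.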

Fig. 2: Formation of temporal feature-specific cell assemblies.

a A competitive network of two-compartment neurons was used throughout this study. The input layer consists of Poisson spiking neurons and the output layer is comprised of the two-compartment neuron models. In this particular example, input neurons received presynaptic spike trains similar to those shown in Fig. 1b. b Window function of the iSTDP implemented at lateral inhibitory connections. Here spike timing refers to the time advance of postsynaptic firing from the preceding presynaptic input. c The average correlation between the somatic and dendritic activities is plotted against learning step. Shaded area represents the s.d. d Phasic responses of output neurons are shown. Horizontal bars show the intervals in which three chunks (green, red, and blue) were presented. The responses are sorted according to the neurons’ onset response times and indicate the emergence of chunk-specific cell assemblies. e Post-learning synaptic weight matrix of lateral inhibition. The correlations between reference responses and actual output responses were evaluated in the presence of f contamination by background presynaptic spikes, g failure in synaptic transmissions, and h timing jitters in the target spiking patterns. Each reference response takes unity during the presentation of the corresponding chunk and zero otherwise. The ordinates refer to the inverse of the number ratio of background spikes to target-specific spikes in f and the s.d. of spike timing jitters in h. The mean (thick line) and s.d. (shaded area) over 20 trials are shown. The correlations are shown for the maximally correlated pairs of cell assemblies and chunks (i.e., preferred chunks).

Weights of mutual inhibition were strengthened rather than weakened when a neuron pair fired synchronously in several previous models35,36. We therefore tested whether and how the conventional iSTDP rule works in the above chunk-detection task (Supplementary Fig. 2a). The conventional iSTDP rule generated a variety of complex response patterns (Supplementary Fig. 2b). A small portion of neurons expressed chunk-specific responses (e.g., neurons 3, 4 and 8). However, some neurons responded to more than one chunk (e.g., neurons 1 and 10) and other neurons to chunks and random inputs almost arbitrarily (e.g., neuron 5). The inhibitory weight matrix also showed no obvious cell-assembly structure (Supplementary Fig. 2c). Therefore, the iSTDP rule shown in Fig. 2b is thought to be more suitable than the conventional one for the present chunk-detection task.

The ability of the network model to learn recurring input patterns was assessed with various types of biological noise. Background presynaptic spikes degraded the performance as the signal-to-noise ratio decreased (Fig. 2f), whereas learning was optimal at finite noise levels with synaptic transmission failure (Fig. 2g) and with jitters in presynaptic spike timing (Fig. 2h). We speculate that this disparity may reflect the different underlying noise structures. Background spikes were uncorrelated with the recurring input patterns and merely contaminated the signals, whereas transmission failures and timing jitters yielded noise patterns which were correlated with the input and thus enhanced the sampling during training. Therefore, the two types of noise are thought to induce data augmentation. Presynaptic noise may also induce a regularization effect during learning37. However, this effect was unlikely to be prominent in our model as not all types of presynaptic noise improved the learning.

The above results may account for the perceptual ability of humans to detect the recurrence of frozen noise patterns embedded in a noisy auditory signal38. As in Fig. 1b, both repeated and background auditory signals may be represented by irregular synaptic inputs to the auditory cortex. However, the subjects in that study38 learned the noise without extensive training, indicating that the underlying learning mechanisms might differ from the method presented here.

We may use the present network model in analyzing large-scale neural activity data. To show this, we performed similar simulations using synthetic data in which only a small fraction of presynaptic neurons (from a total of 500) constituted a recurring pattern (Fig. 3a). (We note that it is unlikely that a large portion of recorded neurons participate in recurring cell assemblies in real data.) Learning was successful when the fraction of presynaptic neurons constituting the recurring pattern was 10% or 5%, but unsuccessful at 3% (Fig. 3b, c). We then considered the case where the total number of presynaptic neurons was 1000 and 25 neurons (2.5% of all neurons) belonged to a patterned activity. Interestingly, the network still succeeded in learning the pattern, indicating that successful learning requires a minimal absolute number, but not a minimal fraction, of pattern-encoding presynaptic neurons (Fig. 3d).

Fig. 3: Detection of cell assembly patterns from neural population data.

a Spike sequences with a fixed spatiotemporal pattern (red) of 50 (10%) neurons were repeatedly activated (red horizontal bars) by Poisson spike trains of 500 presynaptic neurons. Other cases with 25 (5%) and 15 (3%) such neurons were also examined. b Average correlations over 40 independent trials are shown between chunk-selective responses and the corresponding reference patterns. Vertical bars are standard errors and green circles show data points. P-values were calculated by two-sided Welch’s t-test. c Examples of chunk-selective neuronal responses in the 10% (top), 5% (middle), and 3% (bottom) cases. d A recurring firing pattern of 25 neurons was embedded into input spike trains of 1000 presynaptic neurons. The average and standard error of the input-output correlation with overlaid data points (left) and a typical response after learning (right) are shown. P-value (two-sided Welch’s t-test) indicates a significant difference from the 0% case in b. As in b, 40 independent simulations were performed.

Previously, STDP was used to detect repeated spike sequences39,40. We compared the detection performance between the present model and an STDP-based model39 (see Supplementary Fig. 3a, b). Both models exhibited high success rates when recurring cell assemblies made up a large portion of presynaptic neurons. An interesting difference was found when only a small portion of presynaptic neurons participated in the cell assemblies. In such cases, our model outperformed the previous model (Supplementary Fig. 3c).

We further examined the ability of our network model to learn a variety of information streams. First, we applied random sequences of three chunks comprised of four characters each (Fig. 4a) to a network model with 10 output neurons and 1000 input neurons. Each input neuron generated a 30 ms 10 Hz burst in response to a randomly assigned preferred character (Fig. 4b). This resulted in the formation of three neuron ensembles which selectively responded to the chunks (Fig. 4c). Principal-component analysis of the low-dimensional dynamics of the output neurons revealed the emergence of three chunks after learning (Fig. 4d). Then, we examined whether the model can learn partially overlapping chunks. In this case, some characters were shared between the three chunks (Fig. 4e) and learning was more difficult than in the previous case. The original model, with fast synaptic current, failed to generate selective responses to the chunks (Fig. 4f). However, making the decay constant of the synaptic current slower (50 ms compared to 5 ms: see Methods) enabled the model to detect temporal inputs on a longer timescale and to successfully learn the overlapping chunks (Fig. 4g). The modified network could also learn chunks even if they were embedded with distractors, which were random sequences of arbitrary English characters (a to z) with variable lengths (3 to 7) (Supplementary Fig. 4). These results suggest that slower synaptic currents, such as NMDA receptor-mediated currents, may be important for chunking.
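The effect of the synaptic decay constant can be illustrated with a simple exponential filter. This is a generic sketch (spike times and the 1-ms step are arbitrary choices): with a 50-ms time constant the current still carries a substantial trace of the previous character's spikes when the next character arrives, whereas with 5 ms the trace has essentially vanished.

```python
import numpy as np

def filter_spikes(spike_bins, tau_ms, dt_ms=1.0):
    """Exponentially decaying synaptic current driven by a binary spike train."""
    current = np.zeros_like(spike_bins, dtype=float)
    decay = np.exp(-dt_ms / tau_ms)
    s = 0.0
    for t, spk in enumerate(spike_bins):
        s = s * decay + spk              # jump on each spike, then decay
        current[t] = s
    return current

spikes = np.zeros(200)
spikes[[20, 60, 100]] = 1.0              # e.g., one spike per character burst
fast = filter_spikes(spikes, tau_ms=5.0)   # original model
slow = filter_spikes(spikes, tau_ms=50.0)  # modified model
# just before the spike at t = 60, the slow current still remembers t = 20,
# letting the neuron integrate evidence across successive characters
```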

Fig. 4: Segmentation and concatenation of various sequences.

Ten output neurons were connected with all-to-all inhibitory synapses modifiable by iSTDP. a Three chunks (a-b-c-d [red], e-f-g-h [green], and i-j-k-l [blue]) repeated in the input sequence with equal probabilities. b Each input neuron fired at 10 Hz to encode one of the chunks. Neurons were sorted according to their preferred stimuli. c Typical normalized responses of three output neurons are shown after learning. Colors indicate the epochs of the corresponding chunks. d Responses of output neurons were projected onto the three leading principal-component (PC) vectors before (left) and after (right) learning. More than 99% of the variance was explained by the three PCs. Epochs of highly normalized responses (f > 0.8 in all neurons) are indicated in red. e Character b is shared by the red and green chunks, and character e appears in the green and blue chunks. f Response of an output neuron to the overlapping chunks is shown. The time constant of synaptic current was 5 ms. g Selective responses of output neurons to the overlapping chunks are shown when the synaptic time constant was 50 ms.

Because the word segmentation shown above is also relatively easy for other methods41, we tested our model with more complex input sequences generated by a random walk on a graph with a community structure in which the connection of each node to the other four occurred with an equal probability of 0.25 (Fig. 5a). Here, a temporal community is a cluster of frequently co-appearing or mutually predictive stimuli in the input sequence. The detection of this community structure is easy for human subjects but has proven difficult for conventional machine learning methods which rely on nonuniform transition probabilities between elements5. Like human subjects, output neurons in our model easily learned selective responses to members of a temporal community (Fig. 5b).
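A graph of this kind can be reconstructed as three 5-node communities, fully connected internally except between each community's two boundary nodes, with boundary nodes linking neighboring communities; every node then has degree 4, so every transition probability is 0.25. This construction is an assumed reconstruction of the structure described in the text, and the walk length is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15                                    # 3 communities of 5 nodes each
adj = {i: set() for i in range(n)}

for c in range(3):                        # fully connect each community...
    nodes = list(range(5 * c, 5 * c + 5))
    for i in nodes:
        for j in nodes:
            if i != j:
                adj[i].add(j)
    adj[nodes[0]].discard(nodes[-1])      # ...except between its two
    adj[nodes[-1]].discard(nodes[0])      # boundary nodes

for a, b in [(4, 5), (9, 10), (14, 0)]:   # link neighboring communities
    adj[a].add(b)
    adj[b].add(a)

def random_walk(length, start=0):
    """Uniform random walk: every node has 4 neighbors, so p = 0.25 each."""
    seq, node = [], start
    for _ in range(length):
        node = rng.choice(sorted(adj[node]))
        seq.append(int(node))
    return seq

sequence = random_walk(1000)
```

Because all transition probabilities are identical, no pairwise statistic distinguishes within-community from between-community transitions; community membership is only visible in which stimuli tend to co-occur within a window, which is what makes the task hard for transition-probability-based methods.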

Fig. 5: Detection of temporal community.

a The input sequence represented a random walk with uniform transition probabilities on a graph with community structure (modified from ref. 5). b Normalized responses of three output neurons to input sequences defined in a are shown.

The network model could also learn feature detection maps from continuous sensory streams. All sensory features, either static or dynamic, arrive at the brain essentially in sequence. Therefore, we asked whether MRIL enables neural networks to learn the static features of an input when repeatedly presented in a temporal sequence. To examine this, we applied a random sequence of noisy images of oriented bars presented for 40 ms every 70 ms (Fig. 6a). The output neurons, which initially had no preferred orientations (Fig. 6b), developed well-defined preferences for specific orientations after learning (Fig. 6c), resembling a visual orientation map (Fig. 6d)42,43.
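Stimuli of the kind used in Fig. 6a can be generated as follows. The image resolution, bar width, noise level, and number of orientations are all assumptions for illustration; only the overall protocol (noisy oriented bars presented in random order) follows the text.

```python
import numpy as np

rng = np.random.default_rng(4)
size = 16                                 # assumed image resolution

def oriented_bar(theta, noise=0.2):
    """Image of a bar at angle theta through the center, plus pixel noise."""
    y, x = np.mgrid[:size, :size] - (size - 1) / 2
    dist = np.abs(x * np.sin(theta) - y * np.cos(theta))   # distance to bar axis
    img = (dist < 1.0).astype(float)
    return np.clip(img + noise * rng.standard_normal(img.shape), 0.0, 1.0)

# a random sequence of orientations; each image would drive the input
# neurons for 40 ms, followed by a 30-ms blank (as in Fig. 6a)
thetas = rng.choice(np.linspace(0, np.pi, 8, endpoint=False), size=100)
stimuli = np.stack([oriented_bar(th) for th in thetas])
```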

Fig. 6: Learning an orientation tuning map.

a Examples of noisy images of oriented bars used for training. Each image was presented for 40 ms in a random order with intervals of 30 ms between images. b, c The feedforward synaptic weights before and after learning are shown for the example stimuli shown in a. d The responses of all two-compartment neurons before (left) and after (right) learning are shown. The neurons were sorted according to the onset times of responses to their preferred stimuli. See Methods for further details on the simulations.

BSS of mutually correlated signals

The results shown above demonstrate that MRIL successfully chunks a variety of temporal inputs by detecting repeated temporal features. The question then arises whether this ability of MRIL enables learning of other types of sequence processing tasks. One such task of cognitive and ecological importance is the so-called cocktail party problem10. We therefore examined the performance of our network model in the blind separation of mixed signals from multiple sources. BSS is an extensively studied problem in auditory processing20,21,22, and various methods have been proposed for mixtures of mutually independent signals. However, few methods exist for the case in which the original signals are mutually correlated23.

We applied MRIL to sound mixtures from two musical instruments, a bassoon and a clarinet (Bach10 Dataset)44, playing their respective parts of the same score (Fig. 7a) (thus the two sound sources are correlated). A mixed sound followed by the original sounds of the two instruments are presented in Supplementary Audio 1. These mixtures of signals were encoded into irregular spike trains (Fig. 7b), which in turn were applied to output neurons. After training, these neurons self-organized into two groups, each responding selectively to one of the true sources (Fig. 7c). The original sounds were then decoded from the average firing rates of these subgroups (Supplementary Audios 2 and 3). Although some high-frequency components were lost due to the low-pass filtering effect of the slow membrane dynamics (Supplementary Fig. 5), the decoded sounds are readily comparable to the original sounds. We compared our model with a naive independent-component analysis (FastICA: Supplementary Audio 4)22,45 and temporal ICA (Second Order Blind Identification or SOBI: Supplementary Audios 5 and 6)26. We used an open-source implementation of SOBI for the comparison (Supplementary Methods). When the source signals were mutually independent, all three methods showed excellent performance, although the ICA-based methods slightly outperformed our biology-inspired model (Fig. 7d, top). However, when the source signals were dependent on one another (i.e., mutually correlated), SOBI and our model exhibited significantly better performance than FastICA (Fig. 7d, bottom).
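The mixing and spike-encoding stages can be sketched as below. The synthetic sources (a sine and a square wave at the same pitch, standing in for two instruments playing the same note), the sampling rate, and the per-signal neuron count are assumptions; the mixing matrix and the 0–10 Hz rate coding follow the description in Fig. 7.

```python
import numpy as np

rng = np.random.default_rng(3)
fs = 8000                                 # assumed sampling rate
t = np.arange(fs) / fs
# placeholder sources standing in for the bassoon and clarinet parts:
# same fundamental frequency, different timbre, hence mutually correlated
sources = np.vstack([np.sin(2 * np.pi * 220 * t),
                     np.sign(np.sin(2 * np.pi * 220 * t))])

A = np.array([[1.0, 0.5],                 # mixing matrix from Fig. 7a:
              [0.5, 1.0]])                # diagonal 1, off-diagonal 0.5
mixed = A @ sources                       # two mixed signals

def encode(signal, n_neurons=250, dt=1.0 / fs, max_rate=10.0):
    """Nonstationary Poisson spikes with rates tracking the normalized
    amplitude between 0 and max_rate Hz (as in Fig. 7b)."""
    amp = (signal - signal.min()) / (signal.max() - signal.min())
    return rng.random((n_neurons, len(signal))) < max_rate * amp * dt

# each input neuron encodes only one of the two mixed signals
spike_trains = np.vstack([encode(m) for m in mixed])   # 500 input neurons
```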

Fig. 7: BSS of correlated auditory streams.

a Sound waveforms of a bassoon and a clarinet (left) were linearly transformed to two mixed signals (right). The diagonal and off-diagonal elements of the mixing matrix were 1 and 0.5, respectively. b Nonstationary Poisson spike trains of 200 input neurons (from a total of 500) are shown. The instantaneous firing rates were proportional to the amplitudes of the mixed signals normalized between 0 Hz and 10 Hz. Each input neuron encodes only one of the two mixed signals. c Separated waveforms (bottom) are shown together with magnified versions (top, solid) and true sources (top, dashed). The waveforms were averaged over 20 trials with different realizations of input spike trains and the same initial weights. For the clarity of presentation, error bars are not shown. d Cross-correlations between the separated and true sources are compared between our model, FastICA and SOBI for independent (top) and dependent (bottom) auditory signals (see Methods). For each comparison, n = 20 independent simulations were performed. Error bars show s.d.s, which were very small. Overlaid green circles are data points.

Although SOBI slightly outperformed our model in the present examples, SOBI performed poorly at chunking the sequences of English characters used earlier, which our model could easily solve (see Fig. 4). In our simulations, SOBI could not generate highly chunk-selective responses (Supplementary Fig. 6a); rather, most of its units responded to all three chunks. We conducted similar analyses for low-pass filtered versions of the input by using different time constants for coarse graining (15, 30 and 50 ms) or the bin width (1 ms or 10 ms), but the essential results remained unchanged (Supplementary Fig. 6b). We also examined SFA, a well-known method for temporal feature analysis25, on a similar task by using a Python toolkit46. The algorithm failed to generate any stable output when input sequences involved chunks. In the objective function of SFA, detecting a whole chunk and detecting an arbitrary single character incur equal cost (Methods); the minimization problem therefore presumably admits too many solutions for chunking. Thus, our results demonstrate a virtue of the present brain-inspired model, which exhibits high levels of task performance in a wide range of temporal feature analyses. In addition, the model does not require highly task-specific network architectures.

Finally, we examined the performance of the model by varying the magnitudes of cross-talk noise between the two mixed signals (Methods). We also tested mixed signals which used the same instrument but playing different notes. In all cases, high performance was attained only at an intermediate level of cross-talk noise, implying that performance drops not only for strong noise but also for weak noise (Fig. 8a, dashed curves). Nevertheless, we could rescue the model from this counterintuitive defect for weak cross-talk noise by including another noise component (see Methods) in the somatodendritic interaction (Fig. 8a, solid curves). We speculate that the additional noise could suppress learning from harmful interferences between the original signals when both signals were weak. However, this point requires further clarification. We also examined whether the improved model trained on the original signals (i.e., vanishing cross-talk noise) exhibits better performance for other mixtures which were not used in the training. The pre-training actually made the decomposition of previously unseen mixtures easier (Fig. 8b).

Fig. 8: Robustness of performance in BSS.

Curves represent the averages over five trials with different initial weights and different realizations of noise, and shaded areas represent s.d. a Correlations between the original and separated signals are plotted as a function of cross-talk noise (Methods) for where the mixed signals consist of different (left) or identical (middle and right) instruments. The error bars are very small. b Performance is compared between the pre-trained (cyan) and untrained (magenta) network on the original signal. Cross-talk noise in these tests was 0.5. Networks were exposed to 10 s long mixed sounds during each learning epoch and correlations were calculated afterwards.


Nonlinear Hebbian and generalized STDP algorithms have been used as unsupervised learning rules to perform receptive field development42,43, ICA47,48,49, sparse coding43, spatio-temporal pattern detection39,40, or SFA50. Our novel algorithm belongs to the same family of methods and is applicable to some classic problems of receptive field development and ICA as well as to the additional problem of ‘chunking’ as an example task with specific temporal structure that has traditionally been solved with more specialized algorithms51,52.

We proposed a learning principle called minimization of regularized information loss (MRIL) which enables the self-supervised learning of recurring temporal features in information streams using a family of competitive networks of two-compartment neuron models. Our model not only performs chunking but also achieves BSS from mixtures of mutually correlated signals. Importantly, although different values of parameters were learned in different tasks, the network structure was essentially the same. It is surprising that such simple neural networks with almost identical circuit structures can perform these broadly different tasks. In particular, our brain-inspired model can solve tasks, e.g., the detection of temporal community structure (Fig. 5) and the BSS of mixed correlated signals (Fig. 7), with which conventional models have historically struggled. To our knowledge, this is the first model to achieve such results on this broad collection of learning tasks.

Our learning rule minimizes the information loss between synaptically driven dendritic activity and somatic output in the presence of neuronal adaptation. This rule uses mutually inhibiting two-compartment neurons to learn the repetition of temporal activity patterns on a slow timescale (typically, several tens to several hundreds of milliseconds). While many previous methods for chunking aim to predict input sequences53,54, our model uses a different principle: the system learns to predict its own responses to a given input. MRIL minimizes the discrepancy between input data and output data to produce a predictable low-dimensional representation of high-dimensional input data. This learning continues until the somatic output and dendritic input agree on the low-dimensional features (i.e., chunks).

We previously used paired reservoir computing for chunking, where two recurrent networks supervise each other to mimic the partner’s responses to a common temporal input55. Although that model also learns self-consistency between input data and output data, performance was severely limited since the model required exactly the same number of output neurons as chunks. In contrast, the present model self-organizes output neurons according to the number of temporal features.

Mutual information maximization (MIM) has often been hypothesized to describe the transfer of information between neurons56, and Hebbian synaptic plasticity may approximately follow MIM57. The aim of MRIL differs from that of MIM; MRIL attempts to detect recurrent, and hence salient, temporal features without considering other information, whereas MIM ultimately implies that messages are faithfully copied at all layers of hierarchical processing. In other words, MIM does not account for the compression or abstraction of temporal inputs, whereas MRIL aims to describe how these processes may be executed in the brain and incorporates them into the model. Our results suggest that such processes can even occur at the level of single cortical neurons.

Similarly to MRIL, a method called the information bottleneck also compresses data streams58. The method contains a free parameter that determines the degree of information loss between the original and compressed data. Whether there is a relationship between the information bottleneck and the proposed method is an intriguing open question.

A previous model (U-S model)24 used a learning rule similar to the present one. However, while the somatic response function undergoes activity-history-dependent modulations in our model (see Eqs. 4–7), such modulations were not included in the U-S model. Importantly, our model without these modulations (i.e., the U-S model) could not solve the present unsupervised learning tasks. Networks of the U-S model were shown to perform semi-unsupervised learning, for instance, when recurrent synaptic input was configured as an effective teaching signal to the soma. In contrast, our model indicates that the recent history of somatic activity is sufficient for self-supervising the learning of temporal features. We note that the somatic response modifications introduced in this model may be achieved in cortical neurons by local inhibitory circuits59, the plasticity of intrinsic excitability60 or neuronal adaptation16,17,18,19.

Dendritic computing has been studied from various viewpoints of neural computing. Memmesheimer et al. derived the capacity of leaky integrate-and-fire neurons to implement desired transformations from streams of input spikes into desired output spike sequences61. The capacity was estimated by calculating the available volume of state space for generating the desired spike outputs, and an error-correcting supervised learning rule was presented to attain the desired input-output associations (which does not require dendrites). Legenstein and Maass studied the role of nonlinear dendritic processing in performing various logic operations62. Their model combines the branch-strength potentiation of dendrites and STDP to discriminate spatial activity patterns represented in presynaptic neuron ensembles. Sacramento et al. used dendrites to implement a classical error backpropagation algorithm for supervised learning, where deviations between top-down predictive signals and bottom-up sensory signals provided an error signal63. Redundant synaptic connections between neuron pairs have also been utilized to implement a Bayesian filtering algorithm to infer input-output associations in single neurons with realistic dendritic morphology64. If such a model includes both the Hebbian learning of synaptic weights and structural plasticity on the dendrites, a small number of redundant synapses is sufficient for an optimal inference. All of the models of dendritic processing discussed here are biologically more realistic than the present model, yet they did not address the ability of neurons with dendrites to analyze temporal features of information streams.

On the other hand, memory-related sequential activities of hippocampal neurons were modeled in terms of nonlinear amplification of synchronous inputs65. Furthermore, the discrimination of sequences on behavioral time scales was recently formulated in terms of the reaction-diffusion processes triggered by sequential inputs along dendrites66. While these processes were implemented in morphologically realistic neuron models, whether such models can perform complex temporal feature analyses is yet to be clarified. Hawkins and Ahmad modeled sequence processing in a cortical microcircuit model of formal neurons, each of which receives top-down feedback inputs on apical synapses, feedforward inputs on proximal synapses and lateral inputs from nearby neurons on multiple dendritic segments67. Through coincidence detection and segment-basis Hebbian learning, the network learns to recognize sparse activity patterns and to predict next spikes in an input sequence. While their model emphasizes the role of dendrites and cortical microcircuit structure in predicting spike sequences, our model demonstrates the ability of single neurons with dendrites to learn recurring temporal input patterns.

Determining which neuron or synapse should be credited for learning a desired output in a hierarchical neural circuit is a difficult problem. Solutions to this ‘credit assignment problem’ require feedback signals to neurons or synapses. In cortical pyramidal neurons, feedforward sensory data is thought to be received at the basal dendrites while feedback credit information is received at apical dendrites. It was recently argued that the spatial separation between the two pathways enables these neurons to solve the credit assignment problem through dendritic computing68. The current version of our model does not solve the credit assignment problem, and this problem arises on multiple timescales in hierarchical brain computation. How morphologically complex neurons implement the proposed temporal feature analysis and how this analysis helps the brain to solve hierarchically organized credit assignment problems are intriguing open questions.


Neural network model

Each output neuron has two compartments—somatic and dendritic. The dendritic membrane potential of output neuron \(i \in \left\{ {1,2, \ldots ,N_{{\mathrm{out}}}} \right\}\) is calculated as

$$\begin{array}{*{20}{c}} {v_i\left( t \right) = \mathop {\sum}\limits_j {w_{ij}} e_j\left( t \right),} \end{array}$$

where wij is the synaptic weight between output neuron i and input neuron j. The variable ej stands for the unit postsynaptic potential induced by neuron j and is described later. The somatic activity integrates the dendritic potential, and it evolves as

$$\begin{array}{*{20}{c}} {\dot u_i\left( t \right) = - \frac{1}{\tau }u_i\left( t \right) + g_{\mathrm{D}}\left[ { - u_i\left( t \right) + v_i\left( t \right)} \right] - \mathop {\sum}\limits_j {G_{ij}} \phi ^{{\mathrm{som}}}( {u_j( t )} )/\phi _0,} \end{array}$$

where τ = 15 ms and the conductance between the two compartments is gD = 0.7. The last term describes lateral inhibition with synaptic weights Gij (≥0). We calculated the inhibitory input in terms of the firing rates of output neurons. However, as explained below, spike trains of these neurons were also generated for simulating modifications of Gij by spike-timing-dependent plasticity. We assume that the soma of neuron i generates a Poisson spike train with the instantaneous firing rate \(\phi _i^{{\mathrm{som}}}\left( {u_i\left( t \right)} \right)\) in terms of the nonlinear response function

$$\phi _i^{{\mathrm{som}}}\left( {u_i} \right) = \phi _0\left[ {1 + \exp \left( {\beta _i\left( { - u_i + \theta _i} \right)} \right)} \right]^{ - 1}.$$

The parameters βi and θi are defined as follows:

$$\beta _i = \sigma _i\left( t \right)^{ - 1}\beta _0,$$
$$\theta _i = \mu _i\left( t \right) + \sigma _i\left( t \right)\theta _0,$$

where μi(t) and σi(t) are the mean and standard deviation of the membrane potential, respectively, over a sufficiently long period t0:

$$\mu _i\left( t \right) = \frac{1}{{t_0}}\int _{t - t_0}^t {u_i} \left( {t^\prime } \right)dt^\prime ,$$
$$\sigma _i\left( t \right) = \sqrt {\frac{1}{{t_0}} \int _{t - t_0}^t {u_i} \left( {t^\prime } \right)^2dt^\prime - \mu _i\left( t \right)^2} .$$

We set β0 = 5 throughout this study, but the values of ϕ0 and θ0 are task-dependent (Supplementary Methods).

We note that the slope βi of the nonlinearity and the threshold value θi are modified as the values of μi and σi change during learning. As described below, the online modifications of the somatic response function keep the output firing rate within a certain dynamic range. To see this, we use Eqs. (4) and (5) to obtain

$$\phi _i^{{\mathrm{som}}}\left( {u_i} \right)/\phi _0 = \left[ {1 + \exp \left( {\beta _0\left( { - \hat u_i + \theta _0} \right)} \right)} \right]^{ - 1} = \hat \phi \left( {\hat u_i} \right)/\phi _0,$$

where \(\hat \phi \left( x \right) = \phi _0\left[ {1 + \exp \left( {\beta _0\left( { - x + \theta _0} \right)} \right)} \right]^{ - 1}\) and \(\hat u_i\left( t \right) \equiv \left( {u_i\left( t \right) - \mu _i\left( t \right)} \right)/\sigma _i\left( t \right)\). As the mean of \(\hat u_i\left( t \right)\) is constrained to be zero, the above equation implies that \(\phi _i^{{\mathrm{som}}}\left( {u_i} \right)/\phi _0\) is also constrained around \(\left[ {1 + e^{\beta _0\theta _0}} \right]^{ - 1}\) with fluctuations of O(1). Thus, the somatic activity does not saturate.
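This invariance can be checked numerically: standardizing ui by its running mean and s.d. makes the somatic response identical to the fixed function of the standardized potential, regardless of the scale or offset of ui. A minimal sketch with illustrative values of β0, θ0 and ϕ0 (the task-specific values are given in Supplementary Methods):

```python
import numpy as np

beta0, theta0, phi0 = 5.0, 0.5, 1.0  # illustrative, not task-specific, values

def phi_som(u, mu, sigma):
    """Somatic response with adaptive slope beta_i and threshold theta_i (Eqs. 4-6)."""
    beta_i = beta0 / sigma
    theta_i = mu + sigma * theta0
    return phi0 / (1.0 + np.exp(beta_i * (-u + theta_i)))

def phi_hat(u_hat):
    """Fixed response function applied to the standardized potential (Eq. 7)."""
    return phi0 / (1.0 + np.exp(beta0 * (-u_hat + theta0)))

rng = np.random.default_rng(0)
for scale, offset in [(1.0, 0.0), (10.0, 3.0)]:  # two very different dynamic ranges
    u = scale * rng.standard_normal(100_000) + offset
    mu, sigma = u.mean(), u.std()
    rates = phi_som(u, mu, sigma)
    # identical to the fixed function of the standardized potential
    assert np.allclose(rates, phi_hat((u - mu) / sigma))
```

Because the standardized potential has zero mean and unit variance by construction, the output rate stays in the same dynamic range whatever the raw scale of ui.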

In our model, sensory information given to the network is first encoded into Poisson spike trains. Input neuron \(i \in \left\{ {1,2, \ldots ,N_{{\mathrm{in}}}} \right\}\) generates a Poisson spike train

$$\begin{array}{*{20}{c}} {X_i\left( t \right) = \mathop {\sum}\limits_q \delta ( {t - t_{i,q}}),} \end{array}$$

where δ is the Dirac delta function and ti,q denotes the time of the q-th spike of input neuron i. The presynaptic spikes induce the following synaptic current Ii(t):

$$\begin{array}{*{20}{c}} {\tau _{{\mathrm{syn}}}\dot I_i = - I_i + \frac{1}{\tau }X_i,} \end{array}$$

where the synaptic time constant τsyn = 5 ms (τsyn = 50 ms in Fig. 4g and Supplementary Fig. 4c). The synaptic currents in turn evoke a postsynaptic potential ei(t) as

$$\begin{array}{*{20}{c}} {\dot e_i = - \frac{{e_i}}{\tau } + e_0I_i.} \end{array}$$

The unit amplitude of postsynaptic potentials is given as e0 = 25 in all simulations.
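Equations (1), (2) and (8)-(10) can be integrated with a simple Euler scheme. The sketch below wires the full chain (Poisson input spikes → synaptic currents → unit PSPs → dendritic potential → somatic potential with lateral inhibition); the input rate, the inhibitory weights and the response-function parameters are illustrative placeholders, not the task-specific values from Supplementary Methods:

```python
import numpy as np

dt = 1.0                      # Euler step (ms)
tau, tau_syn = 15.0, 5.0      # membrane and synaptic time constants (ms)
g_D, e0, phi0 = 0.7, 25.0, 1.0
N_in, N_out, T = 20, 3, 2000
rng = np.random.default_rng(1)

w = rng.normal(0.0, 1.0 / np.sqrt(N_in), size=(N_out, N_in))  # dendritic weights
G = np.full((N_out, N_out), 0.1)                              # lateral inhibition (illustrative)
np.fill_diagonal(G, 0.0)
I = np.zeros(N_in); e = np.zeros(N_in); u = np.zeros(N_out)

def phi(u):
    # fixed sigmoid for the inhibitory term (illustrative beta and theta)
    return phi0 / (1.0 + np.exp(5.0 * (-u + 0.5)))

rate_in = 20e-3  # 20 Hz input rate, expressed per ms
for _ in range(T):
    # Poisson spikes; delta functions approximated by pulses of height 1/dt
    X = (rng.random(N_in) < rate_in * dt).astype(float) / dt
    I += dt / tau_syn * (-I + X / tau)                            # Eq. (9)
    e += dt * (-e / tau + e0 * I)                                 # Eq. (10)
    v = w @ e                                                     # Eq. (1)
    u += dt * (-u / tau + g_D * (-u + v) - (G @ phi(u)) / phi0)   # Eq. (2)
```

With dt = 1 ms the effective decay rate 1/τ + gD ≈ 0.77 per ms keeps the forward Euler update stable.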

Optimal learning rule for MRIL

To extract the characteristic features of the temporal input, our model compresses the high-dimensional data carried by the input sequence onto a low-dimensional manifold of neural dynamics. The model performs this by modifying the weights of dendritic synapses to minimize the time-averaged mismatch between the somatic and dendritic activities over a certain interval [0,T]. In a stationary state, the somatic membrane potential ui(t) of a two-compartment model can be described as an attenuated version \(v_i^ \ast \left( t \right)\) of the dendritic membrane potential with an attenuation factor α = gD/(gD + gL), where gL = τ−1 (ref. 24). Though we deal with time-dependent stimuli in our model, we compare the attenuated dendritic membrane potential with the somatic membrane potential at each time point. This comparison, however, is drawn not directly on the level of the membrane potentials but on the level of the two Poissonian spike distributions with rates \(\phi _i^{{\mathrm{som}}}(u_i(t))\) and \(\hat \phi \left( {v_i^ \ast \left( t \right)} \right)\), respectively, which would be generated if both soma and dendrite were able to emit spikes independently. The function \(\hat \phi \left( {v_i^ \ast \left( t \right)} \right)\) can also be regarded as a nonlinearly filtered version of the attenuated dendritic membrane potential69.

Explicitly representing the dependency of ui and \(v_i^ \ast\) on X, we define the cost function for synaptic weights w as

$$E\left( {\mathbf{w}} \right) = \int _{\Omega _{\mathbf{X}}} d{\mathbf{X}}\,P^ \ast \left( {\mathbf{X}} \right)\int _0^T dt \mathop {\sum}\limits_i {{\mathrm{D}}_{{\mathrm{KL}}}} \left[ {\phi _i^{{\mathrm{som}}}\left( {u_i\left( {t;{\mathbf{X}}} \right)} \right)\,\|\,\phi ^{{\mathrm{dend}}}( {v_i^ \ast ( {t;{\mathbf{X}}})})} \right],$$

where P*(X) stands for the true distribution of input spike trains, ΩX for the space spanned by all possible combinations of input spike trains, and DKL for the KL-divergence between the two Poisson distributions:

$$\begin{aligned} {\mathrm{D}}_{{\mathrm{KL}}}\left[ \phi _i^{{\mathrm{som}}}\left( u_i\left( t;{\mathbf{X}} \right) \right)\,\|\,\phi ^{{\mathrm{dend}}}\left( v_i^{\ast}\left( t;{\mathbf{X}} \right) \right) \right] \equiv{}& \phi _i^{{\mathrm{som}}}\left( u_i\left( t;{\mathbf{X}} \right) \right){\mathrm{log}}\frac{\phi _i^{{\mathrm{som}}}\left( u_i\left( t;{\mathbf{X}} \right) \right)}{\phi ^{{\mathrm{dend}}}\left( v_i^{\ast}\left( t;{\mathbf{X}} \right) \right)} \\ & + \phi ^{{\mathrm{dend}}}\left( v_i^{\ast}\left( t;{\mathbf{X}} \right) \right) - \phi _i^{{\mathrm{som}}}\left( u_i\left( t;{\mathbf{X}} \right) \right) \end{aligned}$$

with \(\phi ^{{\mathrm{dend}}}\left( x \right) = \phi _0\left[ {1 + {\mathrm{exp}}\left( {\beta _0\left( { - x + \theta _0} \right)} \right)} \right]^{ - 1}\). Note that unlike the somatic response function \(\phi _i^{{\mathrm{som}}}\), of which the values of βi and θi are neuron-dependent, the function ϕdend is common to all neurons.
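The per-neuron, per-time-point KL divergence between the two Poisson distributions is a simple closed-form expression in the two rates. A short check of its defining properties (nonnegative, and zero exactly when the somatic and dendritic rates agree):

```python
import numpy as np

def d_kl(r_som, r_dend):
    """KL divergence between Poisson processes with rates r_som and r_dend."""
    return r_som * np.log(r_som / r_dend) + r_dend - r_som

rates = np.linspace(0.05, 0.95, 10)
assert np.allclose(d_kl(rates, rates), 0.0)        # zero when the rates agree
assert (d_kl(rates, rates[::-1]) > 0.0).all()      # strictly positive otherwise
```

Minimizing this quantity therefore drives the somatic rate toward the rate predicted from the attenuated dendritic potential.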

We minimize the cost function (i.e., the averaged KL-divergence) with respect to w such that the responses of the two compartments become consistent with each other. Thus, the unsupervised learning rule of somatodendritic consistency check resolves the discrepancy between the somatic and dendritic responses to temporal input. Similar to reference24, we search for the optimal weight matrix by gradient descent as

$$\begin{aligned} \Delta w_{ij} &\propto -\frac{\partial E}{\partial w_{ij}} \\ &= -\frac{\partial}{\partial w_{ij}}\int _{\Omega _{\mathbf{X}}} d{\mathbf{X}}\,P^{\ast}\left( {\mathbf{X}} \right)\int _0^T dt \sum _{i'} {\mathrm{D}}_{{\mathrm{KL}}}\left[ \phi _{i'}^{{\mathrm{som}}}\left( u_{i'}\left( t;{\mathbf{X}} \right) \right)\,\|\,\phi ^{{\mathrm{dend}}}( v_{i'}^{\ast}( t;{\mathbf{X}} ) ) \right] \\ &= -\int _{\Omega _{\mathbf{X}}} d{\mathbf{X}}\,P^{\ast}\left( {\mathbf{X}} \right)\int _0^T dt \,\frac{\partial}{\partial w_{ij}}\left[ \phi _i^{{\mathrm{som}}}\left( u_i\left( t;{\mathbf{X}} \right) \right)\log \frac{\phi _i^{{\mathrm{som}}}\left( u_i\left( t;{\mathbf{X}} \right) \right)}{\phi ^{{\mathrm{dend}}}\left( v_i^{\ast}\left( t;{\mathbf{X}} \right) \right)} + \phi ^{{\mathrm{dend}}}( v_i^{\ast}( t;{\mathbf{X}} ) ) - \phi _i^{{\mathrm{som}}}\left( u_i\left( t;{\mathbf{X}} \right) \right) \right] \\ &= \int _{\Omega _{\mathbf{X}}} d{\mathbf{X}}\,P^{\ast}\left( {\mathbf{X}} \right)\int _0^T dt \,\frac{\partial \log \left( \phi ^{{\mathrm{dend}}}\left( v_i^{\ast}\left( t;{\mathbf{X}} \right) \right) \right)}{\partial w_{ij}}\left[ \phi _i^{{\mathrm{som}}}\left( u_i\left( t;{\mathbf{X}} \right) \right) - \phi ^{{\mathrm{dend}}}( v_i^{\ast}( t;{\mathbf{X}}) ) \right] \end{aligned}$$

Note that the identity \(d\phi ^{{\mathrm{dend}}}\left( x \right)/dx = \phi ^{{\mathrm{dend}}}\left( x \right)d{\mathrm{log}}\phi ^{{\mathrm{dend}}}\left( x \right)/dx\) was used in deriving the last expression. Since \(v_i^ \ast \left( t \right) = \alpha \sum _j w_{ij}e_j\left( t \right)\), the local learning rule is written in a vector form as

$${\Delta {\mathbf{w}}_i \propto \int _{\Omega _{\mathbf{X}}} {d{\mathbf{X}}} P^ \ast \left( {\mathbf{X}} \right) \int _0^T {dt} \psi ( {v_i^ \ast ( {t;{\mathbf{X}}})})\left[ {\phi _i^{{\mathrm{som}}}\left( {u_i\left( {t;{\mathbf{X}}} \right)} \right) - \phi ^{{\mathrm{dend}}}( {v_i^ \ast ( {t;{\mathbf{X}}} )} )} \right]e\left( {t;{\mathbf{X}}} \right),}$$

where \({\mathbf{w}}_i = \big[ {w_{i1}, \cdots w_{iN_{{\mathrm{in}}}}} \big]\) and the function ψ(x) is defined as

$$\begin{array}{*{20}{c}} {\psi \left( x \right) = \frac{d}{{dx}}\log \left( {\phi ^{{\mathrm{dend}}}\left( x \right)} \right).} \end{array}$$
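For the logistic response function used here, ψ has the closed form ψ(x) = β0(1 − ϕ^dend(x)/ϕ0), which can be verified against a numerical derivative of log ϕ^dend. A sketch with illustrative β0, θ0 and ϕ0:

```python
import numpy as np

beta0, theta0, phi0 = 5.0, 0.5, 1.0  # illustrative values

def phi_dend(x):
    return phi0 / (1.0 + np.exp(beta0 * (-x + theta0)))

def psi(x):
    # closed form of d/dx log(phi_dend(x)) for the logistic nonlinearity
    return beta0 * (1.0 - phi_dend(x) / phi0)

x = np.linspace(-2.0, 3.0, 501)
h = 1e-6
numeric = (np.log(phi_dend(x + h)) - np.log(phi_dend(x - h))) / (2 * h)
assert np.allclose(psi(x), numeric, atol=1e-5)
```

Note that ψ vanishes when the dendrite already fires near its maximal rate and approaches β0 when it is silent, so the weight update is gated by how far the dendritic prediction is from saturation.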

Note that the i-dependence of Δwi arises in our network model from activity-dependent modifications of recurrent inhibitory connections among output neurons (see Eq. 2). The inhibitory connections are modifiable by STDP (see Fig. 2b).

In all simulations, we added the regularization term −γwi to Eq. (14) to prevent the diverging growth of synaptic weights. Thus, the following online learning rule was used:

$$\dot{\mathbf{w}}_i\left( t \right) = \eta \left\{ \psi ( v_i^{\ast}( t ))\left[ \left( \phi _i^{{\mathrm{som}}}\left( u_i\left( t \right) \right) - \phi ^{{\mathrm{dend}}}( v_i^{\ast}( t ) ) \right)/\phi _0 \right]{\mathbf{e}}\left( t \right) - \gamma {\mathbf{w}}_i \right\},$$

where η is the learning rate. The parameter γ controls the strength of regularization and was adjusted in a task-dependent manner. The initial values of w were generated by a Gaussian distribution with mean zero and standard deviation \(1/\sqrt {N_{{\mathrm{in}}}} .\) Note that the above learning rule coincides with the Bienenstock-Cooper-Munro (BCM) theory except for a sign difference70. In BCM theory, the threshold between potentiation and depression is an unstable fixed point while in our model this point is a stable fixed point. However, as shown previously, the online modifications given in Eqs. (4)–(7) prevent the function \(\phi _i^{{\mathrm{som}}}\left( {u_i\left( t \right)} \right)\) from coinciding with \(\phi ^{{\mathrm{dend}}}\left( {v_i^ \ast \left( t \right)} \right)\). This in turn prevents a trivial fixed point w = 0 of Eq. (16). We note that the online modifications of somatic response function play a similar role to the standardization method to avoid a trivial solution in the SFA of temporal input25.
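A single Euler step of this online rule is straightforward to write down. In the sketch below, ϕ^som is represented by a fixed sigmoid for brevity, although in the full model its slope and threshold track the running statistics of ui (Eqs. 4-7); η, γ and the initial conditions are illustrative:

```python
import numpy as np

beta0, theta0, phi0 = 5.0, 0.5, 1.0
alpha = 0.7 / (0.7 + 1.0 / 15.0)   # attenuation factor g_D / (g_D + g_L)
eta, gamma = 1e-3, 1e-4

def sigmoid(x):
    return phi0 / (1.0 + np.exp(beta0 * (-x + theta0)))

def psi(x):
    # d/dx log(phi_dend(x)) for the logistic nonlinearity
    return beta0 * (1.0 - sigmoid(x) / phi0)

rng = np.random.default_rng(2)
w = rng.normal(0.0, 0.1, size=50)   # weights of one output neuron
e = rng.random(50)                  # instantaneous unit PSPs
u = 0.3                             # somatic membrane potential

v_star = alpha * w @ e              # attenuated dendritic potential
dw = eta * (psi(v_star) * (sigmoid(u) - sigmoid(v_star)) / phi0 * e - gamma * w)
w = w + dw                          # Eq. (16), one Euler step
```

The update pushes the dendritic prediction ϕ^dend(v*) toward the somatic rate ϕ^som(u), while the −γw term keeps the weights bounded.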

A similar learning rule to Eq. (16) was previously considered in a supervised learning model in which the average surprise of somatic spike output driven by dendritic synaptic input and a teaching signal given to the soma was minimized24. In this analogy, our learning rule may be interpreted as self-consistent surprise minimization in which the teaching signal itself is provided by the somatic response to make the learning rule for two-compartment neurons unsupervised. This summarizes the essential difference between our model and the previous model.

Improved learning rule with additional noise

In Fig. 8, we included an additional noise term at each time step of learning as follows:

$$\dot{\mathbf{w}}_i\left( t \right) = \eta \left\{ \psi ( v_i^{\ast}( t ) )\left[ \left( f(\phi _i^{{\mathrm{som}}} + \phi _0 g\xi _i) - \phi ^{{\mathrm{dend}}}( v_i^{\ast}( t ) ) \right)/\phi _0 \right]{\mathbf{e}}\left( t \right) - \gamma {\mathbf{w}}_i \right\},$$

where ξi is a random variable obeying a normal distribution. The parameter g controls the strength of the noise, and we set g = 0.6 in Fig. 8. The piecewise linear function f is defined as

$$f\left( x \right) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x < \phi _0 \\ \phi _0, & x \ge \phi _0 \end{cases}$$

Negative signals should be eliminated to suppress the learning during noise-dominant epochs.
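The function f is simply a clip of the noisy somatic rate to the interval [0, ϕ0], which a single call implements; the baseline rate below is an illustrative placeholder:

```python
import numpy as np

phi0, g = 1.0, 0.6

def f(x):
    """Piecewise-linear clipping of the noisy somatic rate to [0, phi0] (Eq. 18)."""
    return np.clip(x, 0.0, phi0)

rng = np.random.default_rng(3)
phi_som = 0.4 * np.ones(1000)                              # illustrative baseline rate
noisy = f(phi_som + phi0 * g * rng.standard_normal(1000))  # noise term of Eq. (17)
assert noisy.min() >= 0.0 and noisy.max() <= phi0
```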

Inhibitory plasticity

We modified lateral inhibitory connections through a symmetric anti-Hebbian STDP: if a pair of presynaptic and postsynaptic spikes occurs at times tpre and tpost, respectively, the weight change is calculated as

$$\Delta G_{ij} = C_{\mathrm{p}}\exp \left( { - \frac{{| {t_{{\mathrm{pre}}} - t_{{\mathrm{post}}}} |}}{{\tau _{\mathrm{p}}}}} \right) - C_{\mathrm{d}}\exp \Bigg( { - \frac{{| {t_{{\mathrm{pre}}} - t_{{\mathrm{post}}}} |}}{{\tau _{\mathrm{d}}}}} \Bigg),$$

where τp and τd are the decay constants of LTP and LTD, respectively35,36. Typically, τp = 40 ms, τd = 20 ms, \(C_{\mathrm{p}} = 0.00525\) and \(C_{\mathrm{d}} = 0.0105.\) Inhibitory weights Gij were modified between zero and an upper bound \(G_{{\mathrm{max}}}( \propto 1/\sqrt {N_{{\mathrm{out}}}} )\).
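With the typical values above (Cd = 2Cp, τp = 2τd), the kernel depresses inhibition for nearly coincident spikes and weakly potentiates it for larger time lags, crossing zero at |Δt| = τp ln 2 ≈ 27.7 ms. A quick numerical check of this shape:

```python
import numpy as np

tau_p, tau_d = 40.0, 20.0        # LTP and LTD decay constants (ms)
C_p, C_d = 0.00525, 0.0105

def delta_G(dt_ms):
    """Symmetric anti-Hebbian STDP kernel of Eq. (19)."""
    a = np.abs(dt_ms)
    return C_p * np.exp(-a / tau_p) - C_d * np.exp(-a / tau_d)

assert delta_G(0.0) < 0.0                          # depression at coincidence
assert delta_G(50.0) > 0.0                         # potentiation at large lags
assert abs(delta_G(tau_p * np.log(2.0))) < 1e-9    # zero crossing at tau_p * ln 2
```

In the full model, the resulting Gij are additionally clipped to the interval [0, Gmax].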

Evaluation of the degree of independence between signals

ICA was not valid for the auditory signals used in the simulations of BSS because the signals were not mutually independent. Therefore, in addition to the standard correlations between two analog signals, negentropy (≥0) was used to evaluate the independence of signals. Negentropy measures the deviation of a target distribution from a Gaussian distribution: it vanishes if the target distribution is Gaussian and otherwise takes a positive value; the larger the deviation, the larger the negentropy. Calculating the negentropy J(Y) of a statistical variable Y requires its true distribution, which is unknown in the present study. Therefore, we approximated J(Y) using a function Q:

$$J\left( Y \right) \propto \left[ {E\left( {Q\left( Y \right)} \right)-E\left( {Q\left( \rho \right)} \right)} \right]^2,$$

where E(x) refers to the expectation value of x and ρ obeys a Gaussian distribution. Typically, the logarithm of hyperbolic cosine function is used for Q49:

$$Q(u) = \frac{1}{a}\log \cosh (au),$$

where \(1 \le a \le 2\). In this study, we set a = 1.
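The approximation can be sketched directly: standardize the signal, compare the sample mean of Q against that of a Gaussian reference, and square the difference. The sample sizes and the Laplacian test signal below are illustrative:

```python
import numpy as np

a = 1.0

def Q(u):
    """Contrast function Q(u) = log cosh(a u) / a (Eq. 21)."""
    return np.log(np.cosh(a * u)) / a

rng = np.random.default_rng(4)
rho = rng.standard_normal(200_000)     # Gaussian reference variable

def negentropy(y):
    """Approximate negentropy J(Y) ∝ [E(Q(Y)) - E(Q(rho))]^2 (Eq. 20)."""
    y = (y - y.mean()) / y.std()       # standardize before the comparison
    return (Q(y).mean() - Q(rho).mean()) ** 2

gauss = rng.standard_normal(200_000)
laplace = rng.laplace(size=200_000)    # super-Gaussian test signal
assert negentropy(laplace) > negentropy(gauss)
```

As expected, the Gaussian sample scores near zero while the heavy-tailed Laplacian signal scores clearly higher.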

Cross-talk noise

In Fig. 8, we mixed the original signals X1(t) and X2(t) as follows:

$$\left( {\begin{array}{*{20}{c}} {{\mathrm{cos}}\theta } & {{\mathrm{sin}}\theta } \\ {{\mathrm{sin}}\theta } & {{\mathrm{cos}}\theta } \end{array}} \right)\left( {\begin{array}{*{20}{c}} {X_1\left( t \right)} \\ {X_2\left( t \right)} \end{array}} \right).$$

Then, the cross-talk noise between the two mixed signals was defined as tanθ. The mixed signals coincide with the original signals at θ = 0, whereas at \(\theta = \frac{{\uppi }}{4}\) the two mixtures are identical and BSS degenerates into a single-source separation problem.
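The two limiting cases of the mixing matrix are easy to verify numerically; the sinusoid and square-wave sources below are illustrative stand-ins for the audio signals:

```python
import numpy as np

def mix(x1, x2, theta):
    """Mix two sources with the cross-talk matrix of Eq. (22); cross-talk = tan(theta)."""
    A = np.array([[np.cos(theta), np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    return A @ np.vstack([x1, x2])

t = np.linspace(0.0, 1.0, 1000)
x1 = np.sin(2 * np.pi * 5 * t)              # illustrative source 1
x2 = np.sign(np.sin(2 * np.pi * 3 * t))     # illustrative source 2

m0 = mix(x1, x2, 0.0)
assert np.allclose(m0[0], x1) and np.allclose(m0[1], x2)  # theta = 0: originals

m45 = mix(x1, x2, np.pi / 4)
assert np.allclose(m45[0], m45[1])          # theta = pi/4: identical mixtures
```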

Chunking of character sequences by SOBI and SFA

In Supplementary Fig. 6a, we applied SOBI to the same sequential input as used in Fig. 5a. In the simulations of SOBI, the number of input units was set equal to the number of characters in the sequence, and each unit takes the value 1 when the corresponding character appears in the input and 0 otherwise. In Supplementary Fig. 6b, we low-pass filtered the raw input with a time constant of 50 ms and then resampled the filtered input at a time step of 10 ms prior to the application of SOBI. We also employed different values of the time constant (15 and 30 ms) and time step (1 ms), but these modifications did not change the essential results.

Denoting the observed time-series data at time t and the output of the j-th unit as Xt and \(y_{j,t} = g_j({\mathbf{X}}_t)\), respectively, we can describe the outline of SFA as follows. The objective of SFA is to minimize the following quantity (Δ-value):

$$\Delta (y_{j,t}) \equiv \left\langle {\dot y_{j,t}^2} \right\rangle _t,$$

where \(\left\langle \cdot \right\rangle _t\) denotes the averaging over time, under the following three constraints:

$$\left\langle {y_{j,t}} \right\rangle _t = 0,$$
$$\left\langle {y_{j,t}^2} \right\rangle _t = 1,$$
$$\left\langle {y_{i,t}y_{j,t}} \right\rangle _t = 0.$$

In other words, we should find the scalar function gj(Xt) that minimizes the squared time derivative of the latent variable yj,t. The latent variable yj,t that minimizes the Δ-value is called the slow feature of Xt. Equations (24) and (25) prevent a trivial solution, as in our model, and Eq. (26) decorrelates the outputs of different units. We applied SFA25 to the same input sequence as used in Fig. 5a. However, the results are not shown because the algorithm failed to generate outputs within a reasonably long simulation time.
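For linear functions gj, the SFA objective under these constraints has a closed-form solution: whiten the input, then take the eigenvectors of the covariance of the whitened time derivative with the smallest eigenvalues. A minimal sketch on synthetic data (the two mixed latents and the mixing coefficients are illustrative, not the input of Fig. 5a):

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0.0, 4 * np.pi, 4000)
slow, fast = np.sin(t), np.sin(17 * t)                        # slow and fast latents
X = np.column_stack([slow + 0.5 * fast, fast - 0.3 * slow])   # mixed observations

X = X - X.mean(axis=0)                  # zero mean, Eq. (24)
d, E = np.linalg.eigh(np.cov(X.T))
Z = X @ E / np.sqrt(d)                  # whitening enforces Eqs. (25) and (26)

dZ = np.diff(Z, axis=0)                 # discrete time derivative
vals, vecs = np.linalg.eigh(np.cov(dZ.T))
y = Z @ vecs[:, 0]                      # direction with smallest <ydot^2>: slow feature

corr = abs(np.corrcoef(y, slow)[0, 1])
assert corr > 0.95                      # slow latent recovered up to sign
```

Whitening makes all unit-norm projections satisfy the unit-variance and decorrelation constraints, so minimizing the Δ-value reduces to an ordinary eigenvalue problem on the derivative covariance.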

Details of simulations

Additional technical details of simulations and the values of model parameters used in the figures are given in Supplementary Methods.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All numerical datasets necessary to replicate the results shown in this article can easily be generated by numerical simulations with the software code provided below. No datasets were generated during this study.

Code availability

All codes were written in Python3 with numpy 1.17.3 and scipy 0.18.1. Example program codes used for the present numerical simulations and data analysis are available at


References

1. Dehaene, S., Meyniel, F., Wacongne, C., Wang, L. & Pallier, C. The neural representation of sequences: from transition probabilities to algebraic patterns and linguistic trees. Neuron 88, 2–19 (2015).

2. Fujii, N. & Graybiel, A. M. Representation of action sequence boundaries by macaque prefrontal cortical neurons. Science 301, 1246–1249 (2003).

3. Jin, X. & Costa, R. M. Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature 466, 457–462 (2010).

4. Jin, X., Tecuapetla, F. & Costa, R. M. Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences. Nat. Neurosci. 17, 423–430 (2014).

5. Schapiro, A. C., Rogers, T. T., Cordova, N. I., Turk-Browne, N. B. & Botvinick, M. M. Neural representations of events arise from temporal community structure. Nat. Neurosci. 16, 486–492 (2013).

6. Zacks, J. M. et al. Human brain activity time-locked to perceptual event boundaries. Nat. Neurosci. 4, 651–655 (2001).

7. Friston, K. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).

8. Bastos, A. M. et al. Canonical microcircuits for predictive coding. Neuron 76, 695–711 (2012).

9. Keller, G. B. & Mrsic-Flogel, T. D. Predictive processing: a canonical cortical computation. Neuron 100, 424–435 (2018).

10. McDermott, J. H. The cocktail party problem. Curr. Biol. 19, 1024–1027 (2009).

11. Larkum, M. E., Zhu, J. J. & Sakmann, B. A new cellular mechanism for coupling inputs arriving at different cortical layers. Nature 398, 338–341 (1999).

12. Linden, D. J. The return of the spike: postsynaptic action potentials and the induction of LTP and LTD. Neuron 22, 661–666 (1999).

13. Markram, H., Lübke, J., Frotscher, M. & Sakmann, B. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275, 213–215 (1997).

14. Magee, J. C. & Johnston, D. A synaptically controlled, associative signal for Hebbian plasticity in hippocampal neurons. Science 275, 209–213 (1997).

15. Sjöström, P. J. & Häusser, M. A cooperative switch determines the sign of synaptic plasticity in distal dendrites of neocortical pyramidal neurons. Neuron 51, 227–238 (2006).

16. Fairhall, A. L., Lewen, G. D., Bialek, W. & de Ruyter van Steveninck, R. R. Efficiency and ambiguity in an adaptive neural code. Nature 412, 787–792 (2001).

17. Maravall, M., Petersen, R. S., Fairhall, A. L., Arabzadeh, E. & Diamond, M. E. Shifts in coding properties and maintenance of information transmission during adaptation in barrel cortex. PLoS Biol. 5, e19 (2007).

18. Mensi, S., Hagens, O., Gerstner, W. & Pozzorini, C. Enhanced sensitivity to rapid input fluctuations by nonlinear threshold dynamics in neocortical pyramidal neurons. PLoS Comput. Biol. 12, e1004761 (2016).

19. Pozzorini, C., Naud, R., Mensi, S. & Gerstner, W. Temporal whitening by power-law adaptation in neocortical neurons. Nat. Neurosci. 16, 942–948 (2013).

20. Comon, P. Independent component analysis, a new concept? Signal Process. 36, 287–314 (1994).

21. Amari, S. & Cardoso, J. F. Blind source separation – semiparametric statistical approach. IEEE Trans. Signal Process. 45, 2692–2700 (1997).

22. Hyvärinen, A. & Oja, E. Independent component analysis: algorithms and applications. Neural Netw. 13, 411–430 (2000).

23. Kameoka, H., Li, L., Inoue, S. & Makino, S. Supervised determined source separation with multichannel variational autoencoder. Neural Comput. 31, 1–24 (2019).

24. Urbanczik, R. & Senn, W. Learning by the dendritic prediction of somatic spiking. Neuron 81, 521–528 (2014).

25. Wiskott, L. & Sejnowski, T. J. Slow feature analysis: unsupervised learning of invariances. Neural Comput. 14, 715–770 (2002).

26. Belouchrani, A., Abed-Meraim, K., Cardoso, J. F. & Moulines, E. A blind source separation technique using second-order statistics. IEEE Trans. Signal Process. 45, 434–444 (1997).

27. Larkum, M. A cellular mechanism for cortical associations: an organizing principle for the cerebral cortex. Trends Neurosci. 36, 141–151 (2013).

28. Pfister, J. P., Toyoizumi, T., Barber, D. & Gerstner, W. Optimal spike-timing dependent plasticity for precise action potential firing. Neural Comput. 18, 1318–1348 (2006).

29. Buiatti, M., Peña, M. & Dehaene-Lambertz, G. Investigating the neural correlates of continuous speech computation with frequency-tagged neuroelectric responses. Neuroimage 44, 509–519 (2009).

30. Gentner, T. Q., Fenn, K. M., Margoliash, D. & Nusbaum, H. C. Recursive syntactic pattern learning by songbirds. Nature 440, 1204–1207 (2006).

31. Smith, K. S. & Graybiel, A. M. A dual operator view of habitual behavior reflecting cortical and striatal dynamics. Neuron 79, 361–374 (2013).

32. Branco, T., Clark, B. A. & Häusser, M. Dendritic discrimination of temporal input sequences in cortical neurons. Science 329, 1671–1675 (2010).

33. Lu, J. T., Li, C. Y., Zhao, J. P., Poo, M. M. & Zhang, X. H. Spike-timing-dependent plasticity of neocortical excitatory synapses on inhibitory interneurons depends on target cell type. J. Neurosci. 27, 9711–9720 (2007).

34. Woodin, M. A., Ganguly, K. & Poo, M. M. Coincident pre- and postsynaptic activity modifies GABAergic synapses by postsynaptic changes in Cl− transporter activity. Neuron 39, 807–820 (2003).

35. Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C. & Gerstner, W. Inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks. Science 334, 1569–1573 (2011).

36. Földiak, P. Forming sparse representations by local anti-Hebbian learning. Biol. Cybern. 64, 165–170 (1990).

37. Bishop, C. M. Training with noise is equivalent to Tikhonov regularization. Neural Comput. 7, 108–116 (1995).

38. Agus, T. R., Thorpe, S. J. & Pressnitzer, D. Rapid formation of robust auditory memories: insights from noise. Neuron 66, 610–618 (2010).

39. Masquelier, T., Guyonneau, R. & Thorpe, S. J. Competitive STDP-based spike pattern learning. Neural Comput. 21, 1259–1276 (2009).

  40. 40.

    Nessler, B., Pfeiffer, M., Buesing, L. & Maass, W. Bayesian computation emerges in generic cortical microcircuits through spike-timing-dependent plasticity. PLoS Comput. Biol. 9, e1003037 (2013).

    CAS  Article  ADS  MathSciNet  PubMed  PubMed Central  Google Scholar 

  41. 41.

    Pierre, P. What mechanisms underlie implicit statistical learning? transitional probabilities versus chunks in language learning. Top. Cogn. Sci. 11, 520–535 (2018).

    Google Scholar 

  42. 42.

    Hubel, D. H. & Wiesel, T. N. Receptive fields of single neurons in the cat’s striate cortex. J. Physiol. 148, 574–591 (1959).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  43. 43.

    Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).

    CAS  Article  ADS  PubMed  Google Scholar 

  44. 44.

    Duan, Z. & Pardo, B. Soundprism: an online system for score-informed source separation of music audio. IEEE J. Sele. Top. Signal Process. 5, 1205–1215 (2011).

    Article  ADS  Google Scholar 

  45. 45.

    Hyvärinen, A. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Netw. 10, 626–634 (1999).

    Article  PubMed  Google Scholar 

  46. 46.

    Zito, T., Wilbert, T., Wiskott, L. & Berkes, P. Modular toolkit for Data Processing (MDP): a Python data processing framework. Front. Neuroinform. 2, 8 (2009).

    PubMed  PubMed Central  Google Scholar 

  47. 47.

    Clopath, C., Longtin, A. & Gerstner, W. An online Hebbian learning rule that performs independent component analysis. BMC Neurosci. 9, 321–328 (2008).

  48. 48.

    Sprekeler, H., Zito, T. & Wiskott, L. An extension of slow feature analysis for nonlinear blind source separation. J. Mach. Learn. Res. 15, 921–947 (2014).

    MathSciNet  MATH  Google Scholar 

  49. 49.

    Savin, C., Joshi, P. & Triesch, J. Independent component analysis in spiking neurons. PLoS Comput. Biol. 6, e1000757 (2010).

    Article  ADS  MathSciNet  CAS  PubMed  PubMed Central  Google Scholar 

  50. 50.

    Sprekeler, H., Michaelis, C. & Wiskott, L. Slowness: An objective for spike timing-dependent plasticity? PLoS Comput. Biol. 3, e112 (2007).

    Article  ADS  MathSciNet  CAS  PubMed  PubMed Central  Google Scholar 

  51. 51.

    Rabinovich, M., Varona, P., Tristan, I. & Afraimovich, V. Chunking dynamics: heteroclinics in mind. Front. Comput. Neurosci. 8, 22 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  52. 52.

    Fonollosa, J., Neftci, E. & Rabinovich, M. Learning of chunking sequences in cognition and behavior. PLoS Comput. Biol. 11, e1004592 (2015).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  53. 53.

    Wacongne, C., Changeux, J. P. & Dehaene, S. A neuronal model of predictive coding accounting for the mismatch negativity. J. Neurosci. 32, 3665–3678 (2012).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  54. 54.

    Kiebel, S. J., Kriegstein, K. V., Daunizeau, D. & Friston, K. J. Recognizing sequences of sequences. PLoS Comput. Biol. 6, e1000464 (2009).

    Article  MathSciNet  CAS  Google Scholar 

  55. 55.

    Asabuki, T., Hiratani, N. & Fukai, T. Interactive reservoir computing for chunking information streams. PLoS Comput. Biol. 14, e1006400 (2018).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  56. 56.

    Rieke, F., Warland, D., de Ruyter van Stevenick, R. & Bialek, W. Spikes (Cambridge, MA: Massachusetts Institute of Technology, 1997).

  57. 57.

    Linsker, R. Perceptual neural organization: some approaches based on network models and information theory. Annu. Rev. Neurosci. 13, 257–281 (1990).

    CAS  Article  PubMed  Google Scholar 

  58. 58.

    Tishby, N., Pereira, F. C. & Bialek, W. The information bottleneck method. Proceedings of the 37th annual Allerton Conference on Communication, Control, and Computing, 368–377 (1999).

  59. 59.

    Murayama, M. et al. Dendritic encoding of sensory stimuli controlled by deep cortical interneurons. Nature 457, 1137–1141 (2009).

    CAS  Article  ADS  PubMed  Google Scholar 

  60. 60.

    Titley, H. K., Brunel, N. & Hansel, C. Toward a neurocentric view of learning. Neuron 95, 19–32 (2017).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  61. 61.

    Memmesheimer, R. M., Rubin, R., Ölveczky, B. P. & Sompolinsky, H. Learning precisely timed spikes. Neuron 82, 925–938 (2014).

    CAS  Article  PubMed  Google Scholar 

  62. 62.

    Legenstein, R. & Maass, W. Branch-specific plasticity enables self-organization of nonlinear computation in single neurons. J. Neurosci. 31, 10787–10802 (2011).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  63. 63.

    Sacramento, J., Costa, R. P., Bengio, Y. & Senn, W. Dendritic error backpropagation in deep cortical microcircuits. arXiv:1801.00062 (2017).

  64. 64.

    Hiratani, N. & Fukai, T. Redundancy in synaptic connections enables neurons to learn optimally. Proc. Natl Acad. Sci. USA 115, E6871–E6879 (2018).

    CAS  Article  PubMed  Google Scholar 

  65. 65.

    Jahnke, S., Timme, M. & Memmesheimer, R. M. A unified dynamic model for learning, replay, and sharp-wave/ripples. J. Neurosci. 35, 16236–16258 (2015).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  66. 66.

    Bhalla, U. S. Synaptic input sequence discrimination on behavioral timescales mediated by reaction-diffusion chemistry in dendrites. Elife 6, e25827 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  67. 67.

    Hawkins, J. & Ahmad, S. Why neurons have thousands of synapses, a theory of sequence memory in neocortex. Front. Neural Circuits 10, 23 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  68. 68.

    Richards, B. A. & Lillicrap, T. P. Dendritic solutions to the credit assignment problem. Curr. Opin. Neurobiol. 54, 28–36 (2019).

    CAS  Article  PubMed  Google Scholar 

  69. 69.

    Ujfalussy, B., Makara, J. K., Lengyel, M. & Branco, T. Global and multiplexed dendritic computations under in vivo-like conditions. Neuron 100, 579–592 (2018).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  70. 70.

    Bienenstock, E. L., Cooper, L. N. & Munro, P. W. Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J. Neurosci. 2, 32–48 (1982).

    CAS  Article  PubMed  PubMed Central  Google Scholar 



Acknowledgements

The authors express their sincere thanks to Shun-ichi Amari for stimulating discussion about the proposed learning rule and to Shigeyoshi Fujisawa, Joshua Johansen and Yukiko Goda for their valuable comments on our manuscript. The authors also thank Yuanchieh Ling and Thomas Burns for technical assistance. This work was partly supported by KAKENHI (nos. 17H06036, 18H05213 and 19H04994) to T.F. T.A. was supported by the Junior Research Associate program of RIKEN and the SRS Research Assistantship of OIST.

Author information




Contributions

T.F. and T.A. designed the study and wrote the manuscript, and T.A. performed numerical simulations.

Corresponding author

Correspondence to Tomoki Fukai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks Subutai Ahmad and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Audio 1

Supplementary Audio 2

Supplementary Audio 3

Supplementary Audio 4

Supplementary Audio 5

Supplementary Audio 6

Supplementary Information

Peer Review File

Reporting Summary

Description of Additional Supplementary Files

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Asabuki, T. & Fukai, T. Somatodendritic consistency check for temporal feature segmentation. Nat. Commun. 11, 1554 (2020).




