## Introduction

The activity of neurons as computational elements of the brain depends strongly on their spiking activity history, leading to dynamics with a long-term memory effect1,2,3,4. This type of neuronal plasticity differs from current approaches in modern machine learning (ML), such as the stochastic dropout technique5. In this work, we present new experimental in-vitro results controlling dynamics of the brain, and utilize them to expand supervised ML tasks6 to include sequence identification, where the input objects have a given order and timing.

The objective of a sequence identification task is to decide whether an embedded sequence is presented as a temporal order input. This is a common task in many aspects of human cognition, including associative memory, the recognition of temporal events, and the identification of handwritten sequences of digits, such as telephone numbers or verification codes. Producing a reliable answer for such a decision problem requires accurate classification of all sequence objects. The realization of a sequence identification task with neural networks usually requires feedback loops7,8, which suffer from training difficulties and long convergence times. It has become increasingly difficult to use feedforward neural networks for sequence identification because their achievable success rates (SRs) typically range between 0.9 for complex classification tasks9 and limited datasets10 to above 0.99 for relatively simple tasks11. Because the SR of each object is a finite distance from one, the probability of accurately identifying the entire sequence falls exponentially with the number of objects in the sequence. An additional puzzle is that brain dynamics without the precise and reliable implementation of advanced deep learning techniques12,13 are expected to achieve significantly lower SRs; nevertheless, they have demonstrated the ability to identify many sequences with high fidelity. Hence, the formation of reliable identification-networks (ID-nets) requires the development of a new type of brain-inspired ML mechanism.

In this work, first, we demonstrate a new brain-inspired mechanism where nodes of spiking feedforward neural networks (SFNNs)14,15 are temporally silenced following their recent activity. Consequently, transitory objects act on different feedforward sub-networks which are dynamically created by the activity of previous objects. The quantitative results of the ID-nets are then presented for the sequence identification of 10 handwritten digits taken from the Modified National Institute of Standards and Technology (MNIST) database6,10,16. Then, we present the preliminary results on the utilization of ID-nets for writer-dependent sequence recognition and their possible application to cryptography17. We conclude the theoretical discussion by generalizing the presented results to the same artificial neural network (ANN) architectures, but with continuous activation nodes and training sequences obtained from CIFAR-1018 on LeNet-519, representing a deep convolutional neural network. Finally, the experimental results using neuronal cultures are presented to support the mechanism of temporally silenced nodes. We conclude this paper by summarizing the main results and comparing the presented mechanism of brain-inspired ID-nets with the long-short-term-memory (LSTM)20,21 approach.

## Results

### Brain-inspired ID-nets: temporal sequences

A temporal object order, i.e., a sequence, comprising 10 consecutive handwritten digits taken from the MNIST database is separated using time-lags of $$\Delta t$$ (Fig. 1a, “Methods”). Each input digit to the SFNN is represented via $$d$$ Boolean frames, where the pixels of each frame are probabilistically spiking (white) or non-spiking (black) relative to their gray-level (Fig. 1a). This representation is justified because the time-average of a sufficient number of $$d$$ Boolean frames is extremely similar to the original digit (Fig. 1a).

A temporal sequence (Fig. 1a) is trained using a backpropagation technique on a feedforward network comprising 784 input units, which represent the 28 × 28 pixels of the MNIST digit, 200 hidden units, and 10 output units, which represent possible labels (Fig. 1b). For a given input digit, the output is selected as the output label with maximal fires in the last $${d}_{1}$$ frames (“Methods”). Clearly, the trained network cannot serve as an ID-net because each digit is predicted independently. In addition, with a limited number of examples per label, e.g., 1000, the SR is only ~ 0.9 (Supplementary Fig. 1), and the probability of accurately identifying all 10 digits thus decreases to $${0.9}^{10}{-}0.35.$$

### Brain-inspired ID-nets: silencing mechanism

The possibility of reconstructing this network as an ID-net emerged from the experimentally observed long-term memory of neurons after they were stimulated at high frequency (see experimental results below), which led to the following dynamical scheme. The response probability, $$\frac{q}{{d}_{1}},$$ of each network node when presented with a given digit depends on the number of its spikes $$q$$ during the last $${d}_{1}$$ frames of the previous digit (Fig. 1c). This silencing mechanism is applicable for both input and hidden units, and it is artificially attributed to the first digit to exclude boundary condition effects (“Methods”). Consequently, each digit is trained on a different sub-SFNN, where the response probability of some of the inputs and hidden units is dimmed based on their past activity (Fig. 1d). The weights that connect the silenced nodes are also temporarily excluded in the training process. In principle, this silencing mechanism has long-range effects, similar to the Markovian processes22,23. The activation of nodes for a given digit depends on their activity for the previous digit, which in turn depends on the activity for the penultimate digit; the importance of the temporal order is thus evident. If the order of the test sequence differs from the trained sequence, the active sub-SFNN of some test digits will differ significantly from the trained sub-SFNN (Fig. 1d). When the sequence orders differ, the SRs are expected to decrease following the decrease in the ID-net signal-to-noise-ratio. The signal is diminished because some of the trained weights have been silenced, while the noise is enhanced as some weights that were silenced in the training process have become active.

### Performance of the ID-net

The performance of an ID-net is examined quantitatively to distinguish between the embedded sequence and following types of imperfections (Fig. 1e). The first imperfection is a much faster presentation of the same sequence with shorter time-lags between consecutive digits $$\Delta {t}_{s}<\Delta t$$, resulting in additional silenced nodes that spiked for the previously presented digits. The second imperfection is a much slower presentation of the same sequence with longer time-lags $$\Delta {t}_{L}>\Delta t$$, such that the effect of the silenced nodes vanished; i.e., all nodes are active. The third imperfection comprises a wrong sequence, where the timings of the two digits have been exchanged.

Training the ID-net (Fig. 1b) using 1000 sequences, wherein each comprises the same order of 10 unique MNIST digits, results in an average SR of ~ 0.91 per digit. The minimal SR among the 10 digits varies in the range $$\left[0.79, 0.88\right],$$ where the most probable value is near the middle of this range (Fig. 2a, blue histogram). The minimal SR for the three types of imperfect input sequences, i.e., fast, slow, and wrong (Fig. 1e), is summarized by the orange histograms in Fig. 2a. We assume that the ID-net knows the trained order of the digits, and that the SRs are evaluated with respect to the expected order of the digits (“Methods”). The results clearly indicate a significant gap between the blue and orange histograms for slow and fast imperfections, such that a threshold of 0.6 can be used to identify clusters of such imperfect sequences. For wrong sequences with only two swapped digits, there is a small overlap between the orange and blue histograms (Fig. 2a); however, a threshold of 0.79 accurately predicts whether the order of the input sequence is correct with a probability of ~ 0.998. The threshold is defined as the maximal value such that the blue histogram, generated from all examined samples, is above it (True positive = 1). Hence, the SR indicates the fraction of the orange histogram obtained from all samples that is below threshold. This definition of SR corresponds to the tTrue negative rate (specificity measurement).

In cases where the ID-net does not know the trained sequence, the SR is calculated for each digit following the most frequently predicted labels. The orange histogram is calculated following the minimal SR value among the 10 digits; consequently, the orange histogram exhibits higher SR values (Fig. 2b). Nevertheless, a significant gap remains for imperfect faster or slower sequences (Fig. 2b); meanwhile, for similar wrong sequences, a threshold of 0.79 accurately predicts whether the order of the input sequence is correct or wrong with a probability of ~ 0.994 (Fig. 2b).

The ID-net is capable of embedding and recognizing more than one sequence, while remaining robust in terms of identifying imperfect sequences. For the case with two embedded similar sequences (Fig. 2c), there is a slight broadening and shift in the orange histograms, but the gap is maintained for the fast and slow sequences. For wrong sequences a threshold of 0.76 predicts with probability ~ 0.992 whether the input sequence is one of the correct sequences or not (Fig. 2c), as well as with a probability of ~ 0.991 when the ID-net does not know the order of trained sequences (Fig. 2d). When increasing the number of embedded sequences, the gap between the orange and blue histograms decreases, and their overlap increases (Supplementary Fig. 2). Note that, for all cases examined here (Fig. 2), a fitted threshold for each trained sequence resulted in an enhanced gap and better identification (Supplementary Fig. 2), because the right tails of the orange and blue histograms were correlated.

The results are shown for sequences of 10 digits and for an architecture with 200 hidden units only (Fig. 1b). Nevertheless, a similar performance of the ID-net was obtained in our simulations for sequences with numbers of digits in the range [5, 12]. The improvement in performance for more than 200 hidden units was found to be negligible, and was only slightly affected in the range [50, 200].

### ID-net for writer-dependent recognition

Finally, the performance of the ID-net is demonstrated for writer-dependent recognition, which enables a given sequence with specific handwriting to be distinguished from its three associated imperfections and from the same sequence in different handwriting. For simplicity, we classify the handwriting based on the average absolute distance between pixels. Specifically, the maximal absolute difference between the pixel values with a gray-level greater than 100 is 20 for a given handwriting, whereas the handwritings of other individuals comprise other MNIST digits. Indeed, the absolute difference between gray-level pixels greater than 100 of two MNIST examples with the same label is much greater than 20 (Supplementary Fig. 4).

The ID-net (Fig. 1b) is trained on one individual’s handwriting using synthetic sequences generated by adding integer noise in the range [− 20, 20] to pixels with a gray-level greater than 100. A decision based on the SRs is not applicable in this case because only one sequence is presented with 10 output probabilities for each digit. Hence, for each predicted test digit, the gap, $$\Delta$$, between the highest and second highest firing output nodes is normalized using $${d}_{1}$$ to the range $$[0, 1]$$. Its minimal value for the 10 digits of the sequence $${\Delta }_{min}$$ is chosen for the decision criterion. When the correct sequence is presented, the highest normalized gap $$\Delta$$ for each digit is expected to be close to one. For a sequence with one of the imperfections or with different handwriting, the two opposite trends are expected in some of the digits, including a decrease in the highest firing output node with a simultaneous increase in the firing of the next highest output node, resulting in a substantial decrease in $${\Delta }_{\mathrm{min}}$$. The minimal gap $${\Delta }_{min}$$ can indeed identify a writer-dependent sequence with a probability approaching 1 (Fig. 3), and it is recommended as a cryptographic authentication ingredient that is robust to noise. The sequence itself is not known to the verification system, which only receives the weights and dynamics of the ID-net for verification. The client must insert the 10 digits in the correct order and speed using correct handwriting, all of which are robust to some level of noise, unlike methods relying on number theory24. An opponent is expected to fail even with partial knowledge of the individual handwriting, speed, and order of the sequence (Fig. 3). This capability of ID-net can be extended to identify several writer-dependent sequences (Supplementary Fig. 5). Nevertheless, many open questions remain regarding the level of security of such potential cryptographic applications, which require further research.

### Generalization of ID-nets to ANNs

The brain-inspired mechanism, where nodes of spiking feedforward neural networks (SFNNs) are temporally silenced following their recent activity, is generalized to standard ANNs with continuous output nodes (Fig. 4a) and to deep convolutional neural networks trained on the CIFAR-1018 database (Fig. 4b). The silencing probability of each SFNN node when presented with a given object is equal to the fraction of its spikes during the frames of the previous object. Similarly, the silencing probability of an ANN node is determined by its activation value in the previous object. The fully connected architecture comprising a single hidden layer (Fig. 1b) with a sigmoid activation function for the nodes serves as the ID-net using the following dynamical scheme. The silencing probability of each network node when presented with a given MNIST digit is equal to its output [0, 1] in the previous digit. This silencing mechanism is applicable for both the input and hidden units, and it is artificially attributed to the first digit to exclude boundary condition effects (“Methods”). We train this ANN on a dataset similar to the one in Fig. 2a, and the results for the minimal SR are indicated by the blue and orange histograms for the three types of imperfect test sequences: fast/slow/wrong (Fig. 4a). Results clearly indicate a significant gap between the blue and orange histograms for slow and fast imperfections and a small overlap for the wrong imperfection (Fig. 4a). However, a threshold of 0.9 accurately predicts whether the order of the input sequence is correct with a probability of ~ 0.994. The application of the ID-net for writer-dependent recognition (Fig. 3) is exemplified for the LeNet-5 architecture with a ReLU (Rectified Linear Unit) activation function. It is trained on one individual’s objects, comprising a sequence of 10 different CIFAR-10 objects with 50 additional similar synthetic sequences, where the response probability of each network node in the fully connected layers (layers C5 and F6) depends on its output in the previous digit (“Methods”). The vanishing overlap between the blue and orange histograms for the four imperfections, fast, slow, wrong, and objects of different individuals (Fig. 4b, similar to Fig. 3), clearly indicates the sensitivity of such convolutional neural networks to writer-dependent recognition. This capability of the ID-net using convolutional ANNs can be extended to identify several writer-dependent sequences (not shown). The generalization of the brain-inspired mechanism from SFNNs to ANNs enhances the connection between ML for sequence identification and advanced cryptographic protocols that are robust to some level of noise.

### Experimental observation of silenced neurons in-vitro

The inspiration for silencing a node following its recent relatively high-frequency spiking activity originates from our in-vitro experiments on synaptic blocked neuronal cultures25 (“Methods”). The experimental setup is capable of repeatedly stimulating a neuron extracellularly via one of its dendrites, while recording the responses of the neuron intracellularly26 using the patch-clamp technique (Fig. 5a).

Each neuron responds reliably when stimulated above threshold below its critical frequency27, which varies considerably among neurons. When a neuron is stimulated significantly above its critical firing frequency, its responses can comprise alternating bursts28 of full responsiveness and silencing periods (Fig. 5b). A burst of full responsiveness can represent repetitive spiking activity in response to consecutive frames of a digit (Fig. 1a), followed by a silencing period (Fig. 1c). This is the underlying mechanism of the present ID-net, where transitory objects act on different dynamically created feedforward sub-networks (Fig. 2d). The silencing periods typically comprise tens of stimulations without evoked spikes, and their maximal durations can reach an order of seconds. The relaxation of such neurons to full responsiveness at low frequency sometimes exceeds 1 min29 (Supplementary Fig. 6).

Notably, although neurons can be silenced for a timescale of seconds, the response of the entire network, beyond the silenced sub-network (Fig. 1d), is controlled by neurons that are active during these periods. Furthermore, sub-second silencing periods were observed for neurons that are characterized by high critical firing frequencies27, where their responsiveness increases for a given stimulation frequency; hence, these neurons can participate in the identification of high-rate sequences.

## Discussion

In this study, we have demonstrated the complex reality of brain dynamics, wherein the functionality of neurons is time- and activity-dependent. It has been demonstrated that this stochastic long-term memory is not a disadvantage representing biological limitations. Instead, its advantage over the memoryless deterministic dynamics of SFNNs without feedback loops for temporal sequence identification has been presented. Counterintuitively, for a given architecture and limited number of training examples, the classification performance for individual objects without silencing nodes is far below unity, but approaches unity for identification of an entire sequence of objects using silencing nodes. This brain-inspired long-term neuronal plasticity was also useful for realizing sequence identification using ANNs without feedback loops and with a limited number of training examples. It represents the brain-inspired SFNNs as a source of novel mechanisms to advance learning in ANNs.

Note that the underlying mechanism of the present artificial ID-net cannot shed light on the way the brain deciphers the time series of objects. Indeed, recent EEG experiments30 indicate that each visual cognitive task, i.e., a visual object, activates a different area of the brain, which is highly similar during the same task. However, these brain states dynamically evolve into cyclic manifolds that are distinct with different tasks.

Notably, this type of long-term neuronal plasticity differs from existing memoryless approaches in modern ML, such as the stochastic dropout technique5. Here, the temporal dropout of nodal activity depends on their recent activity. In addition, the presented brain-inspired mechanism for silencing nodes differs from the conventional MPEG video compression standards31,32,33, where differences between bits of highly similar consecutive frames are evaluated. These different bits, either zero or one, contain temporal information, whereas similar bits are neglected. In this study, active nodes repeatedly fire within frames representing the same object, and are only silenced following their similar activity for the previously presented object.

Currently, the order of sequences of objects can be identified using recurrent neural network (RNN) architectures, and specifically, the scheme of LSTM20,21. In contrast to standard feedforward neural networks, LSTM20,21 includes feedback connections. The presented brain-inspired approach indicates that, fundamentally, sequences can be identified without feedback loops. In addition, sequences are identified beyond the order of their objects also following the time-lags between consecutive objects. This feature is typically beyond the current framework of the LSTM architecture; we suggest its inclusion.

It is not straightforward to compare the performance of LSTM and ID-net models, as they do not perform the same task. For instance, in LSTM, one typically assumes that a sub-sequence is identified accurately, and the task is to measure the SR of the next possible element generated by a given graph34. However, in our study, we assume a small training dataset with SRs for an individual object far below unity, and estimate the SR for accurately identifying the entire sequence.

One component of the state-of-the-art sequence recognition models is an attention mechanism35. These models outperform36 the SR of the presented ID-nets, for which the motivation is to provide a bridge between the experimental findings of stochastic neuronal features and advanced machine-learning algorithms. We present an experimental neuronal long-term plasticity mechanism for reliable feedforward sequence identification. The architecture is the same as that for the identification of an individual object, but transitory objects act on different dynamically created feedforward sub-networks. Nevertheless, the presented ID-nets are only useful for identifying very limited types of time series. They are not applicable to analyzing, for instance, video datasets (which are one of the main foci of current research) and time series analysis applications using the attention mechanism36. In video datasets, each object is composed of time series comprising many frames that constitute a single object, where frames are related to each other following the content of the video. In addition, the exchange of the first part of the video with the second part possibly does not change its classification. By contrast, an object in the presented ID-nets is limited to a single frame, where the same object cannot be repeated several times in the time series; hence, consecutive objects (frames) significantly vary. In addition, the identification of the limited length time series is sensitive to the order of its constituent objects (several frames). Nevertheless, we expect that integrating such a biological mechanism of silencing nodes in RNN architectures and NN with attention components will enhance their performance and time domain (slow or fast) identification capabilities. Hence, we conclude that experimentally exploring brain dynamics can be a source of new ideas for enhancing ML methods.

## Methods

### Methods of the simulations: generating MNIST inputs

Each example, $${\widetilde{X}}^{i}, i=1, 2, \dots , M$$, of the trained dataset consists of 784 pixels, $${\widetilde{X}}_{p}^{i}$$, $$p=\mathrm{1,2}, \dots , 784$$, the values of which represent the gray-level and are in the range [0, 255]. The original 784 pixels are divided by the value 255, representing the firing probability of the pixels. Eventually, $$d$$ Boolean frames are generated for each example by comparing the firing probability of each normalized pixel $$p$$ in the frame number $$k$$ $${X}_{p}^{k}$$ to a random number taken from a uniform distribution in the range [0, 1], as described in detail in the references37.

### Definition of a node as LIF neuron

Each node in the ID-net (Fig. 1b) is represented by a leaky-integrate-and-fire (LIF) neuron (Eqs. (1)–(3) in reference16), with the time constant for membrane potential decay $$\tau =20\;\text{ms}$$.

### Definition of a sequence

Each sequence consists of 10 consecutive handwritten different digits taken from the MNIST database separated by 200 ms time-lags. Each input digit to the SFNN is represented by $$d$$ Boolean frames separated by 10 ms. For simplicity, the first digit of each sequence is fixed to be 0, and the rest of the sequence is constructed from the digits 1–9 in a given preselected order.

### Silenced input nodes and hidden nodes

After each digit is presented as an input consisting of $$d$$ frames to the ID-net, the silenced nodes for the next presented input digit are selected in the following way. We define a silencing profile to be the probability of each input and hidden node to be silenced depending on its fraction of spiking activity, $$\frac{q}{{d}_{1}},$$ during the last $${d}_{1}$$ frames of the previous digit (Fig. 1c). Following a selected uniform random number in the range [0, 1] for each node only once, a decision is taken for a node to be either silenced or active in the current $$d$$ frames of a digit. To exclude boundary condition effects, we artificially attributed silenced input and hidden nodes to the first zero digit in the sequence, using an adopted silencing profile of randomly selected digits of another trained sequence.

Note that the used random seed number for each digit differs between presented trained and test sequences. Hence, the selected silenced nodes in the frames of a given digit are different.

### Feed-forward and back propagation of the ID-net

Each node in the ID-net is presented by a leaky integrate-and-fire neuron. We train the ID-net using N sequences with the same order of 10 different digits. The time-lag between consecutive frames of a digit is $$10 \; \text{ms}$$ and the time-lag between consecutive digits is 200 ms, such that the duration of a sequence is $$10\cdot d+1800 \;\text{ms}.$$ Note that the voltage of the LIF nodes practically vanishes between consecutive digits.

The following two main features characterize the training process of the ID-net:

1. 1.

The ID-net is trained using 1000 sequences with 10 epochs. All sequences have the same order of the 10 different digits, but with different randomly selected examples from the MNIST dataset. A back propagation learning step is done at the end of each presented digit, to minimize the cross-entropy cost function:

$$C=-\frac{1}{M}\sum_{m=1}^{M}\sum_{i=1}^{10}\left[{y}_{m,i}\cdot \mathit{log}\left({a}_{m,i}\right)+\left(1-{y}_{m,i}\right)\cdot \mathit{log}\left(1-{a}_{m,i}\right)\right]+\frac{\alpha }{2\eta }\cdot \sum_{j}{W}_{j}^{2}$$

where $${y}_{m,i}$$ is the desired output value (0 or 1) for the ith output node in the mth input example. It is equal one for the desired label and otherwise zero. $${a}_{m,i}$$ stands for the output of the i’s output unit, given by the average firing during the last $${d}_{1}$$ frames of the m’s input example, $$\eta$$ is the learning rate and $$\alpha$$ is the regularization coefficient. The outer summation in the left term is over all M training examples.

The summation in the right term is over all weights in the network.

2. 2.

The feedforward and the backpropagation are very similar to the common method for the SFNN37 with the following main modifications:

• Silenced nodes—terms in the backpropagation including either silenced input nodes or silenced hidden nodes vanish.

• Only the accumulated output of the last $${d}_{1}$$ frames is used in the backpropagation process.

• The presented ID-net consists of a fully connected architecture (a generalization to diluted architectures is straightforward).

• In case of above-threshold accumulated voltage, a node fires and the voltage is reset to zero, and with vanishing refractory period.

The backpropagation method computes the gradient for each weight with respect to the cost function, and the weights and biases are updated according to the references16.

### Output label

The output label of the ID-net for each presented digit is selected following the output label with the maximal fires during the last $${d}_{1}$$ frames representing the digit. In case two output labels have the same maximal number of fires, the output label is set equal to the first node between the two. Similar results were found for the case where the output label is randomly selected between the two.

### Optimization

The learning rate, $$\eta$$, and the regularization coefficient, $$\alpha$$, are selected to maximize the success rate (SR) of each one of the 10 presented digits in the trained sequences. Note that the maximization of the SRs of the 10 digits is found to be very similar to the maximization of the average SR of the entire sequence. The optimization was based on the cross validation method. In the optimization procedure, we first search a rough grid of the adjustable parameters followed by fine-tuning grids with higher resolutions.

### Error prediction of the ID-net

Each panel in Figs. 2, 3 and 4 consists of blue and orange histograms. The threshold is defined as the maximal value such that the blue histogram, generated from all examined samples, is above it (True positive = 1). The true/false negative/positive results of Figs. 2, 3, 4 are summarized in the tables below:

Figure 2a,b, one sequence—threshold 0.79
Fast Slow Wrong
Panel (a)
True positive 1 1 1
False negative 0 0 0
False positive 0 0 0.002
True negative 1 1 0.998
Panel (b)
True positive 1 1 1
False negative 0 0 0
False positive 0 0 0.006
True negative 1 1 0.994
Figure 2c,d, two sequences—threshold 0.76
Fast Slow Wrong
Panel (c)
True positive 1 1 1
False negative 0 0 0
False positive 0 0 0.008
True negative 1 1 0.992
Panel (d)
True positive 1 1 1
False negative 0 0 0
False positive 0 0 0.009
True negative 1 1 0.991
Figure 3a–d—threshold 0.79
Panels (a–d) Fast Slow Wrong Different
True positive 1 1 1 1
False negative 0 0 0 0
False positive 0 0 0.0065 0
True negative 1 1 0.9935 1
Figure 4a—threshold 0.9
Panel (a) Fast Slow Wrong
True positive 1 1 1
False negative 0 0 0
False positive 0 0 0.0056
True negative 1 1 0.9944
Figure 4b—threshold 0.69
Panel (b) Fast Slow Wrong Different
True positive 1 1 1 1
False negative 0 0 0 0
False positive 0 0 0 0
True negative 1 1 1 1

### Figure 2: The calculation of the SRs

The ID-net SRs are calculated based on a minimum of 200 sequences with the same order of the 10-digits as in the training procedure.

We distinguished between the following two scenarios:

1. a.

The trained network knows the order of digits in the sequence. Hence, the SR for each one of the 10 presented digits is calculated with respect to the expected digit, even if a wrong sequence was presented (Fig. 1e).

2. b.

The trained ID-net does not know the order of digits in the trained sequences. In this scenario, for each one of the 10 presented digits in the test sequence the SR is taken as the maximal SR among the 10 possible output units, regardless if the predicted digit differs from the embedded digit in the trained sequence.

The test accuracy in Fig. 2a,c were calculated following scenario (a), whereas test accuracy in Fig. 2b,d were calculated following scenario (b).

For Fig. 2a,c the used parameters were: $$\alpha =3.3\cdot {10}^{-5} \; \mathrm{and} \; \eta =2.7\cdot {10}^{-4}.$$

For Fig. 2c,d the used parameters were: $$\alpha ={10}^{-6} \;\mathrm{and} \;\eta =4.7\cdot {10}^{-4}.$$

Each histogram consists of 40 different samples of trained sequences, each with different order of the 10 digits (For simplicity, the first digit was always selected to be zero).

For the slow and fast imperfections, the orange and blue histograms in Fig. 2a,b being constructed from 40 data points, one for each sample.

For the wrong sequence in Fig. 2a,b, each sample was tested on 36 different possible wrong sequences, differing from the order of the trained sequence by swapping the order of two digits, excluding the first zero digit. Hence, an orange histogram constructed from $$40\cdot 36= 1440$$ data points.

### Two embedded sequences, Fig. 2c,d

The order of the two embedded sequences is very similar. They have the same order except for the swapping of one pair of digits.

The same test procedure as mentioned above was used to construct the histograms for Fig. 2c,d, but with 42 different sequences (21 possible wrong sequences for each embedded sequence). Hence, each orange histogram for the wrong imperfection being constructed from $$40\cdot 42=1680$$ data points. Note that each embedded sequence contributes a data point to the blue histogram (80 data points for 40 samples).

For training sequences with relatively low SR, which occur with very low probability, one can enhance the SR using slightly different $$\eta$$ for the digit with the lowest SR in the sequence.

### Figure 3

For a trained sequence, randomly selected from MNIST, 50 similar synthetic sequences representing the same handwriting were generated in the following way. Each synthetic sequence is generated by adding an integer flat noise in the range [− 20, 20] for pixels with gray-level greater than 100 in the MNIST sequence. The ID-net was trained using 100 epochs.

Note that each embedded sequence contributes 50 data points to the blue and orange histograms in Fig. 3a,b,d and the blue histogram in Fig. 3c, a data point for each synthetic data.

The orange histogram in Fig. 3c is constructed similar to Fig. 2 for 36 different possible wrong sequences and 5 synthetic sequences for each of the 40 samples.

For each presented digit, the gap between the highest and the next highest firing output units is defined. This gap is normalized by $${d}_{1}$$ to the range $$[0, 1]$$. The minimal value of the normalized gap among the 10 digits, $${\Delta }_{min}$$, is chosen for the decision criterion.

Note that, in Fig. 3, the same initial random seed is used for the formation of each sequence, i.e. each frame. Consequently, similar input sequences generate very similar frames.

The parameters used in Fig. 3 were: $$\alpha ={8\cdot 10}^{-6} \; \mathrm{ and} \; \eta =3.7\cdot {10}^{-4}$$.

For trained sequences with extremely low $${\Delta }_{min}$$, which occur with very low probability, one can enhance the $${\Delta }_{min}$$ using slightly different $$\eta$$ for the digit with the lowest $${\Delta }_{min}$$ in the sequence.

### Figure 4: architecture and initial weights

The architecture in Fig. 4a is a fully connected ANN that is identical to Fig. 1b (comprising 784 input units, 200 hidden units, and 10 output units). Each unit in the hidden and output layers has an additional input from a bias unit.

The architecture in Fig. 4b is a modified Le-Net519 with convolution layers that operate on each channel separately, where the summation over all colors is performed before the first fully connected layer. Similar results were obtained using the Le-Net5 architecture.

The initial conditions of all weights are randomly chosen from a Gaussian distribution with a zero mean and standard deviation of one.

Input: Each example of the MNIST dataset, $${\widetilde{X}}^{i}, i=1, 2, \dots , M$$, of the trained dataset comprises 784 pixels, $${\widetilde{X}}_{p}^{i}$$, $$p=\mathrm{1,2}, \dots , 784$$, and their values represent the gray-level in the range [0, 255]. Input X of the example $$\widetilde{{\varvec{X}}}$$ comprises the original 784 pixels, where the average pixel value is subtracted from each pixel, and the standard deviation is set to one.

$${X}^{i}={\widetilde{X}}^{i}-\frac{1}{784} \sum_{p=1}^{784}{\widetilde{X}}_{p}^{i},$$
$${X}^{i}=\frac{{X}^{i}}{std\left({\widetilde{X}}^{i}\right)}.$$

### Feedforward and backpropagation

Figure 4a: The feedforward and backpropagation neural networks used are standard procedures using a sigmoid activation function for the hidden and output nodes. A cross entropy error function is also used.

Figure 4b: ReLU activation functions are assigned to the hidden and output nodes, and the backward propagation comprises a gradient descent method using a cross entropy error function.

The terms in the backpropagation, including silenced input nodes or silenced hidden nodes, vanish.

### Silenced input and hidden nodes

Figure 4a: After each digit is presented as an input, the silenced nodes for the next presented input digit are selected via the following method. We select the silencing probability of each input and hidden node following its activation value, while presenting the previous digit. A uniform random number in the range [0, 1] is selected for each node only once. The decision is made for a node to be silenced or active if the selected random number is lesser or greater than its previous activation value, respectively.

Figure 4b: The silencing procedure is applied to the nodes belonging to the fully connected layers (C5 and F6). The nodes with activation values greater than one are silenced with a probability of one.

To exclude the boundary condition effects, we artificially attribute silenced input and hidden nodes to the first zero digit in the sequence, using an adopted silencing profile of randomly selected digits from another trained sequence.

The calculation of the SRs is similar to the two scenarios discussed for the ID-net. The test accuracy in Fig. 4a is calculated following scenario (a), whereas the test accuracy in Fig. 4b is calculated following scenario (b).

For Fig. 4a, the parameters used are $$\alpha =5\cdot {10}^{-6} \; \mathrm{and} \; \eta =7\cdot {10}^{-2}.$$

For Fig. 4b, the parameter used is $$\eta =1\cdot {10}^{-2}.$$

Each histogram shown comprises 20 different samples of trained sequences, each with a different order of the 10 digits (for simplicity, the first digit is always set to zero).

For the blue histograms in Fig. 4a, the SRs of each of the 20 samples is tested on 36 different sequences with the same order as in the training procedure, yielding a histogram constructed from $$20\cdot 36= 720$$ data points.

For the blue histograms in Fig. 4b, the SRs of each of the 20 samples is tested on 36 synthetic examples, representing the same handwriting, with the same order as in the training procedure, yielding a histogram constructed from $$20\cdot 36= 720$$ data points.

A similar procedure is used to construct the fast and slow orange histograms of Fig. 4a,b, as well as the orange histogram of different handwriting in Fig. 4b.

For the wrong sequences in the orange histograms of Fig. 4a,b, each sample is tested on 36 different possible wrong sequences, differing from the order of the trained sequence by the order of two digits being swapped, excluding the first zero digit.

For the different handwriting orange histogram in Fig. 4b, each sequence is tested on synthetic examples, representing the same handwriting (similar to the above procedure in Fig. 3d), with the same order as in the training procedure.

### Materials and experimental methods (Fig. 5)

The In-vitro experimental methods are similar to those of our previous studies26,38.

### Animals

All procedures were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and Bar-Ilan University Guidelines for the Use and Care of Laboratory Animals in Research and were approved and supervised by the Bar-Ilan University Animal Care and Use Committee.

The study reports in-vitro experiments and is in accordance with ARRIVE guidelines.

### Culture preparation

Cortical neurons were obtained from newborn rats (Sprague–Dawley) within 48 h after birth using mechanical and enzymatic procedures. The cortical tissue was digested enzymatically with 0.05% trypsin solution in phosphate-buffered saline (Dulbecco’s PBS) free of calcium and magnesium, and supplemented with 20 mM glucose, at 37 °C. Enzyme treatment was terminated using heat-inactivated horse serum, and cells were then mechanically dissociated mostly by trituration. The neurons were plated directly onto substrate-integrated multi-electrode arrays (MEAs) and allowed to develop functionally and structurally mature networks over a time period of 2–4 weeks in-vitro, prior to the experiments. The number of plated neurons in a typical network was in the order of 1,300,000, covering an area of about ~ 5 cm2. The preparations were bathed in minimal essential medium (MEM-Earle, Earle's Salt Base without l-Glutamine) supplemented with heat-inactivated horse serum (5%), B27 supplement (2%), glutamine (0.5 mM), glucose (20 mM), and gentamicin (10 g/ml), and maintained in an atmosphere of 37 °C, 5% CO2 and 95% air in an incubator.

### Synaptic blockers

Experiments were conducted on cultured cortical neurons that were functionally isolated from their network by a pharmacological block of glutamatergic and GABAergic synapses. For each culture at least 20 μl of a cocktail of synaptic blockers were used, consisting of 10 μM CNQX (6-cyano-7-nitroquinoxaline-2,3-dione), 80 μM APV (dl-2-amino-5-phosphonovaleric acid) and 5 μΜ Bicuculline methiodide. After this procedure no spontaneous activity was observed both in the MEA and the patch clamp recording. In addition, repeated extracellular stimulations did not provoke the slightest cascades of neuronal responses.

### Stimulation and recording—MEA

An array of 60 Ti/Au/TiN extracellular electrodes, 30 μm in diameter, and typically spaced 200 μm from each other (Multi-Channel Systems, Reutlingen, Germany) was used. The insulation layer (silicon nitride) was pre-treated with polyethyleneimine (0.01% in 0.1 M Borate buffer solution). A commercial setup (MEA2100-60-headstage, MEA2100-interface board, MCS, Reutlingen, Germany) for recording and analyzing data from 60-electrode MEAs was used, with integrated data acquisition from 60 MEA electrodes and 4 additional analog channels, integrated filter amplifier and 3-channel current or voltage stimulus generator. Each channel was sampled at a frequency of 50k samples/s, thus the recorded action potentials and the changes in the neuronal response latency were measured at a resolution of 20 μs. Mono-phasic square voltage pulses were used, with an amplitude of − 900 mV and a duration of 0.5 ms (Fig. 5b).

### Stimulation and recording—patch clamp

The Electrophysiological recordings were performed in whole cell configuration utilizing a Multiclamp 700B patch clamp amplifier (Molecular Devices, Foster City, CA). The cells were constantly perfused with the slow flow of extracellular solution consisting of (mM): NaCl 140, KCl 3, CaCl2 2, MgCl2 1, HEPES 10 (Sigma-Aldrich Corp. Rehovot, Israel), supplemented with 2 mg/ml glucose (Sigma-Aldrich Corp. Rehovot, Israel), pH 7.3, osmolarity adjusted to 300–305 mOsm. The patch pipettes had resistances of 3–5 MOhm after filling with a solution containing (in mM): KCl 135, HEPES 10, glucose 5, MgATP 2, GTP 0.5 (Sigma-Aldrich Corp. Rehovot, Israel), pH 7.3, osmolarity adjusted to 285–290 mOsm. After obtaining the giga-ohm seal, the membrane was ruptured and the cells were subjected to fast current clamp by injecting an appropriate amount of current in order to adjust the membrane potential to about − 70 mV. The changes in the neuronal membrane potential were acquired through a Digidata 1550 analog/digital converter using pClamp 10 electrophysiological software (Molecular Devices, Foster City, CA). The acquisition started upon receiving the TTL trigger from MEA setup. The signals were filtered at 10 kHz and digitized at 50 kHz. The cultures mainly consisted of pyramidal cells as a result of mainly enzymatic and mechanical dissociation. For patch clamp recordings, pyramidal neurons were intentionally selected based on their morphological properties.

### Extracellular electrode selection

For the extracellular stimulations in the performed experiments an extracellular electrode out of the 60 electrodes was chosen by the following procedure. While recording intracellularly, all 60 extracellular electrodes were stimulated serially at 2 Hz and above-threshold, where each electrode is stimulated several times. The electrodes that evoked well-isolated, well-formed spikes were used in the experiments.

### Data analysis

Analyses were performed in a Matlab environment (MathWorks, Natwick, MA, USA). The recorded data from the MEA (voltage) was filtered by convolution with a Gaussian that has a standard deviation (STD) of 0.1 ms. Evoked spikes were detected by threshold crossing, typically − 20 mV, using a detection window of [0.5, 30] ms following the beginning of an extracellular stimulation.

### Statistical analysis

Results (Fig. 5b) were confirmed on 10 experiments using different neural cultures.