An STDP-based encoding method for associative and composite data

Spike-timing-dependent plasticity (STDP) is a biological process of synaptic modification driven by the relative order and timing of firing between neurons. One neurodynamical role of STDP is to form a macroscopic geometrical structure in the neuronal state space in response to a periodic input (Susman et al., Nat. Commun. 10(1), 1–9, 2019; Yoon & Kim, STDP-based associative memory formation and retrieval, arXiv:2107.02429v2, 2021). In this work, we propose a practical STDP-based memory model that can store and retrieve high-dimensional associative data. The model combines STDP dynamics with an encoding scheme for distributed representations and can handle multiple composite data in a continuous manner. In an auto-associative memory task in which a group of images is continuously streamed to the model, the images are successfully retrieved from an oscillating neural state whenever a proper cue is given. In a second task dealing with semantic memories embedded from sentences, the results show that words can recall multiple sentences simultaneously or a single one exclusively, depending on their grammatical relations.


Methods
STDP-based memory model. Our work follows the framework of standard firing-rate models 1,12 . We set the differential equation for the neural state as

dx/dt = −x + W φ(x) + b(t),  (1)

where x = [x_1 ··· x_N]^⊤ ∈ R^N is the state of N neuronal nodes and W = (W_ij) ∈ R^{N×N} is a connectivity matrix, with W_ij the strength of the synaptic connection from node j to node i. Here φ is a regularizing transfer function and b(t) is a memory input.
The plasticity model that we propose is based on Ref. 13:

dW_ij/dt = ρ ∫_0^∞ K(s) [φ(x_i(t)) φ(x_j(t − s)) − φ(x_j(t)) φ(x_i(t − s))] ds − γ W_ij,  (2)

where K is a temporal kernel and ρ is the learning rate. The parameter γ is the decay rate of homeostatic plasticity: while the synapses are repeatedly excited by the external input, they must also decay simultaneously so as to gradually forget and dump obsolete, irrelevant information. For analytic simplicity, we use φ(x) = x and a simplified Dirac-delta-type temporal kernel

K(s) = δ(s − τ),  (3)

with τ > 0. Using this kernel instead of the conventional exponential one has two main consequences. First, it models a specific interspike timing exhibiting maximal concentration of synaptic changes near the origin, which has been observed in related works. Second, the convolution in Eq. (2) reduces to a simple delayed term, which is relatively feasible for analysis 2 . After these simplifications, the main model becomes

dx/dt = −x + W x + b(t),
dW/dt = ρ (x x_τ^⊤ − x_τ x^⊤) − γ W,  (4)

where x_τ = x(t − τ) stands for the delayed synaptic response. We use a memory input in the form of a sequential harmonic pulse,

b(t) = Σ_{i=1}^{n} m_i cos(ω t − ξ_i),  (5)

where m_1, ..., m_n ∈ R^N are the memory representations to be stored. Here ω stands for the frequency of neural oscillations and ξ_i, i = 1, ..., n, for the sampling time of each component. The trajectory of the memory input b(t) in (5) is periodic and embedded in a 2-dimensional plane S ⊂ R^N, which we call the memory plane with respect to the memory representations m_1, ..., m_n. While the memory representations are distributed in the high-dimensional neural state space R^N, the memory plane S tends to be located in close proximity to them under a suitable condition 2 . Further, the system (4) has an asymptotically stable solution (x*, W*) that consists of a periodic solution x*(t) on the memory plane S and a constant connectivity matrix W(t) = W* = α(v u^⊤ − u v^⊤) for some vectors u and v in S. We will see below that the matrix W* contains the essential information needed to retrieve the memory representations m_1, ..., m_n. This implies that convergence of x(t) to a certain periodic oscillation x*(t) is a sign that the memories are stored in W in a distributed way.
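To make the storage dynamics concrete, here is a minimal numerical sketch, assuming the forms of Eqs. (4) and (5) reconstructed above. It uses plain forward Euler (the simulations in Results use a modified Euler method for delay equations), and random stand-in memory representations rather than real data; all dimensions and seeds are illustrative.

```python
import numpy as np

# Storage-phase sketch of Eqs. (4)-(5):
#   dx/dt = -x + W x + b(t),   dW/dt = rho (x x_tau^T - x_tau x^T) - gamma W
N, n = 50, 5                           # state dimension, number of memories
omega, gamma, rho = 1.5, 0.5, 0.5      # parameters as reported in Results
tau = np.pi / (2 * omega)              # STDP delay
dt, T = 0.01, 40.0                     # Euler step and storage duration

rng = np.random.default_rng(0)
M = 0.02 * rng.standard_normal((n, N))      # stand-in memory representations m_i
xi = np.pi / n * np.arange(n)               # sampling phases xi_i

def b(t):
    """Sequential harmonic memory input, Eq. (5)."""
    return M.T @ np.cos(omega * t - xi)

steps = int(T / dt)
delay = int(round(tau / dt))                # delay tau expressed in Euler steps
X = np.zeros((steps + 1, N))                # state history, needed for x(t - tau)
X[0] = 0.01 * rng.standard_normal(N)        # small arbitrary x(0)
W = np.zeros((N, N))                        # small arbitrary W(0)

for k in range(steps):
    x = X[k]
    x_tau = X[k - delay] if k >= delay else np.zeros(N)
    X[k + 1] = x + dt * (-x + W @ x + b(k * dt))
    W += dt * (rho * (np.outer(x, x_tau) - np.outer(x_tau, x)) - gamma * W)

W_star = W    # after convergence: an antisymmetric approximation of W*
```

Since W(0) = 0 and the plasticity update is antisymmetric, W stays exactly antisymmetric throughout, consistent with the stated form W* = α(v u^⊤ − u v^⊤).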
In the retrieval phase, we set γ = ρ = 0 in Eq. (4) and use the connectivity matrix W = W*:

dx/dt = −x + W* x + b_c(t),  (6)

where

b_c(t) = Σ_k m_{c_k} cos(ω t − ξ_{c_k}).  (7)

Here each m_{c_k} is the representation of a memory cue component; a single cue m_c corresponds to one term. The retrieval system (6) also has an asymptotically stable periodic solution, say x_c(t). One can show that the trajectory of x_c(t) lies close to S if the memory cue m_c is relevant to any of the memory representations m_1, ..., m_n. Hence, as a neural state x(t) is attracted to x_c(t), it is also bound to circle around the memory representations m_1, ..., m_n. Figure 1 shows that the neural oscillation occurs near the memory plane, and therefore in proximity to all the memory representations, when m_c is given close to one of them.
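A matching retrieval sketch is given below, assuming the stated antisymmetric form W* = α(v u^⊤ − u v^⊤): instead of rerunning storage, it constructs a stand-in W* from two random unit vectors and drives the frozen system with a cue near u.

```python
import numpy as np

# Retrieval-phase sketch of Eqs. (6)-(7): gamma = rho = 0, W frozen at W*.
N, omega, dt, T = 50, 1.5, 0.01, 15.0
rng = np.random.default_rng(1)

u = rng.standard_normal(N); u /= np.linalg.norm(u)
v = rng.standard_normal(N); v /= np.linalg.norm(v)
alpha = 2.0                                         # illustrative magnitude
W_star = alpha * (np.outer(v, u) - np.outer(u, v))  # stand-in for learned W*
m_c = u + 0.02 * rng.standard_normal(N)             # cue relevant to the plane S

x = 0.01 * rng.standard_normal(N)                   # small arbitrary x(0)
for k in range(int(T / dt)):
    b_c = m_c * np.cos(omega * k * dt)              # single-component cue input
    x = x + dt * (-x + W_star @ x + b_c)
# x now traces a periodic orbit x_c(t) near S = span{u, v}; a cue orthogonal
# to u and v would instead yield only a trivial oscillation near the origin.
```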
Encoding/decoding associative data with tag vectors. Before storing and retrieving specific associative data in (4) and (6), respectively, we first need to properly encode them into memory representations m_1, ..., m_n. It is reasonable to assume that cognitive systems do not simply receive external inputs in a passive way, but rather actively place them in the neural state space on acceptance. Suppose f_1, ..., f_n ∈ R^D are a series of external inputs from the environment containing high-dimensional associative information. We use a set of internal tags r_1, ..., r_m ∈ R^K to mark which classes the corresponding external inputs belong to. They may indicate the order of events in a sequence (the first, the second, ..., the last) if the input is streamed through sequential observations, the type of sensor (visual, auditory, olfactory, tactile) if the input is a combination of senses, or the sentence elements (subject, predicate, object, modifier) if the input is a sentence composed of words. Such internal tags r_1, ..., r_m can be formulated as low-dimensional orthonormal vectors. Following the vector embedding method for structured information 11 , we use the tensor product to encode a raw datum f_i into a memory representation

m_i = f_i ⊗ r,  (8)

where r is the tag vector corresponding to f_i, so that m_i ∈ R^N with N = DK. If r is of unit length, the original datum can be exactly decoded from the memory representation by applying the right dot product with the tag vector (contracting the tag index):

m_i · r = (f_i ⊗ r) · r = (r^⊤ r) f_i = f_i.  (9)

We say a neural state x ∈ R^N is retrievable if x is a linear combination of the memory representations m_1, ..., m_n. For a retrievable state x(t) = Σ_{j=1}^{n} c_j m_j, a selective recovery of the original f_i is possible by applying the right dot product with the corresponding tag vector:

x · r_i = Σ_{j=1}^{n} c_j (f_j ⊗ r_j) · r_i = c_i f_i,  (10)

using the orthonormality of the tags. It can be shown that all points on the memory plane S with respect to m_1, ..., m_n are retrievable.
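In code, the binding and unbinding operations of Eqs. (8)-(10) amount to a Kronecker product and a reshape-and-contract; a minimal sketch with illustrative dimensions and data:

```python
import numpy as np

D, K = 6, 4                                        # raw-data and tag dimensions
rng = np.random.default_rng(2)
R = np.linalg.qr(rng.standard_normal((K, K)))[0]   # columns: orthonormal tags
r1, r2 = R[:, 0], R[:, 1]
f1, f2 = rng.standard_normal(D), rng.standard_normal(D)

def encode(f, r):
    """Eq. (8): m = f ⊗ r, flattened into R^(D*K)."""
    return np.kron(f, r)

def decode(x, r):
    """Eqs. (9)-(10): right dot product with a tag, i.e. reshape x into a
    D x K matrix and contract the tag index with r."""
    return x.reshape(D, K) @ r

x = 0.7 * encode(f1, r1) + 0.2 * encode(f2, r2)    # a retrievable state
assert np.allclose(decode(x, r1), 0.7 * f1)        # tag r1 recovers c1 * f1
assert np.allclose(decode(x, r2), 0.2 * f2)        # tag r2 recovers c2 * f2
```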
Let us describe how the encoding/decoding scheme is combined with the above memory models. In the storage phase, we encode the data f_1, ..., f_n into the memory representations m_1, ..., m_n using the tag vectors as in Eq. (8), and run the system (4) with the memory input b(t) of Eq. (5). To retrieve the original data, we run the retrieval system (6) with a certain cue m_c. We wait until x(t) converges to an oscillating trajectory x_c(t), and then evaluate x(t) · r for retrieval. It has been shown 2 that x_c(t) always intersects S, and these intersection points are therefore retrievable. However, since the neural state space R^N is high dimensional, an irrelevant choice of m_c leads to a trivial intersection near the origin, which yields no meaningful retrieval. On the contrary, if m_c is close to any of m_1, ..., m_n, then x(t) converges to a periodic solution x_c(t) which is almost embedded in S, as illustrated in Fig. 1. Indeed, one can show that if m_c is a scalar multiple of one of the memory representations, the corresponding x_c(t) is completely embedded in S and is therefore retrievable for all t.
In the following section, we show through numerical tests of storage and retrieval that the STDP-based model (4) with the tensor product encoding can naturally provide a neural mechanism for segmenting continuous streams of sensory input into representations of associative bindings of items.

Results
Retrieval of grouped images. We first demonstrate an auto-associative memory task that involves a group of images. This task uses the five 64 × 64 grayscale images of classical orchestral instruments in Fig. 2. The images are translated into external input vectors f_i ∈ R^{64²}, i = 1, ..., 5, and are combined into the memory representations m_i = f_i ⊗ r_i, i = 1, ..., 5. Here the tag vectors r_i, i = 1, ..., 5, are orthonormal in R^5 and serve as placeholders for the images.
For the numerical simulations in this article, the modified Euler method for delay equations is used throughout. In the first memory task, each 64 × 64 image is translated into a vector f_i as follows: every pixel is mapped to a value in [−σ, σ], σ > 0, depending on its brightness (pure black to −σ and pure white to σ, linearly), and the resulting 64 × 64 matrix is flattened into the vector f_i. In the storage phase, σ = 0.02 was used to keep the magnitudes of f_i and m_i at an appropriate level. Reconstructing an image from a vector is done by performing the procedure in reverse order.

[Figure 1: the memory representations m_1, ..., m_n and the corresponding memory plane S. The memory plane S is located in the subspace spanned by the memory representations and is shown to be close to them. A periodic orbit close to S can be used for efficient memory retrieval.]

The storage phase ran for 40 seconds with integration step size Δt = 0.1 and ξ_i = (π/5)(i − 1) (5 evenly sequenced points on [0, π]). The model parameters were chosen as ω = 1.5, γ = ρ = 0.5, and τ = π/(2ω) = π/3. These specific choices were made based upon the analysis in Ref. 2, to achieve the convergence W(t) → W* while simultaneously guaranteeing a sufficiently large magnitude of W*. As expected, stable convergence of the connectivity to a certain W* is achieved when the arbitrary initial conditions x(0) and W(0) are appropriately small. The retrieval phase ran for 15 seconds with Δt = 0.01 and an appropriately small arbitrary x(0). In Figs. 3, 4, and 5, the brightness threshold σ was adjusted to 0.002 for clear visibility, since the magnitudes of the retrieved images are relatively small compared to the originals; thus, any element of x · r_i with a value outside [−0.002, 0.002] is rendered as a pixel of pure black or white.

Figure 3 depicts the numerical simulation of the retrieval phase. For a better understanding of the process, a graphic illustration of the memory plane and the initial memory cue is given alongside the actual data. When the neural state x(t) in Eq. (6) is continually perturbed by a noisy copy of one of the original images (violin), it approaches the memory plane S. Once x(t) converges to a limit cycle around S, as stated in Theorem 2, the external inputs f_1, ..., f_5 can be reproduced by applying the tag vectors to x(t). In Fig. 3, we display two snapshots of the retrieved images obtained at two points on the orbit: Fig. 3b is taken at the point farthest from S, and Fig. 3c at the intersection with S. Notably, the retrieved images continuously oscillate, developing weak/strong and positive/negative images in turn. Such flashing patterns generally differ from image to image and are affected by the sequential order of the memory representations in Eq. (5) during the storage phase. Furthermore, due to the orthogonality of the tag vectors, perfect images are acquired at the instants when x(t) penetrates S.

Figure 4 shows that the quality of the retrieved images depends on how close the memory cue is to the original image input. The cue with low-level noise in Fig. 4a leads to an orbit (blue) close to the memory plane S, producing images of decent quality in Fig. 4c. However, if the cue is more contaminated with noise, as in Fig. 4b, x(t) approaches S at a relatively larger angle, tracing a stretched, narrower elliptical orbit (red) that periodically gets far from S.
Although the orbit from the severely contaminated cue still passes through the memory plane, it does so only near the origin, providing relatively feeble images for a short time. Retrieval can also be performed with an incomplete cue: in Fig. 5a, the images are recalled from a partially obstructed cue, and the originals are recovered at a decent level, especially when x(t) passes through the memory plane S. Figure 5b shows that an irrelevant cue (forest) fails to retrieve the original memory inputs. Indeed, it can be shown that a completely irrelevant cue results in a one-dimensional periodic orbit that keeps penetrating the memory plane back and forth exactly at the origin.
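For reference, the pixel-to-vector translation used throughout this task might be written as follows, assuming 8-bit grayscale input; the clipping in the reverse map reproduces the pure black/white saturation used for display in Figs. 3, 4, and 5.

```python
import numpy as np

def image_to_vector(img, sigma=0.02):
    """Map 64 x 64 grayscale values in [0, 255] linearly onto [-sigma, sigma]
    (pure black -> -sigma, pure white -> sigma) and flatten, as in the text."""
    return (2.0 * img.astype(float) / 255.0 - 1.0).ravel() * sigma

def vector_to_image(f, sigma=0.02):
    """Reverse map; entries outside [-sigma, sigma] saturate to black/white."""
    scaled = np.clip(f, -sigma, sigma) / sigma          # back to [-1, 1]
    return ((scaled + 1.0) / 2.0 * 255.0).reshape(64, 64)
```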
To quantitatively verify the performance, we introduce the scaled cosine similarity f_i^⊤ g_i(t)/‖f_i‖² between the original image f_i and the retrieved image g_i(t) at time t. Note that, if g_i(t) = O(‖f_i‖), this metric reflects both the usual cosine similarity between the two images and the 'memory intensity' ‖g_i(t)‖/‖f_i‖. Let p(t) denote the average of the scaled cosine similarity over the images f_i, i = 1, ..., n, at time t, and let p̄ denote the temporal average of p(t). These two measures successfully capture the quality of the retrieved images depending on the given cues; see the captions of Figs. 4 and 5 for more detail.
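Both measures are direct to compute; a minimal sketch, with array shapes noted in the comments (the retrieved images g_i(t) would come from x(t) · r_i):

```python
import numpy as np

def scaled_cos_sim(f, g):
    """f^T g / ||f||^2: the cosine similarity of f and g, scaled by the
    memory intensity ||g|| / ||f||."""
    return float(f @ g) / float(f @ f)

def p_of_t(F, G_t):
    """p(t): average over the n stored images at one time instant.
    F: (n, D) original images; G_t: (n, D) retrieved images at time t."""
    return np.mean([scaled_cos_sim(f, g) for f, g in zip(F, G_t)])

def p_bar(F, G):
    """Temporal average of p(t). G: (T, n, D) retrieved images over T samples."""
    return np.mean([p_of_t(F, G_t) for G_t in G])
```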
The memory capacity of the model, measured by the time-averaged performance p̄ as a function of the number of memorized patterns, is illustrated in Fig. 6. The performance p̄ is observed to decrease at least on the order of n^{−1/2}, as expected from Ref. 2.
Multiple groups of memory with composite structure. This section deals with applications of the model to more complex associative memory. Suppose we have multiple groups of memory representations and have stored each group in the form of a memory plane using the system in Eq. (4). We are especially interested in the case where some memory representations belong to multiple groups. The following questions naturally arise: (1) Can the common memory component retrieve the corresponding multiple groups together? (2) Can a single memory group be selected by adding a further memory component to the cue? These questions are potentially related to high-level inference on memory. We also focus on the compositional structure of memory representations created by the tag vectors. Memory inputs in this section are words, provided collectively in the form of a sentence. We assume that each tag vector stands for a sentence element (subject, predicate, object, modifier) and is naturally bound to a word according to the role of that word in the sentence. Activated by such a sequential stream of words, the system in Eq. (4) forms the memory plane, which can be regarded as the encoding of the sentence.
For the simulation of semantic memory, we use three sentences built from a vocabulary of 8 words. Every word appearing in a sentence takes one of 4 roles (sentence elements). The vocabulary of words and roles is listed in Fig. 7a. We simply use arbitrarily chosen orthonormal sets {f_i}_{i=1}^{8} for the words and {r_j}_{j=1}^{4} for the roles. Figure 7b shows a couple of examples of memory representations, each of which is a binding of a word and a role; the subscript on the right-hand side expresses the role of the word. Our goal is to store the semantic information of the sentences through Eq. (4) with the memory input b(t) in Eq. (5). The three sentences S_1, S_2, and S_3 that we use as the memory input in the simulation are listed in Fig. 7c. Note that the word John appears three times in the sentences: once in S_1 as an object, and twice, in S_2 and S_3, as a subject. Similarly, the words Mary and garden each occur twice, in different contexts.
The memory connectivity matrices W*_k, k = 1, 2, 3, are obtained from separate single-group learning on the sentences S_k, k = 1, 2, 3, respectively. We then set the combined memory connectivity for the three sentences, W* = W*_1 + W*_2 + W*_3, for the collective retrieval phase. Here, we adopt the fitness function P_i^j(t) in Eq. (11) to measure how close the retrieved quantity is to the word f_i in the role r_j. In the following semantic tasks, the retrieval phase ran for 30 seconds with Δt = 0.01 and an appropriately small x(0). Multiple cues, such as John_S + Mary_O in Fig. 9d, are implemented by assigning each cue to its original sampling time through a harmonic pulse; in other words, the combined cue takes the two-component form of Eq. (7), b_c(t) = m_{John_S} cos(ω t − ξ_{John_S}) + m_{Mary_O} cos(ω t − ξ_{Mary_O}).

In the first task of multiple composite memories, Mary_S is given as the cue. Since Mary occurs as a subject only in S_1, one expects the retrieved result to be S_1, as in Fig. 8a. The numerical simulation of the retrieval process agrees well with this expectation. Figure 8b compares the fitness of the words: the values of P_i^j(t) in Eq. (11) are evaluated while x(t) oscillates along a convergent orbit of Eq. (6). If P_i^j(t) keeps increasing with a large slope, the corresponding memory component f_i ⊗ r_j can be identified as dominantly retrieved. The graphs in Fig. 8b show that such representations are Mary_S, calling_P, John_O, and livingroom_M, which match S_1 well.

The second task deals with an ambiguous memory cue. Suppose that the memory component John_S is given as the cue. Since it occurs in both sentences S_2 and S_3, it is reasonable that the retrieval result should involve all the memory representations in both sentences, as in Fig. 9a. This may be understood through one of the fundamental capabilities of the brain: to examine all possible memories that contain a common cue, especially when the given cue is insufficient. However, this ambiguity can be eliminated by adding further cues. For example, if Mary_O is added, as in Fig. 9b, the retrieval result should be narrowed down to S_3 due to the extra constraint.
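To make the semantic setup concrete, the sketch below builds the word/role bindings and the multiple-cue input. The (word, role) compositions of S_2 and S_3, in particular their modifiers, are inferred from the words named in the text (Fig. 7c itself is not reproduced here), so they should be read as placeholders; store_sentence is a stub for the single-group learning of Eq. (4) shown in the storage sketch above.

```python
import numpy as np

words = ["Mary", "calling", "John", "livingroom",
         "chasing", "dog", "looking", "garden"]      # vocabulary of 8 words
roles = ["S", "P", "O", "M"]                         # subject/predicate/object/modifier

rng = np.random.default_rng(3)
F = np.linalg.qr(rng.standard_normal((8, 8)))[0]     # columns: orthonormal words f_i
R = np.linalg.qr(rng.standard_normal((4, 4)))[0]     # columns: orthonormal roles r_j

def bind(word, role):
    """Role-bound memory representation m = f ⊗ r (cf. Fig. 7b)."""
    return np.kron(F[:, words.index(word)], R[:, roles.index(role)])

# Sentences as (word, role) lists; S2/S3 modifiers are inferred placeholders.
S1 = [("Mary", "S"), ("calling", "P"), ("John", "O"), ("livingroom", "M")]
S2 = [("John", "S"), ("chasing", "P"), ("dog", "O"), ("garden", "M")]
S3 = [("John", "S"), ("looking", "P"), ("Mary", "O"), ("garden", "M")]

def store_sentence(sentence):
    """Stub for single-group learning of Eq. (4) on one sentence; would
    return the converged connectivity W*_k (see the storage sketch above)."""
    raise NotImplementedError

# W_star = sum(store_sentence(S) for S in (S1, S2, S3))  # combined connectivity

def combined_cue(cues, xi, omega, t):
    """Multiple-cue input: each component rides a harmonic pulse at its
    original sampling phase, e.g. cues = [("John", "S"), ("Mary", "O")]."""
    return sum(bind(w, r) * np.cos(omega * t - x) for (w, r), x in zip(cues, xi))
```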
The numerical simulations successfully capture the expected features of the memory retrieval process described above. Figure 9c shows the results from the memory cue John_S. Notably, chasing_P and looking_P in the second graph increase simultaneously with almost the same slope, indicating that they are equally dominant memory representations in retrieval. This is even clearer when compared to another memory component, calling_P, which stays steady and negligible. The same pattern appears with John_O and dog_O in the third graph, both of which are dominant retrieval representations. The numerical results in Fig. 9d likewise reflect the retrieval tendency with an additional memory cue. We provide the system in Eq. (6) with the extended memory input in Eq. (7), consisting of the two memory representations John_S and Mary_O. Since the newly added cue Mary_O confines the retrieval result to the sentence S_3, as in Fig. 9b, the memory representations of S_2, chasing_P and dog_O, should be suppressed in retrieval. The second and third graphs in Fig. 9d show that, while chasing_P and dog_O increase (due to the common cue John_S), their slopes are smaller than those of looking_P and Mary_O in S_3, respectively. This implies that the dominantly retrieved representations are John_S, looking_P, Mary_O, and garden_M, which match S_3.

Discussion
Substantial evidence has now accumulated that neural oscillations are related to memory encoding, attention, and the integration of visual patterns [14][15][16]. In Ref. 1, the idea was proposed that memories constitute stable dynamical trajectories on a two-dimensional plane, in which an incoming stimulus is encoded as a pair of imaginary eigenvalues of the connectivity matrix. We extended this idea further through a specific memory system that can process a group of high-dimensional associative data sets, by using the exact analytic relation between the inputs and the corresponding synaptic changes shown in Ref. 2. While the Hopfield network 17 retrieves a single static datum as a fixed point, the proposed model explores multiple data sets through neural oscillations. Compared to other neural-based models capable of explicit and perfect retrieval of grouped data, such as [18][19][20], the model proposed in this work has the novelty of using a simple and continuous framework to reconstruct a group of multiple data in general vector form. Moreover, the model produces its output through neural oscillations in reaction to an external cue, showing a potential link to real memory processes occurring in the brain.
We encode the input data with tag vectors based on the tensor representation, which has been proposed as a robust and flexible distributed memory representation 11,[21][22][23][24]. This preprocessing enables us to efficiently retrieve the stored data and, in addition, to deal with composite structure in the data set. The ability to process multiple associative data sets with composite structure is essential in natural language understanding and reasoning. We have shown that the proposed model can handle multiple sentences that describe distinct situations and can selectively allow the recall cue to arouse a group of associative memories according to its semantic relevance.
From a practical perspective, our results suggest an alternative approach to memory devices. The conventional von Neumann architecture is non-scalable, and its performance is limited by the so-called von Neumann bottleneck between nonvolatile memories and microprocessors. Operating on data with artificial synapses, on the other hand, benefits from parallel information processing that consumes a small amount of energy per synapse. Moreover, conventional digital memory systems convert inputs to a binary code and store it in a separate storage device, likely destroying correlation information through such physical isolation. The proposed model is based on continuous dynamical systems and provides a simple and robust approach to dealing with a sequence of associative high-dimensional data. Processing data in a continuous and distributed system results in plastic storage of the correlated information in the synaptic connections.
In this work, we used a continuous model based on the average of local spiking activities of neurons. The model can be seen as a continuous approximation of a spiking neural network (SNN), which has been recognized as a promising architecture for bio-inspired neuromorphic hardware. Several studies have shown the dynamical correspondence between SNNs and their firing-rate approximations, including the Wilson-Cowan model [25][26][27]. However, reproducing the memory process in this paper through an SNN remains a substantial yet interesting practical challenge, considering that it requires establishing a proper conversion between a series of high-dimensional data and spiking patterns.